Constructing Graphical Models for Multi-Source Data: Sparse Network and Component Analysis

P. Tio; L. Waldorp; K. VanDeun

doi:https://doi.org/10.1007/978-981-15-3311-2_22

Authors	P. Tio; L. Waldorp; K. VanDeun
Publication date	2020
Host editors	T. Imaizumi; A. Okada; S. Miyamoto; F. Sakaori; Y. Yamamoto; M. Vichi
Book title	Advanced Studies in Classification and Data Science
ISBN	9789811533105
ISBN (electronic)	9789811533112
Series	Studies in Classification, Data Analysis, and Knowledge Organization
Event	Biennial Conference of the International Federation of Classification Societies, IFCS 2017
Pages (from-to)	275-287
Number of pages	13
Publisher	Singapore: Springer
Organisations	Faculty of Social and Behavioural Sciences (FMG) - Psychology Research Institute (PsyRes)
Abstract	Gaussian graphical models (GGMs) are a popular method for analysing complex data by modelling the unique relationships between variables. Recently, a shift in interest has taken place from investigating relationships within a (sub)discipline (e.g. genetics) to estimating relationships between variables from various subdisciplines (e.g. how gene expression relates to cognitive performance). It is thus not surprising that there is an increasing need for analysing large, so-called multi-source datasets, each containing detailed information from many data sources on the same individuals. GGMs are a straightforward statistical candidate for estimating unique cross-source relationships from such network-oriented data. However, the multi-source nature of the data poses two challenges: First, different sources may inherently differ from one another, biasing the estimation of the relations. Second, GGMs are not cut out for separating cross-source relationships from all other, source-specific relationships. In this paper we propose the addition of a simultaneous-component-model pre-processing step to the Gaussian graphical model, the combination of which is suitable for estimating cross-source relationships from multi-source data. Compared to the graphical lasso (a commonly used GGM technique), this Sparse Network and Component (SNAC) model more accurately estimates the unique cross-source relationships from multi-source data. This holds in particular when the multi-source data contains more variables than observations (p > n). Neither differences in sparseness of the underlying component structure of the data nor in the relative dominance of the cross-source compared to source-specific relationships strongly affect the relationship estimates. 
Sparse Network and Component analysis, a hybrid component-graphical model, is a promising tool for modelling unique relationships between different data sources, thus providing insight into how various disciplines are connected to one another.
Document type	Conference contribution
Language	English
Published at	https://doi.org/10.1007/978-981-15-3311-2_22 (Final published version)
Other links	https://www.scopus.com/pages/publications/85092146461