Constructing Graphical Models for Multi-Source Data: Sparse Network and Component Analysis

Authors
Publication date 2020
Host editors
  • T. Imaizumi
  • A. Okada
  • S. Miyamoto
  • F. Sakaori
  • Y. Yamamoto
  • M. Vichi
Book title Advanced Studies in Classification and Data Science
ISBN
  • 9789811533105
ISBN (electronic)
  • 9789811533112
Series Studies in Classification, Data Analysis, and Knowledge Organization
Event Biennial Conference of the International Federation of Classification Societies, IFCS 2017
Pages (from-to) 275-287
Number of pages 13
Publisher Singapore: Springer
Organisations
  • Faculty of Social and Behavioural Sciences (FMG) - Psychology Research Institute (PsyRes)
Abstract
Gaussian graphical models (GGMs) are a popular method for analysing complex data by modelling the unique relationships between variables. Recently, interest has shifted from investigating relationships within a (sub)discipline (e.g. genetics) to estimating relationships between variables from various subdisciplines (e.g. how gene expression relates to cognitive performance). It is thus not surprising that there is an increasing need for analysing large, so-called multi-source datasets, each containing detailed information from many data sources on the same individuals. GGMs are a straightforward statistical candidate for estimating unique cross-source relationships from such network-oriented data. However, the multi-source nature of the data poses two challenges: first, different sources may inherently differ from one another, biasing the estimation of the relations; second, GGMs are not cut out for separating cross-source relationships from all other, source-specific relationships. In this paper we propose adding a simultaneous-component-model pre-processing step to the Gaussian graphical model; the combination is suitable for estimating cross-source relationships from multi-source data. Compared to the graphical lasso (a commonly used GGM technique), this Sparse Network and Component (SNAC) model more accurately estimates the unique cross-source relationships from multi-source data. This holds in particular when the multi-source data contains more variables than observations (p > n). Neither differences in sparseness of the underlying component structure of the data nor differences in the relative dominance of the cross-source compared to source-specific relationships strongly affect the relationship estimates.
Sparse Network and Component analysis, a hybrid component-graphical model, is a promising tool for modelling unique relationships between different data sources, thus providing insight into how various disciplines are connected to one another.
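To illustrate what "unique relationships" means in a GGM, the following minimal sketch computes partial correlations from the inverse covariance (precision) matrix. It is not the SNAC procedure or the graphical lasso; it only shows the standard identity that a zero off-diagonal precision entry corresponds to a zero partial correlation. The function names and the toy covariance matrix are assumptions made for this example and do not come from the paper.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity: [A | I].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Pick the row with the largest absolute entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


def partial_correlations(covariance):
    """Partial correlation of i and j given all other variables:
    -P[i][j] / sqrt(P[i][i] * P[j][j]), where P is the precision matrix."""
    precision = invert(covariance)
    n = len(precision)
    return [
        [
            1.0 if i == j
            else -precision[i][j] / (precision[i][i] * precision[j][j]) ** 0.5
            for j in range(n)
        ]
        for i in range(n)
    ]


# Hypothetical toy example: a chain x - y - z in which x and z are
# marginally correlated (0.25) only through y.
toy_covariance = [
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
]
partial = partial_correlations(toy_covariance)
for row in partial:
    print([round(v, 4) for v in row])
```

In this toy chain, x and z have a marginal correlation of 0.25 but a partial correlation that is numerically zero, so a GGM would draw no edge between them. A sparse estimator such as the graphical lasso encourages exact zeros of this kind when the precision matrix has to be estimated from data.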
Document type Conference contribution
Language English
Published at https://doi.org/10.1007/978-981-15-3311-2_22
Other links https://www.scopus.com/pages/publications/85092146461