Machine learning for multi-source data integration

Open Access
Authors
Supervisors
Cosupervisors
Award date 26-06-2025
Number of pages 176
Organisations
  • Faculty of Science (FNWI) - Korteweg-de Vries Institute for Mathematics (KdVI)
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
  • Faculty of Science (FNWI) - Swammerdam Institute for Life Sciences (SILS)
Abstract
This thesis develops statistical and machine learning methods for integrating multi-source data, with a focus on gene co-expression inference. Real-world datasets, especially in biological research, often violate standard assumptions such as samples being independent and identically distributed (i.i.d.). To address this, the thesis proposes generative models and sequential testing frameworks that increase robustness and flexibility. The first part introduces two generative approaches that reframe data fusion as a noisy multi-view independent component analysis (ICA) problem. This modeling approach facilitates downstream tasks, in particular gene co-expression inference. The second part advances sequential hypothesis testing using E-values, which allow continuous monitoring of the data with rigorous type I error control, without requiring corrections for multiple testing. It proposes two methods that combine machine learning with test martingales to perform a wide range of statistical tests.
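The idea behind test martingales and continuous monitoring can be illustrated with a generic "testing by betting" sketch. It is not one of the thesis's methods. It tests H0: E[X_t] = mu0 for observations in [0, 1] by multiplying wealth by 1 + lam_t (x_t - mu0), where lam_t is a bet that depends only on past data. Under H0 this wealth process is a nonnegative martingale starting at 1, so it is an e-process. Ville's inequality, P(sup_t K_t >= 1/alpha) <= alpha, therefore allows the analyst to check after every observation and stop as soon as the wealth reaches 1/alpha. The function name and the simple plug-in betting rule (including the scaling factor 4.0) are hypothetical choices for illustration. The thesis uses machine learning to construct its bets.

```python
from typing import Iterable, Optional, Tuple


def betting_test_martingale(
    xs: Iterable[float], mu0: float = 0.5, alpha: float = 0.05
) -> Tuple[Optional[int], float]:
    """Illustrative sequential test of H0: E[X_t | past] = mu0, X_t in [0, 1].

    Returns (stopping_time, wealth): stopping_time is the 1-based index at
    which H0 is rejected (wealth >= 1/alpha), or None if it never is.
    """
    if not 0.0 < mu0 < 1.0:
        raise ValueError("mu0 must lie strictly between 0 and 1")
    wealth = 1.0                 # K_0 = 1
    total = 0.0                  # running sum of past observations
    threshold = 1.0 / alpha      # Ville's inequality rejection level
    # Bets with |lam| <= lam_max keep each factor 1 + lam*(x - mu0) >= 0;
    # using lam_max / 2 keeps every factor >= 1/2.
    lam_max = min(1.0 / mu0, 1.0 / (1.0 - mu0))
    for t, x in enumerate(xs, start=1):
        if not 0.0 <= x <= 1.0:
            raise ValueError("observations must lie in [0, 1]")
        # Predictable bet: uses only x_1..x_{t-1}. The mean is smoothed with
        # one pseudo-observation at mu0, so the first bet is zero.
        mu_hat = (mu0 + total) / t
        lam = max(-lam_max / 2, min(lam_max / 2, 4.0 * (mu_hat - mu0)))
        # Under H0 the expected multiplicative factor is exactly 1.
        wealth *= 1.0 + lam * (x - mu0)
        total += x
        if wealth >= threshold:  # valid at any data-dependent stopping time
            return t, wealth
    return None, wealth


# A strongly biased stream is rejected early; a balanced one never is.
print(betting_test_martingale([1.0] * 50))
print(betting_test_martingale([0.0, 1.0] * 100))
```

In the biased stream, wealth grows by a factor of 1.5 per step once the bet saturates, so it crosses 1/alpha = 20 after a handful of observations. No correction for the repeated looks is needed, because the error guarantee covers the whole trajectory rather than a single fixed sample size.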
Document type PhD thesis
Language English