Machine learning for multi-source data integration

Authors	T.P. Pandeva
Supervisors	P.D. Forré, L.W. Hamoen
Cosupervisors	J.M. Mooij, M.J. Jonker
Award date	26-06-2025
Number of pages	176
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	This thesis develops statistical and machine learning methods for integrating multi-source data, with a focus on gene co-expression inference. Real-world datasets often violate standard assumptions such as independence and identical distribution of samples, especially in biological research. To address this, the thesis proposes generative models and sequential testing frameworks that increase robustness and flexibility. The first part introduces two generative approaches that reframe data fusion as a noisy multi-view independent component analysis problem. This modeling approach facilitates downstream tasks, in particular gene co-expression inference. The second part advances sequential hypothesis testing using E-values, which allow continuous monitoring and rigorous control of type I error without requiring corrections for multiple testing. Two methods are proposed that use machine learning and test martingales to perform a wide range of statistical tests.
Document type	PhD thesis
Language	English

UvA-DARE