Machine learning for multi-source data integration
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 26-06-2025 |
| Number of pages | 176 |
| Organisations | |
| Abstract | This thesis develops statistical and machine learning methods for integrating multi-source data, with a focus on gene co-expression inference. Real-world datasets often violate standard assumptions such as independence and identical distribution of samples, especially in biological research. To address this, the thesis proposes generative models and sequential testing frameworks that increase robustness and flexibility. The first part introduces two generative approaches that reframe data fusion as a noisy multi-view independent component analysis problem. This modeling approach facilitates downstream tasks, in particular gene co-expression inference. The second part advances sequential hypothesis testing using E-values, which allow continuous monitoring and rigorous control of type I error without requiring corrections for multiple testing. Two methods are proposed that use machine learning and test martingales to perform a wide range of statistical tests. |
| Document type | PhD thesis |
| Language | English |
