Multimodal learning under visually challenging conditions
| Authors | |
|---|---|
| Supervisors | |
| Cosupervisors | |
| Award date | 19-11-2024 |
| Number of pages | 178 |
| Organisations | |
| Abstract | While many multimodal machine learning methods achieve higher accuracy than single-sense, unimodal approaches, they implicitly assume that the visual modality is always clear. This assumption is easy to falsify: poor visual conditions are common in everyday practice. We find that under challenging visual conditions, existing machine learning methods often fail to effectively represent the information from the other modalities. They rely too heavily on the visual modality, because it is usually reliable and informative in the training data, and consequently cannot adapt when visual conditions deteriorate and begin to carry misleading information. Moreover, such multimodal models have never learned to find cross-modal correspondences in visually challenging scenarios. This thesis studies multimodal learning under visually challenging conditions. We investigate each type of visual degradation in a separate chapter, together with our solutions for more effective multimodal representation learning, and provide a brief conclusion in the final chapter. We hope this journey stimulates further research on multimodal learning under visually challenging conditions. |
| Document type | PhD thesis |
| Language | English |
