Heterogeneous Non-Local Fusion for Multimodal Activity Recognition
| Authors | |
|---|---|
| Publication date | 2020 |
| Book title | ICMR '20 |
| Book subtitle | Proceedings of the 2020 International Conference on Multimedia Retrieval: June 08-11, 2020, Dublin, Ireland |
| ISBN (electronic) | |
| Event | 10th ACM International Conference on Multimedia Retrieval, ICMR 2020 |
| Pages (from-to) | 63-72 |
| Publisher | New York, NY: The Association for Computing Machinery |
| Organisations | |
| Abstract | In this work, we investigate activity recognition using multimodal inputs from heterogeneous sensors. Activity recognition is commonly tackled from a single-modal perspective using videos. When multiple signals are used, they typically come from the same homogeneous modality, e.g. color and optical flow. Here, we propose an activity network that fuses multimodal inputs coming from completely different and heterogeneous sensors. We frame such a heterogeneous fusion as a non-local operation. The observation is that in a non-local operation, only the channel dimensions need to match. In the network, heterogeneous inputs are fused while maintaining the shapes and dimensionalities that fit each input. We outline both asymmetric fusion, where one modality serves to enforce the other, and symmetric fusion variants. To further promote research into multimodal activity recognition, we introduce GloVid, a first-person activity dataset captured with video recordings and smart glove sensor readings. Experiments on GloVid show the potential of heterogeneous non-local fusion for activity recognition, outperforming individual modalities and standard fusion techniques. |
| Document type | Conference contribution |
| Language | English |
| Published at | https://doi.org/10.1145/3372278.3390675 |
| Permalink to this page | |
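The key observation in the abstract — that a non-local operation only requires the channel dimensions of the two inputs to match — can be illustrated with a minimal NumPy sketch of asymmetric fusion, where one modality (here, glove sensor readings) enforces the other (video features). This is an illustrative sketch, not the paper's implementation; the dot-product affinity, the residual connection, and all names and shapes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_fusion(video_feat, glove_feat):
    """Asymmetric non-local fusion sketch: glove readings enforce video.

    video_feat: (N_v, C) flattened video positions with C channels
    glove_feat: (N_g, C) glove sensor time steps with the same C channels
    Only C must match; N_v and N_g may differ freely, so each modality
    keeps the shape that fits it.
    """
    affinity = softmax(video_feat @ glove_feat.T)  # (N_v, N_g) pairwise weights
    fused = affinity @ glove_feat                  # (N_v, C) aggregated glove signal
    return video_feat + fused                      # residual: output keeps video shape

# Hypothetical shapes: a 14x14 spatial grid of video features vs. 30 sensor steps.
video = np.random.randn(196, 64)
glove = np.random.randn(30, 64)
out = nonlocal_fusion(video, glove)
print(out.shape)  # (196, 64)
```

Note that the output retains the video stream's shape regardless of how many glove time steps are attended over, which is what lets heterogeneous inputs with different dimensionalities be fused without resampling either one.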