Objects2action: Classifying and localizing actions without any video example

Open Access
Authors
  • Mihir Jain
  • Jan C. van Gemert
  • Thomas Mensink
  • Cees G. M. Snoek
Publication date 2015
Book title Proceedings: 2015 IEEE International Conference on Computer Vision: 11-18 December 2015, Santiago, Chile
ISBN
  • 9781467383905
Event ICCV 2015: IEEE International Conference on Computer Vision
Pages (from-to) 4588-4596
Publisher Los Alamitos, CA: IEEE Computer Society
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches, we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of an unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. Finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach.
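To make the mechanism in the abstract concrete, the following is a minimal NumPy sketch of the core scoring step: action and object names are embedded by averaging the skip-gram vectors of their multi-word descriptions, the most responsive objects per action are selected, and the video's object-classifier responses are combined through convex weights. This is an illustrative assumption of how such a pipeline could look, not the authors' implementation; all names (embed_phrase, objects2action_scores, top_k) are hypothetical.

```python
import numpy as np

# Hypothetical sketch of the objects2action scoring idea;
# names and the exact selection scheme are assumptions.

def embed_phrase(words, word_vectors):
    """Average the skip-gram vectors of a multi-word description
    (one way to realize the paper's multiple-word mechanism)."""
    vecs = [word_vectors[w] for w in words if w in word_vectors]
    return np.mean(vecs, axis=0)

def objects2action_scores(object_probs, object_embs, action_embs, top_k=100):
    """Zero-shot action scores for one video.

    object_probs : (n_objects,) object-classifier responses on the video
    object_embs  : (n_objects, d) embedded object names
    action_embs  : (n_actions, d) embedded action names
    top_k        : number of most responsive objects kept per action
    """
    # Affinity between each action and each object: cosine similarity
    # in the shared semantic embedding.
    o = object_embs / np.linalg.norm(object_embs, axis=1, keepdims=True)
    a = action_embs / np.linalg.norm(action_embs, axis=1, keepdims=True)
    affinity = a @ o.T                      # (n_actions, n_objects)

    scores = np.empty(len(action_embs))
    for i, aff in enumerate(affinity):
        keep = np.argsort(aff)[-top_k:]     # most responsive objects per action
        w = aff[keep] / aff[keep].sum()     # convex weights over those objects
        scores[i] = w @ object_probs[keep]  # combine object evidence
    return scores
```

The predicted label for the video would then be the argmax over scores. In the paper's setting, object_probs would come from object classifiers applied to the video, and the same per-region scoring could be reused for spatio-temporal localization.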
Document type Conference contribution
Language English
DOI https://doi.org/10.1109/ICCV.2015.521
Downloads
JainICCV2015 (Submitted manuscript)