VideoStory: A New Multimedia Embedding for Few-Example Recognition and Translation of Events

Authors
Publication date 2014
Book title MM '14: Proceedings of the 2014 ACM Conference on Multimedia, November 3-7, 2014, Orlando, Florida, USA
ISBN
  • 9781450330633
Event 22nd ACM International Conference on Multimedia
Pages (from-to) 17-26
Publisher New York: ACM
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
This paper proposes a new video representation for few-example event recognition and translation. Unlike existing representations, which rely on either low-level features or pre-specified attributes, we propose to learn an embedding from videos and their descriptions. In our embedding, which we call VideoStory, correlated term labels are combined if their combination improves the video classifier prediction. Our proposed algorithm prevents the combination of correlated terms that are visually dissimilar by optimizing a joint objective balancing descriptiveness and predictability. The algorithm learns from textual descriptions of video content, which we obtain for free from the web by a simple spidering procedure. We use our VideoStory representation for few-example recognition of events on more than 65K challenging web videos from the NIST TRECVID event detection task and the Columbia Consumer Video collection. Our experiments establish that i) VideoStory outperforms an embedding without the joint objective as well as alternatives without any embedding, ii) the varying quality of input video descriptions from the web is compensated by harvesting more data, and iii) VideoStory sets a new state-of-the-art for few-example event recognition, outperforming very recent attribute and low-level motion encodings. What is more, VideoStory translates a previously unseen video to its most likely description from visual content only.
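The record above does not include the algorithm itself, but the abstract describes a joint objective that balances descriptiveness (the embedding should reconstruct the description terms) and predictability (the embedding should be predictable from visual features). As a rough illustration only, the NumPy sketch below assumes a squared-loss formulation of these two parts, optimized by alternating least squares. All names (videostory_fit, S, A, W, lam, k) and the exact losses are assumptions for this sketch; the paper's actual objective and optimizer may differ.

import numpy as np

def videostory_fit(X, Y, k=100, lam=1e-3, n_iters=20, seed=0):
    """Alternating least-squares sketch of a VideoStory-style joint objective.

    X : (n, d) visual features for n training videos
    Y : (n, m) binary term occurrences from the videos' web descriptions
    Returns W (d, k), projecting features into the embedding, and
    A (k, m), decoding the embedding back into description terms.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = Y.shape[1]
    S = 0.01 * rng.standard_normal((n, k))  # latent "story" embedding per video
    A = 0.01 * rng.standard_normal((k, m))  # embedding -> terms (descriptiveness)
    W = np.zeros((d, k))                    # features -> embedding (predictability)
    I_k = np.eye(k)
    for _ in range(n_iters):
        # S-step: minimize ||Y - S A||^2 + ||S - X W||^2 over S (closed form)
        S = (Y @ A.T + X @ W) @ np.linalg.inv(A @ A.T + I_k)
        # A-step: ridge regression of the term matrix Y onto S
        A = np.linalg.solve(S.T @ S + lam * I_k, S.T @ Y)
        # W-step: ridge regression of the embedding S onto the features X
        W = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ S)
    return W, A

Under the same assumptions, the two uses mentioned in the abstract would then look like this: few-example event recognition trains a classifier on x @ W instead of on raw features, and translation scores every description term for an unseen video from its visual content alone:

s = x @ W                              # x: (d,) features of a previously unseen video
term_scores = s @ A                    # score each of the m description terms
top_terms = np.argsort(term_scores)[::-1][:10]  # most likely description terms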
Document type Conference contribution
Language English
DOI https://doi.org/10.1145/2647868.2654913