VideoGraph: Recognizing Minutes-Long Human Activities in Videos

Open Access
Authors
Publication date 13-10-2019
Event 1st Workshop on Graph Based Learning in Computer Vision
Number of pages 10
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
Many human activities take minutes to unfold. To represent them, related works opt for statistical pooling, which neglects the temporal structure. Others opt for convolutional methods, such as 3D CNNs and Non-Local networks. While successful at learning temporal concepts, these fall short of modeling minutes-long temporal dependencies. We propose VideoGraph, a method that achieves the best of both worlds: it represents minutes-long human activities and learns their underlying temporal structure. VideoGraph learns a graph-based representation for human activities. The graph, its nodes, and its edges are learned entirely from video datasets, making VideoGraph applicable to problems without node-level annotation. The result is improvements over related works on two benchmarks: EPIC-Kitchens and Breakfast. In addition, we demonstrate that VideoGraph is able to learn the temporal structure of human activities in minutes-long videos.
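To make the idea concrete, the following is a minimal sketch of a graph-based video head of the kind the abstract describes. It is not the authors' implementation: the class name GraphHead, the use of learnable latent nodes with dot-product attention, the temporal convolution over node activations, and all dimensions are illustrative assumptions, as is the choice of per-segment features from a pretrained backbone.

```python
# Minimal sketch (not the authors' code) of a graph-based head for
# minutes-long videos, assuming per-segment features from a pretrained
# backbone. All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class GraphHead(nn.Module):
    def __init__(self, feat_dim=1024, num_nodes=128, num_classes=10):
        super().__init__()
        # Latent graph nodes, learned from data alone
        # (no node-level annotation required).
        self.nodes = nn.Parameter(torch.randn(num_nodes, feat_dim))
        # Temporal convolution mixes node activations across
        # neighbouring timesteps, capturing temporal structure.
        self.temporal = nn.Conv1d(num_nodes, num_nodes,
                                  kernel_size=7, padding=3)
        self.classifier = nn.Linear(num_nodes, num_classes)

    def forward(self, x):
        # x: (batch, timesteps, feat_dim) segment features.
        # Soft-assign each segment to the latent nodes
        # via dot-product attention.
        attn = torch.softmax(x @ self.nodes.t(), dim=-1)  # (B, T, N)
        h = self.temporal(attn.transpose(1, 2))           # (B, N, T)
        h = h.mean(dim=-1)                                # pool over time
        return self.classifier(h)

# Usage: two videos, each summarized as 64 one-second segments.
head = GraphHead()
feats = torch.randn(2, 64, 1024)
logits = head(feats)  # (2, 10)
```

The design point this sketch illustrates is that the nodes are free parameters rather than annotated entities, so the graph structure is discovered from the videos themselves; how the full method builds and embeds the graph is detailed in the paper linked below.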
Document type Paper
Note “According to the organizer of the workshop, it was a mistake that this contribution was not published in the proceedings.”
Language English
Published at https://arxiv.org/abs/1905.05143
Other links https://cs.stanford.edu/people/ranjaykrishna/sgrl/index.html#accepted
Downloads
hussein2019videograph (Final published version)