VideoGraph: Recognizing Minutes-Long Human Activities in Videos
| Authors | Noureldien Hussein, Efstratios Gavves, Arnold W.M. Smeulders |
|---|---|
| Publication date | 13-10-2019 |
| Event | 1st Workshop on Graph Based Learning in Computer Vision |
| Number of pages | 10 |
| Organisations | |
| Abstract | Many human activities take minutes to unfold. To represent them, related works opt for statistical pooling, which neglects the temporal structure. Others opt for convolutional methods, such as CNNs and Non-Local blocks. While successful in learning temporal concepts, these fall short of modeling minutes-long temporal dependencies. We propose VideoGraph, a method that achieves the best of both worlds: it represents minutes-long human activities and learns their underlying temporal structure. VideoGraph learns a graph-based representation for human activities. The graph, its nodes, and its edges are learned entirely from video datasets, making VideoGraph applicable to problems without node-level annotation. The result is improvements over related works on two benchmarks: Epic-Kitchens and Breakfast. In addition, we demonstrate that VideoGraph is able to learn the temporal structure of human activities in minutes-long videos. |
| Document type | Paper |
| Note | “According to the organizer of the workshop, it was a mistake that this contribution was not published in the proceedings.” |
| Language | English |
| Published at | https://arxiv.org/abs/1905.05143 |
| Other links | https://cs.stanford.edu/people/ranjaykrishna/sgrl/index.html#accepted |
| Downloads | hussein2019videograph (Final published version) |
| Permalink to this page | |
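
The abstract describes learning a graph whose nodes and edges come entirely from video data, with no node-level annotation. Below is a minimal sketch of how learned latent nodes could work, assuming a PyTorch setting; the layer name, dimensions, and soft-attention assignment are illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch, assuming a PyTorch setting: per-timestep video features are
# softly assigned to a small set of *learned* latent "node" embeddings, so the
# graph nodes come from the data itself rather than from node-level annotation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentNodeAttention(nn.Module):
    """Hypothetical layer illustrating learned graph nodes for video features."""

    def __init__(self, num_nodes: int, feat_dim: int):
        super().__init__()
        # Latent node embeddings, trained end-to-end with the rest of the model.
        self.nodes = nn.Parameter(torch.randn(num_nodes, feat_dim) * 0.01)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, timesteps, feat_dim), e.g. segment features from a 3D CNN.
        # Dot-product similarity between each timestep feature and each node.
        sim = torch.einsum('btd,nd->btn', x, self.nodes)
        attn = F.softmax(sim, dim=-1)  # soft assignment of timesteps to nodes
        # Weight each timestep feature by its node assignment, then average
        # over time to obtain one embedding per node.
        node_feats = attn.unsqueeze(-1) * x.unsqueeze(2)  # (b, t, n, d)
        return node_feats.mean(dim=1)                     # (b, n, d)

# Usage: map 64 segment features of width 1024 onto 128 latent nodes.
layer = LatentNodeAttention(num_nodes=128, feat_dim=1024)
segments = torch.randn(2, 64, 1024)
graph_nodes = layer(segments)  # shape: (2, 128, 1024)
```

A full model would additionally learn edges over these node activations to capture the temporal structure the abstract refers to; that step is omitted from this sketch.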
