Oxford TRECVID 2006 - Notebook paper

Open Access
Authors
  • J. Sivic
  • A. Zisserman
Publication date 2006
Book title Proceedings of the 4th TRECVID Workshop
Publisher Gaithersburg, USA: NIST
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
The Oxford team participated in the high-level feature extraction and
interactive search tasks. A vision-only approach was used for both
tasks; no text or audio information was used.

For the high-level feature extraction task, we used two different
approaches, one using sparse and one using dense visual features to
learn classifiers for all 39 required concepts, using the training data
supplied by MediaMill [Snoek et al. '06] for the 2005 data. We also
used a face-specific classifier, with features computed for specific
facial parts, to help answer people-dependent queries such as
"government leader". We submitted three
different runs for this task. OXVGG_A was the result of using the dense
visual features only. OXVGG_OJ was the result of using the sparse
visual features for all the concepts, except for "government leader",
"face" and "person", where we prepended the results from the face
classifier. OXVGG_AOJ was a run where we applied rank fusion to merge
the outputs from the sparse and dense methods with weightings tuned to
the training data, and also prepended the face results for "face",
"person" and "government leader". In general, the sparse features
tended to perform best on the more object-based concepts, such as "US
flag", while the dense features performed slightly better on more
scene-based concepts, such as "military". Overall, the fused run
performed best, with a mean inferred average precision (MAP) of 0.093;
the sparse run came second with a MAP of 0.080, followed by the dense
run with a MAP of 0.053.
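
The fusion and prepending steps described for OXVGG_AOJ can be illustrated with a small sketch. This is not the authors' implementation: the Borda-style scoring rule, the shot identifiers, the weights (0.6 and 0.4) and the function names are all assumptions chosen for illustration, since the abstract states only that rank fusion with tuned weightings was used and that face results were prepended.

```python
from typing import Dict, List, Sequence


def weighted_rank_fusion(rankings: Sequence[List[str]],
                         weights: Sequence[float]) -> List[str]:
    """Fuse ranked shot lists using weighted Borda-style scores (assumed rule)."""
    scores: Dict[str, float] = {}
    for ranking, weight in zip(rankings, weights):
        n = len(ranking)
        for position, shot in enumerate(ranking):
            # The top item scores 1.0 and the last item scores 1/n,
            # scaled by the weight given to this ranking.
            scores[shot] = scores.get(shot, 0.0) + weight * (n - position) / n
    # Highest fused score first; ties broken by shot id for determinism.
    return sorted(scores, key=lambda s: (-scores[s], s))


def prepend_results(priority: List[str], ranking: List[str]) -> List[str]:
    """Place priority shots first, then the remaining ranking without repeats."""
    seen = set(priority)
    return list(priority) + [s for s in ranking if s not in seen]


# Hypothetical ranked lists from the sparse and dense classifiers.
sparse = ["s3", "s1", "s2", "s4"]
dense = ["s1", "s4", "s3", "s2"]

fused = weighted_rank_fusion([sparse, dense], weights=[0.6, 0.4])
# Hypothetical face-classifier output placed ahead of the fused list.
final = prepend_results(["s4"], fused)
print(fused)
print(final)
```

In a real system the weights would be tuned per concept on held-out training data rather than fixed by hand.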

For the interactive search task, we coupled the results generated
during the high-level task with methods to facilitate efficient and
productive interactive search. Our system allowed for several
"expansion" methods based on the sparse and dense features, as well as a
novel on-the-fly face classification system, which coupled a Google
Images search with rapid Support Vector Machine (SVM) training and
testing to return results containing a particular person within a few
minutes. We submitted just one run, OXVGG_TVI, which performed well,
winning two categories and coming above the median in 18 out of 24
queries.
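
The on-the-fly classification idea (gather positive examples for a named person, train an SVM quickly, then rank the collection) can be sketched as follows. Everything specific here is an assumption for illustration: the 2-D feature vectors stand in for real facial features, the negative examples are assumed to come from a background pool, and the Pegasos-style stochastic subgradient trainer is a stand-in, as the abstract does not describe the SVM variant or training procedure used.

```python
import random
from typing import Dict, List, Sequence, Tuple


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Train a linear SVM by Pegasos-style stochastic subgradient descent."""
    rng = random.Random(seed)
    # Append a constant 1.0 so the last weight acts as a bias term.
    data: List[Tuple[List[float], int]] = [
        (list(x) + [1.0], y) for x, y in zip(xs, ys)
    ]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink the weights (regularisation step).
            w = [(1.0 - eta * lam) * wi for wi in w]
            if margin < 1.0:
                # Hinge loss is active: move towards classifying x correctly.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def score(w: Sequence[float], x: Sequence[float]) -> float:
    """Signed decision value of feature vector x under weights w."""
    return sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))


# Hypothetical features: positives from web image results for the person,
# negatives from an assumed background pool of other faces.
positives = [(1.0, 1.2), (0.8, 0.9), (1.3, 0.7)]
negatives = [(-1.0, -0.8), (-0.9, -1.1), (-1.2, -0.6)]
xs = positives + negatives
ys = [1] * len(positives) + [-1] * len(negatives)
w = train_linear_svm(xs, ys)

# Rank hypothetical shots from the collection by decision value.
shots: Dict[str, Tuple[float, float]] = {
    "shot_a": (0.9, 0.8), "shot_b": (-0.7, -0.9),
    "shot_c": (1.1, 1.2), "shot_d": (-1.0, -0.8),
}
ranked = sorted(shots, key=lambda name: -score(w, shots[name]))
print(ranked)
```

A linear model trained this way is cheap enough to fit within an interactive session, which is the property the described system relies on.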
Document type Conference contribution
Language English
Published at http://www.robots.ox.ac.uk/~vgg/publications/papers/philbin06.pdf