Image Classification with the Fisher Vector: Theory and Practice

Open Access
Authors
Publication date 12-2013
Journal International Journal of Computer Vision
Volume | Issue number 105 | 3
Pages (from-to) 222-245
Organisations
  • Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract
A standard approach to describing an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists of quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular bag-of-visual-words representation. In this work, we propose to use the Fisher kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K), with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
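As an illustration of the encoding summarized in the abstract, the following is a minimal sketch of a Fisher vector computed from a set of local descriptors and a diagonal-covariance Gaussian mixture model. It uses the commonly cited formulation (gradients with respect to the means and standard deviations, followed by signed square-root and L2 normalization); the abstract itself gives no formulas, so this should be read as an assumption rather than the authors' exact implementation. The function name `fisher_vector`, its argument names, and the toy GMM parameters are hypothetical and chosen only for this example.

```python
import numpy as np


def fisher_vector(X, weights, means, variances):
    """Sketch of a Fisher vector for descriptors X under a diagonal GMM.

    X:         (N, D) local patch descriptors of one image
    weights:   (K,)   mixture weights, summing to 1
    means:     (K, D) component means
    variances: (K, D) diagonal covariances
    Returns a vector of length 2 * K * D.
    """
    N, D = X.shape
    diff = X[:, None, :] - means[None, :, :]  # (N, K, D)

    # Log-density of every descriptor under every Gaussian component.
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None], axis=2)
        + np.sum(np.log(variances), axis=1)[None, :]
        + D * np.log(2.0 * np.pi)
    )
    log_weighted = log_prob + np.log(weights)[None, :]

    # Soft assignments (posteriors), computed stably by subtracting the max.
    log_weighted -= log_weighted.max(axis=1, keepdims=True)
    gamma = np.exp(log_weighted)
    gamma /= gamma.sum(axis=1, keepdims=True)  # (N, K)

    # Deviations from the model, whitened by the standard deviations.
    z = diff / np.sqrt(variances)[None]  # (N, K, D)
    g_mu = np.einsum("nk,nkd->kd", gamma, z)
    g_mu /= N * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("nk,nkd->kd", gamma, z ** 2 - 1.0)
    g_sigma /= N * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Signed square-root ("power") normalization, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv


# Toy usage with made-up GMM parameters (K = 3 components, D = 4 dimensions).
rng = np.random.default_rng(0)
descriptors = rng.normal(size=(100, 4))
gmm_weights = np.array([0.2, 0.3, 0.5])
gmm_means = rng.normal(size=(3, 4))
gmm_variances = np.full((3, 4), 1.0)

signature = fisher_vector(descriptors, gmm_weights, gmm_means, gmm_variances)
print(signature.shape)  # 2 * K * D entries
```

In practice the GMM would be fitted beforehand on descriptors sampled from many images, and the resulting image-level signatures would be passed to a linear classifier, as the abstract describes.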
Document type Article
Language English
Published at https://doi.org/10.1007/s11263-013-0636-x
Published at http://www.science.uva.nl/research/publications/2013/SanchezIJCV2013
Downloads
SanchezIJCV2013 (Submitted manuscript)