Image Classification with the Fisher Vector: Theory and Practice

J. Sánchez; F. Perronnin; T. Mensink; J. Verbeek

DOI: https://doi.org/10.1007/s11263-013-0636-x

Authors	J. Sánchez; F. Perronnin; T. Mensink; J. Verbeek
Publication date	12-2013
Journal	International Journal of Computer Vision
Volume | Issue number	105 | 3
Pages (from-to)	222-245
Organisations	Faculty of Science (FNWI) - Informatics Institute (IVI)
Abstract	A standard approach to describe an image for classification and retrieval purposes is to extract a set of local patch descriptors, encode them into a high-dimensional vector and pool them into an image-level signature. The most common patch encoding strategy consists in quantizing the local descriptors into a finite set of prototypical elements. This leads to the popular Bag-of-Visual-Words representation. In this work, we propose to use the Fisher Kernel framework as an alternative patch encoding strategy: we describe patches by their deviation from a “universal” generative Gaussian mixture model. This representation, which we call the Fisher vector (FV), has many advantages: it is efficient to compute, it leads to excellent results even with efficient linear classifiers, and it can be compressed with a minimal loss of accuracy using product quantization. We report experimental results on five standard datasets (PASCAL VOC 2007, Caltech 256, SUN 397, ILSVRC 2010 and ImageNet10K) with up to 9M images and 10K classes, showing that the FV framework is a state-of-the-art patch encoding technique.
Document type	Article
Language	English
Published at	https://doi.org/10.1007/s11263-013-0636-x (Final published version)
Published at	http://www.science.uva.nl/research/publications/2013/SanchezIJCV2013
Downloads	SanchezIJCV2013 (Submitted manuscript)
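
The encoding described in the abstract (deviation of patch descriptors from a Gaussian mixture model, pooled into one image-level vector) can be illustrated with a minimal NumPy sketch. It is not taken from the article: the function name `fisher_vector`, the toy mixture parameters and the random descriptors are hypothetical, the mixture is assumed to have diagonal covariances, and the signed square-root and L2 normalization steps follow one commonly used formulation rather than anything stated in this record.

```python
import numpy as np

def fisher_vector(X, weights, means, variances):
    """Encode T local descriptors X (T, D) against a diagonal-covariance GMM.

    weights: (K,), means: (K, D), variances: (K, D). Returns a vector of size 2*K*D.
    """
    T, _ = X.shape
    diff = X[:, None, :] - means[None, :, :]              # (T, K, D)

    # Log-density of every descriptor under every Gaussian component.
    log_prob = -0.5 * (np.sum(diff**2 / variances, axis=2)
                       + np.sum(np.log(2.0 * np.pi * variances), axis=1))

    # Soft assignments (posteriors), computed stably in the log domain.
    log_post = np.log(weights) + log_prob                 # (T, K)
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Average deviations with respect to the means and standard deviations.
    z = diff / np.sqrt(variances)                         # (T, K, D)
    g_mu = np.einsum("tk,tkd->kd", gamma, z)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, z**2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]

    fv = np.concatenate([g_mu.ravel(), g_sigma.ravel()])

    # Assumed post-processing: signed square root, then L2 normalization.
    fv = np.sign(fv) * np.sqrt(np.abs(fv))
    norm = np.linalg.norm(fv)
    return fv / norm if norm > 0 else fv

# Toy usage with made-up parameters (K = 4 components, D = 8 dimensions).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
weights = np.full(4, 0.25)
means = rng.normal(size=(4, 8))
variances = np.ones((4, 8))
fv = fisher_vector(X, weights, means, variances)
```

In practice the mixture would be fitted on a large sample of descriptors beforehand; the sketch takes its parameters as given so that only the encoding step is shown. The resulting vector has 2·K·D entries, which is why it pairs naturally with linear classifiers.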
