Associating neural word embeddings with deep image representations using Fisher Vectors

Benjamin Klein, Guy Lev, Gil Sadeh, Lior Wolf

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

253 Scopus citations

Abstract

In recent years, the problem of associating a sentence with an image has gained a lot of attention. This work continues to push the envelope and makes further progress in the performance of image annotation and image search by a sentence tasks. In this work, we are using the Fisher Vector as a sentence representation by pooling the word2vec embedding of each word in the sentence. The Fisher Vector is typically taken as the gradients of the log-likelihood of descriptors, with respect to the parameters of a Gaussian Mixture Model (GMM). In this work we present two other Mixture Models and derive their Expectation-Maximization and Fisher Vector expressions. The first is a Laplacian Mixture Model (LMM), which is based on the Laplacian distribution. The second Mixture Model presented is a Hybrid Gaussian-Laplacian Mixture Model (HGLMM) which is based on a weighted geometric mean of the Gaussian and Laplacian distribution. Finally, by using the new Fisher Vectors derived from HGLMMs to represent sentences, we achieve state-of-the-art results for both the image annotation and the image search by a sentence tasks on four benchmarks: Pascal1K, Flickr8K, Flickr30K, and COCO.

Original languageEnglish
Title of host publicationIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
PublisherIEEE Computer Society
Pages4437-4446
Number of pages10
ISBN (Electronic)9781467369640
DOIs
StatePublished - 14 Oct 2015
EventIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 - Boston, United States
Duration: 7 Jun 201512 Jun 2015

Publication series

NameProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition
Volume07-12-June-2015
ISSN (Print)1063-6919

Conference

ConferenceIEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Country/TerritoryUnited States
CityBoston
Period7/06/1512/06/15

Fingerprint

Dive into the research topics of 'Associating neural word embeddings with deep image representations using Fisher Vectors'. Together they form a unique fingerprint.

Cite this