Euclidean embedding of co-occurrence data

Amir Globerson, Gal Chechik, Fernando Pereira, Naftali Tishby

Research output: Contribution to journal › Article › peer-review

Abstract

Embedding algorithms search for a low-dimensional continuous representation of data, but most algorithms only handle objects of a single type for which pairwise distances are specified. This paper describes a method for embedding objects of different types, such as images and text, into a single common Euclidean space, based on their co-occurrence statistics. The joint distributions are modeled as exponentials of Euclidean distances in the low-dimensional embedding space, which links the problem to convex optimization over positive semidefinite matrices. The local structure of the embedding corresponds to the statistical correlations via random walks in the Euclidean space. We quantify the performance of our method on two text data sets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling, IsoMap and correspondence analysis.
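
To make the modeling idea in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes the simplest form consistent with the abstract's wording, p(x, y) = exp(-||phi(x) - psi(y)||^2) / Z, where phi and psi are hypothetical names for the embeddings of the two object types and Z normalizes over all pairs. The abstract does not specify the exact model, any marginal terms, or the optimization procedure, so all function and variable names here are illustrative assumptions.

```python
import numpy as np


def joint_from_embedding(phi, psi):
    """Joint distribution over (x, y) pairs from two sets of embeddings.

    Assumed model (illustrative only):
        p(x, y) = exp(-||phi[x] - psi[y]||^2) / Z
    phi: array of shape (n_x, d), one row per object of the first type.
    psi: array of shape (n_y, d), one row per object of the second type.
    """
    # Squared Euclidean distance between every phi row and every psi row.
    diff = phi[:, None, :] - psi[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    unnormalized = np.exp(-sq_dist)
    # Z sums over all (x, y) pairs, so the result sums to 1.
    return unnormalized / unnormalized.sum()


def log_likelihood(counts, phi, psi):
    """Average log-likelihood of a co-occurrence count matrix under the model."""
    p_empirical = counts / counts.sum()
    return float(np.sum(p_empirical * np.log(joint_from_embedding(phi, psi))))


# Toy data: 4 objects of one type, 3 of another, embedded in 2 dimensions.
rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 2))
psi = rng.normal(size=(3, 2))
p = joint_from_embedding(phi, psi)
counts = rng.integers(1, 10, size=(4, 3))
ll = log_likelihood(counts, phi, psi)
```

Under this assumed form, pairs whose embeddings lie close together receive high joint probability, so fitting the embeddings to observed co-occurrence counts (for example by maximizing `log_likelihood`) pulls frequently co-occurring objects toward each other.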

Original language: English
Pages (from-to): 2265-2295
Number of pages: 31
Journal: Journal of Machine Learning Research
Volume: 8
State: Published - Oct 2007
Externally published: Yes

Keywords

  • Embedding algorithms
  • Exponential families
  • Manifold learning
  • Matrix factorization
  • Multidimensional scaling
  • Semidefinite programming
