Embedding heterogeneous data using statistical models

Amir Globerson*, Gal Chechik, Fernando Pereira, Naftali Tishby

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Embedding algorithms are a method for revealing low-dimensional structure in complex data. Most embedding algorithms are designed to handle objects of a single type for which pairwise distances are specified. Here we describe a method for embedding objects of different types (such as authors and terms) into a single common Euclidean space based on their co-occurrence statistics. The joint distributions of the heterogeneous objects are modeled as exponentials of squared Euclidean distances in a low-dimensional embedding space. This construction links the problem to convex optimization over positive semidefinite matrices. We quantify the performance of our method on two text datasets, and show that it consistently and significantly outperforms standard methods of statistical correspondence modeling, such as multidimensional scaling and correspondence analysis.
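
To make the modeling idea in the abstract concrete, here is a minimal sketch assuming the simplest possible form, p(x, y) proportional to exp(-||phi(x) - psi(y)||^2), normalized over all pairs. The abstract does not state the normalization or whether marginal distributions enter the model, so this form is an assumption rather than the authors' exact formulation. The function name `joint_from_embeddings`, the array names `phi` and `psi`, and the random embeddings are all hypothetical and serve only as illustration.

```python
import numpy as np

def joint_from_embeddings(phi, psi):
    """Hypothetical helper: p(x, y) proportional to exp(-||phi[x] - psi[y]||^2).

    Assumed form only; the abstract does not specify the normalization.
    """
    # Squared Euclidean distance between every row of phi and every row of psi.
    diff = phi[:, None, :] - psi[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)
    unnormalized = np.exp(-sq_dist)
    # Normalize over all (x, y) pairs so the table sums to 1.
    return unnormalized / unnormalized.sum()

rng = np.random.default_rng(0)
phi = rng.normal(size=(4, 2))  # e.g. 4 authors embedded in 2-D (made-up values)
psi = rng.normal(size=(6, 2))  # e.g. 6 terms embedded in the same 2-D space
p = joint_from_embeddings(phi, psi)
```

Under this assumed form, pairs of objects that lie close together in the shared space receive a higher joint probability, which is what lets objects of two different types be placed in one map.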

Original language: English
Title of host publication: Proceedings of the 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
Pages: 1605-1608
Number of pages: 4
State: Published - 2006
Externally published: Yes
Event: 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06 - Boston, MA, United States
Duration: 16 Jul 2006 to 20 Jul 2006

Publication series

Name: Proceedings of the National Conference on Artificial Intelligence
Volume: 2

Conference

Conference: 21st National Conference on Artificial Intelligence and the 18th Innovative Applications of Artificial Intelligence Conference, AAAI-06/IAAI-06
Country/Territory: United States
City: Boston, MA
Period: 16/07/06 to 20/07/06
