Pair distance distribution: A model of semantic representation

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

We introduce PDD (Pair Distance Distribution), a novel corpus-based model of semantic representation. Most corpus-based models are VSMs (Vector Space Models), which, while successful, suffer from both practical and theoretical shortcomings. VSMs produce very large, sparse matrices, so dimensionality reduction is usually performed, which leads to high computational complexity and obscures the meaning of the dimensions. Similarity in VSMs is constrained to be both symmetric and transitive, contrary to evidence from human subject tests. PDD is feature-based and is created automatically from corpora without producing large, sparse matrices. The dimensions along which words are compared are meaningful, enabling better understanding of the model and providing an explanation of how any two words are similar. In PDD, similarity is neither symmetric nor transitive. The model achieved an accuracy of 97.6% on a published semantic similarity test.

Original language: English
Title of host publication: Proceedings of the 1st Workshop on Representation Learning for NLP, Rep4NLP 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Editors: Phil Blunsom, Kyunghyun Cho, Shay Cohen, Edward Grefenstette, Karl Moritz Hermann, Laura Rimell, Jason Weston, Scott Wen-Tau Yih
Publisher: Association for Computational Linguistics (ACL)
Pages: 184-192
Number of pages: 9
ISBN (Electronic): 9781945626043
State: Published - 2016
Event: 1st Workshop on Representation Learning for NLP, Rep4NLP 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016 - Berlin, Germany
Duration: 11 Aug 2016 → …

Publication series

Name: Proceedings of the Annual Meeting of the Association for Computational Linguistics
ISSN (Print): 0736-587X

Conference

Conference: 1st Workshop on Representation Learning for NLP, Rep4NLP 2016 at the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016
Country/Territory: Germany
City: Berlin
Period: 11/08/16 → …
