In defense of word embedding for generic text representation

Guy Lev, Benjamin Klein, Lior Wolf*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

33 Scopus citations

Abstract

Statistical methods have shown a remarkable ability to capture semantics. The word2vec method is a frequently cited method for capturing meaningful semantic relations between words from a large text corpus. It has the advantage of not requiring any tagging during training. The prevailing view, however, is that it lacks the ability to capture the semantics of word sequences and is virtually useless for most purposes unless combined with heavy machinery. This paper challenges that view by showing that augmenting the word2vec representation with one of a few pooling techniques yields results that surpass or are comparable to the best algorithms in the literature. This improved performance is justified by theory and verified by extensive experiments on well-studied NLP benchmarks (this work is inspired by [10]).
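As an illustration of the general idea of pooling word vectors into a fixed-length representation of a word sequence, the minimal Python sketch below mean-pools pretrained word2vec vectors. Mean pooling is only one simple pooling scheme, not necessarily the one evaluated in the paper; the vector dimensionality, toy vocabulary, and helper names here are hypothetical and chosen for illustration only.

```python
import numpy as np

# Hypothetical pretrained word2vec lookup: word -> 300-d vector.
# In practice one would load real vectors, e.g. via gensim's
# KeyedVectors.load_word2vec_format(...).
rng = np.random.default_rng(0)
vocab = {w: rng.standard_normal(300) for w in
         "statistical methods capture semantics of word sequences".split()}

def mean_pool(sentence, embeddings, dim=300):
    """Represent a word sequence by averaging its word vectors,
    skipping out-of-vocabulary tokens."""
    vecs = [embeddings[w] for w in sentence.lower().split() if w in embeddings]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

sentence_vector = mean_pool("Statistical methods capture semantics", vocab)
print(sentence_vector.shape)  # (300,)
```

The resulting fixed-length vector can then be fed to any off-the-shelf classifier or similarity measure, which is what makes pooled word embeddings attractive as a generic text representation.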

Original language: English
Title of host publication: Natural Language Processing and Information Systems - 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015, Proceedings
Editors: Siegfried Handschuh, André Freitas, Elisabeth Métais, Chris Biemann, Farid Meziane
Publisher: Springer Verlag
Pages: 35-50
Number of pages: 16
ISBN (Print): 9783319195803
DOIs
State: Published - 2015
Event: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015 - Passau, Germany
Duration: 17 Jun 2015 - 19 Jun 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9103
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 20th International Conference on Applications of Natural Language to Information Systems, NLDB 2015
Country/Territory: Germany
City: Passau
Period: 17/06/15 - 19/06/15

Funding

Funders: Intel Collaboration Research Institute for Computational Intelligence
