Inferring paraphrases for a highly inflected language from a monolingual corpus

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We propose a new technique for deriving paraphrases from a monolingual corpus, supported by a relatively small set of comparable documents. Two somewhat similar phrases, each occurring in one of a pair of documents that report the same incident, are taken as candidate paraphrases. These candidates are then evaluated on the contexts in which they appear in the larger monolingual corpus. We apply this technique to Arabic, a highly inflected language, to improve an Arabic-to-English statistical machine translation system. The paraphrases are provided to the translation system as a word lattice, each assigned a score reflecting its degree of equivalence. We evaluate the system in several configurations, with encouraging results: our best system improves BLEU by 1.73 points (5.49%).
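The abstract does not specify how "somewhat similar" phrases are detected, how contexts are represented, or how candidates are scored. The sketch below is therefore only a hypothetical illustration of the general idea (pair phrases across two comparable documents, then score each pair by how similar their contexts are in a larger corpus), not the authors' method. Its assumptions: input is already tokenized; candidates are fixed-length n-grams that share at least one token; a context is a bag of words within a fixed window; and candidates are scored by cosine similarity. All function names (`ngrams`, `candidate_pairs`, `context_vector`, `cosine`, `score_candidates`) are illustrative.

```python
from collections import Counter
from itertools import product
import math


def ngrams(tokens, n):
    """All contiguous n-token phrases in a tokenized document."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_pairs(doc_a, doc_b, n=2, min_shared=1):
    """Pair distinct n-grams from two comparable documents that share at least
    `min_shared` tokens (a hypothetical stand-in for 'somewhat similar')."""
    pairs = set()
    for p, q in product(set(ngrams(doc_a, n)), set(ngrams(doc_b, n))):
        if p != q and len(set(p) & set(q)) >= min_shared:
            pairs.add((p, q))
    return pairs


def context_vector(phrase, corpus, window=2):
    """Count the words up to `window` tokens left and right of every
    occurrence of `phrase` in the monolingual corpus (a list of token lists)."""
    vec = Counter()
    n = len(phrase)
    for sent in corpus:
        for i in range(len(sent) - n + 1):
            if tuple(sent[i:i + n]) == phrase:
                vec.update(sent[max(0, i - window):i])
                vec.update(sent[i + n:i + n + window])
    return vec


def cosine(u, v):
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * v[word] for word, count in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def score_candidates(pairs, corpus, window=2):
    """Score each candidate pair by context similarity; return (pair, score)
    items sorted by descending score, ties broken by the pair itself."""
    scores = {
        (p, q): cosine(context_vector(p, corpus, window), context_vector(q, corpus, window))
        for p, q in pairs
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, given two toy reports "police arrested the suspect on monday" and "police detained the suspect on monday", the pair ("police", "arrested") / ("police", "detained") scores 1.0 against a corpus in which both phrases appear with the same neighbouring words, while a pair whose second phrase never occurs in the corpus scores 0.0. In a setup like the one the abstract describes, scores of this kind could plausibly serve as the weights attached to paraphrase alternatives in the word lattice. That step is not shown here.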

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 15th International Conference, CICLing 2014, Proceedings
Publisher: Springer Verlag
Pages: 254-270
Number of pages: 17
Edition: PART 2
ISBN (Print): 9783642549021
State: Published - 2014
Event: 15th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2014 - Kathmandu, Nepal
Duration: 6 Apr 2014 to 12 Apr 2014

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 2
Volume: 8404 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 15th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2014
Country/Territory: Nepal
City: Kathmandu
Period: 6/04/14 to 12/04/14

Keywords

  • Arabic
  • Machine Translation
  • Paraphrases
