Deriving paraphrases for highly inflected languages from comparable documents

Kfir Bar*, Nachum Dershowitz

*Corresponding author for this work

Research output: Contribution to conference › Paper › peer-review

Abstract

We describe an automatic paraphrase-inference procedure for a highly inflected language such as Arabic. Paraphrases are derived from comparable documents, that is, distinct documents dealing with the same topic. We take a co-training approach with two classifiers: one models the contexts surrounding occurrences of paraphrases, and the other is trained to identify significant features of the words within paraphrases. In particular, both classifiers rely on morpho-syntactic features, as is to be expected when working with highly inflected languages. We report experimental results for Arabic and for the morphologically simpler English, which we find encouraging. Our immediate goal is to incorporate such paraphrases into an Arabic-to-English translation system.
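To make the two-view co-training idea concrete, here is a minimal sketch of generic co-training. It is not the authors' implementation. It assumes each candidate paraphrase pair is represented by two feature sets: a "context" view and a "word" view, where the word view could hold morpho-syntactic features. Each view gets its own naive Bayes classifier, and on every round each classifier labels the unlabeled candidates it is most confident about, adding them to the shared training set. The class `NaiveBayes`, the function `co_train`, the feature strings, and the selection rule are all hypothetical illustrations.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial naive Bayes over sets of string features, add-one smoothing."""

    def fit(self, feature_sets, labels):
        self.labels = sorted(set(labels))
        self.prior = Counter(labels)
        self.counts = {y: Counter() for y in self.labels}
        self.vocab = set()
        for feats, y in zip(feature_sets, labels):
            self.counts[y].update(feats)
            self.vocab.update(feats)
        self.totals = {y: sum(self.counts[y].values()) for y in self.labels}
        return self

    def predict_with_confidence(self, feats):
        n = sum(self.prior.values())
        v = len(self.vocab) + 1  # +1 reserves mass for unseen features
        scores = {}
        for y in self.labels:
            s = math.log(self.prior[y] / n)
            for f in feats:
                s += math.log((self.counts[y][f] + 1) / (self.totals[y] + v))
            scores[y] = s
        # Turn log scores into a posterior for the best label (log-sum-exp).
        m = max(scores.values())
        z = sum(math.exp(s - m) for s in scores.values())
        best = max(scores, key=scores.get)
        return best, math.exp(scores[best] - m) / z


def co_train(labeled, unlabeled, rounds=5, per_round=1):
    """Hypothetical two-view co-training loop.

    labeled:   list of (context_feats, word_feats, label)
    unlabeled: list of (context_feats, word_feats)
    Returns the enlarged labeled set.
    """
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        if not pool:
            break
        labels = [y for _, _, y in labeled]
        ctx_clf = NaiveBayes().fit([c for c, _, _ in labeled], labels)
        word_clf = NaiveBayes().fit([w for _, w, _ in labeled], labels)
        # Each view labels the pool items it is most confident about.
        for clf, view in ((ctx_clf, 0), (word_clf, 1)):
            if not pool:
                break
            scored = [(clf.predict_with_confidence(item[view]), i)
                      for i, item in enumerate(pool)]
            scored.sort(key=lambda t: t[0][1], reverse=True)
            picked = scored[:per_round]
            # Pop from the highest index down so earlier indices stay valid.
            for (label, _conf), i in sorted(picked, key=lambda t: t[1], reverse=True):
                c, w = pool.pop(i)
                labeled.append((c, w, label))
    return labeled


seeds = [
    ({"ctx:said", "ctx:announced"}, {"pos:VERB", "tense:past"}, "para"),
    ({"ctx:the", "ctx:of"}, {"pos:NOUN", "num:pl"}, "non"),
]
pool = [({"ctx:said"}, {"pos:VERB"}), ({"ctx:of"}, {"pos:NOUN"})]
result = co_train(seeds, pool, rounds=3, per_round=1)
```

The design choice that makes this co-training rather than plain self-training is that the two classifiers never share features: examples confidently labeled by the context view become training data for the word view on the next round, and vice versa.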

Original language: English
Pages: 185-200
Number of pages: 16
State: Published, 2012
Event: 24th International Conference on Computational Linguistics, COLING 2012, Mumbai, India
Duration: 8 Dec 2012 to 15 Dec 2012

Conference

Conference: 24th International Conference on Computational Linguistics, COLING 2012
Country/Territory: India
City: Mumbai
Period: 8/12/12 to 15/12/12

Keywords

  • Arabic
  • Comparable documents
  • Co-training
  • Highly inflected languages
  • Morphologically rich languages
  • Paraphrases
