Abstract
We describe an automatic paraphrase-inference procedure for a highly inflected language like Arabic. Paraphrases are derived from comparable documents, that is, distinct documents dealing with the same topic. A co-training approach is taken, with two classifiers, one designed to model the contexts surrounding occurrences of paraphrases, and the other trained to identify significant features of the words within paraphrases. In particular, we use morpho-syntactic features calculated for both classifiers, as is to be expected when working with highly inflected languages. We provide some experimental results for Arabic, and for the simpler English, which we find to be encouraging. Our immediate interest is to incorporate such paraphrases within an Arabic-to- English translation system.
Original language | English |
---|---|
Pages | 185-200 |
Number of pages | 16 |
State | Published - 2012 |
Event | 24th International Conference on Computational Linguistics, COLING 2012 - Mumbai, India Duration: 8 Dec 2012 → 15 Dec 2012 |
Conference
Conference | 24th International Conference on Computational Linguistics, COLING 2012 |
---|---|
Country/Territory | India |
City | Mumbai |
Period | 8/12/12 → 15/12/12 |
Keywords
- Arabic
- Comparable documents
- Cotraining
- Highly inflected languages
- Morphologically rich languages
- Paraphrases