TY - GEN
T1 - Processing Judeo-Arabic texts
AU - Bar, Kfir
AU - Dershowitz, Nachum
AU - Wolf, Lior
AU - Lubarsky, Yackov
AU - Choueka, Yaacov
N1 - Publisher Copyright: © 2015 IEEE.
PY - 2016/2/29
Y1 - 2016/2/29
AB - Judeo-Arabic is a set of dialects spoken and written by Jewish communities living in Arab countries. Judeo-Arabic is typically written in Hebrew letters, enriched with diacritic marks that relate to the underlying Arabic. However, some inconsistencies in rendering words in Hebrew letters increase the level of ambiguity of a given word. Furthermore, Judeo-Arabic texts usually contain non-Arabic words and phrases, such as quotations or borrowed words from Hebrew and Aramaic. We focus on two main tasks: (1) automatic transliteration of Judeo-Arabic Hebrew letters into Arabic letters, and (2) automatic identification of language switching points between Judeo-Arabic and Hebrew. For transliteration, we employ a statistical translation system trained on the character level, resulting in 96.9% precision, a significant improvement over the baseline. For the language switching task, we use a word-level supervised classifier, also showing some significant improvements over the baseline.
KW - Code switching
KW - Judeo-Arabic
KW - Transliteration
UR - http://www.scopus.com/inward/record.url?scp=84969799380&partnerID=8YFLogxK
U2 - 10.1109/ACLing.2015.27
DO - 10.1109/ACLing.2015.27
M3 - Conference contribution
AN - SCOPUS:84969799380
T3 - Proceedings - 1st International Conference on Arabic Computational Linguistics: Advances in Arabic Computational Linguistics, ACLing 2015
SP - 138
EP - 144
BT - Proceedings - 1st International Conference on Arabic Computational Linguistics
A2 - Gelbukh, Alexander
A2 - Shaalan, Khaled
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st International Conference on Arabic Computational Linguistics, ACLing 2015
Y2 - 17 April 2015 through 20 April 2015
ER -