TY - GEN
T1 - Code-Switching and Back-Transliteration Using a Bilingual Model
AU - Mitelman, Daniel Weisberg
AU - Dershowitz, Nachum
AU - Bar, Kfir
N1 - Publisher Copyright: © 2024 Association for Computational Linguistics.
PY - 2024
Y1 - 2024
AB - The challenges of automated transliteration and code-switching detection in Judeo-Arabic texts are addressed. We introduce two novel machine-learning models, one focused on transliterating Judeo-Arabic into Arabic, and another aimed at identifying non-Arabic words, predominantly Hebrew and Aramaic. Unlike prior work, our models are based on a bilingual Arabic-Hebrew language model, providing a unique advantage in capturing shared linguistic nuances. Evaluation results show that our models outperform prior solutions for the same tasks. As a practical contribution, we present a comprehensive pipeline capable of taking Judeo-Arabic text, identifying non-Arabic words, and then transliterating the Arabic portions into Arabic script. This work not only advances the state of the art but also offers a valuable toolset for making Judeo-Arabic texts more accessible to a broader Arabic-speaking audience and more amenable to modern language tools.
UR - http://www.scopus.com/inward/record.url?scp=85188679932&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85188679932
T3 - EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024
SP - 1501
EP - 1511
BT - EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024
A2 - Graham, Yvette
A2 - Purver, Matthew
PB - Association for Computational Linguistics (ACL)
T2 - 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024
Y2 - 17 March 2024 through 22 March 2024
ER -