Code-Switching and Back-Transliteration Using a Bilingual Model

Daniel Weisberg Mitelman, Nachum Dershowitz, Kfir Bar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The challenges of automated transliteration and code-switching–detection in Judeo-Arabic texts are addressed. We introduce two novel machine-learning models, one focused on transliterating Judeo-Arabic into Arabic, and another aimed at identifying non-Arabic words, predominantly Hebrew and Aramaic. Unlike prior work, our models are based on a bilingual Arabic-Hebrew language model, providing a unique advantage in capturing shared linguistic nuances. Evaluation results show that our models outperform prior solutions for the same tasks. As a practical contribution, we present a comprehensive pipeline capable of taking Judeo-Arabic text, identifying non-Arabic words, and then transliterating the Arabic portions into Arabic script. This work not only advances the state of the art but also offers a valuable toolset for making Judeo-Arabic texts more accessible to a broader Arabic-speaking audience and more amenable to modern language tools.

Original language: English
Title of host publication: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024
Editors: Yvette Graham, Matthew Purver
Publisher: Association for Computational Linguistics (ACL)
Pages: 1501-1511
Number of pages: 11
ISBN (Electronic): 9798891760936
State: Published - 2024
Event: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024 - St. Julian's, Malta
Duration: 17 Mar 2024 → 22 Mar 2024

Publication series

Name: EACL 2024 - 18th Conference of the European Chapter of the Association for Computational Linguistics, Findings of EACL 2024

Conference

Conference: 18th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2024 - Findings of EACL 2024
Country/Territory: Malta
City: St. Julian's
Period: 17/03/24 → 22/03/24
