Stemming and segmentation for classical Tibetan

Orna Almogi, Lena Dankin, Nachum Dershowitz*, Yair Hoffman, Dimitri Pauls, Dorji Wangchuk, Lior Wolf

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Tibetan is a monosyllabic language for which computerized language tools are largely lacking. We describe the development of a syllable stemmer for Tibetan. The stemmer is based on a set of rules that strive to identify the vowel, the core letter of the syllable, and then the other parts. We demonstrate the value of the stemmer with two applications: determining stem similarity of two syllables and word segmentation. Our stemmer is being made available as an open-source tool and word segmentation as a freely available online tool.
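
As a rough illustration of the vowel-first idea mentioned in the abstract, the sketch below splits a syllable around its vowel. It is not the authors' stemmer: it assumes the syllable is given in Wylie-style Latin transliteration with exactly one vowel from a, i, u, e, o, and the name `split_syllable` is hypothetical. It makes no attempt to pick out the core letter within the onset, which is the part that requires the actual rule set.

```python
import re

# Assumption: input is a single syllable in Wylie-style transliteration,
# containing exactly one vowel letter (a, i, u, e, o).
_SYLLABLE = re.compile(r"^([^aeiou]*)([aeiou])([^aeiou]*)$")


def split_syllable(syllable):
    """Split a transliterated syllable into (onset, vowel, coda).

    Returns None when the syllable does not contain exactly one vowel.
    """
    match = _SYLLABLE.match(syllable)
    return match.groups() if match else None


print(split_syllable("bsgrubs"))
print(split_syllable("bod"))
```

Finding the vowel first gives a fixed anchor: everything before it is a candidate for the core letter and its prefixes, and everything after it is a candidate suffix.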

Original language: English
Title of host publication: Computational Linguistics and Intelligent Text Processing - 17th International Conference, CICLing 2016, Revised Selected Papers
Editors: Alexander Gelbukh
Publisher: Springer Verlag
Pages: 294-306
Number of pages: 13
ISBN (Print): 9783319754765
State: Published - 2018
Event: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016 - Konya, Turkey
Duration: 3 Apr 2016 to 9 Apr 2016

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9623 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 17th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2016
Country/Territory: Turkey
City: Konya
Period: 3/04/16 to 9/04/16
