Abstract
The aim of the paper is to present a project for a syntactically annotated corpus of Hittite, an extinct language of the Anatolian family written in cuneiform. Hittite is the oldest Indo-European language attested in writing; it was spoken from the 18th to the 12th century BC on the territory of present-day Turkey. No publicly available corpus of Hittite with syntactic annotation exists so far, while Hittite syntax is attracting growing interest from researchers, so the need for an online annotated corpus of the language is becoming ever more pressing. Developing such a corpus raises a number of problems. Some of them are specific to the language itself, such as second-position (2P) clitic chains, their position in the clause in terms of generative linguistics, and the constituency structure of the Hittite clause. Others stem from sociolinguistic peculiarities of the Hittite writing system: Hittite scribes made wide use of Akkadian and Sumerian logograms, which should be properly marked up in a Hittite corpus. Another problem is lacunae: the clay tablets have been heavily damaged over the last 3,000-3,500 years. What principles of phrase-structure annotation should apply when half of a sentence is missing? The paper discusses these and other problems and principles, drawing on the material of the presented project.
Original language | English |
---|---|
Pages (from-to) | 96-109 |
Number of pages | 14 |
Journal | CEUR Workshop Proceedings |
Volume | 1886 |
State | Published - 2016 |
Externally published | Yes |
Event | 2016 Workshop on Computational Linguistics and Language Science, CLLS 2016 - Moscow, Russian Federation. Duration: 26 Apr 2016 → … |
Keywords
- Anatolian family
- Constituency trees
- Corpus linguistics
- Cuneiform
- Hittite
- Indo-European group of languages
- Phrase structure
- Syntax
- Transliteration
- Treebanks