Pragmatic markers and parts of speech: On the problems of annotation of the speech corpus

Natalia Bogdanova-Beglarian, Kristina Zaides

Research output: Contribution to journal › Conference article › peer-review


The article considers the range of possibilities for annotating pragmatic markers (PMs): from the speaker’s code to the speaker’s commentaries for all difficult cases. The research is based on material from two corpora of everyday Russian speech: “One Day of Speech” (ORD; dialogues and polylogues) and “Balanced Annotated Text Collection” (SAT; monologues). The paper examines two main annotation levels: the part of speech of the original lexical unit from which the basic version of the PM is derived (POS), and the model of formation of PMs that consist of more than one word (Model). The research shows that fitting PMs into the system of traditional parts of speech is barely feasible, whereas defining a model of formation of PMs is important for their systematic description. In either case, automatic annotation of the corpus material proves considerably difficult.

Original language: English
Pages (from-to): 129-139
Number of pages: 11
Journal: CEUR Workshop Proceedings
State: Published - 2021
Externally published: Yes
Event: 2020 International Conference "Internet and Modern Society", IMS 2020 - Virtual, St. Petersburg, Russian Federation
Duration: 17 Jun 2020 - 20 Jun 2020


  • Model of formation
  • Part of speech
  • Pragmatic marker
  • Pragmaticalization
  • Speech corpus
  • Spoken speech
