The Two-Stage Algorithm for Extraction of the Significant Pharmaceutical Named Entities and Their Relations in the Russian-Language Reviews on Medications on Base of the XLM-RoBERTa Language Model

Alexander Sboev*, Ivan Moloshnikov, Anton Selivanov, Gleb Rylkov, Roman Rybka

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

The Internet contains a large amount of heterogeneous information, and extracting and structuring it is currently a relevant task. This is especially true for tasks of social importance, in particular the analysis of people's experience with pharmaceutical products. In this paper, we propose a two-stage sequential algorithm for extracting named entities and the relations between them. Its development was made possible by the availability of an annotated corpus of Internet users' reviews of medicines, the Russian Drug Review Corpus. The algorithm is based on the XLM-RoBERTa-sag language model, which is pre-trained on a large corpus of unlabeled review texts. The developed algorithm achieves an accuracy of 71.6 for identifying related entities and 80.5 for identifying relations, which is the first estimate of the accuracy of solving this problem on Russian-language drug review texts.
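The abstract describes the algorithm only at a high level. The following is a minimal sketch of how a generic two-stage pipeline of this kind can be wired, assuming that stage 1 is BIO-style token classification and stage 2 classifies ordered pairs of the entities found in stage 1. It is not the authors' implementation: the trained XLM-RoBERTa-sag models are replaced by hypothetical stub functions (`toy_tagger`, `toy_classifier`), and the entity types ("Drug", "ADR") and relation label ("causes") are illustrative placeholders, not the corpus's annotation scheme.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    start: int  # index of the first token of the span (inclusive)
    end: int    # index one past the last token of the span (exclusive)
    label: str  # entity type; "Drug"/"ADR" below are hypothetical placeholders


# Stage 1 model: maps tokens to one BIO tag per token (hypothetical stand-in
# for a fine-tuned token classifier).
Tagger = Callable[[List[str]], List[str]]
# Stage 2 model: labels an ordered (head, tail) entity pair, "none" if unrelated
# (hypothetical stand-in for a fine-tuned relation classifier).
RelationClassifier = Callable[[List[str], Entity, Entity], str]


def decode_bio(tags: List[str]) -> List[Entity]:
    """Group per-token BIO tags into entity spans (lenient: a stray I- opens a span)."""
    entities: List[Entity] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags + ["O"]):  # trailing "O" flushes the last open span
        if tag.startswith("I-") and tag[2:] == label:
            continue  # same entity continues
        if start is not None and label is not None:
            entities.append(Entity(start, i, label))  # close the open span
        start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]  # open a new span
    return entities


def extract_relations(
    tokens: List[str], entities: List[Entity], classify: RelationClassifier
) -> List[Tuple[Entity, Entity, str]]:
    """Stage 2: classify every ordered pair of entities found in stage 1."""
    relations: List[Tuple[Entity, Entity, str]] = []
    for head, tail in permutations(entities, 2):
        relation = classify(tokens, head, tail)
        if relation != "none":
            relations.append((head, tail, relation))
    return relations


def two_stage_extract(
    tokens: List[str], tagger: Tagger, classify: RelationClassifier
) -> Tuple[List[Entity], List[Tuple[Entity, Entity, str]]]:
    """Run stage 1 (entities), then feed its output into stage 2 (relations)."""
    entities = decode_bio(tagger(tokens))
    return entities, extract_relations(tokens, entities, classify)


if __name__ == "__main__":
    # Toy stand-ins for the two trained models (hypothetical, for illustration only).
    def toy_tagger(tokens: List[str]) -> List[str]:
        return ["B-Drug", "O", "O", "O", "B-ADR", "I-ADR"]

    def toy_classifier(tokens: List[str], head: Entity, tail: Entity) -> str:
        return "causes" if (head.label, tail.label) == ("Drug", "ADR") else "none"

    tokens = ["Nurofen", "gave", "me", "a", "bad", "headache"]
    entities, relations = two_stage_extract(tokens, toy_tagger, toy_classifier)
    for head, tail, rel in relations:
        print(" ".join(tokens[head.start:head.end]), rel, " ".join(tokens[tail.start:tail.end]))
```

Running the script prints `Nurofen causes bad headache`. Enumerating ordered pairs lets the second stage assign a direction to each relation; in practice the candidate pairs could also be filtered by entity type or token distance before classification.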

Original language: English
Title of host publication: Biologically Inspired Cognitive Architectures 2021 - Proceedings of the 12th Annual Meeting of the BICA Society
Editors: Valentin V. Klimov, David J. Kelley
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 463-471
Number of pages: 9
ISBN (Print): 9783030969929
State: Published - 2022
Externally published: Yes
Event: 12th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2021 - Virtual, Online
Duration: 12 Sep 2021 - 19 Sep 2021

Publication series

Name: Studies in Computational Intelligence
Volume: 1032 SCI
ISSN (Print): 1860-949X
ISSN (Electronic): 1860-9503

Conference

Conference: 12th Annual International Conference on Biologically Inspired Cognitive Architectures, BICA 2021
City: Virtual, Online
Period: 12/09/21 - 19/09/21

Keywords

  • Language models
  • Natural language processing
  • Neural networks
  • Pharmaceutical dataset
  • Relation extraction
  • Russian language
