ParaShoot: A Hebrew Question Answering Dataset

Omri Keren, Omer Levy

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

NLP research in Hebrew has largely focused on morphology and syntax, where rich annotated datasets in the spirit of Universal Dependencies are available. Semantic datasets, however, are in short supply, hindering crucial advances in the development of NLP technology in Hebrew. In this work, we present ParaShoot, the first question answering dataset in modern Hebrew. The dataset follows the format and crowdsourcing methodology of SQuAD, and contains approximately 3000 annotated examples, similar to other question-answering datasets in low-resource languages. We provide the first baseline results using recently-released BERT-style models for Hebrew, showing that there is significant room for improvement on this task.
Original language: English
Title of host publication: Proceedings of the 3rd Workshop on Machine Reading for Question Answering
Editors: Adam Fisch, Alon Talmor, Danqi Chen, Eunsol Choi, Minjoon Seo, Patrick Lewis, Robin Jia, Sewon Min
Place of Publication: Punta Cana, Dominican Republic
Publisher: Association for Computational Linguistics
Pages: 106-112
Number of pages: 7
ISBN (Electronic): 978-1-954085-95-4
State: Published - 1 Nov 2021
Event: 3rd Workshop on Machine Reading for Question Answering - Hybrid (Online and Co-located with EMNLP 2021 in the Dominican Republic)
Duration: 10 Nov 2021 → 10 Nov 2021
Conference number: 3

Workshop

Workshop: 3rd Workshop on Machine Reading for Question Answering
Abbreviated title: MRQA 2021
Period: 10/11/21 → 10/11/21
