ORB: An open reading benchmark for comprehensive evaluation of machine reading comprehension

Dheeru Dua, Ananth Gottumukkala, Alon Talmor, Sameer Singh, Matt Gardner

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Reading comprehension is one of the crucial tasks for furthering research in natural language understanding. A lot of diverse reading comprehension datasets have recently been introduced to study various phenomena in natural language, ranging from simple paraphrase matching and entity typing to entity tracking and understanding the implications of the context. Given the availability of many such datasets, comprehensive and reliable evaluation is tedious and time-consuming for researchers working on this problem. We present an evaluation server, ORB, that reports performance on seven diverse reading comprehension datasets, encouraging and facilitating testing a single model's capability in understanding a wide variety of reading phenomena. The evaluation server places no restrictions on how models are trained, so it is a suitable test bed for exploring training paradigms and representation learning for general reading facility. As more suitable datasets are released, they will be added to the evaluation server. We also collect and include synthetic augmentations for these datasets, testing how well models can handle out-of-domain questions.

Original language: English
Title of host publication: MRQA@EMNLP 2019 - Proceedings of the 2nd Workshop on Machine Reading for Question Answering
Publisher: Association for Computational Linguistics (ACL)
Pages: 147-153
Number of pages: 7
ISBN (Electronic): 9781950737819
State: Published - 2019
Externally published: Yes
Event: 2nd Workshop on Machine Reading for Question Answering, MRQA@EMNLP 2019 - Hong Kong, China
Duration: 4 Nov 2019 → …

Publication series

Name: MRQA@EMNLP 2019 - Proceedings of the 2nd Workshop on Machine Reading for Question Answering

Conference

Conference: 2nd Workshop on Machine Reading for Question Answering, MRQA@EMNLP 2019
Country/Territory: China
City: Hong Kong
Period: 4 Nov 2019 → …
