Cryptonite: A Cryptic Crossword Benchmark for Extreme Ambiguity in Language

Avia Efrat, Uri Shaham, Dan Kilman, Omer Levy

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Current NLP datasets targeting ambiguity can be solved by a native speaker with relative ease. We present Cryptonite, a large-scale dataset based on cryptic crosswords, which is both linguistically complex and naturally sourced. Each example in Cryptonite is a cryptic clue, a short phrase or sentence with a misleading surface reading, whose solving requires disambiguating semantic, syntactic, and phonetic wordplays, as well as world knowledge. Cryptic clues pose a challenge even for experienced solvers, though top-tier experts can solve them with almost 100% accuracy. Cryptonite is a challenging task for current models; fine-tuning T5-Large on 470k cryptic clues achieves only 7.6% accuracy, on par with the accuracy of a rule-based clue solver (8.6%).
Original language: English
Title of host publication: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
Editors: Marie-Francine Moens, Xuanjing Huang, Lucia Specia, Scott Wen-tau Yih
Publisher: Association for Computational Linguistics
Pages: 4186-4192
Number of pages: 7
ISBN (Electronic): 978-1-955917-09-4
State: Published - 1 Nov 2021
Event: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021 - Hybrid - Online and in Punta Cana, Dominican Republic
Duration: 7 Nov 2021 to 14 Nov 2021

Conference

Conference: 2021 Conference on Empirical Methods in Natural Language Processing, EMNLP 2021
Country/Territory: Dominican Republic
City: Punta Cana
Period: 7/11/21 to 14/11/21
