Efficient Long-Text Understanding with Short-Text Models

Maor Ivgi, Uri Shaham, Jonathan Berant

Research output: Contribution to journal › Article › peer-review

30 Scopus citations

Abstract

Transformer-based pretrained language models (LMs) are ubiquitous across natural language understanding, but cannot be applied to long sequences such as stories, scientific articles, and long documents due to their quadratic complexity. While a myriad of efficient transformer variants have been proposed, they are typically based on custom implementations that require expensive pretraining from scratch. In this work, we propose SLED: SLiding-Encoder and Decoder, a simple approach for processing long sequences that reuses and leverages battle-tested short-text pretrained LMs. Specifically, we partition the input into overlapping chunks, encode each with a short-text LM encoder and use the pretrained decoder to fuse information across chunks (fusion-in-decoder). We illustrate through controlled experiments that SLED offers a viable strategy for long text understanding and evaluate our approach on SCROLLS, a benchmark with seven datasets across a wide range of language understanding tasks. We find that SLED is competitive with specialized models that are up to 50x larger and require a dedicated and expensive pretraining step.
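The abstract describes SLED's core recipe: split a long input into overlapping chunks, encode each chunk independently with a standard short-text encoder, and let the pretrained decoder attend over all chunk encodings at once (fusion-in-decoder). The snippet below is a minimal illustrative sketch of that recipe using an off-the-shelf t5-small checkpoint from Hugging Face transformers; the chunk length, stride, checkpoint, and the choice to keep every chunk's full hidden states are assumptions made here for illustration, not the authors' released SLED implementation.

```python
# Minimal sketch of the chunk-encode / fuse-in-decoder idea (not the official SLED code).
# Chunk length, stride, and the t5-small checkpoint are illustrative assumptions.
import torch
from transformers import T5Tokenizer, T5ForConditionalGeneration
from transformers.modeling_outputs import BaseModelOutput

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

def encode_in_chunks(text, chunk_len=256, stride=128):
    """Encode a long input as overlapping chunks with the short-text encoder
    and concatenate the per-chunk hidden states along the sequence axis."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    states = []
    for start in range(0, len(ids), stride):
        chunk = ids[start:start + chunk_len].unsqueeze(0)
        with torch.no_grad():
            states.append(model.encoder(input_ids=chunk).last_hidden_state)
        if start + chunk_len >= len(ids):
            break
    return torch.cat(states, dim=1)  # shape: (1, total_chunk_tokens, hidden_dim)

long_doc = "summarize: " + "a very long document ... " * 400
fused = encode_in_chunks(long_doc)

# The pretrained decoder attends over the concatenation of all chunk encodings,
# fusing information across chunks while it generates the output.
summary_ids = model.generate(
    encoder_outputs=BaseModelOutput(last_hidden_state=fused),
    attention_mask=torch.ones(fused.shape[:2], dtype=torch.long),
    max_new_tokens=64,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```

Note that this sketch keeps the full hidden states of every overlapping chunk, so tokens in overlap regions appear twice in the fused sequence; it is only meant to show how a short-text encoder and decoder can be reused for long inputs without any new pretraining.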

Original language: English
Pages (from-to): 284-299
Number of pages: 16
Journal: Transactions of the Association for Computational Linguistics
Volume: 11
DOIs
State: Published - 2023

Funding

Funders (funder number):
European Union Horizon 2020 research and innovation programme (DELPHI 802800)
Yandex Initiative for Machine Learning
Blavatnik Family Foundation
European Research Council
