TY - GEN
T1 - Focus Your Attention (with Adaptive IIR Filters)
AU - Lutati, Shahar
AU - Zimerman, Itamar
AU - Wolf, Lior
N1 - Publisher Copyright:
©2023 Association for Computational Linguistics.
PY - 2023
Y1 - 2023
N2 - We present a new layer in which dynamic (i.e., input-dependent) Infinite Impulse Response (IIR) filters of order two are used to process the input sequence prior to applying conventional attention. The input is split into chunks, and the coefficients of these filters are determined based on previous chunks to maintain causality. Despite their relatively low order, the causal adaptive filters are shown to focus attention on the relevant sequence elements. The new layer is grounded in control theory, and is shown to generalize diagonal state-space layers. The layer performs on-par with state-of-the-art networks, with a fraction of their parameters and with time complexity that is sub-quadratic with input size. The obtained layer is favorable to layers such as Heyna, GPT2, and Mega, both with respect to the number of parameters and the obtained level of performance on multiple long-range sequence problems.
AB - We present a new layer in which dynamic (i.e., input-dependent) Infinite Impulse Response (IIR) filters of order two are used to process the input sequence prior to applying conventional attention. The input is split into chunks, and the coefficients of these filters are determined based on previous chunks to maintain causality. Despite their relatively low order, the causal adaptive filters are shown to focus attention on the relevant sequence elements. The new layer is grounded in control theory, and is shown to generalize diagonal state-space layers. The layer performs on-par with state-of-the-art networks, with a fraction of their parameters and with time complexity that is sub-quadratic with input size. The obtained layer is favorable to layers such as Heyna, GPT2, and Mega, both with respect to the number of parameters and the obtained level of performance on multiple long-range sequence problems.
UR - http://www.scopus.com/inward/record.url?scp=85182342475&partnerID=8YFLogxK
M3 - ???researchoutput.researchoutputtypes.contributiontobookanthology.conference???
AN - SCOPUS:85182342475
T3 - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
SP - 12538
EP - 12549
BT - EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings
A2 - Bouamor, Houda
A2 - Pino, Juan
A2 - Bali, Kalika
PB - Association for Computational Linguistics (ACL)
T2 - 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023
Y2 - 6 December 2023 through 10 December 2023
ER -