XAI for Transformers: Better Explanations through Conservative Propagation

Ameen Ali*, Thomas Schnake, Oliver Eberle, Grégoire Montavon, Klaus-Robert Müller, Lior Wolf

*Corresponding author for this work

Research output: Contribution to journal; Conference article; peer-review

25 Scopus citations


Transformers have become an important workhorse of machine learning, with numerous applications. This necessitates the development of reliable methods for increasing their transparency. Multiple interpretability methods, often based on gradient information, have been proposed. We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify Attention Heads and LayerNorm as the main reasons for such unreliable explanations and propose a more stable way for propagation through these layers. Our proposal, which can be seen as a proper extension of the well-established LRP method to Transformers, is shown both theoretically and empirically to overcome the deficiency of a simple gradient-based approach, and achieves state-of-the-art explanation performance on a broad range of Transformer models and datasets.
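The abstract's point about LayerNorm can be illustrated with a small numerical check. The sketch below is not taken from the paper: it assumes a toy score f(x) = sum_i w_i (x_i - mean(x)) / std(x), uses finite differences in place of automatic differentiation, and assumes that "more stable propagation" can be approximated by treating the normalization factor as a constant. All names (`layer_norm_score`, `grad_times_input`, `x`, `w`) are hypothetical. Because the normalized score does not change when x is rescaled, plain Gradient x Input relevances sum to roughly zero rather than to the score; with the normalization factor frozen, the score is linear in x and the relevances sum to the score, which is the conservation property the title refers to.

```python
import math


def layer_norm_score(x, w, sigma=None):
    """Toy score: f(x) = sum_i w_i * (x_i - mean(x)) / sigma.

    If sigma is None it is recomputed from x (ordinary normalization).
    If sigma is given it is treated as a constant (the 'frozen' variant).
    """
    n = len(x)
    mu = sum(x) / n
    if sigma is None:
        sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / n)
    return sum(wi * (xi - mu) / sigma for wi, xi in zip(w, x))


def grad_times_input(f, x, h=1e-6):
    """Gradient x Input relevances, using central finite differences."""
    relevances = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        relevances.append(x[i] * (f(up) - f(down)) / (2 * h))
    return relevances


# Hypothetical input and readout weights, chosen only for illustration.
x = [1.0, -2.0, 0.5, 3.0]
w = [0.3, -1.2, 0.8, 0.5]

mu = sum(x) / len(x)
sigma = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
score = layer_norm_score(x, w)

# Gradient flows through the normalization factor.
plain = grad_times_input(lambda z: layer_norm_score(z, w), x)
# Normalization factor held fixed at its value for this input.
frozen = grad_times_input(lambda z: layer_norm_score(z, w, sigma=sigma), x)

print("score:", score)
print("sum of plain relevances:", sum(plain))
print("sum of frozen relevances:", sum(frozen))
```

In an autodiff framework the frozen variant would typically be obtained by excluding the normalization factor from the gradient computation; whether that matches the paper's exact rule is an assumption here, since the abstract does not spell it out.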

Original language: English
Pages (from-to): 435-451
Number of pages: 17
Journal: Proceedings of Machine Learning Research
State: Published - 2022
Event: 39th International Conference on Machine Learning, ICML 2022 - Baltimore, United States
Duration: 17 Jul 2022 - 23 Jul 2022


Funder: Funder number
European Research Council
European Union's Horizon 2020 research and innovation programme: ERC CoG 725974
German Ministry for Education and Research: 01GQ1115, 01IS18025A, 01GQ0850, 01IS14013AE, 01IS18037A
IITP: 2017-0-00451, 2019-0-00079
Institute of Information & Communications Technology Planning & Evaluation

