TY - JOUR
T1 - XAI for Transformers: Better Explanations through Conservative Propagation
T2 - 39th International Conference on Machine Learning, ICML 2022
AU - Ali, Ameen
AU - Schnake, Thomas
AU - Eberle, Oliver
AU - Montavon, Grégoire
AU - Müller, Klaus-Robert
AU - Wolf, Lior
N1 - Publisher Copyright:
Copyright © 2022 by the author(s)
PY - 2022
Y1 - 2022
AB - Transformers have become an important workhorse of machine learning, with numerous applications. This necessitates the development of reliable methods for increasing their transparency. Multiple interpretability methods, often based on gradient information, have been proposed. We show that the gradient in a Transformer reflects the function only locally, and thus fails to reliably identify the contribution of input features to the prediction. We identify Attention Heads and LayerNorm as the main reasons for such unreliable explanations and propose a more stable way for propagation through these layers. Our proposal, which can be seen as a proper extension of the well-established LRP method to Transformers, is shown both theoretically and empirically to overcome the deficiency of a simple gradient-based approach, and achieves state-of-the-art explanation performance on a broad range of Transformer models and datasets.
UR - http://www.scopus.com/inward/record.url?scp=85163080865&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85163080865
SN - 2640-3498
VL - 162
SP - 435
EP - 451
JO - Proceedings of Machine Learning Research
JF - Proceedings of Machine Learning Research
Y2 - 17 July 2022 through 23 July 2022
ER -