TY - GEN
T1 - A Learning-based Approach for Explaining Language Models
AU - Barkan, Oren
AU - Toib, Yonatan
AU - Elisha, Yehonatan
AU - Koenigstein, Noam
N1 - Publisher Copyright: © 2024 ACM.
PY - 2024/10/21
Y1 - 2024/10/21
AB - We present Learning Attributions (LA), a novel method for explaining language models. The core idea behind LA is to train a dedicated attribution model that functions as a surrogate explainer for the language model. This attribution model is designed to identify which tokens are most influential in driving the model's predictions. By optimizing the attribution model to mask the minimal amount of information necessary to induce substantial changes in the language model's output, LA provides a mechanism to understand which tokens in the input are critical for the model's decisions. We demonstrate the effectiveness of LA across several language models, highlighting its superiority over multiple state-of-the-art explanation methods across various datasets and evaluation metrics.
KW - deep learning
KW - explainable ai
KW - natural language processing
UR - http://www.scopus.com/inward/record.url?scp=85210009045&partnerID=8YFLogxK
U2 - 10.1145/3627673.3679548
DO - 10.1145/3627673.3679548
M3 - Conference contribution
AN - SCOPUS:85210009045
T3 - International Conference on Information and Knowledge Management, Proceedings
SP - 98
EP - 108
BT - CIKM 2024 - Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
PB - Association for Computing Machinery
T2 - 33rd ACM International Conference on Information and Knowledge Management, CIKM 2024
Y2 - 21 October 2024 through 25 October 2024
ER -