TY - CONF
T1 - Transformer Language Models without Positional Encodings Still Learn Positional Information
AU - Haviv, Adi
AU - Ram, Ori
AU - Press, Ofir
AU - Izsak, Peter
AU - Levy, Omer
N1 - Publisher Copyright:
© 2022 Association for Computational Linguistics.
PY - 2022
Y1 - 2022
AB - Causal transformer language models (LMs), such as GPT-3, typically require some form of positional encoding, such as positional embeddings. However, we show that LMs without any explicit positional encoding are still competitive with standard models, and that this phenomenon is robust across different datasets, model sizes, and sequence lengths. Probing experiments reveal that such models acquire an implicit notion of absolute positions throughout the network, effectively compensating for the missing information. We conjecture that causal attention enables the model to infer the number of predecessors that each token can attend to, thereby approximating its absolute position. Our findings indicate that causal LMs might derive positional awareness not only from the explicit positioning mechanism, but also from the effects of the causal mask.
UR - http://www.scopus.com/inward/record.url?scp=85149845940&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85149845940
SP - 1382
EP - 1390
T2 - Findings of the Association for Computational Linguistics: EMNLP 2022
Y2 - 7 December 2022 through 11 December 2022
ER -
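
The abstract's conjecture, that causal attention lets a token infer how many predecessors it can attend to and thereby approximate its absolute position, can be illustrated with a minimal sketch. The snippet below is not from the paper; it only assumes a causal mask with constant (content-free) attention logits, under which token i attends uniformly over its i+1 visible positions, so its maximum attention weight is 1/(i+1) and its position is recoverable.

import torch

# Minimal illustration (assumed setup, not the authors' code):
# constant logits + causal mask -> row i of the attention matrix is
# uniform over positions 0..i, i.e. each weight equals 1/(i+1).
T = 8                                   # sequence length (arbitrary)
logits = torch.zeros(T, T)              # no content or positional signal
causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
logits = logits.masked_fill(causal, float("-inf"))
attn = torch.softmax(logits, dim=-1)

print(attn[3])                          # 0.25 on positions 0..3, 0 elsewhere
print(1.0 / attn.max(dim=-1).values)    # recovers 1, 2, ..., T: absolute positions

The point of the sketch is only that the causal mask alone injects a position-dependent statistic (the effective number of attended tokens) into each row, which is the kind of signal the paper's probing experiments suggest the model exploits.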