TY - JOUR
T1 - Emergent linguistic structure in artificial neural networks trained by self-supervision
AU - Manning, Christopher D.
AU - Clark, Kevin
AU - Hewitt, John
AU - Khandelwal, Urvashi
AU - Levy, Omer
N1 - Publisher Copyright:
© 2020 National Academy of Sciences. All rights reserved.
PY - 2020/12/1
Y1 - 2020/12/1
AB - This paper explores the knowledge of linguistic structure learned by large artificial neural networks, trained via self-supervision, whereby the model simply tries to predict a masked word in a given context. Human language communication is via sequences of words, but language understanding requires constructing rich hierarchical structures that are never observed explicitly. The mechanisms for this have been a prime mystery of human language acquisition, while engineering work has mainly proceeded by supervised learning on treebanks of sentences hand labeled for this latent structure. However, we demonstrate that modern deep contextual language models learn major aspects of this structure, without any explicit supervision. We develop methods for identifying linguistic hierarchical structure emergent in artificial neural networks and demonstrate that components in these models focus on syntactic grammatical relationships and anaphoric coreference. Indeed, we show that a linear transformation of learned embeddings in these models captures parse tree distances to a surprising degree, allowing approximate reconstruction of the sentence tree structures normally assumed by linguists. These results help explain why these models have brought such large improvements across many language-understanding tasks.
KW - Artificial neural network
KW - Self-supervision
KW - Syntax
KW - Learning
UR - http://www.scopus.com/inward/record.url?scp=85091816690&partnerID=8YFLogxK
U2 - 10.1073/pnas.1907367117
DO - 10.1073/pnas.1907367117
M3 - Article
C2 - 32493748
AN - SCOPUS:85091816690
SN - 0027-8424
VL - 117
SP - 30046
EP - 30054
JO - Proceedings of the National Academy of Sciences of the United States of America
JF - Proceedings of the National Academy of Sciences of the United States of America
IS - 48
ER -