TY - GEN
T1 - Unsupervised context sensitive language acquisition from a large corpus
AU - Solan, Zach
AU - Horn, David
AU - Ruppin, Eytan
AU - Edelman, Shimon
PY - 2004
Y1 - 2004
N2 - We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be "intermediate" for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children.
UR - http://www.scopus.com/inward/record.url?scp=33244496414&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:33244496414
SN - 0262201526
SN - 9780262201520
T3 - Advances in Neural Information Processing Systems
BT - Advances in Neural Information Processing Systems 16 - Proceedings of the 2003 Conference, NIPS 2003
PB - Neural Information Processing Systems Foundation
T2 - 17th Annual Conference on Neural Information Processing Systems, NIPS 2003
Y2 - 8 December 2003 through 13 December 2003
ER -