TY - GEN
T1 - Text segmentation as a supervised learning task
AU - Koshorek, Omri
AU - Cohen, Adir
AU - Mor, Noam
AU - Rotman, Michael
AU - Berant, Jonathan
N1 - Publisher Copyright: © 2018 Association for Computational Linguistics.
PY - 2018
Y1 - 2018
AB - Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as clustering or graph search, due to the paucity in labeled data. In this work, we formulate text segmentation as a supervised learning problem, and present a large new dataset for text segmentation that is automatically extracted and labeled from Wikipedia. Moreover, we develop a segmentation model based on this dataset and show that it generalizes well to unseen natural text.
UR - http://www.scopus.com/inward/record.url?scp=85077005592&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85077005592
T3 - NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
SP - 469
EP - 473
BT - Short Papers
PB - Association for Computational Linguistics (ACL)
Y2 - 1 June 2018 through 6 June 2018
ER -