Text segmentation as a supervised learning task

Omri Koshorek, Adir Cohen, Noam Mor, Michael Rotman, Jonathan Berant

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Text segmentation, the task of dividing a document into contiguous segments based on its semantic structure, is a longstanding challenge in language understanding. Previous work on text segmentation focused on unsupervised methods such as clustering or graph search, due to the paucity of labeled data. In this work, we formulate text segmentation as a supervised learning problem, and present a large new dataset for text segmentation that is automatically extracted and labeled from Wikipedia. Moreover, we develop a segmentation model based on this dataset and show that it generalizes well to unseen natural text.

Original language: English
Title of host publication: Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 469-473
Number of pages: 5
ISBN (Electronic): 9781948087292
State: Published - 2018
Event: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018 - New Orleans, United States
Duration: 1 Jun 2018 → 6 Jun 2018

Publication series

Name: NAACL HLT 2018 - 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies - Proceedings of the Conference
Volume: 2

Conference

Conference: 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL HLT 2018
Country/Territory: United States
City: New Orleans
Period: 1/06/18 → 6/06/18
