Unsupervised decomposition of a document into authorial components

Moshe Koppel*, Navot Akiva, Idan Dershowitz, Nachum Dershowitz

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We propose a novel unsupervised method for separating out distinct authorial components of a document. In particular, we show that, given a book artificially "munged" from two thematically similar biblical books, we can separate out the two constituent books almost perfectly. This allows us to automatically recapitulate many conclusions reached by Bible scholars over centuries of research. One of the key elements of our method is exploitation of differences in synonym choice by different authors.
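The abstract names one ingredient of the method: authors differ in which member of a synonym pair they choose. The toy sketch below shows only that general idea. It represents each text chunk by which synonyms it uses and splits the chunks into two groups with plain 2-means clustering. It is not the procedure described in the paper. The synonym pairs, the chunks, and the helper names (`synonym_features`, `two_means`) are all invented for illustration.

```python
# Toy illustration only: group text chunks by synonym choice using 2-means.
# The synonym pairs and chunks are invented; they are not the paper's data.
SYNONYM_SETS = [
    ("tent", "tabernacle"),
    ("staff", "rod"),
]

def synonym_features(chunk):
    """Binary vector with one entry per synonym; 1 if that word occurs in the chunk."""
    words = set(chunk.lower().split())
    return [1 if w in words else 0 for group in SYNONYM_SETS for w in group]

def distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def two_means(vectors, iterations=10):
    """Plain 2-means, seeded with the first vector and the vector farthest from it."""
    c0 = vectors[0]
    c1 = max(vectors, key=lambda v: distance(v, c0))
    centroids = [list(c0), list(c1)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each chunk to its nearest centroid.
        labels = [0 if distance(v, centroids[0]) <= distance(v, centroids[1]) else 1
                  for v in vectors]
        # Move each centroid to the mean of its assigned chunks.
        for k in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels

chunks = [
    "moses pitched the tent and took his staff",
    "aaron stood by the tabernacle holding the rod",
    "the tent was set up and the staff raised",
    "they anointed the tabernacle and the rod budded",
]
labels = two_means([synonym_features(c) for c in chunks])
print(labels)  # [0, 1, 0, 1]
```

In this toy input, chunks that say "tent"/"staff" fall into one cluster and chunks that say "tabernacle"/"rod" fall into the other. On real text, synonym features are sparse, so a single clustering step like this would give only a rough, partial split.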

Original language: English
Title of host publication: ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics
Subtitle of host publication: Human Language Technologies
Pages: 1356-1364
Number of pages: 9
State: Published - 2011
Event: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011 - Portland, OR, United States
Duration: 19 Jun 2011 to 24 Jun 2011

Publication series

Name: ACL-HLT 2011 - Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies
Volume: 1

Conference

Conference: 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, ACL-HLT 2011
Country/Territory: United States
City: Portland, OR
Period: 19/06/11 to 24/06/11
