TalkSumm: A dataset and scalable annotation method for scientific paper summarization based on conference talks

Guy Lev, Michal Shmueli-Scheuer, Jonathan Herzig, Achiya Jerbi, David Konopnicki

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-review

Abstract

Currently, no large-scale training data is available for the task of scientific paper summarization. In this paper, we propose a novel method that automatically generates summaries for scientific papers by utilizing videos of talks at scientific conferences. We hypothesize that such talks constitute a coherent and concise description of the papers' content and can form the basis for good summaries. We collected 1716 papers and their corresponding videos, and created a dataset of paper summaries. A model trained on this dataset achieves performance similar to that of models trained on a dataset of manually created summaries. In addition, human experts validated the quality of our summaries.

Original language: English
Title of host publication: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Publisher: Association for Computational Linguistics (ACL)
Pages: 2125-2131
Number of pages: 7
ISBN (Electronic): 9781950737482
State: Published - 2020
Externally published: Yes
Event: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019 - Florence, Italy
Duration: 28 Jul 2019 to 2 Aug 2019

Publication series

Name: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference

Conference

Conference: 57th Annual Meeting of the Association for Computational Linguistics, ACL 2019
Country/Territory: Italy
City: Florence
Period: 28/07/19 to 2/08/19
