Abstract
This paper reviews methods for text authorship attribution and discusses the development of a software library of tools for automatic authorship attribution. The presentation focuses on two groups of tools: (1) methods for feature extraction and (2) methods for computing the distance between character strings using data compression algorithms.
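As an illustration of the two groups of tools named in the abstract, the following minimal sketch counts character n-grams (one common kind of feature extraction) and computes the normalized compression distance, NCD(x, y) = (C(xy) − min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. The function names `char_ngrams`, `compressed_size`, and `ncd` are hypothetical, and the choice of `zlib` as the compressor is an assumption; neither is taken from the library described in the paper.

```python
import zlib
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of length n (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed data (zlib is an assumed compressor)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings."""
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)  # compressed size of the concatenation
    return (cxy - min(cx, cy)) / max(cx, cy)
```

In an attribution setting, an unknown text would typically be assigned to the candidate author whose known texts give the smallest distance; that decision rule is a common convention and is not confirmed by the abstract.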
Original language | English |
---|---|
Pages (from-to) | 91-97 |
Number of pages | 7 |
Journal | Digital Presentation and Preservation of Cultural and Scientific Heritage |
Volume | 5 |
State | Published - 16 Feb 2017 |
Event | 5th International Conference on Digital Presentation and Preservation of Cultural and Scientific Heritage, DiPP 2015 - Veliko Tarnovo, Bulgaria; Duration: 28 Sep 2015 → 30 Sep 2015 |
Keywords
- Compression algorithms
- N-grams
- Natural frequency zoned word distribution
- Normalized compression distance
- Text authorship identification