TY - GEN
T1 - Transcription Alignment for Highly Fragmentary Historical Manuscripts
T2 - 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020
AU - Stökl Ben Ezra, Daniel
AU - Brown-Devost, Bronson
AU - Dershowitz, Nachum
AU - Pechorin, Alexey
AU - Kiessling, Benjamin
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/9
Y1 - 2020/9
N2 - Most of the Dead Sea Scrolls have now been digitally transcribed and imaged to very high standards. Our goal is to align the transcriptions with the text visible in the image, glyph by (often fragmentary) glyph. This involves several tasks, normally considered in isolation: (A) Baseline segmentation. (B) Line polygon extraction. (C) Automated transcription by handwritten character recognition, to aid in alignment. (D) Alignment of the Unicode characters in a line transcription with the characters in the image of that line. The task is frustrated by the degraded nature of the frequently very small and/or warped fragments with many broken letters, substantially different allographs, ligatures, and scribal idiosyncrasies. Furthermore, a great number of inconsistencies between current cataloguing systems for the data need to be resolved. For each task, we apply state-of-the-art machine-learning methods in addition to more traditional techniques, each presenting significant difficulties on account of the poor state of most fragments' preservation. We have built ground-truth datasets and have managed to achieve good results with well-preserved fragments by leveraging heavily augmented transfer learning from prior work with medieval manuscripts.
KW - historical manuscripts
KW - image segmentation
KW - transcription alignment
UR - http://www.scopus.com/inward/record.url?scp=85097797363&partnerID=8YFLogxK
U2 - 10.1109/ICFHR2020.2020.00072
DO - 10.1109/ICFHR2020.2020.00072
M3 - Conference contribution
AN - SCOPUS:85097797363
T3 - Proceedings of International Conference on Frontiers in Handwriting Recognition, ICFHR
SP - 361
EP - 366
BT - Proceedings - 2020 17th International Conference on Frontiers in Handwriting Recognition, ICFHR 2020
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 7 September 2020 through 10 September 2020
ER -