TY - JOUR
T1 - Deep learning for paleographic analysis of medieval Hebrew manuscripts
T2 - 2020 Twin Talks 2 and 3 Workshops at DHN 2020 and DH 2020: Understanding and Facilitating Collaboration in Digital Humanities 2020, TwinTalks 2020
AU - Shapira, Daria Vasyutinsky
AU - Rabaev, Irina
AU - Barakat, Berat Kurar
AU - Droby, Ahmad
AU - El-Sana, Jihad
N1 - Publisher Copyright: © 2020 CEUR-WS. All rights reserved.
PY - 2020
Y1 - 2020
N2 - Our research project is part of the Visual Media Lab, headed by Professor Jihad El-Sana, in the Department of Computer Science at Ben-Gurion University of the Negev, Israel. In this interdisciplinary project we apply deep learning models to classify script types and sub-types in medieval Hebrew manuscripts. The model incorporates the techniques and databases of Hebrew paleography and (with reservations) Hebrew codicology. The main theoretical basis of our project is the SfarData dataset, which includes the full codicological descriptions and paleographical definitions of all dated medieval Hebrew manuscripts up to the year 1540. In some exceptional cases, we go beyond this dataset framework. The major source of data, in terms of high-definition photos of manuscripts, is the Institute of Microfilmed Hebrew Manuscripts at the National Library of Israel, which has undertaken the mission of collecting copies of all extant Hebrew manuscripts from all over the world. We mostly use manuscripts from the National Library of Israel, the British Library, and the French National Library. This multidisciplinary project brings together researchers from both fields, Humanities and Computer Science. Currently, one professor, one lecturer, one post-doc, and two doctoral students are participating in the project. This is very exciting work in which there are no ready-made solutions for the various challenges. We collectively discuss ways to address these challenges and adapt our solution on the go. During the presentation, we will talk about how our project functions and how we strive to achieve a common result. The inevitable difficulties that we face during this collaboration include, inter alia, different research systems in Humanities and in Computer Science, lack of common terminology, different technical training, different requirements for publications and conferences, etc.
AB - Our research project is part of the Visual Media Lab, headed by Professor Jihad El-Sana, in the Department of Computer Science at Ben-Gurion University of the Negev, Israel. In this interdisciplinary project we apply deep learning models to classify script types and sub-types in medieval Hebrew manuscripts. The model incorporates the techniques and databases of Hebrew paleography and (with reservations) Hebrew codicology. The main theoretical basis of our project is the SfarData dataset, which includes the full codicological descriptions and paleographical definitions of all dated medieval Hebrew manuscripts up to the year 1540. In some exceptional cases, we go beyond this dataset framework. The major source of data, in terms of high-definition photos of manuscripts, is the Institute of Microfilmed Hebrew Manuscripts at the National Library of Israel, which has undertaken the mission of collecting copies of all extant Hebrew manuscripts from all over the world. We mostly use manuscripts from the National Library of Israel, the British Library, and the French National Library. This multidisciplinary project brings together researchers from both fields, Humanities and Computer Science. Currently, one professor, one lecturer, one post-doc, and two doctoral students are participating in the project. This is very exciting work in which there are no ready-made solutions for the various challenges. We collectively discuss ways to address these challenges and adapt our solution on the go. During the presentation, we will talk about how our project functions and how we strive to achieve a common result. The inevitable difficulties that we face during this collaboration include, inter alia, different research systems in Humanities and in Computer Science, lack of common terminology, different technical training, different requirements for publications and conferences, etc.
UR - http://www.scopus.com/inward/record.url?scp=85095965409&partnerID=8YFLogxK
M3 - Conference article
AN - SCOPUS:85095965409
SN - 1613-0073
VL - 2717
SP - 84
EP - 92
JO - CEUR Workshop Proceedings
JF - CEUR Workshop Proceedings
Y2 - 20 October 2020
ER -