TY - GEN
T1 - Image and text correction using language models
AU - Kissos, Ido
AU - Dershowitz, Nachum
N1 - Publisher Copyright: © 2017 IEEE
PY - 2017/10/13
Y1 - 2017/10/13
AB - We report on experiments with the use of learned classifiers for improving OCR accuracy and generating word-level correction candidates. The method involves the simultaneous application of several image- and text-correction models, followed by a performance evaluation that enables the selection of the best image-processing model for each document and the most likely corrections for each word. It relies on a training set comprising document images and their transcriptions, plus a domain corpus used to build the language model. It is applicable to any language with simple segmentation rules and performs well on morphologically rich languages. Experiments with an Arabic newspaper corpus show a 50% reduction in word error rate, with per-document image enhancement a major contributor.
UR - http://www.scopus.com/inward/record.url?scp=85082655471&partnerID=8YFLogxK
U2 - 10.1109/ASAR.2017.8067779
DO - 10.1109/ASAR.2017.8067779
M3 - Conference contribution
AN - SCOPUS:85082655471
T3 - 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
SP - 158
EP - 162
BT - 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 1st IEEE International Workshop on Arabic Script Analysis and Recognition, ASAR 2017
Y2 - 3 April 2017 through 5 April 2017
ER -