OCR Error Correction Using Character Correction and Feature-Based Word Classification

Ido Kissos, Nachum Dershowitz

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

This paper explores the use of a learned classifier for post-OCR text correction. Experiments with Arabic show that this approach, which integrates a weighted confusion matrix and a shallow language model, corrects the vast majority of segmentation and recognition errors, which are the most frequent error types in our dataset.
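The approach described above generates correction candidates from a weighted character confusion table, scores them with a shallow language model, and lets a classifier choose among them. It can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The confusion table, the toy corpus, the add-one-smoothed unigram model standing in for the shallow language model, and the hand-set logistic weights are all hypothetical. A real system would learn the weights from labeled OCR output. Latin-script tokens are used instead of Arabic for readability.

```python
import math
from collections import Counter

# Hypothetical weighted confusion table: OCR output substring -> (intended substring, weight).
# The weights are illustrative only and do not come from the paper.
CONFUSIONS = {"rn": ("m", 0.4), "1": ("l", 0.3), "0": ("o", 0.2)}

# Hypothetical logistic weights over (bias, log confusion weight, LM log-probability).
WEIGHTS = (0.0, 1.0, 2.0)


def candidates(token):
    """Return {candidate: confusion_weight}, including the unchanged token (weight 1.0)."""
    cands = {token: 1.0}
    for wrong, (right, weight) in CONFUSIONS.items():
        start = token.find(wrong)
        while start != -1:
            # Apply one substitution at this position only.
            cand = token[:start] + right + token[start + len(wrong):]
            cands[cand] = max(cands.get(cand, 0.0), weight)
            start = token.find(wrong, start + 1)
    return cands


class UnigramLM:
    """Shallow language model: add-one-smoothed unigram word probabilities."""

    def __init__(self, corpus_tokens):
        self.counts = Counter(corpus_tokens)
        self.total = sum(self.counts.values())
        self.vocab = len(self.counts) + 1  # +1 reserves mass for unseen words

    def logprob(self, word):
        return math.log((self.counts[word] + 1) / (self.total + self.vocab))


def score(conf_weight, lm_logprob):
    """Logistic classifier: probability that a candidate is the correct word."""
    bias, w_conf, w_lm = WEIGHTS
    z = bias + w_conf * math.log(conf_weight) + w_lm * lm_logprob
    return 1.0 / (1.0 + math.exp(-z))


def correct(token, lm):
    """Return the candidate the classifier scores highest."""
    cands = candidates(token)
    return max(cands, key=lambda c: score(cands[c], lm.logprob(c)))


lm = UnigramLM("the modern method is modern and old".split())
print(correct("rnodern", lm))  # modern
```

The classifier here only ranks candidates. A fuller version could also threshold the top score to decide whether a word should be changed at all.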

Original language: English
Title of host publication: Proceedings - 12th IAPR International Workshop on Document Analysis Systems, DAS 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 198-203
Number of pages: 6
ISBN (Electronic): 9781509017928
State: Published - 10 Jun 2016
Event: 12th IAPR International Workshop on Document Analysis Systems, DAS 2016 - Santorini, Greece
Duration: 11 Apr 2016 to 14 Apr 2016

Publication series

Name: Proceedings - 12th IAPR International Workshop on Document Analysis Systems, DAS 2016

Conference

Conference: 12th IAPR International Workshop on Document Analysis Systems, DAS 2016
Country/Territory: Greece
City: Santorini
Period: 11/04/16 to 14/04/16

Keywords

  • Classifier
  • OCR
  • information retrieval
  • spelling correction

