How to improve the accuracy of Optical Character Recognition (OCR)?

Optical Character Recognition (OCR) is the process of turning a scanned document into editable and machine-readable text. At Docparser, we automatically apply OCR whenever we detect that your uploaded document is a scanned image.

OCR accuracy is usually close to 100% when your document is a professional-quality scan. However, there are various situations in which OCR can yield less accurate results, including:

  • The font size of the document is very small
  • The scanned image contains scanning artifacts (pixel noise, black paper borders, ...)
  • The text is not surrounded by a white background
  • The scanned image has low contrast between black and white
  • The document was not well aligned during scanning and the image is skewed

Docparser comes with a variety of pre-processing filters to programmatically improve OCR accuracy. While those filters try to minimize the effects of most scanning issues, the most reliable way to improve OCR accuracy is to provide a high-quality scan.
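To give a rough idea of what a pre-processing filter can do, here is a minimal sketch of one common technique, Otsu binarization, which turns a low-contrast grayscale image into pure black and white. This is an illustration only and not Docparser's actual implementation; the function names `otsu_threshold` and `binarize` are hypothetical, and the image is assumed to be a flat list of gray levels from 0 (black) to 255 (white).

```python
def otsu_threshold(pixels):
    """Pick the gray level that best separates dark pixels from light ones.

    `pixels` is assumed to be a flat list of integers in the range 0-255.
    """
    # Count how often each gray level occurs.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_threshold, best_variance = 0, -1.0
    weight_dark, sum_dark = 0, 0
    for level in range(256):
        weight_dark += hist[level]
        if weight_dark == 0:
            continue  # no pixels at or below this level yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += level * hist[level]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance: larger means a cleaner dark/light split.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_variance, best_threshold = variance, level
    return best_threshold


def binarize(pixels):
    """Map every pixel to pure black (0) or pure white (255)."""
    threshold = otsu_threshold(pixels)
    return [0 if p <= threshold else 255 for p in pixels]


# A washed-out scan: "dark" text is mid-gray, "white" paper is light gray.
low_contrast = [60, 70, 80, 180, 190, 200]
print(binarize(low_contrast))
```

A filter like this can only recover contrast that is already present in the scan, which is why a clean original still matters most.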

A high-quality scan has the following attributes:

  • A resolution of 200 - 300 DPI
  • Well aligned, with no skew
  • High contrast between black and white
  • No scanning artifacts (pixel noise, black paper borders, ...)
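If you are unsure whether a scan meets the resolution recommendation above, you can estimate its DPI from the pixel dimensions of the image and the physical size of the paper. The sketch below is a hypothetical helper, not part of Docparser; it assumes a US Letter page (8.5 x 11 inches) by default and uses the lower bound of 200 DPI from the list above.

```python
def effective_dpi(width_px, height_px, width_in, height_in):
    """Estimate scan resolution as pixels per inch, using the weaker axis."""
    return min(width_px / width_in, height_px / height_in)


def meets_recommended_resolution(width_px, height_px,
                                 width_in=8.5, height_in=11.0,
                                 min_dpi=200):
    """Return True if the scan reaches at least `min_dpi`.

    The default page size (US Letter) is an assumption; pass the real
    paper dimensions in inches for other formats.
    """
    return effective_dpi(width_px, height_px, width_in, height_in) >= min_dpi


# A Letter page scanned to 1700 x 2200 pixels is about 200 DPI.
print(effective_dpi(1700, 2200, 8.5, 11.0))
# The same page at 850 x 1100 pixels is only about 100 DPI.
print(meets_recommended_resolution(850, 1100))
```

For an A4 page, pass `width_in=8.27` and `height_in=11.69` instead of the defaults.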

Still need help? Contact Us