© 2009 Accusoft Corporation
Improve OCR Accuracy on Color Documents
Use Image Detergent™ to Clean Up Color Document Images Prior to OCR for
This white paper shows that industry-standard practices for cleaning color document images can
be improved to produce higher OCR accuracy. Image Detergent™ from Accusoft improves
OCR accuracy by 5-10% more than a standard smoothing filter does. The paper leads the
reader through the testing behind that result.
Standard smoothing algorithms provide a good way to reduce background noise and improve
the appearance of scanned documents. However, they are also highly destructive to text and
other data commonly found on a document image. The Image Detergent filter within the
ScanFix® Xpress software development kit (SDK) from Accusoft works on a different principle
than other smoothing filters, and is intended specifically for use on color document images.
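The trade-off described above can be illustrated with a minimal sketch of a generic 3x3 mean (box) filter. This is not the Image Detergent algorithm, which the paper describes only as working on a different principle; the synthetic page, the deterministic pseudo-noise pattern, and the function names are all assumptions made for illustration.

```python
def box_smooth(img):
    """Apply a 3x3 mean filter to a 2-D list of gray values (edges replicated)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)  # clamp to image bounds
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / 9
    return out


def spread(values):
    """Difference between the largest and smallest value."""
    return max(values) - min(values)


SIZE = 12


def pixel(x, y):
    """Hypothetical page: dark 2-pixel stroke on a light, noisy background."""
    if x in (5, 6):
        return 20  # the "text" stroke
    # Deterministic pseudo-noise of +/-10 levels around a background of 200.
    return 200 + ((7 * x + 13 * y) % 21) - 10


page = [[pixel(x, y) for x in range(SIZE)] for y in range(SIZE)]
smoothed = box_smooth(page)

# Background variation, measured well away from the stroke (columns 0-2).
bg_before = spread([page[y][x] for y in range(SIZE) for x in range(3)])
bg_after = spread([smoothed[y][x] for y in range(SIZE) for x in range(3)])

# Edge contrast between the stroke and the background pixel next to it.
contrast_before = page[6][4] - page[6][5]
contrast_after = smoothed[6][4] - smoothed[6][5]

print("background spread:", bg_before, "->", round(bg_after, 1))
print("edge contrast:    ", contrast_before, "->", round(contrast_after, 1))
```

In this sketch the background becomes more uniform, but the contrast at the edge of the stroke drops as well, which is the kind of damage to text that a conventional smoothing filter can cause.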
This paper explores the impact of the Image Detergent smoothing filter on color document
images containing various text and background colors. Two quantities were measured:
OCR accuracy and the file size of the cleaned-up image. OCR accuracy was measurably
improved using Image Detergent, and file sizes for both lossless and lossy compression
methods were significantly smaller for the images after processing with Image Detergent.
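Why a cleaner background compresses better can be seen in a small sketch. It uses Python's zlib module as a stand-in for lossless compression in general; the paper does not say which file formats were tested, and the image dimensions, gray levels, and noise range here are assumptions chosen for illustration.

```python
import random
import zlib

random.seed(1)  # fixed seed so the synthetic noise is repeatable
WIDTH, HEIGHT = 200, 100

# Hypothetical single-channel background. The "noisy" version varies by up to
# +/-10 levels per pixel; the "clean" version is one uniform shade.
noisy = bytes(200 + random.randint(-10, 10) for _ in range(WIDTH * HEIGHT))
clean = bytes([200]) * (WIDTH * HEIGHT)

noisy_size = len(zlib.compress(noisy, 9))
clean_size = len(zlib.compress(clean, 9))

print(f"noisy background: {noisy_size} bytes compressed")
print(f"clean background: {clean_size} bytes compressed")
```

Both buffers hold the same number of pixels, but the uniform one contains far less information for the compressor to encode, so its output is much smaller.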
Image Detergent is only available within ScanFix Xpress from Accusoft. A trial version can be
downloaded here: http://www.accusoft.com/scanfix.htm.
Figure 1: Before (top) and after (bottom) clips from an image cleaned with Image Detergent
Noise is a common problem in all areas of digital signal processing, and the realm of document
imaging is no exception. Noise in images typically shows up as specks or variations in color
where none is desirable. An example of this can be seen above; the pink background in the top
half of the image consists of many different shades of pink. This variation contain