How OCR Technology Really Works
June 16, 2015
James Daughtry, Software Engineer and Support Supervisor
OCR is a growing topic of discussion in the world of business. What exactly is it? Does it have the ability to improve the efficiency of my business? How does it work?
OCR, Simplified & Defined
OCR stands for Optical Character Recognition. As you read these words, your eyes and brain are actually performing OCR. Your eyes pick out the contrast between the black and white that make up the letters of this article, and your brain locates matching characters from your memory to determine what the patterns represent. Since the human brain is an excellent OCR engine, it can easily analyze words, combinations of words in context, and grammar to find meaning. Computers now perform a very similar process.
A Basic Overview
Computers see digitized images as a huge collection of pixels. An OCR algorithm analyzes the pixels in a given area and tries to match them to the pattern of a known character. This is complicated by the fact that the same character can appear in many wildly different fonts, not to mention the limitless variations of handwriting. Early OCR sidestepped the problem with a standardized font called OCR-A, designed specifically for pattern recognition. The OCR algorithm would find a connected group of black pixels, draw a box around it, then compare the contents of that box with the characters stored in memory. When a reasonable match was found, the matching character would be produced as text. The same idea survives in the special-purpose font on the MICR line of checks (E-13B in the United States), but given the variety of fonts and handwriting in everyday documents, a single fixed font is too limited for general use in the modern world. OCR engines still use the pattern recognition method for machine fonts, though not exclusively.
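The box-and-compare approach described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the "image" is text art where # is a black pixel, the three 3x5 templates are made up for the example, and the 0.8 minimum score is an arbitrary choice. Real engines work on scanned bitmaps and normalize glyph size before comparing.

```python
from collections import deque

# Hypothetical 3x5 templates; a real engine stores far larger glyph bitmaps.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
}

def find_boxes(image):
    """Return (top, left, bottom, right) boxes around connected black regions, left to right."""
    rows, cols = len(image), len(image[0])
    seen, boxes = set(), []
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != "#" or (r, c) in seen:
                continue
            # Flood fill from this pixel, growing the box as new pixels are reached.
            queue = deque([(r, c)])
            seen.add((r, c))
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == "#" and (ny, nx) not in seen):
                            seen.add((ny, nx))
                            queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return sorted(boxes, key=lambda box: box[1])

def match(glyph, min_score=0.8):
    """Return (character, score) for the best template, or ('?', score) if none is close enough."""
    best_char, best_score = "?", 0.0
    for char, template in TEMPLATES.items():
        if len(template) != len(glyph) or len(template[0]) != len(glyph[0]):
            continue  # this sketch only compares glyphs of identical size
        total = len(template) * len(template[0])
        same = sum(t == g for trow, grow in zip(template, glyph)
                   for t, g in zip(trow, grow))
        if same / total > best_score:
            best_char, best_score = char, same / total
    return (best_char, best_score) if best_score >= min_score else ("?", best_score)

def recognize(image):
    """Box each connected region, crop it, and emit the best-matching character."""
    text = ""
    for top, left, bottom, right in find_boxes(image):
        glyph = [row[left:right + 1] for row in image[top:bottom + 1]]
        text += match(glyph)[0]
    return text

page = [
    ".............",
    ".#...###.###.",
    ".#....#...#..",
    ".#....#...#..",
    ".#....#...#..",
    ".###.###..#..",
    ".............",
]
print(recognize(page))  # prints LIT
```

Because the score is simply the fraction of agreeing pixels, a glyph with one stray pixel can still clear the threshold, while a glyph in an unfamiliar font would not. That weakness is what motivates the feature-based method.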
Another method of OCR in common use today has its own name: Intelligent Character Recognition (ICR). ICR detects the features of a character rather than the character as a whole, and if the collective features reach a certain percentage of confidence, then the character is determined to be matched. For example, the capital letter A has approximately three features: ‘/’ plus ‘-’ plus ‘\’ equals ‘A’. Regardless of the font, and barring exceptionally poor handwriting, this approach will locate most capital A characters.
For a high-level view of how OCR works, let’s take a moment to look at the factors that affect a computer’s OCR results:
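As a rough sketch of the feature idea, assume some earlier step (not shown, and hypothetical here) has already detected strokes in a glyph and assigned each one a confidence between 0 and 1. The feature table, the scoring rule, and the 0.75 threshold below are all illustrative assumptions rather than how any particular ICR engine works.

```python
# Hypothetical table of the strokes that make up each character.
CHARACTER_FEATURES = {
    "A": ["/", "-", "\\"],
    "H": ["|left", "-", "|right"],
    "V": ["\\", "/"],
}

def score(detected, required):
    """Combine per-stroke confidences into one confidence for a candidate character."""
    matched = sum(detected.get(feature, 0.0) for feature in required)
    # Dividing by the larger count penalizes both missing and unexpected strokes,
    # so an 'A' is not mistaken for a 'V' that happens to have an extra bar.
    return matched / max(len(required), len(detected))

def classify(detected, threshold=0.75):
    """Return (character, confidence), or (None, confidence) when nothing is convincing."""
    best_char, best_conf = None, 0.0
    for char, required in CHARACTER_FEATURES.items():
        conf = score(detected, required)
        if conf > best_conf:
            best_char, best_conf = char, conf
    if best_conf < threshold:
        return None, best_conf
    return best_char, best_conf

# Clean print: all three strokes of an 'A' found with good confidence.
print(classify({"/": 0.9, "-": 0.7, "\\": 0.85}))

# Exceptionally poor handwriting: the same strokes, but barely detected.
print(classify({"/": 0.5, "-": 0.3, "\\": 0.4}))
```

The first call settles on A because the three strokes together clear the threshold; the second returns no character, which is the case where an engine would typically flag the result for human review.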
- Image quality: While an image may look fine to human eyes, the computer can perceive characters as blocky and broken up. This can negatively affect both whole pattern recognition and feature recognition. Any extraneous blobs, stray marks, or variance in orientation can also hurt recognition quality.
- DPI: DPI stands for Dots Per Inch. It’s a bit of a misnomer; a more accurate term would be Pixels Per Inch. The more pixels there are, the better OCR will be, though this also increases the storage size of an image. It is recommended to have at least 200×200 DPI for good OCR results.
- Color: OCR works best on black and white images. When images are in color, an OCR engine will usually attempt to turn them into black and white first. This process is called thresholding. Convenient as this may seem, the result of this thresholding process may not be as good as if the image were black and white to begin with.
- Character Size: The smaller a character is, the more difficult it is to read, much like how our eyes work. This is because there are fewer pixels to act as error correction.
- Kerning: Kerning is a typography term referring to the distance between characters. It’s a balancing game: too close together and the characters may be recognized as a single entity; too far apart and the characters may not be recognized as part of the same word.
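Two of the factors above, color and DPI, lend themselves to a short sketch. The code below assumes an image is a list of rows of (red, green, blue) tuples and uses a fixed threshold of 128, which is an arbitrary illustrative choice; real engines often pick the threshold adaptively. The function names are made up for this example.

```python
def to_black_and_white(pixels, threshold=128):
    """Threshold rows of (r, g, b) tuples into rows of 0 (black) and 1 (white)."""
    result = []
    for row in pixels:
        out = []
        for r, g, b in row:
            # ITU-R BT.601 luma weights: a common way to estimate brightness from RGB.
            luma = 0.299 * r + 0.587 * g + 0.114 * b
            out.append(1 if luma >= threshold else 0)
        result.append(out)
    return result

def type_height_px(point_size, dpi):
    """Approximate pixel height of type at a given point size (1 point = 1/72 inch)."""
    return point_size / 72 * dpi

def page_pixels(width_in, height_in, dpi):
    """Total pixel count of a page scanned at the given resolution."""
    return round(width_in * dpi) * round(height_in * dpi)

# White, near-black, dark red, and pale yellow pixels.
color_image = [
    [(255, 255, 255), (20, 20, 20)],
    [(200, 30, 30), (250, 240, 120)],
]
print(to_black_and_white(color_image))

# 10-point type has far more pixels to work with at 200 DPI than at 72 DPI.
print(type_height_px(10, 72), type_height_px(10, 200))

# Doubling the DPI of a letter-size page quadruples the pixel count.
print(page_pixels(8.5, 11, 200), page_pixels(8.5, 11, 400))
```

The thresholding step shows why color can hurt: every pixel is forced to one side of a single cutoff, so colored text or a tinted background that lands on the wrong side is lost. The DPI helpers show the tradeoff between small characters having enough pixels and the image growing in size.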
The Benefits of OCR
Technology keeps expanding what is possible; with OCR, we can now digitize documents that previously could only exist on paper. OCR lets data be extracted electronically so that it can be edited, searched, displayed, stored, and retrieved. In addition, software such as KTM and Kofax Capture uses OCR to recognize elements of a document (something you or I could do by hand) in an instant, greatly speeding up the process. This is certainly a grand improvement to the efficiency of any business!
OCR is an intensive process for computers. They must analyze individual pixels and patterns while matching the result with character heuristics stored in memory. They must perform image processing to ensure that the image is recognizable. They must compare and contrast candidate characters to determine confidence.
This process can take both time and RAM, and even the very best OCR engines are far from perfect. However, they can be extremely useful in the business world. Having the ability to digitize and edit material that was previously untouchable allows for greatly improved productivity. DoxTek is not based solely on OCR technology, but rather on providing solutions for businesses. Because OCR is never guaranteed to produce 100% accurate results, DoxTek does not rely on OCR alone. Instead, we use the benefits it provides to deliver greater solutions and improved business processes to you.