At a Glance: OCR and Invoice Processing

What is OCR?

OCR (optical character recognition) is the automatic recognition of text in images. What does that have to do with invoices? After all, invoices are text documents, not pictures. True, but not entirely: when a paper document is scanned, it is saved as an image.

The task of an OCR program is to generate computer-interpretable characters from an image file. The extracted text can then be passed to other software for further processing.

In other words, OCR software converts pictorial information into computer-interpretable characters. Historically, this was done manually by a worker sitting in front of a screen and typing in the data. OCR takes this tedious task out of the employee’s hands, freeing up their time for more strategic, value-adding work.

How Does OCR Work?

  1. Layout analysis
    First, the OCR software performs a layout analysis. The text is located and then subdivided into paragraphs, sentences, words and characters, and the program records where each text element is located.

  2. Character recognition
    Next comes character recognition, where the software must correctly identify the characters it has found. It does this with the aid of various pattern and feature recognition methods, which are combined for an optimal result. In feature recognition, for example, an E is broken down into a vertical line and three horizontal bars, regardless of which font is used. In pattern recognition, the software compares the pixels of each character it has found with the characters in its own database. In its strictest form, however, pattern recognition only recognises a character when it gets an exact match, which is why the two methods are combined.

  3. Rebuilding the characters into words/sentences
    Finally, the OCR program rebuilds the characters into words and sentences. To get the best possible result, it checks the recognised text against built-in dictionaries and may also apply grammatical rules.
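
As a rough illustration of steps 2 and 3, here is a minimal Python sketch. It is not how any particular OCR product works: the tiny 3x5 glyph database, the function names and the thresholds are all assumptions made up for this example. Pattern recognition is reduced to counting matching pixels, and the dictionary step uses the standard library's difflib.

```python
from difflib import get_close_matches

# Hypothetical 3x5 pixel patterns ("#" = ink). A real sample database is far larger.
GLYPHS = {
    "E": ("###", "#..", "###", "#..", "###"),
    "F": ("###", "#..", "###", "#..", "#.."),
    "L": ("#..", "#..", "#..", "#..", "###"),
}


def match_score(a, b):
    """Fraction of pixels that agree between two equally sized glyphs."""
    pixels_a = "".join(a)
    pixels_b = "".join(b)
    same = sum(1 for x, y in zip(pixels_a, pixels_b) if x == y)
    return same / len(pixels_a)


def recognise_char(glyph, min_score=1.0):
    """Return the best-matching character, or None if no pattern scores high enough.

    min_score=1.0 means an exact match is required (an assumption for this sketch).
    """
    best = max(GLYPHS, key=lambda ch: match_score(glyph, GLYPHS[ch]))
    return best if match_score(glyph, GLYPHS[best]) >= min_score else None


def correct_word(word, dictionary):
    """Replace a recognised word with its closest dictionary entry, if one is close."""
    hits = get_close_matches(word.lower(), dictionary, n=1, cutoff=0.8)
    return hits[0] if hits else word
```

With the default min_score of 1.0, an E with a single flipped pixel is rejected, while a lower threshold such as 0.9 accepts it. The dictionary step then catches typical misreadings: correct_word("lnvoice", ["invoice", "total", "amount"]) returns "invoice".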

What Influences the Quality of OCR?

The quality of your OCR results depends largely on the following factors:

  • Scope and quality of the sample database

  • Scope and quality of the dictionaries

  • Quality of algorithms for error correction

  • Color, contrast, layout and font of the original document

  • Resolution and quality of the image file


The Biggest Misconceptions about OCR and E-Invoices

When it comes to invoice processing, there are three major misconceptions about OCR and how it relates to electronic invoicing. Luckily, each of them is easy to clear up.

Misconception #1 – They’re the same thing.

OCR is not equivalent to electronic invoice processing. It is rather a precursor to it. As described earlier, OCR is a machine-based process for collecting data and, in the case of invoices, transferring it to the ERP system so that it can be electronically processed from there. It does not replace the electronic processing, but only the upstream manual process of invoice entry.

Misconception #2 – You need OCR for e-invoicing.

Not quite. Invoices can be processed electronically even without OCR. On the one hand, you can capture paper invoices manually (which is laborious and error-prone); on the other, you can receive invoices directly as an electronic record (i.e. as a real electronic invoice).

Misconception #3 – There goes half your human workforce.

Not at all! OCR is a complex process, and even when the technology is extremely mature, manual editing won’t be completely eliminated. Many companies believe that this editing can be done by non-specialist temporary workers, but since OCR programs learn gradually, employees’ input is important for determining whether documents are correctly recorded and processed. It is therefore essential to have employees with the appropriate know-how check the output of OCR technologies.

Invoice Processing with OCR: Pros and Cons

Adopting new software inevitably brings added work and expense during implementation. Therefore, a lot of thought should go into whether the investment is worthwhile. As we’ve seen, OCR is not a requirement for electronic invoice processing, but it does offer many advantages.

If a company receives a lot of paper invoices, OCR capture can be a better alternative to manual entry as time savings will quickly outweigh the investment.

In addition, electronic invoice processing can be used with OCR regardless of whether suppliers send their invoices electronically or not. Service providers such as Basware offer their customers activation campaigns to convince suppliers to switch to electronic invoicing and find the right solution. By using OCR, however, you can immediately start with electronic invoice processing, even if initially many of your suppliers send paper invoices and are slow to switch to electronic.

But there are also arguments against OCR. Data quality is not as good as that of invoices received electronically from the outset. This means that a decent amount of manual reworking is required, especially in the early days when the program is still learning. In addition, the layout of the documents has to be standardised. Because OCR programs remember where data is located on a document, they must re-learn each time something is changed in the layout, such as placing the header data in a different location. Lack of standardisation slows down text recognition and increases the susceptibility to errors.
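
To see why a layout change hurts, consider a minimal Python sketch of position-based field capture. Everything in it is hypothetical: the Word type, the TEMPLATE zones and the field names are assumptions for illustration, not the method of any specific OCR product. The template stores where each field was found on one supplier's invoices, and extraction simply collects the words inside those zones.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, as a fraction of page width
    y: float  # top edge, as a fraction of page height


# Hypothetical template learned for one supplier:
# field name -> (x_min, y_min, x_max, y_max)
TEMPLATE = {
    "invoice_number": (0.60, 0.05, 0.95, 0.10),
    "total": (0.60, 0.85, 0.95, 0.95),
}


def extract_fields(words, template):
    """Collect the words that fall inside each field's zone, in reading order."""
    fields = {}
    for name, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: (w.y, w.x))
        fields[name] = " ".join(w.text for w in inside)
    return fields
```

If the supplier moves the invoice number from the top right to the top left, the same template returns an empty string for that field, and the zone has to be learned again. That is the re-learning effort described above.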

For High Quality Data, Seek Genuine Electronic Invoices

Experience has shown that a small number of paper invoices usually remains, even if the majority of your vendors send electronic invoices. Since it's not worth maintaining your own expensive scanning and OCR infrastructure for a handful of documents, make sure that the solution provider of your choice also offers Scan & OCR services. This means that no matter what form of invoice your suppliers send, only electronic invoice data arrives in your invoice receipt process.


Basware offers two ways to implement this.

1. Basware takes over the scanning as well as the text recognition and validation of your data.
2. You scan yourself and then transfer the documents to Basware for further processing.

You might also like…

  • Check out this blog on why OCR is no longer the answer, but e-Invoicing, SmartPDF and Smart Coding are.

  • In our factsheet, learn about the Basware innovation created to change emailed PDF invoices into readable electronic invoices (e-invoices).

Ready to Learn More?

Find out more about Basware's Scan & Capture. If you are interested, contact us with any questions you have. We’re here to help!

Director of Global Business Development at Basware