How does Invoice Parsing work with Docparser?
Our invoice parsing rule presets let you create a fully working PDF invoice Document Parser in a couple of minutes. Our proprietary extraction algorithms use a mixture of pattern-based and keyword-based recognition and work out of the box.
How do I create an invoice document parser?
The first step is to create a new Document Parser and choose 'Invoices' as the document type. You will then be prompted to upload some sample invoices. Once done, click on 'Continue to Parser'.
At this point, Docparser has automatically created the following parsing rules for you:
- Invoice Date
- Invoice Number
- Invoice Totals
You can then navigate to 'Documents' and flip through the sample invoices you uploaded. If the fields listed above are the only data points you want to extract, the setup is already complete. If your parsing rules are not returning the correct data, please refer to this article.
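Besides flipping through the documents by hand, you can check the results for your sample invoices programmatically. The following is a minimal Python sketch, assuming the parsed results are available as a JSON list with one object per invoice. The keys `invoice_date`, `invoice_number`, and `totals` are hypothetical stand-ins for the three preset rules; the actual key names depend on your parser.

```python
import json

# Hypothetical keys standing in for the three preset parsing rules.
REQUIRED_FIELDS = ("invoice_date", "invoice_number", "totals")


def find_incomplete(records, required=REQUIRED_FIELDS):
    """Return (index, missing_fields) pairs for records lacking a required value."""
    problems = []
    for index, record in enumerate(records):
        # Treat absent keys and empty values alike as "not extracted".
        missing = [name for name in required if record.get(name) in (None, "", [], {})]
        if missing:
            problems.append((index, missing))
    return problems


# Illustrative sample only, not real Docparser output.
sample_export = json.loads("""
[
  {"invoice_date": "2023-04-01", "invoice_number": "INV-1001", "totals": {"total": "118.00"}},
  {"invoice_date": "", "invoice_number": "INV-1002", "totals": {"total": "54.20"}}
]
""")

for index, missing in find_incomplete(sample_export):
    print(f"record {index} is missing: {', '.join(missing)}")
```

Any record reported here is a candidate for adjusting the corresponding parsing rule.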
You can now upload more invoices and download your parsed data in Excel, CSV, XML, or JSON format. You can also create a webhook integration that copies your parsed data to other software applications in real time.
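On the receiving side, a webhook integration needs an endpoint that accepts the parsed data. The following is a minimal Python sketch using only the standard library, assuming the webhook delivers one parsed invoice per POST request as a JSON object. The payload keys (`invoice_number`, `invoice_date`, `totals`), the function names, and the port are assumptions made for illustration, not a documented Docparser payload format.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_invoice(payload):
    """Pick the fields of interest from a parsed-invoice payload (key names are assumed)."""
    return {
        "number": payload.get("invoice_number"),
        "date": payload.get("invoice_date"),
        "totals": payload.get("totals"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the sender.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            payload = json.loads(body)
        except ValueError:
            payload = None
        if not isinstance(payload, dict):
            # Reject anything that is not a JSON object.
            self.send_response(400)
            self.end_headers()
            return
        # Replace this print with a call into your own application.
        print(summarize_invoice(payload))
        self.send_response(200)
        self.end_headers()


def run_server(port=8000):
    """Start a blocking HTTP server that listens for webhook calls."""
    HTTPServer(("", port), WebhookHandler).serve_forever()
```

Calling `run_server()` starts the listener. For production use you would typically put this behind HTTPS and verify that requests really originate from your webhook configuration.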
Adding Custom Parsing Rules
If you want to extract more data fields from your invoices (vendor, line-items, ...), you can add custom parsing rules. Go to 'Parsing Rules' and click on 'Create Parsing Rule'. You'll see different parsing rule presets suitable for invoice parsing.
In addition to the presets dedicated to invoice parsing, you can add a 'Generic Parsing Rule', which lets you extract any kind of data from invoices. Please see the articles linked below for more details on creating custom rules.
Parsing Line-Items From Known Invoice Layouts
Docparser offers several table data extraction tools that make it easy to parse invoice line-items. Setting up a line-item parsing rule usually takes only a couple of minutes thanks to our point & click table parsing tool. Depending on the complexity of your invoice layout, you can also apply advanced table parsing techniques to get all the data you need.
Parsing Line-Items From Unknown Invoice Layouts
If you want to extract line-items from invoices whose layout you don't know in advance, Docparser might not be a great fit for you. Extracting line-items and other specific data points requires a dedicated set of parsing rules tailored to the specific layout and structure of the document. In practice, you need to create one set of parsing rules for each invoice layout (e.g. for each vendor) if you want to extract data that is not covered by the preset parsing rules (Invoice Number, Invoice Date, ...) mentioned above.
One common approach for this scenario is to create parsing rules for the most common invoice layouts only. In our experience, a small set of invoice layouts can easily cover 80% of the invoice volume. Our biggest customers manage parsing rules for 750+ invoice layouts with Docparser.
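To decide which layouts are worth a dedicated set of parsing rules, you can look at how your invoice volume is distributed across vendors. The following is a minimal Python sketch, assuming you have one vendor label per received invoice. The function name `layouts_for_coverage`, the vendor names, and the counts are all made up for illustration.

```python
from collections import Counter


def layouts_for_coverage(layout_per_invoice, target=0.8):
    """Return the most frequent layouts whose combined share reaches `target`."""
    counts = Counter(layout_per_invoice)
    total = sum(counts.values())
    selected, covered = [], 0
    # Walk through the layouts from most to least frequent.
    for layout, count in counts.most_common():
        if covered / total >= target:
            break
        selected.append(layout)
        covered += count
    return selected


# Illustrative data: one vendor label per received invoice.
history = (
    ["acme"] * 50 + ["globex"] * 30 + ["initech"] * 12 + ["umbrella"] * 5 + ["hooli"] * 3
)
print(layouts_for_coverage(history))
```

With this sample data, two of the five layouts already account for 80% of the invoices, so those two would be the first candidates for dedicated parsing rules.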