How to process multiple layout variations with one single document parser?
The easiest way to get started with Docparser is to create one Document Parser for each type of document you want to parse. However, this method has one major drawback: you need to identify the Document Layout before uploading the document, and then send the document directly to the Document Parser that matches the layout.
In some cases, identifying the document layout before importing the document might not be possible.
This is where our advanced feature "Process Multiple Document Layouts" comes into play. This feature allows you to identify Document Layouts and dynamically route each document to a matching set of parsing rules.
How does the "Layout Model" feature work?
In a nutshell, you create multiple sets of parsing rules (Layout Models) within one single Document Parser. Each Layout Model fits exactly one specific Document Layout. For each Layout Model you also create routing rules, which allow Docparser to identify the Document Layout and apply the matching set of parsing rules.
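To make the idea concrete, here is a minimal conceptual sketch in Python. It is not Docparser's internal implementation or API: the `LayoutModel` class, the `route` function, the field names and the "first matching model wins" behavior are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


# Hypothetical type for illustration only; not Docparser's data model.
@dataclass
class LayoutModel:
    name: str
    # Routing rule: returns True if this model fits the document.
    matches: Callable[[dict], bool]
    # Names of the parsing rules that belong to this model.
    parsing_rules: List[str] = field(default_factory=list)


def route(document: dict, models: List[LayoutModel]) -> Optional[LayoutModel]:
    """Return the first Layout Model whose routing rule matches, else None.

    "First match wins" is an assumption of this sketch.
    """
    for model in models:
        if model.matches(document):
            return model
    return None


models = [
    LayoutModel("Vendor A", lambda d: "Vendor A" in d.get("tags", []),
                ["invoice_number", "total"]),
    LayoutModel("Vendor B", lambda d: "Vendor B" in d.get("tags", []),
                ["invoice_number", "total"]),
]

doc = {"file_name": "inv-001.pdf", "tags": ["Vendor B"]}
selected = route(doc, models)
print(selected.name if selected else "no matching Layout Model")
```

Each model bundles its routing rule with its own parsing rules, which mirrors the one-parser, many-layouts structure described above.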
How can I set this up?
Here is a step-by-step guide on how to set up Docparser to parse multiple document layouts with one single Document Parser. In our example, we will process invoices from different vendors with one single Document Parser.
1/ Activate the advanced feature "Process Multiple Document Layouts"
As a first step, you need to activate the feature "Process Multiple Document Layouts", which you'll find in the "Advanced" tab of your Document Parser settings.
2/ Upload enough sample documents to cover all layout variations
To make the setup as easy as possible, it is important to have enough sample documents in your Document Parser. We recommend importing at least two sample documents for each layout variation. Having multiple samples for each Document Layout is necessary to define document routing and data parsing rules that work reliably.
3/ Create default parsing rules to identify the Document Layout
Whenever a new document is imported, a first round of data extraction is performed. The results of this first data extraction run can then be used to identify the layout of the document. In other words, the decision about which Layout Model to apply is based on the results of the default parsing rules.
Creating default parsing rules that you can use to identify your document layouts requires some creativity and depends heavily on your use case. In many cases, a simple "Tag Document" parsing rule does a great job of identifying the document type. This template lets you define a list of keywords (e.g. vendor names) which can be used to identify the layout of a document. In our invoice example, another way of identifying the invoice layout would be to create a parsing rule which returns all Tax IDs.
Please note that this step is optional if the type of document can be identified by looking at things like the file name, the number of pages or a remote ID. If this is not the case, you need to create at least one default parsing rule to identify your document type.
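As a rough illustration of keyword-based tagging, the following Python sketch returns the keywords from a list that occur in a document's text. It only illustrates the concept and is not how the "Tag Document" rule is implemented: the function name `tag_document`, the case-insensitive substring matching and the vendor names are assumptions.

```python
from typing import List


def tag_document(text: str, keywords: List[str]) -> List[str]:
    """Return the keywords that appear in the text.

    Matching is a case-insensitive substring search, which is an
    assumption of this sketch.
    """
    lowered = text.lower()
    return [kw for kw in keywords if kw.lower() in lowered]


# Hypothetical vendor names used as layout-identifying keywords.
vendors = ["Acme Corp", "Globex"]
tags = tag_document("INVOICE\nFrom: ACME CORP\nTotal due: 120.00", vendors)
print(tags)
```

The resulting list of tags is the kind of data point a routing rule can then be based on.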
4/ Create a new model for each layout variation
Navigate to the list of documents and select two files which have the same layout. Under "Perform Action" you'll find "Create a new Layout Model" for the selected files.
You are now prompted to name your new Layout Model. Choose a name which lets you easily identify the Layout Model later on.
Furthermore, you are prompted to define some routing rules for this Layout Model. As mentioned above, the decision about which Layout Model is used to parse a document is based on rules that you need to define.
Docparser tries to find routing rules automatically for you by looking at all the data available for your documents at this point. This is why you should select at least two documents of the same type, so that Docparser can find similarities between them.
The data points which are available for creating routing rules are the metadata (file name, page count, ...) as well as the data extracted by the Default Parsing Rules you created. If you want to base the routing decision on specific keywords inside your document, you first need to create a default parsing rule which extracts the keywords (see "Tag Document" above).
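The reason for selecting two samples can be illustrated with a small Python sketch: data points that are identical in both samples are good candidates for routing rules. This is only a simplified illustration under stated assumptions, not Docparser's actual algorithm. The function names, the field names and the exact-equality comparison are hypothetical.

```python
def common_data_points(sample_a: dict, sample_b: dict) -> dict:
    """Return the data points that have the same value in both samples."""
    return {
        key: value
        for key, value in sample_a.items()
        if key in sample_b and sample_b[key] == value
    }


def matches(document: dict, routing_rules: dict) -> bool:
    """Check whether a document satisfies every candidate routing rule."""
    return all(document.get(key) == value for key, value in routing_rules.items())


# Metadata plus the result of a hypothetical default parsing rule "vendor_tag".
sample_a = {"file_name": "acme-0012.pdf", "page_count": 2, "vendor_tag": "Acme Corp"}
sample_b = {"file_name": "acme-0047.pdf", "page_count": 2, "vendor_tag": "Acme Corp"}

candidate_rules = common_data_points(sample_a, sample_b)
print(candidate_rules)

new_doc = {"file_name": "acme-0101.pdf", "page_count": 2, "vendor_tag": "Acme Corp"}
print(matches(new_doc, candidate_rules))
```

With only one sample there is nothing to compare, so every data point would look like a candidate, including ones such as the file name that differ from document to document.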
5/ Create parsing rules for each Layout Model
Now that you have created a Layout Model for each of your document types, it's finally time to create Parsing Rules for each of them. Creating Parsing Rules for your Layout Models works in exactly the same way as creating Parsing Rules for single-layout parsers.
Once you are done with all the steps above, your Document Parser is capable of identifying the layout of newly imported documents and applying a matching set of parsing rules. All further functionalities of your Document Parser (Download Data, Integrations, ...) will behave in the same way they do in single-layout mode.
Please note: Docparser will merge the parsed data of all parsing rules with the same name. If you download the parsed data of multiple documents and you want to have a uniform set of column names, please make sure that the parsing rules of all models have the same names and are of the same parsing rule type. The same logic applies when using our webhook integrations, such as Google Sheets or Custom Webhooks.
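The effect of rule names on the exported columns can be illustrated with a short Python sketch. It does not reproduce Docparser's export code: the `to_csv` helper, the rule names and the values are assumptions, and the only point is to show that data ends up in the same column when the rule names are identical.

```python
import csv
import io
from typing import Dict, List


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Write rows to CSV, using one column per distinct rule name."""
    columns: List[str] = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Both Layout Models use the same rule names: one uniform set of columns.
consistent = [
    {"invoice_number": "A-1", "total": "10.00"},
    {"invoice_number": "B-7", "total": "25.50"},
]
print(to_csv(consistent))

# The second model names its rule differently: an extra, half-empty column.
inconsistent = [
    {"invoice_number": "A-1", "total": "10.00"},
    {"invoice_no": "B-7", "total": "25.50"},
]
print(to_csv(inconsistent))
```

In the second export the invoice numbers are split across two columns, which is the situation that consistent rule names across all Layout Models avoid.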