What is a parsing rule and how do I create one?
A parsing rule is a set of instructions that tells our algorithm what kind of data you want to extract from your documents. Typically, you will have one parsing rule for each data field inside your document.
Parsing rules are at the heart of Docparser, and our mission is to make creating them as simple as possible for our users. If you get stuck while setting up your parsing rules, please do not hesitate to contact our support staff. We will be more than happy to help you with the setup.
The following screencast video explains how to create your first parsing rules. Further below, we go into more detail, and you will find additional articles describing specific data extraction scenarios.
What are the general steps of setting up a parsing rule?
We typically distinguish between two types of parsing rules: parsing rules that return a single data field (e.g. Date, Phone Number, Postal Address) and parsing rules that return tabular data (Table Rows, Line-Items, Repeating Text Blocks, ...).
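To make the difference between the two types concrete, here is a minimal Python sketch of what the results of each might look like. The field names, values, and dictionary layout are assumptions made for illustration only and are not Docparser's actual output format.

```python
# Hypothetical result shapes, for illustration only.
# They do not reflect Docparser's actual output format.

# A single-field parsing rule yields one value per document.
single_field_result = {"invoice_date": "2023-04-01"}

# A tabular parsing rule yields a list of rows, each with the same columns.
tabular_result = {
    "line_items": [
        {"description": "Widget", "quantity": "2", "price": "9.99"},
        {"description": "Gadget", "quantity": "1", "price": "24.50"},
    ]
}


def is_tabular(value):
    """Return True if a parsed value is a list of rows rather than a single value."""
    return isinstance(value, list)
```

Any code that consumes parsed data will usually need to handle these two cases differently, for example by looping over rows for tabular fields.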
Both types of parsing rules are set up in a similar way. You can create a parsing rule by navigating to the 'Parsing Rules' section inside your Document Parser and clicking on 'Create Parsing Rule'. When setting up a parsing rule, you will go through the following steps:
1/ Select a preset matching the type of data you want to extract
Docparser offers a variety of parsing rule templates. You will be presented with presets matching your use-case based on the type of document you chose when creating your Document Parser (Invoice, Purchase Order, Delivery Order, Bank Statement, ...).
You can also choose one of our 'Generic' and 'Advanced' parsing rule presets, which allow you to extract any kind of data from your PDF documents. Please refer to the linked articles below for more details on the different types of presets.
2/ Specify the area where you expect the data to reside inside the documents
In the majority of cases, the data fields of transactional business documents (Invoices, Purchase Orders, Delivery Orders, ...) are located in a specific position inside the document. In those cases, Docparser provides an easy-to-use point-and-click interface which you can use to define the location of your data field inside your document.
While most parsing rules require you to define the position of the data inside your document, we also provide parsing rules which do not need any location-based instructions. Notably, our Invoice and Purchase Order filters are capable of extracting the right data without knowing the layout of a document.
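The idea behind location-based extraction can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not how Docparser works internally: the `Word` and `Zone` classes, the `extract_from_zone` function, and the use of page-relative coordinates between 0 and 1 are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge as a fraction of the page width (assumed convention)
    y: float  # top edge as a fraction of the page height (assumed convention)


@dataclass
class Zone:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, word):
        """Return True if the word's anchor point lies inside the zone."""
        return self.left <= word.x <= self.right and self.top <= word.y <= self.bottom


def extract_from_zone(words, zone):
    """Join all words inside the zone in reading order (top to bottom, left to right)."""
    inside = [w for w in words if zone.contains(w)]
    inside.sort(key=lambda w: (w.y, w.x))
    return " ".join(w.text for w in inside)


# Hypothetical words as they might come out of a PDF text layer.
words = [
    Word("Invoice", 0.05, 0.05),
    Word("Date:", 0.60, 0.10),
    Word("2023-04-01", 0.70, 0.10),
    Word("Total", 0.60, 0.90),
]
date_zone = Zone(left=0.55, top=0.08, right=0.95, bottom=0.12)
print(extract_from_zone(words, date_zone))  # Date: 2023-04-01
```

The sketch also shows why this approach depends on a stable layout: if the date moves to a different position on the page, the zone no longer captures it.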
3/ Refine your parsing results
The last step of the Parsing Rule setup process lets you chain multiple filters to 'refine' your parsed data. This step is optional and only needed if the extracted data needs to be post-processed.
If you want to post-process your parsed data, chaining multiple filters has proven to be a powerful way to crop away unnecessary data fragments and end up with a clean set of data.
Depending on the type of data you are parsing (simple text fields or table rows), a different set of filters is available. The filters range from simple search & replace to formatting options for dates and numbers.
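Conceptually, a filter chain passes the output of one filter to the next. The following Python sketch illustrates the idea with a search & replace step followed by date formatting. All function names here (`search_replace`, `strip_whitespace`, `reformat_date`, `apply_filters`) are hypothetical and chosen for illustration; they are not part of Docparser, where filters are configured in the user interface.

```python
from datetime import datetime
from functools import reduce


def search_replace(old, new):
    """Build a filter that replaces every occurrence of `old` with `new`."""
    return lambda text: text.replace(old, new)


def strip_whitespace():
    """Build a filter that removes leading and trailing whitespace."""
    return lambda text: text.strip()


def reformat_date(input_format, output_format):
    """Build a filter that parses a date string and re-emits it in another format."""
    return lambda text: datetime.strptime(text, input_format).strftime(output_format)


def apply_filters(text, filters):
    """Run the text through each filter in order, feeding each result to the next."""
    return reduce(lambda value, f: f(value), filters, text)


# Hypothetical chain: remove the label, trim, then normalize the date.
filters = [
    search_replace("Date:", ""),
    strip_whitespace(),
    reformat_date("%d.%m.%Y", "%Y-%m-%d"),
]
print(apply_filters("Date: 01.04.2023 ", filters))  # 2023-04-01
```

The order of the filters matters: the date formatting step only succeeds because the earlier steps have already removed the label and the surrounding whitespace.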
We hope this article gave you a good idea about what parsing rules are and how you can create your own. As mentioned above, please feel free to reach out to our support staff if you want a helping hand with the setup.