How can I extract table rows from a document?
You can extract tables from documents by creating either a 'Table Rows' or a 'Line Items' parsing rule. The following screencast gives a quick overview of how to create a PDF table extraction parsing rule. Further below, you'll find a more detailed step-by-step guide.
1) Create a new parsing rule
Navigate to 'Parsing Rules', click on 'Add Parsing Rule', and choose 'Table Data' when asked what type of data you want to extract.
2) Define the table borders
The first step is to visually define where the table is located inside your document.
- Move the existing column separators so that they fit the column borders of your table
- Add as many column separators as needed with the '+' buttons on the left and right of the screen
- Optionally, define the area where the table is located by clicking and dragging with your mouse. A defined table with column separators is shown below:
Please note: Cropping an area selection is not always necessary and is sometimes not recommended. If your tables span several pages, it is usually better not to crop an area selection in the first step. Instead, just set the column separators and add table row filters in the next step (see below) to extract the table data you need.
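To make the idea of column separators more concrete, here is a minimal Python sketch of how separator positions can split one line of text into cells. It is only an illustration and not how the product works internally: the function name `split_row_into_columns`, the `(x, text)` word format, and the sample coordinates are all assumptions made for this example.

```python
from bisect import bisect_right

def split_row_into_columns(words, separators):
    """Group (x, text) words into cells using separator x-positions.

    Hypothetical helper for illustration only: a word goes into the
    column whose left boundary is the last separator at or before its x.
    """
    separators = sorted(separators)
    # N separators produce N + 1 columns.
    cells = [[] for _ in range(len(separators) + 1)]
    for x, text in sorted(words):
        cells[bisect_right(separators, x)].append(text)
    return [" ".join(cell) for cell in cells]

# One table row as words with made-up horizontal positions.
row = [(12, "Widget"), (40, "A"), (210, "2"), (300, "9.50"), (390, "19.00")]
separators = [200, 280, 370]  # made-up separator positions

columns = split_row_into_columns(row, separators)
print(columns)
```

In this sketch, moving a separator changes which cell a word lands in, which is why fitting the separators to the real column borders matters before refining the results.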
3) Refine parsed results
The results of the table extraction become visible after you click on 'Confirm & Continue' in the bottom right. The visual table definition from the previous step gives you a result that you will likely want to refine. For example, you can filter out unwanted rows or format dates. You refine parsing results by chaining multiple filters on the right side. Clicking on 'Add Table Filter' reveals a menu with various options.
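Conceptually, chaining filters means that each filter receives the output of the previous one. The short Python sketch below illustrates that idea under stated assumptions; it is not the product's implementation. The function names, the sample rows, and the `DD/MM/YYYY` date format are hypothetical and chosen only for this example.

```python
from datetime import datetime

def drop_rows_without_quantity(rows):
    """Keep only rows whose second cell is a whole number (hypothetical filter)."""
    return [r for r in rows if r[1].strip().isdigit()]

def format_date_column(rows, index=0):
    """Rewrite DD/MM/YYYY dates in one column as YYYY-MM-DD (hypothetical filter)."""
    out = []
    for r in rows:
        r = list(r)  # copy so the input rows stay unchanged
        r[index] = datetime.strptime(r[index], "%d/%m/%Y").strftime("%Y-%m-%d")
        out.append(r)
    return out

def apply_filters(rows, filters):
    """Run the filters in order, feeding each one the previous result."""
    for f in filters:
        rows = f(rows)
    return rows

# Made-up raw extraction result: header row, two data rows, one subtotal row.
raw_rows = [
    ["Date", "Qty", "Description"],
    ["03/01/2024", "2", "Widget A"],
    ["", "", "Subtotal"],
    ["15/02/2024", "10", "Widget B"],
]

refined = apply_filters(raw_rows, [drop_rows_without_quantity, format_date_column])
print(refined)
```

The order of the chain matters in this sketch: the row filter runs first so that the date filter never sees the header or subtotal rows, which it could not parse as dates.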
Below is a screenshot of the data after the above selection area and column separators were defined: