How can I extract text from a fixed position?
Extracting a fixed data field (e.g. an invoice number, date, or shipping address) from a PDF document is easy with Docparser. All you need to do is create a parsing rule that tells our algorithm where the data field is located in your document.
1) Create a new parsing rule
Navigate to "Parsing Rules" and click on "Add Parsing Rule". Choose the template "Text Fixed Position".
2) Draw a rectangle around the area where your data field is located
The first step is to visually define where the data field is located inside your document. To do so, place your mouse pointer at the upper left corner of the text. Hold the mouse button down and drag to draw a rectangle around the text you want to parse.
Please note: If the position of the text changes slightly from one document to the next, draw a rectangle large enough to include all possible positions. In this scenario, you can use the text filters in the next step to clean up the data. If the text is in a completely different location each time, then read this article.
The example below shows the start of a phone number extraction: we select the area that will contain the phone number.
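To make the idea of a fixed-position rule more concrete, here is a minimal Python sketch. It is not Docparser's implementation or API. It assumes the document has already been turned into words with page coordinates (origin at the top left), and the names `Word`, `Rect`, and `extract_fixed_position` are hypothetical. The rule keeps every word whose center falls inside the rectangle you drew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """A word with its bounding box (hypothetical structure, top-left origin)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


@dataclass(frozen=True)
class Rect:
    """The rectangle drawn around the data field."""
    x0: float
    y0: float
    x1: float
    y1: float

    def contains_center(self, word: Word) -> bool:
        # A word counts as "inside" when its center point lies in the rectangle.
        cx = (word.x0 + word.x1) / 2
        cy = (word.y0 + word.y1) / 2
        return self.x0 <= cx <= self.x1 and self.y0 <= cy <= self.y1


def extract_fixed_position(words: list[Word], rect: Rect) -> str:
    """Return the text of all words inside rect, in reading order."""
    inside = [w for w in words if rect.contains_center(w)]
    # Sort top to bottom, then left to right.
    inside.sort(key=lambda w: (round(w.y0), w.x0))
    return " ".join(w.text for w in inside)


if __name__ == "__main__":
    page = [
        Word("Invoice", 50, 40, 110, 52),
        Word("Phone:", 50, 100, 95, 112),
        Word("555-0142", 100, 100, 160, 112),
        Word("Total", 50, 300, 85, 312),
    ]
    phone_area = Rect(40, 95, 200, 120)
    print(extract_fixed_position(page, phone_area))
```

A larger rectangle tolerates small shifts in position but may also capture neighboring text such as the "Phone:" label here, which is why a cleanup step with text filters is useful.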
3) Refine parsed results
The results of the visual data extraction are available immediately once you click on 'Confirm & Continue'. If the data is already extracted in the desired format, you are done with the setup and can save your parsing rule.
You can also add text filters to further modify the parsed text. For example, you can crop text from the beginning or the end, or format dates. You can chain multiple filters until the data is in the required format.
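The chaining of filters can be pictured as a small pipeline in which each filter receives the output of the previous one. The Python sketch below is only an illustration of that idea, not Docparser's filter engine. The helper names `crop_start`, `crop_end`, `reformat_date`, and `apply_filters` are hypothetical, and the sample text and date formats are assumptions.

```python
from datetime import datetime
from typing import Callable

TextFilter = Callable[[str], str]


def crop_start(n_chars: int) -> TextFilter:
    """Build a filter that removes n_chars characters from the beginning."""
    return lambda text: text[n_chars:]


def crop_end(n_chars: int) -> TextFilter:
    """Build a filter that removes n_chars characters from the end."""
    return lambda text: text[: max(len(text) - n_chars, 0)]


def reformat_date(in_format: str, out_format: str) -> TextFilter:
    """Build a filter that parses a date in in_format and writes it in out_format."""
    def apply(text: str) -> str:
        return datetime.strptime(text.strip(), in_format).strftime(out_format)
    return apply


def apply_filters(text: str, filters: list[TextFilter]) -> str:
    """Run the filters in order, feeding each result into the next filter."""
    for text_filter in filters:
        text = text_filter(text)
    return text


if __name__ == "__main__":
    raw = "Date: 03/14/2024 (due)"
    chain = [
        crop_start(6),                          # drop the "Date: " label
        crop_end(6),                            # drop the trailing " (due)"
        reformat_date("%m/%d/%Y", "%Y-%m-%d"),  # normalize the date format
    ]
    print(apply_filters(raw, chain))
```

The order of the filters matters: the date filter in this sketch only works because the two crop filters have already removed the surrounding text.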