This activity reads the contents of the PDF data, including headers, and extracts the data. It identifies the fields, codes them, and groups them with the data in each field.
- Criteria – OCR queries to extract the data from PDF pages.
- From Page Number – Set the page extraction mode to "Range" and specify the page number at which to start the extraction.
- Image Resize Percentage – Enter the percentage value to rescale an image.
- OCR Engine – The OCR engine instance returned by the Create Tesseract OCR Engine activity. The Tesseract OCR engine uses language-specific training data to recognize words. It favors words and sentences that often appear together in the specified language, much as a human reader does, and the training data helps it produce accurate results.
- Page Extraction Mode – Set the page extraction mode to "All", "Single", or "Range" to choose which pages are extracted.
- Password – The password of the PDF file, if necessary.
- PDF File Path – Specify the path of the PDF file to export as images.
- Retain Temp Images – Specifies whether to keep the exported images in the staging folder or delete them after the text extraction.
- Single Page Number – Set the page extraction mode to "Single" and specify the page number to extract text.
- Staging Folder – Specifies the path of the folder that holds the exported images.
- To Page Number – Set the page extraction mode to "Range" and specify the last page from which to extract text.
- DisplayName – Add a display name to your activity.
- Private – By default, the activity logs the values of its properties inside your workflow. If Private is selected, it stops logging them.
- Continue On Error – Specifies if the automation should continue even when the activity throws an error. This field only supports Boolean values (True, False). The default value is False.
Note: If this activity is included in Try Catch and the value of this property is True, no error is caught when the project is executed.
- Result – The list of pages and their corresponding metadata extracted and returned by this activity.
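
The interplay between Page Extraction Mode, Single Page Number, From Page Number, and To Page Number can be illustrated with a small Python sketch. This is not the activity's implementation. The function name `resolve_pages`, the 1-based page numbering, the inclusive range, and the validation rules are all assumptions made for illustration only.

```python
from typing import List, Optional


def resolve_pages(mode: str,
                  total_pages: int,
                  single_page: Optional[int] = None,
                  from_page: Optional[int] = None,
                  to_page: Optional[int] = None) -> List[int]:
    """Return the page numbers selected by a page extraction mode.

    Hypothetical helper: assumes pages are numbered from 1 and that
    a range includes both its first and last page.
    """
    mode = mode.strip().lower()

    if mode == "all":
        # Every page in the document.
        return list(range(1, total_pages + 1))

    if mode == "single":
        # Exactly one page, which must exist in the document.
        if single_page is None or not 1 <= single_page <= total_pages:
            raise ValueError("Single Page Number must be within the document")
        return [single_page]

    if mode == "range":
        # Both ends are required and must lie within the document.
        if from_page is None or to_page is None:
            raise ValueError("Range mode needs From and To Page Number")
        if not 1 <= from_page <= to_page <= total_pages:
            raise ValueError("Page range must lie within the document")
        return list(range(from_page, to_page + 1))

    raise ValueError(f"Unknown page extraction mode: {mode!r}")
```

Under these assumptions, `resolve_pages("Range", 10, from_page=3, to_page=5)` returns `[3, 4, 5]`, while `resolve_pages("All", 3)` ignores the page-number arguments and returns `[1, 2, 3]`.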