Extract PDF Text With OCR

Description

This activity reads the contents of a PDF file, including headers, and extracts the text using an OCR engine.

Properties

Input

  • From Page Number – Set the page extraction mode to "Range" and specify the page number at which to start the extraction.
  • Image Format – Specify the image format in which to save the extracted images.
  • Image Resize Percentage – Rescales the image by the specified percentage.
  • OCR Engine – The OCR engine instance returned by the Create Tesseract OCR Engine activity.
  • Page Extraction Mode – Set the page extraction mode to "All," "Single," or "Range" to select which pages are extracted.
  • Password – Sets the password for the PDF file, if necessary.
  • PDF File Path – Specify the path of the PDF file from which to extract text.
  • Single Page Number – Set the page extraction mode to "Single" and specify the page number from which to extract text.
  • To Page Number – Set the page extraction mode to "Range" and specify the last page from which to extract text.
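
The interaction between Page Extraction Mode and the three page-number properties can be sketched in Python. This is only an illustration of the selection logic, not the activity's actual implementation: the function name `resolve_pages` and its parameters are hypothetical, and the sketch assumes page numbers are 1-based and that the "Range" bounds are inclusive.

```python
from typing import List, Optional


def resolve_pages(mode: str, total_pages: int,
                  single_page: Optional[int] = None,
                  from_page: Optional[int] = None,
                  to_page: Optional[int] = None) -> List[int]:
    """Return the page numbers selected by a page extraction mode.

    Hypothetical helper; assumes 1-based pages and an inclusive range.
    """
    if mode == "All":
        # Every page of the document is processed.
        return list(range(1, total_pages + 1))
    if mode == "Single":
        # Only Single Page Number is consulted in this mode.
        if single_page is None or not 1 <= single_page <= total_pages:
            raise ValueError("Single mode needs a valid Single Page Number")
        return [single_page]
    if mode == "Range":
        # Both From Page Number and To Page Number are required.
        if from_page is None or to_page is None:
            raise ValueError("Range mode needs From and To Page Number")
        if not 1 <= from_page <= to_page <= total_pages:
            raise ValueError("page range is outside the document")
        return list(range(from_page, to_page + 1))
    raise ValueError(f"unknown page extraction mode: {mode!r}")


print(resolve_pages("Range", 10, from_page=3, to_page=5))  # [3, 4, 5]
```

In this sketch each mode reads only the properties that belong to it, which mirrors how the property descriptions above pair "Single" with Single Page Number and "Range" with From Page Number and To Page Number.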

Misc

  • DisplayName – Add a display name to your activity.
  • Private – By default, the activity logs the values of your properties inside your workflow. If Private is selected, this logging stops.

Optional

  • Continue On Error – Specifies whether the automation should continue even when the activity throws an error. This field only supports Boolean values (True, False). The default value is False.

    Note: If this activity is included in a Try Catch and the value of this property is True, no error is caught when the project is executed.
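
The note about Try Catch can be illustrated with a short Python analogy. This is a sketch of the general behavior only, not the product's code: `run_activity` and `failing_ocr` are hypothetical names, and the assumption is that Continue On Error suppresses the error inside the activity, so an enclosing handler never sees it.

```python
from typing import Any, Callable, Optional


def run_activity(activity: Callable[[], Any],
                 continue_on_error: bool = False) -> Optional[Any]:
    """Run an activity, optionally suppressing any error it raises."""
    try:
        return activity()
    except Exception:
        if continue_on_error:
            # The error is swallowed here, so callers see no exception.
            return None
        raise


def failing_ocr() -> str:
    # Stand-in for an activity that fails, e.g. on an unreadable PDF.
    raise RuntimeError("could not open PDF")


caught = []
for flag in (False, True):
    try:
        # The surrounding try/except plays the role of Try Catch.
        run_activity(failing_ocr, continue_on_error=flag)
    except RuntimeError:
        caught.append(flag)

print(caught)  # [False]
```

Only the run with `continue_on_error=False` reaches the outer handler. With the flag set to True the failure is suppressed before the handler can catch it, which is the situation the note describes.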

Output

  • Result – The text extracted from the PDF file by the OCR engine.

Example

Download Example