Create Tesseract OCR Engine

Description

This activity creates a Tesseract OCR engine that is used to extract the specified string and related information from a specified UI element, image, or PDF. OCR stands for Optical Character Recognition, a technology that recognizes text inside images, such as scanned documents and photos.

Properties

Input

  • Data Path – Tesseract engine path.
  • Image Height Multiplier – Rescales the image height by the specified percentage.
  • Image Width Multiplier – Rescales the image width by the specified percentage.
  • Language – Specifies the language the OCR engine uses to extract text. Tesseract can recognize more than 100 languages with Unicode support. An OCR engine saves time by digitizing documents so that their content does not have to be typed manually.
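
As a rough illustration of how these inputs relate to one another, the following Python sketch models them outside the product. It is not the activity's implementation. The names TesseractSettings, scaled_size, and tesseract_command are hypothetical, and the sketch assumes that Data Path points to the folder containing Tesseract's trained data files and that a multiplier of 100 leaves the image size unchanged. The only real interface used is the tesseract command-line tool with its --tessdata-dir and -l options.

```python
from dataclasses import dataclass


@dataclass
class TesseractSettings:
    """Hypothetical container mirroring the activity's Input properties."""
    data_path: str                      # assumed: folder holding the *.traineddata files
    language: str = "eng"               # Tesseract language code, e.g. "eng" or "deu"
    image_height_multiplier: int = 100  # percent; 100 is assumed to mean "unchanged"
    image_width_multiplier: int = 100   # percent; 100 is assumed to mean "unchanged"


def scaled_size(width, height, settings):
    """Return (width, height) in pixels after applying the percentage multipliers."""
    new_width = max(1, round(width * settings.image_width_multiplier / 100))
    new_height = max(1, round(height * settings.image_height_multiplier / 100))
    return new_width, new_height


def tesseract_command(image_path, settings):
    """Build an argument list for the tesseract command-line tool.

    Using "stdout" as the output base makes tesseract print the recognized text.
    """
    return [
        "tesseract", image_path, "stdout",
        "--tessdata-dir", settings.data_path,
        "-l", settings.language,
    ]
```

For example, with a height multiplier of 200 and a width multiplier of 150, a 640 x 480 image would be rescaled to 960 x 960 pixels before recognition. Enlarging small or low-resolution images in this way often helps OCR accuracy.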

Misc

  • DisplayName – Add a display name to your activity.
  • Private – By default, the activity logs the values of its properties inside your workflow. If Private is selected, logging is turned off.

Optional

  • Continue On Error – Specifies whether the automation should continue even when the activity throws an error. This field only supports Boolean values (True, False). The default value is False.

    Note: If this activity is included in a Try Catch and the value of this property is True, no error is caught when the project is executed.
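
The interaction described in the note can be sketched in Python. This is only an analogy for the behavior, not how the product implements it. The helpers run_activity, failing_action, and run_in_try_catch are hypothetical names introduced for illustration.

```python
def run_activity(action, continue_on_error=False):
    """Run `action` and return (result, error). Hypothetical helper."""
    try:
        return action(), None
    except Exception as exc:
        if continue_on_error:
            # The error is swallowed here, so an enclosing handler never sees it.
            return None, exc
        raise


def failing_action():
    """Stand-in for an activity that fails, e.g. because the data path is wrong."""
    raise FileNotFoundError("tessdata folder not found")


def run_in_try_catch(continue_on_error):
    """Return True if the surrounding try/except caught an error."""
    try:
        run_activity(failing_action, continue_on_error=continue_on_error)
    except FileNotFoundError:
        return True
    return False
```

With continue_on_error set to True, the error never reaches the enclosing handler, which corresponds to the Try Catch catching nothing. With the default of False, the error propagates and is caught.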

Output

  • OCR Engine – The OCR engine instance returned by the Create Tesseract OCR Engine activity. The Tesseract OCR engine uses language-specific training data to recognize words. Much as a human reader does, it favors words and sentences that often appear together in the specified language, which helps it produce accurate results.

Example

Create Tesseract OCR Engine

Download Example