Getting Started

Creating a New Project

To create a new document training model project in Trainer Buddy:

  1. Launch TrainerBuddy.exe from the IntelliBuddies installation folder
  2. In the File tab, click New and select Blank Project
  3. In the New Project dialog that opens, fill in the following fields:

| Field | Description |
| --- | --- |
| Project Name | Specify a name for this project. Do not use any of these characters in the name: " < > : / \ ? * or the vertical bar. |
| Location | Specify where to save this project in your file system. A new folder named after the Project Name is created under this location, and all project-related files and resources are stored in that folder. |
| Image Resize Percent | Select the image resize percentage from the dropdown. Resizing helps the OCR Engine extract text more accurately. |
| Description | Enter a description of this project for future reference. |
  4. Click the Create button to create the project with the details provided above.
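
If you generate project names in a script before entering them in the dialog, you can check the naming rule above in advance. The following Python sketch is illustrative only and is not part of Trainer Buddy. The function name invalid_chars is hypothetical, and the character set assumes that the backslash is one of the restricted characters.

```python
# Characters the New Project dialog asks you to avoid (assumed set, see lead-in).
FORBIDDEN_CHARS = set('"<>:/\\|?*')


def invalid_chars(project_name: str) -> list[str]:
    """Return the forbidden characters found in project_name, in order of first appearance."""
    found = []
    for ch in project_name:
        if ch in FORBIDDEN_CHARS and ch not in found:
            found.append(ch)
    return found


print(invalid_chars("Invoices_2024"))    # []
print(invalid_chars("Invoices: Q1/Q2"))  # [':', '/']
```

An empty list means the name contains none of the restricted characters.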

Document Templates

In Trainer Buddy, you can add, modify, and delete the document templates used to train your model. To manage these templates, first open the corresponding project in Trainer Buddy.

Adding a New Document Template

  1. Click the Batch tab in the Ribbon Tabs panel
  2. Click the Add menu in the Ribbon Menu panel
  3. Select the document template that you want to train as part of this project
  4. A new node, named after the selected document template file, is added under the Batch panel

For example, if the selected document template's file name is invoice01.pdf, a node named invoice01.pdf appears in the Batch panel.

Document Template Properties

Once you have added a new document template to the project, you can view and configure its properties from the Properties panel.

| Property | Description |
| --- | --- |
| Name | The name of this document template. By default, it is set to the file name of the added document template. You can change it to suit your project. |
| Selection Mode | The document identification mode. Options: File Name Pattern (identify the document by its file name pattern); Keywords (identify the document by the presence of the specified keywords in its content); Select All (identify the document using both of the above). |
| Keywords | The keywords to match when Selection Mode is Keywords. You can add, edit, and remove keywords here. |
| Match All Keywords | Select this to require all the specified keywords to match. By default, a document is identified as belonging to this template if any one of the keywords matches. |
| Tolerance | The tolerance used when matching the keywords against the document content. Options: Weak (exact match, 0% tolerance); Medium (mild tolerance, up to 25%); Strong (strong tolerance, up to 50%); Custom (a tolerance that you specify). |
| Custom Tolerance | The tolerance, as a percentage, used when Tolerance is set to Custom. |

Once you configure the document template properties, the changes are reflected in the Batch and Properties panels.
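
To make the Keywords, Match All Keywords, and Tolerance properties more concrete, the sketch below shows one way such matching could behave. It is not Trainer Buddy's actual algorithm, which is not described here. It assumes that a tolerance of N% means a similarity of at least (100 - N)% between the keyword and a run of words in the document, and it uses Python's difflib for the similarity score. The function names keyword_matches and template_matches are hypothetical.

```python
from difflib import SequenceMatcher

# Maximum mismatch allowed per tolerance level, as listed in the Tolerance property.
TOLERANCE_PERCENT = {"Weak": 0, "Medium": 25, "Strong": 50}


def keyword_matches(keyword, text, tolerance="Weak", custom_percent=0):
    """Return True if some run of words in text is similar enough to keyword."""
    allowed = custom_percent if tolerance == "Custom" else TOLERANCE_PERCENT[tolerance]
    threshold = 1.0 - allowed / 100.0  # assumed meaning of "tolerance"
    words = text.lower().split()
    size = len(keyword.split())
    target = keyword.lower()
    # Slide a window with the same word count as the keyword across the text.
    for start in range(len(words) - size + 1):
        candidate = " ".join(words[start:start + size])
        if SequenceMatcher(None, target, candidate).ratio() >= threshold:
            return True
    return False


def template_matches(keywords, text, match_all=False, tolerance="Weak", custom_percent=0):
    """Apply the Match All Keywords rule: all keywords if set, otherwise any one."""
    results = (keyword_matches(k, text, tolerance, custom_percent) for k in keywords)
    return all(results) if match_all else any(results)


# OCR output in which "Invoice" was misread as "lnvoice".
sample_text = "lnvoice Number 1042 Total Due"
print(keyword_matches("invoice", sample_text, "Weak"))    # False
print(keyword_matches("invoice", sample_text, "Medium"))  # True
```

Under these assumptions, a Weak tolerance rejects the misread word while Medium accepts it, which is the kind of trade-off to weigh when choosing a tolerance for scanned documents.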

Page Templates

When you add a new document template to the project, Trainer Buddy automatically lists all of its pages under the corresponding document template node in the Batch panel. Expand that node to view the pages.

Context Menu

You manage pages from the Batch panel. Each page node has a context menu for managing the pages used for training under its document template.

| Menu | Description |
| --- | --- |
| Add Region | Adds a new region node under this page. |
| Disable | Disables this page in the training project. The page node is kept, so you can enable it again later. |
| Delete | Deletes this page from the training project. A deleted page cannot be restored. |

Page Template Properties

You can view and configure the page properties from the Properties panel for the page selected in the Batch panel.

| Property | Description |
| --- | --- |
| Name | The name of this page. By default, the name is set to Page #, where # is the page number of this page within the document template. |
| Title - Patterns | The patterns used to identify this page of the document. |
| Title - Match All Pattern | Select this flag to require all the specified patterns to match in order to identify this page. |
| Title - Region | The region on this page in which to search for the Title Patterns. |

Regions

OCR Engine performance depends on the size of the image being processed: the smaller the image, the faster the processing. In some cases, extraction accuracy also improves when the OCR Engine is given an accurate clipping region. Defining regions inside your pages can therefore make data extraction both faster and more accurate. To add a region under a page, use the Add Region option in that page's context menu.
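
As a rough illustration of how much image data a clipping region can remove, the sketch below computes the share of a page's pixels that falls inside a region. The BoundingRect class, its field names, and the page dimensions are assumptions made for this example, and the region is assumed to lie entirely within the page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingRect:
    """Hypothetical rectangle in pixels, measured from the page's top-left corner."""
    x: int
    y: int
    width: int
    height: int


def clip_fraction(region: BoundingRect, page_width: int, page_height: int) -> float:
    """Return the fraction of the page's pixels that fall inside region."""
    return (region.width * region.height) / (page_width * page_height)


# An A4 page scanned at 300 DPI is about 2480 x 3508 pixels.
header = BoundingRect(x=0, y=0, width=2480, height=600)
print(f"{clip_fraction(header, 2480, 3508):.1%}")  # 17.1%
```

In this example, the OCR Engine would only need to process about a sixth of the page to read fields located in the header.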

Context Menu

The region node inside the Batch panel provides the following context menu options:

| Menu | Description |
| --- | --- |
| Add Field | Adds a new field under this region. |
| Copy | Copies the entire region to the clipboard so that you can paste and reuse it under a different page or document template. |
| Delete | Deletes this region. |

Region Properties

You can view and configure the region properties from the Properties panel for the selected region under the Batch panel.

| Property | Description |
| --- | --- |
| Name | Specify a name for this region. By default, the region is named Region #, where # is the region index. |
| Region | Specify the BoundingRect for this region. By default, the entire image is selected as the region. To select a region, click the [...] button of the Region property in the Properties panel. This opens the region selection dialog on top of the Image panel. Hold the left mouse button, drag across the area you want, and release the button. Then click the Apply button in the region selection dialog to store the selection in the Region property. |

Fields

A Field is the leaf node in the Batch panel. It represents specific information that needs to be extracted from the document. You can add a new field under a region through the region context menu.
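
The tree shown in the Batch panel (document template, page, region, field) can be pictured as a simple nested data structure. The sketch below is only a mental model. The class names, attributes, and the field_paths helper are hypothetical and do not reflect Trainer Buddy's internal types or its published file format. Skipping disabled pages mirrors the Disable option of the page context menu.

```python
from dataclasses import dataclass, field


@dataclass
class FieldNode:
    name: str


@dataclass
class RegionNode:
    name: str
    fields: list[FieldNode] = field(default_factory=list)


@dataclass
class PageNode:
    name: str
    regions: list[RegionNode] = field(default_factory=list)
    enabled: bool = True


@dataclass
class DocumentTemplateNode:
    name: str
    pages: list[PageNode] = field(default_factory=list)


def field_paths(template: DocumentTemplateNode) -> list[str]:
    """List every field on an enabled page as template/page/region/field."""
    paths = []
    for page in template.pages:
        if not page.enabled:
            continue  # disabled pages stay in the tree but are not used
        for region in page.regions:
            for leaf in region.fields:
                paths.append("/".join([template.name, page.name, region.name, leaf.name]))
    return paths


template = DocumentTemplateNode("invoice01.pdf", [
    PageNode("Page 1", [RegionNode("Region 1", [FieldNode("Invoice Number"), FieldNode("Total")])]),
    PageNode("Page 2", [RegionNode("Region 1", [FieldNode("Notes")])], enabled=False),
])
for path in field_paths(template):
    print(path)
```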

Field Properties

You can view and configure the field properties in the Properties panel for the selected field inside the Batch panel.

| Property | Description |
| --- | --- |
| Name | Specify a name for this field. By default, the field is named Field #, where # is the index of this field. |
| Default Value | The default value to be assigned to this field. |
| OCR Parameters | The OCR parameters used by the OCR Engine when extracting this field's value. |
| Region | The bounding rectangle on the page where this field's value is located. |
| Type | One of the following: Absolute Position (the region is the absolute position of the field value); Relative to Anchor (the region is relative to the specified anchor text); Relative to Field (the region is relative to the specified field); Relative to Title (the region is relative to the page title). |
| Relative Anchor Patterns | The anchor patterns to use when Type is Relative to Anchor. |
| Relative Field | The name of the field to use when Type is Relative to Field. |
| Tolerance | The tolerance used when matching the anchor patterns against the document content. Options: Weak (exact match, 0% tolerance); Medium (mild tolerance, up to 25%); Strong (strong tolerance, up to 50%); Custom (a tolerance that you specify). |
| Custom Percentage | The tolerance, as a percentage, used when Tolerance is set to Custom. |
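
The difference between the absolute and relative field types can be sketched as a coordinate calculation. The example below assumes that a relative region is stored as an offset from the top-left corner of its reference (the anchor text, another field, or the page title). This is an assumption made for illustration; how Trainer Buddy actually stores and resolves relative regions is not described here. The Rect class and resolve_region function are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rect:
    """Hypothetical rectangle in pixels."""
    x: int
    y: int
    width: int
    height: int


RELATIVE_TYPES = {"Relative to Anchor", "Relative to Field", "Relative to Title"}


def resolve_region(field_region: Rect, field_type: str, reference: Optional[Rect] = None) -> Rect:
    """Return the page-level rectangle in which to read the field value."""
    if field_type == "Absolute Position":
        return field_region
    if field_type not in RELATIVE_TYPES:
        raise ValueError(f"unknown field type: {field_type}")
    if reference is None:
        raise ValueError("a relative field type needs the located reference rectangle")
    # Assumed convention: x and y are offsets from the reference's top-left corner.
    return Rect(reference.x + field_region.x, reference.y + field_region.y,
                field_region.width, field_region.height)


# The anchor text was found at (100, 200); the value sits 90 pixels to its right.
anchor = Rect(100, 200, 80, 20)
value_box = resolve_region(Rect(90, 0, 120, 20), "Relative to Anchor", anchor)
print(value_box)  # Rect(x=190, y=200, width=120, height=20)
```

A relative type is useful when the value moves from document to document but keeps the same position with respect to a label, another field, or the page title.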

Repeat these steps to add all the fields under the region, then add any other regions that the page needs. Continue with the remaining pages of the current document template before adding more document templates to the model.

Validating Document Training Model

Once you have completed training for all the document templates, you can validate the document training model by clicking the Validate button inside the Batch ribbon tab menu. Any errors occurring during the validation process will be reported under the Error panel.

Resolve all the errors before publishing or exporting the document training model.

Publishing Document Training Model

Once your document training model validates successfully, you can publish it so that it can be consumed by IntelliBuddies OCR Activities.

To publish, click the Publish button in the Batch ribbon tab menu. The publish dialog opens and asks you to select the location in which to publish the document training model.

On publishing, the document training model is serialized to a JSON file in the specified location. The model is named after the project. The Output panel displays a message indicating the name of the published training model.

You can now use this training model in activities such as Identify Document With OCR and Extract PDF Data With OCR. The JSON file published by Trainer Buddy is the serialized form of DocumentQueries, which these activities take as input.
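
Because the published model is a JSON file, you can inspect it with any JSON parser before using it in an activity. The Python sketch below loads a file and reports only its top-level structure. It makes no assumptions about the DocumentQueries schema, which is not documented here. The function names and the example path are hypothetical.

```python
import json
from pathlib import Path


def load_training_model(path):
    """Parse a published training model file and return the decoded JSON value."""
    # utf-8-sig accepts UTF-8 files with or without a byte-order mark.
    with Path(path).open(encoding="utf-8-sig") as handle:
        return json.load(handle)


def top_level_summary(model):
    """Describe the top level of the decoded JSON without assuming a schema."""
    if isinstance(model, dict):
        return sorted(model.keys())
    if isinstance(model, list):
        return [f"list of {len(model)} item(s)"]
    return [type(model).__name__]


# Usage (the path is hypothetical):
# model = load_training_model(r"C:\Models\Invoices.json")
# print(top_level_summary(model))
```

A file that fails to parse raises json.JSONDecodeError, which is a quick way to detect a truncated or hand-edited model file.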