Build solutions

Develop complete solutions for classifying documents and extracting unstructured data with Solution Builder’s guided workflow. Create a project—the workspace in which you develop your solution—then add documents, train models, and build a flow to classify and extract data.

Complete these tasks to develop an end-to-end document understanding solution:

  1. Create a Solution Builder project.

  2. Add the documents on which you want to train models.

  3. Annotate documents with the information you want to extract.

  4. Create and train classification and extraction models.

  5. Build a flow, an automated workflow that classifies and extracts data from documents.

  6. Optionally, build refiners and validations to add to your flow.

  7. Optionally, compile and publish the finished solution to use in production or share with others.

Tip

Solution Builder includes a tutorial that walks you through creating your first project, adding and annotating documents, and building a flow. Start the tutorial from the File menu.

Creating a project

A Solution Builder project is the workspace in which you train models, create a workflow, and iterate on your solution.

When you open the Solution Builder app, you can create a new project or, if any projects already exist in the workspace, open an existing one.

To start a project, click Create Project and fill out the project creation dialog.

  1. Enter a project name.

  2. Optional. Enter a description of the project.

  3. Select a space and subspace for the project. You can create the project in any space and subspace in which you are a collaborator.

  4. Select a drive in which to store the project. By default, Solution Builder suggests the most recently used drive.

  5. Click Create.

Next, you’ll add the documents that you want to use to develop your solution.

Adding documents

After you’ve created a project, add the documents you want to use to develop your solution. These documents are used to train your models to classify documents and extract data, so they should be representative of the documents you plan to process in your production environment.

You can add most common document formats to Solution Builder, including PDF, .docx, .xlsx, .csv, and .txt formats. See the complete list of supported file types for details. To add documents, use any of these methods:

  • Drag and drop files or folders from your computer.

  • Click Select folder or files and navigate to the files you want to add. You can select any file or folder in subspaces you have access to.

  • Import documents from another project. Expand the submenu under + Add files, click Import from project, and select the files you want to add.

When you add files to a project, they’re automatically digitized. In most cases, the default digitization settings are suitable, but you can tune them as needed.

Tip

For details about digitization settings, see the parameter reference for the process files step in Flow, which provides similar settings.

If your project includes several kinds of documents, you can set the Type field when you upload them to speed up creating classes in your annotation set. Solution Builder automatically assigns a type based on existing directory names, but you can modify this value as needed.
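As an illustration only, the following Python sketch shows how a type could be derived from the name of the folder that contains each uploaded file. The function name `infer_type`, the rule of using the immediate parent folder, and the `unknown` fallback are assumptions for this example, not Solution Builder’s actual assignment logic.

```python
from pathlib import PurePosixPath


def infer_type(file_path: str, default: str = "unknown") -> str:
    """Hypothetical rule: use the immediate parent folder name as the document type."""
    parent = PurePosixPath(file_path).parent.name
    # Files added at the top level have no enclosing folder name to use.
    return parent if parent else default


uploads = [
    "paystubs/adp_0001.pdf",
    "paystubs/gusto_0002.pdf",
    "bank_statements/jan.pdf",
    "loose_file.pdf",
]
for path in uploads:
    print(path, "->", infer_type(path))
```

Organizing files into one folder per document kind before uploading gives each class a consistent starting type, which you can still edit afterward.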

Managing documents

The Documents tab lets you manage the documents in your project. From this tab, you can perform these common tasks:

  • Add or import files.

  • Rename, download, delete, or view documents.

  • Create an annotation set containing selected documents.

  • Redigitize specific documents or add a digitization profile for your document set.

    Tip

    For details about digitization settings, see the parameter reference for the process files step in Flow, which provides similar settings.

  • Manage document information fields, and filter or sort by field.

Annotating documents

Annotating your document set is critical to training models. During the annotation process, you mark each document with the information you want to capture. The annotations you make help the model learn what type—or class—of document you have and what kind of information you want to capture from each document class.

Solution Builder automatically assigns classes to documents based on the document type. If you didn’t specify a type when you uploaded documents, or if you want to change how classes are assigned, you can do so when you create the annotation set.

Create an annotation set

Before you can start annotating documents, you need to collect the documents you’ve uploaded in an annotation set. You can create the annotation set from either the Documents tab or the Annotation sets tab.

Create an annotation set from the Documents tab

  1. Select the files you want to include in the set.

  2. Click + Create Annotation set.

  3. Enter a name for the annotation set.

  4. Optional. Enter a description for the annotation set.

  5. Optional. Select the field you want to assign as the document class. By default, Solution Builder uses the Type field, but you can choose any field.

  6. Optional. Import a class schema from the Instabase Marketplace. The class schema defines which document fields the model extracts from this class of document. If there’s no existing schema you want to import, you can define the class schema later in ML Studio.

  7. Optional. Select Do not auto-assign classes. If you select this option, you must add or import classes before you can begin annotation.

  8. Click Create.

Create an annotation set from the Annotation sets tab

  1. Click + Create new.

  2. Select the documents you want to include in the annotation set.

  3. Click Confirm.

  4. Enter a name for the annotation set.

  5. Optional. Enter a description for the annotation set.

  6. Optional. Select the field you want to assign as the document class. By default, Solution Builder uses the Type field, but you can choose any field.

  7. Optional. Import a class schema from the Instabase Marketplace. The class schema defines which document fields the model extracts from this class of document. If there’s no existing schema you want to import, you can define the class schema later in ML Studio.

  8. Optional. Select Do not auto-assign classes. If you select this option, you must add or import classes before you can begin annotation.

  9. Click Create.

After the set is created, click Open Annotation set to open the annotation set you just created, or find all available annotation sets in the Annotation sets tab. When you open an annotation set, Solution Builder opens it in the ML Studio app.

Next, you’ll add fields to each of the classes to indicate the data you want to extract.

Add or import classes

If you did not auto-assign classes to your documents, you must add or import a document class before you can begin annotation.

Add a new class

  1. In ML Studio, in your annotation set, click + Create or Import class to open the Manage Classes view.

  2. Click + Create new class.

  3. Add the class name.

  4. Optional. Add a description of the class.

  5. Optional. Add fields to define the class schema, which specifies the information you want to extract. You can add these fields when you create the class or you can add them afterward. See Add fields to classes for details.

Import a class

If you want to use a class that already exists in another annotation set or in the Marketplace, you can import the class. Imported classes already have a class schema, which defines the information the model needs to extract from the document class. You can add or edit the fields in the schema as needed after you import the class.

  1. In ML Studio, in the annotation set, click + Create or Import class to open the Manage Classes view.

  2. Click Import classes.

  3. Choose where to import classes from: an existing annotation set or a Marketplace model.

    • If you import from the Marketplace, search for the model whose classes you want to import, and click Import.
    • If you import from another annotation set, you can select from any Solution Builder project you have access to. Select the annotation set from which you want to import the class, and click Open.

Add fields to classes

To teach the model to extract data from your documents, set up the document fields you want to extract from each type, or class, of document. The set of fields you define for a class is known as the class schema, and you create and manage the schema in ML Studio. For each class, add fields to define the information you want to extract.

  1. In ML Studio, in the class list sidebar, select the class you want to add fields to.

  2. Click the plus (+) icon or click Add field.

  3. Enter the field name, such as name or date.

  4. Select the field type: text, table, or list (public preview).

    Note

    To extract data from tables, you must first enable table extraction by setting the environment variable ENABLE_ML_STUDIO_TABLE to true. See the environment variables documentation for details.

  5. Optional. Enter a description of the field.

  6. Click Add.

  7. Repeat these steps for each field you want to add to the class schema.

Alternatively, you can add, modify, or delete class fields in the class settings page. To open the class settings page in ML Studio, click the settings (gear) icon and select the Classes tab.
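Conceptually, a class schema is a named list of fields, each with a name, a type (text, table, or list), and an optional description. The Python sketch below models that structure with dataclasses to make it concrete. The class and attribute names are hypothetical and don’t correspond to an ML Studio API or file format, and the rule that field names must be unique within a class is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class FieldType(Enum):
    TEXT = "text"
    TABLE = "table"  # table extraction requires ENABLE_ML_STUDIO_TABLE=true
    LIST = "list"    # public preview


@dataclass
class SchemaField:
    name: str
    field_type: FieldType
    description: Optional[str] = None


@dataclass
class ClassSchema:
    class_name: str
    fields: List[SchemaField] = field(default_factory=list)

    def add_field(self, name: str, field_type: FieldType,
                  description: Optional[str] = None) -> None:
        # Assumption for this sketch: field names are unique within a class.
        if any(f.name == name for f in self.fields):
            raise ValueError(f"field {name!r} already exists in {self.class_name}")
        self.fields.append(SchemaField(name, field_type, description))


paystub = ClassSchema("paystub")
paystub.add_field("name", FieldType.TEXT, "Employee name")
paystub.add_field("date", FieldType.TEXT, "Pay date")
paystub.add_field("earnings", FieldType.TABLE)
print([f.name for f in paystub.fields])
```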

Annotate documents

To begin creating models, you must first annotate documents with the fields you want to extract. Annotating the documents teaches the model what data in the document corresponds to each of the fields you’ve defined in the class schema.

For details about annotation in ML Studio, see the annotation guide.

  1. In ML Studio, select a document from the document list to display the document in the center panel.

  2. Select a field from the class panel.

  3. Highlight the area of the document that contains the information for that field. You can click individual words or numbers, or drag to draw a box around the information.

  4. Continue selecting fields and highlighting the information until you have annotated the document with each of the fields in the class.

  5. Click Mark as annotated on the annotated document.

  6. Repeat these steps for each document in the annotation set.

  7. Return to Solution Builder to create classification and extraction models.

Creating and training models

You likely want to include both classification and extraction models in your solution. Classification models sort documents into their correct class, while extraction models identify and extract the data you’ve specified. You must repeat the tasks in this section for each type of model you need in your workflow.

If you’re processing a common document type, you might be able to skip or shorten model training by starting with a Marketplace model. Otherwise, you can train a base model, a general-purpose deep learning model that isn’t yet familiar with your specific document types.

To train a deep learning model in Instabase, you provide example data that teaches the model how to classify documents and extract data. For document understanding solutions like those you create in Solution Builder, that example data is the annotation set: a document set that you’ve already sorted into classes, each with multiple examples that show the model what data you want and where in the documents to find it.

Create an ML Studio project

In Solution Builder, create an ML Studio project, essentially a container for your model, by selecting the model type and annotation set to use for training. You must complete this step regardless of whether you plan to use a Marketplace model or a base model.

  1. In the Solution Builder Annotation sets tab, select the annotation sets that you want to use to train the model, and click Create model.

  2. Select a model type:

    • Classification: Classifies documents. For example, you might classify documents by provider, such as ADP and Gusto paystubs, or by the kind of document, such as paystubs and bank statements.

    • Extraction: Extracts text and numbers from a document.

  3. Click Next.

  4. Enter a name for the model. For extraction models, Solution Builder suggests each class name as the name of the corresponding extraction model, but you can change these names. For classification models, which typically contain multiple classes, provide a name and, optionally, a description for the model.

  5. Click Create.

After the ML Studio project is created, it’s displayed in the Models tab. Click Open to begin training the model.

Import and train a model

After creating an ML Studio project based on an annotation set, train the model to learn how to classify documents or extract data from them.

  1. In the Solution Builder Models tab, select the model you want to train and click Open.

  2. Optional. If a Marketplace model matches your document processing requirements, click Import to import it from the Marketplace.

  3. In the Trained models section, click Train.

  4. Adjust model training options and hyperparameters as needed and click Train.

    • To select a different base model than the default, or to use an imported Marketplace model, click the edit icon in the Select a model section.

    • The default training settings are a good starting point. For more details, see Model training options.

    The training job might take some time, depending on how many documents are in the annotation sets you’re training on. When the job completes, you can view metrics about the trained model.

When you have finished training models, return to Solution Builder to create a flow that automates classifying and extracting data from your documents.

Creating a basic flow

Flows automate document processing, classifying documents and extracting data from them. Flows can also refine and validate data, perform custom functions, and more. Solution Builder can automatically generate a basic flow based on what you’ve already done in your project. Later, you can add refiners, validations, user-defined functions (UDFs), and other logic to the generated flow.

  1. In the Solution Builder Flows tab, click Generate Flow or Build from scratch.

    A generated flow gives you a simple flow that serves as a good template for developing a more complex flow, but you can choose to build a flow from scratch if you prefer.

  2. Select the models you want to include in the generated flow. Usually, you want one classification model as well as one extraction model for each class in the classification model.

  3. Your new flow opens in the Flow editor, on the Visual Editor tab. In this tab, you can add or change steps in the flow. When adding steps to the flow, you can select any modules from your project using the module picker.

  4. Add sample documents to test the flow, and specify optional settings.

    1. Select the Flow Settings tab, then click Select Files.

    2. Select the files you want to process in the flow and click Select.

      You can use the same files that you used for annotation and model training.

    3. Optional. Add any tags you’d like to use to categorize the flow. Tags can help you organize flows in more complex production environments.

    4. Optional. Add the pipeline that you want to assign the flow to. Only users in this pipeline can review your flow. For more details about Flow pipelines, see Organizing and managing reviews.

  5. Click Run Flow.

It takes a short time for Flow to complete the job. When the job has finished, you can open the results in a Flow review to assess how your solution performed. You might decide to continue testing and developing your project by adding refiners and validations to the flow, changing the digitization profile, or training new models.
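The shape of a generated flow, one classification model followed by one extraction model per class, amounts to a routing step: classify each document, then send it to the extraction model for its class. In this Python sketch, the classifier and extractors are toy stand-in functions, and `run_flow` is a hypothetical name; none of them are Instabase APIs, and a real flow calls your trained models instead.

```python
from typing import Callable, Dict

Classifier = Callable[[str], str]
Extractor = Callable[[str], Dict[str, str]]


def run_flow(text: str, classify: Classifier,
             extractors: Dict[str, Extractor]) -> Dict[str, object]:
    """Classify a document, then route it to the extractor for its class."""
    doc_class = classify(text)
    extractor = extractors.get(doc_class)
    if extractor is None:
        # No extraction model exists for this class, so no fields are extracted.
        return {"class": doc_class, "fields": {}}
    return {"class": doc_class, "fields": extractor(text)}


# Toy stand-ins for trained classification and extraction models.
def toy_classifier(text: str) -> str:
    return "paystub" if "Net Pay" in text else "bank_statement"


extractors: Dict[str, Extractor] = {
    "paystub": lambda t: {"net_pay": t.split("Net Pay:")[1].strip()},
    "bank_statement": lambda t: {"first_line": t.splitlines()[0]},
}

print(run_flow("Net Pay: 1,234.56", toy_classifier, extractors))
```

The sketch shows why a class without a matching extraction model produces no fields, which is one reason to include an extraction model for every class in the classification model.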

Developing refiners

Refiners take model output and modify it to make it more accurate and usable. It’s a good idea to add a refiner after your model in a flow, because models rarely extract data in exactly the form you need.

The easiest way to develop a new refiner is to create the refiner from model output, so that the refiner already includes all the fields from your model. Alternatively, you can import a refiner that already exists in another project or directory, and edit it as needed.

Create a refiner from a model

You must first run the model so that you can see how its output needs to be refined. For example, if the model extracts a date field in different formats, such as 12/30/2001 and 30/12/2001, you might create a refiner that transforms the dates into one consistent format.
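The kind of transformation such a refiner performs can be sketched in Python as follows. This is an illustration, not one of the Refiner app’s built-in functions: the name `normalize_date`, the YYYY-MM-DD target format, and the month-first rule for ambiguous dates are assumptions. A value like 01/02/2001 is valid in both formats, so whichever format is tried first wins; choose the order based on what your documents actually use.

```python
from datetime import datetime
from typing import Optional

# Formats to try, in order. Ambiguous dates such as 01/02/2001 match the
# first format (month-first here), so the order encodes an assumption.
DATE_FORMATS = ("%m/%d/%Y", "%d/%m/%Y")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no format matches."""
    value = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue  # try the next format
    return None


for raw in ["12/30/2001", "30/12/2001", "not a date"]:
    print(raw, "->", normalize_date(raw))
```

Returning None for unparseable values, rather than guessing, leaves them easy to catch with a later validation.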

  1. In the Solution Builder Models tab, find the model for which you want to create a refiner.

  2. Click Run with and select the output on which to run the model. Which output you choose depends on the results you want to refine.

  3. In the dialog box, choose the specific documents or output set on which to run the model.

  4. Click Run. The model runs on the documents or output set you’ve chosen and creates an output set.

When the model has finished running, you can create a refiner for it.

  1. In the Solution Builder Models tab, locate the card for the model for which you want to create a refiner, and click Create Refiner.

  2. In the dialog, select the output set you want to refine and click Next. The output set you choose now is used only for developing the refiner; after you add the refiner to a flow, the refiner uses output from the flow.

  3. Enter a name for the refiner.

  4. Optional. Select Create and connect new Script set or select an existing Script set to connect to the refiner. Connecting a script set allows you to use custom functions within your refiner.

  5. Click Create.

After you create the refiner, you can view and open it from the Refiners tab in Solution Builder. When you open the refiner, it opens in the Refiner app in a new tab where you can develop your refiner.

Import a refiner

If you already have a refiner you’d like to work with, you can import a refiner from a project or from the file system.

  1. Expand the submenu under Create new and select where you want to import a refiner from.

  2. Select the refiner you want to import and click Next.

  3. Enter a new name for the refiner.

  4. Choose development data to use in your refiner by clicking Select, then choosing the data you want to add.

  5. After you’ve chosen the data to add, click Select.

  6. Click Import.

After you import the refiner, you can view and open it from the Refiners tab in Solution Builder. When you open the refiner, it opens in the Refiner app in a new tab where you can develop your refiner.

Viewing and developing refiners

To view your refiner, click the Refiners tab in Solution Builder, and click Open on the refiner you want to view.

In the refiner view, each row represents one of the processed records, and each column contains the data for one of the record’s fields.

To develop your refiner, you’ll locate the values that the model isn’t extracting correctly and add logic to correct those values. For example, your results might include numbers from which you want to remove hyphens or spaces. You can often use built-in functions to correct your data, or you can write your own user-defined functions (UDFs).
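For example, logic that removes hyphens and spaces from numbers can be as simple as the Python function below. The name `strip_separators` is hypothetical, and the sketch doesn’t show how UDFs are registered or called in Refiner; see the Refiner documentation for that.

```python
import re

# Matches a hyphen or any whitespace character.
_SEPARATORS = re.compile(r"[-\s]")


def strip_separators(value: str) -> str:
    """Remove hyphens and whitespace, for example from account or ID numbers."""
    return _SEPARATORS.sub("", value)


print(strip_separators("123-45 6789"))
```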

For details about how to develop refiners, see the Refiner documentation.

Developing validations

Validations check whether model or refiner output matches the rules you specify.

For example, if a field contains letters but the validation rule says it should contain numbers, that field fails type validation. Output that fails validation is flagged for review, so that a human reviewer can check the output and correct it if necessary. Validation is an optional step in a flow, but it’s a good idea to include it.
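The type check in this example reduces to a simple rule: a field passes only if its value consists entirely of digits, and any failing field flags the record for review. The Python sketch below expresses that rule. The names `validate_numeric`, `needs_review`, and `ValidationResult` are hypothetical; in Solution Builder you configure rules and conditions in the Validations app rather than writing them this way.

```python
import re
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ValidationResult:
    field: str
    value: str
    passed: bool


def validate_numeric(record: Dict[str, str],
                     numeric_fields: List[str]) -> List[ValidationResult]:
    """Check that each listed field contains only the digits 0-9."""
    results = []
    for name in numeric_fields:
        value = record.get(name, "")  # a missing field fails the check
        passed = re.fullmatch(r"[0-9]+", value) is not None
        results.append(ValidationResult(name, value, passed))
    return results


def needs_review(results: List[ValidationResult]) -> bool:
    # Any failed rule flags the record for human review.
    return any(not r.passed for r in results)


record = {"account_number": "12A45", "zip_code": "94105"}
results = validate_numeric(record, ["account_number", "zip_code"])
print([(r.field, r.passed) for r in results], needs_review(results))
```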

Create a validation from the Refiners tab

To validate refiner output, you can create a validation directly from the Refiners tab in Solution Builder. You must have already run the refiner on the input data.

  1. On the refiner you want to use for your validation, click Create validations.

  2. Select the output set on which you want to base your validation.

  3. Click Next.

  4. Enter a name for the validation.

  5. Optional. Choose a ground truth set to define correct values for the validation. For information on creating and using ground truth sets, see the accuracy metrics documentation.

  6. Click Create.

After the validation has been created, open it by clicking View here or by navigating to the Validations tab in Solution Builder and opening the validation set you want to configure.

Next, you’ll add rules to your validation, and conditions to those rules, as needed. For details about how to add rules and conditions to your validation, see the Validations documentation.

When you’ve finished configuring the validation, you can add it to a new or existing flow.

Create a validation from the Validations tab

To validate classification, extraction, or refiner output, create a validation from the Validations tab in Solution Builder.

  1. In the Solution Builder Validations tab, click Create new.

  2. Enter a name for the validation.

  3. In Select development data to use in your Validations, click Select.

  4. Choose the type of output you want to use in developing the validation: refiner, extraction, or classification output.

  5. Select the output set to use in your validation.

  6. Click Select.

  7. Optional. Choose a ground truth set to define correct values for the validation. For information on creating and using ground truth sets, see the accuracy metrics documentation.

  8. Click Create.

After the validation has been created, open it by clicking View here or by navigating to the Validations tab in Solution Builder and opening the validation set you want to configure.

Next, you’ll add rules to your validation, and conditions to those rules, as needed. For details about how to add rules and conditions to your validation, see the Validations documentation.

When you’ve finished configuring the validation, you can add it to a new or existing flow.

Import a validation

If you already have a validation configuration you’d like to work with, you can import a validation from a project or from the file system.

  1. Expand the submenu under Create new and select where you want to import a validation from.

  2. Select the validation you want to import and click Next.

  3. Enter a new name for the validation.

  4. Choose development data to use in your validation by clicking Select, then choosing the data you want to add.

  5. After you’ve chosen the data to add, click Select.

  6. Click Import.

After the validation has been imported, open it by clicking View here or by navigating to the Validations tab in Solution Builder and opening the validation set you want to configure.

Next, you’ll add rules to your validation, and conditions to those rules, as needed. For details about how to add rules and conditions to your validation, see the Validations documentation.

When you’ve finished configuring the validation, you can add it to a new or existing flow.

Compiling a solution

When your solution is ready, you can compile it to make it ready for deployment or publishing. Compiling generates a solution package containing a portable, compressed version of the flow with dependencies and proprietary information removed.

  1. In the Flow editor, select File > Compile Flow.

  2. Give the solution a version number and click Compile.

    The compiled flow is available on the Compiled output tab for the flow in the Solution Builder Flows tab, or in the file system at {PROJECT-NAME}/latest/flows/{PROJECT-NAME} flow/builds/{VERSION}.ibflowbin.
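If you locate compiled flows from a script, for example to collect build artifacts, you can fill in the path template above. This Python sketch assumes the project name appears verbatim (including any spaces) in both places and that the version is the string you entered when compiling. The function name `compiled_flow_path` is hypothetical, and the sketch builds only the documented relative path; it doesn’t resolve the drive or space that contains the project.

```python
from pathlib import PurePosixPath


def compiled_flow_path(project_name: str, version: str) -> PurePosixPath:
    """Fill in {PROJECT-NAME}/latest/flows/{PROJECT-NAME} flow/builds/{VERSION}.ibflowbin."""
    return PurePosixPath(
        project_name, "latest", "flows", f"{project_name} flow", "builds",
        f"{version}.ibflowbin",
    )


print(compiled_flow_path("Paystubs", "1.0.0"))
```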

Publishing a solution

After your solution is compiled, you can publish the solution to the Marketplace for use throughout your Instabase instance.

  1. In the Solution Builder Flows tab, on the Compiled output tab for the flow you want to publish, click Publish to Marketplace.

  2. Specify details about your solution and click Publish.

    Publishing your solution as a solution accelerator makes your solution searchable and usable by others in your Instabase instance.