Flow V2 guide

Instabase’s term for an automation pipeline is a Flow.

A finished Flow is a reusable pipeline that performs the same repetitive process on similar document types. A Flow can also serve as an API within a real-time processing system.

In this guide, we will create a Flow that processes paystubs.

Info

This guide covers legacy Flow V2. For a guide to working with the latest version of Flow, see Developing flows.

Prerequisites

For this exercise, we’ll be working with ADP and Gusto paystubs. You can download them here:

1. Instabase workspaces

An Instabase workspace is a directory where you store your files and data. Workspaces were formerly called repositories, and if you’re familiar with Git, they serve the same purpose as a Git repository.

Creating a workspace

Activity

  1. Log in to Instabase.

  2. In the left panel, select the Instabase icon, hover over Workspaces, and then select New Workspace in the top-right corner of the window.

  3. Keep the default owner, which is your user, and add a workspace name (for example, practice-flow).

  4. Add a short description (required).

  5. Leave Private selected, then select Create Workspace. Your workspace is created with two default folders: files and notebooks.

  6. This guide doesn’t need that structure. Delete both files and notebooks by right-clicking each folder and selecting Delete.

2. Introducing Instabase project templates

Barring certain customizations and edge cases, Instabase requires you to structure your files and data in a specific format. We’ll refer to this structure as a “project template” for the rest of this guide.

You might have a project template for obscuring sensitive data on tax forms, or for extracting personal details from driver’s licenses.

When you are in an Instabase workspace, you can create a new project from a set of the most commonly used project templates. A few examples include:

  • OCR: for transforming images into text

  • Redactor: for obscuring sensitive information in images

  • Refiner: for extracting relevant information from text documents

Generating a project template

When you create a new project template, all of the folders and files you need for a project of that type are automatically created.

The project template that you’ll create in this guide will be used to extract information from ADP and Gusto paystubs.

A person can recognize an ADP paystub at a glance. If we gather a few sample paystubs together, we can teach Instabase to recognize them just as easily. The same approach works for many other document types, as long as the samples are stored in the directory structure that the project template expects.

Activity

  1. At the top of your workspace, select the New dropdown.

  2. Hover over New project, then select General Extraction Project.

  3. Name your project “ADP Paystub Extractor” and select Create Project. The parent folder that contains your new project is Instabase Drive.

  4. Unzip the adp-paystubs directory that you previously downloaded.

  5. In the drop-down menu, select Choose folder from computer, select Select From Computer, navigate to the adp-paystubs folder that you unzipped earlier, select Upload, and select Upload again.

  6. The upload process can take a minute or two. When it’s done, select View Project to open the project creation results.

3. .ibflow (the long way)

Whether or not you start from a project template, the steps of an Instabase Flow are defined in a specific file type: .ibflow.

Nearly every Flow contains the following structure:

  1. Process Files is a universal document intake step. It is more than just Optical Character Recognition: it also converts files and homogenizes their output type.

  2. Map Records splits the incoming documents into the record boundaries you define. For example, if two or more paystubs are stored in the same input file, you can configure this step to separate them into individual records. This step isn’t needed for every dataset, but it’s good practice to include it.

  3. The repetitive process that you’re automating. This might be redaction, refinement, or any of the other App functionality found on Instabase.

  4. Merge Records, which combines all of your processed files into a single tidy report and output.
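The four stages can be pictured as a chain of functions, each consuming the previous stage’s output. The Python below is only a conceptual model under stated assumptions, not Instabase code: the Page type, the stage functions, and the rule that every record spans a fixed number of pages (pages_per_record) are hypothetical stand-ins chosen to mirror the description above.

```python
from dataclasses import dataclass


@dataclass
class Page:
    source_file: str
    text: str


# Hypothetical stand-in for Process Files: turn every input into one common page type.
def process_files(inputs: dict[str, list[str]]) -> list[Page]:
    return [Page(name, text) for name, pages in inputs.items() for text in pages]


# Hypothetical stand-in for Map Records: group each file's pages into records,
# assuming every record spans a fixed number of pages.
def map_records(pages: list[Page], pages_per_record: int = 1) -> list[list[Page]]:
    by_file: dict[str, list[Page]] = {}
    for page in pages:
        by_file.setdefault(page.source_file, []).append(page)
    records: list[list[Page]] = []
    for file_pages in by_file.values():
        for start in range(0, len(file_pages), pages_per_record):
            records.append(file_pages[start:start + pages_per_record])
    return records


# Hypothetical stand-in for the automated step: a trivial "extraction" per record.
def apply_step(record: list[Page]) -> dict[str, str]:
    return {"source_file": record[0].source_file,
            "first_line": record[0].text.splitlines()[0]}


# Hypothetical stand-in for Merge Records: collect every per-record result into one list.
def merge_records(results: list[dict[str, str]]) -> list[dict[str, str]]:
    return list(results)


# One input file holds two paystubs, another holds one.
inputs = {"two_stubs.pdf": ["Stub A\n...", "Stub B\n..."], "one_stub.pdf": ["Stub C\n..."]}
report = merge_records([apply_step(r) for r in map_records(process_files(inputs))])
print(len(report))  # prints 3: the two-paystub file was split into two records
```

Splitting inside map_records, rather than in process_files, reflects the division of labor described above: intake only normalizes files, while record boundaries are a separate, optional decision.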

Flow has a specific convention for naming the output of each step: each output folder name combines the step number (s1, s2, and so on) with the name of the step. It is important to become familiar with Flow output directories because the output of s1_process_files and s2_map_records is often used as the input to other Instabase Apps, such as Refiner.
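If you ever handle these folders outside the Instabase UI, for example a local copy of the out folder, the naming convention makes them easy to process in order. The Python sketch below assumes the pattern s&lt;number&gt;_&lt;step_name&gt; inferred from the folder names in this guide; parse_step_dir and order_step_dirs are hypothetical helpers, not part of any Instabase API.

```python
import re
from typing import Optional

# Assumed pattern, inferred from folder names such as s1_process_files:
# "s", the step number, an underscore, then the step name.
STEP_DIR = re.compile(r"^s(\d+)_(\w+)$")


def parse_step_dir(folder_name: str) -> Optional[tuple[int, str]]:
    """Split 's2_map_records' into (2, 'map_records'); return None for other names."""
    match = STEP_DIR.match(folder_name)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def order_step_dirs(folder_names: list[str]) -> list[str]:
    """Keep only step folders, ordered by numeric step number (so s10 sorts after s2)."""
    parsed = [(parse_step_dir(name), name) for name in folder_names]
    steps = [(info[0], name) for info, name in parsed if info is not None]
    return [name for _, name in sorted(steps)]


print(order_step_dirs(["s3_apply_refiner", "s1_process_files", "s4_merge_files", "s2_map_records"]))
# ['s1_process_files', 's2_map_records', 's3_apply_refiner', 's4_merge_files']
```

Sorting on the parsed integer rather than the raw string matters once a Flow has ten or more steps, because a plain string sort would place s10 before s2.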

Running a Flow

We haven’t added any processing of our own yet, but running the Flow now lets us open the existing .ibflow file and observe the results of a processed Flow.

Activity

  1. Return to your root folder, Instabase Drive, expand the ADP Paystub Extractor folder, and select workflow.ibflow to open the Flow. Notice the structure mentioned in the section above.

  2. In the upper-right corner of the page, select Tools > Run.

  3. We’ll be running this Flow on our input data, so after selecting Choose Folder, select the folder called input, then select Open.

  4. Select Run. A blue modal will appear as the Flow runs, and a green modal will appear when the Flow completes. It’s okay if you receive an error message here, as we’ve skipped the third step.

  5. To view the output, do one of these actions:

    • Select View output in the green modal.

    • Return to the ADP Paystub Extractor project using the breadcrumbs at the top of the screen, and then expand or select the out folder.

  6. Notice that each step of the Flow is now represented as a folder within your project’s file structure.

  7. Return to the ADP Paystub Extractor directory and open the viewme.ibrecipebook file.

4. Viewing a project template

Each type of “project” that you create on Instabase is assembled from different key ingredients, though some projects might share a step like “View OCR results”. Each project template is organized to keep your focus on the important pieces (your data, your functions, and your output) rather than on Instabase’s rules and file structures under the hood.

You’re not performing any actions on your data in this activity. It’s a brief tour of how the project is organized.

Let’s view each of the three default steps of the Refiner project template.

Activity

  1. From the viewme.ibrecipebook file, select View under the “View OCR Results” section. This takes you to the s2_map_records directory that we saw in the previous activity.

    • Each input file now has a corresponding .ibdoc file, which pairs the image file with its OCR data. The .ibdoc format is the standard Instabase file format produced by the Process Files step.

    • Select any of these files to launch the Review OCR app that provides a window into OCR results, as well as any data extraction that has occurred.

    • Toggle between the image and the text that Instabase extracted from the image by selecting the image icon and the A icon. Notice how the extraction preserved the text’s spacing and structure. How did it handle the tricky diagonal “THIS IS NOT A CHECK” text?

  2. Return to the ADP Paystub Extractor directory, open the viewme.ibrecipebook file, and in the Edit/Run Flow section, select Edit to edit the .ibflow file we viewed in the previous exercise. In the future, you can use this button to get to your Flow instead of tracking down where it is located within your Instabase file structure.

  3. Return to the ADP Paystub Extractor directory, open the viewme.ibrecipebook file, and in the Edit Refiner 5 program section, select Edit to open a spreadsheet that we’ll edit in the next activity.

5. Automating a Refiner process

In Instabase, refinement is synonymous with extraction. If you have many files and want to distill key values such as “Name” or “Net Pay” from them, you accomplish this with a Refiner Flow.

Refining Paystubs

Refiner has many features, as well as a plugin system, that we’ll delve into later. For now, we will explore how to create a field. A field is an entity we might want to extract, like a name or a pay_date.

Activity

  1. On the page that you were on when you finished the last activity, select + New Field to add a new text field.

  2. From the bottom pane, you can edit, rename, and test fields. Try changing the value of Field name from field_1 to greeting.

  3. In the text field on the left side of the bottom pane, type echo('hello'). Notice that the right side of the bottom pane updates with relevant function documentation.

    echo is a standard function that simply writes out the value it is given. 'hello' is the value (argument) that we’re asking echo to write. It must be in single quotes; double quotes cause an error.

  4. Select Run Field to populate this field for all the documents.

  5. Save this Refiner program by selecting Save in the top-right corner of the page.

6. Updating an existing Flow

As you add new files, edit existing functions, and cultivate outputs, you’ll sometimes find it valuable to modify and re-run your Flows.

A completed Flow

Activity

  1. Return to your viewme.ibrecipebook file, and select Edit under Edit/Run Flow.

  2. At the top right of your workflow.ibflow view, select Tools > Run.

  3. For “Input Folder”, select Choose folder, then input, then Open. Finally, select Run.

  4. Return to Instabase Drive, then expand the out folder of your project. You’ll now see a new folder, s3_apply_refiner.

  5. Navigate to the s3_apply_refiner folder and open one of the results to review your extracted output. You’ll see a value for greeting, the field you created in the Refiner program.

  6. To view the fields for all documents combined, open the out.ibocr file in the s4_merge_files folder. It contains all extracted values for the set of documents.

Conclusion

That’s it! You’ve successfully built an end-to-end Flow that allows you to process ADP Paystubs.

Next steps

If you’re feeling advanced, try to build another Flow without guidance! We included some Gusto paystubs in the Prerequisites section for you to practice on.