Flow Binary API

The Flow Binary API allows you to compile a flow, generate and run a flow binary, and list, pause, resume, cancel, or retry flow jobs.

In this document, URL_BASE refers to the root URL of your Instabase instance, such as https://www.instabase.com.

Compile a flow binary

Method Syntax
POST URL_BASE/api/v1/flow_binary/compile

Description

Compile a flow to generate a flow binary.

Request parameters

Parameters are required unless marked as optional.

Name Type Description Values
flow_project_root string The longest common path to your ibflow files and all flows’ root directories. This parameter is required even if the path is already in the API URL. For example, if the folder that contains the flows is folder1/folder2/folder3/workflows/, and the root directories of all the flows are under folder1/folder2/folder3/samples/, then flow_project_root is folder1/folder2/folder3.
binary_type string The type of flow binary file. Single Flow, Metaflow, Flows
predefined_binary_path string Optional. Path to save the compiled binary. If not specified, the default is flow_project_root/build/bin/.
settings dict Additional settings that depend on the binary_type. All paths in settings are relative to flow_project_root.
settings/flow_file string [Single Flow] Path to the flow to compile.
settings/flows_dir string [Metaflow] Directory that contains the flows to include in the metaflow.
settings/metaflow_file string [Metaflow] Path to the .ibmetaflow file that is used to configure the metaflow.
settings/classifier_file string [Metaflow] Path to the .ibclassifier file that is used for classification.
settings/flows_dir string [Flows] Directory that contains the flows to include in the binary.
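Which settings keys a compile request needs depends on binary_type. The following minimal sketch uses a hypothetical helper, build_compile_payload, that checks the keys from the table above before building the JSON body. It assumes every key the table lists for a binary_type is required, since none of them are marked optional, and it does not call the API.

```python
import json

# Settings keys per binary_type, taken from the parameter table.
# Assumption: every listed key is required for its binary_type.
REQUIRED_SETTINGS = {
    "Single Flow": {"flow_file"},
    "Metaflow": {"flows_dir", "metaflow_file", "classifier_file"},
    "Flows": {"flows_dir"},
}


def build_compile_payload(binary_type, flow_project_root, settings,
                          predefined_binary_path=None):
    """Build the JSON body for POST /api/v1/flow_binary/compile (hypothetical helper)."""
    if binary_type not in REQUIRED_SETTINGS:
        raise ValueError(f"unknown binary_type: {binary_type!r}")
    missing = REQUIRED_SETTINGS[binary_type] - set(settings)
    if missing:
        raise ValueError(f"missing settings for {binary_type}: {sorted(missing)}")
    payload = {
        "binary_type": binary_type,
        "flow_project_root": flow_project_root,
        # Paths inside settings are relative to flow_project_root.
        "settings": dict(settings),
    }
    # Only send predefined_binary_path when set; the server default is
    # flow_project_root/build/bin/.
    if predefined_binary_path is not None:
        payload["predefined_binary_path"] = predefined_binary_path
    return json.dumps(payload)
```

For example, build_compile_payload("Single Flow", "user1/HITL/fs/Instabase Drive/files/Checkpoint Step/", {"flow_file": "Workflows/checkpoint_step.ibflow"}) produces the same body as the curl request below.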

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR
job_id The ID of the compile job.

Examples

Request

curl --location --request POST "${URL_BASE}/api/v1/flow_binary/compile" \
--header "Authorization: Bearer ${ACCESS_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw '{"binary_type": "Single Flow",
"flow_project_root": "user1/HITL/fs/Instabase Drive/files/Checkpoint Step/",
"settings": {"flow_file": "Workflows/checkpoint_step.ibflow"}}'

Response

{
    "status": "OK",
    "job_id": "6243b9a5-d1b7-4613-9acd-2e9571028853"
}
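For Python clients, the same compile request can be assembled with the standard library instead of curl. In this sketch, URL_BASE and ACCESS_TOKEN are placeholders, and make_compile_request and compile_job_id are hypothetical helper names. Building the Request object does not contact the server.

```python
import json
import urllib.request

URL_BASE = "https://www.instabase.com"  # placeholder: root URL of your instance
ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"      # placeholder: your API access token


def make_compile_request(body):
    """Build (but do not send) a POST request to the compile endpoint."""
    return urllib.request.Request(
        URL_BASE + "/api/v1/flow_binary/compile",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": "Bearer " + ACCESS_TOKEN,
            "Content-Type": "application/json",
        },
        method="POST",
    )


def compile_job_id(response_body):
    """Return job_id from a compile response, raising if status is not OK."""
    resp = json.loads(response_body)
    if resp.get("status") != "OK":
        raise RuntimeError(f"compile request failed: {resp}")
    return resp["job_id"]


req = make_compile_request({
    "binary_type": "Single Flow",
    "flow_project_root": "user1/HITL/fs/Instabase Drive/files/Checkpoint Step/",
    "settings": {"flow_file": "Workflows/checkpoint_step.ibflow"},
})
```

To send the request, pass req to urllib.request.urlopen and hand the response body to compile_job_id.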

Run a flow binary

Method Syntax
POST URL_BASE/api/v1/flow/run_binary_async

Description

Run a flow binary.

Request parameters

Parameters are required unless marked as optional.

Name Type Description Values
binary_path string The path to your flow binary (.ibflowbin) file.
input_dir string The folder containing the data to run the Flow on.
output_dir string Optional. The folder to write the output files to. Defaults to input_dir/../out/.
job_id string Optional. The job ID for this job. If omitted, a job ID is generated automatically. The job ID must be no more than 50 characters long.
settings dict Optional. Additional settings for the flow run.
settings/output_has_run_id boolean Optional. Whether output should be written into a directory with a unique, timestamped run_id. Defaults to false.
settings/delete_out_dir boolean Optional. Whether to delete any existing content in the output folder before running the binary. Has no effect if output_has_run_id is true. Defaults to false.
settings/log_to_timeline boolean Optional. Enable developer logs (logs from Refiner and UDFs). Defaults to false.
settings/notification_emails list Optional. List of email addresses to notify when the run is completed.
settings/tags list Optional. List of string tags to attach to this Flow run. You can later search for Flow runs by these tags in the Flow Dashboard.
settings/step_timeout integer Optional. Timeout in seconds for each Flow step. When set to 0, the platform picks an appropriate timeout for each step; when set to -1, step timeouts are disabled. Defaults to 0.
settings/pipeline_ids list Optional. Associates the current Flow run with a pipeline. Takes a list of pipeline IDs retrieved from the Flow Pipeline API.
settings/save_binary_to_output boolean Optional. Save the flow binary in the output folder. Defaults to true.
settings/webhook_config dict Optional. Configures a webhook URL that is notified when the Flow run completes. See also the Webhook Configuration section below.
settings/webhook_config/url string Webhook URL to which a notification event will be sent when the run is completed.
settings/webhook_config/headers string Optional. Headers for the notification event.
settings/runtime_config dict Optional. A JSON object used to pass arguments to the flow. Can also be used to add a log prefix and change the output schema.
settings/priority integer Optional. Priority level for this Flow run. There are 10 priority levels, from 0 (lowest) to 9 (highest). Tasks of a Flow run with higher priority run before tasks of a Flow run with lower priority. Defaults to 5 for Flow runs started from the UI and 0 for Flow runs started through the API.
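A few of these parameters have documented constraints: job_id is at most 50 characters, priority ranges from 0 to 9, and step_timeout accepts 0 (platform default), -1 (disabled), or a timeout in seconds. The following sketch uses a hypothetical helper, build_run_request, to check those constraints before building the request body. It assumes that any step_timeout below -1 is invalid, which the table implies but does not state.

```python
import json


def build_run_request(binary_path, input_dir, output_dir=None, job_id=None,
                      **settings):
    """Build a JSON body for POST /api/v1/flow/run_binary_async (hypothetical helper).

    Extra keyword arguments become entries under "settings".
    """
    if job_id is not None and len(job_id) > 50:
        raise ValueError("job_id must be no more than 50 characters long")
    priority = settings.get("priority")
    if priority is not None and not 0 <= priority <= 9:
        raise ValueError("priority must be between 0 (lowest) and 9 (highest)")
    step_timeout = settings.get("step_timeout")
    # Assumption: values below -1 are rejected by the platform.
    if step_timeout is not None and step_timeout < -1:
        raise ValueError("step_timeout must be -1, 0, or a number of seconds")

    body = {"binary_path": binary_path, "input_dir": input_dir}
    if output_dir is not None:
        body["output_dir"] = output_dir  # otherwise the server uses input_dir/../out/
    if job_id is not None:
        body["job_id"] = job_id  # otherwise the server generates one
    if settings:
        body["settings"] = settings
    return json.dumps(body)
```

Omitting optional fields, rather than sending nulls, leaves the server defaults described in the table in effect.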

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR
data/job_id A unique identifier for the job.
data/output_folder The full path to the root output folder.

Examples

Request

import json

import requests

url_base = 'https://www.instabase.com'  # root URL of your Instabase instance
token = 'YOUR_ACCESS_TOKEN'  # your API access token

url = url_base + '/api/v1/flow/run_binary_async'

args = {
  'input_dir': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'binary_path': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/build/bin/my_flow.ibflowbin",
  'settings': {
    'delete_out_dir': False,
    'output_has_run_id': True,
    'notification_emails': ["me@domain.com"],
  },
}
json_data = json.dumps(args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# The call returns as soon as the job is queued.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)

Response

The HTTP call returns immediately while the binary execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of your execution.

{
  "status": "OK",
  "data": {
    "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
    "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/out"
  }
}
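Because the call returns before the flow finishes, a client needs to keep the job_id to check the job status later. A minimal sketch of a hypothetical parse_run_response helper that extracts both fields from the response and treats any status other than OK as a failure:

```python
import json


def parse_run_response(body):
    """Return (job_id, output_folder) from a run_binary_async response."""
    resp = json.loads(body)
    if resp.get("status") != "OK":
        raise RuntimeError(f"run request failed: {resp}")
    data = resp["data"]
    return data["job_id"], data["output_folder"]


sample = """{
  "status": "OK",
  "data": {
    "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
    "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/out"
  }
}"""
job_id, output_folder = parse_run_response(sample)
print(job_id)  # 756be65b-0eaf-4192-bea5-176f0377b0f8
```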

Webhook Configuration

You can use the webhook_config setting to ensure your application is notified when a flow completes. Instabase POSTs JSON-encoded data of the format below to the webhook endpoint when the flow completes.

# body
{
  "status": <string>,
  "msg": <string>,
  "job_id": <string>,
  "binary_path": <string>,
  "input_dir": <string>,
  "output": <string>
}

The request body contains the following fields:

  • status: "OK" | "ERROR"

  • msg: (optional) Error message. Present only if status is ERROR.

  • job_id: A unique identifier for the job.

  • binary_path: The path to the flow binary file.

  • input_dir: Input directory.

  • output: The full path to the root output folder.

To acknowledge receipt of the event, your endpoint must return a 2xx HTTP status code to Instabase. All response codes outside this range, including 3xx codes, indicate to Instabase that you did not receive the event.

If Instabase does not receive a 2xx HTTP status code, it retries the notification up to 7 times.
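A minimal webhook receiver sketch using Python's standard http.server module. It parses the JSON body described above and answers with a 2xx code so that Instabase does not retry. The handler names, the default port, and what happens to each event (here it is only printed) are assumptions for illustration; a production endpoint would also need TLS and some form of authentication, for example a header you set in webhook_config/headers.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def handle_flow_event(raw_body):
    """Parse a flow-completion event and return the HTTP status code to send back."""
    try:
        event = json.loads(raw_body)
    except ValueError:
        return 400  # not a valid event; Instabase treats non-2xx as not received
    if event.get("status") == "ERROR":
        print(f"job {event.get('job_id')} failed: {event.get('msg')}")
    else:
        print(f"job {event.get('job_id')} finished; output in {event.get('output')}")
    return 200  # any 2xx code acknowledges receipt


class FlowWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.send_response(handle_flow_event(self.rfile.read(length)))
        self.end_headers()


def run(port=8080):
    """Serve the webhook on a (hypothetical) port until interrupted."""
    HTTPServer(("", port), FlowWebhookHandler).serve_forever()
```

Call run() and set webhook_config/url to the address where this server is reachable. Keeping the parsing in handle_flow_event makes it testable without starting a server.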

Add a log prefix

Add a prefix to all log statements generated by this flow run.

Pass the field log_prefix in the Flow runtime config, where the value of log_prefix is the string that you want to prepend to all log statements. For example, the following runtime config prepends org=Piston to all log statements.

"runtime_config": {"log_prefix": "org=Piston "}

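In a full run_binary_async request, runtime_config sits under settings. A sketch that reuses the example paths from the run request shown earlier:

```python
import json

args = {
    "input_dir": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    "binary_path": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/build/bin/my_flow.ibflowbin",
    "settings": {
        # runtime_config is nested under settings; the example prefix ends
        # with a space so that it does not run into the log text.
        "runtime_config": {"log_prefix": "org=Piston "},
    },
}
json_data = json.dumps(args)  # POST this body to /api/v1/flow/run_binary_async
```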
Change output schema at runtime

Change the output schema at runtime for Marketplace solutions and flow binary jobs.

Pass the field modify_output_schema in the Flow runtime config, where the value of modify_output_schema is a stringified JSON object (a JSON object serialized to a string) that can contain these fields:

  • add_fields: a list of fields to add to the output. Each field is described as an object {"name": "field_name", "formula": "field_formula"}

  • remove_fields: a list of field names to remove from the output

  • rename_fields: an object where each key is the field to be renamed, and its value is the new name.

Here’s a JavaScript example of changing the output schema at runtime:

const output_config = {
  "add_fields": [{"name": "my_field_name", "formula": "echo('hello world')"}],
  "remove_fields": ["field_1", "field_2"],
  "rename_fields": {"old_name_1": "new_name_1", "old_name_2": "new_name_2"}
}

// binaryPath, inputDir, and token are assumed to be defined elsewhere.
const form = {
  "binary_path": binaryPath,
  "input_dir": inputDir,
  "settings": {
    // modify_output_schema is passed as a string, not as a nested object.
    "runtime_config": {"modify_output_schema": JSON.stringify(output_config)}
    // Other settings can be added here.
  }
}

$.ajax({
  type: 'POST',
  url: '/api/v1/flow/run_binary_async',
  headers: {
    'Content-Type': 'application/json;charset=UTF-8',
    'Authorization': 'Bearer ' + token
  },
  data: JSON.stringify(form)
})
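The same request body can be built in Python. The detail that is easy to miss is that modify_output_schema is a JSON string nested inside the JSON request body, so the schema changes are encoded twice. A sketch reusing the placeholder field names from the JavaScript example and the example paths from the run request:

```python
import json

output_config = {
    "add_fields": [{"name": "my_field_name", "formula": "echo('hello world')"}],
    "remove_fields": ["field_1", "field_2"],
    "rename_fields": {"old_name_1": "new_name_1", "old_name_2": "new_name_2"},
}

form = {
    "binary_path": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/build/bin/my_flow.ibflowbin",
    "input_dir": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    "settings": {
        # First encoding: the schema changes become a string value.
        "runtime_config": {"modify_output_schema": json.dumps(output_config)},
    },
}
# Second encoding: the whole form becomes the POST body for
# /api/v1/flow/run_binary_async.
body = json.dumps(form)
```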

Job Status

Use the Job Status API to get the execution status of a flow binary job.

Refer to the Job Status API.

Run a flow binary in sync mode

Method Syntax
POST URL_BASE/api/v1/flow/run_binary_sync

Description

Use the Sync API to run a Flow synchronously and return its output in the specified format. The Sync API takes one or more files as input, runs the flow, blocks until the flow finishes, and returns the extracted results. Flows run via the Sync API run entirely in memory and are therefore faster than flows run with the /run_binary_async API. However, the Sync API doesn't support checkpoint resume or large batches (more than 5 files), and has a maximum processing time of 60 seconds.

Request parameters

Parameters are required unless marked as optional.

Name Type Description Values
binary_path string The path to the flow binary .ibflowbin file.
options dict Dictionary of options.
options/result_format string Optional. The output structure. json (default), csv
files array An array of dictionaries where each dictionary defines an input file.
files/file_name string The name of the input file.
files/file_content string Base64-encoded file data.
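Each input file must be Base64-encoded before it goes into the files array. The following sketch uses a hypothetical helper, build_sync_request, that builds the request body from in-memory bytes. It rejects more than 5 files, based on the batch limit stated in the description, and, as in the example request below, it omits options unless a result format is given.

```python
import base64
import json

MAX_SYNC_FILES = 5  # from the documented batch limit for the Sync API


def build_sync_request(binary_path, files, result_format=None):
    """Build the JSON body for POST /api/v1/flow/run_binary_sync.

    files is a list of (file_name, file_bytes) tuples.
    """
    if not files:
        raise ValueError("at least one input file is required")
    if len(files) > MAX_SYNC_FILES:
        raise ValueError(f"the Sync API supports at most {MAX_SYNC_FILES} files per call")
    body = {
        "binary_path": binary_path,
        "files": [
            # file_content carries the Base64 text of the raw bytes.
            {"file_name": name, "file_content": base64.b64encode(data).decode("utf-8")}
            for name, data in files
        ],
    }
    if result_format is not None:
        body["options"] = {"result_format": result_format}  # "json" or "csv"
    return json.dumps(body)
```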

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
records An array of records.
records/output_file_path Path to the IBDOC output file.
records/results An array of key-value pairs, one for each field specified in the Refiner program.
records/file_name The name of the input file.
records/file_index The index of the input file.
records/record_index The index of the output record.
binary_path The path to the Flow binary .ibflowbin file.
job_id The ID of the Flow run job.

Examples

Request

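import base64
import json

import requests

url_base = 'https://www.instabase.com'  # root URL of your Instabase instance
token = 'YOUR_ACCESS_TOKEN'  # your API access token

# Read the input file and Base64-encode its contents.
with open("test.pdf", "rb") as f:
  encoded_string = base64.b64encode(f.read()).decode('utf-8')

url = url_base + '/api/v1/flow/run_binary_sync'

api_args = {
  'binary_path': "jaydoe/my_repo/fs/InstabaseDrive/build/bin/workflow/my_flow.ibflowbin",
  'files': [{
    'file_name': "test.pdf",
    'file_content': encoded_string,
  }]
}
json_data = json.dumps(api_args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# verify=False skips TLS certificate verification. Remove it unless your
# instance uses a certificate that cannot be verified (e.g. self-signed).
r = requests.post(url, data=json_data, headers=headers, verify=False)
resp_data = json.loads(r.content)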

Response

{
  "records": [
    {
      "results": [
        {
          "key": "first_name",
          "value": "Coder"
        },
        {
          "key": "last_name",
          "value": "Maloder"
        }
      ],
      "file_name": "test.pdf",
      "file_index": 0,
      "record_index": 0
    }
  ],
  "binary_path": "jaydoe/my_repo/fs/InstabaseDrive/build/bin/workflow/my_flow.ibflowbin",
  "job_id": "5eb7df69-7506-41b3-bd3a-eb69d8b8a36d"
}
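The results for each record arrive as a list of key-value pairs. A sketch of a hypothetical records_to_dicts helper that turns the response into one dictionary of fields per record, keyed by file name and record index so that files producing several records are kept apart. If a field key repeats within a record, the last value wins.

```python
import json


def records_to_dicts(resp):
    """Map (file_name, record_index) to a {field: value} dict for each record."""
    out = {}
    for record in resp["records"]:
        fields = {item["key"]: item["value"] for item in record["results"]}
        out[(record["file_name"], record["record_index"])] = fields
    return out


sample = json.loads("""{
  "records": [{
    "results": [{"key": "first_name", "value": "Coder"},
                {"key": "last_name", "value": "Maloder"}],
    "file_name": "test.pdf", "file_index": 0, "record_index": 0
  }],
  "binary_path": "jaydoe/my_repo/fs/InstabaseDrive/build/bin/workflow/my_flow.ibflowbin",
  "job_id": "5eb7df69-7506-41b3-bd3a-eb69d8b8a36d"
}""")
print(records_to_dicts(sample))
# {('test.pdf', 0): {'first_name': 'Coder', 'last_name': 'Maloder'}}
```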