Flow Binary API

The Flow Binary API allows you to compile a flow, generate and run a flow binary, and list, pause, resume, cancel, or retry flow jobs.

In this document, URL_BASE refers to the root URL of your Instabase instance, such as https://www.instabase.com.

Compile a flow binary

Method	Syntax
POST	`URL_BASE/api/v1/flow_binary/compile`

Description

Compile a flow to generate a flow binary.

Request parameters

Parameters are required unless marked as optional.

Name	Type	Description	Values
`flow_project_root`	string	The longest common path to your ibflow files and all flows’ root directories. This is required even if the path is already in the API URL. For example, if the folder that contains the flows is `folder1/folder2/folder3/workflows/`, and the root directories of all the flows are under `folder1/folder2/folder3/samples/`, then your `flow_project_root` should be `folder1/folder2/folder3`.
`binary_type`	string	The type of flow binary file.	Single Flow, Metaflow, Flows
`predefined_binary_path`	string	Optional. Path to save the compiled binary. If not specified, the default is `flow_project_root/build/bin/`.
`settings`	dict	Additional settings that depend on the `binary_type`. All paths here are relative paths to `flow_project_root`.
`settings/flow_file`	string	[Single Flow] Path to the flow to compile.
`settings/flows_dir`	string	[Metaflow] Directory that contains the flows to include in the metaflow.
`settings/metaflow_file`	string	[Metaflow] Path to the `.ibmetaflow` file that is used to configure the metaflow.
`settings/classifier_file`	string	[Metaflow] Path to the `.ibclassifier` file that is used for classification.
`settings/flows_dir`	string	[Flows] Directory that contains the flows to include in the binary.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key	Description	Value
`status`	Status of the request.	OK, ERROR
`job_id`	The job ID of the compile job.

Examples

Request

curl --location --request POST "${URL_BASE}/api/v1/flow_binary/compile" \
--header "Authorization: Bearer ${ACCESS_TOKEN}" \
--header 'Content-Type: application/json' \
--data-raw '{"binary_type": "Single Flow",
"flow_project_root": "user1/HITL/fs/Instabase Drive/files/Checkpoint Step/",
"settings": {"flow_file": "Workflows/checkpoint_step.ibflow"}}'

Response

{
    "status": "OK",
    "job_id": "6243b9a5-d1b7-4613-9acd-2e9571028853"
}
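
The same request body can be assembled in Python. The sketch below only builds the JSON; the helper name `build_compile_payload` and its client-side key check are illustrative assumptions, not part of the API. The mapping of settings keys to binary types follows the request parameters table.

```python
import json

# Settings keys listed for each binary_type in the request parameters table.
SETTINGS_BY_BINARY_TYPE = {
    "Single Flow": {"flow_file"},
    "Metaflow": {"flows_dir", "metaflow_file", "classifier_file"},
    "Flows": {"flows_dir"},
}


def build_compile_payload(binary_type, flow_project_root, settings,
                          predefined_binary_path=None):
    """Return the JSON body for the compile endpoint (hypothetical helper)."""
    if binary_type not in SETTINGS_BY_BINARY_TYPE:
        raise ValueError("unknown binary_type: {0}".format(binary_type))
    unexpected = set(settings) - SETTINGS_BY_BINARY_TYPE[binary_type]
    if unexpected:
        raise ValueError("settings not used by {0}: {1}".format(
            binary_type, sorted(unexpected)))
    payload = {
        "binary_type": binary_type,
        "flow_project_root": flow_project_root,
        "settings": dict(settings),
    }
    # predefined_binary_path is optional, so it is sent only when given.
    if predefined_binary_path is not None:
        payload["predefined_binary_path"] = predefined_binary_path
    return json.dumps(payload)


body = build_compile_payload(
    "Single Flow",
    "user1/HITL/fs/Instabase Drive/files/Checkpoint Step/",
    {"flow_file": "Workflows/checkpoint_step.ibflow"},
)
```

The resulting `body` string can be posted to `URL_BASE/api/v1/flow_binary/compile` with the same `Authorization` and `Content-Type` headers as the curl request.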

Run a flow binary

Method	Syntax
POST	`URL_BASE/api/v1/flow/run_binary_async`

Description

Run a flow binary.

Request parameters

Parameters are required unless marked as optional.

Name	Type	Description
`binary_path`	string	The path reference to your binary file.
`input_dir`	string	The folder containing the data to run the Flow on.
`output_dir`	string	Optional. The folder to write the output files to. Defaults to `input_dir/../out/`.
`job_id`	string	Optional. The job ID for this job. If omitted, a job ID is generated automatically. The job ID must be no more than 50 characters long.
`settings`	dict	Optional. Additional settings for the flow run.
`settings/output_has_run_id`	boolean	Optional. Whether output should be written into a directory with a unique, timestamped run_id. Defaults to false.
`settings/delete_out_dir`	boolean	Optional. Whether to delete any existing content in the output folder before running the binary. Has no effect if `output_has_run_id` is true. Defaults to false.
`settings/log_to_timeline`	boolean	Optional. Enable developer logs (logs from Refiner and UDFs). Defaults to false.
`settings/notification_emails`	list	Optional. List of emails that will be notified when the run is completed.
`settings/tags`	list	Optional. List of string tags to attach to this Flow run. You can later search for Flow runs by these tags in the Flow Dashboard.
`settings/step_timeout`	integer	Optional. Timeout in seconds for each Flow step. When set to 0, the platform picks an appropriate timeout for each step; when set to -1, step timeouts are disabled. Defaults to 0.
`settings/pipeline_ids`	list	Optional. Associate the current Flow run with one or more pipelines. Takes a list of pipeline IDs retrieved from the Flow Pipeline API.
`settings/save_binary_to_output`	boolean	Optional. Save the flow binary in the output folder. Defaults to true.
`settings/webhook_config`	dict	Optional. Configure a webhook URL that is notified on Flow completion. See also the Webhook Configuration section below.
`settings/webhook_config/url`	string	Webhook URL to which a notification event will be sent when the run is completed.
`settings/webhook_config/headers`	string	Optional. Headers for the notification event.
`settings/runtime_config`	dict	Optional. A JSON object used to pass arguments to the flow. Can also be used to add a log prefix and change the output schema.
`settings/priority`	integer	Optional. Priority level for this Flow run. There are 10 priority levels, from 0 (lowest) to 9 (highest). Tasks of a Flow run with higher priority run before tasks of a Flow run with lower priority. Defaults to 5 for Flow runs started from the UI and 0 for Flow runs started through the API.
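
Some limits in this table, such as the 50-character cap on `job_id` and the 0 to 9 range for `priority`, can be checked on the client before the request is sent. The following is a minimal sketch of such a check; `check_run_args` is a hypothetical helper, and the server may apply additional validation that is not described here.

```python
def check_run_args(args):
    """Raise ValueError if args breaks a limit stated in the parameter table.

    Hypothetical client-side helper; it covers only the required
    parameters, the job_id length, and the priority range.
    """
    for required in ("binary_path", "input_dir"):
        if not args.get(required):
            raise ValueError("missing required parameter: " + required)

    job_id = args.get("job_id")
    if job_id is not None and len(job_id) > 50:
        raise ValueError("job_id must be no more than 50 characters long")

    priority = args.get("settings", {}).get("priority")
    if priority is not None and not 0 <= priority <= 9:
        raise ValueError("priority must be between 0 and 9")
    return args


args = check_run_args({
    "binary_path": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/build/bin/my_flow.ibflowbin",
    "input_dir": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    "settings": {"priority": 7, "tags": ["nightly"]},
})
```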

Response schema

All keys are returned in the response by default, unless marked as optional.

Key	Description	Value
`status`	Status of the request.	OK, ERROR
`data/job_id`	A unique identifier for the job.
`data/output_folder`	The full path to the root output folder.

Examples

Request

import json

import requests

# url_base and token are placeholders: set them to the root URL of your
# Instabase instance and your API access token.
url = url_base + '/api/v1/flow/run_binary_async'

args = {
  'input_dir': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'binary_path': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/build/bin/my_flow.ibflowbin",
  'settings': {
    'delete_out_dir': False,
    'output_has_run_id': True,
    'notification_emails': ["me@domain.com"],
  },
}
json_data = json.dumps(args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# The call returns as soon as the job is queued; the response carries the job ID.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)

Response

The HTTP call returns immediately while the binary execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of your execution.

{
  "status": "OK",
  "data": {
    "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
    "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/out"
  }
}
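
The request above omits `output_dir`, so the `output_folder` in the response follows the documented default of `input_dir/../out/`. A minimal sketch of that resolution, assuming ordinary POSIX path normalization (how the server resolves the path internally is not stated here, and `default_output_dir` is an illustrative name):

```python
import posixpath


def default_output_dir(input_dir):
    """Normalize input_dir/../out as a POSIX-style path."""
    return posixpath.normpath(posixpath.join(input_dir, "..", "out"))


print(default_output_dir("jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input"))
# prints: jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/out
```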

Webhook Configuration

You can use the webhook_config setting to ensure your application is notified when a flow completes. Instabase POSTs JSON-encoded data of the format below to the webhook endpoint when the flow completes.

# body
{
  "status": <string>,
  "msg": <string>,
  "job_id": <string>,
  "binary_path": <string>,
  "input_dir": <string>,
  "output": <string>
}

The body of the POST request contains the following fields:

status: "OK" | "ERROR"
msg: (optional) Error message. Present only if status is ERROR.
job_id: A unique identifier for the job.
binary_path: The path reference to the flow binary file.
input_dir: Input directory.
output: The full path to the root output folder.

To acknowledge receipt of the event, your endpoint must return a 2xx HTTP status code to Instabase. All response codes outside this range, including 3xx codes, indicate to Instabase that you did not receive the event.

If Instabase does not receive a 2xx HTTP status code, the notification attempt is repeated up to 7 times.
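
A receiving endpoint could look like the sketch below, which uses only the Python standard library. The handler class, the `summarize_event` and `serve` helpers, and the port are illustrative assumptions; the only behavior taken from this section is the body format and the rule that a 2xx status code acknowledges the event.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def summarize_event(raw_body):
    """Parse a webhook body and return a one-line summary of the job."""
    event = json.loads(raw_body)
    if event["status"] == "OK":
        return "job {0} finished, output at {1}".format(
            event["job_id"], event["output"])
    # msg is present only when status is ERROR.
    return "job {0} failed: {1}".format(event["job_id"], event.get("msg", ""))


class FlowWebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            print(summarize_event(self.rfile.read(length)))
        except (ValueError, KeyError, TypeError):
            # A non-2xx code signals that the event was not received,
            # so the notification may be attempted again.
            self.send_response(400)
            self.end_headers()
            return
        # Any 2xx code acknowledges receipt; 204 carries no response body.
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Run the handler locally; the port is an arbitrary choice."""
    HTTPServer(("", port), FlowWebhookHandler).serve_forever()
```

Calling `serve()` starts the listener. Because unacknowledged events can be delivered more than once, it is safer to make the processing idempotent, for example by keying on `job_id`.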

Add a log prefix

Add a prefix to all log statements generated by this flow run.

Pass the field log_prefix in the Flow runtime config, where the value of log_prefix is the string that you want to prepend to all log statements. For example, the runtime config below prepends org=Piston to all log statements.

"runtime_config": {"log_prefix": "org=Piston "}

Change output schema at runtime

Change the output schema at runtime for Marketplace solutions and flow binary jobs.

Pass the field modify_output_schema in the Flow runtime config, where the value of modify_output_schema is a stringified JSON object that can contain these fields:

add_fields: a list of fields to add to the output. Each field is described as an object {"name": "field_name", "formula": "field_formula"}
remove_fields: a list of field names to remove from the output
rename_fields: an object where each key is the field to be renamed, and its value is the new name.

Here’s a JavaScript example of changing the output schema at runtime:

const output_config = {
  "add_fields": [{"name": "my_field_name", "formula": "echo('hello world')"}],
  "remove_fields": ["field_1", "field_2"],
  "rename_fields": {"old_name_1": "new_name_1", "old_name_2": "new_name_2"}
}
const form = {
  "binary_path": binaryPath,
  "input_dir": inputDir,
  "settings": {
    // modify_output_schema is passed as a JSON string, not a nested object.
    "runtime_config": {"modify_output_schema": JSON.stringify(output_config)},
    // ...
  }
}

$.ajax({
  type: 'POST',
  url: '/api/v1/flow/run_binary_async',
  headers: {
    'Content-Type': 'application/json;charset=UTF-8',
    'Authorization': 'Bearer ' + token
  },
  data: JSON.stringify(form)
})
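
For callers using Python, as in the run example earlier in this document, an equivalent request body might be built as follows. This is a sketch under the same assumptions as the JavaScript version; the paths are the placeholder values from the run example, and the key point is that `modify_output_schema` is encoded to a string before the whole body is encoded again.

```python
import json

output_config = {
    "add_fields": [{"name": "my_field_name", "formula": "echo('hello world')"}],
    "remove_fields": ["field_1", "field_2"],
    "rename_fields": {"old_name_1": "new_name_1", "old_name_2": "new_name_2"},
}

args = {
    "binary_path": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/build/bin/my_flow.ibflowbin",
    "input_dir": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    "settings": {
        "runtime_config": {
            # The value is a JSON string, not a nested object.
            "modify_output_schema": json.dumps(output_config),
        },
    },
}

# Serializing the request body encodes the schema string a second time.
json_data = json.dumps(args)
```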

Job Status

Use the Job Status API to get the execution status of a flow binary job.

Refer to the Job Status API.

Run a flow binary in sync mode

Method	Syntax
POST	`URL_BASE/api/v1/flow/run_binary_sync`

Description

Use the Sync API to run a Flow synchronously and return its output in the specified format. The Sync API takes one or more files as input, runs the flow, blocks until the flow finishes, and returns the extracted results. Flows run through the Sync API run entirely in memory and are therefore faster than flows run through the /run_binary_async API. However, the Sync API doesn’t support checkpoint resume or batches of more than 5 files, and it has a maximum processing time of 60 seconds.

Request parameters

Parameters are required unless marked as optional.

Name	Type	Description	Values
`binary_path`	string	The path to the flow binary `.ibflowbin` file.
`options`	dict	Dictionary of options.
`options/result_format`	string	Optional. The output structure.	json (default), csv
`files`	array	An array of dictionaries where each dictionary defines an input file.
`files/file_name`	string	The name of the input file.
`files/file_content`	string	Base64-encoded file data.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key	Description	Value
`records`	An array of records.
`records/output_file_path`	Path to the IBDOC output file.
`records/results`	An array of key-value pairs for each field specified in the Refiner program.
`records/file_name`	The name of the input file.
`records/file_index`	The index of the input file.
`records/record_index`	The index of the output record.
`binary_path`	The path to the Flow binary `.ibflowbin` file.
`job_id`	The job ID of the Flow run.

Examples

Request

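import base64
import json

import requests

# url_base and token are placeholders: set them to the root URL of your
# Instabase instance and your API access token.

# Read the input file and Base64-encode its contents.
with open("test.pdf", "rb") as f:
  encoded_string = base64.b64encode(f.read()).decode('utf-8')

url = url_base + '/api/v1/flow/run_binary_sync'

api_args = {
  'binary_path': "jaydoe/my_repo/fs/InstabaseDrive/build/bin/workflow/my_flow.ibflowbin",
  'files': [{
    'file_name': "test.pdf",
    'file_content': encoded_string,
  }]
}
json_data = json.dumps(api_args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# The call blocks until the flow finishes or the processing time limit is hit.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)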

Response

{
  "records": [
    {
      "results": [
        {
          "key": "first_name",
          "value": "Coder"
        },
        {
          "key": "last_name",
          "value": "Maloder"
        }
      ],
      "file_name": "test.pdf",
      "file_index": 0,
      "record_index": 0
    }
  ],
  "binary_path": "jaydoe/my_repo/fs/InstabaseDrive/build/bin/workflow/my_flow.ibflowbin",
  "job_id": "5eb7df69-7506-41b3-bd3a-eb69d8b8a36d"
}
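
Because `results` is a list of key-value pairs, it is often convenient to turn each record into a plain dictionary. A minimal sketch, assuming the response shape shown above (`records_to_dicts` is an illustrative helper name):

```python
def records_to_dicts(resp_data):
    """Map each record's list of key/value pairs to a {field: value} dict."""
    return [
        {pair["key"]: pair["value"] for pair in record["results"]}
        for record in resp_data["records"]
    ]


resp_data = {
    "records": [{
        "results": [
            {"key": "first_name", "value": "Coder"},
            {"key": "last_name", "value": "Maloder"},
        ],
        "file_name": "test.pdf",
        "file_index": 0,
        "record_index": 0,
    }],
}
print(records_to_dicts(resp_data))
# prints: [{'first_name': 'Coder', 'last_name': 'Maloder'}]
```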