Flow API

Use the Flow API to run a flow step or a flow, get flow execution status, export flow output, and more.

Run a step

Use this API to run a step in a flow.

Request

Send a POST request to url_base/flow/run_step_async with the post body encoded as JSON:

POST url_base/flow/run_step_async
{
    "ibflow_path": "instabase/path/to/flow/file.ibflow",
    "step_index": 1
}

ibflow_path: The path reference to your flow file
step_index: The step index to execute. Valid values are 0 to len(steps) - 1.
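
As an illustration, the request described above could be assembled in Python roughly as follows. This is a minimal sketch rather than an official client: the helper name build_run_step_request, the num_steps argument, the placeholder base URL and token, and the Bearer-token header (borrowed from the other examples in this document) are assumptions. The request is only constructed here, not sent.

```python
import json
import urllib.request


def build_run_step_request(url_base, token, ibflow_path, step_index, num_steps):
    """Build (but do not send) a POST request for run_step_async.

    Hypothetical helper: num_steps is the number of steps in the flow,
    supplied by the caller so the index can be checked locally.
    """
    # Valid step indexes run from 0 to len(steps) - 1.
    if not 0 <= step_index < num_steps:
        raise ValueError("step_index must be between 0 and %d" % (num_steps - 1))

    body = {"ibflow_path": ibflow_path, "step_index": step_index}
    return urllib.request.Request(
        url_base + "/flow/run_step_async",
        data=json.dumps(body).encode("utf-8"),
        # Assumption: Bearer-token auth, as in the other examples here.
        headers={"Authorization": "Bearer " + token},
        method="POST",
    )


req = build_run_step_request(
    "https://instabase.example.com",  # placeholder base URL
    "my-token",                       # placeholder token
    "instabase/path/to/flow/file.ibflow",
    step_index=1,
    num_steps=3,
)
# Passing req to urllib.request.urlopen would actually send the request.
```

Checking the index before sending avoids a round trip for a request that cannot succeed.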

Response

The HTTP call returns immediately while the step execution proceeds asynchronously in the background.

If successful, the response contains a job_id field that you can use to check the status of the execution.

See the Job Status API.

HTTP STATUS CODE 200

# body
{
  "status": "OK",
  "data": {
    "job_id": <string>,
    "output_folder": <string>
  }
}

The response body is a JSON object with the following fields:

status: "OK"
data: A JSON object with the following fields:
- job_id: A unique identifier for the job.
- output_folder: The full path to the root output folder.
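
The response fields above can be read with a few lines of Python. The sketch below is illustrative only: the helper name parse_run_response and the sample values are made up, and treating every status other than "OK" as a failure is an assumption.

```python
import json


def parse_run_response(raw_body):
    """Return (job_id, output_folder) from a run_step_async response body."""
    payload = json.loads(raw_body)
    # Assumption: any status other than "OK" is treated as a failure.
    if payload.get("status") != "OK":
        raise RuntimeError("request failed: %r" % (payload,))
    data = payload["data"]
    return data["job_id"], data["output_folder"]


# Sample body with made-up values in the documented shape.
sample = '{"status": "OK", "data": {"job_id": "abc123", "output_folder": "owner/repo/fs/out"}}'
job_id, output_folder = parse_run_response(sample)
```

Keep the returned job_id: it is the value you pass to the Job Status API.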

Run a fully configured step

Use this API to run a standalone Process Files step.

Request

Run a fully configured step by sending a POST request to url_base/flow/run_step_async with the post body encoded as JSON:

POST url_base/flow/run_step_async
{
  "step_json_str": "{\"kwargs\": {}, \"step_name\": \"process_files\"}"
}

step_json_str: The complete step definition, serialized as a JSON string. When decoded, the string has the following structure:

{
  "step_name": "process_files",
  "kwargs": {
    "input_folder": "/owner/repo/drive/fs/files/input",
    "output_folder": "owner/repo/drive/fs/files/out",
    "process_type": "images_to_txt",
    "settings": {
      "ocr_page_type": "high_quality_doc",
      "output_format_layout": "layout_per_page",
      "page_range_str": "",
      "encryption_config": "",
      "ocr_config": ""
    }
  }
}

The kwargs map accepts the following keyword arguments:

input_folder: Absolute path to the input directory
output_folder: Absolute path to the output directory
process_type: Which file types to process:
- "auto_to_txt" - identify the extension to process for each file
- "images_to_txt" - process all image files, for example, pdf, tif, jpeg, and so on
- "pdf_to_txt" - process only pdf files
settings: A map of extra settings for the Process Files step.
- ocr_page_type: "default", "high_quality_doc", or "low_quality_doc". We recommend using the default OCR model that is automatically selected based on the page being analyzed.
- output_format_layout: "layout_per_page" or "layout_per_doc"
- page_range_str: Restrict processing to a page range, for example, "1-5"
- encryption_config: A JSON string with settings for opening encrypted files
- ocr_config: A JSON string with OCR flags

For details on extra settings, see the Process Files config options.
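
Because step_json_str is a JSON string nested inside a JSON body, the step definition is effectively encoded twice. The following Python sketch shows one way to produce such a body. It reuses the values from the example above, and the variable names are illustrative.

```python
import json

# Step definition as a normal Python dict (values taken from the example above).
step = {
    "step_name": "process_files",
    "kwargs": {
        "input_folder": "/owner/repo/drive/fs/files/input",
        "output_folder": "owner/repo/drive/fs/files/out",
        "process_type": "images_to_txt",
        "settings": {
            "ocr_page_type": "high_quality_doc",
            "output_format_layout": "layout_per_page",
            "page_range_str": "",
            "encryption_config": "",
            "ocr_config": "",
        },
    },
}

# The inner json.dumps turns the step into a string; the outer json.dumps
# encodes the request body and escapes the quotes of the inner string.
post_body = json.dumps({"step_json_str": json.dumps(step)})
```

Letting the JSON library do both encodings avoids hand-escaping the inner quotes.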

Run a flow

Method	Syntax
POST	`URL_BASE/api/v1/flow/run_flow_async`

Description

Run a flow.

Request parameters

Parameters are required unless marked as optional.

Name	Type	Description
`ibflow_path`	string	The path reference to your flow file.
`input_dir`	string	The folder containing the data to run the flow on.
`output_dir`	string	Optional. The folder where output files are written. Defaults to `input_dir/../out/`.
`output_has_run_id`	boolean	Optional. Whether output should be written into a directory with a unique, timestamped `run_id`. Defaults to false.
`delete_out_dir`	boolean	Optional. Whether to delete any existing content in the output folder before running the binary. Has no effect if `output_has_run_id` is true. Defaults to false.
`log_to_timeline`	boolean	Optional. Enable developer logs from Refiner and UDFs. Defaults to false.
`notification_emails`	list	Optional. List of emails that will be notified when the run is completed.
`tags`	list	Optional. List of string tags to attach to this flow run. You can later search for flow runs by these tags in the Flow dashboard or with the List API.
`step_timeout`	integer	Optional. Timeout in seconds for each flow step. When set to 0, the platform picks an appropriate timeout for each step; when set to -1, step timeouts are disabled. Defaults to 0.
`pipeline_ids`	list	Optional. Associate current flow run with a pipeline. Define a list of pipeline IDs, which can be retrieved from the Flow Pipeline API.
`save_binary_to_output`	boolean	Optional. Save the flow binary in the output folder. Defaults to true.
`webhook_config`	dict	Optional. Configure a webhook URL that will be notified on flow completion. See the Webhook Configuration section for more information.
`webhook_config/url`	string	Webhook URL to which a notification event will be sent when the run is completed.
`webhook_config/headers`	dict	Optional. Provide headers for the notification event as a dictionary, where each key-value pair represents a header name and its corresponding value.

The Flow API allows you to run multiple instances of a flow at the same time. To avoid write conflicts, we recommend using the output_has_run_id option, which places the output of each flow in a separate directory.
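
To make the optional parameters concrete, here is a hedged Python sketch that assembles a request body with output_has_run_id enabled, a step timeout, tags, and a webhook. The helper name build_run_flow_payload, the webhook URL, and the header name are hypothetical. Nesting url and headers under webhook_config is an assumption based on the slash notation in the parameter table.

```python
import json


def build_run_flow_payload(ibflow_path, input_dir, step_timeout=0,
                           webhook_url=None, webhook_headers=None, tags=None):
    """Hypothetical helper that assembles a run_flow_async request body."""
    # step_timeout: 0 lets the platform choose, -1 disables step timeouts.
    if step_timeout < -1:
        raise ValueError("step_timeout must be -1, 0, or a positive number of seconds")

    payload = {
        "ibflow_path": ibflow_path,
        "input_dir": input_dir,
        # Give every run its own output directory to avoid write conflicts.
        "output_has_run_id": True,
        "step_timeout": step_timeout,
    }
    if tags:
        payload["tags"] = list(tags)
    if webhook_url:
        # Assumption: webhook_config/url and webhook_config/headers are nested keys.
        payload["webhook_config"] = {"url": webhook_url}
        if webhook_headers:
            payload["webhook_config"]["headers"] = dict(webhook_headers)
    return payload


payload = build_run_flow_payload(
    "jaydoe/my_repo/fs/Instabase Drive/flow_proj/my_flow.ibflow",
    "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    step_timeout=600,
    webhook_url="https://hooks.example.com/flow-done",  # placeholder URL
    webhook_headers={"X-Example-Header": "value"},      # placeholder header
    tags=["nightly"],
)
json_data = json.dumps(payload)
```

Optional keys are left out of the body when they are not set, so the documented defaults apply.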

Response schema

All keys are returned in the response by default, unless marked as optional.

Key	Description
`status`	Status of the request. Possible values are: OK, ERROR
`data/job_id`	A unique identifier for the job.
`data/output_folder`	The full path to the root output folder.

Examples

Request

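import json
import requests

# url_base and token must be defined before this snippet runs.
url = url_base + '/api/v1/flow/run_flow_async'

# Required parameters, plus output_has_run_id so each run gets its own output directory.
args = {
  'input_dir': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'ibflow_path': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/my_flow.ibflow",
  'output_has_run_id': True,
}
json_data = json.dumps(args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# Send the request and parse the JSON response body.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)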

Response

The HTTP call returns immediately while the flow execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of your execution.

{
  "status": "OK",
  "data": {
    "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
    "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/run_jaydoe/2023-04-12-20:22:36/756be65b-0eaf-4192-bea5-176f0377b0f8/out"
  }
}

Restart a flow

Method	Syntax
POST	`URL_BASE/api/v1/flow/restart`

Description

Restart a flow from the beginning, under a new job ID.

Request parameters

Name	Type	Description
`job_id`	string	The ID of the job to restart.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key	Description
`status`	Status of the request. Possible values are: OK, ERROR
`data/job_id`	A unique identifier for the job.
`data/output_folder`	The full path to the root output folder.

Examples

Request

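import json
import requests

# url_base and token must be defined before this snippet runs.
url = url_base + '/api/v1/flow/restart'

# The ID of the job to restart.
args = {
  'job_id': "b8f94216-c271-44f2-982c-3e279c6b446d",
}
json_data = json.dumps(args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# Send the request and parse the JSON response body.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)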

Response

If successful, the response contains the job_id of the new job as well as its output folder.

{
  "status": "OK",
  "data": {
    "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
    "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/run_jaydoe/2023-04-12-20:22:36/756be65b-0eaf-4192-bea5-176f0377b0f8/out"
  }
}
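
Because a restart runs under a new job ID, any code that tracks the job needs to switch to the ID in the restart response. The sketch below shows one way to do that in Python. The helper names are hypothetical, the job IDs are the ones from the examples above, and treating any status other than "OK" as a failure is an assumption.

```python
import json


def build_restart_body(job_id):
    """Serialize the request body for the restart endpoint."""
    return json.dumps({"job_id": job_id})


def new_job_from_restart(old_job_id, restart_response):
    """Return (new_job_id, output_folder) from a parsed restart response."""
    # Assumption: any status other than "OK" means the restart failed.
    if restart_response.get("status") != "OK":
        raise RuntimeError("restart of job %s failed" % old_job_id)
    data = restart_response["data"]
    return data["job_id"], data["output_folder"]


old_job_id = "b8f94216-c271-44f2-982c-3e279c6b446d"
body = build_restart_body(old_job_id)

# Parsed response, using the values from the example response above.
restart_response = {
    "status": "OK",
    "data": {
        "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
        "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/run_jaydoe/2023-04-12-20:22:36/756be65b-0eaf-4192-bea5-176f0377b0f8/out",
    },
}
new_job_id, output_folder = new_job_from_restart(old_job_id, restart_response)
```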

Run a metaflow

Method	Syntax
POST	`URL_BASE/api/v1/flow/run_metaflow_async`

Description

Run a metaflow.

Request parameters

Parameters are required unless marked as optional.

Name	Type	Description
`input_dir`	string	The folder that contains the data to run the metaflow on.
`flow_root_dir`	string	The path to a folder that contains the `.ibflow` files.
`classifier_file_path`	string	The path to the `.ibclassifier` file.
`delete_out_dir`	boolean	Optional. Whether to delete any existing content in the output folder before running the binary. Has no effect if `output_has_run_id` is true. Defaults to false.
`output_has_run_id`	boolean	Optional. Whether output should be written into a directory with a unique, timestamped `run_id`. Defaults to false.
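
The interaction between delete_out_dir and output_has_run_id can be made explicit when building the request body. The following Python sketch is one possible approach, not part of the API: the helper name build_run_metaflow_payload, the local extension check, and the choice to send false for an ineffective delete_out_dir are all assumptions.

```python
import json


def build_run_metaflow_payload(input_dir, flow_root_dir, classifier_file_path,
                               delete_out_dir=False, output_has_run_id=False):
    """Hypothetical helper that assembles a run_metaflow_async request body."""
    # Local sanity check (an assumption, not an API rule).
    if not classifier_file_path.endswith(".ibclassifier"):
        raise ValueError("classifier_file_path should point to an .ibclassifier file")
    return {
        "input_dir": input_dir,
        "flow_root_dir": flow_root_dir,
        "classifier_file_path": classifier_file_path,
        # delete_out_dir has no effect when output_has_run_id is true, so send
        # false in that case to keep the request unambiguous (a design choice).
        "delete_out_dir": delete_out_dir and not output_has_run_id,
        "output_has_run_id": output_has_run_id,
    }


api_args = build_run_metaflow_payload(
    "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
    "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/workflows",
    "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/classifiers/my_classifier.ibclassifier",
    delete_out_dir=True,
    output_has_run_id=True,
)
json_data = json.dumps(api_args)
```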

Response schema

All keys are returned in the response by default, unless marked as optional.

Key	Description
`status`	Status of the request. Possible values are: OK, ERROR
`data/job_id`	A unique identifier for the job.
`data/output_folder`	The full path to the root output folder.

Examples

Request

import json
import requests

# url_base and token must be defined before this snippet runs.
url = url_base + '/api/v1/flow/run_metaflow_async'

# Required paths, plus the optional output settings.
api_args = {
  'input_dir': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'flow_root_dir': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/workflows",
  'classifier_file_path': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/classifiers/my_classifier.ibclassifier",
  'delete_out_dir': False,
  'output_has_run_id': True,
}
json_data = json.dumps(api_args)

headers = {
  'Authorization': 'Bearer {0}'.format(token),
}

# Send the request and parse the JSON response body.
r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)

Response

The HTTP call returns immediately while the binary execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of the job execution.

HTTP STATUS CODE 200

# body
{
  "status": "OK",
  "data": {
    "job_id": <string>,
    "output_folder": <string>
  }
}

Job status

Use the job status API to get the execution status of a flow or metaflow job.

Refer to the Job Status API documentation.