Flow API

Use Flow API to run a flow step or a flow, get flow execution status, export flow output, and more.

Run a step

Use this API to run a step in a flow.

Request

Send a POST request to url_base/flow/run_step_async with the request body encoded as JSON:

POST url_base/flow/run_step_async
{
    "ibflow_path": "instabase/path/to/flow/file.ibflow",
    "step_index": 1
}
  • ibflow_path: The path reference to your flow file.

  • step_index: The step index to execute. Valid values are 0 to len(steps) - 1.
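As a minimal sketch of the request described above, the following Python builds the POST request without sending it. The url_base value and token are placeholders, and the bearer-token Authorization header is an assumption carried over from the run-flow example; this section does not state how the step endpoint authenticates.

```python
import json
import urllib.request


def build_run_step_request(url_base, token, ibflow_path, step_index):
    """Build (but do not send) the POST request for run_step_async."""
    if step_index < 0:
        # Valid values are 0 to len(steps) - 1; only the lower bound
        # can be checked without reading the flow file.
        raise ValueError("step_index must be 0 or greater")
    body = json.dumps({"ibflow_path": ibflow_path, "step_index": step_index})
    return urllib.request.Request(
        url=url_base + "/flow/run_step_async",
        data=body.encode("utf-8"),
        headers={
            # Assumption: bearer-token auth, as in the run-flow example.
            "Authorization": "Bearer {0}".format(token),
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_run_step_request(
    "https://instabase.example.com",  # hypothetical url_base
    "my-token",  # placeholder token
    "instabase/path/to/flow/file.ibflow",
    1,
)
```

Passing req to urllib.request.urlopen would send the request; the sketch stops short of that so it can be inspected offline.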

Response

The HTTP call returns immediately while the step execution proceeds asynchronously in the background.

If successful, the response contains a job_id field that you can use to check the status of the execution.

See the Job Status API.

HTTP STATUS CODE 200

# body
{
  "status": "OK",
  "data": {
    "job_id": <string>,
    "output_folder": <string>
  }
}

The response body is a JSON object with the following fields:

  • status: "OK"

  • data: A JSON object with the following fields:

    • job_id: A unique identifier for the job.

    • output_folder: The full path to the root output folder.
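The response fields above can be read with a small helper. This is a sketch only: the function name is hypothetical, and treating any status other than "OK" as a failure is an assumption, since the shape of an error body is not described in this section.

```python
import json


def parse_run_response(body):
    """Return (job_id, output_folder) from a run response body.

    Raises RuntimeError if the status field is not "OK".
    """
    resp = json.loads(body)
    if resp.get("status") != "OK":
        # Assumption: anything other than "OK" means the request failed.
        raise RuntimeError("Request failed: {0}".format(resp))
    data = resp["data"]
    return data["job_id"], data["output_folder"]


# Sample body with made-up values in the documented shape.
sample = '{"status": "OK", "data": {"job_id": "abc-123", "output_folder": "path/to/out"}}'
job_id, output_folder = parse_run_response(sample)
```

The returned job_id is the value to pass to the Job Status API.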

Run a fully configured step

Use this API to run a standalone Process Files step.

Request

Send a POST request to url_base/flow/run_step_async with the request body encoded as JSON:

POST url_base/flow/run_step_async
{
  "step_json_str": "{\"kwargs\": {}, \"step_name\": \"process_files\"}"
}
  • step_json_str: The fully configured step, serialized as a JSON string. The step definition has the following form:
{
  "step_name": "process_files",
  "kwargs": {
    "input_folder": "/owner/repo/drive/fs/files/input",
    "output_folder": "owner/repo/drive/fs/files/out",
    "process_type": "images_to_txt",
    "settings": {
      "ocr_page_type": "high_quality_doc",
      "output_format_layout": "layout_per_page",
      "page_range_str": "",
      "encryption_config": "",
      "ocr_config": ""
    }
  }
}

The kwargs map accepts the following keyword arguments:

  • input_folder: Absolute path to input directory

  • output_folder: Absolute path to output directory

  • process_type: Which extensions to process:

    • "auto_to_txt" - identify the extension to process for each file

    • "images_to_txt" - process all image files, for example, pdf, tif, jpeg, and so on

    • "pdf_to_txt" - process only pdf

  • settings: A map of extra settings for the Process Files step.

    • ocr_page_type: "default", "high_quality_doc", or "low_quality_doc". We recommend "default", which automatically selects an OCR model based on the page being analyzed.

    • output_format_layout: "layout_per_page" or "layout_per_doc"

    • page_range_str: Restrict page ranges, for example, "1-5"

    • encryption_config: A JSON string with settings for opening encrypted files

    • ocr_config: A JSON string with OCR flags

For details on extra settings, see the Process Files config options.
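Because step_json_str is a JSON string nested inside a JSON body, the step definition is serialized twice. The following sketch shows one way to build the request body in Python, using the example values from above; the variable names are illustrative only.

```python
import json

# Step definition, using the example values shown above.
step = {
    "step_name": "process_files",
    "kwargs": {
        "input_folder": "/owner/repo/drive/fs/files/input",
        "output_folder": "owner/repo/drive/fs/files/out",
        "process_type": "images_to_txt",
        "settings": {
            "ocr_page_type": "high_quality_doc",
            "output_format_layout": "layout_per_page",
            "page_range_str": "",
            "encryption_config": "",
            "ocr_config": "",
        },
    },
}

# First pass: the step definition becomes a JSON string.
payload = {"step_json_str": json.dumps(step)}

# Second pass: the whole payload becomes the JSON request body.
request_body = json.dumps(payload)
```

Letting json.dumps handle both passes avoids escaping the inner quotes by hand.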

Run a flow

Method Syntax
POST URL_BASE/api/v1/flow/run_flow_async

Description

Run a flow.

Request parameters

Parameters are required unless marked as optional.

Name | Type | Description
ibflow_path | string | The path reference to your flow file.
input_dir | string | The folder containing the data to run the flow on.
output_dir | string | Optional. The folder containing the output files. Defaults to input_dir/../out/.
output_has_run_id | boolean | Optional. Whether output should be written into a directory with a unique, timestamped run_id. Defaults to false.
delete_out_dir | boolean | Optional. Whether to delete any existing content in the output folder before running the binary. Has no effect if output_has_run_id is true. Defaults to false.
log_to_timeline | boolean | Optional. Enable developer logs from Refiner and UDFs. Defaults to false.
notification_emails | list | Optional. List of email addresses to notify when the run is completed.
tags | list | Optional. List of string tags to attach to this flow run. Flow runs can later be searched by these tags in the Flow dashboard or with the List API.
step_timeout | integer | Optional. Timeout in seconds for each flow step. When set to 0, the platform picks an appropriate timeout for each step; when set to -1, step timeouts are disabled. Defaults to 0.
pipeline_ids | list | Optional. List of pipeline IDs to associate with the current flow run. Pipeline IDs can be retrieved from the Flow Pipeline API.
save_binary_to_output | boolean | Optional. Save the flow binary in the output folder. Defaults to true.
webhook_config | dict | Optional. Configure a webhook URL to notify on flow completion. See the Webhook Configuration section for more information.
webhook_config/url | string | Webhook URL to which a notification event is sent when the run is completed.
webhook_config/headers | dict | Optional. Headers for the notification event, as a dictionary where each key-value pair is a header name and its value.

The Flow API allows you to run multiple instances of a flow at the same time. To avoid write conflicts, we recommend using the output_has_run_id option, which places the output of each flow in a separate directory.
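As an illustration of the parameters and the concurrency recommendation above, the following sketch builds request arguments for two runs of the same flow, each with output_has_run_id enabled. The helper name, the client-side checks, and the two input folder paths are hypothetical; the API itself is not documented here as rejecting unknown keys.

```python
import json

# Optional parameter names, taken from the request parameters table.
OPTIONAL_PARAMS = {
    "output_dir", "output_has_run_id", "delete_out_dir", "log_to_timeline",
    "notification_emails", "tags", "step_timeout", "pipeline_ids",
    "save_binary_to_output", "webhook_config",
}


def build_run_flow_args(ibflow_path, input_dir, **optional):
    """Return the request arguments for run_flow_async as a dict."""
    # Client-side convenience check to catch misspelled parameter names.
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError("Unknown parameters: {0}".format(sorted(unknown)))
    # webhook_config/url is not marked optional in the table.
    webhook = optional.get("webhook_config")
    if webhook is not None and "url" not in webhook:
        raise ValueError("webhook_config requires a url")
    args = {"ibflow_path": ibflow_path, "input_dir": input_dir}
    args.update(optional)
    return args


flow = "jaydoe/my_repo/fs/Instabase Drive/flow_proj/my_flow.ibflow"
input_dirs = (
    "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input_a",  # hypothetical
    "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input_b",  # hypothetical
)

# output_has_run_id=True places the output of each run in its own directory.
runs = [
    build_run_flow_args(flow, d, output_has_run_id=True, tags=["batch"])
    for d in input_dirs
]
bodies = [json.dumps(args) for args in runs]
```

Each entry in bodies can be sent as the data argument of a POST to the run_flow_async endpoint.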

Response schema

All keys are returned in the response by default, unless marked as optional.

Key | Description
status | Status of the request. Possible values are OK and ERROR.
data/job_id | A unique identifier for the job.
data/output_folder | The full path to the root output folder.

Examples

Request

import json

import requests

# url_base and token are assumed to be defined for your environment.
url = url_base + '/api/v1/flow/run_flow_async'

args = {
  'input_dir': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'ibflow_path': "jaydoe/my_repo/fs/Instabase Drive/flow_proj/my_flow.ibflow",
  'output_has_run_id': True,
}
json_data = json.dumps(args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)

Response

The HTTP call returns immediately while the flow execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of your execution.

{
  "status": "OK",
  "data": {
    "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
    "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/run_jaydoe/2023-04-12-20:22:36/756be65b-0eaf-4192-bea5-176f0377b0f8/out"
  }
}

Restart a flow

Method Syntax
POST URL_BASE/api/v1/flow/restart

Description

Restart a flow from the beginning, under a new job ID.

Request parameters

Name | Type | Description
job_id | string | The ID of the job to restart.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key | Description
status | Status of the request. Possible values are OK and ERROR.
data/job_id | A unique identifier for the job.
data/output_folder | The full path to the root output folder.

Examples

Request

import json

import requests

# url_base and token are assumed to be defined for your environment.
url = url_base + '/api/v1/flow/restart'

args = {
  'job_id': "b8f94216-c271-44f2-982c-3e279c6b446d",
}
json_data = json.dumps(args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)

Response

If successful, the response contains the job_id of the new job as well as its output folder.

{
  "status": "OK",
  "data": {
    "job_id": "756be65b-0eaf-4192-bea5-176f0377b0f8",
    "output_folder": "jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/run_jaydoe/2023-04-12-20:22:36/756be65b-0eaf-4192-bea5-176f0377b0f8/out"
  }
}

Run a metaflow

Method Syntax
POST URL_BASE/api/v1/flow/run_metaflow_async

Description

Run a metaflow.

Request parameters

Parameters are required unless marked as optional.

Name | Type | Description
input_dir | string | The folder that contains the data to run the metaflow on.
flow_root_dir | string | The path to a folder that contains the .ibflow files.
classifier_file_path | string | The path to the .ibclassifier file.
delete_out_dir | boolean | Optional. Whether to delete any existing content in the output folder before running the binary. Has no effect if output_has_run_id is true. Defaults to false.
output_has_run_id | boolean | Optional. Whether output should be written into a directory with a unique, timestamped run_id. Defaults to false.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key | Description
status | Status of the request. Possible values are OK and ERROR.
data/job_id | A unique identifier for the job.
data/output_folder | The full path to the root output folder.

Examples

Request

import json

import requests

# url_base and token are assumed to be defined for your environment.
url = url_base + '/api/v1/flow/run_metaflow_async'

api_args = {
  'input_dir': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/data/input",
  'flow_root_dir': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/workflows",
  'classifier_file_path': "/jaydoe/my_repo/fs/Instabase Drive/flow_proj/classifiers/my_classifier.ibclassifier",
  'delete_out_dir': False,
  'output_has_run_id': True,
}
json_data = json.dumps(api_args)

headers = {
  'Authorization': 'Bearer {0}'.format(token),
}

r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)

Response

The HTTP call returns immediately while the metaflow execution proceeds asynchronously in the background. If successful, the response contains a job_id field that you can use to check the status of the job.

HTTP STATUS CODE 200

# body
{
  "status": "OK",
  "data": {
    "job_id": <string>,
    "output_folder": <string>
  }
}

Job status

Use the job status API to get the execution status of a flow or metaflow job.

Refer to the Job Status API documentation.