Job API

Potentially data-heavy workflows, such as running a flow or refiner, can have long run times. As a result, these jobs are run asynchronously: when you call the API, the job is started and a job ID is returned, instead of an immediate result. You can use this job ID with the Job API to read and modify the status of the jobs.

In this document, URL_BASE refers to the root URL of your Instabase instance, such as https://www.instabase.com.

Job Status

Method Syntax
GET URL_BASE/api/v1/jobs/status

Description

Use this API to get the current status of a job and information about the results when the job is done.

Request parameters

Parameters are required unless marked as optional.

Name Type Description Values
job_id string job_id reported by the initial execution request.
type string Job type of the job. flow, refiner, job, async, group

The most commonly used type is flow.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of request. OK, ERROR
msg Job status message.
state Job state. PENDING, DONE, COMPLETE
job_id The unique identifier for the job.
results Job results information. Present only if job is in a terminal state: PAUSED, COMPLETE, FAILED, CANCELLED, STOPPED_AT_CHECKPOINT
results/status Job completion status. OK, ERROR
results/output_folder Job output folder. Present if status equals OK
results/msg Job error message. Present if status equals ERROR
cur_status Job type specific JSON-encoded string containing current status of job.
cur_status/status The current state of the Flow job. RUNNING, PAUSED, COMPLETE, FAILED, CANCELLED, STOPPED_AT_CHECKPOINT
cur_status/finish_timestamp 10-digit Unix timestamp of Flow end. This field is not available for RUNNING flows.
cur_status/curProgress Progress of the job. [0-1]
cur_status/reviewer Assigned reviewer of job. Empty if none. Instabase username
cur_status/review_state Current state of the job in the review process. NONE, IN_REVIEW, COMPLETED, NOT_COMPLETED
cur_status/flow_job_metrics Metrics for the flow job.
cur_status/flow_review_metrics Metrics for the flow review.
cur_status/run_summary/recordsWithMsg The number of failed records, grouped by failure type.
cur_status/run_summary/numRecords The number of records processed.
cur_status/run_summary/numRuntimeErrors The number of execution errors.
cur_status/run_summary/numFiles The number of files processed.
finish_timestamp Finish timestamp of the job. null if still running.
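
Because cur_status arrives as a JSON-encoded string rather than a nested object, it needs a second decoding pass before its fields can be read. The following is a minimal sketch: the helper name decode_cur_status is hypothetical, and the sample response is abbreviated for illustration.

```python
import json

def decode_cur_status(resp):
    """Return the cur_status field of a Job Status response as a dict.

    Hypothetical helper: assumes cur_status is a JSON-encoded string, as
    described in the response schema, and returns {} when it is absent.
    """
    raw = resp.get('cur_status')
    if not raw:
        return {}
    return json.loads(raw)

# Abbreviated sample response, for illustration only.
sample = {
    'status': 'OK',
    'state': 'DONE',
    'cur_status': '{"status": "COMPLETE", "curProgress": 1.0, '
                  '"run_summary": {"numRecords": 4, "numRuntimeErrors": 0}}',
}

cur = decode_cur_status(sample)
print(cur['status'], cur['curProgress'], cur['run_summary']['numRecords'])
```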

State Diagram

State diagram for Flow V3 jobs:

Examples

Request

import json, requests, time

# url_base and token must be set beforehand: the root URL of your
# Instabase instance and a valid API token.
job_id = 'uuid_from_run_binary_async_api_call'
url = url_base + f'/api/v1/jobs/status?job_id={job_id}&type=flow'

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# Poll once per second until the request fails or the job state is DONE.
while True:
    r = requests.get(url, headers=headers)
    resp = json.loads(r.content)
    if resp['status'] != 'OK' or resp['state'] == 'DONE':
        break
    print('still running')
    time.sleep(1)

print('Job finished!')
print(resp)

Response

{
  "status": "OK",
  "msg": "Completed single_flow",
  "state": "DONE",
  "is_waiting_for_resources": false,
  "job_id": "c29e1c75-f7a3-4bf0-a1e8-2ad664b35fa5",
  "results": [
    {
      "status": "OK",
      "output_folder": "/jaydoe/my-repo/fs/Instabase Drive/flow/out"
    }
  ],
  "cur_status": "{\"job_id\": \"c29e1c75-f7a3-4bf0-a1e8-2ad664b35fa5\", \"flow_index\": 0, \"status\": \"COMPLETE\", \"finish_timestamp\": 1659728573413224848, \"curProgress\": 1.0, \"curMsg\": \"Completed single_flow\", \"reviewer\": \"jaydoe\", \"review_state\": \"IN_REVIEW\", \"flow_job_metrics\": null, \"flow_review_metrics\": null, \"run_summary\": {\"numRecords\": 4, \"numRuntimeErrors\": 0, \"numCheckpointFailed\": 0}}",
  "finish_timestamp": 1659728573413224848,
  "binary_mode": true
}
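
The polling loop in the request example keeps going for as long as the job runs. A variation with a deadline might look like the sketch below. The names wait_for_job and fetch_status are hypothetical: fetch_status is assumed to be any callable that performs the GET request and returns the decoded response. A stand-in is used here so the example runs without a network connection.

```python
import time

def wait_for_job(fetch_status, timeout_sec=600.0, interval_sec=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll until the job state is DONE, the request errors, or time runs out.

    fetch_status is a hypothetical callable returning the decoded Job Status
    response. Raises TimeoutError if the deadline passes first.
    """
    deadline = clock() + timeout_sec
    while True:
        resp = fetch_status()
        if resp.get('status') != 'OK' or resp.get('state') == 'DONE':
            return resp
        if clock() >= deadline:
            raise TimeoutError('job did not finish within the timeout')
        sleep(interval_sec)

# Stand-in for real HTTP calls: two PENDING responses, then DONE.
_fake = iter([
    {'status': 'OK', 'state': 'PENDING'},
    {'status': 'OK', 'state': 'PENDING'},
    {'status': 'OK', 'state': 'DONE'},
])
final = wait_for_job(lambda: next(_fake), sleep=lambda s: None)
print(final['state'])
```

Passing sleep and clock as arguments is a choice made for this sketch so that the loop can be exercised without real delays.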

Job Logs

Method Syntax
GET URL_BASE/api/v1/jobs/get_logs

Description

Use this API to get the logs of the job.

Request parameters

Parameters are required unless marked as optional.

Name Type Description Values
job_id string job_id reported by the initial execution request.
offset int Optional. Page offset at which to start fetching logs from.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
logs Array of log entries.
next_offset Next page offset to fetch logs from.

Examples

Request

import json, requests

job_id = 'uuid_from_run_binary_async_api_call'
url = url_base + f'/api/v1/jobs/get_logs?job_id={job_id}'

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

r = requests.get(url, headers=headers)
resp = json.loads(r.content)
print(resp)

Response

{
  "logs": [
    {"job-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a", "task-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a-Stage1", "level": "INFO", "ts": "2022-12-07 23:27:54,636", "trace_id": "86a9bc25a1e8a436", "span_id": "5c63c2bd3fed5a90", "log": "Starting Task"},
    {"job-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a", "task-id": "4700d0b5-e1ce-4eb2-a265-39e2e63cc65a-Stage1", "level": "INFO", "ts": "2022-12-07 23:27:54,723", "trace_id": "86a9bc25a1e8a436", "span_id": "5c63c2bd3fed5a90", "log": "Initialized flow datastore"}
  ],
  "next_offset": 1
}
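
To read more than one page of logs, the next_offset value from each response can be passed back as the offset parameter. The sketch below assumes that paging starts at offset 0 and that an empty logs array, or a next_offset that does not advance, signals the end. Both are assumptions and are not stated in this document. fetch_page is a hypothetical callable standing in for the GET request.

```python
def collect_logs(fetch_page, max_pages=100):
    """Gather log entries across pages using offset / next_offset.

    fetch_page(offset) is a hypothetical callable returning a decoded
    get_logs response. Starting at offset 0 and stopping on an empty page
    or a non-advancing next_offset are assumptions, not documented behavior.
    """
    logs, offset = [], 0
    for _ in range(max_pages):
        page = fetch_page(offset)
        entries = page.get('logs', [])
        if not entries:
            break
        logs.extend(entries)
        next_offset = page.get('next_offset')
        if next_offset is None or next_offset <= offset:
            break
        offset = next_offset
    return logs

# Stand-in pages keyed by offset, for illustration only.
_pages = {
    0: {'logs': [{'log': 'Starting Task'}], 'next_offset': 1},
    1: {'logs': [{'log': 'Initialized flow datastore'}], 'next_offset': 2},
    2: {'logs': [], 'next_offset': 2},
}
all_logs = collect_logs(lambda off: _pages[off])
print([entry['log'] for entry in all_logs])
```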

List Jobs

Method Syntax
GET URL_BASE/api/v1/jobs/list

Description

Get a list of Flow binary jobs.

Request parameters

Parameters are required unless marked as optional.

Parameter Type Description Values
limit integer Optional. The maximum number of Flow jobs to return. jobs limit per response (default 20)
offset integer Optional. Initial Flow job index to start returning jobs from. Used for pagination with limit. starting index (default 0)
from_timestamp integer Optional. 10-digit Unix timestamp. Returns all jobs started after this timestamp. starting timestamp (default is one week before current timestamp)
to_timestamp integer Optional. 10-digit Unix timestamp. Returns all jobs started before this timestamp. ending timestamp (default is current timestamp)
state string Optional. Returns Flow jobs in any of the input states. When not passed, all Flow jobs are returned. A comma-separated list of PENDING, COMPLETE, FAILED, CANCELLED, RUNNING, PAUSED, CHECKPOINT_FAILED
user string Optional. Returns only the Flow jobs started by this user. When a username is not passed, all Flow jobs for all users are returned. Instabase username
priority integer Optional. Returns all the jobs with the given priority. 0-9
tags string Optional. Returns all Flow jobs that were started with any of the input tags. See how to attach tags to Flow jobs in Run a Flow Binary API. A comma-separated list of tags
pipeline_ids string Optional. Returns all Flow jobs that are associated with any of the input pipelines. A comma-separated list of pipeline ids
review_state string Optional. Returns all Flow jobs that are in any of the given review states. A comma-separated list of NONE, IN_REVIEW, COMPLETED, NOT_COMPLETED
job_id string Optional. ID associated with each job, or the partial ID. Returns the singular Flow job associated with that job id. A valid job id
job_ids string Optional. Returns all Flow jobs associated with any of the job ids passed in the list. A comma-separated list of job ids.
reviewer string Optional. Returns all Flow jobs that were reviewed or are being reviewed by this user. Instabase username

Response schema

Key Type Description Value
jobs list List of jobs.
jobs/curMsg string A message with details on Flow execution status.
jobs/state string The current state of the job. RUNNING, COMPLETE, FAILED, CANCELLED, STOPPED_AT_CHECKPOINT
jobs/tags list A list of tags that are attached to the given Flow job. A list of tags associated with the job
jobs/job_id string The unique identifier for the job.
jobs/input_folder string Input folder containing the data for the flow. A valid filepath
jobs/output_folder string Output folder containing the results of each step of the flow. A valid filepath
jobs/source_path string Path to the Flow binary. A valid filepath
jobs/flow_type string The flow type. single_flow or metaflow or flows
jobs/is_flow_v3 boolean Whether the Flow is a v3 Flow or not. True, False
jobs/start_timestamp integer 10-digit Unix timestamp of Flow start.
jobs/finish_timestamp integer 10-digit Unix timestamp of Flow end. This field is not available for RUNNING flows.
jobs/runtime_sec number Running time of the Flow in seconds.
jobs/username string User that started the Flow. Instabase username
jobs/priority integer Priority of the job. 0-9
jobs/run_summary/recordsWithMsg number The number of failed records, grouped by failure type.
jobs/run_summary/numRecords number The number of records processed.
jobs/run_summary/numRuntimeErrors number The number of execution errors.
jobs/run_summary/numFiles number The number of files processed.
jobs/reviewer string Assigned reviewer of job. Empty if none. Instabase username
jobs/review_state string Current state of the job in the review process. NONE, IN_REVIEW, COMPLETED, NOT_COMPLETED
jobs/flow_pipeline_infos list[dict] Contains the associated pipeline id and pipeline name in each dictionary.
jobs/flow_job_metrics list Metrics for the flow job.
jobs/flow_review_metrics list Metrics for the flow review.
next_page string Paginated URL to get the next page of results. A valid request URL

Examples

Request

curl $URL_BASE'/api/v1/jobs/list?limit=1&offset=1&from_timestamp=1594188122&to_timestamp=1594792922&state=COMPLETE&user=user234&tags=foo,bar'

Response

{
    "jobs": [
        {
            "curMsg": "Completed Flow",
            "state": "COMPLETE",
            "tags": [
                "foo"
            ],
            "flow_path": "folder/tests/fs/exampleFlowibflowbin",
            "job_id": "this execution's job id",
            "input_folder": "input folder for this Flow",
            "output_folder": "output folder for this Flow",
            "flow_type": "single_flow",
            "is_flow_v3": true,
            "start_timestamp": 1594203413.959266,
            "finish_timestamp": 1594204300.505522,
            "runtime_sec": 886.5462560653687,
            "username": "user234",
            "source_path": "path to .ibflowbin file",
            "run_summary": {
                "recordsWithMsg": {
                    "Error <Unable to connect to OCR> in step process_files": 3
                },
                "numRecords": 113,
                "numRuntimeErrors": 3,
                "numFiles": 5
            },
            "reviewer": "user456",
            "review_state": "NONE",
            "flow_pipeline_infos": [],
            "flow_job_metrics": "any job metrics if available",
            "flow_review_metrics": "any review metrics if available"
        }
    ],
    "next_page": "/api/v1/jobs/list?limit=1&offset=2&from_timestamp=1594188122&to_timestamp=1594792922&state=COMPLETE"
}
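
Several List Jobs filters take comma-separated values, so it can help to build the query string programmatically. This sketch uses urllib.parse.urlencode from the Python standard library. The helper name build_list_url is hypothetical and the base URL is a placeholder.

```python
from urllib.parse import urlencode

def build_list_url(url_base, **filters):
    """Build a /api/v1/jobs/list URL from keyword filters.

    List or tuple values (for example state or tags) are joined with
    commas; filters set to None are skipped. Hypothetical helper.
    """
    params = {}
    for key, value in filters.items():
        if value is None:
            continue
        if isinstance(value, (list, tuple)):
            value = ','.join(str(v) for v in value)
        params[key] = value
    # safe=',' keeps the commas literal instead of percent-encoding them.
    query = urlencode(params, safe=',')
    return f'{url_base}/api/v1/jobs/list' + (f'?{query}' if query else '')

list_url = build_list_url(
    'https://www.instabase.com',  # placeholder instance URL
    limit=1,
    offset=1,
    state=['COMPLETE', 'FAILED'],
    user='user234',
    tags=['foo', 'bar'],
)
print(list_url)
```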

Pause Job

Method Syntax
GET URL_BASE/api/v1/jobs/pause

Description

Pause a running Flow.

Request parameters

Parameter Type Description Values
job_id string job_id of the Flow.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR

Resume Job

Method Syntax
GET URL_BASE/api/v1/jobs/resume

Description

Resume a paused Flow.

Request parameters

Parameter Type Description Values
job_id string job_id of the Flow.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR

To verify that the job has resumed, check the job status.

Cancel Job

Method Syntax
GET URL_BASE/api/v1/jobs/cancel

Description

Cancel a running or paused Flow.

Request parameters

Parameter Type Description Values
job_id string job_id of the Flow.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR

The HTTP call returns immediately while cancelling proceeds asynchronously in the background. Note: A cancelled Flow Binary cannot be resumed.

Retry Job

Method Syntax
GET URL_BASE/api/v1/jobs/retry

Description

Retry a failed Flow job.

Note

If you do not want the job to show up in the Flow Review dashboard anymore, make sure to mark the job as reviewed in Flow Review before retrying.

Request parameters

Parameters are required unless marked as optional.

Name Type Description Values
job_id string job_id of the flow.
type string Optional. Specifying a type retries only the files within the flow that have a specific type of failure. If type is omitted, all failed files are retried. all, checkpoint_failure, step_failure
  • type=all: retry all failed files.
  • type=checkpoint_failure: resume files that paused at a checkpoint because of validation failure, and continue to execute the rest of the flow steps.
  • type=step_failure: rerun files that errored out at a step (for example, due to a timeout error) from the point at which the step failed. This type also reruns any downstream steps that are dependent on the earlier failed step.
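
As a rough illustration of how the type parameter shapes a retry request, the sketch below builds the request URL and rejects values outside the three listed above. build_retry_url is a hypothetical helper, and the client-side validation is an addition for illustration, not something this document says the API requires. The base URL and job ID are placeholders.

```python
from urllib.parse import urlencode

RETRY_TYPES = ('all', 'checkpoint_failure', 'step_failure')

def build_retry_url(url_base, job_id, retry_type=None):
    """Build a /api/v1/jobs/retry URL; omit type to retry all failed files."""
    params = {'job_id': job_id}
    if retry_type is not None:
        if retry_type not in RETRY_TYPES:
            raise ValueError(f'unknown retry type: {retry_type!r}')
        params['type'] = retry_type
    return f'{url_base}/api/v1/jobs/retry?{urlencode(params)}'

# Placeholder base URL and job ID, for illustration only.
retry_url = build_retry_url('https://www.instabase.com', 'my-job-id', 'step_failure')
print(retry_url)
```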

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR

The HTTP call returns immediately while retrying proceeds asynchronously in the background. To verify that the job has started running, check the job status.

Pause All Running Jobs

Method Syntax
GET URL_BASE/api/v1/jobs/pause_all

Description

Pause a group of running Flow jobs.

Request parameters

Parameter Type Description Values
to_timestamp integer Optional. 10-digit Unix timestamp. When not passed, the default is set to the current timestamp.
from_timestamp integer Optional. 10-digit Unix timestamp. When not passed, the default is set to a week before the current timestamp.
user string Optional. Instabase username. For admins only, includes only the Flow executions started by this user. When a username is not passed, all Flow executions for all users are included.
tags string Optional. Comma-separated list of tags. Includes running Flow executions that were started with any of the input tags.

A request which does not include any optional parameters pauses all running jobs from the past week.
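
When an explicit window is wanted instead of the one-week default, the two timestamps can be computed as 10-digit Unix values. A minimal sketch follows. time_window is a hypothetical helper, and the fixed value passed as now is used only to make the output reproducible.

```python
import time

def time_window(days_back, now=None):
    """Return (from_timestamp, to_timestamp) as whole-second Unix times.

    Hypothetical helper; now defaults to the current time.
    """
    to_ts = int(time.time() if now is None else now)
    from_ts = to_ts - days_back * 24 * 60 * 60
    return from_ts, to_ts

# One week ending at a fixed instant, mirroring the documented default window.
from_ts, to_ts = time_window(7, now=1594792922)
print(from_ts, to_ts)  # 1594188122 1594792922
```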

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR

Examples

Request

curl $URL_BASE'/api/v1/jobs/pause_all?from_timestamp=1594188122&to_timestamp=1594792922&tags=foo,bar'

Response

{ "status": "OK" }

Resume All Paused Jobs

Method Syntax
GET URL_BASE/api/v1/jobs/resume_all

Description

Resume a group of currently paused Flow jobs.

Request parameters

Parameter Type Description Values
to_timestamp integer Optional. 10-digit Unix timestamp. When not passed, the default is set to the current timestamp.
from_timestamp integer Optional. 10-digit Unix timestamp. When not passed, the default is set to a week before the current timestamp.
user string Optional. Instabase username. For admins only, includes only the Flow executions started by this user. When a username is not passed, all Flow executions for all users are included.
tags string Optional. Comma-separated list of tags. Includes paused Flow executions that were started with any of the input tags.

A request which does not include any optional parameters resumes all paused jobs from the past week.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR

Cancel All Jobs

Method Syntax
GET URL_BASE/api/v1/jobs/cancel_all

Description

Cancel a group of running or paused Flow jobs.

When authenticated with a site admin account, all jobs matching the criteria will be cancelled. From a non-admin account, only jobs that the authenticated user has access to that match the criteria will be cancelled.

Request parameters

Parameter Type Description Values
to_timestamp integer Optional. 10-digit Unix timestamp. When not passed, the default is set to the current timestamp.
from_timestamp integer Optional. 10-digit Unix timestamp. When not passed, the default is set to a week before the current timestamp.
user string Optional. Instabase username. Admin permissions required. Only cancel jobs started by this user; when not passed, all jobs for all users are included. For non-admin accounts, this parameter is not used and only the authenticated user’s jobs will be cancelled.
tags string Optional. Comma-separated list of tags. Includes running or paused Flow executions that were started with any of the input tags.

A request which does not include any optional parameters cancels all running or paused jobs from the past week.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Description Value
status Status of the request. OK, ERROR

Update Job Tags

Method Syntax
POST URL_BASE/api/v1/jobs/tags

Description

Update the tags associated with a job.

Request parameters

Parameter Type Description Values
job_id string The ID of the job you want to update tags for. A valid job ID.
names string Optional. The new tags to apply to the job. If this parameter is not included, any existing tags for the job_id provided are deleted. A list of tags.

Response schema

All keys are returned in the response by default, unless marked as optional.

Key Type Description Value
status string Whether the API call succeeded. OK or ERROR
updated_tags list A list of strings with the new tags associated with the job. An empty or non-empty list of the new tags for the job.
msg string Optional. Error message. Present only if status is ERROR. A string describing the error encountered.

Examples

Request

import json, requests

url = url_base + '/api/v1/jobs/tags'

args = {
  'job_id': "f403399d-7ac7-4285-bb06-f7ad82ddbea2",
  'names': ["tag1","tag2","tag3"],
}
json_data = json.dumps(args)

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

r = requests.post(url, data=json_data, headers=headers)
resp_data = json.loads(r.content)

Response

The response body is a JSON object. If successful:

HTTP STATUS CODE 200

# body
{
  "status": "OK",
  "updated_tags": [
    "tag1",
    "tag2",
    "tag3"
  ]
}
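
Because omitting names deletes every existing tag on the job, it can be worth making that choice explicit in client code. The sketch below builds the JSON request body. build_tags_body and its clear_tags flag are hypothetical and exist only in this example.

```python
import json

def build_tags_body(job_id, names=None, clear_tags=False):
    """Return the JSON body for POST /api/v1/jobs/tags.

    Hypothetical helper. Refuses to omit names (which deletes all tags,
    per the parameter description) unless clear_tags is set explicitly.
    """
    if names is None:
        if not clear_tags:
            raise ValueError('pass names, or clear_tags=True to delete all tags')
        return json.dumps({'job_id': job_id})
    return json.dumps({'job_id': job_id, 'names': list(names)})

body = build_tags_body('f403399d-7ac7-4285-bb06-f7ad82ddbea2', ['tag1', 'tag2'])
print(body)
```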