Model Service API

You can access models on the Instabase platform through the Model Service API.

You can run and test models with HTTP API calls. These endpoints support single-input and batch run requests, test requests, and job-based asynchronous execution of models.

To use the following examples, replace instabase.com with your Instabase instance domain, and ACCESS_TOKEN with your access token from Instabase.

Error responses

Any of the following API requests can return an error response. Errors are delivered as a 200 OK response with the following format:

{
    "status": "ERROR",
    "msg": "Some error message"
}
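Because errors arrive with HTTP status 200, client code must inspect the JSON body rather than the status code. A minimal sketch of such a check (the helper name is illustrative, not part of the API):

```python
def check_model_service_response(body: dict) -> dict:
    """Raise if a Model Service response body reports an error.

    Error responses arrive as 200 OK, so the HTTP status code
    alone is not enough to detect failure.
    """
    if body.get("status") == "ERROR":
        raise RuntimeError(f"Model Service error: {body.get('msg', '')}")
    return body
```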

Single input model API

Use this API only for testing purposes. The single input model API is an asynchronous API that returns a response containing a job ID.

{
    "status": "OK",
    "job_id": "2c641617-1154-4f5a-bff1-614a7ad03c34"
}

You can then use the Job Status API to query this job. Although the API always returns the job ID, note that the actual model execution might not complete within the model service timeout model_run_timeout_secs. If model execution fails, the Job Status API returns the failure reason.

Request

POST /model-service/run_async HTTP/1.1
Host: https://instabase.com/api/v1
Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

// Format the request in JSON that matches the RunModelRequest datatype
{...}

Response

A dictionary that contains the status and job ID. When you query the Job Status API with the job ID, it returns a dictionary that follows the RunModelResponse datatype.
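Because the API is asynchronous, callers typically poll the Job Status API until the job reaches a terminal state. A hedged sketch of such a loop, with the status fetch injected as a callable so the Job Status endpoint details stay out of the example; the `state == "DONE"` check and the poll interval are assumptions based on the example responses in this section:

```python
import time

def wait_for_job(fetch_status, timeout_secs=120, poll_interval_secs=2):
    """Poll a job until it reaches a terminal state.

    fetch_status: a callable returning the Job Status API response
    body as a dict. The example responses in this document report a
    "state" field that is "DONE" on completion.
    """
    deadline = time.time() + timeout_secs
    while time.time() < deadline:
        body = fetch_status()
        if body.get("status") == "ERROR":
            raise RuntimeError(body.get("msg", ""))
        if body.get("state") == "DONE":
            return body
        time.sleep(poll_interval_secs)
    raise TimeoutError("job did not finish before the timeout")
```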

POST method request example

Example of using the API to run a model on a single input using the POST method:

POST /model-service/run_async HTTP/1.1
Host: https://instabase.com/api/v1
Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

{
    "model_service_context": {
        "model_paths": ["path/to/label/model/from/walkthrough/"]
    },
    "model_name": "ib_label_model",
    "input_string": "Lets find the labels:\nnet pay\ntax",
    "model_payload": {
        "custom_request": {
            "error_tolerance": 0
        }
    }
}

Python example request

Example of using the API to run a model on a single input using Python code:

import json, requests

url = "https://instabase.com/api/v1/model-service/run_async"

payload = json.dumps({
    "model_service_context": {
    "model_paths": ["path/to/label/model/from/walkthrough/"]
    },
    "model_name": "ib_label_model",
    "input_string": "Lets find the labels:\nnet pay\ntax",
    "model_payload": {
        "custom_request": {
            "error_tolerance": 0
        }
    }
})

headers = {
  'Authorization': 'Bearer ACCESS_TOKEN',
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)

print(response.json())

Example response

Example response after you use the API to run a model on a single input:

{
    "status": "OK",
    "job_id": "2c641617-1154-4f5a-bff1-614a7ad03c34"
}

When you query the Job Status API with the job ID, it returns a response like the following:

{
    "status": "OK",
    "msg": "",
    "state": "DONE",
    "is_waiting_for_resources": false,
    "job_id": "2c641617-1154-4f5a-bff1-614a7ad03c34",
    "results": [{
        "start_time": 1671151500.0,
        "end_time": 1671151500.0,
        "model_result": {
            "start_time": 1606228096.0,
            "end_time": 1606228096.0,
            "model_result": {
                "ner_result": {
                    "entities": [
                        {
                            "content": "net pay",
                            "label": "LABEL",
                            "start_index": 22,
                            "end_index": 29
                        },
                        {
                            "content": "tax",
                            "label": "LABEL",
                            "start_index": 30,
                            "end_index": 33
                        }
                    ]
                }
            }
        }
    }],
    "cur_status": "{}",
    "completed_count": 1,
    "finish_timestamp": null,
    "binary_mode": false
}
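The entities are nested several levels deep in the job result. A small helper to flatten them, written against the example response above (the double model_result nesting matches that example; treat the shape as illustrative):

```python
def extract_entities(job_response: dict) -> list[dict]:
    """Pull NER entities out of a single-input job result,
    following the nesting shown in the example response."""
    entities = []
    for result in job_response.get("results", []):
        inner = result["model_result"]["model_result"]
        entities.extend(inner.get("ner_result", {}).get("entities", []))
    return entities
```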

Batch inputs model API

Running models on a batch of inputs is often faster than running each input separately. Use this API for testing purposes only. The batch inputs model API is asynchronous and returns a job ID that you can use to query model results through the Job Status API. Although the API always returns a job ID, the actual model execution might not complete within the model service timeout model_run_timeout_secs. If execution fails, the Job Status API returns the failure reason.

Request

POST /model-service/run_batch_async HTTP/1.1
Host: https://instabase.com/api/v1
Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

// This request must be a JSON object that matches the RunModelBatchRequest datatype
{...}

Response

{
    "status": "OK",
    "job_id": "2c641617-1154-4f5a-bff1-614a7ad03c34"
}

When you query the Job Status API with the job ID, it returns a dictionary that follows the RunModelBatchResponse datatype.

POST method example request

Example of using the API to run models on a batch of inputs using the POST method:

POST /model-service/run_batch_async HTTP/1.1
Host: https://instabase.com/api/v1
Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

{
    "model_service_context": {
        "model_paths": ["path/to/label/model/from/walkthrough/"]
    },
    "model_name": "ib_label_model",
    "input_strings": [
        "Lets find the labels:\nnet pay\ntax",
        "Random words: total amount tax"
    ],
    "model_payload": {
        "custom_request": {
            "error_tolerance": 0
        }
    }
}

Python example request

Example of using the API to run models on a batch of inputs using Python code:

import json, requests

url = "https://instabase.com/api/v1/model-service/run_batch_async"

payload = json.dumps({
    "model_service_context": {
        "model_paths": ["path/to/label/model/from/walkthrough/"]
    },
    "model_name": "ib_label_model",
    "input_strings": [
        "Lets find the labels:\nnet pay\ntax",
        "Random words: total amount tax"
    ],
    "model_payload": {
        "custom_request": {
            "error_tolerance": 0
        }
    }
})

headers = {
  'Authorization': 'Bearer ACCESS_TOKEN',
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)

print(response.json())

Example response

Example response after you use the API to run models on a batch of inputs:

{
    "status": "OK",
    "job_id": "63a89743-a3d7-4927-8989-6df532fdd71b"
}

When you query the Job Status API with the job ID, it returns a response like the following:

{
    "status": "OK",
    "msg": "",
    "state": "DONE",
    "is_waiting_for_resources": false,
    "job_id": "63a89743-a3d7-4927-8989-6df532fdd71b",
    "results": [
        {
            "start_time": 1671152600.0,
            "end_time": 1671152600.0,
            "model_results": [
                {
                    "model_result": {
                        "ner_result": {
                            "entities": [
                                {
                                    "content": "net pay",
                                    "label": "LABEL",
                                    "start_index": 22,
                                    "end_index": 29
                                },
                                {
                                    "content": "tax",
                                    "label": "LABEL",
                                    "start_index": 30,
                                    "end_index": 33
                                }
                            ]
                        }
                    },
                    "input_index": 0
                },
                {
                    "model_result": {
                        "ner_result": {
                            "entities": [
                                {
                                    "content": "total amount",
                                    "label": "LABEL",
                                    "start_index": 14,
                                    "end_index": 26
                                },
                                {
                                    "content": "tax",
                                    "label": "LABEL",
                                    "start_index": 27,
                                    "end_index": 30
                                }
                            ]
                        }
                    },
                    "input_index": 1
                }
            ]
        }
    ],
    "cur_status": "{}",
    "completed_count": 1,
    "finish_timestamp": null,
    "binary_mode": false
}
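Batch results carry an input_index per item, so callers usually regroup entities by input. A sketch written against the example response above (the nesting is illustrative, taken from that example):

```python
def entities_by_input(job_response: dict) -> dict[int, list[dict]]:
    """Group NER entities from a batch job result by input index,
    following the nesting shown in the example response."""
    grouped: dict[int, list[dict]] = {}
    for result in job_response.get("results", []):
        for item in result.get("model_results", []):
            ents = item["model_result"].get("ner_result", {}).get("entities", [])
            grouped.setdefault(item["input_index"], []).extend(ents)
    return grouped
```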

Prescreening model API

Use the prescreening model API to profile a baseline of resource consumption (memory and execution time) for model execution.

During model inference, the model service loads the model into a cache to serve it. To ensure that serving the current model doesn't impact the other models in the cache, the memory consumption upper limit is defined as model_service_max_model_mem_mb / model_process_cache_size. The upper limit for execution time is model_run_timeout_secs. All three variables are environment variables from the model-service deployment-config file; check the config file for values, because deployment configuration varies by environment. With the defaults, the upper limit for memory consumption is 6000 MB, or 18000 (model_service_max_model_mem_mb) / 3 (model_process_cache_size), and the upper limit for execution time is 120 seconds (model_run_timeout_secs).
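The limit arithmetic above, as a quick check. The values are the defaults quoted in this section; your deployment config may differ:

```python
# Defaults from the model-service deployment-config file (environment-specific).
model_service_max_model_mem_mb = 18000
model_process_cache_size = 3
model_run_timeout_secs = 120

# Per-model memory upper limit: total model memory divided by cache slots.
memory_limit_mb = model_service_max_model_mem_mb / model_process_cache_size
print(memory_limit_mb)  # 6000.0
```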

If the profiled values exceed these upper limits, reconsider using the model to avoid errors. Under load, a model's resource usage can exceed what this API profiles. For a more precise benchmark, run this API while no other model executions are in progress.

Request

POST /model/prescreen HTTP/1.1
Host: https://instabase.com/api/v1
Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

// This request must be a JSON object that contains model_name, model_version, and input_path
{...}

Response

A dictionary that contains the status and job ID.

When you query the Job Status API with the job ID, it returns a dictionary that follows the RunModelPrescreenResponse datatype.

This dictionary contains the profiled metadata and the result of the model execution:

  • start_time: The start time for the model execution in seconds since epoch.

  • end_time: The end time for the model execution in seconds since epoch. The start and end time are used to determine the duration of the model execution.

  • model_result: The prediction results for the model execution.

  • peak_memory_mb: The maximum amount of memory, in megabytes, used to run inference.

  • execution_time_secs: The total execution time of the model prescreening process, in seconds.
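Once the prescreening job completes, you can compare the profiled fields against the limits described above. A hedged sketch, using the default limits quoted earlier and the field names from the example response in this section (the helper itself is not part of the API):

```python
def within_limits(prescreen_result: dict,
                  memory_limit_mb: float = 6000.0,
                  timeout_secs: float = 120.0) -> bool:
    """Check a prescreen result against the default resource limits.

    Defaults correspond to model_service_max_model_mem_mb / 
    model_process_cache_size (18000 / 3) and model_run_timeout_secs (120);
    pass your deployment's actual values if they differ.
    """
    return (prescreen_result["peak_memory_mb"] <= memory_limit_mb
            and prescreen_result["execution_time_secs"] <= timeout_secs)
```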

POST method example request

Example of using the API to run a model prescreening using the POST method:

POST /model/prescreen HTTP/1.1
Host: https://instabase.com/api/v1
Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

{
    "model_name": "ib_label_model",
    "model_version": "0.0.0",
    "input_path": "path/to/input/document/target.ibdoc"
}

Python example request

Example of using the API to run model prescreening using Python code:

import json, requests

url = "https://instabase.com/api/v1/model/prescreen"

payload = json.dumps({
    "model_name": "ib_label_model",
    "model_version": "0.0.0",
    "input_path": "path/to/input/document/target.ibdoc"
})

headers = {
  'Authorization': 'Bearer ACCESS_TOKEN',
  'Content-Type': 'application/json'
}

response = requests.post(url, headers=headers, data=payload)

print(response.json())

Example response

{
    "status": "OK",
    "job_id": "2c641617-1154-4f5a-bff1-614a7ad03c34"
}

You can then use the Job Status API to query this job. When the model finishes running, the job result has the following format:

{
    "start_time": 1606228096.0,
    "end_time": 1606228096.0,
    "model_result": {
        "ner_result": {
            "entities": [
                {
                    "content": "net pay",
                    "label": "LABEL",
                    "start_index": 22,
                    "end_index": 29
                },
                {
                    "content": "tax",
                    "label": "LABEL",
                    "start_index": 30,
                    "end_index": 33
                }
            ]
        }
    },
    "peak_memory_mb": 1234.5678,
    "execution_time_secs": 12.3456
}