Model API

Use the Model API to get information about a specific training, evaluation, or pruning job in a model project.

Metrics API

Method Syntax
GET api/v2/model/metrics

Description

Use this API to get metric information for a job, including F1 scores, confusion matrices, dataset record counts, and more.

URL parameters

Parameters are required unless marked as optional.

Name Type Description
model_project_path string The path to the model project’s folder.
job_id string The specific job’s ID.
job_type string The job type. Valid values are training, pruning, and evaluation.
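The model_project_path value contains slashes and can contain spaces (for example, "Instabase Drive" in the examples on this page), so the query parameters should be URL-encoded rather than concatenated into the URL as raw text. The following is a minimal sketch using only the Python standard library. build_metrics_url is a hypothetical helper name, and the base URL in the usage note is a placeholder.

```python
from urllib.parse import urlencode


def build_metrics_url(url_base: str, model_project_path: str, job_id: str, job_type: str) -> str:
    """Build a Metrics API URL with URL-encoded query parameters (hypothetical helper)."""
    valid_job_types = {'training', 'pruning', 'evaluation'}
    if job_type not in valid_job_types:
        raise ValueError(f'job_type must be one of {sorted(valid_job_types)}, got {job_type!r}')
    # urlencode escapes '/' as %2F and spaces as '+', which standard query parsing decodes.
    query = urlencode({
        'model_project_path': model_project_path,
        'job_id': job_id,
        'job_type': job_type,
    })
    return f"{url_base.rstrip('/')}/api/v2/model/metrics?{query}"
```

For example, build_metrics_url('https://instabase.example.com', 'user1/my-repo/fs/Instabase Drive/PaystubClassifier', job_id, 'training') returns a URL whose path parameter is encoded as user1%2Fmy-repo%2Ffs%2FInstabase+Drive%2FPaystubClassifier. When using requests, passing the same dictionary through the params argument of requests.get achieves the same encoding.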

Request body

The request has no body.

Response status

Unless otherwise specified, a 2XX status code indicates the request was successful.

Response schema

Key Description
metrics A list of tables, each represented as a dictionary, that include F1 Score, Precision, Recall, Support, ECE (expected calibration error), and train and evaluation loss datapoints. If the job is from a classification model, a confusion matrix table is also included. If hyperparameter tuning was done, a table of hyperparameter search results is also included.
metrics[i]/title The title of the table.
metrics[i]/subtitle A description of the data in the table.
metrics[i]/headers A list of column names.
metrics[i]/rows A list of row entry lists.
metrics[i]/show An optional boolean for whether or not the table should be shown in a job’s metrics tab.
platform_version The Instabase platform version string, such as "23.01.0".
ibformers_version The ibformers version string, such as "2.0.1".
train_count A dictionary that maps class name to its number of annotated train records (one class for extraction models, multiple for classification models).
train_total The total number of train records used.
train_datasets A list of associated datasets that contained annotated train records.
test_count A dictionary that maps class name to its number of annotated test records (one class for extraction models, multiple for classification models).
test_total The total number of test records used.
test_datasets A list of associated datasets that contained annotated test records.
hyperparams A dictionary that maps hyperparameter names to their values.
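Each entry in metrics pairs a headers list with rows of the same width, so a table can be located by its title and turned into one dictionary per row. The following is a minimal sketch; find_table and table_to_records are hypothetical helper names, not part of any Instabase SDK.

```python
from typing import Any, Dict, List, Optional


def find_table(resp: Dict[str, Any], title: str) -> Optional[Dict[str, Any]]:
    """Return the first table in resp['metrics'] with the given title, or None."""
    for table in resp.get('metrics', []):
        if table.get('title') == title:
            return table
    return None


def table_to_records(table: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each row with the table's headers, producing one dictionary per row."""
    headers = table['headers']
    return [dict(zip(headers, row)) for row in table['rows']]
```

Applied to the confusion matrix table in the classification example on this page, table_to_records yields {'class name': 'W2', 'W2': 5, 'ADP': 1} for the first row.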
Note

To receive complete results, you might need to rerun training, pruning, or evaluation jobs that were originally run before 23.07. If you don't rerun them, the API makes a best-effort retrieval for these older jobs, which might return incomplete results, such as a metrics list without a confusion matrix table or an empty platform_version value.
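To find jobs that might need to be rerun, a client can check a response for the two symptoms named in the note. The following is a minimal sketch that checks only those symptoms. incompleteness_hints is a hypothetical helper, and it assumes the confusion matrix table is titled "Confusion Matrix", as in the classification example on this page.

```python
from typing import Any, Dict, List


def incompleteness_hints(resp: Dict[str, Any], is_classification: bool) -> List[str]:
    """Return reasons a Metrics API response looks incomplete; an empty list means none were found."""
    hints = []
    if not resp.get('platform_version'):
        hints.append('platform_version is empty or missing')
    titles = {table.get('title') for table in resp.get('metrics', [])}
    # Assumption: the title matches the one shown in this page's classification example.
    if is_classification and 'Confusion Matrix' not in titles:
        hints.append('no Confusion Matrix table in metrics')
    return hints
```

An empty list does not guarantee that the results are complete; it only means that neither documented symptom is present.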

Examples

Example for a classification job

Request

import requests

# Assumes url_base (your Instabase instance URL) and token (your API access token) are already defined.
model_project_path = 'user1/my-repo/fs/Instabase Drive/PaystubClassifier'
job_id = 'dc991a28-bd92-47bb-ajj8-e0f2ca930e51'
job_type = 'training'
url = url_base + '/api/v2/model/metrics'

# Pass the query parameters through params so requests URL-encodes them
# (the project path contains slashes and a space).
params = {
  'model_project_path': model_project_path,
  'job_id': job_id,
  'job_type': job_type,
}

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# Send a single request; the response schema has no status or state fields to poll on.
r = requests.get(url, headers=headers, params=params)
r.raise_for_status()  # Raise an exception on 4XX/5XX responses
resp = r.json()

print('Request finished!')
print(resp)

Response

{
  "metrics": [
    {
      "title": "Record-wise Classifier Metrics",
      "subtitle": "Performance of the Classifier model measured on the Record level. Measure how accurately model is classifing each Record to the given class",
      "headers": ["Class Type", "F1 Score", "Precision", "Recall", "Support", "ECE"],
      "rows": [
        ["W2", "90.91%", "100.00%", "83.33%", 6.0, "0.051"],
        ["ADP", "88.89%", "80.00%", "100.00%", 4.0, "0.125"],
        ["macro avg", "89.90%", "90.00%", "91.67%", 10.0, "N/A"],
        ["weighted avg", "90.10%", "92.00%", "90.00%", 10.0, "N/A"]
      ]
    },
    {
      "title": "Chunk-wise Classifier Metrics",
      "subtitle": "Performance of the Classifier model measured on the Chunk level. Measure how accurately model is classifing each Chunk to the given class",
      "headers": ["Class Type", "F1 Score", "Precision", "Recall", "Support", "ECE"],
      "rows": [
        ["W2", "90.91%", "100.00%", "83.33%", 6.0, "0.051"],
        ["ADP", "88.89%", "80.00%", "100.00%", 4.0, "0.125"],
        ["macro avg", "89.90%", "90.00%", "91.67%", 10.0, "N/A"],
        ["weighted avg", "90.10%", "92.00%", "90.00%", 10.0, "N/A"]
      ]
    },
    {
      "title": "Train-Validation Curve",
      "subtitle": "",
      "headers": ["Dataset Split", "Epoch 1", "2", "3"],
      "rows": [
        ["Train", 0.54, 0.21, 0.15],
        ["Eval", 0.25, 0.31, 0.29]
      ],
      "show": false
    },
    {
      "title": "Confusion Matrix",
      "subtitle": "",
      "headers": ["class name", "W2", "ADP"],
      "rows": [
        ["W2", 5, 1],
        ["ADP", 0, 4]
      ],
      "show": false
    },
    {
      "title": "Hyperparameter search results",
      "subtitle": "Top 5 runs from hyperparameter search, sorted by the objective value on validation dataset",
      "headers": ["Trial number", "learning_rate", "num_train_epochs", "class_weights_ins_power", "gradient_accumulation_steps", "Objective value"],
      "rows": [
        ["0", "4.565751263925242e-06", "18", "0.6417572172915128", "2", "1.0000"],
        ["1", "1.7003557680081774e-05", "6", "0.5791602253545789", "2", "1.0000"],
        ["2", "6.981284268085946e-06", "13", "0.47444889629309955", "1", "1.0000"],
        ["3", "6.801683402386447e-06", "21", "0.4066896735271678", "2", "1.0000"],
        ["5", "1.8975639033091587e-05", "14", "0.4435692137254761", "2", "1.0000"]
      ]
    }
  ],
  "platform_version": "23.05.0",
  "ibformers_version": "2.1.0",
  "train_count": {
    "W2": 10,
    "ADP": 14
  },
  "train_total": 24,
  "train_datasets": ["Paystubs"],
  "test_count": {
    "W2": 4,
    "ADP": 3
  },
  "test_total": 7,
  "test_datasets": ["Paystubs"],
  "hyperparams": {
    "model_name": "layoutlm-base-uncased",
    "model_source": "instabase",
    "task_name": "SEQUENCE_CLASSIFICATION",
    "batch_size": 4,
    "gradient_accumulation_steps": 1,
    "max_length": 512,
    "chunk_overlap": 64,
    "learning_rate": 5e-05,
    "lr_scheduler_type": "constant_with_warmup",
    "use_mixed_precision": true,
    "loss_type": "ce_ins",
    "num_train_epochs": 5,
    "task_type": "classification",
    "npages_to_filter": 20,
    "class_weights_ins_power": 0.2,
    "do_hyperparam_optimization": true,
    "do_calibration": true,
    "class_weights_ins_power": 0.3,
    "hp_search_num_trials": 20,
    "calibration_model": "PlattScalingCalibrationModel",
    "hp_search_param_space": []
  }
}
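In this example, each confusion matrix row sums to that class's Support in the record-wise table (W2: 5 + 1 = 6; ADP: 0 + 4 = 4), which suggests that rows are true classes and columns are predicted classes. Assuming that orientation, per-class precision and recall can be recomputed from the matrix as a consistency check. The following is a minimal sketch; per_class_precision_recall is a hypothetical helper name.

```python
from typing import Dict, List, Tuple


def per_class_precision_recall(headers: List[str], rows: List[list]) -> Dict[str, Tuple[float, float]]:
    """Compute (precision, recall) per class from a confusion matrix table.

    Assumes rows are true classes and columns (headers[1:]) are predicted classes.
    """
    classes = headers[1:]
    # counts[true_class][predicted_class] -> number of records
    counts = {row[0]: dict(zip(classes, row[1:])) for row in rows}
    result = {}
    for c in classes:
        true_positives = counts[c][c]
        predicted_as_c = sum(counts[t][c] for t in counts)  # column sum
        actually_c = sum(counts[c].values())  # row sum
        precision = true_positives / predicted_as_c if predicted_as_c else 0.0
        recall = true_positives / actually_c if actually_c else 0.0
        result[c] = (precision, recall)
    return result
```

For the matrix above, this yields W2 precision 1.0 and recall 5/6, and ADP precision 0.8 and recall 1.0. Formatted with f'{value:.2%}', these match the 100.00%, 83.33%, 80.00%, and 100.00% values in the Record-wise Classifier Metrics table.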

Example for an extraction job

Request

import requests

# Assumes url_base (your Instabase instance URL) and token (your API access token) are already defined.
model_project_path = 'user1/my-repo/fs/Instabase Drive/W2ExtractionModel'
job_id = 'dc991a28-bd92-47bb-ajj8-e0f2ca930e38'
job_type = 'training'
url = url_base + '/api/v2/model/metrics'

# Pass the query parameters through params so requests URL-encodes them
# (the project path contains slashes and a space).
params = {
  'model_project_path': model_project_path,
  'job_id': job_id,
  'job_type': job_type,
}

headers = {
  'Authorization': 'Bearer {0}'.format(token)
}

# Send a single request; the response schema has no status or state fields to poll on.
r = requests.get(url, headers=headers, params=params)
r.raise_for_status()  # Raise an exception on 4XX/5XX responses
resp = r.json()

print('Request finished!')
print(resp)

Response

{
  "metrics": [
    {
      "title": "Individual fields level metrics",
      "subtitle": "Accuracy scores for individual fields learned by the model",
      "headers": [
          "Field Name",
          "Precision",
          "Recall",
          "F1 Score",
          "Support",
          "ECE"
      ],
      "rows": [
          [
              "Micro Average",
              "83.33%",
              "83.33%",
              "83.33%",
              12,
              "N/A"
          ],
          [
              "Macro Average",
              "83.33%",
              "83.33%",
              "83.33%",
              "N/A",
              "N/A"
          ],
          [
              "Gross Pay",
              "83.33%",
              "83.33%",
              "83.33%",
              6,
              "0.166"
          ],
          [
              "Pay Date",
              "83.33%",
              "83.33%",
              "83.33%",
              6,
              "0.223"
          ]
      ]
    },
    {
      "title": "Individual token level metrics",
      "subtitle": "Accuracy scores for individual fields learned by the model on token level",
      "headers": [
          "Field Name",
          "Precision",
          "Recall",
          "F1 Score",
          "Support"
      ],
      "rows": [
          [
              "Micro Average",
              "100.00%",
              "92.59%",
              "96.15%",
              27
          ],
          [
              "Macro Average",
              "100.00%",
              "90.36%",
              "94.87%",
              "N/A"
          ],
          [
              "Gross Pay",
              "100.00%",
              "95.00%",
              "97.44%",
              20
          ],
          [
              "Pay Date",
              "100.00%",
              "85.71%",
              "92.31%",
              7
          ]
      ]
    },
    {
      "title": "Train-Validation Curve",
      "subtitle": "",
      "headers": ["Dataset Split", "Epoch 1", "2", "3"],
      "rows": [
        ["Train", 0.54, 0.21, 0.15],
        ["Eval", 0.25, 0.31, 0.29]
      ],
      "show": false
    },
    {
      "title": "Hyperparameter search results",
      "subtitle": "Top 5 runs from hyperparameter search, sorted by the objective value on validation dataset",
      "headers": ["Trial number", "learning_rate", "num_train_epochs", "class_weights_ins_power", "gradient_accumulation_steps", "Objective value"],
      "rows": [
        ["0", "4.565751263925242e-06", "18", "0.6417572172915128", "2", "1.0000"],
        ["1", "1.7003557680081774e-05", "6", "0.5791602253545789", "2", "1.0000"],
        ["2", "6.981284268085946e-06", "13", "0.47444889629309955", "1", "1.0000"],
        ["3", "6.801683402386447e-06", "21", "0.4066896735271678", "2", "1.0000"],
        ["5", "1.8975639033091587e-05", "14", "0.4435692137254761", "2", "1.0000"]
      ]
    }
  ],
  "platform_version": "23.05.0",
  "ibformers_version": "2.1.0",
  "train_count": {
    "W2": 10
  },
  "train_total": 10,
  "train_datasets": ["Paystubs"],
  "test_count": {
    "W2": 4
  },
  "test_total": 4,
  "test_datasets": ["Paystubs"],
  "hyperparams": {
    "model_name": "instalm-base-draft",
    "model_source": "instabase",
    "task_name": "TOKEN_CLASSIFICATION",
    "batch_size": 4,
    "gradient_accumulation_steps": 1,
    "max_length": 512,
    "chunk_overlap": 64,
    "learning_rate": 5e-05,
    "lr_scheduler_type": "constant_with_warmup",
    "use_mixed_precision": true,
    "loss_type": "ce_ins",
    "num_train_epochs": 5,
    "task_type": "classification",
    "npages_to_filter": 20,
    "class_weights_ins_power": 0.2,
    "do_hyperparam_optimization": true,
    "do_calibration": true,
    "class_weights_ins_power": 0.3,
    "hp_search_num_trials": 20,
    "calibration_model": "PlattScalingCalibrationModel",
    "hp_search_param_space": []
  }
}
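The extraction tables report metrics as percentage strings and mix per-field rows with aggregate rows such as Micro Average and Macro Average. The following is a minimal sketch that parses the F1 Score column and lists the fields with the lowest F1, skipping the aggregate rows. parse_percent and weakest_fields are hypothetical helper names, and the sketch assumes the Field Name and F1 Score column names shown in the example above.

```python
from typing import Any, Dict, List, Tuple

# Aggregate row labels as they appear in the extraction example.
AGGREGATE_ROWS = {'Micro Average', 'Macro Average'}


def parse_percent(value: str) -> float:
    """Convert a string such as '83.33%' to a fraction such as 0.8333."""
    return float(value.rstrip('%')) / 100


def weakest_fields(table: Dict[str, Any], n: int = 3) -> List[Tuple[str, float]]:
    """Return up to n (field name, F1) pairs with the lowest F1, skipping aggregate rows."""
    headers = table['headers']
    name_idx = headers.index('Field Name')
    f1_idx = headers.index('F1 Score')
    fields = [
        (row[name_idx], parse_percent(row[f1_idx]))
        for row in table['rows']
        if row[name_idx] not in AGGREGATE_ROWS
    ]
    return sorted(fields, key=lambda item: item[1])[:n]
```

Applied to the Individual token level metrics table above, weakest_fields lists Pay Date (F1 of about 0.9231) before Gross Pay (about 0.9744).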