Glossary

This glossary defines terms specific to Instabase. For definitions of general deep learning terms, refer to the Google Machine Learning Glossary.

A

annotation

The process of manually identifying a document’s class, marking the data to be extracted, or both, for the purpose of training a model.

annotation set

A collection of documents to which you’ve applied metadata indicating document class and/or data to be extracted.

C

classification

The process of identifying a document’s type.

D

dataset

See: annotation set

drive

The physical or cloud location where Instabase files and artifacts are stored. Your configuration specifies an Instabase drive, which is the default file storage location for your deployment; however, you can add more drives within subspaces.

E

entity

A visual element in a document, such as a checkbox or signature.

extraction

The process of finding and extracting specified data from documents.

F

flow

A sequence of Instabase modules that perform repetitive processes on similar documents.

Flow

The Instabase app used to design and test flows.

Flow pipelines

Workstreams used to group and assign flow reviews according to your organization’s security and privacy requirements.

Flow Review

The Instabase app used to manage and conduct human reviews of flow jobs.

H

human review

The process of manually checking a model’s accuracy at assigned tasks. Human reviews are managed and conducted within the Flow Review app.

I

IBDOC (Instabase Document)

The standard file format that represents the output of the process files step, including text content, OCR confidence, and any text fields refined from the document.

IBML (Instabase Markup Language)

The application-agnostic markup language that records information about documents and their contents, used by Instabase apps to exchange data.

M

ML Studio

The Instabase app used to create and train models.

model

The underlying technology that enables document understanding in Instabase. In ML Studio, you create classification or extraction models customized to address specific use cases.

Training an ML Studio model involves teaching an existing deep learning model to better classify or extract data from your specific annotation set. You begin model training from one of two starting points: a base model, provided by Instabase or another provider, or a Marketplace model, which is a crowdsourced ML Studio model trained on similar data and published to the Instabase Marketplace.

model project

See: model

P

project

A set of files and directories on the Instabase file system. These files define Instabase components and hold data.

provenance

The origin of an object. In the Instabase platform, provenance tracking identifies where in the input a given output came from.

R

Refiner

The Instabase app used to programmatically extract data from documents, or to clean up extracted data.

S

space

The top level of directory organization in the Instabase file system. There are two types of spaces: user spaces, each owned and managed solely by a single user, and organization spaces, which can have multiple managers and which host shared data. Spaces can be divided into subspaces.

solution

An Instabase flow that’s compiled into a form that can be run in production or shared with others.

Solution Builder

The Instabase app used to design and test solutions.

subspace

The secondary level of directory organization within the Instabase file system. Subspaces are grouped within spaces and can have multiple drives assigned.