23.04 Release notes

Instabase 23.04 is a major release that introduces new features, enhancements, and bug fixes.

Release 23.04.57

This patch contains testing, optimizations, and other minor internal changes. User functionality is unchanged.

Release 23.04.56

This patch contains testing, optimizations, and other minor internal changes. User functionality is unchanged.

Release 23.04.55

In ML Studio, the field list failed to load in certain circumstances.

Release 23.04.54

This patch contains testing, optimizations, and other minor internal changes. User functionality is unchanged.

Release 23.04.53

This patch contains testing, optimizations, and other minor internal changes. User functionality is unchanged.

Release 23.04.52

This patch contains testing, optimizations, and other minor internal changes. User functionality is unchanged.

Release 23.04.51

All v2 API endpoints now return a 401 status code for authorization errors and a 403 status code for license errors. Previously, v2 API errors returned a 200 status code with an error message in the response body. See API errors for more information, including v1 API error responses.
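
Client code that previously inspected the response body of a 200 response for errors can now branch on the status code instead. The following is a minimal sketch of that branching, not Instabase client code: the `V2ApiOutcome` enum and the `classify_v2_status` function are hypothetical names introduced for illustration, and treating any 2xx code as success is an assumption.

```python
from enum import Enum


class V2ApiOutcome(Enum):
    """Coarse outcome categories for a v2 API response (hypothetical names)."""
    OK = "ok"
    AUTHORIZATION_ERROR = "authorization_error"
    LICENSE_ERROR = "license_error"
    OTHER_ERROR = "other_error"


def classify_v2_status(status_code: int) -> V2ApiOutcome:
    """Map an HTTP status code from a v2 endpoint to an outcome category."""
    if status_code == 401:
        # Documented in this release: authorization errors return 401.
        return V2ApiOutcome.AUTHORIZATION_ERROR
    if status_code == 403:
        # Documented in this release: license errors return 403.
        return V2ApiOutcome.LICENSE_ERROR
    if 200 <= status_code < 300:
        # Assumption: any 2xx status code means the call succeeded.
        return V2ApiOutcome.OK
    return V2ApiOutcome.OTHER_ERROR
```

A caller might refresh credentials on `AUTHORIZATION_ERROR` and alert an administrator on `LICENSE_ERROR`, rather than parsing an error message out of the response body.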

Release 23.04.50

This patch contains testing, optimizations, and other minor internal changes. User functionality is unchanged.

Release 23.04.49

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.48

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.47

Table editor cells and validations for the field within the table editor did not update correctly when cells were edited to fix validations.

Release 23.04.46

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.45

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.44

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.43

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.42

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.41

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.40

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.39

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.38

Digitization of rich text format (.rtf) files is now supported.
The job service sometimes crashed under large loads.

Release 23.04.37

During digitization, Reader and the process files step in Flow now automatically size table columns to prevent truncation when converting CSV to PDF.

Release 23.04.36

You can now process large TIFFs up to 150 pages in Instabase. To configure this functionality, set the following environment variables in celery-app tasks: MAGICK_MEMORY_LIMIT, MAGICK_MAP_LIMIT, and MAGICK_DISK_LIMIT.

Release 23.04.35

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.34

You can now extract barcode information without doing text extraction during the process files step.

To skip text extraction, set skip_text_extraction in the OCR settings to true. This will have the following effects:
- No text present in the generated ibdoc.
- The document layout is altered.
- The flow runtime is significantly reduced.
Trace was not working correctly in the Flow Dashboard.

Release 23.04.33

A new run sync endpoint ({URL-BASE}/api/v1/solution/run_sync) in the Marketplace and Solution API lets you run a solution in sync mode.

Release 23.04.32

Split classifier steps now function correctly when preceded by more than one process files step.

Release 23.04.31

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.30

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.29

When choosing files, the browser might crash if the file was too large to display a preview.
Automation metrics incorrectly assumed that if a validation rule was configured for a document, the field must exist. With this fix, automation metrics also check that the field exists in that document.

Release 23.04.28

Under certain circumstances, the file service crashed when copying or moving files to a destination in an encrypted drive.
Restricted file extensions are now case insensitive.
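
As an illustration of what case-insensitive matching means here, the sketch below normalizes both the configured extensions and the file's extension to lowercase before comparing. It is not the platform's implementation: the function name `is_restricted` and the idea of passing the restricted list as a Python collection are assumptions.

```python
import os
from typing import Iterable


def is_restricted(filename: str, restricted_extensions: Iterable[str]) -> bool:
    """Return True if the file's extension is in the restricted list, ignoring case."""
    # Normalize configured extensions: lowercase, without a leading dot.
    normalized = {ext.lower().lstrip(".") for ext in restricted_extensions}
    # splitext returns the last extension, including its leading dot.
    _, ext = os.path.splitext(filename)
    return ext.lower().lstrip(".") in normalized
```

With this kind of check, `report.EXE` and `report.exe` are treated the same when `exe` is restricted.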
Cell-level validations were not highlighted in Flow Review for extracted tables and extracted table lists.

Release 23.04.27

This patch contains only internal changes and testing and does not change functionality for users.

Release 23.04.26

Improved Refiner performance when returning extracted table lists.
File retention jobs were not correctly purging files after 60 days.
Under certain circumstances, the model training and output set creation portions of the Solution Builder onboarding tutorial would break.

Release 23.04.25

Licenses now update automatically for multi-year contracts in which a new license is required each year.

Release 23.04.24

In Cloud Console, a “file not found” error returned a generic 500 internal service error. This has been fixed to return a 404 file not found error.

Release 23.04.23

In Solution Builder, an error in retrieving a trained model might occur when you tried to create a refiner from a table extraction model.
In ML Studio, when a table entity was converted into an annotation, the words in each cell weren’t correctly populated.
Uploading files with certain attachments produced errors and halted the upload process. Now, errors are logged and the upload process continues with the remaining documents.

Release 23.04.22

Models imported from Marketplace that included split classification failed to produce output for the apply classifier step.
Nested field configurations weren’t allowed for extracted table lists.

Release 23.04.21

After copying a validation rule and performing a field swap, the validations card in Solution Builder displayed the same affected fields for both the original rule and the copied rule.
Invalid fields in Flow Review didn’t include an error message in certain circumstances.

Release 23.04.20

Ground truth sets were limited to 20 sets. Now, ground truth sets are paginated to display all available sets.

Release 23.04.0

New features

Platform

The Instabase software as a service (SaaS) offering is now generally available. Instabase SaaS is a fully managed, cloud-based installation, which provides enterprise customers access to the entire Instabase platform without needing to procure, configure, and manage complex infrastructure. To learn more, see the Instabase SaaS documentation.
The Instabase Cloud Console is now available to all SaaS customers. Cloud Console provides a centralized location to view and manage all SaaS deployments in your installation, including viewing detailed deployment health information on the deployment status dashboards. To learn more, see the Cloud Console documentation.
Public preview | This release introduces model training using Ray. Ray, a framework for scaling applications such as machine learning, provides better out-of-memory prevention, more detailed error messages, and improved observability on model training metrics, including GPU usage. These enhancements provide greater visibility and control over the model training process.
- Observability improvement: A new Grafana dashboard provides introspection into the state of model training workloads and hardware utilization statistics, including node count, node-based CPU utilization, memory consumption, GPU utilization, and GPU memory consumption. For each diagram in the dashboard, you can access comprehensive descriptions of the associated metrics by clicking the exclamation mark in the top left corner.
- Debuggability enhancement: Model training jobs in ML Studio are now easier to debug. Error messages are more specific and include information about node crashes, out-of-memory errors, and GPU out-of-memory errors, which should make it easier to identify underlying issues.
- Out-of-memory prevention: Ray features a memory monitor that protects against excessive memory usage during training jobs. When a training job uses more than 95% of the available memory on a node, the memory monitor terminates the job to protect the worker node, and an OutOfMemoryError is surfaced in ML Studio. If a task exhausts all available memory and the worker node crashes as a result, Ray’s memory monitor cannot terminate the task itself. In this scenario, Ray throws a WorkerCrashedError to indicate that the worker node has crashed and automatically restarts the worker.
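
The 95% rule described in the out-of-memory item can be expressed as a simple threshold check. The sketch below only illustrates that arithmetic; it is not Ray's memory monitor, and the function name and parameters are hypothetical.

```python
# Threshold taken from the release note: more than 95% of a node's memory.
MEMORY_USAGE_THRESHOLD = 0.95


def exceeds_memory_threshold(
    used_bytes: int,
    total_bytes: int,
    threshold: float = MEMORY_USAGE_THRESHOLD,
) -> bool:
    """Return True when memory usage on a node is above the threshold fraction."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return used_bytes / total_bytes > threshold
```

For example, a job using 62 GiB on a 64 GiB node is at roughly 96.9% and would be above the threshold, while a job using 56 GiB (87.5%) would not.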
This release introduces service accounts, which facilitate integrating Instabase with other platforms. You can use OAuth apps and tokens with service accounts, and grant them a range of privileges. Manage service accounts at Admin > Service Accounts.
You can now require users to authenticate using an authenticator app. To enforce authenticating with an app, at Admin > Configuration, enable Require Time-Based One-time Passcode (TOTP).
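
For background, authenticator apps compute time-based one-time passcodes with the standard algorithm from RFC 6238. The sketch below implements that standard with HMAC-SHA1 and a 30-second time step, the defaults most authenticator apps use. It is a reference illustration only, not Instabase's implementation, and the function name `totp` is chosen for this example.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute an RFC 6238 time-based one-time passcode using HMAC-SHA1."""
    if for_time is None:
        for_time = time.time()
    # Number of whole time steps since the Unix epoch.
    counter = int(for_time // step)
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low 4 bits of the last byte pick an offset.
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    # Keep the requested number of decimal digits, left-padded with zeros.
    return str(code % 10 ** digits).zfill(digits)
```

Because the server and the authenticator app share the secret and derive the counter from the current time, both sides compute the same code without exchanging it in advance.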
Google Cloud Storage file-client support is now available to all Instabase customers. You can use Google Cloud Storage as your primary storage system backing Instabase Drive, if you host the platform yourself. All customers, including SaaS customers, can also mount Google Cloud Storage drives to subspaces.
Customers running SaaS deployments can now set user-defined file retention rules to schedule automatic purges of files stored on the Instabase Drive. To learn more, see the file storage section of the site settings documentation.
An XSS risk affecting SaaS deployments is now mitigated by the use of pre-signed URLs for all three cloud storage providers (S3, Azure Blob Storage, and Google Cloud Storage) that Instabase supports. Opening PNG, JPEG, or PDF files from Instabase’s file explorer now loads the content directly from the cloud host using a pre-signed URL. Each link is valid for 5 minutes. This feature ensures that Instabase’s file system serves content from a host that differs from the platform host.

Infrastructure

Public preview | Pod-to-pod mTLS (Mutual Transport Layer Security) is now available. This can be used to secure traffic in and out of your pods in Instabase, with some ports excluded. To learn more, see the pod-to-pod mTLS documentation.

Deployment Manager

You can now complete version upgrades from the new Deployment Manager Upgrades tab. The improved upgrades process offers more guardrails, automatic database migration, and built-in health checks, among other features. See the upgrades documentation to learn more.
Deployment Manager has an updated design and improved usability, including a collapsible sidebar, tab-based navigation, and sortable tables.
Deployment Manager now includes a platform status dashboard that displays the status of major Instabase systems, so that you can see whether the required systems are running as expected.
A new extra large cluster size deploys a 256 CPU core cluster, in addition to the previous standard and large sizes. Additionally, you can now change your cluster size, which correctly sets replica counts, vertically scales singleton services, and patches environment variables based on the size.
Database migrations are now fully integrated into Deployment Manager. All schema changes are automatically applied during new installation or upgrades, and there’s no need to run a separate dbupdate job prior to deployment.

Solution Builder

Solution Builder is now generally available, and includes numerous UX improvements over the previous release based on feedback from public preview. With Solution Builder, you can more easily build, train, and debug complete data processing solutions from a centralized project workspace.

Low-code tools and shortcuts available from Solution Builder streamline uploading and annotating documents, training models, refining and validating data, and creating an automated flow. Solution Builder provides you with the right artifacts at the right time, so that you no longer have to organize your files and modules manually in the filesystem.

Get started with Solution Builder by importing a pre-built solution from the Marketplace, or use the Solution Builder guide to build your own solution.
You can now generate metrics about data accuracy in flows using a ground truth set, which uses corrected data from Flow Review to establish a standard of accuracy for the flow. After configuring a ground truth set, you can use Solution Builder to review, analyze, or save accuracy reports, which compare results to the ground truth set.

Flow

You can now run load tests on flows within Instabase, rather than relying on external resources for load testing.
Public preview | A new setting in the Apply Checkpoint step supports optional straight-through processing, so individual records that pass validation continue to the next flow step regardless of validation failures in the rest of the batch.
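
Conceptually, straight-through processing evaluates each record on its own instead of holding back the whole batch when any record fails. The sketch below illustrates that routing under assumed names: `route_records`, the record dictionaries, and the `passes_validation` callback are hypothetical and are not part of the Apply Checkpoint step's configuration.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

Record = TypeVar("Record")


def route_records(
    records: Iterable[Record],
    passes_validation: Callable[[Record], bool],
) -> Tuple[List[Record], List[Record]]:
    """Split a batch into records that continue and records held back."""
    continue_to_next_step: List[Record] = []
    held_back: List[Record] = []
    for record in records:
        # Each record is judged individually; one failure does not hold the batch.
        if passes_validation(record):
            continue_to_next_step.append(record)
        else:
            held_back.append(record)
    return continue_to_next_step, held_back
```

In a batch of three records where one fails validation, the other two continue to the next flow step while the failing record is held back.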

Flow Review

When correcting extracted data in Flow Review, you can select the appropriate value directly in the document preview. Using this method to correct extracted data specifies provenance for the field, which can improve accuracy. This functionality, previously announced as lasso mode in public preview, is now enabled by default for all users. You can click to select values, or use your mouse to draw a box around the information. Enable Toggle bounding boxes to view selectable text. Direct selection of values is supported only for text fields with the text, image, float, or integer output type.
Inline editing lets you edit TEXT, INT, and FLOAT fields in document view, through an inline editor that displays alongside the provenance of that field. Additionally, when you select a field to edit from the right-hand pane, Flow Review automatically zooms into the extracted provenance for that field and opens the inline editor. You can enable the inline editor in the Options menu in the Flow Review fields panel.
Flow Review now displays different alert levels for all validation errors:
- Critical - This is a warning that must be reviewed before continuing the Flow.
- Failure - This is the typical alert level.
- Warning - This alert level is the least severe and is indicated as such in the UI.

ML Studio

Public preview | Model pruning improves inference performance and resource usage in your environments.

Large models can be slow to run and difficult to move between environments. You can now run a pruning job to reduce the size of a published model, allowing faster inference and lower resource usage.

When running a pruning job, set target_sparsity to specify how much to reduce the model size. See model pruning documentation for details.
Public preview | ML Studio now supports the Text (multiple instances) field. This field allows you to annotate all instances of a given piece of information that might exist in a single document, such as a name or a phone number. Annotating all instances of a single piece of information improves model accuracy.

The annotation UI for the Text (multiple instances) type is the same as for “List” fields. Both field types allow you to have multiple items annotated for a single field. However, these field types behave somewhat differently:
- Accuracy metrics: For “List” fields, all annotation items must be in the prediction items for the prediction to be considered correct. For “Text (multiple instances)” fields, the highest confidence prediction item is checked as to whether it matches any annotation item.
- Extracted values display in Refiner: For “List” fields, the entire list is displayed. For “Text (multiple instances)” fields, only the highest confidence item is displayed.
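
The difference between the two accuracy rules can be sketched as follows. This is an illustration of the rules as stated above, not ML Studio's metric code: the function names are hypothetical, predictions for the multiple-instances case are modeled as (value, confidence) pairs, and returning False when there are no predictions is an assumption.

```python
from typing import Iterable, Sequence, Tuple


def list_field_correct(annotations: Iterable[str], predictions: Iterable[str]) -> bool:
    """List fields: every annotated item must appear among the predicted items."""
    return set(annotations) <= set(predictions)


def multi_instance_correct(
    annotations: Iterable[str],
    predictions: Sequence[Tuple[str, float]],
) -> bool:
    """Text (multiple instances): the highest-confidence prediction must match any annotation."""
    if not predictions:
        # Assumption: with no predictions, the field is counted as incorrect.
        return False
    top_value, _confidence = max(predictions, key=lambda item: item[1])
    return top_value in set(annotations)
```

For a document annotated with two phone numbers, a List field prediction that finds only one of them is incorrect, while a Text (multiple instances) prediction whose top-confidence item is either number is correct.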
You can now incrementally train, evaluate, or prune any previously trained model, including unpublished models.

For each job type, the following model types are supported:
- Incremental training: Extraction, classification, or split classification.
- Evaluation: Extraction, classification, or split classification.
- Pruning: Extraction.
For incremental training in Solution Builder, you can select unpublished models only from a different model project, which prevents overfitting due to training on the same data twice.
The ibformers v2.1.0 package, shipped with Instabase 23.04, includes hyperparameter information that improves the incremental training and evaluation experience. Now, model accelerators trained on older ibformers versions, which contain outdated model artifacts and hyperparameters, can be incrementally trained and evaluated with the newest ibformers version. Additionally, the original base model for model accelerators can be inferred, and original hyperparameters can be overwritten by recommended defaults.
This release removes the model service dependency on persistent volumes. The model service now stores models on a local volume and integrates that storage with request routing. This change facilitates workload distribution across different model service pods and eliminates the need to manage a multi-writer persistent volume.

Marketplace

Public preview | A consolidated library of solution accelerators, available in the Instabase Marketplace, enables you to fast-track solution development for common document processing tasks. Solution accelerators include full-fledged solutions that address particular use cases, Marketplace models that are pre-trained on specific document types, and developer packages with a variety of low-code and pro-code functions and libraries.

Warning

ACTION REQUIRED: If you previously published content to Marketplace or Developer Exchange, you must update legacy content to add new metadata keys.
In support of solution accelerators, more robust administrative capabilities are now available in Marketplace Admin, most notably the ability to publish solutions, models, and developer packages, and the ability to edit package and version metadata.

Deployed Solutions

Public preview | You can now view automation metrics in a new Deployed Solutions app. Use the app to add new deployed solutions, manage versions, and view automation metrics. Metrics include key performance indicators such as number of pages processed, percent of records passing all validations, and solution effectiveness as it relates to human reviews. Additionally, new APIs are available for interacting with deployed solutions and automation metrics. Currently, you can run deployed solutions only with an API call.

Enhancements

Platform

The All Users page is now hidden for users without site admin or Manage Users privileges. Additionally, the page now displays only active users by default. You can view inactive users by deselecting Show active users.
To improve observability, Jaeger tracing has been extended to include database calls.

Flow

This release adds support for a stricter Content-Security-Policy header. To enable this functionality, turn on the ENABLE_CSP environment variable in the webapp and apps-server services.
You can now chunk input files for the apply refiner and run model extraction Flow steps. Chunking breaks each input file into smaller chunks at the record level, and then processes them in parallel for better performance and improved system utilization.

To enable chunking for these steps, set the ENABLE_APPLY_REFINER_CHUNKING and ENABLE_RUN_MODEL_EXTRACTION_CHUNKING environment variables to True.
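
The general pattern behind chunking, splitting records into fixed-size groups and processing the groups in parallel, can be sketched as follows. This is a simplified illustration using a thread pool, not the Flow implementation: the function names, the chunk size, and the worker count are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def chunk_records(records: Sequence[T], chunk_size: int) -> List[Sequence[T]]:
    """Break a sequence of records into consecutive chunks of at most chunk_size."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def process_in_chunks(
    records: Sequence[T],
    process_chunk: Callable[[Sequence[T]], List[U]],
    chunk_size: int = 2,
    max_workers: int = 4,
) -> List[U]:
    """Process chunks in parallel and return results in the original record order."""
    chunks = chunk_records(records, chunk_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in the order the chunks were submitted.
        chunk_results = list(pool.map(process_chunk, chunks))
    return [item for result in chunk_results for item in result]
```

Because results are gathered in submission order, the combined output keeps the same record order as the input even though chunks can finish at different times.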

Flow Review

Flow Review performance was improved for larger documents by changing how results are matched with extracted OCR words.
The change history for nested field types, including lists, tables, extracted tables, extracted table lists, and dicts, now describes the change rather than simply showing the previous value.

ML Studio

You can improve list annotations using annotation analysis results on list, text, and text (multiple instances) fields. Previously, this feature was available only on text fields.
Previously, when you added a file to an annotation set, such as when you redigitized the file with different Reader or OCR settings, the annotations for each field and record might be deleted. Now annotations are updated so that they are valid with respect to the updated file. You can choose to accept or reject the updates in ML Studio’s annotation view.

Reader

The digitization step now supports barcode and QR code detection with any OCR engine. Enable this feature by enabling the barcode model in the entities section of a Reader profile.
The digitization step now supports splitting strings based on character type, so that letters, numbers, and special characters are parsed separately. For example, Name:John is split into Name, :, and John. Enable this capability in OCR configuration settings by selecting Split Concatenated Strings.
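
The splitting behavior in the example can be approximated with a regular expression that groups consecutive letters, consecutive digits, and consecutive special characters. This sketch only illustrates the idea and is not Reader's implementation: the function name is hypothetical, and restricting letters to ASCII and dropping whitespace are assumptions.

```python
import re
from typing import List

# Runs of ASCII letters, runs of digits, or runs of other non-space characters.
_CHARACTER_TYPE_RUNS = re.compile(r"[A-Za-z]+|\d+|[^A-Za-z\d\s]+")


def split_by_character_type(text: str) -> List[str]:
    """Split a string into runs of letters, digits, and special characters."""
    return _CHARACTER_TYPE_RUNS.findall(text)
```

Applied to the string from the example, `split_by_character_type("Name:John")` separates the label, the colon, and the value into three parts.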
This release modifies digitization processes to make input and output file names consistent and uniform across all apps. In addition to output ibmsg and ibdocs, generated output PDFs, texts, and images are given names consistent with the input file name. For example, in previous releases, an input file a.xlsx generated the output file a.ibmsg. With this change, the output file is now a.xlsx.ibmsg.

Warning

ACTION REQUIRED: If your solution uses custom code to construct output file names based on input file names, you must update your code to accommodate the new file name format.
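
If your custom code derives output file names, the change amounts to appending the output extension to the full input file name instead of replacing the input extension. The sketch below shows both behaviors based on the a.xlsx example above. The function names are hypothetical, and the legacy rule is inferred from that single example, so verify it against your own files.

```python
import os


def legacy_output_name(input_name: str, output_ext: str = "ibmsg") -> str:
    """Pre-23.04 behavior as inferred from the example: replace the input extension."""
    stem, _ = os.path.splitext(input_name)
    return f"{stem}.{output_ext}"


def output_name_23_04(input_name: str, output_ext: str = "ibmsg") -> str:
    """23.04 behavior: keep the full input file name and append the output extension."""
    return f"{input_name}.{output_ext}"
```

Code that previously looked for a.ibmsg after processing a.xlsx would need to look for a.xlsx.ibmsg instead.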
This release offers faster digitization with added support for chunking.

Bug fixes

Platform

New users weren’t prompted to update their password after initial logon.
Customers using Microsoft Authenticator for MFA couldn’t use separate tokens for multiple environments because all environments were assigned the name Instabase. With this fix, each environment is assigned an arbitrary UUID suffix.
Subspaces didn’t load in space selector dropdowns under certain circumstances.

Flow

Jobs on the Flow Dashboard disappeared after they were finished.
After you reclassify a record, you can apply either a class schema that is a combination of all previous extraction steps (by default) or the class schema of the most recently run extraction step. Extraction steps are any apply refiner step or run extraction model step in the flow. The record’s fields are replaced by the fields from the new schema.
Digitization profiles were sometimes created with a default name instead of the specified name.

Flow Review

Flow Review might crash if the schema generated from the flow and the results of a record did not align.
Dictionary fields now autosave when you click away from the dictionary editor.
If you change classes in Flow Review, the record’s results now use the schema from the most recently run extraction step for the class instead of the union of all schema steps.
New field names in Flow Review did not follow the same conventions used in Refiner.
When moving between pages of records in the file list in Flow Review, the “Only Errors” filter did not persist.

ML Studio

Annotations could be lost when you moved or exported them.
Bulk deletion of files in the annotation view was not possible.
If you tried to add files consisting of blank pages or pages containing only an image to an annotation set, ML Studio incorrectly returned an error.
This release fixes issues related to filtering records in the annotation view and test records view. If you’ve selected a record and then filtered that record out, the first record shown in the filtered file list is selected. Additionally, in the test records view, selected class filters are correctly applied as new files are loaded.
Model artifacts resulting from annotation analysis jobs could be published to Marketplace or used in solutions. These model artifacts are meant to be used only for annotation assist.
Erroneous dataset modification warnings appeared if data was stored on NFS drives or if users were editing different datasets in the same model.
When you use the keyboard shortcut (L) to create a new list item for a field, the newly created list item is now selected by default.

Refiner

A refiner with more than one dropdown output field incorrectly contained the same value in all dropdown output fields.
When running flows with an extraction model step, an incorrect FieldExtraction error occurred.

Deployment Guide

Release 23.04 introduces the new Deployment Manager Upgrades tab; using the Upgrades tab is now the only way to complete a version upgrade.

Before upgrading to version 23.04, ensure that Deployment Manager is running version 23.04.0 or later. To check your Deployment Manager version, use the Version tab (All apps > Deployment Manager > Version).

To update Deployment Manager:
1. Unzip the installation.zip file for the new release, provided by your customer success manager.
2. On the command line, navigate to the unzipped installation folder.
3. Apply the new Deployment Manager configuration file contained within by running the following command: kubectl -n $IB_NS apply -f control-plane/control-plane.yml, where $IB_NS is your Instabase namespace.
When upgrading from release 22.08 or earlier to release 23.04 or later, before starting the upgrade process, you must change the default value of the ENABLE_CONTROL_PLANE_UPGRADES_ROLLBACK variable to false. If you don’t, the upgrade will fail.

Note

This requirement applies only if you are upgrading from release 22.08 or earlier. If you are upgrading from a later release, you do not need to complete the following steps.

To change the variable’s value:
1. Update Deployment Manager:
  1. Unzip the installation.zip file for the new release, provided by your customer success manager.
  2. On the command line, navigate to the unzipped installation folder.
  3. Apply the new Deployment Manager configuration file contained within by running the following command: kubectl -n $IB_NS apply -f control-plane/control-plane.yml, where $IB_NS is your Instabase namespace.
2. Run the following command: kubectl edit deployment/deployment-control-plane -n $IB_NS
3. Locate the ENABLE_CONTROL_PLANE_UPGRADES_ROLLBACK variable.
4. Set the value to "False".
5. Save your changes.
After completing your upgrade:
1. Run the following command: kubectl edit deployment/deployment-control-plane -n $IB_NS
2. Locate the ENABLE_CONTROL_PLANE_UPGRADES_ROLLBACK variable.
3. Set the value to "True".
4. Save your changes.
When upgrading from release 22.10 to 23.01 or later, flows that have extraction and refiner steps might start failing during refiner execution. To fix this:
- If your flow is using a published model, you can use the ML Studio Utilities app to fix your published models. Go to the Migrate Published Models tab and click Migrate.
  - You need ML Studio Utilities version 2.0.4 to do this. If you don’t have this version, reach out to the Instabase team.
- If your flow is using model projects (that is, unpublished models), you must fix them manually in the file system.
  1. In the file system, open the extraction module folder in your flow modules. This folder contains a JSON file that contains information about the model project. Go to the folder specified in the "model_fs_path" field.
  2. In the folder, there should be a file called package.json. Open that file and check to see if the "result_type" field has the value "ner_result". If not, edit the file so that the "result_type" field has the value "ner_result" (that is, "result_type": "ner_result",). Your flow will now work correctly.
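
If you have many model projects to fix, the manual edit to package.json can be scripted. The sketch below is one possible approach under stated assumptions: it assumes the package.json file is reachable as a path on a local or mounted file system, and the function name `ensure_ner_result_type` is hypothetical. Back up the file before modifying it.

```python
import json
from pathlib import Path
from typing import Union


def ensure_ner_result_type(package_json_path: Union[str, Path]) -> bool:
    """Set "result_type" to "ner_result" in package.json; return True if the file changed."""
    path = Path(package_json_path)
    package = json.loads(path.read_text(encoding="utf-8"))
    if package.get("result_type") == "ner_result":
        # Already correct, so leave the file untouched.
        return False
    package["result_type"] = "ner_result"
    # Other keys are preserved; only the formatting of the file may change.
    path.write_text(json.dumps(package, indent=2) + "\n", encoding="utf-8")
    return True
```

The function returns False when no change was needed, which makes it safe to run more than once against the same folder.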
In this release, the jaeger-setup container is removed from the jaeger deployment YAML file. All existing functionality for jaeger-setup has been integrated into the main jaeger container. This change affects any customers using an external ElasticSearch cluster or customers who have modified the default jaeger-setup environment variables. After upgrading to 23.04, any environment variables previously existing in the jaeger-setup container must be defined identically in the jaeger container.

Deprecations and removals

Developer Exchange is deprecated in this release, because developer packages are now included in the new consolidated Marketplace.
Training in the Classifier app is scheduled for deprecation in 23.07 and removal in 24.01 or later. Use ML Studio to train classifiers instead. You can still use the Classifier app to write custom code classifiers that don’t require training, such as heuristic-based classifiers.
A planned Python upgrade in 23.07 might result in breaking errors in classifiers trained with the Classifier app.

Warning

ACTION REQUIRED: Train classifiers using ML Studio or convert the model to ONNX before upgrading to 23.07.
Licenses for MSFT v2.0 OCR disconnected containers will expire on 5/31/23. Customers using disconnected containers are directed to change to MSFT v3.0 or MSFT v2.0 connected containers prior to the expiration date. Failure to update licenses before the 5/31 expiration date will cause solutions to stop working in production.
Jupyter Notebook is deprecated in 23.04 and planned for removal in a future version of Instabase.
The api-server-apps service is deprecated in 23.04 and all functionality has been merged into the api-server service. You do not need to take any action. You might see the api-server-apps service in your clusters with 0 replicas or it might be gone altogether.