Author: Matt Weaver, Principal Engineer

Introduction

For many years, large enterprises such as AXA, NatWest, Rocket Mortgage & Standard Chartered have trusted Instabase as a leader in unstructured data transformation. By enabling these organizations to turn complex documents, emails, and web pages into valuable, structured information, Instabase has played a pivotal role in automating their critical business workflows. Building on this, Instabase is now bringing the power of Generative AI models such as GPT to the platform, together with the enterprise features needed to trust AI with the critical data in your key business processes.

In the era of Large Language Models (LLMs) such as GPT, many companies are attempting to incorporate this powerful new technology into their software. However, whilst LLMs offer tremendous value, they also come with their own set of limitations. How can we trust the outputs of these models? How can we ensure our data is private and secure?

Challenges in document processing, such as handling large datasets and very long documents, eliminating hallucinations, encoding complex layouts, and incorporating visual features such as signatures and pen marks, still persist. Merely bolting LLMs onto an existing solution is not enough to address the complexities of enterprise unstructured data. Even once these complex engineering challenges are solved, how can one trust the integrity of the outputs from these models in highly regulated environments with very little margin for error? And how can the security, privacy and integrity of the sensitive information submitted for processing be guaranteed?

To truly solve this problem, a comprehensive enterprise platform is required, encompassing multiple models, validations, refinements, security, and scalability. In this blog post, we will explore how Instabase, the leading enterprise platform for unstructured data, provides the necessary tools and features to overcome these challenges and deliver trustworthy solutions for unstructured data at scale.

The Power of LLMs for Automating Unstructured Data

Large Language Models such as GPT have undoubtedly changed the game for Intelligent Document Processing (IDP). IDP solutions that previously took weeks to configure, because of the time-consuming process of collecting document samples, annotating them, and then training and refining models, can now be built in hours or even minutes. This rapid solution development, unlocked by LLMs, enables large enterprises to drastically improve their time-to-value for each new use case they tackle.

Across the board, enterprises are already exploring how LLMs can unlock new value in all kinds of complex use cases, with a much lower technical barrier to entry and fewer specialist skills required to achieve fantastic results. When implemented correctly, and with the right controls in place, IDP represents a perfect low-risk entry point for these new models. For example, IDP tasks are typically non-judgemental (“the net pay on this paystub is X”) rather than judgemental (“this person’s personal income represents an acceptable risk for us to lend them this mortgage”), so the impact of challenges such as model bias is significantly reduced, making compliance & Model Risk Management (MRM) processes much easier to complete.

The Limitations of LLMs

LLMs, although powerful, have inherent limitations. These fall into two broad categories: reliability/accuracy, and data privacy & security.

LLMs lack the ability to understand document layouts and struggle to encode visual features, which is where so much of a document's information is stored. Moreover, token limits cap how much text a model can accept in a single request, making it impossible to unlock value from very large datasets, or very long documents such as annual reports, simply by passing them to the model. Instabase recognizes these limitations and has developed innovative solutions to overcome them. By encoding visual and layout information before querying the LLM, Instabase ensures accurate interpretation and preservation of complex document structures. This approach, coupled with the ability to switch API calls to other models, including self-hosted ones, enhances flexibility and mitigates concerns regarding cost, latency, and data security.
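This post does not describe how Instabase encodes layout internally. Purely to illustrate the general idea, the Python sketch below assumes OCR output in the form of words with bounding-box coordinates, serializes them into plain text that keeps table columns aligned, and then splits the result into chunks that fit a token budget. The `Word` type, the `encode_layout` and `chunk_by_token_budget` functions, and their parameters are hypothetical, and tokens are approximated as whitespace-separated words rather than counted with a real tokenizer.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Word:
    """One OCR'd word (hypothetical input format)."""
    text: str
    x: float  # left edge of the word's bounding box, in points
    y: float  # top edge of the word's bounding box, in points


def encode_layout(words: list[Word], line_tolerance: float = 3.0,
                  points_per_char: float = 5.0) -> str:
    """Render OCR words as plain text that roughly preserves their page positions.

    Words whose top edges lie within `line_tolerance` points of a line's first
    word share that line; each word is then padded out to a character column
    proportional to its x-coordinate, so values in the same table column line
    up vertically in the text the LLM sees.
    """
    lines: list[list[Word]] = []
    for word in sorted(words, key=lambda w: (w.y, w.x)):
        if lines and abs(word.y - lines[-1][0].y) <= line_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])

    rendered = []
    for line in lines:
        text = ""
        for word in sorted(line, key=lambda w: w.x):
            gap = int(word.x / points_per_char) - len(text)
            # Keep at least one space between adjacent words on the same line.
            text += " " * (max(1, gap) if text else max(0, gap))
            text += word.text
        rendered.append(text)
    return "\n".join(rendered)


def chunk_by_token_budget(text: str, max_tokens: int) -> list[str]:
    """Split text on line boundaries into chunks that fit a token budget.

    Tokens are approximated as whitespace-separated words (a real tokenizer
    counts differently). A single line longer than the budget is kept whole.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for line in text.splitlines():
        n = len(line.split())
        if current and count + n > max_tokens:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

For example, `chunk_by_token_budget(encode_layout(words), max_tokens=3000)` would turn a long OCR'd document into layout-preserving pieces that can each be sent to a model separately. Splitting on line boundaries keeps table rows intact, although a production system would also need a way to combine the answers from the individual chunks.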

Hence, even though most large enterprises are trying to use GPT and other LLMs, or to build their own, their use cases are still fairly restricted, primarily to individual productivity. To achieve enterprise-grade outcomes in a scalable and compliant manner, enterprises need:

  • A robust approach that combines the outputs from these models with refinement and validation capabilities.
  • A proven, highly secure and private enterprise-grade platform on which this powerful capability is available.

1. Robust Business Rules and Validations

A critical requirement for enterprise unstructured data is accurate, confident results, yet LLMs alone can hallucinate and lack consistent confidence scores, meaning they are often unaware of their own mistakes. To overcome these challenges, Instabase has developed a sophisticated trust layer around the LLM that is designed to eliminate hallucinations during data extraction, and provides a simple, no-code experience for configuring robust business rules, refinements and validations after querying any LLM.

These rules and validations can be tailored to specific business processes, ensuring the delivery of reliable insights, and enabling low-confidence data to be seamlessly passed to a human-in-the-loop for review and correction. With Instabase, identifying and rectifying incorrect data becomes easier, empowering organizations to make informed decisions based on trustworthy information.
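The post does not detail how the trust layer or rule engine works. As a minimal sketch of the general pattern, assuming the LLM has already returned field values with confidence scores, the Python example below flags any value that does not appear in the source text (a simple grounding check against hallucinated values), applies a couple of illustrative paystub rules, and routes the document to human review if any check fails. The `ExtractedField` type, the `is_grounded`, `paystub_rules` and `route` functions, the 0.9 threshold, and the rules themselves are all hypothetical.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from datetime import datetime


@dataclass
class ExtractedField:
    name: str
    value: str | None
    confidence: float  # 0.0-1.0; how the score is produced is outside this sketch


def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so formatting differences don't cause mismatches."""
    return re.sub(r"\s+", " ", text).strip().lower()


def is_grounded(value: str | None, source_text: str) -> bool:
    """True if the extracted value appears verbatim (after normalization) in the source."""
    return bool(value) and _normalize(value) in _normalize(source_text)


def _amount(text: str) -> float:
    return float(text.replace("$", "").replace(",", ""))


def paystub_rules(fields: dict[str, ExtractedField]) -> list[str]:
    """Hypothetical business rules for a paystub; each failure is returned as a message."""
    errors = []
    try:
        if _amount(fields["net_pay"].value) > _amount(fields["gross_pay"].value):
            errors.append("net_pay exceeds gross_pay")
    except (KeyError, AttributeError, ValueError):  # missing field, None value, or non-numeric
        errors.append("gross_pay/net_pay missing or not numeric")
    try:
        datetime.strptime(fields["pay_date"].value, "%Y-%m-%d")
    except (KeyError, TypeError, ValueError):
        errors.append("pay_date missing or not YYYY-MM-DD")
    return errors


def route(fields: dict[str, ExtractedField], source_text: str,
          threshold: float = 0.9) -> tuple[str, list[str]]:
    """Return ("auto_approve", []) or ("human_review", reasons)."""
    reasons = [f"{f.name} not found in source" for f in fields.values()
               if not is_grounded(f.value, source_text)]
    reasons += [f"{f.name} below confidence {threshold}" for f in fields.values()
                if f.confidence < threshold]
    reasons += paystub_rules(fields)
    return ("human_review", reasons) if reasons else ("auto_approve", [])
```

A grounding check like this only catches values that are absent from the document; a value copied from the wrong place (for example, gross pay returned as net pay) still passes it, which is one reason business rules such as "net pay must not exceed gross pay" are a useful second layer before anything reaches a reviewer.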

2. Trustworthy and Compliant Solutions

In today’s data-driven world, security and compliance are paramount concerns for enterprises. Instabase offers the unique advantage of being the most trusted enterprise unstructured data platform: it is SOC 2 Type II compliant and is designed to be compliant with GDPR, HIPAA, and CCPA (see the Instabase Trust Center). Any data submitted through Instabase is handled to the highest standards of security and privacy, with full encryption and zero data retention, and your data is never re-used in any way by any third party (e.g. OpenAI, Microsoft).

Because the platform is delivered as SaaS with these enterprise-grade security certifications, organizations can avoid the burden of infrastructure costs and access the latest AI innovation with maximum trust, whilst fully unlocking the value now made possible by this huge leap forwards in unstructured data processing.

Taking Action with Instabase

If you’re ready to address your highly complex unstructured data challenges, Instabase is here to help! The AI Hub platform empowers organizations to quickly set up their own cutting-edge, secure automation solutions and dive right into solving high-value use cases at scale. Please contact us to be connected with our team and schedule a tailored demo and presentation of all of the platform’s enterprise features.

Conclusion

LLMs have revolutionized the field of unstructured document processing and content intelligence, but the devil is in the detail: on their own, these powerful new models are not complete solutions to the challenges faced by large enterprises. To tackle the complexities and limitations of LLMs, Instabase has launched AI Hub, an enterprise platform that provides refinement, validation, security, and scalability. By partnering with Instabase, organizations can unlock huge value from their unstructured data without compromising on trust, security or data integrity.