Infrastructure requirements

To deploy Instabase, your infrastructure must meet certain requirements.

Your environment must have a machine or virtual machine to serve as a jump box (passthrough host) from which you run deployments. Your environment must also meet the following requirements.

Note

The listed specifications are the bare minimum for running Instabase as a proof of value.

Kubernetes cluster requirements

These requirements ensure that Instabase can run the necessary Dockerized services in a Kubernetes cluster.

  • Ingress controller for routing traffic to Instabase endpoints.

  • Kubernetes version 1.18 or later.

  • The following node specs are the minimum requirements for an Instabase deployment:

    • Compute node pool with 64 CPUs and 256 GB of memory available. For example, eight nodes, each with at least eight CPUs and 32 GB of RAM, provide this capacity.

    • (Optional) To enable model training, a GPU node pool with at least eight CPUs, 32 GB of memory, and one NVIDIA A10 GPU or better.

Resources created

The Instabase deployment process creates the following Kubernetes resources:

  • Deployments

  • StatefulSets

  • HorizontalPodAutoscalers

  • Services

  • ConfigMaps

  • Secrets

  • Jobs

  • CronJobs

  • ServiceAccounts

  • Roles

  • RoleBindings

PVC storage

The following services use PersistentVolumeClaims (PVCs), so you must create PersistentVolumes or StorageClasses for them.

Service              Storage   Access mode     Recommended PVC name
opensearch           100 GB    ReadWriteOnce   opensearch-pvc
rabbitmq             100 GB    ReadWriteMany   rabbitmq-pvc
victoriametrics      100 GB    ReadWriteOnce   vm-pvc
redis                100 GB    ReadWriteOnce   redis-pvc
instabase-storage*   500 GB    ReadWriteMany   instabase-storage-pvc

* Optional. Used only in data-center deployments, to mount the file system on an NFS volume for a localfs configuration.
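As a minimal sketch, a claim for the OpenSearch volume might look like the following. The namespace name `instabase` and the storage class `standard-rwo` are assumptions; substitute a StorageClass that exists in your cluster, or omit the field to use the default class.

```yaml
# Hypothetical example: PVC for the opensearch service, per the table above.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: opensearch-pvc            # recommended PVC name from the table
  namespace: instabase            # assumption: the instabase namespace
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Gi              # assumption: 100 GB interpreted as 100Gi
  storageClassName: standard-rwo  # assumption: replace with a StorageClass in your cluster
```

For rabbitmq and instabase-storage, set `accessModes` to `ReadWriteMany`, which requires a storage backend that supports it, such as NFS.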

Pod security policy

Most Instabase applications run as user 9999 and group 9999. Pods write data to their own storage or to a mounted PVC; they don't need write access to host paths on the node they run on.
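A pod spec consistent with this model might set its security context as shown below. This is a sketch, not an Instabase manifest: the pod name, image, PVC name, and the `fsGroup` setting are assumptions.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app                 # hypothetical name
  namespace: instabase
spec:
  securityContext:
    runAsUser: 9999                 # user 9999
    runAsGroup: 9999                # group 9999
    fsGroup: 9999                   # assumption: gives group 9999 ownership of supported volumes
  containers:
    - name: app
      image: registry.example.com/instabase/example-app:latest  # hypothetical image
      volumeMounts:
        - name: data
          mountPath: /data          # writes go to the PVC, not to a hostPath
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: example-pvc      # hypothetical PVC name
```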

Requests and limits are set for most Instabase pods. The admission controller must allow pods in the namespace to be created with resource requests of up to eight CPUs and 32 GB of memory.
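If the namespace enforces a LimitRange, its maximums must be at least this large. A minimal sketch, assuming a per-container LimitRange, a namespace named `instabase`, and 32 GB read as 32Gi; the object name is hypothetical.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: instabase-limits     # hypothetical name
  namespace: instabase
spec:
  limits:
    - type: Container
      max:
        cpu: "8"             # allow up to eight CPUs per container
        memory: 32Gi         # allow up to 32 GiB of memory per container
```

A ResourceQuota on the namespace, if present, must also leave enough headroom for the combined requests of all Instabase pods.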

Secrets are used to store tokens, passwords, or certificates, and they are consumed by the applications hosted inside the instabase namespace.
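For illustration, a generic Secret holding a password might look like this. The name and key are hypothetical; the Secrets Instabase actually uses, and the keys they contain, are defined by the Instabase deployment process.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: example-db-credentials   # hypothetical name
  namespace: instabase
type: Opaque
stringData:                      # plain-text input; Kubernetes stores it base64-encoded under data
  password: change-me            # placeholder value
```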

Instabase uses the following Kubernetes Service types for its workloads:

  • NodePort: Used primarily for Nginx.

  • LoadBalancer: Used primarily for Nginx.

  • Headless: Used for StatefulSets.

  • ClusterIP: Used for internal traffic within the cluster.
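The sketch below shows a headless Service, which gives each pod of a StatefulSet a stable DNS name, alongside a ClusterIP Service that is reachable only from inside the cluster. All names, labels, and ports are assumptions for illustration.

```yaml
# Headless Service: clusterIP None gives each StatefulSet pod a stable DNS name.
apiVersion: v1
kind: Service
metadata:
  name: example-statefulset      # hypothetical; must match the StatefulSet's serviceName
  namespace: instabase
spec:
  clusterIP: None
  selector:
    app: example-statefulset     # hypothetical pod label
  ports:
    - name: tcp
      port: 9200                 # hypothetical port
---
# ClusterIP Service (the default type): reachable only from inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: example-internal         # hypothetical name
  namespace: instabase
spec:
  type: ClusterIP
  selector:
    app: example-internal        # hypothetical pod label
  ports:
    - port: 8080                 # hypothetical port
      targetPort: 8080
```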

Kubernetes roles

The following Kubernetes roles are required:

deployment-manager:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: {{namespace}}
  name: control-plane-role
rules:
- apiGroups:
  - "batch"
  - "apps"
  - ""
  - "events.k8s.io"
  - "networking.k8s.io"
  - "autoscaling"
  - "rbac.authorization.k8s.io"
  resources:
  - jobs
  - deployments
  - daemonsets
  - replicasets
  - statefulsets
  - configmaps
  - events
  - endpoints
  - persistentvolumeclaims
  - pods
  - secrets
  - serviceaccounts
  - services
  - networkpolicies
  - pods/log
  - horizontalpodautoscalers
  - roles
  verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]

monitoring:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  labels:
    app: kube-state-metrics
  name: instabase-kube-state-metrics
  namespace: {{namespace}}
rules:

- apiGroups: [""]
  resources:
  - configmaps
  - endpoints
  - limitranges
  - persistentvolumeclaims
  - pods
  - replicationcontrollers
  - secrets
  - resourcequotas
  - services
  verbs: ["list", "watch"]

- apiGroups: ["batch"]
  resources:
  - cronjobs
  - jobs
  verbs: ["list", "watch"]

- apiGroups: ["extensions", "apps"]
  resources:
  - deployments
  verbs: ["list", "watch"]

- apiGroups: ["autoscaling"]
  resources:
  - horizontalpodautoscalers
  verbs: ["list", "watch"]

- apiGroups: ["extensions", "networking.k8s.io"]
  resources:
  - ingresses
  verbs: ["list", "watch"]

- apiGroups: ["networking.k8s.io"]
  resources:
  - networkpolicies
  verbs: ["list", "watch"]

- apiGroups: ["extensions", "apps"]
  resources:
  - replicasets
  verbs: ["list", "watch"]

- apiGroups: ["apps"]
  resources:
  - statefulsets
  verbs: ["list", "watch"]
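A Role grants nothing until it is bound to a subject. A minimal sketch of RoleBindings for the two roles above, assuming hypothetical ServiceAccounts named `deployment-manager` and `instabase-kube-state-metrics`; `{{namespace}}` is the same template placeholder used in the roles and must be replaced before applying.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: control-plane-role-binding          # hypothetical name
  namespace: {{namespace}}
subjects:
  - kind: ServiceAccount
    name: deployment-manager                # hypothetical ServiceAccount
    namespace: {{namespace}}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: control-plane-role                  # the deployment-manager role
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: instabase-kube-state-metrics        # hypothetical name
  namespace: {{namespace}}
subjects:
  - kind: ServiceAccount
    name: instabase-kube-state-metrics      # hypothetical ServiceAccount
    namespace: {{namespace}}
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: instabase-kube-state-metrics        # the monitoring role
```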

Workload autoscaling

Note

Workload autoscaling is in public preview as of release 23.07 and disabled by default. See the 23.07 release notes for instructions on enabling workload autoscaling.

Workload autoscaling lets Instabase scale data services up and down based on demand, so that each service runs enough replicas for the current workload without holding idle resources. Instabase uses Kubernetes HorizontalPodAutoscalers (HPAs) to scale conversion-service, ocr-msft-lite, ocr-msft-v3, and ocr-service based on CPU usage.

Workload autoscaling relies on the following Kubernetes components:

  • HorizontalPodAutoscaler (HPA): The minimum HPA API version is autoscaling/v2beta2, available in Kubernetes 1.12 and later.

  • metrics-server: Required by Kubernetes HPA to fetch service resource utilization such as CPU and memory. If this is not already installed, refer to the Kubernetes metrics server installation guide or your cloud provider’s instructions.
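For reference, an HPA that scales one of these services on CPU might look like the sketch below. The replica bounds and the 70% target are assumptions, not Instabase defaults, and the target assumes a Deployment named `ocr-service` in a namespace named `instabase`. It uses `autoscaling/v2` (Kubernetes 1.23 and later); on older clusters, `autoscaling/v2beta2` accepts the same fields shown here.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ocr-service               # assumption: named after its target
  namespace: instabase
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ocr-service             # assumption: a Deployment with this name
  minReplicas: 1                  # hypothetical lower bound
  maxReplicas: 4                  # hypothetical upper bound
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70  # hypothetical: scale out above 70% of requested CPU
```

CPU utilization is measured against each pod's CPU request, using data from metrics-server, so both the requests and metrics-server must be in place for the HPA to act.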

See the workload autoscaling documentation for more information.

Database

A database is required to operate the Instabase platform. The database is used for configuration, account management, repository management, and access control rules.

Minimum database requirements:

  • For proof-of-value environments, two CPUs and 8 GB of RAM with 100 GB of SSD storage.

  • For production environments, four CPUs and 12 GB of RAM with 200 GB of SSD storage.

The following database types are supported:

  • MySQL v8.0

  • PostgreSQL v11+

  • Microsoft SQL Server 2014, 2017, 2019, 2022

  • Oracle 19c, 21c

Storage

File system storage is required to store input, intermediate, and output data.

Instabase provides a file system abstraction that is backed by one of the following supported storage systems:

  • Amazon S3, with an S3 bucket in the same region as other components.

  • Azure Blob Storage, with a storage account in the same region as other components.

  • Google Filestore with 2.5 TB SSD storage.

  • Azure Files with 500 GB of SSD storage.

  • For on-premises deployments only, NFS-based local storage with at least 500 GB, connected to the Kubernetes cluster through a PVC.

Note

S3-compatible APIs from other storage vendors, such as HCP, Dell EMC, NetApp, Oracle, and MinIO, are not supported.

AWS S3

To run Instabase on AWS, an S3 bucket is required.

Permissions are required for the following S3 actions:

  • s3:DeleteObject

  • s3:DeleteObjectVersion

  • s3:GetObject

  • s3:GetObjectAcl

  • s3:GetObjectVersion

  • s3:PutObject

  • s3:PutObjectAcl

  • s3:PutObjectVersion

  • s3:ListBucket

  • s3:ListBucketMultipartUploads

  • s3:ListBucketVersions

  • s3:ListMultipartUploadParts

  • s3:AbortMultipartUpload
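One way to grant these actions is an IAM managed policy. The sketch below is a CloudFormation template in YAML; the `BucketName` parameter and the logical ID are assumptions, and the action list is copied from above. It splits the actions by resource type: the bucket-level list actions apply to the bucket ARN, while object and multipart-upload actions apply to the objects in the bucket.

```yaml
AWSTemplateFormatVersion: "2010-09-09"
Parameters:
  BucketName:
    Type: String                  # name of the Instabase S3 bucket
Resources:
  InstabaseS3Policy:              # hypothetical logical ID
    Type: AWS::IAM::ManagedPolicy
    Properties:
      PolicyDocument:
        Version: "2012-10-17"
        Statement:
          # Bucket-level actions apply to the bucket ARN itself.
          - Effect: Allow
            Action:
              - s3:ListBucket
              - s3:ListBucketMultipartUploads
              - s3:ListBucketVersions
            Resource: !Sub "arn:aws:s3:::${BucketName}"
          # Object-level actions apply to every object key in the bucket.
          - Effect: Allow
            Action:
              - s3:DeleteObject
              - s3:DeleteObjectVersion
              - s3:GetObject
              - s3:GetObjectAcl
              - s3:GetObjectVersion
              - s3:PutObject
              - s3:PutObjectAcl
              - s3:PutObjectVersion
              - s3:ListMultipartUploadParts
              - s3:AbortMultipartUpload
            Resource: !Sub "arn:aws:s3:::${BucketName}/*"
```

Attach the resulting policy to the IAM role or user that Instabase uses to access the bucket.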

Azure Blob storage

To set up file system storage using Azure Blob Storage, you must provision an Azure storage account and a container. The storage account hosting the container must have Allow storage account key access enabled. We recommend enabling soft delete for blobs so that you can recover unintentionally deleted files and folders.

Google Cloud Storage

To set up file system storage using Google Cloud Storage, you need:

Container registry

Instabase is a containerized platform, and we provide you with the list of images. To avoid latency and bandwidth issues, a customer-hosted container registry is required: always pull the container images from the Instabase-hosted container registry and push them to your own registry, so that your pods pull images from your registry.

Requirements for the container registry are:

  • At least 300 GB of storage space.

  • The registry should be in the same region as the other components to reduce latency.

  • Registry credentials must be attached to the default service account used by the pods.

  • Pods must be able to pull images from the registry without restriction.
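Attaching registry credentials to the default ServiceAccount lets every pod that uses it pull from your registry without per-pod configuration. A minimal sketch, assuming the namespace is named `instabase` and a Secret of type `kubernetes.io/dockerconfigjson` named `registry-credentials` (hypothetical) already exists there; such a Secret can be created with `kubectl create secret docker-registry`.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: default
  namespace: instabase
imagePullSecrets:
  - name: registry-credentials   # hypothetical docker-registry Secret
```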

Network requirements

You must provide your own ingress implementation.

On GCP, creating a Service of type LoadBalancer automatically provisions a load balancer. Otherwise, you must deploy an Ingress and an ingress controller that route traffic to Nginx, which runs inside the instabase namespace.
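When you use an Ingress, a minimal resource that routes all traffic to Nginx might look like this. The hostname, ingress class name, Service name, and port are assumptions; use the values from your ingress controller and the Nginx Service in your deployment.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: instabase-ingress          # hypothetical name
  namespace: instabase
spec:
  ingressClassName: nginx          # assumption: class name of your ingress controller
  rules:
    - host: instabase.example.com  # hypothetical hostname
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx        # assumption: name of the Instabase Nginx Service
                port:
                  number: 80       # assumption: Nginx HTTP port
```

The `networking.k8s.io/v1` Ingress API requires Kubernetes 1.19 or later. On 1.18, use `networking.k8s.io/v1beta1`, whose backend fields differ.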

To avoid increased latency, ensure that all components are in the same data center and region, because Instabase applications are sensitive to file access latency.

At least one subnet with a /20 address range is required when creating the Kubernetes cluster, but we recommend two subnets.

Security

We support both offline mode and online mode. Some of our components require internet access. If internet access is not an option, discuss this with the Instabase team prior to deployment so that we can ship you offline versions of those components.

Instabase workloads communicate with each other for normal operation and message passing. On request, we can provide a list of all the services and ports in use.
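If your cluster enforces NetworkPolicies, pods in the Instabase namespace must be able to reach each other. The sketch below allows all ingress traffic from pods in the same namespace, assuming the namespace is named `instabase`; the policy name is hypothetical, and you can narrow it using the service and port list Instabase provides.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-namespace   # hypothetical name
  namespace: instabase
spec:
  podSelector: {}              # applies to every pod in the namespace
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector: {}      # any pod in this same namespace
```

Because this policy selects every pod, it also blocks traffic from outside the namespace, such as from an ingress controller running in another namespace. Add a rule for that source as well.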

DNS

A hosted zone or DNS provider is required so that a DNS name can point to the ingress load balancer's IP address. The zone can be either private or public.

Load balancer

You must have either an L4 or an L7 load balancer to provide a point of ingress into the Kubernetes cluster and route traffic to our Nginx servers.

TLS can be terminated either at an L7 load balancer or inside the cluster at Nginx.
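With an L4 load balancer, encrypted traffic passes through to Nginx, which terminates TLS inside the cluster. A sketch of a LoadBalancer Service that forwards HTTPS to Nginx; the namespace, Service name, selector label, and ports are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-lb            # hypothetical name
  namespace: instabase
spec:
  type: LoadBalancer        # the cloud provider typically provisions an L4 load balancer
  selector:
    app: nginx              # assumption: label on the Instabase Nginx pods
  ports:
    - name: https
      protocol: TCP
      port: 443             # port exposed by the load balancer
      targetPort: 443       # assumption: Nginx listens for TLS on 443
```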

Firewall

Ensure that there’s connectivity between the following services:

  • Kubernetes cluster and load balancer

  • Kubernetes cluster and database

  • Kubernetes cluster and container registry

  • Virtual machine or Kubernetes worker and the Kubernetes cluster’s API server

Deployment machine requirements

You must have a machine to serve as a jump box (passthrough host) from which you run deployments. This machine connects to the Kubernetes cluster and to Deployment Manager to perform deployment operations. The requirements for this machine are:

  • kubectl binary with a kubeconfig for the target cluster.

    • Necessary permissions to apply configurations to the instabase namespace.

  • Zip utility for decompressing the installation package you receive from Instabase.

  • Web browser, preferably Chrome or Firefox, for deployment with Deployment Manager.

  • (Optional) Postman or other tooling for making API calls.
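The jump box's kubeconfig only needs to point at the target cluster. A minimal sketch with a context whose default namespace is `instabase`; the cluster name, server URL, and user entry are hypothetical, and the credentials section depends on your cloud provider's authentication method.

```yaml
apiVersion: v1
kind: Config
clusters:
  - name: instabase-cluster                      # hypothetical cluster name
    cluster:
      server: https://k8s-api.example.com:6443   # hypothetical API server URL
      certificate-authority-data: <base64-encoded CA certificate>
users:
  - name: deployer                               # hypothetical user
    user:
      token: <bearer token>                      # or an exec plugin from your provider
contexts:
  - name: instabase
    context:
      cluster: instabase-cluster
      user: deployer
      namespace: instabase                       # default namespace for kubectl commands
current-context: instabase
```

You can confirm the permissions with a command such as `kubectl auth can-i create deployments -n instabase`.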

Observability requirements

All self-hosted Instabase deployments are equipped with observability tools that measure and provide visibility into the internal state of the deployment’s infrastructure and applications. There are observability-specific system, storage, and access requirements to support statistics collection, federated cluster statistics collection, and log aggregation. See the observability documentation for details.