Observability configuration

The Instabase observability toolset is enabled by default when Instabase is installed. After installation, you can configure your observability toolset to enable optional features or manage preferences.

Before you begin

The files required to enable the observability toolset are applied during a standard installation or upgrade. Any additional configuration assumes the following:

Configure federated cluster statistics collection

If you have Prometheus cluster monitoring enabled, you can optionally federate with the cluster statistics infrastructure to collect and display CPU and memory utilization data in Deployment Manager.

When enabling federated cluster statistics collection, the following parameters are required.

| Parameter name | Parameter key | Description | Default value |
| --- | --- | --- | --- |
| Enable Federate Job | obsPromEnableFederation | Enables the federation scraping job that collects CPU and memory stats. | Turned off ("false") |
| Get Node Exporter Stats | obsPromFederationGetNodeExporterStats | Enables the collection of node exporter stats as part of the federation job. | Turned off ("false") |
| Scrape Target for Federate Job | obsPromFederationTarget | Defines the Prometheus server target that the federation job scrapes for CPU and memory stats. For example, source-prometheus:9090. | |
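
For context, Prometheus federation works by having one scraper pull a selected set of series from another Prometheus server's /federate endpoint. The following fragment is a minimal sketch of a generic Prometheus-style federation job, assuming the Enable Federate Job and Scrape Target for Federate Job parameters produce something similar. The job name and match[] selectors are hypothetical, and the configuration that Instabase actually generates in config-prometheus-targets might differ.

```yaml
# Hypothetical illustration of a Prometheus-style federation scrape job.
# Not the configuration generated by Instabase.
scrape_configs:
  - job_name: federate              # hypothetical job name
    honor_labels: true              # keep the labels reported by the source server
    metrics_path: /federate         # standard Prometheus federation endpoint
    params:
      'match[]':                    # series selectors to pull (hypothetical)
        - '{__name__=~"container_cpu_usage_seconds_total|container_memory_working_set_bytes"}'
        - '{job="node-exporter"}'   # relevant only if node exporter stats are enabled
    static_configs:
      - targets:
          - source-prometheus:9090  # value of obsPromFederationTarget
```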

To configure federated cluster statistics collection:

  1. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  2. From the Configs dropdown, select the config-prometheus-targets object.

    Info

    config-prometheus-targets shares a ConfigMap template with the config-prometheus-recording-rules, config-alertmanager-routes, and config-prometheus-alerting-rules objects. Editing any of these configs updates the others.

  3. Click Edit Config.

  4. Turn on the Enable Federate Job toggle.

  5. Turn on the Get Node Exporter Stats toggle.

  6. In the Scrape Target for Federate Job field, enter the Prometheus server target.

  7. Click Save.

  8. Restart statefulset-vmagent to apply your configuration changes.

    1. Open the Deployment Manager Stateful Sets tab (Deployment Manager > Infra Dashboard > Stateful Sets).

    2. From the StatefulSets list, select statefulset-vmagent.

    3. Click Restart.

Configure alerting

You can configure alerting to forward observability alerts to a Slack channel, a specific email address, or OpsGenie. By default, all warning and critical severity notifications are forwarded.
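
In Alertmanager terms, this default forwarding behavior resembles a route that matches alerts by their severity label. The following is a minimal sketch, assuming alerts carry a severity label with warning or critical values. The receiver name is hypothetical, and the actual routing defined in config-alertmanager-routes might differ.

```yaml
# Hypothetical Alertmanager route that forwards warning and critical alerts.
route:
  receiver: observability-alerts          # hypothetical default receiver
  routes:
    - receiver: observability-alerts
      matchers:
        - severity=~"warning|critical"    # regex match on the severity label
receivers:
  - name: observability-alerts            # notification settings (Slack, email, OpsGenie) attach here
```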

Slack alerting

When configuring Slack alerting, the following parameters are required.

| Parameter name | Parameter key | Description |
| --- | --- | --- |
| Enable Slack Alerts | obsEnableSlackAlerting | Enables Slack alerting. |
| Slack Alert URL | obsAlertSlackUrl | The URL used to connect to the Slack platform for alerting. |
| Slack Alert Channel | obsEnvoyAlertSlackChannel | Defines the name of the Slack channel where alerts are posted, such as #alert-observability. Include the # symbol when defining the channel name. |
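
Alertmanager's native Slack integration is configured through a slack_configs block on a receiver. The following sketch shows how the Slack parameters would plausibly map onto it, assuming Instabase renders them into standard Alertmanager configuration. The receiver name is hypothetical.

```yaml
# Hypothetical mapping of the Slack parameters onto an Alertmanager receiver.
receivers:
  - name: slack-observability               # hypothetical receiver name
    slack_configs:
      - api_url: <YOUR_SLACK_WEBHOOK_URL>   # obsAlertSlackUrl
        channel: '#alert-observability'     # obsEnvoyAlertSlackChannel; quoted because an unquoted " #" starts a YAML comment
        send_resolved: true                 # also notify when an alert resolves
```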

To configure Slack alerting:

  1. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  2. From the Configs dropdown, select the config-alertmanager-routes object.

    Info

    config-alertmanager-routes shares a ConfigMap template with the config-prometheus-recording-rules, config-prometheus-targets, and config-prometheus-alerting-rules objects. Editing any of these configs updates the others.

  3. Click Edit Config.

  4. Turn on the Enable Slack Alerts toggle.

  5. In Slack Alert URL, enter the URL to connect to your Slack platform.

  6. In Slack Alert Channel, enter the name of the Slack channel.

  7. Click Save.

  8. Restart deployment-alertmanager to apply your configuration changes.

    1. Open the Deployment Manager Deployments tab (Deployment Manager > Infra Dashboard > Deployments).

    2. From the deployments list, select deployment-alertmanager.

    3. Click Restart.

Email alerting

When configuring email alerting, the following parameters are required.

Note

The obsAlertEmailAppPassword parameter displays only as a key/value pair in the All Config Parameters tab.

| Parameter name | Parameter key | Description |
| --- | --- | --- |
| Enable Email Alerts | obsEnableEmailAlerting | Enables email alerting. |
| Sender Email | obsAlertSenderEmail | Defines the sender email address for any email alerts. |
| Receiver Email | obsAlertReceiverEmail | Defines the receiver email address for any email alerts. |
| SMTP Server | obsAlertSmtpServer | Defines the URL of the Simple Mail Transfer Protocol (SMTP) email server used for sending alerts. |
| | obsAlertEmailAppPassword | Defines the password used to authenticate with the SMTP email server. |
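
Alertmanager sends email through an email_configs block on a receiver. The following sketch shows a plausible mapping of the email parameters, assuming Instabase renders them into standard Alertmanager configuration. The receiver name is hypothetical, and using the sender address as the SMTP login is an assumption. Note that Alertmanager's smarthost field expects a host:port value.

```yaml
# Hypothetical mapping of the email parameters onto an Alertmanager receiver.
receivers:
  - name: email-observability                # hypothetical receiver name
    email_configs:
      - to: <RECEIVER_EMAIL>                 # obsAlertReceiverEmail
        from: <SENDER_EMAIL>                 # obsAlertSenderEmail
        smarthost: <SMTP_HOST>:<SMTP_PORT>   # obsAlertSmtpServer
        auth_username: <SENDER_EMAIL>        # assumption: SMTP login is the sender address
        auth_password: <SMTP_APP_PASSWORD>   # obsAlertEmailAppPassword
        send_resolved: true                  # also notify when an alert resolves
```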

To configure email alerting:

  1. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  2. From the Configs dropdown, select the config-alertmanager-routes object.

    Info

    config-alertmanager-routes shares a ConfigMap template with the config-prometheus-recording-rules, config-prometheus-targets, and config-prometheus-alerting-rules objects. Editing any of these configs updates the others.

  3. Click Edit Config.

  4. Turn on the Enable Email Alerts toggle.

  5. In Sender Email, enter the email address from which alerts are sent.

  6. In Receiver Email, enter the email address to which alerts are sent.

  7. In SMTP Server, enter the URL of the SMTP server.

  8. Select the All Config Parameters tab.

  9. Locate the "obsAlertEmailAppPassword" key and enter the SMTP email server password.

  10. Click Save.

  11. Restart deployment-alertmanager to apply your configuration changes.

    1. Open the Deployment Manager Deployments tab (Deployment Manager > Infra Dashboard > Deployments).

    2. From the deployments list, select deployment-alertmanager.

    3. Click Restart.

OpsGenie alerting

When configuring OpsGenie alerting, the following parameters are required.

Note

OpsGenie parameters display only as key/value pairs in the All Config Parameters tab of the ConfigMap template editor.

| Parameter key | Description |
| --- | --- |
| obsEnableOpsGenieAlerting | Enables OpsGenie alerting. Set to "true" to enable. |
| obsOpsGenieApiKey | Defines the OpsGenie API key used for OpsGenie alerting. |
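
Alertmanager's OpsGenie integration is configured through an opsgenie_configs block on a receiver. As a minimal sketch, assuming Instabase renders the OpsGenie parameters into standard Alertmanager configuration, the API key would land as follows. The receiver name is hypothetical.

```yaml
# Hypothetical mapping of the OpsGenie parameters onto an Alertmanager receiver.
receivers:
  - name: opsgenie-observability             # hypothetical receiver name
    opsgenie_configs:
      - api_key: <YOUR_OPSGENIE_API_KEY>     # obsOpsGenieApiKey
```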

To configure OpsGenie alerting:

  1. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  2. From the Configs dropdown, select the config-alertmanager-routes object.

    Info

    config-alertmanager-routes shares a ConfigMap template with the config-prometheus-recording-rules, config-prometheus-targets, and config-prometheus-alerting-rules objects. Editing any of these configs updates the others.

  3. Click Edit Config.

  4. Select the All Config Parameters tab.

  5. Locate the "obsEnableOpsGenieAlerting" key and set the value to "true".

  6. Locate the "obsOpsGenieApiKey" key and enter the OpsGenie server API key.

  7. Click Save.

  8. Restart deployment-alertmanager to apply your configuration changes.

    1. Open the Deployment Manager Deployments tab (Deployment Manager > Infra Dashboard > Deployments).

    2. From the deployments list, select deployment-alertmanager.

    3. Click Restart.

Configure log storage

Grafana Loki is responsible for aggregating, indexing, and storing log information. You can configure the storage system to which Loki persists logs.

By default, Loki uses the same file storage that was selected when installing Instabase. If you want to change where logs are stored, you can configure the log storage location.

The following log storage options are supported:

  • Amazon S3 bucket

  • Azure Blob storage container

  • Network file system (NFS) volume

Amazon S3

Note

By default, the Loki configuration ships with an Amazon S3 template for long-term log storage. If you don’t have Amazon S3 storage enabled but want to use it, revert the config-loki object to its base configuration, and then follow the steps in this section. You might need to delete any patches applied to config-loki that changed the default log storage setting.

Although Amazon S3 is the default Loki storage option, additional configuration might be required if your bucket has a non-standard configuration. If you selected Amazon S3 as your storage option during installation but still encounter log storage errors, you can complete the configuration by patching deployment-loki-write to add the following environment variables:

  • LOKI_S3_ACCESS_KEY
  • LOKI_S3_SECRET_ACCESS_KEY
  • LOKI_S3_REGION
  • LOKI_S3_BUCKET_NAME
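
Loki can substitute environment variables into its configuration file when it runs with the -config.expand-env=true flag. As a sketch of how these variables could be consumed, assuming the default config-loki S3 template references them this way (this is not confirmed here), the common.storage.s3 section would resemble the following.

```yaml
# Hypothetical excerpt of loki.yaml; ${...} placeholders are expanded only
# when Loki runs with -config.expand-env=true.
common:
  storage:
    s3:
      bucketnames: ${LOKI_S3_BUCKET_NAME}
      region: ${LOKI_S3_REGION}
      access_key_id: ${LOKI_S3_ACCESS_KEY}
      secret_access_key: ${LOKI_S3_SECRET_ACCESS_KEY}
```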

Before you begin

The patch you apply to deployment-loki-write references a Kubernetes secret called aws-access-key. You must create this secret in the same Kubernetes namespace as your Instabase installation. The secret must include two keys: access-key (your AWS IAM access key ID) and secret-access-key (your AWS IAM secret access key).

You also need the name of the S3 bucket to use for storage and the AWS region code for your account (for example, us-east-1).
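
One way to create the aws-access-key secret is to apply a Kubernetes Secret manifest like the following, for example with kubectl apply -f. Replace the placeholders with your namespace and credentials. The stringData field accepts plain-text values, which Kubernetes base64-encodes when it stores the secret. Equivalently, kubectl create secret generic aws-access-key --from-literal=access-key=... --from-literal=secret-access-key=... -n <YOUR_INSTABASE_NAMESPACE> creates the same secret.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: aws-access-key                             # name referenced by the deployment-loki-write patch
  namespace: <YOUR_INSTABASE_NAMESPACE>            # same namespace as your Instabase installation
type: Opaque
stringData:
  access-key: <YOUR_AWS_ACCESS_KEY_ID>             # AWS IAM access key ID
  secret-access-key: <YOUR_AWS_SECRET_ACCESS_KEY>  # AWS IAM secret access key
```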

To configure Amazon S3 for log storage:

  1. Using the following template, create a patch with your region code (<YOUR_S3_REGION_CODE>) and bucket name (<YOUR_S3_BUCKET_NAME>).

    apiVersion: apps/v1
    kind: Deployment
    spec:
      template:
        spec:
          containers:
            - name: CONTAINER_NAME
              env:
                - name: LOKI_S3_ACCESS_KEY
                  $patch: replace
                  valueFrom:
                    secretKeyRef:
                      name: aws-access-key
                      key: access-key
                - name: LOKI_S3_SECRET_ACCESS_KEY
                  $patch: replace
                  valueFrom:
                    secretKeyRef:
                      name: aws-access-key
                      key: secret-access-key
                - name: LOKI_S3_REGION
                  value: "<YOUR_S3_REGION_CODE>"
                - name: LOKI_S3_BUCKET_NAME
                  value: "<YOUR_S3_BUCKET_NAME>"
    
  2. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  3. From the Configs dropdown, select deployment-loki-write.

  4. Click Enter Patch.

  5. Enter the patch you created in step 1 in the config editor.

  6. Click Preview Changes to validate the patch.

  7. Click Confirm Changes.

Azure Blob storage

Before you begin

You must create a storage account access key for the Azure Blob storage container.

By default, the config-loki ConfigMap is set up to support Amazon S3 storage. To support a different storage provider, you must edit and replace the entire data field of the config-loki ConfigMap. Because the loki.yaml value in the data field is a single string, you can’t use patches that target specific lines within it.

When editing the config-loki ConfigMap, three changes are required to support Azure Blob storage:

  • Under common.storage, you must replace the s3 configuration with an azure configuration that includes your Azure Blob storage account name, account key, and container name.

    For example:

    data:
      loki.yaml: |
        auth_enabled: false
        analytics:
          reporting_enabled: false
        server:
          http_listen_port: 3100
          http_server_read_timeout: 60s
        common:
          replication_factor: 1
          ring:
            kvstore:
              store: memberlist
          storage:
            azure:
              account_name: <YOUR_STORAGE_ACCOUNT_NAME>
              account_key: <YOUR_STORAGE_ACCOUNT_KEY>
              container_name: <YOUR_CONTAINER_NAME>    
    
  • Under schema_config.configs, set object_store to azure.

  • Under storage_config.boltdb_shipper, set shared_store to azure.

The following config-loki excerpt shows these three changes together.

Note

This YAML file is incomplete and can’t be used as a patch.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-loki
  namespace: ${ib.namespace}
  labels:
    app: loki
data:
  loki.yaml: |-
    auth_enabled: false
    analytics:
      reporting_enabled: false
    server:
      http_listen_port: 3100
      http_server_read_timeout: 60s
    common:
      replication_factor: 1
      ring:
        kvstore:
          store: memberlist
      storage:
        azure:
          account_name: <YOUR_STORAGE_ACCOUNT_NAME>
          account_key: <YOUR_STORAGE_ACCOUNT_KEY>
          container_name: <YOUR_CONTAINER_NAME>    
    ...
    schema_config:
      configs:
      - from: "2020-12-11"
        index:
          period: 24h
          prefix: index_
        object_store: azure
        schema: v11
        store: boltdb-shipper
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/boltdb-shipper-active
        cache_location: /data/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: azure
    ...

To use Azure Blob storage for log storage:

  1. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  2. From the Configs dropdown, select config-loki.

  3. Copy the entire contents of the config-loki ConfigMap to your clipboard.

  4. Click Enter Patch.

  5. Paste the ConfigMap in the config editor and make the three Azure Blob storage changes described above.

  6. Click Preview Changes to validate the edits.

  7. Click Confirm Changes.

Connect NFS volume

Before you begin

Approximately 64 GB of space is required for log storage in your NFS volume.

You can use an NFS volume for log storage by patching the config-loki object.

Tip

This patch is also included in your release bundle, in the optional_patches folder.

apiVersion: v1
kind: ConfigMap
metadata:
  name: config-loki
  labels:
    app: loki
data:
  loki.yaml: |
    auth_enabled: false
    analytics:
      reporting_enabled: false
    server:
      http_listen_port: 3100
      http_server_read_timeout: 60s
    common:
      replication_factor: 1
      ring:
        kvstore:
          store: memberlist
      storage:
        filesystem:
          chunks_directory: /data/loki/chunks
          rules_directory: /data/loki/rules
    chunk_store_config:
      max_look_back_period: 0s
    ingester:
      chunk_idle_period: 2h
      chunk_retain_period: 30s
      wal:
        enabled: false
        dir: /data/loki/wal
        flush_on_shutdown: true
      autoforget_unhealthy: true
    memberlist:
      abort_if_cluster_join_fails: false
      join_members:
      - loki-memberlist
    limits_config:
      enforce_metric_name: false
      retention_period: 336h
    schema_config:
      configs:
      - from: "2020-10-24"
        index:
          period: 24h
          prefix: index_
        object_store: filesystem
        schema: v11
        store: boltdb-shipper
    storage_config:
      boltdb_shipper:
        active_index_directory: /data/loki/boltdb-shipper-active
        cache_location: /data/loki/boltdb-shipper-cache
        cache_ttl: 24h
        shared_store: filesystem
    compactor:
      compactor_ring:
        kvstore:
          store: memberlist
      working_directory: /tmp/loki/boltdb-shipper-compactor
      shared_store: filesystem
      retention_enabled: true
    ruler:
      storage:
        type: local
        local:
          directory: /tmp/rules
      rule_path: /tmp/scratch
      alertmanager_url: http://service-alertmanager:9093
      ring:
        kvstore:
          store: memberlist
      enable_api: true
      remote_write:
        enabled: true
        client:
          url: http://service-prometheus-server:9090/api/v1/write
      wal:
        dir: /tmp    

To use an NFS volume for log storage:

  1. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  2. From the Configs dropdown, select config-loki.

  3. Click Enter Patch.

  4. Enter the config-loki patch shown above in the config editor.

  5. Click Preview Changes to validate the patch.

  6. Click Confirm Changes.

Configure log retention period

By default, logs are stored for 336 hours (14 days). You can change this retention period by modifying the limits_config.retention_period value in the config-loki object.

To configure the log retention period:

  1. Open the Deployment Manager Configs tab (All apps > Deployment Manager > Configs).

  2. From the Configs dropdown, select config-loki.

  3. Copy the entire contents of the config-loki ConfigMap to your clipboard.

  4. Click Enter Patch.

  5. Paste the ConfigMap in the config editor.

  6. Under limits_config, set the retention_period value in hours (for example, 720h for 30 days).

  7. Click Preview Changes to validate the edits.

  8. Click Confirm Changes.
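
For example, to keep logs for 30 days (30 days x 24 hours = 720 hours), the relevant part of the loki.yaml value would read as follows. In Loki, retention is enforced by the compactor, so compactor.retention_enabled must also be true, as it is in the NFS configuration shown earlier. Keep the rest of your existing compactor settings unchanged.

```yaml
# Excerpt of the loki.yaml value in config-loki; other fields omitted.
limits_config:
  retention_period: 720h     # 30 days x 24 hours
compactor:
  retention_enabled: true    # the compactor applies retention_period only when this is true
```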