Diffs API

Use the Diff API to run a diff operation with configurable sensitivity settings and signature detection customizations.

See Instabase API authorization and response conventions for authorization and error convention details.

Text Diff

The following code demonstrates how to request a text diff operation using the Diff API.

A diff can be generated at one of these levels of granularity:

  • Line

  • Word

  • Character

Request

  import json, requests

  # Set the URLs for your Instabase installation
  # --------------------------------------------
  INSTABASE_URL = u'https://www.instabase.com'
  API_PREFIX = u'/api/v1'
  API_PATH = u'/diff/gen-ibdiff'
  TOKEN = '<YOUR_API_TOKEN>'

  # Set your folder paths
  # ---------------------
  # The input files can be either .ibocr files or the original PDFs
  original_filepath = 'owner/repo_name/fs/path/to/orig-file.pdf.ibocr'
  modified_filepath = 'owner/repo_name/fs/path/to/mod-file.pdf.ibocr'
  output_folder = 'owner/repo_name/fs/path/to/output_folder'

  # Set the correct settings
  # ------------------------

  settings = {
    'mode': 'text',    # one of 'text', 'word', or 'character'
    'persist': False,  # set to True to write an .ibdiff file to output_folder
  }

  # Perform the request
  # -------------------

  headers = {
    'Authorization': 'Bearer {0}'.format(TOKEN)
  }

  data = json.dumps({
    'original_filepath': original_filepath,
    'modified_filepath': modified_filepath,
    'output_folder': output_folder,
    'settings': settings
  })

  resp = requests.post(
    u'{0}{1}{2}'.format(INSTABASE_URL, API_PREFIX, API_PATH),
    headers=headers,
    data=data
  ).json()
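
The settings dictionary in the request accepts one of three mode values, so a typo in mode is easy to catch before the request is sent. The following is a minimal sketch of such a check; make_settings and ALLOWED_MODES are hypothetical names introduced here for illustration and are not part of the Diff API.

```python
# Hypothetical helper, not part of the Diff API: builds the settings dict
# for the request body and rejects unknown mode values early.
ALLOWED_MODES = ("text", "word", "character")  # values listed in the request example


def make_settings(mode="text", persist=False):
    """Return a settings dict suitable for the diff request body."""
    if mode not in ALLOWED_MODES:
        raise ValueError(
            "mode must be one of {0}, got {1!r}".format(ALLOWED_MODES, mode)
        )
    return {"mode": mode, "persist": bool(persist)}
```

For example, make_settings('word', persist=True) returns a dictionary that can be passed as the settings value in the request body.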

Response

The text diff is computed asynchronously. The request returns immediately with a job status response, which you can use to check the status of the job.
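
Because the diff runs asynchronously, client code typically polls until the job finishes. This page does not specify the job status endpoint or the fields that mark completion, so the sketch below takes both as caller-supplied functions: fetch_status and is_done are hypothetical hooks, and wait_for_job is an illustrative name rather than an Instabase API.

```python
import time


def wait_for_job(fetch_status, is_done, interval_s=2.0, timeout_s=300.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until is_done(status) is true, then return status.

    fetch_status and is_done are supplied by the caller (hypothetical hooks).
    This helper assumes nothing about the real job status endpoint or fields.
    """
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if is_done(status):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "job did not finish within {0} seconds".format(timeout_s)
            )
        sleep(interval_s)
```

The sleep and clock parameters exist only so that the loop can be exercised without real waiting.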

ibdiff Output Format

After the job status shows that the diff operation is complete, the job status response includes a results array that contains your diff. If you set persist to true in settings, an .ibdiff file is also generated in the specified output_folder.

Job status result:

// response content
{
  // ...job status response metadata
  results: [
    {
      diff_content: [
        // There are 3 types of tuples that are generated in the diff result.
        // First item in the tuple is an integer indicating the type of result:
        //    0 = unchanged
        //    -1 = removed from original document
        //    1 = added in modified document
        // Second item in the tuple is a string (including newlines) that delimits the change
        [0, "some unchanged text"],
        [-1, "some removed text"],
        [1, "some added text"],
      ],
    },
  ]
}
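
The diff_content tuples can be turned back into the two documents: unchanged and removed segments make up the original text, and unchanged and added segments make up the modified text. The sketch below assumes that segments concatenate directly with no separator, since the strings already include their newlines; split_diff is a hypothetical helper name.

```python
def split_diff(diff_content):
    """Rebuild (original_text, modified_text) from [op, text] pairs.

    op is 0 (unchanged), -1 (removed from original), or 1 (added in modified).
    Assumption: segments join with no separator between them.
    """
    original_parts = []
    modified_parts = []
    for op, text in diff_content:
        if op == 0:
            original_parts.append(text)
            modified_parts.append(text)
        elif op == -1:
            original_parts.append(text)
        elif op == 1:
            modified_parts.append(text)
        else:
            raise ValueError("unexpected diff op: {0!r}".format(op))
    return "".join(original_parts), "".join(modified_parts)
```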

Persisted .ibdiff file:

// ibdiff file content
{
  "diff_type": "text" | "word" | "character",
  "diff_content": [
    // There are 3 types of tuples that are generated in the diff result.
    // First item in the tuple is an integer indicating the type of result:
    //    0 = unchanged
    //    -1 = removed from original document
    //    1 = added in modified document
    // Second item in the tuple is a string (including newlines) that delimits the change
    [0, "some unchanged text"],
    [-1, "some removed text"],
    [1, "some added text"],
  ],
  "metadata": {
    "data_paths": [
      {
        "base_path": "owner/repo_name/fs/path/to/orig-file.pdf.ibocr",
        "changed_path": "owner/repo_name/fs/path/to/mod-file.pdf.ibocr"
      }
    ],
    "settings": {
      "mode": "text",
      "wrap": false, // for diff-checker app default display mode
      "persist": true
    }
  }
}
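
A persisted .ibdiff file can be summarized with a few lines of Python. The sketch below assumes the file on disk is plain JSON, with the comments in the listing above being documentation only; summarize_ibdiff is a hypothetical helper name, and it counts characters per change type.

```python
import json

# Maps the integer in each diff_content tuple to a readable label.
OP_LABELS = {0: "unchanged", -1: "removed", 1: "added"}


def summarize_ibdiff(ibdiff_text):
    """Return the diff mode and per-type character counts for an .ibdiff string.

    Assumption: the file content is valid JSON without comments.
    """
    doc = json.loads(ibdiff_text)
    counts = {"unchanged": 0, "removed": 0, "added": 0}
    for op, text in doc["diff_content"]:
        counts[OP_LABELS[op]] += len(text)
    return {"mode": doc["metadata"]["settings"]["mode"], "chars": counts}
```

To use it, read the file contents into a string first and pass that string to summarize_ibdiff.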

Exporting .ibdiff files to HTML

To export the .ibdiff file to HTML, send a POST request to /api/v1/diff/export-html with the following body parameters:

{
    "ibdiff_path": "<path to the .ibdiff file>",
    "output_folder": "<path to store output HTML>"
}

Sample response

{
    "status": "OK",
    "href": "<output_folder>/<file_name>.pdf.ibdoc-compare.html?content-disposition=html"
}
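
The export request can be sketched in Python as follows. It uses the standard library's urllib.request rather than requests so that it has no third-party dependencies. Several details are assumptions not confirmed by this page: that the endpoint lives on the same base URL as the text diff example, that it accepts the same Bearer token header, and that the body should be sent with a JSON Content-Type. The function names are hypothetical.

```python
import json
import urllib.request

INSTABASE_URL = "https://www.instabase.com"  # base URL from the text diff example
EXPORT_PATH = "/api/v1/diff/export-html"


def build_export_request(ibdiff_path, output_folder, token):
    """Build (but do not send) the POST request for the HTML export."""
    body = json.dumps({
        "ibdiff_path": ibdiff_path,
        "output_folder": output_folder,
    }).encode("utf-8")
    return urllib.request.Request(
        INSTABASE_URL + EXPORT_PATH,
        data=body,
        headers={
            # Assumption: same Bearer token scheme as the text diff request.
            "Authorization": "Bearer {0}".format(token),
            # Assumption: the endpoint expects a JSON-encoded body.
            "Content-Type": "application/json",
        },
        method="POST",
    )


def export_ibdiff_to_html(ibdiff_path, output_folder, token):
    """Send the export request and return the parsed JSON response."""
    request = build_export_request(ibdiff_path, output_folder, token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the request construction from the network call makes it possible to inspect the URL, headers, and body before anything is sent.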