Apply UDF

Configuration options for the Apply UDF step.

Input File Extension

The filename extension that identifies the file type to process.

Output File Extension

The filename extension that identifies the file type to generate as output of this step.

Input Folder

Select the input folder that contains the files to process.

Output Folder

Select the output folder to store the generated output.

Formula

Type or paste a registered custom function. The formula can also reference the special runtime variables listed under Special input variables.

Special input variables

The input variables that are available to UDF formulas are:

  • INPUT_COL is the raw text of the input file.

    • If the Input File Type is IBOCR, then INPUT_COL corresponds to the IBOCR object as text. Use the ParsedIBOCR object to access its fields.
  • INPUT_FILEPATH is the full path to the current document being processed.

    • For example, /user/repo/fs/Instabase Drive/path/to/input/file1.pdf.
  • ROOT_OUTPUT_FOLDER is the absolute path to the Flow’s output directory.

  • CONFIG is a set of key-value pairs that are dynamically passed at runtime into a flow binary.

    • An example runtime config:
      {"key1": "val1", "key2": "val2"}
      
  • CLIENTS is an object that contains all clients that the UDF can access. This object contains the ibfile object, whose API is compatible with the Instabase Python notebook API. For supported methods, see IBFile.

  • REFINER_FNS is an object that provides an API for executing Refiner functions within the UDF. To see the supported method, see REFINER_FNS.

  • TOKEN_FRAMEWORK_REGISTRY is an object for interfacing with the TokenMatcher capabilities. For supported methods, see TokenFrameworkRegistry.
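The following is a minimal sketch of how a custom function might consume some of these variables. It assumes the values are passed in as ordinary arguments; the function name summarize_document, its parameter names, the returned fields, and the "/out" folder in the usage example are all hypothetical and are not part of the Instabase API. It uses only the Python standard library and does not touch CLIENTS, REFINER_FNS, or TOKEN_FRAMEWORK_REGISTRY.

```python
import posixpath


def summarize_document(input_col, input_filepath, root_output_folder, config):
    """Hypothetical UDF body: build a small summary of the current document.

    input_col          -- value of INPUT_COL (raw text)
    input_filepath     -- value of INPUT_FILEPATH
    root_output_folder -- value of ROOT_OUTPUT_FOLDER
    config             -- value of CONFIG (key-value pairs), may be None
    """
    # Treat a missing runtime config as an empty one.
    config = config or {}

    # Split "/path/to/input/file1.pdf" into "file1" and ".pdf".
    filename = posixpath.basename(input_filepath)
    stem, extension = posixpath.splitext(filename)

    return {
        "filename": filename,
        "extension": extension,
        "char_count": len(input_col),
        # Read a runtime config key with a fallback (the fallback is an assumption).
        "key1": config.get("key1", "default"),
        # Illustrative output location under the flow's output directory.
        "output_path": posixpath.join(root_output_folder, stem + ".txt"),
    }


# Usage with the example values from the list above ("/out" is made up).
summary = summarize_document(
    "hello",
    "/user/repo/fs/Instabase Drive/path/to/input/file1.pdf",
    "/out",
    {"key1": "val1", "key2": "val2"},
)
```

Reading CONFIG with .get and a fallback keeps the function usable when a flow runs without a runtime config.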

Logging

Use Python’s standard logging library to log messages from an Apply UDF step. You can filter to see only the logs from UDFs by selecting the “Show Developer Logs Only” option.
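As an illustration, a UDF might log through the standard library as sketched below. The function name log_document_summary and its parameters are hypothetical; only the logging calls themselves are standard Python. The sketch logs the size of the input rather than the input itself.

```python
import logging

# Module-level logger from the standard library.
logger = logging.getLogger(__name__)


def log_document_summary(input_col, input_filepath):
    """Hypothetical helper: log a short summary and report whether text exists."""
    # Log the length only, never the document content.
    logger.info("Processing %s (%d characters)", input_filepath, len(input_col))

    if not input_col.strip():
        logger.warning("Input for %s is empty", input_filepath)
        return False
    return True
```

Passing values as arguments to logger.info, instead of formatting the string first, defers formatting until the record is actually emitted.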

Note

By default, flow logs currently have a size limit of 20 MB per job ID. As a best practice, avoid logging binary values (such as images), entire IBDOCs, or extraction results that might contain PII. Logs are stored in the file system.

Note

Logging in UDFs was previously done with the LOGGER object from the function context. Although LOGGER is still supported, we recommend using Python's logging library directly.

Extra settings

Click Extra Settings to access these configuration settings.

Scripts directory

When a scripts folder is selected, all .py files in that folder are used to refine the output.