Measure accuracy with Target Comparison

Enable the Target Comparison feature for a Refiner run to measure text field extraction accuracy as you build and make incremental changes to your Refiner program.

As you build out an extraction program in Refiner, you might wonder, “How accurately is my extraction program extracting against the labeled data?”

Accuracy and progress metrics

Target Comparison is applied only to eligible mapped target fields that are present in the selected targets file. Accuracy metrics are calculated only on eligible fields, documents, or output values. Eligibility is determined only by the most recent Refiner program run.

Target Comparison informs three types of measurements:

  • Field-level accuracy per field

  • Total field-level accuracy

  • Straight-through-processing accuracy

Field-level accuracy per field

Measures how accurately the extracted values compare against the target values for a single field. Field-level accuracy reveals how the extraction of each individual field is doing, which helps you focus on improving the accuracy of a specific field.

Field-level accuracy is shown as a percentage for each mapped field in the output view.

How is field-level accuracy calculated?

(The number of eligible output values that match their respective target values for a field) / (Total number of eligible output values for that field)
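The ratio above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Refiner's implementation: the function name `field_accuracy`, the dictionary layout, the exact string comparison, and the rule that a value is eligible when its document has a target are all assumptions made for the example.

```python
from typing import Optional


def field_accuracy(extracted: dict[str, str], targets: dict[str, str]) -> Optional[float]:
    """Share of eligible output values for ONE field that equal their targets.

    Both arguments map a document name to that document's value for the field.
    """
    # Assumption: an output value is eligible when its document has a target value.
    eligible = [doc for doc in extracted if doc in targets]
    if not eligible:
        return None  # nothing to compare, so accuracy is undefined
    # Assumption: a match is an exact string comparison.
    matches = sum(1 for doc in eligible if extracted[doc] == targets[doc])
    return matches / len(eligible)


# Hypothetical extraction results for the Age field.
extracted_age = {
    "name-and-age-1.png.ibdoc": "46",
    "name-and-age-2.png.ibdoc": "94",  # differs from the target "49"
    "name-and-age-3.png.ibdoc": "27",
}
target_age = {
    "name-and-age-1.png.ibdoc": "46",
    "name-and-age-2.png.ibdoc": "49",
    "name-and-age-3.png.ibdoc": "27",
}

# Two of the three eligible values match their targets.
print(f"{field_accuracy(extracted_age, target_age):.0%}")
```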

Total field-level accuracy

Measures how accurately the extracted values compare against the target values across all fields. Total field-level accuracy helps you understand how your extraction program is performing across all fields you want to extract.

Total field-level accuracy is shown as a percentage in a popup window after a run is complete.

How is total field-level accuracy calculated?

(The number of eligible output values that match their respective target values) / (The total number of eligible output values)
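As a rough sketch of this calculation, the following Python pools every eligible value across all fields and documents before dividing. The function name `total_field_accuracy`, the nested dictionary layout, and the eligibility rule (a value is eligible when a target exists for the same document and field) are assumptions for illustration only.

```python
from typing import Optional

# Layout assumed for this example: {document name: {field name: value}}
Values = dict[str, dict[str, str]]


def total_field_accuracy(extracted: Values, targets: Values) -> Optional[float]:
    """Share of all eligible output values, across every field, that match."""
    matches = 0
    eligible = 0
    for doc, fields in extracted.items():
        for field, value in fields.items():
            target = targets.get(doc, {}).get(field)
            if target is None:
                continue  # assumption: no target means the value is not eligible
            eligible += 1
            if value == target:  # assumption: exact string comparison
                matches += 1
    return matches / eligible if eligible else None


# Hypothetical results: three of the four eligible values match.
extracted = {
    "name-and-age-1.png.ibdoc": {"Name": "John Stone", "Age": "46"},
    "name-and-age-2.png.ibdoc": {"Name": "Ponnappa Priya", "Age": "94"},
}
targets = {
    "name-and-age-1.png.ibdoc": {"Name": "John Stone", "Age": "46"},
    "name-and-age-2.png.ibdoc": {"Name": "Ponnappa Priya", "Age": "49"},
}

print(total_field_accuracy(extracted, targets))  # 0.75
```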

Straight-through-processing accuracy

Measures the percentage of documents that achieve correct extraction for all values. Straight-through-processing accuracy indicates how many of your documents can be extracted automatically.

Straight-through-processing accuracy is shown as a percentage in a popup window after a run is complete.

How is straight-through-processing accuracy calculated?

(The number of eligible documents whose eligible output values all match their respective target values) / (The total number of eligible documents)
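The document-level ratio can be sketched as follows. A single mismatched value disqualifies the whole document, which is why this number is usually lower than the field-level figures. The function name `stp_accuracy`, the nested dictionary layout, and the rule that a document is eligible when it has at least one value with a target are assumptions for this example, not Refiner's actual logic.

```python
from typing import Optional

# Layout assumed for this example: {document name: {field name: value}}
Values = dict[str, dict[str, str]]


def stp_accuracy(extracted: Values, targets: Values) -> Optional[float]:
    """Share of eligible documents whose eligible values ALL match their targets."""
    eligible_docs = 0
    perfect_docs = 0
    for doc, fields in extracted.items():
        target_fields = targets.get(doc, {})
        # Pair each extracted value with its target, skipping fields without one.
        pairs = [(value, target_fields[name])
                 for name, value in fields.items() if name in target_fields]
        if not pairs:
            continue  # assumption: a document with no comparable values is not eligible
        eligible_docs += 1
        if all(value == target for value, target in pairs):
            perfect_docs += 1
    return perfect_docs / eligible_docs if eligible_docs else None


# Hypothetical results: the first document is fully correct, the second has one wrong value.
extracted = {
    "name-and-age-1.png.ibdoc": {"Name": "John Stone", "Age": "46"},
    "name-and-age-2.png.ibdoc": {"Name": "Ponnappa Priya", "Age": "94"},
}
targets = {
    "name-and-age-1.png.ibdoc": {"Name": "John Stone", "Age": "46"},
    "name-and-age-2.png.ibdoc": {"Name": "Ponnappa Priya", "Age": "49"},
}

print(stp_accuracy(extracted, targets))  # 0.5
```

In this made-up data, three of four values are correct, yet only one of two documents passes end to end.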

Required syntax for the target file

The target dataset is provided in a .csv file with document file names as row headers and the fields you want to extract as column headers. You can map one or more fields. The target file must reside in the Instabase file system.

For example, this sample target file maps the Name and Age fields. The file name-and-age-1.png.ibdoc contains the name John Stone and the age 46, and so on.

"","Name","Age"
"name-and-age-1.png.ibdoc","John Stone","46"
"name-and-age-2.png.ibdoc","Ponnappa Priya","49"
"name-and-age-3.png.ibdoc","Mia Wong","27"
"name-and-age-4.png.ibdoc","Peter Stanbridge","43"
"name-and-age-5.png.ibdoc","Natalie Lee-Walsh","75"

Note: The records listed in the sample target file and the target file itself must have the same .ibdoc filename extension. Both .ibdoc and .ibdoc-0 are valid, but they cannot be mixed.
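Before uploading, it can help to sanity-check a target file locally. The sketch below parses the sample above with Python's standard `csv` module and reports which record extensions appear, so a mix of .ibdoc and .ibdoc-0 is easy to spot. The helper names `load_targets` and `extensions_used` are hypothetical and not part of Refiner.

```python
import csv
import io

SAMPLE = '''"","Name","Age"
"name-and-age-1.png.ibdoc","John Stone","46"
"name-and-age-2.png.ibdoc","Ponnappa Priya","49"
"name-and-age-3.png.ibdoc","Mia Wong","27"
"name-and-age-4.png.ibdoc","Peter Stanbridge","43"
"name-and-age-5.png.ibdoc","Natalie Lee-Walsh","75"
'''


def load_targets(text: str) -> dict[str, dict[str, str]]:
    """Parse target CSV text into {document name: {field name: target value}}."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    field_names = header[1:]  # the first header cell is empty (document column)
    targets = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        targets[row[0]] = dict(zip(field_names, row[1:]))
    return targets


def extensions_used(doc_names) -> set[str]:
    """Return which record extensions appear among the document names."""
    found = set()
    for name in doc_names:
        if name.endswith(".ibdoc-0"):
            found.add(".ibdoc-0")
        elif name.endswith(".ibdoc"):
            found.add(".ibdoc")
        else:
            found.add("other")
    return found


targets = load_targets(SAMPLE)
print(targets["name-and-age-1.png.ibdoc"])  # {'Name': 'John Stone', 'Age': '46'}

used = extensions_used(targets)
if len(used) > 1:
    print("Mixed record extensions found:", sorted(used))
```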

Using Target Comparison in Refiner

The high-level steps to enable and use Target Comparison in Refiner:

  1. Upload a correctly formatted target file.

  2. Map the target file to the Refiner program.

    • If you are starting with a new Refiner program, you can do the mapping as you build out the Refiner program.
  3. Compare the extraction results with your target files.

  4. Understand the accuracy metrics.

Adjust your Refiner program functions as required, and then repeat the map and compare steps to iteratively improve your extraction accuracy.

Example walk-through of using Target Comparison

  1. To upload a target file in a Refiner project, select File > Settings > Upload a target file.

  2. Verify that a corresponding entry exists in the targets file for each field you want to map.

    • In Refiner, the documents are automatically mapped for you.
    • Refiner attempts to map the field names in your target file to the matching existing field names in your Refiner program.

    As documents and fields become mapped to their respective targets, purple bars in the field list and output table indicate that these documents and fields have a mapped target.

  3. To enable the Target Comparison feature for a Refiner program run, move the Run with targets slider to the right.

    • The Run All and Run Field buttons are purple.
  4. Run your Refiner program or field and review the output.

    • Purple indicators are displayed in the output view for each record that has a value corresponding to its row index in the target .csv file.
    • If the extracted values and the target values do not match, the target values are displayed on a purple background.

    Review the measured accuracy, make refinements, and run again.

  5. To change a mapped field, select the Target name field to map the selected field to a different field in the targets file.

Disable Target Comparison

You can enable and disable the Target Comparison feature for any Refiner program run.

To disable, move the Run with targets slider to the left.

  • When the Target Comparison feature is disabled, the Run All and Run Field buttons are blue.