Signals

Get to know Signals field concepts and output data formats

What are Signals?

Signals are a special type of field that uses previously extracted data to derive aggregated metrics or values. Depending on the signal type, you can compute statistics over your own document dataset or over a shared pool of aggregated information.

Sample use-cases for signal field types include:

  • Comparing the Total on an invoice to the historical average to detect anomalies

  • Detecting document duplicates by searching for old data matching key fields

  • Computing the probability of observed field combinations to detect potential fraud

How do I access signals?

Signals are currently available in private Beta.

If you'd like to get started with signals on your data, please get in touch!

Signals come in three forms:

Signal Types

Probability Models

Probability models compute the probability of observing the extracted field values with respect to other values present in the document. In the example below, we calculate the probability of the observed bank account details appearing for the detected Australian Business Number (ABN).

This can be a useful signal for detecting a common form of invoice fraud, where a third party replaces the legitimate payment information of a known invoice issuer with their own payment details. The new details will not match the historical payment information for this issuer and will therefore return a low probability of being legitimate.

Sample config for computing the conditional probability of an observed value

Configuration

  • observed: the set of fields being observed for probability estimates

  • conditioned: the set of fields being conditioned on

Output

A float value indicating the probability P(observed|conditioned) computed from historical data.

Field confidence for probability models is correlated with the size of the document dataset being searched. More source data yields higher-confidence probability estimates.
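To make the P(observed|conditioned) output more concrete, here is a minimal Python sketch of one way such an estimate could be derived by counting historical records. It is an illustration only, not the service's implementation: the function name `conditional_probability`, the record layout, the field names `abn`, `bsb` and `account_number`, and the sample values are all assumptions made for this example.

```python
def conditional_probability(history, observed, conditioned, query):
    """Estimate P(observed | conditioned) by counting historical records.

    Hypothetical helper for illustration; the real signal may use a
    different estimator (for example, with smoothing).
    """
    cond_key = tuple(query[f] for f in conditioned)
    obs_key = tuple(query[f] for f in observed)

    cond_count = 0   # records sharing the conditioned values
    joint_count = 0  # of those, records that also share the observed values
    for record in history:
        if tuple(record.get(f) for f in conditioned) != cond_key:
            continue
        cond_count += 1
        if tuple(record.get(f) for f in observed) == obs_key:
            joint_count += 1

    if cond_count == 0:
        return None  # no history for this issuer, so no estimate
    return joint_count / cond_count


# Made-up historical extractions: one issuer seen four times.
usual = {"abn": "11111111111", "bsb": "062-000", "account_number": "12345678"}
other = {"abn": "11111111111", "bsb": "082-001", "account_number": "99999999"}
history = [usual, dict(usual), dict(usual), other]

p_usual = conditional_probability(history, ["bsb", "account_number"], ["abn"], usual)
p_other = conditional_probability(history, ["bsb", "account_number"], ["abn"], other)
print(p_usual, p_other)  # 0.75 0.25
```

In this sketch, bank details that have rarely been seen for an issuer score low, and an issuer with no history yields no estimate at all, which mirrors the point above that more source data gives more reliable probabilities.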

Document Match

Document match models dynamically search for and match previously uploaded documents based on values extracted from a query document. This can be used to power a 3-way match of an invoice to its purchase and delivery documentation or, as in the example below, to detect near-duplicates and avoid the invoice being processed multiple times:

Sample configuration to match documents by reference number and date

Configuration

  • fields: a list of field IDs to match against

  • exact: boolean value indicating whether to use fuzzy or exact value matching

Document match searches all previously uploaded documents where the match fields have been extracted.

Output

A list of document reference values, one for each matched document.

For example:

[
  {
    "file_id": "aaaaaa-bbbbb-..."
  },
  {
    "file_id": "cccccc-ddddd-..."
  }
  ...
]
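The matching behaviour described above can be sketched in Python as follows. This is a hedged illustration rather than the actual matching logic: `match_documents`, the `threshold` parameter, the stored-document layout and the use of `difflib.SequenceMatcher` for fuzzy comparison are all assumptions chosen for the example. Documents where a match field was not extracted are skipped, in line with the note above.

```python
from difflib import SequenceMatcher


def _values_match(a, b, exact, threshold):
    """Compare two extracted values exactly or by string similarity."""
    if a is None or b is None:
        return False  # the field was not extracted on one side
    if exact:
        return a == b
    ratio = SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()
    return ratio >= threshold


def match_documents(history, query, fields, exact=True, threshold=0.9):
    """Return a file_id entry for every stored document matching on all fields.

    Hypothetical helper; `threshold` is an assumed similarity cut-off
    that only applies when exact is False.
    """
    matches = []
    for doc in history:
        extracted = doc["fields"]
        if all(_values_match(query.get(f), extracted.get(f), exact, threshold)
               for f in fields):
            matches.append({"file_id": doc["file_id"]})
    return matches


# Made-up previously uploaded documents.
history = [
    {"file_id": "doc-001", "fields": {"reference": "INV-1042", "date": "2023-03-01"}},
    {"file_id": "doc-002", "fields": {"reference": "INV-1O42", "date": "2023-03-01"}},
    {"file_id": "doc-003", "fields": {"reference": "INV-1042"}},  # date not extracted
]
query = {"reference": "INV-1042", "date": "2023-03-01"}

exact_hits = match_documents(history, query, ["reference", "date"], exact=True)
fuzzy_hits = match_documents(history, query, ["reference", "date"],
                             exact=False, threshold=0.85)
print(exact_hits)
print(fuzzy_hits)
```

With exact matching only `doc-001` is returned. With fuzzy matching, `doc-002` (whose reference contains a letter O in place of a zero) is also returned as a likely near-duplicate.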

Statistics

Statistical signals return basic metrics for numerical field values with respect to historical data. This can be used for data analysis and benchmarking of extractions against historical values or market data precedents.

For example, by configuring a statistics signal over the invoice.total field, you can assess the percentile_rank of the Total on an invoice and use this in combination with field conditions to dynamically flag invoices with unusually high values (e.g. 99th percentile) for human review.

Sample configuration to extract field value statistics for the invoice.total field

Configuration

  • source: the field identifier to compute historical metrics for. Only fields with numerical data types are accepted.

Output

A dictionary containing descriptive metrics including:

  • min: the minimum observed value of this field in previously extracted data

  • max: the maximum observed value of this field in previously extracted data

  • avg: the average or mean observed value for this field in previously extracted data

  • variance: the numerical variance of this field in previously extracted data

  • percentile_rank: the percentile rank of the observed value with respect to previously extracted data

{
  "min": -594.30,
  "max": 275474.84,
  "avg": 1961.09,
  "variance": 5829136.35,
  "percentile_rank": 70.21478
}
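As a rough illustration of how these metrics relate to the underlying data, the Python sketch below computes the same keys from a list of historical values and applies a review threshold. It rests on assumptions not stated in this page: variance is taken as the population variance, percentile_rank is taken as the percentage of historical values less than or equal to the observed value, and the names `field_statistics` and `needs_review` are hypothetical.

```python
from statistics import mean, pvariance


def field_statistics(historical_values, observed):
    """Compute descriptive metrics for an observed value against history.

    Requires a non-empty list of numbers. Population variance and the
    "less than or equal" percentile rank are assumptions of this sketch.
    """
    at_or_below = sum(1 for v in historical_values if v <= observed)
    return {
        "min": min(historical_values),
        "max": max(historical_values),
        "avg": mean(historical_values),
        "variance": pvariance(historical_values),
        "percentile_rank": 100.0 * at_or_below / len(historical_values),
    }


def needs_review(stats, threshold=99.0):
    """Flag values at or above the given percentile rank for human review."""
    return stats["percentile_rank"] >= threshold


# Made-up historical invoice totals.
totals = [100.0, 200.0, 300.0, 400.0]

typical = field_statistics(totals, observed=250.0)
extreme = field_statistics(totals, observed=1000.0)
print(typical["percentile_rank"], needs_review(typical))  # 50.0 False
print(extreme["percentile_rank"], needs_review(extreme))  # 100.0 True
```

A condition like `needs_review` corresponds to the 99th percentile example given earlier in this section: only totals that exceed almost all historical values are routed to a person.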