Get to know Signals field concepts and output data formats

Signals are a special type of field that leverages previously extracted data to derive aggregated metrics or values. Depending on the signal type, you may compute statistics over your own document data set or over a shared pool of aggregated information.

Sample use-cases for signal field types include:

Comparing the Total on an invoice to the historical average to detect anomalies

Detecting duplicate documents by searching previously extracted data for matching key fields

Computing the probability of observed field combinations to detect potential fraud

Signals are currently available via a private beta.

If you'd like to get started with signals on your data, please get in touch!

Signals come in two forms:

Pre-configured signal models for a specific use case, e.g. Invoice Spend Signals

Custom signals configured manually via Sypht Workbench

Probability models compute the probability of observing the extracted field values with respect to other values present in the document. In the example below, we calculate the probability of the observed bank account details appearing for the detected Australian Business Number (ABN).

This can be a useful signal for detecting a common form of invoice fraud, in which a third party replaces the legitimate payment information of a known invoice issuer with their own payment details. The new details will not match the historical payment information for this issuer and will therefore return a low probability of being legitimate.

`observed`

the set of fields being observed for probability estimates

`conditioned`

the set of fields being conditioned on

A `float` value indicating the probability `P(observed|conditioned)` computed from historical data.

Field confidence for probability models is correlated with the size of the document dataset being searched. More source data yields higher-confidence probability estimates.
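As a rough illustration of the estimate described above, the sketch below computes `P(observed|conditioned)` as a simple frequency ratio over historical records. The record layout, the field names (`abn`, `bsb`, `account_number`) and the `conditional_probability` helper are assumptions made for this example only; they are not part of the Sypht API, and the actual model may use a different estimator.

```python
from typing import Optional


def conditional_probability(history, observed, conditioned) -> Optional[float]:
    """Estimate P(observed | conditioned) as a frequency ratio.

    history: list of dicts of previously extracted values (hypothetical layout)
    observed, conditioned: dicts mapping field name -> value on the query document
    """
    # Historical records that share the conditioned values (e.g. the same ABN)
    matching = [
        r for r in history
        if all(r.get(k) == v for k, v in conditioned.items())
    ]
    if not matching:
        return None  # no history to condition on, so no estimate
    # Of those, how many also carry the observed values (e.g. the same bank details)
    hits = sum(
        1 for r in matching
        if all(r.get(k) == v for k, v in observed.items())
    )
    return hits / len(matching)


# Made-up historical extractions, for illustration only
history = [
    {"abn": "11111111111", "bsb": "012-345", "account_number": "1000"},
    {"abn": "11111111111", "bsb": "012-345", "account_number": "1000"},
    {"abn": "11111111111", "bsb": "012-345", "account_number": "1000"},
    {"abn": "11111111111", "bsb": "999-999", "account_number": "4242"},
    {"abn": "22222222222", "bsb": "555-000", "account_number": "7777"},
]

usual = conditional_probability(
    history,
    observed={"bsb": "012-345", "account_number": "1000"},
    conditioned={"abn": "11111111111"},
)
suspicious = conditional_probability(
    history,
    observed={"bsb": "888-888", "account_number": "1"},
    conditioned={"abn": "11111111111"},
)
print(usual, suspicious)  # 0.75 0.0
```

With only four historical records for the issuer, the estimate is coarse; this is one way to see why a larger document dataset gives more trustworthy probabilities.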

Document match models dynamically search and match previously uploaded documents based on values extracted from a query document. This can be used to power a 3-way match of an invoice to purchase and delivery documentation or, as in the example below, to detect near-duplicates and avoid the same invoice being processed multiple times:

`fields`

a list of field IDs to match against

`exact`

boolean value indicating whether to use fuzzy or exact value matching

Document match searches all previously uploaded documents where the match fields have been extracted.

A `list` of `documentreference` values, one for each matched document.

For example:

[{"file_id": "aaaaaa-bbbbb-..."},{"file_id": "cccccc-ddddd-..."}...]
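The matching behaviour described above can be sketched as follows. This is a minimal illustration rather than Sypht's implementation: the `match_documents` helper, the `threshold` parameter, the in-memory document layout and the `invoice.id` field name are all hypothetical, and fuzzy matching is approximated here with `difflib.SequenceMatcher` from the Python standard library.

```python
from difflib import SequenceMatcher


def match_documents(query, previous, fields, exact=True, threshold=0.9):
    """Return references to previous documents whose match fields agree with the query.

    fields and exact mirror the configuration options described above;
    threshold is an assumed similarity cut-off used for fuzzy matching only.
    """
    def same(a, b):
        if exact:
            return a == b
        # Fuzzy comparison: similarity ratio on normalised strings
        a, b = str(a).strip().lower(), str(b).strip().lower()
        return SequenceMatcher(None, a, b).ratio() >= threshold

    matches = []
    for doc in previous:
        values = doc["fields"]
        # Only search documents where every match field has been extracted
        if not all(f in values for f in fields):
            continue
        if all(same(query[f], values[f]) for f in fields):
            matches.append({"file_id": doc["file_id"]})
    return matches


# Made-up previously uploaded documents, for illustration only
previous = [
    {"file_id": "doc-1", "fields": {"invoice.id": "INV-1001", "invoice.total": "250.00"}},
    {"file_id": "doc-2", "fields": {"invoice.id": "inv-1001 ", "invoice.total": "250.00"}},
    {"file_id": "doc-3", "fields": {"invoice.total": "250.00"}},
]
query = {"invoice.id": "INV-1001", "invoice.total": "250.00"}
match_fields = ["invoice.id", "invoice.total"]

exact_matches = match_documents(query, previous, match_fields, exact=True)
fuzzy_matches = match_documents(query, previous, match_fields, exact=False)
print(exact_matches)  # [{'file_id': 'doc-1'}]
print(fuzzy_matches)  # [{'file_id': 'doc-1'}, {'file_id': 'doc-2'}]
```

In this sketch `doc-3` is never returned because one of the match fields was not extracted for it, and `doc-2` is only picked up as a near-duplicate when fuzzy matching is enabled.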

Statistical signals return basic metrics for numerical field values with respect to historical data. This can be used for data analysis and benchmarking of extractions against historical values or market data precedents.

For example, by configuring a value statistic signal over the `invoice.total` field, you may assess the `percentile_rank` of the Total on an invoice and use this in combination with field conditions to dynamically flag invoices with unusually high values (e.g. 99th percentile) for human review.

`source`

the field identifier to compute historical metrics for. Only fields with numerical data types are accepted.

A dictionary containing descriptive metrics including:

`min`

the minimum observed value of this field in previously extracted data

`max`

the maximum observed value of this field in previously extracted data

`avg`

the average or mean observed value for this field in previously extracted data

`variance`

the numerical variance of this field in previously extracted data

`percentile_rank`

the percentile rank of the observed value with respect to previously extracted data

{"min": -594.30,"max": 275474.84,"avg": 1961.09,"variance": 5829136.35,"percentile_rank": 70.21478}
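To make the metrics above concrete, here is a minimal sketch of how such a dictionary could be computed from historical values and then used to flag an invoice for review. The `value_statistics` helper is hypothetical, and two details are assumptions rather than documented behaviour: variance is taken as the population variance, and percentile rank is taken as the percentage of historical values less than or equal to the observed value.

```python
from statistics import mean, pvariance


def value_statistics(history, observed):
    """Descriptive metrics for a numerical field, relative to historical values."""
    # Assumed definition: share of historical values <= the observed value
    below_or_equal = sum(1 for v in history if v <= observed)
    return {
        "min": min(history),
        "max": max(history),
        "avg": mean(history),
        "variance": pvariance(history),  # population variance (assumption)
        "percentile_rank": 100.0 * below_or_equal / len(history),
    }


# Made-up historical invoice totals, for illustration only
history = [100.0, 200.0, 300.0, 400.0, 500.0]
stats = value_statistics(history, observed=450.0)

# Flag unusually high totals (e.g. 99th percentile and above) for human review
needs_review = stats["percentile_rank"] >= 99
print(stats["percentile_rank"], needs_review)  # 80.0 False
```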