Classification fields categorise a document into classes drawn from a predefined set. They are distinct from, for example, extraction fields, which make predictions about specific content within a document.
A multi-class field is a type of document classification that assigns a single category, or class, to a document. For example, the `invoice.currencyCode` field determines the currency used in an invoice. It is implemented as a multi-class field because an invoice usually uses only a single currency.
A multi-label field is a similar type of classification. Instead of assigning a single category to a document, it can assign several categories, or labels, to a document. For example, the `document.type` field determines which of Sypht's fieldsets apply to a given document. Since multiple fieldsets can apply to a single document, this field is implemented as a multi-label field.
For multi-class and multi-label fields, the value returned for each field is a JSON object.
Here is an example of output for the `invoice.currencyCode` multi-class field:
```json
{
  "name": "invoice.currencyCode",
  "value": {
    "invoice.currencyCode.AUD": {"value": true, "confidence": 0.69},
    "invoice.currencyCode.EUR": {"value": false, "confidence": 0},
    "invoice.currencyCode.GBP": {"value": false, "confidence": 0},
    "invoice.currencyCode.MYR": {"value": false, "confidence": 0},
    "invoice.currencyCode.NZD": {"value": false, "confidence": 0.29},
    "invoice.currencyCode.SGD": {"value": false, "confidence": 0},
    "invoice.currencyCode.Unknown": {"value": false, "confidence": 0.02}
  },
  "confidence": 0.69,
  "boundingBox": null
}
```
The keys of the `value` object enumerate each possible class/label for the field. The `invoice.currencyCode` field shown above has seven possible classes. Six represent a currency code, and the seventh, `Unknown`, covers cases where Sypht cannot determine the currency. Note that each class here is qualified with the field name (e.g. `invoice.currencyCode.AUD`).
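To inspect the possible classes programmatically, you can parse the field's JSON and strip the field-name qualifier from each key. The following is a minimal Python sketch. It assumes you already have the field output as a JSON string, which here is pasted from the multi-class example above; fetching results from Sypht is not shown. The helper `class_names` is hypothetical and is not part of any Sypht SDK.

```python
import json

# Field output copied from the invoice.currencyCode example on this page.
raw = """
{"name": "invoice.currencyCode",
 "value": {"invoice.currencyCode.AUD": {"value": true, "confidence": 0.69},
           "invoice.currencyCode.EUR": {"value": false, "confidence": 0},
           "invoice.currencyCode.GBP": {"value": false, "confidence": 0},
           "invoice.currencyCode.MYR": {"value": false, "confidence": 0},
           "invoice.currencyCode.NZD": {"value": false, "confidence": 0.29},
           "invoice.currencyCode.SGD": {"value": false, "confidence": 0},
           "invoice.currencyCode.Unknown": {"value": false, "confidence": 0.02}},
 "confidence": 0.69,
 "boundingBox": null}
"""

field = json.loads(raw)


def class_names(field):
    """Hypothetical helper: list class keys with the '<field name>.' prefix removed.

    Keys that are not prefixed are returned unchanged.
    """
    prefix = field["name"] + "."
    return [key[len(prefix):] if key.startswith(prefix) else key
            for key in field["value"]]


names = class_names(field)
print(len(names))  # 7
print(names)       # ['AUD', 'EUR', 'GBP', 'MYR', 'NZD', 'SGD', 'Unknown']
```

The prefix check falls back to the raw key, so the same helper also works for fields whose labels are not qualified.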
Each class/label has a `value` and a `confidence`. For multi-class fields (as above), only a single class will have a value of `true`.
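Because at most one class of a multi-class field is `true`, reading the prediction amounts to finding that entry. The minimal Python sketch below reuses the multi-class example output as a JSON string. The helper `predicted_class` is hypothetical. Raising an error when the number of `true` classes is not exactly one is a defensive choice made for this sketch, not documented Sypht behaviour.

```python
import json

# Field output copied from the invoice.currencyCode example on this page.
raw = """
{"name": "invoice.currencyCode",
 "value": {"invoice.currencyCode.AUD": {"value": true, "confidence": 0.69},
           "invoice.currencyCode.EUR": {"value": false, "confidence": 0},
           "invoice.currencyCode.GBP": {"value": false, "confidence": 0},
           "invoice.currencyCode.MYR": {"value": false, "confidence": 0},
           "invoice.currencyCode.NZD": {"value": false, "confidence": 0.29},
           "invoice.currencyCode.SGD": {"value": false, "confidence": 0},
           "invoice.currencyCode.Unknown": {"value": false, "confidence": 0.02}},
 "confidence": 0.69,
 "boundingBox": null}
"""


def predicted_class(field):
    """Hypothetical helper: return (key, confidence) of the single true class."""
    selected = [(key, pred["confidence"])
                for key, pred in field["value"].items()
                if pred["value"]]
    if len(selected) != 1:
        # Defensive check for this sketch; treat other cases as unexpected.
        raise ValueError(f"expected exactly one true class, got {len(selected)}")
    return selected[0]


key, confidence = predicted_class(json.loads(raw))
print(key, confidence)  # invoice.currencyCode.AUD 0.69
```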
For multi-label fields (such as `document.type`), multiple labels can have a value of `true`. Here is an example of output for the `document.type` multi-label field that demonstrates how multiple labels can simultaneously have a value of `true`:
```json
{
  "name": "document.type",
  "value": {
    "bank": {"value": false, "confidence": 1},
    "bill": {"value": false, "confidence": 1},
    "bpay": {"value": false, "confidence": 1},
    "electricity": {"value": false, "confidence": 1},
    "generic": {"value": false, "confidence": 1},
    "invoice": {"value": true, "confidence": 0.96},
    "issuer": {"value": true, "confidence": 0.94},
    "ndis": {"value": false, "confidence": 0.93},
    "paystub": {"value": false, "confidence": 1},
    "recipient": {"value": true, "confidence": 0.81},
    "statement": {"value": false, "confidence": 1},
    "toll": {"value": false, "confidence": 0.99},
    "vehicle": {"value": true, "confidence": 0.76}
  },
  "confidence": 0.511464907008,
  "boundingBox": null
}
```
Note that `invoice`, `issuer`, `recipient` and `vehicle` all have values of `true`.
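For a multi-label field, the result of interest is usually the set of all labels marked `true`. The minimal Python sketch below assumes the multi-label example output is available as a JSON string. The helper `true_labels` is hypothetical and simply filters on each label's `value`. It applies no confidence threshold of its own.

```python
import json

# Field output copied from the document.type example on this page.
raw = """
{"name": "document.type",
 "value": {"bank": {"value": false, "confidence": 1},
           "bill": {"value": false, "confidence": 1},
           "bpay": {"value": false, "confidence": 1},
           "electricity": {"value": false, "confidence": 1},
           "generic": {"value": false, "confidence": 1},
           "invoice": {"value": true, "confidence": 0.96},
           "issuer": {"value": true, "confidence": 0.94},
           "ndis": {"value": false, "confidence": 0.93},
           "paystub": {"value": false, "confidence": 1},
           "recipient": {"value": true, "confidence": 0.81},
           "statement": {"value": false, "confidence": 1},
           "toll": {"value": false, "confidence": 0.99},
           "vehicle": {"value": true, "confidence": 0.76}},
 "confidence": 0.511464907008,
 "boundingBox": null}
"""


def true_labels(field):
    """Hypothetical helper: return every label whose value is true, in key order."""
    return [label for label, pred in field["value"].items() if pred["value"]]


labels = true_labels(json.loads(raw))
print(labels)  # ['invoice', 'issuer', 'recipient', 'vehicle']
```

Unlike the multi-class case, an empty or multi-element result is valid here, so the helper performs no count check.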