Classification fields

How to use classification field types

What are classification fields?

Classification field predictions categorise the content of a document into a predefined set of classifications. They differ from other field types, such as extraction fields, which make predictions about the content of a document.

Classification types

Multi-class

A multi-class field is a type of document classification that assigns a single category, or class, to a document. For example, the invoice.currencyCode field determines the currency used in an invoice. It is implemented as a multi-class field because an invoice usually uses only one currency.

Multi-label

A multi-label field is a similar type of classification, but instead of assigning a single category to a document, it can assign many categories, or labels. For example, the document.type field determines which of Sypht's fieldsets are applicable to a given document. Since multiple fieldsets can be applicable to a single document, this field is implemented as a multi-label field.

Samples

For multi-class and multi-label fields, the value returned for each field is a JSON object.

Here is an example of output for the invoice.currencyCode multi-class field:

Multi-class example
{
  "name": "invoice.currencyCode",
  "value": {
    "invoice.currencyCode.AUD": {"value": true, "confidence": 0.69},
    "invoice.currencyCode.EUR": {"value": false, "confidence": 0},
    "invoice.currencyCode.GBP": {"value": false, "confidence": 0},
    "invoice.currencyCode.MYR": {"value": false, "confidence": 0},
    "invoice.currencyCode.NZD": {"value": false, "confidence": 0.29},
    "invoice.currencyCode.SGD": {"value": false, "confidence": 0},
    "invoice.currencyCode.Unknown": {"value": false, "confidence": 0.02}
  },
  "confidence": 0.69,
  "boundingBox": null
}

The keys in the object enumerate every possible class or label for the field. The invoice.currencyCode field shown above has seven possible classes: six represent a currency code, and the seventh, Unknown, covers cases where Sypht cannot determine the currency. Note that each class is qualified with the field name.
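The qualified keys can be reduced to bare class names by stripping the field-name prefix. The following is a minimal Python sketch, assuming the field has already been parsed into a dict shaped like the sample above (abbreviated here to three classes). The helper name class_names is hypothetical and not part of any Sypht client library.

```python
def class_names(field):
    """List a classification field's classes without the field-name prefix.

    Hypothetical helper for illustration; assumes `field` is a dict with
    "name" and "value" keys, as in the sample output.
    """
    prefix = field["name"] + "."
    names = []
    for key in field["value"]:
        # Keys such as "invoice.currencyCode.AUD" are qualified with the
        # field name; keys without that prefix are returned unchanged.
        names.append(key[len(prefix):] if key.startswith(prefix) else key)
    return names


# Abbreviated copy of the invoice.currencyCode sample.
field = {
    "name": "invoice.currencyCode",
    "value": {
        "invoice.currencyCode.AUD": {"value": True, "confidence": 0.69},
        "invoice.currencyCode.NZD": {"value": False, "confidence": 0.29},
        "invoice.currencyCode.Unknown": {"value": False, "confidence": 0.02},
    },
    "confidence": 0.69,
    "boundingBox": None,
}

print(class_names(field))
```

Keys that do not start with the field name are passed through as-is, so the same helper also works on output whose keys are not qualified.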

Decoding classifications

Each class/label has a value and a confidence. For multi-class fields (as above), only a single class will have a value of true.
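Since only one class is true in a multi-class field, decoding amounts to finding that entry. The sketch below assumes the field arrives as a JSON string shaped like the multi-class sample; decode_multi_class is a hypothetical helper name, and returning None when no class is true is a defensive choice rather than documented behaviour.

```python
import json


def decode_multi_class(field):
    """Return (class_key, confidence) for the class whose value is true.

    Hypothetical helper; returns None if no class is marked true.
    """
    for key, prediction in field["value"].items():
        if prediction["value"]:
            return key, prediction["confidence"]
    return None


# The invoice.currencyCode sample, as a JSON string.
raw = """
{
  "name": "invoice.currencyCode",
  "value": {
    "invoice.currencyCode.AUD": {"value": true, "confidence": 0.69},
    "invoice.currencyCode.EUR": {"value": false, "confidence": 0},
    "invoice.currencyCode.GBP": {"value": false, "confidence": 0},
    "invoice.currencyCode.MYR": {"value": false, "confidence": 0},
    "invoice.currencyCode.NZD": {"value": false, "confidence": 0.29},
    "invoice.currencyCode.SGD": {"value": false, "confidence": 0},
    "invoice.currencyCode.Unknown": {"value": false, "confidence": 0.02}
  },
  "confidence": 0.69,
  "boundingBox": null
}
"""

field = json.loads(raw)
print(decode_multi_class(field))
```

For this sample the helper selects invoice.currencyCode.AUD with a confidence of 0.69, which matches the field's top-level confidence.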

For multi-label fields (such as document.type), multiple labels can have a value of true at the same time. Here is an example of output for the document.type multi-label field:

Multi-label example
{
  "name": "document.type",
  "value": {
    "bank": {"value": false, "confidence": 1},
    "bill": {"value": false, "confidence": 1},
    "bpay": {"value": false, "confidence": 1},
    "electricity": {"value": false, "confidence": 1},
    "generic": {"value": false, "confidence": 1},
    "invoice": {"value": true, "confidence": 0.96},
    "issuer": {"value": true, "confidence": 0.94},
    "ndis": {"value": false, "confidence": 0.93},
    "paystub": {"value": false, "confidence": 1},
    "recipient": {"value": true, "confidence": 0.81},
    "statement": {"value": false, "confidence": 1},
    "toll": {"value": false, "confidence": 0.99},
    "vehicle": {"value": true, "confidence": 0.76}
  },
  "confidence": 0.511464907008,
  "boundingBox": null
}

Note that invoice, issuer, recipient and vehicle all have values of true.
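Decoding a multi-label field means collecting every label whose value is true, rather than stopping at the first one. The sketch below assumes a parsed dict shaped like the multi-label sample (abbreviated here to six labels); decode_multi_label is a hypothetical helper name.

```python
def decode_multi_label(field):
    """Return {label: confidence} for every label whose value is true.

    Hypothetical helper for illustration; an empty dict means no label applied.
    """
    return {
        label: prediction["confidence"]
        for label, prediction in field["value"].items()
        if prediction["value"]
    }


# Abbreviated copy of the document.type sample.
field = {
    "name": "document.type",
    "value": {
        "bank": {"value": False, "confidence": 1},
        "invoice": {"value": True, "confidence": 0.96},
        "issuer": {"value": True, "confidence": 0.94},
        "ndis": {"value": False, "confidence": 0.93},
        "recipient": {"value": True, "confidence": 0.81},
        "vehicle": {"value": True, "confidence": 0.76},
    },
    "confidence": 0.511464907008,
    "boundingBox": None,
}

for label, confidence in decode_multi_label(field).items():
    print(label, confidence)
```

Returning the confidence alongside each label lets a caller apply its own threshold if it wants to treat low-confidence labels differently.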