Classification fields

How to use classification field types

What are classification fields?

Classification fields categorise the content of a document into a predefined set of classes. They are distinct from other field types, such as extraction fields, which make predictions about the content of a document.

Classification types

Multi-class

A multi-class field is a type of document classification that assigns a single category, or class, to a document. For example, the invoice.currencyCode field determines the currency used in an invoice. It is implemented as a multi-class field because an invoice usually uses only one currency.

Multi-label

A multi-label field is a similar type of classification, but instead of assigning a single category, it can assign many categories, or labels, to a document. For example, the document.type field determines which of Sypht's fieldsets are applicable to a given document. Since multiple fieldsets can be applicable to a single document, this field is implemented as a multi-label field.

Samples

For multi-class and multi-label fields, the value returned for each field is a JSON object.

Here is an example of output for the invoice.currencyCode multi-class field:

Multi-class example
{ 
  "name": "invoice.currencyCode",
  "value": {
    "invoice.currencyCode.AUD": {"value": true, "confidence": 0.69}, 
    "invoice.currencyCode.EUR": {"value": false, "confidence": 0}, 
    "invoice.currencyCode.GBP": {"value": false, "confidence": 0}, 
    "invoice.currencyCode.MYR": {"value": false, "confidence": 0}, 
    "invoice.currencyCode.NZD": {"value": false, "confidence": 0.29}, 
    "invoice.currencyCode.SGD": {"value": false, "confidence": 0}, 
    "invoice.currencyCode.Unknown": {"value": false, "confidence": 0.02}
  },
  "confidence": 0.69,
  "boundingBox": null
}

The keys in the object enumerate each possible class/label for the field. For the invoice.currencyCode field shown above, there are seven possible classes: six represent a currency code, and the seventh, Unknown, covers cases where Sypht cannot determine the currency. Note that each class is qualified with the field name.
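
As a minimal sketch in Python, the short class names can be recovered by stripping the qualifying field name from each key. The helper name class_names is hypothetical and not part of any Sypht client; it assumes the field has already been parsed into a dict with the shape shown above.

```python
from typing import List


def class_names(field: dict) -> List[str]:
    """List the possible classes of a field without the field-name qualifier.

    Hypothetical helper: assumes `field` is the parsed JSON object for one field.
    """
    prefix = field["name"] + "."
    names = []
    for key in field["value"]:
        # Keys that are not qualified with the field name are kept unchanged.
        names.append(key[len(prefix):] if key.startswith(prefix) else key)
    return names


# Abbreviated copy of the invoice.currencyCode output shown above.
field = {
    "name": "invoice.currencyCode",
    "value": {
        "invoice.currencyCode.AUD": {"value": True, "confidence": 0.69},
        "invoice.currencyCode.NZD": {"value": False, "confidence": 0.29},
        "invoice.currencyCode.Unknown": {"value": False, "confidence": 0.02},
    },
    "confidence": 0.69,
    "boundingBox": None,
}

print(class_names(field))  # ['AUD', 'NZD', 'Unknown']
```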

Decoding classifications

Each class/label has a value and a confidence. For multi-class fields (as above), only a single class will have a value of true.
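
One way to decode a multi-class field, sketched in Python under the assumption that the field has been parsed into a dict: scan the classes and return the one whose value is true. The function name decode_multi_class is illustrative only, and returning None when no class is true is a choice made for this sketch.

```python
from typing import Optional, Tuple


def decode_multi_class(field: dict) -> Optional[Tuple[str, float]]:
    """Return (class key, confidence) for the class whose value is true.

    Hypothetical helper: returns None if no class is marked true.
    """
    for key, prediction in field["value"].items():
        if prediction["value"]:
            return key, prediction["confidence"]
    return None


# Abbreviated copy of the invoice.currencyCode output shown above.
field = {
    "name": "invoice.currencyCode",
    "value": {
        "invoice.currencyCode.AUD": {"value": True, "confidence": 0.69},
        "invoice.currencyCode.NZD": {"value": False, "confidence": 0.29},
        "invoice.currencyCode.Unknown": {"value": False, "confidence": 0.02},
    },
    "confidence": 0.69,
    "boundingBox": None,
}

print(decode_multi_class(field))  # ('invoice.currencyCode.AUD', 0.69)
```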

For multi-label fields (such as document.type), multiple labels can have a value of true. Here is an example of output for the document.type multi-label field that demonstrates how multiple labels can simultaneously have a value of true:

Multi-label example
{
  "name": "document.type",
  "value": {
    "bank": { "value": false, "confidence": 1 },
    "bill": { "value": false, "confidence": 1 },
    "bpay": { "value": false, "confidence": 1 },
    "electricity": { "value": false, "confidence": 1 },
    "generic": { "value": false, "confidence": 1 },
    "invoice": { "value": true, "confidence": 0.96 },
    "issuer": { "value": true, "confidence": 0.94 },
    "ndis": { "value": false, "confidence": 0.93 },
    "paystub": { "value": false, "confidence": 1 },
    "recipient": { "value": true, "confidence": 0.81 },
    "statement": { "value": false, "confidence": 1 },
    "toll": { "value": false, "confidence": 0.99 },
    "vehicle": { "value": true, "confidence": 0.76 }
  },
  "confidence": 0.511464907008,
  "boundingBox": null
}

Note that invoice, issuer, recipient and vehicle all have values of true.
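
Decoding a multi-label field can be sketched in Python as collecting every label whose value is true. The function name decode_multi_label and the optional min_confidence threshold are assumptions made for illustration; they are not part of the Sypht output format or any Sypht client.

```python
from typing import List


def decode_multi_label(field: dict, min_confidence: float = 0.0) -> List[str]:
    """Return the labels whose value is true, in the order they appear.

    Hypothetical helper: `min_confidence` is an illustrative extra filter
    applied only to labels that are already true.
    """
    return [
        label
        for label, prediction in field["value"].items()
        if prediction["value"] and prediction["confidence"] >= min_confidence
    ]


# Abbreviated copy of the document.type output shown above.
field = {
    "name": "document.type",
    "value": {
        "bank": {"value": False, "confidence": 1},
        "invoice": {"value": True, "confidence": 0.96},
        "issuer": {"value": True, "confidence": 0.94},
        "ndis": {"value": False, "confidence": 0.93},
        "recipient": {"value": True, "confidence": 0.81},
        "vehicle": {"value": True, "confidence": 0.76},
    },
    "confidence": 0.511464907008,
    "boundingBox": None,
}

print(decode_multi_label(field))
# ['invoice', 'issuer', 'recipient', 'vehicle']
print(decode_multi_label(field, min_confidence=0.8))
# ['invoice', 'issuer', 'recipient']
```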
