When a single PDF file contains multiple underlying documents, smart-split automatically detects and segments the sub-documents, even when they vary in length and format.
When a source file is uploaded, it is processed and a corresponding fileId is assigned. The /results/ for the original file will then contain one or more child document fileIds, which can then be queried to obtain the corresponding sub-document results.
Document splitting can be used in conjunction with other standard workflows like prediction or validation.
To automatically split files on upload, a few changes to the standard fileupload form-data parameters are required:
Specify the document-splitting workflow type by setting: workflowId=split
Specify a childWorkflowId to define what workflow to run on each generated sub-document
Optionally specify childWorkflowOptions to parameterise the workflow run on each generated sub-document
Split workflows are a BETA feature and subject to change.
This guide is under construction.
workflowId = split
workflowOptions
Sample
{"prediction": {"childWorkflow": "prediction","childWorkflowOptions": {"prediction": {"fieldSets": ["sypht.invoice"]}}}}
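The sample above can be assembled programmatically. The following is a minimal sketch, not part of any Sypht client: the helper name build_split_form_fields is hypothetical, and it assumes that workflowOptions is sent as a JSON-encoded string in the form data, which this guide does not state. The nesting under the outer "prediction" key simply mirrors the sample. The upload request itself is left out because the endpoint and authentication are not covered here.

```python
import json


def build_split_form_fields(child_workflow, child_workflow_options=None):
    """Build form-data fields for a split upload (hypothetical helper)."""
    options = {"childWorkflow": child_workflow}
    if child_workflow_options is not None:
        options["childWorkflowOptions"] = child_workflow_options
    return {
        "workflowId": "split",
        # Assumption: the options travel as a JSON string; the outer
        # "prediction" key is copied from the sample in this guide.
        "workflowOptions": json.dumps({"prediction": options}),
    }


fields = build_split_form_fields(
    "prediction",
    {"prediction": {"fieldSets": ["sypht.invoice"]}},
)
print(fields["workflowOptions"])
```

The returned dictionary would be merged with the usual upload fields before the file is posted.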
{"fileId": "00000000-0000-0000-0000-000000000000","uploadedAt": "2020-08-20T03:19:07.319Z","status": "RECEIVED"}
GET https://api.sypht.com/result/final/00000000-0000-0000-0000-000000000000
{"fileId": "815c63f6-...-f07223d057cb","status": "FINALISED","results": {"fields": [{"name": "components.children","value": [{"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},{"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"}]}]}}
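The child fileIds can be pulled out of the parent result by looking for the components.children field. This is a minimal sketch that assumes the response has exactly the shape shown above; the function name extract_child_file_ids is hypothetical, and the parent fileId is a placeholder because the sample elides it.

```python
def extract_child_file_ids(parent_result):
    """Return the child file IDs listed under 'components.children', or []."""
    fields = parent_result.get("results", {}).get("fields", [])
    for field in fields:
        if field.get("name") == "components.children":
            return [child["file_id"] for child in field.get("value", [])]
    return []


# Parent response modelled on the sample above (placeholder parent fileId).
parent = {
    "fileId": "00000000-0000-0000-0000-000000000000",
    "status": "FINALISED",
    "results": {
        "fields": [
            {
                "name": "components.children",
                "value": [
                    {"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},
                    {"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"},
                ],
            }
        ]
    },
}

child_ids = extract_child_file_ids(parent)
print(child_ids)
```

Each returned ID can then be used in its own result request.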
GET https://api.sypht.com/result/final/aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa
Response
{"fileId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa","status": "FINALISED","results": {"timestamp": "2020-08-20T03:30:09.703Z","fields": [{"name": "invoice.total","value": "1485.00","confidence": 0.9958282699555642,...},...]}}
GET https://api.sypht.com/result/final/bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb
Response
{"fileId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb","status": "FINALISED","results": {"timestamp": "2020-08-20T03:30:09.703Z","fields": [{"name": "invoice.total","value": "2485.00","confidence": 0.99582,...},...]}}
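Reading one field from every sub-document result can be sketched as a loop over the child fileIds. This is an illustrative sketch only: collect_field and fetch_stub are hypothetical names, and the stub stands in for an authenticated GET to /result/final/{fileId}. The stubbed responses are trimmed copies of the samples above, with the elided fields omitted.

```python
def collect_field(child_ids, fetch_result, field_name):
    """Map each child fileId to the value of field_name, or None if absent."""
    values = {}
    for file_id in child_ids:
        result = fetch_result(file_id)
        fields = result.get("results", {}).get("fields", [])
        values[file_id] = next(
            (f.get("value") for f in fields if f.get("name") == field_name),
            None,
        )
    return values


# Trimmed copies of the sample responses, keyed by child fileId.
_SAMPLE_RESULTS = {
    "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa": {
        "status": "FINALISED",
        "results": {"fields": [{"name": "invoice.total", "value": "1485.00"}]},
    },
    "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb": {
        "status": "FINALISED",
        "results": {"fields": [{"name": "invoice.total", "value": "2485.00"}]},
    },
}


def fetch_stub(file_id):
    """Stand-in for the real HTTP request; returns a canned response."""
    return _SAMPLE_RESULTS[file_id]


totals = collect_field(list(_SAMPLE_RESULTS), fetch_stub, "invoice.total")
print(totals)
```

Passing the fetch function in as an argument keeps the field lookup independent of whichever HTTP client and credentials are used.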