Smart document split
Add automatic document splitting to workflows.
When a single PDF file contains multiple underlying documents, smart-split automatically detects and segments the sub-documents, even when they vary in length and format.
When a source file is uploaded, it is processed and a corresponding fileId is assigned. The /results/ for the original file will then contain one or more child document fileIds, which can then be queried to obtain the corresponding sub-document results.

Document splitting can be used in conjunction with other standard workflows like prediction or validation.

Sample combination of document splitting and validation.
To automatically split files on upload, a few changes to the standard fileupload form-data parameters are required:
- Specify the document-splitting workflow type by setting workflowId=split
- Specify a childWorkflowId to define what workflow to run on each generated sub-document
- Optionally specify childWorkflowOptions to parameterise the workflow run on each generated sub-document
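As a rough sketch of the parameters listed above, the form-data fields could be assembled as follows. The helper name build_split_form_fields is hypothetical, and sending childWorkflowOptions as a JSON-encoded string is an assumption rather than documented behaviour. The workflowOptions sample in this guide nests these values differently, so treat the exact shape as unconfirmed.

```python
import json


def build_split_form_fields(child_workflow_id, child_workflow_options=None):
    """Assemble form-data fields for a split upload (hypothetical helper)."""
    fields = {
        "workflowId": "split",  # selects the document-splitting workflow
        "childWorkflowId": child_workflow_id,  # workflow run on each sub-document
    }
    if child_workflow_options is not None:
        # Assumption: options travel as a JSON-encoded string in the form.
        fields["childWorkflowOptions"] = json.dumps(child_workflow_options)
    return fields


fields = build_split_form_fields(
    "prediction",
    {"prediction": {"fieldSets": ["sypht.invoice"]}},
)
```

The resulting dictionary would be passed alongside the file in whichever HTTP client performs the upload; the upload call itself is left out because its details are not covered here.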
Split workflows are a BETA feature and subject to change.
This guide is under construction.
workflowId = split
workflowOptions sample:
{
  "prediction": {
    "childWorkflow": "prediction",
    "childWorkflowOptions": {
      "prediction": {
        "fieldSets": ["sypht.invoice"]
      }
    }
  }
}
{
  "fileId": "00000000-0000-0000-0000-000000000000",
  "uploadedAt": "2020-08-20T03:19:07.319Z",
  "status": "RECEIVED"
}
{
  "fileId": "815c63f6-...-f07223d057cb",
  "status": "FINALISED",
  "results": {
    "fields": [
      {
        "name": "components.children",
        "value": [
          {"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},
          {"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"}
        ]
      }
    ]
  }
}
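The parent result above can be walked to reach each sub-document. The following is a minimal sketch, assuming the payload shape shown in this guide. The names child_file_ids and collect_child_results are hypothetical, and fetch_result stands in for whatever call retrieves /results/ for a given fileId; it is injected rather than implemented because the HTTP details are outside this guide.

```python
def child_file_ids(parent_result):
    """Return the child fileIds listed under the components.children field."""
    fields = parent_result.get("results", {}).get("fields", [])
    for field in fields:
        if field.get("name") == "components.children":
            return [child["file_id"] for child in field.get("value", [])]
    return []  # no children reported (e.g. not yet finalised)


def collect_child_results(parent_result, fetch_result):
    """Map each child fileId to its result payload.

    fetch_result is a caller-supplied function (an assumption of this
    sketch) that takes a fileId and returns that file's result payload.
    """
    return {fid: fetch_result(fid) for fid in child_file_ids(parent_result)}


parent = {
    "fileId": "parent-id",  # placeholder value for the example
    "status": "FINALISED",
    "results": {
        "fields": [
            {
                "name": "components.children",
                "value": [
                    {"file_id": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"},
                    {"file_id": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb"},
                ],
            }
        ]
    },
}

# Stand-in fetcher so the example runs without network access.
def fake_fetch(file_id):
    return {"fileId": file_id, "status": "FINALISED"}


children = collect_child_results(parent, fake_fetch)
```

Passing the fetcher in as an argument keeps the traversal logic testable and independent of any particular HTTP client.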
Response
{
  "fileId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
  "status": "FINALISED",
  "results": {
    "timestamp": "2020-08-20T03:30:09.703Z",
    "fields": [
      {
        "name": "invoice.total",
        "value": "1485.00",
        "confidence": 0.9958282699555642,
        ...
      },
      ...
    ]
  }
}
Response
{
  "fileId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
  "status": "FINALISED",
  "results": {
    "timestamp": "2020-08-20T03:30:09.703Z",
    "fields": [
      {
        "name": "invoice.total",
        "value": "2485.00",
        "confidence": 0.99582,
        ...
      },
      ...
    ]
  }
}
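Once each sub-document result has been retrieved, individual fields can be read by name. The sketch below assumes the response shape shown in the two child responses above and trims them to the fields it needs; field_value is a hypothetical helper, not part of any client library.

```python
from decimal import Decimal


def field_value(result, name):
    """Return the value of the first field called `name`, or None if absent."""
    for field in result.get("results", {}).get("fields", []):
        if field.get("name") == name:
            return field.get("value")
    return None


# Trimmed copies of the two child responses shown above.
child_a = {
    "fileId": "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa",
    "status": "FINALISED",
    "results": {"fields": [{"name": "invoice.total", "value": "1485.00"}]},
}
child_b = {
    "fileId": "bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb",
    "status": "FINALISED",
    "results": {"fields": [{"name": "invoice.total", "value": "2485.00"}]},
}

# Values arrive as strings, so Decimal avoids float rounding on amounts.
totals = [Decimal(field_value(c, "invoice.total")) for c in (child_a, child_b)]
combined = sum(totals, Decimal("0"))
```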
Limitations, Errors and Recommendations
- Uploading a document to the split workflow does not enforce any page limit checks. You may upload a document of any size, but recent tests have shown we cannot process more than 50 pages at this time.
- Each split document is checked against your page limit. To avoid rejected sub-documents, please ask to have your page limit increased to your expected maximum.
- If a split document is rejected due to page or file size limits, the split workflow will eventually be marked as a failure. Some split documents may still upload successfully, however; this is not ideal and can be avoided by increasing your page limit as above.
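The limits above could be checked on the client side before uploading. This is only a sketch: split_upload_warnings and its parameters are hypothetical, the 50-page figure is taken from the note above and may change, and the caller is assumed to already know the page counts and the account's page limit.

```python
MAX_SPLIT_PAGES = 50  # observed processing ceiling noted above; subject to change


def split_upload_warnings(total_pages, account_page_limit, expected_max_child_pages):
    """Return human-readable warnings for a planned split upload."""
    warnings = []
    if total_pages > MAX_SPLIT_PAGES:
        warnings.append(
            f"Source file has {total_pages} pages; more than "
            f"{MAX_SPLIT_PAGES} may not be processed."
        )
    if expected_max_child_pages > account_page_limit:
        warnings.append(
            f"Sub-documents of up to {expected_max_child_pages} pages exceed "
            f"the page limit of {account_page_limit}; request an increase."
        )
    return warnings


issues = split_upload_warnings(
    total_pages=60, account_page_limit=5, expected_max_child_pages=10
)
```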