Entity matching
Match data to an external data source
How it works
Entity fields match information on a source document to user-provided reference data. This lets you establish a link between documents and records from an existing business database or directory.
A common use case for entity matching is to link an invoice issuer to a supplier database. Entity fields automatically learn a fuzzy match between information on the document (e.g. a supplier name, address or business number) and the reference data fields you've uploaded. The matched reference data is then returned as a standard prediction result, so you can build on these matches to power complex automation rules, derived field values and validation checks.
Getting started
To configure and use an entity match field, follow this basic process:
1. Upload entity data to Sypht via the entity storage API.
2. Configure and train an entity match field.
3. Extract the entity field from documents to establish a match.
Because entities are pushed from your database to Sypht, there is no need to open up network or API access to secure internal data stores. You have complete control over what data is pushed and when.
Keeping reference data in sync
When an entity field is extracted on a document, data is matched against all available entities in the Sypht data store at that time. As reference data changes over time, you can push smaller differential updates via the entity storage API to replicate the addition, removal or modification of entities.
We recommend establishing a regular automated ETL process to keep reference data in sync.
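A differential update boils down to comparing what you last pushed with what your source system holds now. The sketch below assumes you keep a snapshot of the entities last pushed to Sypht as a mapping from entity_id to attributes (a hypothetical structure, not something Sypht provides). New or changed entities would then be sent with the PUT entity endpoint (or in bulk), and entities that have disappeared with the DELETE endpoint. diff_entities and the sample supplier records are illustrative only.

```python
from typing import Dict, List, Tuple

# Hypothetical snapshot shape: entity_id -> attribute dict.
Entities = Dict[str, Dict[str, object]]


def diff_entities(previous: Entities, current: Entities) -> Tuple[Entities, List[str]]:
    """Return (entities to PUT, entity_ids to DELETE) to bring Sypht in line with `current`."""
    # Added or modified entities must be (re)pushed; unchanged ones are skipped.
    upserts = {
        entity_id: data
        for entity_id, data in current.items()
        if previous.get(entity_id) != data
    }
    # Entities missing from the current source data must be removed.
    deletes = sorted(set(previous) - set(current))
    return upserts, deletes


last_pushed = {
    "S-001": {"name": "Acme Pty Ltd", "abn": None},
    "S-002": {"name": "Globex", "abn": "ABN-002"},
}
source_now = {
    "S-002": {"name": "Globex Corp", "abn": "ABN-002"},  # modified
    "S-003": {"name": "Initech", "abn": None},  # added
}
upserts, deletes = diff_entities(last_pushed, source_now)
print(sorted(upserts))  # ['S-002', 'S-003']
print(deletes)  # ['S-001']
```

Comparing whole attribute dicts keeps the logic simple; a scheduled ETL job could store the new snapshot only after every PUT and DELETE has succeeded.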
Using the entities API
This section explains the available API endpoints for storing, retrieving and searching entity data. Entities are stored in isolated collections for a given company_id, and within that company there may be multiple distinct entity types (e.g. there may be distinct entity types for the supplier and for the receiving office, each having distinct attributes and match logic).
Check out the open-source Sypht Python Client on GitHub for a reference implementation that uses the entities API.
Pushing entities
PUT storage/{company_id}/entity/{entity_type}/{entity_id}
Path Parameters:
company_id: your Sypht Company ID
entity_type: type of entity, e.g. supplier, vehicle or employee
entity_id: a unique identifier for the entity
Request Body:
JSON-encoded entity data: an object whose keys and values represent attributes of the entity.
Complex JSON data structures (e.g. nested objects or lists) may be stored, but are currently not supported as reference fields for search and match.
Empty values should be represented as null.
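As an illustration, the request for this endpoint can be assembled with Python's standard library. This is a sketch under assumptions: the host https://api.sypht.com and bearer-token authentication are assumptions not stated on this page, and build_put_entity_request is a hypothetical helper. The request is only built here, not sent.

```python
import json
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.sypht.com"  # assumption: Sypht API host


def build_put_entity_request(
    company_id: str, entity_type: str, entity_id: str, data: dict, token: str
) -> urllib.request.Request:
    """Build (but do not send) PUT storage/{company_id}/entity/{entity_type}/{entity_id}."""
    # Percent-encode each path segment so IDs with spaces or slashes stay one segment.
    segments = [quote(part, safe="") for part in (company_id, entity_type, entity_id)]
    url = "{}/storage/{}/entity/{}/{}".format(BASE_URL, *segments)
    # json.dumps writes Python None as JSON null, the recommended form for empty values.
    body = json.dumps(data).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + token,  # assumption: bearer-token auth
        },
    )


req = build_put_entity_request(
    "my-company", "supplier", "S-001",
    {"name": "Acme Pty Ltd", "address": "1 Example St", "abn": None},
    token="YOUR_ACCESS_TOKEN",
)
print(req.get_method(), req.full_url)
# PUT https://api.sypht.com/storage/my-company/entity/supplier/S-001
print(req.data.decode())
# {"name": "Acme Pty Ltd", "address": "1 Example St", "abn": null}
```

Passing the request to urllib.request.urlopen would send it once real credentials are in place.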
POST storage/{company_id}/bulkentity/{entity_type}/
Path Parameters:
company_id: your Sypht Company ID
entity_type: type of entity, e.g. supplier, vehicle or employee
Request Body:
JSON-encoded list of objects, each with an entity_id and the data to be stored.
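The bulk body is easiest to see as data. This sketch reads the body description as one object per entity with an entity_id key and a data key holding its attributes; that reading of the key names, the https://api.sypht.com host and the helper name build_bulk_upload are assumptions.

```python
import json
from typing import Dict, List, Tuple
from urllib.parse import quote

BASE_URL = "https://api.sypht.com"  # assumption: Sypht API host


def build_bulk_upload(
    company_id: str, entity_type: str, entities: Dict[str, dict]
) -> Tuple[str, bytes]:
    """Return (URL, JSON body) for POST storage/{company_id}/bulkentity/{entity_type}/."""
    url = "{}/storage/{}/bulkentity/{}/".format(
        BASE_URL, quote(company_id, safe=""), quote(entity_type, safe="")
    )
    # One object per entity: its identifier plus the attributes to store (assumed key "data").
    payload: List[dict] = [
        {"entity_id": entity_id, "data": data} for entity_id, data in entities.items()
    ]
    return url, json.dumps(payload).encode("utf-8")


url, body = build_bulk_upload(
    "my-company",
    "supplier",
    {
        "S-001": {"name": "Acme Pty Ltd", "abn": None},
        "S-002": {"name": "Globex Corp", "abn": "ABN-002"},
    },
)
print(url)  # https://api.sypht.com/storage/my-company/bulkentity/supplier/
```

A bulk call like this suits the initial load; smaller differential updates can use the single-entity endpoints.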
Removing entities
DELETE storage/{company_id}/entity/{entity_type}/{entity_id}
Path Parameters:
company_id: your Sypht Company ID
entity_type: type of entity, e.g. supplier, vehicle or employee
entity_id: a unique identifier for the entity
Retrieving entities
GET storage/{company_id}/entity/{entity_type}/{entity_id}
Path Parameters:
company_id: your Sypht Company ID
entity_type: type of entity, e.g. supplier, vehicle or employee
entity_id: a unique identifier for the entity
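The retrieve and delete endpoints share the same path, so one URL helper covers both. In this sketch the host https://api.sypht.com, bearer-token authentication and percent-encoding of IDs that contain characters such as "/" are assumptions; this page does not say how the service treats such IDs. entity_url and build_get_entity_request are hypothetical helpers.

```python
import urllib.request
from urllib.parse import quote

BASE_URL = "https://api.sypht.com"  # assumption: Sypht API host


def entity_url(company_id: str, entity_type: str, entity_id: str) -> str:
    """URL for storage/{company_id}/entity/{entity_type}/{entity_id} (GET, PUT or DELETE)."""
    # Assumption: percent-encode every segment so an ID like "ACME/NSW 01" stays one segment.
    company, etype, eid = (quote(part, safe="") for part in (company_id, entity_type, entity_id))
    return f"{BASE_URL}/storage/{company}/entity/{etype}/{eid}"


def build_get_entity_request(
    company_id: str, entity_type: str, entity_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a request that retrieves one stored entity."""
    return urllib.request.Request(
        entity_url(company_id, entity_type, entity_id),
        method="GET",
        headers={"Authorization": f"Bearer {token}"},  # assumption: bearer-token auth
    )


print(entity_url("my-company", "supplier", "ACME/NSW 01"))
# https://api.sypht.com/storage/my-company/entity/supplier/ACME%2FNSW%2001
```

Swapping method="GET" for method="DELETE" gives the matching removal request.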
Response Body:
Searching entities
POST storage/{company_id}/entitysearch/{entity_type}/
Path Parameters:
company_id: your Sypht Company ID
entity_type: type of entity, e.g. supplier, vehicle or employee
Request Body:
A JSON-encoded object containing exact and fuzzy match search constraints. Each of these should be an object whose keys name the attribute to search against and whose values are the query strings to search for.
For example, to search for an exact match on "rego_no" == "qwer", put "rego_no": "qwer" in the exact object.
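A small helper makes the body shape concrete. It assumes that a constraint group you don't need (exact or fuzzy) can simply be left out of the body; this page doesn't say whether an empty group is required, so treat that as an assumption. build_search_body is a hypothetical name.

```python
import json
from typing import Dict, Optional


def build_search_body(
    exact: Optional[Dict[str, str]] = None, fuzzy: Optional[Dict[str, str]] = None
) -> bytes:
    """JSON body for POST storage/{company_id}/entitysearch/{entity_type}/."""
    # Each group maps an entity attribute to the query string for that attribute.
    # Assumption: groups that are not needed are omitted rather than sent empty.
    body = {name: group for name, group in (("exact", exact), ("fuzzy", fuzzy)) if group}
    return json.dumps(body).encode("utf-8")


# Exact match where rego_no == "qwer"
print(build_search_body(exact={"rego_no": "qwer"}).decode())
# {"exact": {"rego_no": "qwer"}}
```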
Response Body:
Searching entities by id
POST storage/{company_id}/entitysearch/{entity_type}/by_id
Path Parameters:
company_id: your Sypht Company ID
entity_type: type of entity, e.g. supplier, vehicle or employee
Request Body:
JSON-encoded list of objects, each with an entity_id.
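For instance, the by-ID body for two suppliers could be built as follows; build_search_by_id_body and the sample IDs are hypothetical.

```python
import json
from typing import Iterable


def build_search_by_id_body(entity_ids: Iterable[str]) -> bytes:
    """JSON body for POST storage/{company_id}/entitysearch/{entity_type}/by_id."""
    # One object per requested entity, each carrying only its entity_id.
    return json.dumps([{"entity_id": entity_id} for entity_id in entity_ids]).encode("utf-8")


print(build_search_by_id_body(["S-001", "S-002"]).decode())
# [{"entity_id": "S-001"}, {"entity_id": "S-002"}]
```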
Response Body:
Retrieving a list of entity_ids
GET storage/{company_id}/entitysearch/{entity_type}
Path Parameters:
company_id: your Sypht Company ID
entity_type: type of entity, e.g. supplier, vehicle or employee
Query Parameters:
page: page token. If not provided, the first page is returned; otherwise the specified page is returned. The token is taken from the next_page value of the previous response.
limit: maximum number of entity_ids to return
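Walking every page means following next_page tokens until none is returned. Because the full response layout isn't shown here, this sketch takes a caller-supplied fetch_page function that returns (entity_ids on the page, next_page token). It assumes the last page is signalled by an empty or missing next_page. list_query and fetch_all_entity_ids are hypothetical helpers, and the usage replaces real HTTP calls with an in-memory stand-in.

```python
from typing import Callable, Dict, List, Optional, Tuple
from urllib.parse import urlencode

# fetch_page(page_token, limit) -> (entity_ids on that page, next_page token or None)
PageFetcher = Callable[[Optional[str], int], Tuple[List[str], Optional[str]]]


def list_query(page: Optional[str], limit: int) -> str:
    """Query string for GET storage/{company_id}/entitysearch/{entity_type}."""
    params: Dict[str, object] = {"limit": limit}
    if page is not None:  # omit page to request the first page
        params["page"] = page
    return urlencode(params)


def fetch_all_entity_ids(fetch_page: PageFetcher, limit: int = 100) -> List[str]:
    """Follow next_page tokens until a page comes back without one (assumed end marker)."""
    entity_ids: List[str] = []
    page: Optional[str] = None
    while True:
        ids, page = fetch_page(page, limit)
        entity_ids.extend(ids)
        if not page:
            return entity_ids


# In-memory stand-in for two pages of results.
pages = {None: (["id_0", "id_1"], "t1"), "t1": (["id_2"], None)}
print(fetch_all_entity_ids(lambda page, limit: pages[page], limit=2))
# ['id_0', 'id_1', 'id_2']
print(list_query("t1", 2))  # limit=2&page=t1
```

A real fetch_page would issue the GET with list_query(page, limit) appended to the URL and pull the IDs and next_page out of the JSON response.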
Response Body:
Using the Sypht Client
pip install sypht
Retrieving all entity_ids
This client method wraps the pagination endpoint, looping over it to retrieve all entity_ids for the specified entity_type.
If verbose (the default), it returns a list of objects:
[{"entity_id": "id_0"}, {"entity_id": "id_1"}, ...]
If not verbose, it returns a list of entity_ids:
["id_0", "id_1", ...]