Customer Ontology
Define your own entity types and synonyms, and seed them in bulk
bem builds an entity memory automatically as it parses documents — the
nodes and edges you read through the
Knowledge Graph API and the
File System API's find / open / xref ops.
The customer ontology endpoints let you shape that memory ahead of
time: define your own entity types, attach synonyms to entities,
and seed a whole catalog in one request so the matcher recognizes
your vocabulary from the first document it sees.
POST /v3/entity-types
POST /v3/entities/bulk
POST /v3/entities/{id}/synonyms
x-api-key: <your API key>
Concepts
Entity types
An entity type is a category in your taxonomy — organization,
person, product, manufacturer, location. bem ships with a small
default set, but you can define your own and arrange them into a
hierarchy.
- Custom taxonomy. Create the types your domain actually uses. A catalog might define equipment, manufacturer, and accessory rather than leaning on the generic product.
- Parent types. A type can declare a parentTypeId, so accessory can sit under equipment. The matcher uses the hierarchy to resolve and roll up entities; a query for the parent type also covers its children.
- Attribute schema. A type can carry an attributeSchema: a JSON Schema describing the structured fields that entities of that type may hold (e.g. weightKg, voltage, sku). Seeded entities have their attributes validated against it.
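To make the roll-up rule concrete, here is a minimal sketch of expanding a type to itself plus every descendant, which is what "a query for the parent type also covers its children" implies. It assumes type records shaped like { "id", "name", "parentTypeId" }; the in-memory list, the IDs other than etype_equipment, and the function name expand_type are hypothetical, and this is not bem's matcher.

```python
from collections import defaultdict


def expand_type(types: list, root_id: str) -> set:
    """Return root_id plus the IDs of all of its descendant types.

    Assumed record shape: {"id": ..., "name": ..., "parentTypeId": ...},
    with parentTypeId missing or None for top-level types.
    """
    children = defaultdict(list)
    for t in types:
        parent = t.get("parentTypeId")
        if parent is not None:
            children[parent].append(t["id"])

    # Iterative depth-first walk; the seen set also guards against accidental cycles.
    seen = set()
    stack = [root_id]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(children[current])
    return seen


# Hypothetical taxonomy: accessory sits under equipment, manufacturer is top-level.
types = [
    {"id": "etype_equipment", "name": "equipment"},
    {"id": "etype_accessory", "name": "accessory", "parentTypeId": "etype_equipment"},
    {"id": "etype_manufacturer", "name": "manufacturer"},
]
print(sorted(expand_type(types, "etype_equipment")))
```

A query scoped to equipment would cover both equipment and accessory entities, while one scoped to accessory covers only accessory.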
Synonyms
A synonym is an alternate surface form for an entity — "Acme Corp",
"Acme Corporation", and "ACME" all pointing at one canonical entity.
Synonyms are first-class records, not free text, and each one carries a
source that records its provenance:
| source | Where it comes from |
|---|---|
| extracted | bem inferred the surface form while parsing a document |
| customer_defined | You added it via the API or a CSV seed |
| sme_approved | A subject-matter expert reviewed and confirmed it |
When the matcher reads a new document, every synonym — whatever its
source — is a candidate surface form for resolving a mention back to the
canonical entity. Seeding customer_defined synonyms up front is how you
teach the matcher the spellings, abbreviations, and trade names your
documents actually use before it has seen them in context.
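As an illustration only, a naive lookup from surface form to canonical entity shows why every synonym, whatever its source, widens what resolves. bem's real matcher is not described on this page and surely does more; the case-folding and whitespace normalization, the synonym record shape { "text", "source" }, and the names build_index and resolve are all assumptions of this sketch.

```python
from typing import Optional


def normalize(text: str) -> str:
    # Assumption: case-fold and collapse whitespace. bem's actual normalization is not documented here.
    return " ".join(text.casefold().split())


def build_index(entities: list) -> dict:
    """Map each surface form (canonical name plus every synonym) to its canonical name."""
    index = {}
    for entity in entities:
        forms = [entity["canonical"]] + [s["text"] for s in entity.get("synonyms", [])]
        for form in forms:
            # The synonym's source (extracted, customer_defined, sme_approved) is ignored:
            # every one is a candidate surface form. If two entities share a form,
            # the later entity wins in this sketch.
            index[normalize(form)] = entity["canonical"]
    return index


def resolve(index: dict, mention: str) -> Optional[str]:
    """Return the canonical name for a mention, or None if no surface form matches."""
    return index.get(normalize(mention))
```

A real matcher has to decide what happens when one surface form belongs to two entities; this sketch simply lets the last one win, which is exactly the ambiguity that curated customer_defined synonyms help you avoid.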
These three surfaces work together: entity types shape the taxonomy, synonyms widen what resolves to each entity, and the Knowledge Graph and File System APIs read the result back out.
Seeding via the API
POST /v3/entities/bulk creates or merges many entities in one call. A
seeded type resolves to an existing entity type or creates one; each
string in synonyms is inserted as a customer_defined synonym.
| Field | Notes |
|---|---|
| bucket | bkt_…; if absent, the default bucket for the account and environment |
| entities[] | Each entity: canonical and type (required); description, synonyms (string[]), and attributes (object) are optional |
| onConflict | "merge" (see onConflict=merge semantics below) |
curl -X POST "https://api.bem.ai/v3/entities/bulk" \
-H "x-api-key: $BEM_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"bucket": "bkt_catalog",
"onConflict": "merge",
"entities": [
{
"canonical": "Acme Corporation",
"type": "manufacturer",
"description": "Industrial equipment maker",
"synonyms": ["Acme Corp", "ACME"]
},
{
"canonical": "Acme Forklift X200",
"type": "equipment",
"synonyms": ["X200", "Acme X200"],
"attributes": { "manufacturer": "Acme Corporation", "weightKg": 3200 }
}
]
}'
Sync vs async
The request size decides the response shape:
- Fewer than 100 entities → 200, processed inline. You get the per-row outcomes back in the same response:
  {
    "results": [
      { "canonical": "Acme Corporation", "outcome": "created", "entityID": "ent_acme" },
      { "canonical": "Acme Forklift X200", "outcome": "merged-with", "entityID": "ent_x200" },
      { "canonical": "", "outcome": "rejected", "reason": "canonical is required" }
    ],
    "summary": { "created": 1, "merged": 1, "rejected": 1 }
  }
- 100 or more entities → 202, processed as a background job. You get a seedJobID and a statusURL to poll:
  { "seedJobID": "seed_7f3a9c", "status": "pending", "statusURL": "/v3/entities/seed/seed_7f3a9c" }
Polling a seed job
Poll the statusURL (GET /v3/entities/seed/{id}) until status is
terminal. While running, only the counts are populated; the full
results array appears once the job finishes.
curl "https://api.bem.ai/v3/entities/seed/seed_7f3a9c" \
-H "x-api-key: $BEM_API_KEY"
{
"status": "completed",
"totalRows": 240,
"createdCount": 198,
"mergedCount": 40,
"rejectedCount": 2,
"results": [
{ "canonical": "Acme Corporation", "outcome": "created", "entityID": "ent_acme" }
],
"error": null
}
A non-null error means the job itself failed; per-row problems show up
as rejected rows with a reason, not as a job-level error.
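Polling fits in a small loop. In this sketch, fetch is any function that GETs a path and returns the parsed JSON body (for example one built on urllib.request with the x-api-key header). This page shows only "pending" and "completed", so the terminal set below, including the guessed "failed", is an assumption; the job-level error check and the rejected-row filter follow the rules above. wait_for_seed_job and rejected_rows are hypothetical names.

```python
import time
from typing import Callable

# Assumption: "completed" appears on this page; "failed" is a guessed name for a
# job-level failure state. Replace with the real terminal values once you know them.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_seed_job(
    fetch: Callable[[str], dict],
    status_url: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll status_url until the job reports a terminal status, then return the final body."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch(status_url)
        if job.get("error") is not None:
            # A non-null error means the job itself failed, not individual rows.
            raise RuntimeError(f"seed job failed: {job['error']}")
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"seed job still {job.get('status')!r} after {timeout_s}s")
        sleep(interval_s)


def rejected_rows(job: dict) -> list:
    """Per-row problems live in results as rejected rows, never in the job-level error."""
    return [r for r in job.get("results", []) if r.get("outcome") == "rejected"]
```

Only read results after the loop returns: while the job is running, the counts are populated but the full results array is not.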
onConflict=merge semantics
A seeded entity conflicts with an existing one when both the canonical
name and the type match. With onConflict: "merge", the existing
entity is updated in place rather than duplicated:
- Synonyms merge additively: seeded synonyms are added as customer_defined, and existing synonyms (including extracted ones) are kept.
- Description is updated to the seeded value when one is provided.
- Attributes merge: seeded keys overwrite existing values, and keys the seed doesn't include remain.
The row's outcome comes back as merged-with (with the surviving
entityID) so you can tell merges from fresh created rows.
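The three merge rules can be modeled locally, which helps when predicting what a re-seed will do to an entity. This is a simulation, not a client call. The record shape (synonyms as { "text", "source" } objects) is assumed, and this page does not say how a bulk merge treats a seeded synonym whose text already exists, so the sketch keeps the existing record unchanged; treating a blank description as "not provided" is also an assumption.

```python
import copy


def merge_seed(existing: dict, seed: dict) -> dict:
    """Apply the onConflict=merge rules to an existing entity and return the merged copy.

    Assumes existing and seed already share canonical and type (that is the conflict).
    """
    merged = copy.deepcopy(existing)  # leave the caller's record untouched

    # Synonyms merge additively: every existing synonym stays; new texts are added
    # as customer_defined. Assumption: a text that already exists is not duplicated.
    known = {s["text"] for s in merged.get("synonyms", [])}
    for text in seed.get("synonyms", []):
        if text not in known:
            merged.setdefault("synonyms", []).append({"text": text, "source": "customer_defined"})
            known.add(text)

    # Description is replaced only when the seed provides a non-empty one (assumption).
    if seed.get("description"):
        merged["description"] = seed["description"]

    # Attributes merge: seeded keys overwrite, untouched keys remain.
    merged["attributes"] = {**merged.get("attributes", {}), **seed.get("attributes", {})}
    return merged
```

Note that nothing is ever removed by a merge: an extracted synonym or an attribute missing from the seed survives, so re-seeding the same catalog is safe to repeat.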
Seeding via CSV upload
For non-engineers, the dashboard route /memory/seed wraps the same
bulk endpoint in a drag-and-drop CSV flow.
Columns
- canonical (required): the entity's canonical name.
- type (required): the entity type; resolves to an existing type or creates it.
- description (optional): free-text description.
- synonyms (optional): a semicolon-separated list of surface forms, each inserted as customer_defined.
- Any other column becomes a per-entity attribute, keyed by the column header.
canonical,type,description,synonyms,manufacturer,weightKg
Acme Corporation,manufacturer,Industrial equipment maker,Acme Corp;ACME,,
Acme Forklift X200,equipment,,X200;Acme X200,Acme Corporation,3200
The flow
- Preview. After you drop the file, the dashboard parses it and shows a preview of the rows it will submit so you can confirm the column mapping before anything is written.
- Submit and watch progress. On submit the rows run through the bulk endpoint with live progress, using the same sync/async split as the API (large files become a background seed job).
- Rejected-rows CSV. When the run finishes, any rejected rows are offered as a downloadable CSV (the original columns plus the reason), so you can fix and re-upload just those.
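If you would rather script the seed than use the dashboard, the column rules translate directly into the entities[] array of a bulk body. This sketch uses the standard csv module. Skipping empty cells, and converting only the columns you list in numeric_columns (so a SKU such as "00123" keeps its leading zeros), are assumptions; the page does not say how the dashboard types CSV values. csv_to_entities is a hypothetical name.

```python
import csv
import io

CORE_COLUMNS = {"canonical", "type", "description", "synonyms"}


def csv_to_entities(text: str, numeric_columns=frozenset()) -> list:
    """Turn seed-CSV text into the entities[] array of a POST /v3/entities/bulk body."""
    entities = []
    for row in csv.DictReader(io.StringIO(text)):
        # An empty canonical is kept as-is so the API can reject the row with a reason.
        entity = {
            "canonical": (row.get("canonical") or "").strip(),
            "type": (row.get("type") or "").strip(),
        }
        if row.get("description"):
            entity["description"] = row["description"]
        if row.get("synonyms"):
            # Semicolon-separated surface forms; blanks from stray ";" are dropped.
            entity["synonyms"] = [s.strip() for s in row["synonyms"].split(";") if s.strip()]
        attributes = {}
        for column, value in row.items():
            # Every non-core column is an attribute keyed by its header.
            # Assumptions: empty cells are skipped; only numeric_columns become numbers.
            if column is None or column in CORE_COLUMNS or not value:
                continue
            attributes[column] = float(value) if column in numeric_columns else value
        if attributes:
            entity["attributes"] = attributes
        entities.append(entity)
    return entities
```

Run against the sample CSV above with numeric_columns={"weightKg"}, this yields the same two entities as the JSON bulk example earlier on the page (with weightKg as 3200.0), ready to wrap in { "bucket": ..., "onConflict": "merge", "entities": [...] }.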
Day-2 management
The seed is a starting point; you keep the ontology current with the synonym and entity-type endpoints.
Adding and removing synonyms
# add a customer-defined synonym (upgrades an existing extracted one in place)
curl -X POST "https://api.bem.ai/v3/entities/ent_acme/synonyms" \
-H "x-api-key: $BEM_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "text": "Acme Inc.", "locale": "en-US" }'
# delete a synonym
curl -X DELETE "https://api.bem.ai/v3/entities/ent_acme/synonyms/syn_123" \
-H "x-api-key: $BEM_API_KEY"
POST adds the synonym with source: customer_defined. If the same text
already exists as an extracted synonym, it is upgraded in place to
customer_defined rather than duplicated. locale is optional.
DELETE only removes customer_defined or sme_approved synonyms.
Trying to delete an extracted synonym returns 409 — bem learned
it from a document, so you can't hand-delete it; it goes away when the
mentions do.
Synonym changes honor alias resolution: if {id} points at an entity
that was later merged away, the request resolves to the surviving entity
and operates on its synonym set.
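Day-2 synonym changes are single requests, so a client mostly needs to build them correctly and treat 409 as an expected answer rather than a crash. The sketch below builds both calls with urllib.request and wraps DELETE so that a 409 (the synonym is extracted) returns False. The paths, body fields, and the 409 rule come from this page; the function names are hypothetical.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

API_BASE = "https://api.bem.ai"


def add_synonym_request(api_key: str, entity_id: str, text: str,
                        locale: Optional[str] = None) -> urllib.request.Request:
    """Build POST /v3/entities/{id}/synonyms; the synonym is stored as customer_defined."""
    body = {"text": text}
    if locale is not None:
        body["locale"] = locale  # locale is optional
    return urllib.request.Request(
        f"{API_BASE}/v3/entities/{entity_id}/synonyms",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
    )


def delete_synonym_request(api_key: str, entity_id: str, synonym_id: str) -> urllib.request.Request:
    """Build DELETE /v3/entities/{id}/synonyms/{synonymId}."""
    return urllib.request.Request(
        f"{API_BASE}/v3/entities/{entity_id}/synonyms/{synonym_id}",
        method="DELETE",
        headers={"x-api-key": api_key},
    )


def delete_synonym(api_key: str, entity_id: str, synonym_id: str) -> bool:
    """Return True if deleted, False if bem refused with 409 (an extracted synonym)."""
    try:
        with urllib.request.urlopen(delete_synonym_request(api_key, entity_id, synonym_id)):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 409:
            return False
        raise
```

Because of alias resolution, entity_id can be an ID that was later merged away; the request lands on the surviving entity either way.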
Managing entity types
GET / POST / PATCH / DELETE /v3/entity-types manage the taxonomy.
The body is { name, description?, parentTypeId?, attributeSchema? }.
Two rules to know:
- name is immutable. PATCH can change description, parentTypeId, or attributeSchema, but not name: entities reference the type by name, so renaming is not allowed.
- DELETE is blocked when the type is in use. If any entity uses the type, or the type has child types, DELETE returns 409. Reassign or remove the dependents first.
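Because DELETE returns 409 while a type still has child types, removing a whole branch of the taxonomy has to go leaf-first. A minimal sketch of computing that order with a post-order walk, assuming type records of the form { "id", "parentTypeId" }; the records and IDs other than etype_equipment are hypothetical, and entities that still use a type must be reassigned or removed separately, since this ordering only handles the child-type rule.

```python
from collections import defaultdict


def leaf_first_delete_order(types: list, root_id: str) -> list:
    """Return root_id and its descendants ordered so every child precedes its parent."""
    children = defaultdict(list)
    for t in types:
        if t.get("parentTypeId") is not None:
            children[t["parentTypeId"]].append(t["id"])

    order = []
    visited = set()

    def visit(type_id: str) -> None:
        if type_id in visited:
            return
        visited.add(type_id)
        for child in children[type_id]:
            visit(child)
        order.append(type_id)  # post-order: a type is appended only after all its children

    visit(root_id)
    return order


# Hypothetical branch: equipment > accessory > attachment, and equipment > battery.
types = [
    {"id": "etype_equipment"},
    {"id": "etype_accessory", "parentTypeId": "etype_equipment"},
    {"id": "etype_attachment", "parentTypeId": "etype_accessory"},
    {"id": "etype_battery", "parentTypeId": "etype_equipment"},
]
print(leaf_first_delete_order(types, "etype_equipment"))
```

Issuing DELETE calls in this order means no call ever targets a type that still has children.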
Worked example: seeding a small catalog
A realistic seed for an equipment catalog is wide but shallow: roughly 20 types × 25 synonyms each. You don't hand-write 500 lines of JSON; you send the shape below and let the row count grow.
First, the types (define the taxonomy and its hierarchy once):
curl -X POST "https://api.bem.ai/v3/entity-types" \
-H "x-api-key: $BEM_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "equipment", "description": "Physical machines in the catalog",
"attributeSchema": { "type": "object",
"properties": { "manufacturer": { "type": "string" },
"weightKg": { "type": "number" } } } }'
# a child type under "equipment"
curl -X POST "https://api.bem.ai/v3/entity-types" \
-H "x-api-key: $BEM_API_KEY" \
-H "Content-Type: application/json" \
-d '{ "name": "accessory", "parentTypeId": "etype_equipment" }'
Then seed the entities for those types in one bulk call (truncated: the real body repeats this row shape across all ~20 types):
{
"bucket": "bkt_catalog",
"onConflict": "merge",
"entities": [
{
"canonical": "Acme Forklift X200",
"type": "equipment",
"synonyms": ["X200", "Acme X200", "Forklift X200", "AX-200"],
"attributes": { "manufacturer": "Acme Corporation", "weightKg": 3200 }
},
{
"canonical": "Globex Pallet Jack PJ-5",
"type": "equipment",
"synonyms": ["PJ-5", "Globex PJ5", "Pallet Jack 5"],
"attributes": { "manufacturer": "Globex Holdings", "weightKg": 95 }
}
]
}
With ~25 synonyms behind each entity, the matcher resolves the abbreviations, SKUs, and trade names in your documents back to the right canonical entity from the first parse, with no warm-up period.
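Rather than maintaining the JSON body by hand, you can keep the catalog in a compact structure and generate the body, which also tells you in advance whether the request will come back as 200 inline or 202 with a seed job. In this sketch the CATALOG layout and the function names are hypothetical; the rows reuse the worked example, and the 100-entity threshold is the one stated earlier on this page.

```python
import json

# Hypothetical compact catalog: canonical name -> (type, synonyms, attributes).
CATALOG = {
    "Acme Forklift X200": ("equipment", ["X200", "Acme X200", "Forklift X200", "AX-200"],
                           {"manufacturer": "Acme Corporation", "weightKg": 3200}),
    "Globex Pallet Jack PJ-5": ("equipment", ["PJ-5", "Globex PJ5", "Pallet Jack 5"],
                                {"manufacturer": "Globex Holdings", "weightKg": 95}),
}

SYNC_LIMIT = 100  # fewer than 100 entities -> 200 inline; 100 or more -> 202 seed job


def build_bulk_body(catalog: dict, bucket: str) -> dict:
    """Expand the compact catalog into a POST /v3/entities/bulk body."""
    entities = [
        {"canonical": name, "type": etype, "synonyms": synonyms, "attributes": attributes}
        for name, (etype, synonyms, attributes) in catalog.items()
    ]
    return {"bucket": bucket, "onConflict": "merge", "entities": entities}


def expected_mode(body: dict) -> str:
    """Predict the response shape from the entity count."""
    return "sync" if len(body["entities"]) < SYNC_LIMIT else "async"


body = build_bulk_body(CATALOG, "bkt_catalog")
print(expected_mode(body))  # -> sync
print(json.dumps(body, indent=2))
```

As the catalog grows past 99 entities, the same body simply switches to the background-job path, so plan to poll the returned statusURL.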