Customer Ontology

bem builds an entity memory automatically as it parses documents — the nodes and edges you read through the Knowledge Graph API and the File System API's find / open / xref ops. The customer ontology endpoints let you shape that memory ahead of time: define your own entity types, attach synonyms to entities, and seed a whole catalog in one request so the matcher recognizes your vocabulary from the first document it sees.

POST   /v3/entity-types
POST   /v3/entities/bulk
POST   /v3/entities/{id}/synonyms
x-api-key: <your API key>

Concepts

Entity types

An entity type is a category in your taxonomy — organization, person, product, manufacturer, location. bem ships with a small default set, but you can define your own and arrange them into a hierarchy.

Custom taxonomy. Create the types your domain actually uses. A catalog might define equipment, manufacturer, and accessory rather than leaning on the generic product.
Parent types. A type can declare a parentTypeId, so accessory can sit under equipment. The matcher uses the hierarchy to resolve and roll up entities; a query for the parent type also covers its children.
Attribute schema. A type can carry an attributeSchema — a JSON Schema describing the structured fields entities of that type may hold (e.g. weightKg, voltage, sku). Seeded entities validate their attributes against it.
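
To make the roll-up concrete, here is a minimal sketch of how a parent type can be expanded to cover its children. It is an illustration only, not bem's matcher: the TAXONOMY mapping and the types_covered_by helper are hypothetical, and the sketch keys types by name for readability even though the API links them through parentTypeId.

```python
from collections import defaultdict
from typing import Optional

# Hypothetical taxonomy for illustration: type name -> parent type name.
# The API links types by parentTypeId; names are used here for readability.
TAXONOMY: dict[str, Optional[str]] = {
    "equipment": None,
    "accessory": "equipment",
    "manufacturer": None,
}


def types_covered_by(taxonomy: dict[str, Optional[str]], root: str) -> set[str]:
    """Return root plus every type nested under it, at any depth."""
    children = defaultdict(list)
    for name, parent in taxonomy.items():
        if parent is not None:
            children[parent].append(name)

    covered: set[str] = set()
    stack = [root]
    while stack:
        current = stack.pop()
        if current in covered:
            continue  # already visited; also guards against cycles
        covered.add(current)
        stack.extend(children[current])
    return covered
```

Calling types_covered_by(TAXONOMY, "equipment") yields both equipment and accessory, which is the behavior the parent-type rule describes for a query on the parent.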

Synonyms

A synonym is an alternate surface form for an entity — "Acme Corp", "Acme Corporation", and "ACME" all pointing at one canonical entity. Synonyms are first-class records, not free text, and each one carries a source that records its provenance:

`source`	Where it comes from
`extracted`	bem inferred the surface form while parsing a document
`customer_defined`	You added it via the API or a CSV seed
`sme_approved`	A subject-matter expert reviewed and confirmed it

When the matcher reads a new document, every synonym — whatever its source — is a candidate surface form for resolving a mention back to the canonical entity. Seeding customer_defined synonyms up front is how you teach the matcher the spellings, abbreviations, and trade names your documents actually use before it has seen them in context.
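
As a rough mental model, you can think of the synonym set as a lookup table from surface forms to canonical entities. The sketch below assumes exact matching after case folding, which is a simplification chosen for illustration; the real matcher's normalization and scoring are not described here, and build_surface_index and resolve are hypothetical names.

```python
from typing import Optional


def build_surface_index(entities: list[dict]) -> dict[str, str]:
    """Map every known surface form (canonical name and synonyms) to an entity id."""
    index: dict[str, str] = {}
    for entity in entities:
        forms = [entity["canonical"]] + [s["text"] for s in entity["synonyms"]]
        for form in forms:
            # Assumption: case-insensitive exact match. Every source counts equally.
            index[form.casefold()] = entity["id"]
    return index


def resolve(index: dict[str, str], mention: str) -> Optional[str]:
    """Return the entity id for a mention, or None when no surface form matches."""
    return index.get(mention.strip().casefold())


ENTITIES = [
    {
        "id": "ent_acme",
        "canonical": "Acme Corporation",
        "synonyms": [
            {"text": "Acme Corp", "source": "customer_defined"},
            {"text": "ACME", "source": "extracted"},
        ],
    }
]
INDEX = build_surface_index(ENTITIES)
```

In this model, a mention such as "acme corp" resolves to ent_acme whether the synonym was extracted or seeded, and an unseen spelling resolves to nothing until you add it.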

These three surfaces work together: entity types shape the taxonomy, synonyms widen what resolves to each entity, and the Knowledge Graph and File System APIs read the result back out.

Seeding via the API

POST /v3/entities/bulk creates or merges many entities in one call. Each entity's type resolves to an existing entity type or creates a new one, and each string in synonyms is inserted as a customer_defined synonym.

Field	Notes
`bucket`	`bkt_…`; absent → the default bucket for the account + environment
`entities[]`	Each: `canonical`, `type`, optional `description`, `synonyms` (`string[]`), `attributes` (object)
`onConflict`	`"merge"` — see merge semantics

curl -X POST "https://api.bem.ai/v3/entities/bulk" \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "bucket": "bkt_catalog",
    "onConflict": "merge",
    "entities": [
      {
        "canonical": "Acme Corporation",
        "type": "manufacturer",
        "description": "Industrial equipment maker",
        "synonyms": ["Acme Corp", "ACME"]
      },
      {
        "canonical": "Acme Forklift X200",
        "type": "equipment",
        "synonyms": ["X200", "Acme X200"],
        "attributes": { "manufacturer": "Acme Corporation", "weightKg": 3200 }
      }
    ]
  }'
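
If you assemble the request body in code rather than by hand, a small builder keeps the row shape consistent. This is a sketch under assumptions: seed_entity and bulk_body are hypothetical helpers, not part of any bem SDK, and the client-side required-field checks simply mirror the fields marked required in this guide.

```python
import json
from typing import Iterable, Optional


def seed_entity(canonical: str, type_name: str, *, description: Optional[str] = None,
                synonyms: Iterable[str] = (), attributes: Optional[dict] = None) -> dict:
    """Build one row for entities[], omitting optional fields that are empty."""
    if not canonical:
        raise ValueError("canonical is required")
    if not type_name:
        raise ValueError("type is required")
    row: dict = {"canonical": canonical, "type": type_name}
    if description:
        row["description"] = description
    synonyms = list(synonyms)
    if synonyms:
        row["synonyms"] = synonyms
    if attributes:
        row["attributes"] = dict(attributes)
    return row


def bulk_body(entities: Iterable[dict], bucket: Optional[str] = None) -> dict:
    """Wrap rows in the bulk request body; leave bucket out to use the default bucket."""
    body: dict = {"onConflict": "merge", "entities": list(entities)}
    if bucket is not None:
        body["bucket"] = bucket
    return body


body = bulk_body(
    [
        seed_entity("Acme Corporation", "manufacturer",
                    description="Industrial equipment maker",
                    synonyms=["Acme Corp", "ACME"]),
        seed_entity("Acme Forklift X200", "equipment",
                    synonyms=["X200", "Acme X200"],
                    attributes={"manufacturer": "Acme Corporation", "weightKg": 3200}),
    ],
    bucket="bkt_catalog",
)
payload = json.dumps(body)  # send as the JSON body of POST /v3/entities/bulk
```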

Sync vs async

The request size decides the response shape:

Fewer than 100 entities → 200, processed inline. You get the per-row outcomes back in the same response.

{
  "results": [
    { "canonical": "Acme Corporation", "outcome": "created", "entityID": "ent_acme" },
    { "canonical": "Acme Forklift X200", "outcome": "merged-with", "entityID": "ent_x200" },
    { "canonical": "", "outcome": "rejected", "reason": "canonical is required" }
  ],
  "summary": { "created": 1, "merged": 1, "rejected": 1 }
}

100 or more entities → 202, processed as a background job. You get a seedJobID and a statusURL to poll.

{
  "seedJobID": "seed_7f3a9c",
  "status": "pending",
  "statusURL": "/v3/entities/seed/seed_7f3a9c"
}
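
Client code usually needs to branch on which of the two shapes came back. The following is a minimal sketch, assuming you already have the HTTP status code and the decoded JSON body; handle_bulk_response and expected_status are hypothetical helpers, and the 100-entity threshold is the one stated above.

```python
SYNC_LIMIT = 100  # from the text: fewer than 100 entities are processed inline


def expected_status(entity_count: int) -> int:
    """Predict the status code from the number of entities in the request."""
    return 200 if entity_count < SYNC_LIMIT else 202


def handle_bulk_response(status_code: int, body: dict) -> dict:
    """Normalize the sync (200) and async (202) responses into one small dict."""
    if status_code == 200:
        rejected = [r for r in body["results"] if r["outcome"] == "rejected"]
        return {"mode": "sync", "summary": body["summary"], "rejected": rejected}
    if status_code == 202:
        return {"mode": "async", "job_id": body["seedJobID"],
                "status_url": body["statusURL"]}
    raise ValueError(f"unexpected status code: {status_code}")
```

Keeping the rejected rows separate in the sync case makes it easy to surface their reason values right away, while the async case only hands back what you need to start polling.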

Polling a seed job

Poll the statusURL (GET /v3/entities/seed/{id}) until the job reaches a terminal status, such as completed in the example below. While the job is running, only the counts are populated; the full results array appears once it finishes.

curl "https://api.bem.ai/v3/entities/seed/seed_7f3a9c" \
  -H "x-api-key: $BEM_API_KEY"

{
  "status": "completed",
  "totalRows": 240,
  "createdCount": 198,
  "mergedCount": 40,
  "rejectedCount": 2,
  "results": [
    { "canonical": "Acme Corporation", "outcome": "created", "entityID": "ent_acme" }
  ],
  "error": null
}

A non-null error means the job itself failed; per-row problems show up as rejected rows with a reason, not as a job-level error.
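
A polling loop along these lines is one way to wait for a seed job. It is a sketch, not an official client: fetch_status is a hypothetical callable that performs the GET and returns the decoded JSON, the interval and poll limit are arbitrary, and the job is treated as finished when status is completed or error is non-null, since those are the only terminal signals this guide documents.

```python
import time
from typing import Callable


def is_terminal(job: dict) -> bool:
    """A job is done when it completed or reported a job-level error."""
    return job.get("status") == "completed" or job.get("error") is not None


def wait_for_seed_job(fetch_status: Callable[[], dict], *, interval_s: float = 2.0,
                      max_polls: int = 150,
                      sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job is terminal; raise on job failure or when polls run out."""
    for attempt in range(max_polls):
        job = fetch_status()
        if is_terminal(job):
            if job.get("error") is not None:
                raise RuntimeError(f"seed job failed: {job['error']}")
            return job
        if attempt < max_polls - 1:
            sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"seed job not finished after {max_polls} polls")
```

Rejected rows do not raise here; they stay in the returned results array with their reason, matching the split between job-level errors and per-row problems.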

`onConflict=merge` semantics

A seeded entity conflicts with an existing one when both the canonical name and the type match. With onConflict: "merge", the existing entity is updated in place rather than duplicated:

Synonyms merge additively — seeded synonyms are added as customer_defined; existing synonyms (including extracted ones) are kept.
Description is updated to the seeded value when one is provided.
Attributes merge — seeded keys overwrite, untouched keys remain.

The row's outcome comes back as merged-with (with the surviving entityID) so you can tell merges from fresh created rows.
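
The merge rules can be modeled in a few lines, which helps when you want to predict what a re-seed will do to an entity. This is an illustrative in-memory sketch with hypothetical helpers (conflicts, merge_seed_row), not the server implementation. One detail is an assumption: a seeded synonym whose text already exists is left untouched here, because this guide does not say whether the bulk path changes the source of an existing synonym.

```python
def conflicts(existing: dict, seeded: dict) -> bool:
    """A conflict requires both the canonical name and the type to match."""
    return (existing["canonical"] == seeded["canonical"]
            and existing["type"] == seeded["type"])


def merge_seed_row(existing: dict, seeded: dict) -> dict:
    """Return the entity as it would look after an onConflict=merge seed."""
    merged = {
        "canonical": existing["canonical"],
        "type": existing["type"],
        # Description is replaced only when the seed provides one.
        "description": seeded.get("description") or existing.get("description"),
        # Seeded keys overwrite; keys the seed does not mention remain.
        "attributes": {**existing.get("attributes", {}), **seeded.get("attributes", {})},
        "synonyms": [dict(s) for s in existing.get("synonyms", [])],
    }
    known = {s["text"] for s in merged["synonyms"]}
    for text in seeded.get("synonyms", []):
        if text not in known:  # assumption: existing texts are left as they are
            merged["synonyms"].append({"text": text, "source": "customer_defined"})
            known.add(text)
    return merged
```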

Seeding via CSV upload

For non-engineers, the dashboard route /memory/seed wraps the same bulk endpoint in a drag-and-drop CSV flow.

Columns

canonical (required) — the entity's canonical name.
type (required) — the entity type; resolves or creates it.
description (optional) — free-text description.
synonyms (optional) — a semicolon-separated list of surface forms, each inserted as customer_defined.
Any other column becomes a per-entity attribute, keyed by the column header.

canonical,type,description,synonyms,manufacturer,weightKg
Acme Corporation,manufacturer,Industrial equipment maker,Acme Corp;ACME,,
Acme Forklift X200,equipment,,X200;Acme X200,Acme Corporation,3200
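
If you want to check a file before uploading it, the column rules translate directly into a small parser. The sketch below is an approximation of the mapping described above, not the dashboard's code: csv_to_entities is a hypothetical helper, empty cells are skipped by assumption, and attribute values are kept as strings because this guide does not describe any type coercion.

```python
import csv
import io

RESERVED = ("canonical", "type", "description", "synonyms")

SAMPLE_CSV = """canonical,type,description,synonyms,manufacturer,weightKg
Acme Corporation,manufacturer,Industrial equipment maker,Acme Corp;ACME,,
Acme Forklift X200,equipment,,X200;Acme X200,Acme Corporation,3200
"""


def csv_to_entities(text: str) -> list[dict]:
    """Turn seed CSV text into rows shaped like entities[] in the bulk body."""
    entities = []
    for row in csv.DictReader(io.StringIO(text)):
        entity = {
            "canonical": (row.get("canonical") or "").strip(),
            "type": (row.get("type") or "").strip(),
        }
        description = (row.get("description") or "").strip()
        if description:
            entity["description"] = description
        # synonyms is a semicolon-separated list of surface forms
        synonyms = [s.strip() for s in (row.get("synonyms") or "").split(";") if s.strip()]
        if synonyms:
            entity["synonyms"] = synonyms
        # every other column becomes an attribute keyed by its header
        attributes = {k: v for k, v in row.items()
                      if k is not None and k not in RESERVED and v}
        if attributes:
            entity["attributes"] = attributes
        entities.append(entity)
    return entities
```

Running csv_to_entities(SAMPLE_CSV) gives two rows that match the shape of the API example, except that weightKg arrives as the string "3200" rather than a number.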

The flow

Preview. After you drop the file, the dashboard parses it and shows a preview of the rows it will submit so you can confirm the column mapping before anything is written.
Submit and watch progress. On submit the rows run through the bulk endpoint with live progress, using the same sync/async split as the API (large files become a background seed job).
Rejected-rows CSV. When the run finishes, any rejected rows are offered as a downloadable CSV — the original columns plus the reason — so you can fix and re-upload just those.
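
If you drive the bulk endpoint yourself and want the same fix-and-retry loop, you can rebuild a rejected-rows file from the response. This sketch assumes that results come back in the same order as the submitted rows, which this guide does not guarantee; pairing by position is used because a rejected row may have an empty canonical. rejected_rows_csv is a hypothetical helper.

```python
import csv
import io


def rejected_rows_csv(original_rows: list[dict], results: list[dict],
                      fieldnames: list[str]) -> str:
    """Return CSV text holding each rejected row's original columns plus its reason."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=[*fieldnames, "reason"], lineterminator="\n")
    writer.writeheader()
    # Assumption: results[i] describes original_rows[i].
    for row, result in zip(original_rows, results):
        if result["outcome"] == "rejected":
            writer.writerow({**row, "reason": result.get("reason", "")})
    return out.getvalue()
```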

Day-2 management

The seed is a starting point; you keep the ontology current with the synonym and entity-type endpoints.

Adding and removing synonyms

# add a customer-defined synonym (upgrades an existing extracted one in place)
curl -X POST "https://api.bem.ai/v3/entities/ent_acme/synonyms" \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "text": "Acme Inc.", "locale": "en-US" }'

# delete a synonym
curl -X DELETE "https://api.bem.ai/v3/entities/ent_acme/synonyms/syn_123" \
  -H "x-api-key: $BEM_API_KEY"

POST adds the synonym with source: customer_defined. If the same text already exists as an extracted synonym, it is upgraded in place to customer_defined rather than duplicated. locale is optional.

DELETE removes only customer_defined or sme_approved synonyms. Trying to delete an extracted synonym returns 409: bem learned it from a document, so you can't hand-delete it; it goes away when the mentions do.

Synonym changes honor alias resolution: if {id} points at an entity that was later merged away, the request resolves to the surviving entity and operates on its synonym set.
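
The three rules above (upgrade on add, 409 on deleting extracted synonyms, alias resolution) can be modeled in memory as follows. This is an illustration with hypothetical helpers, not the service's code. It identifies synonyms by text for brevity, whereas the API deletes by synonym id, and it leaves an existing customer_defined or sme_approved synonym unchanged on a repeated add, which is an assumption.

```python
from typing import Optional


class Conflict(Exception):
    """Stands in for an HTTP 409 response."""


def resolve_alias(merged_into: dict[str, str], entity_id: str) -> str:
    """Follow merge pointers until reaching the surviving entity id."""
    seen = set()
    while entity_id in merged_into and entity_id not in seen:
        seen.add(entity_id)
        entity_id = merged_into[entity_id]
    return entity_id


def add_synonym(synonyms: list[dict], text: str, locale: Optional[str] = None) -> dict:
    """Add a customer_defined synonym, upgrading an extracted one in place."""
    for syn in synonyms:
        if syn["text"] == text:
            if syn["source"] == "extracted":
                syn["source"] = "customer_defined"  # upgraded, not duplicated
            return syn
    syn = {"text": text, "source": "customer_defined"}
    if locale:
        syn["locale"] = locale
    synonyms.append(syn)
    return syn


def delete_synonym(synonyms: list[dict], text: str) -> dict:
    """Remove a synonym unless bem extracted it from a document."""
    for i, syn in enumerate(synonyms):
        if syn["text"] == text:
            if syn["source"] == "extracted":
                raise Conflict("extracted synonyms cannot be deleted")
            return synonyms.pop(i)
    raise KeyError(text)
```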

Managing entity types

GET / POST / PATCH / DELETE /v3/entity-types manage the taxonomy. The body is { name, description?, parentTypeId?, attributeSchema? }.

Two rules to know:

name is immutable. PATCH can change description, parentTypeId, or attributeSchema, but not name — entities reference the type by name, so renaming is not allowed.
DELETE is blocked when the type is in use. If any entity uses the type, or the type has child types, DELETE returns 409. Reassign or remove the dependents first.
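
A client-side pre-check for both rules might look like the sketch below. It is a convenience under assumptions, not a description of the server: validate_patch and delete_blockers are hypothetical helpers, the id and name fields on a type record are assumed, and the response to a PATCH that includes name is not specified in this guide, so the sketch simply raises.

```python
PATCHABLE = {"description", "parentTypeId", "attributeSchema"}


def validate_patch(patch: dict) -> dict:
    """Reject a PATCH body that tries to rename the type or uses unknown fields."""
    if "name" in patch:
        raise ValueError("name is immutable; entities reference the type by name")
    unknown = set(patch) - PATCHABLE
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    return patch


def delete_blockers(entity_type: dict, all_types: list[dict],
                    entities: list[dict]) -> list[str]:
    """List the reasons a DELETE would return 409; an empty list means it can proceed."""
    reasons = []
    if any(e["type"] == entity_type["name"] for e in entities):
        reasons.append("type is used by at least one entity")
    if any(t.get("parentTypeId") == entity_type["id"] for t in all_types):
        reasons.append("type has child types")
    return reasons
```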

Worked example: seeding a small catalog

A realistic seed is wide but shallow: roughly 20 types × 25 synonyms each for an equipment catalog. You don't send 500 lines of JSON; you send the shape below and let the row count grow.

First, the types (define the taxonomy and its hierarchy once):

curl -X POST "https://api.bem.ai/v3/entity-types" \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "equipment", "description": "Physical machines in the catalog",
        "attributeSchema": { "type": "object",
          "properties": { "manufacturer": { "type": "string" },
                          "weightKg": { "type": "number" } } } }'

# a child type under "equipment"; parentTypeId is the id of the equipment type
curl -X POST "https://api.bem.ai/v3/entity-types" \
  -H "x-api-key: $BEM_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "name": "accessory", "parentTypeId": "etype_equipment" }'
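
Because a child type needs its parent's id, the parent has to exist before the child is created. With more than a handful of types, one way to get a safe creation order is a topological sort. This is a sketch using Python's standard graphlib module (Python 3.9+); creation_order and the name-to-parent mapping are hypothetical and only illustrate the ordering, not the API calls.

```python
from graphlib import TopologicalSorter
from typing import Optional


def creation_order(parent_of: dict[str, Optional[str]]) -> list[str]:
    """Order type names so that every parent appears before its children."""
    # TopologicalSorter expects node -> set of nodes that must come first.
    graph = {name: ({parent} if parent else set()) for name, parent in parent_of.items()}
    # static_order() raises graphlib.CycleError if the hierarchy contains a loop.
    return list(TopologicalSorter(graph).static_order())


order = creation_order({
    "accessory": "equipment",
    "equipment": None,
    "manufacturer": None,
})
```

You would then POST the types in that order, reading each parent's id from its create response before sending the children. The relative order of unrelated types, such as equipment and manufacturer, is not significant.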

Then seed the entities for those types in one bulk call (truncated — the real body repeats this row shape across all ~20 types):

{
  "bucket": "bkt_catalog",
  "onConflict": "merge",
  "entities": [
    {
      "canonical": "Acme Forklift X200",
      "type": "equipment",
      "synonyms": ["X200", "Acme X200", "Forklift X200", "AX-200"],
      "attributes": { "manufacturer": "Acme Corporation", "weightKg": 3200 }
    },
    {
      "canonical": "Globex Pallet Jack PJ-5",
      "type": "equipment",
      "synonyms": ["PJ-5", "Globex PJ5", "Pallet Jack 5"],
      "attributes": { "manufacturer": "Globex Holdings", "weightKg": 95 }
    }
  ]
}

With ~25 synonyms behind each entity, the matcher resolves the abbreviations, SKUs, and trade names in your documents back to the right canonical entity from the first parse — no warm-up period.