Parse Functions

Render documents into a navigable structure of sections, entities, and relationships — designed to be walked by an LLM agent

Parse functions turn a document into a navigable representation of itself. Instead of returning schema-bound JSON, a Parse function emits page-aware sections, named entities (people, organizations, products, identifiers, datasets, …), and the relationships between them. Output is queryable via the File System API using Unix-shell verbs — ls, cat, grep, head, find, open, xref — so an LLM agent can browse a corpus the way a developer browses source code.

This is the alternative to a RAG pipeline for use cases where the questions aren't known up front. The agent decides what to read next; the platform's job is to keep parsed documents addressable and the entity graph fresh.

When to use

Use a Parse function when you need to:

  • Stand up an agent loop over a document corpus without building a chunker, embedder, and vector store
  • Extract entities and relationships from documents where the schema isn't known in advance
  • Maintain a cross-document memory — one canonical record per real-world thing — across an environment
  • Power retrieval that needs to reach beyond a top-K window: grep across the corpus, xref to every section that mentions an entity

If you already know exactly which fields you need out of a document, an Extract function is the simpler tool.

Configuration fields

Required fields

Field | Type | Description
functionName | string | Unique identifier for the function (per environment)
type | string | Must be "parse"

Optional fields

Field | Type | Default | Description
displayName | string | - | Human-readable display name
tags | string[] | - | Tags for organization
parseConfig.extractEntities | boolean | true | Extract named entities and relationships in addition to sections
parseConfig.linkAcrossDocuments | boolean | true | Link entities across documents in the environment to build a cross-doc memory

Output structure

Every parse call emits a single Transformation whose JSON has three top-level arrays:

Array | Always populated | What it contains
sections | yes | Page-aware chunks of the document: labels, types (heading, paragraph, table, list, …), page numbers, and content
entities | only when extractEntities=true | Named entities pulled out of the document, deduped by canonical name within the doc and counted by mention
relationships | only when extractEntities=true | Relationships between entities (e.g. Author A affiliated_with Institution B)

sections is the anchor: it's what cat, head, grep, and xref read against. entities and relationships are the per-document slice of the entity graph; the cross-environment view lives in the Memory tab of the dashboard and is reachable via find / open / xref.
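
As a rough illustration of how a consumer might walk these three arrays, the Python sketch below summarizes a parse result. Only the three top-level array names come from this page. Every key inside the items (page, type, label, content, canonicalName, mentionCount, source, target) is an assumed name chosen for the example, so check a real Transformation payload for the actual schema.

```python
from collections import Counter

# Hypothetical payload: only the three top-level array names come from this
# page; every key inside the items is an assumed name for illustration.
transformation = {
    "sections": [
        {"label": "Abstract", "type": "heading", "page": 1, "content": "Abstract"},
        {"label": "Abstract", "type": "paragraph", "page": 1, "content": "We study ..."},
        {"label": "Results", "type": "table", "page": 3, "content": "..."},
    ],
    "entities": [
        {"canonicalName": "Author A", "type": "person", "mentionCount": 4},
        {"canonicalName": "Institution B", "type": "organization", "mentionCount": 2},
    ],
    "relationships": [
        {"source": "Author A", "type": "affiliated_with", "target": "Institution B"},
    ],
}


def summarize(result: dict) -> dict:
    """Count sections per page and rank entities by mention count."""
    sections = result.get("sections", [])
    # entities and relationships come back empty when extractEntities=false.
    entities = result.get("entities", [])
    relationships = result.get("relationships", [])
    return {
        "sections_per_page": dict(Counter(s["page"] for s in sections)),
        "top_entities": [
            e["canonicalName"]
            for e in sorted(entities, key=lambda e: e["mentionCount"], reverse=True)
        ],
        "triples": [(r["source"], r["type"], r["target"]) for r in relationships],
    }


summary = summarize(transformation)
```

Because the helper falls back to empty lists, it works unchanged on sections-only output.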

The two toggles, in detail

extractEntities

When true (the default), each parse call extracts entities and relationships alongside sections, and dedupes entities by canonical name within the document. When false, only sections[] is emitted; entities[] and relationships[] come back empty.

Turning entities off is rarely the right call — they're cheap on top of the section pass and they're what lets grep scope to entities or relationships later. Leave the default unless you have a specific reason to drop them.

linkAcrossDocuments

When true (the default), after each parse the platform runs a cross-document resolver that merges this document's entities with entities seen in earlier documents in the same environment, building one canonical record per real-world thing across the corpus. Surface forms (bem.ai, bem, Brilliant Enterprise Magic, Inc.) collapse onto one entityID.
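
To make the idea of collapsing surface forms concrete, here is a toy Python illustration. It is not the platform's resolver: the normalization rule, the alias table, and the ID format (ent_001) are all invented for the example.

```python
import re

# Toy alias table: normalized surface form -> canonical entity ID.
# IDs and aliases are invented; a real resolver is far more involved.
ALIASES = {
    "bem.ai": "ent_001",
    "bem": "ent_001",
    "brilliant enterprise magic, inc.": "ent_001",
}


def normalize(surface_form: str) -> str:
    """Lowercase and collapse whitespace so trivially different forms match."""
    return re.sub(r"\s+", " ", surface_form.strip().lower())


def resolve(surface_form: str):
    """Return the canonical entity ID, or None for an unseen surface form."""
    return ALIASES.get(normalize(surface_form))


mentions = ["bem.ai", "BEM", "Brilliant  Enterprise Magic, Inc."]
entity_ids = {resolve(m) for m in mentions}  # all three map to one ID
```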

The linkAcrossDocuments toggle:

  • Doesn't change the per-call parse output — entities remain attached to the document via entity_mentions
  • Is required for the memory-level File System ops (find, open, xref); with linking off, those ops return an empty list and a hint pointing at the toggle
  • Requires extractEntities=true (linking has nothing to link otherwise)
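
The dependency between the two toggles can be checked on the client before a function is created. The following is a minimal Python sketch. The helper name validate_parse_config is hypothetical, and this page does not say whether the platform rejects the invalid combination or silently skips linking, so raising an error here is an assumption.

```python
from typing import Optional


def validate_parse_config(parse_config: Optional[dict] = None) -> dict:
    """Fill in the documented defaults and reject linking without entities."""
    # Both toggles default to true when parseConfig (or a key) is omitted.
    config = {"extractEntities": True, "linkAcrossDocuments": True}
    config.update(parse_config or {})
    if config["linkAcrossDocuments"] and not config["extractEntities"]:
        # Assumption: treat this as a client-side error rather than a no-op.
        raise ValueError(
            "linkAcrossDocuments=true requires extractEntities=true: "
            "with no entities there is nothing to link"
        )
    return config
```

Note that passing only {"extractEntities": false} fails this check, because linkAcrossDocuments still defaults to true. To get sections only, set both toggles to false.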

The resolver runs asynchronously after the parse event is dispatched, so the Memory tab is eventually consistent: it can lag a new parse by a few seconds while the resolver catches up.
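
A client that reads memory right after a parse may want to tolerate that lag. One possible approach, sketched in Python, is to poll until a result appears or a timeout elapses. The fetch callable is a hypothetical stand-in for whatever wraps a memory-level op such as find, and the timeout and interval values are arbitrary.

```python
import time
from typing import Callable, List


def wait_for_memory(
    fetch: Callable[[], List[dict]],
    timeout_s: float = 10.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call fetch() until it returns a non-empty list or the timeout elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        results = fetch()
        # Return on the first non-empty result, or give up at the deadline.
        if results or time.monotonic() >= deadline:
            return results
        sleep(interval_s)
```

An empty result after the timeout does not necessarily mean the resolver is slow: with linking off, memory-level ops also return an empty list, so check the toggle before retrying longer.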

Example: minimal Parse function

{
  "functionName": "paper-parser",
  "type": "parse",
  "displayName": "Research Paper Parser"
}

parseConfig is omitted, so both toggles default to true. This is the canonical setup for a "parse-and-navigate" pipeline.

Example: sections only, no memory

{
  "functionName": "draft-parser",
  "type": "parse",
  "parseConfig": {
    "extractEntities": false,
    "linkAcrossDocuments": false
  }
}

Use this when you only want the navigable document structure (ls, cat, head, grep) and don't need the entity graph — for example, drafting tools that surface sections to a human reviewer.

Querying parsed output

Parsed documents are read through the File System API at POST /v3/fs. The verbs split into two groups:

  • Doc-level ops (ls, cat, head, grep, stat) — work on every parsed document, regardless of toggles.
  • Memory-level ops (find, open, xref) — work on the cross-document entity graph. Require linkAcrossDocuments=true on the parse function that produced the docs.
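
The split between the two groups can be encoded in a small client-side guard. The Python sketch below also builds, without sending, a POST /v3/fs request. The verb lists and the path come from this page. The base URL, the x-api-key header name, and the request body shape {"op": ..., "args": ...} are assumptions for illustration, not the documented request schema.

```python
import json
import urllib.request

DOC_LEVEL_OPS = {"ls", "cat", "head", "grep", "stat"}
MEMORY_LEVEL_OPS = {"find", "open", "xref"}


def op_is_available(op: str, link_across_documents: bool = True) -> bool:
    """Doc-level ops always work; memory-level ops need linking enabled.

    With linking off, memory-level ops return an empty list plus a hint,
    which this helper reports as "not available".
    """
    if op in DOC_LEVEL_OPS:
        return True
    if op in MEMORY_LEVEL_OPS:
        return link_across_documents
    raise ValueError(f"unknown File System op: {op!r}")


def build_fs_request(base_url: str, api_key: str, op: str, **args) -> urllib.request.Request:
    """Build (but do not send) a POST /v3/fs request.

    The body shape and the header name are assumed, not documented here.
    """
    body = json.dumps({"op": op, "args": args}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v3/fs",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


# Placeholder host and key; "pattern" is a hypothetical argument name.
request = build_fs_request("https://api.example.com", "YOUR_API_KEY", "grep", pattern="indemnif")
```

Checking op_is_available before issuing find, open, or xref lets an agent fall back to doc-level verbs such as grep when the corpus was parsed with linking off.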

For a worked example, see the Parse and Search over Contracts cookbook.
