Parse Functions

Render documents into a navigable structure of sections, entities, and relationships — designed to be walked by an LLM agent

Parse functions turn a document into a navigable representation of itself. Instead of returning schema-bound JSON, a Parse function emits page-aware sections, named entities (people, organizations, products, identifiers, datasets, …), and the relationships between them. Output is queryable via the File System API using Unix-shell verbs (ls, cat, head, grep, stat, find, open, xref), so an LLM agent can browse a corpus the way a developer browses source code.

This is the alternative to a RAG pipeline for use cases where the questions aren't known up front. The agent decides what to read next; the platform's job is to keep parsed documents addressable and the entity graph fresh.

When to use

Use a Parse function when you need to:

Stand up an agent loop over a document corpus without building a chunker, embedder, and vector store
Extract entities and relationships from documents where the schema isn't known in advance
Maintain a cross-document memory — one canonical record per real-world thing — across an environment
Power retrieval that needs to reach beyond a top-K window: grep across the corpus, xref to every section that mentions an entity

If you already know exactly which fields you need out of a document, an Extract function is the simpler tool.

Configuration fields

Required fields

Field	Type	Description
`functionName`	string	Unique identifier for the function (per environment)
`type`	string	Must be `"parse"`

Optional fields

Field	Type	Default	Description
`displayName`	string	—	Human-readable display name
`tags`	string[]	—	Tags for organization
`parseConfig.extractEntities`	boolean	`true`	Extract named entities and relationships in addition to sections
`parseConfig.linkAcrossDocuments`	boolean	`true`	Link entities across documents in the environment to build a cross-doc memory

Output structure

Every parse call emits a single Transformation whose JSON has three top-level arrays:

Array	Always populated	What it contains
`sections`	yes	Page-aware chunks of the document — labels, types (`heading`, `paragraph`, `table`, `list`, …), page numbers, and content
`entities`	only when `extractEntities=true`	Named entities pulled out of the document, deduped by canonical name within the doc and counted by mention
`relationships`	only when `extractEntities=true`	Relationships between entities (e.g. Author A `affiliated_with` Institution B)

sections is the anchor: it's what cat, head, grep, and xref read against. entities and relationships are the per-document slice of the entity graph; the cross-environment view lives in the Memory tab of the dashboard and is reachable via find / open / xref.
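As a rough illustration of the shape described above, the sketch below builds a made-up parse output and walks it. Only the three top-level array names (sections, entities, relationships) come from this page; every field name inside the items (label, type, page, content, name, mentions, source, target) is an assumption chosen for the example and may not match the real payload.

```python
from collections import defaultdict

# Hypothetical parse output. Only the three top-level array names are
# documented; the field names inside each item are assumptions.
transformation = {
    "sections": [
        {"label": "Title", "type": "heading", "page": 1, "content": "Graph Memory for Agents"},
        {"label": "Abstract", "type": "paragraph", "page": 1, "content": "We study agent memory."},
        {"label": "Results", "type": "table", "page": 3, "content": "model | score"},
    ],
    "entities": [
        {"name": "Author A", "type": "person", "mentions": 2},
        {"name": "Institution B", "type": "organization", "mentions": 1},
    ],
    "relationships": [
        {"source": "Author A", "type": "affiliated_with", "target": "Institution B"},
    ],
}


def sections_by_page(doc):
    """Group section labels by page number, preserving document order."""
    pages = defaultdict(list)
    for section in doc["sections"]:
        pages[section["page"]].append(section["label"])
    return dict(pages)


def relationships_of(doc, entity_name):
    """Return (relationship type, other entity) pairs that touch entity_name."""
    pairs = []
    for rel in doc["relationships"]:
        if rel["source"] == entity_name:
            pairs.append((rel["type"], rel["target"]))
        elif rel["target"] == entity_name:
            pairs.append((rel["type"], rel["source"]))
    return pairs
```

With extractEntities=false, the same helpers would still work: sections_by_page is unaffected, and relationships_of simply returns an empty list because relationships comes back empty.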

The two toggles, in detail

`extractEntities`

When true (the default), each parse call extracts entities and relationships alongside sections, and dedupes entities by canonical name within the document. When false, only sections[] is emitted; entities[] and relationships[] come back empty.

Turning entities off is rarely the right call — they're cheap on top of the section pass and they're what lets grep scope to entities or relationships later. Leave the default unless you have a specific reason to drop them.

`linkAcrossDocuments`

When true (the default), after each parse the platform runs a cross-document resolver that merges this document's entities with entities seen in earlier documents in the same environment, building one canonical record per real-world thing across the corpus. Surface forms (bem.ai, bem, Brilliant Enterprise Magic, Inc.) collapse onto one entityID.
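To make the end state concrete, here is a toy sketch of what "collapsing surface forms onto one entityID" could look like as a data structure. It is not the platform's resolver and says nothing about how the resolver decides that two forms match; the record layout, the surfaceForms field, and the ID ent_001 are all hypothetical.

```python
def normalize(surface_form):
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(surface_form.lower().split())


def build_alias_index(canonical_records):
    """Map every normalized surface form to the entityID of its canonical record."""
    index = {}
    for record in canonical_records:
        for form in record["surfaceForms"]:
            index[normalize(form)] = record["entityID"]
    return index


# Hypothetical canonical record; "ent_001" and "surfaceForms" are illustrative.
records = [
    {
        "entityID": "ent_001",
        "surfaceForms": ["bem.ai", "bem", "Brilliant Enterprise Magic, Inc."],
    }
]
alias_index = build_alias_index(records)


def resolve(surface_form):
    """Return the entityID for a surface form, or None if it is unknown."""
    return alias_index.get(normalize(surface_form))
```

The point of the sketch is only the invariant: after linking, any of the known surface forms leads to the same canonical record.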

This toggle:

Doesn't change the per-call parse output — entities remain attached to the document via entity_mentions
Is required for the memory-level File System ops (find, open, xref); with linking off, those ops return an empty list and a hint pointing at the toggle
Requires extractEntities=true (linking has nothing to link otherwise)

The resolver runs asynchronously after the parse event is dispatched, so the Memory tab is eventually consistent: it can lag a new parse by a few seconds while the resolver catches up.
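Because memory-level results can trail a fresh parse by a few seconds, a client that queries immediately after parsing may want to retry. The helper below is a minimal sketch of that pattern, assuming you supply your own query callable that performs a memory-level lookup and returns a list; the helper name, attempt count, and delays are illustrative choices, not platform recommendations.

```python
import time


def wait_for_memory(query, attempts=5, initial_delay=0.5, sleep=time.sleep):
    """Call query() until it returns a non-empty list, backing off between tries.

    query is any zero-argument callable that performs a memory-level lookup
    (for example a find) and returns a list of results.
    """
    delay = initial_delay
    for attempt in range(attempts):
        results = query()
        if results:
            return results
        if attempt < attempts - 1:
            sleep(delay)
            delay *= 2  # exponential backoff between attempts
    return []
```

Note that an empty result is ambiguous: it can mean the resolver has not caught up yet, that nothing matches, or that linkAcrossDocuments is off, in which case retrying will not help.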

Example: minimal Parse function

{
  "functionName": "paper-parser",
  "type": "parse",
  "displayName": "Research Paper Parser"
}

parseConfig is omitted, so both toggles default to true. This is the canonical setup for a "parse-and-navigate" pipeline.

Example: sections only, no memory

{
  "functionName": "draft-parser",
  "type": "parse",
  "parseConfig": {
    "extractEntities": false,
    "linkAcrossDocuments": false
  }
}

Use this when you only want the navigable document structure (ls, cat, head, grep) and don't need the entity graph — for example, drafting tools that surface sections to a human reviewer.
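The two examples above can also be generated programmatically. The sketch below applies the documented defaults and checks, on the client side, the documented rule that linking requires entity extraction. The helper name build_parse_function is hypothetical, and raising an error for the invalid combination is a choice made here; this page does not say how the platform itself responds to it.

```python
def build_parse_function(function_name, extract_entities=True,
                         link_across_documents=True, display_name=None, tags=None):
    """Build a Parse function config dict, applying the documented defaults."""
    if link_across_documents and not extract_entities:
        # Documented rule: linkAcrossDocuments requires extractEntities=true.
        raise ValueError("linkAcrossDocuments=true requires extractEntities=true")

    config = {"functionName": function_name, "type": "parse"}
    if display_name is not None:
        config["displayName"] = display_name
    if tags:
        config["tags"] = list(tags)
    # Emit parseConfig only when a toggle differs from its default of true.
    if not (extract_entities and link_across_documents):
        config["parseConfig"] = {
            "extractEntities": extract_entities,
            "linkAcrossDocuments": link_across_documents,
        }
    return config
```

Calling build_parse_function("paper-parser", display_name="Research Paper Parser") reproduces the minimal example, and passing both toggles as False reproduces the sections-only example.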

Querying parsed output

Parsed documents are read through the File System API at POST /v3/fs. The verbs split into two groups:

Doc-level ops (ls, cat, head, grep, stat): work on every parsed document, regardless of toggles.
Memory-level ops (find, open, xref): work on the cross-document entity graph. They require linkAcrossDocuments=true on the Parse function that produced the docs.
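An agent can use this split to decide up front which verbs are worth calling for a given corpus. The following is a small sketch of that check; the verb groupings come from the list above, while the function name and the choice to raise on unknown verbs are assumptions. It deliberately does not build a request, since the body format for POST /v3/fs is not described on this page.

```python
DOC_LEVEL_OPS = {"ls", "cat", "head", "grep", "stat"}
MEMORY_LEVEL_OPS = {"find", "open", "xref"}


def op_is_useful(op, link_across_documents=True):
    """Return True if op can return results for docs parsed with this toggle."""
    if op in DOC_LEVEL_OPS:
        # Doc-level ops work on every parsed document.
        return True
    if op in MEMORY_LEVEL_OPS:
        # Memory-level ops return an empty list when linking is off.
        return link_across_documents
    raise ValueError(f"unknown File System op: {op}")
```

For documents produced by a sections-only function, an agent could filter its tool list with op_is_useful(op, link_across_documents=False) and skip find, open, and xref entirely.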

For a worked example, see the Parse and Search over Contracts cookbook.