Parse and Search over Contracts
Parse a contract into a navigable structure, then search and reason over it with the /v3/fs File System API
This cookbook walks the Parse primitive end to end against a contract. By the end you will have:
- A Parse function with entity extraction and cross-document memory enabled
- A workflow that calls that Parse function
- Your first parsed contract — page-aware sections (DUTIES, COMPENSATION, INDEMNIFICATION, INSURANCE, …), entities (parties, dollar amounts, jurisdictions, named individuals), and the relationships between them
- A working set of File System ops (ls, cat, grep, find, xref) you can wire straight into an agent's tool surface for contract review
The example uses a single Professional Services Agreement template at contract.pdf. The same shape works for MSAs, NDAs, vendor agreements, employment contracts — any structured legal document. Once several contracts are parsed in the same environment, the cross-document ops collapse the same party, statute, or coverage type into one canonical record across all of them.
Pick a language from the tabs in each step — the flow is identical across cURL, the SDKs, and the CLI. If you don't have an SDK installed yet, see Step 2 of the Quickstart.
Prerequisites
- A bem account and an API key from Settings → API Keys
- BEM_API_KEY exported in your shell: export BEM_API_KEY='your-api-key-here'
- A contract PDF on disk at contract.pdf. The walkthrough's example responses are based on a 10-page Professional Services Agreement; substitute any contract you have.
Step 1: Create a Parse function
A Parse function has no outputSchema — it's configured by what you want extracted about each contract, not by the fields you need. The two parseConfig toggles default to true and we want both on:
- extractEntities=true makes entities[] and relationships[] show up alongside sections[] in the parse output (parties, signers, dollar amounts, statutes, …).
- linkAcrossDocuments=true runs a cross-document resolver after each parse, so the same party or statute resolves to one canonical record across every contract you parse. This is what unlocks the memory-level File System ops (find, open, xref).
curl -X POST https://api.bem.ai/v3/functions \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{
"functionName": "contract-parser",
"type": "parse",
"displayName": "Contract Parser",
"tags": ["contracts", "legal"],
"parseConfig": {
"extractEntities": true,
"linkAcrossDocuments": true
}
}'
import Bem from "bem-ai-sdk";
const client = new Bem();
const { function: fn } = await client.functions.create({
functionName: "contract-parser",
type: "parse",
displayName: "Contract Parser",
tags: ["contracts", "legal"],
parseConfig: {
extractEntities: true,
linkAcrossDocuments: true,
},
});
console.log(fn);
from bem import Bem
client = Bem()
response = client.functions.create(
function_name="contract-parser",
type="parse",
display_name="Contract Parser",
tags=["contracts", "legal"],
parse_config={
"extract_entities": True,
"link_across_documents": True,
},
)
print(response.function)
package main
import (
"context"
"fmt"
bem "github.com/bem-team/bem-go-sdk"
)
func main() {
client := bem.NewClient()
resp, err := client.Functions.New(context.TODO(), bem.FunctionNewParams{
CreateFunction: bem.CreateFunctionUnionParam{
OfParse: &bem.CreateFunctionParseParam{
FunctionName: "contract-parser",
DisplayName: bem.String("Contract Parser"),
Tags: []string{"contracts", "legal"},
ParseConfig: bem.CreateFunctionParseParseConfigParam{
ExtractEntities: bem.Bool(true),
LinkAcrossDocuments: bem.Bool(true),
},
},
},
})
if err != nil {
panic(err)
}
fmt.Printf("%+v\n", resp.Function)
}
using Bem;
using Bem.Models.Functions;
BemClient client = new();
var response = await client.Functions.Create(new FunctionCreateParams
{
CreateFunction = new Parse
{
FunctionName = "contract-parser",
DisplayName = "Contract Parser",
Tags = new List<string> { "contracts", "legal" },
ParseConfig = new ParseConfig
{
ExtractEntities = true,
LinkAcrossDocuments = true,
},
},
});
Console.WriteLine(response.Function);
bem functions create \
--function-name contract-parser \
--type parse \
--display-name "Contract Parser" \
--tags '["contracts", "legal"]' \
--parse-config.extract-entities \
--parse-config.link-across-documents
Response:
{
"function": {
"functionID": "fn_2abc123",
"functionName": "contract-parser",
"type": "parse",
"displayName": "Contract Parser",
"tags": ["contracts", "legal"],
"versionNum": 1,
"parseConfig": {
"extractEntities": true,
"linkAcrossDocuments": true
}
}
}
Step 2: Create a workflow
A workflow gives the Parse function a callable entry point. For a single-step parse pipeline you only need one node and no edges.
curl -X POST https://api.bem.ai/v3/workflows \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{
"name": "contract-parse",
"displayName": "Contract Parse",
"tags": ["contracts", "legal"],
"mainNodeName": "contract-parser",
"nodes": [
{
"name": "contract-parser",
"function": { "name": "contract-parser" }
}
]
}'
import Bem from "bem-ai-sdk";
const client = new Bem();
const { workflow } = await client.workflows.create({
name: "contract-parse",
displayName: "Contract Parse",
tags: ["contracts", "legal"],
mainNodeName: "contract-parser",
nodes: [
{
name: "contract-parser",
function: { name: "contract-parser" },
},
],
});
console.log(workflow);
from bem import Bem
client = Bem()
response = client.workflows.create(
name="contract-parse",
display_name="Contract Parse",
tags=["contracts", "legal"],
main_node_name="contract-parser",
nodes=[
{
"name": "contract-parser",
"function": {"name": "contract-parser"},
}
],
)
print(response.workflow)
package main
import (
"context"
"fmt"
bem "github.com/bem-team/bem-go-sdk"
)
func main() {
client := bem.NewClient()
resp, err := client.Workflows.New(context.TODO(), bem.WorkflowNewParams{
Name: "contract-parse",
DisplayName: bem.String("Contract Parse"),
Tags: []string{"contracts", "legal"},
MainNodeName: "contract-parser",
Nodes: []bem.WorkflowNewParamsNode{
{
Name: bem.String("contract-parser"),
Function: bem.FunctionVersionIdentifierParam{
Name: bem.String("contract-parser"),
},
},
},
})
if err != nil {
panic(err)
}
fmt.Printf("%+v\n", resp.Workflow)
}
using Bem;
using Bem.Models.Workflows;
BemClient client = new();
var response = await client.Workflows.Create(new WorkflowCreateParams
{
Name = "contract-parse",
DisplayName = "Contract Parse",
Tags = new List<string> { "contracts", "legal" },
MainNodeName = "contract-parser",
Nodes = new List<Node>
{
new Node
{
Function = new FunctionVersionIdentifier { Name = "contract-parser" },
Name = "contract-parser",
},
},
});
Console.WriteLine(response.Workflow);
bem workflows create \
--name contract-parse \
--display-name "Contract Parse" \
--tags '["contracts", "legal"]' \
--main-node-name contract-parser \
--node '{name: contract-parser, function: {name: contract-parser}}'
Step 3: Call the workflow with your contract
Send contract.pdf through the workflow. We pass wait=true to block for up to 30 seconds — Parse on a typical contract finishes well inside that window. The callReferenceID is the handle we'll address the parsed doc by from /v3/fs later.
Upload as multipart form data (recommended for files):
curl -X POST "https://api.bem.ai/v3/workflows/contract-parse/call" \
-H "x-api-key: $BEM_API_KEY" \
-F "wait=true" \
-F "callReferenceID=sample-contract" \
-F "file=@contract.pdf"
Or, JSON body with base64-encoded file:
curl -X POST "https://api.bem.ai/v3/workflows/contract-parse/call?wait=true" \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{
"callReferenceID": "sample-contract",
"input": {
"singleFile": {
"inputType": "pdf",
"inputContent": "'"$(base64 -i contract.pdf)"'"
}
}
}'
import fs from "node:fs";
import Bem from "bem-ai-sdk";
const client = new Bem();
const inputContent = fs.readFileSync("contract.pdf").toString("base64");
const { call } = await client.workflows.call("contract-parse", {
wait: true,
callReferenceID: "sample-contract",
input: {
singleFile: {
inputType: "pdf",
inputContent,
},
},
});
console.log(call?.status);
import base64
from bem import Bem
client = Bem()
with open("contract.pdf", "rb") as f:
input_content = base64.b64encode(f.read()).decode()
response = client.workflows.call(
"contract-parse",
wait=True,
call_reference_id="sample-contract",
input={
"single_file": {
"input_type": "pdf",
"input_content": input_content,
}
},
)
print(response.call.status)
package main
import (
"context"
"encoding/base64"
"fmt"
"os"
bem "github.com/bem-team/bem-go-sdk"
)
func main() {
client := bem.NewClient()
data, err := os.ReadFile("contract.pdf")
if err != nil {
panic(err)
}
encoded := base64.StdEncoding.EncodeToString(data)
resp, err := client.Workflows.Call(context.TODO(), "contract-parse", bem.WorkflowCallParams{
Wait: bem.Bool(true),
CallReferenceID: bem.String("sample-contract"),
Input: bem.WorkflowCallParamsInput{
SingleFile: &bem.WorkflowCallParamsInputSingleFile{
InputType: "pdf",
InputContent: encoded,
},
},
})
if err != nil {
panic(err)
}
fmt.Printf("status=%s\n", resp.Call.Status)
}
using Bem;
using Bem.Models.Workflows;
BemClient client = new();
var bytes = File.ReadAllBytes("contract.pdf");
var encoded = Convert.ToBase64String(bytes);
var response = await client.Workflows.Call("contract-parse", new WorkflowCallParams
{
Wait = true,
CallReferenceID = "sample-contract",
Input = new Input
{
SingleFile = new SingleFile
{
InputType = "pdf",
InputContent = encoded,
},
},
});
Console.WriteLine(response.Call.Status);
bem workflows call \
--workflow-name contract-parse \
--wait \
--call-reference-id sample-contract \
--input.single-file '{"inputContent": "@contract.pdf", "inputType": "pdf"}'
The @contract.pdf syntax tells the CLI to read and base64-encode the file inline.
A few notes:
- wait=true blocks for up to 30 seconds. Larger or scan-heavy contracts may run longer and return a pending call you can poll with GET /v3/calls/{callID} or subscribe to via webhook.
- The cross-document resolver runs after each parse event is emitted, so the entity graph is eventually consistent: expect a lag of a few seconds before find/xref see new entities. If you're scripting against /v3/fs immediately after a parse, build in a short delay.
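The short delay mentioned above can be a bounded retry rather than a fixed sleep. The sketch below is independent of the SDK: `fetch` is any zero-argument callable you supply (hypothetical here) that returns a list of rows, for example a wrapper around a find call like the one in Step 7. The names `wait_for_rows`, `attempts`, and `delay_seconds` are illustrative, not part of the bem API.

```python
import time
from typing import Callable, List


def wait_for_rows(
    fetch: Callable[[], List[dict]],
    attempts: int = 5,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[dict]:
    """Call fetch() until it returns a non-empty list or attempts run out."""
    for attempt in range(attempts):
        rows = fetch()
        if rows:
            return rows
        # Skip the sleep after the final attempt; nothing is left to wait for.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return []
```

Injecting `sleep` keeps the helper easy to test; in a script you would leave it at the default.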
Step 4: List parsed contracts
ls returns one row per parsed document with the metadata an agent needs to navigate.
curl -X POST https://api.bem.ai/v3/fs \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{ "op": "ls", "limit": 25 }'
const { data } = await client.fs.navigate({ op: "ls", limit: 25 });
console.log(data);
response = client.fs.navigate(op="ls", limit=25)
print(response.data)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
Op: bem.FNavigateParamsOpLs,
Limit: bem.Int(25),
})
if err != nil {
panic(err)
}
fmt.Printf("%+v\n", resp.Data)
using Bem.Models.Fs;
var response = await client.Fs.Navigate(new FNavigateParams
{
Op = "ls",
Limit = 25,
});
Console.WriteLine(response.Data);
bem fs navigate --op ls --limit 25
Response:
{
"op": "ls",
"data": [
{
"referenceID": "sample-contract",
"transformationID": "tr_…",
"functionName": "contract-parser",
"parsedAt": "2026-04-28T16:00:00Z",
"pageCount": 10,
"sectionCount": 47,
"entityCount": 31,
"previewEntities": [
"Santa Cruz County Regional Transportation Commission",
"CONSULTANT",
"Yesenia Parra",
"Luis Mendez",
"California Labor Code",
"$1,000,000"
]
}
],
"hasMore": false
}
previewEntities holds up to ~6 canonical names sampled from the document, enough for a sanity check that parsing landed on the right things.
Step 5: Map a contract's structure cheaply
Before reading content, let an agent map structure. cat with select projects the parse output to specific dotted paths — labels, types, pages — so the agent can scan the doc's skeleton for ~1 KB instead of pulling the full ~150 KB content.
curl -X POST https://api.bem.ai/v3/fs \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{
"op": "cat",
"path": "sample-contract",
"select": ["sections.label", "sections.type", "sections.page"]
}'
const { data } = await client.fs.navigate({
op: "cat",
path: "sample-contract",
select: ["sections.label", "sections.type", "sections.page"],
});
console.log(data);
response = client.fs.navigate(
op="cat",
path="sample-contract",
select=["sections.label", "sections.type", "sections.page"],
)
print(response.data)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
Op: bem.FNavigateParamsOpCat,
Path: bem.String("sample-contract"),
Select: []string{"sections.label", "sections.type", "sections.page"},
})
var response = await client.Fs.Navigate(new FNavigateParams
{
Op = "cat",
Path = "sample-contract",
Select = new List<string> { "sections.label", "sections.type", "sections.page" },
});
bem fs navigate \
--op cat \
--path sample-contract \
--select '["sections.label", "sections.type", "sections.page"]'
The agent gets back the contract's clause map ("DUTIES", "COMPENSATION", "TERM", "EARLY TERMINATION", "INDEMNIFICATION FOR DAMAGES, TAXES AND CONTRIBUTIONS", "INSURANCE", "FEDERAL, STATE AND LOCAL LAWS", …) along with each section's type (heading, paragraph, list) and page number. From there it decides which sections to read in full.
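Once the clause map is in hand, a small helper can turn it into a page-by-page outline for the agent. This is a sketch under a stated assumption: the projected response is not shown in this cookbook, so the code assumes each section arrives as a dict with `label`, `type`, and `page` keys, mirroring the selected paths. The function name `headings_by_page` and the sample rows are illustrative, not real API output.

```python
from collections import defaultdict
from typing import Dict, List


def headings_by_page(sections: List[dict]) -> Dict[int, List[str]]:
    """Group heading labels by page number, preserving document order."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for section in sections:
        # Only headings form the outline; paragraphs and lists are skipped.
        if section.get("type") == "heading":
            pages[section["page"]].append(section["label"])
    return dict(pages)


# Hand-written sample rows in the ASSUMED shape (not real API output).
sample_sections = [
    {"label": "DUTIES", "type": "heading", "page": 1},
    {"label": "COMPENSATION", "type": "heading", "page": 1},
    {"label": "Payment terms text", "type": "paragraph", "page": 1},
    {"label": "INSURANCE", "type": "heading", "page": 4},
]

outline = headings_by_page(sample_sections)
```

If the real response nests sections differently, only the loop's key lookups need to change.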
To read one page in full:
curl -X POST https://api.bem.ai/v3/fs \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{ "op": "cat", "path": "sample-contract", "range": { "page": 4 } }'
const { data } = await client.fs.navigate({
op: "cat",
path: "sample-contract",
range: { page: 4 },
});
response = client.fs.navigate(
op="cat",
path="sample-contract",
range={"page": 4},
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
Op: bem.FNavigateParamsOpCat,
Path: bem.String("sample-contract"),
Range: bem.FNavigateParamsRange{Page: bem.Int(4)},
})
var response = await client.Fs.Navigate(new FNavigateParams
{
Op = "cat",
Path = "sample-contract",
Range = new Range { Page = 4 },
});
bem fs navigate --op cat --path sample-contract --range '{"page": 4}'
Step 6: Search across your contracts
grep runs substring or regex search across every parsed document's output. scope narrows it to one part of the parse output (sections, entities, relationships, or all); path scopes to a single document; countOnly returns just the hit count.
Find every place indemnification is discussed:
curl -X POST https://api.bem.ai/v3/fs \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{
"op": "grep",
"pattern": "indemnif",
"scope": "sections",
"limit": 10
}'
const { data } = await client.fs.navigate({
op: "grep",
pattern: "indemnif",
scope: "sections",
limit: 10,
});
response = client.fs.navigate(
op="grep",
pattern="indemnif",
scope="sections",
limit=10,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
Op: bem.FNavigateParamsOpGrep,
Pattern: bem.String("indemnif"),
Scope: bem.String("sections"),
Limit: bem.Int(10),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
Op = "grep",
Pattern = "indemnif",
Scope = "sections",
Limit = 10,
});
bem fs navigate \
--op grep \
--pattern indemnif \
--scope sections \
--limit 10
Returns hits with referenceID, page, sectionLabel, and a snippet around each match. For the sample contract, that means section 5 (INDEMNIFICATION FOR DAMAGES, TAXES AND CONTRIBUTIONS) on page 3, plus several insurance subclauses on page 4 that touch indemnity.
For a cheap "is it worth reading?" check, use countOnly:
curl -X POST https://api.bem.ai/v3/fs \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{ "op": "grep", "pattern": "Workers. Compensation", "regex": true, "countOnly": true }'
const { count } = await client.fs.navigate({
op: "grep",
pattern: "Workers. Compensation",
regex: true,
countOnly: true,
});
response = client.fs.navigate(
op="grep",
pattern="Workers. Compensation",
regex=True,
count_only=True,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
Op: bem.FNavigateParamsOpGrep,
Pattern: bem.String("Workers. Compensation"),
Regex: bem.Bool(true),
CountOnly: bem.Bool(true),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
Op = "grep",
Pattern = "Workers. Compensation",
Regex = true,
CountOnly = true,
});
bem fs navigate --op grep --pattern "Workers. Compensation" --regex --count-only
Returns { "count": 3 } with no snippet payload, so the agent can decide whether to keep digging.
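The `.` in "Workers. Compensation" is a regex wildcard standing in for the apostrophe, which may be typeset as a straight or a curly quote in a scanned contract. The check below uses Python's `re` module to show what such a pattern accepts; it assumes the server-side regex engine treats `.` the same way, which this cookbook does not state.

```python
import re

# Same pattern as the grep call above; "." matches any single character.
pattern = re.compile(r"Workers. Compensation")

variants = [
    "Workers' Compensation",       # straight apostrophe
    "Workers\u2019 Compensation",  # curly apostrophe
    "Workers Compensation",        # no apostrophe: "." consumes the space, so no match
]

matches = [bool(pattern.search(text)) for text in variants]
```

To also accept the form without an apostrophe, a pattern such as `Workers.? Compensation` would be one option.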
Step 7: List entities in cross-document memory
find is the entry point to the cross-document entity graph populated by linkAcrossDocuments=true. Filter by entity type ("organization", "person", "monetary_amount", "legal_reference", …) or by a substring search on canonical names.
List every organization mentioned across your contracts:
curl -X POST https://api.bem.ai/v3/fs \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{
"op": "find",
"filter": { "type": "organization" },
"limit": 20
}'
const { data } = await client.fs.navigate({
op: "find",
filter: { type: "organization" },
limit: 20,
});
response = client.fs.navigate(
op="find",
filter={"type": "organization"},
limit=20,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
Op: bem.FNavigateParamsOpFind,
Filter: bem.FNavigateParamsFilter{Type: bem.String("organization")},
Limit: bem.Int(20),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
Op = "find",
Filter = new Filter { Type = "organization" },
Limit = 20,
});
bem fs navigate \
--op find \
--filter.type organization \
--limit 20
Returns one row per canonical entity with entityID, canonical, type, mentionCount, surfaceForms, and the document where it was first seen:
{
"op": "find",
"data": [
{
"entityID": "ent_4xQ…",
"canonical": "Santa Cruz County Regional Transportation Commission",
"type": "organization",
"mentionCount": 38,
"surfaceForms": ["SCCRTC", "COMMISSION", "Santa Cruz County Regional Transportation Commission"],
"firstSeenReferenceID": "sample-contract"
}
],
"hasMore": false
}
Notice the surface-form collapse: SCCRTC, COMMISSION, and the long form all resolved to one canonical record. With multiple contracts in the same environment, this same entity would carry mentions from all of them.
If your environment hasn't yet been populated with linkAcrossDocuments=true parses, find returns an empty list with a hint field pointing at the toggle.
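On the client side, the rows from find can be folded into a lookup from any surface form to its entityID, so an agent can go from a name it saw in text straight to an xref call. A minimal sketch, using the `entityID`, `canonical`, and `surfaceForms` fields shown in the response above; the helper names and the `ent_example` ID are hypothetical placeholders.

```python
from typing import Dict, List, Optional


def surface_form_index(rows: List[dict]) -> Dict[str, str]:
    """Map each lowercased canonical name and surface form to its entityID."""
    index: Dict[str, str] = {}
    for row in rows:
        names = [row["canonical"], *row.get("surfaceForms", [])]
        for name in names:
            index[name.lower()] = row["entityID"]
    return index


def resolve(index: Dict[str, str], mention: str) -> Optional[str]:
    """Case-insensitive lookup; returns None for unknown mentions."""
    return index.get(mention.strip().lower())


# Sample row shaped like the find response; "ent_example" is a placeholder ID.
find_rows = [
    {
        "entityID": "ent_example",
        "canonical": "Santa Cruz County Regional Transportation Commission",
        "surfaceForms": ["SCCRTC", "COMMISSION"],
    }
]

index = surface_form_index(find_rows)
```

Exact lowercase matching is a deliberate simplification; the server-side resolver remains the source of truth for which forms belong together.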
Step 8: Resolve an entity to every section that mentions it
xref is the killer "show me everywhere this entity is discussed, with full context" loop. Pass an entityID (from find); get back one row per mention with the section's full content.
curl -X POST https://api.bem.ai/v3/fs \
-H "Content-Type: application/json" \
-H "x-api-key: $BEM_API_KEY" \
-d '{ "op": "xref", "path": "ent_4xQ…", "limit": 50 }'
const { data } = await client.fs.navigate({
op: "xref",
path: "ent_4xQ…",
limit: 50,
});
response = client.fs.navigate(
op="xref",
path="ent_4xQ…",
limit=50,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
Op: bem.FNavigateParamsOpXref,
Path: bem.String("ent_4xQ…"),
Limit: bem.Int(50),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
Op = "xref",
Path = "ent_4xQ…",
Limit = 50,
});
bem fs navigate --op xref --path ent_4xQ… --limit 50
Each row carries referenceID, page, sectionLabel, sectionType, the surface form that matched ("COMMISSION", "SCCRTC", "Santa Cruz County Regional Transportation Commission"), and the full sectionContent. One call replaces the loop of "find docs mentioning X, then cat each one to read the surrounding paragraph."
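When xref spans several contracts, grouping the rows by document gives the agent a compact "where is this party mentioned" summary before it reads any sectionContent. The sketch below relies only on the `referenceID`, `page`, and `sectionLabel` fields named above; the function name and the second reference ID (`other-contract`) are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mentions_by_document(rows: List[dict]) -> Dict[str, List[Tuple[int, str]]]:
    """Group xref rows into {referenceID: sorted unique (page, sectionLabel)}."""
    grouped: Dict[str, set] = defaultdict(set)
    for row in rows:
        grouped[row["referenceID"]].add((row["page"], row["sectionLabel"]))
    return {ref: sorted(hits) for ref, hits in grouped.items()}


# Hand-written rows, not real API output.
xref_rows = [
    {"referenceID": "sample-contract", "page": 4, "sectionLabel": "INSURANCE"},
    {"referenceID": "sample-contract", "page": 1, "sectionLabel": "DUTIES"},
    {"referenceID": "sample-contract", "page": 4, "sectionLabel": "INSURANCE"},
    {"referenceID": "other-contract", "page": 2, "sectionLabel": "TERM"},
]

summary = mentions_by_document(xref_rows)
```

Deduplicating on (page, sectionLabel) collapses repeated mentions within one section into a single entry.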
Putting it together: an agent loop
Here's a concrete contract-review loop in pseudocode. The agent does the comprehension; /v3/fs gives it eyes.
[user] What insurance minimums does this contract require?
[agent → tool] {op:"grep", pattern:"insurance", scope:"sections", countOnly:true}
[tool → agent] {count: 23}
[agent → tool] {op:"grep", pattern:"\\$[0-9,]+", regex:true, scope:"sections", limit:10}
[tool → agent] 10 hits — most cluster on page 4 ("$1,000,000 combined single
limit"); page 1 has the placeholder "$_____ for time and
materials" in section 2.A.
[agent → tool] {op:"cat", path:"sample-contract", range:{page:4}}
[tool → agent] Full text of page 4 — section 6.A enumerates four insurance
types with their minimum limits.
[agent → user] "The contract requires four insurance coverages from CONSULTANT:
Workers' Compensation (statutory minimum), Automobile Liability
($1,000,000 combined single limit), Comprehensive General
Liability ($1,000,000 CSL — bodily injury, personal injury,
broad form property damage, contractual liability,
cross-liability), and Professional Liability ($1,000,000 CSL,
only when both parties initial subparagraph 6.A.4). Source:
Section 6.A on page 4."
Three tool roundtrips, no embeddings, no chunker tuning, no top-K window. With multiple contracts parsed under the same environment, swapping path:"sample-contract" for an xref against an entity like Santa Cruz County Regional Transportation Commission lights up every section across every contract that names that party: the same answer pattern, just broadened.
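The three roundtrips above can be sketched as a plain function. This is an illustration, not the SDK: `navigate` is a callable you inject that takes a /v3/fs request dict and returns the parsed JSON response. The code assumes grep hits arrive under `data` with a `page` field and that countOnly returns `count`, as described in Step 6; the shape of the cat result is not specified here, so it is passed through untouched.

```python
from collections import Counter
from typing import Any, Callable, Optional

Navigate = Callable[[dict], dict]


def read_busiest_amount_page(navigate: Navigate, path: str) -> Optional[Any]:
    """Count insurance hits, locate the page with most dollar amounts, read it."""
    # Roundtrip 1: cheap check that the topic appears at all.
    probe = navigate({"op": "grep", "pattern": "insurance",
                      "scope": "sections", "countOnly": True})
    if probe.get("count", 0) == 0:
        return None

    # Roundtrip 2: find where dollar amounts cluster.
    hits = navigate({"op": "grep", "pattern": r"\$[0-9,]+", "regex": True,
                     "scope": "sections", "limit": 10}).get("data", [])
    if not hits:
        return None
    busiest_page = Counter(hit["page"] for hit in hits).most_common(1)[0][0]

    # Roundtrip 3: read that page in full and hand it to the model.
    return navigate({"op": "cat", "path": path,
                     "range": {"page": busiest_page}}).get("data")
```

In a real agent the model chooses each next call itself; hard-coding the sequence here only makes the data flow between the three calls explicit.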
Pagination
ls and find paginate by cursor. Pass the previous response's nextCursor back as cursor to fetch the next page; hasMore: false means you've hit the end. Same idiom as /v3/calls and /v3/outputs.
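The cursor idiom fits naturally into a generator. As in the other sketches, `navigate` is an injected callable (hypothetical) that sends one /v3/fs request and returns the response dict; only the `cursor`, `nextCursor`, `hasMore`, and `data` fields described above are used.

```python
from typing import Callable, Iterator


def iter_rows(navigate: Callable[[dict], dict], request: dict) -> Iterator[dict]:
    """Yield every row of a paginated ls or find call, following nextCursor."""
    cursor = None
    while True:
        body = dict(request)  # copy so the caller's dict is never mutated
        if cursor is not None:
            body["cursor"] = cursor
        page = navigate(body)
        yield from page.get("data", [])
        if not page.get("hasMore"):
            return
        cursor = page.get("nextCursor")
        if cursor is None:
            # Defensive stop: hasMore without a cursor would otherwise loop forever.
            return
```

Because it is lazy, a caller can stop iterating early without fetching the remaining pages.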
Next steps
Parse Functions
The full reference for the Parse primitive — toggles, output structure, when to use
File System API
POST /v3/fs — every op, every flag, every parameter
MCP server
Wrap /v3/fs for Claude, ChatGPT, and other MCP-aware agents
Workflows explained
Chain a Parse function with downstream nodes for richer pipelines