Parse and Search over Contracts

Parse a contract into a navigable structure, then search and reason over it with the /v3/fs File System API

This cookbook walks the Parse primitive end to end against a contract. By the end you will have:

  1. A Parse function with entity extraction and cross-document memory enabled
  2. A workflow that calls that Parse function
  3. Your first parsed contract — page-aware sections (DUTIES, COMPENSATION, INDEMNIFICATION, INSURANCE, …), entities (parties, dollar amounts, jurisdictions, named individuals), and the relationships between them
  4. A working set of File System ops (ls, cat, grep, find, xref) you can wire straight into an agent's tool surface for contract review

The example uses a single Professional Services Agreement template at contract.pdf. The same shape works for MSAs, NDAs, vendor agreements, employment contracts — any structured legal document. Once several contracts are parsed in the same environment, the cross-document ops collapse the same party, statute, or coverage type into one canonical record across all of them.

Each step shows the same call in cURL, the TypeScript, Python, Go, and C# SDKs, and the CLI, in that order; the flow is identical across all of them. If you don't have an SDK installed yet, see Step 2 of the Quickstart.

Prerequisites

  • A bem account and an API key from Settings → API Keys

  • BEM_API_KEY exported in your shell:

    export BEM_API_KEY='your-api-key-here'
  • A contract PDF on disk at contract.pdf. The walkthrough's example responses are based on a 10-page Professional Services Agreement; substitute any contract you have.

Step 1: Create a Parse function

A Parse function has no outputSchema: it is configured by what you want extracted from each contract, not by the fields you need. Both parseConfig toggles default to true; the examples set them explicitly because this walkthrough depends on both:

  • extractEntities=true makes entities[] and relationships[] show up alongside sections[] in the parse output (parties, signers, dollar amounts, statutes, …).
  • linkAcrossDocuments=true runs a cross-document resolver after each parse, so the same party or statute resolves to one canonical record across every contract you parse. This is what unlocks the memory-level File System ops (find, open, xref).
curl -X POST https://api.bem.ai/v3/functions \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{
    "functionName": "contract-parser",
    "type": "parse",
    "displayName": "Contract Parser",
    "tags": ["contracts", "legal"],
    "parseConfig": {
      "extractEntities": true,
      "linkAcrossDocuments": true
    }
  }'
import Bem from "bem-ai-sdk";

const client = new Bem();

const { function: fn } = await client.functions.create({
  functionName: "contract-parser",
  type: "parse",
  displayName: "Contract Parser",
  tags: ["contracts", "legal"],
  parseConfig: {
    extractEntities: true,
    linkAcrossDocuments: true,
  },
});

console.log(fn);
from bem import Bem

client = Bem()

response = client.functions.create(
    function_name="contract-parser",
    type="parse",
    display_name="Contract Parser",
    tags=["contracts", "legal"],
    parse_config={
        "extract_entities": True,
        "link_across_documents": True,
    },
)
print(response.function)
package main

import (
    "context"
    "fmt"

    bem "github.com/bem-team/bem-go-sdk"
)

func main() {
    client := bem.NewClient()

    resp, err := client.Functions.New(context.TODO(), bem.FunctionNewParams{
        CreateFunction: bem.CreateFunctionUnionParam{
            OfParse: &bem.CreateFunctionParseParam{
                FunctionName: "contract-parser",
                DisplayName:  bem.String("Contract Parser"),
                Tags:         []string{"contracts", "legal"},
                ParseConfig: bem.CreateFunctionParseParseConfigParam{
                    ExtractEntities:     bem.Bool(true),
                    LinkAcrossDocuments: bem.Bool(true),
                },
            },
        },
    })
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", resp.Function)
}
using Bem;
using Bem.Models.Functions;

BemClient client = new();

var response = await client.Functions.Create(new FunctionCreateParams
{
    CreateFunction = new Parse
    {
        FunctionName = "contract-parser",
        DisplayName  = "Contract Parser",
        Tags         = new List<string> { "contracts", "legal" },
        ParseConfig  = new ParseConfig
        {
            ExtractEntities     = true,
            LinkAcrossDocuments = true,
        },
    },
});

Console.WriteLine(response.Function);
bem functions create \
  --function-name contract-parser \
  --type parse \
  --display-name "Contract Parser" \
  --tags '["contracts", "legal"]' \
  --parse-config.extract-entities \
  --parse-config.link-across-documents

Response:

{
  "function": {
    "functionID": "fn_2abc123",
    "functionName": "contract-parser",
    "type": "parse",
    "displayName": "Contract Parser",
    "tags": ["contracts", "legal"],
    "versionNum": 1,
    "parseConfig": {
      "extractEntities": true,
      "linkAcrossDocuments": true
    }
  }
}

Step 2: Create a workflow

A workflow gives the Parse function a callable entry point. For a single-step parse pipeline you only need one node and no edges.

curl -X POST https://api.bem.ai/v3/workflows \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{
    "name": "contract-parse",
    "displayName": "Contract Parse",
    "tags": ["contracts", "legal"],
    "mainNodeName": "contract-parser",
    "nodes": [
      {
        "name": "contract-parser",
        "function": { "name": "contract-parser" }
      }
    ]
  }'
import Bem from "bem-ai-sdk";

const client = new Bem();

const { workflow } = await client.workflows.create({
  name: "contract-parse",
  displayName: "Contract Parse",
  tags: ["contracts", "legal"],
  mainNodeName: "contract-parser",
  nodes: [
    {
      name: "contract-parser",
      function: { name: "contract-parser" },
    },
  ],
});

console.log(workflow);
from bem import Bem

client = Bem()

response = client.workflows.create(
    name="contract-parse",
    display_name="Contract Parse",
    tags=["contracts", "legal"],
    main_node_name="contract-parser",
    nodes=[
        {
            "name": "contract-parser",
            "function": {"name": "contract-parser"},
        }
    ],
)
print(response.workflow)
package main

import (
    "context"
    "fmt"

    bem "github.com/bem-team/bem-go-sdk"
)

func main() {
    client := bem.NewClient()

    resp, err := client.Workflows.New(context.TODO(), bem.WorkflowNewParams{
        Name:         "contract-parse",
        DisplayName:  bem.String("Contract Parse"),
        Tags:         []string{"contracts", "legal"},
        MainNodeName: "contract-parser",
        Nodes: []bem.WorkflowNewParamsNode{
            {
                Name: bem.String("contract-parser"),
                Function: bem.FunctionVersionIdentifierParam{
                    Name: bem.String("contract-parser"),
                },
            },
        },
    })
    if err != nil {
        panic(err)
    }
    fmt.Printf("%+v\n", resp.Workflow)
}
using Bem;
using Bem.Models.Workflows;

BemClient client = new();

var response = await client.Workflows.Create(new WorkflowCreateParams
{
    Name         = "contract-parse",
    DisplayName  = "Contract Parse",
    Tags         = new List<string> { "contracts", "legal" },
    MainNodeName = "contract-parser",
    Nodes = new List<Node>
    {
        new Node
        {
            Function = new FunctionVersionIdentifier { Name = "contract-parser" },
            Name     = "contract-parser",
        },
    },
});

Console.WriteLine(response.Workflow);
bem workflows create \
  --name contract-parse \
  --display-name "Contract Parse" \
  --tags '["contracts", "legal"]' \
  --main-node-name contract-parser \
  --node '{name: contract-parser, function: {name: contract-parser}}'

Step 3: Call the workflow with your contract

Send contract.pdf through the workflow. We pass wait=true to block for up to 30 seconds — Parse on a typical contract finishes well inside that window. The callReferenceID is the handle we'll address the parsed doc by from /v3/fs later.

Upload as multipart form data (recommended for files):

curl -X POST "https://api.bem.ai/v3/workflows/contract-parse/call" \
  -H "x-api-key: $BEM_API_KEY" \
  -F "wait=true" \
  -F "callReferenceID=sample-contract" \
  -F "file=@contract.pdf"

Or, JSON body with base64-encoded file:

curl -X POST "https://api.bem.ai/v3/workflows/contract-parse/call?wait=true" \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{
    "callReferenceID": "sample-contract",
    "input": {
      "singleFile": {
        "inputType": "pdf",
        "inputContent": "'"$(base64 < contract.pdf | tr -d '\n')"'"
      }
    }
  }'
import fs from "node:fs";
import Bem from "bem-ai-sdk";

const client = new Bem();

const inputContent = fs.readFileSync("contract.pdf").toString("base64");

const { call } = await client.workflows.call("contract-parse", {
  wait: true,
  callReferenceID: "sample-contract",
  input: {
    singleFile: {
      inputType: "pdf",
      inputContent,
    },
  },
});

console.log(call?.status);
import base64
from bem import Bem

client = Bem()

with open("contract.pdf", "rb") as f:
    input_content = base64.b64encode(f.read()).decode()

response = client.workflows.call(
    "contract-parse",
    wait=True,
    call_reference_id="sample-contract",
    input={
        "single_file": {
            "input_type": "pdf",
            "input_content": input_content,
        }
    },
)

print(response.call.status)
package main

import (
    "context"
    "encoding/base64"
    "fmt"
    "os"

    bem "github.com/bem-team/bem-go-sdk"
)

func main() {
    client := bem.NewClient()

    data, err := os.ReadFile("contract.pdf")
    if err != nil {
        panic(err)
    }
    encoded := base64.StdEncoding.EncodeToString(data)

    resp, err := client.Workflows.Call(context.TODO(), "contract-parse", bem.WorkflowCallParams{
        Wait:            bem.Bool(true),
        CallReferenceID: bem.String("sample-contract"),
        Input: bem.WorkflowCallParamsInput{
            SingleFile: &bem.WorkflowCallParamsInputSingleFile{
                InputType:    "pdf",
                InputContent: encoded,
            },
        },
    })
    if err != nil {
        panic(err)
    }
    fmt.Printf("status=%s\n", resp.Call.Status)
}
using Bem;
using Bem.Models.Workflows;

BemClient client = new();

var bytes = File.ReadAllBytes("contract.pdf");
var encoded = Convert.ToBase64String(bytes);

var response = await client.Workflows.Call("contract-parse", new WorkflowCallParams
{
    Wait            = true,
    CallReferenceID = "sample-contract",
    Input           = new Input
    {
        SingleFile = new SingleFile
        {
            InputType    = "pdf",
            InputContent = encoded,
        },
    },
});

Console.WriteLine(response.Call.Status);
bem workflows call \
  --workflow-name contract-parse \
  --wait \
  --call-reference-id sample-contract \
  --input.single-file '{"inputContent": "@contract.pdf", "inputType": "pdf"}'

The @contract.pdf syntax tells the CLI to read and base64-encode the file inline.

A few notes:

  • wait=true blocks for up to 30 seconds. Larger or scan-heavy contracts may run longer and return a pending call you can poll with GET /v3/calls/{callID} or subscribe to via webhook.
  • The cross-document resolver runs after each parse event is emitted, so the entity graph is eventually consistent: expect a lag of a few seconds before find and xref see new entities. If you're scripting against /v3/fs immediately after a parse, build in a short delay or a retry.
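
Both notes come down to the same scripting pattern: repeat a check with a growing delay until it succeeds or a deadline passes. Here is a minimal Python sketch of that pattern. The names wait_for, fetch, and is_ready are hypothetical, and what counts as "ready" (a call that is no longer pending, or a find that returns rows) is something you would define against the actual response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], T],
    is_ready: Callable[[T], bool],
    timeout_s: float = 60.0,
    first_delay_s: float = 0.5,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fetch() until is_ready(result) is true, doubling the delay each try.

    Raises TimeoutError if the next wait would run past the deadline.
    """
    deadline = clock() + timeout_s
    delay = first_delay_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() + delay > deadline:
            raise TimeoutError(f"not ready after {timeout_s} seconds")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)
```

With the Python SDK from this walkthrough, wait_for(lambda: client.fs.navigate(op="find", filter={"type": "organization"}), lambda r: bool(r.data)) would wait until the resolver has surfaced at least one organization. The same helper can wrap a GET /v3/calls/{callID} poll for a pending call.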

Step 4: List parsed contracts

ls returns one row per parsed document with the metadata an agent needs to navigate.

curl -X POST https://api.bem.ai/v3/fs \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{ "op": "ls", "limit": 25 }'
const { data } = await client.fs.navigate({ op: "ls", limit: 25 });
console.log(data);
response = client.fs.navigate(op="ls", limit=25)
print(response.data)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
    Op:    bem.FNavigateParamsOpLs,
    Limit: bem.Int(25),
})
if err != nil {
    panic(err)
}
fmt.Printf("%+v\n", resp.Data)
using Bem.Models.Fs;

var response = await client.Fs.Navigate(new FNavigateParams
{
    Op    = "ls",
    Limit = 25,
});

Console.WriteLine(response.Data);
bem fs navigate --op ls --limit 25

Response:

{
  "op": "ls",
  "data": [
    {
      "referenceID": "sample-contract",
      "transformationID": "tr_…",
      "functionName": "contract-parser",
      "parsedAt": "2026-04-28T16:00:00Z",
      "pageCount": 10,
      "sectionCount": 47,
      "entityCount": 31,
      "previewEntities": [
        "Santa Cruz County Regional Transportation Commission",
        "CONSULTANT",
        "Yesenia Parra",
        "Luis Mendez",
        "California Labor Code",
        "$1,000,000"
      ]
    }
  ],
  "hasMore": false
}

previewEntities holds up to about six canonical names sampled from the document, enough to sanity-check that the parse picked up the right things.
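
If you want to hand an agent a compact listing rather than the raw JSON, the ls rows flatten easily. The sketch below works on a plain dict shaped like the response above; summarize_ls is a hypothetical helper, not part of the SDK.

```python
from typing import Any


def summarize_ls(response: dict[str, Any], max_preview: int = 3) -> list[str]:
    """Render one compact line per parsed document from an ls response."""
    lines = []
    for row in response.get("data", []):
        preview = ", ".join(row.get("previewEntities", [])[:max_preview])
        lines.append(
            f"{row['referenceID']}: {row.get('pageCount', '?')} pages, "
            f"{row.get('sectionCount', '?')} sections, "
            f"{row.get('entityCount', '?')} entities [{preview}]"
        )
    return lines


# Trimmed copy of the example response shown above.
sample = {
    "op": "ls",
    "data": [
        {
            "referenceID": "sample-contract",
            "pageCount": 10,
            "sectionCount": 47,
            "entityCount": 31,
            "previewEntities": [
                "Santa Cruz County Regional Transportation Commission",
                "CONSULTANT",
                "Yesenia Parra",
                "Luis Mendez",
            ],
        }
    ],
    "hasMore": False,
}

for line in summarize_ls(sample):
    print(line)
```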

Step 5: Map a contract's structure cheaply

Before reading content, let an agent map structure. cat with select projects the parse output to specific dotted paths (labels, types, pages), so the agent can scan the doc's skeleton at a cost of roughly 1 KB rather than pulling the full content of roughly 150 KB.

curl -X POST https://api.bem.ai/v3/fs \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{
    "op": "cat",
    "path": "sample-contract",
    "select": ["sections.label", "sections.type", "sections.page"]
  }'
const { data } = await client.fs.navigate({
  op: "cat",
  path: "sample-contract",
  select: ["sections.label", "sections.type", "sections.page"],
});
console.log(data);
response = client.fs.navigate(
    op="cat",
    path="sample-contract",
    select=["sections.label", "sections.type", "sections.page"],
)
print(response.data)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
    Op:     bem.FNavigateParamsOpCat,
    Path:   bem.String("sample-contract"),
    Select: []string{"sections.label", "sections.type", "sections.page"},
})
var response = await client.Fs.Navigate(new FNavigateParams
{
    Op     = "cat",
    Path   = "sample-contract",
    Select = new List<string> { "sections.label", "sections.type", "sections.page" },
});
bem fs navigate \
  --op cat \
  --path sample-contract \
  --select '["sections.label", "sections.type", "sections.page"]'

The agent gets back the contract's clause map — "DUTIES", "COMPENSATION", "TERM", "EARLY TERMINATION", "INDEMNIFICATION FOR DAMAGES, TAXES AND CONTRIBUTIONS", "INSURANCE", "FEDERAL, STATE AND LOCAL LAWS", … — with their types (heading, paragraph, list) and page numbers. From there it decides which sections to read in full.
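
One way to turn that projection into a table of contents is to group heading labels by page. The sketch below assumes the projected sections arrive as a list of objects with label, type, and page keys; the walkthrough does not show the exact response shape, so treat the input format and the headings_by_page helper as assumptions.

```python
from collections import defaultdict
from typing import Any


def headings_by_page(sections: list[dict[str, Any]]) -> dict[int, list[str]]:
    """Group heading labels by page, preserving document order within a page."""
    outline: dict[int, list[str]] = defaultdict(list)
    for section in sections:
        if section.get("type") == "heading":
            outline[section["page"]].append(section["label"])
    return dict(outline)


# Hypothetical projected rows: labels come from the walkthrough,
# page numbers and types are illustrative only.
projected = [
    {"label": "DUTIES", "type": "heading", "page": 1},
    {"label": "COMPENSATION", "type": "heading", "page": 1},
    {"label": "INSURANCE", "type": "heading", "page": 4},
    {"label": "INSURANCE", "type": "list", "page": 4},
]

print(headings_by_page(projected))
```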

To read one page in full:

curl -X POST https://api.bem.ai/v3/fs \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{ "op": "cat", "path": "sample-contract", "range": { "page": 4 } }'
const { data } = await client.fs.navigate({
  op: "cat",
  path: "sample-contract",
  range: { page: 4 },
});
response = client.fs.navigate(
    op="cat",
    path="sample-contract",
    range={"page": 4},
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
    Op:    bem.FNavigateParamsOpCat,
    Path:  bem.String("sample-contract"),
    Range: bem.FNavigateParamsRange{Page: bem.Int(4)},
})
var response = await client.Fs.Navigate(new FNavigateParams
{
    Op    = "cat",
    Path  = "sample-contract",
    Range = new Range { Page = 4 },
});
bem fs navigate --op cat --path sample-contract --range '{"page": 4}'

Step 6: Search across your contracts

grep runs substring or regex search across every parsed document's output. scope narrows it to one part of the parse output (sections, entities, relationships, or all); path scopes to a single document; countOnly returns just the hit count.

Find every place indemnification is discussed:

curl -X POST https://api.bem.ai/v3/fs \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{
    "op": "grep",
    "pattern": "indemnif",
    "scope": "sections",
    "limit": 10
  }'
const { data } = await client.fs.navigate({
  op: "grep",
  pattern: "indemnif",
  scope: "sections",
  limit: 10,
});
response = client.fs.navigate(
    op="grep",
    pattern="indemnif",
    scope="sections",
    limit=10,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
    Op:      bem.FNavigateParamsOpGrep,
    Pattern: bem.String("indemnif"),
    Scope:   bem.String("sections"),
    Limit:   bem.Int(10),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
    Op      = "grep",
    Pattern = "indemnif",
    Scope   = "sections",
    Limit   = 10,
});
bem fs navigate \
  --op grep \
  --pattern indemnif \
  --scope sections \
  --limit 10

The response returns hits with referenceID, page, sectionLabel, and a snippet around each match. For the sample contract that means section 5 (INDEMNIFICATION FOR DAMAGES, TAXES AND CONTRIBUTIONS) on page 3, plus several insurance subclauses on page 4 that touch indemnity.

For a cheap "is it worth reading?" check, use countOnly:

curl -X POST https://api.bem.ai/v3/fs \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{ "op": "grep", "pattern": "Workers. Compensation", "regex": true, "countOnly": true }'
const { count } = await client.fs.navigate({
  op: "grep",
  pattern: "Workers. Compensation",
  regex: true,
  countOnly: true,
});
response = client.fs.navigate(
    op="grep",
    pattern="Workers. Compensation",
    regex=True,
    count_only=True,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
    Op:        bem.FNavigateParamsOpGrep,
    Pattern:   bem.String("Workers. Compensation"),
    Regex:     bem.Bool(true),
    CountOnly: bem.Bool(true),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
    Op        = "grep",
    Pattern   = "Workers. Compensation",
    Regex     = true,
    CountOnly = true,
});
bem fs navigate --op grep --pattern "Workers. Compensation" --regex --count-only

Returns { "count": 3 } with no snippet payload — the agent decides whether to keep digging.
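
The . in Workers. Compensation stands in for the apostrophe, so the pattern matches whether the PDF text layer carries a straight or a typographic one. A quick local check with Python's re module illustrates the idea. This assumes the server-side regex dialect treats . the same way, which the walkthrough does not state.

```python
import re

pattern = re.compile(r"Workers. Compensation")

samples = [
    "Workers' Compensation in the statutory amount",       # straight apostrophe
    "Workers\u2019 Compensation in the statutory amount",  # typographic apostrophe
    "Workers Compensation in the statutory amount",        # no apostrophe at all
]

for text in samples:
    # '.' must consume exactly one character, so the third sample does not match.
    print(bool(pattern.search(text)), repr(text))
```

If your contracts sometimes drop the apostrophe entirely, a pattern such as Workers.? Compensation makes that one character optional and covers all three spellings.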

Step 7: List entities in cross-document memory

find is the entry point to the cross-document entity graph populated by linkAcrossDocuments=true. Filter by entity type ("organization", "person", "monetary_amount", "legal_reference", …) or by a substring search on canonical names.

List every organization mentioned across your contracts:

curl -X POST https://api.bem.ai/v3/fs \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{
    "op": "find",
    "filter": { "type": "organization" },
    "limit": 20
  }'
const { data } = await client.fs.navigate({
  op: "find",
  filter: { type: "organization" },
  limit: 20,
});
response = client.fs.navigate(
    op="find",
    filter={"type": "organization"},
    limit=20,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
    Op:     bem.FNavigateParamsOpFind,
    Filter: bem.FNavigateParamsFilter{Type: bem.String("organization")},
    Limit:  bem.Int(20),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
    Op     = "find",
    Filter = new Filter { Type = "organization" },
    Limit  = 20,
});
bem fs navigate \
  --op find \
  --filter.type organization \
  --limit 20

Returns one row per canonical entity with entityID, canonical, type, mentionCount, surfaceForms, and the document where it was first seen:

{
  "op": "find",
  "data": [
    {
      "entityID": "ent_4xQ…",
      "canonical": "Santa Cruz County Regional Transportation Commission",
      "type": "organization",
      "mentionCount": 38,
      "surfaceForms": ["SCCRTC", "COMMISSION", "Santa Cruz County Regional Transportation Commission"],
      "firstSeenReferenceID": "sample-contract"
    }
  ],
  "hasMore": false
}

Notice the surface-form collapse: SCCRTC, COMMISSION, and the long form all resolved to one canonical record. With multiple contracts in the same environment, this same entity would carry mentions from all of them.

If your environment hasn't yet been populated with linkAcrossDocuments=true parses, find returns an empty list with a hint field pointing at the toggle.
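
Because each row lists its surfaceForms, a find response is enough to build a local alias lookup, so an agent can go from whatever name a user types to the entityID it needs. A minimal sketch over a plain dict shaped like the response above; build_alias_index is a hypothetical helper.

```python
from typing import Any


def build_alias_index(find_response: dict[str, Any]) -> dict[str, str]:
    """Map every lowercased canonical name and surface form to its entityID."""
    index: dict[str, str] = {}
    for entity in find_response.get("data", []):
        names = [entity["canonical"], *entity.get("surfaceForms", [])]
        for name in names:
            # On a collision the later entity wins; see the note below.
            index[name.lower()] = entity["entityID"]
    return index


# Copy of the example row shown above (the entityID is elided there too).
sample = {
    "op": "find",
    "data": [
        {
            "entityID": "ent_4xQ…",
            "canonical": "Santa Cruz County Regional Transportation Commission",
            "type": "organization",
            "surfaceForms": [
                "SCCRTC",
                "COMMISSION",
                "Santa Cruz County Regional Transportation Commission",
            ],
        }
    ],
    "hasMore": False,
}

index = build_alias_index(sample)
print(index["sccrtc"])
```

Generic surface forms such as COMMISSION can belong to different entities in different contracts, so in a multi-contract environment you may prefer to keep a list of candidate IDs per alias rather than a single value.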

Step 8: Resolve an entity to every section that mentions it

xref answers "show me everywhere this entity is discussed, with full context" in a single call. Pass an entityID (from find) as path; get back one row per mention with the section's full content.

curl -X POST https://api.bem.ai/v3/fs \
  -H "Content-Type: application/json" \
  -H "x-api-key: $BEM_API_KEY" \
  -d '{ "op": "xref", "path": "ent_4xQ…", "limit": 50 }'
const { data } = await client.fs.navigate({
  op: "xref",
  path: "ent_4xQ…",
  limit: 50,
});
response = client.fs.navigate(
    op="xref",
    path="ent_4xQ…",
    limit=50,
)
resp, err := client.Fs.Navigate(context.TODO(), bem.FNavigateParams{
    Op:    bem.FNavigateParamsOpXref,
    Path:  bem.String("ent_4xQ…"),
    Limit: bem.Int(50),
})
var response = await client.Fs.Navigate(new FNavigateParams
{
    Op    = "xref",
    Path  = "ent_4xQ…",
    Limit = 50,
});
bem fs navigate --op xref --path ent_4xQ… --limit 50

Each row carries referenceID, page, sectionLabel, sectionType, the surface form that matched ("COMMISSION", "SCCRTC", "Santa Cruz County Regional Transportation Commission"), and the full sectionContent. One call replaces the loop of "find docs mentioning X, then cat each one to read the surrounding paragraph."
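
Since xref returns one row per mention, the same section can appear several times. Before handing results to a model you may want a de-duplicated map of where the entity shows up. The sketch below uses only the referenceID, page, and sectionLabel fields named above; group_mentions and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_mentions(rows: list[dict[str, Any]]) -> dict[str, list[tuple[int, str]]]:
    """Return {referenceID: [(page, sectionLabel), ...]}, de-duplicated and sorted by page."""
    grouped: dict[str, set[tuple[int, str]]] = defaultdict(set)
    for row in rows:
        grouped[row["referenceID"]].add((row["page"], row["sectionLabel"]))
    return {ref: sorted(pairs) for ref, pairs in grouped.items()}


# Illustrative rows only; real rows also carry sectionType and sectionContent.
rows = [
    {"referenceID": "sample-contract", "page": 4, "sectionLabel": "INSURANCE"},
    {"referenceID": "sample-contract", "page": 1, "sectionLabel": "DUTIES"},
    {"referenceID": "sample-contract", "page": 4, "sectionLabel": "INSURANCE"},
]

print(group_mentions(rows))
```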

Putting it together: an agent loop

Here's a concrete contract-review loop in pseudocode. The agent does the comprehension; /v3/fs gives it eyes.

[user] What insurance minimums does this contract require?

[agent → tool] {op:"grep", pattern:"insurance", scope:"sections", countOnly:true}
[tool → agent] {count: 23}

[agent → tool] {op:"grep", pattern:"\\$[0-9,]+", regex:true, scope:"sections", limit:10}
[tool → agent] 10 hits — most cluster on page 4 ("$1,000,000 combined single
                limit"); page 1 has the placeholder "$_____ for time and
                materials" in section 2.A.

[agent → tool] {op:"cat", path:"sample-contract", range:{page:4}}
[tool → agent] Full text of page 4 — section 6.A enumerates four insurance
                types with their minimum limits.

[agent → user] "The contract requires four insurance coverages from CONSULTANT:
                Workers' Compensation (statutory minimum), Automobile Liability
                ($1,000,000 combined single limit), Comprehensive General
                Liability ($1,000,000 CSL — bodily injury, personal injury,
                broad form property damage, contractual liability,
                cross-liability), and Professional Liability ($1,000,000 CSL,
                only when both parties initial subparagraph 6.A.4). Source:
                Section 6.A on page 4."

Three tool roundtrips, no embeddings, no chunker tuning, no top-K window. With multiple contracts parsed under the same environment, swapping path:"sample-contract" for an xref against an entity like Santa Cruz County Regional Transportation Commission lights up every section across every contract that names that party — the same answer pattern, just broadened.
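
The tool side of that transcript can be sketched as a small Python function. It takes a navigate callable, a hypothetical wrapper that posts its keyword arguments to /v3/fs and returns the JSON body as a dict, so the logic can be tried without network access. The response shapes it relies on (count for countOnly, data holding hits with a page field, data holding the page text for cat) follow the examples in this walkthrough but are assumptions here.

```python
from typing import Any, Callable

Navigate = Callable[..., dict[str, Any]]


def gather_insurance_context(navigate: Navigate, path: str) -> dict[str, Any]:
    """Run the three /v3/fs calls from the transcript and collect what the model needs."""
    # 1. Cheap check: is insurance discussed at all?
    count = navigate(op="grep", pattern="insurance", scope="sections", countOnly=True)["count"]
    if count == 0:
        return {"count": 0, "pages": [], "text": {}}

    # 2. Locate dollar amounts to find the pages that carry the limits.
    hits = navigate(op="grep", pattern=r"\$[0-9,]+", regex=True, scope="sections", limit=10)["data"]
    pages = sorted({hit["page"] for hit in hits})
    if not pages:
        return {"count": count, "pages": [], "text": {}}

    # 3. Read in full the page where most of the hits cluster.
    busiest = max(pages, key=lambda p: sum(1 for h in hits if h["page"] == p))
    page_text = navigate(op="cat", path=path, range={"page": busiest})["data"]
    return {"count": count, "pages": pages, "text": {busiest: page_text}}
```

The comprehension step stays with the model: this function only decides what to read, and the returned page text goes into the prompt alongside the user's question.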

Pagination

ls and find paginate by cursor. Pass the previous response's nextCursor back as cursor to fetch the next page; hasMore: false means you've hit the end. Same idiom as /v3/calls and /v3/outputs.
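
The cursor idiom fits naturally into a generator. In this sketch, navigate is again a hypothetical callable that sends its keyword arguments to /v3/fs and returns the response as a dict; the nextCursor, cursor, and hasMore names come from the paragraph above.

```python
from typing import Any, Callable, Iterator


def iter_rows(navigate: Callable[..., dict[str, Any]], **params: Any) -> Iterator[dict[str, Any]]:
    """Yield every row of a cursor-paginated op (ls or find), page by page."""
    cursor = None
    while True:
        request = dict(params)
        if cursor is not None:
            request["cursor"] = cursor
        page = navigate(**request)
        yield from page.get("data", [])
        cursor = page.get("nextCursor")
        # Stop on hasMore: false, or defensively if no cursor came back.
        if not page.get("hasMore") or cursor is None:
            return
```

For example, for row in iter_rows(navigate, op="find", filter={"type": "organization"}, limit=20) walks every organization in the environment without the caller tracking cursors.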
