System Overview
Understanding bem's core primitives and how they work together
bem provides infrastructure for transforming unstructured data into structured, actionable outputs. This page explains the core primitives and how they connect to form a complete data processing system.
The Big Picture
+----------------+ +------------------+ +-----------------+
| | | | | |
| Your Input | --> | Workflow | --> | Structured |
| (PDF, email, | | (orchestrates | | Output (JSON) |
| image, etc.) | | Functions) | | |
| | | | | |
+----------------+ +------------------+ +-----------------+
|
v
+--------------------+
| Subscriptions |
| (webhooks to |
| your systems) |
+--------------------+
You send documents to bem, workflows orchestrate processing using functions, and you receive structured JSON via polling or webhooks.
Core Primitives
Functions
A function is a single, reusable processing operation. Functions are the atomic building blocks of bem:
| Type | Description | Use Case |
|---|---|---|
| transform | 1:1 extraction of structured JSON from documents | Invoice processing, form extraction, receipt scanning |
| analyze | Visual analysis of images and videos | Infer visual elements, pull identifiers from videos |
| route | Classifies inputs and directs to different paths | Document type classification |
| split | 1:N breakdown of multi-page documents | Processing bundled PDFs |
| join | N:1 combination of multiple inputs | Merging related documents |
| enrich | Augments data via semantic search against collections | SKU matching, catalog lookup |
| payload_shaping | Transforms JSON structure using JMESPath | Formatting for downstream APIs |
Functions are versioned: each configuration change creates a new version, so you can iterate safely without breaking existing integrations.
Workflows
A workflow orchestrates multiple functions into a unified processing pipeline. Workflows are configured as a directed graph:
Workflow
+-------------------------------------------------------------+
| |
| +-----------+ +----------+ +----------------+ |
| | Transform | ---> | Enrich | ---> | Payload Shaping| |
| +-----------+ +----------+ +----------------+ |
| |
+-------------------------------------------------------------+
^
|
Single Entry Point
- Main Function: The entry point that receives input
- Relationships: Define how data flows between functions
- Versioned: Update workflows safely without disrupting production
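The directed-graph structure described above can be sketched in a few lines of Python. This is only an illustrative in-memory model, not bem's configuration schema: the `Workflow` class, its field names, and `reachable_functions` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Workflow:
    """Hypothetical model of a workflow (not bem's actual schema)."""
    main_function: str
    # Maps a function name to the functions that receive its output.
    relationships: Dict[str, List[str]] = field(default_factory=dict)

    def reachable_functions(self) -> List[str]:
        """Walk the graph breadth-first, starting at the main function."""
        order = []
        queue = [self.main_function]
        seen = {self.main_function}
        while queue:
            current = queue.pop(0)
            order.append(current)
            for downstream in self.relationships.get(current, []):
                if downstream not in seen:
                    seen.add(downstream)
                    queue.append(downstream)
        return order


# The Transform -> Enrich -> Payload Shaping pipeline from the diagram.
workflow = Workflow(
    main_function="transform",
    relationships={"transform": ["enrich"], "enrich": ["payload_shaping"]},
)
print(workflow.reachable_functions())
```

Modeling relationships as an adjacency map keeps the single entry point explicit: every function that runs must be reachable from `main_function`.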
Calls
A call is an execution request. When you send data to bem, you create a call:
POST /v2/calls
|
v
+------------------+
| WorkflowCall | <-- Your execution request
+------------------+
|
| spawns one per function
v
+------------------+ +------------------+
| FunctionCall 1 | ---> | FunctionCall 2 |
+------------------+ +------------------+
- Workflow Call: Executes an entire workflow
- Ad-hoc Function Call: Executes a single function directly
Calls progress through statuses: pending → running → completed (or failed).
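If you retrieve results by polling, the status progression suggests a loop that stops on a terminal status. The sketch below assumes the status strings listed above; `wait_for_call` and the injected `fetch_status` callable are hypothetical, with the callable standing in for whatever request you make to `GET /v2/calls/{id}`, since the response shape is not covered on this page.

```python
import time
from typing import Callable

# Statuses after which a call will not change again (taken from the text above).
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_call(
    fetch_status: Callable[[], str],
    interval_s: float = 1.0,
    max_polls: int = 30,
) -> str:
    """Poll until the call reaches a terminal status or the poll budget runs out.

    fetch_status is a placeholder for your own API request; it should
    return the call's current status string.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError(f"call still not finished after {max_polls} polls")


# Simulated status sequence in place of real API responses.
simulated = iter(["pending", "running", "completed"])
final_status = wait_for_call(lambda: next(simulated), interval_s=0)
print(final_status)
```

Injecting the fetch function keeps the loop independent of any HTTP client and makes it easy to test with a simulated sequence.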
Events
An event is the output notification from a function execution. When a function completes, it produces an event containing the results:
FunctionCall completes
|
v
+------------------+
| Event |
+------------------+
|
+-- EventType (transform, route, split, etc.)
|
+-- Contains Transformation (output data)
|
+-- Triggers Subscriptions (webhooks)
Events are what subscriptions listen to: when an event is created, bem delivers it to your configured webhook endpoints.
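On the receiving side, one common way to consume events is to dispatch on the event type. The following is a minimal sketch under assumptions: the payload keys `eventType` and `transformation` are hypothetical stand-ins for the real event fields, and the handler registry is purely illustrative.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]

# Hypothetical registry mapping an event type to the code that handles it.
HANDLERS: Dict[str, Callable[[Event], str]] = {}


def on(event_type: str):
    """Decorator that registers a handler for one event type."""
    def register(handler: Callable[[Event], str]) -> Callable[[Event], str]:
        HANDLERS[event_type] = handler
        return handler
    return register


@on("transform")
def handle_transform(event: Event) -> str:
    # Assumed payload shape: the event embeds its transformation.
    return f"stored {event['transformation']['transformID']}"


@on("route")
def handle_route(event: Event) -> str:
    return "recorded routing decision"


def dispatch(event: Event) -> str:
    """Send an event to its handler, ignoring types we do not handle."""
    handler = HANDLERS.get(event["eventType"])
    if handler is None:
        return "ignored"
    return handler(event)


result = dispatch(
    {"eventType": "transform", "transformation": {"transformID": "tr_abc123"}}
)
print(result)
```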
Transformations
A transformation is the structured data output from a function. It contains the actual extracted content:
{
"transformID": "tr_abc123",
"extractedJSON": {
"invoiceNumber": "INV-2024-001",
"vendor": "Acme Corp",
"totalAmount": 1250.00
},
"referenceID": "your-tracking-id"
}
Transformations adhere to the outputSchema defined in the function configuration.
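As a sketch of how a client might consume the transformation shown above, the snippet below parses that JSON into a typed object. The three keys come from the example; the `Transformation` dataclass and `parse_transformation` helper are hypothetical, and a real payload may carry additional fields.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class Transformation:
    """Hypothetical client-side view of the example payload."""
    transform_id: str
    extracted: Dict[str, Any]
    reference_id: str


def parse_transformation(raw: str) -> Transformation:
    """Parse the JSON text of a transformation, failing fast on missing keys."""
    data = json.loads(raw)
    return Transformation(
        transform_id=data["transformID"],
        extracted=data["extractedJSON"],
        reference_id=data["referenceID"],
    )


raw = """
{
  "transformID": "tr_abc123",
  "extractedJSON": {
    "invoiceNumber": "INV-2024-001",
    "vendor": "Acme Corp",
    "totalAmount": 1250.00
  },
  "referenceID": "your-tracking-id"
}
"""
transformation = parse_transformation(raw)
print(transformation.extracted["vendor"], transformation.extracted["totalAmount"])
```

The reference ID is carried through unchanged, so you can use it to match the output back to the record you submitted.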
Subscriptions
A subscription configures webhook delivery for events, connecting function outputs to your systems:
Function completes --> Event created --> Subscription triggers --> Webhook sent
Subscribe to specific functions to receive notifications when they complete.
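Conceptually, a subscription pairs a function with a webhook destination, so deciding where to deliver an event is a lookup by function. The sketch below illustrates only that matching step; the `Subscription` fields, the function names, and the example.com URLs are hypothetical, and nothing is actually sent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Subscription:
    """Hypothetical subscription record: one function, one webhook URL."""
    function_id: str
    webhook_url: str


def webhooks_for(function_id: str, subscriptions: List[Subscription]) -> List[str]:
    """Return the webhook URLs subscribed to the function that just completed."""
    return [s.webhook_url for s in subscriptions if s.function_id == function_id]


subscriptions = [
    Subscription("invoice-extractor", "https://example.com/hooks/invoices"),
    Subscription("invoice-extractor", "https://example.com/hooks/audit"),
    Subscription("receipt-extractor", "https://example.com/hooks/receipts"),
]

targets = webhooks_for("invoice-extractor", subscriptions)
print(targets)
```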
Views
A view provides insight into transformation outputs. Views can include columns, filters, and aggregations, which makes them useful for monitoring and analyzing results across many function executions.
How Everything Connects
1. SETUP (once)
+-- Create Functions (define extraction logic)
+-- Create Workflow (chain functions together)
+-- Create Subscriptions (configure webhooks)
2. EXECUTE (per document)
POST /v2/calls
|
v
WorkflowCall created (status: pending)
|
v
FunctionCalls execute in sequence
|
v
Events produced with Transformations
|
v
Subscriptions trigger webhooks to your systems
3. RETRIEVE
GET /v2/calls/{id} --> Full results with all function outputs
Data Model Summary
| Concept | What It Is | Contains |
|---|---|---|
| Function | Reusable processing unit | Configuration, output schema |
| Workflow | Orchestration layer | Main function, relationships |
| Call | Execution request | Input data, reference ID |
| FunctionCall | Single function execution | Status, attempt info |
| Event | Output notification | Transformations, metadata |
| Transformation | Structured output | Extracted JSON, metrics |
| Subscription | Webhook config | Function ID, webhook URL |