System Overview

Understanding bem's core primitives and how they work together

bem provides infrastructure for transforming unstructured data into structured, actionable outputs. This page explains the core primitives and how they connect to form a complete data processing system.

The Big Picture

+----------------+     +------------------+     +-----------------+
|                |     |                  |     |                 |
|   Your Input   | --> |    Workflow      | --> |   Structured    |
|  (PDF, email,  |     |  (orchestrates   |     |  Output (JSON)  |
|   image, etc.) |     |   Functions)     |     |                 |
|                |     |                  |     |                 |
+----------------+     +------------------+     +-----------------+
                              |
                              v
                   +--------------------+
                   |   Subscriptions    |
                   |   (webhooks to     |
                   |    your systems)   |
                   +--------------------+

You send documents to bem, workflows orchestrate processing using functions, and you receive structured JSON via polling or webhooks.

Core Primitives

Functions

A function is a single, reusable processing operation. Functions are the atomic building blocks of bem:

| Type            | Description                                            | Use Case                                              |
|-----------------|--------------------------------------------------------|-------------------------------------------------------|
| transform       | 1:1 extraction of structured JSON from documents       | Invoice processing, form extraction, receipt scanning |
| analyze         | Visual analysis of images and videos                   | Infer visual elements, pull identifiers from videos   |
| route           | Classifies inputs and directs to different paths       | Document type classification                          |
| split           | 1:N breakdown of multi-page documents                  | Processing bundled PDFs                               |
| join            | N:1 combination of multiple inputs                     | Merging related documents                             |
| enrich          | Augments data via semantic search against collections  | SKU matching, catalog lookup                          |
| payload_shaping | Transforms JSON structure using JMESPath               | Formatting for downstream APIs                        |

Functions are versioned: each configuration change creates a new version, so you can iterate safely without breaking existing integrations.

Workflows

A workflow orchestrates multiple functions into a unified processing pipeline. Workflows are configured as a directed graph:

                           Workflow
+-------------------------------------------------------------+
|                                                             |
|   +-----------+      +----------+      +----------------+   |
|   | Transform | ---> |  Enrich  | ---> | Payload Shaping|   |
|   +-----------+      +----------+      +----------------+   |
|                                                             |
+-------------------------------------------------------------+
         ^
         |
    Single Entry Point

  • Main Function: The entry point that receives input
  • Relationships: Define how data flows between functions
  • Versioned: Update workflows safely without disrupting production
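
To make the directed-graph idea concrete, here is a minimal sketch in Python. It is a toy model, not bem's actual configuration format: the `Workflow` class, its `main_function` and `relationships` fields, and the `execution_order` helper are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """Toy model of a workflow: one main function plus directed relationships."""

    main_function: str
    # Maps a function name to the functions that receive its output.
    relationships: dict[str, list[str]] = field(default_factory=dict)

    def execution_order(self) -> list[str]:
        """Walk the graph breadth-first, starting at the main function."""
        order = []
        queue = [self.main_function]
        seen = {self.main_function}
        while queue:
            current = queue.pop(0)
            order.append(current)
            for downstream in self.relationships.get(current, []):
                if downstream not in seen:
                    seen.add(downstream)
                    queue.append(downstream)
        return order


# The three-step pipeline from the diagram above.
workflow = Workflow(
    main_function="transform",
    relationships={"transform": ["enrich"], "enrich": ["payload_shaping"]},
)
print(workflow.execution_order())  # ['transform', 'enrich', 'payload_shaping']
```

Modeling relationships as an adjacency map keeps the single entry point explicit and leaves room for branching, for example a route function with several downstream paths.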

Calls

A call is an execution request. When you send data to bem, you create a call:

POST /v2/calls
        |
        v
+------------------+
|   WorkflowCall   |   <-- Your execution request
+------------------+
        |
        | spawns one per function
        v
+------------------+      +------------------+
|  FunctionCall 1  | ---> |  FunctionCall 2  |
+------------------+      +------------------+

  • Workflow Call: Executes an entire workflow
  • Ad-hoc Function Call: Executes a single function directly

Calls progress through statuses: pending --> running --> completed (or failed).
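
If you retrieve results by polling, the status lifecycle suggests a loop that stops at a terminal status. The sketch below assumes the four status strings named above; `wait_for_call` and the `fetch_status` callback are hypothetical helpers, and the status source here is a stand-in sequence rather than a real API request.

```python
import time
from typing import Callable

# Statuses after which a call will not change again.
TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_call(
    fetch_status: Callable[[], str],
    interval: float = 1.0,
    max_polls: int = 30,
) -> str:
    """Poll until the call reaches a terminal status or the budget runs out."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError("call did not finish within the polling budget")


# Stand-in for a real status lookup: replays a fixed sequence of statuses.
fake_statuses = iter(["pending", "running", "running", "completed"])
final_status = wait_for_call(lambda: next(fake_statuses), interval=0)
print(final_status)  # completed
```

Passing the status lookup in as a callback keeps the loop independent of any HTTP client, so the same logic can be exercised in tests without network access.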

Events

An event is the output notification from a function execution. When a function completes, it produces an event containing the results:

FunctionCall completes
         |
         v
+------------------+
|      Event       |
+------------------+
         |
         +-- EventType (transform, route, split, etc.)
         |
         +-- Contains Transformation (output data)
         |
         +-- Triggers Subscriptions (webhooks)

Subscriptions listen for events: when an event is created, bem delivers it to your configured webhook endpoints.
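
On the receiving side, a webhook consumer typically parses the event and branches on its type. The following is a sketch under stated assumptions: the payload field names (`eventType`, `transformation`, `extractedJSON`) are guesses based on the terms used on this page, not a documented schema, and the handler functions are hypothetical.

```python
import json
from typing import Any, Callable

# Hypothetical webhook body; field names are illustrative, not bem's schema.
SAMPLE_WEBHOOK_BODY = json.dumps({
    "eventType": "transform",
    "referenceID": "your-tracking-id",
    "transformation": {"extractedJSON": {"invoiceNumber": "INV-2024-001"}},
})


def on_transform(event: dict[str, Any]) -> str:
    """Handle the output of a transform function."""
    data = event["transformation"]["extractedJSON"]
    return f"stored invoice {data['invoiceNumber']}"


def on_route(event: dict[str, Any]) -> str:
    """Handle a routing decision."""
    return "route decision recorded"


# One handler per event type we care about.
HANDLERS: dict[str, Callable[[dict[str, Any]], str]] = {
    "transform": on_transform,
    "route": on_route,
}


def dispatch(raw_body: str) -> str:
    """Parse a webhook body and pass it to the matching handler."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("eventType"))
    if handler is None:
        return "ignored"  # Unknown event types are skipped, not treated as errors.
    return handler(event)


result = dispatch(SAMPLE_WEBHOOK_BODY)
print(result)  # stored invoice INV-2024-001
```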

Transformations

A transformation is the structured data output from a function. It contains the actual extracted content:

{
  "transformID": "tr_abc123",
  "extractedJSON": {
    "invoiceNumber": "INV-2024-001",
    "vendor": "Acme Corp",
    "totalAmount": 1250.00
  },
  "referenceID": "your-tracking-id"
}

Transformations adhere to the outputSchema defined in the function configuration.
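
Because the output follows a schema, downstream code can check a transformation before using it. The sketch below assumes the schema resembles JSON Schema (`properties`, `required`, `type`), which this page does not confirm; `check_against_schema` is a hypothetical helper that covers only a small subset of such a schema.

```python
# Maps schema type names to Python types (a small subset, for illustration).
TYPE_MAP = {"string": str, "number": (int, float), "object": dict}


def check_against_schema(data: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the data conforms."""
    problems = []
    for name in schema.get("required", []):
        if name not in data:
            problems.append(f"missing required field: {name}")
    for name, spec in schema.get("properties", {}).items():
        if name in data and not isinstance(data[name], TYPE_MAP[spec["type"]]):
            problems.append(f"wrong type for {name}: expected {spec['type']}")
    return problems


# Assumed schema for the invoice example above.
output_schema = {
    "required": ["invoiceNumber", "vendor", "totalAmount"],
    "properties": {
        "invoiceNumber": {"type": "string"},
        "vendor": {"type": "string"},
        "totalAmount": {"type": "number"},
    },
}

extracted = {
    "invoiceNumber": "INV-2024-001",
    "vendor": "Acme Corp",
    "totalAmount": 1250.00,
}
problems = check_against_schema(extracted, output_schema)
print(problems)  # []
```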

Subscriptions

A subscription configures webhook delivery for events, connecting function outputs to your systems:

Function completes --> Event created --> Subscription triggers --> Webhook sent

Subscribe to specific functions to receive notifications when they complete.

Views

A view provides insight into transformation outputs. Views can include columns, filters, and aggregations, which makes them useful for monitoring and analyzing results across many function executions.

How Everything Connects

1. SETUP (once)
   +-- Create Functions (define extraction logic)
   +-- Create Workflow (chain functions together)
   +-- Create Subscriptions (configure webhooks)

2. EXECUTE (per document)
   POST /v2/calls
        |
        v
   WorkflowCall created (status: pending)
        |
        v
   FunctionCalls execute in sequence
        |
        v
   Events produced with Transformations
        |
        v
   Subscriptions trigger webhooks to your systems

3. RETRIEVE
   GET /v2/calls/{id} --> Full results with all function outputs
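
The execute and retrieve steps map onto two HTTP requests. The sketch below only builds the request objects and does not send them. Only the two paths, `POST /v2/calls` and `GET /v2/calls/{id}`, come from this page; the base URL, the `x-api-key` header, the body fields `workflowName` and `referenceID`, and the call ID `call_123` are placeholders, and a real call would also need to carry the input document.

```python
import json
import urllib.request

BASE_URL = "https://api.example.com"  # Placeholder host, not bem's real endpoint.


def build_create_call(workflow_name: str, reference_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the request that creates a call."""
    # Body field names are assumptions made for this sketch.
    body = json.dumps(
        {"workflowName": workflow_name, "referenceID": reference_id}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v2/calls",
        data=body,
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


def build_get_call(call_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the request that retrieves a call's results."""
    return urllib.request.Request(
        f"{BASE_URL}/v2/calls/{call_id}",
        headers={"x-api-key": api_key},
        method="GET",
    )


create_request = build_create_call("invoice-pipeline", "your-tracking-id", "YOUR_API_KEY")
fetch_request = build_get_call("call_123", "YOUR_API_KEY")
print(create_request.get_method(), create_request.full_url)
print(fetch_request.get_method(), fetch_request.full_url)
```

Sending either request would be a matter of passing it to `urllib.request.urlopen`, once the placeholders are replaced with real values.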

Data Model Summary

| Concept        | What It Is                | Contains                     |
|----------------|---------------------------|------------------------------|
| Function       | Reusable processing unit  | Configuration, output schema |
| Workflow       | Orchestration layer       | Main function, relationships |
| Call           | Execution request         | Input data, reference ID     |
| FunctionCall   | Single function execution | Status, attempt info         |
| Event          | Output notification       | Transformations, metadata    |
| Transformation | Structured output         | Extracted JSON, metrics      |
| Subscription   | Webhook config            | Function ID, webhook URL     |
