Split Functions
Break multi-page documents into smaller pieces for processing
Split functions divide a multi-page document into smaller pieces that are processed individually. They are essential when a single file contains multiple logical units (e.g., a PDF that bundles several invoices).
When to Use
Use a Split function when you need to:
- Process multi-page PDFs where multiple documents are bundled together
- Handle batched documents in a single file
- Classify and separate different document types within one file
- Process each section of a document independently
Split Types
bem supports two split types:
| Type | Description | Use Case |
|---|---|---|
| print_page | Split by physical page boundaries | Each page is a separate document |
| semantic_page | Split semantically by content or document boundaries | Multi-document files, mixed content |
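To make the difference concrete, the sketch below mimics the two behaviours on a toy input in which every page already carries a document label. It is only an illustration: how bem actually detects semantic boundaries is not described here, and the `pages` list, its labels, and both helper functions are hypothetical.

```python
from itertools import groupby

# Hypothetical input: one label per physical page, in page order.
pages = ["invoice", "invoice", "receipt", "invoice"]

def split_print_page(labels):
    """print_page-style: every physical page becomes its own unit."""
    return [[page_number] for page_number, _ in enumerate(labels, start=1)]

def split_semantic_page(labels):
    """semantic_page-style: consecutive pages sharing a label form one unit."""
    numbered = enumerate(labels, start=1)
    return [
        (label, [page_number for page_number, _ in group])
        for label, group in groupby(numbered, key=lambda item: item[1])
    ]

print_units = split_print_page(pages)
semantic_units = split_semantic_page(pages)
print(print_units)     # [[1], [2], [3], [4]]
print(semantic_units)  # [('invoice', [1, 2]), ('receipt', [3]), ('invoice', [4])]
```

The same four pages yield four units under the page-based split but three under the semantic split, because pages 1 and 2 belong to the same invoice.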
Configuration Fields
Required Fields
| Field | Type | Description |
|---|---|---|
| functionName | string | Unique identifier for the function |
| type | string | Must be "split" |
| splitType | string | Either "print_page" or "semantic_page" |
Optional Fields
| Field | Type | Description |
|---|---|---|
| displayName | string | Human-readable display name |
| tags | string[] | Tags for organization |
| printPageSplitConfig | object | Configuration for print page splits |
| semanticPageSplitConfig | object | Configuration for semantic page splits |
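The field tables above can be turned into a quick local check before a configuration is submitted. The following is a minimal sketch, not part of bem: `validate_split_config` is a hypothetical helper, and treating an empty `functionName` as invalid is an assumption.

```python
VALID_SPLIT_TYPES = {"print_page", "semantic_page"}

def validate_split_config(config):
    """Return a list of problems found in a Split function config (empty if none)."""
    problems = []
    name = config.get("functionName")
    # Assumption: an empty string is not an acceptable identifier.
    if not isinstance(name, str) or not name:
        problems.append("functionName must be a non-empty string")
    if config.get("type") != "split":
        problems.append('type must be "split"')
    if config.get("splitType") not in VALID_SPLIT_TYPES:
        problems.append('splitType must be "print_page" or "semantic_page"')
    # Optional fields: only their types are checked, and only when present.
    if "displayName" in config and not isinstance(config["displayName"], str):
        problems.append("displayName must be a string")
    tags = config.get("tags", [])
    if not isinstance(tags, list) or not all(isinstance(t, str) for t in tags):
        problems.append("tags must be a list of strings")
    for key in ("printPageSplitConfig", "semanticPageSplitConfig"):
        if key in config and not isinstance(config[key], dict):
            problems.append(f"{key} must be an object")
    return problems

good = {"functionName": "page-splitter", "type": "split", "splitType": "print_page"}
bad = {"functionName": "page-splitter", "type": "transform", "splitType": "by_page"}
print(validate_split_config(good))  # []
print(validate_split_config(bad))   # two problems: type and splitType
```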
Print Page Split
Use print_page when each physical page should be processed as a separate unit.
Configuration
{
  "functionName": "page-splitter",
  "type": "split",
  "displayName": "Page-by-Page Splitter",
  "splitType": "print_page",
  "printPageSplitConfig": {
    "nextFunctionName": "invoice-extractor"
  }
}
Print Page Config Fields
| Field | Type | Description |
|---|---|---|
| nextFunctionID | string | Function ID to process each page |
| nextFunctionName | string | Function name to process each page |
Semantic Page Split
Use semantic_page when documents should be split by content boundaries and classified into different types.
Configuration
{
  "functionName": "document-splitter",
  "type": "split",
  "displayName": "Document Type Splitter",
  "splitType": "semantic_page",
  "semanticPageSplitConfig": {
    "itemClasses": [
      {
        "name": "invoice",
        "description": "Invoice documents with billing information",
        "nextFunctionName": "invoice-extractor"
      },
      {
        "name": "receipt",
        "description": "Receipt documents from transactions",
        "nextFunctionName": "receipt-extractor"
      },
      {
        "name": "packing-slip",
        "description": "Packing slips and shipping documents",
        "nextFunctionName": "packing-slip-extractor"
      }
    ]
  }
}
Semantic Page Config Fields
| Field | Type | Description |
|---|---|---|
| itemClasses | array | Array of document class definitions |
Item Class Fields
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique name for this document class |
| description | string | No | Description to help classify documents |
| nextFunctionID | string | Conditional | Function ID to process this class |
| nextFunctionName | string | Conditional | Function name to process this class |
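Because each item class names the function that handles it, a `semanticPageSplitConfig` effectively describes a routing table. The sketch below derives that table locally and rejects duplicate class names. `build_routing_table` is a hypothetical helper, and preferring `nextFunctionID` over `nextFunctionName` when both are present is an assumption, not documented behaviour.

```python
def build_routing_table(semantic_config):
    """Map each item class name to the next function it references."""
    routes = {}
    for item in semantic_config.get("itemClasses", []):
        name = item["name"]  # required field
        if name in routes:
            raise ValueError(f"duplicate item class name: {name!r}")
        # Assumption: use nextFunctionID if given, else nextFunctionName, else None.
        routes[name] = item.get("nextFunctionID") or item.get("nextFunctionName")
    return routes

semantic_config = {
    "itemClasses": [
        {"name": "invoice", "nextFunctionName": "invoice-extractor"},
        {"name": "receipt", "nextFunctionName": "receipt-extractor"},
    ]
}
routes = build_routing_table(semantic_config)
print(routes)  # {'invoice': 'invoice-extractor', 'receipt': 'receipt-extractor'}
```

A class that maps to `None` has no downstream function, which is a quick way to spot entries where neither conditional field was filled in.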
Example Workflow
A common pattern chains a Split function with Transform functions so that each component of a single input is structured separately:
Multi-page PDF (Bill of Lading, Invoice, Rate Confirmation)
         │
         ▼
  ┌─────────────┐
  │    Split    │
  │  Function   │
  └─────────────┘
         │
         ├──► Page 1-3 ──► Bill of Lading Transform Function
         │
         ├──► Page 4-6 ──► Invoice Transform Function
         │
         └──► Page 7 ──► Rate Confirmation Transform Function
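A Split configuration for this workflow could be assembled as shown below. This is a sketch under assumptions: the splitter name, the class names, the descriptions, and the three Transform function names are all hypothetical placeholders, and only fields listed in the tables above are used.

```python
import json

# Hypothetical (class name, description, next function name) triples.
workflow_classes = [
    ("bill-of-lading", "Bills of lading for shipped freight", "bill-of-lading-transform"),
    ("invoice", "Invoice documents with billing information", "invoice-transform"),
    ("rate-confirmation", "Rate confirmation sheets", "rate-confirmation-transform"),
]

split_config = {
    "functionName": "freight-packet-splitter",  # hypothetical name
    "type": "split",
    "splitType": "semantic_page",
    "semanticPageSplitConfig": {
        "itemClasses": [
            {"name": name, "description": description, "nextFunctionName": next_function}
            for name, description, next_function in workflow_classes
        ]
    },
}

# Serialize to the JSON shape used in the configuration examples.
payload = json.dumps(split_config, indent=2)
print(payload)
```

Each class routes its pages to one Transform function, mirroring the three branches of the diagram.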