Split Functions

Split functions break multi-page documents into smaller pieces for individual processing. They're essential for handling documents that contain multiple logical units (e.g., a PDF with multiple invoices).

When to Use

Use a Split function when you need to:

  • Process multi-page PDFs where multiple documents are bundled together
  • Handle batched documents in a single file
  • Classify and separate different document types within one file
  • Process each section of a document independently

Split Types

bem supports two split types:

| Type | Description | Use Case |
|---|---|---|
| print_page | Split by physical page boundaries | Each page is a separate document |
| semantic_page | Split by content/document boundaries semantically | Multi-document files, mixed content |

Configuration Fields

Required Fields

| Field | Type | Description |
|---|---|---|
| functionName | string | Unique identifier for the function |
| type | string | Must be "split" |
| splitType | string | Either "print_page" or "semantic_page" |

Optional Fields

| Field | Type | Description |
|---|---|---|
| displayName | string | Human-readable display name |
| tags | string[] | Tags for organization |
| printPageSplitConfig | object | Configuration for print page splits |
| semanticPageSplitConfig | object | Configuration for semantic page splits |
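
The required and optional fields above can be checked locally before a configuration is submitted. The following is a minimal Python sketch, not part of any bem SDK: `validate_split_config` is a hypothetical helper, and it checks only the field names, types, and allowed values listed in the tables above.

```python
from typing import Any

# Allowed values, taken from the Split Types table.
SPLIT_TYPES = {"print_page", "semantic_page"}

# Required field name -> expected Python type, from the Required Fields table.
REQUIRED_FIELDS = {"functionName": str, "type": str, "splitType": str}


def validate_split_config(config: dict[str, Any]) -> list[str]:
    """Return the problems found in a split function config (empty if none).

    Hypothetical helper for illustration only; it is not a bem API.
    """
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in config:
            errors.append(f"missing required field: {field}")
        elif not isinstance(config[field], expected):
            errors.append(f"{field} must be a {expected.__name__}")

    # Value checks run only when the field is present with the right type.
    if isinstance(config.get("type"), str) and config["type"] != "split":
        errors.append('type must be "split"')
    split_type = config.get("splitType")
    if isinstance(split_type, str) and split_type not in SPLIT_TYPES:
        errors.append(f"splitType must be one of {sorted(SPLIT_TYPES)}")

    # tags is optional, but when present it should be a list of strings.
    tags = config.get("tags")
    if tags is not None and not (
        isinstance(tags, list) and all(isinstance(t, str) for t in tags)
    ):
        errors.append("tags must be a list of strings")
    return errors


config = {"functionName": "page-splitter", "type": "split", "splitType": "print_page"}
print(validate_split_config(config))  # []
print(validate_split_config({"functionName": "x", "type": "split"}))
# ['missing required field: splitType']
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue in one pass.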

Print Page Split

Use print_page when each physical page should be processed as a separate unit.

Configuration

{
  "functionName": "page-splitter",
  "type": "split",
  "displayName": "Page-by-Page Splitter",
  "splitType": "print_page",
  "printPageSplitConfig": {
    "nextFunctionName": "invoice-extractor"
  }
}

Print Page Config Fields

| Field | Type | Description |
|---|---|---|
| nextFunctionID | string | Function ID to process each page |
| nextFunctionName | string | Function name to process each page |
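
To illustrate what a print_page split means for downstream processing, the sketch below models the fan-out locally: one unit per physical page, each addressed to the configured next function. It does not call bem. The helper `plan_print_page_split` and the shape of the records it returns are assumptions made for illustration, and the sketch reads only `nextFunctionName`.

```python
def plan_print_page_split(page_count: int, config: dict) -> list[dict]:
    """Model a print_page split: one work item per physical page.

    Hypothetical helper; the returned record shape is an assumption,
    not bem's actual output format.
    """
    if config.get("splitType") != "print_page":
        raise ValueError("this sketch only models splitType 'print_page'")
    split_config = config.get("printPageSplitConfig", {})
    next_function = split_config.get("nextFunctionName")
    # Pages are numbered from 1, matching how page ranges are usually written.
    return [
        {"page": page, "nextFunctionName": next_function}
        for page in range(1, page_count + 1)
    ]


config = {
    "functionName": "page-splitter",
    "type": "split",
    "splitType": "print_page",
    "printPageSplitConfig": {"nextFunctionName": "invoice-extractor"},
}
for item in plan_print_page_split(3, config):
    print(item)
# {'page': 1, 'nextFunctionName': 'invoice-extractor'}
# {'page': 2, 'nextFunctionName': 'invoice-extractor'}
# {'page': 3, 'nextFunctionName': 'invoice-extractor'}
```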

Semantic Page Split

Use semantic_page when documents should be split by content boundaries and classified into different types.

Configuration

{
  "functionName": "document-splitter",
  "type": "split",
  "displayName": "Document Type Splitter",
  "splitType": "semantic_page",
  "semanticPageSplitConfig": {
    "itemClasses": [
      {
        "name": "invoice",
        "description": "Invoice documents with billing information",
        "nextFunctionName": "invoice-extractor"
      },
      {
        "name": "receipt",
        "description": "Receipt documents from transactions",
        "nextFunctionName": "receipt-extractor"
      },
      {
        "name": "packing-slip",
        "description": "Packing slips and shipping documents",
        "nextFunctionName": "packing-slip-extractor"
      }
    ]
  }
}

Semantic Page Config Fields

| Field | Type | Description |
|---|---|---|
| itemClasses | array | Array of document class definitions |

Item Class Fields

| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique name for this document class |
| description | string | No | Description to help classify documents |
| nextFunctionID | string | Conditional | Function ID to process this class |
| nextFunctionName | string | Conditional | Function name to process this class |
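
The item class rules above can be checked before a configuration is sent. The sketch below is a hypothetical local helper, not a bem API. It rests on one assumption worth stating plainly: "Conditional" is read here as "at least one of `nextFunctionID` or `nextFunctionName` must be set", which the table does not spell out.

```python
def validate_item_classes(item_classes: list[dict]) -> list[str]:
    """Return the problems found in an itemClasses array (empty if none).

    Hypothetical helper for illustration only.
    """
    errors = []
    seen_names = set()
    for index, item in enumerate(item_classes):
        name = item.get("name")
        if not isinstance(name, str) or not name:
            errors.append(f"itemClasses[{index}]: name is required")
        elif name in seen_names:
            # The table describes name as unique per document class.
            errors.append(f"itemClasses[{index}]: duplicate name {name!r}")
        else:
            seen_names.add(name)

        # ASSUMPTION: "Conditional" means at least one routing field is set.
        if not (item.get("nextFunctionID") or item.get("nextFunctionName")):
            errors.append(
                f"itemClasses[{index}]: set nextFunctionID or nextFunctionName"
            )
    return errors


classes = [
    {"name": "invoice", "nextFunctionName": "invoice-extractor"},
    {"name": "invoice", "nextFunctionName": "receipt-extractor"},
    {"name": "receipt", "description": "Receipt documents from transactions"},
]
for problem in validate_item_classes(classes):
    print(problem)
# itemClasses[1]: duplicate name 'invoice'
# itemClasses[2]: set nextFunctionID or nextFunctionName
```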

Example Workflow

A common pattern pairs a Split function with Transform functions, so that each component of a single input is routed to the Transform function that structures it:

Multi-page PDF (Bill of Lading, Invoice, Rate Confirmation)
     │
     ▼
┌─────────────┐
│ Split       │
│ Function    │
└─────────────┘
     │
     ├──► Pages 1-3 ──► Bill of Lading Transform Function
     │
     ├──► Pages 4-6 ──► Invoice Transform Function
     │
     └──► Page 7 ──► Rate Confirmation Transform Function
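
The routing in the diagram can be sketched in a few lines of Python. This is not how bem performs the split: the per-page labels below are hand-written stand-ins for the classification step, and the class names and next function names (`bill-of-lading-transform` and so on) are hypothetical. The sketch only shows how consecutive pages of the same class collapse into one range and how each range maps to a next function.

```python
from itertools import groupby


def group_pages(page_labels: list[str]) -> list[tuple[int, int, str]]:
    """Collapse per-page labels into (first_page, last_page, label) runs."""
    runs = []
    first_page = 1
    for label, group in groupby(page_labels):
        length = len(list(group))
        runs.append((first_page, first_page + length - 1, label))
        first_page += length
    return runs


def route(runs, item_classes):
    """Replace each run's class label with that class's next function name."""
    next_by_class = {c["name"]: c["nextFunctionName"] for c in item_classes}
    return [(first, last, next_by_class[label]) for first, last, label in runs]


# Hypothetical item classes mirroring the diagram above.
item_classes = [
    {"name": "bill-of-lading", "nextFunctionName": "bill-of-lading-transform"},
    {"name": "invoice", "nextFunctionName": "invoice-transform"},
    {"name": "rate-confirmation", "nextFunctionName": "rate-confirmation-transform"},
]

# Stand-in for the classification result: one label per page, pages 1 to 7.
labels = ["bill-of-lading"] * 3 + ["invoice"] * 3 + ["rate-confirmation"]

for first, last, target in route(group_pages(labels), item_classes):
    print(f"pages {first}-{last} -> {target}")
# pages 1-3 -> bill-of-lading-transform
# pages 4-6 -> invoice-transform
# pages 7-7 -> rate-confirmation-transform
```

`groupby` merges only adjacent equal labels, so a class that reappears later in the file would produce a second, separate range rather than being merged with the first.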
