Transform Functions
Extract structured JSON data from documents
Transform functions are the most common function type in bem. They use semantic and visual analysis to extract structured JSON data from unstructured documents like PDFs, images, emails, and spreadsheets.
When to Use
Use a Transform function when you need to:
- Extract specific fields from invoices, receipts, or forms
- Convert documents into structured JSON
- Parse tabular data from spreadsheets or CSVs
- Process email content and attachments
Configuration Fields
Required Fields
| Field | Type | Description |
|---|---|---|
| functionName | string | Unique identifier for the function |
| type | string | Must be "transform" |
| outputSchemaName | string | Human-readable name for your schema |
| outputSchema | object | JSON Schema defining the structure to extract |
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| displayName | string | - | Human-readable display name |
| tags | string[] | - | Tags for organization |
| tabularChunkingEnabled | boolean | false | Process CSV/Excel in row batches |
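The two tables above can be turned into a quick local check before a configuration is submitted. The following is a minimal sketch, not part of bem itself: the helper name check_transform_config is hypothetical, and only the field names and types come from the tables.

```python
# Field names and types are taken from the tables above.
# The helper itself is a hypothetical local check, not a bem API.
REQUIRED_FIELDS = {
    "functionName": str,
    "type": str,
    "outputSchemaName": str,
    "outputSchema": dict,
}
OPTIONAL_FIELDS = {
    "displayName": str,
    "tags": list,
    "tabularChunkingEnabled": bool,
}


def check_transform_config(config: dict) -> list[str]:
    """Return a list of problems found in a Transform function config."""
    problems = []
    for name, expected in REQUIRED_FIELDS.items():
        if name not in config:
            problems.append(f"missing required field: {name}")
        elif not isinstance(config[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    for name, expected in OPTIONAL_FIELDS.items():
        if name in config and not isinstance(config[name], expected):
            problems.append(f"{name} should be {expected.__name__}")
    # The type field must carry the literal value "transform".
    if "type" in config and config["type"] != "transform":
        problems.append('type must be "transform"')
    return problems


config = {
    "functionName": "invoice-extractor",
    "type": "transform",
    "outputSchemaName": "Invoice Schema",
    "outputSchema": {"type": "object", "properties": {}},
}
print(check_transform_config(config))
```

An empty list means every required field is present and every supplied field has the expected type.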
Output Schema
The outputSchema field defines the structure of the data you want to extract, using standard JSON Schema syntax.
Best Practices
- Use descriptive field names - Choose names that clearly indicate what data should be extracted
- Add descriptions - Include descriptions for complex fields to guide the AI
- Specify required fields - Mark essential fields as required in the schema
- Use appropriate types - Use number for amounts, string for text, array for lists
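These practices can be checked mechanically before a schema is submitted. The sketch below is a hypothetical local lint, not a bem feature. It flags a missing required list at the root, required names that are not defined, properties without a type, and object or array fields without a description. Treating "complex" as "object or array" is an assumption made for this example.

```python
def lint_output_schema(schema: dict, path: str = "") -> list[str]:
    """Return advisory warnings for an outputSchema (hypothetical helper)."""
    warnings = []
    here = path or "<root>"
    props = schema.get("properties", {})
    # Only the root schema is checked for a non-empty required list.
    if path == "" and not schema.get("required"):
        warnings.append("<root>: no required fields listed")
    for name in schema.get("required", []):
        if name not in props:
            warnings.append(
                f"{here}: required field '{name}' is not defined in properties"
            )
    for name, sub in props.items():
        child = f"{path}.{name}" if path else name
        kind = sub.get("type")
        if kind is None:
            warnings.append(f"{child}: no type specified")
        if kind in ("object", "array") and "description" not in sub:
            warnings.append(f"{child}: complex field has no description")
        # Recurse into nested objects and into array item schemas.
        if kind == "object":
            warnings.extend(lint_output_schema(sub, child))
        elif kind == "array" and isinstance(sub.get("items"), dict):
            warnings.extend(lint_output_schema(sub["items"], child + "[]"))
    return warnings


draft = {
    "type": "object",
    "properties": {
        "total": {"type": "number"},
        "items": {
            "type": "array",
            "items": {"type": "object", "properties": {"sku": {}}},
        },
    },
}
for warning in lint_output_schema(draft):
    print(warning)
```

The warnings are advisory only. A schema that triggers them is still valid JSON Schema.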
Example
{
  "functionName": "invoice-extractor",
  "type": "transform",
  "displayName": "Invoice Data Extractor",
  "outputSchemaName": "Invoice Schema",
  "outputSchema": {
    "type": "object",
    "required": ["invoiceNumber", "totalAmount", "vendor"],
    "properties": {
      "invoiceNumber": {
        "type": "string",
        "description": "The unique invoice number"
      },
      "invoiceDate": {
        "type": "string",
        "description": "Date of the invoice in ISO 8601 format"
      },
      "totalAmount": {
        "type": "number",
        "description": "Total amount due"
      },
      "vendor": {
        "type": "object",
        "properties": {
          "name": {
            "type": "string",
            "description": "Vendor company name"
          },
          "address": {
            "type": "string",
            "description": "Vendor address"
          }
        }
      },
      "lineItems": {
        "type": "array",
        "description": "Individual line items on the invoice",
        "items": {
          "type": "object",
          "properties": {
            "description": { "type": "string" },
            "quantity": { "type": "number" },
            "unitPrice": { "type": "number" }
          }
        }
      }
    }
  },
  "tags": ["billing", "finance"]
}
Tabular Chunking
For large spreadsheets or CSVs, enable tabularChunkingEnabled to process data in batches rather than all at once. This improves reliability for documents with many rows.
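To make the idea of row batching concrete, here is a minimal local sketch. It is not bem's implementation: how bem splits rows internally and what batch size it uses are not documented here, so the function name iter_row_batches and the batch size of 2 are purely illustrative assumptions.

```python
import csv
import io
from itertools import islice


def iter_row_batches(csv_text: str, batch_size: int):
    """Yield lists of at most batch_size rows (illustration only)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    while True:
        # islice pulls the next batch_size rows without loading the whole file.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


sample = (
    "productName,quantity,price\n"
    "Widget,4,2.50\n"
    "Gadget,1,10.00\n"
    "Gizmo,7,0.99\n"
    "Doohickey,2,5.25\n"
    "Thingamajig,3,1.10\n"
)
for batch in iter_row_batches(sample, batch_size=2):
    print(len(batch), [row["productName"] for row in batch])
```

With bem, none of this is written by hand. The behavior is switched on with a flag in the function configuration: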
{
  "functionName": "spreadsheet-processor",
  "type": "transform",
  "outputSchemaName": "Row Data",
  "outputSchema": {
    "type": "object",
    "properties": {
      "productName": { "type": "string" },
      "quantity": { "type": "number" },
      "price": { "type": "number" }
    }
  },
  "tabularChunkingEnabled": true
}
Email Integration
Each Transform function is automatically assigned its own email address. Forward emails to this address to process them through the function.
The email address is returned in the function response as emailAddress (e.g., eml_xxx@actions.bem.ai).
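As a rough sketch of how the address might be used from a script: the code below reads emailAddress out of a response and prepares a message addressed to it. The exact shape of the function response is not shown in this page, so the nested sample response, the recursive lookup, and both helper names are assumptions. The sender address is a placeholder, and the message is only constructed, not sent.

```python
from email.message import EmailMessage


def find_email_address(response):
    """Search a decoded JSON response for an emailAddress string.

    The search is recursive because the response layout is an assumption.
    """
    if isinstance(response, dict):
        value = response.get("emailAddress")
        if isinstance(value, str):
            return value
        children = list(response.values())
    elif isinstance(response, list):
        children = response
    else:
        return None
    for child in children:
        found = find_email_address(child)
        if found:
            return found
    return None


def build_message(to_address: str, sender: str, subject: str, body: str) -> EmailMessage:
    """Build (but do not send) an email addressed to the function."""
    msg = EmailMessage()
    msg["To"] = to_address
    msg["From"] = sender
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


# Hypothetical response shape; only the emailAddress key comes from the text above.
response = {
    "function": {
        "functionName": "invoice-extractor",
        "emailAddress": "eml_xxx@actions.bem.ai",
    }
}
address = find_email_address(response)
message = build_message(address, "you@example.com", "Invoice 1042", "See attached.")
print(message["To"])
```

Sending the message is left to whatever mail client or SMTP setup is already in place.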