Supported File Types
Every input type bem accepts, with MIME mappings and processing notes
Every workflow call carries an inputType value that tells bem how to handle the file. The full set is below, grouped by category. Use the exact inputType string when constructing requests — values are lowercase and short (pdf, not PDF, not application/pdf).
Documents
| inputType | MIME type(s) | Notes |
|---|---|---|
| pdf | application/pdf | |
| docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Modern Word documents. Legacy .doc is not supported; convert to .docx first. |
| email | message/rfc822 (.eml) | Body is parsed as the primary content. Attachments are unwrapped and processed alongside the body if their type is in this table. Attachments whose type isn't supported are ignored. |
| text | text/plain | UTF-8 expected. |
Images
| inputType | MIME type(s) | Notes |
|---|---|---|
| jpeg | image/jpeg | |
| png | image/png | |
| webp | image/webp | |
| heic | image/heic | iOS-format photos. Decoded server-side. |
| heif | image/heif | |
Spreadsheets
| inputType | MIME type(s) | Notes |
|---|---|---|
| csv | text/csv | UTF-8 expected. |
| xls | application/vnd.ms-excel | Legacy Excel format. |
| xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Modern Excel format. Multi-sheet workbooks are processed sheet-by-sheet. |
Structured data
| inputType | MIME type(s) | Notes |
|---|---|---|
| json | application/json | |
| xml | application/xml, text/xml | |
| html | text/html | |
Audio
| inputType | MIME type(s) | Notes |
|---|---|---|
| mp3 | audio/mpeg | Speech is transcribed before extraction. |
| wav | audio/wav | |
| m4a | audio/mp4 (audio-only MP4 container) | |
Video
| inputType | MIME type(s) | Notes |
|---|---|---|
| mp4 | video/mp4 | Frames are sampled and the audio track is transcribed. |
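If your client already knows each file's MIME type (for example, from an upload's Content-Type header), the tables above translate directly into a lookup that yields the exact lowercase inputType string. The following is a minimal Python sketch, not part of any bem SDK: MIME_TO_INPUT_TYPE and input_type_for_mime are hypothetical names, and only the MIME types listed in the tables are mapped (aliases such as audio/x-wav are deliberately left out).

```python
from typing import Optional

# MIME type -> bem inputType, transcribed from the tables above (hypothetical helper data).
MIME_TO_INPUT_TYPE = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "message/rfc822": "email",
    "text/plain": "text",
    "image/jpeg": "jpeg",
    "image/png": "png",
    "image/webp": "webp",
    "image/heic": "heic",
    "image/heif": "heif",
    "text/csv": "csv",
    "application/vnd.ms-excel": "xls",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "application/json": "json",
    "application/xml": "xml",
    "text/xml": "xml",
    "text/html": "html",
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
    "audio/mp4": "m4a",
    "video/mp4": "mp4",
}


def input_type_for_mime(mime: str) -> Optional[str]:
    """Return the inputType for a MIME type, or None if it isn't supported."""
    # Drop parameters such as "; charset=utf-8" and normalize case before lookup.
    base = mime.split(";", 1)[0].strip().lower()
    return MIME_TO_INPUT_TYPE.get(base)
```

For example, input_type_for_mime("text/plain; charset=utf-8") returns "text", while input_type_for_mime("application/msword") (legacy .doc) returns None, signalling a file that has to be converted before upload.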
Encoding rules
There are two ways to send a file in a workflow call:
- Multipart form (multipart/form-data): preferred for large files. Attach the binary directly as the file field (or the files field for join workflows); no encoding is required.
- JSON (base64): embed the file content as a base64-encoded string in input.singleFile.inputContent. Use standard base64 (RFC 4648): padding is required, line breaks are not. Don't include a data: URI prefix; pass only the base64 payload.
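For the JSON route, the sketch below shows one way to produce a body that follows these encoding rules using only Python's standard library. It assumes inputType sits next to inputContent in the same singleFile object (as in the CLI example that follows) and omits any other fields your workflow call may require; build_json_body is a hypothetical helper name.

```python
import base64
import json


def build_json_body(data: bytes, input_type: str) -> str:
    """Embed raw file bytes as base64 in a JSON request body.

    Assumption: inputType lives alongside inputContent under input.singleFile;
    other request fields are intentionally left out of this sketch.
    """
    # b64encode uses the standard RFC 4648 alphabet, always emits "=" padding,
    # and never inserts line breaks. No "data:" URI prefix is added.
    encoded = base64.b64encode(data).decode("ascii")
    body = {
        "input": {
            "singleFile": {
                "inputContent": encoded,
                "inputType": input_type,
            }
        }
    }
    return json.dumps(body)
```

Because base64.b64encode already produces padded output without line breaks, no post-processing is needed; build_json_body(b"hello", "text") embeds the string aGVsbG8=.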
The Bem CLI hides the encoding: write --input.single-file '{"inputContent": "@invoice.pdf", "inputType": "pdf"}' and the CLI base64-encodes binary files automatically (text files are embedded as strings).
Size limits
The schema-inference endpoint (POST /v3/infer-schema) caps uploads at 20 MB. Workflow calls accept larger files, up to a limit that varies by plan; if you hit a size error in practice, contact support.
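Because the infer-schema cap is fixed, a client can reject oversized files before uploading them. A minimal sketch with hypothetical helper names: the docs don't say whether 20 MB means 20,000,000 or 20 * 1024 * 1024 bytes, so it assumes the smaller decimal value, which errs on the safe side. It also assumes the cap applies to the raw file size; base64 encoding makes the JSON payload roughly a third larger than the file, and whether that counts isn't stated. Plan-dependent workflow-call limits are not modelled.

```python
import os

# Assumption: "20 MB" read as 20,000,000 bytes (the stricter interpretation).
INFER_SCHEMA_MAX_BYTES = 20_000_000


def fits_infer_schema_limit(size_bytes: int) -> bool:
    """True if a file of size_bytes is within the POST /v3/infer-schema cap."""
    return size_bytes <= INFER_SCHEMA_MAX_BYTES


def file_fits_infer_schema_limit(path: str) -> bool:
    """Check a file on disk against the same cap, using its raw size."""
    return fits_infer_schema_limit(os.path.getsize(path))
```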
When the input doesn't match
Sending an inputType that doesn't match the actual file format (for example, inputType: "pdf" with a JPEG body) returns 400 Bad Request with a message that names the mismatched type. Always set inputType from the source-of-truth extension or MIME type, not from a guess.
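Mismatches are cheapest to catch before the request leaves your client. The sketch below is a hypothetical pre-flight check, not bem's server-side validation: it recognizes only a handful of binary formats by their leading signature bytes (PDF, JPEG, PNG, WebP) and stays silent for everything else, including text-based types such as csv or json that have no reliable signature.

```python
from typing import Optional

# Leading signature ("magic") bytes for a few binary formats from the tables above.
SIGNATURES = [
    (b"%PDF-", "pdf"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]


def sniff_input_type(data: bytes) -> Optional[str]:
    """Guess an inputType from file content; None means 'not recognized'."""
    for magic, input_type in SIGNATURES:
        if data.startswith(magic):
            return input_type
    # WebP is a RIFF container: "RIFF", a 4-byte length, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return None


def check_declared_type(data: bytes, declared: str) -> None:
    """Raise ValueError if the bytes clearly contradict the declared inputType."""
    sniffed = sniff_input_type(data)
    if sniffed is not None and sniffed != declared:
        raise ValueError(
            f"declared inputType {declared!r} but content looks like {sniffed!r}"
        )
```

An unrecognized file passes the check unchanged, so the server-side 400 remains the final authority.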
For unsupported file types (anything not in the tables above), there is no automatic conversion. Convert client-side first (e.g. .doc → .docx), or add a pre-processing step that produces a supported format before calling bem.
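A client-side guard can turn an unsupported file into a clear error before any request is made. In this sketch the extension map is an assumption built from the tables above plus common aliases (.jpg, .htm, .txt) that the tables don't spell out; input_type_for_path is a hypothetical helper, and the only conversion hint included is the .doc → .docx case the docs mention.

```python
from pathlib import Path

# Extension -> inputType. Derived from the tables above; aliases such as
# .jpg, .htm and .txt are assumptions, since the tables list MIME types.
EXTENSION_TO_INPUT_TYPE = {
    ".pdf": "pdf", ".docx": "docx", ".eml": "email", ".txt": "text",
    ".jpg": "jpeg", ".jpeg": "jpeg", ".png": "png", ".webp": "webp",
    ".heic": "heic", ".heif": "heif",
    ".csv": "csv", ".xls": "xls", ".xlsx": "xlsx",
    ".json": "json", ".xml": "xml", ".html": "html", ".htm": "html",
    ".mp3": "mp3", ".wav": "wav", ".m4a": "m4a", ".mp4": "mp4",
}

# Conversion advice for unsupported formats the docs call out explicitly.
CONVERSION_HINTS = {".doc": "convert to .docx first"}


def input_type_for_path(path: str) -> str:
    """Return the inputType for a file path, or raise ValueError if unsupported."""
    ext = Path(path).suffix.lower()
    if ext in EXTENSION_TO_INPUT_TYPE:
        return EXTENSION_TO_INPUT_TYPE[ext]
    hint = CONVERSION_HINTS.get(ext, "convert it to a supported format first")
    raise ValueError(f"{path}: unsupported file type {ext or '(no extension)'}; {hint}")
```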