Supported File Types

Every input type bem accepts, with MIME mappings and processing notes


Every workflow call carries an inputType value that tells bem how to handle the file. The full set is below, grouped by category. Use the exact inputType string when constructing requests — values are lowercase and short (pdf, not PDF, not application/pdf).

Documents

| inputType | MIME type(s) | Notes |
| --- | --- | --- |
| pdf | application/pdf | |
| docx | application/vnd.openxmlformats-officedocument.wordprocessingml.document | Modern Word documents. Legacy .doc is not supported; convert to .docx first. |
| email | message/rfc822 (.eml) | Body is parsed as the primary content. Attachments are unwrapped and processed alongside the body if their type is in this table. Attachments whose type isn't supported are ignored. |
| text | text/plain | UTF-8 expected. |

Images

| inputType | MIME type(s) | Notes |
| --- | --- | --- |
| jpeg | image/jpeg | |
| png | image/png | |
| webp | image/webp | |
| heic | image/heic | iOS-format photos. Decoded server-side. |
| heif | image/heif | |

Spreadsheets

| inputType | MIME type(s) | Notes |
| --- | --- | --- |
| csv | text/csv | UTF-8 expected. |
| xls | application/vnd.ms-excel | Legacy Excel format. |
| xlsx | application/vnd.openxmlformats-officedocument.spreadsheetml.sheet | Modern Excel format. Multi-sheet workbooks are processed sheet-by-sheet. |

Structured data

| inputType | MIME type(s) | Notes |
| --- | --- | --- |
| json | application/json | |
| xml | application/xml, text/xml | |
| html | text/html | |

Audio

| inputType | MIME type(s) | Notes |
| --- | --- | --- |
| mp3 | audio/mpeg | Speech is transcribed before extraction. |
| wav | audio/wav | |
| m4a | audio/mp4 (audio-only MP4 container) | |

Video

| inputType | MIME type(s) | Notes |
| --- | --- | --- |
| mp4 | video/mp4 | Frames are sampled and the audio track is transcribed. |

Encoding rules

There are two ways to send a file in a workflow call:

Multipart form (multipart/form-data) — preferred for large files. Attach the binary directly as the file (or files for join workflows) field; no encoding required.

JSON (base64) — embed the file content as a base64-encoded string in input.singleFile.inputContent. Standard base64 (RFC 4648) — padding is required, line breaks are not. Don't include a data: URI prefix; pass only the base64 payload.
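As a rough illustration of the JSON option, the sketch below base64-encodes raw bytes and nests them under input.singleFile.inputContent. The helper names are hypothetical, and placing inputType next to inputContent is an assumption taken from the CLI example on this page rather than a confirmed request schema.

```python
import base64
import json


def strip_data_uri(value: str) -> str:
    """Drop a leading 'data:...;base64,' prefix if one is present."""
    if value.startswith("data:") and "," in value:
        return value.split(",", 1)[1]
    return value


def build_single_file_input(content: bytes, input_type: str) -> dict:
    """Build a JSON-serialisable body with the file embedded as base64."""
    # b64encode uses the standard RFC 4648 alphabet, keeps '=' padding,
    # and emits no line breaks.
    encoded = base64.b64encode(content).decode("ascii")
    return {
        "input": {
            "singleFile": {
                "inputContent": encoded,
                # Assumption: inputType sits beside inputContent,
                # mirroring the CLI example.
                "inputType": input_type,
            }
        }
    }


body = build_single_file_input(b"%PDF-1.7 example bytes", "pdf")
print(json.dumps(body, indent=2))
```

If the base64 string comes from a browser or another tool that produces data: URIs, running it through strip_data_uri first leaves only the payload.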

The bem CLI handles the encoding for you: pass --input.single-file '{"inputContent": "@invoice.pdf", "inputType": "pdf"}' and the CLI base64-encodes binary files automatically (text files are embedded as strings).

Size limits

The schema-inference endpoint (POST /v3/infer-schema) caps uploads at 20 MB. Workflow calls accept larger files; if you hit a size error in practice, contact support — the limit varies by plan.
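A client-side check can catch oversized uploads before they reach the schema-inference endpoint. This is a minimal sketch: the function names are hypothetical, and reading "20 MB" as 20 × 1000 × 1000 bytes is an assumption (the stricter of the two common interpretations).

```python
import os

# Assumption: 20 MB means 20 * 1000 * 1000 bytes. If the service counts
# in MiB, this check is slightly stricter than necessary.
INFER_SCHEMA_MAX_BYTES = 20 * 1000 * 1000


def fits_infer_schema(size_bytes: int) -> bool:
    """Return True when a payload of this size is within the 20 MB cap."""
    return size_bytes <= INFER_SCHEMA_MAX_BYTES


def file_fits_infer_schema(path: str) -> bool:
    """Check a file on disk against the cap without reading its contents."""
    return fits_infer_schema(os.path.getsize(path))
```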

When the input doesn't match

Sending an inputType that doesn't match the actual file format (for example, inputType: "pdf" with a JPEG body) returns 400 Bad Request with a message that names the mismatched type. Always set inputType from the source-of-truth extension or MIME type, not from a guess.
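One way to avoid guessing is to derive inputType from the file's MIME type and to sanity-check the leading bytes before sending. The sketch below uses the MIME mappings from the tables on this page; the helper names are hypothetical, and the signature check covers only PDF, JPEG, and PNG, whose magic bytes are well known. It is a client-side convenience, not a description of how bem validates files.

```python
from typing import Optional

# MIME type -> inputType, taken from the tables on this page.
MIME_TO_INPUT_TYPE = {
    "application/pdf": "pdf",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "message/rfc822": "email",
    "text/plain": "text",
    "image/jpeg": "jpeg",
    "image/png": "png",
    "image/webp": "webp",
    "image/heic": "heic",
    "image/heif": "heif",
    "text/csv": "csv",
    "application/vnd.ms-excel": "xls",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "xlsx",
    "application/json": "json",
    "application/xml": "xml",
    "text/xml": "xml",
    "text/html": "html",
    "audio/mpeg": "mp3",
    "audio/wav": "wav",
    "audio/mp4": "m4a",
    "video/mp4": "mp4",
}

# Leading bytes ("magic numbers") for a few binary formats.
SIGNATURES = {
    "pdf": b"%PDF-",
    "jpeg": b"\xff\xd8\xff",
    "png": b"\x89PNG\r\n\x1a\n",
}


def input_type_for_mime(mime: str) -> Optional[str]:
    """Map a MIME type to an inputType, or None if it is not in the tables."""
    # Ignore parameters such as '; charset=utf-8' and normalise case.
    return MIME_TO_INPUT_TYPE.get(mime.split(";", 1)[0].strip().lower())


def sniff_input_type(head: bytes) -> Optional[str]:
    """Guess the inputType from the first bytes, for the formats in SIGNATURES."""
    for input_type, magic in SIGNATURES.items():
        if head.startswith(magic):
            return input_type
    return None


def looks_mismatched(declared: str, head: bytes) -> bool:
    """True when the leading bytes clearly belong to a different known type."""
    sniffed = sniff_input_type(head)
    return sniffed is not None and sniffed != declared
```

For example, looks_mismatched("pdf", data[:16]) flags a JPEG body declared as pdf before the request is sent. Bytes that match no known signature are not flagged, so the check only catches the clear-cut cases.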

For unsupported file types (anything not in the tables above) there is no automatic conversion. Convert client-side first (e.g. .doc to .docx), or split the workload into a pre-processing step that produces a supported format.
