Infer Schema from File

Analyze a file and infer a JSON Schema from its contents.

Accepts a file via multipart form upload and uses Gemini to analyze the document, returning a description of its contents, an inferred JSON Schema capturing all extractable fields, and document classification metadata.

The returned schema is designed to be reusable across many similar documents of the same type, not just the specific file uploaded. It can be used directly as the outputSchema when creating a Transform function.

The endpoint also detects whether the file contains multiple bundled documents and classifies the content nature (textual, visual, audio, video, or mixed).

Supported file types

PDF, PNG, JPEG, HEIC, HEIF, WebP, CSV, XLS, XLSX, DOCX, JSON, HTML, XML, EML, plain text, WAV, MP3, M4A, MP4.

File size limit

Maximum file size is 20 MB.
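The two constraints above can be checked on the client before uploading. The following is a minimal sketch, not part of the API: the function name `preflight_problems` is hypothetical, the extension set is an assumed mapping from the listed file types (for example `.jpg`/`.jpeg` for JPEG and `.txt` for plain text), and 20 MB is assumed to mean 20 * 1024 * 1024 bytes.

```python
from pathlib import Path
from typing import List

# Assumption: these extensions correspond to the supported types listed above.
# The service may identify files by content rather than by extension.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".png", ".jpg", ".jpeg", ".heic", ".heif", ".webp",
    ".csv", ".xls", ".xlsx", ".docx", ".json", ".html", ".xml",
    ".eml", ".txt", ".wav", ".mp3", ".m4a", ".mp4",
}

# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 20 * 1024 * 1024


def preflight_problems(filename: str, size_bytes: int) -> List[str]:
    """Return the reasons an upload would likely be rejected (empty if none)."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(
            f"file is {size_bytes} bytes, over the {MAX_FILE_BYTES}-byte limit"
        )
    return problems
```

An empty list means the file passes both local checks; the server remains the authority on what it accepts.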

Examples

Using curl:

curl -X POST https://api.bem.ai/v3/infer-schema \
  -H "x-api-key: YOUR_API_KEY" \
  -F "file=@invoice.pdf"

Using the Bem CLI:

bem infer-schema create --file @invoice.pdf
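The same request as the curl example can be sketched in Python using only the standard library. The URL, the `x-api-key` header, and the `file` form field come from this page; the helper names `build_multipart` and `infer_schema` are hypothetical, and the per-part content type guessed from the file name is an assumption about what the server accepts.

```python
import json
import mimetypes
import os
import urllib.request
import uuid

API_URL = "https://api.bem.ai/v3/infer-schema"


def build_multipart(field: str, filename: str, data: bytes, boundary: str = ""):
    """Encode a single file as a multipart/form-data body.

    Returns (body_bytes, value_for_the_Content-Type_header).
    """
    boundary = boundary or uuid.uuid4().hex
    # Assumption: a content type guessed from the file name is acceptable.
    part_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {part_type}\r\n"
        "\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def infer_schema(path: str, api_key: str) -> dict:
    """Upload the file at `path` and return the decoded JSON response."""
    with open(path, "rb") as fh:
        data = fh.read()
    body, content_type = build_multipart("file", os.path.basename(path), data)
    request = urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"x-api-key": api_key, "Content-Type": content_type},
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `infer_schema("invoice.pdf", "YOUR_API_KEY")` would send the request; `build_multipart` is kept separate so the encoding can be inspected without network access.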

Endpoint

POST /v3/infer-schema

Authentication

x-api-key: <token>

Authenticate with an API key sent in the x-api-key request header.

Request Body

multipart/form-data

file (required)

The file to analyze and infer a JSON schema from.

Response Body

application/json

Successful response:

{
  "filename": "string",
  "analysis": {
    "fileName": "string",
    "contentType": "string",
    "sizeBytes": 0,
    "fileType": "string",
    "description": "string",
    "schema": {},
    "isMultiDocument": true,
    "documentTypes": [
      {
        "name": "string",
        "count": 0,
        "description": "string"
      }
    ],
    "contentNature": "string"
  }
}

Error response:

{
  "message": "string",
  "code": 0,
  "details": {}
}
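
The response shapes above can be consumed along these lines. This is a sketch under assumptions: the helper names `summarize_analysis` and `format_error` are hypothetical, the field names are taken from the example bodies, and every field is treated as optional because this page does not state which ones are always present.

```python
from typing import Any, Dict


def summarize_analysis(response: Dict[str, Any]) -> Dict[str, Any]:
    """Pull the commonly needed fields out of a successful response."""
    analysis = response.get("analysis") or {}
    doc_types = analysis.get("documentTypes") or []
    return {
        # The inferred JSON Schema, usable as the outputSchema of a Transform function.
        "schema": analysis.get("schema") or {},
        "is_multi_document": bool(analysis.get("isMultiDocument")),
        "content_nature": analysis.get("contentNature"),
        # Map each detected document type name to its reported count.
        "document_counts": {d.get("name"): d.get("count", 0) for d in doc_types},
    }


def format_error(body: Dict[str, Any]) -> str:
    """Render an error response body as a single log-friendly line."""
    return f"{body.get('code')}: {body.get('message')}"
```

Reading fields with `.get` keeps the helpers from raising if the service omits a field.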

See also