Polling and Retries

Synchronous waits, polling cadence, idempotency, and how to retry safely

Workflow calls run asynchronously by default. There are three ways to find out when one finishes:

  1. Synchronous wait — pass wait=true and bem holds the response open for up to 30 seconds.
  2. Polling — call GET /v3/calls/{callID} until status is terminal.
  3. Webhooks — subscribe a function to a URL and receive event deliveries (see Webhooks).

Use whichever fits your workload. Synchronous waits are the simplest for interactive flows; webhooks are the right choice for high-volume backends; polling is the universal fallback.

Synchronous waits (wait=true)

wait=true on POST /v3/workflows/{workflowName}/call holds the response open for up to 30 seconds. If the call finishes inside that window, you get 200 OK (or 500 on failure) with the final result; if it doesn't, you get 202 Accepted with the in-progress call object, and you fall back to polling or wait for the webhook.

For the full contract — request shape, latency expectations, language-by-language access patterns, HTTP-client timeout configuration, and the production patterns that combine sync mode with webhooks — see Synchronous Mode.
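In code, the branch on the response status reduces to a small helper (a sketch against the contract above; the function name is illustrative):

```python
def next_action(status_code: int) -> str:
    """Decide what to do after a POST .../call?wait=true response.

    200 or 500 carry the final result (success or failure);
    202 means the 30s window elapsed and the call is still running.
    """
    if status_code in (200, 500):
        return "done"  # final call object is in the body
    if status_code == 202:
        return "poll"  # fall back to GET /v3/calls/{callID} or the webhook
    raise ValueError(f"unexpected status {status_code}")
```

The same dispatch works whether you poll afterwards or simply wait for the webhook delivery.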

Polling

GET /v3/calls/{callID} returns the current state of any call. The status field is the one to switch on:

| Status | Terminal? | What it means |
|---|---|---|
| `pending` | No | Queued, not yet picked up by a worker |
| `running` | No | At least one node is executing |
| `completed` | Yes | Every terminal node finished without an error event |
| `failed` | Yes | One or more terminal nodes produced an error event |

Recommended cadence:

  • Initial wait: 500ms–1s. Most simple workflows finish well under 5s.
  • Backoff: double after each unsuccessful poll, with jitter, capping at ~10s. A capped exponential of 0.5, 1, 2, 4, 8, 10, 10, 10, … is a reasonable default.
  • Deadline: pick one based on your workflow's expected runtime. Multi-step workflows with split/extract chains can run for tens of seconds; OCR-heavy pages can take minutes. If you don't have a deadline, fall back to webhooks.
A minimal polling loop:

import time
from bem import Bem

client = Bem()
call_id = "wc_abc123"

delay = 0.5
deadline = time.time() + 120  # 2-minute deadline

while True:
    call = client.calls.retrieve(call_id).call
    if call.status in ("completed", "failed"):
        break
    if time.time() > deadline:
        raise TimeoutError(f"call {call_id} did not finish in time")
    time.sleep(delay)
    delay = min(delay * 2, 10)
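The loop above doubles a fixed delay with no jitter; the jittered, capped schedule from the cadence list can be sketched as a generator (names are illustrative):

```python
import random

def backoff_delays(initial=0.5, cap=10.0, factor=2.0):
    """Yield capped-exponential poll delays with full jitter.

    The uncapped schedule is 0.5, 1, 2, 4, 8, 10, 10, ...;
    each emitted delay is drawn uniformly from [0, capped_value]
    so concurrent pollers don't synchronise their requests.
    """
    delay = initial
    while True:
        yield random.uniform(0, delay)
        delay = min(delay * factor, cap)
```

Usage: `for pause in backoff_delays(): time.sleep(pause); ...` with the same terminal-status and deadline checks as the loop above.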

For per-node visibility (which node ran, which event it emitted, why a particular node failed), fetch the trace at GET /v3/calls/{callID}/trace. The trace is incremental — it grows as the call progresses, so it's also pollable mid-execution.
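As an illustration, assuming each trace event exposes a node name and an event type (a hypothetical shape — check the trace schema in the API reference), collecting failed nodes mid-poll might look like:

```python
def failed_nodes(trace_events):
    """Return the names of nodes that emitted an error event.

    Assumes each event carries `nodeName` and `eventType` keys;
    the real trace schema may differ, so verify against the docs.
    """
    return [e["nodeName"] for e in trace_events
            if e.get("eventType") == "error"]
```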

Idempotency via callReferenceID

callReferenceID is your client-side deduplication key. When you submit the same callReferenceID against the same workflow within a short retention window, bem returns the existing call instead of creating a new one — safe to retry network failures without producing duplicates.

import Bem from "bem-ai-sdk";

const client = new Bem();

// Retrying this exact request with the same callReferenceID is safe.
const { call } = await client.workflows.call("invoice-processing", {
  callReferenceID: `invoice:${invoiceID}`,
  input: { singleFile: { inputContent, inputType: "pdf" } },
  wait: true,
});

Pick a callReferenceID that's deterministic from your domain — the invoice ID, the document UUID, the user-and-email-and-timestamp tuple — not a random string. Random IDs defeat the deduplication.

If you don't pass a callReferenceID, every retry creates a new call. The call objects are cheap, but you'll process the same input multiple times and your downstream systems will see duplicate events.
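When an input has no natural identifier, hashing the content itself gives a deterministic key (a sketch; the `doc:` prefix and function name are illustrative):

```python
import hashlib

def reference_id(workflow_input: bytes, prefix: str = "doc") -> str:
    """Derive a stable callReferenceID from the raw input bytes.

    The same bytes always map to the same key, so network retries
    (and accidental double-submits of identical content) deduplicate.
    """
    digest = hashlib.sha256(workflow_input).hexdigest()[:32]
    return f"{prefix}:{digest}"
```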

Network and server-side retries

| Status | Retry? | How |
|---|---|---|
| 429 Too Many Requests | Yes | Honour `Retry-After` if set; otherwise exponential backoff. |
| 500/502/503/504 | Yes | Exponential backoff with jitter, max 5 attempts. |
| 408 Request Timeout | Yes | Same as 5xx. |
| 400/401/404/422 | No | The request itself needs to change. |

The official SDKs implement these defaults — you only need to add explicit retry logic if you're using fetch or requests directly. See Errors and status codes for the full breakdown.
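If you do call the API with a bare HTTP client, the table above translates into roughly this policy (a sketch; the attempt limit and delay cap are assumptions):

```python
import random

RETRYABLE = {408, 429, 500, 502, 503, 504}

def retry_delay(status, attempt, retry_after=None,
                base=0.5, cap=30.0, max_attempts=5):
    """Return seconds to sleep before retrying, or None to give up.

    Honours Retry-After on 429 when the server sends one; otherwise
    exponential backoff with full jitter. Client errors other than
    408/429 are never retried, since the request itself must change.
    """
    if status not in RETRYABLE or attempt >= max_attempts:
        return None
    if status == 429 and retry_after is not None:
        return float(retry_after)
    return random.uniform(0, min(base * 2 ** attempt, cap))
```

Combine this with a deterministic callReferenceID so that retried submissions deduplicate server-side.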

When to use which

| Pattern | Use when | Watch out for |
|---|---|---|
| `wait=true` | Interactive UIs, scripts, single-shot extracts that finish in seconds | The 30s ceiling — fall back to polling on 202 |
| Polling | Batch jobs, CI workflows, simple long-running scripts | Don't poll faster than ~2 calls/sec; honour rate limits |
| Webhooks | Production backends, multi-tenant systems, anything where you'd otherwise burn polling traffic | Set up signature verification before going live (see Webhooks) |
