Function Accuracy
Overview of the Function Accuracy API
Monitor, evaluate, and iterate on the quality of every function in your environment. Function Accuracy bundles two complementary loops:
Evaluations (/v3/eval)
Trigger and retrieve per-transformation evaluations. Evaluations run
asynchronously and score each transformation's output against the
function's schema for confidence and relevance, with per-field
hallucination detection. Evaluations are supported for extract,
transform, analyze, and join events.
- Trigger: POST /v3/eval queues jobs for a batch of transformation IDs.
- Poll: GET /v3/eval/results returns the current state of each requested ID, partitioned into results, pending, and failed. It accepts either eventIDs (preferred) or transformationIDs as a comma-separated query parameter, and always keys the response by event KSUID.
Up to 100 IDs may be submitted per request.
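The trigger-then-poll flow can be sketched in Python as follows. This is a minimal sketch, not an official client. Only the endpoint paths, the `eventIDs` query parameter, and the 100-ID limit come from this page. The POST body field name `transformationIDs`, the response shape (`results` and `failed` as objects keyed by event KSUID, `pending` as a list of KSUIDs), and the injected `fetch` callable are all assumptions.

```python
import time
from typing import Callable, Dict, Iterator, List, Sequence
from urllib.parse import urlencode

MAX_IDS_PER_REQUEST = 100  # documented per-request limit


def batches(ids: Sequence[str], size: int = MAX_IDS_PER_REQUEST) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs."""
    for start in range(0, len(ids), size):
        yield list(ids[start:start + size])


def trigger_body(transformation_ids: List[str]) -> Dict[str, List[str]]:
    """Body for POST /v3/eval. ASSUMPTION: the field name is hypothetical."""
    if len(transformation_ids) > MAX_IDS_PER_REQUEST:
        raise ValueError("split into batches of at most 100 IDs")
    return {"transformationIDs": transformation_ids}


def results_path(event_ids: List[str]) -> str:
    """Path for GET /v3/eval/results using the preferred eventIDs parameter."""
    return "/v3/eval/results?" + urlencode({"eventIDs": ",".join(event_ids)}, safe=",")


def poll_until_settled(
    fetch: Callable[[str], dict],  # hypothetical: performs the GET, returns parsed JSON
    event_ids: List[str],
    interval_s: float = 2.0,
    max_attempts: int = 30,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Re-poll only still-pending IDs until none remain or attempts run out.

    ASSUMPTION: `results` and `failed` are dicts keyed by event KSUID and
    `pending` is a list of event KSUIDs.
    """
    settled: dict = {"results": {}, "failed": {}}
    remaining = list(event_ids)
    for attempt in range(max_attempts):
        still_pending: List[str] = []
        for chunk in batches(remaining):
            page = fetch(results_path(chunk))
            settled["results"].update(page.get("results", {}))
            settled["failed"].update(page.get("failed", {}))
            still_pending.extend(page.get("pending", []))
        remaining = still_pending
        if not remaining or attempt == max_attempts - 1:
            break
        sleep(interval_s)
    settled["pending"] = remaining  # anything still unsettled after the last attempt
    return settled
```

Re-polling only the IDs still listed as pending keeps each request small. Injecting `fetch` and `sleep` keeps the sketch independent of any particular HTTP client and easy to test.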
Metrics, review, regression, and comparison (/v3/functions/{metrics,review,regression,compare})
Roll evaluation results and user corrections up into actionable function-level signal:
- GET /v3/functions/metrics: aggregate accuracy, precision, recall, F1, and confusion-matrix counts per function.
- POST /v3/functions/review: sample-size estimation, confidence-bucketed distribution, PR-AUC, and per-threshold confidence intervals (Wald or Wilson) for picking review cutoffs.
- POST /v3/functions/regression: replay corrected historical inputs against a new function version, producing a labeled regression dataset.
- POST /v3/functions/regression/corrections: propagate baseline corrections onto the regression dataset so it can be scored.
- POST /v3/functions/compare: compute aggregate and field-level lift between any two versions, optionally scoped to the regression dataset.
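The metrics endpoint reports these values directly. The sketch below only shows how accuracy, precision, recall, and F1 relate to confusion-matrix counts under the standard textbook definitions. It assumes the service uses those definitions and reports 0.0 when a denominator is empty. The `Confusion` type and `metrics` helper are hypothetical and not part of the API.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Confusion:
    """Hypothetical container for one function's confusion-matrix counts."""
    tp: int  # true positives
    fp: int  # false positives
    fn: int  # false negatives
    tn: int  # true negatives


def _ratio(num: int, den: int) -> float:
    # ASSUMPTION: an empty denominator is reported as 0.0.
    return num / den if den else 0.0


def metrics(c: Confusion) -> Dict[str, float]:
    """Standard accuracy, precision, recall, and F1 from confusion counts."""
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
        "precision": _ratio(c.tp, c.tp + c.fp),
        "recall": _ratio(c.tp, c.tp + c.fn),
        # 2TP / (2TP + FP + FN) equals the harmonic mean of precision and recall.
        "f1": _ratio(2 * c.tp, 2 * c.tp + c.fp + c.fn),
    }
```

For example, 80 TP, 10 FP, 20 FN, and 90 TN give accuracy 0.85, precision ≈ 0.889, recall 0.80, and F1 ≈ 0.842.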
All five endpoints support extract events end-to-end on both the
vision and OCR paths, in addition to the legacy transform, analyze,
and join types.
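The review endpoint offers Wald or Wilson confidence intervals per threshold. Below is a minimal Python sketch of the two textbook interval formulas, plus the usual worst-case sample-size estimate for a target margin of error. It is not the endpoint's implementation. The helper names are hypothetical, and the service may apply corrections (for example, a finite-population correction) that this sketch omits.

```python
import math
from statistics import NormalDist
from typing import Tuple


def z_value(confidence: float = 0.95) -> float:
    """Two-sided normal critical value, about 1.96 for 95%."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


def wald_interval(correct: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wald interval p ± z*sqrt(p(1-p)/n), clamped to [0, 1]."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    half = z_value(confidence) * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half), min(1.0, p + half)


def wilson_interval(correct: int, n: int, confidence: float = 0.95) -> Tuple[float, float]:
    """Wilson score interval, which stays informative near 0 and 1."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = z_value(confidence)
    p = correct / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def sample_size(margin: float, confidence: float = 0.95, expected_p: float = 0.5) -> int:
    """Items to review so the interval half-width is about `margin`."""
    z = z_value(confidence)
    return math.ceil(z * z * expected_p * (1 - expected_p) / margin ** 2)
```

With 45 of 50 sampled items correct at a threshold, the two intervals are similar. At 50 of 50, however, the Wald interval collapses to zero width while Wilson still reports a lower bound below 1. That behavior near 0 and 1 is the usual reason to prefer Wilson when choosing cutoffs for high-accuracy functions.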
See also
- System overview — evaluating extraction quality