
Usage

Three ways to use the SDK. Pick the one that fits your code.

beval.log(...) — direct logging

The lowest-level API. Call it from anywhere.

import beval

beval.init()

beval.log(
    kind="llm",
    model_id="gpt-4o-mini",
    input="What is the capital of France?",
    output="Paris.",
    latency_ms=312,
    tokens_in=7,
    tokens_out=2,
)

All arguments are optional; include only the fields you want to see in the dashboard. The full list is in the API Reference.

Capturing latency and errors

from time import perf_counter

t0 = perf_counter()
try:
    output = call_my_llm(prompt)
    beval.log(
        kind="llm",
        input=prompt,
        output=output,
        model_id="my-model",
        latency_ms=int((perf_counter() - t0) * 1000),
        status="success",
    )
except Exception as e:
    beval.log(
        kind="llm",
        input=prompt,
        model_id="my-model",
        latency_ms=int((perf_counter() - t0) * 1000),
        status="failure",
        error_message=f"{type(e).__name__}: {e}",
    )
    raise

beval.wrap(client) — auto-instrument OpenAI or Anthropic

A one-line change. Every LLM call made through the wrapped client is logged.

OpenAI

import beval
from openai import OpenAI

beval.init()
client = beval.wrap(OpenAI())

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello."}],
)

Captured automatically:

  • Input messages (serialized as role: content lines)
  • Output text
  • Model name
  • Token usage (prompt_tokens, completion_tokens)
  • Latency
  • Exceptions (logged with status="failure", then re-raised)
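As a rough sketch of that message serialization (the exact format is internal to the wrapper; `serialize_messages` is a hypothetical helper, not part of the SDK), each message becomes one role: content line:

```python
def serialize_messages(messages: list[dict]) -> str:
    # Approximates the wrapper's input field: one "role: content" line per message.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

serialize_messages([
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hello."},
])
# "system: Be brief.\nuser: Hello."
```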

Anthropic

import beval
from anthropic import Anthropic

beval.init()
client = beval.wrap(Anthropic())

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hello."}],
)

Same fields captured, plus the system prompt is prepended to the input.

What’s not yet supported

  • Streaming responses (stream=True) — in 0.1 the wrappers pass streaming calls through without logging them. See Changelog for status.
  • Async clients — the same limitation; support is planned for a minor release.
  • Tool / function calling metadata — captured in extra for OpenAI, not yet for Anthropic.

For these cases, fall back to beval.log(...) directly.
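For example, a streamed OpenAI response can be logged manually once the stream is drained. This is a sketch, not SDK code: `stream_and_log` is a hypothetical helper, and the chunk handling assumes the standard chat-completions streaming shape.

```python
import beval
from time import perf_counter
from openai import OpenAI

beval.init()
client = OpenAI()  # unwrapped: streaming bypasses the wrapper in 0.1

def stream_and_log(prompt: str, model: str = "gpt-4o-mini") -> str:
    t0 = perf_counter()
    pieces = []
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Delta content can be None (e.g. role-only or final chunks); skip those.
        if chunk.choices and chunk.choices[0].delta.content:
            pieces.append(chunk.choices[0].delta.content)
    output = "".join(pieces)
    beval.log(
        kind="llm",
        model_id=model,
        input=prompt,
        output=output,
        latency_ms=int((perf_counter() - t0) * 1000),
    )
    return output
```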

@beval.trace — decorate agent functions

Wraps any function (sync or async) as an agent turn.

import beval

beval.init()

@beval.trace
def run_agent(query: str) -> str:
    # ... your agent logic ...
    return answer

result = run_agent("Plan my week")

Captures:

  • Arguments (as JSON)
  • Return value (as JSON, truncated to 4 KB)
  • Latency
  • Exceptions (logged as status="failure", then re-raised)

With arguments

@beval.trace(name="tool:search", kind="agent")
async def search(q: str) -> list[dict]:
    ...
  • name — overrides the default (module.qualname). Good for grouping in the dashboard.
  • kind — defaults to "agent"; override to any log kind.
  • capture_args — set False to skip argument capture (e.g. for functions with huge inputs).
  • capture_return — set False to skip return-value capture.
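For instance, skipping argument capture for a function that takes large inputs (the names here are illustrative, not SDK conventions):

```python
@beval.trace(name="tool:summarize", capture_args=False)
def summarize(document: str) -> str:
    # `document` may be huge; skip argument capture but keep the return value.
    return document[:200]
```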

Async support

@beval.trace
async def my_async_tool(x):
    ...

Detected automatically — no separate API.

VLM — attaching images

Pass image= as bytes, base64 string, or a data: URL. The log is promoted to kind="vlm" and the image appears in the dashboard drawer.

with open("screenshot.png", "rb") as f:
    beval.log(
        input="What's in this image?",
        output="A login screen.",
        model_id="gpt-4o",
        image=f.read(),
        image_mime="image/png",
    )

For images larger than ~256 KB, consider logging the reference / URL in extra instead of inlining — base64 in JSON is expensive. Direct-to-S3 upload is on the roadmap.

Mixing approaches

All three APIs share the same background queue and config. You can use them together:

beval.init()
client = beval.wrap(OpenAI())  # wraps all client.chat.completions.create calls

@beval.trace  # logs as kind="agent"
def answer(q: str) -> str:
    # This LLM call is logged by the wrapper as kind="llm"
    resp = client.chat.completions.create(...)
    # And this manual log is recorded as kind="embedding"
    beval.log(kind="embedding", input=q, ...)
    return resp.choices[0].message.content

A single agent invocation produces multiple logs — one agent per @trace, one llm per wrapped call, one embedding per explicit log(). Nested trace support (one parent span with children) is on the roadmap — see Changelog.
