Usage
Three ways to use the SDK. Pick the one that fits your code.
beval.log(...) — direct logging
The lowest-level API. Call it from anywhere.
```python
import beval

beval.init()

beval.log(
    kind="llm",
    model_id="gpt-4o-mini",
    input="What is the capital of France?",
    output="Paris.",
    latency_ms=312,
    tokens_in=7,
    tokens_out=2,
)
```
Every argument is optional — pass only what you want to see in the dashboard. Full list in the API Reference.
Capturing latency and errors
```python
from time import perf_counter

t0 = perf_counter()
try:
    output = call_my_llm(prompt)
    beval.log(
        kind="llm",
        input=prompt,
        output=output,
        model_id="my-model",
        latency_ms=int((perf_counter() - t0) * 1000),
        status="success",
    )
except Exception as e:
    beval.log(
        kind="llm",
        input=prompt,
        model_id="my-model",
        latency_ms=int((perf_counter() - t0) * 1000),
        status="failure",
        error_message=f"{type(e).__name__}: {e}",
    )
    raise
```
beval.wrap(client) — auto-instrument OpenAI or Anthropic
A one-line change. Every LLM call made through the wrapped client is logged.
OpenAI
```python
import beval
from openai import OpenAI

beval.init()
client = beval.wrap(OpenAI())

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello."}],
)
```
Captured automatically:
- Input messages (serialized as `role: content` lines)
- Output text
- Model name
- Token usage (`prompt_tokens`, `completion_tokens`)
- Latency
- Exceptions (logged with `status="failure"`, then re-raised)
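The wrapper pattern itself is small: intercept the `create` call, time it, and forward the response fields to the logger. A minimal sketch of that pattern (illustrative only; `create_fn` and `log_fn` are hypothetical stand-ins, not beval's actual internals, and the response is modeled as a plain dict):

```python
import time

def make_logged_create(create_fn, log_fn):
    """Wrap a chat-completions `create` so each call is timed and logged.

    Illustrative sketch only; beval.wrap's real internals may differ.
    """
    def logged_create(**kwargs):
        t0 = time.perf_counter()
        try:
            resp = create_fn(**kwargs)
        except Exception as e:
            # Log the failure, then re-raise so callers still see it.
            log_fn(status="failure",
                   error=f"{type(e).__name__}: {e}",
                   latency_ms=int((time.perf_counter() - t0) * 1000))
            raise
        log_fn(
            status="success",
            model=resp["model"],
            output=resp["choices"][0]["message"]["content"],
            tokens_in=resp["usage"]["prompt_tokens"],
            tokens_out=resp["usage"]["completion_tokens"],
            latency_ms=int((time.perf_counter() - t0) * 1000),
        )
        return resp
    return logged_create
```

The real wrapper also has to serialize the input messages and leave the client object otherwise untouched, but the timing and log-then-re-raise shape is the same.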
Anthropic
```python
import beval
from anthropic import Anthropic

beval.init()
client = beval.wrap(Anthropic())

resp = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=64,
    messages=[{"role": "user", "content": "Hello."}],
)
```
The same fields are captured, plus the system prompt is prepended to the input.
What’s not yet supported
- Streaming responses (`stream=True`) — wrappers short-circuit on streams in 0.1. See Changelog for status.
- Async clients — same story. Coming in a minor release.
- Tool / function calling metadata — captured in `extra` for OpenAI, not yet for Anthropic.
For these cases, fall back to beval.log(...) directly.
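For streaming specifically, one workaround is a small pass-through generator that yields chunks to the caller while accumulating the full text, then emits a single log when the stream is exhausted. A sketch, assuming plain text chunks and a `log_fn` standing in for `beval.log` (both names are illustrative):

```python
from time import perf_counter

def log_streamed_call(stream, log_fn, *, model_id, prompt):
    """Yield chunks through unchanged; log one record when the stream ends."""
    t0 = perf_counter()
    chunks = []
    for chunk in stream:
        chunks.append(chunk)
        yield chunk
    log_fn(
        kind="llm",
        model_id=model_id,
        input=prompt,
        output="".join(chunks),
        latency_ms=int((perf_counter() - t0) * 1000),
        status="success",
    )
```

Note the latency here covers the whole stream, not time-to-first-token; split the timer if you care about both.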
@beval.trace — decorate agent functions
Wraps any function (sync or async) as an agent turn.
```python
import beval

beval.init()

@beval.trace
def run_agent(query: str) -> str:
    # ... your agent logic ...
    return answer

result = run_agent("Plan my week")
```
Captures:
- Arguments (as JSON)
- Return value (as JSON, truncated to 4 KB)
- Latency
- Exceptions (logged as `status="failure"`, then re-raised)
With arguments
```python
@beval.trace(name="tool:search", kind="agent")
async def search(q: str) -> list[dict]:
    ...
```
- `name` — overrides the default (`module.qualname`). Good for grouping in the dashboard.
- `kind` — defaults to `"agent"`; override to any log kind.
- `capture_args` — set `False` to skip argument capture (e.g. for functions with huge inputs).
- `capture_return` — set `False` to skip return-value capture.
Async support
```python
@beval.trace
async def my_async_tool(x): ...
```
Detected automatically — no separate API.
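The automatic detection usually comes down to `inspect.iscoroutinefunction`: the decorator returns an async wrapper for coroutine functions and a plain one otherwise. A sketch of that dispatch (illustrative, not beval's source; it just prints latency instead of logging):

```python
import functools
import inspect
import time

def trace(func):
    """Pick an async or sync wrapper based on the decorated function."""
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args, **kwargs):
            t0 = time.perf_counter()
            result = await func(*args, **kwargs)
            print(f"{func.__name__}: {int((time.perf_counter() - t0) * 1000)} ms")
            return result
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args, **kwargs):
        t0 = time.perf_counter()
        result = func(*args, **kwargs)
        print(f"{func.__name__}: {int((time.perf_counter() - t0) * 1000)} ms")
        return result
    return sync_wrapper
```

Because the coroutine check happens at decoration time, the wrapped function keeps its async-ness, so `await`, `asyncio.gather`, etc. work unchanged.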
VLM — attaching images
Pass `image=` as bytes, a base64 string, or a `data:` URL. The log is promoted to `kind="vlm"` and the image appears in the dashboard drawer.
```python
with open("screenshot.png", "rb") as f:
    beval.log(
        input="What's in this image?",
        output="A login screen.",
        model_id="gpt-4o",
        image=f.read(),
        image_mime="image/png",
    )
```
For images larger than ~256 KB, consider logging a reference / URL in `extra` instead of inlining — base64 in JSON is expensive. Direct-to-S3 upload is on the roadmap.
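If you want the `data:` URL form rather than raw bytes, the standard-library `base64` module is enough. A small helper (the data-URL shape is standard per RFC 2397; the helper name is just for illustration):

```python
import base64

def to_data_url(image_bytes: bytes, mime: str = "image/png") -> str:
    """Encode raw image bytes as a data: URL (RFC 2397)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime};base64,{b64}"
```

Keep in mind base64 inflates the payload by about a third, so the ~256 KB guidance above applies to the encoded size too.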
Mixing approaches
All three APIs share the same background queue and config. You can use them together:
```python
beval.init()
client = beval.wrap(OpenAI())  # wraps all client.chat.completions.create calls

@beval.trace  # logs as kind="agent"
def answer(q: str) -> str:
    # This LLM call is logged by the wrapper as kind="llm"
    resp = client.chat.completions.create(...)
    # And this manual log is logged as kind="embedding"
    beval.log(kind="embedding", input=q, ...)
    return resp.choices[0].message.content
```
A single agent invocation produces multiple logs — one `agent` per `@trace`, one `llm` per wrapped call, one `embedding` per explicit `log()`. Nested trace support (one parent span with children) is on the roadmap — see Changelog.