Reliability & Performance
What the SDK does under stress, and what it costs you.
Performance budget
Per `beval.log(...)` call, in your hot path:
| Step | Cost |
|---|---|
| Validate + normalize fields | ~5 μs |
| JSON-encode payload | ~50 μs – 1 ms (depends on input size) |
| Enqueue onto in-memory queue | ~2 μs |
| Total blocking cost | < 1 ms typical |
The HTTP POST happens on a separate thread and does not block your code.
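The blocking/background split above can be sketched with a stand-in client. Everything here (`SketchClient`, its methods) is illustrative, not the SDK's actual internals:

```python
import json
import queue
import threading
import time

class SketchClient:
    """Toy model of the pipeline: validate/encode/enqueue in the caller's
    thread; the network send happens on a background worker."""

    def __init__(self, max_queue_size=10_000):
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, **fields):
        # Blocking portion only: JSON encode + enqueue.
        start = time.perf_counter()
        payload = json.dumps(fields)        # ~50 us - 1 ms, size-dependent
        self._queue.put_nowait(payload)     # ~2 us
        return time.perf_counter() - start  # total blocking cost

    def _drain(self):
        while True:
            self._queue.get()               # the HTTP POST would happen here
            self._queue.task_done()

client = SketchClient()
blocked = client.log(event="completion", tokens=128)
print(blocked < 0.01)  # → True: blocking cost is far under the 1 ms budget
```

The point of the shape: nothing in `log()` touches the network, so the caller only ever pays for serialization and a lock-protected enqueue.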
Trace decorator overhead
`@beval.trace` adds one `perf_counter()` pair and one JSON-encode of args + return. For most functions that’s microseconds; for functions with huge inputs or outputs, serialization can take milliseconds. Set `capture_args=False` or `capture_return=False` to skip it.
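The cost structure described above can be sketched as a minimal decorator. This is an illustration of where the time goes, not the SDK's implementation:

```python
import functools
import json
import time

def trace(capture_args=True, capture_return=True):
    """Sketch: one perf_counter() pair plus optional JSON encoding of
    args and return value (the potentially expensive part)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            record = {"name": fn.__name__,
                      "elapsed_s": time.perf_counter() - start}
            if capture_args:    # skipping this avoids serializing huge inputs
                record["args"] = json.dumps([args, kwargs], default=str)
            if capture_return:  # likewise for huge outputs
                record["return"] = json.dumps(result, default=str)
            # A real client would enqueue `record` here.
            return result
        return wrapper
    return decorator

@trace(capture_args=False)  # huge inputs: skip arg serialization
def add(a, b):
    return a + b

print(add(2, 3))  # → 5
```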
Failure modes
Gateway is down
- Queue fills up in memory.
- Each queued log is retried up to `max_retries` (default 3) with exponential backoff.
- After retries are exhausted, the log is dropped and a warning goes to the `beval` Python logger.
- Your code is unaffected.
Gateway is slow
- Same as “down” but with some throughput. The queue grows until capacity (`max_queue_size`, default 10 000).
- On overflow, new logs are dropped and a warning is emitted. Already-queued logs are preserved.
- Your code is unaffected.
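The overflow behavior (new logs dropped, queued logs preserved) falls out of a bounded thread-safe queue. A minimal sketch, with a tiny `maxsize` standing in for `max_queue_size`:

```python
import queue

q = queue.Queue(maxsize=3)  # stand-in for max_queue_size (default 10 000)

dropped = 0
for i in range(5):
    try:
        q.put_nowait(f"log-{i}")  # enqueue without ever blocking the caller
    except queue.Full:
        dropped += 1              # new log dropped; a warning would be emitted

print(q.qsize(), dropped)  # → 3 2: queued logs preserved, overflow dropped
```

`put_nowait` is what keeps your code unaffected: a full queue raises `queue.Full` immediately instead of blocking your hot path.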
Your code throws during beval.log()
Shouldn’t happen — the SDK catches its own bugs in broad `try/except` blocks. But if it did, the exception would surface in your code. Report it, and fall back to wrapping your call in `try/except`.
Your redact function throws
The SDK catches it, emits a warning, and ships the unredacted payload. Test your redact function. See Configuration → Redaction.
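The documented fallback (catch, warn, ship unredacted) looks roughly like this. `apply_redact` and `bad_redact` are hypothetical names for illustration:

```python
import logging

logger = logging.getLogger("beval")

def apply_redact(redact, payload):
    """Sketch of the documented behavior: a throwing redact function is
    caught, a warning is emitted, and the unredacted payload ships as-is."""
    try:
        return redact(payload)
    except Exception:
        logger.warning("redact function raised; shipping unredacted payload")
        return payload

def bad_redact(payload):
    raise KeyError("missing field")  # a buggy user-supplied redactor

print(apply_redact(bad_redact, {"user": "alice"}))  # → {'user': 'alice'}
```

Note the consequence: a buggy redactor fails open, so sensitive fields leak to the gateway. That is why testing your redact function matters.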
Process crash / SIGKILL
Logs in the in-memory queue at crash time are lost. The SDK is at-least-once over the network, not durable. If you need durability, don’t use this SDK for that signal — ship to a message queue instead.
Graceful shutdown
On interpreter exit, an `atexit` hook calls `shutdown()`, which:
- Waits up to 5 seconds for the queue to drain.
- Closes the HTTP client.
For long-running servers (FastAPI, Django, Celery), this Just Works. For scripts and notebooks, call `beval.flush()` explicitly before exit if you care about the last few logs.
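A sketch of the drain-with-deadline behavior, assuming a polling loop (the SDK may well wait on the worker thread instead; the names here are illustrative):

```python
import atexit
import queue
import time

q = queue.Queue()  # stand-in for the SDK's in-memory log queue

def flush(timeout=5.0):
    """Wait up to `timeout` seconds for the queue to drain, then give up
    and leave any remaining logs queued."""
    deadline = time.monotonic() + timeout
    while not q.empty() and time.monotonic() < deadline:
        time.sleep(0.05)  # a real client would join on the worker instead
    return q.empty()

atexit.register(flush)  # long-running servers get this hook for free

q.put("last log")
q.get()                  # the background worker ships the log...
drained = flush(1.0)     # ...so the drain completes immediately
print(drained)  # → True
```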
Network retry contract
The SDK retries on:
- 408 Request Timeout
- 429 Too Many Requests
- 5xx server errors
- Any `httpx.HTTPError` (connection refused, DNS failure, TLS error, read timeout)
Backoff: `min(2^attempt * 0.25, 5.0)` seconds. Max retries: `max_retries` (default 3).
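The backoff formula, written out as code so you can see the delay schedule:

```python
def backoff_delay(attempt, base=0.25, cap=5.0):
    """min(2^attempt * 0.25, 5.0): exponential growth with a 5 s ceiling."""
    return min(2 ** attempt * base, cap)

# Delays for attempts 0..5: the cap kicks in at attempt 5.
print([backoff_delay(a) for a in range(6)])
# → [0.25, 0.5, 1.0, 2.0, 4.0, 5.0]
```

With the default `max_retries=3`, a log waits at most 0.25 + 0.5 + 1.0 = 1.75 seconds in backoff before being dropped.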
Not retried:
- 400 Bad Request (malformed payload; warning logged, log dropped)
- 401/403 (bad API key; warning logged, log dropped)
- Other 4xx (warning logged, log dropped)
Concurrency model
- One background thread per `BevalClient` drains the queue.
- One `httpx.Client` per client, reusing connections to the gateway.
- The queue is thread-safe (`queue.Queue`). Multiple producer threads are fine.
- The SDK is async-safe: `beval.log()` from inside an `asyncio` coroutine works, though the enqueue + serialize is synchronous.
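The many-producers, one-consumer shape above is plain `queue.Queue` usage. A minimal sketch (the sentinel-based shutdown is one common pattern, not necessarily what the SDK does):

```python
import queue
import threading

q = queue.Queue()   # thread-safe; any number of producer threads is fine
received = []

def drain():
    """Single consumer thread, like the SDK's background worker."""
    while True:
        item = q.get()
        if item is None:     # sentinel: stop the worker
            break
        received.append(item)
        q.task_done()

worker = threading.Thread(target=drain)
worker.start()

# Four producer threads enqueue concurrently, like request handlers calling log().
producers = [threading.Thread(target=q.put, args=(f"log-{i}",)) for i in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()

q.put(None)     # FIFO order guarantees the sentinel arrives last
worker.join()
print(sorted(received))  # → ['log-0', 'log-1', 'log-2', 'log-3']
```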
Memory footprint
- Baseline (no queued logs): ~200 KB.
- Per queued log: ~2–20 KB depending on payload size.
- At max queue (10 000 logs × 10 KB avg): ~100 MB held during outage.
Tune `max_queue_size` if your hosts have tight memory budgets.
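The worst-case figure is just capacity times average payload, which makes the tuning arithmetic easy to run for your own sizes (`outage_memory_mb` is a hypothetical helper, not part of the SDK):

```python
def outage_memory_mb(max_queue_size=10_000, avg_payload_kb=10):
    """Worst-case memory held during a gateway outage:
    queue capacity x average payload size."""
    return max_queue_size * avg_payload_kb / 1024

print(outage_memory_mb())       # → 97.65625  (~100 MB at the defaults)
print(outage_memory_mb(2_000))  # → 19.53125  (~20 MB with a tighter cap)
```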
Fork safety
The SDK is not fork-safe by default. If your process forks workers (Celery prefork, gunicorn with --preload, some uwsgi modes):
- The background thread started by `init()` runs in the parent only.
- Forked children inherit the `_CLIENT` reference in memory but have no worker thread.
- Logs enqueued in children pile up in a dead queue.
Fix: call `init()` from a post-fork hook in each child. For Celery:
```python
from celery.signals import worker_process_init

import beval

@worker_process_init.connect
def _init_beval(**_):
    beval.init()
```

For gunicorn `--preload`, use `post_fork` in `gunicorn.conf.py`:

```python
def post_fork(server, worker):
    import beval
    beval.init()
```

See Integrations → Celery for more.
When to care about flush()
In most servers: never. `atexit` handles it.
You might want an explicit `flush()` when:
- Running batch scripts that exit quickly after work.
- Running tests that assert against dashboard state.
- Shipping from notebooks where the kernel may be killed abruptly.
- Flushing before a known-long pause (e.g. waiting on a human prompt).
```python
beval.flush(timeout=5.0)  # waits up to 5 seconds
```

Past the timeout, queued logs stay queued — they’ll either ship on the next tick or be lost at shutdown.