Reliability & Performance
What the SDK does under stress, and what it costs you.
Performance budget
Per `beval.log(...)` call, in your hot path:
| Step | Cost |
|---|---|
| Validate + normalize fields | ~5 μs |
| JSON-encode payload | ~50 μs – 1 ms (depends on input size) |
| Enqueue onto in-memory queue | ~2 μs |
| Total blocking cost | < 1 ms typical |
The HTTP POST happens on a separate thread and does not block your code.
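The blocking/background split above can be sketched with a stand-in client. Everything here (`SketchClient`, its methods) is illustrative, not the SDK's actual internals:

```python
import json
import queue
import threading
import time

class SketchClient:
    """Toy model of the pipeline: validate/encode/enqueue in the caller's
    thread; the network send happens on a background worker."""

    def __init__(self, max_queue_size=10_000):
        self._queue = queue.Queue(maxsize=max_queue_size)
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def log(self, **fields):
        # Blocking portion only: JSON encode + enqueue.
        start = time.perf_counter()
        payload = json.dumps(fields)        # ~50 us - 1 ms, size-dependent
        self._queue.put_nowait(payload)     # ~2 us
        return time.perf_counter() - start  # total blocking cost

    def _drain(self):
        while True:
            self._queue.get()               # the HTTP POST would happen here
            self._queue.task_done()

client = SketchClient()
blocked = client.log(event="completion", tokens=128)
print(blocked < 0.01)  # → True: blocking cost is far under the 1 ms budget
```

The point of the shape: nothing in `log()` touches the network, so the caller only ever pays for serialization and a lock-protected enqueue.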
Trace decorator overhead
`@beval.trace` adds one `perf_counter()` pair and one JSON-encode of args + return. For most functions that’s microseconds; for functions with huge inputs or outputs, serialization can take milliseconds. Set `capture_args=False` or `capture_return=False` to skip it.
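The cost structure described above can be sketched as a minimal decorator. This is an illustration of where the time goes, not the SDK's implementation:

```python
import functools
import json
import time

def trace(capture_args=True, capture_return=True):
    """Sketch: one perf_counter() pair plus optional JSON encoding of
    args and return value (the potentially expensive part)."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            record = {"name": fn.__name__,
                      "elapsed_s": time.perf_counter() - start}
            if capture_args:    # skipping this avoids serializing huge inputs
                record["args"] = json.dumps([args, kwargs], default=str)
            if capture_return:  # likewise for huge outputs
                record["return"] = json.dumps(result, default=str)
            # A real client would enqueue `record` here.
            return result
        return wrapper
    return decorator

@trace(capture_args=False)  # huge inputs: skip arg serialization
def add(a, b):
    return a + b

print(add(2, 3))  # → 5
```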
Failure modes
Gateway is down
- Queue fills up in memory.
- Each queued log is retried up to `max_retries` (default 3) with exponential backoff.
- After retries are exhausted, the log is dropped and a warning goes to the `beval` Python logger.
- Your code is unaffected.
Gateway is slow
- Same as “down” but with some throughput. The queue grows until capacity (`max_queue_size`, default 10 000).
- On overflow, new logs are dropped and a warning is emitted. Already-queued logs are preserved.
- Your code is unaffected.
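The overflow behavior (new logs dropped, queued logs preserved) falls out of a bounded thread-safe queue. A minimal sketch, with a tiny `maxsize` standing in for `max_queue_size`:

```python
import queue

q = queue.Queue(maxsize=3)  # stand-in for max_queue_size (default 10 000)

dropped = 0
for i in range(5):
    try:
        q.put_nowait(f"log-{i}")  # enqueue without ever blocking the caller
    except queue.Full:
        dropped += 1              # new log dropped; a warning would be emitted

print(q.qsize(), dropped)  # → 3 2: queued logs preserved, overflow dropped
```

`put_nowait` is what keeps your code unaffected: a full queue raises `queue.Full` immediately instead of blocking your hot path.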
Your code throws during beval.log()
Shouldn’t happen — the SDK catches its own bugs in broad `try/except` blocks. But if it did, the exception would surface in your code. Report it, and fall back to wrapping your call in `try/except`.
Your redact function throws
The SDK catches it, emits a warning, and ships the unredacted payload. Test your redact function. See Configuration → Redaction.
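The documented fallback (catch, warn, ship unredacted) looks roughly like this. `apply_redact` and `bad_redact` are hypothetical names for illustration:

```python
import logging

logger = logging.getLogger("beval")

def apply_redact(redact, payload):
    """Sketch of the documented behavior: a throwing redact function is
    caught, a warning is emitted, and the unredacted payload ships as-is."""
    try:
        return redact(payload)
    except Exception:
        logger.warning("redact function raised; shipping unredacted payload")
        return payload

def bad_redact(payload):
    raise KeyError("missing field")  # a buggy user-supplied redactor

print(apply_redact(bad_redact, {"user": "alice"}))  # → {'user': 'alice'}
```

Note the consequence: a buggy redactor fails open, so sensitive fields leak to the gateway. That is why testing your redact function matters.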
Process crash / SIGKILL
Logs in the in-memory queue at crash time are lost. The SDK is at-least-once over the network, not durable. If you need durability, don’t use this SDK for that signal — ship to a message queue instead.
Graceful shutdown
On interpreter exit, an `atexit` hook calls `shutdown()`, which:
- Waits up to 5 seconds for the queue to drain.
- Closes the HTTP client.
For long-running servers (FastAPI, Django, Celery), this Just Works. For scripts and notebooks, call `beval.flush()` explicitly before exit if you care about the last few logs.
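A sketch of the drain-with-deadline behavior, assuming a polling loop (the SDK may well wait on the worker thread instead; the names here are illustrative):

```python
import atexit
import queue
import time

q = queue.Queue()  # stand-in for the SDK's in-memory log queue

def flush(timeout=5.0):
    """Wait up to `timeout` seconds for the queue to drain, then give up
    and leave any remaining logs queued."""
    deadline = time.monotonic() + timeout
    while not q.empty() and time.monotonic() < deadline:
        time.sleep(0.05)  # a real client would join on the worker instead
    return q.empty()

atexit.register(flush)  # long-running servers get this hook for free

q.put("last log")
q.get()                  # the background worker ships the log...
drained = flush(1.0)     # ...so the drain completes immediately
print(drained)  # → True
```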
Network retry contract
The SDK retries on:
- 408 Request Timeout
- 429 Too Many Requests
- 5xx server errors
- Any `httpx.HTTPError` (connection refused, DNS failure, TLS error, read timeout)
Backoff: `min(2^attempt * 0.25, 5.0)` seconds. Max retries: `max_retries` (default 3).
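The backoff formula, written out as code so you can see the delay schedule:

```python
def backoff_delay(attempt, base=0.25, cap=5.0):
    """min(2^attempt * 0.25, 5.0): exponential growth with a 5 s ceiling."""
    return min(2 ** attempt * base, cap)

# Delays for attempts 0..5: the cap kicks in at attempt 5.
print([backoff_delay(a) for a in range(6)])
# → [0.25, 0.5, 1.0, 2.0, 4.0, 5.0]
```

With the default `max_retries=3`, a log waits at most 0.25 + 0.5 + 1.0 = 1.75 seconds in backoff before being dropped.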
Not retried:
- 400 Bad Request (malformed payload; warning logged, log dropped)
- 401/403 (bad API key; warning logged, log dropped)
- Other 4xx (warning logged, log dropped)
Concurrency model
- One background thread per `BevalClient` drains the queue.
- One `httpx.Client` per client, reusing connections to the gateway.
- The queue is thread-safe (`queue.Queue`). Multiple producer threads are fine.
- The SDK is async-safe: `beval.log()` from inside an `asyncio` coroutine works, though the enqueue + serialize is synchronous.
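The many-producers, one-consumer shape above is plain `queue.Queue` usage. A minimal sketch (the sentinel-based shutdown is one common pattern, not necessarily what the SDK does):

```python
import queue
import threading

q = queue.Queue()   # thread-safe; any number of producer threads is fine
received = []

def drain():
    """Single consumer thread, like the SDK's background worker."""
    while True:
        item = q.get()
        if item is None:     # sentinel: stop the worker
            break
        received.append(item)
        q.task_done()

worker = threading.Thread(target=drain)
worker.start()

# Four producer threads enqueue concurrently, like request handlers calling log().
producers = [threading.Thread(target=q.put, args=(f"log-{i}",)) for i in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()

q.put(None)     # FIFO order guarantees the sentinel arrives last
worker.join()
print(sorted(received))  # → ['log-0', 'log-1', 'log-2', 'log-3']
```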
Memory footprint
- Baseline (no queued logs): ~200 KB.
- Per queued log: ~2–20 KB depending on payload size.
- At max queue (10 000 logs × 10 KB avg): ~100 MB held during outage.
Tune `max_queue_size` if your hosts have tight memory budgets.
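The worst-case figure is just capacity times average payload, which makes the tuning arithmetic easy to run for your own sizes (`outage_memory_mb` is a hypothetical helper, not part of the SDK):

```python
def outage_memory_mb(max_queue_size=10_000, avg_payload_kb=10):
    """Worst-case memory held during a gateway outage:
    queue capacity x average payload size."""
    return max_queue_size * avg_payload_kb / 1024

print(outage_memory_mb())       # → 97.65625  (~100 MB at the defaults)
print(outage_memory_mb(2_000))  # → 19.53125  (~20 MB with a tighter cap)
```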
Fork safety
The SDK is not fork-safe by default. If your process forks workers (Celery prefork, gunicorn with --preload, some uwsgi modes):
- The background thread started by `init()` runs in the parent only.
- Forked children inherit the `_CLIENT` reference in memory but have no worker thread.
- Logs enqueued in children pile up in a dead queue.
Fix: call `init()` from a post-fork hook in each child. For Celery:
```python
from celery.signals import worker_process_init

import beval

@worker_process_init.connect
def _init_beval(**_):
    beval.init()
```

For gunicorn `--preload`, use `post_fork` in `gunicorn.conf.py`:

```python
def post_fork(server, worker):
    import beval
    beval.init()
```

See Integrations → Celery for more.
When to care about flush()
In most servers: never. `atexit` handles it.
You might want an explicit `flush()` when:
- Running batch scripts that exit quickly after work.
- Running tests that assert against dashboard state.
- Shipping from notebooks where the kernel may be killed abruptly.
- Flushing before a known-long pause (e.g. waiting on a human prompt).
```python
beval.flush(timeout=5.0)  # waits up to 5 seconds
```

Past the timeout, queued logs stay queued — they’ll either ship on the next tick or be lost at shutdown.