# Methodology
DB Bench is a continuous benchmarking system that measures cold start, warm latency, query performance, write performance, and branch creation time for two serverless Postgres providers: Neon and Xata. This page documents the testing methodology, known limitations, and design decisions.
## How cold start is measured
Cold start data is gathered passively. Both providers are configured with a 5-minute `scaleToZero` idle timeout. A dedicated cold-start endpoint is scheduled every 15 minutes — well beyond the idle timeout — so both databases should already be hibernated when probed. No in-worker sleeping or active suspension is performed.
Before each probe, we observe the provider's current state via their management API:
- Neon: `GET /endpoints/:id` → `current_state` (expect `idle`).
- Xata: `GET /organizations/:org/projects/:proj/branches/:bid` → `status.lifecycle.phase` and `instanceReadyCount` (expect 0 ready instances or a scaled-down phase).
Each cold-start run collects one sample per provider. The pre-probe state is recorded in sample metadata as `preProbeState` together with a `coldVerified` flag. If the database was unexpectedly warm (e.g., another process woke it), the sample is still recorded but flagged so the dashboard can filter or label it.
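The verification step can be sketched as follows. The field names come from the management-API endpoints above, but the record shapes and the `"scaled-down"` phase string are simplifying assumptions, not the project's actual types:

```typescript
// Sketch: decide whether a cold-start sample is verifiably cold, based on
// the management-API fields described above. Response shapes are simplified.
type NeonState = { current_state: string };                      // e.g. "idle" | "active"
type XataState = { lifecyclePhase: string; instanceReadyCount: number };

interface PreProbeState {
  provider: "neon" | "xata";
  raw: NeonState | XataState;
}

// coldVerified is true only when the provider reports a hibernated state
// before the probe fires.
function isColdVerified(s: PreProbeState): boolean {
  if (s.provider === "neon") {
    return (s.raw as NeonState).current_state === "idle";
  }
  const x = s.raw as XataState;
  return x.instanceReadyCount === 0 || x.lifecyclePhase === "scaled-down";
}
```

Samples failing this check would be recorded with `coldVerified: false` rather than dropped, per the policy above.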
Cold-start data accumulates over time — roughly 96 samples per day per provider at the 15-minute cadence.
## Same driver, fair comparison
Both providers use the same database driver: `@neondatabase/serverless` (HTTP). Xata natively supports the Neon serverless HTTP protocol, so queries are sent as HTTP requests for both — no TCP handshake, no per-connection TLS negotiation. This eliminates protocol asymmetry and gives an apples-to-apples comparison.
Every measurement reports `connect_ms` and `query_ms` separately. With the HTTP driver, `connect_ms` captures client instantiation time (near-zero for both), while `query_ms` captures the full HTTP round-trip including database execution. `total_ms` is the end-to-end measurement.
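A minimal sketch of this timing split, with placeholder `makeClient`/`runQuery` functions standing in for the real driver calls (they are illustrative, not the `@neondatabase/serverless` API):

```typescript
// Sketch of the timing split: connectMs covers client construction,
// queryMs covers the HTTP round-trip plus database execution, and
// totalMs is the end-to-end measurement.
interface Timing {
  connectMs: number;
  queryMs: number;
  totalMs: number;
}

async function timed(
  makeClient: () => unknown,
  runQuery: (client: unknown) => Promise<unknown>,
): Promise<Timing> {
  const t0 = performance.now();
  const client = makeClient();   // near-zero for the HTTP driver
  const t1 = performance.now();
  await runQuery(client);        // full HTTP round-trip + DB execution
  const t2 = performance.now();
  return { connectMs: t1 - t0, queryMs: t2 - t1, totalMs: t2 - t0 };
}
```

By construction `totalMs` equals `connectMs + queryMs`, which is why the dashboard can report the breakdown and the total consistently.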
Both providers use their best available driver from a Cloudflare Worker. The HTTP driver is purpose-built for serverless environments and is recommended by both Neon and Xata for edge/serverless deployments.
## Why Cloudflare Workers
We run benchmarks from Cloudflare Workers rather than a traditional server for several reasons:
- Workers have sub-10ms isolate cold start, so the runtime overhead is negligible compared to database cold start times (typically 100s–1000s of ms).
- `smart_placement` pins the Worker near the target databases in `us-east-1`, minimizing network variance.
- Workers are a common deployment target for applications using serverless Postgres, making the measurements representative of real workloads.
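For reference, Smart Placement is enabled in the Worker's Wrangler configuration. A minimal sketch — the Worker name, entry point, and compatibility date are illustrative, not this project's actual values:

```toml
# wrangler.toml — opt the Worker into Smart Placement
# (name, main, and compatibility_date below are placeholders)
name = "db-bench"
main = "src/index.ts"
compatibility_date = "2024-01-01"

[placement]
mode = "smart"
```

With `mode = "smart"`, Cloudflare moves the Worker toward the backends it talks to most, which is what keeps network variance low here.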
## Test types
| Test | Samples | Description |
|---|---|---|
| `cold_start` | 1 | TTFB after passive idle timeout. Pre-probe state observed via management API. Scheduled every 15 min. |
| `warm_latency` | 20 | `SELECT 1` on a warm connection. Baseline latency. |
| `simple_select` | 10 | `SELECT id, uuid, category FROM bench_items LIMIT 10` |
| `indexed_lookup` | 10 | `SELECT * FROM bench_items WHERE id = $1` with random primary key |
| `aggregation` | 5 | `SELECT category, COUNT(*), AVG(value) FROM bench_items GROUP BY category` |
| `write` | 10 | `INSERT INTO bench_items (...) RETURNING id` with random data |
| `branch_create` | 3 | Create branch → poll readiness → verify data → delete. Full lifecycle. |
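The readiness step of the `branch_create` lifecycle can be modeled as a pure function for illustration. Real runs call the management API between attempts; in this sketch the observed statuses are supplied as data, and the status strings are assumptions:

```typescript
// Sketch: given the sequence of statuses a readiness poller would observe,
// return the number of polls needed before the branch became "ready",
// or -1 if it never did within maxAttempts.
function pollsUntilReady(observed: string[], maxAttempts: number): number {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const status = observed[attempt - 1] ?? "unknown";
    if (status === "ready") return attempt;
  }
  return -1; // branch never became ready; the run is recorded as a failure
}
```

A run that exhausts `maxAttempts` would be recorded with `success: false` and therefore excluded from percentile calculations.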
## Sample sizes and frequency
Three separate QStash cron schedules drive data collection:
- Warm tests (`warm_latency`, `simple_select`, `indexed_lookup`, `aggregation`, `write`) run every 2 hours — 55 samples per provider per run (20 + 10 + 10 + 5 + 10 across the five tests).
- Cold-start tests run every 15 minutes — 1 sample per provider per run (~96/day).
- Branch-create tests run once daily — 3 samples per provider per run.
Percentile calculations (p50, p95, p99) are computed over successful samples from recent runs; only samples marked `success: true` are included.
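The selection step is simple but worth making explicit. A sketch of the filter, with an assumed minimal sample shape (field names are illustrative):

```typescript
// Sketch: only samples marked success: true contribute values to the
// percentile inputs; failed samples are stored but excluded from stats.
interface Sample {
  success: boolean;
  total_ms: number;
}

function successfulDurations(samples: Sample[]): number[] {
  return samples.filter((s) => s.success).map((s) => s.total_ms);
}
```

Keeping failed samples in storage while excluding them here means error rates can still be reported without skewing latency percentiles.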
## Target database
Both providers are seeded with the same dataset: 100,000 rows in a `bench_items` table with 20 categories, random values, and ~200-character text payloads. The table has indexes on `category` and `created_at`. The seed is idempotent — it checks row count before inserting.
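A sketch of what row generation and the idempotency check might look like, matching the dataset described above. The function and column names are illustrative, not the project's actual seed script:

```typescript
// Sketch: generate one seed row matching the described dataset —
// 20 categories, a random numeric value, and a ~200-character payload.
function makeSeedRow(i: number): { category: string; value: number; payload: string } {
  const category = `cat_${i % 20}`;   // cycles through 20 categories
  const value = Math.random() * 1000; // random value
  const payload = "x".repeat(200);    // ~200-char text payload
  return { category, value, payload };
}

// Idempotency: seed only when the table is below the target row count,
// so re-running the seed against an already-populated database is a no-op.
function shouldSeed(currentRowCount: number, target = 100_000): boolean {
  return currentRowCount < target;
}
```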
## Known limitations
- Single region: Both providers are in `us-east-1`. Results will vary for other region pairings.
- Single connection: All queries run sequentially on a single connection. No concurrency testing.
- Same driver: Both providers use the same `@neondatabase/serverless` HTTP driver, eliminating protocol asymmetry. Timing breakdowns are still reported as `connect_ms`/`query_ms` for transparency.
- Time-of-day variance: Performance can vary by time of day due to shared infrastructure load. Continuous measurement over weeks helps surface these patterns.
- Cold start verification: Cold-start probes rely on passive idle timeout rather than active suspension. Each sample includes `coldVerified` metadata indicating whether the management API confirmed a cold/hibernated state before the probe. Samples where the DB was unexpectedly warm are flagged but not discarded.
- Worker placement: While `smart_placement` targets proximity to the databases, the exact Worker location is not guaranteed.
## Disclosure
This project is not affiliated with or endorsed by Neon or Xata. It is an independent benchmarking effort. All source code is open and available for review.
## Statistics
Percentiles are computed using linear interpolation between sorted sample values. For a percentile p with n values:
- `index = (p / 100) * (n - 1)`
- Result = linear interpolation between the values at `floor(index)` and `ceil(index)`
This matches the standard "percentile" function used by most analytics tools. Raw samples are stored indefinitely — no rollup or aggregation is applied at the storage layer.
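The method above can be written down directly; this is a sketch of the standard linear-interpolation percentile (the function name is illustrative):

```typescript
// Linear-interpolation percentile, per the method described above:
// index = (p / 100) * (n - 1), then interpolate between the two
// neighbouring sorted values.
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const idx = (p / 100) * (sorted.length - 1);
  const lo = Math.floor(idx);
  const hi = Math.ceil(idx);
  // The fractional part of idx weights the upper neighbour;
  // when idx is an integer, lo === hi and the value is exact.
  return sorted[lo] + (sorted[hi] - sorted[lo]) * (idx - lo);
}
```

For example, the p95 of `[1, 2, 3, 4, 5]` falls at index 3.8, interpolating to 4.8 — the same result NumPy's default `percentile` method would give.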