Core Architecture & Baselining Fundamentals

Query performance regressions are among the most insidious failure modes in data-intensive platforms: they rarely trigger infrastructure alerts, yet they quietly degrade p95/p99 latency, inflate compute spend, and cascade into application-level circuit breakers. This is the reference architecture that Database SREs, query optimization engineers, and Python DevOps teams use to detect, isolate, and remediate execution-plan drift automatically — before it reaches production traffic — by anchoring every plan to a reproducible baseline and gating changes through CI/CD.

The system operates as a strictly sequential, event-driven pipeline: Capture → Regression → CI Gate → Index Sync → Debugging. Each stage maintains explicit input/output contracts, enforcing idempotent processing and backpressure so that a fault in one stage cannot corrupt shared state or cascade downstream. This page defines the contracts, thresholds, observability surface, and orchestration patterns that bind the stages together; each stage is then documented in depth on its own page, linked inline below.

The Five-Stage Pipeline at a Glance

The Capture layer continuously extracts execution plans, parameterized query signatures, and runtime telemetry from production read replicas or staging environments. These artifacts are normalized into immutable baseline records — the single source of truth for performance expectations. The Regression engine consumes newly observed execution paths, compares them against historical anchors, and applies statistical variance models to flag deviations. Approved or flagged plans route through a CI Gate that enforces policy compliance before any schema modification or index operation propagates. The Index Sync stage materializes approved changes across target clusters through declarative reconciliation, and the Debugging workspace provides deterministic replay for root-cause isolation.

The upstream Capture stage is owned end-to-end by the Automated EXPLAIN Capture & Storage Workflows architecture, while the Regression and CI Gate verdict logic is specified in depth under Regression Detection & Rule Engines. This page is the connective tissue: it fixes the stage boundaries, the data contracts that cross them, and the thresholds every stage agrees to honour.

Sequential Pipeline Topology & Stage Contracts

Pipeline integrity relies on explicit contract enforcement at each boundary. Every stage transition is an immutable, versioned message; no stage ever mutates an upstream artifact in place. Because the artifacts are content-addressed by their plan hash, re-processing the same input is a no-op — the property that makes the whole pipeline idempotent and safe to replay after a partial failure.
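
The idempotency property can be illustrated with a minimal sketch. It assumes an in-memory dictionary stands in for the real artifact store; `ArtifactStore` and its `put` method are hypothetical names used only for illustration.

```python
import hashlib


class ArtifactStore:
    """Hypothetical in-memory stand-in for a content-addressed artifact store."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, payload: bytes) -> tuple[str, bool]:
        """Store payload under its SHA-256 digest.

        Returns (key, created). Writing the same bytes a second time leaves
        the store unchanged and reports created=False.
        """
        key = hashlib.sha256(payload).hexdigest()
        if key in self._blobs:
            return key, False
        self._blobs[key] = payload
        return key, True

    def __len__(self) -> int:
        return len(self._blobs)


store = ArtifactStore()
first = store.put(b'{"Node Type":"Seq Scan"}')
replay = store.put(b'{"Node Type":"Seq Scan"}')  # same input replayed after a failure
```

Because the key is derived from the payload, a replayed message resolves to the key that already exists, so a stage can be retried without a separate deduplication table.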

Stage 1 — Capture

Responsibility: ingest raw EXPLAIN / EXPLAIN ANALYZE payloads, canonicalize them, and emit an immutable baseline candidate. Input: structured JSON or text plans, execution statistics, and parameter distributions delivered over asynchronous event streams. Output: a normalized plan document plus a deterministic fingerprint, published to the message broker.

Capture strips transient metadata — exact memory grants, temporary file paths, timestamped cache states, actual row counts — before serialization, so that only structural invariants survive. The normalization contract is shared with the cross-engine plan normalization workflow so that a PostgreSQL plan and its MySQL equivalent collapse to comparable structures. Capture is deliberately stateless and must never read the active baseline; that read-after-write dependency belongs to Regression. Failure isolation: a malformed plan is quarantined to a dead-letter stream with its parse error attached, and the capture worker continues — a single poison message can never stall the partition.
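
As a rough sketch of the stripping step, the function below walks a PostgreSQL JSON plan and drops runtime-only keys. The key list is illustrative rather than exhaustive, and `strip_volatile` and `is_volatile` are hypothetical helper names, not the shared normalization contract itself.

```python
from typing import Any

# Illustrative, not exhaustive: runtime-only keys in PostgreSQL's JSON plan format.
VOLATILE_KEYS = {"Peak Memory Usage", "Sort Space Used", "Planning Time", "Execution Time"}


def is_volatile(key: str) -> bool:
    # "Actual ..." keys come from ANALYZE; "... Blocks" keys come from BUFFERS.
    return key.startswith("Actual ") or key.endswith(" Blocks") or key in VOLATILE_KEYS


def strip_volatile(node: Any) -> Any:
    """Recursively drop runtime-only keys, keeping the structural fields."""
    if isinstance(node, dict):
        return {k: strip_volatile(v) for k, v in node.items() if not is_volatile(k)}
    if isinstance(node, list):
        return [strip_volatile(child) for child in node]
    return node


raw = {
    "Node Type": "Hash Join",
    "Join Type": "Inner",
    "Actual Rows": 1042,
    "Actual Total Time": 3.21,
    "Shared Hit Blocks": 88,
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders", "Actual Rows": 5000},
        {
            "Node Type": "Hash",
            "Peak Memory Usage": 72,
            "Plans": [{"Node Type": "Index Scan", "Index Name": "users_pkey"}],
        },
    ],
}
normalized = strip_volatile(raw)
```

Child order inside "Plans" is preserved on purpose: which input is the outer side of a join is a structural property, not noise.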

Stage 2 — Regression

Responsibility: compare each newly observed plan against its active baseline and emit a verdict. Input: a normalized plan document keyed by its fingerprint. Output: a verdict payload (PASS, WARN, or BLOCK) carrying variance coefficients, the delta against baseline cost, and the specific rule that fired.

Regression is where statistical models live: rolling percentiles, standard-deviation bands, and change-point detection distinguish gradual statistics decay from abrupt optimizer misestimation. The exact boundaries are governed by the regression thresholds contract and the rule evaluation logic in Regression Detection & Rule Engines. Failure isolation: if the baseline store is unreachable, Regression emits a WARN verdict tagged baseline_unavailable rather than failing open with a false PASS — the CI Gate then decides whether a degraded verdict is admissible.
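
The degraded-verdict behaviour might look like the sketch below. `BaselineUnavailable`, `evaluate`, and the two callables passed in are hypothetical; the real store client and scorer are not shown in this document.

```python
from typing import Callable


class BaselineUnavailable(Exception):
    """Hypothetical error raised when the baseline store cannot be reached."""


def evaluate(
    fingerprint: str,
    observed_cost: float,
    fetch_baseline_cost: Callable[[str], float],
    score: Callable[[float, float], dict],
) -> dict:
    """Return a verdict dict; degrade to WARN when no baseline can be read."""
    try:
        baseline_cost = fetch_baseline_cost(fingerprint)
    except BaselineUnavailable:
        # Never fall through to PASS: without a baseline nothing was compared.
        return {"status": "WARN", "tags": ["baseline_unavailable"]}
    return score(observed_cost, baseline_cost)


def unreachable_store(fingerprint: str) -> float:
    raise BaselineUnavailable("baseline store timed out")


def always_pass(observed: float, baseline: float) -> dict:
    # Placeholder scorer used only to exercise the happy path.
    return {"status": "PASS", "tags": [], "cost_multiplier": observed / baseline}


degraded = evaluate("abc123", 10.0, unreachable_store, always_pass)
healthy = evaluate("abc123", 10.0, lambda fp: 10.0, always_pass)
```

Tagging the verdict rather than raising keeps the decision with the CI Gate, which can treat a degraded WARN differently from a scored one.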

Stage 3 — CI Gate

Responsibility: a synchronous policy enforcement point that translates verdicts into build outcomes. Input: a verdict payload plus the pull-request or deployment context. Output: a pass/fail signal to the CI system and, on failure, a routed artifact to Debugging.

The gate evaluates verdicts against organizational guardrails: maximum acceptable cost multiplier, mandatory index coverage, prohibition of new full-table scans above a row ceiling. A BLOCK fails the build; a WARN posts an annotation to the pull request with suggested rewrites or index recommendations but does not block merge. Failure isolation: the gate is fail-closed for BLOCK and fail-static for infrastructure faults — if the gate itself errors, it returns a neutral non-blocking status with an alert, never a silent green check.
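
A minimal sketch of the verdict-to-build-outcome mapping follows. The `BuildStatus` values and the `gate_decision` function are assumptions made for illustration; the status names a real CI system expects will differ.

```python
from enum import Enum
from typing import Callable


class BuildStatus(Enum):
    SUCCESS = "success"
    NEUTRAL = "neutral"   # non-blocking, used when the gate itself is unhealthy
    FAILURE = "failure"


def gate_decision(fetch_verdict: Callable[[], dict]) -> tuple[BuildStatus, str]:
    """Translate a verdict into a build status plus a human-readable note."""
    try:
        verdict = fetch_verdict()
        status = verdict["status"]
    except Exception as exc:
        # Fail-static: a gate fault is reported and alerted, never shown as green.
        return BuildStatus.NEUTRAL, f"gate fault, alert raised: {exc!r}"
    if status == "BLOCK":
        return BuildStatus.FAILURE, f"blocked by rule {verdict.get('rule', 'unknown')}"
    if status == "WARN":
        return BuildStatus.SUCCESS, "annotation posted to pull request"
    if status == "PASS":
        return BuildStatus.SUCCESS, ""
    return BuildStatus.NEUTRAL, f"unrecognised verdict status {status!r}"


def broken_verdict_service() -> dict:
    raise TimeoutError("verdict service timed out")


blocked = gate_decision(lambda: {"status": "BLOCK", "rule": "cost_multiplier"})
warned = gate_decision(lambda: {"status": "WARN"})
faulted = gate_decision(broken_verdict_service)
```

The neutral outcome is kept distinct from success so that a dashboard can count gate faults separately from genuine passes.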

Stage 4 — Index Sync

Responsibility: materialize approved schema and index changes across target clusters via declarative reconciliation. Input: an approved change set referencing a verdict that passed the gate. Output: a reconciliation report describing applied DDL and the resulting baseline invalidations.

Index Sync uses infrastructure-as-code templates to converge primary and replica clusters to a desired state; it never issues imperative one-off DDL. Because adding or dropping an index changes the plan space, every reconciliation emits a baseline-invalidation event so the next Capture cycle re-anchors affected queries. This stage draws its recommendations from the index-usage regression signals work. Failure isolation: DDL is applied within a bounded lock-timeout and rolled back on partial failure, so a stuck CREATE INDEX cannot hold an exclusive lock indefinitely.

Stage 5 — Debugging

Responsibility: provide deterministic replay for root-cause isolation on any flagged or blocked plan. Input: the offending plan fingerprint plus a snapshot of the statistics that produced it. Output: a diagnosis artifact (the divergent subtree, the responsible operator, and the suspected cause).

The Debugging workspace provisions ephemeral, resource-isolated containers that replay the exact query signature against a frozen statistics snapshot — no live production impact. It leans on join-type shift detection to pinpoint the class of regressions that most often escape cost thresholds. Failure isolation: replay containers are hard-capped on CPU, memory, and wall-clock; a runaway plan is killed and reported rather than allowed to saturate the debugging fleet.

Deterministic Plan Identification

Optimizer output is not stable across deployment cycles, statistics refreshes, and parameter-sniffing events: the same query text can yield a different plan whenever those inputs change. To track execution paths reliably, the architecture relies on canonical normalization and cryptographic fingerprinting. Raw execution trees are transformed into a directed acyclic graph where node ordering is strictly topological and operator attributes are normalized to stable enums. Volatile runtime values are discarded in favour of structural invariants: join algorithms, access methods, filter predicates, and sort orders.

Plan Hashing Algorithms for SQL Engines provide the mathematical backbone for generating collision-resistant identifiers that remain stable across minor optimizer iterations. By applying SHA-256 to the serialized DAG, the system produces a 256-bit fingerprint that uniquely maps to a logical execution strategy. Identical plans yield identical hashes regardless of hardware topology, concurrent load, or transient optimizer hints. This deterministic mapping eliminates false positives caused by environmental noise and enables precise version control over execution strategies. Parameterized statements are collapsed to a stable shape first, following the parameterized-query normalization rules, so that WHERE id = 42 and WHERE id = 99 never register as separate baselines.
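
The fingerprinting step can be sketched as follows, assuming the normalized plan is already available as a plain dictionary. `canonical_bytes`, `fingerprint`, and `collapse_literals` are hypothetical helper names, and the regex-based literal collapsing is a deliberately naive stand-in for the parameterized-query normalization rules, which would use a real SQL parser.

```python
import hashlib
import json
import re


def canonical_bytes(plan: dict) -> bytes:
    """Serialize as UTF-8 JSON with sorted keys and no insignificant whitespace."""
    text = json.dumps(plan, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def fingerprint(plan: dict) -> str:
    """SHA-256 hex digest (256 bits) of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(plan)).hexdigest()


# Naive sketch: quoted strings and bare numbers become a placeholder.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def collapse_literals(sql: str) -> str:
    return _LITERAL.sub("?", sql)


plan_a = {"Node Type": "Index Scan", "Index Name": "users_pkey"}
plan_b = {"Index Name": "users_pkey", "Node Type": "Index Scan"}  # same plan, other key order
plan_c = {"Node Type": "Seq Scan", "Relation Name": "users"}

shape_42 = collapse_literals("SELECT * FROM users WHERE id = 42")
shape_99 = collapse_literals("SELECT * FROM users WHERE id = 99")
```

Here `plan_a` and `plan_b` hash identically because key order is normalized before hashing, while `plan_c` does not, and both query texts reduce to the same shape.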

Baseline Anchoring & Cost Model Translation

Performance baselines cannot rely solely on wall-clock latency, which fluctuates with I/O saturation, buffer-pool warming, and concurrent workload interference. Instead, the system anchors regression detection to optimizer cost models and cardinality estimates. Cost units are abstracted from engine-specific implementations and translated into normalized performance vectors. PostgreSQL's cost is expressed in arbitrary units anchored to the cost of one sequential page fetch (seq_page_cost, 1.0 by default), with other I/O and CPU work weighted relative to it by random_page_cost, cpu_tuple_cost, and related GUCs; it is not a count of CPU cycles or milliseconds. MySQL expresses cost through row estimates, filtered percentages, and join-buffer usage, a fundamentally different model.

Cost Estimation Mapping Across PostgreSQL and MySQL details how to harmonize these divergent cost models into a unified baseline schema. The architecture extracts cardinality estimates, selectivity factors, and join-cost multipliers from the optimizer’s internal state, then applies a weighted scoring function to produce a composite baseline index. This index decouples performance expectations from hardware variability, letting SREs detect regressions caused by statistics skew, missing histograms, or suboptimal join-order selection before they surface as latency spikes. Where a composite cost figure must be sanity-checked against reality, the cost-to-latency mapping methodology closes the loop between estimated units and observed milliseconds.
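
The scoring function itself is not specified on this page, so the following is only one possible shape for it. The `PlanSignals` fields, the weights, and the log scaling are all assumptions chosen for illustration, not values taken from the architecture.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanSignals:
    estimated_rows: float        # optimizer cardinality estimate
    selectivity: float           # fraction of rows retained, between 0 and 1
    join_cost_multiplier: float  # relative join cost, 1.0 meaning "as baseline"


# Hypothetical weights; a real deployment would tune these per workload.
WEIGHTS = {"rows": 0.5, "selectivity": 0.2, "join": 0.3}


def composite_index(s: PlanSignals) -> float:
    """Weighted sum of log-scaled signals, so no single large value dominates."""
    return (
        WEIGHTS["rows"] * math.log10(1.0 + s.estimated_rows)
        + WEIGHTS["selectivity"] * s.selectivity
        + WEIGHTS["join"] * math.log10(1.0 + s.join_cost_multiplier)
    )


baseline = PlanSignals(estimated_rows=1_000, selectivity=0.05, join_cost_multiplier=1.0)
drifted = PlanSignals(estimated_rows=250_000, selectivity=0.05, join_cost_multiplier=4.0)

ratio = composite_index(drifted) / composite_index(baseline)
```

Because none of the inputs depend on wall-clock time, the same plan scores the same on a loaded replica and an idle one, which is the decoupling from hardware described above.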

Threshold Matrix

Every stage agrees to the same numeric bands so that a verdict means the same thing whether it is raised at Capture time or re-evaluated in Debugging. The full derivation lives under Defining Regression Thresholds for Query Plans; the actionable summary is below.

Metric	Pass	Warn	Block	Automation trigger
Cost multiplier vs baseline p90	< 1.5×	1.5×–2.99×	≥ 3.0×	Block fails the CI build; Warn annotates the PR
Cardinality estimate divergence	< 20%	20%–49%	≥ 50%	Warn opens a stats-refresh task; Block routes to Debugging
New sequential scan on table	< 1M rows	1M–9.99M rows	≥ 10M rows	Block; suggests candidate index via Index Sync
Join algorithm change (hash → nested loop)	none	1 join	≥ 2 joins, or outer input > 100k rows	Warn/Block per join-shift rules
Baseline age since last capture	< 7 days	7–29 days	≥ 30 days	Warn schedules re-capture; stale baseline downgrades verdict confidence
Regression false-positive rate (rolling 24h)	< 5%	5%–14.9%	≥ 15%	Warn triggers threshold tuning

Bands are inclusive at the lower edge of each tier. A WARN never blocks a deployment on its own, but two or more concurrent WARN signals on the same plan are escalated to BLOCK by the CI Gate’s compound-rule evaluator to prevent slow, additive degradation from slipping through.
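
The banding and the compound-rule escalation can be sketched for three of the numeric rows as follows. The function names `band` and `classify` are hypothetical; the thresholds are the ones listed in the matrix.

```python
PASS, WARN, BLOCK = "PASS", "WARN", "BLOCK"


def band(value: float, warn_at: float, block_at: float) -> str:
    """Classify a metric; the lower edge of each tier is inclusive."""
    if value >= block_at:
        return BLOCK
    if value >= warn_at:
        return WARN
    return PASS


def classify(
    cost_multiplier: float,
    cardinality_divergence_pct: float,
    new_seq_scan_rows: int,
) -> tuple[str, dict[str, str]]:
    """Return the overall verdict and the per-metric signals behind it."""
    signals = {
        "cost_multiplier": band(cost_multiplier, 1.5, 3.0),
        "cardinality_divergence": band(cardinality_divergence_pct, 20.0, 50.0),
        "new_seq_scan": band(new_seq_scan_rows, 1_000_000, 10_000_000),
    }
    statuses = list(signals.values())
    if BLOCK in statuses or statuses.count(WARN) >= 2:
        # Two or more concurrent WARNs on the same plan escalate to BLOCK.
        return BLOCK, signals
    if WARN in statuses:
        return WARN, signals
    return PASS, signals


single_warn, _ = classify(1.5, 10.0, 0)
compound, compound_signals = classify(1.6, 25.0, 0)
```

In the second call neither metric reaches its own Block band, yet the plan is blocked because two WARN signals coincide.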

Production Readiness Requirements

Running this architecture against live databases imposes hard isolation requirements so that the tracking system never becomes the cause of the incident it is meant to detect.

Connection pool isolation. Capture and Debugging use a dedicated pool (qp_capture, default max_size=8, min_size=2) that is separate from the application pool. A saturated capture pool must be incapable of starving application connections; enforce this with a per-role CONNECTION LIMIT on the database side, not merely a client-side cap.
Read-replica routing. All EXPLAIN ANALYZE execution is pinned to read replicas via an explicit routing key; the primary is never targeted for plan capture. Replica lag is checked before each capture: if replay lag exceeds 10s, capture is skipped and re-queued rather than run against stale statistics. Note that pg_last_wal_replay_lsn() returns a WAL position, not a duration; a time-based lag is typically derived from now() - pg_last_xact_replay_timestamp(), while the LSN is useful for byte-based lag. This routing is defined alongside capturing plans without impacting production performance.
Circuit breaker configuration. Each stage wraps its downstream dependency in a breaker (failure_threshold=5, reset_timeout=30s, half_open_max_calls=2). On open, Capture buffers to the broker, Regression emits baseline_unavailable warns, and Index Sync halts reconciliation — degradation is graceful and observable, never a hard crash loop.
Privilege model. The capture role holds SELECT on the tracked relations and no DML. In PostgreSQL, EXPLAIN is not a separately grantable privilege; it requires the privileges of the statement being explained, so SELECT is sufficient for read-only capture. Baseline mutation is restricted to the CI pipeline service account via RBAC; read access is granted to SRE and optimization teams through scoped, short-lived API tokens. The full isolation model is specified in Security Boundaries for Baseline Data Storage.
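
A minimal, synchronous sketch of a breaker with the parameters listed above is shown here. The `CircuitBreaker` and `CircuitOpen` names are hypothetical, and a production breaker for this pipeline would need to be async-aware and safe under concurrent calls.

```python
import time
from typing import Any, Callable


class CircuitOpen(Exception):
    """Raised instead of calling the dependency while the breaker is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout: float = 30.0,
        half_open_max_calls: int = 2,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.half_open_max_calls = half_open_max_calls
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0
        self._half_open_calls = 0

    def call(self, fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Any:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                raise CircuitOpen("dependency unavailable")
            self.state = "half_open"  # timeout elapsed: allow probe calls
            self._half_open_calls = 0
        if self.state == "half_open":
            # The probe budget only matters when calls overlap.
            if self._half_open_calls >= self.half_open_max_calls:
                raise CircuitOpen("half-open probe budget exhausted")
            self._half_open_calls += 1
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self.state == "half_open" or self._failures >= self.failure_threshold:
                self.state = "open"
                self._opened_at = self._clock()
            raise
        self._failures = 0
        self.state = "closed"
        return result
```

A stage would catch `CircuitOpen` and apply its own degradation: buffering to the broker in Capture, emitting a baseline_unavailable WARN in Regression, or halting reconciliation in Index Sync.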

Observability Hooks

Every stage transition maps to a distinct OpenTelemetry span, and the pipeline exports Prometheus-compatible metrics with explicit instrument types so dashboards and alert rules can be written against a stable contract.

qp_baseline_capture_latency_seconds — histogram — end-to-end time from plan ingestion to baseline persistence; alert when p99 > 2.5s.
qp_regression_verdict_total{status="pass|warn|block"} — counter — verdicts by outcome; the ratio block / total drives the regression-rate SLO.
qp_baseline_active_count — gauge — number of live baselines under tracking; a sudden drop signals a mass-invalidation bug.
qp_index_sync_reconciliation_errors_total — counter — failed DDL reconciliations by cluster; any nonzero rate pages the on-call SRE.
qp_capture_pool_in_use — gauge — active capture connections; sustained values at max_size indicate pool starvation before it becomes application-visible.
qp_replica_lag_seconds — gauge — measured lag on the capture replica; feeds the skip-and-requeue guard.

These metrics feed automated alerting rules that raise PagerDuty incidents or Slack notifications when pipeline throughput degrades or false-positive rates breach the threshold matrix. Verdict counters are also reconciled against the cost-delta tracking series so that a spike in block verdicts can be attributed to a specific baseline version.

Python Orchestration Patterns

The pipeline is wired as a DAG under a scheduler (Airflow, Prefect, or Argo Workflows); the reference implementation targets Prefect with asyncio, asyncpg, structlog, and OpenTelemetry instrumentation. Stages communicate through content-addressed artifacts, so the scheduler only needs to enforce ordering and retries — never to carry plan state itself.

PYTHON

import asyncio

import asyncpg
import structlog
from opentelemetry import trace
from prefect import flow, task

log = structlog.get_logger("queryplan.orchestrator")
tracer = trace.get_tracer("queryplan.pipeline")

# Worker-pool sizing: capture is I/O bound on replica EXPLAIN, so it scales
# wider than the CPU-bound regression scorer. Sizes are pinned, not elastic,
# to keep replica load predictable.
CAPTURE_CONCURRENCY = 8
REGRESSION_CONCURRENCY = 4
SERIALIZATION = "canonical-json"  # UTF-8, sorted keys, no insignificant whitespace


@task(retries=3, retry_delay_seconds=5, tags=["capture"])
async def capture_plan(pool: asyncpg.Pool, query_id: str) -> dict:
    with tracer.start_as_current_span("capture") as span:
        span.set_attribute("query.id", query_id)
        async with pool.acquire() as conn:
            row = await conn.fetchrow(
                "EXPLAIN (FORMAT JSON, ANALYZE, BUFFERS) "
                "SELECT * FROM tracked_query($1)",
                query_id,
            )
        await log.ainfo("plan.captured", query_id=query_id)
        return {"query_id": query_id, "plan": row[0], "format": SERIALIZATION}


@task(retries=2, tags=["regression"])
async def evaluate_regression(artifact: dict) -> dict:
    with tracer.start_as_current_span("regression") as span:
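        # score_against_baseline is not defined in this listing; it is assumed to be
        # the regression scorer provided elsewhere in the project.
        verdict = await score_against_baseline(artifact)  # PASS | WARN | BLOCK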
        span.set_attribute("regression.verdict", verdict["status"])
        await log.ainfo(
            "verdict.emitted",
            query_id=artifact["query_id"],
            status=verdict["status"],
            cost_multiplier=verdict["cost_multiplier"],
        )
        return verdict


@flow(name="baseline-pipeline")
async def baseline_pipeline(query_ids: list[str]) -> None:
    pool = await asyncpg.create_pool(
        dsn="postgresql://qp_capture@replica/appdb",
        min_size=2,
        max_size=CAPTURE_CONCURRENCY,
        command_timeout=15,
    )
    try:
        cap_sem = asyncio.Semaphore(CAPTURE_CONCURRENCY)
        reg_sem = asyncio.Semaphore(REGRESSION_CONCURRENCY)

        async def run_one(qid: str) -> dict:
            async with cap_sem:
                artifact = await capture_plan(pool, qid)
            async with reg_sem:
                return await evaluate_regression(artifact)

        verdicts = await asyncio.gather(*(run_one(q) for q in query_ids))
        blocked = [v for v in verdicts if v["status"] == "BLOCK"]
        if blocked:
            await log.awarning("gate.block", count=len(blocked))
    finally:
        await pool.close()

Artifacts are serialized as canonical JSON so that the same logical plan produces byte-identical payloads across workers — the precondition for content-addressing and for the deduplication the async ingestion pipelines rely on at high throughput. Worker-pool sizes are pinned rather than autoscaled so that the replica sees a bounded, predictable EXPLAIN load; scale by adding scheduler workers, not by widening the capture pool.

Common Failure Modes and Mitigations

Each stage carries a short runbook entry. These are the regressions of the tracking system itself — the failures that turn a guardrail into an outage.

Capture — poison plan stalls a partition. Symptom: a partition’s lag climbs while others are healthy. Mitigation: the malformed plan is dead-lettered with its parse error; confirm the DLQ consumer is running and replay after fixing the normalizer.
Capture — replica lag skews statistics. Symptom: baselines drift immediately after a write-heavy window. Mitigation: the qp_replica_lag_seconds > 10s guard skips and requeues; if lag is chronic, route capture to a dedicated low-lag replica.
Regression — baseline store unreachable. Symptom: a burst of baseline_unavailable warns. Mitigation: the circuit breaker opens and verdicts degrade to WARN, never a false PASS; restore the store and let confidence recover on the next capture.
CI Gate — flapping verdicts inflate false positives. Symptom: qp_regression_false_positive_rate crosses 15%. Mitigation: apply threshold tuning and widen the observation window before tightening bands.
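Index Sync — DDL holds an exclusive lock. Symptom: application timeouts spike during reconciliation. Mitigation: the bounded lock-timeout aborts and rolls back the DDL; prefer CREATE INDEX CONCURRENTLY and validate against schema-change baseline validation before promotion. Note that CREATE INDEX CONCURRENTLY cannot run inside a transaction block, and a failed concurrent build leaves an INVALID index that must be dropped before the reconciliation is retried.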
Debugging — runaway replay saturates the fleet. Symptom: replay containers OOM or exceed wall-clock. Mitigation: hard CPU/memory/time caps kill and report the container; capture the killed plan’s fingerprint for offline analysis.

Related

Plan Hashing Algorithms for SQL Engines — deterministic fingerprinting for the Capture stage.
Cost Estimation Mapping Across PostgreSQL and MySQL — harmonizing divergent cost models into one baseline.
Defining Regression Thresholds for Query Plans — the numeric bands behind the threshold matrix.
Security Boundaries for Baseline Data Storage — isolation and access control for baseline repositories.
Automated EXPLAIN Capture & Storage Workflows — sibling architecture owning the upstream Capture ingestion path.
Regression Detection & Rule Engines — sibling architecture owning verdict logic and CI gating.
