Guide

Cost Estimation Mapping Across PostgreSQL and MySQL

In a production-grade query plan baseline pipeline, raw optimizer outputs are fundamentally incompatible across database engines. PostgreSQL reports costs in arbitrary planner units derived from seq_page_cost, cpu_tuple_cost, and related GUCs, while MySQL’s optimizer expresses costs as a composite of row estimates, I/O operations, and CPU cycles. The Cost Normalization & Calibration Stage exists solely to bridge this semantic gap. This pipeline stage performs deterministic translation of engine-specific cost vectors into a unified, comparable metric space. It operates strictly downstream of plan capture and upstream of structural hashing. Within the broader Core Architecture & Baselining Fundamentals framework, this stage ensures that regression detection algorithms receive mathematically aligned inputs rather than raw, engine-biased planner outputs.

Stage Boundaries & Routing Logic

Strict isolation is mandatory for deterministic baselining. This stage accepts structured JSON payloads containing EXPLAIN or EXPLAIN ANALYZE trees, engine identifiers, and schema versions. It outputs normalized cost matrices with explicit confidence intervals. The routing logic enforces hard boundaries to prevent pipeline contamination:

  • Ingress: Accepts only validated plan trees with complete cost annotations. Rejects partial outputs, plans missing total_cost/rows fields, or payloads lacking engine version metadata. Validation occurs synchronously before normalization begins.
  • Processing: Applies engine-specific normalization functions in a pure, stateless execution context. Does not compute plan hashes, does not evaluate thresholds, and does not trigger CI/CD gates. All side effects (metrics, traces) are decoupled via async event publishing.
  • Egress: Emits standardized cost vectors to the hashing queue. If normalization fails, payloads route to a dead-letter queue (DLQ) with explicit error codes (ERR_MISSING_STATS, ERR_COST_OVERFLOW, ERR_VERSION_DRIFT). Routing decisions are deterministic and logged at INFO or ERROR levels based on outcome.

Engine-Specific Cost Models & Translation Logic

Direct arithmetic comparison between PostgreSQL and MySQL costs is mathematically invalid. PostgreSQL’s planner uses a linear model where total_cost = startup_cost + run_cost, scaled by configurable constants that reflect hardware I/O characteristics. MySQL’s optimizer employs a non-linear, row-driven model heavily influenced by index selectivity, join buffering, and filter pushdown. To achieve Cost Estimation Mapping Across PostgreSQL and MySQL, the automation layer must first extract raw cost components, then apply a deterministic scaling function that maps both engines to a dimensionless baseline unit (DBU).

The normalization formula derives from empirical calibration against known workload profiles and hardware profiles:

  • PostgreSQL: DBU = (total_cost / baseline_seq_cost) * (1 + cpu_penalty_factor)
  • MySQL: DBU = (optimizer_cost / baseline_io_cost) * row_selectivity_weight

These scaling factors are not static. They are versioned and stored alongside schema metadata to prevent silent drift during minor engine upgrades. Calibration coefficients are pulled from a signed configuration store at pipeline initialization. When mapping costs, the system references the exact engine minor version and applies the corresponding coefficient set. For deeper context on how normalized costs correlate with actual execution times, see Mapping EXPLAIN Costs to Real-World Latency Metrics.

Production Implementation & Validation Pipeline

The normalization service is implemented as a stateless Python worker. It relies on strict schema validation, typed configuration, and deterministic math operations to prevent floating-point instability.

PYTHON
import logging
from dataclasses import dataclass
from typing import Literal, Optional
from pydantic import BaseModel, Field, ValidationError
from opentelemetry import metrics, trace

logger = logging.getLogger(__name__)
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

normalization_duration = meter.create_histogram(
    "db.cost_normalization.duration_ms", description="Time to normalize engine costs"
)
normalization_failures = meter.create_counter(
    "db.cost_normalization.failures_total", description="Total normalization routing failures"
)

@dataclass(frozen=True)
class CalibrationCoefficients:
    baseline_seq_cost: float
    cpu_penalty_factor: float
    baseline_io_cost: float
    row_selectivity_weight: float

class PlanPayload(BaseModel):
    engine: Literal["postgresql", "mysql"]
    engine_version: str
    total_cost: float = Field(ge=0)
    estimated_rows: int = Field(ge=0)
    startup_cost: Optional[float] = None
    schema_version: str

def load_calibration(version: str) -> CalibrationCoefficients:
    """Fetches versioned coefficients from secure config store. Fails fast on missing data."""
    # Implementation omitted for brevity; assumes gRPC/HTTP fetch with TLS pinning
    pass

def normalize_cost(payload: PlanPayload) -> float:
    with tracer.start_as_current_span("normalize_cost") as span:
        span.set_attribute("engine", payload.engine)
        span.set_attribute("engine_version", payload.engine_version)
        
        try:
            coeffs = load_calibration(payload.engine_version)
        except Exception as e:
            normalization_failures.add(1, {"reason": "ERR_VERSION_DRIFT"})
            raise ValueError("ERR_VERSION_DRIFT: Missing calibration data") from e

        if payload.engine == "postgresql":
            if payload.total_cost <= 0:
                raise ValueError("ERR_MISSING_STATS: Invalid total_cost")
            dbu = (payload.total_cost / coeffs.baseline_seq_cost) * (1 + coeffs.cpu_penalty_factor)
        elif payload.engine == "mysql":
            if payload.total_cost <= 0 or payload.estimated_rows == 0:
                raise ValueError("ERR_MISSING_STATS: Invalid cost or row estimate")
            selectivity = min(payload.estimated_rows / 1_000_000, 1.0)
            dbu = (payload.total_cost / coeffs.baseline_io_cost) * selectivity
        else:
            raise ValueError("ERR_UNSUPPORTED_ENGINE")

        if dbu > 1e9:
            normalization_failures.add(1, {"reason": "ERR_COST_OVERFLOW"})
            raise ValueError("ERR_COST_OVERFLOW: Normalized cost exceeds safe bounds")

        return round(dbu, 4)

The implementation enforces strict type boundaries and fails explicitly on invalid inputs. All normalization math is bounded to prevent overflow, and version mismatches immediately trigger DLQ routing rather than silent fallback to stale coefficients.

Observability Hooks & Safe Fallback Protocols

Production baselining requires deterministic observability. The normalization stage emits structured telemetry at three critical boundaries:

  1. Ingress Validation Metrics: plan_ingress_validated_total and plan_ingress_rejected_total track schema compliance. Rejections include the exact missing field in the reason label.
  2. Processing Latency & Distribution: db.cost_normalization.duration_ms is recorded per payload. High p99 latency indicates config store contention or unbounded coefficient lookups.
  3. Egress Routing Signals: Successful normalization publishes to the cost_normalized Kafka topic. Failures publish to cost_dlq with a structured envelope containing error_code, engine, version, and trace_id.

Safe Fallback Protocols:

  • Config Degradation: If the calibration service returns HTTP 5xx or times out, the worker does not default to hardcoded values. Instead, it routes to DLQ with ERR_VERSION_DRIFT and triggers an alert. Baseline integrity is prioritized over pipeline throughput.
  • Cost Overflow Protection: Any normalized DBU exceeding 1e9 is treated as a planner anomaly. The payload is quarantined, and a WARN event is emitted to the SRE dashboard. This prevents outlier costs from skewing downstream Defining Regression Thresholds for Query Plans calculations.
  • Circuit Breaker: A token-bucket rate limiter caps normalization attempts per engine version. During mass schema migrations, the breaker opens to prevent config store saturation, routing excess payloads to a retry queue with exponential backoff.

Integration with Downstream Baseline Stages

Normalized DBU vectors are strictly consumed by the structural hashing layer. The Plan Hashing Algorithms for SQL Engines module expects dimensionless, engine-agnostic inputs to generate stable plan fingerprints. By decoupling cost mapping from hash generation, the pipeline guarantees that identical logical plans across PostgreSQL and MySQL produce comparable baseline signatures.

Threshold evaluation occurs exclusively after hashing and normalization are complete. Regression gates compare the current DBU against the historical baseline using statistical bounds (e.g., 95th percentile confidence intervals). Because this stage guarantees mathematical alignment, threshold logic can safely apply uniform multipliers without engine-specific branching.

For authoritative reference on raw EXPLAIN output structures, consult the official documentation: PostgreSQL EXPLAIN Documentation and MySQL EXPLAIN Output Format. These sources define the baseline fields that the normalization stage validates before translation.