Guide

Schema Validation for Baseline Metadata

Within the broader Automated EXPLAIN Capture & Storage Workflows pipeline, Schema Validation for Baseline Metadata operates as a strictly isolated gatekeeping stage. Its sole responsibility is to verify that incoming query plan artifacts align with the current database schema state before they are committed to the baseline registry. This stage does not execute queries, capture execution traces, or transform plan trees. Instead, it performs deterministic structural validation against a versioned schema snapshot, ensuring that performance regression analysis downstream operates on structurally sound data.

Pipeline Position & Strict Stage Isolation

The validation stage ingests normalized plan payloads and schema manifests from upstream processors. It emits either a validated metadata record or a structured rejection event. Strict isolation is enforced by architectural boundary: the validator never reaches out to live database instances, never modifies baseline tables, and never performs statistical sampling. All inputs are treated as immutable JSON or Parquet artifacts. This boundary guarantees that schema drift, DDL migrations, or index alterations are caught deterministically before they corrupt historical performance tracking.

Data flow through this stage follows a linear, stateless progression:

  1. Ingress: Receives a normalized plan envelope containing query_hash, schema_version, and structural references.
  2. Manifest Resolution: Fetches the corresponding schema manifest from a version-controlled registry (e.g., Git-backed object storage or a schema registry API).
  3. Deterministic Validation: Applies declarative structural rules against the payload.
  4. Routing Decision: Emits an accepted record, a soft-drift warning, or a hard rejection based on explicit thresholds.
  5. Egress: Publishes the envelope to the appropriate downstream queue for Normalizing Query Plans for Cross-Engine Comparison or baseline ingestion.

Deterministic Validation Logic & Routing Thresholds

The validation engine operates on a set of compiled, in-memory rules mapped to baseline metadata fields. Each incoming record must pass three deterministic checks before advancing. Routing decisions are governed by strict numerical thresholds to prevent subjective drift classification.

Check CategoryValidation RuleThreshold / Routing Logic
Structural AlignmentTable/column references in the plan must exist in the manifest.missing_refs == 0ACCEPT
missing_refs > 0REJECT
Type & Constraint ConsistencyData types, nullability, and PK/FK constraints must match the snapshot.type_promotions <= 2 AND constraint_drops == 0WARN
constraint_drops > 0REJECT
Index & Partition MappingReferenced indexes and partition keys must be present and active.stale_refs == 0ACCEPT
0 < stale_refs <= 3WARN
stale_refs > 3REJECT

Routing outputs are deterministic and idempotent. Accepted records proceed to baseline storage. Warning-tagged records are routed to a drift-monitoring queue for automated reconciliation. Rejected records trigger immediate quarantine and alerting. This routing matrix ensures that Routing EXPLAIN ANALYZE Output to Centralized Logs receives only structurally verified payloads, preventing log pollution and false-positive regression alerts.

Production Implementation Patterns

Platform teams should implement the validator as a stateless, async microservice or sidecar process. The following Python implementation demonstrates production-ready patterns using pydantic for payload parsing, jsonschema for structural validation, and explicit routing logic.

PYTHON
import asyncio
import json
import logging
from typing import Literal
from pydantic import BaseModel, ValidationError
from jsonschema import validate, Draft202012Validator
from opentelemetry import trace, metrics

logger = logging.getLogger("schema_validator")
tracer = trace.get_tracer(__name__)
meter = metrics.get_meter(__name__)

validation_results_counter = meter.create_counter(
    "validation_results_total", description="Count of validation outcomes"
)

class PlanEnvelope(BaseModel):
    query_hash: str
    schema_version: str
    plan_tree: dict
    referenced_tables: list[str]
    referenced_columns: list[dict[str, str]]
    referenced_indexes: list[str]

class ValidationOutcome(BaseModel):
    status: Literal["ACCEPT", "WARN", "REJECT"]
    query_hash: str
    drift_score: float
    details: list[str]

async def validate_plan_envelope(envelope: dict, manifest: dict) -> ValidationOutcome:
    with tracer.start_as_current_span("validate_baseline_metadata"):
        try:
            parsed = PlanEnvelope(**envelope)
        except ValidationError as e:
            logger.error("Malformed envelope: %s", e)
            return ValidationOutcome(status="REJECT", query_hash="unknown", drift_score=1.0, details=["INVALID_PAYLOAD"])

        missing_tables = [t for t in parsed.referenced_tables if t not in manifest.get("tables", [])]
        missing_refs = len(missing_tables)
        
        constraint_drops = sum(1 for idx in parsed.referenced_indexes if idx not in manifest.get("active_indexes", []))
        stale_refs = constraint_drops
        
        # Routing Threshold Evaluation
        if missing_refs > 0:
            validation_results_counter.add(1, {"status": "REJECT"})
            return ValidationOutcome(status="REJECT", query_hash=parsed.query_hash, drift_score=1.0, details=[f"Missing tables: {missing_tables}"])
        
        if stale_refs > 3:
            validation_results_counter.add(1, {"status": "REJECT"})
            return ValidationOutcome(status="REJECT", query_hash=parsed.query_hash, drift_score=0.9, details=[f"Stale index references: {stale_refs}"])
        
        if 0 < stale_refs <= 3:
            validation_results_counter.add(1, {"status": "WARN"})
            return ValidationOutcome(status="WARN", query_hash=parsed.query_hash, drift_score=0.4, details=["Minor index drift detected"])
            
        validation_results_counter.add(1, {"status": "ACCEPT"})
        return ValidationOutcome(status="ACCEPT", query_hash=parsed.query_hash, drift_score=0.0, details=[])

Configuration for the validation service should be externalized to allow dynamic threshold tuning without redeployments:

YAML
validator:
  manifest_cache_ttl_seconds: 300
  routing_thresholds:
    max_missing_refs: 0
    max_stale_indexes_warn: 3
    max_stale_indexes_reject: 3
  fallback:
    enabled: true
    max_manifest_age_seconds: 3600
    circuit_breaker_failures: 5

Observability Hooks & Telemetry

Schema validation must be fully observable to support SRE incident response and capacity planning. Implement the following telemetry hooks:

  1. Metrics: Export Prometheus-compatible counters for validation_duration_seconds, validation_results_total{status="accept|warn|reject"}, and schema_manifest_age_seconds. Track manifest_fetch_latency to detect registry degradation.
  2. Structured Logging: Emit JSON logs with correlation_id, schema_version, drift_score, and routing_decision. Include a validation_trace_id to correlate with upstream EXPLAIN capture spans.
  3. Distributed Tracing: Instrument manifest resolution and rule evaluation as distinct spans. Tag spans with validation.status and validation.drift_category for downstream filtering in observability platforms.
  4. Alerting Rules: Trigger P3 alerts when validation_results_total{status="REJECT"} exceeds 5% of total throughput over a 15-minute window. Trigger P2 alerts if schema_manifest_age_seconds exceeds the configured TTL, indicating registry sync failure.

Safe Fallback Protocols & Drift Quarantine

Network partitions, registry outages, or delayed DDL propagation can temporarily starve the validator of fresh manifests. Implement these fallback protocols to maintain pipeline continuity without compromising data integrity:

  1. Last-Known-Good Cache: If the primary registry is unreachable, fall back to the most recently validated manifest. Enforce a strict TTL (e.g., 3600s). Beyond TTL, route all payloads to quarantine rather than risking baseline corruption.
  2. Circuit Breaker: Track consecutive manifest fetch failures. After circuit_breaker_failures (default: 5), open the circuit and bypass validation temporarily, routing all traffic to a drift_quarantine Kafka topic with validation_mode="BYPASS".
  3. Quarantine & Reconciliation: Rejected and bypassed payloads are written to a dead-letter queue with full context. A background reconciliation job periodically compares quarantined artifacts against the latest manifest. Validated records are promoted; invalid records are archived with a schema_mismatch tag.
  4. Automated Drift Resolution: When soft-drift warnings accumulate, trigger an automated diff against the DDL migration log. For detailed procedures on reconciling structural mismatches, refer to Validating Schema Changes Against Baseline Metadata.

By enforcing strict isolation, deterministic routing, and robust fallback mechanisms, the validation stage ensures that baseline metadata remains a reliable foundation for query optimization and performance regression automation.