
Security Boundaries for Baseline Data Storage

The persistence layer for query plan baselines operates as a strictly isolated pipeline stage, decoupled from upstream capture mechanisms and downstream regression evaluators. Establishing robust Security Boundaries for Baseline Data Storage ensures that historical execution metadata remains immutable, auditable, and cryptographically protected throughout its lifecycle. Within the broader Core Architecture & Baselining Fundamentals, this boundary functions exclusively as a deterministic sink: it receives validated plan artifacts, enforces strict schema contracts, routes payloads to tiered archival storage, and rejects malformed inputs before they can contaminate the regression evaluation queue. Unlike transient telemetry streams, baseline storage must guarantee write-once-read-many (WORM) semantics to prevent retroactive tampering that could invalidate historical performance comparisons.
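To make the write-once-read-many contract concrete, here is a minimal in-memory sketch. The class WormBaselineStore and its idempotency rule are hypothetical and exist only for illustration: an identical re-write of an existing plan_id is a no-op, and a conflicting re-write is rejected. A production archive would delegate immutability to the storage backend's retention lock rather than to application code.

```python
class WormViolation(Exception):
    """Raised when a write would modify an already archived baseline."""


class WormBaselineStore:
    """Hypothetical write-once-read-many store keyed by plan_id (illustration only)."""

    def __init__(self) -> None:
        self._records: dict[str, bytes] = {}

    def put(self, plan_id: str, body: bytes) -> bool:
        """Archive body under plan_id.

        Returns True for a new write and False for an identical replay.
        Raises WormViolation if different content is submitted for an existing id.
        """
        existing = self._records.get(plan_id)
        if existing is None:
            self._records[plan_id] = body
            return True
        if existing == body:
            return False  # idempotent duplicate: the archive is unchanged
        raise WormViolation(f"baseline {plan_id} is immutable")

    def get(self, plan_id: str) -> bytes:
        return self._records[plan_id]

    # Deliberately no delete() or update(): the sink only ever appends.
```

Treating an identical replay as a harmless no-op is what allows quarantined payloads to be resubmitted later without tripping the immutability check.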

The ingestion gateway acts as the primary enforcement point. Every incoming baseline payload must satisfy a rigid contract containing a canonical plan identifier, normalized cost vectors, and a cryptographic signature. Routing logic relies on deterministic fingerprinting to prevent duplicate writes and maintain referential consistency. Upon receipt, the storage orchestrator independently recomputes the plan fingerprint and compares it with the payload’s declared identifier. The recomputation follows the established Plan Hashing Algorithms for SQL Engines, so that engine-specific optimizer variations do not fragment the baseline corpus. If the computed hash diverges from the declared identifier, the payload is routed immediately to a quarantine dead-letter queue along with a structured audit record, and nothing is written to primary storage.
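A minimal sketch of the mismatch check follows. It assumes the fingerprint is SHA-256 over a canonical JSON serialization of the normalized plan tree; the real algorithm is the one described in Plan Hashing Algorithms for SQL Engines. The plan_tree argument and both function names are hypothetical, and the returned route names mirror the keys in the routing matrix.

```python
import hashlib
import hmac
import json


def fingerprint(plan_tree: dict) -> str:
    """Assumed fingerprint: SHA-256 hex digest of the plan tree as canonical JSON."""
    canonical = json.dumps(plan_tree, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def route_by_fingerprint(declared_plan_id: str, plan_tree: dict) -> str:
    """Return 'success' if the recomputed fingerprint matches the declared id, else 'hash_mismatch'."""
    computed = fingerprint(plan_tree)
    # Constant-time comparison on bytes; a cheap defensive habit for identifier checks.
    if hmac.compare_digest(computed.encode("utf-8"), declared_plan_id.encode("utf-8")):
        return "success"
    return "hash_mismatch"
```

Because json.dumps with sort_keys=True ignores dict insertion order, two capture agents that emit the same plan with keys in a different order still produce the same identifier.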

Cost normalization occurs upstream, but the persistence stage must still enforce strict dimensional validation against the accepted schema. Baseline records frequently carry engine-specific optimizer metrics that must be mapped to a unified representation. The persistence gateway checks that every cost vector conforms to the expected constraints defined in the standardized Cost Estimation Mapping Across PostgreSQL and MySQL, rejecting records that contain out-of-bounds or malformed values. Records that fail schema validation are rejected synchronously, logged to a structured event bus, and never reach the immutable archive.
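The dimensional check can be sketched without any framework. The required dimension names match the CostVector model in the implementation blueprint. The finite-value rule and the MAX_COST ceiling are assumptions, since the actual limits come from the cross-engine cost mapping.

```python
import math

# Dimensions every normalized cost vector must carry (mirrors CostVector).
REQUIRED_DIMENSIONS = ("cpu_cost", "io_cost", "memory_cost", "row_estimate")
# Hypothetical ceiling; real bounds would come from the cross-engine cost mapping.
MAX_COST = 1e12


def dimension_errors(cost_vector: dict) -> list[str]:
    """Return every violation found; an empty list means the vector is acceptable."""
    errors = [f"missing dimension: {d}" for d in REQUIRED_DIMENSIONS if d not in cost_vector]
    errors += [f"unexpected dimension: {d}" for d in sorted(set(cost_vector) - set(REQUIRED_DIMENSIONS))]
    for name in REQUIRED_DIMENSIONS:
        if name not in cost_vector:
            continue
        value = cost_vector[name]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append(f"{name}: not numeric")
        elif isinstance(value, float) and not math.isfinite(value):
            errors.append(f"{name}: not finite")
        elif not 0 <= value <= MAX_COST:
            errors.append(f"{name}: out of bounds")
    if isinstance(cost_vector.get("row_estimate"), float):
        errors.append("row_estimate: must be an integer")
    return errors
```

Collecting every violation, rather than stopping at the first, gives the structured event bus a complete rejection record.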

The storage boundary itself is implemented as a zero-trust data plane. Network segmentation isolates the baseline archive from general-purpose telemetry networks, restricting ingress to authenticated service accounts with least-privilege IAM roles. All baseline artifacts undergo cryptographic signing at the edge and are verified before persistence. Storage backends must enforce server-side encryption with customer-managed keys (CMKs) and immutable retention policies. For detailed implementation patterns regarding key rotation and cipher suite selection, refer to Encrypting Baseline Query Plans at Rest and in Transit.
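On AWS, part of this data plane can be expressed as an S3 bucket policy. The sketch below builds one as a Python dict. The bucket name and KMS key ARN are hypothetical placeholders. The policy covers only two controls: rejecting uploads that are not SSE-KMS with the designated customer-managed key, and rejecting any request not sent over TLS. Least-privilege IAM roles and network segmentation would be configured separately.

```python
import json


def baseline_bucket_policy(bucket: str, kms_key_arn: str) -> dict:
    """Build an S3 bucket policy enforcing SSE-KMS with one specific CMK and TLS-only access."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Reject uploads that do not request SSE-KMS (including uploads with no header).
                "Sid": "DenyNonKmsUploads",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {"StringNotEquals": {"s3:x-amz-server-side-encryption": "aws:kms"}},
            },
            {
                # Reject SSE-KMS uploads that name a key other than the customer-managed key.
                "Sid": "DenyWrongKmsKey",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEqualsIfExists": {
                        "s3:x-amz-server-side-encryption-aws-kms-key-id": kms_key_arn
                    }
                },
            },
            {
                # Reject any request made without TLS.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


if __name__ == "__main__":
    policy = baseline_bucket_policy(
        "baseline-archive", "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"
    )
    print(json.dumps(policy, indent=2))
```

Because StringNotEquals also matches when the encryption header is absent, writers must send the header explicitly. Relying on bucket default encryption alone would cause those writes to be denied.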

Implementation Blueprint: Validation, Routing, and Observability

A production-grade baseline storage boundary requires deterministic validation, explicit routing, and comprehensive telemetry. The following Python implementation, which assumes Pydantic v2 and the cryptography package, demonstrates the ingestion contract through strict schema enforcement and ECDSA signature verification.

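```python
# baseline_ingest_validator.py
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec
from pydantic import BaseModel, Field


class CostVector(BaseModel):
    # Normalized, engine-agnostic cost dimensions; negative values are rejected.
    cpu_cost: float = Field(ge=0.0)
    io_cost: float = Field(ge=0.0)
    memory_cost: float = Field(ge=0.0)
    row_estimate: int = Field(ge=0)


class BaselinePayload(BaseModel):
    plan_id: str = Field(pattern=r"^[a-f0-9]{64}$")  # 64 lowercase hex characters
    engine: str = Field(pattern=r"^(postgresql|mysql)$")
    cost_vector: CostVector
    signature: str  # hex-encoded DER ECDSA signature
    timestamp_ms: int = Field(ge=0)


def canonical_bytes(payload: dict) -> bytes:
    """Serialize every field except the signature into deterministic JSON.

    The signature cannot cover itself, so it is excluded before signing and verifying.
    """
    unsigned = {k: v for k, v in payload.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_signature(payload: dict, public_key_pem: bytes) -> bool:
    """Verify an ECDSA (SHA-256) signature over the canonical, signature-free payload."""
    pub_key = serialization.load_pem_public_key(public_key_pem)
    if not isinstance(pub_key, ec.EllipticCurvePublicKey):
        return False  # only EC keys are accepted at this boundary
    try:
        pub_key.verify(
            bytes.fromhex(payload["signature"]),
            canonical_bytes(payload),
            ec.ECDSA(hashes.SHA256()),
        )
        return True
    except (InvalidSignature, KeyError, ValueError, TypeError):
        # Forged or corrupted signature, missing field, or a non-hex signature value.
        return False
```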
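The producer must sign exactly the bytes the verifier checks, and those bytes cannot include the signature field itself. The following round trip is a self-contained sketch built on two assumptions: a P-256 key (the source does not name a curve) and a canonical JSON encoding of every field except signature, with sorted keys and compact separators. The helper names sign_payload and is_authentic are hypothetical.

```python
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import ec


def canonical_bytes(payload: dict) -> bytes:
    """Deterministic JSON of every field except the signature."""
    unsigned = {k: v for k, v in payload.items() if k != "signature"}
    return json.dumps(unsigned, sort_keys=True, separators=(",", ":")).encode("utf-8")


def sign_payload(payload: dict, private_key: ec.EllipticCurvePrivateKey) -> dict:
    """Return a copy of payload with a hex-encoded DER ECDSA-SHA256 signature attached."""
    sig = private_key.sign(canonical_bytes(payload), ec.ECDSA(hashes.SHA256()))
    return {**payload, "signature": sig.hex()}


def is_authentic(payload: dict, public_key: ec.EllipticCurvePublicKey) -> bool:
    try:
        public_key.verify(
            bytes.fromhex(payload["signature"]), canonical_bytes(payload), ec.ECDSA(hashes.SHA256())
        )
        return True
    except (InvalidSignature, KeyError, ValueError, TypeError):
        return False


# Producer side: sign with a freshly generated key (a real agent would load a managed key).
key = ec.generate_private_key(ec.SECP256R1())
signed = sign_payload(
    {
        "plan_id": "ab" * 32,
        "engine": "postgresql",
        "timestamp_ms": 1700000000000,
        "cost_vector": {"cpu_cost": 1.0, "io_cost": 2.0, "memory_cost": 0.5, "row_estimate": 10},
    },
    key,
)
# Storage side: receives only the PEM-encoded public key.
pem = key.public_key().public_bytes(
    serialization.Encoding.PEM, serialization.PublicFormat.SubjectPublicKeyInfo
)
loaded = serialization.load_pem_public_key(pem)
```

Any change to a signed field, such as switching engine from postgresql to mysql, makes is_authentic(tampered, loaded) return False.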

Routing logic must explicitly separate valid baselines from quarantined payloads. The following configuration outlines a deterministic routing matrix with explicit fallback thresholds and idempotency controls:

```yaml
# routing_matrix.yaml
ingestion:
  validation_timeout_ms: 500
  max_payload_size_kb: 128
  routing:
    success:
      target: "s3://baseline-archive/immutable/"
      # Compliance mode: no user, including the account root user, can delete locked
      # versions or shorten retention before it expires (governance mode can be bypassed).
      retention_policy: "COMPLIANCE_365D"
      idempotency_key: "plan_id"
      max_retries: 0
    hash_mismatch:
      target: "sqs://baseline-dlq/hash-failure"
      max_retries: 0
      alert_severity: "P2"
      max_queue_depth: 10000
    schema_violation:
      target: "sqs://baseline-dlq/schema-failure"
      max_retries: 0
      alert_severity: "P3"
      max_queue_depth: 25000
    signature_invalid:
      target: "sqs://baseline-dlq/auth-failure"
      max_retries: 0
      alert_severity: "P1"
      max_queue_depth: 5000
```
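The matrix lists the routes but not the order in which checks run. The following sketch assumes checks run from most to least severe (signature, then schema, then fingerprint), so a forged payload is always reported as an authentication failure even if it is also malformed. The ROUTES dict mirrors the targets and severities of routing_matrix.yaml; Route and select_route are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    target: str
    alert_severity: Optional[str]  # None: the success path raises no alert


# Mirrors the targets and severities declared in routing_matrix.yaml.
ROUTES = {
    "success": Route("s3://baseline-archive/immutable/", None),
    "signature_invalid": Route("sqs://baseline-dlq/auth-failure", "P1"),
    "schema_violation": Route("sqs://baseline-dlq/schema-failure", "P3"),
    "hash_mismatch": Route("sqs://baseline-dlq/hash-failure", "P2"),
}


def select_route(signature_ok: bool, schema_ok: bool, fingerprint_ok: bool) -> tuple[str, Route]:
    """Pick exactly one route; the first failing check (most severe first) wins."""
    if not signature_ok:
        outcome = "signature_invalid"
    elif not schema_ok:
        outcome = "schema_violation"
    elif not fingerprint_ok:
        outcome = "hash_mismatch"
    else:
        outcome = "success"
    return outcome, ROUTES[outcome]
```

Because every payload maps to exactly one outcome, the routing decision is deterministic and each rejection lands in a single DLQ.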

Observability hooks must capture validation latency, routing decisions, and cryptographic verification outcomes. Instrumenting the boundary according to the OpenTelemetry Specification enables distributed tracing across the ingestion path:

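```python
# baseline_observability.py
import time

from opentelemetry import metrics, trace
from opentelemetry.trace import Status, StatusCode
from pydantic import ValidationError

from baseline_ingest_validator import BaselinePayload

tracer = trace.get_tracer("baseline.storage")
meter = metrics.get_meter("baseline.storage")
validation_counter = meter.create_counter("baseline.validation.attempts")
routing_duration = meter.create_histogram("baseline.routing.duration.ms", unit="ms")


def process_baseline(payload: dict) -> str:
    """Validate a raw payload and emit trace and metric telemetry.

    Schema failures are re-raised so the caller can forward the payload to the DLQ.
    """
    start = time.perf_counter()
    route = "schema_violation"
    with tracer.start_as_current_span("validate_and_route") as span:
        try:
            BaselinePayload(**payload)
            route = "success"
            validation_counter.add(1, {"status": "schema_pass"})
            span.set_attribute("routing.target", "immutable_archive")
            return "ARCHIVED"
        except ValidationError as e:
            span.record_exception(e)
            span.set_status(Status(StatusCode.ERROR, "schema validation failed"))
            span.set_attribute("routing.target", "schema_dlq")
            # Keep metric attributes low-cardinality; the error text is already on the span.
            validation_counter.add(1, {"status": "schema_fail"})
            raise
        finally:
            routing_duration.record((time.perf_counter() - start) * 1000.0, {"route": route})
```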

Safe Fallback Protocols and Quarantine Workflows

When the primary storage boundary rejects a payload, the system must degrade gracefully without blocking upstream capture pipelines. The fallback protocol operates on three principles:

  1. Non-Blocking Rejection: Validation failures never block the ingestion queue. Rejected payloads are serialized to a dead-letter queue (DLQ) with full context preservation, including original headers and validation error traces.
  2. Idempotent Replay: Quarantined payloads retain original metadata and can be safely replayed once upstream normalization or hashing logic is patched. Replay workers must enforce the same schema contract to prevent recursive failures.
  3. Circuit Breaker Integration: If the DLQ depth exceeds a configurable threshold (e.g., 10,000 unprocessed records), a circuit breaker halts baseline ingestion and triggers an automated alert to the platform team. This prevents storage exhaustion and forces a controlled backpressure signal to upstream query capture agents.
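The circuit-breaker rule can be sketched as a small state holder. The class name and the hysteresis band are assumptions for illustration: ingestion resumes only once depth falls to a lower resume mark, which avoids flapping. The protocol as stated only requires that ingestion halt and an alert fire once the threshold is exceeded.

```python
from typing import Callable


class DlqCircuitBreaker:
    """Hypothetical breaker that halts baseline ingestion when DLQ depth exceeds a limit."""

    def __init__(self, trip_depth: int, resume_depth: int, alert: Callable[[str], None]) -> None:
        if not 0 <= resume_depth < trip_depth:
            raise ValueError("resume_depth must be non-negative and below trip_depth")
        self.trip_depth = trip_depth
        self.resume_depth = resume_depth
        self.alert = alert
        self.open = False  # open means ingestion is halted and upstream sees backpressure

    def observe(self, dlq_depth: int) -> bool:
        """Record the current DLQ depth; return True if ingestion may proceed."""
        if not self.open and dlq_depth > self.trip_depth:
            self.open = True
            self.alert(f"DLQ depth {dlq_depth} exceeded {self.trip_depth}; halting baseline ingestion")
        elif self.open and dlq_depth <= self.resume_depth:
            self.open = False
            self.alert(f"DLQ depth {dlq_depth} back at or under {self.resume_depth}; resuming ingestion")
        return not self.open
```

Only state transitions raise alerts, so a queue hovering near the limit produces one alert when it trips and one when it recovers, rather than one per poll.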

Storage backends must enforce immutable retention policies at the infrastructure layer. For cloud-native deployments, object lock in compliance mode ensures that even compromised service accounts cannot overwrite or delete archived baselines before their retention period expires. Governance mode does not provide this guarantee, because principals granted the s3:BypassGovernanceRetention permission can override it. Refer to AWS S3 Object Lock documentation for implementation guidelines on compliance-mode retention and legal hold workflows.
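With boto3, compliance-mode retention can be requested per object at write time. The following sketch only assembles the keyword arguments for S3.Client.put_object, so it runs without AWS credentials. It assumes the bucket was created with Object Lock enabled. The 365-day default echoes the retention_policy in routing_matrix.yaml, and locked_put_kwargs and the key ARN used below are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def locked_put_kwargs(
    bucket: str,
    key: str,
    body: bytes,
    kms_key_arn: str,
    retention_days: int = 365,
    now: Optional[datetime] = None,
) -> dict:
    """Build put_object arguments for a compliance-locked, SSE-KMS-encrypted baseline."""
    if retention_days <= 0:
        raise ValueError("retention_days must be positive")
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE: no user, including the account root user, can delete this object
        # version or shorten its retention until the retain-until date passes.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retention_days),
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": kms_key_arn,
        # Object Lock uploads must carry Content-MD5 or an SDK checksum.
        "ChecksumAlgorithm": "SHA256",
    }
```

In a deployment, the result would be passed as boto3.client("s3").put_object(**kwargs). Naming the object after the plan_id keeps the object key aligned with the routing matrix's idempotency key.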

The security boundary for baseline data storage is not merely a persistence layer; it is the cryptographic and logical anchor for all downstream regression automation. By enforcing strict schema contracts, deterministic routing, and zero-trust access controls, platform teams can guarantee that historical query performance data remains a reliable source of truth for continuous optimization.