Section 4.2: Structured Output Enforcement
One of the most underappreciated guardrail strategies is not checking what the model says, but constraining how it says it. When an AI model returns free-form text, anything can happen — hallucinated data, embedded injection payloads, unexpected formats that break downstream processing. Structured output enforcement turns that open-ended risk into a bounded problem.
The idea is straightforward: define exactly what shape the output must take, then reject or retry anything that does not conform. This is a fundamentally different guardrail strategy from content classification. Instead of asking “is this output safe?” you ask “does this output conform to the expected structure?” — and structural violations are often the first signal that something has gone wrong.
JSON Schema Validation
JSON schema validation is the most common form of structured output enforcement. You define a schema that specifies the expected fields, types, and constraints, then validate every model output against it.
Here is a schema for a customer support bot that must return structured responses:
RESPONSE_SCHEMA = {
    "type": "object",
    "required": ["answer", "confidence", "sources", "needs_escalation"],
    "properties": {
        "answer": {
            "type": "string",
            "maxLength": 2000,
            "minLength": 1,
        },
        "confidence": {
            "type": "number",
            "minimum": 0.0,
            "maximum": 1.0,
        },
        "sources": {
            "type": "array",
            "items": {"type": "string"},
            "maxItems": 5,
        },
        "needs_escalation": {
            "type": "boolean",
        },
    },
    "additionalProperties": False,
}
Validation in code:
import json

import jsonschema


def validate_model_output(raw_output: str, schema: dict) -> dict:
    """Parse and validate model output against a JSON schema."""
    # Step 1: the output must be syntactically valid JSON.
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as e:
        return {
            "valid": False,
            "error": f"Invalid JSON: {e}",
            "parsed": None,
        }
    # Step 2: the parsed value must satisfy the schema.
    try:
        jsonschema.validate(instance=parsed, schema=schema)
    except jsonschema.ValidationError as e:
        return {
            "valid": False,
            "error": f"Schema violation: {e.message}",
            "parsed": parsed,
        }
    return {"valid": True, "error": None, "parsed": parsed}
Why this matters for guardrails: Schema validation catches a broad class of output problems in a single check. If the model hallucinates extra fields, omits required ones, returns a string where a number was expected, or produces an answer longer than your maximum, the schema catches it. The check is fast and deterministic, and it costs essentially nothing compared with the model call that produced the output. Every application that consumes structured AI output should validate it.
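`jsonschema.validate` raises a single error per call. When violations are logged or fed back to the model, it can help to collect all of them at once. The following is a minimal sketch, assuming the `jsonschema` package is installed; `collect_schema_errors` and the trimmed-down `DEMO_SCHEMA` are illustrative names, not part of the code above.

```python
import json

from jsonschema import Draft7Validator

# Trimmed-down stand-in for RESPONSE_SCHEMA (illustrative only).
DEMO_SCHEMA = {
    "type": "object",
    "required": ["answer", "confidence"],
    "properties": {
        "answer": {"type": "string", "minLength": 1},
        "confidence": {"type": "number", "minimum": 0.0, "maximum": 1.0},
    },
    "additionalProperties": False,
}


def collect_schema_errors(raw_output: str, schema: dict) -> list:
    """Return every schema violation as a readable string (empty list = valid)."""
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as e:
        return [f"Invalid JSON: {e}"]
    errors = []
    for err in Draft7Validator(schema).iter_errors(parsed):
        # err.path locates the offending value inside the instance.
        location = "/".join(str(p) for p in err.path) or "<root>"
        errors.append(f"{location}: {err.message}")
    return errors


good = '{"answer": "Reset it from Settings.", "confidence": 0.9}'
bad = '{"answer": "", "confidence": 1.7, "debug": true}'

print(collect_schema_errors(good, DEMO_SCHEMA))
for line in collect_schema_errors(bad, DEMO_SCHEMA):
    print(line)
```

The second payload violates three constraints at once (empty answer, out-of-range confidence, unexpected field), and all three are reported in one pass.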
Pydantic Models for Type-Safe Validation
In Python, Pydantic models provide an even more powerful approach than raw JSON schema. They combine parsing, validation, and type safety in a single declaration.
import json
import re

from pydantic import BaseModel, Field, ValidationError, field_validator


class SupportResponse(BaseModel):
    answer: str = Field(..., min_length=1, max_length=2000)
    confidence: float = Field(..., ge=0.0, le=1.0)
    sources: list[str] = Field(default_factory=list, max_length=5)
    needs_escalation: bool = False

    @field_validator("answer")
    @classmethod
    def answer_must_not_contain_pii_patterns(cls, v: str) -> str:
        # Reject text that looks like a US Social Security number (NNN-NN-NNNN).
        if re.search(r"\b\d{3}-\d{2}-\d{4}\b", v):
            raise ValueError("Response contains SSN-like pattern")
        return v

    @field_validator("sources")
    @classmethod
    def sources_must_be_urls(cls, v: list[str]) -> list[str]:
        for source in v:
            if not source.startswith(("http://", "https://", "doc://")):
                raise ValueError(f"Invalid source format: {source}")
        return v


def parse_model_output(raw_output: str) -> dict:
    """Parse and validate model output using Pydantic."""
    try:
        parsed = json.loads(raw_output)
        # model_validate also rejects non-object JSON (e.g. a bare list) cleanly.
        response = SupportResponse.model_validate(parsed)
        return {"valid": True, "data": response.model_dump(), "error": None}
    except json.JSONDecodeError as e:
        return {"valid": False, "data": None, "error": f"Invalid JSON: {e}"}
    except ValidationError as e:
        return {"valid": False, "data": None, "error": str(e)}
The Pydantic approach has several advantages over raw JSON schema validation:
- Custom validators can enforce guardrail logic directly in the schema (like the PII pattern check above).
- Type coercion handles minor type mismatches (e.g., "0.95" parsed as 0.95).
- Default values provide graceful degradation when optional fields are missing.
- Serialization gives you clean Python objects to work with downstream.
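Type coercion is convenient, but for some fields a guardrail may prefer to reject a mismatched type rather than repair it silently. The sketch below assumes Pydantic v2 and contrasts the default (lax) behavior with a field marked `strict=True`; `LaxScore` and `StrictScore` are hypothetical models used only for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class LaxScore(BaseModel):
    # Default (lax) mode: a numeric string is coerced to float.
    confidence: float = Field(..., ge=0.0, le=1.0)


class StrictScore(BaseModel):
    # strict=True disables coercion for this field.
    confidence: float = Field(..., ge=0.0, le=1.0, strict=True)


payload = {"confidence": "0.95"}

lax = LaxScore.model_validate(payload)
print(type(lax.confidence).__name__, lax.confidence)

try:
    StrictScore.model_validate(payload)
    strict_rejected = False
except ValidationError as e:
    strict_rejected = True
    print("strict mode rejected the string:", e.errors()[0]["msg"])
```

Which mode fits depends on the field: coercing a stringified number is usually harmless, while coercing into a field that gates an action (such as an escalation flag) may hide a model error worth surfacing.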
Function Calling and Tool Use Schema Constraints
Modern LLM APIs support function calling (or tool use), where the model’s output is constrained to match a declared function signature. This is a provider-level form of structured output enforcement.
tools = [
    {
        "type": "function",
        "function": {
            "name": "search_knowledge_base",
            "description": "Search the internal knowledge base for relevant articles.",
            "parameters": {
                "type": "object",
                "required": ["query"],
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The search query.",
                        "maxLength": 200,
                    },
                    "max_results": {
                        "type": "integer",
                        "minimum": 1,
                        "maximum": 10,
                        "default": 5,
                    },
                },
                "additionalProperties": False,
            },
        },
    }
]
When the model uses function calling, the output is structurally constrained by the API — but you still need to validate that the values make sense. A model might produce valid JSON with a query like "show me all user passwords" — structurally correct, semantically dangerous.
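def validate_tool_call(tool_call: dict) -> dict:
    """Validate both structure and content of a tool call."""
    function = tool_call.get("function", {})
    name = function.get("name")
    try:
        # Arguments arrive as a JSON-encoded string and may be malformed.
        args = json.loads(function.get("arguments", ""))
    except (json.JSONDecodeError, TypeError) as e:
        return {"valid": False, "error": f"Invalid arguments: {e}"}

    if name != "search_knowledge_base":
        # Reject tools that were never declared instead of passing them through.
        return {"valid": False, "error": f"Unknown tool: {name}"}

    query = args.get("query", "")
    if len(query) > 200:
        return {"valid": False, "error": "Query exceeds maximum length"}
    if any(term in query.lower() for term in ["password", "secret", "credential"]):
        return {"valid": False, "error": "Query contains restricted terms"}
    return {"valid": True, "error": None, "args": args}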
Why this matters for guardrails: Function calling schemas constrain the structure of tool use, but they do not constrain the intent. The schema says “query must be a string under 200 characters” — it does not say “query must be a legitimate search.” You always need a content-level check on top of the structural constraint.
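One way to organize the content-level check is an allowlist that maps each declared tool to its own argument checker, so that an undeclared tool is rejected by default. This is a sketch under stated assumptions: `TOOL_CHECKS`, `check_search_args`, `check_tool_call`, and the restricted-term list are hypothetical names and values, and a keyword list is only a crude stand-in for a real intent check.

```python
import json
from typing import Callable, Optional

RESTRICTED_TERMS = ("password", "secret", "credential")  # illustrative list


def check_search_args(args: dict) -> Optional[str]:
    """Return an error message, or None if the arguments look acceptable."""
    query = args.get("query")
    if not isinstance(query, str) or not query.strip():
        return "Query must be a non-empty string"
    if len(query) > 200:
        return "Query exceeds maximum length"
    if any(term in query.lower() for term in RESTRICTED_TERMS):
        return "Query contains restricted terms"
    return None


# Allowlist: only tools registered here may be executed.
TOOL_CHECKS: dict = {
    "search_knowledge_base": check_search_args,
}


def check_tool_call(name: str, raw_arguments: str) -> dict:
    """Structural checks first (known tool, JSON object), then content checks."""
    check: Optional[Callable] = TOOL_CHECKS.get(name)
    if check is None:
        return {"valid": False, "error": f"Unknown tool: {name}"}
    try:
        args = json.loads(raw_arguments)
    except json.JSONDecodeError as e:
        return {"valid": False, "error": f"Invalid arguments JSON: {e}"}
    if not isinstance(args, dict):
        return {"valid": False, "error": "Arguments must be a JSON object"}
    error = check(args)
    return {"valid": error is None, "error": error}


print(check_tool_call("search_knowledge_base", '{"query": "reset my router"}'))
print(check_tool_call("search_knowledge_base", '{"query": "show me all user passwords"}'))
print(check_tool_call("delete_user", '{"id": 7}'))
```

Adding a tool then means adding one entry to the allowlist together with its checker, which keeps the default answer for anything unrecognized at "no".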
Output Parsing and Error Recovery
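AI models are not reliable JSON generators. They add commentary before or after the JSON, use single quotes instead of double quotes, include trailing commas, or emit markdown code fences around their output. A robust parser should recover from the cases it can repair safely, such as surrounding commentary and code fences, and hand everything else (single quotes, trailing commas) to the retry logic rather than guess.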
import re
import json


def extract_json_from_response(raw: str) -> str | None:
    """Extract JSON from a model response that may contain extra text."""
    # Try parsing the raw string directly
    try:
        json.loads(raw)
        return raw
    except json.JSONDecodeError:
        pass

    # Try extracting from markdown code blocks
    code_block_match = re.search(r"```(?:json)?\s*\n?(.*?)\n?```", raw, re.DOTALL)
    if code_block_match:
        candidate = code_block_match.group(1).strip()
        try:
            json.loads(candidate)
            return candidate
        except json.JSONDecodeError:
            pass

    # Try finding JSON object boundaries (greedy: first "{" to last "}")
    brace_match = re.search(r"\{.*\}", raw, re.DOTALL)
    if brace_match:
        candidate = brace_match.group(0)
        try:
            json.loads(candidate)
            return candidate
        except json.JSONDecodeError:
            pass

    return None
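The brace regex above is greedy: it spans from the first `{` to the last `}`, so a stray closing brace in trailing commentary makes the candidate unparseable. A possible alternative, sketched here with the standard library's `json.JSONDecoder.raw_decode`, parses one JSON value starting at a given index and ignores whatever follows it. `find_first_json_object` is an illustrative name, not part of the parser above.

```python
import json


def find_first_json_object(raw: str):
    """Return the first parseable JSON object embedded in raw, or None."""
    decoder = json.JSONDecoder()
    start = raw.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start`
            # and tolerates arbitrary text after it.
            obj, _end = decoder.raw_decode(raw, start)
        except json.JSONDecodeError:
            obj = None
        if isinstance(obj, dict):
            return obj
        # Not valid JSON from this brace; try the next one.
        start = raw.find("{", start + 1)
    return None


reply = 'Sure! {"answer": "Restart the router", "confidence": 0.8} Hope that helps :}'
print(find_first_json_object(reply))
```

Because the decoder stops at the end of the first complete object, the trailing `:}` in the reply no longer matters. The result still has to pass schema validation; extraction only recovers syntax.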
Retry Logic for Malformed Outputs
When parsing fails, you need a retry strategy. The key design decisions are: how many times to retry, whether to change the prompt on retry, and when to give up and fall back.
import time
from dataclasses import dataclass


@dataclass
class RetryConfig:
    max_retries: int = 3
    backoff_base_ms: int = 100
    include_error_in_retry: bool = True
    fallback_response: dict | None = None


def generate_with_retry(
    llm_client,
    messages: list[dict],
    schema: dict,
    config: RetryConfig | None = None,
) -> dict:
    """Generate structured output with retry on validation failure."""
    config = config or RetryConfig()
    last_error = None
    last_raw = ""

    for attempt in range(config.max_retries + 1):
        if attempt > 0:
            # Exponential backoff: 100 ms, 200 ms, 400 ms with the defaults.
            wait_ms = config.backoff_base_ms * (2 ** (attempt - 1))
            time.sleep(wait_ms / 1000)
            if config.include_error_in_retry and last_error:
                # Show the model its previous attempt and why it was rejected.
                messages = messages + [
                    {"role": "assistant", "content": last_raw},
                    {
                        "role": "user",
                        "content": (
                            "Your previous response was not valid JSON matching "
                            f"the required schema. Error: {last_error}. "
                            "Please try again, returning ONLY valid JSON."
                        ),
                    },
                ]

        response = llm_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=messages,
            temperature=0.0,
        )
        # content can be None (for example, when the model returns a tool call).
        last_raw = response.choices[0].message.content or ""
        extracted = extract_json_from_response(last_raw)
        if extracted is None:
            last_error = "No JSON found in response"
            continue

        result = validate_model_output(extracted, schema)
        if result["valid"]:
            return {
                "success": True,
                "data": result["parsed"],
                "attempts": attempt + 1,
            }
        last_error = result["error"]

    # All attempts failed: return the fallback (None if none was configured).
    return {
        "success": False,
        "data": config.fallback_response,
        "attempts": config.max_retries + 1,
        "error": f"All retries exhausted. Last error: {last_error}",
    }
Important retry design principles:
- Exponential backoff spaces out successive attempts, so a run of failures does not hammer the API.
- Error feedback tells the model what went wrong, which usually raises the chance that the retry succeeds.
- Attempt limit prevents infinite loops and unbounded cost.
- Fallback response provides a safe default when all retries fail — better than crashing.
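To make the backoff principle concrete, the sketch below computes the wait before each retry using the same formula as the retry loop, `base * 2 ** (retry - 1)`. The cap and the optional jitter are additions for illustration and are not part of the original configuration; `backoff_schedule_ms` is a hypothetical helper.

```python
import random


def backoff_schedule_ms(max_retries: int = 3, base_ms: int = 100,
                        cap_ms: int = 5_000, jitter: bool = False) -> list:
    """Wait before each retry: base * 2**(retry - 1), capped, optionally jittered."""
    waits = []
    for retry in range(1, max_retries + 1):
        wait = min(base_ms * 2 ** (retry - 1), cap_ms)
        if jitter:
            # "Full jitter": pick uniformly in [0, wait] so that many clients
            # failing at the same moment do not retry in lockstep.
            wait = random.uniform(0, wait)
        waits.append(wait)
    return waits


schedule = backoff_schedule_ms()
print(schedule, "total ms:", sum(schedule))
```

With the defaults from `RetryConfig` (three retries, 100 ms base), the waits are 100, 200, and 400 ms, so backoff adds at most 700 ms on top of the model latency of four attempts.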
Constrained Decoding and Grammar-Based Generation
Some model serving frameworks support constrained decoding, where the model’s output is forced to conform to a grammar or schema at the token level. Instead of generating freely and validating afterward, the model can only produce tokens that lead to valid output.
┌──────────────────────────────────────────────────────────────┐
│                     Constrained Decoding                     │
│                                                              │
│  Model logits ──► Grammar filter ──► Valid tokens only       │
│                                                              │
│  At each step, only tokens that maintain valid JSON/grammar  │
│  are allowed. The model CANNOT produce invalid output.       │
└──────────────────────────────────────────────────────────────┘

┌──────────────────────────────────────────────────────────────┐
│                     Post-Hoc Validation                      │
│                                                              │
│  Model generates freely ──► Parse ──► Validate ──► Retry?    │
│                                                              │
│  The model CAN produce invalid output. You catch it after    │
│  generation and either fix it or retry.                      │
└──────────────────────────────────────────────────────────────┘
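The grammar filter can be illustrated with a toy decoder. This is not how any particular serving framework implements it: the "grammar" here is just a list of two allowed strings, the vocabulary has six tokens, and `fake_logits` is a stand-in for a model that prefers chatty, invalid output. All names are hypothetical.

```python
# Toy grammar: the only complete outputs allowed are these two JSON strings.
VALID_OUTPUTS = ['{"needs_escalation": true}', '{"needs_escalation": false}']
VOCAB = ['{"needs_escalation": ', "true", "false", "}", "maybe", "Sure! "]


def allowed_tokens(prefix: str) -> list:
    """Grammar filter: keep tokens that leave the output a prefix of a valid string."""
    return [t for t in VOCAB
            if any(v.startswith(prefix + t) for v in VALID_OUTPUTS)]


def fake_logits(prefix: str) -> dict:
    """Stand-in for a model; it scores invalid tokens highest on purpose."""
    return {"Sure! ": 3.0, "maybe": 2.5, '{"needs_escalation": ': 2.0,
            "false": 1.5, "true": 1.0, "}": 0.5}


def constrained_decode() -> str:
    output = ""
    while output not in VALID_OUTPUTS:
        logits = fake_logits(output)
        candidates = allowed_tokens(output)
        # Greedy choice, but only among grammar-approved tokens.
        output += max(candidates, key=lambda t: logits[t])
    return output


print(constrained_decode())
```

Unconstrained greedy decoding would start with `Sure! `, the highest-scoring token. The filter removes it before selection, so the decoder can only walk paths that end in one of the valid strings.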
Constrained Generation vs. Post-Hoc Validation
| Factor | Constrained Generation | Post-Hoc Validation |
|---|---|---|
| Guarantee | Structural validity is enforced during generation, for outputs that complete within the token limit | Probabilistic: depends on retry success |
| Latency | Slightly higher per token (grammar check at each step) | Lower per attempt, but retries add latency |
| Availability | Requires framework support (vLLM, llama.cpp, Outlines) | Works with any model API |
| Flexibility | Limited to supported grammar types | Can validate any arbitrary constraint |
| Content quality | May reduce output quality by over-constraining | Preserves full model capability |
| Error handling | No parse failures for completed generations; truncated outputs still need handling | Must handle parse failures, retries, fallbacks |
| Cost | Single generation attempt | Multiple attempts on failure |
| API support | Growing — OpenAI structured outputs, Anthropic tool use | Universal — works with any provider |
Why this matters for guardrails: Constrained generation eliminates an entire class of guardrail failures — you never have to deal with malformed output. But it is not always available, and it only guarantees structure, not content. A JSON object that perfectly matches the schema can still contain hallucinated data, toxic text, or PII. Structured output enforcement is a necessary layer, but it is not a sufficient one.
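The "probabilistic" guarantee of post-hoc validation in the table can be made concrete with a little arithmetic. The sketch below assumes, as a simplification, that each attempt independently produces valid output with probability `p_valid`; in practice attempts are correlated (the same prompt tends to fail the same way), so these figures are best-case estimates. The function names are illustrative.

```python
def failure_probability(p_valid: float, max_retries: int) -> float:
    """Chance that the first attempt and every retry all fail."""
    return (1 - p_valid) ** (max_retries + 1)


def expected_attempts(p_valid: float, max_retries: int) -> float:
    """Mean number of model calls, stopping at the first valid output."""
    if not 0 < p_valid <= 1:
        raise ValueError("p_valid must be in (0, 1]")
    n = max_retries + 1
    # Expectation of a geometric distribution truncated at n attempts.
    return (1 - (1 - p_valid) ** n) / p_valid


for p in (0.99, 0.9, 0.7):
    print(p, failure_probability(p, 3), round(expected_attempts(p, 3), 3))
```

Under these assumptions, a model that is valid 90% of the time still falls through to the fallback about once in 10,000 requests with three retries, and averages roughly 1.11 calls per request, which is the cost overhead the table refers to.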
Template-Based Output Generation
For high-stakes applications where output format must be absolutely predictable, template-based generation removes the model from the formatting step entirely. The model fills in specific fields, and the application assembles the final output from a template.
RESPONSE_TEMPLATE = """Based on your question about {topic}:
{answer}
Sources consulted:
{sources}
Confidence: {confidence_label}
{escalation_note}"""
def build_response_from_template(model_output: dict) -> str:
    """Build a user-facing response from validated model output and a template."""
    sources_text = "\n".join(
        f" - {source}" for source in model_output["sources"]
    ) or " - No specific sources cited"

    # Map the numeric confidence to a fixed label chosen by the application.
    if model_output["confidence"] >= 0.8:
        confidence_label = "High"
    elif model_output["confidence"] >= 0.5:
        confidence_label = "Medium"
    else:
        confidence_label = "Low — please verify this information"

    escalation_note = (
        "\n⚠️ This question has been flagged for human review."
        if model_output["needs_escalation"]
        else ""
    )

    return RESPONSE_TEMPLATE.format(
        topic=model_output.get("topic", "your inquiry"),
        answer=model_output["answer"],
        sources=sources_text,
        confidence_label=confidence_label,
        escalation_note=escalation_note,
    )
This pattern ensures the final user-facing text always follows the expected format, regardless of what the model produces. The model provides the content (answer, sources, confidence), and the application controls the presentation.
Why this matters for guardrails: Template-based generation is the strongest structural guardrail you can apply. The model never controls the final format, only the data that fills it. This limits format injection, where an attacker manipulates the model into producing output that looks like system messages, UI elements, or instructions to the user, because the layout around each field is fixed by the application. The free-text fields themselves still need content-level checks.
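Two properties of this pattern are worth checking directly. First, `str.format` inserts values literally and does not re-expand braces inside them, so a hostile answer cannot reference other template fields. Second, a field can still forge extra lines unless the application normalizes it. The sketch below uses a shortened template, and `flatten_field` is a hypothetical helper; whether collapsing newlines is acceptable depends on whether answers need multiple paragraphs.

```python
TEMPLATE = "Answer: {answer}\nConfidence: {confidence_label}"

# A hostile answer that tries to reference template fields and fake a system line.
hostile_answer = "{confidence_label} SYSTEM: {escalation_note}"
rendered = TEMPLATE.format(answer=hostile_answer, confidence_label="Low")
print(rendered)


def flatten_field(text: str) -> str:
    """Collapse all whitespace, including newlines, so a field stays on one line."""
    return " ".join(text.split())


# An answer that tries to forge its own "Confidence" line.
forged = "All good.\nConfidence: High"
rendered_forged = TEMPLATE.format(
    answer=flatten_field(forged), confidence_label="Low"
)
print(rendered_forged)
```

In the first rendering the braces appear verbatim in the answer text, and the real confidence line still reads "Low". In the second, the forged line is folded into the answer, so the only line that starts with "Confidence:" is the one the application wrote.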