cAIge Competency Matrix

Certified AI Guardrail Engineer (cAIge)

Version: 1.0 Maintained by: caige.org

Overview

AI guardrails are the technical controls that prevent AI systems from producing harmful content, being exploited by adversarial users, leaking sensitive data, or taking unauthorized actions. The cAIge certification focuses on the engineering discipline of designing, building, and validating these controls — specifically for applications powered by large language models (LLMs).

The cAIge competency matrix defines what a Certified AI Guardrail Engineer must know and be able to do. It is organized into five domains that follow the guardrail engineering lifecycle:

Understand the technology — How LLMs and production AI systems work
Understand the failures — What goes wrong, why, and how to model threats
Architect the defenses — Design guardrail strategies and systems
Build the defenses — Implement guardrails using proven techniques and tooling
Validate the defenses — Test, measure, monitor, and maintain guardrail effectiveness

The matrix is vendor-agnostic — it tests understanding of concepts, patterns, and engineering thinking rather than specific product knowledge. A cAIge holder is qualified to engineer guardrail systems for LLM-powered applications across any technology stack.

Domain Breakdown and Weights

Domain	Weight	Description
1. AI Foundations	15%	How LLMs and production AI systems work
2. Understanding Failure Modes	15%	What goes wrong, why, and how to model threats
3. Architecting Guardrails	25%	Designing guardrail strategies and systems
4. Implementing Guardrails	25%	Building guardrails using proven techniques and tooling
5. Validating Guardrails	20%	Testing, measuring, monitoring, and maintaining guardrails

Domain 1: AI Foundations (15%)

This domain covers the foundational knowledge of AI technology and production AI systems. A guardrail engineer must understand how the systems they are guarding actually work — from model internals to the production architecture that surrounds them.

1.1 Neural Networks & Deep Learning

Knowledge areas:

Neural network architecture — input layers, hidden layers, output layers, nodes, and weighted connections
How weights (parameters) determine a network’s behavior and what “model size” means
The training loop — forward pass, loss calculation, backpropagation, and weight update
Activation functions and why non-linearity is necessary for learning complex patterns
Why knowledge is distributed across weights and cannot be inspected, queried, or selectively removed

Skills — the candidate can:

Explain how a neural network learns patterns from data through iterative weight adjustment
Describe why knowledge stored in neural network weights cannot be inspected or selectively removed
Relate model parameter count to model capacity and the implications for memorization and risk
Distinguish between training (weight adjustment) and inference (prediction using fixed weights)

1.2 Large Language Models

Knowledge areas:

The transformer architecture and how self-attention replaced sequential processing (RNNs, LSTMs)
Encoder-decoder vs. decoder-only architecture and why modern LLMs use decoder-only
Next-token prediction as the fundamental training objective
Scale — parameter counts, training data sizes, and the emergence of capabilities at scale
Mixture-of-Experts (MoE) and efficiency tradeoffs
Embedding models — dense vector representations, semantic similarity, and their role in retrieval and classification systems
What an LLM is not — not a database, not a search engine, not a reasoning engine

Skills — the candidate can:

Explain why the transformer architecture replaced sequential models and what self-attention enables
Describe the next-token prediction objective and why it produces general-purpose capabilities
Relate model scale to emergent capabilities and risk factors (memorization, unpredictability)
Articulate why LLMs are pattern-matching systems producing statistical predictions, not reasoning engines with understanding

1.3 LLM Inference & Text Generation

Knowledge areas:

Tokenization — subword units, Byte Pair Encoding (BPE), vocabulary, special tokens
The context window as working memory and the token budget tradeoff
The embedding layer — converting token IDs to dense learned vectors
Positional encoding — adding sequence order information to embeddings
Masked self-attention — queries, keys, values, multi-head attention, and the causal mask
Feed-forward networks and their role in storing factual knowledge
Residual connections and layer normalization for training stability
The output head — logits, softmax, and probability distributions over the vocabulary
The autoregressive generation loop — producing one token at a time
Temperature, top-k, top-p sampling and their effect on output variability
Reasoning models, chain-of-thought, and inference-time scaling

Skills — the candidate can:

Trace the complete data path from raw text through tokenization, embedding, transformer layers, and output generation
Explain how the attention mechanism processes all tokens in parallel without distinguishing trusted from untrusted input
Describe how temperature and sampling parameters affect output variability and non-determinism
Identify where in the inference pipeline guardrails can intercept or validate data

1.4 LLM Training Pipeline

Knowledge areas:

Pre-training on massive text corpora — next-token prediction at scale, base models
Instruction tuning — fine-tuning on conversation data, chat templates with role boundaries (system, user, assistant, tool)
The instruction hierarchy as a learned statistical preference, not an enforced constraint
RLHF — reward models, Proximal Policy Optimization (PPO), and alignment training
Alternatives to RLHF — Constitutional AI, Direct Preference Optimization (DPO)
RL for reasoning — training models to produce step-by-step thinking tokens
Distillation — training smaller models to replicate larger model behavior
What each training stage creates and what can go wrong at each stage
What the application developer can and cannot control

Skills — the candidate can:

Describe each stage of LLM training and the capabilities and risks each introduces
Explain why instruction-following and safety behaviors are learned statistical preferences, not hard-coded rules
Identify which aspects of model behavior are controlled by the provider vs. the application developer
Articulate why training alone cannot eliminate risks like hallucination, prompt injection, or jailbreaking

1.5 Production AI System Architecture

Knowledge areas:

How production AI applications are assembled — API gateways, orchestration layers, model routing, and the request/response pipeline
Common production patterns — simple chat, RAG pipelines, agentic workflows, multi-model architectures — and how each pattern structures the data flow
Guardrail placement points — pre-processing (before the model), post-processing (after the model), system-level (around the model), and retrieval-level (before context injection)
The relationship between model provider safety features and application-level guardrails — what the provider handles vs. what you must build
Guardrail principles apply across modalities (text, image, audio, video); this program uses text as the primary example

Skills — the candidate can:

Map the components of a production AI application and identify where guardrails should be placed
Distinguish between model provider safety controls and application-level guardrails, and explain why both are necessary
Describe how RAG, agentic, and multi-model patterns change the architectural complexity and guardrail insertion points
Identify all guardrail insertion points for a given AI system architecture

Domain 2: Understanding Failure Modes (15%)

This domain covers how and why AI systems fail — from the architectural properties that create vulnerabilities to the specific failure modes that guardrails must address. A guardrail engineer must understand the risk landscape before designing defenses.

2.1 Architecture-to-Risk Mapping

Knowledge areas:

The instruction hierarchy problem — why learned compliance is not enforced constraint, and what that means for guardrail design
Mapping architectural properties to specific risks — attention mechanisms enable prompt injection, distributed weights enable data leakage, probabilistic generation enables hallucination, learned safety boundaries enable jailbreaking
Trust boundaries — where data crosses from trusted to untrusted in an AI pipeline, and why guardrails must exist at each boundary
The three layers of defense — model training, system prompt, application-level guardrails — and why defense in depth is required
How RAG expands the attack surface — retrieval poisoning, indirect prompt injection through documents, document-level access control gaps
How agentic patterns expand the attack surface — tool misuse, cascading failures, privilege escalation, identity delegation

Skills — the candidate can:

Map each architectural property of an LLM to the specific risk it creates
Identify trust boundaries in an AI system and explain why guardrails belong at each boundary
Explain why RAG and agentic patterns expand the attack surface beyond simple chat applications
Articulate the defense-in-depth principle and why no single layer of defense is sufficient

2.2 Common Failure Modes

Knowledge areas:

Hallucination — factual errors, fabricated citations, confident but wrong answers
Prompt injection — direct injection, indirect injection via retrieved content
Jailbreaking — techniques that bypass model safety training (role-playing, encoding, multi-turn manipulation)
Data leakage — exposing training data, system prompts, or user PII
Toxic and harmful output — hate speech, bias, dangerous instructions
Off-topic drift — model responding outside its intended scope
Over-reliance — systems that defer too much to AI without human checks
Cascading failures in agentic systems — one bad tool call leading to compounding errors
Identity and access failures — cross-tenant data leakage, privilege escalation, impersonation through prompt manipulation

Skills — the candidate can:

Categorize failure modes by severity and likelihood for a given use case
Explain why each failure mode occurs at a technical level
Map failure modes to the guardrail strategies that mitigate them
Recognize novel failure patterns that don’t fit existing categories

2.3 Threat Modeling for AI Systems

Knowledge areas:

AI-specific threat modeling frameworks and approaches (including OWASP Top 10 for LLM Applications)
Adversary profiles — who attacks AI systems and why (malicious users, competitors, researchers, insiders)
Attack surfaces unique to AI — prompts, training data, retrieval corpora, tool integrations, model APIs, tool integration protocols (MCP)
Supply chain risks — third-party models, fine-tuned weights, poisoned datasets, third-party MCP servers and tool providers
Risk assessment — likelihood vs. impact for AI-specific threats

Skills — the candidate can:

Conduct a threat model for an AI application identifying key risks and attack vectors
Prioritize guardrail investment based on threat severity and likelihood
Map adversary profiles to specific attack techniques and guardrail requirements
Document threat models in a format useful to engineering and security teams

Domain 3: Architecting Guardrails (25%)

This domain covers the design of guardrail strategies and systems. A guardrail engineer must be able to architect multi-layered defenses that address the failure modes identified in Domain 2, selecting the right guardrail types for each use case and placement point.

3.1 Guardrail Taxonomy

Knowledge areas:

Input guardrails — controls applied before the AI processes a request
Output guardrails — controls applied after the AI generates a response
System-level guardrails — controls that govern the overall behavior of the AI system
Retrieval guardrails — controls specific to RAG pipelines and knowledge retrieval
Agentic guardrails — controls on tool use, action execution, and autonomous decision-making
Human-in-the-loop guardrails — when and how to escalate to a human

Skills — the candidate can:

Classify any guardrail into its correct category
Explain the tradeoffs between each guardrail type (latency, cost, coverage, false positive rate)
Design a multi-layered guardrail strategy that combines input, output, and system-level controls
Justify why certain use cases require specific guardrail types

3.2 Input Guardrail Design

Knowledge areas:

Prompt validation and sanitization techniques
Prompt injection detection — pattern-based, classifier-based, and LLM-as-judge approaches
Input schema enforcement — constraining what users can send
Topic and intent classification — detecting off-topic or malicious intent
Rate limiting and abuse prevention at the input layer
Identity and access control as guardrail foundations — authentication, authorization, session isolation
Multi-tenant isolation — ensuring one user’s data, context, and conversation history never leaks to another
Identity-aware guardrail tuning — different guardrail profiles based on user role, trust level, or authorization scope
Content length and complexity limits

Skills — the candidate can:

Design an input validation pipeline for a given use case
Select appropriate injection detection methods based on risk tolerance and performance requirements
Define input schemas that balance usability with safety
Implement layered input defenses (cheap checks first, expensive checks for edge cases)
Design identity-aware guardrail systems that enforce different policies based on user context

3.3 Output Guardrail Design

Knowledge areas:

Content filtering — toxicity, bias, appropriateness classifiers
PII detection and redaction in AI-generated output
Factuality and groundedness checking — verifying claims against source material
Structured output enforcement — JSON schema validation, function call validation
Citation and attribution enforcement in RAG systems
Confidence scoring and uncertainty communication
Response length and format constraints
Refusal design — how the system communicates that it cannot fulfill a request

Skills — the candidate can:

Design output filtering pipelines that catch harmful content without excessive false positives
Implement PII detection strategies appropriate to the data sensitivity level
Build groundedness checks that verify AI output against retrieved sources
Design user-friendly refusal messages that are helpful without revealing system internals

3.4 System-Level Guardrail Design

Knowledge areas:

System prompt engineering for safety — setting behavioral boundaries
Conversation memory management — what to retain, what to forget, what to never store
Fallback and circuit breaker patterns — degrading gracefully when AI fails
Model selection and routing as a guardrail strategy (simpler models for lower-risk tasks)
Multi-model architectures — using one model to check another
Timeout and resource limits
Canary and shadow deployment patterns for guardrail changes

Skills — the candidate can:

Write system prompts that establish clear behavioral boundaries
Design fallback chains that maintain user experience during guardrail-triggered blocks
Architect multi-model validation systems (e.g., a small classifier guarding a large generator)
Plan guardrail deployment strategies that minimize risk of breaking production

3.5 RAG-Specific Guardrails

Knowledge areas:

Source document access control — ensuring retrieved content respects permissions
Relevance filtering — preventing irrelevant retrieval from polluting AI responses
Indirect prompt injection via retrieved documents
Source attribution and traceability requirements
Chunk-level vs. document-level guardrails
Handling contradictory sources in retrieval
Staleness and versioning of knowledge bases

Skills — the candidate can:

Design access control systems for RAG knowledge bases
Implement relevance thresholds that balance recall with safety
Defend against indirect injection through retrieved content
Build citation systems that let users verify AI claims against sources

3.6 Agentic System Guardrails

Knowledge areas:

Tool use policies — which tools an agent can access and under what conditions
Action confirmation and approval workflows
Scope limiting — constraining what an agent can do in a single session
Sandboxing and isolation for agent-executed code or actions
Budget and resource caps (API calls, tokens, time, cost)
Rollback and undo capabilities for agent actions
Observation and reasoning trace auditing
Multi-agent coordination and trust boundaries
Identity delegation — whose identity an agent acts under when calling tools and accessing systems
Privilege boundaries — preventing agents from escalating beyond the invoking user’s permissions
Tool integration protocols (MCP) — how models connect to external tools, trust boundaries between MCP clients and servers, permission scoping, transport security
Supply chain risks of third-party tool servers — untrusted MCP servers as attack surface, prompt injection through tool results

Skills — the candidate can:

Define tool access policies based on risk level and user authorization
Design approval workflows that balance autonomy with safety
Implement resource caps that prevent runaway agent behavior
Build audit trails that capture agent reasoning and actions for review
Design identity delegation models that prevent privilege escalation in agentic systems
Evaluate the trust boundaries and security implications of tool integration protocols like MCP

Domain 4: Implementing Guardrails (25%)

This domain covers the practical techniques and tooling used to build guardrails. A guardrail engineer must be able to select and combine detection methods, enforce output constraints, handle sensitive data, and integrate guardrails into application architectures.

4.1 Detection and Classification Techniques

Knowledge areas:

Rule-based detection — regex, keyword lists, blocklists/allowlists
ML-based classification — text classifiers, toxicity models, intent detection
LLM-as-judge — using language models to evaluate other language model outputs
Embedding-based similarity detection — cosine similarity for topic matching, nearest-neighbor for known-bad inputs
Hybrid approaches — combining rules, ML, and LLM-based methods
Tradeoffs: latency, cost, accuracy, maintainability for each approach

Skills — the candidate can:

Select the right detection approach for a given guardrail requirement
Combine multiple detection methods into a layered pipeline
Evaluate detection accuracy using precision, recall, and F1 metrics
Tune detection thresholds to balance false positives and false negatives

4.2 Structured Output Enforcement

Knowledge areas:

JSON schema validation for AI outputs
Function calling and tool use schema constraints
Output parsing and error recovery strategies
Retry logic for malformed outputs
Constrained decoding and grammar-based generation (where supported)
Template-based output generation as a guardrail strategy

Skills — the candidate can:

Define output schemas that enforce safety and correctness constraints
Build robust parsers that handle edge cases in AI-generated structured output
Implement retry strategies that don’t degrade user experience
Choose between constrained generation and post-hoc validation based on use case

4.3 PII and Sensitive Data Handling

Knowledge areas:

PII categories and sensitivity levels (names, emails, SSNs, medical records, financial data)
Detection methods — regex, NER models, purpose-built PII detectors
Redaction vs. masking vs. tokenization strategies
Data minimization in prompts — sending only what the model needs
Logging considerations — what to log, what to never log
Regional and regulatory differences in PII definitions

Skills — the candidate can:

Implement PII detection pipelines with appropriate sensitivity for the use case
Design data flows that minimize PII exposure to AI models
Configure logging systems that capture useful debugging info without storing PII
Apply different PII handling strategies based on data classification levels

4.4 Guardrail Frameworks and Tooling

Knowledge areas:

Understanding of the guardrail tooling landscape (not vendor-specific configuration, but categories of tools)
Guardrail middleware and interceptor patterns
SDK-level vs. proxy-level vs. gateway-level guardrail enforcement
Custom guardrail development — when to build vs. buy vs. use open source
Integration patterns — how guardrails fit into existing application architectures, including tool integration protocols (MCP)
Version control and configuration management for guardrail rules

Skills — the candidate can:

Evaluate guardrail tools and frameworks against requirements
Integrate guardrails into an application without requiring major architectural changes
Build custom guardrails when off-the-shelf solutions don’t fit
Manage guardrail configurations as code with proper version control

4.5 Prompt Engineering for Safety

Knowledge areas:

Defensive system prompt techniques — clear boundaries, explicit refusal instructions
Few-shot examples for guiding safe behavior
Chain-of-thought prompting for improved reasoning and guardrail compliance
Prompt templates that minimize injection surface area
Separation of instructions and user content in prompts
Dynamic prompt construction risks and mitigations

Skills — the candidate can:

Write system prompts that establish clear, robust behavioral boundaries
Use few-shot examples to demonstrate desired refusal behavior
Structure prompts to minimize the effectiveness of injection attacks
Test system prompts against known bypass techniques

Domain 5: Validating Guardrails (20%)

This domain covers the complete validation lifecycle — from adversarial testing before deployment to ongoing monitoring in production. A guardrail engineer must be able to prove that guardrails work, measure their effectiveness, detect when they degrade, and respond when they fail.

5.1 Adversarial Testing & Red Teaming

Knowledge areas:

Red teaming methodology for AI systems — planning, execution, reporting
Prompt injection attack taxonomy — direct, indirect, multi-turn, encoded
Jailbreak techniques — role-play attacks, DAN prompts, encoding tricks, language switching
Social engineering attacks against AI systems
Automated vs. manual red teaming — tools and approaches
Responsible disclosure for AI vulnerabilities

Skills — the candidate can:

Plan and execute a structured red team engagement against an AI system
Apply known attack techniques to test guardrail effectiveness
Document findings in a format actionable by engineering teams
Distinguish between theoretical vulnerabilities and practically exploitable ones

5.2 Guardrail Testing Methodology

Knowledge areas:

Unit testing individual guardrail components (classifiers, filters, validators)
Integration testing guardrail pipelines end-to-end
Regression testing — ensuring guardrail updates don’t break existing protections
Edge case testing — boundary inputs, encoding variations, language mixing
Performance testing — latency impact, throughput under load
A/B testing guardrail configurations

Skills — the candidate can:

Build comprehensive test suites for guardrail systems
Create adversarial test datasets that cover known attack patterns
Design regression test pipelines that run on every guardrail change
Measure and report on guardrail performance impact (latency, cost, user experience)

5.3 Evaluation Metrics

Knowledge areas:

Precision, recall, and F1 for guardrail classifiers
False positive rate — blocking legitimate requests (user friction)
False negative rate — missing harmful content (safety gaps)
Latency percentiles (p50, p95, p99) for guardrail processing
Cost per guardrail evaluation
Coverage metrics — what percentage of inputs/outputs are checked
User satisfaction and complaint rates as indirect guardrail metrics

Skills — the candidate can:

Define appropriate success metrics for guardrails based on use case risk level
Build dashboards and reports that communicate guardrail effectiveness
Use metrics to justify guardrail tuning decisions to stakeholders
Identify when metrics indicate a guardrail is underperforming

5.4 Monitoring & Observability

Knowledge areas:

Key metrics to monitor for guardrail systems (block rate, bypass rate, latency, error rate)
Anomaly detection for guardrail behavior — sudden spikes in blocks or passes
Alert design — what warrants a page vs. a ticket vs. a log entry
Dashboard design for guardrail operations
Structured logging for guardrail events — decision, reasoning, input hash, output action, latency
Privacy-preserving logging — capturing enough for debugging without storing sensitive content
Log analysis and forensics for guardrail investigations

Skills — the candidate can:

Design monitoring systems that surface guardrail health and effectiveness
Configure alerts that catch guardrail failures without creating alert fatigue
Design logging schemas that balance debugging utility with privacy requirements
Investigate anomalies in guardrail behavior and determine root causes

5.5 Continuous Validation & Lifecycle Management

Knowledge areas:

Canary testing for guardrail deployments
Synthetic traffic and chaos testing for guardrail systems
Ongoing adversarial probing in production (automated red teaming)
Community and research-driven attack updates — staying current on new bypass techniques
Model update impact assessment — re-validating guardrails when underlying models change
Guardrail drift — why guardrails degrade over time (new attacks, model updates, data shifts)
Guardrail versioning and rollback strategies
Incident response for guardrail failures — containment, classification, root cause analysis, hardening
Cost optimization — reducing guardrail spend without reducing coverage
Guardrail debt — accumulation of outdated or redundant guardrails

Skills — the candidate can:

Implement automated canary tests that validate guardrails on every deployment
Design synthetic adversarial traffic that continuously probes for weaknesses
Build processes that incorporate newly discovered attack patterns into test suites
Execute incident response procedures for guardrail bypass events
Plan guardrail maintenance schedules that account for model and threat evolution
Identify and remove redundant or outdated guardrails

Cross-Cutting Competencies

These are not tested as separate domains but are expected throughout all areas:

Vendor-agnostic thinking — understanding patterns and principles rather than memorizing product-specific configurations
Risk-based decision making — always connecting guardrail decisions to actual risk levels
Systems thinking — understanding how guardrails interact with the broader application, infrastructure, and organizational context
Continuous learning — awareness that the AI guardrail landscape evolves rapidly and commitment to staying current

← PreviousTraining Program Overview Next →Exam Blueprint