cAIge Exam Blueprint

Certified AI Guardrail Engineer (cAIge) Examination

Version: 1.0 Maintained by: caige.org

Exam Overview

Parameter	Detail
Exam name	Certified AI Guardrail Engineer (cAIge)
Format	Multiple choice, multiple select, and scenario-based questions
Number of questions	75 scored + 5 unscored (pilot questions)
Time limit	120 minutes
Passing score	70% (53 of 75 scored questions)
Delivery	Online proctored
Prerequisites	None required to sit for the exam; completion of the cAIge training program is strongly recommended
Languages	English (additional languages planned)

Question Distribution by Domain

Domain	Weight	Scored Questions	Approx. Count
1. AI Foundations	15%	11	11-12
2. Understanding Failure Modes	15%	11	11-12
3. Architecting Guardrails	25%	19	18-20
4. Implementing Guardrails	25%	19	18-20
5. Validating Guardrails	20%	15	14-16
Total	100%	75

5 additional unscored pilot questions are distributed randomly throughout the exam. Candidates cannot distinguish pilot questions from scored questions.

Question Types

Type 1: Multiple Choice (Single Answer)

Standard four-option questions with one correct answer. Tests factual knowledge and conceptual understanding.

Example:

A user sends a prompt containing encoded Base64 text that, when decoded, contains instructions to ignore the system prompt. What type of attack is this?

A. Social engineering attack B. Direct prompt injection attack C. Indirect prompt injection attack D. Model inversion attack

Correct: B — This is a direct prompt injection attack using encoding to obfuscate the malicious instruction. It is “direct” because the user is intentionally crafting the input, not relying on injected content from an external source.

Approximately 40% of exam questions.

Type 2: Multiple Select (Choose All That Apply)

Questions with 5-6 options where 2-3 are correct. Tests breadth of knowledge. Questions clearly state how many answers to select.

Example:

A RAG-based AI assistant retrieves documents from a shared knowledge base. Which THREE of the following are guardrail concerns specific to this architecture? (Choose 3)

A. Indirect prompt injection via retrieved documents B. Model hallucination due to high temperature settings C. Retrieval of documents the user is not authorized to access D. Staleness of retrieved information leading to outdated answers E. Excessive GPU memory consumption during inference

Correct: A, C, D — These are RAG-specific concerns. B and E are general AI concerns not specific to RAG architecture.

Approximately 25% of exam questions.

Type 3: Scenario-Based Questions

Multi-paragraph scenarios describing a real-world situation, followed by a question about the best course of action, root cause analysis, or architecture recommendation. Tests applied knowledge and judgment.

Example:

Scenario: A fintech company has deployed an AI-powered customer support chatbot that helps users with account inquiries. The system uses RAG to retrieve relevant help articles and account-specific information. After launch, the security team discovers that some users are receiving responses that contain fragments of other users’ account details. The guardrail system includes output PII filtering, but it was configured to detect and redact PII patterns in free text (SSNs, emails, phone numbers) rather than validate data authorization.

What is the MOST important guardrail gap to address first?

A. Improve the PII regex patterns to catch more formats B. Add output toxicity filtering to prevent harmful responses C. Implement retrieval-level access controls so the AI only retrieves data the current user is authorized to see D. Add rate limiting to prevent users from extracting data through repeated queries

Correct: C — The root cause is that the retrieval system is pulling unauthorized data. PII filtering (A) is treating symptoms, not the cause. The AI should never see data it shouldn’t return. Access controls at the retrieval layer are the fundamental fix.

Approximately 35% of exam questions.

Detailed Exam Objectives by Domain

Domain 1: AI Foundations (15%)

1.1 — Explain how neural networks and LLMs work at a level sufficient to reason about guardrail requirements

Describe how neural networks learn through the training loop and why knowledge is distributed across weights
Explain the transformer architecture, self-attention, and why it replaced sequential models
Articulate why LLMs are pattern-matching systems producing statistical predictions, not reasoning engines
Relate model scale to emergent capabilities and risk factors

1.2 — Trace the complete LLM inference pipeline and identify guardrail interception points

Trace data from raw text through tokenization, embedding, attention, and output generation
Explain how the attention mechanism processes all tokens without distinguishing trusted from untrusted input
Describe how temperature and sampling parameters affect output variability and non-determinism
Identify where in the inference pipeline guardrails can intercept or validate data

1.3 — Describe the LLM training pipeline and explain why training alone cannot eliminate risks

Distinguish each stage of training (pre-training, instruction tuning, RLHF/DPO) and what each creates
Explain why instruction-following and safety behaviors are learned statistical preferences, not hard-coded rules
Identify which aspects of model behavior are controlled by the provider vs. the application developer
Articulate why training alone cannot eliminate hallucination, prompt injection, or jailbreaking

1.4 — Identify production AI system architecture patterns and map guardrail placement points

Map the components of production AI applications (API gateways, orchestration, model routing)
Describe common patterns (simple chat, RAG, agentic, multi-model) and their data flows
Identify all guardrail placement points (pre-model, post-model, system-level, retrieval-level)
Distinguish between model provider safety controls and application-level guardrails

Domain 2: Understanding Failure Modes (15%)

2.1 — Map architectural properties of LLMs to the specific risks they create

Connect attention mechanisms to prompt injection, distributed weights to data leakage, probabilistic generation to hallucination, and learned boundaries to jailbreaking
Identify trust boundaries in AI pipelines and explain why guardrails must exist at each
Explain how RAG and agentic patterns expand the attack surface
Articulate the defense-in-depth principle and why no single layer of defense is sufficient

2.2 — Classify common AI failure modes and map them to guardrail strategies

Identify hallucination, prompt injection, jailbreaking, data leakage, toxic output, off-topic drift, over-reliance, cascading failures, and identity/access failures from examples
Explain why each failure mode occurs at a technical level
Categorize failure modes by severity and likelihood for a given use case
Recommend appropriate guardrail strategies for each failure mode

2.3 — Conduct threat modeling for AI applications

Apply AI-specific threat modeling frameworks (including OWASP Top 10 for LLM Applications)
Identify adversary profiles, motivations, and capabilities
Map attack surfaces unique to AI (prompts, training data, retrieval corpora, tool integrations, MCP)
Assess supply chain risks (third-party models, poisoned datasets, untrusted tool servers)
Prioritize threats by likelihood and impact

Domain 3: Architecting Guardrails (25%)

3.1 — Design multi-layered guardrail strategies using the guardrail taxonomy

Classify guardrails by type (input, output, system-level, retrieval, agentic, human-in-the-loop)
Explain the tradeoffs between each type (latency, cost, coverage, false positive rate)
Design strategies that combine multiple guardrail types for defense in depth
Justify design decisions based on risk assessment

3.2 — Design input guardrail pipelines

Select and sequence input validation techniques (sanitization, schema enforcement, injection detection)
Choose injection detection approaches (pattern-based, classifier-based, LLM-as-judge) based on risk tolerance
Design identity-aware guardrail systems with multi-tenant isolation
Layer cheap checks before expensive checks

3.3 — Design output guardrail pipelines

Select content filtering approaches (toxicity, bias, appropriateness classifiers)
Design PII detection and redaction strategies
Implement groundedness and factuality checks for RAG systems
Design refusal responses that are helpful but do not reveal system internals

3.4 — Design system-level guardrails

Write effective safety-oriented system prompts
Design fallback and circuit breaker patterns for graceful degradation
Architect multi-model validation systems (using one model to check another)
Plan conversation memory management, timeouts, and resource limits

3.5 — Design guardrails specific to RAG systems

Implement retrieval-level access controls respecting user permissions
Defend against indirect prompt injection via retrieved documents
Design citation and attribution systems
Handle relevance filtering, contradictory sources, and knowledge base staleness

3.6 — Design guardrails for agentic AI systems

Define tool access policies and action confirmation workflows
Implement scope limiting, sandboxing, and resource caps
Design identity delegation models that prevent privilege escalation
Evaluate trust boundaries and security implications of tool integration protocols (MCP)

Domain 4: Implementing Guardrails (25%)

4.1 — Select and implement appropriate detection and classification techniques

Choose between rule-based, ML-based, embedding-based, and LLM-as-judge approaches
Combine detection methods into layered pipelines (fast/cheap first, slow/expensive last)
Tune detection thresholds to balance false positives and false negatives
Evaluate approaches on latency, cost, accuracy, and maintainability

4.2 — Implement structured output enforcement

Define and validate output schemas (JSON schema, function call constraints)
Build retry and error recovery logic for malformed outputs
Choose between constrained decoding and post-hoc validation
Handle edge cases in AI-generated structured output

4.3 — Implement PII and sensitive data handling

Deploy PII detection (regex, NER, purpose-built detectors) appropriate to sensitivity level
Choose between redaction, masking, and tokenization strategies
Design data flows that minimize model exposure to PII
Configure privacy-preserving logging

4.4 — Evaluate and integrate guardrail frameworks and tooling

Assess guardrail tools against requirements (categories, not specific vendors)
Integrate guardrails at the appropriate layer (SDK, proxy, gateway)
Decide when to build custom vs. use existing solutions
Manage guardrail configuration as code with version control

4.5 — Apply prompt engineering techniques for safety

Write defensive system prompts with clear behavioral boundaries
Structure prompts to minimize injection surface (delimiters, separation of instructions and user content)
Use few-shot examples to demonstrate desired refusal behavior
Identify dynamic prompt construction risks and mitigations

Domain 5: Validating Guardrails (20%)

5.1 — Plan and execute adversarial testing and red teaming against AI guardrails

Structure a red team engagement for an AI system (planning, execution, reporting)
Apply known prompt injection, jailbreak, and encoding-based attack techniques
Distinguish between theoretical vulnerabilities and practically exploitable ones
Document and report findings in a format actionable by engineering teams

5.2 — Build and maintain guardrail test suites

Create unit tests for individual guardrail components (classifiers, filters, validators)
Build integration and regression tests for guardrail pipelines
Design edge case tests (encoding variations, language mixing, boundary inputs)
Measure performance impact of guardrails under load

5.3 — Define and measure guardrail effectiveness metrics

Calculate precision, recall, and F1 for guardrail classifiers
Measure and report false positive rate (user friction) and false negative rate (safety gaps)
Track latency percentiles (p50, p95, p99) and cost per evaluation
Use metrics to drive guardrail tuning decisions

5.4 — Design monitoring and observability for guardrail systems

Identify key operational metrics (block rate, bypass rate, latency, error rate)
Configure alerts that catch failures without creating alert fatigue
Design structured, privacy-preserving logging for guardrail events
Investigate anomalies and conduct forensic analysis of guardrail logs

5.5 — Implement continuous validation and lifecycle management

Deploy canary tests and synthetic adversarial probes for guardrail releases
Re-validate guardrails after model updates or configuration changes
Plan for guardrail drift, versioning, and rollback
Execute incident response procedures for guardrail bypass events
Identify and address guardrail debt (outdated or redundant guardrails)

Exam Development and Maintenance

Question Development Process

Subject matter experts (SMEs) write questions aligned to exam objectives
Each question is reviewed by at least two additional SMEs
Questions are pilot-tested as unscored items before becoming scored
Statistical analysis (item difficulty, discrimination index) determines if pilot questions become scored
Questions are reviewed annually for relevance and accuracy

Cut Score Methodology

The passing score of 70% was established using a modified Angoff method
A panel of SMEs rated the probability that a minimally qualified candidate would answer each question correctly
The cut score will be re-evaluated with each major exam version update

Exam Security

Question pool is significantly larger than any single exam form (minimum 3x)
Questions are randomized per candidate
Multiple exam forms are in rotation at any time
Statistical monitoring detects potential exam fraud (unusual score patterns, timing anomalies)

Version Control

Minor updates (adding/retiring questions, fixing errors) do not change the version number
Major updates (domain weight changes, new domains, significant objective changes) increment the version
Candidates are notified at least 90 days before major version changes
Both old and new versions are available for 60 days during transitions

Candidate Preparation Guide

Recommended Experience

While there are no formal prerequisites to sit for the exam, candidates are most successful with:

Completion of the cAIge training program (all five modules)
1-2 years of experience working with AI/ML systems in any capacity
Basic understanding of software architecture and API design
Familiarity with at least one programming language
Exposure to security concepts (not required to be a security specialist)

Study Approach

Review the competency matrix to understand all knowledge areas and skills
Complete the official cAIge training program (five text-based modules)
Build hands-on experience by implementing guardrails in a practice environment
Take practice exams to identify knowledge gaps
Review areas of weakness against the competency matrix

What This Exam Does NOT Test

Vendor-specific product configuration (e.g., “configure NeMo Guardrails rule X”)
Deep machine learning theory (e.g., backpropagation math, attention mechanism implementation)
General software engineering skills (e.g., data structures, algorithms)
Legal expertise (e.g., detailed regulatory interpretation)
Memorization of specific regulatory article numbers or section references

← PreviousCompetency Matrix Next →AI Foundations