Spring AI Output Guardrails — Prevent Harmful, Off-Topic, and Low-Quality Responses
Guardrails are validation layers that intercept AI responses before they reach users. They catch toxic content, off-topic answers, hallucinations, PII leakage, and other quality issues. This tutorial covers building a production guardrail system using Spring AI advisors and a secondary validation model.
Types of Guardrail Checks
Category          Examples                                    Action
────────────────────────────────────────────────────────────────────────────────────────
Safety            Hate speech, violence, explicit content     Block + fallback
Topic Compliance  Off-topic answers (outside the domain)      Block + redirect
PII Leakage       SSN, credit card, email in output           Redact
Hallucinations    Facts not in the provided context           Add uncertainty note
Quality           Too short; requested code example missing   Retry once
Format            JSON expected but prose returned            Retry with stricter prompt
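One way to keep the table above in code is a pair of enums, so each advisor can look up the action for a failed check instead of hard-coding it. This is only a sketch: `GuardrailCategory`, `GuardrailAction`, and `triggersRetry` are hypothetical names chosen for illustration and are not part of Spring AI.

```java
/** Hypothetical action names, one per entry in the "Action" column. */
enum GuardrailAction {
    BLOCK_AND_FALLBACK,
    BLOCK_AND_REDIRECT,
    REDACT,
    ADD_UNCERTAINTY_NOTE,
    RETRY_ONCE,
    RETRY_WITH_STRICTER_PROMPT
}

/** Hypothetical category names, each bound to the action listed in the table. */
enum GuardrailCategory {
    SAFETY(GuardrailAction.BLOCK_AND_FALLBACK),
    TOPIC_COMPLIANCE(GuardrailAction.BLOCK_AND_REDIRECT),
    PII_LEAKAGE(GuardrailAction.REDACT),
    HALLUCINATION(GuardrailAction.ADD_UNCERTAINTY_NOTE),
    QUALITY(GuardrailAction.RETRY_ONCE),
    FORMAT(GuardrailAction.RETRY_WITH_STRICTER_PROMPT);

    private final GuardrailAction action;

    GuardrailCategory(GuardrailAction action) {
        this.action = action;
    }

    GuardrailAction action() {
        return action;
    }

    /** True when the action calls the model again rather than editing or replacing the response. */
    boolean triggersRetry() {
        return action == GuardrailAction.RETRY_ONCE
            || action == GuardrailAction.RETRY_WITH_STRICTER_PROMPT;
    }
}
```

Separating the retrying categories from the rest matters for cost: only Quality and Format trigger a second model call, while the other four work on the response already received.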
Guardrail Advisor Implementation
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.api.*;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;

@Component
public class OutputGuardrailAdvisor implements CallAroundAdvisor {

    // ObjectMapper is thread-safe once configured, so one shared instance is enough
    private static final ObjectMapper MAPPER = new ObjectMapper();

    private final ChatClient validationClient;

    // Use a cheaper/faster model for validation (e.g., gpt-4o-mini)
    public OutputGuardrailAdvisor(ChatClient.Builder builder) {
        this.validationClient = builder
            .defaultOptions(OpenAiChatOptions.builder()
                .withModel("gpt-4o-mini")
                // Double literal: the option is typed Double in the releases that ship CallAroundAdvisor
                .withTemperature(0.0) // keep validation as repeatable as the model allows
                .build())
            .build();
    }

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request, CallAroundAdvisorChain chain) {
        // 1. Let the main AI call through
        AdvisedResponse response = chain.nextAroundCall(request);
        String output = response.response().getResult().getOutput().getContent();

        // 2. Apply guardrail checks
        GuardrailResult result = checkOutput(request.userText(), output);
        if (!result.passed()) {
            // Return a safe fallback instead of the problematic response
            return buildFallbackResponse(response, result.reason());
        }

        // 3. Apply PII redaction even if other checks pass
        String redacted = redactPii(output);
        if (!redacted.equals(output)) {
            return rebuildResponse(response, redacted);
        }
        return response;
    }

    private GuardrailResult checkOutput(String userQuestion, String aiOutput) {
        // Only the first 500 characters are sent to the validator to cap token cost;
        // anything after that point is not checked
        String truncated = aiOutput.substring(0, Math.min(500, aiOutput.length()));

        String validationPrompt = """
            Review this AI response for:
            1. Harmful content (violence, hate speech, explicit material)
            2. Topic compliance (does it answer a Java/Spring question?)
            3. Obvious factual errors

            User question: %s
            AI response: %s

            Respond with JSON only:
            {"passed": true/false, "reason": "explanation if failed"}
            """.formatted(userQuestion, truncated);

        String validationResult = validationClient.prompt()
            .user(validationPrompt)
            .call()
            .content();

        try {
            // Parse the validator's JSON verdict
            JsonNode node = MAPPER.readTree(validationResult.trim());
            boolean passed = node.get("passed").asBoolean();
            String reason = node.path("reason").asText("");
            return new GuardrailResult(passed, reason);
        } catch (Exception e) {
            // Fails open: an unparseable verdict (e.g., JSON wrapped in prose) lets the
            // response through. Return a failed result here if blocking is the safer default.
            return new GuardrailResult(true, "");
        }
    }

    private String redactPii(String text) {
        return text
            .replaceAll("\\b\\d{3}-\\d{2}-\\d{4}\\b", "[SSN REDACTED]")
            .replaceAll("\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b", "[CARD REDACTED]")
            .replaceAll("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}", "[EMAIL REDACTED]")
            .replaceAll("\\b(?:\\+?1[-.]?)?\\(?\\d{3}\\)?[-.]?\\d{3}[-.]?\\d{4}\\b", "[PHONE REDACTED]");
    }

    private AdvisedResponse buildFallbackResponse(AdvisedResponse original, String reason) {
        // The reason is not shown to the user; the same message is returned for every failed check
        String fallback = "I can only help with Java and Spring Boot topics. " +
            "Please ask a technical question related to these subjects.";
        return rebuildResponse(original, fallback);
    }

    private AdvisedResponse rebuildResponse(AdvisedResponse original, String newContent) {
        // Wrap the new content in the same response structure
        ChatResponse newChatResponse = ChatResponse.builder()
            .withGenerations(List.of(new Generation(new AssistantMessage(newContent),
                original.response().getResult().getMetadata())))
            .withMetadata(original.response().getMetadata())
            .build();
        return AdvisedResponse.from(original)
            .withResponse(newChatResponse)
            .build();
    }

    // Highest order value: last advisor on the request path, so first to see the model's response
    @Override
    public int getOrder() { return Ordered.LOWEST_PRECEDENCE; }

    @Override
    public String getName() { return "OutputGuardrailAdvisor"; }

    record GuardrailResult(boolean passed, String reason) {}
}
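The card pattern in `redactPii` matches any 16-digit run, so order IDs and tracking numbers are redacted too. A possible refinement, sketched below with only the JDK, is to redact a 16-digit match only when it passes the Luhn checksum that payment card numbers satisfy. `PiiRedactor` and `passesLuhn` are hypothetical names, and the phone pattern is left out to keep the sketch short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: regex redaction with a Luhn check on card-like numbers. */
final class PiiRedactor {

    private static final Pattern SSN =
        Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    private static final Pattern CARD =
        Pattern.compile("\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b");
    private static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    private PiiRedactor() {}

    static String redact(String text) {
        String out = SSN.matcher(text).replaceAll("[SSN REDACTED]");
        // Redact a 16-digit match only if its checksum is valid; otherwise keep it unchanged
        out = CARD.matcher(out).replaceAll(m ->
            passesLuhn(m.group()) ? "[CARD REDACTED]" : Matcher.quoteReplacement(m.group()));
        return EMAIL.matcher(out).replaceAll("[EMAIL REDACTED]");
    }

    /** Luhn checksum: double every second digit from the right, sum, test divisibility by 10. */
    static boolean passesLuhn(String candidate) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = candidate.length() - 1; i >= 0; i--) {
            char c = candidate.charAt(i);
            if (!Character.isDigit(c)) {
                continue; // skip spaces and hyphens between digit groups
            }
            int digit = c - '0';
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) {
                    digit -= 9;
                }
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

The trade-off is that a mistyped card number fails the checksum and is left in the output, so the unfiltered pattern may still be preferable where over-redaction is cheaper than a leak.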
Format Validation Guardrail
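import com.fasterxml.jackson.databind.ObjectMapper;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.ai.chat.client.advisor.api.*;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;

@Component
public class JsonOutputGuardrailAdvisor implements CallAroundAdvisor {

    private static final Logger log = LoggerFactory.getLogger(JsonOutputGuardrailAdvisor.class);
    private static final ObjectMapper MAPPER = new ObjectMapper();

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request, CallAroundAdvisorChain chain) {
        AdvisedResponse response = chain.nextAroundCall(request);

        // Only apply when caller expects JSON output
        Object expectsJson = request.adviseContext().getOrDefault("expectsJson", false);
        if (!Boolean.TRUE.equals(expectsJson)) {
            return response;
        }

        String output = response.response().getResult().getOutput().getContent();

        // Validate JSON
        if (!isValidJson(output)) {
            log.warn("Invalid JSON detected, retrying with stricter prompt...");

            // Retry with stricter JSON instruction
            AdvisedRequest stricterRequest = AdvisedRequest.from(request)
                .withUserText(request.userText() +
                    "\n\nCRITICAL: Output ONLY valid JSON. No markdown, no explanation.")
                .build();

            // Caution: this re-enters the same chain instance. Some Spring AI releases consume
            // the chain as it is traversed, in which case a second call fails at runtime;
            // verify the retry against the version in use. The retried output is not re-validated here.
            return chain.nextAroundCall(stricterRequest);
        }
        return response;
    }

    private boolean isValidJson(String text) {
        try {
            MAPPER.readTree(text.trim());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    // One step before OutputGuardrailAdvisor on the request path, so it sees the response after it
    @Override
    public int getOrder() { return Ordered.LOWEST_PRECEDENCE - 1; }

    @Override
    public String getName() { return "JsonOutputGuardrailAdvisor"; }
}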
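The advisor above returns whatever the second attempt produces, valid or not. A minimal sketch of a bounded retry that checks the second attempt as well and falls back to a structured error is shown below. It is framework-free so the control flow can be tested in isolation: `BoundedRetry`, `Outcome`, and the error payload are hypothetical, and the `model` function stands in for the real chat call.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.function.Predicate;

/** Hypothetical helper: at most one retry, then a structured error instead of a third call. */
final class BoundedRetry {

    /** Either accepted content or an error payload, plus the number of model calls made. */
    record Outcome(Optional<String> content, Optional<String> error, int attempts) {}

    private BoundedRetry() {}

    static Outcome callWithOneRetry(Function<String, String> model,
                                    Predicate<String> isValid,
                                    String prompt,
                                    String stricterSuffix) {
        // First attempt with the caller's prompt as-is
        String first = model.apply(prompt);
        if (isValid.test(first)) {
            return new Outcome(Optional.of(first), Optional.empty(), 1);
        }

        // Single retry with the stricter instruction appended
        String second = model.apply(prompt + stricterSuffix);
        if (isValid.test(second)) {
            return new Outcome(Optional.of(second), Optional.empty(), 2);
        }

        // Two invalid outputs: stop spending tokens and report a machine-readable failure
        String error = "{\"error\":\"invalid_format\",\"attempts\":2}";
        return new Outcome(Optional.empty(), Optional.of(error), 2);
    }
}
```

In an advisor, `isValid` would be the `isValidJson` check and the error payload would be wrapped in a response, so callers always receive either valid JSON or an explicit failure.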
Wiring Guardrails into ChatClient
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class GuardedChatService {

    private final ChatClient chatClient;

    public GuardedChatService(ChatClient.Builder builder,
                              OutputGuardrailAdvisor guardrail,
                              JsonOutputGuardrailAdvisor jsonGuard) {
        this.chatClient = builder
            .defaultSystem("You are a Java and Spring Boot expert.")
            .defaultAdvisors(guardrail, jsonGuard)
            .build();
    }

    // Plain question: safety, topic, and PII guardrails apply
    public String ask(String question) {
        return chatClient.prompt()
            .user(question)
            .call()
            .content();
    }

    // Same guardrails plus JSON validation, enabled through the "expectsJson" advisor parameter
    public String askForJson(String question) {
        return chatClient.prompt()
            .user(question)
            .advisors(a -> a.param("expectsJson", true))
            .call()
            .content();
    }
}
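The order in which advisors are passed to `defaultAdvisors` matters less than the values returned by `getOrder()`. The toy model below, which does not use any Spring AI types, illustrates the documented behaviour: lower order values run first on the request path, and the chain unwinds in reverse on the response path. `AdvisorOrderDemo`, `Step`, and the "memory" advisor with order 0 are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Toy model of an around-advisor chain; not Spring AI code. */
final class AdvisorOrderDemo {

    /** A named advisor and the value its getOrder() would return. */
    record Step(String name, int order) {}

    private AdvisorOrderDemo() {}

    /** Returns the sequence of request and response visits for the given advisors. */
    static List<String> trace(List<Step> steps) {
        List<Step> sorted = new ArrayList<>(steps);
        // Lower order value means earlier on the request path
        sorted.sort(Comparator.comparingInt(Step::order));
        List<String> log = new ArrayList<>();
        walk(sorted, 0, log);
        return log;
    }

    private static void walk(List<Step> sorted, int index, List<String> log) {
        if (index == sorted.size()) {
            log.add("model call"); // end of the chain: the model is invoked
            return;
        }
        Step step = sorted.get(index);
        log.add(step.name() + " request");
        walk(sorted, index + 1, log);      // corresponds to chain.nextAroundCall(request)
        log.add(step.name() + " response"); // code after nextAroundCall runs on the way back
    }
}
```

With a memory advisor at order 0, the JSON guard at `Integer.MAX_VALUE - 1`, and the output guardrail at `Integer.MAX_VALUE`, the output guardrail is the first to inspect the model's response, and the JSON guard and memory advisor then see its redacted or fallback text.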
Output
// On-topic question
ask("How does Spring AI's ChatClient work?")
→ Normal AI response about ChatClient
// Off-topic question
ask("What is the recipe for chocolate cake?")
→ "I can only help with Java and Spring Boot topics. Please ask a technical question..."
// Response with PII
ask("Show me my email john.doe@company.com in the context")
→ AI might include "I see the email [EMAIL REDACTED]" — PII is stripped
// Invalid JSON format request
askForJson("List Spring annotations")
// First attempt returns: "Here are the annotations: @Service, @Controller..."
// Guardrail detects invalid JSON, retries with stricter prompt
// Second attempt returns: ["@Service", "@Controller", "@Repository", "@Component"]
Key Points
- Use a faster, cheaper model (gpt-4o-mini) for validation: it costs roughly 10x less than gpt-4o and adds about 300-500ms of latency per call
- Place guardrail advisors at LOWEST_PRECEDENCE so they run last on the way in (after RAG/memory) and first on the response path
- PII redaction with regex is fast and deterministic: prefer it over AI-based PII detection for common formats (SSN, cards, email)
- Limit retries to one — if the model produces invalid output twice, return a structured error rather than burning more tokens
- Log guardrail interventions to a separate audit table — it helps identify patterns in problematic inputs and outputs over time
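For the audit point above, a minimal sketch of what an intervention record might capture is shown below. An in-memory list stands in for the audit table, and `GuardrailAudit`, `Intervention`, and the field names are hypothetical. A real implementation would write to a database and would need to be thread-safe.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical audit sink; the list is a stand-in for a dedicated audit table. */
final class GuardrailAudit {

    /** One row per intervention: when it happened, which advisor acted, what it did, and why. */
    record Intervention(Instant at, String advisor, String action, String reason) {}

    private final List<Intervention> rows = new ArrayList<>();

    void log(String advisor, String action, String reason) {
        rows.add(new Intervention(Instant.now(), advisor, action, reason));
    }

    /** Number of interventions per action, e.g. how often responses were blocked vs. redacted. */
    Map<String, Long> countsByAction() {
        return rows.stream()
            .collect(Collectors.groupingBy(Intervention::action, Collectors.counting()));
    }

    int size() {
        return rows.size();
    }
}
```

Storing the advisor name and the validator's reason alongside the action makes it possible to tell a spike in off-topic questions from a spike in malformed JSON.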