Java9R: Spring AI Output Guardrails — Prevent Harmful, Off-Topic, and Low-Quality Responses

Guardrails are validation layers that intercept AI responses before they reach users. They catch toxic content, off-topic answers, hallucinations, PII leakage, and other quality issues. This tutorial covers building a production guardrail system using Spring AI advisors and a secondary validation model.

Types of Guardrail Checks

Category          Examples                              Action
──────────────────────────────────────────────────────────────────────
Safety            Hate speech, violence, explicit       Block + fallback
Topic Compliance  Off-topic answers (outside domain)    Block + redirect
PII Leakage       SSN, credit card, email in output     Redact
Hallucinations    Facts not in provided context         Add uncertainty note
Quality           Too short, requested code missing     Retry once
Format            JSON expected but prose returned      Retry with stricter prompt

Guardrail Advisor Implementation

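// Package names as of the Spring AI 1.0.0 milestone releases that ship CallAroundAdvisor
import java.util.List;

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.api.*;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.model.Generation;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;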

@Component
public class OutputGuardrailAdvisor implements CallAroundAdvisor {

    private final ChatClient validationClient;

    // Use a cheaper/faster model for validation (e.g., gpt-4o-mini)
    public OutputGuardrailAdvisor(ChatClient.Builder builder) {
        this.validationClient = builder
                .defaultOptions(OpenAiChatOptions.builder()
                        .withModel("gpt-4o-mini")
                        .withTemperature(0.0)    // Double in this API; 0.0 keeps verdicts as stable as possible
                        .build())
                .build();
    }

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request, CallAroundAdvisorChain chain) {
        // 1. Let the main AI call through
        AdvisedResponse response = chain.nextAroundCall(request);

        String output = response.response().getResult().getOutput().getContent();

        // 2. Apply guardrail checks
        GuardrailResult result = checkOutput(request.userText(), output);

        if (!result.passed()) {
            // Return a safe fallback instead of the problematic response
            return buildFallbackResponse(response, result.reason());
        }

        // 3. Apply PII redaction even if other checks pass
        String redacted = redactPii(output);
        if (!redacted.equals(output)) {
            return rebuildResponse(response, redacted);
        }

        return response;
    }

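    private GuardrailResult checkOutput(String userQuestion, String aiOutput) {
        // Only the first 500 characters of the response are sent to the validator (see the
        // substring call below). This caps cost but leaves the rest of a long answer unchecked.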
        String validationPrompt = """
                Review this AI response for:
                1. Harmful content (violence, hate speech, explicit material)
                2. Topic compliance (does it answer a Java/Spring question?)
                3. Obvious factual errors

                User question: %s
                AI response: %s

                Respond with JSON only:
                {"passed": true/false, "reason": "explanation if failed"}
                """.formatted(userQuestion, aiOutput.substring(0, Math.min(500, aiOutput.length())));

        String validationResult = validationClient.prompt()
                .user(validationPrompt)
                .call()
                .content();

        try {
            // Parse JSON response
            ObjectMapper mapper = new ObjectMapper();
            JsonNode node = mapper.readTree(validationResult.trim());
            boolean passed = node.get("passed").asBoolean();
            String reason  = node.path("reason").asText("");
            return new GuardrailResult(passed, reason);
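        } catch (Exception e) {
            // Fails open: an unparseable verdict is treated as a pass. To fail closed,
            // return new GuardrailResult(false, "unparseable verdict") here instead.
            return new GuardrailResult(true, "");
        }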
    }

    private String redactPii(String text) {
        return text
                .replaceAll("\\b\\d{3}-\\d{2}-\\d{4}\\b", "[SSN REDACTED]")
                .replaceAll("\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b", "[CARD REDACTED]")
                .replaceAll("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}", "[EMAIL REDACTED]")
                .replaceAll("\\b(?:\\+?1[-.]?)?\\(?\\d{3}\\)?[-.]?\\d{3}[-.]?\\d{4}\\b", "[PHONE REDACTED]");
    }

    private AdvisedResponse buildFallbackResponse(AdvisedResponse original, String reason) {
        String fallback = "I can only help with Java and Spring Boot topics. " +
                          "Please ask a technical question related to these subjects.";
        return rebuildResponse(original, fallback);
    }

    private AdvisedResponse rebuildResponse(AdvisedResponse original, String newContent) {
        // Wrap the new content in the same response structure
        ChatResponse newChatResponse = ChatResponse.builder()
                .withGenerations(List.of(new Generation(new AssistantMessage(newContent),
                        original.response().getResult().getMetadata())))
                .withMetadata(original.response().getMetadata())
                .build();
        return AdvisedResponse.from(original)
                .withResponse(newChatResponse)
                .build();
    }

    @Override
    public int getOrder() { return Ordered.LOWEST_PRECEDENCE; }  // last on the request path, first to see the response

    @Override
    public String getName() { return "OutputGuardrailAdvisor"; }

    record GuardrailResult(boolean passed, String reason) {}
}
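
The verdict parser above passes the response whenever the validator's reply is not parseable JSON. One common cause is a model wrapping its JSON in a markdown code fence, which makes `readTree` throw. The sketch below shows one way to unwrap such a reply before parsing. `VerdictText` and `unwrap` are hypothetical names introduced for illustration, not part of Spring AI or of the original listing.

```java
final class VerdictText {

    private VerdictText() {}

    /** Returns the reply with one surrounding markdown code fence removed, if present. */
    static String unwrap(String reply) {
        String text = reply == null ? "" : reply.strip();
        if (text.length() < 6 || !text.startsWith("```") || !text.endsWith("```")) {
            return text; // no surrounding fence: leave the reply as it is
        }
        // The opening fence line may carry a language tag such as "json"; drop the whole line.
        int firstNewline = text.indexOf('\n');
        if (firstNewline < 0) {
            return text; // single-line input: nothing sensible to unwrap
        }
        // Cut the opening line and the three closing backticks, then trim whitespace.
        return text.substring(firstNewline + 1, text.length() - 3).strip();
    }
}
```

In `checkOutput`, this would wrap the parser argument, for example `mapper.readTree(VerdictText.unwrap(validationResult))`. Whether a reply that still fails to parse should pass or block remains a policy decision.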

Format Validation Guardrail

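import com.fasterxml.jackson.databind.ObjectMapper;
import org.springframework.ai.chat.client.advisor.api.*;
import org.springframework.core.Ordered;
import org.springframework.stereotype.Component;

@Component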
public class JsonOutputGuardrailAdvisor implements CallAroundAdvisor {

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request, CallAroundAdvisorChain chain) {
        AdvisedResponse response = chain.nextAroundCall(request);

        // Only apply when caller expects JSON output
        Boolean expectsJson = (Boolean) request.adviseContext()
                .getOrDefault("expectsJson", false);
        if (!Boolean.TRUE.equals(expectsJson)) {
            return response;
        }

        String output = response.response().getResult().getOutput().getContent();

        // Validate JSON
        if (!isValidJson(output)) {
            System.out.println("Invalid JSON detected, retrying with stricter prompt...");

            // Retry with stricter JSON instruction
            AdvisedRequest stricterRequest = AdvisedRequest.from(request)
                    .withUserText(request.userText() +
                            "\n\nCRITICAL: Output ONLY valid JSON. No markdown, no explanation.")
                    .build();

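            // Single retry: the second answer is returned without being validated again.
            // Caveat: some Spring AI versions consume the chain on the first call, so confirm
            // that a second nextAroundCall is supported in the version you use.
            return chain.nextAroundCall(stricterRequest);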
        }

        return response;
    }

    private boolean isValidJson(String text) {
        try {
            new ObjectMapper().readTree(text.trim());
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    @Override
    public int getOrder() { return Ordered.LOWEST_PRECEDENCE - 1; }

    @Override
    public String getName() { return "JsonOutputGuardrailAdvisor"; }
}
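
The advisor above retries once but returns the second answer unchecked. A minimal sketch of a retry that validates both attempts and returns a structured error when both fail could look like the following. It is plain Java, independent of Spring AI: `BoundedRetry`, `Outcome`, and the error payload are assumptions made for illustration, and the `model` function stands in for the real chat call.

```java
import java.util.function.Function;
import java.util.function.Predicate;

final class BoundedRetry {

    private BoundedRetry() {}

    /** Result of a guarded call: the accepted text, or an error payload once the retry is spent. */
    record Outcome(boolean ok, String value, int attempts) {}

    /**
     * Calls the model with the original prompt and, if validation fails,
     * exactly once more with a stricter suffix appended.
     */
    static Outcome callWithOneRetry(String prompt,
                                    String stricterSuffix,
                                    Function<String, String> model,
                                    Predicate<String> isValid) {
        String first = model.apply(prompt);
        if (isValid.test(first)) {
            return new Outcome(true, first, 1);
        }
        String second = model.apply(prompt + stricterSuffix);
        if (isValid.test(second)) {
            return new Outcome(true, second, 2);
        }
        // Both attempts failed: report a structured error instead of retrying again.
        return new Outcome(false, "{\"error\":\"invalid_output\"}", 2);
    }
}
```

The caller decides what to do with a failed `Outcome`, which keeps token spend bounded at two model calls per request.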

Wiring Guardrails into ChatClient

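import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service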
public class GuardedChatService {

    private final ChatClient chatClient;

    public GuardedChatService(ChatClient.Builder builder,
                               OutputGuardrailAdvisor  guardrail,
                               JsonOutputGuardrailAdvisor jsonGuard) {
        this.chatClient = builder
                .defaultSystem("You are a Java and Spring Boot expert.")
                .defaultAdvisors(guardrail, jsonGuard)
                .build();
    }

    public String ask(String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }

    public String askForJson(String question) {
        return chatClient.prompt()
                .user(question)
                .advisors(a -> a.param("expectsJson", true))
                .call()
                .content();
    }
}
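
The two guardrails are registered together, so their `getOrder()` values decide which one sees the response first. The sketch below is a simplified model of around-style nesting, not Spring AI's actual chain implementation: advisors are sorted by ascending order value and each one wraps the next. `OrderingDemo`, `Step`, and the `memory` advisor in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class OrderingDemo {

    private OrderingDemo() {}

    /** Stand-in for an around-style advisor: only a name and an order value. */
    record Step(String name, int order) {}

    /** Returns the sequence of request and response events for the given advisors. */
    static List<String> trace(List<Step> steps) {
        List<Step> sorted = new ArrayList<>(steps);
        // A lower order value means the advisor runs earlier on the request path.
        sorted.sort(Comparator.comparingInt(Step::order));
        List<String> events = new ArrayList<>();
        walk(sorted, 0, events);
        return events;
    }

    private static void walk(List<Step> sorted, int index, List<String> events) {
        if (index == sorted.size()) {
            events.add("model call");
            return;
        }
        Step step = sorted.get(index);
        events.add(step.name() + " sees request");
        walk(sorted, index + 1, events); // plays the role of chain.nextAroundCall(request)
        events.add(step.name() + " sees response");
    }
}
```

Tracing `memory` (order 0), `JsonOutputGuardrailAdvisor` (`LOWEST_PRECEDENCE - 1`), and `OutputGuardrailAdvisor` (`LOWEST_PRECEDENCE`) gives the request order memory, JSON guard, content guardrail, and the reverse on the response. One consequence under this model: when `expectsJson` is set, the JSON guard inspects the text after the content guardrail has already redacted or replaced it, so a plain-text fallback message would look like invalid JSON.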

Output

// On-topic question
ask("How does Spring AI's ChatClient work?")
→ Normal AI response about ChatClient

// Off-topic question
ask("What is the recipe for chocolate cake?")
→ "I can only help with Java and Spring Boot topics. Please ask a technical question..."

// Response with PII
ask("Show me my email john.doe@company.com in the context")
→ If the model echoes the address, the user sees "I see the email [EMAIL REDACTED]" because the guardrail rewrites the output

// Invalid JSON format request
askForJson("List Spring annotations")
// First attempt returns: "Here are the annotations: @Service, @Controller..."
// Guardrail detects invalid JSON, retries with stricter prompt
// Second attempt returns: ["@Service", "@Controller", "@Repository", "@Component"]

Key Points

Use a faster, cheaper model (gpt-4o-mini) for validation: expect roughly 10x lower cost than gpt-4o and about 300-500 ms of extra latency per call, though both figures vary with pricing and load
Place guardrail advisors at LOWEST_PRECEDENCE so they run last on the way in (after RAG/memory) and first on the response path
PII redaction with regex is fast and deterministic — prefer it over AI-based PII detection for common formats (SSN, cards, email)
Limit retries to one — if the model produces invalid output twice, return a structured error rather than burning more tokens
Log guardrail interventions to a separate audit table — it helps identify patterns in problematic inputs and outputs over time
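
The regex approach to PII redaction can be tried in isolation. The sketch below reuses the SSN, card, and email patterns from `redactPii` and adds one step that is not in the original: a Luhn checksum, so that 16-digit values which are not plausible card numbers (order IDs, for example) are left alone. The phone pattern is omitted for brevity, and `PiiRedactor` is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PiiRedactor {

    private static final Pattern SSN = Pattern.compile("\\b\\d{3}-\\d{2}-\\d{4}\\b");
    private static final Pattern CARD =
            Pattern.compile("\\b\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}\\b");
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    private PiiRedactor() {}

    static String redact(String text) {
        String out = SSN.matcher(text).replaceAll("[SSN REDACTED]");
        // Assumption added here: redact only card candidates that pass the Luhn checksum.
        out = CARD.matcher(out).replaceAll(m ->
                passesLuhn(m.group()) ? "[CARD REDACTED]" : Matcher.quoteReplacement(m.group()));
        return EMAIL.matcher(out).replaceAll("[EMAIL REDACTED]");
    }

    /** Luhn checksum over the ASCII digits in the candidate; separators are skipped. */
    static boolean passesLuhn(String candidate) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = candidate.length() - 1; i >= 0; i--) {
            char c = candidate.charAt(i);
            if (c < '0' || c > '9') {
                continue;
            }
            int digit = c - '0';
            if (doubleIt) {
                digit *= 2;
                if (digit > 9) {
                    digit -= 9;
                }
            }
            sum += digit;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }
}
```

The checksum trades a little recall for fewer false positives. If any 16-digit number in an answer should be hidden regardless, the unconditional replacement from `redactPii` is the safer choice.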
