Java9R: Spring AI Best Practices — 20 Rules for Production-Ready AI in Java

After building with Spring AI across multiple production systems, certain patterns consistently separate working prototypes from reliable production systems. This tutorial walks through ten of those rules with code, showing the anti-pattern to avoid and the improvement you can apply today, then lists all 20 in a quick reference.

Rule 1 — Never Hardcode Model Names

// BAD — hardcoded model, requires code change to update
ChatClient client = builder
        .defaultOptions(OpenAiChatOptions.builder()
                .withModel("gpt-4o-mini")   // ← hardcoded
                .build())
        .build();

// GOOD — externalized, change via environment variable
@Value("${ai.chat.model:gpt-4o-mini}")
private String chatModel;

ChatClient client = builder
        .defaultOptions(OpenAiChatOptions.builder()
                .withModel(chatModel)       // ← from config
                .build())
        .build();

Rule 2 — Always Set Temperature Explicitly

// BAD — temperature varies by provider default (0.7 or 1.0)
chatClient.prompt().user(question).call().content();

// GOOD — set temperature per use case
// Classification / extraction: temperature=0.0 (deterministic)
// Creative writing: temperature=0.9
// General answers: temperature=0.7
ChatClient client = builder
        .defaultOptions(OpenAiChatOptions.builder()
                .withTemperature(0.0f)   // explicit
                .build())
        .build();

Rule 3 — Limit Context Window Usage

// BAD — rely on the advisor's default window, so the prompt grows with the conversation
MessageChatMemoryAdvisor.builder(memory)
        .build();   // no explicit limit on retrieved history

// GOOD — limit to last N messages
MessageChatMemoryAdvisor.builder(memory)
        .withChatMemoryRetrieveSize(10)   // last 10 messages only
        .build();

// ALSO: limit document chunks in RAG
SearchRequest.query(q).withTopK(5);  // not 50 — too many chunks dilute quality
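
To make the "last N messages" idea concrete, here is a minimal sketch of a sliding window over conversation history in plain Java. It does not use Spring AI types: `Msg` and `HistoryWindow` are hypothetical names introduced only for illustration, and a real application would normally let the memory advisor do this trimming.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a chat message; not a Spring AI type.
record Msg(String role, String text) {}

final class HistoryWindow {

    // Returns at most maxMessages of the most recent messages, oldest first.
    static List<Msg> lastN(List<Msg> history, int maxMessages) {
        if (maxMessages < 0) {
            throw new IllegalArgumentException("maxMessages must be >= 0");
        }
        int from = Math.max(0, history.size() - maxMessages);
        return new ArrayList<>(history.subList(from, history.size()));
    }

    public static void main(String[] args) {
        List<Msg> history = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            history.add(new Msg(i % 2 == 1 ? "user" : "assistant", "message " + i));
        }
        List<Msg> window = lastN(history, 10);
        System.out.println(window.size() + " messages kept, starting at: " + window.get(0).text());
    }
}
```

The prompt size now depends on the window, not on how long the conversation has been running. Older turns are simply dropped in this sketch; summarizing them instead is a possible refinement.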

Rule 4 — Handle AI Failures Gracefully

// BAD — AI failure crashes the feature
public String getRecommendation(String userId) {
    return chatClient.prompt().user(buildPrompt(userId)).call().content();
}

// GOOD — degrade gracefully, never fail the feature
public String getRecommendation(String userId) {
    try {
        return chatClient.prompt().user(buildPrompt(userId)).call().content();
    } catch (Exception e) {
        log.warn("AI recommendation failed for {}", userId, e);  // pass the exception to keep the stack trace
        return getDefaultRecommendation(userId);  // rule-based fallback
    }
}
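
Before falling back, it is often worth retrying a transient failure a few times. The sketch below shows retry with exponential backoff in plain Java, assuming the AI call is wrapped in a `Supplier`. `RetryWithBackoff` and `callWithRetry` are hypothetical names, and for brevity the sketch retries every `RuntimeException`; production code would more likely retry only transient errors such as 429 or 503 and use a library like Resilience4j or Spring Retry.

```java
import java.util.function.Supplier;

final class RetryWithBackoff {

    // Calls the supplier up to maxAttempts times, doubling the delay after each failure.
    // If every attempt fails (or the thread is interrupted), returns the fallback value.
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts,
                               long initialDelayMs, Supplier<T> fallback) {
        long delay = initialDelayMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break;                      // out of attempts, use the fallback
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    break;                      // stop retrying when interrupted
                }
                delay *= 2;                     // exponential backoff
            }
        }
        return fallback.get();
    }

    public static void main(String[] args) {
        int[] calls = {0};
        String result = callWithRetry(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated 503");
            }
            return "recommendation from model";
        }, 4, 50, () -> "default recommendation");
        System.out.println(result + " after " + calls[0] + " calls");
    }
}
```

The fallback supplier plays the same role as `getDefaultRecommendation(userId)` above: the caller always gets a usable value, and the retry loop only delays the moment at which the rule-based answer is used.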

Rule 5 — Validate Structured Output Fields

// BAD — trust AI output blindly
Invoice invoice = chatClient.prompt()
        .user("Extract invoice: " + text)
        .call()
        .entity(Invoice.class);
// invoice.totalAmount() could be null!

// GOOD — validate required fields
Invoice invoice = chatClient.prompt()
        .user("Extract invoice: " + text)
        .call()
        .entity(Invoice.class);

// the whole entity can be missing, not only individual fields
if (invoice == null || invoice.totalAmount() == null || invoice.invoiceNumber() == null) {
    throw new ExtractionException("Required fields missing from AI extraction");
}
if (invoice.totalAmount().compareTo(BigDecimal.ZERO) < 0) {
    throw new ExtractionException("Invalid negative total amount");
}
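
When several fields need checking, it can help to collect every problem in one pass, not just the first. The following is a minimal sketch under stated assumptions: the `Invoice` record here is a hypothetical two-field shape that reuses the accessor names from the snippet above, and `InvoiceValidator` is an illustrative name, not part of Spring AI.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of the extracted invoice; the real type may carry more fields.
record Invoice(String invoiceNumber, BigDecimal totalAmount) {}

final class InvoiceValidator {

    // Returns a list of problems; an empty list means the extraction is usable.
    static List<String> validate(Invoice invoice) {
        List<String> errors = new ArrayList<>();
        if (invoice == null) {
            errors.add("model returned no invoice");
            return errors;
        }
        if (invoice.invoiceNumber() == null || invoice.invoiceNumber().isBlank()) {
            errors.add("invoiceNumber is missing");
        }
        if (invoice.totalAmount() == null) {
            errors.add("totalAmount is missing");
        } else if (invoice.totalAmount().signum() < 0) {
            errors.add("totalAmount is negative");
        }
        return errors;
    }

    public static void main(String[] args) {
        Invoice suspicious = new Invoice(" ", new BigDecimal("-12.50"));
        System.out.println(validate(suspicious));
    }
}
```

A caller could throw `ExtractionException` with the joined messages when the list is non-empty, which gives the logs a full picture of what the model got wrong.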

Rule 6 — Use @Cacheable on Deterministic AI Calls

// BAD — calls AI every time for the same input
public String classifyContent(String text) {
    return chatClient.prompt()
            .user("Classify: " + text)
            .call().content();
}

// GOOD — cache identical inputs (temperature=0.0 keeps output close to deterministic)
// Key on the text itself: hashCode() values can collide and serve another input's result
@Cacheable(value = "ai-classifications", key = "#text")
public String classifyContent(String text) {
    return chatClient.prompt()
            .user("Classify: " + text)
            .call().content();
}
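
For long inputs, a digest of the text gives a compact cache key without the collision risk of `String.hashCode()` (for example, "Aa" and "BB" share the same hash code). The sketch below shows the idea with a plain in-memory map in place of Spring's cache abstraction. `ClassificationCache` is a hypothetical class, and the `Function` passed to it stands in for the real AI call.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ClassificationCache {

    private final Map<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> classifier; // stands in for the real AI call

    ClassificationCache(Function<String, String> classifier) {
        this.classifier = classifier;
    }

    // SHA-256 of the stripped text, hex-encoded: a fixed-length, 64-character key.
    static String cacheKey(String text) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(text.strip().getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Invokes the classifier only on a cache miss.
    String classify(String text) {
        return cache.computeIfAbsent(cacheKey(text), key -> classifier.apply(text));
    }
}
```

Stripping whitespace before hashing is a design choice in this sketch: it lets trivially different inputs share one entry. Any further normalization (case folding, for instance) should only be added if it cannot change the classification.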

Rule 7 — Log Token Usage in Production

// Always extract and log usage metadata
ChatResponse response = chatClient.prompt().user(q).call().chatResponse();

Usage usage = response.getMetadata().getUsage();
log.info("AI call: model={} input_tokens={} output_tokens={} total_tokens={}",
        response.getMetadata().getModel(),
        usage.getPromptTokens(),
        usage.getGenerationTokens(),
        usage.getTotalTokens());
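
Logged token counts become more useful once they are turned into an approximate cost. Here is a minimal sketch of that arithmetic. The per-million-token prices are placeholder values chosen for illustration, not real rates for any provider, and `TokenCost` is a hypothetical helper.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

final class TokenCost {

    // Prices in USD per one million tokens. Placeholder values for illustration only;
    // substitute the current rates for your provider and model.
    static final BigDecimal INPUT_PRICE_PER_MILLION = new BigDecimal("0.15");
    static final BigDecimal OUTPUT_PRICE_PER_MILLION = new BigDecimal("0.60");
    private static final BigDecimal ONE_MILLION = new BigDecimal("1000000");

    // cost = (inputTokens * inputPrice + outputTokens * outputPrice) / 1,000,000
    static BigDecimal estimateUsd(long inputTokens, long outputTokens) {
        BigDecimal input = BigDecimal.valueOf(inputTokens).multiply(INPUT_PRICE_PER_MILLION);
        BigDecimal output = BigDecimal.valueOf(outputTokens).multiply(OUTPUT_PRICE_PER_MILLION);
        return input.add(output).divide(ONE_MILLION, 6, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        System.out.println("estimated cost in USD: " + estimateUsd(1200, 300));
    }
}
```

Passing the prompt and generation token counts from the usage metadata into `estimateUsd` and adding the result to the log line gives a per-call cost figure that can later feed a counter or an alert.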

Rule 8 — Test AI Code with Mock Clients

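// GOOD — test service logic without real AI calls
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.springframework.ai.chat.client.ChatClient;

import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.when;

@ExtendWith(MockitoExtension.class)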
class ProductServiceTest {

    @Mock
    private ChatClient.Builder builderMock;

    @Mock
    private ChatClient chatClientMock;

    @Mock
    private ChatClient.ChatClientRequestSpec requestMock;

    @Mock
    private ChatClient.CallResponseSpec responseMock;

    @BeforeEach
    void setup() {
        when(builderMock.build()).thenReturn(chatClientMock);
        when(chatClientMock.prompt()).thenReturn(requestMock);
        when(requestMock.user(anyString())).thenReturn(requestMock);
        when(requestMock.system(anyString())).thenReturn(requestMock);
        when(requestMock.call()).thenReturn(responseMock);
        when(responseMock.content()).thenReturn("positive");
    }

    @Test
    void classifySentiment_returnsPositive() {
        ProductService service = new ProductService(builderMock);
        String result = service.classifySentiment("Great product!");
        assertThat(result).isEqualTo("positive");
    }
}

Rule 9 — Use Separate Models per Task Type

// BAD — one model for everything
ChatClient single = builder.defaultOptions(
        OpenAiChatOptions.builder().withModel("gpt-4o").build()).build();

// GOOD — right model for each task (cost optimization)
@Bean("fastClient")
public ChatClient fastClient(ChatClient.Builder b) {
    return b.defaultOptions(OpenAiChatOptions.builder()
            .withModel("gpt-4o-mini").build()).build();  // classification, extraction
}

@Bean("qualityClient")
public ChatClient qualityClient(ChatClient.Builder b) {
    return b.defaultOptions(OpenAiChatOptions.builder()
            .withModel("gpt-4o").build()).build();  // code review, legal analysis
}

Rule 10 — Never Store Raw API Keys in Application Properties

# BAD — literal key in the file gets committed to git
spring.ai.openai.api-key=sk-abc123...

# GOOD — environment variable reference
spring.ai.openai.api-key=${OPENAI_API_KEY}

# Better — Spring Cloud Config or Vault
spring.config.import=vault://secret/ai-service
spring.ai.openai.api-key=${vault.openai.api-key}

Quick Reference — All 20 Rules

Configuration:
  1. Externalize model names via @Value / application.properties
  2. Set temperature explicitly (0.0 for extraction, 0.7 for chat)
  3. Set max-tokens to prevent runaway costs
  4. Never commit API keys — use environment variables or Vault

Reliability:
  5. Wrap AI calls in try-catch with graceful fallbacks
  6. Add circuit breakers (Resilience4j) for provider outages
  7. Retry transient failures (429, 503) with exponential backoff
  8. Always send a done signal in streaming responses

Quality:
  9.  Validate required fields in structured output before using them
  10. Use evaluation tests (Spring AI Evaluators) for RAG quality
  11. Test with mock ChatClient — never call real AI in unit tests
  12. Add input validation before every AI call

Performance:
  13. Cache deterministic AI calls with @Cacheable
  14. Limit conversation memory to last N messages (not unlimited)
  15. Use parallel calls (ExecutorService) for independent AI tasks
  16. Choose the cheapest model that meets quality requirements

Operations:
  17. Log token usage (input + output) on every production AI call
  18. Use separate models per task (fast model for classify, strong for review)
  19. Store conversation history in PostgreSQL or Redis (not in-memory)
  20. Add cost alerts — set Micrometer threshold alerts on spend
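
Rule 15 in the list above (parallel calls for independent AI tasks) can be sketched with plain `java.util.concurrent`. This is an illustration under stated assumptions, not a Spring AI API: `ParallelAiCalls` and `runAll` are hypothetical names, and the `Function` argument stands in for whatever method performs the blocking AI call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

final class ParallelAiCalls {

    // Runs one task per input on a fixed pool and returns results in input order.
    // A task that fails, times out, or is interrupted contributes the fallback value.
    static List<String> runAll(List<String> inputs, Function<String, String> aiCall,
                               int threads, long timeoutMs, String fallback) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String input : inputs) {
                Callable<String> task = () -> aiCall.apply(input);
                futures.add(pool.submit(task));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                try {
                    // The timeout applies to each wait, not to the batch as a whole.
                    results.add(future.get(timeoutMs, TimeUnit.MILLISECONDS));
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    results.add(fallback);
                } catch (ExecutionException | TimeoutException e) {
                    future.cancel(true);
                    results.add(fallback);
                }
            }
            return results;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Creating a pool per call keeps the sketch self-contained. A long-running service would more likely share one bounded executor so that a burst of requests cannot exceed the provider's rate limits.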

Key Points

Rules 1-4 of the walkthrough (externalize, temperature, context limits, fallbacks) eliminate the most common production incidents with minimal effort
Caching (rule 13 in the Quick Reference) is often the single biggest cost reduction lever, because many AI calls ask the same questions
Mock testing (rule 11 in the Quick Reference) is critical for CI/CD velocity: real AI calls in tests can add 30+ seconds and make tests flaky
Cost alerts (rule 20 in the Quick Reference) should be set up on day one, since an infinite loop or runaway batch job can generate thousands of dollars in AI costs overnight
Start with rules 1, 5, 10, 17, and 20 — these five alone prevent the most painful production failures and surprises
