Spring AI Best Practices — 20 Rules for Production-Ready AI in Java
After building with Spring AI across multiple production systems, certain patterns consistently separate working prototypes from reliable production systems. This tutorial walks through ten of the most important rules with code, shows the anti-pattern each one replaces, and closes with a quick reference to all 20 rules you can apply today.
Rule 1 — Never Hardcode Model Names
// BAD: hardcoded model, requires a code change to update
ChatClient client = builder
        .defaultOptions(OpenAiChatOptions.builder()
                .withModel("gpt-4o-mini") // ← hardcoded
                .build())
        .build();

// GOOD: externalized, change via environment variable
@Value("${ai.chat.model:gpt-4o-mini}")
private String chatModel;

ChatClient client = builder
        .defaultOptions(OpenAiChatOptions.builder()
                .withModel(chatModel) // ← from config
                .build())
        .build();
Rule 2 — Always Set Temperature Explicitly
// BAD: temperature is left to the provider default (typically 0.7 or 1.0)
chatClient.prompt().user(question).call().content();

// GOOD: set temperature per use case
// Classification / extraction: temperature=0.0 (most repeatable)
// Creative writing: temperature=0.9
// General answers: temperature=0.7
ChatClient client = builder
        .defaultOptions(OpenAiChatOptions.builder()
                .withTemperature(0.0f) // explicit
                .build())
        .build();
Rule 3 — Limit Context Window Usage
// BAD: no explicit limit on how much conversation history is sent
MessageChatMemoryAdvisor.builder(memory)
        .build(); // falls back to the advisor's default retrieve size (100 messages in the 1.0 milestone releases; verify for your version)

// GOOD: limit to last N messages
MessageChatMemoryAdvisor.builder(memory)
        .withChatMemoryRetrieveSize(10) // last 10 messages only
        .build();

// ALSO: limit document chunks in RAG
SearchRequest.query(q).withTopK(5); // not 50: too many chunks dilute quality
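To make the "last N messages" limit concrete, here is a minimal sketch of the windowing idea in plain Java. It does not use Spring AI: `HistoryWindow` and `lastN` are hypothetical names chosen for illustration, and the element type is generic so messages can be plain strings.

```java
import java.util.List;

// Hypothetical helper, not part of Spring AI: keeps only the newest messages.
final class HistoryWindow {
    private HistoryWindow() {}

    /** Returns the last maxMessages entries of history, oldest first. */
    static <T> List<T> lastN(List<T> history, int maxMessages) {
        if (maxMessages < 0) {
            throw new IllegalArgumentException("maxMessages must be >= 0");
        }
        int from = Math.max(0, history.size() - maxMessages);
        // Copy so later changes to the full history do not leak into the prompt.
        return List.copyOf(history.subList(from, history.size()));
    }
}
```

Calling `HistoryWindow.lastN(history, 10)` before building a prompt bounds prompt size no matter how long the stored conversation grows.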
Rule 4 — Handle AI Failures Gracefully
// BAD: AI failure crashes the feature
public String getRecommendation(String userId) {
    return chatClient.prompt().user(buildPrompt(userId)).call().content();
}

// GOOD: degrade gracefully, never fail the feature
public String getRecommendation(String userId) {
    try {
        return chatClient.prompt().user(buildPrompt(userId)).call().content();
    } catch (Exception e) {
        log.warn("AI recommendation failed for {}: {}", userId, e.getMessage());
        return getDefaultRecommendation(userId); // rule-based fallback
    }
}
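A fallback pairs naturally with retries for transient failures such as 429 or 503 responses. The sketch below is plain Java under stated assumptions: `RetryWithFallback` is a hypothetical helper, and for brevity it retries on any `RuntimeException`, whereas real code would likely inspect the error and retry only the transient ones.

```java
import java.util.function.Supplier;

// Hypothetical helper: retries a call with exponential backoff, then falls back.
final class RetryWithFallback {
    private RetryWithFallback() {}

    static <T> T call(Supplier<T> aiCall, Supplier<T> fallback,
                      int maxAttempts, long initialDelayMillis) {
        long delay = initialDelayMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return aiCall.get();
            } catch (RuntimeException e) {
                if (attempt == maxAttempts) {
                    break; // out of attempts, use the fallback
                }
                try {
                    Thread.sleep(delay);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve interrupt status
                    break;
                }
                delay *= 2; // exponential backoff: d, 2d, 4d, ...
            }
        }
        return fallback.get();
    }
}
```

The caller never sees the exception: it gets either the AI result or the rule-based default, which matches the "never fail the feature" goal of this rule.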
Rule 5 — Validate Structured Output Fields
// BAD: trust AI output blindly
Invoice invoice = chatClient.prompt()
        .user("Extract invoice: " + text)
        .call()
        .entity(Invoice.class);
// invoice.totalAmount() could be null!

// GOOD: validate required fields
Invoice invoice = chatClient.prompt().user(...).call().entity(Invoice.class);
if (invoice == null || invoice.totalAmount() == null || invoice.invoiceNumber() == null) {
    throw new ExtractionException("Required fields missing from AI extraction");
}
if (invoice.totalAmount().compareTo(BigDecimal.ZERO) < 0) {
    throw new ExtractionException("Invalid negative total amount");
}
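When several fields need checking, collecting every problem in one pass makes logs and error messages more useful than failing on the first. A minimal sketch in plain Java follows. The `Invoice` record shape here is an assumption (the snippet above only shows `invoiceNumber()` and `totalAmount()`), and `InvoiceValidator` is a hypothetical name.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

// Assumed shape: the article only shows invoiceNumber() and totalAmount().
record Invoice(String invoiceNumber, BigDecimal totalAmount) {}

// Hypothetical validator that reports all problems instead of the first one.
final class InvoiceValidator {
    private InvoiceValidator() {}

    /** Returns every problem found; an empty list means the invoice is usable. */
    static List<String> validate(Invoice invoice) {
        List<String> problems = new ArrayList<>();
        if (invoice == null) {
            problems.add("no invoice returned");
            return problems;
        }
        if (invoice.invoiceNumber() == null || invoice.invoiceNumber().isBlank()) {
            problems.add("invoiceNumber is missing");
        }
        if (invoice.totalAmount() == null) {
            problems.add("totalAmount is missing");
        } else if (invoice.totalAmount().signum() < 0) {
            problems.add("totalAmount is negative");
        }
        return problems;
    }
}
```

A caller can throw when the returned list is non-empty and include the full list in the exception message.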
Rule 6 — Use @Cacheable on Deterministic AI Calls
// BAD: calls AI every time for the same input
public String classifyContent(String text) {
    return chatClient.prompt()
            .user("Classify: " + text)
            .call().content();
}

// GOOD: cache identical inputs (temperature=0.0 keeps output close to deterministic)
// Key on the text itself: hashCode() is only 32 bits, so distinct inputs can collide
// and one input could be served another input's cached classification
@Cacheable(value = "ai-classifications", key = "#text")
public String classifyContent(String text) {
    return chatClient.prompt()
            .user("Classify: " + text)
            .call().content();
}
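Cache keys deserve some care. `String.hashCode()` is a 32-bit value, so different inputs can share a hash and be served each other's cached answer. For long inputs where the raw text is an unwieldy key, a cryptographic digest is one option. The sketch below uses only the JDK, and `CacheKeys` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a fixed-length, collision-resistant cache key.
final class CacheKeys {
    private CacheKeys() {}

    /** Returns the SHA-256 digest of text as 64 lowercase hex characters. */
    static String sha256(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }
}
```

For instance, `"Aa"` and `"BB"` have the same `hashCode()` but different SHA-256 digests. With Spring's cache abstraction the helper could be referenced from the key expression through a SpEL type reference, for example `key = "T(com.example.CacheKeys).sha256(#text)"`, where the package name is an assumption.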
Rule 7 — Log Token Usage in Production
// Always extract and log usage metadata
ChatResponse response = chatClient.prompt().user(q).call().chatResponse();
Usage usage = response.getMetadata().getUsage();
log.info("AI call: model={} input_tokens={} output_tokens={} total_tokens={}",
        response.getMetadata().getModel(),
        usage.getPromptTokens(),
        usage.getGenerationTokens(),
        usage.getTotalTokens());
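Logged counts become more useful once they are aggregated. As a minimal sketch (plain Java, no Spring AI or Micrometer; `TokenBudget` and its limit are hypothetical), a thread-safe running total can flag when usage crosses a threshold:

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical accumulator: tracks total tokens and signals when a limit is passed.
final class TokenBudget {
    private final long limit;
    private final AtomicLong used = new AtomicLong();

    TokenBudget(long limit) {
        this.limit = limit;
    }

    /** Adds one call's usage and reports whether the running total now exceeds the limit. */
    boolean record(long promptTokens, long generationTokens) {
        long total = used.addAndGet(promptTokens + generationTokens);
        return total > limit;
    }

    long used() {
        return used.get();
    }
}
```

Feeding it the values from `usage.getPromptTokens()` and `usage.getGenerationTokens()` after each call gives a simple in-process guard. It has no reset or persistence, so it is a starting point and not a replacement for metrics-based alerting.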
Rule 8 — Test AI Code with Mock Clients
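// GOOD: test service logic without real AI calls
import static org.assertj.core.api.Assertions.assertThat;
import static org.mockito.ArgumentMatchers.anyString;
import static org.mockito.Mockito.lenient;
import static org.mockito.Mockito.when;

import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
import org.mockito.Mock;
import org.mockito.junit.jupiter.MockitoExtension;
import org.springframework.ai.chat.client.ChatClient;

@ExtendWith(MockitoExtension.class)
class ProductServiceTest {

    @Mock
    private ChatClient.Builder builderMock;
    @Mock
    private ChatClient chatClientMock;
    @Mock
    private ChatClient.ChatClientRequestSpec requestMock;
    @Mock
    private ChatClient.CallResponseSpec responseMock;

    @BeforeEach
    void setup() {
        // Stub the fluent chain: builder -> client -> request -> response
        when(builderMock.build()).thenReturn(chatClientMock);
        when(chatClientMock.prompt()).thenReturn(requestMock);
        when(requestMock.user(anyString())).thenReturn(requestMock);
        // lenient: strict stubbing fails tests whose code path never sets a system prompt
        lenient().when(requestMock.system(anyString())).thenReturn(requestMock);
        when(requestMock.call()).thenReturn(responseMock);
        when(responseMock.content()).thenReturn("positive");
    }

    @Test
    void classifySentiment_returnsPositive() {
        ProductService service = new ProductService(builderMock);
        String result = service.classifySentiment("Great product!");
        assertThat(result).isEqualTo("positive");
    }
}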
Rule 9 — Use Separate Models per Task Type
// BAD: one model for everything
ChatClient single = builder.defaultOptions(
        OpenAiChatOptions.builder().withModel("gpt-4o").build()).build();

// GOOD: right model for each task (cost optimization)
@Bean("fastClient")
public ChatClient fastClient(ChatClient.Builder b) {
    return b.defaultOptions(OpenAiChatOptions.builder()
            .withModel("gpt-4o-mini").build()).build(); // classification, extraction
}

@Bean("qualityClient")
public ChatClient qualityClient(ChatClient.Builder b) {
    return b.defaultOptions(OpenAiChatOptions.builder()
            .withModel("gpt-4o").build()).build(); // code review, legal analysis
}
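The mapping from task to model can also be made explicit in one place, which keeps the choice reviewable. A minimal sketch in plain Java, using the task categories named in the comments above: `TaskType` and `ModelRouter` are hypothetical names, and in line with Rule 1 the model strings would normally come from configuration instead of literals.

```java
// Hypothetical task categories, taken from the comments in the bean definitions.
enum TaskType { CLASSIFICATION, EXTRACTION, CODE_REVIEW, LEGAL_ANALYSIS }

// Hypothetical router: one place that decides which model a task gets.
final class ModelRouter {
    private ModelRouter() {}

    static String modelFor(TaskType task) {
        switch (task) {
            case CLASSIFICATION:
            case EXTRACTION:
                return "gpt-4o-mini"; // cheap and fast is enough here
            case CODE_REVIEW:
            case LEGAL_ANALYSIS:
                return "gpt-4o"; // quality matters more than cost
            default:
                throw new IllegalArgumentException("Unknown task: " + task);
        }
    }
}
```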
Rule 10 — Never Store Raw API Keys in Application Properties
# BAD: a literal key in the file ends up committed to git
spring.ai.openai.api-key=sk-abc123...
# GOOD: environment variable reference
spring.ai.openai.api-key=${OPENAI_API_KEY}
# Better: Spring Cloud Config or Vault
# (the placeholder below assumes the Vault secret contains a key named vault.openai.api-key)
spring.config.import=vault://secret/ai-service
spring.ai.openai.api-key=${vault.openai.api-key}
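Once the key lives in the environment, a missing variable is best caught at startup and not on the first AI call. A minimal sketch in plain Java, where `ApiKeyCheck` is a hypothetical helper. It takes the environment as a map so it can be tested without touching the real process environment.

```java
import java.util.Map;

// Hypothetical startup check: fails fast when a required variable is absent.
final class ApiKeyCheck {
    private ApiKeyCheck() {}

    static String requireKey(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.isBlank()) {
            // Report only the variable name, never the value.
            throw new IllegalStateException("Missing required environment variable: " + name);
        }
        return value;
    }
}
```

At application startup this could be invoked as `ApiKeyCheck.requireKey(System.getenv(), "OPENAI_API_KEY")`.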
Quick Reference — All 20 Rules
Configuration:
1. Externalize model names via @Value / application.properties
2. Set temperature explicitly (0.0 for extraction, 0.7 for chat)
3. Set max-tokens to prevent runaway costs
4. Never commit API keys — use environment variables or Vault
Reliability:
5. Wrap AI calls in try-catch with graceful fallbacks
6. Add circuit breakers (Resilience4j) for provider outages
7. Retry transient failures (429, 503) with exponential backoff
8. Always send a done signal in streaming responses
Quality:
9. Validate required fields in structured output before using them
10. Use evaluation tests (Spring AI Evaluators) for RAG quality
11. Test with mock ChatClient — never call real AI in unit tests
12. Add input validation before every AI call
Performance:
13. Cache deterministic AI calls with @Cacheable
14. Limit conversation memory to last N messages (not unlimited)
15. Use parallel calls (ExecutorService) for independent AI tasks
16. Choose the cheapest model that meets quality requirements
Operations:
17. Log token usage (input + output) on every production AI call
18. Use separate models per task (fast model for classify, strong for review)
19. Store conversation history in PostgreSQL or Redis (not in-memory)
20. Add cost alerts: publish token and spend metrics through Micrometer and alert on thresholds in your monitoring system
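Item 15 in the list above, parallel calls for independent AI tasks, can be sketched with the JDK's `ExecutorService`. This is a minimal illustration under stated assumptions: `ParallelCalls` and `mapInParallel` are hypothetical names, and the `aiCall` function stands in for whatever blocking call (for example a `chatClient` invocation) each input needs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical helper: runs one independent call per input on a fixed thread pool.
final class ParallelCalls {
    private ParallelCalls() {}

    static <I, O> List<O> mapInParallel(List<I> inputs, Function<I, O> aiCall, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<O>> tasks = new ArrayList<>();
            for (I input : inputs) {
                tasks.add(() -> aiCall.apply(input));
            }
            List<O> results = new ArrayList<>();
            // invokeAll blocks until every task finishes and returns futures in input order.
            for (Future<O> future : pool.invokeAll(tasks)) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

Creating a pool per call keeps the sketch short. A long-running service would more likely share one bounded pool so that concurrent requests cannot exceed the provider's rate limits.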
Key Points
- Rules 1-4 in the walkthrough (externalized model names, explicit temperature, context limits, fallbacks) eliminate the most common production incidents with minimal effort
- Caching (quick-reference item 13) is often the single biggest cost reduction lever, because many AI calls ask the same questions
- Mock testing (quick-reference item 11) is critical for CI/CD velocity: real AI calls can add 30+ seconds to a test run and make tests flaky
- Cost alerts (quick-reference item 20) should be set up on day one, since an infinite loop or runaway batch job can generate thousands of dollars in AI costs overnight
- Start with quick-reference items 1, 5, 10, 17, and 20: these five alone prevent the most painful production failures and surprises