Spring AI LLM Comparison — OpenAI vs Anthropic Claude vs Google Gemini vs Ollama
Spring AI supports multiple LLM providers through a uniform API. Choosing the right model for each task is one of the most impactful decisions in an AI application: it affects quality and latency, and the per-token prices in the matrix below span more than two orders of magnitude. This tutorial compares the major providers with concrete Spring AI configuration and practical guidance.
Provider Comparison Matrix
Provider    Model               Context   Input $/1M   Output $/1M   Best For
──────────────────────────────────────────────────────────────────────────────
OpenAI      gpt-4o              128k      $2.50        $10.00        General, vision, tools
OpenAI      gpt-4o-mini         128k      $0.15        $0.60         Fast, cheap, structured
Anthropic   claude-opus-4-5     200k      $15.00       $75.00        Complex reasoning, analysis
Anthropic   claude-sonnet-4-5   200k      $3.00        $15.00        Balanced quality/cost
Anthropic   claude-haiku-4-5    200k      $0.25        $1.25         Fast classification
Google      gemini-1.5-pro      1M        $1.25        $5.00         Huge context, multimodal
Google      gemini-1.5-flash    1M        $0.075       $0.30         Very cheap, fast
Ollama      llama3.2            128k      Free         Free          Private, offline
Ollama      codellama           16k       Free         Free          Local code generation
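To see how the per-million-token prices translate into a per-call cost, here is a minimal Java sketch. `CostEstimator` is a hypothetical helper (not part of Spring AI), and the workload of 100 input tokens and 5 output tokens per call is an assumption chosen for illustration; the prices are the ones listed in the matrix above.

```java
/** Hypothetical helper: estimates request cost from per-million-token prices. */
final class CostEstimator {

    private CostEstimator() {}

    /**
     * Cost in USD of a single call.
     * The two price arguments are the "Input $/1M" and "Output $/1M" columns.
     */
    static double costPerCall(int inputTokens, int outputTokens,
                              double inputPricePerMillion, double outputPricePerMillion) {
        return (inputTokens * inputPricePerMillion
                + outputTokens * outputPricePerMillion) / 1_000_000.0;
    }

    /** Cost in USD of a number of identical calls. */
    static double costForCalls(long calls, int inputTokens, int outputTokens,
                               double inputPricePerMillion, double outputPricePerMillion) {
        return calls * costPerCall(inputTokens, outputTokens,
                inputPricePerMillion, outputPricePerMillion);
    }

    public static void main(String[] args) {
        // Assumed workload: 100 input tokens and 5 output tokens per call.
        int in = 100;
        int out = 5;
        System.out.printf("gpt-4o-mini: $%.4f per 1K calls%n",
                costForCalls(1_000, in, out, 0.15, 0.60));
        System.out.printf("gpt-4o:      $%.4f per 1K calls%n",
                costForCalls(1_000, in, out, 2.50, 10.00));
    }
}
```

Under these assumed token counts, 1,000 calls cost about $0.018 on gpt-4o-mini and about $0.30 on gpt-4o. Real costs depend on the actual prompt and response lengths.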
OpenAI Configuration
<!-- Spring AI 1.0.0-M7 and later (including 1.0 GA) name this artifact spring-ai-starter-model-openai -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

# application.properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.embedding.options.model=text-embedding-3-small
Anthropic Claude Configuration
<!-- Spring AI 1.0.0-M7 and later (including 1.0 GA) name this artifact spring-ai-starter-model-anthropic -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>

# application.properties
spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model=claude-sonnet-4-5
spring.ai.anthropic.chat.options.max-tokens=4096
Google Gemini Configuration
<!-- Spring AI 1.0.0-M7 and later (including 1.0 GA) name this artifact spring-ai-starter-model-vertex-ai-gemini -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-vertex-ai-gemini-spring-boot-starter</artifactId>
</dependency>

# application.properties
spring.ai.vertex.ai.gemini.project-id=${GOOGLE_PROJECT_ID}
spring.ai.vertex.ai.gemini.location=us-central1
spring.ai.vertex.ai.gemini.chat.options.model=gemini-1.5-flash-001
Multi-Provider Service — Smart Routing
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service
public class SmartRoutingAiService {

    private final ChatClient cheapClient;    // GPT-4o-mini or Gemini Flash
    private final ChatClient qualityClient;  // GPT-4o or Claude Sonnet
    private final ChatClient analysisClient; // Claude Opus (long context)
    private final ChatClient localClient;    // Ollama (private data)

    public SmartRoutingAiService(
            @Qualifier("cheapClient") ChatClient cheapClient,
            @Qualifier("qualityClient") ChatClient qualityClient,
            @Qualifier("analysisClient") ChatClient analysisClient,
            @Qualifier("localClient") ChatClient localClient) {
        this.cheapClient = cheapClient;
        this.qualityClient = qualityClient;
        this.analysisClient = analysisClient;
        this.localClient = localClient;
    }

    // Each method routes its task to the appropriate model tier.

    // High-volume, simple classification: cheapest model.
    public String classify(String text) {
        return cheapClient.prompt()
                .user("Classify into [SPAM, HAM, UNKNOWN]: " + text)
                .call().content();
    }

    // Code generation: balanced quality/cost model.
    public String generateCode(String requirement) {
        return qualityClient.prompt()
                .user("Write production-ready Java code: " + requirement)
                .call().content();
    }

    // Long documents: model chosen for long-context analysis.
    public String analyzeDocument(String largeDocument) {
        return analysisClient.prompt()
                .user("Analyze this document: " + largeDocument)
                .call().content();
    }

    // Sensitive data: local Ollama model, so the text never leaves the server.
    public String processPrivate(String sensitiveData) {
        return localClient.prompt()
                .user(sensitiveData)
                .call().content();
    }
}
Configuration of Multiple Providers
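import org.springframework.ai.anthropic.AnthropicChatModel;
import org.springframework.ai.anthropic.AnthropicChatOptions;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// With several model starters on the classpath, each ChatClient is built from
// its provider's auto-configured ChatModel bean. The Spring AI reference also
// advises setting spring.ai.chat.client.enabled=false in this setup so the
// single default ChatClient.Builder is not auto-configured.
// The localClient bean additionally requires the Ollama starter dependency.
@Configuration
public class MultiProviderConfig {

    @Bean("cheapClient")
    public ChatClient cheapClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o-mini")
                        .temperature(0.3)
                        .build())
                .build();
    }

    @Bean("qualityClient")
    public ChatClient qualityClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultOptions(AnthropicChatOptions.builder()
                        .model("claude-sonnet-4-5")
                        .maxTokens(4096)
                        .build())
                .build();
    }

    @Bean("analysisClient")
    public ChatClient analysisClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultOptions(AnthropicChatOptions.builder()
                        .model("claude-opus-4-5")
                        .maxTokens(8192)
                        .build())
                .build();
    }

    @Bean("localClient")
    public ChatClient localClient(OllamaChatModel ollamaChatModel) {
        return ChatClient.builder(ollamaChatModel)
                .defaultOptions(OllamaOptions.builder()
                        .model("llama3.2")
                        .build())
                .build();
    }
}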
Performance Benchmark (Typical Results)
Task: "Classify this text as SPAM or HAM: 'Congratulations! You won $1000!'"

Model            Response   Latency   Cost/1K calls
──────────────────────────────────────────────────────────
gpt-4o-mini      SPAM       0.8s      $0.02
gpt-4o           SPAM       1.2s      $0.28
claude-haiku     SPAM       0.6s      $0.01
claude-sonnet    SPAM       1.0s      $0.17
gemini-flash     SPAM       0.5s      $0.005
llama3.2 local   SPAM       1.8s      FREE

Task: "Analyze this 50,000-word legal document and identify risks"

Model                 Context Fit   Quality     Time
────────────────────────────────────────────────────
gpt-4o (128k)         Partial       Good        45s
claude-opus (200k)    Full          Excellent   62s
gemini-1.5-pro (1M)   Full          Good        38s
llama3.2 (128k)       Partial       Fair        120s (CPU)
Model Selection Guidelines
Use gpt-4o-mini / claude-haiku / gemini-flash when:
✔ Classification, extraction, simple Q&A
✔ High volume (10k+ calls/day)
✔ Budget is the primary constraint
✔ Response structure is simple JSON
Use gpt-4o / claude-sonnet when:
✔ Code generation and review
✔ Multi-step reasoning
✔ Complex instruction following
✔ Balanced cost and quality
Use claude-opus when:
✔ Long document analysis (100k+ tokens)
✔ Complex architecture decisions
✔ Highest quality matters most
Use Ollama (local) when:
✔ Data must not leave your server
✔ Offline or air-gapped environments
✔ Development and testing
✔ Cost reduction for high-volume simple tasks
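The guidelines above can be sketched as a small lookup. `ModelTierSelector`, `Task`, and `Tier` are hypothetical names introduced for illustration, and the task categories are a simplification of the bullet points. The four tiers are meant to correspond to the cheapClient, qualityClient, analysisClient, and localClient beans used by SmartRoutingAiService.

```java
/** Hypothetical sketch: maps a task category to a model tier. */
final class ModelTierSelector {

    /** One tier per ChatClient bean in the routing service. */
    enum Tier { CHEAP, QUALITY, ANALYSIS, LOCAL }

    /** Simplified task categories taken from the guidelines. */
    enum Task {
        CLASSIFICATION, EXTRACTION, SIMPLE_QA,
        CODE_GENERATION, MULTI_STEP_REASONING,
        LONG_DOCUMENT_ANALYSIS
    }

    private ModelTierSelector() {}

    static Tier select(Task task, boolean dataMustStayLocal) {
        // The privacy constraint overrides every other consideration.
        if (dataMustStayLocal) {
            return Tier.LOCAL;
        }
        return switch (task) {
            case CLASSIFICATION, EXTRACTION, SIMPLE_QA -> Tier.CHEAP;
            case CODE_GENERATION, MULTI_STEP_REASONING -> Tier.QUALITY;
            case LONG_DOCUMENT_ANALYSIS -> Tier.ANALYSIS;
        };
    }
}
```

For example, `ModelTierSelector.select(Task.CLASSIFICATION, false)` yields `Tier.CHEAP`, while any task with `dataMustStayLocal` set to `true` yields `Tier.LOCAL`. A real application would likely weigh volume and budget as well; this sketch covers only task type and data locality.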
Key Points
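- The same ChatClient code works across all providers: switching providers mainly means changing the starter dependency and application.properties, plus any provider-specific options classes such as OpenAiChatOptions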
- Route different tasks to different models: cheap models for classification, quality models for generation, local models for private data
- Gemini 1.5 Pro's 1M-token context window is the largest in this comparison (Gemini 1.5 Flash offers the same size): use it when you need to process entire codebases or book-length documents
- Claude models tend to follow complex multi-part instructions reliably, which makes them a good fit for workflows with strict output-format requirements
- A hybrid strategy (cheap model first, escalate to the quality model on low-confidence answers) can cut costs by 70% or more with minimal quality loss, provided most requests are resolved by the cheap model
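
The hybrid strategy in the last point can be sketched in plain Java as follows. `EscalatingClassifier` and `Answer` are hypothetical names, and the confidence score is an assumption: the source does not say how confidence is obtained, so this sketch simply assumes each model call returns a label together with a score in [0, 1] (for instance, by asking the model to report one). The two `Function` arguments stand in for calls to the cheap and quality ChatClient instances.

```java
import java.util.function.Function;

/** Hypothetical sketch of "cheap model first, escalate on low confidence". */
final class EscalatingClassifier {

    /** Assumed result shape: a label plus a confidence score in [0, 1]. */
    record Answer(String label, double confidence) {}

    private final Function<String, Answer> cheapModel;
    private final Function<String, Answer> qualityModel;
    private final double threshold;
    private int escalations = 0;

    EscalatingClassifier(Function<String, Answer> cheapModel,
                         Function<String, Answer> qualityModel,
                         double threshold) {
        this.cheapModel = cheapModel;
        this.qualityModel = qualityModel;
        this.threshold = threshold;
    }

    Answer classify(String text) {
        // Always try the cheap model first.
        Answer first = cheapModel.apply(text);
        if (first.confidence() >= threshold) {
            return first;
        }
        // Low confidence: pay for the quality model only on this request.
        escalations++;
        return qualityModel.apply(text);
    }

    /** Number of requests that were escalated to the quality model. */
    int escalations() {
        return escalations;
    }
}
```

The saving depends on the escalation rate: if only a small share of requests falls below the threshold, most traffic is billed at the cheap model's price. Tracking `escalations()` against the total request count is one way to check whether a chosen threshold achieves that.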