Spring AI LLM Comparison — OpenAI vs Anthropic Claude vs Google Gemini vs Ollama

Spring AI supports multiple LLM providers through a uniform API. Choosing the right model for each task is one of the most impactful decisions in an AI application: it affects quality and latency, and it can change cost by two orders of magnitude (the matrix below ranges from $0.075 to $15.00 per million input tokens). This tutorial compares the major providers with concrete Spring AI configuration and practical guidance.

Provider Comparison Matrix

Provider     Model               Context   Input $/1M  Output $/1M  Best For
──────────────────────────────────────────────────────────────────────────────
OpenAI       gpt-4o              128k      $2.50       $10.00       General, vision, tools
OpenAI       gpt-4o-mini         128k      $0.15       $0.60        Fast, cheap, structured
Anthropic    claude-opus-4-5     200k      $15.00      $75.00       Complex reasoning, analysis
Anthropic    claude-sonnet-4-5   200k      $3.00       $15.00       Balanced quality/cost
Anthropic    claude-haiku-4-5    200k      $0.25       $1.25        Fast classification
Google       gemini-1.5-pro      1M        $1.25       $5.00        Huge context, multimodal
Google       gemini-1.5-flash    1M        $0.075      $0.30        Very cheap, fast
Ollama       llama3.2            128k      Free        Free         Private, offline
Ollama       codellama           16k       Free        Free         Local code generation
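
To see how the per-million-token prices in the matrix translate into per-call cost, here is a minimal sketch. The class name `TokenCost` and the token counts in `main` are illustrative assumptions, not measurements; only the prices come from the matrix.

```java
public class TokenCost {

    /** Cost in USD of one call, given token counts and prices per 1M tokens. */
    static double costPerCall(long inputTokens, long outputTokens,
                              double inputPricePerMillion, double outputPricePerMillion) {
        return (inputTokens * inputPricePerMillion
                + outputTokens * outputPricePerMillion) / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Hypothetical workload: 1,000 input tokens and 200 output tokens per call.
        long in = 1_000;
        long out = 200;

        double mini = costPerCall(in, out, 0.15, 0.60);    // gpt-4o-mini prices from the matrix
        double opus = costPerCall(in, out, 15.00, 75.00);  // claude-opus-4-5 prices from the matrix

        System.out.printf("gpt-4o-mini: $%.6f per call%n", mini);
        System.out.printf("claude-opus: $%.6f per call%n", opus);
        System.out.printf("ratio: %.0fx%n", opus / mini);
    }
}
```

For this assumed workload the two models differ by a factor of roughly 100, which is why routing by task matters more than tuning any single prompt.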

OpenAI Configuration

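<!-- pom.xml: version managed by the spring-ai-bom.
     Spring AI 1.0+ renames this artifact to spring-ai-starter-model-openai. -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

# application.properties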
spring.ai.openai.api-key=${OPENAI_API_KEY}
spring.ai.openai.chat.options.model=gpt-4o-mini
spring.ai.openai.chat.options.temperature=0.7
spring.ai.openai.embedding.options.model=text-embedding-3-small

Anthropic Claude Configuration

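<!-- pom.xml: version managed by the spring-ai-bom.
     Spring AI 1.0+ renames this artifact to spring-ai-starter-model-anthropic. -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-anthropic-spring-boot-starter</artifactId>
</dependency>

# application.properties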
spring.ai.anthropic.api-key=${ANTHROPIC_API_KEY}
spring.ai.anthropic.chat.options.model=claude-sonnet-4-5
spring.ai.anthropic.chat.options.max-tokens=4096

Google Gemini Configuration

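<!-- pom.xml: version managed by the spring-ai-bom.
     Spring AI 1.0+ renames this artifact to spring-ai-starter-model-vertex-ai-gemini. -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-vertex-ai-gemini-spring-boot-starter</artifactId>
</dependency>

# application.properties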
spring.ai.vertex.ai.gemini.project-id=${GOOGLE_PROJECT_ID}
spring.ai.vertex.ai.gemini.location=us-central1
spring.ai.vertex.ai.gemini.chat.options.model=gemini-1.5-flash-001

Multi-Provider Service — Smart Routing

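import org.springframework.ai.chat.client.ChatClient;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;

@Service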
public class SmartRoutingAiService {

    private final ChatClient cheapClient;       // GPT-4o-mini or Gemini Flash
    private final ChatClient qualityClient;     // GPT-4o or Claude Sonnet
    private final ChatClient analysisClient;    // Claude Opus (long context)
    private final ChatClient localClient;       // Ollama (private data)

    public SmartRoutingAiService(
            @Qualifier("cheapClient")    ChatClient cheapClient,
            @Qualifier("qualityClient")  ChatClient qualityClient,
            @Qualifier("analysisClient") ChatClient analysisClient,
            @Qualifier("localClient")    ChatClient localClient) {
        this.cheapClient    = cheapClient;
        this.qualityClient  = qualityClient;
        this.analysisClient = analysisClient;
        this.localClient    = localClient;
    }

    // Route to appropriate model based on task
    public String classify(String text) {
        return cheapClient.prompt()
                .user("Classify into [SPAM, HAM, UNKNOWN]: " + text)
                .call().content();
    }

    public String generateCode(String requirement) {
        return qualityClient.prompt()
                .user("Write production-ready Java code: " + requirement)
                .call().content();
    }

    public String analyzeDocument(String largeDocument) {
        return analysisClient.prompt()
                .user("Analyze this document: " + largeDocument)
                .call().content();
    }

    public String processPrivate(String sensitiveData) {
        return localClient.prompt()
                .user(sensitiveData)
                .call().content();
    }
}
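
The `classify` method returns the model's raw text, which may arrive with extra whitespace, different casing, or trailing punctuation. One way to normalize that reply into a fixed set of labels is sketched below. The `LabelParser` class and `Label` enum are hypothetical names for illustration, not part of Spring AI.

```java
import java.util.Locale;

public class LabelParser {

    enum Label { SPAM, HAM, UNKNOWN }

    /** Maps a raw model reply to a Label; anything unrecognized becomes UNKNOWN. */
    static Label parse(String raw) {
        if (raw == null) {
            return Label.UNKNOWN;
        }
        // Keep letters only, so replies like " Spam.\n" or "**HAM**" still match.
        String cleaned = raw.replaceAll("[^A-Za-z]", "").toUpperCase(Locale.ROOT);
        switch (cleaned) {
            case "SPAM":
                return Label.SPAM;
            case "HAM":
                return Label.HAM;
            default:
                // Free-form sentences are deliberately not guessed at.
                return Label.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        System.out.println(parse(" Spam.\n"));               // SPAM
        System.out.println(parse("**HAM**"));                // HAM
        System.out.println(parse("I think this is spam"));   // UNKNOWN
    }
}
```

The parser is strict on purpose: a reply that is not exactly one label is treated as UNKNOWN, which gives the caller a clear signal to retry or escalate.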

Configuration of Multiple Providers

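import org.springframework.ai.anthropic.AnthropicChatModel;
import org.springframework.ai.anthropic.AnthropicChatOptions;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.ollama.OllamaChatModel;
import org.springframework.ai.ollama.api.OllamaOptions;
import org.springframework.ai.openai.OpenAiChatModel;
import org.springframework.ai.openai.OpenAiChatOptions;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// With several model starters on the classpath, the auto-configured ChatClient.Builder
// is ambiguous. Set spring.ai.chat.client.enabled=false and build each client from its
// provider's ChatModel bean instead. localClient also needs the Ollama starter.
@Configuration
public class MultiProviderConfig {

    // Cheap tier: low temperature for stable classification output
    @Bean("cheapClient")
    public ChatClient cheapClient(OpenAiChatModel openAiChatModel) {
        return ChatClient.builder(openAiChatModel)
                .defaultOptions(OpenAiChatOptions.builder()
                        .model("gpt-4o-mini")
                        .temperature(0.3)
                        .build())
                .build();
    }

    // Quality tier: code generation and multi-step reasoning
    @Bean("qualityClient")
    public ChatClient qualityClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultOptions(AnthropicChatOptions.builder()
                        .model("claude-sonnet-4-5")
                        .maxTokens(4096)
                        .build())
                .build();
    }

    // Analysis tier: same provider bean, different default model and output budget
    @Bean("analysisClient")
    public ChatClient analysisClient(AnthropicChatModel anthropicChatModel) {
        return ChatClient.builder(anthropicChatModel)
                .defaultOptions(AnthropicChatOptions.builder()
                        .model("claude-opus-4-5")
                        .maxTokens(8192)
                        .build())
                .build();
    }

    // Local tier: data never leaves the machine running Ollama
    @Bean("localClient")
    public ChatClient localClient(OllamaChatModel ollamaChatModel) {
        return ChatClient.builder(ollamaChatModel)
                .defaultOptions(OllamaOptions.builder()
                        .model("llama3.2")
                        .build())
                .build();
    }
}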

Performance Benchmark (Typical Results)

Task: "Classify this text as SPAM or HAM: 'Congratulations! You won $1000!'"

Model           Response     Latency    Cost/1K calls
──────────────────────────────────────────────────────────
gpt-4o-mini     SPAM         0.8s       $0.02
gpt-4o          SPAM         1.2s       $0.28
claude-haiku    SPAM         0.6s       $0.01
claude-sonnet   SPAM         1.0s       $0.17
gemini-flash    SPAM         0.5s       $0.005
llama3.2 local  SPAM         1.8s       FREE

Task: "Analyze this 50,000-word legal document and identify risks"

Model                 Context Fit    Quality    Time
───────────────────────────────────────────────────────
gpt-4o (128k)         Partial        Good       45s
claude-opus (200k)    Full           Excellent  62s
gemini-1.5-pro (1M)   Full           Good       38s
llama3.2 (128k)       Partial        Fair       120s (CPU)

Model Selection Guidelines

Use gpt-4o-mini / claude-haiku / gemini-flash when:
  ✔ Classification, extraction, simple Q&A
  ✔ High volume (10k+ calls/day)
  ✔ Budget is the primary constraint
  ✔ Response structure is simple JSON

Use gpt-4o / claude-sonnet when:
  ✔ Code generation and review
  ✔ Multi-step reasoning
  ✔ Complex instruction following
  ✔ Balanced cost and quality

Use claude-opus when:
  ✔ Long document analysis (100k+ tokens)
  ✔ Complex architecture decisions
  ✔ Highest quality matters most

Use Ollama (local) when:
  ✔ Data must not leave your server
  ✔ Offline or air-gapped environments
  ✔ Development and testing
  ✔ Cost reduction for high-volume simple tasks
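
The guidelines above can be encoded as a small decision function. This is a sketch under stated assumptions: `ModelSelector`, `Task`, and `Tier` are hypothetical names, the tiers correspond to the four clients in `SmartRoutingAiService`, and the 100,000-token cut-off is taken from the "100k+ tokens" guideline.

```java
public class ModelSelector {

    enum Task { CLASSIFICATION, EXTRACTION, CODE_GENERATION, REASONING, DOCUMENT_ANALYSIS }

    enum Tier { CHEAP, QUALITY, ANALYSIS, LOCAL }

    /**
     * Picks a tier following the guidelines: private data stays local,
     * long documents go to the analysis tier, simple tasks go to the cheap tier,
     * and everything else goes to the quality tier.
     */
    static Tier select(Task task, boolean privateData, int estimatedTokens) {
        if (privateData) {
            return Tier.LOCAL;            // data must not leave your server
        }
        if (task == Task.DOCUMENT_ANALYSIS || estimatedTokens >= 100_000) {
            return Tier.ANALYSIS;         // long document analysis
        }
        switch (task) {
            case CLASSIFICATION:
            case EXTRACTION:
                return Tier.CHEAP;        // high volume, simple output
            default:
                return Tier.QUALITY;      // code generation, multi-step reasoning
        }
    }

    public static void main(String[] args) {
        System.out.println(select(Task.CLASSIFICATION, false, 200));      // CHEAP
        System.out.println(select(Task.CODE_GENERATION, false, 3_000));   // QUALITY
        System.out.println(select(Task.REASONING, false, 150_000));       // ANALYSIS
        System.out.println(select(Task.CLASSIFICATION, true, 200));       // LOCAL
    }
}
```

The privacy check comes first so that no other rule can send sensitive data to a hosted provider.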

Key Points

  • The same ChatClient calling code works across all providers; only the starter dependency, application.properties, and any provider-specific options classes (such as OpenAiChatOptions) change
  • Route different tasks to different models: cheap models for classification, quality models for generation, local models for private data
  • Gemini 1.5 Pro and Flash offer a 1M-token context window, the largest in this comparison; use them when you need to process entire codebases or book-length documents
  • Claude models follow complex multi-part instructions most reliably — use them for workflows with strict output format requirements
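  • A hybrid strategy (cheap model first, escalate to a quality model on low-confidence answers) can cut costs substantially: with the classification costs above ($0.02 vs $0.28 per 1K calls), escalating 20% of calls still saves about 73% compared with sending everything to the quality model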
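
The hybrid strategy in the last point can be sketched in plain Java. The `Function` parameters stand in for calls to a cheap and a quality `ChatClient`. `HybridRouter`, `Scored`, and the confidence score itself are assumptions for illustration: a chat reply does not carry a confidence value by default, so it would have to come from somewhere else, for example by asking the model to self-report one.

```java
import java.util.function.Function;

public class HybridRouter {

    /** A reply plus a confidence score in [0, 1]; how the score is obtained is left open. */
    static final class Scored {
        final String answer;
        final double confidence;

        Scored(String answer, double confidence) {
            this.answer = answer;
            this.confidence = confidence;
        }
    }

    private final Function<String, Scored> cheapModel;
    private final Function<String, String> qualityModel;
    private final double threshold;

    HybridRouter(Function<String, Scored> cheapModel,
                 Function<String, String> qualityModel,
                 double threshold) {
        this.cheapModel = cheapModel;
        this.qualityModel = qualityModel;
        this.threshold = threshold;
    }

    /** Tries the cheap model first and escalates only when its confidence is below the threshold. */
    String ask(String prompt) {
        Scored first = cheapModel.apply(prompt);
        return first.confidence >= threshold ? first.answer : qualityModel.apply(prompt);
    }

    /** Fraction of cost saved versus sending every call to the quality model. */
    static double savings(double cheapCost, double qualityCost, double escalationRate) {
        // Every call pays for the cheap model; escalated calls also pay for the quality model.
        double blended = cheapCost + escalationRate * qualityCost;
        return 1.0 - blended / qualityCost;
    }

    public static void main(String[] args) {
        HybridRouter router = new HybridRouter(
                p -> new Scored("SPAM", p.contains("won") ? 0.95 : 0.40), // stub cheap model
                p -> "HAM",                                               // stub quality model
                0.80);

        System.out.println(router.ask("You won $1000!"));  // confident: cheap answer is used
        System.out.println(router.ask("Lunch at noon?"));  // not confident: escalated
        System.out.println(savings(0.02, 0.28, 0.10));     // per-1K-call costs from the benchmark
    }
}
```

With the benchmark's per-1K-call costs, the saving is roughly one minus the cost ratio minus the escalation rate, so the achievable figure depends mainly on how often the cheap model has to escalate.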