Spring AI Observability — Metrics, Tracing, and Logging for Production AI Apps

AI applications in production need the same observability as any other service: latency metrics, token usage tracking, error rates, and distributed traces. Spring AI integrates with Micrometer out of the box, automatically recording metrics for every AI call. This tutorial sets up complete observability for a Spring AI application.

What Spring AI Records Automatically

Metrics (Micrometer):
  gen_ai.client.operation.duration   → Latency of each AI call (histogram)
  gen_ai.client.token.usage          → Input/output token counts (counter)

Traces (OpenTelemetry/Zipkin):
  gen_ai.chat span     → Full trace of ChatClient calls
  gen_ai.embedding span → Full trace of embedding calls
  Includes: model name, prompt (if configured), response summary

Logs:
  DEBUG org.springframework.ai.chat.client.advisor → Log prompt and response content (via SimpleLoggerAdvisor)

Maven Dependencies

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>

<!-- Micrometer metrics -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

<!-- Distributed tracing -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>

application.properties

spring.ai.openai.api-key=${OPENAI_API_KEY}

# Enable Spring AI observability
# Include prompt text in traces
spring.ai.chat.observations.include-prompt=true
# Include response text in traces
spring.ai.chat.observations.include-completion=true

# Actuator
management.endpoints.web.exposure.include=health,info,metrics,prometheus
management.metrics.distribution.percentiles-histogram.gen_ai.client.operation.duration=true

# Tracing
# 100% sampling for dev; use 0.1 for prod
management.tracing.sampling.probability=1.0
management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans
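
The settings above are tuned for development. As a minimal sketch of a production override, assuming a Spring profile named prod and a file named application-prod.properties (both names are assumptions, not part of the original setup), the sensitive and high-volume options could be turned down like this:

```properties
# application-prod.properties (assumed profile name: prod)
# Keep prompt and response text out of traces
spring.ai.chat.observations.include-prompt=false
spring.ai.chat.observations.include-completion=false
# Sample 10% of requests instead of all of them
management.tracing.sampling.probability=0.1
```

Each comment sits on its own line because the .properties format treats a trailing `#` as part of the value.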

Custom Metrics — Token Usage Dashboard

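import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.stereotype.Service;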

@Service
public class ObservableAiService {

    private final ChatClient  chatClient;
    private final Counter     totalCallsCounter;
    private final Counter     errorCounter;
    private final Timer       latencyTimer;
    private final MeterRegistry registry;

    public ObservableAiService(ChatClient.Builder builder, MeterRegistry registry) {
        this.chatClient = builder.build();
        this.registry   = registry;

        this.totalCallsCounter = Counter.builder("app.ai.calls.total")
                .description("Total AI chat calls made")
                .register(registry);

        this.errorCounter = Counter.builder("app.ai.calls.errors")
                .description("AI call errors")
                .register(registry);

        this.latencyTimer = Timer.builder("app.ai.latency")
                .description("AI call latency")
                .publishPercentiles(0.5, 0.95, 0.99)
                .register(registry);
    }

    public String ask(String question, String userId) {
        totalCallsCounter.increment();

        return latencyTimer.record(() -> {
            try {
                ChatResponse response = chatClient.prompt()
                        .user(question)
                        .call()
                        .chatResponse();

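                // Record token usage with a user tag. Each distinct userId creates
                // its own time series, so keep the set of tagged users bounded.
                // Newer Spring AI releases rename getGenerationTokens() to
                // getCompletionTokens() and getContent() to getText().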
                long inputTokens  = response.getMetadata().getUsage().getPromptTokens();
                long outputTokens = response.getMetadata().getUsage().getGenerationTokens();

                registry.counter("app.ai.tokens.input",
                        "user", userId).increment(inputTokens);
                registry.counter("app.ai.tokens.output",
                        "user", userId).increment(outputTokens);

                return response.getResult().getOutput().getContent();

            } catch (Exception e) {
                errorCounter.increment();
                throw e;
            }
        });
    }
}
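
The per-user token counters above can also drive a usage limit. The following is a minimal sketch in plain Java, not part of Spring AI or Micrometer: the TokenBudget class, its method names, and the fixed per-user budget are all hypothetical, and a real deployment would likely need persistence and a reset window.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical helper: tracks tokens consumed per user against a fixed budget. */
final class TokenBudget {

    private final long maxTokensPerUser;
    private final Map<String, AtomicLong> usedByUser = new ConcurrentHashMap<>();

    TokenBudget(long maxTokensPerUser) {
        this.maxTokensPerUser = maxTokensPerUser;
    }

    /** Adds the tokens of one completed call and returns the user's new total. */
    long record(String userId, long inputTokens, long outputTokens) {
        return usedByUser
                .computeIfAbsent(userId, id -> new AtomicLong())
                .addAndGet(inputTokens + outputTokens);
    }

    /** Tokens the user may still spend; never negative. */
    long remaining(String userId) {
        AtomicLong used = usedByUser.get(userId);
        long spent = (used == null) ? 0L : used.get();
        return Math.max(0L, maxTokensPerUser - spent);
    }

    /** True while the user has budget left; check this before calling the model. */
    boolean allows(String userId) {
        return remaining(userId) > 0;
    }
}

class TokenBudgetDemo {
    public static void main(String[] args) {
        TokenBudget budget = new TokenBudget(1_000);

        budget.record("alice", 15, 87);
        System.out.println(budget.remaining("alice")); // 898
        System.out.println(budget.allows("alice"));    // true

        budget.record("alice", 400, 600);
        System.out.println(budget.allows("alice"));    // false
    }
}
```

In ObservableAiService, a call to budget.allows(userId) could guard the start of ask(), and budget.record(userId, inputTokens, outputTokens) could sit next to the two registry.counter(...) lines.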

Observability Advisor — Log Every AI Call

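import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.stereotype.Service;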

@Service
public class LoggedAiService {

    private final ChatClient chatClient;

    public LoggedAiService(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultAdvisors(new SimpleLoggerAdvisor())  // logs every call
                .build();
    }

    public String ask(String question) {
        return chatClient.prompt().user(question).call().content();
    }
}

# Example log output (DEBUG level; exact format varies by version):
2025-06-12 AI request  : {model=gpt-4o-mini, messages=[{role=user, content="What is RAG?"}]}
2025-06-12 AI response : {content="RAG is Retrieval Augmented Generation...", tokens={prompt=15, completion=87}}
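
SimpleLoggerAdvisor writes at DEBUG level, so nothing appears in the log until that level is enabled for the advisor package. A minimal sketch for application.properties, using the package that contains SimpleLoggerAdvisor:

```properties
# Enable DEBUG output for Spring AI chat client advisors
logging.level.org.springframework.ai.chat.client.advisor=DEBUG
```

Since the logged prompts and responses may contain user data, this setting is best limited to development profiles.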

Prometheus Metrics Endpoint

GET http://localhost:8080/actuator/prometheus

# Spring AI auto-recorded metrics:
gen_ai_client_operation_duration_seconds_bucket{gen_ai_operation_name="chat",le="0.5"} 12
gen_ai_client_operation_duration_seconds_bucket{gen_ai_operation_name="chat",le="1.0"} 18
gen_ai_client_token_usage_total{gen_ai_token_type="input"}  2450
gen_ai_client_token_usage_total{gen_ai_token_type="output"} 8120

# Custom app metrics:
app_ai_calls_total                  42.0
app_ai_calls_errors_total           2.0
app_ai_latency_seconds{quantile="0.5"}   0.847
app_ai_latency_seconds{quantile="0.95"}  2.341
app_ai_latency_seconds{quantile="0.99"}  4.102

Health Check for AI Provider

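import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;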

@Component
public class AiProviderHealthIndicator implements HealthIndicator {

    private final ChatClient chatClient;

    public AiProviderHealthIndicator(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @Override
    public Health health() {
        try {
            String response = chatClient.prompt()
                    .user("Reply with exactly: OK")
                    .call()
                    .content();

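            // DEGRADED is a custom status code; Spring Boot ships only
            // UP, DOWN, OUT_OF_SERVICE, and UNKNOWN
            return response.contains("OK")
                    ? Health.up().withDetail("provider", "OpenAI").build()
                    : Health.status("DEGRADED").withDetail("unexpected_response", response).build();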

        } catch (Exception e) {
            return Health.down()
                    .withDetail("error", e.getMessage())
                    .build();
        }
    }
}
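
Every call to health() in the indicator above sends a real, billable request to the provider, and health endpoints tend to be polled often. One option is to cache the result for a short period. The sketch below is plain Java and independent of Spring; the CachedCheck class, its constructor parameters, and the 60-second lifetime are hypothetical choices for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical helper: caches the result of an expensive check for a fixed time. */
final class CachedCheck<T> {

    private final Supplier<T> delegate;
    private final long ttlNanos;
    private final LongSupplier nanoClock;

    private T cached;
    private long fetchedAtNanos;
    private boolean hasValue;

    CachedCheck(Supplier<T> delegate, long ttlNanos, LongSupplier nanoClock) {
        this.delegate = delegate;
        this.ttlNanos = ttlNanos;
        this.nanoClock = nanoClock;
    }

    /** Returns the cached value, refreshing it once the time-to-live has elapsed. */
    synchronized T get() {
        long now = nanoClock.getAsLong();
        if (!hasValue || now - fetchedAtNanos >= ttlNanos) {
            // If the delegate throws, nothing is cached and the next call retries.
            cached = delegate.get();
            fetchedAtNanos = now;
            hasValue = true;
        }
        return cached;
    }
}

class CachedCheckDemo {
    public static void main(String[] args) {
        AtomicInteger providerCalls = new AtomicInteger();
        AtomicLong fakeClock = new AtomicLong(0); // stands in for System.nanoTime()

        CachedCheck<String> check = new CachedCheck<>(
                () -> "UP #" + providerCalls.incrementAndGet(),
                60_000_000_000L, // 60 seconds in nanoseconds
                fakeClock::get);

        System.out.println(check.get()); // UP #1
        System.out.println(check.get()); // UP #1 (served from cache)

        fakeClock.set(60_000_000_000L);
        System.out.println(check.get()); // UP #2 (cache expired)
    }
}
```

In the indicator, the body of health() could move into a private method and be wrapped as new CachedCheck<>(this::pingProvider, ttlNanos, System::nanoTime), where pingProvider is a hypothetical name for that extracted method.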

Key Points

  • Spring AI records gen_ai.client.operation.duration and gen_ai.client.token.usage automatically — no code needed
  • Set spring.ai.chat.observations.include-prompt=true only in development — prompt content is sensitive data in production
  • Use SimpleLoggerAdvisor during development to see exact prompts and responses in the log
  • Track token usage per user to implement usage limits and cost allocation in multi-tenant applications
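  • A custom HealthIndicator that pings the AI provider surfaces provider outages on /actuator/health. Each check is a billable model call, and restarting a pod does not fix a provider outage, so wire it into a Kubernetes readiness probe or an alert rather than a liveness probe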