Spring AI Observability — Metrics, Tracing, and Logging for Production AI Apps
AI applications in production need the same observability as any other service: latency metrics, token usage tracking, error rates, and distributed traces. Spring AI integrates with Micrometer out of the box, automatically recording metrics for every AI call. This tutorial sets up complete observability for a Spring AI application.
What Spring AI Records Automatically
Metrics (Micrometer):
gen_ai.client.operation.duration → Latency of each AI call (histogram)
gen_ai.client.token.usage → Input/output token counts (counter)
Traces (OpenTelemetry/Zipkin):
gen_ai.chat span → Full trace of ChatClient calls
gen_ai.embedding span → Full trace of embedding calls
Includes: model name, prompt (if configured), response summary
Logs:
DEBUG org.springframework.ai.chat.client.advisor → Log prompt and response content (when SimpleLoggerAdvisor is registered)
Maven Dependencies
<!-- Spring AI OpenAI starter (named spring-ai-starter-model-openai in newer Spring AI releases) -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
<!-- Micrometer metrics -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>
<!-- Distributed tracing -->
<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-tracing-bridge-otel</artifactId>
</dependency>
<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-exporter-zipkin</artifactId>
</dependency>
application.properties
spring.ai.openai.api-key=${OPENAI_API_KEY}
# Enable Spring AI observability
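# Include prompt content in traces (.properties files do not support trailing comments, so each comment gets its own line)
spring.ai.chat.observations.include-prompt=true
# Include response content in traces
spring.ai.chat.observations.include-completion=true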
# Actuator
management.endpoints.web.exposure.include=health,info,metrics,prometheus
management.metrics.distribution.percentiles-histogram.gen_ai.client.operation.duration=true
# Tracing
# 100% sampling for dev; use a lower value such as 0.1 for prod
management.tracing.sampling.probability=1.0
management.zipkin.tracing.endpoint=http://localhost:9411/api/v2/spans
Custom Metrics — Token Usage Dashboard
import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.stereotype.Service;

@Service
public class ObservableAiService {

    private final ChatClient chatClient;
    private final Counter totalCallsCounter;
    private final Counter errorCounter;
    private final Timer latencyTimer;
    private final MeterRegistry registry;

    public ObservableAiService(ChatClient.Builder builder, MeterRegistry registry) {
        this.chatClient = builder.build();
        this.registry = registry;
        this.totalCallsCounter = Counter.builder("app.ai.calls.total")
            .description("Total AI chat calls made")
            .register(registry);
        this.errorCounter = Counter.builder("app.ai.calls.errors")
            .description("AI call errors")
            .register(registry);
        this.latencyTimer = Timer.builder("app.ai.latency")
            .description("AI call latency")
            .publishPercentiles(0.5, 0.95, 0.99)
            .register(registry);
    }

    public String ask(String question, String userId) {
        totalCallsCounter.increment();
        // Timer.record(Supplier) times the call and returns its result
        return latencyTimer.record(() -> {
            try {
                ChatResponse response = chatClient.prompt()
                    .user(question)
                    .call()
                    .chatResponse();
                // Record token usage with a user tag.
                // Accessor names vary by Spring AI version (newer releases use
                // getCompletionTokens() and getText() instead).
                long inputTokens = response.getMetadata().getUsage().getPromptTokens();
                long outputTokens = response.getMetadata().getUsage().getGenerationTokens();
                registry.counter("app.ai.tokens.input",
                    "user", userId).increment(inputTokens);
                registry.counter("app.ai.tokens.output",
                    "user", userId).increment(outputTokens);
                return response.getResult().getOutput().getContent();
            } catch (RuntimeException e) {
                // Count the failure, then let the caller handle it
                errorCounter.increment();
                throw e;
            }
        });
    }
}
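The per-user token counters above are the raw input for usage limits. As a minimal, framework-free sketch of that idea, the hypothetical `TokenBudget` class below (not part of Spring AI or Micrometer) keeps a running total per user and compares it against a fixed limit. The class name, the single global limit, and the in-memory map are all assumptions for illustration; a real deployment would likely persist totals and reset them per billing period.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper for illustration; not a Spring AI or Micrometer class.
final class TokenBudget {

    private final long limitPerUser;
    private final Map<String, AtomicLong> used = new ConcurrentHashMap<>();

    TokenBudget(long limitPerUser) {
        this.limitPerUser = limitPerUser;
    }

    // Adds the tokens consumed by one call and returns the user's new running total.
    long record(String userId, long inputTokens, long outputTokens) {
        return used.computeIfAbsent(userId, id -> new AtomicLong())
                   .addAndGet(inputTokens + outputTokens);
    }

    // True while the user's total is still below the limit.
    boolean hasBudget(String userId) {
        AtomicLong total = used.get(userId);
        return total == null || total.get() < limitPerUser;
    }

    // Tokens left before the limit is reached, never negative.
    long remaining(String userId) {
        AtomicLong total = used.get(userId);
        long spent = (total == null) ? 0L : total.get();
        return Math.max(0L, limitPerUser - spent);
    }
}
```

In a service like `ObservableAiService`, one plausible wiring is to call `hasBudget(userId)` before the model call and `record(userId, inputTokens, outputTokens)` right where the token counters are incremented.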
Observability Advisor — Log Every AI Call
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.SimpleLoggerAdvisor;
import org.springframework.stereotype.Service;

@Service
public class LoggedAiService {

    private final ChatClient chatClient;

    public LoggedAiService(ChatClient.Builder builder) {
        this.chatClient = builder
            .defaultAdvisors(new SimpleLoggerAdvisor()) // logs every call at DEBUG level
            .build();
    }

    public String ask(String question) {
        return chatClient.prompt().user(question).call().content();
    }
}
# Log output (requires logging.level.org.springframework.ai.chat.client.advisor=DEBUG; format simplified):
2025-06-12 AI request : {model=gpt-4o-mini, messages=[{role=user, content="What is RAG?"}]}
2025-06-12 AI response : {content="RAG is Retrieval Augmented Generation...", tokens={prompt=15, completion=87}}
Prometheus Metrics Endpoint
GET http://localhost:8080/actuator/prometheus
# Spring AI auto-recorded metrics:
gen_ai_client_operation_duration_seconds_bucket{gen_ai_operation_name="chat",le="0.5"} 12
gen_ai_client_operation_duration_seconds_bucket{gen_ai_operation_name="chat",le="1.0"} 18
gen_ai_client_token_usage_total{gen_ai_token_type="input"} 2450
gen_ai_client_token_usage_total{gen_ai_token_type="output"} 8120
# Custom app metrics:
app_ai_calls_total 42.0
app_ai_calls_errors_total 2.0
app_ai_latency_seconds{quantile="0.5"} 0.847
app_ai_latency_seconds{quantile="0.95"} 2.341
app_ai_latency_seconds{quantile="0.99"} 4.102
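The names on the Prometheus endpoint differ from the names used in code: dots become underscores, counters end in `_total`, and timers carry a `_seconds` unit. The sketch below is a simplified approximation of that mapping, written to make the pattern easy to predict when building dashboards. It is not Micrometer's actual naming convention implementation, and the `PrometheusNames` class and `MeterType` enum are hypothetical names introduced for this example.

```java
// Hypothetical helper for illustration; approximates, but does not reproduce,
// the naming rules applied by the Micrometer Prometheus registry.
final class PrometheusNames {

    enum MeterType { COUNTER, TIMER, GAUGE }

    static String toPrometheusName(String meterName, MeterType type) {
        // Replace dots and any other character Prometheus does not allow with underscores
        String name = meterName.replaceAll("[^a-zA-Z0-9_:]", "_");
        switch (type) {
            case COUNTER:
                // Counters end in _total; avoid doubling the suffix if it is already there
                return name.endsWith("_total") ? name : name + "_total";
            case TIMER:
                // Timers are reported in seconds
                return name.endsWith("_seconds") ? name : name + "_seconds";
            default:
                return name;
        }
    }
}
```

For example, the counter registered as `app.ai.calls.errors` maps to `app_ai_calls_errors_total`, while `app.ai.calls.total` stays `app_ai_calls_total` because the suffix is already present.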
Health Check for AI Provider
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class AiProviderHealthIndicator implements HealthIndicator {

    private final ChatClient chatClient;

    public AiProviderHealthIndicator(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @Override
    public Health health() {
        try {
            // Note: every health check is a real model call and consumes tokens
            String response = chatClient.prompt()
                .user("Reply with exactly: OK")
                .call()
                .content();
            // Health has no degraded() factory method, so use a custom status code
            return response != null && response.contains("OK")
                ? Health.up().withDetail("provider", "OpenAI").build()
                : Health.status("DEGRADED")
                    .withDetail("unexpected_response", String.valueOf(response))
                    .build();
        } catch (Exception e) {
            return Health.down()
                .withDetail("error", String.valueOf(e.getMessage()))
                .build();
        }
    }
}
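Because the health indicator above sends a real request to the provider on every check, frequent probes can add latency and token cost. One common mitigation is to cache the last result for a short time. The sketch below shows that idea with plain JDK types. The `CachedProbe` class, its TTL parameter, and the injectable clock are hypothetical and not part of Spring Boot, and wiring it into `health()` is left as an assumption.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;
import java.util.function.LongSupplier;

// Hypothetical helper for illustration; not a Spring Boot class.
final class CachedProbe {

    private final BooleanSupplier probe;
    private final long ttlNanos;
    private final LongSupplier nanoClock;

    private boolean hasResult;
    private boolean lastResult;
    private long lastCheckedNanos;

    // The clock is injectable so the caching behavior can be tested deterministically.
    CachedProbe(BooleanSupplier probe, Duration ttl, LongSupplier nanoClock) {
        this.probe = probe;
        this.ttlNanos = ttl.toNanos();
        this.nanoClock = nanoClock;
    }

    CachedProbe(BooleanSupplier probe, Duration ttl) {
        this(probe, ttl, System::nanoTime);
    }

    // Runs the underlying probe at most once per TTL window; otherwise returns the cached result.
    synchronized boolean isHealthy() {
        long now = nanoClock.getAsLong();
        if (!hasResult || now - lastCheckedNanos >= ttlNanos) {
            boolean result;
            try {
                result = probe.getAsBoolean();
            } catch (RuntimeException e) {
                // Treat a failing probe as unhealthy rather than propagating the error
                result = false;
            }
            lastResult = result;
            lastCheckedNanos = now;
            hasResult = true;
        }
        return lastResult;
    }
}
```

With a TTL of 30 seconds, for instance, a probe that fires every 10 seconds would reach the provider roughly once in three checks. The trade-off is that an outage can go unreported for up to one TTL window.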
Key Points
- Spring AI records gen_ai.client.operation.duration and gen_ai.client.token.usage automatically; no code is needed.
- Set spring.ai.chat.observations.include-prompt=true only in development, because prompt content is sensitive data in production.
- Use SimpleLoggerAdvisor during development to see exact prompts and responses in the log.
- Track token usage per user to implement usage limits and cost allocation in multi-tenant applications.
- A custom HealthIndicator that pings the AI provider surfaces AI service outages through /actuator/health. In Kubernetes, wire it into a readiness check rather than a liveness probe, since restarting the pod does not fix a provider outage.