Spring AI ChatClient API — System Prompts, Roles, and Streaming Responses
The ChatClient is the central API in Spring AI for interacting with language models. It offers a fluent API to compose messages with different roles (system, user, assistant), stream responses, and override model options per call. This tutorial walks through system prompts, streaming, response metadata, per-call options, and multi-turn conversations, with an example for each.
Message Roles in LLMs
SYSTEM → Instructions that define AI behavior for the entire conversation
"You are a senior Java developer. Answer only with code examples."
USER → The question or input from the human
"How do I create a thread pool in Java?"
ASSISTANT → The AI's previous response (used for multi-turn conversations)
"You can use Executors.newFixedThreadPool(n)..."
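To make the ordering of roles concrete, here is a minimal plain-Java sketch of the message list a model would see on the second turn of a conversation. The `Role`, `Msg`, and `RoleDemo` types are hypothetical stand-ins written for this illustration (Spring AI ships its own message classes), and the follow-up question is invented.

```java
import java.util.ArrayList;
import java.util.List;

class RoleDemo {
    // Builds the ordered message list sent to the model on the second turn:
    // system instructions first, then the alternating user/assistant history.
    static List<Msg> secondTurn() {
        List<Msg> conversation = new ArrayList<>();
        conversation.add(new Msg(Role.SYSTEM, "You are a senior Java developer. Answer only with code examples."));
        conversation.add(new Msg(Role.USER, "How do I create a thread pool in Java?"));
        conversation.add(new Msg(Role.ASSISTANT, "You can use Executors.newFixedThreadPool(n)..."));
        // Hypothetical follow-up question, added only for illustration
        conversation.add(new Msg(Role.USER, "How do I shut it down cleanly?"));
        return conversation;
    }

    public static void main(String[] args) {
        for (Msg m : secondTurn()) {
            System.out.println(m.role() + ": " + m.text());
        }
    }
}

// Hypothetical types; not part of Spring AI
enum Role { SYSTEM, USER, ASSISTANT }

record Msg(Role role, String text) {}
```

The model has no memory of its own, so the earlier ASSISTANT reply has to be resent with each new USER message for the follow-up to make sense.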
System Prompt — Control AI Behavior
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
@Service
public class JavaTutorService {
    private final ChatClient chatClient;
    public JavaTutorService(ChatClient.Builder builder) {
        // Set a default system prompt for every call from this client
        this.chatClient = builder
                .defaultSystem("""
                        You are an expert Java developer writing tutorials for java9r.com.
                        Always provide code examples. Use Java 17+ features.
                        Keep explanations concise and practical.
                        """)
                .build();
    }
    public String explain(String topic) {
        // Only the user message is supplied here; the default system prompt is added automatically
        return chatClient.prompt()
                .user("Explain " + topic + " with a code example")
                .call()
                .content();
    }
}
Per-Call System Prompt Override
public String translateCode(String code, String targetLanguage) {
    return chatClient.prompt()
            .system("You are a code translator. Output only code, no explanations.")
            .user("Translate this Java code to " + targetLanguage + ":\n\n" + code)
            .call()
            .content();
}
Streaming Responses
Streaming delivers the response in chunks as the model generates them, which suits chat UIs and long responses. Calling stream().content() returns a reactive Flux<String>.
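import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import reactor.core.publisher.Flux;
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String q) {
    return chatClient.prompt()
            .user(q)
            .stream()
            .content(); // Flux<String>: each emission is a chunk of text, not necessarily exactly one token
}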
Test with curl:
curl -N "http://localhost:8080/ai/stream?q=Write+a+Spring+Boot+hello+world"
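The curl output is not the plain answer text. Assuming the endpoint emits standard server-sent events, each chunk arrives on a line prefixed with `data:`, and events are separated by blank lines. The sketch below shows one way a client could reassemble the answer from such lines. The `SseChunks` class and the sample input are made up for illustration, and the parsing is deliberately simplified: it ignores other SSE fields such as `event:` and `id:`, and it does not apply the SSE rule of stripping one leading space after the colon.

```java
import java.util.ArrayList;
import java.util.List;

class SseChunks {
    // Extracts the payload of every "data:" line; blank separator lines
    // and any other SSE fields are skipped.
    static List<String> payloads(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith("data:")) {
                out.add(line.substring("data:".length()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample of what a streamed response might look like on the wire
        List<String> raw = List.of("data:Hello", "", "data:, wor", "", "data:ld!", "");
        // Chunks can split words arbitrarily, so join them without a separator
        System.out.println(String.join("", payloads(raw))); // prints "Hello, world!"
    }
}
```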
Full ChatResponse with Metadata
import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.metadata.Usage;
public void askWithMetadata(String question) {
    ChatResponse response = chatClient.prompt()
            .user(question)
            .call()
            .chatResponse();
    // Accessor names depend on the Spring AI version: newer releases expose
    // getText() in place of getContent() and getCompletionTokens() in place
    // of getGenerationTokens().
    String content = response.getResult().getOutput().getContent();
    Usage usage = response.getMetadata().getUsage();
    System.out.println("Answer : " + content);
    System.out.println("Prompt tokens : " + usage.getPromptTokens());
    System.out.println("Output tokens : " + usage.getGenerationTokens());
    System.out.println("Total tokens : " + usage.getTotalTokens());
}
Output of Metadata Call
Answer : To create a thread pool in Java use Executors.newFixedThreadPool(10)...
Prompt tokens : 42
Output tokens : 118
Total tokens : 160
Per-Call Model Options Override
import org.springframework.ai.openai.OpenAiChatOptions;
public String precise(String question) {
    return chatClient.prompt()
            .user(question)
            .options(OpenAiChatOptions.builder()
                    .model("gpt-4o")
                    .temperature(0.0) // minimizes randomness; output is usually, but not always, repeatable
                    .maxTokens(500)   // upper bound on generated tokens
                    .build())
            .call()
            .content();
}
Multi-Turn Conversation (Manual History)
import org.springframework.ai.chat.messages.*;
import org.springframework.ai.chat.model.ChatResponse;
import java.util.*;
// history must be a mutable list; it is updated in place with both turns
public String chat(List<Message> history, String userInput) {
    history.add(new UserMessage(userInput));
    // Send the whole history so the model sees the earlier turns
    ChatResponse response = chatClient.prompt()
            .messages(history)
            .call()
            .chatResponse();
    String aiReply = response.getResult().getOutput().getContent();
    history.add(new AssistantMessage(aiReply));
    return aiReply;
}
Controller with Conversation Endpoint
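import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.*;
import org.springframework.web.bind.annotation.*;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
@RestController
@RequestMapping("/ai")
public class ChatController {
    private final ChatClient chatClient;
    // Simple in-memory store; use Redis/DB in production.
    // The map is thread-safe, but each ArrayList is not: concurrent requests
    // for the same session would need extra synchronization.
    private final Map<String, List<Message>> sessions = new ConcurrentHashMap<>();
    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a helpful Java programming assistant.")
                .build();
    }
    @PostMapping("/chat/{sessionId}")
    public String chat(@PathVariable String sessionId, @RequestBody String message) {
        // Create the history on first use of this session ID
        List<Message> history = sessions.computeIfAbsent(sessionId, k -> new ArrayList<>());
        history.add(new UserMessage(message));
        String reply = chatClient.prompt()
                .messages(history)
                .call()
                .content();
        history.add(new AssistantMessage(reply));
        return reply;
    }
}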
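The controller above stores each session's history in a plain ArrayList, which is not safe if two requests for the same session arrive at once. One possible way to guard it is sketched below, under the assumption that locking on the per-session list is acceptable. `SessionStore` is a hypothetical helper, and it stores plain strings rather than Spring AI Message objects so that the example runs without any dependencies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class SessionStore {
    private final Map<String, List<String>> sessions = new ConcurrentHashMap<>();

    // Appends while holding the list's lock so that concurrent requests
    // for the same session cannot corrupt the ArrayList.
    void append(String sessionId, String message) {
        List<String> history = sessions.computeIfAbsent(sessionId, k -> new ArrayList<>());
        synchronized (history) {
            history.add(message);
        }
    }

    // Returns an immutable copy that can be passed to a model call
    // without holding the lock for the duration of the call.
    List<String> snapshot(String sessionId) {
        List<String> history = sessions.computeIfAbsent(sessionId, k -> new ArrayList<>());
        synchronized (history) {
            return List.copyOf(history);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SessionStore store = new SessionStore();
        Thread a = new Thread(() -> { for (int i = 0; i < 1000; i++) store.append("s1", "a" + i); });
        Thread b = new Thread(() -> { for (int i = 0; i < 1000; i++) store.append("s1", "b" + i); });
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(store.snapshot("s1").size()); // prints 2000
    }
}
```

This only protects the list itself. Two overlapping requests in one session can still interleave their user and assistant messages, so a real application may prefer to serialize requests per session.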
Key Points
defaultSystem() on the builder sets a system prompt for all calls from that ChatClient instance.
stream().content() returns Flux<String> for chunk-by-chunk streaming.
chatResponse() gives full metadata, including token usage.
options() overrides model parameters (temperature, model name, maxTokens) for an individual call.
For production multi-turn conversations, use MessageChatMemoryAdvisor instead of manual history management.