Java9R: Spring AI ChatClient API — System Prompts, Roles, and Streaming Responses


ChatClient is the central API in Spring AI for interacting with language models. Its fluent builder lets you compose messages with different roles (system, user, assistant), stream responses, and override model options per call. This tutorial walks through system prompts, per-call overrides, streaming, response metadata, model options, and multi-turn conversations, with an example for each.

Message Roles in LLMs

SYSTEM   → Instructions that define AI behavior for the entire conversation
           "You are a senior Java developer. Answer only with code examples."

USER     → The question or input from the human
           "How do I create a thread pool in Java?"

ASSISTANT → The AI's previous response (used for multi-turn conversations)
           "You can use Executors.newFixedThreadPool(n)..."
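To make the role layout concrete, here is a minimal plain-Java sketch of a conversation as an ordered list of role-tagged messages. The types Role, RoleMessage, and RoleTranscript are hypothetical stand-ins invented for illustration; they are not Spring AI classes (Spring AI's own message types appear later in this tutorial).

```java
import java.util.List;

/** Hypothetical stand-in for the three roles; not a Spring AI type. */
enum Role { SYSTEM, USER, ASSISTANT }

/** Hypothetical role-tagged message; not a Spring AI type. */
record RoleMessage(Role role, String text) {}

class RoleTranscript {

    /** Builds the three-message conversation from the role overview above. */
    static List<RoleMessage> sample() {
        return List.of(
                new RoleMessage(Role.SYSTEM,
                        "You are a senior Java developer. Answer only with code examples."),
                new RoleMessage(Role.USER,
                        "How do I create a thread pool in Java?"),
                new RoleMessage(Role.ASSISTANT,
                        "You can use Executors.newFixedThreadPool(n)..."));
    }

    /** Renders the conversation as "ROLE: text" lines, preserving order. */
    static String render(List<RoleMessage> messages) {
        StringBuilder sb = new StringBuilder();
        for (RoleMessage m : messages) {
            sb.append(m.role()).append(": ").append(m.text()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(render(sample()));
    }
}
```

Order matters: the model sees the system instruction first, then the alternating user and assistant turns.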

System Prompt — Control AI Behavior

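import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class JavaTutorService {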

    private final ChatClient chatClient;

    public JavaTutorService(ChatClient.Builder builder) {
        // Set a default system prompt for every call from this client
        this.chatClient = builder
                .defaultSystem("""
                    You are an expert Java developer writing tutorials for java9r.com.
                    Always provide code examples. Use Java 17+ features.
                    Keep explanations concise and practical.
                    """)
                .build();
    }

    public String explain(String topic) {
        return chatClient.prompt()
                .user("Explain " + topic + " with a code example")
                .call()
                .content();
    }
}

Per-Call System Prompt Override

public String translateCode(String code, String targetLanguage) {
    return chatClient.prompt()
            .system("You are a code translator. Output only code, no explanations.")
            .user("Translate this Java code to " + targetLanguage + ":\n\n" + code)
            .call()
            .content();
}

Streaming Responses

Streaming returns the response in chunks as the model generates them, which suits chat UIs and long responses. Calling .stream().content() returns a reactive Flux<String>.

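import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestParam;
import reactor.core.publisher.Flux;

// Belongs in a @RestController mapped to /ai, which the curl command below assumes
@GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
public Flux<String> stream(@RequestParam String q) {
    return chatClient.prompt()
            .user(q)
            .stream()
            .content();   // Flux<String>: each emission is a chunk of text, not necessarily one token
}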

Test with curl:

curl -N "http://localhost:8080/ai/stream?q=Write+a+Spring+Boot+hello+world"
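Because the endpoint produces text/event-stream, a client receives each chunk as a server-sent event. The following is a minimal sketch of pulling the data payloads out of such a body in plain Java. The class name SseChunks is hypothetical, and the parser is deliberately simplified: it handles only data: fields and blank-line event boundaries, and it assumes the whole body is already in memory.

```java
import java.util.ArrayList;
import java.util.List;

class SseChunks {

    /**
     * Extracts the data payload of each event from a text/event-stream body.
     * Other fields (event:, id:, retry:) and comment lines are ignored.
     * An event that is not terminated by a blank line is dropped.
     */
    static List<String> parse(String body) {
        List<String> events = new ArrayList<>();
        List<String> dataLines = new ArrayList<>();
        for (String line : body.split("\n", -1)) {
            if (line.endsWith("\r")) {
                line = line.substring(0, line.length() - 1); // tolerate CRLF line endings
            }
            if (line.isEmpty()) {
                // A blank line ends the current event
                if (!dataLines.isEmpty()) {
                    events.add(String.join("\n", dataLines));
                    dataLines.clear();
                }
            } else if (line.startsWith("data:")) {
                String value = line.substring("data:".length());
                if (value.startsWith(" ")) {
                    value = value.substring(1); // one optional space after the colon is not payload
                }
                dataLines.add(value);
            }
        }
        return events;
    }

    public static void main(String[] args) {
        String body = "data:Hello\n\ndata:, world\n\ndata:!\n\n";
        // Joining the chunks in order reconstructs the full reply
        System.out.println(String.join("", parse(body)));
    }
}
```

A real client would read the stream incrementally rather than buffering the whole body; the field handling stays the same.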

Full ChatResponse with Metadata

import org.springframework.ai.chat.model.ChatResponse;
import org.springframework.ai.chat.metadata.Usage;

public void askWithMetadata(String question) {
    ChatResponse response = chatClient.prompt()
            .user(question)
            .call()
            .chatResponse();

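    // Accessor names depend on the Spring AI version: newer releases name these
    // getText() (instead of getContent()) and getCompletionTokens() (instead of getGenerationTokens())
    String content  = response.getResult().getOutput().getContent();
    Usage  usage    = response.getMetadata().getUsage();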

    System.out.println("Answer      : " + content);
    System.out.println("Prompt tokens : " + usage.getPromptTokens());
    System.out.println("Output tokens : " + usage.getGenerationTokens());
    System.out.println("Total tokens  : " + usage.getTotalTokens());
}

Output of Metadata Call

Answer      : To create a thread pool in Java use Executors.newFixedThreadPool(10)...
Prompt tokens : 42
Output tokens : 118
Total tokens  : 160
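In the sample output the total is the sum of the prompt and output counts (42 + 118 = 160). If you want a running total across several calls, a small value type is enough. This is a sketch only: TokenUsage and UsageDemo are hypothetical names, not Spring AI classes, and you would fill them from the Usage object shown above.

```java
/** Hypothetical value type for accumulating token counts; not a Spring AI class. */
record TokenUsage(long promptTokens, long outputTokens) {

    /** Total is derived, so it can never disagree with the two parts. */
    long totalTokens() {
        return promptTokens + outputTokens;
    }

    /** Returns a new TokenUsage holding the sum of this and another call's counts. */
    TokenUsage plus(TokenUsage other) {
        return new TokenUsage(promptTokens + other.promptTokens,
                              outputTokens + other.outputTokens);
    }
}

class UsageDemo {
    public static void main(String[] args) {
        TokenUsage first = new TokenUsage(42, 118);   // figures from the sample output
        TokenUsage second = new TokenUsage(8, 32);    // made-up second call for illustration
        TokenUsage running = first.plus(second);
        System.out.println("Total so far: " + running.totalTokens());
    }
}
```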

Per-Call Model Options Override

import org.springframework.ai.openai.OpenAiChatOptions;

public String precise(String question) {
    return chatClient.prompt()
            .user(question)
            .options(OpenAiChatOptions.builder()
                    .model("gpt-4o")
                    .temperature(0.0)   // minimal randomness; output is near-deterministic, not guaranteed identical
                    .maxTokens(500)
                    .build())
            .call()
            .content();
}

Multi-Turn Conversation (Manual History)

import org.springframework.ai.chat.messages.*;
import java.util.*;

public String chat(List<Message> history, String userInput) {
    // The caller owns the list; it must be mutable because both turns are appended
    history.add(new UserMessage(userInput));

    // content() returns the reply text directly, so no ChatResponse unwrapping is needed
    String aiReply = chatClient.prompt()
            .messages(history)
            .call()
            .content();

    history.add(new AssistantMessage(aiReply));
    return aiReply;
}
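With manual history the list grows by two messages per turn, and the whole list is sent on every call. One simple way to bound it is a sliding window that keeps only the most recent messages. The sketch below is plain Java with a hypothetical helper, HistoryWindow.lastN; it is generic, so it would work on a List<Message> as well as on the strings used here for demonstration.

```java
import java.util.ArrayList;
import java.util.List;

class HistoryWindow {

    /**
     * Returns a new list holding at most maxMessages of the most recent entries,
     * in their original order. The input list is not modified.
     */
    static <T> List<T> lastN(List<T> history, int maxMessages) {
        if (maxMessages < 0) {
            throw new IllegalArgumentException("maxMessages must not be negative");
        }
        int from = Math.max(0, history.size() - maxMessages);
        return new ArrayList<>(history.subList(from, history.size()));
    }

    public static void main(String[] args) {
        List<String> history = List.of("user-1", "assistant-1", "user-2", "assistant-2", "user-3");
        // Keep only the three most recent messages before sending
        System.out.println(lastN(history, 3));
    }
}
```

Dropping old turns means the model forgets them, so the window size is a trade-off between context and prompt size.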

Controller with Conversation Endpoint

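import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.messages.AssistantMessage;
import org.springframework.ai.chat.messages.Message;
import org.springframework.ai.chat.messages.UserMessage;
import org.springframework.web.bind.annotation.*;

@RestController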
@RequestMapping("/ai")
public class ChatController {

    private final ChatClient chatClient;
    // Simple in-memory store — use Redis/DB in production
    private final Map<String, List<Message>> sessions = new ConcurrentHashMap<>();

    public ChatController(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a helpful Java programming assistant.")
                .build();
    }

    @PostMapping("/chat/{sessionId}")
    public String chat(@PathVariable String sessionId, @RequestBody String message) {
        List<Message> history = sessions.computeIfAbsent(sessionId, k -> new ArrayList<>());

        history.add(new UserMessage(message));
        String reply = chatClient.prompt()
                .messages(history)
                .call()
                .content();
        history.add(new AssistantMessage(reply));

        return reply;
    }
}
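One caveat with the controller above: ConcurrentHashMap makes the session lookup thread-safe, but each ArrayList inside it is not, so two simultaneous requests for the same session could interleave their writes. A possible fix is to guard each session's list with its own lock. The sketch below shows the idea in plain Java; SessionStore is a hypothetical class, history entries are plain strings, and the model call is passed in as a function so the example runs without Spring AI.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

class SessionStore {

    private final Map<String, List<String>> sessions = new ConcurrentHashMap<>();

    /**
     * Appends the user input, asks the model, and appends the reply,
     * all while holding the lock of that one session's history list.
     */
    String exchange(String sessionId, String userInput, Function<List<String>, String> model) {
        List<String> history = sessions.computeIfAbsent(sessionId, k -> new ArrayList<>());
        synchronized (history) {
            history.add("USER: " + userInput);
            // The model gets an immutable copy, so it cannot alter the stored history
            String reply = model.apply(List.copyOf(history));
            history.add("ASSISTANT: " + reply);
            return reply;
        }
    }

    /** Returns an immutable copy of a session's history, or an empty list if unknown. */
    List<String> snapshot(String sessionId) {
        List<String> history = sessions.get(sessionId);
        if (history == null) {
            return List.of();
        }
        synchronized (history) {
            return List.copyOf(history);
        }
    }

    public static void main(String[] args) {
        SessionStore store = new SessionStore();
        // Fake model for the demo: reports how many messages it was shown
        System.out.println(store.exchange("s1", "hi", h -> "saw " + h.size() + " message(s)"));
        System.out.println(store.snapshot("s1"));
    }
}
```

Holding the lock during the model call serializes requests within one session while leaving other sessions unaffected, which is usually the ordering a chat needs anyway.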

Key Points

defaultSystem() on the builder sets a system prompt for all calls from that ChatClient instance
.stream().content() returns Flux<String>, emitting the response in chunks as it is generated
.chatResponse() gives full metadata including token usage
.options() overrides model parameters (temperature, model name, maxTokens) per individual call
For production multi-turn conversations use MessageChatMemoryAdvisor instead of manual history management
