Spring AI with WebSockets — Real-Time AI Chat with Streaming Responses
HTTP streaming (Server-Sent Events) works for one-way AI responses, but WebSockets enable true bidirectional communication — clients can send multiple messages, interrupt responses, or receive AI-initiated updates. This tutorial builds a real-time AI chat application using Spring WebSocket, STOMP, and Spring AI streaming.
Architecture — WebSocket AI Chat
Browser (SockJS + STOMP client)
    │  send: /app/chat
    │
WebSocket Server (Spring Boot)
    │
AiChatController ──→ ChatClient.stream()
    │                     │
    │               Flux<String> chunks
    │                     │
STOMP broker ←─── /topic/chat/{sessionId}
    │
Browser receives token-by-token
Maven Dependencies
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-websocket</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>
WebSocket Configuration
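import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // In-memory broker for destinations the server publishes to
        config.enableSimpleBroker("/topic", "/queue");
        // Client messages sent to /app/** are routed to @MessageMapping methods
        config.setApplicationDestinationPrefixes("/app");
        config.setUserDestinationPrefix("/user");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws")
            .setAllowedOriginPatterns("*") // allows any origin; restrict this outside local development
            .withSockJS(); // SockJS fallback for browsers without WebSocket support
    }
}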
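The chat controller in the next section calls principal.getName(), which only works when the handshake produced a Principal. If the application has no login, one possible approach is to assign a generated identity during the handshake. The sketch below is an assumption, not part of the original tutorial: the class name AnonymousPrincipalHandshakeHandler and the "anon-" prefix are hypothetical, while DefaultHandshakeHandler.determineUser and setHandshakeHandler are existing Spring hooks.

```java
import java.security.Principal;
import java.util.Map;
import java.util.UUID;

import org.springframework.http.server.ServerHttpRequest;
import org.springframework.web.socket.WebSocketHandler;
import org.springframework.web.socket.server.support.DefaultHandshakeHandler;

// Hypothetical helper: gives every unauthenticated connection its own Principal.
public class AnonymousPrincipalHandshakeHandler extends DefaultHandshakeHandler {

    @Override
    protected Principal determineUser(ServerHttpRequest request,
                                      WebSocketHandler wsHandler,
                                      Map<String, Object> attributes) {
        // Keep the authenticated user when one exists (for example via Spring Security)
        Principal existing = super.determineUser(request, wsHandler, attributes);
        if (existing != null) {
            return existing;
        }
        // Otherwise fall back to a random, per-connection name
        String name = "anon-" + UUID.randomUUID();
        return () -> name;
    }
}

// Registration inside WebSocketConfig.registerStompEndpoints (sketch):
//   registry.addEndpoint("/ws")
//       .setHandshakeHandler(new AnonymousPrincipalHandshakeHandler())
//       .setAllowedOriginPatterns("*")
//       .withSockJS();
```

A generated name is unknown to the browser, so the page would still need a way to learn it before subscribing to /topic/chat/{sessionId}. Subscribing to a user destination under the /user prefix is one way to avoid that.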
WebSocket Chat Controller
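// Imports assume a pre-1.0 Spring AI milestone; later releases renamed some
// chat-memory classes, so check them against the version you use
import java.security.Principal;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.messaging.handler.annotation.MessageMapping;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.stereotype.Controller;

@Controller
public class AiChatController {

    private final ChatClient chatClient;
    private final SimpMessagingTemplate messagingTemplate;
    private final ChatMemory memory;

    public AiChatController(ChatClient.Builder builder,
                            SimpMessagingTemplate messagingTemplate) {
        this.memory = new InMemoryChatMemory();
        this.messagingTemplate = messagingTemplate;
        this.chatClient = builder
            .defaultSystem("You are a helpful Java and Spring Boot expert.")
            .defaultAdvisors(new MessageChatMemoryAdvisor(memory))
            .build();
    }

    @MessageMapping("/chat")
    public void handleChat(ChatMessage message, Principal principal) {
        // principal is null unless the WebSocket handshake was authenticated
        String sessionId = principal.getName();
        String destination = "/topic/chat/" + sessionId;

        // Stream AI response token by token to WebSocket
        chatClient.prompt()
            .user(message.content())
            .advisors(a -> a.param(
                MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY,
                sessionId))
            .stream()
            .content()
            .subscribe(
                // Send each token to the user's topic
                token -> messagingTemplate.convertAndSend(destination,
                    new TokenMessage(token, false)),
                // Report the failure and end the response; supplying an error
                // consumer keeps Reactor from treating the error as unhandled
                e -> messagingTemplate.convertAndSend(destination,
                    new TokenMessage("[Error: " + e.getMessage() + "]", true)),
                // Signal end of response
                () -> messagingTemplate.convertAndSend(destination,
                    new TokenMessage("", true)));
    }
}

record ChatMessage(String content) {}
record TokenMessage(String token, boolean done) {}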
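The introduction mentions interrupting a response, but the controller above never keeps a handle on a running stream. A minimal sketch of the bookkeeping that would be needed follows, using only the JDK. ActiveStreamRegistry and its methods are hypothetical names; in the controller, the cancel action would presumably be the dispose() method of the Disposable that subscribe() returns.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: tracks at most one running stream per session.
final class ActiveStreamRegistry {

    // sessionId -> action that cancels that session's running stream
    private final Map<String, Runnable> cancelActions = new ConcurrentHashMap<>();

    /** Registers a new stream and cancels the one it replaces, if any. */
    void register(String sessionId, Runnable cancelAction) {
        Runnable previous = cancelActions.put(sessionId, cancelAction);
        if (previous != null) {
            previous.run();
        }
    }

    /** Cancels the running stream; returns false when nothing was active. */
    boolean cancel(String sessionId) {
        Runnable action = cancelActions.remove(sessionId);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    /** Forgets the stream after it completed or failed on its own. */
    void finished(String sessionId) {
        cancelActions.remove(sessionId);
    }

    boolean isActive(String sessionId) {
        return cancelActions.containsKey(sessionId);
    }
}
```

With something like this in place, a second @MessageMapping (for example a hypothetical /app/chat/cancel destination) could call cancel(sessionId), and handleChat could call register so that a new question stops the previous answer instead of interleaving tokens on the same topic.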
Frontend — JavaScript WebSocket Client
<!-- index.html -->
<!-- Assumes the sockjs-client and stomp-websocket WebJars are on the classpath -->
<script src="/webjars/sockjs-client/sockjs.min.js"></script>
<script src="/webjars/stomp-websocket/stomp.min.js"></script>
<script>
    // sessionId is not defined in this snippet; it must equal the value of
    // principal.getName() that the server uses to build the topic name
    const socket = new SockJS('/ws');
    const stompClient = Stomp.over(socket);

    stompClient.connect({}, function() {
        // Subscribe to AI response stream for this user
        stompClient.subscribe('/topic/chat/' + sessionId, function(msg) {
            const data = JSON.parse(msg.body);
            // Append first: an error frame carries its text together with done=true
            document.getElementById('response').textContent += data.token;
            if (data.done) {
                document.getElementById('sending').style.display = 'none';
            }
        });
    });

    function sendMessage(text) {
        document.getElementById('response').textContent = '';
        document.getElementById('sending').style.display = 'block';
        stompClient.send('/app/chat', {}, JSON.stringify({ content: text }));
    }
</script>
Server-Sent Events Alternative (Simpler)
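// SSE: simpler, no WebSocket needed, unidirectional only
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/ai")
public class SseAiController {

    private final ChatClient chatClient;

    public SseAiController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<String>> stream(@RequestParam String question) {
        return chatClient.prompt()
            .user(question)
            .stream()
            .content()
            // Each token becomes one unnamed SSE event
            .map(token -> ServerSentEvent.<String>builder()
                .data(token)
                .build())
            // A final named event tells the client the response is complete
            .concatWith(Flux.just(ServerSentEvent.<String>builder()
                .event("done")
                .data("")
                .build()));
    }
}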
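To make the wire format of that endpoint concrete, here is a small JDK-only sketch of how an event stream is split into events, following the field rules of the SSE format (blank line ends an event, "event" names it, "data" carries the payload). SseParser and Event are hypothetical names, and the parser ignores the id and retry fields. One rule worth noticing: a single space after the colon is stripped, so a token that starts with a space can lose it depending on how the server writes the data field. Wrapping tokens in JSON, as the WebSocket version does, sidesteps that.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified parser for text/event-stream payloads.
final class SseParser {

    record Event(String name, String data) {}

    static List<Event> parse(String stream) {
        List<Event> events = new ArrayList<>();
        String name = "message";            // default event type
        StringBuilder data = new StringBuilder();
        boolean hasData = false;

        for (String line : stream.split("\n", -1)) {
            if (line.isEmpty()) {
                // A blank line dispatches the event collected so far
                if (hasData) {
                    events.add(new Event(name, data.toString()));
                }
                name = "message";
                data.setLength(0);
                hasData = false;
                continue;
            }
            if (line.startsWith(":")) {
                continue;                   // comment line, often used as keep-alive
            }
            int colon = line.indexOf(':');
            String field = colon < 0 ? line : line.substring(0, colon);
            String value = colon < 0 ? "" : line.substring(colon + 1);
            if (value.startsWith(" ")) {
                value = value.substring(1); // exactly one leading space is dropped
            }
            switch (field) {
                case "event" -> name = value;
                case "data" -> {
                    if (hasData) {
                        data.append('\n');  // multiple data lines join with newlines
                    }
                    data.append(value);
                    hasData = true;
                }
                default -> { }              // id, retry and unknown fields are ignored here
            }
        }
        return events;
    }
}
```

Feeding it "data:Spring", a blank line, then "data: AI" and a blank line yields two message events whose data is "Spring" and "AI": the leading space of the second token is gone, which is the behavior to watch for when streaming raw tokens.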
Output
// User sends via WebSocket: { "content": "What is Spring AI?" }
// Tokens arrive at /topic/chat/user-123 approximately 50ms apart:
{"token": "Spring", "done": false}
{"token": " AI", "done": false}
{"token": " is", "done": false}
{"token": " a", "done": false}
{"token": " framework", "done": false}
... (continues streaming) ...
{"token": ".", "done": false}
{"token": "", "done": true} ← completion signal
// User sees text building up token by token in the browser
// Time to first token: ~200ms
// Full response: ~3-8 seconds (streamed, not waited)
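The frames above only become a readable answer once the receiver concatenates them and reacts to the done flag. A minimal JDK-only sketch of that logic is shown below; ResponseAssembler is a hypothetical name, and its nested TokenMessage simply mirrors the record the controller sends.

```java
// Hypothetical receiver-side helper: rebuilds the full response from frames.
final class ResponseAssembler {

    /** Mirrors the TokenMessage record used by the controller. */
    record TokenMessage(String token, boolean done) {}

    private final StringBuilder text = new StringBuilder();
    private boolean complete;

    /** Applies one frame and returns true once the stream has finished. */
    boolean accept(TokenMessage frame) {
        if (complete) {
            return true; // frames arriving after completion are ignored
        }
        // The final frame has an empty token on success and the error text on failure
        text.append(frame.token());
        complete = frame.done();
        return complete;
    }

    String text() {
        return text.toString();
    }

    boolean isComplete() {
        return complete;
    }
}
```

Appending the token before checking the flag matters because the controller reports failures as a frame that carries both the error text and done set to true.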
Key Points
- Use WebSocket (STOMP) when you need bidirectional communication — clients sending multiple messages while receiving, or AI-initiated pushes
- Use Server-Sent Events (SSE) for simpler one-way streaming — less infrastructure, works over standard HTTP/2, automatic reconnect built into browsers
- Always send a "done": true completion signal: the client needs to know when to stop showing the typing indicator
- For multi-instance deployments, replace the in-memory STOMP broker with RabbitMQ or Redis pub/sub so messages route correctly regardless of which instance handles the WebSocket connection
- Implement a heartbeat ping from client to server every 30 seconds — WebSocket connections through corporate proxies often time out after 60 seconds of inactivity
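The heartbeat and broker points above can be sketched on the server side as a variant of WebSocketConfig (a replacement for it, not an additional class). This is an assumption-laden sketch: the class name, the 30-second interval taken from the last bullet, the thread name prefix, and the relay host are all illustrative. The simple broker only sends STOMP heartbeats when it is given a task scheduler, and the client library has to be configured to send its own.

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class HeartbeatWebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // Scheduler that drives the broker's heartbeat frames
        ThreadPoolTaskScheduler scheduler = new ThreadPoolTaskScheduler();
        scheduler.setPoolSize(1);
        scheduler.setThreadNamePrefix("ws-heartbeat-");
        scheduler.initialize();

        config.enableSimpleBroker("/topic", "/queue")
            // {server send interval, expected client send interval} in milliseconds
            .setHeartbeatValue(new long[] {30_000, 30_000})
            .setTaskScheduler(scheduler);
        config.setApplicationDestinationPrefixes("/app");
        config.setUserDestinationPrefix("/user");

        // Multi-instance alternative: relay to an external STOMP broker instead of
        // the simple broker. The host below is a placeholder, and RabbitMQ needs
        // its STOMP plugin enabled for this to work.
        // config.enableStompBrokerRelay("/topic", "/queue")
        //     .setRelayHost("broker.example.internal")
        //     .setRelayPort(61613);
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws")
            .setAllowedOriginPatterns("*")
            .withSockJS();
    }
}
```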