Java9R: Spring AI with WebSockets — Real-Time AI Chat with Streaming Responses

HTTP streaming (Server-Sent Events) works for one-way AI responses, but WebSockets enable true bidirectional communication — clients can send multiple messages, interrupt responses, or receive AI-initiated updates. This tutorial builds a real-time AI chat application using Spring WebSocket, STOMP, and Spring AI streaming.

Architecture — WebSocket AI Chat

Browser (SockJS + STOMP client)
         │ send: /app/chat
         │
    WebSocket Server (Spring Boot)
         │
    AiChatController ──→ ChatClient.stream()
         │                    │
         │              Flux<String> chunks
         │                    │
    STOMP broker ←─── /topic/chat/{sessionId}
         │
    Browser receives token-by-token

Maven Dependencies

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-websocket</artifactId>
</dependency>
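<!-- Spring AI is not managed by the Spring Boot parent: import the spring-ai-bom
     in dependencyManagement, or add an explicit version here. This artifact ID is
     the one used by the pre-1.0 milestone releases; later releases renamed the
     starters, so check the Spring AI documentation for your version. -->
<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-openai-spring-boot-starter</artifactId>
</dependency>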

WebSocket Configuration

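import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.StompEndpointRegistry;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class WebSocketConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // In-memory broker for destinations the server publishes to
        config.enableSimpleBroker("/topic", "/queue");
        // Client messages sent to /app/** are routed to @MessageMapping methods
        config.setApplicationDestinationPrefixes("/app");
        config.setUserDestinationPrefix("/user");
    }

    @Override
    public void registerStompEndpoints(StompEndpointRegistry registry) {
        registry.addEndpoint("/ws")
                // "*" accepts every origin; restrict this to known origins in production
                .setAllowedOriginPatterns("*")
                .withSockJS();   // SockJS fallback for browsers without WebSocket support
    }
}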
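
The controller in the next section reads the caller's identity from a `Principal`, which is only present when the WebSocket handshake was associated with a user. The following is a minimal sketch, assuming no Spring Security setup, of a handshake handler that assigns each connection a random name. The class name `AnonymousPrincipalHandshakeHandler` is hypothetical; `DefaultHandshakeHandler.determineUser` is the extension point it overrides.

```java
import java.security.Principal;
import java.util.Map;
import java.util.UUID;

import org.springframework.http.server.ServerHttpRequest;
import org.springframework.web.socket.WebSocketHandler;
import org.springframework.web.socket.server.support.DefaultHandshakeHandler;

// Hypothetical helper: gives every WebSocket session a Principal so that
// @MessageMapping methods never receive null.
public class AnonymousPrincipalHandshakeHandler extends DefaultHandshakeHandler {

    @Override
    protected Principal determineUser(ServerHttpRequest request,
                                      WebSocketHandler wsHandler,
                                      Map<String, Object> attributes) {
        // Keep an already authenticated user if there is one
        Principal existing = request.getPrincipal();
        if (existing != null) {
            return existing;
        }
        // Otherwise fall back to a random, per-connection name
        String generatedName = UUID.randomUUID().toString();
        return () -> generatedName;
    }
}
```

It would be registered on the endpoint with `registry.addEndpoint("/ws").setHandshakeHandler(new AnonymousPrincipalHandshakeHandler())`. Because the browser does not know a randomly generated name, this approach pairs more naturally with user destinations: the server calls `messagingTemplate.convertAndSendToUser(principal.getName(), "/queue/chat", payload)` and the client subscribes to `/user/queue/chat`.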

WebSocket Chat Controller

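// Spring AI package names below are those of the pre-1.0 milestone releases
// that this tutorial's API (InMemoryChatMemory, MessageChatMemoryAdvisor) targets.
import java.security.Principal;

import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.InMemoryChatMemory;
import org.springframework.messaging.handler.annotation.MessageMapping;
import org.springframework.messaging.simp.SimpMessagingTemplate;
import org.springframework.stereotype.Controller;

@Controller
public class AiChatController {

    private final ChatClient chatClient;
    private final SimpMessagingTemplate messagingTemplate;
    private final ChatMemory memory;

    public AiChatController(ChatClient.Builder builder,
                             SimpMessagingTemplate messagingTemplate) {
        this.memory            = new InMemoryChatMemory();
        this.messagingTemplate = messagingTemplate;
        this.chatClient        = builder
                .defaultSystem("You are a helpful Java and Spring Boot expert.")
                .defaultAdvisors(new MessageChatMemoryAdvisor(memory))
                .build();
    }

    @MessageMapping("/chat")
    public void handleChat(ChatMessage message, Principal principal) {
        // principal is null unless the WebSocket handshake was associated with a
        // user (for example by Spring Security or a custom HandshakeHandler)
        String sessionId = principal.getName();
        String destination = "/topic/chat/" + sessionId;

        // Stream the AI response token by token to the user's topic
        chatClient.prompt()
                .user(message.content())
                .advisors(a -> a.param(
                        MessageChatMemoryAdvisor.CHAT_MEMORY_CONVERSATION_ID_KEY,
                        sessionId))
                .stream()
                .content()
                .subscribe(
                        // onNext: forward each token as it arrives
                        token -> messagingTemplate.convertAndSend(destination,
                                new TokenMessage(token, false)),
                        // onError: passing a handler to subscribe() keeps Reactor
                        // from also reporting the error as unhandled
                        e -> messagingTemplate.convertAndSend(destination,
                                new TokenMessage("[Error: " + e.getMessage() + "]", true)),
                        // onComplete: signal the end of the response
                        () -> messagingTemplate.convertAndSend(destination,
                                new TokenMessage("", true)));
    }
}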

record ChatMessage(String content) {}
record TokenMessage(String token, boolean done) {}
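
The introduction mentions that a WebSocket client can interrupt a response, which the controller above does not yet do. Here is a minimal plain-Java sketch of the bookkeeping that would be needed: one cancel handle per session, with a new request cancelling the previous one. `ActiveStreamRegistry` and its methods are hypothetical names, not part of Spring or Spring AI.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Tracks at most one in-flight response per session (hypothetical helper). */
public final class ActiveStreamRegistry {

    private final Map<String, Runnable> cancelHandles = new ConcurrentHashMap<>();

    /** Registers a new stream, cancelling any stream already running for the session. */
    public void start(String sessionId, Runnable cancelHandle) {
        Runnable previous = cancelHandles.put(sessionId, cancelHandle);
        if (previous != null) {
            previous.run();
        }
    }

    /** Cancels the session's in-flight stream; returns true if one was cancelled. */
    public boolean cancel(String sessionId) {
        Runnable handle = cancelHandles.remove(sessionId);
        if (handle == null) {
            return false;
        }
        handle.run();
        return true;
    }

    /** Forgets a stream that finished on its own, unless a newer one replaced it. */
    public void finished(String sessionId, Runnable cancelHandle) {
        cancelHandles.remove(sessionId, cancelHandle);
    }

    /** Number of sessions that currently have a stream in flight. */
    public int activeCount() {
        return cancelHandles.size();
    }
}
```

In the controller, the cancel handle would be the `dispose` method of the `Disposable` that `subscribe()` returns, for example `registry.start(sessionId, disposable::dispose)`. A second mapping such as `@MessageMapping("/chat/stop")` (an assumed destination) could then call `registry.cancel(principal.getName())`.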

Frontend — JavaScript WebSocket Client

<!-- index.html -->
<!-- Version-less WebJar URLs assume the sockjs-client and stomp-websocket WebJars
     and a WebJars locator are on the classpath. -->
<script src="/webjars/sockjs-client/sockjs.min.js"></script>
<script src="/webjars/stomp-websocket/stomp.min.js"></script>
<script>
const socket  = new SockJS('/ws');
const stompClient = Stomp.over(socket);

stompClient.connect({}, function() {
    // sessionId must equal the Principal name the server appends to /topic/chat/.
    // Assumption: the server renders it into a data-session-id attribute on <body>.
    const sessionId = document.body.dataset.sessionId;

    // Subscribe to AI response stream for this user
    stompClient.subscribe('/topic/chat/' + sessionId, function(msg) {
        const data = JSON.parse(msg.body);

        // Append the token, including the error text that arrives with done: true
        if (data.token) {
            document.getElementById('response').textContent += data.token;
        }
        if (data.done) {
            document.getElementById('sending').style.display = 'none';
        }
    });
});

function sendMessage(text) {
    document.getElementById('response').textContent = '';
    document.getElementById('sending').style.display = 'block';

    stompClient.send('/app/chat', {}, JSON.stringify({ content: text }));
}
</script>

Server-Sent Events Alternative (Simpler)

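// SSE: simpler, no WebSocket needed, server-to-client only
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.http.codec.ServerSentEvent;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
@RequestMapping("/ai")
public class SseAiController {

    private final ChatClient chatClient;

    public SseAiController(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    @GetMapping(value = "/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    public Flux<ServerSentEvent<String>> stream(@RequestParam String question) {
        return chatClient.prompt()
                .user(question)
                .stream()
                .content()
                // One unnamed event per token
                .map(token -> ServerSentEvent.<String>builder()
                        .data(token)
                        .build())
                // Named "done" event tells the client the response is complete
                .concatWith(Flux.just(ServerSentEvent.<String>builder()
                        .event("done")
                        .data("")
                        .build()));
    }
}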
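
One detail of the SSE wire format matters when each event carries a raw model token. The event-stream format strips a single leading space from a field value, so if the server writes a token such as " AI" directly after `data:` (an assumption worth checking against your Spring version), the client receives "AI" and words run together. The sketch below is a simplified plain-Java parser for `data` and `event` fields that applies this rule; `SseFrames` is a hypothetical name and the parser ignores other fields and `\r\n` line endings.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified parser for a text/event-stream body (hypothetical helper). */
public final class SseFrames {

    /** One dispatched event; type is "message" when no event: field was sent. */
    public record Event(String type, String data) {}

    public static List<Event> parse(String stream) {
        List<Event> events = new ArrayList<>();
        String type = "";
        StringBuilder data = new StringBuilder();

        for (String line : stream.split("\n", -1)) {
            if (line.isEmpty()) {
                // A blank line dispatches the buffered event, if it has any data field
                if (data.length() > 0) {
                    data.setLength(data.length() - 1); // drop the trailing line feed
                    events.add(new Event(type.isEmpty() ? "message" : type, data.toString()));
                }
                type = "";
                data.setLength(0);
                continue;
            }
            if (line.startsWith(":")) {
                continue; // comment line
            }
            int colon = line.indexOf(':');
            String field = colon < 0 ? line : line.substring(0, colon);
            String value = colon < 0 ? "" : line.substring(colon + 1);
            if (value.startsWith(" ")) {
                value = value.substring(1); // exactly one leading space is removed
            }
            switch (field) {
                case "data" -> data.append(value).append('\n');
                case "event" -> type = value;
                default -> { /* id and retry are ignored in this sketch */ }
            }
        }
        return events;
    }
}
```

Parsing `"data:Spring\n\ndata: AI\n\nevent:done\ndata:\n\n"` yields the tokens "Spring" and "AI" followed by an empty "done" event, so the space before "AI" is lost. A common way around this is to send each token as a small JSON object, as the WebSocket version does with `TokenMessage`.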

Output

// User sends via WebSocket: { "content": "What is Spring AI?" }

// Tokens arrive at /topic/chat/user-123 (all timings here are illustrative
// and depend on the model and the network):
{"token": "Spring",  "done": false}
{"token": " AI",     "done": false}
{"token": " is",     "done": false}
{"token": " a",      "done": false}
{"token": " framework", "done": false}
... (continues streaming) ...
{"token": ".", "done": false}
{"token": "",  "done": true}    ← completion signal

// User sees the text build up token by token in the browser
// Time to first token: ~200ms
// Full response: ~3-8 seconds (displayed as it streams, not after it finishes)
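
The message sequence above can be read as a tiny protocol: append every token, and stop when `done` is true. A minimal plain-Java sketch of a consumer of that protocol follows. `ResponseAssembler` is a hypothetical name, and `TokenMessage` is redeclared as a nested record only to keep the example self-contained.

```java
/** Rebuilds a full response from streamed token messages (hypothetical helper). */
public final class ResponseAssembler {

    /** Same shape as the server's TokenMessage record. */
    public record TokenMessage(String token, boolean done) {}

    private final StringBuilder text = new StringBuilder();
    private boolean complete;

    /** Applies one message and returns true once the response is complete. */
    public boolean accept(TokenMessage message) {
        if (complete) {
            throw new IllegalStateException("message received after done=true");
        }
        // A final message may still carry text, e.g. the server's error marker
        text.append(message.token());
        if (message.done()) {
            complete = true;
        }
        return complete;
    }

    /** Text received so far. */
    public String text() {
        return text.toString();
    }

    /** True after a message with done=true has been applied. */
    public boolean isComplete() {
        return complete;
    }
}
```

Feeding it "Spring", " AI", " is" and then an empty message with `done` set produces the text "Spring AI is" and marks the response complete.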

Key Points

Use WebSocket (STOMP) when you need bidirectional communication: clients sending further messages while a response is still streaming, or AI-initiated pushes
Use Server-Sent Events (SSE) for simpler one-way streaming: less infrastructure, works over plain HTTP (HTTP/1.1 or HTTP/2), and the browser's EventSource reconnects automatically
Always send a "done": true completion signal, because the client cannot otherwise tell when to hide the typing indicator
For multi-instance deployments, replace the in-memory simple broker with an external STOMP broker such as RabbitMQ (via enableStompBrokerRelay) so messages reach subscribers regardless of which instance holds their WebSocket connection; Redis pub/sub does not speak STOMP, so using it means writing your own bridge between instances
Keep idle connections alive with heartbeats at an interval shorter than the shortest idle timeout on the path, for example every 30 seconds where proxies drop connections after about 60 seconds of inactivity; STOMP heartbeats and SockJS heartbeat frames both serve this purpose
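
For the heartbeat point, here is a minimal sketch of enabling STOMP heartbeats on the simple broker, assuming the 30-second interval mentioned above. It is a variant of the `configureMessageBroker` method from `WebSocketConfig`, shown in a class of its own (hypothetical name `HeartbeatBrokerConfig`) only so the imports are visible; in practice the method body would be merged into the existing configuration class.

```java
import org.springframework.context.annotation.Configuration;
import org.springframework.messaging.simp.config.MessageBrokerRegistry;
import org.springframework.scheduling.concurrent.ThreadPoolTaskScheduler;
import org.springframework.web.socket.config.annotation.EnableWebSocketMessageBroker;
import org.springframework.web.socket.config.annotation.WebSocketMessageBrokerConfigurer;

@Configuration
@EnableWebSocketMessageBroker
public class HeartbeatBrokerConfig implements WebSocketMessageBrokerConfigurer {

    @Override
    public void configureMessageBroker(MessageBrokerRegistry config) {
        // The simple broker needs a scheduler before it can send heartbeats
        ThreadPoolTaskScheduler heartbeatScheduler = new ThreadPoolTaskScheduler();
        heartbeatScheduler.setPoolSize(1);
        heartbeatScheduler.setThreadNamePrefix("ws-heartbeat-");
        heartbeatScheduler.initialize();

        config.enableSimpleBroker("/topic", "/queue")
                .setTaskScheduler(heartbeatScheduler)
                // {server sends every N ms, server expects client every N ms}
                .setHeartbeatValue(new long[] {30_000, 30_000});
        config.setApplicationDestinationPrefixes("/app");
        config.setUserDestinationPrefix("/user");
    }
}
```

The interval actually used is negotiated with the client during the STOMP CONNECT exchange, so the client library's heartbeat settings need to allow it as well.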
