Scaling Real-Time Chat: Beyond WebSockets
Building a chat app for a hackathon is easy. A simple Node.js server with socket.io can handle a few hundred users. But scaling it to millions of concurrent connections? That's where the real engineering begins.
At "The Ladder," we've tackled this challenge head-on. This post explores the journey from a single server to a globally distributed, fault-tolerant real-time infrastructure.
The Concurrency Challenge
WebSockets are persistent TCP connections. Unlike short-lived HTTP requests, a WebSocket stays open for as long as the user is online. This creates two massive problems:
- Memory: Each open socket holds kernel buffers plus per-connection application state in RAM.
- File Descriptors: Every socket consumes a file descriptor, and the OS caps how many a process can hold open (a soft limit of 1,024 is a common default, though it can be raised; the oft-cited 65k figure is really the ephemeral port range, which limits connections per client IP).
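To make the memory problem concrete, here's a rough back-of-envelope estimate. The ~20 KB per-connection figure is an assumption for illustration; real numbers depend on kernel buffer sizes and how much state your application keeps per user.

```typescript
// Rough capacity estimate for a single chat server.
// ASSUMPTION: ~20 KB per connection (kernel buffers + socket.io
// bookkeeping + per-user app state); tune for your workload.
const BYTES_PER_CONNECTION = 20 * 1024;
const TARGET_CONNECTIONS = 1_000_000;

const totalBytes = BYTES_PER_CONNECTION * TARGET_CONNECTIONS;
const totalGiB = totalBytes / (1024 ** 3);

console.log(`${TARGET_CONNECTIONS} connections ≈ ${totalGiB.toFixed(1)} GiB of RAM`);
// → "1000000 connections ≈ 19.1 GiB of RAM"
```

A million idle sockets already needs roughly 19 GiB before a single message flows, which is why fanning connections out across many nodes matters long before CPU becomes the bottleneck.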
To scale, we need a distributed system. But if User A is on Server 1 and User B is on Server 2, how do they talk?
Architecture Overview
| Feature | Before | After |
|---|---|---|
| Connection Handling | Single Monolith | Distributed Edge Nodes |
| State Sync | In-Memory Variables | Redis Pub/Sub |
| Message History | SQL Database | Cassandra / ScyllaDB (Write-Heavy) |
WebSockets vs. Server-Sent Events (SSE)
Before we jump into architecture, let's talk protocols. Everyone defaults to WebSockets, but are they always the right choice?
WebSockets:
- Pros: Full bi-directional communication. Low latency.
- Cons: Requires an HTTP upgrade handshake and stateful load balancing. Firewall and proxy issues in some corporate environments.
Server-Sent Events (SSE):
- Pros: Simple HTTP connection. Efficient for one-way (server-to-client) data. Reconnects automatically.
- Cons: Handling upstream (client-to-server) queries requires a separate POST request.
For a chat app where users are constantly typing and reading, WebSockets remain the gold standard.
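For comparison, SSE's wire format is refreshingly simple: each event is plain text terminated by a blank line. A minimal formatter illustrates it (the `formatSSE` helper is our own sketch, not a library API):

```typescript
// Build one Server-Sent Events frame. Per the SSE format, a frame is
// optional "id:"/"event:" fields plus one or more "data:" lines,
// terminated by a blank line.
function formatSSE(data: string, event?: string, id?: string): string {
  let frame = '';
  if (id) frame += `id: ${id}\n`;
  if (event) frame += `event: ${event}\n`;
  // Multi-line payloads become multiple data: lines.
  for (const line of data.split('\n')) {
    frame += `data: ${line}\n`;
  }
  return frame + '\n';
}

console.log(formatSSE('{"text":"Hello"}', 'message', '42'));
// id: 42
// event: message
// data: {"text":"Hello"}
```

On the client, the built-in `EventSource` API consumes these frames and reconnects automatically (resending the last seen `id:`); upstream messages still need a separate `fetch` POST, which is SSE's main drawback for chat.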
The Solution: Redis Pub/Sub
We use Redis as a high-speed message bus to bridge our independent websocket servers.
The Flow
- User A connects to Server 1.
- User B connects to Server 2.
- User A sends "Hello".
- Server 1 publishes the event to Redis channel `chat-room-1`.
- Server 2 (subscriber) hears the event on `chat-room-1`.
- Server 2 forwards the message to User B's open socket.
```typescript
// Publisher (Server 1)
async function sendMessage(roomId: string, message: Message) {
  // 1. Save to DB for history
  await db.messages.create(message);

  // 2. Publish to live subscribers
  await redis.publish(roomId, JSON.stringify({
    type: 'NEW_MESSAGE',
    payload: message,
  }));
}

// Subscriber (Server 2)
// Note: a Redis connection in subscribe mode can't issue other
// commands, so use a dedicated subscriber client. With node-redis v4
// the listener receives (message, channel), and socket.io's
// fetchSockets() returns a Promise.
await subscriber.subscribe('chat-room-1', async (messageStr, channel) => {
  const event = JSON.parse(messageStr);

  // Get all local clients in this room
  const localClients = await socketServer.in(channel).fetchSockets();

  // Broadcast to them
  for (const client of localClients) {
    client.emit('message', event.payload);
  }
});
```
Bottleneck Alert: Redis Pub/Sub is fire-and-forget. If a server is momentarily down, it misses the message. For critical delivery guarantees, consider Redis Streams or a persistent queue like Kafka.
Handling Offline States and Synchronization
What happens if User B loses internet for 10 seconds?
- The WebSocket disconnects.
- They miss real-time messages.
- They reconnect.
We need a Synchronization Protocol.
- Each message has a monotonically increasing `sequenceId`.
- The client remembers the `lastKnownId`.
- On reconnect, the client sends: `HELLO { lastKnownId: 105 }`.
- The server queries the DB: `SELECT * FROM messages WHERE id > 105`.
- The server sends the "gap" messages before opening the live pipe.
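The reconnect handshake above can be sketched as follows. The `ChatMessage` shape and the in-memory `history` array are stand-ins for the real message store:

```typescript
interface ChatMessage { sequenceId: number; text: string; }

// Stand-in for `SELECT * FROM messages WHERE id > ?` — in production
// this hits the message store, not an in-memory array.
function fetchGap(history: ChatMessage[], lastKnownId: number): ChatMessage[] {
  return history
    .filter((m) => m.sequenceId > lastKnownId)
    .sort((a, b) => a.sequenceId - b.sequenceId);
}

// On HELLO { lastKnownId }: replay the gap, then attach the live feed.
function handleHello(history: ChatMessage[], lastKnownId: number): ChatMessage[] {
  const gap = fetchGap(history, lastKnownId);
  // ...after sending `gap`, subscribe this socket to the live channel.
  return gap;
}

const history: ChatMessage[] = [
  { sequenceId: 104, text: 'earlier' },
  { sequenceId: 106, text: 'missed while offline' },
  { sequenceId: 107, text: 'also missed' },
];
console.log(handleHello(history, 105).map((m) => m.sequenceId)); // [106, 107]
```

The key invariant is that the server assigns `sequenceId`s monotonically per room, so "everything greater than `lastKnownId`" is exactly the set of missed messages, with no duplicates and no holes.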
Global Distribution (Edge)
To reduce latency, we deploy WebSocket/Edge servers in multiple regions (US-East, EU-West, Asia-Pacific). User A connects to the closest edge node.
However, Redis usually lives in one primary region. This introduces the "speed of light" problem. Creating a truly multi-region active-active chat system requires complex CRDTs (Conflict-free Replicated Data Types), a topic for another blog post!
Conclusion
Real-time architecture requires a shift in thinking from "request-response" to "event-driven." By decoupling connection handling from application logic and using robust message brokers, we can build systems that scale to millions of concurrent users.