How It Works: Under the Hood of the Redis Single-Threaded Engine

Understand epoll multiplexing, memory layouts, ziplists, and eviction policies inside the Redis engine.

Abstract Algorithms

TLDR: Redis achieves sub-millisecond latencies and very high throughput (on the order of hundreds of thousands of operations per second per instance, and more with pipelining) by executing all commands on a single thread. This avoids thread context switching and lock contention, relying on non-blocking I/O multiplexing (epoll) to handle thousands of concurrent client sockets.


📖 Design Challenge: Scaling Cache Concurrency without Locks

Imagine you are scaling an API gateway that handles 50,000 requests per second. The gateway's rate-limit quotas are stored in a centralized cache.

If the cache uses a traditional multithreaded locking architecture (like a synchronized shared hash map), every incoming request thread must acquire a lock on the target key's bucket before incrementing the request counter.

Under high concurrent load, this design runs into lock contention bottlenecks:

  • As thread count increases, threads spend more CPU cycles waiting in queues for locks rather than doing useful work.
  • The Operating System constantly context-switches threads in and out of CPU cores, consuming memory bandwidth and polluting CPU L1/L2 caches.
  • Thread safety bugs (like deadlocks or race conditions) emerge, causing unpredictable latencies.

To solve this, Redis takes a different approach: execute all commands on a single thread. By eliminating locks, context switches between worker threads, and synchronization overhead, Redis executes commands at in-memory speed.

However, this introduces a new challenge: how can a single thread process requests from 10,000 concurrent client connections without blocking? The answer lies in I/O multiplexing.


🔍 Core Architecture: The Single-Threaded Event Loop

The Redis engine is built around a non-blocking event loop. Instead of allocating a thread per client connection, Redis delegates socket monitoring to the Operating System kernel.

The architecture is composed of four main components:

  1. Socket Connections: Client TCP sockets sending commands (e.g., GET, SET).
  2. I/O Multiplexer (Epoll/Kqueue): A kernel-level system utility that monitors thousands of sockets concurrently and returns only the sockets that have pending data to read.
  3. Event Demultiplexer: Translates raw socket events into execution tasks (Read, Write, Close).
  4. File Event Handler: The single-threaded execution loop that processes commands sequentially.

The diagram below shows the component architecture:

graph LR
    Sockets[Client Sockets 1-N] -->|Send Data| Multiplexer[I/O Multiplexer - epoll]
    Multiplexer -->|Ready Sockets| Queue[Event Queue]
    Queue -->|Process Sequentially| Loop[Single-Threaded Event Loop]
    Loop -->|Query/Write| Memory[In-Memory Data Store]

This system diagram illustrates the Redis command processing chain. Multiple client sockets send concurrent network packets. The I/O multiplexer (epoll) monitors these connections at the OS kernel level, queueing only the ready sockets. The single-threaded event loop pulls events from the queue one-by-one and executes them against the in-memory data store, avoiding any concurrency lock overhead.


⚙️ Core Mechanics: Epoll, Multiplexing, and Data Structures

The core execution path of Redis is driven by the ae event library, which wraps OS-specific multiplexing system calls.

1. Sockets Multiplexing with Epoll

On Linux systems, Redis uses the epoll family of system calls. The event loop registers client socket file descriptors (FDs) with epoll_ctl.

When the loop runs, it invokes epoll_wait, which blocks until at least one registered socket is ready or a timeout expires. The kernel then wakes the thread and returns the list of ready FDs, so the thread never has to busy-poll idle connections.
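The register-then-wait cycle described above can be sketched in a few lines. This is a minimal illustration, not Redis's ae library: it uses Python's standard selectors module (which wraps epoll on Linux), and the whitespace-separated "SET k v" / "GET k" protocol and the serve_once name are assumptions made for the example.

```python
import selectors
import socket


def serve_once(sel: selectors.BaseSelector, store: dict) -> int:
    """Run one event-loop iteration and return how many commands were handled."""
    handled = 0
    # select() blocks until a registered socket is readable or the timeout
    # expires; on Linux DefaultSelector is backed by epoll.
    for key, _events in sel.select(timeout=1.0):
        conn = key.fileobj
        data = conn.recv(4096)
        if not data:  # peer closed the connection
            sel.unregister(conn)
            conn.close()
            continue
        # Toy protocol (an assumption for this sketch, not RESP).
        parts = data.decode().split()
        if len(parts) == 3 and parts[0] == "SET":
            store[parts[1]] = parts[2]
            reply = "OK"
        elif len(parts) == 2 and parts[0] == "GET":
            reply = store.get(parts[1], "(nil)")
        else:
            reply = "ERR"
        conn.sendall(reply.encode())
        handled += 1
    return handled


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    client, server_side = socket.socketpair()  # stands in for a TCP client
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)
    store = {}
    client.sendall(b"SET user:1 alice")
    serve_once(sel, store)
    print(client.recv(4096), store)
```

Every ready socket is handled in turn on the same thread, which is why no lock is needed around the store dictionary.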

2. Specialized Memory Layouts

To keep memory access fast, Redis uses custom data layouts that minimize overhead:

  • Dict: A hash table implementation with incremental rehashing. When the table grows, Redis migrates buckets to the new table step by step during normal operations on the dict, avoiding one long blocking pause.
  • Ziplist: A compact, contiguous byte array used for small lists, hashes, and sorted sets (replaced by the similar listpack encoding in Redis 7.0). It eliminates per-element pointer overhead and keeps data contiguous in CPU caches.
  • Skiplist: A probabilistic alternative to balanced trees, used alongside hash tables to implement Sorted Sets (zset). It allows sorted element queries in $O(\log N)$ time.
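To make incremental rehashing concrete, here is a toy sketch under simplifying assumptions: the IncrementalDict class, its growth trigger (load factor of 1), and the one-bucket-per-operation step are illustrative choices, not the actual dict.c logic.

```python
class IncrementalDict:
    """Toy hash table that migrates one bucket per operation while growing."""

    def __init__(self, size: int = 4):
        self.old = [[] for _ in range(size)]  # table being drained
        self.new = None                       # table being filled, or None
        self.rehash_idx = 0                   # next old bucket to migrate
        self.count = 0

    def _rehash_step(self):
        """Move a single bucket from the old table to the new one."""
        if self.new is None:
            return
        for k, v in self.old[self.rehash_idx]:
            self.new[hash(k) % len(self.new)].append((k, v))
        self.old[self.rehash_idx] = []
        self.rehash_idx += 1
        if self.rehash_idx == len(self.old):  # migration finished
            self.old, self.new, self.rehash_idx = self.new, None, 0

    def _find(self, key):
        # During a rehash a key may live in either table, so check both.
        tables = [self.old] if self.new is None else [self.old, self.new]
        for table in tables:
            bucket = table[hash(key) % len(table)]
            for i, (k, _v) in enumerate(bucket):
                if k == key:
                    return bucket, i
        return None, -1

    def set(self, key, value):
        self._rehash_step()
        bucket, i = self._find(key)
        if bucket is not None:
            bucket[i] = (key, value)
            return
        if self.new is None and self.count >= len(self.old):
            # Start growing: allocate the bigger table but do not copy yet.
            self.new = [[] for _ in range(len(self.old) * 2)]
            self.rehash_idx = 0
        target = self.new if self.new is not None else self.old
        target[hash(key) % len(target)].append((key, value))
        self.count += 1

    def get(self, key, default=None):
        self._rehash_step()
        bucket, i = self._find(key)
        return bucket[i][1] if bucket is not None else default
```

The cost of growing the table is spread across many get and set calls, so no single command pays for copying every bucket at once.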

📊 Architectural Blueprint: End-to-End Command Processing

To trace how a command is processed, the sequence diagram below maps a request from socket arrival to client response:

sequenceDiagram
    participant Client
    participant OS as OS Kernel (epoll)
    participant Loop as Redis Event Loop
    participant DB as In-Memory DB

    Client->>OS: Send "SET key value" over TCP
    Note over OS: Mark socket FD as readable
    Loop->>OS: Call epoll_wait()
    OS-->>Loop: Return active socket FD list
    Loop->>Loop: Read bytes from socket buffer
    Loop->>Loop: Parse command protocols (RESP)
    Loop->>DB: Execute write on key
    DB-->>Loop: Acknowledge write
    Loop->>Client: Send "OK" response

This sequence diagram traces the lifecycle of a Redis command. The client writes command bytes to a TCP socket. The OS kernel marks the file descriptor as readable. The Redis event loop detects the active FD using epoll_wait(), reads the socket buffer, parses the RESP protocol, executes the update in-memory, and writes the response back to the client socket.


🧠 Deep Dive: Ziplists, Skiplists, and Eviction Logic

To scale Redis memory efficiency, we must analyze the internal layouts of its structures.

The Internals of Redis Memory Allocation, RESP, and Defragmentation

Every Redis object is wrapped in a robj structure, which records the type, the encoding, and a pointer to the underlying data. To reduce per-element overhead and allocator fragmentation (Redis uses jemalloc by default on Linux), small collections are stored as ziplists. A ziplist stores each element as contiguous bytes containing the length of the previous entry, the encoding and length of the current entry, and the payload. Because it is a single contiguous block of memory, it avoids the pointer overhead of standard linked lists (which require 24 bytes per node for pointers on 64-bit systems). If a ziplist exceeds its configured size bounds (for hashes, the defaults are 512 entries or 64 bytes per value), Redis automatically converts it to a general-purpose encoding such as a hashtable or skiplist.
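The entry layout (previous length, current length, payload) can be illustrated with a deliberately simplified sketch. The one-byte length fields, the 4-byte tail-offset header, and the function names below are assumptions for readability; the real ziplist uses variable-width fields and a richer header.

```python
import struct

HEADER = struct.Struct("<I")  # simplified header: offset of the last entry


def pack_entries(values):
    """Pack byte strings into one contiguous buffer of [prevlen][len][payload]."""
    body = bytearray()
    prev_len = 0
    tail = 0
    for v in values:
        if len(v) > 253:
            raise ValueError("this sketch only supports payloads up to 253 bytes")
        tail = len(body)                      # where this entry starts
        body += bytes([prev_len, len(v)]) + v
        prev_len = 2 + len(v)                 # total size of the entry just written
    return HEADER.pack(tail) + bytes(body)


def iter_forward(buf):
    """Walk entries head to tail using each entry's own length."""
    pos = HEADER.size
    while pos < len(buf):
        n = buf[pos + 1]
        yield buf[pos + 2:pos + 2 + n]
        pos += 2 + n


def iter_backward(buf):
    """Walk entries tail to head using the stored previous-entry lengths."""
    if len(buf) == HEADER.size:
        return
    (tail,) = HEADER.unpack_from(buf, 0)
    pos = HEADER.size + tail
    while True:
        prev_len, n = buf[pos], buf[pos + 1]
        yield buf[pos + 2:pos + 2 + n]
        if prev_len == 0:                     # first entry has no predecessor
            break
        pos -= prev_len
```

Three short strings occupy a single 16-byte buffer here, with no per-node pointers, and the previous-length field is what makes reverse traversal possible without them.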

Redis parses incoming bytes using the REdis Serialization Protocol (RESP). RESP encodes simple strings, errors, integers, bulk strings, and arrays using one-character type prefixes (such as + for simple strings, $ for bulk strings, and * for arrays), with each part terminated by CRLF. Because the protocol is human-readable yet simple, the parser consumes very little CPU.
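As an illustration of how little work the prefixes require, the following sketch encodes a command as a RESP array of bulk strings and parses RESP2 values back. The function and class names are hypothetical, and the parser assumes the whole message is already in the buffer, which a real server cannot assume.

```python
class RespError(Exception):
    """Represents a RESP error reply (prefix '-')."""


def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode()
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def parse(buf: bytes, pos: int = 0):
    """Parse one RESP2 value starting at pos; return (value, next_pos)."""
    end = buf.index(b"\r\n", pos)
    prefix, line = buf[pos:pos + 1], buf[pos + 1:end]
    nxt = end + 2
    if prefix == b"+":                        # simple string
        return line.decode(), nxt
    if prefix == b"-":                        # error
        return RespError(line.decode()), nxt
    if prefix == b":":                        # integer
        return int(line), nxt
    if prefix == b"$":                        # bulk string, -1 means null
        n = int(line)
        if n == -1:
            return None, nxt
        return buf[nxt:nxt + n], nxt + n + 2
    if prefix == b"*":                        # array, -1 means null
        n = int(line)
        if n == -1:
            return None, nxt
        items = []
        for _ in range(n):
            item, nxt = parse(buf, nxt)
            items.append(item)
        return items, nxt
    raise ValueError(f"unknown RESP prefix: {prefix!r}")
```

Calling encode_command("SET", "key", "value") yields the bytes *3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n, and parse turns them back into a list of three byte strings.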

Memory fragmentation is a significant challenge when keys are updated with varying sizes. Because jemalloc serves allocations from fixed size classes, deleting and resizing keys leaves holes in partially used pages. To counter this, Redis implements Active Defragmentation. It scans the keyspace, moves fragmented objects into newly allocated blocks, updates the pointers, and lets the allocator release the emptied pages back to the OS.

Performance Analysis of Multi-Threading in Redis 6+

A common interview question asks: "Is Redis multithreaded now?" The answer is nuanced. Since Redis 6, the engine supports optional I/O threads (enabled through the io-threads setting) that parallelize network reads and writes.

When a socket has data, the main thread assigns the socket to an I/O thread. The I/O thread reads the raw TCP buffer and parses the RESP command.

However, the actual execution of the command against the in-memory database remains strictly single-threaded on the main thread. Once computed, the response serialisation is delegated back to the I/O threads to be written to client sockets, combining single-threaded transaction safety with multi-threaded I/O scaling.
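The three phases (parallel read and parse, single-threaded execution, parallel write) can be modeled with a thread pool. This is a conceptual sketch only: the whitespace protocol, the function names, and the use of ThreadPoolExecutor are assumptions for illustration and do not mirror how Redis hands sockets to its I/O threads.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_request(raw: bytes) -> list:
    """I/O-thread work: decode and tokenize (toy protocol, not RESP)."""
    return raw.decode().split()


def execute(store: dict, cmd: list) -> str:
    """Main-thread work: the only code that touches the data store."""
    name, *args = cmd
    if name == "SET" and len(args) == 2:
        store[args[0]] = args[1]
        return "OK"
    if name == "GET" and len(args) == 1:
        return store.get(args[0], "(nil)")
    if name == "INCR" and len(args) == 1:
        store[args[0]] = str(int(store.get(args[0], "0")) + 1)
        return store[args[0]]
    return "ERR unknown command"


def serialize(reply: str) -> bytes:
    """I/O-thread work: turn a reply into bytes for the socket."""
    return (reply + "\r\n").encode()


def process_batch(pool: ThreadPoolExecutor, store: dict, raw_requests: list) -> list:
    commands = list(pool.map(parse_request, raw_requests))  # parallel phase
    replies = [execute(store, c) for c in commands]         # sequential phase
    return list(pool.map(serialize, replies))               # parallel phase
```

Because execute runs only on the calling thread and pool.map preserves order, the store needs no lock even though parsing and serialization happen on worker threads.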


🌍 Real-World Implementation: Caching at Stripe

Stripe utilizes large Redis clusters to manage rate limiting for millions of API calls. Since rate limit counters require atomic updates, Stripe uses Redis Lua Scripts:

  • Because Redis runs commands sequentially on a single thread, any Lua script is executed atomically.
  • No other client commands can execute while a Lua script is running, ensuring counter checks and updates complete without race conditions.
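A fixed-window counter is one common way to use this atomicity. The sketch below is a generic illustration, not Stripe's actual script: the Lua body, the allow_request helper, and the key name are assumptions. The client argument is expected to expose eval(script, numkeys, *keys_and_args), which is the shape of redis-py's eval method.

```python
# Lua runs inside Redis, so INCR and EXPIRE execute as one atomic unit.
FIXED_WINDOW_LUA = """
local current = redis.call('INCR', KEYS[1])
if current == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[1])
end
return current
"""


def allow_request(client, key: str, limit: int, window_seconds: int) -> bool:
    """Return True while the caller is within `limit` requests per window.

    `client` is any object with eval(script, numkeys, *keys_and_args),
    for example a redis-py client (an assumption of this sketch).
    """
    current = client.eval(FIXED_WINDOW_LUA, 1, key, window_seconds)
    return int(current) <= limit
```

Without the script, a client could crash between INCR and EXPIRE and leave a counter that never expires. Running both inside one script closes that gap.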

⚖️ Trade-offs and Failure Modes: Single-Thread Blockage and Slow Queries

The single-threaded design introduces unique failure modes:

  • The Blocking Command Trap: If a query runs in $O(N)$ time (e.g., KEYS * or HGETALL on a hash with 10 million elements), the single thread will block for seconds. All other client requests queue up, leading to connection timeouts across your microservices.
  • No Multi-Core Utilization: Redis cannot scale beyond a single CPU core for command processing. To utilize multi-core servers, you must run multiple Redis instances on different ports (sharding).
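When several instances run on one host, the client has to decide which instance owns each key. A minimal sketch of client-side routing follows. The instance addresses and the pick_instance helper are hypothetical, and CRC32 modulo the instance count is a simplification (Redis Cluster itself maps keys to 16384 hash slots using CRC16).

```python
import zlib


def pick_instance(key: str, instances: list) -> str:
    """Map a key to one instance deterministically (simple modulo sharding)."""
    return instances[zlib.crc32(key.encode()) % len(instances)]


# Hypothetical instances, one per CPU core, listening on different ports.
INSTANCES = ["127.0.0.1:6379", "127.0.0.1:6380", "127.0.0.1:6381", "127.0.0.1:6382"]
```

The same key always routes to the same instance, but changing the number of instances remaps most keys, which is one reason production setups tend to prefer hash slots or consistent hashing.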

🧭 Decision Guide: Redis vs. Memcached vs. KeyDB

Use this guide to choose the right in-memory store for your architecture.

| Feature | Redis | Memcached | KeyDB |
| --- | --- | --- | --- |
| Concurrency Model | Single-threaded event loop | Multithreaded with lock boundaries | Multithreaded architecture (shares event loop) |
| Data Structures | Rich (Lists, Sets, Hashes, Sorted Sets) | Simple (Strings / Key-Value only) | Rich (Redis compatible) |
| Atomic Operations | Yes (Lua Scripts, Transactions) | Limited (incr/decr, CAS) | Yes |
| Horizontal Scale | Redis Cluster (Sharding) | Client-side hashing | Multi-Master active replication |

🧪 Practical Implementation: Tuning redis.conf for Production

To configure Redis eviction policies and persistence in production:

# Max memory constraint (e.g., 2 GB)
maxmemory 2gb

# Eviction strategy: Evict any key using approximated LRU
maxmemory-policy allkeys-lru

# Approximated LRU sample size (higher = more accurate, but slightly more CPU)
maxmemory-samples 5

With this configuration, once memory usage reaches 2 GB, Redis evicts approximately least recently used keys to make room for new writes, which reduces the risk of the OS terminating the Redis process for running out of memory.
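The effect of maxmemory-samples can be sketched as follows: rather than tracking a global LRU order, sample a handful of keys and evict the one that has been idle longest. The evict_one function and the last_access dictionary are illustrative assumptions, and Redis's real eviction code differs in its details.

```python
import random


def evict_one(last_access: dict, samples: int = 5, rng=random) -> str:
    """Sample up to `samples` keys and evict the least recently used among them.

    `last_access` maps each key to the timestamp of its last access.
    """
    candidates = rng.sample(list(last_access), min(samples, len(last_access)))
    victim = min(candidates, key=lambda k: last_access[k])
    del last_access[victim]
    return victim
```

With a sample size at least as large as the keyspace this degenerates to exact LRU. Smaller samples trade accuracy for constant work per eviction, which is the trade-off the maxmemory-samples comment describes.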


📚 Lessons Learned: Production Redis Anti-Patterns

Avoid these common Redis mistakes in production:

  1. Running KEYS *: Never run KEYS * in production. It scans the entire keyspace, blocking the single thread. Use SCAN instead, which iterates through keys incrementally without blocking the loop.
  2. Unbounded Keyspaces: Always set a Time-To-Live (TTL) on caching keys. Without a TTL, temporary data will accumulate, eventually triggering memory evictions on critical static data.
  3. Large Payloads: Do not store large objects (e.g., 50 MB PDFs) in Redis. Network serialization overhead blocks the event loop, degrading QPS for all other clients.
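For the first anti-pattern, the SCAN replacement is a cursor loop. The sketch below assumes a client whose scan(cursor=..., match=..., count=...) method returns a (next_cursor, keys) pair, which is the shape redis-py exposes. The iter_keys name is hypothetical.

```python
def iter_keys(client, pattern: str = "*", count: int = 100):
    """Yield keys matching `pattern` one SCAN page at a time.

    Each call to client.scan does a bounded amount of work on the server,
    so other clients' commands can run between pages.
    """
    cursor = 0
    while True:
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:  # the server signals completion with cursor 0
            break
```

SCAN may return the same key more than once if the keyspace changes during iteration, so callers that need uniqueness should deduplicate the results.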

📌 Summary: The Redis Engine Rulebook

  • Single Thread: Avoids lock contention and context switching, executing commands at memory speeds.
  • Epoll Multiplexing: Leverages the OS kernel to monitor thousands of active sockets without resource exhaustion.
  • Atomic execution: Guarantees atomic operations naturally since no other command can run concurrently.
  • Ziplists: Compressed data layouts reduce memory usage and optimize CPU cache access.
  • Avoid $O(N)$ Operations: Never run commands like KEYS * on large keyspaces, since they block the single execution thread.
