1 - ARM support

Exploring Redis on the ARM CPU Architecture

Redis versions 4.0 and above support the ARM processor in general, and the Raspberry Pi specifically, as a main platform. Every new release of Redis is tested on the Pi environment, and we update this documentation page with information about supported devices and other useful information. While Redis does run on Android, in the future we look forward to extending our testing efforts to Android to also make it an officially supported platform.

We believe that Redis is ideal for IoT and embedded devices for several reasons:

  • Redis has a very small memory footprint and modest CPU requirements. It can run on small devices like the Raspberry Pi Zero without impacting overall system performance, using little memory while delivering good performance for many use cases.
  • The data structures of Redis are often an ideal way to model IoT/embedded use cases. Some examples include accumulating time series data, receiving or queuing commands to execute, queuing responses to send back to remote servers, and so forth.
  • Modeling data inside Redis can be very useful in order to make in-device decisions for appliances that must respond very quickly or when the remote servers are offline.
  • Redis can be used as a communication system between the processes running on the device.
  • The append-only file storage of Redis is well suited for SD cards.
  • The stream data structure included in Redis versions 5.0 and higher was specifically designed for time series applications and has a very low memory overhead.

Redis /proc/cpu/alignment requirements

Linux on ARM allows the kernel to trap unaligned accesses and fix them up, so that the offending program can continue executing instead of receiving a SIGBUS. Redis 4.0 and later avoid unaligned accesses entirely, so this kernel setting does not need any specific value. Even when kernel alignment fixing is disabled, Redis should run as expected.
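To illustrate the kind of change involved (this is a general technique, not an excerpt from Redis), the sketch below reads and writes a 32-bit integer at an arbitrary offset in a byte buffer with memcpy instead of dereferencing a cast pointer. The helpers read_u32 and write_u32 are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: read a 32-bit value stored at any byte offset.
 * Casting (uint32_t *)(buf + off) may produce an unaligned load, which
 * can trap on some ARM configurations; memcpy lets the compiler choose
 * an access sequence that is safe for the target CPU. The value is
 * returned in host byte order, exactly as it was stored. */
uint32_t read_u32(const unsigned char *buf, size_t off) {
    uint32_t v;
    memcpy(&v, buf + off, sizeof v);
    return v;
}

/* Matching writer for the same (possibly unaligned) location. */
void write_u32(unsigned char *buf, size_t off, uint32_t v) {
    memcpy(buf + off, &v, sizeof v);
}
```

On CPUs that handle unaligned loads natively, an optimizing compiler usually turns these memcpy calls into a single load or store, so the portable form costs little.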

Building Redis on the Pi

  • Download Redis version 4.0 or higher.
  • Use make as usual to create the executable.

There is nothing special about the process. The only difference is that, by default, Redis uses the libc allocator instead of jemalloc, which is the default in other Linux-based environments. This is because we believe that for the small use cases inside embedded devices, memory fragmentation is unlikely to be a problem. Moreover, jemalloc on ARM may not be as well tested as the libc allocator.

Performance

Performance testing of Redis was performed on the Raspberry Pi 3 and Pi 1 model B. The difference between the two Pis in terms of delivered performance is quite big. The benchmarks were performed via the loopback interface, since most use cases will probably use Redis from within the device and not via the network. The following numbers were obtained using Redis 4.0.

Raspberry Pi 3:

  • Test 1: 5 million writes with 1 million keys (even distribution among keys). No persistence, no pipelining. 28,000 ops/sec.
  • Test 2: Like test 1 but with pipelining using groups of 8 operations: 80,000 ops/sec.
  • Test 3: Like test 1 but with AOF enabled, fsync 1 sec: 23,000 ops/sec.
  • Test 4: Like test 3, but with an AOF rewrite in progress: 21,000 ops/sec.

Raspberry Pi 1 model B:

  • Test 1: 5 million writes with 1 million keys (even distribution among keys). No persistence, no pipelining. 2,200 ops/sec.
  • Test 2: Like test 1 but with pipelining using groups of 8 operations: 8,500 ops/sec.
  • Test 3: Like test 1 but with AOF enabled, fsync 1 sec: 1,820 ops/sec.
  • Test 4: Like test 3, but with an AOF rewrite in progress: 1,000 ops/sec.

The benchmarks above refer to simple SET/GET operations. Performance is similar for all of Redis's fast operations (those not running in linear time), although sorted sets may show slightly lower numbers.

2 - Redis client handling

How the Redis server manages client connections

This document provides information about how Redis handles clients at the network layer level: connections, timeouts, buffers, and other similar topics are covered here.

The information contained in this document is only applicable to Redis version 2.6 or greater.

Accepting Client Connections

Redis accepts client connections on the configured TCP port and, if enabled, on the Unix socket. When a new client connection is accepted, the following operations are performed:

  • The client socket is put in the non-blocking state since Redis uses multiplexing and non-blocking I/O.
  • The TCP_NODELAY option is set in order to ensure that there are no delays to the connection.
  • A readable file event is created so that Redis is able to collect the client queries as soon as new data is available to read on the socket.
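A minimal sketch of the first two steps using POSIX calls, assuming an already accepted socket. prepare_client_socket is a hypothetical name, and registering the readable event with an event loop is left out. TCP_NODELAY is only applied to TCP sockets, since it has no meaning for a Unix socket.

```c
#include <fcntl.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Hypothetical helper: apply the per-connection settings listed above
 * to an accepted client socket. Returns 0 on success, -1 on error. */
int prepare_client_socket(int fd, int is_tcp) {
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1) return -1;
    /* Non-blocking mode: one thread multiplexes many sockets. */
    if (fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) return -1;
    if (is_tcp) {
        int yes = 1;
        /* Disable Nagle's algorithm so small replies are sent at once. */
        if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &yes, sizeof yes) == -1)
            return -1;
    }
    return 0;
}
```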

After the client is initialized, Redis checks if it is already at the limit configured for the number of simultaneous clients (configured using the maxclients configuration directive, see the next section of this document for further information).

When Redis can't accept a new client connection because the maximum number of clients has been reached, it tries to send an error to the client in order to make it aware of this condition, closing the connection immediately. The error message will reach the client even if the connection is closed immediately by Redis because the new socket output buffer is usually big enough to contain the error, so the kernel will handle transmission of the error.
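The reject path might look like the following sketch. The function name and the exact error text are assumptions for illustration; the point is that the error is written once into the kernel send buffer and the socket is closed immediately afterwards.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical sketch: reject a freshly accepted connection when the
 * server is already at its client limit. Returns 1 if the client was
 * rejected (and its fd closed), 0 if it may be served. */
int reject_if_full(int fd, unsigned long connected, unsigned long maxclients) {
    static const char err[] = "-ERR max number of clients reached\r\n";
    if (connected < maxclients) return 0;
    /* Best effort: a short or failed write is ignored because the
     * connection is dropped anyway. Bytes already queued in the kernel
     * send buffer are still delivered after close(). */
    ssize_t n = write(fd, err, strlen(err));
    (void)n;
    close(fd);
    return 1;
}
```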

What Order are Client Requests Served In?

The order is determined by a combination of the client socket file descriptor number and the order in which the kernel reports events, so it should be considered unspecified.

However, Redis does the following two things when serving clients:

  • It only performs a single read() system call every time there is something new to read from the client socket. This ensures that if we have multiple clients connected, and a few send queries at a high rate, other clients are not penalized and will not experience latency issues.
  • However, once new data is read from a client, all the queries contained in the current buffers are processed sequentially. This improves locality and avoids iterating a second time to see whether any clients still need processing time.
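The two rules can be sketched as a read handler that issues exactly one read() per readable event and then drains every complete command already in the buffer. This is a simplified illustration, assuming newline-terminated commands instead of the real Redis protocol; the struct client layout and the on_readable name are hypothetical.

```c
#include <string.h>
#include <unistd.h>

#define QBUF_SIZE 1024

/* Hypothetical per-client state; real Redis clients hold far more. */
struct client {
    int fd;
    char qbuf[QBUF_SIZE];
    size_t qlen;
    unsigned long processed; /* commands executed so far */
};

/* Called when the event loop reports fd as readable. Exactly one read()
 * is issued, then every complete command already buffered is handled.
 * Returns bytes read, 0 on EOF, -1 on error. */
ssize_t on_readable(struct client *c) {
    ssize_t n = read(c->fd, c->qbuf + c->qlen, QBUF_SIZE - c->qlen);
    if (n <= 0) return n;
    c->qlen += (size_t)n;

    size_t start = 0;
    char *nl;
    while ((nl = memchr(c->qbuf + start, '\n', c->qlen - start)) != NULL) {
        /* "Execute" the command in qbuf[start .. nl). */
        c->processed++;
        start = (size_t)(nl - c->qbuf) + 1;
    }
    /* Keep any incomplete trailing command for the next read. */
    memmove(c->qbuf, c->qbuf + start, c->qlen - start);
    c->qlen -= start;
    return n;
}
```

The sketch does not grow the query buffer: if it fills up, read() is called with a zero length and returns 0, which this code would misreport as EOF.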

Maximum Concurrent Connected Clients

In Redis 2.4 there was a hard-coded limit for the maximum number of clients that could be handled simultaneously.

In Redis 2.6 and newer, this limit is dynamic: by default it is set to 10000 clients, unless otherwise stated by the maxclients directive in redis.conf.

However, Redis asks the kernel for the maximum number of file descriptors the process is allowed to open (the soft limit is checked). If that limit is less than the maximum number of clients we want to handle plus 32 (the number of file descriptors Redis reserves for internal use), then the maximum number of clients is lowered to the number Redis can actually handle under the current operating system limit.
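The adjustment can be sketched with getrlimit(), assuming the 32 reserved descriptors mentioned above. The function names are hypothetical, and the step where Redis first tries to raise the limit (as the log message below suggests) is omitted.

```c
#include <sys/resource.h>

#define RESERVED_FDS 32 /* descriptors kept for internal use, per the text */

/* Pure helper: given the configured maxclients and the process soft
 * limit on open files, return how many clients can actually be served. */
unsigned long effective_maxclients(unsigned long configured, unsigned long soft_limit) {
    if (soft_limit >= configured + RESERVED_FDS) return configured;
    if (soft_limit <= RESERVED_FDS) return 0;
    return soft_limit - RESERVED_FDS;
}

/* Query the current soft RLIMIT_NOFILE; returns 0 on error. */
unsigned long current_soft_nofile(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) == -1) return 0;
    if (rl.rlim_cur == RLIM_INFINITY) return (unsigned long)-1;
    return (unsigned long)rl.rlim_cur;
}
```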

When maxclients is set to a number greater than Redis can support, a message is logged at startup:

$ ./redis-server --maxclients 100000
[41422] 23 Jan 11:28:33.179 # Unable to set the max number of files limit to 100032 (Invalid argument), setting the max clients configuration to 10112.

When Redis is configured to handle a specific number of clients, it is a good idea to make sure that the operating system limit for the maximum number of file descriptors per process is set accordingly.

Under Linux these limits can be set both in the current session and as a system-wide setting with the following commands:

  • ulimit -Sn 100000 # This will only work if hard limit is big enough.
  • sysctl -w fs.file-max=100000

Output Buffer Limits

Redis needs to handle a variable-length output buffer for every client, since a command can produce a large amount of data that needs to be transferred to the client.

However, a client may also send commands that produce output faster than Redis can send the existing output to it. This is especially common with Pub/Sub clients, when a client is not able to process new messages fast enough.

Both conditions cause the client output buffer to grow and consume more and more memory. For this reason, by default Redis sets limits on the output buffer size for different kinds of clients. When a limit is reached, the client connection is closed and the event is logged in the Redis log file.

There are two kinds of limits Redis uses:

  • The hard limit is a fixed limit that when reached will make Redis close the client connection as soon as possible.
  • The soft limit instead depends on time. For instance, a soft limit of 32 megabytes per 10 seconds means that if the client's output buffer stays bigger than 32 megabytes continuously for 10 seconds, the connection gets closed.
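One way to express both rules is sketched below, with the Pub/Sub defaults listed next used as test values. The struct and function names are hypothetical, and the sketch only shows the decision, not where a server would call it from.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical limit record; 0 disables a limit, as with normal clients. */
struct obuf_limit {
    size_t hard_bytes;
    size_t soft_bytes;
    time_t soft_seconds;
};

/* Per-client bookkeeping: when the soft limit was first exceeded (0 = not). */
struct obuf_state {
    time_t soft_reached_at;
};

/* Returns 1 if the client should be disconnected given its current
 * output buffer size `used` at time `now`, 0 otherwise. */
int obuf_limit_reached(const struct obuf_limit *lim, struct obuf_state *st,
                       size_t used, time_t now) {
    if (lim->hard_bytes && used >= lim->hard_bytes) return 1;
    if (lim->soft_bytes && used >= lim->soft_bytes) {
        if (st->soft_reached_at == 0) {
            st->soft_reached_at = now; /* start the soft-limit timer */
            return 0;
        }
        /* Disconnect only once the condition has lasted long enough. */
        return now - st->soft_reached_at > lim->soft_seconds;
    }
    st->soft_reached_at = 0; /* back under the soft limit: reset timer */
    return 0;
}
```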

Different kinds of clients have different default limits:

  • Normal clients have a default limit of 0, meaning no limit at all, because most normal clients use blocking implementations that send a single command and wait for the reply to be completely read before sending the next one, so closing the connection of a normal client is never desirable.
  • Pub/Sub clients have a default hard limit of 32 megabytes and a soft limit of 8 megabytes per 60 seconds.
  • Replicas have a default hard limit of 256 megabytes and a soft limit of 64 megabytes per 60 seconds.

It is possible to change the limit at runtime using the CONFIG SET command or in a permanent way using the Redis configuration file redis.conf. See the example redis.conf in the Redis distribution for more information about how to set the limit.

Query Buffer Hard Limit

Every client is also subject to a query buffer limit. This is a non-configurable hard limit that will close the connection when the client query buffer (that is the buffer we use to accumulate commands from the client) reaches 1 GB, and is actually only an extreme limit to avoid a server crash in case of client or server software bugs.

Client Eviction

Redis is built to handle a very large number of client connections. Client connections tend to consume memory, and when there are many of them, the aggregate memory consumption can be extremely high, leading to data eviction or out-of-memory errors. These cases can be mitigated to an extent using output buffer limits, but Redis also provides a more robust configuration to limit the aggregate memory used by all client connections.

This mechanism is called client eviction, and it's essentially a safety mechanism that will disconnect clients once the aggregate memory usage of all clients is above a threshold. The mechanism first attempts to disconnect clients that use the most memory. It disconnects the minimal number of clients needed to return below the maxmemory-clients threshold.

maxmemory-clients defines the maximum aggregate memory usage of all clients connected to Redis. The aggregation takes into account all the memory used by the client connections: the query buffer, the output buffer, and other intermediate buffers.

Note that replica and master connections aren't affected by the client eviction mechanism. Therefore, such connections are never evicted.
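The selection rule can be sketched as a greedy pass that evicts the largest eligible clients first until the total drops to the threshold; evicting the largest first is what keeps the number of disconnected clients minimal. The struct conn fields and the evict_clients name are hypothetical. This sketch counts every connection toward the total, which is an assumption rather than a statement about how Redis accounts for excluded connections. The no_evict flag stands for the CLIENT NO-EVICT setting.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical view of a connection for this sketch. */
struct conn {
    size_t mem;         /* query + output + other buffers */
    int is_replication; /* replica or master link: never evicted */
    int no_evict;       /* set by CLIENT NO-EVICT on */
    int evicted;
};

/* qsort comparator: larger memory usage sorts first. */
static int by_mem_desc(const void *a, const void *b) {
    const struct conn *x = *(struct conn *const *)a;
    const struct conn *y = *(struct conn *const *)b;
    return (x->mem < y->mem) - (x->mem > y->mem);
}

/* Evict the largest eligible clients until the aggregate memory is at or
 * below `limit` (0 = no limit). Returns the number of clients evicted. */
size_t evict_clients(struct conn *cs, size_t n, size_t limit) {
    size_t total = 0, k = 0, evicted = 0;
    if (limit == 0) return 0;
    for (size_t i = 0; i < n; i++) total += cs[i].mem;
    if (total <= limit) return 0;

    struct conn **cand = malloc(n * sizeof *cand);
    if (!cand) return 0;
    for (size_t i = 0; i < n; i++)
        if (!cs[i].is_replication && !cs[i].no_evict) cand[k++] = &cs[i];
    qsort(cand, k, sizeof *cand, by_mem_desc);

    for (size_t i = 0; i < k && total > limit; i++) {
        cand[i]->evicted = 1;
        total -= cand[i]->mem;
        evicted++;
    }
    free(cand);
    return evicted;
}
```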

maxmemory-clients can be set permanently in the configuration file (redis.conf) or via the CONFIG SET command. This setting can either be 0 (meaning no limit), a size in bytes (possibly with mb/gb suffix), or a percentage of maxmemory by using the % suffix (e.g. setting it to 10% would mean 10% of the maxmemory configuration).
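The three accepted forms can be illustrated with a small parser. This is a hypothetical helper, not Redis's own configuration code; it follows the redis.conf convention that the kb, mb and gb suffixes are powers of 1024 (case-insensitive) and leaves out other unit forms.

```c
#include <stdlib.h>
#include <strings.h>

/* Hypothetical parser for maxmemory-clients values: "0", a byte count
 * with an optional kb/mb/gb suffix, or a percentage of maxmemory such as
 * "10%". Returns the limit in bytes, or -1 on a malformed value. */
long long maxmemory_clients_bytes(const char *v, long long maxmemory) {
    char *end;
    long long n = strtoll(v, &end, 10);
    if (end == v || n < 0) return -1;        /* no digits, or negative */
    if (*end == '\0') return n;              /* plain bytes (0 = no limit) */
    if (end[0] == '%' && end[1] == '\0')     /* percentage of maxmemory */
        return maxmemory * n / 100;
    if (strcasecmp(end, "kb") == 0) return n * 1024LL;
    if (strcasecmp(end, "mb") == 0) return n * 1024LL * 1024;
    if (strcasecmp(end, "gb") == 0) return n * 1024LL * 1024 * 1024;
    return -1;                               /* unknown suffix */
}
```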

The default setting is 0, meaning client eviction is turned off by default. However, for any large production deployment, it is highly recommended to configure a non-zero maxmemory-clients value. A value of 5%, for example, can be a good place to start.

It is possible to flag a specific client connection to be excluded from the client eviction mechanism. This is useful for control path connections. If, for example, you have an application that monitors the server via the INFO command and alerts you in case of a problem, you might want to make sure this connection isn't evicted. You can do so using the following command (from the relevant client's connection):

CLIENT NO-EVICT on

And you can revert that with:

CLIENT NO-EVICT off

For more information and an example refer to the maxmemory-clients section in the default redis.conf file.

Client eviction is available from Redis 7.0.

Client Timeouts

By default, recent versions of Redis don't close the connection with a client even if the client is idle for many seconds: the connection will remain open forever.

If you prefer a different behavior, you can configure a timeout, so that if the client is idle for more than the specified number of seconds, the client connection will be closed.

You can configure this limit via redis.conf or simply using CONFIG SET timeout <value>.

Note that the timeout only applies to normal clients and does not apply to Pub/Sub clients, since a Pub/Sub connection is push-style, so an idle client is the norm.

Even if by default connections are not subject to timeout, there are two conditions when it makes sense to set a timeout:

  • Mission critical applications where a bug in the client software may saturate the Redis server with idle connections, causing service disruption.
  • As a debugging mechanism in order to be able to connect with the server if a bug in the client software saturates the server with idle connections, making it impossible to interact with the server.

Timeouts are not to be considered very precise: Redis avoids setting timer events or running O(N) algorithms in order to check idle clients, so the check is performed incrementally from time to time. This means that it is possible that while the timeout is set to 10 seconds, the client connection will be closed, for instance, after 12 seconds if many clients are connected at the same time.
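A minimal sketch of such an incremental check, assuming a hypothetical flat array of clients and a cursor that persists between calls. Each call examines only `budget` clients, which is why a connection can outlive its timeout by a few seconds.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical client record for this sketch. */
struct tclient {
    time_t last_interaction;
    int is_pubsub;
    int closed;
};

/* Examine at most `budget` clients, resuming from *cursor, and close
 * normal clients idle for more than `timeout` seconds (0 = never).
 * Returns the number of clients closed by this call. */
size_t check_timeouts(struct tclient *cs, size_t n, size_t *cursor,
                      size_t budget, time_t now, time_t timeout) {
    size_t closed = 0;
    if (timeout == 0 || n == 0) return 0;
    *cursor %= n; /* the client table may have shrunk since the last call */
    for (size_t i = 0; i < budget && i < n; i++) {
        struct tclient *c = &cs[*cursor];
        *cursor = (*cursor + 1) % n;
        /* Already closed, or Pub/Sub (idle is normal): skip. */
        if (c->closed || c->is_pubsub) continue;
        if (now - c->last_interaction > timeout) {
            c->closed = 1;
            closed++;
        }
    }
    return closed;
}
```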

The CLIENT Command

The Redis CLIENT command allows you to inspect the state of every connected client, to kill a specific client, and to name connections. It is a very powerful debugging tool if you use Redis at scale.

CLIENT LIST is used in order to obtain a list of connected clients and their state:

redis 127.0.0.1:6379> client list
addr=127.0.0.1:52555 fd=5 name= age=855 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events=r cmd=client
addr=127.0.0.1:52787 fd=6 name= age=6 idle=5 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r cmd=ping

In the above example two clients are connected to the Redis server. Let's look at what some of the data returned represents:

  • addr: The client address, that is, the client IP and the remote port number it used to connect with the Redis server.
  • fd: The client socket file descriptor number.
  • name: The client name as set by CLIENT SETNAME.
  • age: The number of seconds the connection has existed.
  • idle: The number of seconds the connection has been idle.
  • flags: The kind of client (N means normal client, check the full list of flags).
  • omem: The amount of memory used by the client for the output buffer.
  • cmd: The last executed command.

See the [CLIENT LIST](https://redis.io/commands/client-list) documentation for the full listing of fields and their purpose.
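Because each CLIENT LIST line is a sequence of space-separated key=value pairs, a monitoring tool can extract fields with a small helper like the hypothetical client_field below. It assumes values contain no spaces, which holds for the sample output above.

```c
#include <string.h>

/* Hypothetical helper: copy the value of `field` from one CLIENT LIST
 * line into out (outlen > 0). Returns 1 if the field was found, else 0.
 * An empty value such as "name=" yields an empty string. */
int client_field(const char *line, const char *field, char *out, size_t outlen) {
    size_t flen = strlen(field);
    const char *p = line;
    if (outlen == 0) return 0;
    while (*p) {
        while (*p == ' ') p++;
        const char *tok_end = strchr(p, ' ');
        if (!tok_end) tok_end = p + strlen(p);
        /* Match "field=" exactly, so "qbuf" does not match "qbuf-free". */
        if ((size_t)(tok_end - p) > flen && strncmp(p, field, flen) == 0 && p[flen] == '=') {
            size_t vlen = (size_t)(tok_end - p) - flen - 1;
            if (vlen >= outlen) vlen = outlen - 1; /* truncate to fit */
            memcpy(out, p + flen + 1, vlen);
            out[vlen] = '\0';
            return 1;
        }
        p = tok_end;
    }
    return 0;
}
```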

Once you have the list of clients, you can close a client's connection using the CLIENT KILL command, specifying the client address as its argument.

The commands CLIENT SETNAME and CLIENT GETNAME can be used to set and get the connection name. Starting with Redis 4.0, the client name is shown in the SLOWLOG output, to help identify clients that create latency issues.

TCP keepalive

From version 3.2 onwards, Redis has TCP keepalive (SO_KEEPALIVE socket option) enabled by default and set to about 300 seconds. This option is useful in order to detect dead peers (clients that cannot be reached even if they look connected). Moreover, if there is network equipment between clients and servers that needs to see some traffic in order to keep the connection open, the option will prevent unexpected connection closed events.
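At the socket level this amounts to setting SO_KEEPALIVE and, where available, tuning the probe timing. In the sketch below, enable_keepalive is a hypothetical name, TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are platform-specific options guarded by #ifdef, and the interval/3 and count-of-3 choices are illustrative values, not documented Redis settings.

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

/* Sketch: enable keepalive on fd, with `interval` seconds of idleness
 * before the first probe where the platform allows tuning it.
 * Returns 0 on success, -1 on error. */
int enable_keepalive(int fd, int interval) {
    int yes = 1;
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &yes, sizeof yes) == -1) return -1;
#ifdef TCP_KEEPIDLE
    /* Idle seconds before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &interval, sizeof interval) == -1) return -1;
#endif
#ifdef TCP_KEEPINTVL
    /* Seconds between unanswered probes (assumed interval/3). */
    int intvl = interval / 3 > 0 ? interval / 3 : 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof intvl) == -1) return -1;
#endif
#ifdef TCP_KEEPCNT
    /* Unanswered probes before the peer is declared dead (assumed 3). */
    int cnt = 3;
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof cnt) == -1) return -1;
#endif
    (void)interval; /* unused on platforms without TCP_KEEPIDLE */
    return 0;
}
```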

3 - Redis cluster specification

Detailed specification for Redis cluster

Welcome to the Redis Cluster Specification. Here you'll find information about the algorithms and design rationales of Redis Cluster. This document is a work in progress as it is continuously synchronized with the actual implementation of Redis.

Main properties and rationales of the design

Redis Cluster goals

Redis Cluster is a distributed implementation of Redis with the following goals in order of importance in the design:

  • High performance and linear scalability up to 1000 nodes. There are no proxies, asynchronous replication is used, and no merge operations are performed on values.
  • Acceptable degree of write safety: the system tries (in a best-effort way) to retain all the writes originating from clients connected with the majority of the master nodes. Usually there are small windows where acknowledged writes can be lost. Windows to lose acknowledged writes are larger when clients are in a minority partition.
  • Availability: Redis Cluster is able to survive partitions where the majority of the master nodes are reachable and there is at least one reachable replica for every master node that is no longer reachable. Moreover using replicas migration, masters no longer replicated by any replica will receive one from a master which is covered by multiple replicas.

What is described in this document is implemented in Redis 3.0 or greater.

Implemented subset

Redis Cluster implements all the single key commands available in the non-distributed version of Redis. Commands performing complex multi-key operations like set unions and intersections are implemented for cases where all of the keys involved in the operation hash to the same slot.

Redis Cluster implements a concept called hash tags that can be used to force certain keys to be stored in the same hash slot. However, during manual resharding, multi-key operations may become unavailable for some time while single-key operations are always available.

Redis Cluster does not support multiple databases like the standalone version of Redis. We only support database 0; the SELECT command is not allowed.

Client and Server roles in the Redis cluster protocol

In Redis Cluster, nodes are responsible for holding the data, and taking the state of the cluster, including mapping keys to the right nodes. Cluster nodes are also able to auto-discover other nodes, detect non-working nodes, and promote replica nodes to master when needed in order to continue to operate when a failure occurs.

To perform their tasks all the cluster nodes are connected using a TCP bus and a binary protocol, called the Redis Cluster Bus. Every node is connected to every other node in the cluster using the cluster bus. Nodes use a gossip protocol to propagate information about the cluster in order to discover new nodes, to send ping packets to make sure all the other nodes are working properly, and to send cluster messages needed to signal specific conditions. The cluster bus is also used in order to propagate Pub/Sub messages across the cluster and to orchestrate manual failovers when requested by users (manual failovers are failovers which are not initiated by the Redis Cluster failure detector, but by the system administrator directly).

Since cluster nodes are not able to proxy requests, clients may be redirected to other nodes using the redirection errors -MOVED and -ASK. In theory, the client is free to send requests to any node in the cluster and get redirected if needed, so it is not required to hold the state of the cluster. However, clients that cache the map between keys and nodes can improve performance substantially.
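A client that caches the slot map can update it from a -MOVED reply, as in the sketch below. The routing table and handle_redirect are hypothetical, and the sketch only parses the reply text. -ASK is reported separately and leaves the map untouched, since it only redirects a single query while a slot is being migrated.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical routing table: one "host:port" string per hash slot. */
#define NUM_SLOTS 16384
static char slot_owner[NUM_SLOTS][64];

/* Parse a redirection error such as "-MOVED 3999 127.0.0.1:6381".
 * Returns 1 for MOVED (slot map updated), 2 for ASK (map unchanged),
 * 0 if the reply is not a well-formed redirect. */
int handle_redirect(const char *err, int *slot, char *addr, size_t addrlen) {
    char kind[8], a[64];
    int s;
    if (*err == '-') err++;
    if (sscanf(err, "%7s %d %63s", kind, &s, a) != 3) return 0;
    if (s < 0 || s >= NUM_SLOTS) return 0;
    int rc = strcmp(kind, "MOVED") == 0 ? 1 : strcmp(kind, "ASK") == 0 ? 2 : 0;
    if (rc == 0) return 0;
    *slot = s;
    snprintf(addr, addrlen, "%s", a);
    /* Only MOVED is a permanent change of ownership. */
    if (rc == 1) snprintf(slot_owner[s], sizeof slot_owner[s], "%s", a);
    return rc;
}
```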

Write safety

Redis Cluster uses asynchronous replication between nodes, and a "last failover wins" implicit merge function. This means that the dataset of the last elected master eventually replaces all the other replicas. There is always a window of time in which it is possible to lose writes during partitions. However, these windows are very different for a client connected to the majority of masters and for a client connected to the minority of masters.

Redis Cluster tries harder to retain writes that are performed by clients connected to the majority of masters, compared to writes performed in the minority side. The following are examples of scenarios that lead to loss of acknowledged writes received in the majority partitions during failures:

  1. A write may reach a master, but while the master may be able to reply to the client, the write may not be propagated to replicas via the asynchronous replication used between master and replica nodes. If the master dies without the write reaching the replicas, the write is lost forever if the master is unreachable for long enough that one of its replicas is promoted. This is usually hard to observe in the case of a total, sudden failure of a master node, since masters try to reply to clients (with the acknowledgment of the write) and replicas (propagating the write) at about the same time. However, it is a real-world failure mode.

  2. Another theoretically possible failure mode where writes are lost is the following:

  • A master is unreachable because of a partition.
  • It gets failed over by one of its replicas.
  • After some time it may be reachable again.
  • A client with an out-of-date routing table may write to the old master before it is converted into a replica (of the new master) by the cluster.

The second failure mode is unlikely to happen because master nodes unable to communicate with the majority of the other masters for enough time to be failed over will no longer accept writes, and when the partition is fixed writes are still refused for a small amount of time to allow other nodes to inform about configuration changes. This failure mode also requires that the client's routing table has not yet been updated.

Writes targeting the minority side of a partition have a larger window in which to get lost. For example, Redis Cluster loses a non-trivial number of writes on partitions where there is a minority of masters and at least one client, since all the writes sent to those masters may be lost if the masters are failed over on the majority side.

Specifically, for a master to be failed over, it must be unreachable by the majority of masters for at least NODE_TIMEOUT, so if the partition is fixed before that time, no writes are lost. When the partition lasts for more than NODE_TIMEOUT, all the writes performed on the minority side up to that point may be lost. However, the minority side of a Redis Cluster will start refusing writes as soon as NODE_TIMEOUT has elapsed without contact with the majority, so there is a maximum window after which the minority is no longer available. Hence, no writes are accepted or lost after that time.

Availability

Redis Cluster is not available in the minority side of the partition. In the majority side of the partition assuming that there are at least the majority of masters and a replica for every unreachable master, the cluster becomes available again after NODE_TIMEOUT time plus a few more seconds required for a replica to get elected and failover its master (failovers are usually executed in a matter of 1 or 2 seconds).

This means that Redis Cluster is designed to survive failures of a few nodes in the cluster, but it is not a suitable solution for applications that require availability in the event of large net splits.

In the example of a cluster composed of N master nodes where every node has a single replica, the majority side of the cluster will remain available as long as a single node is partitioned away, and will remain available with a probability of 1-(1/(N*2-1)) when two nodes are partitioned away (after the first node fails we are left with N*2-1 nodes in total, and the probability of the only master without a replica to fail is 1/(N*2-1)).

For example, in a cluster with 5 nodes and a single replica per node, there is a 1/(5*2-1) = 11.11% probability that after two nodes are partitioned away from the majority, the cluster will no longer be available.
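The arithmetic can be checked with a one-line function; for n = 5 masters it gives 1 - 1/9, roughly 0.889, the complement of the 11.11% figure above. The function name is hypothetical.

```c
/* Probability, per the reasoning above, that a cluster of n masters with
 * one replica each stays available after two nodes are partitioned away:
 * after the first loss n*2-1 nodes remain, and losing the one master left
 * without a replica is fatal. */
double availability_two_failures(unsigned n) {
    if (n == 0) return 0.0; /* no masters: nothing to keep available */
    return 1.0 - 1.0 / (double)(n * 2 - 1);
}
```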

Thanks to a Redis Cluster feature called replicas migration the Cluster availability is improved in many real world scenarios by the fact that replicas migrate to orphaned masters (masters no longer having replicas). So at every successful failure event, the cluster may reconfigure the replicas layout in order to better resist the next failure.

Performance

In Redis Cluster, nodes don't proxy commands to the node in charge of a given key; instead, they redirect clients to the nodes serving the relevant portion of the key space.

Eventually clients obtain an up-to-date representation of the cluster and which node serves which subset of keys, so during normal operations clients directly contact the right nodes in order to send a given command.

Because of the use of asynchronous replication, nodes do not wait for other nodes' acknowledgment of writes (if not explicitly requested using the WAIT command).

Also, because multi-key commands are limited to keys that hash to the same slot, data is never moved between nodes except when resharding.

Normal operations are handled exactly as in the case of a single Redis instance. This means that in a Redis Cluster with N master nodes you can expect the same performance as a single Redis instance multiplied by N as the design scales linearly. At the same time the query is usually performed in a single round trip, since clients usually retain persistent connections with the nodes, so latency figures are also the same as the single standalone Redis node case.

Very high performance and scalability while preserving weak but reasonable forms of data safety and availability is the main goal of Redis Cluster.

Why merge operations are avoided

The Redis Cluster design avoids conflicting versions of the same key-value pair in multiple nodes since, in the case of the Redis data model, this is not always desirable. Values in Redis are often very large; it is common to see lists or sorted sets with millions of elements. Data types are also semantically complex. Transferring and merging these kinds of values can be a major bottleneck and/or may require non-trivial involvement of application-side logic, additional memory to store metadata, and so forth.

There are no strict technological limits here. CRDTs or synchronously replicated state machines can model complex data types similar to Redis. However, the actual run time behavior of such systems would not be similar to Redis Cluster. Redis Cluster was designed in order to cover the exact use cases of the non-clustered Redis version.

Overview of Redis Cluster main components

Key distribution model

The cluster's key space is split into 16384 slots, effectively setting an upper limit of 16384 master nodes for the cluster size (however, the suggested maximum size is on the order of ~1000 nodes).

Each master node in a cluster handles a subset of the 16384 hash slots. The cluster is stable when there is no cluster reconfiguration in progress (i.e. where hash slots are being moved from one node to another). When the cluster is stable, a single hash slot will be served by a single node (however the serving node can have one or more replicas that will replace it in the case of net splits or failures, and that can be used in order to scale read operations where reading stale data is acceptable).

The base algorithm used to map keys to hash slots is the following (read the next paragraph for the hash tag exception to this rule):

HASH_SLOT = CRC16(key) mod 16384

The CRC16 is specified as follows:

  • Name: XMODEM (also known as ZMODEM or CRC-16/ACORN)
  • Width: 16 bit
  • Poly: 1021 (That is actually x^16 + x^12 + x^5 + 1)
  • Initialization: 0000
  • Reflect Input byte: False
  • Reflect Output CRC: False
  • Xor constant to output CRC: 0000
  • Output for "123456789": 31C3
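A bit-by-bit implementation matching these parameters is sketched below; it is slower than a table-driven version and is not claimed to be identical to the reference code in Appendix A. The names crc16_xmodem and hash_slot_plain are hypothetical, and hash_slot_plain deliberately ignores the hash tag rule described later. The check value for "123456789" comes out as 0x31C3.

```c
#include <stddef.h>
#include <stdint.h>

/* CRC-16/XMODEM, bitwise: poly 0x1021, init 0x0000, no input or output
 * reflection, no final XOR. */
uint16_t crc16_xmodem(const char *buf, size_t len) {
    uint16_t crc = 0x0000;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Base key-to-slot mapping, without the hash tag rule. */
unsigned int hash_slot_plain(const char *key, size_t len) {
    return crc16_xmodem(key, len) & 16383;
}
```

Masking with & 16383 keeps the low 14 bits, which is the same as computing the value modulo 16384.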

14 out of 16 CRC16 output bits are used (this is why there is a modulo 16384 operation in the formula above).

In our tests CRC16 behaved remarkably well in distributing different kinds of keys evenly across the 16384 slots.

Note: A reference implementation of the CRC16 algorithm used is available in Appendix A of this document.

Hash tags

There is an exception for the computation of the hash slot that is used in order to implement hash tags. Hash tags are a way to ensure that multiple keys are allocated in the same hash slot. This is used in order to implement multi-key operations in Redis Cluster.

To implement hash tags, the hash slot for a key is computed in a slightly different way in certain conditions. If the key contains a "{...}" pattern only the substring between { and } is hashed in order to obtain the hash slot. However since it is possible that there are multiple occurrences of { or } the algorithm is well specified by the following rules:

  • IF the key contains a { character.
  • AND IF there is a } character to the right of {.
  • AND IF there are one or more characters between the first occurrence of { and the first occurrence of }.

Then instead of hashing the key, only what is between the first occurrence of { and the following first occurrence of } is hashed.

Examples:

  • The two keys {user1000}.following and {user1000}.followers will hash to the same hash slot since only the substring user1000 will be hashed in order to compute the hash slot.
  • For the key foo{}{bar} the whole key will be hashed as usual, since the first occurrence of { is followed by } on the right without characters in the middle.
  • For the key foo{{bar}}zap the substring {bar will be hashed, because it is the substring between the first occurrence of { and the first occurrence of } on its right.
  • For the key foo{bar}{zap} the substring bar will be hashed, since the algorithm stops at the first valid or invalid (without bytes inside) match of { and }.
  • What follows from the algorithm is that if the key starts with {}, it is guaranteed to be hashed as a whole. This is useful when using binary data as key names.

Adding the hash tags exception, the following is an implementation of the HASH_SLOT function in Ruby and C.

Ruby example code:

# crc16(key) is the CRC16 function from Appendix A (not shown here).
def HASH_SLOT(key)
    s = key.index "{"
    if s
        e = key.index "}",s+1
        if e && e != s+1
            key = key[s+1..e-1]
        end
    end
    crc16(key) % 16384
end

C example code:

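#include <stdint.h>

/* CRC16 reference implementation, see Appendix A. */
uint16_t crc16(const char *buf, int len);

unsigned int HASH_SLOT(char *key, int keylen) {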
    int s, e; /* start-end indexes of { and } */

    /* Search the first occurrence of '{'. */
    for (s = 0; s < keylen; s++)
        if (key[s] == '{') break;

    /* No '{' ? Hash the whole key. This is the base case. */
    if (s == keylen) return crc16(key,keylen) & 16383;

    /* '{' found? Check if we have the corresponding '}'. */
    for (e = s+1; e < keylen; e++)
        if (key[e] == '}') break;

    /* No '}' or nothing between {} ? Hash the whole key. */
    if (e == keylen || e == s+1) return crc16(key,keylen) & 16383;

    /* If we are here there is both a { and a } on its right. Hash
     * what is in the middle between { and }. */
    return crc16(key+s+1,e-s-1) & 16383;
}

Cluster node attributes

Every node has a unique name in the cluster. The node name is the hex representation of a 160 bit random number, obtained the first time a node is started (usually using /dev/urandom). The node will save its ID in the node configuration file, and will use the same ID forever, or at least as long as the node configuration file is not deleted by the system administrator, or a hard reset is requested via the CLUSTER RESET command.
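Generating such a name can be sketched as reading 20 random bytes (160 bits) from /dev/urandom and hex-encoding them into 40 characters. make_node_id is a hypothetical name, and lowercase hex is chosen to match the sample CLUSTER NODES output below.

```c
#include <stdio.h>

#define NODE_ID_BYTES 20                 /* 160 bits */
#define NODE_ID_LEN (NODE_ID_BYTES * 2)  /* hex characters */

/* Fill out (NODE_ID_LEN + 1 bytes) with a random hex node name.
 * Returns 0 on success, -1 if the random source is unavailable. */
int make_node_id(char *out) {
    static const char hex[] = "0123456789abcdef";
    unsigned char rnd[NODE_ID_BYTES];
    FILE *f = fopen("/dev/urandom", "rb");
    if (!f) return -1;
    size_t got = fread(rnd, 1, sizeof rnd, f);
    fclose(f);
    if (got != sizeof rnd) return -1;
    for (size_t i = 0; i < NODE_ID_BYTES; i++) {
        out[i * 2] = hex[rnd[i] >> 4];        /* high nibble */
        out[i * 2 + 1] = hex[rnd[i] & 0x0f];  /* low nibble */
    }
    out[NODE_ID_LEN] = '\0';
    return 0;
}
```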

The node ID is used to identify every node across the whole cluster. It is possible for a given node to change its IP address without any need to also change the node ID. The cluster is also able to detect the change in IP/port and reconfigure using the gossip protocol running over the cluster bus.

The node ID is not the only information associated with each node, but it is the only one that is always globally consistent. Every node also has the following set of associated information. Some of it describes the cluster configuration of this specific node and is eventually consistent across the cluster. Other information, like the last time a node was pinged, is local to each node.

Every node maintains the following information about the other nodes it is aware of in the cluster: the node ID, the IP and port of the node, a set of flags, the master of the node if it is flagged as a replica, the last time the node was pinged and the last time a pong was received, the current configuration epoch of the node (explained later in this specification), the link state, and finally the set of hash slots served.

A detailed explanation of all the node fields is given in the CLUSTER NODES documentation.

The CLUSTER NODES command can be sent to any node in the cluster and provides the state of the cluster and the information for each node according to the local view the queried node has of the cluster.

The following is sample output of the CLUSTER NODES command sent to a master node in a small cluster of three nodes.

$ redis-cli cluster nodes
d1861060fe6a534d42d8a19aeb36600e18785e04 127.0.0.1:6379 myself - 0 1318428930 1 connected 0-1364
3886e65cc906bfd9b1f7e7bde468726a052d1dae 127.0.0.1:6380 master - 1318428930 1318428931 2 connected 1365-2729
d289c575dcbc4bdd2931585fd4339089e461a27d 127.0.0.1:6381 master - 1318428931 1318428931 3 connected 2730-4095

In the above listing the different fields are, in order: node ID, address:port, flags, master node ID (- if the node is a master), last ping sent, last pong received, configuration epoch, link state, slots. Details about these fields will be covered as we discuss specific parts of Redis Cluster.
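A client or monitoring script can split each CLUSTER NODES line into these fields. The following C sketch parses the fixed fields with sscanf and keeps the remaining slot ranges as a string. The struct and function names are hypothetical, and the buffer sizes are assumptions chosen for this example, not limits defined by Redis. Any extra tokens in the slots part are left unparsed in slots.

```c
#include <stdio.h>
#include <string.h>

/* Fixed fields of one CLUSTER NODES line, in the order listed above. */
struct node_line {
    char id[41];                    /* 40 hex characters + NUL */
    char addr[64];                  /* ip:port (newer versions may append @cport) */
    char flags[64];                 /* e.g. "myself", "master" */
    char master[41];                /* master node ID, or "-" */
    long long ping_sent;
    long long pong_recv;
    unsigned long long config_epoch;
    char link_state[16];            /* "connected" or "disconnected" */
    const char *slots;              /* rest of the line: slot ranges */
};

/* Returns 1 if the eight fixed fields were parsed, 0 otherwise.
 * `slots` points into `line`, so `line` must outlive `out`. */
int parse_node_line(const char *line, struct node_line *out) {
    int consumed = 0;
    memset(out, 0, sizeof *out);
    int n = sscanf(line, "%40s %63s %63s %40s %lld %lld %llu %15s %n",
                   out->id, out->addr, out->flags, out->master,
                   &out->ping_sent, &out->pong_recv, &out->config_epoch,
                   out->link_state, &consumed);
    if (n != 8) return 0;
    /* %n does not count toward n; if it never ran, there are no slots. */
    out->slots = consumed > 0 ? line + consumed : "";
    return 1;
}
```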

The cluster bus

Every Redis Cluster node has an additional TCP port for receiving incoming connections from other Redis Cluster nodes. By default this port is derived by adding 10000 to the data port. It can also be set explicitly with the cluster-port configuration directive.

Example 1:

If a Redis node is listening for client connections on port 6379, and you do not set the cluster-port parameter in redis.conf, the Cluster bus port 16379 will be opened.

Example 2:

If a Redis node is listening for client connections on port 6379, and you set cluster-port 20000 in redis.conf, the Cluster bus port 20000 will be opened.
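A tiny C helper can summarize the port selection in the two examples. This sketch treats a cluster_port of 0 as "not set", which is an assumption. The range check is also an assumption of the sketch: a derived bus port above 65535 is not a valid TCP port, so the default rule only works for data ports up to 55535.

```c
/* Returns the cluster bus port for a node, or -1 if the result would not
 * be a valid TCP port. Assumed convention: cluster_port == 0 means that
 * cluster-port was not set in redis.conf. */
int cluster_bus_port(int data_port, int cluster_port) {
    int port = cluster_port != 0 ? cluster_port : data_port + 10000;
    return (port > 0 && port <= 65535) ? port : -1;
}
```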

Node-to-node communication happens exclusively using the Cluster bus and the Cluster bus protocol: a binary protocol composed of frames of different types and sizes. The Cluster bus binary protocol is not publicly documented because it is not intended for external software to talk with Redis Cluster nodes. However, you can obtain more details about the Cluster bus protocol by reading the cluster.h and cluster.c files in the Redis Cluster source code.

Cluster topology

Redis Cluster is a full mesh where every node is connected with every other node using a TCP connection.

In a cluster of N nodes, every node has N-1 outgoing TCP connections, and N-1 incoming connections.

These TCP connections are kept alive all the time and are not created on demand. When a node is waiting for a pong reply to a ping on the cluster bus, it will try to refresh the connection by reconnecting from scratch before the wait grows long enough to mark the node as unreachable.

While Redis Cluster nodes form a full mesh, nodes use a gossip protocol and a configuration update mechanism in order to avoid exchanging too many messages between nodes during normal conditions, so the number of messages exchanged is not exponential.

Node handshake

Nodes always accept connections on the cluster bus port, and even reply to received pings, even if the pinging node is not trusted. However, all other packets will be discarded by the receiving node if the sending node is not considered part of the cluster.

A node will accept another node as part of the cluster only in two ways:

  • If a node presents itself with a MEET message (CLUSTER MEET command). A MEET message is exactly like a PING message, but forces the receiver to accept the node as part of the cluster. Nodes will send MEET messages to other nodes only if the system administrator requests this via the following command:

    CLUSTER MEET ip port

  • A node will also register another node as part of the cluster if a node that is already trusted will gossip about this other node. So if A knows B, and B knows C, eventually B will send gossip messages to A about C. When this happens, A will register C as part of the network, and will try to connect with C.

This means that as long as we join nodes in any connected graph, they will eventually form a fully connected graph automatically. In other words, the cluster is able to auto-discover other nodes, but only if there is a trusted relationship that was forced by the system administrator.

This mechanism makes the cluster more robust and prevents different Redis clusters from accidentally mixing after changes of IP addresses or other network-related events.
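A toy simulation can illustrate the transitive discovery described above. In this C sketch, nodes are indexes into an adjacency matrix. cluster_meet stands for an administrator-issued CLUSTER MEET. Each gossip pass lets every node learn every node its trusted peers know. This model is deliberately simplified, because real gossip sections carry only a few random nodes per packet. All names are hypothetical.

```c
#include <stdbool.h>

#define MAX_NODES 8

/* knows[a][b]: node a has node b in its node table. Modeled as symmetric,
 * since a trusted pair ends up linked in both directions. */
typedef bool node_graph[MAX_NODES][MAX_NODES];

/* CLUSTER MEET issued by the administrator: a and b trust each other. */
void cluster_meet(node_graph knows, int a, int b) {
    knows[a][b] = knows[b][a] = true;
}

/* Simplified gossip model (an assumption, not the real protocol): on each
 * pass, every node b tells each node a it knows about every node c that b
 * knows; a then registers c and connects to it. Repeats until a pass adds
 * nothing, and returns the number of passes made. */
int gossip_until_stable(node_graph knows, int n) {
    int passes = 0;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int a = 0; a < n; a++)
            for (int b = 0; b < n; b++) {
                if (a == b || !knows[a][b]) continue;
                for (int c = 0; c < n; c++)
                    if (c != a && knows[b][c] && !knows[a][c]) {
                        knows[a][c] = knows[c][a] = true;
                        changed = true;
                    }
            }
        passes++;
    }
    return passes;
}
```

Meeting nodes 0-1, 1-2 and 2-3 is enough for all four to end up knowing each other. A fifth node that was never met stays unknown, which mirrors the need for a trusted relationship forced by the administrator.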

Redirection and resharding

MOVED Redirection

A Redis client is free to send queries to every node in the cluster, including replica nodes. The node will analyze the query, and if it is acceptable (that is, only a single key is mentioned in the query, or the multiple keys mentioned all belong to the same hash slot) it will look up which node is responsible for the hash slot where the key or keys belong.

If the hash slot is served by the node, the query is simply processed, otherwise the node will check its internal hash slot to node map, and will reply to the client with a MOVED error, like in the following example:

GET x
-MOVED 3999 127.0.0.1:6381

The error includes the hash slot of the key (3999) and the endpoint:port of the instance that can serve the query. The client needs to reissue the query to the specified node's endpoint address and port. The endpoint can be an IP address or a hostname, or it can be empty (e.g. -MOVED 3999 :6380). An empty endpoint indicates that the server node has an unknown endpoint. In that case the client should send the next request to the same endpoint as the current request, but with the provided port.
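A client has to extract the slot, endpoint and port from the error string, including the empty-endpoint case. The C sketch below shows one way to do it. The struct and function names are hypothetical. The parser accepts the reply with or without the leading '-', because client libraries differ on whether they strip it. It splits the endpoint at the last ':' and does nothing special for IPv6.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum redirect_kind { REDIRECT_NONE, REDIRECT_MOVED, REDIRECT_ASK };

struct redirect {
    enum redirect_kind kind;
    int slot;
    char host[256];   /* empty string: reuse the current request's host */
    int port;
};

/* Parses "MOVED <slot> <endpoint>:<port>" or "ASK <slot> <endpoint>:<port>".
 * Returns 1 on success, 0 for any other (or malformed) reply. */
int parse_redirect(const char *err, struct redirect *r) {
    const char *p;
    char endpoint[300];
    char *colon;
    size_t hostlen;

    memset(r, 0, sizeof *r);
    if (*err == '-') err++;
    if (strncmp(err, "MOVED ", 6) == 0) { r->kind = REDIRECT_MOVED; p = err + 6; }
    else if (strncmp(err, "ASK ", 4) == 0) { r->kind = REDIRECT_ASK; p = err + 4; }
    else return 0;

    if (sscanf(p, "%d %299s", &r->slot, endpoint) == 2
        && (colon = strrchr(endpoint, ':')) != NULL
        && (hostlen = (size_t)(colon - endpoint)) < sizeof r->host) {
        memcpy(r->host, endpoint, hostlen);   /* may copy zero bytes */
        r->host[hostlen] = '\0';
        r->port = atoi(colon + 1);
        if (r->slot >= 0 && r->slot < 16384 && r->port > 0) return 1;
    }
    r->kind = REDIRECT_NONE;   /* malformed redirection */
    return 0;
}

/* Host to use for the retried request. */
const char *redirect_host(const struct redirect *r, const char *current_host) {
    return r->host[0] != '\0' ? r->host : current_host;
}
```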

Note that even if the client waits a long time before reissuing the query, and in the meantime the cluster configuration changed, the destination node will reply again with a MOVED error if the hash slot 3999 is now served by another node. The same happens if the contacted node had no updated information.

So while, from the point of view of the cluster, nodes are identified by IDs, we try to simplify our interface with the client by just exposing a map between hash slots and Redis nodes identified by endpoint:port pairs.

The client is not required to, but should try to memorize that hash slot 3999 is served by 127.0.0.1:6381. This way once a new command needs to be issued it can compute the hash slot of the target key and have a greater chance of choosing the right node.

An alternative is to just refresh the whole client-side cluster layout using the CLUSTER SHARDS, or the deprecated CLUSTER SLOTS, command when a MOVED redirection is received. When a redirection is encountered, it is likely multiple slots were reconfigured rather than just one, so updating the client configuration as soon as possible is often the best strategy.

Note that when the Cluster is stable (no ongoing changes in the configuration), eventually all the clients will obtain a map of hash slots -> nodes, making the cluster efficient, with clients directly addressing the right nodes without redirections, proxies or other single point of failure entities.

A client must also be able to handle -ASK redirections, which are described later in this document. Otherwise it is not a complete Redis Cluster client.

Live reconfiguration

Redis Cluster supports the ability to add and remove nodes while the cluster is running. Adding or removing a node is abstracted into the same operation: moving a hash slot from one node to another. This means that the same basic mechanism can be used in order to rebalance the cluster, add or remove nodes, and so forth.

  • To add a new node to the cluster an empty node is added to the cluster and some set of hash slots are moved from existing nodes to the new node.
  • To remove a node from the cluster the hash slots assigned to that node are moved to other existing nodes.
  • To rebalance the cluster a given set of hash slots are moved between nodes.

The core of the implementation is the ability to move hash slots around. From a practical point of view a hash slot is just a set of keys, so what Redis Cluster really does during resharding is move keys from one instance to another. Moving a hash slot means moving all the keys that happen to hash into this hash slot.

To understand how this works we need to show the CLUSTER subcommands that are used to manipulate the slots translation table in a Redis Cluster node.

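The following subcommands are available (among others not useful in this case):

  • CLUSTER ADDSLOTS slot1 [slot2] ... [slotN]
  • CLUSTER ADDSLOTSRANGE start-slot1 end-slot1 [start-slot2 end-slot2] ... [start-slotN end-slotN]
  • CLUSTER DELSLOTS slot1 [slot2] ... [slotN]
  • CLUSTER DELSLOTSRANGE start-slot1 end-slot1 [start-slot2 end-slot2] ... [start-slotN end-slotN]
  • CLUSTER SETSLOT slot NODE node
  • CLUSTER SETSLOT slot MIGRATING node
  • CLUSTER SETSLOT slot IMPORTING node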

The first four commands, ADDSLOTS, DELSLOTS, ADDSLOTSRANGE and DELSLOTSRANGE, are simply used to assign slots to (or remove slots from) a Redis node. Assigning a slot means telling a given master node that it will be in charge of storing and serving content for the specified hash slot.

After the hash slots are assigned they will propagate across the cluster using the gossip protocol, as specified later in the configuration propagation section.

The ADDSLOTS and ADDSLOTSRANGE commands are usually used when a new cluster is created from scratch to assign each master node a subset of all the 16384 hash slots available.

The DELSLOTS and DELSLOTSRANGE commands are mainly used for manual modification of a cluster configuration or for debugging tasks. In practice they are rarely used.

The SETSLOT subcommand is used to assign a slot to a specific node ID if the SETSLOT <slot> NODE form is used. Otherwise the slot can be set to one of two special states, MIGRATING and IMPORTING. These two special states are used to migrate a hash slot from one node to another.

  • When a slot is set as MIGRATING, the node will accept all queries that are about this hash slot, but only if the key in question exists. Otherwise the client is sent an -ASK redirection to the node that is the target of the migration.
  • When a slot is set as IMPORTING, the node will accept all queries that are about this hash slot, but only if the request is preceded by an ASKING command. If the ASKING command was not given by the client, the query is redirected to the real hash slot owner via a -MOVED redirection error, as would happen normally.
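The two special states can be condensed into a small decision function. This C sketch is a simplification for single-key commands on one node. It ignores multi-key commands, replicas and the TRYAGAIN case, and its names are hypothetical rather than taken from the Redis source.

```c
#include <stdbool.h>

enum slot_state { SLOT_STABLE, SLOT_MIGRATING, SLOT_IMPORTING };
enum node_reply { REPLY_SERVE, REPLY_ASK, REPLY_MOVED };

/* Simplified decision for a single-key command, from one node's view:
 *   owns_slot  - the node's table says it owns the slot
 *   state      - MIGRATING on the source, IMPORTING on the target
 *   key_exists - the key is present in this node's dataset
 *   asking     - the client sent ASKING right before this command */
enum node_reply route_command(bool owns_slot, enum slot_state state,
                              bool key_exists, bool asking) {
    if (state == SLOT_MIGRATING)
        /* Source node: serve keys it still has, send the rest to the target. */
        return key_exists ? REPLY_SERVE : REPLY_ASK;
    if (state == SLOT_IMPORTING)
        /* Target node: only serve clients that arrived via -ASK + ASKING. */
        return asking ? REPLY_SERVE : REPLY_MOVED;
    return owns_slot ? REPLY_SERVE : REPLY_MOVED;
}
```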

Let's make this clearer with an example of hash slot migration. Assume that we have two Redis master nodes, called A and B. We want to move hash slot 8 from A to B, so we issue commands like this:

  • We send B: CLUSTER SETSLOT 8 IMPORTING A
  • We send A: CLUSTER SETSLOT 8 MIGRATING B

All the other nodes will continue to point clients to node "A" every time they are queried with a key that belongs to hash slot 8, so what happens is that:

  • All queries about existing keys are processed by "A".
  • All queries about non-existing keys in A are processed by "B", because "A" will redirect clients to "B".

This way we no longer create new keys in "A". In the meantime, redis-cli, which is used during resharding and Redis Cluster configuration, will migrate the existing keys in hash slot 8 from A to B. This is performed using the following command:

CLUSTER GETKEYSINSLOT slot count

The above command will return up to count keys in the specified hash slot. For the keys returned, redis-cli sends node "A" a MIGRATE command, which migrates the specified keys from A to B atomically. Both instances are locked for the time needed to migrate the keys, which is usually very short, so there are no race conditions. This is how MIGRATE works:

MIGRATE target_host target_port "" target_database_id timeout KEYS key1 key2 ...

MIGRATE will connect to the target instance and send a serialized version of the key. Once an OK code is received, it deletes the old key from its own dataset. From the point of view of an external client, a key exists either in A or in B at any given time.

In Redis Cluster there is no need to specify a database other than 0, but MIGRATE is a general command that can be used for other tasks not involving Redis Cluster. MIGRATE is optimized to be as fast as possible even when moving complex keys such as long lists. Even so, reconfiguring a Redis Cluster that contains big keys is not considered wise if the application using the database has latency constraints.

When the migration process is finally finished, the SETSLOT <slot> NODE <node-id> command is sent to the two nodes involved in the migration in order to set the slots to their normal state again. The same command is usually sent to all other nodes to avoid waiting for the natural propagation of the new configuration across the cluster.
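An in-memory simulation of the GETKEYSINSLOT/MIGRATE loop shows why keys are never lost or duplicated. In the C sketch below, boolean arrays stand in for the keys of one slot on nodes A and B. The helper functions are hypothetical stand-ins for the real commands, and no network calls are made.

```c
#include <stdbool.h>

#define NKEYS 25

/* in_a[k] / in_b[k]: whether key k of the slot currently lives on A or B. */
struct slot_copy { bool in_a[NKEYS]; bool in_b[NKEYS]; };

/* Stand-in for CLUSTER GETKEYSINSLOT on A: up to `count` keys still on A. */
static int getkeysinslot_a(const struct slot_copy *s, int *out, int count) {
    int n = 0;
    for (int k = 0; k < NKEYS && n < count; k++)
        if (s->in_a[k]) out[n++] = k;
    return n;
}

/* Stand-in for MIGRATE of one key: copy to B, then delete from A.
 * In Redis both steps happen while the instances are blocked, so no
 * client can observe the key on both nodes or on neither. */
static void migrate_key(struct slot_copy *s, int k) {
    s->in_b[k] = true;
    s->in_a[k] = false;
}

/* Batch loop run by a resharding tool; returns the number of batches. */
int migrate_slot(struct slot_copy *s, int count) {
    int batch[NKEYS];
    int batches = 0, n;
    while ((n = getkeysinslot_a(s, batch, count)) > 0) {
        for (int i = 0; i < n; i++)
            migrate_key(s, batch[i]);
        batches++;
    }
    return batches;
}
```

With 25 keys and a batch size of 10, the loop runs three batches of 10, 10 and 5 keys. After that every key lives only on B, and the final CLUSTER SETSLOT <slot> NODE <node-id> step can be sent.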

ASK redirection

In the previous section, we briefly talked about ASK redirection. Why can't we simply use MOVED redirection? Because MOVED means that we think the hash slot is permanently served by a different node and the next queries should be tried against the specified node, while ASK means to send only the next query to the specified node.

This is needed because the next query about hash slot 8 can be about a key that is still in A, so we always want the client to try A and then B if needed. Since this happens only for one hash slot out of 16384 available, the performance hit on the cluster is acceptable.

We need to enforce this client behavior. To make sure that clients will only try node B after A was tried, node B will only accept queries about a slot that is set as IMPORTING if the client sends the ASKING command before sending the query.

Basically the ASKING command sets a one-time flag on the client that forces a node to serve a query about an IMPORTING slot.

The full semantics of ASK redirection from the point of view of the client is as follows:

  • If ASK redirection is received, send only the query that was redirected to the specified node but continue sending subsequent queries to the old node.
  • Start the redirected query with the ASKING command.
  • Don't yet update local client tables to map hash slot 8 to B.

Once hash slot 8 migration is completed, A will send a MOVED message and the client may permanently map hash slot 8 to the new endpoint and port pair. Note that if a buggy client performs the map earlier this is not a problem since it will not send the ASKING command before issuing the query, so B will redirect the client to A using a MOVED redirection error.
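On the client side, the difference between the two redirections comes down to whether the slot map is updated. The C sketch below models the client map as an array of node indexes. The names are hypothetical. Sending ASKING and then the retried command over the same connection is left to the caller.

```c
#include <stdbool.h>

#define CLUSTER_SLOTS 16384

/* Hypothetical client state: slot -> index of the node believed to own it. */
struct client_map { int owner[CLUSTER_SLOTS]; };

enum redir { REDIR_MOVED, REDIR_ASK };

struct retry {
    int node;          /* node to send the retried command to */
    bool send_asking;  /* send ASKING first, on the same connection */
};

/* Applies the client rules above: MOVED remaps the slot permanently,
 * while ASK redirects only this one query and leaves the map untouched. */
struct retry on_redirect(struct client_map *m, enum redir kind,
                         int slot, int target_node) {
    struct retry r = { target_node, false };
    if (kind == REDIR_MOVED)
        m->owner[slot] = target_node;
    else
        r.send_asking = true;
    return r;
}
```

A client that follows the alternative strategy from the MOVED section would instead refresh its whole map with CLUSTER SHARDS or CLUSTER SLOTS on REDIR_MOVED.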

Slots migration is explained in similar terms but with different wording (for the sake of redundancy in the documentation) in the CLUSTER SETSLOT command documentation.

Client connections and redirection handling

To be efficient, Redis Cluster clients maintain a map of the current slot configuration. However, this configuration is not required to be up to date. When contacting the wrong node results in a redirection, the client can update its internal slot map accordingly.

Clients usually need to fetch a complete list of slots and mapped node addresses in two different situations:

  • At startup, to populate the initial slots configuration
  • When the client receives a MOVED redirection

Note that a client may handle the MOVED redirection by updating just the moved slot in its table. However, this is usually not efficient, because the configuration of multiple slots is often modified at once. For example, if a replica is promoted to master, all of the slots served by the old master will be remapped. It is much simpler to react to a MOVED redirection by fetching the full map of slots to nodes from scratch.

A client can issue a CLUSTER SLOTS command to retrieve an array of slot ranges and the associated master and replica nodes serving the specified ranges.

The following is an example of output of CLUSTER SLOTS:

127.0.0.1:7000> cluster slots
1) 1) (integer) 5461
   2) (integer) 10922
   3) 1) "127.0.0.1"
      2) (integer) 7001
   4) 1) "127.0.0.1"
      2) (integer) 7004
2) 1) (integer) 0
   2) (integer) 5460
   3) 1) "127.0.0.1"
      2) (integer) 7000
   4) 1) "127.0.0.1"
      2) (integer) 7003
3) 1) (integer) 10923
   2) (integer) 16383
   3) 1) "127.0.0.1"
      2) (integer) 7002
   4) 1) "127.0.0.1"
      2) (integer) 7005

The first two sub-elements of every element of the returned array are the start and end slots of the range. The additional elements represent address-port pairs. The first address-port pair is the master serving the slot, and the additional address-port pairs are the replicas serving the same slot. Replicas will be listed only when not in an error condition (i.e., when their FAIL flag is not set).

The first element in the output above says that slots from 5461 to 10922 (start and end included) are served by 127.0.0.1:7001, and it is possible to scale read-only load contacting the replica at 127.0.0.1:7004.

CLUSTER SLOTS is not guaranteed to return ranges that cover the full 16384 slots if the cluster is misconfigured, so clients should initialize the slots configuration map filling the target nodes with NULL objects, and report an error if the user tries to execute commands about keys that belong to unassigned slots.

Before returning an error to the caller when a slot is found to be unassigned, the client should try to fetch the slots configuration again to check if the cluster is now configured properly.
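A NULL-initialized slot map like the one described above could look like the following C sketch. Parsing the CLUSTER SLOTS reply itself is left out. struct node, struct slot_range and both functions are hypothetical names, and only the master of each range is stored.

```c
#include <stddef.h>

#define CLUSTER_SLOTS 16384

/* Minimal node record; a real client would also track replicas and
 * connections. */
struct node { const char *host; int port; };

/* One range of the CLUSTER SLOTS reply, keeping only the master. */
struct slot_range { int start, end; struct node *master; };

/* Rebuilds the table from parsed ranges. Every slot starts as NULL, so
 * slots that no range covers stay unassigned. Invalid ranges are skipped. */
void build_slot_table(struct node *table[CLUSTER_SLOTS],
                      const struct slot_range *ranges, size_t nranges) {
    for (int s = 0; s < CLUSTER_SLOTS; s++)
        table[s] = NULL;
    for (size_t i = 0; i < nranges; i++) {
        const struct slot_range *r = &ranges[i];
        if (r->start < 0 || r->end >= CLUSTER_SLOTS || r->start > r->end)
            continue;
        for (int s = r->start; s <= r->end; s++)
            table[s] = r->master;
    }
}

/* NULL means "unassigned": refetch the map once before reporting an error. */
struct node *node_for_slot(struct node *table[CLUSTER_SLOTS], int slot) {
    return (slot >= 0 && slot < CLUSTER_SLOTS) ? table[slot] : NULL;
}
```

Loading only the first two ranges of the example output leaves slots 10923 to 16383 as NULL, and node_for_slot reports them as unassigned.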

Multi-keys operations

Using hash tags, clients are free to use multi-key operations. For example the following operation is valid:

MSET {user:1000}.name Angela {user:1000}.surname White

Multi-key operations may become unavailable when a resharding of the hash slot the keys belong to is in progress.

More specifically, even during a resharding the multi-key operations targeting keys that all exist and all still hash to the same slot (either the source or destination node) are still available.

Operations on keys that don't exist, or that are split between the source and destination nodes during the resharding, will generate a -TRYAGAIN error. The client can retry the operation after some time, or report the error back.

As soon as migration of the specified hash slot has terminated, all multi-key operations are available again for that hash slot.
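A client can check the hash tag precondition before sending a multi-key command such as the MSET above. The C sketch below compares the part of each key that HASH_SLOT would hash. Identical parts guarantee the same slot. Keys with different parts might still collide by chance, so a full client would compare computed slots instead. The sketch treats keys as NUL-terminated strings, so binary key names are out of scope. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Finds the bytes HASH_SLOT would hash: those between the first '{' and
 * the first '}' after it, if that span is non-empty; else the whole key. */
static void hashed_part(const char *key, const char **start, size_t *len) {
    size_t keylen = strlen(key);
    const char *lbrace = memchr(key, '{', keylen);
    if (lbrace) {
        size_t rest = keylen - (size_t)(lbrace + 1 - key);
        const char *rbrace = memchr(lbrace + 1, '}', rest);
        if (rbrace && rbrace != lbrace + 1) {
            *start = lbrace + 1;
            *len = (size_t)(rbrace - lbrace - 1);
            return;
        }
    }
    *start = key;
    *len = keylen;
}

/* True if all keys hash the same bytes, so they share a slot. */
bool same_hashed_part(const char *const *keys, size_t nkeys) {
    if (nkeys == 0) return true;
    const char *s0;
    size_t l0;
    hashed_part(keys[0], &s0, &l0);
    for (size_t i = 1; i < nkeys; i++) {
        const char *s;
        size_t l;
        hashed_part(keys[i], &s, &l);
        if (l != l0 || memcmp(s, s0, l) != 0) return false;
    }
    return true;
}
```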

Scaling reads using replica nodes

Normally replica nodes will redirect clients to the authoritative master for the hash slot involved in a given command, however clients can use replicas in order to scale reads using the READONLY command.

READONLY tells a Redis Cluster replica node that the client is ok reading possibly stale data and is not interested in running write queries.

When the connection is in readonly mode, the cluster will send a redirection to the client only if the operation involves keys not served by the replica's master node. This may happen because:

  1. The client sent a command about hash slots never served by the master of this replica.
  2. The cluster was reconfigured (for example resharded) and the replica is no longer able to serve commands for a given hash slot.

When this happens the client should update its hash slot map as explained in the previous sections.

The readonly state of the connection can be cleared using the READWRITE command.
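The READONLY rules can be summarized as a decision a replica makes for each single-key command. The following C sketch is a hypothetical simplification, not the server's code.

```c
#include <stdbool.h>

enum replica_reply { REPLICA_SERVE, REPLICA_MOVED };

/* Decision of a replica for a single-key command:
 *   readonly      - the connection issued READONLY (cleared by READWRITE)
 *   is_write      - the command modifies data
 *   master_serves - this replica's master serves the key's hash slot */
enum replica_reply replica_route(bool readonly, bool is_write, bool master_serves) {
    if (readonly && !is_write && master_serves)
        return REPLICA_SERVE;   /* possibly stale read, accepted by the client */
    return REPLICA_MOVED;       /* redirect to the authoritative master */
}
```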

Fault Tolerance

Heartbeat and gossip messages

Redis Cluster nodes continuously exchange ping and pong packets. Those two kinds of packets have the same structure, and both carry important configuration information. The only actual difference is the message type field. We'll refer to the sum of ping and pong packets as heartbeat packets.

Usually nodes send ping packets that will trigger the receivers to reply with pong packets. However this is not necessarily true. It is possible for nodes to just send pong packets to send information to other nodes about their configuration, without triggering a reply. This is useful, for example, in order to broadcast a new configuration as soon as possible.

Usually a node will ping a few random nodes every second so that the total number of ping packets sent (and pong packets received) by each node is a constant amount regardless of the number of nodes in the cluster.

However every node makes sure to ping every other node that hasn't sent a ping or received a pong for longer than half the NODE_TIMEOUT time. Before NODE_TIMEOUT has elapsed, nodes also try to reconnect the TCP link with another node to make sure nodes are not believed to be unreachable only because there is a problem in the current TCP connection.

The number of messages globally exchanged can be sizable if NODE_TIMEOUT is set to a small figure and the number of nodes (N) is very large, since every node will try to ping every other node for which they don't have fresh information every half the NODE_TIMEOUT time.

For example, in a 100 node cluster with a node timeout set to 60 seconds, every node will try to send 99 pings every 30 seconds, for a total of 3.3 pings per second. Multiplied by 100 nodes, this is 330 pings per second in the total cluster.

There are ways to lower the number of messages, however there have been no reported issues with the bandwidth currently used by Redis Cluster failure detection, so for now the obvious and direct design is used. Note that even in the above example, the 330 packets per second exchanged are evenly divided among 100 different nodes, so the traffic each node receives is acceptable.
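Written out, the estimate above assumes (as the text does) that each of the N nodes pings each of the other N - 1 nodes once per half NODE_TIMEOUT. It is therefore an upper bound, and it grows quadratically with N:

```latex
\begin{aligned}
\text{pings/s per node} &= \frac{N-1}{\mathit{NODE\_TIMEOUT}/2}
  = \frac{99}{30\ \mathrm{s}} = 3.3\ \mathrm{s}^{-1} \\
\text{pings/s in the cluster} &= N \cdot \frac{N-1}{\mathit{NODE\_TIMEOUT}/2}
  = 100 \times 3.3\ \mathrm{s}^{-1} = 330\ \mathrm{s}^{-1}
\end{aligned}
```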

Heartbeat packet content

Ping and pong packets contain a header that is common to all types of packets (for instance packets to request a failover vote), and a special gossip section that is specific to Ping and Pong packets.

The common header has the following information:

  • Node ID, a 160 bit pseudorandom string that is assigned the first time a node is created and remains the same for all the life of a Redis Cluster node.
  • The currentEpoch and configEpoch fields of the sending node, which are used to implement the distributed algorithms used by Redis Cluster (this is explained in detail in the next sections). If the node is a replica, the configEpoch is the last known configEpoch of its master.
  • The node flags, indicating if the node is a replica, a master, and other single-bit node information.
  • A bitmap of the hash slots served by the sending node, or if the node is a replica, a bitmap of the slots served by its master.
  • The sender TCP base port that is the port used by Redis to accept client commands.
  • The cluster port that is the port used by Redis for node-to-node communication.
  • The state of the cluster from the point of view of the sender (down or ok).
  • The master node ID of the sending node, if it is a replica.

Ping and pong packets also contain a gossip section. This section offers to the receiver a view of what the sender node thinks about other nodes in the cluster. The gossip section only contains information about a few random nodes among the set of nodes known to the sender. The number of nodes mentioned in a gossip section is proportional to the cluster size.

For every node added in the gossip section the following fields are reported:

  • Node ID.
  • IP and port of the node.
  • Node flags.

Gossip sections allow receiving nodes to get information about the state of other nodes from the point of view of the sender. This is useful both for failure detection and to discover other nodes in the cluster.
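The following C sketch makes the field list concrete. It models the information a heartbeat carries and adds helpers for the 16384-bit slot bitmap, which takes 16384 / 8 = 2048 bytes. The struct names, field types and the encoding of the state field are assumptions for illustration. This is not the wire format, which is defined by the structs in cluster.h.

```c
#include <stdint.h>

#define NODE_ID_LEN 40          /* 160 bits written as hex characters */
#define CLUSTER_SLOTS 16384

/* Hypothetical in-memory view of the common header fields listed above. */
struct hb_header {
    char     sender[NODE_ID_LEN];
    uint64_t current_epoch;
    uint64_t config_epoch;          /* master's configEpoch if sender is a replica */
    uint16_t flags;                 /* master/replica and other single-bit info */
    unsigned char slots[CLUSTER_SLOTS / 8];  /* bitmap of served slots */
    uint16_t port;                  /* client (base) port */
    uint16_t cport;                 /* cluster bus port */
    uint8_t  state;                 /* assumed encoding: 0 = ok, 1 = down */
    char     master_id[NODE_ID_LEN];  /* meaningful only for replicas */
};

/* One entry of the gossip section. */
struct hb_gossip_entry {
    char     node_id[NODE_ID_LEN];
    char     ip[46];                /* room for a textual IPv6 address */
    uint16_t port;
    uint16_t flags;
};

static inline void slot_bitmap_set(unsigned char *bm, int slot) {
    bm[slot / 8] |= (unsigned char)(1u << (slot % 8));
}

static inline int slot_bitmap_test(const unsigned char *bm, int slot) {
    return (bm[slot / 8] >> (slot % 8)) & 1;
}
```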

Failure detection

Redis Cluster failure detection is used to recognize when a master or replica node is no longer reachable by the majority of nodes and then respond by promoting a replica to the role of master. When replica promotion is not possible the cluster is put in an error state to stop receiving queries from clients.

As already mentioned, every node keeps a list of flags associated with the other known nodes. Two of these flags are used for failure detection: PFAIL and FAIL. PFAIL means possible failure, and is a non-acknowledged failure type. FAIL means that a node is failing and that this condition was confirmed by a majority of masters within a fixed amount of time.

PFAIL flag:

A node flags another node with the PFAIL flag when the node is not reachable for more than NODE_TIMEOUT time. Both master and replica nodes can flag another node as PFAIL, regardless of its type.

The concept of non-reachability for a Redis Cluster node is that we have an active ping (a ping that we sent for which we have yet to get a reply) pending for longer than NODE_TIMEOUT. For this mechanism to work the NODE_TIMEOUT must be large compared to the network round trip time. In order to add reliability during normal operations, nodes will try to reconnect with other nodes in the cluster as soon as half of the NODE_TIMEOUT has elapsed without a reply to a ping. This mechanism ensures that connections are kept alive so broken connections usually won't result in false failure reports between nodes.

FAIL flag:

The PFAIL flag alone is just local information every node has about other nodes, but it is not sufficient to trigger a replica promotion. For a node to be considered down the PFAIL condition needs to be escalated to a FAIL condition.

As outlined in the node heartbeats section of this document, every node sends gossip messages to every other node including the state of a few random known nodes. Every node eventually receives a set of node flags for every other node. This way every node has a mechanism to signal other nodes about failure conditions they have detected.

A PFAIL condition is escalated to a FAIL condition when the following set of conditions are met:

  • Some node, that we'll call A, has another node B flagged as PFAIL.
  • Node A collected, via gossip sections, information about the state of B from the point of view of the majority of masters in the cluster.
  • The majority of masters signaled the PFAIL or FAIL condition within NODE_TIMEOUT * FAIL_REPORT_VALIDITY_MULT time. (The validity factor is set to 2 in the current implementation, so this is just two times the NODE_TIMEOUT time).

If all the above conditions are true, Node A will:

  • Mark the node as FAIL.
  • Send a FAIL message (as opposed to a FAIL condition within a heartbeat message) to all the reachable nodes.

The FAIL message will force every receiving node to mark the node in FAIL state, whether or not it already flagged the node in PFAIL state.
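The escalation conditions can be expressed as a check that node A runs for a node B it already flags as PFAIL. In this C sketch each report is assumed to come from a distinct master. Reports older than NODE_TIMEOUT * FAIL_REPORT_VALIDITY_MULT are ignored. Counting A itself when A is a master is an assumption, made explicit by the self_is_master parameter. The names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define FAIL_REPORT_VALIDITY_MULT 2

/* When (ms) one master last reported B as PFAIL or FAIL via gossip. */
struct fail_report { long long time_ms; };

/* Returns true when A should mark B as FAIL (and broadcast a FAIL message). */
bool should_mark_fail(bool b_is_pfail, const struct fail_report *reports,
                      size_t nreports, bool self_is_master, int num_masters,
                      long long now_ms, long long node_timeout_ms) {
    if (!b_is_pfail) return false;
    long long window = node_timeout_ms * FAIL_REPORT_VALIDITY_MULT;
    int votes = self_is_master ? 1 : 0;
    for (size_t i = 0; i < nreports; i++)
        if (now_ms - reports[i].time_ms <= window)  /* drop stale reports */
            votes++;
    return votes >= num_masters / 2 + 1;            /* majority of masters */
}
```

In a five-master cluster with a 15 second NODE_TIMEOUT, a master that holds two reports from the last 30 seconds reaches the majority of three. If one of those reports is older than 30 seconds, it does not.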

Note that the FAIL flag is mostly one way. That is, a node can go from PFAIL to FAIL, but a FAIL flag can only be cleared in the following situations:

  • The node is reachable again and is a replica. In this case the FAIL flag can be cleared because replicas are not failed over.
  • The node is reachable again and is a master not serving any slot. In this case the FAIL flag can be cleared because masters without slots do not really participate in the cluster and are waiting to be configured in order to join the cluster.
  • The node is reachable again and is a master, but a long time (N times the NODE_TIMEOUT) has elapsed without any detectable replica promotion. In this case it is better for the node to rejoin the cluster and continue.
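The three clearing rules can be written as one predicate. In this C sketch undo_mult stands for the unspecified N in "N times the NODE_TIMEOUT", so its value is left to the caller. The function assumes that no replica promotion has been detected for the master's slots. The names are hypothetical.

```c
#include <stdbool.h>

/* Returns true if a FAIL flag on a node may be cleared. */
bool can_clear_fail(bool reachable, bool is_master, int slots_served,
                    long long ms_since_fail, long long node_timeout_ms,
                    int undo_mult) {
    if (!reachable) return false;
    if (!is_master) return true;         /* replicas are not failed over */
    if (slots_served == 0) return true;  /* master without slots */
    /* Master with slots: only after a long time with no promotion seen. */
    return ms_since_fail > (long long)undo_mult * node_timeout_ms;
}
```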

It is useful to note that while the PFAIL -> FAIL transition uses a form of agreement, the agreement used is weak:

  1. Nodes collect views of other nodes over some time period. So even if the majority of master nodes need to "agree", this is actually just state collected from different nodes at different times. We are not sure, nor do we require, that the majority of masters agreed at a given moment. However, we discard failure reports that are old, so the failure was signaled by the majority of masters within a window of time.
  2. While every node detecting the FAIL condition will force that condition on other nodes in the cluster using the FAIL message, there is no way to ensure the message will reach all the nodes. For instance a node may detect the FAIL condition and because of a partition will not be able to reach any other node.

However the Redis Cluster failure detection has a liveness requirement: eventually all the nodes should agree about the state of a given node. There are two cases that can originate from split brain conditions. Either some minority of nodes believe the node is in FAIL state, or a minority of nodes believe the node is not in FAIL state. In both the cases eventually the cluster will have a single view of the state of a given node:

Case 1: If a majority of masters have flagged a node as FAIL, because of failure detection and the chain effect it generates, every other node will eventually flag the master as FAIL, since in the specified window of time enough failures will be reported.

Case 2: When only a minority of masters have flagged a node as FAIL, the replica promotion will not happen (as it uses a more formal algorithm that makes sure everybody knows about the promotion eventually) and every node will clear the FAIL state as per the FAIL state clearing rules above (i.e. no promotion after N times the NODE_TIMEOUT has elapsed).

The FAIL flag is only used as a trigger to run the safe part of the algorithm for the replica promotion. In theory a replica may act independently and start a replica promotion when its master is not reachable, and wait for the masters to refuse to provide the acknowledgment if the master is actually reachable by the majority. However, the added complexity of the PFAIL -> FAIL state, the weak agreement, and the FAIL message forcing the propagation of the state in the shortest amount of time in the reachable part of the cluster have practical advantages. Because of these mechanisms, usually all the nodes will stop accepting writes at about the same time if the cluster is in an error state. This is a desirable feature from the point of view of applications using Redis Cluster. These mechanisms also avoid erroneous election attempts by replicas that can't reach their master due to local problems (while the master is otherwise reachable by the majority of other master nodes).

Configuration handling, propagation, and failovers

Cluster current epoch

Redis Cluster uses a concept similar to the Raft algorithm "term". In Redis Cluster the term is called epoch instead, and it is used in order to give incremental versioning to events. When multiple nodes provide conflicting information, it becomes possible for another node to understand which state is the most up to date.

The currentEpoch is a 64 bit unsigned number.

At node creation every Redis Cluster node, both replicas and master nodes, set the currentEpoch to 0.

Every time a packet is received from another node, if the epoch of the sender (part of the cluster bus messages header) is greater than the local node epoch, the currentEpoch is updated to the sender epoch.

Because of these semantics, eventually all the nodes will agree to the greatest currentEpoch in the cluster.

This information is used when the state of the cluster is changed and a node seeks agreement in order to perform some action.

Currently this happens only during replica promotion, as described in the next section. Basically the epoch is a logical clock for the cluster and dictates that information with a greater epoch wins over information with a smaller one.
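The currentEpoch update rule amounts to taking a maximum on every received packet. Here is a minimal C sketch with a hypothetical function name:

```c
#include <stdint.h>

/* Applied to every packet received on the cluster bus: adopt the sender's
 * currentEpoch if it is greater than ours. Returns 1 when the value changed. */
int update_current_epoch(uint64_t *current_epoch, uint64_t sender_epoch) {
    if (sender_epoch > *current_epoch) {
        *current_epoch = sender_epoch;
        return 1;
    }
    return 0;
}
```

Every node applies this rule to every packet, and gossip eventually carries each node's epoch to every other node. All nodes therefore converge on the greatest currentEpoch. When the value changes, a real node would also persist it to nodes.conf, as described later.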

Configuration epoch

Every master always advertises its configEpoch in ping and pong packets along with a bitmap advertising the set of slots it serves.

The configEpoch is set to zero in masters when a new node is created.

A new configEpoch is created during replica election. Replicas trying to replace failing masters increment their epoch and try to get authorization from a majority of masters. When a replica is authorized, a new unique configEpoch is created and the replica turns into a master using the new configEpoch.

As explained in the next sections the configEpoch helps to resolve conflicts when different nodes claim divergent configurations (a condition that may happen because of network partitions and node failures).

Replica nodes also advertise the configEpoch field in ping and pong packets, but in the case of replicas the field represents the configEpoch of their master as of the last time they exchanged packets. This allows other instances to detect when a replica has an old configuration that needs to be updated (master nodes will not grant votes to replicas with an old configuration).

Every time the configEpoch changes for some known node, it is permanently stored in the nodes.conf file by all the nodes that receive this information. The same also happens for the currentEpoch value. These two variables are guaranteed to be saved and fsync-ed to disk when updated before a node continues its operations.

The configEpoch values generated using a simple algorithm during failovers are guaranteed to be new, incremental, and unique.

Replica election and promotion

Replica election and promotion is handled by replica nodes, with the help of master nodes that vote for the replica to promote. A replica election happens when a master is in FAIL state from the point of view of at least one of its replicas that has the prerequisites in order to become a master.

In order for a replica to promote itself to master, it needs to start an election and win it. All the replicas for a given master can start an election if the master is in FAIL state, however only one replica will win the election and promote itself to master.

A replica starts an election when the following conditions are met:

  • The replica's master is in FAIL state.
  • The master was serving a non-zero number of slots.
  • The replica replication link was disconnected from the master for no longer than a given amount of time, in order to ensure the promoted replica's data is reasonably fresh. This time is user configurable.

In order to be elected, the first step for a replica is to increment its currentEpoch counter, and request votes from master instances.

Votes are requested by the replica by broadcasting a FAILOVER_AUTH_REQUEST packet to every master node of the cluster. Then it waits for a maximum time of two times the NODE_TIMEOUT for replies to arrive (but always for at least 2 seconds).

Once a master has voted for a given replica, replying positively with a FAILOVER_AUTH_ACK, it can no longer vote for another replica of the same master for a period of NODE_TIMEOUT * 2. In this period it will not be able to reply to other authorization requests for the same master. This is not needed to guarantee safety, but useful for preventing multiple replicas from getting elected (even if with a different configEpoch) at around the same time, which is usually not wanted.

A replica discards any AUTH_ACK replies with an epoch that is less than the currentEpoch at the time the vote request was sent. This ensures it doesn't count votes intended for a previous election.

Once the replica receives ACKs from the majority of masters, it wins the election. Otherwise if the majority is not reached within the period of two times NODE_TIMEOUT (but always at least 2 seconds), the election is aborted and a new one will be tried again after NODE_TIMEOUT * 4 (and always at least 4 seconds).
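The timing rules and the stale-vote check described above can be sketched in C as follows. This is a minimal illustration under stated assumptions: times are in milliseconds, the number of voters is the number of masters, and the names (election_timeout_ms, election_retry_ms, on_auth_ack) are hypothetical rather than Redis internals.

```c
#include <stdint.h>
#include <stdbool.h>

/* How long to wait for votes: 2 * NODE_TIMEOUT, but at least 2 seconds. */
uint64_t election_timeout_ms(uint64_t node_timeout_ms) {
    uint64_t t = node_timeout_ms * 2;
    return t < 2000 ? 2000 : t;
}

/* Delay before retrying a failed election: 4 * NODE_TIMEOUT, at least 4 seconds. */
uint64_t election_retry_ms(uint64_t node_timeout_ms) {
    uint64_t t = node_timeout_ms * 4;
    return t < 4000 ? 4000 : t;
}

typedef struct {
    uint64_t request_epoch; /* currentEpoch when FAILOVER_AUTH_REQUEST was sent */
    int acks;               /* valid FAILOVER_AUTH_ACKs counted so far */
    int num_masters;        /* masters entitled to vote */
} election;

/* Count one ACK; returns true once a majority of masters has voted. */
bool on_auth_ack(election *e, uint64_t ack_epoch) {
    if (ack_epoch >= e->request_epoch) /* older epochs belong to a previous election */
        e->acks++;
    return e->acks > e->num_masters / 2;
}
```

With five masters, three valid ACKs are needed; an ACK carrying an epoch older than the request is ignored and does not move the count.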

Replica rank

As soon as a master is in FAIL state, a replica waits a short period of time before trying to get elected. That delay is computed as follows:

DELAY = 500 milliseconds + random delay between 0 and 500 milliseconds +
        REPLICA_RANK * 1000 milliseconds.

The fixed delay ensures that we wait for the FAIL state to propagate across the cluster, otherwise the replica may try to get elected while the masters are still unaware of the FAIL state, refusing to grant their vote.

The random delay is used to desynchronize replicas so they're unlikely to start an election at the same time.

The REPLICA_RANK is the rank of this replica regarding the amount of replication data it has processed from the master. Replicas exchange messages when the master is failing in order to establish a (best effort) rank: the replica with the most updated replication offset is at rank 0, the second most updated at rank 1, and so forth. In this way the most updated replicas try to get elected before others.

Rank order is not strictly enforced; if a replica of higher rank fails to be elected, the others will try shortly.
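A minimal C sketch of the delay formula follows, together with one plausible way to derive REPLICA_RANK from replication offsets: counting the other replicas whose offset is strictly greater. The function names are hypothetical, rand() stands in for whatever randomness source an implementation uses, and the jitter in this sketch covers 0 to 499 ms.

```c
#include <stdint.h>
#include <stdlib.h>

/* Rank 0 = most up-to-date replica: count replicas with a strictly greater offset. */
unsigned replica_rank(long long my_offset, const long long *others, size_t n) {
    unsigned rank = 0;
    for (size_t i = 0; i < n; i++)
        if (others[i] > my_offset)
            rank++;
    return rank;
}

/* DELAY = 500 ms fixed + 0..499 ms random + REPLICA_RANK * 1000 ms. */
uint64_t election_delay_ms(unsigned rank) {
    uint64_t fixed = 500;                        /* lets the FAIL state propagate */
    uint64_t jitter = (uint64_t)(rand() % 500);  /* desynchronizes replicas */
    return fixed + jitter + (uint64_t)rank * 1000;
}
```

A replica at rank 2 therefore waits between 2.5 and 3 seconds, giving the two better-ranked replicas a head start of roughly one second each.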

Once a replica wins the election, it obtains a new unique and incremental configEpoch which is higher than that of any other existing master. It starts advertising itself as master in ping and pong packets, providing the set of served slots with a configEpoch that will win over the past ones.

In order to speed up the reconfiguration of other nodes, a pong packet is broadcast to all the nodes of the cluster. Currently unreachable nodes will eventually be reconfigured when they receive a ping or pong packet from another node, or when they receive an UPDATE packet from another node because the information they publish via heartbeat packets is detected to be out of date.

The other nodes will detect that there is a new master serving the same slots served by the old master but with a greater configEpoch, and will upgrade their configuration. Replicas of the old master (or the failed over master if it rejoins the cluster) will not just upgrade the configuration but will also reconfigure to replicate from the new master. How nodes rejoining the cluster are configured is explained in the next sections.

Masters reply to replica vote request

In the previous section, we discussed how replicas try to get elected. This section explains what happens from the point of view of a master that is requested to vote for a given replica.

Masters receive requests for votes in form of FAILOVER_AUTH_REQUEST requests from replicas.

For a vote to be granted the following conditions need to be met:

  1. A master only votes a single time for a given epoch, and refuses to vote for older epochs: every master has a lastVoteEpoch field and will refuse to vote again as long as the currentEpoch in the auth request packet is not greater than the lastVoteEpoch. When a master replies positively to a vote request, the lastVoteEpoch is updated accordingly, and safely stored on disk.
  2. A master votes for a replica only if the replica's master is flagged as FAIL.
  3. Auth requests with a currentEpoch that is less than the master currentEpoch are ignored. Because of this, the master reply will always have the same currentEpoch as the auth request. If the same replica asks again to be voted, incrementing the currentEpoch, it is guaranteed that an old delayed reply from the master cannot be accepted for the new vote.

Example of the issue caused by not using rule number 3:

Master currentEpoch is 5, lastVoteEpoch is 1 (this may happen after a few failed elections)

  • Replica currentEpoch is 3.
  • Replica tries to be elected with epoch 4 (3+1), master replies with an ok with currentEpoch 5, however the reply is delayed.
  • Replica will try to be elected again, at a later time, with epoch 5 (4+1), the delayed reply reaches the replica with currentEpoch 5, and is accepted as valid.
  4. Masters don't vote for a replica of the same master before NODE_TIMEOUT * 2 has elapsed if a replica of that master was already voted for. This is not strictly required, as it is not possible for two replicas to win the election in the same epoch. However, in practical terms it ensures that when a replica is elected it has plenty of time to inform the other replicas, avoiding the possibility that another replica wins a new election and performs an unnecessary second failover.
  5. Masters make no effort to select the best replica in any way. If the replica's master is in FAIL state and the master did not vote in the current term, a positive vote is granted. The best replica is the most likely to start an election and win it before the other replicas, since its higher rank usually lets it start the voting process earlier, as explained in the previous section.
  6. When a master refuses to vote for a given replica, there is no negative response; the request is simply ignored.
  7. Masters don't vote for replicas sending a configEpoch that is less than any configEpoch in the master table for the slots claimed by the replica. Remember that the replica sends the configEpoch of its master and the bitmap of the slots served by its master. This means that the replica requesting the vote must have a configuration for the slots it wants to fail over that is newer than or equal to that of the master granting the vote.
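The vote-granting conditions listed above can be condensed into a single predicate. The C sketch below is illustrative only: the field names are hypothetical, the caller is assumed to have already determined whether the requester's master is flagged FAIL and the greatest configEpoch it knows for the claimed slots, and persisting lastVoteEpoch to disk is only noted in a comment.

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t current_epoch;   /* master's currentEpoch */
    uint64_t last_vote_epoch; /* epoch of the last granted vote */
} voter_state;

typedef struct {
    uint64_t epoch;                /* currentEpoch in FAILOVER_AUTH_REQUEST */
    bool master_is_fail;           /* requester's master flagged FAIL locally */
    uint64_t claimed_config_epoch; /* configEpoch the replica sends for its master */
    uint64_t local_config_epoch;   /* greatest configEpoch we know for the claimed slots */
    uint64_t last_voted_ms;        /* last vote for a replica of the same master, 0 = never */
} vote_request;

/* Returns true if a FAILOVER_AUTH_ACK should be sent; false means silently ignore. */
bool grant_vote(voter_state *v, const vote_request *r,
                uint64_t now_ms, uint64_t node_timeout_ms) {
    if (r->epoch < v->current_epoch) return false;    /* stale request */
    if (r->epoch <= v->last_vote_epoch) return false; /* already voted in this epoch */
    if (!r->master_is_fail) return false;             /* master must be in FAIL state */
    if (r->last_voted_ms != 0 &&
        now_ms - r->last_voted_ms < node_timeout_ms * 2)
        return false;                                 /* voted for a sibling too recently */
    if (r->claimed_config_epoch < r->local_config_epoch)
        return false;                                 /* requester has an old configuration */
    v->last_vote_epoch = r->epoch; /* a real node would also fsync this to disk */
    return true;                   /* no attempt is made to pick the "best" replica */
}
```

A false return corresponds to the silent refusal described above: the caller simply sends no reply.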

Practical example of configuration epoch usefulness during partitions

This section illustrates how the epoch concept is used to make the replica promotion process more resistant to partitions.

  • A master becomes unreachable indefinitely. The master has three replicas: A, B, and C.
  • Replica A wins the election and is promoted to master.
  • A network partition makes A not available for the majority of the cluster.
  • Replica B wins the election and is promoted as master.
  • A partition makes B not available for the majority of the cluster.
  • The previous partition is fixed, and A is available again.

At this point B is down and A is available again with a role of master (actually UPDATE messages would reconfigure it promptly, but here we assume all UPDATE messages were lost). At the same time, replica C will try to get elected in order to fail over B. This is what happens:

  1. C will try to get elected and will succeed, since for the majority of masters its master is actually down. It will obtain a new incremental configEpoch.
  2. A will not be able to claim to be the master for its hash slots, because the other nodes already have the same hash slots associated with a higher configuration epoch (the one of B) compared to the one published by A.
  3. So, all the nodes will upgrade their table to assign the hash slots to C, and the cluster will continue its operations.

As you'll see in the next sections, a stale node rejoining a cluster will usually get notified as soon as possible about the configuration change because as soon as it pings any other node, the receiver will detect it has stale information and will send an UPDATE message.

Hash slots configuration propagation

An important part of Redis Cluster is the mechanism used to propagate the information about which cluster node is serving a given set of hash slots. This is vital to both the startup of a fresh cluster and the ability to upgrade the configuration after a replica was promoted to serve the slots of its failing master.

The same mechanism allows nodes partitioned away for an indefinite amount of time to rejoin the cluster in a sensible way.

There are two ways hash slot configurations are propagated:

  1. Heartbeat messages. The sender of a ping or pong packet always adds information about the set of hash slots it (or its master, if it is a replica) serves.
  2. UPDATE messages. Since in every heartbeat packet there is information about the sender configEpoch and set of hash slots served, if a receiver of a heartbeat packet finds the sender information is stale, it will send a packet with new information, forcing the stale node to update its info.

The receiver of a heartbeat or UPDATE message uses certain simple rules in order to update its table mapping hash slots to nodes. When a new Redis Cluster node is created, its local hash slot table is simply initialized to NULL entries so that each hash slot is not bound or linked to any node. This looks similar to the following:

0 -> NULL
1 -> NULL
2 -> NULL
...
16383 -> NULL

The first rule followed by a node in order to update its hash slot table is the following:

Rule 1: If a hash slot is unassigned (set to NULL), and a known node claims it, I'll modify my hash slot table and associate the claimed hash slots to it.

So if we receive a heartbeat from node A claiming to serve hash slots 1 and 2 with a configuration epoch value of 3, the table will be modified to:

0 -> NULL
1 -> A [3]
2 -> A [3]
...
16383 -> NULL

When a new cluster is created, a system administrator needs to manually assign (using the CLUSTER ADDSLOTS command, via the redis-cli command line tool, or by any other means) the slots served by each master node only to the node itself, and the information will rapidly propagate across the cluster.

However this rule is not enough. We know that hash slot mapping can change during two events:

  1. A replica replaces its master during a failover.
  2. A slot is resharded from a node to a different one.

For now let's focus on failovers. When a replica fails over its master, it obtains a configuration epoch which is guaranteed to be greater than the one of its master (and more generally greater than any other configuration epoch generated previously). For example node B, which is a replica of A, may failover A with configuration epoch of 4. It will start to send heartbeat packets (the first time mass-broadcasting cluster-wide) and because of the following second rule, receivers will update their hash slot tables:

Rule 2: If a hash slot is already assigned, and a known node is advertising it using a configEpoch that is greater than the configEpoch of the master currently associated with the slot, I'll rebind the hash slot to the new node.

So after receiving messages from B that claim to serve hash slots 1 and 2 with configuration epoch of 4, the receivers will update their table in the following way:

0 -> NULL
1 -> B [4]
2 -> B [4]
...
16383 -> NULL

Liveness property: because of the second rule, eventually all nodes in the cluster will agree that the owner of a slot is the one with the greatest configEpoch among the nodes advertising it.

This mechanism in Redis Cluster is called last failover wins.
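Rule 1 and Rule 2 together amount to a per-slot comparison. The following C sketch is a simplification that assumes nodes are identified by pointers and that each node's configEpoch is already known locally; the type and function names are hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

#define CLUSTER_SLOTS 16384

typedef struct {
    const char *name;      /* hypothetical identifier, for readability only */
    uint64_t config_epoch; /* configEpoch the node advertises */
} node;

/* Local view of the cluster: NULL means the slot is not bound to any node. */
static const node *slot_table[CLUSTER_SLOTS];

/* Apply one claim for `slot` received in a heartbeat or UPDATE message. */
void apply_slot_claim(const node *sender, int slot) {
    const node *owner = slot_table[slot];
    if (owner == NULL)
        slot_table[slot] = sender;  /* Rule 1: unassigned slot */
    else if (sender->config_epoch > owner->config_epoch)
        slot_table[slot] = sender;  /* Rule 2: greater configEpoch wins */
}

const node *slot_owner(int slot) { return slot_table[slot]; }
```

Replaying the example above, A claiming slots 1 and 2 with epoch 3 fills the empty entries, B later claiming them with epoch 4 rebinds both, and a stale claim from A with epoch 3 is then ignored.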

The same happens during resharding. When a node importing a hash slot completes the import operation, its configuration epoch is incremented to make sure the change will be propagated throughout the cluster.

UPDATE messages, a closer look

With the previous section in mind, it is easier to see how update messages work. Node A may rejoin the cluster after some time. It will send heartbeat packets where it claims it serves hash slots 1 and 2 with configuration epoch of 3. All the receivers with updated information will instead see that the same hash slots are associated with node B having a higher configuration epoch. Because of this they'll send an UPDATE message to A with the new configuration for the slots. A will update its configuration because of the rule 2 above.

How nodes rejoin the cluster

The same basic mechanism is used when a node rejoins a cluster. Continuing with the example above, node A will be notified that hash slots 1 and 2 are now served by B. Assuming that these two were the only hash slots served by A, the count of hash slots served by A will drop to 0! So A will reconfigure to be a replica of the new master.

The actual rule followed is a bit more complex than this. In general, A may rejoin after a long time, and in the meantime the hash slots originally served by A may have ended up served by multiple nodes: for example, hash slot 1 may be served by B, and hash slot 2 by C.

So the actual Redis Cluster node role switch rule is: A master node will change its configuration to replicate (be a replica of) the node that stole its last hash slot.

During reconfiguration, eventually the number of served hash slots will drop to zero, and the node will reconfigure accordingly. Note that in the base case this just means that the old master will be a replica of the replica that replaced it after a failover. However in the general form the rule covers all possible cases.

Replicas do exactly the same: they reconfigure to replicate the node that stole the last hash slot of its former master.
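The role switch rule can be folded into slot rebinding: whenever a slot owned by the local master is rebound to another node and the local slot count reaches zero, the local node starts replicating that node. The C sketch below makes the rebinding decision with Rule 1 and Rule 2 from the previous sections and uses hypothetical names; a replica would apply the same check to its former master's slots.

```c
#include <stdint.h>
#include <stddef.h>

#define NSLOTS 16384

typedef struct node {
    const char *name;
    uint64_t config_epoch;
    int slots_served;         /* slots bound to this node in our table */
    struct node *replica_of;  /* NULL while acting as a master */
} node;

static node *table[NSLOTS];

/* `me` is the local node; `claimer` advertises ownership of `slot`. */
void rebind_slot(node *me, node *claimer, int slot) {
    node *old = table[slot];
    if (old != NULL && claimer->config_epoch <= old->config_epoch)
        return;               /* neither Rule 1 nor Rule 2 applies */
    if (old != NULL)
        old->slots_served--;
    table[slot] = claimer;
    claimer->slots_served++;
    /* The node that stole our last slot becomes our new master. */
    if (old == me && me->slots_served == 0 && me->replica_of == NULL)
        me->replica_of = claimer;
}
```

In the scenario above, losing slot 1 to B leaves A with one slot, and losing slot 2 to C makes A a replica of C.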

Replica migration

Redis Cluster implements a concept called replica migration in order to improve the availability of the system. The idea is that in a cluster with a master-replica setup, if the map between replicas and masters is fixed, availability is limited over time when multiple independent failures of single nodes happen.

For example, in a cluster where every master has a single replica, the cluster can continue operations as long as either the master or the replica fails, but not if both fail at the same time. However, there is a class of failures, the independent failures of single nodes caused by hardware or software issues, that can accumulate over time. For example:

  • Master A has a single replica A1.
  • Master A fails. A1 is promoted as new master.
  • Three hours later A1 fails in an independent manner (unrelated to the failure of A). No other replica is available for promotion since node A is still down. The cluster cannot continue normal operations.

If the map between masters and replicas is fixed, the only way to make the cluster more resistant to the above scenario is to add replicas to every master, however this is costly as it requires more instances of Redis to be executed, more memory, and so forth.

An alternative is to create an asymmetry in the cluster, and let the cluster layout automatically change over time. For example the cluster may have three masters A, B, C. A and B have a single replica each, A1 and B1. However the master C is different and has two replicas: C1 and C2.

Replica migration is the process of automatically reconfiguring a replica so that it migrates to a master that no longer has coverage (no working replicas). With replica migration, the scenario mentioned above turns into the following:

  • Master A fails. A1 is promoted.
  • C2 migrates as a replica of A1, which is otherwise not backed by any replica.
  • Three hours later A1 fails as well.
  • C2 is promoted as new master to replace A1.
  • The cluster can continue operating.

Replica migration algorithm

The migration algorithm does not use any form of agreement since the replica layout in a Redis Cluster is not part of the cluster configuration that needs to be consistent and/or versioned with config epochs. Instead it uses an algorithm to avoid mass-migration of replicas when a master is not backed. The algorithm guarantees that eventually (once the cluster configuration is stable) every master will be backed by at least one replica.

This is how the algorithm works. To start, we need to define what a good replica is in this context: a good replica is a replica not in FAIL state from the point of view of a given node.

The execution of the algorithm is triggered in every replica that detects that there is at least one master without good replicas. However, among all the replicas detecting this condition, only a subset should act. This subset is often a single replica, unless different replicas have a slightly different view of the failure state of other nodes at a given moment.

The acting replica is the replica among the masters with the maximum number of attached replicas, that is not in FAIL state and has the smallest node ID.

So, for example, if there are 10 masters with 1 replica each, and 2 masters with 5 replicas each, the replica that will try to migrate is, among the 2 masters having 5 replicas, the one with the lowest node ID. Given that no agreement is used, it is possible that when the cluster configuration is not stable, a race condition occurs where multiple replicas believe themselves to be the non-failing replica with the lowest node ID (this is unlikely to happen in practice). If this happens, the result is multiple replicas migrating to the same master, which is harmless. If the race happens in a way that leaves the ceding master without replicas, as soon as the cluster is stable again the algorithm will be re-executed and will migrate a replica back to the original master.

Eventually every master will be backed by at least one replica. However, the normal behavior is that a single replica migrates from a master with multiple replicas to an orphaned master.

The algorithm is controlled by a user-configurable parameter called cluster-migration-barrier: the number of good replicas a master must be left with before a replica can migrate away. For example, if this parameter is set to 2, a replica can try to migrate only if its master remains with two working replicas.
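The choice of the acting replica and the migration barrier check can be sketched as a purely local decision. The C sketch below follows the description above under simplifying assumptions: the local view of the cluster is passed in as an array of replicas, masters are identified by index, node IDs are compared with strcmp (real node IDs are fixed-length hex strings), and all names are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_MASTERS 64

typedef struct {
    const char *id;  /* replica node ID */
    int master;      /* index of the master it replicates */
    bool failed;     /* in FAIL state from the local point of view */
} replica;

/* Should replica `me` migrate to an orphaned master? */
bool should_migrate(const replica *reps, int nreps, int nmasters, int me, int barrier) {
    int good[MAX_MASTERS] = {0};
    if (nmasters > MAX_MASTERS) return false;
    for (int i = 0; i < nreps; i++)
        if (!reps[i].failed) good[reps[i].master]++;

    bool orphan = false;
    int max_good = 0;
    for (int m = 0; m < nmasters; m++) {
        if (good[m] == 0) orphan = true;
        if (good[m] > max_good) max_good = good[m];
    }
    if (!orphan || reps[me].failed) return false;

    int my_master = reps[me].master;
    /* Barrier: my master must keep at least `barrier` good replicas after I leave. */
    if (good[my_master] - 1 < barrier) return false;
    /* Only replicas of the masters with the most good replicas are candidates. */
    if (good[my_master] != max_good) return false;
    /* Among the candidates, only the one with the smallest node ID acts. */
    for (int i = 0; i < nreps; i++) {
        if (i == me || reps[i].failed || good[reps[i].master] != max_good) continue;
        if (strcmp(reps[i].id, reps[me].id) < 0) return false;
    }
    return true;
}
```

With masters 0, 1, and 2, where master 0 has lost its only replica, master 1 has one replica, and master 2 has two, only the lower-ID replica of master 2 migrates when the barrier is 1, and none migrates when the barrier is 2.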

configEpoch conflicts resolution algorithm

When new configEpoch values are created via replica promotion during failovers, they are guaranteed to be unique.

However, there are two distinct events where new configEpoch values are created in an unsafe way, by just incrementing the currentEpoch of the local node and hoping there are no conflicts at the same time. Both events are triggered by the system administrator:

  1. CLUSTER FAILOVER command with TAKEOVER option is able to manually promote a replica node into a master without the majority of masters being available. This is useful, for example, in multi data center setups.
  2. Migration of slots for cluster rebalancing also generates new configuration epochs inside the local node without agreement for performance reasons.

Specifically, during manual resharding, when a hash slot is migrated from a node A to a node B, the resharding program will force B to upgrade its configuration to an epoch which is the greatest found in the cluster, plus 1 (unless the node is already the one with the greatest configuration epoch), without requiring agreement from other nodes. Usually a real-world resharding involves moving several hundred hash slots (especially in small clusters). Requiring an agreement to generate new configuration epochs during resharding, for each hash slot moved, is inefficient. Moreover, it would require an fsync on each of the cluster nodes every time in order to store the new configuration. Because of the way it is performed instead, a new config epoch is only needed when the first hash slot is moved, making it much more efficient in production environments.

However because of the two cases above, it is possible (though unlikely) to end with multiple nodes having the same configuration epoch. A resharding operation performed by the system administrator, and a failover happening at the same time (plus a lot of bad luck) could cause currentEpoch collisions if they are not propagated fast enough.

Moreover, software bugs and filesystem corruptions can also contribute to multiple nodes having the same configuration epoch.

When masters serving different hash slots have the same configEpoch, there are no issues. It is more important that replicas failing over a master have unique configuration epochs.

That said, manual interventions or resharding may change the cluster configuration in different ways. The Redis Cluster main liveness property requires that slot configurations always converge, so under every circumstance we really want all the master nodes to have a different configEpoch.

In order to enforce this, a conflict resolution algorithm is used in the event that two nodes end up with the same configEpoch.

  • IF a master node detects another master node is advertising itself with the same configEpoch.
  • AND IF the node has a lexicographically smaller Node ID compared to the other node claiming the same configEpoch.
  • THEN it increments its currentEpoch by 1, and uses it as the new configEpoch.

If there is any set of nodes with the same configEpoch, all the nodes except the one with the greatest Node ID will move forward, guaranteeing that, eventually, every node will pick a unique configEpoch regardless of what happened.
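The conflict resolution rule is small enough to show directly. This C sketch assumes the caller has already established that the sender is a master; the struct and function names are hypothetical, and persisting the new epochs to nodes.conf is only noted in a comment.

```c
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *id;         /* this node's Node ID */
    uint64_t config_epoch;
    uint64_t current_epoch;
} master_state;

/* Called when another master advertises itself with sender_config_epoch. */
void handle_config_epoch_collision(master_state *me, const char *sender_id,
                                   uint64_t sender_config_epoch) {
    if (sender_config_epoch != me->config_epoch) return; /* no collision */
    if (strcmp(me->id, sender_id) >= 0) return;          /* only the smaller Node ID moves */
    me->current_epoch++;
    me->config_epoch = me->current_epoch;
    /* A real node would now save and fsync nodes.conf. */
}
```

If nodes "aaaa" and "bbbb" both hold configEpoch 7, only "aaaa" moves forward, to currentEpoch + 1, while "bbbb" keeps 7.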

This mechanism also guarantees that after a fresh cluster is created, all nodes start with a different configEpoch (even if this is not actually used) since redis-cli makes sure to use CLUSTER SET-CONFIG-EPOCH at startup. However if for some reason a node is left misconfigured, it will update its configuration to a different configuration epoch automatically.

Node resets

Nodes can be software reset (without restarting them) in order to be reused in a different role or in a different cluster. This is useful in normal operations, in testing, and in cloud environments where a given node can be reprovisioned to join a different set of nodes to enlarge or create a new cluster.

In Redis Cluster nodes are reset using the CLUSTER RESET command. The command is provided in two variants:

  • CLUSTER RESET SOFT
  • CLUSTER RESET HARD

The command must be sent directly to the node to reset. If no reset type is provided, a soft reset is performed.

The following is a list of operations performed by a reset:

  1. Soft and hard reset: If the node is a replica, it is turned into a master, and its dataset is discarded. If the node is a master and contains keys the reset operation is aborted.
  2. Soft and hard reset: All the slots are released, and the manual failover state is reset.
  3. Soft and hard reset: All the other nodes in the nodes table are removed, so the node no longer knows any other node.
  4. Hard reset only: currentEpoch, configEpoch, and lastVoteEpoch are set to 0.
  5. Hard reset only: the Node ID is changed to a new random ID.

Master nodes with non-empty data sets can't be reset (since normally you want to reshard data to the other nodes). However, under special conditions when this is appropriate (e.g. when a cluster is totally destroyed with the intent of creating a new one), FLUSHALL must be executed before proceeding with the reset.

Removing nodes from a cluster

It is possible to practically remove a node from an existing cluster by resharding all its data to other nodes (if it is a master node) and shutting it down. However, the other nodes will still remember its node ID and address, and will attempt to connect with it.

For this reason, when a node is removed we also want to remove its entry from all the other nodes' tables. This is accomplished by using the CLUSTER FORGET <node-id> command.

The command does two things:

  1. It removes the node with the specified node ID from the nodes table.
  2. It sets a 60 second ban which prevents a node with the same node ID from being re-added.

The second operation is needed because Redis Cluster uses gossip in order to auto-discover nodes, so removing node X from node A could result in node B gossiping about node X to A again. Because of the 60-second ban, the Redis Cluster administration tools have 60 seconds to remove the node from all the nodes, preventing its re-addition due to auto-discovery.
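The ban described above can be sketched as a small list of node IDs with expiry times. This C sketch only models the 60-second ban, not the removal from the nodes table, and uses a fixed-size array with hypothetical names for brevity; a real implementation may store bans differently.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define FORGET_BAN_MS 60000
#define MAX_BANNED 128

typedef struct {
    char id[41];        /* node ID (40 hex chars + NUL) */
    uint64_t expire_ms; /* ban is active while now < expire_ms */
} ban_entry;

static ban_entry bans[MAX_BANNED];
static int nbans;

/* Record (or refresh) a 60-second ban for `id`. */
void forget_node(const char *id, uint64_t now_ms) {
    for (int i = 0; i < nbans; i++)
        if (strcmp(bans[i].id, id) == 0) {
            bans[i].expire_ms = now_ms + FORGET_BAN_MS;
            return;
        }
    if (nbans == MAX_BANNED) return; /* sketch: table full, ban not recorded */
    strncpy(bans[nbans].id, id, sizeof bans[nbans].id - 1);
    bans[nbans].id[sizeof bans[nbans].id - 1] = '\0';
    bans[nbans].expire_ms = now_ms + FORGET_BAN_MS;
    nbans++;
}

/* May a node learned through gossip be added to the nodes table? */
bool gossip_may_add(const char *id, uint64_t now_ms) {
    for (int i = 0; i < nbans; i++)
        if (strcmp(bans[i].id, id) == 0 && now_ms < bans[i].expire_ms)
            return false;
    return true;
}
```

A node forgotten at time 1000 ms is rejected by gossip until 61000 ms, which is the window the administration tools have to send CLUSTER FORGET to every other node.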

Further information is available in the CLUSTER FORGET documentation.

Publish/Subscribe

In a Redis Cluster, clients can subscribe to every node, and can also publish to every other node. The cluster will make sure that published messages are forwarded as needed.

Clients can send SUBSCRIBE to any node and can also send PUBLISH to any node. The cluster simply broadcasts each published message to all other nodes.

Redis 7.0 and later features sharded pub/sub, in which shard channels are assigned to slots by the same algorithm used to assign keys to slots. A shard message must be sent to a node that owns the slot the shard channel is hashed to. The cluster makes sure the published shard messages are forwarded to all nodes in the shard, so clients can subscribe to a shard channel by connecting to either the master responsible for the slot, or to any of its replicas.

Appendix

Appendix A: CRC16 reference implementation in ANSI C

/*
 * Copyright 2001-2010 Georges Menie (www.menie.org)
 * Copyright 2010 Salvatore Sanfilippo (adapted to Redis coding style)
 * All rights reserved.
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions are met:
 *
 *     * Redistributions of source code must retain the above copyright
 *       notice, this list of conditions and the following disclaimer.
 *     * Redistributions in binary form must reproduce the above copyright
 *       notice, this list of conditions and the following disclaimer in the
 *       documentation and/or other materials provided with the distribution.
 *     * Neither the name of the University of California, Berkeley nor the
 *       names of its contributors may be used to endorse or promote products
 *       derived from this software without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND ANY
 * EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
 * WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
 * DISCLAIMED. IN NO EVENT SHALL THE REGENTS AND CONTRIBUTORS BE LIABLE FOR ANY
 * DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
 * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
 * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
 * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
 * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
 * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
 */

/* CRC16 implementation according to CCITT standards.
 *
 * Note by @antirez: this is actually the XMODEM CRC 16 algorithm, using the
 * following parameters:
 *
 * Name                       : "XMODEM", also known as "ZMODEM", "CRC-16/ACORN"
 * Width                      : 16 bit
 * Poly                       : 1021 (That is actually x^16 + x^12 + x^5 + 1)
 * Initialization             : 0000
 * Reflect Input byte         : False
 * Reflect Output CRC         : False
 * Xor constant to output CRC : 0000
 * Output for "123456789"     : 31C3
 */

#include <stdint.h>

static const uint16_t crc16tab[256]= {
    0x0000,0x1021,0x2042,0x3063,0x4084,0x50a5,0x60c6,0x70e7,
    0x8108,0x9129,0xa14a,0xb16b,0xc18c,0xd1ad,0xe1ce,0xf1ef,
    0x1231,0x0210,0x3273,0x2252,0x52b5,0x4294,0x72f7,0x62d6,
    0x9339,0x8318,0xb37b,0xa35a,0xd3bd,0xc39c,0xf3ff,0xe3de,
    0x2462,0x3443,0x0420,0x1401,0x64e6,0x74c7,0x44a4,0x5485,
    0xa56a,0xb54b,0x8528,0x9509,0xe5ee,0xf5cf,0xc5ac,0xd58d,
    0x3653,0x2672,0x1611,0x0630,0x76d7,0x66f6,0x5695,0x46b4,
    0xb75b,0xa77a,0x9719,0x8738,0xf7df,0xe7fe,0xd79d,0xc7bc,
    0x48c4,0x58e5,0x6886,0x78a7,0x0840,0x1861,0x2802,0x3823,
    0xc9cc,0xd9ed,0xe98e,0xf9af,0x8948,0x9969,0xa90a,0xb92b,
    0x5af5,0x4ad4,0x7ab7,0x6a96,0x1a71,0x0a50,0x3a33,0x2a12,
    0xdbfd,0xcbdc,0xfbbf,0xeb9e,0x9b79,0x8b58,0xbb3b,0xab1a,
    0x6ca6,0x7c87,0x4ce4,0x5cc5,0x2c22,0x3c03,0x0c60,0x1c41,
    0xedae,0xfd8f,0xcdec,0xddcd,0xad2a,0xbd0b,0x8d68,0x9d49,
    0x7e97,0x6eb6,0x5ed5,0x4ef4,0x3e13,0x2e32,0x1e51,0x0e70,
    0xff9f,0xefbe,0xdfdd,0xcffc,0xbf1b,0xaf3a,0x9f59,0x8f78,
    0x9188,0x81a9,0xb1ca,0xa1eb,0xd10c,0xc12d,0xf14e,0xe16f,
    0x1080,0x00a1,0x30c2,0x20e3,0x5004,0x4025,0x7046,0x6067,
    0x83b9,0x9398,0xa3fb,0xb3da,0xc33d,0xd31c,0xe37f,0xf35e,
    0x02b1,0x1290,0x22f3,0x32d2,0x4235,0x5214,0x6277,0x7256,
    0xb5ea,0xa5cb,0x95a8,0x8589,0xf56e,0xe54f,0xd52c,0xc50d,
    0x34e2,0x24c3,0x14a0,0x0481,0x7466,0x6447,0x5424,0x4405,
    0xa7db,0xb7fa,0x8799,0x97b8,0xe75f,0xf77e,0xc71d,0xd73c,
    0x26d3,0x36f2,0x0691,0x16b0,0x6657,0x7676,0x4615,0x5634,
    0xd94c,0xc96d,0xf90e,0xe92f,0x99c8,0x89e9,0xb98a,0xa9ab,
    0x5844,0x4865,0x7806,0x6827,0x18c0,0x08e1,0x3882,0x28a3,
    0xcb7d,0xdb5c,0xeb3f,0xfb1e,0x8bf9,0x9bd8,0xabbb,0xbb9a,
    0x4a75,0x5a54,0x6a37,0x7a16,0x0af1,0x1ad0,0x2ab3,0x3a92,
    0xfd2e,0xed0f,0xdd6c,0xcd4d,0xbdaa,0xad8b,0x9de8,0x8dc9,
    0x7c26,0x6c07,0x5c64,0x4c45,0x3ca2,0x2c83,0x1ce0,0x0cc1,
    0xef1f,0xff3e,0xcf5d,0xdf7c,0xaf9b,0xbfba,0x8fd9,0x9ff8,
    0x6e17,0x7e36,0x4e55,0x5e74,0x2e93,0x3eb2,0x0ed1,0x1ef0
};

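uint16_t crc16(const char *buf, int len) {
    int counter;
    uint16_t crc = 0; /* Initialization: 0000 */
    for (counter = 0; counter < len; counter++) {
        /* XOR the high byte of the running CRC with the next input byte,
         * look up the precomputed remainder for that value, and combine it
         * with the CRC shifted left by one byte. */
        crc = (crc<<8) ^ crc16tab[((crc>>8) ^ *buf++)&0x00FF];
    }
    return crc;
}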
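As a cross-check of the table-driven routine, the same CRC (polynomial 1021, initial value 0000, no reflection, no output XOR; the parameter set commonly known as CRC-16/XMODEM) can be computed one bit at a time without a lookup table. The sketch below is our own illustration, not code from the Redis sources, and crc16_bitwise is a hypothetical name. It trades speed for readability and should agree with crc16() on every input, including the documented check value 31C3 for "123456789".

```c
#include <stdint.h>

/* Hypothetical bitwise equivalent of the table-driven crc16():
 * polynomial 0x1021, initial value 0, no input/output reflection,
 * no final XOR. */
uint16_t crc16_bitwise(const char *buf, int len) {
    uint16_t crc = 0;
    for (int i = 0; i < len; i++) {
        /* Feed the next byte into the high byte of the register. */
        crc ^= (uint16_t)((unsigned char)buf[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            /* If the top bit is set, shift and XOR in the polynomial. */
            if (crc & 0x8000)
                crc = (uint16_t)((crc << 1) ^ 0x1021);
            else
                crc = (uint16_t)(crc << 1);
        }
    }
    return crc;
}
```

For example, crc16_bitwise("123456789", 9) should return 0x31C3. The lookup table replaces the inner eight-step loop with a single indexed load per byte, which is the usual reason for preferring the table-driven form.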

4 - Redis command arguments

How Redis commands expose their documentation programmatically

The COMMAND DOCS command returns documentation-focused information about available Redis commands. The map reply that the command returns includes the arguments key. This key stores an array that describes the command's arguments.

Every element in the arguments array is a map with the following fields:

  • name: the argument's name, always present. The name of an argument is given for identification purposes alone. It isn't displayed during the command's syntax rendering.
  • type: the argument's type, always present. An argument must have one of the following types:
    • string: a string argument.
    • integer: an integer argument.
    • double: a double-precision argument.
    • key: a string that represents the name of a key.
    • pattern: a string that represents a glob-like pattern.
    • unix-time: an integer that represents a Unix timestamp.
    • pure-token: the argument is a token, meaning a reserved keyword, which may or may not be provided. Not to be confused with free-text user input.
    • oneof: the argument is a container for nested arguments. This type enables choice among several nested arguments (see the XADD example below).
    • block: the argument is a container for nested arguments. This type enables grouping arguments and applying a property (such as optional) to all (see the XADD example below).
  • key_spec_index: this value is available for every argument of the key type. It is a 0-based index of the specification in the command's key specifications that corresponds to the argument.
  • token: a constant literal that precedes the argument (user input) itself.
  • summary: a short description of the argument.
  • since: the debut Redis version of the argument (or for module commands, the module version).
  • deprecated_since: the Redis version that deprecated the argument (or for module commands, the module version).
  • flags: an array of argument flags. Possible flags are:
    • optional: denotes that the argument is optional (for example, the GET clause of the SET command).
    • multiple: denotes that the argument may be repeated (such as the key argument of DEL).
    • multiple-token: denotes the possible repetition of the argument with its preceding token (see SORT's GET pattern clause).
  • value: the argument's value. For argument types other than oneof and block, this is a string that describes the value in the command's syntax. For the oneof and block types, this is an array of nested arguments, each being a map as described in this section.

Example

The trimming clause of XADD, i.e., [MAXLEN|MINID [=|~] threshold [LIMIT count]], is represented at the top level as a block-typed argument.

It consists of four nested arguments:

  1. trimming strategy: this nested argument has a oneof type with two nested arguments. Each of the nested arguments, MAXLEN and MINID, is typed as pure-token.
  2. trimming operator: this nested argument is an optional oneof type with two nested arguments. Each of the nested arguments, = and ~, is a pure-token.
  3. threshold: this nested argument is a string.
  4. count: this nested argument is an optional integer with a token (LIMIT).

Here's XADD's arguments array:

1) 1) "name"
   2) "key"
   3) "type"
   4) "key"
   5) "value"
   6) "key"
2)  1) "name"
    2) "nomkstream"
    3) "type"
    4) "pure-token"
    5) "token"
    6) "NOMKSTREAM"
    7) "since"
    8) "6.2"
    9) "flags"
   10) 1) optional
3) 1) "name"
   2) "trim"
   3) "type"
   4) "block"
   5) "flags"
   6) 1) optional
   7) "value"
   8) 1) 1) "name"
         2) "strategy"
         3) "type"
         4) "oneof"
         5) "value"
         6) 1) 1) "name"
               2) "maxlen"
               3) "type"
               4) "pure-token"
               5) "token"
               6) "MAXLEN"
            2) 1) "name"
               2) "minid"
               3) "type"
               4) "pure-token"
               5) "token"
               6) "MINID"
               7) "since"
               8) "6.2"
      2) 1) "name"
         2) "operator"
         3) "type"
         4) "oneof"
         5) "flags"
         6) 1) optional
         7) "value"
         8) 1) 1) "name"
               2) "equal"
               3) "type"
               4) "pure-token"
               5) "token"
               6) "="
            2) 1) "name"
               2) "approximately"
               3) "type"
               4) "pure-token"
               5) "token"
               6) "~"
      3) 1) "name"
         2) "threshold"
         3) "type"
         4) "string"
         5) "value"
         6) "threshold"
      4)  1) "name"
          2) "count"
          3) "type"
          4) "integer"
          5) "token"
          6) "LIMIT"
          7) "since"
          8) "6.2"
          9) "flags"
         10) 1) optional
         11) "value"
         12) "count"
4) 1) "name"
   2) "id_or_auto"
   3) "type"
   4) "oneof"
   5) "value"
   6) 1) 1) "name"
         2) "auto_id"
         3) "type"
         4) "pure-token"
         5) "token"
         6) "*"
      2) 1) "name"
         2) "id"
         3) "type"
         4) "string"
         5) "value"
         6) "id"
5) 1) "name"
   2) "field_value"
   3) "type"
   4) "block"
   5) "flags"
   6) 1) multiple
   7) "value"
   8) 1) 1) "name"
         2) "field"
         3) "type"
         4) "string"
         5) "value"
         6) "field"
      2) 1) "name"
         2) "value"
         3) "type"
         4) "string"
         5) "value"
         6) "value"
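The nesting rules described above are enough to rebuild a syntax string from the arguments array. The following C sketch is our own illustration, not how Redis or any client renders syntax: struct arg, the ARG_* constants and render_arg() are hypothetical names, only the optional flag is handled (multiple and multiple-token are left out), and the caller is assumed to supply a large enough buffer. Feeding it the trim block transcribed from the reply above should reproduce [MAXLEN|MINID [=|~] threshold [LIMIT count]].

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory model of one element of the "arguments" array. */
enum arg_type { ARG_STRING, ARG_INTEGER, ARG_PURE_TOKEN, ARG_ONEOF, ARG_BLOCK };

struct arg {
    const char *name;            /* identification only, never rendered */
    enum arg_type type;
    const char *token;           /* literal preceding the value, or NULL */
    int optional;                /* 1 if the "optional" flag is present */
    const char *value;           /* display string for non-container types */
    const struct arg *children;  /* nested arguments for oneof and block */
    size_t nchildren;
};

/* Appends the syntax of 'a' to 'out'; 'out' must be NUL-terminated
 * and large enough for the result. */
static void render_arg(const struct arg *a, char *out) {
    if (a->optional) strcat(out, "[");
    switch (a->type) {
    case ARG_PURE_TOKEN:
        strcat(out, a->token);
        break;
    case ARG_ONEOF:   /* alternatives are separated by '|' */
    case ARG_BLOCK:   /* grouped arguments are separated by spaces */
        for (size_t i = 0; i < a->nchildren; i++) {
            if (i > 0) strcat(out, a->type == ARG_ONEOF ? "|" : " ");
            render_arg(&a->children[i], out);
        }
        break;
    default:          /* string, integer, ...: optional token, then value */
        if (a->token) {
            strcat(out, a->token);
            strcat(out, " ");
        }
        strcat(out, a->value);
        break;
    }
    if (a->optional) strcat(out, "]");
}

/* The XADD trimming clause, transcribed from the reply above. */
static const struct arg strategy_opts[] = {
    {"maxlen", ARG_PURE_TOKEN, "MAXLEN", 0, NULL, NULL, 0},
    {"minid",  ARG_PURE_TOKEN, "MINID",  0, NULL, NULL, 0},
};
static const struct arg operator_opts[] = {
    {"equal",         ARG_PURE_TOKEN, "=", 0, NULL, NULL, 0},
    {"approximately", ARG_PURE_TOKEN, "~", 0, NULL, NULL, 0},
};
static const struct arg trim_args[] = {
    {"strategy",  ARG_ONEOF,   NULL,    0, NULL,        strategy_opts, 2},
    {"operator",  ARG_ONEOF,   NULL,    1, NULL,        operator_opts, 2},
    {"threshold", ARG_STRING,  NULL,    0, "threshold", NULL,          0},
    {"count",     ARG_INTEGER, "LIMIT", 1, "count",     NULL,          0},
};
static const struct arg xadd_trim =
    {"trim", ARG_BLOCK, NULL, 1, NULL, trim_args, 4};
```

A real client would build these structures by walking the COMMAND DOCS reply rather than hard-coding them, and would bound the output buffer instead of relying on strcat.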

5 - Command key specifications

What command key specifications are and how to use them in your client

Many of the commands in Redis accept key names as input arguments. The 9th element in the reply of COMMAND (and COMMAND INFO) is an array that consists of the command's key specifications.

A key specification describes a rule for extracting the names of one or more keys from the arguments of a given command. Key specifications provide a robust and flexible mechanism, compared to the first key, last key and step scheme employed until Redis 7.0. Before introducing these specifications, Redis clients had no trivial programmatic means to extract key names for all commands.

Cluster-aware Redis clients had to hard-code the key extraction logic for commands such as EVAL and ZUNIONSTORE, which rely on a numkeys argument, or SORT and its many clauses. Alternatively, the COMMAND GETKEYS command can be used to achieve a similar extraction effect, but at a higher latency.

A Redis client isn't obligated to support key specifications. It can continue using the legacy first key, last key and step scheme, along with the movablekeys flag, both of which remain unchanged.

However, a Redis client that implements key specifications support can consolidate most of its keys' extraction logic. Even if the client encounters an unfamiliar type of key specification, it can always revert to the COMMAND GETKEYS command.

That said, most cluster-aware clients only require a single key name to perform correct command routing, so even when a command features one unfamiliar specification, its other specifications may still be usable by the client.

Key specifications are maps with the following four keys:

  1. begin_search: the starting index for keys' extraction.
  2. find_keys: the rule for identifying the keys relative to begin_search.
  3. notes: notes about this key spec, if there are any.
  4. flags: indicate the type of data access.

The begin_search value of a specification informs the client of the extraction's beginning. The value is a map. There are three types of begin_search:

  1. index: key name arguments begin at a constant index.
  2. keyword: key names start after a specific keyword (token).
  3. unknown: an unknown type of specification - see the incomplete flag section for more details.

index

The index type of begin_search indicates that input keys appear at a constant index. It is a map under the spec key with a single key:

  1. index: the 0-based index from which the client should start extracting key names.

keyword

The keyword type of begin_search means a literal token precedes key name arguments. It is a map under the spec key with two keys:

  1. keyword: the keyword (token) that marks the beginning of key name arguments.
  2. startfrom: an index to the arguments array from which the client should begin searching. This can be a negative value, which means the search should start from the end of the arguments' array, in reverse order. For example, -2's meaning is to search reverse from the penultimate argument.

Examples of begin_search specifications include:

  • SET has a begin_search specification of type index with a value of 1.
  • XREAD has a begin_search specification of type keyword with the values "STREAMS" and 1 as keyword and startfrom, respectively.
  • MIGRATE has a begin_search specification of type keyword with the values of "KEYS" and -2.
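Resolving the two begin_search types against an actual argument vector might look like the sketch below. It is our own C illustration rather than the server's implementation; struct begin_search, eq_nocase() and begin_search_index() are hypothetical names, and the keyword match is assumed to be case-insensitive. With the XREAD and MIGRATE specifications listed above, it returns the position just after STREAMS or KEYS.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical in-memory form of a begin_search specification. */
enum bs_type { BS_INDEX, BS_KEYWORD };

struct begin_search {
    enum bs_type type;
    int index;            /* BS_INDEX: 0-based position of the first key */
    const char *keyword;  /* BS_KEYWORD: token that precedes the keys */
    int startfrom;        /* BS_KEYWORD: start position; negative = from the end, in reverse */
};

/* Case-insensitive string equality. */
static int eq_nocase(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Returns the argv index where key extraction starts, or -1 if it cannot
 * be determined. argv[0] is the command name. A keyword in the last
 * position is ignored, because no key can follow it. */
static int begin_search_index(const struct begin_search *bs,
                              int argc, const char **argv) {
    if (bs->type == BS_INDEX)
        return bs->index < argc ? bs->index : -1;
    if (bs->startfrom >= 0) {
        for (int i = bs->startfrom; i < argc - 1; i++)
            if (eq_nocase(argv[i], bs->keyword))
                return i + 1;
    } else {
        int start = argc + bs->startfrom;  /* e.g. -2 = penultimate argument */
        if (start > argc - 2)
            start = argc - 2;
        for (int i = start; i >= 1; i--)   /* reverse search, skipping argv[0] */
            if (eq_nocase(argv[i], bs->keyword))
                return i + 1;
    }
    return -1;
}
```

For MIGRATE, the reverse search finds the last KEYS token first, which is also why a key literally named "KEYS" can make it extract only part of the keys.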

find_keys

The find_keys value of a key specification tells the client how to continue the search for key names. find_keys has three possible types:

  1. range: keys stop at a specific index or relative to the last argument.
  2. keynum: an additional argument specifies the number of input keys.
  3. unknown: an unknown type of specification - see the incomplete flag section for more details.

range

The range type of find_keys is a map under the spec key with three keys:

  1. lastkey: the index, relative to begin_search, of the last key argument. This can be a negative value, in which case it isn't relative. For example, -1 indicates to keep extracting keys until the last argument, -2 until one before the last, and so on.
  2. keystep: the number of arguments that should be skipped, after finding a key, to find the next one.
  3. limit: if lastkey has the value of -1, we use the limit to stop the search by a factor. 0 and 1 mean no limit. 2 means half of the remaining arguments, 3 means a third, and so on.
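The three range fields can be applied as in the following sketch, which continues from the index that begin_search produced. It is our own illustration: find_keys_range() is a hypothetical name, keystep is assumed to be at least 1, and the output array is assumed to be large enough. With a limit set, the last key lies (argc - first) / limit arguments into the remaining ones, so for XREAD with a limit of 2 only the first half of the arguments after STREAMS is treated as keys.

```c
/* Hypothetical helper for the range type of find_keys. 'first' is the
 * index produced by begin_search; key positions are written to 'out'
 * (assumed large enough) and their number is returned. */
static int find_keys_range(int first, int lastkey, int keystep, int limit,
                           int argc, int *out) {
    int last;
    if (lastkey >= 0)
        last = first + lastkey;                          /* relative to begin_search */
    else if (limit <= 1)
        last = argc + lastkey;                           /* -1: last argument, -2: one before */
    else
        last = first + (argc - first) / limit + lastkey; /* lastkey is -1 here */
    int n = 0;
    for (int i = first; i <= last && i < argc; i += keystep)
        out[n++] = i;
    return n;
}
```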

keynum

The keynum type of find_keys is a map under the spec key with three keys:

  • keynumidx: the index, relative to begin_search, of the argument containing the number of keys.
  • firstkey: the index, relative to begin_search, of the first key. This is usually the next argument after keynumidx, and its value, in this case, is greater by one.
  • keystep: the number of arguments that should be skipped, after finding a key, to find the next one.

Examples:

  • The SET command has a range of 0, 1 and 0.
  • The MSET command has a range of -1, 2 and 0.
  • The XREAD command has a range of -1, 1 and 2.
  • The ZUNION command has a begin_search of type index with the value 1, and find_keys of type keynum with values of 0, 1 and 1.
  • The AI.DAGRUN command has a begin_search of type keyword with values of "LOAD" and 1, and find_keys of type keynum with values of 0, 1 and 1.
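For the keynum type, the number of keys is read from the arguments themselves. The sketch below is our own C illustration, with find_keys_keynum() as a hypothetical name; it rejects a count that is not a non-negative integer or that runs past the end of the arguments. With ZUNION's values of 0, 1 and 1 and a begin_search index of 1, the call ZUNION 2 a b WEIGHTS 1 2 yields the keys a and b at positions 2 and 3.

```c
#include <stdlib.h>

/* Hypothetical helper for the keynum type of find_keys. 'first' is the
 * begin_search index; returns the number of key positions written to
 * 'out', or -1 if the numkeys argument is missing, malformed, or promises
 * more keys than there are arguments. */
static int find_keys_keynum(int first, int keynumidx, int firstkey, int keystep,
                            int argc, const char **argv, int *out) {
    int numidx = first + keynumidx;   /* argument holding the key count */
    if (numidx >= argc)
        return -1;
    char *end;
    long numkeys = strtol(argv[numidx], &end, 10);
    if (*argv[numidx] == '\0' || *end != '\0' || numkeys < 0)
        return -1;
    int n = 0;
    int i = first + firstkey;         /* first key, relative to begin_search */
    for (long k = 0; k < numkeys; k++, i += keystep) {
        if (i >= argc)
            return -1;
        out[n++] = i;
    }
    return n;
}
```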

Note: this isn't a perfect solution as the module writers can come up with anything. However, this mechanism should allow the extraction of key name arguments for the vast majority of commands.

notes

Notes about non-obvious key spec considerations, if applicable.

flags

A key specification can have additional flags that provide more details about the key. These flags are divided into three groups, as described below.

Access type flags

The following flags declare the type of access the command uses to a key's value or its metadata. A key's metadata includes LRU/LFU counters, type, and cardinality. These flags do not relate to the reply sent back to the client.

Every key specification has precisely one of the following flags:

  • RW: the read-write flag. The command modifies the data stored in the value of the key or its metadata. This flag marks every operation that isn't distinctly a delete, an overwrite, or read-only.
  • RO: the read-only flag. The command only reads the value of the key (although it doesn't necessarily return it).
  • OW: the overwrite flag. The command overwrites the data stored in the value of the key.
  • RM: the remove flag. The command deletes the key.

Logical operation flags

The following flags declare the type of operations performed on the data stored as the key's value and its TTL (if any), not the metadata. These flags describe the logical operation that the command executes on data, driven by the input arguments. The flags do not relate to modifying or returning metadata (such as a key's type, cardinality, or existence).

Every key specification may include the following flag:

  • access: the access flag. This flag indicates that the command returns, copies, or somehow uses the user's data that's stored in the key.

In addition, the specification may include precisely one of the following:

  • update: the update flag. The command updates the data stored in the key's value. The new value may depend on the old value. This flag marks every operation that isn't distinctly an insert or a delete.
  • insert: the insert flag. The command only adds data to the value; existing data isn't modified or deleted.
  • delete: the delete flag. The command explicitly deletes data from the value stored at the key.
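The access-type and logical-operation rules above can be checked mechanically if a client stores the flags as a bitmask. The encoding below (the KS_* constants, count_bits() and key_spec_flags_valid()) is hypothetical and only for illustration; it verifies that exactly one of RW, RO, OW and RM is set and that at most one of update, insert and delete is set. The miscellaneous flags are left out.

```c
/* Hypothetical bitmask encoding of the key specification flags. */
enum key_spec_flag {
    KS_RW = 1 << 0, KS_RO = 1 << 1, KS_OW = 1 << 2, KS_RM = 1 << 3, /* access type */
    KS_ACCESS = 1 << 4,                                             /* data is used */
    KS_UPDATE = 1 << 5, KS_INSERT = 1 << 6, KS_DELETE = 1 << 7,     /* logical operation */
};

/* Counts the bits set in 'x'. */
static int count_bits(unsigned x) {
    int n = 0;
    for (; x; x &= x - 1)
        n++;
    return n;
}

/* Returns 1 if 'flags' has exactly one access-type flag and at most one
 * of update/insert/delete; 0 otherwise. */
static int key_spec_flags_valid(unsigned flags) {
    unsigned access_type = flags & (KS_RW | KS_RO | KS_OW | KS_RM);
    unsigned logical_op = flags & (KS_UPDATE | KS_INSERT | KS_DELETE);
    return count_bits(access_type) == 1 && count_bits(logical_op) <= 1;
}
```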

Miscellaneous flags

Key specifications may have the following flags:

  • not_key: this flag indicates that the specified argument isn't a key. This argument is treated the same as a key when computing which slot a command should be assigned to for Redis cluster. For all other purposes this argument should not be considered a key.
  • incomplete: this flag is explained below.
  • variable_flags: this flag is explained below.

incomplete

Some commands feature exotic approaches when it comes to specifying their keys, which makes extraction difficult. Consider, for example, what would happen with a call to MIGRATE that includes the literal string "KEYS" as an argument to its AUTH clause. Our key specifications would miss the mark, and extraction would begin at the wrong index.

Thus, we recognize that key specifications are incomplete and may fail to extract all keys. However, we guarantee that even incomplete specifications never yield the wrong names of keys, provided that the command is syntactically correct.

In the case of MIGRATE, the search begins near the end and proceeds in reverse (startfrom has the value of -2). If and when we encounter a key named "KEYS", we'll only extract the subset of the key name arguments after it. That's why MIGRATE has the incomplete flag in its key specification.

Another case of incompleteness is the SORT command. Here, the begin_search and find_keys are of type unknown. The client should revert to calling the COMMAND GETKEYS command to extract key names from the arguments, short of implementing it natively. The difficulty arises, for example, because the string "STORE" is both a keyword (token) and a valid literal argument for SORT.

Note: the only commands with incomplete key specifications are SORT and MIGRATE. We don't expect the addition of such commands in the future.

variable_flags

In some commands, the flags for the same key name argument can depend on other arguments. For example, consider the SET command and its optional GET argument. Without the GET argument, SET is write-only, but it becomes a read and write command with it. When this flag is present, it means that the key specification flags cover all possible options, but the effective flags depend on other arguments.

Examples

SET's key specifications

  1) 1) "flags"
     2) 1) RW
        2) access
        3) update
     3) "begin_search"
     4) 1) "type"
        2) "index"
        3) "spec"
        4) 1) "index"
           2) (integer) 1
     5) "find_keys"
     6) 1) "type"
        2) "range"
        3) "spec"
        4) 1) "lastkey"
           2) (integer) 0
           3) "keystep"
           4) (integer) 1
           5) "limit"
           6) (integer) 0

ZUNION's key specifications

  1) 1) "flags"
     2) 1) RO
        2) access
     3) "begin_search"
     4) 1) "type"
        2) "index"
        3) "spec"
        4) 1) "index"
           2) (integer) 1
     5) "find_keys"
     6) 1) "type"
        2) "keynum"
        3) "spec"
        4) 1) "keynumidx"
           2) (integer) 0
           3) "firstkey"
           4) (integer) 1
           5) "keystep"
           6) (integer) 1

6 - Redis command tips

Programm

Command tips are an array of strings. These provide Redis clients with additional information about the command. The information can instruct Redis Cluster clients as to how the command should be executed and its output processed in a clustered deployment.

Unlike the command's flags (see the 3rd element of COMMAND's reply), which are strictly internal to the server's operation, tips don't serve any purpose other than being reported to clients.

Command tips are arbitrary strings. However, the following sections describe proposed tips and demonstrate the conventions they are likely to adhere to.

nondeterministic_output

This tip indicates that the command's output isn't deterministic. That means that calls to the command may yield different results with the same arguments and data. That difference could be the result of the command's random nature (e.g., RANDOMKEY and SPOP); the call's timing (e.g. TTL); or generic differences that relate to the server's state (e.g. INFO and CLIENT LIST).

Note: prior to Redis 7.0, this tip was the random command flag.

nondeterministic_output_order

The existence of this tip indicates that the command's output is deterministic, but its ordering is random (e.g. HGETALL and SMEMBERS).

Note: prior to Redis 7.0, this tip was the sort_for_script flag.

request_policy

This tip can help clients determine which shard(s) to send the command to in cluster mode. The default behavior a client should implement for commands without the request_policy tip is as follows:

  1. The command doesn't accept key name arguments: the client can execute the command on an arbitrary shard.
  2. For commands that accept one or more key name arguments: the client should route the command to a single shard, as determined by the hash slot of the input keys.

In cases where the client should adopt a behavior different from the default, the request_policy tip can be one of:

  • all_nodes: the client should execute the command on all nodes, masters and replicas alike. An example is the CONFIG SET command. This tip is used by commands that don't accept key name arguments. The command operates atomically per shard.
  • all_shards: the client should execute the command on all master shards (e.g., the DBSIZE command). This tip is used by commands that don't accept key name arguments. The command operates atomically per shard.
  • multi_shard: the client should execute the command on several shards. The shards that execute the command are determined by the hash slots of its input key name arguments. Examples of such commands include MSET, MGET and DEL. However, note that SUNIONSTORE isn't considered multi_shard because all of its keys must belong to the same hash slot.
  • special: indicates a non-trivial form of the client's request policy, such as the SCAN command.

response_policy

This tip can help clients determine the aggregate they need to compute from the replies of multiple shards in a cluster. The default behavior for commands without a response_policy tip only applies to replies of nested types (i.e., an array, a set, or a map). The client's implementation for the default behavior should be as follows:

  1. The command doesn't accept key name arguments: the client can aggregate all replies within a single nested data structure. For example, the array replies we get from calling KEYS against all shards should be packed into a single array, in no particular order.
  2. For commands that accept one or more key name arguments: the client needs to retain the same order of replies as the input key names. For example, MGET's aggregated reply.

The response_policy tip is set for commands that reply with scalar data types, or when it's expected that clients implement a non-default aggregate. This tip can be one of:

  • one_succeeded: the client should return success if at least one shard didn't reply with an error. The client should reply with the first non-error reply it obtains. If all shards return an error, the client can reply with any one of these. For example, consider a SCRIPT KILL command that's sent to all shards. Although the script should be loaded in all of the cluster's shards, the SCRIPT KILL will typically run only on one at a given time.
  • all_succeeded: the client should return successfully only if there are no error replies. Even a single error reply should disqualify the aggregate and be returned. Otherwise, the client should return one of the non-error replies. As an example, consider the CONFIG SET, SCRIPT FLUSH and SCRIPT LOAD commands.
  • agg_logical_and: the client should return the result of a logical AND operation on all replies (only applies to integer replies, usually from commands that return either 0 or 1). Consider the SCRIPT EXISTS command as an example. It returns an array of 0's and 1's that denote the existence of its given SHA1 sums in the script cache. The aggregated response should be 1 only when all shards had reported that a given script SHA1 sum is in their respective cache.
  • agg_logical_or: the client should return the result of a logical OR operation on all replies (only applies to integer replies, usually from commands that return either 0 or 1).
  • agg_min: the client should return the minimal value from the replies (only applies to numerical replies). The aggregate reply from a cluster-wide WAIT command, for example, should be the minimal value (number of synchronized replicas) from all shards.
  • agg_max: the client should return the maximal value from the replies (only applies to numerical replies).
  • agg_sum: the client should return the sum of replies (only applies to numerical replies). Example: DBSIZE.
  • special: this type of tip indicates a non-trivial form of reply policy. INFO is an excellent example of that.
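The integer aggregates listed above amount to folding the per-shard replies into one value. This C sketch is our own illustration: enum agg_policy and aggregate_replies() are hypothetical names, at least one reply is assumed, and the one_succeeded, all_succeeded and special policies, which depend on error handling, are not covered. For an array reply such as SCRIPT EXISTS, the same fold would be applied to each position of the arrays.

```c
#include <stddef.h>

/* Hypothetical subset of response_policy values that aggregate numbers. */
enum agg_policy { AGG_LOGICAL_AND, AGG_LOGICAL_OR, AGG_MIN, AGG_MAX, AGG_SUM };

/* Folds the integer replies collected from each shard (n >= 1) into
 * the single value the client should return. */
static long long aggregate_replies(enum agg_policy policy,
                                   const long long *replies, size_t n) {
    long long acc = replies[0];
    for (size_t i = 1; i < n; i++) {
        long long r = replies[i];
        switch (policy) {
        case AGG_LOGICAL_AND: acc = (acc && r); break;
        case AGG_LOGICAL_OR:  acc = (acc || r); break;
        case AGG_MIN:         if (r < acc) acc = r; break;
        case AGG_MAX:         if (r > acc) acc = r; break;
        case AGG_SUM:         acc += r; break;
        }
    }
    if (policy == AGG_LOGICAL_AND || policy == AGG_LOGICAL_OR)
        acc = (acc != 0); /* normalize to 0 or 1, also for a single reply */
    return acc;
}
```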

Example

redis> command info ping
1)  1) "ping"
    2) (integer) -1
    3) 1) fast
    4) (integer) 0
    5) (integer) 0
    6) (integer) 0
    7) 1) @fast
       2) @connection
    8) 1) "request_policy:all_shards"
       2) "response_policy:all_succeeded"
    9) (empty array)
   10) (empty array)
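In the reply above, the policy tips take the form name:value, while a tip such as nondeterministic_output is a bare name. Since tips are arbitrary strings, a client can only assume this convention. The sketch below, built around the hypothetical helper split_tip(), splits a tip at its first ':' and truncates each part to the given buffer size.

```c
#include <stddef.h>
#include <string.h>

/* Splits a command tip such as "request_policy:all_shards" into its name
 * and value. A tip without ':' gets an empty value. Both output buffers
 * are assumed to hold 'cap' bytes (cap >= 1). */
static void split_tip(const char *tip, char *name, char *value, size_t cap) {
    const char *colon = strchr(tip, ':');
    size_t name_len = colon ? (size_t)(colon - tip) : strlen(tip);
    if (name_len >= cap)
        name_len = cap - 1;
    memcpy(name, tip, name_len);
    name[name_len] = '\0';

    const char *v = colon ? colon + 1 : "";
    size_t value_len = strlen(v);
    if (value_len >= cap)
        value_len = cap - 1;
    memcpy(value, v, value_len);
    value[value_len] = '\0';
}
```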

7 - Debugging

A guide to debugging Redis server processes

Redis is developed with an emphasis on stability. We do our best with every release to make sure you'll experience a stable product with no crashes. However, if you ever need to debug the Redis process itself, read on.

When Redis crashes, it produces a detailed report of what happened. However, sometimes looking at the crash report is not enough, nor is it possible for the Redis core team to reproduce the issue independently. In this scenario, we need help from the user who can reproduce the issue.

This guide shows how to use GDB to provide the information the Redis developers will need to track the bug more easily.

What is GDB?

GDB is the GNU Debugger: a program that is able to inspect the internal state of another program. Usually tracking and fixing a bug is an exercise in gathering more information about the state of the program at the moment the bug happens, so GDB is an extremely useful tool.

GDB can be used in two ways:

  • It can attach to a running program and inspect the state of it at runtime.
  • It can inspect the state of a program that already terminated using what is called a core file, that is, the image of the memory at the time the program was running.

From the point of view of investigating Redis bugs we need to use both of these GDB modes. The user able to reproduce the bug attaches GDB to their running Redis instance, and when the crash happens, they create the core file that in turn the developer will use to inspect the Redis internals at the time of the crash.

This way the developer can perform all the inspections on their own computer without the help of the user, and the user is free to restart Redis in their production environment.

Compiling Redis without optimizations

By default Redis is compiled with the -O2 switch, which means that compiler optimizations are enabled. This makes the Redis executable faster, but at the same time it makes Redis (like any other program) harder to inspect using GDB.

It is better to attach GDB to Redis compiled without optimizations using the make noopt command (instead of just using the plain make command). However, if you have an already running Redis in production there is no need to recompile and restart it if this is going to create problems on your side. GDB still works against executables compiled with optimizations.

You should not be overly concerned at the loss of performance from compiling Redis without optimizations. It is unlikely that this will cause problems in your environment as Redis is not very CPU-bound.

Attaching GDB to a running process

If you have an already running Redis server, you can attach GDB to it, so that if Redis crashes it will be possible to both inspect the internals and generate a core dump file.

After you attach GDB to the Redis process it will continue running as usual without any loss of performance, so this is not a dangerous procedure.

In order to attach GDB the first thing you need is the process ID of the running Redis instance (the pid of the process). You can easily obtain it using redis-cli:

$ redis-cli info | grep process_id
process_id:58414

In the above example the process ID is 58414.

Log in to your Redis server.

(Optional but recommended) Start screen, tmux, or any other program that will make sure that your GDB session will not be closed if your ssh connection times out.

Attach GDB to the running Redis server by typing:

$ gdb <path-to-redis-executable> <pid>

For example:

$ gdb /usr/local/bin/redis-server 58414

GDB will start and will attach to the running server printing something like the following:

Reading symbols for shared libraries + done
0x00007fff8d4797e6 in epoll_wait ()
(gdb)

At this point GDB is attached but your Redis instance is blocked by GDB. In order to let the Redis instance continue the execution just type continue at the GDB prompt, and press enter.

(gdb) continue
Continuing.

Done! Now your Redis instance has GDB attached. Now you can wait for the next crash. :)

Now it's time to detach your screen/tmux session, if you are running GDB inside one, by pressing Ctrl-a d in screen or Ctrl-b d in tmux (with the default key bindings).

After the crash

Redis has a command to simulate a segmentation fault (in other words, a bad crash): the DEBUG SEGFAULT command (don't use it against a real production instance, of course!). So I'll use this command to crash my instance and show what happens on the GDB side:

(gdb) continue
Continuing.

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0xffffffffffffffff
debugCommand (c=0x7ffc32005000) at debug.c:220
220         *((char*)-1) = 'x';

As you can see, GDB detected that Redis crashed, and was even able to show me the file name and line number causing the crash. This is already much better than the Redis crash report backtrace (containing just function names and binary offsets).

Obtaining the stack trace

The first thing to do is to obtain a full stack trace with GDB. This is as simple as using the bt command:

(gdb) bt
#0  debugCommand (c=0x7ffc32005000) at debug.c:220
#1  0x000000010d246d63 in call (c=0x7ffc32005000) at redis.c:1163
#2  0x000000010d247290 in processCommand (c=0x7ffc32005000) at redis.c:1305
#3  0x000000010d251660 in processInputBuffer (c=0x7ffc32005000) at networking.c:959
#4  0x000000010d251872 in readQueryFromClient (el=0x0, fd=5, privdata=0x7fff76f1c0b0, mask=220924512) at networking.c:1021
#5  0x000000010d243523 in aeProcessEvents (eventLoop=0x7fff6ce408d0, flags=220829559) at ae.c:352
#6  0x000000010d24373b in aeMain (eventLoop=0x10d429ef0) at ae.c:397
#7  0x000000010d2494ff in main (argc=1, argv=0x10d2b2900) at redis.c:2046

This shows the backtrace, but we also want to dump the processor registers using the info registers command:

(gdb) info registers
rax            0x0  0
rbx            0x7ffc32005000   140721147367424
rcx            0x10d2b0a60  4515891808
rdx            0x7fff76f1c0b0   140735188943024
rsi            0x10d299777  4515796855
rdi            0x0  0
rbp            0x7fff6ce40730   0x7fff6ce40730
rsp            0x7fff6ce40650   0x7fff6ce40650
r8             0x4f26b3f7   1327936503
r9             0x7fff6ce40718   140735020271384
r10            0x81 129
r11            0x10d430398  4517462936
r12            0x4b7c04f8babc0  1327936503000000
r13            0x10d3350a0  4516434080
r14            0x10d42d9f0  4517452272
r15            0x10d430398  4517462936
rip            0x10d26cfd4  0x10d26cfd4 <debugCommand+68>
eflags         0x10246  66118
cs             0x2b 43
ss             0x0  0
ds             0x0  0
es             0x0  0
fs             0x0  0
gs             0x0  0

Please make sure to include both of these outputs in your bug report.

Obtaining the core file

The next step is to generate the core dump, that is the image of the memory of the running Redis process. This is done using the gcore command:

(gdb) gcore
Saved corefile core.58414

Now you have the core dump to send to the Redis developers, but it is important to understand that it contains all the data that was inside the Redis instance at the time of the crash. Redis developers will make sure not to share the content with anyone else, and will delete the file as soon as it is no longer needed for debugging purposes, but be warned that by sending the core file you are sending your data.

What to send to developers

Finally you can send everything to the Redis core team:

  • The Redis executable you are using.
  • The stack trace produced by the bt command, and the registers dump.
  • The core file you generated with gdb.
  • The operating system, GCC version, and Redis version you are using.

Thank you

Your help is extremely important! Many issues can only be tracked this way. So thanks!

8 - Redis and the Gopher protocol

The Redis Gopher protocol implementation

** Note: Support for Gopher was removed in Redis 7.0 **

Redis contains an implementation of the Gopher protocol, as specified in the RFC 1436.

The Gopher protocol was very popular in the early '90s. It is an alternative to the web, and its implementation, both server and client side, is so simple that the Redis server needs just 100 lines of code to support it.

What do you do with Gopher nowadays? Well, Gopher never really died, and lately there has been a movement to resurrect Gopher's more hierarchical content, composed of just plain text documents. Some want a simpler internet, others believe that the mainstream internet has become too controlled, and it's cool to create an alternative space for people who want a bit of fresh air.

Anyway, for the 10th birthday of Redis, we gave it the Gopher protocol as a gift.

How it works

The Redis Gopher support uses the inline protocol of Redis, and specifically two kinds of inline requests that were illegal anyway: an empty request, or any request that starts with "/" (there are no Redis commands starting with such a slash). Normal RESP2/RESP3 requests are completely outside the path of the Gopher protocol implementation and are served as usual.

If you open a connection to Redis when Gopher is enabled and send it a string like "/foo", if there is a key named "/foo" it is served via the Gopher protocol.
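The routing rule above can be captured in a few lines. The following is a minimal sketch, not the Redis source: `isGopherRequest` is a hypothetical helper, and it assumes it receives one inline request with the trailing CRLF already stripped. It only encodes the two cases named in the text.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true when an inline request should be
 * routed to the Gopher handler instead of the command table.
 * "line" is one inline request with the trailing CRLF removed,
 * "len" is its length in bytes. */
bool isGopherRequest(const char *line, size_t len) {
    /* Case 1: an empty request. Per RFC 1436, a client sends an empty
     * selector (just CRLF) to ask for the top-level menu. */
    if (len == 0) return true;
    /* Case 2: a selector starting with "/". No Redis command begins
     * with a slash, so this cannot collide with normal commands. */
    return line[0] == '/';
}
```

A request such as "/foo" would then be treated as a key lookup, with the string value of "/foo" sent back as the Gopher response.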

In order to create a real Gopher "hole" (the name of a Gopher site in Gopher parlance), you likely need a script such as the one in https://github.com/antirez/gopher2redis.

SECURITY WARNING

If you plan to put Redis on the internet at a publicly accessible address to serve Gopher pages, make sure to set a password for the instance. Once a password is set:

  1. The Gopher server (when enabled, not by default) will still serve content via Gopher.
  2. However, other commands cannot be called before the client authenticates.

So use the requirepass option to protect your instance.

To enable Gopher support use the following configuration line.

gopher-enabled yes

Accessing keys that are not strings or do not exist will generate an error in Gopher protocol format.

9 - Redis internals

Documents describing internals in early Redis implementations

The following Redis documents were written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010), and do not necessarily reflect the latest Redis implementation.

9.1 - Event library

What's an event library, and how was the original Redis event library implemented?

Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010), and does not necessarily reflect the latest Redis implementation.

Why is an Event Library needed at all?

Let us figure it out through a series of Q&As.

Q: What do you expect a network server to be doing all the time?
A: Watch for inbound connections on the port it's listening on and accept them.

Q: Calling accept (http://man.cx/accept%282%29) yields a descriptor. What do I do with it?
A: Save the descriptor and do a non-blocking read/write operation on it.

Q: Why does the read/write have to be non-blocking?
A: If the file operation (even a socket in Unix is a file) is blocking, how could the server, for example, accept other connection requests while it's blocked in a file I/O operation?

Q: I guess I have to do many such non-blocking operations on the socket to see when it's ready. Am I right?
A: Yes. That is what an event library does for you. Now you get it.

Q: How do Event Libraries do what they do?
A: They use the operating system's polling facility along with timers.

Q: So are there any open source event libraries that do what you just described?
A: Yes. libevent and libev are two such event libraries that I can recall off the top of my head.

Q: Does Redis use such open source event libraries for handling socket I/O?
A: No. For various reasons Redis uses its own event library.
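The two ideas in the Q&A, non-blocking descriptors and asking the OS which descriptors are ready, can be shown with plain POSIX calls. This is a minimal sketch: `setNonBlocking` and `waitReadable` are hypothetical helper names, and it uses portable `poll(2)` rather than the epoll backend that the Redis event library uses on Linux.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

/* Put a descriptor into non-blocking mode, preserving its other flags.
 * Returns 0 on success, -1 on error. */
int setNonBlocking(int fd) {
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Ask the kernel whether fd has data to read, waiting at most
 * timeout_ms milliseconds (0 = just check, -1 = wait forever).
 * Returns 1 if readable, 0 on timeout, -1 on error. */
int waitReadable(int fd, int timeout_ms) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0) return n;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```

On an empty non-blocking descriptor, `read` fails immediately with `EAGAIN` instead of stalling the server. An event library builds on this by polling many descriptors at once and only reading the ones reported as ready.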

The Redis event library

Redis implements its own event library. The event library is implemented in ae.c.

The best way to understand how the Redis event library works is to understand how Redis uses it.

Event Loop Initialization

The initServer function, defined in redis.c, initializes the numerous fields of the redisServer structure variable. One such field is the Redis event loop el:

aeEventLoop *el

initServer initializes the server.el field by calling aeCreateEventLoop, defined in ae.c. The definition of aeEventLoop is below:

typedef struct aeEventLoop
{
    int maxfd;
    long long timeEventNextId;
    aeFileEvent events[AE_SETSIZE]; /* Registered events */
    aeFiredEvent fired[AE_SETSIZE]; /* Fired events */
    aeTimeEvent *timeEventHead;
    int stop;
    void *apidata; /* This is used for polling API specific data */
    aeBeforeSleepProc *beforesleep;
} aeEventLoop;

aeCreateEventLoop

aeCreateEventLoop first mallocs the aeEventLoop structure, then calls ae_epoll.c:aeApiCreate.

aeApiCreate mallocs aeApiState, which has two fields: epfd, which holds the epoll file descriptor returned by a call to epoll_create, and events, which is of type struct epoll_event as defined by the Linux epoll API. The use of the events field will be described later.
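The allocation described above can be sketched as follows. It is a hypothetical re-creation (Linux only), not the ae_epoll.c source: the names `apiState`, `apiCreate`, `apiFree` and the extra `setsize` field are illustrative assumptions.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical mirror of aeApiState: the epoll descriptor plus a
 * buffer that epoll_wait will fill with ready events. */
typedef struct apiState {
    int epfd;
    struct epoll_event *events;
    int setsize;                 /* capacity of the events buffer */
} apiState;

/* Allocate the state, the events buffer, and the epoll instance.
 * Returns NULL on failure without leaking partial allocations. */
apiState *apiCreate(int setsize) {
    if (setsize <= 0) return NULL;
    apiState *state = malloc(sizeof(*state));
    if (state == NULL) return NULL;
    state->events = malloc(sizeof(struct epoll_event) * (size_t)setsize);
    if (state->events == NULL) { free(state); return NULL; }
    state->setsize = setsize;
    /* The size argument is only a hint on modern kernels, but it
     * must be greater than zero. */
    state->epfd = epoll_create(setsize);
    if (state->epfd == -1) {
        free(state->events);
        free(state);
        return NULL;
    }
    return state;
}

void apiFree(apiState *state) {
    close(state->epfd);
    free(state->events);
    free(state);
}
```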

Next is ae.c:aeCreateTimeEvent. But before that, initServer calls anet.c:anetTcpServer, which creates and returns a listening descriptor. The descriptor listens on port 6379 by default. The returned listening descriptor is stored in the server.fd field.
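A listening descriptor like the one anetTcpServer returns is normally built with socket, bind, and listen. This sketch uses a hypothetical `tcpServer` function and standard BSD socket calls. It does not claim to match anet.c line for line, and the backlog value of 511 is an illustrative choice.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical stand-in for anetTcpServer: create a TCP socket bound
 * to bindaddr:port and put it in listening state.
 * Returns the listening descriptor, or -1 on error. */
int tcpServer(const char *bindaddr, int port) {
    struct sockaddr_in sa;
    int yes = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1) return -1;

    /* Allow quick restarts while old connections sit in TIME_WAIT. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) == -1)
        goto error;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons((unsigned short)port);
    if (inet_pton(AF_INET, bindaddr, &sa.sin_addr) != 1) goto error;

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) == -1) goto error;
    if (listen(fd, 511) == -1) goto error;   /* pending-connection backlog */
    return fd;

error:
    close(fd);
    return -1;
}
```

A server following the default in the text would call `tcpServer("0.0.0.0", 6379)` and store the result in its equivalent of server.fd.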

aeCreateTimeEvent

aeCreateTimeEvent accepts the following as parameters:

  • eventLoop: This is server.el in redis.c
  • milliseconds: The number of milliseconds from the current time after which the timer expires.
  • proc: Function pointer. Stores the address of the function that has to be called after the timer expires.
  • clientData: Mostly NULL.
  • finalizerProc: Pointer to the function that has to be called before the timed event is removed from the list of timed events.

initServer calls aeCreateTimeEvent to add a timed event to the timeEventHead field of server.el. timeEventHead is a pointer to a list of such timed events. The call to aeCreateTimeEvent from redis.c:initServer is given below:

aeCreateTimeEvent(server.el /*eventLoop*/, 1 /*milliseconds*/, serverCron /*proc*/, NULL /*clientData*/, NULL /*finalizerProc*/);
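The parameter list above maps naturally onto a linked-list node. This is a minimal sketch with hypothetical names (`timeEvent`, `timeEventLoop`, `createTimeEvent`). It assumes the expiry is stored as absolute seconds plus milliseconds, and that new events are prepended to the list for O(1) insertion.

```c
#include <stdlib.h>
#include <sys/time.h>

struct timeEventLoop;   /* forward declaration for the callback types */

/* Callback: returns milliseconds until it should run again. */
typedef int timeProc(struct timeEventLoop *loop, long long id, void *clientData);
typedef void finalizerProc(struct timeEventLoop *loop, void *clientData);

typedef struct timeEvent {
    long long id;              /* unique identifier */
    long when_sec;             /* absolute expiry: seconds */
    long when_ms;              /* absolute expiry: milliseconds */
    timeProc *proc;
    finalizerProc *finalizer;
    void *clientData;
    struct timeEvent *next;
} timeEvent;

typedef struct timeEventLoop {
    long long timeEventNextId;
    timeEvent *timeEventHead;
} timeEventLoop;

/* Compute "now + milliseconds" as seconds and milliseconds. */
static void addMilliseconds(long long milliseconds, long *sec, long *ms) {
    struct timeval tv;
    gettimeofday(&tv, NULL);
    long long total_ms = (long long)tv.tv_usec / 1000 + milliseconds;
    *sec = (long)tv.tv_sec + (long)(total_ms / 1000);
    *ms = (long)(total_ms % 1000);
}

/* Prepend a new timed event to the list. Returns its id, or -1. */
long long createTimeEvent(timeEventLoop *loop, long long milliseconds,
                          timeProc *proc, void *clientData,
                          finalizerProc *finalizer) {
    timeEvent *te = malloc(sizeof(*te));
    if (te == NULL) return -1;
    te->id = loop->timeEventNextId++;
    addMilliseconds(milliseconds, &te->when_sec, &te->when_ms);
    te->proc = proc;
    te->finalizer = finalizer;
    te->clientData = clientData;
    te->next = loop->timeEventHead;
    loop->timeEventHead = te;
    return te->id;
}
```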

redis.c:serverCron performs many operations that help keep Redis running properly.

aeCreateFileEvent

The essence of the aeCreateFileEvent function is to execute the epoll_ctl system call, which adds a watch for the EPOLLIN event on the listening descriptor created by anetTcpServer and associates it with the epoll descriptor created by the call to aeCreateEventLoop.

Following is an explanation of what precisely aeCreateFileEvent does when called from redis.c:initServer.

initServer passes the following arguments to aeCreateFileEvent:

  • server.el: The event loop created by aeCreateEventLoop. The epoll descriptor is got from server.el.
  • server.fd: The listening descriptor that also serves as an index to access the relevant file event structure from the eventLoop->events table and store extra information like the callback function.
  • AE_READABLE: Signifies that server.fd has to be watched for EPOLLIN event.
  • acceptHandler: The function that has to be executed when the event being watched for is ready. This function pointer is stored in eventLoop->events[server.fd]->rfileProc.
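The registration steps above can be sketched in code: an `epoll_ctl` call plus a per-descriptor slot indexed by fd that remembers the callback. All names here (`loop`, `fileEvent`, `createReadEvent`, `SETSIZE`, `READABLE`) are hypothetical. The sketch handles only the readable case used for the listening socket.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define SETSIZE 1024
#define READABLE 1

typedef void fileProc(int fd, void *clientData);

/* Per-descriptor slot, indexed directly by fd. */
typedef struct fileEvent {
    int mask;             /* READABLE, or 0 if the slot is unused */
    fileProc *rfileProc;  /* callback run when fd becomes readable */
    void *clientData;
} fileEvent;

typedef struct loop {
    int epfd;
    fileEvent events[SETSIZE];
} loop;

/* Register fd with epoll for EPOLLIN and store the callback in
 * events[fd]. Returns 0 on success, -1 on error. */
int createReadEvent(loop *el, int fd, fileProc *proc, void *clientData) {
    if (fd < 0 || fd >= SETSIZE) return -1;
    struct epoll_event ee = {0};
    ee.events = EPOLLIN;
    ee.data.fd = fd;
    /* A descriptor already watched must be modified, not re-added. */
    int op = el->events[fd].mask ? EPOLL_CTL_MOD : EPOLL_CTL_ADD;
    if (epoll_ctl(el->epfd, op, fd, &ee) == -1) return -1;
    el->events[fd].mask = READABLE;
    el->events[fd].rfileProc = proc;
    el->events[fd].clientData = clientData;
    return 0;
}
```

Indexing the table by fd turns the later "which callback do I run?" question into a single array access.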

This completes the initialization of Redis event loop.

Event Loop Processing

ae.c:aeMain called from redis.c:main does the job of processing the event loop that is initialized in the previous phase.

ae.c:aeMain calls ae.c:aeProcessEvents in a while loop that processes pending time and file events.

aeProcessEvents

ae.c:aeProcessEvents looks for the time event that will be pending in the smallest amount of time by calling ae.c:aeSearchNearestTimer on the event loop. In our case there is only one timer event in the event loop that was created by ae.c:aeCreateTimeEvent.

Remember that the timer event created by aeCreateTimeEvent has probably elapsed by now, because it had an expiry time of one millisecond. Since the timer has already expired, the seconds and microseconds fields of the tvp timeval structure variable are initialized to zero.
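The timeout computation implied above, where the time until the nearest timer is clamped at zero if that timer already fired, looks roughly like this. `computePollTimeout` is a hypothetical helper. It takes the current time as parameters so that it stays deterministic.

```c
#include <sys/types.h>
#include <sys/time.h>

/* Given the nearest timer's absolute expiry (when_sec, when_ms) and the
 * current time (now_sec, now_ms), fill *tv with how long the poll call
 * may block. An already-expired timer yields a zero timeout, so the
 * poll returns immediately instead of sleeping. */
void computePollTimeout(long when_sec, long when_ms,
                        long now_sec, long now_ms, struct timeval *tv) {
    long long remaining_ms =
        (long long)(when_sec - now_sec) * 1000 + (when_ms - now_ms);
    if (remaining_ms < 0) remaining_ms = 0;   /* timer already elapsed */
    tv->tv_sec = (time_t)(remaining_ms / 1000);
    tv->tv_usec = (suseconds_t)((remaining_ms % 1000) * 1000);
}
```

Because epoll_wait takes its timeout in milliseconds, an epoll backend would convert this as `tv_sec * 1000 + tv_usec / 1000`.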

The tvp structure variable along with the event loop variable is passed to ae_epoll.c:aeApiPoll.

The aeApiPoll function does an epoll_wait on the epoll descriptor and populates the eventLoop->fired table with the details:

  • fd: The descriptor that is now ready to do a read/write operation depending on the mask value.
  • mask: The read/write event that can now be performed on the corresponding descriptor.

aeApiPoll returns the number of such file events ready for operation. To put things in context, if any client has requested a connection, aeApiPoll will have noticed it and populated the eventLoop->fired table with an entry whose descriptor is the listening descriptor and whose mask is AE_READABLE.
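Translating epoll's output into the (fd, mask) entries of the fired table can be sketched as follows. `apiPoll`, `firedEvent`, and the constants are hypothetical names, and the caller is assumed to provide a fired array with at least `MAXEVENTS` entries.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define MAXEVENTS 64
#define READABLE 1
#define WRITABLE 2

typedef struct firedEvent {
    int fd;
    int mask;   /* READABLE and/or WRITABLE */
} firedEvent;

/* Wait on epfd for at most timeout_ms milliseconds (-1 = forever) and
 * translate each ready epoll_event into a fired-table entry.
 * Returns the number of entries filled (0 on timeout or error). */
int apiPoll(int epfd, firedEvent *fired, int timeout_ms) {
    struct epoll_event events[MAXEVENTS];
    int n = epoll_wait(epfd, events, MAXEVENTS, timeout_ms);
    if (n <= 0) return 0;
    for (int j = 0; j < n; j++) {
        int mask = 0;
        if (events[j].events & EPOLLIN) mask |= READABLE;
        if (events[j].events & EPOLLOUT) mask |= WRITABLE;
        fired[j].fd = events[j].data.fd;
        fired[j].mask = mask;
    }
    return n;
}
```

The event loop then walks the first n entries and, for a READABLE entry, calls the rfileProc stored for that fd. For the listening descriptor, that callback is acceptHandler.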

Now, aeProcessEvents calls the redis.c:acceptHandler registered as the callback. acceptHandler executes accept on the listening descriptor returning a connected descriptor with the client. redis.c:createClient adds a file event on the connected descriptor through a call to ae.c:aeCreateFileEvent like below:

if (aeCreateFileEvent(server.el, c->fd, AE_READABLE,
    readQueryFromClient, c) == AE_ERR) {
    freeClient(c);
    return NULL;
}

c is the redisClient structure variable and c->fd is the connected descriptor.

Next, ae.c:aeProcessEvents calls ae.c:processTimeEvents.

processTimeEvents

ae.c:processTimeEvents iterates over the list of time events starting at eventLoop->timeEventHead.

For every timed event that has elapsed, processTimeEvents calls the registered callback. In this case it calls the only timed event callback registered, that is, redis.c:serverCron. The callback returns the time in milliseconds after which the callback must be called again. This change is recorded via a call to ae.c:aeAddMilliSeconds and will be handled on the next iteration of the ae.c:aeMain while loop.
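The processing loop described above can be sketched as follows. The names (`timeEvent`, `processTimeEvents`, `exampleCron`) are hypothetical. The sketch passes "now" in explicitly instead of reading the clock, and, following the text, always reschedules each callback using its return value.

```c
#include <stddef.h>

typedef int timeProc(long long id, void *clientData);

typedef struct timeEvent {
    long long id;
    long when_sec, when_ms;     /* absolute expiry time */
    timeProc *proc;             /* returns ms until it should run again */
    void *clientData;
    struct timeEvent *next;
} timeEvent;

/* Add "ms" milliseconds to (sec, msec), normalising the result. */
static void addMilliseconds(long sec, long msec, long long ms,
                            long *out_sec, long *out_ms) {
    long long total = (long long)msec + ms;
    *out_sec = sec + (long)(total / 1000);
    *out_ms = (long)(total % 1000);
}

/* Run every event whose expiry is not later than (now_sec, now_ms) and
 * reschedule it relative to now. Returns the number of callbacks run. */
int processTimeEvents(timeEvent *head, long now_sec, long now_ms) {
    int processed = 0;
    for (timeEvent *te = head; te != NULL; te = te->next) {
        int expired = now_sec > te->when_sec ||
                      (now_sec == te->when_sec && now_ms >= te->when_ms);
        if (!expired) continue;
        int next_ms = te->proc(te->id, te->clientData);
        addMilliseconds(now_sec, now_ms, next_ms, &te->when_sec, &te->when_ms);
        processed++;
    }
    return processed;
}

/* Example callback (hypothetical, serverCron-like): counts its calls
 * and asks to be run again in 100 ms. */
int exampleCron(long long id, void *clientData) {
    (void)id;
    (*(int *)clientData)++;
    return 100;
}
```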

That's all.

9.2 - String internals

Guide to the original implementation of Redis strings

Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010), and does not necessarily reflect the latest Redis implementation.

The implementation of Redis strings is contained in sds.c (sds stands for Simple Dynamic Strings). The implementation is available as a standalone library at https://github.com/antirez/sds.

The C structure sdshdr declared in sds.h represents a Redis string:

struct sdshdr {
    long len;
    long free;
    char buf[];
};

The buf character array stores the actual string.

The len field stores the length of buf. This makes obtaining the length of a Redis string an O(1) operation.

The free field stores the number of additional bytes available for use.

Together the len and free fields can be thought of as holding the metadata of the buf character array.

Creating Redis Strings

A new data type named sds is defined in sds.h to be a synonym for a character pointer:

typedef char *sds;

The sdsnewlen function, defined in sds.c, creates a new Redis String:

sds sdsnewlen(const void *init, size_t initlen) {
    struct sdshdr *sh;

    sh = zmalloc(sizeof(struct sdshdr)+initlen+1);
#ifdef SDS_ABORT_ON_OOM
    if (sh == NULL) sdsOomAbort();
#else
    if (sh == NULL) return NULL;
#endif
    sh->len = initlen;
    sh->free = 0;
    if (initlen) {
        if (init) memcpy(sh->buf, init, initlen);
        else memset(sh->buf,0,initlen);
    }
    sh->buf[initlen] = '\0';
    return (char*)sh->buf;
}

Remember that a Redis string is a variable of type struct sdshdr. But sdsnewlen returns a character pointer!

That's a trick and needs some explanation.

Suppose I create a Redis string using sdsnewlen like below:

sdsnewlen("redis", 5);

This creates a new variable of type struct sdshdr allocating memory for len and free fields as well as for the buf character array.

sh = zmalloc(sizeof(struct sdshdr)+initlen+1); // initlen is length of init argument.

After sdsnewlen successfully creates a Redis string the result is something like:

-----------
|5|0|redis|
-----------
^   ^
sh  sh->buf

sdsnewlen returns sh->buf to the caller.

What do you do if you need to free the Redis string pointed to by sh?

You want the pointer sh but you only have the pointer sh->buf.

Can you get the pointer sh from sh->buf?

Yes. Pointer arithmetic. Notice from the above ASCII art that if you subtract the size of two longs from sh->buf you get the pointer sh.

The size of two longs happens to be the size of struct sdshdr.

Look at sdslen function and see this trick at work:

size_t sdslen(const sds s) {
    struct sdshdr *sh = (void*) (s-(sizeof(struct sdshdr)));
    return sh->len;
}
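To see the header trick end to end, here is a standalone re-creation of the historical layout, using plain malloc instead of zmalloc. It is a sketch, not the sds library. It recovers the header with `offsetof(struct sdshdr, buf)`, which equals `sizeof(struct sdshdr)` for this two-long layout on common ABIs and is the more defensive spelling. `sdsavail`, `sdsfree`, and the helper `sdsHeader` are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef char *sds;

/* Same layout as the historical header described above. */
struct sdshdr {
    long len;
    long free;
    char buf[];
};

/* Step back from the pointer handed to callers to the header. */
static struct sdshdr *sdsHeader(const sds s) {
    return (struct sdshdr *)(s - offsetof(struct sdshdr, buf));
}

sds sdsnewlen(const void *init, size_t initlen) {
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + initlen + 1);
    if (sh == NULL) return NULL;
    sh->len = (long)initlen;
    sh->free = 0;
    if (initlen) {
        if (init) memcpy(sh->buf, init, initlen);
        else memset(sh->buf, 0, initlen);
    }
    sh->buf[initlen] = '\0';   /* keep it usable as a C string */
    return sh->buf;
}

size_t sdslen(const sds s) { return (size_t)sdsHeader(s)->len; }
size_t sdsavail(const sds s) { return (size_t)sdsHeader(s)->free; }

/* free() must receive the pointer malloc returned: the header. */
void sdsfree(sds s) {
    if (s != NULL) free(sdsHeader(s));
}
```

Callers can pass the result straight to functions such as `printf("%s", s)`, since it is an ordinary NUL-terminated char pointer.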

Knowing this trick you could easily go through the rest of the functions in sds.c.

The Redis string implementation is hidden behind an interface that accepts only character pointers. The users of Redis strings need not care about how it's implemented and can treat Redis strings as a character pointer.

9.3 - Virtual memory (deprecated)

A description of the Redis virtual memory system that was deprecated in 2.6. This document exists for historical interest.

Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010). Virtual Memory has been deprecated since Redis 2.6, so this documentation is here only for historical interest.

This document details the internals of the Redis Virtual Memory subsystem prior to Redis 2.6. The intended audience is not the final user but programmers willing to understand or modify the Virtual Memory implementation.

Keys vs Values: what is swapped out?

The goal of the VM subsystem is to free memory by transferring Redis Objects from memory to disk. This is a very generic statement, but specifically, Redis transfers only objects associated with values. To better understand this concept, we'll show, using the DEBUG command, how a key holding a value looks from the point of view of the Redis internals:

redis> set foo bar
OK
redis> debug object foo
Key at:0x100101d00 refcount:1, value at:0x100101ce0 refcount:1 encoding:raw serializedlength:4

As you can see from the above output, the Redis top level hash table maps Redis Objects (keys) to other Redis Objects (values). The Virtual Memory is only able to swap values to disk; the objects associated with keys are always kept in memory. This trade-off guarantees very good lookup performance, as one of the main design goals of the Redis VM is to have performance similar to Redis with VM disabled when the frequently used part of the dataset fits in RAM.

What a swapped value looks like internally

When an object is swapped out, this is what happens in the hash table entry:

  • The key continues to hold a Redis Object representing the key.
  • The value is set to NULL.

So you may wonder where we store the information that a given value (associated to a given key) was swapped out. Just in the key object!

This is what the Redis Object structure robj looks like:

/* The actual Redis Object */
typedef struct redisObject {
    void *ptr;
    unsigned char type;
    unsigned char encoding;
    unsigned char storage;  /* If this object is a key, where is the value?
                             * REDIS_VM_MEMORY, REDIS_VM_SWAPPED, ... */
    unsigned char vtype; /* If this object is a key, and value is swapped out,
                          * this is the type of the swapped out object. */
    int refcount;
    /* VM fields, these are only allocated if VM is active, otherwise the
     * object allocation function will just allocate
     * sizeof(redisObject) minus sizeof(redisObjectVM), so using
     * Redis without VM active will not have any overhead. */
    struct redisObjectVM vm;
} robj;

As you can see there are a few fields about VM. The most important one is storage, which can be one of these values:

  • REDIS_VM_MEMORY: the associated value is in memory.
  • REDIS_VM_SWAPPED: the associated value is swapped, and the value entry of the hash table is just set to NULL.
  • REDIS_VM_LOADING: the value is swapped on disk, the entry is NULL, but there is a job to load the object from the swap to the memory (this field is only used when threaded VM is active).
  • REDIS_VM_SWAPPING: the value is in memory, the entry is a pointer to the actual Redis Object, but there is an I/O job in order to transfer this value to the swap file.

If an object is swapped on disk (REDIS_VM_SWAPPED or REDIS_VM_LOADING), how do we know where it is stored, what type it is, and so forth? That's simple: the vtype field is set to the original type of the Redis object swapped, while the vm field (that is a redisObjectVM structure) holds information about the location of the object. This is the definition of this additional structure:

/* The VM object structure */
struct redisObjectVM {
    off_t page;         /* the page at which the object is stored on disk */
    off_t usedpages;    /* number of pages used on disk */
    time_t atime;       /* Last access time */
} vm;

As you can see the structure contains the page at which the object is located in the swap file, the number of pages used, and the last access time of the object (this is very useful for the algorithm that selects which object is a good candidate for swapping, as we want to transfer to disk objects that are rarely accessed).

As you can see, while all the other fields are using unused bytes in the old Redis Object structure (we had some free bits due to natural memory alignment concerns), the vm field is new, and indeed uses additional memory. Should we pay such a memory cost even when VM is disabled? No! This is the code to create a new Redis Object:

... some code ...
        if (server.vm_enabled) {
            pthread_mutex_unlock(&server.obj_freelist_mutex);
            o = zmalloc(sizeof(*o));
        } else {
            o = zmalloc(sizeof(*o)-sizeof(struct redisObjectVM));
        }
... some code ...

As you can see, if the VM system is not enabled we allocate just sizeof(*o)-sizeof(struct redisObjectVM) of memory. Given that the vm field is the last in the object structure, and that these fields are never accessed if VM is disabled, we are safe and Redis without VM does not pay the memory overhead.
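The "omit the trailing field" allocation can be tried in isolation. The struct definitions below mirror the ones shown above. `createObjectSketch` is hypothetical, and it uses `offsetof(robj, vm)` as the truncated size. That is exactly the number of bytes before vm, and it equals `sizeof(*o) - sizeof(struct redisObjectVM)` whenever the struct has no padding after vm, as on common 64-bit ABIs.

```c
#include <stddef.h>
#include <stdlib.h>
#include <sys/types.h>
#include <time.h>

struct redisObjectVM {
    off_t page;
    off_t usedpages;
    time_t atime;
};

typedef struct redisObject {
    void *ptr;
    unsigned char type;
    unsigned char encoding;
    unsigned char storage;
    unsigned char vtype;
    int refcount;
    struct redisObjectVM vm;   /* must stay the LAST field */
} robj;

/* Allocate an object, leaving out the trailing vm field when VM is
 * disabled. Fields before vm are always initialised; vm is touched
 * only when it was actually allocated. */
robj *createObjectSketch(int vm_enabled, void *ptr) {
    size_t size = vm_enabled ? sizeof(robj) : offsetof(robj, vm);
    robj *o = malloc(size);
    if (o == NULL) return NULL;
    o->ptr = ptr;
    o->type = 0;
    o->encoding = 0;
    o->storage = 0;
    o->vtype = 0;
    o->refcount = 1;
    if (vm_enabled) {
        o->vm.page = 0;
        o->vm.usedpages = 0;
        o->vm.atime = 0;
    }
    return o;
}
```

The invariant the design relies on is that code must never read `o->vm` unless VM is enabled. The compiler cannot check this, so it is purely a convention of the codebase.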

The Swap File

The next step in order to understand how the VM subsystem works is understanding how objects are stored inside the swap file. The good news is that it's not some kind of special format: we just use the same format used to store the objects in .rdb files, which are the usual dump files produced by Redis using the SAVE command.

The swap file is composed of a given number of pages, where every page size is a given number of bytes. These parameters can be changed in redis.conf, since different Redis instances may work better with different values: it depends on the actual data you store inside it. The following are the default values:

vm-page-size 32
vm-pages 134217728

Redis keeps a "bitmap" (a contiguous array of bits set to zero or one) in memory, where every bit represents a page of the swap file on disk: if a given bit is set to 1, it represents a page that is already used (there is some Redis Object stored there), while if the corresponding bit is zero, the page is free.

Keeping this bitmap (which we'll call the page table) in memory is a huge win in terms of performance, and the memory used is small: we just need 1 bit for every page on disk. For instance, in the example above, 134217728 pages of 32 bytes each (a 4GB swap file) use just 16 MB of RAM for the page table.
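The sizes quoted above follow from simple arithmetic: 134,217,728 pages × 32 bytes = 4,294,967,296 bytes (4 GiB) of swap, and 134,217,728 bits / 8 = 16,777,216 bytes (16 MiB) of bitmap. The following is a minimal sketch of such a one-bit-per-page table. `pageTable` and its helpers are hypothetical names, not the Redis VM functions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical page table: one bit per swap-file page, 1 = used. */
typedef struct pageTable {
    uint8_t *bits;
    uint64_t pages;
} pageTable;

/* Bytes of RAM needed for a table covering "pages" pages. */
uint64_t pageTableBytes(uint64_t pages) {
    return (pages + 7) / 8;            /* round up to whole bytes */
}

/* Allocate a table with every page marked free. Returns 0 or -1. */
int pageTableInit(pageTable *pt, uint64_t pages) {
    pt->bits = calloc(pageTableBytes(pages), 1);
    pt->pages = pages;
    return pt->bits ? 0 : -1;
}

void markPage(pageTable *pt, uint64_t page, int used) {
    uint8_t bit = (uint8_t)(1u << (page & 7));
    if (used) pt->bits[page >> 3] |= bit;
    else      pt->bits[page >> 3] &= (uint8_t)~bit;
}

int pageIsUsed(const pageTable *pt, uint64_t page) {
    return (pt->bits[page >> 3] >> (page & 7)) & 1;
}
```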

Transferring objects from memory to swap

In order to transfer an object from memory to disk we need to perform the following steps (assuming non-threaded VM, just a simple blocking approach):

  • Find how many pages are needed in order to store this object in the swap file. This is trivially accomplished by calling the function rdbSavedObjectPages, which returns the number of pages used by an object on disk. Note that this function does not duplicate the .rdb saving code just to find out what the length will be after an object is saved on disk; instead we use the trick of opening /dev/null, writing the object there, and finally calling ftello in order to check the amount of bytes required. Basically, we save the object on a virtual, very fast file, that is, /dev/null.
  • Now that we know how many pages are required in the swap file, we need to find this number of contiguous free pages inside the swap file. This task is accomplished by the vmFindContiguousPages function. As you can guess this function may fail if the swap is full, or so fragmented that we can't easily find the required number of contiguous free pages. When this happens we just abort the swapping of the object, that will continue to live in memory.
  • Finally we can write the object on disk, at the specified position, just calling the function vmWriteObjectOnSwap.

As you can guess once the object was correctly written in the swap file, it is freed from memory, the storage field in the associated key is set to REDIS_VM_SWAPPED, and the used pages are marked as used in the page table.
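The page-count and free-run steps above can be sketched over a plain bitmap. This uses a simple first-fit linear scan, and the actual vmFindContiguousPages may use a different search strategy. `pagesNeeded`, `findContiguousPages`, and `markUsed` are hypothetical names.

```c
#include <stdint.h>

/* Pages needed for an object of "bytes" serialized bytes (ceiling). */
uint64_t pagesNeeded(uint64_t bytes, uint64_t page_size) {
    return (bytes + page_size - 1) / page_size;
}

static int isUsed(const uint8_t *bits, uint64_t p) {
    return (bits[p >> 3] >> (p & 7)) & 1;
}

/* First-fit search for "n" contiguous free pages in a bitmap of
 * "total" pages. Returns the first page index, or -1 if no such run
 * exists (swap full or too fragmented); the object then stays in
 * memory. */
int64_t findContiguousPages(const uint8_t *bits, uint64_t total, uint64_t n) {
    uint64_t run = 0;
    for (uint64_t p = 0; p < total; p++) {
        run = isUsed(bits, p) ? 0 : run + 1;
        if (run == n) return (int64_t)(p - n + 1);
    }
    return -1;
}

/* After a successful write, mark the chosen pages as used. */
void markUsed(uint8_t *bits, uint64_t first, uint64_t n) {
    for (uint64_t p = first; p < first + n; p++)
        bits[p >> 3] |= (uint8_t)(1u << (p & 7));
}
```

For example, with 32-byte pages, a value that serializes to 100 bytes needs 4 pages, so the search looks for a run of 4 free bits.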

Loading objects back in memory

Loading an object from swap to memory is simpler, as we already know where the object is located and how many pages it is using. We also know the type of the object (the loading functions are required to know this information, as there is no header or any other information about the object type on disk), but this is stored in the vtype field of the associated key as already seen above.

Calling the function vmLoadObject passing the key object associated to the value object we want to load back is enough. The function will also take care of fixing the storage type of the key (that will be REDIS_VM_MEMORY), marking the pages as freed in the page table, and so forth.

The return value of the function is the loaded Redis Object itself, that we'll have to set again as value in the main hash table (instead of the NULL value we put in place of the object pointer when the value was originally swapped out).

How blocking VM works

Now we have all the building blocks in order to describe how the blocking VM works. First of all, an important detail about configuration. In order to enable blocking VM in Redis, server.vm_max_threads must be set to zero. We'll see later how this max number of threads info is used in the threaded VM; for now all you need to know is that Redis reverts to fully blocking VM when this is set to zero.

We also need to introduce another important VM parameter, that is, server.vm_max_memory. This parameter is very important as it is used in order to trigger swapping: Redis will try to swap objects only if it is using more memory than the max memory setting, otherwise there is no need to swap as we are matching the user requested memory usage.

Blocking VM swapping

Swapping of objects from memory to disk happens in the cron function. This function used to be called every second, while in the recent Redis versions on git it is called every 100 milliseconds (that is, 10 times per second). If this function detects that we are out of memory, that is, the memory used is greater than the vm-max-memory setting, it starts transferring objects from memory to disk in a loop calling the function vmSwapOneObject. This function takes just one argument: if 0 it will swap objects in a blocking way, otherwise, if it is 1, I/O threads are used. In the blocking scenario we just call it with zero as argument.

vmSwapOneObject acts performing the following steps:

  • The key space is inspected in order to find a good candidate for swapping (we'll see later what a good candidate for swapping is).
  • The associated value is transferred to disk, in a blocking way.
  • The key storage field is set to REDIS_VM_SWAPPED, while the vm fields of the object are set to the right values (the page index where the object was swapped, and the number of pages used to swap it).
  • Finally the value object is freed and the value entry of the hash table is set to NULL.

The function is called again and again until one of the following happens: there is no way to swap more objects because either the swap file is full or nearly all the objects are already transferred on disk, or simply the memory usage is already under the vm-max-memory parameter.

What values to swap when we are out of memory?

Understanding what's a good candidate for swapping is not too hard. A few objects are sampled at random, and for each one its swappability is computed as:

swappability = age*log(size_in_memory)

The age is the number of seconds the key was not requested, while size_in_memory is a fast estimation of the amount of memory (in bytes) used by the object in memory. So we try to swap out objects that are rarely accessed, and we try to swap bigger objects over smaller ones, but the latter is a less important factor (because of the logarithmic function used). This is because we don't want bigger objects to be swapped out and in too often, as the bigger the object, the more I/O and CPU is required in order to transfer it.
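Picking the best candidate among a sample is an argmax over this score. The sketch below swaps in an integer floor(log2) for the logarithm to avoid a libm dependency. Changing the log base only multiplies every score by the same constant, so the ranking is unchanged; the flooring is the only approximation. `sample`, `swappability`, and `pickSwapCandidate` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct sample {
    const char *key;
    uint64_t age_seconds;      /* seconds since last access */
    uint64_t size_in_memory;   /* estimated bytes used by the value */
} sample;

/* floor(log2(x)); returns 0 for x <= 1. */
static unsigned ilog2(uint64_t x) {
    unsigned r = 0;
    while (x > 1) { x >>= 1; r++; }
    return r;
}

/* age * log(size), with log2 standing in for the natural log. */
uint64_t swappability(const sample *s) {
    return s->age_seconds * ilog2(s->size_in_memory);
}

/* Index of the most swappable sampled object, or -1 if n == 0. */
int pickSwapCandidate(const sample *samples, size_t n) {
    int best = -1;
    uint64_t best_score = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t score = swappability(&samples[i]);
        if (best == -1 || score > best_score) {
            best = (int)i;
            best_score = score;
        }
    }
    return best;
}
```

With samples (age 1 s, 1 MiB), (age 100 s, 16 B), and (age 10 s, 1 KiB), the scores are 20, 400, and 100. The old small object wins, which shows how the logarithm keeps size a secondary factor.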

Blocking VM loading

What happens if an operation against a key associated with a swapped out object is requested? For instance Redis may just happen to process the following command:

GET foo

If the value object of the foo key is swapped, we need to load it back in memory before processing the operation. In Redis the key lookup process is centralized in the lookupKeyRead and lookupKeyWrite functions; these two functions are used in the implementation of all the Redis commands accessing the keyspace, so we have a single point in the code where to handle the loading of the key from the swap file to memory.

So this is what happens:

  • The user calls some command having as argument a swapped key
  • The command implementation calls the lookup function
  • The lookup function searches for the key in the top level hash table. If the value associated with the requested key is swapped (we can see that by checking the storage field of the key object), we load it back in memory in a blocking way before returning to the user.
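The "single choke point" idea can be sketched with a toy keyspace. Everything here is hypothetical: a linear array stands in for the hash table, and `loadFromSwap` stands in for vmLoadObject by copying from an in-memory "disk" field instead of reading swap pages.

```c
#include <stddef.h>
#include <string.h>

enum { VM_MEMORY, VM_SWAPPED };

typedef struct entry {
    const char *key;
    int storage;           /* VM_MEMORY or VM_SWAPPED */
    const char *value;     /* NULL while swapped out */
    const char *on_disk;   /* stand-in for the swap file contents */
} entry;

/* Stand-in for vmLoadObject: bring the value back and fix the key's
 * storage field. A real implementation would read pages from the
 * swap file and free them in the page table. */
static const char *loadFromSwap(entry *e) {
    e->value = e->on_disk;
    e->storage = VM_MEMORY;
    return e->value;
}

/* Centralized lookup: every command goes through here, so this is the
 * one place where swapped values are loaded (blocking) before the
 * command sees them. Returns NULL for a missing key. */
const char *lookupKey(entry *table, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].key, key) != 0) continue;
        if (table[i].storage == VM_SWAPPED) return loadFromSwap(&table[i]);
        return table[i].value;
    }
    return NULL;
}
```

A GET on a swapped key therefore pays the load cost once. Later lookups find the value already in memory.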

This is pretty straightforward, but things will get more interesting with the threads. From the point of view of the blocking VM the only real problem is the saving of the dataset using another process, that is, handling BGSAVE and BGREWRITEAOF commands.

Background saving when VM is active

The default Redis way to persist on disk is to create .rdb files using a child process. Redis calls the fork() system call in order to create a child, which has an exact copy of the in-memory dataset, since fork duplicates the whole program memory space (actually, thanks to a technique called Copy on Write, memory pages are shared between the parent and child process, so the fork() call will not require too much memory).

In the child process we have a copy of the dataset in a given point in the time. Other commands issued by clients will just be served by the parent process and will not modify the child data.

The child process will just store the whole dataset into the dump.rdb file and finally will exit. But what happens when the VM is active? Values can be swapped out so we don't have all the data in memory, and we need to access the swap file in order to retrieve the swapped values. While the child process is saving, the swap file is shared between the parent and child process, since:

  • The parent process needs to access the swap file in order to load values back into memory if an operation against swapped out values is performed.
  • The child process needs to access the swap file in order to retrieve the full dataset while saving the data set on disk.

In order to avoid problems while both processes are accessing the same swap file, we do a simple thing, that is, not allowing values to be swapped out in the parent process while a background saving is in progress. This way both processes access the swap file in read-only mode. This approach has the problem that while the child process is saving, no new values can be transferred to the swap file even if Redis is using more memory than the max memory parameter dictates. This is usually not a problem, as the background saving will terminate in a short amount of time and, if still needed, a percentage of values will be swapped to disk ASAP.

An alternative to this scenario is to enable the Append Only File that will have this problem only when a log rewrite is performed using the BGREWRITEAOF command.

The problem with the blocking VM

The problem of blocking VM is that... it's blocking :) This is not a problem when Redis is used in batch processing activities, but for real-time usage one of the good points of Redis is the low latency. The blocking VM will have bad latency behaviors as when a client is accessing a swapped out value, or when Redis needs to swap out values, no other clients will be served in the meantime.

Swapping out keys should happen in background. Similarly when a client is accessing a swapped out value other clients accessing in memory values should be served mostly as fast as when VM is disabled. Only the clients dealing with swapped out keys should be delayed.

All these limitations called for a non-blocking VM implementation.

Threaded VM

There are basically three main ways to turn the blocking VM into a non blocking one.

  • 1: One way is obvious, and in my opinion, not a good idea at all, that is, turning Redis itself into a threaded server: if every request is served by a different thread automatically other clients don't need to wait for blocked ones. Redis is fast, exports atomic operations, has no locks, and is just 10k lines of code, because it is single threaded, so this was not an option for me.
  • 2: Using non-blocking I/O against the swap file. After all, Redis is already event-loop based, so why not just handle disk I/O in a non-blocking fashion? I also discarded this possibility for two main reasons. One is that non-blocking file operations, unlike sockets, are an incompatibility nightmare. It's not just like calling select; you need to use OS-specific things. The other problem is that the I/O is just one part of the time consumed to handle VM; another big part is the CPU used in order to encode/decode data to/from the swap file. This is why I picked option three, that is...
  • 3: Using I/O threads, that is, a pool of threads handling the swap I/O operations. This is what the Redis VM is using, so let's detail how this works.

I/O Threads

The threaded VM design goals were the following, in order of importance:

  • Simple implementation, little room for race conditions, simple locking, VM system more or less completely decoupled from the rest of Redis code.
  • Good performance, no locks for clients accessing values in memory.
  • Ability to decode/encode objects in the I/O threads.

The above goals resulted in an implementation where the Redis main thread (the one serving actual clients) and the I/O threads communicate using a queue of jobs, with a single mutex. Basically, when the main thread requires some work done in the background by some I/O thread, it pushes an I/O job structure into the server.io_newjobs queue (that is, just a linked list). If there are no active I/O threads, one is started. At this point some I/O thread will process the I/O job, and the result of the processing is pushed into the server.io_processed queue. The I/O thread will send a byte using a UNIX pipe to the main thread in order to signal that a new job was processed and the result is ready to be processed.
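The queue-plus-pipe handshake can be reduced to a small pthreads program. In this sketch, `job`, `jobQueues`, `pushNewJob`, and `ioThreadMain` are hypothetical names, the "work" is a placeholder multiplication, and the lists are LIFO for brevity. The I/O thread simply exits when the queue is empty, whereas a server would keep its threads around. Building it may require linking with `-pthread`.

```c
#include <pthread.h>
#include <unistd.h>

typedef struct job {
    int input;           /* stand-in for the key/value to process */
    int result;          /* filled in by the I/O thread */
    struct job *next;
} job;

typedef struct jobQueues {
    pthread_mutex_t lock;   /* the single mutex guarding both queues */
    job *newjobs;           /* main thread -> I/O thread */
    job *processed;         /* I/O thread -> main thread */
    int notify[2];          /* pipe: I/O thread writes, main thread reads */
} jobQueues;

/* Main thread: enqueue a job under the lock. */
void pushNewJob(jobQueues *q, job *j) {
    pthread_mutex_lock(&q->lock);
    j->next = q->newjobs;
    q->newjobs = j;
    pthread_mutex_unlock(&q->lock);
}

/* I/O thread: pop a job, do the slow work OUTSIDE the lock, move the
 * job to the processed queue, then write one byte to the pipe so the
 * main thread's event loop wakes up. Exits when no jobs are left. */
void *ioThreadMain(void *arg) {
    jobQueues *q = arg;
    for (;;) {
        pthread_mutex_lock(&q->lock);
        job *j = q->newjobs;
        if (j) q->newjobs = j->next;
        pthread_mutex_unlock(&q->lock);
        if (j == NULL) return NULL;

        j->result = j->input * 2;   /* placeholder for load/encode work */

        pthread_mutex_lock(&q->lock);
        j->next = q->processed;
        q->processed = j;
        pthread_mutex_unlock(&q->lock);
        ssize_t w = write(q->notify[1], "x", 1);
        (void)w;
    }
}
```

Holding the mutex only around list manipulation keeps the main thread's critical sections tiny, which matches the "no locks for clients accessing values in memory" goal.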

This is what the iojob structure looks like:

typedef struct iojob {
    int type;   /* Request type, REDIS_IOJOB_* */
    redisDb *db;/* Redis database */
    robj *key;  /* This I/O request is about swapping this key */
    robj *val;  /* the value to swap for REDIS_IOREQ_*_SWAP, otherwise this
                 * field is populated by the I/O thread for REDIS_IOREQ_LOAD. */
    off_t page; /* Swap page where to read/write the object */
    off_t pages; /* Swap pages needed to save object. PREPARE_SWAP return val */
    int canceled; /* True if this command was canceled by blocking side of VM */
    pthread_t thread; /* ID of the thread processing this entry */
} iojob;

There are just three types of jobs that an I/O thread can perform (the type is specified by the type field of the structure):

  • REDIS_IOJOB_LOAD: load the value associated to a given key from swap to memory. The object offset inside the swap file is page, the object type is key->vtype. The result of this operation will populate the val field of the structure.
  • REDIS_IOJOB_PREPARE_SWAP: compute the number of pages needed in order to save the object pointed by val into the swap. The result of this operation will populate the pages field.
  • REDIS_IOJOB_DO_SWAP: Transfer the object pointed by val to the swap file, at page offset page.

The main thread delegates just the above three tasks. All the rest is handled by the main thread itself, for instance finding a suitable range of free pages in the swap file page table (which is a fast operation), deciding what object to swap, and altering the storage field of a Redis object to reflect the current state of a value.

Non-blocking VM as a probabilistic enhancement of blocking VM

So now we have a way to request background jobs dealing with slow VM operations. How do we fit this into the rest of the work done by the main thread? Blocking VM became aware that an object was swapped out only when the object was looked up, and this is too late for us: in C it is not trivial to start a background job in the middle of a command, leave the function, and re-enter the computation at the same point once the I/O thread has finished what we requested (that is, there are no co-routines, continuations, or the like).

Fortunately there was a much, much simpler way to do this. And we love simple things: basically, consider the VM implementation a blocking one, but add an optimization (using the non-blocking VM operations we are able to perform) that makes blocking very unlikely.

This is what we do:

  • Every time a client sends us a command, before the command is executed, we examine the argument vector of the command in search of swapped keys. After all, for every command we know which arguments are keys, as the Redis command format is pretty simple.
  • If we detect that at least one key in the requested command is swapped on disk, we block the client instead of really issuing the command. For every swapped value associated with a requested key, an I/O job is created in order to bring the value back into memory. The main thread continues the execution of the event loop, without caring about the blocked client.
  • In the meantime, I/O threads are loading values into memory. Every time an I/O thread finishes loading a value, it sends a byte to the main thread using a UNIX pipe. The pipe file descriptor has a readable event associated with it in the main thread event loop, handled by the function vmThreadedIOCompletedJob. If this function detects that all the values needed by a blocked client were loaded, the client is restarted and the original command is called.
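The three steps above can be modeled in a few lines of single-threaded C. This is a toy sketch, not the Redis source: toy_key, toy_client, prepare_command and load_completed are hypothetical names, the key arguments are assumed to be already extracted from argv, and queuing the actual I/O job is reduced to a state change.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; the names are illustrative, not Redis internals. */
typedef enum { KEY_IN_MEMORY, KEY_SWAPPED, KEY_LOADING } key_state;

typedef struct {
    const char *name;
    key_state state;
} toy_key;

typedef struct {
    const char **keys;   /* key arguments, already extracted from argv */
    int nkeys;
    int pending_loads;   /* load jobs this client is still waiting for */
    bool blocked;
    bool executed;
} toy_client;

static toy_key *lookup(toy_key *db, int n, const char *name) {
    for (int i = 0; i < n; i++)
        if (strcmp(db[i].name, name) == 0) return &db[i];
    return NULL;   /* missing keys are simply not swapped */
}

static void execute_command(toy_client *c) {
    c->blocked = false;
    c->executed = true;   /* stand-in for calling the real command */
}

/* Before execution: start a load for every swapped key, block if any.
 * Keys already LOADING for someone else are not tracked in this toy. */
void prepare_command(toy_client *c, toy_key *db, int n) {
    c->pending_loads = 0;
    for (int i = 0; i < c->nkeys; i++) {
        toy_key *k = lookup(db, n, c->keys[i]);
        if (k != NULL && k->state == KEY_SWAPPED) {
            k->state = KEY_LOADING;   /* here an I/O job would be queued */
            c->pending_loads++;
        }
    }
    if (c->pending_loads > 0) c->blocked = true;
    else execute_command(c);
}

/* Called when an I/O thread reports that key k is back in memory. */
void load_completed(toy_client *c, toy_key *k) {
    k->state = KEY_IN_MEMORY;
    if (c->blocked && --c->pending_loads == 0) execute_command(c);
}
```

A key that appears twice in the arguments moves to KEY_LOADING on first sight, so it is counted only once.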

So you can think of this as a blocking VM that almost always happens to have the right keys in memory, since we pause clients that are going to issue commands about swapped-out values until these values are loaded.

If the function checking which arguments are keys fails in some way, there is no problem: the lookup function will see that a given key is associated with a swapped-out value and will block while loading it. So our non-blocking VM reverts to a blocking one when it is not possible to anticipate which keys are touched.

For instance in the case of the SORT command used together with the GET or BY options, it is not trivial to know beforehand what keys will be requested, so at least in the first implementation, SORT BY/GET resorts to the blocking VM implementation.

Blocking clients on swapped keys

How do we block clients? Suspending a client in an event-loop-based server is pretty trivial: all we do is cancel its read handler. Sometimes we do something different (for instance for BLPOP): we just mark the client as blocked and stop processing its new data (new data is only accumulated into the input buffer).

Aborting I/O jobs

There is something hard to solve about the interactions between our blocking and non-blocking VM: what happens if a blocking operation starts on a key that a non-blocking operation is also interested in at the same time?

For instance, while SORT BY is executed, a few keys are being loaded in a blocking manner by the sort command. At the same time, another client may request the same keys with a simple GET key command, which will trigger the creation of an I/O job to load the key in the background.

The only simple way to deal with this problem is to be able to kill I/O jobs in the main thread, so that if a key that we want to load or swap in a blocking way is in the REDIS_VM_LOADING or REDIS_VM_SWAPPING state (that is, there is an I/O job about this key), we can just kill the I/O job about this key, and go ahead with the blocking operation we want to perform.

This is not as trivial as it may seem. At any given moment an I/O job can be in one of the following three queues:

  • server.io_newjobs: the job was already queued but no thread is handling it yet.
  • server.io_processing: the job is being processed by an I/O thread.
  • server.io_processed: the job was already processed.

The function able to kill an I/O job is vmCancelThreadedIOJob, and this is what it does:

  • If the job is in the newjobs queue, that's simple: removing the iojob structure from the queue is enough, as no thread is executing any operation on it yet.
  • If the job is in the processing queue, a thread is messing with our job (and possibly with the associated object!). The only thing we can do is wait, in a blocking way, for the item to move to the next queue. Fortunately this condition happens very rarely, so it's not a performance problem.
  • If the job is in the processed queue, we just mark it as canceled by setting the canceled field to 1 in the iojob structure. The function processing completed jobs will just ignore and free the job instead of really processing it.
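The three cancellation cases can be sketched as follows, assuming the queues are plain linked lists guarded by one mutex. The names (cjob, cqueues, cancel_job) are hypothetical, and the wait for a job in the processing queue is implemented here by briefly releasing the lock and sleeping; the text only says the wait is blocking, so this polling loop is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical simplified job and queues mirroring the three lists above. */
typedef struct cjob {
    int id;
    bool canceled;
    struct cjob *next;
} cjob;

typedef struct {
    pthread_mutex_t lock;
    cjob *newjobs, *processing, *processed;
} cqueues;

void cqueues_init(cqueues *q) {
    pthread_mutex_init(&q->lock, NULL);
    q->newjobs = q->processing = q->processed = NULL;
}

static bool list_remove(cjob **list, cjob *j) {
    for (cjob **p = list; *p != NULL; p = &(*p)->next)
        if (*p == j) { *p = j->next; return true; }
    return false;
}

static bool list_contains(const cjob *list, const cjob *j) {
    for (; list != NULL; list = list->next)
        if (list == j) return true;
    return false;
}

/* Returns true if the job was unlinked from newjobs (the caller now owns it);
 * false otherwise, in which case an already processed job is only flagged. */
bool cancel_job(cqueues *q, cjob *j) {
    const struct timespec pause = {0, 1000000};    /* 1 ms */
    pthread_mutex_lock(&q->lock);
    for (;;) {
        if (list_remove(&q->newjobs, j)) {          /* case 1: not started */
            pthread_mutex_unlock(&q->lock);
            return true;
        }
        if (list_contains(q->processing, j)) {      /* case 2: in progress */
            pthread_mutex_unlock(&q->lock);         /* let the I/O thread finish */
            nanosleep(&pause, NULL);
            pthread_mutex_lock(&q->lock);
            continue;
        }
        if (list_contains(q->processed, j))         /* case 3: already done */
            j->canceled = true;                     /* completion handler skips it */
        pthread_mutex_unlock(&q->lock);
        return false;
    }
}
```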

Questions?

This document is in no way complete: the only way to get the whole picture is to read the source code, but it should be a good introduction that makes reviewing and understanding the code a lot simpler.

Is something not clear about this page? Please leave a comment and I'll try to address the issue, possibly integrating the answer into this document.

9.4 - Redis design draft #2 (historical)

A design for the RDB format written in the early days of Redis

Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2013), as part of a series of design drafts. This is preserved for historical interest.

Redis Design Draft 2 -- RDB version 7 info fields

  • Author: Salvatore Sanfilippo antirez@gmail.com
  • GitHub issue #1048

History of revisions

1.0, 10 April 2013 - Initial draft.

Overview

The Redis RDB format lacks a simple way to add info fields to an RDB file without causing a backward compatibility issue, even if the added metadata is not required in order to load data from the RDB file.

For example, thanks to the info fields specified in this document, it will be possible to add information such as the file creation time, the Redis version that generated the file, and any other useful information to the RDB, in a way that not every field is required for an RDB version 7 file to be correctly processed.

Also, with minimal changes, it will be possible to add RDB version 7 support to Redis 2.6 without actually supporting the additional fields, just skipping them when loading an RDB file.

RDB info fields may have semantic meaning if needed, so that the presence of a field may add information about the data set stored in the RDB file. However, when an info field must be correctly decoded in order to understand and load the data set content of the RDB file, the RDB format version must be increased so that previous versions of Redis will not attempt to load it.

However, currently the info fields are designed only to hold additional information that is not needed to load the dataset, but that better specifies how the RDB file was created.

Info fields representation

The RDB format 6 has the following layout:

  • A 9-byte magic "REDIS0006"
  • key-value pairs
  • An EOF opcode
  • CRC64 checksum

The proposal for RDB format 7 is to add the optional fields immediately after the 9-byte magic, so that the new format will be:

  • A 9-byte magic "REDIS0007"
  • Info field 1
  • Info field 2
  • ...
  • Info field N
  • Info field end-of-fields
  • key-value pairs
  • An EOF opcode
  • CRC64 checksum

Every single info field has the following structure:

  • A 16 bit identifier
  • A 64 bit data length
  • A data section of exactly the specified length

Both the identifier and the data length are stored in little endian byte ordering.

The special identifier 0 means that there are no other info fields, and that the remainder of the RDB file contains the key-value pairs.
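The field layout above can be sketched as a small encoder for the header proposed in this draft. The function names are hypothetical and the example writes fields 1 and 2 defined below. The draft does not say whether the end-of-fields marker carries a length, so this sketch assumes it follows the same structure with a zero length. Values are written byte by byte, so the output is little endian on any host.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store the low `nbytes` bytes of v in little endian order. */
static void put_le(unsigned char *p, uint64_t v, int nbytes) {
    for (int i = 0; i < nbytes; i++) p[i] = (unsigned char)(v >> (8 * i));
}

/* Append one info field: 16 bit id, 64 bit length, then the data bytes.
 * Returns the number of bytes written (the caller guarantees the space). */
size_t write_info_field(unsigned char *out, uint16_t id,
                        const void *data, uint64_t len) {
    put_le(out, id, 2);
    put_le(out + 2, len, 8);
    if (len > 0) memcpy(out + 10, data, (size_t)len);
    return 10 + (size_t)len;
}

/* Magic, creation date (id 1), Redis version (id 2), end of fields (id 0). */
size_t write_rdb7_header(unsigned char *out, uint64_t ctime, const char *version) {
    size_t n = 0;
    memcpy(out, "REDIS0007", 9);
    n += 9;
    unsigned char t[8];
    put_le(t, ctime, 8);
    n += write_info_field(out + n, 1, t, 8);
    n += write_info_field(out + n, 2, version, strlen(version) + 1); /* keep NUL */
    n += write_info_field(out + n, 0, NULL, 0);  /* assumed: zero-length marker */
    return n;
}
```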

Handling of info fields

A program can simply skip every info field it does not understand, as long as the RDB version matches the one it is able to load.
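Skipping unknown fields only requires reading each id and length. The reader below is a sketch, assuming that the end-of-fields marker also carries a 64 bit length (zero); read_info_fields is a hypothetical name. It decodes fields 1 and 2 defined below and skips every other id.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read `nbytes` bytes stored in little endian order. */
static uint64_t get_le(const unsigned char *p, int nbytes) {
    uint64_t v = 0;
    for (int i = nbytes - 1; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Walks the info fields that follow the 9 byte magic. Returns the offset
 * where the key-value pairs start, or 0 on a version mismatch or truncation. */
size_t read_info_fields(const unsigned char *buf, size_t buflen,
                        uint64_t *ctime, const char **version) {
    if (buflen < 9 || memcmp(buf, "REDIS0007", 9) != 0) return 0;
    size_t pos = 9;
    for (;;) {
        if (buflen - pos < 10) return 0;            /* truncated field header */
        uint16_t id = (uint16_t)get_le(buf + pos, 2);
        uint64_t len = get_le(buf + pos + 2, 8);
        pos += 10;
        if (len > buflen - pos) return 0;           /* truncated field data */
        if (id == 0) return pos + (size_t)len;      /* end of info fields */
        if (id == 1 && len == 8)
            *ctime = get_le(buf + pos, 8);
        else if (id == 2 && len > 0 && buf[pos + len - 1] == '\0')
            *version = (const char *)(buf + pos);
        /* Any other id (or a malformed known one) is skipped by its length. */
        pos += (size_t)len;
    }
}
```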

Specification of info field IDs and content

Info field 0 -- End of info fields

This just means there are no more info fields to process.

Info field 1 -- Creation date

This field represents the unix time at which the RDB file was created. The unix time is stored as a 64 bit little endian integer representing seconds since 1 January 1970.

Info field 2 -- Redis version

This field represents a null-terminated string containing the version of Redis that generated the file, as displayed in the redis_version field of the INFO output.

10 - Redis modules API

Introduction to writing Redis modules

The modules documentation is composed of the following pages:

  • Introduction to Redis modules (this file). An overview of the Redis modules system and API. It's a good idea to start your reading here.
  • Implementing native data types covers the implementation of native data types in modules.
  • Blocking operations shows how to write blocking commands that will not reply immediately, but will block the client, without blocking the Redis server, and will provide a reply whenever possible.
  • Redis modules API reference is generated from the top comments of the RedisModule functions in module.c. It is a good reference for understanding how each function works.

Redis modules make it possible to extend Redis functionality using external modules, rapidly implementing new Redis commands with features similar to what can be done inside the core itself.

Redis modules are dynamic libraries that can be loaded into Redis at startup, or using the MODULE LOAD command. Redis exports a C API, in the form of a single C header file called redismodule.h. Modules are meant to be written in C; however, it will be possible to use C++ or other languages that have C binding functionalities.

Modules are designed to be loaded into different versions of Redis, so a given module does not need to be designed, or recompiled, in order to run with a specific version of Redis. For this reason, the module registers with the Redis core using a specific API version. The current API version is "1".

This document is about an alpha version of Redis modules. API, functionalities and other details may change in the future.

Loading modules

In order to test the module you are developing, you can load the module using the following redis.conf configuration directive:

loadmodule /path/to/mymodule.so

It is also possible to load a module at runtime using the following command:

MODULE LOAD /path/to/mymodule.so

In order to list all loaded modules, use:

MODULE LIST

Finally, you can unload (and later reload if you wish) a module using the following command:

MODULE UNLOAD mymodule

Note that mymodule above is not the filename without the .so suffix, but instead, the name the module used to register itself into the Redis core. The name can be obtained using MODULE LIST. However it is good practice that the filename of the dynamic library is the same as the name the module uses to register itself into the Redis core.

The simplest module you can write

In order to show the different parts of a module, here we'll show a very simple module that implements a command that outputs a random number.

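#include "redismodule.h"
#include <stdlib.h>

/* Implements HELLOWORLD.RAND: replies with a pseudo-random integer. */
int HelloworldRand_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);
    RedisModule_ReplyWithLongLong(ctx,rand());
    return REDISMODULE_OK;
}

/* Entry point invoked by Redis when the module is loaded. */
int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    REDISMODULE_NOT_USED(argv);
    REDISMODULE_NOT_USED(argc);

    /* Announce the module name, module version 1 and the API version. */
    if (RedisModule_Init(ctx,"helloworld",1,REDISMODULE_APIVER_1)
        == REDISMODULE_ERR) return REDISMODULE_ERR;

    /* Register the command; firstkey, lastkey, keystep = 0: no key arguments. */
    if (RedisModule_CreateCommand(ctx,"helloworld.rand",
        HelloworldRand_RedisCommand, "fast random",
        0, 0, 0) == REDISMODULE_ERR)
        return REDISMODULE_ERR;

    return REDISMODULE_OK;
}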

The example module has two functions. One implements a command called HELLOWORLD.RAND; this function is specific to that module. The other function, RedisModule_OnLoad(), must be present in every Redis module. It is the entry point for the module to be initialized, register its commands, and potentially other private data structures it uses.

Note that it is a good idea for modules to name their commands with the name of the module, followed by a dot, and finally the command name, as in the case of HELLOWORLD.RAND. This makes collisions less likely.

Note that if different modules have colliding commands, they will not be able to work in Redis at the same time, since the function RedisModule_CreateCommand will fail in one of the modules, so the module loading will abort, returning an error condition.

Module initialization

The above example shows the usage of the function RedisModule_Init(). It should be the first function called by the module OnLoad function. The following is the function prototype:

int RedisModule_Init(RedisModuleCtx *ctx, const char *modulename,
                     int module_version, int api_version);

The Init function announces to the Redis core that the module has a given name and version (the version is reported by MODULE LIST), and that it is willing to use a specific version of the API.

If the API version is wrong, the name is already taken, or there are other similar errors, the function will return REDISMODULE_ERR, and the module OnLoad function should return ASAP with an error.

Before the Init function is called, no other API function can be called, otherwise the module will segfault and the Redis instance will crash.

The second function called, RedisModule_CreateCommand, is used in order to register commands into the Redis core. The following is the prototype:

int RedisModule_CreateCommand(RedisModuleCtx *ctx, const char *name,
                              RedisModuleCmdFunc cmdfunc, const char *strflags,
                              int firstkey, int lastkey, int keystep);

As you can see, most Redis module API calls take the context of the module as their first argument, so that they have a reference to the module calling them, to the command and client executing a given command, and so forth.

To create a new command, the above function needs the context, the command's name, a pointer to the function implementing the command, the command's flags and the positions of key names in the command's arguments.

The function that implements the command must have the following prototype:

int mycommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc);

The command function arguments are just the context, which will be passed to all the other API calls, the command argument vector, and the total number of arguments, as passed by the user.

As you can see, the arguments are provided as pointers to a specific data type, the RedisModuleString. This is an opaque data type that you access and use through API functions; direct access to its fields is never needed.

Zooming into the example command implementation, we can find another call:

int RedisModule_ReplyWithLongLong(RedisModuleCtx *ctx, long long integer);

This function returns an integer to the client that invoked the command, exactly like other Redis commands do, for example INCR or SCARD.

Module cleanup

In most cases, there is no need for special cleanup. When a module is unloaded, Redis will automatically unregister commands and unsubscribe from notifications. However in the case where a module contains some persistent memory or configuration, a module may include an optional RedisModule_OnUnload function. If a module provides this function, it will be invoked during the module unload process. The following is the function prototype:

int RedisModule_OnUnload(RedisModuleCtx *ctx);

The OnUnload function may prevent module unloading by returning REDISMODULE_ERR. Otherwise, REDISMODULE_OK should be returned.

Setup and dependencies of a Redis module

Redis modules don't depend on Redis or some other library, nor do they need to be compiled with a specific redismodule.h file. In order to create a new module, just copy a recent version of redismodule.h into your source tree, link all the libraries you want, and create a dynamic library having the RedisModule_OnLoad() function symbol exported.

The module will be able to load into different versions of Redis.

A module can be designed to support both newer and older Redis versions where certain API functions are not available in all versions. If an API function is not implemented in the currently running Redis version, the function pointer is set to NULL. This allows the module to check if a function exists before using it:

if (RedisModule_SetCommandInfo != NULL) {
    RedisModule_SetCommandInfo(cmd, &info);
}

In recent versions of redismodule.h, a convenience macro RMAPI_FUNC_SUPPORTED(funcname) is defined. Using the macro or just comparing with NULL is a matter of personal preference.

Passing configuration parameters to Redis modules

When the module is loaded with the MODULE LOAD command, or using the loadmodule directive in the redis.conf file, the user is able to pass configuration parameters to the module by adding arguments after the module file name:

loadmodule mymodule.so foo bar 1234

In the above example the strings foo, bar and 1234 will be passed to the module OnLoad() function in the argv argument, as an array of RedisModuleString pointers. The number of arguments passed is in argc.

The way you can access those strings will be explained in the rest of this document. Normally the module will store the module configuration parameters in some static global variable that can be accessed module wide, so that the configuration can change the behavior of different commands.

Working with RedisModuleString objects

The command argument vector argv passed to module commands, and the return values of other module API functions, are of type RedisModuleString.

Usually you directly pass module strings to other API calls, however sometimes you may need to directly access the string object.

There are a few functions in order to work with string objects:

const char *RedisModule_StringPtrLen(RedisModuleString *string, size_t *len);

The above function accesses a string by returning its pointer and setting its length in len. You should never write to a string object pointer, as you can see from the const pointer qualifier.

However, if you want, you can create new string objects using the following API:

RedisModuleString *RedisModule_CreateString(RedisModuleCtx *ctx, const char *ptr, size_t len);

The string returned by the above function must be freed using a corresponding call to RedisModule_FreeString():

void RedisModule_FreeString(RedisModuleString *str);

However, if you want to avoid having to free strings, the automatic memory management covered later in this document can be a good alternative, since it frees them for you.

Note that the strings provided via the argument vector argv never need to be freed. You only need to free new strings you create, or new strings returned by other APIs, where it is specified that the returned string must be freed.

Creating strings from numbers or parsing strings as numbers

Creating a new string from an integer is a very common operation, so there is a function to do this:

RedisModuleString *mystr = RedisModule_CreateStringFromLongLong(ctx,10);

Similarly in order to parse a string as a number:

long long myval;
if (RedisModule_StringToLongLong(ctx,argv[1],&myval) == REDISMODULE_OK) {
    /* Do something with 'myval' */
}

Accessing Redis keys from modules

Most Redis modules, in order to be useful, have to interact with the Redis data space (this is not always true, for example an ID generator may never touch Redis keys). Redis modules have two different APIs in order to access the Redis data space: one is a low level API that provides very fast access and a set of functions to manipulate Redis data structures. The other API is more high level, and allows calling Redis commands and fetching the result, similarly to how Lua scripts access Redis.

The high level API is also useful in order to access Redis functionalities that are not available as APIs.

In general, module developers should prefer the low level API, because commands implemented using it run at a speed comparable to that of native Redis commands. However, there are definitely use cases for the higher level API. For example, often the bottleneck could be processing the data rather than accessing it.

Also note that sometimes using the low level API is not harder compared to the higher level one.

Calling Redis commands

The high level API to access Redis is the sum of the RedisModule_Call() function, together with the functions needed in order to access the reply object returned by Call().

RedisModule_Call uses a special calling convention, with a format specifier that is used to specify what kind of objects you are passing as arguments to the function.

Redis commands are invoked just using a command name and a list of arguments. However, when calling commands, the arguments may originate from different kinds of strings: null-terminated C strings, RedisModuleString objects as received from the argv parameter in the command implementation, binary safe C buffers with a pointer and a length, and so forth.

For example, if I want to call INCRBY using as first argument (the key) a string received in the argument vector argv, which is an array of RedisModuleString object pointers, and as second argument (the increment) a C string representing the number "10", I'll use the following function call:

RedisModuleCallReply *reply;
reply = RedisModule_Call(ctx,"INCRBY","sc",argv[1],"10");

The first argument is the context, and the second is always a null terminated C string with the command name. The third argument is the format specifier where each character corresponds to the type of the arguments that will follow. In the above case "sc" means a RedisModuleString object, and a null terminated C string. The other arguments are just the two arguments as specified. In fact argv[1] is a RedisModuleString and "10" is a null terminated C string.

This is the full list of format specifiers:

  • c -- Null terminated C string pointer.
  • b -- C buffer, two arguments needed: C string pointer and size_t length.
  • s -- RedisModuleString as received in argv or by other Redis module APIs returning a RedisModuleString object.
  • l -- Long long integer.
  • v -- Array of RedisModuleString objects.
  • ! -- This modifier just tells the function to replicate the command to replicas and AOF. It is ignored from the point of view of arguments parsing.
  • A -- This modifier, when ! is given, tells to suppress AOF propagation: the command will be propagated only to replicas.
  • R -- This modifier, when ! is given, tells to suppress replicas propagation: the command will be propagated only to the AOF if enabled.

The function returns a RedisModuleCallReply object on success; on error, NULL is returned.

NULL is returned when the command name is invalid, the format specifier uses characters that are not recognized, or when the command is called with the wrong number of arguments. In the above cases the errno var is set to EINVAL. NULL is also returned when, in an instance with Cluster enabled, the target keys belong to non-local hash slots. In this case errno is set to EPERM.

Working with RedisModuleCallReply objects

RedisModule_Call returns reply objects that can be accessed using the RedisModule_CallReply* family of functions.

In order to obtain the type of reply (corresponding to one of the data types supported by the Redis protocol), the function RedisModule_CallReplyType() is used:

reply = RedisModule_Call(ctx,"INCRBY","sc",argv[1],"10");
if (RedisModule_CallReplyType(reply) == REDISMODULE_REPLY_INTEGER) {
    long long myval = RedisModule_CallReplyInteger(reply);
    /* Do something with myval. */
}

Valid reply types are:

  • REDISMODULE_REPLY_STRING Bulk string or status replies.
  • REDISMODULE_REPLY_ERROR Errors.
  • REDISMODULE_REPLY_INTEGER Signed 64 bit integers.
  • REDISMODULE_REPLY_ARRAY Array of replies.
  • REDISMODULE_REPLY_NULL NULL reply.

Strings, errors and arrays have an associated length. For strings and errors the length corresponds to the length of the string. For arrays the length is the number of elements. To obtain the reply length the following function is used:

size_t reply_len = RedisModule_CallReplyLength(reply);

In order to obtain the value of an integer reply, the following function is used, as already shown in the example above:

long long reply_integer_val = RedisModule_CallReplyInteger(reply);

Called with a reply object of the wrong type, the above function always returns LLONG_MIN.

Sub elements of array replies are accessed this way:

RedisModuleCallReply *subreply;
subreply = RedisModule_CallReplyArrayElement(reply,idx);

The above function returns NULL if you try to access out of range elements.

Strings and errors (which are like strings but with a different type) can be accessed in the following way, making sure to never write to the resulting pointer (which is returned as a const pointer so that misusing it must be pretty explicit):

size_t len;
const char *ptr = RedisModule_CallReplyStringPtr(reply,&len);

If the reply type is not a string or an error, NULL is returned.

RedisModuleCallReply objects are not the same as module string objects (RedisModuleString types). However, sometimes you may need to pass replies of type string or integer to API functions expecting a module string.

When this is the case, you may want to evaluate if using the low level API could be a simpler way to implement your command, or you can use the following function in order to create a new string object from a call reply of type string, error or integer:

RedisModuleString *mystr = RedisModule_CreateStringFromCallReply(myreply);

If the reply is not of the right type, NULL is returned. The returned string object should be released with RedisModule_FreeString() as usual, or by enabling automatic memory management (see the corresponding section).

Releasing call reply objects

Reply objects must be freed using RedisModule_FreeCallReply. For arrays, you need to free only the top level reply, not the nested replies. Currently the module implementation provides protection against crashing if you free a nested reply object by mistake; however, this feature is not guaranteed to be here forever, so it should not be considered part of the API.

If you use automatic memory management (explained later in this document) you don't need to free replies (but you still could if you wish to release memory ASAP).

Returning values from Redis commands

Like normal Redis commands, new commands implemented via modules must be able to return values to the caller. The API exports a set of functions for this goal, in order to return the usual types of the Redis protocol, and arrays of such types as elements. Errors can also be returned with any error string and code (the error code is the initial uppercase letters in the error message, like the "BUSY" string in the "BUSY the server is busy" error message).

All the functions to send a reply to the client are called RedisModule_ReplyWith<something>.

To return an error, use:

int RedisModule_ReplyWithError(RedisModuleCtx *ctx, const char *err);

There is a predefined error string for wrong-type key errors:

REDISMODULE_ERRORMSG_WRONGTYPE

Example usage:

RedisModule_ReplyWithError(ctx,"ERR invalid arguments");

We already saw how to reply with a long long in the examples above:

RedisModule_ReplyWithLongLong(ctx,12345);

To reply with a simple string, which can't contain binary values or newlines (so it's suitable for sending short words, like "OK"), we use:

RedisModule_ReplyWithSimpleString(ctx,"OK");

It's possible to reply with "bulk strings" that are binary safe, using two different functions:

int RedisModule_ReplyWithStringBuffer(RedisModuleCtx *ctx, const char *buf, size_t len);

int RedisModule_ReplyWithString(RedisModuleCtx *ctx, RedisModuleString *str);

The first function gets a C pointer and length. The second a RedisModuleString object. Use one or the other depending on the source type you have at hand.

In order to reply with an array, you just need to use a function to emit the array length, followed by as many calls to the above functions as there are elements in the array:

RedisModule_ReplyWithArray(ctx,2);
RedisModule_ReplyWithStringBuffer(ctx,"age",3);
RedisModule_ReplyWithLongLong(ctx,22);

Returning nested arrays is easy: your nested array element just uses another call to RedisModule_ReplyWithArray(), followed by the calls to emit the sub-array elements.

Returning arrays with dynamic length

Sometimes it is not possible to know the number of items of an array beforehand. As an example, think of a Redis module implementing a FACTOR command that, given a number, outputs its prime factors. Instead of factorizing the number, storing the prime factors in an array, and later producing the command reply, a better solution is to start an array reply whose length is not known, and set it later. This is accomplished with a special argument to RedisModule_ReplyWithArray():

RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);

The above call starts an array reply so we can use other ReplyWith calls in order to produce the array items. Finally in order to set the length, use the following call:

RedisModule_ReplySetArrayLength(ctx, number_of_items);

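In the case of the FACTOR command, this translates to some code similar to this (assuming n is a positive long long already parsed from the arguments):

RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);
long number_of_factors = 0;
for (long long d = 2; d <= n / d; d++) {   /* trial division */
    while (n % d == 0) {
        RedisModule_ReplyWithLongLong(ctx, d);
        number_of_factors++;
        n /= d;
    }
}
if (n > 1) { RedisModule_ReplyWithLongLong(ctx, n); number_of_factors++; }
RedisModule_ReplySetArrayLength(ctx, number_of_factors);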

Another common use case for this feature is iterating over the elements of some collection and only returning the ones that pass some kind of filter.

It is possible to have multiple nested arrays with postponed replies. Each call to RedisModule_ReplySetArrayLength() sets the length of the most recent RedisModule_ReplyWithArray() call whose length has not been set yet:

RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);
/* ... generate 99 elements ... */
RedisModule_ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN);
/* ... generate 10 elements ... */
RedisModule_ReplySetArrayLength(ctx, 10);
RedisModule_ReplySetArrayLength(ctx, 100);

This creates a 100-element array whose last element is a 10-element array.
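The stacking rule can be illustrated with a toy reply buffer. This is not how Redis builds replies internally: it is a hypothetical model (toy_reply and the toy_* functions are made up) that records the offset where each postponed array starts and, when the length is finally set, inserts the RESP array header "*<n>\r\n" at that offset. Integers are encoded in RESP as ":<n>\r\n".

```c
#include <stdio.h>
#include <string.h>

#define TOY_MAX_DEPTH 16

/* Hypothetical toy reply buffer (not the Redis implementation). */
typedef struct {
    char buf[1024];
    size_t len;
    size_t open[TOY_MAX_DEPTH];  /* start offsets of arrays awaiting a length */
    int depth;
} toy_reply;

void toy_reply_init(toy_reply *r) {
    r->len = 0;
    r->depth = 0;
    r->buf[0] = '\0';
}

/* Like ReplyWithArray(ctx, REDISMODULE_POSTPONED_LEN): remember where it starts. */
int toy_start_postponed_array(toy_reply *r) {
    if (r->depth == TOY_MAX_DEPTH) return -1;
    r->open[r->depth++] = r->len;
    return 0;
}

/* Appends a RESP integer such as ":42\r\n". */
int toy_add_integer(toy_reply *r, long long v) {
    size_t room = sizeof r->buf - r->len;
    int w = snprintf(r->buf + r->len, room, ":%lld\r\n", v);
    if (w < 0 || (size_t)w >= room) {
        r->buf[r->len] = '\0';   /* discard truncated output */
        return -1;
    }
    r->len += (size_t)w;
    return 0;
}

/* Like ReplySetArrayLength: closes the most recently opened postponed array
 * by inserting its "*<n>\r\n" header at the remembered offset. */
int toy_set_array_length(toy_reply *r, long n) {
    char hdr[32];
    int hl = snprintf(hdr, sizeof hdr, "*%ld\r\n", n);
    if (r->depth == 0 || hl < 0 || r->len + (size_t)hl >= sizeof r->buf) return -1;
    size_t at = r->open[--r->depth];
    memmove(r->buf + at + hl, r->buf + at, r->len - at);
    memcpy(r->buf + at, hdr, (size_t)hl);
    r->len += (size_t)hl;
    r->buf[r->len] = '\0';
    return 0;
}
```

Because each length call closes the most recently opened array, the inner array must be resolved before the outer one, as in the RedisModule_ReplySetArrayLength() example above.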

Arity and type checks

Often commands need to check that the number of arguments and the type of the key are correct. In order to report a wrong arity, there is a specific function called RedisModule_WrongArity(). The usage is trivial:

if (argc != 2) return RedisModule_WrongArity(ctx);

Checking for the wrong type involves opening the key and checking the type:

RedisModuleKey *key = RedisModule_OpenKey(ctx,argv[1],
    REDISMODULE_READ|REDISMODULE_WRITE);

int keytype = RedisModule_KeyType(key);
if (keytype != REDISMODULE_KEYTYPE_STRING &&
    keytype != REDISMODULE_KEYTYPE_EMPTY)
{
    RedisModule_CloseKey(key);
    return RedisModule_ReplyWithError(ctx,REDISMODULE_ERRORMSG_WRONGTYPE);
}

Note that you often want to proceed with a command both when the key is of the expected type and when it's empty.

Low level access to keys

Low level access to keys allows performing operations directly on the value objects associated with keys, with a speed similar to what Redis uses internally to implement the built-in commands.

Once a key is opened, a key pointer is returned that will be used with all the other low level API calls in order to perform operations on the key or its associated value.

Because the API is meant to be very fast, it cannot do too many run-time checks, so the user must be aware of certain rules to follow:

  • Opening the same key multiple times, where at least one instance is opened for writing, is undefined and may lead to crashes.
  • While a key is open, it should only be accessed via the low level key API. For example, opening a key and then calling DEL on the same key using the RedisModule_Call() API will result in a crash. However, it is safe to open a key, perform some operations with the low level API, close it, then use other APIs to manage the same key, and later open it again to do some more work.

To open a key, use the RedisModule_OpenKey function. It returns a key pointer, which we'll use with all the subsequent calls to access and modify the value:

RedisModuleKey *key;
key = RedisModule_OpenKey(ctx,argv[1],REDISMODULE_READ);

The second argument is the key name, which must be a RedisModuleString object. The third argument is the mode: REDISMODULE_READ or REDISMODULE_WRITE. It is possible to use | to bitwise OR the two modes to open the key in both modes. Currently a key opened for writing can also be accessed for reading, but this is to be considered an implementation detail. The right mode should be used in sane modules.

You can open non-existing keys for writing, since the keys will be created when an attempt to write to the key is performed. However, when opening keys just for reading, RedisModule_OpenKey will return NULL if the key does not exist.

Once you are done using a key, you can close it with:

RedisModule_CloseKey(key);

Note that if automatic memory management is enabled, you are not forced to close keys. When the module function returns, Redis will take care to close all the keys which are still open.

Getting the key type

In order to obtain the type of a key, use the RedisModule_KeyType() function:

int keytype = RedisModule_KeyType(key);

It returns one of the following values:

REDISMODULE_KEYTYPE_EMPTY
REDISMODULE_KEYTYPE_STRING
REDISMODULE_KEYTYPE_LIST
REDISMODULE_KEYTYPE_HASH
REDISMODULE_KEYTYPE_SET
REDISMODULE_KEYTYPE_ZSET

The above are just the usual Redis key types, with the addition of an empty type, which signals that the key pointer is associated with an empty key that does not yet exist.

Creating new keys

To create a new key, open it for writing and then write to it using one of the key writing functions. Example:

RedisModuleKey *key;
key = RedisModule_OpenKey(ctx,argv[1],REDISMODULE_WRITE);
if (RedisModule_KeyType(key) == REDISMODULE_KEYTYPE_EMPTY) {
    RedisModule_StringSet(key,argv[2]);
}

Deleting keys

Just use:

RedisModule_DeleteKey(key);

The function returns REDISMODULE_ERR if the key is not open for writing. Note that after a key gets deleted, it is set up so that it can be targeted by new key commands. For example, RedisModule_KeyType() will report it as an empty key, and writing to it will create a new key, possibly of another type (depending on the API used).

Managing key expires (TTLs)

Two functions are provided to control key expires; together they can set, modify, get, and unset the time to live associated with a key.

One function is used in order to query the current expire of an open key:

mstime_t RedisModule_GetExpire(RedisModuleKey *key);

The function returns the time to live of the key in milliseconds, or REDISMODULE_NO_EXPIRE as a special value to signal the key has no associated expire or does not exist at all (you can distinguish the two cases by checking whether the key type is REDISMODULE_KEYTYPE_EMPTY).

In order to change the expire of a key the following function is used instead:

int RedisModule_SetExpire(RedisModuleKey *key, mstime_t expire);

When called on a non-existing key, REDISMODULE_ERR is returned, because the function can only associate expires to existing open keys (non-existing open keys are only useful in order to create new values with data type specific write operations).

Again, the expire time is specified in milliseconds. If the key currently has no expire, a new expire is set. If the key already has an expire, it is replaced with the new value.

If the key has an expire, and the special value REDISMODULE_NO_EXPIRE is used as a new expire, the expire is removed, similarly to the Redis PERSIST command. In case the key was already persistent, no operation is performed.

Obtaining the length of values

There is a single function to retrieve the length of the value associated with an open key. The returned length is value-specific: it is the string length for strings, and the number of elements for the aggregate data types (how many elements there are in a list, set, sorted set, or hash).

size_t len = RedisModule_ValueLength(key);

If the key does not exist, the function returns 0.

String type API

Setting a new string value, like the Redis SET command does, is performed using:

int RedisModule_StringSet(RedisModuleKey *key, RedisModuleString *str);

The function works exactly like the Redis SET command itself, that is, if there is a prior value (of any type) it will be deleted.

Accessing existing string values is performed using DMA (direct memory access) for speed. The API returns a pointer and a length, so that it's possible to access and, if needed, modify the string directly.

size_t len, j;
char *myptr = RedisModule_StringDMA(key,&len,REDISMODULE_WRITE);
for (j = 0; j < len; j++) myptr[j] = 'A';

In the above example we write directly on the string. Note that if you want to write, you must be sure to ask for WRITE mode.

A DMA pointer is only valid as long as no other operation is performed on the key between the DMA call and the use of the pointer.

Sometimes when we want to manipulate strings directly, we need to change their size as well. For this purpose, the RedisModule_StringTruncate function is used. Example:

RedisModule_StringTruncate(mykey,1024);

The function truncates or enlarges the string as needed, padding it with zero bytes if the previous length is smaller than the new length we request. If no string value exists because the key is associated with an open empty key, a string value is created and associated with the key.

Note that every time StringTruncate() is called, we need to re-obtain the DMA pointer, since the old one may be invalid.
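The truncate-or-enlarge semantics and the pointer invalidation rule can be illustrated on an ordinary heap buffer. This is a hypothetical plain C model, not the Redis implementation: model_string_truncate() stands in for StringTruncate(), and realloc() stands in for whatever reallocation Redis performs internally, which is why the caller must switch to the returned pointer. Passing a NULL buffer with length 0 models the "open empty key" case, where a new zero-filled string is created.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of the documented StringTruncate() semantics on a plain
 * heap buffer. As with a DMA pointer after StringTruncate(), the old pointer
 * may no longer be valid, so callers must use the returned one.
 * On allocation failure it returns NULL and leaves buf and *len untouched. */
char *model_string_truncate(char *buf, size_t *len, size_t newlen) {
    assert(len != NULL);
    char *p = realloc(buf, newlen ? newlen : 1); /* avoid realloc(ptr, 0) */
    if (p == NULL) return NULL;
    if (newlen > *len)
        memset(p + *len, 0, newlen - *len);      /* zero-pad the new tail */
    *len = newlen;
    return p;
}
```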

List type API

It's possible to push and pop values from list values:

int RedisModule_ListPush(RedisModuleKey *key, int where, RedisModuleString *ele);
RedisModuleString *RedisModule_ListPop(RedisModuleKey *key, int where);

In both APIs, the where argument specifies whether to push to (or pop from) the head or the tail of the list, using the following macros:

REDISMODULE_LIST_HEAD
REDISMODULE_LIST_TAIL

Elements returned by RedisModule_ListPop() are like strings created with RedisModule_CreateString(): they must be released with RedisModule_FreeString() or by enabling automatic memory management.

Set type API

Work in progress.

Sorted set type API

Documentation missing, please refer to the top comments inside module.c for the following functions:

  • RedisModule_ZsetAdd
  • RedisModule_ZsetIncrby
  • RedisModule_ZsetScore
  • RedisModule_ZsetRem

And for the sorted set iterator:

  • RedisModule_ZsetRangeStop
  • RedisModule_ZsetFirstInScoreRange
  • RedisModule_ZsetLastInScoreRange
  • RedisModule_ZsetFirstInLexRange
  • RedisModule_ZsetLastInLexRange
  • RedisModule_ZsetRangeCurrentElement
  • RedisModule_ZsetRangeNext
  • RedisModule_ZsetRangePrev
  • RedisModule_ZsetRangeEndReached

Hash type API

Documentation missing, please refer to the top comments inside module.c for the following functions:

  • RedisModule_HashSet
  • RedisModule_HashGet

Iterating aggregated values

Work in progress.

Replicating commands

If you want to use module commands exactly like normal Redis commands, in the context of replicated Redis instances, or using the AOF file for persistence, it is important for module commands to handle their replication in a consistent way.

When using the higher level APIs to invoke commands, replication happens automatically if you use the "!" modifier in the format string of RedisModule_Call() as in the following example:

reply = RedisModule_Call(ctx,"INCRBY","!sc",argv[1],"10");

As you can see the format specifier is "!sc". The bang is not parsed as a format specifier, but it internally flags the command as "must replicate".

If you use the above programming style, there are no problems. However, sometimes things are more complex than that, and you use the low level API. In this case, if there are no side effects in the command execution and it consistently performs the same work, you can replicate the command verbatim as the user executed it. To do that, just call the following function:

RedisModule_ReplicateVerbatim(ctx);

When you use the above API, you should not use any other replication function since they are not guaranteed to mix well.

However, this is not the only option. It's also possible to tell Redis exactly which commands to replicate as the effect of the command execution, using an API similar to RedisModule_Call() that, instead of calling the command, sends it to the AOF / replicas stream. Example:

RedisModule_Replicate(ctx,"INCRBY","cl","foo",my_increment);

It's possible to call RedisModule_Replicate multiple times, and each call will emit a command. The whole emitted sequence is wrapped in a MULTI/EXEC transaction, so that the AOF and replication effects are the same as executing a single command.

Note that there is an ordering rule if you want to mix Call() replication and Replicate() replication (not necessarily a good idea if there are simpler approaches): commands replicated with Call() are always emitted first in the final MULTI/EXEC block, followed by all the commands emitted with Replicate().
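The ordering rule can be sketched as a tiny standalone model that takes the replicated commands in the order the module issued them and produces the propagated sequence. ReplEvent and model_propagate() are hypothetical names; the sketch only reorders command strings according to the rule stated above and does not reflect how Redis builds its replication stream internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each entry is a command the module replicated,
 * recorded in the order the module issued it. */
typedef enum { VIA_CALL, VIA_REPLICATE } ReplSource;

typedef struct {
    ReplSource source;  /* Call() with "!" or Replicate() */
    const char *cmd;    /* command text, for illustration only */
} ReplEvent;

/* Applies the documented rule: every Call()-replicated command comes
 * before every Replicate() command, each group keeps its own order, and
 * the sequence is wrapped in MULTI/EXEC. Returns entries written to out. */
size_t model_propagate(const ReplEvent *ev, size_t n,
                       const char **out, size_t cap) {
    size_t k = 0;
    assert(n + 2 <= cap);
    out[k++] = "MULTI";
    for (size_t i = 0; i < n; i++)
        if (ev[i].source == VIA_CALL) out[k++] = ev[i].cmd;
    for (size_t i = 0; i < n; i++)
        if (ev[i].source == VIA_REPLICATE) out[k++] = ev[i].cmd;
    out[k++] = "EXEC";
    return k;
}
```

For example, if a module calls Replicate("SET a 1"), then Call("!", "INCRBY b 10"), then Replicate("SET c 2"), the model yields MULTI, INCRBY b 10, SET a 1, SET c 2, EXEC.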

Automatic memory management

Normally when writing programs in the C language, programmers need to manage memory manually. This is why the Redis modules API has functions to release strings, close open keys, free replies, and so forth.

However given that commands are executed in a contained environment and with a set of strict APIs, Redis is able to provide automatic memory management to modules, at the cost of some performance (most of the time, a very low cost).

When automatic memory management is enabled:

  1. You don't need to close open keys.
  2. You don't need to free replies.
  3. You don't need to free RedisModuleString objects.

However you can still do it, if you want. For example, automatic memory management may be active, but inside a loop allocating a lot of strings, you may still want to free strings no longer used.

In order to enable automatic memory management, just call the following function at the start of the command implementation:

RedisModule_AutoMemory(ctx);

Automatic memory management is usually the way to go; however, experienced C programmers may choose not to use it in order to gain some speed and memory usage benefits.

Allocating memory into modules

Normal C programs use malloc() and free() in order to allocate and release memory dynamically. While the use of malloc is not technically forbidden in Redis modules, it is a lot better to use the Redis Modules specific functions, which are exact replacements for malloc, free, realloc, calloc and strdup. These functions are:

void *RedisModule_Alloc(size_t bytes);
void* RedisModule_Realloc(void *ptr, size_t bytes);
void RedisModule_Free(void *ptr);
void *RedisModule_Calloc(size_t nmemb, size_t size);
char *RedisModule_Strdup(const char *str);

They work exactly like their libc equivalents; however, they use the same allocator Redis uses, and the memory allocated using these functions is reported by the INFO command in the memory section, is accounted for when enforcing the maxmemory policy, and in general is a first-class citizen of the Redis executable. In contrast, memory allocated inside modules with libc malloc() is invisible to Redis.

Another reason to use the modules functions in order to allocate memory is that, when creating native data types inside modules, the RDB loading functions can return deserialized strings (from the RDB file) directly as RedisModule_Alloc() allocations, so they can be used directly to populate data structures after loading, instead of having to copy them to the data structure.
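The accounting difference between the two kinds of allocation can be illustrated with a counting wrapper. This is a hypothetical sketch that assumes a simple size header in front of each block: tracked_alloc(), tracked_free() and tracked_used_memory() are illustrative names, and the real Redis allocator keeps its statistics differently. Memory obtained through the wrapper is counted, while memory from plain malloc() never is, which is why libc allocations do not show up in the server's memory accounting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration, not the Redis allocator. Each block carries a
 * header recording its size; the extra union members only force a generous
 * alignment so the payload after the header is usable for common types. */
typedef union {
    size_t size;
    long double ld;
    long long ll;
    void *p;
} AllocHeader;

static size_t tracked_bytes = 0;

void *tracked_alloc(size_t bytes) {
    AllocHeader *h = malloc(sizeof(AllocHeader) + bytes);
    if (h == NULL) return NULL;
    h->size = bytes;
    tracked_bytes += bytes;     /* visible to the "server" accounting */
    return h + 1;               /* payload starts right after the header */
}

void tracked_free(void *ptr) {
    if (ptr == NULL) return;
    AllocHeader *h = (AllocHeader *)ptr - 1;
    assert(tracked_bytes >= h->size);
    tracked_bytes -= h->size;
    free(h);
}

size_t tracked_used_memory(void) { return tracked_bytes; }
```

Passing a plain malloc() pointer to tracked_free() would read a header that does not exist, which mirrors why RedisModule_Free() must never be used on memory obtained with malloc().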

Pool allocator

Sometimes command implementations need to perform many small allocations that will not be retained at the end of the command execution, but are only needed to execute the command itself.

This work can be more easily accomplished using the Redis pool allocator:

void *RedisModule_PoolAlloc(RedisModuleCtx *ctx, size_t bytes);

It works similarly to malloc(), and returns memory aligned to the next power of two greater than or equal to bytes (up to a maximum alignment of 8 bytes). However, it allocates memory in blocks, so the overhead of the allocations is small and, more importantly, the memory allocated is automatically released when the command returns.

So, in general, short-lived allocations are good candidates for the pool allocator.
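A pool allocator of this kind is essentially an arena. The sketch below is a simplified, hypothetical arena in plain C that follows the behavior documented for RedisModule_PoolAlloc(): allocation out of larger blocks, the alignment rule (word alignment for requests of at least a word, otherwise the next power of two), NULL for zero bytes, and release of everything at once. Pool, pool_alloc(), pool_alignment() and pool_release() are illustrative names, the block size is arbitrary, and the word size is assumed to be sizeof(void *).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical arena, not the Redis implementation. */
#define POOL_BLOCK_SIZE 4096

typedef struct PoolBlock {
    struct PoolBlock *next;
    unsigned char *data;  /* from malloc(), so suitably aligned for any type */
    size_t size;
    size_t used;
} PoolBlock;

typedef struct { PoolBlock *head; } Pool;

/* Word alignment for requests of at least a word, otherwise the next
 * power of two >= bytes (3 -> 4, 2 -> 2). */
size_t pool_alignment(size_t bytes) {
    size_t align = 1;
    while (align < bytes && align < sizeof(void *)) align <<= 1;
    return align;
}

void *pool_alloc(Pool *p, size_t bytes) {
    if (bytes == 0) return NULL;
    size_t align = pool_alignment(bytes);
    PoolBlock *b = p->head;
    if (b != NULL) {
        /* Round the next free address up to the required alignment. */
        uintptr_t base = (uintptr_t)b->data;
        uintptr_t at = (base + b->used + align - 1) & ~(uintptr_t)(align - 1);
        size_t off = (size_t)(at - base);
        if (off <= b->size && bytes <= b->size - off) {
            b->used = off + bytes;
            return b->data + off;
        }
    }
    /* Current block is full (or absent): start a new one. */
    size_t size = bytes > POOL_BLOCK_SIZE ? bytes : POOL_BLOCK_SIZE;
    b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc(size);
    if (b->data == NULL) { free(b); return NULL; }
    b->size = size;
    b->used = bytes;
    b->next = p->head;
    p->head = b;
    return b->data;
}

/* Called once when the command returns: every block is freed at once. */
void pool_release(Pool *p) {
    while (p->head != NULL) {
        PoolBlock *next = p->head->next;
        free(p->head->data);
        free(p->head);
        p->head = next;
    }
}
```

Because individual allocations are never freed, each one costs only a pointer bump plus alignment padding, which is the trade-off that makes short-lived allocations a good fit.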

Writing commands compatible with Redis Cluster

Documentation missing, please check the following functions inside module.c:

RedisModule_IsKeysPositionRequest(ctx);
RedisModule_KeyAtPos(ctx,pos);

10.1 - Modules API reference

Auto-generated reference for the Redis Modules API

Heap allocation raw functions

Memory allocated with these functions is taken into account by Redis key eviction algorithms and is reported in Redis memory usage information.

RedisModule_Alloc

void *RedisModule_Alloc(size_t bytes);

Available since: 4.0.0

Use like malloc(). Memory allocated with this function is reported in Redis INFO memory, used for key eviction according to maxmemory settings and in general is taken into account as memory allocated by Redis. You should avoid using malloc(). This function panics if unable to allocate enough memory.

RedisModule_TryAlloc

void *RedisModule_TryAlloc(size_t bytes);

Available since: 7.0.0

Similar to RedisModule_Alloc, but returns NULL in case of allocation failure, instead of panicking.

RedisModule_Calloc

void *RedisModule_Calloc(size_t nmemb, size_t size);

Available since: 4.0.0

Use like calloc(). Memory allocated with this function is reported in Redis INFO memory, used for key eviction according to maxmemory settings and in general is taken into account as memory allocated by Redis. You should avoid using calloc() directly.

RedisModule_Realloc

void* RedisModule_Realloc(void *ptr, size_t bytes);

Available since: 4.0.0

Use like realloc() for memory obtained with RedisModule_Alloc().

RedisModule_Free

void RedisModule_Free(void *ptr);

Available since: 4.0.0

Use like free() for memory obtained by RedisModule_Alloc() and RedisModule_Realloc(). However, you should never use RedisModule_Free() to free memory allocated with malloc() inside your module.

RedisModule_Strdup

char *RedisModule_Strdup(const char *str);

Available since: 4.0.0

Like strdup() but returns memory allocated with RedisModule_Alloc().

RedisModule_PoolAlloc

void *RedisModule_PoolAlloc(RedisModuleCtx *ctx, size_t bytes);

Available since: 4.0.0

Return heap allocated memory that will be freed automatically when the module callback function returns. Mostly suitable for small allocations that are short-lived and must be released when the callback returns anyway. The returned memory is aligned to the architecture word size if at least word size bytes are requested; otherwise it is just aligned to the next power of two, so, for example, a 3-byte request is 4-byte aligned while a 2-byte request is 2-byte aligned.

There is no realloc-style function, since if realloc is needed, using the pool allocator is not a good idea.

The function returns NULL if bytes is 0.

Commands API

These functions are used to implement custom Redis commands.

For examples, see https://redis.io/topics/modules-intro.

RedisModule_IsKeysPositionRequest

int RedisModule_IsKeysPositionRequest(RedisModuleCtx *ctx);

Available since: 4.0.0

Return non-zero if a module command, that was declared with the flag "getkeys-api", is called in a special way to get the keys positions and not to get executed. Otherwise zero is returned.

RedisModule_KeyAtPosWithFlags

void RedisModule_KeyAtPosWithFlags(RedisModuleCtx *ctx, int pos, int flags);

Available since: 7.0.0

When a module command is called in order to obtain the position of keys, since it was flagged as "getkeys-api" during the registration, the command implementation checks for this special call using the RedisModule_IsKeysPositionRequest() API and uses this function in order to report keys.

The supported flags are the ones used by RedisModule_SetCommandInfo, see REDISMODULE_CMD_KEY_*.

The following is an example of how it could be used:

if (RedisModule_IsKeysPositionRequest(ctx)) {
    RedisModule_KeyAtPosWithFlags(ctx, 2, REDISMODULE_CMD_KEY_RO | REDISMODULE_CMD_KEY_ACCESS);
    RedisModule_KeyAtPosWithFlags(ctx, 1, REDISMODULE_CMD_KEY_RW | REDISMODULE_CMD_KEY_UPDATE | REDISMODULE_CMD_KEY_ACCESS);
}

Note: in the example above, the get keys API could have been handled by key-specs (preferred). Implementing the getkeys-api is required only when it is not possible to declare key-specs that cover all keys.

RedisModule_KeyAtPos

void RedisModule_KeyAtPos(RedisModuleCtx *ctx, int pos);

Available since: 4.0.0

This API existed before RedisModule_KeyAtPosWithFlags was added. It is now deprecated and can be used for compatibility with older versions, before key-specs and flags were introduced.

RedisModule_IsChannelsPositionRequest

int RedisModule_IsChannelsPositionRequest(RedisModuleCtx *ctx);

Available since: 7.0.0

Return non-zero if a module command, that was declared with the flag "getchannels-api", is called in a special way to get the channel positions and not to get executed. Otherwise zero is returned.

RedisModule_ChannelAtPosWithFlags

void RedisModule_ChannelAtPosWithFlags(RedisModuleCtx *ctx,
                                       int pos,
                                       int flags);

Available since: 7.0.0

When a module command is called in order to obtain the position of channels, since it was flagged as "getchannels-api" during the registration, the command implementation checks for this special call using the RedisModule_IsChannelsPositionRequest() API and uses this function in order to report the channels.

The supported flags are:

  • REDISMODULE_CMD_CHANNEL_SUBSCRIBE: This command will subscribe to the channel.
  • REDISMODULE_CMD_CHANNEL_UNSUBSCRIBE: This command will unsubscribe from this channel.
  • REDISMODULE_CMD_CHANNEL_PUBLISH: This command will publish to this channel.
  • REDISMODULE_CMD_CHANNEL_PATTERN: Instead of acting on a specific channel, will act on any channel specified by the pattern. This is the same access used by the PSUBSCRIBE and PUNSUBSCRIBE commands available in Redis. Not intended to be used with PUBLISH permissions.

The following is an example of how it could be used:

if (RedisModule_IsChannelsPositionRequest(ctx)) {
    RedisModule_ChannelAtPosWithFlags(ctx, 1, REDISMODULE_CMD_CHANNEL_SUBSCRIBE | REDISMODULE_CMD_CHANNEL_PATTERN);
    RedisModule_ChannelAtPosWithFlags(ctx, 1, REDISMODULE_CMD_CHANNEL_PUBLISH);
}

Note: One usage of declaring channels is for evaluating ACL permissions. In this context, unsubscribing is always allowed, so commands will only be checked against subscribe and publish permissions. This is preferred over using RedisModule_ACLCheckChannelPermissions, since it allows the ACLs to be checked before the command is executed.

RedisModule_CreateCommand

int RedisModule_CreateCommand(RedisModuleCtx *ctx,
                              const char *name,
                              RedisModuleCmdFunc cmdfunc,
                              const char *strflags,
                              int firstkey,
                              int lastkey,
                              int keystep);

Available since: 4.0.0

Register a new command in the Redis server, that will be handled by calling the function pointer 'cmdfunc' using the RedisModule calling convention. The function returns REDISMODULE_ERR if the specified command name is already busy or a set of invalid flags were passed, otherwise REDISMODULE_OK is returned and the new command is registered.

This function must be called during the initialization of the module inside the RedisModule_OnLoad() function. Calling this function outside of the initialization function is not defined.

The command function type is the following:

 int MyCommand_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv, int argc);

And is supposed to always return REDISMODULE_OK.

The set of flags 'strflags' specify the behavior of the command, and should be passed as a C string composed of space separated words, like for example "write deny-oom". The set of flags are:

  • "write": The command may modify the data set (it may also read from it).
  • "readonly": The command returns data from keys but never writes.
  • "admin": The command is an administrative command (may change replication or perform similar tasks).
  • "deny-oom": The command may use additional memory and should be denied during out of memory conditions.
  • "deny-script": Don't allow this command in Lua scripts.
  • "allow-loading": Allow this command while the server is loading data. Only commands not interacting with the data set should be allowed to run in this mode. If not sure don't use this flag.
  • "pubsub": The command publishes things on Pub/Sub channels.
  • "random": The command may have different outputs even starting from the same input arguments and key values. Starting from Redis 7.0 this flag has been deprecated. Declaring a command as "random" can be done using command tips, see https://redis.io/topics/command-tips.
  • "allow-stale": The command is allowed to run on slaves that don't serve stale data. Don't use if you don't know what this means.
  • "no-monitor": Don't propagate the command on monitor. Use this if the command has sensitive data among the arguments.
  • "no-slowlog": Don't log this command in the slowlog. Use this if the command has sensitive data among the arguments.
  • "fast": The command time complexity is not greater than O(log(N)) where N is the size of the collection or anything else representing the normal scalability issue with the command.
  • "getkeys-api": The command implements the interface to return the arguments that are keys. Used when start/stop/step is not enough because of the command syntax.
  • "no-cluster": The command should not register in Redis Cluster since it is not designed to work with it because, for example, it is unable to report the position of the keys, programmatically creates key names, or any other reason.
  • "no-auth": This command can be run by an un-authenticated client. Normally this is used by a command that is used to authenticate a client.
  • "may-replicate": This command may generate replication traffic, even though it's not a write command.
  • "no-mandatory-keys": All the keys this command may take are optional.
  • "blocking": The command has the potential to block the client.
  • "allow-busy": Permit the command while the server is blocked either by a script or by a slow module command, see RM_Yield.
  • "getchannels-api": The command implements the interface to return the arguments that are channels.
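Since strflags is just a space-separated list of words, validating it amounts to tokenizing the string and looking each word up in a table. The following is a hypothetical sketch: the F_* bit values and parse_command_flags() are invented for illustration and are not Redis's internal constants, and only a few of the flags listed above are included.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit values for illustration only; Redis uses its own
 * internal constants. Only a handful of the documented flags are listed. */
enum {
    F_WRITE       = 1 << 0,
    F_READONLY    = 1 << 1,
    F_DENY_OOM    = 1 << 2,
    F_FAST        = 1 << 3,
    F_GETKEYS_API = 1 << 4
};

static const struct {
    const char *name;
    int bit;
} flag_table[] = {
    {"write", F_WRITE},       {"readonly", F_READONLY},
    {"deny-oom", F_DENY_OOM}, {"fast", F_FAST},
    {"getkeys-api", F_GETKEYS_API},
};

/* Returns the OR of the matched bits, or -1 if any word is unknown
 * (CreateCommand reports invalid flags by returning REDISMODULE_ERR). */
int parse_command_flags(const char *s) {
    int bits = 0;
    assert(s != NULL);
    while (*s != '\0') {
        if (*s == ' ') { s++; continue; }        /* skip separators */
        size_t len = strcspn(s, " ");            /* length of this word */
        size_t i, n = sizeof flag_table / sizeof flag_table[0];
        for (i = 0; i < n; i++) {
            if (strlen(flag_table[i].name) == len &&
                strncmp(flag_table[i].name, s, len) == 0)
                break;
        }
        if (i == n) return -1;                   /* unknown flag word */
        bits |= flag_table[i].bit;
        s += len;
    }
    return bits;
}
```

An empty string yields no flags, which matches the "" used for the container command in the RedisModule_CreateSubcommand example.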

The last three parameters specify which arguments of the new command are Redis keys. See https://redis.io/commands/command for more information.

  • firstkey: One-based index of the first argument that's a key. Position 0 is always the command name itself. 0 for commands with no keys.
  • lastkey: One-based index of the last argument that's a key. Negative numbers refer to counting backwards from the last argument (-1 means the last argument provided). 0 for commands with no keys.
  • keystep: Step between first and last key indexes. 0 for commands with no keys.

This information is used by ACL, Cluster and the COMMAND command.

NOTE: The scheme described above serves a limited purpose and can only be used to find keys that exist at constant indices. For non-trivial key arguments, you may pass 0,0,0 and use RedisModule_SetCommandInfo to set key specs using a more advanced scheme.
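To see what the (firstkey, lastkey, keystep) triple expresses, the following hypothetical helper expands it into argv positions for a concrete call. legacy_key_positions() is not part of the Modules API; it is a plain C sketch of the rules in the list above, assuming keystep is positive whenever firstkey is non-zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: expands the legacy key triple for a call with argc
 * arguments (argv[0] is the command name). Writes at most cap positions
 * into out and returns how many were written. */
size_t legacy_key_positions(int firstkey, int lastkey, int keystep,
                            int argc, int *out, size_t cap) {
    if (firstkey == 0) return 0;                 /* command takes no keys */
    assert(keystep > 0);
    /* Negative lastkey counts back from the end: -1 is argv[argc - 1]. */
    int last = lastkey < 0 ? argc + lastkey : lastkey;
    size_t n = 0;
    for (int i = firstkey; i <= last && i < argc && n < cap; i += keystep)
        out[n++] = i;
    return n;
}
```

For an MSET-style command registered with 1, -1, 2, the call MSET k1 v1 k2 v2 (argc = 5) yields positions 1 and 3; anything less regular than this is where key-specs become necessary.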

RedisModule_GetCommand

RedisModuleCommand *RedisModule_GetCommand(RedisModuleCtx *ctx,
                                           const char *name);

Available since: 7.0.0

Get an opaque structure, representing a module command, by command name. This structure is used in some of the command-related APIs.

NULL is returned in case of the following errors:

  • Command not found
  • The command is not a module command
  • The command doesn't belong to the calling module

RedisModule_CreateSubcommand

int RedisModule_CreateSubcommand(RedisModuleCommand *parent,
                                 const char *name,
                                 RedisModuleCmdFunc cmdfunc,
                                 const char *strflags,
                                 int firstkey,
                                 int lastkey,
                                 int keystep);

Available since: 7.0.0

Very similar to RedisModule_CreateCommand except that it is used to create a subcommand, associated with another, container, command.

Example: If a module has a configuration command, MODULE.CONFIG, then GET and SET should be individual subcommands, while MODULE.CONFIG is a command, but should not be registered with a valid funcptr:

 if (RedisModule_CreateCommand(ctx,"module.config",NULL,"",0,0,0) == REDISMODULE_ERR)
     return REDISMODULE_ERR;

 RedisModuleCommand *parent = RedisModule_GetCommand(ctx,"module.config");

 if (RedisModule_CreateSubcommand(parent,"set",cmd_config_set,"",0,0,0) == REDISMODULE_ERR)
     return REDISMODULE_ERR;

 if (RedisModule_CreateSubcommand(parent,"get",cmd_config_get,"",0,0,0) == REDISMODULE_ERR)
     return REDISMODULE_ERR;

Returns REDISMODULE_OK on success and REDISMODULE_ERR in case of the following errors:

  • Error while parsing strflags
  • Command is marked as no-cluster but cluster mode is enabled
  • parent is already a subcommand (we do not allow more than one level of command nesting)
  • parent is a command with an implementation (RedisModuleCmdFunc) (A parent command should be a pure container of subcommands)
  • parent already has a subcommand called name

RedisModule_SetCommandInfo

int RedisModule_SetCommandInfo(RedisModuleCommand *command,
                               const RedisModuleCommandInfo *info);

Available since: 7.0.0

Set additional command information.

Affects the output of COMMAND, COMMAND INFO and COMMAND DOCS, Cluster, ACL and is used to filter commands with the wrong number of arguments before the call reaches the module code.

This function can be called after creating a command using RedisModule_CreateCommand and fetching the command pointer using RedisModule_GetCommand. The information can only be set once for each command and has the following structure:

typedef struct RedisModuleCommandInfo {
    const RedisModuleCommandInfoVersion *version;
    const char *summary;
    const char *complexity;
    const char *since;
    RedisModuleCommandHistoryEntry *history;
    const char *tips;
    int arity;
    RedisModuleCommandKeySpec *key_specs;
    RedisModuleCommandArg *args;
} RedisModuleCommandInfo;

All fields except version are optional. Explanation of the fields:

  • version: This field enables compatibility with different Redis versions. Always set this field to REDISMODULE_COMMAND_INFO_VERSION.

  • summary: A short description of the command (optional).

  • complexity: Complexity description (optional).

  • since: The version where the command was introduced (optional). Note: The version specified should be the module's, not Redis version.

  • history: An array of RedisModuleCommandHistoryEntry (optional), which is a struct with the following fields:

      const char *since;
      const char *changes;
    

    since is a version string and changes is a string describing the changes. The array is terminated by a zeroed entry, i.e. an entry with both strings set to NULL.

  • tips: A string of space-separated tips regarding this command, meant for clients and proxies. See https://redis.io/topics/command-tips.

  • arity: Number of arguments, including the command name itself. A positive number specifies an exact number of arguments and a negative number specifies a minimum number of arguments, so use -N to say >= N. Redis validates a call before passing it to a module, so this can replace an arity check inside the module command implementation. A value of 0 (or an omitted arity field) is equivalent to -2 if the command has subcommands and -1 otherwise.

  • key_specs: An array of RedisModuleCommandKeySpec, terminated by an element memset to zero. This is a scheme that tries to describe the positions of key arguments better than the old RedisModule_CreateCommand arguments firstkey, lastkey, keystep, and is needed if those three are not enough to describe the key positions. There are two steps to retrieve key positions: begin search (BS), which determines the index of the first key, and find keys (FK), which, relative to the output of BS, describes how we can tell which arguments are keys. Additionally, there are key specific flags.

    Key-specs cause the triplet (firstkey, lastkey, keystep) given in RM_CreateCommand to be recomputed, but it is still useful to provide these three parameters in RM_CreateCommand, to better support old Redis versions where RM_SetCommandInfo is not available.

    Note that key-specs don't fully replace the "getkeys-api" (see RM_CreateCommand, RM_IsKeysPositionRequest and RM_KeyAtPosWithFlags) so it may be a good idea to supply both key-specs and implement the getkeys-api.

    A key-spec has the following structure:

      typedef struct RedisModuleCommandKeySpec {
          const char *notes;
          uint64_t flags;
          RedisModuleKeySpecBeginSearchType begin_search_type;
          union {
              struct {
                  int pos;
              } index;
              struct {
                  const char *keyword;
                  int startfrom;
              } keyword;
          } bs;
          RedisModuleKeySpecFindKeysType find_keys_type;
          union {
              struct {
                  int lastkey;
                  int keystep;
                  int limit;
              } range;
              struct {
                  int keynumidx;
                  int firstkey;
                  int keystep;
              } keynum;
          } fk;
      } RedisModuleCommandKeySpec;
    

    Explanation of the fields of RedisModuleCommandKeySpec:

    • notes: Optional notes or clarifications about this key spec.

    • flags: A bitwise or of key-spec flags described below.

    • begin_search_type: This describes how the first key is discovered. The possible values are:

      • REDISMODULE_KSPEC_BS_UNKNOWN: There is no way to tell where the key args start.
      • REDISMODULE_KSPEC_BS_INDEX: Key args start at a constant index.
      • REDISMODULE_KSPEC_BS_KEYWORD: Key args start just after a specific keyword.
    • bs: This is a union in which the index or keyword branch is used depending on the value of the begin_search_type field.

      • bs.index.pos: The index from which we start the search for keys. (REDISMODULE_KSPEC_BS_INDEX only.)

      • bs.keyword.keyword: The keyword (string) that indicates the beginning of key arguments. (REDISMODULE_KSPEC_BS_KEYWORD only.)

      • bs.keyword.startfrom: An index in argv from which to start searching. Can be negative, which means start search from the end, in reverse. Example: -2 means to start in reverse from the penultimate argument. (REDISMODULE_KSPEC_BS_KEYWORD only.)

    • find_keys_type: After the "begin search", this describes which arguments are keys. The strategies are:

      • REDISMODULE_KSPEC_FK_UNKNOWN: There is no way to tell where the key args are located.
      • REDISMODULE_KSPEC_FK_RANGE: Keys end at a specific index (or relative to the last argument).
      • REDISMODULE_KSPEC_FK_KEYNUM: There's an argument that contains the number of key args somewhere before the keys themselves.

      find_keys_type and fk can be omitted if this keyspec describes exactly one key.

    • fk: This is a union in which the range or keynum branch is used depending on the value of the find_keys_type field.

      • fk.range (for REDISMODULE_KSPEC_FK_RANGE): A struct with the following fields:

        • lastkey: Index of the last key relative to the result of the begin search step. Can be negative, in which case it is counted from the end of argv instead: -1 indicates the last argument, -2 the one before the last, and so on.

        • keystep: The distance, in arguments, between consecutive keys: 1 if the keys are adjacent, 2 if each key is followed by one non-key argument, and so on.

        • limit: If lastkey is -1, limit restricts the search to a leading fraction of the remaining arguments: 0 and 1 mean no limit, 2 means the first half of the remaining arguments, 3 means the first third, and so on.

      • fk.keynum (for REDISMODULE_KSPEC_FK_KEYNUM): A struct with the following fields:

        • keynumidx: Index of the argument containing the number of keys to come, relative to the result of the begin search step.

        • firstkey: Index of the first key relative to the result of the begin search step. (Usually it's just after keynumidx, in which case it should be set to keynumidx + 1.)

        • keystep: The distance, in arguments, between consecutive keys: 1 if the keys are adjacent, 2 if each key is followed by one non-key argument, and so on.
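    To make the begin-search and find-keys steps concrete, here is a standalone C sketch that applies them to an argv array. It does not use redismodule.h: the KeySpec struct, its enum values and the helper names are hypothetical simplifications of RedisModuleCommandKeySpec, and the arithmetic for lastkey = -1 combined with limit is our reading of the description above, not Redis source code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical, simplified mirror of RedisModuleCommandKeySpec.
 * These names are illustrative only; they are not the real API types. */
typedef enum { BS_INDEX, BS_KEYWORD } BeginSearch;
typedef enum { FK_RANGE, FK_KEYNUM } FindKeys;

typedef struct {
    BeginSearch bs_type;
    int pos;                             /* bs.index.pos */
    const char *keyword;                 /* bs.keyword.keyword */
    int startfrom;                       /* bs.keyword.startfrom */
    FindKeys fk_type;
    int lastkey, keystep, limit;         /* fk.range */
    int keynumidx, firstkey, kn_step;    /* fk.keynum */
} KeySpec;

/* Case-insensitive string equality (portable stand-in for strcasecmp). */
static int eq_nocase(const char *a, const char *b) {
    while (*a && *b && tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Begin search: argv index of the first key, or -1 if it cannot be found. */
static int begin_search(const KeySpec *s, int argc, const char **argv) {
    if (s->bs_type == BS_INDEX) return s->pos;
    int backward = s->startfrom < 0;             /* negative: search in reverse */
    int i = backward ? argc + s->startfrom : s->startfrom;
    for (; i >= 1 && i < argc; i += backward ? -1 : 1)
        if (eq_nocase(argv[i], s->keyword)) return i + 1;  /* keys follow it */
    return -1;
}

/* Find keys: store key indexes in out[] (at most max); return the count. */
static int find_keys(const KeySpec *s, int argc, const char **argv,
                     int *out, int max) {
    int first = begin_search(s, argc, argv), last, step, n = 0;
    if (first < 0) return 0;
    if (s->fk_type == FK_RANGE) {
        step = s->keystep;
        if (s->lastkey >= 0)
            last = first + s->lastkey;           /* relative to the first key */
        else if (s->limit <= 1)
            last = argc + s->lastkey;            /* -1 is the last argument */
        else                                     /* limit is defined for lastkey == -1 */
            last = first + (argc - first) / s->limit - 1;
    } else {
        step = s->kn_step;
        if (first + s->keynumidx >= argc) return 0;
        int numkeys = atoi(argv[first + s->keynumidx]);
        first += s->firstkey;                    /* also relative to begin search */
        last = first + numkeys - 1;
    }
    assert(step > 0);
    for (int i = first; i <= last && i < argc && n < max; i += step)
        out[n++] = i;
    return n;
}
```

    For example, an MSET-like command could be described with bs.index.pos = 1 and a range of lastkey = -1, keystep = 2, limit = 0, which yields argv indexes 1, 3, 5 and so on; an EVAL-like command could use bs.index.pos = 2 with keynum keynumidx = 0, firstkey = 1, keystep = 1.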

    Key-spec flags:

    The first four refer to what the command actually does with the value or metadata of the key, and not necessarily to the user data or how it is affected. Each key-spec must have exactly one of these. Any operation that is not distinctly a deletion, an overwrite or read-only should be marked as RW.

    • REDISMODULE_CMD_KEY_RO: Read-Only. Reads the value of the key, but doesn't necessarily return it.

    • REDISMODULE_CMD_KEY_RW: Read-Write. Modifies the data stored in the value of the key or its metadata.

    • REDISMODULE_CMD_KEY_OW: Overwrite. Overwrites the data stored in the value of the key.

    • REDISMODULE_CMD_KEY_RM: Deletes the key.

    The next four refer to user data inside the value of the key, not the metadata like LRU, type, cardinality. It refers to the logical operation on the user's data (actual input strings or TTL), being used/returned/copied/changed. It doesn't refer to modification or returning of metadata (like type, count, presence of data). ACCESS can be combined with one of the write operations INSERT, DELETE or UPDATE. Any write that's not an INSERT or a DELETE would be UPDATE.

    • REDISMODULE_CMD_KEY_ACCESS: Returns, copies or uses the user data from the value of the key.

    • REDISMODULE_CMD_KEY_UPDATE: Updates data to the value, new value may depend on the old value.

    • REDISMODULE_CMD_KEY_INSERT: Adds data to the value with no chance of modification or deletion of existing data.

    • REDISMODULE_CMD_KEY_DELETE: Explicitly deletes some content from the value of the key.

    Other flags:

    • REDISMODULE_CMD_KEY_NOT_KEY: The key is not actually a key, but should be routed in cluster mode as if it were a key.

    • REDISMODULE_CMD_KEY_INCOMPLETE: The keyspec might not point out all the keys it should cover.

    • REDISMODULE_CMD_KEY_VARIABLE_FLAGS: Some keys might have different flags depending on arguments.
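    The combination rules above (exactly one of RO, RW, OW or RM, and, in our reading, at most one of INSERT, DELETE or UPDATE alongside an optional ACCESS) can be checked with a few bit operations. This is only a sketch: the bit values below are hypothetical placeholders, a real module must use the REDISMODULE_CMD_KEY_* macros from redismodule.h, and Redis's own validation may check more or less than this.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values, for illustration only. */
enum {
    KEY_RO = 1 << 0, KEY_RW = 1 << 1, KEY_OW = 1 << 2, KEY_RM = 1 << 3,
    KEY_ACCESS = 1 << 4, KEY_UPDATE = 1 << 5, KEY_INSERT = 1 << 6,
    KEY_DELETE = 1 << 7
};

/* Number of set bits in x. */
static int bit_count(uint64_t x) {
    int n = 0;
    for (; x; x &= x - 1) n++;
    return n;
}

/* Return 1 if flags satisfy the combination rules, 0 otherwise. */
static int keyspec_flags_valid(uint64_t flags) {
    uint64_t kind = flags & (KEY_RO | KEY_RW | KEY_OW | KEY_RM);
    uint64_t write_op = flags & (KEY_UPDATE | KEY_INSERT | KEY_DELETE);
    return bit_count(kind) == 1 && bit_count(write_op) <= 1;
}
```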

  • args: An array of RedisModuleCommandArg, terminated by an element memset to zero. RedisModuleCommandArg is a structure with the fields described below.

      typedef struct RedisModuleCommandArg {
          const char *name;
          RedisModuleCommandArgType type;
          int key_spec_index;
          const char *token;
          const char *summary;
          const char *since;
          int flags;
          struct RedisModuleCommandArg *subargs;
      } RedisModuleCommandArg;
    

    Explanation of the fields:

    • name: Name of the argument.

    • type: The type of the argument. See below for details. The types REDISMODULE_ARG_TYPE_ONEOF and REDISMODULE_ARG_TYPE_BLOCK require an argument to have sub-arguments, i.e. subargs.

    • key_spec_index: If the type is REDISMODULE_ARG_TYPE_KEY you must provide the index of the key-spec associated with this argument. See key_specs above. If the argument is not a key, you may specify -1.

    • token: The token preceding the argument (optional). Example: the argument seconds in SET has a token EX. If the argument consists of only a token (for example NX in SET) the type should be REDISMODULE_ARG_TYPE_PURE_TOKEN and value should be NULL.

    • summary: A short description of the argument (optional).

    • since: The first version which included this argument (optional).

    • flags: A bitwise or of the macros REDISMODULE_CMD_ARG_*. See below.

    • value: The display-value of the argument. This string is what should be displayed when creating the command syntax from the output of COMMAND. If token is not NULL, it should also be displayed.

    Explanation of RedisModuleCommandArgType:

    • REDISMODULE_ARG_TYPE_STRING: String argument.
    • REDISMODULE_ARG_TYPE_INTEGER: Integer argument.
    • REDISMODULE_ARG_TYPE_DOUBLE: Double-precision float argument.
    • REDISMODULE_ARG_TYPE_KEY: String argument representing a keyname.
    • REDISMODULE_ARG_TYPE_PATTERN: String, but regex pattern.
    • REDISMODULE_ARG_TYPE_UNIX_TIME: Integer, but Unix timestamp.
    • REDISMODULE_ARG_TYPE_PURE_TOKEN: Argument doesn't have a placeholder. It's just a token without a value. Example: the KEEPTTL option of the SET command.
    • REDISMODULE_ARG_TYPE_ONEOF: Used when the user can choose only one of a few sub-arguments. Requires subargs. Example: the NX and XX options of SET.
    • REDISMODULE_ARG_TYPE_BLOCK: Used when one wants to group together several sub-arguments, usually to apply something on all of them, like making the entire group "optional". Requires subargs. Example: the LIMIT offset count parameters in ZRANGE.

    Explanation of the command argument flags:

    • REDISMODULE_CMD_ARG_OPTIONAL: The argument is optional (like GET in the SET command).
    • REDISMODULE_CMD_ARG_MULTIPLE: The argument may repeat itself (like key in DEL).
    • REDISMODULE_CMD_ARG_MULTIPLE_TOKEN: The argument may repeat itself, and so does its token (like GET pattern in SORT).
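    As an illustration of how the token, the display value and the argument flags combine into a command syntax string, the sketch below renders a single, non-nested argument. The bracket and ellipsis notation (as in "key [key ...]") is an assumption borrowed from the usual Redis command documentation style, and the flag values are hypothetical placeholders, not the real REDISMODULE_CMD_ARG_* macros.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical flag bits standing in for the REDISMODULE_CMD_ARG_* macros. */
enum { ARG_OPTIONAL = 1, ARG_MULTIPLE = 2, ARG_MULTIPLE_TOKEN = 4 };

/* Render one non-nested argument into buf (size n). Either token or value
 * may be NULL (a pure token has no value), but not both. */
static void render_arg(char *buf, size_t n, const char *token,
                       const char *value, int flags) {
    char one[128], body[300];
    assert(token != NULL || value != NULL);

    /* A single occurrence: "TOKEN value", "TOKEN" or "value". */
    if (token && value) snprintf(one, sizeof one, "%s %s", token, value);
    else snprintf(one, sizeof one, "%s", token ? token : value);

    if (flags & ARG_MULTIPLE_TOKEN)      /* the token repeats as well */
        snprintf(body, sizeof body, "%s [%s ...]", one, one);
    else if (flags & ARG_MULTIPLE)       /* only the value repeats */
        snprintf(body, sizeof body, "%s [%s ...]", one, value ? value : token);
    else
        snprintf(body, sizeof body, "%s", one);

    if (flags & ARG_OPTIONAL) snprintf(buf, n, "[%s]", body);
    else snprintf(buf, n, "%s", body);
}
```

    With these conventions, the key argument of DEL renders as "key [key ...]", the GET option of SET as "[GET]", and the GET pattern argument of SORT as "[GET pattern [GET pattern ...]]".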

On success REDISMODULE_OK is returned. On error REDISMODULE_ERR is returned and errno is set to EINVAL if invalid info was provided or EEXIST if info has already been set. If the info is invalid, a warning is logged explaining which part of the info is invalid and why.

Module information and time measurement

RedisModule_IsModuleNameBusy

int RedisModule_IsModuleNameBusy(const char *name);

Available since: 4.0.3

Return non-zero if the module name is busy. Otherwise zero is returned.

RedisModule_Milliseconds

long long RedisModule_Milliseconds(void);

Available since: 4.0.0

Return the current UNIX time in milliseconds.

RedisModule_MonotonicMicroseconds

uint64_t RedisModule_MonotonicMicroseconds(void);

Available since: 7.0.0

Return a counter of microseconds relative to an arbitrary point in time. Only the difference between two readings is meaningful.

RedisModule_BlockedClientMeasureTimeStart

int RedisModule_BlockedClientMeasureTimeStart(RedisModuleBlockedClient *bc);

Available since: 6.2.0

Mark a point in time that will be used as the start time to calculate the elapsed execution time when RedisModule_BlockedClientMeasureTimeEnd() is called. Within the same command, you can call RedisModule_BlockedClientMeasureTimeStart() and RedisModule_BlockedClientMeasureTimeEnd() multiple times to accumulate independent time intervals into the background duration. This method always returns REDISMODULE_OK.

RedisModule_BlockedClientMeasureTimeEnd

int RedisModule_BlockedClientMeasureTimeEnd(RedisModuleBlockedClient *bc);

Available since: 6.2.0

Mark a point in time that will be used as the end time to calculate the elapsed execution time. On success REDISMODULE_OK is returned. This method only returns REDISMODULE_ERR if no start time was previously defined (that is, RedisModule_BlockedClientMeasureTimeStart() was not called).
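The start/end accumulation described above can be modeled in a few lines of standalone C. This is a hypothetical model of the documented behavior, not the Redis implementation: POSIX clock_gettime(CLOCK_MONOTONIC) stands in for RedisModule_MonotonicMicroseconds(), the return values 0 and -1 stand in for REDISMODULE_OK and REDISMODULE_ERR, and each end call is assumed to close the interval opened by the preceding start call.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical per-client timer state. */
typedef struct {
    int started;         /* is there an open interval? */
    uint64_t start_us;   /* start of the open interval */
    uint64_t total_us;   /* sum of all closed intervals */
} BgTimer;

/* Microseconds from an arbitrary fixed origin; only differences matter. */
static uint64_t monotonic_us(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000u + (uint64_t)ts.tv_nsec / 1000u;
}

static int measure_start(BgTimer *t) {
    t->start_us = monotonic_us();
    t->started = 1;
    return 0;                      /* always succeeds */
}

static int measure_end(BgTimer *t) {
    if (!t->started) return -1;    /* no start time was defined */
    t->total_us += monotonic_us() - t->start_us;
    t->started = 0;                /* the next interval needs a new start */
    return 0;
}
```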

RedisModule_Yield

void RedisModule_Yield(RedisModuleCtx *ctx, int flags, const char *busy_reply);

Available since: 7.0.0

This API allows a module to let Redis process background tasks, and some commands, during the long blocking execution of a module command. The module can call this API periodically. The flags argument is a bit mask of these:

  • REDISMODULE_YIELD_FLAG_NONE: No special flags, can perform some background operations, but not process client commands.
  • REDISMODULE_YIELD_FLAG_CLIENTS: Redis can also process client commands.

The busy_reply argument is optional, and can be used to control the verbose error string after the -BUSY error code.

When REDISMODULE_YIELD_FLAG_CLIENTS is used, Redis only starts processing client commands after the time defined by the busy-reply-threshold config has elapsed. From then on, Redis rejects most commands with a -BUSY error, but allows the ones marked with the allow-busy flag to be executed. This API can also be used in a thread safe context (while locked) and during loading (in the rdb_load callback, in which case commands are rejected with the -LOADING error).
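One way to call this API periodically is from inside the command's main loop, once every N iterations. The standalone sketch below shows only that loop structure: yield_fn and count_yield are hypothetical stand-ins for a call such as RedisModule_Yield(ctx, REDISMODULE_YIELD_FLAG_CLIENTS, NULL), and the interval passed in is an arbitrary illustrative choice, not a recommended value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for RedisModule_Yield(ctx, flags, busy_reply). */
typedef void (*yield_fn)(void *ctx);

/* Demo hook: counts how many times it was called. */
static void count_yield(void *ctx) { ++*(int *)ctx; }

/* Sum n values, yielding once every `every` iterations so that the
 * server gets a chance to run background work during the loop. */
static long long long_sum(const int *v, size_t n, size_t every,
                          yield_fn yield, void *ctx) {
    assert(every > 0);
    long long sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += v[i];
        if ((i + 1) % every == 0) yield(ctx);
    }
    return sum;
}
```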

RedisModule_SetModuleOptions

void RedisModule_SetModuleOptions(RedisModuleCtx *ctx, int options);

Available since: 6.0.0

Set bit flags defining the module's capabilities or behavior. The available options are:

REDISMODULE_OPTIONS_HANDLE_IO_ERRORS: Generally, modules don't need to bother with this, as the process will just terminate if a read error happens. However, setting this flag allows repl-diskless-load to work if enabled. The module should call RedisModule_IsIOError after reads, before using the data that was read; in case of error, it should propagate the error upwards and be able to release the partially populated value and all its allocations.

REDISMODULE_OPTION_NO_IMPLICIT_SIGNAL_MODIFIED: See RedisModule_SignalModifiedKey().

REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD: Setting this flag indicates that the module is aware of diskless async replication (repl-diskless-load=swapdb), and that Redis could be serving reads during replication instead of blocking with LOADING status.

RedisModule_SignalModifiedKey

int RedisModule_SignalModifiedKey(RedisModuleCtx *ctx,
                                  RedisModuleString *keyname);

Available since: 6.0.0

Signals that the key is modified from user's perspective (i.e. invalidate WATCH and client side caching).

This is done automatically when a key opened for writing is closed, unless the option REDISMODULE_OPTION_NO_IMPLICIT_SIGNAL_MODIFIED has been set using RedisModule_SetModuleOptions().

Automatic memory management for modules

RedisModule_AutoMemory

void RedisModule_AutoMemory(RedisModuleCtx *ctx);

Available since: 4.0.0

Enable automatic memory management.

The function must be called as the first function of a command implementation that wants to use automatic memory.

When enabled, automatic memory management tracks and automatically frees keys, call replies and Redis string objects once the command returns. In most cases this eliminates the need to call the following functions:

  1. RedisModule_CloseKey()
  2. RedisModule_FreeCallReply()
  3. RedisModule_FreeString()

These functions can still be used with automatic memory management enabled, to optimize loops that make numerous allocations for example.

String objects APIs

RedisModule_CreateString

RedisModuleString *RedisModule_CreateString(RedisModuleCtx *ctx,
                                            const char *ptr,
                                            size_t len);

Available since: 4.0.0

Create a new module string object. The returned string must be freed with RedisModule_FreeString(), unless automatic memory is enabled.

The string is created by copying the len bytes starting at ptr. No reference is retained to the passed buffer.

The module context 'ctx' is optional and may be NULL if you want to create a string out of the context scope. However in that case, the automatic memory management will not be available, and the string memory must be managed manually.

RedisModule_CreateStringPrintf

RedisModuleString *RedisModule_CreateStringPrintf(RedisModuleCtx *ctx,
                                                  const char *fmt,
                                                  ...);

Available since: 4.0.0

Create a new module string object from a printf format and arguments. The returned string must be freed with RedisModule_FreeString(), unless automatic memory is enabled.

The string is created using the sds formatter function sdscatvprintf().

The passed context 'ctx' may be NULL if necessary, see the RedisModule_CreateString() documentation for more info.

RedisModule_CreateStringFromLongLong

RedisModuleString *RedisModule_CreateStringFromLongLong(RedisModuleCtx *ctx,
                                                        long long ll);

Available since: 4.0.0

Like RedisModule_CreateString(), but creates a string starting from a long long integer instead of taking a buffer and its length.

The returned string must be released with RedisModule_FreeString() or by enabling automatic memory management.

The passed context 'ctx' may be NULL if necessary, see the RedisModule_CreateString() documentation for more info.

RedisModule_CreateStringFromDouble

RedisModuleString *RedisModule_CreateStringFromDouble(RedisModuleCtx *ctx,
                                                      double d);

Available since: 6.0.0

Like RedisModule_CreateString(), but creates a string starting from a double instead of taking a buffer and its length.

The returned string must be released with RedisModule_FreeString() or by enabling automatic memory management.

RedisModule_CreateStringFromLongDouble

RedisModuleString *RedisModule_CreateStringFromLongDouble(RedisModuleCtx *ctx,
                                                          long double ld,
                                                          int humanfriendly);

Available since: 6.0.0

Like RedisModule_CreateString(), but creates a string starting from a long double.

The returned string must be released with RedisModule_FreeString() or by enabling automatic memory management.

The passed context 'ctx' may be NULL if necessary, see the RedisModule_CreateString() documentation for more info.

RedisModule_CreateStringFromString

RedisModuleString *RedisModule_CreateStringFromString(RedisModuleCtx *ctx,
                                                      const RedisModuleString *str);

Available since: 4.0.0

Like RedisModule_CreateString(), but creates a string starting from another RedisModuleString.

The returned string must be released with RedisModule_FreeString() or by enabling automatic memory management.

The passed context 'ctx' may be NULL if necessary, see the RedisModule_CreateString() documentation for more info.

RedisModule_CreateStringFromStreamID

RedisModuleString *RedisModule_CreateStringFromStreamID(RedisModuleCtx *ctx,
                                                        const RedisModuleStreamID *id);

Available since: 6.2.0

Creates a string from a stream ID. The returned string must be released with RedisModule_FreeString(), unless automatic memory is enabled.

The passed context ctx may be NULL if necessary. See the RedisModule_CreateString() documentation for more info.

RedisModule_FreeString

void RedisModule_FreeString(RedisModuleCtx *ctx, RedisModuleString *str);

Available since: 4.0.0

Free a module string object obtained with one of the Redis modules API calls that return new string objects.

It is possible to call this function even when automatic memory management is enabled. In that case the string will be released ASAP and removed from the pool of strings to release at the end.

If the string was created with a NULL context 'ctx', it is also possible to pass ctx as NULL when releasing the string (but passing a context will not create any issue). Strings created with a context should also be freed by passing the context, so if you want to free a string out of context later, make sure to create it using a NULL context.

RedisModule_RetainString

void RedisModule_RetainString(RedisModuleCtx *ctx, RedisModuleString *str);

Available since: 4.0.0

Every call to this function makes the string 'str' require an additional call to RedisModule_FreeString() in order to really free the string. Note that the automatic freeing performed when automatic memory management is enabled counts as one RedisModule_FreeString() call (it is just executed automatically).

Normally you want to call this function when all of the following conditions are true:

  1. You have automatic memory management enabled.
  2. You want to create string objects.
  3. Those string objects you create need to live after the callback function (for example a command implementation) creating them returns.

Usually you want this in order to store the created string object into your own data structure, for example when implementing a new data type.

Note that when memory management is turned off, you don't need any call to RetainString(), since creating a string always results in a string that lives after the callback function returns, if no FreeString() call is performed.

It is possible to call this function with a NULL context.

When strings are going to be retained for an extended duration, it is good practice to also call RedisModule_TrimStringAllocation() in order to optimize memory usage.

Threaded modules that reference retained strings from other threads must explicitly trim the allocation as soon as the string is retained. Not doing so may result in automatic trimming, which is not thread safe.
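The counting rule can be pictured as a reference count. The toy model below is hypothetical (Str, retain and release are not Redis APIs): a new string starts with one reference, each RedisModule_RetainString() call adds one, and each RedisModule_FreeString() call, including the implicit one made by automatic memory management when the command returns, removes one.

```c
#include <assert.h>

/* Toy model of a module string's lifetime; not Redis code. */
typedef struct {
    int refcount;   /* outstanding references */
    int freed;      /* set once the last reference is dropped */
} Str;

static Str new_str(void) { Str s = {1, 0}; return s; }

/* Models RedisModule_RetainString(): one more FreeString() is needed. */
static void retain(Str *s) {
    assert(!s->freed);
    s->refcount++;
}

/* Models one RedisModule_FreeString() call, explicit or automatic. */
static void release(Str *s) {
    assert(s->refcount > 0);
    if (--s->refcount == 0) s->freed = 1;
}
```

In this model a retained string survives the automatic release at the end of the command and is only freed by a later explicit release, for example when the owning data type value is freed.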

RedisModule_HoldString

RedisModuleString* RedisModule_HoldString(RedisModuleCtx *ctx,
                                          RedisModuleString *str);

Available since: 6.0.7

This function can be used instead of RedisModule_RetainString(). The main difference between the two is that this function will always succeed, whereas RedisModule_RetainString() may fail because of an assertion.

The function returns a pointer to RedisModuleString, which is owned by the caller. It requires a call to RedisModule_FreeString() to free the string when automatic memory management is disabled for the context. When automatic memory management is enabled, you can either call RedisModule_FreeString() or let the automation free it.

This function is more efficient than RedisModule_CreateStringFromString() because whenever possible, it avoids copying the underlying RedisModuleString. The disadvantage of using this function is that it might not be possible to use RedisModule_StringAppendBuffer() on the returned RedisModuleString.

It is possible to call this function with a NULL context.

When strings are going to be held for an extended duration, it is good practice to also call RedisModule_TrimStringAllocation() in order to optimize memory usage.

Threaded modules that reference held strings from other threads must explicitly trim the allocation as soon as the string is held. Not doing so may result in automatic trimming, which is not thread safe.

RedisModule_StringPtrLen

const char *RedisModule_StringPtrLen(const RedisModuleString *str,
                                     size_t *len);

Available since: 4.0.0

Given a string module object, this function returns the string pointer and length of the string. The returned pointer and length should only be used for read only accesses and never modified.

RedisModule_StringToLongLong

int RedisModule_StringToLongLong(const RedisModuleString *str, long long *ll);

Available since: 4.0.0

Convert the string into a long long integer, storing it at *ll. Returns REDISMODULE_OK on success. If the string can't be parsed as a valid, strict long long (no spaces before/after), REDISMODULE_ERR is returned.
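The strictness rule can be approximated in standalone C with strtoll() plus two extra checks, since strtoll() on its own skips leading whitespace and stops at trailing characters. This is a sketch, not the Redis parser: strict_to_ll is a hypothetical helper, it returns 0/-1 in place of REDISMODULE_OK/REDISMODULE_ERR, and it may accept inputs that Redis rejects (for example a leading '+' or leading zeros).

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Parse exactly len bytes as a base-10 long long. Returns 0 on success
 * (storing the value in *out) or -1 on any error. */
static int strict_to_ll(const char *s, size_t len, long long *out) {
    char buf[32];   /* ample for 19 digits plus a sign */
    char *end;
    if (len == 0 || len >= sizeof buf) return -1;
    if (isspace((unsigned char)s[0])) return -1;   /* strtoll would skip it */
    memcpy(buf, s, len);
    buf[len] = '\0';
    errno = 0;
    long long v = strtoll(buf, &end, 10);
    /* Reject overflow, trailing characters and embedded NUL bytes. */
    if (errno == ERANGE || end != buf + len) return -1;
    *out = v;
    return 0;
}
```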

RedisModule_StringToDouble

int RedisModule_StringToDouble(const RedisModuleString *str, double *d);

Available since: 4.0.0

Convert the string into a double, storing it at *d. Returns REDISMODULE_OK on success or REDISMODULE_ERR if the string is not a valid string representation of a double value.

RedisModule_StringToLongDouble

int RedisModule_StringToLongDouble(const RedisModuleString *str,
                                   long double *ld);

Available since: 6.0.0

Convert the string into a long double, storing it at *ld. Returns REDISMODULE_OK on success or REDISMODULE_ERR if the string is not a valid string representation of a double value.

RedisModule_StringToStreamID

int RedisModule_StringToStreamID(const RedisModuleString *str,
                                 RedisModuleStreamID *id);

Available since: 6.2.0

Convert the string into a stream ID, storing it at *id. Returns REDISMODULE_OK on success and returns REDISMODULE_ERR if the string is not a valid string representation of a stream ID. The special IDs "+" and "-" are allowed.

RedisModule_StringCompare

int RedisModule_StringCompare(RedisModuleString *a, RedisModuleString *b);

Available since: 4.0.0

Compare two string objects, returning -1, 0 or 1 respectively if a < b, a == b, a > b. Strings are compared byte by byte as two binary blobs, without regard to encoding or collation.
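A byte-by-byte binary comparison of this kind can be sketched with memcmp(), which compares bytes as unsigned char values and does not stop at NUL bytes. The helper name is hypothetical, and the tie-break when one string is a prefix of the other (the shorter one sorts first) is an assumption of this sketch.

```c
#include <assert.h>
#include <string.h>

/* Compare two binary buffers, returning -1, 0 or 1. Bytes are compared as
 * unsigned values with no case folding; if one buffer is a prefix of the
 * other, the shorter one sorts first. */
static int blob_compare(const char *a, size_t alen, const char *b, size_t blen) {
    size_t minlen = alen < blen ? alen : blen;
    int cmp = memcmp(a, b, minlen);
    if (cmp != 0) return cmp < 0 ? -1 : 1;
    return alen < blen ? -1 : (alen > blen ? 1 : 0);
}
```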

RedisModule_StringAppendBuffer

int RedisModule_StringAppendBuffer(RedisModuleCtx *ctx,
                                   RedisModuleString *str,
                                   const char *buf,
                                   size_t len);

Available since: 4.0.0

Append the specified buffer to the string 'str'. The string must be a string created by the user that is referenced only a single time, otherwise REDISMODULE_ERR is returned and the operation is not performed.

RedisModule_TrimStringAllocation

void RedisModule_TrimStringAllocation(RedisModuleString *str);

Available since: 7.0.0

Trim possible excess memory allocated for a RedisModuleString.

Sometimes a RedisModuleString may have more memory allocated for it than required, typically for argv arguments that were constructed from network buffers. This function optimizes such strings by reallocating their memory, which is useful for strings that are not short lived but retained for an extended duration.

This operation is not thread safe and should only be called when no concurrent access to the string is guaranteed. Using it for an argv string in a module command before the string is potentially available to other threads is generally safe.

Currently, Redis may also automatically trim retained strings when a module command returns. However, trimming explicitly is still the preferred option:

  1. Future versions of Redis may abandon auto-trimming.
  2. Auto-trimming as currently implemented is not thread safe. A background thread manipulating a recently retained string may end up in a race condition with the auto-trim, which could result in data corruption.

Reply APIs

These functions are used for sending replies to the client.

Most functions always return REDISMODULE_OK, so you can use them with 'return' in order to return from the command implementation with:

if (... some condition ...)
    return RedisModule_ReplyWithLongLong(ctx,mycount);

Reply with collection functions

After starting a collection reply, the module must make calls to other ReplyWith* style functions in order to emit the elements of the collection. Collection types include: Array, Map, Set and Attribute.

When producing collections with a number of elements that is not known beforehand, the function can be called with a special flag REDISMODULE_POSTPONED_LEN (REDISMODULE_POSTPONED_ARRAY_LEN in the past), and the actual number of elements can be later set with RedisModule_ReplySet*Length() call (which will set the latest "open" count if there are multiple ones).

RedisModule_WrongArity

int RedisModule_WrongArity(RedisModuleCtx *ctx);

Available since: 4.0.0

Send an error about the number of arguments given to the command, citing the command name in the error message. Returns REDISMODULE_OK.

Example:

if (argc != 3) return RedisModule_WrongArity(ctx);

RedisModule_ReplyWithLongLong

int RedisModule_ReplyWithLongLong(RedisModuleCtx *ctx, long long ll);

Available since: 4.0.0

Send an integer reply to the client, with the specified long long value. The function always returns REDISMODULE_OK.

RedisModule_ReplyWithError

int RedisModule_ReplyWithError(RedisModuleCtx *ctx, const char *err);

Available since: 4.0.0

Reply with the error 'err'.

Note that 'err' must contain the full error message, including the initial error code. The function only provides the initial "-", so the usage is, for example:

RedisModule_ReplyWithError(ctx,"ERR Wrong Type");

and not just:

RedisModule_ReplyWithError(ctx,"Wrong Type");

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithSimpleString

int RedisModule_ReplyWithSimpleString(RedisModuleCtx *ctx, const char *msg);

Available since: 4.0.0

Reply with a simple string (+... \r\n in RESP protocol). These replies are suitable only when sending a small non-binary string with small overhead, like "OK" or similar replies.

The function always returns REDISMODULE_OK.
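On the wire, both an error reply and a simple string reply are a single RESP line: a type byte ('+' for a simple string, '-' for an error), the text, then CRLF. By RESP convention the first word of an error is its error code, which is why the message passed to RedisModule_ReplyWithError() should start with a code such as ERR. The helper below is a hypothetical illustration of the encoding only, and it assumes the text contains no CR or LF characters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Format a one-line RESP reply: '+' for a simple string, '-' for an error.
 * Returns snprintf's result (the length of the full line). */
static int resp_line(char *buf, size_t n, char type, const char *msg) {
    assert(type == '+' || type == '-');
    assert(strpbrk(msg, "\r\n") == NULL);   /* single-line text only */
    return snprintf(buf, n, "%c%s\r\n", type, msg);
}
```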

RedisModule_ReplyWithArray

int RedisModule_ReplyWithArray(RedisModuleCtx *ctx, long len);

Available since: 4.0.0

Reply with an array type of 'len' elements.

After starting an array reply, the module must make len calls to other ReplyWith* style functions in order to emit the elements of the array. See Reply APIs section for more details.

Use RedisModule_ReplySetArrayLength() to set deferred length.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithMap

int RedisModule_ReplyWithMap(RedisModuleCtx *ctx, long len);

Available since: 7.0.0

Reply with a RESP3 Map type of 'len' pairs. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

After starting a map reply, the module must make len*2 calls to other ReplyWith* style functions in order to emit the elements of the map. See Reply APIs section for more details.

If the connected client is using RESP2, the reply will be converted to a flat array.

Use RedisModule_ReplySetMapLength() to set deferred length.

The function always returns REDISMODULE_OK.
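The difference between the two protocols shows up in the aggregate header. Per the RESP3 specification, a map of N pairs starts with %N, while the RESP2 fallback described above is a flat array, which starts with *2N because each pair contributes a key and a value. The hypothetical helper below formats only that header line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Header line for a map of `pairs` entries: a native RESP3 map ('%'),
 * or, for RESP2 clients, a flat array ('*') of 2 * pairs elements. */
static int map_header(char *buf, size_t n, int resp, long pairs) {
    assert(resp == 2 || resp == 3);
    if (resp == 3) return snprintf(buf, n, "%%%ld\r\n", pairs);
    return snprintf(buf, n, "*%ld\r\n", pairs * 2);
}
```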

RedisModule_ReplyWithSet

int RedisModule_ReplyWithSet(RedisModuleCtx *ctx, long len);

Available since: 7.0.0

Reply with a RESP3 Set type of 'len' elements. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

After starting a set reply, the module must make len calls to other ReplyWith* style functions in order to emit the elements of the set. See Reply APIs section for more details.

If the connected client is using RESP2, the reply will be converted to an array type.

Use RedisModule_ReplySetSetLength() to set deferred length.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithAttribute

int RedisModule_ReplyWithAttribute(RedisModuleCtx *ctx, long len);

Available since: 7.0.0

Add attributes (metadata) to the reply. This should be done before adding the actual reply. See https://github.com/antirez/RESP3/blob/master/spec.md#attribute-type

After starting an attributes reply, the module must make len*2 calls to other ReplyWith* style functions in order to emit the elements of the attribute map. See Reply APIs section for more details.

Use RedisModule_ReplySetAttributeLength() to set deferred length.

Attributes are not supported by RESP2, so for RESP2 clients the function returns REDISMODULE_ERR. Otherwise the function always returns REDISMODULE_OK.

RedisModule_ReplyWithNullArray

int RedisModule_ReplyWithNullArray(RedisModuleCtx *ctx);

Available since: 6.0.0

Reply to the client with a null array: a plain null in RESP3, or a null array in RESP2.

Note: In RESP3 there's no difference between Null reply and NullArray reply, so to prevent ambiguity it's better to avoid using this API and use RedisModule_ReplyWithNull instead.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithEmptyArray

int RedisModule_ReplyWithEmptyArray(RedisModuleCtx *ctx);

Available since: 6.0.0

Reply to the client with an empty array.

The function always returns REDISMODULE_OK.

RedisModule_ReplySetArrayLength

void RedisModule_ReplySetArrayLength(RedisModuleCtx *ctx, long len);

Available since: 4.0.0

When RedisModule_ReplyWithArray() is used with the argument REDISMODULE_POSTPONED_LEN, because we don't know beforehand the number of items we are going to output as elements of the array, this function will take care to set the array length.

Since it is possible to have multiple array replies pending with unknown length, this function guarantees to always set the latest array length that was created in a postponed way.

For example in order to output an array like [1,[10,20,30]] we could write:

 RedisModule_ReplyWithArray(ctx,REDISMODULE_POSTPONED_LEN);
 RedisModule_ReplyWithLongLong(ctx,1);
 RedisModule_ReplyWithArray(ctx,REDISMODULE_POSTPONED_LEN);
 RedisModule_ReplyWithLongLong(ctx,10);
 RedisModule_ReplyWithLongLong(ctx,20);
 RedisModule_ReplyWithLongLong(ctx,30);
 RedisModule_ReplySetArrayLength(ctx,3); // Set len of 10,20,30 array.
 RedisModule_ReplySetArrayLength(ctx,2); // Set len of top array

Note that in the above example there is no reason to postpone the array length, since we produce a fixed number of elements. In practice, however, the code may use an iterator or other means of producing the output, so the number of elements is not easy to calculate in advance.
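The "latest postponed length" rule behaves like a stack: each postponed array pushes a placeholder, and each RedisModule_ReplySetArrayLength() call fills in the most recent unfilled one. The standalone model below is hypothetical (Reply, reply_array and the other helpers are not Redis APIs, and POSTPONED merely plays the role of REDISMODULE_POSTPONED_LEN); replaying the [1,[10,20,30]] example through it produces the corresponding RESP bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_CHUNKS 64
#define CHUNK_SIZE 32
#define POSTPONED (-1L)   /* plays the role of REDISMODULE_POSTPONED_LEN */

/* Hypothetical reply buffer: RESP chunks plus a stack of placeholders
 * whose array length is still unknown. */
typedef struct {
    char chunk[MAX_CHUNKS][CHUNK_SIZE];
    int nchunks;
    int pending[MAX_CHUNKS];   /* chunk indexes awaiting a length */
    int npending;
} Reply;

static void reply_array(Reply *r, long len) {
    assert(r->nchunks < MAX_CHUNKS);
    if (len == POSTPONED) {
        r->pending[r->npending++] = r->nchunks;   /* remember the slot */
        r->chunk[r->nchunks++][0] = '\0';          /* filled in later */
    } else {
        snprintf(r->chunk[r->nchunks++], CHUNK_SIZE, "*%ld\r\n", len);
    }
}

static void reply_long_long(Reply *r, long long v) {
    assert(r->nchunks < MAX_CHUNKS);
    snprintf(r->chunk[r->nchunks++], CHUNK_SIZE, ":%lld\r\n", v);
}

/* Set the length of the most recently opened postponed array. */
static void reply_set_array_length(Reply *r, long len) {
    assert(r->npending > 0);
    int slot = r->pending[--r->npending];
    snprintf(r->chunk[slot], CHUNK_SIZE, "*%ld\r\n", len);
}

/* Concatenate all chunks into out (size n). */
static void reply_flush(const Reply *r, char *out, size_t n) {
    assert(r->npending == 0);   /* every postponed length must be set */
    out[0] = '\0';
    for (int i = 0; i < r->nchunks; i++)
        strncat(out, r->chunk[i], n - strlen(out) - 1);
}
```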

RedisModule_ReplySetMapLength

void RedisModule_ReplySetMapLength(RedisModuleCtx *ctx, long len);

Available since: 7.0.0

Very similar to RedisModule_ReplySetArrayLength, except that len should be exactly half of the number of ReplyWith* functions called in the context of the map. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

RedisModule_ReplySetSetLength

void RedisModule_ReplySetSetLength(RedisModuleCtx *ctx, long len);

Available since: 7.0.0

Very similar to RedisModule_ReplySetArrayLength. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

RedisModule_ReplySetAttributeLength

void RedisModule_ReplySetAttributeLength(RedisModuleCtx *ctx, long len);

Available since: 7.0.0

Very similar to RedisModule_ReplySetMapLength. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

Must not be called if RedisModule_ReplyWithAttribute returned an error.

RedisModule_ReplyWithStringBuffer

int RedisModule_ReplyWithStringBuffer(RedisModuleCtx *ctx,
                                      const char *buf,
                                      size_t len);

Available since: 4.0.0

Reply with a bulk string, taking as input a C buffer pointer and a length.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithCString

int RedisModule_ReplyWithCString(RedisModuleCtx *ctx, const char *buf);

Available since: 5.0.6

Reply with a bulk string, taking as input a C buffer pointer that is assumed to be null-terminated.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithString

int RedisModule_ReplyWithString(RedisModuleCtx *ctx, RedisModuleString *str);

Available since: 4.0.0

Reply with a bulk string, taking as input a RedisModuleString object.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithEmptyString

int RedisModule_ReplyWithEmptyString(RedisModuleCtx *ctx);

Available since: 6.0.0

Reply with an empty string.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithVerbatimStringType

int RedisModule_ReplyWithVerbatimStringType(RedisModuleCtx *ctx,
                                            const char *buf,
                                            size_t len,
                                            const char *ext);

Available since: 7.0.0

Reply with a binary-safe string that should not be escaped or filtered, taking as input a C buffer pointer, a length, and a 3-character type/extension.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithVerbatimString

int RedisModule_ReplyWithVerbatimString(RedisModuleCtx *ctx,
                                        const char *buf,
                                        size_t len);

Available since: 6.0.0

Reply with a binary-safe string that should not be escaped or filtered, taking as input a C buffer pointer and a length.

The function always returns REDISMODULE_OK.
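For reference, the RESP3 specification linked above frames a verbatim string as "=", the payload length, CRLF, then the 3-character type, a colon, the data, and a final CRLF; the length counts the type and the colon as well as the data. The sketch below models only that wire framing to make the 3-character type/extension concrete. It is not Redis's implementation, and resp3_verbatim is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: frame `len` bytes of `buf` as a RESP3 verbatim
 * string tagged with a 3-character type such as "txt" or "mkd".
 * Writes a NUL-terminated frame into `out` (capacity `cap`) and returns
 * `out`, or NULL if the frame does not fit. */
char *resp3_verbatim(char *out, size_t cap, const char *ext,
                     const char *buf, size_t len) {
    assert(ext != NULL && strlen(ext) == 3);   /* type must be 3 chars */
    /* Header: '=' + payload length (type + ':' + data) + CRLF + "ext:". */
    int hdr = snprintf(out, cap, "=%zu\r\n%s:", len + 4, ext);
    if (hdr < 0 || (size_t)hdr + len + 3 > cap)  /* room for data, CRLF, NUL */
        return NULL;
    memcpy(out + hdr, buf, len);               /* binary-safe payload copy */
    memcpy(out + hdr + len, "\r\n", 3);        /* CRLF plus terminating NUL */
    return out;
}
```

For example, framing the 5-byte payload hello with type txt yields =9\r\ntxt:hello\r\n, where 9 = 3 + 1 + 5.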

RedisModule_ReplyWithNull

int RedisModule_ReplyWithNull(RedisModuleCtx *ctx);

Available since: 4.0.0

Reply to the client with a NULL.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithBool

int RedisModule_ReplyWithBool(RedisModuleCtx *ctx, int b);

Available since: 7.0.0

Reply with a RESP3 Boolean type. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

In RESP3, this is a boolean type. In RESP2, it is sent as 1 for true and 0 for false.

The function always returns REDISMODULE_OK.

RedisModule_ReplyWithCallReply

int RedisModule_ReplyWithCallReply(RedisModuleCtx *ctx,
                                   RedisModuleCallReply *reply);

Available since: 4.0.0

Reply with exactly what a Redis command returned via RedisModule_Call(). This function is useful when we use RedisModule_Call() to execute a command and want to reply to the client with exactly the same reply the command produced.

Return:

  • REDISMODULE_OK on success.
  • REDISMODULE_ERR if the given reply is in RESP3 format but the client expects RESP2. In case of an error, it is the module writer's responsibility to translate the reply to RESP2 (or handle it differently by returning an error). For convenience, module writers can add the 0 specifier to the fmt argument of RedisModule_Call() so that the RedisModuleCallReply is returned in the same protocol (RESP2 or RESP3) as set in the current client's context.

RedisModule_ReplyWithDouble

int RedisModule_ReplyWithDouble(RedisModuleCtx *ctx, double d);

Available since: 4.0.0

Reply with a RESP3 Double type. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

Send a string reply obtained by converting the double 'd' into a bulk string. This function is basically equivalent to converting the double into a string in a C buffer, and then calling RedisModule_ReplyWithStringBuffer() with the buffer and length.

In RESP3 the string is tagged as a double, while in RESP2 it's just a plain string that the user will have to parse.

The function always returns REDISMODULE_OK.
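The conversion described above can be sketched in plain C. This is a model for illustration, not the code Redis runs: it assumes %.17g formatting (enough significant digits to round-trip a double), which may not match the exact digits a given Redis version emits, and reply_double is a hypothetical name. The RESP3 framing (a comma prefix) comes from the RESP3 specification.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper modelling the description above: convert `d` to text
 * in a C buffer (assuming "%.17g"), then frame it as a RESP3 double
 * (",<text>\r\n") or, for a RESP2 client, as a plain bulk string
 * ("$<len>\r\n<text>\r\n").
 * Returns `out`, or NULL if the frame does not fit in `cap` bytes. */
char *reply_double(char *out, size_t cap, double d, int resp3) {
    char num[32];
    int n = snprintf(num, sizeof num, "%.17g", d);
    if (n < 0 || (size_t)n >= sizeof num)
        return NULL;
    int w = resp3 ? snprintf(out, cap, ",%s\r\n", num)
                  : snprintf(out, cap, "$%d\r\n%s\r\n", n, num);
    return (w < 0 || (size_t)w >= cap) ? NULL : out;
}
```

With this model, 1.5 becomes ,1.5\r\n for a RESP3 client and $3\r\n1.5\r\n for a RESP2 client, which the RESP2 client then has to parse as a number itself.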

RedisModule_ReplyWithBigNumber

int RedisModule_ReplyWithBigNumber(RedisModuleCtx *ctx,
                                   const char *bignum,
                                   size_t len);

Available since: 7.0.0

Reply with a RESP3 BigNumber type. Visit https://github.com/antirez/RESP3/blob/master/spec.md for more info about RESP3.

In RESP3, this is a string of length len that is tagged as a BigNumber, however, it's up to the caller to ensure that it's a valid BigNumber. In RESP2, this is just a plain bulk string response.

The function always returns REDISMODULE_OK.
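Because validating the BigNumber is left to the caller, a module may want a check before replying. The sketch below pairs a conservative validator (an optional minus sign followed by decimal digits, which is an assumption about what counts as valid, not Redis's rule) with a model of the two wire forms: the RESP3 "(" prefix from the RESP3 specification, and a RESP2 bulk string. Both function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Conservative check (assumption): optional '-', then one or more
 * decimal digits, with no decimal part. */
int looks_like_bignum(const char *s, size_t len) {
    size_t i = 0;
    if (i < len && s[i] == '-')
        i++;
    if (i == len)
        return 0;                           /* empty or lone '-' */
    for (; i < len; i++)
        if (!isdigit((unsigned char)s[i]))
            return 0;
    return 1;
}

/* Frame `len` bytes as a RESP3 big number "(<digits>\r\n" or, for RESP2,
 * as a bulk string. Returns `out`, or NULL if it does not fit. */
char *reply_bignum(char *out, size_t cap, const char *s, size_t len,
                   int resp3) {
    int w = resp3 ? snprintf(out, cap, "(%.*s\r\n", (int)len, s)
                  : snprintf(out, cap, "$%zu\r\n%.*s\r\n", len, (int)len, s);
    return (w < 0 || (size_t)w >= cap) ? NULL : out;
}
```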

RedisModule_ReplyWithLongDouble

int RedisModule_ReplyWithLongDouble(RedisModuleCtx *ctx, long double ld);

Available since: 6.0.0

Send a string reply obtained by converting the long double 'ld' into a bulk string. This function is basically equivalent to converting the long double into a string in a C buffer, and then calling RedisModule_ReplyWithStringBuffer() with the buffer and length. The double string uses human-readable formatting (see addReplyHumanLongDouble in networking.c).

The function always returns REDISMODULE_OK.

Commands replication API

RedisModule_Replicate

int RedisModule_Replicate(RedisModuleCtx *ctx,
                          const char *cmdname,
                          const char *fmt,
                          ...);

Available since: 4.0.0

Replicate the specified command and arguments to replicas and the AOF, as an effect of executing the calling command implementation.

The replicated commands are always wrapped into the MULTI/EXEC that contains all the commands replicated in a given module command execution. The commands replicated with RedisModule_Call() come first, and the ones replicated with RedisModule_Replicate() all follow, before the EXEC.

Modules should try to use one interface or the other.

This function follows exactly the same interface as RedisModule_Call(), so a set of format specifiers must be passed, followed by arguments matching the provided format specifiers.

Please refer to RedisModule_Call() for more information.

Using the special "A" and "R" modifiers, the caller can exclude either the AOF or the replicas from the propagation of the specified command. Otherwise, by default, the command will be propagated in both channels.

Note about calling this function from a thread safe context:

Normally, when you call this function from the callback implementing a module command, or from any other callback provided by the Redis Module API, Redis accumulates all the calls to this function in the context of the callback and propagates all the commands wrapped in a MULTI/EXEC transaction. However, when calling this function from a thread-safe context that can live for an undefined amount of time and can be locked and unlocked at will, the behavior is different: no MULTI/EXEC wrapper is emitted, and the specified command is inserted into the AOF and replication stream immediately.

Return value

The function returns REDISMODULE_ERR if the format specifiers are invalid or the command name does not belong to a known command, and REDISMODULE_OK otherwise.

RedisModule_ReplicateVerbatim

int RedisModule_ReplicateVerbatim(RedisModuleCtx *ctx);

Available since: 4.0.0

This function will replicate the command exactly as it was invoked by the client. Note that this function will not wrap the command into a MULTI/EXEC stanza, so it should not be mixed with other replication commands.

This form of replication is useful when you want to propagate the command to the replicas and the AOF file exactly as it was called, since the command can simply be re-executed to deterministically re-create the new state starting from the old one.

The function always returns REDISMODULE_OK.

DB and Key APIs – Generic API

RedisModule_GetClientId

unsigned long long RedisModule_GetClientId(RedisModuleCtx *ctx);

Available since: 4.0.0

Return the ID of the current client calling the currently active module command. The returned ID has a few guarantees:

  1. The ID is different for each different client, so if the same client executes a module command multiple times, it can be recognized as having the same ID, otherwise the ID will be different.
  2. The ID increases monotonically. Clients connecting to the server later are guaranteed to get IDs greater than any past ID previously seen.

Valid IDs are from 1 to 2^64 - 1. If 0 is returned, it means there is no way to fetch the ID in the context in which the function was called.

After obtaining the ID, it is possible to check if the command execution is actually happening in the context of AOF loading, using this macro:

 if (RedisModule_IsAOFClient(RedisModule_GetClientId(ctx))) {
     // Handle it differently.
 }

RedisModule_GetClientUserNameById

RedisModuleString *RedisModule_GetClientUserNameById(RedisModuleCtx *ctx,
                                                     uint64_t id);

Available since: 6.2.1

Return the ACL user name used by the client with the specified client ID. The client ID can be obtained with the RedisModule_GetClientId() API. If the client does not exist, NULL is returned and errno is set to ENOENT. If the client isn't using an ACL user, NULL is returned and errno is set to ENOTSUP.

RedisModule_GetClientInfoById

int RedisModule_GetClientInfoById(void *ci, uint64_t id);

Available since: 6.0.0

Return information about the client with the specified ID (that was previously obtained via the RedisModule_GetClientId() API). If the client exists, REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned.

When the client exists and the ci pointer is not NULL but points to a structure of type RedisModuleClientInfo, previously initialized with REDISMODULE_CLIENTINFO_INITIALIZER, the structure is populated with the following fields:

 uint64_t flags;         // REDISMODULE_CLIENTINFO_FLAG_*
 uint64_t id;            // Client ID
 char addr[46];          // IPv4 or IPv6 address.
 uint16_t port;          // TCP port.
 uint16_t db;            // Selected DB.

Note: the client ID is useless in the context of this call, since we already know it; however, the same structure could be used in other contexts where we don't know the client ID, so the same structure is returned.

The flags have the following meanings:

REDISMODULE_CLIENTINFO_FLAG_SSL          Client using SSL connection.
REDISMODULE_CLIENTINFO_FLAG_PUBSUB       Client in Pub/Sub mode.
REDISMODULE_CLIENTINFO_FLAG_BLOCKED      Client blocked in command.
REDISMODULE_CLIENTINFO_FLAG_TRACKING     Client with keys tracking on.
REDISMODULE_CLIENTINFO_FLAG_UNIXSOCKET   Client using unix domain socket.
REDISMODULE_CLIENTINFO_FLAG_MULTI        Client in MULTI state.

Passing NULL is a way to just check whether the client exists when we are not interested in any additional information.

This is the correct usage when we want the client info structure returned:

 RedisModuleClientInfo ci = REDISMODULE_CLIENTINFO_INITIALIZER;
 int retval = RedisModule_GetClientInfoById(&ci,client_id);
 if (retval == REDISMODULE_OK) {
     printf("Address: %s\n", ci.addr);
 }

RedisModule_PublishMessage

int RedisModule_PublishMessage(RedisModuleCtx *ctx,
                               RedisModuleString *channel,
                               RedisModuleString *message);

Available since: 6.0.0

Publish a message to subscribers (see PUBLISH command).

RedisModule_PublishMessageShard

int RedisModule_PublishMessageShard(RedisModuleCtx *ctx,
                                    RedisModuleString *channel,
                                    RedisModuleString *message);

Available since: 7.0.0

Publish a message to shard-subscribers (see SPUBLISH command).

RedisModule_GetSelectedDb

int RedisModule_GetSelectedDb(RedisModuleCtx *ctx);

Available since: 4.0.0

Return the currently selected DB.

RedisModule_GetContextFlags

int RedisModule_GetContextFlags(RedisModuleCtx *ctx);

Available since: 4.0.3

Return the current context's flags. The flags provide information on the current request context (whether the client is a Lua script or in a MULTI) and about the Redis instance in general, i.e. replication and persistence.

It is possible to call this function even with a NULL context; however, in this case the following flags will not be reported:

  • LUA, MULTI, REPLICATED, DIRTY (see below for more info).

Available flags and their meaning:

  • REDISMODULE_CTX_FLAGS_LUA: The command is running in a Lua script

  • REDISMODULE_CTX_FLAGS_MULTI: The command is running inside a transaction

  • REDISMODULE_CTX_FLAGS_REPLICATED: The command was sent over the replication link by the MASTER

  • REDISMODULE_CTX_FLAGS_MASTER: The Redis instance is a master

  • REDISMODULE_CTX_FLAGS_SLAVE: The Redis instance is a slave

  • REDISMODULE_CTX_FLAGS_READONLY: The Redis instance is read-only

  • REDISMODULE_CTX_FLAGS_CLUSTER: The Redis instance is in cluster mode

  • REDISMODULE_CTX_FLAGS_AOF: The Redis instance has AOF enabled

  • REDISMODULE_CTX_FLAGS_RDB: The instance has RDB enabled

  • REDISMODULE_CTX_FLAGS_MAXMEMORY: The instance has Maxmemory set

  • REDISMODULE_CTX_FLAGS_EVICT: Maxmemory is set and has an eviction policy that may delete keys

  • REDISMODULE_CTX_FLAGS_OOM: Redis is out of memory according to the maxmemory setting.

  • REDISMODULE_CTX_FLAGS_OOM_WARNING: Less than 25% of memory remains before reaching the maxmemory level.

  • REDISMODULE_CTX_FLAGS_LOADING: Server is loading RDB/AOF

  • REDISMODULE_CTX_FLAGS_REPLICA_IS_STALE: No active link with the master.

  • REDISMODULE_CTX_FLAGS_REPLICA_IS_CONNECTING: The replica is trying to connect with the master.

  • REDISMODULE_CTX_FLAGS_REPLICA_IS_TRANSFERRING: Master -> Replica RDB transfer is in progress.

  • REDISMODULE_CTX_FLAGS_REPLICA_IS_ONLINE: The replica has an active link with its master. This is the contrary of STALE state.

  • REDISMODULE_CTX_FLAGS_ACTIVE_CHILD: There is currently some background process active (RDB, AOF or module).

  • REDISMODULE_CTX_FLAGS_MULTI_DIRTY: The next EXEC will fail due to dirty CAS (touched keys).

  • REDISMODULE_CTX_FLAGS_IS_CHILD: Redis is currently running inside background child process.

  • REDISMODULE_CTX_FLAGS_RESP3: Indicates that the client attached to this context is using RESP3.

RedisModule_AvoidReplicaTraffic

int RedisModule_AvoidReplicaTraffic();

Available since: 6.0.0

Returns true if a client sent the CLIENT PAUSE command to the server or if Redis Cluster is performing a manual failover, pausing the clients. This is needed when we have a master with replicas and want to wait, without adding further data to the replication channel, for the replicas' replication offset to match the master's. When this happens, it is safe to fail over the master without data loss.

However, modules may generate traffic by calling RedisModule_Call() with the "!" flag, or by calling RedisModule_Replicate(), in a context outside command execution, for instance in timeout callbacks, thread-safe contexts, and so forth. When modules generate too much traffic, it is hard for the master and replica offsets to match, because there is more data to send over the replication channel.

So modules may want to avoid very heavy background work that creates data in the replication channel while this function returns true. This is mostly useful for modules that have background garbage collection tasks, or that perform writes and replicate them periodically in timer callbacks or other periodic callbacks.

RedisModule_SelectDb

int RedisModule_SelectDb(RedisModuleCtx *ctx, int newid);

Available since: 4.0.0

Change the currently selected DB. Returns an error if the id is out of range.

Note that the client will retain the currently selected DB even after the Redis command implemented by the module calling this function returns.

If the module command wants to change something in a different DB and then return to the original one, it should call RedisModule_GetSelectedDb() beforehand so that it can restore the old DB number before returning.

RedisModule_KeyExists

int RedisModule_KeyExists(RedisModuleCtx *ctx, RedisModuleString *keyname);

Available since: 7.0.0

Check if a key exists, without affecting its last access time.

This is equivalent to calling RedisModule_OpenKey with the mode REDISMODULE_READ | REDISMODULE_OPEN_KEY_NOTOUCH, then checking if NULL was returned and, if not, calling RedisModule_CloseKey on the opened key.

RedisModule_OpenKey

RedisModuleKey *RedisModule_OpenKey(RedisModuleCtx *ctx,
                                    RedisModuleString *keyname,
                                    int mode);

Available since: 4.0.0

Return a handle representing a Redis key, so that it is possible to call other APIs with the key handle as argument to perform operations on the key.

The return value is the handle representing the key, that must be closed with RedisModule_CloseKey().

If the key does not exist and WRITE mode is requested, the handle is still returned, since it is possible to perform operations on a key that does not exist yet (it will be created, for example, after a list push operation). If instead the mode is just READ and the key does not exist, NULL is returned. However, it is still safe to call RedisModule_CloseKey() and RedisModule_KeyType() on a NULL value.

RedisModule_CloseKey

void RedisModule_CloseKey(RedisModuleKey *key);

Available since: 4.0.0

Close a key handle.

RedisModule_KeyType

int RedisModule_KeyType(RedisModuleKey *key);

Available since: 4.0.0

Return the type of the key. If the key pointer is NULL then REDISMODULE_KEYTYPE_EMPTY is returned.

RedisModule_ValueLength

size_t RedisModule_ValueLength(RedisModuleKey *key);

Available since: 4.0.0

Return the length of the value associated with the key. For strings this is the length of the string. For all other types it is the number of elements (just counting keys for hashes).

If the key pointer is NULL or the key is empty, zero is returned.

RedisModule_DeleteKey

int RedisModule_DeleteKey(RedisModuleKey *key);

Available since: 4.0.0

If the key is open for writing, remove it, and set up the key to accept new writes as an empty key (that will be created on demand). On success REDISMODULE_OK is returned. If the key is not open for writing REDISMODULE_ERR is returned.

RedisModule_UnlinkKey

int RedisModule_UnlinkKey(RedisModuleKey *key);

Available since: 4.0.7

If the key is open for writing, unlink it (that is, delete it in a non-blocking way, without reclaiming memory immediately) and set up the key to accept new writes as an empty key (that will be created on demand). On success REDISMODULE_OK is returned. If the key is not open for writing REDISMODULE_ERR is returned.

RedisModule_GetExpire

mstime_t RedisModule_GetExpire(RedisModuleKey *key);

Available since: 4.0.0

Return the key expire value, as milliseconds of remaining TTL. If no TTL is associated with the key or if the key is empty, REDISMODULE_NO_EXPIRE is returned.

RedisModule_SetExpire

int RedisModule_SetExpire(RedisModuleKey *key, mstime_t expire);

Available since: 4.0.0

Set a new expire for the key. If the special expire REDISMODULE_NO_EXPIRE is set, the expire is cancelled if there was one (the same as the PERSIST command).

Note that the expire must be provided as a positive integer representing the number of milliseconds of TTL the key should have.

The function returns REDISMODULE_OK on success or REDISMODULE_ERR if the key was not open for writing or is an empty key.

RedisModule_GetAbsExpire

mstime_t RedisModule_GetAbsExpire(RedisModuleKey *key);

Available since: 6.2.2

Return the key expire value, as an absolute Unix timestamp in milliseconds. If no TTL is associated with the key or if the key is empty, REDISMODULE_NO_EXPIRE is returned.

RedisModule_SetAbsExpire

int RedisModule_SetAbsExpire(RedisModuleKey *key, mstime_t expire);

Available since: 6.2.2

Set a new expire for the key. If the special expire REDISMODULE_NO_EXPIRE is set, the expire is cancelled if there was one (the same as the PERSIST command).

Note that the expire must be provided as a positive integer representing the absolute Unix timestamp, in milliseconds, at which the key should expire.

The function returns REDISMODULE_OK on success or REDISMODULE_ERR if the key was not open for writing or is an empty key.
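The relative and absolute forms differ only by the current time: an absolute expire is the current Unix time in milliseconds plus the remaining TTL. The sketch below converts between them under stated assumptions: ms_time and NO_EXPIRE_SENTINEL are local stand-ins for mstime_t and REDISMODULE_NO_EXPIRE (the sentinel's value here is assumed, not copied from redismodule.h), and both helper names are hypothetical.

```c
/* Local stand-ins (assumptions): a 64-bit millisecond count and a
 * "no expire" sentinel playing the role of REDISMODULE_NO_EXPIRE. */
typedef long long ms_time;
#define NO_EXPIRE_SENTINEL (-1LL)

/* Remaining TTL (as from RedisModule_GetExpire) -> absolute expiry as a
 * Unix time in milliseconds (as from RedisModule_GetAbsExpire). */
ms_time ttl_to_abs(ms_time ttl_ms, ms_time now_ms) {
    return ttl_ms == NO_EXPIRE_SENTINEL ? NO_EXPIRE_SENTINEL : now_ms + ttl_ms;
}

/* Absolute expiry -> remaining TTL, clamped at 0 for instants in the past. */
ms_time abs_to_ttl(ms_time abs_ms, ms_time now_ms) {
    if (abs_ms == NO_EXPIRE_SENTINEL)
        return NO_EXPIRE_SENTINEL;
    return abs_ms > now_ms ? abs_ms - now_ms : 0;
}
```

The current time is passed in explicitly to keep the sketch deterministic; inside a module it could come from RedisModule_Milliseconds().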

RedisModule_ResetDataset

void RedisModule_ResetDataset(int restart_aof, int async);

Available since: 6.0.0

Performs an operation similar to FLUSHALL, and optionally starts a new AOF file (if AOF is enabled). If restart_aof is true, you must make sure the command that triggered this call is not propagated to the AOF file. When async is set to true, the db contents are freed by a background thread.

RedisModule_DbSize

unsigned long long RedisModule_DbSize(RedisModuleCtx *ctx);

Available since: 6.0.0

Returns the number of keys in the current db.

RedisModule_RandomKey

RedisModuleString *RedisModule_RandomKey(RedisModuleCtx *ctx);

Available since: 6.0.0

Returns the name of a random key, or NULL if the current db is empty.

RedisModule_GetKeyNameFromOptCtx

const RedisModuleString *RedisModule_GetKeyNameFromOptCtx(RedisModuleKeyOptCtx *ctx);

Available since: 7.0.0

Returns the name of the key currently being processed.

RedisModule_GetToKeyNameFromOptCtx

const RedisModuleString *RedisModule_GetToKeyNameFromOptCtx(RedisModuleKeyOptCtx *ctx);

Available since: 7.0.0

Returns the name of the target key currently being processed.

RedisModule_GetDbIdFromOptCtx

int RedisModule_GetDbIdFromOptCtx(RedisModuleKeyOptCtx *ctx);

Available since: 7.0.0

Returns the dbid currently being processed.

RedisModule_GetToDbIdFromOptCtx

int RedisModule_GetToDbIdFromOptCtx(RedisModuleKeyOptCtx *ctx);

Available since: 7.0.0

Returns the target dbid currently being processed.

Key API for String type

See also RedisModule_ValueLength(), which returns the length of a string.

RedisModule_StringSet

int RedisModule_StringSet(RedisModuleKey *key, RedisModuleString *str);

Available since: 4.0.0

If the key is open for writing, set the specified string 'str' as the value of the key, deleting the old value if any. On success REDISMODULE_OK is returned. If the key is not open for writing or there is an active iterator, REDISMODULE_ERR is returned.

RedisModule_StringDMA

char *RedisModule_StringDMA(RedisModuleKey *key, size_t *len, int mode);

Available since: 4.0.0

Prepare the string value associated with the key for DMA access, and return a pointer and size (by reference) that the user can use to read or modify the string in place, accessing it directly through the pointer.

The 'mode' is composed by bitwise OR-ing the following flags:

REDISMODULE_READ -- Read access
REDISMODULE_WRITE -- Write access

If the DMA is not requested for writing, the pointer returned should only be accessed in a read-only fashion.

On error (wrong type) NULL is returned.

DMA access rules:

  1. No other key-writing function should be called from the moment the pointer is obtained, for as long as we want to use DMA access to read or modify the string.

  2. Each time RedisModule_StringTruncate() is called, to continue with the DMA access, RedisModule_StringDMA() should be called again to re-obtain a new pointer and length.

  3. If the returned pointer is not NULL but the length is zero, no byte can be touched (the string is empty, or the key itself is empty), so RedisModule_StringTruncate() should be called if the string needs to be enlarged, followed by another call to RedisModule_StringDMA() to get the pointer.

RedisModule_StringTruncate

int RedisModule_StringTruncate(RedisModuleKey *key, size_t newlen);

Available since: 4.0.0

If the key is open for writing and is of string type, resize it, padding with zero bytes if the new length is greater than the old one.

After this call, RedisModule_StringDMA() must be called again to continue DMA access with the new pointer.

The function returns REDISMODULE_OK on success, and REDISMODULE_ERR on error, that is, if the key is not open for writing, is not a string, or a resize to more than 512 MB is requested.

If the key is empty, a string key is created with the new string value unless the new length value requested is zero.
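Why DMA rule 2 matters can be shown with a toy model. In the sketch below (model_str, model_truncate and model_dma are hypothetical names, and this is not how Redis stores strings), resizing goes through realloc(), which may move the buffer, so a pointer obtained before the resize must be fetched again afterwards; growth is zero-padded as described for RedisModule_StringTruncate().

```c
#include <stdlib.h>
#include <string.h>

/* Toy string value used only to illustrate the DMA rules. */
typedef struct {
    char *buf;
    size_t len;
} model_str;

/* Like RedisModule_StringTruncate(): resize, zero-padding any growth.
 * realloc() may move the buffer, invalidating earlier pointers.
 * Returns 0 on success, -1 on allocation failure (value left intact). */
int model_truncate(model_str *s, size_t newlen) {
    char *p = realloc(s->buf, newlen ? newlen : 1);  /* avoid realloc(p, 0) */
    if (p == NULL)
        return -1;
    if (newlen > s->len)
        memset(p + s->len, 0, newlen - s->len);      /* zero the new tail */
    s->buf = p;
    s->len = newlen;
    return 0;
}

/* Like RedisModule_StringDMA(): hand out the current pointer and length. */
char *model_dma(model_str *s, size_t *len) {
    *len = s->len;
    return s->buf;
}
```

In the model, as in the rules above, the only safe pattern is: call model_dma, use the pointer, resize, then call model_dma again before touching the bytes.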

Key API for List type

Many of the list functions access elements by index. Since a list is in essence a doubly-linked list, accessing elements by index is generally an O(N) operation. However, if elements are accessed sequentially or with indices close together, the functions are optimized to seek the index from the previous index, rather than seeking from the ends of the list.

This enables iteration to be done efficiently using a simple for loop:

long n = (long)RedisModule_ValueLength(key);
for (long i = 0; i < n; i++) {
    RedisModuleString *elem = RedisModule_ListGet(key, i);
    // Do stuff...
    RedisModule_FreeString(ctx, elem); // Not needed with automatic memory.
}

Note that after modifying a list using RedisModule_ListPop, RedisModule_ListSet or RedisModule_ListInsert, the internal iterator is invalidated so the next operation will require a linear seek.

Modifying a list in any other way, for example using RedisModule_Call(), while a key is open will confuse the internal iterator and may cause trouble if the key is used after such modifications. The key must be reopened in this case.

See also RedisModule_ValueLength(), which returns the length of a list.

RedisModule_ListPush

int RedisModule_ListPush(RedisModuleKey *key,
                         int where,
                         RedisModuleString *ele);

Available since: 4.0.0

Push an element into a list, on head or tail depending on 'where' argument (REDISMODULE_LIST_HEAD or REDISMODULE_LIST_TAIL). If the key refers to an empty key opened for writing, the key is created. On success, REDISMODULE_OK is returned. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if key or ele is NULL.
  • ENOTSUP if the key is of a type other than list.
  • EBADF if the key is not opened for writing.

Note: Before Redis 7.0, errno was not set by this function.

RedisModule_ListPop

RedisModuleString *RedisModule_ListPop(RedisModuleKey *key, int where);

Available since: 4.0.0

Pop an element from the list and return it as a module string object that the user should free with RedisModule_FreeString() or by enabling automatic memory. The where argument specifies whether the element should be popped from the beginning or the end of the list (REDISMODULE_LIST_HEAD or REDISMODULE_LIST_TAIL). On failure, the function returns NULL and sets errno as follows:

  • EINVAL if key is NULL.
  • ENOTSUP if the key is empty or of a type other than list.
  • EBADF if the key is not opened for writing.

Note: Before Redis 7.0, errno was not set by this function.

RedisModule_ListGet

RedisModuleString *RedisModule_ListGet(RedisModuleKey *key, long index);

Available since: 7.0.0

Returns the element at index index in the list stored at key, like the LINDEX command. The element should be freed using RedisModule_FreeString() or using automatic memory management.

The index is zero-based, so 0 means the first element, 1 the second element and so on. Negative indices can be used to designate elements starting at the tail of the list. Here, -1 means the last element, -2 means the penultimate and so forth.

When no value is found at the given key and index, NULL is returned and errno is set as follows:

  • EINVAL if key is NULL.
  • ENOTSUP if the key is not a list.
  • EBADF if the key is not opened for reading.
  • EDOM if the index is not a valid index in the list.
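The indexing rules above (also used by RedisModule_ListSet and RedisModule_ListDelete) amount to a small normalization step. The hypothetical helper below reproduces them on a plain length, returning -1 and setting errno to EDOM where the API would report an invalid index; it is a sketch, not the server's code.

```c
#include <errno.h>

/* Map a possibly negative, LINDEX-style index onto 0..len-1.
 * Returns the position, or -1 with errno = EDOM if out of range. */
long normalize_index(long index, long len) {
    if (index < 0)
        index += len;                 /* -1 -> last, -len -> first */
    if (index < 0 || index >= len) {
        errno = EDOM;
        return -1;
    }
    return index;
}
```

On a 3-element list, -1 maps to position 2 and -3 to position 0, while 3 and -4 are both out of range.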

RedisModule_ListSet

int RedisModule_ListSet(RedisModuleKey *key,
                        long index,
                        RedisModuleString *value);

Available since: 7.0.0

Replaces the element at index index in the list stored at key.

The index is zero-based, so 0 means the first element, 1 the second element and so on. Negative indices can be used to designate elements starting at the tail of the list. Here, -1 means the last element, -2 means the penultimate and so forth.

On success, REDISMODULE_OK is returned. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if key or value is NULL.
  • ENOTSUP if the key is not a list.
  • EBADF if the key is not opened for writing.
  • EDOM if the index is not a valid index in the list.

RedisModule_ListInsert

int RedisModule_ListInsert(RedisModuleKey *key,
                           long index,
                           RedisModuleString *value);

Available since: 7.0.0

Inserts an element at the given index.

The index is zero-based, so 0 means the first element, 1 the second element and so on. Negative indices can be used to designate elements starting at the tail of the list. Here, -1 means the last element, -2 means the penultimate and so forth. The index is the element's index after inserting it.

On success, REDISMODULE_OK is returned. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if key or value is NULL.
  • ENOTSUP if the key is of a type other than list.
  • EBADF if the key is not opened for writing.
  • EDOM if the index is not a valid index in the list.
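Because the index names the element's position after insertion, the valid range on a list of len elements is one wider than for RedisModule_ListGet: from -(len+1) to len, with -1 meaning append at the tail. This range is derived from the description above rather than taken from the Redis source, so treat the hypothetical insert_position helper below as a sketch to check against the server version you target.

```c
#include <errno.h>

/* Translate a ListInsert-style index (position after insertion) into the
 * slot in the pre-insertion list where the new element goes.
 * Returns 0..len, or -1 with errno = EDOM if out of range. */
long insert_position(long index, long len) {
    if (index < 0)
        index += len + 1;             /* list grows by one: -1 -> len */
    if (index < 0 || index > len) {
        errno = EDOM;
        return -1;
    }
    return index;
}
```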

RedisModule_ListDelete

int RedisModule_ListDelete(RedisModuleKey *key, long index);

Available since: 7.0.0

Removes an element at the given index. The index is 0-based. A negative index can also be used, counting from the end of the list.

On success, REDISMODULE_OK is returned. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if key is NULL.
  • ENOTSUP if the key is not a list.
  • EBADF if the key is not opened for writing.
  • EDOM if the index is not a valid index in the list.

Key API for Sorted Set type

See also RedisModule_ValueLength(), which returns the length of a sorted set.

RedisModule_ZsetAdd

int RedisModule_ZsetAdd(RedisModuleKey *key,
                        double score,
                        RedisModuleString *ele,
                        int *flagsptr);

Available since: 4.0.0

Add a new element into a sorted set, with the specified 'score'. If the element already exists, the score is updated.

If the key is an empty key opened for writing, a new sorted set is created.

Additional flags can be passed to the function via a pointer, the flags are both used to receive input and to communicate state when the function returns. 'flagsptr' can be NULL if no special flags are used.

The input flags are:

REDISMODULE_ZADD_XX: Element must already exist. Do nothing otherwise.
REDISMODULE_ZADD_NX: Element must not exist. Do nothing otherwise.
REDISMODULE_ZADD_GT: If element exists, new score must be greater than the current score. 
                     Do nothing otherwise. Can optionally be combined with XX.
REDISMODULE_ZADD_LT: If element exists, new score must be less than the current score.
                     Do nothing otherwise. Can optionally be combined with XX.

The output flags are:

REDISMODULE_ZADD_ADDED: The new element was added to the sorted set.
REDISMODULE_ZADD_UPDATED: The score of the element was updated.
REDISMODULE_ZADD_NOP: No operation was performed because of the XX, NX, GT or LT flags.

On success the function returns REDISMODULE_OK. On the following errors REDISMODULE_ERR is returned:

  • The key was not opened for writing.
  • The key is of the wrong type.
  • 'score' double value is not a number (NaN).
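The input-flag rules can be summarized as a single predicate deciding whether the new score would be written. In this sketch the ZADD_* enum values are local stand-ins chosen for illustration, not the values defined in redismodule.h, zadd_permitted is a hypothetical name, and NaN handling is left out.

```c
#include <stdbool.h>

/* Illustrative stand-ins for the REDISMODULE_ZADD_* input flags. */
enum { ZADD_XX = 1 << 0, ZADD_NX = 1 << 1, ZADD_GT = 1 << 2, ZADD_LT = 1 << 3 };

/* true: the score would be written (element added or updated);
 * false: the call would be a no-op because of the input flags. */
bool zadd_permitted(bool exists, double cur, double score, int flags) {
    if (!exists)
        return !(flags & ZADD_XX);    /* XX: only touch existing elements */
    if (flags & ZADD_NX)
        return false;                 /* NX: only add new elements */
    if ((flags & ZADD_GT) && !(score > cur))
        return false;                 /* GT: new score must be greater */
    if ((flags & ZADD_LT) && !(score < cur))
        return false;                 /* LT: new score must be less */
    return true;
}
```

Note that GT and LT only constrain elements that already exist, so a missing element is still added under GT alone.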

RedisModule_ZsetIncrby

int RedisModule_ZsetIncrby(RedisModuleKey *key,
                           double score,
                           RedisModuleString *ele,
                           int *flagsptr,
                           double *newscore);

Available since: 4.0.0

This function works exactly like RedisModule_ZsetAdd(), but instead of setting a new score, the score of the existing element is incremented, or if the element does not already exist, it is added assuming the old score was zero.

The input and output flags, and the return value, have exactly the same meaning, with the only difference that this function returns REDISMODULE_ERR even when 'score' is a valid double number but adding it to the existing score results in a NaN (not a number) condition.

This function has an additional argument, 'newscore', which, if not NULL, is filled with the element's new score after the increment when no error is returned.
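A concrete case of the NaN rule: +inf is a valid score, but adding -inf to it yields NaN. The hypothetical model below returns -1 where RedisModule_ZsetIncrby() would return REDISMODULE_ERR and 0 for REDISMODULE_OK; it illustrates the rule and is not the server's implementation.

```c
#include <math.h>
#include <stddef.h>

/* Model of the increment rule: reject a NaN increment, and also reject a
 * valid increment whose sum with the current score is NaN. */
int model_incrby(double cur, double incr, double *newscore) {
    if (isnan(incr))
        return -1;                    /* the increment itself is NaN */
    double sum = cur + incr;
    if (isnan(sum))
        return -1;                    /* valid input, NaN result (inf + -inf) */
    if (newscore != NULL)
        *newscore = sum;              /* 'newscore' is optional */
    return 0;
}
```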

RedisModule_ZsetRem

int RedisModule_ZsetRem(RedisModuleKey *key,
                        RedisModuleString *ele,
                        int *deleted);

Available since: 4.0.0

Remove the specified element from the sorted set. The function returns REDISMODULE_OK on success, and REDISMODULE_ERR on one of the following conditions:

  • The key was not opened for writing.
  • The key is of the wrong type.

The return value does NOT indicate whether the element was actually removed (because it existed), only whether the function executed successfully.

To know whether the element was removed, the additional argument 'deleted' must be passed; the function sets the referenced integer to 1 or 0 depending on the outcome of the operation. The 'deleted' argument can be NULL if the caller is not interested in knowing whether the element was actually removed.

Empty keys will be handled correctly by doing nothing.

RedisModule_ZsetScore

int RedisModule_ZsetScore(RedisModuleKey *key,
                          RedisModuleString *ele,
                          double *score);

Available since: 4.0.0

On success, retrieves the double score associated with the sorted set element 'ele' and returns REDISMODULE_OK. Otherwise REDISMODULE_ERR is returned to signal one of the following conditions:

  • There is no such element 'ele' in the sorted set.
  • The key is not a sorted set.
  • The key is an open empty key.

Key API for Sorted Set iterator

RedisModule_ZsetRangeStop

void RedisModule_ZsetRangeStop(RedisModuleKey *key);

Available since: 4.0.0

Stop a sorted set iteration.

RedisModule_ZsetRangeEndReached

int RedisModule_ZsetRangeEndReached(RedisModuleKey *key);

Available since: 4.0.0

Return the "End of range" flag value to signal the end of the iteration.

RedisModule_ZsetFirstInScoreRange

int RedisModule_ZsetFirstInScoreRange(RedisModuleKey *key,
                                      double min,
                                      double max,
                                      int minex,
                                      int maxex);

Available since: 4.0.0

Set up a sorted set iterator seeking the first element in the specified range. Returns REDISMODULE_OK if the iterator was correctly initialized; otherwise REDISMODULE_ERR is returned under the following condition:

  1. The value stored at key is not a sorted set or the key is empty.

The range is specified according to the two double values 'min' and 'max'. Both can be infinite using the following two macros:

  • REDISMODULE_POSITIVE_INFINITE for positive infinite value
  • REDISMODULE_NEGATIVE_INFINITE for negative infinite value

The 'minex' and 'maxex' parameters, if true, make the min and max values, respectively, exclusive (not included) instead of inclusive.
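The inclusive/exclusive bounds can be illustrated with a small standalone C predicate. This is a sketch, not part of the module API: `score_in_range` is a hypothetical helper, and the C99 `INFINITY` macro stands in for REDISMODULE_POSITIVE_INFINITE and REDISMODULE_NEGATIVE_INFINITE.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: is 'score' inside the range [min, max], with each
 * end made exclusive when minex/maxex is non-zero? INFINITY and -INFINITY
 * play the role of REDISMODULE_POSITIVE_INFINITE/NEGATIVE_INFINITE. */
static bool score_in_range(double score, double min, double max,
                           int minex, int maxex) {
    assert(!isnan(score));                      /* NaN scores are invalid */
    bool above_min = minex ? score > min : score >= min;
    bool below_max = maxex ? score < max : score <= max;
    return above_min && below_max;
}
```

For instance, min=5, max=10 with minex=1 and maxex=0 describes the same range as `ZRANGEBYSCORE key (5 10`.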

RedisModule_ZsetLastInScoreRange

int RedisModule_ZsetLastInScoreRange(RedisModuleKey *key,
                                     double min,
                                     double max,
                                     int minex,
                                     int maxex);

Available since: 4.0.0

Exactly like RedisModule_ZsetFirstInScoreRange() but the last element of the range is selected for the start of the iteration instead.

RedisModule_ZsetFirstInLexRange

int RedisModule_ZsetFirstInLexRange(RedisModuleKey *key,
                                    RedisModuleString *min,
                                    RedisModuleString *max);

Available since: 4.0.0

Set up a sorted set iterator seeking the first element in the specified lexicographical range. Returns REDISMODULE_OK if the iterator was correctly initialized; otherwise REDISMODULE_ERR is returned under the following conditions:

  1. The value stored at key is not a sorted set or the key is empty.
  2. The lexicographical range 'min' and 'max' format is invalid.

'min' and 'max' should be provided as two RedisModuleString objects in the same format as the parameters passed to the ZRANGEBYLEX command. The function does not take ownership of the objects, so they can be released as soon as the iterator is set up.
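The ZRANGEBYLEX bound format can be checked with a small standalone C sketch: a bound is either exactly `-` or `+`, or starts with `[` (inclusive) or `(` (exclusive) followed by the member bytes. The helpers `lex_bound_is_valid` and `lex_range_is_valid` are hypothetical and only approximate the validation Redis performs before RedisModule_ZsetFirstInLexRange() returns REDISMODULE_ERR for an invalid format.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical validator for one ZRANGEBYLEX-style bound: "-" and "+" mean
 * negative/positive infinity; otherwise the first byte must be '['
 * (inclusive) or '(' (exclusive), followed by the member bytes. */
static bool lex_bound_is_valid(const char *s, size_t len) {
    assert(s != NULL);
    if (len == 1 && (s[0] == '-' || s[0] == '+')) return true;
    return len >= 1 && (s[0] == '[' || s[0] == '(');
}

/* Both bounds must be valid for the range to be accepted. */
static bool lex_range_is_valid(const char *min, const char *max) {
    return lex_bound_is_valid(min, strlen(min)) &&
           lex_bound_is_valid(max, strlen(max));
}
```

So `"[a"` and `"(c"` select members from "a" (included) up to "c" (excluded), while a bare `"a"` is rejected.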

RedisModule_ZsetLastInLexRange

int RedisModule_ZsetLastInLexRange(RedisModuleKey *key,
                                   RedisModuleString *min,
                                   RedisModuleString *max);

Available since: 4.0.0

Exactly like RedisModule_ZsetFirstInLexRange() but the last element of the range is selected for the start of the iteration instead.

RedisModule_ZsetRangeCurrentElement

RedisModuleString *RedisModule_ZsetRangeCurrentElement(RedisModuleKey *key,
                                                       double *score);

Available since: 4.0.0

Return the current sorted set element of an active sorted set iterator or NULL if the range specified in the iterator does not include any element.

RedisModule_ZsetRangeNext

int RedisModule_ZsetRangeNext(RedisModuleKey *key);

Available since: 4.0.0

Go to the next element of the sorted set iterator. Returns 1 if there was a next element, 0 if we are already at the last element or the range does not include any item at all.

RedisModule_ZsetRangePrev

int RedisModule_ZsetRangePrev(RedisModuleKey *key);

Available since: 4.0.0

Go to the previous element of the sorted set iterator. Returns 1 if there was a previous element, 0 if we are already at the first element or the range does not include any item at all.

Key API for Hash type

See also RedisModule_ValueLength(), which returns the number of fields in a hash.

RedisModule_HashSet

int RedisModule_HashSet(RedisModuleKey *key, int flags, ...);

Available since: 4.0.0

Set the specified hash field to the specified value. If the key is an empty key open for writing, it is created with an empty hash value, in order to set the specified field.

The function is variadic and the user must specify pairs of field names and values, both as RedisModuleString pointers (unless the CFIELDS option is set, see later). At the end of the field/value pairs, NULL must be specified as the last argument to signal the end of the arguments in the variadic function.

Example to set the hash field argv[1] to the value argv[2]:

 RedisModule_HashSet(key,REDISMODULE_HASH_NONE,argv[1],argv[2],NULL);

The function can also be used to delete fields (if they exist) by setting them to the special value REDISMODULE_HASH_DELETE:

 RedisModule_HashSet(key,REDISMODULE_HASH_NONE,argv[1],
                     REDISMODULE_HASH_DELETE,NULL);

The behavior of the command changes with the specified flags, which can be set to REDISMODULE_HASH_NONE if no special behavior is needed.

REDISMODULE_HASH_NX: The operation is performed only if the field did not
                     already exist in the hash.
REDISMODULE_HASH_XX: The operation is performed only if the field already
                     exists, so that a new value can be associated with an
                     existing field, but no new fields are created.
REDISMODULE_HASH_CFIELDS: The field names passed are null terminated C
                          strings instead of RedisModuleString objects.
REDISMODULE_HASH_COUNT_ALL: Include the number of inserted fields in the
                            returned number, in addition to the number of
                            updated and deleted fields. (Added in Redis
                            6.2.)

Unless NX is specified, the command overwrites the old field value with the new one.

When using REDISMODULE_HASH_CFIELDS, field names are passed as normal C strings, so for example to delete the field "foo" the following code can be used:

 RedisModule_HashSet(key,REDISMODULE_HASH_CFIELDS,"foo",
                     REDISMODULE_HASH_DELETE,NULL);

Return value:

The number of fields existing in the hash prior to the call which have been updated (their old value has been replaced by a new value) or deleted. If the flag REDISMODULE_HASH_COUNT_ALL is set, inserted fields not previously existing in the hash are also counted.

If the return value is zero, errno is set (since Redis 6.2) as follows:

  • EINVAL if any unknown flags are set or if key is NULL.
  • ENOTSUP if the key is associated with a non-hash value.
  • EBADF if the key was not opened for writing.
  • ENOENT if no fields were counted as described under Return value above. This is not actually an error. The return value can be zero if all fields were just created and the COUNT_ALL flag was unset, or if changes were held back due to the NX and XX flags.

NOTICE: The return value semantics of this function are very different between Redis 6.2 and older versions. Modules that use it should determine the Redis version and handle it accordingly.
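The counting rules for Redis 6.2 and later can be modeled in a few lines of standalone C. This sketch follows only the description above and does not call the module API; the flag values `H_NX`, `H_XX` and `H_COUNT_ALL` are local stand-ins, not the real REDISMODULE_HASH_* constants, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the NX, XX and COUNT_ALL flags (not the real values). */
enum { H_NX = 1, H_XX = 2, H_COUNT_ALL = 4 };

/* Returns 1 if one field/value pair is counted in the return value. */
static int counted(bool field_exists, bool is_delete, int flags) {
    if ((flags & H_NX) && field_exists) return 0;   /* held back by NX */
    if ((flags & H_XX) && !field_exists) return 0;  /* held back by XX */
    if (field_exists) return 1;                     /* updated or deleted */
    if (is_delete) return 0;                        /* nothing to delete */
    return (flags & H_COUNT_ALL) ? 1 : 0;           /* newly inserted */
}

/* Sum over several pairs; a result of 0 is the case where errno is set to
 * ENOENT (not actually an error). */
static int model_hashset_return(const bool *exists, const bool *deletes,
                                int n, int flags) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; i++) total += counted(exists[i], deletes[i], flags);
    return total;
}
```

With one existing field updated, one new field inserted and one missing field "deleted", the model yields 1 without COUNT_ALL and 2 with it.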

RedisModule_HashGet

int RedisModule_HashGet(RedisModuleKey *key, int flags, ...);

Available since: 4.0.0

Get fields from a hash value. This function is called using a variable number of arguments, alternating a field name (as a RedisModuleString pointer) with a pointer to a RedisModuleString pointer, that is set to the value of the field if the field exists, or NULL if the field does not exist. At the end of the field/value-ptr pairs, NULL must be specified as last argument to signal the end of the arguments in the variadic function.

This is an example usage:

 RedisModuleString *first, *second;
 RedisModule_HashGet(mykey,REDISMODULE_HASH_NONE,argv[1],&first,
                     argv[2],&second,NULL);

As with RedisModule_HashSet(), the behavior of the command can be specified by passing flags other than REDISMODULE_HASH_NONE:

REDISMODULE_HASH_CFIELDS: field names as null terminated C strings.

REDISMODULE_HASH_EXISTS: instead of setting the value of the field expecting a RedisModuleString pointer to pointer, the function just reports if the field exists or not and expects an integer pointer as the second element of each pair.

Example of REDISMODULE_HASH_CFIELDS:

 RedisModuleString *username, *hashedpass;
 RedisModule_HashGet(mykey,REDISMODULE_HASH_CFIELDS,"username",&username,"hp",&hashedpass, NULL);

Example of REDISMODULE_HASH_EXISTS:

 int exists;
 RedisModule_HashGet(mykey,REDISMODULE_HASH_EXISTS,argv[1],&exists,NULL);

The function returns REDISMODULE_OK on success and REDISMODULE_ERR if the key is not a hash value.

Memory management:

The returned RedisModuleString objects should be released with RedisModule_FreeString(), or by enabling automatic memory management.

Key API for Stream type

For an introduction to streams, see https://redis.io/topics/streams-intro.

The type RedisModuleStreamID, which is used in stream functions, is a struct with two 64-bit fields and is defined as

typedef struct RedisModuleStreamID {
    uint64_t ms;
    uint64_t seq;
} RedisModuleStreamID;

See also RedisModule_ValueLength(), which returns the length of a stream, and the conversion functions RedisModule_StringToStreamID() and RedisModule_CreateStringFromStreamID().
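Stream IDs are ordered by the milliseconds part first and the sequence number second, and are written as `<ms>-<seq>`. The standalone C sketch below copies the struct definition shown above so it compiles without redismodule.h; `stream_id_cmp` and `stream_id_format` are hypothetical helpers, not module API functions (the API's own conversions are RedisModule_StringToStreamID() and RedisModule_CreateStringFromStreamID()).

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Same layout as the definition above. */
typedef struct RedisModuleStreamID {
    uint64_t ms;
    uint64_t seq;
} RedisModuleStreamID;

/* Hypothetical helper: compare by ms, then by seq. Returns <0, 0 or >0. */
static int stream_id_cmp(const RedisModuleStreamID *a,
                         const RedisModuleStreamID *b) {
    if (a->ms != b->ms) return a->ms < b->ms ? -1 : 1;
    if (a->seq != b->seq) return a->seq < b->seq ? -1 : 1;
    return 0;
}

/* Hypothetical helper: write "<ms>-<seq>" into buf. A 42-byte buffer holds
 * the longest possible ID (two 20-digit numbers, '-' and the NUL). */
static int stream_id_format(const RedisModuleStreamID *id, char *buf,
                            size_t buflen) {
    assert(id != NULL && buf != NULL);
    return snprintf(buf, buflen, "%" PRIu64 "-%" PRIu64, id->ms, id->seq);
}
```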

RedisModule_StreamAdd

int RedisModule_StreamAdd(RedisModuleKey *key,
                          int flags,
                          RedisModuleStreamID *id,
                          RedisModuleString **argv,
                          long numfields);

Available since: 6.2.0

Adds an entry to a stream. Like XADD without trimming.

  • key: The key where the stream is (or will be) stored
  • flags: A bit field of
    • REDISMODULE_STREAM_ADD_AUTOID: Assign a stream ID automatically, like * in the XADD command.
  • id: If the AUTOID flag is set, this is where the assigned ID is returned; it can be NULL if you don't need the ID. If AUTOID is not set, this is the requested ID.
  • argv: A pointer to an array of size numfields * 2 containing the fields and values.
  • numfields: The number of field-value pairs in argv.

Returns REDISMODULE_OK if an entry has been added. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if called with invalid arguments
  • ENOTSUP if the key refers to a value of a type other than stream
  • EBADF if the key was not opened for writing
  • EDOM if the given ID was 0-0 or not greater than all other IDs in the stream (only if the AUTOID flag is unset)
  • EFBIG if the stream has reached the last possible ID
  • ERANGE if the elements are too large to be stored.

RedisModule_StreamDelete

int RedisModule_StreamDelete(RedisModuleKey *key, RedisModuleStreamID *id);

Available since: 6.2.0

Deletes an entry from a stream.

  • key: A key opened for writing, with no stream iterator started.
  • id: The stream ID of the entry to delete.

Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if called with invalid arguments
  • ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
  • EBADF if the key was not opened for writing or if a stream iterator is associated with the key
  • ENOENT if no entry with the given stream ID exists

See also RedisModule_StreamIteratorDelete() for deleting the current entry while iterating using a stream iterator.

RedisModule_StreamIteratorStart

int RedisModule_StreamIteratorStart(RedisModuleKey *key,
                                    int flags,
                                    RedisModuleStreamID *start,
                                    RedisModuleStreamID *end);

Available since: 6.2.0

Sets up a stream iterator.

  • key: The stream key opened for reading using RedisModule_OpenKey().
  • flags:
    • REDISMODULE_STREAM_ITERATOR_EXCLUSIVE: Don't include start and end in the iterated range.
    • REDISMODULE_STREAM_ITERATOR_REVERSE: Iterate in reverse order, starting from the end of the range.
  • start: The lower bound of the range. Use NULL for the beginning of the stream.
  • end: The upper bound of the range. Use NULL for the end of the stream.

Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if called with invalid arguments
  • ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
  • EBADF if the key was not opened for writing or if a stream iterator is already associated with the key
  • EDOM if start or end is outside the valid range

Returns REDISMODULE_OK on success and REDISMODULE_ERR if the key doesn't refer to a stream or if invalid arguments were given.

The stream IDs are retrieved using RedisModule_StreamIteratorNextID() and for each stream ID, the fields and values are retrieved using RedisModule_StreamIteratorNextField(). The iterator is freed by calling RedisModule_StreamIteratorStop().

Example (error handling omitted):

RedisModule_StreamIteratorStart(key, 0, startid_ptr, endid_ptr);
RedisModuleStreamID id;
long numfields;
while (RedisModule_StreamIteratorNextID(key, &id, &numfields) ==
       REDISMODULE_OK) {
    RedisModuleString *field, *value;
    while (RedisModule_StreamIteratorNextField(key, &field, &value) ==
           REDISMODULE_OK) {
        //
        // ... Do stuff ...
        //
        RedisModule_FreeString(ctx, field);
        RedisModule_FreeString(ctx, value);
    }
}
RedisModule_StreamIteratorStop(key);

RedisModule_StreamIteratorStop

int RedisModule_StreamIteratorStop(RedisModuleKey *key);

Available since: 6.2.0

Stops a stream iterator created using RedisModule_StreamIteratorStart() and reclaims its memory.

Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if called with a NULL key
  • ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
  • EBADF if the key was not opened for writing or if no stream iterator is associated with the key

RedisModule_StreamIteratorNextID

int RedisModule_StreamIteratorNextID(RedisModuleKey *key,
                                     RedisModuleStreamID *id,
                                     long *numfields);

Available since: 6.2.0

Finds the next stream entry and returns its stream ID and the number of fields.

  • key: Key for which a stream iterator has been started using RedisModule_StreamIteratorStart().
  • id: The stream ID returned. NULL if you don't care.
  • numfields: The number of fields in the found stream entry. NULL if you don't care.

Returns REDISMODULE_OK and sets *id and *numfields if an entry was found. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if called with a NULL key
  • ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
  • EBADF if no stream iterator is associated with the key
  • ENOENT if there are no more entries in the range of the iterator

In practice, if RedisModule_StreamIteratorNextID() is called after a successful call to RedisModule_StreamIteratorStart() and with the same key, it is safe to assume that a REDISMODULE_ERR return value means that there are no more entries.

Use RedisModule_StreamIteratorNextField() to retrieve the fields and values. See the example at RedisModule_StreamIteratorStart().

RedisModule_StreamIteratorNextField

int RedisModule_StreamIteratorNextField(RedisModuleKey *key,
                                        RedisModuleString **field_ptr,
                                        RedisModuleString **value_ptr);

Available since: 6.2.0

Retrieves the next field of the current stream ID and its corresponding value in a stream iteration. This function should be called repeatedly after calling RedisModule_StreamIteratorNextID() to fetch each field-value pair.

  • key: Key where a stream iterator has been started.
  • field_ptr: This is where the field is returned.
  • value_ptr: This is where the value is returned.

Returns REDISMODULE_OK and points *field_ptr and *value_ptr to freshly allocated RedisModuleString objects. The string objects are freed automatically when the callback finishes if automatic memory is enabled. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if called with a NULL key
  • ENOTSUP if the key refers to a value of a type other than stream or if the key is empty
  • EBADF if no stream iterator is associated with the key
  • ENOENT if there are no more fields in the current stream entry

In practice, if RedisModule_StreamIteratorNextField() is called after a successful call to RedisModule_StreamIteratorNextID() and with the same key, it is safe to assume that a REDISMODULE_ERR return value means that there are no more fields.

See the example at RedisModule_StreamIteratorStart().

RedisModule_StreamIteratorDelete

int RedisModule_StreamIteratorDelete(RedisModuleKey *key);

Available since: 6.2.0

Deletes the current stream entry while iterating.

This function can be called after RedisModule_StreamIteratorNextID() or after any calls to RedisModule_StreamIteratorNextField().

Returns REDISMODULE_OK on success. On failure, REDISMODULE_ERR is returned and errno is set as follows:

  • EINVAL if key is NULL
  • ENOTSUP if the key is empty or is of another type than stream
  • EBADF if the key is not opened for writing or if no iterator has been started
  • ENOENT if the iterator has no current stream entry

RedisModule_StreamTrimByLength

long long RedisModule_StreamTrimByLength(RedisModuleKey *key,
                                         int flags,
                                         long long length);

Available since: 6.2.0

Trim a stream by length, similar to XTRIM with MAXLEN.

  • key: Key opened for writing.
  • flags: A bitfield of
    • REDISMODULE_STREAM_TRIM_APPROX: Trim less if it improves performance, like XTRIM with ~.
  • length: The number of stream entries to keep after trimming.

Returns the number of entries deleted. On failure, a negative value is returned and errno is set as follows:

  • EINVAL if called with invalid arguments
  • ENOTSUP if the key is empty or of a type other than stream
  • EBADF if the key is not opened for writing

RedisModule_StreamTrimByID

long long RedisModule_StreamTrimByID(RedisModuleKey *key,
                                     int flags,
                                     RedisModuleStreamID *id);

Available since: 6.2.0

Trim a stream by ID, similar to XTRIM with MINID.

  • key: Key opened for writing.
  • flags: A bitfield of
    • REDISMODULE_STREAM_TRIM_APPROX: Trim less if it improves performance, like XTRIM with ~.
  • id: The smallest stream ID to keep after trimming.

Returns the number of entries deleted. On failure, a negative value is returned and errno is set as follows:

  • EINVAL if called with invalid arguments
  • ENOTSUP if the key is empty or of a type other than stream
  • EBADF if the key is not opened for writing
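The return values of the two trimming functions can be modeled over a sorted array of IDs. This standalone C sketch covers exact trimming only; with REDISMODULE_STREAM_TRIM_APPROX, Redis may delete fewer entries. The helpers `trim_by_length_count` and `trim_by_id_count` are hypothetical and do not touch a real stream.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct RedisModuleStreamID {
    uint64_t ms;
    uint64_t seq;
} RedisModuleStreamID;

/* Exact MAXLEN-style trim: entries beyond 'keep' are deleted. */
static long long trim_by_length_count(size_t stream_len, long long keep) {
    assert(keep >= 0);
    return (long long)stream_len > keep ? (long long)stream_len - keep : 0;
}

/* Exact MINID-style trim over IDs in ascending order: every entry whose ID
 * is smaller than 'minid' is deleted. */
static long long trim_by_id_count(const RedisModuleStreamID *ids, size_t n,
                                  const RedisModuleStreamID *minid) {
    long long deleted = 0;
    for (size_t i = 0; i < n; i++) {
        int older = ids[i].ms < minid->ms ||
                    (ids[i].ms == minid->ms && ids[i].seq < minid->seq);
        if (!older) break;          /* ascending order: the rest are kept */
        deleted++;
    }
    return deleted;
}
```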

Calling Redis commands from modules

RedisModule_Call() sends a command to Redis. The remaining functions handle the reply.

RedisModule_FreeCallReply

void RedisModule_FreeCallReply(RedisModuleCallReply *reply);

Available since: 4.0.0

Free a Call reply and all the nested replies it contains if it's an array.

RedisModule_CallReplyType

int RedisModule_CallReplyType(RedisModuleCallReply *reply);

Available since: 4.0.0

Return the reply type as one of the following:

  • REDISMODULE_REPLY_UNKNOWN
  • REDISMODULE_REPLY_STRING
  • REDISMODULE_REPLY_ERROR
  • REDISMODULE_REPLY_INTEGER
  • REDISMODULE_REPLY_ARRAY
  • REDISMODULE_REPLY_NULL
  • REDISMODULE_REPLY_MAP
  • REDISMODULE_REPLY_SET
  • REDISMODULE_REPLY_BOOL
  • REDISMODULE_REPLY_DOUBLE
  • REDISMODULE_REPLY_BIG_NUMBER
  • REDISMODULE_REPLY_VERBATIM_STRING
  • REDISMODULE_REPLY_ATTRIBUTE

RedisModule_CallReplyLength

size_t RedisModule_CallReplyLength(RedisModuleCallReply *reply);

Available since: 4.0.0

Return the length of the reply, where applicable.

RedisModule_CallReplyArrayElement

RedisModuleCallReply *RedisModule_CallReplyArrayElement(RedisModuleCallReply *reply,
                                                        size_t idx);

Available since: 4.0.0

Return the 'idx'-th nested call reply element of an array reply, or NULL if the reply type is wrong or the index is out of range.

RedisModule_CallReplyInteger

long long RedisModule_CallReplyInteger(RedisModuleCallReply *reply);

Available since: 4.0.0

Return the long long of an integer reply.

RedisModule_CallReplyDouble

double RedisModule_CallReplyDouble(RedisModuleCallReply *reply);

Available since: 7.0.0

Return the double value of a double reply.

RedisModule_CallReplyBigNumber

const char *RedisModule_CallReplyBigNumber(RedisModuleCallReply *reply,
                                           size_t *len);

Available since: 7.0.0

Return the big number value of a big number reply.

RedisModule_CallReplyVerbatim

const char *RedisModule_CallReplyVerbatim(RedisModuleCallReply *reply,
                                          size_t *len,
                                          const char **format);

Available since: 7.0.0

Return the value of a verbatim string reply. An optional output argument, 'format', can be given to retrieve the verbatim reply format.

RedisModule_CallReplyBool

int RedisModule_CallReplyBool(RedisModuleCallReply *reply);

Available since: 7.0.0

Return the Boolean value of a Boolean reply.

RedisModule_CallReplySetElement

RedisModuleCallReply *RedisModule_CallReplySetElement(RedisModuleCallReply *reply,
                                                      size_t idx);

Available since: 7.0.0

Return the 'idx'-th nested call reply element of a set reply, or NULL if the reply type is wrong or the index is out of range.

RedisModule_CallReplyMapElement

int RedisModule_CallReplyMapElement(RedisModuleCallReply *reply,
                                    size_t idx,
                                    RedisModuleCallReply **key,
                                    RedisModuleCallReply **val);

Available since: 7.0.0

Retrieve the 'idx'-th key and value of a map reply.

Returns:

  • REDISMODULE_OK on success.
  • REDISMODULE_ERR if idx is out of range or if the reply type is wrong.

The key and value arguments are used to return by reference, and may be NULL if not required.

RedisModule_CallReplyAttribute

RedisModuleCallReply *RedisModule_CallReplyAttribute(RedisModuleCallReply *reply);

Available since: 7.0.0

Return the attribute of the given reply, or NULL if no attribute exists.

RedisModule_CallReplyAttributeElement

int RedisModule_CallReplyAttributeElement(RedisModuleCallReply *reply,
                                          size_t idx,
                                          RedisModuleCallReply **key,
                                          RedisModuleCallReply **val);

Available since: 7.0.0

Retrieve the 'idx'-th key and value of an attribute reply.

Returns:

  • REDISMODULE_OK on success.
  • REDISMODULE_ERR if idx is out of range or if the reply type is wrong.

The key and value arguments are used to return by reference, and may be NULL if not required.

RedisModule_CallReplyStringPtr

const char *RedisModule_CallReplyStringPtr(RedisModuleCallReply *reply,
                                           size_t *len);

Available since: 4.0.0

Return the pointer and length of a string or error reply.

RedisModule_CreateStringFromCallReply

RedisModuleString *RedisModule_CreateStringFromCallReply(RedisModuleCallReply *reply);

Available since: 4.0.0

Return a new string object from a call reply of type string, error or integer. Otherwise (wrong reply type) return NULL.

RedisModule_Call

RedisModuleCallReply *RedisModule_Call(RedisModuleCtx *ctx,
                                       const char *cmdname,
                                       const char *fmt,
                                       ...);

Available since: 4.0.0

Exported API to call any Redis command from modules.

  • cmdname: The Redis command to call.

  • fmt: A format specifier string for the command's arguments. Each of the arguments should be specified by a valid type specification. The format specifier can also contain the modifiers !, A, R, 3, 0, C, S, W and E, which don't have a corresponding argument.

    • b -- The argument is a buffer and is immediately followed by another argument that is the buffer's length.
    • c -- The argument is a pointer to a plain C string (null-terminated).
    • l -- The argument is a long long integer.
    • s -- The argument is a RedisModuleString.
    • v -- The arguments are a vector of RedisModuleString.
    • ! -- Sends the Redis command and its arguments to replicas and AOF.
    • A -- Suppress AOF propagation, send only to replicas (requires !).
    • R -- Suppress replicas propagation, send only to AOF (requires !).
    • 3 -- Return a RESP3 reply. This will change the command reply. e.g., HGETALL returns a map instead of a flat array.
    • 0 -- Return the reply in auto mode, i.e. the reply format will be the same as the client attached to the given RedisModuleCtx. This is typically useful when you want to pass the reply directly to the client.
    • C -- Check if the command can be executed according to ACL rules.
    • S -- Run the command in script mode. This means that it will raise an error if a command that is not allowed inside a script (flagged with the deny-script flag) is invoked (like SHUTDOWN). In addition, in script mode, write commands are not allowed if there are not enough good replicas (as configured with min-replicas-to-write) or when the server is unable to persist to the disk.
    • W -- Do not allow running any write command (flagged with the write flag).
    • E -- Return errors as a RedisModuleCallReply. Normally, an error that occurs before the command is invoked is reported only through errno; with this flag, it is also returned as an error CallReply with a relevant error message.
  • ...: The actual arguments to the Redis command.

On success a RedisModuleCallReply object is returned, otherwise NULL is returned and errno is set to the following values:

  • EBADF: wrong format specifier.
  • EINVAL: wrong command arity.
  • ENOENT: command does not exist.
  • EPERM: operation in Cluster instance with key in non local slot.
  • EROFS: operation in Cluster instance when a write command is sent in a readonly state.
  • ENETDOWN: operation in Cluster instance when cluster is down.
  • ENOTSUP: No ACL user for the specified module context
  • EACCES: Command cannot be executed, according to ACL rules
  • ENOSPC: Write command is not allowed
  • ESPIPE: Command not allowed on script mode
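A format string can be sanity-checked against the specifier list above before calling RedisModule_Call(). The standalone C sketch below returns the number of argument specifiers, or -1 for an unknown character (the case reported as EBADF). It is a hypothetical helper based only on this page's list; newer Redis releases may accept additional modifiers. Note that `b` counts as one specifier even though it consumes two C arguments (the buffer and its length).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical checker: counts argument specifiers (b, c, l, s, v) and
 * skips modifiers (!, A, R, 3, 0, C, S, W, E), which take no argument.
 * Any other character makes the format invalid and yields -1. */
static int count_fmt_specifiers(const char *fmt) {
    assert(fmt != NULL);
    int specs = 0;
    for (const char *p = fmt; *p != '\0'; p++) {
        if (strchr("bclsv", *p) != NULL) specs++;
        else if (strchr("!AR30CSWE", *p) == NULL) return -1;
    }
    return specs;
}
```

For example, "!sc" describes two command arguments (a RedisModuleString and a C string) plus a replication modifier.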

Example code fragment:

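 RedisModuleCallReply *reply = RedisModule_Call(ctx,"INCRBY","sc",argv[1],"10");
 if (reply && RedisModule_CallReplyType(reply) == REDISMODULE_REPLY_INTEGER) {
   long long myval = RedisModule_CallReplyInteger(reply);
   // Do something with myval.
 }
 // Not needed if automatic memory management is enabled.
 if (reply) RedisModule_FreeCallReply(reply);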

This API is documented here: https://redis.io/topics/modules-intro

RedisModule_CallReplyProto

const char *RedisModule_CallReplyProto(RedisModuleCallReply *reply,
                                       size_t *len);

Available since: 4.0.0

Return a pointer, and a length, to the protocol returned by the command that returned the reply object.

Modules data types

When String DMA or using existing data structures is not enough, it is possible to create new data types from scratch and export them to Redis. The module must provide a set of callbacks for handling the new values exported (for example in order to provide RDB saving/loading, AOF rewrite, and so forth). In this section we define this API.

RedisModule_CreateDataType

moduleType *RedisModule_CreateDataType(RedisModuleCtx *ctx,
                                       const char *name,
                                       int encver,
                                       void *typemethods_ptr);

Available since: 4.0.0

Register a new data type exported by the module. The parameters are the following. For in-depth documentation, check the modules API documentation, especially https://redis.io/topics/modules-native-types.

  • name: A 9-character data type name that MUST be unique in the Redis Modules ecosystem. Be creative... and there will be no collisions. Use the charset A-Z a-z 0-9, plus the two "-_" characters. A good idea is to use, for example, <typename>-<vendor>. For example "tree-AntZ" may mean "Tree data structure by @antirez". Using both lowercase and uppercase letters helps prevent collisions.

  • encver: Encoding version, that is, the version of the serialization that a module used in order to persist data. As long as the "name" matches, the RDB loading will be dispatched to the type callbacks whatever 'encver' is used; however, the module can tell whether the encoding it must load is from an older version of the module. For example the module "tree-AntZ" initially used encver=0. Later, after an upgrade, it started to serialize data in a different format and to register the type with encver=1. However, this module may still load old data produced by an older version if the rdb_load callback is able to check the encver value and act accordingly. The encver must be a value between 0 and 1023.

  • typemethods_ptr is a pointer to a RedisModuleTypeMethods structure that should be populated with the method callbacks and structure version, like in the following example:

      RedisModuleTypeMethods tm = {
          .version = REDISMODULE_TYPE_METHOD_VERSION,
          .rdb_load = myType_RDBLoadCallBack,
          .rdb_save = myType_RDBSaveCallBack,
          .aof_rewrite = myType_AOFRewriteCallBack,
          .free = myType_FreeCallBack,

          // Optional fields
          .digest = myType_DigestCallBack,
          .mem_usage = myType_MemUsageCallBack,
          .aux_load = myType_AuxRDBLoadCallBack,
          .aux_save = myType_AuxRDBSaveCallBack,
          .free_effort = myType_FreeEffortCallBack,
          .unlink = myType_UnlinkCallBack,
          .copy = myType_CopyCallback,
          .defrag = myType_DefragCallback,

          // Enhanced optional fields
          .mem_usage2 = myType_MemUsageCallBack2,
          .free_effort2 = myType_FreeEffortCallBack2,
          .unlink2 = myType_UnlinkCallBack2,
          .copy2 = myType_CopyCallback2,
      };
    
  • rdb_load: A callback function pointer that loads data from RDB files.

  • rdb_save: A callback function pointer that saves data to RDB files.

  • aof_rewrite: A callback function pointer that rewrites data as commands.

  • digest: A callback function pointer that is used for DEBUG DIGEST.

  • free: A callback function pointer that can free a type value.

  • aux_save: A callback function pointer that saves out of keyspace data to RDB files. 'when' argument is either REDISMODULE_AUX_BEFORE_RDB or REDISMODULE_AUX_AFTER_RDB.

  • aux_load: A callback function pointer that loads out of keyspace data from RDB files. Similar to aux_save, returns REDISMODULE_OK on success, and ERR otherwise.

  • free_effort: A callback function pointer that used to determine whether the module's memory needs to be lazy reclaimed. The module should return the complexity involved by freeing the value. for example: how many pointers are gonna be freed. Note that if it returns 0, we'll always do an async free.

  • unlink: A callback function pointer that used to notifies the module that the key has been removed from the DB by redis, and may soon be freed by a background thread. Note that it won't be called on FLUSHALL/FLUSHDB (both sync and async), and the module can use the RedisModuleEvent_FlushDB to hook into that.

  • copy: A callback function pointer that is used to make a copy of the specified key. The module is expected to perform a deep copy of the specified value and return it. In addition, hints about the names of the source and destination keys are provided. A NULL return value is considered an error and the copy operation fails. Note: if the target key exists and is being overwritten, the copy callback is called first, followed by a free callback for the value being replaced.

  • defrag: A callback function pointer that is used to request the module to defrag a key. The module should then iterate pointers and call the relevant RedisModule_Defrag*() functions to defragment pointers or complex types. The module should continue iterating as long as RedisModule_DefragShouldStop() returns a zero value, and return a zero value if finished or non-zero value if more work is left to be done. If more work needs to be done, RedisModule_DefragCursorSet() and RedisModule_DefragCursorGet() can be used to track this work across different calls. Normally, the defrag mechanism invokes the callback without a time limit, so RedisModule_DefragShouldStop() always returns zero. The "late defrag" mechanism which has a time limit and provides cursor support is used only for keys that are determined to have significant internal complexity. To determine this, the defrag mechanism uses the free_effort callback and the 'active-defrag-max-scan-fields' config directive. NOTE: The value is passed as a void** and the function is expected to update the pointer if the top-level value pointer is defragmented and consequently changes.

  • mem_usage2: Similar to mem_usage, but provides the RedisModuleKeyOptCtx parameter so that meta information such as key name and db id can be obtained, and the sample_size for size estimation (see MEMORY USAGE command).

  • free_effort2: Similar to free_effort, but provides the RedisModuleKeyOptCtx parameter so that meta information such as key name and db id can be obtained.

  • unlink2: Similar to unlink, but provides the RedisModuleKeyOptCtx parameter so that meta information such as key name and db id can be obtained.

  • copy2: Similar to copy, but provides the RedisModuleKeyOptCtx parameter so that meta information such as key names and db ids can be obtained.
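
The defrag callback's contract (resume from a saved cursor, stop when asked, return 0 when finished and non-zero when work remains) can be modeled without Redis. The following is a minimal standalone sketch under that assumption: ToyValue, toy_should_stop and toy_defrag_step are hypothetical names, the fixed per-call budget merely stands in for RedisModule_DefragShouldStop(), and the cursor variable stands in for RedisModule_DefragCursorSet()/RedisModule_DefragCursorGet().

```c
#include <stddef.h>

/* Hypothetical value made of independently movable items (at most 64). */
typedef struct {
    size_t nitems;
    int moved[64];      /* 1 once the toy has "defragmented" an item */
} ToyValue;

/* Hypothetical stand-in for RedisModule_DefragShouldStop(): here the
 * caller simply grants a fixed number of items per invocation. */
static int toy_should_stop(size_t done, size_t budget) {
    return done >= budget;
}

/* Handles items starting at *cursor. Returns 0 when the whole value has
 * been processed and non-zero when work is left, following the defrag
 * callback's return convention. *cursor stands in for the value a real
 * callback would keep with RedisModule_DefragCursorSet()/Get(). */
int toy_defrag_step(ToyValue *v, size_t *cursor, size_t budget) {
    size_t done = 0;
    while (*cursor < v->nitems) {
        if (toy_should_stop(done, budget))
            return 1;                /* out of time: resume here next call */
        v->moved[*cursor] = 1;       /* real code would call RedisModule_Defrag*() */
        (*cursor)++;
        done++;
    }
    *cursor = 0;                     /* finished: the next full pass starts over */
    return 0;
}
```

With 10 items and a budget of 4, the first two calls return non-zero and the third returns 0, which is the shape of work the "late defrag" mechanism is meant to spread across calls.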

Note: the module type name "AAAAAAAAA" is reserved and produces an error; it happens to be pretty lame as well.

If there is already a module registering a type with the same name, or if the module name or encver is invalid, NULL is returned. Otherwise the new type is registered into Redis, and a reference of type RedisModuleType is returned: the caller should store this reference in a global variable so it can be used later with the module type API, since a single module may register multiple types. Example code fragment:

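 static RedisModuleType *BalancedTreeType;

 int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
     // some code here ...
     BalancedTreeType = RedisModule_CreateDataType(...);
     // NULL means the name/encver was rejected or the name is already taken
     if (BalancedTreeType == NULL) return REDISMODULE_ERR;
     return REDISMODULE_OK;
 }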
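
Why can a name or encver be "invalid"? Based on our understanding of how module type IDs are built (an assumption, since this section does not spell it out), the type name must be exactly 9 characters from a 64-symbol charset (A-Z, a-z, 0-9, '-', '_'), each packed into 6 bits (54 bits total), and encver must fit in the remaining 10 bits (0 to 1023), giving one 64-bit ID. The helpers below, toy_valid_type_name and toy_encode_type_id, are hypothetical and only illustrate those checks; Redis performs its own validation inside RedisModule_CreateDataType().

```c
#include <stdint.h>
#include <string.h>

/* Assumed charset for module type names: 64 symbols, so 6 bits each. */
static const char kTypeNameCharset[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_";

/* Returns 1 if 'name' and 'encver' pass the assumed checks, else 0. */
int toy_valid_type_name(const char *name, int encver) {
    if (strlen(name) != 9) return 0;                  /* exactly 9 characters */
    for (size_t i = 0; i < 9; i++)
        if (strchr(kTypeNameCharset, name[i]) == NULL) return 0;
    if (strcmp(name, "AAAAAAAAA") == 0) return 0;     /* reserved name */
    if (encver < 0 || encver > 1023) return 0;        /* must fit in 10 bits */
    return 1;
}

/* Packs 9 x 6-bit character indexes followed by a 10-bit encver.
 * Only call this after toy_valid_type_name() has returned 1. */
uint64_t toy_encode_type_id(const char *name, int encver) {
    uint64_t id = 0;
    for (size_t i = 0; i < 9; i++) {
        size_t pos = (size_t)(strchr(kTypeNameCharset, name[i]) - kTypeNameCharset);
        id = (id << 6) | pos;
    }
    return (id << 10) | (uint64_t)encver;
}
```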

RedisModule_ModuleTypeSetValue

int RedisModule_ModuleTypeSetValue(RedisModuleKey *key,
                                   moduleType *mt,
                                   void *value);

Available since: 4.0.0

If the key is open for writing, set the specified module type object as the value of the key, deleting the old value if any. On success REDISMODULE_OK is returned. If the key is not open for writing or there is an active iterator, REDISMODULE_ERR is returned.

RedisModule_ModuleTypeGetType

moduleType *RedisModule_ModuleTypeGetType(RedisModuleKey *key);

Available since: 4.0.0

Assuming RedisModule_KeyType() returned REDISMODULE_KEYTYPE_MODULE on the key, returns the module type pointer of the value stored at key.

If the key is NULL, is not associated with a module type, or is empty, then NULL is returned instead.

RedisModule_ModuleTypeGetValue

void *RedisModule_ModuleTypeGetValue(RedisModuleKey *key);

Available since: 4.0.0

Assuming RedisModule_KeyType() returned REDISMODULE_KEYTYPE_MODULE on the key, returns the module type low-level value stored at key, as it was set by the user via RedisModule_ModuleTypeSetValue().

If the key is NULL, is not associated with a module type, or is empty, then NULL is returned instead.

RDB loading and saving functions

RedisModule_IsIOError

int RedisModule_IsIOError(RedisModuleIO *io);

Available since: 6.0.0

Returns true if any previous IO API failed. For Load* APIs, the REDISMODULE_OPTIONS_HANDLE_IO_ERRORS flag must first be set with RedisModule_SetModuleOptions().

RedisModule_SaveUnsigned

void RedisModule_SaveUnsigned(RedisModuleIO *io, uint64_t value);

Available since: 4.0.0

Save an unsigned 64 bit value into the RDB file. This function should only be called in the context of the rdb_save method of modules implementing new data types.

RedisModule_LoadUnsigned

uint64_t RedisModule_LoadUnsigned(RedisModuleIO *io);

Available since: 4.0.0

Load an unsigned 64 bit value from the RDB file. This function should only be called in the context of the rdb_load method of modules implementing new data types.

RedisModule_SaveSigned

void RedisModule_SaveSigned(RedisModuleIO *io, int64_t value);

Available since: 4.0.0

Like RedisModule_SaveUnsigned() but for signed 64 bit values.

RedisModule_LoadSigned

int64_t RedisModule_LoadSigned(RedisModuleIO *io);

Available since: 4.0.0

Like RedisModule_LoadUnsigned() but for signed 64 bit values.

RedisModule_SaveString

void RedisModule_SaveString(RedisModuleIO *io, RedisModuleString *s);

Available since: 4.0.0

In the context of the rdb_save method of a module type, saves a string into the RDB file taking as input a RedisModuleString.

The string can be later loaded with RedisModule_LoadString() or other Load family functions expecting a serialized string inside the RDB file.

RedisModule_SaveStringBuffer

void RedisModule_SaveStringBuffer(RedisModuleIO *io,
                                  const char *str,
                                  size_t len);

Available since: 4.0.0

Like RedisModule_SaveString() but takes a raw C pointer and length as input.

RedisModule_LoadString

RedisModuleString *RedisModule_LoadString(RedisModuleIO *io);

Available since: 4.0.0

In the context of the rdb_load method of a module data type, loads a string from the RDB file that was previously saved with the RedisModule_SaveString() family of functions.

The returned string is a newly allocated RedisModuleString object, and the user should at some point free it with a call to RedisModule_FreeString().

If the data structure does not store strings as RedisModuleString objects, the similar function RedisModule_LoadStringBuffer() could be used instead.

RedisModule_LoadStringBuffer

char *RedisModule_LoadStringBuffer(RedisModuleIO *io, size_t *lenptr);

Available since: 4.0.0

Like RedisModule_LoadString() but returns a heap-allocated string that was allocated with RedisModule_Alloc(), and can be resized or freed with RedisModule_Realloc() or RedisModule_Free().

The size of the string is stored at '*lenptr' if not NULL. The returned string is not automatically NULL terminated, it is loaded exactly as it was stored inside the RDB file.
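
The Save*/Load* functions behave like a stream: rdb_load has to read values back in the same order and with the same types that rdb_save wrote them, and a string buffer carries its own length instead of a terminating NUL. The sketch below models that contract with a hypothetical in-memory ToyIO and toy_* helpers standing in for RedisModuleIO and the RedisModule_Save*/Load* calls; it is not the real RDB encoding.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory stand-in for RedisModuleIO. */
typedef struct {
    unsigned char buf[256];
    size_t wpos, rpos;
    int error;                 /* set when a read or write runs out of room */
} ToyIO;

static void toy_write(ToyIO *io, const void *p, size_t n) {
    if (n > sizeof io->buf - io->wpos) { io->error = 1; return; }
    memcpy(io->buf + io->wpos, p, n);
    io->wpos += n;
}

static int toy_read(ToyIO *io, void *p, size_t n) {
    if (n > io->wpos - io->rpos) { io->error = 1; return 0; }
    memcpy(p, io->buf + io->rpos, n);
    io->rpos += n;
    return 1;
}

/* Mirror SaveUnsigned / LoadUnsigned. */
void toy_save_unsigned(ToyIO *io, uint64_t v) { toy_write(io, &v, sizeof v); }

uint64_t toy_load_unsigned(ToyIO *io) {
    uint64_t v = 0;
    toy_read(io, &v, sizeof v);
    return v;
}

/* Mirror SaveStringBuffer / LoadStringBuffer: a length prefix plus raw
 * bytes. No NUL is stored or appended; the caller frees the result. */
void toy_save_string_buffer(ToyIO *io, const char *s, size_t len) {
    toy_save_unsigned(io, len);
    toy_write(io, s, len);
}

char *toy_load_string_buffer(ToyIO *io, size_t *lenptr) {
    uint64_t len = toy_load_unsigned(io);
    if (io->error || len > io->wpos - io->rpos) { io->error = 1; return NULL; }
    char *s = malloc(len ? (size_t)len : 1);
    if (s == NULL) return NULL;
    toy_read(io, s, (size_t)len);       /* cannot fail: length checked above */
    if (lenptr) *lenptr = (size_t)len;
    return s;
}
```

Because the loaded buffer is not NUL-terminated, callers compare it with memcmp() and the returned length rather than with strcmp().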

RedisModule_SaveDouble

void RedisModule_SaveDouble(RedisModuleIO *io, double value);

Available since: 4.0.0

In the context of the rdb_save method of a module data type, saves a double value to the RDB file. The double can be a valid number, a NaN or infinity. It is possible to load back the value with RedisModule_LoadDouble().

RedisModule_LoadDouble

double RedisModule_LoadDouble(RedisModuleIO *io);

Available since: 4.0.0

In the context of the rdb_load method of a module data type, loads back the double value saved by RedisModule_SaveDouble().
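
Since the saved double may be a regular number, a NaN or an infinity, a binary encoding that copies the 8 bytes of the value round-trips all of them exactly, including -0.0, which a short decimal text form would not guarantee. This is a standalone sketch assuming 64-bit IEEE 754 doubles; toy_double_to_bits and toy_bits_to_double are hypothetical helpers, not the code Redis uses internally.

```c
#include <math.h>     /* NAN, INFINITY, isnan(), isinf(), signbit() */
#include <stdint.h>
#include <string.h>

/* memcpy is the well-defined way to reinterpret the bytes of a double. */
uint64_t toy_double_to_bits(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

double toy_bits_to_double(uint64_t bits) {
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```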

RedisModule_SaveFloat

void RedisModule_SaveFloat(RedisModuleIO *io, float value);

Available since: 4.0.0

In the context of the rdb_save method of a module data type, saves a float value to the RDB file. The float can be a valid number, a NaN or infinity. It is possible to load back the value with RedisModule_LoadFloat().

RedisModule_LoadFloat

float RedisModule_LoadFloat(RedisModuleIO *io);

Available since: 4.0.0

In the context of the rdb_load method of a module data type, loads back the float value saved by RedisModule_SaveFloat().

RedisModule_SaveLongDouble

void RedisModule_SaveLongDouble(RedisModuleIO *io, long double value);

Available since: 6.0.0

In the context of the rdb_save method of a module data type, saves a long double value to the RDB file. The value can be a valid number, a NaN or infinity. It is possible to load back the value with RedisModule_LoadLongDouble().

RedisModule_LoadLongDouble

long double RedisModule_LoadLongDouble(RedisModuleIO *io);

Available since: 6.0.0

In the context of the rdb_load method of a module data type, loads back the long double value saved by RedisModule_SaveLongDouble().

Key digest API (DEBUG DIGEST interface for modules types)

RedisModule_DigestAddStringBuffer

void RedisModule_DigestAddStringBuffer(RedisModuleDigest *md,
                                       const char *ele,
                                       size_t len);

Available since: 4.0.0

Add a new element to the digest. This function can be called multiple times, one element after the other, for all the elements that constitute a given data structure. Once all the elements that always appear in a given order have been added, the calls must eventually be followed by a call to RedisModule_DigestEndSequence. See the Redis Modules data types documentation for more info. Here is a quick example that uses Redis data types as an illustration.

To add a sequence of unordered elements (for example in the case of a Redis Set), the pattern to use is:

foreach element {
    AddElement(element);
    EndSequence();
}

Because Sets are not ordered, every element added has a position that does not depend on the others. However, if our elements are instead ordered in pairs, like the field-value pairs of a Hash, then one should use:

foreach key,value {
    AddElement(key);
    AddElement(value);
    EndSequence();
}

This is because the key and value will always be in the above order, while the individual key-value pairs can appear in any position in a Redis hash.

A list of ordered elements would be implemented with:

foreach element {
    AddElement(element);
}
EndSequence();
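
Why the three patterns work: elements inside one sequence are chained, so their order changes the result, while finished sequences are combined with a commutative operation, so their order does not. The toy below models that idea with 64-bit FNV-1a and XOR; the hash choice is an assumption for illustration and is not necessarily what Redis uses, and ToyDigest, toy_digest_add and toy_digest_end are hypothetical stand-ins for AddStringBuffer and EndSequence.

```c
#include <stddef.h>
#include <stdint.h>

#define FNV_OFFSET 14695981039346656037ULL
#define FNV_PRIME  1099511628211ULL

/* 'seq' accumulates the current ordered sequence; 'total' combines the
 * finished sequences with XOR. */
typedef struct { uint64_t seq; uint64_t total; } ToyDigest;

void toy_digest_init(ToyDigest *d) { d->seq = FNV_OFFSET; d->total = 0; }

/* Like AddStringBuffer: chains the element into the current sequence,
 * so the order of elements inside one sequence matters. */
void toy_digest_add(ToyDigest *d, const char *ele, size_t len) {
    for (size_t i = 0; i < len; i++) {
        d->seq ^= (unsigned char)ele[i];
        d->seq *= FNV_PRIME;
    }
    d->seq ^= 0xff;            /* element separator: "ab","c" != "a","bc" */
    d->seq *= FNV_PRIME;
}

/* Like EndSequence: XOR is commutative, so the order in which whole
 * sequences are ended does not affect the final value. */
void toy_digest_end(ToyDigest *d) {
    d->total ^= d->seq;
    d->seq = FNV_OFFSET;
}
```

A set added as {a, b} or {b, a} with one sequence per element produces the same total, while a list [a, b] ended as a single sequence differs from [b, a].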

RedisModule_DigestAddLongLong

void RedisModule_DigestAddLongLong(RedisModuleDigest *md, long long ll);

Available since: 4.0.0

Like RedisModule_DigestAddStringBuffer() but takes a long long as input that gets converted into a string before adding it to the digest.

RedisModule_DigestEndSequence

void RedisModule_DigestEndSequence(RedisModuleDigest *md);

Available since: 4.0.0

See the documentation for RedisModule_DigestAddStringBuffer().

RedisModule_LoadDataTypeFromStringEncver

void *RedisModule_LoadDataTypeFromStringEncver(const RedisModuleString *str,
                                               const moduleType *mt,
                                               int encver);

Available since: 7.0.0

Decode a serialized representation of a module data type 'mt', in a specific encoding version 'encver' from string 'str' and return a newly allocated value, or NULL if decoding failed.

This call basically reuses the 'rdb_load' callback which module data types implement in order to allow a module to arbitrarily serialize/de-serialize keys, similar to how the Redis 'DUMP' and 'RESTORE' commands are implemented.

Modules should generally use the REDISMODULE_OPTIONS_HANDLE_IO_ERRORS flag and make sure the de-serialization code properly checks and handles IO errors (freeing allocated buffers and returning a NULL).

If this is NOT done, Redis will handle corrupted (or just truncated) serialized data by producing an error message and terminating the process.
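
The recommended pattern, checking for an IO error after loads and releasing everything allocated so far before returning NULL, is sketched below as standalone C. ToyReader, toy_load_u64, ToyArray and toy_load_array are hypothetical; in a real module the reads would be RedisModule_LoadUnsigned() and the error check RedisModule_IsIOError(), with REDISMODULE_OPTIONS_HANDLE_IO_ERRORS enabled.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reader over previously saved 64-bit values. */
typedef struct {
    const uint64_t *data;
    size_t len, pos;
    int error;                 /* plays the role of RedisModule_IsIOError() */
} ToyReader;

static uint64_t toy_load_u64(ToyReader *r) {
    if (r->pos >= r->len) { r->error = 1; return 0; }
    return r->data[r->pos++];
}

/* Hypothetical value: a counted array of integers. */
typedef struct { size_t n; uint64_t *items; } ToyArray;

/* Loads "count, item0, item1, ..." and returns NULL on truncated or
 * corrupted input, freeing everything allocated so far. */
ToyArray *toy_load_array(ToyReader *r) {
    uint64_t n = toy_load_u64(r);
    if (r->error || n > r->len - r->pos) {   /* truncated or implausible count */
        r->error = 1;
        return NULL;
    }
    ToyArray *a = malloc(sizeof *a);
    if (a == NULL) return NULL;
    a->n = (size_t)n;
    a->items = malloc((n ? (size_t)n : 1) * sizeof *a->items);
    if (a->items == NULL) { free(a); return NULL; }
    for (size_t i = 0; i < a->n; i++) {
        a->items[i] = toy_load_u64(r);
        if (r->error) {                      /* defensive check after each load */
            free(a->items);
            free(a);
            return NULL;
        }
    }
    return a;
}
```

Validating the element count against the remaining data before allocating also keeps a corrupted count from triggering a huge allocation.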

RedisModule_LoadDataTypeFromString

void *RedisModule_LoadDataTypeFromString(const RedisModuleString *str,
                                         const moduleType *mt);

Available since: 6.0.0

Similar to RedisModule_LoadDataTypeFromStringEncver(); this is the original version of the API, kept for backward compatibility.

RedisModule_SaveDataTypeToString

RedisModuleString *RedisModule_SaveDataTypeToString(RedisModuleCtx *ctx,
                                                    void *data,
                                                    const moduleType *mt);

Available since: 6.0.0

Encode a module data type 'mt' value 'data' into serialized form, and return it as a newly allocated RedisModuleString.

This call basically reuses the 'rdb_save' callback which module data types implement in order to allow a module to arbitrarily serialize/de-serialize keys, similar to how the Redis 'DUMP' and 'RESTORE' commands are implemented.

RedisModule_GetKeyNameFromDigest

const RedisModuleString *RedisModule_GetKeyNameFromDigest(RedisModuleDigest *dig);

Available since: 7.0.0

Returns the name of the key currently being processed.

RedisModule_GetDbIdFromDigest

int RedisModule_GetDbIdFromDigest(RedisModuleDigest *dig);

Available since: 7.0.0

Returns the database id of the key currently being processed.

AOF API for modules data types

RedisModule_EmitAOF

void RedisModule_EmitAOF(RedisModuleIO *io,
                         const char *cmdname,
                         const char *fmt,
                         ...);

Available since: 4.0.0

Emits a command into the AOF during the AOF rewriting process. This function should only be called in the context of the aof_rewrite method of data types exported by a module. The command works exactly like RedisModule_Call() in the way the parameters are passed, but it does not return anything, as error handling is performed by Redis itself.

IO context handling

RedisModule_GetKeyNameFromIO

const RedisModuleString *RedisModule_GetKeyNameFromIO(RedisModuleIO *io);

Available since: 5.0.5

Returns the name of the key currently being processed. There is no guarantee that the key name is always available, so this may return NULL.

RedisModule_GetKeyNameFromModuleKey

const RedisModuleString *RedisModule_GetKeyNameFromModuleKey(RedisModuleKey *key);

Available since: 6.0.0

Returns a RedisModuleString with the name of the key from RedisModuleKey.

RedisModule_GetDbIdFromModuleKey

int RedisModule_GetDbIdFromModuleKey(RedisModuleKey *key);

Available since: 7.0.0

Returns the database id of the key from RedisModuleKey.

RedisModule_GetDbIdFromIO

int RedisModule_GetDbIdFromIO(RedisModuleIO *io);

Available since: 7.0.0

Returns the database id of the key currently being processed. There is no guarantee that this info is always available, so this may return -1.

Logging

RedisModule_Log

void RedisModule_Log(RedisModuleCtx *ctx,
                     const char *levelstr,
                     const char *fmt,
                     ...);

Available since: 4.0.0

Produces a log message in the standard Redis log. The format accepts printf-like specifiers, while level is a string describing the log level to use when emitting the log, and must be one of the following:

  • "debug" (REDISMODULE_LOGLEVEL_DEBUG)
  • "verbose" (REDISMODULE_LOGLEVEL_VERBOSE)
  • "notice" (REDISMODULE_LOGLEVEL_NOTICE)
  • "warning" (REDISMODULE_LOGLEVEL_WARNING)

If the specified log level is invalid, verbose is used by default. There is a fixed limit to the length of the log line this function is able to emit; this limit is not specified, but it is guaranteed to be more than a few lines of text.

The ctx argument may be NULL if it cannot be provided in the context of the caller (for instance, in threads or callbacks), in which case a generic "module" name is used instead of the module name.
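
The three rules above (unknown level strings fall back to verbose, the line length is capped, and a missing context yields the generic "module" name) can be illustrated with a small standalone formatter. Everything here is hypothetical: the ToyLogLevel enum, the toy_format_log helper, the 128-byte cap and the "<name> message" layout are assumptions for illustration, and this sketch compares level strings case-sensitively, which the text above does not specify either way.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef enum { TOY_DEBUG, TOY_VERBOSE, TOY_NOTICE, TOY_WARNING } ToyLogLevel;

static ToyLogLevel toy_parse_level(const char *levelstr) {
    if (strcmp(levelstr, "debug") == 0) return TOY_DEBUG;
    if (strcmp(levelstr, "verbose") == 0) return TOY_VERBOSE;
    if (strcmp(levelstr, "notice") == 0) return TOY_NOTICE;
    if (strcmp(levelstr, "warning") == 0) return TOY_WARNING;
    return TOY_VERBOSE;                 /* invalid level: default to verbose */
}

/* Formats "<module> message" into 'out'. 'modname' is NULL when no context
 * is available, in which case the generic name "module" is used. */
ToyLogLevel toy_format_log(char *out, size_t outlen, const char *modname,
                           const char *levelstr, const char *fmt, ...) {
    char msg[128];                      /* hypothetical fixed line limit */
    va_list ap;
    va_start(ap, fmt);
    vsnprintf(msg, sizeof msg, fmt, ap);    /* longer messages are truncated */
    va_end(ap);
    snprintf(out, outlen, "<%s> %s", modname ? modname : "module", msg);
    return toy_parse_level(levelstr);
}
```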

RedisModule_LogIOError

void RedisModule_LogIOError(RedisModuleIO *io,
                            const char *levelstr,
                            const char *fmt,
                            ...);

Available since: 4.0.0

Log errors from RDB / AOF serialization callbacks.

This function should be used when a callback returns a critical error to the caller because it cannot load or save the data for some critical reason.

RedisModule__Assert

void RedisModule__Assert(const char *estr, const char *file, int line);

Available since: 6.0.0

Redis-like assert function.

The macro RedisModule_Assert(expression) is recommended, rather than calling this function directly.

A failed assertion will shut down the server and produce logging information that looks identical to information generated by Redis itself.

RedisModule_LatencyAddSample

void RedisModule_LatencyAddSample(const char *event, mstime_t latency);

Available since: 6.0.0

Adds an event sample to the latency monitor, to be observed by the LATENCY command. The call is skipped if the latency is smaller than the configured latency-monitor-threshold.

Blocking clients from modules

For a guide about blocking commands in modules, see https://redis.io/topics/modules-blocking-ops.

RedisModule_BlockClient

RedisModuleBlockedClient *RedisModule_BlockClient(RedisModuleCtx *ctx,
                                                  RedisModuleCmdFunc reply_callback,
                                                  RedisModuleCmdFunc timeout_callback,
                                                  void (*free_privdata)(RedisModuleCtx*, void*),
                                                  long long timeout_ms);

Available since: 4.0.0

Block a client in the context of a blocking command, returning a handle that is used later to unblock the client with a call to RedisModule_UnblockClient(). The arguments specify callback functions and a timeout after which the client is unblocked.

The callbacks are called in the following contexts:

reply_callback:   called after a successful RedisModule_UnblockClient()
                  call in order to reply to the client and unblock it.

timeout_callback: called when the timeout is reached or if CLIENT UNBLOCK
                  is invoked, in order to send an error to the client.

free_privdata:    called in order to free the private data that is passed
                  by RedisModule_UnblockClient() call.

Note: RedisModule_UnblockClient should be called for every blocked client, even if the client was killed, timed out, or disconnected. Failing to do so will result in memory leaks.

There are some cases where RedisModule_BlockClient() cannot be used:

  1. If the client is a Lua script.
  2. If the client is executing a MULTI block.

In these cases, a call to RedisModule_BlockClient() will not block the client, but instead produce a specific error reply.

If a module registers a timeout_callback function, its blocked client can also be unblocked using the CLIENT UNBLOCK command, which triggers the timeout callback. If no timeout callback is registered, the blocked client is treated as if it were not in a blocked state, and CLIENT UNBLOCK returns a zero value.

Measuring background time: By default, the time spent in the blocked command is not accounted for in the total command duration. To include such time, call RedisModule_BlockedClientMeasureTimeStart() and RedisModule_BlockedClientMeasureTimeEnd() one or more times within the blocking command's background work.

RedisModule_BlockClientOnKeys

RedisModuleBlockedClient *RedisModule_BlockClientOnKeys(RedisModuleCtx *ctx,
                                                        RedisModuleCmdFunc reply_callback,
                                                        RedisModuleCmdFunc timeout_callback,
                                                        void (*free_privdata)(RedisModuleCtx*, void*),
                                                        long long timeout_ms,
                                                        RedisModuleString **keys,
                                                        int numkeys,
                                                        void *privdata);

Available since: 6.0.0

This call is similar to RedisModule_BlockClient(), however in this case we don't just block the client, but also ask Redis to unblock it automatically once certain keys become "ready", that is, contain more data.

Basically this is similar to what a typical Redis command usually does, like BLPOP or BZPOPMAX: the client blocks if it cannot be served ASAP, and later when the key receives new data (a list push for instance), the client is unblocked and served.

However, in the case of this module API, when is the client unblocked?

  1. If you block on a key of a type that has blocking operations associated, like a list, a sorted set, a stream, and so forth, the client may be unblocked once the relevant key is targeted by an operation that normally unblocks the native blocking operations for that type. So if we block on a list key, an RPUSH command may unblock our client and so forth.
  2. If you are implementing your own native data type, or if you want to add new unblocking conditions in addition to "1", you can call the modules API RedisModule_SignalKeyAsReady().

However, we can't be sure that the client should be unblocked just because the key is signaled as ready: for instance, a successive operation may change the key, or a client queued before this one may be served first, modifying the key as well and making it empty again. So when a client is blocked with RedisModule_BlockClientOnKeys(), the reply callback is not called after RedisModule_UnblockClient() is called, but every time a key is signaled as ready: if the reply callback can serve the client, it returns REDISMODULE_OK and the client is unblocked; otherwise it returns REDISMODULE_ERR and Redis tries again later.

The reply callback can access the key that was signaled as ready by calling the API RedisModule_GetBlockedClientReadyKey(), that returns just the string name of the key as a RedisModuleString object.

Thanks to this system we can set up complex blocking scenarios, like unblocking a client only if a list contains at least 5 items, or other, fancier logic.

Note that another difference from RedisModule_BlockClient() is that here we pass the private data directly when blocking the client: it will be accessible later in the reply callback. Normally, when blocking with RedisModule_BlockClient(), the private data used to reply to the client is passed when calling RedisModule_UnblockClient(), but here the unblocking is performed by Redis itself, so we need to have some private data beforehand. The private data is used to store any information about the specific unblocking operation that you are implementing. Such information will be freed using the free_privdata callback provided by the user.

However the reply callback will be able to access the argument vector of the command, so the private data is often not needed.

Note: Under normal circumstances RedisModule_UnblockClient should not be called for clients that are blocked on keys (either the key will become ready or a timeout will occur). If for some reason you do want to call RedisModule_UnblockClient, it is possible: the client will be handled as if it had timed out (you must implement the timeout callback in that case).
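
The retry behavior described above can be simulated without Redis: every "ready" signal re-runs the reply callback of each still-blocked client, and only a callback that can actually serve the client (here, the "at least N items" example) unblocks it. ToyBlocked, toy_reply_callback and toy_signal_ready are hypothetical stand-ins for the blocked client, its reply callback and RedisModule_SignalKeyAsReady(); the toy does not model one client's reply modifying the key for the next.

```c
#include <stddef.h>

#define TOY_OK  0
#define TOY_ERR 1

/* Hypothetical blocked client waiting for a list to reach 'min_items'. */
typedef struct {
    size_t min_items;   /* private data supplied when blocking */
    int unblocked;
    size_t served_len;  /* what the "reply" reported */
} ToyBlocked;

/* Plays the role of the reply callback: it may run many times. */
static int toy_reply_callback(ToyBlocked *bc, size_t list_len) {
    if (list_len < bc->min_items) return TOY_ERR;  /* not yet: stay blocked */
    bc->served_len = list_len;
    return TOY_OK;
}

/* Plays the role of a "key ready" signal: retry every client that is
 * still blocked against the key's current state. */
void toy_signal_ready(ToyBlocked *clients, size_t n, size_t list_len) {
    for (size_t i = 0; i < n; i++) {
        if (clients[i].unblocked) continue;
        if (toy_reply_callback(&clients[i], list_len) == TOY_OK)
            clients[i].unblocked = 1;
    }
}
```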

RedisModule_SignalKeyAsReady

void RedisModule_SignalKeyAsReady(RedisModuleCtx *ctx, RedisModuleString *key);

Available since: 6.0.0

This function is used in order to potentially unblock a client blocked on keys with RedisModule_BlockClientOnKeys(). When this function is called, all the clients blocked for this key will get their reply_callback called.

Note: The function has no effect if the signaled key doesn't exist.

RedisModule_UnblockClient

int RedisModule_UnblockClient(RedisModuleBlockedClient *bc, void *privdata);

Available since: 4.0.0

Unblock a client blocked by RedisModule_BlockClient(). This triggers the reply callback to be called in order to reply to the client. The 'privdata' argument will be accessible by the reply callback, so the caller of this function can pass any value that is needed in order to actually reply to the client.

A common usage for 'privdata' is a thread that computes something that needs to be passed to the client, including but not limited to a slow-to-compute reply or a reply obtained via networking.

Note 1: this function can be called from threads spawned by the module.

Note 2: when we unblock a client that is blocked for keys using the API RedisModule_BlockClientOnKeys(), the privdata argument here is not used. Unblocking a client that was blocked for keys using this API will still require the client to get some reply, so the function will use the "timeout" handler in order to do so (The privdata provided in RedisModule_BlockClientOnKeys() is accessible from the timeout callback via RedisModule_GetBlockedClientPrivateData).

RedisModule_AbortBlock

int RedisModule_AbortBlock(RedisModuleBlockedClient *bc);

Available since: 4.0.0

Abort a blocked client blocking operation: the client will be unblocked without firing any callback.

RedisModule_SetDisconnectCallback

void RedisModule_SetDisconnectCallback(RedisModuleBlockedClient *bc,
                                       RedisModuleDisconnectFunc callback);

Available since: 5.0.0

Set a callback that will be called if a blocked client disconnects before the module has a chance to call RedisModule_UnblockClient().

Usually what you want to do there is to clean up your module state so that you can call RedisModule_UnblockClient() safely; otherwise the client will remain blocked forever if the timeout is large.

Notes:

  1. It is not safe to call Reply* family functions here; it is also useless since the client is gone.

  2. This callback is not called if the client disconnects because of a timeout. In such a case, the client is unblocked automatically and the timeout callback is called.

RedisModule_IsBlockedReplyRequest

int RedisModule_IsBlockedReplyRequest(RedisModuleCtx *ctx);

Available since: 4.0.0

Return non-zero if a module command was called in order to fill the reply for a blocked client.

RedisModule_IsBlockedTimeoutRequest

int RedisModule_IsBlockedTimeoutRequest(RedisModuleCtx *ctx);

Available since: 4.0.0

Return non-zero if a module command was called in order to fill the reply for a blocked client that timed out.

RedisModule_GetBlockedClientPrivateData

void *RedisModule_GetBlockedClientPrivateData(RedisModuleCtx *ctx);

Available since: 4.0.0

Get the private data set by RedisModule_UnblockClient().

RedisModule_GetBlockedClientReadyKey

RedisModuleString *RedisModule_GetBlockedClientReadyKey(RedisModuleCtx *ctx);

Available since: 6.0.0

Get the key that is ready when the reply callback is called in the context of a client blocked by RedisModule_BlockClientOnKeys().

RedisModule_GetBlockedClientHandle

RedisModuleBlockedClient *RedisModule_GetBlockedClientHandle(RedisModuleCtx *ctx);

Available since: 5.0.0

Get the blocked client associated with a given context. This is useful in the reply and timeout callbacks of blocked clients, because sometimes the module keeps references to the blocked client handle around and wants to clean them up.

RedisModule_BlockedClientDisconnected

int RedisModule_BlockedClientDisconnected(RedisModuleCtx *ctx);

Available since: 5.0.0

Return true if, when the free callback of a blocked client is called, the reason for the client being unblocked is that it disconnected while it was blocked.

Thread Safe Contexts

RedisModule_GetThreadSafeContext

RedisModuleCtx *RedisModule_GetThreadSafeContext(RedisModuleBlockedClient *bc);

Available since: 4.0.0

Return a context which can be used inside threads to make Redis context calls with certain modules APIs. If 'bc' is not NULL, the context will be bound to a blocked client, and it will be possible to use the RedisModule_Reply* family of functions to accumulate a reply for when the client is unblocked. Otherwise the thread safe context will be detached from any specific client.

To call non-reply APIs, the thread safe context must be prepared with:

RedisModule_ThreadSafeContextLock(ctx);
... make your call here ...
RedisModule_ThreadSafeContextUnlock(ctx);

This is not needed when using RedisModule_Reply* functions, assuming that a blocked client was used when the context was created, otherwise no RedisModule_Reply* call should be made at all.

NOTE: If you're creating a detached thread safe context (bc is NULL), consider using RedisModule_GetDetachedThreadSafeContext(), which also retains the module ID and is thus more useful for logging.

RedisModule_GetDetachedThreadSafeContext

RedisModuleCtx *RedisModule_GetDetachedThreadSafeContext(RedisModuleCtx *ctx);

Available since: 6.0.9

Return a detached thread safe context that is not associated with any specific blocked client, but is associated with the module's context.

This is useful for modules that wish to hold a global context over a long term, for purposes such as logging.

RedisModule_FreeThreadSafeContext

void RedisModule_FreeThreadSafeContext(RedisModuleCtx *ctx);

Available since: 4.0.0

Release a thread safe context.

RedisModule_ThreadSafeContextLock

void RedisModule_ThreadSafeContextLock(RedisModuleCtx *ctx);

Available since: 4.0.0

Acquire the server lock before executing a thread safe API call. This is not needed for RedisModule_Reply* calls when there is a blocked client connected to the thread safe context.

RedisModule_ThreadSafeContextTryLock

int RedisModule_ThreadSafeContextTryLock(RedisModuleCtx *ctx);

Available since: 6.0.8

Similar to RedisModule_ThreadSafeContextLock, but this function does not block if the server lock is already acquired.

If successful (lock acquired) REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set accordingly.
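
The OK/ERR-plus-errno contract maps naturally onto a non-blocking mutex acquire. The sketch below assumes the server lock behaves like a POSIX mutex, which is an assumption rather than something this section states; toy_server_lock, toy_try_lock and toy_unlock are hypothetical, and the wrapper copies the error reported by pthread_mutex_trylock() into errno (EBUSY when the lock is already held).

```c
#include <errno.h>
#include <pthread.h>

#define TOY_OK  0
#define TOY_ERR 1

/* Hypothetical stand-in for the server lock. */
static pthread_mutex_t toy_server_lock = PTHREAD_MUTEX_INITIALIZER;

/* Non-blocking acquire: TOY_OK on success; otherwise TOY_ERR with errno
 * set to the error pthread_mutex_trylock() returned. */
int toy_try_lock(void) {
    int rc = pthread_mutex_trylock(&toy_server_lock);
    if (rc != 0) {
        errno = rc;
        return TOY_ERR;
    }
    return TOY_OK;
}

void toy_unlock(void) { pthread_mutex_unlock(&toy_server_lock); }
```

A thread that gets an error can do other work and retry later instead of stalling, which is the usual reason to prefer the try variant over the blocking lock.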

RedisModule_ThreadSafeContextUnlock

void RedisModule_ThreadSafeContextUnlock(RedisModuleCtx *ctx);

Available since: 4.0.0

Release the server lock after a thread safe API call was executed.

Module Keyspace Notifications API

RedisModule_SubscribeToKeyspaceEvents

int RedisModule_SubscribeToKeyspaceEvents(RedisModuleCtx *ctx,
                                          int types,
                                          RedisModuleNotificationFunc callback);

Available since: 4.0.9

Subscribe to keyspace notifications. This is a low-level version of the keyspace-notifications API. A module can register callbacks to be notified when keyspace events occur.

Notification events are filtered by their type (string events, set events, etc), and the subscriber callback receives only events that match a specific mask of event types.

When subscribing to notifications with RedisModule_SubscribeToKeyspaceEvents the module must provide an event type-mask, denoting the events the subscriber is interested in. This can be an ORed mask of any of the following flags:

  • REDISMODULE_NOTIFY_GENERIC: Generic commands like DEL, EXPIRE, RENAME
  • REDISMODULE_NOTIFY_STRING: String events
  • REDISMODULE_NOTIFY_LIST: List events
  • REDISMODULE_NOTIFY_SET: Set events
  • REDISMODULE_NOTIFY_HASH: Hash events
  • REDISMODULE_NOTIFY_ZSET: Sorted Set events
  • REDISMODULE_NOTIFY_EXPIRED: Expiration events
  • REDISMODULE_NOTIFY_EVICTED: Eviction events
  • REDISMODULE_NOTIFY_STREAM: Stream events
  • REDISMODULE_NOTIFY_MODULE: Module types events
  • REDISMODULE_NOTIFY_KEYMISS: Key-miss events
  • REDISMODULE_NOTIFY_ALL: All events (Excluding REDISMODULE_NOTIFY_KEYMISS)
  • REDISMODULE_NOTIFY_LOADED: A special notification available only for modules; it indicates that the key was loaded from persistence. Note that when this event fires, the given key cannot be retained; use RedisModule_CreateStringFromString() to keep a copy instead.
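
Mask-based filtering works as a bitwise AND between the subscriber's mask and the event's type bit. The sketch below uses hypothetical TOY_NOTIFY_* bit values (real modules must use the REDISMODULE_NOTIFY_* constants from redismodule.h, whose values are not reproduced here), leaves out the module-only LOADED flag, and builds the "all" mask without key-miss events, as the list above describes. ToySubscriber and toy_notify are likewise hypothetical.

```c
#include <stddef.h>

/* Hypothetical bit values, for illustration only. */
enum {
    TOY_NOTIFY_GENERIC = 1 << 0,
    TOY_NOTIFY_STRING  = 1 << 1,
    TOY_NOTIFY_LIST    = 1 << 2,
    TOY_NOTIFY_SET     = 1 << 3,
    TOY_NOTIFY_HASH    = 1 << 4,
    TOY_NOTIFY_ZSET    = 1 << 5,
    TOY_NOTIFY_EXPIRED = 1 << 6,
    TOY_NOTIFY_EVICTED = 1 << 7,
    TOY_NOTIFY_STREAM  = 1 << 8,
    TOY_NOTIFY_MODULE  = 1 << 9,
    TOY_NOTIFY_KEYMISS = 1 << 10,
};

/* "All events" deliberately leaves out key-miss events. */
#define TOY_NOTIFY_ALL (TOY_NOTIFY_GENERIC | TOY_NOTIFY_STRING | TOY_NOTIFY_LIST | \
                        TOY_NOTIFY_SET | TOY_NOTIFY_HASH | TOY_NOTIFY_ZSET |       \
                        TOY_NOTIFY_EXPIRED | TOY_NOTIFY_EVICTED |                  \
                        TOY_NOTIFY_STREAM | TOY_NOTIFY_MODULE)

typedef struct { int mask; int calls; } ToySubscriber;

/* Deliver an event with type bit 'type' to every subscriber whose mask
 * includes that bit. */
void toy_notify(ToySubscriber *subs, size_t n, int type) {
    for (size_t i = 0; i < n; i++)
        if (subs[i].mask & type) subs[i].calls++;
}
```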

We do not distinguish between key events and keyspace events, and it is up to the module to filter the actions taken based on the key.

The subscriber signature is:

int (*RedisModuleNotificationFunc) (RedisModuleCtx *ctx, int type,
                                    const char *event,
                                    RedisModuleString *key);

type is the event type bit, which must match the mask given at registration time. The event string is the actual command being executed, and key is the relevant Redis key.
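The filtering contract can be simulated without a server. In the standalone sketch below, the EV_* constants are hypothetical stand-ins for the REDISMODULE_NOTIFY_* flags (with made-up values), deliver_event only mimics the rule that a subscriber is invoked for events whose type bit is in its mask, and the callback then filters by key prefix, since that part is left to the module.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for REDISMODULE_NOTIFY_* flags; real modules
 * must use the macros from redismodule.h. */
enum {
    EV_GENERIC = 1 << 0,
    EV_STRING  = 1 << 1,
    EV_LIST    = 1 << 2,
    EV_HASH    = 1 << 3
};

typedef int (*notify_fn)(int type, const char *event, const char *key);

/* Counts how many events the callback below actually acted on. */
static int handled = 0;

/* Subscriber callback: the type already matched the mask, so only the
 * key is filtered here (keys starting with "sensor:"). */
static int on_event(int type, const char *event, const char *key) {
    (void)type;
    if (strncmp(key, "sensor:", 7) != 0) return 0;
    printf("%s on %s\n", event, key);
    handled++;
    return 0;
}

/* Mimics the server side: call the subscriber only if the event's type
 * bit is part of the mask given at subscription time. */
static void deliver_event(int mask, notify_fn cb, int type,
                          const char *event, const char *key) {
    if (type & mask) cb(type, event, key);
}
```

In a real module the callback would use the RedisModuleNotificationFunc signature shown above and be registered with something like RedisModule_SubscribeToKeyspaceEvents(ctx, REDISMODULE_NOTIFY_STRING | REDISMODULE_NOTIFY_LIST, callback), typically from RedisModule_OnLoad.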

The notification callback is executed with a Redis context that cannot be used to send anything to the client; the context's selected database is the one where the event occurred.

Notice that it is not necessary to enable notifications in redis.conf for module notifications to work.

Warning: notification callbacks are performed synchronously, so they must be fast, or they will slow Redis down. If you need to perform long-running actions, use threads to offload them.

See https://redis.io/topics/notifications for more information.

RedisModule_GetNotifyKeyspaceEvents

int RedisModule_GetNotifyKeyspaceEvents();

Available since: 6.0.0

Get the configured bitmap of notify-keyspace-events (this could be used for additional filtering in RedisModuleNotificationFunc).

RedisModule_NotifyKeyspaceEvent

int RedisModule_NotifyKeyspaceEvent(RedisModuleCtx *ctx,
                                    int type,
                                    const char *event,
                                    RedisModuleString *key);

Available since: 6.0.0

Expose notifyKeyspaceEvent to modules

Modules Cluster API

RedisModule_RegisterClusterMessageReceiver

void RedisModule_RegisterClusterMessageReceiver(RedisModuleCtx *ctx,
                                                uint8_t type,
                                                RedisModuleClusterMessageReceiver callback);

Available since: 5.0.0

Register a callback receiver for cluster messages of type 'type'. If a callback is already registered for that type, it is replaced with the one provided. If the callback passed is NULL and a callback is already registered, the existing callback is unregistered (so this API call is also used to delete the receiver).
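The replace and unregister semantics can be modeled with a table indexed by message type. This is a hypothetical, standalone sketch: receiver_fn, register_receiver and dispatch are illustrative names, the callback signature is simplified compared to RedisModuleClusterMessageReceiver, and it does not claim to mirror how Redis stores receivers internally.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified receiver signature, for this sketch only. */
typedef void (*receiver_fn)(uint8_t type, const char *payload, uint32_t len);

/* One slot per possible message type (uint8_t gives 256 types). */
static receiver_fn receivers[256];

/* Register, replace, or (with NULL) unregister the receiver for 'type'. */
static void register_receiver(uint8_t type, receiver_fn cb) {
    receivers[type] = cb;
}

/* Route an incoming message; returns 1 if a receiver handled it. */
static int dispatch(uint8_t type, const char *payload, uint32_t len) {
    if (receivers[type] == NULL) return 0;
    receivers[type](type, payload, len);
    return 1;
}

static int pings = 0;

static void on_ping(uint8_t type, const char *payload, uint32_t len) {
    (void)type; (void)payload; (void)len;
    pings++;
}
```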

RedisModule_SendClusterMessage

int RedisModule_SendClusterMessage(RedisModuleCtx *ctx,
                                   const char *target_id,
                                   uint8_t type,
                                   const char *msg,
                                   uint32_t len);

Available since: 5.0.0

Send a message to all the nodes in the cluster if target_id is NULL, otherwise to the specified target, which is a REDISMODULE_NODE_ID_LEN bytes node ID, as returned by the receiver callback or by the node iteration functions.

The function returns REDISMODULE_OK if the message was successfully sent, otherwise if the node is not connected or such node ID does not map to any known cluster node, REDISMODULE_ERR is returned.

RedisModule_GetClusterNodesList

char **RedisModule_GetClusterNodesList(RedisModuleCtx *ctx, size_t *numnodes);

Available since: 5.0.0

Return an array of string pointers, each pointing to a cluster node ID of exactly REDISMODULE_NODE_ID_LEN bytes (without any null terminator). The number of returned node IDs is stored into *numnodes. However, if this function is called by a module not running on a Redis instance with Redis Cluster enabled, NULL is returned instead.

The IDs returned can be used with RedisModule_GetClusterNodeInfo() in order to get more information about a single node.

The array returned by this function must be freed using the function RedisModule_FreeClusterNodesList().

Example:

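size_t count, j;
char **ids = RedisModule_GetClusterNodesList(ctx,&count);
if (ids != NULL) { /* NULL when cluster mode is disabled */
    for (j = 0; j < count; j++) {
        /* IDs are not null-terminated: print exactly NODE_ID_LEN bytes. */
        RedisModule_Log(ctx,"notice","Node %.*s",
            REDISMODULE_NODE_ID_LEN,ids[j]);
    }
    RedisModule_FreeClusterNodesList(ids);
}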
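Because the node IDs returned by RedisModule_GetClusterNodesList are exactly REDISMODULE_NODE_ID_LEN bytes with no null terminator, code that wants to keep one as a C string, or compare two IDs, has to work with explicit lengths. A minimal standalone sketch, assuming REDISMODULE_NODE_ID_LEN is 40; NODE_ID_LEN and the helper names are hypothetical:

```c
#include <string.h>

/* Assumed to equal REDISMODULE_NODE_ID_LEN; real modules should use the
 * macro from redismodule.h instead. */
#define NODE_ID_LEN 40

/* Copy a fixed-length, non-null-terminated node ID into 'out', which must
 * hold NODE_ID_LEN + 1 bytes, and terminate it. */
static void node_id_to_cstr(const char *id, char out[NODE_ID_LEN + 1]) {
    memcpy(out, id, NODE_ID_LEN);
    out[NODE_ID_LEN] = '\0';
}

/* Compare two raw node IDs byte by byte; nonzero if they are equal. */
static int node_id_equal(const char *a, const char *b) {
    return memcmp(a, b, NODE_ID_LEN) == 0;
}
```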

RedisModule_FreeClusterNodesList

void RedisModule_FreeClusterNodesList(char **ids);

Available since: 5.0.0

Free the node list obtained with RedisModule_GetClusterNodesList.

RedisModule_GetMyClusterID

const char *RedisModule_GetMyClusterID(void);

Available since: 5.0.0

Return this node's ID (REDISMODULE_NODE_ID_LEN bytes) or NULL if the cluster is disabled.

RedisModule_GetClusterSize

size_t RedisModule_GetClusterSize(void);

Available since: 5.0.0

Return the number of nodes in the cluster, regardless of their state (handshake, noaddress, ...) so that the number of active nodes may actually be smaller, but not greater than this number. If the instance is not in cluster mode, zero is returned.

RedisModule_GetClusterNodeInfo

int RedisModule_GetClusterNodeInfo(RedisModuleCtx *ctx,
                                   const char *id,
                                   char *ip,
                                   char *master_id,
                                   int *port,
                                   int *flags);

Available since: 5.0.0

Populate the specified info for the node whose ID is 'id' and return REDISMODULE_OK. If the node ID is malformed or does not exist from the point of view of this local node, REDISMODULE_ERR is returned instead.

The arguments ip, master_id, port and flags can be NULL if the caller does not need that information. If ip and master_id (the latter only populated if the instance is a slave) are specified, they must point to buffers holding at least REDISMODULE_NODE_ID_LEN bytes. The strings written back as ip and master_id are not null terminated.

The list of flags reported is the following:

  • REDISMODULE_NODE_MYSELF: This node
  • REDISMODULE_NODE_MASTER: The node is a master
  • REDISMODULE_NODE_SLAVE: The node is a replica
  • REDISMODULE_NODE_PFAIL: We see the node as failing
  • REDISMODULE_NODE_FAIL: The cluster agrees the node is failing
  • REDISMODULE_NODE_NOFAILOVER: The slave is configured to never failover
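Since flags is a bitmask, callers typically test individual bits. The standalone sketch below uses hypothetical NODE_* constants with made-up values in place of the REDISMODULE_NODE_* macros (real code must use the header's macros) to show one way of summarizing a node's state; treating FAIL as overriding PFAIL follows the descriptions in the list above. health_of and node_is_replica are illustrative names.

```c
/* Hypothetical stand-ins for the REDISMODULE_NODE_* flags. */
enum {
    NODE_MYSELF     = 1 << 0,
    NODE_MASTER     = 1 << 1,
    NODE_SLAVE      = 1 << 2,
    NODE_PFAIL      = 1 << 3,
    NODE_FAIL       = 1 << 4,
    NODE_NOFAILOVER = 1 << 5
};

enum node_health { HEALTH_OK, HEALTH_SUSPECTED, HEALTH_FAILED };

/* FAIL (the cluster agrees) takes precedence over PFAIL (only this node
 * sees the other node as failing). */
static enum node_health health_of(int flags) {
    if (flags & NODE_FAIL) return HEALTH_FAILED;
    if (flags & NODE_PFAIL) return HEALTH_SUSPECTED;
    return HEALTH_OK;
}

/* Nonzero if the node is a replica, regardless of other bits. */
static int node_is_replica(int flags) {
    return (flags & NODE_SLAVE) != 0;
}
```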

RedisModule_SetClusterFlags

void RedisModule_SetClusterFlags(RedisModuleCtx *ctx, uint64_t flags);

Available since: 5.0.0

Set Redis Cluster flags in order to change the normal behavior of Redis Cluster, especially with the goal of disabling certain functions. This is useful for modules that use the Cluster API in order to create a different distributed system, but still want to use the Redis Cluster message bus. Flags that can be set:

  • CLUSTER_MODULE_FLAG_NO_FAILOVER
  • CLUSTER_MODULE_FLAG_NO_REDIRECTION

With the following effects:

  • NO_FAILOVER: prevent Redis Cluster slaves from failing over a dead master. Also disables the replica migration feature.

  • NO_REDIRECTION: Every node will accept any key, without trying to perform partitioning according to the Redis Cluster algorithm. Slots information will still be propagated across the cluster, but without effect.

Modules Timers API

Module timers are a high-precision "green timers" abstraction: every module can register even millions of timers without problems, even though the actual event loop has just a single timer, used to wake up the module timers subsystem in order to process the next event.

All timers are stored in a radix tree ordered by expire time. When the main Redis event loop timer callback is called, all the timers that have already expired are processed one after the other. Then the event loop is re-entered, registering a timer that will expire when the next module timer to process is due.

Every time the list of active timers drops to zero, we unregister the main event loop timer, so that there is no overhead when such feature is not used.
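The processing loop described above can be sketched with a sorted array standing in for the radix tree. This is a standalone illustration under that simplifying assumption (pending_timer and process_timers are hypothetical names), not the Redis implementation: it fires everything already expired, in expiry order, and reports how long the single event-loop timer should wait for the next one, or -1 when it could be unregistered.

```c
#include <stddef.h>

/* A pending timer: absolute expire time (ms) and an ID. */
typedef struct {
    long long expire_ms;
    int id;
} pending_timer;

/* Fire every timer in 'timers' (sorted by expire_ms, ascending) that has
 * expired at 'now_ms'. Fired IDs are written to 'fired' (*nfired counts
 * them) and removed from the front of the array. Returns the delay in ms
 * until the next timer, or -1 when none remain. */
static long long process_timers(pending_timer *timers, size_t *count,
                                long long now_ms, int *fired, size_t *nfired) {
    size_t i = 0;
    *nfired = 0;
    while (i < *count && timers[i].expire_ms <= now_ms) {
        fired[(*nfired)++] = timers[i].id;
        i++;
    }
    /* Shift the remaining (still pending) timers to the front. */
    for (size_t j = i; j < *count; j++) timers[j - i] = timers[j];
    *count -= i;
    if (*count == 0) return -1; /* no timers: main loop timer not needed */
    return timers[0].expire_ms - now_ms;
}
```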

RedisModule_CreateTimer

RedisModuleTimerID RedisModule_CreateTimer(RedisModuleCtx *ctx,
                                           mstime_t period,
                                           RedisModuleTimerProc callback,
                                           void *data);

Available since: 5.0.0

Create a new timer that will fire after period milliseconds and call the specified function with data as argument. The returned timer ID can be used to get information from the timer or to stop it before it fires. Note that for the common use case of a repeating timer (re-registering the timer inside the RedisModuleTimerProc callback), it matters when this API is called: if it is called at the beginning of 'callback', the event will be triggered every 'period' milliseconds; if it is called at the end of 'callback', there will be gaps of 'period' milliseconds between events. (If the time it takes to execute 'callback' is negligible, the two cases are equivalent.)
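To make the difference concrete: with a 100 ms period and a callback that takes 10 ms, re-arming at the start fires at 100, 200, 300 ms, while re-arming at the end fires at 100, 210, 320 ms. The small standalone simulation below reproduces those numbers; nth_fire_time is a hypothetical helper, and it assumes the callback is shorter than the period and ignores event-loop jitter.

```c
/* Time (ms after timer creation) at which the k-th event (k >= 1) fires
 * for a self-re-arming timer, when each callback run takes 'cost' ms.
 * rearm_at_end == 0: the timer is re-created at the start of the callback.
 * rearm_at_end != 0: the timer is re-created at the end of the callback. */
static long long nth_fire_time(int k, long long period, long long cost,
                               int rearm_at_end) {
    long long fire = period; /* first event: 'period' ms after creation */
    for (int i = 1; i < k; i++) {
        /* The next timer is created either right away or after the work. */
        long long created = rearm_at_end ? fire + cost : fire;
        fire = created + period;
    }
    return fire;
}
```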

RedisModule_StopTimer

int RedisModule_StopTimer(RedisModuleCtx *ctx,
                          RedisModuleTimerID id,
                          void **data);

Available since: 5.0.0

Stop a timer, returns REDISMODULE_OK if the timer was found, belonged to the calling module, and was stopped, otherwise REDISMODULE_ERR is returned. If not NULL, the data pointer is set to the value of the data argument when the timer was created.

RedisModule_GetTimerInfo

int RedisModule_GetTimerInfo(RedisModuleCtx *ctx,
                             RedisModuleTimerID id,
                             uint64_t *remaining,
                             void **data);

Available since: 5.0.0

Obtain information about a timer: its remaining time before firing (in milliseconds), and the private data pointer associated with the timer. If the timer specified does not exist or belongs to a different module no information is returned and the function returns REDISMODULE_ERR, otherwise REDISMODULE_OK is returned. The arguments remaining or data can be NULL if the caller does not need certain information.

Modules EventLoop API

RedisModule_EventLoopAdd

int RedisModule_EventLoopAdd(int fd,
                             int mask,
                             RedisModuleEventLoopFunc func,
                             void *user_data);

Available since: 7.0.0

Add a pipe / socket event to the event loop.

  • mask must be one of the following values:

    • REDISMODULE_EVENTLOOP_READABLE
    • REDISMODULE_EVENTLOOP_WRITABLE
    • REDISMODULE_EVENTLOOP_READABLE | REDISMODULE_EVENTLOOP_WRITABLE

On success REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set to the following values:

  • ERANGE: fd is negative or greater than the maxclients Redis config.
  • EINVAL: callback is NULL or mask value is invalid.

errno might take other values in case of an internal error.

Example:

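#include <stdio.h>
#include <unistd.h>

void onReadable(int fd, void *user_data, int mask) {
    char buf[32];
    ssize_t bytes = read(fd,buf,sizeof(buf));
    if (bytes == -1) return; /* e.g. EAGAIN on a non-blocking fd */
    printf("Read %zd bytes\n", bytes);
}
RedisModule_EventLoopAdd(fd, REDISMODULE_EVENTLOOP_READABLE, onReadable, NULL);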
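The fd passed to RedisModule_EventLoopAdd is often one end of a pipe, for example one written to by a worker thread. The standalone POSIX sketch below (no Redis involved; pipe_roundtrip is a hypothetical name) shows what "readable" means for such an fd: once data is written to the write end, poll() reports POLLIN on the read end and read() returns the data. That is the situation in which an onReadable callback like the one above would be invoked.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <unistd.h>

/* Create a pipe, write 'msg' to its write end, wait until the read end is
 * readable, and read it back. Returns the number of bytes read, or -1 on
 * error. In a module, fds[0] is the kind of fd that would be registered
 * with REDISMODULE_EVENTLOOP_READABLE. */
static ssize_t pipe_roundtrip(const char *msg, size_t len) {
    int fds[2];
    if (pipe(fds) == -1) return -1;

    ssize_t nread = -1;
    if (write(fds[1], msg, len) == (ssize_t)len) {
        struct pollfd pfd = { .fd = fds[0], .events = POLLIN };
        /* Wait up to one second for the read end to become readable. */
        if (poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN)) {
            char buf[32];
            nread = read(fds[0], buf, sizeof(buf));
        }
    }
    close(fds[0]);
    close(fds[1]);
    return nread;
}
```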

RedisModule_EventLoopDel

int RedisModule_EventLoopDel(int fd, int mask);

Available since: 7.0.0

Delete a pipe / socket event from the event loop.

  • mask must be one of the following values:

    • REDISMODULE_EVENTLOOP_READABLE
    • REDISMODULE_EVENTLOOP_WRITABLE
    • REDISMODULE_EVENTLOOP_READABLE | REDISMODULE_EVENTLOOP_WRITABLE

On success REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set to the following values:

  • ERANGE: fd is negative or greater than the maxclients Redis config.
  • EINVAL: mask value is invalid.

RedisModule_EventLoopAddOneShot

int RedisModule_EventLoopAddOneShot(RedisModuleEventLoopOneShotFunc func,
                                    void *user_data);

Available since: 7.0.0

This function can be called from other threads to trigger a callback on the Redis main thread. On success REDISMODULE_OK is returned. If func is NULL, REDISMODULE_ERR is returned and errno is set to EINVAL.

Modules ACL API

Implements a hook into the authentication and authorization within Redis.

RedisModule_CreateModuleUser

RedisModuleUser *RedisModule_CreateModuleUser(const char *name);

Available since: 6.0.0

Creates a Redis ACL user that the module can use to authenticate a client. After obtaining the user, the module should set what such user can do using the RedisModule_SetModuleUserACL() function. Once configured, the user can be used in order to authenticate a connection, with the specified ACL rules, using the RedisModule_AuthenticateClientWithUser() function.

Note that:

  • Users created here are not listed by the ACL command.
  • Users created here are not checked for duplicated name, so it's up to the module calling this function to take care of not creating users with the same name.
  • The created user can be used to authenticate multiple Redis connections.

The caller can later free the user using the function RedisModule_FreeModuleUser(). When this function is called, if there are still clients authenticated with this user, they are disconnected. The function to free the user should only be used when the caller really wants to invalidate the user to define a new one with different capabilities.

RedisModule_FreeModuleUser

int RedisModule_FreeModuleUser(RedisModuleUser *user);

Available since: 6.0.0

Frees a given user and disconnects all of the clients that have been authenticated with it. See RedisModule_CreateModuleUser for detailed usage.

RedisModule_SetModuleUserACL

int RedisModule_SetModuleUserACL(RedisModuleUser *user, const char* acl);

Available since: 6.0.0

Sets the permissions of a user created through the Redis module interface. The syntax is the same as ACL SETUSER, so refer to the documentation in acl.c for more information. See RedisModule_CreateModuleUser for detailed usage.

Returns REDISMODULE_OK on success and REDISMODULE_ERR on failure, in which case errno is set to describe why the operation failed.

RedisModule_GetCurrentUserName

RedisModuleString *RedisModule_GetCurrentUserName(RedisModuleCtx *ctx);

Available since: 7.0.0

Retrieve the user name of the client connection behind the current context. The user name can be used later, in order to get a RedisModuleUser. See more information in RedisModule_GetModuleUserFromUserName.

The returned string must be released with RedisModule_FreeString() or by enabling automatic memory management.

RedisModule_GetModuleUserFromUserName

RedisModuleUser *RedisModule_GetModuleUserFromUserName(RedisModuleString *name);

Available since: 7.0.0

A RedisModuleUser can be used to check whether a command, key, or channel can be executed or accessed according to the ACL rules associated with that user. When a module wants to do ACL checks on a general ACL user (not created by RedisModule_CreateModuleUser), it can get the RedisModuleUser from this API, based on the user name retrieved by RedisModule_GetCurrentUserName.

Since a general ACL user can be deleted at any time, this RedisModuleUser should be used only in the context where this function was called. In order to do ACL checks outside of that context, the module can store the user name and call this API again in any other context.

Returns NULL if the user is disabled or the user does not exist. The caller should later free the user using the function RedisModule_FreeModuleUser().

RedisModule_ACLCheckCommandPermissions

int RedisModule_ACLCheckCommandPermissions(RedisModuleUser *user,
                                           RedisModuleString **argv,
                                           int argc);

Available since: 7.0.0

Checks if the command can be executed by the user, according to the ACLs associated with it.

On success REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set to the following values:

  • ENOENT: Specified command does not exist.
  • EACCES: Command cannot be executed, according to ACL rules

RedisModule_ACLCheckKeyPermissions

int RedisModule_ACLCheckKeyPermissions(RedisModuleUser *user,
                                       RedisModuleString *key,
                                       int flags);

Available since: 7.0.0

Check if the key can be accessed by the user according to the ACLs attached to the user and the flags representing the key access. The flags are the same that are used in the keyspec for logical operations. These flags are documented in RedisModule_SetCommandInfo as the REDISMODULE_CMD_KEY_ACCESS, REDISMODULE_CMD_KEY_UPDATE, REDISMODULE_CMD_KEY_INSERT, and REDISMODULE_CMD_KEY_DELETE flags.

If no flags are supplied, the user is still required to have some access to the key for this command to return successfully.

If the user is able to access the key then REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set to one of the following values:

  • EINVAL: The provided flags are invalid.
  • EACCES: The user does not have permission to access the key.

RedisModule_ACLCheckChannelPermissions

int RedisModule_ACLCheckChannelPermissions(RedisModuleUser *user,
                                           RedisModuleString *ch,
                                           int flags);

Available since: 7.0.0

Check if the pubsub channel can be accessed by the user based on the given access flags. See RedisModule_ChannelAtPosWithFlags for more information about the possible flags that can be passed in.

If the user is able to access the pubsub channel then REDISMODULE_OK is returned, otherwise REDISMODULE_ERR is returned and errno is set to one of the following values:

  • EINVAL: The provided flags are invalid.
  • EACCES: The user does not have permission to access the pubsub channel.
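The three ACL check functions above report the failure reason through errno, and a module may want to turn that into a decision, such as which error to reply with. A hypothetical, standalone sketch (acl_fail_kind and classify_acl_failure are illustrative names); note that the <errno.h> constant is spelled EACCES:

```c
#include <errno.h>

/* Why an ACL check failed, per the errno values documented for the
 * RedisModule_ACLCheck*Permissions functions. */
typedef enum {
    ACL_FAIL_DENIED,     /* EACCES: the ACL rules forbid the access */
    ACL_FAIL_NO_COMMAND, /* ENOENT: the command does not exist */
    ACL_FAIL_BAD_FLAGS,  /* EINVAL: invalid flags were passed */
    ACL_FAIL_OTHER       /* anything else */
} acl_fail_kind;

/* Call right after a check returned REDISMODULE_ERR, before any other
 * call can overwrite errno. */
static acl_fail_kind classify_acl_failure(int err) {
    switch (err) {
    case EACCES: return ACL_FAIL_DENIED;
    case ENOENT: return ACL_FAIL_NO_COMMAND;
    case EINVAL: return ACL_FAIL_BAD_FLAGS;
    default:     return ACL_FAIL_OTHER;
    }
}
```

In a module this would be used as classify_acl_failure(errno) immediately after, for example, RedisModule_ACLCheckKeyPermissions(user, key, flags) returns REDISMODULE_ERR.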

RedisModule_ACLAddLogEntry

int RedisModule_ACLAddLogEntry(RedisModuleCtx *ctx,
                               RedisModuleUser *user,
                               RedisModuleString *object,
                               RedisModuleACLLogEntryReason reason);

Available since: 7.0.0

Adds a new entry in the ACL log. Returns REDISMODULE_OK on success and REDISMODULE_ERR on error.

For more information about ACL log, please refer to https://redis.io/commands/acl-log

RedisModule_AuthenticateClientWithUser

int RedisModule_AuthenticateClientWithUser(RedisModuleCtx *ctx,
                                           RedisModuleUser *module_user,
                                           RedisModuleUserChangedFunc callback,
                                           void *privdata,
                                           uint64_t *client_id);

Available since: 6.0.0

Authenticate the current context's user with the provided Redis ACL user. Returns REDISMODULE_ERR if the user is disabled.

See authenticateClientWithUser for information about callback, client_id, and general usage for authentication.

RedisModule_AuthenticateClientWithACLUser

int RedisModule_AuthenticateClientWithACLUser(RedisModuleCtx *ctx,
                                              const char *name,
                                              size_t len,
                                              RedisModuleUserChangedFunc callback,
                                              void *privdata,
                                              uint64_t *client_id);

Available since: 6.0.0

Authenticate the current context's user with the provided Redis ACL user. Returns REDISMODULE_ERR if the user is disabled or the user does not exist.

See authenticateClientWithUser for information about callback, client_id, and general usage for authentication.

RedisModule_DeauthenticateAndCloseClient

int RedisModule_DeauthenticateAndCloseClient(RedisModuleCtx *ctx,
                                             uint64_t client_id);

Available since: 6.0.0

Deauthenticate and close the client. The client resources will not be immediately freed, but will be cleaned up in a background job. This is the recommended way to deauthenticate a client, since most clients can't handle users becoming deauthenticated. Returns REDISMODULE_ERR when the client doesn't exist and REDISMODULE_OK when the operation was successful.

The client ID is returned by the RedisModule_AuthenticateClientWithUser and RedisModule_AuthenticateClientWithACLUser APIs, but it can also be obtained through the CLIENT API or through server events.

This function is not thread safe, and must be executed within the context of a command or thread safe context.

RedisModule_RedactClientCommandArgument

int RedisModule_RedactClientCommandArgument(RedisModuleCtx *ctx, int pos);

Available since: 7.0.0

Redact the client command argument specified at the given position. Redacted arguments are obfuscated in user facing commands such as SLOWLOG or MONITOR, as well as never being written to server logs. This command may be called multiple times on the same position.

Note that the command name, at position 0, cannot be redacted.

Returns REDISMODULE_OK if the argument was redacted and REDISMODULE_ERR if there was an invalid parameter passed in or the position is outside the client argument range.

RedisModule_GetClientCertificate

RedisModuleString *RedisModule_GetClientCertificate(RedisModuleCtx *ctx,
                                                    uint64_t client_id);

Available since: 6.0.9

Return the X.509 client-side certificate used by the client to authenticate this connection.

The return value is an allocated RedisModuleString containing an X.509 certificate encoded in PEM (Base64) format. It should be freed (or auto-freed) by the caller.

A NULL value is returned in the following conditions:

  • Connection ID does not exist
  • Connection is not a TLS connection
  • Connection is a TLS connection but no client certificate was used

Modules Dictionary API

Implements a sorted dictionary (actually backed by a radix tree) with the usual get / set / del / num-items API, together with an iterator capable of going back and forth.

RedisModule_CreateDict

RedisModuleDict *RedisModule_CreateDict(RedisModuleCtx *ctx);

Available since: 5.0.0

Create a new dictionary. The 'ctx' pointer can be the current module context or NULL, depending on what you want. Follow these rules:

  1. Use a NULL context if you plan to retain a reference to this dictionary that will survive the time of the module callback where you created it.
  2. Use a NULL context if no context is available at the time you are creating the dictionary (of course...).
  3. However, use the current callback context as the 'ctx' argument if the dictionary's lifetime is limited to the callback scope. In this case, if enabled, you can enjoy the automatic memory management that will reclaim the dictionary memory, as well as the strings returned by the Next / Prev dictionary iterator calls.

RedisModule_FreeDict

void RedisModule_FreeDict(RedisModuleCtx *ctx, RedisModuleDict *d);

Available since: 5.0.0

Free a dictionary created with RedisModule_CreateDict(). You need to pass the context pointer 'ctx' only if the dictionary was created using the context instead of passing NULL.

RedisModule_DictSize

uint64_t RedisModule_DictSize(RedisModuleDict *d);

Available since: 5.0.0

Return the size of the dictionary (number of keys).

RedisModule_DictSetC

int RedisModule_DictSetC(RedisModuleDict *d,
                         void *key,
                         size_t keylen,
                         void *ptr);

Available since: 5.0.0

Store the specified key in the dictionary, setting its value to the pointer 'ptr'. If the key did not already exist and was added successfully, REDISMODULE_OK is returned. If the key already exists, the function returns REDISMODULE_ERR.

RedisModule_DictReplaceC

int RedisModule_DictReplaceC(RedisModuleDict *d,
                             void *key,
                             size_t keylen,
                             void *ptr);

Available since: 5.0.0

Like RedisModule_DictSetC() but will replace the key with the new value if the key already exists.

RedisModule_DictSet

int RedisModule_DictSet(RedisModuleDict *d, RedisModuleString *key, void *ptr);

Available since: 5.0.0

Like RedisModule_DictSetC() but takes the key as a RedisModuleString.

RedisModule_DictReplace

int RedisModule_DictReplace(RedisModuleDict *d,
                            RedisModuleString *key,
                            void *ptr);

Available since: 5.0.0

Like RedisModule_DictReplaceC() but takes the key as a RedisModuleString.

RedisModule_DictGetC

void *RedisModule_DictGetC(RedisModuleDict *d,
                           void *key,
                           size_t keylen,
                           int *nokey);

Available since: 5.0.0

Return the value stored at the specified key. The function returns NULL both in the case the key does not exist, or if you actually stored NULL at key. So, optionally, if the 'nokey' pointer is not NULL, it will be set by reference to 1 if the key does not exist, or to 0 if the key exists.

RedisModule_DictGet

void *RedisModule_DictGet(RedisModuleDict *d,
                          RedisModuleString *key,
                          int *nokey);

Available since: 5.0.0

Like RedisModule_DictGetC() but takes the key as a RedisModuleString.

RedisModule_DictDelC

int RedisModule_DictDelC(RedisModuleDict *d,
                         void *key,
                         size_t keylen,
                         void *oldval);

Available since: 5.0.0

Remove the specified key from the dictionary, returning REDISMODULE_OK if the key was found and deleted, or REDISMODULE_ERR if instead there was no such key in the dictionary. When the operation is successful, if 'oldval' is not NULL, then '*oldval' is set to the value stored at the key before it was deleted. Using this feature it is possible to get a pointer to the value (for instance in order to release it), without having to call RedisModule_DictGet() before deleting the key.

RedisModule_DictDel

int RedisModule_DictDel(RedisModuleDict *d,
                        RedisModuleString *key,
                        void *oldval);

Available since: 5.0.0

Like RedisModule_DictDelC() but gets the key as a RedisModuleString.

RedisModule_DictIteratorStartC

RedisModuleDictIter *RedisModule_DictIteratorStartC(RedisModuleDict *d,
                                                    const char *op,
                                                    void *key,
                                                    size_t keylen);

Available since: 5.0.0

Return an iterator, set up to start iterating from the specified key by applying the operator 'op', which is just a string specifying the comparison operator to use in order to seek the first element. The operators available are:

  • ^ – Seek the first (lexicographically smallest) key.
  • $ – Seek the last (lexicographically largest) key.
  • > – Seek the first element greater than the specified key.
  • >= – Seek the first element greater than or equal to the specified key.
  • < – Seek the first element smaller than the specified key.
  • <= – Seek the first element smaller than or equal to the specified key.
  • == – Seek the first element matching exactly the specified key.

Note that for ^ and $ the passed key is not used, and the user may just pass NULL with a length of 0.

If the element to start the iteration cannot be seeked based on the key and operator passed, RedisModule_DictNext() / Prev() will just return REDISMODULE_ERR at the first call, otherwise they'll produce elements.
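The seek operators can be illustrated without the radix tree. The standalone sketch below (dict_key, key_cmp and seek are hypothetical names) applies them to a sorted array with a linear scan, assuming binary-safe byte-wise ordering and interpreting < and <= as the closest key below the given one, that is, where a backward iteration would start. The real dictionary seeks in the radix tree instead of scanning; a return of -1 plays the role of the first DictNext / Prev call failing.

```c
#include <stddef.h>
#include <string.h>

/* A binary-safe key: bytes plus length (no terminator required). */
typedef struct { const char *ptr; size_t len; } dict_key;

/* Byte-wise order: compare the common prefix, then shorter sorts first. */
static int key_cmp(const char *a, size_t alen, const char *b, size_t blen) {
    size_t min = alen < blen ? alen : blen;
    int c = memcmp(a, b, min);
    if (c != 0) return c;
    return (alen > blen) - (alen < blen);
}

/* Index of the element an iterator would start from in 'keys' (sorted
 * ascending by key_cmp), or -1 if no element can be sought. */
static long seek(const dict_key *keys, size_t n, const char *op,
                 const char *key, size_t keylen) {
    if (n == 0) return -1;
    if (strcmp(op, "^") == 0) return 0;            /* key is ignored */
    if (strcmp(op, "$") == 0) return (long)n - 1;  /* key is ignored */
    if (op[0] == '>') {
        int inclusive = op[1] == '=';
        /* First key greater than (or equal to) the given key. */
        for (size_t i = 0; i < n; i++) {
            int c = key_cmp(keys[i].ptr, keys[i].len, key, keylen);
            if (c > 0 || (inclusive && c == 0)) return (long)i;
        }
        return -1;
    }
    if (op[0] == '<') {
        int inclusive = op[1] == '=';
        /* Closest key below (or equal to) the given key, scanning back. */
        for (size_t i = n; i-- > 0;) {
            int c = key_cmp(keys[i].ptr, keys[i].len, key, keylen);
            if (c < 0 || (inclusive && c == 0)) return (long)i;
        }
        return -1;
    }
    if (strcmp(op, "==") == 0) {
        for (size_t i = 0; i < n; i++)
            if (key_cmp(keys[i].ptr, keys[i].len, key, keylen) == 0)
                return (long)i;
        return -1;
    }
    return -1; /* unknown operator */
}
```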

RedisModule_DictIteratorStart

RedisModuleDictIter *RedisModule_DictIteratorStart(RedisModuleDict *d,
                                                   const char *op,
                                                   RedisModuleString *key);

Available since: 5.0.0

Exactly like RedisModule_DictIteratorStartC, but the key is passed as a RedisModuleString.

RedisModule_DictIteratorStop

void RedisModule_DictIteratorStop(RedisModuleDictIter *di);

Available since: 5.0.0

Release the iterator created with RedisModule_DictIteratorStart(). This call is mandatory otherwise a memory leak is introduced in the module.

RedisModule_DictIteratorReseekC

int RedisModule_DictIteratorReseekC(RedisModuleDictIter *di,
                                    const char *op,
                                    void *key,
                                    size_t keylen);

Available since: 5.0.0

After its creation with RedisModule_DictIteratorStart(), it is possible to change the currently selected element of the iterator by using this API call. The result based on the operator and key is exactly like the function RedisModule_DictIteratorStart(), however in this case the return value is just REDISMODULE_OK in case the seeked element was found, or REDISMODULE_ERR in case it was not possible to seek the specified element. It is possible to reseek an iterator as many times as you want.

RedisModule_DictIteratorReseek

int RedisModule_DictIteratorReseek(RedisModuleDictIter *di,
                                   const char *op,
                                   RedisModuleString *key);

Available since: 5.0.0

Like RedisModule_DictIteratorReseekC() but takes the key as a RedisModuleString.

RedisModule_DictNextC

void *RedisModule_DictNextC(RedisModuleDictIter *di,
                            size_t *keylen,
                            void **dataptr);

Available since: 5.0.0

Return the current item of the dictionary iterator di and step to the next element. If the iterator has already yielded the last element and there are no other elements to return, NULL is returned; otherwise a pointer to a string representing the key is returned, and *keylen is set by reference to its length (if keylen is not NULL). If dataptr is not NULL, *dataptr is set to the value of the pointer stored at the returned key as auxiliary data (as set by the RedisModule_DictSet API).

Usage example:

 /* ... create the iterator here ... */
 char *key;
 size_t keylen;
 void *data;
 while((key = RedisModule_DictNextC(iter,&keylen,&data)) != NULL) {
     printf("%.*s %p\n", (int)keylen, key, data);
 }

The returned pointer is of type void because it is sometimes convenient to cast it to a char* and sometimes to an unsigned char*, depending on whether it contains binary data, so this makes the API more comfortable to use.

The returned pointer remains valid until the next call to the next/prev iterator step, and it is no longer valid once the iterator is released.

RedisModule_DictPrevC

void *RedisModule_DictPrevC(RedisModuleDictIter *di,
                            size_t *keylen,
                            void **dataptr);

Available since: 5.0.0

This function is exactly like RedisModule_DictNextC(), but after returning the currently selected element in the iterator, it selects the previous element (lexicographically smaller) instead of the next one.

RedisModule_DictNext

RedisModuleString *RedisModule_DictNext(RedisModuleCtx *ctx,
                                        RedisModuleDictIter *di,
                                        void **dataptr);

Available since: 5.0.0

Like RedisModule_DictNextC(), but instead of returning an internally allocated buffer and key length, it directly returns a module string object allocated in the specified context 'ctx' (which may be NULL, exactly as for the main API RedisModule_CreateString).

The returned string object should be deallocated after use, either manually or by using a context that has automatic memory management active.

RedisModule_DictPrev

RedisModuleString *RedisModule_DictPrev(RedisModuleCtx *ctx,
                                        RedisModuleDictIter *di,
                                        void **dataptr);

Available since: 5.0.0

Like RedisModule_DictNext() but after returning the currently selected element in the iterator, it selects the previous element (lexicographically smaller) instead of the next one.

RedisModule_DictCompareC

int RedisModule_DictCompareC(RedisModuleDictIter *di,
                             const char *op,
                             void *key,
                             size_t keylen);

Available since: 5.0.0

Compare the element currently pointed to by the iterator with the specified element given by key/keylen, according to the operator 'op' (the set of valid operators is the same as for RedisModule_DictIteratorStart). If the comparison is successful the function returns REDISMODULE_OK, otherwise REDISMODULE_ERR is returned.

This is useful when we want to emit just a lexicographical range: in the loop, as we iterate elements, we can also check whether we are still in range.

The function also returns REDISMODULE_ERR if the iterator has reached the end of the elements.
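The range-emission pattern can be sketched over a plain sorted array of null-terminated strings, a simplifying assumption in place of the dictionary (count_in_range is a hypothetical name): seek the first key >= the lower bound, then keep stepping while the current key compares <= the upper bound.

```c
#include <stddef.h>
#include <string.h>

/* Count the keys of a sorted array (ascending, strcmp order) that fall in
 * the inclusive range [lo, hi]. */
static size_t count_in_range(const char *const *keys, size_t n,
                             const char *lo, const char *hi) {
    size_t i = 0, found = 0;
    /* Seek: position on the first key >= lo. */
    while (i < n && strcmp(keys[i], lo) < 0) i++;
    /* Iterate while still in range; stop at the first key > hi or at
     * the end of the elements. */
    for (; i < n && strcmp(keys[i], hi) <= 0; i++) found++;
    return found;
}
```

With the dictionary API, the same loop would presumably start with RedisModule_DictIteratorStartC(d, ">=", lo, lolen) and check RedisModule_DictCompareC(iter, "<=", hi, hilen) == REDISMODULE_OK before each RedisModule_DictNextC call, relying on the comparison also failing once the iterator reaches the end.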

RedisModule_DictCompare

int RedisModule_DictCompare(RedisModuleDictIter *di,
                            const char *op,
                            RedisModuleString *key);

Available since: 5.0.0

Like RedisModule_DictCompareC but gets the key to compare with the current iterator key as a RedisModuleString.

Modules Info fields

RedisModule_InfoAddSection

int RedisModule_InfoAddSection(RedisModuleInfoCtx *ctx, const char *name);

Available since: 6.0.0

Used to start a new section, before adding any fields. The section name will be prefixed by <modulename>_ and must only include A-Z, a-z, 0-9. A NULL or empty string selects the default section (named only <modulename>). When the return value is REDISMODULE_ERR, the section should be skipped; any fields added to it are ignored anyway.
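The naming rule can be sketched with a small, hypothetical helper that validates the section name and builds the prefixed form. It only models the documented rule; Redis builds the full name itself, so a module would not normally need this code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: checks that a section name uses only A-Z, a-z, 0-9
 * and builds "<modulename>_<name>". A NULL or empty name yields just
 * "<modulename>" (the default section).
 * Returns 0 on success, -1 on an invalid name or a too-small buffer. */
int build_info_section_name(const char *module, const char *name,
                            char *out, size_t outlen) {
    int n;
    if (name == NULL || name[0] == '\0') {
        n = snprintf(out, outlen, "%s", module);
        return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
    }
    for (const char *p = name; *p; p++) {
        /* Explicit ASCII ranges, matching the A-Z, a-z, 0-9 rule. */
        if (!((*p >= 'A' && *p <= 'Z') || (*p >= 'a' && *p <= 'z') ||
              (*p >= '0' && *p <= '9')))
            return -1;
    }
    n = snprintf(out, outlen, "%s_%s", module, name);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```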

RedisModule_InfoBeginDictField

int RedisModule_InfoBeginDictField(RedisModuleInfoCtx *ctx, const char *name);

Available since: 6.0.0

Starts a dict field, similar to the ones in INFO KEYSPACE. Use normal RedisModule_InfoAddField* functions to add the items to this field, and terminate with RedisModule_InfoEndDictField.

RedisModule_InfoEndDictField

int RedisModule_InfoEndDictField(RedisModuleInfoCtx *ctx);

Available since: 6.0.0

Ends a dict field; see RedisModule_InfoBeginDictField.

RedisModule_InfoAddFieldString

int RedisModule_InfoAddFieldString(RedisModuleInfoCtx *ctx,
                                   const char *field,
                                   RedisModuleString *value);

Available since: 6.0.0

Used by RedisModuleInfoFunc to add info fields. Each field will be automatically prefixed by <modulename>_. Field names or values must not include \r\n or :.
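A minimal sketch of the field constraints, with hypothetical helper names: it rejects names or values containing \r, \n or :, and formats a line roughly the way a plain (non-dict) field is assumed to appear in INFO output, as <modulename>_<field>:<value>.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical check: 1 if s contains none of '\r', '\n', ':'. */
int info_text_is_valid(const char *s) {
    return strpbrk(s, "\r\n:") == NULL;
}

/* Formats one INFO line as "<modulename>_<field>:<value>\r\n" (assumed
 * layout). Returns 0 on success, -1 on invalid input or a too-small buffer. */
int format_info_line(const char *module, const char *field, const char *value,
                     char *out, size_t outlen) {
    if (!info_text_is_valid(field) || !info_text_is_valid(value)) return -1;
    int n = snprintf(out, outlen, "%s_%s:%s\r\n", module, field, value);
    return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}
```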

RedisModule_InfoAddFieldCString

int RedisModule_InfoAddFieldCString(RedisModuleInfoCtx *ctx,
                                    const char *field,
                                    const char *value);

Available since: 6.0.0

See RedisModule_InfoAddFieldString().

RedisModule_InfoAddFieldDouble

int RedisModule_InfoAddFieldDouble(RedisModuleInfoCtx *ctx,
                                   const char *field,
                                   double value);

Available since: 6.0.0

See RedisModule_InfoAddFieldString().

RedisModule_InfoAddFieldLongLong

int RedisModule_InfoAddFieldLongLong(RedisModuleInfoCtx *ctx,
                                     const char *field,
                                     long long value);

Available since: 6.0.0

See RedisModule_InfoAddFieldString().

RedisModule_InfoAddFieldULongLong

int RedisModule_InfoAddFieldULongLong(RedisModuleInfoCtx *ctx,
                                      const char *field,
                                      unsigned long long value);

Available since: 6.0.0

See RedisModule_InfoAddFieldString().

RedisModule_RegisterInfoFunc

int RedisModule_RegisterInfoFunc(RedisModuleCtx *ctx, RedisModuleInfoFunc cb);

Available since: 6.0.0

Registers a callback for the INFO command. The callback should add INFO fields by calling the RedisModule_InfoAddField*() functions.

RedisModule_GetServerInfo

RedisModuleServerInfoData *RedisModule_GetServerInfo(RedisModuleCtx *ctx,
                                                     const char *section);

Available since: 6.0.0

Get information about the server, similar to what the INFO command returns. This function takes an optional 'section' argument that may be NULL. The return value holds the output and can be used with RedisModule_ServerInfoGetField and similar functions to get the individual fields. When done, it needs to be freed with RedisModule_FreeServerInfo or with the automatic memory management mechanism if enabled.

RedisModule_FreeServerInfo

void RedisModule_FreeServerInfo(RedisModuleCtx *ctx,
                                RedisModuleServerInfoData *data);

Available since: 6.0.0

Free data created with RedisModule_GetServerInfo(). You need to pass the context pointer 'ctx' only if the data was created using a context instead of passing NULL.

RedisModule_ServerInfoGetField

RedisModuleString *RedisModule_ServerInfoGetField(RedisModuleCtx *ctx,
                                                  RedisModuleServerInfoData *data,
                                                  const char* field);

Available since: 6.0.0

Get the value of a field from data collected with RedisModule_GetServerInfo(). You need to pass the context pointer 'ctx' only if you want to use the auto memory mechanism to release the returned string. The return value is NULL if the field was not found.

RedisModule_ServerInfoGetFieldC

const char *RedisModule_ServerInfoGetFieldC(RedisModuleServerInfoData *data,
                                            const char* field);

Available since: 6.0.0

Similar to RedisModule_ServerInfoGetField, but returns a char* which should not be freed by the caller.

RedisModule_ServerInfoGetFieldSigned

long long RedisModule_ServerInfoGetFieldSigned(RedisModuleServerInfoData *data,
                                               const char* field,
                                               int *out_err);

Available since: 6.0.0

Get the value of a field from data collected with RedisModule_GetServerInfo(). If the field is not found, or is not numerical or out of range, return value will be 0, and the optional out_err argument will be set to REDISMODULE_ERR.
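To make the error contract concrete, here is a standalone model built on strtoll(). MODEL_OK and MODEL_ERR are hypothetical stand-ins for REDISMODULE_OK and REDISMODULE_ERR, a NULL value plays the role of a missing field, and Redis's own number parsing may differ in details such as accepting leading whitespace.

```c
#include <errno.h>
#include <stdlib.h>

#define MODEL_OK 0   /* stand-in for REDISMODULE_OK */
#define MODEL_ERR 1  /* stand-in for REDISMODULE_ERR */

/* Parses a field value as a signed integer. On a missing field (NULL or
 * empty), non-numeric text, or overflow it returns 0 and, if out_err is not
 * NULL, sets *out_err to MODEL_ERR; on success *out_err is MODEL_OK. */
long long model_info_get_signed(const char *value, int *out_err) {
    char *end = NULL;
    long long v = 0;
    int ok = 0;
    if (value != NULL && value[0] != '\0') {
        errno = 0;
        v = strtoll(value, &end, 10);
        ok = (errno != ERANGE && *end == '\0');
    }
    if (out_err) *out_err = ok ? MODEL_OK : MODEL_ERR;
    return ok ? v : 0;
}
```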

RedisModule_ServerInfoGetFieldUnsigned

unsigned long long RedisModule_ServerInfoGetFieldUnsigned(RedisModuleServerInfoData *data,
                                                          const char* field,
                                                          int *out_err);

Available since: 6.0.0

Get the value of a field from data collected with RedisModule_GetServerInfo(). If the field is not found, or is not numerical or out of range, return value will be 0, and the optional out_err argument will be set to REDISMODULE_ERR.

RedisModule_ServerInfoGetFieldDouble

double RedisModule_ServerInfoGetFieldDouble(RedisModuleServerInfoData *data,
                                            const char* field,
                                            int *out_err);

Available since: 6.0.0

Get the value of a field from data collected with RedisModule_GetServerInfo(). If the field is not found, or is not a double, return value will be 0, and the optional out_err argument will be set to REDISMODULE_ERR.

Modules utility APIs

RedisModule_GetRandomBytes

void RedisModule_GetRandomBytes(unsigned char *dst, size_t len);

Available since: 5.0.0

Return random bytes using SHA1 in counter mode with a /dev/urandom initialized seed. This function is fast, so it can be used to generate many bytes without any effect on the operating system entropy pool. Currently this function is not thread safe.

RedisModule_GetRandomHexChars

void RedisModule_GetRandomHexChars(char *dst, size_t len);

Available since: 5.0.0

Like RedisModule_GetRandomBytes() but instead of setting the string to random bytes, the string is set to random characters in the hex charset [0-9a-f].
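A sketch of how random bytes can be mapped onto the hex charset: each byte keeps only its low 4 bits, which index into "0123456789abcdef", so uniformly random bytes give uniformly random hex digits. The helper name is hypothetical, and it takes its input bytes as a parameter so the mapping is deterministic. It writes exactly len characters and no NUL terminator, so a caller that needs a C string should reserve and set the terminator itself.

```c
#include <stddef.h>

/* Hypothetical helper: maps each input byte to charset[byte & 0x0F].
 * Writes exactly len characters to dst; does not NUL-terminate. */
void bytes_to_hex_chars(const unsigned char *src, char *dst, size_t len) {
    static const char charset[] = "0123456789abcdef";
    for (size_t j = 0; j < len; j++)
        dst[j] = charset[src[j] & 0x0F];
}
```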

Modules API exporting / importing

RedisModule_ExportSharedAPI

int RedisModule_ExportSharedAPI(RedisModuleCtx *ctx,
                                const char *apiname,
                                void *func);

Available since: 5.0.4

This function is called by a module in order to export some API with a given name. Other modules will be able to use this API by calling the symmetrical function RedisModule_GetSharedAPI() and casting the return value to the right function pointer.

The function will return REDISMODULE_OK if the name is not already taken, otherwise REDISMODULE_ERR will be returned and no operation will be performed.

IMPORTANT: the apiname argument should be a string literal with static lifetime. The API relies on the fact that it will always be valid in the future.
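The export/lookup contract can be modeled as a small name-to-pointer table. This standalone sketch (all names hypothetical, not the Redis implementation) stores the caller's name pointer rather than a copy, which shows why the note above asks for a string with static lifetime: a registry that keeps the pointer would read freed memory if the name were a temporary buffer.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_MAX_APIS 16

struct model_api { const char *name; void *func; };
static struct model_api model_apis[MODEL_MAX_APIS];
static size_t model_api_count = 0;

/* Returns 0 if registered, -1 if the name is taken or the table is full.
 * Only the name pointer is stored, so it must stay valid forever. */
int model_export_api(const char *name, void *func) {
    for (size_t i = 0; i < model_api_count; i++)
        if (strcmp(model_apis[i].name, name) == 0) return -1;
    if (model_api_count == MODEL_MAX_APIS) return -1;
    model_apis[model_api_count].name = name;
    model_apis[model_api_count].func = func;
    model_api_count++;
    return 0;
}

/* Returns the exported pointer, or NULL if no one exported that name. */
void *model_get_api(const char *name) {
    for (size_t i = 0; i < model_api_count; i++)
        if (strcmp(model_apis[i].name, name) == 0) return model_apis[i].func;
    return NULL;
}

/* Example exported function and the pointer type both sides agree on. */
typedef int (*add_fn)(int, int);
static int add_impl(int a, int b) { return a + b; }
```

Converting between function pointers and void * is not guaranteed by ISO C but is supported on POSIX platforms, which is the same assumption the void *func parameter relies on.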

RedisModule_GetSharedAPI

void *RedisModule_GetSharedAPI(RedisModuleCtx *ctx, const char *apiname);

Available since: 5.0.4

Request an exported API pointer. The return value is just a void pointer that the caller of this function will be required to cast to the right function pointer, so this is a private contract between modules.

If the requested API is not available then NULL is returned. Because modules can be loaded at different times and in different orders, calls to this function should be placed in a generic API-registration step that runs every time a module attempts to execute a command requiring external APIs: if some API cannot be resolved, the command should return an error.

Here is an example:

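// Hypothetical signature agreed upon with the exporting module.
typedef int (*OtherModuleFunc)(RedisModuleString *arg);
static OtherModuleFunc myFunctionPointer = NULL;
static int getExternalAPIs(RedisModuleCtx *ctx);

int myCommandImplementation(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    if (argc != 2) return RedisModule_WrongArity(ctx);
    if (getExternalAPIs(ctx) == 0) {
        // The other module is not loaded (yet), or does not export the API.
        return RedisModule_ReplyWithError(ctx, "ERR required external API not available");
    }
    // Use the API:
    return RedisModule_ReplyWithLongLong(ctx, myFunctionPointer(argv[1]));
}

And the function getExternalAPIs() is:

static int getExternalAPIs(RedisModuleCtx *ctx) {
    static int api_loaded = 0;
    if (api_loaded != 0) return 1; // APIs already resolved.

    myFunctionPointer = (OtherModuleFunc)RedisModule_GetSharedAPI(ctx, "...");
    if (myFunctionPointer == NULL) return 0;

    api_loaded = 1; // Cache the result so later calls skip the lookup.
    return 1;
}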

Module Command Filter API

RedisModule_RegisterCommandFilter

RedisModuleCommandFilter *RedisModule_RegisterCommandFilter(RedisModuleCtx *ctx,
                                                            RedisModuleCommandFilterFunc callback,
                                                            int flags);

Available since: 5.0.5

Register a new command filter function.

Command filtering makes it possible for modules to extend Redis by plugging into the execution flow of all commands.

A registered filter gets called before Redis executes any command. This includes both core Redis commands and commands registered by any module. The filter applies in all execution paths including:

  1. Invocation by a client.
  2. Invocation through RedisModule_Call() by any module.
  3. Invocation through Lua redis.call().
  4. Replication of a command from a master.

The filter executes in a special filter context, which is different and more limited than a RedisModuleCtx. Because the filter affects any command, it must be implemented in a very efficient way to reduce the performance impact on Redis. All Redis Module API calls that require a valid context (such as RedisModule_Call(), RedisModule_OpenKey(), etc.) are not supported in a filter context.

The RedisModuleCommandFilterCtx can be used to inspect or modify the executed command and its arguments. As the filter executes before Redis begins processing the command, any change will affect the way the command is processed. For example, a module can override Redis commands this way:

  1. Register a MODULE.SET command which implements an extended version of the Redis SET command.
  2. Register a command filter which detects invocation of SET on a specific pattern of keys. Once detected, the filter will replace the first argument from SET to MODULE.SET.
  3. When filter execution is complete, Redis considers the new command name and therefore executes the module's own command.

Note that in the above use case, if MODULE.SET itself uses RedisModule_Call() the filter will be applied on that call as well. If that is not desired, the REDISMODULE_CMDFILTER_NOSELF flag can be set when registering the filter.

The REDISMODULE_CMDFILTER_NOSELF flag prevents execution flows that originate from the module's own RedisModule_Call() from reaching the filter. This flag is effective for all execution flows, including nested ones, as long as the execution begins from the module's command context or a thread-safe context that is associated with a blocking command.

Detached thread-safe contexts are not associated with the module and cannot be protected by this flag.

If multiple filters are registered (by the same or different modules), they are executed in the order of registration.
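The three-step SET override described above can be modeled with a plain argv array standing in for RedisModuleCommandFilterCtx. In this hypothetical sketch, position 0 holds the command name (as with RedisModule_CommandFilterArgGet()), keys starting with "app:" are redirected, and command names are matched case-insensitively.

```c
#include <ctype.h>
#include <string.h>

/* ASCII case-insensitive equality (command names are case-insensitive). */
static int ascii_equal_nocase(const char *a, const char *b) {
    for (; *a && *b; a++, b++)
        if (toupper((unsigned char)*a) != toupper((unsigned char)*b)) return 0;
    return *a == *b;
}

/* Hypothetical filter model: rewrites argv in place and returns 1 if a
 * "SET app:..." command was redirected to MODULE.SET, 0 otherwise. */
int model_set_filter(const char **argv, int argc) {
    if (argc < 2) return 0;                               /* no key argument */
    if (!ascii_equal_nocase(argv[0], "SET")) return 0;    /* not SET */
    if (strncmp(argv[1], "app:", 4) != 0) return 0;       /* key not matched */
    argv[0] = "MODULE.SET";           /* analogous to replacing position 0 */
    return 1;
}
```

In a real filter, the replacement would go through RedisModule_CommandFilterArgReplace(fctx, 0, str), where str must not be auto-memory managed, as noted for that function.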

RedisModule_UnregisterCommandFilter

int RedisModule_UnregisterCommandFilter(RedisModuleCtx *ctx,
                                        RedisModuleCommandFilter *filter);

Available since: 5.0.5

Unregister a command filter.

RedisModule_CommandFilterArgsCount

int RedisModule_CommandFilterArgsCount(RedisModuleCommandFilterCtx *fctx);

Available since: 5.0.5

Return the number of arguments a filtered command has. The number of arguments includes the command itself.

RedisModule_CommandFilterArgGet

RedisModuleString *RedisModule_CommandFilterArgGet(RedisModuleCommandFilterCtx *fctx,
                                                   int pos);

Available since: 5.0.5

Return the specified command argument. The first argument (position 0) is the command itself, and the rest are user-provided args.

RedisModule_CommandFilterArgInsert

int RedisModule_CommandFilterArgInsert(RedisModuleCommandFilterCtx *fctx,
                                       int pos,
                                       RedisModuleString *arg);

Available since: 5.0.5

Modify the filtered command by inserting a new argument at the specified position. The specified RedisModuleString argument may be used by Redis after the filter context is destroyed, so it must not be auto-memory allocated, freed or used elsewhere.

RedisModule_CommandFilterArgReplace

int RedisModule_CommandFilterArgReplace(RedisModuleCommandFilterCtx *fctx,
                                        int pos,
                                        RedisModuleString *arg);

Available since: 5.0.5

Modify the filtered command by replacing an existing argument with a new one. The specified RedisModuleString argument may be used by Redis after the filter context is destroyed, so it must not be auto-memory allocated, freed or used elsewhere.

RedisModule_CommandFilterArgDelete

int RedisModule_CommandFilterArgDelete(RedisModuleCommandFilterCtx *fctx,
                                       int pos);

Available since: 5.0.5

Modify the filtered command by deleting an argument at the specified position.

RedisModule_MallocSize

size_t RedisModule_MallocSize(void* ptr);

Available since: 6.0.0

For a given pointer allocated via RedisModule_Alloc() or RedisModule_Realloc(), return the amount of memory allocated for it. Note that this may be larger than the amount requested in the allocation call, since the underlying allocator sometimes allocates more memory.

RedisModule_MallocSizeString

size_t RedisModule_MallocSizeString(RedisModuleString* str);

Available since: 7.0.0

Same as RedisModule_MallocSize, except it works on RedisModuleString pointers.

RedisModule_MallocSizeDict

size_t RedisModule_MallocSizeDict(RedisModuleDict* dict);

Available since: 7.0.0

Same as RedisModule_MallocSize, except it works on RedisModuleDict pointers. Note that the returned value is only the overhead of the underlying structures; it does not include the allocation size of the keys and values.

RedisModule_GetUsedMemoryRatio

float RedisModule_GetUsedMemoryRatio();

Available since: 6.0.0

Return a number indicating the amount of memory currently used, relative to the Redis "maxmemory" configuration. It is normally between 0 and 1:

  • 0 - No memory limit configured.
  • Between 0 and 1 - The fraction of the configured limit currently in use.
  • Exactly 1 - Memory limit reached.
  • Greater than 1 - More memory used than the configured limit.
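A simplified model of the value and of one way a module might act on it. The used-memory figure is whatever the caller passes in (Redis decides which memory it actually counts), and the 0.9 threshold in the hypothetical should_shed_load() helper is an arbitrary example.

```c
/* Simplified model: used / maxmemory, or 0 when no limit is configured.
 * The result may exceed 1 when usage is above the limit. */
float model_used_memory_ratio(unsigned long long used,
                              unsigned long long maxmemory) {
    if (maxmemory == 0) return 0.0f;          /* no limit configured */
    return (float)used / (float)maxmemory;
}

/* Hypothetical policy: skip optional work at or above 90% of the limit. */
int should_shed_load(float ratio) {
    return ratio >= 0.9f;
}
```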

Scanning keyspace and hashes

RedisModule_ScanCursorCreate

RedisModuleScanCursor *RedisModule_ScanCursorCreate();

Available since: 6.0.0

Create a new cursor to be used with RedisModule_Scan

RedisModule_ScanCursorRestart

void RedisModule_ScanCursorRestart(RedisModuleScanCursor *cursor);

Available since: 6.0.0

Restart an existing cursor. The keys will be rescanned.

RedisModule_ScanCursorDestroy

void RedisModule_ScanCursorDestroy(RedisModuleScanCursor *cursor);

Available since: 6.0.0

Destroy the cursor struct.

RedisModule_Scan

int RedisModule_Scan(RedisModuleCtx *ctx,
                     RedisModuleScanCursor *cursor,
                     RedisModuleScanCB fn,
                     void *privdata);

Available since: 6.0.0

Scan API that allows a module to scan all the keys and values in the selected db.

Callback for scan implementation.

void scan_callback(RedisModuleCtx *ctx, RedisModuleString *keyname,
                   RedisModuleKey *key, void *privdata);
  • ctx: the Redis module context provided for the scan.
  • keyname: owned by the caller; it needs to be retained if used after this function returns.
  • key: holds information on the key and value. It is provided on a best-effort basis: in some cases it might be NULL, in which case the callback can open the key with RedisModule_OpenKey() (and close it with RedisModule_CloseKey()). When it is provided, it is owned by the caller and will be freed when the callback returns.
  • privdata: the user data provided to RedisModule_Scan().

The way it should be used:

 RedisModuleScanCursor *c = RedisModule_ScanCursorCreate();
 while(RedisModule_Scan(ctx, c, callback, privateData));
 RedisModule_ScanCursorDestroy(c);

It is also possible to use this API from another thread while the lock is acquired during the actual call to RedisModule_Scan:

 RedisModuleScanCursor *c = RedisModule_ScanCursorCreate();
 RedisModule_ThreadSafeContextLock(ctx);
 while(RedisModule_Scan(ctx, c, callback, privateData)){
     RedisModule_ThreadSafeContextUnlock(ctx);
     // do some background job
     RedisModule_ThreadSafeContextLock(ctx);
 }
 RedisModule_ThreadSafeContextUnlock(ctx);
 RedisModule_ScanCursorDestroy(c);

The function will return 1 if there are more elements to scan and 0 otherwise, possibly setting errno if the call failed.

It is also possible to restart an existing cursor using RedisModule_ScanCursorRestart.

IMPORTANT: This API is very similar to the Redis SCAN command from the point of view of the guarantees it provides. This means that the API may report duplicated keys, but guarantees to report at least one time every key that was there from the start to the end of the scanning process.

NOTE: If you make database changes within the callback, you should be aware that the internal state of the database may change. For instance, it is safe to delete or modify the current key, but it may not be safe to delete any other key. Moreover, modifying the Redis keyspace while iterating may return more duplicates. A safe pattern is to store the names of the keys you want to modify elsewhere, and perform the actions on the keys later when the iteration is complete. However, this can cost a lot of memory, so it may make sense to just operate on the current key when possible during the iteration, given that this is safe.
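The collect-then-act pattern from the note above can be sketched without Redis using a toy keyspace. Everything here (toy_scan(), toy_delete(), the tmp: prefix) is hypothetical; toy_scan() only mimics the shape of the RedisModule_Scan() loop, visiting one key per call and returning 1 while more keys remain.

```c
#include <string.h>

#define MAX_KEYS 8
static const char *keyspace[MAX_KEYS] = {"tmp:1", "user:1", "tmp:2", "user:2"};
static int keyspace_len = 4;

typedef void (*scan_cb)(const char *keyname, void *privdata);

/* Visits one key per call; returns 1 while keys remain, 0 when done. */
static int toy_scan(int *cursor, scan_cb fn, void *privdata) {
    if (*cursor < keyspace_len) fn(keyspace[(*cursor)++], privdata);
    return *cursor < keyspace_len;
}

/* privdata: names collected during the scan (at most MAX_KEYS). */
struct pending { const char *names[MAX_KEYS]; int count; };

static void collect_tmp_keys(const char *keyname, void *privdata) {
    struct pending *p = privdata;
    if (strncmp(keyname, "tmp:", 4) == 0) p->names[p->count++] = keyname;
}

/* Removes a key by swapping the last key into its slot. */
static void toy_delete(const char *keyname) {
    for (int i = 0; i < keyspace_len; i++) {
        if (strcmp(keyspace[i], keyname) == 0) {
            keyspace[i] = keyspace[--keyspace_len];
            return;
        }
    }
}

/* Step 1 collects during the scan; step 2 deletes after it completes.
 * Returns the number of keys deleted. */
int delete_tmp_keys(void) {
    struct pending p = {.count = 0};
    int cursor = 0;
    while (toy_scan(&cursor, collect_tmp_keys, &p)) {}
    for (int i = 0; i < p.count; i++) toy_delete(p.names[i]);
    return p.count;
}
```

In a real module, the keyname passed to the callback is owned by the caller, so collected names would have to be retained (for example with RedisModule_RetainString()) before being used after the scan.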

RedisModule_ScanKey

int RedisModule_ScanKey(RedisModuleKey *key,
                        RedisModuleScanCursor *cursor,
                        RedisModuleScanKeyCB fn,
                        void *privdata);

Available since: 6.0.0

Scan API that allows a module to scan the elements in a hash, set or sorted set key.

Callback for scan implementation.

void scan_callback(RedisModuleKey *key, RedisModuleString* field, RedisModuleString* value, void *privdata);
  • key - the Redis key context provided for the scan.
  • field - field name, owned by the caller; it needs to be retained if used after this function returns.
  • value - value string, or NULL for the set type; owned by the caller and needs to be retained if used after this function returns.
  • privdata - the user data provided to RedisModule_ScanKey.

The way it should be used:

 RedisModuleScanCursor *c = RedisModule_ScanCursorCreate();
 RedisModuleKey *key = RedisModule_OpenKey(...);
 while(RedisModule_ScanKey(key, c, callback, privateData));
 RedisModule_CloseKey(key);
 RedisModule_ScanCursorDestroy(c);

It is also possible to use this API from another thread while the lock is acquired during the actual call to RedisModule_ScanKey, and re-opening the key each time:

 RedisModuleScanCursor *c = RedisModule_ScanCursorCreate();
 RedisModule_ThreadSafeContextLock(ctx);
 RedisModuleKey *key = RedisModule_OpenKey(...);
 while(RedisModule_ScanKey(key, c, callback, privateData)){
     RedisModule_CloseKey(key);
     RedisModule_ThreadSafeContextUnlock(ctx);
     // do some background job
     RedisModule_ThreadSafeContextLock(ctx);
     key = RedisModule_OpenKey(...); // reassign the same variable, do not redeclare it
 }
 RedisModule_CloseKey(key);
 RedisModule_ThreadSafeContextUnlock(ctx);
 RedisModule_ScanCursorDestroy(c);

The function will return 1 if there are more elements to scan and 0 otherwise, possibly setting errno if the call failed. It is also possible to restart an existing cursor using RedisModule_ScanCursorRestart.

NOTE: Certain operations are unsafe while iterating the object. For instance, while the API guarantees to return at least once all the elements that are present in the data structure consistently from the start to the end of the iteration (see HSCAN and similar commands documentation), the more you modify the elements, the more duplicates you may get. In general, deleting the current element of the data structure is safe, while removing the key you are iterating is not safe.

Module fork API

RedisModule_Fork

int RedisModule_Fork(RedisModuleForkDoneHandler cb, void *user_data);

Available since: 6.0.0

Create a background child process with the current frozen snapshot of the main process, where you can do some processing in the background without affecting or freezing the traffic, and with no need for threads and GIL locking. Note that Redis allows only one concurrent fork. When the child wants to exit, it should call RedisModule_ExitFromChild. If the parent wants to kill the child, it should call RedisModule_KillForkChild. The done handler callback will be executed on the parent process when the child exits (but not when it is killed). Return: -1 on failure; on success, the parent process gets the positive PID of the child, and the child process gets 0.

RedisModule_SendChildHeartbeat

void RedisModule_SendChildHeartbeat(double progress);

Available since: 6.2.0

The module is advised to call this function from the fork child once in a while, so that it can report progress and COW memory to the parent, which will be reported in INFO. The progress argument should be between 0 and 1, or -1 when not available.

RedisModule_ExitFromChild

int RedisModule_ExitFromChild(int retcode);

Available since: 6.0.0

Call from the child process when you want to terminate it. retcode will be provided to the done handler executed on the parent process.

RedisModule_KillForkChild

int RedisModule_KillForkChild(int child_pid);

Available since: 6.0.0

Can be used to kill the forked child process from the parent process. child_pid would be the return value of RedisModule_Fork.

Server hooks implementation

RedisModule_SubscribeToServerEvent

int RedisModule_SubscribeToServerEvent(RedisModuleCtx *ctx,
                                       RedisModuleEvent event,
                                       RedisModuleEventCallback callback);

Available since: 6.0.0

Register to be notified, via a callback, when the specified server event happens. The callback is called with the event as argument, and an additional argument which is a void pointer and should be cast to a specific type that is event-specific (but many events will just use NULL since they do not have additional information to pass to the callback).

If the callback is NULL and there was a previous subscription, the module will be unsubscribed. If there was a previous subscription and the callback is not null, the old callback will be replaced with the new one.

The callback must be of this type:

int (*RedisModuleEventCallback)(RedisModuleCtx *ctx,
                                RedisModuleEvent eid,
                                uint64_t subevent,
                                void *data);

The 'ctx' is a normal Redis module context that the callback can use in order to call other module APIs. The 'eid' is the event itself; this is only useful in the case the module subscribed to multiple events: using the 'id' field of this structure it is possible to check if the event is one of the events registered with this callback. The 'subevent' field depends on the event that fired.

Finally the 'data' pointer may be populated, only for certain events, with more relevant data.

Here is a list of events you can use as 'eid' and related sub events:

  • RedisModuleEvent_ReplicationRoleChanged:

    This event is called when the instance switches from master to replica or the other way around, however the event is also called when the replica remains a replica but starts to replicate with a different master.

    The following sub events are available:

    • REDISMODULE_SUBEVENT_REPLROLECHANGED_NOW_MASTER
    • REDISMODULE_SUBEVENT_REPLROLECHANGED_NOW_REPLICA

    The 'data' field can be cast by the callback to a RedisModuleReplicationInfo structure with the following fields:

      int master; // true if master, false if replica
      char *masterhost; // master instance hostname for NOW_REPLICA
      int masterport; // master instance port for NOW_REPLICA
      char *replid1; // Main replication ID
      char *replid2; // Secondary replication ID
      uint64_t repl1_offset; // Main replication offset
      uint64_t repl2_offset; // Offset of replid2 validity
    
  • RedisModuleEvent_Persistence

    This event is called when RDB saving or AOF rewriting starts and ends. The following sub events are available:

    • REDISMODULE_SUBEVENT_PERSISTENCE_RDB_START
    • REDISMODULE_SUBEVENT_PERSISTENCE_AOF_START
    • REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_RDB_START
    • REDISMODULE_SUBEVENT_PERSISTENCE_SYNC_AOF_START
    • REDISMODULE_SUBEVENT_PERSISTENCE_ENDED
    • REDISMODULE_SUBEVENT_PERSISTENCE_FAILED

    The above events are triggered not just when the user calls the relevant commands like BGSAVE, but also when a saving operation or AOF rewriting occurs because of internal server triggers. The SYNC_RDB_START sub events occur in the foreground because of the SAVE command, FLUSHALL, or server shutdown, while the other RDB and AOF sub events are executed in a background fork child, so any action the module takes there can only affect the generated AOF or RDB; it will not be reflected in the parent process and will not affect connected clients and commands. Also note that the AOF_START sub event may end up saving RDB content in the case of an AOF with rdb-preamble.

  • RedisModuleEvent_FlushDB

    The FLUSHALL, FLUSHDB or an internal flush (for instance because of replication, after the replica synchronization) happened. The following sub events are available:

    • REDISMODULE_SUBEVENT_FLUSHDB_START
    • REDISMODULE_SUBEVENT_FLUSHDB_END

    The data pointer can be cast to a RedisModuleFlushInfo structure with the following fields:

      int32_t async;  // True if the flush is done in a thread.
                      // See for instance FLUSHALL ASYNC.
                      // In this case the END callback is invoked
                      // immediately after the database is put
                      // in the free list of the thread.
      int32_t dbnum;  // Flushed database number, -1 for all the DBs
                      // in the case of the FLUSHALL operation.
    

    The start event is called before the operation is initiated, thus allowing the callback to call DBSIZE or other operations on the yet-to-be-freed keyspace.

  • RedisModuleEvent_Loading

    Called on loading operations: at startup when the server is started, but also after a first synchronization when the replica is loading the RDB file from the master. The following sub events are available:

    • REDISMODULE_SUBEVENT_LOADING_RDB_START
    • REDISMODULE_SUBEVENT_LOADING_AOF_START
    • REDISMODULE_SUBEVENT_LOADING_REPL_START
    • REDISMODULE_SUBEVENT_LOADING_ENDED
    • REDISMODULE_SUBEVENT_LOADING_FAILED

    Note that AOF loading may start with RDB data in the case of rdb-preamble, in which case you'll only receive an AOF_START event.

  • RedisModuleEvent_ClientChange

    Called when a client connects or disconnects. The data pointer can be cast to a RedisModuleClientInfo structure, documented in RedisModule_GetClientInfoById(). The following sub events are available:

    • REDISMODULE_SUBEVENT_CLIENT_CHANGE_CONNECTED
    • REDISMODULE_SUBEVENT_CLIENT_CHANGE_DISCONNECTED
  • RedisModuleEvent_Shutdown

    The server is shutting down. No subevents are available.

  • RedisModuleEvent_ReplicaChange

    This event is called when the instance (which can be either a master or a replica) gets a new online replica, or loses a replica because it disconnected. The following sub events are available:

    • REDISMODULE_SUBEVENT_REPLICA_CHANGE_ONLINE
    • REDISMODULE_SUBEVENT_REPLICA_CHANGE_OFFLINE

    No additional information is available so far: future versions of Redis will have an API in order to enumerate the replicas connected and their state.

  • RedisModuleEvent_CronLoop

    This event is called every time Redis calls the serverCron() function in order to do certain bookkeeping. Modules that are required to do operations from time to time may use this callback. Normally Redis calls this function 10 times per second, but this changes depending on the "hz" configuration. No sub events are available.

    The data pointer can be cast to a RedisModuleCronLoop structure with the following fields:

      int32_t hz;  // Approximate number of events per second.
    
  • RedisModuleEvent_MasterLinkChange

    This is called for replicas in order to notify when the replication link becomes functional (up) with our master, or when it goes down. Note that the link is not considered up when we just connected to the master, but only if the replication is happening correctly. The following sub events are available:

    • REDISMODULE_SUBEVENT_MASTER_LINK_UP
    • REDISMODULE_SUBEVENT_MASTER_LINK_DOWN
  • RedisModuleEvent_ModuleChange

    This event is called when a new module is loaded or one is unloaded. The following sub events are available:

    • REDISMODULE_SUBEVENT_MODULE_LOADED
    • REDISMODULE_SUBEVENT_MODULE_UNLOADED

    The data pointer can be cast to a RedisModuleModuleChange structure with the following fields:

      const char* module_name;  // Name of module loaded or unloaded.
      int32_t module_version;  // Module version.
    
  • RedisModuleEvent_LoadingProgress

    This event is called repeatedly while an RDB or AOF file is being loaded. The following sub events are available:

    • REDISMODULE_SUBEVENT_LOADING_PROGRESS_RDB
    • REDISMODULE_SUBEVENT_LOADING_PROGRESS_AOF

    The data pointer can be cast to a RedisModuleLoadingProgress structure with the following fields:

      int32_t hz;  // Approximate number of events per second.
      int32_t progress;  // Approximate progress between 0 and 1024,
                         // or -1 if unknown.
    
  • RedisModuleEvent_SwapDB

    This event is called when a SWAPDB command has been successfully executed. No sub events are currently available for this event.

    The data pointer can be cast to a RedisModuleSwapDbInfo structure with the following fields:

      int32_t dbnum_first;    // Swap Db first dbnum
      int32_t dbnum_second;   // Swap Db second dbnum
    
  • RedisModuleEvent_ReplBackup

    WARNING: Replication Backup events are deprecated since Redis 7.0 and are never fired. See RedisModuleEvent_ReplAsyncLoad for understanding how Async Replication Loading events are now triggered when repl-diskless-load is set to swapdb.

    Called when the repl-diskless-load config is set to swapdb and Redis needs to back up the current database so that it can possibly be restored later. A module with global data, and maybe with aux_load and aux_save callbacks, may need to use this notification to back up, restore, or discard its globals. The following sub events are available:

    • REDISMODULE_SUBEVENT_REPL_BACKUP_CREATE
    • REDISMODULE_SUBEVENT_REPL_BACKUP_RESTORE
    • REDISMODULE_SUBEVENT_REPL_BACKUP_DISCARD
  • RedisModuleEvent_ReplAsyncLoad

    Called when the repl-diskless-load config is set to swapdb and a replication with a master of the same data set history (matching replication ID) occurs. In this case Redis serves the current data set while loading the new database in memory from the socket. Modules must declare that they support this mechanism, through the REDISMODULE_OPTIONS_HANDLE_REPL_ASYNC_LOAD flag, in order to activate it. The following sub events are available:

    • REDISMODULE_SUBEVENT_REPL_ASYNC_LOAD_STARTED
    • REDISMODULE_SUBEVENT_REPL_ASYNC_LOAD_ABORTED
    • REDISMODULE_SUBEVENT_REPL_ASYNC_LOAD_COMPLETED
  • RedisModuleEvent_ForkChild

    Called when a fork child (AOFRW, RDBSAVE, module fork...) is born or dies. The following sub events are available:

    • REDISMODULE_SUBEVENT_FORK_CHILD_BORN
    • REDISMODULE_SUBEVENT_FORK_CHILD_DIED
  • RedisModuleEvent_EventLoop

    Called on each event loop iteration, once just before the event loop goes to sleep or just after it wakes up. The following sub events are available:

    • REDISMODULE_SUBEVENT_EVENTLOOP_BEFORE_SLEEP
    • REDISMODULE_SUBEVENT_EVENTLOOP_AFTER_SLEEP
  • RedisModuleEvent_Config

    Called when a configuration event happens. The following sub events are available:

    • REDISMODULE_SUBEVENT_CONFIG_CHANGE

    The data pointer can be cast to a RedisModuleConfigChange structure with the following fields:

      const char **config_names; // An array of C string pointers containing the
                                 // name of each modified configuration item 
      uint32_t num_changes;      // The number of elements in the config_names array
    

The function returns REDISMODULE_OK if the module was successfully subscribed for the specified event. If the API is called from the wrong context or an unsupported event is given, REDISMODULE_ERR is returned.
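As an illustration of how a module might consume two of the payloads above, here is a minimal standalone sketch. LoadingProgressMirror and SwapDbInfoMirror are hypothetical local copies of the documented fields, declared only so the example compiles without redismodule.h; a real module would cast the event's data pointer to RedisModuleLoadingProgress or RedisModuleSwapDbInfo instead. The per-database counter array and NUM_DBS are likewise hypothetical.

```c
#include <stdint.h>

/* Hypothetical local mirrors of the documented payload fields. A real module
 * uses RedisModuleLoadingProgress and RedisModuleSwapDbInfo from
 * redismodule.h instead. */
typedef struct { int32_t hz; int32_t progress; } LoadingProgressMirror;
typedef struct { int32_t dbnum_first; int32_t dbnum_second; } SwapDbInfoMirror;

/* Converts the 0..1024 progress scale into a percentage (0..100),
 * or returns -1 when the progress is unknown. */
int loading_percent(const LoadingProgressMirror *p) {
    if (p->progress < 0) return -1;
    return (int)((int64_t)p->progress * 100 / 1024);
}

/* Hypothetical module state kept per database, outside the keyspace.
 * NUM_DBS assumes the default "databases 16" setting. */
#define NUM_DBS 16
long long per_db_counter[NUM_DBS];

/* After SWAPDB the two databases exchange their contents, so the module
 * swaps its own per-database entries to keep them aligned with the data. */
void on_swapdb(const SwapDbInfoMirror *info) {
    int a = info->dbnum_first, b = info->dbnum_second;
    if (a < 0 || a >= NUM_DBS || b < 0 || b >= NUM_DBS) return;
    long long tmp = per_db_counter[a];
    per_db_counter[a] = per_db_counter[b];
    per_db_counter[b] = tmp;
}
```

In a real module these helpers would be called from the callback registered with RedisModule_SubscribeToServerEvent, after checking which event and sub event fired.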

RedisModule_IsSubEventSupported

int RedisModule_IsSubEventSupported(RedisModuleEvent event, int64_t subevent);

Available since: 6.0.9

For a given server event and subevent, return zero if the subevent is not supported and non-zero otherwise.

Module Configurations API

RedisModule_RegisterStringConfig

int RedisModule_RegisterStringConfig(RedisModuleCtx *ctx,
                                     const char *name,
                                     const char *default_val,
                                     unsigned int flags,
                                     RedisModuleConfigGetStringFunc getfn,
                                     RedisModuleConfigSetStringFunc setfn,
                                     RedisModuleConfigApplyFunc applyfn,
                                     void *privdata);

Available since: 7.0.0

Create a string config that Redis users can interact with via the Redis config file, CONFIG SET, CONFIG GET, and CONFIG REWRITE commands.

The actual config value is owned by the module, and the getfn, setfn and optional applyfn callbacks are provided to Redis in order to access or manipulate the value. The getfn callback retrieves the value from the module, while the setfn callback provides a value to be stored into the module config. The optional applyfn callback is called after a CONFIG SET command has modified one or more configs using the setfn callback, and can be used to atomically apply a config after several configs were changed together. If a single CONFIG SET command sets multiple configs with applyfn callbacks, they are deduplicated when their applyfn function and privdata pointers are identical, so the callback runs only once. Both the setfn and applyfn callbacks can return an error if the provided value is invalid or cannot be used. The config also declares a type for the value that is validated by Redis and provided to the module. The config system provides the following types:

  • Redis String: Binary safe string data.
  • Enum: One of a finite number of string tokens, provided during registration.
  • Numeric: 64 bit signed integer, which also supports min and max values.
  • Bool: Yes or no value.

The setfn callback is expected to return REDISMODULE_OK when the value is successfully applied. It can also return REDISMODULE_ERR if the value can't be applied, and the *err pointer can be set with a RedisModuleString error message to provide to the client. This RedisModuleString will be freed by Redis after returning from the set callback.

All configs are registered with a name, a type, a default value, private data that is made available in the callbacks, as well as several flags that modify the behavior of the config. The name must only contain alphanumeric characters or dashes. The supported flags are:

  • REDISMODULE_CONFIG_DEFAULT: The default flags for a config. This creates a config that can be modified after startup.
  • REDISMODULE_CONFIG_IMMUTABLE: This config can only be provided at loading time.
  • REDISMODULE_CONFIG_SENSITIVE: The value stored in this config is redacted from all logging.
  • REDISMODULE_CONFIG_HIDDEN: The name is hidden from CONFIG GET with pattern matching.
  • REDISMODULE_CONFIG_PROTECTED: This config will only be modifiable based on the value of enable-protected-configs.
  • REDISMODULE_CONFIG_DENY_LOADING: This config is not modifiable while the server is loading data.
  • REDISMODULE_CONFIG_MEMORY: For numeric configs, this config will convert data unit notations into their byte equivalent.
  • REDISMODULE_CONFIG_BITFLAGS: For enum configs, this config will allow multiple entries to be combined as bit flags.
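The REDISMODULE_CONFIG_MEMORY flag means that a value such as 64mb reaches the module as a byte count. The sketch below shows that kind of conversion in plain C, assuming the flag follows the unit conventions documented in redis.conf (k = 1000 and kb = 1024 bytes, m/mb and g/gb likewise, units case insensitive). parse_memory_value is a hypothetical helper, not part of the modules API; Redis performs this parsing itself before calling setfn.

```c
#include <ctype.h>
#include <stdlib.h>

/* Case-insensitive string equality, so "MB", "Mb" and "mb" all match. */
static int unit_equals(const char *a, const char *b) {
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) return 0;
        a++;
        b++;
    }
    return *a == *b;
}

/* Hypothetical helper: parses "<digits>[unit]" into bytes. Returns 0 on
 * success, -1 on a malformed number or an unknown unit. Overflow checks
 * are omitted for brevity. */
int parse_memory_value(const char *s, unsigned long long *out) {
    static const struct {
        const char *unit;
        unsigned long long mul;
    } units[] = {
        {"", 1ULL},
        {"k", 1000ULL},
        {"kb", 1024ULL},
        {"m", 1000ULL * 1000},
        {"mb", 1024ULL * 1024},
        {"g", 1000ULL * 1000 * 1000},
        {"gb", 1024ULL * 1024 * 1024},
    };
    char *end;

    if (!isdigit((unsigned char)*s)) return -1;
    unsigned long long n = strtoull(s, &end, 10);
    for (size_t i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
        if (unit_equals(end, units[i].unit)) {
            *out = n * units[i].mul;
            return 0;
        }
    }
    return -1;
}
```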

Default values are used on startup to set the value if it is not provided via the config file or command line. Default values are also used for comparison during a config rewrite.

Notes:

  1. On string config sets, the string passed to the set callback will be freed after execution, so the module must retain it.
  2. On string config gets, the string will not be consumed and will still be valid after execution.

Example implementation:

RedisModuleString *strval;
int adjustable = 1;

RedisModuleString *getStringConfigCommand(const char *name, void *privdata) {
    return strval;
}

int setStringConfigCommand(const char *name, RedisModuleString *new, void *privdata, RedisModuleString **err) {
    if (adjustable) {
        /* Release the previous value (if any) and keep our own reference to
         * the new one, since Redis frees it after this callback returns. */
        if (strval) RedisModule_FreeString(NULL, strval);
        RedisModule_RetainString(NULL, new);
        strval = new;
        return REDISMODULE_OK;
    }
    *err = RedisModule_CreateString(NULL, "Not adjustable.", 15);
    return REDISMODULE_ERR;
}
...
RedisModule_RegisterStringConfig(ctx, "string", NULL, REDISMODULE_CONFIG_DEFAULT, getStringConfigCommand, setStringConfigCommand, NULL, NULL);

If the registration fails, REDISMODULE_ERR is returned and errno is set to one of the following:

  • EINVAL: The provided flags are invalid for the registration or the name of the config contains invalid characters.
  • EALREADY: The provided configuration name is already used.

RedisModule_RegisterBoolConfig

int RedisModule_RegisterBoolConfig(RedisModuleCtx *ctx,
                                   const char *name,
                                   int default_val,
                                   unsigned int flags,
                                   RedisModuleConfigGetBoolFunc getfn,
                                   RedisModuleConfigSetBoolFunc setfn,
                                   RedisModuleConfigApplyFunc applyfn,
                                   void *privdata);

Available since: 7.0.0

Create a bool config that server clients can interact with via the CONFIG SET, CONFIG GET, and CONFIG REWRITE commands. See RedisModule_RegisterStringConfig for detailed information about configs.

RedisModule_RegisterEnumConfig

int RedisModule_RegisterEnumConfig(RedisModuleCtx *ctx,
                                   const char *name,
                                   int default_val,
                                   unsigned int flags,
                                   const char **enum_values,
                                   const int *int_values,
                                   int num_enum_vals,
                                   RedisModuleConfigGetEnumFunc getfn,
                                   RedisModuleConfigSetEnumFunc setfn,
                                   RedisModuleConfigApplyFunc applyfn,
                                   void *privdata);

Available since: 7.0.0

Create an enum config that server clients can interact with via the CONFIG SET, CONFIG GET, and CONFIG REWRITE commands. Enum configs map a set of string tokens to corresponding integer values: the string value is exposed to Redis clients, while the value passed between Redis and the module is the integer value. These values are defined in enum_values, an array of null-terminated C strings, and int_values, an array of integers in which each element is paired with the element at the same index in enum_values. Example implementation:

 const char *enum_vals[3] = {"first", "second", "third"};
 const int int_vals[3] = {0, 2, 4};
 int enum_val = 0;

 int getEnumConfigCommand(const char *name, void *privdata) {
     return enum_val;
 }

 int setEnumConfigCommand(const char *name, int val, void *privdata, RedisModuleString **err) {
     enum_val = val;
     return REDISMODULE_OK;
 }
 ...
 RedisModule_RegisterEnumConfig(ctx, "enum", 0, REDISMODULE_CONFIG_DEFAULT, enum_vals, int_vals, 3, getEnumConfigCommand, setEnumConfigCommand, NULL, NULL);

See RedisModule_RegisterStringConfig for detailed general information about configs.
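To make the token and integer mapping concrete, this sketch performs both lookups over the same enum_vals and int_vals tables used in the enum example. It only illustrates what Redis does on the module's behalf when a client runs CONFIG SET or CONFIG GET; the helper names are hypothetical, and the exact, case-sensitive comparison is an assumption.

```c
#include <stddef.h>
#include <string.h>

/* The same tables as in the registration example. */
#define NUM_ENUM_VALS 3
static const char *enum_vals[NUM_ENUM_VALS] = {"first", "second", "third"};
static const int int_vals[NUM_ENUM_VALS] = {0, 2, 4};

/* CONFIG SET direction: token -> integer. Returns 1 and stores the value in
 * *out if the token is known, 0 otherwise. */
int enum_token_to_int(const char *token, int *out) {
    for (size_t i = 0; i < NUM_ENUM_VALS; i++) {
        if (strcmp(enum_vals[i], token) == 0) {
            *out = int_vals[i];
            return 1;
        }
    }
    return 0;
}

/* CONFIG GET direction: integer -> token, or NULL if no token maps to it. */
const char *enum_int_to_token(int val) {
    for (size_t i = 0; i < NUM_ENUM_VALS; i++) {
        if (int_vals[i] == val) return enum_vals[i];
    }
    return NULL;
}
```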

RedisModule_RegisterNumericConfig

int RedisModule_RegisterNumericConfig(RedisModuleCtx *ctx,
                                      const char *name,
                                      long long default_val,
                                      unsigned int flags,
                                      long long min,
                                      long long max,
                                      RedisModuleConfigGetNumericFunc getfn,
                                      RedisModuleConfigSetNumericFunc setfn,
                                      RedisModuleConfigApplyFunc applyfn,
                                      void *privdata);

Available since: 7.0.0

Create an integer config that server clients can interact with via the CONFIG SET, CONFIG GET, and CONFIG REWRITE commands. See RedisModule_RegisterStringConfig for detailed information about configs.

RedisModule_LoadConfigs

int RedisModule_LoadConfigs(RedisModuleCtx *ctx);

Available since: 7.0.0

Applies all pending configurations on module load. This should be called after all of the configurations have been registered for the module inside of RedisModule_OnLoad. This API needs to be called when configurations are provided either via MODULE LOADEX or as startup arguments.

Key eviction API

RedisModule_SetLRU

int RedisModule_SetLRU(RedisModuleKey *key, mstime_t lru_idle);

Available since: 6.0.0

Sets the key's last access time for LRU based eviction. This is not relevant if the server's maxmemory policy is LFU based. The value is the idle time in milliseconds. Returns REDISMODULE_OK if the LRU was updated, REDISMODULE_ERR otherwise.

RedisModule_GetLRU

int RedisModule_GetLRU(RedisModuleKey *key, mstime_t *lru_idle);

Available since: 6.0.0

Gets the key's last access time. The value is the idle time in milliseconds, or -1 if the server's eviction policy is LFU based. Returns REDISMODULE_OK when the key is valid.

RedisModule_SetLFU

int RedisModule_SetLFU(RedisModuleKey *key, long long lfu_freq);

Available since: 6.0.0

Sets the key's access frequency. This is only relevant if the server's maxmemory policy is LFU based. The frequency is a logarithmic counter that only provides an indication of the access frequency (it must be <= 255). Returns REDISMODULE_OK if the LFU was updated, REDISMODULE_ERR otherwise.

RedisModule_GetLFU

int RedisModule_GetLFU(RedisModuleKey *key, long long *lfu_freq);

Available since: 6.0.0

Gets the key's access frequency, or -1 if the server's eviction policy is not LFU based. Returns REDISMODULE_OK when the key is valid.

Miscellaneous APIs

RedisModule_GetContextFlagsAll

int RedisModule_GetContextFlagsAll();

Available since: 6.0.9

Returns the full ContextFlags mask. Using the return value, the module can check whether a certain set of flags is supported by the Redis server version in use. Example:

   int supportedFlags = RedisModule_GetContextFlagsAll();
   if (supportedFlags & REDISMODULE_CTX_FLAGS_MULTI) {
         // REDISMODULE_CTX_FLAGS_MULTI is supported
   } else {
         // REDISMODULE_CTX_FLAGS_MULTI is not supported
   }

RedisModule_GetKeyspaceNotificationFlagsAll

int RedisModule_GetKeyspaceNotificationFlagsAll();

Available since: 6.0.9

Returns the full KeyspaceNotification mask. Using the return value, the module can check whether a certain set of flags is supported by the Redis server version in use. Example:

   int supportedFlags = RedisModule_GetKeyspaceNotificationFlagsAll();
   if (supportedFlags & REDISMODULE_NOTIFY_LOADED) {
         // REDISMODULE_NOTIFY_LOADED is supported
   } else {
         // REDISMODULE_NOTIFY_LOADED is not supported
   }

RedisModule_GetServerVersion

int RedisModule_GetServerVersion();

Available since: 6.0.9

Returns the Redis version in the format 0x00MMmmpp. For example, for 6.0.7 the return value will be 0x00060007.
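Because each component occupies its own byte, 0x00MMmmpp values compare correctly as plain integers. The sketch below decodes a value and builds one from components; decode_server_version and make_server_version are hypothetical helpers, not part of the API.

```c
/* Hypothetical helpers for the 0x00MMmmpp layout: one byte each for the
 * major, minor and patch numbers. */
void decode_server_version(int v, int *out_major, int *out_minor, int *out_patch) {
    *out_major = (v >> 16) & 0xff;
    *out_minor = (v >> 8) & 0xff;
    *out_patch = v & 0xff;
}

/* Builds the same encoding so a runtime version can be compared against it. */
int make_server_version(int major_v, int minor_v, int patch_v) {
    return (major_v << 16) | (minor_v << 8) | patch_v;
}
```

A module could then guard a code path with, for instance, if (RedisModule_GetServerVersion() >= make_server_version(7, 0, 0)) before relying on a 7.0 feature.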

RedisModule_GetTypeMethodVersion

int RedisModule_GetTypeMethodVersion();

Available since: 6.2.0

Returns the current redis-server runtime value of REDISMODULE_TYPE_METHOD_VERSION. You can use it when calling RedisModule_CreateDataType to know which fields of RedisModuleTypeMethods will be supported and which will be ignored.

RedisModule_ModuleTypeReplaceValue

int RedisModule_ModuleTypeReplaceValue(RedisModuleKey *key,
                                       moduleType *mt,
                                       void *new_value,
                                       void **old_value);

Available since: 6.0.0

Replace the value assigned to a module type.

The key must be open for writing, have an existing value, and have a moduleType that matches the one specified by the caller.

Unlike RedisModule_ModuleTypeSetValue() which will free the old value, this function simply swaps the old value with the new value.

The function returns REDISMODULE_OK on success, REDISMODULE_ERR on errors such as:

  1. Key is not opened for writing.
  2. Key is not a module data type key.
  3. Key is a module datatype other than 'mt'.

If old_value is non-NULL, the old value is returned by reference.

RedisModule_GetCommandKeysWithFlags

int *RedisModule_GetCommandKeysWithFlags(RedisModuleCtx *ctx,
                                         RedisModuleString **argv,
                                         int argc,
                                         int *num_keys,
                                         int **out_flags);

Available since: 7.0.0

For a specified command, parse its arguments and return an array that contains the indexes of all key name arguments. This function is essentially a more efficient way to do COMMAND GETKEYS.

The out_flags argument is optional and can be set to NULL. When provided, it is filled with REDISMODULE_CMD_KEY_ flags at indexes matching the key indexes in the returned array.

A NULL return value indicates the specified command has no keys, or an error condition. Error conditions are indicated by setting errno as follows:

  • ENOENT: Specified command does not exist.
  • EINVAL: Invalid command arity specified.

NOTE: The returned array is not a Redis Module object so it does not get automatically freed even when auto-memory is used. The caller must explicitly call RedisModule_Free() to free it, same as the out_flags pointer if used.

RedisModule_GetCommandKeys

int *RedisModule_GetCommandKeys(RedisModuleCtx *ctx,
                                RedisModuleString **argv,
                                int argc,
                                int *num_keys);

Available since: 6.0.9

Identical to RedisModule_GetCommandKeysWithFlags when flags are not needed.

RedisModule_GetCurrentCommandName

const char *RedisModule_GetCurrentCommandName(RedisModuleCtx *ctx);

Available since: 6.2.5

Returns the name of the command currently running.

Defrag API

RedisModule_RegisterDefragFunc

int RedisModule_RegisterDefragFunc(RedisModuleCtx *ctx,
                                   RedisModuleDefragFunc cb);

Available since: 6.2.0

Register a defrag callback for global data, i.e. anything that the module may allocate that is not tied to a specific data type.

RedisModule_DefragShouldStop

int RedisModule_DefragShouldStop(RedisModuleDefragCtx *ctx);

Available since: 6.2.0

When the data type defrag callback iterates complex structures, this function should be called periodically. A zero (false) return indicates the callback may continue its work. A non-zero value (true) indicates it should stop.

When stopped, the callback may use RedisModule_DefragCursorSet() to store its position so it can later use RedisModule_DefragCursorGet() to resume defragging.

When stopped and more work is left to be done, the callback should return 1. Otherwise, it should return 0.

NOTE: Modules should consider the frequency with which this function is called, so it generally makes sense to do small batches of work in between calls.

RedisModule_DefragCursorSet

int RedisModule_DefragCursorSet(RedisModuleDefragCtx *ctx,
                                unsigned long cursor);

Available since: 6.2.0

Store an arbitrary cursor value for future re-use.

This should only be called if RedisModule_DefragShouldStop() has returned a non-zero value and the defrag callback is about to exit without fully iterating its data type.

This behavior is reserved for cases where late defrag is performed. Late defrag is selected for keys that implement the free_effort callback and return a free_effort value that is larger than the defrag 'active-defrag-max-scan-fields' configuration directive.

Smaller keys, keys that do not implement free_effort, and the global defrag callback are not called in late-defrag mode. In those cases, a call to this function will return REDISMODULE_ERR.

The cursor may be used by the module to represent some progress into the module's data type. Modules may also store additional cursor-related information locally and use the cursor as a flag that indicates when traversal of a new key begins. This is possible because the API makes a guarantee that concurrent defragmentation of multiple keys will not be performed.

RedisModule_DefragCursorGet

int RedisModule_DefragCursorGet(RedisModuleDefragCtx *ctx,
                                unsigned long *cursor);

Available since: 6.2.0

Fetch a cursor value that has been previously stored using RedisModule_DefragCursorSet().

If not called for a late defrag operation, REDISMODULE_ERR will be returned and the cursor should be ignored. See RedisModule_DefragCursorSet() for more details on defrag cursors.
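The contract above (check RedisModule_DefragShouldStop() periodically, save a cursor, return 1 while work remains and 0 when done) can be simulated without a server. In this sketch SimDefragCtx and its fields are hypothetical stand-ins: a fixed per-call budget plays the role of RedisModule_DefragShouldStop(), and the cursor field plays the role of RedisModule_DefragCursorSet() and RedisModule_DefragCursorGet().

```c
/* Hypothetical stand-in for RedisModuleDefragCtx. */
typedef struct {
    unsigned long budget;  /* items allowed per call (plays ShouldStop) */
    unsigned long used;    /* items processed during the current call */
    unsigned long cursor;  /* saved position (plays CursorSet/CursorGet) */
} SimDefragCtx;

static int sim_should_stop(const SimDefragCtx *ctx) {
    return ctx->used >= ctx->budget;
}

/* Late-defrag style callback over n items. Returns 1 when it stopped with
 * work left, 0 once every item has been visited. */
int sim_defrag_step(SimDefragCtx *ctx, int *visited, unsigned long n) {
    unsigned long i = ctx->cursor;  /* resume where the last call stopped */
    ctx->used = 0;
    for (; i < n; i++) {
        if (sim_should_stop(ctx)) {
            ctx->cursor = i;        /* remember progress for the next call */
            return 1;
        }
        visited[i]++;               /* placeholder for defragging item i */
        ctx->used++;
    }
    ctx->cursor = 0;                /* done: the next key starts from scratch */
    return 0;
}
```

With a budget of 3 and 10 items, the step returns 1 three times and then 0, and every item is visited exactly once.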

RedisModule_DefragAlloc

void *RedisModule_DefragAlloc(RedisModuleDefragCtx *ctx, void *ptr);

Available since: 6.2.0

Defrag a memory allocation previously allocated by RedisModule_Alloc, RedisModule_Calloc, etc. The defragmentation process involves allocating a new memory block and copying the contents to it, like realloc().

If defragmentation was not necessary, NULL is returned and the operation has no other effect.

If a non-NULL value is returned, the caller should use the new pointer instead of the old one and update any reference to the old pointer, which must not be used again.
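The NULL-or-new-pointer rule leads to a simple caller-side pattern: replace the stored reference only when a non-NULL value is returned. In this sketch mock_defrag_alloc is a hypothetical stand-in for RedisModule_DefragAlloc(); unlike the real function it takes an explicit size and a move flag, so that the example can run without a server.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for RedisModule_DefragAlloc(). When move is non-zero
 * it copies the block into a new allocation, frees the old one and returns
 * the new pointer; otherwise it returns NULL ("nothing to do"). */
void *mock_defrag_alloc(void *ptr, size_t size, int move) {
    if (!move) return NULL;
    void *fresh = malloc(size);
    if (fresh == NULL) return NULL;  /* treat failure as "not moved" */
    memcpy(fresh, ptr, size);
    free(ptr);
    return fresh;
}

typedef struct {
    char *name;
    size_t name_size;
} Item;

/* Caller-side pattern: only replace the reference when a new pointer comes
 * back; the old pointer must not be used afterwards. */
void defrag_item(Item *it, int move) {
    char *moved = mock_defrag_alloc(it->name, it->name_size, move);
    if (moved != NULL) it->name = moved;
}
```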

RedisModule_DefragRedisModuleString

RedisModuleString *RedisModule_DefragRedisModuleString(RedisModuleDefragCtx *ctx,
                                                       RedisModuleString *str);

Available since: 6.2.0

Defrag a RedisModuleString previously allocated by RedisModule_Alloc, RedisModule_Calloc, etc. See RedisModule_DefragAlloc() for more information on how the defragmentation process works.

NOTE: It is only possible to defrag strings that have a single reference. Typically this means strings retained with RedisModule_RetainString or RedisModule_HoldString may not be defragmentable. One exception is command argvs which, if retained by the module, will end up with a single reference (because the reference on the Redis side is dropped as soon as the command callback returns).

RedisModule_GetKeyNameFromDefragCtx

const RedisModuleString *RedisModule_GetKeyNameFromDefragCtx(RedisModuleDefragCtx *ctx);

Available since: 7.0.0

Returns the name of the key currently being processed. There is no guarantee that the key name is always available, so this may return NULL.

RedisModule_GetDbIdFromDefragCtx

int RedisModule_GetDbIdFromDefragCtx(RedisModuleDefragCtx *ctx);

Available since: 7.0.0

Returns the database id of the key currently being processed. There is no guarantee that this info is always available, so this may return -1.


10.2 - Redis modules and blocking commands

How to implement blocking commands in a Redis module

Redis has a few blocking commands among the built-in set of commands. One of the most used is BLPOP (or the symmetric BRPOP) which blocks waiting for elements arriving in a list.

The interesting fact about blocking commands is that they do not block the whole server, but just the client calling them. Usually the reason to block is that we expect some external event to happen: this can be some change in the Redis data structures like in the BLPOP case, a long computation happening in a thread, receiving some data from the network, and so forth.

Redis modules have the ability to implement blocking commands as well. This documentation shows how the API works and describes a few patterns that can be used in order to model blocking commands.

NOTE: This API is currently experimental, so it can only be used if the macro REDISMODULE_EXPERIMENTAL_API is defined. This is required because these calls are still not in their final stage of design, so they may change in the future, certain parts may be deprecated, and so forth.

To use this part of the modules API, include the modules header like this:

#define REDISMODULE_EXPERIMENTAL_API
#include "redismodule.h"

How blocking and resuming works.

Note: You may want to check the helloblock.c example in the Redis source tree inside the src/modules directory for a simple, easy to understand example of how the blocking API is applied.

In Redis modules, commands are implemented by callback functions that are invoked by the Redis core when the specific command is called by the user. Normally the callback terminates its execution by sending some reply to the client. Using the following function instead, the function implementing the module command may request that the client be put into the blocked state:

RedisModuleBlockedClient *RedisModule_BlockClient(RedisModuleCtx *ctx, RedisModuleCmdFunc reply_callback, RedisModuleCmdFunc timeout_callback, void (*free_privdata)(void*), long long timeout_ms);

The function returns a RedisModuleBlockedClient object, which is later used in order to unblock the client. The arguments have the following meaning:

  • ctx is the command execution context, as usual in the rest of the API.
  • reply_callback is the callback, with the same prototype as a normal command function, that is called when the client is unblocked in order to return a reply to the client.
  • timeout_callback is the callback, with the same prototype as a normal command function, that is called when the client reaches the timeout.
  • free_privdata is the callback that is called in order to free the private data. Private data is a pointer to some data that is passed from the API used to unblock the client to the callback that will send the reply to the client. We'll see how this mechanism works later in this document.
  • timeout_ms is the timeout in milliseconds (0 means no timeout). When the timeout is reached, the timeout callback is called and the client is automatically aborted.

Once a client is blocked, it can be unblocked with the following API:

int RedisModule_UnblockClient(RedisModuleBlockedClient *bc, void *privdata);

The function takes as argument the blocked client object returned by the previous call to RedisModule_BlockClient(), and unblocks the client. Immediately before the client gets unblocked, the reply_callback function specified when the client was blocked is called: this function will have access to the privdata pointer used here.

IMPORTANT: The above function is thread safe, and can be called from within a thread doing some work in order to implement the command that blocked the client.

The privdata will be freed automatically using the free_privdata callback when the client is unblocked. This matters because the reply callback may never be called if the client times out or disconnects from the server, so the responsibility for freeing the data, if needed, must lie with a separate function.

To better understand how the API works, we can imagine writing a command that blocks a client for one second, and then sends "Hello!" as the reply.

Note: arity checks and other non-essential things are not implemented in this command, in order to keep the example simple.

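#include <pthread.h>
#include <unistd.h>

/* Forward declarations: the thread body and the callbacks are defined below. */
void *threadmain(void *arg);
int reply_func(RedisModuleCtx *ctx, RedisModuleString **argv, int argc);
int timeout_func(RedisModuleCtx *ctx, RedisModuleString **argv, int argc);

int Example_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv,
                         int argc)
{
    RedisModuleBlockedClient *bc =
        RedisModule_BlockClient(ctx,reply_func,timeout_func,NULL,0);

    pthread_t tid;
    /* Error handling is omitted here; see "Aborting the blocking of a client". */
    if (pthread_create(&tid,NULL,threadmain,bc) == 0)
        pthread_detach(tid); /* The thread is never joined. */

    return REDISMODULE_OK;
}

void *threadmain(void *arg) {
    RedisModuleBlockedClient *bc = arg;

    sleep(1); /* Wait one second and unblock. */
    RedisModule_UnblockClient(bc,NULL);
    return NULL;
}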

The above command blocks the client immediately, spawning a thread that waits one second and then unblocks the client. Let's check the reply and timeout callbacks, which in our case are very similar, since they just reply to the client with a different reply type.

int reply_func(RedisModuleCtx *ctx, RedisModuleString **argv,
               int argc)
{
    return RedisModule_ReplyWithSimpleString(ctx,"Hello!");
}

int timeout_func(RedisModuleCtx *ctx, RedisModuleString **argv,
               int argc)
{
    return RedisModule_ReplyWithNull(ctx);
}

The reply callback just sends the "Hello!" string to the client. The important bit here is that the reply callback is called when the client is unblocked from the thread.

The timeout callback replies with NULL, as often happens when actual Redis blocking commands time out.

Passing reply data when unblocking

The above example is simple to understand but lacks an important real world aspect of an actual blocking command implementation: often the reply function will need to know what to reply to the client, and this information is often provided as the client is unblocked.

We could modify the above example so that the thread generates a random number after waiting one second. You can think of it as an actual expensive operation of some kind. This random number can then be passed to the reply function so that we return it to the command caller. To make this work, we modify the functions as follows:

void *threadmain(void *arg) {
    RedisModuleBlockedClient *bc = arg;

    sleep(1); /* Wait one second and unblock. */

    long *mynumber = RedisModule_Alloc(sizeof(long));
    *mynumber = rand(); /* rand() is declared in <stdlib.h>. */
    RedisModule_UnblockClient(bc,mynumber);
    return NULL;
}

As you can see, now the unblocking call is passing some private data, that is the mynumber pointer, to the reply callback. In order to obtain this private data, the reply callback will use the following function:

void *RedisModule_GetBlockedClientPrivateData(RedisModuleCtx *ctx);

So our reply callback is modified like this:

int reply_func(RedisModuleCtx *ctx, RedisModuleString **argv,
               int argc)
{
    long *mynumber = RedisModule_GetBlockedClientPrivateData(ctx);
    /* IMPORTANT: don't free mynumber here, but in the
     * free privdata callback. */
    return RedisModule_ReplyWithLongLong(ctx,*mynumber);
}

Note that we also need to pass a free_privdata function when blocking the client with RedisModule_BlockClient(), since the allocated long value must be freed. Our callback will look like the following:

void free_privdata(void *privdata) {
    RedisModule_Free(privdata);
}

NOTE: It is important to stress that the private data is best freed in the free_privdata callback because the reply function may not be called if the client disconnects or times out.

Note that the private data is also accessible from the timeout callback, again using the RedisModule_GetBlockedClientPrivateData() API.
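The ownership rule can be checked with a small mock. MockBlockedClient, mock_unblock and the counters below are hypothetical and do not reflect how Redis implements blocked clients; they only model the documented order, in which the reply callback may or may not run but free_privdata always runs.

```c
#include <stdlib.h>

/* Hypothetical mock; not Redis's RedisModuleBlockedClient. */
typedef struct {
    int (*reply)(void *privdata);
    void (*free_privdata)(void *privdata);
} MockBlockedClient;

int replies_sent = 0;
int privdata_freed = 0;
long last_reply = 0;

/* Reads the private data but never frees it. */
int mock_reply(void *privdata) {
    last_reply = *(const long *)privdata;
    replies_sent++;
    return 0;
}

void mock_free_privdata(void *privdata) {
    free(privdata);
    privdata_freed++;
}

/* Models the documented order: the reply callback runs only if the client
 * is still connected, while free_privdata runs in every case. */
void mock_unblock(MockBlockedClient *bc, void *privdata, int client_connected) {
    if (client_connected) bc->reply(privdata);
    if (bc->free_privdata != NULL) bc->free_privdata(privdata);
}
```

If mock_reply freed the data itself, the disconnected case would leak it, which is exactly why the free belongs in free_privdata.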

Aborting the blocking of a client

One problem that sometimes arises is that we need to allocate resources in order to implement the blocking command. So we block the client, then, for example, try to create a thread, but the thread creation function returns an error. How do we recover from such a condition? We don't want to leave the client blocked, nor do we want to call UnblockClient(), because this would trigger the reply callback.

In this case the best thing to do is to use the following function:

int RedisModule_AbortBlock(RedisModuleBlockedClient *bc);

Practically this is how to use it:

int Example_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv,
                         int argc)
{
    RedisModuleBlockedClient *bc =
        RedisModule_BlockClient(ctx,reply_func,timeout_func,NULL,0);

    pthread_t tid;
    if (pthread_create(&tid,NULL,threadmain,bc) != 0) {
        RedisModule_AbortBlock(bc);
        RedisModule_ReplyWithError(ctx,"Sorry can't create a thread");
    }

    return REDISMODULE_OK;
}

The client will be unblocked but the reply callback will not be called.

Implementing the command, reply and timeout callback using a single function

The following functions can be used in order to implement the reply and timeout callbacks with the same function that implements the primary command:

int RedisModule_IsBlockedReplyRequest(RedisModuleCtx *ctx);
int RedisModule_IsBlockedTimeoutRequest(RedisModuleCtx *ctx);

So we could rewrite the example command without using separate reply and timeout callbacks:

int Example_RedisCommand(RedisModuleCtx *ctx, RedisModuleString **argv,
                         int argc)
{
    if (RedisModule_IsBlockedReplyRequest(ctx)) {
        long *mynumber = RedisModule_GetBlockedClientPrivateData(ctx);
        return RedisModule_ReplyWithLongLong(ctx,*mynumber);
    } else if (RedisModule_IsBlockedTimeoutRequest(ctx)) {
        return RedisModule_ReplyWithNull(ctx);
    }

    /* The command function itself serves as reply and timeout callback. */
    RedisModuleBlockedClient *bc =
        RedisModule_BlockClient(ctx,Example_RedisCommand,Example_RedisCommand,
                                free_privdata,0);

    pthread_t tid;
    if (pthread_create(&tid,NULL,threadmain,bc) != 0) {
        RedisModule_AbortBlock(bc);
        RedisModule_ReplyWithError(ctx,"Sorry can't create a thread");
    }

    return REDISMODULE_OK;
}

Functionally this is the same, but some people prefer the less verbose implementation that concentrates most of the command logic in a single function.

Working on copies of data inside a thread

An interesting pattern for working with threads that implement the slow part of a command is to work on a copy of the data, so that while some operation is performed on a key, the user continues to see the old version. When the thread has finished its work, the representations are swapped and the new, processed version is used.

An example of this approach is the Neural Redis module where neural networks are trained in different threads while the user can still execute and inspect their older versions.
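A minimal sketch of the copy-then-swap idea, using a hypothetical Vec value type: the main thread takes a private copy, a worker would process that copy while readers keep seeing the live value, and the main thread finally swaps the result in and releases the old version. The doubling loop is a placeholder for real work, and threading is left out so the example stays self-contained. Note that any writes made to the live value while the copy is being processed are discarded by the swap.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical value type stored under a key. */
typedef struct {
    size_t len;
    double *items;
} Vec;

Vec *vec_new(const double *items, size_t len) {
    Vec *v = malloc(sizeof(*v));
    if (v == NULL) return NULL;
    v->len = len;
    v->items = malloc(len * sizeof(double));
    if (len > 0 && v->items == NULL) { free(v); return NULL; }
    if (len > 0) memcpy(v->items, items, len * sizeof(double));
    return v;
}

void vec_free(Vec *v) {
    if (v == NULL) return;
    free(v->items);
    free(v);
}

/* Step 1, main thread: take a private snapshot of the live value. */
Vec *vec_snapshot(const Vec *live) { return vec_new(live->items, live->len); }

/* Step 2, worker thread: slow processing on the snapshot only. */
void process_copy(Vec *copy) {
    for (size_t i = 0; i < copy->len; i++) copy->items[i] *= 2.0;
}

/* Step 3, main thread: publish the processed version, release the old one. */
void swap_in(Vec **live, Vec *processed) {
    Vec *old = *live;
    *live = processed;
    vec_free(old);
}
```

For a module data type, RedisModule_ModuleTypeReplaceValue() can perform the swap step on the key, returning the old value so that the module can free it.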

Future work

An API is work in progress right now in order to allow Redis modules APIs to be called in a safe way from threads, so that the threaded command can access the data space and do incremental operations.

There is no ETA for this feature but it may appear in the course of the Redis 4.0 release at some point.

10.3 - Modules API for native types

How to use native types in a Redis module

Redis modules can access Redis built-in data structures both at high level, by calling Redis commands, and at low level, by manipulating the data structures directly.

By using these capabilities in order to build new abstractions on top of existing Redis data structures, or by using strings DMA in order to encode modules data structures into Redis strings, it is possible to create modules that feel like they are exporting new data types. However, for more complex problems, this is not enough, and the implementation of new data structures inside the module is needed.

We call the ability of Redis modules to implement new data structures that feel like native Redis ones "native types support". This document describes the API exported by the Redis modules system to create new data structures and to handle serialization in RDB files, the AOF rewriting process, type reporting via the TYPE command, and so forth.

Overview of native types

A module exporting a native type is composed of the following main parts:

  • The implementation of some kind of new data structure and of commands operating on the new data structure.
  • A set of callbacks that handle: RDB saving, RDB loading, AOF rewriting, releasing of a value associated with a key, calculation of a value digest (hash) to be used with the DEBUG DIGEST command.
  • A 9-character name that is unique to each module native data type.
  • An encoding version, used to persist into RDB files a module-specific data version, so that a module will be able to load older representations from RDB files.

Handling RDB loading, RDB saving and AOF rewriting may look complex at first glance. However, the modules API provides very high-level functions for all of this and does not require the user to handle read/write errors. In practical terms, writing a new data structure for Redis is a simple task.

A very easy to understand but complete example of a native type implementation is available inside the Redis distribution in the /modules/hellotype.c file. The reader is encouraged to read the documentation while looking at this example implementation, to see how things are applied in practice.

Registering a new data type

In order to register a new native type into the Redis core, the module needs to declare a global variable that will hold a reference to the data type. The API to register the data type will return a data type reference that will be stored in the global variable.

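static RedisModuleType *MyType;
#define MYTYPE_ENCODING_VERSION 0

int RedisModule_OnLoad(RedisModuleCtx *ctx, RedisModuleString **argv, int argc) {
    /* Every module must call RedisModule_Init() first;
     * "mymodule" is a placeholder module name. */
    if (RedisModule_Init(ctx,"mymodule",1,REDISMODULE_APIVER_1) == REDISMODULE_ERR)
        return REDISMODULE_ERR;

    RedisModuleTypeMethods tm = {
        .version = REDISMODULE_TYPE_METHOD_VERSION,
        .rdb_load = MyTypeRDBLoad,
        .rdb_save = MyTypeRDBSave,
        .aof_rewrite = MyTypeAOFRewrite,
        .free = MyTypeFree
    };

    MyType = RedisModule_CreateDataType(ctx, "MyType-AZ",
        MYTYPE_ENCODING_VERSION, &tm);
    if (MyType == NULL) return REDISMODULE_ERR;
    return REDISMODULE_OK;
}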

As you can see from the example above, a single API call is needed to register the new type. However, a number of function pointers are passed as arguments. Some are optional, while others are mandatory. The set of methods above must be passed, while .digest and .mem_usage are optional and are currently not actually supported by the modules internals, so for now you can just ignore them.

The ctx argument is the context that we receive in the OnLoad function. The type name is a 9-character name drawn from the character set A-Z, a-z, 0-9, plus the underscore _ and minus - characters.

Note that this name must be unique for each data type in the Redis ecosystem. So be creative: use both lower-case and upper-case characters if it makes sense. Also try to follow the convention of mixing the type name with the name of the module's author to create a unique 9-character name.

NOTE: It is very important that the name is exactly 9 characters long, or the registration of the type will fail. Read on to understand why.

For example, if I'm building a b-tree data structure and my name is antirez, I'll call my type btree1-az. The name, converted to a 64-bit integer, is stored inside the RDB file when saving the type. It is used when the RDB data is loaded to resolve which module can load the data. If Redis finds no matching module, the integer is converted back to a name, giving the user a clue about which module is missing.

The type name is also used as a reply for the TYPE command when called with a key holding the registered type.

The encver argument is the encoding version used by the module to store data inside the RDB file. For example, I can start with an encoding version of 0, and later, when I release version 2.0 of my module, switch to a better encoding. The new module will register with an encoding version of 1, so new RDB files it saves will store the new version on disk. When loading RDB files, however, the module's rdb_load method is called even if the data was saved with a different encoding version. The encoding version is passed as an argument to rdb_load, so that the module can still load old RDB files.

The last argument is a structure used to pass the type methods to the registration function. rdb_load, rdb_save, aof_rewrite, digest, free and mem_usage are all callbacks with the following prototypes and uses:

typedef void *(*RedisModuleTypeLoadFunc)(RedisModuleIO *rdb, int encver);
typedef void (*RedisModuleTypeSaveFunc)(RedisModuleIO *rdb, void *value);
typedef void (*RedisModuleTypeRewriteFunc)(RedisModuleIO *aof, RedisModuleString *key, void *value);
typedef size_t (*RedisModuleTypeMemUsageFunc)(void *value);
typedef void (*RedisModuleTypeDigestFunc)(RedisModuleDigest *digest, void *value);
typedef void (*RedisModuleTypeFreeFunc)(void *value);

  • rdb_load is called when loading data from the RDB file. It loads data in the same format that rdb_save produces.
  • rdb_save is called when saving data to the RDB file.
  • aof_rewrite is called when the AOF is being rewritten and the module needs to tell Redis the sequence of commands that recreates the content of a given key.
  • digest is called when DEBUG DIGEST is executed and a key holding this module type is found. Currently this is not yet implemented, so the function can be left empty.
  • mem_usage is called when the MEMORY command asks for the total memory consumed by a specific key, and is used to get the number of bytes used by the module value.
  • free is called when a key with the module native type is deleted via DEL or by any other means, to let the module reclaim the memory associated with such a value.

OK, but why do module types require a 9-character name?

Oh, I understand you need to know this, so here is a very specific explanation.

When Redis persists to RDB files, module-specific data types must be persisted as well. RDB files are sequences of key-value pairs like the following:

[1 byte type] [key] [a type specific value]

The 1-byte type identifies strings, lists, sets, and so forth. In the case of module data, it is set to a special "module data" value. Of course, this is not enough: we also need the information that links a specific value to the specific module type that is able to load and handle it.

So when we save a type-specific value belonging to a module, we prefix it with a 64-bit integer. 64 bits is large enough to store the information needed to look up the module that can handle that specific type. It is also short enough that we can prefix each module value stored inside the RDB without making the final RDB file too big. This solution of prefixing the value with a 64-bit signature also avoids strange things like defining a list of module-specific types in the RDB header. Everything is pretty simple.

So, what can you store in 64 bits to identify a given module reliably? If you build a character set of 64 symbols, you can easily store 9 characters of 6 bits each. That leaves 10 bits, which are used to store the encoding version of the type. This way, the same type can evolve in the future and provide a different, more efficient or updated serialization format for RDB files.

So the 64 bit prefix stored before each module value is like the following:

6|6|6|6|6|6|6|6|6|10

The first 9 elements are 6-bit characters; the final 10 bits hold the encoding version.

When the RDB file is loaded back, it reads the 64 bit value, masks the final 10 bits, and searches for a matching module in the modules types cache. When a matching one is found, the method to load the RDB file value is called with the 10 bits encoding version as argument, so that the module knows what version of the data layout to load, if it can support multiple versions.

The interesting thing about all this is what happens when the module type cannot be resolved because no loaded module has this signature. In that case, we can convert the 64-bit value back into a 9-character name and print an error that includes the module type name, so the user immediately realizes what's wrong.
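The 6|6|6|6|6|6|6|6|6|10 layout above can be illustrated with a standalone sketch of the encoding and decoding steps. The function names are hypothetical. The exact order of symbols in the character set is an assumption made for illustration, not necessarily the order Redis uses internally. The point is the 9 × 6 + 10 = 64 bit packing and the fact that it is reversible.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed 64-symbol character set (A-Z, a-z, 0-9, '-', '_'); the order is
 * chosen for illustration only. */
static const char *TYPE_NAME_CHARSET =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-_";

/* Pack a 9-character name (6 bits per character) followed by a 10-bit
 * encoding version into 64 bits. Returns 0 on invalid input. */
static uint64_t type_encode_id(const char *name, int encver) {
    if (strlen(name) != 9) return 0;
    if (encver < 0 || encver > 1023) return 0;
    uint64_t id = 0;
    for (int j = 0; j < 9; j++) {
        const char *p = strchr(TYPE_NAME_CHARSET, name[j]);
        if (p == NULL) return 0;              /* character outside the set */
        id = (id << 6) | (uint64_t)(p - TYPE_NAME_CHARSET);
    }
    return (id << 10) | (uint64_t)encver;
}

/* The low 10 bits hold the encoding version. */
static int type_encver(uint64_t id) {
    return (int)(id & 1023);
}

/* Recover the 9-character name; out must have room for 10 bytes. */
static void type_decode_name(uint64_t id, char *out) {
    id >>= 10;                                /* drop the encoding version */
    for (int j = 8; j >= 0; j--) {
        out[j] = TYPE_NAME_CHARSET[id & 63];  /* last character is lowest */
        id >>= 6;
    }
    out[9] = '\0';
}
```

With this layout, a name such as btree1-az registered with encoding version 1 decodes back to the same name and version. Names that are not exactly 9 characters, or that contain characters outside the set, cannot be encoded at all, which is why registration fails for them.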

Setting and getting keys

After registering our new data type in the RedisModule_OnLoad() function, we also need to be able to set Redis keys whose value is our native type.

This normally happens in the context of commands that write data to a key. The native types API allows setting and getting keys that hold module native data types. It also allows testing whether a given key is already associated with a value of a specific data type.

The API uses the normal modules RedisModule_OpenKey() low level key access interface in order to deal with this. This is an example of setting a native type private data structure to a Redis key:

RedisModuleKey *key = RedisModule_OpenKey(ctx,keyname,REDISMODULE_WRITE);
struct some_private_struct *data = createMyDataStructure();
RedisModule_ModuleTypeSetValue(key,MyType,data);

The function RedisModule_ModuleTypeSetValue() is used with a key handle open for writing, and gets three arguments: the key handle, the reference to the native type, as obtained during the type registration, and finally a void* pointer that contains the private data implementing the module native type.

Note that Redis has no clue at all about what your data contains. It will just call the callbacks you provided during the type registration to perform operations on the type.

Similarly we can retrieve the private data from a key using this function:

struct some_private_struct *data;
data = RedisModule_ModuleTypeGetValue(key);

We can also test whether a key holds a value of our native type:

if (RedisModule_ModuleTypeGetType(key) == MyType) {
    /* ... do something ... */
}

However for the calls to do the right thing, we need to check if the key is empty, if it contains a value of the right kind, and so forth. So the idiomatic code to implement a command writing to our native type is along these lines:

RedisModuleKey *key = RedisModule_OpenKey(ctx,argv[1],
    REDISMODULE_READ|REDISMODULE_WRITE);
int type = RedisModule_KeyType(key);
if (type != REDISMODULE_KEYTYPE_EMPTY &&
    RedisModule_ModuleTypeGetType(key) != MyType)
{
    return RedisModule_ReplyWithError(ctx,REDISMODULE_ERRORMSG_WRONGTYPE);
}

Then if we successfully verified the key is not of the wrong type, and we are going to write to it, we usually want to create a new data structure if the key is empty, or retrieve the reference to the value associated to the key if there is already one:

/* Create an empty value object if the key is currently empty. */
struct some_private_struct *data;
if (type == REDISMODULE_KEYTYPE_EMPTY) {
    data = createMyDataStructure();
    RedisModule_ModuleTypeSetValue(key,MyType,data);
} else {
    data = RedisModule_ModuleTypeGetValue(key);
}
/* Do something with 'data'... */

Free method

As already mentioned, when Redis needs to free a key holding a native type value, it needs help from the module in order to release the memory. This is the reason why we pass a free callback during the type registration:

typedef void (*RedisModuleTypeFreeFunc)(void *value);

A trivial implementation of the free method can be something like this, assuming our data structure is composed of a single allocation:

void MyTypeFreeCallback(void *value) {
    RedisModule_Free(value);
}

However, a more realistic implementation will call some function that performs more complex memory reclaiming. It casts the void pointer to some structure and frees all the resources composing the value.
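The following standalone sketch shows such a free callback for a value made of several allocations. The struct my_value type and its fields are hypothetical. tracked_alloc() and tracked_free() are stand-ins for RedisModule_Alloc() and RedisModule_Free() that also count live blocks, so you can see that the callback releases everything it owns.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-ins for RedisModule_Alloc()/RedisModule_Free() that count live
 * allocations, so we can check the free callback releases everything. */
static int live_allocations = 0;

static void *tracked_alloc(size_t size) {
    live_allocations++;
    return malloc(size);
}

static void tracked_free(void *ptr) {
    if (ptr == NULL) return;
    live_allocations--;
    free(ptr);
}

/* Hypothetical native type value made of three separate allocations. */
struct my_value {
    char *label;
    size_t count;
    double *samples;
};

static struct my_value *my_value_create(size_t count) {
    struct my_value *v = tracked_alloc(sizeof(*v));
    v->label = tracked_alloc(16);
    v->label[0] = '\0';
    v->count = count;
    v->samples = tracked_alloc(count * sizeof(double));
    return v;
}

/* Free callback with the RedisModuleTypeFreeFunc shape: void (*)(void *). */
static void MyTypeFree(void *value) {
    struct my_value *v = value;   /* cast the opaque pointer back */
    tracked_free(v->samples);     /* release the inner resources first... */
    tracked_free(v->label);
    tracked_free(v);              /* ...then the container itself */
}
```

The order matters only in one respect: the container must be freed last, because its fields are needed to reach the inner allocations.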

RDB load and save methods

The RDB saving and loading callbacks need to create (and load back) a representation of the data type on disk. Redis offers a high-level API that can automatically store the following types inside the RDB file:

  • Unsigned 64 bit integers.
  • Signed 64 bit integers.
  • Doubles.
  • Strings.

It is up to the module to find a viable representation using the above base types. Note that integer and double values are stored and loaded in an architecture- and endianness-agnostic way. If you use the raw string saving API to, for example, save a structure on disk, you have to take care of those details yourself.
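One way to handle those details is sketched below: serialize each field into a fixed byte order (little-endian here, an arbitrary choice) instead of copying the raw struct. A raw copy would bake in the host's endianness, padding and type sizes. The struct sample type and the helper names are hypothetical. The resulting buffer is the kind of thing you could then pass to RedisModule_SaveStringBuffer().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record. Saving it with a raw memcpy() would store padding and
 * the host's byte order, making the RDB file non-portable. */
struct sample {
    uint32_t sensor_id;
    uint64_t timestamp;
};

#define SAMPLE_WIRE_SIZE 12   /* 4 + 8 bytes, no padding */

static void put_u32_le(unsigned char *p, uint32_t v) {
    for (int i = 0; i < 4; i++) p[i] = (unsigned char)(v >> (8 * i));
}

static void put_u64_le(unsigned char *p, uint64_t v) {
    for (int i = 0; i < 8; i++) p[i] = (unsigned char)(v >> (8 * i));
}

static uint32_t get_u32_le(const unsigned char *p) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v |= (uint32_t)p[i] << (8 * i);
    return v;
}

static uint64_t get_u64_le(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 0; i < 8; i++) v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Serialize field by field, in a fixed order and byte order. */
static void sample_encode(const struct sample *s, unsigned char out[SAMPLE_WIRE_SIZE]) {
    put_u32_le(out, s->sensor_id);
    put_u64_le(out + 4, s->timestamp);
}

/* Inverse of sample_encode(); works the same on any host. */
static void sample_decode(const unsigned char in[SAMPLE_WIRE_SIZE], struct sample *s) {
    s->sensor_id = get_u32_le(in);
    s->timestamp = get_u64_le(in + 4);
}
```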

This is the list of functions performing RDB saving and loading:

void RedisModule_SaveUnsigned(RedisModuleIO *io, uint64_t value);
uint64_t RedisModule_LoadUnsigned(RedisModuleIO *io);
void RedisModule_SaveSigned(RedisModuleIO *io, int64_t value);
int64_t RedisModule_LoadSigned(RedisModuleIO *io);
void RedisModule_SaveString(RedisModuleIO *io, RedisModuleString *s);
void RedisModule_SaveStringBuffer(RedisModuleIO *io, const char *str, size_t len);
RedisModuleString *RedisModule_LoadString(RedisModuleIO *io);
char *RedisModule_LoadStringBuffer(RedisModuleIO *io, size_t *lenptr);
void RedisModule_SaveDouble(RedisModuleIO *io, double value);
double RedisModule_LoadDouble(RedisModuleIO *io);

The functions don't require any error checking from the module, which can always assume the calls succeed.

As an example, imagine I have a native type that implements an array of double values, with the following structure:

struct double_array {
    size_t count;
    double *values;
};

My rdb_save method may look like the following:

void DoubleArrayRDBSave(RedisModuleIO *io, void *ptr) {
    struct double_array *da = ptr;
    RedisModule_SaveUnsigned(io,da->count);
    for (size_t j = 0; j < da->count; j++)
        RedisModule_SaveDouble(io,da->values[j]);
}

What we did was store the number of elements followed by each double value. So when we later have to load the structure in the rdb_load method, we'll do something like this:

void *DoubleArrayRDBLoad(RedisModuleIO *io, int encver) {
    if (encver != DOUBLE_ARRAY_ENC_VER) {
        /* We should actually log an error here, or try to implement
           the ability to load older versions of our data structure. */
        return NULL;
    }

    struct double_array *da;
    da = RedisModule_Alloc(sizeof(*da));
    da->count = RedisModule_LoadUnsigned(io);
    da->values = RedisModule_Alloc(da->count * sizeof(double));
    for (size_t j = 0; j < da->count; j++)
        da->values[j] = RedisModule_LoadDouble(io);
    return da;
}

The load callback just reconstructs the data structure from the data we stored in the RDB file.

Note that while there is no error handling on the API that writes to and reads from disk, the load callback can still return NULL on errors, in case what it reads does not look correct. Redis will just panic in that case.
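The key property is that rdb_load reads values back in exactly the order rdb_save wrote them. The standalone sketch below makes this visible by replacing RedisModuleIO with a hypothetical in-memory MockIO that only preserves order. SaveUnsigned(), LoadDouble() and the other helpers are stand-ins, not the real API, and the value chosen for DOUBLE_ARRAY_ENC_VER is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* In-memory stand-in for RedisModuleIO: a sequence of 64-bit slots.
 * No bounds checking; enough for a small sketch. */
typedef struct {
    union { uint64_t u; double d; } slots[64];
    size_t wpos, rpos;
} MockIO;

static void SaveUnsigned(MockIO *io, uint64_t v) { io->slots[io->wpos++].u = v; }
static void SaveDouble(MockIO *io, double v)     { io->slots[io->wpos++].d = v; }
static uint64_t LoadUnsigned(MockIO *io)         { return io->slots[io->rpos++].u; }
static double LoadDouble(MockIO *io)             { return io->slots[io->rpos++].d; }

#define DOUBLE_ARRAY_ENC_VER 0

struct double_array {
    size_t count;
    double *values;
};

/* Same layout as DoubleArrayRDBSave: the count, then each value. */
static void da_save(MockIO *io, const struct double_array *da) {
    SaveUnsigned(io, da->count);
    for (size_t j = 0; j < da->count; j++) SaveDouble(io, da->values[j]);
}

/* Reads back in exactly the order da_save() wrote; rejects unknown versions. */
static struct double_array *da_load(MockIO *io, int encver) {
    if (encver != DOUBLE_ARRAY_ENC_VER) return NULL;
    struct double_array *da = malloc(sizeof(*da));
    da->count = LoadUnsigned(io);
    da->values = malloc(da->count * sizeof(double));
    for (size_t j = 0; j < da->count; j++) da->values[j] = LoadDouble(io);
    return da;
}
```

If the two callbacks ever disagree on order or count, the loader silently reads the wrong fields. That is why it pays to keep save and load next to each other in the source.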

AOF rewriting

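Inside the aof_rewrite callback, the module tells Redis which commands recreate the content of the key by emitting them with:

void RedisModule_EmitAOF(RedisModuleIO *io, const char *cmdname, const char *fmt, ...);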

Handling multiple encodings

WORK IN PROGRESS

Allocating memory

Module data types should try to use the RedisModule_Alloc() family of functions to allocate, reallocate and release the heap memory used to implement the native data structures (see the other Redis Modules documentation for detailed information).

This is not only useful for letting Redis account for the memory used by the module; there are also other advantages:

  • Redis uses the jemalloc allocator, which often prevents fragmentation problems that could be caused by using the libc allocator.
  • When loading strings from the RDB file, the native types API is able to return strings allocated directly with RedisModule_Alloc(). The module can then link this memory directly into the data structure representation, avoiding a useless copy of the data.

Even if you are using external libraries to implement your data structures, the allocation functions provided by the module API are exactly compatible with malloc(), realloc(), free() and strdup(). Converting the libraries to use these functions should therefore be trivial.

Suppose you have an external library that uses libc malloc() and you want to avoid manually replacing all the calls with Redis Modules API calls. One approach is to use simple macros that replace the libc calls with the Redis API calls. Something like this could work:

#define malloc RedisModule_Alloc
#define realloc RedisModule_Realloc
#define free RedisModule_Free
#define strdup RedisModule_Strdup

However, keep in mind that mixing libc calls with Redis API calls will result in trouble and crashes. If you replace calls using macros, make sure that all the calls are correctly replaced. The code with the substituted calls must never, for example, attempt to call RedisModule_Free() with a pointer allocated using libc malloc().
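The following standalone sketch shows the macro approach and its main trap. The wrappers must be defined before the macros, so that their own calls still reach libc. module_alloc(), module_realloc() and module_free() are hypothetical stand-ins for the RedisModule_* functions that also count live blocks. lib_join() plays the role of unmodified library code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for RedisModule_Alloc/Realloc/Free that count live
 * blocks. Defined BEFORE the macros, so these calls still reach libc. */
static int module_live_blocks = 0;

static void *module_alloc(size_t n) { module_live_blocks++; return malloc(n); }
static void *module_realloc(void *p, size_t n) {
    if (p == NULL) module_live_blocks++;   /* realloc(NULL, n) allocates */
    return realloc(p, n);
}
static void module_free(void *p) { if (p) { module_live_blocks--; free(p); } }

/* From here on, library code is redirected to the wrappers. */
#define malloc  module_alloc
#define realloc module_realloc
#define free    module_free

/* Unmodified "library" code: every allocation now goes through the wrappers. */
static char *lib_join(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    char *s = malloc(la + 1);
    if (s == NULL) return NULL;
    memcpy(s, a, la + 1);
    char *t = realloc(s, la + lb + 1);
    if (t == NULL) { free(s); return NULL; }
    memcpy(t + la, b, lb + 1);
    return t;
}
```

Anything compiled without these macros (for example, another object file of the same library) would still call libc directly. This is exactly the mixing that the paragraph above warns against.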

11 - Optimizing Redis

Benchmarking, profiling, and optimizations for memory and latency

11.1 - Redis benchmark

Using the redis-benchmark utility to benchmark a Redis server

Redis includes the redis-benchmark utility, which simulates N clients sending commands at the same time, for a total of M queries. The utility provides a default set of tests, or a custom set of tests can be supplied.

The following options are supported:

Usage: redis-benchmark [-h <host>] [-p <port>] [-c <clients>] [-n <requests>] [-k <boolean>]

 -h <hostname>      Server hostname (default 127.0.0.1)
 -p <port>          Server port (default 6379)
 -s <socket>        Server socket (overrides host and port)
 -a <password>      Password for Redis Auth
 -c <clients>       Number of parallel connections (default 50)
 -n <requests>      Total number of requests (default 100000)
 -d <size>          Data size of SET/GET value in bytes (default 2)
 --dbnum <db>       SELECT the specified db number (default 0)
 -k <boolean>       1=keep alive 0=reconnect (default 1)
 -r <keyspacelen>   Use random keys for SET/GET/INCR, random values for SADD
  Using this option the benchmark will expand the string __rand_int__
  inside an argument with a 12 digits number in the specified range
  from 0 to keyspacelen-1. The substitution changes every time a command
  is executed. Default tests use this to hit random keys in the
  specified range.
 -P <numreq>        Pipeline <numreq> requests. Default 1 (no pipeline).
 -q                 Quiet. Just show query/sec values
 --csv              Output in CSV format
 -l                 Loop. Run the tests forever
 -t <tests>         Only run the comma separated list of tests. The test
                    names are the same as the ones produced as output.
 -I                 Idle mode. Just open N idle connections and wait.
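The __rand_int__ substitution described for the -r option can be sketched as follows. This is an illustration of the described behavior, not redis-benchmark's actual code. expand_rand_int() is a hypothetical helper, and the random value r is passed in by the caller so the result is deterministic. The placeholder is 12 characters long, the same as the 12-digit number, so the substitution never changes the length of the argument. The sketch assumes keyspacelen is at most 10^12.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Replace every "__rand_int__" in tmpl with r % keyspacelen, printed as a
 * 12-digit zero-padded number (assumes keyspacelen <= 10^12). */
static void expand_rand_int(const char *tmpl, unsigned long long r,
                            unsigned long long keyspacelen,
                            char *out, size_t outlen) {
    const char *tag = "__rand_int__";   /* 12 characters, like the number */
    size_t taglen = strlen(tag);
    char num[13];
    snprintf(num, sizeof(num), "%012llu", r % keyspacelen);

    size_t o = 0;
    while (*tmpl && o + 1 < outlen) {
        if (strncmp(tmpl, tag, taglen) == 0 && o + taglen < outlen) {
            memcpy(out + o, num, taglen);   /* same length as the tag */
            o += taglen;
            tmpl += taglen;
        } else {
            out[o++] = *tmpl++;
        }
    }
    out[o] = '\0';
}
```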

You need to have a running Redis instance before launching the benchmark. You can run the benchmarking utility like so:

redis-benchmark -q -n 100000

Running only a subset of the tests

You don't need to run all the default tests every time you execute redis-benchmark. For example, to select only a subset of tests, use the -t option as in the following example:

$ redis-benchmark -t set,lpush -n 100000 -q
SET: 74239.05 requests per second
LPUSH: 79239.30 requests per second

This example runs the tests for the SET and LPUSH commands and uses quiet mode (see the -q switch).

You can even benchmark a specific command:

$ redis-benchmark -n 100000 -q script load "redis.call('set','foo','bar')"
script load redis.call('set','foo','bar'): 69881.20 requests per second

Selecting the size of the key space

By default, the benchmark runs against a single key. Because Redis is an in-memory system, the difference between such a synthetic benchmark and a real one is not huge. However, you can stress cache misses and, in general, simulate a more realistic workload by using a large key space.

This is obtained by using the -r switch. For instance if I want to run one million SET operations, using a random key for every operation out of 100k possible keys, I'll use the following command line:

$ redis-cli flushall
OK

$ redis-benchmark -t set -r 100000 -n 1000000
====== SET ======
  1000000 requests completed in 13.86 seconds
  50 parallel clients
  3 bytes payload
  keep alive: 1

99.76% <= 1 milliseconds
99.98% <= 2 milliseconds
100.00% <= 3 milliseconds
100.00% <= 3 milliseconds
72144.87 requests per second

$ redis-cli dbsize
(integer) 99993
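The DBSIZE result slightly below 100000 is expected. Assume each of the n = 10^6 SET operations picks one of N = 10^5 keys uniformly and independently (an assumption about the benchmark's random generator). Then some keys are simply never hit, and the expected number of distinct keys D is:

```latex
\mathbb{E}[D] = N\left(1 - \left(1 - \frac{1}{N}\right)^{n}\right)
\approx N\left(1 - e^{-n/N}\right)
= 10^{5}\left(1 - e^{-10}\right)
\approx 10^{5} - 4.5
```

About 4 or 5 keys are expected to be missed, so the 7 missing keys in the run above are in line with chance.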

Using pipelining

By default, every client (the benchmark simulates 50 clients unless otherwise specified with -c) sends the next command only when the reply to the previous command is received. This means that the server will likely need a read call to read each command from every client, and the round-trip time (RTT) is paid for every command as well.

Redis supports pipelining, so it is possible to send multiple commands at once, a feature often exploited by real-world applications. Redis pipelining can dramatically improve the number of operations per second a server is able to deliver.

This is an example of running the benchmark on a MacBook Air 11" using a pipeline of 16 commands:

$ redis-benchmark -n 1000000 -t set,get -P 16 -q
SET: 403063.28 requests per second
GET: 508388.41 requests per second

Using pipelining results in a significant increase in performance.
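To see what a pipeline looks like on the wire, here is a sketch that encodes several commands in the Redis protocol (RESP, an array of bulk strings per command) and concatenates them into one buffer. A client would send that buffer with a single write and then read the replies back in order. Networking is deliberately left out, and resp_append() is a hypothetical helper, not a hiredis function.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Append one command, encoded as a RESP array of bulk strings, to buf.
 * Returns the new length, or 0 if buf is too small. */
static size_t resp_append(char *buf, size_t len, size_t cap,
                          int argc, const char **argv) {
    int n = snprintf(buf + len, cap - len, "*%d\r\n", argc);
    if (n < 0 || (size_t)n >= cap - len) return 0;
    len += (size_t)n;
    for (int i = 0; i < argc; i++) {
        size_t alen = strlen(argv[i]);
        n = snprintf(buf + len, cap - len, "$%zu\r\n", alen);
        /* Header, argument and CRLF must fit, plus room for the final NUL. */
        if (n < 0 || (size_t)n + alen + 2 >= cap - len) return 0;
        len += (size_t)n;
        memcpy(buf + len, argv[i], alen);
        memcpy(buf + len + alen, "\r\n", 2);
        len += alen + 2;
    }
    buf[len] = '\0';
    return len;
}
```

Usage: appending a SET and then a GET to the same buffer produces a single 53-byte payload. Without pipelining, these would be two separate round trips.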

Pitfalls and misconceptions

The first point is obvious: the golden rule of a useful benchmark is to only compare apples and apples. For instance, different versions of Redis can be compared on the same workload, or the same version of Redis can be compared with different options. If you plan to compare Redis to something else, it is important to evaluate the functional and technical differences and take them into account.

  • Redis is a server: all commands involve network or IPC round trips. It is meaningless to compare it to embedded data stores, because the cost of most operations is primarily in network/protocol management.
  • Redis commands return an acknowledgment for all usual commands. Some other data stores do not. Comparing Redis to stores involving one-way queries is only mildly useful.
  • Naively iterating on synchronous Redis commands does not benchmark Redis itself, but rather measures your network (or IPC) latency and the client library's intrinsic latency. To really test Redis, you need multiple connections (like redis-benchmark), pipelining to aggregate several commands, multiple threads or processes, or some combination of these.
  • Redis is an in-memory data store with some optional persistence options. If you plan to compare it to transactional servers (MySQL, PostgreSQL, etc ...), then you should consider activating AOF and decide on a suitable fsync policy.
  • Redis is, mostly, a single-threaded server from the POV of commands execution (actually modern versions of Redis use threads for different things). It is not designed to benefit from multiple CPU cores. People are supposed to launch several Redis instances to scale out on several cores if needed. It is not really fair to compare one single Redis instance to a multi-threaded data store.

The redis-benchmark program is a quick and useful way to get some figures and evaluate the performance of a Redis instance on given hardware. However, by default it does not represent the maximum throughput a Redis instance can sustain. In fact, by using pipelining and a fast client (hiredis), it is fairly easy to write a program generating more throughput than redis-benchmark. By default, redis-benchmark achieves throughput by exploiting concurrency only (i.e. it creates several connections to the server). It does not use pipelining or any parallelism at all (at most one pending query per connection, and no multi-threading) unless explicitly enabled via the -P parameter. So, in a way, running redis-benchmark while, for example, a BGSAVE operation runs in the background will give numbers closer to the worst case than to the best case.

To run a benchmark using pipelining mode (and achieve higher throughput), you need to explicitly use the -P option. Please note that it is still a realistic behavior since a lot of Redis based applications actively use pipelining to improve performance. However you should use a pipeline size that is more or less the average pipeline length you'll be able to use in your application in order to get realistic numbers.

The benchmark should apply the same operations, and work in the same way with the multiple data stores you want to compare. It is absolutely pointless to compare the result of redis-benchmark to the result of another benchmark program and extrapolate.

For instance, Redis and memcached in single-threaded mode can be compared on GET/SET operations. Both are in-memory data stores, working mostly in the same way at the protocol level. Provided their respective benchmark application is aggregating queries in the same way (pipelining) and use a similar number of connections, the comparison is actually meaningful.

When you're benchmarking a high-performance, in-memory database like Redis, it may be difficult to saturate the server. Sometimes, the performance bottleneck is on the client side, and not the server-side. In that case, the client (i.e., the benchmarking program itself) must be fixed, or perhaps scaled out, to reach the maximum throughput.

Factors impacting Redis performance

There are multiple factors having direct consequences on Redis performance. We mention them here, since they can alter the result of any benchmarks. Please note however, that a typical Redis instance running on a low end, untuned box usually provides good enough performance for most applications.

  • Network bandwidth and latency usually have a direct impact on performance. It is good practice to use the ping program to quickly check that the latency between the client and server hosts is normal before launching the benchmark. Regarding bandwidth, it is generally useful to estimate the throughput in Gbit/s and compare it to the theoretical bandwidth of the network. For instance, a benchmark setting 4 KB strings in Redis at 100000 q/s would actually consume 3.2 Gbit/s of bandwidth and probably fit within a 10 Gbit/s link, but not a 1 Gbit/s one. In many real-world scenarios, Redis throughput is limited by the network well before being limited by the CPU. To consolidate several high-throughput Redis instances on a single server, it is worth considering a 10 Gbit/s NIC or multiple 1 Gbit/s NICs with TCP/IP bonding.
  • CPU is another very important factor. Being single-threaded, Redis favors fast CPUs with large caches and not many cores. At this game, Intel CPUs are currently the winners. It is not uncommon to get only half the performance on an AMD Opteron CPU compared to similar Nehalem EP/Westmere EP/Sandy Bridge Intel CPUs with Redis. When client and server run on the same box, the CPU is the limiting factor with redis-benchmark.
  • Speed of RAM and memory bandwidth seem less critical for global performance especially for small objects. For large objects (>10 KB), it may become noticeable though. Usually, it is not really cost-effective to buy expensive fast memory modules to optimize Redis.
  • Redis runs slower on a VM than it does without virtualization on the same hardware. If you have the chance to run Redis on a physical machine, this is preferred. However, this does not mean that Redis is slow in virtualized environments. The delivered performance is still very good, and most of the serious performance issues you may incur in virtualized environments are due to over-provisioning, non-local disks with high latency, or old hypervisor software with a slow fork syscall implementation.
  • When the server and client benchmark programs run on the same box, both the TCP/IP loopback and unix domain sockets can be used. Depending on the platform, unix domain sockets can achieve around 50% more throughput than the TCP/IP loopback (on Linux for instance). The default behavior of redis-benchmark is to use the TCP/IP loopback.
  • The performance benefit of unix domain sockets compared to TCP/IP loopback tends to decrease when pipelining is heavily used (i.e. long pipelines).
  • When an Ethernet network is used to access Redis, aggregating commands using pipelining is especially efficient when the size of the data is kept under the Ethernet packet size (about 1500 bytes). In fact, processing 10-byte, 100-byte, or 1000-byte queries results in almost the same throughput. See the graph below.

[Figure: Data size impact]

  • On servers with multiple CPU sockets, Redis performance becomes dependent on the NUMA configuration and process location. The most visible effect is that redis-benchmark results seem non-deterministic, because client and server processes are distributed randomly across the cores. To get deterministic results, you need to use process placement tools (on Linux: taskset or numactl). The most efficient combination is always to put the client and server on two different cores of the same CPU, to benefit from the L3 cache. Here are some results of a 4 KB SET benchmark for 3 server CPUs (AMD Istanbul, Intel Nehalem EX, and Intel Westmere) with different relative placements. Please note this benchmark is not meant to compare CPU models with each other (the exact CPU models and frequencies are therefore not disclosed).

[Figure: NUMA chart]

  • With high-end configurations, the number of client connections is also an important factor. Being based on epoll/kqueue, the Redis event loop is quite scalable. Redis has already been benchmarked at more than 60000 connections, and was still able to sustain 50000 q/s in these conditions. As a rule of thumb, an instance with 30000 connections can only process half the throughput achievable with 100 connections. Here is an example showing the throughput of a Redis instance per number of connections:

[Figure: connections chart]

  • With high-end configurations, it is possible to achieve higher throughput by tuning the NIC configuration and the associated interrupts. The best throughput is achieved by setting an affinity between Rx/Tx NIC queues and CPU cores, and by activating RPS (Receive Packet Steering) support. More information in this thread. Jumbo frames may also provide a performance boost when large objects are used.
  • Depending on the platform, Redis can be compiled against different memory allocators (libc malloc, jemalloc, tcmalloc), which may behave differently in terms of raw speed and internal and external fragmentation. If you did not compile Redis yourself, you can use the INFO command to check the mem_allocator field. Please note that most benchmarks do not run long enough to generate significant external fragmentation (unlike production Redis instances).
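The bandwidth figure in the network bullet above can be checked with a quick calculation. It takes 4 KB as 4000 bytes and ignores protocol and TCP/IP overhead:

```latex
4000\ \tfrac{\text{bytes}}{\text{query}} \times 10^{5}\ \tfrac{\text{queries}}{\text{s}} \times 8\ \tfrac{\text{bits}}{\text{byte}}
= 3.2 \times 10^{9}\ \tfrac{\text{bits}}{\text{s}} = 3.2\ \text{Gbit/s}
```

With 4096-byte values, the same calculation gives about 3.28 Gbit/s. Either way, this is well above what a 1 Gbit/s link can carry.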

Other things to consider

One important goal of any benchmark is to get reproducible results, so they can be compared to the results of other tests.

  • A good practice is to try to run tests on isolated hardware as much as possible. If it is not possible, then the system must be monitored to check the benchmark is not impacted by some external activity.
  • Some configurations (desktops and laptops for sure, some servers as well) have a variable CPU core frequency mechanism. The policy controlling this mechanism can be set at the OS level. Some CPU models are more aggressive than others at adapting the frequency of the CPU cores to the workload. To get reproducible results, it is better to set the highest possible fixed frequency for all the CPU cores involved in the benchmark.
  • An important point is to size the system according to the benchmark. The system must have enough RAM and must not swap. On Linux, do not forget to set the overcommit_memory parameter correctly. Please note that 32-bit and 64-bit Redis instances do not have the same memory footprint.
  • If you plan to use RDB or AOF for your benchmark, please check there is no other I/O activity in the system. Avoid putting RDB or AOF files on NAS or NFS shares, or on any other devices impacting your network bandwidth and/or latency (for instance, EBS on Amazon EC2).
  • Set Redis logging level (loglevel parameter) to warning or notice. Avoid putting the generated log file on a remote filesystem.
  • Avoid using monitoring tools which can alter the result of the benchmark. For instance, using INFO at regular intervals to gather statistics is probably fine, but MONITOR will impact the measured performance significantly.

Other Redis benchmarking tools

There are several third-party tools that can be used for benchmarking Redis. Refer to each tool's documentation for more information about its goals and capabilities.

  • memtier_benchmark from Redis Ltd. is a NoSQL Redis and Memcache traffic generation and benchmarking tool.
  • rpc-perf from Twitter is a tool for benchmarking RPC services that supports Redis and Memcache.
  • YCSB from Yahoo is a benchmarking framework with clients to many databases, including Redis.

11.2 - Redis CPU profiling

Performance engineering guide for on-CPU profiling and tracing

Filling the performance checklist

Redis is developed with a great emphasis on performance. We do our best with every release to make sure you'll experience a very stable and fast product.

Nevertheless, if you're looking for room to improve the efficiency of Redis or are pursuing a performance regression investigation, you will need a concise, methodical way of monitoring and analyzing Redis performance.

To do so, you can rely on different methodologies (some more suited than others, depending on the class of issues/analysis we intend to make). A curated list of methodologies and their steps is enumerated by Brendan Gregg at the following link.

We recommend the Utilization, Saturation, and Errors (USE) method for identifying your bottleneck. Check the following mapping between system resource, metric, and tools for a practical deep dive: USE method.

Ensuring the CPU is your bottleneck

This guide assumes you've followed one of the above methodologies to perform a complete check of system health, and identified the CPU as the bottleneck. If you have identified that most of the time is spent blocked on I/O, locks, timers, paging/swapping, etc., this guide is not for you.

Build Prerequisites

For a proper On-CPU analysis, Redis (and any dynamically loaded library like Redis Modules) requires stack traces to be available to tracers, which you may need to fix first.

By default, Redis is compiled with the -O2 switch (which we intend to keep during profiling). This means that compiler optimizations are enabled. Many compilers omit the frame pointer as a runtime optimization (saving a register), thus breaking frame pointer-based stack walking. This makes the Redis executable faster, but it also makes Redis (like any other program) harder to trace: on-CPU time may be wrongly attributed to the last available frame pointer of a call stack that can actually be much deeper (but impossible to trace).

It's important that you ensure that:

  • debug information is present: compile option -g
  • frame pointer register is present: -fno-omit-frame-pointer
  • we still run with optimizations to get an accurate representation of production run times, meaning we will keep: -O2

You can do it as follows from the root of the Redis repository:

$ make REDIS_CFLAGS="-g -fno-omit-frame-pointer"

A set of instruments to identify performance regressions and/or potential on-CPU performance improvements

This document focuses specifically on on-CPU resource bottleneck analysis, meaning we're interested in understanding where threads are spending CPU cycles while running on-CPU and, just as importantly, whether those cycles are effectively being used for computation or stalled waiting (not blocked!) on memory I/O, cache misses, etc.

For that we will rely on toolkits (perf, bcc tools), and hardware specific PMCs (Performance Monitoring Counters), to proceed with:

  • Hotspot analysis (perf or bcc tools): to profile code execution and determine which functions are consuming the most time and thus are targets for optimization. We'll present two options to collect, report, and visualize hotspots either with perf or bcc/BPF tracing tools.

  • Call counts analysis: to count events including function calls, enabling us to correlate several calls/components at once, relying on bcc/BPF tracing tools.

  • Hardware event sampling: crucial for understanding CPU behavior, including memory I/O, stall cycles, and cache misses.

Tool prerequisites

The following steps rely on Linux perf_events (aka "perf"), bcc/BPF tracing tools, and Brendan Gregg’s FlameGraph repo.

We assume beforehand you have:

  • Installed the perf tool on your system. Most Linux distributions ship it in a package related to the kernel. More information about the perf tool can be found at perf wiki.
  • Followed the bcc/BPF installation instructions to install the bcc toolkit on your machine.
  • Cloned Brendan Gregg’s FlameGraph repo and made the stackcollapse-perf.pl, difffolded.pl and flamegraph.pl scripts accessible, to generate the collapsed stack traces and Flame Graphs.

Hotspot analysis with perf or eBPF (stack traces sampling)

Profiling CPU usage by sampling stack traces at a timed interval is a fast and easy way to identify performance-critical code sections (hotspots).

Sampling stack traces using perf

To profile both user- and kernel-level stacks of redis-server for a specific length of time, for example 60 seconds, at a sampling frequency of 999 samples per second:

$ perf record -g --pid $(pgrep redis-server) -F 999 -- sleep 60

Displaying the recorded profile information using perf report

By default perf record will generate a perf.data file in the current working directory.

You can then report with a call-graph output (call chain, stack backtrace), with a minimum call graph inclusion threshold of 0.5%, with:

$ perf report -g "graph,0.5,caller"

See the perf report documentation for advanced filtering, sorting and aggregation capabilities.

Visualizing the recorded profile information using Flame Graphs

Flame graphs allow for a quick and accurate visualization of frequent code-paths. They can be generated using Brendan Gregg's open source programs on GitHub, which create interactive SVGs from folded stack files.

Specifically, for perf we need to convert the generated perf.data into the captured stacks, and fold each of them into single lines. You can then render the on-CPU flame graph with:

$ perf script > redis.perf.stacks
$ stackcollapse-perf.pl redis.perf.stacks > redis.folded.stacks
$ flamegraph.pl redis.folded.stacks > redis.svg

By default, perf script reads the perf.data file from the current working directory. See the perf script documentation for advanced usage.

See FlameGraph usage options for more advanced stack trace visualizations (like the differential one).
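Each line of a folded stack file has the form `frame1;frame2;...;leaf COUNT`. As a minimal sketch (hypothetical helper names, not part of the FlameGraph repo), the Python below reads that format, attributes samples to the leaf frame to list the hottest functions, and compares two profiles by each stack's share of total samples. The comparison is similar in spirit to, but not a reimplementation of, difffolded.pl. The sample counts are made up for illustration.

```python
from collections import Counter


def parse_folded(lines):
    """Parse folded stacks ("root;child;leaf COUNT") into a Counter keyed by stack."""
    stacks = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The sample count is the last space-separated token on the line.
        stack, _, count = line.rpartition(" ")
        stacks[stack] += int(count)
    return stacks


def self_samples(stacks):
    """Attribute each stack's samples to its leaf frame (the function on-CPU)."""
    leaf = Counter()
    for stack, count in stacks.items():
        leaf[stack.split(";")[-1]] += count
    return leaf


def share_diff(before, after):
    """Change in each stack's share of total samples between two profiles.

    Comparing shares rather than raw counts keeps profiles of different
    lengths comparable. Positive values mean the stack got hotter.
    """
    total_before = sum(before.values()) or 1
    total_after = sum(after.values()) or 1
    return {
        stack: after[stack] / total_after - before[stack] / total_before
        for stack in set(before) | set(after)
    }


# Illustrative, made-up sample counts (not measured data).
before = parse_folded(["aeMain;processCommand;call 60", "aeMain;readQueryFromClient 40"])
after = parse_folded(["aeMain;processCommand;call 90", "aeMain;readQueryFromClient 10"])
print(self_samples(before).most_common(1))  # [('call', 60)]
```

To use it on a real profile, pass an open file, e.g. `parse_folded(open("redis.folded.stacks"))`; the same format is produced by stackcollapse-perf.pl and by bcc's profile tool with `-f`.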

Archiving and sharing recorded profile information

To analyze the perf.data contents on a machine other than the one where they were collected, you need to export, along with the perf.data file, all object files with build-ids found in the record data file. This can easily be done with the perf-archive.sh script:

$ perf-archive.sh perf.data

Now please run:

$ tar xvf perf.data.tar.bz2 -C ~/.debug

on the machine where you need to run perf report.

Sampling stack traces using bcc/BPF's profile

As an alternative to perf, BPF-optimized profiling is fully available as of Linux kernel 4.9, with the promise of lower CPU overhead (as stack traces are frequency counted in kernel context) and lower disk I/O overhead during profiling.

Moreover, if stack trace analysis is our main goal, relying solely on bcc/BPF's profile tool removes the need for perf.data and the intermediate steps. You can use bcc's profile tool to output folded format directly, for flame graph generation (the trailing 60 is the duration in seconds):

$ /usr/share/bcc/tools/profile -F 999 -f --pid $(pgrep redis-server) 60 > redis.folded.stacks

In that manner, we've removed any preprocessing and can render the on-CPU flame graph with a single command:

$ flamegraph.pl redis.folded.stacks > redis.svg


Call counts analysis with bcc/BPF

A function may consume significant CPU cycles either because its code is slow or because it's frequently called. To find out at what rate functions are being called, you can rely upon call counts analysis using BCC's funccount tool:

$ /usr/share/bcc/tools/funccount 'redis-server:(call*|*Read*|*Write*)' --pid $(pgrep redis-server) --duration 60
Tracing 64 functions for "redis-server:(call*|*Read*|*Write*)"... Hit Ctrl-C to end.

FUNC                                    COUNT
call                                      334
handleClientsWithPendingWrites            388
clientInstallWriteHandler                 388
postponeClientRead                        514
handleClientsWithPendingReadsUsingThreads      735
handleClientsWithPendingWritesUsingThreads      735
prepareClientToWrite                     1442
Detaching...

The above output shows that, while tracing, Redis's call() function was called 334 times, handleClientsWithPendingWrites() 388 times, etc.
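Raw counts are easier to compare across runs once converted to rates. A minimal sketch (hypothetical helpers written for this page) that extracts the FUNC/COUNT rows from funccount's output and divides by the tracing window, assuming the trace really ran for the full 60 seconds rather than being stopped early with Ctrl-C:

```python
def parse_funccount(text: str) -> dict[str, int]:
    """Extract FUNC -> COUNT pairs from funccount's tabular output."""
    counts = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows have exactly two columns and an integer count; this skips the
        # "Tracing ..." banner, the FUNC/COUNT header and "Detaching...".
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def calls_per_second(counts: dict[str, int], duration_s: float) -> dict[str, float]:
    """Average call rate of each function over the tracing window."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return {func: n / duration_s for func, n in counts.items()}


sample = """Tracing 64 functions for "redis-server:(call*|*Read*|*Write*)"... Hit Ctrl-C to end.

FUNC                                    COUNT
call                                      334
prepareClientToWrite                     1442
Detaching...
"""
counts = parse_funccount(sample)
rates = calls_per_second(counts, 60)
print(f"call(): {rates['call']:.2f} calls/s")
```

Ratios between functions can also hint at relationships, e.g. prepareClientToWrite() ran roughly four times per call() in this trace.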

Hardware event counting with Performance Monitoring Counters (PMCs)

Many modern processors contain a performance monitoring unit (PMU) exposing Performance Monitoring Counters (PMCs). PMCs are crucial for understanding CPU behavior, including memory I/O, stall cycles, and cache misses, and provide low-level CPU performance statistics that aren't available anywhere else.

The design and functionality of a PMU is CPU-specific, and you should check which counters and features your CPU supports by using perf list.

To calculate, for a duration of 60 seconds and specifically for the redis-server process, the number of instructions per cycle, the number of micro-ops executed, the number of cycles during which no micro-ops were dispatched, and the number of cycles stalled on memory (including stalls per memory level):

$ perf stat -e "cpu-clock,cpu-cycles,instructions,uops_executed.core,uops_executed.stall_cycles,cache-references,cache-misses,cycle_activity.stalls_total,cycle_activity.stalls_mem_any,cycle_activity.stalls_l3_miss,cycle_activity.stalls_l2_miss,cycle_activity.stalls_l1d_miss" --pid $(pgrep redis-server) -- sleep 60

Performance counter stats for process id '3038':

  60046.411437      cpu-clock (msec)          #    1.001 CPUs utilized          
  168991975443      cpu-cycles                #    2.814 GHz                      (36.40%)
  388248178431      instructions              #    2.30  insn per cycle           (45.50%)
  443134227322      uops_executed.core        # 7379.862 M/sec                    (45.51%)
   30317116399      uops_executed.stall_cycles #  504.895 M/sec                    (45.51%)
     670821512      cache-references          #   11.172 M/sec                    (45.52%)
      23727619      cache-misses              #    3.537 % of all cache refs      (45.43%)
   30278479141      cycle_activity.stalls_total #  504.251 M/sec                    (36.33%)
   19981138777      cycle_activity.stalls_mem_any #  332.762 M/sec                    (36.33%)
     725708324      cycle_activity.stalls_l3_miss #   12.086 M/sec                    (36.33%)
    8487905659      cycle_activity.stalls_l2_miss #  141.356 M/sec                    (36.32%)
   10011909368      cycle_activity.stalls_l1d_miss #  166.736 M/sec                    (36.31%)

  60.002765665 seconds time elapsed

It's important to know that there are two very different ways in which PMCs can be used (counting and sampling), and we've focused solely on PMC counting for the sake of this analysis. Brendan Gregg explains the difference clearly at the following link.
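The raw counters become more meaningful as ratios. The sketch below (a hypothetical helper, not a perf feature) derives them from the totals in the sample output above. Note that the percentages in parentheses show these counters were multiplexed and scaled by perf, so ratios between events measured in different time slices are estimates; the uops_executed.* and cycle_activity.* events are Intel-specific names.

```python
def derived_metrics(c: dict[str, int]) -> dict[str, float]:
    """Compute ratios from raw `perf stat` counter totals (event name -> count)."""
    cycles = c["cpu-cycles"]
    stalls = c["cycle_activity.stalls_total"]
    return {
        # Instructions retired per CPU cycle (perf prints this as "insn per cycle").
        "ipc": c["instructions"] / cycles,
        # Fraction of cache references that missed (perf's "% of all cache refs").
        "cache_miss_ratio": c["cache-misses"] / c["cache-references"],
        # Fraction of all cycles in which execution was stalled.
        "stalled_cycle_ratio": stalls / cycles,
        # Fraction of those stalls that happened with a memory load outstanding.
        "memory_stall_share": c["cycle_activity.stalls_mem_any"] / stalls,
    }


# Totals copied from the perf stat output above.
counters = {
    "cpu-cycles": 168991975443,
    "instructions": 388248178431,
    "cache-references": 670821512,
    "cache-misses": 23727619,
    "cycle_activity.stalls_total": 30278479141,
    "cycle_activity.stalls_mem_any": 19981138777,
}
m = derived_metrics(counters)
print(f"IPC {m['ipc']:.2f}")  # IPC 2.30, matching perf's "2.30 insn per cycle"
print(f"stalled cycles {m['stalled_cycle_ratio']:.1%}, memory share of stalls {m['memory_stall_share']:.1%}")
```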

11.3 - Diagnosing latency issues

Finding the causes of slow responses

This document will help you understand what the problem could be if you are experiencing latency problems with Redis.

In this context latency is the maximum delay between the time a client issues a command and the time the reply to the command is received by the client. Usually Redis processing time is extremely low, in the sub microsecond range, but there are certain conditions leading to higher latency figures.

I've little time, give me the checklist

The following documentation is very important in order to run Redis in a low latency fashion. However, I understand that we are busy people, so let's start with a quick checklist. If following these steps does not solve the problem, please return here to read the full documentation.

  1. Make sure you are not running slow commands that are blocking the server. Use the Redis Slow Log feature to check this.
  2. For EC2 users, make sure you use HVM based modern EC2 instances, like m3.medium. Otherwise fork() is too slow.
  3. Transparent huge pages must be disabled from your kernel. Use echo never > /sys/kernel/mm/transparent_hugepage/enabled to disable them, and restart your Redis process.
  4. If you are using a virtual machine, it is possible that you have an intrinsic latency that has nothing to do with Redis. Check the minimum latency you can expect from your runtime environment using ./redis-cli --intrinsic-latency 100. Note: you need to run this command on the server, not on the client.
  5. Enable and use the Latency monitor feature of Redis in order to get a human readable description of the latency events and causes in your Redis instance.

In general, use the following list for durability vs. latency/performance tradeoffs, ordered from stronger safety to better latency.

  1. AOF + fsync always: this is very slow, you should use it only if you know what you are doing.
  2. AOF + fsync every second: this is a good compromise.
  3. AOF + fsync every second + no-appendfsync-on-rewrite option set to yes: this is as the above, but avoids fsyncing during rewrites to lower the disk pressure.
  4. AOF + fsync never. Fsyncing is up to the kernel in this setup, even less disk pressure and risk of latency spikes.
  5. RDB. Here you have a vast spectrum of tradeoffs depending on the save triggers you configure.

And now for people with 15 minutes to spend, the details...

Measuring latency

If you are experiencing latency problems, you probably know how to measure it in the context of your application, or maybe your latency problem is very evident even macroscopically. However redis-cli can be used to measure the latency of a Redis server in milliseconds, just try:

redis-cli --latency -h <host> -p <port>

Using the internal Redis latency monitoring subsystem

Since Redis 2.8.13, Redis provides latency monitoring capabilities that are able to sample different execution paths to understand where the server is blocking. This makes debugging of the problems illustrated in this documentation much simpler, so we suggest enabling latency monitoring ASAP. Please refer to the Latency monitor documentation.

While the latency monitoring sampling and reporting capabilities will make it simpler to understand the source of latency in your Redis system, it is still advised that you read this documentation extensively to better understand the topic of Redis and latency spikes.

Latency baseline

There is a kind of latency that is inherently part of the environment where you run Redis, that is the latency provided by your operating system kernel and, if you are using virtualization, by the hypervisor you are using.

While this latency can't be removed it is important to study it because it is the baseline, or in other words, you won't be able to achieve a Redis latency that is better than the latency that every process running in your environment will experience because of the kernel or hypervisor implementation or setup.

We call this kind of latency intrinsic latency, and redis-cli starting from Redis version 2.8.7 is able to measure it. This is an example run under Linux 3.11.0 running on an entry level server.

Note: the argument 100 is the number of seconds the test will be executed. The more time we run the test, the more likely we'll be able to spot latency spikes. 100 seconds is usually appropriate, however you may want to perform a few runs at different times. Please note that the test is CPU intensive and will likely saturate a single core in your system.

$ ./redis-cli --intrinsic-latency 100
Max latency so far: 1 microseconds.
Max latency so far: 16 microseconds.
Max latency so far: 50 microseconds.
Max latency so far: 53 microseconds.
Max latency so far: 83 microseconds.
Max latency so far: 115 microseconds.

Note: in this special case redis-cli needs to run on the server where you run or plan to run Redis, not on the client. In this special mode redis-cli does not connect to a Redis server at all: it just measures the longest period during which the kernel does not give CPU time to the redis-cli process itself.

In the above example, the intrinsic latency of the system is just 0.115 milliseconds (or 115 microseconds), which is good news; however, keep in mind that the intrinsic latency may change over time depending on the load of the system.

Virtualized environments will not show such good numbers, especially with high load or if there are noisy neighbors. The following is a run on a Linode 4096 instance running Redis and Apache:

$ ./redis-cli --intrinsic-latency 100
Max latency so far: 573 microseconds.
Max latency so far: 695 microseconds.
Max latency so far: 919 microseconds.
Max latency so far: 1606 microseconds.
Max latency so far: 3191 microseconds.
Max latency so far: 9243 microseconds.
Max latency so far: 9671 microseconds.

Here we have an intrinsic latency of 9.7 milliseconds: this means that we can't expect better than that from Redis. However, other runs at different times in different virtualization environments with higher load or with noisy neighbors can easily show even worse values. We were able to measure up to 40 milliseconds in systems otherwise apparently running normally.
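To make the measurement idea concrete, the Python sketch below busy-loops on a monotonic clock and records the largest gap between two consecutive readings; a large gap means the process was not running on a CPU for that long. It is an approximation written for this page, not redis-cli's implementation: Python's interpreter overhead (and pauses such as garbage collection) inflate its numbers, so use redis-cli --intrinsic-latency for real measurements. Like redis-cli, it keeps one core busy for the whole run.

```python
import time


def intrinsic_latency(duration_s: float) -> float:
    """Busy-loop for duration_s seconds; return the largest gap, in microseconds,
    between two consecutive clock readings."""
    max_gap_ns = 0
    start = prev = time.monotonic_ns()
    end = start + int(duration_s * 1e9)
    while prev < end:
        now = time.monotonic_ns()
        gap = now - prev
        if gap > max_gap_ns:
            max_gap_ns = gap
        prev = now
    return max_gap_ns / 1000


if __name__ == "__main__":
    print(f"Max latency: {intrinsic_latency(5.0):.0f} microseconds")
```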

Latency induced by network and communication

Clients connect to Redis using a TCP/IP connection or a Unix domain connection. The typical latency of a 1 Gbit/s network is about 200 us, while the latency with a Unix domain socket can be as low as 30 us. It actually depends on your network and system hardware. On top of the communication itself, the system adds some more latency (due to thread scheduling, CPU caches, NUMA placement, etc ...). System induced latencies are significantly higher on a virtualized environment than on a physical machine.

The consequence is that even if Redis processes most commands in the sub microsecond range, a client performing many roundtrips to the server will have to pay for these network and system related latencies.

An efficient client will therefore try to limit the number of roundtrips by pipelining several commands together. This is fully supported by the server and most clients. Aggregated commands like MSET/MGET can also be used for that purpose. Starting with Redis 2.4, a number of commands also support variadic parameters for all data types.

Here are some guidelines:

  • If you can afford it, prefer a physical machine over a VM to host the server.
  • Do not systematically connect/disconnect to the server (especially true for web based applications). Keep your connections as long lived as possible.
  • If your client is on the same host as the server, use Unix domain sockets.
  • Prefer to use aggregated commands (MSET/MGET), or commands with variadic parameters (if possible) over pipelining.
  • Prefer to use pipelining (if possible) over a sequence of roundtrips.
  • Redis supports Lua server-side scripting to cover cases that are not suitable for raw pipelining (for instance when the result of a command is an input for the following commands).
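Pipelining works because commands are just bytes on the socket: a client can write several encoded commands at once and read the replies back in order. The sketch below encodes commands as RESP arrays of bulk strings and concatenates them into one payload; the helper names are hypothetical, and the network-time estimate is a deliberately simplified model that ignores server time and large-batch effects.

```python
def encode_command(*args: str) -> bytes:
    """Encode one command as a RESP array of bulk strings."""
    out = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Bulk string: "$<byte length>\r\n<bytes>\r\n".
        out.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(out)


def pipeline(commands) -> bytes:
    """Concatenate encoded commands so they can be sent in a single write.

    The server replies in the same order, so the batch costs roughly one
    round trip instead of one per command.
    """
    return b"".join(encode_command(*cmd) for cmd in commands)


def estimated_network_time_ms(n_commands: int, rtt_ms: float, pipelined: bool) -> float:
    """Simplified lower bound on time spent waiting on the network."""
    return rtt_ms if pipelined else n_commands * rtt_ms


print(encode_command("SET", "k", "v"))  # b'*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n'
# With the ~200 us (0.2 ms) 1 Gbit/s figure above: 1000 sequential roundtrips
# cost about 200 ms of waiting, versus about one RTT for the pipelined batch.
print(estimated_network_time_ms(1000, 0.2, pipelined=False))
```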

On Linux, some people can achieve better latencies by playing with process placement (taskset), cgroups, real-time priorities (chrt), NUMA configuration (numactl), or by using a low-latency kernel. Please note vanilla Redis is not really suitable to be bound on a single CPU core. Redis can fork background tasks that can be extremely CPU consuming like BGSAVE or BGREWRITEAOF. These tasks must never run on the same core as the main event loop.

In most situations, these kinds of system level optimizations are not needed. Only do them if you require them, and if you are familiar with them.

Single threaded nature of Redis

Redis uses a mostly single threaded design. This means that a single process serves all the client requests, using a technique called multiplexing. In other words, Redis serves a single request at any given moment, so all the requests are served sequentially. This is very similar to how Node.js works as well. However, neither product is often perceived as slow. This is partly due to the small amount of time needed to complete a single request, but primarily because these products are designed not to block on system calls, such as reading data from or writing data to a socket.

I said that Redis is mostly single threaded because, since Redis 2.4, we actually use threads in Redis to perform some slow I/O operations in the background, mainly related to disk I/O, but this does not change the fact that Redis serves all the requests using a single thread.

Latency generated by slow commands

A consequence of being single threaded is that when a request is slow to serve, all the other clients will wait for this request to be served. When executing normal commands, like GET, SET or LPUSH, this is not a problem at all, since these commands are executed in constant (and very small) time. However, there are commands operating on many elements, like SORT, LREM, SUNION and others. For instance, taking the intersection of two big sets can take a considerable amount of time.

The algorithmic complexity of all commands is documented. A good practice is to systematically check it when using commands you are not familiar with.

If you have latency concerns you should either not use slow commands against values composed of many elements, or you should run a replica using Redis replication where you run all your slow queries.

It is possible to monitor slow commands using the Redis Slow Log feature.

Additionally, you can use your favorite per-process monitoring program (top, htop, prstat, etc ...) to quickly check the CPU consumption of the main Redis process. If it is high while the traffic is not, it is usually a sign that slow commands are used.

IMPORTANT NOTE: a VERY common source of latency generated by the execution of slow commands is the use of the KEYS command in production environments. KEYS, as documented in the Redis documentation, should only be used for debugging purposes. Since Redis 2.8, new commands have been available to iterate the key space and other large collections incrementally; please check the SCAN, SSCAN, HSCAN and ZSCAN commands for more information.

Latency generated by fork

In order to generate the RDB file in background, or to rewrite the Append Only File if AOF persistence is enabled, Redis has to fork background processes. The fork operation (running in the main thread) can induce latency by itself.

Forking is an expensive operation on most Unix-like systems, since it involves copying a good number of objects linked to the process. This is especially true for the page table associated to the virtual memory mechanism.

For instance on a Linux/AMD64 system, the memory is divided in 4 kB pages. To convert virtual addresses to physical addresses, each process stores a page table (actually represented as a tree) containing at least a pointer per page of the address space of the process. So a large 24 GB Redis instance requires a page table of 24 GB / 4 kB * 8 = 48 MB.

When a background save is performed, this instance will have to be forked, which will involve allocating and copying 48 MB of memory. It takes time and CPU, especially on virtual machines where allocation and initialization of a large memory chunk can be expensive.
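The page table arithmetic above can be reproduced directly. A minimal sketch, assuming 4 kB pages and 8-byte entries as in the text, and counting only the bottom level of the page-table tree (upper levels add comparatively little, and huge pages would change the picture):

```python
def page_table_bytes(rss_bytes: int, page_size: int = 4096, entry_size: int = 8) -> int:
    """Estimate the bottom-level page table size: one entry per page."""
    pages = -(-rss_bytes // page_size)  # ceiling division: a partial page still needs an entry
    return pages * entry_size


GiB = 1024 ** 3
MiB = 1024 ** 2
print(page_table_bytes(24 * GiB) / MiB)  # 48.0
```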

Fork time in different systems

Modern hardware is pretty fast at copying the page table, but Xen is not. The problem with Xen is not virtualization-specific, but Xen-specific. For instance, using VMware or VirtualBox does not result in slow fork times. The following list compares fork times for different Redis instance sizes. Data is obtained by performing a BGSAVE and looking at the latest_fork_usec field in the INFO command output.

However the good news is that new types of EC2 HVM based instances are much better with fork times, almost on par with physical servers, so for example using m3.medium (or better) instances will provide good results.

  • Linux beefy VM on VMware 6.0GB RSS forked in 77 milliseconds (12.8 milliseconds per GB).
  • Linux running on physical machine (Unknown HW) 6.1GB RSS forked in 80 milliseconds (13.1 milliseconds per GB).
  • Linux running on physical machine (Xeon @ 2.27GHz) 6.9GB RSS forked in 62 milliseconds (9 milliseconds per GB).
  • Linux VM on 6sync (KVM) 360 MB RSS forked in 8.2 milliseconds (23.3 milliseconds per GB).
  • Linux VM on EC2, old instance types (Xen) 6.1GB RSS forked in 1460 milliseconds (239.3 milliseconds per GB).
  • Linux VM on EC2, new instance types (Xen) 1GB RSS forked in 10 milliseconds (10 milliseconds per GB).
  • Linux VM on Linode (Xen) 0.9GB RSS forked in 382 milliseconds (424 milliseconds per GB).

As you can see, certain VMs running on Xen have a performance hit of between one and two orders of magnitude. For EC2 users the suggestion is simple: use modern HVM based instances.
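To compare your own instance with the figures above, you can normalize the last fork time by the resident set size. A minimal sketch, assuming you captured the text of a default INFO reply (which includes latest_fork_usec from the Stats section and used_memory_rss, in bytes, from the Memory section) after a BGSAVE; "GB" is taken here as 1024^3 bytes, and the sample values are made up to mirror the old EC2 entry:

```python
def fork_ms_per_gb(info_text: str) -> float:
    """Fork time in milliseconds per GB of RSS, from the text of an INFO reply."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and not key.startswith("#"):
            fields[key] = value
    fork_ms = int(fields["latest_fork_usec"]) / 1000
    rss_gb = int(fields["used_memory_rss"]) / 1024 ** 3
    return fork_ms / rss_gb


# Made-up sample: ~6.1 GB RSS and a 1460 ms fork.
sample = "# Memory\r\nused_memory_rss:6549825126\r\n# Stats\r\nlatest_fork_usec:1460000\r\n"
print(f"{fork_ms_per_gb(sample):.1f} ms per GB")  # 239.3 ms per GB
```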

Latency induced by transparent huge pages

Unfortunately, when a Linux kernel has transparent huge pages enabled, Redis incurs a big latency penalty after the fork call is used to persist data on disk. Huge pages are the cause of the following issue:

  1. Fork is called, two processes with shared huge pages are created.
  2. In a busy instance, a few event loop runs will cause commands to target a few thousand pages, causing the copy-on-write of almost the whole process memory.
  3. This will result in big latency and big memory usage.

Make sure to disable transparent huge pages using the following command:

echo never > /sys/kernel/mm/transparent_hugepage/enabled
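The same sysfs file reports the active mode as the bracketed word, for example `always madvise [never]`. A minimal sketch of a check (hypothetical helper) that parses that content and, when run directly on a Linux host, prints the current mode:

```python
from pathlib import Path

THP_PATH = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def thp_mode(text: str) -> str:
    """Return the active THP mode, i.e. the bracketed word in the sysfs file."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    raise ValueError(f"no active mode found in {text!r}")


if __name__ == "__main__" and THP_PATH.exists():
    mode = thp_mode(THP_PATH.read_text())
    print("THP mode:", mode, "(OK)" if mode == "never" else "(consider disabling)")
```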

Latency induced by swapping (operating system paging)

Linux (and many other modern operating systems) is able to relocate memory pages from the memory to the disk, and vice versa, in order to use the system memory efficiently.

If a Redis page is moved by the kernel from the memory to the swap file, when the data stored in this memory page is used by Redis (for example accessing a key stored into this memory page) the kernel will stop the Redis process in order to move the page back into the main memory. This is a slow operation involving random I/Os (compared to accessing a page that is already in memory) and will result in anomalous latency experienced by Redis clients.

The kernel relocates Redis memory pages on disk mainly for three reasons:

  • The system is under memory pressure since the running processes are demanding more physical memory than the amount that is available. The simplest instance of this problem is simply Redis using more memory than is available.
  • The Redis instance data set, or part of the data set, is almost completely idle (never accessed by clients), so the kernel could swap idle memory pages on disk. This problem is very rare since even a moderately slow instance will touch all the memory pages often, forcing the kernel to retain all the pages in memory.
  • Some processes are generating massive read or write I/Os on the system. Because files are generally cached, it tends to put pressure on the kernel to increase the filesystem cache, and therefore generate swapping activity. Please note it includes Redis RDB and/or AOF background threads which can produce large files.

Fortunately, Linux offers good tools to investigate the problem, so when latency due to swapping is suspected, the simplest thing to do is to check whether this is the case.

The first thing to do is to check the amount of Redis memory that is swapped on disk. In order to do so you need to obtain the Redis instance pid:

$ redis-cli info | grep process_id
process_id:5454

Now enter the /proc file system directory for this process:

$ cd /proc/5454

Here you'll find a file called smaps that describes the memory layout of the Redis process (assuming you are using Linux 2.6.16 or newer). This file contains very detailed information about our process memory maps, and one field called Swap is exactly what we are looking for. However there is not just a single swap field since the smaps file contains the different memory maps of our Redis process (The memory layout of a process is more complex than a simple linear array of pages).

Since we are interested in all the memory swapped by our process, the first thing to do is to grep for the Swap field across the whole file:

$ cat smaps | grep 'Swap:'
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                 12 kB
Swap:                156 kB
Swap:                  8 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  4 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  4 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  4 kB
Swap:                  4 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB
Swap:                  0 kB

If everything is 0 kB, or if there are sporadic 4k entries, everything is perfectly normal. Actually in our example instance (the one of a real web site running Redis and serving hundreds of users every second) there are a few entries that show more swapped pages. To investigate if this is a serious problem or not we change our command in order to also print the size of the memory map:

$ cat smaps | egrep '^(Swap|Size)'
Size:                316 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  8 kB
Swap:                  0 kB
Size:                 40 kB
Swap:                  0 kB
Size:                132 kB
Swap:                  0 kB
Size:             720896 kB
Swap:                 12 kB
Size:               4096 kB
Swap:                156 kB
Size:               4096 kB
Swap:                  8 kB
Size:               4096 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:               1272 kB
Swap:                  0 kB
Size:                  8 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                 16 kB
Swap:                  0 kB
Size:                 84 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  8 kB
Swap:                  4 kB
Size:                  8 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  4 kB
Size:                144 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  4 kB
Size:                 12 kB
Swap:                  4 kB
Size:                108 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB
Size:                272 kB
Swap:                  0 kB
Size:                  4 kB
Swap:                  0 kB

As you can see from the output, there is a map of 720896 kB with just 12 kB swapped, and another map with 156 kB swapped. Only a very small fraction of our memory is swapped, so this is not going to create any problem at all.
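Scanning long smaps output by eye is error-prone. The Ruby sketch below totals the Size and Swap fields and lists the maps that have swapped pages. It assumes the smaps layout shown above, where each mapping's Size: line comes before its Swap: line. The script name in the usage comment is hypothetical.

```ruby
# Summarize the Size and Swap fields of a /proc/<pid>/smaps dump.
# Sketch: assumes each mapping's "Size:" line precedes its "Swap:" line.
# "SwapPss:" lines are deliberately not matched by the /\ASwap:/ pattern.
def smaps_swap_summary(text)
  total_size = 0
  total_swap = 0
  current_size = 0
  swapped_maps = [] # [map_size_kb, swapped_kb] for maps with swapped pages

  text.each_line do |line|
    if (m = line.match(/\ASize:\s+(\d+) kB/))
      current_size = m[1].to_i
      total_size += current_size
    elsif (m = line.match(/\ASwap:\s+(\d+) kB/))
      swap = m[1].to_i
      total_swap += swap
      swapped_maps << [current_size, swap] if swap > 0
    end
  end

  { size_kb: total_size, swap_kb: total_swap, swapped_maps: swapped_maps }
end

# Usage: ruby smaps_summary.rb <redis-pid>   (reading smaps may require root)
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  s = smaps_swap_summary(File.read("/proc/#{ARGV[0]}/smaps"))
  pct = s[:size_kb].zero? ? 0.0 : 100.0 * s[:swap_kb] / s[:size_kb]
  puts format('mapped: %d kB, swapped: %d kB (%.3f%%)', s[:size_kb], s[:swap_kb], pct)
  s[:swapped_maps].each { |size, swap| puts "  map of #{size} kB has #{swap} kB swapped" }
end
```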

If instead a non-trivial amount of the process memory is swapped to disk, your latency problems are likely related to swapping. If this is the case with your Redis instance, you can further verify it using the vmstat command:

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 0  0   3980 697932 147180 1406456    0    0     2     2    2    0  4  4 91  0
 0  0   3980 697428 147180 1406580    0    0     0     0 19088 16104  9  6 84  0
 0  0   3980 697296 147180 1406616    0    0     0    28 18936 16193  7  6 87  0
 0  0   3980 697048 147180 1406640    0    0     0     0 18613 15987  6  6 88  0
 2  0   3980 696924 147180 1406656    0    0     0     0 18744 16299  6  5 88  0
 0  0   3980 697048 147180 1406688    0    0     0     4 18520 15974  6  6 88  0
^C

The interesting part of the output for our needs is the two columns si and so, which count the amount of memory swapped in from and out to the swap file. If you see non-zero counts in those two columns, there is swapping activity on your system.
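To check many vmstat samples at once, the following sketch locates the si and so columns from the header line and returns the rows where either is non-zero. It assumes the procps vmstat layout shown above. For example, passing it the captured output of vmstat 1 5 would return only the samples that show swapping.

```ruby
# Return the vmstat samples that show swap activity (non-zero si or so).
# Sketch: assumes the procps layout shown above and finds the si/so
# columns from the header line instead of hard-coding their positions.
def swap_activity(vmstat_output)
  header = nil
  active = []
  vmstat_output.each_line do |line|
    fields = line.split
    next if fields.empty?

    if fields.include?('si') && fields.include?('so')
      header = fields # e.g. "r b swpd free buff cache si so ..."
    elsif header && fields.length == header.length &&
          fields.all? { |f| f.match?(/\A\d+\z/) }
      row = header.zip(fields.map(&:to_i)).to_h
      active << row if row['si'] > 0 || row['so'] > 0
    end
  end
  active
end
```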

Finally, the iostat command can be used to check the global I/O activity of the system.

$ iostat -xk 1
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          13.55    0.04    2.92    0.53    0.00   82.95

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
sda               0.77     0.00    0.01    0.00     0.40     0.00    73.65     0.00    3.62   2.58   0.00
sdb               1.27     4.75    0.82    3.54    38.00    32.32    32.19     0.11   24.80   4.24   1.85

If your latency problem is due to Redis memory being swapped to disk, you need to lower the memory pressure on your system, either by adding more RAM if Redis is using more memory than is available, or by not running other memory-hungry processes on the same system.

Latency due to AOF and disk I/O

Another source of latency is the Append Only File support in Redis. The AOF basically uses two system calls to accomplish its work. One is write(2), used to write data to the append only file; the other is fdatasync(2), used to flush the kernel file buffer to disk in order to ensure the durability level specified by the user.

Both the write(2) and fdatasync(2) calls can be a source of latency. For instance, write(2) can block both when there is a system-wide sync in progress and when the output buffers are full and the kernel needs to flush to disk in order to accept new writes.

The fdatasync(2) call is a worse source of latency: with many combinations of kernels and file systems it can take from a few milliseconds to a few seconds to complete, especially when some other process is doing I/O. For this reason, since Redis 2.4, Redis performs the fdatasync(2) call in a different thread when possible.

Next, we'll see how the configuration can affect the amount and source of latency when using the AOF.

The AOF can be configured to perform an fsync on disk in three different ways using the appendfsync configuration option (this setting can be modified at runtime using the CONFIG SET command).

  • When appendfsync is set to the value of no, Redis performs no fsync. In this configuration the only source of latency can be write(2). When this happens there is usually no solution, since the disk simply can't cope with the speed at which Redis is receiving data; however, this is uncommon unless the disk is seriously slowed down by other processes doing I/O.

  • When appendfsync is set to the value of everysec, Redis performs an fsync every second. It uses a different thread, and if the fsync is still in progress, Redis uses a buffer to delay the write(2) call up to two seconds (since write would block on Linux if an fsync is in progress against the same file). However, if the fsync is taking too long, Redis will eventually perform the write(2) call even if the fsync is still in progress, and this can be a source of latency.

  • When appendfsync is set to the value of always, an fsync is performed at every write operation before replying back to the client with an OK code (actually Redis will try to cluster many commands executed at the same time into a single fsync). In this mode performance is generally very low, and it is strongly recommended to use a fast disk and a file system implementation that can perform the fsync in a short time.
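The everysec rule above can be summarized as a small decision function. This is a toy model for illustration, not Redis's code: the function name, its keyword arguments and the MAX_POSTPONE_SECONDS constant are hypothetical, and only the "postpone up to two seconds, then write anyway" behavior comes from the description.

```ruby
# Toy model of the appendfsync everysec write policy described above.
# Hypothetical names; only the two-second postponement rule is from the text.
MAX_POSTPONE_SECONDS = 2

# Decide what to do with the buffered AOF data at time `now` (in seconds).
# `postponed_since` is when the write was first deferred (nil if it was not).
# Returns [action, new_postponed_since].
def everysec_write_action(fsync_in_progress:, postponed_since:, now:)
  return [:write, nil] unless fsync_in_progress

  # An fsync is running in the background thread: write(2) on the same file
  # could block, so keep accumulating data in the buffer for a while.
  return [:postpone, now] if postponed_since.nil?
  return [:postpone, postponed_since] if now - postponed_since < MAX_POSTPONE_SECONDS

  # Waited too long: write anyway, accepting that write(2) may block.
  [:write_may_block, nil]
end
```

Keeping the start of the postponement, rather than a counter, makes the two-second bound independent of how often the function is called.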

Most Redis users will use either the no or everysec setting for the appendfsync configuration directive. For minimum latency, the suggestion is to avoid other processes doing I/O on the same system. Using an SSD can help as well, but usually even non-SSD disks perform well with the append only file if the disk is not busy with other work, since Redis writes to the append only file without performing any seek.

If you want to investigate your latency issues related to the append only file you can use the strace command under Linux:

sudo strace -p $(pidof redis-server) -T -e trace=fdatasync

The above command will show all the fdatasync(2) system calls performed by Redis in the main thread. It will not show the fdatasync system calls performed by the background thread when the appendfsync config option is set to everysec; to see those, add the -f switch to strace.

If you wish you can also see both fdatasync and write system calls with the following command:

sudo strace -p $(pidof redis-server) -T -e trace=fdatasync,write

However, since write(2) is also used to write data to the client sockets, this will likely show many calls unrelated to disk I/O. Apparently there is no way to tell strace to show only slow system calls, so I use the following command:

sudo strace -f -p $(pidof redis-server) -T -e trace=fdatasync,write 2>&1 | grep -v '0.0' | grep -v unfinished
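The grep -v '0.0' filter is a rough heuristic: it drops any line containing the text 0.0, wherever it appears in the line. A slightly more precise sketch reads the duration that -T appends between angle brackets and keeps lines at or above a threshold. The script name and the threshold argument are assumptions for illustration.

```ruby
# Decide whether one line of `strace -T` output is a "slow" call.
# -T appends the time spent in the call as "<seconds>" at the end of the line;
# lines without it (such as "<unfinished ...>") are never reported.
def slow_syscall?(line, min_seconds)
  m = line.match(/<(\d+\.\d+)>\s*\z/)
  !m.nil? && m[1].to_f >= min_seconds
end

# Hypothetical usage, keeping calls that took 50 ms or more:
#   sudo strace -f -p $(pidof redis-server) -T -e trace=fdatasync,write 2>&1 \
#     | ruby slow_syscalls.rb 0.05
if __FILE__ == $PROGRAM_NAME && ARGV[0]
  threshold = Float(ARGV.shift)
  $stdout.sync = true # print matches as soon as they arrive
  $stdin.each_line { |line| puts line if slow_syscall?(line, threshold) }
end
```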

Latency generated by expires

Redis evicts expired keys in two ways:

  • The lazy way expires a key when it is requested by a command and found to be already expired.
  • The active way expires a few keys every 100 milliseconds.

The active expiring is designed to be adaptive. An expire cycle is started every 100 milliseconds (10 times per second), and will do the following:

  • Sample ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP keys, evicting all the keys already expired.
  • If more than 25% of the keys were found expired, repeat.

Given that ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP is set to 20 by default, and the process is performed ten times per second, usually just 200 keys per second are actively expired. This is enough to clean the DB fast enough even when already expired keys are not accessed for a long time, so that the lazy algorithm does not help. At the same time, expiring just 200 keys per second has no effect on the latency of a Redis instance.

However, the algorithm is adaptive and will loop if it finds more than 25% of keys already expired in the set of sampled keys. Since the algorithm runs ten times per second, looping requires the unlucky event of more than 25% of the keys in our random sample expiring within the same second.

Basically, this means that if the database has many keys expiring in the same second, and these make up at least 25% of the current population of keys with an expire set, Redis can block in order to get the percentage of keys already expired below 25%.
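The adaptive loop can be simulated in a few lines to see why many keys expiring at once keeps the cycle looping. This is a simplified sketch under stated assumptions: expires are held in a Hash mapping each key to its expire time, sampling uses Array#sample, and the time limit that real Redis applies to each cycle is ignored.

```ruby
# Simplified simulation of the adaptive active-expire cycle described above.
# Assumptions: expires is a Hash {key => expire_at}; Array#sample stands in
# for Redis's random sampling; the per-cycle time limit is ignored.
LOOKUPS_PER_LOOP = 20 # default ACTIVE_EXPIRE_CYCLE_LOOKUPS_PER_LOOP from the text

def active_expire_cycle(expires, now, rng: Random.new)
  removed = 0
  loops = 0
  loop do
    break if expires.empty?

    loops += 1
    sample = expires.keys.sample(LOOKUPS_PER_LOOP, random: rng)
    expired = sample.select { |k| expires[k] <= now }
    expired.each { |k| expires.delete(k) }
    removed += expired.size
    # Repeat only while more than 25% of the sample was already expired.
    break unless expired.size > sample.size / 4.0
  end
  { loops: loops, removed: removed }
end
```

With 100 keys that all expired in the same second, the cycle keeps looping until every one is removed; with no expired keys, it stops after a single sample.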

This approach is needed in order to avoid using too much memory for keys that are already expired, and it is usually harmless, since it is unusual for a large number of keys to expire in the same exact second. It is not impossible, though: for example, the user may have used EXPIREAT extensively with the same Unix time.

In short: be aware that many keys expiring at the same moment can be a source of latency.

Redis software watchdog

Redis 2.6 introduces the Redis Software Watchdog, a debugging tool designed to track down latency problems that, for one reason or another, escaped analysis using normal tools.

The software watchdog is an experimental feature. While it is designed to be used in production environments, care should be taken to back up the database before proceeding, as it could possibly have unexpected interactions with the normal execution of the Redis server.

It is important to use it only as a last resort, when there is no way to track the issue by other means.

This is how this feature works:

  • The user enables the software watchdog using the CONFIG SET command.
  • Redis starts monitoring itself constantly.
  • If Redis detects that the server is blocked into some operation that is not returning fast enough, and that may be the source of the latency issue, a low level report about where the server is blocked is dumped on the log file.
  • The user contacts the developers by writing a message to the Redis Google Group, including the watchdog report in the message.

Note that this feature cannot be enabled using the redis.conf file, because it is designed to be enabled only in already running instances and only for debugging purposes.

To enable the feature just use the following:

CONFIG SET watchdog-period 500

The period is specified in milliseconds. In the above example, Redis logs latency issues only if the server detects a delay of 500 milliseconds or greater. The minimum configurable period is 200 milliseconds.

When you are done with the software watchdog, you can turn it off by setting the watchdog-period parameter to 0. Important: remember to do this, because keeping the watchdog turned on for longer than needed is generally not a good idea.

The following is an example of what you'll see printed in the log file once the software watchdog detects a delay longer than the configured one:

[8547 | signal handler] (1333114359)
--- WATCHDOG TIMER EXPIRED ---
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libpthread.so.0(+0xf8f0) [0x7f16b5f158f0]
/lib/libc.so.6(nanosleep+0x2d) [0x7f16b5c2d39d]
/lib/libc.so.6(usleep+0x34) [0x7f16b5c62844]
./redis-server(debugCommand+0x3e1) [0x43ab41]
./redis-server(call+0x5d) [0x415a9d]
./redis-server(processCommand+0x375) [0x415fc5]
./redis-server(processInputBuffer+0x4f) [0x4203cf]
./redis-server(readQueryFromClient+0xa0) [0x4204e0]
./redis-server(aeProcessEvents+0x128) [0x411b48]
./redis-server(aeMain+0x2b) [0x411dbb]
./redis-server(main+0x2b6) [0x418556]
/lib/libc.so.6(__libc_start_main+0xfd) [0x7f16b5ba1c4d]
./redis-server() [0x411099]
------

Note: in the example the DEBUG SLEEP command was used in order to block the server. The stack trace is different if the server blocks in a different context.

If you happen to collect multiple watchdog stack traces you are encouraged to send everything to the Redis Google Group: the more traces we obtain, the simpler it will be to understand what the problem with your instance is.

11.4 - Redis latency monitoring

Discovering slow server events in Redis

Redis is often used for demanding use cases, where it serves a large number of queries per second per instance, but also has strict latency requirements for the average response time and the worst-case latency.

While Redis is an in-memory system, it deals with the operating system in different ways, for example, in the context of persisting to disk. Moreover Redis implements a rich set of commands. Certain commands are fast and run in constant or logarithmic time. Other commands are slower O(N) commands that can cause latency spikes.

Finally, Redis is single threaded. This is usually an advantage from the point of view of the amount of work it can perform per core, and in the latency figures it is able to provide. However, it poses a challenge for latency, since the single thread must be able to perform certain tasks incrementally, for example key expiration, in a way that does not impact the other clients that are served.

For all these reasons, Redis 2.8.13 introduced a new feature called Latency Monitoring, that helps the user to check and troubleshoot possible latency problems. Latency monitoring is composed of the following conceptual parts:

  • Latency hooks that sample different latency-sensitive code paths.
  • Time series recording of latency spikes, split by different events.
  • Reporting engine to fetch raw data from the time series.
  • Analysis engine to provide human-readable reports and hints according to the measurements.

The rest of this document covers the latency monitoring subsystem details. For more information about the general topic of Redis and latency, see Redis latency problems troubleshooting.

Events and time series

Different monitored code paths have different names and are called events. For example, command is an event that measures latency spikes of possibly slow command executions, while fast-command is the event name for the monitoring of the O(1) and O(log N) commands. Other events are less generic and monitor specific operations performed by Redis. For example, the fork event only monitors the time taken by Redis to execute the fork(2) system call.

A latency spike is an event that takes more time to run than the configured latency threshold. There is a separate time series associated with every monitored event. This is how the time series work:

  • Every time a latency spike happens, it is logged in the appropriate time series.
  • Every time series is composed of 160 elements.
  • Each element is a pair made of a Unix timestamp of the time the latency spike was measured and the number of milliseconds the event took to execute.
  • Latency spikes for the same event that occur in the same second are merged by taking the maximum latency. Even if continuous latency spikes are measured for a given event, which could happen with a low threshold, at least 160 seconds of history are available (one element per second).
  • The all-time maximum latency is recorded for every event.
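The time series rules listed above can be modeled with a small ring buffer. This is a hypothetical sketch (the LatencySeries class is not part of Redis or of any client library) that keeps 160 samples, merges same-second spikes by keeping the maximum, and tracks the all-time maximum for the event.

```ruby
# Sketch of one per-event latency time series following the rules above.
# Hypothetical class; it is not the Redis implementation.
class LatencySeries
  LEN = 160

  Sample = Struct.new(:time, :latency_ms)

  attr_reader :max_ms

  def initialize
    @samples = Array.new(LEN) # ring buffer slots, nil until used
    @idx = 0                  # next slot to write
    @max_ms = 0               # all-time maximum for this event
  end

  def add(time, latency_ms)
    @max_ms = latency_ms if latency_ms > @max_ms
    prev = @samples[(@idx - 1) % LEN]
    if prev && prev.time == time
      # Same second as the previous spike: merge by keeping the maximum.
      prev.latency_ms = latency_ms if latency_ms > prev.latency_ms
      return
    end
    @samples[@idx] = Sample.new(time, latency_ms)
    @idx = (@idx + 1) % LEN
  end

  # Samples in chronological order, oldest first.
  def history
    (0...LEN).map { |i| @samples[(@idx + i) % LEN] }.compact
  end
end
```

Once the 160 slots are full, each new second overwrites the oldest sample, so the history always covers the most recent 160 distinct seconds with spikes.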

The framework monitors and logs latency spikes in the execution time of these events:

  • command: regular commands.
  • fast-command: O(1) and O(log N) commands.
  • fork: the fork(2) system call.
  • rdb-unlink-temp-file: the unlink(2) system call.
  • aof-write: writing to the AOF - a catchall event for write(2) system calls.
  • aof-fsync-always: the fsync(2) system call when invoked by the appendfsync always policy.
  • aof-write-pending-fsync: the write(2) system call when there is a pending fsync.
  • aof-write-active-child: the write(2) system call when there are active child processes.
  • aof-write-alone: the write(2) system call when there is no pending fsync and no active child process.
  • aof-fstat: the fstat(2) system call.
  • aof-rename: the rename(2) system call for renaming the temporary file after completing BGREWRITEAOF.
  • aof-rewrite-diff-write: writing the differences accumulated while performing BGREWRITEAOF.
  • active-defrag-cycle: the active defragmentation cycle.
  • expire-cycle: the expiration cycle.
  • eviction-cycle: the eviction cycle.
  • eviction-del: deletes during the eviction cycle.

How to enable latency monitoring

What is high latency for one use case may not be considered high latency for another. Some applications may require that all queries be served in less than 1 millisecond. For other applications, it may be acceptable for a small number of clients to experience a 2 second latency on occasion.

The first step to enable the latency monitor is to set a latency threshold in milliseconds. Only events that take longer than the specified threshold will be logged as latency spikes. The user should set the threshold according to their needs. For example, if the application requires a maximum acceptable latency of 100 milliseconds, the threshold should be set to log all the events blocking the server for a time equal to or greater than 100 milliseconds.

Enable the latency monitor at runtime in a production server with the following command:

CONFIG SET latency-monitor-threshold 100

Monitoring is turned off by default (threshold set to 0), even if the actual cost of latency monitoring is near zero. While the memory requirements of latency monitoring are very small, there is no good reason to raise the baseline memory usage of a Redis instance that is working well.

Report information with the LATENCY command

The user interface to the latency monitoring subsystem is the LATENCY command. Like many other Redis commands, LATENCY accepts subcommands that modify its behavior. These subcommands are:

  • LATENCY LATEST - returns the latest latency samples for all events.
  • LATENCY HISTORY - returns latency time series for a given event.
  • LATENCY RESET - resets latency time series data for one or more events.
  • LATENCY GRAPH - renders an ASCII-art graph of an event's latency samples.
  • LATENCY DOCTOR - replies with a human-readable latency analysis report.

Refer to each subcommand's documentation page for further information.
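LATENCY LATEST replies with one entry per event: the event name, the Unix timestamp of the latest spike, the latest latency and the all-time maximum latency, both in milliseconds. The sketch below formats such a reply, worst event first. It assumes that entry layout and assumes the reply is fetched with redis-rb's generic call method, so treat both as points to check against your Redis and client versions.

```ruby
# Turn a LATENCY LATEST reply into readable lines, worst event first.
# Assumes each entry is [event_name, unix_timestamp, latest_ms, max_ms].
def format_latency_latest(reply)
  reply.sort_by { |entry| -entry[3] }.map do |event, ts, latest, max|
    when_utc = Time.at(ts).utc.strftime('%Y-%m-%d %H:%M:%S UTC')
    format('%-24s latest %5d ms at %s (all-time max %d ms)', event, latest, when_utc, max)
  end
end

# Hypothetical usage with redis-rb (assumes its generic `call` method):
#   require 'redis'
#   puts format_latency_latest(Redis.new.call('LATENCY', 'LATEST'))
```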

11.5 - Memory optimization

Strategies for optimizing memory usage in Redis

Special encoding of small aggregate data types

Since Redis 2.2, many data types are optimized to use less space up to a certain size. Hashes, Lists, Sets composed of just integers, and Sorted Sets, when smaller than a given number of elements, and up to a maximum element size, are encoded in a very memory efficient way that uses up to 10 times less memory (with 5 times less memory used being the average saving).

This is completely transparent from the point of view of the user and API. Since this is a CPU / memory trade-off, it is possible to tune the maximum number of elements and maximum element size for specially encoded types using the following redis.conf directives:

hash-max-ziplist-entries 512
hash-max-ziplist-value 64
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
set-max-intset-entries 512

If a specially encoded value overflows the configured max size, Redis will automatically convert it into normal encoding. This operation is very fast for small values, but if you change the setting in order to use specially encoded values for much larger aggregate types the suggestion is to run some benchmarks and tests to check the conversion time.

Using 32 bit instances

Redis compiled with 32 bit target uses a lot less memory per key, since pointers are small, but such an instance will be limited to 4 GB of maximum memory usage. To compile Redis as 32 bit binary use make 32bit. RDB and AOF files are compatible between 32 bit and 64 bit instances (and between little and big endian of course) so you can switch from 32 to 64 bit, or the contrary, without problems.

Bit and byte level operations

Redis 2.2 introduced new bit and byte level operations: GETRANGE, SETRANGE, GETBIT and SETBIT. Using these commands you can treat the Redis string type as a random access array. For instance if you have an application where users are identified by a unique progressive integer number, you can use a bitmap in order to save information about the subscription of users in a mailing list, setting the bit for subscribed and clearing it for unsubscribed, or the other way around. With 100 million users this data will take just 12 megabytes of RAM in a Redis instance. You can do the same using GETRANGE and SETRANGE in order to store one byte of information for each user. This is just an example but it is actually possible to model a number of problems in very little space with these new primitives.
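The arithmetic behind the mailing-list example is easy to check: 100 million bits are 12,500,000 bytes, about 12 MB. The sketch below mimics SETBIT/GETBIT addressing on a local Ruby string, assuming Redis's bit order in which offset 0 is the most significant bit of the first byte. LocalBitmap is a hypothetical helper and does not talk to Redis.

```ruby
# Local model of SETBIT/GETBIT addressing: bit offset N lives in byte N / 8,
# counting from the most significant bit of each byte (offset 0 = MSB of byte 0).
# Hypothetical helper; it only mimics the layout in a Ruby string.
class LocalBitmap
  def initialize
    @bytes = ''.b # binary string, grown on demand like a Redis string
  end

  # Sets the bit and returns its previous value, as SETBIT does.
  def setbit(offset, value)
    byte, bit = offset.divmod(8)
    @bytes << ("\x00".b * (byte + 1 - @bytes.bytesize)) if byte >= @bytes.bytesize
    mask = 0x80 >> bit
    old = @bytes.getbyte(byte)
    @bytes.setbyte(byte, value == 1 ? (old | mask) : (old & ~mask & 0xFF))
    (old & mask).zero? ? 0 : 1
  end

  # Bits beyond the end of the string read as 0, as with GETBIT.
  def getbit(offset)
    byte, bit = offset.divmod(8)
    return 0 if byte >= @bytes.bytesize

    (@bytes.getbyte(byte) & (0x80 >> bit)).zero? ? 0 : 1
  end

  def bytesize
    @bytes.bytesize
  end
end

# One bit per user for 100 million users: 100_000_000 / 8 = 12_500_000 bytes.
```

Setting the bit for user 100, for instance, grows the string to 13 bytes, because offset 100 falls in byte 12.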

Use hashes when possible

Small hashes are encoded in a very small space, so you should try representing your data using hashes whenever possible. For instance if you have objects representing users in a web application, instead of using different keys for name, surname, email, password, use a single hash with all the required fields.

If you want to know more about this, read the next section.

Using hashes to abstract a very memory efficient plain key-value store on top of Redis

I understand the title of this section is a bit scary, but I'm going to explain in detail what this is about.

Basically, it is possible to model a plain key-value store using Redis, where values are just strings, in a way that is not only more memory efficient than Redis plain keys but also much more memory efficient than memcached.

Let's start with some facts: a few keys use a lot more memory than a single key containing a hash with a few fields. How is this possible? We use a trick. In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like a hash table.

But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains grows too large (you can configure the limit in redis.conf).

This does not only work well from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than a hash table).

However since hash fields and values are not (always) represented as full featured Redis objects, hash fields can't have an associated time to live (expire) like a real key, and can only contain a string. But we are okay with this, this was the intention anyway when the hash data type API was designed (we trust simplicity more than features, so nested data structures are not allowed, as expires of single fields are not allowed).

So hashes are memory efficient. This is useful when using hashes to represent objects or to model other problems where there are groups of related fields. But what if we have a plain key-value use case?

Imagine we want to use Redis as a cache for many small objects, such as JSON-encoded objects, small HTML fragments, simple key -> boolean values and so forth. Basically, anything that is a string -> string map with small keys and values.

Now let's assume the objects we want to cache are numbered, like:

  • object:102393
  • object:1234
  • object:5

This is what we can do. Every time we perform a SET operation to set a new value, we actually split the key into two parts, one part used as a key, and the other part used as the field name for the hash. For instance the object named "object:1234" is actually split into:

  • a Key named object:12
  • a Field named 34

So we use all the characters but the last two for the key, and the final two characters for the hash field name. To set our key we use the following command:

HSET object:12 34 somevalue

As you can see, every hash will end up containing 100 fields, which is an optimal compromise between CPU and memory saved.

There is another important thing to note: with this schema, every hash will have more or less 100 fields regardless of the number of objects we cached. This is because our objects will always end with a number, and not a random string. In some way, the final number can be considered a form of implicit pre-sharding.

What about small numbers? Like object:2? We handle this case using just "object:" as a key name, and the whole number as the hash field name. So object:2 and object:10 will both end inside the key "object:", but one as field name "2" and one as "10".

How much memory do we save this way?

I used the following Ruby program to test how this works:

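require 'redis'

# Set to false to store every object as a plain top-level key instead.
USE_OPTIMIZATION = true

# Split "object:1234" into hash key "object:12" and field "34".
# Numbers with two digits or fewer go into the "object:" hash,
# with the whole number as the field name.
def hash_get_key_field(key)
  s = key.split(':')
  if s[1].length > 2
    { key: s[0] + ':' + s[1][0..-3], field: s[1][-2..-1] }
  else
    { key: s[0] + ':', field: s[1] }
  end
end

def hash_set(r, key, value)
  kf = hash_get_key_field(key)
  r.hset(kf[:key], kf[:field], value)
end

# HGET takes only the hash key and the field name.
def hash_get(r, key)
  kf = hash_get_key_field(key)
  r.hget(kf[:key], kf[:field])
end

r = Redis.new
(0..100_000).each do |id|
  key = "object:#{id}"
  if USE_OPTIMIZATION
    hash_set(r, key, 'val')
  else
    r.set(key, 'val')
  end
end

# Report the memory usage seen by the server.
puts r.info['used_memory_human']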

This is the result against a 64 bit instance of Redis 2.2:

  • USE_OPTIMIZATION set to true: 1.7 MB of used memory
  • USE_OPTIMIZATION set to false: 11 MB of used memory

This is an order of magnitude difference; I think this makes Redis more or less the most memory efficient plain key value store out there.

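WARNING: for this to work, make sure that in your redis.conf you have something like this (these are the Redis 2.2 directive names; Redis 2.6 and later call them hash-max-ziplist-entries and hash-max-ziplist-value, and Redis 7.0 and later hash-max-listpack-entries and hash-max-listpack-value):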

hash-max-zipmap-entries 256

Also remember to set the following field accordingly to the maximum size of your keys and values:

hash-max-zipmap-value 1024

Every time a hash exceeds the number of elements or element size specified it will be converted into a real hash table, and the memory saving will be lost.

You may ask, why don't you do this implicitly in the normal key space so that I don't have to care? There are two reasons: one is that we tend to make tradeoffs explicit, and this is a clear tradeoff between many things: CPU, memory, max element size. The second is that the top level key space must support a lot of interesting things like expires, LRU data, and so forth so it is not practical to do this in a general way.

But the Redis Way is that the user must understand how things work, so that they are able to pick the best compromise and to understand exactly how the system will behave.

Memory allocation

To store user keys, Redis allocates at most as much memory as the maxmemory setting enables (however there are small extra allocations possible).

The exact value can be set in the configuration file or set later via CONFIG SET (see Using memory as an LRU cache for more info). There are a few things that should be noted about how Redis manages memory:

  • Redis will not always free up (return) memory to the OS when keys are removed. This is not something special about Redis, but it is how most malloc() implementations work. For example, if you fill an instance with 5GB worth of data, and then remove the equivalent of 2GB of data, the Resident Set Size (also known as the RSS, which is the number of memory pages consumed by the process) will probably still be around 5GB, even if Redis will claim that the user memory is around 3GB. This happens because the underlying allocator can't easily release the memory. For example, often most of the removed keys were allocated on the same pages as other keys that still exist.
  • The previous point means that you need to provision memory based on your peak memory usage. If your workload from time to time requires 10GB, even if most of the time 5GB could do, you need to provision for 10GB.
  • However allocators are smart and are able to reuse free chunks of memory, so after you freed 2GB of your 5GB data set, when you start adding more keys again, you'll see the RSS (Resident Set Size) stay steady and not grow more, as you add up to 2GB of additional keys. The allocator is basically trying to reuse the 2GB of memory previously (logically) freed.
  • Because of all this, the fragmentation ratio is not reliable when peak memory usage was much larger than the currently used memory. The fragmentation is calculated as the physical memory actually used (the RSS value) divided by the amount of memory currently in use (as the sum of all the allocations performed by Redis). Because the RSS reflects the peak memory, when the (virtually) used memory is low because a lot of keys / values were freed, but the RSS is high, the ratio RSS / mem_used will be very high.
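As a worked example of the formula above, the scenario from the first bullet (5GB written, 2GB deleted, RSS still near 5GB) gives a ratio of about 1.67 even though nothing is wrong with the instance. The sketch assumes the inputs come from the INFO fields used_memory_rss and used_memory, both in bytes; INFO also reports a precomputed mem_fragmentation_ratio.

```ruby
# Fragmentation ratio as described above: RSS / memory allocated by Redis.
# Inputs are assumed to be the INFO fields used_memory_rss and used_memory (bytes).
GB = 1024**3

def fragmentation_ratio(used_memory_rss, used_memory)
  used_memory_rss.to_f / used_memory
end

# The scenario from the text: 5GB filled, 2GB deleted, RSS stays near the peak.
ratio = fragmentation_ratio(5 * GB, 3 * GB)
puts ratio.round(2) # prints 1.67

# With redis-rb, INFO fields come back as strings, e.g.:
#   info = Redis.new.info
#   fragmentation_ratio(info['used_memory_rss'].to_i, info['used_memory'].to_i)
```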

If maxmemory is not set Redis will keep allocating memory as it sees fit and thus it can (gradually) eat up all your free memory. Therefore it is generally advisable to configure some limit. You may also want to set maxmemory-policy to noeviction (which is not the default value in some older versions of Redis).

This makes Redis return an out of memory error for write commands if and when it reaches the limit, which in turn may result in errors in the application but will not render the whole machine dead because of memory starvation.

12 - Redis programming patterns

Novel patterns for working with Redis data structures

The following documents describe some novel development patterns you can use with Redis.

12.1 - Bulk loading

Writing data in bulk using the Redis protocol

Bulk loading is the process of loading Redis with a large amount of pre-existing data. Ideally, you want to perform this operation quickly and efficiently. This document describes some strategies for bulk loading data in Redis.

Bulk loading using the Redis protocol

Using a normal Redis client to perform bulk loading is not a good idea for a few reasons: the naive approach of sending one command after the other is slow because you have to pay for the round trip time for every command. It is possible to use pipelining, but for bulk loading of many records you need to write new commands while you read replies at the same time to make sure you are inserting as fast as possible.

Only a small percentage of clients support non-blocking I/O, and not all the clients are able to parse the replies in an efficient way in order to maximize throughput. For all of these reasons the preferred way to mass import data into Redis is to generate a text file containing the Redis protocol, in raw format, in order to call the commands needed to insert the required data.

For instance, if I need to generate a large data set with billions of keys in the form keyN -> ValueN, I will create a file containing the following commands in the Redis protocol format (shown here in readable form):

SET Key0 Value0
SET Key1 Value1
...
SET KeyN ValueN

Once this file is created, the remaining action is to feed it to Redis as fast as possible. In the past, the way to do this was to use netcat with the following command:

(cat data.txt; sleep 10) | nc localhost 6379 > /dev/null

However, this is not a very reliable way to perform mass import, because netcat does not really know when all the data was transferred and can't check for errors. Since Redis 2.6, the redis-cli utility supports a new mode called pipe mode that was designed to perform bulk loading.

Using the pipe mode the command to run looks like the following:

cat data.txt | redis-cli --pipe

That will produce an output similar to this:

All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 1000000

The redis-cli utility will also make sure to only redirect errors received from the Redis instance to the standard output.

Generating Redis Protocol

The Redis protocol is extremely simple to generate and parse, and is documented in the Redis protocol specification. However, in order to generate protocol for bulk loading you don't need to understand every detail of the protocol, just that every command is represented in the following way:

*<args><cr><lf>
$<len0><cr><lf>
<arg0><cr><lf>
$<len1><cr><lf>
<arg1><cr><lf>
...
$<lenN><cr><lf>
<argN><cr><lf>

Where <cr> means "\r" (or ASCII character 13) and <lf> means "\n" (or ASCII character 10).

For instance the command SET key value is represented by the following protocol:

*3<cr><lf>
$3<cr><lf>
SET<cr><lf>
$3<cr><lf>
key<cr><lf>
$5<cr><lf>
value<cr><lf>

Or represented as a quoted string:

"*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n"

The file you need to generate for bulk loading is just composed of commands represented in the above way, one after the other.

The following Ruby function generates valid protocol:

def gen_redis_proto(*cmd)
    proto = ""
    proto << "*"+cmd.length.to_s+"\r\n"
    cmd.each{|arg|
        proto << "$"+arg.to_s.bytesize.to_s+"\r\n"
        proto << arg.to_s+"\r\n"
    }
    proto
end

puts gen_redis_proto("SET","mykey","Hello World!").inspect

Using the above function it is possible to easily generate the key value pairs in the above example, with this program:

(0...1000).each{|n|
    STDOUT.write(gen_redis_proto("SET","Key#{n}","Value#{n}"))
}

We can pipe the program's output directly to redis-cli to perform our first mass import session.

$ ruby proto.rb | redis-cli --pipe
All data transferred. Waiting for the last reply...
Last reply received from server.
errors: 0, replies: 1000

How the pipe mode works under the hood

The trick inside the pipe mode of redis-cli is to be as fast as netcat while still being able to tell when the server has sent the last reply.

This is obtained in the following way:

  • redis-cli --pipe tries to send data as fast as possible to the server.
  • At the same time it reads data when available, trying to parse it.
  • Once there is no more data to read from stdin, it sends a special ECHO command with a random 20-byte string: we are sure this is the last command sent, and we are sure we can match the reply by checking whether we receive the same 20 bytes as a bulk reply.
  • Once this special final command is sent, the code receiving replies starts to match replies with these 20 bytes. When the matching reply is reached it can exit with success.
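The final-ECHO trick can be sketched as follows. This is a simplified illustration, not redis-cli's actual code: `echo_marker_command` and `scan_replies` are hypothetical helpers, the marker is assumed to be 20 hex characters so the sketch stays ASCII-only, and the reply scanner only understands `+`, `-`, `:` and non-null `$` replies.

```ruby
require "securerandom"

# Hypothetical sketch of redis-cli --pipe's end-of-stream detection.
# Simplification: only +, -, : and non-null $ replies are parsed.

# Encode "ECHO <marker>" in the Redis protocol.
def echo_marker_command(marker)
  "*2\r\n$4\r\nECHO\r\n$#{marker.bytesize}\r\n#{marker}\r\n"
end

# Scan buffered replies, counting them until the bulk reply equal to the
# marker arrives. Returns [replies, errors, marker_seen]; the ECHO reply
# itself is not counted.
def scan_replies(buffer, marker)
  pos = 0
  replies = 0
  errors = 0
  while (line_end = buffer.index("\r\n", pos))
    type = buffer[pos]
    if type == "$"
      len = buffer[(pos + 1)...line_end].to_i
      start = line_end + 2
      break if start + len + 2 > buffer.length # bulk payload not fully received yet
      payload = buffer[start, len]
      return [replies, errors, true] if payload == marker
      pos = start + len + 2
    else
      errors += 1 if type == "-" # error replies are also tallied separately
      pos = line_end + 2
    end
    replies += 1
  end
  [replies, errors, false]
end

# Assumed marker format: 20 hex characters (10 random bytes, hex-encoded).
marker = SecureRandom.hex(10)
stream = "+OK\r\n+OK\r\n-ERR wrong type\r\n$5\r\nhello\r\n$20\r\n#{marker}\r\n"
replies, errors, done = scan_replies(stream, marker)
```

Because only replies are parsed, the sender never has to count the commands it streams. A partially received reply simply stops the scan until more data arrives.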

Using this trick we don't need to parse the protocol we send to the server in order to understand how many commands we are sending, but just the replies.

However, while parsing the replies we keep a count of all the replies parsed, so that at the end we can tell the user how many commands the mass insert session transferred to the server.

12.2 - Distributed Locks with Redis

A Distributed Lock Pattern with Redis

Distributed locks are a very useful primitive in many environments where different processes must operate with shared resources in a mutually exclusive way.

There are a number of libraries and blog posts describing how to implement a DLM (Distributed Lock Manager) with Redis, but every library uses a different approach, and many use a simple approach with lower guarantees compared to what can be achieved with slightly more complex designs.

This page describes a more canonical algorithm to implement distributed locks with Redis. We propose an algorithm, called Redlock, which implements a DLM which we believe to be safer than the vanilla single instance approach. We hope that the community will analyze it, provide feedback, and use it as a starting point for implementations or for more complex or alternative designs.

Implementations

Before describing the algorithm, here are a few links to implementations already available that can be used for reference.

Safety and Liveness Guarantees

We are going to model our design with just three properties that, from our point of view, are the minimum guarantees needed to use distributed locks in an effective way.

  1. Safety property: Mutual exclusion. At any given moment, only one client can hold a lock.
  2. Liveness property A: Deadlock free. Eventually it is always possible to acquire a lock, even if the client that locked a resource crashes or gets partitioned.
  3. Liveness property B: Fault tolerance. As long as the majority of Redis nodes are up, clients are able to acquire and release locks.

Why Failover-based Implementations Are Not Enough

To understand what we want to improve, let’s analyze the current state of affairs with most Redis-based distributed lock libraries.

The simplest way to use Redis to lock a resource is to create a key in an instance. The key is usually created with a limited time to live, using the Redis expires feature, so that eventually it will get released (property 2 in our list). When the client needs to release the resource, it deletes the key.

Superficially this works well, but there is a problem: this is a single point of failure in our architecture. What happens if the Redis master goes down? Well, let’s add a replica! And use it if the master is unavailable. This is unfortunately not viable. By doing so we can’t implement our safety property of mutual exclusion, because Redis replication is asynchronous.

There is a race condition with this model:

  1. Client A acquires the lock in the master.
  2. The master crashes before the write to the key is transmitted to the replica.
  3. The replica gets promoted to master.
  4. Client B acquires the lock to the same resource A already holds a lock for. SAFETY VIOLATION!

Sometimes it is perfectly fine that, under special circumstances, for example during a failure, multiple clients can hold the lock at the same time. If this is the case, you can use your replication based solution. Otherwise we suggest implementing the solution described in this document.

Correct Implementation with a Single Instance

Before trying to overcome the limitation of the single instance setup described above, let’s check how to do it correctly in this simple case, since this is actually a viable solution in applications where a race condition from time to time is acceptable, and because locking into a single instance is the foundation we’ll use for the distributed algorithm described here.

To acquire the lock, the way to go is the following:

    SET resource_name my_random_value NX PX 30000

The command will set the key only if it does not already exist (NX option), with an expire of 30000 milliseconds (PX option). The key is set to a value “my_random_value”. This value must be unique across all clients and all lock requests.

Basically the random value is used in order to release the lock in a safe way, with a script that tells Redis: remove the key only if it exists and the value stored at the key is exactly the one I expect to be. This is accomplished by the following Lua script:

if redis.call("get",KEYS[1]) == ARGV[1] then
    return redis.call("del",KEYS[1])
else
    return 0
end

This is important in order to avoid removing a lock that was created by another client. For example, a client may acquire the lock, get blocked performing some operation for longer than the lock validity time (the time at which the key will expire), and later remove the lock, which has already been acquired by some other client. Using just DEL is not safe, as a client may remove another client's lock. With the above script, instead, every lock is “signed” with a random string, so the lock will be removed only if it is still the one that was set by the client trying to remove it.
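The scenario above can be simulated with a small in-memory model. `FakeInstance` is a hypothetical stand-in for a single Redis instance, not a client library: `set_nx_px` mimics `SET ... NX PX`, `release` mimics the Lua script, and `del` mimics plain DEL. Time is passed in explicitly (in milliseconds) so that expiry can be simulated.

```ruby
require "securerandom"

# FakeInstance: hypothetical in-memory stand-in for one Redis instance.
class FakeInstance
  def initialize
    @data = {} # key => [value, expires_at_ms]
  end

  # Like SET key value NX PX ttl: succeeds only if the key is absent or expired.
  def set_nx_px(key, value, ttl_ms, now_ms)
    current = @data[key]
    return false if current && current[1] > now_ms
    @data[key] = [value, now_ms + ttl_ms]
    true
  end

  # Like the Lua release script: delete only if the stored value is ours.
  def release(key, value, now_ms)
    current = @data[key]
    return 0 unless current && current[1] > now_ms && current[0] == value
    @data.delete(key)
    1
  end

  # Like plain DEL: removes the key no matter who set it.
  def del(key)
    @data.delete(key) ? 1 : 0
  end
end

redis = FakeInstance.new
a = SecureRandom.hex(20) # client A's random value
b = SecureRandom.hex(20) # client B's random value

redis.set_nx_px("resource_name", a, 30_000, 0)                # A locks at t=0
blocked = redis.set_nx_px("resource_name", b, 30_000, 10_000) # B is refused at t=10s
taken = redis.set_nx_px("resource_name", b, 30_000, 31_000)   # A's lock expired, B gets it
late_release = redis.release("resource_name", a, 32_000)      # A's late release is refused
```

If client A had used `del` at t=32s instead, it would have removed B's lock.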

What should this random string be? We assume it’s 20 bytes from /dev/urandom, but you can find cheaper ways to make it unique enough for your tasks. For example a safe pick is to seed RC4 with /dev/urandom, and generate a pseudo random stream from that. A simpler solution is to use a UNIX timestamp with microsecond precision, concatenating the timestamp with a client ID. It is not as safe, but probably sufficient for most environments.
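Both options above are easy to produce. In this minimal sketch, `strong_lock_value` takes 20 bytes from Ruby's `SecureRandom` (which reads the operating system's secure random source) and hex-encodes them. `cheap_lock_value` is the timestamp-plus-client-ID alternative. The helper names and the `client_id` argument are hypothetical.

```ruby
require "securerandom"

# 20 random bytes from the OS secure random source, hex-encoded (40 chars).
def strong_lock_value
  SecureRandom.hex(20)
end

# Cheaper, less safe alternative: a hypothetical client ID plus a UNIX
# timestamp with microsecond precision.
def cheap_lock_value(client_id)
  micros = Process.clock_gettime(Process::CLOCK_REALTIME, :microsecond)
  "#{client_id}:#{micros}"
end

token = strong_lock_value
fallback = cheap_lock_value("client-7")
```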

The "lock validity time" is the time we use as the key's time to live. It is both the auto release time, and the time the client has in order to perform the operation required before another client may be able to acquire the lock again, without technically violating the mutual exclusion guarantee, which is only limited to a given window of time from the moment the lock is acquired.

So now we have a good way to acquire and release the lock. With this system, reasoning about a non-distributed system composed of a single, always available, instance, is safe. Let’s extend the concept to a distributed system where we don’t have such guarantees.

The Redlock Algorithm

In the distributed version of the algorithm we assume we have N Redis masters. Those nodes are totally independent, so we don’t use replication or any other implicit coordination system. We already described how to acquire and release the lock safely in a single instance. We take for granted that the algorithm will use this method to acquire and release the lock in a single instance. In our examples we set N=5, which is a reasonable value, so we need to run 5 Redis masters on different computers or virtual machines in order to ensure that they’ll fail in a mostly independent way.

In order to acquire the lock, the client performs the following operations:

  1. It gets the current time in milliseconds.
  2. It tries to acquire the lock in all the N instances sequentially, using the same key name and random value in all the instances. During step 2, when setting the lock in each instance, the client uses a timeout which is small compared to the total lock auto-release time in order to acquire it. For example if the auto-release time is 10 seconds, the timeout could be in the ~ 5-50 milliseconds range. This prevents the client from remaining blocked for a long time trying to talk with a Redis node which is down: if an instance is not available, we should try to talk with the next instance ASAP.
  3. The client computes how much time elapsed in order to acquire the lock, by subtracting from the current time the timestamp obtained in step 1. If and only if the client was able to acquire the lock in the majority of the instances (at least 3), and the total time elapsed to acquire the lock is less than lock validity time, the lock is considered to be acquired.
  4. If the lock was acquired, its validity time is considered to be the initial validity time minus the time elapsed, as computed in step 3.
  5. If the client failed to acquire the lock for some reason (either it was not able to lock N/2+1 instances or the validity time is negative), it will try to unlock all the instances (even the instances it believed it was not able to lock).
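The five steps can be sketched against in-memory nodes. This sketch rests on several assumptions. `FakeNode` is a hypothetical stand-in for a Redis master, and a `down` node simulates an unreachable server. The per-node timeout from step 2 is not modeled, because in-memory calls cannot block. `drift_ms` is a caller-supplied clock drift allowance.

```ruby
# FakeNode: hypothetical in-memory stand-in for one independent Redis master.
class FakeNode
  attr_accessor :down

  def initialize
    @data = {}
    @down = false
  end

  def now_ms
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  end

  # SET key value NX PX ttl
  def lock(key, value, ttl_ms)
    return false if @down
    current = @data[key]
    return false if current && current[1] > now_ms
    @data[key] = [value, now_ms + ttl_ms]
    true
  end

  # Compare-and-delete, like the release script.
  def unlock(key, value)
    return false if @down
    current = @data[key]
    return false unless current && current[0] == value
    @data.delete(key)
    true
  end
end

# Returns the remaining validity in ms if the lock was acquired, else nil.
def redlock_acquire(nodes, key, value, ttl_ms, drift_ms)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) # step 1
  locked = nodes.count { |n| n.lock(key, value, ttl_ms) }               # step 2, sequential
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - start # step 3
  quorum = nodes.size / 2 + 1
  validity = ttl_ms - elapsed - drift_ms                                # step 4
  return validity if locked >= quorum && validity > 0
  nodes.each { |n| n.unlock(key, value) }                               # step 5: unlock all
  nil
end
```

With five nodes, two of them down, the lock is still acquired, because three nodes form a majority. With three nodes down it fails, and the two partial locks are released immediately rather than left to expire.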

Is the Algorithm Asynchronous?

The algorithm relies on the assumption that, while there is no synchronized clock across the processes, the local time in every process updates at approximately the same rate, with a small margin of error compared to the auto-release time of the lock. This assumption closely resembles a real-world computer: every computer has a local clock and we can usually rely on different computers to have a clock drift which is small.

At this point we need to better specify our mutual exclusion rule: it is guaranteed only as long as the client holding the lock terminates its work within the lock validity time (as obtained in step 3), minus some time (just a few milliseconds in order to compensate for clock drift between processes).

This paper contains more information about similar systems requiring a bounded clock drift: Leases: an efficient fault-tolerant mechanism for distributed file cache consistency.

Retry on Failure

When a client is unable to acquire the lock, it should try again after a random delay in order to try to desynchronize multiple clients trying to acquire the lock for the same resource at the same time (this may result in a split brain condition where nobody wins). Also the faster a client tries to acquire the lock in the majority of Redis instances, the smaller the window for a split brain condition (and the need for a retry), so ideally the client should try to send the SET commands to the N instances at the same time using multiplexing.
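A retry loop with a random delay might look like the following minimal sketch. `acquire_with_retry` is a hypothetical helper: the block stands for one acquisition attempt, and the `max_attempts` and `max_delay_s` values are illustrative, not recommendations. As the liveness discussion below notes, the delay should be comparably greater than the time needed to acquire the majority of locks.

```ruby
# Hypothetical retry wrapper: yields one acquisition attempt per iteration and
# sleeps a random delay between failed attempts to desynchronize clients.
def acquire_with_retry(max_attempts: 3, max_delay_s: 0.2, rng: Random.new)
  max_attempts.times do |attempt|
    result = yield
    return result if result
    break if attempt == max_attempts - 1 # no pointless sleep after the last try
    sleep(rng.rand * max_delay_s)        # random delay in [0, max_delay_s)
  end
  nil
end

calls = 0
outcome = acquire_with_retry(max_attempts: 5, max_delay_s: 0.01) do
  calls += 1
  calls == 3 ? :locked : nil # simulated: succeeds on the third attempt
end
```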

It is worth stressing how important it is for clients that fail to acquire the majority of locks, to release the (partially) acquired locks ASAP, so that there is no need to wait for key expiry in order for the lock to be acquired again (however if a network partition happens and the client is no longer able to communicate with the Redis instances, there is an availability penalty to pay as it waits for key expiration).

Releasing the Lock

Releasing the lock is simple, and can be performed whether or not the client believes it was able to successfully lock a given instance.

Safety Arguments

Is the algorithm safe? Let's examine what happens in different scenarios.

To start let’s assume that a client is able to acquire the lock in the majority of instances. All the instances will contain a key with the same time to live. However, the key was set at different times, so the keys will also expire at different times. But if the first key was set at worst at time T1 (the time we sample before contacting the first server) and the last key was set at worst at time T2 (the time we obtained the reply from the last server), we are sure that the first key to expire in the set will exist for at least MIN_VALIDITY=TTL-(T2-T1)-CLOCK_DRIFT. All the other keys will expire later, so we are sure that the keys will be simultaneously set for at least this time.

During the time that the majority of keys are set, another client will not be able to acquire the lock, since N/2+1 SET NX operations can’t succeed if N/2+1 keys already exist. So if a lock was acquired, it is not possible to re-acquire it at the same time (violating the mutual exclusion property).
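The two quantities in this argument are simple arithmetic. The figures in this sketch (a 10-second TTL, 120 ms between T1 and T2, and a 10 ms drift allowance) are assumed example values.

```ruby
# Majority size for N independent masters (integer division).
def quorum(n)
  n / 2 + 1
end

# MIN_VALIDITY = TTL - (T2 - T1) - CLOCK_DRIFT, everything in milliseconds.
def min_validity(ttl_ms, t1_ms, t2_ms, clock_drift_ms)
  ttl_ms - (t2_ms - t1_ms) - clock_drift_ms
end

example = min_validity(10_000, 0, 120, 10) # 9870 ms
```

Since `2 * quorum(n) > n` for every N, two clients can never each hold a majority of the same set of keys at once.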

However we want to also make sure that multiple clients trying to acquire the lock at the same time can’t simultaneously succeed.

If a client locked the majority of instances using a time near to, or greater than, the lock maximum validity time (basically the TTL we use for SET), it will consider the lock invalid and will unlock the instances. So we only need to consider the case where a client was able to lock the majority of instances in a time which is less than the validity time. In this case, by the argument already expressed above, no client should be able to re-acquire the lock for MIN_VALIDITY. So multiple clients will be able to lock N/2+1 instances at the same time (with "time" being the end of Step 2) only when the time to lock the majority was greater than the TTL time, making the lock invalid.

Liveness Arguments

The system liveness is based on three main features:

  1. The auto release of the lock (since keys expire): eventually keys are available again to be locked.
  2. The fact that clients, usually, will cooperate removing the locks when the lock was not acquired, or when the lock was acquired and the work terminated, making it likely that we don’t have to wait for keys to expire to re-acquire the lock.
  3. The fact that when a client needs to retry a lock, it waits a time which is comparably greater than the time needed to acquire the majority of locks, in order to probabilistically make split brain conditions during resource contention unlikely.

However, we pay an availability penalty equal to TTL time on network partitions, so if there are continuous partitions, we can pay this penalty indefinitely. This happens every time a client acquires a lock and gets partitioned away before being able to remove the lock.

Basically, if there are infinite continuous network partitions, the system may become unavailable for an infinite amount of time.

Performance, Crash Recovery and fsync

Many users using Redis as a lock server need high performance in terms of both latency to acquire and release a lock, and number of acquire / release operations that it is possible to perform per second. In order to meet this requirement, the strategy to talk with the N Redis servers to reduce latency is definitely multiplexing (putting the socket in non-blocking mode, sending all the commands, and reading all the replies later, assuming that the RTT between the client and each instance is similar).
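The latency benefit comes from waiting for all instances at once instead of one after another. This sketch swaps in Ruby threads for the non-blocking, multiplexed sockets described above. `lock_all_concurrently` is a hypothetical helper, and each node is a hypothetical callable that simulates 50 ms of latency.

```ruby
# Hypothetical helper: send the lock request to every node concurrently.
# Threads stand in for non-blocking multiplexed sockets in this sketch.
def lock_all_concurrently(nodes)
  threads = nodes.map { |node| Thread.new { node.call } }
  threads.map(&:value) # Thread#value waits for each thread and returns its result
end

# Five simulated nodes with 50 ms latency each; node 4 refuses the lock.
nodes = (0...5).map { |i| -> { sleep(0.05); i != 4 } }

start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
results = lock_all_concurrently(nodes)
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
granted = results.count(true)
```

The total wait is roughly one round trip rather than five, which also shrinks the split-brain window discussed under Retry on Failure.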

However there is another consideration around persistence if we want to target a crash-recovery system model.

Basically to see the problem here, let’s assume we configure Redis without persistence at all. A client acquires the lock in 3 of 5 instances. One of the instances where the client was able to acquire the lock is restarted, at this point there are again 3 instances that we can lock for the same resource, and another client can lock it again, violating the safety property of exclusivity of lock.

If we enable AOF persistence, things will improve quite a bit. For example, we can upgrade a server by sending it a SHUTDOWN command and restarting it. Because Redis expires are semantically implemented so that time still elapses when the server is off, all our requirements are met. However, everything is fine only as long as it is a clean shutdown. What about a power outage? If Redis is configured, as by default, to fsync on disk every second, it is possible that after a restart our key is missing. In theory, if we want to guarantee the lock safety in the face of any kind of instance restart, we need to set appendfsync always in the persistence settings. This will affect performance due to the additional sync overhead.

However, things are better than they look at first glance. Basically, the algorithm safety is retained as long as, when an instance restarts after a crash, it no longer participates in any currently active lock. This means that the set of currently active locks when the instance restarts were all obtained by locking instances other than the one which is rejoining the system.

To guarantee this we just need to make an instance, after a crash, unavailable for at least a bit more than the max TTL we use. This is the time needed for all the keys about the locks that existed when the instance crashed to become invalid and be automatically released.

Using delayed restarts it is basically possible to achieve safety even without any kind of Redis persistence available, however note that this may translate into an availability penalty. For example if a majority of instances crash, the system will become globally unavailable for TTL (here globally means that no resource at all will be lockable during this time).

Making the algorithm more reliable: Extending the lock

If the work performed by clients consists of small steps, it is possible to use smaller lock validity times by default, and extend the algorithm implementing a lock extension mechanism. Basically the client, if in the middle of the computation while the lock validity is approaching a low value, may extend the lock by sending a Lua script to all the instances that extends the TTL of the key if the key exists and its value is still the random value the client assigned when the lock was acquired.

The client should only consider the lock re-acquired if it was able to extend the lock into the majority of instances, and within the validity time (basically the algorithm to use is very similar to the one used when acquiring the lock).

However this does not technically change the algorithm, so the maximum number of lock reacquisition attempts should be limited, otherwise one of the liveness properties is violated.
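Lock extension can be sketched with the same in-memory approach. `FakeLease` is a hypothetical node whose `extend_ttl` mirrors a script that resets the TTL only if the key still holds the client's random value. `extend_lock` applies the same majority-and-validity rule as acquisition and refuses to run once a caller-chosen cap on extension attempts is reached. The cap and drift values are assumptions.

```ruby
# FakeLease: hypothetical in-memory node supporting lock and extension.
class FakeLease
  def initialize
    @data = {} # key => [value, expires_at_ms]
  end

  def now_ms
    Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  end

  def lock(key, value, ttl_ms)
    cur = @data[key]
    return false if cur && cur[1] > now_ms
    @data[key] = [value, now_ms + ttl_ms]
    true
  end

  # Reset the TTL only if the key exists, is unexpired and still holds our value.
  def extend_ttl(key, value, ttl_ms)
    cur = @data[key]
    return false unless cur && cur[1] > now_ms && cur[0] == value
    cur[1] = now_ms + ttl_ms
    true
  end
end

# New validity in ms, or nil on failure or when the attempt cap is reached.
def extend_lock(nodes, key, value, ttl_ms, drift_ms, attempts_so_far, max_attempts)
  return nil if attempts_so_far >= max_attempts # bounded, to preserve liveness
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond)
  extended = nodes.count { |n| n.extend_ttl(key, value, ttl_ms) }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC, :millisecond) - start
  validity = ttl_ms - elapsed - drift_ms
  extended >= nodes.size / 2 + 1 && validity > 0 ? validity : nil
end
```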

Want to help?

If you are into distributed systems, it would be great to have your opinion / analysis. Also reference implementations in other languages could be great.

Thanks in advance!

Analysis of Redlock


  1. Martin Kleppmann analyzed Redlock here. A counterpoint to this analysis can be found here.

12.3 - Secondary indexing

Building secondary indexes in Redis

Redis is not exactly a key-value store, since values can be complex data structures. However it has an external key-value shell: at API level data is addressed by the key name. It is fair to say that, natively, Redis only offers primary key access. However since Redis is a data structures server, its capabilities can be used for indexing, in order to create secondary indexes of different kinds, including composite (multi-column) indexes.

This document explains how it is possible to create indexes in Redis using the following data structures:

  • Sorted sets to create secondary indexes by ID or other numerical fields.
  • Sorted sets with lexicographical ranges for creating more advanced secondary indexes, composite indexes and graph traversal indexes.
  • Sets for creating random indexes.
  • Lists for creating simple iterable indexes and last N items indexes.

Implementing and maintaining indexes with Redis is an advanced topic, so most users who need to perform complex queries on data should consider whether they are better served by a relational store. However, often, especially in caching scenarios, there is an explicit need to store indexed data in Redis in order to speed up common queries which require some form of indexing to be executed.

Simple numerical indexes with sorted sets

The simplest secondary index you can create with Redis is by using the sorted set data type, which is a data structure representing a set of elements ordered by a floating point number which is the score of each element. Elements are ordered from the smallest to the highest score.

Since the score is a double precision float, indexes you can build with vanilla sorted sets are limited to things where the indexing field is a number within a given range.

The two commands to build this kind of index are ZADD and ZRANGEBYSCORE, to add items and to retrieve items within a specified range, respectively.

For instance, it is possible to index a set of person names by their age by adding elements to a sorted set. The element will be the name of the person and the score will be the age.

ZADD myindex 25 Manuel
ZADD myindex 18 Anna
ZADD myindex 35 Jon
ZADD myindex 67 Helen

In order to retrieve all persons with an age between 20 and 40, the following command can be used:

ZRANGEBYSCORE myindex 20 40
1) "Manuel"
2) "Jon"

By using the WITHSCORES option of ZRANGEBYSCORE it is also possible to obtain the scores associated with the returned elements.

The ZCOUNT command can be used to retrieve the number of elements within a given range without actually fetching them, which is also useful, especially given that the operation is executed in logarithmic time regardless of the size of the range.

Ranges can be inclusive or exclusive, please refer to the ZRANGEBYSCORE command documentation for more information.

Note: Using ZREVRANGEBYSCORE it is possible to query a range in reversed order, which is often useful when data is indexed in a given direction (ascending or descending) but we want to retrieve information the other way around.

Using object IDs as associated values

In the above example we associated names to ages. However in general we may want to index some field of an object which is stored elsewhere. Instead of using the sorted set value directly to store the data associated with the indexed field, it is possible to store just the ID of the object.

For example I may have Redis hashes representing users. Each user is represented by a single key, directly accessible by ID:

HMSET user:1 id 1 username antirez ctime 1444809424 age 38
HMSET user:2 id 2 username maria ctime 1444808132 age 42
HMSET user:3 id 3 username jballard ctime 1443246218 age 33

If I want to create an index in order to query users by their age, I could do:

ZADD user.age.index 38 1
ZADD user.age.index 42 2
ZADD user.age.index 33 3

This time the value associated with the score in the sorted set is the ID of the object. So once I query the index with ZRANGEBYSCORE I'll also have to retrieve the information I need with HGETALL or similar commands. The obvious advantage is that objects can change without touching the index, as long as we don't change the indexed field.
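The two-step lookup can be modeled in plain Ruby. `USERS` stands in for the `user:<id>` hashes and `AGE_INDEX` for `user.age.index` (member = user ID, score = age). `ids_by_score` and `fetch_objects` are hypothetical helpers that imitate ZRANGEBYSCORE and HGETALL. They are not Redis calls.

```ruby
# In-memory stand-ins for the user hashes and the age index.
USERS = {
  "user:1" => { "id" => "1", "username" => "antirez", "age" => "38" },
  "user:2" => { "id" => "2", "username" => "maria", "age" => "42" },
  "user:3" => { "id" => "3", "username" => "jballard", "age" => "33" }
}.freeze
AGE_INDEX = { "1" => 38, "2" => 42, "3" => 33 }.freeze

# Like ZRANGEBYSCORE: IDs with score in [min, max], ordered by score
# (ties broken by member, as sorted sets do).
def ids_by_score(index, min, max)
  index.select { |_id, score| score >= min && score <= max }
       .sort_by { |id, score| [score, id] }
       .map(&:first)
end

# Second step: fetch each object by ID, as HGETALL user:<id> would.
def fetch_objects(store, ids)
  ids.map { |id| store["user:#{id}"] }
end

ids = ids_by_score(AGE_INDEX, 35, 45)
names = fetch_objects(USERS, ids).map { |u| u["username"] }
```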

In the next examples we'll almost always use IDs as values associated with the index, since this is usually the sounder design, with a few exceptions.

Updating simple sorted set indexes

Often we index things which change over time. In the above example, the age of the user changes every year. In such a case it would make sense to use the birth date as index instead of the age itself, but there are other cases where we simply want some field to change from time to time, and the index to reflect this change.

The ZADD command makes updating simple indexes trivial, since re-adding an element with a different score and the same value simply updates the score and moves the element to the right position. So if the user antirez turned 39, in order to update the data in the hash representing the user, and in the index as well, we need to execute the following two commands:

HSET user:1 age 39
ZADD user.age.index 39 1

The operation may be wrapped in a MULTI/EXEC transaction in order to make sure that either both fields are updated or neither is.

Turning multi dimensional data into linear data

Indexes created with sorted sets are able to index only a single numerical value. Because of this you may think it is impossible to index something which has multiple dimensions using this kind of index, but actually this is not always true. If you can efficiently represent something multi-dimensional in a linear way, then it is often possible to use a simple sorted set for indexing.

For example, the Redis geo indexing API uses a sorted set to index places by latitude and longitude using a technique called geohash. The sorted set score represents alternating bits of longitude and latitude, so that we map the linear score of a sorted set to many small squares on the earth's surface. By searching the center square plus its 8 neighbors (an 8+1 style search), it is possible to retrieve elements by radius.
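Bit interleaving is what turns two coordinates into a single linear score. This is an illustration of the idea, not Redis's exact geohash code. The 26 bits per axis, the latitude limits and the helper names `quantize`, `interleave` and `geo_score` are assumptions chosen for the sketch, and they keep the score below 2^52.

```ruby
# Illustration only: not Redis's exact geohash implementation.
BITS = 26 # assumed bits per coordinate, 52 in total
LAT_MIN = -85.05112878
LAT_MAX = 85.05112878
LON_MIN = -180.0
LON_MAX = 180.0

# Map v in [min, max] onto an integer cell in [0, 2**bits - 1].
def quantize(v, min, max, bits)
  cell = ((v - min) / (max - min) * (1 << bits)).floor
  cell.clamp(0, (1 << bits) - 1)
end

# Bit i of x goes to position 2i, bit i of y to position 2i + 1.
def interleave(x, y, bits)
  (0...bits).reduce(0) do |acc, i|
    acc | (((x >> i) & 1) << (2 * i)) | (((y >> i) & 1) << (2 * i + 1))
  end
end

# One linear score per (longitude, latitude) pair.
def geo_score(lon, lat)
  interleave(quantize(lat, LAT_MIN, LAT_MAX, BITS),
             quantize(lon, LON_MIN, LON_MAX, BITS), BITS)
end

score = geo_score(13.361389, 38.115556)
```

Points in the same small square share the high-order bits of their scores, so a square becomes a contiguous score range that ZRANGEBYSCORE can query.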

Limits of the score

Sorted set element scores are double precision floats. This means they can represent different decimal or integer values with different errors, because they use an exponential representation internally. However, what is interesting for indexing purposes is that the score can always represent, without any error, the integers between -9007199254740992 and 9007199254740992, which is -/+ 2^53.
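Ruby's `Float` is the same IEEE 754 double type as a sorted set score, so it shows where exact integer representation stops. `exact_as_score?` is a hypothetical helper that checks whether an integer survives a round trip through a double unchanged.

```ruby
MAX_EXACT = 2**53 # 9007199254740992

# True if n survives a round trip through a double unchanged.
def exact_as_score?(n)
  n.to_f.to_i == n
end

inside = exact_as_score?(MAX_EXACT - 1)
outside = exact_as_score?(MAX_EXACT + 1) # rounds to 2**53
```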

When representing much larger numbers, you need a different form of indexing that is able to index numbers at any precision, called a lexicographical index.

Lexicographical indexes

Redis sorted sets have an interesting property. When elements are added with the same score, they are sorted lexicographically, comparing the strings as binary data with the memcmp() function.

For people who don't know the C language or the memcmp function, what it means is that elements with the same score are sorted by comparing the raw values of their bytes, byte after byte. If the first byte is the same, the second is checked and so forth. If the common prefix of two strings is the same then the longer string is considered the greater of the two, so "foobar" is greater than "foo".
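Ruby's `String#<=>` happens to use the same rule: a byte-by-byte comparison, then length. This makes it a convenient way to preview how members with equal scores will be ordered. `lex_sort` is a hypothetical helper, and `.b` makes each string binary so only raw bytes are compared.

```ruby
# Sort strings by raw bytes, then by length (memcmp-style ordering).
def lex_sort(members)
  members.map(&:b).sort
end

sorted = lex_sort(%w[baaa abbb aaaa bbbb])
```

Note that byte ordering puts uppercase ASCII letters before all lowercase ones, so "Zebra" sorts before "apple".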

There are commands such as ZRANGEBYLEX and ZLEXCOUNT that are able to query and count ranges in a lexicographical fashion, assuming they are used with sorted sets where all the elements have the same score.

This Redis feature is basically equivalent to a b-tree data structure which is often used in order to implement indexes with traditional databases. As you can guess, because of this, it is possible to use this Redis data structure in order to implement pretty fancy indexes.

Before we dive into using lexicographical indexes, let's check how sorted sets behave in this special mode of operation. Since we need to add elements with the same score, we'll always use the special score of zero.

ZADD myindex 0 baaa
ZADD myindex 0 abbb
ZADD myindex 0 aaaa
ZADD myindex 0 bbbb

Fetching all the elements from the sorted set immediately reveals that they are ordered lexicographically.

ZRANGE myindex 0 -1
1) "aaaa"
2) "abbb"
3) "baaa"
4) "bbbb"

Now we can use ZRANGEBYLEX in order to perform range queries.

ZRANGEBYLEX myindex [a (b
1) "aaaa"
2) "abbb"

Note that in the range queries we prefixed the min and max elements identifying the range with the special characters [ and (. These prefixes are mandatory, and they specify whether the elements of the range are inclusive or exclusive. So the range [a (b means give me all the elements lexicographically between a inclusive and b exclusive, which are all the elements starting with a.

There are also two more special characters indicating the infinitely negative string and the infinitely positive string, which are - and +.

ZRANGEBYLEX myindex [b +
1) "baaa"
2) "bbbb"
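The bound syntax can be mirrored in a few lines of Ruby. This sketch works on an in-memory list, assuming every member has score 0. `range_by_lex` and its helpers are hypothetical. Like Redis, the sketch rejects a bound that is neither `-`, `+`, nor prefixed with `[` or `(`.

```ruby
# Parse a ZRANGEBYLEX-style bound into [kind, value].
def parse_bound(bound)
  return [:neg_inf, nil] if bound == "-"
  return [:pos_inf, nil] if bound == "+"
  unless bound.start_with?("[", "(")
    raise ArgumentError, "bound must start with [ or (: #{bound}"
  end
  [bound.start_with?("[") ? :inclusive : :exclusive, bound[1..-1]]
end

def above_min?(member, min)
  kind, value = parse_bound(min)
  case kind
  when :neg_inf then true
  when :pos_inf then false
  when :inclusive then member >= value
  else member > value
  end
end

def below_max?(member, max)
  kind, value = parse_bound(max)
  case kind
  when :neg_inf then false
  when :pos_inf then true
  when :inclusive then member <= value
  else member < value
  end
end

# Members in byte order that fall between min and max.
def range_by_lex(members, min, max)
  members.sort.select { |m| above_min?(m, min) && below_max?(m, max) }
end

MYINDEX = %w[baaa abbb aaaa bbbb].freeze
starting_with_a = range_by_lex(MYINDEX, "[a", "(b")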

That's it basically. Let's see how to use these features to build indexes.

A first example: completion

An interesting application of indexing is completion. Completion is what happens when you start typing your query into a search engine: the user interface will anticipate what you are likely typing, providing common queries that start with the same characters.

A naive approach to completion is to just add every single query we get from the user into the index. For example if the user searches banana we'll just do:

ZADD myindex 0 banana

And so forth for each search query ever encountered. Then, when we want to complete the user input, we execute a range query using ZRANGEBYLEX. Imagine the user is typing "bit" inside the search form, and we want to offer possible search keywords starting with "bit". We send Redis a command like this:

ZRANGEBYLEX myindex "[bit" "[bit\xff"

Basically, we create a range using the string the user is typing right now as the start, and the same string plus a trailing byte set to 255, which is \xff in the example, as the end of the range. This way we get all the strings that start with the string the user is typing.

Note that we don't want too many items returned, so we may use the LIMIT option in order to reduce the number of results.
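The bounds and the LIMIT cap are easy to build programmatically. In this sketch, `completion_bounds` returns the two ZRANGEBYLEX arguments, and `complete` applies the same range to an in-memory list, comparing raw bytes as Redis does. Both helpers are hypothetical.

```ruby
# The two ZRANGEBYLEX arguments for a typed prefix: "[prefix" and "[prefix\xff".
def completion_bounds(prefix)
  start = "[".b + prefix.b
  stop = "[".b + prefix.b + "\xFF".b
  [start, stop]
end

# Like ZRANGEBYLEX myindex "[<prefix>" "[<prefix>\xff" LIMIT 0 <limit>.
def complete(members, prefix, limit)
  low = prefix.b
  high = prefix.b + "\xFF".b
  members.map(&:b).sort.select { |m| m >= low && m <= high }.first(limit)
end

terms = %w[banana bit bitcoin bitter bite biz]
suggestions = complete(terms, "bit", 2)
```

A 0xFF byte never appears in valid UTF-8 text, so for UTF-8 search terms the upper bound catches every string that starts with the prefix.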

Adding frequency into the mix

The above approach is a bit naive, because all the user searches are treated the same way. In a real system we want to complete strings according to their frequency: very popular searches will be proposed with a higher probability compared to search strings typed very rarely.

In order to implement something which depends on the frequency, and at the same time automatically adapts to future inputs, by purging searches that are no longer popular, we can use a very simple streaming algorithm.

To start, we modify our index in order to store not just the search term, but also the frequency the term is associated with. So instead of just adding banana we add banana:1, where 1 is the frequency.

ZADD myindex 0 banana:1

We also need logic in order to increment the index if the search term already exists in the index, so what we'll actually do is something like this:

ZRANGEBYLEX myindex "[banana:" + LIMIT 0 1
1) "banana:1"

This will return the single entry of banana if it exists. Then we can increment the associated frequency and send the following two commands:

ZREM myindex banana:1
ZADD myindex 0 banana:2

Note that because concurrent updates are possible, the above three commands should be sent via a Lua script instead, so that the script atomically gets the old count and re-adds the item with the incremented frequency.
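The read-modify-write logic that such a script performs can be sketched on an in-memory list of members. `bump_frequency` is a hypothetical helper. In Redis the same steps belong inside the Lua script for atomicity. The sketch checks the prefix of the first hit, because when the term is absent, `ZRANGEBYLEX "[banana:" + LIMIT 0 1` returns whatever entry comes next, such as `bananas:5`.

```ruby
# Hypothetical in-memory version of: look up term:N, remove it, add term:N+1.
def bump_frequency(index, term)
  prefix = "#{term}:"
  first = index.sort.find { |m| m >= prefix } # ZRANGEBYLEX "[term:" + LIMIT 0 1
  count = 0
  if first && first.start_with?(prefix)       # the hit may belong to another term
    count = first[prefix.size..-1].to_i
    index.delete(first)                       # ZREM myindex term:N
  end
  index << "#{prefix}#{count + 1}"            # ZADD myindex 0 term:N+1
  index
end

idx = ["bananas:5"]
bump_frequency(idx, "banana")
after_first = idx.sort
bump_frequency(idx, "banana")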

So the result will be that, every time a user searches for banana we'll get our entry updated.

There is more: our goal is to keep only the items that are searched very frequently. So we need some form of purging. When we actually query the index in order to complete the user input, we may see something like this:

ZRANGEBYLEX myindex "[banana:" + LIMIT 0 10
1) "banana:123"
2) "banaooo:1"
3) "banned user:49"
4) "banning:89"

Apparently nobody searches for "banaooo", for example, but the query was performed a single time, so we end up presenting it to the user.

Here is what we can do. Out of the returned items, we pick a random one, decrement its frequency by one, and re-add it with the new frequency. However, if the frequency reaches 0, we simply remove the item from the index. You can use much more advanced systems, but the idea is that in the long run the index will contain the top searches, and if the top searches change over time it will adapt automatically.

A refinement to this algorithm is to pick entries according to their weight: the higher the frequency, the less likely an entry is to be picked for decrementing or eviction.
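One decay step of the basic (unweighted) scheme might look like this sketch. `decay_one` is a hypothetical helper that works on an in-memory index. The random generator is injectable so that the choice can be seeded. `rpartition(":")` splits on the last colon, so terms that contain spaces or colons, such as "banned user", keep their full text.

```ruby
# One decay step: pick a random returned entry, decrement its frequency,
# and drop it from the index when the frequency would reach zero.
def decay_one(index, results, rng: Random.new)
  return index if results.empty?
  victim = results[rng.rand(results.size)]
  term, _sep, count = victim.rpartition(":")
  index.delete(victim)
  index << "#{term}:#{count.to_i - 1}" if count.to_i > 1
  index
end

idx = ["banana:123", "banaooo:1"]
decay_one(idx, ["banaooo:1"])        # frequency 1 -> removed
after_purge = idx.dup
decay_one(idx, ["banana:123"])       # frequency 123 -> 122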

Normalizing strings for case and accents

In the completion examples we always used lowercase strings. However reality is much more complex than that: languages have capitalized names, accents, and so forth.

One simple way to deal with this issue is to normalize the string the user searches for. Whether the user searches for "Banana", "BANANA" or "Ba'nana", we may always turn it into "banana".
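A possible normalization function is sketched below. This is one choice among many, assuming the index only needs ASCII letters, digits and spaces. It decomposes accented letters, drops the combining marks, folds case and strips everything else, including apostrophes.

```ruby
# One possible normalization: strip accents, fold case, drop other symbols.
def normalize(term)
  decomposed = term.unicode_normalize(:nfkd)   # "á" becomes "a" + combining accent
  unaccented = decomposed.gsub(/\p{Mn}/, "")   # drop the combining marks
  unaccented.downcase.gsub(/[^a-z0-9 ]/, "")   # fold case, drop symbols like "'"
end

%w[Banana BANANA Ba'nana Bánana].each { |t| puts normalize(t) } # prints "banana" four times
```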

However, sometimes we may want to present the user with the original item as typed, even if we normalize the string for indexing. To do this, we change the format of the index so that instead of just storing term:frequency we store normalized:frequency:original, as in the following example:

ZADD myindex 0 banana:273:Banana

Basically we add another field that we'll extract and use only for visualization. Ranges will always be computed using the normalized strings instead. This is a common trick which has multiple applications.
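Building and parsing such entries can be sketched as follows. The helper names are hypothetical, and the sketch assumes the normalized form never contains a colon. The original text may contain colons, which is why parsing splits into at most three fields.

```ruby
# Hypothetical helpers for the normalized:frequency:original layout.
def completion_entry(normalized, frequency, original)
  "#{normalized}:#{frequency}:#{original}"
end

# Split into at most three fields so colons inside `original` survive.
def parse_completion_entry(entry)
  normalized, frequency, original = entry.split(":", 3)
  { normalized: normalized, frequency: frequency.to_i, original: original }
end

p completion_entry("banana", 273, "Banana")                  # => "banana:273:Banana"
p parse_completion_entry("rezero:5:Re:Zero")[:original]      # => "Re:Zero"
```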

Adding auxiliary information in the index

When using a sorted set in a direct way, we have two different attributes for each object: the score, which we use as an index, and an associated value. When using lexicographical indexes instead, the score is always set to 0 and basically not used at all. We are left with a single string, which is the element itself.

Like we did in the previous completion examples, we are still able to store associated data using separators. For example we used the colon in order to add the frequency and the original word for completion.

In general we can add any kind of associated value to our indexing key. In order to use a lexicographical index to implement a simple key-value store we just store the entry as key:value:

ZADD myindex 0 mykey:myvalue

And search for the key with:

ZRANGEBYLEX myindex [mykey: + LIMIT 0 1
1) "mykey:myvalue"

Then we extract the part after the colon to retrieve the value. However, a problem to solve in this case is collisions: the colon character may be part of the key itself, so the separator must be chosen so that it never collides with the keys we add.

Since lexicographical ranges in Redis are binary safe, you can use any byte or any sequence of bytes as the separator. However, if you receive untrusted user input, it is better to use some form of escaping to guarantee that the separator can never be part of the key.

For example if you use two null bytes as separator "\0\0", you may want to always escape null bytes into two bytes sequences in your strings.
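One escaping scheme that satisfies this is sketched below. The `\x01` escape byte and the helper names are our own illustrative choices, not something Redis prescribes. Every null byte in the key is followed by `\x01`, so an escaped key never contains two consecutive null bytes and never ends with one. The first `"\0\0"` in an entry is therefore always the separator.

```ruby
SEPARATOR = "\0\0".b

# Every NUL in the key becomes NUL + 0x01, so the escaped key can never
# contain the two-byte separator or end with a NUL.
def escape_key(key)
  key.b.gsub("\0", "\0\x01")
end

def unescape_key(escaped)
  escaped.gsub("\0\x01", "\0")
end

def encode_entry(key, value)
  escape_key(key) + SEPARATOR + value.b
end

# The first separator ends the key; the value may contain anything.
def decode_entry(entry)
  escaped_key, value = entry.b.split(SEPARATOR, 2)
  [unescape_key(escaped_key), value]
end

entry = encode_entry("user\0input", "myvalue")
roundtrip_ok = decode_entry(entry) == ["user\0input".b, "myvalue".b]
p roundtrip_ok # => true
```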

Numerical padding

Lexicographical indexes may seem useful only when the problem at hand is indexing strings. Actually, it is very simple to use this kind of index to index arbitrary precision numbers.

In the ASCII character set, digits appear in the order from 0 to 9, so if we left-pad numbers with leading zeroes, the result is that comparing them as strings will order them by their numerical value.

ZADD myindex 0 00324823481:foo
ZADD myindex 0 12838349234:bar
ZADD myindex 0 00000000111:zap

ZRANGE myindex 0 -1
1) "00000000111:zap"
2) "00324823481:foo"
3) "12838349234:bar"

We effectively created an index using a numerical field that can be as big as we want. This also works with floating point numbers of any precision, by making sure we left-pad the integer part with leading zeroes and right-pad the fractional part with trailing zeroes, as in the following list of numbers:

    01000000000000.11000000000000
    01000000000000.02200000000000
    00000002121241.34893482930000
    00999999999999.00000000000000
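Both padding rules can be sketched in Ruby as follows. The widths (11 digits for integers, 14 + 14 for decimals) are assumptions that match the examples above, and they must be wide enough for every value you will ever index. Decimals are taken as strings to avoid binary floating-point surprises, and only non-negative values are handled.

```ruby
# Left-pads a non-negative integer so that string order equals numeric order.
def pad_integer(n, width: 11)
  digits = n.to_s
  if n.negative? || digits.size > width
    raise ArgumentError, "needs a non-negative integer of at most #{width} digits"
  end
  digits.rjust(width, "0")
end

# Same idea for non-negative decimals given as strings: left-pad the
# integer part and right-pad the fractional part.
def pad_decimal(str, int_digits: 14, frac_digits: 14)
  int_part, frac_part = str.split(".", 2)
  "#{int_part.rjust(int_digits, '0')}.#{(frac_part || '').ljust(frac_digits, '0')}"
end

p pad_integer(324823481)             # => "00324823481"
p pad_decimal("2121241.3489348293")  # => "00000002121241.34893482930000"
```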

Using numbers in binary form

Storing numbers in decimal may use too much memory. An alternative approach is just to store numbers, for example 128 bit integers, directly in their binary form. However for this to work, you need to store the numbers in big endian format, so that the most significant bytes are stored before the least significant bytes. This way when Redis compares the strings with memcmp(), it will effectively sort the numbers by their value.
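A sketch of the big-endian encoding for unsigned 128-bit integers follows. The helper names are hypothetical. It relies on Ruby's `Q>` pack directive (unsigned 64-bit, big-endian), applied to the high and low halves. Byte-wise comparison of the results follows numeric order, while little-endian bytes do not.

```ruby
MASK64 = (1 << 64) - 1

# Encodes an unsigned 128-bit integer as 16 big-endian bytes, so that a
# byte-wise comparison (like Redis' memcmp()) follows numeric order.
def to_be128(n)
  raise ArgumentError, "not an unsigned 128-bit integer" unless n.between?(0, (1 << 128) - 1)
  [n >> 64, n & MASK64].pack("Q>Q>")
end

def from_be128(bytes)
  high, low = bytes.unpack("Q>Q>")
  (high << 64) | low
end

nums = [256, 1, 2**100, 2**64]
sorted = nums.sort_by { |n| to_be128(n) }
p sorted == nums.sort                      # => true
# Little-endian bytes do not sort numerically: 256 sorts before 1.
p [1, 256].sort_by { |n| [n].pack("Q<") }  # => [256, 1]
```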

Keep in mind that data stored in binary format is less observable for debugging, and harder to parse and export. So it is definitely a trade-off.

Composite indexes

So far we explored ways to index single fields. However we all know that SQL stores are able to create indexes using multiple fields. For example I may index products in a very large store by room number and price.

I need to run queries in order to retrieve all the products in a given room having a given price range. What I can do is to index each product in the following way:

ZADD myindex 0 0056:0028.44:90
ZADD myindex 0 0034:0011.00:832

Here the fields are room:price:product_id. I used just four digits padding in the example for simplicity. The auxiliary data (the product ID) does not need any padding.

With an index like that, to get all the products in room 56 having a price between 10 and 30 dollars is very easy. We can just run the following command:

ZRANGEBYLEX myindex [0056:0010.00 [0056:0030.00

The above is called a composite index. Its effectiveness depends on the order of the fields and on the queries I want to run. For example, the above index cannot be used efficiently to get all the products in a specific price range regardless of the room number. However, I can use the leading field (the room number) to run queries regardless of the price, like "give me all the products in room 44".
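A sketch of building entries for the room:price:product_id index and running the room-plus-price query over an in-memory stand-in follows. The helper names are hypothetical, and the padding assumes rooms and prices below 10000. Note that a product priced exactly 30.00 is stored as "0056:0030.00:<id>", which sorts after the bare bound "0056:0030.00". The sketch therefore extends the upper bound with ":\xff" so that such products are included (with redis-cli this would be a quoted "[0056:0030.00:\xff").

```ruby
# Hypothetical helper for the room:price:product_id layout.
def composite_entry(room, price, product_id)
  format("%04d:%07.2f:%s", room, price, product_id)
end

# Members of `index` (a sorted Array standing in for the sorted set) in the
# given room whose price lies in [min_price, max_price], inclusive.
def products_in_room(index, room, min_price, max_price)
  min = format("%04d:%07.2f", room, min_price)
  # ":\xFF" makes the bound sort after every "room:max_price:id" member.
  max = format("%04d:%07.2f:", room, max_price).b + "\xFF".b
  index.select { |m| m.b >= min.b && m.b <= max }
end

index = [
  composite_entry(56, 28.44, 90),
  composite_entry(34, 11.00, 832),
  composite_entry(56, 30.00, 17),
  composite_entry(56, 9.99, 5)
].sort

p products_in_room(index, 56, 10, 30) # => ["0056:0028.44:90", "0056:0030.00:17"]
```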

Composite indexes are very powerful, and are used in traditional stores in order to optimize complex queries. In Redis they could be useful both to implement a very fast in-memory Redis index of something stored into a traditional data store, or in order to directly index Redis data.

Updating lexicographical indexes

The value stored in a lexicographical index can get pretty fancy, and hard or slow to rebuild from what we store about the object. So one approach to simplify handling the index, at the cost of using more memory, is to keep, alongside the sorted set representing the index, a hash mapping each object ID to its current index value.

So for example, when we index we also add to a hash:

MULTI
ZADD myindex 0 0056:0028.44:90
HSET index.content 90 0056:0028.44:90
EXEC

This is not always needed, but simplifies the operations of updating the index. In order to remove the old information we indexed for the object ID 90, regardless of the current fields values of the object, we just have to retrieve the hash value by object ID and ZREM it in the sorted set view.
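The update flow can be sketched in memory as follows. `reindex` is a hypothetical name, an Array stands in for the sorted set and a Hash stands in for index.content. In Redis, the hash lookup and the ZREM/ZADD/HSET writes would need to run in a Lua script or a WATCH/MULTI transaction so that concurrent updates cannot interleave.

```ruby
# In-memory sketch: `index` stands in for the myindex sorted set (a sorted
# Array of score-0 members) and `contents` for the index.content hash.
def reindex(index, contents, id, new_entry)
  old_entry = contents[id]                 # HGET index.content <id>
  index.delete(old_entry) if old_entry     # ZREM myindex <old entry>
  index << new_entry                       # ZADD myindex 0 <new entry>
  index.sort!
  contents[id] = new_entry                 # HSET index.content <id> <new entry>
end

index = []
contents = {}
reindex(index, contents, 90, "0056:0028.44:90")
reindex(index, contents, 90, "0056:0025.00:90") # the price of product 90 changed
p index # => ["0056:0025.00:90"]
```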

Representing and querying graphs using a hexastore

One cool thing about composite indexes is that they are handy in order to represent graphs, using a data structure which is called Hexastore.

The hexastore provides a representation for relations between objects, formed by a subject, a predicate and an object. A simple relation between objects could be:

antirez is-friend-of matteocollina

In order to represent this relation I can store the following element in my lexicographical index:

ZADD myindex 0 spo:antirez:is-friend-of:matteocollina

Note that I prefixed my item with the string spo. It means that the item represents a subject, predicate, object relation.

I can add five more entries for the same relation, but in different orders:

ZADD myindex 0 sop:antirez:matteocollina:is-friend-of
ZADD myindex 0 ops:matteocollina:is-friend-of:antirez
ZADD myindex 0 osp:matteocollina:antirez:is-friend-of
ZADD myindex 0 pso:is-friend-of:antirez:matteocollina
ZADD myindex 0 pos:is-friend-of:matteocollina:antirez
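Generating the six members for a triple is mechanical. The sketch below is a hypothetical helper that produces exactly the six entries listed above. It assumes field values never contain a colon; otherwise they would need escaping as discussed earlier.

```ruby
ORDERINGS = %w[spo sop ops osp pso pos].freeze

# Builds the six hexastore members for one subject-predicate-object triple.
def hexastore_entries(subject, predicate, object)
  fields = { "s" => subject, "p" => predicate, "o" => object }
  ORDERINGS.map do |ordering|
    [ordering, *ordering.chars.map { |c| fields.fetch(c) }].join(":")
  end
end

puts hexastore_entries("antirez", "is-friend-of", "matteocollina")
```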

Now things start to be interesting, and I can query the graph in many different ways. For example, who are all the people antirez is friends with?

ZRANGEBYLEX myindex "[spo:antirez:is-friend-of:" "[spo:antirez:is-friend-of:\xff"
1) "spo:antirez:is-friend-of:matteocollina"
2) "spo:antirez:is-friend-of:wonderwoman"
3) "spo:antirez:is-friend-of:spiderman"

Or, what are all the relationships antirez and matteocollina have where the first is the subject and the second is the object?

ZRANGEBYLEX myindex "[sop:antirez:matteocollina:" "[sop:antirez:matteocollina:\xff"
1) "sop:antirez:matteocollina:is-friend-of"
2) "sop:antirez:matteocollina:was-at-conference-with"
3) "sop:antirez:matteocollina:talked-with"

By combining different queries, I can ask fancy questions. For example: who are all my friends that like beer, live in Barcelona, and are also considered friends by matteocollina? To get this information I start with an spo query to find all the people I'm friends with. Then, for each result, I perform an spo query to check if they like beer, removing the ones for which I can't find this relation. I do it again to filter by city. Finally, I perform an ops query to find which people in the resulting list are considered friends by matteocollina.
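The chain of queries can be sketched against an in-memory list of members. The facts below are made-up sample data, and only the spo and ops orderings are stored because only those are queried. `lookup` filters by prefix, which for ASCII members returns the same members as the `"[prefix" "[prefix\xff"` range used above.

```ruby
# Made-up sample facts (subject, predicate, object).
FACTS = [
  %w[antirez is-friend-of matteocollina],
  %w[antirez is-friend-of wonderwoman],
  %w[antirez is-friend-of spiderman],
  %w[wonderwoman likes beer],
  %w[spiderman likes beer],
  %w[wonderwoman lives-in barcelona],
  %w[spiderman lives-in barcelona],
  %w[matteocollina is-friend-of spiderman]
].freeze

INDEX = FACTS.flat_map { |s, pred, o| ["spo:#{s}:#{pred}:#{o}", "ops:#{o}:#{pred}:#{s}"] }.sort

# Last field of every member starting with "<ordering>:<a>:<b>:".
def lookup(ordering, a, b)
  prefix = "#{ordering}:#{a}:#{b}:"
  INDEX.select { |m| m.start_with?(prefix) }.map { |m| m.delete_prefix(prefix) }
end

friends = lookup("spo", "antirez", "is-friend-of")
beer    = friends.select { |f| lookup("spo", f, "likes").include?("beer") }
local   = beer.select { |f| lookup("spo", f, "lives-in").include?("barcelona") }
answer  = local.select { |f| lookup("ops", f, "is-friend-of").include?("matteocollina") }
p answer # => ["spiderman"]
```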

Make sure to check Matteo Collina's slides about Levelgraph in order to better understand these ideas.

Multi dimensional indexes

A more complex type of index is one that allows you to perform queries where two or more variables are queried at the same time for specific ranges. For example, I may have a data set representing people's age and salary, and I want to retrieve all the people between 50 and 55 years old having a salary between 70000 and 85000.

This query may be performed with a multi column index, but this requires us to select the first variable and then scan the second, which means we may do a lot more work than needed. It is possible to perform these kinds of queries involving multiple variables using different data structures. For example, multi-dimensional trees such as k-d trees or r-trees are sometimes used. Here we'll describe a different way to index data into multiple dimensions, using a representation trick that allows us to perform the query in a very efficient way using Redis lexicographical ranges.

Let's start by visualizing the problem. In this picture we have points in space, which represent our data samples, where x and y are our coordinates. The maximum value of both variables is 400.

The blue box in the picture represents our query. We want all the points where x is between 50 and 100, and where y is between 100 and 300.

[Figure: Points in the space]

In order to represent data that makes these kinds of queries fast to perform, we start by padding our numbers with 0. So for example imagine we want to add the point 10,25 (x,y) to our index. Given that the maximum range in the example is 400 we can just pad to three digits, so we obtain:

x = 010
y = 025

Now what we do is to interleave the digits, taking the leftmost digit in x, and the leftmost digit in y, and so forth, in order to create a single number:

001205

This is our index. However, to more easily reconstruct the original representation, we may also (at the cost of space) add the original values as additional columns:

001205:10:25

Now, let's reason about this representation and why it is useful in the context of range queries. For example, let's take the center of our blue box, which is at x=75 and y=200. We can encode this point as we did earlier by interleaving the digits, obtaining:

027050

What happens if we substitute the last two digits respectively with 00 and 99? We obtain a range which is lexicographically continuous:

027000 to 027099

This maps to a square representing all values where the x variable is between 70 and 79, and the y variable is between 200 and 209. We can draw random points in this interval to identify this specific area:

[Figure: Small area]

So the above lexicographic query allows us to easily query for points in a specific square in the picture. However the square may be too small for the box we are searching, so that too many queries are needed. So we can do the same but instead of replacing the last two digits with 00 and 99, we can do it for the last four digits, obtaining the following range:

020000 to 029999

This time the range represents all the points where x is between 0 and 99 and y is between 200 and 299. Drawing random points in this interval shows us this larger area:

[Figure: Large area]
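The decimal interleaving and the effect of replacing trailing digits can be checked with a small sketch. The helper names are hypothetical and values are assumed to fit in three digits, as in the example. Decoding both ends of a range recovers the corner points of the box it covers.

```ruby
# Interleaves two numbers padded to `digits` places: x's first digit,
# y's first digit, x's second digit, and so on.
def interleave_decimal(x, y, digits: 3)
  xs = x.to_s.rjust(digits, "0").chars
  ys = y.to_s.rjust(digits, "0").chars
  xs.zip(ys).flatten.join
end

# Splits an interleaved string back into its x and y values.
def deinterleave_decimal(s)
  pairs = s.chars.each_slice(2).to_a
  [pairs.map(&:first).join.to_i, pairs.map(&:last).join.to_i]
end

# Replaces the last `n` digits (n >= 1) with 0s and 9s and decodes both ends.
def box_for(x, y, n)
  s = interleave_decimal(x, y)
  low  = s[0...-n] + "0" * n
  high = s[0...-n] + "9" * n
  [low, high, deinterleave_decimal(low), deinterleave_decimal(high)]
end

p interleave_decimal(10, 25)  # => "001205"
p box_for(75, 200, 2)         # => ["027000", "027099", [70, 200], [79, 209]]
p box_for(75, 200, 4)         # => ["020000", "029999", [0, 200], [99, 299]]
```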

Oops, now our area is way too big for our query, and our search box is still not completely included. We need more granularity, and we can easily obtain it by representing our numbers in binary form. This time, when we replace digits, instead of getting squares that are ten times bigger, we get squares that are just two times bigger.

Our numbers in binary form, assuming we need just 9 bits for each variable (in order to represent numbers up to 400 in value) would be:

x = 75  -> 001001011
y = 200 -> 011001000

So by interleaving digits, our representation in the index would be:

000111000011001010:75:200

Let's see what our ranges are as we substitute the last 2, 4, 6, 8, ... bits with 0s and 1s in the interleaved representation:

2 bits: x between 74 and 75, y between 200 and 201 (range=2)
4 bits: x between 72 and 75, y between 200 and 203 (range=4)
6 bits: x between 72 and 79, y between 200 and 207 (range=8)
8 bits: x between 64 and 79, y between 192 and 207 (range=16)

And so forth. Now we have definitely better granularity! As you can see substituting N bits from the index gives us search boxes of side 2^(N/2).
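The binary version can be checked the same way. The sketch below uses hypothetical helper names and assumes 9 bits per variable, as in the text. It reproduces the interleaved string for 75,200 and the boxes in the list above.

```ruby
BITS = 9 # enough for values up to 511, as in the example

def interleave_bits(x, y)
  xs = x.to_s(2).rjust(BITS, "0").chars
  ys = y.to_s(2).rjust(BITS, "0").chars
  xs.zip(ys).flatten.join
end

def deinterleave_bits(s)
  pairs = s.chars.each_slice(2).to_a
  [pairs.map(&:first).join.to_i(2), pairs.map(&:last).join.to_i(2)]
end

# x and y ranges covered when the last `n` interleaved bits become 0s and 1s.
def bit_box(x, y, n)
  s = interleave_bits(x, y)
  low  = deinterleave_bits(s[0...-n] + "0" * n)
  high = deinterleave_bits(s[0...-n] + "1" * n)
  [low[0]..high[0], low[1]..high[1]]
end

p interleave_bits(75, 200) # => "000111000011001010"
[2, 4, 6, 8].each do |n|
  xr, yr = bit_box(75, 200, n)
  puts "#{n} bits: x #{xr}, y #{yr}"
end
# prints:
# 2 bits: x 74..75, y 200..201
# 4 bits: x 72..75, y 200..203
# 6 bits: x 72..79, y 200..207
# 8 bits: x 64..79, y 192..207
```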

So what we do is check the dimension in which our search box is smaller, and find the nearest power of two to this number. Our search box was 50,100 to 100,300, so it has a width of 50 and a height of 200. We take the smaller of the two, 50, and find the nearest power of two, which is 64. 64 is 2^6, so we would work with indexes obtained by replacing the last 12 bits of the interleaved representation (so that we end up replacing just 6 bits of each variable).

However, a single square may not cover our whole search box, so we may need more. What we do is start with the bottom-left corner of our search box, which is 50,100, and find the first range by substituting the last 6 bits in each number with 0. Then we do the same with the top-right corner.

With two trivial nested for loops where we increment only the significant bits, we can find all the squares between these two. For each square we convert the two numbers into our interleaved representation, and create the range using the converted representation as the start, and the same representation with the last 12 bits set to 1 as the end.

For each square found we perform our query and get the elements inside, removing the elements which are outside our search box.

Turning this into code is simple. Here is a Ruby example:

def spacequery(x0,y0,x1,y1,exp)
    bits=exp*2
    x_start = x0/(2**exp)
    x_end = x1/(2**exp)
    y_start = y0/(2**exp)
    y_end = y1/(2**exp)
    (x_start..x_end).each{|x|
        (y_start..y_end).each{|y|
            x_range_start = x*(2**exp)
            x_range_end = x_range_start | ((2**exp)-1)
            y_range_start = y*(2**exp)
            y_range_end = y_range_start | ((2**exp)-1)
            puts "#{x},#{y} x from #{x_range_start} to #{x_range_end}, y from #{y_range_start} to #{y_range_end}"

            # Turn it into interleaved form for ZRANGEBYLEX query.
            # We assume we need 9 bits for each integer, so the final
            # interleaved representation will be 18 bits.
            xbin = x_range_start.to_s(2).rjust(9,'0')
            ybin = y_range_start.to_s(2).rjust(9,'0')
            s = xbin.split("").zip(ybin.split("")).flatten.compact.join("")
            # Now that we have the start of the range, calculate the end
            # by replacing the specified number of bits from 0 to 1.
            e = s[0..-(bits+1)]+("1"*bits)
            puts "ZRANGEBYLEX myindex [#{s} [#{e}"
        }
    }
end

spacequery(50,100,100,300,6)
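The code above prints the queries but not the last step, which discards elements that fall inside a square but outside the search box. That filtering step can be sketched as follows. `inside_box` is a hypothetical name, and members are assumed to use the "interleaved:x:y" layout shown earlier.

```ruby
# Keeps only the members whose x,y columns fall inside the query box,
# since the squares returned by ZRANGEBYLEX can overshoot it.
def inside_box(members, x0, y0, x1, y1)
  members.select do |member|
    _, x, y = member.split(":")
    x.to_i.between?(x0, x1) && y.to_i.between?(y0, y1)
  end
end

candidates = ["000111000011001010:75:200", "000001110111000000:40:120"]
p inside_box(candidates, 50, 100, 100, 300) # => ["000111000011001010:75:200"]
```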

While not immediately trivial, this is a very useful indexing strategy that may be implemented natively in Redis in the future. For now, the good thing is that the complexity can easily be encapsulated inside a library that performs indexing and queries. One example of such a library is Redimension, a proof of concept Ruby library that indexes N-dimensional data inside Redis using the technique described here.

Multi dimensional indexes with negative or floating point numbers

The simplest way to represent negative values is to work with unsigned integers and represent them using an offset: when you index, before translating numbers into the indexed representation, you add the absolute value of your smallest (most negative) integer.

For floating point numbers, the simplest approach is probably to convert them to integers by multiplying them by a power of ten whose exponent equals the number of digits after the decimal point you want to retain.
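Both conversions combined can be sketched as follows. `to_index_integer` is a hypothetical name. `min_value` (the smallest value that will ever be indexed) and `decimals` are assumptions you choose per data set. Exact rationals are used for the arithmetic so that rounding happens only once.

```ruby
# Maps a possibly negative decimal value to a non-negative integer suitable
# for the interleaved index: shift by the offset, then scale by 10**decimals.
def to_index_integer(value, min_value:, decimals:)
  scaled = (value.to_r - min_value.to_r) * 10**decimals
  raise ArgumentError, "value below min_value" if scaled.negative?
  scaled.round
end

p to_index_integer(-3.25, min_value: -10, decimals: 2) # => 675
```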

Non range indexes

So far we checked indexes which are useful to query by range or by single item. However, other Redis data structures such as Sets or Lists can be used to build other kinds of indexes. They are very commonly used, but maybe we don't always realize they are actually a form of indexing.

For instance I can index object IDs into a Set data type in order to use the get random elements operation via SRANDMEMBER in order to retrieve a set of random objects. Sets can also be used to check for existence when all I need is to test if a given item exists or not or has a single boolean property or not.

Similarly lists can be used in order to index items into a fixed order. I can add all my items into a Redis list and rotate the list with RPOPLPUSH using the same key name as source and destination. This is useful when I want to process a given set of items again and again forever in the same order. Think of an RSS feed system that needs to refresh the local copy periodically.

Another popular index often used with Redis is a capped list, where items are added with LPUSH and trimmed with LTRIM, in order to create a view with just the latest N items encountered, in the same order they were seen.
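The two list idioms can be mimicked with a Ruby Array whose index 0 is the list head. This is an in-memory analogy only, and the helper names are hypothetical. Rotating moves the tail to the head, like RPOPLPUSH with the same source and destination. Pushing and trimming keeps only the newest N items, like LPUSH followed by LTRIM key 0 N-1.

```ruby
# RPOPLPUSH key key: move the tail element to the head and return it.
def rotate_list!(list)
  item = list.pop
  list.unshift(item) unless item.nil?
  item
end

# LPUSH key item, then LTRIM key 0 n-1: keep only the newest n items.
def capped_push!(list, item, n)
  list.unshift(item)
  list.pop(list.size - n) if list.size > n
  list
end

feeds = %w[feed-a feed-b feed-c]
p rotate_list!(feeds) # => "feed-c"
p feeds               # => ["feed-c", "feed-a", "feed-b"]

latest = []
%w[a b c d].each { |e| capped_push!(latest, e, 3) }
p latest              # => ["d", "c", "b"]
```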

Index inconsistency

Keeping the index updated may be challenging: over the course of months or years, inconsistencies may be introduced because of software bugs, network partitions or other events.

Different strategies could be used. If the index data is outside Redis, read repair can be a solution, where data is fixed lazily when it is requested. When we index data that is stored in Redis itself, the SCAN family of commands can be used to verify, update or rebuild the index from scratch, incrementally.
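An incremental repair pass might look like the sketch below, which is an in-memory analogy under our own assumptions. Objects are walked in small batches, the way a SCAN cursor loop would. The expected room:price:id entry is recomputed and compared with the id-to-entry hash described in the section on updating lexicographical indexes. `repair_index` is hypothetical, and it does not remove entries for objects that no longer exist.

```ruby
# Walks `objects` (id => {room:, price:}) in batches, recomputes each index
# entry and fixes both the index Array and the id => entry Hash.
def repair_index(objects, index, contents, batch_size: 2)
  objects.each_slice(batch_size) do |batch|   # one SCAN-like step per batch
    batch.each do |id, fields|
      expected = format("%04d:%07.2f:%s", fields[:room], fields[:price], id)
      next if contents[id] == expected
      index.delete(contents[id]) if contents[id]
      index << expected
      contents[id] = expected
    end
  end
  index.sort!
end

objects  = { 90 => { room: 56, price: 25.0 }, 832 => { room: 34, price: 11.0 } }
index    = ["0034:0011.00:832", "0056:0028.44:90"]                # entry for 90 is stale
contents = { 832 => "0034:0011.00:832", 90 => "0056:0028.44:90" }
repair_index(objects, index, contents)
p index # => ["0034:0011.00:832", "0056:0025.00:90"]
```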

12.4 - Redis patterns example

Learn several Redis patterns by building a Twitter clone

This article describes the design and implementation of a very simple Twitter clone written using PHP with Redis as the only database. The programming community has traditionally considered key-value stores as a special purpose database that couldn't be used as a drop-in replacement for a relational database for the development of web applications. This article will try to show that Redis data structures on top of a key-value layer are an effective data model to implement many kinds of applications.

Note: the original version of this article was written in 2009 when Redis was released. It was not exactly clear at that time that the Redis data model was suitable to write entire applications. Now after 5 years there are many cases of applications using Redis as their main store, so the goal of the article today is to be a tutorial for Redis newcomers. You'll learn how to design a simple data layout using Redis, and how to apply different data structures.

Our Twitter clone, called Retwis, is structurally simple, has very good performance, and can be distributed among any number of web and Redis servers with little effort. You can find the source code here.

I used PHP for the example since it can be read by everybody. The same (or better) results can be obtained using Ruby, Python, Erlang, and so on. A few clones exist (however not all the clones use the same data layout as the current version of this tutorial, so please, stick with the official PHP implementation for the sake of following the article better).

  • Retwis-RB is a port of Retwis to Ruby and Sinatra written by Daniel Lucraft! Full source code is included of course, and a link to its Git repository appears in the footer of this article. The rest of this article targets PHP, but Ruby programmers can also check the Retwis-RB source code since it's conceptually very similar.
  • Retwis-J is a port of Retwis to Java, using the Spring Data Framework, written by Costin Leau. Its source code can be found on GitHub, and there is comprehensive documentation available at springsource.org.

What is a key-value store?

The essence of a key-value store is the ability to store some data, called a value, inside a key. The value can be retrieved later only if we know the specific key it was stored in. There is no direct way to search for a key by value. In some sense, it is like a very large hash/dictionary, but it is persistent, i.e. when your application ends, the data doesn't go away. So, for example, I can use the command SET to store the value bar in the key foo:

SET foo bar

Redis stores data permanently, so if I later ask "What is the value stored in key foo?" Redis will reply with bar:

GET foo => bar

Other common operations provided by key-value stores are DEL, to delete a given key and its associated value, SET-if-not-exists (called SETNX on Redis), to assign a value to a key only if the key does not already exist, and INCR, to atomically increment a number stored in a given key:

SET foo 10
INCR foo => 11
INCR foo => 12
INCR foo => 13

Atomic operations

There is something special about INCR. You may wonder why Redis provides such an operation if we can do it ourselves with a bit of code? After all, it is as simple as:

x = GET foo
x = x + 1
SET foo x

The problem is that incrementing this way will work as long as there is only one client working with the key foo at one time. See what happens if two clients are accessing this key at the same time:

x = GET foo (yields 10)
y = GET foo (yields 10)
x = x + 1 (x is now 11)
y = y + 1 (y is now 11)
SET foo x (foo is now 11)
SET foo y (foo is now 11)

Something is wrong! We incremented the value two times, but instead of going from 10 to 12, our key holds 11. This is because the increment done with GET / increment / SET is not an atomic operation. Instead, the INCR commands provided by Redis, Memcached, ..., are atomic implementations, and the server will take care of protecting the key during the time needed to complete the increment in order to prevent simultaneous accesses.
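As an in-process analogy (this is not Redis code), the sketch below makes the read, add and write of an increment a single indivisible step by guarding it with a mutex. This is roughly the guarantee a single INCR command gives. `Counter` is a hypothetical class, and two threads doing 1000 increments each always end at the expected total.

```ruby
# The "server" serializes increments so no update can be lost.
class Counter
  attr_reader :value

  def initialize(value = 0)
    @value = value
    @lock = Mutex.new
  end

  # Read, add and write happen as one indivisible operation.
  def incr
    @lock.synchronize { @value += 1 }
  end
end

counter = Counter.new(10)
threads = Array.new(2) { Thread.new { 1000.times { counter.incr } } }
threads.each(&:join)
p counter.value # => 2010
```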

What makes Redis different from other key-value stores is that it provides other operations similar to INCR that can be used to model complex problems. This is why you can use Redis to write whole web applications without using another database like an SQL database, and without going crazy.

Beyond key-value stores: lists

In this section we will see which Redis features we need to build our Twitter clone. The first thing to know is that Redis values can be more than strings. Redis supports Lists, Sets, Hashes, Sorted Sets, Bitmaps, and HyperLogLog types as values, and there are atomic operations to operate on them so we are safe even with multiple accesses to the same key. Let's start with Lists:

LPUSH mylist a (now mylist holds 'a')
LPUSH mylist b (now mylist holds 'b','a')
LPUSH mylist c (now mylist holds 'c','b','a')

LPUSH means Left Push, that is, add an element to the left (or to the head) of the list stored in mylist. If the key mylist does not exist it is automatically created as an empty list before the PUSH operation. As you can imagine, there is also an RPUSH operation that adds the element to the right of the list (on the tail). This is very useful for our Twitter clone. User updates can be added to a list stored in username:updates, for instance.

There are operations to get data from Lists, of course. For instance, LRANGE returns a range from the list, or the whole list.

LRANGE mylist 0 1 => c,b

LRANGE uses zero-based indexes - that is the first element is 0, the second 1, and so on. The command arguments are LRANGE key first-index last-index. The last-index argument can be negative, with a special meaning: -1 is the last element of the list, -2 the penultimate, and so on. So, to get the whole list use:

LRANGE mylist 0 -1 => c,b,a

Other important operations are LLEN that returns the number of elements in the list, and LTRIM that is like LRANGE but instead of returning the specified range trims the list, so it is like Get range from mylist, Set this range as new value but does so atomically.
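LRANGE's index rules can be sketched on a Ruby Array whose index 0 is the list head. Both ends are inclusive, negative indexes count from the tail, and out-of-range indexes are clamped instead of raising an error. `lrange` here is a hypothetical local helper, not a client library call.

```ruby
# Mimics LRANGE key start stop on an Array.
def lrange(list, start, stop)
  len = list.size
  start += len if start.negative?
  stop  += len if stop.negative?
  start = 0 if start.negative?
  stop = len - 1 if stop >= len
  return [] if start > stop
  list[start..stop]
end

mylist = %w[c b a]
p lrange(mylist, 0, 1)   # => ["c", "b"]
p lrange(mylist, 0, -1)  # => ["c", "b", "a"]
```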

The Set data type

Currently we don't use the Set type in this tutorial, but since we use Sorted Sets, which are kind of a more capable version of Sets, it is better to start introducing Sets first (which are a very useful data structure per se), and later Sorted Sets.

There are more data types than just Lists. Redis also supports Sets, which are unsorted collections of elements. It is possible to add, remove, and test for existence of members, and perform the intersection between different Sets. Of course it is possible to get the elements of a Set. Some examples will make it more clear. Keep in mind that SADD is the add to set operation, SREM is the remove from set operation, SISMEMBER is the test if member operation, and SINTER is the perform intersection operation. Other operations are SCARD to get the cardinality (the number of elements) of a Set, and SMEMBERS to return all the members of a Set.

SADD myset a
SADD myset b
SADD myset foo
SADD myset bar
SCARD myset => 4
SMEMBERS myset => bar,a,foo,b

Note that SMEMBERS does not return the elements in the same order we added them since Sets are unsorted collections of elements. When you want to store in order it is better to use Lists instead. Some more operations against Sets:

SADD mynewset b
SADD mynewset foo
SADD mynewset hello
SINTER myset mynewset => foo,b

SINTER can return the intersection between Sets but it is not limited to two Sets. You may ask for the intersection of 4,5, or 10000 Sets. Finally let's check how SISMEMBER works:

SISMEMBER myset foo => 1
SISMEMBER myset notamember => 0

The Sorted Set data type

Sorted Sets are similar to Sets: collections of elements. However, in Sorted Sets each element is associated with a floating point value, called the element score. Because of the score, elements inside a Sorted Set are ordered, since we can always compare two elements by score (and if the score happens to be the same, we compare the two elements as strings).

Like Sets, Sorted Sets cannot contain repeated elements: every element is unique. However, it is possible to update an element's score.
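The ordering rule can be sketched in memory with a Ruby Hash of member => score. The Hash keys also give the uniqueness guarantee, and `zrange_all` is a hypothetical helper. Elements sort by score first and, on equal scores, by the member string.

```ruby
# Returns all members ordered by score, then by member string.
def zrange_all(zset)
  zset.sort_by { |member, score| [score, member] }.map(&:first)
end

zset = { "a" => 10, "b" => 5, "c" => 12.55, "x" => 5 }
p zrange_all(zset) # => ["b", "x", "a", "c"]
```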

Sorted Set commands are prefixed with Z. The following is an example of Sorted Sets usage:

ZADD zset 10 a
ZADD zset 5 b
ZADD zset 12.55 c
ZRANGE zset 0 -1 => b,a,c

In the above example we added a few elements with ZADD, and later retrieved the elements with ZRANGE. As you can see the elements are returned in order according to their score. In order to check if a given element exists, and also to retrieve its score if it exists, we use the ZSCORE command:

ZSCORE zset a => 10
ZSCORE zset non_existing_element => NULL

Sorted Sets are a very powerful data structure, you can query elements by score range, lexicographically, in reverse order, and so forth. To know more please check the Sorted Set sections in the official Redis commands documentation.

The Hash data type

This is the last data structure we use in our program, and it is extremely easy to grasp since there is an equivalent in almost every programming language out there: Hashes. Redis Hashes are basically like Ruby or Python hashes, a collection of fields associated with values:

HMSET myuser name Salvatore surname Sanfilippo country Italy
HGET myuser surname => Sanfilippo

HMSET can be used to set fields in the hash, that can be retrieved with HGET later. It is possible to check if a field exists with HEXISTS, or to increment a hash field with HINCRBY and so forth.

Hashes are the ideal data structure to represent objects. For example we use Hashes in order to represent Users and Updates in our Twitter clone.

Okay, we just exposed the basics of the Redis main data structures, we are ready to start coding!

Prerequisites

If you haven't downloaded the Retwis source code already please grab it now. It contains a few PHP files, and also a copy of Predis, the PHP client library we use in this example.

Another thing you probably want is a working Redis server. Just get the source, build with make, run with ./redis-server, and you're ready to go. No configuration is required at all in order to play with or run Retwis on your computer.

Data layout

When working with a relational database, a database schema must be designed so that we'd know the tables, indexes, and so on that the database will contain. We don't have tables in Redis, so what do we need to design? We need to identify what keys are needed to represent our objects and what kind of values these keys need to hold.

Let's start with Users. We need to represent users, of course, with their username, userid, password, the set of users following a given user, the set of users a given user follows, and so on. The first question is, how should we identify a user? Like in a relational DB, a good solution is to identify different users with different numbers, so we can associate a unique ID with every user. Every other reference to this user will be done by id. Creating unique IDs is very simple to do by using our atomic INCR operation. When we create a new user we can do something like this, assuming the user is called "antirez":

INCR next_user_id => 1000
HMSET user:1000 username antirez password p1pp0

Note: you should use a hashed password in a real application, for simplicity we store the password in clear text.

We use the next_user_id key in order to always get a unique ID for every new user. Then we use this unique ID to name the key holding a Hash with the user's data. This is a common design pattern with key-value stores! Keep it in mind. Besides the fields already defined, we need some more stuff in order to fully define a User. For example, sometimes it can be useful to be able to get the user ID from the username, so every time we add a user, we also populate the users key, which is a Hash, with the username as field and its ID as value.

HSET users antirez 1000

This may appear strange at first, but remember that we are only able to access data in a direct way, without secondary indexes. It's not possible to tell Redis to return the key that holds a specific value. This is also our strength. This new paradigm is forcing us to organize data so that everything is accessible by primary key, speaking in relational DB terms.
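The registration layout can be sketched in Ruby with a plain Hash standing in for the Redis keyspace. `create_user` is a hypothetical helper, not Retwis code. The counter starts so that the first ID is 1000, only to match the example (INCR on a missing key would return 1). The sketch ignores concurrency, and, as noted above, a real application would store a password hash rather than the clear-text password.

```ruby
# In-memory sketch of the user keys: next_user_id, user:<id>, users.
def create_user(db, username, password)
  return nil if db.fetch("users", {}).key?(username)             # HGET users <name>
  id = db["next_user_id"] = db.fetch("next_user_id", 999) + 1    # INCR next_user_id
  db["user:#{id}"] = { "username" => username, "password" => password } # HSET user:<id> ...
  (db["users"] ||= {})[username] = id                            # HSET users <name> <id>
  id
end

db = {}
p create_user(db, "antirez", "p1pp0") # => 1000
p db["users"]["antirez"]              # => 1000
```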

Followers, following, and updates

There is another central need in our system. A user might have users who follow them, which we'll call their followers. A user might follow other users, which we'll call their following. We have a perfect data structure for this. That is... Sets. The uniqueness of Set elements, and the fact that we can test for existence in constant time, are two interesting features. However, what about also remembering the time at which a given user started following another one? In an enhanced version of our simple Twitter clone this may be useful, so instead of using a simple Set, we use a Sorted Set, using the user ID of the following or follower user as element, and the Unix time at which the relation between the users was created as our score.

So let's define our keys:

followers:1000 => Sorted Set of uids of all the followers users
following:1000 => Sorted Set of uids of all the following users

We can add new followers with:

ZADD followers:1000 1401267618 1234 => Add user 1234 with time 1401267618

Another important thing we need is a place where we can add the updates to display in the user's home page. We'll need to access this data in chronological order later, from the most recent update to the oldest, so the perfect kind of data structure for this is a List. Basically every new update will be LPUSHed in the user updates key, and thanks to LRANGE, we can implement pagination and so on. Note that we use the words updates and posts interchangeably, since updates are actually "little posts" in some way.

posts:1000 => a List of post ids - every new post is LPUSHed here.

This list is basically the user's timeline. We'll push the IDs of the user's own posts, and the IDs of all the posts created by the users they follow. Basically, we'll implement a write fanout.
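
To make the write fanout concrete, here is a minimal in-memory sketch in C. It is an illustration, not Retwis code: timeline_t stands in for a posts:<uid> List, users are hypothetical small array indexes rather than real user IDs, and MAX_TIMELINE is an arbitrary cap chosen only for the sketch. Each new post ID is prepended (like LPUSH) to every follower's timeline and to the author's own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TIMELINE 16 /* arbitrary cap for this sketch only */

/* In-memory stand-in for a posts:<uid> List. */
typedef struct {
    long ids[MAX_TIMELINE];
    size_t len;
} timeline_t;

/* Prepends post_id like LPUSH. If the array is full the oldest ID is dropped,
   which is a limitation of this sketch: real Redis Lists are not capped. */
static void timeline_lpush(timeline_t *t, long post_id) {
    size_t keep = t->len < MAX_TIMELINE ? t->len : MAX_TIMELINE - 1;
    memmove(&t->ids[1], &t->ids[0], keep * sizeof t->ids[0]);
    t->ids[0] = post_id;
    t->len = keep + 1;
}

/* Write fanout: push the new post ID to every follower and to the author. */
static void fanout_post(timeline_t *timelines, const size_t *followers,
                        size_t n_followers, size_t author, long post_id) {
    for (size_t i = 0; i < n_followers; i++)
        timeline_lpush(&timelines[followers[i]], post_id);
    timeline_lpush(&timelines[author], post_id); /* authors see their own posts */
}
```

The cost of a post grows with the number of followers, but reading a timeline later is a single range read over an already-built list.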

Authentication

OK, we have more or less everything about the user except for authentication. We'll handle authentication in a simple but robust way: we don't want to use PHP sessions, as our system must be ready to be distributed among different web servers easily, so we'll keep the whole state in our Redis database. All we need is a random unguessable string to set as the cookie of an authenticated user, and a key that will contain the user ID of the client holding the string.

We need two things in order to make this thing work in a robust way. First: the current authentication secret (the random unguessable string) should be part of the User object, so when the user is created we also set an auth field in its Hash:

HSET user:1000 auth fea5e81ac8ca77622bed1c2132a021f9

Moreover, we need a way to map authentication secrets to user IDs, so we also use an auths key, which holds a Hash mapping authentication secrets to user IDs.

HSET auths fea5e81ac8ca77622bed1c2132a021f9 1000

In order to authenticate a user we'll do these simple steps (see the login.php file in the Retwis source code):

  • Get the username and password via the login form.
  • Check if the username field actually exists in the users Hash.
  • If it exists, we have the user ID (e.g. 1000).
  • Check if the password stored in user:1000 matches; if not, return an error message.
  • Ok authenticated! Set "fea5e81ac8ca77622bed1c2132a021f9" (the value of user:1000 auth field) as the "auth" cookie.

This is the actual code:

include("retwis.php");

# Form sanity checks
if (!gt("username") || !gt("password"))
    goback("You need to enter both username and password to login.");

# The form is ok, check that the username exists and the password matches
$username = gt("username");
$password = gt("password");
$r = redisLink();
$userid = $r->hget("users",$username);
if (!$userid)
    goback("Wrong username or password");
$realpassword = $r->hget("user:$userid","password");
if ($realpassword != $password)
    goback("Wrong username or password");

# Username / password OK, set the cookie and redirect to index.php
$authsecret = $r->hget("user:$userid","auth");
setcookie("auth",$authsecret,time()+3600*24*365);
header("Location: index.php");

This happens every time a user logs in, but we also need a function isLoggedIn in order to check if a given user is already authenticated or not. These are the logical steps performed by the isLoggedIn function:

  • Get the "auth" cookie from the user. If there is no cookie, the user is not logged in, of course. Let's call the value of the cookie <authcookie>.
  • Check if <authcookie> field in the auths Hash exists, and what the value (the user ID) is (1000 in the example).
  • In order for the system to be more robust, also verify that the user:1000 auth field matches.
  • OK the user is authenticated, and we loaded a bit of information in the $User global variable.
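
The decision at the heart of these steps can be isolated in a small C function. This is a hedged sketch: auth_is_valid is a hypothetical name, and it assumes the two HGET lookups (auths to get the user ID, then user:<id> to get the auth field) have already been performed, with NULL standing for a missing cookie or field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* cookie:       value of the "auth" cookie, or NULL if absent.
   auths_userid: result of HGET auths <cookie>, or NULL if the field is missing.
   user_auth:    result of HGET user:<id> auth for that user ID, or NULL.
   The user is authenticated only if both lookups agree on the secret. */
static bool auth_is_valid(const char *cookie, const char *auths_userid,
                          const char *user_auth) {
    if (cookie == NULL || auths_userid == NULL || user_auth == NULL)
        return false;
    /* The user hash holds the authoritative secret; auths is only an index. */
    return strcmp(user_auth, cookie) == 0;
}
```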

The code is simpler than the description, possibly:

function isLoggedIn() {
    global $User, $_COOKIE;

    if (isset($User)) return true;

    if (isset($_COOKIE['auth'])) {
        $r = redisLink();
        $authcookie = $_COOKIE['auth'];
        if ($userid = $r->hget("auths",$authcookie)) {
            if ($r->hget("user:$userid","auth") != $authcookie) return false;
            loadUserInfo($userid);
            return true;
        }
    }
    return false;
}

function loadUserInfo($userid) {
    global $User;

    $r = redisLink();
    $User['id'] = $userid;
    $User['username'] = $r->hget("user:$userid","username");
    return true;
}

Having loadUserInfo as a separate function is overkill for our application, but it's a good approach in a complex application. The only thing that's missing from all the authentication is the logout. What do we do on logout? That's simple, we'll just change the random string in user:1000 auth field, remove the old authentication secret from the auths Hash, and add the new one.

Important: the logout procedure explains why we don't just authenticate the user after looking up the authentication secret in the auths Hash, but double check it against the user:1000 auth field. The true authentication string is the latter, while the auths Hash is just an authentication field that may even be volatile, or, if there are bugs in the program or a script gets interrupted, we may even end up with multiple entries in the auths key pointing to the same user ID. The logout code is the following (logout.php):

include("retwis.php");

if (!isLoggedIn()) {
    header("Location: index.php");
    exit;
}

$r = redisLink();
$newauthsecret = getrand();
$userid = $User['id'];
$oldauthsecret = $r->hget("user:$userid","auth");

$r->hset("user:$userid","auth",$newauthsecret);
$r->hset("auths",$newauthsecret,$userid);
$r->hdel("auths",$oldauthsecret);

header("Location: index.php");

That is just what we described and should be simple to understand.

Updates

Updates, also known as posts, are even simpler. In order to create a new post in the database we do something like this:

INCR next_post_id => 10343
HMSET post:10343 user_id $owner_id time $time body "I'm having fun with Retwis"

As you can see, each post is just represented by a Hash with three fields: the ID of the user owning the post, the time at which the post was published, and finally the body of the post, that is, the actual status message.

After we create a post and we obtain the post ID, we need to LPUSH the ID in the timeline of every user that is following the author of the post, and of course in the list of posts of the author itself (everybody is virtually following herself/himself). This is the file post.php that shows how this is performed:

include("retwis.php");

if (!isLoggedIn() || !gt("status")) {
    header("Location:index.php");
    exit;
}

$r = redisLink();
$postid = $r->incr("next_post_id");
$status = str_replace("\n"," ",gt("status"));
$r->hmset("post:$postid","user_id",$User['id'],"time",time(),"body",$status);
$followers = $r->zrange("followers:".$User['id'],0,-1);
$followers[] = $User['id']; /* Add the post to our own posts too */

foreach($followers as $fid) {
    $r->lpush("posts:$fid",$postid);
}
# Push the post on the timeline, and trim the timeline to the
# newest 1000 elements.
$r->lpush("timeline",$postid);
$r->ltrim("timeline",0,999); # LTRIM's stop index is inclusive

header("Location: index.php");

The core of the function is the foreach loop. We use ZRANGE to get all the followers of the current user, then the loop LPUSHes the post ID onto every follower's timeline List.

Note that we also maintain a global timeline for all the posts, so that in the Retwis home page we can show everybody's updates easily. This requires just doing an LPUSH to the timeline List. Let's face it, aren't you starting to think it was a bit strange to have to sort things added in chronological order using ORDER BY with SQL? I think so.

There is an interesting thing to notice in the code above: we used a new command called LTRIM after we perform the LPUSH operation in the global timeline. This is used in order to trim the list to just 1000 elements. The global timeline is actually only used in order to show a few posts in the home page, there is no need to have the full history of all the posts.

Basically, LPUSH followed by LTRIM is a way to create a capped collection in Redis.
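
The sketch below, in C, models a Redis List as an array to show why LPUSH followed by LTRIM 0 N-1 caps the list at N elements. list_t, list_lpush, list_ltrim, and capped_push are hypothetical names; list_ltrim follows LTRIM's inclusive start/stop semantics but, unlike the real command, only accepts non-negative indexes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define LIST_CAP 2048 /* capacity of the array standing in for a List */

typedef struct {
    long items[LIST_CAP];
    size_t len;
} list_t;

/* LPUSH: prepend v. */
static void list_lpush(list_t *l, long v) {
    assert(l->len < LIST_CAP); /* the sketch has a fixed capacity */
    memmove(&l->items[1], &l->items[0], l->len * sizeof l->items[0]);
    l->items[0] = v;
    l->len++;
}

/* LTRIM: keep only elements start..stop, both inclusive. */
static void list_ltrim(list_t *l, size_t start, size_t stop) {
    if (start >= l->len || start > stop) { /* empty result, as with LTRIM */
        l->len = 0;
        return;
    }
    if (stop >= l->len) stop = l->len - 1;
    size_t keep = stop - start + 1;
    memmove(&l->items[0], &l->items[start], keep * sizeof l->items[0]);
    l->len = keep;
}

/* Capped collection: push, then keep only the newest cap elements (cap > 0). */
static void capped_push(list_t *l, long v, size_t cap) {
    list_lpush(l, v);
    list_ltrim(l, 0, cap - 1);
}
```

Because the stop index is inclusive, LTRIM timeline 0 999 keeps 1000 elements, while LTRIM timeline 0 1000 keeps 1001.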

Paginating updates

Now it should be pretty clear how we can use LRANGE in order to get ranges of posts, and render these posts on the screen. The code is simple:

function showPost($id) {
    $r = redisLink();
    $post = $r->hgetall("post:$id");
    if (empty($post)) return false;

    $userid = $post['user_id'];
    $username = $r->hget("user:$userid","username");
    $elapsed = strElapsed($post['time']);
    $userlink = "<a class=\"username\" href=\"profile.php?u=".urlencode($username)."\">".utf8entities($username)."</a>";

    echo('<div class="post">'.$userlink.' '.utf8entities($post['body'])."<br>");
    echo('<i>posted '.$elapsed.' ago via web</i></div>');
    return true;
}

function showUserPosts($userid,$start,$count) {
    $r = redisLink();
    $key = ($userid == -1) ? "timeline" : "posts:$userid";
    $posts = $r->lrange($key,$start,$start+$count);
    $c = 0;
    foreach($posts as $p) {
        if (showPost($p)) $c++;
        if ($c == $count) break;
    }
    return count($posts) == $count+1;
}

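showPost simply converts a post to HTML and prints it, while showUserPosts gets a range of posts and then passes them to showPost. Because LRANGE treats the stop index as inclusive, showUserPosts asks for $count+1 posts and returns true only when it gets all of them, which tells the caller that at least one more page exists.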
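
The page arithmetic can be sketched in C as follows. page_bounds and has_next_page are hypothetical helpers that assume zero-based page numbers; they reproduce the trick used by showUserPosts of asking LRANGE for one element more than the page size.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive LRANGE bounds for one page of posts. */
typedef struct {
    long start;
    long stop;
} lrange_bounds;

/* page is zero-based. stop = start + count requests count + 1 elements,
   one more than is displayed. */
static lrange_bounds page_bounds(long page, long count) {
    lrange_bounds b;
    b.start = page * count;
    b.stop = b.start + count;
    return b;
}

/* If LRANGE returned the extra element, at least one more page exists. */
static bool has_next_page(long returned, long count) {
    return returned == count + 1;
}
```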

Note: LRANGE is not very efficient if the list of posts starts to be very big and we want to access elements in the middle of the list, since Redis Lists are backed by linked lists. If a system is designed for deep pagination of millions of items, it is better to resort to Sorted Sets instead.

Following users

It is not hard, but we have not yet checked how we create following / follower relationships. If user ID 1000 (antirez) wants to follow user ID 5000 (pippo), we need to create both a following and a follower relationship. We just need two ZADD calls, using the Unix time at which the relation was created as the score:

    ZADD following:1000 1401267618 5000
    ZADD followers:5000 1401267618 1000

Note the same pattern again and again. In theory, with a relational database, the list of following and followers would be contained in a single table with fields like following_id and follower_id. You can extract the followers or following of every user using an SQL query. With a key-value DB things are a bit different, since we need to set both the "1000 is following 5000" and the "5000 is followed by 1000" relations. This is the price to pay, but on the other hand accessing the data is simpler and extremely fast. Having these things as separate sets allows us to do interesting stuff. For example, using ZINTERSTORE we can compute the intersection of the followers of two different users, so we may add a feature to our Twitter clone that tells you very quickly, when you visit somebody else's profile, "you and Alice have 34 followers in common", and things like that.
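
Redis computes intersections server-side with ZINTERSTORE, so a client does not have to do this itself. The C sketch below only illustrates what "followers in common" means: it counts the user IDs present in two arrays, assuming both have already been sorted by ID in ascending order and contain no duplicates (Sorted Set members are unique). Scores are ignored, and count_common is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Merge-style intersection count of two ascending, duplicate-free ID arrays. */
static size_t count_common(const long *a, size_t na, const long *b, size_t nb) {
    size_t i = 0, j = 0, common = 0;
    while (i < na && j < nb) {
        if (a[i] < b[j]) {
            i++;
        } else if (a[i] > b[j]) {
            j++;
        } else { /* same user ID in both sets */
            common++;
            i++;
            j++;
        }
    }
    return common;
}
```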

You can find the code that sets or removes a following / follower relation in the follow.php file.

Making it horizontally scalable

Gentle reader, if you read till this point you are already a hero. Thank you. Before talking about scaling horizontally it is worth checking performance on a single server. Retwis is extremely fast, without any kind of cache. On a very slow and loaded server, an Apache benchmark with 100 parallel clients issuing 100000 requests measured the average pageview to take 5 milliseconds. This means you can serve millions of users every day with just a single Linux box, and this one was monkey ass slow... Imagine the results with more recent hardware.

However, you can't stick with a single server forever. How do you scale a key-value store?

Retwis does not perform any multi-key operations, so making it scalable is simple: you may use client-side sharding, a sharding proxy like Twemproxy, or Redis Cluster.

To know more about those topics please read our documentation about sharding. However, the point here to stress is that in a key-value store, if you design with care, the data set is split among many independent small keys. To distribute those keys to multiple nodes is more straightforward and predictable compared to using a semantically more complex database system.
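
As a minimal illustration of client-side sharding, the C sketch below hashes a key and takes the result modulo the number of nodes. The hash function (32-bit FNV-1a) and the name shard_for_key are choices made for this sketch only; Redis Cluster does not work this way, since it maps keys to 16384 hash slots using CRC16.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit FNV-1a: a simple, well-known hash chosen for this sketch. */
static uint32_t fnv1a32(const char *key) {
    uint32_t h = 2166136261u; /* FNV offset basis */
    for (const unsigned char *p = (const unsigned char *)key; *p != '\0'; p++) {
        h ^= *p;
        h *= 16777619u; /* FNV prime */
    }
    return h;
}

/* Picks one of n_nodes nodes for a key (n_nodes must be > 0). */
static unsigned shard_for_key(const char *key, unsigned n_nodes) {
    return (unsigned)(fnv1a32(key) % n_nodes);
}
```

A plain modulo scheme moves most keys to a different node whenever the number of nodes changes, which is one reason schemes such as consistent hashing or hash slots exist.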

13 - RESP protocol spec

Redis serialization protocol (RESP) specification

Redis clients use a protocol called RESP (REdis Serialization Protocol) to communicate with the Redis server. While the protocol was designed specifically for Redis, it can be used for other client-server software projects.

RESP is a compromise between the following things:

  • Simple to implement.
  • Fast to parse.
  • Human readable.

RESP can serialize different data types like integers, strings, and arrays. There is also a specific type for errors. Requests are sent from the client to the Redis server as arrays of strings that represent the arguments of the command to execute. Redis replies with a command-specific data type.

RESP is binary-safe and does not require processing of bulk data transferred from one process to another, because it uses prefixed lengths to transfer bulk data.

Note: the protocol outlined here is only used for client-server communication. Redis Cluster uses a different binary protocol in order to exchange messages between nodes.

Network layer

A client connects to a Redis server by creating a TCP connection to its port (6379 by default).

While RESP is technically non-TCP specific, the protocol is only used with TCP connections (or equivalent stream-oriented connections like Unix sockets) in the context of Redis.

Request-Response model

Redis accepts commands composed of different arguments. Once a command is received, it is processed and a reply is sent back to the client.

This is the simplest model possible; however, there are two exceptions:

  • Redis supports pipelining (covered later in this document). So it is possible for clients to send multiple commands at once and wait for replies later.
  • When a Redis client subscribes to a Pub/Sub channel, the protocol changes semantics and becomes a push protocol. The client no longer needs to send commands, because the server automatically sends new messages to the client (for the channels the client is subscribed to) as soon as they are received.

Excluding these two exceptions, the Redis protocol is a simple request-response protocol.

RESP protocol description

The RESP protocol was introduced in Redis 1.2, but it became the standard way for talking with the Redis server in Redis 2.0. This is the protocol you should implement in your Redis client.

RESP is actually a serialization protocol that supports the following data types: Simple Strings, Errors, Integers, Bulk Strings, and Arrays.

Redis uses RESP as a request-response protocol in the following way:

  • Clients send commands to a Redis server as a RESP Array of Bulk Strings.
  • The server replies with one of the RESP types according to the command implementation.

In RESP, the first byte determines the data type:

  • For Simple Strings, the first byte of the reply is "+"
  • For Errors, the first byte of the reply is "-"
  • For Integers, the first byte of the reply is ":"
  • For Bulk Strings, the first byte of the reply is "$"
  • For Arrays, the first byte of the reply is "*"
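
Since the first byte alone determines the type, a client can dispatch on it with a simple switch. A minimal C sketch, with resp_type and resp_type_of as hypothetical names:

```c
#include <assert.h>

typedef enum {
    RESP_SIMPLE_STRING,
    RESP_ERROR,
    RESP_INTEGER,
    RESP_BULK_STRING,
    RESP_ARRAY,
    RESP_UNKNOWN /* not a RESP type byte */
} resp_type;

/* Maps the first byte of a reply to its RESP type. */
static resp_type resp_type_of(char first_byte) {
    switch (first_byte) {
    case '+': return RESP_SIMPLE_STRING;
    case '-': return RESP_ERROR;
    case ':': return RESP_INTEGER;
    case '$': return RESP_BULK_STRING;
    case '*': return RESP_ARRAY;
    default:  return RESP_UNKNOWN;
    }
}
```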

RESP can represent a Null value using a special variation of Bulk Strings or Array as specified later.

In RESP, different parts of the protocol are always terminated with "\r\n" (CRLF).

RESP Simple Strings

Simple Strings are encoded as follows: a plus character, followed by a string that cannot contain a CR or LF character (no newlines are allowed), and terminated by CRLF (that is "\r\n").

Simple Strings are used to transmit non binary-safe strings with minimal overhead. For example, many Redis commands reply with just "OK" on success. The RESP Simple String is encoded with the following 5 bytes:

"+OK\r\n"

In order to send binary-safe strings, use RESP Bulk Strings instead.

When Redis replies with a Simple String, a client library should return to the caller a string composed of the characters after the '+' up to the end of the string, excluding the final CRLF bytes.
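
A minimal C sketch of that rule, assuming the whole reply is already in a buffer. parse_simple_string is a hypothetical name; it returns the payload length, or -1 for malformed input or a too-small output buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the payload of a Simple String such as "+OK\r\n" into out (NUL-terminated). */
static long parse_simple_string(const char *buf, size_t len, char *out, size_t out_size) {
    if (len < 3 || buf[0] != '+') return -1;
    const char *cr = memchr(buf + 1, '\r', len - 1); /* payload ends at CR */
    if (cr == NULL || (size_t)(cr - buf) + 1 >= len || cr[1] != '\n') return -1;
    size_t n = (size_t)(cr - (buf + 1));
    if (n + 1 > out_size) return -1;
    memcpy(out, buf + 1, n);
    out[n] = '\0';
    return (long)n;
}
```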

RESP Errors

RESP has a specific data type for errors. They are similar to RESP Simple Strings, but the first character is a minus '-' character instead of a plus. The real difference between Simple Strings and Errors in RESP is that clients treat errors as exceptions, and the string that composes the Error type is the error message itself.

The basic format is:

"-Error message\r\n"

Error replies are only sent when something goes wrong, for instance if you try to perform an operation against the wrong data type, or if the command does not exist. The client should raise an exception when it receives an Error reply.

The following are examples of error replies:

-ERR unknown command 'helloworld'
-WRONGTYPE Operation against a key holding the wrong kind of value

The first word after the "-", up to the first space or newline, represents the kind of error returned. This is just a convention used by Redis and is not part of the RESP Error format.

For example, ERR is the generic error, while WRONGTYPE is a more specific error that implies that the client tried to perform an operation against the wrong data type. This is called an Error Prefix and is a way to allow the client to understand the kind of error returned by the server without checking the exact error message.

A client implementation may return different types of exceptions for different errors or provide a generic way to trap errors by directly providing the error name to the caller as a string.

However, such a feature should not be considered vital as it is rarely useful, and a limited client implementation may simply return a generic error condition, such as false.
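
Extracting the Error Prefix is a matter of reading up to the first space. A hedged C sketch (error_prefix is a hypothetical name; it expects the full reply line, including the leading '-'):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies the Error Prefix (first word after '-') of an error line into out.
   Returns 0 on success, -1 if the line is not an Error or out is too small. */
static int error_prefix(const char *line, char *out, size_t out_size) {
    if (line[0] != '-') return -1;
    const char *msg = line + 1;
    size_t n = strcspn(msg, " \r\n"); /* stop at the first space or line end */
    if (n + 1 > out_size) return -1;
    memcpy(out, msg, n);
    out[n] = '\0';
    return 0;
}
```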

RESP Integers

This type is just a CRLF-terminated string that represents an integer, prefixed by a ":" byte. For example, ":0\r\n" and ":1000\r\n" are integer replies.

Many Redis commands return RESP Integers, like INCR, LLEN, and LASTSAVE.

There is no special meaning for the returned integer. It is just an incremental number for INCR, a UNIX time for LASTSAVE, and so forth. However, the returned integer is guaranteed to be in the range of a signed 64-bit integer.

Integer replies are also used in order to return true or false. For instance, commands like EXISTS or SISMEMBER will return 1 for true and 0 for false.

Other commands like SADD, SREM, and SETNX will return 1 if the operation was actually performed and 0 otherwise.

The following commands will reply with an integer: SETNX, DEL, EXISTS, INCR, INCRBY, DECR, DECRBY, DBSIZE, LASTSAVE, RENAMENX, MOVE, LLEN, SADD, SREM, SISMEMBER, SCARD.
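
Parsing an Integer reply means accumulating digits until the CR. This C sketch (parse_integer_reply is a hypothetical name) also handles a leading minus sign and rejects values that would overflow a long long; as a simplification, it cannot parse the single most negative 64-bit value.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Parses ":<digits>\r\n" or ":-<digits>\r\n" into *out.
   Returns false on malformed input or overflow. */
static bool parse_integer_reply(const char *buf, size_t len, long long *out) {
    if (len < 4 || buf[0] != ':') return false;
    size_t i = 1;
    bool neg = false;
    if (buf[i] == '-') {
        neg = true;
        i++;
    }
    size_t digits_start = i;
    long long v = 0;
    while (i < len && buf[i] != '\r') {
        if (buf[i] < '0' || buf[i] > '9') return false;
        int d = buf[i] - '0';
        if (v > (LLONG_MAX - d) / 10) return false; /* v * 10 + d would overflow */
        v = v * 10 + d;
        i++;
    }
    /* Require at least one digit and a complete CRLF terminator. */
    if (i == digits_start || i + 1 >= len || buf[i + 1] != '\n') return false;
    *out = neg ? -v : v;
    return true;
}
```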

RESP Bulk Strings

Bulk Strings are used in order to represent a single binary-safe string up to 512 MB in length.

Bulk Strings are encoded in the following way:

  • A "$" byte followed by the number of bytes composing the string (a prefixed length), terminated by CRLF.
  • The actual string data.
  • A final CRLF.

So the string "hello" is encoded as follows:

"$5\r\nhello\r\n"

An empty string is encoded as:

"$0\r\n\r\n"

RESP Bulk Strings can also be used in order to signal non-existence of a value using a special format to represent a Null value. In this format, the length is -1, and there is no data. Null is represented as:

"$-1\r\n"

This is called a Null Bulk String.

The client library API should not return an empty string, but a nil object, when the server replies with a Null Bulk String. For example, a Ruby library should return 'nil' while a C library should return NULL (or set a special flag in the reply object).
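
The encoding rules above fit in a few lines of C. In this sketch, encode_bulk is a hypothetical name, the payload is passed with an explicit length so it may contain zero bytes, and a NULL pointer produces the Null Bulk String. The output is not NUL-terminated.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encodes len bytes of data as a Bulk String into out; data == NULL encodes
   the Null Bulk String. Returns bytes written, or -1 if out is too small. */
static long encode_bulk(const char *data, size_t len, char *out, size_t out_size) {
    int hdr;
    if (data == NULL) {
        hdr = snprintf(out, out_size, "$-1\r\n");
        return (hdr < 0 || (size_t)hdr >= out_size) ? -1 : hdr;
    }
    hdr = snprintf(out, out_size, "$%zu\r\n", len); /* length prefix */
    if (hdr < 0 || (size_t)hdr + len + 2 > out_size) return -1;
    memcpy(out + hdr, data, len);                   /* raw payload, binary-safe */
    memcpy(out + hdr + len, "\r\n", 2);             /* final CRLF */
    return (long)(hdr + len + 2);
}
```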

RESP Arrays

Clients send commands to the Redis server using RESP Arrays. Similarly, certain Redis commands that return collections of elements to the client use RESP Arrays as their replies. An example is the LRANGE command that returns elements of a list.

RESP Arrays are sent using the following format:

  • A * character as the first byte, followed by the number of elements in the array as a decimal number, followed by CRLF.
  • An additional RESP type for every element of the Array.

So an empty Array is just the following:

"*0\r\n"

While an array of two RESP Bulk Strings "hello" and "world" is encoded as:

"*2\r\n$5\r\nhello\r\n$5\r\nworld\r\n"

As you can see after the *<count>CRLF part prefixing the array, the other data types composing the array are just concatenated one after the other. For example, an Array of three integers is encoded as follows:

"*3\r\n:1\r\n:2\r\n:3\r\n"

Arrays can contain mixed types, so it's not necessary for the elements to be of the same type. For instance, a list of four integers and a bulk string can be encoded as follows:

*5\r\n
:1\r\n
:2\r\n
:3\r\n
:4\r\n
$6\r\n
hello\r\n

(The reply was split into multiple lines for clarity).

The first line the server sent is *5\r\n, which specifies that five replies will follow. Then each of the replies constituting the items of the Multi Bulk reply is transmitted.

Null Arrays exist as well and are an alternative way to specify a Null value (usually the Null Bulk String is used, but for historical reasons we have two formats).

For instance, when the BLPOP command times out, it returns a Null Array that has a count of -1 as in the following example:

"*-1\r\n"

A client library API should return a null object and not an empty Array when Redis replies with a Null Array. This is necessary to distinguish between an empty list and a different condition (for instance the timeout condition of the BLPOP command).

Nested arrays are possible in RESP. For example a nested array of two arrays is encoded as follows:

*2\r\n
*3\r\n
:1\r\n
:2\r\n
:3\r\n
*2\r\n
+Hello\r\n
-World\r\n

(The format was split into multiple lines to make it easier to read).

The above RESP data type encodes a two-element Array consisting of an Array that contains three Integers (1, 2, 3) and an array of a Simple String and an Error.

Null elements in Arrays

Single elements of an Array may be Null. This is used in Redis replies to signal that these elements are missing and not empty strings. This can happen with the SORT command when used with the GET pattern option if the specified key is missing. Example of an Array reply containing a Null element:

*3\r\n
$5\r\n
hello\r\n
$-1\r\n
$5\r\n
world\r\n

The second element is a Null. The client library should return something like this:

["hello",nil,"world"]

Note that this is not an exception to what was said in the previous sections, but an example to further specify the protocol.

Send commands to a Redis server

Now that you are familiar with the RESP serialization format, you can use it to help write a Redis client library. We can further specify how the interaction between the client and the server works:

  • A client sends the Redis server a RESP Array consisting of only Bulk Strings.
  • A Redis server replies to clients, sending any valid RESP data type as a reply.

So for example a typical interaction could be the following.

The client sends the command LLEN mylist in order to get the length of the list stored at key mylist. Then the server replies with an Integer reply as in the following example (C: is the client, S: the server).

C: *2\r\n
C: $4\r\n
C: LLEN\r\n
C: $6\r\n
C: mylist\r\n

S: :48293\r\n

As usual, we separate different parts of the protocol with newlines for simplicity, but the actual interaction is the client sending *2\r\n$4\r\nLLEN\r\n$6\r\nmylist\r\n as a whole.
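
A client builds such requests by wrapping each argument as a Bulk String inside an Array. The C sketch below (encode_command is a hypothetical name) treats arguments as C strings for brevity, so unlike a real binary-safe client it cannot send arguments that contain zero bytes.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Encodes argv[0..argc-1] as a RESP Array of Bulk Strings into out.
   Returns the number of bytes written (excluding snprintf's trailing NUL),
   or -1 if out is too small. */
static long encode_command(int argc, const char **argv, char *out, size_t out_size) {
    int n = snprintf(out, out_size, "*%d\r\n", argc); /* array header */
    if (n < 0 || (size_t)n >= out_size) return -1;
    size_t pos = (size_t)n;
    for (int i = 0; i < argc; i++) {
        size_t len = strlen(argv[i]);
        /* each argument becomes $<len>\r\n<bytes>\r\n */
        n = snprintf(out + pos, out_size - pos, "$%zu\r\n%s\r\n", len, argv[i]);
        if (n < 0 || (size_t)n >= out_size - pos) return -1;
        pos += (size_t)n;
    }
    return (long)pos;
}
```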

Multiple commands and pipelining

A client can use the same connection in order to issue multiple commands. Pipelining is supported so multiple commands can be sent with a single write operation by the client, without the need to read the server reply of the previous command before issuing the next one. All the replies can be read at the end.

For more information, see Pipelining.

Inline commands

Sometimes you may need to send a command to the Redis server but only have telnet available. While the Redis protocol is simple to implement, it is not ideal to use in interactive sessions, and redis-cli may not always be available. For this reason, Redis also accepts commands in the inline command format.

The following is an example of a server/client chat using an inline command (the server chat starts with S:, the client chat with C:)

C: PING
S: +PONG

The following is an example of an inline command that returns an integer:

C: EXISTS somekey
S: :0

Basically, you write space-separated arguments in a telnet session. Since no command starts with *, which is instead the first byte of a request in the unified request protocol (a RESP Array), Redis is able to detect this condition and parse your command.
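
The detection described above only needs to look at the first byte. The C sketch below pairs that check with a naive space splitter; is_multibulk and split_inline are hypothetical names, and the splitter ignores any quoting rules a real inline parser may support.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A request starting with '*' is a RESP Array; anything else is inline. */
static int is_multibulk(const char *req) {
    return req[0] == '*';
}

/* Splits an inline command line in place on spaces, stopping at CR/LF.
   Stores up to max_args argument pointers in args and returns the count. */
static int split_inline(char *line, char **args, int max_args) {
    int argc = 0;
    char *p = line;
    while (argc < max_args) {
        while (*p == ' ') p++; /* skip separators */
        if (*p == '\0' || *p == '\r' || *p == '\n') break;
        args[argc++] = p; /* start of an argument */
        while (*p != '\0' && *p != ' ' && *p != '\r' && *p != '\n') p++;
        char end = *p;
        if (end == '\0') break;
        *p++ = '\0'; /* terminate the argument */
        if (end == '\r' || end == '\n') break; /* end of the line */
    }
    return argc;
}
```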

High performance parser for the Redis protocol

While the Redis protocol is human readable and easy to implement, it can be implemented with a performance similar to that of a binary protocol.

RESP uses prefixed lengths to transfer bulk data, so there is never a need to scan the payload for special characters, like with JSON, nor to quote the payload that needs to be sent to the server.

The Bulk and Multi Bulk lengths can be processed with code that performs a single operation per character while at the same time scanning for the CR character, like the following C code:

#include <stdio.h>

int main(void) {
    const char *p = "$123\r\n";
    int len = 0;

    p++; /* Skip the '$' type byte. */
    while (*p != '\r') {
        len = (len * 10) + (*p - '0');
        p++;
    }

    /* Now p points at '\r', and the bulk length is in len. */
    printf("%d\n", len);
    return 0;
}

After the first CR is identified, it can be skipped along with the following LF without any processing. Then the bulk data can be read using a single read operation that does not inspect the payload in any way. Finally, the remaining CR and LF characters are discarded without any processing.
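
Putting the length parser and the single read together, here is a hedged C sketch of a Bulk String reader that works on a buffer which may hold only part of a reply. read_bulk is a hypothetical name; it returns 0 when more input is needed and -1 on protocol errors, and it rejects lengths above 512 MB, the Bulk String limit stated earlier.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BULK_LEN (512LL * 1024 * 1024) /* 512 MB */

/* Parses one Bulk String from buf (len bytes available). On success returns the
   bytes consumed and sets *data / *data_len (NULL and -1 for a Null Bulk String).
   Returns 0 if more input is needed, -1 on a protocol error. */
static long read_bulk(const char *buf, size_t len, const char **data, long long *data_len) {
    if (len == 0) return 0;
    if (buf[0] != '$') return -1;
    size_t i = 1;
    int neg = 0;
    if (i < len && buf[i] == '-') {
        neg = 1;
        i++;
    }
    size_t digits_start = i;
    long long n = 0;
    while (i < len && buf[i] != '\r') { /* one step per character */
        if (buf[i] < '0' || buf[i] > '9') return -1;
        n = n * 10 + (buf[i] - '0');
        if (n > MAX_BULK_LEN) return -1;
        i++;
    }
    if (i + 1 >= len) return 0; /* header not complete yet */
    if (i == digits_start || buf[i + 1] != '\n') return -1;
    i += 2; /* skip CRLF without further processing */
    if (neg) {
        if (n != 1) return -1; /* only "$-1" is valid */
        *data = NULL;
        *data_len = -1;
        return (long)i;
    }
    size_t blen = (size_t)n;
    if (len - i < blen + 2) return 0; /* payload not complete yet */
    if (buf[i + blen] != '\r' || buf[i + blen + 1] != '\n') return -1;
    *data = buf + i; /* the payload itself is never scanned */
    *data_len = n;
    return (long)(i + blen + 2);
}
```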

While comparable in performance to a binary protocol, the Redis protocol is significantly simpler to implement in most high-level languages, reducing the number of bugs in client software.

14 - Redis signal handling

How Redis handles common Unix signals

This document provides information about how Redis reacts to different POSIX signals such as SIGTERM and SIGSEGV.

The information in this document only applies to Redis version 2.6 or greater.

SIGTERM and SIGINT

The SIGTERM and SIGINT signals tell Redis to shut down gracefully. When the server receives this signal, it does not immediately exit. Instead, it schedules a shutdown similar to the one performed by the SHUTDOWN command. The scheduled shutdown starts as soon as possible, specifically as soon as the currently executing command (if any) terminates, with a possible additional delay of 0.1 seconds or less.

If the server is blocked by a long-running Lua script, kill the script with SCRIPT KILL if possible. The scheduled shutdown will run just after the script is killed or terminates spontaneously.

This shutdown process includes the following actions:

  • If there are any replicas lagging behind in replication:
    • Pause clients attempting to write with CLIENT PAUSE and the WRITE option.
    • Wait up to the configured shutdown-timeout (default 10 seconds) for replicas to catch up with the master's replication offset.
  • If a background child is saving the RDB file or performing an AOF rewrite, the child process is killed.
  • If the AOF is active, Redis calls the fsync system call on the AOF file descriptor to flush the buffers on disk.
  • If Redis is configured to persist on disk using RDB files, a synchronous (blocking) save is performed. Since the save is synchronous, it doesn't use any additional memory.
  • If the server is daemonized, the PID file is removed.
  • If the Unix domain socket is enabled, it gets removed.
  • The server exits with an exit code of zero.

If the RDB file can't be saved, the shutdown fails, and the server continues to run in order to ensure no data loss. Likewise, if the user just turned on AOF, and the server triggered the first AOF rewrite in order to create the initial AOF file but this file can't be saved, the shutdown fails and the server continues to run. Since Redis 2.6.11, no further attempt to shut down will be made unless a new SIGTERM is received or the SHUTDOWN command is issued.

Since Redis 7.0, the server waits for lagging replicas up to a configurable shutdown-timeout, 10 seconds by default, before shutting down. This provides a best effort to minimize the risk of data loss in a situation where no save points are configured and AOF is deactivated. Before version 7.0, shutting down a heavily loaded master node in a diskless setup was more likely to result in data loss. To minimize the risk of data loss in such setups, trigger a manual FAILOVER (or CLUSTER FAILOVER) to demote the master to a replica and promote one of the replicas to a new master before shutting down a master node.
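
The scheduled (rather than immediate) shutdown on SIGTERM and SIGINT follows a common POSIX pattern: the handler only records that a shutdown was requested, and the main loop acts on it after the current command. The C sketch below shows that pattern in general terms; it is not Redis's actual signal code, and shutdown_requested, on_term, and install_shutdown_handlers are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set by the handler, polled by the main loop; the handler does no shutdown work. */
static volatile sig_atomic_t shutdown_requested = 0;

static void on_term(int sig) {
    (void)sig;
    shutdown_requested = 1; /* only async-signal-safe work here */
}

/* Installs on_term for SIGTERM and SIGINT. Returns 0 on success, -1 on error. */
static int install_shutdown_handlers(void) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_term;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGTERM, &sa, NULL) != 0) return -1;
    if (sigaction(SIGINT, &sa, NULL) != 0) return -1;
    return 0;
}
```

A server loop would check shutdown_requested between commands and run its shutdown steps once it is set.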

SIGSEGV, SIGBUS, SIGFPE and SIGILL

The following signals are handled as a Redis crash:

  • SIGSEGV
  • SIGBUS
  • SIGFPE
  • SIGILL

Once one of these signals is trapped, Redis stops any current operation and performs the following actions:

  • Adds a bug report to the log file. This includes a stack trace, dump of registers, and information about the state of clients.
  • Since Redis 2.8, a fast memory test is performed as a first check of the reliability of the crashing system.
  • If the server was daemonized, the PID file is removed.
  • Finally the server unregisters its own signal handler for the received signal and resends the same signal to itself to make sure that the default action is performed, such as dumping the core on the file system.

What happens when a child process gets killed

When the child performing the Append Only File rewrite gets killed by a signal, Redis handles this as an error and discards the (probably partial or corrupted) AOF file. It will attempt the rewrite again later.

When the child performing an RDB save is killed, Redis handles the condition as a more severe error. While the failure of an AOF file rewrite can cause AOF file enlargement, failed RDB file creation reduces durability.

When the child producing the RDB file is killed by a signal, or exits with an error (non-zero exit code), Redis enters a special error condition where no further write commands are accepted.

  • Redis will continue to reply to read commands.
  • Redis will reply to all write commands with a MISCONFIG error.

This error condition will persist until it becomes possible to create an RDB file successfully.

Kill the RDB-saving child without errors

Sometimes the user may want to kill the RDB-saving child process without generating an error. Since Redis version 2.6.10, this can be done using the signal SIGUSR1. This signal is handled in a special way: it kills the child process like any other signal, but the parent process will not detect this as a critical error and will continue to serve write requests.

15 - Sentinel client spec

How to build clients for Redis Sentinel

Redis Sentinel is a monitoring solution for Redis instances that handles automatic failover of Redis masters and service discovery (who is the current master for a given group of instances?). Since Sentinel is both responsible for reconfiguring instances during failovers, and providing configurations to clients connecting to Redis masters or replicas, clients are required to have explicit support for Redis Sentinel.

This document is targeted at Redis client developers who want to support Sentinel in their client implementations with the following goals:

  • Automatic configuration of clients via Sentinel.
  • Improved safety of Redis Sentinel automatic failover.

For details about how Redis Sentinel works, please check the Redis Documentation, as this document only contains information needed for Redis client developers, and it is expected that readers are familiar with the way Redis Sentinel works.

Redis service discovery via Sentinel

Redis Sentinel identifies every master with a name like "stats" or "cache". Every name actually identifies a group of instances, composed of a master and a variable number of replicas.

The address of the Redis master that is used for a specific purpose inside a network may change after events like an automatic failover, a manually triggered failover (for instance in order to upgrade a Redis instance), and other reasons.

Normally Redis clients have some kind of hard-coded configuration that specifies the address of a Redis master instance within a network as IP address and port number. However if the master address changes, manual intervention in every client is needed.

A Redis client supporting Sentinel can automatically discover the address of a Redis master from the master name using Redis Sentinel. So instead of a hard-coded IP address and port, a client supporting Sentinel should optionally be able to take as input:

  • A list of ip:port pairs pointing to known Sentinel instances.
  • The name of the service, like "cache" or "timelines".

This is the procedure a client should follow in order to obtain the master address starting from the list of Sentinels and the service name.

Step 1: connecting to the first Sentinel

The client should iterate over the list of Sentinel addresses. For every address it should try to connect to the Sentinel, using a short timeout (on the order of a few hundred milliseconds). On errors or timeouts, the next Sentinel address should be tried.

If all the Sentinel addresses were tried without success, an error should be returned to the client.

The first Sentinel replying to the client request should be moved to the start of the list, so that on the next reconnection the client first tries the Sentinel that was reachable in the previous connection attempt, minimizing latency.
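A minimal sketch of Step 1, assuming the client library supplies a hypothetical `connect(addr, timeout)` callable that raises `OSError` (which includes timeouts) when a Sentinel cannot be reached. The helper name `connect_first_sentinel`, the exception class and the 0.3 second default timeout are illustrative choices, not part of any real client API.

```python
from typing import Callable, List, Tuple, TypeVar

Addr = Tuple[str, int]
Conn = TypeVar("Conn")


class SentinelsUnreachable(Exception):
    """Raised when no Sentinel in the list accepted a connection."""


def connect_first_sentinel(
    sentinels: List[Addr],
    connect: Callable[[Addr, float], Conn],
    timeout: float = 0.3,
) -> Tuple[Conn, Addr]:
    """Try each Sentinel in order and move the first reachable one to the front.

    `connect` is hypothetical: it opens a connection to (host, port) within
    `timeout` seconds and raises OSError on failure or timeout.
    """
    for i, addr in enumerate(sentinels):
        try:
            conn = connect(addr, timeout)
        except OSError:
            continue  # unreachable or timed out: try the next Sentinel
        # Promote the responsive Sentinel so the next attempt tries it first.
        sentinels.insert(0, sentinels.pop(i))
        return conn, addr
    raise SentinelsUnreachable("no Sentinel could be contacted")
```

Reordering the caller's list in place keeps the "last known good" Sentinel first across reconnections without any extra state.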

Step 2: ask for master address

Once a connection with a Sentinel is established, the client should try to execute the following command on the Sentinel:

SENTINEL get-master-addr-by-name master-name

Where master-name should be replaced with the actual service name specified by the user.

The result from this call can be one of the following two replies:

  • An ip:port pair.
  • A null reply. This means Sentinel does not know this master.

If an ip:port pair is received, this address should be used to connect to the Redis master. Otherwise, if a null reply is received, the client should try the next Sentinel in the list.
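The two possible replies can be interpreted as in the following sketch. It assumes the client's protocol layer has already decoded the reply into either `None` (null reply) or a two-element list of IP and port strings, possibly as `bytes`; the function name is hypothetical.

```python
from typing import List, Optional, Tuple, Union

Reply = Optional[List[Union[bytes, str]]]


def master_addr_from_reply(reply: Reply) -> Optional[Tuple[str, int]]:
    """Interpret the reply to SENTINEL get-master-addr-by-name <master-name>.

    A known master yields a two-element array [ip, port] (port sent as a
    string); an unknown master yields a null reply, represented here as None.
    """
    if reply is None:
        return None  # this Sentinel does not know the master: try the next one
    ip, port = (x.decode() if isinstance(x, bytes) else x for x in reply)
    return ip, int(port)
```

For example, a reply of `[b"10.0.0.5", b"6379"]` becomes `("10.0.0.5", 6379)`, ready to be used in Step 3.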

Step 3: call the ROLE command in the target instance

Once the client has discovered the address of the master instance, it should attempt a connection with the master and call the ROLE command in order to verify that the instance is actually a master.

If the ROLE command is not available (it was introduced in Redis 2.8.12), a client may fall back to the INFO replication command, parsing the role: field of the output.

If the instance is not a master as expected, the client should wait a short amount of time (a few hundred milliseconds) and try again starting from Step 1.
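A sketch of the Step 3 check, under stated assumptions: `resolve` (running Steps 1 and 2) and `get_role` (sending ROLE, or INFO replication on older servers) are hypothetical callables, and the cap on attempts is an illustrative choice since the protocol only says to try again. The first element of a ROLE reply is "master", "slave" or "sentinel".

```python
import time
from typing import Callable, Optional, Sequence, Tuple

Addr = Tuple[str, int]


def role_from_role_reply(reply: Sequence[object]) -> str:
    """The first element of a ROLE reply is "master", "slave" or "sentinel"."""
    first = reply[0]
    return first.decode() if isinstance(first, bytes) else str(first)


def role_from_info(info: str) -> Optional[str]:
    """Fallback for servers without ROLE (before 2.8.12): read the
    role: field from the output of INFO replication."""
    for line in info.splitlines():
        if line.startswith("role:"):
            return line[len("role:"):].strip()
    return None


def resolve_verified_master(
    resolve: Callable[[], Addr],
    get_role: Callable[[Addr], Optional[str]],
    attempts: int = 5,
    delay: float = 0.3,
) -> Addr:
    """Resolve the master via Sentinel, then confirm it reports role master.

    `resolve` and `get_role` are hypothetical; `attempts` is an assumed limit.
    """
    for _ in range(attempts):
        addr = resolve()
        if get_role(addr) == "master":
            return addr
        time.sleep(delay)  # stale information: wait briefly, restart from Step 1
    raise RuntimeError("could not find an instance that reports role master")
```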

Handling reconnections

Once the service name is resolved into the master address and a connection is established with the Redis master instance, every time a reconnection is needed the client should resolve the address again using Sentinel, restarting from Step 1. For instance, Sentinel should be contacted again in the following cases:

  • If the client reconnects after a timeout or socket error.
  • If the client reconnects because it was explicitly closed or reconnected by the user.

In the above cases and any other case where the client lost the connection with the Redis server, the client should resolve the master address again.
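One way to enforce this rule is to never cache the master address across connections, as in this sketch. The class name and the `resolve` (Steps 1 to 3) and `open_conn` callables are hypothetical stand-ins for whatever the client library provides.

```python
from typing import Callable, Optional, Tuple

Addr = Tuple[str, int]


class SentinelBackedConnection:
    """Minimal sketch: every (re)connection resolves the master through Sentinel.

    `resolve` returns the verified master address; `open_conn` opens a raw
    connection. Both are hypothetical callables.
    """

    def __init__(self, resolve: Callable[[], Addr], open_conn: Callable[[Addr], object]):
        self._resolve = resolve
        self._open = open_conn
        self.addr: Optional[Addr] = None
        self.conn: Optional[object] = None

    def connect(self) -> None:
        # Never reuse a cached address: it may be stale after a failover.
        self.addr = self._resolve()
        self.conn = self._open(self.addr)

    def on_connection_lost(self) -> None:
        # Timeouts, socket errors, explicit close/reconnect: resolve again.
        self.conn = None
        self.connect()
```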

Sentinel failover disconnection

Starting with Redis 2.8.12, when Redis Sentinel changes the configuration of an instance, for example promoting a replica to a master, demoting a master to replicate to the new master after a failover, or simply changing the master address of a stale replica instance, it sends a CLIENT KILL type normal command to the instance in order to make sure all the clients are disconnected from the reconfigured instance. This will force clients to resolve the master address again.

If the client contacts a Sentinel whose information is not yet updated, the verification of the Redis instance role via the ROLE command will fail, allowing the client to detect that the contacted Sentinel provided stale information and to try again.

Note: it is possible that a stale master comes back online at the same time a client contacts a stale Sentinel instance, so the client may connect to a stale master and yet the ROLE output will match. However, once the stale master is back, Sentinel will try to demote it to a replica, triggering a new disconnection. The same reasoning applies to connecting to stale replicas that will get reconfigured to replicate from a different master.

Connecting to replicas

Sometimes clients are interested in connecting to replicas, for example in order to scale read requests. This protocol supports connecting to replicas by modifying Step 2 slightly. Instead of calling the following command:

SENTINEL get-master-addr-by-name master-name

To retrieve the list of replica instances, clients should instead call:

SENTINEL replicas master-name

Symmetrically, the client should verify with the ROLE command that the instance is actually a replica, so that read queries meant to be scaled out to replicas are not sent to the master.
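The following sketch extracts replica addresses from a SENTINEL replicas reply and keeps only those whose ROLE reports "slave". It assumes the reply has been decoded into a list of flat arrays of alternating field names and values (each including "ip" and "port"); `get_role` is a hypothetical callable that queries ROLE on one instance.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple, Union

Addr = Tuple[str, int]
Str = Union[bytes, str]


def _s(x: Str) -> str:
    return x.decode() if isinstance(x, bytes) else x


def replicas_from_reply(reply: Sequence[Sequence[Str]]) -> List[Addr]:
    """Extract (ip, port) pairs from a SENTINEL replicas <master-name> reply.

    Each replica is a flat array such as
    ["name", "10.0.0.6:6379", "ip", "10.0.0.6", "port", "6379", ...].
    """
    addrs = []
    for entry in reply:
        fields: Dict[str, str] = {
            _s(entry[i]): _s(entry[i + 1]) for i in range(0, len(entry) - 1, 2)
        }
        addrs.append((fields["ip"], int(fields["port"])))
    return addrs


def verified_replicas(addrs: List[Addr], get_role: Callable[[Addr], Optional[str]]) -> List[Addr]:
    # Keep only instances whose ROLE says "slave", so reads never hit the master.
    return [a for a in addrs if get_role(a) == "slave"]
```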

Connection pools

For clients implementing connection pools, whenever a single connection reconnects, Sentinel should be contacted again, and if the master address has changed, all the existing connections should be closed and reconnected to the new address.
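A sketch of this pool behavior, assuming hypothetical `resolve` (Steps 1 to 3), `open_conn` and `close_conn` callables and a fixed pool size; real pools usually grow and shrink lazily, which is omitted here.

```python
from typing import List, Tuple

Addr = Tuple[str, int]


class SentinelPool:
    """Sketch of a pool that re-resolves the master whenever any connection reconnects.

    `resolve`, `open_conn` and `close_conn` are hypothetical callables.
    """

    def __init__(self, resolve, open_conn, close_conn, size: int = 4):
        self._resolve = resolve
        self._open = open_conn
        self._close = close_conn
        self.addr: Addr = resolve()
        self.conns: List[object] = [open_conn(self.addr) for _ in range(size)]

    def reconnect(self, index: int) -> None:
        """Called when connection `index` must reconnect (error, timeout, ...)."""
        new_addr = self._resolve()  # ask Sentinel again, even for one connection
        if new_addr != self.addr:
            # Master moved: every pooled connection points at the old address.
            for c in self.conns:
                self._close(c)
            self.addr = new_addr
            self.conns = [self._open(new_addr) for _ in self.conns]
        else:
            self._close(self.conns[index])
            self.conns[index] = self._open(self.addr)
```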

Error reporting

The client should correctly return the information to the user in case of errors. Specifically:

  • If no Sentinel can be contacted (so that the client was never able to get the reply to SENTINEL get-master-addr-by-name), an error that clearly states that Redis Sentinel is unreachable should be returned.
  • If all the Sentinels in the pool replied with a null reply, the user should be informed with an error that Sentinels don't know this master name.
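The two error cases can be kept distinct while resolving the master, as in this sketch. `query` is a hypothetical callable that sends SENTINEL get-master-addr-by-name to one Sentinel and returns an (ip, port) pair, `None` for a null reply, or raises `OSError`. When some Sentinels are unreachable and the rest reply with null, this sketch reports the unknown master name; that choice is an assumption.

```python
from typing import Callable, List, Optional, Tuple

Addr = Tuple[str, int]


class SentinelUnreachableError(Exception):
    """No Sentinel could be contacted at all."""


class MasterNameUnknownError(Exception):
    """Every Sentinel that replied returned a null reply for the master name."""


def get_master_addr(
    sentinels: List[Addr],
    master_name: str,
    query: Callable[[Addr, str], Optional[Addr]],
) -> Addr:
    got_reply = False
    for sentinel in sentinels:
        try:
            addr = query(sentinel, master_name)
        except OSError:
            continue  # this Sentinel is unreachable: try the next one
        got_reply = True
        if addr is not None:
            return addr
    if not got_reply:
        raise SentinelUnreachableError("Redis Sentinel is unreachable")
    raise MasterNameUnknownError(f"Sentinels don't know master name {master_name!r}")
```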

Sentinels list automatic refresh

Optionally, once a successful reply to get-master-addr-by-name is received, a client may update its internal list of Sentinel nodes following this procedure:

  • Obtain a list of other Sentinels for this master using the command SENTINEL sentinels <master-name>.
  • Append every ip:port pair not already in our list to the end of the list.

A client does not need to make the list persistent by updating its own configuration. The ability to update the in-memory representation of the list of Sentinels can already be useful to improve reliability.
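A sketch of the in-memory refresh, assuming the SENTINEL sentinels reply has been decoded into a list of flat arrays of alternating field names and values that include "ip" and "port". The function name is hypothetical; existing entries keep their order, so the Sentinel promoted in Step 1 stays first.

```python
from typing import Dict, List, Sequence, Tuple, Union

Addr = Tuple[str, int]
Str = Union[bytes, str]


def _s(x: Str) -> str:
    return x.decode() if isinstance(x, bytes) else x


def merge_sentinels(known: List[Addr], reply: Sequence[Sequence[Str]]) -> List[Addr]:
    """Append Sentinels from a SENTINEL sentinels <master-name> reply that are
    not already in `known`, preserving the existing order (in memory only)."""
    for entry in reply:
        fields: Dict[str, str] = {
            _s(entry[i]): _s(entry[i + 1]) for i in range(0, len(entry) - 1, 2)
        }
        addr = (fields["ip"], int(fields["port"]))
        if addr not in known:
            known.append(addr)
    return known
```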

Subscribe to Sentinel events to improve responsiveness

The Sentinel documentation shows how clients can connect to Sentinel instances using Pub/Sub in order to subscribe to changes in the Redis instances configurations.

This mechanism can be used to speed up the reconfiguration of clients: clients may listen to Pub/Sub to learn when a configuration change has happened, and then run the three-step protocol explained in this document to resolve the new Redis master (or replica) address.

However, update messages received via Pub/Sub should not replace the above procedure, since there is no guarantee that a client receives all the update messages.
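As an example, Sentinel publishes a +switch-master event whose payload is "<master name> <oldip> <oldport> <newip> <newport>". The sketch below treats that event only as a hint to rerun the full resolution protocol; `re_resolve` is a hypothetical callable, and the Pub/Sub subscription itself is left to the client library.

```python
from typing import Callable, Tuple

Addr = Tuple[str, int]


def parse_switch_master(payload: str) -> Tuple[str, Addr, Addr]:
    """Parse a +switch-master payload: "<name> <oldip> <oldport> <newip> <newport>"."""
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return name, (old_ip, int(old_port)), (new_ip, int(new_port))


def on_sentinel_message(channel: str, payload: str, master_name: str,
                        re_resolve: Callable[[], None]) -> bool:
    """Trigger a new resolution for our master; the address in the message is
    not trusted directly, since the full protocol also verifies ROLE."""
    if channel != "+switch-master":
        return False
    name, _old, _new = parse_switch_master(payload)
    if name != master_name:
        return False
    re_resolve()  # hypothetical: runs Steps 1-3 again
    return True
```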

Additional information

For additional information or to discuss specific aspects of these guidelines, please drop a message to the Redis Google Group.