1 - Redis administration

Advice for configuring and managing Redis in production

Redis setup hints

  • We suggest deploying Redis using the Linux operating system. Redis is also tested heavily on OS X, and tested from time to time on FreeBSD and OpenBSD systems. However Linux is where we do all the major stress testing, and where most production deployments are running.

  • Make sure to set the Linux kernel overcommit memory setting to 1. Add vm.overcommit_memory = 1 to /etc/sysctl.conf. Then reboot or run the command sysctl vm.overcommit_memory=1 for this to take effect immediately.

  • Make sure Redis won't be affected by the Linux kernel feature, transparent huge pages, otherwise it will impact greatly both memory usage and latency in a negative way. This is accomplished with the following command: echo madvise > /sys/kernel/mm/transparent_hugepage/enabled.

  • Make sure to setup swap in your system (we suggest as much as swap as memory). If Linux does not have swap and your Redis instance accidentally consumes too much memory, either Redis will crash when it is out of memory, or the Linux kernel OOM killer will kill the Redis process. When swapping is enabled Redis will work in poorly, but you'll likely notice the latency spikes and do something before it's too late.

  • Set an explicit maxmemory option limit in your instance to make sure that it will report errors instead of failing when the system memory limit is near to be reached. Note that maxmemory should be set by calculating the overhead for Redis, other than data, and the fragmentation overhead. So if you think you have 10 GB of free memory, set it to 8 or 9.+ If you are using Redis in a very write-heavy application, while saving an RDB file on disk or rewriting the AOF log, Redis may use up to 2 times the memory normally used. The additional memory used is proportional to the number of memory pages modified by writes during the saving process, so it is often proportional to the number of keys (or aggregate types items) touched during this time. Make sure to size your memory accordingly.

  • Use daemonize no when running under daemontools.

  • Make sure to setup some non-trivial replication backlog, which must be set in proportion to the amount of memory Redis is using. In a 20 GB instance it does not make sense to have just 1 MB of backlog. The backlog will allow replicas to sync with the master instance much more easily.

  • If you use replication, Redis will need to perform RDB saves even if you have persistence disabled (this doesn't apply to diskless replication). If you don't have disk usage on the master, make sure to enable diskless replication.

  • If you are using replication, make sure that either your master has persistence enabled, or that it does not automatically restart on crashes. Replicas will try to maintain an exact copy of the master, so if a master restarts with an empty data set, replicas will be wiped as well.

  • By default, Redis does not require any authentication and listens to all the network interfaces. This is a big security issue if you leave Redis exposed on the internet or other places where attackers can reach it. See for example this attack to see how dangerous it can be. Please check our security page and the quick start for information about how to secure Redis.

  • See the LATENCY DOCTOR and MEMORY DOCTOR commands to assist in troubleshooting.

Running Redis on EC2

  • Use HVM based instances, not PV based instances.
  • Don't use old instances families, for example: use m3.medium with HVM instead of m1.medium with PV.
  • The use of Redis persistence with EC2 EBS volumes needs to be handled with care since sometimes EBS volumes have high latency characteristics.
  • You may want to try the new diskless replication if you have issues when replicas are synchronizing with the master.

Upgrading or restarting a Redis instance without downtime

Redis is designed to be a very long running process in your server.

Many configuration options can be modified without any kind of restart using the CONFIG SET command.

You can also switch from AOF to RDB snapshots persistence, or the other way around, without restarting Redis. Check the output of the CONFIG GET * command for more information.

However from time to time, a restart is mandatory. For example, in order to upgrade the Redis process to a newer version, or when you need to modify some configuration parameter that is currently not supported by the CONFIG command.

The following steps provide a way that is commonly used to avoid any downtime.

  • Setup your new Redis instance as a replica for your current Redis instance. In order to do so, you need a different server, or a server that has enough RAM to keep two instances of Redis running at the same time.

  • If you use a single server, make sure that the replica is started in a different port than the master instance, otherwise the replica will not be able to start at all.

  • Wait for the replication initial synchronization to complete (check the replica's log file).

  • Using INFO, make sure the master and replica have the same number of keys. Use redis-cli to make sure the replica is working as you wish and is replying to your commands.

  • Allow writes to the replica using CONFIG SET slave-read-only no.

  • Configure all your clients to use the new instance (the replica). Note that you may want to use the CLIENT PAUSE command to make sure that no client can write to the old master during the switch.

  • Once you are sure that the master is no longer receiving any query (you can check this with the MONITOR command), elect the replica to master using the REPLICAOF NO ONE command, and then shut down your master.

If you are using Redis Sentinel or Redis Cluster, the simplest way to upgrade to newer versions is to upgrade one replica after the other. Then you can perform a manual failover to promote one of the upgraded replicas to master, and finally promote the last replica.

Note that Redis Cluster 4.0 is not compatible with Redis Cluster 3.2 at cluster bus protocol level, so a mass restart is needed in this case. However Redis 5 cluster bus is backward compatible with Redis 4.

2 - Redis CLI

Overview of redis-cli, the Redis command line interface

The redis-cli (Redis command line interface) is a terminal program used to send commands to and read replies from the Redis server. It has two main modes: an interactive REPL (Read Eval Print Loop) mode where the user types Redis commands and receives replies, and a command mode where redis-cli is executed with additional arguments and the reply is printed to the standard output.

In interactive mode, redis-cli has basic line editing capabilities to provide a familiar tyPING experience.

There are several options you can use to launch the program in special modes. You can simulate a replica and print the replication stream it receives from the primary, check the latency of a Redis server and display statistics, or request ASCII-art spectrogram of latency samples and frequencies, among many other things.

This guide will cover the different aspects of redis-cli, starting from the simplest and ending with the more advanced features.

Command line usage

To run a Redis command and receive its reply as standard output to the terminal, include the command to execute as separate arguments of redis-cli:

$ redis-cli INCR mycounter
(integer) 7

The reply of the command is "7". Since Redis replies are typed (strings, arrays, integers, nil, errors, etc.), you see the type of the reply between parentheses. This additional information may not be ideal when the output of redis-cli must be used as input of another command or redirected into a file.

redis-cli only shows additional information for human readibility when it detects the standard output is a tty, or terminal. For all other outputs it will auto-enable the raw output mode, as in the following example:

$ redis-cli INCR mycounter > /tmp/output.txt
$ cat /tmp/output.txt
8

Notice that (integer) was omitted from the output since redis-cli detected the output was no longer written to the terminal. You can force raw output even on the terminal with the --raw option:

$ redis-cli --raw INCR mycounter
9

You can force human readable output when writing to a file or in pipe to other commands by using --no-raw.

Host, port, password and database

By default redis-cli connects to the server at the address 127.0.0.1 with port 6379. You can change this using several command line options. To specify a different host name or an IP address, use the -h option. In order to set a different port, use -p.

$ redis-cli -h redis15.localnet.org -p 6390 PING
PONG

If your instance is password protected, the -a <password> option will preform authentication saving the need of explicitly using the AUTH command:

$ redis-cli -a myUnguessablePazzzzzword123 PING
PONG

For safety it is strongly advised to provide the password to redis-cli automatically via the REDISCLI_AUTH environment variable.

Finally, it's possible to send a command that operates on a database number other than the default number zero by using the -n <dbnum> option:

$ redis-cli FLUSHALL
OK
$ redis-cli -n 1 INCR a
(integer) 1
$ redis-cli -n 1 INCR a
(integer) 2
$ redis-cli -n 2 INCR a
(integer) 1

Some or all of this information can also be provided by using the -u <uri> option and the URI pattern redis://user:password@host:port/dbnum:

$ redis-cli -u redis://LJenkins:p%40ssw0rd@redis-16379.hosted.com:16379/0 PING
PONG

SSL/TLS

By default, redis-cli uses a plain TCP connection to connect to Redis. You may enable SSL/TLS using the --tls option, along with --cacert or --cacertdir to configure a trusted root certificate bundle or directory.

If the target server requires authentication using a client side certificate, you can specify a certificate and a corresponding private key using --cert and --key.

Getting input from other programs

There are two ways you can use redis-cli in order to receive input from other commands via the standard input. One is to use the target payload as the last argument from stdin. For example, in order to set the Redis key net_services to the content of the file /etc/services from a local file system, use the -x option:

$ redis-cli -x SET net_services < /etc/services
OK
$ redis-cli GETRANGE net_services 0 50
"#\n# Network services, Internet style\n#\n# Note that "

In the first line of the above session, redis-cli was executed with the -x option and a file was redirected to the CLI's standard input as the value to satisfy the SET net_services command phrase. This is useful for scripting.

A different approach is to feed redis-cli a sequence of commands written in a text file:

$ cat /tmp/commands.txt
SET item:3374 100
INCR item:3374
APPEND item:3374 xxx
GET item:3374
$ cat /tmp/commands.txt | redis-cli
OK
(integer) 101
(integer) 6
"101xxx"

All the commands in commands.txt are executed consecutively by redis-cli as if they were typed by the user in interactive mode. Strings can be quoted inside the file if needed, so that it's possible to have single arguments with spaces, newlines, or other special characters:

$ cat /tmp/commands.txt
SET arg_example "This is a single argument"
STRLEN arg_example
$ cat /tmp/commands.txt | redis-cli
OK
(integer) 25

Continuously run the same command

It is possible to execute a signle command a specified number of times with a user-selected pause between executions. This is useful in different contexts - for example when we want to continuously monitor some key content or INFO field output, or when we want to simulate some recurring write event, such as pushing a new item into a list every 5 seconds.

This feature is controlled by two options: -r <count> and -i <delay>. The -r option states how many times to run a command and -i sets the delay between the different command calls in seconds (with the ability to specify values such as 0.1 to represent 100 milliseconds).

By default the interval (or delay) is set to 0, so commands are just executed ASAP:

$ redis-cli -r 5 INCR counter_value
(integer) 1
(integer) 2
(integer) 3
(integer) 4
(integer) 5

To run the same command indefinitely, use -1 as the count value. To monitor over time the RSS memory size it's possible to use the following command:

$ redis-cli -r -1 -i 1 INFO | grep rss_human
used_memory_rss_human:2.71M
used_memory_rss_human:2.73M
used_memory_rss_human:2.73M
used_memory_rss_human:2.73M
... a new line will be printed each second ...

Mass insertion of data using redis-cli

Mass insertion using redis-cli is covered in a separate page as it is a worthwhile topic itself. Please refer to our mass insertion guide.

CSV output

A CSV (Comma Separated Values) output feature exists within redis-cli to export data from Redis to an external program.

$ redis-cli LPUSH mylist a b c d
(integer) 4
$ redis-cli --csv LRANGE mylist 0 -1
"d","c","b","a"

Note that the --csv flag will only work on a single command, not the entirety of a DB as an export.

Running Lua scripts

The redis-cli has extensive support for using the debugging facility of Lua scripting, available with Redis 3.2 onwards. For this feature, refer to the Redis Lua debugger documentation.

Even without using the debugger, redis-cli can be used to run scripts from a file as an argument:

$ cat /tmp/script.lua
return redis.call('SET',KEYS[1],ARGV[1])
$ redis-cli --eval /tmp/script.lua location:hastings:temp , 23
OK

The Redis EVAL command takes the list of keys the script uses, and the other non key arguments, as different arrays. When calling EVAL you provide the number of keys as a number.

When calling redis-cli with the --eval option above, there is no need to specify the number of keys explicitly. Instead it uses the convention of separating keys and arguments with a comma. This is why in the above call you see location:hastings:temp , 23 as arguments.

So location:hastings:temp will populate the KEYS array, and 23 the ARGV array.

The --eval option is useful when writing simple scripts. For more complex work, the Lua debugger is recommended. It is possible to mix the two approaches, since the debugger can also execute scripts from an external file.

Interactive mode

We have explored how to use the Redis CLI as a command line program. This is useful for scripts and certain types of testing, however most people will spend the majority of time in redis-cli using its interactive mode.

In interactive mode the user types Redis commands at the prompt. The command is sent to the server, processed, and the reply is parsed back and rendered into a simpler form to read.

Nothing special is needed for running the redis-cliin interactive mode - just execute it without any arguments

$ redis-cli
127.0.0.1:6379> PING
PONG

The string 127.0.0.1:6379> is the prompt. It displays the connected Redis server instance's hostname and port.

The prompt updates as the connected server changes or when operating on a database different from the database number zero:

127.0.0.1:6379> SELECT 2
OK
127.0.0.1:6379[2]> DBSIZE
(integer) 1
127.0.0.1:6379[2]> SELECT 0
OK
127.0.0.1:6379> DBSIZE
(integer) 503

Handling connections and reconnections

Using the CONNECT command in interactive mode makes it possible to connect to a different instance, by specifying the hostname and port we want to connect to:

127.0.0.1:6379> CONNECT metal 6379
metal:6379> PING
PONG

As you can see the prompt changes accordingly when connecting to a different server instance. If a connection is attempted to an instance that is unreachable, the redis-cli goes into disconnected mode and attempts to reconnect with each new command:

127.0.0.1:6379> CONNECT 127.0.0.1 9999
Could not connect to Redis at 127.0.0.1:9999: Connection refused
not connected> PING
Could not connect to Redis at 127.0.0.1:9999: Connection refused
not connected> PING
Could not connect to Redis at 127.0.0.1:9999: Connection refused

Generally after a disconnection is detected, redis-cli always attempts to reconnect transparently; if the attempt fails, it shows the error and enters the disconnected state. The following is an example of disconnection and reconnection:

127.0.0.1:6379> INFO SERVER
Could not connect to Redis at 127.0.0.1:6379: Connection refused
not connected> PING
PONG
127.0.0.1:6379> 
(now we are connected again)

When a reconnection is performed, redis-cli automatically re-selects the last database number selected. However, all other states about the connection is lost, such as within a MULTI/EXEC transaction:

$ redis-cli
127.0.0.1:6379> MULTI
OK
127.0.0.1:6379> PING
QUEUED

( here the server is manually restarted )

127.0.0.1:6379> EXEC
(error) ERR EXEC without MULTI

This is usually not an issue when using the redis-cli in interactive mode for testing, but this limitation should be known.

Editing, history, completion and hints

Because redis-cli uses the linenoise line editing library, it always has line editing capabilities, without depending on libreadline or other optional libraries.

Command execution history can be accessed in order to avoid retyping commands by pressing the arrow keys (up and down). The history is preserved between restarts of the CLI, in a file named .rediscli_history inside the user home directory, as specified by the HOME environment variable. It is possible to use a different history filename by setting the REDISCLI_HISTFILE environment variable, and disable it by setting it to /dev/null.

The redis-cli is also able to perform command-name completion by pressing the TAB key, as in the following example:

127.0.0.1:6379> Z<TAB>
127.0.0.1:6379> ZADD<TAB>
127.0.0.1:6379> ZCARD<TAB>

Once Redis command name has been entered at the prompt, the redis-cli will display syntax hints. Like command history, this behavior can be turned on and off via the redis-cli preferences.

Preferences

There are two ways to customize redis-cli behavior. The file .redisclirc in the home directory is loaded by the CLI on startup. You can override the file's default location by setting the REDISCLI_RCFILE environment variable to an alternative path. Preferences can also be set during a CLI session, in which case they will last only the duration of the session.

To set preferences, use the special :set command. The following preferences can be set, either by typing the command in the CLI or adding it to the .redisclirc file:

  • :set hints - enables syntax hints
  • :set nohints - disables syntax hints

Running the same command N times

It is possible to run the same command multiple times in interactive mode by prefixing the command name by a number:

127.0.0.1:6379> 5 INCR mycounter
(integer) 1
(integer) 2
(integer) 3
(integer) 4
(integer) 5

Showing help about Redis commands

redis-cli provides online help for most Redis commands, using the HELP command. The command can be used in two forms:

  • HELP @<category> shows all the commands about a given category. The categories are:
    • @generic
    • @string
    • @list
    • @set
    • @sorted_set
    • @hash
    • @pubsub
    • @transactions
    • @connection
    • @server
    • @scripting
    • @hyperloglog
    • @cluster
    • @geo
    • @stream
  • HELP <commandname> shows specific help for the command given as argument.

For example in order to show help for the PFADD command, use:

127.0.0.1:6379> HELP PFADD

PFADD key element [element ...]
summary: Adds the specified elements to the specified HyperLogLog.
since: 2.8.9

Note that HELP supports TAB completion as well.

Clearing the terminal screen

Using the CLEAR command in interactive mode clears the terminal's screen.

Special modes of operation

So far we saw two main modes of redis-cli.

  • Command line execution of Redis commands.
  • Interactive "REPL" usage.

The CLI performs other auxiliary tasks related to Redis that are explained in the next sections:

  • Monitoring tool to show continuous stats about a Redis server.
  • Scanning a Redis database for very large keys.
  • Key space scanner with pattern matching.
  • Acting as a Pub/Sub client to subscribe to channels.
  • Monitoring the commands executed into a Redis instance.
  • Checking the latency of a Redis server in different ways.
  • Checking the scheduler latency of the local computer.
  • Transferring RDB backups from a remote Redis server locally.
  • Acting as a Redis replica for showing what a replica receives.
  • Simulating LRU workloads for showing stats about keys hits.
  • A client for the Lua debugger.

Continuous stats mode

Continuous stats mode is probably one of the lesser known yet very useful features of redis-cli to monitor Redis instances in real time. To enable this mode, the --stat option is used. The output is very clear about the behavior of the CLI in this mode:

$ redis-cli --stat
------- data ------ --------------------- load -------------------- - child -
keys       mem      clients blocked requests            connections
506        1015.00K 1       0       24 (+0)             7
506        1015.00K 1       0       25 (+1)             7
506        3.40M    51      0       60461 (+60436)      57
506        3.40M    51      0       146425 (+85964)     107
507        3.40M    51      0       233844 (+87419)     157
507        3.40M    51      0       321715 (+87871)     207
508        3.40M    51      0       408642 (+86927)     257
508        3.40M    51      0       497038 (+88396)     257

In this mode a new line is printed every second with useful information and differences of request values between old data points. Memory usage, client connection counts, and various other statistics about the connected Redis database can be easily understood with this auxiliary redis-cli tool.

The -i <interval> option in this case works as a modifier in order to change the frequency at which new lines are emitted. The default is one second.

Scanning for big keys

In this special mode, redis-cli works as a key space analyzer. It scans the dataset for big keys, but also provides information about the data types that the data set consists of. This mode is enabled with the --bigkeys option, and produces verbose output:

$ redis-cli --bigkeys

# Scanning the entire keyspace to find biggest keys as well as
# average sizes per key type.  You can use -i 0.01 to sleep 0.01 sec
# per SCAN command (not usually needed).

[00.00%] Biggest string found so far 'key-419' with 3 bytes
[05.14%] Biggest list   found so far 'mylist' with 100004 items
[35.77%] Biggest string found so far 'counter:__rand_int__' with 6 bytes
[73.91%] Biggest hash   found so far 'myobject' with 3 fields

-------- summary -------

Sampled 506 keys in the keyspace!
Total key length in bytes is 3452 (avg len 6.82)

Biggest string found 'counter:__rand_int__' has 6 bytes
Biggest   list found 'mylist' has 100004 items
Biggest   hash found 'myobject' has 3 fields

504 strings with 1403 bytes (99.60% of keys, avg size 2.78)
1 lists with 100004 items (00.20% of keys, avg size 100004.00)
0 sets with 0 members (00.00% of keys, avg size 0.00)
1 hashs with 3 fields (00.20% of keys, avg size 3.00)
0 zsets with 0 members (00.00% of keys, avg size 0.00)

In the first part of the output, each new key larger than the previous larger key (of the same type) encountered is reported. The summary section provides general stats about the data inside the Redis instance.

The program uses the SCAN command, so it can be executed against a busy server without impacting the operations, however the -i option can be used in order to throttle the scanning process of the specified fraction of second for each SCAN command.

For example, -i 0.01 will slow down the program execution considerably, but will also reduce the load on the server to a negligible amount.

Note that the summary also reports in a cleaner form the biggest keys found for each time. The initial output is just to provide some interesting info ASAP if running against a very large data set.

Getting a list of keys

It is also possible to scan the key space, again in a way that does not block the Redis server (which does happen when you use a command like KEYS *), and print all the key names, or filter them for specific patterns. This mode, like the --bigkeys option, uses the SCAN command, so keys may be reported multiple times if the dataset is changing, but no key would ever be missing, if that key was present since the start of the iteration. Because of the command that it uses this option is called --scan.

$ redis-cli --scan | head -10
key-419
key-71
key-236
key-50
key-38
key-458
key-453
key-499
key-446
key-371

Note that head -10 is used in order to print only the first lines of the output.

Scanning is able to use the underlying pattern matching capability of the SCAN command with the --pattern option.

$ redis-cli --scan --pattern '*-11*'
key-114
key-117
key-118
key-113
key-115
key-112
key-119
key-11
key-111
key-110
key-116

Piping the output through the wc command can be used to count specific kind of objects, by key name:

$ redis-cli --scan --pattern 'user:*' | wc -l
3829433

You can use -i 0.01 to add a delay between calls to the SCAN command. This will make the command slower but will significantly reduce load on the server.

Pub/sub mode

The CLI is able to publish messages in Redis Pub/Sub channels using the PUBLISH command. Subscribing to channels in order to receive messages is different - the terminal is blocked and waits for messages, so this is implemented as a special mode in redis-cli. Unlike other special modes this mode is not enabled by using a special option, but simply by using the SUBSCRIBE or PSUBSCRIBE command, which are available in interactive or command mode:

$ redis-cli PSUBSCRIBE '*'
Reading messages... (press Ctrl-C to quit)
1) "PSUBSCRIBE"
2) "*"
3) (integer) 1

The reading messages message shows that we entered Pub/Sub mode. When another client publishes some message in some channel, such as with the command redis-cli PUBLISH mychannel mymessage, the CLI in Pub/Sub mode will show something such as:

1) "pmessage"
2) "*"
3) "mychannel"
4) "mymessage"

This is very useful for debugging Pub/Sub issues. To exit the Pub/Sub mode just process CTRL-C.

Monitoring commands executed in Redis

Similarly to the Pub/Sub mode, the monitoring mode is entered automatically once you use the MONITOR commnad. All commands received by the active Redis instance will be printed to the standard output:

$ redis-cli MONITOR
OK
1460100081.165665 [0 127.0.0.1:51706] "set" "shipment:8000736522714:status" "sorting"
1460100083.053365 [0 127.0.0.1:51707] "get" "shipment:8000736522714:status"

Note that it is possible to use to pipe the output, so you can monitor for specific patterns using tools such as grep.

Monitoring the latency of Redis instances

Redis is often used in contexts where latency is very critical. Latency involves multiple moving parts within the application, from the client library to the network stack, to the Redis instance itself.

The redis-cli has multiple facilities for studying the latency of a Redis instance and understanding the latency's maximum, average and distribution.

The basic latency-checking tool is the --latency option. Using this option the CLI runs a loop where the PING command is sent to the Redis instance and the time to receive a reply is measured. This happens 100 times per second, and stats are updated in a real time in the console:

$ redis-cli --latency
min: 0, max: 1, avg: 0.19 (427 samples)

The stats are provided in milliseconds. Usually, the average latency of a very fast instance tends to be overestimated a bit because of the latency due to the kernel scheduler of the system running redis-cli itself, so the average latency of 0.19 above may easily be 0.01 or less. However this is usually not a big problem, since most developers are interested in events of a few milliseconds or more.

Sometimes it is useful to study how the maximum and average latencies evolve during time. The --latency-history option is used for that purpose: it works exactly like --latency, but every 15 seconds (by default) a new sampling session is started from scratch:

$ redis-cli --latency-history
min: 0, max: 1, avg: 0.14 (1314 samples) -- 15.01 seconds range
min: 0, max: 1, avg: 0.18 (1299 samples) -- 15.00 seconds range
min: 0, max: 1, avg: 0.20 (113 samples)^C

Sampling sessions' length can be changed with the -i <interval> option.

The most advanced latency study tool, but also the most complex to interpret for non-experienced users, is the ability to use color terminals to show a spectrum of latencies. You'll see a colored output that indicates the different percentages of samples, and different ASCII characters that indicate different latency figures. This mode is enabled using the --latency-dist option:

$ redis-cli --latency-dist
(output not displayed, requires a color terminal, try it!)

There is another pretty unusual latency tool implemented inside redis-cli. It does not check the latency of a Redis instance, but the latency of the computer running redis-cli. This latency is intrinsic to the kernel scheduler, the hypervisor in case of virtualized instances, and so forth.

Redis calls it intrinsic latency because it's mostly opaque to the programmer. If the Redis instance has high latency regardless of all the obvious things that may be the source cause, it's worth to check what's the best your system can do by running redis-cli in this special mode directly in the system you are running Redis servers on.

By measuring the intrinsic latency, you know that this is the baseline, and Redis cannot outdo your system. In order to run the CLI in this mode, use the --intrinsic-latency <test-time>. Note that the test time is in seconds and dictates how long the test should run.

$ ./redis-cli --intrinsic-latency 5
Max latency so far: 1 microseconds.
Max latency so far: 7 microseconds.
Max latency so far: 9 microseconds.
Max latency so far: 11 microseconds.
Max latency so far: 13 microseconds.
Max latency so far: 15 microseconds.
Max latency so far: 34 microseconds.
Max latency so far: 82 microseconds.
Max latency so far: 586 microseconds.
Max latency so far: 739 microseconds.

65433042 total runs (avg latency: 0.0764 microseconds / 764.14 nanoseconds per run).
Worst run took 9671x longer than the average latency.

IMPORTANT: this command must be executed on the computer that runs the Redis server instance, not on a different host. It does not connect to a Redis instance and performs the test locally.

In the above case, the system cannot do better than 739 microseconds of worst case latency, so one can expect certain queries to occasionally run less than 1 millisecond.

Remote backups of RDB files

During a Redis replication's first synchronization, the primary and the replica exchange the whole data set in the form of an RDB file. This feature is exploited by redis-cli in order to provide a remote backup facility that allows a transfer of an RDB file from any Redis instance to the local computer running redis-cli. To use this mode, call the CLI with the --rdb <dest-filename> option:

$ redis-cli --rdb /tmp/dump.rdb
SYNC sent to master, writing 13256 bytes to '/tmp/dump.rdb'
Transfer finished with success.

This is a simple but effective way to ensure disaster recovery RDB backups exist of your Redis instance. When using this options in scripts or cron jobs, make sure to check the return value of the command. If it is non zero, an error occurred as in the following example:

$ redis-cli --rdb /tmp/dump.rdb
SYNC with master failed: -ERR Can't SYNC while not connected with my master
$ echo $?
1

Replica mode

The replica mode of the CLI is an advanced feature useful for Redis developers and for debugging operations. It allows for the inspection of the content a primary sends to its replicas in the replication stream in order to propagate the writes to its replicas. The option name is simply --replica. The following is a working example:

$ redis-cli --replica
SYNC with master, discarding 13256 bytes of bulk transfer...
SYNC done. Logging commands from master.
"PING"
"SELECT","0"
"SET","last_name","Enigk"
"PING"
"INCR","mycounter"

The command begins by discarding the RDB file of the first synchronization and then logs each command received in CSV format.

If you think some of the commands are not replicated correctly in your replicas this is a good way to check what's happening, and also useful information in order to improve the bug report.

Performing an LRU simulation

Redis is often used as a cache with LRU eviction. Depending on the number of keys and the amount of memory allocated for the cache (specified via the maxmemory directive), the amount of cache hits and misses will change. Sometimes, simulating the rate of hits is very useful to correctly provision your cache.

The redis-cli has a special mode where it performs a simulation of GET and SET operations, using an 80-20% power law distribution in the requests pattern. This means that 20% of keys will be requested 80% of times, which is a common distribution in caching scenarios.

Theoretically, given the distribution of the requests and the Redis memory overhead, it should be possible to compute the hit rate analytically with a mathematical formula. However, Redis can be configured with different LRU settings (number of samples) and LRU's implementation, which is approximated in Redis, changes a lot between different versions. Similarly the amount of memory per key may change between versions. That is why this tool was built: its main motivation was for testing the quality of Redis' LRU implementation, but now is also useful for testing how a given version behaves with the settings originally intended for deployment.

To use this mode, specify the amount of keys in the test and configure a sensible maxmemory setting as a first attempt.

IMPORTANT NOTE: Configuring the maxmemory setting in the Redis configuration is crucial: if there is no cap to the maximum memory usage, the hit will eventually be 100% since all the keys can be stored in memory. If too many keys are specified with maximum memory, eventually all of the computer RAM will be used. It is also needed to configure an appropriate maxmemory policy; most of the time allkeys-lru is selected.

In the following example there is a configured a memory limit of 100MB and an LRU simulation using 10 million keys.

WARNING: the test uses pipelining and will stress the server, don't use it with production instances.

$ ./redis-cli --lru-test 10000000
156000 Gets/sec | Hits: 4552 (2.92%) | Misses: 151448 (97.08%)
153750 Gets/sec | Hits: 12906 (8.39%) | Misses: 140844 (91.61%)
159250 Gets/sec | Hits: 21811 (13.70%) | Misses: 137439 (86.30%)
151000 Gets/sec | Hits: 27615 (18.29%) | Misses: 123385 (81.71%)
145000 Gets/sec | Hits: 32791 (22.61%) | Misses: 112209 (77.39%)
157750 Gets/sec | Hits: 42178 (26.74%) | Misses: 115572 (73.26%)
154500 Gets/sec | Hits: 47418 (30.69%) | Misses: 107082 (69.31%)
151250 Gets/sec | Hits: 51636 (34.14%) | Misses: 99614 (65.86%)

The program shows stats every second. In the first seconds the cache starts to be populated. The misses rate later stabilizes into the actual figure that can be expected:

120750 Gets/sec | Hits: 48774 (40.39%) | Misses: 71976 (59.61%)
122500 Gets/sec | Hits: 49052 (40.04%) | Misses: 73448 (59.96%)
127000 Gets/sec | Hits: 50870 (40.06%) | Misses: 76130 (59.94%)
124250 Gets/sec | Hits: 50147 (40.36%) | Misses: 74103 (59.64%)

A miss rate of 59% may not be acceptable for certain use cases therefor 100MB of memory is not enough. Observe an example using a half gigabyte of memory. After several minutes the output stabilizes to the following figures:

140000 Gets/sec | Hits: 135376 (96.70%) | Misses: 4624 (3.30%)
141250 Gets/sec | Hits: 136523 (96.65%) | Misses: 4727 (3.35%)
140250 Gets/sec | Hits: 135457 (96.58%) | Misses: 4793 (3.42%)
140500 Gets/sec | Hits: 135947 (96.76%) | Misses: 4553 (3.24%)

With 500MB there is sufficient space for the key quantity (10 million) and distribution (80-20 style).

3 - Client-side caching in Redis

Server-assisted, client-side caching in Redis

Client-side caching is a technique used to create high performance services. It exploits the memory available on application servers, servers that are usually distinct computers compared to the database nodes, to store some subset of the database information directly in the application side.

Normally when data is required, the application servers ask the database about such information, like in the following diagram:

+-------------+                                +----------+
|             | ------- GET user:1234 -------> |          |
| Application |                                | Database |
|             | <---- username = Alice ------- |          |
+-------------+                                +----------+

When client-side caching is used, the application will store the reply of popular queries directly inside the application memory, so that it can reuse such replies later, without contacting the database again:

+-------------+                                +----------+
|             |                                |          |
| Application |       ( No chat needed )       | Database |
|             |                                |          |
+-------------+                                +----------+
| Local cache |
|             |
| user:1234 = |
| username    |
| Alice       |
+-------------+

While the application memory used for the local cache may not be very big, the time needed in order to access the local computer memory is orders of magnitude smaller compared to accessing a networked service like a database. Since often the same small percentage of data are accessed frequently, this pattern can greatly reduce the latency for the application to get data and, at the same time, the load in the database side.

Moreover there are many datasets where items change very infrequently. For instance, most user posts in a social network are either immutable or rarely edited by the user. Adding to this the fact that usually a small percentage of the posts are very popular, either because a small set of users have a lot of followers and/or because recent posts have a lot more visibility, it is clear why such a pattern can be very useful.

Usually the two key advantages of client-side caching are:

  1. Data is available with a very small latency.
  2. The database system receives less queries, allowing it to serve the same dataset with a smaller number of nodes.

There are two hard problems in computer science...

A problem with the above pattern is how to invalidate the information that the application is holding, in order to avoid presenting stale data to the user. For example after the application above locally cached the information for user:1234, Alice may update her username to Flora. Yet the application may continue to serve the old username for user:1234.

Sometimes, depending on the exact application we are modeling, this isn't a big deal, so the client will just use a fixed maximum "time to live" for the cached information. Once a given amount of time has elapsed, the information will no longer be considered valid. More complex patterns, when using Redis, leverage the Pub/Sub system in order to send invalidation messages to listening clients. This can be made to work but is tricky and costly from the point of view of the bandwidth used, because often such patterns involve sending the invalidation messages to every client in the application, even if certain clients may not have any copy of the invalidated data. Moreover every application query altering the data requires to use the PUBLISH command, costing the database more CPU time to process this command.

Regardless of what schema is used, there is a simple fact: many very large applications implement some form of client-side caching, because it is the next logical step to having a fast store or a fast cache server. For this reason Redis 6 implements direct support for client-side caching, in order to make this pattern much simpler to implement, more accessible, reliable, and efficient.

The Redis implementation of client-side caching

The Redis client-side caching support is called Tracking, and has two modes:

  • In the default mode, the server remembers what keys a given client accessed, and sends invalidation messages when the same keys are modified. This costs memory in the server side, but sends invalidation messages only for the set of keys that the client might have in memory.
  • In the broadcasting mode, the server does not attempt to remember what keys a given client accessed, so this mode costs no memory at all in the server side. Instead clients subscribe to key prefixes such as object: or user:, and receive a notification message every time a key matching a subscribed prefix is touched.

To recap, for now let's forget for a moment about the broadcasting mode, to focus on the first mode. We'll describe broadcasting later more in details.

  1. Clients can enable tracking if they want. Connections start without tracking enabled.
  2. When tracking is enabled, the server remembers what keys each client requested during the connection lifetime (by sending read commands about such keys).
  3. When a key is modified by some client, or is evicted because it has an associated expire time, or evicted because of a maxmemory policy, all the clients with tracking enabled that may have the key cached, are notified with an invalidation message.
  4. When clients receive invalidation messages, they are required to remove the corresponding keys, in order to avoid serving stale data.

This is an example of the protocol:

  • Client 1 -> Server: CLIENT TRACKING ON
  • Client 1 -> Server: GET foo
  • (The server remembers that Client 1 may have the key "foo" cached)
  • (Client 1 may remember the value of "foo" inside its local memory)
  • Client 2 -> Server: SET foo SomeOtherValue
  • Server -> Client 1: INVALIDATE "foo"

This looks great superficially, but if you imagine 10k connected clients all asking for millions of keys over long living connection, the server ends up storing too much information. For this reason Redis uses two key ideas in order to limit the amount of memory used server-side and the CPU cost of handling the data structures implementing the feature:

  • The server remembers the list of clients that may have cached a given key in a single global table. This table is called the Invalidation Table. The invalidation table can contain a maximum number of entries. If a new key is inserted, the server may evict an older entry by pretending that such key was modified (even if it was not), and sending an invalidation message to the clients. Doing so, it can reclaim the memory used for this key, even if this will force the clients having a local copy of the key to evict it.
  • Inside the invalidation table we don't really need to store pointers to clients' structures, that would force a garbage collection procedure when the client disconnects: instead what we do is just store client IDs (each Redis client has an unique numerical ID). If a client disconnects, the information will be incrementally garbage collected as caching slots are invalidated.
  • There is a single keys namespace, not divided by database numbers. So if a client is caching the key foo in database 2, and some other client changes the value of the key foo in database 3, an invalidation message will still be sent. This way we can ignore database numbers reducing both the memory usage and the implementation complexity.

Two connections mode

Using the new version of the Redis protocol, RESP3, supported by Redis 6, it is possible to run the data queries and receive the invalidation messages in the same connection. However many client implementations may prefer to implement client-side caching using two separated connections: one for data, and one for invalidation messages. For this reason when a client enables tracking, it can specify to redirect the invalidation messages to another connection by specifying the "client ID" of a different connection. Many data connections can redirect invalidation messages to the same connection, this is useful for clients implementing connection pooling. The two connections model is the only one that is also supported for RESP2 (which lacks the ability to multiplex different kind of information in the same connection).

Here's an example of a complete session using the Redis protocol in the old RESP2 mode involving the following steps: enabling tracking redirecting to another connection, asking for a key, and getting an invalidation message once the key gets modified.

To start, the client opens a first connection that will be used for invalidations, requests the connection ID, and subscribes via Pub/Sub to the special channel that is used to get invalidation messages when in RESP2 modes (remember that RESP2 is the usual Redis protocol, and not the more advanced protocol that you can use, optionally, with Redis 6 using the HELLO command):

(Connection 1 -- used for invalidations)
CLIENT ID
:4
SUBSCRIBE __redis__:invalidate
*3
$9
subscribe
$20
__redis__:invalidate
:1

Now we can enable tracking from the data connection:

(Connection 2 -- data connection)
CLIENT TRACKING on REDIRECT 4
+OK

GET foo
$3
bar

The client may decide to cache "foo" => "bar" in the local memory.

A different client will now modify the value of the "foo" key:

(Some other unrelated connection)
SET foo bar
+OK

As a result, the invalidations connection will receive a message that invalidates the specified key.

(Connection 1 -- used for invalidations)
*3
$7
message
$20
__redis__:invalidate
*1
$3
foo

The client will check if there are cached keys in this caching slot, and will evict the information that is no longer valid.

Note that the third element of the Pub/Sub message is not a single key but is a Redis array with just a single element. Since we send an array, if there are groups of keys to invalidate, we can do that in a single message. In case of a flush (FLUSHALL or FLUSHDB), a null message will be sent.

A very important thing to understand about client-side caching used with RESP2 and a Pub/Sub connection in order to read the invalidation messages, is that using Pub/Sub is entirely a trick in order to reuse old client implementations, but actually the message is not really sent to a channel and received by all the clients subscribed to it. Only the connection we specified in the REDIRECT argument of the CLIENT command will actually receive the Pub/Sub message, making the feature a lot more scalable.

When RESP3 is used instead, invalidation messages are sent (either in the same connection, or in the secondary connection when redirection is used) as push messages (read the RESP3 specification for more information).

What tracking tracks

As you can see clients do not need, by default, to tell the server what keys they are caching. Every key that is mentioned in the context of a read-only command is tracked by the server, because it could be cached.

This has the obvious advantage of not requiring the client to tell the server what it is caching. Moreover in many clients implementations, this is what you want, because a good solution could be to just cache everything that is not already cached, using a first-in first-out approach: we may want to cache a fixed number of objects, every new data we retrieve, we could cache it, discarding the oldest cached object. More advanced implementations may instead drop the least used object or alike.

Note that anyway if there is write traffic on the server, caching slots will get invalidated during the course of the time. In general when the server assumes that what we get we also cache, we are making a tradeoff:

  1. It is more efficient when the client tends to cache many things with a policy that welcomes new objects.
  2. The server will be forced to retain more data about the client keys.
  3. The client will receive useless invalidation messages about objects it did not cache.

So there is an alternative described in the next section.

Opt-in caching

Clients implementations may want to cache only selected keys, and communicate explicitly to the server what they'll cache and what they will not. This will require more bandwidth when caching new objects, but at the same time reduces the amount of data that the server has to remember and the amount of invalidation messages received by the client.

In order to do this, tracking must be enabled using the OPTIN option:

CLIENT TRACKING on REDIRECT 1234 OPTIN

In this mode, by default, keys mentioned in read queries are not supposed to be cached, instead when a client wants to cache something, it must send a special command immediately before the actual command to retrieve the data:

CLIENT CACHING YES
+OK
GET foo
"bar"

The CACHING command affects the command executed immediately after it, however in case the next command is MULTI, all the commands in the transaction will be tracked. Similarly in case of Lua scripts, all the commands executed by the script will be tracked.

Broadcasting mode

So far we described the first client-side caching model that Redis implements. There is another one, called broadcasting, that sees the problem from the point of view of a different tradeoff, does not consume any memory on the server side, but instead sends more invalidation messages to clients. In this mode we have the following main behaviors:

  • Clients enable client-side caching using the BCAST option, specifying one or more prefixes using the PREFIX option. For instance: CLIENT TRACKING on REDIRECT 10 BCAST PREFIX object: PREFIX user:. If no prefix is specified at all, the prefix is assumed to be the empty string, so the client will receive invalidation messages for every key that gets modified. Instead if one or more prefixes are used, only keys matching one of the specified prefixes will be sent in the invalidation messages.
  • The server does not store anything in the invalidation table. Instead it uses a different Prefixes Table, where each prefix is associated to a list of clients.
  • No two prefixes can track overlapping parts of the keyspace. For instance, having the prefix "foo" and "foob" would not be allowed, since they would both trigger an invalidation for the key "foobar". However, just using the prefix "foo" is sufficient.
  • Every time a key matching any of the prefixes is modified, all the clients subscribed to that prefix, will receive the invalidation message.
  • The server will consume CPU proportional to the number of registered prefixes. If you have just a few, it is hard to see any difference. With a big number of prefixes the CPU cost can become quite large.
  • In this mode the server can perform the optimization of creating a single reply for all the clients subscribed to a given prefix, and send the same reply to all. This helps to lower the CPU usage.

The NOLOOP option

By default client-side tracking will send invalidation messages to the client that modified the key. Sometimes clients want this, since they implement very basic logic that does not involve automatically caching writes locally. However, more advanced clients may want to cache even the writes they are doing in the local in-memory table. In such case receiving an invalidation message immediately after the write is a problem, since it will force the client to evict the value it just cached.

In this case it is possible to use the NOLOOP option: it works both in normal and broadcasting mode. Using this option, clients are able to tell the server they don't want to receive invalidation messages for keys that they modified.

Avoiding race conditions

When implementing client-side caching redirecting the invalidation messages to a different connection, you should be aware that there is a possible race condition. See the following example interaction, where we'll call the data connection "D" and the invalidation connection "I":

[D] client -> server: GET foo
[I] server -> client: Invalidate foo (somebody else touched it)
[D] server -> client: "bar" (the reply of "GET foo")

As you can see, because the reply to the GET was slower to reach the client, we received the invalidation message before the actual data that is already no longer valid. So we'll keep serving a stale version of the foo key. To avoid this problem, it is a good idea to populate the cache when we send the command with a placeholder:

Client cache: set the local copy of "foo" to "caching-in-progress"
[D] client-> server: GET foo.
[I] server -> client: Invalidate foo (somebody else touched it)
Client cache: delete "foo" from the local cache.
[D] server -> client: "bar" (the reply of "GET foo")
Client cache: don't set "bar" since the entry for "foo" is missing.

Such a race condition is not possible when using a single connection for both data and invalidation messages, since the order of the messages is always known in that case.

What to do when losing connection with the server

Similarly, if we lost the connection with the socket we use in order to get the invalidation messages, we may end with stale data. In order to avoid this problem, we need to do the following things:

  1. Make sure that if the connection is lost, the local cache is flushed.
  2. Both when using RESP2 with Pub/Sub, or RESP3, ping the invalidation channel periodically (you can send PING commands even when the connection is in Pub/Sub mode!). If the connection looks broken and we are not able to receive ping backs, after a maximum amount of time, close the connection and flush the cache.

What to cache

Clients may want to run internal statistics about the number of times a given cached key was actually served in a request, to understand in the future what is good to cache. In general:

  • We don't want to cache many keys that change continuously.
  • We don't want to cache many keys that are requested very rarely.
  • We want to cache keys that are requested often and change at a reasonable rate. For an example of key not changing at a reasonable rate, think of a global counter that is continuously INCRemented.

However simpler clients may just evict data using some random sampling just remembering the last time a given cached value was served, trying to evict keys that were not served recently.

Other hints for implementing client libraries

  • Handling TTLs: make sure you also request the key TTL and set the TTL in the local cache if you want to support caching keys with a TTL.
  • Putting a max TTL on every key is a good idea, even if it has no TTL. This protects against bugs or connection issues that would make the client have old data in the local copy.
  • Limiting the amount of memory used by clients is absolutely needed. There must be a way to evict old keys when new ones are added.

Limiting the amount of memory used by Redis

Be sure to configure a suitable value for the maximum number of keys remembered by Redis or alternatively use the BCAST mode that consumes no memory at all on the Redis side. Note that the memory consumed by Redis when BCAST is not used, is proportional both to the number of keys tracked and the number of clients requesting such keys.

4 - Redis configuration

Overview of redis.conf, the Redis configuration file

Redis is able to start without a configuration file using a built-in default configuration, however this setup is only recommended for testing and development purposes.

The proper way to configure Redis is by providing a Redis configuration file, usually called redis.conf.

The redis.conf file contains a number of directives that have a very simple format:

keyword argument1 argument2 ... argumentN

This is an example of a configuration directive:

replicaof 127.0.0.1 6380

It is possible to provide strings containing spaces as arguments using (double or single) quotes, as in the following example:

requirepass "hello world"

Single-quoted string can contain characters escaped by backslashes, and double-quoted strings can additionally include any ASCII symbols encoded using backslashed hexadecimal notation "\xff".

The list of configuration directives, and their meaning and intended usage is available in the self documented example redis.conf shipped into the Redis distribution.

Passing arguments via the command line

You can also pass Redis configuration parameters using the command line directly. This is very useful for testing purposes. The following is an example that starts a new Redis instance using port 6380 as a replica of the instance running at 127.0.0.1 port 6379.

./redis-server --port 6380 --replicaof 127.0.0.1 6379

The format of the arguments passed via the command line is exactly the same as the one used in the redis.conf file, with the exception that the keyword is prefixed with --.

Note that internally this generates an in-memory temporary config file (possibly concatenating the config file passed by the user if any) where arguments are translated into the format of redis.conf.

Changing Redis configuration while the server is running

It is possible to reconfigure Redis on the fly without stopping and restarting the service, or querying the current configuration programmatically using the special commands [CONFIG SET](/commands/config-set) and [CONFIG GET](/commands/config-get)

Not all of the configuration directives are supported in this way, but most are supported as expected. Please refer to the [CONFIG SET](/commands/config-set) and [CONFIG GET](/commands/config-get) pages for more information.

Note that modifying the configuration on the fly has no effects on the redis.conf file so at the next restart of Redis the old configuration will be used instead.

Make sure to also modify the redis.conf file accordingly to the configuration you set using [CONFIG SET](/commands/config-set). You can do it manually or you can use [CONFIG REWRITE](/commands/config-rewrite), which will automatically scan your redis.conf file and update the fields which don't match the current configuration value. Fields non existing but set to the default value are not added. Comments inside your configuration file are retained.

Configuring Redis as a cache

If you plan to use Redis as a cache where every key will have an expire set, you may consider using the following configuration instead (assuming a max memory limit of 2 megabytes as an example):

maxmemory 2mb
maxmemory-policy allkeys-lru

In this configuration there is no need for the application to set a time to live for keys using the EXPIRE command (or equivalent) since all the keys will be evicted using an approximated LRU algorithm as long as we hit the 2 megabyte memory limit.

Basically in this configuration Redis acts in a similar way to memcached. We have more extensive documentation about using Redis as an LRU cache here.

5 - Redis data types

Overview of the many data types supported by Redis

Strings

Strings are the most basic kind of Redis value. Redis Strings are binary safe, this means that a Redis string can contain any kind of data, for instance a JPEG image or a serialized Ruby object.

A String value can be at max 512 Megabytes in length.

You can do a number of interesting things using strings in Redis, for instance you can:

  • Use Strings as atomic counters using commands in the INCR family: INCR, DECR, INCRBY.
  • Append to strings with the APPEND command.
  • Use Strings as a random access vectors with GETRANGE and SETRANGE.
  • Encode a lot of data in little space, or create a Redis backed Bloom Filter using GETBIT and SETBIT.

Check all the available string commands for more information, or read the introduction to Redis data types.

Lists

Redis Lists are simply lists of strings, sorted by insertion order. It is possible to add elements to a Redis List pushing new elements on the head (on the left) or on the tail (on the right) of the list.

The LPUSH command inserts a new element on the head, while RPUSH inserts a new element on the tail. A new list is created when one of this operations is performed against an empty key. Similarly the key is removed from the key space if a list operation will empty the list. These are very handy semantics since all the list commands will behave exactly like they were called with an empty list if called with a non-existing key as argument.

Some example of list operations and resulting lists:

LPUSH mylist a   # now the list is "a"
LPUSH mylist b   # now the list is "b","a"
RPUSH mylist c   # now the list is "b","a","c" (RPUSH was used this time)

The max length of a list is 2^32 - 1 elements (4294967295, more than 4 billion of elements per list).

The main features of Redis Lists from the point of view of time complexity are the support for constant time insertion and deletion of elements near the head and tail, even with many millions of inserted items. Accessing elements is very fast near the extremes of the list but is slow if you try accessing the middle of a very big list, as it is an O(N) operation.

You can do many interesting things with Redis Lists, for instance you can:

  • Model a timeline in a social network, using LPUSH in order to add new elements in the user time line, and using LRANGE in order to retrieve a few of recently inserted items.
  • You can use LPUSH together with LTRIM to create a list that never exceeds a given number of elements, but just remembers the latest N elements.
  • Lists can be used as a message passing primitive, See for instance the well known Resque Ruby library for creating background jobs.
  • You can do a lot more with lists, this data type supports a number of commands, including blocking commands like BLPOP.

Please check all the available commands operating on lists for more information, or read the introduction to Redis data types.

Sets


Redis Sets are an unordered collection of Strings. It is possible to add, remove, and test for existence of members in O(1) (constant time regardless of the number of elements contained inside the Set).

Redis Sets have the desirable property of not allowing repeated members. Adding the same element multiple times will result in a set having a single copy of this element. Practically speaking this means that adding a member does not require a check if exists then add operation.

A very interesting thing about Redis Sets is that they support a number of server side commands to compute sets starting from existing sets, so you can do unions, intersections, differences of sets in very short time.

The max number of members in a set is 2^32 - 1 (4294967295, more than 4 billion of members per set).

You can do many interesting things using Redis Sets, for instance you can:

  • You can track unique things using Redis Sets. Want to know all the unique IP addresses visiting a given blog post? Simply use SADD every time you process a page view. You are sure repeated IPs will not be inserted.
  • Redis Sets are good to represent relations. You can create a tagging system with Redis using a Set to represent every tag. Then you can add all the IDs of all the objects having a given tag into a Set representing this particular tag, using the SADD command. Do you want all the IDs of all the Objects having three different tags at the same time? Just use SINTER.
  • You can use Sets to extract elements at random using the SPOP or SRANDMEMBER commands.

As usual, check the full list of Set commands for more information, or read the introduction to Redis data types.

Hashes

Redis Hashes are maps between string fields and string values, so they are the perfect data type to represent objects (e.g. A User with a number of fields like name, surname, age, and so forth):

HMSET user:1000 username antirez password P1pp0 age 34
HGETALL user:1000
HSET user:1000 password 12345
HGETALL user:1000

A hash with a few fields (where few means up to one hundred or so) is stored in a way that takes very little space, so you can store millions of objects in a small Redis instance.

While Hashes are used mainly to represent objects, they are capable of storing many elements, so you can use Hashes for many other tasks as well.

Every hash can store up to 2^32 - 1 field-value pairs (more than 4 billion).

Check the full list of Hash commands for more information, or read the introduction to Redis data types.

Sorted Sets

Redis Sorted Sets are, similarly to Redis Sets, non repeating collections of Strings. The difference is that every member of a Sorted Set is associated with a score, that is used keep the Sorted Set in order, from the smallest to the greatest score. While members are unique, scores may be repeated.

With Sorted Sets you can add, remove, or update elements in a very fast way (in a time proportional to the logarithm of the number of elements). Since elements are stored in order and not ordered afterwards, you can also get ranges by score or by rank (position) in a very fast way. Accessing the middle of a Sorted Set is also very fast, so you can use Sorted Sets as a smart list of non repeating elements where you can quickly access everything you need: elements in order, fast existence test, fast access to elements in the middle!

In short with Sorted Sets you can do a lot of tasks with great performance that are really hard to model in other kind of databases.

With Sorted Sets you can:

  • Build a leaderboard in a massive online game, where every time a new score is submitted you update it using ZADD. You can easily retrieve the top users using ZRANGE, you can also, given a user name, return its rank in the listing using ZRANK. Using ZRANK and ZRANGE together you can show users with a score similar to a given user. All very quickly.
  • Sorted Sets are often used in order to index data that is stored inside Redis. For instance if you have many hashes representing users, you can use a Sorted Set with members having the age of the user as the score and the ID of the user as the value. So using ZRANGEBYSCORE it will be trivial and fast to retrieve all the users with a given age range.

Sorted Sets are one of the more advanced Redis data types, so take some time to check the full list of Sorted Set commands to discover what you can do with Redis! Also you may want to read the Introduction to Redis Data Types.

Bitmaps and HyperLogLogs

Redis also supports Bitmaps and HyperLogLogs which are actually data types based on the String base type, but having their own semantics.

Please refer to the data types tutorial for information about those types.

Streams

A Redis stream is a data structure that acts like an append-only log. Streams are useful for recording events in the order they occur. See the Redis streams docs for details and usage.

Geospatial indexes

Redis provides geospatial indexes, which are useful for finding locations within a given geographic radius. You can add locations to a geospatial index using the GEOADD command. You then search for locations within a given radius using the GEORADIUS command.

See the complete geospatial index command reference for all of the details.

5.1 - Data types tutorial

Learning the basic Redis data types and how to use them

Redis is not a plain key-value store, it is actually a data structures server, supporting different kinds of values. What this means is that, while in traditional key-value stores you associate string keys to string values, in Redis the value is not limited to a simple string, but can also hold more complex data structures. The following is the list of all the data structures supported by Redis, which will be covered separately in this tutorial:

  • Binary-safe strings.
  • Lists: collections of string elements sorted according to the order of insertion. They are basically linked lists.
  • Sets: collections of unique, unsorted string elements.
  • Sorted sets, similar to Sets but where every string element is associated to a floating number value, called score. The elements are always taken sorted by their score, so unlike Sets it is possible to retrieve a range of elements (for example you may ask: give me the top 10, or the bottom 10).
  • Hashes, which are maps composed of fields associated with values. Both the field and the value are strings. This is very similar to Ruby or Python hashes.
  • Bit arrays (or simply bitmaps): it is possible, using special commands, to handle String values like an array of bits: you can set and clear individual bits, count all the bits set to 1, find the first set or unset bit, and so forth.
  • HyperLogLogs: this is a probabilistic data structure which is used in order to estimate the cardinality of a set. Don't be scared, it is simpler than it seems... See later in the HyperLogLog section of this tutorial.
  • Streams: append-only collections of map-like entries that provide an abstract log data type. They are covered in depth in the Introduction to Redis Streams.

It's not always trivial to grasp how these data types work and what to use in order to solve a given problem from the command reference, so this document is a crash course in Redis data types and their most common patterns.

For all the examples we'll use the redis-cli utility, a simple but handy command-line utility, to issue commands against the Redis server.

Keys

Redis keys are binary safe, this means that you can use any binary sequence as a key, from a string like "foo" to the content of a JPEG file. The empty string is also a valid key.

A few other rules about keys:

  • Very long keys are not a good idea. For instance a key of 1024 bytes is a bad idea not only memory-wise, but also because the lookup of the key in the dataset may require several costly key-comparisons. Even when the task at hand is to match the existence of a large value, hashing it (for example with SHA1) is a better idea, especially from the perspective of memory and bandwidth.
  • Very short keys are often not a good idea. There is little point in writing "u1000flw" as a key if you can instead write "user:1000:followers". The latter is more readable and the added space is minor compared to the space used by the key object itself and the value object. While short keys will obviously consume a bit less memory, your job is to find the right balance.
  • Try to stick with a schema. For instance "object-type:id" is a good idea, as in "user:1000". Dots or dashes are often used for multi-word fields, as in "comment:1234:reply.to" or "comment:1234:reply-to".
  • The maximum allowed key size is 512 MB.

Strings

The Redis String type is the simplest type of value you can associate with a Redis key. It is the only data type in Memcached, so it is also very natural for newcomers to use it in Redis.

Since Redis keys are strings, when we use the string type as a value too, we are mapping a string to another string. The string data type is useful for a number of use cases, like caching HTML fragments or pages.

Let's play a bit with the string type, using redis-cli (all the examples will be performed via redis-cli in this tutorial).

> set mykey somevalue
OK
> get mykey
"somevalue"

As you can see using the SET and the GET commands are the way we set and retrieve a string value. Note that SET will replace any existing value already stored into the key, in the case that the key already exists, even if the key is associated with a non-string value. So SET performs an assignment.

Values can be strings (including binary data) of every kind, for instance you can store a jpeg image inside a value. A value can't be bigger than 512 MB.

The SET command has interesting options, that are provided as additional arguments. For example, I may ask SET to fail if the key already exists, or the opposite, that it only succeed if the key already exists:

> set mykey newval nx
(nil)
> set mykey newval xx
OK

Even if strings are the basic values of Redis, there are interesting operations you can perform with them. For instance, one is atomic increment:

> set counter 100
OK
> incr counter
(integer) 101
> incr counter
(integer) 102
> incrby counter 50
(integer) 152

The INCR command parses the string value as an integer, increments it by one, and finally sets the obtained value as the new value. There are other similar commands like INCRBY, DECR and DECRBY. Internally it's always the same command, acting in a slightly different way.

What does it mean that INCR is atomic? That even multiple clients issuing INCR against the same key will never enter into a race condition. For instance, it will never happen that client 1 reads "10", client 2 reads "10" at the same time, both increment to 11, and set the new value to 11. The final value will always be 12 and the read-increment-set operation is performed while all the other clients are not executing a command at the same time.

There are a number of commands for operating on strings. For example the GETSET command sets a key to a new value, returning the old value as the result. You can use this command, for example, if you have a system that increments a Redis key using INCR every time your web site receives a new visitor. You may want to collect this information once every hour, without losing a single increment. You can GETSET the key, assigning it the new value of "0" and reading the old value back.

The ability to set or retrieve the value of multiple keys in a single command is also useful for reduced latency. For this reason there are the MSET and MGET commands:

> mset a 10 b 20 c 30
OK
> mget a b c
1) "10"
2) "20"
3) "30"

When MGET is used, Redis returns an array of values.

Altering and querying the key space

There are commands that are not defined on particular types, but are useful in order to interact with the space of keys, and thus, can be used with keys of any type.

For example the EXISTS command returns 1 or 0 to signal if a given key exists or not in the database, while the DEL command deletes a key and associated value, whatever the value is.

> set mykey hello
OK
> exists mykey
(integer) 1
> del mykey
(integer) 1
> exists mykey
(integer) 0

From the examples you can also see how DEL itself returns 1 or 0 depending on whether the key was removed (it existed) or not (there was no such key with that name).

There are many key space related commands, but the above two are the essential ones together with the TYPE command, which returns the kind of value stored at the specified key:

> set mykey x
OK
> type mykey
string
> del mykey
(integer) 1
> type mykey
none

Key expiration

Before moving on, we should look at an important Redis feature that works regardless of the type of value you're storing: key expiration. Key expiration lets you set a timeout for a key, also known as a "time to live", or "TTL". When the time to live elapses, the key is automatically destroyed.

A few important notes about key expiration:

  • They can be set both using seconds or milliseconds precision.
  • However the expire time resolution is always 1 millisecond.
  • Information about expires are replicated and persisted on disk, the time virtually passes when your Redis server remains stopped (this means that Redis saves the date at which a key will expire).

Use the EXPIRE command to set a key's expiration:

> set key some-value
OK
> expire key 5
(integer) 1
> get key (immediately)
"some-value"
> get key (after some time)
(nil)

The key vanished between the two GET calls, since the second call was delayed more than 5 seconds. In the example above we used EXPIRE in order to set the expire (it can also be used in order to set a different expire to a key already having one, like PERSIST can be used in order to remove the expire and make the key persistent forever). However we can also create keys with expires using other Redis commands. For example using SET options:

> set key 100 ex 10
OK
> ttl key
(integer) 9

The example above sets a key with the string value 100, having an expire of ten seconds. Later the TTL command is called in order to check the remaining time to live for the key.

In order to set and check expires in milliseconds, check the PEXPIRE and the PTTL commands, and the full list of SET options.

Lists

To explain the List data type it's better to start with a little bit of theory, as the term List is often used in an improper way by information technology folks. For instance "Python Lists" are not what the name may suggest (Linked Lists), but rather Arrays (the same data type is called Array in Ruby actually).

From a very general point of view a List is just a sequence of ordered elements: 10,20,1,2,3 is a list. But the properties of a List implemented using an Array are very different from the properties of a List implemented using a Linked List.

Redis lists are implemented via Linked Lists. This means that even if you have millions of elements inside a list, the operation of adding a new element in the head or in the tail of the list is performed in constant time. The speed of adding a new element with the LPUSH command to the head of a list with ten elements is the same as adding an element to the head of list with 10 million elements.

What's the downside? Accessing an element by index is very fast in lists implemented with an Array (constant time indexed access) and not so fast in lists implemented by linked lists (where the operation requires an amount of work proportional to the index of the accessed element).

Redis Lists are implemented with linked lists because for a database system it is crucial to be able to add elements to a very long list in a very fast way. Another strong advantage, as you'll see in a moment, is that Redis Lists can be taken at constant length in constant time.

When fast access to the middle of a large collection of elements is important, there is a different data structure that can be used, called sorted sets. Sorted sets will be covered later in this tutorial.

First steps with Redis Lists

The LPUSH command adds a new element into a list, on the left (at the head), while the RPUSH command adds a new element into a list, on the right (at the tail). Finally the LRANGE command extracts ranges of elements from lists:

> rpush mylist A
(integer) 1
> rpush mylist B
(integer) 2
> lpush mylist first
(integer) 3
> lrange mylist 0 -1
1) "first"
2) "A"
3) "B"

Note that LRANGE takes two indexes, the first and the last element of the range to return. Both the indexes can be negative, telling Redis to start counting from the end: so -1 is the last element, -2 is the penultimate element of the list, and so forth.

As you can see RPUSH appended the elements on the right of the list, while the final LPUSH appended the element on the left.

Both commands are variadic commands, meaning that you are free to push multiple elements into a list in a single call:

> rpush mylist 1 2 3 4 5 "foo bar"
(integer) 9
> lrange mylist 0 -1
1) "first"
2) "A"
3) "B"
4) "1"
5) "2"
6) "3"
7) "4"
8) "5"
9) "foo bar"

An important operation defined on Redis lists is the ability to pop elements. Popping elements is the operation of both retrieving the element from the list, and eliminating it from the list, at the same time. You can pop elements from left and right, similarly to how you can push elements in both sides of the list:

> rpush mylist a b c
(integer) 3
> rpop mylist
"c"
> rpop mylist
"b"
> rpop mylist
"a"

We added three elements and popped three elements, so at the end of this sequence of commands the list is empty and there are no more elements to pop. If we try to pop yet another element, this is the result we get:

> rpop mylist
(nil)

Redis returned a NULL value to signal that there are no elements in the list.

Common use cases for lists

Lists are useful for a number of tasks, two very representative use cases are the following:

  • Remember the latest updates posted by users into a social network.
  • Communication between processes, using a consumer-producer pattern where the producer pushes items into a list, and a consumer (usually a worker) consumes those items and executed actions. Redis has special list commands to make this use case both more reliable and efficient.

For example both the popular Ruby libraries resque and sidekiq use Redis lists under the hood in order to implement background jobs.

The popular Twitter social network takes the latest tweets posted by users into Redis lists.

To describe a common use case step by step, imagine your home page shows the latest photos published in a photo sharing social network and you want to speedup access.

  • Every time a user posts a new photo, we add its ID into a list with LPUSH.
  • When users visit the home page, we use LRANGE 0 9 in order to get the latest 10 posted items.

Capped lists

In many use cases we just want to use lists to store the latest items, whatever they are: social network updates, logs, or anything else.

Redis allows us to use lists as a capped collection, only remembering the latest N items and discarding all the oldest items using the LTRIM command.

The LTRIM command is similar to LRANGE, but instead of displaying the specified range of elements it sets this range as the new list value. All the elements outside the given range are removed.

An example will make it more clear:

> rpush mylist 1 2 3 4 5
(integer) 5
> ltrim mylist 0 2
OK
> lrange mylist 0 -1
1) "1"
2) "2"
3) "3"

The above LTRIM command tells Redis to take just list elements from index 0 to 2, everything else will be discarded. This allows for a very simple but useful pattern: doing a List push operation + a List trim operation together in order to add a new element and discard elements exceeding a limit:

LPUSH mylist <some element>
LTRIM mylist 0 999

The above combination adds a new element and takes only the 1000 newest elements into the list. With LRANGE you can access the top items without any need to remember very old data.

Note: while LRANGE is technically an O(N) command, accessing small ranges towards the head or the tail of the list is a constant time operation.

Blocking operations on lists

Lists have a special feature that make them suitable to implement queues, and in general as a building block for inter process communication systems: blocking operations.

Imagine you want to push items into a list with one process, and use a different process in order to actually do some kind of work with those items. This is the usual producer / consumer setup, and can be implemented in the following simple way:

  • To push items into the list, producers call LPUSH.
  • To extract / process items from the list, consumers call RPOP.

However it is possible that sometimes the list is empty and there is nothing to process, so RPOP just returns NULL. In this case a consumer is forced to wait some time and retry again with RPOP. This is called polling, and is not a good idea in this context because it has several drawbacks:

  1. Forces Redis and clients to process useless commands (all the requests when the list is empty will get no actual work done, they'll just return NULL).
  2. Adds a delay to the processing of items, since after a worker receives a NULL, it waits some time. To make the delay smaller, we could wait less between calls to RPOP, with the effect of amplifying problem number 1, i.e. more useless calls to Redis.

So Redis implements commands called BRPOP and BLPOP which are versions of RPOP and LPOP able to block if the list is empty: they'll return to the caller only when a new element is added to the list, or when a user-specified timeout is reached.

This is an example of a BRPOP call we could use in the worker:

> brpop tasks 5
1) "tasks"
2) "do_something"

It means: "wait for elements in the list tasks, but return if after 5 seconds no element is available".

Note that you can use 0 as timeout to wait for elements forever, and you can also specify multiple lists and not just one, in order to wait on multiple lists at the same time, and get notified when the first list receives an element.

A few things to note about BRPOP:

  1. Clients are served in an ordered way: the first client that blocked waiting for a list, is served first when an element is pushed by some other client, and so forth.
  2. The return value is different compared to RPOP: it is a two-element array since it also includes the name of the key, because BRPOP and BLPOP are able to block waiting for elements from multiple lists.
  3. If the timeout is reached, NULL is returned.

There are more things you should know about lists and blocking ops. We suggest that you read more on the following:

  • It is possible to build safer queues or rotating queues using LMOVE.
  • There is also a blocking variant of the command, called BLMOVE.

Automatic creation and removal of keys

So far in our examples we never had to create empty lists before pushing elements, or removing empty lists when they no longer have elements inside. It is Redis' responsibility to delete keys when lists are left empty, or to create an empty list if the key does not exist and we are trying to add elements to it, for example, with LPUSH.

This is not specific to lists, it applies to all the Redis data types composed of multiple elements -- Streams, Sets, Sorted Sets and Hashes.

Basically we can summarize the behavior with three rules:

  1. When we add an element to an aggregate data type, if the target key does not exist, an empty aggregate data type is created before adding the element.
  2. When we remove elements from an aggregate data type, if the value remains empty, the key is automatically destroyed. The Stream data type is the only exception to this rule.
  3. Calling a read-only command such as LLEN (which returns the length of the list), or a write command removing elements, with an empty key, always produces the same result as if the key is holding an empty aggregate type of the type the command expects to find.

Examples of rule 1:

> del mylist
(integer) 1
> lpush mylist 1 2 3
(integer) 3

However we can't perform operations against the wrong type if the key exists:

> set foo bar
OK
> lpush foo 1 2 3
(error) WRONGTYPE Operation against a key holding the wrong kind of value
> type foo
string

Example of rule 2:

> lpush mylist 1 2 3
(integer) 3
> exists mylist
(integer) 1
> lpop mylist
"3"
> lpop mylist
"2"
> lpop mylist
"1"
> exists mylist
(integer) 0

The key no longer exists after all the elements are popped.

Example of rule 3:

> del mylist
(integer) 0
> llen mylist
(integer) 0
> lpop mylist
(nil)

Hashes

Redis hashes look exactly how one might expect a "hash" to look, with field-value pairs:

> hmset user:1000 username antirez birthyear 1977 verified 1
OK
> hget user:1000 username
"antirez"
> hget user:1000 birthyear
"1977"
> hgetall user:1000
1) "username"
2) "antirez"
3) "birthyear"
4) "1977"
5) "verified"
6) "1"

While hashes are handy to represent objects, actually the number of fields you can put inside a hash has no practical limits (other than available memory), so you can use hashes in many different ways inside your application.

The command HMSET sets multiple fields of the hash, while HGET retrieves a single field. HMGET is similar to HGET but returns an array of values:

> hmget user:1000 username birthyear no-such-field
1) "antirez"
2) "1977"
3) (nil)

There are commands that are able to perform operations on individual fields as well, like HINCRBY:

> hincrby user:1000 birthyear 10
(integer) 1987
> hincrby user:1000 birthyear 10
(integer) 1997

You can find the full list of hash commands in the documentation.

It is worth noting that small hashes (i.e., a few elements with small values) are encoded in special way in memory that make them very memory efficient.

Sets

Redis Sets are unordered collections of strings. The SADD command adds new elements to a set. It's also possible to do a number of other operations against sets like testing if a given element already exists, performing the intersection, union or difference between multiple sets, and so forth.

> sadd myset 1 2 3
(integer) 3
> smembers myset
1. 3
2. 1
3. 2

Here I've added three elements to my set and told Redis to return all the elements. As you can see they are not sorted -- Redis is free to return the elements in any order at every call, since there is no contract with the user about element ordering.

Redis has commands to test for membership. For example, checking if an element exists:

> sismember myset 3
(integer) 1
> sismember myset 30
(integer) 0

"3" is a member of the set, while "30" is not.

Sets are good for expressing relations between objects. For instance we can easily use sets in order to implement tags.

A simple way to model this problem is to have a set for every object we want to tag. The set contains the IDs of the tags associated with the object.

One illustration is tagging news articles. If article ID 1000 is tagged with tags 1, 2, 5 and 77, a set can associate these tag IDs with the news item:

> sadd news:1000:tags 1 2 5 77
(integer) 4

We may also want to have the inverse relation as well: the list of all the news tagged with a given tag:

> sadd tag:1:news 1000
(integer) 1
> sadd tag:2:news 1000
(integer) 1
> sadd tag:5:news 1000
(integer) 1
> sadd tag:77:news 1000
(integer) 1

To get all the tags for a given object is trivial:

> smembers news:1000:tags
1. 5
2. 1
3. 77
4. 2

Note: in the example we assume you have another data structure, for example a Redis hash, which maps tag IDs to tag names.

There are other non trivial operations that are still easy to implement using the right Redis commands. For instance we may want a list of all the objects with the tags 1, 2, 10, and 27 together. We can do this using the SINTER command, which performs the intersection between different sets. We can use:

> sinter tag:1:news tag:2:news tag:10:news tag:27:news
... results here ...

In addition to intersection you can also perform unions, difference, extract a random element, and so forth.

The command to extract an element is called SPOP, and is handy to model certain problems. For example in order to implement a web-based poker game, you may want to represent your deck with a set. Imagine we use a one-char prefix for (C)lubs, (D)iamonds, (H)earts, (S)pades:

> sadd deck C1 C2 C3 C4 C5 C6 C7 C8 C9 C10 CJ CQ CK
  D1 D2 D3 D4 D5 D6 D7 D8 D9 D10 DJ DQ DK H1 H2 H3
  H4 H5 H6 H7 H8 H9 H10 HJ HQ HK S1 S2 S3 S4 S5 S6
  S7 S8 S9 S10 SJ SQ SK
(integer) 52

Now we want to provide each player with 5 cards. The SPOP command removes a random element, returning it to the client, so it is the perfect operation in this case.

However if we call it against our deck directly, in the next play of the game we'll need to populate the deck of cards again, which may not be ideal. So to start, we can make a copy of the set stored in the deck key into the game:1:deck key.

This is accomplished using SUNIONSTORE, which normally performs the union between multiple sets, and stores the result into another set. However, since the union of a single set is itself, I can copy my deck with:

> sunionstore game:1:deck deck
(integer) 52

Now I'm ready to provide the first player with five cards:

> spop game:1:deck
"C6"
> spop game:1:deck
"CQ"
> spop game:1:deck
"D1"
> spop game:1:deck
"CJ"
> spop game:1:deck
"SJ"

One pair of jacks, not great...

This is a good time to introduce the set command that provides the number of elements inside a set. This is often called the cardinality of a set in the context of set theory, so the Redis command is called SCARD.

> scard game:1:deck
(integer) 47

The math works: 52 - 5 = 47.

When you need to just get random elements without removing them from the set, there is the SRANDMEMBER command suitable for the task. It also features the ability to return both repeating and non-repeating elements.

Sorted sets

Sorted sets are a data type which is similar to a mix between a Set and a Hash. Like sets, sorted sets are composed of unique, non-repeating string elements, so in some sense a sorted set is a set as well.

However while elements inside sets are not ordered, every element in a sorted set is associated with a floating point value, called the score (this is why the type is also similar to a hash, since every element is mapped to a value).

Moreover, elements in a sorted sets are taken in order (so they are not ordered on request, order is a peculiarity of the data structure used to represent sorted sets). They are ordered according to the following rule:

  • If A and B are two elements with a different score, then A > B if A.score is > B.score.
  • If A and B have exactly the same score, then A > B if the A string is lexicographically greater than the B string. A and B strings can't be equal since sorted sets only have unique elements.

Let's start with a simple example, adding a few selected hackers names as sorted set elements, with their year of birth as "score".

> zadd hackers 1940 "Alan Kay"
(integer) 1
> zadd hackers 1957 "Sophie Wilson"
(integer) 1
> zadd hackers 1953 "Richard Stallman"
(integer) 1
> zadd hackers 1949 "Anita Borg"
(integer) 1
> zadd hackers 1965 "Yukihiro Matsumoto"
(integer) 1
> zadd hackers 1914 "Hedy Lamarr"
(integer) 1
> zadd hackers 1916 "Claude Shannon"
(integer) 1
> zadd hackers 1969 "Linus Torvalds"
(integer) 1
> zadd hackers 1912 "Alan Turing"
(integer) 1

As you can see ZADD is similar to SADD, but takes one additional argument (placed before the element to be added) which is the score. ZADD is also variadic, so you are free to specify multiple score-value pairs, even if this is not used in the example above.

With sorted sets it is trivial to return a list of hackers sorted by their birth year because actually they are already sorted.

Implementation note: Sorted sets are implemented via a dual-ported data structure containing both a skip list and a hash table, so every time we add an element Redis performs an O(log(N)) operation. That's good, but when we ask for sorted elements Redis does not have to do any work at all, it's already all sorted:

> zrange hackers 0 -1
1) "Alan Turing"
2) "Hedy Lamarr"
3) "Claude Shannon"
4) "Alan Kay"
5) "Anita Borg"
6) "Richard Stallman"
7) "Sophie Wilson"
8) "Yukihiro Matsumoto"
9) "Linus Torvalds"

Note: 0 and -1 means from element index 0 to the last element (-1 works here just as it does in the case of the LRANGE command).

What if I want to order them the opposite way, youngest to oldest? Use ZREVRANGE instead of ZRANGE:

> zrevrange hackers 0 -1
1) "Linus Torvalds"
2) "Yukihiro Matsumoto"
3) "Sophie Wilson"
4) "Richard Stallman"
5) "Anita Borg"
6) "Alan Kay"
7) "Claude Shannon"
8) "Hedy Lamarr"
9) "Alan Turing"

It is possible to return scores as well, using the WITHSCORES argument:

> zrange hackers 0 -1 withscores
1) "Alan Turing"
2) "1912"
3) "Hedy Lamarr"
4) "1914"
5) "Claude Shannon"
6) "1916"
7) "Alan Kay"
8) "1940"
9) "Anita Borg"
10) "1949"
11) "Richard Stallman"
12) "1953"
13) "Sophie Wilson"
14) "1957"
15) "Yukihiro Matsumoto"
16) "1965"
17) "Linus Torvalds"
18) "1969"

Operating on ranges

Sorted sets are more powerful than this. They can operate on ranges. Let's get all the individuals that were born up to 1950 inclusive. We use the ZRANGEBYSCORE command to do it:

> zrangebyscore hackers -inf 1950
1) "Alan Turing"
2) "Hedy Lamarr"
3) "Claude Shannon"
4) "Alan Kay"
5) "Anita Borg"

We asked Redis to return all the elements with a score between negative infinity and 1950 (both extremes are included).

It's also possible to remove ranges of elements. Let's remove all the hackers born between 1940 and 1960 from the sorted set:

> zremrangebyscore hackers 1940 1960
(integer) 4

ZREMRANGEBYSCORE is perhaps not the best command name, but it can be very useful, and returns the number of removed elements.

Another extremely useful operation defined for sorted set elements is the get-rank operation. It is possible to ask what is the position of an element in the set of the ordered elements.

> zrank hackers "Anita Borg"
(integer) 4

The ZREVRANK command is also available in order to get the rank, considering the elements sorted a descending way.

Lexicographical scores

With recent versions of Redis 2.8, a new feature was introduced that allows getting ranges lexicographically, assuming elements in a sorted set are all inserted with the same identical score (elements are compared with the C memcmp function, so it is guaranteed that there is no collation, and every Redis instance will reply with the same output).

The main commands to operate with lexicographical ranges are ZRANGEBYLEX, ZREVRANGEBYLEX, ZREMRANGEBYLEX and ZLEXCOUNT.

For example, let's add again our list of famous hackers, but this time use a score of zero for all the elements:

> zadd hackers 0 "Alan Kay" 0 "Sophie Wilson" 0 "Richard Stallman" 0
  "Anita Borg" 0 "Yukihiro Matsumoto" 0 "Hedy Lamarr" 0 "Claude Shannon"
  0 "Linus Torvalds" 0 "Alan Turing"

Because of the sorted sets ordering rules, they are already sorted lexicographically:

> zrange hackers 0 -1
1) "Alan Kay"
2) "Alan Turing"
3) "Anita Borg"
4) "Claude Shannon"
5) "Hedy Lamarr"
6) "Linus Torvalds"
7) "Richard Stallman"
8) "Sophie Wilson"
9) "Yukihiro Matsumoto"

Using ZRANGEBYLEX we can ask for lexicographical ranges:

> zrangebylex hackers [B [P
1) "Claude Shannon"
2) "Hedy Lamarr"
3) "Linus Torvalds"

Ranges can be inclusive or exclusive (depending on the first character), also string infinite and minus infinite are specified respectively with the + and - strings. See the documentation for more information.

This feature is important because it allows us to use sorted sets as a generic index. For example, if you want to index elements by a 128-bit unsigned integer argument, all you need to do is to add elements into a sorted set with the same score (for example 0) but with an 16 byte prefix consisting of the 128 bit number in big endian. Since numbers in big endian, when ordered lexicographically (in raw bytes order) are actually ordered numerically as well, you can ask for ranges in the 128 bit space, and get the element's value discarding the prefix.

If you want to see the feature in the context of a more serious demo, check the Redis autocomplete demo.

Updating the score: leader boards

Just a final note about sorted sets before switching to the next topic. Sorted sets' scores can be updated at any time. Just calling ZADD against an element already included in the sorted set will update its score (and position) with O(log(N)) time complexity. As such, sorted sets are suitable when there are tons of updates.

Because of this characteristic a common use case is leader boards. The typical application is a Facebook game where you combine the ability to take users sorted by their high score, plus the get-rank operation, in order to show the top-N users, and the user rank in the leader board (e.g., "you are the #4932 best score here").

Bitmaps

Bitmaps are not an actual data type, but a set of bit-oriented operations defined on the String type. Since strings are binary safe blobs and their maximum length is 512 MB, they are suitable to set up to 2^32 different bits.

Bit operations are divided into two groups: constant-time single bit operations, like setting a bit to 1 or 0, or getting its value, and operations on groups of bits, for example counting the number of set bits in a given range of bits (e.g., population counting).

One of the biggest advantages of bitmaps is that they often provide extreme space savings when storing information. For example in a system where different users are represented by incremental user IDs, it is possible to remember a single bit information (for example, knowing whether a user wants to receive a newsletter) of 4 billion of users using just 512 MB of memory.

Bits are set and retrieved using the SETBIT and GETBIT commands:

> setbit key 10 1
(integer) 1
> getbit key 10
(integer) 1
> getbit key 11
(integer) 0

The SETBIT command takes as its first argument the bit number, and as its second argument the value to set the bit to, which is 1 or 0. The command automatically enlarges the string if the addressed bit is outside the current string length.

GETBIT just returns the value of the bit at the specified index. Out of range bits (addressing a bit that is outside the length of the string stored into the target key) are always considered to be zero.

There are three commands operating on group of bits:

  1. BITOP performs bit-wise operations between different strings. The provided operations are AND, OR, XOR and NOT.
  2. BITCOUNT performs population counting, reporting the number of bits set to 1.
  3. BITPOS finds the first bit having the specified value of 0 or 1.

Both BITPOS and BITCOUNT are able to operate with byte ranges of the string, instead of running for the whole length of the string. The following is a trivial example of BITCOUNT call:

> setbit key 0 1
(integer) 0
> setbit key 100 1
(integer) 0
> bitcount key
(integer) 2

Common use cases for bitmaps are:

  • Real time analytics of all kinds.
  • Storing space efficient but high performance boolean information associated with object IDs.

For example imagine you want to know the longest streak of daily visits of your web site users. You start counting days starting from zero, that is the day you made your web site public, and set a bit with SETBIT every time the user visits the web site. As a bit index you simply take the current unix time, subtract the initial offset, and divide by the number of seconds in a day (normally, 3600*24).

This way for each user you have a small string containing the visit information for each day. With BITCOUNT it is possible to easily get the number of days a given user visited the web site, while with a few BITPOS calls, or simply fetching and analyzing the bitmap client-side, it is possible to easily compute the longest streak.

Bitmaps are trivial to split into multiple keys, for example for the sake of sharding the data set and because in general it is better to avoid working with huge keys. To split a bitmap across different keys instead of setting all the bits into a key, a trivial strategy is just to store M bits per key and obtain the key name with bit-number/M and the Nth bit to address inside the key with bit-number MOD M.

HyperLogLogs

A HyperLogLog is a probabilistic data structure used in order to count unique things (technically this is referred to estimating the cardinality of a set). Usually counting unique items requires using an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in the past in order to avoid counting them multiple times. However there is a set of algorithms that trade memory for precision: you end with an estimated measure with a standard error, which in the case of the Redis implementation is less than 1%. The magic of this algorithm is that you no longer need to use an amount of memory proportional to the number of items counted, and instead can use a constant amount of memory! 12k bytes in the worst case, or a lot less if your HyperLogLog (We'll just call them HLL from now) has seen very few elements.

HLLs in Redis, while technically a different data structure, are encoded as a Redis string, so you can call GET to serialize a HLL, and SET to deserialize it back to the server.

Conceptually the HLL API is like using Sets to do the same task. You would SADD every observed element into a set, and would use SCARD to check the number of elements inside the set, which are unique since SADD will not re-add an existing element.

While you don't really add items into an HLL, because the data structure only contains a state that does not include actual elements, the API is the same:

  • Every time you see a new element, you add it to the count with PFADD.

  • Every time you want to retrieve the current approximation of the unique elements added with PFADD so far, you use the PFCOUNT.

      > pfadd hll a b c d
      (integer) 1
      > pfcount hll
      (integer) 4
    

An example of use case for this data structure is counting unique queries performed by users in a search form every day.

Redis is also able to perform the union of HLLs, please check the full documentation for more information.

Other notable features

There are other important things in the Redis API that can't be explored in the context of this document, but are worth your attention:

Learn more

This tutorial is in no way complete and has covered just the basics of the API. Read the command reference to discover a lot more.

Thanks for reading, and have fun hacking with Redis!

5.2 - Redis streams

An introduction to the Redis stream data type

The Stream is a new data type introduced with Redis 5.0, which models a log data structure in a more abstract way. However the essence of a log is still intact: like a log file, often implemented as a file open in append-only mode, Redis Streams are primarily an append-only data structure. At least conceptually, because being an abstract data type represented in memory, Redis Streams implement powerful operations to overcome the limitations of a log file.

What makes Redis streams the most complex type of Redis, despite the data structure itself being quite simple, is the fact that it implements additional, non-mandatory features: a set of blocking operations allowing consumers to wait for new data added to a stream by producers, and in addition to that a concept called Consumer Groups.

Consumer groups were initially introduced by the popular messaging system Kafka (TM). Redis reimplements a similar idea in completely different terms, but the goal is the same: to allow a group of clients to cooperate in consuming a different portion of the same stream of messages.

Streams basics

For the goal of understanding what Redis Streams are and how to use them, we will ignore all the advanced features, and instead focus on the data structure itself, in terms of commands used to manipulate and access it. This is, basically, the part which is common to most of the other Redis data types, like Lists, Sets, Sorted Sets and so forth. However, note that Lists also have an optional more complex blocking API, exported by commands like BLPOP and similar. So Streams are not much different than Lists in this regard, it's just that the additional API is more complex and more powerful.

Because Streams are an append only data structure, the fundamental write command, called XADD, appends a new entry into the specified stream. A stream entry is not just a string, but is instead composed of one or more field-value pairs. This way, each entry of a stream is already structured, like an append only file written in CSV format where multiple separated fields are present in each line.

> XADD mystream * sensor-id 1234 temperature 19.8
1518951480106-0

The above call to the XADD command adds an entry sensor-id: 1234, temperature: 19.8 to the stream at key mystream, using an auto-generated entry ID, which is the one returned by the command, specifically 1518951480106-0. It gets as its first argument the key name mystream, the second argument is the entry ID that identifies every entry inside a stream. However, in this case, we passed * because we want the server to generate a new ID for us. Every new ID will be monotonically increasing, so in more simple terms, every new entry added will have a higher ID compared to all the past entries. Auto-generation of IDs by the server is almost always what you want, and the reasons for specifying an ID explicitly are very rare. We'll talk more about this later. The fact that each Stream entry has an ID is another similarity with log files, where line numbers, or the byte offset inside the file, can be used in order to identify a given entry. Returning back at our XADD example, after the key name and ID, the next arguments are the field-value pairs composing our stream entry.

It is possible to get the number of items inside a Stream just using the XLEN command:

> XLEN mystream
(integer) 1

Entry IDs

The entry ID returned by the XADD command, and identifying univocally each entry inside a given stream, is composed of two parts:

<millisecondsTime>-<sequenceNumber>

The milliseconds time part is actually the local time in the local Redis node generating the stream ID, however if the current milliseconds time happens to be smaller than the previous entry time, then the previous entry time is used instead, so if a clock jumps backward the monotonically incrementing ID property still holds. The sequence number is used for entries created in the same millisecond. Since the sequence number is 64 bit wide, in practical terms there is no limit to the number of entries that can be generated within the same millisecond.

The format of such IDs may look strange at first, and the gentle reader may wonder why the time is part of the ID. The reason is that Redis streams support range queries by ID. Because the ID is related to the time the entry is generated, this gives the ability to query for time ranges basically for free. We will see this soon while covering the XRANGE command.

If for some reason the user needs incremental IDs that are not related to time but are actually associated to another external system ID, as previously mentioned, the XADD command can take an explicit ID instead of the * wildcard ID that triggers auto-generation, like in the following examples:

> XADD somestream 0-1 field value
0-1
> XADD somestream 0-2 foo bar
0-2

Note that in this case, the minimum ID is 0-1 and that the command will not accept an ID equal or smaller than a previous one:

> XADD somestream 0-1 foo bar
(error) ERR The ID specified in XADD is equal or smaller than the target stream top item

It is also possible to use an explicit ID that only consists of the milliseconds part, and have the sequence part be automatically generated for the entry:

> XADD somestream 0-* baz qux
0-3

Getting data from Streams

Now we are finally able to append entries in our stream via XADD. However, while appending data to a stream is quite obvious, the way streams can be queried in order to extract data is not so obvious. If we continue with the analogy of the log file, one obvious way is to mimic what we normally do with the Unix command tail -f, that is, we may start to listen in order to get the new messages that are appended to the stream. Note that unlike the blocking list operations of Redis, where a given element will reach a single client which is blocking in a pop style operation like BLPOP, with streams we want multiple consumers to see the new messages appended to the stream (the same way many tail -f processes can see what is added to a log). Using the traditional terminology we want the streams to be able to fan out messages to multiple clients.

However, this is just one potential access mode. We could also see a stream in quite a different way: not as a messaging system, but as a time series store. In this case, maybe it's also useful to get the new messages appended, but another natural query mode is to get messages by ranges of time, or alternatively to iterate the messages using a cursor to incrementally check all the history. This is definitely another useful access mode.

Finally, if we see a stream from the point of view of consumers, we may want to access the stream in yet another way, that is, as a stream of messages that can be partitioned to multiple consumers that are processing such messages, so that groups of consumers can only see a subset of the messages arriving in a single stream. In this way, it is possible to scale the message processing across different consumers, without single consumers having to process all the messages: each consumer will just get different messages to process. This is basically what Kafka (TM) does with consumer groups. Reading messages via consumer groups is yet another interesting mode of reading from a Redis Stream.

Redis Streams support all three of the query modes described above via different commands. The next sections will show them all, starting from the simplest and most direct to use: range queries.

Querying by range: XRANGE and XREVRANGE

To query the stream by range we are only required to specify two IDs, start and end. The range returned will include the elements having start or end as ID, so the range is inclusive. The two special IDs - and + respectively mean the smallest and the greatest ID possible.

> XRANGE mystream - +
1) 1) 1518951480106-0
   2) 1) "sensor-id"
      2) "1234"
      3) "temperature"
      4) "19.8"
2) 1) 1518951482479-0
   2) 1) "sensor-id"
      2) "9999"
      3) "temperature"
      4) "18.2"

Each entry returned is an array of two items: the ID and the list of field-value pairs. We already said that the entry IDs have a relation with the time, because the part at the left of the - character is the Unix time in milliseconds of the local node that created the stream entry, at the moment the entry was created (however note that streams are replicated with fully specified XADD commands, so the replicas will have identical IDs to the master). This means that I could query a range of time using XRANGE. In order to do so, however, I may want to omit the sequence part of the ID: if omitted, in the start of the range it will be assumed to be 0, while in the end part it will be assumed to be the maximum sequence number available. This way, querying using just two milliseconds Unix times, we get all the entries that were generated in that range of time, in an inclusive way. For instance, if I want to query a two milliseconds period I could use:

> XRANGE mystream 1518951480106 1518951480107
1) 1) 1518951480106-0
   2) 1) "sensor-id"
      2) "1234"
      3) "temperature"
      4) "19.8"

I have only a single entry in this range, however in real data sets, I could query for ranges of hours, or there could be many items in just two milliseconds, and the result returned could be huge. For this reason, XRANGE supports an optional COUNT option at the end. By specifying a count, I can just get the first N items. If I want more, I can get the last ID returned, increment the sequence part by one, and query again. Let's see this in the following example. We start adding 10 items with XADD (I won't show that, lets assume that the stream mystream was populated with 10 items). To start my iteration, getting 2 items per command, I start with the full range, but with a count of 2.

> XRANGE mystream - + COUNT 2
1) 1) 1519073278252-0
   2) 1) "foo"
      2) "value_1"
2) 1) 1519073279157-0
   2) 1) "foo"
      2) "value_2"

In order to continue the iteration with the next two items, I have to pick the last ID returned, that is 1519073279157-0 and add the prefix ( to it. The resulting exclusive range interval, that is (1519073279157-0 in this case, can now be used as the new start argument for the next XRANGE call:

> XRANGE mystream (1519073279157-0 + COUNT 2
1) 1) 1519073280281-0
   2) 1) "foo"
      2) "value_3"
2) 1) 1519073281432-0
   2) 1) "foo"
      2) "value_4"

And so forth. Since XRANGE complexity is O(log(N)) to seek, and then O(M) to return M elements, with a small count the command has a logarithmic time complexity, which means that each step of the iteration is fast. So XRANGE is also the de facto streams iterator and does not require an XSCAN command.

The command XREVRANGE is the equivalent of XRANGE but returning the elements in inverted order, so a practical use for XREVRANGE is to check what is the last item in a Stream:

> XREVRANGE mystream + - COUNT 1
1) 1) 1519073287312-0
   2) 1) "foo"
      2) "value_10"

Note that the XREVRANGE command takes the start and stop arguments in reverse order.

Listening for new items with XREAD

When we do not want to access items by a range in a stream, usually what we want instead is to subscribe to new items arriving to the stream. This concept may appear related to Redis Pub/Sub, where you subscribe to a channel, or to Redis blocking lists, where you wait for a key to get new elements to fetch, but there are fundamental differences in the way you consume a stream:

  1. A stream can have multiple clients (consumers) waiting for data. Every new item, by default, will be delivered to every consumer that is waiting for data in a given stream. This behavior is different than blocking lists, where each consumer will get a different element. However, the ability to fan out to multiple consumers is similar to Pub/Sub.
  2. While in Pub/Sub messages are fire and forget and are never stored anyway, and while when using blocking lists, when a message is received by the client it is popped (effectively removed) from the list, streams work in a fundamentally different way. All the messages are appended in the stream indefinitely (unless the user explicitly asks to delete entries): different consumers will know what is a new message from its point of view by remembering the ID of the last message received.
  3. Streams Consumer Groups provide a level of control that Pub/Sub or blocking lists cannot achieve, with different groups for the same stream, explicit acknowledgment of processed items, ability to inspect the pending items, claiming of unprocessed messages, and coherent history visibility for each single client, that is only able to see its private past history of messages.

The command that provides the ability to listen for new messages arriving into a stream is called XREAD. It's a bit more complex than XRANGE, so we'll start showing simple forms, and later the whole command layout will be provided.

> XREAD COUNT 2 STREAMS mystream 0
1) 1) "mystream"
   2) 1) 1) 1519073278252-0
         2) 1) "foo"
            2) "value_1"
      2) 1) 1519073279157-0
         2) 1) "foo"
            2) "value_2"

The above is the non-blocking form of XREAD. Note that the COUNT option is not mandatory, in fact the only mandatory option of the command is the STREAMS option, that specifies a list of keys together with the corresponding maximum ID already seen for each stream by the calling consumer, so that the command will provide the client only with messages with an ID greater than the one we specified.

In the above command we wrote STREAMS mystream 0 so we want all the messages in the Stream mystream having an ID greater than 0-0. As you can see in the example above, the command returns the key name, because actually it is possible to call this command with more than one key to read from different streams at the same time. I could write, for instance: STREAMS mystream otherstream 0 0. Note how after the STREAMS option we need to provide the key names, and later the IDs. For this reason, the STREAMS option must always be the last one.

Apart from the fact that XREAD can access multiple streams at once, and that we are able to specify the last ID we own to just get newer messages, in this simple form the command is not doing something so different compared to XRANGE. However, the interesting part is that we can turn XREAD into a blocking command easily, by specifying the BLOCK argument:

> XREAD BLOCK 0 STREAMS mystream $

Note that in the example above, other than removing COUNT, I specified the new BLOCK option with a timeout of 0 milliseconds (that means to never timeout). Moreover, instead of passing a normal ID for the stream mystream I passed the special ID $. This special ID means that XREAD should use as last ID the maximum ID already stored in the stream mystream, so that we will receive only new messages, starting from the time we started listening. This is similar to the tail -f Unix command in some way.

Note that when the BLOCK option is used, we do not have to use the special ID $. We can use any valid ID. If the command is able to serve our request immediately without blocking, it will do so, otherwise it will block. Normally if we want to consume the stream starting from new entries, we start with the ID $, and after that we continue using the ID of the last message received to make the next call, and so forth.

The blocking form of XREAD is also able to listen to multiple Streams, just by specifying multiple key names. If the request can be served synchronously because there is at least one stream with elements greater than the corresponding ID we specified, it returns with the results. Otherwise, the command will block and will return the items of the first stream which gets new data (according to the specified ID).

Similarly to blocking list operations, blocking stream reads are fair from the point of view of clients waiting for data, since the semantics is FIFO style. The first client that blocked for a given stream will be the first to be unblocked when new items are available.

XREAD has no other options than COUNT and BLOCK, so it's a pretty basic command with a specific purpose to attach consumers to one or multiple streams. More powerful features to consume streams are available using the consumer groups API, however reading via consumer groups is implemented by a different command called XREADGROUP, covered in the next section of this guide.

Consumer groups

When the task at hand is to consume the same stream from different clients, then XREAD already offers a way to fan-out to N clients, potentially also using replicas in order to provide more read scalability. However in certain problems what we want to do is not to provide the same stream of messages to many clients, but to provide a different subset of messages from the same stream to many clients. An obvious case where this is useful is that of messages which are slow to process: the ability to have N different workers that will receive different parts of the stream allows us to scale message processing, by routing different messages to different workers that are ready to do more work.

In practical terms, if we imagine having three consumers C1, C2, C3, and a stream that contains the messages 1, 2, 3, 4, 5, 6, 7 then what we want is to serve the messages according to the following diagram:

1 -> C1
2 -> C2
3 -> C3
4 -> C1
5 -> C2
6 -> C3
7 -> C1

In order to achieve this, Redis uses a concept called consumer groups. It is very important to understand that Redis consumer groups have nothing to do, from an implementation standpoint, with Kafka (TM) consumer groups. Yet they are similar in functionality, so I decided to keep Kafka's (TM) terminology, as it originally popularized this idea.

A consumer group is like a pseudo consumer that gets data from a stream, and actually serves multiple consumers, providing certain guarantees:

  1. Each message is served to a different consumer so that it is not possible that the same message will be delivered to multiple consumers.
  2. Consumers are identified, within a consumer group, by a name, which is a case-sensitive string that the clients implementing consumers must choose. This means that even after a disconnect, the stream consumer group retains all the state, since the client will claim again to be the same consumer. However, this also means that it is up to the client to provide a unique identifier.
  3. Each consumer group has the concept of the first ID never consumed so that, when a consumer asks for new messages, it can provide just messages that were not previously delivered.
  4. Consuming a message, however, requires an explicit acknowledgment using a specific command. Redis interprets the acknowledgment as: this message was correctly processed so it can be evicted from the consumer group.
  5. A consumer group tracks all the messages that are currently pending, that is, messages that were delivered to some consumer of the consumer group, but are yet to be acknowledged as processed. Thanks to this feature, when accessing the message history of a stream, each consumer will only see messages that were delivered to it.

In a way, a consumer group can be imagined as some amount of state about a stream:

+----------------------------------------+
| consumer_group_name: mygroup           |
| consumer_group_stream: somekey         |
| last_delivered_id: 1292309234234-92    |
|                                        |
| consumers:                             |
|    "consumer-1" with pending messages  |
|       1292309234234-4                  |
|       1292309234232-8                  |
|    "consumer-42" with pending messages |
|       ... (and so forth)               |
+----------------------------------------+

If you see this from this point of view, it is very simple to understand what a consumer group can do, how it is able to just provide consumers with their history of pending messages, and how consumers asking for new messages will just be served with message IDs greater than last_delivered_id. At the same time, if you look at the consumer group as an auxiliary data structure for Redis streams, it is obvious that a single stream can have multiple consumer groups, that have a different set of consumers. Actually, it is even possible for the same stream to have clients reading without consumer groups via XREAD, and clients reading via XREADGROUP in different consumer groups.

Now it's time to zoom in to see the fundamental consumer group commands. They are the following:

  • XGROUP is used in order to create, destroy and manage consumer groups.
  • XREADGROUP is used to read from a stream via a consumer group.
  • XACK is the command that allows a consumer to mark a pending message as correctly processed.

Creating a consumer group

Assuming I have a key mystream of type stream already existing, in order to create a consumer group I just need to do the following:

> XGROUP CREATE mystream mygroup $
OK

As you can see in the command above when creating the consumer group we have to specify an ID, which in the example is just $. This is needed because the consumer group, among the other states, must have an idea about what message to serve next at the first consumer connecting, that is, what was the last message ID when the group was just created. If we provide $ as we did, then only new messages arriving in the stream from now on will be provided to the consumers in the group. If we specify 0 instead the consumer group will consume all the messages in the stream history to start with. Of course, you can specify any other valid ID. What you know is that the consumer group will start delivering messages that are greater than the ID you specify. Because $ means the current greatest ID in the stream, specifying $ will have the effect of consuming only new messages.

XGROUP CREATE also supports creating the stream automatically, if it doesn't exist, using the optional MKSTREAM subcommand as the last argument:

> XGROUP CREATE newstream mygroup $ MKSTREAM
OK

Now that the consumer group is created we can immediately try to read messages via the consumer group using the XREADGROUP command. We'll read from consumers, that we will call Alice and Bob, to see how the system will return different messages to Alice or Bob.

XREADGROUP is very similar to XREAD and provides the same BLOCK option, otherwise it is a synchronous command. However there is a mandatory option that must be always specified, which is GROUP and has two arguments: the name of the consumer group, and the name of the consumer that is attempting to read. The option COUNT is also supported and is identical to the one in XREAD.

Before reading from the stream, let's put some messages inside:

> XADD mystream * message apple
1526569495631-0
> XADD mystream * message orange
1526569498055-0
> XADD mystream * message strawberry
1526569506935-0
> XADD mystream * message apricot
1526569535168-0
> XADD mystream * message banana
1526569544280-0

Note: here message is the field name, and the fruit is the associated value, remember that stream items are small dictionaries.

It is time to try reading something using the consumer group:

> XREADGROUP GROUP mygroup Alice COUNT 1 STREAMS mystream >
1) 1) "mystream"
   2) 1) 1) 1526569495631-0
         2) 1) "message"
            2) "apple"

XREADGROUP replies are just like XREAD replies. Note however the GROUP <group-name> <consumer-name> provided above. It states that I want to read from the stream using the consumer group mygroup and I'm the consumer Alice. Every time a consumer performs an operation with a consumer group, it must specify its name, uniquely identifying this consumer inside the group.

There is another very important detail in the command line above, after the mandatory STREAMS option the ID requested for the key mystream is the special ID >. This special ID is only valid in the context of consumer groups, and it means: messages never delivered to other consumers so far.

This is almost always what you want, however it is also possible to specify a real ID, such as 0 or any other valid ID, in this case, however, what happens is that we request from XREADGROUP to just provide us with the history of pending messages, and in such case, will never see new messages in the group. So basically XREADGROUP has the following behavior based on the ID we specify:

  • If the ID is the special ID > then the command will return only new messages never delivered to other consumers so far, and as a side effect, will update the consumer group's last ID.
  • If the ID is any other valid numerical ID, then the command will let us access our history of pending messages. That is, the set of messages that were delivered to this specified consumer (identified by the provided name), and never acknowledged so far with XACK.

We can test this behavior immediately specifying an ID of 0, without any COUNT option: we'll just see the only pending message, that is, the one about apples:

> XREADGROUP GROUP mygroup Alice STREAMS mystream 0
1) 1) "mystream"
   2) 1) 1) 1526569495631-0
         2) 1) "message"
            2) "apple"

However, if we acknowledge the message as processed, it will no longer be part of the pending messages history, so the system will no longer report anything:

> XACK mystream mygroup 1526569495631-0
(integer) 1
> XREADGROUP GROUP mygroup Alice STREAMS mystream 0
1) 1) "mystream"
   2) (empty list or set)

Don't worry if you yet don't know how XACK works, the idea is just that processed messages are no longer part of the history that we can access.

Now it's Bob's turn to read something:

> XREADGROUP GROUP mygroup Bob COUNT 2 STREAMS mystream >
1) 1) "mystream"
   2) 1) 1) 1526569498055-0
         2) 1) "message"
            2) "orange"
      2) 1) 1526569506935-0
         2) 1) "message"
            2) "strawberry"

Bob asked for a maximum of two messages and is reading via the same group mygroup. So what happens is that Redis reports just new messages. As you can see the "apple" message is not delivered, since it was already delivered to Alice, so Bob gets orange and strawberry, and so forth.

This way Alice, Bob, and any other consumer in the group, are able to read different messages from the same stream, to read their history of yet to process messages, or to mark messages as processed. This allows creating different topologies and semantics for consuming messages from a stream.

There are a few things to keep in mind:

  • Consumers are auto-created the first time they are mentioned, no need for explicit creation.
  • Even with XREADGROUP you can read from multiple keys at the same time, however for this to work, you need to create a consumer group with the same name in every stream. This is not a common need, but it is worth mentioning that the feature is technically available.
  • XREADGROUP is a write command because even if it reads from the stream, the consumer group is modified as a side effect of reading, so it can only be called on master instances.

An example of a consumer implementation, using consumer groups, written in the Ruby language could be the following. The Ruby code is aimed to be readable by virtually any experienced programmer, even if they do not know Ruby:

require 'redis'

if ARGV.length == 0
    puts "Please specify a consumer name"
    exit 1
end

ConsumerName = ARGV[0]
GroupName = "mygroup"
r = Redis.new

def process_message(id,msg)
    puts "[#{ConsumerName}] #{id} = #{msg.inspect}"
end

$lastid = '0-0'

puts "Consumer #{ConsumerName} starting..."
check_backlog = true
while true
    # Pick the ID based on the iteration: the first time we want to
    # read our pending messages, in case we crashed and are recovering.
    # Once we consumed our history, we can start getting new messages.
    if check_backlog
        myid = $lastid
    else
        myid = '>'
    end

    items = r.xreadgroup('GROUP',GroupName,ConsumerName,'BLOCK','2000','COUNT','10','STREAMS',:my_stream_key,myid)

    if items == nil
        puts "Timeout!"
        next
    end

    # If we receive an empty reply, it means we were consuming our history
    # and that the history is now empty. Let's start to consume new messages.
    check_backlog = false if items[0][1].length == 0

    items[0][1].each{|i|
        id,fields = i

        # Process the message
        process_message(id,fields)

        # Acknowledge the message as processed
        r.xack(:my_stream_key,GroupName,id)

        $lastid = id
    }
end

As you can see the idea here is to start by consuming the history, that is, our list of pending messages. This is useful because the consumer may have crashed before, so in the event of a restart we want to re-read messages that were delivered to us without getting acknowledged. Note that we might process a message multiple times or one time (at least in the case of consumer failures, but there are also the limits of Redis persistence and replication involved, see the specific section about this topic).

Once the history was consumed, and we get an empty list of messages, we can switch to using the > special ID in order to consume new messages.

Recovering from permanent failures

The example above allows us to write consumers that participate in the same consumer group, each taking a subset of messages to process, and when recovering from failures re-reading the pending messages that were delivered just to them. However in the real world consumers may permanently fail and never recover. What happens to the pending messages of the consumer that never recovers after stopping for any reason?

Redis consumer groups offer a feature that is used in these situations in order to claim the pending messages of a given consumer so that such messages will change ownership and will be re-assigned to a different consumer. The feature is very explicit. A consumer has to inspect the list of pending messages, and will have to claim specific messages using a special command, otherwise the server will leave the messages pending forever and assigned to the old consumer. In this way different applications can choose if to use such a feature or not, and exactly how to use it.

The first step of this process is just a command that provides observability of pending entries in the consumer group and is called XPENDING. This is a read-only command which is always safe to call and will not change ownership of any message. In its simplest form, the command is called with two arguments, which are the name of the stream and the name of the consumer group.

> XPENDING mystream mygroup
1) (integer) 2
2) 1526569498055-0
3) 1526569506935-0
4) 1) 1) "Bob"
      2) "2"

When called in this way, the command outputs the total number of pending messages in the consumer group (two in this case), the lower and higher message ID among the pending messages, and finally a list of consumers and the number of pending messages they have. We have only Bob with two pending messages because the single message that Alice requested was acknowledged using XACK.

We can ask for more information by giving more arguments to XPENDING, because the full command signature is the following:

XPENDING <key> <groupname> [[IDLE <min-idle-time>] <start-id> <end-id> <count> [<consumer-name>]]

By providing a start and end ID (that can be just - and + as in XRANGE) and a count to control the amount of information returned by the command, we are able to know more about the pending messages. The optional final argument, the consumer name, is used if we want to limit the output to just messages pending for a given consumer, but won't use this feature in the following example.

> XPENDING mystream mygroup - + 10
1) 1) 1526569498055-0
   2) "Bob"
   3) (integer) 74170458
   4) (integer) 1
2) 1) 1526569506935-0
   2) "Bob"
   3) (integer) 74170458
   4) (integer) 1

Now we have the details for each message: the ID, the consumer name, the idle time in milliseconds, which is how many milliseconds have passed since the last time the message was delivered to some consumer, and finally the number of times that a given message was delivered. We have two messages from Bob, and they are idle for 74170458 milliseconds, about 20 hours.

Note that nobody prevents us from checking what the first message content was by just using XRANGE.

> XRANGE mystream 1526569498055-0 1526569498055-0
1) 1) 1526569498055-0
   2) 1) "message"
      2) "orange"

We have just to repeat the same ID twice in the arguments. Now that we have some ideas, Alice may decide that after 20 hours of not processing messages, Bob will probably not recover in time, and it's time to claim such messages and resume the processing in place of Bob. To do so, we use the XCLAIM command.

This command is very complex and full of options in its full form, since it is used for replication of consumer groups changes, but we'll use just the arguments that we need normally. In this case it is as simple as:

XCLAIM <key> <group> <consumer> <min-idle-time> <ID-1> <ID-2> ... <ID-N>

Basically we say, for this specific key and group, I want that the message IDs specified will change ownership, and will be assigned to the specified consumer name <consumer>. However, we also provide a minimum idle time, so that the operation will only work if the idle time of the mentioned messages is greater than the specified idle time. This is useful because maybe two clients are retrying to claim a message at the same time:

Client 1: XCLAIM mystream mygroup Alice 3600000 1526569498055-0
Client 2: XCLAIM mystream mygroup Lora 3600000 1526569498055-0

However, as a side effect, claiming a message will reset its idle time and will increment its number of deliveries counter, so the second client will fail claiming it. In this way we avoid trivial re-processing of messages (even if in the general case you cannot obtain exactly once processing).

This is the result of the command execution:

> XCLAIM mystream mygroup Alice 3600000 1526569498055-0
1) 1) 1526569498055-0
   2) 1) "message"
      2) "orange"

The message was successfully claimed by Alice, who can now process the message and acknowledge it, and move things forward even if the original consumer is not recovering.

It is clear from the example above that as a side effect of successfully claiming a given message, the XCLAIM command also returns it. However this is not mandatory. The JUSTID option can be used in order to return just the IDs of the message successfully claimed. This is useful if you want to reduce the bandwidth used between the client and the server (and also the performance of the command) and you are not interested in the message because your consumer is implemented in a way that it will rescan the history of pending messages from time to time.

Claiming may also be implemented by a separate process: one that just checks the list of pending messages, and assigns idle messages to consumers that appear to be active. Active consumers can be obtained using one of the observability features of Redis streams. This is the topic of the next section.

Automatic claiming

The XAUTOCLAIM command, added in Redis 6.2, implements the claiming process that we've described above. XPENDING and XCLAIM provide the basic building blocks for different types of recovery mechanisms. This command optimizes the generic process by having Redis manage it and offers a simple solution for most recovery needs.

XAUTOCLAIM identifies idle pending messages and transfers ownership of them to a consumer. The command's signature looks like this:

XAUTOCLAIM <key> <group> <consumer> <min-idle-time> <start> [COUNT count] [JUSTID]

So, in the example above, I could have used automatic claiming to claim a single message like this:

> XAUTOCLAIM mystream mygroup Alice 3600000 0-0 COUNT 1
1) 1526569498055-0
2) 1) 1526569498055-0
   2) 1) "message"
      2) "orange"

Like XCLAIM, the command replies with an array of the claimed messages, but it also returns a stream ID that allows iterating the pending entries. The stream ID is a cursor, and I can use it in my next call to continue in claiming idle pending messages:

> XAUTOCLAIM mystream mygroup Lora 3600000 1526569498055-0 COUNT 1
1) 0-0
2) 1) 1526569506935-0
   2) 1) "message"
      2) "strawberry"

When XAUTOCLAIM returns the "0-0" stream ID as a cursor, that means that it reached the end of the consumer group pending entries list. That doesn't mean that there are no new idle pending messages, so the process continues by calling XAUTOCLAIM from the beginning of the stream.

Claiming and the delivery counter

The counter that you observe in the XPENDING output is the number of deliveries of each message. The counter is incremented in two ways: when a message is successfully claimed via XCLAIM or when an XREADGROUP call is used in order to access the history of pending messages.

When there are failures, it is normal that messages will be delivered multiple times, but eventually they usually get processed and acknowledged. However there might be a problem processing some specific message, because it is corrupted or crafted in a way that triggers a bug in the processing code. In such a case what happens is that consumers will continuously fail to process this particular message. Because we have the counter of the delivery attempts, we can use that counter to detect messages that for some reason are not processable. So once the deliveries counter reaches a given large number that you chose, it is probably wiser to put such messages in another stream and send a notification to the system administrator. This is basically the way that Redis Streams implements the dead letter concept.

Streams observability

Messaging systems that lack observability are very hard to work with. Not knowing who is consuming messages, what messages are pending, the set of consumer groups active in a given stream, makes everything opaque. For this reason, Redis Streams and consumer groups have different ways to observe what is happening. We already covered XPENDING, which allows us to inspect the list of messages that are under processing at a given moment, together with their idle time and number of deliveries.

However we may want to do more than that, and the XINFO command is an observability interface that can be used with sub-commands in order to get information about streams or consumer groups.

This command uses subcommands in order to show different information about the status of the stream and its consumer groups. For instance XINFO STREAM reports information about the stream itself.

> XINFO STREAM mystream
 1) "length"
 2) (integer) 2
 3) "radix-tree-keys"
 4) (integer) 1
 5) "radix-tree-nodes"
 6) (integer) 2
 7) "last-generated-id"
 8) "1638125141232-0"
 9) "max-deleted-entryid"
10) "0-0"
11) "entries-added"
12) (integer) 2
13) "groups"
14) (integer) 1
15) "first-entry"
16) 1) "1638125133432-0"
    2) 1) "message"
       2) "apple"
17) "last-entry"
18) 1) "1638125141232-0"
    2) 1) "message"
       2) "banana"

The output shows information about how the stream is encoded internally, and also shows the first and last message in the stream. Another piece of information available is the number of consumer groups associated with this stream. We can dig further asking for more information about the consumer groups.

> XINFO GROUPS mystream
1)  1) "name"
    2) "mygroup"
    3) "consumers"
    4) (integer) 2
    5) "pending"
    6) (integer) 2
    7) "last-delivered-id"
    8) "1638126030001-0"
    9) "entries-read"
   10) (integer) 2
   11) "lag"
   12) (integer) 0
2)  1) "name"
    2) "some-other-group"
    3) "consumers"
    4) (integer) 1
    5) "pending"
    6) (integer) 0
    7) "last-delivered-id"
    8) "1638126028070-0"
    9) "entries-read"
   10) (integer) 1
   11) "lag"
   12) (integer) 1

As you can see in this and in the previous output, the XINFO command outputs a sequence of field-value items. Because it is an observability command this allows the human user to immediately understand what information is reported, and allows the command to report more information in the future by adding more fields without breaking compatibility with older clients. Other commands that must be more bandwidth efficient, like XPENDING, just report the information without the field names.

The output of the example above, where the GROUPS subcommand is used, should be clear observing the field names. We can check in more detail the state of a specific consumer group by checking the consumers that are registered in the group.

> XINFO CONSUMERS mystream mygroup
1) 1) name
   2) "Alice"
   3) pending
   4) (integer) 1
   5) idle
   6) (integer) 9104628
2) 1) name
   2) "Bob"
   3) pending
   4) (integer) 1
   5) idle
   6) (integer) 83841983

In case you do not remember the syntax of the command, just ask the command itself for help:

> XINFO HELP
1) XINFO <subcommand> [<arg> [value] [opt] ...]. Subcommands are:
2) CONSUMERS <key> <groupname>
3)     Show consumers of <groupname>.
4) GROUPS <key>
5)     Show the stream consumer groups.
6) STREAM <key> [FULL [COUNT <count>]
7)     Show information about the stream.
8) HELP
9)     Prints this help.

Differences with Kafka (TM) partitions

Consumer groups in Redis streams may resemble in some way Kafka (TM) partitioning-based consumer groups, however note that Redis streams are, in practical terms, very different. The partitions are only logical and the messages are just put into a single Redis key, so the way the different clients are served is based on who is ready to process new messages, and not from which partition clients are reading. For instance, if the consumer C3 at some point fails permanently, Redis will continue to serve C1 and C2 all the new messages arriving, as if now there are only two logical partitions.

Similarly, if a given consumer is much faster at processing messages than the other consumers, this consumer will receive proportionally more messages in the same unit of time. This is possible since Redis tracks all the unacknowledged messages explicitly, and remembers who received which message and the ID of the first message never delivered to any consumer.

However, this also means that in Redis if you really want to partition messages in the same stream into multiple Redis instances, you have to use multiple keys and some sharding system such as Redis Cluster or some other application-specific sharding system. A single Redis stream is not automatically partitioned to multiple instances.

We could say that schematically the following is true:

  • If you use 1 stream -> 1 consumer, you are processing messages in order.
  • If you use N streams with N consumers, so that only a given consumer hits a subset of the N streams, you can scale the above model of 1 stream -> 1 consumer.
  • If you use 1 stream -> N consumers, you are load balancing to N consumers, however in that case, messages about the same logical item may be consumed out of order, because a given consumer may process message 3 faster than another consumer is processing message 4.

So basically Kafka partitions are more similar to using N different Redis keys, while Redis consumer groups are a server-side load balancing system of messages from a given stream to N different consumers.

Capped Streams

Many applications do not want to collect data into a stream forever. Sometimes it is useful to have at maximum a given number of items inside a stream, other times once a given size is reached, it is useful to move data from Redis to a storage which is not in memory and not as fast but suited to store the history for, potentially, decades to come. Redis streams have some support for this. One is the MAXLEN option of the XADD command. This option is very simple to use:

> XADD mystream MAXLEN 2 * value 1
1526654998691-0
> XADD mystream MAXLEN 2 * value 2
1526654999635-0
> XADD mystream MAXLEN 2 * value 3
1526655000369-0
> XLEN mystream
(integer) 2
> XRANGE mystream - +
1) 1) 1526654999635-0
   2) 1) "value"
      2) "2"
2) 1) 1526655000369-0
   2) 1) "value"
      2) "3"

Using MAXLEN the old entries are automatically evicted when the specified length is reached, so that the stream is left at a constant size. There is currently no option to tell the stream to just retain items that are not older than a given period, because such command, in order to run consistently, would potentially block for a long time in order to evict items. Imagine for example what happens if there is an insertion spike, then a long pause, and another insertion, all with the same maximum time. The stream would block to evict the data that became too old during the pause. So it is up to the user to do some planning and understand what is the maximum stream length desired. Moreover, while the length of the stream is proportional to the memory used, trimming by time is less simple to control and anticipate: it depends on the insertion rate which often changes over time (and when it does not change, then to just trim by size is trivial).

However trimming with MAXLEN can be expensive: streams are represented by macro nodes into a radix tree, in order to be very memory efficient. Altering the single macro node, consisting of a few tens of elements, is not optimal. So it's possible to use the command in the following special form:

XADD mystream MAXLEN ~ 1000 * ... entry fields here ...

The ~ argument between the MAXLEN option and the actual count means, I don't really need this to be exactly 1000 items. It can be 1000 or 1010 or 1030, just make sure to save at least 1000 items. With this argument, the trimming is performed only when we can remove a whole node. This makes it much more efficient, and it is usually what you want.

There is also the XTRIM command, which performs something very similar to what the MAXLEN option does above, except that it can be run by itself:

> XTRIM mystream MAXLEN 10

Or, as for the XADD option:

> XTRIM mystream MAXLEN ~ 10

However, XTRIM is designed to accept different trimming strategies. Another trimming strategy is MINID, that evicts entries with IDs lower than the one specified.

As XTRIM is an explicit command, the user is expected to know about the possible shortcomings of different trimming strategies.

Another useful eviction strategy that may be added to XTRIM in the future, is to remove by a range of IDs to ease use of XRANGE and XTRIM to move data from Redis to other storage systems if needed.

Special IDs in the streams API

You may have noticed that there are several special IDs that can be used in the Redis API. Here is a short recap, so that they can make more sense in the future.

The first two special IDs are - and +, and are used in range queries with the XRANGE command. Those two IDs respectively mean the smallest ID possible (that is basically 0-1) and the greatest ID possible (that is 18446744073709551615-18446744073709551615). As you can see it is a lot cleaner to write - and + instead of those numbers.

Then there are APIs where we want to say, the ID of the item with the greatest ID inside the stream. This is what $ means. So for instance if I want only new entries with XREADGROUP I use this ID to signify I already have all the existing entries, but not the new ones that will be inserted in the future. Similarly when I create or set the ID of a consumer group, I can set the last delivered item to $ in order to just deliver new entries to the consumers in the group.

As you can see $ does not mean +, they are two different things, as + is the greatest ID possible in every possible stream, while $ is the greatest ID in a given stream containing given entries. Moreover APIs will usually only understand + or $, yet it was useful to avoid loading a given symbol with multiple meanings.

Another special ID is >, that is a special meaning only related to consumer groups and only when the XREADGROUP command is used. This special ID means that we want only entries that were never delivered to other consumers so far. So basically the > ID is the last delivered ID of a consumer group.

Finally the special ID *, that can be used only with the XADD command, means to auto select an ID for us for the new entry.

So we have -, +, $, > and *, and all have a different meaning, and most of the time, can be used in different contexts.

Persistence, replication and message safety

A Stream, like any other Redis data structure, is asynchronously replicated to replicas and persisted into AOF and RDB files. However what may not be so obvious is that also the consumer groups full state is propagated to AOF, RDB and replicas, so if a message is pending in the master, also the replica will have the same information. Similarly, after a restart, the AOF will restore the consumer groups' state.

However note that Redis streams and consumer groups are persisted and replicated using the Redis default replication, so:

  • AOF must be used with a strong fsync policy if persistence of messages is important in your application.
  • By default the asynchronous replication will not guarantee that XADD commands or consumer groups state changes are replicated: after a failover something can be missing depending on the ability of replicas to receive the data from the master.
  • The WAIT command may be used in order to force the propagation of the changes to a set of replicas. However note that while this makes it very unlikely that data is lost, the Redis failover process as operated by Sentinel or Redis Cluster performs only a best effort check to failover to the replica which is the most updated, and under certain specific failure conditions may promote a replica that lacks some data.

So when designing an application using Redis streams and consumer groups, make sure to understand the semantical properties your application should have during failures, and configure things accordingly, evaluating whether it is safe enough for your use case.

Removing single items from a stream

Streams also have a special command for removing items from the middle of a stream, just by ID. Normally for an append only data structure this may look like an odd feature, but it is actually useful for applications involving, for instance, privacy regulations. The command is called XDEL and receives the name of the stream followed by the IDs to delete:

> XRANGE mystream - + COUNT 2
1) 1) 1526654999635-0
   2) 1) "value"
      2) "2"
2) 1) 1526655000369-0
   2) 1) "value"
      2) "3"
> XDEL mystream 1526654999635-0
(integer) 1
> XRANGE mystream - + COUNT 2
1) 1) 1526655000369-0
   2) 1) "value"
      2) "3"

However in the current implementation, memory is not really reclaimed until a macro node is completely empty, so you should not abuse this feature.

Zero length streams

A difference between streams and other Redis data structures is that when the other data structures no longer have any elements, as a side effect of calling commands that remove elements, the key itself will be removed. So for instance, a sorted set will be completely removed when a call to ZREM will remove the last element in the sorted set. Streams, on the other hand, are allowed to stay at zero elements, both as a result of using a MAXLEN option with a count of zero (XADD and XTRIM commands), or because XDEL was called.

The reason why such an asymmetry exists is because Streams may have associated consumer groups, and we do not want to lose the state that the consumer groups defined just because there are no longer any items in the stream. Currently the stream is not deleted even when it has no associated consumer groups.

Total latency of consuming a message

Non blocking stream commands like XRANGE and XREAD or XREADGROUP without the BLOCK option are served synchronously like any other Redis command, so to discuss latency of such commands is meaningless: it is more interesting to check the time complexity of the commands in the Redis documentation. It should be enough to say that stream commands are at least as fast as sorted set commands when extracting ranges, and that XADD is very fast and can easily insert from half a million to one million items per second in an average machine if pipelining is used.

However latency becomes an interesting parameter if we want to understand the delay of processing a message, in the context of blocking consumers in a consumer group, from the moment the message is produced via XADD, to the moment the message is obtained by the consumer because XREADGROUP returned with the message.

How serving blocked consumers works

Before providing the results of performed tests, it is interesting to understand what model Redis uses in order to route stream messages (and in general actually how any blocking operation waiting for data is managed).

  • The blocked client is referenced in a hash table that maps keys for which there is at least one blocking consumer, to a list of consumers that are waiting for such key. This way, given a key that received data, we can resolve all the clients that are waiting for such data.
  • When a write happens, in this case when the XADD command is called, it calls the signalKeyAsReady() function. This function will put the key into a list of keys that need to be processed, because such keys may have new data for blocked consumers. Note that such ready keys will be processed later, so in the course of the same event loop cycle, it is possible that the key will receive other writes.
  • Finally, before returning into the event loop, the ready keys are finally processed. For each key the list of clients waiting for data is scanned, and if applicable, such clients will receive the new data that arrived. In the case of streams the data is the messages in the applicable range requested by the consumer.

As you can see, basically, before returning to the event loop both the client calling XADD and the clients blocked to consume messages, will have their reply in the output buffers, so the caller of XADD should receive the reply from Redis at about the same time the consumers will receive the new messages.

This model is push-based, since adding data to the consumers buffers will be performed directly by the action of calling XADD, so the latency tends to be quite predictable.

Latency tests results

In order to check these latency characteristics a test was performed using multiple instances of Ruby programs pushing messages having as an additional field the computer millisecond time, and Ruby programs reading the messages from the consumer group and processing them. The message processing step consisted of comparing the current computer time with the message timestamp, in order to understand the total latency.

Results obtained:

Processed between 0 and 1 ms -> 74.11%
Processed between 1 and 2 ms -> 25.80%
Processed between 2 and 3 ms -> 0.06%
Processed between 3 and 4 ms -> 0.01%
Processed between 4 and 5 ms -> 0.02%

So 99.9% of requests have a latency <= 2 milliseconds, with the outliers that remain still very close to the average.

Adding a few million unacknowledged messages to the stream does not change the gist of the benchmark, with most queries still processed with very short latency.

A few remarks:

  • Here we processed up to 10k messages per iteration, this means that the COUNT parameter of XREADGROUP was set to 10000. This adds a lot of latency but is needed in order to allow the slow consumers to be able to keep with the message flow. So you can expect a real world latency that is a lot smaller.
  • The system used for this benchmark is very slow compared to today's standards.

6 - Key eviction

Overview of Redis key eviction policies (LRU, LFU, etc.)

When Redis is used as a cache, it is often convenient to let it automatically evict old data as you add new data. This behavior is well known in the developer community, since it is the default behavior for the popular memcached system.

This page covers the more general topic of the Redis maxmemory directive used to limit the memory usage to a fixed amount. This page it also covers in depth the LRU eviction algorithm used by Redis, that is actually an approximation of the exact LRU.

Maxmemory configuration directive

The maxmemory configuration directive configures Redis to use a specified amount of memory for the data set. You can set the configuration directive using the redis.conf file, or later using the CONFIG SET command at runtime.

For example, to configure a memory limit of 100 megabytes, you can use the following directive inside the redis.conf file:

maxmemory 100mb

Setting maxmemory to zero results into no memory limits. This is the default behavior for 64 bit systems, while 32 bit systems use an implicit memory limit of 3GB.

When the specified amount of memory is reached, how eviction policies are configured determines the default behavior. Redis can return errors for commands that could result in more memory being used, or it can evict some old data to return back to the specified limit every time new data is added.

Eviction policies

The exact behavior Redis follows when the maxmemory limit is reached is configured using the maxmemory-policy configuration directive.

The following policies are available:

  • noeviction: New values aren’t saved when memory limit is reached. When a database uses replication, this applies to the primary database
  • allkeys-lru: Keeps most recently used keys; removes least recently used (LRU) keys
  • allkeys-lfu: Keeps frequently used keys; removes least frequently used (LFU) keys
  • volatile-lru: Removes least recently used keys with the expire field set to true.
  • volatile-lfu: Removes least frequently used keys with the expire field set to true.
  • allkeys-random: Randomly removes keys to make space for the new data added.
  • volatile-random: Randomly removes keys with expire field set to true.
  • volatile-ttl: Removes least frequently used keys with expire field set to true and the shortest remaining time-to-live (TTL) value.

The policies volatile-lru, volatile-lfu, volatile-random, and volatile-ttl behave like noeviction if there are no keys to evict matching the prerequisites.

Picking the right eviction policy is important depending on the access pattern of your application, however you can reconfigure the policy at runtime while the application is running, and monitor the number of cache misses and hits using the Redis INFO output to tune your setup.

In general as a rule of thumb:

  • Use the allkeys-lru policy when you expect a power-law distribution in the popularity of your requests. That is, you expect a subset of elements will be accessed far more often than the rest. This is a good pick if you are unsure.

  • Use the allkeys-random if you have a cyclic access where all the keys are scanned continuously, or when you expect the distribution to be uniform.

  • Use the volatile-ttl if you want to be able to provide hints to Redis about what are good candidate for expiration by using different TTL values when you create your cache objects.

The volatile-lru and volatile-random policies are mainly useful when you want to use a single instance for both caching and to have a set of persistent keys. However it is usually a better idea to run two Redis instances to solve such a problem.

It is also worth noting that setting an expire value to a key costs memory, so using a policy like allkeys-lru is more memory efficient since there is no need for an expire configuration for the key to be evicted under memory pressure.

How the eviction process works

It is important to understand that the eviction process works like this:

  • A client runs a new command, resulting in more data added.
  • Redis checks the memory usage, and if it is greater than the maxmemory limit , it evicts keys according to the policy.
  • A new command is executed, and so forth.

So we continuously cross the boundaries of the memory limit, by going over it, and then by evicting keys to return back under the limits.

If a command results in a lot of memory being used (like a big set intersection stored into a new key) for some time, the memory limit can be surpassed by a noticeable amount.

Approximated LRU algorithm

Redis LRU algorithm is not an exact implementation. This means that Redis is not able to pick the best candidate for eviction, that is, the access that was accessed the furthest in the past. Instead it will try to run an approximation of the LRU algorithm, by sampling a small number of keys, and evicting the one that is the best (with the oldest access time) among the sampled keys.

However, since Redis 3.0 the algorithm was improved to also take a pool of good candidates for eviction. This improved the performance of the algorithm, making it able to approximate more closely the behavior of a real LRU algorithm.

What is important about the Redis LRU algorithm is that you are able to tune the precision of the algorithm by changing the number of samples to check for every eviction. This parameter is controlled by the following configuration directive:

maxmemory-samples 5

The reason Redis does not use a true LRU implementation is because it costs more memory. However, the approximation is virtually equivalent for an application using Redis. The following is a graphical comparison of how the LRU approximation used by Redis compares with true LRU.

LRU comparison

The test to generate the above graphs filled a Redis server with a given number of keys. The keys were accessed from the first to the last. The first keys are the best candidates for eviction using an LRU algorithm. Later more 50% of keys are added, in order to force half of the old keys to be evicted.

You can see three kind of dots in the graphs, forming three distinct bands.

  • The light gray band are objects that were evicted.
  • The gray band are objects that were not evicted.
  • The green band are objects that were added.

In a theoretical LRU implementation we expect that, among the old keys, the first half will be expired. The Redis LRU algorithm will instead only probabilistically expire the older keys.

As you can see Redis 3.0 does a better job with 5 samples compared to Redis 2.8, however most objects that are among the latest accessed are still retained by Redis 2.8. Using a sample size of 10 in Redis 3.0 the approximation is very close to the theoretical performance of Redis 3.0.

Note that LRU is just a model to predict how likely a given key will be accessed in the future. Moreover, if your data access pattern closely resembles the power law, most of the accesses will be in the set of keys the LRU approximated algorithm can handle well.

In simulations we found that using a power law access pattern, the difference between true LRU and Redis approximation were minimal or non-existent.

However you can raise the sample size to 10 at the cost of some additional CPU usage to closely approximate true LRU, and check if this makes a difference in your cache misses rate.

To experiment in production with different values for the sample size by using the CONFIG SET maxmemory-samples <count> command, is very simple.

The new LFU mode

Starting with Redis 4.0, the Least Frequently Used eviction mode is available. This mode may work better (provide a better hits/misses ratio) in certain cases. In LFU mode, Redis will try to track the frequency of access of items, so the ones used rarely are evicted. This means the keys used often have an higher chance of remaining in memory.

To configure the LFU mode, the following policies are available:

  • volatile-lfu Evict using approximated LFU among the keys with an expire set.
  • allkeys-lfu Evict any key using approximated LFU.

LFU is approximated like LRU: it uses a probabilistic counter, called a Morris counter to estimate the object access frequency using just a few bits per object, combined with a decay period so that the counter is reduced over time. At some point we no longer want to consider keys as frequently accessed, even if they were in the past, so that the algorithm can adapt to a shift in the access pattern.

That information is sampled similarly to what happens for LRU (as explained in the previous section of this documentation) to select a candidate for eviction.

However unlike LRU, LFU has certain tunable parameters: for example, how fast should a frequent item lower in rank if it gets no longer accessed? It is also possible to tune the Morris counters range to better adapt the algorithm to specific use cases.

By default Redis is configured to:

  • Saturate the counter at, around, one million requests.
  • Decay the counter every one minute.

Those should be reasonable values and were tested experimental, but the user may want to play with these configuration settings to pick optimal values.

Instructions about how to tune these parameters can be found inside the example redis.conf file in the source distribution. Briefly, they are:

lfu-log-factor 10
lfu-decay-time 1

The decay time is the obvious one, it is the amount of minutes a counter should be decayed, when sampled and found to be older than that value. A special value of 0 means: always decay the counter every time is scanned, and is rarely useful.

The counter logarithm factor changes how many hits are needed to saturate the frequency counter, which is just in the range 0-255. The higher the factor, the more accesses are needed to reach the maximum. The lower the factor, the better is the resolution of the counter for low accesses, according to the following table:

+--------+------------+------------+------------+------------+------------+
| factor | 100 hits   | 1000 hits  | 100K hits  | 1M hits    | 10M hits   |
+--------+------------+------------+------------+------------+------------+
| 0      | 104        | 255        | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 1      | 18         | 49         | 255        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 10     | 10         | 18         | 142        | 255        | 255        |
+--------+------------+------------+------------+------------+------------+
| 100    | 8          | 11         | 49         | 143        | 255        |
+--------+------------+------------+------------+------------+------------+

So basically the factor is a trade off between better distinguishing items with low accesses VS distinguishing items with high accesses. More information is available in the example redis.conf file.

7 - High availability with Redis Sentinel

High availability for non-clustered Redis

Redis Sentinel provides high availability for Redis when not using Redis Cluster.

Redis Sentinel also provides other collateral tasks such as monitoring, notifications and acts as a configuration provider for clients.

This is the full list of Sentinel capabilities at a macroscopic level (i.e. the big picture):

  • Monitoring. Sentinel constantly checks if your master and replica instances are working as expected.
  • Notification. Sentinel can notify the system administrator, or other computer programs, via an API, that something is wrong with one of the monitored Redis instances.
  • Automatic failover. If a master is not working as expected, Sentinel can start a failover process where a replica is promoted to master, the other additional replicas are reconfigured to use the new master, and the applications using the Redis server are informed about the new address to use when connecting.
  • Configuration provider. Sentinel acts as a source of authority for clients service discovery: clients connect to Sentinels in order to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels will report the new address.

Sentinel as a distributed system

Redis Sentinel is a distributed system:

Sentinel itself is designed to run in a configuration where there are multiple Sentinel processes cooperating together. The advantage of having multiple Sentinel processes cooperating are the following:

  1. Failure detection is performed when multiple Sentinels agree about the fact a given master is no longer available. This lowers the probability of false positives.
  2. Sentinel works even if not all the Sentinel processes are working, making the system robust against failures. There is no fun in having a failover system which is itself a single point of failure, after all.

The sum of Sentinels, Redis instances (masters and replicas) and clients connecting to Sentinel and Redis, are also a larger distributed system with specific properties. In this document concepts will be introduced gradually starting from basic information needed in order to understand the basic properties of Sentinel, to more complex information (that are optional) in order to understand how exactly Sentinel works.

Sentinel quick start

Obtaining Sentinel

The current version of Sentinel is called Sentinel 2. It is a rewrite of the initial Sentinel implementation using stronger and simpler-to-predict algorithms (that are explained in this documentation).

A stable release of Redis Sentinel is shipped since Redis 2.8.

New developments are performed in the unstable branch, and new features sometimes are back ported into the latest stable branch as soon as they are considered to be stable.

Redis Sentinel version 1, shipped with Redis 2.6, is deprecated and should not be used.

Running Sentinel

If you are using the redis-sentinel executable (or if you have a symbolic link with that name to the redis-server executable) you can run Sentinel with the following command line:

redis-sentinel /path/to/sentinel.conf

Otherwise you can use directly the redis-server executable starting it in Sentinel mode:

redis-server /path/to/sentinel.conf --sentinel

Both ways work the same.

However it is mandatory to use a configuration file when running Sentinel, as this file will be used by the system in order to save the current state that will be reloaded in case of restarts. Sentinel will simply refuse to start if no configuration file is given or if the configuration file path is not writable.

Sentinels by default run listening for connections to TCP port 26379, so for Sentinels to work, port 26379 of your servers must be open to receive connections from the IP addresses of the other Sentinel instances. Otherwise Sentinels can't talk and can't agree about what to do, so failover will never be performed.

Fundamental things to know about Sentinel before deploying

  1. You need at least three Sentinel instances for a robust deployment.
  2. The three Sentinel instances should be placed into computers or virtual machines that are believed to fail in an independent way. So for example different physical servers or Virtual Machines executed on different availability zones.
  3. Sentinel + Redis distributed system does not guarantee that acknowledged writes are retained during failures, since Redis uses asynchronous replication. However there are ways to deploy Sentinel that make the window to lose writes limited to certain moments, while there are other less secure ways to deploy it.
  4. You need Sentinel support in your clients. Popular client libraries have Sentinel support, but not all.
  5. There is no HA setup which is safe if you don't test from time to time in development environments, or even better if you can, in production environments, if they work. You may have a misconfiguration that will become apparent only when it's too late (at 3am when your master stops working).
  6. Sentinel, Docker, or other forms of Network Address Translation or Port Mapping should be mixed with care: Docker performs port remapping, breaking Sentinel auto discovery of other Sentinel processes and the list of replicas for a master. Check the section about Sentinel and Docker later in this document for more information.

Configuring Sentinel

The Redis source distribution contains a file called sentinel.conf that is a self-documented example configuration file you can use to configure Sentinel, however a typical minimal configuration file looks like the following:

sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 60000
sentinel failover-timeout mymaster 180000
sentinel parallel-syncs mymaster 1

sentinel monitor resque 192.168.1.3 6380 4
sentinel down-after-milliseconds resque 10000
sentinel failover-timeout resque 180000
sentinel parallel-syncs resque 5

You only need to specify the masters to monitor, giving to each separated master (that may have any number of replicas) a different name. There is no need to specify replicas, which are auto-discovered. Sentinel will update the configuration automatically with additional information about replicas (in order to retain the information in case of restart). The configuration is also rewritten every time a replica is promoted to master during a failover and every time a new Sentinel is discovered.

The example configuration above basically monitors two sets of Redis instances, each composed of a master and an undefined number of replicas. One set of instances is called mymaster, and the other resque.

The meaning of the arguments of sentinel monitor statements is the following:

sentinel monitor <master-group-name> <ip> <port> <quorum>

For the sake of clarity, let's check line by line what the configuration options mean:

The first line is used to tell Redis to monitor a master called mymaster, that is at address 127.0.0.1 and port 6379, with a quorum of 2. Everything is pretty obvious but the quorum argument:

  • The quorum is the number of Sentinels that need to agree about the fact the master is not reachable, in order to really mark the master as failing, and eventually start a failover procedure if possible.
  • However the quorum is only used to detect the failure. In order to actually perform a failover, one of the Sentinels need to be elected leader for the failover and be authorized to proceed. This only happens with the vote of the majority of the Sentinel processes.

So for example if you have 5 Sentinel processes, and the quorum for a given master set to the value of 2, this is what happens:

  • If two Sentinels agree at the same time about the master being unreachable, one of the two will try to start a failover.
  • If there are at least a total of three Sentinels reachable, the failover will be authorized and will actually start.

In practical terms this means during failures Sentinel never starts a failover if the majority of Sentinel processes are unable to talk (aka no failover in the minority partition).

Other Sentinel options

The other options are almost always in the form:

sentinel <option_name> <master_name> <option_value>

And are used for the following purposes:

  • down-after-milliseconds is the time in milliseconds an instance should not be reachable (either does not reply to our PINGs or it is replying with an error) for a Sentinel starting to think it is down.
  • parallel-syncs sets the number of replicas that can be reconfigured to use the new master after a failover at the same time. The lower the number, the more time it will take for the failover process to complete, however if the replicas are configured to serve old data, you may not want all the replicas to re-synchronize with the master at the same time. While the replication process is mostly non blocking for a replica, there is a moment when it stops to load the bulk data from the master. You may want to make sure only one replica at a time is not reachable by setting this option to the value of 1.

Additional options are described in the rest of this document and documented in the example sentinel.conf file shipped with the Redis distribution.

Configuration parameters can be modified at runtime:

  • Master-specific configuration parameters are modified using SENTINEL SET.
  • Global configuration parameters are modified using SENTINEL CONFIG SET.

See the Reconfiguring Sentinel at runtime section for more information.

Example Sentinel deployments

Now that you know the basic information about Sentinel, you may wonder where you should place your Sentinel processes, how many Sentinel processes you need and so forth. This section shows a few example deployments.

We use ASCII art in order to show you configuration examples in a graphical format, this is what the different symbols means:

+--------------------+
| This is a computer |
| or VM that fails   |
| independently. We  |
| call it a "box"    |
+--------------------+

We write inside the boxes what they are running:

+-------------------+
| Redis master M1   |
| Redis Sentinel S1 |
+-------------------+

Different boxes are connected by lines, to show that they are able to talk:

+-------------+               +-------------+
| Sentinel S1 |---------------| Sentinel S2 |
+-------------+               +-------------+

Network partitions are shown as interrupted lines using slashes:

+-------------+                +-------------+
| Sentinel S1 |------ // ------| Sentinel S2 |
+-------------+                +-------------+

Also note that:

  • Masters are called M1, M2, M3, ..., Mn.
  • Replicas are called R1, R2, R3, ..., Rn (R stands for replica).
  • Sentinels are called S1, S2, S3, ..., Sn.
  • Clients are called C1, C2, C3, ..., Cn.
  • When an instance changes role because of Sentinel actions, we put it inside square brackets, so [M1] means an instance that is now a master because of Sentinel intervention.

Note that we will never show setups where just two Sentinels are used, since Sentinels always need to talk with the majority in order to start a failover.

Example 1: just two Sentinels, DON'T DO THIS

+----+         +----+
| M1 |---------| R1 |
| S1 |         | S2 |
+----+         +----+

Configuration: quorum = 1
  • In this setup, if the master M1 fails, R1 will be promoted since the two Sentinels can reach agreement about the failure (obviously with quorum set to 1) and can also authorize a failover because the majority is two. So apparently it could superficially work, however check the next points to see why this setup is broken.
  • If the box where M1 is running stops working, also S1 stops working. The Sentinel running in the other box S2 will not be able to authorize a failover, so the system will become not available.

Note that a majority is needed in order to order different failovers, and later propagate the latest configuration to all the Sentinels. Also note that the ability to failover in a single side of the above setup, without any agreement, would be very dangerous:

+----+           +------+
| M1 |----//-----| [M1] |
| S1 |           | S2   |
+----+           +------+

In the above configuration we created two masters (assuming S2 could failover without authorization) in a perfectly symmetrical way. Clients may write indefinitely to both sides, and there is no way to understand when the partition heals what configuration is the right one, in order to prevent a permanent split brain condition.

So please deploy at least three Sentinels in three different boxes always.

Example 2: basic setup with three boxes

This is a very simple setup, that has the advantage to be simple to tune for additional safety. It is based on three boxes, each box running both a Redis process and a Sentinel process.

       +----+
       | M1 |
       | S1 |
       +----+
          |
+----+    |    +----+
| R2 |----+----| R3 |
| S2 |         | S3 |
+----+         +----+

Configuration: quorum = 2

If the master M1 fails, S2 and S3 will agree about the failure and will be able to authorize a failover, making clients able to continue.

In every Sentinel setup, as Redis uses asynchronous replication, there is always the risk of losing some writes because a given acknowledged write may not be able to reach the replica which is promoted to master. However in the above setup there is a higher risk due to clients being partitioned away with an old master, like in the following picture:

         +----+
         | M1 |
         | S1 | <- C1 (writes will be lost)
         +----+
            |
            /
            /
+------+    |    +----+
| [M2] |----+----| R3 |
| S2   |         | S3 |
+------+         +----+

In this case a network partition isolated the old master M1, so the replica R2 is promoted to master. However clients, like C1, that are in the same partition as the old master, may continue to write data to the old master. This data will be lost forever since when the partition will heal, the master will be reconfigured as a replica of the new master, discarding its data set.

This problem can be mitigated using the following Redis replication feature, that allows to stop accepting writes if a master detects that it is no longer able to transfer its writes to the specified number of replicas.

min-replicas-to-write 1
min-replicas-max-lag 10

With the above configuration (please see the self-commented redis.conf example in the Redis distribution for more information) a Redis instance, when acting as a master, will stop accepting writes if it can't write to at least 1 replica. Since replication is asynchronous not being able to write actually means that the replica is either disconnected, or is not sending us asynchronous acknowledges for more than the specified max-lag number of seconds.

Using this configuration, the old Redis master M1 in the above example, will become unavailable after 10 seconds. When the partition heals, the Sentinel configuration will converge to the new one, the client C1 will be able to fetch a valid configuration and will continue with the new master.

However there is no free lunch. With this refinement, if the two replicas are down, the master will stop accepting writes. It's a trade off.

Example 3: Sentinel in the client boxes

Sometimes we have only two Redis boxes available, one for the master and one for the replica. The configuration in the example 2 is not viable in that case, so we can resort to the following, where Sentinels are placed where clients are:

            +----+         +----+
            | M1 |----+----| R1 |
            |    |    |    |    |
            +----+    |    +----+
                      |
         +------------+------------+
         |            |            |
         |            |            |
      +----+        +----+      +----+
      | C1 |        | C2 |      | C3 |
      | S1 |        | S2 |      | S3 |
      +----+        +----+      +----+

      Configuration: quorum = 2

In this setup, the point of view Sentinels is the same as the clients: if a master is reachable by the majority of the clients, it is fine. C1, C2, C3 here are generic clients, it does not mean that C1 identifies a single client connected to Redis. It is more likely something like an application server, a Rails app, or something like that.

If the box where M1 and S1 are running fails, the failover will happen without issues, however it is easy to see that different network partitions will result in different behaviors. For example Sentinel will not be able to setup if the network between the clients and the Redis servers is disconnected, since the Redis master and replica will both be unavailable.

Note that if C3 gets partitioned with M1 (hardly possible with the network described above, but more likely possible with different layouts, or because of failures at the software layer), we have a similar issue as described in Example 2, with the difference that here we have no way to break the symmetry, since there is just a replica and master, so the master can't stop accepting queries when it is disconnected from its replica, otherwise the master would never be available during replica failures.

So this is a valid setup but the setup in the Example 2 has advantages such as the HA system of Redis running in the same boxes as Redis itself which may be simpler to manage, and the ability to put a bound on the amount of time a master in the minority partition can receive writes.

Example 4: Sentinel client side with less than three clients

The setup described in the Example 3 cannot be used if there are less than three boxes in the client side (for example three web servers). In this case we need to resort to a mixed setup like the following:

            +----+         +----+
            | M1 |----+----| R1 |
            | S1 |    |    | S2 |
            +----+    |    +----+
                      |
               +------+-----+
               |            |
               |            |
            +----+        +----+
            | C1 |        | C2 |
            | S3 |        | S4 |
            +----+        +----+

      Configuration: quorum = 3

This is similar to the setup in Example 3, but here we run four Sentinels in the four boxes we have available. If the master M1 becomes unavailable the other three Sentinels will perform the failover.

In theory this setup works removing the box where C2 and S4 are running, and setting the quorum to 2. However it is unlikely that we want HA in the Redis side without having high availability in our application layer.

Sentinel, Docker, NAT, and possible issues

Docker uses a technique called port mapping: programs running inside Docker containers may be exposed with a different port compared to the one the program believes to be using. This is useful in order to run multiple containers using the same ports, at the same time, in the same server.

Docker is not the only software system where this happens, there are other Network Address Translation setups where ports may be remapped, and sometimes not ports but also IP addresses.

Remapping ports and addresses creates issues with Sentinel in two ways:

  1. Sentinel auto-discovery of other Sentinels no longer works, since it is based on hello messages where each Sentinel announce at which port and IP address they are listening for connection. However Sentinels have no way to understand that an address or port is remapped, so it is announcing an information that is not correct for other Sentinels to connect.
  2. Replicas are listed in the INFO output of a Redis master in a similar way: the address is detected by the master checking the remote peer of the TCP connection, while the port is advertised by the replica itself during the handshake, however the port may be wrong for the same reason as exposed in point 1.

Since Sentinels auto detect replicas using masters INFO output information, the detected replicas will not be reachable, and Sentinel will never be able to failover the master, since there are no good replicas from the point of view of the system, so there is currently no way to monitor with Sentinel a set of master and replica instances deployed with Docker, unless you instruct Docker to map the port 1:1.

For the first problem, in case you want to run a set of Sentinel instances using Docker with forwarded ports (or any other NAT setup where ports are remapped), you can use the following two Sentinel configuration directives in order to force Sentinel to announce a specific set of IP and port:

sentinel announce-ip <ip>
sentinel announce-port <port>

Note that Docker has the ability to run in host networking mode (check the --net=host option for more information). This should create no issues since ports are not remapped in this setup.

IP Addresses and DNS names

Older versions of Sentinel did not support host names and required IP addresses to be specified everywhere. Starting with version 6.2, Sentinel has optional support for host names.

This capability is disabled by default. If you're going to enable DNS/hostnames support, please note:

  1. The name resolution configuration on your Redis and Sentinel nodes must be reliable and be able to resolve addresses quickly. Unexpected delays in address resolution may have a negative impact on Sentinel.
  2. You should use hostnames everywhere and avoid mixing hostnames and IP addresses. To do that, use replica-announce-ip <hostname> and sentinel announce-ip <hostname> for all Redis and Sentinel instances, respectively.

Enabling the resolve-hostnames global configuration allows Sentinel to accept host names:

  • As part of a sentinel monitor command
  • As a replica address, if the replica uses a host name value for replica-announce-ip

Sentinel will accept host names as valid inputs and resolve them, but will still refer to IP addresses when announcing an instance, updating configuration files, etc.

Enabling the announce-hostnames global configuration makes Sentinel use host names instead. This affects replies to clients, values written in configuration files, the REPLICAOF command issued to replicas, etc.

This behavior may not be compatible with all Sentinel clients, that may explicitly expect an IP address.

Using host names may be useful when clients use TLS to connect to instances and require a name rather than an IP address in order to perform certificate ASN matching.

A quick tutorial

In the next sections of this document, all the details about Sentinel API, configuration and semantics will be covered incrementally. However for people that want to play with the system ASAP, this section is a tutorial that shows how to configure and interact with 3 Sentinel instances.

Here we assume that the instances are executed at port 5000, 5001, 5002. We also assume that you have a running Redis master at port 6379 with a replica running at port 6380. We will use the IPv4 loopback address 127.0.0.1 everywhere during the tutorial, assuming you are running the simulation on your personal computer.

The three Sentinel configuration files should look like the following:

port 5000
sentinel monitor mymaster 127.0.0.1 6379 2
sentinel down-after-milliseconds mymaster 5000
sentinel failover-timeout mymaster 60000
sentinel parallel-syncs mymaster 1

The other two configuration files will be identical but using 5001 and 5002 as port numbers.

A few things to note about the above configuration:

  • The master set is called mymaster. It identifies the master and its replicas. Since each master set has a different name, Sentinel can monitor different sets of masters and replicas at the same time.
  • The quorum was set to the value of 2 (last argument of sentinel monitor configuration directive).
  • The down-after-milliseconds value is 5000 milliseconds, that is 5 seconds, so masters will be detected as failing as soon as we don't receive any reply from our pings within this amount of time.

Once you start the three Sentinels, you'll see a few messages they log, like:

+monitor master mymaster 127.0.0.1 6379 quorum 2

This is a Sentinel event, and you can receive this kind of events via Pub/Sub if you SUBSCRIBE to the event name as specified later in Pub/Sub Messages section.

Sentinel generates and logs different events during failure detection and failover.

Asking Sentinel about the state of a master

The most obvious thing to do with Sentinel to get started, is check if the master it is monitoring is doing well:

$ redis-cli -p 5000
127.0.0.1:5000> sentinel master mymaster
 1) "name"
 2) "mymaster"
 3) "ip"
 4) "127.0.0.1"
 5) "port"
 6) "6379"
 7) "runid"
 8) "953ae6a589449c13ddefaee3538d356d287f509b"
 9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "735"
19) "last-ping-reply"
20) "735"
21) "down-after-milliseconds"
22) "5000"
23) "info-refresh"
24) "126"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "532439"
29) "config-epoch"
30) "1"
31) "num-slaves"
32) "1"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "60000"
39) "parallel-syncs"
40) "1"

As you can see, it prints a number of information about the master. There are a few that are of particular interest for us:

  1. num-other-sentinels is 2, so we know the Sentinel already detected two more Sentinels for this master. If you check the logs you'll see the +sentinel events generated.
  2. flags is just master. If the master was down we could expect to see s_down or o_down flag as well here.
  3. num-slaves is correctly set to 1, so Sentinel also detected that there is an attached replica to our master.

In order to explore more about this instance, you may want to try the following two commands:

SENTINEL replicas mymaster
SENTINEL sentinels mymaster

The first will provide similar information about the replicas connected to the master, and the second about the other Sentinels.

Obtaining the address of the current master

As we already specified, Sentinel also acts as a configuration provider for clients that want to connect to a set of master and replicas. Because of possible failovers or reconfigurations, clients have no idea about who is the currently active master for a given set of instances, so Sentinel exports an API to ask this question:

127.0.0.1:5000> SENTINEL get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6379"

Testing the failover

At this point our toy Sentinel deployment is ready to be tested. We can just kill our master and check if the configuration changes. To do so we can just do:

redis-cli -p 6379 DEBUG sleep 30

This command will make our master no longer reachable, sleeping for 30 seconds. It basically simulates a master hanging for some reason.

If you check the Sentinel logs, you should be able to see a lot of action:

  1. Each Sentinel detects the master is down with an +sdown event.
  2. This event is later escalated to +odown, which means that multiple Sentinels agree about the fact the master is not reachable.
  3. Sentinels vote a Sentinel that will start the first failover attempt.
  4. The failover happens.

If you ask again what is the current master address for mymaster, eventually we should get a different reply this time:

127.0.0.1:5000> SENTINEL get-master-addr-by-name mymaster
1) "127.0.0.1"
2) "6380"

So far so good... At this point you may jump to create your Sentinel deployment or can read more to understand all the Sentinel commands and internals.

Sentinel API

Sentinel provides an API in order to inspect its state, check the health of monitored masters and replicas, subscribe in order to receive specific notifications, and change the Sentinel configuration at run time.

By default Sentinel runs using TCP port 26379 (note that 6379 is the normal Redis port). Sentinels accept commands using the Redis protocol, so you can use redis-cli or any other unmodified Redis client in order to talk with Sentinel.

It is possible to directly query a Sentinel to check what is the state of the monitored Redis instances from its point of view, to see what other Sentinels it knows, and so forth. Alternatively, using Pub/Sub, it is possible to receive push style notifications from Sentinels, every time some event happens, like a failover, or an instance entering an error condition, and so forth.

Sentinel commands

The SENTINEL command is the main API for Sentinel. The following is the list of its subcommands (minimal version is noted for where applicable):

  • SENTINEL CONFIG GET <name> (>= 6.2) Get the current value of a global Sentinel configuration parameter. The specified name may be a wildcard, similar to the Redis CONFIG GET command.
  • SENTINEL CONFIG SET <name> <value> (>= 6.2) Set the value of a global Sentinel configuration parameter.
  • SENTINEL CKQUORUM <master name> Check if the current Sentinel configuration is able to reach the quorum needed to failover a master, and the majority needed to authorize the failover. This command should be used in monitoring systems to check if a Sentinel deployment is ok.
  • SENTINEL FLUSHCONFIG Force Sentinel to rewrite its configuration on disk, including the current Sentinel state. Normally Sentinel rewrites the configuration every time something changes in its state (in the context of the subset of the state which is persisted on disk across restart). However sometimes it is possible that the configuration file is lost because of operation errors, disk failures, package upgrade scripts or configuration managers. In those cases a way to force Sentinel to rewrite the configuration file is handy. This command works even if the previous configuration file is completely missing.
  • SENTINEL FAILOVER <master name> Force a failover as if the master was not reachable, and without asking for agreement to other Sentinels (however a new version of the configuration will be published so that the other Sentinels will update their configurations).
  • SENTINEL GET-MASTER-ADDR-BY-NAME <master name> Return the ip and port number of the master with that name. If a failover is in progress or terminated successfully for this master it returns the address and port of the promoted replica.
  • SENTINEL INFO-CACHE (>= 3.2) Return cached INFO output from masters and replicas.
  • SENTINEL IS-MASTER-DOWN-BY-ADDR Check if the master specified by ip:port is down from current Sentinel's point of view. This command is mostly for internal use.
  • SENTINEL MASTER <master name> Show the state and info of the specified master.
  • SENTINEL MASTERS Show a list of monitored masters and their state.
  • SENTINEL MONITOR Start Sentinel's monitoring. Refer to the Reconfiguring Sentinel at Runtime section for more information.
  • SENTINEL MYID (>= 6.2) Return the ID of the Sentinel instance.
  • SENTINEL PENDING-SCRIPTS This command returns information about pending scripts.
  • SENTINEL REMOVE Stop Sentinel's monitoring. Refer to the Reconfiguring Sentinel at Runtime section for more information.
  • SENTINEL REPLICAS <master name> (>= 5.0) Show a list of replicas for this master, and their state.
  • SENTINEL SENTINELS <master name> Show a list of sentinel instances for this master, and their state.
  • SENTINEL SET Set Sentinel's monitoring configuration. Refer to the Reconfiguring Sentinel at Runtime section for more information.
  • SENTINEL SIMULATE-FAILURE (crash-after-election|crash-after-promotion|help) (>= 3.2) This command simulates different Sentinel crash scenarios.
  • SENTINEL RESET <pattern> This command will reset all the masters with matching name. The pattern argument is a glob-style pattern. The reset process clears any previous state in a master (including a failover in progress), and removes every replica and sentinel already discovered and associated with the master.

For connection management and administration purposes, Sentinel supports the following subset of Redis' commands:

  • ACL (>= 6.2) This command manages the Sentinel Access Control List. For more information refer to the ACL documentation page and the Sentinel Access Control List authentication.
  • AUTH (>= 5.0.1) Authenticate a client connection. For more information refer to the AUTH command and the Configuring Sentinel instances with authentication section.
  • CLIENT This command manages client connections. For more information refer to its subcommands' pages.
  • COMMAND (>= 6.2) This command returns information about commands. For more information refer to the COMMAND command and its various subcommands.
  • HELLO (>= 6.0) Switch the connection's protocol. For more information refer to the HELLO command.
  • INFO Return information and statistics about the Sentinel server. For more information see the INFO command.
  • PING This command simply returns PONG.
  • ROLE This command returns the string "sentinel" and a list of monitored masters. For more information refer to the ROLE command.
  • SHUTDOWN Shut down the Sentinel instance.

Lastly, Sentinel also supports the SUBSCRIBE, UNSUBSCRIBE, PSUBSCRIBE and PUNSUBSCRIBE commands. Refer to the Pub/Sub Messages section for more details.

Reconfiguring Sentinel at Runtime

Starting with Redis version 2.8.4, Sentinel provides an API in order to add, remove, or change the configuration of a given master. Note that if you have multiple sentinels you should apply the changes to all to your instances for Redis Sentinel to work properly. This means that changing the configuration of a single Sentinel does not automatically propagate the changes to the other Sentinels in the network.

The following is a list of SENTINEL subcommands used in order to update the configuration of a Sentinel instance.

  • SENTINEL MONITOR <name> <ip> <port> <quorum> This command tells the Sentinel to start monitoring a new master with the specified name, ip, port, and quorum. It is identical to the sentinel monitor configuration directive in sentinel.conf configuration file, with the difference that you can't use a hostname in as ip, but you need to provide an IPv4 or IPv6 address.
  • SENTINEL REMOVE <name> is used in order to remove the specified master: the master will no longer be monitored, and will totally be removed from the internal state of the Sentinel, so it will no longer listed by SENTINEL masters and so forth.
  • SENTINEL SET <name> [<option> <value> ...] The SET command is very similar to the CONFIG SET command of Redis, and is used in order to change configuration parameters of a specific master. Multiple option / value pairs can be specified (or none at all). All the configuration parameters that can be configured via sentinel.conf are also configurable using the SET command.

The following is an example of SENTINEL SET command in order to modify the down-after-milliseconds configuration of a master called objects-cache:

SENTINEL SET objects-cache-master down-after-milliseconds 1000

As already stated, SENTINEL SET can be used to set all the configuration parameters that are settable in the startup configuration file. Moreover it is possible to change just the master quorum configuration without removing and re-adding the master with SENTINEL REMOVE followed by SENTINEL MONITOR, but simply using:

SENTINEL SET objects-cache-master quorum 5

Note that there is no equivalent GET command since SENTINEL MASTER provides all the configuration parameters in a simple to parse format (as a field/value pairs array).

Starting with Redis version 6.2, Sentinel also allows getting and setting global configuration parameters which were only supported in the configuration file prior to that.

  • SENTINEL CONFIG GET <name> Get the current value of a global Sentinel configuration parameter. The specified name may be a wildcard, similar to the Redis CONFIG GET command.
  • SENTINEL CONFIG SET <name> <value> Set the value of a global Sentinel configuration parameter.

Global parameters that can be manipulated include:

Adding or removing Sentinels

Adding a new Sentinel to your deployment is a simple process because of the auto-discover mechanism implemented by Sentinel. All you need to do is to start the new Sentinel configured to monitor the currently active master. Within 10 seconds the Sentinel will acquire the list of other Sentinels and the set of replicas attached to the master.

If you need to add multiple Sentinels at once, it is suggested to add it one after the other, waiting for all the other Sentinels to already know about the first one before adding the next. This is useful in order to still guarantee that majority can be achieved only in one side of a partition, in the chance failures should happen in the process of adding new Sentinels.

This can be easily achieved by adding every new Sentinel with a 30 seconds delay, and during absence of network partitions.

At the end of the process it is possible to use the command SENTINEL MASTER mastername in order to check if all the Sentinels agree about the total number of Sentinels monitoring the master.

Removing a Sentinel is a bit more complex: Sentinels never forget already seen Sentinels, even if they are not reachable for a long time, since we don't want to dynamically change the majority needed to authorize a failover and the creation of a new configuration number. So in order to remove a Sentinel the following steps should be performed in absence of network partitions:

  1. Stop the Sentinel process of the Sentinel you want to remove.
  2. Send a SENTINEL RESET * command to all the other Sentinel instances (instead of * you can use the exact master name if you want to reset just a single master). One after the other, waiting at least 30 seconds between instances.
  3. Check that all the Sentinels agree about the number of Sentinels currently active, by inspecting the output of SENTINEL MASTER mastername of every Sentinel.

Removing the old master or unreachable replicas

Sentinels never forget about replicas of a given master, even when they are unreachable for a long time. This is useful, because Sentinels should be able to correctly reconfigure a returning replica after a network partition or a failure event.

Moreover, after a failover, the failed over master is virtually added as a replica of the new master, this way it will be reconfigured to replicate with the new master as soon as it will be available again.

However sometimes you want to remove a replica (that may be the old master) forever from the list of replicas monitored by Sentinels.

In order to do this, you need to send a SENTINEL RESET mastername command to all the Sentinels: they'll refresh the list of replicas within the next 10 seconds, only adding the ones listed as correctly replicating from the current master INFO output.

Pub/Sub messages

A client can use a Sentinel as a Redis-compatible Pub/Sub server (but you can't use PUBLISH) in order to SUBSCRIBE or PSUBSCRIBE to channels and get notified about specific events.

The channel name is the same as the name of the event. For instance the channel named +sdown will receive all the notifications related to instances entering an SDOWN (SDOWN means the instance is no longer reachable from the point of view of the Sentinel you are querying) condition.

To get all the messages simply subscribe using PSUBSCRIBE *.

The following is a list of channels and message formats you can receive using this API. The first word is the channel / event name, the rest is the format of the data.

Note: where instance details is specified it means that the following arguments are provided to identify the target instance:

<instance-type> <name> <ip> <port> @ <master-name> <master-ip> <master-port>

The part identifying the master (from the @ argument to the end) is optional and is only specified if the instance is not a master itself.

  • +reset-master <instance details> -- The master was reset.
  • +slave <instance details> -- A new replica was detected and attached.
  • +failover-state-reconf-slaves <instance details> -- Failover state changed to reconf-slaves state.
  • +failover-detected <instance details> -- A failover started by another Sentinel or any other external entity was detected (An attached replica turned into a master).
  • +slave-reconf-sent <instance details> -- The leader sentinel sent the REPLICAOF command to this instance in order to reconfigure it for the new replica.
  • +slave-reconf-inprog <instance details> -- The replica being reconfigured showed to be a replica of the new master ip:port pair, but the synchronization process is not yet complete.
  • +slave-reconf-done <instance details> -- The replica is now synchronized with the new master.
  • -dup-sentinel <instance details> -- One or more sentinels for the specified master were removed as duplicated (this happens for instance when a Sentinel instance is restarted).
  • +sentinel <instance details> -- A new sentinel for this master was detected and attached.
  • +sdown <instance details> -- The specified instance is now in Subjectively Down state.
  • -sdown <instance details> -- The specified instance is no longer in Subjectively Down state.
  • +odown <instance details> -- The specified instance is now in Objectively Down state.
  • -odown <instance details> -- The specified instance is no longer in Objectively Down state.
  • +new-epoch <instance details> -- The current epoch was updated.
  • +try-failover <instance details> -- New failover in progress, waiting to be elected by the majority.
  • +elected-leader <instance details> -- Won the election for the specified epoch, can do the failover.
  • +failover-state-select-slave <instance details> -- New failover state is select-slave: we are trying to find a suitable replica for promotion.
  • no-good-slave <instance details> -- There is no good replica to promote. Currently we'll try after some time, but probably this will change and the state machine will abort the failover at all in this case.
  • selected-slave <instance details> -- We found the specified good replica to promote.
  • failover-state-send-slaveof-noone <instance details> -- We are trying to reconfigure the promoted replica as master, waiting for it to switch.
  • failover-end-for-timeout <instance details> -- The failover terminated for timeout, replicas will eventually be configured to replicate with the new master anyway.
  • failover-end <instance details> -- The failover terminated with success. All the replicas appears to be reconfigured to replicate with the new master.
  • switch-master <master name> <oldip> <oldport> <newip> <newport> -- The master new IP and address is the specified one after a configuration change. This is the message most external users are interested in.
  • +tilt -- Tilt mode entered.
  • -tilt -- Tilt mode exited.

Handling of -BUSY state

The -BUSY error is returned by a Redis instance when a Lua script is running for more time than the configured Lua script time limit. When this happens before triggering a fail over Redis Sentinel will try to send a SCRIPT KILL command, that will only succeed if the script was read-only.

If the instance is still in an error condition after this try, it will eventually be failed over.

Replicas priority

Redis instances have a configuration parameter called replica-priority. This information is exposed by Redis replica instances in their INFO output, and Sentinel uses it in order to pick a replica among the ones that can be used in order to failover a master:

  1. If the replica priority is set to 0, the replica is never promoted to master.
  2. Replicas with a lower priority number are preferred by Sentinel.

For example if there is a replica S1 in the same data center of the current master, and another replica S2 in another data center, it is possible to set S1 with a priority of 10 and S2 with a priority of 100, so that if the master fails and both S1 and S2 are available, S1 will be preferred.

For more information about the way replicas are selected, please check the Replica selection and priority section of this documentation.

Sentinel and Redis authentication

When the master is configured to require authentication from clients, as a security measure, replicas need to also be aware of the credentials in order to authenticate with the master and create the master-replica connection used for the asynchronous replication protocol.

Redis Access Control List authentication

Starting with Redis 6, user authentication and permission is managed with the Access Control List (ACL).

In order for Sentinels to connect to Redis server instances when they are configured with ACL, the Sentinel configuration must include the following directives:

sentinel auth-user <master-group-name> <username>
sentinel auth-pass <master-group-name> <password>

Where <username> and <password> are the username and password for accessing the group's instances. These credentials should be provisioned on all of the group's Redis instances with the minimal control permissions. For example:

127.0.0.1:6379> ACL SETUSER sentinel-user ON >somepassword allchannels +multi +slaveof +ping +exec +subscribe +config|rewrite +role +publish +info +client|setname +client|kill +script|kill

Redis password-only authentication

Until Redis 6, authentication is achieved using the following configuration directives:

  • requirepass in the master, in order to set the authentication password, and to make sure the instance will not process requests for non authenticated clients.
  • masterauth in the replicas in order for the replicas to authenticate with the master in order to correctly replicate data from it.

When Sentinel is used, there is not a single master, since after a failover replicas may play the role of masters, and old masters can be reconfigured in order to act as replicas, so what you want to do is to set the above directives in all your instances, both masters and replicas.

This is also usually a sane setup since you don't want to protect data only in the master, having the same data accessible in the replicas.

However, in the uncommon case where you need a replica that is accessible without authentication, you can still do it by setting up a replica priority of zero, to prevent this replica from being promoted to master, and configuring in this replica only the masterauth directive, without using the requirepass directive, so that data will be readable by unauthenticated clients.

In order for Sentinels to connect to Redis server instances when they are configured with requirepass, the Sentinel configuration must include the sentinel auth-pass directive, in the format:

sentinel auth-pass <master-group-name> <password>

Configuring Sentinel instances with authentication

Sentinel instances themselves can be secured by requiring clients to authenticate via the AUTH command. Starting with Redis 6.2, the Access Control List (ACL) is available, whereas previous versions (starting with Redis 5.0.1) support password-only authentication.

Note that Sentinel's authentication configuration should be applied to each of the instances in your deployment, and all instances should use the same configuration. Furthermore, ACL and password-only authentication should not be used together.

Sentinel Access Control List authentication

The first step in securing a Sentinel instance with ACL is preventing any unauthorized access to it. To do that, you'll need to disable the default superuser (or at the very least set it up with a strong password) and create a new one and allow it access to Pub/Sub channels:

127.0.0.1:5000> ACL SETUSER admin ON >admin-password allchannels +@all
OK
127.0.0.1:5000> ACL SETUSER default off
OK

The default user is used by Sentinel to connect to other instances. You can provide the credentials of another superuser with the following configuration directives:

sentinel sentinel-user <username>
sentinel sentinel-pass <password>

Where <username> and <password> are the Sentinel's superuser and password, respectively (e.g. admin and admin-password in the example above).

Lastly, for authenticating incoming client connections, you can create a Sentinel restricted user profile such as the following:

127.0.0.1:5000> ACL SETUSER sentinel-user ON >user-password -@all +auth +client|getname +client|id +client|setname +command +hello +ping +role +sentinel|get-master-addr-by-name +sentinel|master +sentinel|myid +sentinel|replicas +sentinel|sentinels

Refer to the documentation of your Sentinel client of choice for further information.

Sentinel password-only authentication

To use Sentinel with password-only authentication, add the requirepass configuration directive to all your Sentinel instances as follows:

requirepass "your_password_here"

When configured this way, Sentinels will do two things:

  1. A password will be required from clients in order to send commands to Sentinels. This is obvious since this is how such configuration directive works in Redis in general.
  2. Moreover the same password configured to access the local Sentinel, will be used by this Sentinel instance in order to authenticate to all the other Sentinel instances it connects to.

This means that you will have to configure the same requirepass password in all the Sentinel instances. This way every Sentinel can talk with every other Sentinel without any need to configure for each Sentinel the password to access all the other Sentinels, that would be very impractical.

Before using this configuration, make sure your client library can send the AUTH command to Sentinel instances.

Sentinel clients implementation


Sentinel requires explicit client support, unless the system is configured to execute a script that performs a transparent redirection of all the requests to the new master instance (virtual IP or other similar systems). The topic of client libraries implementation is covered in the document Sentinel clients guidelines.

More advanced concepts

In the following sections we'll cover a few details about how Sentinel works, without resorting to implementation details and algorithms that will be covered in the final part of this document.

SDOWN and ODOWN failure state

Redis Sentinel has two different concepts of being down, one is called a Subjectively Down condition (SDOWN) and is a down condition that is local to a given Sentinel instance. Another is called Objectively Down condition (ODOWN) and is reached when enough Sentinels (at least the number configured as the quorum parameter of the monitored master) have an SDOWN condition, and get feedback from other Sentinels using the SENTINEL is-master-down-by-addr command.

From the point of view of a Sentinel an SDOWN condition is reached when it does not receive a valid reply to PING requests for the number of seconds specified in the configuration as is-master-down-after-milliseconds parameter.

An acceptable reply to PING is one of the following:

  • PING replied with +PONG.
  • PING replied with -LOADING error.
  • PING replied with -MASTERDOWN error.

Any other reply (or no reply at all) is considered non valid. However note that a logical master that advertises itself as a replica in the INFO output is considered to be down.

Note that SDOWN requires that no acceptable reply is received for the whole interval configured, so for instance if the interval is 30000 milliseconds (30 seconds) and we receive an acceptable ping reply every 29 seconds, the instance is considered to be working.

SDOWN is not enough to trigger a failover: it only means a single Sentinel believes a Redis instance is not available. To trigger a failover, the ODOWN state must be reached.

To switch from SDOWN to ODOWN no strong consensus algorithm is used, but just a form of gossip: if a given Sentinel gets reports that a master is not working from enough Sentinels in a given time range, the SDOWN is promoted to ODOWN. If this acknowledge is later missing, the flag is cleared.

A more strict authorization that uses an actual majority is required in order to really start the failover, but no failover can be triggered without reaching the ODOWN state.

The ODOWN condition only applies to masters. For other kind of instances Sentinel doesn't require to act, so the ODOWN state is never reached for replicas and other sentinels, but only SDOWN is.

However SDOWN has also semantic implications. For example a replica in SDOWN state is not selected to be promoted by a Sentinel performing a failover.

Sentinels and replicas auto discovery

Sentinels stay connected with other Sentinels in order to reciprocally check the availability of each other, and to exchange messages. However you don't need to configure a list of other Sentinel addresses in every Sentinel instance you run, as Sentinel uses the Redis instances Pub/Sub capabilities in order to discover the other Sentinels that are monitoring the same masters and replicas.

This feature is implemented by sending hello messages into the channel named __sentinel__:hello.

Similarly you don't need to configure what is the list of the replicas attached to a master, as Sentinel will auto discover this list querying Redis.

  • Every Sentinel publishes a message to every monitored master and replica Pub/Sub channel __sentinel__:hello, every two seconds, announcing its presence with ip, port, runid.
  • Every Sentinel is subscribed to the Pub/Sub channel __sentinel__:hello of every master and replica, looking for unknown sentinels. When new sentinels are detected, they are added as sentinels of this master.
  • Hello messages also include the full current configuration of the master. If the receiving Sentinel has a configuration for a given master which is older than the one received, it updates to the new configuration immediately.
  • Before adding a new sentinel to a master a Sentinel always checks if there is already a sentinel with the same runid or the same address (ip and port pair). In that case all the matching sentinels are removed, and the new added.

Sentinel reconfiguration of instances outside the failover procedure

Even when no failover is in progress, Sentinels will always try to set the current configuration on monitored instances. Specifically:

  • Replicas (according to the current configuration) that claim to be masters, will be configured as replicas to replicate with the current master.
  • Replicas connected to a wrong master, will be reconfigured to replicate with the right master.

For Sentinels to reconfigure replicas, the wrong configuration must be observed for some time, that is greater than the period used to broadcast new configurations.

This prevents Sentinels with a stale configuration (for example because they just rejoined from a partition) will try to change the replicas configuration before receiving an update.

Also note how the semantics of always trying to impose the current configuration makes the failover more resistant to partitions:

  • Masters failed over are reconfigured as replicas when they return available.
  • Replicas partitioned away during a partition are reconfigured once reachable.

The important lesson to remember about this section is: Sentinel is a system where each process will always try to impose the last logical configuration to the set of monitored instances.

Replica selection and priority

When a Sentinel instance is ready to perform a failover, since the master is in ODOWN state and the Sentinel received the authorization to failover from the majority of the Sentinel instances known, a suitable replica needs to be selected.

The replica selection process evaluates the following information about replicas:

  1. Disconnection time from the master.
  2. Replica priority.
  3. Replication offset processed.
  4. Run ID.

A replica that is found to be disconnected from the master for more than ten times the configured master timeout (down-after-milliseconds option), plus the time the master is also not available from the point of view of the Sentinel doing the failover, is considered to be not suitable for the failover and is skipped.

In more rigorous terms, a replica whose the INFO output suggests it has been disconnected from the master for more than:

(down-after-milliseconds * 10) + milliseconds_since_master_is_in_SDOWN_state

Is considered to be unreliable and is disregarded entirely.

The replica selection only considers the replicas that passed the above test, and sorts it based on the above criteria, in the following order.

  1. The replicas are sorted by replica-priority as configured in the redis.conf file of the Redis instance. A lower priority will be preferred.
  2. If the priority is the same, the replication offset processed by the replica is checked, and the replica that received more data from the master is selected.
  3. If multiple replicas have the same priority and processed the same data from the master, a further check is performed, selecting the replica with the lexicographically smaller run ID. Having a lower run ID is not a real advantage for a replica, but is useful in order to make the process of replica selection more deterministic, instead of resorting to select a random replica.

In most cases, replica-priority does not need to be set explicitly so all instances will use the same default value. If there is a particular fail-over preference, replica-priority must be set on all instances, including masters, as a master may become a replica at some future point in time - and it will then need the proper replica-priority settings.

A Redis instance can be configured with a special replica-priority of zero in order to be never selected by Sentinels as the new master. However a replica configured in this way will still be reconfigured by Sentinels in order to replicate with the new master after a failover, the only difference is that it will never become a master itself.

Algorithms and internals

In the following sections we will explore the details of Sentinel behavior. It is not strictly needed for users to be aware of all the details, but a deep understanding of Sentinel may help to deploy and operate Sentinel in a more effective way.

Quorum

The previous sections showed that every master monitored by Sentinel is associated to a configured quorum. It specifies the number of Sentinel processes that need to agree about the unreachability or error condition of the master in order to trigger a failover.

However, after the failover is triggered, in order for the failover to actually be performed, at least a majority of Sentinels must authorize the Sentinel to failover. Sentinel never performs a failover in the partition where a minority of Sentinels exist.

Let's try to make things a bit more clear:

  • Quorum: the number of Sentinel processes that need to detect an error condition in order for a master to be flagged as ODOWN.
  • The failover is triggered by the ODOWN state.
  • Once the failover is triggered, the Sentinel trying to failover is required to ask for authorization to a majority of Sentinels (or more than the majority if the quorum is set to a number greater than the majority).

The difference may seem subtle but is actually quite simple to understand and use. For example if you have 5 Sentinel instances, and the quorum is set to 2, a failover will be triggered as soon as 2 Sentinels believe that the master is not reachable, however one of the two Sentinels will be able to failover only if it gets authorization at least from 3 Sentinels.

If instead the quorum is configured to 5, all the Sentinels must agree about the master error condition, and the authorization from all Sentinels is required in order to failover.

This means that the quorum can be used to tune Sentinel in two ways:

  1. If a quorum is set to a value smaller than the majority of Sentinels we deploy, we are basically making Sentinel more sensitive to master failures, triggering a failover as soon as even just a minority of Sentinels is no longer able to talk with the master.
  2. If a quorum is set to a value greater than the majority of Sentinels, we are making Sentinel able to failover only when there are a very large number (larger than majority) of well connected Sentinels which agree about the master being down.

Configuration epochs

Sentinels require to get authorizations from a majority in order to start a failover for a few important reasons:

When a Sentinel is authorized, it gets a unique configuration epoch for the master it is failing over. This is a number that will be used to version the new configuration after the failover is completed. Because a majority agreed that a given version was assigned to a given Sentinel, no other Sentinel will be able to use it. This means that every configuration of every failover is versioned with a unique version. We'll see why this is so important.

Moreover Sentinels have a rule: if a Sentinel voted another Sentinel for the failover of a given master, it will wait some time to try to failover the same master again. This delay is the 2 * failover-timeout you can configure in sentinel.conf. This means that Sentinels will not try to failover the same master at the same time, the first to ask to be authorized will try, if it fails another will try after some time, and so forth.

Redis Sentinel guarantees the liveness property that if a majority of Sentinels are able to talk, eventually one will be authorized to failover if the master is down.

Redis Sentinel also guarantees the safety property that every Sentinel will failover the same master using a different configuration epoch.

Configuration propagation

Once a Sentinel is able to failover a master successfully, it will start to broadcast the new configuration so that the other Sentinels will update their information about a given master.

For a failover to be considered successful, it requires that the Sentinel was able to send the REPLICAOF NO ONE command to the selected replica, and that the switch to master was later observed in the INFO output of the master.

At this point, even if the reconfiguration of the replicas is in progress, the failover is considered to be successful, and all the Sentinels are required to start reporting the new configuration.

The way a new configuration is propagated is the reason why we need that every Sentinel failover is authorized with a different version number (configuration epoch).

Every Sentinel continuously broadcast its version of the configuration of a master using Redis Pub/Sub messages, both in the master and all the replicas. At the same time all the Sentinels wait for messages to see what is the configuration advertised by the other Sentinels.

Configurations are broadcast in the __sentinel__:hello Pub/Sub channel.

Because every configuration has a different version number, the greater version always wins over smaller versions.

So for example the configuration for the master mymaster start with all the Sentinels believing the master is at 192.168.1.50:6379. This configuration has version 1. After some time a Sentinel is authorized to failover with version 2. If the failover is successful, it will start to broadcast a new configuration, let's say 192.168.1.50:9000, with version 2. All the other instances will see this configuration and will update their configuration accordingly, since the new configuration has a greater version.

This means that Sentinel guarantees a second liveness property: a set of Sentinels that are able to communicate will all converge to the same configuration with the higher version number.

Basically if the net is partitioned, every partition will converge to the higher local configuration. In the special case of no partitions, there is a single partition and every Sentinel will agree about the configuration.

Consistency under partitions

Redis Sentinel configurations are eventually consistent, so every partition will converge to the higher configuration available. However in a real-world system using Sentinel there are three different players:

  • Redis instances.
  • Sentinel instances.
  • Clients.

In order to define the behavior of the system we have to consider all three.

The following is a simple network where there are 3 nodes, each running a Redis instance, and a Sentinel instance:

            +-------------+
            | Sentinel 1  |----- Client A
            | Redis 1 (M) |
            +-------------+
                    |
                    |
+-------------+     |          +------------+
| Sentinel 2  |-----+-- // ----| Sentinel 3 |----- Client B
| Redis 2 (S) |                | Redis 3 (M)|
+-------------+                +------------+

In this system the original state was that Redis 3 was the master, while Redis 1 and 2 were replicas. A partition occurred isolating the old master. Sentinels 1 and 2 started a failover promoting Sentinel 1 as the new master.

The Sentinel properties guarantee that Sentinel 1 and 2 now have the new configuration for the master. However Sentinel 3 has still the old configuration since it lives in a different partition.

We know that Sentinel 3 will get its configuration updated when the network partition will heal, however what happens during the partition if there are clients partitioned with the old master?

Clients will be still able to write to Redis 3, the old master. When the partition will rejoin, Redis 3 will be turned into a replica of Redis 1, and all the data written during the partition will be lost.

Depending on your configuration you may want or not that this scenario happens:

  • If you are using Redis as a cache, it could be handy that Client B is still able to write to the old master, even if its data will be lost.
  • If you are using Redis as a store, this is not good and you need to configure the system in order to partially prevent this problem.

Since Redis is asynchronously replicated, there is no way to totally prevent data loss in this scenario, however you can bound the divergence between Redis 3 and Redis 1 using the following Redis configuration option:

min-replicas-to-write 1
min-replicas-max-lag 10

With the above configuration (please see the self-commented redis.conf example in the Redis distribution for more information) a Redis instance, when acting as a master, will stop accepting writes if it can't write to at least 1 replica. Since replication is asynchronous not being able to write actually means that the replica is either disconnected, or is not sending us asynchronous acknowledges for more than the specified max-lag number of seconds.

Using this configuration the Redis 3 in the above example will become unavailable after 10 seconds. When the partition heals, the Sentinel 3 configuration will converge to the new one, and Client B will be able to fetch a valid configuration and continue.

In general Redis + Sentinel as a whole are an eventually consistent system where the merge function is last failover wins, and the data from old masters are discarded to replicate the data of the current master, so there is always a window for losing acknowledged writes. This is due to Redis asynchronous replication and the discarding nature of the "virtual" merge function of the system. Note that this is not a limitation of Sentinel itself, and if you orchestrate the failover with a strongly consistent replicated state machine, the same properties will still apply. There are only two ways to avoid losing acknowledged writes:

  1. Use synchronous replication (and a proper consensus algorithm to run a replicated state machine).
  2. Use an eventually consistent system where different versions of the same object can be merged.

Redis currently is not able to use any of the above systems, and is currently outside the development goals. However there are proxies implementing solution "2" on top of Redis stores such as SoundCloud Roshi, or Netflix Dynomite.

Sentinel persistent state

Sentinel state is persisted in the sentinel configuration file. For example every time a new configuration is received, or created (leader Sentinels), for a master, the configuration is persisted on disk together with the configuration epoch. This means that it is safe to stop and restart Sentinel processes.

TILT mode

Redis Sentinel is heavily dependent on the computer time: for instance in order to understand if an instance is available it remembers the time of the latest successful reply to the PING command, and compares it with the current time to understand how old it is.

However if the computer time changes in an unexpected way, or if the computer is very busy, or the process blocked for some reason, Sentinel may start to behave in an unexpected way.

The TILT mode is a special "protection" mode that a Sentinel can enter when something odd is detected that can lower the reliability of the system. The Sentinel timer interrupt is normally called 10 times per second, so we expect that more or less 100 milliseconds will elapse between two calls to the timer interrupt.

What a Sentinel does is to register the previous time the timer interrupt was called, and compare it with the current call: if the time difference is negative or unexpectedly big (2 seconds or more) the TILT mode is entered (or if it was already entered the exit from the TILT mode postponed).

When in TILT mode the Sentinel will continue to monitor everything, but:

  • It stops acting at all.
  • It starts to reply negatively to SENTINEL is-master-down-by-addr requests as the ability to detect a failure is no longer trusted.

If everything appears to be normal for 30 second, the TILT mode is exited.

In the Sentinel TILT mode, if we send the INFO command, we could get the following response:

$ redis-cli -p 26379
127.0.0.1:26379> info
(Other information from Sentinel server skipped.)

# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=127.0.0.1:6379,slaves=0,sentinels=1

The field "sentinel_tilt_since_seconds" indicates how many seconds the Sentinel already is in the TILT mode. If it is not in TILT mode, the value will be -1.

Note that in some ways TILT mode could be replaced using the monotonic clock API that many kernels offer. However it is not still clear if this is a good solution since the current system avoids issues in case the process is just suspended or not executed by the scheduler for a long time.

A note about the word slave used in this man page: Starting with Redis 5, if not for backward compatibility, the Redis project no longer uses the word slave. Unfortunately in this command the word slave is part of the protocol, so we'll be able to remove such occurrences only when this API will be naturally deprecated.

8 - Redis keyspace notifications

Monitor changes to Redis keys and values in real time

Keyspace notifications allow clients to subscribe to Pub/Sub channels in order to receive events affecting the Redis data set in some way.

Examples of events that can be received are:

  • All the commands affecting a given key.
  • All the keys receiving an LPUSH operation.
  • All the keys expiring in the database 0.

Note: Redis Pub/Sub is fire and forget that is, if your Pub/Sub client disconnects, and reconnects later, all the events delivered during the time the client was disconnected are lost.

Type of events

Keyspace notifications are implemented by sending two distinct types of events for every operation affecting the Redis data space. For instance a DEL operation targeting the key named mykey in database 0 will trigger the delivering of two messages, exactly equivalent to the following two PUBLISH commands:

PUBLISH __keyspace@0__:mykey del
PUBLISH __keyevent@0__:del mykey

The first channel listens to all the events targeting the key mykey and the other channel listens only to del operation events on the key mykey

The first kind of event, with keyspace prefix in the channel is called a Key-space notification, while the second, with the keyevent prefix, is called a Key-event notification.

In the previous example a del event was generated for the key mykey resulting in two messages:

  • The Key-space channel receives as message the name of the event.
  • The Key-event channel receives as message the name of the key.

It is possible to enable only one kind of notification in order to deliver just the subset of events we are interested in.

Configuration

By default keyspace event notifications are disabled because while not very sensible the feature uses some CPU power. Notifications are enabled using the notify-keyspace-events of redis.conf or via the CONFIG SET.

Setting the parameter to the empty string disables notifications. In order to enable the feature a non-empty string is used, composed of multiple characters, where every character has a special meaning according to the following table:

K     Keyspace events, published with __keyspace@<db>__ prefix.
E     Keyevent events, published with __keyevent@<db>__ prefix.
g     Generic commands (non-type specific) like DEL, EXPIRE, RENAME, ...
$     String commands
l     List commands
s     Set commands
h     Hash commands
z     Sorted set commands
t     Stream commands
d     Module key type events
x     Expired events (events generated every time a key expires)
e     Evicted events (events generated when a key is evicted for maxmemory)
m     Key miss events (events generated when a key that doesn't exist is accessed)
n     New key events (Note: not included in the 'A' class)
A     Alias for "g$lshztxed", so that the "AKE" string means all the events except "m".

At least K or E should be present in the string, otherwise no event will be delivered regardless of the rest of the string.

For instance to enable just Key-space events for lists, the configuration parameter must be set to Kl, and so forth.

The string KEA can be used to enable every possible event.

Events generated by different commands

Different commands generate different kind of events according to the following list.

  • DEL generates a del event for every deleted key.
  • RENAME generates two events, a rename_from event for the source key, and a rename_to event for the destination key.
  • MOVE generates two events, a move_from event for the source key, and a move_to event for the destination key.
  • COPY generates a copy_to event.
  • MIGRATE generates a del event if the source key is removed.
  • RESTORE generates a restore event for the key.
  • EXPIRE and all its variants (PEXPIRE, EXPIREAT, PEXPIREAT) generate an expire event when called with a positive timeout (or a future timestamp). Note that when these commands are called with a negative timeout value or timestamp in the past, the key is deleted and only a del event is generated instead.
  • SORT generates a sortstore event when STORE is used to set a new key. If the resulting list is empty, and the STORE option is used, and there was already an existing key with that name, the result is that the key is deleted, so a del event is generated in this condition.
  • SET and all its variants (SETEX, SETNX,GETSET) generate set events. However SETEX will also generate an expire events.
  • MSET generates a separate set event for every key.
  • SETRANGE generates a setrange event.
  • INCR, DECR, INCRBY, DECRBY commands all generate incrby events.
  • INCRBYFLOAT generates an incrbyfloat events.
  • APPEND generates an append event.
  • LPUSH and LPUSHX generates a single lpush event, even in the variadic case.
  • RPUSH and RPUSHX generates a single rpush event, even in the variadic case.
  • RPOP generates an rpop event. Additionally a del event is generated if the key is removed because the last element from the list was popped.
  • LPOP generates an lpop event. Additionally a del event is generated if the key is removed because the last element from the list was popped.
  • LINSERT generates an linsert event.
  • LSET generates an lset event.
  • LREM generates an lrem event, and additionally a del event if the resulting list is empty and the key is removed.
  • LTRIM generates an ltrim event, and additionally a del event if the resulting list is empty and the key is removed.
  • RPOPLPUSH and BRPOPLPUSH generate an rpop event and an lpush event. In both cases the order is guaranteed (the lpush event will always be delivered after the rpop event). Additionally a del event will be generated if the resulting list is zero length and the key is removed.
  • LMOVE and BLMOVE generate an lpop/rpop event (depending on the wherefrom argument) and an lpush/rpush event (depending on the whereto argument). In both cases the order is guaranteed (the lpush/rpush event will always be delivered after the lpop/rpop event). Additionally a del event will be generated if the resulting list is zero length and the key is removed.
  • HSET, HSETNX and HMSET all generate a single hset event.
  • HINCRBY generates an hincrby event.
  • HINCRBYFLOAT generates an hincrbyfloat event.
  • HDEL generates a single hdel event, and an additional del event if the resulting hash is empty and the key is removed.
  • SADD generates a single sadd event, even in the variadic case.
  • SREM generates a single srem event, and an additional del event if the resulting set is empty and the key is removed.
  • SMOVE generates an srem event for the source key, and an sadd event for the destination key.
  • SPOP generates an spop event, and an additional del event if the resulting set is empty and the key is removed.
  • SINTERSTORE, SUNIONSTORE, SDIFFSTORE generate sinterstore, sunionstore, sdiffstore events respectively. In the special case the resulting set is empty, and the key where the result is stored already exists, a del event is generated since the key is removed.
  • ZINCR generates a zincr event.
  • ZADD generates a single zadd event even when multiple elements are added.
  • ZREM generates a single zrem event even when multiple elements are deleted. When the resulting sorted set is empty and the key is generated, an additional del event is generated.
  • ZREMBYSCORE generates a single zrembyscore event. When the resulting sorted set is empty and the key is generated, an additional del event is generated.
  • ZREMBYRANK generates a single zrembyrank event. When the resulting sorted set is empty and the key is generated, an additional del event is generated.
  • ZDIFFSTORE, ZINTERSTORE and ZUNIONSTORE respectively generate zdiffstore, zinterstore and zunionstore events. In the special case the resulting sorted set is empty, and the key where the result is stored already exists, a del event is generated since the key is removed.
  • XADD generates an xadd event, possibly followed an xtrim event when used with the MAXLEN subcommand.
  • XDEL generates a single xdel event even when multiple entries are deleted.
  • XGROUP CREATE generates an xgroup-create event.
  • XGROUP CREATECONSUMER generates an xgroup-createconsumer event.
  • XGROUP DELCONSUMER generates an xgroup-delconsumer event.
  • XGROUP DESTROY generates an xgroup-destroy event.
  • XGROUP SETID generates an xgroup-setid event.
  • XSETID generates an xsetid event.
  • XTRIM generates an xtrim event.
  • PERSIST generates a persist event if the expiry time associated with key has been successfully deleted.
  • Every time a key with a time to live associated is removed from the data set because it expired, an expired event is generated.
  • Every time a key is evicted from the data set in order to free memory as a result of the maxmemory policy, an evicted event is generated.
  • Every time a new key is added to the data set, a new event is generated.

IMPORTANT all the commands generate events only if the target key is really modified. For instance an SREM deleting a non-existing element from a Set will not actually change the value of the key, so no event will be generated.

If in doubt about how events are generated for a given command, the simplest thing to do is to watch yourself:

$ redis-cli config set notify-keyspace-events KEA
$ redis-cli --csv psubscribe '__key*__:*'
Reading messages... (press Ctrl-C to quit)
"psubscribe","__key*__:*",1

At this point use redis-cli in another terminal to send commands to the Redis server and watch the events generated:

"pmessage","__key*__:*","__keyspace@0__:foo","set"
"pmessage","__key*__:*","__keyevent@0__:set","foo"
...

Timing of expired events

Keys with a time to live associated are expired by Redis in two ways:

  • When the key is accessed by a command and is found to be expired.
  • Via a background system that looks for expired keys in the background, incrementally, in order to be able to also collect keys that are never accessed.

The expired events are generated when a key is accessed and is found to be expired by one of the above systems, as a result there are no guarantees that the Redis server will be able to generate the expired event at the time the key time to live reaches the value of zero.

If no command targets the key constantly, and there are many keys with a TTL associated, there can be a significant delay between the time the key time to live drops to zero, and the time the expired event is generated.

Basically expired events are generated when the Redis server deletes the key and not when the time to live theoretically reaches the value of zero.

Events in a cluster

Every node of a Redis cluster generates events about its own subset of the keyspace as described above. However, unlike regular Pub/Sub communication in a cluster, events' notifications are not broadcasted to all nodes. Put differently, keyspace events are node-specific. This means that to receive all keyspace events of a cluster, clients need to subscribe to each of the nodes.

@history

  • >= 6.0: Key miss events were added.
  • >= 7.0: Event type new added

9 - Redis persistence

How Redis writes data to disk (append-only files, snapshots, etc.)

Persistence refers to the writing of data to durable storage, such as a solid-state disk (SSD). Redis itself provides a range of persistence options:

  • RDB (Redis Database): The RDB persistence performs point-in-time snapshots of your dataset at specified intervals.
  • AOF (Append Only File): The AOF persistence logs every write operation received by the server, that will be played again at server startup, reconstructing the original dataset. Commands are logged using the same format as the Redis protocol itself, in an append-only fashion. Redis is able to rewrite the log in the background when it gets too big.
  • No persistence: If you wish, you can disable persistence completely, if you want your data to just exist as long as the server is running.
  • RDB + AOF: It is possible to combine both AOF and RDB in the same instance. Notice that, in this case, when Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.

The most important thing to understand is the different trade-offs between the RDB and AOF persistence.

RDB advantages

  • RDB is a very compact single-file point-in-time representation of your Redis data. RDB files are perfect for backups. For instance you may want to archive your RDB files every hour for the latest 24 hours, and to save an RDB snapshot every day for 30 days. This allows you to easily restore different versions of the data set in case of disasters.
  • RDB is very good for disaster recovery, being a single compact file that can be transferred to far data centers, or onto Amazon S3 (possibly encrypted).
  • RDB maximizes Redis performances since the only work the Redis parent process needs to do in order to persist is forking a child that will do all the rest. The parent process will never perform disk I/O or alike.
  • RDB allows faster restarts with big datasets compared to AOF.
  • On replicas, RDB supports partial resynchronizations after restarts and failovers.

RDB disadvantages

  • RDB is NOT good if you need to minimize the chance of data loss in case Redis stops working (for example after a power outage). You can configure different save points where an RDB is produced (for instance after at least five minutes and 100 writes against the data set, you can have multiple save points). However you'll usually create an RDB snapshot every five minutes or more, so in case of Redis stopping working without a correct shutdown for any reason you should be prepared to lose the latest minutes of data.
  • RDB needs to fork() often in order to persist on disk using a child process. fork() can be time consuming if the dataset is big, and may result in Redis stopping serving clients for some milliseconds or even for one second if the dataset is very big and the CPU performance is not great. AOF also needs to fork() but less frequently and you can tune how often you want to rewrite your logs without any trade-off on durability.

AOF advantages

  • Using AOF Redis is much more durable: you can have different fsync policies: no fsync at all, fsync every second, fsync at every query. With the default policy of fsync every second, write performance is still great. fsync is performed using a background thread and the main thread will try hard to perform writes when no fsync is in progress, so you can only lose one second worth of writes.
  • The AOF log is an append-only log, so there are no seeks, nor corruption problems if there is a power outage. Even if the log ends with an half-written command for some reason (disk full or other reasons) the redis-check-aof tool is able to fix it easily.
  • Redis is able to automatically rewrite the AOF in background when it gets too big. The rewrite is completely safe as while Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.
  • AOF contains a log of all the operations one after the other in an easy to understand and parse format. You can even easily export an AOF file. For instance even if you've accidentally flushed everything using the FLUSHALL command, as long as no rewrite of the log was performed in the meantime, you can still save your data set just by stopping the server, removing the latest command, and restarting Redis again.

AOF disadvantages

  • AOF files are usually bigger than the equivalent RDB files for the same dataset.
  • AOF can be slower than RDB depending on the exact fsync policy. In general with fsync set to every second performance is still very high, and with fsync disabled it should be exactly as fast as RDB even under high load. Still RDB is able to provide more guarantees about the maximum latency even in the case of a huge write load.

Redis < 7.0

  • AOF can use a lot of memory if there are writes to the database during a rewrite (these are buffered in memory and written to the new AOF at the end).
  • All write commands that arrive during rewrite are written to disk twice.
  • Redis could freeze writing and fsyncing these write commands to the new AOF file at the end of the rewrite.

Ok, so what should I use?

The general indication you should use both persistence methods is if you want a degree of data safety comparable to what PostgreSQL can provide you.

If you care a lot about your data, but still can live with a few minutes of data loss in case of disasters, you can simply use RDB alone.

There are many users using AOF alone, but we discourage it since to have an RDB snapshot from time to time is a great idea for doing database backups, for faster restarts, and in the event of bugs in the AOF engine.

The following sections will illustrate a few more details about the two persistence models.

Snapshotting

By default Redis saves snapshots of the dataset on disk, in a binary file called dump.rdb. You can configure Redis to have it save the dataset every N seconds if there are at least M changes in the dataset, or you can manually call the SAVE or BGSAVE commands.

For example, this configuration will make Redis automatically dump the dataset to disk every 60 seconds if at least 1000 keys changed:

save 60 1000

This strategy is known as snapshotting.

How it works

Whenever Redis needs to dump the dataset to disk, this is what happens:

  • Redis forks. We now have a child and a parent process.

  • The child starts to write the dataset to a temporary RDB file.

  • When the child is done writing the new RDB file, it replaces the old one.

This method allows Redis to benefit from copy-on-write semantics.

Append-only file

Snapshotting is not very durable. If your computer running Redis stops, your power line fails, or you accidentally kill -9 your instance, the latest data written to Redis will be lost. While this may not be a big deal for some applications, there are use cases for full durability, and in these cases Redis snapshotting alone is not a viable option.

The append-only file is an alternative, fully-durable strategy for Redis. It became available in version 1.1.

You can turn on the AOF in your configuration file:

appendonly yes

From now on, every time Redis receives a command that changes the dataset (e.g. SET) it will append it to the AOF. When you restart Redis it will re-play the AOF to rebuild the state.

Since Redis 7.0.0, Redis uses a multi part AOF mechanism. That is, the original single AOF file is split into base file (at most one) and incremental files (there may be more than one). The base file represents an initial (RDB or AOF format) snapshot of the data present when the AOF is rewritten. The incremental files contains incremental changes since the last base AOF file was created. All these files are put in a separate directory and are tracked by a manifest file.

Log rewriting

The AOF gets bigger and bigger as write operations are performed. For example, if you are incrementing a counter 100 times, you'll end up with a single key in your dataset containing the final value, but 100 entries in your AOF. 99 of those entries are not needed to rebuild the current state.

The rewrite is completely safe. While Redis continues appending to the old file, a completely new one is produced with the minimal set of operations needed to create the current data set, and once this second file is ready Redis switches the two and starts appending to the new one.

So Redis supports an interesting feature: it is able to rebuild the AOF in the background without interrupting service to clients. Whenever you issue a BGREWRITEAOF, Redis will write the shortest sequence of commands needed to rebuild the current dataset in memory. If you're using the AOF with Redis 2.2 you'll need to run BGREWRITEAOF from time to time. Since Redis 2.4 is able to trigger log rewriting automatically (see the example configuration file for more information).

Since Redis 7.0.0, when an AOF rewrite is scheduled, The Redis parent process opens a new incremental AOF file to continue writing. The child process executes the rewrite logic and generates a new base AOF. Redis will use a temporary manifest file to track the newly generated base file and incremental file. When they are ready, Redis will perform an atomic replacement operation to make this temporary manifest file take effect. In order to avoid the problem of creating many incremental files in case of repeated failures and retries of an AOF rewrite, Redis introduces an AOF rewrite limiting mechanism to ensure that failed AOF rewrites are retried at a slower and slower rate.

How durable is the append only file?

You can configure how many times Redis will fsync data on disk. There are three options:

  • appendfsync always: fsync every time new commands are appended to the AOF. Very very slow, very safe. Note that the commands are apended to the AOF after a batch of commands from multiple clients or a pipeline are executed, so it means a single write and a single fsync (before sending the replies).
  • appendfsync everysec: fsync every second. Fast enough (since version 2.4 likely to be as fast as snapshotting), and you may lose 1 second of data if there is a disaster.
  • appendfsync no: Never fsync, just put your data in the hands of the Operating System. The faster and less safe method. Normally Linux will flush data every 30 seconds with this configuration, but it's up to the kernel's exact tuning.

The suggested (and default) policy is to fsync every second. It is both fast and relatively safe. The always policy is very slow in practice, but it supports group commit, so if there are multiple parallel writes Redis will try to perform a single fsync operation.

What should I do if my AOF gets truncated?

It is possible the server crashed while writing the AOF file, or the volume where the AOF file is stored was full at the time of writing. When this happens the AOF still contains consistent data representing a given point-in-time version of the dataset (that may be old up to one second with the default AOF fsync policy), but the last command in the AOF could be truncated. The latest major versions of Redis will be able to load the AOF anyway, just discarding the last non well formed command in the file. In this case the server will emit a log like the following:

* Reading RDB preamble from AOF file...
* Reading the remaining AOF tail...
# !!! Warning: short read while loading the AOF file !!!
# !!! Truncating the AOF at offset 439 !!!
# AOF loaded anyway because aof-load-truncated is enabled

You can change the default configuration to force Redis to stop in such cases if you want, but the default configuration is to continue regardless of the fact the last command in the file is not well-formed, in order to guarantee availability after a restart.

Older versions of Redis may not recover, and may require the following steps:

  • Make a backup copy of your AOF file.

  • Fix the original file using the redis-check-aof tool that ships with Redis:

    $ redis-check-aof --fix <filename>
    
  • Optionally use diff -u to check what is the difference between two files.

  • Restart the server with the fixed file.

What should I do if my AOF gets corrupted?

If the AOF file is not just truncated, but corrupted with invalid byte sequences in the middle, things are more complex. Redis will complain at startup and will abort:

* Reading the remaining AOF tail...
# Bad file format reading the append only file: make a backup of your AOF file, then use ./redis-check-aof --fix <filename>

The best thing to do is to run the redis-check-aof utility, initially without the --fix option, then understand the problem, jump to the given offset in the file, and see if it is possible to manually repair the file: The AOF uses the same format of the Redis protocol and is quite simple to fix manually. Otherwise it is possible to let the utility fix the file for us, but in that case all the AOF portion from the invalid part to the end of the file may be discarded, leading to a massive amount of data loss if the corruption happened to be in the initial part of the file.

How it works

Log rewriting uses the same copy-on-write trick already in use for snapshotting. This is how it works:

Redis >= 7.0

  • Redis forks, so now we have a child and a parent process.

  • The child starts writing the new base AOF in a temporary file.

  • The parent opens a new increments AOF file to continue writing updates. If the rewriting fails, the old base and increment files (if there are any) plus this newly opened increment file represent the complete updated dataset, so we are safe.

  • When the child is done rewriting the base file, the parent gets a signal, and uses the newly opened increment file and child generated base file to build a temp manifest, and persist it.

  • Profit! Now Redis does an atomic exchange of the manifest files so that the result of this AOF rewrite takes effect. Redis also cleans up the old base file and any unused increment files.

Redis < 7.0

  • Redis forks, so now we have a child and a parent process.

  • The child starts writing the new AOF in a temporary file.

  • The parent accumulates all the new changes in an in-memory buffer (but at the same time it writes the new changes in the old append-only file, so if the rewriting fails, we are safe).

  • When the child is done rewriting the file, the parent gets a signal, and appends the in-memory buffer at the end of the file generated by the child.

  • Now Redis atomically renames the new file into the old one, and starts appending new data into the new file.

How I can switch to AOF, if I'm currently using dump.rdb snapshots?

There is a different procedure to do this in version 2.0 and later versions, as you can guess it's simpler since Redis 2.2 and does not require a restart at all.

Redis >= 2.2

  • Make a backup of your latest dump.rdb file.
  • Transfer this backup to a safe place.
  • Issue the following two commands:
  • redis-cli config set appendonly yes
  • redis-cli config set save ""
  • Make sure your database contains the same number of keys it contained.
  • Make sure writes are appended to the append only file correctly.

The first CONFIG command enables the Append Only File persistence.

The second CONFIG command is used to turn off snapshotting persistence. This is optional, if you wish you can take both the persistence methods enabled.

IMPORTANT: remember to edit your redis.conf to turn on the AOF, otherwise when you restart the server the configuration changes will be lost and the server will start again with the old configuration.

Redis 2.0

  • Make a backup of your latest dump.rdb file.
  • Transfer this backup into a safe place.
  • Stop all the writes against the database!
  • Issue a redis-cli BGREWRITEAOF. This will create the append only file.
  • Stop the server when Redis finished generating the AOF dump.
  • Edit redis.conf end enable append only file persistence.
  • Restart the server.
  • Make sure that your database contains the same number of keys it contained before the switch.
  • Make sure that writes are appended to the append only file correctly.

Interactions between AOF and RDB persistence

Redis >= 2.4 makes sure to avoid triggering an AOF rewrite when an RDB snapshotting operation is already in progress, or allowing a BGSAVE while the AOF rewrite is in progress. This prevents two Redis background processes from doing heavy disk I/O at the same time.

When snapshotting is in progress and the user explicitly requests a log rewrite operation using BGREWRITEAOF the server will reply with an OK status code telling the user the operation is scheduled, and the rewrite will start once the snapshotting is completed.

In the case both AOF and RDB persistence are enabled and Redis restarts the AOF file will be used to reconstruct the original dataset since it is guaranteed to be the most complete.

Backing up Redis data

Before starting this section, make sure to read the following sentence: Make Sure to Backup Your Database. Disks break, instances in the cloud disappear, and so forth: no backups means huge risk of data disappearing into /dev/null.

Redis is very data backup friendly since you can copy RDB files while the database is running: the RDB is never modified once produced, and while it gets produced it uses a temporary name and is renamed into its final destination atomically using rename(2) only when the new snapshot is complete.

This means that copying the RDB file is completely safe while the server is running. This is what we suggest:

  • Create a cron job in your server creating hourly snapshots of the RDB file in one directory, and daily snapshots in a different directory.
  • Every time the cron script runs, make sure to call the find command to make sure too old snapshots are deleted: for instance you can take hourly snapshots for the latest 48 hours, and daily snapshots for one or two months. Make sure to name the snapshots with date and time information.
  • At least one time every day make sure to transfer an RDB snapshot outside your data center or at least outside the physical machine running your Redis instance.

Backing up AOF persistence

If you run a Redis instance with only AOF persistence enabled, you can still perform backups. Since Redis 7.0.0, AOF files are split into multiple files which reside in a single directory determined by the appenddirname configuration. During normal operation all you need to do is copy/tar the files in this directory to achieve a backup. However, if this is done during a rewrite, you might end up with an invalid backup. To work around this you must disable AOF rewrites during the backup:

  1. Turn off automatic rewrites with
    CONFIG SET auto-aof-rewrite-percentage 0
    Make sure you don't manually start a rewrite (using BGREWRITEAOF) during this time.
  2. Check there's no current rewrite in progress using
    INFO persistence
    and verifying aof_rewrite_in_progress is 0. If it's 1, then you'll need to wait for the rewrite to complete.
  3. Now you can safely copy the files in the appenddirname directory.
  4. Re-enable rewrites when done:
    CONFIG SET auto-aof-rewrite-percentage <prev-value>

Note: If you want to minimize the time AOF rewrites are disabled you may create hard links to the files in appenddirname (in step 3 above) and then re-enable rewites (step 4) after the hard links are created. Now you can copy/tar the hardlinks and delete them when done. This works because Redis guarantees that it only appends to files in this directory, or completely replaces them if necessary, so the content should be consistent at any given point in time.

Note: If you want to handle the case of the server being restarted during the backup and make sure no rewrite will automatically start after the restart you can change step 1 above to also persist the updated configuration via CONFIG REWRITE. Just make sure to re-enable automatic rewrites when done (step 4) and persist it with another CONFIG REWRITE.

Prior to version 7.0.0 backing up the AOF file can be done simply by copying the aof file (like backing up the RDB snapshot). The file may lack the final part but Redis will still be able to load it (see the previous sections about truncated AOF files).

  1. Turn off automatic rewrites with
    CONFIG SET auto-aof-rewrite-percentage 0
    Make sure you don't manually start a rewrite (using BGREWRITEAOF) during this time.
  2. Check there's no current rewrite in progress using
    INFO persistence
    and verifying aof_rewrite_in_progress is 0. If it's 1, then you'll need to wait for the rewrite to complete.
  3. Now you can safely copy the files in the appenddirname directory.
  4. Re-enable rewrites when done:
    CONFIG SET auto-aof-rewrite-percentage <prev-value>

Note: If you want to minimize the time AOF rewrites are disabled you may create hard links to the files in appenddirname (in step 3 above) and then re-enable rewites (step 4) after the hard links are created. Now you can copy/tar the hardlinks and delete them when done. This works because Redis guarantees that it only appends to files in this directory, or completely replaces them if necessary, so the content should be consistent at any given point in time.

Note: If you want to handle the case of the server being restarted during the backup and make sure no rewrite will automatically start after the restart you can change step 1 above to also persist the updated configuration via CONFIG REWRITE. Just make sure to re-enable automatic rewrites when done (step 4) and persist it with another CONFIG REWRITE.

Prior to version 7.0.0 backing up the AOF file can be done simply by copying the aof file (like backing up the RDB snapshot). The file may lack the final part but Redis will still be able to load it (see the previous sections about truncated AOF files).

Disaster recovery

Disaster recovery in the context of Redis is basically the same story as backups, plus the ability to transfer those backups in many different external data centers. This way data is secured even in the case of some catastrophic event affecting the main data center where Redis is running and producing its snapshots.

We'll review the most interesting disaster recovery techniques that don't have too high costs.

  • Amazon S3 and other similar services are a good way for implementing your disaster recovery system. Simply transfer your daily or hourly RDB snapshot to S3 in an encrypted form. You can encrypt your data using gpg -c (in symmetric encryption mode). Make sure to store your password in many different safe places (for instance give a copy to the most important people of your organization). It is recommended to use multiple storage services for improved data safety.
  • Transfer your snapshots using SCP (part of SSH) to far servers. This is a fairly simple and safe route: get a small VPS in a place that is very far from you, install ssh there, and generate an ssh client key without passphrase, then add it in the authorized_keys file of your small VPS. You are ready to transfer backups in an automated fashion. Get at least two VPS in two different providers for best results.

It is important to understand that this system can easily fail if not implemented in the right way. At least, make absolutely sure that after the transfer is completed you are able to verify the file size (that should match the one of the file you copied) and possibly the SHA1 digest, if you are using a VPS.

You also need some kind of independent alert system if the transfer of fresh backups is not working for some reason.

10 - Redis pipelining

How to optimize round-trip times by batching Redis commands

Redis pipelining is a technique for improving performance by issuing multiple commands at once without waiting for the response to each individual command. Pipelining is supported by most Redis clients. This document describes the problem that pipelining is designed to solve and how pipelining works in Redis.

Request/Response protocols and round-trip time (RTT)

Redis is a TCP server using the client-server model and what is called a Request/Response protocol.

This means that usually a request is accomplished with the following steps:

  • The client sends a query to the server, and reads from the socket, usually in a blocking way, for the server response.
  • The server processes the command and sends the response back to the client.

So for instance a four commands sequence is something like this:

  • Client: INCR X
  • Server: 1
  • Client: INCR X
  • Server: 2
  • Client: INCR X
  • Server: 3
  • Client: INCR X
  • Server: 4

Clients and Servers are connected via a network link. Such a link can be very fast (a loopback interface) or very slow (a connection established over the Internet with many hops between the two hosts). Whatever the network latency is, it takes time for the packets to travel from the client to the server, and back from the server to the client to carry the reply.

This time is called RTT (Round Trip Time). It's easy to see how this can affect performance when a client needs to perform many requests in a row (for instance adding many elements to the same list, or populating a database with many keys). For instance if the RTT time is 250 milliseconds (in the case of a very slow link over the Internet), even if the server is able to process 100k requests per second, we'll be able to process at max four requests per second.

If the interface used is a loopback interface, the RTT is much shorter, typically sub-millisecond, but even this will add up to a lot if you need to perform many writes in a row.

Fortunately there is a way to improve this use case.

Redis Pipelining

A Request/Response server can be implemented so that it is able to process new requests even if the client hasn't already read the old responses. This way it is possible to send multiple commands to the server without waiting for the replies at all, and finally read the replies in a single step.

This is called pipelining, and is a technique widely in use for many decades. For instance many POP3 protocol implementations already support this feature, dramatically speeding up the process of downloading new emails from the server.

Redis has supported pipelining since its early days, so whatever version you are running, you can use pipelining with Redis. This is an example using the raw netcat utility:

$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG

This time we don't pay the cost of RTT for every call, but just once for the three commands.

To be explicit, with pipelining the order of operations of our very first example will be the following:

  • Client: INCR X
  • Client: INCR X
  • Client: INCR X
  • Client: INCR X
  • Server: 1
  • Server: 2
  • Server: 3
  • Server: 4

IMPORTANT NOTE: While the client sends commands using pipelining, the server will be forced to queue the replies, using memory. So if you need to send a lot of commands with pipelining, it is better to send them as batches each containing a reasonable number, for instance 10k commands, read the replies, and then send another 10k commands again, and so forth. The speed will be nearly the same, but the additional memory used will be at most the amount needed to queue the replies for these 10k commands.

It's not just a matter of RTT

Pipelining is not just a way to reduce the latency cost associated with the round trip time, it actually greatly improves the number of operations you can perform per second in a given Redis server. This is because without using pipelining, serving each command is very cheap from the point of view of accessing the data structures and producing the reply, but it is very costly from the point of view of doing the socket I/O. This involves calling the read() and write() syscall, that means going from user land to kernel land. The context switch is a huge speed penalty.

When pipelining is used, many commands are usually read with a single read() system call, and multiple replies are delivered with a single write() system call. Because of this, the number of total queries performed per second initially increases almost linearly with longer pipelines, and eventually reaches 10 times the baseline obtained without pipelining, as you can see from the following graph:

Pipeline size and IOPs

A real world code example

In the following benchmark we'll use the Redis Ruby client, supporting pipelining, to test the speed improvement due to pipelining:

require 'rubygems'
require 'redis'

def bench(descr)
  start = Time.now
  yield
  puts "#{descr} #{Time.now - start} seconds"
end

def without_pipelining
  r = Redis.new
  10_000.times do
    r.ping
  end
end

def with_pipelining
  r = Redis.new
  r.pipelined do
    10_000.times do
      r.ping
    end
  end
end

bench('without pipelining') do
  without_pipelining
end
bench('with pipelining') do
  with_pipelining
end

Running the above simple script yields the following figures on my Mac OS X system, running over the loopback interface, where pipelining will provide the smallest improvement as the RTT is already pretty low:

without pipelining 1.185238 seconds
with pipelining 0.250783 seconds

As you can see, using pipelining, we improved the transfer by a factor of five.

Pipelining vs Scripting

Using Redis scripting, available since Redis 2.6, a number of use cases for pipelining can be addressed more efficiently using scripts that perform a lot of the work needed at the server side. A big advantage of scripting is that it is able to both read and write data with minimal latency, making operations like read, compute, write very fast (pipelining can't help in this scenario since the client needs the reply of the read command before it can call the write command).

Sometimes the application may also want to send EVAL or EVALSHA commands in a pipeline. This is entirely possible and Redis explicitly supports it with the SCRIPT LOAD command (it guarantees that EVALSHA can be called without the risk of failing).

Appendix: Why are busy loops slow even on the loopback interface?

Even with all the background covered in this page, you may still wonder why a Redis benchmark like the following (in pseudo code), is slow even when executed in the loopback interface, when the server and the client are running in the same physical machine:

FOR-ONE-SECOND:
    Redis.SET("foo","bar")
END

After all, if both the Redis process and the benchmark are running in the same box, isn't it just copying messages in memory from one place to another without any actual latency or networking involved?

The reason is that processes in a system are not always running, actually it is the kernel scheduler that lets the process run. So, for instance, when the benchmark is allowed to run, it reads the reply from the Redis server (related to the last command executed), and writes a new command. The command is now in the loopback interface buffer, but in order to be read by the server, the kernel should schedule the server process (currently blocked in a system call) to run, and so forth. So in practical terms the loopback interface still involves network-like latency, because of how the kernel scheduler works.

Basically a busy loop benchmark is the silliest thing that can be done when metering performances on a networked server. The wise thing is just avoiding benchmarking in this way.

11 - Redis programmability

Extending Redis with Lua and Redis Functions

Redis provides a programming interface that lets you execute custom scripts on the server itself. In Redis 7 and beyond, you can use Redis Functions to manage and run your scripts. In Redis 6.2 and below, you use Lua scripting with the EVAL command to program the server.

Background

Redis is, by definition, a "domain-specific language for abstract data types". The language that Redis speaks consists of its commands. Most the commands specialize at manipulating core data types in different ways. In many cases, these commands provide all the functionality that a developer requires for managing application data in Redis.

The term programmability in Redis means having the ability to execute arbitrary user-defined logic by the server. We refer to such pieces of logic as scripts. In our case, scripts enable processing the data where it lives, a.k.a data locality. Furthermore, the responsible embedding of programmatic workflows in the Redis server can help in reducing network traffic and improving overall performance. Developers can use this capability for implementing robust, application-specific APIs. Such APIs can encapsulate business logic and maintain a data model across multiple keys and different data structures.

User scripts are executed in Redis by an embedded, sandboxed scripting engine. Presently, Redis supports a single scripting engine, the Lua 5.1 interpreter.

Please refer to the Redis Lua API Reference page for complete documentation.

Running scripts

Redis provides two means for running scripts.

Firstly, and ever since Redis 2.6.0, the EVAL command enables running server-side scripts. Eval scripts provide a quick and straightforward way to have Redis run your scripts ad-hoc. However, using them means that the scripted logic is a part of your application (not an extension of the Redis server). Every applicative instance that runs a script must have the script's source code readily available for loading at any time. That is because scripts are only cached by the server and are volatile. As your application grows, this approach can become harder to develop and maintain.

Secondly, added in v7.0, Redis Functions are essentially scripts that are first-class database elements. As such, functions decouple scripting from application logic and enable independent development, testing, and deployment of scripts. To use functions, they need to be loaded first, and then they are available for use by all connected clients. In this case, loading a function to the database becomes an administrative deployment task (such as loading a Redis module, for example), which separates the script from the application.

Please refer to the following pages for more information:

When running a script or a function, Redis guarantees its atomic execution. The script's execution blocks all server activities during its entire time, similarly to the semantics of transactions. These semantics mean that all of the script's effects either have yet to happen or had already happened. The blocking semantics of an executed script apply to all connected clients at all times.

Note that the potential downside of this blocking approach is that executing slow scripts is not a good idea. It is not hard to create fast scripts because scripting's overhead is very low. However, if you intend to use a slow script in your application, be aware that all other clients are blocked and can't execute any command while it is running.

Sandboxed script context

Redis places the engine that executes user scripts inside a sandbox. The sandbox attempts to prevent accidental misuse and reduce potential threats from the server's environment.

Scripts should never try to access the Redis server's underlying host systems, such as the file system, network, or attempt to perform any other system call other than those supported by the API.

Scripts should operate solely on data stored in Redis and data provided as arguments to their execution.

Maximum execution time

Scripts are subject to a maximum execution time (set by default to five seconds). This default timeout is enormous since a script usually runs in less than a millisecond. The limit is in place to handle accidental infinite loops created during development.

It is possible to modify the maximum time a script can be executed with millisecond precision, either via redis.conf or by using the CONFIG SET command. The configuration parameter affecting max execution time is called busy-reply-threshold.

When a script reaches the timeout threshold, it isn't terminated by Redis automatically. Doing so would violate the contract between Redis and the scripting engine that ensures that scripts are atomic. Interrupting the execution of a script has the potential of leaving the dataset with half-written changes.

Therefore, when a script executes longer than than the configured timeout, the following happens:

  • Redis logs that a script is running for too long.
  • It starts accepting commands again from other clients but will reply with a BUSY error to all the clients sending normal commands. The only commands allowed in this state are SCRIPT KILL, FUNCTION KILL, and SHUTDOWN NOSAVE.
  • It is possible to terminate a script that only executes read-only commands using the SCRIPT KILL and FUNCTION KILL commands. These commands do not violate the scripting semantic as no data was written to the dataset by the script yet.
  • If the script had already performed even a single write operation, the only command allowed is SHUTDOWN NOSAVE that stops the server without saving the current data set on disk (basically, the server is aborted).

11.1 - Redis functions

Scripting with Redis 7 and beyond

Redis Functions is an API for managing code to be executed on the server. This feature, which became available in Redis 7, supersedes the use of EVAL in prior versions of Redis.

Prologue (or, what's wrong with Eval Scripts?)

Prior versions of Redis made scripting available only via the EVAL command, which allows a Lua script to be sent for execution by the server. The core use cases for Eval Scripts is executing part of your application logic inside Redis, efficiently and atomically. Such script can perform conditional updates across multiple keys, possibly combining several different data types.

Using EVAL requires that the application sends the entire script for execution every time. Because this results in network and script compilation overheads, Redis provides an optimization in the form of the EVALSHA command. By first calling SCRIPT LOAD to obtain the script's SHA1, the application can invoke it repeatedly afterward with its digest alone.

By design, Redis only caches the loaded scripts. That means that the script cache can become lost at any time, such as after calling SCRIPT FLUSH, after restarting the server, or when failing over to a replica. The application is responsible for reloading scripts during runtime if any are missing. The underlying assumption is that scripts are a part of the application and not maintained by the Redis server.

This approach suits many light-weight scripting use cases, but introduces several difficulties once an application becomes complex and relies more heavily on scripting, namely:

  1. All client application instances must maintain a copy of all scripts. That means having some mechanism that applies script updates to all of the application's instances.
  2. Calling cached scripts within the context of a transaction increases the probability of the transaction failing because of a missing script. Being more likely to fail makes using cached scripts as building blocks of workflows less attractive.
  3. SHA1 digests are meaningless, making debugging the system extremely hard (e.g., in a MONITOR session).
  4. When used naively, EVAL promotes an anti-pattern in which scripts the client application renders verbatim scripts instead of responsibly using the KEYS and ARGV Lua APIs.
  5. Because they are ephemeral, a script can't call another script. This makes sharing and reusing code between scripts nearly impossible, short of client-side preprocessing (see the first point).

To address these needs while avoiding breaking changes to already-established and well-liked ephemeral scripts, Redis v7.0 introduces Redis Functions.

What are Redis Functions?

Redis functions are an evolutionary step from ephemeral scripting.

Functions provide the same core functionality as scripts but are first-class software artifacts of the database. Redis manages functions as an integral part of the database and ensures their availability via data persistence and replication. Because functions are part of the database and therefore declared before use, applications aren't required to load them during runtime nor risk aborted transactions. An application that uses functions depends only on their APIs rather than on the embedded script logic in the database.

Whereas ephemeral scripts are considered a part of the application's domain, functions extend the database server itself with user-provided logic. They can be used to expose a richer API composed of core Redis commands, similar to modules, developed once, loaded at startup, and used repeatedly by various applications / clients. Every function has a unique user-defined name, making it much easier to call and trace its execution.

The design of Redis Functions also attempts to demarcate between the programming language used for writing functions and their management by the server. Lua, the only language interpreter that Redis presently support as an embedded execution engine, is meant to be simple and easy to learn. However, the choice of Lua as a language still presents many Redis users with a challenge.

The Redis Functions feature makes no assumptions about the implementation's language. An execution engine that is part of the definition of the function handles running it. An engine can theoretically execute functions in any language as long as it respects several rules (such as the ability to terminate an executing function).

Presently, as noted above, Redis ships with a single embedded Lua 5.1 engine. There are plans to support additional engines in the future. Redis functions can use all of Lua's available capabilities to ephemeral scripts, with the only exception being the Redis Lua scripts debugger.

Functions also simplify development by enabling code sharing. Every function belongs to a single library, and any given library can consist of multiple functions. The library's contents are immutable, and selective updates of its functions aren't allowed. Instead, libraries are updated as a whole with all of their functions together in one operation. This allows calling functions from other functions within the same library, or sharing code between functions by using a common code in library-internal methods, that can also take language native arguments.

Functions are intended to better support the use case of maintaining a consistent view for data entities through a logical schema, as mentioned above. As such, functions are stored alongside the data itself. Functions are also persisted to the AOF file and replicated from master to replicas, so they are as durable as the data itself. When Redis is used as an ephemeral cache, additional mechanisms (described below) are required to make functions more durable.

Like all other operations in Redis, the execution of a function is atomic. A function's execution blocks all server activities during its entire time, similarly to the semantics of transactions. These semantics mean that all of the script's effects either have yet to happen or had already happened. The blocking semantics of an executed function apply to all connected clients at all times. Because running a function blocks the Redis server, functions are meant to finish executing quickly, so you should avoid using long-running functions.

Loading libraries and functions

Let's explore Redis Functions via some tangible examples and Lua snippets.

At this point, if you're unfamiliar with Lua in general and specifically in Redis, you may benefit from reviewing some of the examples in Introduction to Eval Scripts and Lua API pages for a better grasp of the language.

Every Redis function belongs to a single library that's loaded to Redis. Loading a library to the database is done with the FUNCTION LOAD command. The command gets the library payload as input, the library payload must start with Shebang statement that provides a metadata about the library (like the engine to use and the library name). The Shebang format is:

#!<engine name> name=<library name>

Let's try loading an empty library:

redis> FUNCTION LOAD "#!lua name=mylib\n"
(error) ERR No functions registered

The error is expected, as there are no functions in the loaded library. Every library needs to include at least one registered function to load successfully. A registered function is named and acts as an entry point to the library. When the target execution engine handles the FUNCTION LOAD command, it registers the library's functions.

The Lua engine compiles and evaluates the library source code when loaded, and expects functions to be registered by calling the redis.register_function() API.

The following snippet demonstrates a simple library registering a single function named knockknock, returning a string reply:

#!lua name=mylib
redis.register_function(
  'knockknock',
  function() return 'Who\'s there?' end
)

In the example above, we provide two arguments about the function to Lua's redis.register_function() API: its registered name and a callback.

We can load our library and use FCALL to call the registered function:

redis> FUNCTION LOAD "#!lua name=mylib\nredis.register_function('knockknock', function() return 'Who\\'s there?' end)"
mylib
redis> FCALL knockknock 0
"Who's there?"

Notice that the FUNCTION LOAD command returns the name of the loaded library, this name can later be used FUNCTION LIST and FUNCTION DELETE.

We've provided FCALL with two arguments: the function's registered name and the numeric value 0. This numeric value indicates the number of key names that follow it (the same way EVAL and EVALSHA work).

We'll explain immediately how key names and additional arguments are available to the function. As this simple example doesn't involve keys, we simply use 0 for now.

Input keys and regular arguments

Before we move to the following example, it is vital to understand the distinction Redis makes between arguments that are names of keys and those that aren't.

While key names in Redis are just strings, unlike any other string values, these represent keys in the database. The name of a key is a fundamental concept in Redis and is the basis for operating the Redis Cluster.

Important: To ensure the correct execution of Redis Functions, both in standalone and clustered deployments, all names of keys that a function accesses must be explicitly provided as input key arguments.

Any input to the function that isn't the name of a key is a regular input argument.

Now, let's pretend that our application stores some of its data in Redis Hashes. We want an HSET-like way to set and update fields in said Hashes and store the last modification time in a new field named _last_modified_. We can implement a function to do all that.

Our function will call TIME to get the server's clock reading and update the target Hash with the new fields' values and the modification's timestamp. The function we'll implement accepts the following input arguments: the Hash's key name and the field-value pairs to update.

The Lua API for Redis Functions makes these inputs accessible as the first and second arguments to the function's callback. The callback's first argument is a Lua table populated with all key names inputs to the function. Similarly, the callback's second argument consists of all regular arguments.

The following is a possible implementation for our function and its library registration:

#!lua name=mylib

local function my_hset(keys, args)
  local hash = keys[1]
  local time = redis.call('TIME')[1]
  return redis.call('HSET', hash, '_last_modified_', time, unpack(args))
end

redis.register_function('my_hset', my_hset)

If we create a new file named mylib.lua that consists of the library's definition, we can load it like so (without stripping the source code of helpful whitespaces):

$ cat mylib.lua | redis-cli -x FUNCTION LOAD REPLACE

We've added the REPLACE modifier to the call to FUNCTION LOAD to tell Redis that we want to overwrite the existing library definition. Otherwise, we would have gotten an error from Redis complaining that the library already exists.

Now that the library's updated code is loaded to Redis, we can proceed and call our function:

redis> FCALL my_hset 1 myhash myfield "some value" another_field "another value"
(integer) 3
redis> HGETALL myhash
1) "_last_modified_"
2) "1640772721"
3) "myfield"
4) "some value"
5) "another_field"
6) "another value"

In this case, we had invoked FCALL with 1 as the number of key name arguments. That means that the function's first input argument is a name of a key (and is therefore included in the callback's keys table). After that first argument, all following input arguments are considered regular arguments and constitute the args table passed to the callback as its second argument.

Expanding the library

We can add more functions to our library to benefit our application. The additional metadata field we've added to the Hash shouldn't be included in responses when accessing the Hash's data. On the other hand, we do want to provide the means to obtain the modification timestamp for a given Hash key.

We'll add two new functions to our library to accomplish these objectives:

  1. The my_hgetall Redis Function will return all fields and their respective values from a given Hash key name, excluding the metadata (i.e., the _last_modified_ field).
  2. The my_hlastmodified Redis Function will return the modification timestamp for a given Hash key name.

The library's source code could look something like the following:

#!lua name=mylib

local function my_hset(keys, args)
  local hash = keys[1]
  local time = redis.call('TIME')[1]
  return redis.call('HSET', hash, '_last_modified_', time, unpack(args))
end

local function my_hgetall(keys, args)
  redis.setresp(3)
  local hash = keys[1]
  local res = redis.call('HGETALL', hash)
  res['map']['_last_modified_'] = nil
  return res
end

local function my_hlastmodified(keys, args)
  local hash = keys[1]
  return redis.call('HGET', keys[1], '_last_modified_')
end

redis.register_function('my_hset', my_hset)
redis.register_function('my_hgetall', my_hgetall)
redis.register_function('my_hlastmodified', my_hlastmodified)

While all of the above should be straightforward, note that the my_hgetall also calls redis.setresp(3). That means that the function expects RESP3 replies after calling redis.call(), which, unlike the default RESP2 protocol, provides dictionary (associative arrays) replies. Doing so allows the function to delete (or set to nil as is the case with Lua tables) specific fields from the reply, and in our case, the _last_modified_ field.

Assuming you've saved the library's implementation in the mylib.lua file, you can replace it with:

$ cat mylib.lua | redis-cli -x FUNCTION LOAD REPLACE

Once loaded, you can call the library's functions with FCALL:

redis> FCALL my_hgetall 1 myhash
1) "myfield"
2) "some value"
3) "another_field"
4) "another value"
redis> FCALL my_hlastmodified 1 myhash
"1640772721"

You can also get the library's details with the FUNCTION LIST command:

redis> FUNCTION LIST
1) 1) "library_name"
   2) "mylib"
   3) "engine"
   4) "LUA"
   5) "functions"
   6) 1) 1) "name"
         2) "my_hset"
         3) "description"
         4) (nil)
      2) 1) "name"
         2) "my_hgetall"
         3) "description"
         4) (nil)
      3) 1) "name"
         2) "my_hlastmodified"
         3) "description"
         4) (nil)

You can see that it is easy to update our library with new capabilities.

Reusing code in the library

On top of bundling functions together into database-managed software artifacts, libraries also facilitate code sharing. We can add to our library an error handling helper function called from other functions. The helper function check_keys() verifies that the input keys table has a single key. Upon success it returns nil, otherwise it returns an error reply.

The updated library's source code would be:

#!lua name=mylib

local function check_keys(keys)
  local error = nil
  local nkeys = table.getn(keys)
  if nkeys == 0 then
    error = 'Hash key name not provided'
  elseif nkeys > 1 then
    error = 'Only one key name is allowed'
  end

  if error ~= nil then
    redis.log(redis.LOG_WARNING, error);
    return redis.error_reply(error)
  end
  return nil
end

local function my_hset(keys, args)
  local error = check_keys(keys)
  if error ~= nil then
    return error
  end

  local hash = keys[1]
  local time = redis.call('TIME')[1]
  return redis.call('HSET', hash, '_last_modified_', time, unpack(args))
end

local function my_hgetall(keys, args)
  local error = check_keys(keys)
  if error ~= nil then
    return error
  end

  redis.setresp(3)
  local hash = keys[1]
  local res = redis.call('HGETALL', hash)
  res['map']['_last_modified_'] = nil
  return res
end

local function my_hlastmodified(keys, args)
  local error = check_keys(keys)
  if error ~= nil then
    return error
  end

  local hash = keys[1]
  return redis.call('HGET', keys[1], '_last_modified_')
end

redis.register_function('my_hset', my_hset)
redis.register_function('my_hgetall', my_hgetall)
redis.register_function('my_hlastmodified', my_hlastmodified)

After you've replaced the library in Redis with the above, you can immediately try out the new error handling mechanism:

127.0.0.1:6379> FCALL my_hset 0 myhash nope nope
(error) Hash key name not provided
127.0.0.1:6379> FCALL my_hgetall 2 myhash anotherone
(error) Only one key name is allowed

And your Redis log file should have lines in it that are similar to:

...
20075:M 1 Jan 2022 16:53:57.688 # Hash key name not provided
20075:M 1 Jan 2022 16:54:01.309 # Only one key name is allowed

Functions in cluster

As noted above, Redis automatically handles propagation of loaded functions to replicas. In a Redis Cluster, it is also necessary to load functions to all cluster nodes. This is not handled automatically by Redis Cluster, and needs to be handled by the cluster administrator (like module loading, configuration setting, etc.).

As one of the goals of functions is to live separately from the client application, this should not be part of the Redis client library responsibilities. Instead, redis-cli --cluster-only-masters --cluster call host:port FUNCTION LOAD ... can be used to execute the load command on all master nodes.

Also, note that redis-cli --cluster add-node automatically takes care to propagate the loaded functions from one of the existing nodes to the new node.

Functions and ephemeral Redis instances

In some cases there may be a need to start a fresh Redis server with a set of functions pre-loaded. Common reasons for that could be:

  • Starting Redis in a new environment
  • Re-starting an ephemeral (cache-only) Redis, that uses functions

In such cases, we need to make sure that the pre-loaded functions are available before Redis accepts inbound user connections and commands.

To do that, it is possible to use redis-cli --functions-rdb to extract the functions from an existing server. This generates an RDB file that can be loaded by Redis at startup.

Function flags

Redis needs to have some information about how a function is going to behave when executed, in order to properly enforce resource usage policies and maintain data consistency.

For example, Redis needs to know that a certain function is read-only before permitting it to execute using FCALL_RO on a read-only replica.

By default, Redis assumes that all functions may perform arbitrary read or write operations. Function Flags make it possible to declare more specific function behavior at the time of registration. Let's see how this works.

In our previous example, we defined two functions that only read data. We can try executing them using FCALL_RO against a read-only replica.

redis > FCALL_RO my_hgetall 1 myhash
(error) ERR Can not execute a function with write flag using fcall_ro.

Redis returns this error because a function can, in theory, perform both read and write operations on the database. As a safeguard and by default, Redis assumes that the function does both, so it blocks its execution. The server will reply with this error in the following cases:

  1. Executing a function with FCALL against a read-only replica.
  2. Using FCALL_RO to execute a function.
  3. A disk error was detected (Redis is unable to persist so it rejects writes).

In these cases, you can add the no-writes flag to the function's registration, disable the safeguard and allow them to run. To register a function with flags use the named arguments variant of redis.register_function.

The updated registration code snippet from the library looks like this:

redis.register_function('my_hset', my_hset)
redis.register_function{
  function_name='my_hgetall',
  callback=my_hgetall,
  flags={ 'no-writes' }
}
redis.register_function{
  function_name='my_hlastmodified',
  callback=my_hlastmodified,
  flags={ 'no-writes' }
}

Once we've replaced the library, Redis allows running both my_hgetall and my_hlastmodified with FCALL_RO against a read-only replica:

redis> FCALL_RO my_hgetall 1 myhash
1) "myfield"
2) "some value"
3) "another_field"
4) "another value"
redis> FCALL_RO my_hlastmodified 1 myhash
"1640772721"

For the complete documentation flags, please refer to Script flags.

11.2 - Scripting with Lua

Executing Lua in Redis

Redis lets users upload and execute Lua scripts on the server. Scripts can employ programmatic control structures and use most of the commands while executing to access the database. Because scripts execute in the server, reading and writing data from scripts is very efficient.

Redis guarantees the script's atomic execution. While executing the script, all server activities are blocked during its entire runtime. These semantics mean that all of the script's effects either have yet to happen or had already happened.

Scripting offers several properties that can be valuable in many cases. These include:

  • Providing locality by executing logic where data lives. Data locality reduces overall latency and saves networking resources.
  • Blocking semantics that ensure the script's atomic execution.
  • Enabling the composition of simple capabilities that are either missing from Redis or are too niche to a part of it.

Lua lets you run part of your application logic inside Redis. Such scripts can perform conditional updates across multiple keys, possibly combining several different data types atomically.

Scripts are executed in Redis by an embedded execution engine. Presently, Redis supports a single scripting engine, the Lua 5.1 interpreter. Please refer to the Redis Lua API Reference page for complete documentation.

Although the server executes them, Eval scripts are regarded as a part of the client-side application, which is why they're not named, versioned, or persisted. So all scripts may need to be reloaded by the application at any time if missing (after a server restart, fail-over to a replica, etc.). As of version 7.0, Redis Functions offer an alternative approach to programmability which allow the server itself to be extended with additional programmed logic.

Getting started

We'll start scripting with Redis by using the EVAL command.

Here's our first example:

> EVAL "return 'Hello, scripting!'" 0
"Hello, scripting!"

In this example, EVAL takes two arguments. The first argument is a string that consists of the script's Lua source code. The script doesn't need to include any definitions of Lua function. It is just a Lua program that will run in the Redis engine's context.

The second argument is the number of arguments that follow the script's body, starting from the third argument, representing Redis key names. In this example, we used the value 0 because we didn't provide the script with any arguments, whether the names of keys or not.

Script parameterization

It is possible, although highly ill-advised, to have the application dynamically generate script source code per its needs. For example, the application could send these two entirely different, but at the same time perfectly identical scripts:

redis> EVAL "return 'Hello'" 0
"Hello"
redis> EVAL "return 'Scripting!'" 0
"Scripting!"

Although this mode of operation isn't blocked by Redis, it is an anti-pattern due to script cache considerations (more on the topic below). Instead of having your application generate subtle variations of the same scripts, you can parametrize them and pass any arguments needed for to execute them.

The following example demonstrates how to achieve the same effects as above, but via parameterization:

redis> EVAL "return ARGV[1]" 0 Hello
"Hello"
redis> EVAL "return ARGV[1]" 0 Parameterization!
"Parameterization!"

At this point, it is essential to understand the distinction Redis makes between input arguments that are names of keys and those that aren't.

While key names in Redis are just strings, unlike any other string values, these represent keys in the database. The name of a key is a fundamental concept in Redis and is the basis for operating the Redis Cluster.

Important: to ensure the correct execution of scripts, both in standalone and clustered deployments, all names of keys that a script accesses must be explicitly provided as input key arguments. The script should only access keys whose names are given as input arguments. Scripts should never access keys with programmatically-generated names or based on the contents of data structures stored in the database.

Any input to the function that isn't the name of a key is a regular input argument.

In the example above, both Hello and Parameterization! regular input arguments for the script. Because the script doesn't touch any keys, we use the numerical argument 0 to specify there are no key name arguments. The execution context makes arguments available to the script through KEYS and ARGV global runtime variables. The KEYS table is pre-populated with all key name arguments provided to the script before its execution, whereas the ARGV table serves a similar purpose but for regular arguments.

The following attempts to demonstrate the distribution of input arguments between the scripts KEYS and ARGV runtime global variables:

redis> EVAL "return { KEYS[1], KEYS[2], ARGV[1], ARGV[2], ARGV[3] }" 2 key1 key2 arg1 arg2 arg3
1) "key1"
2) "key2"
3) "arg1"
4) "arg2"
5) "arg3"

Note: as can been seen above, Lua's table arrays are returned as RESP2 array replies, so it is likely that your client's library will convert it to the native array data type in your programming language. Please refer to the rules that govern data type conversion for more pertinent information.

Interacting with Redis from a script

It is possible to call Redis commands from a Lua script either via redis.call() or redis.pcall().

The two are nearly identical. Both execute a Redis command along with its provided arguments, if these represent a well-formed command. However, the difference between the two functions lies in the manner in which runtime errors (such as syntax errors, for example) are handled. Errors raised from calling redis.call() function are returned directly to the client that had executed it. Conversely, errors encountered when calling the redis.pcall() function are returned to the script's execution context instead for possible handling.

For example, consider the following:

> EVAL "return redis.call('SET', KEYS[1], ARGV[1])" 1 foo bar
OK

The above script accepts one key name and one value as its input arguments. When executed, the script calls the SET command to set the input key, foo, with the string value "bar".

Script cache

Until this point, we've used the EVAL command to run our script.

Whenever we call EVAL, we also include the script's source code with the request. Repeatedly calling EVAL to execute the same set of parameterized scripts, wastes both network bandwidth and also has some overheads in Redis. Naturally, saving on network and compute resources is key, so, instead, Redis provides a caching mechanism for scripts.

Every script you execute with EVAL is stored in a dedicated cache that the server keeps. The cache's contents are organized by the scripts' SHA1 digest sums, so the SHA1 digest sum of a script uniquely identifies it in the cache. You can verify this behavior by running EVAL and calling INFO afterward. You'll notice that the used_memory_scripts_eval and number_of_cached_scripts metrics grow with every new script that's executed.

As mentioned above, dynamically-generated scripts are an anti-pattern. Generating scripts during the application's runtime may, and probably will, exhaust the host's memory resources for caching them. Instead, scripts should be as generic as possible and provide customized execution via their arguments.

A script is loaded to the server's cache by calling the SCRIPT LOAD command and providing its source code. The server doesn't execute the script, but instead just compiles and loads it to the server's cache. Once loaded, you can execute the cached script with the SHA1 digest returned from the server.

Here's an example of loading and then executing a cached script:

redis> SCRIPT LOAD "return 'Immabe a cached script'"
"c664a3bf70bd1d45c4284ffebb65a6f2299bfc9f"
redis> EVALSHA c664a3bf70bd1d45c4284ffebb65a6f2299bfc9f 0
"Immabe a cached script"

Cache volatility

The Redis script cache is always volatile. It isn't considered as a part of the database and is not persisted. The cache may be cleared when the server restarts, during fail-over when a replica assumes the master role, or explicitly by SCRIPT FLUSH. That means that cached scripts are ephemeral, and the cache's contents can be lost at any time.

Applications that use scripts should always call EVALSHA to execute them. The server returns an error if the script's SHA1 digest is not in the cache. For example:

redis> EVALSHA ffffffffffffffffffffffffffffffffffffffff 0
(error) NOSCRIPT No matching script

In this case, the application should first load it with SCRIPT LOAD and then call EVALSHA once more to run the cached script by its SHA1 sum. Most of Redis' clients already provide utility APIs for doing that automatically. Please consult your client's documentation regarding the specific details.

EVALSHA in the context of pipelining

Special care should be given executing EVALSHA in the context of a pipelined request. The commands in a pipelined request run in the order they are sent, but other clients' commands may be interleaved for execution between these. Because of that, the NOSCRIPT error can return from a pipelined request but can't be handled.

Therefore, a client library's implementation should revert to using plain EVAL of parameterized in the context of a pipeline.

Script cache semantics

During normal operation, an application's scripts are meant to stay indefintely in the cache (that is, until the server is restarted or the cache being flushed). The underlying reasoning is that the script cache contents of a well-written application are unlikely to grow continuously. Even large applications that use hundreds of cached scripts shouldn't be an issue in terms of cache memory usage.

The only way to flush the script cache is by explicitly calling the SCRIPT FLUSH command. Running the command will completely flush the scripts cache, removing all the scripts executed so far. Typically, this is only needed when the instance is going to be instantiated for another customer or application in a cloud environment.

Also, as already mentioned, restarting a Redis instance flushes the non-persistent script cache. However, from the point of view of the Redis client, there are only two ways to make sure that a Redis instance was not restarted between two different commands:

  • The connection we have with the server is persistent and was never closed so far.
  • The client explicitly checks the runid field in the INFO command to ensure the server was not restarted and is still the same process.

Practically speaking, it is much simpler for the client to assume that in the context of a given connection, cached scripts are guaranteed to be there unless the administrator explicitly invoked the SCRIPT FLUSH command. The fact that the user can count on Redis to retain cached scripts is semantically helpful in the context of pipelining.

The SCRIPT command

The Redis SCRIPT provides several ways for controlling the scripting subsystem. These are:

  • SCRIPT FLUSH: this command is the only way to force Redis to flush the scripts cache. It is most useful in environments where the same Redis instance is reassigned to different uses. It is also helpful for testing client libraries' implementations of the scripting feature.

  • SCRIPT EXISTS: given one or more SHA1 digests as arguments, this command returns an array of 1's and 0's. 1 means the specific SHA1 is recognized as a script already present in the scripting cache. 0's meaning is that a script with this SHA1 wasn't loaded before (or at least never since the latest call to SCRIPT FLUSH).

  • SCRIPT LOAD script: this command registers the specified script in the Redis script cache. It is a useful command in all the contexts where we want to ensure that EVALSHA doesn't not fail (for instance, in a pipeline or when called from a [MULTI/EXEC transaction](/topics/transactions)), without the need to execute the script.

  • SCRIPT KILL: this command is the only way to interrupt a long-running script (a.k.a slow script), short of shutting down the server. A script is deemed as slow once its execution's duration exceeds the configured maximum execution time threshold. The SCRIPT KILL command can be used only with scripts that did not modify the dataset during their execution (since stopping a read-only script does not violate the scripting engine's guaranteed atomicity).

  • SCRIPT DEBUG: controls use of the built-in Redis Lua scripts debugger.

Script replication

In standalone deployments, a single Redis instance called master manages the entire database. A clustered deployment has at least three masters managing the sharded database. Redis uses replication to maintain one or more replicas, or exact copies, for any given master.

Because scripts can modify the data, Redis ensures all write operations performed by a script are also sent to replicas to maintain consistency. There are two conceptual approaches when it comes to script replication:

  1. Verbatim replication: the master sends the script's source code to the replicas. Replicas then execute the script and apply the write effects. This mode can save on replication bandwidth in cases where short scripts generate many commands (for example, a for loop). However, this replication mode means that replicas redo the same work done by the master, which is wasteful. More importantly, it also requires all write scripts to be deterministic.
  2. Effects replication: only the script's data-modifying commands are replicated. Replicas then run the commands without executing any scripts. While potentially lengthier in terms of network traffic, this replication mode is deterministic by definition and therefore doesn't require special consideration.

Verbatim script replication was the only mode supported until Redis 3.2, in which effects replication was added. The lua-replicate-commands configuration directive and redis.replicate_commands() Lua API can be used to enable it.

In Redis 5.0, effects replication became the default mode. As of Redis 7.0, verbatim replication is no longer supported.

Replicating commands instead of scripts

Starting with Redis 3.2, it is possible to select an alternative replication method. Instead of replicating whole scripts, we can replicate the write commands generated by the script. We call this script effects replication.

Note: starting with Redis 5.0, script effects replication is the default mode and does not need to be explicitly enabled.

In this replication mode, while Lua scripts are executed, Redis collects all the commands executed by the Lua scripting engine that actually modify the dataset. When the script execution finishes, the sequence of commands that the script generated are wrapped into a [MULTI/EXEC transaction](/topics/transactions) and are sent to the replicas and AOF.

This is useful in several ways depending on the use case:

  • When the script is slow to compute, but the effects can be summarized by a few write commands, it is a shame to re-compute the script on the replicas or when reloading the AOF. In this case, it is much better to replicate just the effects of the script.
  • When script effects replication is enabled, the restrictions on non-deterministic functions are removed. You can, for example, use the TIME or SRANDMEMBER commands inside your scripts freely at any place.
  • The Lua PRNG in this mode is seeded randomly on every call.

Unless already enabled by the server's configuration or defaults (before Redis 7.0), you need to issue the following Lua command before the script performs a write:

redis.replicate_commands()

The redis.replicate_commands() function returns _true) if script effects replication was enabled; otherwise, if the function was called after the script already called a write command, it returns false, and normal whole script replication is used.

This function is deprecated as of Redis 7.0, and while you can still call it, it will always succeed.

Scripts with deterministic writes

Note: Starting with Redis 5.0, script replication is by default effect-based rather than verbatim. In Redis 7.0, verbatim script replication had been removed entirely. The following section only applies to versions lower than Redis 7.0 when not using effect-based script replication.

An important part of scripting is writing scripts that only change the database in a deterministic way. Scripts executed in a Redis instance are, by default until version 5.0, propagated to replicas and to the AOF file by sending the script itself -- not the resulting commands. Since the script will be re-run on the remote host (or when reloading the AOF file), its changes to the database must be reproducible.

The reason for sending the script is that it is often much faster than sending the multiple commands that the script generates. If the client is sending many scripts to the master, converting the scripts into individual commands for the replica / AOF would result in too much bandwidth for the replication link or the Append Only File (and also too much CPU since dispatching a command received via the network is a lot more work for Redis compared to dispatching a command invoked by Lua scripts).

Normally replicating scripts instead of the effects of the scripts makes sense, however not in all the cases. So starting with Redis 3.2, the scripting engine is able to, alternatively, replicate the sequence of write commands resulting from the script execution, instead of replication the script itself.

In this section, we'll assume that scripts are replicated verbatim by sending the whole script. Let's call this replication mode verbatim scripts replication.

The main drawback with the whole scripts replication approach is that scripts are required to have the following property: the script always must execute the same Redis write commands with the same arguments given the same input data set. Operations performed by the script can't depend on any hidden (non-explicit) information or state that may change as the script execution proceeds or between different executions of the script. Nor can it depend on any external input from I/O devices.

Acts such as using the system time, calling Redis commands that return random values (e.g., RANDOMKEY), or using Lua's random number generator, could result in scripts that will not evaluate consistently.

To enforce the deterministic behavior of scripts, Redis does the following:

  • Lua does not export commands to access the system time or other external states.
  • Redis will block the script with an error if a script calls a Redis command able to alter the data set after a Redis random command like RANDOMKEY, SRANDMEMBER, TIME. That means that read-only scripts that don't modify the dataset can call those commands. Note that a random command does not necessarily mean a command that uses random numbers: any non-deterministic command is considered as a random command (the best example in this regard is the TIME command).
  • In Redis version 4.0, commands that may return elements in random order, such as SMEMBERS (because Redis Sets are unordered), exhibit a different behavior when called from Lua, and undergo a silent lexicographical sorting filter before returning data to Lua scripts. So redis.call("SMEMBERS",KEYS[1]) will always return the Set elements in the same order, while the same command invoked by normal clients may return different results even if the key contains exactly the same elements. However, starting with Redis 5.0, this ordering is no longer performed because replicating effects circumvents this type of non-determinism. In general, even when developing for Redis 4.0, never assume that certain commands in Lua will be ordered, but instead rely on the documentation of the original command you call to see the properties it provides.
  • Lua's pseudo-random number generation function math.random is modified and always uses the same seed for every execution. This means that calling math.random will always generate the same sequence of numbers every time a script is executed (unless math.randomseed is used).

All that said, you can still use commands that write and random behavior with a simple trick. Imagine that you want to write a Redis script that will populate a list with N random integers.

The initial implementation in Ruby could look like this:

require 'rubygems'
require 'redis'

r = Redis.new

RandomPushScript = <<EOF
    local i = tonumber(ARGV[1])
    local res
    while (i > 0) do
        res = redis.call('LPUSH',KEYS[1],math.random())
        i = i-1
    end
    return res
EOF

r.del(:mylist)
puts r.eval(RandomPushScript,[:mylist],[10,rand(2**32)])

Every time this code runs, the resulting list will have exactly the following elements:

redis> LRANGE mylist 0 -1
 1) "0.74509509873814"
 2) "0.87390407681181"
 3) "0.36876626981831"
 4) "0.6921941534114"
 5) "0.7857992587545"
 6) "0.57730350670279"
 7) "0.87046522734243"
 8) "0.09637165539729"
 9) "0.74990198051087"
10) "0.17082803611217"

To make the script both deterministic and still have it produce different random elements, we can add an extra argument to the script that's the seed to Lua's pseudo-random number generator. The new script is as follows:

RandomPushScript = <<EOF
    local i = tonumber(ARGV[1])
    local res
    math.randomseed(tonumber(ARGV[2]))
    while (i > 0) do
        res = redis.call('LPUSH',KEYS[1],math.random())
        i = i-1
    end
    return res
EOF

r.del(:mylist)
puts r.eval(RandomPushScript,1,:mylist,10,rand(2**32))

What we are doing here is sending the seed of the PRNG as one of the arguments. The script output will always be the same given the same arguments (our requirement) but we are changing one of the arguments at every invocation, generating the random seed client-side. The seed will be propagated as one of the arguments both in the replication link and in the Append Only File, guaranteeing that the same changes will be generated when the AOF is reloaded or when the replica processes the script.

Note: an important part of this behavior is that the PRNG that Redis implements as math.random and math.randomseed is guaranteed to have the same output regardless of the architecture of the system running Redis. 32-bit, 64-bit, big-endian and little-endian systems will all produce the same output.

Debugging Eval scripts

Starting with Redis 3.2, Redis has support for native Lua debugging. The Redis Lua debugger is a remote debugger consisting of a server, which is Redis itself, and a client, which is by default redis-cli.

The Lua debugger is described in the Lua scripts debugging section of the Redis documentation.

Execution under low memory conditions

When memory usage in Redis exceeds the maxmemory limit, the first write command encountered in the script that uses additional memory will cause the script to abort (unless redis.pcall was used).

However, an exception to the above is when the script's first write command does not use additional memory, as is the case with (for example, DEL and LREM). In this case, Redis will allow all commands in the script to run to ensure atomicity. If subsequent writes in the script consume additional memory, Redis' memory usage can exceed the threshold set by the maxmemory configuration directive.

Another scenario in which a script can cause memory usage to cross the maxmemory threshold is when the execution begins when Redis is slightly below maxmemory, so the script's first write command is allowed. As the script executes, subsequent write commands consume more memory leading to the server using more RAM than the configured maxmemory directive.

In those scenarios, you should consider setting the maxmemory-policy configuration directive to any values other than noeviction. In addition, Lua scripts should be as fast as possible so that eviction can kick in between executions.

Note that you can change this behaviour by using flags

Eval flags

Normally, when you run an Eval script, the server does not know how it accesses the database. By default, Redis assumes that all scripts read and write data. However, starting with Redis 7.0, there's a way to declare flags when creating a script in order to tell Redis how it should behave.

The way to do that is by using a Shebang statement on the first line of the script like so:

#!lua flags=no-writes,allow-stale
local x = redis.call('get','x')
return x

Note that as soon as Redis sees the #! comment, it'll treat the script as if it declares flags, even if no flags are defined, it still has a different set of defaults compared to a script without a #! line.

Another difference is that scripts without #! can run commands that access keys belonging to different cluster hash slots, but ones with #! inherit the default flags, so they cannot.

Please refer to Script flags to learn about the various scripts and the defaults.

11.3 - Redis Lua API reference

Executing Lua in Redis

Redis includes an embedded Lua 5.1 interpreter. The interpreter runs user-defined ephemeral scripts and functions. Scripts run in a sandboxed context and can only access specific Lua packages. This page describes the packages and APIs available inside the execution's context.

Sandbox context

The sandboxed Lua context attempts to prevent accidental misuse and reduce potential threats from the server's environment.

Scripts should never try to access the Redis server's underlying host systems. That includes the file system, network, and any other attempt to perform a system call other than those supported by the API.

Scripts should operate solely on data stored in Redis and data provided as arguments to their execution.

Global variables and functions

The sandboxed Lua execution context blocks the declaration of global variables and functions. The blocking of global variables is in place to ensure that scripts and functions don't attempt to maintain any runtime context other than the data stored in Redis. In the (somewhat uncommon) use case that a context needs to be maintain betweem executions, you should store the context in Redis' keyspace.

Redis will return a "Script attempted to create global variable 'my_global_variable" error when trying to execute the following snippet:

my_global_variable = 'some value'

And similarly for the following global function declaration:

function my_global_funcion()
  -- Do something amazing
end

You'll also get a similar error when your script attempts to access any global variables that are undefined in the runtime's context:

-- The following will surely raise an error
return an_undefined_global_variable

Instead, all variable and function definitions are required to be declared as local. To do so, you'll need to prepend the local keyword to your declarations. For example, the following snippet will be considered perfectly valid by Redis:

local my_local_variable = 'some value'

local function my_local_function()
  -- Do something else, but equally amazing
end

Note: the sandbox attempts to prevent the use of globals. Using Lua's debugging functionality or other approaches such as altering the meta table used for implementing the globals' protection to circumvent the sandbox isn't hard. However, it is difficult to circumvent the protection by accident. If the user messes with the Lua global state, the consistency of AOF and replication can't be guaranteed. In other words, just don't do it.

Imported Lua modules

Using imported Lua modules is not supported inside the sandboxed execution context. The sandboxed execution context prevents the loading modules by disabling Lua's require function.

The only libraries that Redis ships with and that you can use in scripts are listed under the Runtime libraries section.

Runtime globals

While the sandbox prevents users from declaring globals, the execution context is pre-populated with several of these.

The redis singleton

The redis singleton is an object instance that's accessible from all scripts. It provides the API to interact with Redis from scripts. Its description follows below.

The KEYS global variable

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: no

Important: to ensure the correct execution of scripts, both in standalone and clustered deployments, all names of keys that a function accesses must be explicitly provided as input key arguments. The script should only access keys whose names are given as input arguments. Scripts should never access keys with programmatically-generated names or based on the contents of data structures stored in the database.

The KEYS global variable is available only for ephemeral scripts. It is pre-populated with all key name input arguments.

The ARGV global variable

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: no

The ARGV global variable is available only in ephemeral scripts. It is pre-populated with all regular input arguments.

redis object

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

The Redis Lua execution context always provides a singleton instance of an object named redis. The redis instance enables the script to interact with the Redis server that's running it. Following is the API provided by the redis object instance.

redis.call(command [,arg...])

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

The redis.call() function calls a given Redis command and returns its reply. Its inputs are the command and arguments, and once called, it executes the command in Redis and returns the reply.

For example, we can call the ECHO command from a script and return its reply like so:

return redis.call('ECHO', 'Echo, echo... eco... o...')

If and when redis.call() triggers a runtime exception, the raw exception is raised back to the user as an error, automatically. Therefore, attempting to execute the following ephemeral script will fail and generate a runtime exception because ECHO accepts exactly zero or one argument:

redis> EVAL "return redis.call('ECHO', 'Echo,', 'echo... ', 'eco... ', 'o...')" 0
(error) ERR Error running script (call to b0345693f4b77517a711221050e76d24ae60b7f7): @user_script:1: @user_script: 1: Wrong number of args calling Redis command from script

Note that the call can fail due to various reasons, see Execution under low memory conditions and Script flags

To handle Redis runtime errors use `redis.pcall() instead.

redis.pcall(command [,arg...])

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

This function enables handling runtime errors raised by the Redis server. The redis.pcall() function behaves exactly like redis.call(), except that it:

  • Always returns a reply.
  • Never throws a runtime exeption, and returns in its stead a redis.error_reply in case that a runtime exception is thrown by the server.

The following demonstrates how to use redis.pcall() to intercept and handle runtime exceptions from within the context of an ephemeral script.

local reply = redis.pcall('ECHO', unpack(ARGV))
if reply['err'] ~= nil then
  -- Handle the error sometime, but for now just log it
  redis.log(redis.LOG_WARNING, reply['err'])
  reply['err'] = 'Something is wrong, but no worries, everything is under control'
end
return reply

Evaluating this script with more than one argument will return:

redis> EVAL "..." 0 hello world
(error) Something is wrong, but no worries, everything is under control

redis.error_reply(x)

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

This is a helper function that returns an error reply. The helper accepts a single string argument and returns a Lua table with the err field set to that string.

The outcome of the following code is that error1 and error2 are identical for all intents and purposes:

local text = 'My very special error'
local reply1 = { err = text }
local reply2 = redis.error_reply(text)

Therefore, both forms are valid as means for returning an error reply from scripts:

redis> EVAL "return { err = 'My very special table error' }" 0
(error) My very special table error
redis> EVAL "return redis.error_reply('My very special reply error')" 0
(error) My very special reply error

For returing Redis status replies refer to redis.status_reply(). Refer to the Data type conversion for returning other response types.

redis.status_reply(x)

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

This is a helper function that returns a simple string reply. "OK" is an example of a standard Redis status reply. The Lua API represents status replies as tables with a single field, ok, set with a simple status string.

The outcome of the following code is that status1 and status2 are identical for all intents and purposes:

local text = 'Frosty'
local status1 = { ok = text }
local status2 = redis.status_reply(text)

Therefore, both forms are valid as means for returning status replies from scripts:

redis> EVAL "return { ok = 'TICK' }" 0
TICK
redis> EVAL "return redis.status_reply('TOCK')" 0
TOCK

For returing Redis error replies refer to redis.error_reply(). Refer to the Data type conversion for returning other response types.

redis.sha1hex(x)

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

This function returns the SHA1 hexadecimal digest of its single string argument.

You can, for example, obtain the empty string's SHA1 digest:

redis> EVAL "return redis.sha1hex('')" 0
"da39a3ee5e6b4b0d3255bfef95601890afd80709"

redis.log(level, message)

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

This function writes to the Redis server log.

It expects two input arguments: the log level and a message. The message is a string to write to the log file. Log level can be on of these:

  • redis.LOG_DEBUG
  • redis.LOG_VERBOSE
  • redis.LOG_NOTICE
  • redis.LOG_WARNING

These levels map to the server's log levels. The log only records messages equal or greater in level than the server's loglevel configuration directive.

The following snippet:

redis.log(redis.LOG_WARNING, 'Something is terribly wrong')

will produce a line similar to the following in your server's log:

[32343] 22 Mar 15:21:39 # Something is terribly wrong

redis.setresp(x)

  • Since version: 6.0.0
  • Available in scripts: yes
  • Available in functions: yes

This function allows the executing script to switch between Redis Serialization Protocol (RESP) versions for the replies returned by redis.call()](#redis.call) and [redis.pall(). It expects a single numerical argument as the protocol's version. The default protocol version is 2, but it can be switched to version 3.

Here's an example of switching to RESP3 replies:

redis.setresp(3)

Please refer to the Data type conversion for more information about type conversions.

redis.set_repl(x)

  • Since version: 3.2.0
  • Available in scripts: yes
  • Available in functions: no

Note: this feature is only available when script effects replication is employed. Calling it when using verbatim script replication will result in an error. As of Redis version 2.6.0, scripts were replicated verbatim, meaning that the scripts' source code was sent for execution by replicas and stored in the AOF. An alternative replication mode added in version 3.2.0 allows replicating only the scripts' effects. As of Redis version 7.0, script replication is no longer supported, and the only replication mode available is script effects replication.

Warning: this is an advanced feature. Misuse can cause damage by violating the contract that binds the Redis master, its replicas, and AOF contents to hold the same logical content.

This function allows a script to assert control over how its effects are propagated to replicas and the AOF afterward. A script's effects are the Redis write commands that it calls.

By default, all write commands that a script executes are replicated. Sometimes, however, better control over this behavior can be helpful. This can be the case, for example, when storing intermediate values in the master alone.

Consider a script that intersects two sets and stores the result in a temporary key with SUNIONSTORE. It then picks five random elements (SRANDMEMBER) from the intersection and stores (SADD) them in another set. Finally, before returning, it deletes the temporary key that stores the intersection of the two source sets.

In this case, only the new set with its five randomly-chosen elements needs to be replicated. Replicating the SUNIONSTORE command and the `DEL'ition of the temporary key is unnecessary and wasteful.

The redis.set_repl() function instructs the server how to treat subsequent write commands in terms of replication. It accepts a single input argument that only be one of the following:

  • redis.REPL_ALL: replicates the effects to the AOF and replicas.
  • redis.REPL_AOF: replicates the effects to the AOF alone.
  • redis.REPL_REPLICA: replicates the effects to the replicas alone.
  • redis.REPL_SLAVE: same as REPL_REPLICA, maintained for backward compatibility.
  • redis.REPL_NONE: disables effect replication entirely.

By default, the scripting engine is initialized to the redis.REPL_ALL setting when a script begins its execution. You can call the redis.set_repl() function at any time during the script's execution to switch between the different replication modes.

A simple example follows:

redis.replicate_commands() -- Enable effects replication in versions lower than Redis v7.0
redis.call('SET', KEYS[1], ARGV[1])
redis.set_repl(redis.REPL_NONE)
redis.call('SET', KEYS[2], ARGV[2])
redis.set_repl(redis.REPL_ALL)
redis.call('SET', KEYS[3], ARGV[3])

If you run this script by calling EVAL "..." 3 A B C 1 2 3, the result will be that only the keys A and C are created on the replicas and AOF.

redis.replicate_commands()

  • Since version: 3.2.0
  • Until version: 7.0.0
  • Available in scripts: yes
  • Available in functions: no

This function switches the script's replication mode from verbatim replication to effects replication. You can use it to override the default verbatim script replication mode used by Redis until version 7.0.

Note: as of Redis v7.0, verbatim script replication is no longer supported. The default, and only script replication mode supported, is script effects' replication. For more information, please refer to Replicating commands instead of scripts

redis.breakpoint()

  • Since version: 3.2.0
  • Available in scripts: yes
  • Available in functions: no

This function triggers a breakpoint when using the Redis Lua debugger](/topics/ldb).

redis.debug(x)

  • Since version: 3.2.0
  • Available in scripts: yes
  • Available in functions: no

This function prints its argument in the Redis Lua debugger console.

redis.acl_check_cmd(command [,arg...])

  • Since version: 7.0.0
  • Available in scripts: yes
  • Available in functions: yes

This function is used for checking if the current user running the script has ACL permissions to execute the given command with the given arguments.

The return value is a boolean true in case the current user has permissions to execute the command (via a call to redis.call or redis.pcall) or false in case they don't.

The function will raise an error if the passed command or its arguments are invalid.

redis.register_function

  • Since version: 7.0.0
  • Available in scripts: no
  • Available in functions: yes

This function is only available from the context of the FUNCTION LOAD command. When called, it registers a function to the loaded library. The function can be called either with positional or named arguments.

positional arguments: redis.register_function(name, callback)

The first argument to redis.register_function is a Lua string representing the function name. The second argument to redis.register_function is a Lua function.

Usage example:

redis> FUNCTION LOAD "#!lua name=mylib\n redis.register_function('noop', function() end)"

Named arguments: redis.register_function{function_name=name, callback=callback, flags={flag1, flag2, ..}, description=description}

The named arguments variant accepts the following arguments:

  • function_name: the function's name.
  • callback: the function's callback.
  • flags: an array of strings, each a function flag (optional).
  • description: function's description (optional).

Both function_name and callback are mandatory.

Usage example:

redis> FUNCTION LOAD "#!lua name=mylib\n redis.register_function{function_name='noop', callback=function() end, flags={ 'no-writes' }, description='Does nothing'}"

Script flags

Important: Use script flags with care, which may negatively impact if misused. Note that the default for Eval scripts are different than the default for functions that are mentioned below, see Eval Flags

When you register a function or load an Eval script, the server does not know how it accesses the database. By default, Redis assumes that all scripts read and write data. This results in the following behavior:

  1. They can read and write data.
  2. They can run in cluster mode, and are not able to run commands accessing keys of different hash slots.
  3. Execution against a stale replica is denied to avoid inconsistent reads.
  4. Execution under low memory is denied to avoid exceeding the configured threshold.

You can use the following flags and instruct the server to treat the scripts' execution differently:

  • no-writes: this flag indicates that the script only reads data but never writes.

    By default, Redis will deny the execution of scripts against read-only replicas, as they may attempt to perform writes. Similarly, the server will not allow calling scripts with FCALL_RO / EVAL_RO. Lastly, when data persistence is at risk due to a disk error, execution is blocked as well.

    Using this flag allows executing the script:

    1. With FCALL_RO / EVAL_RO against masters and read-only replicas.
    2. Even if there's a disk error (Redis is unable to persist so it rejects writes).

    However, note that the server will return an error if the script attempts to call a write command.

  • allow-oom: use this flag to allow a script to execute when the server is out of memory (OOM).

    Unless used, Redis will deny the execution of scripts when in an OOM state, regardless of the no-write flag and method of calling. Furthermore, when you use this flag, the script can call any Redis command, including commands that aren't usually allowed in this state.

  • allow-stale: a flag that enables running the script against a stale replica.

    By default, Redis prevents data consistency problems from using old data by having stale replicas return a runtime error. In cases where the consistency is a lesser concern, this flag allows stale Redis replicas to run the script.

  • no-cluster: the flag causes the script to return an error in Redis cluster mode.

    Redis allows scripts to be executed both in standalone and cluster modes. Setting this flag prevents executing the script against nodes in the cluster.

  • allow-cross-slot-keys: The flag that allows a script to access keys from multiple slots.

    Redis typically prevents any single command from accessing keys that hash to multiple slots. This flag allows scripts to break this rule and access keys within the script that access multiple slots. Declared keys to the script are still always required to hash to a single slot. Accessing keys from multiple slots is discouraged as applications should be designed to only access keys from a single slot at a time, allowing slots to move between Redis servers.

    This flag has no effect when cluster mode is disabled.

Please refer to Function Flags and Eval Flags for a detailed example.

redis.REDIS_VERSION

  • Since version: 7.0.0
  • Available in scripts: yes
  • Available in functions: yes

Returns the current Redis server version as a Lua string. The reply's format is MM.mm.PP, where:

  • MM: is the major version.
  • mm: is the minor version.
  • PP: is the patch level.

redis.REDIS_VERSION_NUM

  • Since version: 7.0.0
  • Available in scripts: yes
  • Available in functions: yes

Returns the current Redis server version as a number. The reply is a hexadecimal value structured as 0x00MMmmPP, where:

  • MM: is the major version.
  • mm: is the minor version.
  • PP: is the patch level.

Data type conversion

Unless a runtime exception is raised, redis.call() and redis.pcall() return the reply from the executed command to the Lua script. Redis' replies from these functions are converted automatically into Lua's native data types.

Similarly, when a Lua script returns a reply with the return keyword, that reply is automatically converted to Redis' protocol.

Put differently; there's a one-to-one mapping between Redis' replies and Lua's data types and a one-to-one mapping between Lua's data types and the Redis Protocol data types. The underlying design is such that if a Redis type is converted into a Lua type and converted back into a Redis type, the result is the same as the initial value.

Type conversion from Redis protocol replies (i.e., the replies from redis.call() and redis.pcall() to Lua data types depends on the Redis Serialization Protocol version used by the script. The default protocol version during script executions is RESP2. The script may switch the replies' protocol versions by calling the redis.setresp() function.

Type conversion from a script's returned Lua data type depends on the user's choice of protocol (see the HELLO command).

The following sections describe the type conversion rules between Lua and Redis per the protocol's version.

RESP2 to Lua type conversion

The following type conversion rules apply to the execution's context by default as well as after calling redis.setresp(2):

Lua to RESP2 type conversion

The following type conversion rules apply by default as well as after the user had called HELLO 2:

There is an additional Lua-to-Redis conversion rule that has no corresponding Redis-to-Lua conversion rule:

There are three additional rules to note about converting Lua to Redis data types:

  • Lua has a single numerical type, Lua numbers. There is no distinction between integers and floats. So we always convert Lua numbers into integer replies, removing the decimal part of the number, if any. If you want to return a Lua float, it should be returned as a string, exactly like Redis itself does (see, for instance, the ZSCORE command).
  • There's no simple way to have nils inside Lua arrays due to Lua's table semantics. Therefore, who;e Redis converts a Lua array to RESP, the conversion stops when it encounters a Lua nil value.
  • When a Lua table is an associative array that contains keys and their respective values, the converted Redis reply will not include them.

Lua to RESP2 type conversion examples:

redis> EVAL "return 10" 0
(integer) 10

redis> EVAL "return { 1, 2, { 3, 'Hello World!' } }" 0
1) (integer) 1s
2) (integer) 2
3) 1) (integer) 3
   1) "Hello World!"

redis> EVAL "return redis.call('get','foo')" 0
"bar"

The last example demonstrates receiving and returning the exact return value of redis.call() (or redis.pcall()) in Lua as it would be returned if the command had been called directly.

The following example shows how floats and arrays that cont nils and keys are handled:

redis> EVAL "return { 1, 2, 3.3333, somekey = 'somevalue', 'foo', nil , 'bar' }" 0
1) (integer) 1
2) (integer) 2
3) (integer) 3
4) "foo"

As you can see, the float value of 3.333 gets converted to an integer 3, the somekey key and its value are omitted, and the string "bar" isn't returned as there is a nil value that precedes it.

RESP3 to Lua type conversion

RESP3 is a newer version of the Redis Serialization Protocol. It is available as an opt-in choice as of Redis v6.0.

An executing script may call the redis.setresp function during its execution and switch the protocol version that's used for returning replies from Redis' commands (that can be invoked via redis.call() or redis.pcall()).

Once Redis' replies are in RESP3 protocol, all of the RESP2 to Lua conversion rules apply, with the following additions:

  • RESP3 map reply -> Lua table with a single map field containing a Lua table representing the fields and values of the map.
  • RESP set reply -> Lua table with a single set field containing a Lua table representing the elements of the set as fields, each with the Lua Boolean value of true.
  • RESP3 null -> Lua nil.
  • RESP3 true reply -> Lua true boolean value.
  • RESP3 false reply -> Lua false boolean value.
  • RESP3 double reply -> Lua table with a single score field containing a Lua number representing the double value.
  • RESP3 big number reply -> Lua table with a single big_number field containing a Lua string representing the big number value.
  • Redis verbatim string reply -> Lua table with a single verbatim_string field containing a Lua table with two fields, string and format, representing the verbatim string and its format, respectively.

Note: the RESP3 big number and verbatim strings replies are only supported as of Redis v7.0 and greater. Also, presently, RESP3's attributes, streamed strings and streamed aggregate data types are not supported by the Redis Lua API.

Lua to RESP3 type conversion

Regardless of the script's choice of protocol version set for replies with the [redis.setresp() function] when it calls redis.call() or redis.pcall(), the user may opt-in to using RESP3 (with the HELLO 3 command) for the connection. Although the default protocol for incoming client connections is RESP2, the script should honor the user's preference and return adequately-typed RESP3 replies, so the following rules apply on top of those specified in the Lua to RESP2 type conversion section when that is the case.

  • Lua Boolean -> RESP3 Boolean reply (note that this is a change compared to the RESP2, in which returning a Boolean Lua true returned the number 1 to the Redis client, and returning a false used to return a null.
  • Lua table with a single map field set to an associative Lua table -> RESP3 map reply.
  • Lua table with a single _set field set to an associative Lua table -> RESP3 set reply. Values can be set to anything and are discarded anyway.
  • Lua table with a single double field to an associative Lua table -> RESP3 double reply.
  • Lua nil -> RESP3 null.

However, if the connection is set use the RESP2 protocol, and even if the script replies with RESP3-typed responses, Redis will automatically perform a RESP3 to RESP2 convertion of the reply as is the case for regular commands. That means, for example, that returning the RESP3 map type to a RESP2 connection will result in the repy being converted to a flat RESP2 array that consists of alternating field names and their values, rather than a RESP3 map.

Additional notes about scripting

Using SELECT inside scripts

You can call the SELECT command from your Lua scripts, like you can with any normal client connection. However, one subtle aspect of the behavior changed between Redis versions 2.8.11 and 2.8.12. Prior to Redis version 2.8.12, the database selected by the Lua script was set as the current database for the client connection that had called it. As of Redis version 2.8.12, the database selected by the Lua script only affects the execution context of the script, and does not modify the database that's selected by the client calling the script. This semantic change between patch level releases was required since the old behavior was inherently incompatible with Redis' replication and introduced bugs.

Runtime libraries

The Redis Lua runtime context always comes with several pre-imported libraries.

The following standard Lua libraries are available to use:

In addition, the following external libraries are loaded and accessible to scripts:

struct library

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

struct is a library for packing and unpacking C-like structures in Lua. It provides the following functions:

All of struct's functions expect their first argument to be a format string.

struct formats

The following are valid format strings for struct's functions:

  • >: big endian
  • <: little endian
  • ![num]: alignment
  • x: padding
  • b/B: signed/unsigned byte
  • h/H: signed/unsigned short
  • l/L: signed/unsigned long
  • T: size_t
  • i/In: signed/unsigned integer with size n (defaults to the size of int)
  • cn: sequence of n chars (from/to a string); when packing, n == 0 means the whole string; when unpacking, n == 0 means use the previously read number as the string's length.
  • s: zero-terminated string
  • f: float
  • d: double
  • (space): ignored

struct.pack(x)

This function returns a struct-encoded string from values. It accepts a struct format string as its first argument, followed by the values that are to be encoded.

Usage example:

redis> EVAL "return struct.pack('HH', 1, 2)" 0
"\x01\x00\x02\x00"

struct.unpack(x)

This function returns the decoded values from a struct. It accepts a struct format string as its first argument, followed by encoded struct's string.

Usage example:

redis> EVAL "return { struct.unpack('HH', ARGV[1]) }" 0 "\x01\x00\x02\x00"
1) (integer) 1
2) (integer) 2
3) (integer) 5

struct.size(x)

This function returns the size, in bytes, of a struct. It accepts a struct format string as its only argument.

Usage example:

redis> EVAL "return struct.size('HH')" 0
(integer) 4

cjson library

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

The cjson library provides fast JSON encoding and decoding from Lua. It provides these functions.

cjson.encode(x)

This function returns a JSON-encoded string for the Lua data type provided as its argument.

Usage example:

redis> EVAL "return cjson.encode({ ['foo'] = 'bar' })" 0
"{\"foo\":\"bar\"}"

cjson.decode(x)

This function returns a Lua data type from the JSON-encoded string provided as its argument.

Usage example:

redis> EVAL "return cjson.decode(ARGV[1])['foo']" 0 '{"foo":"bar"}'
"bar"

cmsgpack library

  • Since version: 2.6.0
  • Available in scripts: yes
  • Available in functions: yes

The cmsgpack library provides fast MessagePack encoding and decoding from Lua. It provides these functions.

cmsgpack.pack(x)

This function returns the packed string encoding of the Lua data type it is given as an argument.

Usage example:

redis> EVAL "return cmsgpack.pack({'foo', 'bar', 'baz'})" 0
"\x93\xa3foo\xa3bar\xa3baz"

cmsgpack.unpack(x)

This function returns the unpacked values from decocoding its input string argument.

Usage example:

redis> EVAL "return cmsgpack.unpack(ARGV[1])" 0 "\x93\xa3foo\xa3bar\xa3baz"
1) "foo"
2) "bar"
3) "baz"

bit library

  • Since version: 2.8.18
  • Available in scripts: yes
  • Available in functions: yes

The bit library provides bitwise operations on numbers. Its documentation resides at Lua BitOp documentation It provides the following functions.

bit.tobit(x)

Normalizes a number to the numeric range for bit operations and returns it.

Usage example:

redis> EVAL 'return bit.tobit(1)' 0
(integer) 1

bit.tohex(x [,n])

Converts its first argument to a hex string. The number of hex digits is given by the absolute value of the optional second argument.

Usage example:

redis> EVAL 'return bit.tohex(422342)' 0
"000671c6"

bit.bnot(x)

Returns the bitwise not of its argument.

bit.bnot(x) bit.bor(x1 [,x2...]), bit.band(x1 [,x2...]) and bit.bxor(x1 [,x2...])

Returns either the bitwise or, bitwise and, or bitwise xor of all of its arguments. Note that more than two arguments are allowed.

Usage example:

redis> EVAL 'return bit.bor(1,2,4,8,16,32,64,128)' 0
(integer) 255

bit.lshift(x, n), bit.rshift(x, n) and bit.arshift(x, n)

Returns either the bitwise logical left-shift, bitwise logical right-shift, or bitwise arithmetic right-shift of its first argument by the number of bits given by the second argument.

bit.rol(x, n) and bit.ror(x, n)

Returns either the bitwise left rotation, or bitwise right rotation of its first argument by the number of bits given by the second argument. Bits shifted out on one side are shifted back in on the other side.

bit.bswap(x)

Swaps the bytes of its argument and returns it. This can be used to convert little-endian 32-bit numbers to big-endian 32-bit numbers and vice versa.

11.4 - Debugging Lua scripts in Redis

How to use the built-in Lua debugger

Starting with version 3.2 Redis includes a complete Lua debugger, that can be used in order to make the task of writing complex Redis scripts much simpler.

The Redis Lua debugger, codenamed LDB, has the following important features:

  • It uses a server-client model, so it's a remote debugger. The Redis server acts as the debugging server, while the default client is redis-cli. However other clients can be developed by following the simple protocol implemented by the server.
  • By default every new debugging session is a forked session. It means that while the Redis Lua script is being debugged, the server does not block and is usable for development or in order to execute multiple debugging sessions in parallel. This also means that changes are rolled back after the script debugging session finished, so that's possible to restart a new debugging session again, using exactly the same Redis data set as the previous debugging session.
  • An alternative synchronous (non forked) debugging model is available on demand, so that changes to the dataset can be retained. In this mode the server blocks for the time the debugging session is active.
  • Support for step by step execution.
  • Support for static and dynamic breakpoints.
  • Support from logging the debugged script into the debugger console.
  • Inspection of Lua variables.
  • Tracing of Redis commands executed by the script.
  • Pretty printing of Redis and Lua values.
  • Infinite loops and long execution detection, which simulates a breakpoint.

Quick start

A simple way to get started with the Lua debugger is to watch this video introduction:

Important Note: please make sure to avoid debugging Lua scripts using your Redis production server. Use a development server instead. Also note that using the synchronous debugging mode (which is NOT the default) results in the Redis server blocking for all the time the debugging session lasts.

To start a new debugging session using redis-cli do the following:

  1. Create your script in some file with your preferred editor. Let's assume you are editing your Redis Lua script located at /tmp/script.lua.

  2. Start a debugging session with:

    ./redis-cli --ldb --eval /tmp/script.lua

Note that with the --eval option of redis-cli you can pass key names and arguments to the script, separated by a comma, like in the following example:

./redis-cli --ldb --eval /tmp/script.lua mykey somekey , arg1 arg2

You'll enter a special mode where redis-cli no longer accepts its normal commands, but instead prints a help screen and passes the unmodified debugging commands directly to Redis.

The only commands which are not passed to the Redis debugger are:

  • quit -- this will terminate the debugging session. It's like removing all the breakpoints and using the continue debugging command. Moreover the command will exit from redis-cli.
  • restart -- the debugging session will restart from scratch, reloading the new version of the script from the file. So a normal debugging cycle involves modifying the script after some debugging, and calling restart in order to start debugging again with the new script changes.
  • help -- this command is passed to the Redis Lua debugger, that will print a list of commands like the following:
lua debugger> help
Redis Lua debugger help:
[h]elp               Show this help.
[s]tep               Run current line and stop again.
[n]ext               Alias for step.
[c]continue          Run till next breakpoint.
[l]list              List source code around current line.
[l]list [line]       List source code around [line].
                     line = 0 means: current position.
[l]list [line] [ctx] In this form [ctx] specifies how many lines
                     to show before/after [line].
[w]hole              List all source code. Alias for 'list 1 1000000'.
[p]rint              Show all the local variables.
[p]rint <var>        Show the value of the specified variable.
                     Can also show global vars KEYS and ARGV.
[b]reak              Show all breakpoints.
[b]reak <line>       Add a breakpoint to the specified line.
[b]reak -<line>      Remove breakpoint from the specified line.
[b]reak 0            Remove all breakpoints.
[t]race              Show a backtrace.
[e]eval <code>       Execute some Lua code (in a different callframe).
[r]edis <cmd>        Execute a Redis command.
[m]axlen [len]       Trim logged Redis replies and Lua var dumps to len.
                     Specifying zero as <len> means unlimited.
[a]abort             Stop the execution of the script. In sync
                     mode dataset changes will be retained.

Debugger functions you can call from Lua scripts:
redis.debug()        Produce logs in the debugger console.
redis.breakpoint()   Stop execution as if there was a breakpoint in the
                     next line of code.

Note that when you start the debugger it will start in stepping mode. It will stop at the first line of the script that actually does something before executing it.

From this point you usually call step in order to execute the line and go to the next line. While you step Redis will show all the commands executed by the server like in the following example:

* Stopped at 1, stop reason = step over
-> 1   redis.call('ping')
lua debugger> step
<redis> ping
<reply> "+PONG"
* Stopped at 2, stop reason = step over

The <redis> and <reply> lines show the command executed by the line just executed, and the reply from the server. Note that this happens only in stepping mode. If you use continue in order to execute the script till the next breakpoint, commands will not be dumped on the screen to prevent too much output.

Termination of the debugging session

When the scripts terminates naturally, the debugging session ends and redis-cli returns in its normal non-debugging mode. You can restart the session using the restart command as usual.

Another way to stop a debugging session is just interrupting redis-cli manually by pressing Ctrl+C. Note that also any event breaking the connection between redis-cli and the redis-server will interrupt the debugging session.

All the forked debugging sessions are terminated when the server is shut down.

Abbreviating debugging commands

Debugging can be a very repetitive task. For this reason every Redis debugger command starts with a different character, and you can use the single initial character in order to refer to the command.

So for example instead of typing step you can just type s.

Breakpoints

Adding and removing breakpoints is trivial as described in the online help. Just use b 1 2 3 4 to add a breakpoint in line 1, 2, 3, 4. The command b 0 removes all the breakpoints. Selected breakpoints can be removed using as argument the line where the breakpoint we want to remove is, but prefixed by a minus sign. So for example b -3 removes the breakpoint from line 3.

Note that adding breakpoints to lines that Lua never executes, like declaration of local variables or comments, will not work. The breakpoint will be added but since this part of the script will never be executed, the program will never stop.

Dynamic breakpoints

Using the breakpoint command it is possible to add breakpoints into specific lines. However sometimes we want to stop the execution of the program only when something special happens. In order to do so, you can use the redis.breakpoint() function inside your Lua script. When called it simulates a breakpoint in the next line that will be executed.

if counter > 10 then redis.breakpoint() end

This feature is extremely useful when debugging, so that we can avoid continuing the script execution manually multiple times until a given condition is encountered.

Synchronous mode

As explained previously, but default LDB uses forked sessions with rollback of all the data changes operated by the script while it has being debugged. Determinism is usually a good thing to have during debugging, so that successive debugging sessions can be started without having to reset the database content to its original state.

However for tracking certain bugs, you may want to retain the changes performed to the key space by each debugging session. When this is a good idea you should start the debugger using a special option, ldb-sync-mode, in redis-cli.

./redis-cli --ldb-sync-mode --eval /tmp/script.lua

Note: Redis server will be unreachable during the debugging session in this mode, so use with care.

In this special mode, the abort command can stop the script half-way taking the changes operated to the dataset. Note that this is different compared to ending the debugging session normally. If you just interrupt redis-cli the script will be fully executed and then the session terminated. Instead with abort you can interrupt the script execution in the middle and start a new debugging session if needed.

Logging from scripts

The redis.debug() command is a powerful debugging facility that can be called inside the Redis Lua script in order to log things into the debug console:

lua debugger> list
-> 1   local a = {1,2,3}
   2   local b = false
   3   redis.debug(a,b)
lua debugger> continue
<debug> line 3: {1; 2; 3}, false

If the script is executed outside of a debugging session, redis.debug() has no effects at all. Note that the function accepts multiple arguments, that are separated by a comma and a space in the output.

Tables and nested tables are displayed correctly in order to make values simple to observe for the programmer debugging the script.

Inspecting the program state with print and eval

While the redis.debug() function can be used in order to print values directly from within the Lua script, often it is useful to observe the local variables of a program while stepping or when stopped into a breakpoint.

The print command does just that, and performs lookup in the call frames starting from the current one back to the previous ones, up to top-level. This means that even if we are into a nested function inside a Lua script, we can still use print foo to look at the value of foo in the context of the calling function. When called without a variable name, print will print all variables and their respective values.

The eval command executes small pieces of Lua scripts outside the context of the current call frame (evaluating inside the context of the current call frame is not possible with the current Lua internals). However you can use this command in order to test Lua functions.

lua debugger> e redis.sha1hex('foo')
<retval> "0beec7b5ea3f0fdbc95d0dd47f3c5bc275da8a33"

Debugging clients

LDB uses the client-server model where the Redis server acts as a debugging server that communicates using RESP. While redis-cli is the default debug client, any client can be used for debugging as long as it meets one of the following conditions:

  1. The client provides a native interface for setting the debug mode and controlling the debug session.
  2. The client provides an interface for sending arbitrary commands over RESP.
  3. The client allows sending raw messages to the Redis server.

For example, the Redis plugin for ZeroBrane Studio integrates with LDB using redis-lua. The following Lua code is a simplified example of how the plugin achieves that:

local redis = require 'redis'

-- add LDB's Continue command
redis.commands['ldbcontinue'] = redis.command('C')

-- script to be debugged
local script = [[
  local x, y = tonumber(ARGV[1]), tonumber(ARGV[2])
  local result = x * y
  return result
]]

local client = redis.connect('127.0.0.1', 6379)
client:script("DEBUG", "YES")
print(unpack(client:eval(script, 0, 6, 9)))
client:ldbcontinue()

12 - Redis Pub/Sub

How to use pub/sub channels in Redis

SUBSCRIBE, UNSUBSCRIBE and PUBLISH implement the Publish/Subscribe messaging paradigm where (citing Wikipedia) senders (publishers) are not programmed to send their messages to specific receivers (subscribers). Rather, published messages are characterized into channels, without knowledge of what (if any) subscribers there may be. Subscribers express interest in one or more channels, and only receive messages that are of interest, without knowledge of what (if any) publishers there are. This decoupling of publishers and subscribers can allow for greater scalability and a more dynamic network topology.

For instance in order to subscribe to channels foo and bar the client issues a SUBSCRIBE providing the names of the channels:

SUBSCRIBE foo bar

Messages sent by other clients to these channels will be pushed by Redis to all the subscribed clients.

A client subscribed to one or more channels should not issue commands, although it can subscribe and unsubscribe to and from other channels. The replies to subscription and unsubscribing operations are sent in the form of messages, so that the client can just read a coherent stream of messages where the first element indicates the type of message. The commands that are allowed in the context of a subscribed client are SUBSCRIBE, SSUBSCRIBE, SUNSUBSCRIBE, PSUBSCRIBE, UNSUBSCRIBE, PUNSUBSCRIBE, PING, RESET, and QUIT.

Please note that redis-cli will not accept any commands once in subscribed mode and can only quit the mode with Ctrl-C.

Format of pushed messages

A message is a array-reply with three elements.

The first element is the kind of message:

  • subscribe: means that we successfully subscribed to the channel given as the second element in the reply. The third argument represents the number of channels we are currently subscribed to.

  • unsubscribe: means that we successfully unsubscribed from the channel given as second element in the reply. The third argument represents the number of channels we are currently subscribed to. When the last argument is zero, we are no longer subscribed to any channel, and the client can issue any kind of Redis command as we are outside the Pub/Sub state.

  • message: it is a message received as result of a PUBLISH command issued by another client. The second element is the name of the originating channel, and the third argument is the actual message payload.

Database & Scoping

Pub/Sub has no relation to the key space. It was made to not interfere with it on any level, including database numbers.

Publishing on db 10, will be heard by a subscriber on db 1.

If you need scoping of some kind, prefix the channels with the name of the environment (test, staging, production...).

Wire protocol example

SUBSCRIBE first second
*3
$9
subscribe
$5
first
:1
*3
$9
subscribe
$6
second
:2

At this point, from another client we issue a PUBLISH operation against the channel named second:

> PUBLISH second Hello

This is what the first client receives:

*3
$7
message
$6
second
$5
Hello

Now the client unsubscribes itself from all the channels using the UNSUBSCRIBE command without additional arguments:

UNSUBSCRIBE
*3
$11
unsubscribe
$6
second
:1
*3
$11
unsubscribe
$5
first
:0

Pattern-matching subscriptions

The Redis Pub/Sub implementation supports pattern matching. Clients may subscribe to glob-style patterns in order to receive all the messages sent to channel names matching a given pattern.

For instance:

PSUBSCRIBE news.*

Will receive all the messages sent to the channel news.art.figurative, news.music.jazz, etc. All the glob-style patterns are valid, so multiple wildcards are supported.

PUNSUBSCRIBE news.*

Will then unsubscribe the client from that pattern. No other subscriptions will be affected by this call.

Messages received as a result of pattern matching are sent in a different format:

  • The type of the message is pmessage: it is a message received as result of a PUBLISH command issued by another client, matching a pattern-matching subscription. The second element is the original pattern matched, the third element is the name of the originating channel, and the last element the actual message payload.

Similarly to SUBSCRIBE and UNSUBSCRIBE, PSUBSCRIBE and PUNSUBSCRIBE commands are acknowledged by the system sending a message of type psubscribe and punsubscribe using the same format as the subscribe and unsubscribe message format.

Messages matching both a pattern and a channel subscription

A client may receive a single message multiple times if it's subscribed to multiple patterns matching a published message, or if it is subscribed to both patterns and channels matching the message. Like in the following example:

SUBSCRIBE foo
PSUBSCRIBE f*

In the above example, if a message is sent to channel foo, the client will receive two messages: one of type message and one of type pmessage.

The meaning of the subscription count with pattern matching

In subscribe, unsubscribe, psubscribe and punsubscribe message types, the last argument is the count of subscriptions still active. This number is actually the total number of channels and patterns the client is still subscribed to. So the client will exit the Pub/Sub state only when this count drops to zero as a result of unsubscribing from all the channels and patterns.

Sharded Pub/Sub

From 7.0, sharded Pub/Sub is introduced in which shard channels are assigned to slots by the same algorithm used to assign keys to slots. A shard message must be sent to a node that own the slot the shard channel is hashed to. The cluster makes sure the published shard messages are forwarded to all nodes in the shard, so clients can subscribe to a shard channel by connecting to either the master responsible for the slot, or to any of its replicas. SSUBSCRIBE, SUNSUBSCRIBE and SPUBLISH are used to implement sharded Pub/Sub.

Sharded Pub/Sub helps to scale the usage of Pub/Sub in cluster mode. It restricts the propagation of message to be within the shard of a cluster. Hence, the amount of data passing through the cluster bus is limited in comparison to global Pub/Sub where each message propagates to each node in the cluster. This allows users to horizontally scale the Pub/Sub usage by adding more shards.

Programming example

Pieter Noordhuis provided a great example using EventMachine and Redis to create a multi user high performance web chat.

Client library implementation hints

Because all the messages received contain the original subscription causing the message delivery (the channel in the case of message type, and the original pattern in the case of pmessage type) client libraries may bind the original subscription to callbacks (that can be anonymous functions, blocks, function pointers), using a hash table.

When a message is received an O(1) lookup can be done in order to deliver the message to the registered callback.

13 - Redis replication

How Redis supports high availability and failover with replication

At the base of Redis replication (excluding the high availability features provided as an additional layer by Redis Cluster or Redis Sentinel) there is a leader follower (master-replica) replication that is simple to use and configure. It allows replica Redis instances to be exact copies of master instances. The replica will automatically reconnect to the master every time the link breaks, and will attempt to be an exact copy of it regardless of what happens to the master.

This system works using three main mechanisms:

  1. When a master and a replica instances are well-connected, the master keeps the replica updated by sending a stream of commands to the replica to replicate the effects on the dataset happening in the master side due to: client writes, keys expired or evicted, any other action changing the master dataset.
  2. When the link between the master and the replica breaks, for network issues or because a timeout is sensed in the master or the replica, the replica reconnects and attempts to proceed with a partial resynchronization: it means that it will try to just obtain the part of the stream of commands it missed during the disconnection.
  3. When a partial resynchronization is not possible, the replica will ask for a full resynchronization. This will involve a more complex process in which the master needs to create a snapshot of all its data, send it to the replica, and then continue sending the stream of commands as the dataset changes.

Redis uses by default asynchronous replication, which being low latency and high performance, is the natural replication mode for the vast majority of Redis use cases. However, Redis replicas asynchronously acknowledge the amount of data they received periodically with the master. So the master does not wait every time for a command to be processed by the replicas, however it knows, if needed, what replica already processed what command. This allows having optional synchronous replication.

Synchronous replication of certain data can be requested by the clients using the WAIT command. However WAIT is only able to ensure there are the specified number of acknowledged copies in the other Redis instances, it does not turn a set of Redis instances into a CP system with strong consistency: acknowledged writes can still be lost during a failover, depending on the exact configuration of the Redis persistence. However with WAIT the probability of losing a write after a failure event is greatly reduced to certain hard to trigger failure modes.

You can check the Redis Sentinel or Redis Cluster documentation for more information about high availability and failover. The rest of this document mainly describes the basic characteristics of Redis basic replication.

Important facts about Redis replication

  • Redis uses asynchronous replication, with asynchronous replica-to-master acknowledges of the amount of data processed.
  • A master can have multiple replicas.
  • Replicas are able to accept connections from other replicas. Aside from connecting a number of replicas to the same master, replicas can also be connected to other replicas in a cascading-like structure. Since Redis 4.0, all the sub-replicas will receive exactly the same replication stream from the master.
  • Redis replication is non-blocking on the master side. This means that the master will continue to handle queries when one or more replicas perform the initial synchronization or a partial resynchronization.
  • Replication is also largely non-blocking on the replica side. While the replica is performing the initial synchronization, it can handle queries using the old version of the dataset, assuming you configured Redis to do so in redis.conf. Otherwise, you can configure Redis replicas to return an error to clients if the replication stream is down. However, after the initial sync, the old dataset must be deleted and the new one must be loaded. The replica will block incoming connections during this brief window (that can be as long as many seconds for very large datasets). Since Redis 4.0 you can configure Redis so that the deletion of the old data set happens in a different thread, however loading the new initial dataset will still happen in the main thread and block the replica.
  • Replication can be used both for scalability, to have multiple replicas for read-only queries (for example, slow O(N) operations can be offloaded to replicas), or simply for improving data safety and high availability.
  • You can use replication to avoid the cost of having the master writing the full dataset to disk: a typical technique involves configuring your master redis.conf to avoid persisting to disk at all, then connect a replica configured to save from time to time, or with AOF enabled. However, this setup must be handled with care, since a restarting master will start with an empty dataset: if the replica tries to sync with it, the replica will be emptied as well.

Safety of replication when master has persistence turned off

In setups where Redis replication is used, it is strongly advised to have persistence turned on in the master and in the replicas. When this is not possible, for example because of latency concerns due to very slow disks, instances should be configured to avoid restarting automatically after a reboot.

To better understand why masters with persistence turned off configured to auto restart are dangerous, check the following failure mode where data is wiped from the master and all its replicas:

  1. We have a setup with node A acting as master, with persistence turned down, and nodes B and C replicating from node A.
  2. Node A crashes, however it has some auto-restart system, that restarts the process. However since persistence is turned off, the node restarts with an empty data set.
  3. Nodes B and C will replicate from node A, which is empty, so they'll effectively destroy their copy of the data.

When Redis Sentinel is used for high availability, also turning off persistence on the master, together with auto restart of the process, is dangerous. For example the master can restart fast enough for Sentinel to not detect a failure, so that the failure mode described above happens.

Every time data safety is important, and replication is used with master configured without persistence, auto restart of instances should be disabled.

How Redis replication works

Every Redis master has a replication ID: it is a large pseudo random string that marks a given story of the dataset. Each master also takes an offset that increments for every byte of replication stream that it is produced to be sent to replicas, to update the state of the replicas with the new changes modifying the dataset. The replication offset is incremented even if no replica is actually connected, so basically every given pair of:

Replication ID, offset

Identifies an exact version of the dataset of a master.

When replicas connect to masters, they use the PSYNC command to send their old master replication ID and the offsets they processed so far. This way the master can send just the incremental part needed. However if there is not enough backlog in the master buffers, or if the replica is referring to an history (replication ID) which is no longer known, than a full resynchronization happens: in this case the replica will get a full copy of the dataset, from scratch.

This is how a full synchronization works in more details:

The master starts a background saving process to produce an RDB file. At the same time it starts to buffer all new write commands received from the clients. When the background saving is complete, the master transfers the database file to the replica, which saves it on disk, and then loads it into memory. The master will then send all buffered commands to the replica. This is done as a stream of commands and is in the same format of the Redis protocol itself.

You can try it yourself via telnet. Connect to the Redis port while the server is doing some work and issue the SYNC command. You'll see a bulk transfer and then every command received by the master will be re-issued in the telnet session. Actually SYNC is an old protocol no longer used by newer Redis instances, but is still there for backward compatibility: it does not allow partial resynchronizations, so now PSYNC is used instead.

As already said, replicas are able to automatically reconnect when the master-replica link goes down for some reason. If the master receives multiple concurrent replica synchronization requests, it performs a single background save in to serve all of them.

Replication ID explained

In the previous section we said that if two instances have the same replication ID and replication offset, they have exactly the same data. However it is useful to understand what exactly is the replication ID, and why instances have actually two replication IDs the main ID and the secondary ID.

A replication ID basically marks a given history of the data set. Every time an instance restarts from scratch as a master, or a replica is promoted to master, a new replication ID is generated for this instance. The replicas connected to a master will inherit its replication ID after the handshake. So two instances with the same ID are related by the fact that they hold the same data, but potentially at a different time. It is the offset that works as a logical time to understand, for a given history (replication ID) who holds the most updated data set.

For instance, if two instances A and B have the same replication ID, but one with offset 1000 and one with offset 1023, it means that the first lacks certain commands applied to the data set. It also means that A, by applying just a few commands, may reach exactly the same state of B.

The reason why Redis instances have two replication IDs is because of replicas that are promoted to masters. After a failover, the promoted replica requires to still remember what was its past replication ID, because such replication ID was the one of the former master. In this way, when other replicas will sync with the new master, they will try to perform a partial resynchronization using the old master replication ID. This will work as expected, because when the replica is promoted to master it sets its secondary ID to its main ID, remembering what was the offset when this ID switch happened. Later it will select a new random replication ID, because a new history begins. When handling the new replicas connecting, the master will match their IDs and offsets both with the current ID and the secondary ID (up to a given offset, for safety). In short this means that after a failover, replicas connecting to the newly promoted master don't have to perform a full sync.

In case you wonder why a replica promoted to master needs to change its replication ID after a failover: it is possible that the old master is still working as a master because of some network partition: retaining the same replication ID would violate the fact that the same ID and same offset of any two random instances mean they have the same data set.

Diskless replication

Normally a full resynchronization requires creating an RDB file on disk, then reloading the same RDB from disk to feed the replicas with the data.

With slow disks this can be a very stressing operation for the master. Redis version 2.8.18 is the first version to have support for diskless replication. In this setup the child process directly sends the RDB over the wire to replicas, without using the disk as intermediate storage.

Configuration

To configure basic Redis replication is trivial: just add the following line to the replica configuration file:

replicaof 192.168.1.1 6379

Of course you need to replace 192.168.1.1 6379 with your master IP address (or hostname) and port. Alternatively, you can call the REPLICAOF command and the master host will start a sync with the replica.

There are also a few parameters for tuning the replication backlog taken in memory by the master to perform the partial resynchronization. See the example redis.conf shipped with the Redis distribution for more information.

Diskless replication can be enabled using the repl-diskless-sync configuration parameter. The delay to start the transfer to wait for more replicas to arrive after the first one is controlled by the repl-diskless-sync-delay parameter. Please refer to the example redis.conf file in the Redis distribution for more details.

Read-only replica

Since Redis 2.6, replicas support a read-only mode that is enabled by default. This behavior is controlled by the replica-read-only option in the redis.conf file, and can be enabled and disabled at runtime using CONFIG SET.

Read-only replicas will reject all write commands, so that it is not possible to write to a replica because of a mistake. This does not mean that the feature is intended to expose a replica instance to the internet or more generally to a network where untrusted clients exist, because administrative commands like DEBUG or CONFIG are still enabled. The Security page describes how to secure a Redis instance.

You may wonder why it is possible to revert the read-only setting and have replica instances that can be targeted by write operations. The answer is that writable replicas exist only for historical reasons. Using writable replicas can result in inconsistency between the master and the replica, so it is not recommended to use writable replicas. To understand in which situations this can be a problem, we need to understand how replication works. Changes on the master is replicated by propagating regular Redis commands to the replica. When a key expires on the master, this is propagated as a DEL command. If a key which exists on the master but is deleted, expired or has a different type on the replica compared to the master will react differently to commands like DEL, INCR or RPOP propagated from the master than intended. The propagated command may fail on the replica or result in a different outcome. To minimize the risks (if you insist on using writable replicas) we suggest you follow these recommendations:

  • Don't write to keys in a writable replica that are also used on the master. (This can be hard to guarantee if you don't have control over all the clients that write to the master.)

  • Don't configure an instance as a writable replica as an intermediary step when upgrading a set of instances in a running system. In general, don't configure an instance as a writable replica if it can ever be promoted to a master if you want to guarantee data consistency.

Historically, there were some use cases that were consider legitimate for writable replicas. As of version 7.0, these use cases are now all obsolete and the same can be achieved by other means. For example:

  • Computing slow Set or Sorted set operations and storing the result in temporary local keys using commands like SUNIONSTORE and ZINTERSTORE. Instead, use commands that return the result without storing it, such as SUNION and ZINTER.

  • Using the SORT command (which is not considered a read-only command because of the optional STORE option and therefore cannot be used on a read-only replica). Instead, use SORT_RO, which is a read-only command.

  • Using EVAL and EVALSHA are also not considered read-only commands, because the Lua script may call write commands. Instead, use EVAL_RO and EVALSHA_RO where the Lua script can only call read-only commands.

While writes to a replica will be discarded if the replica and the master resync or if the replica is restarted, there is no guarantee that they will sync automatically.

Before version 4.0, writable replicas were incapable of expiring keys with a time to live set. This means that if you use EXPIRE or other commands that set a maximum TTL for a key, the key will leak, and while you may no longer see it while accessing it with read commands, you will see it in the count of keys and it will still use memory. Redis 4.0 RC3 and greater versions are able to evict keys with TTL as masters do, with the exceptions of keys written in DB numbers greater than 63 (but by default Redis instances only have 16 databases). Note though that even in versions greater than 4.0, using EXPIRE on a key that could ever exists on the master can cause inconsistency between the replica and the master.

Also note that since Redis 4.0 replica writes are only local, and are not propagated to sub-replicas attached to the instance. Sub-replicas instead will always receive the replication stream identical to the one sent by the top-level master to the intermediate replicas. So for example in the following setup:

A ---> B ---> C

Even if B is writable, C will not see B writes and will instead have identical dataset as the master instance A.

Setting a replica to authenticate to a master

If your master has a password via requirepass, it's trivial to configure the replica to use that password in all sync operations.

To do it on a running instance, use redis-cli and type:

config set masterauth <password>

To set it permanently, add this to your config file:

masterauth <password>

Allow writes only with N attached replicas

Starting with Redis 2.8, you can configure a Redis master to accept write queries only if at least N replicas are currently connected to the master.

However, because Redis uses asynchronous replication it is not possible to ensure the replica actually received a given write, so there is always a window for data loss.

This is how the feature works:

  • Redis replicas ping the master every second, acknowledging the amount of replication stream processed.
  • Redis masters will remember the last time it received a ping from every replica.
  • The user can configure a minimum number of replicas that have a lag not greater than a maximum number of seconds.

If there are at least N replicas, with a lag less than M seconds, then the write will be accepted.

You may think of it as a best effort data safety mechanism, where consistency is not ensured for a given write, but at least the time window for data loss is restricted to a given number of seconds. In general bound data loss is better than unbound one.

If the conditions are not met, the master will instead reply with an error and the write will not be accepted.

There are two configuration parameters for this feature:

  • min-replicas-to-write <number of replicas>
  • min-replicas-max-lag <number of seconds>

For more information, please check the example redis.conf file shipped with the Redis source distribution.

How Redis replication deals with expires on keys

Redis expires allow keys to have a limited time to live (TTL). Such a feature depends on the ability of an instance to count the time, however Redis replicas correctly replicate keys with expires, even when such keys are altered using Lua scripts.

To implement such a feature Redis cannot rely on the ability of the master and replica to have syncd clocks, since this is a problem that cannot be solved and would result in race conditions and diverging data sets, so Redis uses three main techniques to make the replication of expired keys able to work:

  1. Replicas don't expire keys, instead they wait for masters to expire the keys. When a master expires a key (or evict it because of LRU), it synthesizes a DEL command which is transmitted to all the replicas.
  2. However because of master-driven expire, sometimes replicas may still have in memory keys that are already logically expired, since the master was not able to provide the DEL command in time. In to deal with that the replica uses its logical clock to report that a key does not exist only for read operations that don't violate the consistency of the data set (as new commands from the master will arrive). In this way replicas avoid reporting logically expired keys are still existing. In practical terms, an HTML fragments cache that uses replicas to scale will avoid returning items that are already older than the desired time to live.
  3. During Lua scripts executions no key expiries are performed. As a Lua script runs, conceptually the time in the master is frozen, so that a given key will either exist or not for all the time the script runs. This prevents keys expiring in the middle of a script, and is needed to send the same script to the replica in a way that is guaranteed to have the same effects in the data set.

Once a replica is promoted to a master it will start to expire keys independently, and will not require any help from its old master.

Configuring replication in Docker and NAT

When Docker, or other types of containers using port forwarding, or Network Address Translation is used, Redis replication needs some extra care, especially when using Redis Sentinel or other systems where the master INFO or ROLE commands output is scanned to discover replicas' addresses.

The problem is that the ROLE command, and the replication section of the INFO output, when issued into a master instance, will show replicas as having the IP address they use to connect to the master, which, in environments using NAT may be different compared to the logical address of the replica instance (the one that clients should use to connect to replicas).

Similarly the replicas will be listed with the listening port configured into redis.conf, that may be different from the forwarded port in case the port is remapped.

To fix both issues, it is possible, since Redis 3.2.2, to force a replica to announce an arbitrary pair of IP and port to the master. The two configurations directives to use are:

replica-announce-ip 5.5.5.5
replica-announce-port 1234

And are documented in the example redis.conf of recent Redis distributions.

The INFO and ROLE command

There are two Redis commands that provide a lot of information on the current replication parameters of master and replica instances. One is INFO. If the command is called with the replication argument as INFO replication only information relevant to the replication are displayed. Another more computer-friendly command is ROLE, that provides the replication status of masters and replicas together with their replication offsets, list of connected replicas and so forth.

Partial sync after restarts and failovers

Since Redis 4.0, when an instance is promoted to master after a failover, it will be still able to perform a partial resynchronization with the replicas of the old master. To do so, the replica remembers the old replication ID and offset of its former master, so can provide part of the backlog to the connecting replicas even if they ask for the old replication ID.

However the new replication ID of the promoted replica will be different, since it constitutes a different history of the data set. For example, the master can return available and can continue accepting writes for some time, so using the same replication ID in the promoted replica would violate the rule that a of replication ID and offset pair identifies only a single data set.

Moreover, replicas - when powered off gently and restarted - are able to store in the RDB file the information needed to resync with their master. This is useful in case of upgrades. When this is needed, it is better to use the SHUTDOWN command in order to perform a save & quit operation on the replica.

It is not possible to partially sync a replica that restarted via the AOF file. However the instance may be turned to RDB persistence before shutting down it, than can be restarted, and finally AOF can be enabled again.

Maxmemory on replicas

By default, a replica will ignore maxmemory (unless it is promoted to master after a failover or manually). It means that the eviction of keys will be handled by the master, sending the DEL commands to the replica as keys evict in the master side.

This behavior ensures that masters and replicas stay consistent, which is usually what you want. However, if your replica is writable, or you want the replica to have a different memory setting, and you are sure all the writes performed to the replica are idempotent, then you may change this default (but be sure to understand what you are doing).

Note that since the replica by default does not evict, it may end up using more memory than what is set via maxmemory (since there are certain buffers that may be larger on the replica, or data structures may sometimes take more memory and so forth). Make sure you monitor your replicas, and make sure they have enough memory to never hit a real out-of-memory condition before the master hits the configured maxmemory setting.

To change this behavior, you can allow a replica to not ignore the maxmemory. The configuration directives to use is:

replica-ignore-maxmemory no

14 - Scaling with Redis Cluster

Horizontal scaling with Redis Cluster

Redis scales horizontally with a deployment topology called Redis Cluster.

This document is a gentle introduction to Redis Cluster, teaching you how to set up, test, and operate Redis Cluster in production.

This tutorial also described the availability and consistency characteristics of Redis Cluster from the point of view of the end user, stated in a simple-to-understand way.

If you plan to run a production Redis Cluster deployment, or want to better understand how Redis Cluster works internally, consult the Redis Cluster specification.

Redis Cluster 101

Redis Cluster provides a way to run a Redis installation where data is automatically sharded across multiple Redis nodes.

Redis Cluster also provides some degree of availability during partitions, that is in practical terms the ability to continue the operations when some nodes fail or are not able to communicate. However the cluster stops to operate in the event of larger failures (for example when the majority of masters are unavailable).

So in practical terms, what do you get with Redis Cluster?

  • The ability to automatically split your dataset among multiple nodes.
  • The ability to continue operations when a subset of the nodes are experiencing failures or are unable to communicate with the rest of the cluster.

Redis Cluster TCP ports

Every Redis Cluster node requires two open TCP connections: a Redis TCP port used to serve clients, e.g., 6379, and second port known as the cluster bus port. By default, the cluster bus port is set by adding 10000 to the data port (e.g., 16379); however, you can override this in the cluster-port config.

This second port is used for the cluster bus, which is a node-to-node communication channel using a binary protocol. The cluster bus is used by nodes for failure detection, configuration update, failover authorization, and so forth. Clients should never try to communicate with the cluster bus port, but always with the normal Redis command port, however make sure you open both ports in your firewall, otherwise Redis cluster nodes will be unable to communicate.

Note that for a Redis Cluster to work properly you need, for each node:

  1. The normal client communication port (usually 6379) used to communicate with clients to be open to all the clients that need to reach the cluster, plus all the other cluster nodes (that use the client port for keys migrations).
  2. The cluster bus port must be reachable from all the other cluster nodes.

If you don't open both TCP ports, your cluster will not work as expected.

The cluster bus uses a different, binary protocol, for node to node data exchange, which is more suited to exchange information between nodes using little bandwidth and processing time.

Redis Cluster and Docker

Currently, Redis Cluster does not support NATted environments and in general environments where IP addresses or TCP ports are remapped.

Docker uses a technique called port mapping: programs running inside Docker containers may be exposed with a different port compared to the one the program believes to be using. This is useful for running multiple containers using the same ports, at the same time, in the same server.

To make Docker compatible with Redis Cluster, you need to use Docker's host networking mode. Please see the --net=host option in the Docker documentation for more information.

Redis Cluster data sharding

Redis Cluster does not use consistent hashing, but a different form of sharding where every key is conceptually part of what we call a hash slot.

There are 16384 hash slots in Redis Cluster, and to compute the hash slot for a given key, we simply take the CRC16 of the key modulo 16384.

Every node in a Redis Cluster is responsible for a subset of the hash slots, so, for example, you may have a cluster with 3 nodes, where:

  • Node A contains hash slots from 0 to 5500.
  • Node B contains hash slots from 5501 to 11000.
  • Node C contains hash slots from 11001 to 16383.

This makes it easy to add and remove cluster nodes. For example, if I want to add a new node D, I need to move some hash slots from nodes A, B, C to D. Similarly, if I want to remove node A from the cluster, I can just move the hash slots served by A to B and C. Once node A is empty, I can remove it from the cluster completely.

Moving hash slots from a node to another does not require stopping any operations; therefore, adding and removing nodes, or changing the percentage of hash slots held by a node, requires no downtime.

Redis Cluster supports multiple key operations as long as all of the keys involved in a single command execution (or whole transaction, or Lua script execution) belong to the same hash slot. The user can force multiple keys to be part of the same hash slot by using a feature called hash tags.

Hash tags are documented in the Redis Cluster specification, but the gist is that if there is a substring between {} brackets in a key, only what is inside the string is hashed. For example, they keys user:{123}:profile and user:{123}:account are guaranteed to be in the same hash slot because they share the same hash tag. As a result, you can operate on these two keys in the same multi-key operation.

Redis Cluster master-replica model

To remain available when a subset of master nodes are failing or are not able to communicate with the majority of nodes, Redis Cluster uses a master-replica model where every hash slot has from 1 (the master itself) to N replicas (N-1 additional replica nodes).

In our example cluster with nodes A, B, C, if node B fails the cluster is not able to continue, since we no longer have a way to serve hash slots in the range 5501-11000.

However, when the cluster is created (or at a later time), we add a replica node to every master, so that the final cluster is composed of A, B, C that are master nodes, and A1, B1, C1 that are replica nodes. This way, the system can continue if node B fails.

Node B1 replicates B, and B fails, the cluster will promote node B1 as the new master and will continue to operate correctly.

However, note that if nodes B and B1 fail at the same time, Redis Cluster will not be able to continue to operate.

Redis Cluster consistency guarantees

Redis Cluster does not guarantee strong consistency. In practical terms this means that under certain conditions it is possible that Redis Cluster will lose writes that were acknowledged by the system to the client.

The first reason why Redis Cluster can lose writes is because it uses asynchronous replication. This means that during writes the following happens:

  • Your client writes to the master B.
  • The master B replies OK to your client.
  • The master B propagates the write to its replicas B1, B2 and B3.

As you can see, B does not wait for an acknowledgement from B1, B2, B3 before replying to the client, since this would be a prohibitive latency penalty for Redis, so if your client writes something, B acknowledges the write, but crashes before being able to send the write to its replicas, one of the replicas (that did not receive the write) can be promoted to master, losing the write forever.

This is very similar to what happens with most databases that are configured to flush data to disk every second, so it is a scenario you are already able to reason about because of past experiences with traditional database systems not involving distributed systems. Similarly you can improve consistency by forcing the database to flush data to disk before replying to the client, but this usually results in prohibitively low performance. That would be the equivalent of synchronous replication in the case of Redis Cluster.

Basically, there is a trade-off to be made between performance and consistency.

Redis Cluster has support for synchronous writes when absolutely needed, implemented via the WAIT command. This makes losing writes a lot less likely. However, note that Redis Cluster does not implement strong consistency even when synchronous replication is used: it is always possible, under more complex failure scenarios, that a replica that was not able to receive the write will be elected as master.

There is another notable scenario where Redis Cluster will lose writes, that happens during a network partition where a client is isolated with a minority of instances including at least a master.

Take as an example our 6 nodes cluster composed of A, B, C, A1, B1, C1, with 3 masters and 3 replicas. There is also a client, that we will call Z1.

After a partition occurs, it is possible that in one side of the partition we have A, C, A1, B1, C1, and in the other side we have B and Z1.

Z1 is still able to write to B, which will accept its writes. If the partition heals in a very short time, the cluster will continue normally. However, if the partition lasts enough time for B1 to be promoted to master on the majority side of the partition, the writes that Z1 has sent to B in the meantime will be lost.

Note that there is a maximum window to the amount of writes Z1 will be able to send to B: if enough time has elapsed for the majority side of the partition to elect a replica as master, every master node in the minority side will have stopped accepting writes.

This amount of time is a very important configuration directive of Redis Cluster, and is called the node timeout.

After node timeout has elapsed, a master node is considered to be failing, and can be replaced by one of its replicas. Similarly, after node timeout has elapsed without a master node to be able to sense the majority of the other master nodes, it enters an error state and stops accepting writes.

Redis Cluster configuration parameters

We are about to create an example cluster deployment. Before we continue, let's introduce the configuration parameters that Redis Cluster introduces in the redis.conf file.

  • cluster-enabled <yes/no>: If yes, enables Redis Cluster support in a specific Redis instance. Otherwise the instance starts as a standalone instance as usual.
  • cluster-config-file <filename>: Note that despite the name of this option, this is not a user editable configuration file, but the file where a Redis Cluster node automatically persists the cluster configuration (the state, basically) every time there is a change, in order to be able to re-read it at startup. The file lists things like the other nodes in the cluster, their state, persistent variables, and so forth. Often this file is rewritten and flushed on disk as a result of some message reception.
  • cluster-node-timeout <milliseconds>: The maximum amount of time a Redis Cluster node can be unavailable, without it being considered as failing. If a master node is not reachable for more than the specified amount of time, it will be failed over by its replicas. This parameter controls other important things in Redis Cluster. Notably, every node that can't reach the majority of master nodes for the specified amount of time, will stop accepting queries.
  • cluster-slave-validity-factor <factor>: If set to zero, a replica will always consider itself valid, and will therefore always try to failover a master, regardless of the amount of time the link between the master and the replica remained disconnected. If the value is positive, a maximum disconnection time is calculated as the node timeout value multiplied by the factor provided with this option, and if the node is a replica, it will not try to start a failover if the master link was disconnected for more than the specified amount of time. For example, if the node timeout is set to 5 seconds and the validity factor is set to 10, a replica disconnected from the master for more than 50 seconds will not try to failover its master. Note that any value different than zero may result in Redis Cluster being unavailable after a master failure if there is no replica that is able to failover it. In that case the cluster will return to being available only when the original master rejoins the cluster.
  • cluster-migration-barrier <count>: Minimum number of replicas a master will remain connected with, for another replica to migrate to a master which is no longer covered by any replica. See the appropriate section about replica migration in this tutorial for more information.
  • cluster-require-full-coverage <yes/no>: If this is set to yes, as it is by default, the cluster stops accepting writes if some percentage of the key space is not covered by any node. If the option is set to no, the cluster will still serve queries even if only requests about a subset of keys can be processed.
  • cluster-allow-reads-when-down <yes/no>: If this is set to no, as it is by default, a node in a Redis Cluster will stop serving all traffic when the cluster is marked as failed, either when a node can't reach a quorum of masters or when full coverage is not met. This prevents reading potentially inconsistent data from a node that is unaware of changes in the cluster. This option can be set to yes to allow reads from a node during the fail state, which is useful for applications that want to prioritize read availability but still want to prevent inconsistent writes. It can also be used for when using Redis Cluster with only one or two shards, as it allows the nodes to continue serving writes when a master fails but automatic failover is impossible.

Creating and using a Redis Cluster

To create a cluster, the first thing we need is to have a few empty Redis instances running in cluster mode.

At minimum, you'll see to set the following directives in the redis.conf file:

port 7000
cluster-enabled yes
cluster-config-file nodes.conf
cluster-node-timeout 5000
appendonly yes

Setting the cluster-enabled directive to yes enables cluster mode. Every instance also contains the path of a file where the configuration for this node is stored, which by default is nodes.conf. This file is never touched by humans; it is simply generated at startup by the Redis Cluster instances, and updated every time it is needed.

Note that the minimal cluster that works as expected must contain at least three master nodes. For deployment, we strongly recommend a six-node cluster, with three masters and three replicas.

You can test this locally by creating the following directories named after the port number of the instance you'll run inside any given directory.

For example:

mkdir cluster-test
cd cluster-test
mkdir 7000 7001 7002 7003 7004 7005

Create a redis.conf file inside each of the directories, from 7000 to 7005. As a template for your configuration file just use the small example above, but make sure to replace the port number 7000 with the right port number according to the directory name.

You can start each instance as follows, each running in a separate terminal tab:

cd 7000
redis-server ./redis.conf

You'll see from the logs that every node assigns itself a new ID:

[82462] 26 Nov 11:56:55.329 * No cluster configuration found, I'm 97a3a64667477371c4479320d683e4c8db5858b1

This ID will be used forever by this specific instance in order for the instance to have a unique name in the context of the cluster. Every node remembers every other node using this IDs, and not by IP or port. IP addresses and ports may change, but the unique node identifier will never change for all the life of the node. We call this identifier simply Node ID.

Initializing the cluster

Now that we have a number of instances running, we need to create our cluster by writing some meaningful configuration to the nodes.

To create the cluster, run:

redis-cli --cluster create 127.0.0.1:7000 127.0.0.1:7001 \
127.0.0.1:7002 127.0.0.1:7003 127.0.0.1:7004 127.0.0.1:7005 \
--cluster-replicas 1

The command used here is create, since we want to create a new cluster. The option --cluster-replicas 1 means that we want a replica for every master created.

The other arguments are the list of addresses of the instances I want to use to create the new cluster.

redis-cli will propose a configuration. Accept the proposed configuration by typing yes. The cluster will be configured and joined, which means that instances will be bootstrapped into talking with each other. Finally, if everything has gone well, you'll see a message like this:

[OK] All 16384 slots covered

This means that there is at least one master instance serving each of the 16384 available slots.

Creating a Redis Cluster using the create-cluster script

If you don't want to create a Redis Cluster by configuring and executing individual instances manually as explained above, there is a much simpler system (but you'll not learn the same amount of operational details).

Find the utils/create-cluster directory in the Redis distribution. There is a script called create-cluster inside (same name as the directory it is contained into), it's a simple bash script. In order to start a 6 nodes cluster with 3 masters and 3 replicas just type the following commands:

  1. create-cluster start
  2. create-cluster create

Reply to yes in step 2 when the redis-cli utility wants you to accept the cluster layout.

You can now interact with the cluster, the first node will start at port 30001 by default. When you are done, stop the cluster with:

  1. create-cluster stop

Please read the README inside this directory for more information on how to run the script.

Interacting with the cluster


To connect to Redis Cluster, you'll need a cluster-aware Redis client. See the documentation for your client of choice to determine its cluster support.

You can also test your Redis Cluster using the redis-cli command line utility:

$ redis-cli -c -p 7000
redis 127.0.0.1:7000> set foo bar
-> Redirected to slot [12182] located at 127.0.0.1:7002
OK
redis 127.0.0.1:7002> set hello world
-> Redirected to slot [866] located at 127.0.0.1:7000
OK
redis 127.0.0.1:7000> get foo
-> Redirected to slot [12182] located at 127.0.0.1:7002
"bar"
redis 127.0.0.1:7002> get hello
-> Redirected to slot [866] located at 127.0.0.1:7000
"world"

Note: if you created the cluster using the script, your nodes may listen on different ports, starting from 30001 by default.

The redis-cli cluster support is very basic, so it always uses the fact that Redis Cluster nodes are able to redirect a client to the right node. A serious client is able to do better than that, and cache the map between hash slots and nodes addresses, to directly use the right connection to the right node. The map is refreshed only when something changed in the cluster configuration, for example after a failover or after the system administrator changed the cluster layout by adding or removing nodes.

Writing an example app with redis-rb-cluster

Before going forward showing how to operate the Redis Cluster, doing things like a failover, or a resharding, we need to create some example application or at least to be able to understand the semantics of a simple Redis Cluster client interaction.

In this way we can run an example and at the same time try to make nodes failing, or start a resharding, to see how Redis Cluster behaves under real world conditions. It is not very helpful to see what happens while nobody is writing to the cluster.

This section explains some basic usage of redis-rb-cluster showing two examples. The first is the following, and is the example.rb file inside the redis-rb-cluster distribution:

   1  require './cluster'
   2
   3  if ARGV.length != 2
   4      startup_nodes = [
   5          {:host => "127.0.0.1", :port => 7000},
   6          {:host => "127.0.0.1", :port => 7001}
   7      ]
   8  else
   9      startup_nodes = [
  10          {:host => ARGV[0], :port => ARGV[1].to_i}
  11      ]
  12  end
  13
  14  rc = RedisCluster.new(startup_nodes,32,:timeout => 0.1)
  15
  16  last = false
  17
  18  while not last
  19      begin
  20          last = rc.get("__last__")
  21          last = 0 if !last
  22      rescue => e
  23          puts "error #{e.to_s}"
  24          sleep 1
  25      end
  26  end
  27
  28  ((last.to_i+1)..1000000000).each{|x|
  29      begin
  30          rc.set("foo#{x}",x)
  31          puts rc.get("foo#{x}")
  32          rc.set("__last__",x)
  33      rescue => e
  34          puts "error #{e.to_s}"
  35      end
  36      sleep 0.1
  37  }

The application does a very simple thing, it sets keys in the form foo<number> to number, one after the other. So if you run the program the result is the following stream of commands:

  • SET foo0 0
  • SET foo1 1
  • SET foo2 2
  • And so forth...

The program looks more complex than it should usually as it is designed to show errors on the screen instead of exiting with an exception, so every operation performed with the cluster is wrapped by begin rescue blocks.

The line 14 is the first interesting line in the program. It creates the Redis Cluster object, using as argument a list of startup nodes, the maximum number of connections this object is allowed to take against different nodes, and finally the timeout after a given operation is considered to be failed.

The startup nodes don't need to be all the nodes of the cluster. The important thing is that at least one node is reachable. Also note that redis-rb-cluster updates this list of startup nodes as soon as it is able to connect with the first node. You should expect such a behavior with any other serious client.

Now that we have the Redis Cluster object instance stored in the rc variable, we are ready to use the object like if it was a normal Redis object instance.

This is exactly what happens in line 18 to 26: when we restart the example we don't want to start again with foo0, so we store the counter inside Redis itself. The code above is designed to read this counter, or if the counter does not exist, to assign it the value of zero.

However note how it is a while loop, as we want to try again and again even if the cluster is down and is returning errors. Normal applications don't need to be so careful.

Lines between 28 and 37 start the main loop where the keys are set or an error is displayed.

Note the sleep call at the end of the loop. In your tests you can remove the sleep if you want to write to the cluster as fast as possible (relatively to the fact that this is a busy loop without real parallelism of course, so you'll get the usually 10k ops/second in the best of the conditions).

Normally writes are slowed down in order for the example application to be easier to follow by humans.

Starting the application produces the following output:

ruby ./example.rb
1
2
3
4
5
6
7
8
9
^C (I stopped the program here)

This is not a very interesting program and we'll use a better one in a moment but we can already see what happens during a resharding when the program is running.

Resharding the cluster

Now we are ready to try a cluster resharding. To do this please keep the example.rb program running, so that you can see if there is some impact on the program running. Also you may want to comment the sleep call in order to have some more serious write load during resharding.

Resharding basically means to move hash slots from a set of nodes to another set of nodes, and like cluster creation it is accomplished using the redis-cli utility.

To start a resharding just type:

redis-cli --cluster reshard 127.0.0.1:7000

You only need to specify a single node, redis-cli will find the other nodes automatically.

Currently redis-cli is only able to reshard with the administrator support, you can't just say move 5% of slots from this node to the other one (but this is pretty trivial to implement). So it starts with questions. The first is how much of a resharding do you want to do:

How many slots do you want to move (from 1 to 16384)?

We can try to reshard 1000 hash slots, that should already contain a non trivial amount of keys if the example is still running without the sleep call.

Then redis-cli needs to know what is the target of the resharding, that is, the node that will receive the hash slots. I'll use the first master node, that is, 127.0.0.1:7000, but I need to specify the Node ID of the instance. This was already printed in a list by redis-cli, but I can always find the ID of a node with the following command if I need:

$ redis-cli -p 7000 cluster nodes | grep myself
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5460

Ok so my target node is 97a3a64667477371c4479320d683e4c8db5858b1.

Now you'll get asked from what nodes you want to take those keys. I'll just type all in order to take a bit of hash slots from all the other master nodes.

After the final confirmation you'll see a message for every slot that redis-cli is going to move from a node to another, and a dot will be printed for every actual key moved from one side to the other.

While the resharding is in progress you should be able to see your example program running unaffected. You can stop and restart it multiple times during the resharding if you want.

At the end of the resharding, you can test the health of the cluster with the following command:

redis-cli --cluster check 127.0.0.1:7000

All the slots will be covered as usual, but this time the master at 127.0.0.1:7000 will have more hash slots, something around 6461.

Scripting a resharding operation

Resharding can be performed automatically without the need to manually enter the parameters in an interactive way. This is possible using a command line like the following:

redis-cli --cluster reshard <host>:<port> --cluster-from <node-id> --cluster-to <node-id> --cluster-slots <number of slots> --cluster-yes

This allows to build some automatism if you are likely to reshard often, however currently there is no way for redis-cli to automatically rebalance the cluster checking the distribution of keys across the cluster nodes and intelligently moving slots as needed. This feature will be added in the future.

The --cluster-yes option instructs the cluster manager to automatically answer "yes" to the command's prompts, allowing it to run in a non-interactive mode. Note that this option can also be activated by setting the REDISCLI_CLUSTER_YES environment variable.

A more interesting example application

The example application we wrote early is not very good. It writes to the cluster in a simple way without even checking if what was written is the right thing.

From our point of view the cluster receiving the writes could just always write the key foo to 42 to every operation, and we would not notice at all.

So in the redis-rb-cluster repository, there is a more interesting application that is called consistency-test.rb. It uses a set of counters, by default 1000, and sends INCR commands in order to increment the counters.

However instead of just writing, the application does two additional things:

  • When a counter is updated using INCR, the application remembers the write.
  • It also reads a random counter before every write, and check if the value is what we expected it to be, comparing it with the value it has in memory.

What this means is that this application is a simple consistency checker, and is able to tell you if the cluster lost some write, or if it accepted a write that we did not receive acknowledgment for. In the first case we'll see a counter having a value that is smaller than the one we remember, while in the second case the value will be greater.

Running the consistency-test application produces a line of output every second:

$ ruby consistency-test.rb
925 R (0 err) | 925 W (0 err) |
5030 R (0 err) | 5030 W (0 err) |
9261 R (0 err) | 9261 W (0 err) |
13517 R (0 err) | 13517 W (0 err) |
17780 R (0 err) | 17780 W (0 err) |
22025 R (0 err) | 22025 W (0 err) |
25818 R (0 err) | 25818 W (0 err) |

The line shows the number of Reads and Writes performed, and the number of errors (query not accepted because of errors since the system was not available).

If some inconsistency is found, new lines are added to the output. This is what happens, for example, if I reset a counter manually while the program is running:

$ redis-cli -h 127.0.0.1 -p 7000 set key_217 0
OK

(in the other tab I see...)

94774 R (0 err) | 94774 W (0 err) |
98821 R (0 err) | 98821 W (0 err) |
102886 R (0 err) | 102886 W (0 err) | 114 lost |
107046 R (0 err) | 107046 W (0 err) | 114 lost |

When I set the counter to 0 the real value was 114, so the program reports 114 lost writes (INCR commands that are not remembered by the cluster).

This program is much more interesting as a test case, so we'll use it to test the Redis Cluster failover.

Testing the failover

Note: during this test, you should take a tab open with the consistency test application running.

In order to trigger the failover, the simplest thing we can do (that is also the semantically simplest failure that can occur in a distributed system) is to crash a single process, in our case a single master.

We can identify a master and crash it with the following command:

$ redis-cli -p 7000 cluster nodes | grep master
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385482984082 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 master - 0 1385482983582 0 connected 11423-16383
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422

Ok, so 7000, 7001, and 7002 are masters. Let's crash node 7002 with the DEBUG SEGFAULT command:

$ redis-cli -p 7002 debug segfault
Error: Server closed the connection

Now we can look at the output of the consistency test to see what it reported.

18849 R (0 err) | 18849 W (0 err) |
23151 R (0 err) | 23151 W (0 err) |
27302 R (0 err) | 27302 W (0 err) |

... many error warnings here ...

29659 R (578 err) | 29660 W (577 err) |
33749 R (578 err) | 33750 W (577 err) |
37918 R (578 err) | 37919 W (577 err) |
42077 R (578 err) | 42078 W (577 err) |

As you can see during the failover the system was not able to accept 578 reads and 577 writes, however no inconsistency was created in the database. This may sound unexpected as in the first part of this tutorial we stated that Redis Cluster can lose writes during the failover because it uses asynchronous replication. What we did not say is that this is not very likely to happen because Redis sends the reply to the client, and the commands to replicate to the replicas, about at the same time, so there is a very small window to lose data. However the fact that it is hard to trigger does not mean that it is impossible, so this does not change the consistency guarantees provided by Redis cluster.

We can now check what is the cluster setup after the failover (note that in the meantime I restarted the crashed instance so that it rejoins the cluster as a replica):

$ redis-cli -p 7000 cluster nodes
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385503418521 0 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385503419023 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 :0 myself,master - 0 0 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385503419023 3 connected 11423-16383
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385503417005 0 connected 5960-10921
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385503418016 3 connected

Now the masters are running on ports 7000, 7001 and 7005. What was previously a master, that is the Redis instance running on port 7002, is now a replica of 7005.

The output of the CLUSTER NODES command may look intimidating, but it is actually pretty simple, and is composed of the following tokens:

  • Node ID
  • ip:port
  • flags: master, replica, myself, fail, ...
  • if it is a replica, the Node ID of the master
  • Time of the last pending PING still waiting for a reply.
  • Time of the last PONG received.
  • Configuration epoch for this node (see the Cluster specification).
  • Status of the link to this node.
  • Slots served...

Manual failover

Sometimes it is useful to force a failover without actually causing any problem on a master. For example in order to upgrade the Redis process of one of the master nodes it is a good idea to failover it in order to turn it into a replica with minimal impact on availability.

Manual failovers are supported by Redis Cluster using the CLUSTER FAILOVER command, that must be executed in one of the replicas of the master you want to failover.

Manual failovers are special and are safer compared to failovers resulting from actual master failures, since they occur in a way that avoid data loss in the process, by switching clients from the original master to the new master only when the system is sure that the new master processed all the replication stream from the old one.

This is what you see in the replica log when you perform a manual failover:

# Manual failover user request accepted.
# Received replication offset for paused master manual failover: 347540
# All master replication stream processed, manual failover can start.
# Start of election delayed for 0 milliseconds (rank #0, offset 347540).
# Starting a failover election for epoch 7545.
# Failover election won: I'm the new master.

Basically clients connected to the master we are failing over are stopped. At the same time the master sends its replication offset to the replica, that waits to reach the offset on its side. When the replication offset is reached, the failover starts, and the old master is informed about the configuration switch. When the clients are unblocked on the old master, they are redirected to the new master.

Note:

  • To promote a replica to master, it must first be known as a replica by a majority of the masters in the cluster. Otherwise, it cannot win the failover election. If the replica has just been added to the cluster (see Adding a new node as a replica below), you may need to wait a while before sending the CLUSTER FAILOVER command, to make sure the masters in cluster are aware of the new replica.

Adding a new node

Adding a new node is basically the process of adding an empty node and then moving some data into it, in case it is a new master, or telling it to setup as a replica of a known node, in case it is a replica.

We'll show both, starting with the addition of a new master instance.

In both cases the first step to perform is adding an empty node.

This is as simple as to start a new node in port 7006 (we already used from 7000 to 7005 for our existing 6 nodes) with the same configuration used for the other nodes, except for the port number, so what you should do in order to conform with the setup we used for the previous nodes:

  • Create a new tab in your terminal application.
  • Enter the cluster-test directory.
  • Create a directory named 7006.
  • Create a redis.conf file inside, similar to the one used for the other nodes but using 7006 as port number.
  • Finally start the server with ../redis-server ./redis.conf

At this point the server should be running.

Now we can use redis-cli as usual in order to add the node to the existing cluster.

redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000

As you can see I used the add-node command specifying the address of the new node as first argument, and the address of a random existing node in the cluster as second argument.

In practical terms redis-cli here did very little to help us, it just sent a CLUSTER MEET message to the node, something that is also possible to accomplish manually. However redis-cli also checks the state of the cluster before to operate, so it is a good idea to perform cluster operations always via redis-cli even when you know how the internals work.

Now we can connect to the new node to see if it really joined the cluster:

redis 127.0.0.1:7006> cluster nodes
3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 127.0.0.1:7001 master - 0 1385543178575 0 connected 5960-10921
3fc783611028b1707fd65345e763befb36454d73 127.0.0.1:7004 slave 3e3a6cb0d9a9a87168e266b0a0b24026c0aae3f0 0 1385543179583 0 connected
f093c80dde814da99c5cf72a7dd01590792b783b :0 myself,master - 0 0 0 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543178072 3 connected
a211e242fc6b22a9427fed61285e85892fa04e08 127.0.0.1:7003 slave 97a3a64667477371c4479320d683e4c8db5858b1 0 1385543178575 0 connected
97a3a64667477371c4479320d683e4c8db5858b1 127.0.0.1:7000 master - 0 1385543179080 0 connected 0-5959 10922-11422
3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 127.0.0.1:7005 master - 0 1385543177568 3 connected 11423-16383

Note that since this node is already connected to the cluster it is already able to redirect client queries correctly and is generally speaking part of the cluster. However it has two peculiarities compared to the other masters:

  • It holds no data as it has no assigned hash slots.
  • Because it is a master without assigned slots, it does not participate in the election process when a replica wants to become a master.

Now it is possible to assign hash slots to this node using the resharding feature of redis-cli. It is basically useless to show this as we already did in a previous section, there is no difference, it is just a resharding having as a target the empty node.

Adding a new node as a replica

Adding a new Replica can be performed in two ways. The obvious one is to use redis-cli again, but with the --cluster-slave option, like this:

redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-slave

Note that the command line here is exactly like the one we used to add a new master, so we are not specifying to which master we want to add the replica. In this case what happens is that redis-cli will add the new node as replica of a random master among the masters with fewer replicas.

However you can specify exactly what master you want to target with your new replica with the following command line:

redis-cli --cluster add-node 127.0.0.1:7006 127.0.0.1:7000 --cluster-slave --cluster-master-id 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

This way we assign the new replica to a specific master.

A more manual way to add a replica to a specific master is to add the new node as an empty master, and then turn it into a replica using the CLUSTER REPLICATE command. This also works if the node was added as a replica but you want to move it as a replica of a different master.

For example in order to add a replica for the node 127.0.0.1:7005 that is currently serving hash slots in the range 11423-16383, that has a Node ID 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e, all I need to do is to connect with the new node (already added as empty master) and send the command:

redis 127.0.0.1:7006> cluster replicate 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e

That's it. Now we have a new replica for this set of hash slots, and all the other nodes in the cluster already know (after a few seconds needed to update their config). We can verify with the following command:

$ redis-cli -p 7000 cluster nodes | grep slave | grep 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e
f093c80dde814da99c5cf72a7dd01590792b783b 127.0.0.1:7006 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617702 3 connected
2938205e12de373867bf38f1ca29d31d0ddb3e46 127.0.0.1:7002 slave 3c3a0c74aae0b56170ccb03a76b60cfe7dc1912e 0 1385543617198 3 connected

The node 3c3a0c... now has two replicas, running on ports 7002 (the existing one) and 7006 (the new one).

Removing a node

To remove a replica node just use the del-node command of redis-cli:

redis-cli --cluster del-node 127.0.0.1:7000 `<node-id>`

The first argument is just a random node in the cluster, the second argument is the ID of the node you want to remove.

You can remove a master node in the same way as well, however in order to remove a master node it must be empty. If the master is not empty you need to reshard data away from it to all the other master nodes before.

An alternative to remove a master node is to perform a manual failover of it over one of its replicas and remove the node after it turned into a replica of the new master. Obviously this does not help when you want to reduce the actual number of masters in your cluster, in that case, a resharding is needed.

Replica migration

In Redis Cluster it is possible to reconfigure a replica to replicate with a different master at any time just using the following command:

CLUSTER REPLICATE <master-node-id>

However there is a special scenario where you want replicas to move from one master to another one automatically, without the help of the system administrator. The automatic reconfiguration of replicas is called replicas migration and is able to improve the reliability of a Redis Cluster.

Note: you can read the details of replicas migration in the Redis Cluster Specification, here we'll only provide some information about the general idea and what you should do in order to benefit from it.

The reason why you may want to let your cluster replicas to move from one master to another under certain condition, is that usually the Redis Cluster is as resistant to failures as the number of replicas attached to a given master.

For example a cluster where every master has a single replica can't continue operations if the master and its replica fail at the same time, simply because there is no other instance to have a copy of the hash slots the master was serving. However while net-splits are likely to isolate a number of nodes at the same time, many other kind of failures, like hardware or software failures local to a single node, are a very notable class of failures that are unlikely to happen at the same time, so it is possible that in your cluster where every master has a replica, the replica is killed at 4am, and the master is killed at 6am. This still will result in a cluster that can no longer operate.

To improve reliability of the system we have the option to add additional replicas to every master, but this is expensive. Replica migration allows to add more replicas to just a few masters. So you have 10 masters with 1 replica each, for a total of 20 instances. However you add, for example, 3 instances more as replicas of some of your masters, so certain masters will have more than a single replica.

With replicas migration what happens is that if a master is left without replicas, a replica from a master that has multiple replicas will migrate to the orphaned master. So after your replica goes down at 4am as in the example we made above, another replica will take its place, and when the master will fail as well at 5am, there is still a replica that can be elected so that the cluster can continue to operate.

So what you should know about replicas migration in short?

  • The cluster will try to migrate a replica from the master that has the greatest number of replicas in a given moment.
  • To benefit from replica migration you have just to add a few more replicas to a single master in your cluster, it does not matter what master.
  • There is a configuration parameter that controls the replica migration feature that is called cluster-migration-barrier: you can read more about it in the example redis.conf file provided with Redis Cluster.

Upgrading nodes in a Redis Cluster

Upgrading replica nodes is easy since you just need to stop the node and restart it with an updated version of Redis. If there are clients scaling reads using replica nodes, they should be able to reconnect to a different replica if a given one is not available.

Upgrading masters is a bit more complex, and the suggested procedure is:

  1. Use CLUSTER FAILOVER to trigger a manual failover of the master to one of its replicas. (See the Manual failover section in this document.)
  2. Wait for the master to turn into a replica.
  3. Finally upgrade the node as you do for replicas.
  4. If you want the master to be the node you just upgraded, trigger a new manual failover in order to turn back the upgraded node into a master.

Following this procedure you should upgrade one node after the other until all the nodes are upgraded.

Migrating to Redis Cluster

Users willing to migrate to Redis Cluster may have just a single master, or may already using a preexisting sharding setup, where keys are split among N nodes, using some in-house algorithm or a sharding algorithm implemented by their client library or Redis proxy.

In both cases it is possible to migrate to Redis Cluster easily, however what is the most important detail is if multiple-keys operations are used by the application, and how. There are three different cases:

  1. Multiple keys operations, or transactions, or Lua scripts involving multiple keys, are not used. Keys are accessed independently (even if accessed via transactions or Lua scripts grouping multiple commands, about the same key, together).
  2. Multiple keys operations, or transactions, or Lua scripts involving multiple keys are used but only with keys having the same hash tag, which means that the keys used together all have a {...} sub-string that happens to be identical. For example the following multiple keys operation is defined in the context of the same hash tag: SUNION {user:1000}.foo {user:1000}.bar.
  3. Multiple keys operations, or transactions, or Lua scripts involving multiple keys are used with key names not having an explicit, or the same, hash tag.

The third case is not handled by Redis Cluster: the application requires to be modified in order to don't use multi keys operations or only use them in the context of the same hash tag.

Case 1 and 2 are covered, so we'll focus on those two cases, that are handled in the same way, so no distinction will be made in the documentation.

Assuming you have your preexisting data set split into N masters, where N=1 if you have no preexisting sharding, the following steps are needed in order to migrate your data set to Redis Cluster:

  1. Stop your clients. No automatic live-migration to Redis Cluster is currently possible. You may be able to do it orchestrating a live migration in the context of your application / environment.
  2. Generate an append only file for all of your N masters using the BGREWRITEAOF command, and waiting for the AOF file to be completely generated.
  3. Save your AOF files from aof-1 to aof-N somewhere. At this point you can stop your old instances if you wish (this is useful since in non-virtualized deployments you often need to reuse the same computers).
  4. Create a Redis Cluster composed of N masters and zero replicas. You'll add replicas later. Make sure all your nodes are using the append only file for persistence.
  5. Stop all the cluster nodes, substitute their append only file with your pre-existing append only files, aof-1 for the first node, aof-2 for the second node, up to aof-N.
  6. Restart your Redis Cluster nodes with the new AOF files. They'll complain that there are keys that should not be there according to their configuration.
  7. Use redis-cli --cluster fix command in order to fix the cluster so that keys will be migrated according to the hash slots each node is authoritative or not.
  8. Use redis-cli --cluster check at the end to make sure your cluster is ok.
  9. Restart your clients modified to use a Redis Cluster aware client library.

There is an alternative way to import data from external instances to a Redis Cluster, which is to use the redis-cli --cluster import command.

The command moves all the keys of a running instance (deleting the keys from the source instance) to the specified pre-existing Redis Cluster. However note that if you use a Redis 2.8 instance as source instance the operation may be slow since 2.8 does not implement migrate connection caching, so you may want to restart your source instance with a Redis 3.x version before to perform such operation.

A note about the word slave used in this page: Starting with Redis 5, if not for backward compatibility, the Redis project no longer uses the word slave. Unfortunately in this command the word slave is part of the protocol, so we'll be able to remove such occurrences only when this API will be naturally deprecated.

15 - Redis security

Security model and features in Redis

This document provides an introduction to the topic of security from the point of view of Redis. It covers the access control provided by Redis, code security concerns, attacks that can be triggered from the outside by selecting malicious inputs, and other similar topics.

For security-related contacts, open an issue on GitHub, or when you feel it is really important to preserve the security of the communication, use the GPG key at the end of this document.

Security model

Redis is designed to be accessed by trusted clients inside trusted environments. This means that usually it is not a good idea to expose the Redis instance directly to the internet or, in general, to an environment where untrusted clients can directly access the Redis TCP port or UNIX socket.

For instance, in the common context of a web application implemented using Redis as a database, cache, or messaging system, the clients inside the front-end (web side) of the application will query Redis to generate pages or to perform operations requested or triggered by the web application user.

In this case, the web application mediates access between Redis and untrusted clients (the user browsers accessing the web application).

In general, untrusted access to Redis should always be mediated by a layer implementing ACLs, validating user input, and deciding what operations to perform against the Redis instance.

Network security

Access to the Redis port should be denied to everybody but trusted clients in the network, so the servers running Redis should be directly accessible only by the computers implementing the application using Redis.

In the common case of a single computer directly exposed to the internet, such as a virtualized Linux instance (Linode, EC2, ...), the Redis port should be firewalled to prevent access from the outside. Clients will still be able to access Redis using the loopback interface.

Note that it is possible to bind Redis to a single interface by adding a line like the following to the redis.conf file:

bind 127.0.0.1

Failing to protect the Redis port from the outside can have a big security impact because of the nature of Redis. For instance, a single FLUSHALL command can be used by an external attacker to delete the whole data set.

Protected mode

Unfortunately, many users fail to protect Redis instances from being accessed from external networks. Many instances are simply left exposed on the internet with public IPs. Since version 3.2.0, Redis enters a special mode called protected mode when it is executed with the default configuration (binding all the interfaces) and without any password in order to access it. In this mode, Redis only replies to queries from the loopback interfaces, and replies to clients connecting from other addresses with an error that explains the problem and how to configure Redis properly.

We expect protected mode to seriously decrease the security issues caused by unprotected Redis instances executed without proper administration. However, the system administrator can still ignore the error given by Redis and disable protected mode or manually bind all the interfaces.

Authentication

While Redis does not try to implement Access Control, it provides a tiny layer of optional authentication that is turned on by editing the redis.conf file.

When the authorization layer is enabled, Redis will refuse any query by unauthenticated clients. A client can authenticate itself by sending the AUTH command followed by the password.

The password is set by the system administrator in clear text inside the redis.conf file. It should be long enough to prevent brute force attacks for two reasons:

  • Redis is very fast at serving queries. Many passwords per second can be tested by an external client.
  • The Redis password is stored in the redis.conf file and inside the client configuration. Since the system administrator does not need to remember it, the password can be very long.

The goal of the authentication layer is to optionally provide a layer of redundancy. If firewalling or any other system implemented to protect Redis from external attackers fail, an external client will still not be able to access the Redis instance without knowledge of the authentication password.

Since the AUTH command, like every other Redis command, is sent unencrypted, it does not protect against an attacker that has enough access to the network to perform eavesdropping.

TLS support

Redis has optional support for TLS on all communication channels, including client connections, replication links, and the Redis Cluster bus protocol.

Disallowing specific commands

It is possible to disallow commands in Redis or to rename them as an unguessable name, so that normal clients are limited to a specified set of commands.

For instance, a virtualized server provider may offer a managed Redis instance service. In this context, normal users should probably not be able to call the Redis CONFIG command to alter the configuration of the instance, but the systems that provide and remove instances should be able to do so.

In this case, it is possible to either rename or completely shadow commands from the command table. This feature is available as a statement that can be used inside the redis.conf configuration file. For example:

rename-command CONFIG b840fc02d524045429941cc15f59e41cb7be6c52

In the above example, the CONFIG command was renamed into an unguessable name. It is also possible to completely disallow it (or any other command) by renaming it to the empty string, like in the following example:

rename-command CONFIG ""

Attacks triggered by malicious inputs from external clients

There is a class of attacks that an attacker can trigger from the outside even without external access to the instance. For example, an attacker might insert data into Redis that triggers pathological (worst case) algorithm complexity on data structures implemented inside Redis internals.

An attacker could supply, via a web form, a set of strings that are known to hash to the same bucket in a hash table in order to turn the O(1) expected time (the average time) to the O(N) worst case. This can consume more CPU than expected and ultimately cause a Denial of Service.

To prevent this specific attack, Redis uses a per-execution, pseudo-random seed to the hash function.

Redis implements the SORT command using the qsort algorithm. Currently, the algorithm is not randomized, so it is possible to trigger a quadratic worst-case behavior by carefully selecting the right set of inputs.

String escaping and NoSQL injection

The Redis protocol has no concept of string escaping, so injection is impossible under normal circumstances using a normal client library. The protocol uses prefixed-length strings and is completely binary safe.

Since Lua scripts executed by the EVAL and EVALSHA commands follow the same rules, those commands are also safe.

While it would be a strange use case, the application should avoid composing the body of the Lua script from strings obtained from untrusted sources.

Code security

In a classical Redis setup, clients are allowed full access to the command set, but accessing the instance should never result in the ability to control the system where Redis is running.

Internally, Redis uses all the well-known practices for writing secure code to prevent buffer overflows, format bugs, and other memory corruption issues. However, the ability to control the server configuration using the CONFIG command allows the client to change the working directory of the program and the name of the dump file. This allows clients to write RDB Redis files to random paths. This is a security issue that may lead to the ability to compromise the system and/or run untrusted code as the same user as Redis is running.

Redis does not require root privileges to run. It is recommended to run it as an unprivileged redis user that is only used for this purpose.

GPG key

-----BEGIN PGP PUBLIC KEY BLOCK-----

mQINBF9FWioBEADfBiOE/iKpj2EF/cJ/KzFX+jSBKa8SKrE/9RE0faVF6OYnqstL
S5ox/o+yT45FdfFiRNDflKenjFbOmCbAdIys9Ta0iq6I9hs4sKfkNfNVlKZWtSVG
W4lI6zO2Zyc2wLZonI+Q32dDiXWNcCEsmajFcddukPevj9vKMTJZtF79P2SylEPq
mUuhMy/jOt7q1ibJCj5srtaureBH9662t4IJMFjsEe+hiZ5v071UiQA6Tp7rxLqZ
O6ZRzuamFP3xfy2Lz5NQ7QwnBH1ROabhJPoBOKCATCbfgFcM1Rj+9AOGfoDCOJKH
7yiEezMqr9VbDrEmYSmCO4KheqwC0T06lOLIQC4nnwKopNO/PN21mirCLHvfo01O
H/NUG1LZifOwAURbiFNF8Z3+L0csdhD8JnO+1nphjDHr0Xn9Vff2Vej030pRI/9C
SJ2s5fZUq8jK4n06sKCbqA4pekpbKyhRy3iuITKv7Nxesl4T/uhkc9ccpAvbuD1E
NczN1IH05jiMUMM3lC1A9TSvxSqflqI46TZU3qWLa9yg45kDC8Ryr39TY37LscQk
9x3WwLLkuHeUurnwAk46fSj7+FCKTGTdPVw8v7XbvNOTDf8vJ3o2PxX1uh2P2BHs
9L+E1P96oMkiEy1ug7gu8V+mKu5PAuD3QFzU3XCB93DpDakgtznRRXCkAQARAQAB
tBtSZWRpcyBMYWJzIDxyZWRpc0ByZWRpcy5pbz6JAk4EEwEKADgWIQR5sNCo1OBf
WO913l22qvOUq0evbgUCX0VaKgIbAwULCQgHAgYVCgkICwIEFgIDAQIeAQIXgAAK
CRC2qvOUq0evbpZaD/4rN7xesDcAG4ec895Fqzk3w74W1/K9lzRKZDwRsAqI+sAz
ZXvQMtWSxLfF2BITxLnHJXK5P+2Y6XlNgrn1GYwC1MsARyM9e1AzwDJHcXFkHU82
2aALIMXGtiZs/ejFh9ZSs5cgRlxBSqot/uxXm9AvKEByhmIeHPZse/Rc6e3qa57v
OhCkVZB4ETx5iZrgA+gdmS8N7MXG0cEu5gJLacG57MHi+2WMOCU9Xfj6+Pqhw3qc
E6lBinKcA/LdgUJ1onK0JCnOG1YVHjuFtaisfPXvEmUBGaSGE6lM4J7lass/OWps
Dd+oHCGI+VOGNx6AiBDZG8mZacu0/7goRnOTdljJ93rKkj31I+6+j4xzkAC0IXW8
LAP9Mmo9TGx0L5CaljykhW6z/RK3qd7dAYE+i7e8J9PuQaGG5pjFzuW4vY45j0V/
9JUMKDaGbU5choGqsCpAVtAMFfIBj3UQ5LCt5zKyescKCUb9uifOLeeQ1vay3R9o
eRSD52YpRBpor0AyYxcLur/pkHB0sSvXEfRZENQTohpY71rHSaFd3q1Hkk7lZl95
m24NRlrJnjFmeSPKP22vqUYIwoGNUF/D38UzvqHD8ltTPgkZc+Y+RRbVNqkQYiwW
GH/DigNB8r2sdkt+1EUu+YkYosxtzxpxxpYGKXYXx0uf+EZmRqRt/OSHKnf2GLkC
DQRfRVoqARAApffsrDNo4JWjX3r6wHJJ8IpwnGEJ2IzGkg8f1Ofk2uKrjkII/oIx
sXC3EeauC1Plhs+m9GP/SPY0LXmZ0OzGD/S1yMpmBeBuXJ0gONDo+xCg1pKGshPs
75XzpbggSOtEYR5S8Z46yCu7TGJRXBMGBhDgCfPVFBBNsnG5B0EeHXM4trqqlN6d
PAcwtLnKPz/Z+lloKR6bFXvYGuN5vjRXjcVYZLLCEwdV9iY5/Opqk9sCluasb3t/
c2gcsLWWFnNz2desvb/Y4ADJzxY+Um848DSR8IcdoArSsqmcCTiYvYC/UU7XPVNk
Jrx/HwgTVYiLGbtMB3u3fUpHW8SabdHc4xG3sx0LeIvl+JwHgx7yVhNYJEyOQfnE
mfS97x6surXgTVLbWVjXKIJhoWnWbLP4NkBc27H4qo8wM/IWH4SSXYNzFLlCDPnw
vQZSel21qxdqAWaSxkKcymfMS4nVDhVj0jhlcTY3aZcHMjqoUB07p5+laJr9CCGv
0Y0j0qT2aUO22A3kbv6H9c1Yjv8EI7eNz07aoH1oYU6ShsiaLfIqPfGYb7LwOFWi
PSl0dCY7WJg2H6UHsV/y2DwRr/3oH0a9hv/cvcMneMi3tpIkRwYFBPXEsIcoD9xr
RI5dp8BBdO/Nt+puoQq9oyialWnQK5+AY7ErW1yxjgie4PQ+XtN+85UAEQEAAYkC
NgQYAQoAIBYhBHmw0KjU4F9Y73XeXbaq85SrR69uBQJfRVoqAhsMAAoJELaq85Sr
R69uoV0QAIvlxAHYTjvH1lt5KbpVGs5gwIAnCMPxmaOXcaZ8V0Z1GEU+/IztwV+N
MYCBv1tYa7OppNs1pn75DhzoNAi+XQOVvU0OZgVJutthZe0fNDFGG9B4i/cxRscI
Ld8TPQQNiZPBZ4ubcxbZyBinE9HsYUM49otHjsyFZ0GqTpyne+zBf1GAQoekxlKo
tWSkkmW0x4qW6eiAmyo5lPS1bBjvaSc67i+6Bv5QkZa0UIkRqAzKN4zVvc2FyILz
+7wVLCzWcXrJt8dOeS6Y/Fjbhb6m7dtapUSETAKu6wJvSd9ndDUjFHD33NQIZ/nL
WaPbn01+e/PHtUDmyZ2W2KbcdlIT9nb2uHrruqdCN04sXkID8E2m2gYMA+TjhC0Q
JBJ9WPmdBeKH91R6wWDq6+HwOpgc/9na+BHZXMG+qyEcvNHB5RJdiu2r1Haf6gHi
Fd6rJ6VzaVwnmKmUSKA2wHUuUJ6oxVJ1nFb7Aaschq8F79TAfee0iaGe9cP+xUHL
zBDKwZ9PtyGfdBp1qNOb94sfEasWPftT26rLgKPFcroCSR2QCK5qHsMNCZL+u71w
NnTtq9YZDRaQ2JAc6VDZCcgu+dLiFxVIi1PFcJQ31rVe16+AQ9zsafiNsxkPdZcY
U9XKndQE028dGZv1E3S5BwpnikrUkWdxcYrVZ4fiNIy5I3My2yCe
=J9BD
-----END PGP PUBLIC KEY BLOCK-----

15.1 - ACL

Redis access control list

The Redis ACL, short for Access Control List, is the feature that allows certain connections to be limited in terms of the commands that can be executed and the keys that can be accessed. The way it works is that, after connecting, a client is required to provide a username and a valid password to authenticate. If authentication succeeded, the connection is associated with a given user and the limits the user has. Redis can be configured so that new connections are already authenticated with a "default" user (this is the default configuration). Configuring the default user has, as a side effect, the ability to provide only a specific subset of functionalities to connections that are not explicitly authenticated.

In the default configuration, Redis 6 (the first version to have ACLs) works exactly like older versions of Redis. Every new connection is capable of calling every possible command and accessing every key, so the ACL feature is backward compatible with old clients and applications. Also the old way to configure a password, using the requirepass configuration directive, still works as expected. However, it now sets a password for the default user.

The Redis AUTH command was extended in Redis 6, so now it is possible to use it in the two-arguments form:

AUTH <username> <password>

Here's an example of the old form:

AUTH <password>

What happens is that the username used to authenticate is "default", so just specifying the password implies that we want to authenticate against the default user. This provides backward compatibility.

When ACLs are useful

Before using ACLs, you may want to ask yourself what's the goal you want to accomplish by implementing this layer of protection. Normally there are two main goals that are well served by ACLs:

  1. You want to improve security by restricting the access to commands and keys, so that untrusted clients have no access and trusted clients have just the minimum access level to the database in order to perform the work needed. For instance, certain clients may just be able to execute read only commands.
  2. You want to improve operational safety, so that processes or humans accessing Redis are not allowed to damage the data or the configuration due to software errors or manual mistakes. For instance, there is no reason for a worker that fetches delayed jobs from Redis to be able to call the FLUSHALL command.

Another typical usage of ACLs is related to managed Redis instances. Redis is often provided as a managed service both by internal company teams that handle the Redis infrastructure for the other internal customers they have, or is provided in a software-as-a-service setup by cloud providers. In both setups, we want to be sure that configuration commands are excluded for the customers.

Configure ACLs with the ACL command

ACLs are defined using a DSL (domain specific language) that describes what a given user is allowed to do. Such rules are always implemented from the first to the last, left-to-right, because sometimes the order of the rules is important to understand what the user is really able to do.

By default there is a single user defined, called default. We can use the ACL LIST command in order to check the currently active ACLs and verify what the configuration of a freshly started, defaults-configured Redis instance is:

> ACL LIST
1) "user default on nopass ~* &* +@all"

The command above reports the list of users in the same format that is used in the Redis configuration files, by translating the current ACLs set for the users back into their description.

The first two words in each line are "user" followed by the username. The next words are ACL rules that describe different things. We'll show how the rules work in detail, but for now it is enough to say that the default user is configured to be active (on), to require no password (nopass), to access every possible key (~*) and Pub/Sub channel (&*), and be able to call every possible command (+@all).

Also, in the special case of the default user, having the nopass rule means that new connections are automatically authenticated with the default user without any explicit AUTH call needed.

ACL rules

The following is the list of valid ACL rules. Certain rules are just single words that are used in order to activate or remove a flag, or to perform a given change to the user ACL. Other rules are char prefixes that are concatenated with command or category names, key patterns, and so forth.

Enable and disallow users:

  • on: Enable the user: it is possible to authenticate as this user.
  • off: Disallow the user: it's no longer possible to authenticate with this user; however, previously authenticated connections will still work. Note that if the default user is flagged as off, new connections will start as not authenticated and will require the user to send AUTH or HELLO with the AUTH option in order to authenticate in some way, regardless of the default user configuration.

Allow and disallow commands:

  • +<command>: Add the command to the list of commands the user can call. Can be used with | for allowing subcommands (e.g "+config|get").
  • -<command>: Remove the command to the list of commands the user can call. Starting Redis 7.0, it can be used with | for blocking subcommands (e.g "-config|set").
  • +@<category>: Add all the commands in such category to be called by the user, with valid categories being like @admin, @set, @sortedset, ... and so forth, see the full list by calling the ACL CAT command. The special category @all means all the commands, both the ones currently present in the server, and the ones that will be loaded in the future via modules.
  • -@<category>: Like +@<category> but removes the commands from the list of commands the client can call.
  • +<command>|first-arg: Allow a specific first argument of an otherwise disabled command. It is only supported on commands with no sub-commands, and is not allowed as negative form like -SELECT|1, only additive starting with "+". This feature is deprecated and may be removed in the future.
  • allcommands: Alias for +@all. Note that it implies the ability to execute all the future commands loaded via the modules system.
  • nocommands: Alias for -@all.

Allow and disallow certain keys and key permissions:

  • ~<pattern>: Add a pattern of keys that can be mentioned as part of commands. For instance ~* allows all the keys. The pattern is a glob-style pattern like the one of KEYS. It is possible to specify multiple patterns.
  • %R~<pattern>: (Available in Redis 7.0 and later) Add the specified read key pattern. This behaves similar to the regular key pattern but only grants permission to read from keys that match the given pattern. See key permissions for more information.
  • %W~<pattern>: (Available in Redis 7.0 and later) Add the specified write key pattern. This behaves similar to the regular key pattern but only grants permission to write to keys that match the given pattern. See key permissions for more information.
  • %RW~<pattern>: (Available in Redis 7.0 and later) Alias for ~<pattern>.
  • allkeys: Alias for ~*.
  • resetkeys: Flush the list of allowed keys patterns. For instance the ACL ~foo:* ~bar:* resetkeys ~objects:*, will only allow the client to access keys that match the pattern objects:*.

Allow and disallow Pub/Sub channels:

  • &<pattern>: (Available in Redis 6.2 and later) Add a glob style pattern of Pub/Sub channels that can be accessed by the user. It is possible to specify multiple channel patterns. Note that pattern matching is done only for channels mentioned by PUBLISH and SUBSCRIBE, whereas PSUBSCRIBE requires a literal match between its channel patterns and those allowed for user.
  • allchannels: Alias for &* that allows the user to access all Pub/Sub channels.
  • resetchannels: Flush the list of allowed channel patterns and disconnect the user's Pub/Sub clients if these are no longer able to access their respective channels and/or channel patterns.

Configure valid passwords for the user:

  • ><password>: Add this password to the list of valid passwords for the user. For example >mypass will add "mypass" to the list of valid passwords. This directive clears the nopass flag (see later). Every user can have any number of passwords.
  • <<password>: Remove this password from the list of valid passwords. Emits an error in case the password you are trying to remove is actually not set.
  • #<hash>: Add this SHA-256 hash value to the list of valid passwords for the user. This hash value will be compared to the hash of a password entered for an ACL user. This allows users to store hashes in the acl.conf file rather than storing cleartext passwords. Only SHA-256 hash values are accepted as the password hash must be 64 characters and only contain lowercase hexadecimal characters.
  • !<hash>: Remove this hash value from the list of valid passwords. This is useful when you do not know the password specified by the hash value but would like to remove the password from the user.
  • nopass: All the set passwords of the user are removed, and the user is flagged as requiring no password: it means that every password will work against this user. If this directive is used for the default user, every new connection will be immediately authenticated with the default user without any explicit AUTH command required. Note that the resetpass directive will clear this condition.
  • resetpass: Flushes the list of allowed passwords and removes the nopass status. After resetpass, the user has no associated passwords and there is no way to authenticate without adding some password (or setting it as nopass later).

Note: if a user is not flagged with nopass and has no list of valid passwords, that user is effectively impossible to use because there will be no way to log in as that user.

Configure selectors for the user:

  • (<rule list>): (Available in Redis 7.0 and later) Create a new selector to match rules against. Selectors are evaluated after the user permissions, and are evaluated according to the order they are defined. If a command matches either the user permissions or any selector, it is allowed. See selectors for more information.
  • clearselectors: (Available in Redis 7.0 and later) Delete all of the selectors attached to the user.

Reset the user:

  • reset Performs the following actions: resetpass, resetkeys, resetchannels, off, -@all. The user returns to the same state it had immediately after its creation.

Create and edit user ACLs with the ACL SETUSER command

Users can be created and modified in two main ways:

  1. Using the ACL command and its ACL SETUSER subcommand.
  2. Modifying the server configuration, where users can be defined, and restarting the server. With an external ACL file, just call ACL LOAD.

In this section we'll learn how to define users using the ACL command. With such knowledge, it will be trivial to do the same things via the configuration files. Defining users in the configuration deserves its own section and will be discussed later separately.

To start, try the simplest ACL SETUSER command call:

> ACL SETUSER alice
OK

The ACL SETUSER command takes the username and a list of ACL rules to apply to the user. However the above example did not specify any rule at all. This will just create the user if it did not exist, using the defaults for new users. If the user already exists, the command above will do nothing at all.

Check the default user status:

> ACL LIST
1) "user alice off &* -@all"
2) "user default on nopass ~* ~& +@all"

The new user "alice" is:

  • In the off status, so AUTH will not work for the user "alice".
  • The user also has no passwords set.
  • Cannot access any command. Note that the user is created by default without the ability to access any command, so the -@all in the output above could be omitted; however, ACL LIST attempts to be explicit rather than implicit.
  • There are no key patterns that the user can access.
  • The user can access all Pub/Sub channels.

New users are created with restrictive permissions by default. Starting with Redis 6.2, ACL provides Pub/Sub channels access management as well. To ensure backward compatibility with version 6.0 when upgrading to Redis 6.2, new users are granted the 'allchannels' permission by default. The default can be set to resetchannels via the acl-pubsub-default configuration directive.

From 7.0, The acl-pubsub-default value is set to resetchannels to restrict the channels access by default to provide better security. The default can be set to allchannels via the acl-pubsub-default configuration directive to be compatible with previous versions.

Such user is completely useless. Let's try to define the user so that it is active, has a password, and can access with only the GET command to key names starting with the string "cached:".

> ACL SETUSER alice on >p1pp0 ~cached:* +get
OK

Now the user can do something, but will refuse to do other things:

> AUTH alice p1pp0
OK
> GET foo
(error) NOPERM this user has no permissions to access one of the keys used as arguments
> GET cached:1234
(nil)
> SET cached:1234 zap
(error) NOPERM this user has no permissions to run the 'set' command

Things are working as expected. In order to inspect the configuration of the user alice (remember that user names are case sensitive), it is possible to use an alternative to ACL LIST which is designed to be more suitable for computers to read, while ACL LIST is more human readable.

> ACL GETUSER alice
1) "flags"
2) 1) "on"
   2) "allchannels"
3) "passwords"
4) 1) "2d9c75..."
5) "commands"
6) "-@all +get"
7) "keys"
8) "~cached:*"
9) "channels"
10) "&*"
11) "selectors"
12) 1) 1) "commands"
       2) "-@all +set"
       3) "keys"
       4) "~*"
       5) "channels"
       6) "&*"

The ACL GETUSER returns a field-value array that describes the user in more parsable terms. The output includes the set of flags, a list of key patterns, passwords, and so forth. The output is probably more readable if we use RESP3, so that it is returned as a map reply:

> ACL GETUSER alice
1# "flags" => 1~ "on"
   2~ "allchannels"
2# "passwords" => 1) "2d9c75273d72b32df726fb545c8a4edc719f0a95a6fd993950b10c474ad9c927"
3# "commands" => "-@all +get"
4# "keys" => "~cached:*"
5# "channels" => "&*"
6# "selectors" => 1) 1# "commands" => "-@all +set"
    2# "keys" => "~*"
    3# "channels" => "&*"

Note: from now on, we'll continue using the Redis default protocol, version 2

Using another ACL SETUSER command (from a different user, because alice cannot run the ACL command), we can add multiple patterns to the user:

> ACL SETUSER alice ~objects:* ~items:* ~public:*
OK
> ACL LIST
1) "user alice on >2d9c75... ~cached:* ~objects:* ~items:* ~public:* &* -@all +get"
2) "user default on nopass ~* &* +@all"

The user representation in memory is now as we expect it to be.

Multiple calls to ACL SETUSER

It is very important to understand what happens when ACL SETUSER is called multiple times. What is critical to know is that every ACL SETUSER call will NOT reset the user, but will just apply the ACL rules to the existing user. The user is reset only if it was not known before. In that case, a brand new user is created with zeroed-ACLs. The user cannot do anything, is disallowed, has no passwords, and so forth. This is the best default for safety.

However later calls will just modify the user incrementally. For instance, the following sequence:

> ACL SETUSER myuser +set
OK
> ACL SETUSER myuser +get
OK

Will result in myuser being able to call both GET and SET:

> ACL LIST
1) "user default on nopass ~* &* +@all"
2) "user myuser off &* -@all +set +get"

Command categories

Setting user ACLs by specifying all the commands one after the other is really annoying, so instead we do things like this:

> ACL SETUSER antirez on +@all -@dangerous >42a979... ~*

By saying +@all and -@dangerous, we included all the commands and later removed all the commands that are tagged as dangerous inside the Redis command table. Note that command categories never include modules commands with the exception of +@all. If you say +@all, all the commands can be executed by the user, even future commands loaded via the modules system. However if you use the ACL rule +@read or any other, the modules commands are always excluded. This is very important because you should just trust the Redis internal command table. Modules may expose dangerous things and in the case of an ACL that is just additive, that is, in the form of +@all -... You should be absolutely sure that you'll never include what you did not mean to.

The following is a list of command categories and their meanings:

  • admin - Administrative commands. Normal applications will never need to use these. Includes REPLICAOF, CONFIG, DEBUG, SAVE, MONITOR, ACL, SHUTDOWN, etc.
  • bitmap - Data type: bitmaps related.
  • blocking - Potentially blocking the connection until released by another command.
  • connection - Commands affecting the connection or other connections. This includes AUTH, SELECT, COMMAND, CLIENT, ECHO, PING, etc.
  • dangerous - Potentially dangerous commands (each should be considered with care for various reasons). This includes FLUSHALL, MIGRATE, RESTORE, SORT, KEYS, CLIENT, DEBUG, INFO, CONFIG, SAVE, REPLICAOF, etc.
  • geo - Data type: geospatial indexes related.
  • hash - Data type: hashes related.
  • hyperloglog - Data type: hyperloglog related.
  • fast - Fast O(1) commands. May loop on the number of arguments, but not the number of elements in the key.
  • keyspace - Writing or reading from keys, databases, or their metadata in a type agnostic way. Includes DEL, RESTORE, DUMP, RENAME, EXISTS, DBSIZE, KEYS, EXPIRE, TTL, FLUSHALL, etc. Commands that may modify the keyspace, key, or metadata will also have the write category. Commands that only read the keyspace, key, or metadata will have the read category.
  • list - Data type: lists related.
  • pubsub - PubSub-related commands.
  • read - Reading from keys (values or metadata). Note that commands that don't interact with keys, will not have either read or write.
  • scripting - Scripting related.
  • set - Data type: sets related.
  • sortedset - Data type: sorted sets related.
  • slow - All commands that are not fast.
  • stream - Data type: streams related.
  • string - Data type: strings related.
  • transaction - WATCH / MULTI / EXEC related commands.
  • write - Writing to keys (values or metadata).

Redis can also show you a list of all categories and the exact commands each category includes using the Redis ACL command's CAT subcommand. It can be used in two forms:

ACL CAT -- Will just list all the categories available
ACL CAT <category-name> -- Will list all the commands inside the category

Examples:

 > ACL CAT
 1) "keyspace"
 2) "read"
 3) "write"
 4) "set"
 5) "sortedset"
 6) "list"
 7) "hash"
 8) "string"
 9) "bitmap"
10) "hyperloglog"
11) "geo"
12) "stream"
13) "pubsub"
14) "admin"
15) "fast"
16) "slow"
17) "blocking"
18) "dangerous"
19) "connection"
20) "transaction"
21) "scripting"

As you can see, so far there are 21 distinct categories. Now let's check what command is part of the geo category:

> ACL CAT geo
1) "geohash"
2) "georadius_ro"
3) "georadiusbymember"
4) "geopos"
5) "geoadd"
6) "georadiusbymember_ro"
7) "geodist"
8) "georadius"

Note that commands may be part of multiple categories. For example, an ACL rule like +@geo -@read will result in certain geo commands to be excluded because they are read-only commands.

Allow/block subcommands

Starting from Redis 7.0, subcommands can be allowed/blocked just like other commands (by using the separator | between the command and subcommand, for example: +config|get or -config|set)

That is true for all commands except DEBUG. In order to allow/block specific DEBUG subcommands, see the next section.

Allow the first-arg of a blocked command

Note: This feature is deprecated since Redis 7.0 and may be removed in the future.

Sometimes the ability to exclude or include a command or a subcommand as a whole is not enough. Many deployments may not be happy providing the ability to execute a SELECT for any DB, but may still want to be able to run SELECT 0.

In such case we could alter the ACL of a user in the following way:

ACL SETUSER myuser -select +select|0

First, remove the SELECT command and then add the allowed first-arg. Note that it is not possible to do the reverse since first-args can be only added, not excluded. It is safer to specify all the first-args that are valid for some user since it is possible that new first-args may be added in the future.

Another example:

ACL SETUSER myuser -debug +debug|digest

Note that first-arg matching may add some performance penalty; however, it is hard to measure even with synthetic benchmarks. The additional CPU cost is only paid when such commands are called, and not when other commands are called.

It is possible to use this mechanism in order to allow subcommands in Redis versions prior to 7.0 (see above section).

+@all VS -@all

In the previous section, it was observed how it is possible to define command ACLs based on adding/removing single commands.

Selectors

Starting with Redis 7.0, Redis supports adding multiple sets of rules that are evaluated independently of each other. These secondary sets of permissions are called selectors and added by wrapping a set of rules within parentheses. In order to execute a command, either the root permissions (rules defined outside of parenthesis) or any of the selectors (rules defined inside parenthesis) must match the given command. Internally, the root permissions are checked first followed by selectors in the order they were added.

For example, consider a user with the ACL rules +GET ~key1 (+SET ~key2). This user is able to execute GET key1 and SET key2 hello, but not GET key2 or SET key1 world.

Unlike the user's root permissions, selectors cannot be modified after they are added. Instead, selectors can be removed with the clearselectors keyword, which removes all of the added selectors. Note that clearselectors does not remove the root permissions.

Key permissions

Starting with Redis 7.0, key patterns can also be used to define how a command is able to touch a key. This is achieved through rules that define key permissions. The key permission rules take the form of %(<permission>)~<pattern>. Permissions are defined as individual characters that map to the following key permissions:

  • W (Write): The data stored within the key may be updated or deleted.
  • R (Read): User supplied data from the key is processed, copied or returned. Note that this does not include metadata such as size information (example STRLEN), type information (example TYPE) or information about whether a value exists within a collection (example SISMEMBER).

Permissions can be composed together by specifying multiple characters. Specifying the permission as 'RW' is considered full access and is analogous to just passing in ~<pattern>.

For a concrete example, consider a user with ACL rules +@all ~app1:* (+@readonly ~app2:*). This user has full access on app1:* and readonly access on app2:*. However, some commands support reading data from one key, doing some transformation, and storing it into another key. One such command is the COPY command, which copies the data from the source key into the destination key. The example set of ACL rules is unable to handle a request copying data from app2:user into app1:user, since neither the root permission or the selector fully matches the command. However, using key selectors you can define a set of ACL rules that can handle this request +@all ~app1:* %R~app2:*. The first pattern is able to match app1:user and the second pattern is able to match app2:user.

Which type of permission is required for a command is documented through key specifications. The type of permission is based off the keys logical operation flags. The insert, update, and delete flags map to the write key permission. The access flag maps to the read key permission. If the key has no logical operation flags, such as EXISTS, the user still needs either key read or key write permissions to execute the command.

Note: Side channels to accessing user data are ignored when it comes to evaluating whether read permissions are required to execute a command. This means that some write commands that return metadata about the modified key only require write permission on the key to execute: For example, consider the following two commands:

  • LPUSH key1 data: modifies "key1" but only returns metadata about it, the size of the list after the push, so the command only requires write permission on "key1" to execute.
  • LPOP key2: modifies "key2" but also returns data from it, the left most item in the list, so the command requires both read and write permission on "key2" to execute.

If an application needs to make sure no data is accessed from a key, including side channels, it's recommended to not provide any access to the key.

How passwords are stored internally

Redis internally stores passwords hashed with SHA256. If you set a password and check the output of ACL LIST or ACL GETUSER, you'll see a long hex string that looks pseudo random. Here is an example, because in the previous examples, for the sake of brevity, the long hex string was trimmed:

> ACL GETUSER default
1) "flags"
2) 1) "on"
   2) "allkeys"
   3) "allcommands"
   4) "allchannels"
3) "passwords"
4) 1) "2d9c75273d72b32df726fb545c8a4edc719f0a95a6fd993950b10c474ad9c927"
5) "commands"
6) "+@all"
7) "keys"
8) "~*"
9) "channels"
10) "&*"
11) "selectors"
12) (empty array)

Also, starting with Redis 6, the old command CONFIG GET requirepass will no longer return the clear text password, but instead the hashed password.

Using SHA256 provides the ability to avoid storing the password in clear text while still allowing for a very fast AUTH command, which is a very important feature of Redis and is coherent with what clients expect from Redis.

However ACL passwords are not really passwords. They are shared secrets between the server and the client, because the password is not an authentication token used by a human being. For instance:

  • There are no length limits, the password will just be memorized in some client software. There is no human that needs to recall a password in this context.
  • The ACL password does not protect any other thing. For example, it will never be the password for some email account.
  • Often when you are able to access the hashed password itself, by having full access to the Redis commands of a given server, or corrupting the system itself, you already have access to what the password is protecting: the Redis instance stability and the data it contains.

For this reason, slowing down the password authentication, in order to use an algorithm that uses time and space to make password cracking hard, is a very poor choice. What we suggest instead is to generate strong passwords, so that nobody will be able to crack it using a dictionary or a brute force attack even if they have the hash. To do so, there is a special ACL command ACL GENPASS that generates passwords using the system cryptographic pseudorandom generator:

> ACL GENPASS
"dd721260bfe1b3d9601e7fbab36de6d04e2e67b0ef1c53de59d45950db0dd3cc"

The command outputs a 32-byte (256-bit) pseudorandom string converted to a 64-byte alphanumerical string. This is long enough to avoid attacks and short enough to be easy to manage, cut & paste, store, and so forth. This is what you should use in order to generate Redis passwords.

Use an external ACL file

There are two ways to store users inside the Redis configuration:

  1. Users can be specified directly inside the redis.conf file.
  2. It is possible to specify an external ACL file.

The two methods are mutually incompatible, so Redis will ask you to use one or the other. Specifying users inside redis.conf is good for simple use cases. When there are multiple users to define, in a complex environment, we recommend you use the ACL file instead.

The format used inside redis.conf and in the external ACL file is exactly the same, so it is trivial to switch from one to the other, and is the following:

user <username> ... acl rules ...

For instance:

user worker +@list +@connection ~jobs:* on >ffa9203c493aa99

When you want to use an external ACL file, you are required to specify the configuration directive called aclfile, like this:

aclfile /etc/redis/users.acl

When you are just specifying a few users directly inside the redis.conf file, you can use CONFIG REWRITE in order to store the new user configuration inside the file by rewriting it.

The external ACL file however is more powerful. You can do the following:

  • Use ACL LOAD if you modified the ACL file manually and you want Redis to reload the new configuration. Note that this command is able to load the file only if all the users are correctly specified. Otherwise, an error is reported to the user, and the old configuration will remain valid.
  • Use ACL SAVE to save the current ACL configuration to the ACL file.

Note that CONFIG REWRITE does not also trigger ACL SAVE. When you use an ACL file, the configuration and the ACLs are handled separately.

ACL rules for Sentinel and Replicas

In case you don't want to provide Redis replicas and Redis Sentinel instances full access to your Redis instances, the following is the set of commands that must be allowed in order for everything to work correctly.

For Sentinel, allow the user to access the following commands both in the master and replica instances:

  • AUTH, CLIENT, SUBSCRIBE, SCRIPT, PUBLISH, PING, INFO, MULTI, SLAVEOF, CONFIG, CLIENT, EXEC.

Sentinel does not need to access any key in the database but does use Pub/Sub, so the ACL rule would be the following (note: AUTH is not needed since it is always allowed):

ACL SETUSER sentinel-user on >somepassword allchannels +multi +slaveof +ping +exec +subscribe +config|rewrite +role +publish +info +client|setname +client|kill +script|kill

Redis replicas require the following commands to be allowed on the master instance:

  • PSYNC, REPLCONF, PING

No keys need to be accessed, so this translates to the following rules:

ACL setuser replica-user on >somepassword +psync +replconf +ping

Note that you don't need to configure the replicas to allow the master to be able to execute any set of commands. The master is always authenticated as the root user from the point of view of replicas.

15.2 - TLS

Redis TLS support

SSL/TLS is supported by Redis starting with version 6 as an optional feature that needs to be enabled at compile time.

Getting Started

Building

To build with TLS support you'll need OpenSSL development libraries (e.g. libssl-dev on Debian/Ubuntu).

Run make BUILD_TLS=yes.

Tests

To run Redis test suite with TLS, you'll need TLS support for TCL (i.e. tcl-tls package on Debian/Ubuntu).

  1. Run ./utils/gen-test-certs.sh to generate a root CA and a server certificate.

  2. Run ./runtest --tls or ./runtest-cluster --tls to run Redis and Redis Cluster tests in TLS mode.

Running manually

To manually run a Redis server with TLS mode (assuming gen-test-certs.sh was invoked so sample certificates/keys are available):

./src/redis-server --tls-port 6379 --port 0 \
    --tls-cert-file ./tests/tls/redis.crt \
    --tls-key-file ./tests/tls/redis.key \
    --tls-ca-cert-file ./tests/tls/ca.crt

To connect to this Redis server with redis-cli:

./src/redis-cli --tls \
    --cert ./tests/tls/redis.crt \
    --key ./tests/tls/redis.key \
    --cacert ./tests/tls/ca.crt

Certificate configuration

In order to support TLS, Redis must be configured with a X.509 certificate and a private key. In addition, it is necessary to specify a CA certificate bundle file or path to be used as a trusted root when validating certificates. To support DH based ciphers, a DH params file can also be configured. For example:

tls-cert-file /path/to/redis.crt
tls-key-file /path/to/redis.key
tls-ca-cert-file /path/to/ca.crt
tls-dh-params-file /path/to/redis.dh

TLS listening port

The tls-port configuration directive enables accepting SSL/TLS connections on the specified port. This is in addition to listening on port for TCP connections, so it is possible to access Redis on different ports using TLS and non-TLS connections simultaneously.

You may specify port 0 to disable the non-TLS port completely. To enable only TLS on the default Redis port, use:

port 0
tls-port 6379

Client certificate authentication

By default, Redis uses mutual TLS and requires clients to authenticate with a valid certificate (authenticated against trusted root CAs specified by ca-cert-file or ca-cert-dir).

You may use tls-auth-clients no to disable client authentication.

Replication

A Redis master server handles connecting clients and replica servers in the same way, so the above tls-port and tls-auth-clients directives apply to replication links as well.

On the replica server side, it is necessary to specify tls-replication yes to use TLS for outgoing connections to the master.

Cluster

When Redis Cluster is used, use tls-cluster yes in order to enable TLS for the cluster bus and cross-node connections.

Sentinel

Sentinel inherits its networking configuration from the common Redis configuration, so all of the above applies to Sentinel as well.

When connecting to master servers, Sentinel will use the tls-replication directive to determine if a TLS or non-TLS connection is required.

In addition, the very same tls-replication directive will determine whether Sentinel's port, that accepts connections from other Sentinels, will support TLS as well. That is, Sentinel will be configured with tls-port if and only if tls-replication is enabled.

Additional configuration

Additional TLS configuration is available to control the choice of TLS protocol versions, ciphers and cipher suites, etc. Please consult the self documented redis.conf for more information.

Performance considerations

TLS adds a layer to the communication stack with overheads due to writing/reading to/from an SSL connection, encryption/decryption and integrity checks. Consequently, using TLS results in a decrease of the achievable throughput per Redis instance (for more information refer to this discussion).

Limitations

I/O threading is currently not supported with TLS.

16 - Transactions

How transactions work in Redis

Redis Transactions allow the execution of a group of commands in a single step, they are centered around the commands MULTI, EXEC, DISCARD and WATCH. Redis Transactions make two important guarantees:

  • All the commands in a transaction are serialized and executed sequentially. A request sent by another client will never be served in the middle of the execution of a Redis Transaction. This guarantees that the commands are executed as a single isolated operation.

  • The EXEC command triggers the execution of all the commands in the transaction, so if a client loses the connection to the server in the context of a transaction before calling the EXEC command none of the operations are performed, instead if the EXEC command is called, all the operations are performed. When using the append-only file Redis makes sure to use a single write(2) syscall to write the transaction on disk. However if the Redis server crashes or is killed by the system administrator in some hard way it is possible that only a partial number of operations are registered. Redis will detect this condition at restart, and will exit with an error. Using the redis-check-aof tool it is possible to fix the append only file that will remove the partial transaction so that the server can start again.

Starting with version 2.2, Redis allows for an extra guarantee to the above two, in the form of optimistic locking in a way very similar to a check-and-set (CAS) operation. This is documented later on this page.

Usage

A Redis Transaction is entered using the MULTI command. The command always replies with OK. At this point the user can issue multiple commands. Instead of executing these commands, Redis will queue them. All the commands are executed once EXEC is called.

Calling DISCARD instead will flush the transaction queue and will exit the transaction.

The following example increments keys foo and bar atomically.

> MULTI
OK
> INCR foo
QUEUED
> INCR bar
QUEUED
> EXEC
1) (integer) 1
2) (integer) 1

As is clear from the session above, EXEC returns an array of replies, where every element is the reply of a single command in the transaction, in the same order the commands were issued.

When a Redis connection is in the context of a MULTI request, all commands will reply with the string QUEUED (sent as a Status Reply from the point of view of the Redis protocol). A queued command is simply scheduled for execution when EXEC is called.

Errors inside a transaction

During a transaction it is possible to encounter two kind of command errors:

  • A command may fail to be queued, so there may be an error before EXEC is called. For instance the command may be syntactically wrong (wrong number of arguments, wrong command name, ...), or there may be some critical condition like an out of memory condition (if the server is configured to have a memory limit using the maxmemory directive).
  • A command may fail after EXEC is called, for instance since we performed an operation against a key with the wrong value (like calling a list operation against a string value).

Starting with Redis 2.6.5, the server will detect an error during the accumulation of commands. It will then refuse to execute the transaction returning an error during EXEC, discarding the transaction.

Note for Redis < 2.6.5: Prior to Redis 2.6.5 clients needed to detect errors occurring prior to EXEC by checking the return value of the queued command: if the command replies with QUEUED it was queued correctly, otherwise Redis returns an error. If there is an error while queueing a command, most clients will abort and discard the transaction. Otherwise, if the client elected to proceed with the transaction the EXEC command would execute all commands queued successfully regardless of previous errors.

Errors happening after EXEC instead are not handled in a special way: all the other commands will be executed even if some command fails during the transaction.

This is more clear on the protocol level. In the following example one command will fail when executed even if the syntax is right:

Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
MULTI
+OK
SET a abc
+QUEUED
LPOP a
+QUEUED
EXEC
*2
+OK
-ERR Operation against a key holding the wrong kind of value

EXEC returned two-element bulk string reply where one is an OK code and the other an -ERR reply. It's up to the client library to find a sensible way to provide the error to the user.

It's important to note that even when a command fails, all the other commands in the queue are processed – Redis will not stop the processing of commands.

Another example, again using the wire protocol with telnet, shows how syntax errors are reported ASAP instead:

MULTI
+OK
INCR a b c
-ERR wrong number of arguments for 'incr' command

This time due to the syntax error the bad INCR command is not queued at all.

What about rollbacks?

Redis does not support rollbacks of transactions since supporting rollbacks would have a significant impact on the simplicity and performance of Redis.

Discarding the command queue

DISCARD can be used in order to abort a transaction. In this case, no commands are executed and the state of the connection is restored to normal.

> SET foo 1
OK
> MULTI
OK
> INCR foo
QUEUED
> DISCARD
OK
> GET foo
"1"

Optimistic locking using check-and-set

WATCH is used to provide a check-and-set (CAS) behavior to Redis transactions.

WATCHed keys are monitored in order to detect changes against them. If at least one watched key is modified before the EXEC command, the whole transaction aborts, and EXEC returns a Null reply to notify that the transaction failed.

For example, imagine we have the need to atomically increment the value of a key by 1 (let's suppose Redis doesn't have INCR).

The first try may be the following:

val = GET mykey
val = val + 1
SET mykey $val

This will work reliably only if we have a single client performing the operation in a given time. If multiple clients try to increment the key at about the same time there will be a race condition. For instance, client A and B will read the old value, for instance, 10. The value will be incremented to 11 by both the clients, and finally SET as the value of the key. So the final value will be 11 instead of 12.

Thanks to WATCH we are able to model the problem very well:

WATCH mykey
val = GET mykey
val = val + 1
MULTI
SET mykey $val
EXEC

Using the above code, if there are race conditions and another client modifies the result of val in the time between our call to WATCH and our call to EXEC, the transaction will fail.

We just have to repeat the operation hoping this time we'll not get a new race. This form of locking is called optimistic locking. In many use cases, multiple clients will be accessing different keys, so collisions are unlikely – usually there's no need to repeat the operation.

WATCH explained

So what is WATCH really about? It is a command that will make the EXEC conditional: we are asking Redis to perform the transaction only if none of the WATCHed keys were modified. This includes modifications made by the client, like write commands, and by Redis itself, like expiration or eviction. If keys were modified between when they were WATCHed and when the EXEC was received, the entire transaction will be aborted instead.

NOTE

  • In Redis versions before 6.0.9, an expired key would not cause a transaction to be aborted. More on this
  • Commands within a transaction wont trigger the WATCH condition since they are only queued until the EXEC is sent.

WATCH can be called multiple times. Simply all the WATCH calls will have the effects to watch for changes starting from the call, up to the moment EXEC is called. You can also send any number of keys to a single WATCH call.

When EXEC is called, all keys are UNWATCHed, regardless of whether the transaction was aborted or not. Also when a client connection is closed, everything gets UNWATCHed.

It is also possible to use the UNWATCH command (without arguments) in order to flush all the watched keys. Sometimes this is useful as we optimistically lock a few keys, since possibly we need to perform a transaction to alter those keys, but after reading the current content of the keys we don't want to proceed. When this happens we just call UNWATCH so that the connection can already be used freely for new transactions.

Using WATCH to implement ZPOP

A good example to illustrate how WATCH can be used to create new atomic operations otherwise not supported by Redis is to implement ZPOP (ZPOPMIN, ZPOPMAX and their blocking variants have only been added in version 5.0), that is a command that pops the element with the lower score from a sorted set in an atomic way. This is the simplest implementation:

WATCH zset
element = ZRANGE zset 0 0
MULTI
ZREM zset element
EXEC

If EXEC fails (i.e. returns a Null reply) we just repeat the operation.

Redis scripting and transactions

Something else to consider for transaction like operations in redis are redis scripts which are transactional. Everything you can do with a Redis Transaction, you can also do with a script, and usually the script will be both simpler and faster.

17 - Troubleshooting Redis

Problems with Redis? Start here.

This page tries to help you with what to do if you have issues with Redis. Part of the Redis project is helping people that are experiencing problems because we don't like to leave people alone with their issues.

  • If you have latency problems with Redis, that in some way appears to be idle for some time, read our Redis latency troubleshooting guide.
  • Redis stable releases are usually very reliable, however in the rare event you are experiencing crashes the developers can help a lot more if you provide debugging information. Please read our Debugging Redis guide.
  • We have a long history of users experiencing crashes with Redis that actually turned out to be servers with broken RAM. Please test your RAM using redis-server --test-memory in case Redis is not stable in your system. Redis built-in memory test is fast and reasonably reliable, but if you can you should reboot your server and use memtest86.

For every other problem please drop a message to the Redis Google Group. We will be glad to help.

You can also find assistance on the Redis Discord server.

List of known critical bugs in Redis 3.0.x, 2.8.x and 2.6.x

To find a list of critical bugs please refer to the changelogs:

Check the upgrade urgency level in each patch release to more easily spot releases that included important fixes.

  • Ubuntu 10.04 and 10.10 have serious bugs (especially 10.10) that cause slow downs if not just instance hangs. Please move away from the default kernels shipped with this distributions. Link to 10.04 bug. Link to 10.10 bug. Both bugs were reported many times in the context of EC2 instances, but other users confirmed that also native servers are affected (at least by one of the two).
  • Certain versions of the Xen hypervisor are known to have very bad fork() performances. See the latency page for more information.