This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

RedisBloom

Bloom filters and other probabilistic data structures for Redis

Discord Github

RedisBloom adds four probabilistic data structures to Redis: a scalable Bloom filter, a cuckoo filter, a count-min sketch, and a top-k. These data structures trade perfect accuracy for extreme memory efficiency, so they're especially useful for big data and streaming applications.

Bloom and cuckoo filters are used to determine, with a high degree of certainty, whether an element is a member of a set.

A count-min sketch is generally used to determine the frequency of events in a stream. You can query the count-min sketch get an estimate of the frequency of any given event.

A top-k maintains a list of k most frequently seen items.

Bloom vs. Cuckoo filters

Bloom filters typically exhibit better performance and scalability when inserting items (so if you're often adding items to your dataset, then a Bloom filter may be ideal). Cuckoo filters are quicker on check operations and also allow deletions.

Academic sources

Bloom Filter

Cuckoo Filter

Count-Min Sketch

Top-K

References

Webinars

  1. Probabilistic Data Structures - The most useful thing in Redis you probably aren't using

Blog posts

  1. RedisBloom Quick Start Tutorial
  2. Developing with Bloom Filters
  3. RedisBloom on Redis Enterprise
  4. Probably and No: Redis, RedisBloom, and Bloom Filters
  5. RedisBloom – Bloom Filter Datatype for Redis

Mailing List / Forum

Got questions? Feel free to ask at the RedisBloom forum.

License

RedisBloom is licensed under the Redis Source Available License Agreement

1 - Commands

Commands Overview

Overview

Supported Probabilistic Data Structures

Redis includes a single Probabilistic Data Structure (PDS), the HyperLogLog which is used to count distinct elements in a multiset. RedisBloom adds to redis five additional PDS.

  • Bloom Filter - test for membership of elements in a set.
  • Cuckoo Filter - test for membership of elements in a set.
  • Count-Min Sketch - count the frequency of elements in a stream.
  • TopK - maintain a list of most frequent K elements in a stream.
  • T-digest Sketch - query for quantile.

Probabilistic Data Structures API

Details on module's commands can be filtered for a specific PDS.

The details also include the syntax for the commands, where:

  • Command and subcommand names are in uppercase, for example BF.RESERVE or BF.ADD
  • Optional arguments are enclosed in square brackets, for example [NONSCALING]
  • Additional optional arguments, such as multiple element for insertion or query, are indicated by three period characters, for example ...

Commands usually require a key's name as their first argument and an element to add or to query.

Complexity

Some commands has complexity O(k) which indicated the number of hash functions. The number of hash functions is constant and complexity may be considered O(1).

2 - Quick start

"Quick start guide"

Quick Start Guide for RedisBloom

Redis Cloud

RedisBloom is available on all Redis Cloud managed services. Redis Cloud Essentials offers a completely free managed database up to 30MB.

Get started here

Launch RedisBloom with Docker

docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest

Download

A pre-compiled version can be downloaded from Redis download center.

Building

git clone --recursive https://github.com/RedisBloom/RedisBloom.git
cd RedisBloom
make setup
make

running

# Assuming you have a redis build from the unstable branch:
/path/to/redis-server --loadmodule ./redisbloom.so

Bloom Filters

Bloom: Adding new items to the filter

A new filter is created for you if it does not yet exist

127.0.0.1:6379> BF.ADD newFilter foo
(integer) 1

Bloom: Checking if an item exists in the filter

127.0.0.1:6379> BF.EXISTS newFilter foo
(integer) 1
127.0.0.1:6379> BF.EXISTS newFilter notpresent
(integer) 0

Bloom: Adding and checking multiple items

127.0.0.1:6379> BF.MADD myFilter foo bar baz
1) (integer) 1
2) (integer) 1
3) (integer) 1
127.0.0.1:6379> BF.MEXISTS myFilter foo nonexist bar
1) (integer) 1
2) (integer) 0
3) (integer) 1

Bloom: Creating a new filter with custom properties

127.0.0.1:6379> BF.RESERVE customFilter 0.0001 600000
OK
127.0.0.1:6379> BF.MADD customFilter foo bar baz

Cuckoo Filters

Cuckoo: Adding new items to a filter

Create an empty cuckoo filter with an initial capacity (of 1000 items)

127.0.0.1:6379> CF.RESERVE newCuckooFilter 1000
(integer) 1

A new filter is created for you if it does not yet exist

127.0.0.1:6379> CF.ADD newCuckooFilter foo
(integer) 1

You can add the item multiple times. The filter will attempt to count it.

Cuckoo: Checking whether item exists

127.0.0.1:6379> CF.EXISTS newCuckooFilter foo
(integer) 1
127.0.0.1:6379> CF.EXISTS newCuckooFilter notpresent
(integer) 0

Cuckoo: Deleting item from filter

127.0.0.1:6379> CF.DEL newCuckooFilter foo
(integer) 1

3 - Client Libraries

RedisBloom has several client libraries, written by the module authors and community members - abstracting the API in different programming languages. While it is possible and simple to use the raw Redis commands API, in most cases it's easier to just use a client library abstracting it.

Currently available Libraries

ProjectLanguageLicenseAuthorURL
JedisJavaMITRedisGitHub
redisbloom-pyPythonBSDRedisGitHub
JRedisBloomJavaBSDRedisGitHub
node-redisJavaScriptMITRedisGitHub
redisbloom-goGoBSDRedisGitHub
rueidisGoApache License 2.0RueianGitHub
rebloomJavaScriptMITAlbert TeamGitHub
phpredis-bloomPHPMITRafa CampoyGitHub
phpRebloomPHPMITAlessandro BalascoGitHub
redis-modules-sdkTypeScriptBSD-3-ClauseDani TseitlinGitHub
redis-modules-javaJavaApache License 2.0denglimingGitHub
NRedisBloom.NETMITyadazulaGitHub

4 - Configuration

RedisBloom supports a few run-time configuration options that can be defined when loading the module. In the future more options will be added.

Run-time configuration

RedisBloom supports a few run-time configuration options that should be determined when loading the module.

Passing Configuration Options During Loading

In general, passing configuration options is done by appending arguments after the --loadmodule argument in the command line, loadmodule configuration directive in a Redis config file, or the MODULE LOAD command. For example:

In redis.conf:

loadmodule redisbloom.so OPT1 OPT2

From redis-cli:

127.0.0.6379> MODULE load redisbloom.so OPT1 OPT2

From command line:

$ redis-server --loadmodule ./redisbloom.so OPT1 OPT2

Default parameters

!!! warning "Note on using initialization default sizes" A filter should always be sized for the expected capacity and the desired error-rate. Using the INSERT family commands with the default values should be used in cases where many small filter exist and the expectation is most will remain at about that size. Not optimizing a filter for its intended use will result in degradation of performance and memory efficiency.

Error rate and Initial Size for Bloom Filter

You can adjust the default error ratio and the initial filter size (for bloom filters) using the ERROR_RATE and INITIAL_SIZE options respectively when loading the module, e.g.

$ redis-server --loadmodule /path/to/redisbloom.so INITIAL_SIZE 400 ERROR_RATE 0.004

The default error rate is 0.01 and the default initial capacity is 100.

Initial Size for Cuckoo Filter

For Cuckoo filter, the default capacity is 1024.