RedisBloom
Bloom filters and other probabilistic data structures for Redis
RedisBloom adds four probabilistic data structures to Redis: a scalable Bloom filter, a cuckoo filter, a count-min sketch, and a top-k. These data structures trade perfect accuracy for extreme memory efficiency, so they're especially useful for big data and streaming applications.
Bloom and cuckoo filters are used to determine, with a high degree of certainty, whether an element is a member of a set.
A count-min sketch is generally used to determine the frequency of events in a stream. You can query the count-min sketch get an estimate of the frequency of any given event.
A top-k maintains a list of k most frequently seen items.
Bloom vs. Cuckoo filters
Bloom filters typically exhibit better performance and scalability when inserting
items (so if you're often adding items to your dataset, then a Bloom filter may be ideal).
Cuckoo filters are quicker on check operations and also allow deletions.
Academic sources
Bloom Filter
Cuckoo Filter
Count-Min Sketch
Top-K
References
Webinars
- Probabilistic Data Structures - The most useful thing in Redis you probably aren't using
Blog posts
- RedisBloom Quick Start Tutorial
- Developing with Bloom Filters
- RedisBloom on Redis Enterprise
- Probably and No: Redis, RedisBloom, and Bloom Filters
- RedisBloom – Bloom Filter Datatype for Redis
Mailing List / Forum
Got questions? Feel free to ask at the RedisBloom forum.
License
RedisBloom is licensed under the Redis Source Available License Agreement
1 - Commands
Commands Overview
Overview
Supported Probabilistic Data Structures
Redis includes a single Probabilistic Data Structure (PDS), the HyperLogLog which is used to count distinct elements in a multiset. RedisBloom adds to redis five additional PDS.
- Bloom Filter - test for membership of elements in a set.
- Cuckoo Filter - test for membership of elements in a set.
- Count-Min Sketch - count the frequency of elements in a stream.
- TopK - maintain a list of most frequent K elements in a stream.
- T-digest Sketch - query for quantile.
Probabilistic Data Structures API
Details on module's commands can be filtered for a specific PDS.
The details also include the syntax for the commands, where:
- Command and subcommand names are in uppercase, for example
BF.RESERVE
or BF.ADD
- Optional arguments are enclosed in square brackets, for example
[NONSCALING]
- Additional optional arguments, such as multiple element for insertion or query, are indicated by three period characters, for example
...
Commands usually require a key's name as their first argument and an element to add or to query.
Complexity
Some commands has complexity O(k)
which indicated the number of hash functions. The number of hash functions is constant and complexity may be considered O(1)
.
2 - Quick start
"Quick start guide"
Quick Start Guide for RedisBloom
Redis Cloud
RedisBloom is available on all Redis Cloud managed services. Redis Cloud Essentials offers a completely free managed database up to 30MB.
Get started here
Launch RedisBloom with Docker
docker run -p 6379:6379 --name redis-redisbloom redislabs/rebloom:latest
Download
A pre-compiled version can be downloaded from Redis download center.
Building
git clone --recursive https://github.com/RedisBloom/RedisBloom.git
cd RedisBloom
make setup
make
running
# Assuming you have a redis build from the unstable branch:
/path/to/redis-server --loadmodule ./redisbloom.so
Bloom Filters
Bloom: Adding new items to the filter
A new filter is created for you if it does not yet exist
127.0.0.1:6379> BF.ADD newFilter foo
(integer) 1
Bloom: Checking if an item exists in the filter
127.0.0.1:6379> BF.EXISTS newFilter foo
(integer) 1
127.0.0.1:6379> BF.EXISTS newFilter notpresent
(integer) 0
Bloom: Adding and checking multiple items
127.0.0.1:6379> BF.MADD myFilter foo bar baz
1) (integer) 1
2) (integer) 1
3) (integer) 1
127.0.0.1:6379> BF.MEXISTS myFilter foo nonexist bar
1) (integer) 1
2) (integer) 0
3) (integer) 1
Bloom: Creating a new filter with custom properties
127.0.0.1:6379> BF.RESERVE customFilter 0.0001 600000
OK
127.0.0.1:6379> BF.MADD customFilter foo bar baz
Cuckoo Filters
Cuckoo: Adding new items to a filter
Create an empty cuckoo filter with an initial capacity (of 1000 items)
127.0.0.1:6379> CF.RESERVE newCuckooFilter 1000
(integer) 1
A new filter is created for you if it does not yet exist
127.0.0.1:6379> CF.ADD newCuckooFilter foo
(integer) 1
You can add the item multiple times. The filter will attempt to count it.
Cuckoo: Checking whether item exists
127.0.0.1:6379> CF.EXISTS newCuckooFilter foo
(integer) 1
127.0.0.1:6379> CF.EXISTS newCuckooFilter notpresent
(integer) 0
Cuckoo: Deleting item from filter
127.0.0.1:6379> CF.DEL newCuckooFilter foo
(integer) 1
3 - Client Libraries
RedisBloom has several client libraries, written by the module authors and community members - abstracting the API in different programming languages. While it is possible and simple to use the raw Redis commands API, in most cases it's easier to just use a client library abstracting it.
Currently available Libraries
4 - Configuration
RedisBloom supports a few run-time configuration options that can be defined when loading the module. In the future more options will be added.
Run-time configuration
RedisBloom supports a few run-time configuration options that should be determined when loading the module.
Passing Configuration Options During Loading
In general, passing configuration options is done by appending arguments after the --loadmodule
argument in the command line, loadmodule
configuration directive in a Redis config file, or the MODULE LOAD
command. For example:
In redis.conf:
loadmodule redisbloom.so OPT1 OPT2
From redis-cli:
127.0.0.6379> MODULE load redisbloom.so OPT1 OPT2
From command line:
$ redis-server --loadmodule ./redisbloom.so OPT1 OPT2
Default parameters
!!! warning "Note on using initialization default sizes"
A filter should always be sized for the expected capacity and the desired error-rate.
Using the INSERT
family commands with the default values should be used in cases where many small filter exist and the expectation is most will remain at about that size.
Not optimizing a filter for its intended use will result in degradation of performance and memory efficiency.
Error rate and Initial Size for Bloom Filter
You can adjust the default error ratio and the initial filter size (for bloom filters)
using the ERROR_RATE
and INITIAL_SIZE
options respectively when loading the
module, e.g.
$ redis-server --loadmodule /path/to/redisbloom.so INITIAL_SIZE 400 ERROR_RATE 0.004
The default error rate is 0.01
and the default initial capacity is 100
.
Initial Size for Cuckoo Filter
For Cuckoo filter, the default capacity is 1024.