2 - Quickstart
Quick Start Guide to RedisTimeSeries
Setup
You can set up RedisTimeSeries in the cloud, in a Docker container, or on your own machine.
Redis Cloud
RedisTimeSeries is available on all Redis Cloud managed services, including a completely free managed database up to 30MB.
Get started here
Docker
To quickly try out RedisTimeSeries, launch an instance using docker:
docker run -p 6379:6379 -it --rm redislabs/redistimeseries
Downloading and running binaries
First download the pre-compiled version from the Redis download center.
Next, run Redis with RedisTimeSeries:
$ redis-server --loadmodule /path/to/module/redistimeseries.so
Build and Run it yourself
You can also build and run RedisTimeSeries on your own machine.
Major Linux distributions as well as macOS are supported.
Requirements
First, clone the RedisTimeSeries repository from git:
git clone --recursive https://github.com/RedisTimeSeries/RedisTimeSeries.git
Then, to install required build artifacts, invoke the following:
cd RedisTimeSeries
make setup
Alternatively, you can manually install the required dependencies listed in system-setup.py.
If make is not yet available, the following commands are equivalent:
./deps/readies/bin/getpy3
./system-setup.py
Note that system-setup.py will install various packages on your system using the native package manager and pip. This requires root permissions (i.e. sudo) on Linux.
If you prefer to avoid that, you can:
- Review system-setup.py and install packages manually,
- Utilize a Python virtual environment,
- Use Docker with the --volume option to create an isolated build environment.
Build
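To build the module, run make from the repository root (this is the same target described in the Development section below):
make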
Binary artifacts are placed under the bin directory.
Run
In your redis-server, run: loadmodule bin/redistimeseries.so
For more information about modules, go to the official Redis documentation.
Give it a try with redis-cli
After you set up RedisTimeSeries, you can interact with it using redis-cli.
$ redis-cli
127.0.0.1:6379> TS.CREATE sensor1
OK
Creating a timeseries
A new timeseries can be created with the TS.CREATE command; for example, to create a timeseries named sensor1, run the following:
TS.CREATE sensor1
You can prevent your timeseries from growing indefinitely by setting a maximum age for samples, compared to the last event time (in milliseconds), with the RETENTION option. The default retention is 0, which means the series will not be trimmed.
TS.CREATE sensor1 RETENTION 2678400000
This will create a timeseries called sensor1 and trim it so that only the last month of samples (2678400000 milliseconds) is retained.
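The retention period can also be changed on an existing series with the TS.ALTER command; for example, to apply the same one-month retention to an already-created sensor1:
TS.ALTER sensor1 RETENTION 2678400000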
Adding data points
To add new data points to a timeseries, we use the TS.ADD command:
TS.ADD key timestamp value
The timestamp argument is the UNIX timestamp of the sample in milliseconds, and value is the numeric data value of the sample.
Example:
TS.ADD sensor1 1626434637914 26
To add a datapoint with the current timestamp, you can use a * instead of a specific timestamp:
TS.ADD sensor1 * 26
You can append data points to multiple timeseries at the same time with the TS.MADD command:
TS.MADD key timestamp value [key timestamp value ...]
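For example, assuming a second series named sensor2 also exists (an illustrative key), the following appends one sample to each series in a single call:
TS.MADD sensor1 1626434637914 26 sensor2 1626434637914 30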
Deleting data points
Data points between two timestamps (inclusive) can be deleted with the TS.DEL command:
TS.DEL key fromTimestamp toTimestamp
Example:
TS.DEL sensor1 1000 2000
To delete a single timestamp, use it as both the "from" and "to" timestamp:
TS.DEL sensor1 1000 1000
Note: When a sample is deleted, the data in all downsampled timeseries will be recalculated for the specific bucket. However, if part of the bucket has already been removed because it falls outside the retention period, the full bucket cannot be recalculated, so in those cases the delete operation is refused.
Labels
Labels are key-value metadata we attach to a timeseries, allowing us to group and filter. They can be either string or numeric values and are added to a timeseries on creation:
TS.CREATE sensor1 LABELS region east
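Several labels can be attached at creation time as a flat list of label-value pairs; for example (the sensor2 key and the area_id label here are illustrative):
TS.CREATE sensor2 LABELS region east area_id 32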
Downsampling
Another useful feature of RedisTimeSeries is compacting data by creating a rule for downsampling (TS.CREATERULE). For example, if you have collected more than one billion data points in a day, you could aggregate the data by every minute in order to downsample it, thereby reducing the dataset size to 24 * 60 = 1,440 data points. You can choose one of the many available aggregation types in order to aggregate multiple data points from a certain minute into a single one. The currently supported aggregation types are: avg, sum, min, max, range, count, first, last, std.p, std.s, var.p, var.s and twa.
It's important to point out that there is no data rewriting on the original timeseries; the compaction happens in a new series, while the original one stays the same. In order to prevent the original timeseries from growing indefinitely, you can use the retention option, which will trim it down to a certain period of time.
NOTE: You need to create the destination (the compacted) timeseries before creating the rule.
TS.CREATERULE sourceKey destKey AGGREGATION aggregationType bucketDuration
Example:
TS.CREATE sensor1_compacted # Create the destination timeseries first
TS.CREATERULE sensor1 sensor1_compacted AGGREGATION avg 60000 # Create the rule
With this creation rule, datapoints added to the sensor1 timeseries will be grouped into buckets of 60 seconds (60000ms), averaged, and saved in the sensor1_compacted timeseries.
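The compacted series can then be read like any other series; for example, to retrieve all of its aggregated buckets:
TS.RANGE sensor1_compacted - +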
Filtering
RedisTimeSeries allows you to filter by value, by timestamp, and by labels:
Filtering by label
You can retrieve datapoints from multiple timeseries in the same query, and the way to do this is by using label filters. For example:
TS.MRANGE - + FILTER area_id=32
This query will show data from all sensors (timeseries) that have a label of area_id with a value of 32. The results will be grouped by timeseries.
Alternatively, we can use the TS.MGET command to get the last sample that matches a specific filter:
TS.MGET FILTER area_id=32
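Multiple label filters can be combined, in which case all of them must match; for example, assuming the same series also carry a region label:
TS.MGET FILTER area_id=32 region=east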
Filtering by value
We can filter by value across a single or multiple timeseries:
TS.RANGE sensor1 - + FILTER_BY_VALUE 25 30
This command will return all data points whose value sits between 25 and 30, inclusive.
To achieve the same filtering on multiple series we have to combine the filtering by value with filtering by label:
TS.MRANGE - + FILTER_BY_VALUE 20 30 FILTER region=east
Filtering by timestamp
To retrieve the datapoints for specific timestamps on one or multiple timeseries, we can use the FILTER_BY_TS argument:
Filter on one timeseries:
TS.RANGE sensor1 - + FILTER_BY_TS 1626435230501 1626443276598
Filter on multiple timeseries:
TS.MRANGE - + FILTER_BY_TS 1626435230501 1626443276598 FILTER region=east
Aggregation
It's possible to combine values of one or more timeseries by leveraging aggregation functions:
TS.RANGE ... AGGREGATION aggType bucketDuration...
For example, to find the average temperature per hour in our sensor1 series, we could run:
TS.RANGE sensor1 - + AGGREGATION avg 3600000
To achieve the same across multiple sensors from the area with id of 32 we would run:
TS.MRANGE - + AGGREGATION avg 3600000 FILTER area_id=32
Aggregation bucket alignment
When doing aggregations, the aggregation buckets will be aligned to timestamp 0, as follows:
TS.RANGE sensor3 10 70 AGGREGATION min 25
Value: | (1000) (2000) (3000) (4000) (5000) (6000) (7000)
Timestamp: |-------|10|-------|20|-------|30|-------|40|-------|50|-------|60|-------|70|--->
Bucket(25ms): |_________________________||_________________________||___________________________|
V V V
min(1000, 2000)=1000 min(3000, 4000)=3000 min(5000, 6000, 7000)=5000
And we will get the following datapoints: 1000, 3000, 5000.
You can choose to align the buckets to the start or end of the queried interval, as follows:
TS.RANGE sensor3 10 70 ALIGN start AGGREGATION min 25
Value: | (1000) (2000) (3000) (4000) (5000) (6000) (7000)
Timestamp: |-------|10|-------|20|-------|30|-------|40|-------|50|-------|60|-------|70|--->
Bucket(25ms): |__________________________||_________________________||___________________________|
V V V
min(1000, 2000, 3000)=1000 min(4000, 5000)=4000 min(6000, 7000)=6000
The result array will contain the following datapoints: 1000, 4000, and 6000.
Aggregation across timeseries
By default, results of multiple timeseries will be grouped by timeseries, but (since v1.6) you can use the GROUPBY and REDUCE options to group them by label and apply an additional aggregation.
To find minimum temperature per region, for example, we can run:
TS.MRANGE - + FILTER region=(east,west) GROUPBY region REDUCE min
3 - Configuration
Run-time configuration
RedisTimeSeries supports a few run-time configuration options that should be determined when loading the module. In time more options will be added.
Passing Configuration Options During Loading
In general, passing configuration options is done by appending arguments after the --loadmodule argument on the command line, the loadmodule configuration directive in a Redis config file, or the MODULE LOAD command. For example:
In redis.conf:
loadmodule redistimeseries.so OPT1 OPT2
From redis-cli:
127.0.0.1:6379> MODULE LOAD redistimeseries.so OPT1 OPT2
From command line:
$ redis-server --loadmodule ./redistimeseries.so OPT1 OPT2
RedisTimeSeries configuration options
COMPACTION_POLICY {policy}
Default compaction/downsampling rules for newly created keys (e.g., keys created with TS.ADD).
Rules are separated by a semicolon (;); each rule consists of several fields that are separated by a colon (:):
- aggregation function - avg, sum, min, max, count, first, last
- time bucket duration - a number and the time representation (example for 1 minute: 1M)
  - m - millisecond
  - M - minute
  - s - seconds
  - d - day
- retention time - in milliseconds
Example:
max:1M:1h - Aggregate using max over 1-minute buckets and retain the last 1 hour
Default
No compaction rules
Example
$ redis-server --loadmodule ./redistimeseries.so COMPACTION_POLICY "max:1m:1h;min:10s:5d:10d;last:5M:10ms;avg:2h:10d;avg:3d:100d"
RETENTION_POLICY
Maximum age for samples, compared to the last event time (in milliseconds), per key. This configuration will set the default retention for newly created keys that do not have an override.
Default
0
Example
$ redis-server --loadmodule ./redistimeseries.so RETENTION_POLICY 20
CHUNK_TYPE
Default chunk type for automatically created keys when COMPACTION_POLICY is configured.
Possible values: COMPRESSED, UNCOMPRESSED.
Default
COMPRESSED
Example
$ redis-server --loadmodule ./redistimeseries.so COMPACTION_POLICY max:1m:1h CHUNK_TYPE COMPRESSED
NUM_THREADS
The maximum number of per-shard threads for cross-key queries when using cluster mode (TS.MRANGE, TS.MGET, and TS.QUERYINDEX). The value must be equal to or greater than 1. Note that increasing this value may either increase or decrease performance.
Default
3
Example
$ redis-server --loadmodule ./redistimeseries.so NUM_THREADS 3
DUPLICATE_POLICY
Policy that will define handling of duplicate samples.
The following are the possible policies:
- BLOCK - an error will occur for any out of order sample
- FIRST - ignore the new value
- LAST - override with the latest value
- MIN - only override if the value is lower than the existing value
- MAX - only override if the value is higher than the existing value
- SUM - if a previous sample exists, add the new sample to it so that the updated value is equal to (previous + new); if no previous sample exists, set the updated value equal to the new value
Precedence order
Since the duplication policy can be provided at different levels, the actual precedence of the used policy will be:
- TS.ADD input
- Key level policy
- Module configuration (AKA database-wide)
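For example, the database-wide default can be overridden per key at creation time with DUPLICATE_POLICY, or per call with the ON_DUPLICATE option of TS.ADD (the key name and values below are illustrative):
TS.CREATE sensor1 DUPLICATE_POLICY MAX
TS.ADD sensor1 1626434637914 26 ON_DUPLICATE LAST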
Default configuration
The database-wide default policy is BLOCK; new and pre-existing keys will conform to this database-wide default.
Example
$ redis-server --loadmodule ./redistimeseries.so DUPLICATE_POLICY LAST
4 - Development
Developing RedisTimeSeries
Developing RedisTimeSeries involves setting up the development environment (which can be either Linux-based or macOS-based), building RedisTimeSeries, running tests and benchmarks, and debugging both the RedisTimeSeries module and its tests.
Cloning the git repository
The following command clones the RedisTimeSeries module and its submodules:
git clone --recursive https://github.com/RedisTimeSeries/RedisTimeSeries.git
Working in an isolated environment
There are several reasons to develop in an isolated environment, such as keeping your workstation clean and developing for a different Linux distribution.
The most general option for an isolated environment is a virtual machine (it's very easy to set one up using Vagrant).
Docker is an even more agile option, as it offers an almost instant solution:
ts=$(docker run -d -it -v $PWD:/build debian:bullseye bash)
docker exec -it $ts bash
Then, from within the container, cd /build and go on as usual.
In this mode, all installations remain in the scope of the Docker container.
Upon exiting the container, you can either re-invoke the container with the above docker exec or commit the state of the container to an image and re-invoke it at a later stage:
docker commit $ts ts1
docker stop $ts
ts=$(docker run -d -it -v $PWD:/build ts1 bash)
docker exec -it $ts bash
Installing prerequisites
To build and test RedisTimeSeries, one needs to install several packages, depending on the underlying OS. Currently, we support Ubuntu/Debian, CentOS, Fedora, and macOS.
If you have GNU make installed, you can execute:
cd RedisTimeSeries
make setup
Alternatively, just invoke the following:
cd RedisTimeSeries
git submodule update --init --recursive
./deps/readies/bin/getpy3
./system-setup.py
Note that system-setup.py will install various packages on your system using the native package manager and pip. This requires root permissions (i.e. sudo) on Linux.
If you prefer to avoid that, you can:
- Review system-setup.py and install packages manually,
- Use an isolated environment as explained above,
- Utilize a Python virtual environment, as Python installations are known to be sensitive when not used in isolation.
Installing Redis
As a rule of thumb, you're better off running the latest Redis version.
If your OS has a Redis package, you can install it using the OS package manager.
Otherwise, you can invoke ./deps/readies/bin/getredis.
Getting help
make help provides a quick summary of the development features.
Building from source
make will build RedisTimeSeries.
Build artifacts are placed into bin/linux-x64-release (or similar, according to your platform and build options).
Use make clean to remove built artifacts. make clean ALL=1 will remove the entire binary artifacts directory.
Running Redis with RedisTimeSeries
The following will run redis and load the RedisTimeSeries module.
make run
You can open redis-cli in another terminal to interact with it.
Running tests
The module includes a basic set of unit tests and integration tests:
- C unit tests, located in src/tests, run by make unit_tests.
- Python integration tests (enabled by RLTest), located in tests/flow, run by make flow_tests.
One can run all tests by invoking make test.
A single test can be run using the TEST parameter, e.g. make flow_test TEST=file:name.
Debugging
To build for debugging (enabling symbolic information and disabling optimization), run make DEBUG=1.
One can then use make run DEBUG=1 to invoke gdb.
In addition to the usual way to set breakpoints in gdb, it is possible to use the BB macro to set a breakpoint inside RedisTimeSeries code. It will only have an effect when running under gdb.
Similarly, when running Python tests in single-test mode, one can set a breakpoint by using the BB() function inside a test. This will invoke pudb.
The two methods can be combined: one can set a breakpoint within a flow test, and when reached, connect gdb to a redis-server process to debug the module.
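For example, assuming the redis-server process started by the flow test is the newest one on the machine, attaching could look like this (a sketch; adapt the process selection to your setup):
gdb -p $(pgrep -n redis-server)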
6.1 - Out-of-order / backfilled ingestion performance considerations
When an older timestamp is inserted into a time series, the chunk of memory corresponding to the new sample’s time frame will potentially have to be retrieved from the main memory (you can read more about these chunks here). When this chunk is a compressed chunk, it will also have to be decoded before we can insert/update to it. These are memory-intensive—and in the case of decoding, compute-intensive—operations that will influence the overall achievable ingestion rate.
Ingest performance is critical for us, which pushed us to assess and be transparent about the impact of the out-of-order backfilled ratio on our overall high-performance TSDB.
To do so, we created a Go benchmark client that enabled us to control key factors that dictate overall system performance, like the out-of-order ratio, the compression of the series, the number of concurrent clients used, and command pipelining. For the full benchmark-driver configuration details and parameters, please refer to this GitHub link.
Furthermore, all benchmark variations were run on Amazon Web Services instances, provisioned through our benchmark-testing infrastructure. Both the benchmarking client and database servers were running on separate c5.9xlarge instances. The tests were executed on a single-shard setup, with RedisTimeSeries version 1.4.
Below you can see the correlation between achievable ops/sec and out-of-order ratio for both compressed and uncompressed chunks.
Compressed chunks out-of-order/backfilled impact analysis
With compressed chunks, a single out-of-order datapoint implies fully decompressing the entire double-delta-encoded chunk, so you should expect higher overheads for out-of-order writes.
As a rule of thumb, to increase out-of-order compressed performance, reduce the chunk size as much as possible. Smaller chunks imply less computation on double-delta decompression and thus less overall impact, with the drawback of a lower compression ratio.
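As a sketch, the chunk size can be set per key at creation time with the CHUNK_SIZE option of TS.CREATE, in bytes (the value below is only illustrative and should be tuned for your workload):
TS.CREATE sensor1 CHUNK_SIZE 128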
The graphs and tables below make these key points:
If the database receives 1% of out-of-order samples with our current default chunk size (4096 bytes), the overall impact on the ingestion rate should be 10%.
At larger out-of-order percentages, like 5%, 10%, or even 25%, the overall impact should be between 35% and 75% fewer ops/sec. At this level of out-of-order percentages, you should really consider reducing the chunk size.
We've observed a maximum 95% drop in the achievable ops/sec even at 99% out-of-order ingestion. (Again, reducing the chunk size can cut the impact in half.)
Uncompressed chunks out-of-order/backfilled impact analysis
As visible in the charts and tables below, the chunk size does not affect the overall out-of-order impact on ingestion (meaning that whether the chunk size is 256 bytes or 4096 bytes, the expected impact of out-of-order ingestion is the same, as it should be).
Apart from that, we can observe the following key take-aways:
If the database receives 1% of out-of-order samples, the overall impact on the ingestion rate should be low or even unmeasurable.
At higher out-of-order percentages, like 5%, 10%, or even 25%, the overall impact should be 5% to 19% fewer ops/sec.
We've observed a maximum 45% drop in the achievable ops/sec, even at 99% out-of-order ingestion.
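If your workload is dominated by out-of-order writes, you may therefore prefer uncompressed chunks; for reference, a series can be created uncompressed with the UNCOMPRESSED flag of TS.CREATE (the key name below is illustrative):
TS.CREATE sensor_backfill UNCOMPRESSED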