This is the multi-page printable view of this section. Click here to print.

Return to the regular view of this page.

RedisGraph

A Graph database built on Redis

Discord Github

RedisGraph is a graph database built on Redis. This graph database uses GraphBlas under the hood for its sparse adjacency matrix graph representation.

Primary features

  • Based on the property graph model
  • Nodes can have any number of labels
  • Relationships have a relationship type
  • Graphs represented as sparse adjacency matrices
  • Cypher as the query language
  • Cypher queries translate into linear algebra expressions

To see RedisGraph in action, explore our demos.

Docker

To quickly try out RedisGraph, launch an instance using docker:

docker run -p 6379:6379 -it --rm redislabs/redisgraph

Give it a try

After you load RedisGraph, you can interact with it using redis-cli.

Here we'll quickly create a small graph representing a subset of motorcycle riders and teams taking part in the MotoGP championship. Once created, we'll start querying our data.

With redis-cli

$ redis-cli
127.0.0.1:6379> GRAPH.QUERY MotoGP "CREATE (:Rider {name:'Valentino Rossi'})-[:rides]->(:Team {name:'Yamaha'}), (:Rider {name:'Dani Pedrosa'})-[:rides]->(:Team {name:'Honda'}), (:Rider {name:'Andrea Dovizioso'})-[:rides]->(:Team {name:'Ducati'})"
1) 1) "Labels added: 2"
   2) "Nodes created: 6"
   3) "Properties set: 6"
   4) "Relationships created: 3"
   5) "Cached execution: 0"
   6) "Query internal execution time: 0.399000 milliseconds"

Now that our MotoGP graph is created, we can start asking questions. For example: Who's riding for team Yamaha?

127.0.0.1:6379> GRAPH.QUERY MotoGP "MATCH (r:Rider)-[:rides]->(t:Team) WHERE t.name = 'Yamaha' RETURN r.name, t.name"
1) 1) "r.name"
   2) "t.name"
2) 1) 1) "Valentino Rossi"
      2) "Yamaha"
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 0.625399 milliseconds"

How many riders represent team Ducati?

127.0.0.1:6379> GRAPH.QUERY MotoGP "MATCH (r:Rider)-[:rides]->(t:Team {name:'Ducati'}) RETURN count(r)"
1) 1) "count(r)"
2) 1) 1) (integer) 1
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 0.624435 milliseconds"

Building

Requirements:

  • The RedisGraph repository: git clone --recurse-submodules -j8 https://github.com/RedisGraph/RedisGraph.git

  • On Ubuntu Linux, run: apt-get install build-essential cmake m4 automake peg libtool autoconf

  • On OS X, verify that homebrew is installed and run: brew install cmake m4 automake peg libtool autoconf.

    • The version of Clang that ships with the OS X toolchain does not support OpenMP, which is a requirement for RedisGraph. One way to resolve this is to run brew install gcc g++ and follow the on-screen instructions to update the symbolic links. Note that this is a system-wide change - setting the environment variables for CC and CXX will work if that is not an option.

To build, run make in the project's directory.

Congratulations! You can find the compiled binary at: src/redisgraph.so

Installing RedisGraph

RedisGraph is part of Redis Stack. See the Redis Stack download page for installaton options.

Using RedisGraph

Before using RedisGraph, you should familiarize yourself with its commands and syntax as detailed in the command reference.

You can call RedisGraph's commands from any Redis client.

With redis-cli

$ redis-cli
127.0.0.1:6379> GRAPH.QUERY social "CREATE (:person {name: 'roi', age: 33, gender: 'male', status: 'married'})"

With any other client

You can interact with RedisGraph using your client's ability to send raw Redis commands. The exact method for doing that depends on your client of choice.

Python example

This code snippet shows how to use RedisGraph with raw Redis commands from Python using redis-py:

import redis

r = redis.Redis()
reply = r.graph("social").query("MATCH (r:Rider)-[:rides]->(t:Team {name:'Ducati'}) RETURN count(r)")

Client libraries

Language-specific clients have been written by the community and the RedisGraph team for 6 languages.

The full list and links can be found on the Clients page.

Data import

The RedisGraph team maintains the redisgraph-bulk-loader for importing new graphs from CSV files.

The data format used by this tool is described in the GRAPH.BULK implementation details.

Mailing List / Forum

Got questions? Feel free to ask at the RedisGraph forum.

License

Redis Source Available License Agreement - see LICENSE

1 - Commands

Commands Overview

Overview

RedisGraph Features

RedisGraph exposes graph database functionality within Redis using the openCypher query language. Its basic commands accept openCypher queries, while additional commands are exposed for configuration or metadata retrieval.

RedisGraph API

Command details can be retrieved by filtering for the module or for a specific command, e.g. [GRAPH.QUERY](/commands/?name=graph.query). The details include the syntax for the commands, where:

  • Command and subcommand names are in uppercase, for example GRAPH.CONFIG or SET
  • Optional arguments are enclosed in square brackets, for example [timeout]
  • Additional optional arguments are indicated by an ellipsis: ...

Most commands require a graph key's name as their first argument.

2 - RedisGraph Client Libraries

The full functionality of RedisGraph is available through redis-cli and the Redis API. RedisInsight is a visual tool that provides capabilities to design, develop and optimize into a single easy-to-use environment, and has built-in support for RedisGraph. In addition there are severeal client libraries to improve abstractions and allow for a more natural experience in a project's native language. Additionally, these clients take advantage of some RedisGraph features that may reduce network throughput in some circumstances.

Currently available Libraries

ProjectLanguageLicenseAuthorStars
JedisJavaMITRedisJedis-stars
redisgraph-pyPythonBSDRedisredisgraph-py-stars
JRedisGraphJavaBSDRedisJRedisGraph-stars
redisgraph-rbRubyBSDRedisredisgraph-rb-stars
redisgraph-goGoBSDRedisredisgraph-go-stars
rueidisGoApache 2.0Rueianrueidis-stars
node-redisJavaScriptMITRedisnode-redis-stars
ioredisgraphJavaScriptISCJonahioredisgraph-stars
@hydre/rgraphJavaScriptMITSceatrgraph-stars
redis-modules-sdkTypeScriptBSD-3-ClauseDani Tseitlinredis-modules-sdk-stars
php-redis-graphPHPMITKJDevphp-redis-graph-stars
redislabs-redisgraph-phpPHPMITmkorkmazredislabs-redisgraph-php-stars
redisgraph_phpPHPMITjpbourbonredisgraph_php-stars
redisgraph-exElixirMITcrflynnredisgraph-ex-stars
redisgraph-rsRUSTMITmalte-vredisgraph-rs-stars
redis_graphRUSTBSDtomproredis_graph-stars
NRedisGraphC#BSDtombatronNRedisGraph-stars
RedisGraphDotNet.ClientC#BSDSgawrysRedisGraphDotNet.Client-stars
RedisGraph.jlJuliaMITxyxelRedisGraph.jl-stars

Implementing a client

Information on some of the tasks involved in writing a RedisGraph client can be found in the Client Specification.

3 - Run-time Configuration

RedisGraph supports a few run-time configuration options that can be defined when loading the module. In the future more options will be added.

Passing configuration options on module load

Passing configuration options is done by appending arguments after the --loadmodule argument when starting a server from the command line or after the loadmodule directive in a Redis config file. For example:

In redis.conf:

loadmodule ./redisgraph.so OPT1 VAL1

From the command line:

$ redis-server --loadmodule ./redisgraph.so OPT1 VAL1

Passing configuration options at run-time

RedisGraph exposes the GRAPH.CONFIG endpoint to allowing for the setting and retrieval of configurations at run-time.

To set a config, simply run:

GRAPH.CONFIG SET OPT1 VAL1

Similarly, the current configurations can be retrieved using the syntax:

GRAPH.CONFIG GET OPT1
GRAPH.CONFIG GET *

RedisGraph configuration options

ConfigurationLoad-timeRun-time
THREAD_COUNT:white_check_mark::white_large_square:
CACHE_SIZE:white_check_mark::white_large_square:
OMP_THREAD_COUNT:white_check_mark::white_large_square:
NODE_CREATION_BUFFER:white_check_mark::white_large_square:
MAX_QUEUED_QUERIES:white_check_mark::white_check_mark:
TIMEOUT:white_check_mark::white_check_mark:
RESULTSET_SIZE:white_check_mark::white_check_mark:
QUERY_MEM_CAPACITY:white_check_mark::white_check_mark:

THREAD_COUNT

The number of threads in RedisGraph's thread pool. This is equivalent to the maximum number of queries that can be processed concurrently.

Default

THREAD_COUNT defaults to the system's hardware threads (logical cores).

Example

$ redis-server --loadmodule ./redisgraph.so THREAD_COUNT 4

CACHE_SIZE

The max number of queries for RedisGraph to cache. When a new query is encountered and the cache is full, meaning the cache has reached the size of CACHE_SIZE, it will evict the least recently used (LRU) entry.

Default

CACHE_SIZE default value is 25.

Example

$ redis-server --loadmodule ./redisgraph.so CACHE_SIZE 10

OMP_THREAD_COUNT

The maximum number of threads that OpenMP may use for computation per query. These threads are used for parallelizing GraphBLAS computations, so may be considered to control concurrency within the execution of individual queries.

Default

OMP_THREAD_COUNT is defined by GraphBLAS by default.

Example

$ redis-server --loadmodule ./redisgraph.so OMP_THREAD_COUNT 1

NODE_CREATION_BUFFER

The node creation buffer is the number of new nodes that can be created without resizing matrices. For example, when set to 16,384, the matrices will have extra space for 16,384 nodes upon creation. Whenever the extra space is depleted, the matrices' size will increase by 16,384.

Reducing this value will reduce memory consumption, but cause performance degradation due to the increased frequency of matrix resizes.

Conversely, increasing it might improve performance for write-heavy workloads but will increase memory consumption.

If the passed argument was not a power of 2, it will be rounded to the next-greatest power of 2 to improve memory alignment.


MAX_QUEUED_QUERIES

Setting the maximum number of queued queries allows the server to reject incoming queries with the error message Max pending queries exceeded. This reduces the memory overhead of pending queries on an overloaded server and avoids congestion when the server processes its backlog of queries.

Default

MAX_QUEUED_QUERIES is effectively unlimited by default (config value of UINT64_MAX).

Example

$ redis-server --loadmodule ./redisgraph.so MAX_QUEUED_QUERIES 500

$ redis-cli GRAPH.CONFIG SET MAX_QUEUED_QUERIES 500

Default

NODE_CREATION_BUFFER is 16,384 by default.

Minimum

The minimum value for NODE_CREATION_BUFFER is 128. Values lower than this will be accepted as arguments, but will internally be converted to 128.

Example

$ redis-server --loadmodule ./redisgraph.so NODE_CREATION_BUFFER 200

TIMEOUT

Timeout is a flag that specifies the maximum runtime for read queries in milliseconds. This configuration will not be respected by write queries, to avoid leaving the graph in an inconsistent state.

Default

TIMEOUT is off by default (config value of 0).

Example

$ redis-server --loadmodule ./redisgraph.so TIMEOUT 1000


RESULTSET_SIZE

Result set size is a limit on the number of records that should be returned by any query. This can be a valuable safeguard against incurring a heavy IO load while running queries with unknown results.

Default

RESULTSET_SIZE is unlimited by default (negative config value).

Example

127.0.0.1:6379> GRAPH.CONFIG SET RESULTSET_SIZE 3
OK
127.0.0.1:6379> GRAPH.QUERY G "UNWIND range(1, 5) AS x RETURN x"
1) 1) "x"
2) 1) 1) (integer) 1
   2) 1) (integer) 2
   3) 1) (integer) 3
3) 1) "Cached execution: 0"
   2) "Query internal execution time: 0.445790 milliseconds"

QUERY_MEM_CAPACITY

Setting the memory capacity of a query allows the server to kill queries that are consuming too much memory and return with the error message Query's mem consumption exceeded capacity. This helps to avoid scenarios when the server becomes unresponsive due to an unbounded query exhausting system resources.

The configuration argument is the maximum number of bytes that can be allocated by any single query.

Default

QUERY_MEM_CAPACITY is unlimited by default; this default can be restored by setting QUERY_MEM_CAPACITY to zero or a negative value.

Example

$ redis-server --loadmodule ./redisgraph.so QUERY_MEM_CAPACITY 1048576 // 1 megabyte limit

$ redis-cli GRAPH.CONFIG SET QUERY_MEM_CAPACITY 1048576

Query Configurations

The query timeout configuration may also be set per query in the form of additional arguments after the query string. This configuration is unset by default unless using a language-specific client, which may establish its own defaults.

Query Timeout

The query flag timeout allows the user to specify a timeout as described in TIMEOUT for a single query.

Example

Retrieve all paths in a graph with a timeout of 1000 milliseconds.

GRAPH.QUERY wikipedia "MATCH p=()-[*]->() RETURN p" timeout 1000

4 - RedisGraph: A High Performance In-Memory Graph Database

Abstract

Graph-based data is everywhere nowadays. Facebook, Google, Twitter and Pinterest are just a few who've realize the power behind relationship data and are utilizing it to its fullest. As a direct result, we see a rise both in interest for and the variety of graph data solutions.

With the introduction of Redis Modules we've recognized the great potential of introducing a graph data structure to the Redis arsenal, and developed RedisGraph. Bringing new graph capabilities to Redis through a native C implementation with an emphasis on performance, RedisGraph is now available as an open source project.

In this documentation, we'll discuss the internal design and features of RedisGraph and demonstrate its current capabilities.

RedisGraph At-a-Glance

RedisGraph is a graph database developed from scratch on top of Redis, using the new Redis Modules API to extend Redis with new commands and capabilities. Its main features include:

  • Simple, fast indexing and querying
  • Data stored in RAM using memory-efficient custom data structures
  • On-disk persistence
  • Tabular result sets
  • Uses the popular graph query language openCypher

A Little Taste: RedisGraph in Action

Let’s look at some of the key concepts of RedisGraph using this example over the redis-cli tool:

Constructing a graph

It is common to represent entities as nodes within a graph. In this example, we'll create a small graph with both actors and movies as its entities, and an "act" relation that will connect actors to the movies they acted in. We'll use the graph.QUERY command to issue a CREATE query, which will introduce new entities and relations to our graph.

graph.QUERY <graph_id> 'CREATE (:<label> {<attribute_name>:<attribute_value>,...})'
graph.QUERY <graph_id> 'CREATE (<source_node_alias>)-[<relation> {<attribute_name>:<attribute_value>,...}]->(<dest_node_alias>)'

We'll construct our graph in one command:

graph.QUERY IMDB 'CREATE (aldis:actor {name: "Aldis Hodge", birth_year: 1986}),
                         (oshea:actor {name: "OShea Jackson", birth_year: 1991}),
                         (corey:actor {name: "Corey Hawkins", birth_year: 1988}),
                         (neil:actor {name: "Neil Brown", birth_year: 1980}),
                         (compton:movie {title: "Straight Outta Compton", genre: "Biography", votes: 127258, rating: 7.9, year: 2015}),
                         (neveregoback:movie {title: "Never Go Back", genre: "Action", votes: 15821, rating: 6.4, year: 2016}),
                         (aldis)-[:act]->(neveregoback),
                         (aldis)-[:act]->(compton),
                         (oshea)-[:act]->(compton),
                         (corey)-[:act]->(compton),
                         (neil)-[:act]->(compton)'

Querying the graph

RedisGraph exposes a subset of the openCypher graph language. Although only some language capabilities are supported, there's enough functionality to extract valuable insights from your graphs. To execute a query, we'll use the GRAPH.QUERY command:

GRAPH.QUERY <graph_id> <query>

Let's execute a number of queries against our movies graph.

Find the sum, max, min and avg age of the 'Straight Outta Compton' cast:

GRAPH.QUERY IMDB 'MATCH (a:actor)-[:act]->(m:movie {title:"Straight Outta Compton"})
RETURN m.title, SUM(2020-a.birth_year), MAX(2020-a.birth_year), MIN(2020-a.birth_year), AVG(2020-a.birth_year)'

RedisGraph will reply with:

1) 1) "m.title"
   2) "SUM(2020-a.birth_year)"
   3) "MAX(2020-a.birth_year)"
   4) "MIN(2020-a.birth_year)"
   5) "AVG(2020-a.birth_year)"
2) 1) 1) "Straight Outta Compton"
      2) "135"
      3) (integer) 40
      4) (integer) 29
      5) "33.75"
  • The first row is our result-set header which names each column according to the return clause.
  • The second row contains our query result.

Let's try another query. This time, we'll find out how many movies each actor played in.

GRAPH.QUERY IMDB "MATCH (actor)-[:act]->(movie) RETURN actor.name, COUNT(movie.title) AS movies_count ORDER BY
movies_count DESC"

1) "actor.name, movies_count"
2) "Aldis Hodge,2.000000"
3) "O'Shea Jackson,1.000000"
4) "Corey Hawkins,1.000000"
5) "Neil Brown,1.000000"

The Theory: Ideas behind RedisGraph

Representation

RedisGraph uses sparse adjacency matrices to represent graphs. As directed relationship connecting source node S to destination node T is recorded within an adjacency matrix M, by setting M's S,T entry to 1 (M[S,T]=1). As a rule of thumb, matrix rows represent source nodes while matrix columns represent destination nodes.

Every graph stored within RedisGraph has at least one matrix, referred to as THE adjacency matrix (relation-type agnostic). In addition, every relation with a type has its own dedicated matrix. Consider a graph with two relationships types:

  1. visits
  2. friend

The underlying graph data structure maintains three matrices:

  1. THE adjacency matrix - marking every connection within the graph
  2. visit matrix - marking visit connections
  3. friend matrix - marking friend connections

A 'visit' relationship E that connects node A to node B, sets THE adjacency matrix at position [A,B] to 1. Also, the visit matrix V sets position V[A,B] to 1.

To accommodate typed nodes, one additional matrix is allocated per label, and a label matrix is symmetric with ones along the main diagonal. Assume that node N was labeled as a Person, then the Person matrix P sets position P[N,N] to 1.

This design lets RedisGraph modify its graph easily, including:

  • Adding new nodes simply extends matrices, adding additional rows and columns
  • Adding new relationships by setting the relevant entries at the relevant matrices
  • Removing relationships clears relevant entries
  • Deleting nodes by deleting matrix row/column.

One of the main reasons we chose to represent our graphs as sparse matrices is graph traversal.

Traversal

Graph traversal is done by matrix multiplication. For example, if we wanted to find friends of friends for every node in the graph, this traversal can be expressed by FOF = F^2. F stands for the friendship relation matrix, and the result matrix FOF summarizes the traversal. The rows of the FOF represent source nodes and its columns represent friends who are two hops away: if FOF[i,j] = 1 then j is a friend of a friend of i.

Algebraic expression

When a search pattern such as (N0)-[A]->(N1)-[B]->(N2)<-[A]-(N3) is used as part of a query, we translate it into a set of matrix multiplications. For the given example, one possible expression would be: A * B * Transpose(A).

Note that matrix multiplication is an associative and distributive operation. This gives us the freedom to choose which terms we want to multiply first (preferring terms that will produce highly sparse intermediate matrices). It also enables concurrency when evaluating an expression, e.g. we could compute AZ and BY in parallel.

GraphBLAS

To perform all of these operations for sparse matrices, RedisGraph uses GraphBLAS - a standard API similar to BLAS. The current implementation uses the CSC sparse matrix format (compressed sparse columns), although the underlying format is subject to change.

Query language: openCypher

There are a number of graph query languages, so we didn't want to reinvent the wheel. We decided to implement a subset of one of the most popular graph query languages out there: openCypher While the openCypher project provides a parser for the language, we decided to create our own parser. We used Lex as a tokenizer and Lemon to generate a C target parser.

As mentioned earlier, only a subset of the language is currently supported, but we plan to continue adding new capabilities and extend RedisGraph's openCypher capabilities.

Runtime: query execution

Let's review the steps RedisGraph takes when executing a query.

Consider this query that finds all actors who played alongside Aldis Hodge and are over 30 years old:

MATCH (aldis::actor {name:"Aldis Hodge"})-[:act]->(m:movie)<-[:act]-(a:actor) WHERE a.age > 30 RETURN m.title, a.name

RedisGraph will:

  • Parse the query, and build an abstract syntax tree (AST)
  • Compose traversal algebraic expressions
  • Build filter trees
  • Construct an optimized query execution plan composed of:
    • Filtered traverse
    • Conditional traverse
    • Filter
    • Project
  • Execute the plan
  • Populate a result set with matching entity attributes

Filter tree

A query can filter out entities by creating predicates. In our example, we filter actors younger then 30.

It's possible to combine predicates using OR and AND keywords to form granular conditions.

During runtime, the WHERE clause is used to construct a filter tree, and each node within the tree is either a condition (e.g. A > B) or an operation (AND/OR). When finding candidate entities, they are passed through the tree and evaluated.

Benchmarks

Depending on the underlying hardware results may vary. However, inserting a new relationship is done in O(1). RedisGraph is able to create over 1 million nodes under half a second and form 500K relations within 0.3 of a second.

License

RedisGraph is published under Redis Source Available License Agreement.

Conclusion

Although RedisGraph is still a young project, it can be an alternative to other graph databases. With its subset of operations, you can use it to analyze and explore graph data. Being a Redis Module, this project is accessible from every Redis client without adjustments. It's our intention to keep on improving and extending RedisGraph with the help of the open source community.

4.1 - Technical specification for writing RedisGraph client libraries

By design, there is not a full standard for RedisGraph clients to adhere to. Areas such as pretty-print formatting, query validation, and transactional and multithreaded capabilities have no canonically correct behavior, and the implementer is free to choose the approach and complexity that suits them best. RedisGraph does, however, provide a compact result set format for clients that minimizes the amount of redundant data transmitted from the server. Implementers are encouraged to take advantage of this format, as it provides better performance and removes ambiguity from decoding certain data. This approach requires clients to be capable of issuing procedure calls to the server and performing a small amount of client-side caching.

Retrieving the compact result set

Appending the flag --compact to any query issued to the GRAPH.QUERY endpoint will cause the server to issue results in the compact format. Because we don't store connection-specific configurations, all queries should be issued with this flag.

GRAPH.QUERY demo "MATCH (a) RETURN a" --compact

Formatting differences in the compact result set

The result set has the same overall structure as described in the Result Set documentation.

Certain values are emitted as integer IDs rather than strings:

  1. Node labels
  2. Relationship types
  3. Property keys

Instructions on how to efficiently convert these IDs in the Procedure Calls section below.

Additionally, two enums are exposed:

ColumnType, which as of RedisGraph v2.1.0 will always be COLUMN_SCALAR. This enum is retained for backwards compatibility, and may be ignored by the client unless RedisGraph versions older than v2.1.0 must be supported.

ValueType indicates the data type (such as Node, integer, or string) of each returned value. Each value is emitted as a 2-array, with this enum in the first position and the actual value in the second. Each property on a graph entity also has a scalar as its value, so this construction is nested in each value of the properties array when a column contains a node or relationship.

Decoding the result set

Given the graph created by the query:

GRAPH.QUERY demo "CREATE (:plant {name: 'Tree'})-[:GROWS {season: 'Autumn'}]->(:fruit {name: 'Apple'})"

Let's formulate a query that returns 3 columns: nodes, relationships, and scalars, in that order.

Verbose (default):

127.0.0.1:6379> GRAPH.QUERY demo "MATCH (a)-[e]->(b) RETURN a, e, b.name"
1) 1) "a"
   2) "e"
   3) "b.name"
2) 1) 1) 1) 1) "id"
            2) (integer) 0
         2) 1) "labels"
            2) 1) "plant"
         3) 1) "properties"
            2) 1) 1) "name"
                  2) "Tree"
      2) 1) 1) "id"
            2) (integer) 0
         2) 1) "type"
            2) "GROWS"
         3) 1) "src_node"
            2) (integer) 0
         4) 1) "dest_node"
            2) (integer) 1
         5) 1) "properties"
            2) 1) 1) "season"
                  2) "Autumn"
      3) "Apple"
3) 1) "Query internal execution time: 1.326905 milliseconds"

Compact:

127.0.0.1:6379> GRAPH.QUERY demo "MATCH (a)-[e]->(b) RETURN a, e, b.name" --compact
1) 1) 1) (integer) 1
      2) "a"
   2) 1) (integer) 1
      2) "e"
   3) 1) (integer) 1
      2) "b.name"
2) 1) 1) 1) (integer) 8
         2) 1) (integer) 0
            2) 1) (integer) 0
            3) 1) 1) (integer) 0
                  2) (integer) 2
                  3) "Tree"
      2) 1) (integer) 7
         2) 1) (integer) 0
            2) (integer) 0
            3) (integer) 0
            4) (integer) 1
            5) 1) 1) (integer) 1
                  2) (integer) 2
                  3) "Autumn"
      3) 1) (integer) 2
         2) "Apple"
3) 1) "Query internal execution time: 1.085412 milliseconds"

These results are being parsed by redis-cli, which adds such visual cues as array indexing and indentation, as well as type hints like (integer). The actual data transmitted is formatted using the RESP protocol. All of the current RedisGraph clients rely upon a stable Redis client in the same language (such as redis-rb for Ruby) which handles RESP decoding.

Top-level array results

The result set above had 3 members in its top-level array:

1) Header row
2) Result rows
3) Query statistics

All queries that have a RETURN clause will have these 3 members. Queries that don't return results have only one member in the outermost array, the query statistics:

127.0.0.1:6379> GRAPH.QUERY demo "CREATE (:plant {name: 'Tree'})-[:GROWS {season: 'Autumn'}]->(:fruit {name: 'Apple'})" --compact
1) 1) "Labels added: 2"
   2) "Nodes created: 2"
   3) "Properties set: 3"
   4) "Relationships created: 1"
   5) "Query internal execution time: 1.972868 milliseconds"

Rather than introspecting on the query being emitted, the client implementation can check whether this array contains 1 or 3 elements to choose how to format data.

Reading the header row

Our sample query MATCH (a)-[e]->(b) RETURN a, e, b.name generated the header:

1) 1) (integer) 1
   2) "a"
3) 1) (integer) 1
   3) "e"
4) 1) (integer) 1
   3) "b.name"

The 4 array members correspond, in order, to the 3 entities described in the RETURN clause.

Each is emitted as a 2-array:

1) ColumnType (enum)
2) column name (string)

The first element is the ColumnType enum, which as of RedisGraph v2.1.0 will always be COLUMN_SCALAR. This element is retained for backwards compatibility, and may be ignored by the client unless RedisGraph versions older than v2.1.0 must be supported.

Reading result rows

The entity representations in this section will closely resemble those found in Result Set Graph Entities.

Our query produced one row of results with 3 columns (as described by the header):

1) 1) 1) (integer) 8
      2) 1) (integer) 0
         2) 1) (integer) 0
         3) 1) 1) (integer) 0
               2) (integer) 2
               3) "Tree"
   2) 1) (integer) 7
      2) 1) (integer) 0
         2) (integer) 0
         3) (integer) 0
         4) (integer) 1
         5) 1) 1) (integer) 1
               2) (integer) 2
               3) "Autumn"
   3) 1) (integer) 2
      2) "Apple"

Each element is emitted as a 2-array - [ValueType, value].

It is the client's responsibility to store the ValueType enum. RedisGraph guarantees that this enum may be extended in the future, but the existing values will not be altered.

The ValueType for the first entry is VALUE_NODE. The node representation contains 3 top-level elements:

  1. The node's internal ID.
  2. An array of all label IDs associated with the node (currently, each node can have either 0 or 1 labels, though this restriction may be lifted in the future).
  3. An array of all properties the node contains. Properties are represented as 3-arrays - [property key ID, ValueType, value].
[	
    Node ID (integer),
    [label ID (integer) X label count]
    [[property key ID (integer), ValueType (enum), value (scalar)] X property count]
]

The ValueType for the first entry is VALUE_EDGE. The edge representation differs from the node representation in two respects:

  • Each relation has exactly one type, rather than the 0+ labels a node may have.
  • A relation is emitted with the IDs of its source and destination nodes.

As such, the complete representation is as follows:

  1. The relation's internal ID.
  2. The relationship type ID.
  3. The source node's internal ID.
  4. The destination node's internal ID.
  5. The key-value pairs of all properties the relation possesses.
[	
    Relation ID (integer),
    type ID (integer),
    source node ID (integer),
    destination node ID (integer),
    [[property key ID (integer), ValueType (enum), value (scalar)] X property count]
]

The ValueType for the third entry is VALUE_STRING, and the other element in the array is the actual value, "Apple".

Reading statistics

The final top-level member of the GRAPH.QUERY reply is the execution statistics. This element is identical between the compact and standard response formats.

The statistics always include query execution time, while any combination of the other elements may be included depending on how the graph was modified.

  1. "Labels added: (integer)"
  2. "Nodes created: (integer)"
  3. "Properties set: (integer)"
  4. "Nodes deleted: (integer)"
  5. "Relationships deleted: (integer)"
  6. "Relationships created: (integer)"
  7. "Query internal execution time: (float) milliseconds"

Procedure Calls

Property keys, node labels, and relationship types are all returned as IDs rather than strings in the compact format. For each of these 3 string-ID mappings, IDs start at 0 and increase monotonically.

As such, the client should store an string array for each of these 3 mappings, and print the appropriate string for the user by checking an array at position ID. If an ID greater than the array length is encountered, the local array should be updated with a procedure call.

These calls are described generally in the Procedures documentation.

To retrieve each full mapping, the appropriate calls are:

db.labels()

127.0.0.1:6379> GRAPH.QUERY demo "CALL db.labels()"
1) 1) "label"
2) 1) 1) "plant"
   2) 1) "fruit"
3) 1) "Query internal execution time: 0.321513 milliseconds"

db.relationshipTypes()

127.0.0.1:6379> GRAPH.QUERY demo "CALL db.relationshipTypes()"
1) 1) "relationshipType"
2) 1) 1) "GROWS"
3) 1) "Query internal execution time: 0.429677 milliseconds"

db.propertyKeys()

127.0.0.1:6379> GRAPH.QUERY demo "CALL db.propertyKeys()"
1) 1) "propertyKey"
2) 1) 1) "name"
   2) 1) "season"
3) 1) "Query internal execution time: 0.318940 milliseconds"

Because the cached values never become outdated, it is possible to just retrieve new values with slightly more complex constructions:

CALL db.propertyKeys() YIELD propertyKey RETURN propertyKey SKIP [cached_array_length]

Though the property calls are quite efficient regardless of whether this optimization is used.

As an example, the Python client checks its local array of labels to resolve every label ID as seen here.

In the case of an IndexError, it issues a procedure call to fully refresh its label cache as seen here.

Reference clients

All the logic described in this document has been implemented in most of the clients listed in Client Libraries. Among these, node-redis, redis-py and jedis are currently the most sophisticated.

4.2 - RedisGraph Result Set Structure

This document describes the format RedisGraph uses to print data when accessed through the redis-cli utility. The language-specific clients retrieve data in a more succinct format, and provide their own functionality for printing result sets.

Top-level members

Queries that return data emit an array with 3 top-level members:

  1. The header describing the returned records. This is an array which corresponds precisely in order and naming to the RETURN clause of the query.
  2. A nested array containing the actual data returned by the query.
  3. An array of metadata related to the query execution. This includes query runtime as well as data changes, such as the number of entities created, deleted, or modified by the query.

Result Set data types

A column in the result set can be populated with graph entities (nodes or relations) or scalar values.

Scalars

RedisGraph replies are formatted using the RESP protocol. The current RESP iteration provides fewer data types than RedisGraph supports internally, so displayed results are mapped as follows:

RedisGraph typeDisplay format
IntegerInteger
NULLNULL (nil)
StringString
BooleanString ("true"/"false")
DoubleString (15-digit precision)

Graph Entities

When full entities are specified in a RETURN clause, all data relevant to each entity value is emitted within a nested array. Key-value pairs in the data, such as the combination of a property name and its corresponding value, are represented as 2-arrays.

Internal IDs are returned with nodes and relations, but these IDs are not immutable. After entities have been deleted, higher IDs may be migrated to the vacated lower IDs.

Nodes

The node representation contains 3 top-level elements:

  1. The node's internal ID.
  2. Any labels associated with the node.
  3. The key-value pairs of all properties the node possesses.
[	
    [“id”, ID (integer)]
    [“labels”,
        [label (string) X label count]
    ]
    [properties”, [
        [prop_key (string), prop_val (scalar)] X property count]
    ]
]

Relations

The relation representation contains 5 top-level elements:

  1. The relation's internal ID.
  2. The type associated with the relation.
  3. The source node's internal ID.
  4. The destination node's internal ID.
  5. The key-value pairs of all properties the relation possesses.
[	
    [“id”, ID (integer)]
    [“type”, type (string)]
    [“src_node”, source node ID (integer)]
    [“dest_node”, destination node ID (integer)]
    [properties”, [
        [prop_key (string), prop_val (scalar)] X property count]
    ]
]

Collections

Arrays

When array values are specified in a RETURN clause, the representation in the response is the array string representation. This is done solely for better readability of the response. The string representation of an array which contains graph entities, will print only their ID. A node string representation is round braces around its ID, and edge string representation is brackets around the edge ID.

Paths

Returned path value is the string representation of an array with the path's nodes and edges, interleaved.

Example

Given the graph created by the command:

CREATE (:person {name:'Pam', age:27})-[:works {since: 2010}]->(:employer {name:'Dunder Mifflin'})""

We can run a query that returns a node, a relation, and a scalar value.

"MATCH (n1)-[r]->(n2) RETURN n1, r, n2.name"
1) 1) "n1"
   2) "r"
   3) "n2.name"
2) 1) 1) 1) 1) "id"
            2) (integer) 0
         2) 1) "labels"
            2) 1) "person"
         3) 1) "properties"
            2) 1) 1) "age"
                  2) (integer) 27
               2) 1) "name"
                  2) "Pam"
      2) 1) 1) "id"
            2) (integer) 0
         2) 1) "type"
            2) "works"
         3) 1) "src_node"
            2) (integer) 0
         4) 1) "dest_node"
            2) (integer) 1
         5) 1) "properties"
            2) 1) 1) "since"
                  2) (integer) 2010
      3) "Dunder Mifflin"
3) 1) "Query internal execution time: 1.858986 milliseconds"

4.3 - Implementation details for the GRAPH.BULK endpoint

The RedisGraph bulk loader uses the GRAPH.BULK endpoint to build a new graph from 1 or more Redis queries. The bulk of these queries is binary data that is unpacked to create nodes, edges, and their properties. This endpoint could be used to write bespoke import tools for other data formats using the implementation details provided here.

Caveats

The main complicating factor in writing bulk importers is that Redis has a maximum string length of 512 megabytes and a default maximum query size of 1 gigabyte. As such, large imports must be written incrementally.

The RedisGraph team will do their best to ensure that future updates to this logic do not break current implementations, but cannot guarantee it.

Query Format

GRAPH.BULK [graph name] ["BEGIN"] [node count] [edge count] ([binary blob] * N)

Arguments

graph name

The name of the graph to be inserted.

BEGIN

The endpoint cannot be used to update existing graphs, only to create new ones. For this reason, the first query in a sequence of BULK commands should pass the string literal "BEGIN".

node count

Number of nodes being inserted in this query.

edge count

Number of edges being inserted in this query.

binary blob

A binary string of up to 512 megabytes that partially or completely describes a single label or relationship type.

Any number of these blobs may be provided in a query provided that Redis's 1-gigabyte query limit is not exceeded.

Module behavior

The endpoint will parse binary blobs as nodes until the number of created nodes matches the node count, then will parse subsequent blobs as edges. The import tool is expected to correctly provide these counts.

If the BEGIN token is found, the module will verify that the graph key is unused, and will emit an error if it is. Otherwise, the partially-constructed graph will be retrieved in order to resume building.

Binary Blob format

Node format

Nodes in node blobs do not need to specify their IDs. The ID of each node is an 8-byte unsigned integer corresponding to the node count at the time of its creation. (The first-created node has the ID of 0, the second has 1, and so forth.)

The blob consists of:

  1. header specification

  2. 1 or more property specifications

Edge format

The import tool is responsible for tracking the IDs of nodes used as edge endpoints.

The blob consists of:

  1. header specification

  2. 1 or more:

    1. 8-byte unsigned integer representing source node ID
    2. 8-byte unsigned integer representing destination node ID
    3. property specification

Header specification

  1. name - A null-terminated string representing the name of the label or relationship type.

  2. property count - A 4-byte unsigned integer representing the number of properties each entry in this blob possesses.

  3. property names - an ordered sequence of property count null-terminated strings, each representing the name for the property at that position.

Property specification

  1. property type - A 1-byte integer corresponding to the TYPE enum:
BI_NULL = 0,
BI_BOOL = 1,
BI_DOUBLE = 2,
BI_STRING = 3,
BI_LONG = 4,
BI_ARRAY = 5,
  1. property:
    • 1-byte true/false if type is boolean
    • 8-byte double if type is double
    • 8-byte integer if type is integer
    • Null-terminated C string if type is string
    • 8-byte array length followed by N values of this same type-property pair if type is array

Redis Reply

Redis will reply with a string of the format:

[N] nodes created, [M] edges created

5 - RedisGraph Data Types

RedisGraph supports a number of distinct data types, some of which can be persisted as property values and some of which are ephemeral.

Graph types

All graph types are either structural elements of the graph or projections thereof. None can be stored as a property value.

Nodes

Nodes are persistent graph elements that can be connected to each other via relationships.

They can have any number of labels that describe their general type. For example, a node representing London may be created with the Place and City labels and retrieved by queries using either or both of them.

Nodes have sets of properties to describe all of their salient characteristics. For example, our London node may have the property set: {name: 'London', capital: True, elevation: 11}.

When querying nodes, multiple labels can be specified. Only nodes that hold all specified labels will be matched:

$ redis-cli GRAPH.QUERY G "MATCH (n:Place:Continent) RETURN n"

Relationships

Relationships are persistent graph elements that connect one node to another.

They must have exactly one type that describes what they represent. For example, a RESIDENT_OF relationship may be used to connect a Person node to a City node.

Relationships are always directed, connecting a source node to its destination.

Like nodes, relationships have sets of properties to describe all of their salient characteristics.

When querying relationships, multiple types can be specified when separated by types. Relationships that hold any of the specified types will be matched:

$ redis-cli GRAPH.QUERY G "MATCH (:Person)-[r:RESIDENT_OF|:VISITOR_TO]->(:Place {name: 'London'}) RETURN r"

Paths

Paths are alternating sequences of nodes and edges, starting and ending with a node.

They are not structural elements in the graph, but can be created and returned by queries.

For example, the following query returns all paths of any length connecting the node London to the node New York:

$ redis-cli GRAPH.QUERY G "MATCH p=(:City {name: 'London'})-[*]->(:City {name: 'New York'}) RETURN p"

Scalar types

All scalar types may be provided by queries or stored as property values on node and relationship objects.

Strings

RedisGraph strings are Unicode character sequences. When using Redis with a TTY (such as invoking RedisGraph commands from the terminal via redis-cli), some code points may not be decoded, as in:

$ redis-cli GRAPH.QUERY G "RETURN '日本人' as stringval"
1) 1) "stringval"
2) 1) 1) "\xe6\x97\xa5\xe6\x9c\xac\xe4\xba\xba"

Output decoding can be forced using the --raw flag:

$ redis-cli --raw GRAPH.QUERY G "RETURN '日本人' as stringval"
stringval
日本人

Booleans

Boolean values are specified as true or false. Internally, they are stored as numerics, with 1 representing true and 0 representing false. As RedisGraph considers types in its comparisons, 1 is not considered equal to true:

$ redis-cli GRAPH.QUERY G "RETURN 1 = true"
1) 1) "1 = true"
2) 1) 1) "false"

Integers

All RedisGraph integers are treated as 64-bit signed integers.

Floating-point values

All RedisGraph floating-point values are treated as 64-bit signed doubles.

Geospatial Points

The Point data type is a set of latitude/longitude coordinates, stored within RedisGraph as a pair of 32-bit floats. It is instantiated using the point() function call.

Nulls

In RedisGraph, null is used to stand in for an unknown or missing value.

Since we cannot reason broadly about unknown values, null is an important part of RedisGraph's 3-valued truth table. For example, the comparison null = null will evaluate to null, as we lack adequate information about the compared values. Similarly, null in [1,2,3] evaluates to null, since the value we're looking up is unknown.

Unlike all other scalars, null cannot be stored as a property value.

Collection types

Arrays

Arrays are ordered lists of elements. They can be provided as literals or generated by functions like collect(). Nested arrays are supported, as are many functions that operate on arrays such as list comprehensions.

Arrays can be stored as property values provided that no array element is of an unserializable type, such as graph entities or null values.

Maps

Maps are order-agnostic collections of key-value pairs. If a key is a string literal, the map can be accessed using dot notation. If it is instead an expression that evaluates to a string literal, bracket notation can be used:

$ redis-cli GRAPH.QUERY G "WITH {key1: 'stringval', key2: 10} AS map RETURN map.key1, map['key' + 2]"
1) 1) "map.key1"
   2) "map['key' + 2]"
2) 1) 1) "stringval"
      2) (integer) 10

This aligns with way that the properties of nodes and relationships can be accessed.

Maps cannot be stored as property values.

Map projections

Maps can be constructed as projections using the syntax alias {.key1 [, ...n]}. This can provide a useful format for returning graph entities. For example, given a graph with the node (name: 'Jeff', age: 32), we can build the projection:

$ redis-cli GRAPH.QUERY G "MATCH (n) RETURN n {.name, .age} AS projection"
1) 1) "projection"
2) 1) 1) "{name: Jeff, age: 32}"

Function calls in map values

The values in maps and map projections are flexible, and can generally refer either to constants or computed values:

$ redis-cli GRAPH.QUERY G "RETURN {key1: 'constant', key2: rand(), key3: toLower('GENERATED') + '_string'} AS map"
1) 1) "map"
2) 1) 1) "{key1: constant, key2: 0.889656, key3: generated_string}"

The exception to this is aggregation functions, which must be computed in a preceding WITH clause instead of being invoked within the map. This restriction is intentional, as it helps to clearly disambiguate the aggregate function calls and the key values they are grouped by:

$ redis-cli GRAPH.QUERY G "
MATCH (follower:User)-[:FOLLOWS]->(u:User)
WITH u, COUNT(follower) AS count
RETURN u {.name, follower_count: count} AS user"
1) 1) "user"
2) 1) 1) "{name: Jeff, follower_count: 12}"
   2) 1) "{name: Roi, follower_count: 18}"

6 - Cypher Coverage

RedisGraph implements a subset of the Cypher language, which is growing as development continues. This document is based on the Cypher Query Language Reference (version 9), available at OpenCypher Resources.

Patterns

Patterns are fully supported.

Types

Structural types

  • Nodes
  • Relationships
  • Path variables (alternating sequence of nodes and relationships).

Composite types

  • Lists

  • Maps

    Unsupported:

  • Temporal types (Date, DateTime, LocalDateTime, Time, LocalTime, Duration)

Literal types

  • Numeric types (64-bit doubles and 64-bit signed integer representations)

  • String literals

  • Booleans

    Unsupported:

  • Hexadecimal and octal numerics

Other

NULL is supported as a representation of a missing or undefined value.

Comparability, equality, orderability, and equivalence

This is a somewhat nebulous area in Cypher itself, with a lot of edge cases. Broadly speaking, RedisGraph behaves as expected with string and numeric values. There are likely some behaviors involving the numerics NaN, -inf, inf, and possibly -0.0 that deviate from the Cypher standard. We do not support any of these properties at the type level, meaning nodes and relationships are not internally comparable.

Clauses

Reading Clauses

  • MATCH
  • OPTIONAL MATCH

Projecting Clauses

  • RETURN
  • AS
  • WITH
  • UNWIND

Reading sub-clauses

  • WHERE
  • ORDER BY
  • SKIP
  • LIMIT

Writing Clauses

  • CREATE

  • DELETE

    • We actually implement DETACH DELETE, the distinction being that relationships invalidated by node deletions are automatically deleted.
  • SET

    Unsupported:

  • REMOVE (to modify properties)
    • Properties can be deleted with SET [prop] = NULL.

Reading/Writing Clauses

Set Operations

  • UNION
  • UNION ALL

Functions

Scalar functions

  • Some casting functions (toBoolean, toFloat)
  • Temporal arithmetic functions
  • Functions returning maps (properties)

Aggregating functions

  • avg
  • collect
  • count
  • max
  • min
  • percentileCont
  • percentileDist
  • stDev
  • stDevP
  • sum

List functions

  • head
  • range
  • reverse
  • size
  • tail

Math functions - numeric

  • abs
  • ceil
  • floor
  • sign
  • round
  • rand
  • toInteger

String functions

  • left

  • right

  • trim

  • lTrim

  • rTrim

  • reverse

  • substring

  • toLower

  • toUpper

  • toString

    Unsupported:

  • replace
  • split

Predicate functions

  • exists
  • any
  • all
  • single
  • none

Expression functions

  • case...when

Geospatial functions

  • distance
  • point

Unsupported function classes

  • Logarithmic math functions
  • Trigonometric math functions
  • User-defined functions

Operators

Mathematical operators

  • Multiplication, addition, subtraction, division, modulo

    Unsupported:

  • Exponentiation

String operators

  • String operators (STARTS WITH, ENDS WITH, CONTAINS) are supported.

    Unsupported:

  • Regex operator

Boolean operators

  • AND
  • OR
  • NOT
  • XOR

Parameters

Parameters may be specified to allow for more flexible query construction:

CYPHER name_param = "Niccolò Machiavelli" birth_year_param = 1469 MATCH (p:Person {name: $name_param, birth_year: $birth_year_param}) RETURN p

The example above shows the syntax used by redis-cli to set parameters, but each RedisGraph client introduces a language-appropriate method for setting parameters, and is described in their documentation.

Non-Cypher queries

  • RedisGraph provides the GRAPH.EXPLAIN command to print the execution plan of a provided query.
  • GRAPH.DELETE will remove a graph and all Redis keys associated with it.
  • We do not currently provide support for queries that retrieve schemas, though the LABELS and TYPE scalar functions may be used to get a graph overview.

7 - References

Quick start

8 - Known limitations

Relationship uniqueness in patterns

When a relation in a match pattern is not referenced elsewhere in the query, RedisGraph will only verify that at least one matching relation exists (rather than operating on every matching relation).

In some queries, this will cause unexpected behaviors. Consider a graph with 2 nodes and 2 relations between them:

CREATE (a)-[:e {val: '1'}]->(b), (a)-[:e {val: '2'}]->(b)

Counting the number of explicit edges returns 2, as expected.

MATCH (a)-[e]->(b) RETURN COUNT(e)

However, if we count the nodes in this pattern without explicitly referencing the relation, we receive a value of 1.

MATCH (a)-[e]->(b) RETURN COUNT(b)

We are researching designs that resolve this problem without negatively impacting performance. As a temporary workaround, queries that must operate on every relation matching a pattern should explicitly refer to that relation's alias elsewhere in the query. Two options for this are:

MATCH (a)-[e]->(b) WHERE ID(e) >= 0 RETURN COUNT(b)
MATCH (a)-[e]->(b) RETURN COUNT(b), e.dummyval

LIMIT clause does not affect eager operations

When a WITH or RETURN clause introduces a LIMIT value, this value ought to be respected by all preceding operations.

For example, given the query:

UNWIND [1,2,3] AS value CREATE (a {property: value}) RETURN a LIMIT 1

One node should be created with its 'property' set to 1. RedisGraph will currently create three nodes, and only return the first.

This limitation affects all eager operations: CREATE, SET, DELETE, MERGE, and projections with aggregate functions.

Indexing limitations

One way in which RedisGraph will optimize queries is by introducing index scans when a filter is specified on an indexed label-property pair.

The current index implementation, however, does not handle not-equal (<>) filters.

To profile a query and see whether index optimizations have been introduced, use the GRAPH.EXPLAIN endpoint:

$ redis-cli GRAPH.EXPLAIN social "MATCH (p:person) WHERE p.id < 5 RETURN p"
1) "Results"
2) "    Project"
3) "        Index Scan | (p:person)"