RedisGraph
A Graph database built on Redis
RedisGraph is a graph database built on Redis. This graph database uses GraphBlas under the hood for its sparse adjacency matrix graph representation.
Primary features
- Based on the property graph model
- Nodes can have any number of labels
- Relationships have a relationship type
- Graphs represented as sparse adjacency matrices
- Cypher as the query language
- Cypher queries translate into linear algebra expressions
To see RedisGraph in action, explore our demos.
Docker
To quickly try out RedisGraph, launch an instance using docker:
docker run -p 6379:6379 -it --rm redislabs/redisgraph
Give it a try
After you load RedisGraph, you can interact with it using redis-cli
.
Here we'll quickly create a small graph representing a subset of motorcycle riders and teams taking part in the MotoGP championship. Once created, we'll start querying our data.
With redis-cli
$ redis-cli
127.0.0.1:6379> GRAPH.QUERY MotoGP "CREATE (:Rider {name:'Valentino Rossi'})-[:rides]->(:Team {name:'Yamaha'}), (:Rider {name:'Dani Pedrosa'})-[:rides]->(:Team {name:'Honda'}), (:Rider {name:'Andrea Dovizioso'})-[:rides]->(:Team {name:'Ducati'})"
1) 1) "Labels added: 2"
2) "Nodes created: 6"
3) "Properties set: 6"
4) "Relationships created: 3"
5) "Cached execution: 0"
6) "Query internal execution time: 0.399000 milliseconds"
Now that our MotoGP graph is created, we can start asking questions. For example:
Who's riding for team Yamaha?
127.0.0.1:6379> GRAPH.QUERY MotoGP "MATCH (r:Rider)-[:rides]->(t:Team) WHERE t.name = 'Yamaha' RETURN r.name, t.name"
1) 1) "r.name"
2) "t.name"
2) 1) 1) "Valentino Rossi"
2) "Yamaha"
3) 1) "Cached execution: 0"
2) "Query internal execution time: 0.625399 milliseconds"
How many riders represent team Ducati?
127.0.0.1:6379> GRAPH.QUERY MotoGP "MATCH (r:Rider)-[:rides]->(t:Team {name:'Ducati'}) RETURN count(r)"
1) 1) "count(r)"
2) 1) 1) (integer) 1
3) 1) "Cached execution: 0"
2) "Query internal execution time: 0.624435 milliseconds"
Building
Requirements:
The RedisGraph repository: git clone --recurse-submodules -j8 https://github.com/RedisGraph/RedisGraph.git
On Ubuntu Linux, run: apt-get install build-essential cmake m4 automake peg libtool autoconf
On OS X, verify that homebrew
is installed and run: brew install cmake m4 automake peg libtool autoconf
.
- The version of Clang that ships with the OS X toolchain does not support OpenMP, which is a requirement for RedisGraph. One way to resolve this is to run
brew install gcc g++
and follow the on-screen instructions to update the symbolic links. Note that this is a system-wide change - setting the environment variables for CC
and CXX
will work if that is not an option.
To build, run make
in the project's directory.
Congratulations! You can find the compiled binary at: src/redisgraph.so
Installing RedisGraph
RedisGraph is part of Redis Stack. See the Redis Stack download page for installaton options.
Using RedisGraph
Before using RedisGraph, you should familiarize yourself with its commands and syntax as detailed in the
command reference.
You can call RedisGraph's commands from any Redis client.
With redis-cli
$ redis-cli
127.0.0.1:6379> GRAPH.QUERY social "CREATE (:person {name: 'roi', age: 33, gender: 'male', status: 'married'})"
With any other client
You can interact with RedisGraph using your client's ability to send raw Redis commands.
The exact method for doing that depends on your client of choice.
Python example
This code snippet shows how to use RedisGraph with raw Redis commands from Python using
redis-py:
import redis
r = redis.Redis()
reply = r.graph("social").query("MATCH (r:Rider)-[:rides]->(t:Team {name:'Ducati'}) RETURN count(r)")
Client libraries
Language-specific clients have been written by the community and the RedisGraph team for 6 languages.
The full list and links can be found on the Clients page.
Data import
The RedisGraph team maintains the redisgraph-bulk-loader for importing new graphs from CSV files.
The data format used by this tool is described in the GRAPH.BULK implementation details.
Mailing List / Forum
Got questions? Feel free to ask at the RedisGraph forum.
License
Redis Source Available License Agreement - see LICENSE
1 - Commands
Commands Overview
Overview
RedisGraph Features
RedisGraph exposes graph database functionality within Redis using the openCypher query language. Its basic commands accept openCypher queries, while additional commands are exposed for configuration or metadata retrieval.
RedisGraph API
Command details can be retrieved by filtering for the module or for a specific command, e.g. [GRAPH.QUERY
](/commands/?name=graph.query).
The details include the syntax for the commands, where:
- Command and subcommand names are in uppercase, for example
GRAPH.CONFIG
or SET
- Optional arguments are enclosed in square brackets, for example
[timeout]
- Additional optional arguments are indicated by an ellipsis:
...
Most commands require a graph key's name as their first argument.
2 - RedisGraph Client Libraries
The full functionality of RedisGraph is available through
redis-cli
and the Redis API.
RedisInsight is a visual tool that provides capabilities to design, develop and optimize into a single easy-to-use environment, and has built-in support for RedisGraph. In addition there are severeal client libraries to improve abstractions and allow for a more natural experience in a project's native language. Additionally, these clients take advantage of some RedisGraph features that may reduce network throughput in some circumstances.
Currently available Libraries
Implementing a client
Information on some of the tasks involved in writing a RedisGraph client can be found in the Client Specification.
3 - Run-time Configuration
RedisGraph supports a few run-time configuration options that can be defined when loading the module. In the future more options will be added.
Passing configuration options on module load
Passing configuration options is done by appending arguments after the --loadmodule
argument when starting a server from the command line or after the loadmodule
directive in a Redis config file. For example:
In redis.conf:
loadmodule ./redisgraph.so OPT1 VAL1
From the command line:
$ redis-server --loadmodule ./redisgraph.so OPT1 VAL1
Passing configuration options at run-time
RedisGraph exposes the GRAPH.CONFIG
endpoint to allowing for the setting and retrieval of configurations at run-time.
To set a config, simply run:
GRAPH.CONFIG SET OPT1 VAL1
Similarly, the current configurations can be retrieved using the syntax:
GRAPH.CONFIG GET OPT1
GRAPH.CONFIG GET *
RedisGraph configuration options
THREAD_COUNT
The number of threads in RedisGraph's thread pool. This is equivalent to the maximum number of queries that can be processed concurrently.
Default
THREAD_COUNT
defaults to the system's hardware threads (logical cores).
Example
$ redis-server --loadmodule ./redisgraph.so THREAD_COUNT 4
CACHE_SIZE
The max number of queries for RedisGraph to cache. When a new query is encountered and the cache is full, meaning the cache has reached the size of CACHE_SIZE
, it will evict the least recently used (LRU) entry.
Default
CACHE_SIZE
default value is 25.
Example
$ redis-server --loadmodule ./redisgraph.so CACHE_SIZE 10
OMP_THREAD_COUNT
The maximum number of threads that OpenMP may use for computation per query. These threads are used for parallelizing GraphBLAS computations, so may be considered to control concurrency within the execution of individual queries.
Default
OMP_THREAD_COUNT
is defined by GraphBLAS by default.
Example
$ redis-server --loadmodule ./redisgraph.so OMP_THREAD_COUNT 1
NODE_CREATION_BUFFER
The node creation buffer is the number of new nodes that can be created without resizing matrices. For example, when set to 16,384, the matrices will have extra space for 16,384 nodes upon creation. Whenever the extra space is depleted, the matrices' size will increase by 16,384.
Reducing this value will reduce memory consumption, but cause performance degradation due to the increased frequency of matrix resizes.
Conversely, increasing it might improve performance for write-heavy workloads but will increase memory consumption.
If the passed argument was not a power of 2, it will be rounded to the next-greatest power of 2 to improve memory alignment.
MAX_QUEUED_QUERIES
Setting the maximum number of queued queries allows the server to reject incoming queries with the error message Max pending queries exceeded
. This reduces the memory overhead of pending queries on an overloaded server and avoids congestion when the server processes its backlog of queries.
Default
MAX_QUEUED_QUERIES
is effectively unlimited by default (config value of UINT64_MAX
).
Example
$ redis-server --loadmodule ./redisgraph.so MAX_QUEUED_QUERIES 500
$ redis-cli GRAPH.CONFIG SET MAX_QUEUED_QUERIES 500
Default
NODE_CREATION_BUFFER
is 16,384 by default.
Minimum
The minimum value for NODE_CREATION_BUFFER
is 128. Values lower than this will be accepted as arguments, but will internally be converted to 128.
Example
$ redis-server --loadmodule ./redisgraph.so NODE_CREATION_BUFFER 200
TIMEOUT
Timeout is a flag that specifies the maximum runtime for read queries in milliseconds. This configuration will not be respected by write queries, to avoid leaving the graph in an inconsistent state.
Default
TIMEOUT
is off by default (config value of 0
).
Example
$ redis-server --loadmodule ./redisgraph.so TIMEOUT 1000
RESULTSET_SIZE
Result set size is a limit on the number of records that should be returned by any query. This can be a valuable safeguard against incurring a heavy IO load while running queries with unknown results.
Default
RESULTSET_SIZE
is unlimited by default (negative config value).
Example
127.0.0.1:6379> GRAPH.CONFIG SET RESULTSET_SIZE 3
OK
127.0.0.1:6379> GRAPH.QUERY G "UNWIND range(1, 5) AS x RETURN x"
1) 1) "x"
2) 1) 1) (integer) 1
2) 1) (integer) 2
3) 1) (integer) 3
3) 1) "Cached execution: 0"
2) "Query internal execution time: 0.445790 milliseconds"
QUERY_MEM_CAPACITY
Setting the memory capacity of a query allows the server to kill queries that are consuming too much memory and return with the error message Query's mem consumption exceeded capacity
. This helps to avoid scenarios when the server becomes unresponsive due to an unbounded query exhausting system resources.
The configuration argument is the maximum number of bytes that can be allocated by any single query.
Default
QUERY_MEM_CAPACITY
is unlimited by default; this default can be restored by setting QUERY_MEM_CAPACITY
to zero or a negative value.
Example
$ redis-server --loadmodule ./redisgraph.so QUERY_MEM_CAPACITY 1048576 // 1 megabyte limit
$ redis-cli GRAPH.CONFIG SET QUERY_MEM_CAPACITY 1048576
Query Configurations
The query timeout configuration may also be set per query in the form of additional arguments after the query string. This configuration is unset by default unless using a language-specific client, which may establish its own defaults.
Query Timeout
The query flag timeout
allows the user to specify a timeout as described in TIMEOUT for a single query.
Example
Retrieve all paths in a graph with a timeout of 1000 milliseconds.
GRAPH.QUERY wikipedia "MATCH p=()-[*]->() RETURN p" timeout 1000
4 - RedisGraph: A High Performance In-Memory Graph Database
Abstract
Graph-based data is everywhere nowadays. Facebook, Google, Twitter and Pinterest are just a few who've realize the power
behind relationship data and are utilizing it to its fullest. As a direct result, we see a rise both in interest for and
the variety of graph data solutions.
With the introduction of Redis Modules we've recognized the great potential of introducing a
graph data structure to the Redis arsenal, and developed RedisGraph. Bringing new graph capabilities to Redis
through a native C implementation with an emphasis on performance, RedisGraph is now
available as an open source project.
In this documentation, we'll discuss the internal design and features of RedisGraph and demonstrate its current capabilities.
RedisGraph At-a-Glance
RedisGraph is a graph database developed from scratch on top of Redis, using the new Redis Modules API to extend Redis
with new commands and capabilities. Its main features include:
- Simple, fast indexing and querying
- Data stored in RAM using memory-efficient custom data structures
- On-disk persistence
- Tabular result sets
- Uses the popular graph query language openCypher
A Little Taste: RedisGraph in Action
Let’s look at some of the key concepts of RedisGraph using this example over the redis-cli tool:
Constructing a graph
It is common to represent entities as nodes within a graph. In this example,
we'll create a small graph with both actors and movies as its entities,
and an "act" relation that will connect actors to the movies they acted in.
We'll use the graph.QUERY command to issue a CREATE query,
which will introduce new entities and relations to our graph.
graph.QUERY <graph_id> 'CREATE (:<label> {<attribute_name>:<attribute_value>,...})'
graph.QUERY <graph_id> 'CREATE (<source_node_alias>)-[<relation> {<attribute_name>:<attribute_value>,...}]->(<dest_node_alias>)'
We'll construct our graph in one command:
graph.QUERY IMDB 'CREATE (aldis:actor {name: "Aldis Hodge", birth_year: 1986}),
(oshea:actor {name: "OShea Jackson", birth_year: 1991}),
(corey:actor {name: "Corey Hawkins", birth_year: 1988}),
(neil:actor {name: "Neil Brown", birth_year: 1980}),
(compton:movie {title: "Straight Outta Compton", genre: "Biography", votes: 127258, rating: 7.9, year: 2015}),
(neveregoback:movie {title: "Never Go Back", genre: "Action", votes: 15821, rating: 6.4, year: 2016}),
(aldis)-[:act]->(neveregoback),
(aldis)-[:act]->(compton),
(oshea)-[:act]->(compton),
(corey)-[:act]->(compton),
(neil)-[:act]->(compton)'
Querying the graph
RedisGraph exposes a subset of the openCypher graph language. Although only some language capabilities are supported,
there's enough functionality to extract valuable insights from your graphs. To execute a query, we'll use the GRAPH.QUERY
command:
GRAPH.QUERY <graph_id> <query>
Let's execute a number of queries against our movies graph.
Find the sum, max, min and avg age of the 'Straight Outta Compton' cast:
GRAPH.QUERY IMDB 'MATCH (a:actor)-[:act]->(m:movie {title:"Straight Outta Compton"})
RETURN m.title, SUM(2020-a.birth_year), MAX(2020-a.birth_year), MIN(2020-a.birth_year), AVG(2020-a.birth_year)'
RedisGraph will reply with:
1) 1) "m.title"
2) "SUM(2020-a.birth_year)"
3) "MAX(2020-a.birth_year)"
4) "MIN(2020-a.birth_year)"
5) "AVG(2020-a.birth_year)"
2) 1) 1) "Straight Outta Compton"
2) "135"
3) (integer) 40
4) (integer) 29
5) "33.75"
- The first row is our result-set header which names each column according to the return clause.
- The second row contains our query result.
Let's try another query. This time, we'll find out how many movies each actor played in.
GRAPH.QUERY IMDB "MATCH (actor)-[:act]->(movie) RETURN actor.name, COUNT(movie.title) AS movies_count ORDER BY
movies_count DESC"
1) "actor.name, movies_count"
2) "Aldis Hodge,2.000000"
3) "O'Shea Jackson,1.000000"
4) "Corey Hawkins,1.000000"
5) "Neil Brown,1.000000"
The Theory: Ideas behind RedisGraph
Representation
RedisGraph uses sparse adjacency matrices to represent graphs. As directed relationship connecting source node S to destination node T is
recorded within an adjacency matrix M, by setting M's S,T entry to 1 (M[S,T]=1).
As a rule of thumb, matrix rows represent source nodes while matrix columns represent destination nodes.
Every graph stored within RedisGraph has at least one matrix, referred to as THE adjacency matrix (relation-type agnostic). In addition, every relation with a type has its own dedicated matrix. Consider a graph with two relationships types:
- visits
- friend
The underlying graph data structure maintains three matrices:
- THE adjacency matrix - marking every connection within the graph
- visit matrix - marking visit connections
- friend matrix - marking friend connections
A 'visit' relationship E that connects node A to node B, sets THE adjacency matrix at position [A,B] to 1. Also, the visit matrix V sets position V[A,B] to 1.
To accommodate typed nodes, one additional matrix is allocated per label, and a label matrix is symmetric with ones along the main diagonal. Assume that node N was labeled as a Person, then the Person matrix P sets position P[N,N] to 1.
This design lets RedisGraph modify its graph easily, including:
- Adding new nodes simply extends matrices, adding additional rows and columns
- Adding new relationships by setting the relevant entries at the relevant matrices
- Removing relationships clears relevant entries
- Deleting nodes by deleting matrix row/column.
One of the main reasons we chose to represent our graphs as sparse matrices is graph traversal.
Traversal
Graph traversal is done by matrix multiplication. For example, if we wanted to find friends of friends for every node in the graph,
this traversal can be expressed by FOF = F^2. F stands for the friendship relation matrix, and the result matrix FOF summarizes the traversal.
The rows of the FOF represent source nodes and its columns represent friends who are two hops away: if FOF[i,j] = 1 then j is a friend of a friend of i.
Algebraic expression
When a search pattern such as (N0)-[A]->(N1)-[B]->(N2)<-[A]-(N3)
is used as part of a query, we translate it into a set of matrix multiplications. For the given example, one possible expression would be: A * B * Transpose(A)
.
Note that matrix multiplication is an associative and distributive operation. This gives us the freedom to choose which terms we want to multiply first (preferring terms that will produce highly sparse intermediate matrices). It also enables concurrency when evaluating an expression, e.g. we could compute AZ and BY in parallel.
GraphBLAS
To perform all of these operations for sparse matrices, RedisGraph uses GraphBLAS - a standard API similar to BLAS. The current implementation uses the CSC sparse matrix format (compressed sparse columns), although the underlying format is subject to change.
Query language: openCypher
There are a number of graph query languages, so we didn't want to reinvent the wheel. We decided to implement a subset of one of the most popular graph query languages out there: openCypher
While the openCypher project provides a parser for the language, we decided to create our own parser. We used Lex as a tokenizer and Lemon to generate a C target parser.
As mentioned earlier, only a subset of the language is currently supported, but we plan to continue adding new capabilities
and extend RedisGraph's openCypher capabilities.
Runtime: query execution
Let's review the steps RedisGraph takes when executing a query.
Consider this query that finds all actors who played alongside Aldis Hodge and are over 30 years old:
MATCH (aldis::actor {name:"Aldis Hodge"})-[:act]->(m:movie)<-[:act]-(a:actor) WHERE a.age > 30 RETURN m.title, a.name
RedisGraph will:
- Parse the query, and build an abstract syntax tree (AST)
- Compose traversal algebraic expressions
- Build filter trees
- Construct an optimized query execution plan composed of:
- Filtered traverse
- Conditional traverse
- Filter
- Project
- Execute the plan
- Populate a result set with matching entity attributes
Filter tree
A query can filter out entities by creating predicates. In our example, we filter actors younger then 30.
It's possible to combine predicates using OR and AND keywords to form granular conditions.
During runtime, the WHERE clause is used to construct a filter tree, and each node within the tree is either a condition (e.g. A > B) or an operation (AND/OR). When finding candidate entities, they are passed through the tree and evaluated.
Benchmarks
Depending on the underlying hardware results may vary. However, inserting a new relationship is done in O(1).
RedisGraph is able to create over 1 million nodes under half a second and form 500K relations within 0.3 of a second.
License
RedisGraph is published under Redis Source Available License Agreement.
Conclusion
Although RedisGraph is still a young project, it can be an alternative to other graph databases. With its subset of
operations, you can use it to analyze and explore graph data. Being a Redis Module, this project is accessible from
every Redis client without adjustments. It's our intention to keep on improving and extending
RedisGraph with the help of the open source community.
4.1 - Technical specification for writing RedisGraph client libraries
By design, there is not a full standard for RedisGraph clients to adhere to. Areas such as pretty-print formatting, query validation, and transactional and multithreaded capabilities have no canonically correct behavior, and the implementer is free to choose the approach and complexity that suits them best. RedisGraph does, however, provide a compact result set format for clients that minimizes the amount of redundant data transmitted from the server. Implementers are encouraged to take advantage of this format, as it provides better performance and removes ambiguity from decoding certain data. This approach requires clients to be capable of issuing procedure calls to the server and performing a small amount of client-side caching.
Retrieving the compact result set
Appending the flag --compact
to any query issued to the GRAPH.QUERY endpoint will cause the server to issue results in the compact format. Because we don't store connection-specific configurations, all queries should be issued with this flag.
GRAPH.QUERY demo "MATCH (a) RETURN a" --compact
The result set has the same overall structure as described in the Result Set documentation.
Certain values are emitted as integer IDs rather than strings:
- Node labels
- Relationship types
- Property keys
Instructions on how to efficiently convert these IDs in the Procedure Calls section below.
Additionally, two enums are exposed:
ColumnType, which as of RedisGraph v2.1.0 will always be COLUMN_SCALAR
. This enum is retained for backwards compatibility, and may be ignored by the client unless RedisGraph versions older than v2.1.0 must be supported.
ValueType indicates the data type (such as Node, integer, or string) of each returned value. Each value is emitted as a 2-array, with this enum in the first position and the actual value in the second. Each property on a graph entity also has a scalar as its value, so this construction is nested in each value of the properties array when a column contains a node or relationship.
Decoding the result set
Given the graph created by the query:
GRAPH.QUERY demo "CREATE (:plant {name: 'Tree'})-[:GROWS {season: 'Autumn'}]->(:fruit {name: 'Apple'})"
Let's formulate a query that returns 3 columns: nodes, relationships, and scalars, in that order.
Verbose (default):
127.0.0.1:6379> GRAPH.QUERY demo "MATCH (a)-[e]->(b) RETURN a, e, b.name"
1) 1) "a"
2) "e"
3) "b.name"
2) 1) 1) 1) 1) "id"
2) (integer) 0
2) 1) "labels"
2) 1) "plant"
3) 1) "properties"
2) 1) 1) "name"
2) "Tree"
2) 1) 1) "id"
2) (integer) 0
2) 1) "type"
2) "GROWS"
3) 1) "src_node"
2) (integer) 0
4) 1) "dest_node"
2) (integer) 1
5) 1) "properties"
2) 1) 1) "season"
2) "Autumn"
3) "Apple"
3) 1) "Query internal execution time: 1.326905 milliseconds"
Compact:
127.0.0.1:6379> GRAPH.QUERY demo "MATCH (a)-[e]->(b) RETURN a, e, b.name" --compact
1) 1) 1) (integer) 1
2) "a"
2) 1) (integer) 1
2) "e"
3) 1) (integer) 1
2) "b.name"
2) 1) 1) 1) (integer) 8
2) 1) (integer) 0
2) 1) (integer) 0
3) 1) 1) (integer) 0
2) (integer) 2
3) "Tree"
2) 1) (integer) 7
2) 1) (integer) 0
2) (integer) 0
3) (integer) 0
4) (integer) 1
5) 1) 1) (integer) 1
2) (integer) 2
3) "Autumn"
3) 1) (integer) 2
2) "Apple"
3) 1) "Query internal execution time: 1.085412 milliseconds"
These results are being parsed by redis-cli
, which adds such visual cues as array indexing and indentation, as well as type hints like (integer)
. The actual data transmitted is formatted using the RESP protocol. All of the current RedisGraph clients rely upon a stable Redis client in the same language (such as redis-rb for Ruby) which handles RESP decoding.
Top-level array results
The result set above had 3 members in its top-level array:
1) Header row
2) Result rows
3) Query statistics
All queries that have a RETURN
clause will have these 3 members. Queries that don't return results have only one member in the outermost array, the query statistics:
127.0.0.1:6379> GRAPH.QUERY demo "CREATE (:plant {name: 'Tree'})-[:GROWS {season: 'Autumn'}]->(:fruit {name: 'Apple'})" --compact
1) 1) "Labels added: 2"
2) "Nodes created: 2"
3) "Properties set: 3"
4) "Relationships created: 1"
5) "Query internal execution time: 1.972868 milliseconds"
Rather than introspecting on the query being emitted, the client implementation can check whether this array contains 1 or 3 elements to choose how to format data.
Our sample query MATCH (a)-[e]->(b) RETURN a, e, b.name
generated the header:
1) 1) (integer) 1
2) "a"
3) 1) (integer) 1
3) "e"
4) 1) (integer) 1
3) "b.name"
The 4 array members correspond, in order, to the 3 entities described in the RETURN clause.
Each is emitted as a 2-array:
1) ColumnType (enum)
2) column name (string)
The first element is the ColumnType enum, which as of RedisGraph v2.1.0 will always be COLUMN_SCALAR
. This element is retained for backwards compatibility, and may be ignored by the client unless RedisGraph versions older than v2.1.0 must be supported.
Reading result rows
The entity representations in this section will closely resemble those found in Result Set Graph Entities.
Our query produced one row of results with 3 columns (as described by the header):
1) 1) 1) (integer) 8
2) 1) (integer) 0
2) 1) (integer) 0
3) 1) 1) (integer) 0
2) (integer) 2
3) "Tree"
2) 1) (integer) 7
2) 1) (integer) 0
2) (integer) 0
3) (integer) 0
4) (integer) 1
5) 1) 1) (integer) 1
2) (integer) 2
3) "Autumn"
3) 1) (integer) 2
2) "Apple"
Each element is emitted as a 2-array - [ValueType
, value].
It is the client's responsibility to store the ValueType enum. RedisGraph guarantees that this enum may be extended in the future, but the existing values will not be altered.
The ValueType
for the first entry is VALUE_NODE
. The node representation contains 3 top-level elements:
- The node's internal ID.
- An array of all label IDs associated with the node (currently, each node can have either 0 or 1 labels, though this restriction may be lifted in the future).
- An array of all properties the node contains. Properties are represented as 3-arrays - [property key ID,
ValueType
, value].
[
Node ID (integer),
[label ID (integer) X label count]
[[property key ID (integer), ValueType (enum), value (scalar)] X property count]
]
The ValueType
for the first entry is VALUE_EDGE
. The edge representation differs from the node representation in two respects:
- Each relation has exactly one type, rather than the 0+ labels a node may have.
- A relation is emitted with the IDs of its source and destination nodes.
As such, the complete representation is as follows:
- The relation's internal ID.
- The relationship type ID.
- The source node's internal ID.
- The destination node's internal ID.
- The key-value pairs of all properties the relation possesses.
[
Relation ID (integer),
type ID (integer),
source node ID (integer),
destination node ID (integer),
[[property key ID (integer), ValueType (enum), value (scalar)] X property count]
]
The ValueType
for the third entry is VALUE_STRING
, and the other element in the array is the actual value, "Apple".
Reading statistics
The final top-level member of the GRAPH.QUERY reply is the execution statistics. This element is identical between the compact and standard response formats.
The statistics always include query execution time, while any combination of the other elements may be included depending on how the graph was modified.
- "Labels added: (integer)"
- "Nodes created: (integer)"
- "Properties set: (integer)"
- "Nodes deleted: (integer)"
- "Relationships deleted: (integer)"
- "Relationships created: (integer)"
- "Query internal execution time: (float) milliseconds"
Procedure Calls
Property keys, node labels, and relationship types are all returned as IDs rather than strings in the compact format. For each of these 3 string-ID mappings, IDs start at 0 and increase monotonically.
As such, the client should store an string array for each of these 3 mappings, and print the appropriate string for the user by checking an array at position ID. If an ID greater than the array length is encountered, the local array should be updated with a procedure call.
These calls are described generally in the Procedures documentation.
To retrieve each full mapping, the appropriate calls are:
db.labels()
127.0.0.1:6379> GRAPH.QUERY demo "CALL db.labels()"
1) 1) "label"
2) 1) 1) "plant"
2) 1) "fruit"
3) 1) "Query internal execution time: 0.321513 milliseconds"
db.relationshipTypes()
127.0.0.1:6379> GRAPH.QUERY demo "CALL db.relationshipTypes()"
1) 1) "relationshipType"
2) 1) 1) "GROWS"
3) 1) "Query internal execution time: 0.429677 milliseconds"
db.propertyKeys()
127.0.0.1:6379> GRAPH.QUERY demo "CALL db.propertyKeys()"
1) 1) "propertyKey"
2) 1) 1) "name"
2) 1) "season"
3) 1) "Query internal execution time: 0.318940 milliseconds"
Because the cached values never become outdated, it is possible to just retrieve new values with slightly more complex constructions:
CALL db.propertyKeys() YIELD propertyKey RETURN propertyKey SKIP [cached_array_length]
Though the property calls are quite efficient regardless of whether this optimization is used.
As an example, the Python client checks its local array of labels to resolve every label ID as seen here.
In the case of an IndexError, it issues a procedure call to fully refresh its label cache as seen here.
Reference clients
All the logic described in this document has been implemented in most of the clients listed in Client Libraries. Among these, node-redis
, redis-py
and jedis
are currently the most sophisticated.
4.2 - RedisGraph Result Set Structure
This document describes the format RedisGraph uses to print data when accessed through the
redis-cli
utility. The
language-specific clients retrieve data in a more succinct format, and provide their own functionality for printing result sets.
Top-level members
Queries that return data emit an array with 3 top-level members:
- The header describing the returned records. This is an array which corresponds precisely in order and naming to the RETURN clause of the query.
- A nested array containing the actual data returned by the query.
- An array of metadata related to the query execution. This includes query runtime as well as data changes, such as the number of entities created, deleted, or modified by the query.
Result Set data types
A column in the result set can be populated with graph entities (nodes or relations) or scalar values.
Scalars
RedisGraph replies are formatted using the RESP protocol. The current RESP iteration provides fewer data types than RedisGraph supports internally, so displayed results are mapped as follows:
RedisGraph type | Display format |
---|
Integer | Integer |
NULL | NULL (nil) |
String | String |
Boolean | String ("true"/"false") |
Double | String (15-digit precision) |
Graph Entities
When full entities are specified in a RETURN clause, all data relevant to each entity value is emitted within a nested array. Key-value pairs in the data, such as the combination of a property name and its corresponding value, are represented as 2-arrays.
Internal IDs are returned with nodes and relations, but these IDs are not immutable. After entities have been deleted, higher IDs may be migrated to the vacated lower IDs.
Nodes
The node representation contains 3 top-level elements:
- The node's internal ID.
- Any labels associated with the node.
- The key-value pairs of all properties the node possesses.
[
[“id”, ID (integer)]
[“labels”,
[label (string) X label count]
]
[properties”, [
[prop_key (string), prop_val (scalar)] X property count]
]
]
Relations
The relation representation contains 5 top-level elements:
- The relation's internal ID.
- The type associated with the relation.
- The source node's internal ID.
- The destination node's internal ID.
- The key-value pairs of all properties the relation possesses.
[
[“id”, ID (integer)]
[“type”, type (string)]
[“src_node”, source node ID (integer)]
[“dest_node”, destination node ID (integer)]
[properties”, [
[prop_key (string), prop_val (scalar)] X property count]
]
]
Collections
Arrays
When array values are specified in a RETURN clause, the representation in the response is the array string representation. This is done solely for better readability of the response.
The string representation of an array which contains graph entities, will print only their ID. A node string representation is round braces around its ID, and edge string representation is brackets around the edge ID.
Paths
Returned path value is the string representation of an array with the path's nodes and edges, interleaved.
Example
Given the graph created by the command:
CREATE (:person {name:'Pam', age:27})-[:works {since: 2010}]->(:employer {name:'Dunder Mifflin'})""
We can run a query that returns a node, a relation, and a scalar value.
"MATCH (n1)-[r]->(n2) RETURN n1, r, n2.name"
1) 1) "n1"
2) "r"
3) "n2.name"
2) 1) 1) 1) 1) "id"
2) (integer) 0
2) 1) "labels"
2) 1) "person"
3) 1) "properties"
2) 1) 1) "age"
2) (integer) 27
2) 1) "name"
2) "Pam"
2) 1) 1) "id"
2) (integer) 0
2) 1) "type"
2) "works"
3) 1) "src_node"
2) (integer) 0
4) 1) "dest_node"
2) (integer) 1
5) 1) "properties"
2) 1) 1) "since"
2) (integer) 2010
3) "Dunder Mifflin"
3) 1) "Query internal execution time: 1.858986 milliseconds"
4.3 - Implementation details for the GRAPH.BULK endpoint
The RedisGraph bulk loader uses the GRAPH.BULK endpoint to build a new graph from 1 or more Redis queries. The bulk of these queries is binary data that is unpacked to create nodes, edges, and their properties. This endpoint could be used to write bespoke import tools for other data formats using the implementation details provided here.
Caveats
The main complicating factor in writing bulk importers is that Redis has a maximum string length of 512 megabytes and a default maximum query size of 1 gigabyte. As such, large imports must be written incrementally.
The RedisGraph team will do their best to ensure that future updates to this logic do not break current implementations, but cannot guarantee it.
GRAPH.BULK [graph name] ["BEGIN"] [node count] [edge count] ([binary blob] * N)
Arguments
graph name
The name of the graph to be inserted.
BEGIN
The endpoint cannot be used to update existing graphs, only to create new ones. For this reason, the first query in a sequence of BULK commands should pass the string literal "BEGIN".
node count
Number of nodes being inserted in this query.
edge count
Number of edges being inserted in this query.
binary blob
A binary string of up to 512 megabytes that partially or completely describes a single label or relationship type.
Any number of these blobs may be provided in a query provided that Redis's 1-gigabyte query limit is not exceeded.
Module behavior
The endpoint will parse binary blobs as nodes until the number of created nodes matches the node count, then will parse subsequent blobs as edges. The import tool is expected to correctly provide these counts.
If the BEGIN
token is found, the module will verify that the graph key is unused, and will emit an error if it is. Otherwise, the partially-constructed graph will be retrieved in order to resume building.
Nodes in node blobs do not need to specify their IDs. The ID of each node is an 8-byte unsigned integer corresponding to the node count at the time of its creation. (The first-created node has the ID of 0, the second has 1, and so forth.)
The blob consists of:
header specification
1 or more property specifications
The import tool is responsible for tracking the IDs of nodes used as edge endpoints.
The blob consists of:
header specification
1 or more:
- 8-byte unsigned integer representing source node ID
- 8-byte unsigned integer representing destination node ID
- property specification
name
- A null-terminated string representing the name of the label or relationship type.
property count
- A 4-byte unsigned integer representing the number of properties each entry in this blob possesses.
property names
- an ordered sequence of property count
null-terminated strings, each representing the name for the property at that position.
Property specification
property type
- A 1-byte integer corresponding to the TYPE enum:
BI_NULL = 0,
BI_BOOL = 1,
BI_DOUBLE = 2,
BI_STRING = 3,
BI_LONG = 4,
BI_ARRAY = 5,
property
:- 1-byte true/false if type is boolean
- 8-byte double if type is double
- 8-byte integer if type is integer
- Null-terminated C string if type is string
- 8-byte array length followed by N values of this same type-property pair if type is array
Redis Reply
Redis will reply with a string of the format:
[N] nodes created, [M] edges created
5 - RedisGraph Data Types
RedisGraph supports a number of distinct data types, some of which can be persisted as property values and some of which are ephemeral.
Graph types
All graph types are either structural elements of the graph or projections thereof. None can be stored as a property value.
Nodes
Nodes are persistent graph elements that can be connected to each other via relationships.
They can have any number of labels that describe their general type. For example, a node representing London may be created with the Place
and City
labels and retrieved by queries using either or both of them.
Nodes have sets of properties to describe all of their salient characteristics. For example, our London node may have the property set: {name: 'London', capital: True, elevation: 11}
.
When querying nodes, multiple labels can be specified. Only nodes that hold all specified labels will be matched:
$ redis-cli GRAPH.QUERY G "MATCH (n:Place:Continent) RETURN n"
Relationships
Relationships are persistent graph elements that connect one node to another.
They must have exactly one type that describes what they represent. For example, a RESIDENT_OF
relationship may be used to connect a Person
node to a City
node.
Relationships are always directed, connecting a source node to its destination.
Like nodes, relationships have sets of properties to describe all of their salient characteristics.
When querying relationships, multiple types can be specified when separated by types. Relationships that hold any of the specified types will be matched:
$ redis-cli GRAPH.QUERY G "MATCH (:Person)-[r:RESIDENT_OF|:VISITOR_TO]->(:Place {name: 'London'}) RETURN r"
Paths
Paths are alternating sequences of nodes and edges, starting and ending with a node.
They are not structural elements in the graph, but can be created and returned by queries.
For example, the following query returns all paths of any length connecting the node London to the node New York:
$ redis-cli GRAPH.QUERY G "MATCH p=(:City {name: 'London'})-[*]->(:City {name: 'New York'}) RETURN p"
Scalar types
All scalar types may be provided by queries or stored as property values on node and relationship objects.
Strings
RedisGraph strings are Unicode character sequences. When using Redis with a TTY (such as invoking RedisGraph commands from the terminal via redis-cli
), some code points may not be decoded, as in:
$ redis-cli GRAPH.QUERY G "RETURN '日本人' as stringval"
1) 1) "stringval"
2) 1) 1) "\xe6\x97\xa5\xe6\x9c\xac\xe4\xba\xba"
Output decoding can be forced using the --raw
flag:
$ redis-cli --raw GRAPH.QUERY G "RETURN '日本人' as stringval"
stringval
日本人
Booleans
Boolean values are specified as true
or false
. Internally, they are stored as numerics, with 1 representing true and 0 representing false. As RedisGraph considers types in its comparisons, 1 is not considered equal to true
:
$ redis-cli GRAPH.QUERY G "RETURN 1 = true"
1) 1) "1 = true"
2) 1) 1) "false"
Integers
All RedisGraph integers are treated as 64-bit signed integers.
Floating-point values
All RedisGraph floating-point values are treated as 64-bit signed doubles.
Geospatial Points
The Point data type is a set of latitude/longitude coordinates, stored within RedisGraph as a pair of 32-bit floats. It is instantiated using the point() function call.
Nulls
In RedisGraph, null
is used to stand in for an unknown or missing value.
Since we cannot reason broadly about unknown values, null
is an important part of RedisGraph's 3-valued truth table. For example, the comparison null = null
will evaluate to null
, as we lack adequate information about the compared values. Similarly, null in [1,2,3]
evaluates to null
, since the value we're looking up is unknown.
Unlike all other scalars, null
cannot be stored as a property value.
Collection types
Arrays
Arrays are ordered lists of elements. They can be provided as literals or generated by functions like collect()
. Nested arrays are supported, as are many functions that operate on arrays such as list comprehensions.
Arrays can be stored as property values provided that no array element is of an unserializable type, such as graph entities or null
values.
Maps
Maps are order-agnostic collections of key-value pairs. If a key is a string literal, the map can be accessed using dot notation. If it is instead an expression that evaluates to a string literal, bracket notation can be used:
$ redis-cli GRAPH.QUERY G "WITH {key1: 'stringval', key2: 10} AS map RETURN map.key1, map['key' + 2]"
1) 1) "map.key1"
2) "map['key' + 2]"
2) 1) 1) "stringval"
2) (integer) 10
This aligns with way that the properties of nodes and relationships can be accessed.
Maps cannot be stored as property values.
Map projections
Maps can be constructed as projections using the syntax alias {.key1 [, ...n]}
. This can provide a useful format for returning graph entities. For example, given a graph with the node (name: 'Jeff', age: 32)
, we can build the projection:
$ redis-cli GRAPH.QUERY G "MATCH (n) RETURN n {.name, .age} AS projection"
1) 1) "projection"
2) 1) 1) "{name: Jeff, age: 32}"
Function calls in map values
The values in maps and map projections are flexible, and can generally refer either to constants or computed values:
$ redis-cli GRAPH.QUERY G "RETURN {key1: 'constant', key2: rand(), key3: toLower('GENERATED') + '_string'} AS map"
1) 1) "map"
2) 1) 1) "{key1: constant, key2: 0.889656, key3: generated_string}"
The exception to this is aggregation functions, which must be computed in a preceding WITH
clause instead of being invoked within the map. This restriction is intentional, as it helps to clearly disambiguate the aggregate function calls and the key values they are grouped by:
$ redis-cli GRAPH.QUERY G "
MATCH (follower:User)-[:FOLLOWS]->(u:User)
WITH u, COUNT(follower) AS count
RETURN u {.name, follower_count: count} AS user"
1) 1) "user"
2) 1) 1) "{name: Jeff, follower_count: 12}"
2) 1) "{name: Roi, follower_count: 18}"
6 - Cypher Coverage
RedisGraph implements a subset of the Cypher language, which is growing as development continues. This document is based on the Cypher Query Language Reference (version 9), available at
OpenCypher Resources.
Patterns
Patterns are fully supported.
Types
Structural types
- Nodes
- Relationships
- Path variables (alternating sequence of nodes and relationships).
Composite types
- Temporal types (Date, DateTime, LocalDateTime, Time, LocalTime, Duration)
Literal types
- Hexadecimal and octal numerics
Other
NULL is supported as a representation of a missing or undefined value.
Comparability, equality, orderability, and equivalence
This is a somewhat nebulous area in Cypher itself, with a lot of edge cases.
Broadly speaking, RedisGraph behaves as expected with string and numeric values.
There are likely some behaviors involving the numerics NaN, -inf, inf, and possibly -0.0 that deviate from the Cypher standard.
We do not support any of these properties at the type level, meaning nodes and relationships are not internally comparable.
Clauses
Reading Clauses
Projecting Clauses
Reading sub-clauses
Writing Clauses
CREATE
DELETE
- We actually implement DETACH DELETE, the distinction being that relationships invalidated by node deletions are automatically deleted.
SET
Unsupported:
- REMOVE (to modify properties)
- Properties can be deleted with SET [prop] = NULL.
Reading/Writing Clauses
Set Operations
Functions
Scalar functions
- Some casting functions (toBoolean, toFloat)
- Temporal arithmetic functions
- Functions returning maps (properties)
Aggregating functions
- avg
- collect
- count
- max
- min
- percentileCont
- percentileDist
- stDev
- stDevP
- sum
List functions
Math functions - numeric
- abs
- ceil
- floor
- sign
- round
- rand
- toInteger
String functions
left
right
trim
lTrim
rTrim
reverse
substring
toLower
toUpper
toString
Unsupported:
Predicate functions
Expression functions
Geospatial functions
Unsupported function classes
- Logarithmic math functions
- Trigonometric math functions
- User-defined functions
Operators
Mathematical operators
Multiplication, addition, subtraction, division, modulo
Unsupported:
String operators
Boolean operators
Parameters
Parameters may be specified to allow for more flexible query construction:
CYPHER name_param = "Niccolò Machiavelli" birth_year_param = 1469 MATCH (p:Person {name: $name_param, birth_year: $birth_year_param}) RETURN p
The example above shows the syntax used by redis-cli
to set parameters, but
each RedisGraph client introduces a language-appropriate method for setting parameters,
and is described in their documentation.
Non-Cypher queries
- RedisGraph provides the
GRAPH.EXPLAIN
command to print the execution plan of a provided query. GRAPH.DELETE
will remove a graph and all Redis keys associated with it.
- We do not currently provide support for queries that retrieve schemas, though the LABELS and TYPE scalar functions may be used to get a graph overview.
7 - References
Quick start
8 - Known limitations
Relationship uniqueness in patterns
When a relation in a match pattern is not referenced elsewhere in the query, RedisGraph will only verify that at least one matching relation exists (rather than operating on every matching relation).
In some queries, this will cause unexpected behaviors. Consider a graph with 2 nodes and 2 relations between them:
CREATE (a)-[:e {val: '1'}]->(b), (a)-[:e {val: '2'}]->(b)
Counting the number of explicit edges returns 2, as expected.
MATCH (a)-[e]->(b) RETURN COUNT(e)
However, if we count the nodes in this pattern without explicitly referencing the relation, we receive a value of 1.
MATCH (a)-[e]->(b) RETURN COUNT(b)
We are researching designs that resolve this problem without negatively impacting performance. As a temporary workaround, queries that must operate on every relation matching a pattern should explicitly refer to that relation's alias elsewhere in the query. Two options for this are:
MATCH (a)-[e]->(b) WHERE ID(e) >= 0 RETURN COUNT(b)
MATCH (a)-[e]->(b) RETURN COUNT(b), e.dummyval
LIMIT clause does not affect eager operations
When a WITH or RETURN clause introduces a LIMIT value, this value ought to be respected by all preceding operations.
For example, given the query:
UNWIND [1,2,3] AS value CREATE (a {property: value}) RETURN a LIMIT 1
One node should be created with its 'property' set to 1. RedisGraph will currently create three nodes, and only return the first.
This limitation affects all eager operations: CREATE, SET, DELETE, MERGE, and projections with aggregate functions.
Indexing limitations
One way in which RedisGraph will optimize queries is by introducing index scans when a filter is specified on an indexed label-property pair.
The current index implementation, however, does not handle not-equal (<>
) filters.
To profile a query and see whether index optimizations have been introduced, use the GRAPH.EXPLAIN
endpoint:
$ redis-cli GRAPH.EXPLAIN social "MATCH (p:person) WHERE p.id < 5 RETURN p"
1) "Results"
2) " Project"
3) " Index Scan | (p:person)"