Note: this document was written by the creator of Redis, Salvatore Sanfilippo, early in the development of Redis (c. 2010), and does not necessarily reflect the latest Redis implementation.
Let us figure it out through a series of Q&As.
Q: What do you expect a network server to be doing all the time?
A: Watch for inbound connections on the port its listening and accept them.
Q: Calling [accept](http://man.cx/accept%282%29 accept) yields a descriptor. What do I do with it?
A: Save the descriptor and do a non-blocking read/write operation on it.
Q: Why does the read/write have to be non-blocking?
A: If the file operation ( even a socket in Unix is a file ) is blocking how could the server for example accept other connection requests when its blocked in a file I/O operation.
Q: I guess I have to do many such non-blocking operations on the socket to see when it's ready. Am I right?
A: Yes. That is what an event library does for you. Now you get it.
Q: How do Event Libraries do what they do?
A: They use the operating system's polling facility along with timers.
Q: So are there any open source event libraries that do what you just described?
A: Yes. libevent
and libev
are two such event libraries that I can recall off the top of my head.
Q: Does Redis use such open source event libraries for handling socket I/O?
A: No. For various reasons Redis uses its own event library.
Redis implements its own event library. The event library is implemented in ae.c
.
The best way to understand how the Redis event library works is to understand how Redis uses it.
initServer
function defined in redis.c
initializes the numerous fields of the redisServer
structure variable. One such field is the Redis event loop el
:
aeEventLoop *el
initServer
initializes server.el
field by calling aeCreateEventLoop
defined in ae.c
. The definition of aeEventLoop
is below:
typedef struct aeEventLoop
{
int maxfd;
long long timeEventNextId;
aeFileEvent events[AE_SETSIZE]; /* Registered events */
aeFiredEvent fired[AE_SETSIZE]; /* Fired events */
aeTimeEvent *timeEventHead;
int stop;
void *apidata; /* This is used for polling API specific data */
aeBeforeSleepProc *beforesleep;
} aeEventLoop;
aeCreateEventLoop
aeCreateEventLoop
first malloc
s aeEventLoop
structure then calls ae_epoll.c:aeApiCreate
.
aeApiCreate
malloc
s aeApiState
that has two fields - epfd
that holds the epoll
file descriptor returned by a call from epoll_create
and events
that is of type struct epoll_event
define by the Linux epoll
library. The use of the events
field will be described later.
Next is ae.c:aeCreateTimeEvent
. But before that initServer
call anet.c:anetTcpServer
that creates and returns a listening descriptor. The descriptor listens on port 6379 by default. The returned listening descriptor is stored in server.fd
field.
aeCreateTimeEvent
aeCreateTimeEvent
accepts the following as parameters:
eventLoop
: This is server.el
in redis.c
proc
: Function pointer. Stores the address of the function that has to be called after the timer expires.clientData
: Mostly NULL
.finalizerProc
: Pointer to the function that has to be called before the timed event is removed from the list of timed events.initServer
calls aeCreateTimeEvent
to add a timed event to timeEventHead
field of server.el
. timeEventHead
is a pointer to a list of such timed events. The call to aeCreateTimeEvent
from redis.c:initServer
function is given below:
aeCreateTimeEvent(server.el /*eventLoop*/, 1 /*milliseconds*/, serverCron /*proc*/, NULL /*clientData*/, NULL /*finalizerProc*/);
redis.c:serverCron
performs many operations that helps keep Redis running properly.
aeCreateFileEvent
The essence of aeCreateFileEvent
function is to execute epoll_ctl
system call which adds a watch for EPOLLIN
event on the listening descriptor create by anetTcpServer
and associate it with the epoll
descriptor created by a call to aeCreateEventLoop
.
Following is an explanation of what precisely aeCreateFileEvent
does when called from redis.c:initServer
.
initServer
passes the following arguments to aeCreateFileEvent
:
server.el
: The event loop created by aeCreateEventLoop
. The epoll
descriptor is got from server.el
.server.fd
: The listening descriptor that also serves as an index to access the relevant file event structure from the eventLoop->events
table and store extra information like the callback function.AE_READABLE
: Signifies that server.fd
has to be watched for EPOLLIN
event.acceptHandler
: The function that has to be executed when the event being watched for is ready. This function pointer is stored in eventLoop->events[server.fd]->rfileProc
.This completes the initialization of Redis event loop.
ae.c:aeMain
called from redis.c:main
does the job of processing the event loop that is initialized in the previous phase.
ae.c:aeMain
calls ae.c:aeProcessEvents
in a while loop that processes pending time and file events.
aeProcessEvents
ae.c:aeProcessEvents
looks for the time event that will be pending in the smallest amount of time by calling ae.c:aeSearchNearestTimer
on the event loop. In our case there is only one timer event in the event loop that was created by ae.c:aeCreateTimeEvent
.
Remember, that the timer event created by aeCreateTimeEvent
has probably elapsed by now because it had an expiry time of one millisecond. Since the timer has already expired, the seconds and microseconds fields of the tvp
timeval
structure variable is initialized to zero.
The tvp
structure variable along with the event loop variable is passed to ae_epoll.c:aeApiPoll
.
aeApiPoll
functions does an epoll_wait
on the epoll
descriptor and populates the eventLoop->fired
table with the details:
fd
: The descriptor that is now ready to do a read/write operation depending on the mask value.mask
: The read/write event that can now be performed on the corresponding descriptor.aeApiPoll
returns the number of such file events ready for operation. Now to put things in context, if any client has requested for a connection then aeApiPoll
would have noticed it and populated the eventLoop->fired
table with an entry of the descriptor being the listening descriptor and mask being AE_READABLE
.
Now, aeProcessEvents
calls the redis.c:acceptHandler
registered as the callback. acceptHandler
executes accept on the listening descriptor returning a connected descriptor with the client. redis.c:createClient
adds a file event on the connected descriptor through a call to ae.c:aeCreateFileEvent
like below:
if (aeCreateFileEvent(server.el, c->fd, AE_READABLE,
readQueryFromClient, c) == AE_ERR) {
freeClient(c);
return NULL;
}
c
is the redisClient
structure variable and c->fd
is the connected descriptor.
Next the ae.c:aeProcessEvent
calls ae.c:processTimeEvents
processTimeEvents
ae.processTimeEvents
iterates over list of time events starting at eventLoop->timeEventHead
.
For every timed event that has elapsed processTimeEvents
calls the registered callback. In this case it calls the only timed event callback registered, that is, redis.c:serverCron
. The callback returns the time in milliseconds after which the callback must be called again. This change is recorded via a call to ae.c:aeAddMilliSeconds
and will be handled on the next iteration of ae.c:aeMain
while loop.
That's all.