I/O multiplexing refers to the concept of processing multiple input/output events from a single event loop, with system calls like poll and select (Unix). –Wikipedia
Introduction
kqueue(on MacOS) and epoll(on Linux) are kernel system calls for scalable I/O event notification mechanisms in an efficient manner. In simple words, you subscribe to certain kernel events and you get notified when any of those events occur. These system calls are desigend for scalable situations such as a webserver where 10,000
concurrent connections are being handled by one server.
epoll/kqueue are replacements for their deprecated counterparts poll and select. Let’s take a look at those two and see why they are not suitable for today’s use cases.
select
In order to watch one file descriptor, say 777
, you need to write something like this in C:
When you call select
, you pass nfds = 778
which is the “number of file descriptors”. What select
does under the hood is, it simply loops through all the file descriptors from 0
to nfds - 1
. It checks if you have set them and if the desired event occured. So, the runtime is O(n)
where n
is the largest file descriptor you’re watching!
This line of the man page, explains it well:
The first nfds descriptors are checked in each set; i.e., the descriptors from 0 through nfds-1 in the descriptor sets are examined. (Example: If you have set two file descriptors “4” and “17”, nfds should not be “2”, but rather “17 + 1” or “18”.)
Aside from performance, a bigger problem is, using select
you may destroy your call stack and crash your process! Let’s again look at how we set our desired file descriptor:
fds
is the set of file descriptors we want to watch, and according to the man page:
The descriptor sets are stored as bit fields in arrays of integers.
It also later states that:
The default size FD_SETSIZE (currently 1024) is somewhat smaller than the current kernel limit to the number of open files.
Now, if you try to watch file descriptor 2000
, select
will loop over fds from 0
to 1999
and will read garbage. The bigger issue is when it tries to set results for a file descriptor past 1024
and tries to set that bit field in say readfds
, writefds
or errorfds
field. At this point it will write something random on the stack eventually crashing the process and making it very hard to debug what happened since your stack is randomized.
poll
After the introduction of select in 1983, poll was released in 1986 (and came to libc in linux in 1997) to address the shortcomings of select. poll solved the fd limit of 1024, introduced more flavours of events to watch and slight changes to the timeout in the api, such as using milliseconds instead of microseconds.
You can see that poll
’s api makes a lot more sense:
You pass in the list of fds you’re interested in, in a sparse format, and ndfs
is actually the number of fds you are interested in. Here’s the structure of pollfd
:
In order to watch fds 777
for read, you can simply do:
Although poll
is an improvement to select
, it still loops over the all the provided file descriptors hence runs in O(n)
where n
is the number of file descriptors we are watching.
Kqueue
Kqueue is a scalable event notification interface introduced in 2000. It provides a standard API for applications to register their interest in various events/conditions and have their notifications delivered efficiently. It is designed to be scalable, flexible, reliable and correct.
In order to understand how to use kqueue, let’s look at a few concepts.
kevent structure
A kevent is identified by an <ident, filter>
pair, where ident can be a file/socket descriptor and filter is the kernel filter used to process the respective event. There are some pre-defined system filters, such as EVFILT_READ
or EVFILT_WRITE
, that are triggered when data exists for read or write.
For exmaple, if we want to be notified when there is data available for reading in a socket, we need to specify a kevent in the form of <sckfd, EVFILT_READ>
.
flags
We use flags to add, delete, enable or disable a kevent from the queue (EV_ADD
, EV_DELETE
, EV_ENABLE
, EV_DISABLE
).
kqueue
The kqueue holds all the events we are interested in. So, to start we can simply create an empty kqueue.
kevent system call
We can use the kevent(...)
system call to:
- add/modify/delete events on the queue
- wait/read occured events
It also allows for setting a timeout but for now we put in NULL
which means no timeout (wait indefinitely).
Example usage:
Note: re-adding an existing event will modify the parameters of the original event and won’t result in a duplicate entry.
event loop
After creating the queue and registering our events, we wait for the events in an infinite loop. Once they occur, the call to kevent(...)
unblocks and we receive these events in evList
. We can iterate over these events and handle them however we wish. For example, if we receive a new connection from a client, we can accept the connection or if we receive new data, we can read it from the socket.
For a more practical use case, the next post Streaming Server Using Kqueue shows how to write a simple server using kqueue in only 100 lines in C!