<code>epoll</code> is a Linux kernel system call for a scalable I/O event notification mechanism, first introduced in version 2.5.45 of the Linux kernel in October 2002. Its function is to monitor multiple file descriptors to see whether I/O is possible on any of them. It is meant to replace the older POSIX <code>select(2)</code> and <code>poll(2)</code> system calls and to achieve better performance in more demanding applications, where the number of watched file descriptors is large. The older system calls operate in O(n) time in the number of watched descriptors, because every call passes and scans the whole set, whereas <code>epoll</code> operates in O(1) time with respect to that number.
<code>epoll</code> is similar to FreeBSD's <code>kqueue</code>, in that it consists of a set of user-space functions, each taking a file descriptor argument that denotes the configurable kernel object on which they cooperatively operate. <code>epoll</code> uses a red–black tree (RB-tree) data structure to keep track of all file descriptors that are currently being monitored.
<code>int epoll_create1(int flags);</code> creates an <code>epoll</code> object and returns its file descriptor. The <code>flags</code> parameter allows epoll behavior to be modified; it may be 0 or <code>EPOLL_CLOEXEC</code>, which sets the close-on-exec flag on the new descriptor. <code>epoll_create()</code> is an older variant of <code>epoll_create1()</code> and has been superseded since <code>epoll_create1()</code> was added in Linux kernel version 2.6.27 and glibc version 2.9.
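A minimal C sketch of creating an instance, assuming Linux with glibc 2.9 or later. The helper names <code>make_epoll</code> and <code>has_cloexec</code> are hypothetical; the real interfaces used are <code>epoll_create1</code>, <code>EPOLL_CLOEXEC</code> and <code>fcntl(F_GETFD)</code>. The second helper only checks that <code>EPOLL_CLOEXEC</code> set the descriptor's close-on-exec flag.

```c
#define _GNU_SOURCE /* request POSIX/Linux declarations under strict -std modes */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance whose descriptor is closed
   automatically when the process calls execve(2).
   Returns the new descriptor, or -1 on error. */
int make_epoll(void)
{
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1)
        perror("epoll_create1");
    return epfd;
}

/* Hypothetical helper: report whether FD_CLOEXEC is set on fd
   (1 = set, 0 = not set, -1 = fcntl error). */
int has_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```

The descriptor returned by <code>make_epoll</code> is an ordinary file descriptor and should be released with <code>close</code> when it is no longer needed.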
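<code>int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);</code> controls (configures) which file descriptors are watched by this object, and for which events. <code>op</code> can be <code>EPOLL_CTL_ADD</code>, <code>EPOLL_CTL_MOD</code> or <code>EPOLL_CTL_DEL</code>, to add, modify or delete the registration of <code>fd</code>; for the add and modify operations, <code>event</code> specifies the events of interest.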
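The three operations can be sketched as thin wrappers, assuming a pipe's read end as the watched descriptor in the usage below. The names <code>watch_fd</code>, <code>rewatch_fd</code> and <code>unwatch_fd</code> are hypothetical. Each stores the descriptor in the <code>data.fd</code> member of <code>struct epoll_event</code>, so that <code>epoll_wait</code> can later report which descriptor became ready.

```c
#define _GNU_SOURCE /* request POSIX/Linux declarations under strict -std modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical wrapper: start watching fd for the given event mask
   (e.g. EPOLLIN). Fails with EEXIST if fd is already registered. */
int watch_fd(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events = events;
    ev.data.fd = fd; /* returned unchanged by epoll_wait() */
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Hypothetical wrapper: replace the event mask of a registered fd. */
int rewatch_fd(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = {0};
    ev.events = events;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Hypothetical wrapper: stop watching fd. Since Linux 2.6.9 the event
   argument is ignored for EPOLL_CTL_DEL and may be NULL. */
int unwatch_fd(int epfd, int fd)
{
    return epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
}
```

Registering the same descriptor twice, or deleting one that is not registered, makes <code>epoll_ctl</code> return -1 (with <code>errno</code> set to <code>EEXIST</code> or <code>ENOENT</code> respectively).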
<code>int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);</code> waits for any of the events registered with <code>epoll_ctl</code> until at least one occurs or the timeout elapses. It stores the events that occurred in <code>events</code>, up to <code>maxevents</code> at once, and returns their count. <code>maxevents</code> is the maximum number of <code>epoll_event</code> entries returned by one call, not the number of file descriptors being monitored; in most cases it is set to the number of elements in the <code>events</code> array. <code>timeout</code> is given in milliseconds; -1 blocks indefinitely and 0 returns immediately.
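The sketch below passes the capacity of the <code>events</code> array as <code>maxevents</code>. The helper name <code>wait_once</code> and the array size of 16 are arbitrary assumptions. If more descriptors are ready than fit in the array, the remaining ones are reported by subsequent calls rather than lost.

```c
#define _GNU_SOURCE /* request POSIX/Linux declarations under strict -std modes */
#include <assert.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 16 /* hypothetical capacity of the events array */

/* Hypothetical helper: wait up to timeout_ms milliseconds (-1 = forever,
   0 = just poll) and print which registered descriptors are readable.
   Returns the number of ready descriptors, or -1 on error. */
int wait_once(int epfd, int timeout_ms)
{
    struct epoll_event events[MAX_EVENTS];

    /* maxevents is the array capacity, not the number of watched fds. */
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    if (n == -1) {
        perror("epoll_wait");
        return -1;
    }
    assert(n <= MAX_EVENTS); /* the kernel never returns more than asked */

    for (int i = 0; i < n; i++) {
        /* data.fd holds whatever was stored at EPOLL_CTL_ADD time. */
        if (events[i].events & EPOLLIN)
            printf("fd %d is readable\n", events[i].data.fd);
    }
    return n;
}
```

A typical event loop calls such a function repeatedly with a timeout of -1 and dispatches on <code>data.fd</code> (or on a pointer stored in <code>data.ptr</code>).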
<code>epoll</code> provides both edge-triggered and level-triggered modes; level-triggered is the default, and edge-triggered mode is requested per descriptor with the <code>EPOLLET</code> flag. In edge-triggered mode, a call to <code>epoll_wait</code> returns only when a new event is enqueued with the <code>epoll</code> object, while in level-triggered mode, <code>epoll_wait</code> returns as long as the condition holds.
For instance, if a pipe registered with <code>epoll</code> has received data, a call to <code>epoll_wait</code> will return, signaling the presence of data to be read. Suppose the reader consumes only part of the data in the buffer. In level-triggered mode, further calls to <code>epoll_wait</code> will return immediately, as long as the pipe's buffer contains data to be read. In edge-triggered mode, however, <code>epoll_wait</code> will return only once new data is written to the pipe.
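The pipe scenario can be reproduced with the following sketch. The function name <code>second_wait_after_partial_read</code> is hypothetical. It writes four bytes, lets a first <code>epoll_wait</code> report them, reads only one byte, and then polls again with a zero timeout. Assuming nothing else touches the pipe, the second call should return 1 in level-triggered mode (<code>mode</code> = 0) and 0 in edge-triggered mode (<code>mode</code> = <code>EPOLLET</code>).

```c
#define _GNU_SOURCE /* request POSIX/Linux declarations under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo: returns the result of the second epoll_wait() call
   after a partial read, or -1 if any setup step fails. */
int second_wait_after_partial_read(uint32_t mode)
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int epfd = epoll_create1(0);
    int result = -1;
    struct epoll_event ev = { .events = EPOLLIN | mode, .data.fd = p[0] };
    struct epoll_event ready;
    char byte;

    if (epfd != -1 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "data", 4) == 4 &&
        epoll_wait(epfd, &ready, 1, 0) == 1 && /* first wait: data arrived */
        read(p[0], &byte, 1) == 1) {           /* consume 1 of the 4 bytes */
        /* Level-triggered: the fd is reported again because 3 bytes remain.
           Edge-triggered: nothing is reported, since no new data was
           written after the previous notification. */
        result = epoll_wait(epfd, &ready, 1, 0);
    }

    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return result;
}
```

This is why edge-triggered consumers are normally written with non-blocking descriptors and keep reading until <code>read</code> fails with <code>EAGAIN</code>; otherwise leftover data can go unnoticed until more arrives.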
Bryan Cantrill pointed out that <code>epoll</code> had mistakes that could have been avoided, had it learned from its predecessors: input/output completion ports, event ports (Solaris) and kqueue. However, a large part of his criticism was addressed by <code>epoll</code>'s <code>EPOLLONESHOT</code> and <code>EPOLLEXCLUSIVE</code> options. <code>EPOLLONESHOT</code> was added in version 2.6.2 of the Linux kernel mainline, released in February 2004. <code>EPOLLEXCLUSIVE</code> was added in version 4.5, released in March 2016.
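<code>EPOLLONESHOT</code> makes the kernel deliver at most one event for a registration and then disable it until the caller re-arms it with <code>EPOLL_CTL_MOD</code>; this is commonly used so that a ready descriptor is handed to only one worker thread at a time. The sketch below (helper names <code>rearm_oneshot</code> and <code>oneshot_demo</code> are hypothetical) leaves one unread byte in a pipe, so the read end stays readable throughout, and records three zero-timeout <code>epoll_wait</code> results: reported once, then not at all, then again after re-arming.

```c
#define _GNU_SOURCE /* request POSIX/Linux declarations under strict -std modes */
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: re-enable a one-shot registration that has already
   delivered its event. EPOLLONESHOT must be included again in the mask. */
int rearm_oneshot(int epfd, int fd, uint32_t events)
{
    struct epoll_event ev = { .events = events | EPOLLONESHOT, .data.fd = fd };
    return epoll_ctl(epfd, EPOLL_CTL_MOD, fd, &ev);
}

/* Hypothetical demo: fills counts[0..2] with the results of three
   non-blocking epoll_wait() calls. Returns 0 on success, -1 on error. */
int oneshot_demo(int counts[3])
{
    int p[2];
    if (pipe(p) == -1)
        return -1;

    int epfd = epoll_create1(0);
    int rc = -1;
    struct epoll_event ev = { .events = EPOLLIN | EPOLLONESHOT, .data.fd = p[0] };
    struct epoll_event ready;

    if (epfd != -1 &&
        epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&
        write(p[1], "x", 1) == 1) {                     /* byte is never read */
        counts[0] = epoll_wait(epfd, &ready, 1, 0);     /* delivered once */
        counts[1] = epoll_wait(epfd, &ready, 1, 0);     /* disabled: none */
        if (rearm_oneshot(epfd, p[0], EPOLLIN) == 0) {
            counts[2] = epoll_wait(epfd, &ready, 1, 0); /* re-armed */
            rc = 0;
        }
    }

    if (epfd != -1)
        close(epfd);
    close(p[0]);
    close(p[1]);
    return rc;
}
```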