Epoll and io_uring: a comparison for asynchronous I/O in Linux

Linux offers two main interfaces for high-performance asynchronous input/output (I/O): epoll and io_uring. They differ substantially in architecture and in how many system calls each I/O operation costs. Understanding both helps developers build applications that perform and scale well.

The legacy of epoll

Epoll, introduced in the Linux kernel in 2002, has been the cornerstone of event-driven I/O on Linux for many years. When it emerged, it was the first Linux interface that scaled efficiently to large numbers of concurrent connections. Unlike select and poll, which pass the full set of watched descriptors on every call and scan all of them, epoll keeps the interest list in the kernel. It lets an application monitor many file descriptors and learn which ones can perform I/O without blocking.

Epoll's operating model is straightforward. After an epoll instance has been created with epoll_create1, the application uses three kinds of syscalls:

1. A registration syscall, epoll_ctl, to register interest in a file descriptor.
2. A waiting syscall, epoll_wait, which blocks until one or more registered descriptors are ready.
3. An I/O syscall, such as read or write, to perform the actual operation once a descriptor is reported ready.

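Registration happens once per descriptor. After that, every I/O operation still costs two system calls: an epoll_wait and a read or write. A single epoll_wait can report many ready descriptors at once, which spreads the cost of waiting across them. Each transfer is still its own syscall, however, and every syscall pays for a transition between user and kernel mode. Under heavy load, that per-operation overhead adds up.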
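The sketch below illustrates that cost model, using pipes as stand-ins for network connections. The helper name drain_ready_pipes and the count NPIPES are illustrative choices, not part of any API. Each descriptor is registered once with epoll_ctl, and a single epoll_wait (timeout 0) reports every ready descriptor. Each byte is still fetched with its own read.

```c
#include <sys/epoll.h>
#include <unistd.h>

#define NPIPES 4  /* illustrative number of "connections" */

/* Hypothetical helper: returns the number of descriptors that one
 * epoll_wait() reported and drained, or -1 on error. */
int drain_ready_pipes(void) {
    int fds[NPIPES][2];
    struct epoll_event events[NPIPES];
    int opened = 0, n = -1;

    int epfd = epoll_create1(0);
    if (epfd < 0)
        return -1;

    while (opened < NPIPES) {
        if (pipe(fds[opened]) < 0)
            goto out;
        int rfd = fds[opened][0], wfd = fds[opened][1];
        opened++;  /* this pair must be closed on every exit path */
        /* Registration: one epoll_ctl() per descriptor, done once. */
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = rfd };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) < 0)
            goto out;
        /* Make the read end ready by writing one byte into the pipe. */
        if (write(wfd, "x", 1) != 1)
            goto out;
    }

    /* Waiting: a single epoll_wait() reports every ready descriptor. */
    n = epoll_wait(epfd, events, NPIPES, 0);
    for (int i = 0; i < n; i++) {
        char c;
        /* I/O: still one read() syscall per ready descriptor. */
        if (read(events[i].data.fd, &c, 1) != 1) {
            n = -1;
            break;
        }
    }

out:
    for (int i = 0; i < opened; i++) {
        close(fds[i][0]);
        close(fds[i][1]);
    }
    close(epfd);
    return n;
}
```

Because every pipe already holds data when epoll_wait runs, the call should return NPIPES. That is one wait syscall and NPIPES read syscalls.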

Introducing io_uring

About 17 years later, in 2019, io_uring arrived with Linux 5.1 and marked a significant evolution in Linux's asynchronous I/O capabilities. It replaces a readiness model, which reports when a file descriptor can perform I/O without blocking, with a completion model, which reports when an I/O operation has actually finished.

io_uring uses two ring buffers, a submission queue (SQ) and a completion queue (CQ), both memory-mapped and shared between user space and the kernel. The application places requests in the SQ and reads results from the CQ directly in shared memory. This reduces the number of syscalls and lets many operations be batched together.
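A simplified, single-threaded model of one such ring may help make the design concrete. It is a hypothetical illustration, and struct toy_ring, ring_push and ring_pop are invented names, not the kernel's actual layout. The real rings are shared through mmap() and need memory barriers between producer and consumer, which this model omits. What it does share with io_uring is the indexing scheme: free-running unsigned head and tail counters, masked by (entries - 1) to find a slot in a power-of-two array.

```c
#include <stdbool.h>
#include <stdint.h>

#define RING_ENTRIES 8u                 /* must be a power of two */
#define RING_MASK (RING_ENTRIES - 1)

/* Hypothetical single-producer/single-consumer ring (no concurrency). */
struct toy_ring {
    uint32_t head;                      /* next entry the consumer reads */
    uint32_t tail;                      /* next slot the producer fills */
    uint64_t entries[RING_ENTRIES];
};

/* Producer side: returns false when the ring is full. Unsigned
 * subtraction keeps tail - head correct even after the counters wrap. */
static bool ring_push(struct toy_ring *r, uint64_t value) {
    if (r->tail - r->head == RING_ENTRIES)
        return false;
    r->entries[r->tail & RING_MASK] = value;
    r->tail++;                          /* publish the new entry */
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_pop(struct toy_ring *r, uint64_t *out) {
    if (r->head == r->tail)
        return false;
    *out = r->entries[r->head & RING_MASK];
    r->head++;                          /* release the slot */
    return true;
}
```

For the submission queue, the application is the producer and the kernel the consumer. For the completion queue, the roles are reversed.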

io_uring does not need several syscalls per I/O operation. An application can submit many requests with a single io_uring_enter() syscall. With the IORING_SETUP_SQPOLL setup flag, it can run almost syscall-free during normal operation, because a dedicated kernel thread continually polls the submission queue and picks up new entries as they appear. The trade-off is CPU time spent polling. After a configurable idle period the thread goes to sleep, and the application must then wake it with io_uring_enter().

Performance comparison: epoll vs. io_uring

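io_uring's advantage over epoll tends to show up most in workloads with many concurrent operations, where batching and fewer syscalls matter. io_uring requires Linux 5.1 or later, and many of its operations were added in subsequent releases. The shift from readiness to completion also changes how applications are structured. Instead of asking when a descriptor is ready and then performing the I/O themselves, they hand the whole operation to the kernel and collect the result afterwards.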

The difference between the two approaches is easiest to see in code. First, here is epoll reading from standard input:


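#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

int main(void) {
    /* One-time setup: create an epoll instance and register stdin. */
    int epfd = epoll_create1(0);
    if (epfd < 0) { perror("epoll_create1"); return 1; }
    struct epoll_event event = { .events = EPOLLIN, .data.fd = STDIN_FILENO };
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, STDIN_FILENO, &event) < 0) {
        perror("epoll_ctl");  /* fails with EPERM if stdin is a regular file */
        return 1;
    }

    char buf[100];
    for (;;) {
        /* Syscall 1 per iteration: block until stdin is readable. */
        if (epoll_wait(epfd, &event, 1, -1) < 0) { perror("epoll_wait"); break; }
        /* Syscall 2 per iteration: perform the actual read. */
        ssize_t n = read(STDIN_FILENO, buf, sizeof(buf));
        if (n <= 0) break;  /* 0 = EOF, negative = error */
        /* ... process n bytes of buf here ... */
    }
    close(epfd);
    return 0;
}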

This snippet creates an epoll instance, registers standard input, and then loops. On each pass it waits for input to become readable and then reads it. Setup costs two syscalls (epoll_create1 and epoll_ctl), and each loop iteration costs two more: epoll_wait and read.

In contrast, here is the same logic using io_uring through liburing, its user-space helper library:


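#include <liburing.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void) {
    struct io_uring ring;
    /* Set up the shared submission and completion rings (256 entries). */
    int ret = io_uring_queue_init(256, &ring, 0);
    if (ret < 0) { fprintf(stderr, "io_uring_queue_init: %s\n", strerror(-ret)); return 1; }

    char buf[100];
    for (;;) {
        /* Take a free submission queue entry; NULL means the queue is full. */
        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        if (!sqe) break;
        /* Offset -1: read from (and advance) the current file position, like read(2). */
        io_uring_prep_read(sqe, STDIN_FILENO, buf, sizeof(buf), -1);
        /* Submit and wait for one completion in a single io_uring_enter(). */
        if (io_uring_submit_and_wait(&ring, 1) < 0) break;

        struct io_uring_cqe *cqe;
        if (io_uring_wait_cqe(&ring, &cqe) < 0) break;
        int n = cqe->res;                /* bytes read, or -errno */
        io_uring_cqe_seen(&ring, cqe);   /* hand the CQE slot back to the kernel */
        if (n <= 0) break;               /* 0 = EOF, negative = error */
        /* ... process n bytes of buf here ... */
    }
    io_uring_queue_exit(&ring);
    return 0;
}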

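Each iteration of this loop can be reduced to a single io_uring_enter() syscall, because liburing's io_uring_submit_and_wait() submits a request and waits for its completion in one call. The epoll version, by comparison, needs epoll_wait plus read. The bigger gain comes from batching: an application can queue many requests and submit them all with one syscall. Two caveats apply. First, io_uring_get_sqe() returns NULL when the submission queue is full, so real code must handle that case. Second, the kernel performs the read asynchronously, so the buffer must stay valid until its completion has been reaped.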

Choosing the right tool for your project

Both epoll and io_uring have their advantages. Which one fits a project depends on how it weighs epoll's simplicity and ubiquity against the performance and tuning options io_uring offers.

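If your target systems run kernel 5.1 or newer, io_uring is generally the stronger choice because of its lower syscall overhead. Its features arrived incrementally, though. IORING_OP_READ, used in the io_uring example above, requires 5.6, so check the minimum kernel version for every operation you rely on. Some environments also restrict io_uring for security reasons, for example through seccomp filters in container runtimes or the kernel.io_uring_disabled sysctl on newer kernels. For projects aimed at scalability or facing heavy I/O demands, io_uring can deliver faster response times and more efficient use of resources.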

Epoll remains relevant for legacy systems and for applications with strict compatibility requirements. For new projects that can target current Linux kernels, however, io_uring is the better option.