We generally spend our time talking about databases, but occasionally run into fun technical challenges that seem worth sharing. Here’s something one of our newest team members (Paul Rubin) recently learned.
One of our upcoming features requires low-level, high-performance networking (more about the actual feature later…). We originally prototyped the feature in Python and exercised the new tool with a LOT of open TCP connections, which caused some weird crashes. The crashes seemed random, and it took us some time to uncover the underlying snag. The fix turned out to be fairly easy, as you might expect.
The snag we encountered was due to a Linux limitation that is poorly documented and not widely known. The limitation is in the underlying Linux select() system call and, as such, it applies to all programs (in whatever language) that use it.
The simplest way to listen to several sockets or pipes concurrently is with the select system call. In Python, select takes three lists of file or socket objects that tell it what to listen to. The Python documentation does not mention that the library translates these lists into bit vectors indexed by numeric file descriptor, a step implemented in C as part of the system interface. The bit vectors have a fixed size, set at compile time by the C library's FD_SETSIZE constant rather than tunable at run time, and on Linux that size is 1024 bits. Even if your select() call is listening on only one socket, if that socket’s .fileno() is 1024 or higher, select() cannot handle it and raises a ValueError.
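The limit can be reproduced without opening a thousand connections. The sketch below is only an illustration: the helper names socketpair_with_high_fd and select_accepts are made up for this post, and it assumes Linux, Python 3.7 or later, and a hard open-file limit above 1024. It duplicates one end of a socket pair onto a descriptor numbered 1024 or higher and then hands that single socket to select().

```python
import fcntl
import resource
import select
import socket

FD_LIMIT = 1024  # FD_SETSIZE on Linux with glibc


def socketpair_with_high_fd(min_fd=FD_LIMIT):
    """Hypothetical helper: return (high, peer) with high.fileno() >= min_fd,
    or None if the open-file limit cannot be raised far enough."""
    wanted = min_fd + 16
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and hard < wanted:
        return None
    if soft != resource.RLIM_INFINITY and soft < wanted:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (wanted, hard))
        except (ValueError, OSError):
            return None
    low, peer = socket.socketpair()
    # F_DUPFD returns the lowest unused descriptor that is >= min_fd.
    high_fd = fcntl.fcntl(low.fileno(), fcntl.F_DUPFD, min_fd)
    low.close()
    return socket.socket(fileno=high_fd), peer


def select_accepts(sock):
    """Return True if select() can watch this one socket, False otherwise."""
    try:
        select.select([sock], [], [], 0)
        return True
    except ValueError:  # "filedescriptor out of range in select()"
        return False


if __name__ == "__main__":
    pair = socketpair_with_high_fd()
    if pair is None:
        print("open-file limit too low to run the demonstration")
    else:
        high, peer = pair
        print("fd", high.fileno(), "accepted by select():", select_accepts(high))
        print("fd", peer.fileno(), "accepted by select():", select_accepts(peer))
        high.close()
        peer.close()
```

Only one socket is passed to select() in each call, so the failure depends on the descriptor's number, not on how many sockets are being watched.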
The solution is to use select.epoll() instead of select(). Epoll does not have the 1024 file descriptor limitation and, as a bonus, is more efficient than select. When listening to a large number of sockets, epoll is quicker because the library does not linearly scan a large bitmap result searching for sockets with available data. In Python, this does not matter much, since building the return list is likely to be much slower than scanning the bitmap, but high-performance server implementations should take it into account.
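As a rough illustration of the switch, here is a minimal sketch of a readiness check built on select.epoll(). The function name wait_readable is hypothetical, and creating a fresh epoll object on every call is done only to keep the example short; a real server would create one epoll object and keep it for the life of the process.

```python
import select
import socket


def wait_readable(socks, timeout):
    """Return the descriptor numbers of the sockets in socks that have
    data waiting, blocking for at most timeout seconds."""
    ep = select.epoll()
    try:
        for s in socks:
            ep.register(s.fileno(), select.EPOLLIN)
        # poll() yields (fd, event_mask) pairs for ready descriptors only.
        return [fd for fd, events in ep.poll(timeout) if events & select.EPOLLIN]
    finally:
        ep.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    b.send(b"hello")  # only a now has data to read
    print(wait_readable([a, c], 1.0) == [a.fileno()])
    for s in (a, b, c, d):
        s.close()
```

Note that the result is a list of integers, not socket objects.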
Events from epoll carry OS-level numeric file descriptors rather than the associated Python socket or file objects. Mapping events back to objects manually can be tricky when sockets are opened and closed in multiple places in the application. OS-level file descriptors can be reused after being closed, so the mapping must be kept up to date.
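One way to keep the mapping fresh is to route every registration and every close through a single object, so the table and the epoll set cannot drift apart. The class below is a hypothetical sketch (EpollRegistry and its method names are invented for this post), not a description of what our tool actually does.

```python
import select
import socket


class EpollRegistry:
    """Hypothetical helper that keeps the fd -> socket table in step
    with the epoll registrations."""

    def __init__(self):
        self._epoll = select.epoll()
        self._sockets = {}

    def add(self, sock, eventmask=select.EPOLLIN):
        fd = sock.fileno()
        self._epoll.register(fd, eventmask)
        self._sockets[fd] = sock

    def remove(self, sock):
        """Unregister, forget, and close sock, in that order, so a reused
        descriptor number can never point at a stale socket object."""
        fd = sock.fileno()
        self._epoll.unregister(fd)
        del self._sockets[fd]
        sock.close()

    def poll(self, timeout=1.0):
        """Return (socket, event_mask) pairs instead of raw descriptors."""
        return [(self._sockets[fd], events)
                for fd, events in self._epoll.poll(timeout)]

    def close(self):
        self._epoll.close()


if __name__ == "__main__":
    a, b = socket.socketpair()
    registry = EpollRegistry()
    registry.add(a)
    b.send(b"ping")
    for sock, events in registry.poll():
        if events & select.EPOLLIN:
            print(sock.recv(4))
    registry.remove(a)
    b.close()
    registry.close()
```

The design only works if application code never calls close() on a registered socket directly; every close has to go through remove().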
The classic article about high-concurrency server implementation is “The C10K problem” by Dan Kegel http://www.kegel.com/c10k.html. It is a bit out of date by now, but still worth reading for anyone working in this area.