User-level Threading: Have Your Cake and Eat It Too

Martin Karsten and Saman Barghi

An important class of computer software, such as network servers, exhibits concurrency through many loosely coupled and potentially long-running communication sessions. For these applications, a long-standing open question is whether thread-per-session programming can deliver comparable performance to event-driven programming. This paper clearly demonstrates, for the first time, that it is possible to employ user-level threading for building thread-per-session applications without compromising functionality, efficiency, performance, or scalability. We present the design and implementation of a general-purpose, yet nimble, user-level M:N threading runtime that is built from scratch to accomplish these objectives. Its key components are efficient and effective load balancing and user-level I/O blocking. While no other runtime exists with comparable characteristics, an important fundamental finding of this work is that building this runtime does not require particularly intricate data structures or algorithms. The runtime is thus a straightforward existence proof for user-level threading without performance compromises and can serve as a reference platform for future research. It is evaluated in comparison to event-driven software, system-level threading, and several other user-level threading runtimes. An experimental evaluation is conducted using benchmark programs, as well as the popular Memcached application. We demonstrate that our user-level runtime outperforms other threading runtimes and enables thread-per-session programming at high levels of concurrency and hardware parallelism without sacrificing performance.

ACM SIGMETRICS 2020
Proceedings of the ACM on Measurement and Analysis of Computing Systems, Volume 4, Issue 1, May 2020

Notice

This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive version is published in ACM POMACS, https://doi.org/10.1145/3379483.

An extended abstract and video presentation are published in ACM PER, https://doi.org/10.1145/3393691.3394226.