Please note: This master’s thesis presentation will take place online.
Daewoo Kim, Master’s candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Trevor Brown
Memory management in multicore systems is a well studied area. Many approaches to memory management have been developed and tuned with specific hardware architectures in mind, capitalizing on hardware characteristics to improve performance. In this thesis, the focus is on memory allocation and reclamation in multicore systems.
I first identify and diagnose a performance anomaly in epoch based memory reclamation (EBR), one of the most popular approaches to reclaiming memory in multicore systems. EBR experiences significant performance degradation when running on multiple processor sockets. This degradation is related to the fact that EBR is vulnerable to thread delays. Even minor delays can trigger a chain reaction that induces longer delays and more substantial performance problems. Moreover, there I discovered a pathological interaction between EBR and the page migration mechanism in Linux, which is responsible for periodically relocating pages from one processor socket to another, in an attempt to improve performance. Page migration can cause small delays, which are amplified by EBR, especially on systems with more than two processor sockets.
To solve these issues, an improvement to EBR, called amortized batch free is introduced to limit the amplification of delays and performance degradation when freeing. Amortized batch free gradually reclaims objects, and can drastically reduce the average amount of time spent freeing an object.
I also propose a simple new variant of EBR called Token EBR. Token EBR is conceptually simpler and easier to implement than the state of the art in EBR algorithms, and performs similarly (and sometimes slightly better). Furthermore, I demonstrate that Token EBR can be improved using the amortized batch free technique.
Finally, I present a new design for an architecture aware memory allocator for multi-socket systems, using a state of the art allocator called Supermalloc as a starting point for my design. Several key bottlenecks in the original Supermalloc design are improved or eliminated in the new design. In particular, the new design dramatically improves performance when the address space is actively growing, reduces contention on shared resources, and optimizes memory accesses to reduce communication across processor sockets. Taking into account the lessons learned in the study of EBR, the new design also attempts to minimize the overhead of freeing objects. Formative experiments on a prototype implementation of this new allocator showed some performance improvement compared to the original Supermalloc allocator, but rigorous implementation and optimization is left for future work.