Please note: This seminar will be given online.
Ramnatthan Alagappan, Postdoctoral Researcher
VMware Research Group, VMware
Distributed storage systems form the core of modern cloud services. Like much systems software, these systems are built using layering: designers layer distributed protocols (e.g., Paxos, 2PC) on top of local storage stacks. Such layering hides the details of the local storage stack from the layers above, which eases development. I will show that such black-box layering, unfortunately, masks vital information, resulting in poor reliability and missed performance opportunities. I will then demonstrate that it is greatly beneficial to expose useful information across the layers of a distributed storage system (while hiding unimportant details). In particular, I will show that reliability and performance can be significantly improved by co-designing distributed systems and storage stacks.
In this talk, I will focus on reliability. I will first show how local problems in the storage layer can lead to data loss, corruption, and unavailability in widely used distributed storage systems. I will then present CTRL, a new approach that co-designs the storage stack and the distributed layers so that they cooperate to perform correct recovery. I have implemented CTRL in two practical systems and will show that it incurs negligible performance overhead while significantly improving resilience to storage faults. Towards the end, I will briefly discuss how a similar co-design can yield higher performance.
Bio: Ram Alagappan is a postdoctoral researcher at the VMware Research Group. He earned his Ph.D. at the University of Wisconsin – Madison, working with Professors Andrea Arpaci-Dusseau and Remzi Arpaci-Dusseau. His work has been published at top systems venues and invited to journals, and it has won three best paper awards (FAST 17, 18, and 20). His dissertation also received an honorable mention for the UW CS Best Dissertation. His open-source frameworks have had practical impact: these tools have exposed more than 80 severe vulnerabilities across 20 widely used systems. Ideas from his work on CTRL have been adopted by a financial database to make it resilient to storage faults.
To join this seminar on Zoom, please go to https://uwaterloo.zoom.us/j/94820006346?pwd=d1lpQ2R5b3M5T29KaU42SENzeUJndz09.
200 University Avenue West
Waterloo, ON N2L 3G1