Please note: This PhD defence will take place in DC 3317 and online.
Sreeharsha Udayashankar, PhD candidate
David R. Cheriton School of Computer Science
Supervisor: Professor Samer Al-Kiswany
The exponential growth of digital data generation imposes severe performance and efficiency demands on modern datacenter infrastructure. While hardware accelerators, such as CPUs supporting SIMD instruction sets and network switches supporting P4-based programmability, have the potential to mitigate these bottlenecks, their adoption in large-scale systems is hindered by restrictive programming models and resource constraints. This thesis addresses these challenges by redesigning deduplicated storage systems and cluster schedulers to leverage hardware acceleration effectively.
This thesis first presents VectorCDC, a method for accelerating data deduplication by restructuring hashless content-defined chunking (CDC) algorithms to exploit vector instructions. By identifying and optimizing common processing patterns, specifically Extreme Byte Searches and Range Scans, VectorCDC achieves throughput improvements of 8.35x–26.2x over existing vector-accelerated techniques and up to 207.2x over unaccelerated baselines across x86, ARM, and IBM architectures.
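As a rough illustration of the two processing patterns named above, the following scalar Python sketch shows what an extreme byte search and a range scan might look like inside a maximum-based hashless chunker. It is not VectorCDC itself: the function names, the `window` and `max_chunk` parameters, and the boundary rule (cut at the first byte that is at least as large as the maximum of a fixed window) are assumptions chosen for illustration. A vector-accelerated version would perform the same comparisons on many bytes per instruction.

```python
def extreme_byte_search(data: bytes, start: int, end: int) -> int:
    """Return the index of the first maximum byte in data[start:end]."""
    window = data[start:end]
    if not window:
        raise ValueError("empty search window")
    return start + window.index(max(window))


def range_scan(data: bytes, start: int, end: int, threshold: int) -> int:
    """Return the index of the first byte >= threshold in data[start:end], or -1."""
    for i in range(start, min(end, len(data))):
        if data[i] >= threshold:
            return i
    return -1


def chunk_boundaries(data: bytes, window: int = 4, max_chunk: int = 16) -> list[int]:
    """Hypothetical hashless chunker (assumes window < max_chunk).

    For each chunk: find the maximum byte of a fixed-size window (extreme
    byte search), then scan forward for the first byte that is at least
    that large (range scan) and cut just after it. If no such byte is
    found, cut at the maximum chunk size.
    """
    boundaries = []
    pos = 0
    n = len(data)
    while pos < n:
        win_end = min(pos + window, n)
        max_idx = extreme_byte_search(data, pos, win_end)
        limit = min(pos + max_chunk, n)
        cut = range_scan(data, win_end, limit, data[max_idx])
        end = cut + 1 if cut != -1 else limit
        boundaries.append(end)  # exclusive end offset of this chunk
        pos = end
    return boundaries
```

Because the boundary depends only on byte comparisons rather than a rolling hash, both inner loops reduce to compare-and-find operations, which is the kind of work SIMD instruction sets handle well.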
To further address the throughput degradation of CDC algorithms at the large chunk sizes favored by production systems, this thesis presents SeqCDC, a novel chunking algorithm that combines a lightweight boundary detection mechanism with content-defined data skipping. SeqCDC improves chunking throughput by 10x over unaccelerated algorithms and 25–30% over the fastest vector-accelerated alternatives, while minimally affecting deduplication efficiency.
Finally, this thesis proposes Draconis, a network-accelerated scheduler built using P4-programmable switches and designed to support microsecond-scale workloads. Draconis forgoes the inefficient design adopted by prior switch-based schedulers by implementing a switch-compatible task queue with delayed pointer correction, eliminating the latency penalties caused by node-level blocking. Evaluation results demonstrate that Draconis reduces the 99th percentile scheduling delay by 3x–200x over state-of-the-art network-accelerated solutions, and increases scheduling throughput by 52x–116x over state-of-the-art server-based solutions.
To attend this PhD defence in person, please go to DC 3317. You can also attend virtually on MS Teams.