PhD Defence • Systems and Networking • Improving Reliability for Networked Systems and Software Execution

Friday, May 8, 2026 11:00 am - 2:00 pm EDT (GMT -04:00)

Please note: This PhD defence will take place in DC 2310.

Haoyu Gu, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Ali Mashtizadeh

Reliability is a fundamental requirement for modern software systems and services. As these systems grow larger and serve more users, even minor failures or outages can escalate into critical incidents. Reliability is a broad concept that spans the design of many kinds of systems. Within this space, three areas still lack good solutions: networked systems, software bug triage and diagnosis, and software execution.

HA/TCP improves the reliability of networked systems. HA/TCP is the first framework to support the migration and failover of TCP-based layer 7 network functions (NFs) for reliability and multi-node scalability. HA/TCP does not modify the TCP protocol, allowing existing projects to take advantage of HA/TCP without client changes. HA/TCP actively replicates traffic from the primary node to all replica nodes to keep their state in sync. In the case of a node failure, HA/TCP enables replica NFs to take over connections in microseconds. Moreover, HA/TCP is completely transparent to the client: connection migration and failover are not visible to it.

AutoPecker provides a solution for automatic software bug triage and diagnosis. AutoPecker achieves the best of both worlds by combining a low-overhead record/replay system with customized sanitizers and other instrumentation that can be enabled on replay. On a program crash, or when manually invoked by the user, AutoPecker captures a trace of the program execution and tests it against a suite of sanitizers and programmer assertions. AutoPecker can run on the user’s or developer’s machine to automatically triage the bug and provide a detailed analysis.

PerfCheck provides a comprehensive solution for improving the reliability and reproducibility of software execution. PerfCheck collects configurations and specifications of the host system, allowing developers or researchers to share their project together with a PerfCheck report as a configuration baseline. When other researchers attempt to reproduce the execution results, PerfCheck allows them to inspect and identify differences in their local environment configurations, ensuring reliable reproduction.
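
As a rough illustration of the baseline-and-compare idea described above, the following Python sketch collects a few host properties and reports where a local environment differs from a shared baseline. It is not PerfCheck's implementation or report format: the function names `collect_report` and `diff_reports` and the chosen fields are hypothetical, and a real tool would presumably record far more detail.

```python
import os
import platform


def collect_report():
    """Collect a small snapshot of the host configuration.

    The fields here are illustrative assumptions, not PerfCheck's actual schema.
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }


def diff_reports(baseline, local):
    """Return {key: (baseline_value, local_value)} for every differing key."""
    keys = sorted(set(baseline) | set(local))
    return {
        key: (baseline.get(key), local.get(key))
        for key in keys
        if baseline.get(key) != local.get(key)
    }


if __name__ == "__main__":
    baseline = collect_report()          # would normally be shipped with the project
    local = dict(baseline, cpu_count=1)  # simulate a differently configured machine
    for key, (expected, actual) in diff_reports(baseline, local).items():
        print(f"{key}: baseline={expected!r}, local={actual!r}")
```

In this sketch the baseline would be stored alongside the project, and anyone reproducing the results would run the comparison locally to see which settings differ before interpreting their measurements.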