PhD Defence • Systems and Networking • Improving Reliability for Networked Systems and Software Execution | Cheriton School of Computer Science

Friday, May 8, 2026 11:00 am - 2:00 pm EDT (GMT -04:00)

Please note: This PhD defence will take place in DC 2310.

Haoyu Gu, PhD candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Ali Mashtizadeh

Reliability is a fundamental requirement for modern software systems and services. As these systems grow larger and serve more users, even minor failures or outages can escalate into critical incidents. Reliability is a broad concept that spans the design of many kinds of systems. Within that scope, three areas still lack good solutions: networked systems, software bug triage and diagnosis, and software execution.

HA/TCP improves the reliability of networked systems. HA/TCP is the first framework to support the migration and failover of TCP-based layer-7 network functions (NFs) for reliability and multi-node scalability. HA/TCP does not modify the TCP protocol, allowing existing projects to take advantage of it without client changes. HA/TCP actively replicates traffic from the primary node to all replica nodes to keep their state in sync. In the case of a node failure, HA/TCP enables replica NFs to take over connections in microseconds. Moreover, HA/TCP is completely transparent to the client: connection migration and failover are not visible to it.

AutoPecker provides a solution for automatic software bug triage and diagnosis. AutoPecker achieves the best of both worlds by combining a low-overhead record/replay system with customized sanitizers and other instrumentation that can be enabled on replay. On a program crash, or on manual invocation by the user, AutoPecker captures a trace of the program execution and tests it against a suite of sanitizers and programmer assertions. AutoPecker can run on the user’s or developer’s machine to automatically triage the bug and provide a detailed analysis.

PerfCheck provides a comprehensive solution for improving the reliability and reproducibility of software execution. PerfCheck collects configurations and specifications of the host system, allowing developers or researchers to share their project together with a PerfCheck report as a configuration baseline. When other researchers attempt to reproduce the execution results, PerfCheck allows them to inspect and identify differences in their local environment configurations, ensuring reliable reproduction.
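
The baseline comparison described above can be illustrated with a minimal sketch. This is not PerfCheck's actual implementation or interface: the function names collect_config and diff_configs, and the handful of fields collected, are hypothetical and chosen only to show the general idea of capturing a host snapshot and reporting where a local environment departs from a shared baseline.

```python
import os
import platform


def collect_config():
    """Gather a small, illustrative snapshot of the host environment.

    The fields here are assumptions for the example; a real tool would
    likely record far more (kernel parameters, library versions, etc.).
    """
    return {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
        "cpu_count": os.cpu_count(),
    }


def diff_configs(baseline, local):
    """Return {key: (baseline_value, local_value)} for every differing key.

    Keys missing on one side are reported with None for that side.
    """
    differences = {}
    for key in sorted(set(baseline) | set(local)):
        base_value = baseline.get(key)
        local_value = local.get(key)
        if base_value != local_value:
            differences[key] = (base_value, local_value)
    return differences
```

In this sketch, an author would publish the dictionary returned by collect_config alongside their project, and someone reproducing the results would call diff_configs with that baseline and their own snapshot to see which settings differ.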

Location Information

Location Address: DC - William G. Davis Computer Research Centre
200 University Avenue West
DC 2310
Waterloo, ON, CA N2L 3G1
