Master’s Thesis Presentation • Systems and Networking • A Fault Injection Tool for Testing Distributed System with Network Faults

Friday, January 26, 2024 10:30 am - 11:30 am EST (GMT -05:00)

Please note: This master’s thesis presentation will take place online.

Seba Khaleel, Master’s candidate
David R. Cheriton School of Computer Science

Supervisor: Professor Samer Al-Kiswany

Modern systems are complex: they include hundreds of components that implement intricate protocols such as scheduling, replication, membership, resource management, client access, and security. These systems are expected to offer high availability and to preserve the data stored in them despite environment faults. Testing is the primary approach for improving system reliability. Testing against environment faults such as hardware failures, memory corruption, and network problems is difficult because these faults can occur at any time during the system's lifetime, in any component, and at any step of a complex protocol.

In this work, we focus on testing for network partitioning faults. We build PPATT, a fault injection testing tool that injects network partition faults between components. To reduce the number of test scenarios that must be considered, we implement two techniques that focus testing on the components that communicate during an operation. We validate the tool by reproducing four catastrophic failures in two widely used systems: Spark and Kafka. To demonstrate the benefit of our tool, we use PPATT to test three systems: Flink, Hazelcast, and ActiveMQ Artemis. Our testing uncovers three failures in these systems, all of which are caused by design flaws.


Bio: Seba is pursuing an MMath degree under the supervision of Prof. Samer Al-Kiswany. Her research focuses on testing the resiliency of distributed systems to network partitions. Her thesis explores using fault injection testing to inject network faults between the connections established during distributed system operations.