2015 technical reports

CS-2015-01
Title	Hermes: Dynamic Partitioning for Distributed Social Network Graph Databases
Authors	Daniel Nicoara, Shahin Kamali, Khuzaima Daudjee and Lei Chen
Abstract	Social networks are large graphs that require multiple graph database servers to store and manage them. Each database server hosts a graph partition with the objectives of balancing server loads, reducing remote traversals (edge-cuts), and adapting the partitioning to changes in the structure of the graph in the face of changing workloads. To achieve these objectives, a dynamic repartitioning algorithm is required to modify an existing partitioning to maintain good quality partitions while not imposing a significant overhead to the system. In this paper, we introduce a lightweight repartitioner, which dynamically modifies partitioning using a small amount of resources. In contrast to the existing repartitioning algorithms, our lightweight repartitioner is efficient, making it suitable for use in a real system. We integrated our lightweight repartitioner into Hermes, which we designed as an extension of the open source Neo4j graph database system, to support workloads over partitioned graph data distributed over multiple servers. Using real-world social network data, we show that Hermes leverages the lightweight repartitioner to maintain high quality partitions and provides a two to three times performance improvement over the de-facto standard random hash-based partitioning.
Date	February
Report	Hermes: Dynamic Partitioning for Distributed Social Network Graph Databases (PDF)

CS-2015-02
Title	The SPOT* System for Flexible Personal Heating and Cooling
Authors	Alimohammad Rabbani and S. Keshav
Abstract	Building on our prior work on the SPOT and SPOT+ Smart Personalized Office Thermal control systems, this paper presents SPOT, a cost-effective, flexible system for person- alized heating and cooling. Like its predecessors, SPOT senses occupancy and worker comfort to reactively control office temperature. Specifically, it uses the Predicted Mean Vote equation to determine worker comfort and actuates a fan or a heater to adjust the comfort level so that it lies between -0.5 and +0.5 in the ASHRAE comfort scale. Un- like the two prior systems, however, which use many sensors and a per-user compute engine, SPOT* greatly reduces costs by using the fewest possible sensors and a lightweight com- pute engine that can optionally even be located in the cloud. Moreover, SPOT* provides both cooling and heating using a speed-controlled desktop fan, rather than only controlling heating using a radiant heater. Finally, SPOT* is less intrusive in that it does not use a camera. We find that the per-user cost for SPOT* is about $185 compared to $1000 for SPOT. Moreover, in a preliminary deployment, SPOT* is able to improve user comfort by 78%.
Date	February 6, 2015
Report	The SPOT* System for Flexible Personal Heating and Cooling (PDF)

CS-2015-03
Title	Evaluating Call Graph Construction for JVM-hosted Language Implementations
Authors	Xiaoni Lai, Zhaoyi Luo, Karim Ali, Ondřej Lhoták, Julian Dolby, and Frank Tip
Abstract	An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing bytecode. While this approach works well when applied to bytecode produced from Java code, its efficacy when applied to other bytecode has not been studied until now. We present qualitative and quantitative analysis of the soundness and precision of call graphs constructed from JVM bytecodes produced for Python, Ruby, Clojure, Groovy, Scala, and OCaml applications. We show that, for Python, Ruby, Clojure, and Groovy, the call graphs are unsound due to use of reflection, invokedynamic instructions, and run-time code generation, and imprecise due to how function calls are translated. For Scala and OCaml, all unsoundness comes from rare, complex uses of reflection and proxies, and the translation of first-class features in Scala incurs a significant loss of precision.
Date	February 12, 2015
Report	Evaluating Call Graph Construction for JVM-hosted Language Implementations (PDF)

CS-2015-04
Title	Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems
Authors	Minyang Han and Khuzaima Daudjee
Abstract	The bulk synchronous parallel (BSP) model used by synchronous graph processing systems allows algorithms to be easily implemented and reasoned about. However, BSP can suffer from poor performance due to stale messages and frequent global synchronization barriers. Asynchronous computation models have been proposed to alleviate these overheads but existing asynchronous systems that implement such models have limited scalability or retain frequent global barriers, and do not always support graph mutations or algorithms with multiple computation phases. We propose barrierless asynchronous parallel (BAP), a new computation model that reduces both message staleness and global synchronization. This enables BAP to overcome the limitations of existing asynchronous models while retaining support for graph mutations and algorithms with multiple computation phases. We present GiraphUC, which implements our BAP model in the open source distributed graph processing system Giraph, and evaluate our system at scale with large real-world graphs on 64 EC2 machines. We show that GiraphUC provides across-the-board performance improvements of up to $5\times$ faster over synchronous systems and up to an order of magnitude faster than asynchronous systems. Our results demonstrate that the BAP model provides efficient and transparent asynchronous execution of algorithms that are programmed synchronously.
Date	April 6, 2015
Report	Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems (PDF)

CS-2015-05
Title	The Development of Normative Autonomous Agents: an Approach
Authors	Marx Viana¹, Paulo Alencar², Donald Cowan², Carlos J. P. de Lucena¹
Abstract	Open multi-agent systems (MASs) act as societies in which autonomous and heterogeneous agents can work towards similar or different goals. In order to cope with the heterogeneity, autonomy and diversity of interests among the different agents in the society, open MASs establish a set of behavioural norms that is used as a mechanism to ensure a state of co-operation among agents. Such norms regulate the behaviour of the agents by defining obligations, permissions and prohibitions. Fulfillment of a norm may be encouraged through a reward while violation of a norm may be discouraged through punishment. Although norms are promising mechanisms to regulate an agent’s behaviour, we should note that each agent is an autonomous entity that is free to fulfill or violate each associated norm. Thus, agents can use different strategies when deciding to achieve their goals including whether to comply with their associated norms. Agents might choose to achieve their goals while ignoring their norms, thus overlooking the rewards or punishments they may receive. In contrast agents may choose to comply with all the norms although some of their goals may not be achieved. In this context, this paper proposes a framework for simulation of normative agents providing a basis for understanding the impacts of norms on agents.
Date	April 27, 2015
Report	The Development of Normative Autonomous Agents: an Approach (PDF)

CS-2015-06
Title	Adapting to Climate Change - An Open Data Platform for Cumulative Environmental Analysis and Management
Authors	Donald Cowan, Paulo Alencar, Fred McGarry and Mark R. Palmer
Abstract	The frequency of extreme weather events has accelerated, an apparent outcome of progressive climate change. Excess water is a significant consequence of these events and is now the leading cause of insurance claims for infrastructure and property damage. Governments recognize that plans for growth must reflect communities' needs, strengths and opportunities while balancing the cumulative effects of economic growth with environmental concerns. Legislation must incorporate the cumulative effects of economic growth with adaptation to weather events to protect the environment and citizens, while ensuring that products of growth such as buildings and infrastructure are resilient. For such a process to be effective it will be necessary for the private sector to develop and operate cumulative effect decision support software (CEDSS) tools and to work closely with all levels of government including watershed management authorities (WMAs) that supply environmental data. Such co-operation and sharing will require a new Open Data information-sharing platform managed by the private sector. This paper outlines that platform, its operation and possible governance model.
Date	May 21, 2015
Report	Adapting to Climate Change - An Open Data Platform for Cumulative Environmental Analysis and Management (PDF)

CS-2015-07
Title	Precise Data Flow Analysis in the Presence of Correlated Method Calls
Authors	Marianna Rapoport, Ondřej Lhoták, and Frank Tip
Abstract	When two methods are invoked on the same object, the dispatch behaviours of these method calls will be correlated. If two correlated method calls are polymorphic (i.e., they dispatch to different method definitions depending on the type of the receiver object), a program’s interprocedural control flow graph will contain infeasible paths. Existing algorithms for data-flow analysis are unable to ignore such infeasible paths, giving rise to loss of precision. We show how infeasible paths due to correlated calls can be eliminated for Interprocedural Finite Distributive Subset (IFDS) problems, a large class of data-flow analysis problems with broad applications. Our approach is to transform an IFDS problem into an Interprocedural Distributive Environment (IDE) problem, in which edge functions filter out data flow along infeasible paths. A solution to this IDE problem can be mapped back to the solution space of the original IFDS problem. We formalize the approach, prove it correct, and report on an implementation in the WALA analysis framework.
Date	June 9, 2015
Report	Precise Data Flow Analysis in the Presence of Correlated Method Calls (PDF)

CS-2015-08
Title	CrossStitch: An Efficient Transaction Processing Framework for Geo-Distributed Systems
Authors	Sharon Choy, Bernard Wong, Xu Cui, and Xiaoyi Liu
Abstract	Current transaction systems for geo-distributed datastores either have high transaction processing latencies or are unable to support general transactions with dependent operations. In this paper, we introduce CrossStitch, an efficient transaction processing framework that reduces latency by restructuring each transaction into a chain of state transitions, where each state consists of a key operation and computation. Transaction states are processed sequentially, and the transaction code and data is sent directly to the next hop in the chain. CrossStitch transactions can be organized such that all states in a location are processed before transitioning to a state in a different location. This allows CrossStitch to significantly reduce the number of inter-location crossings compared to transaction systems that retrieve remote data to a single location for processing. To provide transactional properties while preserving the chain communication pattern, CrossStitch introduces a pipelined commit protocol that executes in parallel with the transaction and does not require any centralized co-ordination. Our evaluation results show that CrossStitch can reduce the latency of geo-distributed transactions when compared to a traditional 2PC-based distributed transaction system. We demonstrate that CrossStitch can reduce the number of round trips by more than half for TPC-C-like transactions.
Date	July 28, 2015
Report	CrossStitch: An Efficient Transaction Processing Framework for Geo-Distributed Systems (PDF)

CS-2015-09
Title	Nessie: A Decoupled, Client-Driven, Key-Value Store using RDMA
Authors	Tyler Szepesi, Benjamin Cassell, Bernard Wong, Tim Brecht, Xiaoyi Liu
Abstract	The increasing use of key-value storage systems in performance-critical data centre applications has moti- vated new storage system designs that use Remote Di- rect Memory Access (RDMA) to reduce communication overhead. However, existing approaches that achieve low latency and high throughput do so by dedicating en- tire cores to RDMA message handling, including polling local memory for incoming RDMA messages. In this paper we describe and demonstrate why polling-based RDMA is not suitable for many data cen- tre applications with significant data processing and data storage requirements. We then propose, design, implement and evaluate an alternative communication approach that strictly uses one-sided RDMA opera- tions, which eliminates polling on the server-side and is therefore a serverless (client-driven) design. This approach is used to build Nessie, a distributed client- driven RDMA-enabled key-value store that uses na- tive RDMA compare-and-swap operations to co-ordinate conflicting operations between clients. Nessie further reduces communication requirements by decoupling the key-value location from the key lookup mechanism, which enables local write operations, thus avoiding com- paratively expensive RDMA operations. Nessie’s one- sided design ensures that only the clients in the system consume any CPU, which in turn allows it to main- tain high performance despite the need for computa- tion by the application utilizing the key-value store or contention from other applications.
Date	July 22, 2015
Report	Nessie: A Decoupled, Client-Driven, Key-Value Store using RDMA (PDF)

CS-2015-10
Title	Mayflower: Improving Distributed Filesystem Performance Through SDN/Filesystem Co-Design
Authors	Sajjad Rizvi, Xi Li, Bernard Wong, Xiao Cao, and Benjamin Cassell
Abstract	The increasing prevalence of oversubscribed networks and fast solid-state storage devices in the datacenter has made the network the new performance bottleneck for many distributed filesystems. As a result, distributed filesystems need to be network-aware in order to make more effective use of available network resources. In this paper, we introduce Mayflower, a new distributed filesystem that is not only network-aware, it is co-designed from the ground up to work together with a network control plane. In addition to the standard distributed filesystem components, Mayflower has a flow monitor and manager running inside a software-defined networking controller. This tight coupling with the network controller enables Mayflower to make intelligent replica selection and flow scheduling decisions based on both filesystem and network information. It also enables Mayflower to perform global optimizations that are unavailable to conventional distributed filesystems and network control planes. Our evaluation results from both simulations and a prototype implementation show that Mayflower reduces average read completion time by more than 25% compared to current state-of-the-art distributed filesystems with an independent network flow scheduler, and more than 80% compared to HDFS with ECMP.
Date	July 23, 2015
Report	Mayflower: Improving Distributed Filesystem Performance Through SDN/Filesystem Co-Design (PDF)

CS-2015-11
Title	Lanternfish: Better Random Networks Through Optics
Authors	Tyler Szepesi, Bernard Wond, Fiodar Kazhami, Sajjad Rizvi, and Tim Brecht
Abstract	Current random datacenter network designs, such as Jellyfish, directly wire top-of-rack switches together randomly, which is difficult even when using best-practice cable management techniques. Moreover, these are static designs that cannot adapt to changing workloads. In this paper, we introduce Lanternfish, a new approach to building random datacenter networks using an optical ring that significantly reduces wiring complexity and provides the opportunity for reconfigurability. Unlike previous optical ring designs, Lanternfish does not require wavelength planning because it is specifically designed to provide random connectivity between switches. This design choice further reduces the difficulty of deploying a Lanternfish network, making random datacenter networks more practical. Our experimental results using both simulations and network emulation show that Lanternfish can effectively provide the same network properties as Jellyfish. Additionally, we demonstrate that by replacing passive optical components with active optical components at each switch, we can dynamically reconfigure the network topology to better suit the workload while remaining cost competitive. Lanternfish is able to construct workload specific topologies that provide as much as half the average pathlength than a Jellyfish deployment with twice as many switch-to-switch connections.
Date	July 23, 2015
Report	Lanternfish: Better Random Networks Through Optics (PDF)

CS-2015-13
Title	A Modular Notation for Monitoring Network Systems
Authors	Prashant Raghav and Richard Tefler
Abstract	Design of next generation network systems with predictable behaviour in all situations poses a significant challenge. Monitoring of events happening at different points in a distributed environment can detect the occurrence of events that indicates significant error conditions. We use Modular Timing Diagrams (MTD) as a specification language to describe these error conditions. MTDs are a component-oriented and compositional notation. We take advantage of these features of MTDs and point out that, in many cases, a global MTD specification describing behaviours of several system components can be efficiently decomposed into a set of sub-specifications. Each of the sub-specifications describes a local monitor that is specific to the component on which the monitor is intended to run. We illustrate the compositional nature of MTDs in describing several network monitoring conditions related to network security.
Date	July 16, 2015
Report	A Modular Notation for Monitoring Network Systems (PDF)

CS-2015-14
Title	Representing Behavioural Models with Rich Control Structures in SMT-LIB
Authors	Nancy A. Day and Amirhossein Vakili
Abstract	We motivate and present a proposal for how to represent extended nite state machine behavioural models with rich hierarchical states and compositional control structures (e.g., the Statecharts family) in SMT-LIB. Our goal with such a representation is to facilitate automated deductive reasoning on such models, which can exploit the structure found in the control structures. We present a novel method that combines deep and shallow encoding techniques to describe models that have both rich control structures and rich datatypes. Our representation permits varying semantics to be chosen for the control structures recognizing the rich variety of semantics that exist for the family of extended nite state machine languages. We hope that discussion of these representation issues will facilitate model sharing for investigation of analysis techniques.
Date	September 1, 2015
Report	Representing Behavioural Models with Rich Control Structures in SMT-LIB (PDF)

CS-2015-15
Title	A Qualitative Exploratory Study of How OSS Developers Define Code Review Quality
Authors	Oleksii Kononenko and Olga Baysal
Abstract	In a large, long-lived project, an effective code review process is key to ensuring the long-term quality of the code base. In this work, we study code review practices of a large, open source project, and we investigate how the developers themselves perceive code review quality. We present a qualitative study that summarizes the results from a survey of 88 Mozilla core developers. The results provide developer insights into how they define review quality, what factors contribute to how they evaluate submitted code, and what challenges they face when performing review tasks. We found that the review quality is primarily associated with the thoroughness of the feedback, the reviewer’s familiarity with the code, and the perceived quality of the code itself. Also, we found that while different factors are perceived to contribute to the review quality, reviewers often find it difficult to keep their technical skills up-to-date, manage personal priorities, and mitigate context switching.
Date	September 11, 2015
Report	A Qualitative Exploratory Study of How OSS Developers Define Code Review Quality (PDF)

CS-2015-16
Title	Feedback-Directed Instrumentation for Deployed JavaScript Applications
Authors	Magnus Madsen, Frank Tip, Esben Andreasen, Koushik Sen, Anders Moller
Abstract	Many bugs in JavaScript applications manifest themselves as objects that have incorrect property values when a failure occurs. For this type of error, stack traces and log files are often insufficient for diagnosing problems. In such cases, it is helpful for developers to know the control flow path from the creation of an object to a crashing statement. Such crash paths are useful for understanding where the object originated and whether any properties of the object were corrupted since its creation. We present a feedback-directed instrumentation technique for computing crash paths that allows the instrumentation overhead to be distributed over a crowd of users and to re- duce it for users who do not encounter the crash. We imple- mented our technique in a tool, Crowdie, and evaluated it on 10 real-world issues for which error messages and stack traces are insufficient to isolate the problem. Our results show that feedback-directed instrumentation requires 5% to 25% of the program to be instrumented, that the same crash must be observed three to ten times to discover the crash path, and that feedback-directed instrumentation typically slows down execution by a factor 2x–9x compared to 8x–90x for an approach where applications are fully instrumented.
Date	September 8, 2015
Report	Feedback-Directed Instrumentation for Deployed JavaScript Applications (PDF)

CS-2015-17
Title	NoSE: Schema Design for NoSQL Applications
Author	Michael J. Mior, Kenneth Salem, Ashraf Aboulnaga
Abstract	Database design is critical for high performance in relational databases and many tools exist to aid application designers in selecting an appropriate schema. While the problem of schema optimization is also highly relevant for NoSQL databases, existing tools for relational databases are inadequate for this setting. Application designers wishing to use a NoSQL database instead rely on rules of thumb to select an appropriate schema. We present a system for recommending database schemas for NoSQL applications. Our cost-based approach uses a novel binary integer programming formulation to guide the mapping from the application's conceptual data model to a database schema. We implemented a prototype of this approach for the Cassandra extensible record store. Our prototype, the NoSQL Schema Evaluator (NoSE) is able to capture rules of thumb used by expert designers without explicitly encoding the rules. Automating the design process allows NoSE to produce efficient schemas and to examine more alternatives than would be possible with a manual rule-based approach.
Date	October 19, 2015
Report	NoSE: Schema Design for NoSQL Applications (PDF)

CS-2015-18
Title	Comparison of Affect-Control Theoretic Agents and Humans in the Prisoner's Dilemma
Author	Joshua D. A. Jung, Jesse Hoey, Jonathan H. Morgan, Tobias Schröder and IngoWolf
Abstract	Symbolic interactionist principles of sociology are based on the idea that human action is guided by culturally shared symbolic representations of identities, behaviours, situations and emotions. Shared linguistic, paralinguistic, or kinesic elements allow humans to co-ordinate action by enacting identities in social situations. Structures of identity-based interactions can lead to the enactment of social orders that solve social dilemmas (e.g., by promoting co-operation). Our goal is to build an artificial agent that mimics the identity-based interactions of humans, and to compare networks of such agents to human networks. In this paper, we take a first step in this direction, and describe a study in which humans played a repeated prisoner's dilemma game against other humans, or against one of three artificial agents (bots). One of the bots has an explicit representation of identity (for self and other), and attempts to optimise with respect to this representation. We compare the human play against bots to human play against humans, and show how the identity-based bot exhibits the most human-like behaviour.
Date	November 2, 2015
Report	Comparison of Affect-Control Theoretic Agents and Humans in the Prisoner's Dilemma (PDF)

CS-2015-19
Title	Big Open Data for Environmental Information Systems
Author	Donald Cowan, Paulo Alencar, Fred McGarry, R. MarkPalmer, Trevor Boston, Rigel Rozanski
Abstract	The frequency of extreme weather events has accelerated, an apparent outcome of progressive climate change. Excess water is a significant consequence of these events and is now the leading cause of insurance claims for infrastructure and property damage. Governments recognize that plans for growth must reflect communities needs, strengths and opportunities while balancing the cumulative effects of economic growth with environmental concerns. For such a process to be effective it will be necessary to develop and operate cumulative effect decision support software (CEDSS) tools, and to work closely with all levels of government including watershed management authorities (WMAs) that supply environmental data. Such co-operation and sharing will require a new open and big data information-sharing platform, which is described in this paper as an open and big data platform for environmental analysis and management.
Date	December 4, 2015
Report	Big Open Data for Environmental Information Systems (PDF)