CS-2015-01 | ||||
---|---|---|---|---|
Title |
Hermes: Dynamic Partitioning for Distributed Social Network Graph Databases | |||
Authors |
Daniel Nicoara, Shahin Kamali, Khuzaima Daudjee and Lei Chen | |||
Abstract |
Social networks are large graphs that require multiple graph database servers to store and manage them. Each database server hosts a graph partition with the objectives of balancing server loads, reducing remote traversals (edge-cuts), and adapting the partitioning to changes in the structure of the graph in the face of changing workloads. To achieve these objectives, a dynamic repartitioning algorithm is required to modify an existing partitioning to maintain good quality partitions while not imposing significant overhead on the system. In this paper, we introduce a lightweight repartitioner, which dynamically modifies partitioning using a small amount of resources. In contrast to existing repartitioning algorithms, our lightweight repartitioner is efficient, making it suitable for use in a real system. We integrated our lightweight repartitioner into Hermes, which we designed as an extension of the open source Neo4j graph database system, to support workloads over partitioned graph data distributed over multiple servers. Using real-world social network data, we show that Hermes leverages the lightweight repartitioner to maintain high quality partitions and provides a two to three times performance improvement over the de facto standard random hash-based partitioning. | |||
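The repartitioning objective above (balance server loads while shrinking the edge-cut) can be illustrated with a toy greedy pass. This is a hedged sketch, not the paper's lightweight repartitioner: the function names and the `capacity` balance cap are invented for illustration.

```python
from collections import Counter, defaultdict

def edge_cut(partition, edges):
    """Number of edges whose endpoints lie on different servers."""
    return sum(1 for u, v in edges if partition[u] != partition[v])

def repartition_step(partition, edges, capacity):
    """One greedy pass: move each vertex to its neighbours' majority
    partition, unless that partition is already at capacity."""
    neighbours = defaultdict(list)
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    load = Counter(partition.values())
    for v, nbrs in neighbours.items():
        counts = Counter(partition[n] for n in nbrs)
        target = counts.most_common(1)[0][0]
        if target != partition[v] and load[target] < capacity:
            load[partition[v]] -= 1
            load[target] += 1
            partition[v] = target
    return partition
```

On a graph of two triangles joined by one edge, a single pass migrates the misplaced vertices and shrinks the edge-cut; a real repartitioner would additionally bound migration cost and track workload changes.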
Date | February | |||
Report | Hermes: Dynamic Partitioning for Distributed Social Network Graph Databases (PDF) |
CS-2015-02 | ||||
---|---|---|---|---|
Title |
The SPOT* System for Flexible Personal Heating and Cooling | |||
Authors |
Alimohammad Rabbani and S. Keshav | |||
Abstract |
Building on our prior work on the SPOT and SPOT+ Smart Personalized Office Thermal control systems, this paper presents SPOT*, a cost-effective, flexible system for personalized heating and cooling. Like its predecessors, SPOT* senses occupancy and worker comfort to reactively control office temperature. Specifically, it uses the Predicted Mean Vote equation to determine worker comfort and actuates a fan or a heater to adjust the comfort level so that it lies between -0.5 and +0.5 on the ASHRAE comfort scale. Unlike the two prior systems, however, which use many sensors and a per-user compute engine, SPOT* greatly reduces costs by using the fewest possible sensors and a lightweight compute engine that can optionally even be located in the cloud. Moreover, SPOT* provides both cooling and heating using a speed-controlled desktop fan, rather than only controlling heating using a radiant heater. Finally, SPOT* is less intrusive in that it does not use a camera. We find that the per-user cost for SPOT* is about $185 compared to $1000 for SPOT. Moreover, in a preliminary deployment, SPOT* is able to improve user comfort by 78%. | |||
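The reactive control loop described above can be sketched in a few lines: keep the Predicted Mean Vote (PMV) within [-0.5, +0.5] by actuating a fan or heater. The PMV computation here is a deliberately simplified linear proxy of air temperature, not Fanger's full equation, and all names and constants are illustrative.

```python
def pmv_estimate(temp_c, neutral_c=23.0, sensitivity=0.3):
    """Toy PMV proxy: 0 at the neutral temperature, growing with deviation.
    A real system would use the full PMV equation with humidity, clothing,
    metabolic rate, and air speed as inputs."""
    return sensitivity * (temp_c - neutral_c)

def control_action(temp_c):
    """Return the actuation this SPOT*-style logic would take."""
    pmv = pmv_estimate(temp_c)
    if pmv > 0.5:
        return "fan_on"      # too warm: cool with the desktop fan
    if pmv < -0.5:
        return "heater_on"   # too cold: heat
    return "idle"            # reading is inside the comfort band
```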
Date | February 6, 2015 | |||
Report |
The SPOT* System for Flexible Personal Heating and Cooling (PDF) |
CS-2015-03 | ||||
---|---|---|---|---|
Title |
Evaluating Call Graph Construction for JVM-hosted Language Implementations | |||
Authors |
Xiaoni Lai, Zhaoyi Luo, Karim Ali, Ondřej Lhoták, Julian Dolby, and Frank Tip | |||
Abstract |
An increasing number of programming languages compile to the Java Virtual Machine (JVM), and program analysis frameworks such as WALA and SOOT support a broad range of program analysis algorithms by analyzing bytecode. While this approach works well when applied to bytecode produced from Java code, its efficacy when applied to other bytecode has not been studied until now. We present qualitative and quantitative analysis of the soundness and precision of call graphs constructed from JVM bytecodes produced for Python, Ruby, Clojure, Groovy, Scala, and OCaml applications. We show that, for Python, Ruby, Clojure, and Groovy, the call graphs are unsound due to use of reflection, invokedynamic instructions, and run-time code generation, and imprecise due to how function calls are translated. For Scala and OCaml, all unsoundness comes from rare, complex uses of reflection and proxies, and the translation of first-class features in Scala incurs a significant loss of precision. | |||
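The unsoundness from reflection that the abstract describes is easy to reproduce in miniature: a call made through reflection is invisible to a naive scan for direct call targets. The toy analysis below (a Python stand-in for the bytecode setting the paper studies) finds the direct call but sees only `getattr` at the reflective site.

```python
import ast

SOURCE = """
def helper():
    return 42

def direct():
    return helper()

def reflective(mod):
    return getattr(mod, "hel" + "per")()
"""

def direct_callees(source):
    """Map each top-level function to names appearing as direct call targets."""
    tree = ast.parse(source)
    graph = {}
    for fn in [n for n in tree.body if isinstance(n, ast.FunctionDef)]:
        callees = set()
        for node in ast.walk(fn):
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                callees.add(node.func.id)
        graph[fn.name] = callees
    return graph

graph = direct_callees(SOURCE)
# 'direct' -> {'helper'} is found, but the reflective call to helper is
# missed entirely: the analysis only records the call to getattr itself.
```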
Date | February 12, 2015 | |||
Report | Evaluating Call Graph Construction for JVM-hosted Language Implementations (PDF) |
CS-2015-04 | ||||
---|---|---|---|---|
Title | Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems | |||
Authors |
Minyang Han and Khuzaima Daudjee | |||
Abstract |
The bulk synchronous parallel (BSP) model used by synchronous graph processing systems allows algorithms to be easily implemented and reasoned about. However, BSP can suffer from poor performance due to stale messages and frequent global synchronization barriers. Asynchronous computation models have been proposed to alleviate these overheads but existing asynchronous systems that implement such models have limited scalability or retain frequent global barriers, and do not always support graph mutations or algorithms with multiple computation phases. We propose barrierless asynchronous parallel (BAP), a new computation model that reduces both message staleness and global synchronization. This enables BAP to overcome the limitations of existing asynchronous models while retaining support for graph mutations and algorithms with multiple computation phases. We present GiraphUC, which implements our BAP model in the open source distributed graph processing system Giraph, and evaluate our system at scale with large real-world graphs on 64 EC2 machines. We show that GiraphUC provides across-the-board performance improvements, running up to 5x faster than synchronous systems and up to an order of magnitude faster than asynchronous systems. Our results demonstrate that the BAP model provides efficient and transparent asynchronous execution of algorithms that are programmed synchronously. | |||
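The barrier cost that BAP attacks shows up even in a minimal BSP-style loop, where messages sent in one superstep only become visible after the next global barrier. This is a toy sketch of the Pregel-style model the abstract contrasts with, not Giraph or GiraphUC code.

```python
def bsp_max(graph, values):
    """Propagate the maximum value to every vertex, one superstep at a time.
    graph: {vertex: [out-neighbours]}; values is updated in place."""
    inbox = {v: [values[v]] for v in graph}   # superstep 0: self-message
    supersteps = 0
    while any(inbox.values()):                # halt when no messages in flight
        outbox = {v: [] for v in graph}
        for v in graph:
            new = max(inbox[v], default=values[v])
            if new > values[v] or supersteps == 0:
                values[v] = max(values[v], new)
                for w in graph[v]:            # send only when value changed
                    outbox[w].append(values[v])
        inbox = outbox                        # global barrier: swap mailboxes
        supersteps += 1
    return values, supersteps
```

Every vertex waits at the barrier each round even if only one message matters; removing that global wait (while keeping correctness) is the point of the BAP model.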
Date | April 6, 2015 | |||
Report | Giraph Unchained: Barrierless Asynchronous Parallel Execution in Pregel-like Graph Processing Systems (PDF) |
CS-2015-05 | ||||
---|---|---|---|---|
Title |
The Development of Normative Autonomous Agents: an Approach | |||
Authors |
Marx Viana, Paulo Alencar, Donald Cowan, Carlos J. P. de Lucena | |||
Abstract |
Open multi-agent systems (MASs) act as societies in which autonomous and heterogeneous agents can work towards similar or different goals. In order to cope with the heterogeneity, autonomy and diversity of interests among the different agents in the society, open MASs establish a set of behavioural norms that is used as a mechanism to ensure a state of co-operation among agents. Such norms regulate the behaviour of the agents by defining obligations, permissions and prohibitions. Fulfillment of a norm may be encouraged through a reward while violation of a norm may be discouraged through punishment. Although norms are promising mechanisms to regulate an agent’s behaviour, we should note that each agent is an autonomous entity that is free to fulfill or violate each associated norm. Thus, agents can use different strategies when deciding to achieve their goals including whether to comply with their associated norms. Agents might choose to achieve their goals while ignoring their norms, thus overlooking the rewards or punishments they may receive. In contrast agents may choose to comply with all the norms although some of their goals may not be achieved. In this context, this paper proposes a framework for simulation of normative agents providing a basis for understanding the impacts of norms on agents. | |||
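The compliance decision described above can be sketched as a utility comparison: an autonomous agent weighs the value of achieving a goal against the punishment for violating a norm (or the reward for fulfilling it). The utility model below is invented for illustration and is far simpler than what a normative-agent framework would simulate.

```python
def decide(goal_value, norm):
    """Return 'violate' or 'comply' by comparing expected utilities.
    norm: {'reward': utility for fulfilment, 'punishment': cost of violation}."""
    comply_utility = norm["reward"]                    # norm kept, goal forgone
    violate_utility = goal_value - norm["punishment"]  # goal met, norm broken
    return "violate" if violate_utility > comply_utility else "comply"
```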
Date | April 27, 2015 | |||
Report | The Development of Normative Autonomous Agents: an Approach (PDF) |
CS-2015-06 | ||||
---|---|---|---|---|
Title |
Adapting to Climate Change - An Open Data Platform for Cumulative Environmental Analysis and Management | |||
Authors |
Donald Cowan, Paulo Alencar, Fred McGarry and Mark R. Palmer | |||
Abstract |
The frequency of extreme weather events has accelerated, an apparent outcome of progressive climate change. Excess water is a significant consequence of these events and is now the leading cause of insurance claims for infrastructure and property damage. Governments recognize that plans for growth must reflect communities' needs, strengths and opportunities while balancing the cumulative effects of economic growth with environmental concerns. Legislation must incorporate the cumulative effects of economic growth with adaptation to weather events to protect the environment and citizens, while ensuring that products of growth such as buildings and infrastructure are resilient. For such a process to be effective it will be necessary for the private sector to develop and operate cumulative effect decision support software (CEDSS) tools and to work closely with all levels of government including watershed management authorities (WMAs) that supply environmental data. Such co-operation and sharing will require a new Open Data information-sharing platform managed by the private sector. This paper outlines that platform, its operation and possible governance model. | |||
Date | May 21, 2015 | |||
Report | Adapting to Climate Change - An Open Data Platform for Cumulative Environmental Analysis and Management (PDF) |
CS-2015-07 | ||||
---|---|---|---|---|
Title |
Precise Data Flow Analysis in the Presence of Correlated Method Calls | |||
Authors |
Marianna Rapoport, Ondřej Lhoták, and Frank Tip | |||
Abstract |
When two methods are invoked on the same object, the dispatch behaviours of these method calls will be correlated. If two correlated method calls are polymorphic (i.e., they dispatch to different method definitions depending on the type of the receiver object), a program’s interprocedural control flow graph will contain infeasible paths. Existing algorithms for data-flow analysis are unable to ignore such infeasible paths, giving rise to loss of precision. We show how infeasible paths due to correlated calls can be eliminated for Interprocedural Finite Distributive Subset (IFDS) problems, a large class of data-flow analysis problems with broad applications. Our approach is to transform an IFDS problem into an Interprocedural Distributive Environment (IDE) problem, in which edge functions filter out data flow along infeasible paths. A solution to this IDE problem can be mapped back to the solution space of the original IFDS problem. We formalize the approach, prove it correct, and report on an implementation in the WALA analysis framework. | |||
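The infeasible paths the abstract describes are easy to reproduce: with two receiver types, pairing the dispatch targets of two calls independently admits four combinations, but because both calls share a receiver, only the two "diagonal" pairings can occur at run time. The Python classes below are an illustrative stand-in for the Java setting the paper analyzes.

```python
class A:
    def f(self): return "A.f"
    def g(self): return "A.g"

class B:
    def f(self): return "B.f"
    def g(self): return "B.g"

def correlated(o):
    return (o.f(), o.g())   # same receiver: dispatch of f and g is correlated

types = [A, B]
# A naive interprocedural CFG pairs the targets of the two calls freely:
naive = {(t1.__name__ + ".f", t2.__name__ + ".g")
         for t1 in types for t2 in types}
# Actual executions only ever produce matching pairs:
feasible = {correlated(t()) for t in types}
```

Filtering out the two infeasible pairings is what the IFDS-to-IDE transformation achieves, with edge functions blocking data flow along them.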
Date | June 9, 2015 | |||
Report | Precise Data Flow Analysis in the Presence of Correlated Method Calls (PDF) |
CS-2015-08 | ||||
---|---|---|---|---|
Title |
CrossStitch: An Efficient Transaction Processing Framework for Geo-Distributed Systems | |||
Authors |
Sharon Choy, Bernard Wong, Xu Cui, and Xiaoyi Liu | |||
Abstract |
Current transaction systems for geo-distributed datastores either have high transaction processing latencies or are unable to support general transactions with dependent operations. In this paper, we introduce CrossStitch, an efficient transaction processing framework that reduces latency by restructuring each transaction into a chain of state transitions, where each state consists of a key operation and computation. Transaction states are processed sequentially, and the transaction code and data are sent directly to the next hop in the chain. CrossStitch transactions can be organized such that all states in a location are processed before transitioning to a state in a different location. This allows CrossStitch to significantly reduce the number of inter-location crossings compared to transaction systems that retrieve remote data to a single location for processing. To provide transactional properties while preserving the chain communication pattern, CrossStitch introduces a pipelined commit protocol that executes in parallel with the transaction and does not require any centralized co-ordination. Our evaluation results show that CrossStitch can reduce the latency of geo-distributed transactions when compared to a traditional 2PC-based distributed transaction system. We demonstrate that CrossStitch can reduce the number of round trips by more than half for TPC-C-like transactions. | |||
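The chain-restructuring idea can be sketched by grouping a transaction's operations by location and counting inter-location hops. Names and data are invented, and this sketch ignores data dependencies between operations, which CrossStitch's actual chaining must respect.

```python
def chain_crossings(ops):
    """Number of location-to-location hops when ops run in the given order."""
    return sum(1 for a, b in zip(ops, ops[1:]) if a["loc"] != b["loc"])

def restitch(ops):
    """Reorder so all operations at one location run before moving on
    (stable: order within each location is preserved)."""
    order = []
    for op in ops:
        if op["loc"] not in order:
            order.append(op["loc"])
    return [op for loc in order for op in ops if op["loc"] == loc]
```

A transaction that alternates between two datacentres crosses the wide-area link on nearly every step; after grouping, it crosses once.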
Date | July 28, 2015 | |||
Report | CrossStitch: An Efficient Transaction Processing Framework for Geo-Distributed Systems (PDF) |
CS-2015-09 | ||||
---|---|---|---|---|
Title | Nessie: A Decoupled, Client-Driven, Key-Value Store using RDMA | |||
Authors |
Tyler Szepesi, Benjamin Cassell, Bernard Wong, Tim Brecht, Xiaoyi Liu | |||
Abstract |
The increasing use of key-value storage systems in performance-critical data centre applications has motivated new storage system designs that use Remote Direct Memory Access (RDMA) to reduce communication overhead. However, existing approaches that achieve low latency and high throughput do so by dedicating entire cores to RDMA message handling, including polling local memory for incoming RDMA messages. In this paper we describe and demonstrate why polling-based RDMA is not suitable for many data centre applications with significant data processing and data storage requirements. We then propose, design, implement and evaluate an alternative communication approach that strictly uses one-sided RDMA operations, which eliminates polling on the server side and is therefore a serverless (client-driven) design. This approach is used to build Nessie, a distributed client-driven RDMA-enabled key-value store that uses native RDMA compare-and-swap operations to co-ordinate conflicting operations between clients. Nessie further reduces communication requirements by decoupling the key-value location from the key lookup mechanism, which enables local write operations, thus avoiding comparatively expensive RDMA operations. Nessie's one-sided design ensures that only the clients in the system consume any CPU, which in turn allows it to maintain high performance despite the need for computation by the application utilizing the key-value store or contention from other applications. | |||
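The client-driven coordination described above rests on compare-and-swap: clients race to install updates atomically and retry when another client wins. Below is a single-threaded sketch with a dict standing in for remote memory; it shows the optimistic retry pattern, not Nessie's actual data layout or RDMA verbs.

```python
def compare_and_swap(store, key, expected, new):
    """Install new only if the current value is still `expected`.
    Atomic in this single-threaded model; RDMA CAS provides this remotely."""
    if store.get(key) == expected:
        store[key] = new
        return True
    return False

def client_update(store, key, update_fn, retries=10):
    """Optimistic read-modify-write: re-read and retry on a lost race."""
    for _ in range(retries):
        old = store.get(key)
        if compare_and_swap(store, key, old, update_fn(old)):
            return True
    return False    # gave up after too many conflicts
```

Because all the logic runs in the client, no server-side thread ever polls for messages, which is the point of the one-sided design.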
Date | July 22, 2015 | |||
Report | Nessie: A Decoupled, Client-Driven, Key-Value Store using RDMA (PDF) |
CS-2015-10 | ||||
---|---|---|---|---|
Title |
Mayflower: Improving Distributed Filesystem Performance Through SDN/Filesystem Co-Design | |||
Authors |
Sajjad Rizvi, Xi Li, Bernard Wong, Xiao Cao, and Benjamin Cassell | |||
Abstract |
The increasing prevalence of oversubscribed networks and fast solid-state storage devices in the datacenter has made the network the new performance bottleneck for many distributed filesystems. As a result, distributed filesystems need to be network-aware in order to make more effective use of available network resources. In this paper, we introduce Mayflower, a new distributed filesystem that is not only network-aware, it is co-designed from the ground up to work together with a network control plane. In addition to the standard distributed filesystem components, Mayflower has a flow monitor and manager running inside a software-defined networking controller. This tight coupling with the network controller enables Mayflower to make intelligent replica selection and flow scheduling decisions based on both filesystem and network information. It also enables Mayflower to perform global optimizations that are unavailable to conventional distributed filesystems and network control planes. Our evaluation results from both simulations and a prototype implementation show that Mayflower reduces average read completion time by more than 25% compared to current state-of-the-art distributed filesystems with an independent network flow scheduler, and more than 80% compared to HDFS with ECMP. | |||
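Network-aware replica selection, one of the decisions the abstract attributes to Mayflower, can be sketched as picking the replica whose path has the least-loaded bottleneck link, using per-link load such as an SDN controller could report. The topology and numbers below are invented for illustration.

```python
def path_bottleneck(path, link_load):
    """Worst (highest) load on any link of the path."""
    return max(link_load[link] for link in path)

def pick_replica(replicas, link_load):
    """replicas: {replica: tuple of links on the read path}.
    Choose the replica with the least-loaded bottleneck."""
    return min(replicas,
               key=lambda r: path_bottleneck(replicas[r], link_load))
```

A filesystem without network information would pick by locality or at random; with link loads it can route reads around the congested path.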
Date | July 23, 2015 | |||
Report | Mayflower: Improving Distributed Filesystem Performance Through SDN/Filesystem Co-Design (PDF) |
CS-2015-11 | ||||
---|---|---|---|---|
Title | Lanternfish: Better Random Networks Through Optics | |||
Authors |
Tyler Szepesi, Bernard Wong, Fiodar Kazhami, Sajjad Rizvi, and Tim Brecht | |||
Abstract | Current random datacenter network designs, such as Jellyfish, directly wire top-of-rack switches together randomly, which is difficult even when using best-practice cable management techniques. Moreover, these are static designs that cannot adapt to changing workloads. In this paper, we introduce Lanternfish, a new approach to building random datacenter networks using an optical ring that significantly reduces wiring complexity and provides the opportunity for reconfigurability. Unlike previous optical ring designs, Lanternfish does not require wavelength planning because it is specifically designed to provide random connectivity between switches. This design choice further reduces the difficulty of deploying a Lanternfish network, making random datacenter networks more practical. Our experimental results using both simulations and network emulation show that Lanternfish can effectively provide the same network properties as Jellyfish. Additionally, we demonstrate that by replacing passive optical components with active optical components at each switch, we can dynamically reconfigure the network topology to better suit the workload while remaining cost-competitive. Lanternfish is able to construct workload-specific topologies whose average path length is as much as half that of a Jellyfish deployment with twice as many switch-to-switch connections. | |||
Date | July 23, 2015 | |||
Report | Lanternfish: Better Random Networks Through Optics (PDF) |
CS-2015-13 | ||||
---|---|---|---|---|
Title |
A Modular Notation for Monitoring Network Systems | |||
Authors |
Prashant Raghav and Richard Trefler | |||
Abstract |
Design of next-generation network systems with predictable behaviour in all situations poses a significant challenge. Monitoring of events happening at different points in a distributed environment can detect the occurrence of events that indicate significant error conditions. We use Modular Timing Diagrams (MTD) as a specification language to describe these error conditions. MTDs are a component-oriented and compositional notation. We take advantage of these features of MTDs and point out that, in many cases, a global MTD specification describing behaviours of several system components can be efficiently decomposed into a set of sub-specifications. Each of the sub-specifications describes a local monitor that is specific to the component on which the monitor is intended to run. We illustrate the compositional nature of MTDs in describing several network monitoring conditions related to network security. | |||
Date | July 16, 2015 | |||
Report | A Modular Notation for Monitoring Network Systems (PDF) |
CS-2015-14 | ||||
---|---|---|---|---|
Title |
Representing Behavioural Models with Rich Control Structures in SMT-LIB | |||
Authors |
Nancy A. Day and Amirhossein Vakili | |||
Abstract |
We motivate and present a proposal for how to represent extended finite state machine behavioural models with rich hierarchical states and compositional control structures (e.g., the Statecharts family) in SMT-LIB. Our goal with such a representation is to facilitate automated deductive reasoning on such models, which can exploit the structure found in the control structures. We present a novel method that combines deep and shallow encoding techniques to describe models that have both rich control structures and rich datatypes. Our representation permits varying semantics to be chosen for the control structures, recognizing the rich variety of semantics that exist for the family of extended finite state machine languages. We hope that discussion of these representation issues will facilitate model sharing for investigation of analysis techniques. | |||
Date | September 1, 2015 | |||
Report |
Representing Behavioural Models with Rich Control Structures in SMT-LIB (PDF) |
CS-2015-15 | ||||
---|---|---|---|---|
Title | A Qualitative Exploratory Study of How OSS Developers Define Code Review Quality | |||
Authors |
Oleksii Kononenko and Olga Baysal | |||
Abstract |
In a large, long-lived project, an effective code review process is key to ensuring the long-term quality of the code base. In this work, we study code review practices of a large, open source project, and we investigate how the developers themselves perceive code review quality. We present a qualitative study that summarizes the results from a survey of 88 Mozilla core developers. The results provide developer insights into how they define review quality, what factors contribute to how they evaluate submitted code, and what challenges they face when performing review tasks. We found that the review quality is primarily associated with the thoroughness of the feedback, the reviewer’s familiarity with the code, and the perceived quality of the code itself. Also, we found that while different factors are perceived to contribute to the review quality, reviewers often find it difficult to keep their technical skills up-to-date, manage personal priorities, and mitigate context switching. | |||
Date | September 11, 2015 | |||
Report | A Qualitative Exploratory Study of How OSS Developers Define Code Review Quality (PDF) |
CS-2015-16 | ||||
---|---|---|---|---|
Title |
Feedback-Directed Instrumentation for Deployed JavaScript Applications | |||
Authors |
Magnus Madsen, Frank Tip, Esben Andreasen, Koushik Sen, Anders Møller | |||
Abstract |
Many bugs in JavaScript applications manifest themselves as objects that have incorrect property values when a failure occurs. For this type of error, stack traces and log files are often insufficient for diagnosing problems. In such cases, it is helpful for developers to know the control flow path from the creation of an object to a crashing statement. Such crash paths are useful for understanding where the object originated and whether any properties of the object were corrupted since its creation. We present a feedback-directed instrumentation technique for computing crash paths that allows the instrumentation overhead to be distributed over a crowd of users and to reduce it for users who do not encounter the crash. We implemented our technique in a tool, Crowdie, and evaluated it on 10 real-world issues for which error messages and stack traces are insufficient to isolate the problem. Our results show that feedback-directed instrumentation requires 5% to 25% of the program to be instrumented, that the same crash must be observed three to ten times to discover the crash path, and that feedback-directed instrumentation typically slows down execution by a factor of 2x–9x compared to 8x–90x for an approach where applications are fully instrumented. | |||
Date | September 8, 2015 | |||
Report |
Feedback-Directed Instrumentation for Deployed JavaScript Applications (PDF) |
CS-2015-17 | ||||
---|---|---|---|---|
Title |
NoSE: Schema Design for NoSQL Applications | |||
Author |
Michael J. Mior, Kenneth Salem, Ashraf Aboulnaga | |||
Abstract |
Database design is critical for high performance in relational databases and many tools exist to aid application designers in selecting an appropriate schema. While the problem of schema optimization is also highly relevant for NoSQL databases, existing tools for relational databases are inadequate for this setting. Application designers wishing to use a NoSQL database instead rely on rules of thumb to select an appropriate schema. We present a system for recommending database schemas for NoSQL applications. Our cost-based approach uses a novel binary integer programming formulation to guide the mapping from the application's conceptual data model to a database schema. We implemented a prototype of this approach for the Cassandra extensible record store. Our prototype, the NoSQL Schema Evaluator (NoSE), is able to capture rules of thumb used by expert designers without explicitly encoding the rules. Automating the design process allows NoSE to produce efficient schemas and to examine more alternatives than would be possible with a manual rule-based approach. | |||
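The cost-based selection described above can be illustrated with a brute-force stand-in for the paper's binary integer program: enumerate candidate schema sets and keep the cheapest one that answers every workload query within a storage budget. The candidates, queries, and costs below are invented; a real formulation would also model query execution cost, not just coverage.

```python
from itertools import combinations

def pick_schemas(candidates, queries, storage_budget):
    """candidates: {name: (set of queries it answers, storage cost)}.
    Return the cheapest covering subset within budget, or (None, inf)."""
    best, best_cost = None, float("inf")
    names = list(candidates)
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            answered = set().union(*(candidates[n][0] for n in combo))
            cost = sum(candidates[n][1] for n in combo)
            if answered >= queries and cost <= storage_budget \
                    and cost < best_cost:
                best, best_cost = combo, cost
    return best, best_cost
```

Brute force is exponential in the number of candidates, which is exactly why the paper's integer-programming formulation matters at realistic scales.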
Date | October 19, 2015 | |||
Report |
CS-2015-18 | ||||
---|---|---|---|---|
Title |
Comparison of Affect-Control Theoretic Agents and Humans in the Prisoner's Dilemma | |||
Author |
Joshua D. A. Jung, Jesse Hoey, Jonathan H. Morgan, Tobias Schröder and Ingo Wolf | |||
Abstract |
Symbolic interactionist principles of sociology are based on the idea that human action is guided by culturally shared symbolic representations of identities, behaviours, situations and emotions. Shared linguistic, paralinguistic, or kinesic elements allow humans to co-ordinate action by enacting identities in social situations. Structures of identity-based interactions can lead to the enactment of social orders that solve social dilemmas (e.g., by promoting co-operation). Our goal is to build an artificial agent that mimics the identity-based interactions of humans, and to compare networks of such agents to human networks. In this paper, we take a first step in this direction, and describe a study in which humans played a repeated prisoner's dilemma game against other humans, or against one of three artificial agents (bots). One of the bots has an explicit representation of identity (for self and other), and attempts to optimise with respect to this representation. We compare the human play against bots to human play against humans, and show how the identity-based bot exhibits the most human-like behaviour. | |||
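A minimal repeated-game harness shows the shape of such a study. The bot here is plain tit-for-tat, a deliberately simple stand-in for (not an implementation of) the paper's affect-control-theoretic agent; payoffs are the standard prisoner's dilemma values.

```python
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def tit_for_tat(history):
    """Co-operate first, then echo the opponent's previous move.
    history is a list of (my_move, opponent_move) pairs."""
    return "C" if not history else history[-1][1]

def play(bot, opponent_moves):
    """Run the bot against a fixed opponent script; return total scores."""
    history, score_bot, score_opp = [], 0, 0
    for opp in opponent_moves:
        mine = bot(history)
        s_bot, s_opp = PAYOFF[(mine, opp)]
        score_bot, score_opp = score_bot + s_bot, score_opp + s_opp
        history.append((mine, opp))
    return score_bot, score_opp
```

An identity-based agent would replace `tit_for_tat` with a policy that maintains representations of self and other and chooses the move that minimizes deflection from those identities.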
Date | November 2, 2015 | |||
Report |
Comparison of Affect-Control Theoretic Agents and Humans in the Prisoner's Dilemma (PDF) |
CS-2015-19 | ||||
---|---|---|---|---|
Title |
Big Open Data for Environmental Information | |||
Author |
Donald Cowan, Paulo Alencar, Fred McGarry, R. Mark Palmer, Trevor Boston, Rigel Rozanski | |||
Abstract |
The frequency of extreme weather events has accelerated, an apparent outcome of progressive climate change. Excess water is a significant consequence of these events and is now the leading cause of insurance claims for infrastructure and property damage. Governments recognize that plans for growth must reflect communities' needs, strengths and opportunities while balancing the cumulative effects of economic growth with environmental concerns. For such a process to be effective it will be necessary to develop and operate cumulative effect decision support software (CEDSS) tools, and to work closely with all levels of government including watershed management authorities (WMAs) that supply environmental data. Such co-operation and sharing will require a new open and big data information-sharing platform; this paper describes such a platform for environmental analysis and management. | |||
Date | December 4, 2015 | |||
Report |