2010 technical reports

CS-2010-01
Title	Points-To Analysis with Efficient Strong Updates
Authors	Ondrej Lhotak and Kwok-Chiang Andrew Chung
Abstract	This paper explores a sweet spot between flow-insensitive and flow-sensitive subset-based points-to analysis. Flow-insensitive analysis is efficient: it has been applied to million-line programs and its worst-case requirements are quadratic space and cubic time. Flow-sensitive analysis is precise because it allows strong updates so that points-to relationships holding in one program location can be removed from the analysis when they no longer hold in other locations. We propose a "Strong Update" analysis combining both features: it is efficient like flow-insensitive analysis, with the same worst-case bounds, yet its precision benefits from strong updates like flow-sensitive analysis. The key enabling insight is that strong updates are applicable when the dereferenced points-to set is a singleton, and a singleton set is cheap to analyze. The analysis therefore focuses flow sensitivity on singleton sets. Larger sets, which will not lead to strong updates, are modelled flow insensitively to maintain efficiency. We have implemented and evaluated the analysis as an extension of the standard flow-insensitive points-to analysis in the LLVM compiler infrastructure.
Date	Feb 1, 2010
Report	Points-To Analysis with Efficient Strong Updates (PDF)

CS-2010-02
Title	Distributed XML Query Processing: Fragmentation, Localization and Pruning
Authors	Patrick Kling, M. Tamer Özsu, Khuzaima Daudjee
Abstract	Distributing data collections by fragmenting them is an effective way of improving the scalability of a database system. While the distribution of relational data is well understood, the unique characteristics of XML data and its query model present challenges that require different distribution techniques. In this paper, we show how XML data can be fragmented horizontally and vertically. Based on this, we propose solutions to two of the problems encountered in distributed query processing and optimization on XML data, namely localization and pruning. Localization takes a fragmentation-unaware query plan and converts it to a distribued query plan that can executed at the sites that hold XML data fragments in a distributed system. We then show how the resulting distributed query plan can be pruned so that only those sites are accessed that can contribute to the query result. We demonstrate that our techniques can be integrated into a real-life XML database system and that they significantly improve the performance of distributed query execution.
Date	Sept 14, 2010
Report	Distributed XML Query Processing: Fragmentation, Localization and Pruning (PDF)

CS-2010-03
Title	Understanding the Effects and Implications of Gesture-Based Interaction for Dynamic Presentations
Authors	Adam Fourney, Michael Terry and Richard Mann
Abstract	Gesture-based interaction has long been seen as a natural means of input for electronic presentation systems. However, gesture-based presentation systems have not been evaluated in real-world contexts. This paper presents the design and evaluation of Maestro, a gesture-based presentation system whose design was informed by observations of realworld practices. To understand the implications of gesturebased interaction, Maestro was deployed in a classroom setting for two weeks. The study results indicate that gestures that support interaction with content are valued most, as opposed to those that support slide navigation. Notably, past systems have only used gestures for slide navigation. Our study also revealed that the presenter would position, orient, and conduct himself in ways to more reliably perform gestures and perceive system feedback, and to avoid accidental gesture recognition. However, these behaviors negatively impacted presentation dynamics. Collectively, these results outline clear directions for future research.
Date	Feb 11, 2010
Report	Understanding the Effects and Implications of Gesture-Based Interaction for Dynamic Presentations (PDF)

CS-2010-04
Title	Incorporating Trust in Network Virtualization
Authors	Loubna Mekouar, Youssef Iraqi and Raouf Boutaba
Abstract	In this paper, we propose a trust management framework for network virtualization environments. The proposed framework helps distinguish among the infrastructure providers based on their past experiences and feedbacks from service providers. We describe the main components involved in gathering the feedbacks and managing the reputation data. We detail the proposed trust system and we define the concept of degree of involvement of an infrastructure provider in terms of nodes and links. We also perform a mathematical analysis of the proposed system. Simulation results confirm the effectiveness of the proposed trust management system in increasing the service providers’ satisfaction and reducing the selection of infrastructure providers that do not fulfill their Service Level Agreement (SLA).
Date	Mar 4, 2010
Report	Incorporating Trust in Network Virtualization (PDF)

CS-2010-05
Title	An End-user Domain Specific Model to Drive Dynamic User Agents Adaptations
Authors	Ingrid Oliveira de Nunes, Simone Diniz Junqueira Barbosa, Carlos José Pereira de Lucena
Abstract	Modelling automated user tasks based on agent-oriented approaches is a promising but challenging task. Personalized user agents have been investigated as a potential way of addressing this issue. Most of recent research work has focused on learning, eliciting and reasoning about user preferences and profiles. In this paper, our goal is to deal with the engineering of such systems, barely discussed in the literature. In this context, we present a high-level domain specific model whose aim is to give users the power of customizing and dynamically adapting their user agents. In addition, we propose a general architecture to develop agent-based systems able to be customized to users.
Date	Mar 3, 2010
Report	An End-user Domain Specific Model to Drive Dynamic User Agents Adaptations (PDF)

CS-2010-06
Title	Data Driven Smartphone Energy Level Prediction
Authors	Earl Oliver, Prof. S. Keshav
Abstract	The body of mobile applications is growing at a near-exponential rate; many applications are increasing in both scale, complexity, and their demand for energy. The energy density of smartphone batteries is increasing at a comparably insignificant rate, and thus inhibits the practicality of these applications. Despite the importance of energy to mobile applications, energy is rarely considered by mobile applications developers primarily due to lack of knowledge about users and the absence of energy-aware developer tools. We conduct a large-scale user study to measure the energy consumption characteristics of 15500 BlackBerry smartphone users. Our dataset is several orders of magnitude larger than any previous work. Using this dataset we build the Energy Emulation Toolkit (EET) that allows developers to evaluate the energy consumption requirements of their applications against real user energy traces. The EET computes the successful execution rate of energy-intensive applications across all users, specific devices, and specific smartphone user types. We also consider active adaptation to energy constraints. By classifying smartphone users based on their charging characteristics we demonstrate that energy level can be predicted within 72% accuracy a full day in advance, and through an Energy Management Oracle energy intensive applications can adapt their execution to achieve a near optimal successful execution rate.
Date	May 15, 2010
Report	Data Driven Smartphone Energy Level Prediction (PDF)

CS-2010-07
Title	Modeling Context-Aware Distributed Event-Based Systems
Authors	Eduardo S. Barrenechea, Rolando Blanco, and Paulo Alencar
Abstract	Emerging applications are becoming increasingly dynamic, adaptive and context-aware in areas such as just-in-time location-based m-commerce, situational health monitoring, and dynamic social networking collaboration. Although numerous systems and implementation infrastructures have been proposed to deal with some features of such systems, there is a lack of higher-level modelling abstractions and associated component metamodels that combine dynamic features, adaptation mechanisms, and contextawareness. In this report we propose a metamodel for contextaware distributed event-based systems that supports novel forms of event-based context-aware interactions such as context event schemas, component interfaces that react to context, event and context interactions at the interface level and subscriptions based on both events and context. Our approach also supports requirements for context-aware systems proposed in the literature. The applicability of the approach is illustrated by showing how existing context-aware systems can be modeled using our metamodel.
Date	April 12, 2010
Report	Modeling Context-Aware Distributed Event-Based Systems (PDF)

CS-2010-08
Title	Reducing Waste in Data Center Network Upgrades
Authors	Andrew R. Curtis, S. Keshav, and Alejandro Lopez-Ortiz
Abstract	Fundamental limitations of traditional data center network architectures have led to the development of architectures that provide enormous bisection bandwidth for up to hundreds of thousands of servers. To implement any of these solutions, a legacy data center operator usually must replace most switches in their network—a tremendously wasteful proposition in terms of capital costs, energy used in manufacturing, and raw materials. To enable such upgrades, we introduce a construction for the first data center network topology that supports heterogeneous switches while providing guarantees on bisection bandwidth and using optimal link capacity. We finish by describing an algorithm that finds the minimum cost upgrade path for an existing network to support a given traffic load.
Date	April 12, 2010
Report	Reducing Waste in Data Center Network Upgrades (PDF)

CS-2010-09
Title	Factored HMMs for Bimanual, Context-Dependent Gestures
Authors	Adam Fourney, Richard Mann, Michael Terry
Abstract	As we expand our use of hand gestures for interacting with computational devices, the spatial context in which gestures are performed becomes an increasingly important feature for interpreting user intent. In this paper, we demonstrate how spatial context, and bimanual co-ordinated hand motion, can be efficiently modeled using a factored hidden Markov model. This factorization, guided by topological constraints relating to feature extraction, reduces the number of modelling parameters by two orders of magnitude compared to an unfactored model. We used these factored models when constructing a gesture-based presentation system called Maestro. We then performed a series of experiments to evaluate the performance of Maestro’s recognizer with and without the spatial and bimanual context features.
Date	May 28, 2010
Report	Factored HMMs for Bimanual, Context-Dependent Gestures (PDF)

CS-2010-10
Title	Feature and Class Models in Clafer:Mixed, Specialized, and Coupled
Authors	Kacper Bak, Krzysztof Czarnecki, and Andrzej Wasowski
Abstract	We present Clafer, a class modelling language with first-class support for feature modelling.We designed Clafer as a concise notation for class models, feature models, mixtures of class and feature models (such as components with options), and models that couple feature models and class models via constraints (such as mapping feature configurations to component configurations). Clafer also allows arranging models into multiple specialization and extension layers via constraints and inheritance. We identified four key mechanisms allowing a class modelling language to express feature models concisely and show that Clafer meets its design objectives using a sample product line.
Date	June 25, 2010
Report	Feature and Class Models in Clafer:Mixed, Specialized, and Coupled (PDF)

CS-2010-11
Title	On-the-Fly Counterexample Abstraction for Model Checking Invariants
Authors	Alma L. Juarez-Dominguez, Nancy A. Day
Abstract	Verification of invariants using model checking can find errors, inconsistencies, and contradictions in a model. In some applications, it is beneficial to see at once all paths of the model in which the invariant fails. Thus, instead of the traditional cycle of find bug – fix bug – re-run model checker, we would like to produce all counterexamples prior to fixing the bug. However, the set of all counterexamples is often too large to generate or comprehend. We define a series of abstractions that each represent the complete set of counterexamples in a reduced set of equivalence classes. Our abstractions are based on the control states and transitions of an extended finite state machine (EFSM) model. We show how our abstractions can be represented as LTL properties and, therefore, reduce the set of counterexamples on-the-fly during model checking.
Date	July 5, 2010
Report	On-the-Fly Counterexample Abstraction for Model Checking Invariants (PDF)

CS-2010-12
Title	LEGUP: Using Heterogeneity to Reduce the Cost of Data Center Network Upgrades
Authors	Andrew R. Curtis, S. Keshav and Alejandro Lopez-Ortiz
Abstract	Fundamental limitations of traditional data center network architectures have led to the development of architectures that provide enormous bisection bandwidth for up to hundreds of thousands of servers. Because these architectures rely on homogeneous switches, implementing one in a legacy data center usually requires replacing most existing switches. Such forklift upgrades are typically prohibitively expensive; instead, a data center manager should be able to selectively add switches to boost bisection bandwidth. Doing so adds heterogeneity to the network’s switches and heterogeneous high-performance interconnection topologies are not well understood. Therefore, we develop the theory of heterogeneous Clos networks. We show that our construction needs only as much link capacity as the classic Clos network to route the same traffic matrices and this bound is the optimal. Placing additional equipment in a highly constrained data center is challenging in practice, however. We propose LEGUP to design the topology and physical arrangement of such network upgrades or expansions. Compared to current solutions, we show that LEGUP finds network upgrades with more bisection bandwidth for half the cost. And when expanding a data center iteratively, LEGUP’s network has 265% more bisection bandwidth than an iteratively upgraded fat-tree.
Date	July 7, 2010
Report	LEGUP: Using Heterogeneity to Reduce the Cost of Data Center Network Upgrades (PDF)

CS-2010-13
Title	Recognizing handwritten mathematics via fuzzy parsing
Authors	Scott MacLean and George Labahn
Abstract	We present a new approach to multi-dimensional parsing using relational grammars and fuzzy sets. A fast, incremental parsing algorithm is developed, motivated by the two-dimensional structure of written mathematics. Our approach makes no hard decisions; recognized math expressions are reported to the user in ranked order. A exible correction mechanism enables users to quickly choose the correct math expression in case of recognition errors or ambiguity.
Date	June 13, 2010
Report	Recognizing handwritten mathematics via fuzzy parsing (PDF)

CS-2010-14
Title	Talk is Cheap(er): Mitigating DoS and Byzantine Attacks in Sensor Networks
Authors	Valerie King, Jared Saia and Maxwell Young
Abstract	Sensor networks are extremely vulnerable to denial-of-service (DoS) attacks due to their shared communication medium and the constrained power supply of the devices. We examine the scenario where one player attempts to send a message m to another player in a single-channel network. Assume an adversary that jams for T instances where T is unknown to either player and where cost is measured in terms of energy expenditure. We give a protocol guaranteeing delivery of m while the ratio of either player's expected cost to the adversary's cost is O(T^(varphi-2)) ~ O(T0.382) where varphi= (1+ sqrt(5))/2 is the golden ratio. Our result implies that to prevent communication of m, the adversary incurs an asymptotically higher cost than the expected cost incurred by either correct player. This property holds even when the receiver suffers a Byzantine fault. We extend our analysis to multiple receivers and later, to a multi-hop setting where both senders and receivers can suffer Byzantine faults. Our work pertains to a wide variety of adversaries that are energy constrained. Notably, we can tolerate an adaptive adversary who launches attacks using knowledge of the past actions of all players. Moreover, in networks with a sufficient amount of communication traffic, we can tolerate a reactive adversary who may detect a transmission in the current time slot and then decide to jam. Finally, we apply our result to the fundamental network communication problem of reliable broadcast which deals with conveying a message from one player to all other players in the network. In a popular sensor network grid model, we give a reliable broadcast protocol that, in comparison to previous work in the grid, offers improved resilience to DoS attacks.
Date	November 8, 2010
Comments	Originally posted August 3, 2010 updated November 8, 2010 - three minor typos corrected
Report	Talk is Cheap(er): Mitigating DoS and Byzantine Attacks in Sensor Networks (PDF)

CS-2010-15
Title	Paging for Multicore (CMP) Caches
Authors	Alejandro Lopez-Ortiz and Alejandro Salinger
Abstract	In the last few years, multicore processors have become the dominant processor architecture. While cache eviction policies have been widely studied both in theory and practice for sequential processors, in the case in which various simultaneous processes use a shared cache the performance of even the most common eviction policies is not yet fully understood, nor do we know if current best strategies in practice are optimal. In particular, there is almost no theoretical backing for the use of current eviction policies in multicore processors. Recently, a work by Hassidim [14] initiated the theoretical study of cache eviction policies for multicore pro cessors under the traditional competitive analysis, showing that LRU is not competitive against an offline policy that has the power of arbitrarily delaying requests sequences to its advantage. In this paper we study caching under the more conservative model in which requests must be served as they arrive. We perform a thorough all-to-all comparison of strategies providing lower and upper bounds for the ratios between performances. We show that if the cache is partitioned, the partition policy has a greater influence on performance than the eviction policy. On the other hand, we show that sharing the cache among cores is in general better than partitioning the cache, unless the partition is dynamic and changes frequently, in which case shared cache strategies and dynamic partitions are essentially equivalent when serving disjoint requests.
Date	July 29, 2010
Report	Paging for Multicore (CMP) Caches (PDF)

CS-2010-16
Title	Autonomous Thermal Control System
Authors	Omid Ardakanian
Abstract	Building heating and cooling accounts for nearly 18-24% of all energy usage. Building energy management can reduce both its operating costs and its carbon footprint. We propose the use of a factored Partially Observable Markov Decision Process (POMDP) to model and efficiently control a commercial building heating system. We use supervised learning and ground truth data to capture the parameters of the POMDP. We show that our solution outperforms the HVAC system in terms of energy efficiency.
Date	October 5, 2010
Report	Autonomous Thermal Control System (PDF)

CS-2010-17
Title	Direct Adaptive Control of Electricity Demand
Authors	S. Keshav and C. Rosenberg
Abstract	The legacy electrical grid upper-bounds a customer’s energy demand using a circuit breaker. The breaker is conservatively sized so that it rarely ‘trips,’ interrupting supply and inconveniencing the customer. This results in a system whose peak load can be much higher than its average load. Such a system can be made reliable only by being provisioned, at great cost, for infrequent peaks. The goal of demand management is to reduce the peak load and the resulting infrastructure provisioning cost. The most widely used such approach is to price a unit of energy higher during peak times, giving an incentive to consumers to reduce their demand during these times. Although this does reduce the peak load by 2-10%, this ‘human in the loop’ approach is onerous and frequently ineffective. Instead, we propose a radically different approach that draws upon the Internet traffic management approaches of proactive and reactive congestion control for fine-grained management of customer demand without human intervention. We show that this direct adaptive control of electricity demand can significantly reduce the peak load and therefore infrastructure costs. Moreover, it allows individual users to place a greater load than in the existing system, as long as it does not hurt other customers or the grid itself, thus increasing revenues. We believe that our approach has the potential to avert grid congestion, reduce capital costs, and eliminate a portion of the carbon footprint of the grid.
Date	September 27, 2010
Report	Direct Adaptive Control of Electricity Demand (PDF)

CS-2010-18
Title	Characterizing the Usability of Interactive Applications Through Query Log Analysis
Authors	Adam Fourney, Richard Mann and Michael Terry
Abstract	People routinely rely on Internet search engines to support their use of interactive applications, making query logs a rich source of data cataloguing the day-to-day tasks and needs of a user base. In this paper, we introduce an automated process for harvesting, ordering, labeling, and grouping search queries related to any publicly available interactive system. The end result is a data set that can complement and augment data collected through traditional usability methods. We call this process CUTS—characterizing usability through search. The labeled, ordered data produced by CUTS can be assembled in minutes, is timely, has a high degree of ecological validity, and is arguably less prone to self-selection bias than traditional usability methods. We describe this process and demonstrate applications of its use with a handful of interactive systems.
Date	September 28, 2010
Report	Characterizing the Usability of Interactive Applications Through Query Log Analysis (PDF)

CS-2010-19
Title	Understanding How Users Express Preferences: a User Study
Authors	Ingrid Oliveira de Nunes, Simone Diniz Junqueira Barbosa and Carlos José Pereira de Lucena
Abstract	With the exponential growth of information available on the web and our 24/7 availability through social networks, instant messengers and e-mail, people are facing the challenge of processing huge amounts of data and playing multiple roles at the same time. Personal Assistance Software (PAS) aims at aiding users in these tasks from making recommendations to acting on their behalf. Even though extensive research has been done on improving such systems, little attention has been given to their acceptance by users. With the goal of building users' trust on PAS, we aim at developing an enduser language in order to empowering users to control and instruct their PAS to provide task delegation. However, such language would be meaningless if users are not able to adequately express their preferences. This paper presents a user study whose goal was to provide a deeper understanding of how users express their preferences. Seven research questions are investigated, including how the knowledge about a domain inuences the expression of preferences and how users change them after being exposed to decisionmaking situations. This study allowed us to identify kinds of support users need to better express their preferences so that a system can be able to act on their behalf.
Date	October 14, 2010
Report	Understanding How Users Express Preferences: a User Study (PDF)

CS-2010-20
Title	Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency?
Authors	Fathiyeh Faghih and Daniel G. Brown
Abstract	Answer set programming (ASP) approach is an efficient approach for solving the maximum quartet consistency problem. We distinguish two factors that affect the efficiency of a recent ASP approach for solving maximum quartet consistency problem; answer set programming itself and a variety of preprocessing steps. In this paper, we propose a method for applying one of the preprocessing steps used in the ASP approach to an alternative algorithm. Our results show that the preprocessing step gives the main contribution to efficiency of the ASP approach for data sets with less than 20 taxa. We also identify an issue in the ASP approach for solving the maximum quartet consistency problem. Keywords: maximum quartet consistency, answer set programming, edge detection strategy, efficiency
Date	November 3, 2010
Report	Answer Set Programming or Hypercleaning: Where does the Magic Lie in Solving Maximum Quartet Consistency? (PDF)

CS-2010-21
Title	Orthogonal cartograms with few corners per face
Authors	Therese Biedl and Lesvia Elena Ruiz Vel´azquez
Abstract	We give an algorithm to create orthogonal drawings of 3-connected 3-regular planar graphs such that each interior face of the graph is drawn with a prescribed area. This algorithm produces a drawing with at most 12 corners per face and four bends per edge, which improves the previous known result of 34 corners per face.
Date	December 3, 2010
Report	Orthogonal cartograms with few corners per face (PDF)

CS-2010-22
Title	Interaction-Aware Scheduling of Report Generation Workloads
Authors	Mumtaz Ahmad , Ashraf Aboulnaga, Shivnath Babu and Kamesh Munagala
Abstract	The typical workload in a database system consists of a mix of multiple queries of different types that run concurrently. Interactions among the different queries in a query mix can have a significant impact on database performance. Hence, optimizing database performance requires reasoning about query mixes rather than considering queries individually. Current database systems lack the ability to do such reasoning. We propose a new approach based on planning experiments and statistical modelling to capture the impact of query interactions. Our approach requires no prior assumptions about the internal workings of the database system or the nature and cause of query interactions; making it portable across systems. To demonstrate the potential of modelling and exploiting query interactions, we have developed a novel interactionaware query scheduler for report-generation workloads. Our scheduler, called QShuffler, uses two query scheduling algorithms that leverage models of query interactions. The first algorithm is optimized for workloads where queries are submitted in large batches. The second algorithm targets workloads where queries arrive continuously, and scheduling decisions have to be made on-line. We report an experimental evaluation of QShuffler using TPC-H workloads running on IBM DB2. The evaluation shows that QShuffler, by modelling and exploiting query interactions, can consistently out- perform (up to 4x) query schedulers in current database systems. Keywords: Business Intelligence, Report generation, Query interactions, Scheduling, Experiment-driven performance modelling, Workload management.
Date	December 16, 2010
Report	Interaction-Aware Scheduling of Report Generation Workloads (PDF)