The David R. Cheriton School of Computer Science will hold its annual Cheriton Research Symposium September 28-29, 2017 in the Davis Centre.
This year's symposium will consist of talks by industry leaders and members of the School.
Posters by David R. Cheriton Graduate Scholarship recipients will be on display in the Great Hall, Davis Centre from 10:00 am to 3:00 pm, daily.
Thursday, September 28, 2017
DC 1302 - Mark Giesbrecht — Welcome and Opening Remarks
DC 1302 - David R. Cheriton, Stanford University
The Age of Incompetence and Software Evolution
Human beings have become increasingly incompetent at performing almost any useful task because of automation, and this trend is destined to continue, if not accelerate. You would not hire a person who is 10 times slower at a job, can work less than a quarter of the time, makes far more mistakes, and costs you far more than an alternative hire; the comparison to automation is even more extreme. Realistically, humans are pathetic and are now becoming dangerous. The monkeys could run the jungle, but they are not competent to run a zoo, and are dangerous if they have any control over an automated zoo. How can human civilization survive this age as we transition to full automation? But wait, don’t these humans write the software that is at the core of automation?
I claim we are rapidly moving to a model in which software evolves in the Darwinian sense, and is not “written” in the conventional sense. Humans feed in software mutations and the environment decides which mutations survive, and which don’t. Thus, we pathetic human beings don’t have to be smarter than the software to make the software smarter. I will talk about some approaches I see for better software evolution.
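The mutate-and-select loop the abstract describes can be sketched in a few lines. This is an illustrative toy only, not Cheriton's system: a candidate "program" is reduced to a single parameter, a mutation is a blind random tweak, and the "environment" is a fitness function that keeps whichever variant scores better.

```python
# Minimal evolutionary-selection sketch (illustrative, not a real system):
# blind mutations plus environmental selection improve a candidate without
# anyone "writing" the answer directly.
import random

random.seed(42)

def fitness(x):
    # The environment's judgment: closer to a target (unknown to the
    # mutator) scores higher. In reality this would be tests, benchmarks,
    # or users choosing among software variants.
    return -abs(x - 7.0)

candidate = 0.0
for _ in range(200):
    mutant = candidate + random.uniform(-1.0, 1.0)  # feed in a mutation
    if fitness(mutant) > fitness(candidate):        # environment decides
        candidate = mutant

# candidate has drifted toward 7.0 purely by selection pressure
```

The mutator never sees the target; only the selection step carries information, which is the sense in which the software "evolves" rather than being written.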
DC 1302 - Steve Woods and Verna Friesen, Google Waterloo
Friday, September 29, 2017
DC 1302 - Matei Zaharia, Stanford University
Composable Parallel Processing in Apache Spark and Weld
Giving every developer easy access to modern, massively parallel hardware, whether at the scale of a datacenter or a single modern server, remains a daunting challenge. In this talk, I’ll cover one powerful weapon we can use to meet this challenge: enabling efficient *composition* of parallel programs. Composition is arguably the main way developers are productive writing software, but unfortunately, it has taken a back seat in the design of many parallel processing APIs. For example, composing MapReduce jobs required writing data to files between each job, which was slow and error-prone, and many single-machine parallel libraries face similar problems. I’ll show how composability enabled much higher productivity in the Apache Spark API, and how this idea has been taken much further in recent versions of Spark with “structured” APIs such as DataFrames and Spark SQL. In addition, I’ll discuss Weld, a research project at Stanford that aims to enable much more efficient composition between parallel libraries on a single server (on either the CPU or the GPU). We show that the traditional way of composing libraries in this setting, through function calls that exchange data through memory, can create order-of-magnitude slowdowns. In contrast, Weld can transparently speed up applications using libraries such as NumPy, Pandas and TensorFlow by up to 30x through a novel API that lets it optimize across the library calls used in each program.
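The core idea, stages fused into one pass rather than materialized between jobs, can be sketched without Spark itself. This is a hypothetical pure-Python illustration, not Spark's or Weld's actual API: `compose` fuses per-record transformations so no intermediate result is ever written out, which is the property MapReduce pipelines lacked.

```python
# Illustrative sketch of stage fusion (not the Spark or Weld API):
# composing transformations into one pass avoids materializing
# intermediate results between "jobs".

def compose(*stages):
    """Fuse a pipeline of per-record transformations into a single function."""
    def fused(record):
        for stage in stages:
            record = stage(record)
        return record
    return fused

# Two hypothetical "jobs": parse a CSV log line, then extract the user field.
parse = lambda line: line.split(",")
pick_user = lambda fields: fields[1]

pipeline = compose(parse, pick_user)

logs = ["2017-09-29,alice,login", "2017-09-29,bob,logout"]
users = [pipeline(line) for line in logs]  # one pass, no intermediate files
```

In a MapReduce-style design, the output of `parse` would be written to a file and re-read by the next job; fusing the stages keeps the data in memory and opens the door to cross-stage optimization, the effect Weld pursues across library boundaries.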
11:00 a.m. - Poster Session - David R. Cheriton Graduate Scholarship recipients
1:00 p.m. - Poster Session - David R. Cheriton Graduate Scholarship recipients
DC 1302 - Tim Brecht, University of Waterloo
Understanding and Improving 802.11 (WiFi) Network Performance
Forecasts predict that 3 billion WiFi devices will be shipped this year and more than 9 billion devices will be in use by the end of the year. These large numbers are driven by smart phones, tablets, laptops, entertainment devices, and a growing number of devices used in the Internet of Things. Despite the introduction of 4G and 5G cellular data technologies, 802.11 (WiFi) networks are the dominant technology used for wireless communications on mobile devices. In 2016, 60% of potential cellular network traffic was off-loaded to WiFi networks and this number is expected to increase.
Obtaining peak throughput in 802.11 networks depends on software that chooses the combination of physical layer features (the transmission data rate) best suited for the current channel conditions. For 802.11n and 802.11ac devices there can be up to 128 and 640 combinations, respectively. The goal, for each packet, is to choose the combination that maximizes throughput by trading off high transmission data rates with low frame error rates.
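The trade-off described above, high data rates versus low frame error rates, amounts to maximizing expected goodput over the candidate combinations. A minimal sketch, with made-up rates and error figures (the real 802.11n/ac tables have up to 128 and 640 entries):

```python
# Toy rate-selection sketch with hypothetical numbers: pick the PHY rate
# that maximizes expected throughput = raw rate * (1 - frame error rate).

# (data rate in Mbps, observed frame error rate) -- illustrative values only
candidates = [
    (6.5, 0.01),    # slow but nearly lossless
    (65.0, 0.10),
    (130.0, 0.45),  # fast, lossy -- but still the best trade-off here
    (144.4, 0.80),  # fastest rate, mostly wasted on retransmissions
]

def effective_throughput(rate_mbps, fer):
    # Expected goodput: raw rate discounted by the fraction of lost frames.
    return rate_mbps * (1.0 - fer)

best = max(candidates, key=lambda c: effective_throughput(*c))
```

Note that the highest raw rate loses here: at an 80% frame error rate its expected goodput falls below a slower, more reliable rate, which is exactly why per-packet rate adaptation matters.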
In this talk, I will describe research we are conducting to characterize, better understand, and improve the operation of WiFi networks. We have found that interesting relationships exist between the large number of transmission data rates that can be chosen when sending data. Some exciting and interesting properties of these relationships include: (1) they persist even when mobile devices create highly variable channel conditions; (2) they may change over time; (3) despite such changes, relationships have been found to exist over periods of up to one hour. After describing these relationships and our findings, I will show some results from an example application of this work, describe other potential implications of this research, and outline several compelling avenues for future work.
This is joint work with Ali Abedi.
DC 1302 - Shai Ben-David, University of Waterloo
Clustering - what both theoreticians and practitioners are doing wrong
Unsupervised learning is widely recognized as one of the most important challenges facing machine learning today. However, in spite of hundreds of papers on the topic being published every year, current theoretical understanding and practical implementations of such tasks, and in particular of clustering, are very rudimentary.
My talk will focus on clustering. The first challenge I will address is model selection - how should a user pick an appropriate clustering tool for a given clustering problem, and how should the parameters of such an algorithmic tool be tuned? In contrast with other common computational tasks, in clustering, different algorithms often yield drastically different outcomes. Therefore, the choice of a clustering algorithm may play a crucial role in the usefulness of an output clustering solution. However, currently there exists no methodical guidance for clustering tool selection for a given clustering task. I will explain the severity of this problem and describe some recent proposals aiming to address this crucial lacuna.
The second aspect of clustering that I will address is the complexity of computing a cost-minimizing clustering (given some clustering objective function). While most clustering objective optimization problems are computationally infeasible, they are carried out routinely in practice. This theory-practice gap has attracted significant research attention recently. I will survey some of the theoretical attempts to address this gap and discuss how close they bring us to a satisfactory understanding of the computational resources needed for achieving good clustering solutions.
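To make the theory-practice gap concrete, consider the k-means objective, a standard example of the cost functions the abstract refers to: the sum of squared distances from each point to its nearest centre. Minimizing it exactly is NP-hard in general, yet heuristics such as Lloyd's algorithm optimize it routinely. A toy sketch of evaluating that objective (the data and centres below are made up for illustration):

```python
# Toy k-means objective: sum of squared distances from each 2-D point to
# its nearest centre. Exact minimization is NP-hard in general, but the
# cost of a *given* set of centres is cheap to evaluate -- which is what
# heuristics like Lloyd's algorithm do repeatedly.

def kmeans_cost(points, centres):
    return sum(
        min((px - cx) ** 2 + (py - cy) ** 2 for cx, cy in centres)
        for px, py in points
    )

# Two well-separated clusters of two points each (illustrative data).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]

good = [(0.0, 0.5), (10.0, 10.5)]  # centres at the cluster means
bad = [(5.0, 5.0), (5.0, 6.0)]     # both centres between the clusters

# The objective cleanly separates good centre placements from bad ones.
```

Evaluating the cost is easy; the hardness lies entirely in searching the space of centre placements, which is why practical algorithms settle for local optima.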
The poster session winner will be announced after the last talk of the day.