E2: Filter, aggregate and visualize your data
Objective
The goal is to produce figures similar to the two left-hand plots in Figure 4 of this paper:
Axel Antoine, Sylvain Malacria, and Géry Casiez. 2017. ForceEdge: Controlling Autoscroll on Both Desktop and Mobile Computers Using the Force. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI ’17). ACM, New York, NY, USA, 3281-3292. DOI: https://doi.org/10.1145/3025453.3025605
Read the paper above. You don’t need to read everything in detail: focus on how the interaction technique works and on the experiment design presented in the "STUDY 1: FORCEEDGE ON DESKTOP" section. Note: you will continue to work on the data from this paper in the following weeks.
Filtering and aggregating data
It is common to filter and aggregate data before visualizing it. Here we want to keep only the trials that were successfully completed. We also want to aggregate the data over a subset of the independent variables considered in the experiment, in order to make the graphs.
- Get the data from the first experiment. The CSV file provides columns with the `participant` number, `task` (Select, Move), `technique` (ForceEdge, Baseline), `block` number, dragging `distance`, `repetition` number, `trial` number (the participant had to successfully complete a trial before moving to the next one), trial completion `time` in seconds, the `overshoot` distance in lines, and, last, the `success` or error for the trial.
- Load the data in R.
- Use the `filter` command from the dplyr package to keep only the data where `success` is True (2880 trials should remain).
- Now we want to aggregate the data to compute the mean time for each task, technique and distance. Use the `group_by` and `summarise` commands from dplyr. At this point you should obtain a data frame with 12 rows (also called observations in R). In addition to the mean values, we want the 95% confidence intervals for the means (error bars). In the `summarise` command, in addition to computing the mean, also compute the lower and upper bounds of the confidence intervals using the `ci` command available in the gmodels package.
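The filtering and aggregation steps above can be sketched as follows. This is only an illustration under stated assumptions: it runs on a small synthetic data frame with made-up values instead of the experiment CSV, the column names (`task`, `technique`, `distance`, `time`, `success`) are inferred from the description above, and the output names `mean_time`, `ci_lower` and `ci_upper` are hypothetical.

```r
library(dplyr)
library(gmodels)

# Synthetic stand-in for the experiment data (all values are made up).
# With the real file, a call such as read.csv() on the CSV would replace this.
set.seed(1)
trials <- expand.grid(
  participant = 1:4,
  task        = c("Select", "Move"),
  technique   = c("ForceEdge", "Baseline"),
  distance    = c(100, 200, 400),   # hypothetical distance values
  repetition  = 1:3,
  stringsAsFactors = FALSE
)
trials$time    <- runif(nrow(trials), min = 2, max = 10)
trials$success <- runif(nrow(trials)) > 0.1

# Keep only the successful trials.
# If `success` is read as text rather than logical, compare against "True" instead.
successful <- trials %>%
  filter(success == TRUE)

# One row per task x technique x distance, with the mean and its 95% CI.
# ci() returns a named vector; "CI lower" and "CI upper" hold the bounds.
aggregated <- successful %>%
  group_by(task, technique, distance) %>%
  summarise(
    mean_time = mean(time),
    ci_lower  = unname(ci(time)["CI lower"]),
    ci_upper  = unname(ci(time)["CI upper"]),
    .groups   = "drop"
  )

print(aggregated)
```

With 2 tasks, 2 techniques and 3 distances, the result has 12 rows, which matches the count expected for the real data. By default `ci` uses a 95% confidence level for a numeric vector.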
Visualizing data
- Read chapter 3 of Modern Statistical Methods for HCI and play with the examples provided.
- Using ggplot2 and the data previously aggregated, replicate as faithfully as possible the 2 left figures presented in Figure 4. Note that the figures in the paper were produced using matplotlib.
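As a possible starting point, here is a minimal ggplot2 sketch of a grouped bar chart with error bars. It assumes an aggregated data frame with one row per task, technique and distance, and uses made-up values. The column names (`mean_time`, `ci_lower`, `ci_upper`), the chart type and the faceting by task are assumptions for illustration, not a description of the paper's figure, so adjust them to what Figure 4 actually shows.

```r
library(ggplot2)

# Made-up aggregated values, for illustration only.
agg <- expand.grid(
  task      = c("Select", "Move"),
  technique = c("ForceEdge", "Baseline"),
  distance  = c(100, 200, 400),   # hypothetical distance values
  stringsAsFactors = FALSE
)
agg$mean_time <- seq(3, 8, length.out = nrow(agg))
agg$ci_lower  <- agg$mean_time - 0.5
agg$ci_upper  <- agg$mean_time + 0.5

# Bars grouped by technique, one panel per task, error bars from the CI bounds.
dodge <- position_dodge(width = 0.9)
p <- ggplot(agg, aes(x = factor(distance), y = mean_time, fill = technique)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),
                position = dodge, width = 0.2) +
  facet_wrap(~ task) +
  labs(x = "Distance", y = "Mean time (s)", fill = "Technique") +
  theme_minimal()

print(p)
```

Using the same `position_dodge` object for the bars and the error bars keeps each error bar centered on its bar.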
Submit
Follow the submission instructions on the course information page. Provide the Rmd file and its HTML output for the filtering, aggregation and the two figures. In your solution notes, provide your operating system, any problems you ran into, and the main resource(s) you used (blog posts, online tutorials, Stack Overflow posts, papers, textbooks, etc.). For each resource, briefly describe what it is and how it helped you.