E2: Filter, aggregate and visualize your data

Objective

Learn how to filter and aggregate data using dplyr.
Learn how to visualize your data using ggplot2.

The goal is to produce figures similar to the two left-hand plots in Figure 4 of the following paper:

Axel Antoine, Sylvain Malacria, and Géry Casiez. 2017. ForceEdge: Controlling Autoscroll on Both Desktop and Mobile Computers Using the Force. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI ’17). ACM, New York, NY, USA, 3281-3292. DOI: https://doi.org/10.1145/3025453.3025605

Read the paper above. You don’t need to read everything in detail: focus on understanding how the interaction technique works and the experimental design presented in the STUDY 1: FORCEEDGE ON DESKTOP section. Note: you will continue to work on the data from this paper in the following weeks.

Filtering and aggregating data

It is common to filter and aggregate data before visualizing it. Here we want to keep only the trials that were successfully completed. We also want to aggregate the data so that only the subset of independent variables shown in the graphs remains, averaging over the others.

Get the data from the first experiment. The CSV file provides columns with the participant number, task (Select, Move), technique (ForceEdge, Baseline), block number, dragging distance, repetition number, trial number (the participant had to successfully complete a trial before moving to the next one), trial completion time in seconds, overshoot distance in lines, and finally whether the trial was a success or an error.
Load the data into R.
Use the filter function from the dplyr package to keep only the trials where success is True (2880 trials should remain).
Now aggregate the data to compute the mean time for each combination of task, technique and distance, using the group_by and summarise functions from dplyr. You should obtain a data frame with 12 rows (also called observations in R). In addition to the means, we want their 95% confidence intervals (shown as error bars). In the summarise call, alongside the mean, compute the lower and upper bounds of each confidence interval using the ci function from the gmodels package.
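The filtering and aggregation steps above can be sketched as follows. This is a minimal sketch, not a reference solution: the file name and the column names (participant, task, technique, distance, time, success) are assumptions, since the actual CSV headers may differ, and the small toy data frame is made-up illustration data, not the study data. It also assumes success is read as a logical column (read.csv normally converts the strings True/False to TRUE/FALSE); if it ends up as text, filter on success == "True" instead. gmodels::ci returns a named vector whose "CI lower" and "CI upper" entries bound a t-based interval, mean ± t(0.975, n-1) * sd / sqrt(n).

```r
library(dplyr)
library(gmodels)

# For the real data, something like this (file name is hypothetical):
# trials <- read.csv("experiment1.csv")

# Toy data standing in for the CSV: made-up values, NOT the study data.
# Column names are assumptions about the CSV headers.
toy <- expand.grid(
  participant = 1:3,
  task = c("Select", "Move"),
  technique = c("ForceEdge", "Baseline"),
  distance = c(1, 2, 3),        # hypothetical distance levels
  stringsAsFactors = FALSE
)
toy$time <- toy$participant + toy$distance                  # arbitrary times (s)
toy$success <- !(toy$participant == 3 & toy$distance == 3)  # a few failed trials

# Keep successful trials, then compute the mean time and its 95% CI
# for each task x technique x distance combination.
summarise_time <- function(trials) {
  trials %>%
    filter(success) %>%
    group_by(task, technique, distance) %>%
    summarise(
      mean_time = mean(time),
      ci_lower  = ci(time)[["CI lower"]],
      ci_upper  = ci(time)[["CI upper"]],
      n         = n(),
      .groups   = "drop"
    )
}

print(summarise_time(toy))
```

With 2 tasks, 2 techniques and 3 distances, the result has 12 rows, one per combination, which matches the expected shape for the real data.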

Visualizing data

Read chapter 3 of Modern Statistical Methods for HCI and experiment with the examples it provides.
Using ggplot2 and the data you aggregated above, replicate the two left-hand plots of Figure 4 as faithfully as possible. Note that the figures in the paper were produced using matplotlib.
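A minimal ggplot2 sketch of one figure per task follows. It assumes each target figure plots mean time against distance for the two techniques, with 95% confidence intervals as error bars; check this against Figure 4, since the paper's exact layout (axis ranges, colours, bars versus points) may differ. The agg values are placeholders, not results, the column names (mean_time, ci_lower, ci_upper) are assumptions to rename after your own summarise output, and plot_task is a hypothetical helper. Sharing one position_dodge object across the error bars, lines and points keeps each technique's elements aligned.

```r
library(ggplot2)

# Placeholder aggregated values (NOT results from the paper); column names
# are assumptions and should match your own summarise() output.
agg <- data.frame(
  task      = rep(c("Select", "Move"), each = 6),
  technique = rep(rep(c("ForceEdge", "Baseline"), each = 3), times = 2),
  distance  = rep(c(1, 2, 3), times = 4),
  mean_time = rep(c(2, 3, 4), times = 4)
)
agg$ci_lower <- agg$mean_time - 0.5
agg$ci_upper <- agg$mean_time + 0.5

# Hypothetical helper: one figure per task, mean time vs. distance,
# one coloured series per technique, 95% CIs as error bars.
plot_task <- function(agg, task_name) {
  dodge <- position_dodge(width = 0.3)  # shift techniques so error bars don't overlap
  ggplot(agg[agg$task == task_name, ],
         aes(x = factor(distance), y = mean_time,
             colour = technique, group = technique)) +
    geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),
                  width = 0.2, position = dodge) +
    geom_line(position = dodge) +
    geom_point(position = dodge) +
    labs(title = task_name, x = "Distance", y = "Mean time (s)",
         colour = "Technique") +
    theme_classic()
}

select_plot <- plot_task(agg, "Select")
move_plot   <- plot_task(agg, "Move")
```

In the Rmd file, printing select_plot and move_plot in a chunk displays the two figures. If you prefer a single figure, adding facet_wrap(~ task) to one plot of the full data draws both tasks as panels instead.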

Submit

Follow the submission instructions on the course information page. Provide the Rmd file and its HTML output covering the filtering, the aggregation and the two figures. In your solution notes, state your operating system, any problems you ran into, and the main resource(s) you used (blog posts, online tutorials, Stack Overflow posts, papers, textbooks, etc.). For each resource, briefly describe what it is and how it helped you.