# E2: Filter, aggregate and visualize your data

### Objective

The goal is to produce figures similar to the 2 left figures presented in Figure 4 of this paper:

Axel Antoine, Sylvain Malacria, and Géry Casiez. 2017. ForceEdge: Controlling Autoscroll on Both Desktop and Mobile Computers Using the Force. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems (CHI ’17). ACM, New York, NY, USA, 3281-3292. DOI: https://doi.org/10.1145/3025453.3025605

Read the paper above. You don’t need to read everything in detail: make sure you understand how the interaction technique works and the experiment design presented in the *STUDY 1: FORCEEDGE ON DESKTOP* section. Note: you will continue to work on the data from this paper in the following weeks.

### Filtering and aggregating data

It is common to filter and aggregate data before visualizing it. Here we want to keep only the trials that were successfully completed. We also want to aggregate the data over a subset of the independent variables considered in the experiment, in order to make the graphs.

- Get the data from the first experiment. The CSV file provides columns with the `participant` number, the `task` (Select, Move), the `technique` (ForceEdge, Baseline), the `block` number, the dragging `distance`, the `repetition` number, the `trial` number (the participant had to successfully complete a trial before moving to the next one), the trial completion `time` in seconds, the `overshoot` distance in lines, and, last, the `success` or error status of the trial.
- Load the data in R.
- Use the `filter` command from the dplyr package to keep only the data where `success` is True (2880 trials should remain).
- Now we want to aggregate the data to compute the mean time for each task, technique and distance. Use the `group_by` and `summarise` commands from dplyr. At this point you should obtain a data frame with 12 rows (also called observations in R). In addition to the mean values, we want the 95% confidence intervals for the means (error bars): in the `summarise` command, also compute the lower and upper bounds of the confidence intervals using the `ci` command available in the gmodels package.
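
The steps above can be sketched on a small toy data frame. This is only an illustration under stated assumptions: the values, the two distances and the names `trials`, `ok`, `agg`, `mean_time`, `ci_lower` and `ci_upper` are made up, and the sketch assumes `success` is read as a logical column. With the real CSV file the row counts will differ.

```r
library(dplyr)
library(gmodels)

# Toy data standing in for the experiment CSV (all values are made up).
# With the real file you would load it instead, e.g. with read.csv().
set.seed(1)
trials <- expand.grid(
  participant = 1:4,
  task = c("Select", "Move"),
  technique = c("ForceEdge", "Baseline"),
  distance = c(100, 200),            # hypothetical distances
  stringsAsFactors = FALSE
)
trials$time <- runif(nrow(trials), min = 2, max = 10)
trials$success <- TRUE
trials$success[c(1, 2)] <- FALSE     # pretend two trials failed

# Keep successful trials only.
ok <- filter(trials, success)

# Mean time and 95% confidence interval bounds
# for each task x technique x distance combination.
agg <- ok %>%
  group_by(task, technique, distance) %>%
  summarise(
    mean_time = mean(time),
    ci_lower  = ci(time)[["CI lower"]],
    ci_upper  = ci(time)[["CI upper"]],
    .groups = "drop"
  )

print(agg)
```

`ci()` returns a named vector, so the bounds are picked out by name here; the same pattern should carry over to the real data once the column names match.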

### Visualizing data

- Read chapter 3 of Modern Statistical Methods for HCI and play with the examples provided.
- Using ggplot2 and the data previously aggregated, replicate as faithfully as possible the 2 left figures presented in Figure 4. Note that the figures in the paper were produced using matplotlib.
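
As a possible starting point, here is a minimal ggplot2 sketch of means with confidence-interval error bars. The numbers are invented, the column names (`mean_time`, `ci_lower`, `ci_upper`) are assumptions, and the choice of dodged bars with one panel per task is also an assumption: check Figure 4 in the paper and adapt the geoms, colours, axes and labels to match it.

```r
library(ggplot2)

# Hypothetical aggregated values, shaped like the output of the aggregation step.
agg <- data.frame(
  task      = rep(c("Select", "Move"), each = 4),
  technique = rep(rep(c("ForceEdge", "Baseline"), each = 2), times = 2),
  distance  = rep(c(100, 200), times = 4),
  mean_time = c(4.0, 6.0, 5.0, 7.5, 4.5, 6.5, 5.5, 8.0),
  ci_lower  = c(3.5, 5.4, 4.4, 6.8, 4.0, 5.9, 4.9, 7.2),
  ci_upper  = c(4.5, 6.6, 5.6, 8.2, 5.0, 7.1, 6.1, 8.8)
)

# One bar per technique and distance, with the 95% CI drawn as error bars.
dodge <- position_dodge(width = 0.9)
p <- ggplot(agg, aes(x = factor(distance), y = mean_time, fill = technique)) +
  geom_col(position = dodge) +
  geom_errorbar(aes(ymin = ci_lower, ymax = ci_upper),
                position = dodge, width = 0.2) +
  facet_wrap(~ task) +
  labs(x = "Distance", y = "Time (s)", fill = "Technique")

print(p)
```

Using the same `position_dodge` object for the bars and the error bars keeps each error bar centred on its bar.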

### Submit

Follow the submission instructions on the course information page. Provide the Rmd file and its HTML output for the filtering, the aggregation and the two figures. In your solution notes, state your operating system, any problems you ran into, and the main resource(s) you used (blog posts, online tutorials, Stack Overflow posts, papers, textbooks, etc.). For each resource, briefly describe what it is and how it helped you.