Some datasets become available as soon as you load certain R packages. To access one, you just need to call the data() function with the dataset's name.
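As a sketch, here is what that can look like for the datasaurus_dozen dataset. Using the datasauRus package is an assumption based on the output shown later in these notes; any package that ships data works the same way.

```r
# Assumption: the datasauRus package is installed
library(datasauRus)

# Load the bundled dataset into your environment by name
data(datasaurus_dozen)

# Peek at the first few rows
head(datasaurus_dozen)
```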
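Other times you’ll have data in a file, like a .csv or Excel file. You can read these in with read_* functions. read_csv() becomes available when you load the tidyverse package; Excel files need the readxl package, which is installed with the tidyverse but has to be loaded separately. For example, to read in a .csv file named movie_data.csv, you could call read_csv() on it.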
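A minimal sketch of that call. The object name movie_data is illustrative, and the file.exists() check is an addition so the chunk runs even when the file is not there yet.

```r
library(tidyverse)  # attaches readr, which provides read_csv()

# Assumption: movie_data.csv is the file named in these notes and sits in the
# project folder (your working directory). The name movie_data is illustrative.
if (file.exists("movie_data.csv")) {
  movie_data <- read_csv("movie_data.csv")
} else {
  message("movie_data.csv was not found in ", getwd())
}
```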
movie_data.csv would need to be saved in your RStudio project folder for this code to run. We will practice this in a few weeks.
glimpse at your data
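The output on this slide looks like what glimpse() prints for the datasaurus_dozen dataset from the datasauRus package. That match is an inference from the column names and row count, so treat the call below as a sketch.

```r
library(tidyverse)   # attaches dplyr, which provides glimpse()
library(datasauRus)  # assumption: the source of datasaurus_dozen

# Print one line per column: its name, its type, and the first few values
glimpse(datasaurus_dozen)
```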
Rows: 1,846
Columns: 3
$ dataset <chr> "dino", "dino", "dino", "dino", "dino", "d…
$ x       <dbl> 55.3846, 51.5385, 46.1538, 42.8205, 40.769…
$ y       <dbl> 97.1795, 96.0256, 94.4872, 91.4103, 88.333…
How many rows are in this dataset? How many columns?
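You can read both numbers off the top of the glimpse() output, or ask R directly. A small sketch, again assuming the data is datasaurus_dozen from the datasauRus package:

```r
library(datasauRus)  # assumption: the source of datasaurus_dozen

nrow(datasaurus_dozen)  # number of rows: 1846
ncol(datasaurus_dozen)  # number of columns: 3
dim(datasaurus_dozen)   # both at once: rows first, then columns
```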
glimpse at your data
Rows: 344
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie, …
$ island            <fct> Torgersen, Torgersen, Torgersen,…
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3…
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6…
$ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181…
$ body_mass_g       <int> 3750, 3800, 3250, NA, 3450, 3650…
$ sex               <fct> male, female, female, NA, female…
$ year              <int> 2007, 2007, 2007, 2007, 2007, 20…
What type of variable is species? How many numeric variables are there?
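One way to check your answers in code. This sketch assumes the data is penguins from the palmerpenguins package, which matches the output above. Note that is.numeric() counts both <dbl> and <int> columns, so year is included.

```r
# Assumption: the palmerpenguins package is installed
penguins <- palmerpenguins::penguins

# The <fct> tag in the glimpse() output means species is a factor
class(penguins$species)            # "factor"

# Count the columns that hold numbers (doubles or integers)
sum(sapply(penguins, is.numeric))  # 5
```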
Let’s grab one of the
What does filter do? Why
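The original code for this step is not shown, so here is a sketch of how filter() could pull a single dataset out of datasaurus_dozen. Choosing "dino" and naming the result dino are assumptions for illustration.

```r
library(tidyverse)   # attaches dplyr, which provides filter()
library(datasauRus)  # assumption: the source of datasaurus_dozen

# Keep only the rows where the dataset column equals "dino".
# == tests whether two values are equal; a single = would not work here,
# because inside filter() it is read as naming an argument.
dino <- filter(datasaurus_dozen, dataset == "dino")

glimpse(dino)
```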
The geom_* functions in ggplot2 describe the type of plot you want to create. Which one do you think would create a histogram?
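A sketch of one answer, using geom_histogram(). The data (the "dino" subset) and the variable (x) are assumptions, since the original plot is not shown. With the default settings, ggplot2 prints a message about the number of bins.

```r
library(tidyverse)
library(datasauRus)  # assumption: the source of datasaurus_dozen

# Hypothetical subset used for illustration
dino <- filter(datasaurus_dozen, dataset == "dino")

# Map x to the horizontal axis; geom_histogram() counts values in each bin
p <- ggplot(dino, aes(x = x)) +
  geom_histogram()
p
```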
What does this warning mean? How do you think we can get rid of it?
What does this plot tell us about the shape of this data?
Which geom_ do you think would create a density plot?
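A sketch with geom_density(), under the same assumptions as before (the "dino" subset and the x variable are stand-ins for whatever the original slide plotted):

```r
library(tidyverse)
library(datasauRus)  # assumption: the source of datasaurus_dozen

# Hypothetical subset used for illustration
dino <- filter(datasaurus_dozen, dataset == "dino")

# geom_density() draws a smoothed curve of the distribution of x
p <- ggplot(dino, aes(x = x)) +
  geom_density()
p
```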
Which geom_ do you think would create a boxplot?
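A sketch with geom_boxplot(), again assuming the "dino" subset and the x variable:

```r
library(tidyverse)
library(datasauRus)  # assumption: the source of datasaurus_dozen

# Hypothetical subset used for illustration
dino <- filter(datasaurus_dozen, dataset == "dino")

# geom_boxplot() summarises x by its median, quartiles, and outliers
p <- ggplot(dino, aes(x = x)) +
  geom_boxplot()
p
```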
Does this give us as much information as the histogram?
What does this plot tell us?
What is missing?
How can we make this more legible?
Open the Welcome Penguins folder from the previous application exercise.
Create a boxplot examining the relationship between penguins' body mass and their species.
Add jittered points to this plot.
Add labels and a title to this plot.
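One possible sketch of where these steps could end up, assuming the penguins data from the palmerpenguins package. The title, axis labels, jitter width, and transparency are illustrative choices, not the expected answer.

```r
library(tidyverse)

# Assumption: the palmerpenguins package is installed
penguins <- palmerpenguins::penguins

p <- ggplot(penguins, aes(x = species, y = body_mass_g)) +
  # Hide the boxplot's own outlier points so they are not drawn twice
  geom_boxplot(outlier.shape = NA) +
  # Spread the individual penguins sideways so they do not overlap
  geom_jitter(width = 0.2, alpha = 0.5) +
  labs(
    title = "Penguin body mass by species",
    x = "Species",
    y = "Body mass (g)"
  )
p
```

Rows with a missing body_mass_g are dropped when the plot is drawn, and ggplot2 warns about them.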