class: center, middle, inverse, title-slide

# Week 3: Importing and visualizing data

### STAT 021 with Suzanne Thornton

### Swarthmore College

---

<style type="text/css">
pre {
  background: #FFBB33;
  max-width: 100%;
  overflow-x: scroll;
}
.scroll-output {
  height: 75%;
  overflow-y: scroll;
}
.scroll-small {
  height: 50%;
  overflow-y: scroll;
}
.red{color: #ce151e;}
.green{color: #26b421;}
.blue{color: #426EF0;}
</style>

# Creating a tibble of data

Think of data in terms of a spreadsheet (or matrix) where

- rows are the observations, and
- columns are the variables.

.scroll-output[

```r
library(tidyverse)  ## provides tibble() and ggplot()

## Data object (in this example we're looking at miles per hour and number of passengers)
car_data <- tibble(mph = c(22, 35, 20, 25, 24, 20, 34, 31, 30, 32, 33, 25, 29, 24),
                   passengers = c(1, 1, 3, 2, 5, 2, 1, 1, 2, 1, 3, 1, 1, 2))
car_data
```

```
## # A tibble: 14 x 2
##      mph passengers
##    <dbl>      <dbl>
##  1    22          1
##  2    35          1
##  3    20          3
##  4    25          2
##  5    24          5
##  6    20          2
##  7    34          1
##  8    31          1
##  9    30          2
## 10    32          1
## 11    33          3
## 12    25          1
## 13    29          1
## 14    24          2
```
]

---

## Plotting in the `tidyverse`

### Scatterplot and plot basics

.scroll-output[

The components for building a plot with `ggplot2` (the plotting package in the `tidyverse`) are:

- the data object (typically a `tibble` or `data.frame`),
- the plot initialization, a call to `ggplot()` where you reference the data object and assign variables to the axes, and then
- the plot layer itself (scatterplot, histogram, density curve, box plot, etc.), added with `+`.

```r
## Plot initialization (aes() stands for aesthetics)
ggplot(car_data, aes(x=passengers, y=mph)) +
  ## The actual plot (a scatterplot in this example)
  geom_point()
```
]

---

## Plotting in the `tidyverse`

### Scatterplot and plot basics

<img src="Figs/unnamed-chunk-4-1.png" style="display: block; margin: auto;" />

---

## Plotting in the `tidyverse`

### Scatterplot and plot basics

.scroll-output[

We finish the plot by adding a title and axis labels, and by setting the axis ranges with the `xlim()` and `ylim()` functions. Any points that fall outside these ranges are dropped from the plot.
```r
## Plot initialization (aes() stands for aesthetics)
ggplot(car_data, aes(x=passengers, y=mph)) +
  ## The actual plot (a scatterplot in this example)
  geom_point() +
  ## Axis ranges
  ylim(10, 40) +
  xlim(0, 6) +
  ## Title and axis labels
  labs(title="Scatterplot") +
  xlab("Number of passengers") +
  ylab("Miles per hour")
```
]

---

## Plotting in the `tidyverse`

### Scatterplot and plot basics

<img src="Figs/unnamed-chunk-6-1.png" style="display: block; margin: auto;" />

---

## Plotting in the `tidyverse`

### Histograms

.scroll-output[

To visualize the distribution of a single numeric variable, we often use a histogram.

```r
ggplot(car_data, aes(x=mph)) +
  ## bins sets the number of bars
  geom_histogram(bins=5) +
  labs(title="Histogram") +
  xlab("Miles per hour") +
  ylab("Count")
```
]

---

## Plotting in the `tidyverse`

### Histograms

<img src="Figs/unnamed-chunk-8-1.png" style="display: block; margin: auto;" />
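
---

## Plotting in the `tidyverse`

### Histograms

.scroll-output[

To check what the bars of a histogram count, one option is the `ggplot2` function `layer_data()`, which returns the data frame that `ggplot2` computed for a layer. This is a minimal sketch, assuming `ggplot2` is installed (it is loaded by `library(tidyverse)`). The names `hist_plot` and `bin_counts` are illustrative choices. `car_data` is rebuilt so the chunk runs on its own.

```r
library(tidyverse)

## Same data as on the first slide
car_data <- tibble(mph = c(22, 35, 20, 25, 24, 20, 34, 31, 30, 32, 33, 25, 29, 24),
                   passengers = c(1, 1, 3, 2, 5, 2, 1, 1, 2, 1, 3, 1, 1, 2))

## Save the histogram as an object instead of drawing it
hist_plot <- ggplot(car_data, aes(x=mph)) +
  geom_histogram(bins=5)

## Binned data for the first (and only) layer: one row per bar
bin_counts <- layer_data(hist_plot, 1)
bin_counts[, c("xmin", "xmax", "count")]

## Each car falls in exactly one bin, so the counts add up to nrow(car_data)
sum(bin_counts$count)
```

Changing `bins` and rerunning the chunk shows how the same 14 values get regrouped into more or fewer bars.
]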