This exercise brings together what we have learned about data visualization. You will create your own notebook from scratch, calling in data, manipulating that data, and visualizing it. Please submit the notebook you create via the Lyceum link for your section (section A or section B) by the start of class on November 27th.
Review Best Practices. As a class, we came up with the following best practices for data visualization. Read over them, and make sure that you have a good sense of what they would mean for your own visualizations:
- Acknowledge the source of your data
- Understand the story behind your data
- Understand what you’re looking for
- Make sure the visualization is clear
- Consider your audience
- Use good labels, axis titles, and colors
- Use trend lines appropriately
- Make sure that your data is sufficient and representative
- Make sure your visualization is accessible
- Make sure your visualization is interactive (where appropriate)
- Make sure that you use frameworks that your audience is familiar with
- Create a new notebook. Name it PE6 (do not include your name). When you are finished, you will download, zip, and upload this notebook to submit it.
- Title your notebook. Make your first cell a Markdown cell (press Esc, then M), then type one hash symbol (#) followed by a space and your title; this makes the cell a title cell. Write a placeholder title for now (you'll come back at the end and change it). Run the cell.
- Title your sections. Repeat the steps above in a new cell for each section, but use two hash symbols (##), which create smaller heading text. Name the sections: Analysis, Packages, Calling Data, First Viz, Second Viz.
- Load the ggplot2 package by running library("ggplot2"). YOU DO NOT NEED TO RUN install.packages().
- Call in each of the tables we have been working with: person, work, family, factory. Save each in a different variable.
- Take some time to re-familiarize yourself with this data. Look at the README if you need to.
- Take some time to familiarize yourself with the kinds of visualizations we created in the Challenges of Data Viz notebook.
- Come up with ONE question about the relationship between two numerical columns, broken down by ONE categorical column. These columns should come from TWO different tables.
- Merge those two tables using a column they share.
- Remove the rows that have NAs (missing values) in the columns you are interested in.
- With the best practices above in mind, make a visualization using one of the methods we used in the Challenges of Data Viz notebook. Plot the two numerical columns against each other and color-code the data by the categorical column.
- Retitle your First Viz section with something more descriptive than "First Viz".
- In your Analysis section, write a paragraph outlining what the visualization is meant to show and how you designed it in keeping with the best practices.
- Look at the other types of visualizations available in ggplot2 using this ggplot2 cheat sheet.
- Pick a visualization that we have not used already.
- Using different variables, repeat the steps above (question, merge, remove NAs, visualize, retitle, write a paragraph) for your Second Viz section.
- Write an introduction explaining the insights you gained from your data and any new questions your visualizations raised.
- Illustrate your insights with your visualizations.
- Create a witty and/or informative title for your notebook.
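The merge, NA-removal, and first-visualization steps can be sketched in R roughly as follows. This is a minimal sketch, not the required solution: the data frames person_demo and work_demo, their columns (person_id, age, gender, hours), and the shared key person_id are hypothetical stand-ins, so check the README for the real column names in the person, work, family, and factory tables before adapting it.

```r
# Minimal sketch: merge two tables, drop NAs, then plot two numeric
# columns color-coded by a categorical column.
# ASSUMPTION: person_demo and work_demo are hypothetical stand-ins for
# the course's person and work tables; real column names may differ.
library(ggplot2)

# Hypothetical stand-in for the person table (one row per person)
person_demo <- data.frame(
  person_id = c(1, 2, 3, 4, 5, 6),
  age       = c(34, 51, NA, 22, 45, 38),
  gender    = c("F", "M", "F", "M", "F", "M")
)

# Hypothetical stand-in for the work table (person 7 has no match)
work_demo <- data.frame(
  person_id = c(1, 2, 3, 4, 5, 7),
  hours     = c(60, 72, 55, NA, 66, 58)
)

# Merge on the shared key; by default merge() keeps only matching rows
merged <- merge(person_demo, work_demo, by = "person_id")

# Keep only rows with no NAs in the columns we plan to plot
cols  <- c("age", "hours", "gender")
clean <- merged[complete.cases(merged[, cols]), ]

# Scatter plot: two numeric columns, color and shape by the category
p <- ggplot(clean, aes(x = age, y = hours, color = gender, shape = gender)) +
  geom_point(size = 3) +
  # viridis palettes are designed to stay distinguishable for
  # common forms of color blindness
  scale_color_viridis_d(end = 0.8) +
  labs(
    title   = "Weekly hours worked by age (hypothetical data)",
    x       = "Age (years)",
    y       = "Hours worked per week",
    color   = "Gender",
    shape   = "Gender",
    caption = "Source: hypothetical stand-ins for the person and work tables"
  ) +
  theme_minimal()

p  # in a notebook, evaluating the plot object displays it
```

Because merge() performs an inner join by default, rows whose key appears in only one table are dropped (persons 6 and 7 here); pass all.x = TRUE if you need to keep every row from the first table. Filtering with complete.cases() on only the plotted columns avoids discarding rows that are missing values in columns you are not using. Mapping the category to both color and shape keeps the groups distinguishable even without color.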