Ethical Concerns When Working with Data

  • Data is not objective, and is not neutral. (Noble, O’Neil)
  • Human motives influence the bias in the data. (Eviction Lab)
  • Readers might not be aware of these biases. (Simpson’s Paradox, Asimov)
  • Data needs to be looked at in context. (Klein, Klein and D’Ignazio)
  • Lack of depth/context can lead to unintentional bias. (Eviction Lab, O’Grada and Guinnane)
  • In order for users to feel safe, they should understand how/where data is being collected. (Eviction Lab, Force 11)
  • Not enough people know that their data is not their own. (Gabourey)
  • Verifying your data
  • Make sure your survey design is good (Eviction Lab)
  • Consent (What you Can’t and Shouldn’t do with social media)
  • It is not always clear who your audience is when using social media to collect data (Gabourey, Klein, Healy)
  • Think about the financial incentives (Gabourey)
  • Publishing accurate statistics makes it possible to infer personal information (Healy)
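
The last point can be illustrated with a small differencing example. This is a sketch of one way accurate statistics can leak personal information, not necessarily the mechanism the reading describes; the names and salaries are invented.

```r
# Invented salaries for a five-person office; none of this is real data.
salaries <- c(ana = 52000, ben = 48000, cam = 61000, dee = 55000, eli = 90000)

# Two accurate statistics, each harmless on its own:
mean_all         <- mean(salaries)                             # whole office
mean_without_eli <- mean(salaries[names(salaries) != "eli"])   # everyone except Eli

# Anyone who knows both averages and both head counts can recover Eli's salary.
inferred <- mean_all * length(salaries) - mean_without_eli * (length(salaries) - 1)
inferred
```

Neither average names an individual, yet together they reveal one person's exact value.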

Best Practices for Sharing Data and Code

  • Make sure that your data was ethically collected. (Pew, Healy)
  • Use common language (Binder)
  • Have variables that make sense
  • Make sure that the layout of your interface is mobile accessible
  • Use keys
  • Make sure that your data can be analyzed in many software packages (i.e., use a non-proprietary format)
  • Pre-usage instructions (e.g. read me document)
  • Being respectful – anticipating the expectations of the people whose data you are using (Noble)
  • Sticking to larger datasets
  • Only share the data you need/used for replication
  • Account for the Hawthorne effect
  • Do not share names and location.
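
Several of the practices above (sharing only the variables needed for replication, leaving out names and locations, and using a non-proprietary format) can be combined in a few lines of R. The following is a minimal sketch; the data frame `survey`, its columns, and the file name are all made up for illustration.

```r
# Hypothetical survey data; every name and value here is invented.
survey <- data.frame(
  name     = c("A. Smith", "B. Jones", "C. Lee"),
  zip_code = c("04240", "04210", "04240"),
  age      = c(34, 51, 27),
  response = c("yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# Keep only the variables needed for replication; drop names and locations.
shared <- survey[, c("age", "response")]

# Write to CSV, a plain-text format that most analysis software can read.
out_file <- file.path(tempdir(), "survey_shared.csv")
write.csv(shared, out_file, row.names = FALSE)
```

Dropping direct identifiers is only a first step: as noted in the list of ethical concerns, the remaining variables can sometimes still be combined to infer personal information.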

Best Practices for Making Public-Facing Data Projects

  • Make the presentation of the data understandable by anyone, clear enough that you do not need to be there to explain it
  • Use a shared vocabulary so people can understand your work without doing too much learning
  • Make sources of information clear and available. Be clear about where the data is coming from and how you collected it (Klein and D’Ignazio)
  • Protect any personal information provided by people that you used for your data
  • Be mindful of the purposes of your research
  • Create visual / easy to understand methodology along with data (Drucker)
  • Always cite your sources to ensure credibility and give credit to people/services that are typically overlooked
  • Make sure any visualizations accurately represent the data
  • Being transparent enough for replication purposes (Force 11, Ramsey and Rockwell, Pew, Eviction Lab)
  • Consider context when looking at data (Klein and D’Ignazio)
  • Being aware of the potential subjectivity of data and of how you yourself can skew it.

Code to check if packages are installed, run if not

Hello all,

We can certainly load each package individually at the start of each notebook, but if you are comfortable, you can also use the following code, which installs any package that is missing and then loads it:

# List your packages here (this example checks whether tidyRSS and stringr are installed).
packages <- c("tidyRSS", "stringr")

# Run this without making changes
package.check <- lapply(packages, FUN = function(x) {
  # require() returns FALSE if the package cannot be loaded
  if (!require(x, character.only = TRUE)) {
    install.packages(x, dependencies = TRUE)
    library(x, character.only = TRUE)
  }
})
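
If you would rather see which packages are missing before installing anything, a small sketch like the following reports them without installing or attaching anything. It reuses the two package names from above; `is_installed` and `missing_pkgs` are names chosen here for illustration.

```r
packages <- c("tidyRSS", "stringr")

# requireNamespace() returns TRUE if a package's namespace can be loaded,
# without attaching the package to the search path.
is_installed <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that could not be loaded.
missing_pkgs <- packages[!is_installed]

if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All packages are installed.")
}
```

You could then pass `missing_pkgs` to `install.packages()` yourself once you have confirmed the list looks right.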

What do I need to do before class today?

Figuring out what work needs to be done before each class can be a bit confusing – this flowchart should help you determine whether you should post a reading response or annotate some readings.

If you have forgotten whether or not you signed up to post, you can check here for section A (11:00-12:40) and here for section B (2:40-4:00). The Google table lists your initials, as well as the classes for which you signed up to post. The next available class for which to post will be bolded. After a class has passed, it will be deleted from the list.

DCS 104 Class Work Flow Chart