Instructions: This exercise continues to develop the oral histories dataset. Working in the same groups as for the data modeling assignment, complete the following. Submit your discussion and revised model via Lyceum by the start of class on October 9th (section A) (section B):
REMEMBER TO NOT INCLUDE IDENTIFYING INFORMATION IN YOUR SUBMISSION
- A revised data model that takes into account (a) the current state of the oral history data and (b) conversations with your colleagues in the data model share around. This model should include entities, attributes, and relationships, AND the kind of data for each. So, if you propose an attribute such as “work related illness”, you would specify that this is character/string data and (probably) suggest that it make use of a controlled vocabulary (i.e. a pre-defined set of illnesses, or at least additional data clarifying how an illness described in the data relates to a diagnostic category).
- A concrete and specific plan for realizing some part of your revision (e.g. “read through all histories and tag x” or “strip all numbers” or “make first two rows of history a title column” – NOT “ID diseases” or “clean data” or “add titles”).
- If your concrete plan involves creating systematic variables from unsystematic language (e.g. creating a variable that captures working conditions from different descriptions of working conditions), then you must include a controlled vocabulary (e.g. easy, moderate, difficult) and an explanation of how you would encode different personal descriptions (in this example, of working conditions) to that controlled vocabulary.
- A short paragraph explaining how some of the decisions you have made so far have been influenced by (or taken in spite of) some of the big ideas in data cultures we have encountered so far. Make sure to reference AT LEAST three of the starred readings.
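To make the controlled vocabulary item above more concrete, here is one possible sketch of encoding free-text descriptions of working conditions to the easy/moderate/difficult vocabulary. It is written in Python purely for illustration. The keyword lists, the function name `encode_working_conditions`, and the rule of checking the most severe category first are all assumptions made up for this example, not something taken from the oral history data; your group would need to derive and justify its own rules.

```python
from typing import Optional

# Controlled vocabulary from the example in the assignment text.
VOCABULARY = ("easy", "moderate", "difficult")

# HYPOTHETICAL keyword rules: invented for illustration only.
# A real encoding scheme should come from reading the histories.
KEYWORD_RULES = {
    "difficult": ("dangerous", "exhausting", "unbearable"),
    "moderate": ("tiring", "long hours"),
    "easy": ("light work", "comfortable"),
}

# Assumed tie-break: if a description matches several categories,
# the most severe one wins.
PRIORITY = ("difficult", "moderate", "easy")


def encode_working_conditions(description: str) -> Optional[str]:
    """Map a free-text description to one vocabulary term, or None."""
    text = description.lower()
    for term in PRIORITY:
        if any(keyword in text for keyword in KEYWORD_RULES[term]):
            return term
    # No rule matched: leave uncoded rather than guess.
    return None
```

Returning `None` for unmatched descriptions keeps uncoded cases visible, so they can be reviewed by hand instead of being forced silently into a category.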
You can download the “cleaned” data (there is still more cleaning to do), as well as the R scripts I used to clean it, from the Lyceum assignment folder.
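As a rough illustration of what a concrete and specific cleaning step might look like, the sketch below implements two of the plan examples quoted in the list above: “strip all numbers” and “make first two rows of history a title column”. It is in Python for illustration only (the cleaning scripts in the assignment folder are in R). The function names and the assumption that each history arrives as a single multi-line string are hypothetical, so adapt them to the actual structure of the cleaned data.

```python
import re


def strip_numbers(text: str) -> str:
    """Remove every run of digits, then collapse leftover whitespace."""
    without_digits = re.sub(r"\d+", "", text)
    return re.sub(r"\s+", " ", without_digits).strip()


def split_title(history: str, title_lines: int = 2) -> dict:
    """Treat the first `title_lines` lines as the title, the rest as body.

    ASSUMPTION: a history is one string with newline-separated lines.
    """
    lines = history.splitlines()
    return {
        "title": " ".join(line.strip() for line in lines[:title_lines]),
        "body": "\n".join(lines[title_lines:]),
    }
```

Note that a blunt rule like “strip all numbers” also removes ages, dates, and wages, which is exactly the kind of trade-off your plan should acknowledge.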