Digital Humanities: An Oxymoron or a Revolutionary Field?


Reading this article toward the end of the semester felt as though it was bringing the class full circle. At the beginning of the semester I knew very little about the possibilities of data analysis and was still relatively new to the idea of digital humanities. When Professor Shrout introduced herself as a historian who used coding in her field, I was surprised and intrigued by the concept. This article, like the many others we have read this year, introduced the innumerable and valuable ways in which the digital world and the humanities can work together.

For years, we have heard the phrase “don’t trust everything you read on the internet.” However, as the Internet becomes a primary platform for sharing information, the issue now lies less in finding sources outside of the internet and more in discriminating between credible and non-credible sources on the web.

In the same way, scholars are hesitant to acknowledge online texts as equal to their physical counterparts. As the authors state, “People who publish in online journals undoubtedly experience more substantial resistance, but the belief that online articles don’t really count seems more and more like the quaint prejudice of age than a substantive critique.” In a new age of sharing data, people are beginning to adapt their review processes in order to cater to the increasingly popular method of digital data distribution.

The professionals who fall under the field of “digital humanities” also face some hesitation from those who are not immediately familiar with this relatively new field. “For this group, making their work count is by no means an easy matter.” It would seem that everything they contribute (e.g., “digital libraries,” “deep coding of literary texts,” “3-D models of Roman ruins,” “charts and graphs of linguistic phenomena”) is inherently of great value to their fields, yet they struggle for scholarly recognition nonetheless. (As my peer said in a previous blog post: “Their efforts to help facilitate the work of professionals… are crucial to their success and yet their work is constantly overlooked and even at times deemed non-scholarly.”)

In seeking to understand the study of digital humanities, I think the authors hit the nail on the head with their description of theory in the context of digital humanities: “In the context of history or literary study, ‘theory’ doesn’t predict, but it does explain. It promises deeper understanding of something already given, like historical events or a literary work. To say that software is a theory is to say that digital works convey knowledge the way a theory does, in this more general sense.”

Using Context Clues to Determine Location


There are two aspects of this piece that are particularly intriguing. The first is that there are STILL more aspects of data that can be missed or misinterpreted in analysis. Each week, the readings reveal another aspect that I never would have thought could a) be misleading or b) be missed entirely, yet this week showed that place-based data also falls into this category (given that Maine has a Norway, Paris, Denmark, Naples, Sweden, Poland, Mexico, Peru, and China, perhaps I should have considered this earlier). The second intriguing aspect is the technology’s ability to “locate place names using the document’s context.” Given that humans have trouble discerning important information using context clues, the idea that technology has this ability is amazing (this and sentiment analysis seem all-too human). In my peer’s blog, they wrote of the arc diagram Lauren Klein developed “to visualize the people with whom Jefferson corresponded” about Hemings. The current article reminded me of Klein’s efforts, as she was able to extract information from context that sheds light both on the original piece and perhaps on tangential investigations as well.
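
The article does not spell out its disambiguation method, but a toy sketch can illustrate the basic idea of using a context window. The gazetteer entries and the “nearest disambiguating word” rule below are my own invented stand-ins, not the system the article describes; real geoparsers use far richer models:

```python
# Toy sketch of locating a place name via document context: a hypothetical
# gazetteer plus a crude "nearest disambiguating word" rule.
GAZETTEER = {
    "Paris": ["France", "Maine", "Texas"],
    "Norway": ["Norway", "Maine"],
}

def resolve(place, text, window=10):
    """Return 'Place, Region' using the region named closest to the mention."""
    words = text.replace(",", " ").replace(".", " ").split()
    candidates = GAZETTEER.get(place, [])
    for pos in (i for i, w in enumerate(words) if w == place):
        context = words[max(0, pos - window): pos + window]
        for cand in candidates:
            if cand != place and cand in context:
                return f"{place}, {cand}"
    # No disambiguating context: fall back to the first (most common) sense
    return f"{place}, {candidates[0]}" if candidates else place

print(resolve("Paris", "The town of Paris in Maine was settled in 1779."))
print(resolve("Paris", "She left Paris for London in the spring."))
```

The same mention resolves differently depending on its surroundings, which is exactly what makes place-based data so easy to get wrong.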

The Dangers of Omission


This paper by D’Ignazio and Klein emphasizes the importance of omission. Previously, we discussed the dangers of omission in relation to Simpson’s paradox and omitted variable bias (described by my classmate as occurring when “a variable that is correlated with both the dependent and one or more included independent variables is omitted from a regression equation”). In this case, unacknowledged confounding variables account for misleading conclusions. In a more recent article, omission is discussed in the context of digital humanities. As another classmate of mine said, “Lauren Klein brings up a very pertinent question on how scholars account for absences in the archival record. Interested in revealing the absence of information regarding Jefferson’s head Chef, James Hemings,” Klein developed a social network (as the author of the Paul Revere piece also did to uncover hidden data) and uncovered a wealth of information concerning Hemings that had not otherwise surfaced.
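
My classmate’s definition of omitted variable bias can be made concrete with a small simulation (entirely synthetic data, not from any of the readings): when a confounder z drives both x and y, leaving z out of the regression distorts the estimated effect of x:

```python
import numpy as np

# Synthetic illustration of omitted variable bias (made-up data).
rng = np.random.default_rng(0)
n = 10_000
z = rng.normal(size=n)                       # confounder
x = z + rng.normal(size=n)                   # correlated with z
y = 1.0 * x + 2.0 * z + rng.normal(size=n)   # true effect of x is 1.0

# Correctly specified model: regress y on both x and z
beta_full = np.linalg.lstsq(np.column_stack([x, z]), y, rcond=None)[0]

# Misspecified model: omit the confounder z
beta_omit = np.linalg.lstsq(x[:, None], y, rcond=None)[0]

print(beta_full[0])  # ~1.0, close to the true effect
print(beta_omit[0])  # ~2.0, biased because z's influence is absorbed into x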

Hemings was a slave, yet one of the few who was literate. However, despite their friendship, even Jefferson himself refused to write to Hemings directly. Thus, we must turn to the mentions of Hemings in Jefferson’s letters to others. This tactic, while a valuable source of information that would not otherwise have been readily available, is still plagued by issues of omission and bias. Hemings belonged to a marginalized population of which few were literate. Women, too, were often illiterate, and even those in such groups who could write were often not taken seriously and certainly not published (unless under an alias).

This is the issue addressed by D’Ignazio and Klein. Their feminist theory (which advocates not only for women but for all marginalized groups) seeks “to challenge the idea that science and/or technology is objective and neutral by demonstrating how scientific thought is situated in particular cultural, historical, economic, and social systems. Feminist STS, both implicitly and explicitly, looks to the perspectives of those marginalized by current power configurations (including and especially those marginalized because of gender, sexuality, race, and/or ethnicity) as a way of exposing how their perspectives are not included in what is considered ‘objective’ truth.”

A long quote, but one that accurately summarizes their mission and introduces yet another reason for caution when analyzing data in the digital humanities. It is so easy to look at a data set or a work and pick out what’s wrong, yet it’s harder, and perhaps more important, to think further and consider what’s missing. Just as we must consider confounding variables whose omission can bias a study and ruin its validity, we must not accept the records of history as objective fact.

Secret Agent Paul


As I touched upon in my last response, we have recently been approaching forms of data analysis apprehensively. Although they were no doubt useful, we were paying close attention to what they may lack: issues in code, the inability of computers to replicate or understand human sentiment, and so on. However, in our more recent readings, the emphasis has been more on the potential of data analysis, in a mostly positive sense. As one of my colleagues said, the Jefferson reading demonstrated how “digging deeper into the absent stories can help piece together many more voices” that were lost in history. The current article demonstrated just how powerful certain forms of analysis can be in extrapolating from data. “From the merest sliver of metadata about a single modality of relationship between people…we have gotten a picture of a kind of social network between individuals, a sense of the degree of connection between organizations, and some strong hints of who the key players are in this world.” As I read the article I felt as though I was watching one of those movie scenes where the “FBI” hacks into all the security cameras in the world, pins one blurry image from Grand Central, and magnifies the picture 300x to reveal a crystal-clear view of the suspect’s face. Whereas those scenes are entertaining given their impressive (and impossible) nature, this data analysis was similar AND possible. Too cool.
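
That “merest sliver of metadata” is just a person-by-organization membership table, and a small sketch shows how far it goes. The names and memberships below are invented for illustration, not the actual Revere-era data; the technique, though, is the same matrix multiplication trick:

```python
import numpy as np

# A made-up person-by-organization membership matrix (1 = member),
# in the spirit of the Paul Revere piece.
people = ["Revere", "Warren", "Church", "Adams"]
orgs = ["StAndrewsLodge", "NorthCaucus", "LongRoomClub"]
A = np.array([
    [1, 1, 1],  # Revere
    [1, 1, 1],  # Warren
    [0, 1, 1],  # Church
    [0, 1, 0],  # Adams
])

# Person x person: how many organizations each pair shares (a social network)
person_ties = A @ A.T
# Organization x organization: how many members each pair of groups shares
org_ties = A.T @ A

# Summing each person's ties to everyone else hints at the key players
centrality = person_ties.sum(axis=1) - person_ties.diagonal()
print(dict(zip(people, centrality)))
```

One membership list, two networks, and a rough ranking of who matters most — all from data that records nothing about any individual relationship.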

A Report Has Come Here – The Bright Side of Text Analysis and Visualization in Digital Humanities


In a previous post, a peer wrote that “topic modeling extends beyond the capacities of humans and opens new doors of understanding.” They said so in the context of approaching data pulled from topic modeling with caution, as it does not necessarily outweigh “the interpretive capacities of human scholars.” http://courses.shroutdocs.org/dcs104-fall2018/2018/10/25/the-digital-humanities-contribution-to-topic-modeling/ In contrast to this sentiment, “A Report Has Come Here” sheds light not on what analysis in the digital humanities cannot do, but on what it can. What we know of history is transmitted either orally or, more concretely, from documentation. Given that the people doing the documenting were often literate, upper-class white men, quite a few voices (slaves, the poor, women, etc.) got left out along the way. Not only were their voices omitted, but what records we do have are what was said about them by those white men. This lack of data makes the small traces of these people that much more important. Whereas approaching all the texts of history manually would be akin to searching for a needle in a haystack, Klein demonstrates how text analysis and visualization “offer some acknowledgment of the lives and stories that will forever remain unknown… challenges us to make the untold stories that we detect- those we might otherwise pass over- instead expand in our eyes with significance and meaning.”

Text Analysis and Visualization: Making Meaning Count


Whereas previously we have discussed how data can put forth inaccurate depictions or can be misleading when attempting to draw conclusions, this article demonstrates the use of data in uncovering otherwise hidden insights. As the article highlights, text-based records in particular readily provide the opportunity for analysis and visualization of data.

Visualizations allow for an alternative method of “representing significant features…more compactly and more efficiently…in service of drawing attention to … significant aspect[s].”

From a human perspective, there are two sides to this coin. On one hand, text analysis can be overwhelmingly beneficial in that it can help readers understand texts that they do not have the time, or perhaps the ability, to otherwise comprehend. On the other hand, how can text analysis draw a reader’s attention to items of interest they had not previously noticed? How is a computer to know which items are important, and to know better than a human would? (This dilemma is reminiscent of one a peer mentioned in a previous post concerning text mining: computers tend to privilege informational significance over the aesthetic prose of the text, making it extremely difficult to fully automate the understanding of a text.)

Essentially, the computer reads the text as a series of parts and patterns, and it is up to humans to write code that determines what is “important.” Generally, this requires some preconceived notion of what is important in the text. Wouldn’t this ultimately defeat the original purpose of discovering items of interest that the reader had not previously considered? In this sense it’s interesting to think of the variety of perspectives from which text analysis can be approached (e.g., whether the reader is seeking a general understanding, details of a specific topic, or an investigation into a factor that was not clearly represented in the text overall).
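
TF-IDF, a standard text-analysis measure (not a method named in the article, just a familiar example), shows how a preconceived notion of importance gets written into code: the formula simply *defines* an important word as one frequent in a document but rare across the collection:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Score words per document. 'Important' is *defined* here as frequent
    in this document but rare across the collection - a human choice."""
    n = len(docs)
    # document frequency: how many documents contain each word
    df = Counter(w for doc in docs for w in set(doc.split()))
    results = []
    for doc in docs:
        tf = Counter(doc.split())
        total = sum(tf.values())
        results.append({w: (c / total) * math.log(n / df[w])
                        for w, c in tf.items()})
    return results

docs = [
    "the whale pursued the ship",
    "the ship reached the harbor",
    "the harbor was calm",
]
doc0 = tf_idf(docs)[0]
print(max(doc0, key=doc0.get))  # "whale" outranks the ubiquitous "the"
```

A different scoring function would surface different “items of interest,” which is exactly the dilemma: the discovery is only as open-minded as the definition baked into the code.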

The Effects of Low Pay and Unemployment on Psychological Well-Being: The Fine Art of Operationally Defining Qualitative Data


This paper demonstrated times when researchers must make bold claims when operationally defining qualitative variables. For example, they had to decide how to assess general happiness pertaining to employment while avoiding confounds. They realized that aging can result in changes to one’s psychological well-being in some people but not in others (thus changes across age could be due to factors other than employment), but also had to acknowledge that mental health varies greatly at any given age. When assessing well-being, they used statements pertaining to the person’s “usual” state. In some studies this could be problematic, given that everyone’s “usual” varies greatly. However, the methodology worked for their experiment because it assessed deviations from the normal state in response to stressors pertaining to employment.

In a more general sense, it was interesting to see the level of detail researchers must consider when attempting to analyze qualitative data (versus its more straightforward quantitative counterpart). In a classmate’s post, they referenced the danger of variables being omitted in order to support a preferred outcome. This reading seems to tackle the other end of the spectrum, in which data must be assessed in a way that best fits the study, but does not manipulate the outcome in a way that would ensure favorable support for the hypotheses.

Simpson’s Paradox & Application to Previous Readings


In another class, I have recently been studying the effect of extreme groups or differing subgroups within a sample, particularly how differing results within subgroups may not be accurately represented by the resulting correlation coefficient.

I did not consider that this can be related to the issues with gathering data that we have been discussing in Data Cultures. In response to previous articles, we have concluded that large conglomerations of individual data, while useful, may omit aspects of the individual that are key in the analysis of the data. Simpson’s paradox provides a clear and concise way of stating and demonstrating this effect.
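
A tiny numerical sketch (with invented numbers, not data from the readings) makes the paradox concrete: two subgroups can each show a perfectly positive correlation while the pooled data show a negative one:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Two subgroups, each with a perfectly positive trend...
group_a = ([1, 2, 3], [10, 11, 12])
group_b = ([6, 7, 8], [2, 3, 4])
print(pearson(*group_a), pearson(*group_b))  # each ~ +1.0

# ...yet pooled together, the apparent trend reverses
xs = group_a[0] + group_b[0]
ys = group_a[1] + group_b[1]
print(pearson(xs, ys))  # negative: the subgroup structure was the real story
```

Analyzing the conglomerated data alone would report a negative relationship that holds for no individual subgroup, which is precisely the omission-of-the-individual problem described above.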

For example, in the civilian casualties reading, nuisance crimes and more serious crimes were treated and analyzed on the same level. As a result, people who committed petty crimes, and the areas in which these occurred, received the same amount of attention from law enforcement as more violent and serious crimes. There are two factors that this form of analysis omits: nuisance crimes are of a smaller scale and thus occur more often, so areas in which they occur were likely flagged even more than areas with dangerous crime. In addition, this means that people who commit smaller-scale crimes were being incarcerated at higher rates, whereas the people who are likely committing dangerous crimes out of malicious intent (not due to socioeconomic status or other related factors) are not prosecuted at the same rate.

The article discussing Simpson’s paradox raised concerns of a similar magnitude, such as misleading medical conclusions (e.g., inaccurate conclusions pertaining to effective dosage). Such high-stakes examples, particularly ones that have been published and affect people’s behavior, demonstrate the importance of scrutiny and close observation when analyzing data.