MLC – Data Cultures

Posted on November 27, 2018 by MLC

Developing Things: Digital Humanities

This article felt a little too opinionated to me. I agree with DR that this article seemed to have a lot of filler, without making many concrete connections throughout. I find it ironic that the author seemed to be skeptical of online academia and scholarship (and the criticism of it), yet they were writing academically and criticizing online work. A little too meta for my liking.

The only point I found interesting was at the very end, when the author debated whether the creation of scholarly material still “counted” if it was built by a machine. I think this just gets into the ethics of ownership and intellectual property.

Posted on November 8, 2018 by MLC

Lynching, Visualization, and Visibility

It is baffling to me that this data is available even though lynchings were not reported by the state or federal government. Even though lynchings were widely known about and were illegal, the government turned a blind eye. All of the data was collected from citizens and newspapers, likely from relatives of the victims or members of their communities. This data is also almost certainly under-reported, which is heartbreaking in and of itself, especially because the number of reported lynchings was already incredibly high on a weekly basis. The visualization demonstrated just how many lynchings there were by showing a nearly opaque red graph for 40 years that only started to fade around the 1920s. Once again, this is only reported data, and the graph only covers the years shown on its axis; that doesn’t mean the lynchings didn’t extend far beyond them.

Posted on November 6, 2018 by MLC

Feminist Data Visualization

It is great to read about intersectional feminism being applied to technology and data, because in a male-dominated field it is important to know which information is biased toward a patriarchal society. D’Ignazio acknowledged that collecting and presenting data can be biased by who creates it, who it is created for, and the societal influences around it. The article concluded that to further embrace intersectional feminism, the field needs to: rethink binaries, embrace pluralism, examine power, consider context, legitimize embodiment and affect, and make labor visible. Thinking critically about all of these categories can help the audience and the author reduce some of the societal inequalities that STEM fields currently have.

MS-B mentioned in their post that data should be flexible enough to be collected in a fluid manner, and that the technology should adapt to handle that sort of analysis. If data had been collected in a fluid manner since its creation, we wouldn’t see the “male or female” checkbox as often as we do.

Posted on November 1, 2018 by MLC

Social Networks and Paul Revere

When we hear the term ‘social network,’ we don’t usually think of networks extending back to the colonial period. Social networking is widely used today to reconnect with lost acquaintances and stay updated on trends. In a less modern sense, the same kind of network analysis can be used to track correspondence and give a detailed picture of which groups are in communication. In the article, the strength of the relationship between groups can indicate whether those groups were allies. This is useful for tracking “terrorists,” as seen in the article.

The use of betweenness scores was very interesting because it showed who sat in the middle of communications, with Paul Revere clearly having the highest score. The article also described using eigenvectors to further analyze the matrices, which is a very useful tool when working with large datasets.
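Betweenness measures how often a node sits on the shortest paths between other pairs of nodes. Below is a minimal brute-force sketch in Python, assuming an undirected, unweighted person-to-person network stored as a dict of neighbor sets. The network and names are hypothetical, and the article's own data and method may differ.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(graph, source):
    """BFS from source: distance and number of shortest paths to each reachable node."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:            # first time we reach w
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:   # u lies on a shortest path to w
                sigma[w] += sigma[u]
    return dist, sigma


def betweenness(graph):
    """Unnormalized betweenness centrality for an undirected graph.

    For every pair (s, t), each other node v earns the fraction of
    s-t shortest paths that pass through it. Brute force, fine for small networks.
    """
    info = {v: shortest_path_counts(graph, v) for v in graph}
    scores = {v: 0.0 for v in graph}
    for s, t in combinations(graph, 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:              # s and t are disconnected
            continue
        for v in graph:
            if v in (s, t) or v not in dist_s:
                continue
            if dist_s[v] + dist_t[v] == dist_s[t]:
                scores[v] += sigma_s[v] * sigma_t[v] / sigma_s[t]
    return scores


# Hypothetical network: two tight clusters joined through one "Bridge" person.
network = {
    "Alice": {"Bob", "Carol"},
    "Bob": {"Alice", "Carol"},
    "Carol": {"Alice", "Bob", "Bridge"},
    "Bridge": {"Carol", "Dan"},
    "Dan": {"Bridge", "Eve", "Frank"},
    "Eve": {"Dan", "Frank"},
    "Frank": {"Dan", "Eve"},
}

scores = betweenness(network)
for person, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(person, score)
```

Here "Bridge" scores highest because every path between the two clusters runs through it. For real data, the NetworkX library's `betweenness_centrality` function computes the same idea more efficiently (normalized by default).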

As SJ mentioned in their blog, it is cool that we can use data that was not collected through direct communication. Secondhand communication suddenly comes into play in a large way, which introduces a whole new world of data collection.

Posted on October 23, 2018 by MLC

Text Analysis and Hidden Bias

The Jockers article “Text Analysis and Visualization” made the point that we should not only look at data presented in writing, but also look deeper into the data within the text itself. Looking at how frequently particular words appear can uncover bias that may go unnoticed on a read-through. I like the point CM-A made in their blog post: although text analysis can speed up how we consume a particular text, it can also highlight facts or data that may not have been noticed before.

Text visualizations, such as Wordle, have always been a fun activity, but personally, I had never thought of them as an analysis technique. In the example of boys’ advertisements versus girls’ advertisements, the visualizations are jarringly different, even if the word frequencies themselves would not seem extremely different.
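Word clouds like Wordle are built on a simple word count. A minimal Python sketch of that counting step, assuming a tiny hand-picked stopword list and two invented snippets of ad copy (not the article's data):

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; real analyses use longer lists).
STOPWORDS = frozenset({"a", "an", "and", "the", "it"})


def word_frequencies(text, stopwords=STOPWORDS):
    """Lowercase the text, split it into words, and count each remaining word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stopwords)


# Hypothetical ad copy, invented for illustration.
boys_ad = "Build it, race it, crash it! The fastest, toughest truck ever built."
girls_ad = "Style her hair, plan her party, and make every day a magical day."

boys = word_frequencies(boys_ad)
girls = word_frequencies(girls_ad)

print("boys:", boys.most_common(3))
print("girls:", girls.most_common(3))
# Vocabulary that appears in the girls' ad but not the boys' ad
print("only in girls' ad:", sorted(set(girls) - set(boys)))
```

A word cloud essentially scales each word by these counts, which is why two texts with similar overall frequencies can still look very different once their distinct vocabularies are drawn side by side.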

Posted on October 11, 2018 by MLC

Unemployment on Psychological Well-Being

Theodossiou argues that unemployment has a negative effect on psychological well-being and, following that line of reasoning, further assumes that most unemployment is involuntary. Theodossiou notes that his measures of psychological well-being were not limited to “caseness scores,” which are numerical values associated with a type of psychological testing. He based his analysis on categorical data ranked ordinally in four levels. His categories were: feeling under strain, losing confidence, thinking of oneself as a worthless person, happiness levels, and the ability and motivation to do day-to-day activities. Each ordinal ranking was associated with a value [e.g., not at all (1), no more than usual (2), rather more than usual (3), much more than usual (4)]. I think these categorical variables are useful for collecting this data because the responses are subjective to the person answering the questions, and the researchers were looking for subjective data.
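Because these answers are ordered but the gaps between levels are not necessarily equal, one common approach is to code them 1 to 4 and summarize with a median or the share of high answers rather than an average. A minimal Python sketch using the four-level scale above, assuming hypothetical responses (this is not Theodossiou's data or his statistical model):

```python
from statistics import median

# Four-level scale as described above (1 = not at all ... 4 = much more than usual).
SCALE = {
    "not at all": 1,
    "no more than usual": 2,
    "rather more than usual": 3,
    "much more than usual": 4,
}


def encode(responses):
    """Map each verbal answer to its ordinal code."""
    return [SCALE[r] for r in responses]


def summarize(responses):
    """Median code and share of answers at 'rather more than usual' or higher."""
    codes = encode(responses)
    high = sum(1 for c in codes if c >= 3) / len(codes)
    return median(codes), high


# Hypothetical answers to "feeling under strain" for two small groups.
employed = ["not at all", "no more than usual", "no more than usual", "rather more than usual"]
unemployed = ["no more than usual", "rather more than usual", "much more than usual", "much more than usual"]

print("employed:", summarize(employed))
print("unemployed:", summarize(unemployed))
```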

Additionally, as Elizabeth said in her blog post, even when the data was broken down across subgroups (age, gender, married/divorced, etc.) to avoid Simpson’s paradox, it generally followed the same trend, although the reasons behind each trend could be explained differently. I agree with Elizabeth that this article does a good job of categorizing data and analyzing it across many variables and groups.

Posted on October 9, 2018 by MLC

Simpson’s Paradox RR2

Simpson’s paradox is interesting when related to large corporations or news sources, because the paradox is often manipulated in their favor. The paradox says that a trend appearing in each of several subsets of data can disappear, or even reverse, when those subsets are combined into one big data set. When you don’t consider the variables that define those subsets, pieces of the picture are missing from the data. News sources do this frequently when reporting state or national data, sometimes for political reasons. Relating this back to the Pew article about recidivism, if the data showed recidivism rates declining nationwide, the report could be hiding the fact that rates had been increasing in half of the states across the country. I like that the author of this source about the paradox acknowledged that a good statistical model needs treatments to correct the biased data. Many of these treatments involve separating the data into subsets or clusters. This can change the entire conclusion drawn from the data, but the result is more accurate.
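To see how a reversal like this can happen, here is a minimal Python sketch with illustrative (successes, trials) counts for two hypothetical treatments split into two subgroups. The numbers are chosen only to produce the paradox and have nothing to do with the Pew data.

```python
# Illustrative (successes, trials) counts, split by a grouping variable.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes, trials):
    return successes / trials


def subgroup_rates(data):
    """Success rate for each (treatment, subgroup) pair."""
    return {
        (t, g): rate(s, n)
        for t, groups in data.items()
        for g, (s, n) in groups.items()
    }


def pooled_rates(data):
    """Success rate per treatment after merging all subgroups into one data set."""
    pooled = {}
    for t, groups in data.items():
        s_total = sum(s for s, _ in groups.values())
        n_total = sum(n for _, n in groups.values())
        pooled[t] = rate(s_total, n_total)
    return pooled


sub = subgroup_rates(data)
for g in ("small", "large"):
    print(f"{g}: A={sub[('A', g)]:.2f}  B={sub[('B', g)]:.2f}")

pooled = pooled_rates(data)
print(f"pooled: A={pooled['A']:.2f}  B={pooled['B']:.2f}")
```

Treatment A does better within both subgroups, yet B looks better once the subgroups are pooled, because A was applied mostly to the harder "large" group. Separating the data by the grouping variable recovers the subgroup comparison.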

Posted on October 4, 2018 by MLC

Reading Response 1

The Martell and Pew articles each gathered massive amounts of information from databases to analyze and make claims. However, in both instances there were major flaws in: the spread of the data, the location of data collection, the follow-through of time-related data (primarily seen in Pew), and the failure to control for all confounding variables. The Pew article stood out to me because the data collected for these prisons showed such high rates of recidivism, and there were still somewhere between 12 and 20 states that did not report. I think the percentage of released prisoners who reoffend is much higher than what the article reported, because of the underreported data. If a prison has extremely high rates of recidivism, it might not report that data because it could lose funding or be absorbed into another prison system. Also, in both the methodology and the article, the authors acknowledged that Pew did not keep track of people who reoffended in a different state, which lowers the overall percentage. Furthermore, the purpose of the article was to show a steady decline in recidivism, but in my opinion, there were too many uncontrolled variables to make that claim boldly enough to put it in the title.

Still, while massive databases often have large information gaps, they do provide valuable information when used with caution and with a wide (and as random as possible) spread of data.