NL – Data Cultures

Posted on November 6, 2018 by NL

Drucker Reflection

After reading Graphical Approaches to the Digital Humanities by Johanna Drucker, I learned about the complex formulas needed to display information digitally. What really sparked my interest was the spreadsheet. Drucker mentions that spreadsheets are as old as civilization itself. Spreadsheets are a convenient way to relay information in a clear and organized fashion. I believe the progression of the spreadsheet can be a symbol of human digital progress. The spreadsheet has always dominated business and accounting, and it continues to do so today. However, people no longer use paper spreadsheets and instead use programs like Microsoft Excel. The invention of digital spreadsheets changed business and even made personal computers a necessity. College students still need to learn Excel for many full-time positions. This shows how important it is to learn tools that let you visualize data in the modern age.

I drew some parallels to Making Meaning Count by Stefan Sinclair and Geoffrey Rockwell. They describe a newer way of visualizing data called the word cloud. The word cloud and the spreadsheet are similar in that both present the words themselves: they show only the data, with formatting applied. In the post Text Analysis in Professional Sports by NB, the author examines the bias associated with athletics, and I would be curious about other visual ways to show this bias.

Posted on October 31, 2018 by NL

The British are coming! The British are coming!

After reading Using Metadata to Find Paul Revere, I learned about the effectiveness of simple social network analysis. The data was gathered from David Hackett Fischer and was mostly lists of committee members. Next, the author compiled a “Person by Person” matrix to indicate relationships between people. People are linked through the groups they belong to, and groups are linked by the people they share. After analyzing the number of connections each person had, Paul Revere scored very highly. Thus, the analysis found Paul Revere! This showed that network analysis can pick out important people arithmetically. Had Paul Revere not been a household name, this analysis would still have shown that he had a significant presence during the period.
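
The “Person by Person” step can be sketched in a few lines. This is a minimal illustration, assuming a small hypothetical membership table (the people and groups below are invented, not the data from the article): if A is the person-by-group membership matrix, then the person-by-person matrix is A·Aᵀ, where each off-diagonal entry counts the groups two people share. Summing each row’s off-diagonal entries gives a crude score of how connected each person is.

```python
# Hypothetical membership data: person -> set of groups they belong to.
# These names and groups are invented for illustration only.
memberships = {
    "Person A": {"Group 1", "Group 2", "Group 3"},
    "Person B": {"Group 1"},
    "Person C": {"Group 2", "Group 3"},
    "Person D": {"Group 3"},
}

people = sorted(memberships)
groups = sorted(set().union(*memberships.values()))

# Person-by-group incidence matrix A: A[i][j] = 1 if person i is in group j.
A = [[1 if g in memberships[p] else 0 for g in groups] for p in people]

def person_by_person(A):
    """Return A times A-transpose: entry [i][k] counts the groups persons i and k share."""
    n = len(A)
    return [[sum(a * b for a, b in zip(A[i], A[k])) for k in range(n)] for i in range(n)]

P = person_by_person(A)

# Simple connectedness score: total shared-group ties with everyone else
# (off-diagonal row sum; the diagonal is just each person's own group count).
scores = {p: sum(P[i]) - P[i][i] for i, p in enumerate(people)}

for p, s in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(p, s)
```

Multiplying in the other order (Aᵀ·A) gives the group-by-group matrix instead, linking groups by the people they share.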

I believe this relates closely to the Six Degrees of Francis Bacon because both show that important people can be found using unbiased network analysis. This technique has many interesting applications.

I also found the post Paul Revere’s Social Life by SJ very interesting. SJ wrote, “This made me think about how individuals use metadata in our every day lives in order to obtain information about people that we do not personally know.” It is interesting to think that the way we hear about people can be applied to so many different settings, from Facebook to historical texts from the 1500s.

Posted on October 29, 2018 by NL

Francis (Kevin?) Bacon

The Six Degrees of Francis Bacon is a fascinating analysis of relationships from 1500 to 1700. The authors showed that people who have similar links tend to have similar numbers of mentions. For example, suppose a professor mentioned me in a history book and also mentioned another student in section B. I may not know the other student at all, but that student’s number of mentions can be a good indicator of my own. This relied on some very advanced concepts, and I am curious to see if we can explore them further in class.

This article highlights some really interesting use cases of machine learning for large data sets (here, 65 million words). This analysis would not be possible without machine learning.

I believe this article relates closely to the quantitative linguistics referred to in Sentiment Analysis and Subjectivity. As an extension, I would be very curious whether one could estimate a “likability” score for each person in the history books based on positive language as well.
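
One simple way to attempt the “likability” idea is lexicon-based sentiment scoring: count positive and negative words in the sentences that mention each person. The sketch below assumes a tiny hand-made word list and invented example sentences about hypothetical people (“Ashby” and “Carrow”); a real study would need a validated sentiment lexicon and would still miss sarcasm and context.

```python
import re

# Hypothetical mini-lexicon; a real analysis would use a validated one.
POSITIVE = {"generous", "kind", "brilliant", "loyal"}
NEGATIVE = {"cruel", "treacherous", "vain", "greedy"}

# Invented example sentences about hypothetical people, not from any real corpus.
sentences = [
    "Ashby was brilliant but vain.",
    "Carrow was loyal and generous to his friends.",
    "Some called Carrow greedy.",
]

def likability(sentences, names):
    """Per name: (#positive - #negative lexicon words) summed over sentences mentioning it."""
    scores = {name: 0 for name in names}
    for sentence in sentences:
        words = re.findall(r"[a-z]+", sentence.lower())
        for name in names:
            if name.lower() in words:
                scores[name] += sum(w in POSITIVE for w in words)
                scores[name] -= sum(w in NEGATIVE for w in words)
    return scores

print(likability(sentences, ["Ashby", "Carrow"]))
```

Note how Ashby’s mixed description nets out to zero, which shows how much a single number can hide.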

After reading Why Topic Modeling is Important by JM, I believe another extension of the data set would be to examine the tone in which women and gender are mentioned over time. I would be interested to see how the language on this topic has progressed.

Posted on October 22, 2018 by NL

Text Analysis and Visualization

After reading Text Analysis and Visualization, I was shocked at the power of text. First, detecting racism in plain speech is a difficult task, but text analysis can make it apparent. It is very easy to forget how much data is really out there in the world, and this quote makes a strong point: “It is estimated that every day some 200 billion emails are sent and some 5 billion Google search queries are performed – and they are nearly all text-based. The hundred hours of video uploaded to YouTube every minute would remain largely inaccessible were it not for text-based searches of the title, description, and other metadata.” (Jockers and Underwood). Most of that data would be lost without strong text analysis and visualization.

Another topic that stuck out to me was communication. People still communicate primarily through text, so it makes sense for text to be the biggest data source. The text analysis of the toy commercials shows these themes. On the surface the evidence may seem slim, but the word cloud visualizations show that certain words play a major role in advertisements. The visualizations are extremely easy to read and get the point across quickly.

A fellow student’s post, Text Mining/Language Standardization by BM, stood out to me. They mention that a “computer will never understand the emotional values and ever-changing expressions of human beings”. I agree this is difficult to represent, and it should be taken into consideration. The word cloud completely misses meaning and judges only by frequency, which is a loss of information.
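
To make the frequency point concrete: the data behind a typical word cloud is just a table of word counts, with each word drawn at a size scaled to its count. Here is a minimal sketch of that counting step, assuming a short invented snippet of ad copy and a small hand-picked stop-word list (both hypothetical, not from the toy-commercial study).

```python
import re
from collections import Counter

# Invented ad-copy snippet for illustration; not from the actual study.
text = """Build, battle and win! Build your team, battle your friends.
Dream, create and share. Create your style and share the magic."""

# Hypothetical stop-word list: common words a word cloud usually drops.
STOP_WORDS = {"and", "your", "the"}

def word_counts(text, stop_words):
    """Lowercase, split into words, drop stop words, and count what remains."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in stop_words)

counts = word_counts(text, STOP_WORDS)
print(counts.most_common(3))
```

Nothing in these counts records whether “battle” was used playfully or aggressively; that context is exactly what the frequency table throws away.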

Posted on October 15, 2018 by NL

Words That Have Made History, Or Modeling The Dynamics Of Linguistic Changes

This week I read Words That Have Made History, Or Modeling The Dynamics Of Linguistic Changes by Maciej Eder. The reading discusses how the use of different common words changes over time. The field is called quantitative linguistics, and Google defines it as “the comparative study of the frequency and distribution of words and syntactic structures in different texts.” The article uses a form of trend analysis to capture the frequency of different words over time. Based on the data presented, language use appears to go through cycles of rapid change followed by slower change. The article does not attempt to link these cycles to specific events; however, I believe that would be an incredible extension of this project. In the post Simpson’s Paradox: Is the data telling the right story? by EC, the author states, “They conclude that data has to be analyzed very carefully in order to make the correct conclusions.” I think this is very relevant because of the ambiguity in word selection. The study could have chosen completely different words and may have reached a different conclusion.
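
The basic measurement behind this kind of study is a word’s relative frequency in each time period, so that periods with more surviving text are not automatically favored. A minimal sketch, assuming a hypothetical corpus already split into periods (the texts are invented placeholders); this is not Eder’s actual method, only the frequency count such methods start from, plus the period-to-period change that would reveal “rapid” versus “slow” phases.

```python
import re

# Hypothetical corpus: period label -> text from that period (invented).
corpus = {
    "1800s": "the ship sailed and the crew sang while the ship drifted",
    "1900s": "the car stopped and the driver waited by the car",
    "2000s": "the phone rang and the user checked the phone again and again",
}

def relative_frequency(corpus, word, per=1000):
    """Occurrences of `word` per `per` tokens in each period."""
    result = {}
    for period, text in corpus.items():
        tokens = re.findall(r"[a-z]+", text.lower())
        result[period] = per * tokens.count(word) / len(tokens) if tokens else 0.0
    return result

def period_changes(freqs):
    """Change in relative frequency between consecutive periods (in dict order)."""
    periods = list(freqs)
    return {f"{a}->{b}": freqs[b] - freqs[a] for a, b in zip(periods, periods[1:])}

freqs = relative_frequency(corpus, "and")
print(freqs)
print(period_changes(freqs))
```

A large jump between two periods followed by small ones would be the kind of “rapid then slower” pattern the article describes, though which words are chosen still drives the result.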

On the subject of quantitative linguistics, I have some experience using a really cool tool from Google that helps modern scientists solve problems using data. The tool is called Google Trends, and it applies a similar idea to search queries: it tracks how often a term is searched over time.

One great use case is detecting flu outbreaks by tracking how often people search for flu symptoms online. This allows hospitals, schools, and businesses to prepare and take precautions.

https://www.lifewire.com/google-flu-trends-1616299

Another interesting trend you can follow is the correlation between search frequency for Bitcoin and its price. Below I included two figures that look strikingly similar when compared. The first shows search interest in “Bitcoin,” and the second shows Bitcoin’s price.
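
Two series that “look similar” can be compared numerically with the Pearson correlation coefficient. The sketch below uses short made-up series standing in for search interest and price (not the actual Google Trends or CoinDesk data). Because Pearson’s r ignores scale, a 0-100 search index and a dollar price can be compared directly. A high r shows only that the series move together, not that one causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up illustrative values, NOT real data.
search_interest = [10, 25, 60, 100, 70, 40]       # 0-100 style index
price_usd = [1000, 2600, 6100, 9800, 7200, 3900]  # hypothetical prices

print(round(pearson_r(search_interest, price_usd), 3))
```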

I think using something like Google Trends in a future project would be a great way to incorporate unconventional, language-based data into an analysis.

Sources

Eder, Maciej. “Words that Have Made History, or Modeling the Dynamics of Linguistic Changes.” DH. 2018.

EC. “Simpson’s Paradox: Is the data telling the right story?” 2018.

https://trends.google.com/trends/explore?q=%2Fm%2F05p0rrx

https://www.coindesk.com/price/

https://www.lifewire.com/google-flu-trends-1616299

Posted on October 11, 2018 by NL

The Effects of Low Pay and Unemployment on Psychological Well-being

After reading The Effects of Low Pay and Unemployment on Psychological Well-being, I learned that unemployment has a major impact on a person’s well-being. The literature suggests that many people derive purpose from work, not just income, and the results support this. In the regressions run, the unemployment variable always had a strong, statistically significant effect on a person’s well-being. I found this fascinating, and I will be sure to remember it the next time I meet someone who is out of work.
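
Since the paper takes “a logistic regression approach” (per its title), it may help to write down the general form of such a model. This is a generic sketch, not Theodossiou’s exact specification: it assumes a binary outcome $Y_i$ (for example, whether respondent $i$ reports poor well-being), an unemployment indicator $U_i$, and controls of the kind the authors used: married $M_i$, age $A_i$, age squared $A_i^2$, and children $C_i$.

```latex
\begin{aligned}
\log \frac{p_i}{1 - p_i} &= \beta_0 + \beta_1 U_i + \beta_2 M_i + \beta_3 A_i + \beta_4 A_i^2 + \beta_5 C_i, \\
p_i = \Pr(Y_i = 1 \mid U_i, M_i, A_i, C_i) &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 U_i + \beta_2 M_i + \beta_3 A_i + \beta_4 A_i^2 + \beta_5 C_i)}}.
\end{aligned}
```

Under this form, $e^{\beta_1}$ is the odds ratio for being unemployed with the other controls held fixed, and a statistically significant unemployment effect means $\beta_1$ is estimated to differ from zero.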

However, as NB’s October 7, 2018 post points out, omitted variable bias may still exist. I do not know the motivation behind the data used. I believe the authors did a great job trying to cover all the bases: marital status, age, age², children, etc. However, for psychological well-being there is no one-size-fits-all solution, so I believe we need to be cautious about these results.
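
Omitted variable bias can be shown with a tiny worked example. The numbers below are invented: the outcome is built as exactly y = 2x + 3z, so the true effect of x is 2, but z (a stand-in for some unmeasured factor) tends to rise with x. Regressing y on x alone then credits x with part of z’s effect. This is a generic illustration of the idea, not a reanalysis of the paper.

```python
def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs (with intercept)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Invented data: z is an "omitted" factor that tends to rise with x.
x = [1, 2, 3, 4]
z = [0, 1, 1, 2]
y = [2 * xi + 3 * zi for xi, zi in zip(x, z)]  # true effect of x is 2

naive = ols_slope(x, y)  # regression that leaves z out
delta = ols_slope(x, z)  # how strongly z moves with x
# The naive slope equals the true effect plus (z's effect) * delta.
print(naive, 2 + 3 * delta)
```

The naive slope comes out well above 2 even though nothing about x changed; the gap is entirely z’s effect leaking in, which is why the choice of controls matters so much.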

I would also be very interested to see how these results change with updated data. I know the economy has evolved, and many more people now opt out of a full-time job and choose to drive for Uber instead.

Lastly, I think it is important to note that survey results may not tell the whole story. People can lie on the forms, or only certain kinds of people will actually participate in the survey. Both cause bias.

Citations

Theodossiou, Ioannis. “The effects of low-pay and unemployment on psychological well-being: a logistic regression approach.” Journal of Health Economics 17.1 (1998): 85-104.

NB. “Omitted Variable Bias Simpson’s Paradox.” Nick Beatis Blog, October 7, 2018.

Posted on October 3, 2018 by NL

The Changing State of Recidivism: Fewer People Going Back to Prison

The article The Changing State of Recidivism: Fewer People Going Back to Prison shows a sharp decrease since 2005 in the share of released prisoners who return to prison. The paper uses data from 23 states from 2005 through 2015. I believe the results are impressive; however, they should be taken with a grain of salt. The data does not account for prisoners who returned to prison in a different state, and it covers fewer than half of the states in the US. It is very difficult to find sufficient data, and the authors did a great job of addressing this concern in the article. The authors also provided some other sources that, in conjunction, could be used to form an opinion.

The study is a very interesting read, and my obvious next question would be: why are prisoners less likely now than in 2005 to return to prison? What factors could be causing this change? I am curious whether the prison system is improving or an outside factor is keeping former prisoners out of prison. The judicial system may also be prosecuting fewer people, so in order to determine whether the prison system is actually improving, I would like to review a couple of other data sources. The first data set I would review is arrest numbers across all 23 states. If arrests are continuously decreasing, then these results would be expected. The next aspect I would look at is the share of court cases that result in the defendant going to prison (conviction rate?). A lower percentage would also be expected to produce fewer prison returns.

Overall, I believe that this study alone is not enough to form a conclusive argument. However, combined with other reliable sources, it provides strong evidence of a decrease in prison returns, and hence an increase in the effectiveness of US prisons.