Don’t expect too much from tools: the limitations of text analysis


I loved this reading. It was easily digestible and interesting. It assumed that I knew nothing, which was correct, so everything was explained in a way I could understand.

I had a couple of main takeaways that I’ll summarize here:

  1. Computers can only do what you tell them to, so they are subject to the biases of the people who program them. Take the very first example, about the adjectives used to describe NFL players: the team that put together the program could have compared two different races, or players above and below a certain height, which would have led us to very different conclusions.
  2. These tools are limited. They cannot *understand* the meaning behind the strings they are encoding. They have no sense of morality and cannot solve problems for us that we do not understand ourselves.

Text Analysis in Professional Sports


After reading “Text Analysis and Visualization: Making Meaning Count” by Stefan Sinclair and Geoffrey Rockwell, I was forced to acknowledge that racial stereotypes still exist in professional sports. As a country and a society, we have come a long way since the 1950s and 1960s, but we still have a long way to go. The article discusses a study of how NFL players are described, which found that “white players were more likely to be called ‘intelligent’ and blacks more likely to be called ‘natural’.” This doesn’t come as a huge surprise given the recent events in the NFL involving quarterback Colin Kaepernick. Kaepernick, who has both the IQ and the athleticism to be an elite quarterback in the league, has yet to be signed because he publicly expressed his disgust with the current state of the league.

Relating this back to text analysis, studies like the one in the article are easy to conduct, but are their findings completely relevant and accurate? For example, is there an even split between white and black players in the NFL? I would guess absolutely not. This makes it very difficult to compare the raw frequencies of words like “natural” and “intelligent” across the two groups.

MLC makes some good points in her post about the biases that can arise with this type of analysis. She says, “Looking at the frequency of times particular words are mentioned can uncover some bias, which may be unnoticeable at a readthrough.” I would like to research the NFL study further to see whether I can uncover any biases in it.

Response to Sentiment Analysis and Subjectivity


In “Sentiment Analysis and Subjectivity,” Bing Liu discusses novel text analysis techniques that aim to identify the sentiment embedded in text. Liu importantly notes that this sort of analysis has only become possible with the rise of the internet: with the plethora of information available in online forums, groups, and blogs, text analysis has never been easier. The article does a good job of describing the many methods and applications of text analysis, but I wish it had gone into more detail about the potential of this tool. For example, measuring news sentiment is becoming increasingly prevalent in political and economic analysis, and the ability to separate “fact” from “opinion” (sentiment) is a huge advancement in text analysis.

There are some caveats to text analysis that I think CM-A explained well. In their post, they discuss how these tools still rely on humans to decide what is important to look at. This is particularly important for sentiment analysis, since whoever writes the code has to decide which words or phrases contribute to a particular sentiment, which is probably not a very scientific process.
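To make CM-A’s point concrete, here is a minimal sketch of one common approach, lexicon-based sentiment scoring. It is not taken from Liu’s article. The word lists and the function names `tokenize` and `sentiment_score` are hypothetical choices of my own. The takeaway is that the score depends entirely on which words a person decided to put in those lists.

```python
import re

# Hypothetical, hand-picked lexicons: a human decides what counts as
# "positive" or "negative", which is exactly the judgment call CM-A describes.
POSITIVE = frozenset({"great", "love", "excellent", "good"})
NEGATIVE = frozenset({"bad", "terrible", "hate", "poor"})


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str, positive=POSITIVE, negative=NEGATIVE) -> int:
    """Return (# positive words) - (# negative words); above 0 reads as positive."""
    tokens = tokenize(text)
    pos = sum(t in positive for t in tokens)
    neg = sum(t in negative for t in tokens)
    return pos - neg


review = "The battery life is great, but the camera is poor and the screen is bad."
print(sentiment_score(review))  # -1

# Drop "poor" from the negative list and the same review becomes neutral.
print(sentiment_score(review, negative=frozenset({"bad", "terrible", "hate"})))  # 0
```

Nothing about the review changed between the two calls. Only the human-chosen word list did.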

Whatever you do don’t cite this


Text analysis is something we’ve all seen at some point but might not recall. Remember those god-awful word clouds you saw in 2007? Those are just one form of text analysis. Stefan Sinclair and Geoffrey Rockwell explain the core ideas of text analysis and walk through a few examples in their paper “Text Analysis and Visualization: Making Meaning Count”. Their first example is one of my favorite Deadspin articles, which used text analysis to show discrepancies in the words used to describe white and black NFL players (the worst-kept secret in sports).

HaydenC insightfully points out that, in the figure provided in the paper, the rate at which the word “tough” appeared was about 1.5 times as high for white players as for black players, but the total usage was about 2 times as high for black players. Black players do make up roughly 70% of the NFL, but HaydenC raises a great question: why are so many more words written about black players in aggregate?
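To see how a rate and a raw total can point in opposite directions, here is a small Python sketch with made-up counts. These are hypothetical numbers, not the figures from the Deadspin study or the paper. The rate is the number of times the term appears divided by the total words written about a group, scaled to 10,000 words. The names `Corpus` and `rate_per_10k` are my own.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    """Counts for all the text written about one group of players."""
    name: str
    term_count: int   # times the term of interest (e.g. "tough") appears
    total_words: int  # total words written about the group


def rate_per_10k(c: Corpus) -> float:
    """Relative frequency: occurrences of the term per 10,000 words."""
    return c.term_count / c.total_words * 10_000


# Hypothetical counts chosen only to illustrate the idea (not real data).
white = Corpus("white players", term_count=30, total_words=100_000)
black = Corpus("black players", term_count=40, total_words=200_000)

for c in (white, black):
    print(f"{c.name}: raw={c.term_count}, per 10k words={rate_per_10k(c):.1f}")

print(f"rate ratio (white/black): {rate_per_10k(white) / rate_per_10k(black):.1f}")
print(f"volume ratio (black/white): {black.total_words / white.total_words:.1f}")
```

With these invented numbers, black players get more raw mentions of the term (40 vs. 30) and twice as many total words. Even so, the per-10,000-word rate is 1.5 times higher for white players (3.0 vs. 2.0). This is why normalizing by the amount of text matters when the groups are not evenly split.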

I must admit that I forgot the password to the class website, so I could not reference another response and had to respond to a comment instead. Remember, classmates: whatever you do, don’t cite this.

Text Analysis and Hidden Bias


Sinclair and Rockwell’s article “Text Analysis and Visualization” argues that we should not only look at the data presented in writing but also look deeper into the data within the text itself. Looking at how often particular words are mentioned can uncover some bias that may go unnoticed on a read-through. I like the point CM-A made in their blog post: although text analysis can speed up how we consume a particular text, it can also highlight facts or data that had not been noticed before.

Text visualizations, such as Wordle, have always been a fun activity, but I have never thought of them as an analysis technique. In the example of advertisements aimed at boys versus girls, the visualizations are jarringly different, even though the underlying word frequencies would not appear to be extremely different.

The Future of The Web and Text Analysis


The study “Sentiment Analysis and Subjectivity” discusses the impact the Web has had on individuals’ opinions. With the immense popularity of the Web, people have access to information, beliefs, and ideas that were previously inaccessible. Arguably, before the Web, opinions were shaped heavily by where a person was born, by their parents, and by developments in their own lives. Now that the Web has become a channel for news, information, and ideas, people’s opinions are shaped less by their immediate surroundings. “The Web has dramatically changed the way that people express their views and opinions. They can now post reviews of products at merchant sites and express their views on almost anything in Internet forums, discussion groups, and blogs, which are collectively called the user-generated content,” the study states. The Web’s power to influence people’s opinions on a global scale may result in a more uniform or a more divided planet.

The second study, “Text Analysis and Visualization,” discusses the superhuman speed of text analysis. Computers can read, analyze, and find trends in thousands of texts far faster than a human could. This is a massive breakthrough for identifying social and political trends in texts such as books, articles, essays, and even speeches. Identifying trends below the surface can also help us understand social change over time: syntax, use of language, and tone are all attributes that can be identified and added to data sets. HR reflected on the use of data mining: “I think it’s so interesting how that information is dealt with using these programs in order to extract the main topics out of texts with relative ease and maximum efficiency.” The ability of computers to recognize themes in a text as effectively as humans, and far faster, gives us the opportunity to learn more about the past, analyze the present, and forecast the future in ways humans have never been able to before.

The Impact of Words


In Sinclair and Rockwell’s “Text Analysis and Visualization: Making Meaning Count,” the authors highlight the importance and efficiency of text analysis. Throughout the article, Sinclair and Rockwell give examples of how these programs can efficiently extract information from existing digital texts. Specifically, they describe how visualizations help draw out important data: “Visualizations are transformations of text that tend to reduce the amount of information presented, but in service of drawing attention to some significant aspect.” Although this information is extremely valuable, it only becomes significant through the individual examining it. Text analysis is important, but the data it produces is only useful once a person interprets it. In Sinclair and Rockwell’s interactive NFL example, only the individual can identify the significant information that reveals the racial discrepancies in how players are described. A classmate of mine succinctly poses the question, “How is a computer to know what items are of importance and know better than a human would?” Questions like this are essential for the development of semantic text analysis. I think creating a program that can identify this significant information on its own is next to impossible. Even so, extracting analysis from extremely long and complicated texts is still very important.

Text Analysis and Visualization


The article stresses the power of text, which remains central even as other forms of visual content take off. Huge amounts of video are uploaded to YouTube and other platforms every minute, yet most of it is still categorized by text. That text is a major feature we can use to visualize how often these terms appear and how they relate to one another. “It is estimated that every day some 200 billion emails are sent and some 5 billion Google search queries are performed – and they are nearly all text-based.” Can we take this information and turn it into a digital image, a way to represent the data visually?

In trying to understand these text files, we follow a process of turning “bits and bytes” into a structured “format and markup,” which should allow us to visualize the data and draw conclusions from it. In a simple case of text visualization, a word cloud can be used, and many software tools make this easy. A word cloud does not offer a mathematical way to determine the relationships between terms. It is, however, a great way to quickly visualize the data. Finally, we can gain new insight from the distribution and frequency of words, as in the case of Alice in Wonderland.
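Counting word frequencies is the step underneath a word cloud. Below is a minimal Python sketch built on three assumptions: a tiny hand-made stop-word list, a simple linear mapping from a word’s count to its font size, and an invented sample sentence (not a quotation from Alice in Wonderland). Real word-cloud tools use their own, more elaborate, scaling and layout. The helper names `word_frequencies` and `font_sizes` are hypothetical.

```python
import re
from collections import Counter

# A tiny, hypothetical stop-word list; real tools ship much longer ones.
STOP_WORDS = {"the", "and", "was", "a", "of", "to", "in"}


def word_frequencies(text: str) -> Counter:
    """Count lowercase word tokens, skipping stop words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def font_sizes(freqs: Counter, min_pt: int = 10, max_pt: int = 48) -> dict[str, int]:
    """Linearly scale each word's count to a font size (most frequent = max_pt)."""
    top = max(freqs.values(), default=1)
    return {w: round(min_pt + (n / top) * (max_pt - min_pt)) for w, n in freqs.items()}


sample = "Alice followed the rabbit. The rabbit was late, and Alice was curious."
freqs = word_frequencies(sample)
print(freqs.most_common(2))  # [('alice', 2), ('rabbit', 2)]
print(font_sizes(freqs))
```

Words that appear twice get the largest size (48 pt), and words that appear once land halfway (29 pt). This captures the “frequency becomes visual weight” idea without saying anything mathematical about how the words relate to each other.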

Reading Response #4: Words That Have Made History, Or Modeling The Dynamics Of Linguistic Changes


In his paper “Words That Have Made History, Or Modeling The Dynamics Of Linguistic Changes,” Maciej Eder discusses the history of the quantitative modeling of words, along with the new technologies and the advancements they have made possible. As my classmate Ben Lyons wrote in his journal entry, the data collected through text mining can, if done correctly, be very effectively manipulated by whoever is running the study. So people using this methodology in their studies need to be careful to capture the full picture, rather than only what they are looking for.
