Response Week 7


I found the article about text mining and language standardization very interesting, as using data to “read” texts for their important points can create a whole new level of efficiency for projects where only basic information is needed. A computer’s ability to read through foreign text and mine out the important language could be of great use in professions that involve large amounts of reading. I found the reasoning behind the model structure interesting, as it dates back to the earlier days of computing: “The idea of using a computer to automatically identify “topics” is in large part a product of the desire to exploit the increasingly large amount of text that was being distributed electronically in the late twentieth century.” Back in the late 20th century, the amount of electronic text was growing rapidly as the technology became mainstream. I think it’s so interesting how these programs deal with that information, extracting the main topics from texts with relative ease and efficiency.

Text Mining/Language Standardization


Jeffrey M. Binder’s article on text mining, language standardization, and their application to the humanities brings several questions into discussion. It is nothing short of fascinating how a computer can automatically identify specific topics based on criteria that a system searches for in any length of text. Since this idea of “topic modeling” has been applied to the humanities through programs such as MALLET, the effort to determine topics based on clusters of information has increased greatly since the late twentieth century. From some of the first computer systems developed at universities, which processed short boxes of text to output predictions in fields such as politics, to the latest programs embedded in our cellphones, which take essentially every word we type into account, the biggest challenge facing the rapidly expanding use of topic modeling seems to have stayed consistent: there is an inevitable bias toward standardized forms of language use. This issue is also apparent in other text-mining methods that depend on statistical analysis of words, because it is such a difficult one for a computer to overcome. For example, there are text analysis programs designed to guess the emotional state of a group of words, but no matter how elaborately a program is coded, a computer will never understand the emotional values and ever-changing expressions of human beings. Binder mentions metaphors, irony, and dark humor in text as some of the emotional obstacles that text-mining programs must overcome, but often struggle with greatly.
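To make the point about irony concrete, here is a minimal sketch of how a word-counting sentiment program might work. This is not any particular tool from the article; the word lists and the example sentences are made up for illustration, and real programs use much larger lexicons.

```python
import re

# Hypothetical toy lexicon (an assumption for this sketch, not a real tool's word list).
POSITIVE = {"great", "love", "wonderful", "happy"}
NEGATIVE = {"terrible", "hate", "awful", "sad"}


def sentiment_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)


# Invented example sentences: one sincere, one ironic.
sincere = "I love this wonderful book."
ironic = "Oh great, another rainy Monday. I just love getting soaked."

# Both sentences receive the same positive score, because the program
# only counts words and has no way to notice the ironic tone.
print(sentiment_score(sincere))
print(sentiment_score(ironic))
```

A human reader hears the complaint in the second sentence right away, while the word counter treats both sentences as equally cheerful.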

Alien Reading


Deciphering what a block of text means is hard enough for humans, so how could we program a machine to do it for us without it having similar flaws? The simple answer is we can’t. Humans are still better at figuring out what a string of words means because it isn’t something that involves computational thought. To understand a piece of writing, you have to know not just what each word means, but what the words mean together, and in a particular order. You also have to figure out the tone of the piece, because the same sentence could mean something totally different in another part of the document. It’s comforting to know that computers can’t seem to quite get this figured out, because there seems to be something uniquely human about the written word, and without it we would be just like the other animals on earth. I think it’s good that the programs that read big blocks of text are still improving, though, because I see the value in having a machine produce condensed summaries of huge legal documents or scientific articles, perhaps better than a human could. Based on the other responses to this article, especially “DCS Response 2 for 10.16.2018” by PE, it seems that most other people feel the same way. We are happy computers aren’t as good as us at this one task, but we understand that eventually they could surpass us.

DCS Response 2 for 10.16.2018


After reading “Alien Reading: Text-Mining, Language Standardization and the Humanities”, I was more aware of how hard it is to fully understand the written word, especially when you are a computer trying to read and synthesize the information in the text. Just as the article stated, computers “tend to privilege the informational over the aesthetic dimensions of language; and they primarily consist of prose”, which makes it extremely difficult to fully automate the understanding of a text. Technical and informational genres, for example, suit text-mining better, as they are easier to process than many types of writing at once. This topic was really interesting to me as an English major because my main line of work, currently, is to connect with writers and readers through written communication, whether that be my own writing or reading someone else’s. To have that connection be something that a computer can’t always fully automate is both wildly interesting and powerful, and honestly comforting. I am happy that the written word cannot be so easily understood. I am happy that it takes more than software to understand humans and their written thoughts.

Alien Interpretations


The state of computational linguistic interpretation seems to be in a notable limbo: it is carried forward by advances in fast computation and big data, but it can still be comically inept at talking the talk. With all the amazing things software is capable of these days, it still has trouble analyzing text the way we would.

In Binder’s piece, “Alien Reading: Text Mining, Language Standardization, and the Humanities,” the process of analyzing text is discussed, and the current place of text interpretation is contemplated. I agree with many of his points about the pitfalls of relying on this current format of analysis for humanist purposes. It is important to know the highly statistical means by which these interpretations are calculated, and the literary aspects that current measures neglect. However, I felt as though certain parts were overly dismissive of current processing techniques, because while we may not have a machine that can talk to you flawlessly or analyze the intricacies of Walt Whitman’s words, current methods are quite good for specific purposes. Search and translation engines can be highly effective these days. Binder did acknowledge this to some degree, and was more focused on humanist readings, though I don’t know of much software that even claims to do that well.

We certainly have a lot of exploration left in this field, but where we are right now is still worthwhile.

Data in the Humanities


The idea that data is used purely in math, science and the numbers of the world is quickly changing. Data seems to be a help to the humanities as well. Computers are able to analyze stories and words and turn them into data models to help us further understand the past and our future. In the article Alien Reading: Text Mining, Language Standardization, and the Humanities, the author explains how today computers have the capability to understand culture and language at the level humans can. He discusses the power of text-mining by saying: “Thinking of text-mining programs as objects of cultural criticism could open up an interchange between digital scholarship and the critical study of computers that is productive in both directions.” Using this, computers are able to understand biases and human tendencies to further think and act like humans. With that said, there are many complexities, such as comprehending language and cultural changes over time. The study Words That Have Made History, Or Modeling The Dynamics Of Linguistic Changes examines the complications of stylistic shifts over time. Syntax, punctuation, and tone are constantly changing. This may be the divider between computers’ and humans’ ability to be “culturally critical”. As our culture changes over time, are computers able to keep up, or do they have to be re-programmed? CN discussed the complexities of humans, which are very difficult to record through computer data systems. CN writes: “Theodossiou decided to correlate all of the test subjects emotions to the amount of money they get paid, but there also may be some external factors that affect productivity in the workplace.” To function effectively in the humanities, especially in the realm of psychology, computers need to account for the complexities of culture and language in order to analyze studies accurately.

Flawed Linguistic Changes


In his research study, Words That Have Made History, Or Modeling the Dynamics of Linguistic Changes, Maciej Eder explores how the pace of language change has varied over time. He does this through graphical trend lines that model the ways in which the researchers have measured language changing over time. Although this is fascinating, I do believe there is a fundamental error in collecting this type of data. Eder writes, “More important, however, is the fact that the scores are not even: the signal becomes stronger in some periods, clearly indicating an acceleration of the language change.” In order to identify these changes, the researchers rely on their own judgement when relating historical events to the acceleration of language change. Biased data is flawed and unreliable because it allows researchers and other experimenters to frame data and other elements of an experiment to their own liking. In addition, I believe that the researchers need to be more specific when testing language change over time. As NL states in their post, “this becomes very relevant because of the ambiguity of the word selection. The study could have chosen completely different words and may have formed a different conclusion.” The trend lines were based on the frequency of mostly “common function words”, which makes the data unclear.

Computers’ Difficulty with Humanness


Something I found very interesting in Jeffrey Binder’s article about text mining is that computers seem to have great difficulty when dealing with the human reality of language. Binder argues that when studying literary and cultural texts, text-mining software can pull out key words or sentences, but in analysis it focuses primarily on the literal meanings of words. This could lead to many misinterpretations, as language and slang change so frequently over time that evaluating texts on literal meaning could lead to false conclusions about the information presented. As a result, as Binder recognizes, text-mining is often better utilized as “statistical methods in applications like search engines, spellcheckers, autocomplete features, and computer vision systems”. This makes sense, because these applications don’t take into account the fluid nature of language; they search strictly based on spelling or literal meaning.

 

 

In response to “Money=Happiness?”, by ZC, I agree! This study presents a convincing case at first, but when you really break down the method of data collection, you start to realize that it isn’t so convincing. You also raised a point which I overlooked: the sample size is way too small. I agree, good point.

 

Words That Have Made History, Or Modeling The Dynamics Of Linguistic Changes


This week I was given the opportunity to read Words That Have Made History, Or Modeling The Dynamics Of Linguistic Changes by Maciej Eder. The reading discusses how the usage of different common words has changed throughout time. The field is called quantitative linguistics, and Google defines it as “the comparative study of the frequency and distribution of words and syntactic structures in different texts.” The article uses a version of trend searching in order to capture the frequency of different words used. Based on the data presented, it appears that use of language goes through cycles of rapid change followed by slower change. The article does not attempt to link these to specific events; however, I believe that would be an incredible extension to this project. In the post by EC called Simpson’s Paradox: Is the data telling the right story?, the author states, “They conclude that data has to be analyzed very carefully in order to make the correct conclusions.” I think this becomes very relevant because of the ambiguity of the word selection. The study could have chosen completely different words and may have formed a different conclusion.
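As a rough illustration of what tracking word frequency across time can look like, here is a small sketch. It is not the method used in the article, which is considerably more involved; the function name, the two tiny “period” texts, and the chosen target words are all invented for this example.

```python
from collections import Counter
import re


def relative_frequencies(text, target_words):
    """Return each target word's share of all word tokens in the text."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    return {w: (counts[w] / total if total else 0.0) for w in target_words}


# Hypothetical mini-samples standing in for texts from two periods.
samples = {
    "early": "thou art the one and the only",
    "late": "you are the one and only",
}

# The choice of target words is itself an assumption; different words
# could produce a different picture.
targets = ["the", "thou", "you"]

for period, text in samples.items():
    print(period, relative_frequencies(text, targets))
```

Changing the `targets` list changes which shifts show up, which is the word selection concern raised above.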

 

On the subject of quantitative linguistics, I have some experience using a really cool tool from Google that is helping modern scientists solve problems using data. The tool is called Google Trends, and it takes a similar approach to this article by tracking how often terms appear over time.

One really great use case is catching flu outbreaks by counting how often people search for flu symptoms online. This allows hospitals, schools and businesses to be prepared and take precautions.

https://www.lifewire.com/google-flu-trends-1616299

 

Another interesting trend that you can follow is the correlation between search frequency for Bitcoin and its price. Below I included two figures that seem shockingly similar when you compare them. The first is the search results for “Bitcoin” and the second is Bitcoin’s price.

I think using something like Google Trends for a project in the future would be a great way to incorporate unconventional, linguistically based data into an analysis.
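If I wanted to go beyond eyeballing two charts, one option would be to compute a correlation coefficient between the two series. Here is a minimal sketch using Pearson’s r. The numbers below are placeholders I made up to show the mechanics, not real Google Trends or price data, and a high correlation would not by itself show that one causes the other.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values only (invented for illustration, not real data).
search_interest = [10, 20, 40, 80, 60]
price = [100, 180, 420, 790, 610]

print(round(pearson(search_interest, price), 3))
```

A value close to 1 would mean the two series rise and fall together, and a value close to 0 would mean they do not move together in a linear way.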

Sources

Eder, Maciej. “Words that Have Made History, or Modeling the Dynamics of Linguistic Changes.” DH. 2018.

EC. “Simpson’s Paradox: Is the data telling the right story?” 2018.

https://trends.google.com/trends/explore?q=%2Fm%2F05p0rrx

https://www.coindesk.com/price/

https://www.lifewire.com/google-flu-trends-1616299

 

7.1 Reading


In the article, “Words that Have Made History, or Modeling the Dynamics of Linguistic Changes,” the author discusses how language has changed over time. Regarding this, one of my classmates wrote that they “feel like taking random words to see changes in certain words or certain parts of the language may not be very effective.” I agree with this statement because the article didn’t do a very good job of making clear exactly what steps were taken to determine how the language has changed. Like many of the other articles we have read, it doesn’t use language that makes it easy for readers to grasp exactly what the plan is. If the reader is unable to fully understand, then it is hard to believe what is being written about or to find it effective. Along with this, the article itself says that there are many potential problems with how this data could be collected. For example, it says, “These methods, however, share a common drawback, namely their results are by no means stable. Also, no cross-validation can be considered a downside.” Later it reads, “any attempts at finding direct correlations between historical events and stylistic breaks are subject to human prejudices, and therefore might introduce substantial bias to the results.” While there is always some sort of error, the lack of a proper explanation doesn’t allow the reader to understand how the person conducting the experiment accounted for it.