{"id":427,"date":"2018-10-13T20:53:50","date_gmt":"2018-10-13T20:53:50","guid":{"rendered":"https:\/\/rfeldman-blog.rfeldman.catapult.bates.edu\/?p=8"},"modified":"2018-10-13T20:53:50","modified_gmt":"2018-10-13T20:53:50","slug":"text-mining-and-language-standardization","status":"publish","type":"post","link":"https:\/\/courses.shroutdocs.org\/dcs104-fall2018\/2018\/10\/13\/text-mining-and-language-standardization\/","title":{"rendered":"Text Mining and Language Standardization"},"content":{"rendered":"

Jeffrey M. Binder’s ‘Alien Reading’ introduces us to the controversial and uncharted world of text mining and language standardization. In an age when the volume of written text is exploding, the ability to quickly break down, categorize, and localize snippets of text is extremely compelling to researchers and linguists. However, the difficulty of this task lies in the fluidity of language itself. Converting language into data suitable for statistical analysis is inherently problematic. Language is dynamic and constantly changing, and a word or phrase that means one thing to one person may mean something completely different to another. Any method of standardization is therefore controversial. The problem surfaces across models as “overfitting,” in which a model fits the particular texts it was trained on so closely that it generalizes poorly to new ones. Text-mining and standardization tools need to strike a balance: fast and conclusive enough to be useful, yet attentive to the ever-shifting nature of language.<\/p>\n

Text mining also struggles with context. When a model relies only on words, their spellings, and their dictionary definitions, it has trouble determining which meaning a word actually carries in a given passage. This phenomenon surfaces in Matthew Jockers’s book <i>Macroanalysis<\/i>, where we see a “particular use of <i>stream<\/i> [that] is not related to the ‘jet stream’ or to the ‘stream of immigrants’ entering the United States in the 1850s.” Rather, this stream refers to running water.<\/span><\/p>\n

Between overfitting and misread context, these text-mining algorithms face serious obstacles. If they continue along this path without careful consideration and critical analysis by the humans on the other side, these algorithms could be responsible for a great deal of confirmation bias down the line. It is easy to imagine an algorithm that sacrifices nuance for efficiency and, in doing so, leads to a serious misuse of information.<\/p>\n","protected":false},"excerpt":{"rendered":"

Jeffrey M. Binder’s ‘Alien Reading’ introduces us to the controversial and uncharted world of text mining and language standardization. In an age when the volume of written text is exploding, the ability to quickly break down, categorize, and localize snippets of text is extremely compelling to researchers and linguists. However, … <\/p>\n