Limitations of Topic Modeling

Over the course of the last week or so, our class has researched, read about, and been taught the principal ideas of topic modeling and text analysis. I have learned that these two methods of extracting data are extremely efficient and can lead to the discovery of valuable information that would not have been found without analyzing the compiled text. For instance, we read about the subtle differences in racial treatment revealed by the descriptions of white and black NFL athletes. I found NB’s example of Colin Kaepernick to be extremely relevant when reading about this: “This doesn’t come as a huge surprise considering the recent events in the NFL involving quarterback Colin Kaepernick. Kaepernick, with both the IQ and athleticism to be an elite quarterback in the league, still has yet to be signed because he publicly expressed his disgust in the current state of the league.” While the information obtained through this specific example of text analysis is important, Sharon Block’s “What, Where, When, and Sometimes Why: Data Mining Two Decades of Women’s History Abstracts” raised some essential questions for me. The only reservation I have about her topic modeling research is that its scale may have been too large. If a researcher concludes that the number of times certain words appear in a body of text demonstrates, in this case, equality of information between genders, I do not think word count is reasonable evidence for that claim. I am not sure how one would go about collecting more substantial data for this, but I do not think counting specific words can prove that there is inequality among sources. Inequality in publishing history comes down to a more fundamental issue: whether work written about gender or women’s history gets published at all.
The data in this case would not be found in word counts, but instead in the number of abstracts that were presented and published and the number that were presented and not published. Topic modeling cannot efficiently answer this question.
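To make the distinction concrete, here is a minimal sketch in Python. The records are made up purely for illustration (they are not Block’s data), and the field names `text` and `published` and both function names are my own assumptions. It only shows that a word count and a published/not-published tally are two different measurements of the same set of abstracts.

```python
from collections import Counter

# Hypothetical toy records for illustration only; NOT data from Block's study.
abstracts = [
    {"text": "women labor history in the mills", "published": True},
    {"text": "gender and the frontier family", "published": False},
    {"text": "women suffrage and political history", "published": True},
    {"text": "naval trade routes", "published": True},
]


def word_counts(records):
    """Count how often each word appears across all abstract texts."""
    counts = Counter()
    for record in records:
        counts.update(record["text"].lower().split())
    return counts


def publication_tally(records):
    """Count how many abstracts were published and how many were not."""
    published = sum(1 for record in records if record["published"])
    return {"published": published, "not_published": len(records) - published}


counts = word_counts(abstracts)
tally = publication_tally(abstracts)
print(counts["women"], tally)
```

The first function is the kind of counting that word-based text analysis relies on; the second needs a piece of information (whether each abstract was published) that is not in the words themselves, which is why I think it would have to be collected separately.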