BM – Data Cultures

Posted on December 6, 2018 by BM

Questioning Scholarship: The Digital Humanities

Writers Stephen Ramsay and Geoffrey Rockwell collaborate on an article for DH Debates discussing the criticisms directed at digital work and its scholarly credit. It can be difficult to define clearly what scholarly work is in the humanities, especially since the definition varies with the context of the work. For example, digital work in the humanities is performed in professional as well as academic environments. Compared with other types of publishing, the online platform has met a greater amount of resistance and criticism. The reasons for this are still being explored, but they include the extreme ease with which the average person with an internet-capable device can search for an article on any matter. I think it is interesting that people in professions that undoubtedly entail scholarly work, such as text editors, literary critics, librarians, historians, and archeologists, often use complicated computing software that was first developed professionally in the digital humanities, yet the tools they use are looked down upon as unscholarly. I agree with Ramsay and Rockwell, however, that questions such as this one will be given more attention as time goes on, especially as new media and digital work continue to develop and take prominence across every field of professional scholarly work.

Posted on December 5, 2018 by BM

Natural Language Processing

Historical text sources are often filled with metadata about where a subject or object was made and stored. There is a plethora of geographical information throughout almost any textual document, no matter how old. Having this information embedded in textual data sounds great, because you could tag any document with the place data it contains. The challenge, however, is that this sort of geospatial information is difficult to extract on a large scale. For example, sorting through hundreds or thousands of documents to extract the location stated in each one has been a hard obstacle in the realm of digital work. An article from DH2018 in Mexico City mentions how a piece of text might say Paris. The question is whether the writer intended Paris, France or Paris, Texas, USA. With a close reading of the surrounding document you could figure out that the writer most likely meant the capital of France, but it just might be Texas. This sort of close reading is simply impossible when we start to discuss hundreds or thousands of documents, and for that reason digital workers use computational methods that identify and geolocate place-based data. Natural language processing (NLP) techniques such as Named Entity Recognition (NER) are used to find and label geospatial entities, such as countries, states, and cities, at scale.
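To make the Paris problem concrete, here is a minimal sketch of how a program might guess which Paris a document means by looking at the words around it. It is a toy illustration, not the method from the DH2018 article: the `GAZETTEER` entries, the clue words, and the `disambiguate` function are all made up for this example, and a real project would rely on a full gazetteer and a trained NER model.

```python
import re

# Hypothetical mini-gazetteer: each place name maps to candidate locations,
# each with a few context clue words. These entries are illustrative only.
GAZETTEER = {
    "Paris": [
        {"label": "Paris, France", "clues": {"france", "seine", "french", "louvre"}},
        {"label": "Paris, Texas, USA", "clues": {"texas", "lamar", "county", "dallas"}},
    ],
}


def disambiguate(text, gazetteer=GAZETTEER):
    """Return {place name: best-guess label} for gazetteer names found in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    results = {}
    for name, candidates in gazetteer.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            # Score each candidate by how many of its clue words appear in the text.
            scored = [(len(c["clues"] & words), c["label"]) for c in candidates]
            best_score, best_label = max(scored, key=lambda pair: pair[0])
            # With no clue words at all, flag the name instead of guessing.
            results[name] = best_label if best_score > 0 else "ambiguous"
    return results


print(disambiguate("The cotton was shipped from Paris in Lamar County, Texas."))
print(disambiguate("He left for Paris."))
```

The second call shows the limit of the approach: without any context words the program can only report that the name is ambiguous, which is exactly where a human close reader would have to step in.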

Posted on December 5, 2018 by BM

The Challenges Surrounding Data Visualization

Johanna Drucker writes about graphical approaches to the digital humanities and the challenges they involve. The digital humanities often use data visualization methods and user interfaces that fundamentally contradict conventions of the humanities. The challenges surrounding graphical methods in digital humanities work stem from the fact that these methods did not originate in the humanities, or in the digital realm at all. Given the relative youth of the digital realm, all of these methods are being applied secondhand to digital work, which raises some complications. When attempting to apply a graphical method to any digital work, it is necessary to first examine the method from a humanistic perspective in order to adapt it. While using graphical methods such as spreadsheets and other grid forms, bar charts, bubble charts, and network diagrams, it is important to understand that the tools we use in our data analysis were developed long before they reached the digital humanities. We need to assess our use of data visualization tools in light of the assumptions built into the earliest forms of these methods, which did not take shape digitally.

Posted on November 30, 2018 by BM

Reconstructing Large Social Networks

This Digital Humanities Quarterly article discusses probabilistic text mining as a way to learn about social interactions in Britain in the early modern period, around the 16th and 17th centuries. Researchers developed a model (or matrix) that creates a map of relationships between individuals who are referred to in scholarly texts. This is interesting because it lies at the intersection of computer science and the humanities. In the fifth section, “Humanities Significance,” it is noted that although humanists care about documentary evidence of connections between people, they would need to do an unbelievable amount of research to develop by hand a map like the one this model produces. Thus, while the model is not perfect, it can definitely be used to get a sense of social networks. And moving forward, the opportunities are profound. Models like this can be used to study social networks in other places: all that is needed “is machine-readable text in which the co-occurrence of names is a reasonable indicator of connections between persons.” It is truly amazing how groups can identify source texts that can be used as evidence of historical relationships and then serve as the material for network analysis. We see in the reading how marriage certificates, archival letters, and any other type of document that historians could use to link individuals together are used. For our work in DCS 104, the relationships among mill workers, their families, and their community can be explored by looking at address data.
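The idea that names appearing together can stand in for a connection between people can be sketched in a few lines. This is only a simple counting example and not the probabilistic model the article describes. The names, the sample sentences, and the `cooccurrence_edges` function are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(documents, names):
    """Count how often each pair of names appears in the same document."""
    edges = Counter()
    for doc in documents:
        # Sorting keeps each pair in a consistent (alphabetical) order.
        present = sorted(n for n in names if n in doc)
        for pair in combinations(present, 2):
            edges[pair] += 1
    return edges


# Invented sample documents; real input would be machine-readable source texts.
docs = [
    "Anne Hale wrote to Thomas Reed about the harvest.",
    "Thomas Reed dined with John Pike.",
    "Anne Hale and Thomas Reed met again at court.",
]
names = ["Anne Hale", "Thomas Reed", "John Pike"]

edges = cooccurrence_edges(docs, names)
for (person_a, person_b), weight in edges.most_common():
    print(person_a, "<->", person_b, "weight", weight)
```

Each pair with a count becomes a weighted edge in the network. A raw count like this treats every shared mention as equally meaningful, which is one reason the researchers turn to a statistical model instead.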

Posted on November 30, 2018 by BM

Sinclair and Rockwell – Text Analysis and Visualization

Despite the modern world becoming more and more digital, text is still paramount in communicating in every facet of life. At the same time, visualizations are increasingly taking the place of text because they can be a more efficient and appealing way of displaying information. We can see how tools like Wordle and publications like the New York Times use word clouds because they show at a glance the difference in words used when talking about one group versus another, for example boys and girls. Interactive visualizations are also becoming increasingly popular: viewers can not only see the visual but also manipulate it to get more specific information on a particular topic, which gives them an opportunity to learn more than a static image would allow. Digital texts contain specific units of information that can be altered and shifted as desired through designed algorithms. There are also many different kinds of text files, some more complex than others. The continually expanding software and ideas in the realm of text analysis are always opening new doors for different applications, and the possibilities never seem limited.
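Behind a word cloud is a very simple computation: split the text into words and count them. The sketch below shows one way this could work when comparing two texts, as in the boys and girls example. The function names `word_counts` and `distinctive_words` are my own, and this is not how Wordle or the New York Times necessarily do it.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text, split it into words, and count each word."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def distinctive_words(text_a, text_b, top=3):
    """Words used more often in text_a than in text_b, ranked by the count gap."""
    a, b = word_counts(text_a), word_counts(text_b)
    gaps = {w: a[w] - b[w] for w in a if a[w] > b[w]}
    # Largest gap first, then alphabetical so that ties come out in a stable order.
    return sorted(gaps, key=lambda w: (-gaps[w], w))[:top]


print(distinctive_words("red red red blue green", "blue blue green"))
```

A word cloud would then draw each word at a size proportional to its count, so the words returned here are the ones that would stand out in one cloud and not the other.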

Posted on November 1, 2018 by BM

Klein: James Hemings in Jefferson’s Letters

Lauren F. Klein’s article discusses issues surrounding the social network analysis of Thomas Jefferson’s letters as president. Klein raises interesting points, such as the power relations evident in the relationships among various figures in Jefferson’s letters. The most prominent subject in this article is Jefferson’s chef, James Hemings. In several letters written by Jefferson, Hemings is continually referred to indirectly because of his status as an enslaved African American. Although Jefferson and Hemings shared a uniquely close relationship for a master and a slave, on paper Jefferson still referred to him and wrote about him as he would any other slave. Jefferson used phrases such as “servant James” to write to or about Hemings, despite Hemings’ ability to speak and write in two different languages. It’s intriguing that a search for “James Hemings” in the archives of Jefferson’s letters shows no results. This is because even after Hemings was emancipated, Jefferson referred to him in writing as “former servant James”. Jefferson would even ask to “receive him” or ask for someone to “send for” him. The language that Klein surfaces by text mining the letters shows Jefferson holding a consistent position of power over Hemings, regardless of his emancipation and true loyalty as a servant.

Posted on October 16, 2018 by BM

Text Mining/Language Standardization

Jeffrey M. Binder’s article on text mining, language standardization, and their application to the humanities brings several questions into discussion. It is nothing short of fascinating how a computer can automatically identify specific topics based on criteria that a system searches for in any length of text. Since this idea of “topic modeling” was applied to the humanities through programs such as MALLET, the effort to determine topics based on clusters of information has increased greatly. From some of the first computer systems developed at universities in the late twentieth century, which processed short boxes of text to output predictions in fields such as politics, to the latest programs embedded in our cellphones, which take essentially every word we input into account, the biggest challenge facing the rapidly expanding use of topic modeling has stayed consistent. There is an inevitable bias toward standardized forms of language use. This issue is also apparent in other text-mining methods that depend on statistical analysis of words, because it is such a difficult one for a computer to overcome. For example, there are text analysis programs designed to guess the emotional state behind a group of words, but the problem is that no matter how elaborately a program is coded, a computer will never fully understand the emotional values and ever-changing expressions of human beings. Binder mentions metaphors, irony, and dark humor in text as some of the hardest human obstacles that text-mining programs must overcome and often struggle with greatly.
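A small sketch shows why irony is such a problem for programs that guess emotion from words. This is a toy word-list approach, assuming a tiny made-up `LEXICON`; it is not taken from Binder’s article or from any particular software, and real tools use far larger word lists.

```python
import re

# Hypothetical toy lexicon mapping words to a positive or negative score.
LEXICON = {"great": 1, "love": 1, "wonderful": 1,
           "terrible": -1, "hate": -1, "awful": -1}


def sentiment_score(text, lexicon=LEXICON):
    """Add up the scores of known words; a positive total reads as 'happy'."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


print(sentiment_score("I hate this terrible weather."))
print(sentiment_score("Oh great, my phone died again. I just love Mondays."))
```

The first sentence gets a negative score, as expected. The second is an ironic complaint, but because it contains “great” and “love” the program scores it as positive: it counts words and has no sense of the tone a human reader hears immediately.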

Posted on October 2, 2018 by BM

Recidivism Data

Trends in data on reoffending among convicted criminals, collected by the National Corrections Reporting Program, have been analyzed by looking at data sets of prison admissions and releases. The federal Bureau of Justice Statistics (BJS) collects data submitted by state corrections and parole departments. I find it interesting that there are only small differences between Pew’s and the BJS’s return-to-prison data in 2005. I would have expected more apparent differences given their different methods of data collection. For example, the BJS includes data on 70k out of 400k rearrested prisoners, strategically selected based on its access to fingerprints that link records to prisoners. The BJS could also remove from its data sets prisoners who had died, which is necessary because records of dead prisoners could skew the whole data set. Pew could not access this sort of information, at least not to the degree that the BJS could, which makes me question how the BJS’s and Pew’s numbers show only small differences.