TB – Data Cultures

Posted on November 29, 2018 by TB

Considering FAIRness

Should it be expected that all data objects have all their metadata explicitly and thoroughly stated? Or even recommended? I found myself asking these questions reading through “The FAIR Data Principles,” which outlines an ideal of data availability.

I agree with much of the idea of the article. It seems like generally a good idea to ensure that information published on the internet could be used by others. The ideas for making the information accessible all seem sound. The one thing that I think could have used some more discussion was the idea of metadata subjectivity. Some metadata, such as author and character count, is objective, but other metadata could be less so. For instance, the tags produced by an automatic image tagger based on image processing (such as on Facebook: e.g. “happy people, dog running”) could vary based on how the image is interpreted.

It seems the advantage of making data more apparent is ease of use. However, the caveat is that metadata that is presented as a buffer to actually accessing the data could present problems if used haphazardly. As interpretations of data can differ, should we not seek to evaluate the primary source? Again, this doesn’t mean that we shouldn’t make data easily accessible; as RF stated, accessible data leads to fewer people being “left out of the data creation process thus creating implicit bias within our data,” which is a positive. However, we should be careful about the metadata that is presented as objective.

Posted on November 27, 2018 by TB

Developing

To be frank, the piece “Developing Things: Notes toward an Epistemology of Building in the Digital Humanities” by Rockwell and Ramsay did not do much for me. I felt that the article was vague and not especially insightful. While there were undoubtedly points that were beyond me, I feel that if somebody has a good point to make, it will generally be clear to the reader.

Was one point of this to decide if digital tools can be theories? They already stated that in the humanities sense “theory” is an adapted term from the classic scientific sense, so are the semantics really that important? I guess it just seems that whatever point they are arguing has little effect on what the results of any research would be.

I agree with DR that much of this “went right over my head”.

Posted on November 13, 2018 by TB

Geo Location via Text

A lot goes on behind geo-tagging historical texts. A piece by Tilton, Arnold, and Rivard titled “Locating Place Names at Scale: Using Natural Language Processing to Identify Geographical Information in Text” explains the process fairly thoroughly and in simple terms. From identifying the location fragment in the source, to trying to find other references, to finding general geographic closeness, the steps the algorithm takes are laid out in an easy-to-comprehend way. Though very concise, the writing gives us a good understanding of how the goal is accomplished.

There are some lingering questions, however. What if the location found is not what it seems to be? Does the algorithm discard the term if there is too much ambiguity? Depending on the use of the results, this could have some implications, where incorrect locations are tagged and conclusions may be skewed.

However, this article is for the most part very informative and, as SA says, “an eye-opener” when considering what goes on behind the scenes of this sort of interpretation of data.

Posted on November 1, 2018 by TB

Finding Paul

Catching a colonial criminal sometimes requires some analytical thinking. In Healy’s article “Using Metadata to find Paul Revere,” we see how some simple matrix arithmetic can be applied to existing data to quickly and effectively reveal relations within it. By multiplying the transpose of an individual-by-group membership matrix by the original matrix, you get a matrix showing the number of connections (shared members) between each pair of groups.
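To make the arithmetic concrete, here is a minimal sketch of that multiplication. The people, groups, and memberships are made up for illustration and are not taken from Healy’s data; the helper functions are my own and stand in for whatever matrix tools one would normally use.

```python
# Hypothetical membership data (not from the article).
# Rows are people, columns are groups; 1 means "this person belongs to this group".
people = ["Person A", "Person B", "Person C"]
groups = ["Group X", "Group Y"]
membership = [
    [1, 1],  # Person A is in Group X and Group Y
    [1, 0],  # Person B is in Group X only
    [1, 1],  # Person C is in Group X and Group Y
]


def transpose(matrix):
    """Swap rows and columns."""
    return [list(column) for column in zip(*matrix)]


def matmul(a, b):
    """Plain matrix product of two lists of lists."""
    return [
        [sum(x * y for x, y in zip(row, column)) for column in zip(*b)]
        for row in a
    ]


# Transpose times original: a group-by-group matrix.
# Each off-diagonal entry counts the members two groups share;
# each diagonal entry is the size of that group.
group_by_group = matmul(transpose(membership), membership)

# Reversing the order gives a person-by-person matrix:
# each off-diagonal entry counts the groups two people share.
person_by_person = matmul(membership, transpose(membership))

print(group_by_group)    # [[3, 2], [2, 2]]
print(person_by_person)  # [[2, 1, 2], [1, 1, 1], [2, 1, 2]]
```

In this toy data, Group X and Group Y share two members, and Person A and Person C share two groups, which is the kind of link the article uses to single someone out.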

This article was witty and folksy, but it also struck a good balance between explaining the data concisely and making it clear. I thought the author explained the material well, and I thought the material was really interesting. The applied math really showed how connections can be made in large sets of data. Overall, I felt the article was very well done.

Posted on October 30, 2018 by TB

Humanizing Ghosts Through Text Analysis

The web article “A Report Has Come Here” by Klein (which was adapted from a presentation) is an enlightening glimpse at the positive uses that the digital humanities can have. The piece talks about using computation to find connections in the documents of Thomas Jefferson in order to learn about a man named James Hemings. Though it is pointed out that Hemings, who was a slave of Jefferson, isn’t mentioned by his full name once in the documents, by carefully building algorithms to tease out references to him in the vast array of documents, Klein was able to find parts of his life that weren’t clear before.

I thought the usage of secondary social connections to better understand the way that Hemings fit into Jefferson’s life was novel, and very useful. I feel that this piece informatively shows the benefits that can be derived from the use of technology in the humanities.

Posted on October 25, 2018 by TB

To Model a Topic

In the piece “The Digital Humanities Contribution to Topic Modeling” by Meeks and Weingart, topic modeling is discussed. More specifically, the authors look at the place of topic modeling within the digital humanities. The post also acts to pull together some resources from others about topic modeling. In fact, much of the information references links that are external to the piece, so without reading those links it is hard to understand the point being made. The piece also mentions two tools for topic modeling, MALLET and Paper Machines. However, the article offers little explanation of what makes them popular or how they function. Overall, the post seems to talk about the field of topic modeling very generally, and to rely on rather vague opinions of others (“modeling points the way to a computing that is of as well as in the humanities”). I think in order to get the tangible, important ideas of this article, it is necessary to read the external links.

Posted on October 16, 2018 by TB

Alien Interpretations

The state of computational linguistic interpretation seems to be in a notable limbo: it is carried along by the advancing technologies of fast computation and big data, but it can still be comically inept at talking the talk. With all the amazing things software is capable of these days, it still has trouble effectively analyzing text the way we would.

In Binder’s piece, “Alien Reading: Text Mining, Language Standardization, and the Humanities,” the process of analyzing text is discussed, and the current place of text interpretation is contemplated. I agree with many of his points about the pitfalls of relying on this current format of analysis for humanist purposes. It is important to know the highly statistical means by which the interpretation is calculated, and the literary aspects that current measures neglect. However, I felt as though certain parts were overly dismissive of current processing techniques, because while we may not have a machine that can talk to you flawlessly or analyze the intricacies of Walt Whitman’s words, for specific purposes current methods are quite good. Search and translation engines can be highly effective these days. Binder did acknowledge this to some degree, and was more focused on humanist readings, though I don’t know of much software that even claims to do that well.

We certainly have a lot of exploration left in this field, but where we are right now is still worthwhile.

Posted on October 11, 2018 by TB

On Psychological Well-Being and Unemployment

Those currently unemployed are more likely to report a lower sense of well-being, so unemployment is likely not voluntary, concludes a study titled “The effects of low-pay and unemployment on psychological well-being: a logistic regression approach” by Theodossiou. Through self-reported data and a wide array of dummy variables, the study finds correlations between age/gender and the mental toll of being unemployed.

While I found many of these reported findings believable and perhaps even intuitive, I still find myself uncertain of how much this data is really telling us. First off, self-reported data always raises some question of validity. In what fashion and context is this being obtained? We see the literal text of the question they were asked, but the answer to the question “Have you recently been feeling reasonably happy, all things considered?” could change by the hour for any given person. Also, who is being asked this? Just anyone 16 to 91 years old? How is this sample found? Are those who are more likely to respond to this sort of study more likely to answer in a certain way? We cannot say from this piece.

Also, the conclusion that if people don’t like doing something, and they do that thing, then it must be involuntary, seems unconvincing. That is not to say I believe this conclusion is necessarily incorrect, but human emotions and reasoning are not perfect. It may seem fair to assume that people are unemployed involuntarily, but using the fact that unemployed people are unhappy to show that it is against their will doesn’t feel like it tells us much. Would it be expected that people who are unemployed (which means they are actively searching for a job) would be way happier? I don’t think anyone would expect that. I believe this study could have told us more with some more data.