Social Media Makes Politics



In reading “What you can, can’t and shouldn’t do with social media data,” I found it particularly interesting how data scientists are uncovering the relationship between social media and political affiliation. While I believe that the abundance and availability of social media data will provide researchers with valuable insights into these questions, I also believe that the sheer magnitude of data will largely confirm phenomena we already know. In other words, social media creates more data points along a curve we have already plotted. In terms of political affiliation, researchers were already well informed about the geographic and social frameworks that surround each party.

What I find much more interesting is not the relationship between social media and politics but how social media creates politics. The social media platforms that have popped up are major rivers for the flow of information, ideas, and political rhetoric. On these platforms we see the emergence of niche political groups and people rallying around certain ideologies. Additionally, social media has created a completely new form of political debate online.

Finally, I think that social media has reinforced confirmation biases within politics. Coming full circle, one way in which social media data has confirmed our understanding of political parties involves confirmation bias and friend groups. A map of Facebook friends and their corresponding political ideologies shows little overlap across party lines. This confirms our understanding of politics and further reinforces a political dichotomy.
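The “little overlap” in such a friendship map can be made concrete. Below is a minimal sketch, using invented names, party labels, and friendships (none of this is the actual Facebook data), of how one might measure what fraction of ties cross party lines:

```python
from collections import Counter

# Invented toy data: each person has a party label, and friendships
# are undirected edges between people.
party = {
    "ana": "D", "ben": "D", "cal": "D",
    "dee": "R", "eli": "R", "fay": "R",
}
friendships = [
    ("ana", "ben"), ("ana", "cal"), ("ben", "cal"),  # within party D
    ("dee", "eli"), ("dee", "fay"), ("eli", "fay"),  # within party R
    ("cal", "dee"),                                  # one cross-party tie
]

# Tally same-party vs. cross-party friendships.
tallies = Counter(
    "same" if party[a] == party[b] else "cross" for a, b in friendships
)
cross_fraction = tallies["cross"] / len(friendships)
print(tallies)         # Counter({'same': 6, 'cross': 1})
print(cross_fraction)  # 1/7, i.e. about 0.14
```

A low cross-party fraction like this is one simple way the visual impression of “little overlap” could be quantified.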

 

E-Universe



The “Guiding Principles for FAIR Data Publishing” lays out a comprehensive set of rules we should consider when interacting with data. I think this is extremely important today because data is more ubiquitous than ever. Sometimes it feels as though data is so abundant that it has created a lawless e-universe. As a college student who tinkers with code and data science, the ways in which I am supposed to interact with data are not taught in a classroom. Other than the machine-jargon protocols that can sometimes be found in an API, I have been forced to fend for myself in this growing e-universe. As a result, gaining access to data has proved difficult, as has using online data altogether. By making data more findable and accessible, as laid out in the FAIR guidelines, the e-universe gains a readable map for navigating it. This will hopefully make interacting with data less intimidating and in turn create a more robust data environment.
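What that “readable map” might look like in practice: a minimal sketch of a machine-readable dataset record, with one field loosely echoing each FAIR principle. Every value here is invented for illustration, not taken from any real catalog.

```python
import json

# Invented example record; field names are an illustrative sketch,
# not an official FAIR schema.
record = {
    "identifier": "doi:10.0000/example-dataset",   # Findable: persistent ID
    "title": "Example survey responses, 2018",
    "keywords": ["survey", "example"],             # Findable: rich metadata
    "access_url": "https://example.org/data.csv",  # Accessible: standard protocol
    "format": "text/csv",                          # Interoperable: open format
    "license": "CC-BY-4.0",                        # Reusable: explicit license
    "provenance": "Collected by the example project, 2018",
}

# Serialized as JSON, the record is readable by both humans and machines.
print(json.dumps(record, indent=2))
```

A researcher (or a crawler) encountering this record can locate, retrieve, and legally reuse the dataset without fending for themselves.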

The FAIR guidelines are also critical when considering data equity. Historically speaking, data has been controlled by a very specific group of people. As a result, many people have been left out of the data creation process, creating implicit bias within our data. This can be seen, for example, in Thomas Jefferson’s diaries. By shifting the paradigm so that data is more accessible and usable, I think we will see a more equitable e-universe. Instead of wide gaps in the data, I think that data as a whole will become more equitable.

Space, Time, and Coalitions



In the article “Urban Electoral Coalitions in an Age of Immigration,” Sonenshein and Drayse explore some interesting phenomena concerning the relationship between time, space, and political coalitions. One of the things I found most surprising was how dynamic political coalitions in this country are. In other words, I was surprised to see how much political coalitions change depending on the political landscape as well as geography and immigration. In modern politics, it’s easy to assume or predict which groups of people will align themselves with others. This article shows, however, that political alliances are much more fickle than meets the eye. For example, African-Americans and white conservatives sometimes align based on a mutually perceived threat from Latino immigrants. Due to ideological differences among white conservatives, alongside the belief that immigrants pose a threat to the African-American position, these groups’ votes sometimes coincide. However, a fissure between white conservatives and African-Americans was exposed by a policy regarding police support in LA. The point is that the coalitions we have come to assume can no longer be taken for granted. With the heavy influx of immigrants from around the world, these political coalitions are more fickle than ever.

I also found the coalition data with regard to the maps very interesting. In LA, white liberals and African-Americans have historically formed political coalitions. The map suggests, however, that white liberals live furthest from African-American and immigrant communities. This draws my attention to a potentially interesting hypocrisy: it appears from the map that white liberals have adopted a progressive approach to immigration in parallel with a “not in my backyard” approach. I find the effects of geography and spatial proximity on feelings toward immigration incredibly interesting and also ironic. To me it seems obvious that more exposure to undocumented immigrants would demystify them and create support for them. As the article shows, this is true in some cases but not in others, which is a truly fascinating phenomenon.

I like how HR related this topic back to Lewiston, ME. Lewiston faces many interesting immigration issues, and it is interesting to see the coalitions that form between different communities, the racial implications they hold, and how they have changed. Historically, Lewiston has been a city of immigrants. It is interesting to see how the community’s relationship with the latest wave of immigrants has changed and whether or not there are racial implications.

Data in the shadows



In reading Lynching, Visualization, and Visibility, I am very interested in the ways that public discourse shapes how we collect and record data. This issue is well illustrated by violence against African-Americans. Until anti-lynching activists gathered and published data about the number of lynchings that took place and where they occurred, there was no official record of these events. I believe this created a pernicious feedback loop that enabled more lynchings to occur. Because of the government’s ignorance of these acts of hate, many went undocumented. As a result, when research is published about this violence, there is a drastic underestimation of lynchings, which results in an underwhelming public response. I believe this allows violence to persist while forcing researchers and advocates to dig deep in the shadows for evidence. This is the true evil power of data invisibility.

This phenomenon is not unique to lynching, though. The same ignorance can be seen in our attitude towards police brutality. The federal government gathers no official record of police brutality, and the records that do exist at the Justice Department are known to undercount such incidents. While the recent killings of African-Americans have brought police violence into the headlines, it’s important to realize that these events do not represent a rise in violence but rather a filling in of the blanks. These events have always occurred. By choosing not to record them, however, we deny them the public attention they deserve. This is a serious issue and another example of the data inequality that exists in our society.

In EC’s response, Lynching Data?, they argue that by eliminating the story or context surrounding data, one in turn eliminates the effect that raw data has on us. I agree with EC because data without context is less accurate. In the case of lynching, seeing a visual representation of lynchings in the southeastern United States is misleading. While the data does represent the aggregate number of lynchings, it’s important to know that these data points were not actually recorded at the time. This information gives us valuable insight into how society saw African-Americans at the time, which is important for understanding the power dynamics of the era.

Rosling’s Road Show



This video, for me, was very ironic. Data visualization at its most basic is the process by which we shine light onto our data so that people can easily understand its conclusions. Data visualization, while often convoluted, is designed so that the data can actually speak for itself. Just as a picture is worth a thousand words, I believe that a great graph is the most effective way to understand data. With this in mind, I was struck by how helpful Hans Rosling’s presentation of his life expectancy vs. income graph was. Rosling proved that the context in which we read and visualize data can sometimes be just as powerful as the insights themselves. In many cases, the context in which our data exists is inextricable from the conclusions formed. Here, Rosling’s step-by-step explanation of the trends in light of events such as world wars and flu outbreaks was extremely helpful. This goes to show the power of data context as well as supplemental information about the data itself.

In watching this video, I agree with KS that Rosling’s presence behind the graph made the data science itself much more accessible. So often I feel as though data is presented in a complicated way as a means of achieving seemingly sophisticated insights. By standing behind his graph and walking through his conclusions in plain English, Rosling did a great job of inviting people into his research.

A Revered Paul



In Kieran Healy’s “Using Metadata to Find Paul Revere,” we are introduced to network analysis. Without any conversation data, using only metadata such as the organizations each individual belonged to, one can illuminate connections between people and organizations. To take this one step further, one can use these connections and correlations to draw conclusions about two different people or organizations within the dataset. One question about network analysis is how reliant it is on assumption or bias. In other words, would a network analysis of two hundred and sixty people living in the colonial Boston area be of any use without knowing that Paul Revere was a war hero? I would argue that the correlations and conclusions drawn from network analysis are only helpful in relation to prior knowledge or a data point that acts as a known reference.
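Healy’s trick, turning shared organization memberships into ties between people, can be sketched in a few lines. The memberships below are invented toy data that merely echo the organization names in his post, not his actual table, and the “most central” result holds only for this made-up example:

```python
# Invented toy memberships (Healy's real analysis covered about 260
# Bostonians across 7 organizations).
memberships = {
    "Revere": {"StAndrewsLodge", "NorthCaucus", "LondonEnemies", "TeaParty"},
    "Warren": {"NorthCaucus", "LondonEnemies"},
    "Adams":  {"NorthCaucus", "TeaParty"},
    "Church": {"LondonEnemies"},
}

# Project the person-to-organization network onto people: two people are
# linked with weight equal to the number of organizations they share.
people = sorted(memberships)
shared = {
    (a, b): len(memberships[a] & memberships[b])
    for i, a in enumerate(people)
    for b in people[i + 1:]
}

# Summing each person's shared-membership weights gives a crude
# centrality score; the highest scorer is the likely "broker".
totals = {p: sum(w for pair, w in shared.items() if p in pair) for p in people}
print(max(totals, key=totals.get))  # Revere
```

This is exactly the point about prior knowledge: the computation surfaces a well-connected node, but only outside knowledge tells us whether that node is a revolutionary hero or a nobody.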

That being said, I still believe that this technique is useful; however, it raises concerns about the accuracy of our predetermined notions of what is right and wrong. If we think about using network analysis to fight terrorism, one could easily see how such a model would feed off our preconceived notions of what a terrorist looks and acts like and, as a result, lead to racial profiling on a large scale.

I found it interesting how SJ related network analysis to our daily lives. I agree with SJ about how often we use social media to provide us with metadata about our peers without ever talking to them. It is interesting to see how we use this metadata to form conclusions about people, categorize them, or befriend them, all based upon information on their profiles. The way in which we see new people is changing, which is another reason why this type of network analysis is extremely interesting.

 

Can Computer Scientists and Humanists be Friends?



In The Digital Humanities Contribution to Topic Modeling, I found it interesting how much the authors stress the importance of humanists’ role in topic modeling. Due to the availability and magnitude of text data in recent years, topic modeling has exploded in popularity. Because topic models work by consuming vast corpora of text, their results are generally widespread, blanket statements about the data itself. For this reason I agree with the authors that understanding how these methods work is critical. However, is it really enough to understand the inner workings of the algorithm alone?
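What these models actually consume can be made concrete. A topic model such as LDA does not read sentences; it reads a document-term matrix of word counts. Here is a minimal sketch of building one from an invented toy corpus (the modeling step itself is omitted):

```python
from collections import Counter

# Invented toy corpus; note "bank" appears in both a river sense
# and a finance sense.
documents = [
    "the river flows to the sea",
    "the bank of the river floods",
    "the bank raised interest rates",
]

STOPWORDS = {"the", "of", "to"}

def term_counts(doc):
    """Count non-stopword terms in one document."""
    return Counter(w for w in doc.split() if w not in STOPWORDS)

# The document-term matrix: one bag of word counts per document.
matrix = [term_counts(d) for d in documents]
print(matrix[1])  # Counter({'bank': 1, 'river': 1, 'floods': 1})
```

Everything the algorithm knows is in those counts; disentangling the two senses of “bank” is exactly the interpretive work that falls to the humanist reading the model’s output.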

Meeks and Weingart believe that in some cases the debate surrounding topic models is too concerned with the success of the algorithm itself as opposed to the human space in which the algorithm works. As a field, topic modelers have become so obsessed with understanding the strengths and weaknesses of their models that they have lost focus on what is really important: interpreting language.

This point of contention will be difficult for researchers to balance in the future. Society rewards speed and profit, making it difficult to ensure that our models are not only accurate but ethical. My hope is that humanists and computer scientists will work together to make topic models more accurate and, in turn, more useful.

In SJ’s reading response, they bring up a very compelling argument against topic models: that sheer abundance of word count should not be mistaken for abundance of meaning. Language and text are not always an exact science but also an art. There is certainly room for human emotion within a text, and I agree with SJ that sometimes these emotions can trump frequency.

Text Mining and Language Standardization



Jeffrey M. Binder’s ‘Alien Reading’ introduces us to the controversial and uncharted world of text mining and language standardization. In an age where written information is exploding at light speed, the prospect of being able to quickly break down, categorize, and localize snippets of text is extremely compelling for researchers and linguists. However, the difficulty of this task lies in the fluidity of language itself. Trying to convert language into data so that it can be used for statistical analysis poses an inherent problem in and of itself. Language is dynamic and constantly changing; what one word or phrase means to somebody may mean something completely different to somebody else. Thus, creating a method of standardization is controversial. This issue is ubiquitous across models in which “overfitting” to language occurs. The technology of text mining and language standardization needs to strike a balance: fast and conclusive enough to be useful, while also taking into consideration the ever-moving nature of language.

In addition, text mining faces issues of context. When models rely on words, their spellings, and their dictionary definitions, these algorithms run into problems of true meaning. This phenomenon surfaces in Matthew Jockers’s book Macroanalysis: we see a “particular use of stream [that] is not related to the “jet stream” or to the “stream of immigrants” entering the United States in the 1850s.” Rather, this stream refers to running water.
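A keyword-in-context (KWIC) view is the classic way to surface exactly this problem: the neighboring words are often the only evidence a model has for which sense of “stream” it is seeing. A minimal sketch, with invented sample sentences:

```python
# Invented sample text illustrating three senses of "stream".
text = (
    "the jet stream shifted north . "
    "a stream of immigrants entered the city . "
    "they fished in the cold mountain stream ."
)

def kwic(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

hits = kwic(text.split(), "stream")
for left, kw, right in hits:
    print(f"{left:>20} [{kw}] {right}")
```

To a bag-of-words algorithm all three hits are the same token; a human scanning the concordance lines immediately sees three different meanings, which is Binder’s point about keeping a critical reader in the loop.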

With these issues of overfitting and context misjudgment, text mining algorithms face serious obstacles. If they continue along this path without serious consideration and critical analysis by the humans on the other side, these algorithms could be responsible for a great deal of confirmation bias down the line. One could easily imagine an algorithm sacrificing nuance for efficiency, leading to serious misuse of information.