NB – Data Cultures

Posted on November 8, 2018 by NB

Quality of Visuals in Lynching, Visualization, and Visibility

In the article “Lynching, Visualization, and Visibility,” many visual displays are used to articulate findings about lynchings in the United States throughout the 20th century. The first chart attempts to make the pervasiveness of lynchings visible with a grid cell for each week in each year. I found it a little difficult to decipher findings from this chart: time is expressed on both axes, and the color scale doesn’t have much contrast. The subsequent line graph, which plots lynchings per year over time, is much easier to interpret and makes the trends visible. This graph is very similar to the graph of executions by race later in the article. That visualization is sophisticated in its ability to separate executions by race over a period of time, but the results are surprising. We see a drastic drop-off in executions in the decades following 1950, but the number of executions increases again somewhere near 1980. It would be interesting to see whether policy during this time had an effect on this change in executions.

In SJ’s blog post, they question some of the visualizations displayed. The author notes, “Although the data collected was important, I genuinely believe the visualizations produced did not help with any new or essential trends that were deemed not discovered prior to the author’s research.” I somewhat disagree with this statement. Some of the visualizations didn’t do the best job of highlighting these results, but I believe they are all important for showing trends in lynchings. The comparison between lynchings and executions is a necessary one to make in the context of the article.

Posted on November 5, 2018 by NB

Questions with Data

Even though we live in such a data-driven world, with a plethora of visualizations thrown at us, challenges still exist in how we can use this data in a meaningful way. In the article “Feminist Data Visualization,” several questions are proposed about the exploration of data and how we can use data both effectively and efficiently. In the section titled Design Process Questions, the authors raise many excellent considerations for conducting research. For example, they discuss questions like:

How is power distributed across the design team? Whose voice matters more and why? How can end users’ voices be more fully integrated into the design process? Can we build capacity in user communities, or enlarge our internal perspectives, by employing a more participatory design process?

Although this article focuses on feminist data visualization, these data-driven questions and solutions can be applied to almost anything in the growing business analytics field. This article brought forth a lot of key thoughts to consider when using data, and I am excited to dig deeper into these questions during class. When reading other blog posts, I came across a post by BL about some of the same challenges with visualization. They talk about how “data can very much be subjectively created, and in doing, can be altered depending on how it is chosen to be recorded. It supports the importance of the role of data in telling story, and questions some more traditional ways that we have been doing so.”

Posted on November 1, 2018 by NB

One if by Land, Two if by Sea

In the article, “Using Metadata to find Paul Revere”, data is used to find individuals suspected of involvement with terror groups in the 18th-century American colonies. To do so, the author explains, a “beta of PRISM has been used to collect and analyze information on more than two hundred and sixty persons (of varying degrees of suspicion) belonging variously to seven different organizations in the Boston area.” From there, various networks and connections were drawn among people and organizations. This data, at first, was very difficult to condense and use purposefully. “Links between people and some other kind of thing, like attendance at various events, or membership in various groups” were used to start this type of analysis and provided the framework for the networks. The article notes that there could be some inaccuracies and biases in this data, but a very simple and straightforward method has the ability to “pick the name of a traitor like Paul Revere from those of two hundred and fifty four other men, using nothing but a list of memberships and a portable calculating engine.”
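
To make the idea of building a network from “a list of memberships” concrete, here is a minimal sketch in Python. The people, clubs, and function names are all made up for illustration and are not the article's data. It also swaps in a very simple measure (counting how many other people each person shares a group with) rather than whatever network measures the article itself relies on.

```python
# Hypothetical membership lists: person -> set of groups (NOT the article's data).
memberships = {
    "Person A": {"Club 1", "Club 2", "Club 3"},
    "Person B": {"Club 1"},
    "Person C": {"Club 2"},
    "Person D": {"Club 3"},
    "Person E": {"Club 3"},
}


def shared_groups(memberships):
    """For every ordered pair of distinct people, count the groups they share."""
    people = sorted(memberships)
    return {
        (p, q): len(memberships[p] & memberships[q])
        for p in people
        for q in people
        if p != q
    }


def connection_counts(memberships):
    """For each person, count how many other people share at least one group."""
    counts = {p: 0 for p in memberships}
    for (p, _q), shared in shared_groups(memberships).items():
        if shared > 0:
            counts[p] += 1
    return counts


counts = connection_counts(memberships)
most_connected = max(counts, key=counts.get)
print(counts)
print("Most connected:", most_connected)
```

In this toy example, the person who belongs to several clubs ends up linked to everyone else, even though no single membership looks suspicious on its own. That is the sense in which memberships alone can make one name stand out.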

In CM-A’s article, “Secret Agent Paul”, the author compares this type of analysis to an FBI movie scene in which we peer at blurry security camera footage. It is extremely fascinating to think that data has the ability to identify criminals by connecting their past activity and involvement among different organizations and individuals.

Posted on October 30, 2018 by NB

Computational Linguistics

We are fortunate that computers, smartphones, and other forms of media give us great avenues through which to explore data. This can sometimes, though, lead us down somewhat of a rabbit hole, in which we can’t really draw any conclusions or end up questioning the intent of our research. At first glance reading “A Report Has Come Here”, I was left thinking this after learning a little about computational linguistics. In this technique, data researchers are able to manipulate data in a way “that allows you to present textual data in various visual forms.” I’ve included an example from the article below. From there, we are able to draw complicated links between various objects, people, etc. This connects to a reading we did last week involving deep reading. There are a lot of exciting challenges to discover using this method, but I am honestly not a big fan of it. At first glance, I found it extremely difficult to follow the connections between the arrows, and I felt utterly confused by the data displayed and, more importantly, by the intent behind the visualization.

JH makes a good point in his post that these extraction methods are important. He highlights that, “James Hemings could have easily been forgotten in history, but thanks to various digital techniques, information regarding Hemings could be uncovered.” These various extraction approaches do serve a purpose, but as an economics major, I believe there are more sophisticated ways to draw connections between pieces of data.

Posted on October 23, 2018 by NB

Text Analysis in Professional Sports

After reading “Text Analysis and Visualization: Making Meaning Count,” by Stéfan Sinclair and Geoffrey Rockwell, I was forced to realize that racial stereotypes still exist in professional sports. As a country and a society, we’ve come a long way since the 50s and 60s, but we still have a long way to go. Relating specifically to the article, a study performed on NFL players found that “white players were more likely to be called “intelligent” and blacks more likely to be called “natural”.” This doesn’t come as a huge surprise considering the recent events in the NFL involving quarterback Colin Kaepernick. Kaepernick, who has both the IQ and the athleticism to be an elite quarterback in the league, has yet to be signed because he publicly expressed his disgust with the current state of the league. Relating this back to text analysis, studies like the one in the article are easy to conduct, but are they completely relevant and accurate in their findings? For example, is there an even split between white and black players in the NFL? I would guess absolutely not. This makes it very difficult to compare the raw frequency of certain words like “natural” and “intelligent.”
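
One way to handle the uneven split is to compare rates instead of raw counts. The sketch below is only an illustration: the group labels, word counts, and the `rate_per_1000` helper are made up and do not come from the study. It shows how a word can appear more often in absolute terms for the larger group while being used at a higher rate for the smaller one.

```python
def rate_per_1000(word_count, total_words):
    """Occurrences of a word per 1,000 words of commentary."""
    return 1000 * word_count / total_words


# Made-up numbers for illustration only (NOT from the NFL study).
# Each group has a different amount of commentary written about it.
commentary = {
    "group_1": {"total_words": 20000, "natural": 30},
    "group_2": {"total_words": 5000, "natural": 10},
}

rates = {
    group: rate_per_1000(stats["natural"], stats["total_words"])
    for group, stats in commentary.items()
}

for group, stats in commentary.items():
    print(group, "raw count:", stats["natural"], "rate per 1,000 words:", rates[group])
```

Here the raw count is higher for the first group, but the rate is higher for the second, which is why comparing raw frequencies across groups of different sizes can mislead.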

MLC makes some good points in her article about the potential biases that come about with this type of analysis. The author says, “Looking at the frequency of times particular words are mentioned can uncover some bias, which may be unnoticeable at a readthrough.” I would like to do some further research on the particular NFL study conducted to see if I could uncover some biases.

Posted on October 11, 2018 by NB

Use of Dummy Variables in OLS

In I. Theodossiou’s article from the Journal of Health Economics, regressors such as low pay and unemployment are measured against psychological well-being. The dependent variable, psychological well-being, is quantified numerically using feelings such as dissatisfaction, unhappiness, and low self-esteem. Each of these feelings is rated on an integer scale. In the article, dummy variables are brought up as a possible measure to help the author draw conclusions about the data. In OLS regressions, dummy variables are used to indicate the absence or presence of some categorical effect that may be expected to shift the outcome. A dummy variable takes on a value of 0 or 1. The significance of using a dummy variable as a regressor is that the comparison becomes binary, and the coefficient on the dummy is interpreted as the treatment effect: the predicted difference in the outcome between the two groups in the regression.
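
To see why the coefficient on a dummy variable equals the difference between the two groups, here is a small Python sketch. The data are made up (they are not from Theodossiou’s article), and `ols_simple` is just a hypothetical helper that applies the standard one-regressor least squares formulas.

```python
from statistics import mean


def ols_simple(x, y):
    """Return (intercept, slope) for y = intercept + slope * x by least squares."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up data: dummy = 1 if unemployed, 0 if employed; y = a distress score.
unemployed = [0, 0, 0, 0, 1, 1, 1, 1]
distress = [2, 3, 2, 3, 4, 5, 4, 5]

intercept, slope = ols_simple(unemployed, distress)

# Group means computed directly, for comparison with the regression output.
mean_employed = mean(d for u, d in zip(unemployed, distress) if u == 0)
mean_unemployed = mean(d for u, d in zip(unemployed, distress) if u == 1)

print("intercept:", intercept, "mean of employed group:", mean_employed)
print("slope:", slope, "difference in means:", mean_unemployed - mean_employed)
```

The intercept matches the average score of the group coded 0, and the slope matches the gap between the two group averages, which is the “treatment effect” reading of the coefficient.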

In MLC’s post on “Unemployment on Psychological Well-Being”, the author highlights the process of collecting categorical data and quantifying this data somehow. One example of a rating scale the author used that was present in the article is the following: not at all (1), no more than usual (2), rather more than usual (3), much more than usual (4). My only issue with this type of analysis is the difficulty in labeling these distinct categories. In OLS regression, I prefer using continuous variables instead of discrete.

Posted on October 7, 2018 by NB

Omitted Variable Bias and Simpson’s Paradox

Money, profit, and success define our world today whether we like it or not. All research and studies are conducted with a motive in mind. In Kievit’s Simpson’s Paradox article, treatment dosage and recovery percentage are analyzed among male and female samples. The results reach a fairly counterintuitive conclusion: both men and women show a negative relationship, meaning that as you increase the treatment dosage, the recovery percentage is lowered. At first glance, one might believe this to sound utterly ridiculous. But if you think about this relationship more closely, you realize that there is omitted variable bias taking place. Omitted variable bias occurs when “a variable that is correlated with both the dependent and one or more included independent variables is omitted from a regression equation.” In this example, an omitted variable is preexisting health. For example, the majority of people who need a low dosage are probably pretty healthy, so their recovery percentage is high compared with the people who need a very high dosage. These individuals might already have serious preexisting health conditions that lower their recovery percentage. This isn’t captured in the data, which can distort the results. Again, here, we see that data isn’t always accurate and can be easily manipulated to tell a story. For example, coffee manufacturers want to release information from studies indicating that coffee consumption is good for your health in order to increase consumption and profit.
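
The omitted variable story above can be sketched with a toy example in Python. All of the numbers are made up and are not from Kievit’s article. The sketch assumes two hypothetical groups of patients, healthier ones who get low dosages and sicker ones who get high dosages, and the `slope` helper is just the standard least squares slope formula.

```python
from statistics import mean


def slope(x, y):
    """Least squares slope of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


# Made-up data (NOT from the article): dosage and recovery percentage.
# Healthier patients receive low dosages; sicker patients receive high dosages.
healthy_dosage, healthy_recovery = [1, 2, 3], [80, 82, 84]
sick_dosage, sick_recovery = [7, 8, 9], [40, 42, 44]

slope_healthy = slope(healthy_dosage, healthy_recovery)
slope_sick = slope(sick_dosage, sick_recovery)

# Pooling the two groups omits preexisting health from the comparison.
slope_pooled = slope(healthy_dosage + sick_dosage, healthy_recovery + sick_recovery)

print("within healthy group:", slope_healthy)
print("within sick group:", slope_sick)
print("pooled, health omitted:", slope_pooled)
```

In this toy data, more dosage goes with better recovery inside each health group, yet the pooled slope is negative because the sicker patients both receive more treatment and recover less. Leaving health out of the analysis flips the sign of the relationship.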

Posted on October 3, 2018 by NB

Recidivism in US Prisons

Although this article focuses on recidivism (the share of people who return to state prison within three years of being released), I chose to highlight a small detail that might go unnoticed: the lack of data readily available to the researchers conducting this study. Throughout the first few weeks of this semester, I’ve noticed a common trend involving data and the difficulty of obtaining it. This article has a lot of similarities to the Eviction Lab, where many areas had unreported data on evictions. Here, the lack of data surrounding prisoners “has complicated efforts to understand the aggregate effects of myriad federal, state, and local efforts to reduce reoffending”. Missing or unclear data can lead to inconclusive results.

Posted on September 29, 2018 (updated October 2, 2018) by NB

Exploratory Post 9/29/18

In exploring catapult and WordPress, I just wanted to introduce my website by writing a very short post on my prior experience with R. In the winter semester of my freshman year, a classmate and I initiated and designed an independent study in data analytics with one of our professors. For the first six weeks, we audited a high-level economics course in programming and statistical machine learning methods, and we used this knowledge to create a self-designed final project. We examined NBA analytics and built regression models using R, a programming language, to draw conclusions about the changing landscape of the game. Unfortunately, I have forgotten most of this R knowledge, but I am confident I will be able to pick it back up pretty quickly.