This post takes a closer look at the interaction between digital humanities and the power of interactive visualization.
Flourish is a tool that helps users, especially journalists, design and create graphics to embed on a website. Even without any coding experience, users can make high-end interactive graphics and stories with no tech support.
In class, we analyzed examples of menus and dish names used over time. We cleaned the dish-name data and ran it through Flourish to examine trends and analyze the data.
Using Flourish, we can not only analyze the text but also shape the data visualization to serve a user’s purposes. We can analyze the text, fix duplicate entries, and hover over text to highlight the year and date.
If we use a tool like Flourish, can we analyze and interpret beyond the data?
We believe there is still more work to do with Flourish. While analyzing and visualizing data is its strength, it still takes more technique to read beyond the data and maps.
We can certainly see the trends over time in the spreadsheet and chart shown above. How can we analyze these data beyond that? How can we use this to explore deeper stories beyond the data?
This blog post dives deeper into “The Crisis”, a collection of articles illustrating the real-life troubles and difficulties that African Americans lived through in the early 1900s.
For this blog, we look at a different data visualization site called Flourish. Flourish is a network graphing tool that takes data sets and turns them into visualizations that are pleasing to the eye and can provide many different ways of analyzing large amounts of data. This is one of the easiest ways to answer the overarching question provided for this blog post:
Question: Using Flourish, which of the following authors had the most connections while writing in “The Crisis”: Adams, Fauset, Johnson, or Du Bois?
Figure 1: An analysis of all authors
The Flourish graph above shows the connections between every author within “The Crisis” and how each one relates to the others through similar topics, mentions, and written relationships. The larger the dot, the more the author wrote over the 10-20 years of this publication’s existence. Each dot’s color represents the kind of content in “The Crisis” that the author focused on most. The key can be found at the top left of each figure: blue represents letters that the author collected and documented, yellow images, green articles, purple fiction, light blue poetry, dark blue drama, and dark green music. Even before looking more closely at particular authors, I think it is safe to conclude that images and articles were the most prominent genres, which could relate to which authors wrote within each genre.
Figure 2: An analysis of John Adams
John Henry Adams mainly contributed in the images genre. His connections are mainly to articles, with several to poetry and fiction. However, each fiction bubble is smaller than the rest, so it can be inferred that this genre had the fewest connections to Adams.
Figure 3: An analysis of Jessie Fauset
Jessie Fauset has significantly more connections than Adams. Her focus was on articles, and her connections are primarily to other articles, along with a larger share of poetry and a small portion of fiction. Her bubble is very similar in size to Adams’ image bubble and they have the same types of connections, so it is interesting to note that authors in two different genres can have completely different numbers of connections even though they wrote in “The Crisis” for the same amount of time.
Figure 4: An analysis of Georgia Douglas Johnson
Georgia Douglas Johnson focused her work on poetry, which, in the previous figures, was one of the less common connections within these articles. However, Johnson still has numerous connections to the same genres as in the previous figures. Even more interesting, her bubble seems to be smaller than those in the previous two figures, meaning she didn’t write for as long as the other writers. How does short-form poetry come to have a greater impact on connections and genres than longer articles and imagery?
Figure 5: An analysis of W.E.B. Du Bois
Finally, W.E.B. Du Bois is clearly one of the best-known authors in “The Crisis”. Before we did any investigating with Flourish and its data sets, we had talked about Du Bois more than any other author within these articles. While doing initial research on “The Crisis” at the beginning of the semester, I noticed that Du Bois was a common name on most pages I read. When doing further research with Voyant Tools, I found myself looking at Du Bois’ works a lot more than others so I could easily make connections to specific parts of the articles. Du Bois has a bubble similar to those in the first two individual author figures, which makes sense as he likely wrote for at least as long as the others, and he would beat out the connections of poetry authors with smaller bubbles, like Johnson. He has an abundance of connections, mainly to other article-based authors. Du Bois even has connections to genres not seen in the other figures, such as letters.
All in all, Flourish is a great tool to provide connections between authors and their genres, as well as any other specific connections between multiple data points.
I was able to answer my initial question of analyzing four prominent authors: Du Bois had the most connections as an article writer, followed by Fauset, Johnson, and Adams.
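The ranking above was read off the graph by eye, but the same comparison can be sketched in code. The snippet below is a minimal illustration, assuming the links are available as (source, target, weight) rows; the rows shown are hypothetical placeholders, not the real dataset, and `count_connections` is a helper name made up for this example.

```python
from collections import defaultdict

# Hypothetical link rows (source, target, weight); NOT taken from the real data.
links = [
    ("Du Bois, W.E.B.", "Fauset, Jessie", 3),
    ("Du Bois, W.E.B.", "Johnson, Georgia Douglas", 2),
    ("Du Bois, W.E.B.", "Adams, John Henry", 1),
    ("Fauset, Jessie", "Johnson, Georgia Douglas", 1),
]

def count_connections(rows):
    """Return {author: number of distinct authors they are linked to}."""
    neighbors = defaultdict(set)
    for source, target, _weight in rows:
        # Treat every link as two-way: each end gains the other as a neighbor.
        neighbors[source].add(target)
        neighbors[target].add(source)
    return {author: len(linked) for author, linked in neighbors.items()}

connections = count_connections(links)
# Sort authors from most connected to least connected.
ranked = sorted(connections.items(), key=lambda item: item[1], reverse=True)
```

With the real links sheet loaded in place of the placeholder rows, `ranked` would give the order of authors by number of distinct connections.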
Although this question was answered fairly well through easy searches and data analysis, it also raised more questions that can be asked in future blog posts. For example, how does genre affect how well these connections can be made? It seemed as though the more you wrote, the more widely you were known, yet Johnson was a poet and had more connections than Adams, even though she didn’t write for nearly as long as he did. So what is the true relationship between the length of time spent writing and genre? I could go on with many more specific questions, but I am satisfied that my original question was answered easily through analysis with Flourish.
After reviewing Flourish and its uses, I was pretty interested in how the tool could be used to connect the different types of authors and publications to one another. There are so many authors who were writing different types of works who were connected to each other, as was shown in the network map.
Hovering over a single point highlights every other author they are associated with in the works.
Using Flourish, we can also set the thickness of the links connecting the authors. In this case, we made each link thicker the higher its weight, that is, the more times two authors were mentioned together. This led me to the question: How can we use the different features provided by Flourish to predict trends in writing?
Because link thickness reflects how often authors were mentioned together, this feature allows us to map out which authors discuss similar topics in their writings and how likely they are to write about each other, even across different forms of writing.
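To make the idea of a link weight concrete, here is a rough sketch of how such weights could be counted. It assumes that “mentioned together” means appearing in the same issue, which is my own simplification; the issue contents and the `link_weights` helper are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical issue -> authors mapping; NOT the real contents of The Crisis.
issues = {
    "issue_a": ["Du Bois", "Fauset", "Johnson"],
    "issue_b": ["Du Bois", "Fauset"],
    "issue_c": ["Du Bois", "Adams"],
}

def link_weights(issue_authors):
    """Count how many issues each pair of authors shares."""
    weights = Counter()
    for authors in issue_authors.values():
        # Sort so that (A, B) and (B, A) are counted as the same pair.
        for pair in combinations(sorted(set(authors)), 2):
            weights[pair] += 1
    return weights

weights = link_weights(issues)
```

Each key in `weights` is a pair of authors and each value is the number that would drive the link thickness in the graph.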
“The Crisis” is a compilation of documents that captures the trends of its time. But “trends” is too broad a term: analyzing trends is not about finding a current that people will like, but about finding the specific behaviors or situations that make up people’s daily lives. After all, the essence of a trend is curation, the process of tying something to something meaningful. Trends do not just point to themselves; they are the factors that carry new messages to people. That is where I came up with this research question: If we could create a data set of trends and visualize it in a graph-making program like Flourish, wouldn’t we learn how certain trends exist in our real lives and how intertwined they are? And if that’s possible, what storytelling is possible through that visualization?
The picture above shows a set of data (connections between authors) from “The Crisis” as a graph that connects points to points. Through the graph, I was able to see how many trends and authors can be visualized and how connected they are. In the end, though, I couldn’t yet figure out how to do data storytelling with this graph in a way that answers my research question. From this result, I think that we face a lot of data today, and that it is no longer enough to just show data to people: we need to transform data into information and empathize with it. We also need to emphasize storytelling techniques that solve data-based problems by communicating empathic visualizations in the form of stories.
Using the “cleaned” dataset of the content within issues of The Crisis in Flourish allows for a variety of research questions to be answered, particularly within the Network Graph function.
This particular tool allows for the relative connection and comparison between the content included in The Crisis to be analyzed, as specified by which columns of data are selected to create a visualization.
Reviewing this tool, I thought it would be interesting to examine the following research question:
How does Flourish’s Network Graph tool allow for an analysis of the overall contributions of various authors between 1910 and 1922?
To answer this question, I utilized the two spreadsheets and organized them in the “Select Columns to Visualize” box:
For Links, “A” refers to “source,” “B” refers to “author,” and “C” refers to “weight.”
For Points, “A” refers to “author,” “B” refers to “most_common_genre,” and “C” refers to “total_extent.”
I selected and ordered these columns as listed to focus the visualization on the authors and on how many total pages they wrote within the issues published in the specified timeframe.
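For readers who want to prepare a similar Points sheet themselves, the following is a minimal sketch of one way to build it. The column names (author, most_common_genre, total_extent) come from the selection described above; the contribution rows are hypothetical, the helper names are my own, and I am assuming that “most common genre” means the genre with the most items rather than the most pages.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical contribution rows: (author, genre, pages). NOT real data.
contributions = [
    ("Author A", "articles", 4),
    ("Author A", "articles", 6),
    ("Author A", "poetry", 1),
    ("Author B", "images", 2),
]

def build_points(rows):
    """Summarize each author as one row for the Points sheet."""
    pages = defaultdict(int)
    genres = defaultdict(Counter)
    for author, genre, extent in rows:
        pages[author] += extent        # total pages written
        genres[author][genre] += 1     # number of items per genre (assumption)
    return [
        {
            "author": author,
            "most_common_genre": genres[author].most_common(1)[0][0],
            "total_extent": pages[author],
        }
        for author in sorted(pages)
    ]

def to_csv(rows, fieldnames):
    """Render rows as CSV text that could be pasted into a data sheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

points = build_points(contributions)
points_csv = to_csv(points, ["author", "most_common_genre", "total_extent"])
```

The column order in `points_csv` matches the A, B, C order chosen for Points, so the size of each point would follow total_extent and its color would follow most_common_genre.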
Using this tool, with this selection of data columns, allows for a few analyses to take place in response to the research question.
Figure 1. Visualization of author contributions
Figure 2. Primary: W.E.B. Du Bois
Figure 3. Primary: Madeline G. Allison
Figure 4. Primary: Jessie Fauset
First, the visualization creates points that are sized in order to display the amount, in this case the total pages written by specific authors. In the visualization, the three largest appear as W.E.B. Du Bois, Madeline G. Allison, and Jessie Fauset. Not only does this visualization indicate to readers that these three authors are included the most (in terms of pages) in the issues of The Crisis between 1910 and 1922, but the color of each informs readers of their most common genres. In the case of these three authors, they each contributed articles the most over other genres.
While these three are the highest contributors in terms of page count, there remain roughly 420 other authors in the data who were also published within the time frame. After these three, the next-highest contributors (Vincent Saunders, Georgia Douglas Johnson, John Henry Adams) published works outside of the article genre, including images and poetry.
Figure 5. Primary: Vincent Saunders
Figure 6. Primary: Georgia Douglas Johnson
Figure 7. Primary: John Henry Adams
I note this because the visualization, while informing users of who contributed the most in page count, could be misleading in that users may deduce that each issue was primarily filled with articles rather than other genres of content. Furthermore, the visualization is also skewed, in my opinion: because the three largest authors are the most visible points on the graph, the remaining contributors are less visible, which makes comparisons hard. Only if you focus on the next three highest contributors can you identify their genre and their overall contribution. I must also note that finding each of these names, given that their points were not as large, required spending more time searching the visualization, which is another drawback of this tool.
Overall, I find Flourish, and in particular its network graph visualization, useful for answering very specific research questions. When it is used, however, creators and users alike should consider the ways in which the visualization may not be as representative as desired and could in fact produce misleading conclusions and takeaways. When working with data such as issues of The Crisis (which include a variety of page lengths and content genres) in the aggregate, as in this visualization, the aspects that made each issue unique (such as the inclusion of varied materials) may get lost.
Flourish is an interesting and helpful website for analyzing text data. It can surface clues in the data that help people see which parts connect to which.
Let’s look at two examples from the images.
We can gather information from this image by looking at the connections between different people; the dot colors represent the different types of content. We can see that this author has relatively few connections compared with another author.
We can compare the two authors by looking at the difference in the number of connections. Du Bois, W. E. B. clearly has more connections to other content and authors. We might say he has more ‘weight’ in history and in other content.
Research Question: What is the difference in connections/targets and genre between the authors Du Bois, W.E.B. and Underwood & Underwood?
The visualization created from the data collected from The Crisis shows, overall, the connections between authors and the most common genre that tied them together across ten years of publication. However, my question does not look at all of the collected authors from 1910 to 1920. Instead, it looks at two authors, Du Bois, W.E.B. and Underwood & Underwood, and the difference between them in the years 1910 to 1920.
In figure 1 above, the key tells us which colors correspond to which genres: letters, images, articles, fiction, poetry, drama, and music. Looking at the visualization, one can also see that the bigger the circle, the more the author used that genre over the years they were published in The Crisis. For instance, a smaller circle would represent someone published in 1915-1916, or in just the one year 1910; which years does not matter, only that there were few of them. Larger circles represent being published over many more years, such as 1910-1917 or even 1910-1920. There is so much data collected, but I really wanted to look at the difference between two authors who were a part of The Crisis for many years but were mostly published in different genres. That is why the authors chosen were Du Bois, W.E.B. and Underwood & Underwood.
When analyzing the connections of Du Bois, W.E.B. in figure 2, I noticed straight away that the genre he published in most was articles. I also noticed that a very large circle represented his name, since he took part in The Crisis for a span of twelve years overall. Next, I looked at his connections to the various authors. Since Du Bois was published quite a bit over many years, he was connected with a variety of authors across a variety of genres. It would appear that Du Bois was an important and essential part of creating content for The Crisis over those twelve years. Hence, Du Bois appears to be connected to more than fifty different authors who were published in the same issues he was a part of.
The second author I wanted to analyze was Underwood & Underwood, in figure 3. I wanted to look at how someone with such a big circle did not have nearly as many connections to other authors as Du Bois, W.E.B. had. You can tell right away that the genre this author used most was images. The circle is large as well, and according to the data, Underwood & Underwood was a part of The Crisis for nine years. You can also see a connection between Du Bois, W.E.B. and Underwood & Underwood that may have been hard to catch in Du Bois’s visualization in figure 2. In figure 3, the visualization is far easier to analyze, and it is easy to see which authors are actually connected to Underwood & Underwood. However, my main question is why Underwood & Underwood has so many fewer connections than Du Bois, even though Underwood had been published in The Crisis for so many years. When I thought about it, it made sense. Underwood & Underwood contributed images, which appear far less often across the issues, whereas Du Bois wrote articles, which are published in the issues much more of the time (probably 90% more, by my guess). Hence, articles are used more than images, which leads Du Bois to be connected to more authors than Underwood throughout the years 1910 to 1920 in The Crisis.
My overall research question has therefore been answered. I wanted to analyze the different connections and genres of Du Bois, W.E.B. and Underwood & Underwood. I was able to see the most popular genre for each of the two authors and why they had so many or so few connections to other authors. I look forward to further analysis of the connections between other authors in The Crisis.
Research Question: How does the use of the same or different words change from the first issue to the last, as seen through an analysis of trends? And how does Voyant Tools give the reader a visualization of those trends?
As a reader, it can be hard to see the evolving trends of a publication, especially when there is a plethora of issues to read and analyze over time. Digital humanities gives the reader the chance to fully immerse themselves in understanding the different trends throughout its production. This analysis looks at the changes throughout the history of The Crisis, from the Modernist Journals Project, from beginning to end. With the help of Voyant Tools, anyone can analyze a section or a whole issue of The Crisis and uncover the different trends shown below.
Through analyzing the data, one could be particularly interested in seeing how the use of the two words “white” and “people” increases or decreases from the first issue to the last. In Vol. 1, No. 1, the word “people” was used 54 times and the word “white” was used 48 times, whereas in Vol. 25, No. 2, “people” was used 39 times and “white” 57 times. From the numbers alone, one can tell that the use of “people” decreased whereas the use of “white” increased. However, looking at the data visually makes it easier to see where the words fall. In figure one, “people”, indicated by the purple bubbles, is used sporadically throughout the first issue, with increased use in the middle of the text. In the second figure, “people”, indicated by pink bubbles, is used more towards the middle and end of the issue. The opposite can be said about the word “white”. In the first figure, shown in orange, the word is used more towards the beginning, whereas in the second figure it is used more frequently from the beginning through the end of the issue. Although this can be relatively hard to interpret from the numbers, the visualization makes the trend much easier to understand and focus on. Hence, the words are spread out according to how much they are used and where in the issue they are most relevant to the topic being addressed.
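The comparison of counts can also be written out as a small sketch. The four counts below are the ones reported above; `count_word` is a hypothetical helper that shows roughly how such a count could be made on plain text, and its simple tokenization is an assumption that may not match how Voyant Tools splits words.

```python
import re

def count_word(text, word):
    """Count case-insensitive occurrences of a whole word in a text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return tokens.count(word.lower())

# Counts reported above as (Vol. 1, No. 1; Vol. 25, No. 2).
reported = {"people": (54, 39), "white": (48, 57)}

# Positive change means the word was used more in the later issue.
changes = {word: last - first for word, (first, last) in reported.items()}
```

Here `changes` holds a drop of 15 uses for “people” and a rise of 9 uses for “white”. Because the two issues may differ in length, raw counts like these are only a starting point for comparison.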
One more thing to point out: not all visualizations have to depict a particular word you are trying to analyze. For the first issue, you can see the words that are commonly used throughout it. However, when the last issue, Vol. 25, No. 2 (1922-12-01), is analyzed through Voyant Tools for comparison, the same words are not commonly used. Words like “colored”, “negro”, “new”, “York”, and “people” are used increasingly, in that order, throughout the very first issue, which cannot be said of the last issue. It can be said, however, that using different visualizations of trends can help present the data in a way that is more pleasing and helpful to read.
This blog post raises a few questions about how Voyant can be used with articles such as “The Crisis”. Voyant is a website that provides visual representations of the most common words used in a piece of work. A few of the ways this data can be represented are shown in the screenshots below. This is crucial to analyzing pieces of text because it gives the scientist an easier way of interpreting and organizing data. I believe this is extremely important because the collected visual data can be useful in identifying a theme for a piece of work. This data can not only answer historical questions about the setting in which “The Crisis” was created, but can also provide evidence about other historical works and future works.
For this post, the main question is how the usage of certain words shapes the overall message of this article. Without knowing anything about the article, what can we infer that “The Crisis” is about? Another way of asking this question would be: How does text visualization allow scientists and others to understand the meaning of an article? What advantages and disadvantages of Voyant can we identify by answering this question?
To prepare this blog post, I read “The Crisis” and took screenshots from Voyant. Uploading a TEI file into Voyant is the simplest way of obtaining this data. After the file is uploaded, the full text can be seen in the middle of the page, but the most important information for analyzing a piece of work is to the left and right of the text, as well as below it. To the left is a visual display of the most commonly used words in the text, and to the right are graphical views of how these words are used throughout the text. The summary can be found below all of the above; I did not use it for the figures because it only gives the basics of what we are looking at for this post, but I will draw on it in the conclusions in order to cover every aspect of Voyant.
Figure 1 above is a screenshot of the Cirrus, a word cloud of the most commonly used words in “The Crisis”. Each word has a different color that doesn’t seem to mean anything (it appears to be random), but this is helpful because it differentiates the words and isn’t bland to look at. The larger the word is, the more commonly it is used in the text. Therefore, the most commonly used word is “colored”, with “negro”, “new”, “white”, and “people” close behind.
Figure 2 above shows the links between the three most commonly used words, in blue, and the other words, in orange, that often appear next to them throughout “The Crisis” articles. The most commonly used independent words are “colored”, “negro”, and “new”, as in Figure 1, and the grey lines indicate the words that often connect to and follow these three.
Figure 3 above shows the relative frequency of a few of the most frequently used words throughout the articles. The difference between this figure and the previous ones is that this graph shows when the words are used in the text, not just what the words are. This information can show an important trend in what the articles are discussing. For example, we can infer that the word “colored” is used more often at the beginning of the articles than in the rest of the piece.
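To show what a relative frequency trend like the one in Figure 3 is measuring, here is a rough sketch. It assumes the text is cut into equal-sized segments and that relative frequency means the share of tokens in a segment that match the word; the exact computation in Voyant may differ, and the sample sentence and `segment_frequencies` helper are made up for illustration.

```python
import re

def segment_frequencies(text, word, segments=10):
    """Return the word's share of tokens in each equal-sized segment."""
    tokens = re.findall(r"[a-z']+", text.lower())
    # Ceiling division so that the text splits into at most `segments` chunks.
    size = max(1, -(-len(tokens) // segments))
    result = []
    for start in range(0, len(tokens), size):
        chunk = tokens[start:start + size]
        result.append(chunk.count(word.lower()) / len(chunk))
    return result

# Made-up sample text, not a quotation from The Crisis.
sample = "colored colored people new york white people new york"
freqs = segment_frequencies(sample, "colored", segments=3)
```

In this toy example the word appears only in the first of the three segments, which is the same kind of early peak described for “colored” above.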
Question 1: Without knowing anything about the article, what can we infer that “The Crisis” is about?
I think the important thing to look at is the relationship between these three figures and how they work hand in hand. Figure 1 lays out the most important and frequently used words in the text. Figure 2 shows the connections between a few of the most important words from Figure 1 and other commonly used words. This information is important as well because the articles often include the phrases “colored people” or “New York”. Knowing this can eliminate confusion about individual words and lead to a more focused conclusion about the words and phrases most important in these articles. Finally, Figure 3 uses some of the same words as Figures 1 and 2 to visually represent the occurrences of each word and which of the words occurs in the text the most. Sometimes a certain word is used more frequently in the beginning, middle, or end of the text than elsewhere. Without knowing anything about “The Crisis”, we can conclude from the large frequencies of the words in Figure 1, the connections from Figure 1 to Figure 2 (going from only the word “colored”, which can have multiple meanings, to the more specific “colored people”), and the title itself that these articles have to do with the social, economic, and political crisis between African Americans and whites in the New York area.
Question 2: What kind of advantages and disadvantages to The Voyant can we conclude by answering the previous question?
I personally believe there are more advantages than disadvantages to using Voyant to answer the thematic question in the previous paragraphs. The three figures allow for an easier way of analyzing word usage and connecting it to the broad theme of a large piece of text. Although Figure 2 was able to show the actual links between words from Figure 1 and clear up confusion about how these words could be used in these articles, we cannot be entirely sure of the way these words are used unless we read “The Crisis” from beginning to end. Figure 3 is of little help in answering Question 1 because the points in the text at which the most common words are used don’t correlate with the theme of the text. However, it is another beneficial way of representing data.
Research question: In The Crisis magazine, how do trends in word count, vocabulary, and word frequency change over the course of the various issues, and what meaning can we deduce from this? We will also take a look at word clouds and charts to make various inferences. This is largely speculation; however, it demonstrates how a simple software program such as Voyant can show us so much without our even reading the entire source material. If you truly want to understand what “The Crisis” is saying, you either have to get a summary from someone who read it or read it yourself. For this article, I want you to pretend you have no idea what The Crisis magazine is: you have never heard of it and you barely know any American history. We will see what a rookie can find out, assuming they can read English well. I will try to work from a similar perspective, so we can see what I myself can deduce from working with Voyant for the first time.
I will analyze multiple editions of The Crisis magazine, making sure to jump ahead in time and space out which editions I review, so as not to review back-to-back editions. The point of this is to see how the magazine has changed in its language usage, vocabulary, and so on. This is important as there may be a trend in the length and word complexity of the magazine.
We begin with the first edition, Vol. 1, No. 1. It has just over 13,000 words, a vocabulary density of 0.254, and the most common words “colored”, “negro”, and “new”, in that order. Right off the bat this may not seem to tell us a lot, but you may be surprised. Firstly, we know that this is a relatively long first magazine issue (a 13,000-word essay is long, trust me). They are talking a lot about black people. New York is mentioned a lot, meaning this issue could have strong relevance to this location. Some less frequently used words are “school”, “Chicago”, and “states”. From this we can see that it is referring to the United States, or at least has some relevance to it. We also see “Washington” and “Baltimore” used a few times, so now we can be fairly certain they are indeed talking about the United States.
Now let’s look at the next two editions: Vol. 4, No. 6 and Vol. 11, No. 5. There is a very interesting phenomenon: the word count has gone up substantially in both of these, to 26,129 and 34,822 words respectively. This says that either the depth of information in the articles has greatly increased, or there are simply more sections to look at. Perhaps the budget increased and allowed for more writing. The vocabulary density has gone down in each successive edition; this could be because there are simply more words (and therefore more repeated words). It could also mean the ideas, concepts, and stories are spelled out more, perhaps to appeal to a younger demographic. If we dig a little deeper, we see that the word cloud is mostly similar in all of these, signifying that similar language is being used. Still, just because a lot of the same words are being used doesn’t mean the subject matter is identical.
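Since vocabulary density carries much of the argument here, a small sketch may help show what the number means. As I understand it, vocabulary density is the number of unique words divided by the total number of words; the tokenization below is a simplification of my own and will not reproduce Voyant’s figures exactly, and the sample sentences are made up.

```python
import re

def vocabulary_density(text):
    """Return unique words divided by total words (0.0 for empty text)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

# Made-up samples: the longer one repeats the same words more often.
short_sample = "the cat and the dog"
long_sample = "the cat and the dog and the cat and the dog"

short_density = vocabulary_density(short_sample)
long_density = vocabulary_density(long_sample)
```

The longer sample uses the same four distinct words over more total words, so its density is lower. This is the same effect suggested above: a longer issue can show a lower density simply because more words are repeated.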
Finally, we come to the conclusion that just by analyzing the texts through Voyant, we figured out the subject material, the change in word count, relevant locations, vocabulary density, and frequently used words.