
History Cloud

Something I’ve always been fascinated by is the way certain words can define a time period. Over the last decade, for example, there were stretches of weeks when you couldn’t turn on the television without hearing “anthrax” or “lewinsky” or “taliban” or “tsunami.”

I’ve been working on a little project to help visualize what these “hot topics” were for each month over the last seven years, and I’m calling it the History Cloud. I parsed the main pages of several major news sites going back to mid-2001 and extracted the words most commonly used in news articles during each month. I then arranged these words in a tag cloud to make it easy to see which terms dominated the news. Clicking on any term in the cloud shows the stories from that month that made the term popular, courtesy of Google News.
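For the curious, here is a rough Python sketch of the counting step described above. It is an illustration rather than the code behind the site: the input format (pairs of month and headline text), the tiny stop-word list, the function names, and the linear font scaling are all assumptions made for the example.

```python
import re
from collections import Counter, defaultdict

# Hypothetical stop-word list; a real one would be much longer.
STOP_WORDS = {"the", "a", "an", "of", "in", "to", "and", "on", "for", "is", "at", "as"}


def top_terms_by_month(headlines, n=5):
    """Count words per month and keep the n most common.

    headlines: iterable of (month, text) pairs, e.g. ("2001-10", "Anthrax found in mail").
    Returns {month: [(term, count), ...]} ordered from most to least common.
    """
    counts = defaultdict(Counter)
    for month, text in headlines:
        # Lowercase and keep alphabetic runs only, so "Anthrax," matches "anthrax".
        words = re.findall(r"[a-z]+", text.lower())
        counts[month].update(w for w in words if w not in STOP_WORDS)
    return {month: c.most_common(n) for month, c in counts.items()}


def font_sizes(term_counts, min_px=12, max_px=36):
    """Map each term's count linearly onto a font size for the tag cloud."""
    if not term_counts:
        return {}
    lo = min(count for _, count in term_counts)
    hi = max(count for _, count in term_counts)
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {
        term: round(min_px + (count - lo) * (max_px - min_px) / span)
        for term, count in term_counts
    }
```

Feeding the output of `top_terms_by_month` for one month into `font_sizes` gives a term-to-pixel-size mapping, which is all a basic tag cloud needs: the more often a word appeared that month, the larger it is drawn.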

History Cloud Example

My favorite feature of the History Cloud is that when you hover over a term in the tag cloud, the other terms related to it are highlighted as well. This makes it even easier to tell why a certain term was in the news. For example, if you hover over the “bush” tag in May of 2001, the highlighted related terms are “energy”, “education”, “tax”, and “plan.” Contrast that with the “bush” term in April of 2003, where the related tags are “UN”, “Iraq”, and “Iraqi.” Oh, to be back in the innocent days of early 2001…
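One simple way to decide which terms are “related” is to count how often two cloud terms appear in the same headline. The Python sketch below does that. It is only a guess at an approach, not how the History Cloud actually computes relatedness; the function name, the `min_shared` threshold, and the sample headlines are made up for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def related_terms(headlines, cloud_terms, min_shared=2):
    """Link cloud terms that co-occur in at least min_shared headlines.

    headlines: iterable of headline strings for a single month.
    cloud_terms: the lowercase terms shown in that month's tag cloud.
    Returns {term: set of related terms}.
    """
    cloud_terms = set(cloud_terms)
    pair_counts = Counter()
    for text in headlines:
        words = set(re.findall(r"[a-z]+", text.lower()))
        # Sort so each pair is always counted under the same (a, b) ordering.
        present = sorted(words & cloud_terms)
        pair_counts.update(combinations(present, 2))

    related = {term: set() for term in cloud_terms}
    for (a, b), shared in pair_counts.items():
        if shared >= min_shared:
            related[a].add(b)
            related[b].add(a)
    return related


# Invented headlines, purely to show the shape of the result.
example = related_terms(
    ["Bush unveils energy plan", "Bush energy plan draws critics", "Bush tax cut passes"],
    {"bush", "energy", "plan", "tax"},
)
```

Here `example["bush"]` contains “energy” and “plan”, since each shares two headlines with “bush”, while “tax” shares only one and falls below the threshold. On hover, the page would then highlight every term in the hovered term's set.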

If any of this sounds interesting to you, check it out. Feedback is always welcome. FYI: there are a few months with little or no data; clicking on these months just won’t display any tags.


2 thoughts on “History Cloud”

  1. robert says:

    Almost as interesting is what words don’t show up. I didn’t see ‘islam’ or ‘muslim’ in any of the months. I thought particularly that one or both would show up during the month of the cartoon protests, but they did not. ‘Terrorist’ also shows up very seldom, but surprisingly ‘insurgent’ also makes very rare appearances.
