
Tracking the top Diggers

For the past 15 months, I’ve been maintaining a list of the top 100 (and top 1000) Digg users. As Digg has become less and less relevant to my interests (and the interests of the greater tech community), I’ve decided to stop updating these lists. However, I have chosen to provide the tools I have used in case someone else has an interest in tracking Digg statistics.

Caveat: This whole thing was thrown together without any regard for coding or design conventions. Also, I can’t guarantee that you won’t be banned from Digg for using any of this information – I was, a few times. Proceed with caution.

First, the data structures. The Top Diggers lists are supported by two database tables: `digg` and `digg_users`. `digg` holds the set of front-page stories and their submitters, while `digg_users` stores aggregated information about each user in the system.

CREATE TABLE `digg` (
  `user` varchar(128) NOT NULL default '',
  `submission` varchar(255) NOT NULL default '',
  `date` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`submission`),
  KEY `user` (`user`)
);

CREATE TABLE `digg_users` (
  `user` varchar(128) NOT NULL default '',
  `frontpage` int(11) NOT NULL default '0',
  `dugg` int(11) NOT NULL default '0',
  `submitted` int(11) NOT NULL default '0',
  `profileViews` int(11) NOT NULL default '0',
  `frontpagestatic` int(11) NOT NULL default '0',
  `frontpagetotal` int(11) NOT NULL default '0',
  `submittedstatic` int(11) NOT NULL default '0',
  `image` varchar(128) NOT NULL default '',
  UNIQUE KEY `username` (`user`)
);

Next, some data to get you started. Here is a SQL dump of the two tables described above, including information on about 13,000 Digg users (for `digg_users`), as well as the last 4 stories on the Digg homepage at the time of the last update (for `digg`). Import this data into your database in preparation for the next step.

The next step: data retrieval. The main work of updating the top 100 list is done by a Python script:

import digg

# Grab the last X pages of popular stories

# Update the top X profiles

Of course, this code makes no sense without the `digg.*` methods, downloadable here. The script also requires the excellent BeautifulSoup Python HTML parser. You will need to edit `digg.py` to supply your own database connection parameters.
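Since the original page markup is long gone, here is a minimal sketch of the kind of scraping the script performs with BeautifulSoup. The HTML snippet and the class names (`news-item`, `story`, `submitter`) are invented for illustration; the real Digg markup differed.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment of a "popular stories" page; the real Digg
# markup used different tags and classes.
html = """
<div class="news-item"><a class="story" href="/story/1">Story One</a>
  <a class="submitter" href="/users/alice">alice</a></div>
<div class="news-item"><a class="story" href="/story/2">Story Two</a>
  <a class="submitter" href="/users/bob">bob</a></div>
"""

def extract_submitters(page_html):
    """Return (story_title, submitter) pairs from one page of stories."""
    soup = BeautifulSoup(page_html, "html.parser")
    results = []
    for item in soup.find_all("div", class_="news-item"):
        story = item.find("a", class_="story")
        submitter = item.find("a", class_="submitter")
        if story and submitter:
            results.append((story.get_text(), submitter.get_text()))
    return results

print(extract_submitters(html))
```

The real script would fetch each page of popular stories, run an extraction like this, and feed the pairs into the database step described below.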

For those who don’t care to read through the code, it accomplishes two things: it finds out who submitted each frontpage story since the last update (stopping when it hits a story already in the `digg` table), and it uses that information to determine the new top 100 users and refresh their profile information.
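The "stop at the first known story" trick is the heart of the incremental update. Here is a sketch of the idea (not the original `digg.py` code) using an in-memory SQLite table in place of the MySQL `digg` table:

```python
import sqlite3

# Stand-in for the MySQL `digg` table from the schema above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE digg (user TEXT, submission TEXT PRIMARY KEY, date TEXT)")
conn.execute("INSERT INTO digg VALUES ('alice', 'old-story', '2008-01-01')")

# Stories as scraped from the frontpage, newest first.
scraped = [
    ("bob", "new-story-2"),
    ("carol", "new-story-1"),
    ("alice", "old-story"),    # already in the table: stop here
    ("dave", "ancient-story"), # never reached
]

new_submitters = []
for user, submission in scraped:
    seen = conn.execute(
        "SELECT 1 FROM digg WHERE submission = ?", (submission,)).fetchone()
    if seen:
        break  # everything older was counted on a previous run
    conn.execute("INSERT INTO digg VALUES (?, ?, datetime('now'))",
                 (user, submission))
    new_submitters.append(user)

print(new_submitters)  # only the submitters of genuinely new stories
```

Because stories are walked newest to oldest, hitting one already-stored submission guarantees everything after it was processed in an earlier run, so each update only touches new rows.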

The last step is data presentation. The information in the database tables needs to be transformed into a readable HTML file. I’ll leave this step as an exercise for the reader, but to get you started, this SQL query will get you the data you want in an easy-to-read format:

SELECT
   user `Username`, 
   frontpagetotal `Frontpage Stories`, 
   submitted `Stories Submitted`, 
   dugg `Stories Dugg`, 
   profileViews `Profile Views`
FROM digg_users 
WHERE frontpagetotal <= submitted 
ORDER BY 
   `Frontpage Stories` DESC, 
   `Stories Submitted` ASC, 
   `Stories Dugg` DESC 
LIMIT 100
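If you want a head start on the presentation step, here is a minimal sketch that turns query rows into an HTML table. The column names match the query above; the sample rows are made up.

```python
from html import escape

# Column aliases from the SQL query above; rows are illustrative.
columns = ["Username", "Frontpage Stories", "Stories Submitted",
           "Stories Dugg", "Profile Views"]
rows = [("user1", 100, 150, 5000, 20000),
        ("user2", 90, 160, 4500, 18000)]

def render_table(columns, rows):
    """Render rows as a plain HTML table, escaping all cell values."""
    header = "".join(f"<th>{escape(str(c))}</th>" for c in columns)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows)
    return f"<table><tr>{header}</tr>{body}</table>"

print(render_table(columns, rows))
```

From there it is just a matter of wrapping the table in a page template and writing it out on each update.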

So to sum up, if you want to manage your own “Top 100 Diggers” list, take the following steps:

1. Import the dump of digg data linked above.
2. Set up and run the retrieval script.
3. Create a readable version of the data.

Have fun, and beware the Digg ban-hammer.


4 comments on “Tracking the top Diggers”

  1. Christopher, I wanted to thank you for running this for as long as you did, and then to make your code available when you discontinued it. I wanted to let you know I set up the top diggers list at SocialBlade and will be maintaining it for the foreseeable future there.

    Thanks again, and everyone check out the following pages for the new updates:


    and of course also the original SocialBlade page for digg front page data:

    Thanks again Chris, and good luck on your future endeavors!
