
Retweeting in Python

My friend Eliot has been running @SanMo, a retweet bot that serves Twitter users in the Santa Monica, California area by letting them broadcast tweets to other Santa Monica-area users without explicitly friending one another. He wrote up the full details of the service and the script behind it on his blog, and after reading about it, I wanted to start a similar service for Twitterers in my immediate vicinity.

After grabbing a copy of the code running @SanMo (it’s the same Perl script created to power @lotd, available here), I quickly decided that I would rather write my own, for three reasons:

  • Strike 1: It was written in Perl.
  • Strike 2: It didn’t implement a feature that links the retweet to the original tweet.
  • Strike 3: It required MySQL – a bit much for a simple bot.
  • Bonus Strike 4: It was written in Perl.

I set out to write a lightweight script that would accomplish the same end goal (republishing tweets directed at a given account), and I’m happy to say that it’s finished: retweet.py has been running smoothly for the past few weeks behind the Twitter account @SWMetro, a service for Twitter users in the southwest metro area of the Twin Cities.

It’s 40 lines of Python (if you omit blank lines and comments), and you can grab a copy of it from here. It uses SQLite for storage (which you should already have if you have Python 2.5 or later installed), and it utilizes the great python-twitter library from DeWitt Clinton of Google. (Make sure you get the latest version from trunk; you’ll also need simplejson, since python-twitter requires it.)

Just download the script and replace “username” and “password” at the top with your account credentials. (You can manage multiple accounts by adding another username/password pair to the ACCOUNTS variable.) Change the DB_PATH variable to point to the directory where you’ll keep your SQLite databases, and then add this to your crontab:

*/2 * * * * python /full/path/to/retweet.py
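For reference, the settings at the top of the script look something like this (ACCOUNTS and DB_PATH are the names described above; the exact layout is my sketch, not the script verbatim):

# Each entry is a (username, password) pair.
ACCOUNTS = [
    ("username", "password"),
    # Add another (username, password) pair here to manage a
    # second account with the same script.
]

# Directory that holds the per-account SQLite databases.
DB_PATH = "/home/you/retweet"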

The script will run every other minute, republishing any tweets that start with “@username”, where “username” is the value you gave the USER variable. The only things it stores in the SQLite database are the status ids of the tweets it republishes, and it links each retweet to the original message being republished. If the new message is longer than 140 characters, it chops words off the end, replacing them with “…”, until it fits within the 140-character limit.
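For the curious, here’s a rough sketch of how those pieces can fit together (my illustration, not retweet.py itself; the message format and the use of PostUpdate’s in_reply_to_status_id argument to do the linking are assumptions on my part):

import sqlite3
import twitter  # python-twitter

USER, PASSWORD = "username", "password"
DB_PATH = "/home/you/retweet"

api = twitter.Api(username=USER, password=PASSWORD)
db = sqlite3.connect("%s/%s.db" % (DB_PATH, USER))
# Single-table schema is an assumption; all we need are status ids.
db.execute("CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY)")

def truncate(message, limit=140):
    # Drop whole words from the end, then append an ellipsis,
    # until the message fits within the limit.
    if len(message) <= limit:
        return message
    words = message.split()
    while words and len(" ".join(words)) + len(" ...") > limit:
        words.pop()
    return " ".join(words) + " ..."

# The newest status id we've already republished (0 on a fresh database).
last_id = db.execute("SELECT MAX(id) FROM tweets").fetchone()[0] or 0

for status in api.GetReplies():
    # Skip anything we've already handled.
    if status.id <= last_id:
        continue
    # Only republish tweets that actually start with "@username".
    if not status.text.lower().startswith("@" + USER.lower()):
        continue
    # Hypothetical message format: "author: original text".
    text = status.text[len(USER) + 1:].strip()
    message = truncate("%s: %s" % (status.user.screen_name, text))
    # Passing in_reply_to_status_id is an assumption about how the
    # retweet gets linked back to the original tweet.
    api.PostUpdate(message, in_reply_to_status_id=status.id)
    db.execute("INSERT INTO tweets (id) VALUES (?)", (status.id,))
    db.commit()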

If you’re using this script to replace an existing retweet bot, you can supply it with the status id of the last message the old bot republished so that you don’t end up republishing a bunch of old tweets. To do that, just run it once like this:

$ python retweet.py 12345

where 12345 is the status id of the last message your existing bot published.
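Under the hood, all the seeding step needs to do is record that id in the SQLite database before the first normal run; something like this (the table and column names here are my guesses, not necessarily the script’s):

import sqlite3
import sys

# Hypothetical seeding logic: store the supplied status id so that it
# (and everything before it) is treated as already republished.
if len(sys.argv) > 1:
    db = sqlite3.connect("/full/path/to/dbs/username.db")
    db.execute("CREATE TABLE IF NOT EXISTS tweets (id INTEGER PRIMARY KEY)")
    db.execute("INSERT OR IGNORE INTO tweets (id) VALUES (?)",
               (int(sys.argv[1]),))
    db.commit()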

Feel free to download retweet.py and use it for your own purposes. All I ask is that, if you make an improvement (or start up a new service with it), you take a minute to mention it in the comments below.

Update: Some people have had to use the full path to Python (version 2.5 or greater) in their crontab to get retweet.py working properly.

Update: Retweet.py has been updated to keep it working as intended after Twitter started categorizing every tweet that contains a username as a reply, not just those that start with the username. Grab the updated version here.


Tracking the top Diggers

For the past 15 months, I’ve been maintaining a list of the top 100 (and top 1000) Digg users. As Digg has become less and less relevant to my interests (and to the interests of the greater tech community), I’ve decided to stop updating these lists. However, I’m providing the tools I used in case someone else has an interest in tracking Digg statistics.

Caveat: This whole thing was thrown together without any regard for coding or design conventions. Also, I can’t guarantee that you won’t be banned from Digg for using these tools – I was, a few times. Proceed with caution.

First, the data structures. The Top Diggers lists are supported by two database tables: `digg` and `digg_users`. `digg` holds the set of front-page stories and their submitters, while `digg_users` stores aggregated information about each user in the system.

CREATE TABLE `digg` (
  `user` varchar(128) NOT NULL default '',
  `submission` varchar(255) NOT NULL default '',
  `date` datetime NOT NULL default '0000-00-00 00:00:00',
  PRIMARY KEY  (`submission`),
  KEY `user` (`user`)
);

CREATE TABLE `digg_users` (
  `user` varchar(128) NOT NULL default '',
  `frontpage` int(11) NOT NULL default '0',
  `dugg` int(11) NOT NULL default '0',
  `submitted` int(11) NOT NULL default '0',
  `profileViews` int(11) NOT NULL default '0',
  `frontpagestatic` int(11) NOT NULL default '0',
  `frontpagetotal` int(11) NOT NULL default '0',
  `submittedstatic` int(11) NOT NULL default '0',
  `image` varchar(128) NOT NULL default '',
  UNIQUE KEY `username` (`user`)
);

Next, some data to get you started. Here is a SQL dump of the two tables described above, containing information on about 13,000 Digg users (in `digg_users`) as well as the last 4 stories on the Digg homepage at the time of the final update (in `digg`). Import this data into your database in preparation for the next step.
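Assuming your MySQL database is named `digg` and the dump was saved as digg_dump.sql (both placeholders; use your own names), the import is a one-liner:

$ mysql -u username -p digg < digg_dump.sql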

The next step: data retrieval. The main work of updating the top 100 list is done by a Python script:

import digg

# Grab the last X pages of popular stories
digg.update_news(100)

# Update the top X profiles
digg.update_profiles(110)

Of course, this code makes no sense without the digg.* functions, downloadable here. The script also requires the excellent BeautifulSoup Python HTML parser, and you’ll have to edit digg.py to set your own database connection parameters.

For those who don’t care to read through the code, it accomplishes two main things: it figures out who submitted each frontpage story since the last update (stopping when it hits a story already in the `digg` table), and it uses that information to determine the new top 100 users and refresh their profile information.
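To give a flavor of the scraping half without the full source, here’s a rough sketch of what an update_news-style function looks like. This is not the actual digg.py; the page URL and class names below are placeholders, since the real parsing depended on whatever markup Digg was serving at the time:

import urllib2
import MySQLdb
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3

def update_news(pages):
    # Walk the popular-story listings, newest first, recording each
    # story and its submitter until we reach one already in `digg`.
    db = MySQLdb.connect(host="localhost", user="user", passwd="pass",
                         db="digg")
    cursor = db.cursor()
    try:
        for page in range(1, pages + 1):
            html = urllib2.urlopen("http://digg.com/?page=%d" % page).read()
            soup = BeautifulSoup(html)
            # Placeholder selectors; adjust to the actual page structure.
            for story in soup.findAll("div", {"class": "news-summary"}):
                link = story.h3.a["href"]
                user = story.find("a", {"class": "submitter"})["href"]
                cursor.execute("SELECT 1 FROM digg WHERE submission = %s",
                               (link,))
                if cursor.fetchone():
                    return  # caught up to the previous run
                cursor.execute("INSERT INTO digg (user, submission, date) "
                               "VALUES (%s, %s, NOW())", (user, link))
    finally:
        db.commit()
        db.close()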

The last step is data presentation. The information in the database tables needs to be transformed into a readable HTML file. I’ll leave this step as an exercise for the reader, but to get you started, this SQL query will get you the data you want in an easy-to-read format:

SELECT 
   user `Username`, 
   frontpagetotal `Frontpage Stories`, 
   submitted `Stories Submitted`, 
   dugg `Stories Dugg`, 
   profileViews `Profile Views`
FROM digg_users 
WHERE 
   frontpagetotal <= submitted 
ORDER BY 
   `Frontpage Stories` DESC, 
   `Stories Submitted` ASC, 
   `Stories Dugg` DESC 
LIMIT 100
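If you’d rather not start the HTML step from nothing, a few lines of Python and MySQLdb will turn that query into a basic table. This is just one way to approach the exercise, not how the original list was generated:

import MySQLdb

QUERY = """SELECT user, frontpagetotal, submitted, dugg, profileViews
           FROM digg_users WHERE frontpagetotal <= submitted
           ORDER BY frontpagetotal DESC, submitted ASC, dugg DESC
           LIMIT 100"""

db = MySQLdb.connect(host="localhost", user="user", passwd="pass", db="digg")
cursor = db.cursor()
cursor.execute(QUERY)

# Build one <tr> per user, in rank order.
rows = ["<tr><td>%s</td><td>%d</td><td>%d</td><td>%d</td><td>%d</td></tr>" % row
        for row in cursor.fetchall()]
header = ("<tr><th>Username</th><th>Frontpage Stories</th>"
          "<th>Stories Submitted</th><th>Stories Dugg</th>"
          "<th>Profile Views</th></tr>")
open("top100.html", "w").write("<table>%s\n%s</table>" % (header,
                                                          "\n".join(rows)))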

So to sum up, if you want to manage your own “Top 100 Diggers” list, take the following steps:

1. Import the dump of digg data linked above.
2. Set up and run your scripts.
3. Create a readable version of the data.

Have fun, and beware the Digg ban-hammer.


What’s old is new again

A few weeks ago, I started a little side project, and I decided to write it in Python with the Django framework, based on all the good things I’d heard about it. I may never go back to PHP.

It’s like this: imagine you’ve been driving the same 1987 Dodge Dynasty for the last 8 years. It gets you around, and you know exactly how to handle it. Most importantly, you’ve learned just what to do when it breaks down to get it going again. Then, one day, someone offers to trade you their brand-new Mustang for your Dynasty, straight up. (They’re a collector of late-’80s sedans, you see.) You’re unsure, since you’ll have to learn how to handle this new car, but you accept, and your entire perspective on driving changes: the tired chore of going to the post office becomes your favorite pastime; you find yourself volunteering to take friends to the airport even when they have no flights to catch; and you can finally drive on the interstate, since you know you won’t break down.

This is what it’s like to switch to Python after a lifetime of writing PHP. Programming is part problem-solving and part code-writing. With PHP, the fun of solving the problems outweighs the chore of writing the code; with Python, writing the code is enjoyable enough that I find myself wanting more problems to solve just so I can code the solutions. It’s a great feeling.
