Gervase Markham has an insightful post about Facebook’s lock-in policy with regard to e-mail. He has a way of writing about the situation that clearly explains the points I’d like to make but can never form into coherent paragraphs.
I received this lovely letter from Facebook’s lawyers earlier today. The key points are transcribed below:
“Dear Mr. Finke:
I am writing you concerning the Firefox extensions you posted at:
These plug-ins are deeply concerning to Facebook because, among other things, they violate Facebook’s trademark rights, its Terms of Service, the security of the site and Facebook user privacy. For example, the facebook-image-to-email extensions permits people to circumvent Facebook security measures that protect user privacy and the scavenger extension allows people to harvest data off the site in contravention of the Terms of Service and also infringes upon user privacy. […]
I insist that you immediately take down the extensions listed above. […]
Deputy General Counsel”
At the request of (and under the threats of legal action from) Facebook, I’ve taken down the Facebook Image-to-Email and Facebook Scavenger Firefox extensions. Facebook claims that any method of automating against their site is a TOS violation, although if that were true, simply using a Web browser to convert their raw HTML code into readable text and images would constitute a violation.
I maintain that both tools provided a useful service that Facebook has neglected to provide itself, but I will not continue to make them available via this website. Facebook has also confirmed to me via e-mail that it will not offer users the option of having their e-mail address displayed in plain clickable text, under the guise of protecting the users’ privacy. (E-mail addresses are already visible as images, but you can’t click on them to send the user a message. This has the serendipitous side-effect of making Facebook’s own in-site messaging system a much more attractive method of communication for Facebook users.)
A couple of things to note: while I obviously cannot retrieve any copies of these extensions that have already been downloaded, Facebook feels quite strongly that the usage of Image-to-Email and Scavenger violates their Terms of Service. They cannot stop you from using them, but they can (in theory) test for the presence of either extension and ban you from their site if you have one installed. Contact me privately via e-mail if you’re concerned about your usage being detected in this manner.
There’s a big debate going on today about Robert Scoble getting booted from Facebook for harvesting data about his friends with a bot. The relevant portion of the Facebook TOS that he violated is this:
“You agree not to: […] use automated scripts to collect information from or otherwise interact with the Service or the Site;”
But according to the same TOS,
“you are granted a limited license to access and use the Site and the Site Content and to download or print a copy of any portion of the Site Content to which you have properly gained access solely for your personal, non-commercial use, provided that you keep all copyright or other proprietary notices intact.”
So you can download/print the data on the site for personal use, but you can’t write a bot to go out and get it. Fair enough, but what if someone were to write, oh, I don’t know, a Firefox extension that sits quietly in the background while you browse Facebook and, as you manually view your friends’ pages, takes the data from the browser’s cache, grabs the info you want (like, oh, I don’t know, their e-mail addresses), and allows you to export that in a common format, like CSV? That wouldn’t break the Facebook TOS, since there is no automatic collection of information from Facebook’s servers (just the browser cache), but you could still have the info you want in an easy-to-read (and easy-to-import) format. It might not be a reasonable solution for people with 5,000 friends, but we regular Joes could easily spend half an hour and have all the data we need from Facebook.
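For the curious, the export half of that idea is the easy part. Here is a minimal sketch of turning already-collected records into CSV text. The record fields (name, email) and the function names are made up for illustration; nothing here touches Facebook or the browser cache.

```javascript
// Quote a field only when it contains a comma, quote, or line break,
// doubling any embedded quotes (the usual CSV convention).
function csvField(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? '"' + text.replace(/"/g, '""') + '"' : text;
}

// Build CSV text from an array of plain objects.
// `columns` lists the (hypothetical) property names to export, in order.
function toCsv(records, columns) {
  const lines = [columns.map(csvField).join(",")];
  for (const record of records) {
    lines.push(columns.map((name) => csvField(record[name])).join(","));
  }
  return lines.join("\r\n");
}

// Example usage with invented data:
const csv = toCsv(
  [{ name: "Doe, Jane", email: "jane@example.com" }],
  ["name", "email"]
);
console.log(csv);
```

Quoting only when needed keeps the output readable in a text editor while still importing cleanly into a spreadsheet or address book.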
Anyway, it’s just a thought; it’s not like I’m planning on doing this or anything. Since when am I the kind of person to irk a large social networking site by making their data easily available?
A while back I mentioned that Facebook Image-to-Email (a Firefox extension that converts Facebook’s e-mail address images to plain text) was broken after some unknown change was made by Facebook. I am happy to announce that it is working again, after I re-tooled it with a different method for accessing the image data of those e-mail address images.
You can download this new release from the Facebook Image-to-Email homepage. If you don’t care to know more about the technical details, stop reading now.
What I’ve done is this: instead of accessing the images directly, the extension now takes a screenshot of the entire page (allowed under the browser’s security policies), locates the portions of the page that contain the e-mail address images, and parses them out entirely from the browser’s chrome, a beautiful place with much looser security restrictions than a webpage. (I’ve also added character maps for “-” (hyphen) and the “r.” sequence that wasn’t being parsed properly.)
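To give a rough idea of the "locate the portions of the page" step, here is a sketch (not the extension's actual code) of cutting one rectangle out of a full-page pixel buffer. It assumes the screenshot is an ImageData-style object ({ width, height, data } with 4 bytes per pixel) and that the rectangle is already in whole screenshot pixels; cropRegion and the rect shape are names I've invented for the example.

```javascript
// Copy the pixels inside `rect` out of a larger RGBA buffer.
// `shot` is assumed to look like ImageData: { width, height, data },
// where `data` is a Uint8ClampedArray with 4 bytes (R, G, B, A) per pixel.
// `rect` is a hypothetical { x, y, width, height } in whole pixels.
function cropRegion(shot, rect) {
  const out = new Uint8ClampedArray(rect.width * rect.height * 4);
  for (let row = 0; row < rect.height; row++) {
    // Byte offset of the first wanted pixel on this source row.
    const srcStart = ((rect.y + row) * shot.width + rect.x) * 4;
    const srcEnd = srcStart + rect.width * 4;
    out.set(shot.data.subarray(srcStart, srcEnd), row * rect.width * 4);
  }
  return { width: rect.width, height: rect.height, data: out };
}
```

The cropped buffer can then be handed to the same character-matching code that previously read the image directly.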
(Sidenote with relevance to current events: this extension is now a hop, skip, and a jump away from being able to be used to parse and download all of your friends’ information, including e-mail addresses. If Scoble had only waited, he could have avoided this whole mess.)
I am aware that the Facebook Image-to-Email Firefox extension is (once again) broken, and given that version 1.1 installed on Firefox 18.104.22.168 was working, and now version 1.1 installed on Firefox 22.214.171.124 is not working, it has to be due to a change that Facebook made. The problem is that I can’t discern any relevant changes in Facebook’s profile pages that would cause a problem.
Any insight into this is appreciated.
I’ve just updated the Facebook Image-to-Email Firefox extension to be compatible with Facebook’s new image generation algorithm. (Facebook Image-to-Email converts e-mail addresses in images to plain text.)
This new version should be a little more robust; if it can’t convert the image to text, it appends a small question mark after the image, rather than just giving up and removing the image completely. Thanks to Gervase and Aaron for their help in finding and fixing this set of bugs.
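The fallback amounts to something like the sketch below. It is an illustration rather than the shipped code: markUnreadable is a made-up name, and the choice of a sup element for the "small question mark" is my assumption.

```javascript
// When recognition fails, leave the original image in place and add a
// small "?" right after it instead of removing the image.
// `doc` is the page's document; `img` is the e-mail address image element.
function markUnreadable(doc, img) {
  const mark = doc.createElement("sup"); // element choice is an assumption
  mark.textContent = "?";
  // Inserting before the image's next sibling places the mark directly
  // after the image (or at the end of the parent if there is no sibling).
  img.parentNode.insertBefore(mark, img.nextSibling);
  return mark;
}
```

Keeping the image means the address is still readable by eye even when the parser can't handle it.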
You can install Facebook Image-to-Email from its homepage. It is compatible with Firefox 1.5 through 3.0a9, Netscape Navigator 9, and Flock.
The extension I’ve written to do this is called Facebook Image-to-Email. On any Facebook page containing an e-mail address image, the extension converts the image to text using the following workflow:
- Copies the image to a canvas using drawImage()
- Scans through the canvas to find matches for a pre-determined set of character sprites using getImageData()
- Replaces the image with a clickable text e-mail address
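In outline, those three steps look something like the following sketch. It is a simplified illustration, not the extension's source: convertEmailImage is an invented name, and recognize stands in for the sprite-matching code (it takes the pixel data and returns the address, or null if it can't read it).

```javascript
// Sketch of the three-step workflow. `doc` is the page's document, `img`
// is the e-mail address image, and `recognize` is a hypothetical function
// that maps ImageData to an address string (or null on failure).
function convertEmailImage(doc, img, recognize) {
  // Step 1: copy the image onto an off-screen canvas.
  const canvas = doc.createElement("canvas");
  canvas.width = img.width;
  canvas.height = img.height;
  const ctx = canvas.getContext("2d");
  ctx.drawImage(img, 0, 0);

  // Step 2: read the raw RGBA pixels and hand them to the matcher.
  const pixels = ctx.getImageData(0, 0, canvas.width, canvas.height);
  const address = recognize(pixels);
  if (!address) return null; // leave the page untouched on failure

  // Step 3: swap the image for a clickable mailto: link.
  const link = doc.createElement("a");
  link.href = "mailto:" + address;
  link.textContent = address;
  img.parentNode.replaceChild(link, img);
  return link;
}
```

Passing the document and the matcher in as arguments keeps the function independent of where the script happens to be running.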
Here’s an example of a “before” view:
Of course, this appears to be pretty simple. Ironically, the hardest part of solving this problem was the only part that Gervase remarked would be trivial: matching the image against a set of known character patterns. Since the font is not monospaced, and certain letters bleed into each other when adjacent (such as “89” and “ef”), it wasn’t possible to just store the pixel values for each full letter. What I ended up doing was matching against only the center of the letters (which are never affected by adjacent letters) and just ignoring character edges.
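Here is a toy version of the center-matching idea, assuming the image has already been reduced to columns of ink/background pixels. The sprite patterns below are invented for the example (the real character maps are larger and come from Facebook's actual font), and the scanning strategy is a simplification of my own, so treat it as a sketch of the approach rather than the extension's code.

```javascript
// Each column is a string of "1" (ink) and "0" (background), top to bottom.
// SPRITES holds only the *center* columns of each glyph. These patterns are
// invented for illustration and are not the extension's real character maps.
const SPRITES = {
  a: ["0110", "1001", "1111"],
  b: ["1111", "1010", "0100"],
  "@": ["1111", "1001", "1011"],
};

// True if the sprite's center columns appear in `columns` starting at x.
function matchAt(columns, x, core) {
  if (x + core.length > columns.length) return false;
  return core.every((col, i) => columns[x + i] === col);
}

// Scan left to right. A matched center emits its character; any column
// that starts no known center (a glyph edge or a gap) is simply skipped.
function readText(columns, sprites = SPRITES) {
  let text = "";
  let x = 0;
  while (x < columns.length) {
    const hit = Object.keys(sprites).find((ch) => matchAt(columns, x, sprites[ch]));
    if (hit) {
      text += hit;
      x += sprites[hit].length; // jump past the matched center
    } else {
      x += 1; // edge or gap column: ignored
    }
  }
  return text;
}

// "a" then "b", with edge columns that could differ depending on neighbors.
console.log(readText(["0001", "0110", "1001", "1111", "1000", "1111", "1010", "0100", "0000"]));
```

Because the edge columns never take part in a match, it doesn't matter what an adjacent letter bleeds into them.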
One other detail as to the implementation: there appears to be some sort of security restriction in Firefox on reading data from images that are not in the same domain as the script reading them. For example, trying to call getImageData() from the chrome on a canvas that contained an image loaded from facebook.com returned null every time; the same happened if the script was running locally but loading a remote image. For this reason, the actual scripting that converts the image to text has to be injected into each page that requires it so that it appears to be running in the same domain as the image.
I’m not claiming that this is the most efficient implementation, but it is definitely complete. In my testing so far, it has correctly identified 100% of the e-mail addresses displayed in the images.