When I first ported YouTube Comment Snob to Chrome, Chrome’s lack of a spellchecking API for extensions meant that I would be unable to implement Comment Snob’s most popular and distinguishing feature: the ability to filter out comments based on spelling mistakes. That, my friend, is about to change.
How does it work?
Typo.js uses Hunspell-style dictionaries – the same ones used in the spellcheckers of OpenOffice.org and Firefox. (Typo.js ships with the latest American English dictionary, but you could add any number of other dictionary files to it yourself.) You initialize a Typo.js instance in one of two ways:
var dictionary = new Typo("en_US");
This tells Typo.js to load the dictionary represented by two files in the
dictionaries/en_US/ directory: en_US.aff and en_US.dic. The .aff file is an affix file: a list of rules for creating multiple forms of a word by adding prefixes and suffixes. The .dic file is the dictionary file: a list of root words and the affix rules that apply to them. Typo parses these files and generates a complete dictionary by applying the applicable affix rules to the list of root words.
var dictionary = new Typo("en_US", affData, dicData);
Once you’ve initialized a Typo instance, you can use it to check whether a word is misspelled:
var is_correct_spelling = dictionary.check("mispelled");
Depending on your needs, you can configure Typo.js to perform word lookups in one of two ways:
- hash: Stores the dictionary words as the keys of a hash and does a key existence check to determine whether a word is spelled correctly. Lookups are very fast, but this method uses more memory.
binary search: Concatenates dictionary words of identical length into sets of long strings and uses binary search in these strings to check whether a word exists in the dictionary. It uses less memory than the hash implementation, but lookups are slower. This method was abandoned as it became impractical to implement for some features.
Practice vs. Theory
Typo.js is already in use in my Comment Snob extension. You can install it today to experience Typo.js in action, filtering comments on YouTube based on the number of spelling mistakes in each one.
What’s next for Typo.js?
The next step is adding support for returning spelling suggestions; right now, all Typo.js can do is tell you whether a word is spelled correctly or not. It also needs to support Hunspell’s compound word rules. These are the rules that a spellchecker uses to determine whether words like “100th”, “101st”, “102th” are correct spellings (yes, yes, and no, for those of you keeping track) since it would be impossible to precompute a list of all possible words of these forms.
The Typo.js code is available on GitHub. I welcome any and all suggestions or code contributions.