When I first ported YouTube Comment Snob to Chrome, Chrome’s lack of a spellchecking API for extensions meant that I would be unable to implement Comment Snob’s most popular and distinguishing feature: the ability to filter out comments based on spelling mistakes. That, my friend, is about to change.
How does it work?
Typo.js uses Hunspell-style dictionaries – the same ones used in the spellcheckers of OpenOffice.org and Firefox. (Typo.js ships with the latest American English dictionary, but you could add any number of other dictionary files to it yourself.) You initialize a Typo.js instance in one of two ways:
var dictionary = new Typo("en_US");
This tells Typo.js to load the dictionary represented by two files in the dictionaries/en_US/ directory: en_US.aff and en_US.dic. The .aff file is an affix file: a list of rules for creating multiple forms of a word by adding prefixes and suffixes. The .dic file is the dictionary file: a list of root words and the affix rules that apply to them. Typo parses these files and generates a complete dictionary by applying the applicable affix rules to the list of root words.
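var dictionary = new Typo("en_US", affData, dicData);
In this form, you supply the dictionary data yourself: affData and dicData are strings containing the contents of en_US.aff and en_US.dic, respectively.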
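To make the affix expansion described above more concrete, here is a minimal sketch of the idea. It is not Typo.js's actual parser: it assumes single-character flags, handles only suffix (SFX) rules, and ignores prefixes, cross products, and every other Hunspell directive. The function names parseSuffixRules and expandDictionary and the tiny sample data are made up for illustration.

```javascript
// Simplified sketch: parse only the SFX rule lines of an .aff file.
function parseSuffixRules(affData) {
  const rules = {};
  for (const line of affData.split("\n")) {
    const parts = line.trim().split(/\s+/);
    // Rule lines have five fields: SFX flag strip add condition.
    // Header lines such as "SFX D Y 4" have four and are skipped.
    if (parts[0] !== "SFX" || parts.length < 5) continue;
    const [, flag, strip, add, condition] = parts;
    (rules[flag] = rules[flag] || []).push({
      strip: strip === "0" ? "" : strip, // "0" means "strip nothing"
      add: add === "0" ? "" : add,       // "0" means "add nothing"
      // The condition is a pattern that the end of the root word must match.
      match: new RegExp(condition + "$"),
    });
  }
  return rules;
}

// Build the full word list by applying each root word's flagged rules.
function expandDictionary(affData, dicData) {
  const rules = parseSuffixRules(affData);
  const words = new Set();
  // The first line of a .dic file is an entry count, so skip it.
  for (const line of dicData.split("\n").slice(1)) {
    if (!line.trim()) continue;
    const [root, flags = ""] = line.trim().split("/");
    words.add(root);
    for (const flag of flags) {
      for (const rule of rules[flag] || []) {
        if (!rule.match.test(root)) continue;
        const stem = rule.strip && root.endsWith(rule.strip)
          ? root.slice(0, -rule.strip.length)
          : root;
        words.add(stem + rule.add);
      }
    }
  }
  return words;
}

// Hypothetical sample data in the Hunspell file format.
const sampleAff = [
  "SFX D Y 4",
  "SFX D 0 d e",
  "SFX D y ied [^aeiou]y",
  "SFX D 0 ed [^ey]",
  "SFX D 0 ed [aeiou]y",
].join("\n");
const sampleDic = ["3", "bake/D", "try/D", "walk/D"].join("\n");

const expanded = expandDictionary(sampleAff, sampleDic);
console.log(Array.from(expanded).join(", "));
```

With this sample data, each root word gains a past-tense form chosen by whichever rule's condition matches its ending.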
Once you’ve initialized a Typo instance, you can use it to check whether a word is misspelled:
var is_correct_spelling = dictionary.check("mispelled");
Depending on your needs, you can configure Typo.js to perform word lookups in one of two ways:
- hash: Stores the dictionary words as the keys of a hash and does a key existence check to determine whether a word is spelled correctly. Lookups are very fast, but this method uses more memory.
- binary search: Concatenates dictionary words of identical length into sets of long strings and uses binary search in these strings to check whether a word exists in the dictionary. It uses less memory than the hash implementation, but lookups are slower. (Update: this method was later abandoned because it became impractical to implement for some of the more advanced dictionary features.)
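The trade-off between the two lookup strategies can be sketched roughly as follows. This is an illustration of the two ideas only, not the code Typo.js uses; buildHashLookup and buildBinarySearchLookup are hypothetical names, and the word list is invented.

```javascript
// Hash lookup: every word becomes a key, so a check is one key-existence test.
function buildHashLookup(words) {
  const table = Object.create(null);
  for (const word of words) table[word] = true;
  return (word) => word in table;
}

// Binary-search lookup: words of the same length are sorted and joined into
// one long string, so the i-th word of length n starts at offset i * n.
function buildBinarySearchLookup(words) {
  const byLength = {};
  for (const word of words) {
    (byLength[word.length] = byLength[word.length] || []).push(word);
  }
  const packed = {};
  for (const length of Object.keys(byLength)) {
    packed[length] = byLength[length].sort().join("");
  }
  return (word) => {
    const n = word.length;
    const haystack = packed[n];
    if (!haystack) return false; // no words of this length
    let low = 0;
    let high = haystack.length / n - 1;
    while (low <= high) {
      const mid = (low + high) >> 1;
      const candidate = haystack.slice(mid * n, mid * n + n);
      if (candidate === word) return true;
      if (candidate < word) low = mid + 1;
      else high = mid - 1;
    }
    return false;
  };
}

const sampleWords = ["cat", "apply", "dog", "apple", "misspelled"];
const hashCheck = buildHashLookup(sampleWords);
const binaryCheck = buildBinarySearchLookup(sampleWords);

console.log(hashCheck("apple"), binaryCheck("apple"));
console.log(hashCheck("mispelled"), binaryCheck("mispelled"));
```

The packed strings avoid per-word object overhead, which is where the memory saving would come from, at the cost of several string comparisons per lookup.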
Practice vs. Theory
Typo.js is already in use in my Comment Snob extension. You can install it today to experience Typo.js in action, filtering comments on YouTube based on the number of spelling mistakes in each one.
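As a rough idea of how comment filtering by mistake count could be built on top of check(), here is a small sketch. It is not Comment Snob's actual code: the tokenizing regular expression, the threshold parameter, and the function names are assumptions, and a stub object stands in for a real Typo instance so the example runs on its own.

```javascript
// `dictionary` is any object with a check(word) method, such as a Typo instance.
function countMisspellings(text, dictionary) {
  // Treat runs of ASCII letters (with internal apostrophes) as words.
  const words = text.match(/[A-Za-z]+(?:'[A-Za-z]+)*/g) || [];
  return words.filter((word) => !dictionary.check(word)).length;
}

// Hide a comment when it has more mistakes than the allowed maximum.
function shouldHideComment(text, dictionary, maxMistakes) {
  return countMisspellings(text, dictionary) > maxMistakes;
}

// Stub dictionary for demonstration only; a real one would be `new Typo(...)`.
const knownWords = new Set(["this", "is", "i", "nothing", "grate"]);
const stubDictionary = {
  check: (word) => knownWords.has(word.toLowerCase()),
};

const sampleComment = "This vidoe is grate, I mispelled nothing";
console.log(countMisspellings(sampleComment, stubDictionary));
```

Note that "grate" passes the check even though "great" was meant: a dictionary lookup only catches words that do not exist at all.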
What’s next for Typo.js?
The next step is adding support for returning spelling suggestions; right now, all Typo.js can do is tell you whether a word is spelled correctly or not. It also needs to support Hunspell’s compound word rules. These are the rules that a spellchecker uses to determine whether words like “100th”, “101st”, “102th” are correct spellings (yes, yes, and no, for those of you keeping track) since it would be impossible to precompute a list of all possible words of these forms.
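To show why ordinals call for a rule instead of a word list, here is a hand-written check for English ordinal numerals. It does not use Hunspell's compound rule syntax and is not how Typo.js or Hunspell implement this; isValidOrdinal is a hypothetical helper that only demonstrates that a short rule can cover infinitely many words.

```javascript
// Returns true if `word` is digits followed by the correct ordinal suffix.
function isValidOrdinal(word) {
  const match = /^(\d+)(st|nd|rd|th)$/.exec(word);
  if (!match) return false;
  const digits = match[1];
  const suffix = match[2];
  const lastTwo = Number(digits.slice(-2));
  const lastOne = Number(digits.slice(-1));
  let expected = "th";
  // 11, 12, and 13 always take "th"; otherwise the last digit decides.
  if (lastTwo < 11 || lastTwo > 13) {
    if (lastOne === 1) expected = "st";
    else if (lastOne === 2) expected = "nd";
    else if (lastOne === 3) expected = "rd";
  }
  return suffix === expected;
}

for (const word of ["100th", "101st", "102th"]) {
  console.log(word, isValidOrdinal(word));
}
```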
The Typo.js code is available on GitHub. I welcome any and all suggestions or code contributions.
Wow, very interesting and useful.. hope to see one working on my fav browser already..
I see the licensing information for the en_US dictionary files (LGPL and BSD). Does “typo.js” carry any special licensing?
DB: It is licensed under the Modified BSD License. I’ve updated the README to reflect this.
Is the default setup using hash or binary word lookups? Also how can this be configured? It is mentioned in this post but I don’t see instructions on how to do so anywhere.
After this post was written, the binary lookup method was abandoned as it became impractical for some of the more advanced features of the spellchecking dictionaries.
Do you have an example for Method #2?
var dictionary = new Typo("en_US", affData, dicData);
Like what affData and dicData look like?
I tried creating two large strings based on what you had in your en_US.aff and en_US.dic files, but doesn’t seem to be working for me. Might be doing something wrong, I’ve only been playing with it for half an hour or so.
affData and dicData should be strings containing the contents of en_US.aff and en_US.dic, respectively. Is it possible that your strings do not use \n as a newline, but rather \r or \r\n? That might be problematic. If you can find out what this.rules equals after line 67, that would help with debugging the problem.
Thanks! I had the string like this:
var en_US_dic = "a\
But that didn’t work, not quite sure why. I replaced the \ with \n and got rid of the carriage returns in the file and it’s working now!
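A note on why the trailing backslash failed: in a JavaScript string literal, a backslash immediately before a line break is a line continuation, so the line break is dropped from the string and the dictionary entries run together. The sketch below shows one way to prepare the two strings for the three-argument constructor, normalizing line endings to \n along the way. The readText callback and the loadDictionaryData helper are hypothetical; how you actually read the files depends on your environment.

```javascript
// Convert Windows (\r\n) and bare carriage-return (\r) line endings to \n.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, "\n");
}

// `readText` is a hypothetical callback that takes a path and returns the
// file's contents as a string (for example, a wrapper around your own loader).
function loadDictionaryData(code, readText) {
  const base = "dictionaries/" + code + "/" + code;
  return {
    affData: normalizeNewlines(readText(base + ".aff")),
    dicData: normalizeNewlines(readText(base + ".dic")),
  };
}

// Usage, assuming Typo is already loaded and you supply readText:
//   const { affData, dicData } = loadDictionaryData("en_US", readText);
//   const dictionary = new Typo("en_US", affData, dicData);
```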
Looks like a great tool. Thanks for making this!
Really good, but I found a problem with the suggested words: it only suggests words with the same number of letters. For example, if I type
“mispelle”, the suggested words are “misspell, misspells, ispell”; it doesn’t suggest “misspelled”. Same thing with “appl”: it doesn’t suggest “apple”.
Try to load the dictionary in this way:
I am getting:
jquery.min.js:4 GET http://localhost:60989/scripts/typo/dictionaries/en_US/en_US.aff 404 (Not Found)
Any help would be appreciated.
In case someone has the same problem as above. This solved it for me:
In the web.config in the system.webServer section add:
Thanks for the solution, Godfrey! (I edited your comments to fix the missing tags.)
Good library. To use this with React in the front end, I have written a simple loader for webpack.
Add this to your webpack config:
and create dictionaryLoader.js:
then in your react library just import as below:
And is there a link I can find with instructions? Thank you!
The README covers the typical use case: https://github.com/cfinke/Typo.js/blob/master/typo/README.md
I’m not familiar with React Native development, but there are two things you should need to do:
1. Make sure that typo.js is included in your project in a way that you can call `new Typo()`. In a regular webpage, this would mean adding a script tag like <script src="typo/typo.js"></script>
2. Make sure that Typo can read the dictionary files. It already has methods for reading them when it’s bundled in a Chrome extension, webpage, or Node.js. See the `_readFile()` method for this.
If you run into issues, let me know and I can help work it out.