A blazingly fast spell-checking library - Pure C99 implementation of Wolf Garbe's SymSpell algorithm.
By Suman Pokhrel
SymSpell is 1 million times faster than traditional spell-checkers. Instead of generating all possible corrections for a misspelling, it pre-computes deletions of dictionary words, enabling lightning-fast lookups.
Original algorithm by Wolf Garbe
- β‘ Fast: Average lookup time ~5Β΅s (real-world usage)*
- π― Accurate: 82-84% correction rate on standard test sets
- π§Ή Clean: Zero compiler warnings, ~700 lines of well-structured code
- π Portable: POSIX-compliant, works on any Unix system
- β Complete: Includes tests, benchmarks, and dictionary tools
- π¦ Self-contained: No external dependencies
make
./test_symspell dictionaries/dictionary.txtInteractive mode:
Enter words to correct (or 'quit' to exit):
recieve
Suggestions:
receive (distance=1, iwf=7.234567, prob=0.001234, freq=234567)
#include "symspell.h"
// Create dictionary with max edit distance=2, prefix length=7
symspell_dict_t* dict = symspell_create(2, 7);
// Load dictionary (86k words optimized for spell-checking)
symspell_load_dictionary(dict, "dictionaries/dictionary.txt", 0, 1);
// Lookup suggestions
symspell_suggestion_t suggestions[5];
int count = symspell_lookup(dict, "speling", 2, suggestions, 5);
// Print results
for (int i = 0; i < count; i++) {
printf("%s (distance=%d)\n",
suggestions[i].term,
suggestions[i].distance);
}
// Cleanup
symspell_destroy(dict);Average lookup time: ~5Β΅s (real-world usage)*
Misspelling correction: ~30Β΅s (worst case)
Correctly spelled: ~0.7Β΅s (fast path)
Correction accuracy: 82-84%
Dictionary size: 86,060 words
Memory usage: ~45MB (with deletes index)
*Based on 15% real-world error rate in user-typed text (2-3% error per character Γ 4.7 average characters per word)
Tested on Apple M4, comparable results on x86.
Requirements:
- C99 compiler (GCC, Clang)
- Make
- POSIX-compliant system (Linux, macOS, BSD)
Compile:
make # Build test and benchmark programs
make test # Build and run tests
make benchmark # Build and run benchmark
make clean # Remove built programsManual compilation:
gcc -std=c99 -O2 -Iinclude test/test_symspell.c src/symspell.c -o test_symspellSymSpell returns suggestions based on edit distance (number of character changes needed):
# "helo" with max_distance=1
$ ./test_symspell dictionaries/dictionary.txt
> helo
held (distance=1) # One substitution: oβd
# "helo" with max_distance=2
$ ./test_symspell dictionaries/dictionary.txt --max-distance 2
> helo
hello (distance=2) # Two insertions: +l, +o
held (distance=1) # Still shown as closer matchThe algorithm prefers closer matches. If you want "hello" as a suggestion, increase max_edit_distance when calling symspell_lookup().
Real-world typos are ambiguous:
- "helo" β "held" or "hello"?
- "teh" β "the" or "tea"?
- "there" β correct word or "their"?
Even commercial spell-checkers (Microsoft Word, Google Docs) achieve 80-85% accuracy on standardized test sets because:
- Language is inherently ambiguous
- Context is needed for many corrections
- Trade-offs exist between precision and recall
Our 82-84% matches the original SymSpell paper and is competitive with commercial solutions.
"5Β΅s average" means:
- β 200x faster than Python implementations (~1000Β΅s)
- β 2000x faster than traditional approaches (~10000Β΅s)
- β Can check 1000 words in 5ms (imperceptible to users)
- β Real-time spell-checking with zero UI lag
For most applications, this is more than fast enough.
Main dictionary (dictionaries/dictionary.txt): 86,060 words
- Optimized for spell-checking accuracy (82-84%)
- Balanced across multiple test sets
- Recommended for general use
Extras dictionary (dictionaries/dictionary-extras.txt): 25,327 words
- Technical terms and acronyms
- For keyword extraction and IWF scoring
- DO NOT use for spell-checking (reduces accuracy)
Building custom dictionaries:
See dictionaries/reference/ for source materials and tools.
Full documentation in docs/DICTIONARY.md.
symspell-c99/
βββ src/
β βββ symspell.c # Implementation (~700 lines)
βββ include/
β βββ symspell.h # Public API
β βββ hash.h # Hash table implementation
β βββ xxh3.h # xxHash for fast hashing
β βββ posix.h # POSIX compatibility layer
βββ test/
β βββ test_symspell.c # Interactive test program
β βββ benchmark_symspell.c # Performance benchmarks
β βββ data/ # Test datasets
βββ dictionaries/
β βββ dictionary.txt # Main 86k word dictionary
β βββ reference/ # Dictionary building tools
βββ docs/
βββ SYMSPELL.md # Algorithm details
βββ DICTIONARY.md # Dictionary customization
- SYMSPELL.md - Algorithm explanation and implementation details
- DICTIONARY.md - Dictionary building and customization guide
- dictionaries/README.md - Dictionary overview
SymSpell uses Symmetric Delete spelling correction:
- Pre-computation: Generate all deletions up to edit distance N for each dictionary word
- Lookup: Generate deletions of the input word
- Match: Find dictionary words that share these deletions
- Rank: Sort by edit distance and word frequency
This trades memory for speed - pre-computing deletions makes lookups extremely fast.
Run the test program:
./test_symspell dictionaries/dictionary.txtRun benchmarks:
./benchmark_symspell dictionaries/dictionary.txt test/data/symspell/misspellings/misspell-codespell.txtQ: Why does "helo" suggest "held" instead of "hello"?
A: "held" has edit distance 1 (one character change), while "hello" has edit distance 2 (two insertions). SymSpell prefers closer matches. Increase max_edit_distance to get more suggestions.
Q: Why only 82-84% accuracy?
A: This is competitive with commercial spell-checkers and matches the original SymSpell paper. Perfect accuracy is impossible due to language ambiguity.
Q: Compilation fails with undefined reference to log
A: Add -lm flag to link the math library:
gcc -std=c99 -O2 -Iinclude test/test_symspell.c src/symspell.c -lm -o test_symspellMIT License - See LICENSE file for details.
This implementation is independent but follows the algorithm described in:
- SymSpell by Wolf Garbe (MIT License)
- Algorithm: Wolf Garbe - Creator of SymSpell
- Implementation: Suman Pokhrel - Pure C99 port
Special thanks to Dr. James Freeman for his mentorship and guidance on this project.
Suman Pokhrel
- GitHub: @sumanpokhrel-11
- Email: 11pksuman@gmail.com
This is the first pure C99 implementation of SymSpell. Contributions welcome:
- Bug reports and fixes
- Performance improvements
- Dictionary enhancements
- Documentation improvements
- Platform-specific optimizations
Please open an issue or pull request on GitHub!
- Original SymSpell (C#) - Wolf Garbe's original implementation
- SymSpell Ports - Implementations in other languages
This is the first and only pure C99 implementation of SymSpell, designed for performance, portability, and clean code.