Fuzzy String Matching
Fuzzy matching and confidence levels is what this exercise is all about. It is tough to match two strings and say that they are quite similar, but not exact. There are a few ways you can achieve this goal.
This calculates the minimum number of insertions, deletions, and substitutions necessary to convert one string into another. A low distance between two strings means that the strings are more similar. The best site I have found is Levenshtein Distance, in Three Flavors, which I mirrored locally because the original site went down.
I stumbled across this algorithm in PHP's documentation about the similar_text() function. The best source for the algorithm that I found was in PHP's source code for the string functions. Look for the
I have created C and FoxPro versions of the code. They are both recursive, so be careful with large strings on limited devices. Eduardo Curtolo provided a Pascal version. Someone (sorry, I don't have this information any longer) contributed a Ruby implementation.
This algorithm was once used on many U.S. driver's licenses. Its goal is to group letters that sound alike, then convert the name into a series of numbers that can represent the name. Understanding Classic Soundex Algorithms provides a very nice description of how SoundEx is used and generated.