java - Improving performance of fuzzy string matching against a dictionary -
so i'm working using secondstring fuzzy string matching, have large dictionary compare (with each entry in dictionary has associated non-unique identifier). using hashmap store dictionary.
when want fuzzy string matching, first check see if string in hashmap , iterate through of other potential keys, calculating string similarity , storing k,v pair/s highest similarity. depending on dictionary using can take long time ( 12330 - 1800035 entries ). there way speed or make faster? writing memoization function/table way of speeding up, can else think of better way improve speed of this? maybe different structure or else i'm missing.
many in advance,
nathan
what looking bktree (bk-tree) combined levenshtein distance algorithm. lookup performance in bktree depends on how "fuzzy" search is. fuzzy defined number of distance (edits) between search word , matches.
here blog on subject: http://blog.notdot.net/2007/4/damn-cool-algorithms-part-1-bk-trees
some notes on performance: http://www.kafsemo.org/2010/08/03_bk-tree-performance-notes.html
notes on http://en.wikipedia.org/wiki/levenshtein_distance algorithm.
also, here bk-tree written in java. should give idea of interface: http://code.google.com/p/java-bk-tree/
Comments
Post a Comment