Full-text search - Java: counting occurrences of words in a huge text file


I have a 115 MB text file consisting of about 20 million words. I have to use this file as a word collection and search it for the number of occurrences of each word a user gives me. This is only a small part of my project, but I need a fast and correct way to find the number of occurrences of the given words, since I may call it repeatedly (in iterations). I'm looking for suggestions for an API I could use, or any other approach that performs this task quickly. Any recommendations are appreciated.

This kind of thing is typically implemented using Lucene, especially if you are going to be restarting the application repeatedly or don't have oodles of memory. Lucene supports lots of other goodies too.

However, if you wanted to "roll your own" code, and you have enough memory (probably 1 GB), the application could:

  • parse the file into a sequence of words,
  • filter out stopwords,
  • build a "reverse index" as a HashMap<String, List<Integer>>, where the String keys are the unique words and the List<Integer> values give the offsets of each word's occurrences in the file.
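
The three steps above could be sketched roughly as follows. This is a minimal sketch, not the answerer's code: it assumes the whole file fits in memory as one String (Files.readString needs Java 11+), and the class name ReverseIndex, the tiny STOPWORDS set and the regex definition of a "word" are all illustrative assumptions. The recorded offsets are character offsets into the decoded text, which equal byte offsets in the file only for single-byte (e.g. ASCII) content.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical in-memory reverse index: word -> character offsets of its occurrences. */
public class ReverseIndex {
    // Illustrative stopword list only; load a real list in practice.
    private static final Set<String> STOPWORDS =
            Set.of("a", "an", "and", "in", "of", "the", "to");
    // Assumption: a "word" is a maximal run of Unicode letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    private final Map<String, List<Integer>> index = new HashMap<>();

    /** Parses text into words, drops stopwords, and records each word's start offset. */
    public ReverseIndex(String text) {
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            String word = m.group().toLowerCase(Locale.ROOT);
            if (STOPWORDS.contains(word)) {
                continue;
            }
            index.computeIfAbsent(word, k -> new ArrayList<>()).add(m.start());
        }
    }

    /** Reads the whole file into memory (Java 11+) and indexes it. */
    public static ReverseIndex fromFile(Path path) throws IOException {
        return new ReverseIndex(Files.readString(path, StandardCharsets.UTF_8));
    }

    /** Offsets of every occurrence of the word, or an empty list if it is absent. */
    public List<Integer> offsets(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), List.of());
    }

    /** Number of occurrences: one hash lookup plus size(). */
    public int count(String word) {
        return offsets(word).size();
    }
}
```

Usage might look like ReverseIndex.fromFile(Path.of("words.txt")).count("cat"), where words.txt is a placeholder path. After the one-off build, each count call is a single hash lookup, which is what makes repeated queries cheap.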

It will take a few seconds (or minutes) to process a file that big. Once you've created the in-memory reverse index, you can do occurrence searches very quickly (maybe sub-microsecond per search).
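
If you only ever need the number of occurrences and not their positions, a simpler variant of the same idea stores one count per word instead of a list of offsets. That needs considerably less memory and lets the file be streamed line by line. This is a swapped-in simplification of the reverse index described above, sketched under the assumption that words never span line breaks; WordCounts, addLine and countAll are hypothetical names, and stopword filtering is left out for brevity.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical count-only index: word -> number of occurrences. */
public class WordCounts {
    // Assumed definition of a "word": a maximal run of Unicode letters or digits.
    private static final Pattern WORD = Pattern.compile("[\\p{L}\\p{N}]+");

    private final Map<String, Integer> counts = new HashMap<>();

    /** Tokenizes one line and increments the count of each word in it. */
    public void addLine(String line) {
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            counts.merge(m.group().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
    }

    /** Streams the file line by line, so the raw text is never held in memory at once. */
    public static WordCounts fromFile(Path path) throws IOException {
        WordCounts wc = new WordCounts();
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                wc.addLine(line);
            }
        }
        return wc;
    }

    /** Looks up each user-given word, in query order; unseen words get a count of 0. */
    public Map<String, Integer> countAll(List<String> queries) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String q : queries) {
            result.put(q, counts.getOrDefault(q.toLowerCase(Locale.ROOT), 0));
        }
        return result;
    }
}
```

Building the table is one pass over the file; after that, countAll answers a whole batch of user-given words with one hash lookup each, so it can be called inside iterations without rereading the 115 MB file.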

