full text search - Java: Counting occurrences of a word in a huge text file

- September 15, 2013

I have a text file of size 115 MB. It consists of about 20 million words. I have to use the file as a word collection and search it for the number of occurrences of each user-given word. This process is a small part of my project. I need a method for finding the number of occurrences of the given words in a fast and correct manner, since I may use it in iterations. I am looking for a suggestion for an API I can use, or any other way to perform the task more quickly. Any recommendations are appreciated.

This kind of thing is typically implemented using Lucene, especially if you are going to be restarting the application repeatedly or you don't have oodles of memory. Lucene supports lots of other goodies too.

However, if you wanted to "roll your own" code, and you have enough memory (probably 1 GB), your application could:

parse the file into a sequence of words,
filter out the stopwords,
build a "reverse index" as a HashMap<String, List<Integer>>, where the String keys are the unique words, and the List<Integer> objects give the offsets of the words' occurrences in the file.
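
The three steps above can be sketched roughly as follows. This is only an illustration, not a tuned implementation: the class name ReverseIndex, the tiny stopword list, and the rule that a "word" is a run of letters or digits are all assumptions made for the example, and offsets are recorded as word positions rather than byte offsets.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper class; the name and API are made up for this sketch.
final class ReverseIndex {
    // Assumed stopword list; a real application would supply its own.
    private static final Set<String> STOPWORDS =
            new HashSet<>(Arrays.asList("a", "an", "and", "of", "the", "to"));

    // word -> 0-based word positions (stopwords still advance the position)
    private final Map<String, List<Integer>> index = new HashMap<>();

    static ReverseIndex build(Reader source) {
        ReverseIndex result = new ReverseIndex();
        int position = 0;
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: a word is a maximal run of letters or digits.
                for (String token : line.split("[^\\p{L}\\p{N}]+")) {
                    if (token.isEmpty()) {
                        continue; // split() yields "" for leading separators
                    }
                    String word = token.toLowerCase(Locale.ROOT);
                    if (!STOPWORDS.contains(word)) {
                        result.index
                              .computeIfAbsent(word, k -> new ArrayList<>())
                              .add(position);
                    }
                    position++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    // Positions of every occurrence; empty if the word was never indexed.
    List<Integer> offsets(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT),
                                  Collections.emptyList());
    }

    // Number of occurrences: a single hash lookup.
    int count(String word) {
        return offsets(word).size();
    }

    public static void main(String[] args) {
        ReverseIndex idx =
                build(new StringReader("The quick fox and the lazy fox.\nA fox!"));
        System.out.println(idx.count("fox"));   // prints 3
        System.out.println(idx.offsets("fox")); // prints [2, 6, 8]
    }
}
```

For the real file, the StringReader could be swapped for something like Files.newBufferedReader(Paths.get(...)). If only the counts matter and not the positions, a Map<String, Integer> of counters would likely need much less memory than lists of boxed integers.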

It may take a few seconds (or minutes) to process a file that big. Once you've created the in-memory reverse index, you can do an occurrence search very quickly. (Maybe sub-microsecond per search.)
