C# - How to find the number of occurrences of a string within a huge string like a big book


I was asked this question during a C# interview session:

How do you efficiently find the number of occurrences of a word within a huge text such as a big book (the Bible, a dictionary, etc.)?

I am wondering what the most efficient data structure would be to store the contents of the book in. The dirtiest solution I could think of was to store it in a StringBuilder and find the count of substrings, but I am sure there has to be a better way to do this.

For a reasonably sized string there are multiple ways of doing this using Substring, regular expressions, etc., but for a humongous string, what is the most efficient way?
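For reference, the two approaches mentioned for a reasonably sized string might look roughly like the sketch below. The class and method names (`SubstringCounter`, `CountWithIndexOf`, `CountWithRegex`) are made up for illustration, and the sketch assumes the whole text already fits in a single `string`.

```csharp
using System;
using System.Text.RegularExpressions;

static class SubstringCounter
{
    // Counts non-overlapping occurrences of needle by repeatedly calling String.IndexOf.
    // This matches substrings, so "god" is also found inside "godly".
    public static int CountWithIndexOf(string text, string needle)
    {
        if (string.IsNullOrEmpty(needle)) return 0;

        int count = 0;
        int pos = 0;
        while ((pos = text.IndexOf(needle, pos, StringComparison.Ordinal)) >= 0)
        {
            count++;
            pos += needle.Length; // continue searching after the current match
        }
        return count;
    }

    // Counts whole-word, case-sensitive matches; \b marks a word boundary.
    public static int CountWithRegex(string text, string word)
    {
        return Regex.Matches(text, @"\b" + Regex.Escape(word) + @"\b").Count;
    }
}
```

Both methods scan the full text on every call, which is the cost I would like to avoid for a very large book.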

Update: what I am looking for is this:

Assume there is a text file, let's again say the Bible, of size 20 MB, and we want to find the number of times the word "Jesus" occurs in the text. Other than loading the entire 20 MB into a string or StringBuilder and using Substring or Regex to find the match count, is there any other data structure that can be used to store the entire text contents? The actual search can be accomplished in multiple ways; what I am looking for is an efficient "data structure" for temporary storage.

Assuming you don't care about substrings, only full words, use a hashtable. It can be built in linear time, and its size is proportional to the number of distinct words. A Dictionary<string,int>, specifically. On my machine, it took 450 ms to load the entire Bible into a hashtable and find the entries for the word "God".
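A minimal sketch of this idea follows. It is not the exact code behind the timing above: the `WordIndex` class, its method names, the separator list, and the case-insensitive comparison are all assumptions chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class WordIndex
{
    // Characters treated as word separators (an assumption; adjust for the text at hand).
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', '.', ',', ';', ':', '!', '?', '"', '(', ')' };

    // Reads the text line by line and counts each distinct word once per occurrence.
    // Only one line is held in memory at a time, plus one entry per distinct word.
    public static Dictionary<string, int> Build(TextReader reader)
    {
        var counts = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            foreach (var word in line.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
            {
                counts.TryGetValue(word, out int current); // current is 0 if the word is new
                counts[word] = current + 1;
            }
        }
        return counts;
    }

    // Looks up a word's count; words that never occurred yield 0.
    public static int Count(Dictionary<string, int> counts, string word)
    {
        return counts.TryGetValue(word, out int n) ? n : 0;
    }
}
```

With a file on disk, usage could be along the lines of `using (var reader = new StreamReader("bible.txt")) { var counts = WordIndex.Build(reader); Console.WriteLine(WordIndex.Count(counts, "God")); }`, where `bible.txt` is a hypothetical path. Building the table is a single pass over the text; after that, each lookup is a hash probe rather than another scan.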

