solr - Nutch querying on the fly
I am a newbie to Nutch and Solr, and relatively newer to Solr than to Nutch :)
I have been using Nutch for the past 2 weeks, and I wanted to know if I can query or search my Nutch crawls on the fly (before the crawl completes). I am asking because the websites I am crawling are huge, and it takes around 3-4 days for a crawl to complete. I want to analyze some quick results while the Nutch crawler is still crawling the URLs. Someone suggested to me that Solr would make this possible.
I followed the steps in http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ for this. I see only the injected URLs shown in the Solr search. I know I did something foolish and the crawl never happened, and I feel I am missing some information here. But I did all the steps mentioned in the link. I think somewhere in the process there should be crawling happening, and that is what I missed.
I just wanted to see if someone could help me by pointing out where I went wrong in the process. Forgive my foolishness, and thanks for your patience.
Cheers, Abi
This is not possible. What you can do, though, is chunk the crawl cycle into smaller numbers of URLs so that results are published more often, with this command:
nutch generate crawl/crawldb crawl/segments -topN <the limit>
If you are using the one-stop crawl command, it should be the same.
I typically use a 24-hour chunking scheme.
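To make the chunking idea concrete, here is a rough sketch of one crawl round as a shell function, assuming a Nutch 1.x layout. The paths, the Solr URL, the `TOPN` value, and the function name `crawl_round` are all my own placeholders, and the argument order of `solrindex` shown here is the older positional form, which differs in later Nutch 1.x releases, so check the usage output of your version before relying on it.

```shell
#!/usr/bin/env bash
# Sketch of one chunked crawl round: fetch at most TOPN URLs, then push that
# round's segment to Solr so it becomes searchable before the whole crawl ends.
# Assumptions (not from the original answer): every default value below, and
# the positional solrindex arguments, which vary between Nutch 1.x releases.

NUTCH="${NUTCH:-bin/nutch}"
CRAWLDB="${CRAWLDB:-crawl/crawldb}"
LINKDB="${LINKDB:-crawl/linkdb}"
SEGMENTS="${SEGMENTS:-crawl/segments}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/}"
TOPN="${TOPN:-1000}"

crawl_round() {
  # Select at most TOPN URLs from the crawldb into a new segment.
  $NUTCH generate "$CRAWLDB" "$SEGMENTS" -topN "$TOPN" || return 1

  # Segment directories are named by timestamp, so the last one is the newest.
  local segment
  segment=$(ls -d "$SEGMENTS"/* | tail -n 1)

  # Fetch and parse the segment, then fold the results back into the crawldb.
  $NUTCH fetch "$segment" || return 1
  $NUTCH parse "$segment" || return 1
  $NUTCH updatedb "$CRAWLDB" "$segment" || return 1

  # Update the link database and index this round's segment into Solr.
  $NUTCH invertlinks "$LINKDB" "$segment" || return 1
  $NUTCH solrindex "$SOLR_URL" "$CRAWLDB" "$LINKDB" "$segment"
}
```

After a one-time `bin/nutch inject crawl/crawldb urls`, you would call `crawl_round` repeatedly (from a loop or a cron job), and each call makes another batch of pages searchable. A smaller `TOPN` gives you results sooner at the cost of more rounds.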