Retrieving extracted text with Apache Solr
I'm new to Apache Solr and want to use it for indexing PDF files. So far I've managed to get it up and running, and I can search the PDF files I've added.
However, I also need to be able to retrieve the searched text in the results.
I found this XML snippet in the default solrconfig.xml concerning that:
```xml
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy">
  <lst name="defaults">
    <!-- All the main content goes into "text"... if you need to return
         the extracted text or do highlighting, use a stored field. -->
    <str name="fmap.content">text</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>

    <!-- capture link hrefs but ignore div attributes -->
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>
```
From this article (http://www.lucidimagination.com/community/hear-from-the-experts/articles/content-extraction-tika), I think I have to add a new field to schema.xml (e.g. "content") that has stored="true" and indexed="true". However, I'm not sure how to do that exactly.
Any help is appreciated, thanks!
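A minimal sketch of the approach the question describes, offered as an assumption rather than a confirmed recipe: keep the extract handler from the default solrconfig.xml, but point `fmap.content` at a stored field instead of `text`. The field name `content` is only the example name from the question; it has to be declared in schema.xml with `stored="true"`, as in the last line below. The remaining defaults are copied unchanged from the snippet above.

```xml
<!-- solrconfig.xml: same handler, but the extracted body is mapped to "content" -->
<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler" startup="lazy">
  <lst name="defaults">
    <!-- "content" is the example name from the question; it must exist in schema.xml -->
    <str name="fmap.content">content</str>
    <str name="lowernames">true</str>
    <str name="uprefix">ignored_</str>
    <str name="captureAttr">true</str>
    <str name="fmap.a">links</str>
    <str name="fmap.div">ignored_</str>
  </lst>
</requestHandler>

<!-- schema.xml, inside <fields>: indexed so it is searchable, stored so it is returned -->
<field name="content" type="text" indexed="true" stored="true"/>
```

The alternative, if the default `text` field should keep receiving the content, would be to set `stored="true"` on that field instead.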
Add something like this to your schema.xml:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<schema name="whatever" version="1.2">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="../../mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <charFilter class="solr.MappingCharFilterFactory" mapping="../../mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="internal_id" type="string" indexed="true" stored="true"/>
    <field name="cat" type="int" indexed="true" stored="true"/>
    <field name="desc" type="text" indexed="true" stored="true"/>
  </fields>
  <uniqueKey>internal_id</uniqueKey>
  <defaultSearchField>desc</defaultSearchField>
  <solrQueryParser defaultOperator="OR"/>
  <similarity class="org.apache.lucene.search.DefaultSimilarity"/>
</schema>
```
If a field is stored, it shows up in the results by default.
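Once the extracted text lands in a stored field, it can also be requested explicitly and highlighted. The sketch below assumes the schema declares a stored field such as `<field name="content" type="text" indexed="true" stored="true"/>` and that the extract handler maps its content there; the handler name `/content-search` is hypothetical. `fl`, `hl` and `hl.fl` are standard Solr request parameters, so the same values could instead be passed on the query URL.

```xml
<!-- solrconfig.xml: "/content-search" is a hypothetical handler name -->
<requestHandler name="/content-search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- return the unique key and the stored extracted text -->
    <str name="fl">internal_id,content</str>
    <!-- also return highlighted snippets; highlighting requires a stored field -->
    <str name="hl">true</str>
    <str name="hl.fl">content</str>
  </lst>
</requestHandler>
```

On a default single-core setup this would be queried with something like `http://localhost:8983/solr/content-search?q=content:solr`; the host, port and path depend on the deployment.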