Giuseppe: n gram - Rails sunspot-solr

Thursday, 15 May 2014

n gram - Rails sunspot-solr - words with hyphen

I am using the sunspot_rails gem and everything works correctly so far, except that no search results are found for words with a hyphen. Example: the string "tron" returns a lot of results (the word that actually appears in all of the articles is e-tron).

The string "e-tron" returns 0 results, although this is the exact word used in all my articles.

My current schema.xml config:

    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StandardFilterFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

What I need: the behavior for the search string tron is absolutely fine, but I also need the exact string e-tron to match.

The problem is that solr.StandardTokenizerFactory splits words on hyphens, so the tokens for "e-tron" are "e" and "tron". The "e" is probably lost because the EdgeNGram filter on the text field has a minimum gram size of 2.
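To make the diagnosis concrete, here is a rough Ruby sketch of the original index and query chains. It is a toy model and not Solr code: the regex split only approximates StandardTokenizer, and the helper names (`standard_tokenize`, `edge_ngrams`, `index_terms`, `query_terms`) and the sample text "Audi e-tron" are made up for illustration.

```ruby
# Toy model of the ORIGINAL analyzer chain. Not Solr code; all helper names are hypothetical.
MIN_GRAM = 2
MAX_GRAM = 15

# Approximates StandardTokenizer: split on anything that is not a letter or digit.
def standard_tokenize(text)
  text.split(/[^[:alnum:]]+/).reject(&:empty?)
end

# Approximates a front-side EdgeNGram filter: prefixes of length MIN_GRAM..MAX_GRAM.
# A token shorter than MIN_GRAM yields no grams at all.
def edge_ngrams(token)
  (MIN_GRAM..[MAX_GRAM, token.length].min).map { |n| token[0, n] }
end

# Index chain: tokenize, lowercase, edge n-grams.
def index_terms(text)
  standard_tokenize(text).map(&:downcase).flat_map { |t| edge_ngrams(t) }.uniq
end

# Query chain: tokenize, lowercase (no n-grams).
def query_terms(text)
  standard_tokenize(text).map(&:downcase)
end

indexed = index_terms("Audi e-tron")
p indexed                          # => ["au", "aud", "audi", "tr", "tro", "tron"]
p query_terms("e-tron")            # => ["e", "tron"]
p query_terms("e-tron") - indexed  # => ["e"]
```

In this model the query term "e" has no counterpart in the index, which would explain the empty result if every query term has to match. How Solr actually combines the query terms depends on the query parser settings, which the sketch does not cover.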
Here is an example that handles your specific problem:

    <fieldType name="text" class="solr.TextField" omitNorms="false">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

solr.WhitespaceTokenizerFactory generates tokens on whitespace only: ["e-tron"]

solr.WordDelimiterFilterFactory splits on the hyphen but also preserves the original word: ["e", "tron", "e-tron"]
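The effect of the whitespace tokenizer plus word delimiter filter can be sketched in Ruby as well. This is again a simplified stand-in and not Solr code: the real WordDelimiterFilter also splits on case changes and letter/digit transitions, which this toy ignores, and the helper names and the sample text "Audi e-tron" are assumptions made for illustration.

```ruby
# Toy model of the REVISED analyzer chain. Not Solr code; all helper names are hypothetical.
MIN_GRAM = 2
MAX_GRAM = 15

# Approximates WhitespaceTokenizer: split on whitespace only, so hyphens survive.
def whitespace_tokenize(text)
  text.split
end

# Approximates WordDelimiterFilter with preserveOriginal="1":
# keep the original token and add its parts split on non-alphanumeric characters.
def word_delimiter(token)
  parts = token.split(/[^[:alnum:]]+/).reject(&:empty?)
  ([token] + parts).uniq
end

# Approximates a front-side EdgeNGram filter: prefixes of length MIN_GRAM..MAX_GRAM.
def edge_ngrams(token)
  (MIN_GRAM..[MAX_GRAM, token.length].min).map { |n| token[0, n] }
end

# Index chain: whitespace tokens, word delimiter, lowercase, edge n-grams.
def index_terms(text)
  whitespace_tokenize(text)
    .flat_map { |t| word_delimiter(t) }
    .map(&:downcase)
    .flat_map { |t| edge_ngrams(t) }
    .uniq
end

p word_delimiter("e-tron")  # => ["e-tron", "e", "tron"]

indexed = index_terms("Audi e-tron")
p indexed.include?("e-tron")  # => true
p indexed.include?("tron")    # => true
```

Because the original token is preserved, "e-tron" itself now ends up in the index alongside the grams of "tron", so both the exact string and the bare "tron" have an indexed term to match. How Solr combines the three query-side tokens depends on the query parser and is not modelled here.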




Posted by Unknown at 03:22










