Thursday, 15 May 2014

n-gram - Rails sunspot-solr - words with hyphen


I am using the sunspot_rails gem and everything is working correctly so far, except that no search results are found for words with a hyphen. Example: the string "tron" returns a lot of results (the word as written in all the articles is e-tron).

The string "e-tron" returns 0 results, although this is the correct word and it appears in all my articles.

My current schema.xml config:

  <fieldType name="text" class="solr.TextField" omitNorms="false">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

What I need: the behavior for the search string tron is absolutely fine, but I also need the exact string e-tron to match. The problem is that solr.StandardTokenizerFactory splits words on hyphens, so the tokens for "e-tron" are "e" and "tron". The "e" is then probably lost in the solr.EdgeNGramFilterFactory filter, which has a minimum gram size of 2.
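To make the suspected failure concrete, here is a rough Python sketch of the index and query chains from the schema above. It is not Solr code: `standard_tokenize` and `edge_ngrams` are hypothetical helpers that only approximate solr.StandardTokenizerFactory (splitting on non-alphanumeric characters) and solr.EdgeNGramFilterFactory (front grams, nothing emitted for tokens shorter than minGramSize).

```python
import re

def standard_tokenize(text):
    # Assumption: approximate StandardTokenizer by splitting on anything
    # that is not a letter or digit, so the hyphen acts as a separator.
    return [t for t in re.split(r"[^0-9A-Za-z]+", text) if t]

def edge_ngrams(token, min_gram=2, max_gram=15):
    # Front edge n-grams; a token shorter than min_gram yields no grams.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def index_terms(text):
    # Index chain: tokenize -> lowercase -> edge n-grams.
    terms = []
    for token in standard_tokenize(text):
        terms.extend(edge_ngrams(token.lower()))
    return terms

def query_terms(text):
    # Query chain: tokenize -> lowercase (no n-gram filter).
    return [t.lower() for t in standard_tokenize(text)]

indexed = index_terms("e-tron")
queried = query_terms("e-tron")
print(indexed)                                    # ['tr', 'tro', 'tron']
print(queried)                                    # ['e', 'tron']
print([t for t in queried if t not in indexed])   # ['e']
```

In this simplified model the query token "e" has no counterpart among the indexed terms, which would explain the empty result if every query term is required to match.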

Here is an example configuration that addresses your specific problem.

  <fieldType name="text" class="solr.TextField" omitNorms="false">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15" side="front"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  1. solr.WhitespaceTokenizerFactory generates tokens on whitespace only: ["e-tron"]
  2. solr.WordDelimiterFilterFactory splits on the hyphen but, with preserveOriginal="1", also keeps the original word: ["e", "tron", "e-tron"]
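The two steps above can be sketched in Python to check that "e-tron" now survives on both the index and the query side. This is a simplified model, not Solr's implementation: the hypothetical `word_delimiter` helper splits only on non-alphanumeric characters and ignores the filter's other options (case changes, numerics, catenation), and the order of the emitted tokens is not meant to mirror Solr's.

```python
import re

def whitespace_tokenize(text):
    # Stand-in for solr.WhitespaceTokenizerFactory: split on whitespace only.
    return text.split()

def word_delimiter(token, preserve_original=True):
    # Assumption: split on non-alphanumeric characters only.
    parts = [p for p in re.split(r"[^0-9A-Za-z]+", token) if p]
    if parts == [token]:
        return [token]  # nothing to split
    return ([token] if preserve_original else []) + parts

def edge_ngrams(token, min_gram=2, max_gram=15):
    # Front edge n-grams; a token shorter than min_gram yields no grams.
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

def query_terms(text):
    # Query chain: whitespace tokenize -> word delimiter -> lowercase.
    return [part.lower()
            for token in whitespace_tokenize(text)
            for part in word_delimiter(token)]

def index_terms(text):
    # Index chain: same as the query chain, plus edge n-grams.
    terms = []
    for token in query_terms(text):
        terms.extend(edge_ngrams(token))
    return terms

indexed = index_terms("e-tron")
print(query_terms("e-tron"))  # ['e-tron', 'e', 'tron']
print(indexed)   # ['e-', 'e-t', 'e-tr', 'e-tro', 'e-tron', 'tr', 'tro', 'tron']
print("e-tron" in indexed, "tron" in indexed)  # True True
```

Under these assumptions both "tron" and the full "e-tron" appear among the indexed terms. The lone "e" still produces no gram, so whether a query for "e-tron" matches in practice also depends on how the query parser combines the three query tokens.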
