I have a question about the order of Lucene's token filters. For example, if I want the following filtering, in what order would Lucene apply these filters?
1- LowerCaseFilter Cats => cats
2- TrimFilter cats! => cats
3- StopFilter a cat => cat
4- LengthFilter
5- Stem filter
6- SynonymFilter
I could not find any documentation that explains this order.
The order of your filters depends on your needs; however, your order seems appropriate.
Note that TrimFilter and LowerCaseFilter are generally applied up front, so that later filters do not need to handle case or whitespace in order to operate on the content of the words. StopFilter and LengthFilter work in much the same way, and I would generally apply them before stemming. The words removed by a StopFilter are usually very atomic, so it is appropriate to apply it before the stemmer. Applying these filters after the stemmer could end up removing meaningful stems. If you do want to operate on the stemmed words, then your StopFilter will need to come after the stemmer.
The choice of stemmer certainly depends on your needs. A SynonymFilter is usually more useful after the stemmer, but you will then need to define your synonyms using their stemmed forms.
One note: TrimFilter removes the whitespace surrounding a token, not punctuation. Most tokenizers render this filter unnecessary.
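To make the effect of ordering concrete, here is a minimal sketch in plain Python. It does not use Lucene at all: every function below is a hypothetical toy stand-in for the corresponding filter, and the stop word set, the synonym map, and the one-rule stemmer are assumptions made up for illustration. It only shows how moving the stop step after the stem step can drop a token, and that trimming leaves punctuation alone.

```python
from typing import Callable, Dict, List

TokenFilter = Callable[[List[str]], List[str]]

# Hypothetical data for illustration only (not Lucene's defaults).
STOP_WORDS = {"a", "the", "will"}
SYNONYMS: Dict[str, List[str]] = {"cat": ["feline"]}  # keyed by STEMMED form


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


def trim(tokens: List[str]) -> List[str]:
    # Removes surrounding whitespace only; punctuation is left untouched.
    return [t.strip() for t in tokens]


def stop(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


def length(tokens: List[str]) -> List[str]:
    return [t for t in tokens if 2 <= len(t) <= 20]


def stem(tokens: List[str]) -> List[str]:
    # Toy stemmer: strips one trailing "s" from words longer than 3 letters.
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]


def synonyms(tokens: List[str]) -> List[str]:
    out: List[str] = []
    for t in tokens:
        out.append(t)
        out.extend(SYNONYMS.get(t, []))
    return out


def run(tokens: List[str], chain: List[TokenFilter]) -> List[str]:
    # Apply each filter in order, feeding its output to the next one.
    for f in chain:
        tokens = f(tokens)
    return tokens


SUGGESTED = [lowercase, trim, stop, length, stem, synonyms]
STEM_FIRST = [lowercase, trim, stem, stop, length, synonyms]

tokens = ["The", " Cats ", "wrote", "wills"]

print(run(tokens, SUGGESTED))   # ['cat', 'feline', 'wrote', 'will']
print(run(tokens, STEM_FIRST))  # ['cat', 'feline', 'wrote']
print(trim(["cats! "]))         # ['cats!']
```

In the second chain, "wills" is stemmed to "will" and then removed as a stop word, which is the kind of lost stem described above. The synonym for "cats" is found in both chains only because the map is keyed by the stemmed form "cat".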