Wednesday, 15 July 2015

java - Handling large search queries on relatively small index documents in Lucene


I am working on a project where we index relatively small documents/sentences, and we want to search these indexes using large documents as queries. Here is a relatively simple example. I am indexing the document:

  docId: 1 text: "back black"   

and I want to query it with the following input: "Released on July 25, 1980, Back in Black was the first AC/DC album recorded without singer Bon Scott, who died on February 19 at the age of 33, and it was dedicated to him."

What is the best way to do this in Lucene? For simple examples, where the text I want to find is exactly the input query, I can use an analyzer plus a PhraseQuery instead of QueryParser.parse(QueryParser.escape(... my big input ...)), which ends up building a big Boolean/Term query.
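The sketch below makes the difference concrete in plain Java, without Lucene. The `PhraseVsTerms` class and its methods are hypothetical. The lowercase, split-on-non-alphanumerics tokenizer is an assumption that stands in for a real analyzer.

An OR over terms is roughly what a big Boolean/Term query amounts to. It matches any sentence that shares even one word with the input. A phrase match, by contrast, requires the sentence's words to appear consecutively and in order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration class; not part of Lucene.
final class PhraseVsTerms {

    // Simplistic analyzer stand-in: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // OR semantics: the sentence matches if it shares at least one term with the input.
    static boolean anyTermMatches(List<String> sentence, List<String> input) {
        Set<String> inputTerms = new HashSet<>(input);
        for (String term : sentence) {
            if (inputTerms.contains(term)) {
                return true;
            }
        }
        return false;
    }

    // Phrase semantics: all sentence terms must appear consecutively, in order, in the input.
    static boolean phraseMatches(List<String> sentence, List<String> input) {
        return !sentence.isEmpty() && Collections.indexOfSubList(input, sentence) >= 0;
    }

    public static void main(String[] args) {
        List<String> input = tokenize("Back in Black was the first AC/DC album");
        List<String> sentence = tokenize("black back");
        System.out.println(anyTermMatches(sentence, input)); // true
        System.out.println(phraseMatches(sentence, input));  // false
    }
}
```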

But I can't use a phrase query approach for a real-world example. I think I have to use a word n-gram approach like ShingleAnalyzerWrapper, but as my input documents can be quite big, the number of n-gram combinations will be difficult to handle...
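For sizing purposes, contiguous word shingles grow linearly rather than combinatorially. An input of n tokens has n - k + 1 shingles of size k, so for a fixed maximum shingle size the total is linear in the input length.

Below is a minimal plain-Java sketch of that count. The `WordShingles` class is hypothetical. It only mimics the kind of word n-grams a shingle filter produces, and it assumes the input is already tokenized.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
final class WordShingles {

    // Every run of minSize..maxSize consecutive tokens, joined by a single space.
    static List<String> shingles(List<String> tokens, int minSize, int maxSize) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    // Closed-form count: sum over k of max(0, n - k + 1), linear in n for fixed sizes.
    static long expectedCount(int n, int minSize, int maxSize) {
        long total = 0;
        for (int k = minSize; k <= maxSize; k++) {
            total += Math.max(0, n - k + 1);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("back", "in", "black", "was", "released");
        System.out.println(shingles(tokens, 2, 3));
        // [back in, back in black, in black, in black was, black was, black was released, was released]
        System.out.println(expectedCount(tokens.size(), 2, 3)); // 7
    }
}
```

With sizes 2 to 3, a 1,000-token input yields 999 + 998 = 1,997 shingles, which may be more manageable than it first appears.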

In other words, I'm stuck, and any ideas would be greatly appreciated :)

P.S. I did not mention it, but another annoying thing about indexing small documents is the norms. Because the norm value (a float) is encoded in only 1 byte, all 3-4 word sentences get the same norm. As a result, a search for "A B C" returns both "A B C" and "A B C D" with the same score.

Thank you!

I do not know how many sentences you have, but you may want to reverse the problem: store your sentences as queries, and run them against each large input document indexed on its own.

(Note: This is how the Elasticsearch percolator works.)
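Here is a plain-Java sketch of the reversed setup, under two simplifying assumptions: matching is exact-phrase only, and a toy tokenizer replaces a Lucene analyzer. The `SentencePercolator` class is hypothetical.

In real Lucene code, the single large input document could be put into a `MemoryIndex` (from the lucene-memory module), and each stored sentence could be run against it as a query. In this sketch, a token list stands in for that one-document index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration class; not Lucene's or Elasticsearch's API.
final class SentencePercolator {
    // Stored "queries": sentence id -> the sentence's tokens, kept in registration order.
    private final Map<Integer, List<String>> storedQueries = new LinkedHashMap<>();

    // Simplistic analyzer stand-in: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Register a sentence once, as a phrase query rather than as an indexed document.
    void register(int id, String sentence) {
        storedQueries.put(id, tokenize(sentence));
    }

    // Per incoming document: analyze it once, then run every stored phrase query against it.
    List<Integer> match(String largeInput) {
        List<String> inputTokens = tokenize(largeInput);
        List<Integer> hits = new ArrayList<>();
        for (Map.Entry<Integer, List<String>> entry : storedQueries.entrySet()) {
            List<String> phrase = entry.getValue();
            if (!phrase.isEmpty() && Collections.indexOfSubList(inputTokens, phrase) >= 0) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SentencePercolator percolator = new SentencePercolator();
        percolator.register(1, "back in black");
        percolator.register(2, "highway to hell");
        System.out.println(percolator.match(
            "Released on July 25, 1980, Back in Black was the first AC/DC album")); // [1]
    }
}
```

The work per input document is proportional to the number of stored sentences. That is why this direction fits best when there are not too many of them.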

Edit (2013-06-21):

If you have a large number of sentences, it may be better to store the sentences in an index. But instead of using phrase queries, you can try indexing shingles (word n-grams) with Lucene's ShingleFilter. At query time, your approach of building queries manually instead of using QueryParser is good. If you index shingles, you can then build a pure boolean query where each clause is a term query that matches one shingle.
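The plain-Java sketch below shows the shape of this idea without Lucene. The `ShingleIndex` class is hypothetical. Its score, the count of distinct input shingles that hit a sentence, is a simplified stand-in for a BooleanQuery of optional TermQuery clauses and is not Lucene's actual scoring. The toy tokenizer is also an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical illustration class; not part of Lucene.
final class ShingleIndex {
    // Inverted index: shingle -> ids of the sentences that contain it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    private final int minSize;
    private final int maxSize;

    ShingleIndex(int minSize, int maxSize) {
        if (minSize < 1 || maxSize < minSize) {
            throw new IllegalArgumentException("need 1 <= minSize <= maxSize");
        }
        this.minSize = minSize;
        this.maxSize = maxSize;
    }

    // Simplistic analyzer stand-in: lowercase, split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Every run of minSize..maxSize consecutive tokens, joined by a single space.
    private List<String> shingles(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            for (int size = minSize; size <= maxSize && start + size <= tokens.size(); size++) {
                out.add(String.join(" ", tokens.subList(start, start + size)));
            }
        }
        return out;
    }

    // Index time: each shingle of the sentence becomes a term pointing at the sentence id.
    void add(int docId, String sentence) {
        for (String shingle : shingles(tokenize(sentence))) {
            postings.computeIfAbsent(shingle, k -> new HashSet<>()).add(docId);
        }
    }

    // Query time: one optional clause per distinct input shingle; a sentence's score is
    // the number of those clauses that hit it (a stand-in, not Lucene's real scoring).
    Map<Integer, Integer> search(String largeInput) {
        Map<Integer, Integer> scores = new TreeMap<>();
        for (String shingle : new LinkedHashSet<>(shingles(tokenize(largeInput)))) {
            for (int docId : postings.getOrDefault(shingle, Collections.emptySet())) {
                scores.merge(docId, 1, Integer::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        ShingleIndex index = new ShingleIndex(2, 3);
        index.add(1, "back in black");
        index.add(2, "black back");
        System.out.println(index.search(
            "Released on July 25, 1980, Back in Black was the first AC/DC album")); // {1=3}
    }
}
```

With a minimum size of 2, a one-word sentence produces no shingles at all. Lucene's ShingleFilter can also emit unigrams alongside the shingles, which covers that case.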
