Wednesday 15 August 2012

ElasticSearch types and indexing performance


I would like to understand the performance impact of indexing multiple document types in a single index when the number of documents per type is heavily imbalanced: millions of documents of one type, and only thousands of another. I have run into some indexing performance issues and am trying to decide whether splitting the types into separate indexes would help. Can I assume that types are indexed separately, like tables in a relational database, so that each one is effectively independent?

If the answer to the above is no, and types are effectively mixed together, here are more details on my setup in the hope of getting more specific input.

To illustrate, I am using this to capture tweets for Twitter users (call them 'owners' for clarity). I have a multi-tenant environment with one index per Twitter owner. Focusing on a single owner:

  • I capture the tweets from every timeline (mentions, direct messages, my tweets and the full 'home' timeline) into one index. Each timeline type has a different mapping in ElasticSearch.
  • Each tweet refers to a parent of type 'user' (the user who authored it, who may or may not be the owner) via a parent mapping. There is a single 'user' type shared by all timeline types.
  • I only ever search within one owner's data in a single query, so I never need to search across multiple indexes.
  • The home timeline can hold millions of tweets, whereas the owner's own tweets may number only in the hundreds or thousands.
  • User documents are regularly updated with information from outside the timelines, so I would like to avoid (if possible) keeping several copies of the same user object in sync across multiple indexes.
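
The layout described in the list above can be sketched as a create-index body. This is a minimal sketch only. It assumes the 0.19/0.20-era `_parent` mapping syntax and a "mappings" section in the create-index request. The timeline type names come from the post. The `screen_name`, `text` and `created_at` fields are hypothetical.

```python
import json

# Timeline type names from the post; all field names below are hypothetical.
TIMELINE_TYPES = ["home_timeline", "my_tweets_timeline",
                  "ment_timeline", "direct_messages_timeline"]

def build_mappings():
    """Return the mappings for one owner's index: a single 'user' type
    plus one type per timeline, each declaring 'user' as its parent."""
    mappings = {
        "user": {
            "properties": {
                # Hypothetical field on the shared user document.
                "screen_name": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
    for timeline in TIMELINE_TYPES:
        mappings[timeline] = {
            # Parent/child link: each tweet would be indexed with ?parent=<user id>.
            "_parent": {"type": "user"},
            "properties": {
                "text": {"type": "string"},
                "created_at": {"type": "date"},
            },
        }
    return mappings

def create_index_body():
    """JSON body for creating one owner's index (assumed 0.19/0.20-era format)."""
    return json.dumps({"mappings": build_mappings()}, indent=2)

if __name__ == "__main__":
    print(create_index_body())
```

In this layout every timeline type shares the single 'user' parent type, so user documents exist only once per owner index. That is the property the last bullet is trying to preserve.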

I have seen that indexing millions of 'home timeline' documents into the index makes queries slow even for the other types, which hold only a few thousand entries. Because of the parent/child relationship between tweets and users, I would rather not split the types into different indexes (unless I have to).

Can anyone help me understand what is causing this? Is it the total number of documents in the index, something about how filtered queries work, poor query design, or some other aspect?

EDIT

This setup illustrates tweets that have been assigned to timelines. There is an ElasticSearch type defined for home_timeline, my_tweets_timeline, ment_timeline, direct_messages_timeline, etc., mirroring what you see in the standard twitter.com UI. So there is a natural split between the sets of tweets, though with some overlap.

I have gone back to check the has_child queries, and at this point they look like a red herring. Basic queries against the large index are very slow, even when querying a type with only a few thousand documents (my_tweets_timeline).
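
A check like the one described above (basic queries versus has_child queries) is easier to trust when it is repeatable. The following is a minimal timing-harness sketch. `run_query` is a hypothetical callable that you would supply. For example, it could POST the JSON body to the owner's index `_search` endpoint, targeting the 'user' type for the has_child body. The query bodies assume the 0.19/0.20-era `has_child` syntax with `type` and `query`.

```python
import statistics
import time

# Query bodies to compare. The has_child body would be run against the 'user' type.
TRIVIAL = {"query": {"match_all": {}}}
HAS_CHILD = {"query": {"has_child": {"type": "my_tweets_timeline",
                                     "query": {"match_all": {}}}}}

def median_latency(run_query, body, repeats=5):
    """Call run_query(body) `repeats` times; return the median wall time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query(body)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

def compare(run_query, bodies, repeats=5):
    """Return {name: median seconds} for each named query body."""
    return {name: median_latency(run_query, body, repeats)
            for name, body in bodies.items()}

if __name__ == "__main__":
    # Hypothetical stand-in for a real client call; replace with an HTTP POST to _search.
    def fake_run_query(body):
        time.sleep(0.01 if "has_child" in body["query"] else 0.001)

    print(compare(fake_run_query, {"trivial": TRIVIAL, "has_child": HAS_CHILD}))
```

Using the median rather than the mean keeps one slow outlier (for example, a first query that warms caches) from dominating the comparison.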

Can I assume that types are indexed separately, like tables in a relational database, where each table is effectively independent?

No. All types are stored together in the index, as you guessed.
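
To make the "types share one index" point concrete, here is a toy inverted index in Python. It is a simplified model, not how Lucene actually executes queries: real conjunctions skip through postings rather than testing every entry. In the sketch, the type is just another indexed term (`_type:<name>`), and document IDs come from one sequence shared by all types. So a term's postings list for a small type also contains every matching document from the large type.

```python
from collections import defaultdict

def build_index(docs):
    """docs: list of (doc_type, text). Returns postings: term -> sorted doc ids.
    Doc ids come from one sequence shared by every type in the index."""
    postings = defaultdict(list)
    for doc_id, (doc_type, text) in enumerate(docs):
        # The type is indexed like any other term on the document.
        postings["_type:" + doc_type].append(doc_id)
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    return postings

def search(postings, doc_type, term):
    """Intersect a term's postings with the type filter.
    Returns (hits, number of candidate postings for the term)."""
    type_ids = set(postings.get("_type:" + doc_type, []))
    term_ids = postings.get(term, [])
    hits = [d for d in term_ids if d in type_ids]
    return hits, len(term_ids)

if __name__ == "__main__":
    docs = ([("home_timeline", "elasticsearch is fast")] * 1000
            + [("my_tweets_timeline", "elasticsearch types")] * 3)
    hits, candidates = search(build_index(docs), "my_tweets_timeline", "elasticsearch")
    print(len(hits), candidates)
```

Here a query on my_tweets_timeline returns only 3 hits, but the term's postings list it has to be intersected with holds all 1003 matching documents, most of them from home_timeline.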

Can anyone help me understand whether the problem is the total number of documents in the index, something about how 'filtered queries' work, poor query design, or something else?

The number of documents in the index is clearly a factor. Is it specifically the has_child queries that are slow? Try, for example, comparing trivial queries with has_child queries. The memory considerations note in the has_child documentation provides a clue:

"With the current implementation, all _id values are loaded into memory (heap) to support fast lookups, so make sure there is enough memory for it."

This indicates that any has_child query over millions of potential children requires a large amount of memory. Make sure there is ample memory available for such operations, or consider a design that removes the need for has_child.
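
To get a feel for the size involved, here is a back-of-envelope sketch of the heap needed to hold every _id value. The per-entry overhead (object headers, hash-table slots and so on) is an assumed parameter, not a figure from the ElasticSearch documentation. The average ID size is also an input you would measure yourself. Treat the result as an order-of-magnitude estimate only.

```python
def estimate_id_heap_mb(num_docs, avg_id_bytes, overhead_bytes_per_entry):
    """Rough heap estimate, in MiB, for keeping every _id value in memory.
    overhead_bytes_per_entry is an ASSUMED per-entry cost; measure your own
    node rather than relying on any particular value."""
    total_bytes = num_docs * (avg_id_bytes + overhead_bytes_per_entry)
    return total_bytes / (1024 * 1024)

if __name__ == "__main__":
    # Hypothetical inputs: 18-byte tweet IDs and 40 bytes of overhead per entry.
    for docs in (10_000, 1_000_000, 10_000_000):
        print(docs, round(estimate_id_heap_mb(docs, 18, 40), 1), "MiB")
```

The estimate grows linearly with the total document count in the index. This is why the millions of home_timeline documents matter here, even for queries that target a much smaller type.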
