Friday 15 May 2015

java - Algorithm and data structure for checking the letters of a word against another set of letters


I have a set of 200,000 words and a set of letters. I need an algorithm to find all of the words whose characters are all in that set of letters. Testing the words one by one is very slow because there are so many of them, so I need a data structure for this. Any ideas? Thanks! For example, I have the letters {b, g, e, f, t, u, i, t, g, n, c, m, m, w, c, s} and I want to check the words "big" and "buff". All the letters of "big" are a subset of the original set, so "big" is a word I want, while "buff" I do not want because the original set has only one "f".

This is for something like Scrabble, isn't it? Okay, what you do is pre-process your dictionary by sorting the letters in each word. So, word becomes dorw. Then you insert all of these into a trie data structure, so that in your trie the sequence dorw points to the value word. [Note that because we sorted the words, they lose their uniqueness, so one sorted key can stand for several different words, i.e. your trie needs to store a list or array of words on each data node.]
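A minimal sketch of that pre-processing step, assuming a plain HashMap-based trie. The class and method names (SortedLetterTrie, sortLetters, add, wordsAt) are made up for illustration and are not from any library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; an illustration of the idea, not a reference implementation.
class SortedLetterTrie {
    // One trie node: children keyed by letter, plus every dictionary word
    // whose sorted letters end exactly at this node (anagrams share a node).
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> words = new ArrayList<>();
    }

    final Node root = new Node();

    // Sorts the letters of a word, e.g. "word" becomes "dorw".
    static String sortLetters(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return new String(chars);
    }

    // Inserts the word under its sorted-letter key.
    void add(String word) {
        Node node = root;
        for (char c : sortLetters(word).toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.words.add(word);
    }

    // Returns the words stored under exactly this sorted key (empty list if none).
    List<String> wordsAt(String sortedKey) {
        Node node = root;
        for (char c : sortedKey.toCharArray()) {
            node = node.children.get(c);
            if (node == null) return new ArrayList<>();
        }
        return node.words;
    }

    public static void main(String[] args) {
        SortedLetterTrie trie = new SortedLetterTrie();
        for (String w : new String[] {"word", "big", "tea", "eat", "ate"}) {
            trie.add(w);
        }
        System.out.println(sortLetters("word"));  // dorw
        System.out.println(trie.wordsAt("aet"));  // [tea, eat, ate]
    }
}
```

The three anagrams end up in the same node's list, which is why each node holds a list rather than a single word.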

You can save this structure if you need to load it quickly without repeating all the sorting steps.

What you do then is take your input letters and sort them too. Then you walk recursively through your trie: if the current letter matches an existing path out of the current node, you follow it. Because you can have unused letters, you are also allowed to skip the current letter.

And it is that easy. Any time you encounter a node in your trie that has a value, it is a word that you can make from the letters you used to get there. You add these words to a list as you find them, and when the recursion finishes you have all possible words.

If you have repeated letters in your input, you may need extra logic to prevent multiple instances of the same word from being reported (unless you want that). This logic can be applied during the move that skips a letter: you skip past all of its repeats to the next distinct letter.
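The walk described above could look roughly like this. It is a sketch under assumptions: the trie is the same HashMap-based shape with sorted-letter keys, and the names (LetterTrieSearch, build, findWords, walk) are invented for the example. Words are collected only when a letter is actually used, and the skip move jumps over every repeat of the skipped letter, so each word is reported once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names; a sketch of the recursive walk, not a definitive implementation.
class LetterTrieSearch {
    static final class Node {
        final Map<Character, Node> children = new HashMap<>();
        final List<String> words = new ArrayList<>();
    }

    static char[] sorted(String s) {
        char[] chars = s.toCharArray();
        Arrays.sort(chars);
        return chars;
    }

    // Builds the trie keyed by each word's sorted letters.
    static Node build(List<String> dictionary) {
        Node root = new Node();
        for (String word : dictionary) {
            Node node = root;
            for (char c : sorted(word)) {
                node = node.children.computeIfAbsent(c, k -> new Node());
            }
            node.words.add(word);
        }
        return root;
    }

    // Returns every dictionary word that can be made from the given letters.
    static List<String> findWords(Node root, String letters) {
        List<String> found = new ArrayList<>();
        walk(root, sorted(letters), 0, found);
        return found;
    }

    private static void walk(Node node, char[] letters, int i, List<String> found) {
        if (i == letters.length) return;

        // Move 1: use letters[i] if the trie has a matching branch.
        Node child = node.children.get(letters[i]);
        if (child != null) {
            found.addAll(child.words);  // words spelled by the letters used so far
            walk(child, letters, i + 1, found);
        }

        // Move 2: leave letters[i] unused. Skipping all of its repeats as well
        // keeps the same trie node from being reached by two different routes.
        int next = i + 1;
        while (next < letters.length && letters[next] == letters[i]) next++;
        walk(node, letters, next, found);
    }

    public static void main(String[] args) {
        Node root = build(Arrays.asList("big", "buff", "tug", "mint", "gum"));
        // "buff" needs two f's, so it should not appear in the result.
        System.out.println(findWords(root, "bgeftuitgncmmwcs"));
    }
}
```

The order of the returned words depends on the traversal, so treat the result as a set.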


[edit] It seems you want to do the opposite. The solution above finds all possible words that can be made from a set of letters, but you want to check a specific word to see whether it can be made from a given set of letters.

That's easy.

Store your available letters as a histogram. That is, for each letter, you store the count that you have. Then you walk through each letter in your test word, building a new histogram as you go. As soon as one of its buckets exceeds the value in your available-letters histogram, the word cannot be made. If you get all the way to the end, the word can be made.
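A minimal sketch of the histogram check, assuming the letters are lowercase ASCII 'a' to 'z' only (the question does not say, so that is an assumption). LetterHistogram, histogram and canMake are illustrative names.

```java
// Hypothetical names; assumes all input is lowercase ASCII 'a'..'z'.
class LetterHistogram {
    // Counts how many of each letter are available.
    static int[] histogram(String letters) {
        int[] counts = new int[26];
        for (char c : letters.toCharArray()) {
            counts[c - 'a']++;
        }
        return counts;
    }

    // Returns true if the word can be made from the available letters.
    static boolean canMake(String word, int[] available) {
        int[] used = new int[26];
        for (char c : word.toCharArray()) {
            int bucket = c - 'a';
            if (bucket < 0 || bucket >= 26) return false;  // not a letter we track
            // Stop as soon as the word needs more of a letter than we have.
            if (++used[bucket] > available[bucket]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int[] available = histogram("bgeftuitgncmmwcs");
        System.out.println(canMake("big", available));   // true
        System.out.println(canMake("buff", available));  // false, only one 'f'
    }
}
```

The available-letters histogram is built once and reused for every word, so each check costs time proportional to the length of the word being tested.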
