Friday 15 May 2015

python - Does mapping features of strings help parse strings faster? -


I am developing a dictionary that will let me look up English words by phonetics and keywords. The dictionary will help me find specific examples of English words to use when teaching children.

For this, I have created a large Python dictionary of around 200k words, where each key is a word and its value is the word's phonetic transcription.

For example, to find all the words ending in -a*e, where * can be any consonant, I can parse all the keys with a regular expression.
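
Roughly, a minimal sketch of that setup might look like this (the entries and "transcriptions" below are made-up placeholders standing in for the real 200k-word dictionary):

  import re

  # a tiny stand-in for the 200k-entry dictionary: word -> phonetic transcription
  # (the transcriptions here are placeholders, not real phonetics)
  phonetic_dict = {
      'cake': 'keIk',
      'late': 'leIt',
      'house': 'haUs',
      'game': 'geIm',
  }

  # words ending in a<consonant>e, scanning every key
  pattern = re.compile(r'a[^aeiouy]+e$')
  matches = [w for w in phonetic_dict if pattern.search(w)]
  print(matches)  # ['cake', 'late', 'game']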

However, I thought it might be smarter to map out features of the words along the way. That way I could "bookmark" all the words whose last letter is -e, and so on, so that when I search I can just pull up those bookmarks, be sure of a hit, and cut down the number of words to parse each time, since my searches chain together many criteria, as in the example above. A sketch of this idea follows.
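
One possible shape for those "bookmarks", reusing `phonetic_dict` and `pattern` from the sketch above (the choice of the last letter as the feature is just an example):

  from collections import defaultdict

  # build the bookmarks once: sets of words grouped by a cheap feature,
  # here the last letter; other features could be added the same way
  by_last_letter = defaultdict(set)
  for word in phonetic_dict:
      by_last_letter[word[-1]].add(word)

  # at query time, only regex-scan the pre-filtered candidates
  candidates = by_last_letter['e']
  matches = [w for w in candidates if pattern.search(w)]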

Does my strategy really make sense, or is sticking with regular expressions the way to go?

I have limited time for this program, so I would appreciate some expert advice before I sink valuable time into it. Thank you.

It is true that tries make answering these kinds of questions very fast and efficient. It is not clear whether you will always be searching from the end of the word or from the beginning, but if it is going to be a bit of both, then you may want to build one trie in each direction. And if you ever need to find matches in the middle of a word, a trie alone will not help.
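
A minimal sketch of the suffix direction, on an illustrative word list: building a trie over the reversed words means all words sharing a suffix share a path, so a suffix query is just a walk down the trie.

  # sentinel marking a complete word inside the trie
  END = object()

  def build_trie(words):
      root = {}
      for word in words:
          node = root
          for ch in reversed(word):      # reversed, so common suffixes share a path
              node = node.setdefault(ch, {})
          node[END] = word
      return root

  def words_with_suffix(root, suffix):
      node = root
      for ch in reversed(suffix):        # walk down the path for the suffix
          if ch not in node:
              return []
          node = node[ch]
      # collect every word below this node
      found, stack = [], [node]
      while stack:
          n = stack.pop()
          for key, child in n.items():
              if key is END:
                  found.append(child)
              else:
                  stack.append(child)
      return found

  trie = build_trie(['cake', 'late', 'house', 'game'])
  print(words_with_suffix(trie, 'ate'))  # ['late']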

An inverted index (such as the ones that power search engines) sometimes stores words as n-grams, together with metadata about where they occur. For example, 'overflow' might be broken into 'ove', 'rfl', and 'ow', and some metadata exists somewhere recording which words contain each of those n-grams; each word can be broken up in several different ways. I am unclear on the details, though.
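
A rough sketch of the idea, using overlapping trigrams (a common variant; the chunking in the example above is non-overlapping) on an illustrative word list:

  from collections import defaultdict

  def trigrams(word):
      # overlapping 3-letter chunks of the word
      return {word[i:i+3] for i in range(len(word) - 2)}

  # build the inverted index: trigram -> set of words containing it
  index = defaultdict(set)
  for word in ['overflow', 'cake', 'late']:
      for gram in trigrams(word):
          index[gram].add(word)

  # query: words containing both 'ove' and 'flo' anywhere
  hits = index['ove'] & index['flo']
  print(hits)  # {'overflow'}

Note that the candidates an n-gram index returns would still need to be verified against the actual search pattern.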

Or consider the fact that, unless performance really is critical for this application, plain regular expressions may well be fast enough (and could probably be optimized) for a dictionary of this size. A quick and dirty test, using a very simple 80k-word dictionary:

  import re
  import time

  with open('dictionary.txt') as f:
      words = f.read().strip().split('\n')

  # of course, this is easy to adapt to use a dict, too
  expr = re.compile(r'a[^aeiouy]+e$', re.I)

  def bench():
      start = -time.time()
      matches = [word for word in words if expr.search(word)]
      return start + time.time()
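
Calling it a few times and keeping the fastest run gives a steadier number (the repeat count here is just illustrative):

  # run the benchmark several times and report the best time in seconds
  print(min(bench() for _ in range(5)))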

This takes approximately 50 ms on my machine, and given the simplicity and clarity of regular expressions and your limited time, I think that is worth it.
