Wednesday 15 February 2012

regex - Avoid unnecessary backtracking in a regular expression in Java


Hello, I am very new to the RegEx world. I want to extract the timestamp, location and "id_str" field from my test string in Java.

  ...false,"geo_enabled":true,"profile_background_image_url":"http:\/\/a3.twimg.com\/a\/1298918947\/images\/themes\/theme1\/bg.png","listed_count":0,"favourites_count":2,"verified":false,"time_zone":"Mountain Time (US & Canada)","profile_text_color":"333333","contributors_enabled":false,"statuses_count":152,"profile_sidebar_fill_color":"DDEEF6","id_str":"207356721","description":null,"profile_link_color":"0084B4","profile_background_tile":false,"friends_count":14,"followers_count":13,"created_at":"Mon Oct 25 04:05:43 +0000 2010","location":"WaKeeney, KS","profile_sidebar_border_color":"C0DEED",

I have tried this:

  (\d*).*?"id_str":"(\d*)",.*"location":"([^"]*)"

It does a lot of unnecessary backtracking if I use the lazy quantifier .*? (3000 steps in RegexBuddy), but the number of characters between the anchors "id_str" and "location" is not always the same. Besides, it can be catastrophic if location is not found in the string.

How can I

1) avoid unnecessary backtracking?

and

2) fail faster on a non-matching string?

Thank you.

You can try this instead:

  (\d*+)(?>[^"]++|"(?!id_str":))+"id_str":"(\d*+)",(?>[^"]++|"(?!location":))+"location":"([^"]*+)"

The idea here is to eliminate backtracking as much as possible with possessive quantifiers and atomic groups, and to use negated character classes (as you did in the last capturing group).
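As a minimal sketch of how this pattern could be used from Java, the class below compiles it once and pulls out the three captured fields. The class name, the `extract` helper and the shortened `SAMPLE` record (including its timestamp) are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractFieldsDemo {
    // The pattern from the answer, split into its parts for readability.
    static final Pattern FIELDS = Pattern.compile(
        "(\\d*+)"                              // group 1: leading timestamp
        + "(?>[^\"]++|\"(?!id_str\":))+"       // skip ahead to "id_str":
        + "\"id_str\":\"(\\d*+)\","            // group 2: id_str digits
        + "(?>[^\"]++|\"(?!location\":))+"     // skip ahead to "location":
        + "\"location\":\"([^\"]*+)\"");       // group 3: location text

    // Hypothetical, shortened record; the real input is much longer.
    static final String SAMPLE =
        "20120215101500|{\"user\":{\"geo_enabled\":true,"
        + "\"id_str\":\"207356721\",\"friends_count\":14,"
        + "\"location\":\"WaKeeney, KS\",\"verified\":false}}";

    // Returns {timestamp, id_str, location}, or null when nothing matches.
    static String[] extract(String line) {
        Matcher m = FIELDS.matcher(line);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] fields = extract(SAMPLE);
        System.out.println(fields == null ? "no match" : String.join(" | ", fields));
    }
}
```

Compiling the pattern once into a static field avoids recompiling it for every input line, which matters when many records are processed.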

For example, to avoid the first lazy quantifier, I use this:

  (?>[^"]++|"(?!id_str":))+

The regex engine takes as many characters that are not double quotes as possible (without registering a single backtracking position, because a possessive quantifier is used). When a double quote is found, a lookahead checks that it is not followed by the anchor id_str":. This whole part is wrapped in an atomic group (no backtracking possible inside) that is repeated one or more times.
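To see where this sub-pattern stops, the following sketch applies it on its own with `Matcher.lookingAt()` and reports how much of the input it consumed. The class name, the `consumedLength` helper and the tiny input string are assumptions made for the demonstration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SkipToAnchorDemo {
    // Runs of non-quote characters are taken possessively; a double quote is
    // taken only when it is not the start of the anchor "id_str":
    static final Pattern SKIP = Pattern.compile("(?>[^\"]++|\"(?!id_str\":))+");

    // Number of leading characters the sub-pattern consumes, or -1 if it
    // cannot match at the very start of the input.
    static int consumedLength(String input) {
        Matcher m = SKIP.matcher(input);
        return m.lookingAt() ? m.end() : -1;
    }

    public static void main(String[] args) {
        String input = "{\"a\":1,\"id_str\":\"42\"}"; // hypothetical input
        int end = consumedLength(input);
        System.out.println("consumed:  " + input.substring(0, end));
        System.out.println("remaining: " + input.substring(end));
    }
}
```

The consumed part ends right before the quote that opens the anchor, so the literal `"id_str":"` that follows in the full pattern can match immediately.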

Do not be afraid of using a lookahead inside: it fails quickly, and it is only tried when a double quote is found. However, you can try the same thing with an i, if you are sure that it is less frequent than the " (or with an even rarer character that occurs earlier, if you can find one):

  (?>[^i]++|i(?!d_str":))+id_str":"(...

Edit: the best choice here seems to be the comma, which is less frequent (200 steps vs. 422 with double quotes):

  (\d*+)(?>[^,]++|,(?!"id_str":))+,"id_str":"(\d*+)",(?>[^,]++|,(?!"location":))+,"location":"([^"]*+)"
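Java has no built-in step counter like RegexBuddy, so the sketch below only checks that the comma-based and the quote-based variants capture the same fields from one record. The class name, the `extract` helper and the shortened `SAMPLE` record are hypothetical. Note that the comma-based variant assumes both fields are preceded by a comma, that is, neither is the first field of the object.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DelimiterChoiceDemo {
    // Variant that stops at double quotes.
    static final Pattern QUOTE_BASED = Pattern.compile(
        "(\\d*+)(?>[^\"]++|\"(?!id_str\":))+\"id_str\":\"(\\d*+)\","
        + "(?>[^\"]++|\"(?!location\":))+\"location\":\"([^\"]*+)\"");

    // Variant that stops at commas, which are rarer than double quotes.
    static final Pattern COMMA_BASED = Pattern.compile(
        "(\\d*+)(?>[^,]++|,(?!\"id_str\":))+,\"id_str\":\"(\\d*+)\","
        + "(?>[^,]++|,(?!\"location\":))+,\"location\":\"([^\"]*+)\"");

    // Hypothetical, shortened record.
    static final String SAMPLE =
        "20120215101500|{\"user\":{\"geo_enabled\":true,"
        + "\"id_str\":\"207356721\",\"friends_count\":14,"
        + "\"location\":\"WaKeeney, KS\",\"verified\":false}}";

    // Returns the three captured groups, or null when the pattern fails.
    static String[] extract(Pattern p, String line) {
        Matcher m = p.matcher(line);
        return m.find() ? new String[] { m.group(1), m.group(2), m.group(3) } : null;
    }

    public static void main(String[] args) {
        String[] byQuote = extract(QUOTE_BASED, SAMPLE);
        String[] byComma = extract(COMMA_BASED, SAMPLE);
        System.out.println("same captures: " + Arrays.equals(byQuote, byComma));
    }
}
```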

For better performance, and if you have the possibility, try to add an anchor (^) to your pattern, if it is the start of the string or of a line (with multiline mode):

  ^(\d*+)(?>[^"]++|"(?!id_str":))+"id_str":"(\d*+)",(?>[^"]++|"(?!location":))+"location":"([^"]*+)"
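A rough sketch of the anchored pattern in multiline mode follows, assuming the input holds one complete record per line. The class name, the `idsOf` helper and the two short records are invented for the example. With `Pattern.MULTILINE`, `^` matches only at the start of the input or right after a line terminator, so the engine abandons every other start position immediately.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchoredMultilineDemo {
    // Anchored variant of the pattern; MULTILINE lets ^ match at each line start.
    static final Pattern RECORD = Pattern.compile(
        "^(\\d*+)(?>[^\"]++|\"(?!id_str\":))+\"id_str\":\"(\\d*+)\","
        + "(?>[^\"]++|\"(?!location\":))+\"location\":\"([^\"]*+)\"",
        Pattern.MULTILINE);

    // Collects the id_str value of every record found in the text.
    // Caveat: [^"] also matches line terminators, so a line that lacks a
    // field could match into the following line; this sketch assumes every
    // line is a complete record.
    static List<String> idsOf(String text) {
        List<String> ids = new ArrayList<>();
        Matcher m = RECORD.matcher(text);
        while (m.find()) {
            ids.add(m.group(2));
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two hypothetical records, one per line.
        String text =
            "1|{\"id_str\":\"11\",\"x\":1,\"location\":\"A\"}\n"
            + "2|{\"id_str\":\"22\",\"x\":2,\"location\":\"B\"}";
        System.out.println(idsOf(text));
    }
}
```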
