Friday, 15 April 2011

perl - regex with variable part


How can I merge these two regexes into a single regex, so that all captured parts are available? The last 3 fields are optional in $s, and they should be captured if they exist. I could not find any working solution using (?=...).

    use strict;
    use warnings;

    my $s = '1.2.3.4 - Example [10/Dec/2007:21:07:20 +0100] "GET /x.htm HTTP/1.1" 401 488';
    my $re = qr/\A(\d+)\.(\d+)\.(\d+)\.(\d+)[ ](\S+)[ ](\S+)[ ]+\[(\d+)\/(\S+)\/(\d+):(\d+):(\d+):(\d+)[ ](\S+)\][ ]"(\S+)[ ](.*?)[ ](\S+)"[ ](\S+)[ ](\S+)\z/x;
    print "[" . join('], [', $s =~ $re) . "]\n\n";

    $s = '1.2.3.4 - - [13/Jun/2007:01:37:44 +0200] "GET /x.htm HTTP/1.0" 404 283 "-" "Mozilla/5.0 ..." "-"';
    $re = qr/\A(\d+)\.(\d+)\.(\d+)\.(\d+)[ ](\S+)[ ](\S+)[ ]+\[(\d+)\/(\S+)\/(\d+):(\d+):(\d+):(\d+)[ ](\S+)\][ ]"(\S+)[ ](.*?)[ ](\S+)"[ ](\S+)[ ](\S+)[ ]"(.*)"[ ]"(.*?)"[ ]"(.*?)"\z/x;
    print "[" . join('], [', $s =~ $re) . "]\n\n";

You don't need a lookahead (?=...) for this. Wrap the optional fields in a non-capturing group (?:...) and let it match zero or one time with ?:

    $re = qr/\A(\d+)\.(\d+)\.(\d+)\.(\d+)[ ](\S+)[ ](\S+)[ ]+\[(\d+)\/(\S+)\/(\d+):(\d+):(\d+):(\d+)[ ](\S+)\][ ]"(\S+)[ ](.*?)[ ](\S+)"[ ](\S+)[ ](\S+)(?:[ ]"(.*?)"[ ]"(.*?)"[ ]"(.*?)")?\z/x;
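A minimal runnable sketch of the approach above, applied to the two sample lines from the question. The helper name parse_line is hypothetical; the sketch relies on a list-context match returning one element per capture group, with undef for any group that did not take part in the match.

```perl
use strict;
use warnings;

# Combined pattern: 18 mandatory captures, then the three quoted trailing
# fields wrapped in one optional non-capturing group (?:...)?.
# Under /x whitespace is ignored, so a literal space is written [ ].
my $re = qr/\A(\d+)\.(\d+)\.(\d+)\.(\d+)[ ](\S+)[ ](\S+)[ ]+
            \[(\d+)\/(\S+)\/(\d+):(\d+):(\d+):(\d+)[ ](\S+)\][ ]
            "(\S+)[ ](.*?)[ ](\S+)"[ ](\S+)[ ](\S+)
            (?:[ ]"(.*?)"[ ]"(.*?)"[ ]"(.*?)")?\z/x;

# Hypothetical helper: returns all 21 captures in list context
# (empty list on no match); absent optional fields come back as undef.
sub parse_line {
    my ($s) = @_;
    return $s =~ $re;
}

my @lines = (
    '1.2.3.4 - Example [10/Dec/2007:21:07:20 +0100] "GET /x.htm HTTP/1.1" 401 488',
    '1.2.3.4 - - [13/Jun/2007:01:37:44 +0200] "GET /x.htm HTTP/1.0" 404 283 "-" "Mozilla/5.0 ..." "-"',
);

for my $s (@lines) {
    my @f = parse_line($s);
    # Print undef explicitly instead of triggering "uninitialized" warnings.
    print "[", join('], [', map { defined $_ ? $_ : 'undef' } @f), "]\n\n";
}
```

The first sample line yields 18 values followed by three undef entries; the second line fills all 21 positions.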

This always gives the same number of captures; if the optional group does not match, the last 3 captures are undef. If you want to match between 1 and 3 of the optional fields, nest the non-capturing groups and make each of them optional with ?. I also tried a single group with a quantifier instead, but it does not work:

    (?:[ ]"(.*?)"){0,3}

It matches and consumes each of the last three fields, but every iteration of the capture group overwrites the same slot in the capture array, so after the match only the final field is left.
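The difference between the quantified group and nested optional groups can be seen in a small sketch on a hypothetical tail of three quoted fields ("one", "two" and "three" are made-up placeholders, not log data). The sub names quantified, nested and show are illustrative only.

```perl
use strict;
use warnings;

# Hypothetical tail of a log line with three quoted fields.
my $tail = ' "one" "two" "three"';

# Quantified capture group: the group runs three times, but capture 1
# only keeps the text from its final iteration.
sub quantified { my ($s) = @_; return $s =~ /\A(?:[ ]"(.*?)"){0,3}\z/ }

# Nested optional groups: one capture per field, each level optional,
# so missing trailing fields come back as undef.
sub nested {
    my ($s) = @_;
    return $s =~ /\A(?:[ ]"(.*?)"(?:[ ]"(.*?)"(?:[ ]"(.*?)")?)?)?\z/;
}

# Join values for display, showing undef explicitly.
sub show { join ', ', map { defined $_ ? $_ : 'undef' } @_ }

print "quantified:   ", show(quantified($tail)), "\n";
print "nested:       ", show(nested($tail)), "\n";
print "nested (two): ", show(nested(' "one" "two"')), "\n";
```

With the quantified group only "three" survives; the nested version returns one, two, three, and when only two fields are present its third capture is undef.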

Be careful: you are using a very strict expression that may not be suitable for all web logs. In particular, the IP address part will not handle IPv6 addresses, and the user-agent match depends on how the server escapes quote characters inside user agents (lighttpd 1.4.28, for example, does not escape them).
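If these caveats matter for your logs, one possible loosening is sketched below. It is a sketch under stated assumptions, not a drop-in replacement: it treats the client address as a single non-space token (so IPv6 forms like ::1 match), keeps the timestamp and the request line as single captures, and allows backslash-escaped characters inside quoted fields, which only helps if your server writes embedded quotes as \". The sample line and the name parse_loose are hypothetical.

```perl
use strict;
use warnings;

# One double-quoted field; a backslash escapes the next character, so \"
# does not end the field. Assumes the server escapes embedded quotes as \".
my $q = qr/"((?:[^"\\]|\\.)*)"/;

# Looser line pattern (hypothetical, not the one from the question).
# $q is wrapped in (?:...) so Perl does not parse "$q[" as an array element.
my $re = qr/\A(\S+)[ ](\S+)[ ](\S+)[ ]+\[([^\]]+)\][ ](?:$q)[ ](\S+)[ ](\S+)
            (?:[ ](?:$q)[ ](?:$q)[ ](?:$q))?\z/x;

# Returns 10 captures: host, ident, user, timestamp, request, status, size,
# then the three optional quoted fields (undef when absent).
sub parse_loose { my ($s) = @_; return $s =~ $re }

my $line = '::1 - - [13/Jun/2007:01:37:44 +0200] "GET /x.htm HTTP/1.0" 404 283 '
         . '"-" "Mozilla/5.0 (say \"hi\")" "-"';

my @f = parse_loose($line);
print "[", join('], [', map { defined $_ ? $_ : 'undef' } @f), "]\n";
```

Capturing the request line as one quoted field means you would split it into method, path and protocol in a second step if you need them separately.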
