Tuesday, 15 May 2012

regex - Restricting the area of text that is searched by Python


I want to find and count the occurrences of a string in a web scrape. However, I only want to search between point x and point y within the scrape.

Can anyone tell me the simplest way to count "Sea bass" between "MAIN FISHERMAN" and "Secondary Fisherman" in the following example web scrape?

    <p style="color: #555555; font-family: arial, helvetica, sans-serif; font-size: 12px; line-height: 18px;">June 21, 2013 FISH PPL Admin</small></div>
    <!-- Post Body Copy -->
    <div class="post-bodyicopy cleanix">
    <p>MAIN FISHERMAN &#8211;</p>
    <p><strong>CHAMP</strong> &#8211; Pedro 00777<br />
    Byte &#8211; LOCATION1 &#8211; 2:30&nbsp;&#8211; Sea bass (3 lbs 11/4)<br />
    Multi A LOCATION2 &#8211; 7:30 &#8211; COD (3lbs 13/8)<br />
    Laur a ???? LOCATION5 &#8211; 3:20 &#8211; Rood (2 lbs 6/1)</p>
    <p>Joey Blogs <a href="url">Url</a><br />
    Byte &#8211; LOCATION4 &#8211; 4:45 &#8211; Roach (5 lbs 3/1)<br />
    Multi A LOCATION2 &#8211; 5:50 &#8211; Perak (3 lbs 6/1)<br />
    Laur a ???? LOCATION1 &#8211; 3:45 &#8211; Pick (2 lbs 5/1)</p>
    Byte &#8211; LOCATION1 &#8211; 2:30&nbsp;&#8211; Sea bass (3 lbs 11/4)<br />
    Multi One LOCATION1 &#8211; 3:45 &#8211; Only the judge (3 lbs 3/1)<br />
    Laur a ???? LOCATION3 &#8211; 8:25 &#8211; School fees (2 lbs 7/1)</p>
    <div class="post-bodyicopy cleanix">
    <p>Secondary Fisherman &#8211;</p>
    <p><strong>SPOON &#8211; <a href="url">Url</a></strong><br />
    Byte &#8211; LOCATION1 &#8211; 2:30&nbsp;&#8211; Sea bass (3 lbs 11/4)<br />
    Multi A LOCATION2 &#8211; 7:30 &#8211; COD (3 lbs 7/4)<br />
    Laur a ???? LOCATION1 &#8211; 4:25 &#8211; TOUTT (2 lbs 5/1)</p>

I tried the following code to do this, but to no avail.

    html = website.read()
    pattern_to_exclude_unwanted_data = re.compile('MAIN FISHERMAN(.*)Secondary Fisherman')
    excluding_unwanted_data = re.findall(pattern_to_exclude_unwanted_data, html)
    print excluding_unwanted_data("Sea bass")

Do this in two steps:

  1. Extract the substring between "MAIN FISHERMAN" and "Secondary Fisherman".
  2. Count the occurrences of "Sea bass" in that substring.

Like this:

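    # Step 1: capture everything between the two markers (re.DOTALL lets "." match newlines).
    relevant = re.search(r"MAIN FISHERMAN(.*)Secondary Fisherman", html, re.DOTALL).group(1)
    # Step 2: count the occurrences inside that slice only.
    found = relevant.count("Sea bass")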
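
As a minimal sketch, the same two steps can be wrapped in a reusable function that also copes with a missing marker, where `re.search` returns `None` and calling `.group(1)` on it would raise an error. The function name `count_between`, the trimmed `sample` string, and the choice to match case-insensitively are assumptions made for illustration; they are not part of the original answer.

```python
import re


def count_between(text, start, end, needle):
    """Count `needle` in the part of `text` between `start` and the next `end`.

    Matching is case-insensitive (an assumption of this sketch).
    Returns 0 if the two markers are not found in that order.
    """
    # re.escape keeps any regex metacharacters in the markers literal.
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, re.DOTALL | re.IGNORECASE)
    if match is None:
        return 0
    return match.group(1).lower().count(needle.lower())


# Hypothetical, shortened version of the scrape shown in the question.
sample = (
    "<p>MAIN FISHERMAN</p>\n"
    "<p>Byte - LOCATION1 - 2:30 - Sea bass (3 lbs 11/4)<br />\n"
    "Multi A LOCATION2 - 7:30 - COD (3lbs 13/8)</p>\n"
    "<p>Byte - LOCATION1 - 2:30 - Sea bass (3 lbs 11/4)</p>\n"
    "<p>Secondary Fisherman</p>\n"
    "<p>Byte - LOCATION1 - 2:30 - Sea bass (3 lbs 11/4)</p>\n"
)

print(count_between(sample, "MAIN FISHERMAN", "Secondary Fisherman", "sea bass"))  # 2
```

The non-greedy `(.*?)` stops at the first end marker after the start marker, whereas the greedy `(.*)` in the answer above runs to the last one. With a single end marker, as in this sample, both give the same result.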
