regex - Attribute Error for strings created from lists


I'm trying to create a data-scraping file for a class, and the data I have to scrape requires that I use while loops to get the right data into separate arrays, i.e. for states, SAT averages, etc.

However, once I set up the while loops, the regex that cleared the majority of the HTML tags from the data broke, and I am getting an error that reads:

AttributeError: 'NoneType' object has no attribute 'groups'

My code is:

    import re, util
    from BeautifulSoup import BeautifulStoneSoup

    # create a comma-delineated file
    delim = ", "

    # base url for the sat data
    base = "http://www.usatoday.com/news/education/2007-08-28-sat-table_n.htm"

    # get the webpage object for the site
    soup = util.mysoupopen(base)

    # get the column headings
    colcols = soup.findAll("td", {"class":"vatextbold"})

    # get the data
    datacols = soup.findAll("td", {"class":"vatext"})

    # append the data to cols
    for i in range(len(datacols)):
        colcols.append(datacols[i])

    # open a csv file to write the data to
    fob = open("sat.csv", 'a')

    # initiate the 5 arrays
    states = []
    participate = []
    math = []
    read = []
    write = []

    # split into 5 lists for each row
    for i in range(len(colcols)):
        if i%5 == 0:
            states.append(colcols[i])
    i = 1
    while i <= 250:
        participate.append(colcols[i])
        i = i+5

    i = 2
    while i <= 250:
        math.append(colcols[i])
        i = i+5

    i = 3
    while i <= 250:
        read.append(colcols[i])
        i = i+5

    i = 4
    while i <= 250:
        write.append(colcols[i])
        i = i+5

    # write the data to the file
    for i in range(len(states)):
        states = str(states[i])
        participate = str(participate[i])
        math = str(math[i])
        read = str(read[i])
        write = str(write[i])

        # regex to remove html from the data scraped

        # remove <td> tags
        line = re.search(">(.*)<", states).groups()[0] + delim + re.search(">(.*)<",
          participate).groups()[0] + delim + re.search(">(.*)<", math).groups()[0] + delim + re.search(">(.*)<", read).groups()[0] + delim + re.search(">(.*)<", write).groups()[0]

        # append the data point to the file
        fob.write(line)

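Any ideas regarding why this error appeared? The regex was working fine until I tried to split the data into different lists. I have already tried printing the various strings inside the final "for" loop to see if any of them were "None" for the first i value (0), but they were all the strings they were supposed to be.

Any help would be appreciated!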

It looks like the regex search is failing on (one of) the strings, so it returns None instead of a MatchObject.
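As a minimal illustration of that behaviour (Python 3 syntax; the helper name `extract_cell_text` and the sample strings are made up for this sketch, not taken from the question), `re.search` returns `None` when the pattern does not match, and calling `.groups()` on that `None` is what raises the AttributeError:

```python
import re

def extract_cell_text(html):
    """Return the text between the first '>' and the last '<', or None."""
    match = re.search(">(.*)<", html)
    if match is None:
        # No match: calling match.groups() here would raise AttributeError.
        return None
    return match.group(1)

# A string shaped like a scraped cell matches ...
cell_text = extract_cell_text("<td>Alabama</td>")
# ... while a string with no '>' or '<' in it does not.
missing = extract_cell_text("t")
```

Checking for `None` before touching the match object turns the crash into a value that can be inspected or logged.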

Try the following instead of the very long #remove <td> tags line:

    import sys  # needed for sys.exit()

    out_list = []
    for item in (states, participate, math, read, write):
        try:
            out_list.append(re.search(">(.*)<", item).groups()[0])
        except AttributeError:
            # re.search returned None: show the offending string and stop
            print "regex match failed on", item
            sys.exit()
    line = delim.join(out_list)

That way, you can find out where your regex is failing.
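As a guess at what such a check would report for the code in the question: the final loop rebinds `states` (and the other four names) from a list to a string on its first pass, so from the second pass on `states[i]` is a single character, which contains no `>` or `<`. That would also explain why everything looked correct for i = 0. The sketch below reproduces this in Python 3 with two made-up cells; the cell strings and the names `cells`, `seen`, and `names` are illustrative and not from the original.

```python
import re

# Hypothetical stand-ins for two scraped <td> cells (not real page data).
states = ['<td class="vatext">Alabama</td>', '<td class="vatext">Alaska</td>']

seen = []
for i in range(len(states)):
    # Same rebinding as in the question: the list is replaced by one string.
    states = str(states[i])
    seen.append((states, re.search(">(.*)<", states)))

# First pass: the full cell string, which matches.
# Second pass: states[1] is now the character 't', so the search returns None.

# One possible fix: keep the list intact and use a separate name per item.
cells = ['<td class="vatext">Alabama</td>', '<td class="vatext">Alaska</td>']
names = [re.search(">(.*)<", str(cell)).group(1) for cell in cells]
```

Using a different variable for the per-row string than for the list is the only change needed; the regex itself is untouched.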

Also, I suggest you use .group(1) instead of .groups()[0]. The former is more explicit.
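A short sketch of the two spellings side by side (Python 3; the sample string is made up): both return the first captured group, but `.group(1)` names the group directly while `.groups()[0]` builds a tuple of all groups and then indexes it.

```python
import re

# Illustrative cell string, not taken from the scraped page.
match = re.search(">(.*)<", "<td>502</td>")

via_groups = match.groups()[0]  # tuple of all captured groups, then index 0
via_group = match.group(1)      # first captured group, requested by number
```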

