python - Trouble parsing HTML using BeautifulSoup


I'm trying to use BeautifulSoup to parse HTML in Python. Specifically, I'm trying to create two arrays of soup objects: one for the dates of postings on a website, and one for the postings themselves. However, when I use findAll on the div class that matches the postings, only the initial tag is returned, not the text inside the tag. On the other hand, the same code works fine for the dates. What is going on?

    # store the texts of the posts
    texts = soup.findAll("div", {"class": "quote"})

    # store the dates of the posts
    dates = soup.findAll("div", {"class": "datetab"})

The first line above returns only

<div class="quote"> 

which is not what I want. The second line returns

<div class="datetab">feb<span>2</span></div> 

which is what I want (before further refining).

I have no idea what I'm doing wrong. Here is the website I'm trying to parse. This is for homework, and I'm really desperate.
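The two results differ in one way: getting a tag versus getting the text it contains. As a standard-library-only illustration (Python 3's html.parser, not BeautifulSoup, and not the fix proposed in the answer), here is a minimal sketch that collects the full text inside every element with a given tag and class. ClassTextCollector and texts_by_class are hypothetical names, and the sketch assumes the matched tag (here div) is properly closed; other unclosed tags such as <br> inside a match are tolerated because only the matched tag name is counted for nesting.

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Hypothetical helper: gather the text inside every <tag class="...">."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0      # nesting depth of self.tag inside the current match
        self.current = []   # text fragments of the element being collected
        self.results = []   # one cleaned-up string per matched element

    def handle_starttag(self, tag, attrs):
        if tag != self.tag:
            return  # only the matched tag name affects nesting
        if self.depth:
            self.depth += 1  # same tag nested inside a match
        elif self.css_class in (dict(attrs).get("class") or "").split():
            self.depth = 1   # entering a matching element
            self.current = []

    def handle_endtag(self, tag):
        if tag != self.tag or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            # join fragments with spaces, then collapse runs of whitespace
            self.results.append(" ".join(" ".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def texts_by_class(html, tag, css_class):
    """Return the text of every <tag> whose class list contains css_class."""
    parser = ClassTextCollector(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.results
```

For example, texts_by_class(html, "div", "datetab") turns <div class="datetab">feb<span>2</span></div> into "feb 2". Fragments are joined with a space so that text split across child tags stays readable; the trade-off is that a single word broken by an inline tag also gains a space.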

Which version of BeautifulSoup are you using? Version 3.1.0 performs worse than 3.0.8 on real-world HTML (read: invalid HTML). This code works with 3.0.8:

    # Python 2 with BeautifulSoup 3.0.8
    import urllib2
    from BeautifulSoup import BeautifulSoup

    page = urllib2.urlopen("http://harvardfml.com/")
    soup = BeautifulSoup(page)
    # find every <span class="quote"> and print its child nodes
    for incident in soup.findAll('span', {"class": "quote"}):
        print incident.contents
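Since the advice hinges on which release is installed, here is a minimal Python 3.8+ sketch for checking that with the standard library's importlib.metadata. The helper names are hypothetical; it assumes the library was installed with pip under the distribution name beautifulsoup4 (the BeautifulSoup 4 series). The 3.x releases discussed in the answer target Python 2 and are not covered by this sketch.

```python
from importlib import metadata


def installed_version(dist_name="beautifulsoup4"):
    """Hypothetical helper: installed version string of dist_name, or None.

    Relies on the package having been installed as a distribution
    (e.g. with pip) so that its metadata is visible.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def version_tuple(version):
    """Convert "3.0.8" to (3, 0, 8) so versions compare numerically.

    Assumes purely numeric dotted components (no "b1" or "rc1" suffixes).
    """
    return tuple(int(part) for part in version.split("."))


if __name__ == "__main__":
    found = installed_version()
    print("beautifulsoup4:", found or "not installed")
```

Versions are compared as integer tuples rather than strings because string comparison goes character by character: "3.10.0" < "3.9.0" is True as strings, while version_tuple("3.10.0") > version_tuple("3.9.0") gives the intended answer.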
