Detailed docs: the Beautiful Soup 4 Docs.
Assume t is a Tag object.
Core concepts (classes)
- Tag, a Tag object corresponds to an XML or HTML tag.
- BeautifulSoup, the BeautifulSoup object represents the parsed document as a whole. You can treat it like a special Tag. It needs a parser to parse the document; a built-in parser is "html.parser", e.g. soup = BeautifulSoup("<html>a web page</html>", "html.parser")
- NavigableString, a string corresponds to a bit of text (as you see it in the browser) within a tag. A NavigableString is just like a Python Unicode string, except that it also supports some of the features for navigating the tree and searching the tree.
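A minimal sketch of how the three classes relate, assuming the bs4 package is installed. The sample markup is made up for illustration.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Hypothetical sample markup, chosen only for illustration.
html = "<html><body><p>a web page</p></body></html>"
soup = BeautifulSoup(html, "html.parser")

p = soup.p       # first <p> tag in the document
text = p.string  # the single string inside <p>

print(type(soup).__name__)    # BeautifulSoup
print(type(p).__name__)       # Tag
print(type(text).__name__)    # NavigableString
print(isinstance(soup, Tag))  # True: BeautifulSoup subclasses Tag
print(isinstance(text, str))  # True: NavigableString subclasses str
```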
The Tag class
Object attributes:
- t.name, the tag's name (the text inside the angle brackets), for example "a" for <a>
- t.attrs, accesses all attributes of a tag as a dict
- t["foo"], gets the HTML/XML attribute "foo"; set it with t["foo"] = "bar"
- Beautiful Soup presents the value(s) of a multi-valued attribute as a list, e.g. t['class']
- t.string, gets the text within a tag when the tag has a single string child; it is None when the tag has more than one child
- t.get_text(), gets all the human-readable text inside a document/tag as one string
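The attributes above can be tried on a small tag. This is only a sketch: the markup and the attribute values are invented for the example.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = '<a class="btn primary" href="/home" id="nav">Home <b>page</b></a>'
soup = BeautifulSoup(html, "html.parser")
t = soup.a

name = t.name          # "a"
attrs = t.attrs        # dict with the keys "class", "href" and "id"
href = t["href"]       # "/home"
t["foo"] = "bar"       # add (or overwrite) an attribute
classes = t["class"]   # multi-valued attribute, so a list: ["btn", "primary"]
whole = t.string       # None, because <a> has two children
inner = t.b.string     # "page", because <b> has a single string child
text = t.get_text()    # "Home page"

print(name, href, classes, whole, inner, text)
```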
Navigating the tree
- get the first tag by its name, e.g. t.head, t.title
- get all direct children as a list by t.contents, or using the .children generator, e.g. for tag in t.children: pass
- t.parents to iterate over all of an element's parents
- t.previous_sibling, t.next_sibling to go sideways, or t.previous_siblings, t.next_siblings to iterate over siblings.
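A short sketch of moving down, up and sideways in the tree. The document is a made-up example, written without whitespace between tags so that every sibling is a tag and not a whitespace string.

```python
from bs4 import BeautifulSoup

# Hypothetical document; no whitespace between tags on purpose.
html = (
    "<html><head><title>Demo</title></head>"
    "<body><p>one</p><p>two</p><p>three</p></body></html>"
)
soup = BeautifulSoup(html, "html.parser")

title_text = soup.title.string                           # first <title> by name
body = soup.body
child_names = [c.name for c in body.contents]            # direct children as a list
child_texts = [c.get_text() for c in body.children]      # same children, via the generator

second = body.contents[1]                                # the <p>two</p> tag
parent_names = [p.name for p in second.parents]          # nearest parent first
prev_text = second.previous_sibling.get_text()           # "one"
next_text = second.next_sibling.get_text()               # "three"
later = [s.get_text() for s in second.next_siblings]     # everything after it

print(parent_names)  # ['body', 'html', '[document]']
```

The last parent, named "[document]", is the BeautifulSoup object itself.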
Searching the tree
Filter types:
- A string
- A regex
- A list
- A function
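Each filter type can be passed to find_all() as its first argument. The sketch below uses an invented document and a hypothetical helper, has_id, to show one filter of each kind.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = "<body><b>bold</b><p id='x'>para</p><a href='/a'>link</a></body>"
soup = BeautifulSoup(html, "html.parser")

# A string: matches tags with exactly that name.
by_string = [t.name for t in soup.find_all("b")]

# A regex: matched against tag names with search(), so "^b" hits <body> and <b>.
by_regex = [t.name for t in soup.find_all(re.compile("^b"))]

# A list: matches any of the listed names, in document order.
by_list = [t.name for t in soup.find_all(["a", "p"])]

# A function: takes a tag and returns True when the tag should match.
def has_id(tag):
    return tag.has_attr("id")

by_function = [t.name for t in soup.find_all(has_id)]

print(by_string, by_regex, by_list, by_function)
```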
Search methods:
- t.find_all() looks through a tag's descendants and retrieves all descendants that match your filters
- t.select() matches elements by using CSS selectors
- t.find() is like find_all(), but it only finds one result
- t.find_parents() and t.find_parent()
- t.find_next_siblings() and t.find_next_sibling()
- t.find_previous_siblings() and t.find_previous_sibling()
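A sketch of the search methods on a small invented list. It assumes a standard bs4 install, where select() relies on the soupsieve package that ships as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = (
    '<div id="main"><ul>'
    '<li class="item">one</li><li class="item">two</li><li class="item">three</li>'
    "</ul></div>"
)
soup = BeautifulSoup(html, "html.parser")

items = soup.find_all("li")                    # every matching descendant
first = soup.find("li")                        # only the first match (or None)
selected = soup.select("div#main li.item")     # CSS selector

second = items[1]
parent = second.find_parent("div")                             # nearest matching ancestor
ancestors = [p.name for p in second.find_parents(["ul", "div"])]  # nearest first
nxt = second.find_next_sibling("li")                           # next matching sibling
before = [s.get_text() for s in second.find_previous_siblings("li")]

print(len(items), first.get_text(), parent["id"], ancestors, nxt.get_text(), before)
```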
Debugging
- print(t.prettify()) pretty-prints the tag
- print(t) prints the HTML without beautification
- type(t) shows the type of an object
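The three debugging calls side by side, as a small sketch on made-up markup. The exact indentation produced by prettify() may vary between Beautiful Soup versions, so no output is shown for it.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
soup = BeautifulSoup("<p>Hi <b>there</b></p>", "html.parser")
t = soup.p

pretty = t.prettify()   # one tag or string per line, indented by depth
raw = str(t)            # the markup on a single line, as print(t) would show it
kind = type(t)          # the class of the object, here bs4's Tag

print(pretty)
print(raw)
print(kind.__name__)
```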