Beautiful Soup 4 Cheatsheet

Detailed docs: the Beautiful Soup 4 Docs.

Throughout, assume t is an instance of Tag.

Core concepts (classes)

  • Tag: a Tag object corresponds to an XML or HTML tag in the original document.
  • BeautifulSoup: the BeautifulSoup object represents the parsed document as a whole. You can treat it like a special Tag. It needs a parser to parse the document; the parser built into Python is "html.parser", e.g. soup = BeautifulSoup("<html>a web page</html>", "html.parser")
  • NavigableString: a NavigableString corresponds to a bit of text (as you see it in the browser) within a tag. It behaves just like a Python Unicode string, except that it also supports some of the features for navigating and searching the tree.
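
As a minimal sketch of how the three classes relate, the snippet below parses a small made-up document (the HTML string and the variable names html and text are illustrative assumptions, not part of the original cheatsheet) and inspects the type of each object. It assumes the bs4 package is installed.

```python
from bs4 import BeautifulSoup, NavigableString, Tag

# Illustrative document; any small HTML snippet would do.
html = "<html><body><p class='intro'>a web page</p></body></html>"

# BeautifulSoup: the whole parsed document, built with Python's bundled parser.
soup = BeautifulSoup(html, "html.parser")

# Tag: the first <p> element in the document.
t = soup.p

# NavigableString: the single piece of text inside that tag.
text = t.string

print(type(soup).__name__)    # BeautifulSoup
print(type(t).__name__)       # Tag
print(type(text).__name__)    # NavigableString

# BeautifulSoup is a subclass of Tag, which is why it can be treated like one.
print(isinstance(soup, Tag))  # True

# NavigableString is a subclass of str, so ordinary string methods work,
# and it also knows where it sits in the tree.
print(isinstance(text, str))  # True
print(text.parent is t)       # True
```

Because soup is itself a Tag, the navigation and search features available on t are also available on soup.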

The Tag class

Object attributes:
