python-beautifulsoup4


Beautiful Soup is a Python HTML/XML parser designed for quick turnaround
projects like screen-scraping. Three features make it powerful:

* Beautiful Soup won't choke if you give it bad markup. It yields a parse tree
that makes approximately as much sense as your original document. This is
usually good enough to collect the data you need and run away.

* Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree: a toolkit for dissecting a
document and extracting what you need. You don't have to create a custom
parser for each application.

* Beautiful Soup automatically converts incoming documents to Unicode and
outgoing documents to UTF-8. You don't have to think about encodings, unless
the document doesn't specify an encoding and Beautiful Soup can't autodetect
one. Then you just have to specify the original encoding.
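
A minimal sketch of the first and third points, assuming Python 3 with Beautiful Soup 4 installed and using Python's built-in "html.parser" backend; the sample markup is made up for illustration. The first half feeds in HTML whose <p> and <b> tags are never closed and still pulls the bold text out; the second half passes Latin-1 bytes with from_encoding set, gets a Unicode str back, and re-encodes the tag as UTF-8 with encode().

```python
from bs4 import BeautifulSoup

# Deliberately malformed markup: the <p> and <b> tags are never closed.
broken = "<html><body><p>First paragraph<p>Second <b>bold text</body></html>"

# "html.parser" is Python's built-in parser, so no extra dependency is needed.
soup = BeautifulSoup(broken, "html.parser")

# Despite the bad markup, the tree is still navigable.
bold = soup.find("b").get_text()
print(bold)

# Bytes in a non-UTF-8 encoding; from_encoding tells Beautiful Soup what they
# are, and the parsed text comes back as an ordinary Python str.
raw = "<p>café</p>".encode("latin-1")
latin_soup = BeautifulSoup(raw, "html.parser", from_encoding="latin-1")
text = latin_soup.p.get_text()

# encode() serializes the tag back out, as UTF-8 by default.
utf8_bytes = latin_soup.p.encode()
print(text, utf8_bytes)
```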

Beautiful Soup parses anything you give it and does the tree traversal for
you. You can tell it "Find all the links", or "Find all the links of class
externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the
table heading that's got bold text, then give me that text."
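
The four example requests above might translate into calls like the following. This is a sketch, not the only way to phrase them: the HTML snippet, the variable names, and the choice of "html.parser" are assumptions for illustration. It uses find_all() with the class_ keyword (class is reserved in Python), an href matched by a compiled regular expression, and find() with a predicate function for the bold table heading.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample page used only for this illustration.
html = """
<table>
  <tr><th>Plain heading</th><th><b>Bold heading</b></th></tr>
</table>
<a href="http://foo.com/page" class="externalLink">Foo</a>
<a href="/about">About</a>
<a href="https://example.org" class="externalLink">Example</a>
"""

soup = BeautifulSoup(html, "html.parser")

# "Find all the links"
all_links = soup.find_all("a")

# "Find all the links of class externalLink" (class_ avoids the Python keyword)
external = soup.find_all("a", class_="externalLink")

# "Find all the links whose URLs match 'foo.com'" (the regex is searched, not anchored)
foo_links = soup.find_all("a", href=re.compile(r"foo\.com"))

# "Find the table heading that's got bold text, then give me that text."
bold_heading = soup.find(lambda tag: tag.name == "th" and tag.find("b") is not None)
print(bold_heading.get_text())
```

Passing a function to find() is the most general form of search: it receives each Tag and returns True for the ones to keep, which covers conditions that keyword filters cannot express directly.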

Valuable data that was once locked up in poorly-designed websites is now within
your reach. Projects that would have taken hours take only minutes with
Beautiful Soup.

Source Files
Filename                        Size
beautifulsoup4-4.10.0.tar.gz    391 KB (399,890 bytes)
python-beautifulsoup4.changes   30.1 KB (30,859 bytes)
python-beautifulsoup4.spec      3.62 KB (3,702 bytes)
Revision 35 (latest revision is 40)
Dominique Leuenberger (dimstar_suse) accepted request 952905 from Steve Kowalik (StevenK) (revision 35)
- Update to 4.10.0:
  * This is the first release of Beautiful Soup to only support Python 3.
  * The behavior of methods like .get_text() and .strings now differs
    depending on the type of tag.
  * NavigableString and its subclasses now implement the get_text()
    method, as well as the properties .strings and
    .stripped_strings.
  * The 'html5' formatter now treats attributes whose values are the
    empty string as HTML boolean attributes.
  * The 'replace_with()' method now takes a variable number of arguments,
    and can be used to replace a single element with a sequence of elements.
  * Corrected output when the namespace prefix associated with a
    namespaced attribute is the empty string, as opposed to
    None.
  * Performance improvement when processing tags that speeds up overall
    tree construction by 2%. Patch by Morotti. [bug=1899358]
  * Corrected the use of special string container classes in cases when a
    single tag may contain strings with different containers; such as
    the <template> tag, which may contain both TemplateString objects
    and Comment objects.
  * The html.parser tree builder can now handle named entities
    found in the HTML5 spec in much the same way that the html5lib
    tree builder does.
  * Added a second way to specify encodings to UnicodeDammit and
    EncodingDetector, based on the order of precedence defined in the
    HTML5 spec.
  * Improved the warning issued when a directory name (as opposed to
    the name of a regular file) is passed as markup into the BeautifulSoup
    constructor.
- Do not pass the directory to pytest.
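
A few of the 4.10.0 changes listed above can be seen directly in a short script. This sketch assumes Beautiful Soup 4.10.0 or later and uses made-up markup; it exercises replace_with() with more than one argument, get_text() and .stripped_strings called on a NavigableString, and the 'html5' formatter writing an empty-string attribute as a boolean one.

```python
from bs4 import BeautifulSoup

# replace_with() now accepts several arguments: one element becomes two here.
soup = BeautifulSoup("<p>Hello <b>old</b>!</p>", "html.parser")
new_tag = soup.new_tag("i")
new_tag.string = "new"
soup.b.replace_with(new_tag, " and more")  # plain str is wrapped as a NavigableString
print(soup.p)

# A NavigableString now supports get_text() and .stripped_strings directly.
greeting = BeautifulSoup("<p>  Hi there  </p>", "html.parser").p.string
print(repr(greeting.get_text()))
print(list(greeting.stripped_strings))

# The "html5" formatter writes an empty-string attribute as a bare boolean one.
form = BeautifulSoup('<input disabled="">', "html.parser")
print(form.input.decode(formatter="html5"))
print(form.input.decode())  # default formatter, for comparison
```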
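
The new encoding precedence for UnicodeDammit can be sketched as follows, assuming 4.10.0 or later, where it is exposed through the known_definite_encodings and user_encodings parameters. Encodings in known_definite_encodings are tried before byte-order-mark sniffing, while user_encodings are tried only after it, so the same wrong guess ("latin-1") behaves differently depending on which list it goes into. The input bytes are invented for the example.

```python
import codecs

from bs4 import UnicodeDammit

# UTF-8 bytes for "café", prefixed with a UTF-8 byte-order mark (BOM).
data = codecs.BOM_UTF8 + "café".encode("utf-8")

# known_definite_encodings are tried before the BOM is consulted, so this
# (wrong) guess wins and the UTF-8 bytes decode as Latin-1 mojibake.
forced = UnicodeDammit(data, known_definite_encodings=["latin-1"])
print(forced.unicode_markup, forced.original_encoding)

# user_encodings are tried only after BOM sniffing, so the BOM's UTF-8 wins.
suggested = UnicodeDammit(data, user_encodings=["latin-1"])
print(suggested.unicode_markup, suggested.original_encoding)
```

In practice, known_definite_encodings suits cases where the encoding is known for certain (for example from an HTTP header), and user_encodings suits a mere hint.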