.. image:: https://travis-ci.org/buriy/python-readability.svg?branch=master
    :target: https://travis-ci.org/buriy/python-readability


python-readability
==================

Given an HTML document, it extracts the main body text and cleans it up.

This is a Python port of a Ruby port of `arc90's readability
project <http://lab.arc90.com/experiments/readability/>`__.

Installation
------------

Install it with ``pip``:

::

    $ pip install readability-lxml

Usage
-----

::

    >>> import requests
    >>> from readability import Document
    >>>
    >>> response = requests.get('http://example.com')
    >>> doc = Document(response.text)
    >>> doc.title()
    'Example Domain'
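
To get the cleaned-up body text itself, ``Document.summary()`` returns the
extracted content as an HTML string. The following is a minimal sketch that
runs without network access. The sample markup, the ``article-body`` class
name and the variable names are made up for illustration. Passing
``positive_keywords`` as a comma-separated string is assumed to be accepted
by the installed version, and the exact markup returned depends on the
library's heuristics:

::

    from readability import Document

    # Hypothetical sample page: a navigation sidebar plus the real article.
    html = """
    <html>
      <head><title>Sample Article</title></head>
      <body>
        <div id="sidebar"><a href="/a">Home</a> <a href="/b">About</a></div>
        <div class="article-body">
          <p>This paragraph stands in for the main body of an article, and it
          is long enough, with a few commas, for the extraction heuristics to
          treat it as real content rather than as page decoration.</p>
          <p>A second paragraph follows, again with enough words, commas, and
          sentences to look like ordinary article text to the scorer, so that
          the enclosing block is a plausible candidate.</p>
        </div>
      </body>
    </html>
    """

    doc = Document(html)
    title = doc.title()      # text of the <title> element
    summary = doc.summary()  # HTML string holding the extracted main content

    # Assumption: a comma-separated string of class/id patterns to boost.
    boosted = Document(html, positive_keywords="article-body").summary()

    print(title)
    print(summary)

``negative_keywords`` takes the same kind of value and lowers the score of
matching elements instead.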

Change Log
----------

-  0.7 Improved handling of HTML5 tags. Heuristics were changed for many
   sites. Fixed an important bug with stripping unwanted HTML nodes
   (previously only the first matching node was removed).
-  0.6 Finally a release which supports Python versions 2.6, 2.7, 3.3
   and 3.4
-  0.5 Preparing a release to support Python versions 2.6, 2.7, 3.3 and
   3.4
-  0.4 Added video loading and allowed more images per paragraph
-  0.3 Added Document.encoding, positive\_keywords and
   negative\_keywords

Licensing
---------

This code is licensed under `the Apache License
2.0 <http://www.apache.org/licenses/LICENSE-2.0>`__.

Thanks to
---------

-  Latest
   `readability.js <https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js>`__
-  Ruby port by starrhorne and iterationlabs
-  `Python port <https://github.com/gfxmonk/python-readability>`__ by
   gfxmonk
-  `Decruft
   effort <http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/>`__
   to move to lxml
-  "BR to P" fix from readability.js which improves quality for smaller
   texts
-  Contributions from GitHub users.