
readability_lxml
================

This is a Python port of a Ruby port of `arc90's readability`_.

Given an HTML document, it extracts the main body text and cleans it up.
It can also clean up the title, following the latest readability.js code.


Inspiration
-----------
- Latest readability.js ( https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js )
- Ruby port by starrhorne and iterationlabs
- Python port by gfxmonk ( https://github.com/gfxmonk/python-readability , based on BeautifulSoup )
- Decruft effort to move to lxml ( http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/ )
- "BR to P" fix from readability.js which improves quality for smaller texts.
- Contributions from GitHub users.


Installation
-------------
::

    $ easy_install readability-lxml
    # or
    $ pip install readability-lxml


Usage
------

Command Line Client
~~~~~~~~~~~~~~~~~~~
::

    $ readability http://pypi.python.org/pypi/readability-lxml
    $ readability /home/rharding/sampledoc.html

As a Library
~~~~~~~~~~~~
::

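    from readability.readability import Document
    import urllib

    url = 'http://pypi.python.org/pypi/readability-lxml'
    # Python 2 call; on Python 3 use urllib.request.urlopen instead
    html = urllib.urlopen(url).read()
    # Build the Document once and reuse it for both calls
    doc = Document(html)
    readable_article = doc.summary()
    readable_title = doc.short_title()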

Optional ``Document`` keyword arguments:

- attributes:
- debug: output debug messages
- min_text_length:
- retry_length:
- url: will allow adjusting links to be absolute
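The ``url`` option above is described as allowing links to be made absolute. As a rough illustration of what that means, the sketch below resolves relative ``href`` values against a base URL using only the Python 3 standard library. It is not the library's own implementation: the helper name ``absolutize_links`` and the sample markup are hypothetical, and the regular expression only handles double-quoted ``href`` attributes.

```python
import re
from urllib.parse import urljoin


def absolutize_links(html, base_url):
    """Rewrite double-quoted href values so they are absolute.

    Hypothetical helper for illustration only; readability_lxml does
    its own link handling when ``url`` is passed to ``Document``.
    """
    def fix(match):
        # urljoin leaves already-absolute URLs untouched
        return 'href="%s"' % urljoin(base_url, match.group(1))

    return re.sub(r'href="([^"]*)"', fix, html)


sample = (
    '<a href="/pypi/readability-lxml">package</a> '
    '<a href="http://example.com/">x</a>'
)
result = absolutize_links(sample, 'http://pypi.python.org/pypi/')
print(result)
```

With the library itself, the corresponding step would be to pass the page address as the ``url`` keyword argument when constructing ``Document``.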


History
-------

- `0.2.5` Update setup.py for uploading .tar.gz to pypi


.. _arc90's readability: http://lab.arc90.com/experiments/readability/