readability_lxml
================

This is a Python port of a Ruby port of the `arc90's readability`_ project.

Given an HTML document, it pulls out the main body text and cleans it up.
It can also clean up the title, based on the latest readability.js code.


Inspiration
-----------
 - Latest readability.js ( https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js )
 - Ruby port by starrhorne and iterationlabs
 - Python port by gfxmonk ( https://github.com/gfxmonk/python-readability , based on BeautifulSoup )
 - Decruft effort to move to lxml ( http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/ )
 - "BR to P" fix from readability.js which improves quality for smaller texts.
 - Contributions from GitHub users.


Installation
-------------
::

    $ easy_install readability-lxml
    # or
    $ pip install readability-lxml


Usage
------

Command Line Client
~~~~~~~~~~~~~~~~~~~
::

    $ readability http://pypi.python.org/pypi/readability-lxml
    $ readability /home/rharding/sampledoc.html

As a Library
~~~~~~~~~~~~
::

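    from readability.readability import Document
    import urllib  # Python 2; on Python 3 use urllib.request.urlopen instead

    url = 'http://pypi.python.org/pypi/readability-lxml'  # page to extract
    html = urllib.urlopen(url).read()
    # Main body of the page, cleaned up
    readable_article = Document(html).summary()
    # Cleaned-up page title
    readable_title = Document(html).short_title()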
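
The snippet above targets Python 2, where ``urllib.urlopen`` exists. On Python 3 the same function lives in ``urllib.request``. A minimal sketch of a fetch helper that runs on both follows; ``fetch_html`` is a hypothetical name used for illustration and is not part of readability-lxml:

```python
try:
    from urllib.request import urlopen  # Python 3
except ImportError:
    from urllib import urlopen  # Python 2


def fetch_html(url):
    """Return the raw bytes served at ``url`` (hypothetical helper)."""
    response = urlopen(url)
    try:
        return response.read()
    finally:
        # Always release the connection, even if read() fails
        response.close()
```

The result can then be passed to ``Document`` in place of ``html``.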

Optional ``Document`` keyword arguments:

- attributes:
- debug: output debug messages
- min_text_length:
- retry_length:
- url: the document's URL, which allows relative links to be adjusted to absolute ones
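
To illustrate what making links absolute means for the ``url`` option, here is a minimal standard-library sketch. It is not the library's implementation: ``absolutize`` and the sample URLs are hypothetical and only show how relative links resolve against a base URL:

```python
try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2


def absolutize(base_url, links):
    """Resolve each link against base_url (illustrative helper only)."""
    return [urljoin(base_url, link) for link in links]


# Hypothetical page URL and links found in that page
base = "http://example.com/articles/post.html"
links = ["img/chart.png", "/about", "http://other.example.org/page"]
for resolved in absolutize(base, links):
    print(resolved)
```

Relative paths are resolved against the page's location, while links that are already absolute are left unchanged.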


History
-------

 - ``0.2.5`` Update setup.py for uploading .tar.gz to PyPI


.. _arc90's readability: http://lab.arc90.com/experiments/readability/