You cannot select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
 
 
 
Go to file
Richard Harding 93ac1111a1 Add try it out to the readable server 12 years ago
src PEP8 linting, so so close 12 years ago
.gitignore Move the module into the readable_lxml space so that we can actually import it nicely. 12 years ago
CREDITS Add credits file 12 years ago
LICENSE Add a license file 12 years ago
Makefile Make sure we update both version strings until we can figure out how to pull it into the setup.py by magic 12 years ago
README.rst Add try it out to the readable server 12 years ago
setup.py Fix setup.py to pull the rst readme 12 years ago

README.rst

readability_lxml
================

This is a python port of a ruby port of `arc90's readability`_ project

Given a html document, it pulls out the main body text and cleans it up.
It also can clean up title based on latest readability.js code.


Inspiration
-----------
- Latest readability.js ( https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js )
- Ruby port by starrhorne and iterationlabs
- Python port by gfxmonk ( https://github.com/gfxmonk/python-readability , based on BeautifulSoup )
- Decruft effort to move to lxml ( http://www.minvolai.com/blog/decruft-arc90s-readability-in-python/ )
- "BR to P" fix from readability.js which improves quality for smaller texts.
- Github users contributions.


Try it out!
-----------
You can try out the parser by entering your test urls on the following test
service.

http://readable.bmark.us


Installation
-------------
::

    $ easy_install readability-lxml
    # or
    $ pip install readability-lxml


Usage
------

Command Line Client
~~~~~~~~~~~~~~~~~~~
::

    $ readability http://pypi.python.org/pypi/readability-lxml
    $ readability /home/rharding/sampledoc.html

As a Library
~~~~~~~~~~~~
::

    from readability.readability import Document
    import urllib
    html = urllib.urlopen(url).read()
    readable_article = Document(html).summary()
    readable_title = Document(html).short_title()

Optional `Document` keyword argument:

- attributes:
- debug: output debug messages
- min_text_length:
- retry_length:
- url: will allow adjusting links to be absolute


History
-------

- `0.2.5` Update setup.py for uploading .tar.gz to pypi


.. _arc90's readability: http://lab.arc90.com/experiments/readability/