Update README to be a rst file and clean up a little bit.

12 years ago · 58c69651d3
parent 8b0210c4dc
commit 58c69651d3
2 changed files with 39 additions and 29 deletions
--- a/README.rst
+++ b/README.rst
@@ -1,14 +1,14 @@
-This code is under the Apache License 2.0.  http://www.apache.org/licenses/LICENSE-2.0
+readability_lxml
 ================
-This is a python port of a ruby port of arc90's readability project
+This is a python port of a ruby port of `arc90's readability`_ project
 http://lab.arc90.com/experiments/readability/
 In few words,
 Given a html document, it pulls out the main body text and cleans it up.
 It also can clean up title based on latest readability.js code.
-Based on:
+
 Inspiration
 -----------
 - Latest readability.js ( https://github.com/MHordecki/readability-redux/blob/master/readability/readability.js )
 - Ruby port by starrhorne and iterationlabs
 - Python port by gfxmonk ( https://github.com/gfxmonk/python-readability , based on BeautifulSoup )
@@ -16,13 +16,29 @@ Based on:
 - "BR to P" fix from readability.js which improves quality for smaller texts.
 - Github users contributions.
 Installation::
-    easy_install readability-lxml
+Installation
-    or
+-------------
-    pip install readability-lxml
+::
    $ easy_install readability-lxml
    # or
    $ pip install readability-lxml
 Usage
 ------
-Usage::
+Command Line Client
 ~~~~~~~~~~~~~~~~~~~
 ::
    $ readability http://pypi.python.org/pypi/readability-lxml
    $ readability /home/rharding/sampledoc.html
 As a Library
 ~~~~~~~~~~~~
 ::
    from readability.readability import Document
    import urllib
@@ -30,21 +46,19 @@ Usage::
    readable_article = Document(html).summary()
    readable_title = Document(html).short_title()
-Command-line usage::
+Optional `Document` keyword argument:
    python -m readability.readability -u http://pypi.python.org/pypi/readability-lxml
-Document() kwarg options:
+- attributes:
 - debug: output debug messages
 - min_text_length:
 - retry_length:
 - url: will allow adjusting links to be absolute
 History
 -------
-Updates
+ - `0.2.5`` Update setup.py for uploading .tar.gz to pypi
 - 0.2.5 Update setup.py for uploading .tar.gz to pypi
 .. _arc90's readability: http://lab.arc90.com/experiments/readability/
--- a/src/readability_lxml/readability.py
+++ b/src/readability_lxml/readability.py
@@ -102,11 +102,6 @@ class Document:
        self.options = options
        self.html = None
    def _html(self, force=False):
        if force or self.html is None:
            self.html = self._parse(self.input_doc)
        return self.html
    def _parse(self, input_doc):
        doc = build_doc(input_doc)
        doc = html_cleaner.clean_html(doc)
@@ -136,7 +131,8 @@ class Document:
        try:
            ruthless = True
            while True:
-                self._html(True)
+                self.html = self._parse(self.input_doc)
                for i in self.tags(self.html, 'script', 'style'):
                    i.drop_tree()
                for i in self.tags(self.html, 'body'):
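In the readability.py hunk, `self._html(True)` is replaced with `self.html = self._parse(self.input_doc)`. Because `_html` re-parses whenever `force` is true, the two lines should have the same effect. Each pass of the `while True` retry loop starts from a fresh parse of `input_doc`, so nodes removed by `drop_tree()` on an earlier pass are present again on the next one.

The minimal sketch below illustrates that equivalence. `SketchDocument` and `drop_scripts` are hypothetical names. Its `_parse` only copies a list of tag names rather than building and cleaning an lxml tree.

```python
class SketchDocument:
    """Hypothetical stand-in for readability's Document class.

    Only the caching logic of `_html` mirrors the diff; `_parse` here just
    copies a list of tag names instead of building and cleaning an lxml tree.
    """

    def __init__(self, input_doc):
        self.input_doc = input_doc
        self.html = None
        self.parse_calls = 0  # counts parses, for illustration only

    def _parse(self, input_doc):
        self.parse_calls += 1
        return list(input_doc)  # a fresh, independent "tree" on every call

    def _html(self, force=False):
        # Parse lazily, or re-parse unconditionally when force=True.
        if force or self.html is None:
            self.html = self._parse(self.input_doc)
        return self.html


def drop_scripts(tree):
    # Destructive, in-place cleanup standing in for drop_tree() on <script>/<style>.
    tree[:] = [tag for tag in tree if tag not in ("script", "style")]


# Before the commit: a forced call to the caching helper.
old = SketchDocument(["body", "script", "p"])
old._html()
drop_scripts(old.html)  # a first pass mutates the cached tree
old._html(True)         # mirrors: self._html(True)

# After the commit: assign a fresh parse directly.
new = SketchDocument(["body", "script", "p"])
new.html = new._parse(new.input_doc)
drop_scripts(new.html)
new.html = new._parse(new.input_doc)  # mirrors: self.html = self._parse(self.input_doc)

# Either way, the retry starts from an unmodified parse of the input.
print(old.html, new.html)
```

Both objects end up holding the full, unmodified tag list after two parses each. The cleanup from the first pass never reaches `input_doc` itself.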