13 Commits (master)

Author SHA1 Message Date
anekos 6842ea906e Fix causing lxml error 4 years ago
Éloi Rivard e9acdd091b Use black to format the code 4 years ago
Éloi Rivard 0846955dd7 Fixed issue with self-closing tags. Fix #125 4 years ago
Yuri Baburov 494b19ed4e Merge branch 'master' into many_repeated_spaces_timeout 6 years ago
Linas Valiukas 63fbc36cb8 Close sample input file after reading it
Otherwise tests spit out:

    ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/pypt/Dropbox/etc-MediaCloud/python-readability/tests/samples/si-game.sample.html' mode='r' encoding='UTF-8'>
    return open(os.path.join(SAMPLES, filename)).read()
6 years ago
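The fix in this commit reads each sample through a context manager, so the file handle closes right after reading. Without it, the handle stays open until garbage collection, which is what produces the `ResourceWarning` above. Below is a minimal sketch of that pattern. The helper name `load_sample` and the `samples_dir` parameter are hypothetical. The UTF-8 encoding is taken from the warning text, and the default `SAMPLES` value is only a placeholder for the directory the test suite defines.

```python
import os

# Placeholder: the real tests define their own SAMPLES directory.
SAMPLES = "samples"


def load_sample(filename, samples_dir=SAMPLES):
    """Read a sample HTML file and close it as soon as it has been read."""
    # The `with` block closes the file even if read() raises, unlike
    # `open(...).read()`, which leaves closing to the garbage collector.
    with open(os.path.join(samples_dir, filename), encoding="utf-8") as f:
        return f.read()
```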
Linas Valiukas 747c46abce Trim many repeated spaces to make clean() faster
When Readability encounters many repeated whitespace, the cleanup
regexes in clean() take forever to run, so trim the amount of whitespace
to 255 characters.

Additionally, test the extracting performance with "timeout_decorator".
6 years ago
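The commit message describes capping long whitespace runs before the heavier cleanup regexes run. Here is a minimal sketch of that idea, not necessarily the project's exact code. The function name `trim_repeated_whitespace` and the `limit` parameter are hypothetical, and the 255 cap comes from the commit message.

```python
import re

# Cap taken from the commit message; the name is hypothetical.
MAX_WHITESPACE_RUN = 255


def trim_repeated_whitespace(text, limit=MAX_WHITESPACE_RUN):
    """Replace every whitespace run longer than `limit` with exactly `limit` spaces."""
    # One substitution pass up front means the later, more expensive
    # cleanup regexes never see pathologically long whitespace runs.
    # Note: a trimmed run becomes plain spaces, even if it contained newlines.
    return re.sub(r"\s{%d,}" % (limit + 1), " " * limit, text)
```

Runs of `limit` characters or fewer pass through unchanged. Only the pathological cases are shortened.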
Yuri Baburov 0e50b53d05 Release version 0.7 . Better HTML5 support and an important bugfix. 6 years ago
Mariusz Osiecki bf9e7404fa Failure if best_elem is root (fix #58) 9 years ago
Yuri Baburov dc648e7d0b Added a test for issue #48 but can't reproduce it -- seems to work fine. 9 years ago
Yuri Baburov 1fac7e685a Added a feature to allow more images per article (with a test) 9 years ago
Richard Harding 46f0302ebc rename the document_only flag to html_partial 12 years ago
Richard Harding 5a98e2c1b8 Correct appending and allow for document only
- Fix the appending of siblings to the correct nested element
- Add a document only flag so that you can get a dom tree you can nest
yourself without html/body tags.
12 years ago
Richard Harding edccec5d3b Work on why we have an empty <body/> tag
- Seems to come because the sanitizer ends up with two nodes, not one. The
first is an empty body, the second is the article div.
- Fix up the tabs so we can work with the file. Needs lots of pep8 love.
- Implement an initial hack that at least gets it working atm.
- Start to add test cases, sample html files we can test against, etc.
12 years ago