
204 Commits (master)
 

Author SHA1 Message Date
Marko Horvatic f0ff9b2425 Move logging.basicConfig to main function 9 years ago
Yuri Baburov e2bc1ea055 Improved #65, which produced a warning; added cssselect lib; bumped to 0.5.1 9 years ago
Yuri Baburov 1cb17d919b Merge pull request #65 from avalanchy/best_elem_is_root
Failure if best_elem is root (fix #58)
Thanks a lot @avalanchy and @jnothman !
9 years ago
Mariusz Osiecki bf9e7404fa Failure if best_elem is root (fix #58) 9 years ago
Martin Thurau 386e48d29b Fixes checking of declared encodings in get_encoding.
In Python 3, .decode() on bytes requires the name of the encoding to be a str, which means we have to convert the extracted encoding before we can use it.
9 years ago
Martin Thurau 046d2c10c3 Fixes regex declaration in get_encoding.
Since get_encoding() is only called when the input is *not* already unicode, we need to declare the regexes as bytes patterns so they continue to work in Python 3.
9 years ago
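The two get_encoding fixes above can be illustrated with a minimal sketch. It is not the library's code: `CHARSET_RE`, `declared_encodings` and `decode_with_declared` are hypothetical names, and the pattern only covers a simple `charset=` declaration. It shows a bytes regex applied to bytes input, and the extracted encoding name converted to str before it is passed to `.decode()`.

```python
import re

# Bytes pattern (hypothetical, simplified): on Python 3 a str pattern
# cannot be applied to bytes input, so the regex itself must be bytes.
CHARSET_RE = re.compile(br'<meta[^>]+charset=["\']?([\w-]+)', re.I)


def declared_encodings(page):
    """Return encoding names declared in a bytes page, as str."""
    # findall on bytes yields bytes; bytes.decode() needs a str encoding
    # name, so each match is converted first.
    return [name.decode("ascii") for name in CHARSET_RE.findall(page)]


def decode_with_declared(page, default="utf-8"):
    """Try each declared encoding, then fall back to a default."""
    for name in declared_encodings(page):
        try:
            return page.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue  # unknown or wrong declaration, try the next one
    return page.decode(default, "replace")
```

Passing the raw bytes match straight to `page.decode()` would raise a TypeError on Python 3, which is the failure the first of the two commits describes.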
Martin Thurau ce7ca26835 Adds compatibility `raise_with_traceback` method to support different `raise` syntax
Unfortunately, the Python 2 `raise` syntax is not supported in Python 3.3 and not in all 3.4.x versions, so we handle that with conditional imports and a compatibility layer.
9 years ago
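A minimal sketch of what the Python 3 side of such a compatibility helper could look like. The signature of `raise_with_traceback` here is an assumption, and `Unparseable` and `parse` are hypothetical names used only for the demonstration. The Python 2 counterpart would use the three-argument `raise` statement, which is a syntax error on Python 3 and therefore has to live in a separate, conditionally imported module.

```python
import sys


def raise_with_traceback(exc_type, traceback, *args, **kwargs):
    """Python 3 variant: raise a new exception carrying an old traceback."""
    raise exc_type(*args, **kwargs).with_traceback(traceback)


class Unparseable(ValueError):
    """Hypothetical error type for the demonstration."""


def parse(text):
    try:
        return int(text)
    except ValueError:
        # Keep the traceback of the original failure on the new exception.
        tb = sys.exc_info()[2]
        raise_with_traceback(Unparseable, tb, "could not parse %r" % (text,))
```

Callers use the same helper name on both major versions, so only the import decides which `raise` form is compiled.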
Martin Thurau 3ac56329e2 Corrects some things where 2to3 did too much. 9 years ago
Martin Thurau aa4132f57a Adds Python 3.4 support.
Code now supports Python 2.6, 2.7 and 3.4. Python 3.3 isn't supported
because of some issues with the parser and the difference between old and
new `raise` syntax.
9 years ago
Martin Thurau 13cca1dd19 Adds tox configuration.
Adds tox.ini to support running the tests on multiple versions. Adds
requirements.txt to support dependency installation via pip.
9 years ago
Yuri Baburov 1d4ee9d421 Releasing as version 0.5 9 years ago
Yuri Baburov 987570bef0 Updated package links for Python 2.7 and Python 3 support 9 years ago
Yuri Baburov dc648e7d0b Added a test for issue #48 but can't reproduce it -- seems to work fine. 9 years ago
Yuri Baburov c715426584 Releasing as version 0.4 9 years ago
Yuri Baburov 1fac7e685a Added a feature to allow more images per article (with a test) 9 years ago
Yuri Baburov c6796195a7 Fixed makefile testing. 9 years ago
Miguel Galves d04d41b749 Insert text inside iframe for correct output 9 years ago
Miguel Galves be2a1c4646 Let width and height attributes 9 years ago
Miguel Galves f1759c1404 Allows iframes containing youtube or vimeo videos. People like them 9 years ago
Yuri Baburov 332ad810de Bumped to 0.3.0.6 9 years ago
Yuri Baburov e4bcbe57d7 Fixes #53 9 years ago
Yuri Baburov aeb4f4c782 Merge pull request #59 from seomoz/mac_10_10
Fix mac version comparison in setup.py for 10.10
9 years ago
Matthew Peters c8c2f8809c Fix mac version comparison in setup.py for 10.10 9 years ago
Yuri Baburov 2d4cfdb2c8 Merge pull request #56 from nathanathan/patch-1
Defaulting to utf-8 when chardet returns None
10 years ago
Nathan Breit 75e2e0cb3a Defaulting to utf-8 when chardet returns None
On articles like this one chardet returns None:
http://news.zing.vn/nhip-song-tre/thay-giao-gay-sot-tung-bo-luat-tinh-yeu/a291427.html
This causes exceptions later on when encoding.lower() is called
10 years ago
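The fallback described above can be sketched as follows. `pick_encoding` is a hypothetical helper, not the library's function; it takes a chardet-style detection result (a dict with an `encoding` key that may be None) so the example runs without chardet installed.

```python
def pick_encoding(detection, default="utf-8"):
    """Return a lower-cased encoding name, falling back when none was detected."""
    # When detection fails, the "encoding" value is None; calling .lower()
    # on it directly would raise AttributeError.
    encoding = detection.get("encoding") or default
    return encoding.lower()
```

For example, `pick_encoding({"encoding": None, "confidence": 0.0})` yields the default instead of raising.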
Yuri Baburov 0c2f29ed0d Version bump. 10 years ago
Yuri Baburov 638f73f6a2 Fix for #52: <input type="hidden"> elements are no longer counted by the "form removal" heuristic. 10 years ago
Yuri Baburov 2fab5ffa6b Merge pull request #48 from mperdomo1/master
Added code to check declared encodings first
10 years ago
Mark Perdomo 3a43a3fe7e Added code to check declared encodings first and check them
from kennethreitz/requests/utils.py.  Also I added some superset
encodings I have found in Chinese pages that are mishandled by
chardet/character declarations.
10 years ago
Yuri Baburov 1a4d3697bc Allow latest lxml on Mac OS X 10.9, see issue #39 for comments and setup instructions 10 years ago
Yuri Baburov d8595b7103 Quickfix for #41 11 years ago
Yuri Baburov 318f25c577 Minor fix in encoding guessing. Claiming it v0.3.0.1 11 years ago
Yuri Baburov 08658d1d31 Released v0.3 and uploaded to PyPI. 11 years ago
Yuri Baburov 4e3192f5ab Merge pull request #29 from hush-hush/master
Make lxml clean tree available for user modifications
12 years ago
hush-hush e2e78e4d55 Make lxml clean tree available for user modifications. 12 years ago
Yuri Baburov c923995606 Merge pull request #27 from sunlightlabs/master
Simple guard for empty title elements. Thanks, dvogel!
12 years ago
Drew Vogel fdba8d9e11 Added check on title.text to avoid a TypeError on None. 12 years ago
Yuri Baburov 9cd5fb6226 Bump to 0.2.6.1 12 years ago
Yuri Baburov 44915518d3 Merge pull request #24 from zacharydenton/master
Fix issue 22: all titles were blank.
12 years ago
Zach Denton 0843d9cdf2 Explicitly check if title is None. fixes #22
This fixes #22 which caused all titles to be blank.
12 years ago
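A rough sketch of an explicit None check on a title element. It uses the standard library's `xml.etree.ElementTree` rather than lxml so that it is self-contained; `get_title` and the `"[no-title]"` placeholder are assumptions for illustration. The explicit comparisons matter because a missing element and an element without text both surface as None, and relying on the truth value of an element is unreliable (an element with no children tests as false).

```python
import xml.etree.ElementTree as ET


def get_title(doc, placeholder="[no-title]"):
    """Return the stripped <title> text, or a placeholder if absent or empty."""
    title = doc.find(".//title")
    # Compare with None explicitly instead of truth-testing the element.
    if title is None or title.text is None:
        return placeholder
    text = title.text.strip()
    return text if text else placeholder
```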
Yuri Baburov 8aefc6175f Updated README with 0.2.6 changes. 12 years ago
Yuri Baburov 20d5f3a73a Bump to 0.2.6 12 years ago
Yuri Baburov 2e49e34e11 Merge pull request #20 from andreypopp/master
readability.htmls: some docs do not have title elem
12 years ago
Andrey Popp 95852d5c18 readability.htmls: some docs do not have title elem 12 years ago
Yuri Baburov 274b60cdb1 Merge pull request #19 from EvaSDK/master
Package that provides source code
12 years ago
Gilles Dartiguelongue ea6afd3d49 Make sure code is actually distributed 12 years ago
Richard Harding a19e766900 Update version so we can upload new tar.gz to pypi 12 years ago
Richard Harding b9f6f6777f Merge branch 'master' of github.com:buriy/python-readability 12 years ago
Richard Harding 873562cfba Update setup.py for finding the package correctly 12 years ago
Richard Harding e9a5cbfe7f Remove pdb dummy 12 years ago