Adrien Barbaresi
8ea6a20e01
Skip missing interpreters in tox.ini
4 years ago
Adrien Barbaresi
a98151e6dd
Extended travis config
- Python versions added (3.9, pypy)
- OS added (MacOS, 2 different versions)
4 years ago
Yuri Baburov
615ce803c6
Merge pull request #124 from dariobig/patch-1
Catch LookupError in case of bad encoding string
4 years ago
Yuri Baburov
52f767c812
Update __init__.py
4 years ago
Yuri Baburov
c24808fbb2
Update README.rst
4 years ago
Yuri Baburov
da9e285f73
Merge pull request #128 from azmeuk/self-closing
Replaced XHTML output with HTML5 output in summary for empty elements (a, br), issue #125
4 years ago
Yuri Baburov
5032e2d3ab
Merge pull request #127 from azmeuk/warnings
Fixed a few regex warnings, thanks azmeuk !
4 years ago
Yuri Baburov
471d89dde9
Merge pull request #126 from azmeuk/py38
Added official python 3.8 support, dropped python 3.4 support.
Thanks Éloi Rivard (@azmeuk) !
4 years ago
Yuri Baburov
4980b0c141
Merge branch 'master' into py38
4 years ago
Yuri Baburov
331b58ef50
Merge pull request #129 from azmeuk/doc
Added basic documentation
4 years ago
Éloi Rivard
f9977b727d
Documentation draft
4 years ago
Éloi Rivard
0846955dd7
Fixed issue with self-closing tags. Fix #125
4 years ago
Éloi Rivard
6c1c6391e2
Fixed a few regex warnings
4 years ago
Éloi Rivard
326fb43b4c
Drop support for python 3.4 - Add support for python 3.8
4 years ago
Dario
0442358942
Catch LookupError in case of bad encoding string
I've seen cases where bad encoding strings result in errors; catching LookupError should solve the problem by falling back to `chardet` or `utf-8`.
Here's one case:
```
textPayload: "Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 189, in summary
    self._html(True)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 132, in _html
    self.html = self._parse(self.input)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 141, in _parse
    doc, self.encoding = build_doc(input)
  File "/opt/conda/lib/python3.7/site-packages/readability/htmls.py", line 17, in build_doc
    encoding = get_encoding(page) or 'utf-8'
  File "/opt/conda/lib/python3.7/site-packages/readability/encoding.py", line 46, in get_encoding
    page.decode(encoding)
LookupError: unknown encoding: utf-8, ie=edge, chrome=1
```
5 years ago
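The fallback described in this commit can be sketched as follows. This is a minimal illustration, not readability-lxml's actual `get_encoding()`: the helper name `decode_with_fallback` is hypothetical, and it falls back straight to UTF-8 instead of consulting `chardet`, so it needs only the standard library.

```python
def decode_with_fallback(page: bytes, declared: str) -> str:
    """Decode page using a declared charset, falling back to UTF-8.

    Hypothetical helper; readability-lxml's real code differs.
    """
    try:
        # An unknown codec name such as "utf-8, ie=edge, chrome=1" raises
        # LookupError; a known codec that cannot decode the bytes raises
        # UnicodeDecodeError instead.
        return page.decode(declared)
    except (LookupError, UnicodeDecodeError):
        # Fallback: decode as UTF-8, replacing any undecodable bytes.
        return page.decode("utf-8", errors="replace")
```

With the malformed charset from the traceback above, `decode_with_fallback("café".encode("utf-8"), "utf-8, ie=edge, chrome=1")` no longer raises and returns the decoded text.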
Yuri Baburov
de20908e57
Update README.rst
5 years ago
Yuri Baburov
4fa85d2778
Merge pull request #116 from baby5/master
Fixed compile_pattern to support uppercase.
5 years ago
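A keyword pattern that also matches uppercase class names might look like the sketch below. It assumes the fix amounts to case-insensitive matching (`re.IGNORECASE`) over a comma-separated keyword list; `compile_keywords` is a hypothetical name and the body is illustrative, not copied from readability-lxml's `compile_pattern`.

```python
import re
from typing import Iterable, Optional, Pattern, Union


def compile_keywords(elements: Union[str, Iterable[str], None]) -> Optional[Pattern[str]]:
    """Hypothetical helper: build one case-insensitive regex from keywords."""
    if not elements:
        return None
    if isinstance(elements, str):
        # Accept "news, Article" as well as ["news", "Article"].
        elements = elements.split(",")
    words = [re.escape(w.strip()) for w in elements if w.strip()]
    if not words:
        return None
    # re.IGNORECASE lets "Article" match "main-article-body" and vice versa.
    return re.compile("|".join(words), re.IGNORECASE)
```

Escaping each keyword keeps characters such as `.` or `+` in a keyword from being interpreted as regex syntax.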
baby5
0ac3c5bbc6
Fix compile_pattern not supporting uppercase
5 years ago
Yuri Baburov
a4ac1c7704
Merge pull request #115 from johnklee/Issue99
Fix #99 - Hide exceptions raised during "a href" normalization; added a handle_failures parameter that defaults to "discard" for bad URLs.
5 years ago
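The "discard bad URLs" behaviour can be illustrated with a standard-library sketch. The helper `absolutize_links` is hypothetical; only the `"discard"` default comes from the commit message, and the `None` (re-raise) and `"ignore"` (keep the href unchanged) branches are assumptions modeled on lxml.html's `make_links_absolute(handle_failures=...)`.

```python
from typing import List, Optional
from urllib.parse import urljoin


def absolutize_links(hrefs: List[str], base_url: str,
                     handle_failures: Optional[str] = "discard") -> List[str]:
    """Hypothetical helper: resolve relative hrefs against base_url.

    None re-raises, "ignore" keeps the raw href, "discard" drops it.
    """
    result = []
    for href in hrefs:
        try:
            result.append(urljoin(base_url, href))
        except ValueError:
            # e.g. "http://[::1" makes urljoin raise ValueError("Invalid IPv6 URL")
            if handle_failures is None:
                raise
            if handle_failures == "ignore":
                result.append(href)
            # "discard": skip the malformed URL entirely
    return result
```

For example, `absolutize_links(["/a", "http://[::1"], "http://example.com/dir/")` keeps the first link as an absolute URL and silently drops the malformed one instead of aborting the whole page.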
jkclee
bac691a0a4
Fix #99
5 years ago
Yuri Baburov
3cbede6be4
Update README.rst
6 years ago
Yuri Baburov
d40c4dd34d
Update README.rst
6 years ago
Yuri Baburov
9aba330e68
Update README.rst
6 years ago
Yuri Baburov
0b28643f0d
Update README.rst
6 years ago
Yuri Baburov
59b99ffa0b
Merge pull request #105 from pypt/many_repeated_spaces_timeout
Trim many repeated spaces to make clean() faster
6 years ago
Yuri Baburov
494b19ed4e
Merge branch 'master' into many_repeated_spaces_timeout
6 years ago
Yuri Baburov
dca6e2197a
Merge pull request #107 from pypt/module_version_constant
Add __version__ constant to __init__.py, read it in setup.py
6 years ago
Yuri Baburov
5215ab657b
Merge pull request #106 from pypt/python_3_7
Improvements for Python 3.7 support and CI
6 years ago
Linas Valiukas
68fb5ad4c0
Try a workaround to make build work on 3.7
https://github.com/travis-ci/travis-ci/issues/9815
6 years ago
Linas Valiukas
34fce7664d
Update Python version in .travis.yml
6 years ago
Linas Valiukas
0233936e72
Add __version__ constant to __init__.py, read it in setup.py
Users won't need to install, import and use setuptools' "pkg_resources" to
find out which version of readability-lxml is being used.
6 years ago
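One common way for setup.py to read a `__version__` constant without importing the package is a regex over `__init__.py`. The sketch below assumes that approach; `read_version` is a hypothetical helper, and readability-lxml's setup.py may do this differently.

```python
import re
from pathlib import Path


def read_version(init_path: Path) -> str:
    """Extract __version__ from a package's __init__.py without importing it.

    Hypothetical helper; avoids importing the package (and its
    dependencies) at install time.
    """
    text = init_path.read_text(encoding="utf-8")
    match = re.search(r"""^__version__\s*=\s*['"]([^'"]+)['"]""", text, re.MULTILINE)
    if match is None:
        raise RuntimeError(f"no __version__ found in {init_path}")
    return match.group(1)
```

In setup.py this would be passed as `version=read_version(Path("readability/__init__.py"))`, while users simply check `readability.__version__` at runtime.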
Linas Valiukas
63fbc36cb8
Close sample input file after reading it
Otherwise tests spit out:
ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/pypt/Dropbox/etc-MediaCloud/python-readability/tests/samples/si-game.sample.html' mode='r' encoding='UTF-8'>
return open(os.path.join(SAMPLES, filename)).read()
6 years ago
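The fix amounts to reading the file inside a `with` block so it is closed as soon as `read()` returns. A minimal sketch follows; the `samples_dir` parameter and the explicit UTF-8 encoding are illustrative assumptions, and `SAMPLES` here only mirrors the `tests/samples` path shown in the warning.

```python
import os

# Assumed location, mirroring the path in the ResourceWarning above.
SAMPLES = os.path.join("tests", "samples")


def load_sample(filename: str, samples_dir: str = SAMPLES) -> str:
    """Read a sample file; the with-block closes it, so no ResourceWarning."""
    with open(os.path.join(samples_dir, filename), encoding="utf-8") as f:
        return f.read()
```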
Linas Valiukas
bdb6d671d8
Test with Python 3.7 on Travis
6 years ago
Linas Valiukas
34d198fe5a
Add Python 3.7 classifier
6 years ago
Linas Valiukas
2bbb70b3e5
Fix Travis build
Add "test" extra and install dependencies for said extra as detailed in:
https://stackoverflow.com/a/41398850/200603
6 years ago
Linas Valiukas
747c46abce
Trim many repeated spaces to make clean() faster
When Readability encounters long runs of repeated whitespace, the cleanup
regexes in clean() take forever to run, so trim each run of whitespace
to 255 characters.
Additionally, test the extraction performance with "timeout_decorator".
6 years ago
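The trimming step can be sketched with a single substitution that caps every whitespace run at 255 characters before any heavier cleanup regexes run. This is an illustrative sketch, not the exact code in `clean()`; `trim_whitespace_runs` and `MAX_WHITESPACE` are hypothetical names.

```python
import re

MAX_WHITESPACE = 255  # cap taken from the commit message
_LONG_WHITESPACE = re.compile(r"\s{%d,}" % MAX_WHITESPACE)


def trim_whitespace_runs(text: str) -> str:
    """Replace any run of MAX_WHITESPACE or more whitespace characters
    with exactly MAX_WHITESPACE spaces; shorter runs are left untouched."""
    return _LONG_WHITESPACE.sub(" " * MAX_WHITESPACE, text)
```

Bounding run length means later regexes never see input where a single whitespace run is thousands of characters long, which, per the commit message, is what made `clean()` so slow.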
Yuri Baburov
8235f0794c
Trying to pass travis tests.
6 years ago
Yuri Baburov
f7f439d019
Improved positive_keywords and negative_keywords processing for the CLI
6 years ago
Yuri Baburov
0c8f040d53
Updated docs for positive_keywords and negative_keywords, cleaner implementation.
6 years ago
Yuri Baburov
0e50b53d05
Release version 0.7. Better HTML5 support and an important bugfix.
6 years ago
Yuri Baburov
537de2b8f6
Improved remove_unlikely_candidates following advice from issue #102
6 years ago
Yuri Baburov
97e86c4559
Merge pull request #101 from hugovk/add-3.5-3.6
Add support for Python 3.5 and 3.6, drop support for Python 3.3 and 2.6
7 years ago
Hugo
f4a04732fd
Workaround for py35
7 years ago
Hugo
4172699812
Add Python 3.5 and 3.6
7 years ago
Hugo
f74adc6893
Drop support for EOL Python 3.3
7 years ago
Hugo
27159f45b3
Drop support for EOL Python 2.6
7 years ago
Yuri Baburov
78cac34bb3
Merge pull request #96 from ccurvey/master
fix encoding detection to use the encoding being tested
7 years ago
Chris Curvey
9a31587192
fix encoding detection to use the encoding being tested
7 years ago
Yuri Baburov
e4efc87a20
Update readability.py
8 years ago
Yuri Baburov
b20d5c15ef
Improved Document class documentation
8 years ago