Commit Graph

187 Commits (14d4474f33446e5ce08d6fc5750b17092311844b)
 

Author SHA1 Message Date
Adrien Barbaresi 14d4474f33 add coverage tests 4 years ago
Yuri Baburov 5a74140fdb
Merge pull request #132 from azmeuk/readme
Syntax highlight the README
4 years ago
Yuri Baburov 07f6861ece
Merge pull request #135 from adbar/master
unnecessary imports removed
added lines for conformity and readability
linted code parts
4 years ago
Adrien Barbaresi bd8293eb63 code linting 4 years ago
Yuri Baburov 17ffad5a26
Merge pull request #134 from adbar/patch-1
Extended travis config:
 - Python versions added (3.9, pypy)
 - OS added (MacOS, 2 different versions)
4 years ago
Yuri Baburov baf03e0d8e Update .travis.yml 4 years ago
Yuri Baburov 8c122cc862 Update .travis.yml 4 years ago
Yuri Baburov 28db33a1ad Update .travis.yml 4 years ago
Yuri Baburov 44ee1c4a87 Update .travis.yml 4 years ago
Adrien Barbaresi 9a85102555 Set TOXENV for macOS tests 4 years ago
Adrien Barbaresi 8ea6a20e01 Skip missing interpreters in tox.ini 4 years ago
Adrien Barbaresi a98151e6dd
Extended travis config
- Python versions added (3.9, pypy)
- OS added (MacOS, 2 different versions)
4 years ago
Éloi Rivard 0556abb794 Syntax highlight the README 4 years ago
Yuri Baburov 615ce803c6
Merge pull request #124 from dariobig/patch-1
Catch LookupError in case of bad encoding string
4 years ago
Yuri Baburov 52f767c812 Update __init__.py 4 years ago
Yuri Baburov c24808fbb2 Update README.rst 4 years ago
Yuri Baburov da9e285f73
Merge pull request #128 from azmeuk/self-closing
Replaced XHTML output with HTML5 output in summary for empty elements (a, br), issue #125
4 years ago
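This change concerns how empty elements are serialized. The sketch below uses the standard library's `xml.etree.ElementTree` instead of lxml, which readability is built on, so it only illustrates the XHTML/HTML5 difference and does not reproduce the project's actual fix. XML-style output writes `<a ... />`, which an HTML5 parser treats as an unclosed start tag. The "html" method writes void elements such as `<br>` without a closing slash and gives non-void empty elements such as `<a>` an explicit end tag.

```python
import xml.etree.ElementTree as ET

# Illustration only: readability itself serializes with lxml, not ElementTree.
fragment = ET.fromstring("<p>before<br/>middle<a href='#'/>after</p>")

# XML serialization self-closes every empty element: <br />, <a href="#" />.
xml_out = ET.tostring(fragment, encoding="unicode")

# HTML serialization leaves the void element <br> unclosed and gives the
# non-void empty element <a> an explicit </a> end tag.
html_out = ET.tostring(fragment, encoding="unicode", method="html")

print(xml_out)   # <p>before<br />middle<a href="#" />after</p>
print(html_out)  # <p>before<br>middle<a href="#"></a>after</p>
```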
Yuri Baburov 5032e2d3ab
Merge pull request #127 from azmeuk/warnings
Fixed a few regex warnings, thanks azmeuk !
4 years ago
Yuri Baburov 471d89dde9
Merge pull request #126 from azmeuk/py38
Added official python 3.8 support, dropped python 3.4 support.
Thanks Éloi Rivard (@azmeuk) !
4 years ago
Yuri Baburov 4980b0c141 Merge branch 'master' into py38 4 years ago
Yuri Baburov 331b58ef50
Merge pull request #129 from azmeuk/doc
Added basic documentation
4 years ago
Éloi Rivard f9977b727d Documentation draft 4 years ago
Éloi Rivard 0846955dd7 Fixed issue with self-closing tags. Fix #125 4 years ago
Éloi Rivard 6c1c6391e2 Fixed a few regex warnings 4 years ago
Éloi Rivard 326fb43b4c Drop support for python 3.4 - Add support for python 3.8 4 years ago
Dario 0442358942
Catch LookupError in case of bad encoding string
I've seen cases where bad encoding strings will result in errors, catching LookupError should solve the problem by falling back onto `chardet` or `utf-8`

Here's one case:

```
 textPayload: "Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 189, in summary
    self._html(True)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 132, in _html
    self.html = self._parse(self.input)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 141, in _parse
    doc, self.encoding = build_doc(input)
  File "/opt/conda/lib/python3.7/site-packages/readability/htmls.py", line 17, in build_doc
    encoding = get_encoding(page) or 'utf-8'
  File "/opt/conda/lib/python3.7/site-packages/readability/encoding.py", line 46, in get_encoding
    page.decode(encoding)
LookupError: unknown encoding: utf-8, ie=edge, chrome=1
```
5 years ago
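The fallback described in this commit can be sketched as follows. `decode_with_fallback` is a hypothetical helper, not readability's `get_encoding()`. It skips the `chardet` step mentioned above and falls straight back to UTF-8. The key point is that an unknown codec name such as `utf-8, ie=edge, chrome=1` makes `bytes.decode()` raise `LookupError`. That is a different exception from the `UnicodeDecodeError` raised when the bytes do not fit a valid codec.

```python
from typing import Tuple


def decode_with_fallback(page: bytes, declared: str) -> Tuple[str, str]:
    """Hypothetical helper: decode `page` with the encoding declared in the
    document, falling back to UTF-8 when that name is not a known codec."""
    try:
        # bytes.decode() raises LookupError if `declared` is not a codec name,
        # e.g. the malformed meta value "utf-8, ie=edge, chrome=1".
        return page.decode(declared), declared
    except LookupError:
        # Unknown codec: assume UTF-8 and replace undecodable bytes
        # instead of raising UnicodeDecodeError.
        return page.decode("utf-8", errors="replace"), "utf-8"


text, used = decode_with_fallback(b"<p>hello</p>", "utf-8, ie=edge, chrome=1")
print(used)  # utf-8
```

A valid codec name whose bytes do not decode still raises `UnicodeDecodeError` here. That case is outside what this commit addresses.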
Yuri Baburov de20908e57 Update README.rst 5 years ago
Yuri Baburov 4fa85d2778
Merge pull request #116 from baby5/master
Fixed compile_pattern to support uppercase.
5 years ago
baby5 0ac3c5bbc6 Fix compile_pattern not support uppercase 5 years ago
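The commit title suggests that keyword patterns did not handle uppercase text. The sketch below shows one way to get case-insensitive keyword matching. `compile_keywords` is a hypothetical stand-in for `compile_pattern`, and compiling with `re.IGNORECASE` is an assumption about the intended behaviour. It is not a copy of the actual patch.

```python
import re
from typing import Iterable, Union


def compile_keywords(keywords: Union[str, Iterable[str]]) -> "re.Pattern[str]":
    """Hypothetical helper: build one case-insensitive regex from a
    comma-separated string or an iterable of keywords."""
    if isinstance(keywords, str):
        keywords = keywords.split(",")
    # Escape each keyword so characters like "." or "+" are matched literally.
    words = [re.escape(word.strip()) for word in keywords if word.strip()]
    if not words:
        raise ValueError("no keywords given")
    # IGNORECASE lets "Article" match "article", "ARTICLE", and so on.
    return re.compile("|".join(words), re.IGNORECASE)


pattern = compile_keywords("Article, body")
print(bool(pattern.search("main-ARTICLE")))  # True
```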
Yuri Baburov a4ac1c7704
Merge pull request #115 from johnklee/Issue99
Fix #99 - Hiding exception raised during "a href" normalization, added handle_failures parameter defaulting to "discard" bad urls.
5 years ago
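lxml.html's `make_links_absolute()` accepts a `handle_failures` argument. With `None` it raises on a bad URL. With `'ignore'` it skips the error, and with `'discard'` it drops the failing URL. The standard-library sketch below mimics those options for a plain list of hrefs so it runs without lxml. `absolutize_links` is a hypothetical name and is not readability's code. It relies on `urllib.parse.urljoin` raising `ValueError` for inputs such as an unterminated IPv6 host.

```python
from typing import Iterable, List, Optional
from urllib.parse import urljoin


def absolutize_links(
    base_url: str, hrefs: Iterable[str], handle_failures: Optional[str] = "discard"
) -> List[str]:
    """Hypothetical helper: resolve each href against base_url.

    handle_failures: "discard" drops hrefs that cannot be resolved,
    "ignore" keeps them unchanged, None re-raises the error.
    """
    resolved = []
    for href in hrefs:
        try:
            resolved.append(urljoin(base_url, href))
        except ValueError:
            # e.g. "http://[::1" raises ValueError("Invalid IPv6 URL").
            if handle_failures == "discard":
                continue
            if handle_failures == "ignore":
                resolved.append(href)
                continue
            raise
    return resolved


links = ["page.html", "http://[::1"]
print(absolutize_links("http://example.com/dir/", links))
```

Defaulting to `"discard"` follows the commit message. The extractor keeps working on pages with a few malformed links instead of failing on the whole document.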
jkclee bac691a0a4 Fix #99 5 years ago
Yuri Baburov 3cbede6be4 Update README.rst 6 years ago
Yuri Baburov d40c4dd34d Update README.rst 6 years ago
Yuri Baburov 9aba330e68 Update README.rst 6 years ago
Yuri Baburov 0b28643f0d Update README.rst 6 years ago
Yuri Baburov 59b99ffa0b
Merge pull request #105 from pypt/many_repeated_spaces_timeout
Trim many repeated spaces to make clean() faster
6 years ago
Yuri Baburov 494b19ed4e Merge branch 'master' into many_repeated_spaces_timeout 6 years ago
Yuri Baburov dca6e2197a
Merge pull request #107 from pypt/module_version_constant
Add __version__ constant to __init__.py, read it in setup.py
6 years ago
Yuri Baburov 5215ab657b
Merge pull request #106 from pypt/python_3_7
Improvements for Python 3.7 support and CI
6 years ago
Linas Valiukas 68fb5ad4c0 Try a workaround to make build work on 3.7
https://github.com/travis-ci/travis-ci/issues/9815
6 years ago
Linas Valiukas 34fce7664d Update Python version in .travis.yml 6 years ago
Linas Valiukas 0233936e72 Add __version__ constant to __init__.py, read it in setup.py
Users wouldn't need to install, import and use Pip ("pkg_resources") to
find out which version of readability-lxml is being used.
6 years ago
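A common way to let setup.py read `__version__` is to parse the module as text rather than import it. Importing can fail before the package's dependencies are installed. The sketch assumes a line of the form `__version__ = "X.Y.Z"`, and `read_version` is a hypothetical helper. The path in the comment is an assumption based on the `readability/` package directory visible in the traceback above.

```python
import re


def read_version(source: str) -> str:
    """Hypothetical helper: extract __version__ from module source text."""
    match = re.search(
        r"^__version__\s*=\s*['\"]([^'\"]+)['\"]", source, re.MULTILINE
    )
    if match is None:
        raise ValueError("__version__ not found")
    return match.group(1)


# In setup.py (path assumed):
# with open("readability/__init__.py", encoding="utf-8") as f:
#     version = read_version(f.read())
print(read_version('"""Package docstring."""\n__version__ = "1.2.3"\n'))  # 1.2.3
```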
Linas Valiukas 63fbc36cb8 Close sample input file after reading it
Otherwise tests spit out:

    ResourceWarning: unclosed file <_io.TextIOWrapper name='/Users/pypt/Dropbox/etc-MediaCloud/python-readability/tests/samples/si-game.sample.html' mode='r' encoding='UTF-8'>
    return open(os.path.join(SAMPLES, filename)).read()
6 years ago
Linas Valiukas bdb6d671d8 Test with Python 3.7 on Travis 6 years ago
Linas Valiukas 34d198fe5a Add Python 3.7 classifier 6 years ago
Linas Valiukas 2bbb70b3e5 Fix Travis build
Add "test" extra and install dependencies for said extra as detailed in:

https://stackoverflow.com/a/41398850/200603
6 years ago
Linas Valiukas 747c46abce Trim many repeated spaces to make clean() faster
When Readability encounters many repeated whitespace, the cleanup
regexes in clean() take forever to run, so trim the amount of whitespace
to 255 characters.

Additionally, test the extracting performance with "timeout_decorator".
6 years ago
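The cap on whitespace runs can be sketched with a single regex substitution. `trim_whitespace_runs` is a hypothetical helper and does not reproduce clean() itself. It only keeps the first 255 characters of any longer run, so the cleanup regexes that run afterwards never see a run longer than that limit.

```python
import re


def trim_whitespace_runs(text: str, limit: int = 255) -> str:
    """Hypothetical helper: shorten every whitespace run longer than `limit`
    characters to its first `limit` characters."""
    too_long = re.compile(r"\s{%d,}" % (limit + 1))
    # Keep the original whitespace characters, just fewer of them.
    return too_long.sub(lambda m: m.group(0)[:limit], text)


sample = "a" + " " * 10000 + "b"
print(len(trim_whitespace_runs(sample)))  # 257
```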
Yuri Baburov 8235f0794c Trying to pass travis tests. 6 years ago
Yuri Baburov f7f439d019 Improved positive_keywords and negative_keywords processing for the CLI 6 years ago
Yuri Baburov 0c8f040d53 Updated docs for positive_keywords and negative_keywords, cleaner implementation. 6 years ago