Catch LookupError in case of bad encoding string

I've seen cases where bad encoding strings will result in errors, catching LookupError should solve the problem by falling back onto `chardet` or `utf-8`

Here's one case:

```
 textPayload: "Traceback (most recent call last):
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 189, in summary
    self._html(True)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 132, in _html
    self.html = self._parse(self.input)
  File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 141, in _parse
    doc, self.encoding = build_doc(input)
  File "/opt/conda/lib/python3.7/site-packages/readability/htmls.py", line 17, in build_doc
    encoding = get_encoding(page) or 'utf-8'
  File "/opt/conda/lib/python3.7/site-packages/readability/encoding.py", line 46, in get_encoding
    page.decode(encoding)
LookupError: unknown encoding: utf-8, ie=edge, chrome=1
```
pull/124/head
Dario 5 years ago committed by GitHub
parent de20908e57
commit 0442358942
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -46,7 +46,7 @@ def get_encoding(page):
page.decode(encoding)
# It worked!
return encoding
except UnicodeDecodeError:
except (UnicodeDecodeError, LookupError):
pass
# Fallback to chardet if declared encodings fail

Loading…
Cancel
Save