I've seen cases where bad encoding strings will result in errors, catching LookupError should solve the problem by falling back onto `chardet` or `utf-8`
Here's one case:
```
textPayload: "Traceback (most recent call last):
File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 189, in summary
self._html(True)
File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 132, in _html
self.html = self._parse(self.input)
File "/opt/conda/lib/python3.7/site-packages/readability/readability.py", line 141, in _parse
doc, self.encoding = build_doc(input)
File "/opt/conda/lib/python3.7/site-packages/readability/htmls.py", line 17, in build_doc
encoding = get_encoding(page) or 'utf-8'
File "/opt/conda/lib/python3.7/site-packages/readability/encoding.py", line 46, in get_encoding
page.decode(encoding)
LookupError: unknown encoding: utf-8, ie=edge, chrome=1
```
When Readability encounters many repeated whitespace, the cleanup
regexes in clean() take forever to run, so trim the amount of whitespace
to 255 characters.
Additionally, test the extracting performance with "timeout_decorator".
In PYthon 3 .decode() on bytes requires the name of the encoding to be a str type which means we have to convert the extracted encoding before we can use it.
Since get_encoding() is only called when the input is *not* already unicode we need to declare the regexs as byte type so they continue to work in Python 3.
Unfortunately the Python 2 `raise` syntax is not supported in Python 3.3 and not all 3.4.x versions so we deal with that by using conditional imports and a compatibility layer.
Code now supports Python 2.6, 2.7 and 3.4. PYthon 3.3 isn't support
because of some issues with the parser and the difference between old and
new `raise` syntax.