It does not quite work yet: we wrongly pull in page 2 at the end of the
article because duplicate avoidance is not yet implemented.
Conflicts:
src/readability_lxml/readability.py
src/tests/gen_test.py
src/tests/regression.py
Restructured code to better support multi-page readability. Improved tests.
Rick:
This generally works and the tests pass, but there are some broken cases with
the multipage bits that are causing me grief. It does pass the one test case.
I made multipage an option rather than the default. The more I change the
code, the harder future merges will be, but man, it needs some cleanup, reorg,
and comments.
Conflicts:
src/readability_lxml/readability.py
src/tests/regression.py
These test cases provide a baseline from which we can start improving the
readability algorithm and making sure that we do not horribly break anything.
Conflicts:
src/tests/regression.py