Commit Graph

3 Commits (d708744822a37b15baf6f1f858909965a6a9c12b)

Author SHA1 Message Date
Jerry Charumilind eefb8e1125 Implement duplicate page detection
This adds detection of duplicate pages to avoid adding duplicate pages to a
multi-page article.  It adds a simple unit test and regenerates the nytimes
regression test with the new, and more correct, result.  Previously, we were
including page 2 again after page 5.

Conflicts:

	src/readability_lxml/readability.py
12 years ago
Jerry Charumilind 883a02ad5d Add a regression for a multi-page nytimes article
It does not quite work yet, as we wrongly pull in page 2 at the end of the
article due to yet-to-be-implemented duplicate avoidance.

Conflicts:

	src/readability_lxml/readability.py
	src/tests/gen_test.py
	src/tests/regression.py
12 years ago
Richard Harding ace51a6819 Combine our tests with the new regresssion_test stuff
12 years ago