Commit Graph

1 Commit (d708744822a37b15baf6f1f858909965a6a9c12b)

Author SHA1 Message Date
Jerry Charumilind eefb8e1125 Implement duplicate page detection
This adds detection of duplicate pages so that the same page is not added
twice to a multi-page article.  It adds a simple unit test and regenerates the
nytimes regression test with the new, more correct result.  Previously, we
were including page 2 again after page 5.

Conflicts:

	src/readability_lxml/readability.py
12 years ago
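
The commit message above describes the behaviour but not the mechanism. As an illustration only, one way to skip a page that has already been appended is to fingerprint each page's normalized text and keep only the first occurrence. This is a minimal sketch under that assumption, not the code in src/readability_lxml/readability.py; the names page_fingerprint and append_unique_pages are hypothetical, and the real change may compare pages differently.

```python
import hashlib
import re


def page_fingerprint(page_text):
    """Hash the page text after collapsing whitespace and lowercasing.

    Hypothetical helper: the normalization is an assumption, chosen so that
    trivial formatting differences do not hide a repeated page.
    """
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha1(normalized.encode("utf-8")).hexdigest()


def append_unique_pages(pages):
    """Return the pages in order, dropping any page seen earlier."""
    seen = set()
    unique = []
    for page in pages:
        fingerprint = page_fingerprint(page)
        if fingerprint in seen:
            continue  # already part of the article, skip the repeat
        seen.add(fingerprint)
        unique.append(page)
    return unique
```

With the input `["page 1", "page 2", "page 3", "page 4", "page 5", "page 2"]`, the trailing repeat of page 2 is dropped and the five distinct pages are kept in their original order, which is the situation the commit message describes.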