6 Commits (3844d8f05b3f114e3df16c3bc3caf44e5ba52181)

Author SHA1 Message Date
PalmerAL 3844d8f05b Include more ancestors in candidate scoring (#611) 4 years ago
* include more ancestors in candidate scoring
* fix medium-3 testcase
  The original source file contained two copies of the document, which
  was causing incorrect results
* remove unnecessary nested elements
* fix removal of empty elements
* add option to regenerate all testcases
* update tests
* fix quanta testcase
* fix creating testcase from network
* fix early exit in testcase generation
* format HTML before comparing while testing
* upgrade js-beautify
* don't merge outer readability div
Gijs Kruitbosch b2f3a43f9f Detect 'trailing' content when comparing DOMs 4 years ago
David A Roberts ea4165721f Remove single-cell tables 6 years ago
David A Roberts bf64b58d90 Update tests 6 years ago
Brad Philips 8525c6af36 Fix relative URIs given <base> tags (#422) 6 years ago
Andres Rey 834672ef86 Return longest text after failing to detect text longer than the configured value (#423) 6 years ago
Save extracted text across attempts and return the longest one when all attempts fail, and add a test case from hukumusume