Commit Graph

8 Commits (3844d8f05b3f114e3df16c3bc3caf44e5ba52181)

Author SHA1 Message Date
PalmerAL 3844d8f05b
Include more ancestors in candidate scoring (#611)
* include more ancestors in candidate scoring

* fix medium-3 testcase

The original source file contained two copies of the document, which
was causing incorrect results

* remove unnecessary nested elements

* fix removal of empty elements

* add option to regenerate all testcases

* update tests

* fix quanta testcase

* fix creating testcase from network

* fix early exit in testcase generation

* format HTML before comparing while testing

* upgrade js-beautify

* don't merge outer readability div
4 years ago
Maria Luiza Soares 8c41d92560 Assert on siteName in all test cases 6 years ago
Daniel Aleksandersen 5a69d4a8eb Improve metadata extraction (#478)
* Improve metadata extraction

* Recognize meta[property] as a space-separated list
* Recognize Dulin Core (dc|dcterm): metadata.
* Prefer Dublin Core, Open Graph, Twitter, and HTML in that order.
* _getArticleTitle() is now only used as fallback if document
 doesn't provide good metadata.
6 years ago
Evan Tseng e84c0c3f07 Bug 1285543 - Only use "og:title" or "twitter:title" if _getArticleTitle does not return a valid title, r=Gijs 8 years ago
Nicolas Perriault 6eeabf90c1 Fixes #164 - Add support for title alt semantic metadata. 9 years ago
Nicolas Perriault 3b636b59f0 Added readerable value to test pages metadata. 9 years ago
Margaret Leibovic 3c2d93cd09 Improve byline algorithm 9 years ago
Margaret Leibovic 1b5d896b8b Add expected-metadata.json for existing tests 9 years ago