Commit Graph

406 Commits (master)
 

Author SHA1 Message Date
PalmerAL d5eea06a00 exclude additional elements based on their role (#619) 4 years ago
Garrett Xu 3fe82816af Add support for author array in JSON-LD. #617 (#618) 4 years ago
PalmerAL 3844d8f05b
Include more ancestors in candidate scoring (#611)
* include more ancestors in candidate scoring

* fix medium-3 testcase

The original source file contained two copies of the document, which
was causing incorrect results

* remove unnecessary nested elements

* fix removal of empty elements

* add option to regenerate all testcases

* update tests

* fix quanta testcase

* fix creating testcase from network

* fix early exit in testcase generation

* format HTML before comparing while testing

* upgrade js-beautify

* don't merge outer readability div
4 years ago
Gijs Kruitbosch 80d818aaa6 Don't publish git attributes or travis config to npm 4 years ago
Dan Burzo 2ca98284e9
Prefer JSON-LD metadata object, when present (#609)
* Prefer JSON-LD metadata object, when present

* Log JSON-LD parsing error

* Trim all JSON-LD fields
4 years ago
Gijs Kruitbosch 914307a90b Increment version before publishing on npm 4 years ago
Dan Burzo 1a61a23f68
Readability on npm (#608)
* Initial work on preparing Readability for npm

* Adjust some require()s

* Point package.json to index.js

* Add Node.js instructions to README

* Use ES6 in eslint
4 years ago
S Nikhill 59570ba7fc
Replace a Dead Link in Comment (#606)
* Update Links in Comments

Update a link in comments to point to a better source.
Remove a dead link. (Link Removed: http://blog.cdleary.com/2012/01/string-representation-in-spidermonkey/#ropes)
4 years ago
Dan Burzo b1d15c0ef9 Add option.serializer, fixes #605 (#607) 4 years ago
Radhi 52ab9b5c89
Fix lazy-loaded images are not visible in Kinja sites (#590)
* Add initial test case for kinja's lazy image

* Implement method to remove small data uri image

* Convert relative uri in poster and srcset of media nodes

* Eslint doesn't like arrow function

* Unescape HTML entities in metadata

* Fix wrong regex for parsing srcset urls

* Remove line to check data url since it's already handled by new URL

* Replace String.matchAll since it's only supported in Node 12+

* Use numeric code when unescaping HTML

* Don't remove data URL src if it's svg

* Don't remove b64 src if it's the only attr that contains image

* Make the comma part non-optional in regex for srcset url

* Fix wrong code for unescaping HTML

* Don't capture comma and semicolon in data URL regex
4 years ago
Gijs Kruitbosch d5621f85e7 Fix #585 - remove nodes with role=complementary 4 years ago
Radhi Fadlillah 668a3a1010 Minor change in comments 4 years ago
Radhi Fadlillah 3976fa34e9 Don't use data-old- prefix if old img attr doesn't exist 4 years ago
Radhi Fadlillah 7d74395b7b Feed semicolon to eslint 4 years ago
Radhi Fadlillah d8366f0686 Keep all attributes that might contain image 4 years ago
Radhi Fadlillah e85122e8d7 Make eslint happy 4 years ago
Radhi Fadlillah c8eab07661 Stop using live list while removing nodes 4 years ago
Radhi Fadlillah 1277d22b81 Keep old img src as data attribute 4 years ago
Radhi Fadlillah 6fed28610d Simplify loop for unwrapping noscript 4 years ago
Radhi Fadlillah adc6accaec Fix grammar issues in comments 4 years ago
Radhi Fadlillah 89572ad29a Update test for several pages 4 years ago
Radhi Fadlillah d784bf7e20 Add method to unwrap img inside noscript 4 years ago
Gijs Kruitbosch b2f3a43f9f Detect 'trailing' content when comparing DOMs 4 years ago
Gijs Kruitbosch dc34dfd8fa Fix #580 by not using live node lists when removing items 4 years ago
Gijs 630681bd26 Add some indenting back 4 years ago
PalmerAL 61ef00a853 add exception for wikimedia math images 4 years ago
Gijs 56ecc4d4ba Fix eslint issues. 4 years ago
PalmerAL 7c91bdd275 preserve children when removing javascript: links 4 years ago
Gijs d6fc38c4b4
Fix #564 by allowing 'content' as an indicator of readable content (#565)
This avoids `contentWithSidebar` causing complete removal of the content.
As a side-effect, it slightly improves byline detection by not removing
content as early on as before.
5 years ago
PalmerAL b551f1cf6e Fix missing content on Wikipedia articles (#560) 5 years ago
Joe Winett 60f470c4bb Remove aria-hidden="true" nodes (fixes #541) (#555)
Remove aria-hidden="true" nodes (fixes #541)
5 years ago
Jordy van den Aardweg 2982216913 Added "keepClasses" option to prevent cleaning of classes (#552) 5 years ago
Gijs f33a6c2a23 Switch to a newer node.js to fix build issues (#551) 5 years ago
Gijs 234f420279 Clarify security implications of using readability 5 years ago
PalmerAL 9092b2a29c Remove sharing elements in fewer situations (#545)
* remove fewer share elements

* simplify and fix social-buttons testcase
5 years ago
PalmerAL 814f0a3884 Add support for detecting lazy-loaded images (#542)
Add support for detecting lazy-loaded images using `src` or `srcset` attributes.
5 years ago
Mozilla-GitHub-Standards 26379fe62e Add Mozilla Code of Conduct file
Fixes #537.

_(Message COC002)_
5 years ago
Gijs Kruitbosch cb5771fd4a Add nested font tags to test _setNodeTag on those (see #59) 5 years ago
Radhi 9009f64f9c Fix table header missing (#530) 5 years ago
Radhi 6761a7e412 Fix embedded videos getting removed (#526)
Fix embedded videos getting removed
5 years ago
PalmerAL f5c46a7b14 fix formatting 5 years ago
PalmerAL 681bf0c47b use default threshold for share elements 5 years ago
PalmerAL b9cece3e58 add test 5 years ago
PalmerAL e76aba3485 only remove sharing elements if they contain <500 characters 5 years ago
PalmerAL 27ee1e947e update regexes in readerable.js 5 years ago
PalmerAL a014e0c9c8 exclude graphs from nytimes articles 5 years ago
Radhi Fadlillah c942b32945 Revert source files and fix expected results 5 years ago
Radhi Fadlillah bd5087d2f1 fix error in testing "wikipedia" 5 years ago
Radhi Fadlillah 3e025d58e5 fix error in testing "lwn-01" 5 years ago
Radhi Fadlillah df95c9d717 fix error in testing "keep-tabulard-data" 5 years ago