* feat: Refactor and update fixtures
This patch changes how fixtures are stored. Previously, a fixture's folder identified its domain and its filename identified when it was fetched. Now the filename indicates the domain and the file's modified time indicates how recently it was fetched. A fixture's filename can optionally include a modifier, for example to distinguish between two different page types on the same domain.
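As a rough illustration of the naming scheme described above, here is a minimal sketch of how a fixture file could be interpreted. The `--` separator, the `.html` extension, and the helper names `parseFixtureName` and `fixtureAgeInDays` are assumptions made for this example and are not taken from the patch itself.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: a double hyphen separates the domain from the optional
// modifier, e.g. "www.example.com--gallery.html". The real separator may differ.
const MODIFIER_SEPARATOR = '--';

// Split a fixture filename into its domain and optional modifier.
function parseFixtureName(filePath) {
  const base = path.basename(filePath, '.html');
  const idx = base.indexOf(MODIFIER_SEPARATOR);
  if (idx === -1) return { domain: base, modifier: null };
  return {
    domain: base.slice(0, idx),
    modifier: base.slice(idx + MODIFIER_SEPARATOR.length),
  };
}

// Age of a fixture in whole days, derived from the file's modified time.
function fixtureAgeInDays(filePath, now = Date.now()) {
  const { mtime } = fs.statSync(filePath);
  return Math.floor((now - mtime.getTime()) / 86400000);
}
```

Under these assumptions, an update script could call `fixtureAgeInDays` on each file to decide which fixtures are stale enough to refetch.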
Also included here are changes to the update-fixture script, both to accommodate the new filename scheme and to actually update all fixtures. The functionality for running automatically and opening PRs has been removed but will likely be reintroduced.
Finally, all fixtures have been updated.
* Remove reference to deleted extractor
* feat: first batch of test and parser updates due to new fixtures
* feat: update more custom parsers and unit tests
* feat: update more custom parsers and unit tests and remove unnecessary parser
* feat: update more custom parsers and unit tests
* feat: update more parsers and add correct bloomberg html files
* fix: remove console statement
* feat: all parsers updated and tests passing
* fix: update date_published tests to account for test server time difference
* fix: cleanup remaining fixtures in folders
* feat: move fixtures for newest custom parsers
* feat: remove script changes
* fix: update dist files to account for reverting script changes
* adding .DS_Store to .gitignore
* adding .DS_Store to .gitignore -- 2
* adding .DS_Store to .gitignore -- 3 lol
* cleaning up some tests
* fix: ran build:generator command to update generate-custom-parser dist file
* fix: update rollup configs to generate source maps and update source maps
* fix: use underscore in place of unused error variable
* fix: remove unused fixture
Co-authored-by: Postlight Bot <adam.pash+postlight-bot@postlight.com>
Co-authored-by: flbn <overasc@gmail.com>
Not to be confused with extractor fixtures, which are snapshots of a webpage.
This change removes the pattern of separate JS files that provide "fixtures" for tests, which are used as provided or expected strings in tests. They were inconsistent and disorganized, and generally just served to add indirection to test files. So now all those strings are defined where they are used in their respective tests.
* feat: extract custom types with extend option
Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.
```
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
  },
});
```
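Two of the behaviors named in the commits listed below, "always return array if allowMultiple" and "only overwrite null extended results", can be sketched in plain JavaScript. This is an illustration only, not the code from this change: the helper names `shapeMatches` and `fillNullResults` are hypothetical, and returning `null` when nothing matches is an assumption.

```javascript
// Hypothetical helper: shape the raw matches found for one extended type.
function shapeMatches(matches, { allowMultiple = false } = {}) {
  // Assumption: no matches yields null.
  if (matches.length === 0) return null;
  // With allowMultiple, always return an array, even for a single match;
  // otherwise return only the first match.
  return allowMultiple ? matches : matches[0];
}

// Hypothetical helper: copy values from `fallback` into `current`,
// but only for keys whose current value is still null or undefined.
function fillNullResults(current, fallback) {
  const merged = { ...current };
  for (const key of Reflect.ownKeys(fallback)) {
    if (merged[key] === null || merged[key] === undefined) {
      merged[key] = fallback[key];
    }
  }
  return merged;
}
```

In this sketch a single match with `allowMultiple: true` still comes back as a one-element array, so callers can iterate without checking the type first.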
* chore: use Reflect.ownKeys
* feat: add CLI options
* doc: add extend param to cli help
* refactor: extract selectExtendedTypes
* feat: only overwrite null extended results
* feat: add allowMultiple extraction option
* feat: accept extendList CLI args
* feat: allow attribute selectors in extends on CLI
* test: update extend tests
* fix: don't invoke cleaner for custom types
* feat: always return array if allowMultiple
* test: add test for array of single result
* refactor: extract extractHtml
* refactor: destructure allowMultiple
* fix: wrap multiple matches in $ for cheerio shim
* fix: find extended types before any other munging
* feat: absolutize all links
* fix: clean content more directly
* doc: Update CLI docs in README
* chore: update dist
* doc: Document extend in custom extractor README
* dx: remove commented code and obvious comments that can be looked up
* dx: remove commented out eslint options
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove test block as all its code was commented out
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove regex example comments
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out import
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* dx: remove commented out code
* chore: remove empty files
* chore: re-prettier code that may have missed it
* added back necessary comments
* chore: update .nvmrc
* added prettier and pre-commit hooks
* update docker image to new node
* add karma-cli to get web tests working
* explicitly install karma... seems to fix problem
* remove pre-built phantomjs
* swap install order
Bloomberg has several templates. I'm supporting three different templates here, but I'm not sure that this is complete by any means.
It's also worth noting that SVGs don't make it through the parser terribly well for many reasons. One, for example, is that a lot of SVGs require custom CSS in order for them to make sense. I'm not sure this is something we can expect to address in the parser.
Big undertaking to support Mercury in the browser. Builds are working and all tests are passing both for web and node builds. Most code is closely shared.
Squashed commit of the following:
commit 0ee7d51ce609ad23d2deca1af41e7b4e56681bd7
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 15:44:28 2016 -0700
feat: dek is not returned if it's basically the same as the excerpt
commit 6ad27f994fff3652e04ffe7c81f1ae0b1647e941
Author: Adam Pash <adam.pash@gmail.com>
Date: Mon Oct 10 14:35:54 2016 -0700
feat: added excerpt util