
11 Commits (fix-remove-moment-js)

Author SHA1 Message Date
John Holdun 97472cf4f8
Change Name (#688)
Mercury Parser is now Postlight Parser!
2 years ago
John Brayton 3c5c0bdba9
feat: Add a custom extractor for www.engadget.com. (#552)
* feat: Add a custom extractor for ma.ttias.be.

When parsing content for cron.weekly issues, such as the one at https://ma.ttias.be/cronweekly/issue-130/, Mercury Parser would remove headings and ordered lists that were part of the content. This resolves that as follows:

* Remove "id" attributes from "h1" and "h2" elements. Those attributes would result in the elements having a low weight.
* Since Mercury Parser demotes "h1" elements to "h2", demote "h2" elements to "h3".
* Add class="entry-content-asset" to "ul" elements to avoid them being removed.
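
The three changes listed above could be expressed as extractor transforms roughly as follows. This is a minimal sketch, not the code merged in this commit: it assumes the parser's custom-extractor convention in which a transform function receives a cheerio-wrapped node and may return a new tag name, and `maTtiasBeTransforms` and `fakeNode` are hypothetical names used only for illustration.

```javascript
// Hypothetical sketch of the transforms described above (not the merged extractor).
const maTtiasBeTransforms = {
  // Remove the "id" attribute so the heading is not given a low weight.
  h1: $node => {
    $node.attr('id', null);
  },
  // The parser demotes "h1" to "h2", so demote "h2" to "h3" to keep the hierarchy.
  h2: $node => {
    $node.attr('id', null);
    return 'h3';
  },
  // Mark lists as content assets so they are not stripped during cleaning.
  ul: $node => {
    $node.attr('class', 'entry-content-asset');
  },
};

// Minimal stand-in for a cheerio-wrapped node, so the sketch runs without the library.
// As in cheerio, setting an attribute to null removes it.
function fakeNode(attrs) {
  return {
    attrs: { ...attrs },
    attr(name, value) {
      if (value === null) delete this.attrs[name];
      else this.attrs[name] = value;
      return this;
    },
  };
}

const heading = fakeNode({ id: 'news' });
const newTag = maTtiasBeTransforms.h2(heading);
console.log(newTag, heading.attrs);
```

In a real extractor these functions would sit under `content.transforms`; the stub exists only to show the attribute changes in isolation.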

* removed redundant comment.

* feat: Add a custom extractor for engadget.com.

Co-authored-by: John Holdun <john@johnholdun.com>
2 years ago
Nick Sweeting 99062da034
Add --version CLI flag (#610)
* add --version CLI flag

* move import to top of file for consistency

Co-authored-by: John Holdun <john@johnholdun.com>
2 years ago
Michael Ashley e12c916499
feat: ability to add custom extractors via api (#484)
* feat: ability to add custom extractors via api

* docs: updating readme

* fix: example.com was being used in another test

* fix: timezone was messing up date_published test

* fix: using a unique site for testing

* fix: updated custom extractor api

* docs: updating readme

* fix: removing unused fixture

* fix: updating test description

* feat: ability to add custom extractors via cli
5 years ago
Toufic Mouallem 144a797564
feat: Support passing custom headers in requests (#337)
5 years ago
Drew Bell b3e2a0ffd1
feat: extract custom types with extend option (#313)
* feat: extract custom types with extend option

Adds an `extend` option that lets you add custom types to be extracted
and returned alongside the defaults, either in a call to `parse()` or in
a custom extractor.

```
Mercury.parse(url, {
  extend: {
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false }
  }
})
```
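
The message says `extend` can also live in a custom extractor. A minimal sketch of what that might look like, assuming the extractor accepts the same `extend` shape as `parse()`: the `ExampleExtractor` name, the domain, and the `tags` field with its selector are hypothetical, while `allowMultiple` is the option named later in this commit.

```javascript
// Hypothetical custom extractor; the domain and selectors are placeholders.
const ExampleExtractor = {
  domain: 'www.example.com',
  title: { selectors: ['h1'] },
  // Custom types returned alongside the default fields.
  extend: {
    // Single value; skip the default cleaner for this field.
    last_edited: { selectors: ['#last-edited'], defaultCleaner: false },
    // allowMultiple asks for every match, returned as an array.
    tags: { selectors: ['.tags a'], allowMultiple: true },
  },
};

// List the extra field names this extractor would add to the result.
const customFields = Object.keys(ExampleExtractor.extend);
console.log(customFields);
```

The object only declares the fields. How the parser resolves the selectors is not shown here.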

* chore: use Reflect.ownKeys

* feat: add CLI options

* doc: add extend param to cli help

* refactor: extract selectExtendedTypes

* feat: only overwrite null extended results

* feat: add allowMultiple extraction option

* feat: accept extendList CLI args

* feat: allow attribute selectors in extends on CLI

* test: update extend tests

* fix: don't invoke cleaner for custom types

* feat: always return array if allowMultiple

* test: add test for array of single result

* refactor: extract extractHtml

* refactor: destructure allowMultiple

* fix: wrap multiple matches in $ for cheerio shim

* fix: find extended types before any other munging

* feat: absolutize all links

* fix: clean content more directly

* doc: Update CLI docs in README

* chore: update dist

* doc: Document extend in custom extractor README
5 years ago
Adam Pash e033835c72
fix: parse signature in cli (#259)
5 years ago
Adam Pash 9b0664bc91
feat: add content format output options (#256)
5 years ago
Adam Pash b77a236dbe
feat: handle cli errors/timeout (#250)
5 years ago
Adam Pash d884c3470c
release: 1.1.0 (#245)
5 years ago
Adam Pash 6844975c94
feat: add mercury-parser cli (#244)
5 years ago