Commit Graph

333 Commits (76d333f0beae4287db503c9a154da674bc49781c)
 

Author SHA1 Message Date
Adam Pash 76d333f0be deps: upgrade (#218) 5 years ago
Jad Termsani 438d495f3e docs: add code of conduct (#204)
* docs: add code of conduct

* docs: modify the code of conduct
5 years ago
George Haddad 56badb51f5 dx: remove unnec comments in source (#205)
* dx: remove commented code and obvious comments that can be looked up

* dx: remove commented out eslint options

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove test block as all its code was commented out

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove regex example comments

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out import

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* dx: remove commented out code

* chore: remove empty files

* chore: re-prettier code that may have missed it

* added back nec comments
5 years ago
Adam Pash e2dbd08ae7 fix: pre-commit hook on js (#212) 5 years ago
Adam Pash e4b057f9ea chore: update node and some deps (#209)
* chore: update .nvmrc

* added prettier and pre-commit hooks

* update docker image to new node

* add karma-cli to get web tests working

* explicitly install karma... seems to fix problem

* remove pre-built phantomjs

* swap install order
5 years ago
Adam Pash 78adb2c2a0 fix: auto-pr (#199) 5 years ago
Adam Pash c643666c88 dx: automate fixture updates (#197) 5 years ago
Adam Pash bc23b8b7ea dx: one-line comment links (#195) 5 years ago
Adam Pash c0676423be dx: add image to preview and link to original article (#194) 5 years ago
Adam Pash ff144952b9 dx: test/finish bot preview 5 years ago
Adam Pash d35f7bd5bf dx: comment on PRs when fixtures have been added/changed (#192)
The goal here is to provide some sort of relatively easy preview for the
PR reviewer to see if the fixture looks good, if the parsing is working,
and to make suggestions easily.
5 years ago
Adam Pash 96640e3564 fix: failing fetchResource test (#187)
I think it was a fixture problem
6 years ago
Adam Pash 4478338046 docs: document release process (#186) 6 years ago
Adam Pash a7fd0e8dda dx: add nvmrc file (#185)
The node version should not be higher than the node version we're using
on AWS with the Mercury Parser API
6 years ago
Adam Pash d850177b68 docs: Update README.md (#184) 6 years ago
Adam Pash fd6c9d4fa3 release: 1.0.13 (#183) 6 years ago
Adam Pash 0c15e9aad3 chore: update circle config.yml to 2.0 (#182) 6 years ago
Adam Pash 5663660f76 fix: nytimes custom parser title selector (#181)
* fix: nytimes custom parser title selector

* upgrade node version

* circle ci tweak
6 years ago
Adam Pash 7fcd9b62eb release: 1.0.12 (#173) 7 years ago
Jeremy Mack 5fcea1c5c3 fix: PARSING_NODE undefined (#172)
* fix: PARSING_NODE undefined

* chore: remove unused cleanup function/call
7 years ago
Adam Pash a51cc81c27 release: 1.0.11 (#171) 7 years ago
Jeremy Mack e92e798880 fix: viewport tags leaking to parent page (#170)
* fix: scrub meta viewport tags

They leak to the parent page when using the web version of Mercury
Parser.

* chore: build

* fix: keep DOM in memory to avoid conflicts
7 years ago
Adam Pash 86d6bd1dc1 release: 1.0.10 (#169) 7 years ago
Adam Pash b8aa87c777 feat: improve wh parser (#168) 7 years ago
Adam Pash e56e8e24cd release: 1.0.9 (#167) 7 years ago
Adam Pash 61f0f4e1af fix: kept elements being removed (#166)
Elements marked to keep were removable under specific circumstances.
This PR fixes these edge cases.
7 years ago
Adam Pash 5741910fdc docs: update changelog (#165) 7 years ago
Adam Pash 321c087be6 release: 1.0.8 (#164) 7 years ago
Adam Pash 453419de72 feat: improve wh.gov parser (#163)
* feat: support youtube-nocookie domain

* feat: updated wh.gov parser to support speeches
7 years ago
Adam Pash e267d57d78 release: 1.0.7 (#160) 7 years ago
Janet f13bb721f6 feat: prospect magazine parser (#147)
* feat: prospect magazine parser

Couldn’t find a way to parse the date but I think it’s good otherwise.

* fix: pulls date

* fix: add timezone

* fix: generalize
7 years ago
Kevin Ngao 1b28713cf5 feat: fool.com parser (#158)
* feat: add fool.com custom parser
7 years ago
Janet c18959779d feat: forward.com parser (#144)
* feat: forward.com parser

LGTM although image didn’t show up in preview

* feat: also pull image into content

* fix: generalize selectors

* fix: generalize selector
7 years ago
Janet 50e548bac2 feat: qdaily parser (#146)
* feat: qdaily parser

Firstly — I accidentally tried to generate the parser on the master
branch, and I’m not sure where it is, maybe floating in the nether
world.

On to the parser — this one was a bit tricky because things were in
Chinese! The content appears to be parsing (as seen in preview) but
it’s not passing the test. I noticed the second “ ‘ “ mark isn’t
appearing on the parser side.

Additionally, some of the lazy loading images aren’t appearing in the
preview (I cleaned the wrong lazy load images that appeared), so
someone will probably have to work on that (I don’t know how to do
transforms yet).

* fix tests

* fix: selector generalization
7 years ago
Silas Burton 51a4d1d12f feat: newrepublic parser shows image on page (#159) 7 years ago
Silas Burton 11382ce651 Feat: Slate extractor (#153)
* feat: slate extractor

* fix: generalize selectors

* fix: add Slate timezone
7 years ago
Silas Burton 5acaa6ab56 feat: ici.radio-canada.ca extractor (#156)
* feat: ici.radio-canada.ca extractor

* fix: add timezone
7 years ago
Silas Burton 4509b341e6 feat: better cleanup of atlantic articles (#157) 7 years ago
Kevin Ngao f2e3f055c2 Fixes an issue with encoding (#154)
* fix: fixes an issue with encoding on the fetch level
7 years ago
Silas Burton 9b371e51ac Feat: gothamist extractor (#151)
* feat: gothamist extractor

* feat: add other gothamist network sites

* fix: try getting date another way

* fix: add gothamist timezone

* fix: generalize selectors

* fix: h1 is inside entry-header, needs to be specific because of another h1 on the page

* fix: general and specific selector
7 years ago
Kevin Ngao afbef9bc39 Fix Encoding on Body (#143)
* fix: check encoding on body
7 years ago
Adam Pash 9d4c883d51 release: 1.0.6 (#142) 7 years ago
Janet 93d2baf5cf feat: news.natgeo parser (#88)
* feat: natgeo parser

For some reason, the local copy of the article didn’t grab the author
name in it, so I couldn’t figure out how to parse it. The generic
parser took a name of an author of a paper mentioned in the article,
and thought that was the author name, which was funny.

I cleaned a large block quote that didn’t make sense as it was shown in
the preview, although I noticed that the Mercury chrome extension
didn’t even display it.

* fix: add date_published transform

* fix: date_published assertion

* disable: author assertion, generalize author selector

* rm: author assertion

* fix: image lead

* fix: guard against missing img url

* fix: generalize dek and title selectors
7 years ago
Janet 2279c2d486 feat: natgeo parser (#89)
* feat: natgeo parser

Same as the news.nationalgeographic.com parser - for some reason the
author name doesn’t appear to be getting pulled into the local copy of
the file.

* fix: content assertion

* fix: generalize author byline

* disable: author assertion

* rm: author assertion

* fix: image lead, handles image-group

* fix: guard against missing img url

* fix: generalize dek and title selectors
7 years ago
Adam Pash 08b5bb7ff1 feat: allow parser to define custom date formats (#141)
* feat: allow parser to define custom date formats

* feat: updating macrumors to test/verify format working correctly
7 years ago
Janet 11f466ccb3 feat: latimes parser (#92)
* feat: latimes parser
7 years ago
Kevin Ngao 26a8e4f75a feat: macrumors parser (#120)
* feat: add macrumors
7 years ago
Kevin Ngao b4fec6af98 feat: androidcentral parser (#119)
* feat: androidcentral parser
7 years ago
Janet beb0b89a4f feat: pagesix parser (#97)
* feat: pagesix parser
7 years ago
Janet f2160eb5b6 feat: si parser (#118)
* feat: si parser
7 years ago