release: 2.1.1

release-2.1.1
Adam Pash 5 years ago
parent c11b85f405
commit 8b4a7bb1e9

@ -1,5 +1,50 @@
# Mercury Parser Changelog
### 2.1.1 (Jun 26, 2019)
##### Commits
- [[`c11b85f405`](https://github.com/postlight/mercury-parser/commit/c11b85f405)] - **deps**: update eslint-config-prettier to version 5.0.0 (#441) (greenkeeper[bot])
- [[`3b0d5fed69`](https://github.com/postlight/mercury-parser/commit/3b0d5fed69)] - **chore**: prevent adding phantomjs-prebuilt as a dependency in CI. (#412) (Jaen)
- [[`939d181951`](https://github.com/postlight/mercury-parser/commit/939d181951)] - **fix**: support query strings in lazy-loaded srcsets (#387) (Toufic Mouallem)
- [[`0942c37876`](https://github.com/postlight/mercury-parser/commit/0942c37876)] - **feat**: custom parser for phoronix.com. (#431) (Ben Ubois)
- [[`571a913745`](https://github.com/postlight/mercury-parser/commit/571a913745)] - **feat**: pitchfork extractor (#439) (Michael P. Geraci)
- [[`c8a66b0d77`](https://github.com/postlight/mercury-parser/commit/c8a66b0d77)] - **deps**: Update moment-timezone to the latest version 🚀 (#388) (greenkeeper[bot])
- [[`255da63e26`](https://github.com/postlight/mercury-parser/commit/255da63e26)] - **deps**: bump handlebars from 4.0.6 to 4.1.2 (#434) (dependabot[bot])
- [[`c7abfc25c6`](https://github.com/postlight/mercury-parser/commit/c7abfc25c6)] - chore(deps): bump sshpk from 1.10.1 to 1.16.1 (#435) (dependabot[bot])
- [[`694ea820aa`](https://github.com/postlight/mercury-parser/commit/694ea820aa)] - Custom Extractor for clinicaltrials.gov (#305) (david0leong)
- [[`a7cd9027e2`](https://github.com/postlight/mercury-parser/commit/a7cd9027e2)] - **chore**: update husky to version 2.3.0 (#422) (Toufic Mouallem)
- [[`9f6f07508c`](https://github.com/postlight/mercury-parser/commit/9f6f07508c)] - **docs**: Add links to README (Gina Trapani)
- [[`3414ebaa62`](https://github.com/postlight/mercury-parser/commit/3414ebaa62)] - **chore**: update jquery to version 3.4.1 (#420) (Toufic Mouallem)
- [[`7c8de71c52`](https://github.com/postlight/mercury-parser/commit/7c8de71c52)] - **fix**: new yorker extractor (#414) (Wajeeh Zantout)
- [[`e66ad8b81c`](https://github.com/postlight/mercury-parser/commit/e66ad8b81c)] - **feat**: add le monde extractor (#415) (Wajeeh Zantout)
- [[`f81dc63617`](https://github.com/postlight/mercury-parser/commit/f81dc63617)] - **feat**: add rbbtoday.com custom parser (#411) (kik0220)
- [[`5e1113b3a9`](https://github.com/postlight/mercury-parser/commit/5e1113b3a9)] - **feat**: add japan.zdnet.com custom parser (#410) (kik0220)
- [[`77e3bc00e2`](https://github.com/postlight/mercury-parser/commit/77e3bc00e2)] - **feat**: add wired.jp custom parser (#409) (kik0220)
- [[`0b36c96de0`](https://github.com/postlight/mercury-parser/commit/0b36c96de0)] - **feat**: add techlog.iij.ad.jp custom parser (#405) (kik0220)
- [[`406bf1b1a9`](https://github.com/postlight/mercury-parser/commit/406bf1b1a9)] - **feat**: add weekly.ascii.jp custom parser (#401) (kik0220)
- [[`216bfade00`](https://github.com/postlight/mercury-parser/commit/216bfade00)] - **feat**: add www.ipa.go.jp custom parser (#408) (kik0220)
- [[`3ae8f3bde3`](https://github.com/postlight/mercury-parser/commit/3ae8f3bde3)] - **feat**: add www.oreilly.co.jp custom parser (#407) (kik0220)
- [[`7396e81b72`](https://github.com/postlight/mercury-parser/commit/7396e81b72)] - **feat**: add sect.iij.ad.jp custom parser (#404) (kik0220)
- [[`3f1d9030ee`](https://github.com/postlight/mercury-parser/commit/3f1d9030ee)] - **feat**: add www.lifehacker.jp custom parser (#403) (kik0220)
- [[`b077000c4a`](https://github.com/postlight/mercury-parser/commit/b077000c4a)] - **feat**: add getnews.jp custom parser (#402) (kik0220)
- [[`b5425c3e8a`](https://github.com/postlight/mercury-parser/commit/b5425c3e8a)] - **feat**: add www.gizmodo.jp custom parser (#400) (kik0220)
- [[`a38c727a0a`](https://github.com/postlight/mercury-parser/commit/a38c727a0a)] - **feat**: add deadline.com custom parser (#383) (kik0220)
- [[`74a3c49a3c`](https://github.com/postlight/mercury-parser/commit/74a3c49a3c)] - **feat**: add japan.cnet.com custom parser (#382) (kik0220)
- [[`7b07f88448`](https://github.com/postlight/mercury-parser/commit/7b07f88448)] - **feat**: add www.yomiuri.co.jp custom parser (#381) (kik0220)
- [[`3f46859d14`](https://github.com/postlight/mercury-parser/commit/3f46859d14)] - **fix**: skip absolutizing invalid srcsets (#386) (Toufic Mouallem)
- [[`779c1154fb`](https://github.com/postlight/mercury-parser/commit/779c1154fb)] - **fix**: add date_published selector in www.sanwa.co.jp extractor (#378) (kik0220)
- [[`ea5b65f019`](https://github.com/postlight/mercury-parser/commit/ea5b65f019)] - **fix**: add date_published selector in www.elecom.co.jp extractor (#377) (kik0220)
- [[`7c0949e587`](https://github.com/postlight/mercury-parser/commit/7c0949e587)] - **fix**: add date_published selector in www.ossnews.jp extractor (#376) (kik0220)
- [[`3e91ac55db`](https://github.com/postlight/mercury-parser/commit/3e91ac55db)] - **fix**: add date_published selector in jvndb.jvn.jp extractor (#375) (kik0220)
- [[`8ca2894751`](https://github.com/postlight/mercury-parser/commit/8ca2894751)] - **feat**: add bookwalker.jp custom parser (#374) (kik0220)
- [[`a5f06ce27a`](https://github.com/postlight/mercury-parser/commit/a5f06ce27a)] - **feat**: add takagi-hiromitsu.jp custom parser (#364) (kik0220)
- [[`b9c57dbc2f`](https://github.com/postlight/mercury-parser/commit/b9c57dbc2f)] - **feat**: add www.publickey1.jp custom parser (#365) (kik0220)
- [[`d7dbea8a95`](https://github.com/postlight/mercury-parser/commit/d7dbea8a95)] - **feat**: add www.itmedia.co.jp custom parser (#366) (kik0220)
- [[`9218f80da6`](https://github.com/postlight/mercury-parser/commit/9218f80da6)] - **feat**: add www.moongift.jp custom parser (#367) (kik0220)
- [[`4eb73dffb0`](https://github.com/postlight/mercury-parser/commit/4eb73dffb0)] - **feat**: add www.infoq.com custom parser (#368) (kik0220)
- [[`ce5cd2dd0d`](https://github.com/postlight/mercury-parser/commit/ce5cd2dd0d)] - **feat**: add phpspot.org custom parser (#369) (kik0220)
### 2.1.0 (Apr 10, 2019)
##### Commits

675
dist/mercury.js vendored

@ -1264,6 +1264,7 @@ function absolutizeSet($, rootUrl, $content) {
// descriptors can only contain positive numbers followed immediately by either 'w' or 'x'
// space characters inside the URL should be encoded (%20 or +)
var candidates = urlSet.match(/(?:\s*)(\S+(?:\s*[\d.]+[wx])?)(?:\s*,\s*)?/g);
if (!candidates) return;
var absoluteCandidates = candidates.map(function (candidate) {
// a candidate URL cannot start or end with a comma
// descriptors are separated from the URLs by unescaped whitespace
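Note on the hunk above: `String.prototype.match` with a global regex returns `null`, not an empty array, when nothing matches, so calling `.map` on the result would throw for an empty or whitespace-only `srcset`. The sketch below isolates that behavior. The pattern is copied from the hunk. `splitCandidates` is a hypothetical helper written for illustration and is not part of the bundle; it returns an empty array where the real function simply returns early.

```javascript
// Candidate-splitting pattern copied from absolutizeSet in the hunk above.
var CANDIDATE_RE = /(?:\s*)(\S+(?:\s*[\d.]+[wx])?)(?:\s*,\s*)?/g;

// Hypothetical helper (not in mercury.js): split a srcset string into raw
// candidates, guarding against the null that match() returns on no match.
function splitCandidates(urlSet) {
  var candidates = urlSet.match(CANDIDATE_RE);
  if (!candidates) return []; // same guard as the added line, different return value
  return candidates;
}

console.log(splitCandidates('a.jpg 1x, b.jpg 2x')); // two candidates
console.log(splitCandidates('   '));                // no candidates, no exception
```

Without the guard, the second call would fail with a `TypeError` when `.map` is invoked on `null`.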
@ -1529,7 +1530,7 @@ function setAttrs(node, attrs) {
var IS_LINK = new RegExp('https?://', 'i');
var IMAGE_RE = '.(png|gif|jpe?g)';
var IS_IMAGE = new RegExp("".concat(IMAGE_RE), 'i');
var IS_SRCSET = new RegExp("".concat(IMAGE_RE, "(\\s*[\\d.]+[wx])"), 'i');
var IS_SRCSET = new RegExp("".concat(IMAGE_RE, "(\\?\\S+)?(\\s*[\\d.]+[wx])"), 'i');
var TAGS_TO_REMOVE = ['script', 'style', 'form'].join(',');
// lazy loaded images into normal images.
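Note on the hunk above: the two `IS_SRCSET` lines are the old and new versions of the same declaration. The sketch below rebuilds both patterns side by side to show what the optional `(\?\S+)?` group changes. The names `OLD_IS_SRCSET` and `NEW_IS_SRCSET` and the example.com URLs are made up for illustration.

```javascript
// IMAGE_RE is copied from the hunk; note that its leading '.' is unescaped,
// so it matches any character, not only a literal dot.
var IMAGE_RE = '.(png|gif|jpe?g)';

// Old pattern: the descriptor must follow the file extension directly.
var OLD_IS_SRCSET = new RegExp(IMAGE_RE + '(\\s*[\\d.]+[wx])', 'i');
// New pattern: an optional query string may sit between extension and descriptor.
var NEW_IS_SRCSET = new RegExp(IMAGE_RE + '(\\?\\S+)?(\\s*[\\d.]+[wx])', 'i');

// Illustrative inputs (hypothetical URLs).
var plain = 'https://example.com/photo.jpg 2x';
var withQuery = 'https://example.com/photo.jpg?w=640&q=80 2x';

console.log(OLD_IS_SRCSET.test(plain), NEW_IS_SRCSET.test(plain));         // true true
console.log(OLD_IS_SRCSET.test(withQuery), NEW_IS_SRCSET.test(withQuery)); // false true
```

The old pattern rejects the second input because `?` is neither whitespace nor part of a descriptor, which matches the changelog entry about query strings in lazy-loaded srcsets.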
@ -1934,30 +1935,31 @@ var TheAtlanticExtractor = {
var NewYorkerExtractor = {
domain: 'www.newyorker.com',
title: {
selectors: ['h1.title']
selectors: ['h1[class^="ArticleHeader__hed"]', ['meta[name="og:title"]', 'value']]
},
author: {
selectors: ['.contributors']
selectors: ['div[class^="ArticleContributors"] a[rel="author"]', 'article header div[class*="Byline__multipleContributors"]']
},
content: {
selectors: ['div#articleBody', 'div.articleBody'],
selectors: ['main[class^="Layout__content"]'],
// Is there anything in the content you selected that needs transformed
// before it's consumable content? E.g., unusual lazy loaded images
transforms: [],
// Is there anything that is in the result that shouldn't be?
// The clean selectors will remove anything that matches from
// the result
clean: []
clean: ['footer[class^="ArticleFooter__footer"]']
},
date_published: {
selectors: [['meta[name="article:published_time"]', 'value'], ['time[itemProp="datePublished"]', 'content']],
selectors: [['meta[name="pubdate"]', 'value']],
format: 'YYYYMMDD',
timezone: 'America/New_York'
},
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
dek: {
selectors: ['.dek', 'h2.dek']
selectors: ['h2[class^="ArticleHeader__dek"]']
},
next_page_url: null,
excerpt: null
@ -4760,6 +4762,30 @@ var NewsMynaviJpExtractor = {
}
};
var ClinicaltrialsGovExtractor = {
domain: 'clinicaltrials.gov',
title: {
selectors: ['h1.tr-solo_record']
},
author: {
selectors: ['div#sponsor.tr-info-text']
},
date_published: {
// selectors: ['span.term[data-term="Last Update Posted"]'],
selectors: ['div:has(> span.term[data-term="Last Update Posted"])']
},
content: {
selectors: ['div#tab-body'],
// Is there anything in the content you selected that needs transformed
// before it's consumable content? E.g., unusual lazy loaded images
transforms: {},
// Is there anything that is in the result that shouldn't be?
// The clean selectors will remove anything that matches from
// the result
clean: ['.usa-alert> img']
}
};
var GithubComExtractor = {
domain: 'github.com',
title: {
@ -4865,7 +4891,11 @@ var WwwOssnewsJpExtractor = {
selectors: ['#alpha-block h1.hxnewstitle']
},
author: null,
date_published: null,
date_published: {
selectors: ['p.fs12'],
format: 'YYYY年MM月DD日 HH:mm',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
@ -4931,7 +4961,11 @@ var WwwSanwaCoJpExtractor = {
selectors: ['#newsContent h1']
},
author: null,
date_published: null,
date_published: {
selectors: ['p.date'],
format: 'YYYY.MM.DD',
timezone: 'Asia/Tokyo'
},
dek: {
selectors: [['meta[name="og:description"]', 'value']]
},
@ -4952,7 +4986,11 @@ var WwwElecomCoJpExtractor = {
selectors: ['title']
},
author: null,
date_published: null,
date_published: {
selectors: ['p.section-last'],
format: 'YYYY.MM.DD',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: null,
content: {
@ -4996,7 +5034,11 @@ var JvndbJvnJpExtractor = {
selectors: ['title']
},
author: null,
date_published: null,
date_published: {
selectors: ['div.modifytxt:nth-child(2)'],
format: 'YYYY/MM/DD',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: null,
content: {
@ -5064,6 +5106,590 @@ var WwwJnsaOrgExtractor = {
}
};
var PhpspotOrgExtractor = {
domain: 'phpspot.org',
title: {
selectors: ['h3.hl']
},
author: null,
date_published: {
selectors: ['h4.hl'],
format: 'YYYY年MM月DD日',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: null,
content: {
selectors: ['div.entrybody'],
defaultCleaner: false,
transforms: {},
clean: []
}
};
var WwwInfoqComExtractor = {
domain: 'www.infoq.com',
title: {
selectors: ['h1.heading']
},
author: {
selectors: ['div.widget.article__authors']
},
date_published: {
selectors: ['.article__readTime.date'],
format: 'YYYY年MM月DD日',
timezone: 'Asia/Tokyo'
},
dek: {
selectors: [['meta[name="og:description"]', 'value']]
},
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.article__data'],
defaultCleaner: false,
transforms: {},
clean: []
}
};
var WwwMoongiftJpExtractor = {
domain: 'www.moongift.jp',
title: {
selectors: ['h1.title a']
},
author: null,
date_published: {
selectors: ['ul.meta li:not(.social):first-of-type'],
timezone: 'Asia/Tokyo'
},
dek: {
selectors: [['meta[name="og:description"]', 'value']]
},
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['#main'],
transforms: {},
clean: ['ul.mg_service.cf']
}
};
var WwwItmediaCoJpExtractor = {
domain: 'www.itmedia.co.jp',
supportedDomains: ['www.atmarkit.co.jp', 'techtarget.itmedia.co.jp', 'nlab.itmedia.co.jp'],
title: {
selectors: ['#cmsTitle h1']
},
author: {
selectors: ['#byline']
},
date_published: {
selectors: [['meta[name="article:modified_time"]', 'value']]
},
dek: {
selectors: ['#cmsAbstract h2']
},
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['#cmsBody'],
defaultCleaner: false,
transforms: {},
clean: ['#snsSharebox']
}
};
var WwwPublickey1JpExtractor = {
domain: 'www.publickey1.jp',
title: {
selectors: ['h1']
},
author: {
selectors: ['#subcol p:has(img)']
},
date_published: {
selectors: ['div.pubdate'],
format: 'YYYY年MM月DD日',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['#maincol'],
defaultCleaner: false,
transforms: {},
clean: ['#breadcrumbs', 'div.sbm', 'div.ad_footer']
}
};
var TakagihiromitsuJpExtractor = {
domain: 'takagi-hiromitsu.jp',
title: {
selectors: ['h3']
},
author: {
selectors: [['meta[name="author"]', 'value']]
},
date_published: {
selectors: [['meta[http-equiv="Last-Modified"]', 'value']]
},
dek: null,
lead_image_url: null,
content: {
selectors: ['div.body'],
defaultCleaner: false,
transforms: {},
clean: []
}
};
var BookwalkerJpExtractor = {
domain: 'bookwalker.jp',
title: {
selectors: ['h1.main-heading']
},
author: {
selectors: ['div.authors']
},
date_published: {
selectors: ['.work-info .work-detail:first-of-type .work-detail-contents:last-of-type'],
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: [['div.main-info', 'div.main-cover-inner']],
defaultCleaner: false,
transforms: {},
clean: ['span.label.label--trial', 'dt.info-head.info-head--coin', 'dd.info-contents.info-contents--coin', 'div.info-notice.fn-toggleClass']
}
};
var WwwYomiuriCoJpExtractor = {
domain: 'www.yomiuri.co.jp',
title: {
selectors: ['h1.title-article.c-article-title']
},
author: null,
date_published: {
selectors: [['meta[name="article:published_time"]', 'value']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.p-main-contents'],
transforms: {},
clean: []
}
};
var JapanCnetComExtractor = {
domain: 'japan.cnet.com',
title: {
selectors: ['.leaf-headline-ttl']
},
author: {
selectors: ['.writer']
},
date_published: {
selectors: ['.date'],
format: 'YYYY年MM月DD日 HH時mm分',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.article_body'],
transforms: {},
clean: []
}
};
var DeadlineComExtractor = {
domain: 'deadline.com',
title: {
selectors: ['h1']
},
author: {
selectors: ['section.author h3']
},
date_published: {
selectors: [['meta[name="article:published_time"]', 'value']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.a-article-grid__main.pmc-a-grid article.pmc-a-grid-item'],
transforms: {
'.embed-twitter': function embedTwitter($node) {
var innerHtml = $node.html();
$node.replaceWith(innerHtml);
}
},
clean: []
}
};
var WwwGizmodoJpExtractor = {
domain: 'www.gizmodo.jp',
title: {
selectors: ['h1.p-post-title']
},
author: {
selectors: ['li.p-post-AssistAuthor']
},
date_published: {
selectors: [['li.p-post-AssistTime time', 'datetime']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['article.p-post'],
transforms: {
'img.p-post-thumbnailImage': function imgPPostThumbnailImage($node) {
var src = $node.attr('src');
$node.attr('src', src.replace(/^.*=%27/, '').replace(/%27;$/, ''));
}
},
clean: ['h1.p-post-title', 'ul.p-post-Assist']
}
};
var GetnewsJpExtractor = {
domain: 'getnews.jp',
title: {
selectors: ['article h1']
},
author: {
selectors: ['span.prof']
},
date_published: {
selectors: [['ul.cattag-top time', 'datetime']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.post-bodycopy'],
transforms: {},
clean: []
}
};
var WwwLifehackerJpExtractor = {
domain: 'www.lifehacker.jp',
title: {
selectors: ['h1.lh-summary-title']
},
author: {
selectors: ['p.lh-entryDetailInner--credit']
},
date_published: {
selectors: [['div.lh-entryDetail-header time', 'datetime']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.lh-entryDetail-body'],
transforms: {
'img.lazyload': function imgLazyload($node) {
var src = $node.attr('src');
$node.attr('src', src.replace(/^.*=%27/, '').replace(/%27;$/, ''));
}
},
clean: ['p.lh-entryDetailInner--credit']
}
};
var SectIijAdJpExtractor = {
domain: 'sect.iij.ad.jp',
title: {
selectors: ['h3']
},
author: {
selectors: ['dl.entrydate dd']
},
date_published: {
selectors: ['dl.entrydate dd'],
format: 'YYYY年MM月DD日',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['#article'],
transforms: {},
clean: ['dl.entrydate']
}
};
var WwwOreillyCoJpExtractor = {
domain: 'www.oreilly.co.jp',
title: {
selectors: ['h3']
},
author: {
selectors: ['li[itemprop="author"]']
},
date_published: {
selectors: [['meta[itemprop="datePublished"]', 'value']],
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['#content'],
defaultCleaner: false,
transforms: {},
clean: ['.social-tools']
}
};
var WwwIpaGoJpExtractor = {
domain: 'www.ipa.go.jp',
title: {
selectors: ['h1']
},
author: null,
date_published: {
selectors: ['p.ipar_text_right'],
format: 'YYYY年M月D日',
timezone: 'Asia/Tokyo'
},
dek: null,
lead_image_url: null,
content: {
selectors: ['#ipar_main'],
defaultCleaner: false,
transforms: {},
clean: ['p.ipar_text_right']
}
};
var WeeklyAsciiJpExtractor = {
domain: 'weekly.ascii.jp',
title: {
selectors: ['h1[itemprop="headline"]']
},
author: {
selectors: ['p.author']
},
date_published: {
selectors: [['meta[name="odate"]', 'value']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.article'],
transforms: {},
clean: []
}
};
var TechlogIijAdJpExtractor = {
domain: 'techlog.iij.ad.jp',
title: {
selectors: ['h1.entry-title']
},
author: {
selectors: ['a[rel="author"]']
},
date_published: {
selectors: [['time.entry-date', 'datetime']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.entry-content'],
defaultCleaner: false,
transforms: {},
clean: []
}
};
var WiredJpExtractor = {
domain: 'wired.jp',
title: {
selectors: ['h1.post-title']
},
author: {
selectors: ['p[itemprop="author"]']
},
date_published: {
selectors: [['time', 'datetime']]
},
dek: {
selectors: ['.post-intro']
},
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['article.article-detail'],
transforms: {
'img[data-original]': function imgDataOriginal($node) {
var dataOriginal = $node.attr('data-original');
var src = $node.attr('src');
var url = URL.resolve(src, dataOriginal);
$node.attr('src', url);
}
},
clean: ['.post-category', 'time', 'h1.post-title', '.social-area-syncer']
}
};
var JapanZdnetComExtractor = {
domain: 'japan.zdnet.com',
title: {
selectors: ['h1']
},
author: {
selectors: [['meta[name="cXenseParse:author"]', 'value']]
},
date_published: {
selectors: [['meta[name="article:published_time"]', 'value']]
},
dek: null,
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['div.article_body'],
transforms: {},
clean: []
}
};
var WwwRbbtodayComExtractor = {
domain: 'www.rbbtoday.com',
title: {
selectors: ['h1']
},
author: {
selectors: ['.writer.writer-name']
},
date_published: {
selectors: [['header time', 'datetime']]
},
dek: {
selectors: ['.arti-summary']
},
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['.arti-content'],
transforms: {},
clean: ['.arti-giga']
}
};
var WwwLemondeFrExtractor = {
domain: 'www.lemonde.fr',
title: {
selectors: ['h1.article__title']
},
author: {
selectors: ['.author__name']
},
date_published: {
selectors: [['meta[name="og:article:published_time"]', 'value']]
},
dek: {
selectors: ['.article__desc']
},
lead_image_url: {
selectors: [['meta[name="og:image"]', 'value']]
},
content: {
selectors: ['.article__content'],
transforms: {},
clean: []
}
};
var WwwPhoronixComExtractor = {
domain: 'www.phoronix.com',
title: {
selectors: ['article header']
},
author: {
selectors: ['.author a:first-child']
},
date_published: {
selectors: ['.author'],
// 1 June 2019 at 08:34 PM EDT
format: 'D MMMM YYYY at hh:mm',
timezone: 'America/New_York'
},
dek: null,
lead_image_url: null,
content: {
selectors: ['.content'],
// Is there anything in the content you selected that needs transformed
// before it's consumable content? E.g., unusual lazy loaded images
transforms: {},
// Is there anything that is in the result that shouldn't be?
// The clean selectors will remove anything that matches from
// the result
clean: []
}
};
var PitchforkComExtractor = {
domain: 'pitchfork.com',
title: {
selectors: ['title']
},
author: {
selectors: ['.authors-detail__display-name']
},
date_published: {
selectors: [['.pub-date', 'datetime']]
},
dek: {
selectors: ['.review-detail__abstract']
},
lead_image_url: {
selectors: [['.single-album-tombstone__art img', 'src']]
},
content: {
selectors: ['.review-detail__text']
},
extend: {
score: {
selectors: ['.score']
}
}
};
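Note on the `img.p-post-thumbnailImage` and `img.lazyload` transforms above: both apply the same pair of `replace` calls to the `src` attribute, which appears intended to unwrap a URL enclosed in URL-encoded single quotes (`%27`). The sketch below runs those two calls on plain strings. `unwrapQuotedSrc`, the non-string guard, and the sample URLs are assumptions for illustration; the real markup on those sites is not shown in this diff.

```javascript
// Hypothetical helper (not in mercury.js) wrapping the two replace() calls
// used by the gizmodo.jp and lifehacker.jp transforms.
function unwrapQuotedSrc(src) {
  // Guard added for this sketch only; the transforms in the diff call
  // src.replace directly and assume src is a string.
  if (typeof src !== 'string') return src;
  return src
    .replace(/^.*=%27/, '') // drop everything up to and including the last "=%27"
    .replace(/%27;$/, '');  // drop a trailing "%27;"
}

// Made-up inputs: one wrapped value and one that is already a clean URL.
var wrapped = 'https://example.jp/lazy?src=%27https://example.jp/images/a.jpg%27;';
var clean = 'https://example.jp/images/b.jpg';

console.log(unwrapQuotedSrc(wrapped)); // https://example.jp/images/a.jpg
console.log(unwrapQuotedSrc(clean));   // https://example.jp/images/b.jpg
```

Because neither pattern matches an ordinary URL, images that are not wrapped pass through unchanged.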
var CustomExtractors = /*#__PURE__*/Object.freeze({
@ -5162,6 +5788,7 @@ var CustomExtractors = /*#__PURE__*/Object.freeze({
WwwFastcompanyComExtractor: WwwFastcompanyComExtractor,
BlisterreviewComExtractor: BlisterreviewComExtractor,
NewsMynaviJpExtractor: NewsMynaviJpExtractor,
ClinicaltrialsGovExtractor: ClinicaltrialsGovExtractor,
GithubComExtractor: GithubComExtractor,
WwwRedditComExtractor: WwwRedditComExtractor,
OtrsComExtractor: OtrsComExtractor,
@ -5173,7 +5800,31 @@ var CustomExtractors = /*#__PURE__*/Object.freeze({
ScanNetsecurityNeJpExtractor: ScanNetsecurityNeJpExtractor,
JvndbJvnJpExtractor: JvndbJvnJpExtractor,
GeniusComExtractor: GeniusComExtractor,
WwwJnsaOrgExtractor: WwwJnsaOrgExtractor
WwwJnsaOrgExtractor: WwwJnsaOrgExtractor,
PhpspotOrgExtractor: PhpspotOrgExtractor,
WwwInfoqComExtractor: WwwInfoqComExtractor,
WwwMoongiftJpExtractor: WwwMoongiftJpExtractor,
WwwItmediaCoJpExtractor: WwwItmediaCoJpExtractor,
WwwPublickey1JpExtractor: WwwPublickey1JpExtractor,
TakagihiromitsuJpExtractor: TakagihiromitsuJpExtractor,
BookwalkerJpExtractor: BookwalkerJpExtractor,
WwwYomiuriCoJpExtractor: WwwYomiuriCoJpExtractor,
JapanCnetComExtractor: JapanCnetComExtractor,
DeadlineComExtractor: DeadlineComExtractor,
WwwGizmodoJpExtractor: WwwGizmodoJpExtractor,
GetnewsJpExtractor: GetnewsJpExtractor,
WwwLifehackerJpExtractor: WwwLifehackerJpExtractor,
SectIijAdJpExtractor: SectIijAdJpExtractor,
WwwOreillyCoJpExtractor: WwwOreillyCoJpExtractor,
WwwIpaGoJpExtractor: WwwIpaGoJpExtractor,
WeeklyAsciiJpExtractor: WeeklyAsciiJpExtractor,
TechlogIijAdJpExtractor: TechlogIijAdJpExtractor,
WiredJpExtractor: WiredJpExtractor,
JapanZdnetComExtractor: JapanZdnetComExtractor,
WwwRbbtodayComExtractor: WwwRbbtodayComExtractor,
WwwLemondeFrExtractor: WwwLemondeFrExtractor,
WwwPhoronixComExtractor: WwwPhoronixComExtractor,
PitchforkComExtractor: PitchforkComExtractor
});
var Extractors = _Object$keys(CustomExtractors).reduce(function (acc, key) {

File diff suppressed because one or more lines are too long

@ -1,6 +1,6 @@
{
"name": "@postlight/mercury-parser",
"version": "2.1.0",
"version": "2.1.1",
"description": "Mercury transforms web pages into clean text. Publishers and programmers use it to make the web make sense, and readers use it to read any web article comfortably.",
"author": "Postlight <mercury@postlight.com>",
"homepage": "https://mercury.postlight.com",

@ -621,7 +621,6 @@
"@types/normalize-package-data@^2.4.0":
version "2.4.0"
resolved "https://registry.yarnpkg.com/@types/normalize-package-data/-/normalize-package-data-2.4.0.tgz#e486d0d97396d79beedd0a6e33f4534ff6b4973e"
integrity sha512-f5j5b/Gf71L+dbqxIpQ4Z2WlmI/mPJ0fOkGGmFgtb6sAu97EPczzbS3/tJKxmcYDj55OX6ssqwDAWOHIYDRDGA==
"@types/unist@*", "@types/unist@^2.0.0":
version "2.0.2"
@ -897,14 +896,12 @@ asn1.js@^4.0.0:
asn1@~0.2.3:
version "0.2.4"
resolved "https://registry.yarnpkg.com/asn1/-/asn1-0.2.4.tgz#8d2475dfab553bb33e77b54e59e880bb8ce23136"
integrity sha512-jxwzQpLQjSmWXgwaCZE9Nz+glAG01yF1QnWgbhGwHI5A6FRIEY6IVqtHhIepHqI7/kyEyQEagBC5mBEFlIYvdg==
dependencies:
safer-buffer "~2.1.0"
assert-plus@^1.0.0:
version "1.0.0"
resolved "https://registry.yarnpkg.com/assert-plus/-/assert-plus-1.0.0.tgz#f12e0f3c5d77b0b1cdd9146942e4e96c1e4dd525"
integrity sha1-8S4PPF13sLHN2RRpQuTpbB5N1SU=
assert@^1.4.0:
version "1.4.1"
@ -1275,7 +1272,6 @@ base@^0.11.1:
bcrypt-pbkdf@^1.0.0:
version "1.0.2"
resolved "https://registry.yarnpkg.com/bcrypt-pbkdf/-/bcrypt-pbkdf-1.0.2.tgz#a4301d389b6a43f9b67ff3ca11a3f6637e360e9e"
integrity sha1-pDAdOJtqQ/m2f/PKEaP2Y342Dp4=
dependencies:
tweetnacl "^0.14.3"
@ -1963,7 +1959,6 @@ commander@~2.17.1:
commander@~2.20.0:
version "2.20.0"
resolved "https://registry.yarnpkg.com/commander/-/commander-2.20.0.tgz#d58bb2b5c1ee8f87b0d340027e9e94e222c5a422"
integrity sha512-7j2y+40w61zy6YC2iRNpUe/NwhNyoXrYpHMrSunaMG64nRnaf96zO/KMQR4OyN/UnE5KLyEBnKHd4aG3rskjpQ==
commit-stream@~1.1.0:
version "1.1.0"
@ -2089,7 +2084,6 @@ cosmiconfig@5.0.6:
cosmiconfig@^5.2.0:
version "5.2.0"
resolved "https://registry.yarnpkg.com/cosmiconfig/-/cosmiconfig-5.2.0.tgz#45038e4d28a7fe787203aede9c25bca4a08b12c8"
integrity sha512-nxt+Nfc3JAqf4WIWd0jXLjTJZmsPLrA9DDc4nRw2KFJQJK7DNooqSXrNI7tzLG50CF8axczly5UV929tBmh/7g==
dependencies:
import-fresh "^2.0.0"
is-directory "^0.3.1"
@ -2196,7 +2190,6 @@ damerau-levenshtein@^1.0.4:
dashdash@^1.12.0:
version "1.14.1"
resolved "https://registry.yarnpkg.com/dashdash/-/dashdash-1.14.1.tgz#853cfa0f7cbe2fed5de20326b8dd581035f6e2f0"
integrity sha1-hTz6D3y+L+1d4gMmuN1YEDX24vA=
dependencies:
assert-plus "^1.0.0"
@ -2513,7 +2506,6 @@ duplexer2@~0.0.2:
ecc-jsbn@~0.1.1:
version "0.1.2"
resolved "https://registry.yarnpkg.com/ecc-jsbn/-/ecc-jsbn-0.1.2.tgz#3a83a904e54353287874c564b7549386849a98c9"
integrity sha1-OoOpBOVDUyh4dMVkt1SThoSamMk=
dependencies:
jsbn "~0.1.0"
safer-buffer "^2.1.0"
@ -2723,9 +2715,9 @@ eslint-config-airbnb@^17.1.0:
object.assign "^4.1.0"
object.entries "^1.0.4"
eslint-config-prettier@^4.0.0:
version "4.0.0"
resolved "https://registry.npmjs.org/eslint-config-prettier/-/eslint-config-prettier-4.0.0.tgz#16cedeea0a56e74de60dcbbe3be0ab2c645405b9"
eslint-config-prettier@^5.0.0:
version "5.1.0"
resolved "https://registry.npmjs.org/eslint-config-prettier/-/eslint-config-prettier-5.1.0.tgz#bf29442e7c818236a77acfe2241ec991299f9bf1"
dependencies:
get-stdin "^6.0.0"
@ -3425,7 +3417,6 @@ get-stdin@^6.0.0:
get-stdin@^7.0.0:
version "7.0.0"
resolved "https://registry.yarnpkg.com/get-stdin/-/get-stdin-7.0.0.tgz#8d5de98f15171a125c5e516643c7a6d0ea8a96f6"
integrity sha512-zRKcywvrXlXsA0v0i9Io4KDRaAw7+a1ZpjRwl9Wox8PFlVCCHra7E9c4kqXCoCM9nR5tBkaTTZRBoCm60bFqTQ==
get-stream@^3.0.0:
version "3.0.0"
@ -3444,7 +3435,6 @@ get-value@^2.0.3, get-value@^2.0.6:
getpass@^0.1.1:
version "0.1.7"
resolved "https://registry.yarnpkg.com/getpass/-/getpass-0.1.7.tgz#5eff8e3e684d569ae4cb2b1282604e8ba62149fa"
integrity sha1-Xv+OPmhNVprkyysSgmBOi6YhSfo=
dependencies:
assert-plus "^1.0.0"
@ -3594,7 +3584,6 @@ growly@^1.3.0:
handlebars@^4.0.3:
version "4.1.2"
resolved "https://registry.yarnpkg.com/handlebars/-/handlebars-4.1.2.tgz#b6b37c1ced0306b221e094fc7aca3ec23b131b67"
integrity sha512-nvfrjqvt9xQ8Z/w0ijewdD/vvWDTOweBUm96NTr66Wfvo1mJenBLwcYmPs3TIBP5ruzYGD7Hx/DaM9RmhroGPw==
dependencies:
neo-async "^2.6.0"
optimist "^0.6.1"
@ -3777,7 +3766,6 @@ https-browserify@^1.0.0:
husky@^2.3.0:
version "2.3.0"
resolved "https://registry.yarnpkg.com/husky/-/husky-2.3.0.tgz#8b78ed24d763042df7fd899991985d65a976dd13"
integrity sha512-A/ZQSEILoq+mQM3yC3RIBSaw1bYXdkKnyyKVSUiJl+iBjVZc5LQEXdGY1ZjrDxC4IzfRPiJ0IqzEQGCN5TQa/A==
dependencies:
cosmiconfig "^5.2.0"
execa "^1.0.0"
@ -4697,7 +4685,6 @@ js-yaml@^3.12.0, js-yaml@^3.6.1, js-yaml@^3.7.0, js-yaml@^3.9.0:
js-yaml@^3.13.0:
version "3.13.1"
resolved "https://registry.yarnpkg.com/js-yaml/-/js-yaml-3.13.1.tgz#aff151b30bfdfa8e49e05da22e7415e9dfa37847"
integrity sha512-YfbcO7jXDdyj0DGxYVSlSeQNHbD7XPWvrVWeVUujrQEoZzWJIRrCPoyk6kL6IAjAG2IolMK4T0hNUe0HOUs5Jw==
dependencies:
argparse "^1.0.7"
esprima "^4.0.0"
@ -4705,7 +4692,6 @@ js-yaml@^3.13.0:
jsbn@~0.1.0:
version "0.1.1"
resolved "https://registry.yarnpkg.com/jsbn/-/jsbn-0.1.1.tgz#a5e654c2e5a2deb5f201d96cefbca80c0ef2f513"
integrity sha1-peZUwuWi3rXyAdls77yoDA7y9RM=
jsdom@^11.5.1, jsdom@^11.9.0:
version "11.12.0"
@ -5437,7 +5423,6 @@ minimist@1.2.0, minimist@^1.1.0, minimist@^1.1.1, minimist@^1.1.3, minimist@^1.2
minimist@~0.0.1:
version "0.0.10"
resolved "https://registry.yarnpkg.com/minimist/-/minimist-0.0.10.tgz#de3f98543dbf96082be48ad1a0c7cda836301dcf"
integrity sha1-3j+YVD2/lggr5IrRoMfNqDYwHc8=
minipass@^2.2.1, minipass@^2.3.4:
version "2.3.5"
@ -5520,7 +5505,6 @@ moment-parseformat@3.0.0:
moment-timezone@0.5.24:
version "0.5.24"
resolved "https://registry.yarnpkg.com/moment-timezone/-/moment-timezone-0.5.24.tgz#59e14e210a6f2410ec71e01c01d324c45f7f0a7e"
integrity sha512-oxg1YswuqzOBzGWs3i3TnNqbvHMGK7qY7zcg9SJfZ09K+FiNtSPKmFfqGuxN1oMyusGisvYZEc4un//j3wwAKw==
dependencies:
moment ">= 2.9.0"
@ -5583,7 +5567,6 @@ negotiator@0.6.1:
neo-async@^2.6.0:
version "2.6.1"
resolved "https://registry.yarnpkg.com/neo-async/-/neo-async-2.6.1.tgz#ac27ada66167fa8849a6addd837f6b189ad2081c"
integrity sha512-iyam8fBuCUpWeKPGpaNMetEocMt364qkCsfL9JuhjXX6dRnguRVOfk2GZaDpPjcOKiiXCPINZC1GczQ7iTq3Zw==
next-line@^1.1.0:
version "1.1.0"
@ -5674,7 +5657,6 @@ normalize-package-data@^2.3.2:
normalize-package-data@^2.5.0:
version "2.5.0"
resolved "https://registry.yarnpkg.com/normalize-package-data/-/normalize-package-data-2.5.0.tgz#e66db1838b200c1dfc233225d12cb36520e234a8"
integrity sha512-/5CMN3T0R4XTj4DcGaexo+roZSdSFW/0AOOTROrjxzCG1wrWXEsGbRKevjlIL+ZDE4sZlJr5ED4YW0yqmkK+eA==
dependencies:
hosted-git-info "^2.1.4"
resolve "^1.10.0"
@ -5871,7 +5853,6 @@ onetime@^2.0.0:
optimist@^0.6.1:
version "0.6.1"
resolved "https://registry.yarnpkg.com/optimist/-/optimist-0.6.1.tgz#da3ea74686fa21a19a111c326e90eb15a0196686"
integrity sha1-2j6nRob6IaGaERwybpDrFaAZZoY=
dependencies:
minimist "~0.0.1"
wordwrap "~0.0.2"
@ -6196,7 +6177,6 @@ pkg-dir@^2.0.0:
pkg-dir@^4.1.0:
version "4.1.0"
resolved "https://registry.yarnpkg.com/pkg-dir/-/pkg-dir-4.1.0.tgz#aaeb91c0d3b9c4f74a44ad849f4de34781ae01de"
integrity sha512-55k9QN4saZ8q518lE6EFgYiu95u3BWkSajCifhdQjvLvmr8IpnRbhI+UGpWJQfa0KzDguHeeWT1ccO1PmkOi3A==
dependencies:
find-up "^3.0.0"
@ -6474,7 +6454,6 @@ read-pkg@^2.0.0:
read-pkg@^5.1.1:
version "5.1.1"
resolved "https://registry.yarnpkg.com/read-pkg/-/read-pkg-5.1.1.tgz#5cf234dde7a405c90c88a519ab73c467e9cb83f5"
integrity sha512-dFcTLQi6BZ+aFUaICg7er+/usEoqFdQxiEBsEMNGoipenihtxxtdrQuBXvyANCEI8VuUIVYFgeHGx9sLLvim4w==
dependencies:
"@types/normalize-package-data" "^2.4.0"
normalize-package-data "^2.5.0"
@ -6996,7 +6975,6 @@ resolve@^1.1.5, resolve@^1.1.6, resolve@^1.3.2, resolve@^1.3.3, resolve@^1.4.0,
resolve@^1.10.0:
version "1.10.1"
resolved "https://registry.yarnpkg.com/resolve/-/resolve-1.10.1.tgz#664842ac960795bbe758221cdccda61fb64b5f18"
integrity sha512-KuIe4mf++td/eFb6wkaPbMDnP6kObCaEtIDuHOUED6MNUo4K670KZUHuuvYPZDxNF0WVLw49n06M2m2dXphEzA==
dependencies:
path-parse "^1.0.6"
@ -7297,7 +7275,6 @@ slash@^1.0.0:
slash@^3.0.0:
version "3.0.0"
resolved "https://registry.yarnpkg.com/slash/-/slash-3.0.0.tgz#6539be870c165adbd5240220dbe361f1bc4d4634"
integrity sha512-g9Q1haeby36OSStwb4ntCGGGaKsaVSjQ68fBxoQcutl5fS1vuY18H3wSt3jFyFtrkx+Kz0V1G85A4MyAdDMi2Q==
slice-ansi@0.0.4:
version "0.0.4"
@ -7418,7 +7395,6 @@ source-map@^0.5.0, source-map@^0.5.3, source-map@^0.5.6, source-map@^0.5.7, sour
source-map@^0.6.0, source-map@^0.6.1, source-map@~0.6.1:
version "0.6.1"
resolved "https://registry.yarnpkg.com/source-map/-/source-map-0.6.1.tgz#74722af32e9614e9c287a8d0bbde48b5e2f1a263"
integrity sha512-UjgapumWlbMhkBgzT7Ykc5YXUT46F0iKu8SGXq0bcwP5dz/h0Plj6enJqjz1Zbq2l5WaqYnrVbwWOWMyF3F47g==
sourcemap-codec@^1.4.1:
version "1.4.4"
@ -7457,7 +7433,6 @@ sprintf-js@~1.0.2:
sshpk@^1.7.0:
version "1.16.1"
resolved "https://registry.yarnpkg.com/sshpk/-/sshpk-1.16.1.tgz#fb661c0bef29b39db40769ee39fa70093d6f6877"
integrity sha512-HXXqVUq7+pcKeLqqZj6mHFUMvXtOJt1uoUx09pFW6011inTMxqI8BA8PM95myrIyyKwdnzjdFjLiE6KBPVtJIg==
dependencies:
asn1 "~0.2.3"
assert-plus "^1.0.0"
@ -7919,7 +7894,6 @@ turndown@^5.0.3:
tweetnacl@^0.14.3, tweetnacl@~0.14.0:
version "0.14.5"
resolved "https://registry.yarnpkg.com/tweetnacl/-/tweetnacl-0.14.5.tgz#5ae68177f192d4456269d108afa93ff8743f4f64"
integrity sha1-WuaBd/GS1EViadEIr6k/+HQ/T2Q=
type-check@~0.3.2:
version "0.3.2"
@ -7934,7 +7908,6 @@ type-detect@^4.0.0, type-detect@^4.0.5:
type-fest@^0.4.1:
version "0.4.1"
resolved "https://registry.yarnpkg.com/type-fest/-/type-fest-0.4.1.tgz#8bdf77743385d8a4f13ba95f610f5ccd68c728f8"
integrity sha512-IwzA/LSfD2vC1/YDYMv/zHP4rDF1usCwllsDpbolT3D4fUepIO7f9K70jjmUewU/LmGUKJcwcVtDCpnKk4BPMw==
type-is@~1.6.16:
version "1.6.16"
@ -7950,7 +7923,6 @@ typedarray@^0.0.6, typedarray@~0.0.5:
uglify-js@^3.1.4:
version "3.6.0"
resolved "https://registry.yarnpkg.com/uglify-js/-/uglify-js-3.6.0.tgz#704681345c53a8b2079fb6cec294b05ead242ff5"
integrity sha512-W+jrUHJr3DXKhrsS7NUVxn3zqMOFn0hL/Ei6v0anCIMoKC93TjcflTagwIHLW7SfMFfiQuktQyFVCFHGUE0+yg==
dependencies:
commander "~2.20.0"
source-map "~0.6.1"
@ -8378,7 +8350,6 @@ windows-release@^3.1.0:
wordwrap@~0.0.2:
version "0.0.3"
resolved "https://registry.yarnpkg.com/wordwrap/-/wordwrap-0.0.3.tgz#a3d5da6cd5c0bc0008d37234bbaf1bed63059107"
integrity sha1-o9XabNXAvAAI03I0u68b7WMFkQc=
wordwrap@~1.0.0:
version "1.0.0"
