# Postlight Parser - Extracting content from chaos

[![CircleCI](https://circleci.com/gh/postlight/parser.svg?style=svg&circle-token=3026c2b527d3767750e767872d08991aeb4f8f10)](https://circleci.com/gh/postlight/mercury-parser)
[![Greenkeeper badge](https://badges.greenkeeper.io/postlight/mercury-parser.svg)](https://greenkeeper.io/)
[![Apache License][license-apach-badge]][license-apach]
[![MIT License][license-mit-badge]][license-mit]
[![Gitter chat](https://badges.gitter.im/postlight/mercury.png)](https://gitter.im/postlight/mercury)

[license-apach-badge]: https://img.shields.io/badge/License-Apache%202.0-blue.svg?style=flat-square
[license-apach]: https://github.com/postlight/mercury-parser/blob/master/LICENSE-APACHE
[license-mit-badge]: https://img.shields.io/badge/License-MIT%202.0-blue.svg?style=flat-square
[license-mit]: https://github.com/postlight/mercury-parser/blob/master/LICENSE-MIT

[Postlight](https://postlight.com)'s Parser extracts the bits that humans care about from any URL you give it. That includes article content, titles, authors, published dates, excerpts, lead images, and more.

Postlight Parser powers [Postlight Reader](https://reader.postlight.com/), a browser extension that removes ads and distractions, leaving only text and images for a beautiful reading view on any site.

Postlight Parser allows you to easily create custom parsers using simple JavaScript and CSS selectors. This allows you to proactively manage parsing and migration edge cases. There are [many examples available](https://github.com/postlight/parser/tree/master/src/extractors/custom) along with [documentation](https://github.com/postlight/parser/blob/master/src/extractors/custom/README.md).

## How? Like this.

### Installation

```bash
# If you're using yarn
yarn add @postlight/parser

# If you're using npm
npm install @postlight/parser
```

### Usage

```javascript
import Parser from '@postlight/parser';

Parser.parse(url).then(result => console.log(result));

// NOTE: When used in the browser, you can omit the URL argument
// and simply run `Parser.parse()` to parse the current page.
```

The result looks like this:

```json
{
  "title": "Thunder (mascot)",
  "content": "... <p><b>Thunder</b> is the stage name for the...",
  "author": "Wikipedia Contributors",
  "date_published": "2016-09-16T20:56:00.000Z",
  "lead_image_url": null,
  "dek": null,
  "next_page_url": null,
  "url": "https://en.wikipedia.org/wiki/Thunder_(mascot)",
  "domain": "en.wikipedia.org",
  "excerpt": "Thunder Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos",
  "word_count": 4677,
  "direction": "ltr",
  "total_pages": 1,
  "rendered_pages": 1
}
```

If Parser is unable to find a field, that field is returned as `null`.

#### `parse()` Options

##### Content Formats

By default, Postlight Parser returns the `content` field as HTML. You can override this behavior by passing options to the `parse` function that specify whether to scrape all pages of an article and what type of output to return (valid values for `contentType` are `'html'`, `'markdown'`, and `'text'`). For example:

```javascript
Parser.parse(url, { contentType: 'markdown' }).then(result =>
  console.log(result)
);
```

This returns the page's `content` as GitHub-flavored Markdown:

```json
"content": "...**Thunder** is the [stage name](https://en.wikipedia.org/wiki/Stage_name) for the..."
```

##### Custom Request Headers

You can include custom headers in requests by passing name-value pairs to the `parse` function as follows:

```javascript
Parser.parse(url, {
  headers: {
    Cookie: 'name=value; name2=value2; name3=value3',
    'User-Agent':
      'Mozilla/5.0 (iPhone; CPU iPhone OS 10_3_1 like Mac OS X) AppleWebKit/603.1.30 (KHTML, like Gecko) Version/10.0 Mobile/14E304 Safari/602.1',
  },
}).then(result => console.log(result));
```

##### Pre-fetched HTML

You can use Postlight Parser to parse custom or pre-fetched HTML by passing an HTML string to the `parse` function as follows:

```javascript
Parser.parse(url, {
  html:
    '<html><body><article><h1>Thunder (mascot)</h1><p>Thunder is the stage name for the horse who is the official live animal mascot for the Denver Broncos</p></article></body></html>',
}).then(result => console.log(result));
```
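
The `parse()` options above (`contentType`, `headers`, `html`) are each shown on their own. As a minimal sketch, the helpers below show one way to assemble and check them before making a call. Both `buildParseOptions` and `missingFields` are hypothetical names introduced for illustration and are not part of Postlight Parser. The sketch also assumes the three options can be combined in one call, which this README does not state explicitly.

```javascript
// Hypothetical helpers, not part of @postlight/parser.
// Valid contentType values are taken from the "Content Formats" section.
const VALID_CONTENT_TYPES = ['html', 'markdown', 'text'];

// Builds an options object suitable for passing as the second
// argument to Parser.parse(). Defaults contentType to 'html',
// matching the documented default output format.
function buildParseOptions({ contentType = 'html', headers, html } = {}) {
  if (!VALID_CONTENT_TYPES.includes(contentType)) {
    throw new TypeError(
      `contentType must be one of ${VALID_CONTENT_TYPES.join(', ')}; got "${contentType}"`
    );
  }

  const options = { contentType };

  // Only include optional keys when the caller supplied them.
  if (headers !== undefined) {
    options.headers = { ...headers };
  }
  if (html !== undefined) {
    if (typeof html !== 'string') {
      throw new TypeError('html must be a string of pre-fetched markup');
    }
    options.html = html;
  }

  return options;
}

// Lists the keys of a parse result whose value is null, i.e. the
// fields Parser was unable to find.
function missingFields(result) {
  return Object.keys(result).filter(key => result[key] === null);
}

const options = buildParseOptions({
  contentType: 'markdown',
  headers: { 'User-Agent': 'example-agent/1.0' },
});
console.log(options);

// With the library installed, usage might look like this (assumed):
//   Parser.parse(url, options).then(result =>
//     console.log(missingFields(result))
//   );
```

Validating `contentType` up front surfaces a typo such as `'md'` as an immediate error instead of leaving it to the library to handle an unrecognized value.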