84 Commits (dev)

Author SHA1 Message Date
Yuri Baburov e8f86bdcf9 Several updates from dev version. 9 years ago
Yuri Baburov 40e430c27d Makefile updates 9 years ago
Yuri Baburov 0a082ff020 Fix for Mac OS X 10.10 9 years ago
Yuri Baburov 8048160d66 WIP: update to support python2 and python3 9 years ago
Yuri Baburov 71294f094f Encoding improvements 10 years ago
Yuri Baburov 5855beb32a WIP; Backported features from stable branch 10 years ago
Yuri Baburov ae1f1adfff Switched to use python logging module.
Added xpath option (undocumented yet).
10 years ago
Yuri Baburov 2fab5ffa6b Merge pull request #48 from mperdomo1/master
Added code to check declared encodings first
10 years ago
Mark Perdomo 3a43a3fe7e Added code to check declared encodings first and check them
from kennethreitz/requests/utils.py.  Also I added some superset
encodings I have found in Chinese pages that are mishandled by
chardet/character declarations.
10 years ago
Yuri Baburov 1a4d3697bc Allow latest lxml on Mac OS X 10.9, see issue #39 for comments and setup instructions 10 years ago
Yuri Baburov d8595b7103 Quickfix for #41 11 years ago
Yuri Baburov 318f25c577 Minor fix in encoding guessing. Claiming it v0.3.0.1 11 years ago
Yuri Baburov 08658d1d31 Released v 0.3, and uploaded to the pypi. 11 years ago
Yuri Baburov 4e3192f5ab Merge pull request #29 from hush-hush/master
Make lxml clean tree available for user modifications
12 years ago
hush-hush e2e78e4d55 Make lxml clean tree available for user modifications. 12 years ago
Yuri Baburov c923995606 Merge pull request #27 from sunlightlabs/master
Simple guard for empty title elements. Thanks, dvogel!
12 years ago
Drew Vogel fdba8d9e11 Added check on title.text to avoid a TypeError on None. 12 years ago
Yuri Baburov 9cd5fb6226 Bump to 0.2.6.1 12 years ago
Yuri Baburov 44915518d3 Merge pull request #24 from zacharydenton/master
Fix issue 22: all titles were blank.
12 years ago
Zach Denton 0843d9cdf2 Explicitly check if title is None. fixes #22
This fixes #22 which caused all titles to be blank.
12 years ago
Yuri Baburov 8aefc6175f Updated README with 0.2.6 changes. 12 years ago
Yuri Baburov 20d5f3a73a Bump to 0.2.6 12 years ago
Yuri Baburov 2e49e34e11 Merge pull request #20 from andreypopp/master
readability.htmls: some docs do not have title elem
12 years ago
Andrey Popp 95852d5c18 readability.htmls: some docs do not have title elem 12 years ago
Yuri Baburov 274b60cdb1 Merge pull request #19 from EvaSDK/master
Package that provides source code
12 years ago
Gilles Dartiguelongue ea6afd3d49 Make sure code is actually distributed 12 years ago
Richard Harding a19e766900 Update version so we can upload new tar.gz to pypi 12 years ago
Richard Harding b9f6f6777f Merge branch 'master' of github.com:buriy/python-readability 12 years ago
Richard Harding 873562cfba Update setup.py for finding the package correctly 12 years ago
Richard Harding e9a5cbfe7f Remove pdb dummy 12 years ago
Richard Harding f1a79fb8f8 Update to make sure we don't drop the html tag when ditching elements 12 years ago
Richard Harding 46f0302ebc rename the document_only flag to html_partial 12 years ago
Rick Harding 6e8a1f5ce2 Merge pull request #18 from mitechie/add_makefile
Add makefile, update .gitignore for venv potential testfile output.
12 years ago
Richard Harding b8fc399fac Fix rebase issue in the Makefile 12 years ago
Richard Harding 82804b664d Update .gitignore file for venv and nosetests. 12 years ago
Richard Harding 4376eedc13 Add makefile testing, building, uploading.
- Adds a makefile with helpers
- make all will setup a virtualenv and get deps
- make test will install test deps and run nosetests
- make version_update will open the setup.py for updating version string
- make upload will build and upload sdist to pypi
12 years ago
Yuri Baburov 7338e9ef63 Added test suite to setup.py
Bump to version 0.2.4
12 years ago
Yuri Baburov a1ae4eaf72 Merge pull request #15 from mitechie/master
New option only_document of Document.summary(), fixed issue GH-13 with "<body/>", added some docs, tests, and code quality improvements. Thanks, Rick!
12 years ago
Richard Harding 8d3e39f04e Update readme 12 years ago
Richard Harding a46dc14251 Try to pep8 all the things but give up when I got close. 12 years ago
Richard Harding 5a98e2c1b8 Correct appending and allow for document only
- Fix the appending of siblings to the correct nested element
- Add a document only flag so that you can get a dom tree you can nest
yourself without html/body tags.
12 years ago
Richard Harding edccec5d3b Work on why we have an empty <body/> tag
- Seems to come because the sanitizer ends up with two nodes, not one. The
first is an empty body, the second is the article div.
- Fix up the tabs so we can work with the file. Needs lots of pep8 love.
- Implement an initial hack that at least gets it working atm.
- Start to add test cases, sample html files we can test against, etc.
12 years ago
Yuri Baburov ab783b25b7 Merge pull request #11 from JanX2/master
Fixing gap in node_length coverage (length=80 was missed)
Continue early in remove_unlikely_candidates() in case there is neither a class nor an id attribute.
Adding comment about oversight in transform_misused_divs_into_paragraphs
12 years ago
Jan Weiß 3cdc3d67af Adding comment about oversight in transform_misused_divs_into_paragraphs(). 12 years ago
Jan Weiß 960f885edf Continue early in remove_unlikely_candidates() in case there is neither a class nor an id attribute. 12 years ago
Jan Weiß 6b3961cd30 Fixing gap in node_length coverage. 12 years ago
Yuri Baburov f9b604c9a8 Merge pull request #10 from facundo/master
Fix: Document.score_paragraphs should use ._html() not .html in case it's used not from .summary() method.
Thanks to facundo.
12 years ago
facundo bb93ae1e5f fixed a small issue on the Document score_paragraphs method 12 years ago
Yuri Baburov fc6a500298 Merge pull request #9 from Psycojoker/master
Add lxml to the dependencies list in the setup.py
Please note that lxml sometimes can't be built from sources, lots of people use binary distributions, which setup.py/pip can't handle properly!
13 years ago
Laurent Peuch 1583d8a794 add lxml missing dependancy 13 years ago