Richard Harding
d708744822
Clean up tests/changes to merge into 0.3.0.dev
12 years ago
Jerry Charumilind
eefb8e1125
Implement duplicate page detection
This detects duplicate pages so that the same page is not added twice to a
multi-page article. It adds a simple unit test and regenerates the nytimes
regression test with the new, more correct result. Previously, we were
including page 2 again after page 5.
Conflicts:
src/readability_lxml/readability.py
12 years ago
Richard Harding
c931a80ba8
Tweak tests post merging
12 years ago
Jerry Charumilind
883a02ad5d
Add a regression for a multi-page nytimes article
It does not quite work yet, as we wrongly pull in page 2 at the end of the
article due to yet-to-be-implemented duplicate avoidance.
Conflicts:
src/readability_lxml/readability.py
src/tests/gen_test.py
src/tests/regression.py
12 years ago
Richard Harding
cfc6f94634
Fix test for the multipage test with actual content
12 years ago
Jerry Charumilind
816c66482e
Improve unit test for basic multi-page handling
The test now actually asserts something instead of just printing some stuff out
for manual inspection.
Conflicts:
src/readability_lxml/readability.py
12 years ago
Richard Harding
99d5fc0a87
Update for merge with Jerry: Checkpoint multi-page readability work
12 years ago
Jerry Charumilind
f02fe79840
Checkpoint multi-page readability work
Restructured code to better support multi-page readability. Improved tests.
Conflicts:
src/readability_lxml/readability.py
src/tests/regression.py
12 years ago
Richard Harding
5cb4b8b8c0
Tweaks after the code reorg
12 years ago
Jerry Charumilind
f8315d011c
Checkpoint multi-page readability work
Restructured code to better support multi-page readability. Improved tests.
Rick:
This generally works and the tests pass, but there are some broken cases with
the multipage bits that are causing me grief. It does pass the one test case.
I made the multipage an option vs doing it by default. The more I change the
code the harder future merges will be, but man it needs some cleanup, reorg,
and comments.
Conflicts:
src/readability_lxml/readability.py
src/tests/regression.py
12 years ago
Richard Harding
99efa5c10b
PEP8 again ...
12 years ago
Richard Harding
a012fd2362
urlfetch is in src
12 years ago
Jerry Charumilind
3fe416a5d1
Refactor code for easier testing
Conflicts:
src/readability_lxml/readability.py
12 years ago
Richard Harding
8cadc4a958
Fix links in the regression test set
12 years ago
Richard Harding
9765d13e90
Garden
12 years ago
Jerry Charumilind
32d1764e83
Add scoring of next page link ancestry and href
This adds the scoring of next page link candidates' ancestry and href values
from the readability algorithm.
12 years ago
Richard Harding
0951647c8e
Complete move from test_data/output to regression_test*
12 years ago
Richard Harding
ace51a6819
Combine our tests with the new regression_test stuff
12 years ago
Jerry Charumilind
2505c78e5b
Jerry Merge: First working find_next_page_link case
12 years ago
Richard Harding
edc0e4d4c6
Move tests to testfile
12 years ago
Jerry Charumilind
6abc6f7ef2
Add cleaning of short segments
Conflicts:
src/readability_lxml/readability.py
12 years ago
Jerry Charumilind
1e30e33302
Move the tests to the testfile
12 years ago
Richard Harding
e8a6250605
Clean up merge, put tests in right place, adjust imports
12 years ago
Jerry Charumilind
62df35570d
Checkpoint of multi-page article work
This implements some basic tools needed by the multi-page article algorithm.
Conflicts:
src/readability_lxml/readability.py
12 years ago
Richard Harding
29fceeb4b1
Fix regression to run with metadata
12 years ago
Richard Harding
6f8184be27
Doh, move the tests to the right dir
12 years ago
Richard Harding
9aef5e36b7
Move the test data into the tests/test_data dir
12 years ago
Jerry Charumilind
8988b6b767
Add comment for read_orig
12 years ago
Jerry Charumilind
7d097d5f11
Add subcommand parsing to gen_test
There are now subcommands to generate new tests or just regenerate readable
versions of old tests.
Conflicts:
src/tests/gen_test.py
12 years ago
Jerry Charumilind
b04f75239c
Add option to not generate yaml file
Sometimes you just want to generate the data files without the YAML
specification. This change lets you do that. In doing so, I switched to use
the argparse module for argument parsing.
Conflicts:
src/tests/gen_test.py
12 years ago
Jerry Charumilind
c21f00b1ee
Reorganize constants
Conflicts:
src/tests/regression.py
12 years ago
Richard Harding
9fec245ae4
garden
12 years ago
Jerry Charumilind
6af808bc14
Add docstring briefly describing gen_test program
12 years ago
Jerry Charumilind
7980ca84c9
Add regression tests for readability results
These test cases provide a baseline from which we can start improving the
readability algorithm and making sure that we do not horribly break anything.
Conflicts:
src/tests/regression.py
12 years ago
Richard Harding
a700bb8bd4
Update makefile regression test helper to open html results
12 years ago
Jerry Charumilind
bf203b5a4b
Add summary page for test results
Conflicts:
src/tests/regression.py
12 years ago
Jerry Charumilind
65989b538a
Remove obsolete code
Conflicts:
src/tests/regression.py
12 years ago
Jerry Charumilind
9b7e5bb327
Jerry Merge: Remove obsolete code
12 years ago
Jerry Charumilind
068eba19ae
Jerry Merge: Add reading of test information from YAML file
12 years ago
Richard Harding
6d3ad559f6
Move test_data, add regression_test make command
12 years ago
Jerry Charumilind
5222ed0628
Jerry Merge: Initial regression test data
12 years ago
Richard Harding
6454fb3f37
Clean up merge bits a little bit
12 years ago
Richard Harding
9366436861
Merge Jerry: pull in initial set of regression tests
12 years ago
Richard Harding
7dc373e9c5
Add the title and the short title to the metadata set.
- Tested for a performance hit: 100 iterations add 0.03s to the total time.
- Added the -m flag to the cmd line client to get all metadata output.
- Added test for making sure title/short title come back as well.
12 years ago
Richard Harding
b1966df1c3
Fix docs for changed method
12 years ago
Richard Harding
57694cb352
Remove the get_ in method name, doesn't fit rest of api
12 years ago
Jerry Charumilind
b78d7e8501
Merge Jerry: pull in the ability to get back confidence score as well as the processed html
12 years ago
Richard Harding
a2b17e757c
Update readme for the build location
12 years ago
Richard Harding
3347f16d93
Fix the flipped nature of the <html> wrapping setting
12 years ago
Richard Harding
93ac1111a1
Add try it out to the readable server
12 years ago