Closes #66 - Keep tabular data as they're most likely to be part of content.

keep-tabular-data
Nicolas Perriault 9 years ago
parent f9ea568f3f
commit 62fae22849

@ -357,7 +357,6 @@ Readability.prototype = {
// Do these last as the previous stuff may have removed junk
// that will affect these
-this._cleanConditionally(articleContent, "table");
this._cleanConditionally(articleContent, "ul");
this._cleanConditionally(articleContent, "div");
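
A rough illustration of why leaving "table" out of the conditional-clean pass keeps data tables in the output. This is only a sketch under stated assumptions: the plain-object node shape, the `linkDensity` helper, the 0.2 threshold, and the text lengths are all hypothetical, and the real `_cleanConditionally` in Readability weighs more signals than link density alone.

```javascript
// Hypothetical, simplified stand-in for a conditional-clean pass.
// Nodes are plain objects ({ tag, textLength, linkTextLength, children }),
// not real DOM nodes, so this runs without a browser.
function linkDensity(node) {
  return node.textLength === 0 ? 0 : node.linkTextLength / node.textLength;
}

// Returns a copy of `root` without descendants whose tag is listed in `tags`
// and whose link density exceeds `maxLinkDensity`.
function cleanConditionally(root, tags, maxLinkDensity) {
  const children = (root.children || [])
    .filter((child) => !(tags.includes(child.tag) && linkDensity(child) > maxLinkDensity))
    .map((child) => cleanConditionally(child, tags, maxLinkDensity));
  return { ...root, children };
}

// Illustrative numbers only: a paragraph plus a data table whose header
// cells are links, loosely modelled on the fixture in this commit.
const article = {
  tag: "div", textLength: 500, linkTextLength: 50,
  children: [
    { tag: "p", textLength: 230, linkTextLength: 0, children: [] },
    { tag: "table", textLength: 100, linkTextLength: 50, children: [] },
  ],
};

// With "table" in the list, the link-heavy table is treated as junk.
const before = cleanConditionally(article, ["table", "ul", "div"], 0.2);
// Without "table" in the list, the table survives.
const after = cleanConditionally(article, ["ul", "div"], 0.2);

console.log(before.children.map((c) => c.tag)); // [ 'p' ]
console.log(after.children.map((c) => c.tag));  // [ 'p', 'table' ]
```

The trade-off is that layout tables full of navigation links would also survive such a pass, which is the cost of assuming tables are most likely content.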

@ -0,0 +1,5 @@
{
"title": "Keep tabular data test",
"byline": null,
"excerpt": "Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod\n tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,\n quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo\n consequat."
}

@ -0,0 +1,90 @@
<div id="readability-page-1" class="page">
<div>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat.</p>
<table>
<thead>
<tr>
<td></td>
<th><a href="http://gs.statcounter.com/#browser_version-ww-weekly-201201-201202-bar">World</a>
</th>
<th><a href="http://gs.statcounter.com/#browser_version-af-weekly-201201-201202-bar">Africa</a>
</th>
<th><a href="http://gs.statcounter.com/#browser_version-as-weekly-201201-201202-bar">Asia</a>
</th>
<th><a href="http://gs.statcounter.com/#browser_version-eu-weekly-201201-201202-bar">Europe</a>
</th>
<th><a href="http://gs.statcounter.com/#browser_version-na-weekly-201201-201202-bar">North America</a>
</th>
<th><a href="http://gs.statcounter.com/#browser_version-oc-weekly-201201-201202-bar">Oceania</a>
</th>
<th><a href="http://gs.statcounter.com/#browser_version-sa-weekly-201201-201202-bar">South America</a>
</th>
</tr>
</thead>
<tbody>
<tr>
<th align="left">Theora</th>
<td align="right">55%</td>
<td align="right">64%</td>
<td align="right">58%</td>
<td align="right">62%</td>
<td align="right">42%</td>
<td align="right">49%</td>
<td align="right">66%</td>
</tr>
<tr>
<th align="left">WebM</th>
<td align="right">51%</td>
<td align="right">55%</td>
<td align="right">52%</td>
<td align="right">57%</td>
<td align="right">38%</td>
<td align="right">45%</td>
<td align="right">62%</td>
</tr>
<tr>
<th align="left">H.264</th>
<td align="right">45%</td>
<td align="right">38%</td>
<td align="right">42%</td>
<td align="right">44%</td>
<td align="right">47%</td>
<td align="right">50%</td>
<td align="right">53%</td>
</tr>
<tr>
<th align="left">No <code>&lt;video&gt;</code>
</th>
<td align="right">28%</td>
<td align="right">27%</td>
<td align="right">32%</td>
<td align="right">21%</td>
<td align="right">32%</td>
<td align="right">25%</td>
<td align="right">25%</td>
</tr>
<tr>
<th align="left">Unknown</th>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
</tr>
</tbody>
</table>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse cillum
dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
</div>
<p>Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat
non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
</div>

@ -0,0 +1,95 @@
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Keep tabular data test</title>
</head>
<body>
<article>
<h1>Lorem</h1>
<div>
<p>Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat.</p>
<table>
<thead>
<tr>
<td></td>
<th><a href="http://gs.statcounter.com/#browser_version-ww-weekly-201201-201202-bar">World</a></th>
<th><a href="http://gs.statcounter.com/#browser_version-af-weekly-201201-201202-bar">Africa</a></th>
<th><a href="http://gs.statcounter.com/#browser_version-as-weekly-201201-201202-bar">Asia</a></th>
<th><a href="http://gs.statcounter.com/#browser_version-eu-weekly-201201-201202-bar">Europe</a></th>
<th><a href="http://gs.statcounter.com/#browser_version-na-weekly-201201-201202-bar">North America</a></th>
<th><a href="http://gs.statcounter.com/#browser_version-oc-weekly-201201-201202-bar">Oceania</a></th>
<th><a href="http://gs.statcounter.com/#browser_version-sa-weekly-201201-201202-bar">South America</a></th>
</tr>
</thead>
<tbody>
<tr>
<th align="left">Theora</th>
<td align="right">55%</td>
<td align="right">64%</td>
<td align="right">58%</td>
<td align="right">62%</td>
<td align="right">42%</td>
<td align="right">49%</td>
<td align="right">66%</td>
</tr>
<tr>
<th align="left">WebM</th>
<td align="right">51%</td>
<td align="right">55%</td>
<td align="right">52%</td>
<td align="right">57%</td>
<td align="right">38%</td>
<td align="right">45%</td>
<td align="right">62%</td>
</tr>
<tr>
<th align="left">H.264</th>
<td align="right">45%</td>
<td align="right">38%</td>
<td align="right">42%</td>
<td align="right">44%</td>
<td align="right">47%</td>
<td align="right">50%</td>
<td align="right">53%</td>
</tr>
<tr>
<th align="left">No <code>&lt;video&gt;</code></th>
<td align="right">28%</td>
<td align="right">27%</td>
<td align="right">32%</td>
<td align="right">21%</td>
<td align="right">32%</td>
<td align="right">25%</td>
<td align="right">25%</td>
</tr>
<tr>
<th align="left">Unknown</th>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
<td align="right">0%</td>
</tr>
</tbody>
</table>
<p>Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.</p>
</div>
<h2>Foo</h2>
<div>
Tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam,
quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo
consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse
cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non
proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
</div>
</article>
</body>
</html>