Fix for #52: <input type="hidden"> elements are no longer counted by the "form removal" heuristic.

pull/50/merge
Yuri Baburov 10 years ago
parent 2fab5ffa6b
commit 638f73f6a2

@@ -452,6 +452,7 @@ class Document:
 for kind in ['p', 'img', 'li', 'a', 'embed', 'input']:
     counts[kind] = len(el.findall('.//%s' % kind))
 counts["li"] -= 100
+counts["input"] -= len(el.findall('.//input[@type="hidden"]'))
 # Count the text length excluding any surrounding whitespace
 content_length = text_length(el)
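
The effect of the added line can be checked in isolation with a minimal sketch. It assumes the standard library's xml.etree.ElementTree in place of whatever element type the project actually passes in as `el`; `count_tags` and `SAMPLE` are hypothetical names introduced only for this illustration and are not part of the commit.

```python
import xml.etree.ElementTree as ET


def count_tags(el):
    """Hypothetical helper mirroring the counting logic in the hunk above."""
    counts = {}
    for kind in ['p', 'img', 'li', 'a', 'embed', 'input']:
        counts[kind] = len(el.findall('.//%s' % kind))
    counts["li"] -= 100
    # Line added by this commit: hidden inputs no longer count as form controls.
    counts["input"] -= len(el.findall('.//input[@type="hidden"]'))
    return counts


# Hypothetical fragment: one paragraph plus a form with two hidden inputs
# and one visible text input.
SAMPLE = ET.fromstring(
    '<div><p>Article text</p><form>'
    '<input type="hidden" name="csrf"/>'
    '<input type="hidden" name="page"/>'
    '<input type="text" name="q"/>'
    '</form></div>'
)

counts = count_tags(SAMPLE)
print(counts["input"])  # prints 1: only the visible text input is counted
```

Without the subtraction, the same fragment would report three inputs, so a block carrying several hidden fields would look more form-like to the heuristic than it really is.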
