cybertools/util/html.txt
helmutm 439ddf7c3d provide simple function for stripping all HTML tags from a text
git-svn-id: svn://svn.cy55.de/Zope3/src/cybertools/trunk@3759 fd906abe-77d9-0310-91a1-e0d9ade77398
2010-03-08 07:20:54 +00:00


==================
Tweaking HTML text
==================
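
$Id$

The sanitize function removes tags and attributes that are not explicitly
allowed. In the first call below only the style attribute is allowed, and
within it the font-size declaration is dropped as well; the second call
restricts the allowed tags to p and b and the allowed attributes to class.

>>> from cybertools.util.html import sanitize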
>>> input = """<html>
... <p class="standard" style="font-size: 200%; font-weight: bold">
... <a href="blubb"><b>Text</b>, and more</a>
... </p>
... </html>"""
>>> sanitize(input, validAttrs=['style'])
u'\n<p style="font-weight: bold">\n<a><b>Text</b>, and more</a>\n</p>\n'
>>> sanitize(input, ['p', 'b'], ['class'])
u'\n<p class="standard">\n<b>Text</b>, and more\n</p>\n'
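
The first result above shows that sanitize also filters the declarations
inside an allowed style attribute. A minimal sketch of such a filter follows;
filter_style and the set of allowed properties are hypothetical and are not
taken from cybertools.

```python
def filter_style(style, valid_props):
    """Keep only the CSS declarations whose property name is allowed.

    Hypothetical helper for illustration, not the cybertools implementation.
    Splitting on ';' is naive and breaks on values that contain a semicolon.
    """
    kept = []
    for declaration in style.split(';'):
        name, sep, value = declaration.partition(':')
        # Skip empty fragments and declarations without a colon.
        if sep and name.strip().lower() in valid_props:
            kept.append('%s: %s' % (name.strip(), value.strip()))
    return '; '.join(kept)
```

Called with the style value from the example and an allowed set that contains
only font-weight, this returns the single declaration that survives in the
output above.
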

All comments are stripped from the HTML input.

>>> input2 = """<html>
... <p>text</p>
... <!-- comment -->
... <p>text</p>"""
>>> sanitize(input2)
u'\n<p>text</p>\n\n<p>text</p>'
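
The whitelisting and comment stripping shown above can be approximated with
the standard library alone. The following is a sketch for Python 3 under
stated assumptions: sanitize_sketch and its parameters are hypothetical names,
style declarations are not filtered, and nothing here is the cybertools
implementation.

```python
from html import escape
from html.parser import HTMLParser


class _WhitelistFilter(HTMLParser):
    """Re-emit only allowed tags and attributes; comments are dropped."""

    def __init__(self, valid_tags, valid_attrs):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.valid_tags = set(valid_tags)
        self.valid_attrs = set(valid_attrs)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.valid_tags:
            return
        # The parser hands over decoded attribute values, so escape them again.
        kept = ''.join(' %s="%s"' % (name, escape(value or '', quote=True))
                       for name, value in attrs if name in self.valid_attrs)
        self.out.append('<%s%s>' % (tag, kept))

    def handle_endtag(self, tag):
        if tag in self.valid_tags:
            self.out.append('</%s>' % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    # handle_comment is not overridden, so comments never reach the output.


def sanitize_sketch(html, valid_tags, valid_attrs=()):
    parser = _WhitelistFilter(valid_tags, valid_attrs)
    parser.feed(html)
    parser.close()
    return ''.join(parser.out)
```

With p and b as allowed tags and class as the only allowed attribute, this
sketch yields the same string as the second sanitize call for the first input.
Self-closing tags are re-emitted as an open and close pair, which a real
implementation would probably want to handle separately.
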

It is also possible to strip all HTML tags from the input string.

>>> from cybertools.util.html import stripAll
>>> stripAll(input)
u'Text, and more'
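
For comparison, stripping every tag can be sketched with the standard
library's HTMLParser by collecting only the text nodes. The name
strip_all_tags is hypothetical, the code targets Python 3, and trimming the
surrounding whitespace is an assumption based on the output shown above rather
than on the cybertools source.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collect text nodes only; tags and comments are ignored."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_all_tags(html):
    # Hypothetical helper for illustration, not the cybertools implementation.
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    # Assumption: leading and trailing whitespace is trimmed.
    return ''.join(collector.parts).strip()
```

Applied to the first input of this document, the sketch returns the same text
as stripAll does above.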