Thursday, November 14, 2024

I stumbled upon a surprisingly easy solution for #4851 (Bleach is no longer being maintained): I wrote a new function lino.utils.soup.sanitize(), which replaces bleach.clean(). Here are the first test cases I used for the new function. (They have since been merged into Bleaching.)

>>> from lino.utils.soup import sanitize
>>> content = """
... No tag at beginning of text.
... bla bLTaQSTyI80t2t8l
... foo bar.
... And here is some <b>bold</b> text.
...
... """
>>> print(sanitize(content))
No tag at beginning of text.
bla bLTaQSTyI80t2t8l
foo bar.
And here is some <b>bold</b> text.
>>> content = """
... <p align="right">First paragraph</p>
... <p onclick="kill()">Second paragraph</p>
... """
>>> print(sanitize(content))
<p align="right">First paragraph</p>
<p>Second paragraph</p>
>>> content = """
... <!DOCTYPE html>
... <html>
...   <head>
...     <meta http-equiv="content-type" content="text/html; charset=UTF-8">
...     <title>Baby</title>
...   </head>
...   <body>
...     This is a descriptive text with <b>some</b> formatting.<br>
...     <br>
...     Here is a second paragraph.<br>
...     <br>
...   </body>
... </html>
... """
>>> print(sanitize(content))
This is a descriptive text with <b>some</b> formatting.<br/>
<br/>
    Here is a second paragraph.<br/>
<br/>
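
This is a minimal, standard-library-only sketch of the kind of whitelist sanitizer that can stand in for bleach.clean(). It is not the code of lino.utils.soup.sanitize(). The names sanitize_sketch, ALLOWED_ATTRS and DROP_CONTENT, and the whitelist itself, are hypothetical. It imitates the behaviour the doctests above exercise: it drops the onclick attribute and keeps align, unwraps html and body, discards the whole head, and writes <br> as <br/>. It does not reproduce Lino's exact whitespace handling. It also does not check URLs, which is why links are left out of the whitelist.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist for this sketch: tag -> attributes allowed on it.
ALLOWED_ATTRS = {
    "p": {"align"},
    "b": set(),
    "i": set(),
    "em": set(),
    "strong": set(),
    "br": set(),
}
# Elements whose whole content is discarded, not just the tags themselves.
DROP_CONTENT = {"head", "script", "style", "title"}
# Elements that never get a closing tag; written as <br/> etc.
VOID_TAGS = {"br"}


class _Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_ATTRS:
            return  # unknown tags vanish, their text content is kept
        kept = "".join(
            f' {name}="{escape(value or "", quote=True)}"'
            for name, value in attrs
            if name in ALLOWED_ATTRS[tag]
        )
        slash = "/" if tag in VOID_TAGS else ""
        self.out.append(f"<{tag}{kept}{slash}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_ATTRS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Entities were decoded by the parser, so re-escape the text.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))

    # Doctype declarations and comments use HTMLParser's no-op defaults,
    # so they are silently dropped.


def sanitize_sketch(html: str) -> str:
    parser = _Sanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


print(sanitize_sketch('<p align="right">First</p><p onclick="kill()">Second</p>'))
```

The usage line prints <p align="right">First</p><p>Second</p>. Unknown tags are dropped and their text is kept, so nothing the user typed disappears silently. The one exception is elements like script or head, whose content must never reach the page.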