xhtml() doesn’t like <div> inside a <span>

You should be able to run the snippets on this page and reproduce the problem by downloading the files bug.rst and 0821.odt from /docs/blog/2013/0821 into a folder of your choice and then running:

$ python -m doctest bug.rst

0821.odt contains a simple appy pod from clause:

do text
from xhtml(chunk)

We are going to render this into a file out.odt:

>>> OUTFILE = 'out.odt'

Alternatively, you can try rendering to a .pdf file if you have an OpenOffice or LibreOffice server running on port 2002 (uncomment the following line in your copy of bug.rst):

>>> # OUTFILE = 'out.pdf'

When chunk is the following, then it works:

>>> html = u'<p><div><span>it works!</span></div></p>'

But when I invert the nesting (<div> inside <span>), it fails:

>>> html = u'<p><span><div>Oops</div></span></p>'

Another example, HTML of the kind TinyMCE happens to produce, is this:

>>> html = u'<strong><ul><li>Foo</li><li>Bar</li></ul></strong>'

Here is what it should look like instead:

>>> html = u'<ul><li><strong>Foo</strong></li><li><strong>Bar</strong></li></ul>'

The following snippet tries to render whichever value of html was assigned last:

>>> import os
>>> from appy.pod.renderer import Renderer
>>> html = html.encode('utf-8')
>>> context = dict(chunk=html)
>>> if os.path.exists(OUTFILE):
...     os.remove(OUTFILE)
>>> r = Renderer('0821.odt',context,OUTFILE)
>>> r.run()
>>> os.path.exists(OUTFILE)
True

With one of the failing values of html, the file out.odt gets created, but it contains an invalid content.xml, and LibreOffice will complain when you try to open it.
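
One way to see what LibreOffice is complaining about is to parse content.xml directly. The following is a minimal sketch using only the Python 3 standard library. The helper name content_xml_error is hypothetical, and the sketch assumes the problem is a well-formedness error: it will not catch content that is well-formed XML but violates the ODF schema.

```python
import zipfile
import xml.etree.ElementTree as ET


def content_xml_error(odt_path):
    """Return None if content.xml parses as XML, else the parser's message.

    Hypothetical helper: checks well-formedness only, not ODF validity.
    """
    # An .odt file is a zip archive; the document body lives in content.xml.
    with zipfile.ZipFile(odt_path) as archive:
        data = archive.read('content.xml')
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None
```

Calling content_xml_error('out.odt') after a render would then return either None or the position of the first parse error.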

I originally wrote this page for Gaëtan in the hope that he would fix this bug in appy pod… but then I understood: in fact appy pod is right! A <div> inside a <span> is not valid XHTML. A <li> inside a <strong> is not valid XHTML either. According to Mac on Stack Overflow, “several websites use this method for styling”, but the bug is not in Gaëtan’s renderXhtml method, it is in my own code: in lino.utils.html2xhtml.
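
This kind of nesting can be detected mechanically before a chunk is handed to xhtml(). The sketch below uses only the Python 3 standard library. The names find_bad_nesting, NestingChecker, INLINE and BLOCK are hypothetical, the two tag sets are deliberately incomplete, and this is not how lino.utils.html2xhtml works. It only reports a block-level element whose direct parent is an inline element, which is the pattern in both failing examples on this page.

```python
from html.parser import HTMLParser

# Assumed, incomplete tag sets; extend them for real input.
INLINE = {'span', 'strong', 'em', 'b', 'i', 'a'}
BLOCK = {'div', 'p', 'ul', 'ol', 'li', 'table'}


class NestingChecker(HTMLParser):
    """Collect (inline_parent, block_child) pairs found in the markup."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, outermost first
        self.problems = []  # (parent, child) pairs, in document order

    def handle_starttag(self, tag, attrs):
        # A block element directly inside an inline element is reported.
        if tag in BLOCK and self.stack and self.stack[-1] in INLINE:
            self.problems.append((self.stack[-1], tag))
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def find_bad_nesting(markup):
    checker = NestingChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems


print(find_bad_nesting('<p><span><div>Oops</div></span></p>'))
# → [('span', 'div')]
print(find_bad_nesting('<strong><ul><li>Foo</li><li>Bar</li></ul></strong>'))
# → [('strong', 'ul')]
```

The working example and the corrected list both come back with an empty result, so a check like this could run as a guard before rendering.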

(Edit 20130823: added the <li> inside <strong> example)