Wednesday, November 11, 2015

Avoid line breaks in formatted amounts

Gerd reported #611. The solution was easy. I just added this to their file:

# decimal_group_separator '.'
decimal_group_separator = u"\u00A0"

(i.e. set the decimal_group_separator to a non-breaking space).

The non-breaking space is now the default value for decimal_group_separator (so after the release I might remove the above local change).
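To illustrate why a non-breaking space helps, here is a minimal sketch in Python 3 (not Lino's actual formatting code). The function name format_amount and the comma as decimal separator are assumptions made for this example only.

```python
# Hypothetical helper, not part of Lino: format an amount with a
# configurable group separator (default: non-breaking space).
NBSP = "\u00A0"


def format_amount(value, group_sep=NBSP, decimal_sep=","):
    # Let Python insert "," as group separator and "." as decimal point,
    # then swap both for the configured characters.
    text = "{:,.2f}".format(value)
    text = text.replace(",", "\0")        # protect the group separators
    text = text.replace(".", decimal_sep)
    return text.replace("\0", group_sep)


amount = format_amount(1234567.5)
# The string contains no ordinary space, so a renderer that breaks lines
# at ordinary spaces has no place to split the amount.
print(repr(amount))
```

Because the separator is U+00A0 rather than U+0020, the digits of one amount stay together on a single line.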

It took me some time to write a test case because run_simple_doctests (i.e. python -m doctest lino/core/) did not run the tests: it confuses the module with the site module of the standard library, at least in Python 2.7.6. Fixed in lino/tests/
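The command-line runner of doctest imports a file by its base name, so a file whose name matches an already imported module (such as site) resolves to the standard library module instead. The following is a sketch of one way around this in Python 3, not necessarily the fix applied in Lino: load the file under a different module name and then run doctest.testmod on it. The file content, the helper name run_doctests_under_alias and the alias are all made up for this example.

```python
import doctest
import importlib.util
import os
import tempfile

# Hypothetical module source with one doctest.
SOURCE = '''
def double(x):
    """
    >>> double(21)
    42
    """
    return 2 * x
'''


def run_doctests_under_alias(path, alias):
    # Load the file under a name that cannot clash with a stdlib module.
    spec = importlib.util.spec_from_file_location(alias, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return doctest.testmod(module, verbose=False)


with tempfile.TemporaryDirectory() as tmp:
    # The file is deliberately called site.py, like the stdlib module.
    path = os.path.join(tmp, "site.py")
    with open(path, "w") as f:
        f.write(SOURCE)
    results = run_doctests_under_alias(path, "clashing_site_module")

print(results)
```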

Rendering raw HTML strings

I have a strange problem with E.raw:

>>> from lino import startup
>>> startup('lino.projects.min1.settings')
>>> from lino.utils.xmlgen.html import E
>>> s = "<p>Ceci est un <b>beau</b> texte format&eacute;</p>"
>>> E.tostring(E.raw(s))
Traceback (most recent call last):
Exception: ParseError undefined entity: line 1, column 39 in <p>Ceci est un <b>beau</b> texte format&eacute;</p>

The ParseError looks as if the parser does not know the eacute HTML entity. But AFAICS I have successfully updated the parser’s XMLParser.entity attribute (I do this in lino.utils.xmlgen.html):

>>> from lino.utils.xmlgen.html import CreateParser
>>> p = CreateParser()
>>> p.entity['eacute']
u'\xe9'
>>> p.entity['egrave']
u'\xe8'

I changed lino.utils.xmlgen.Namespace.fromstring() so that it now uses xml.etree.ElementTree.fromstringlist() instead of xml.etree.ElementTree.fromstring() (because the latter does not support the parser keyword argument). But that doesn’t change anything.

>>> p  # doctest: +ELLIPSIS
<xml.etree.ElementTree.XMLParser object at ...>
>>> p.version
'Expat 2.1.0'
>>> E  # doctest: +ELLIPSIS
<lino.utils.xmlgen.html.HtmlNamespace object at ...>

Rendering raw HTML strings (continued)

Here is the same snippet without Lino:

>>> from xml.etree import ElementTree as ET
>>> from htmlentitydefs import name2codepoint
>>> ENTITIES = {}
>>> ENTITIES.update((x, unichr(i)) for x, i in name2codepoint.iteritems())
>>> def CreateParser():
...     p = ET.XMLParser()
...     p.entity.update(ENTITIES)
...     return p

No problem for HTML without entities:

>>> s = "<p>This is a <b>formatted</b> text</p>"
>>> ET.tostring(ET.fromstringlist([s], parser=CreateParser()))
'<p>This is a <b>formatted</b> text</p>'

But when it contains an entity (of type &name;), then it fails:

>>> s = "<p>Ceci est un texte <b>format&eacute;</b></p>"
>>> ET.tostring(ET.fromstringlist([s], parser=CreateParser()))
Traceback (most recent call last):
ParseError: undefined entity: line 1, column 30

The error message indicates the parser does not know the eacute HTML entity. But AFAICS I have successfully updated the parser’s XMLParser.entity attribute:

>>> p = CreateParser()
>>> p.entity['eacute']
u'\xe9'
>>> p.version
'Expat 2.1.0'
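One possible workaround, sketched here for Python 3 and using only the standard library, is to replace the named HTML entities by their characters before the XML parser sees them. This is an alternative technique, not what Lino does. The helper names resolve_html_entities and parse_fragment are made up for this example, and the regular expression would also touch text inside CDATA sections, which this sketch ignores.

```python
import re
from html.entities import name2codepoint
from xml.etree import ElementTree as ET

# The five entities that XML itself defines must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def resolve_html_entities(markup):
    """Replace named HTML entities such as &eacute; by their character."""
    def repl(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave it for the XML parser
        return chr(name2codepoint[name])
    return ENTITY_RE.sub(repl, markup)


def parse_fragment(markup):
    """Parse a single-rooted fragment after resolving HTML entities."""
    return ET.fromstring(resolve_html_entities(markup))


s = "<p>Ceci est un texte <b>format&eacute;</b></p>"
e = parse_fragment(s)
print(ET.tostring(e, encoding="unicode"))
```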

Lino now requires Python 2.7

I removed support for Python 2.6 because one test case failed there now that E.raw reports the string in which the parser error occurred.

More parsing

The following is normal because fromstring and fromstringlist must return one element:

>>> s = "<p>intro:</p><ol><li>first</li><li>second</li></ol>"
>>> ET.tostring(ET.fromstringlist([s], parser=CreateParser()))
Traceback (most recent call last):
ParseError: junk after document element: line 1, column 13

The workaround is to wrap them in a <div>:

>>> s = '<div>%s</div>' % s
>>> ET.tostring(ET.fromstringlist([s], parser=CreateParser()))

Cannot reuse detail_layout

The following error came in lino_cosi.lib.sales when I

Exception: Cannot reuse detail_layout of <class 'lino_cosi.lib.sales.models.ItemsByInvoicePrint'> for <class 'lino_cosi.lib.sales.models.InvoiceItemsByProduct'>

The explanation was probably that the InvoiceItems table was never used. Since the table that defined the detail_layout was never used, Lino installed the layout on the first subclass thereof, and then the other subclasses failed to inherit from it. This is just a theoretical explanation which I did not investigate to the end, but the problem disappeared after adding a command to the explorer menu.

Use lxml, not xml.etree for parsing HTML

Meanwhile I asked about "Rendering raw HTML strings (continued)" on #python, and Yhg1s suggested:

well, the simple solution is to not parse HTML as if it was XML, because, in reality, it isn’t. lxml.html is a much better idea.

I consulted Parsing XML and HTML with lxml to refresh my memory, and then it was actually quite easy.

>>> from lxml.etree import HTML
>>> s = "<p>Ceci est un texte <b>format&eacute;</b></p>"
>>> e = HTML(s)
>>> E.tostring(e)
'<html><body><p>Ceci est un texte <b>format&#233;</b></p></body></html>'
>>> E.tostring(e[0][0])
'<p>Ceci est un texte <b>format&#233;</b></p>'

So Lino now really needs lxml (not only for lino_cosi.lib.sepa), and let’s hope that the strange side effects I had some years ago will not occur again.
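As a possible variant of the snippet above, lxml.html offers fragment_fromstring, which returns the element itself without the surrounding html and body elements, so there is no need for the e[0][0] indexing. This is a sketch for Python 3 that assumes lxml is installed; it is not necessarily what Lino does internally.

```python
import lxml.html

s = "<p>Ceci est un texte <b>format&eacute;</b></p>"

# fragment_fromstring expects exactly one top-level element and returns
# it directly. The HTML parser knows named entities such as &eacute;.
e = lxml.html.fragment_fromstring(s)

print(e.tag)
print(e.text_content())
print(lxml.html.tostring(e, encoding="unicode"))
```

For markup with several top-level elements, lxml.html.fragments_fromstring returns a list instead.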

IllegalText: The <text:section> element does not allow text

I had this error message and wrote a test case in Product invoices to reproduce it.

The problem is in lino.utils.html2odf. Actually we just stumbled over one of the probably many situations which are not yet supported. I started a section “Not yet supported” in this document.

TODO: is there really no existing library for this task? The only approaches I saw call libreoffice in headless mode to do the conversion, which sounds inappropriate for our situation, where we must glue together fragments from different sources. Also note that we use appy.pod to do the actual generation.