Fictive VAT numbers are different on GitLab

April 16–17, 2024

I’m investigating #5542 (Two VAT doctests fail because generated VAT numbers differ).

The demo fixture of the vat plugin assigns a fictive and randomly generated (but syntactically valid) VAT number to each business partner. For some reason, the generated VAT numbers differ between my computer and GitLab CI, causing two doctests (docs/plugins/eevat.rst and docs/plugins/bevats.rst) to fail.

What causes this difference? That’s the question of #5542!

The VAT numbers are assigned by VatNumberManager.generate_vid. I added a diagnostic log message in this method:

dd.logger.info("20240416 generated VAT id {} for {}".format(self.obj.vat_id, self.obj))

Then I ran inv prep with the new --verbose option. The first demo database to get prepared is pierre. On my machine, the resulting messages are the same on every run:

20240416 generated VAT id BE 7088.996.857 for Bäckerei Ausdemwald
20240416 generated VAT id BE 4685.739.309 for Bäckerei Mießen
20240416 generated VAT id BE 4181.505.692 for Bäckerei Schmitz
20240416 generated VAT id BE 9045.438.159 for Garage Mergelsberg
20240416 generated VAT id NL 220.876.686B01 for Donderweer BV
20240416 generated VAT id NL 451.948.587B01 for Van Achter NV
20240416 generated VAT id DE 143.956.862 for Hans Flott & Co
20240416 generated VAT id DE 135.079.295 for Bernd Brechts Bücherladen
20240416 generated VAT id DE 138.433.397 for Reinhards Baumschule
20240416 generated VAT id FR 86.915.334.564 for Moulin Rouge
20240416 generated VAT id FR 66.435.589.280 for Auto École Verte
20240416 generated VAT id EE 848.217.541 for Maksu- ja Tolliamet
20240416 generated VAT id BE 4018.258.949 for Electrabel Customer Solutions

The same messages also appear on GitLab, but with different VAT numbers:

20240416 generated VAT id BE 2914.517.428 for Bäckerei Ausdemwald
20240416 generated VAT id BE 8750.836.192 for Bäckerei Mießen
20240416 generated VAT id BE 1958.116.531 for Bäckerei Schmitz
20240416 generated VAT id BE 4534.589.652 for Garage Mergelsberg
20240416 generated VAT id NL 237.725.353B01 for Donderweer BV
20240416 generated VAT id NL 643.080.485B01 for Van Achter NV
20240416 generated VAT id DE 928.188.312 for Hans Flott & Co
20240416 generated VAT id DE 593.748.463 for Bernd Brechts Bücherladen
20240416 generated VAT id DE 618.180.575 for Reinhards Baumschule
20240416 generated VAT id FR 65.449.289.186 for Moulin Rouge
20240416 generated VAT id FR 40.268.455.901 for Auto École Verte
20240416 generated VAT id EE 211.892.074 for Maksu- ja Tolliamet
20240416 generated VAT id BE 7659.012.310 for Electrabel Customer Solutions

The seed() method initializes the random number generator: if you seed it twice with the same value, you get the same sequence of random numbers both times. I verify this in Generating fictive VAT numbers (which passes both on my machine and on GitLab).
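A minimal illustration of that property, using only the standard library (draw_digits is a hypothetical helper, not Lino code). It also shows that a separate random.Random instance seeded with the same value yields the same sequence:

```python
import random

def draw_digits(n):
    """Draw n random decimal digits from the module-level (global) generator."""
    return [random.randint(0, 9) for _ in range(n)]

random.seed(1)
first = draw_digits(9)

random.seed(1)
second = draw_digits(9)  # same seed, same draws, so same digits as `first`

# An independent generator seeded with 1 follows the same sequence.
rng = random.Random(1)
third = [rng.randint(0, 9) for _ in range(9)]
```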

generate_vid is the only place in Lino where Python’s random module is used (there is one other import of it, in the users plugin, to generate the verification code, but that code is never called during inv prep).

Calling seed() without an argument would seed the generator from a randomness source of the operating system (or from the current system time if no such source is available) and therefore yield different random numbers on each run. But we deliberately call random.seed(1) at module level in lino_xl.lib.vat.choicelists, i.e. when that module is imported. I added another log message at that place:
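Seeding once at import time makes the numbers reproducible only as long as nothing else uses the global generator between the random.seed(1) call and the calls to generate_vid. A sketch of how a single intervening draw shifts every later number (the extra random.random() call is a hypothetical stand-in for "some other code"; draw_digits is a hypothetical helper):

```python
import random

def draw_digits(n):
    """Draw n random decimal digits from the module-level (global) generator."""
    return [random.randint(0, 9) for _ in range(n)]

# Reference sequence: seed, then draw immediately.
random.seed(1)
clean = draw_digits(20)

# Same seed, but some other code consumes one number from the
# global generator before we start drawing.
random.seed(1)
random.random()
shifted = draw_digits(20)
```

The shifted sequence is still deterministic (it is the same on every run with the same intervening calls), which is why such a disturbance shows up only as a difference between environments, not between runs on the same machine.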

20240417 random.seed(1)

NB: Until now the import and the random.seed(1) call had been conditional (only when dd.is_installed("vat")). I removed this condition because it wasn’t needed and only added complexity. But that didn’t fix our problem.

One theoretical possibility was that, for some reason, the business partners might be processed in a different order when they get their VAT id. Comparing the output from my machine and from GitLab now excludes this possibility (IOW, we are advancing ;-)

I also had a closer look at the code in lino_xl.lib.vat.choicelists and noticed this:

for cc, length in {
        "HR": 11,
        "DK": 8,
        "EE": 9,
        "FI": 8,
        "FR": 11,
        "DE": 9,
        "EL": 9,
        "HU": 8,
        "IT": 11,
        "LV": 11,
        "LT": 12,
        "LU": 8,
}.items():
    vat_origins.add_item(cc, VatOrigin(cc, length))

I wondered whether this means that the VatOrigin objects are instantiated in an order that can vary. Probably not, because since Python 3.7 dicts are guaranteed to preserve insertion order, but it looked suspicious… I made a series of bold simplifications to the code. Result: no effect.
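Whether the iteration order of such a dict literal can vary is easy to check. The following sketch reproduces only the dict from the loop above (not Lino’s VatOrigin or vat_origins) and collects the country codes in the order the loop visits them:

```python
# The same country/length pairs as in the loop above.
lengths = {
    "HR": 11, "DK": 8, "EE": 9, "FI": 8, "FR": 11, "DE": 9,
    "EL": 9, "HU": 8, "IT": 11, "LV": 11, "LT": 12, "LU": 8,
}

# Since Python 3.7, iterating over a dict yields its keys in insertion
# order, so this list matches the order in which the keys are written.
visited = [cc for cc, length in lengths.items()]
```

Since the language guarantees this order, the loop creates the VatOrigin objects in the same sequence on every machine running Python 3.7 or later.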

Explanation

I found the explanation on April 20.

While showing the problem to Sharif, I had the idea to “patch” the seed() method of the random generator so that it logs every call.

I added the following to lino/__init__.py:

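import inspect
import logging
import random
logger = logging.getLogger(__name__)
original_seed = random.Random.seed  # the unpatched (unbound) method
def seed(self, a=None, version=2):
    # Log the caller of seed() and the caller's caller.
    stk = "\n".join(["{}:{}".format(s.filename, s.lineno) for s in inspect.stack()[1:3]])
    logger.info("20240420 random.seed(%s, %s) is called from %s", a, version, stk)
    # Seed the instance on which seed() was called, not always random._inst.
    original_seed(self, a=a, version=version)
random.Random.seed = seed
# random.seed is a bound method of the hidden global instance random._inst,
# captured at import time, so it must be rebound to the patched method.
random.seed = random._inst.seed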
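The same monkeypatching idea can also be sketched as a context manager that collects the calls in a list instead of logging them, and that restores the original methods afterwards. trace_seed_calls is a hypothetical name, not part of Lino, and random._inst is a private CPython detail. The sketch also shows that creating a new random.Random instance goes through seed() as well, with a=None, because Random.__init__ calls self.seed():

```python
import contextlib
import inspect
import random

@contextlib.contextmanager
def trace_seed_calls():
    """Temporarily wrap random.Random.seed and record (a, caller) pairs."""
    calls = []
    original_seed = random.Random.seed
    original_module_seed = random.seed

    def seed(self, a=None, version=2):
        caller = inspect.stack()[1]
        calls.append((a, "{}:{}".format(caller.filename, caller.lineno)))
        original_seed(self, a, version)  # seed the instance actually called

    random.Random.seed = seed
    random.seed = random._inst.seed  # rebind the module-level shortcut
    try:
        yield calls
    finally:
        # Undo the patch, even if the traced code raised an exception.
        random.Random.seed = original_seed
        random.seed = original_module_seed

with trace_seed_calls() as calls:
    random.seed(1)   # seeds the hidden global generator
    random.Random()  # Random.__init__ calls self.seed(None) on the new instance
```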

And now:

(dev) luc@yoga:~/work/book/lino_book/projects/pierre$ pm prep
20240420 random.seed(None, 2) is called from /usr/lib/python3.10/random.py:125
/home/luc/virtualenvs/dev/lib/python3.10/site-packages/sympy/core/random.py:29
20240420 random.seed(None, 2) is called from /usr/lib/python3.10/random.py:125
/home/luc/virtualenvs/dev/lib/python3.10/site-packages/sympy/core/symbol.py:419
20240420 random.seed(None, 2) is called from /usr/lib/python3.10/random.py:125
/home/luc/virtualenvs/dev/lib/python3.10/site-packages/sympy/ntheory/ecm.py:7
20240420 random.seed(None, 2) is called from /usr/lib/python3.10/random.py:125
/home/luc/virtualenvs/dev/lib/python3.10/site-packages/sympy/ntheory/qs.py:8
20240420 random.seed(1, 2) is called from /home/luc/work/xl/lino_xl/lib/vat/choicelists.py:27
<frozen importlib._bootstrap>:241
20240420 random.seed(None, 2) is called from /usr/lib/python3.10/random.py:125
/usr/lib/python3.10/tempfile.py:285
20240420 random.seed(None, 2) is called from /usr/lib/python3.10/random.py:125
/usr/lib/python3.10/tempfile.py:285
We are going to flush your database (/home/luc/work/book/lino_book/projects/pierre/settings/default.db)
AND REMOVE ALL FILES BELOW /home/luc/work/book/lino_book/projects/pierre/settings/media.
Are you sure (y/n) ? [Y,n]?

This made me realize that after the random.seed(1) call in lino_xl.lib.vat.choicelists, seed() gets called two more times, from tempfile. Ha! No need to dig further! We simply must use our own random generator in lino_xl.lib.vat.choicelists!
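A minimal sketch of such a private generator, assuming the numbers are drawn digit by digit (fictive_digits and _vat_rng are hypothetical names; the real generate_vid may compute the numbers differently). A random.Random instance keeps its own internal state, so nothing that other code does with the module-level functions or with other Random instances can disturb it:

```python
import random

# Private generator for fictive VAT numbers, seeded once with 1.
# Its state is separate from the hidden global instance behind
# random.seed(), random.random() and friends.
_vat_rng = random.Random(1)

def fictive_digits(n):
    """Return n random decimal digits (hypothetical helper, not Lino's generate_vid)."""
    return "".join(str(_vat_rng.randint(0, 9)) for _ in range(n))

# Other code reseeding or drawing from the global generator, or creating
# its own Random instances, does not affect _vat_rng.
random.seed()
random.random()
random.Random()
digits = fictive_digits(9)
```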