Friday, November 20, 2015¶
Duplicate sequence numbers¶
I wrote a little pm run
script which reproduces my problem:
from django.conf import settings
from lino.api import rt
ses = rt.login()
# part 1
settings.SITE.site_config.import_b2c(ses)
# part 2
Account = rt.models.b2c.Account
obj = Account.objects.get(iban='...')
ses.show(rt.models.b2c.StatementsByAccount, obj)
The output of this script, part 1 is:
INFO 371 XML files with 357 new and 14 updated statements have been imported.
And part 2 for a given IBAN:
================= ================ ============ =============== ========== ==========
Sequence number Initial amount Start date Final amount End date Currency
----------------- ---------------- ------------ --------------- ---------- ----------
74 2 441,59 06.11.15 2 361,59 09.11.15 EUR
74 2 361,59 09.11.15 2 251,59 13.11.15 EUR
74 2 361,59 09.11.15 2 251,59 13.11.15 EUR
74 2 251,59 13.11.15 2 181,59 16.11.15 EUR
74 2 251,59 13.11.15 2 181,59 16.11.15 EUR
74 2 181,59 16.11.15 2 154,94 17.11.15 EUR
**444** **13 849,54** **13 382,89**
================= ================ ============ =============== ========== ==========
It is a pity that Hamza cannot work on this (because it requires access to a collection of confidential XML files and it seems difficult to get a legal agreement about this). So I must do it myself.
Tonight I started by tidying up the code. I almost refactored the code
in lino_xl.lib.b2c.camt
and lino_xl.lib.b2c.models
:
I completely removed parserlib and now work without getters and
setters. I thought it just insane to have so many different names for
every data element. I also changed parts of Hamza’s code, trying to
make it more secure against bugs. I also changed some database field
names, e.g. replaced execution_date by booking_date (because the
XML tag is BookgDt). lino_xl.lib.b2c.camt
itself seems
well done, it was mostly that additional layer of field names
introduced by parserlib which disturbed me. It also contained
several hacks (e.g. for isolating the partner name?) which I removed
because I did not understand their necessity and they were not
documented.
My refactoring did not directly bring any change. I had to dive deeper yet. The problem was about how to identify statements which had been downloaded twice. See below. The current version implements it using my suggestion below. Now the output of my script seems more correct:
==================== ================ ============ ============== ========== ==========
Statement number Initial amount Start date Final amount End date Currency
-------------------- ---------------- ------------ -------------- ---------- ----------
2015/0074 2 361,59 09.11.15 2 251,59 13.11.15 EUR
**Total (1 rows)** **2 361,59** **2 251,59**
==================== ================ ============ ============== ========== ==========
Only later I discovered that this parserlib file was part of the bigger picture: it is maintained by the Odoo Community Association, and the original source file imports it directly from openerp. So yes, we moved away from the openerp parserlib because it would have taken me weeks of work to fully understand all those hacks and details which may occur in different banks. Is it worth to undo my refactoring and to integrate back with OpenERP? But it I say in advance that it won’t be easy because we would need to extend that parserlib definition…
Identifying incoming bank statements¶
A question to bank experts:
I am writing an import module for importing CAMT 053 into our database.
There are 2 files dated 20151113 which contain a statement for the
observed account. Both files are almost identical. It is normal to
have such duplicates because these XML files arrive to our incoming
folder by several ways, they are manually downloaded using the bank’s
web interface and authorisation check. Different users may do this
action at different times, and therefore we cannot exclude this to
happen. And now the interesting part: the value of XYZ in
<Stmd><Id>XYZ<Id> is not identical, despite the fact that it’s
documentation is “Unique identification, as assigned by the account
servicer, to unambiguously identify the account statement”. In
COD_20151113_OW1L3FY13K.XML
it is:
<Stmt><Id>153170000121620/000001-000001</Id><ElctrncSeqNb>132</ElctrncSeqNb><LglSeqNb>74</LglSeqNb>
while in COD_20151113_OP3L3FY13K.XML
it is:
<Stmt><Id>153170000120519/000001-000001</Id><ElctrncSeqNb>132</ElctrncSeqNb><LglSeqNb>74</LglSeqNb>
Question: how can I reliably identify individual statements from duplicate XML files which have been downloaded by possibly different users?
My suggestion is to use a combination of the year and the LegalSequenceNumber (the LegalSequenceNumber itself seems to restart every calendar year, I don’t believe that above account had only 74 statements since its creation).
Triaging tickets for Aurélie and Mathieu¶
#607 was possibly just because of a wrong icon_name. I replaced it by a unicode character “south east arrow to corner”, so now it looks like this: ⇲.
#577 was simply because the MIME type for docs files is
longer than 64 characters. I changed the max_length of
lino.mixins.uploadable.Uploadable.mimetype
from 64 to 255.
See also this thread about length of MIME type field.