Monday, April 27, 2026

Yesterday I finally found the explanation for #6733 (laudate server not responding and kswapd0 CPU usage at 200%), which had occupied me for more than two weeks. I found the explanation because I had summarized my problem for a Debian consultant. A typical case of “Ask, and it will be given to you; seek, and you will find; knock, and it will be opened to you.” (Mt 7:7). Here is what I told him:

I use Postfix on my servers. Some servers are stand-alone mail servers, others use Zone Media as a smart host. On one of my servers I get the following warning in the journalctl output:

apr   25 17:19:21 lumi postfix/postdrop[1000]: warning: mail_queue_enter: create file maildrop/965751.1000: Read-only file system

The first warning comes quickly after rebooting. It then repeats every 10 seconds (I guess that postdrop is configured to run every 10 seconds). In the beginning, after rebooting, it’s only one warning. After a minute or so there are already 2 of them, and that number continues to grow. ps shows a number of processes I don’t understand, and this list also grows:

postfix     1869  0.0  0.7  53376 14048 ?        S    17:20   0:00 tlsmgr -l -t unix -u -c
root        1877  0.0  0.1  18012  2308 ?        Ss   17:20   0:00 sudo supervisorctl status
root        1878  0.0  0.3  43956  7024 ?        S    17:20   0:00 sendmail -t
root        1879  0.0  0.3  43820  6996 ?        S    17:20   0:00 /usr/sbin/postdrop -r
root        1885  0.0  0.1  18012  2404 ?        Ss   17:22   0:00 sudo supervisorctl status
root        1886  0.0  0.3  43956  6976 ?        S    17:22   0:00 sendmail -t
root        1887  0.0  0.3  43820  7064 ?        S    17:22   0:00 /usr/sbin/postdrop -r
root        1892  0.0  0.0      0     0 ?        I    17:24   0:00 [kworker/0:0-events]
root        1900  0.0  0.1  18012  2336 ?        Ss   17:24   0:00 sudo supervisorctl status
root        1901  0.0  0.3  43956  7064 ?        S    17:24   0:00 sendmail -t
root        1902  0.0  0.3  43820  6880 ?        S    17:24   0:00 /usr/sbin/postdrop -r
root        1914  0.0  0.1  18012  2400 ?        Ss   17:26   0:00 sudo supervisorctl status
root        1915  0.0  0.3  43956  7036 ?        S    17:26   0:00 sendmail -t
root        1916  0.0  0.3  43820  6956 ?        S    17:26   0:00 /usr/sbin/postdrop -r

After a few hours of operation, the server collapses, CPU usage is 100%, and the only thing I can do is to reboot.
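The growth of that process list can be watched without reading the ps output by eye. The following is only a minimal sketch, assuming the `ps aux` column layout shown above; the function name `count_watched` and the `WATCHED` tuple are hypothetical names introduced for illustration.

```python
from collections import Counter

# Command prefixes to watch for; taken from the ps listing above.
WATCHED = ("sudo supervisorctl status", "sendmail -t", "/usr/sbin/postdrop -r")


def count_watched(ps_output):
    """Count processes in `ps aux`-style output whose command starts with a watched prefix."""
    counts = Counter()
    for line in ps_output.splitlines():
        # `ps aux` prints 10 fixed columns; the 11th field is the full command line.
        fields = line.split(None, 10)
        if len(fields) < 11:
            continue
        command = fields[10]
        for prefix in WATCHED:
            if command.startswith(prefix):
                counts[prefix] += 1
    return counts
```

Feeding it the text of `ps aux` every few minutes and comparing the counts should show whether the three commands pile up in lockstep, as they do in the listing above.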

The next morning I thought: “Every two minutes… Tries to sudo but fails… Tries to send an email… this sounds like monit”. To verify my suspicion without too much effort, I removed the /etc/monit/conf.d/lino.conf file and then rebooted… and the issue was gone! This file is generated by getlino, so the culprit is getlino! This explains why the problem reappeared even after rebuilding the server (i.e. creating a completely new VPS with a virgin Debian) and reinstalling the Lino sites that were hosted on this server.

A look at the /var/log/monit.log file sheds more light on the issue:

[2026-04-23T12:28:45+0000] info     : 'status' status succeeded (0) -- sudo: PERM_SUDOERS: setresuid(-1, 1, -1): Operation not permitted
sudo: unable to open /etc/sudoers: Operation not permitted
sudo: error initializing audit plugin sudoers_audit
Checking supervisor status...
The supervisor status looks OK
[2026-04-23T12:28:45+0000] error    : Mail: No mail servers are defined -- please see the 'set mailserver' statement in the manual
[2026-04-23T12:28:45+0000] error    : Event queue is full
[2026-04-23T12:28:45+0000] error    : Aborting event - queue over quota
[2026-04-23T12:28:46+0000] error    : 'lumi' mem usage of 100.0% matches resource limit [mem usage > 75.0%]
[2026-04-23T12:28:46+0000] error    : Mail: No mail servers are defined -- please see the 'set mailserver' statement in the manual
[2026-04-23T12:28:47+0000] error    : Event queue is full
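To see which errors dominate such a log, the entries can be tallied by message. This is a rough sketch, assuming every monit entry starts with a bracketed timestamp followed by a level and a colon, as in the excerpt above; `LINE_RE` and `tally_errors` are hypothetical names. Lines that do not match, such as the sudo output, are skipped.

```python
import re
from collections import Counter

# Matches lines like "[2026-04-23T12:28:45+0000] error    : Event queue is full".
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s*:\s*(?P<msg>.*)$")


def tally_errors(log_text):
    """Count how often each distinct error message occurs in monit log text."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        # Continuation lines (e.g. the sudo output) do not match and are ignored.
        if match and match.group("level") == "error":
            counts[match.group("msg")] += 1
    return counts
```

Calling `tally_errors(open("/var/log/monit.log").read()).most_common(5)` would list the five most frequent error messages.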

Here is the content of the /etc/monit/conf.d/lino.conf file:

# generated by getlino
set alert root@localhost with reminder on 2 cycles
check program status with path /usr/local/bin/healthcheck.sh
    if status != 0 then alert
check device ROOT  with path /
    if SPACE usage > 95% then alert
check system $HOST
    if memory usage > 75% for 5 cycles then alert

Yes, something is going wrong when monit tries to execute “check device ROOT with path / if SPACE usage > 95% then alert”. It’s some silly permission problem. But what exactly? And why did this wreak such havoc? And how to avoid it in the future?

Observations:

  • We don’t want getlino to touch /etc/monit/monitrc. We assume monit to be correctly installed.

  • I restored the /etc/monit/conf.d/lino.conf file except for the line “set alert root@localhost with reminder on 2 cycles” and rebooted. The issue reappeared, so this line does not seem to be the cause.

  • From the cron(8) man page: “cron wakes up every minute, examining all stored crontabs, checking each command to see if it should be run in the current minute. When executing commands, any output is mailed to the owner of the crontab (or to the user named in the MAILTO environment variable in the crontab, if such exists) from the owner of the crontab (or from the email address given in the MAILFROM environment variable in the crontab, if such exists). The children copies of cron running these processes have their name coerced to uppercase, as will be seen in the syslog and ps(1) output.”

  • In /etc/aliases I have a single entry:

    luc@lumi:~$ cat /etc/aliases
    root: luc
    

And I have a ~/.forward file with my real email address.

But when I send an email to root@localhost, the smart host thinks it is spam:

$ echo "Test" | mail -s "Test 2" root@localhost

Here are the relevant log entries:

apr   27 02:11:37 lumi postfix/pickup[1925]: 8BD7448423: uid=1000 from=<luc@lumi>
apr   27 02:11:37 lumi postfix/cleanup[1932]: 8BD7448423:
        message-id=<20260427021137.8BD7448423@lumi>
apr   27 02:11:37 lumi postfix/qmgr[1926]: 8BD7448423: from=<luc@lumi>,
        size=302, nrcpt=1 (queue active)
apr   27 02:11:38 lumi postfix/smtp[1934]: 8BD7448423: to=<root@localhost>,
        relay=smtp.zone.eu[85.234.244.110]:587, delay=0.55,
        delays=0.02/0.03/0.35/0.16, dsn=5.0.0, status=bounced (host
        smtp.zone.eu[85.234.244.110] said: 550 This message was classified as SPAM and
        may not be delivered (in reply to end of DATA command))
apr   27 02:11:38 lumi postfix/cleanup[1932]: 1EECA48424:
        message-id=<20260427021138.1EECA48424@lumi>
apr   27 02:11:38 lumi postfix/bounce[1936]: 8BD7448423: sender non-delivery
        notification: 1EECA48424
apr   27 02:11:38 lumi postfix/qmgr[1926]: 1EECA48424: from=<>, size=2207,
        nrcpt=1 (queue active)
apr   27 02:11:38 lumi postfix/qmgr[1926]: 8BD7448423: removed
apr   27 02:11:38 lumi postfix/smtp[1934]: 1EECA48424: to=<luc@lumi>,
        relay=smtp.zone.eu[85.234.244.79]:587, delay=0.67, delays=0/0/0.36/0.31,
        dsn=2.0.0, status=sent (250 Message queued as 19dccb4dd94000f038)
apr   27 02:11:38 lumi postfix/qmgr[1926]: 1EECA48424: removed
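Bounces like the one above are easy to miss among the other log entries. The following is a minimal sketch for pulling them out of journal text, assuming the wrapped layout shown above (continuation lines start with whitespace) and short hexadecimal queue IDs; `unwrap`, `bounced` and `ENTRY_RE` are hypothetical names.

```python
import re

# Matches an unwrapped postfix/smtp delivery entry and captures its key fields.
ENTRY_RE = re.compile(
    r"postfix/smtp\[\d+\]: (?P<qid>[0-9A-F]+): to=<(?P<to>[^>]*)>.*?"
    r"dsn=(?P<dsn>[\d.]+), status=(?P<status>\w+) \((?P<reason>.*)\)$"
)


def unwrap(log_text):
    """Join continuation lines (those starting with whitespace) onto the previous entry."""
    entries = []
    for line in log_text.splitlines():
        if not line.strip():
            continue
        if line[0].isspace() and entries:
            entries[-1] += " " + line.strip()
        else:
            entries.append(line.rstrip())
    return entries


def bounced(log_text):
    """Return (queue_id, recipient, dsn, reason) for every bounced delivery."""
    hits = []
    for entry in unwrap(log_text):
        match = ENTRY_RE.search(entry)
        if match and match.group("status") == "bounced":
            hits.append((match.group("qid"), match.group("to"),
                         match.group("dsn"), match.group("reason")))
    return hits
```

Run against the entries above, it should report only the message to root@localhost, not the non-delivery notification that was sent successfully to luc@lumi.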

I made some adjustments in /etc/postfix/main.cf