Monday, April 27, 2026
Yesterday I finally found the explanation for #6733 (laudate server not responding and kswapd0 CPU usage at 200%), which has occupied me for more than two weeks. I found the explanation because I had summarized my problem for a Debian consultant. A typical case of “Ask, and it will be given to you; seek, and you will find; knock, and it will be opened to you.” (Mt 7:7). Here is what I told him:
I use Postfix on my servers. Some servers are stand-alone mail servers, others use Zone Media as a smart host. On one of my servers I have the following warning in the journalctl:
apr 25 17:19:21 lumi postfix/postdrop[1000]: warning: mail_queue_enter: create file maildrop/965751.1000: Read-only file system
The first warning comes quickly after rebooting. It comes every 10 seconds (I guess that postdrop is configured to run every 10 seconds). In the beginning, after rebooting, it’s only one warning. After a minute or so there are already 2 of them, and that number continues to grow.
ps shows a number of processes I don’t understand, and this list also grows:
postfix 1869 0.0 0.7 53376 14048 ? S 17:20 0:00 tlsmgr -l -t unix -u -c
root 1877 0.0 0.1 18012 2308 ? Ss 17:20 0:00 sudo supervisorctl status
root 1878 0.0 0.3 43956 7024 ? S 17:20 0:00 sendmail -t
root 1879 0.0 0.3 43820 6996 ? S 17:20 0:00 /usr/sbin/postdrop -r
root 1885 0.0 0.1 18012 2404 ? Ss 17:22 0:00 sudo supervisorctl status
root 1886 0.0 0.3 43956 6976 ? S 17:22 0:00 sendmail -t
root 1887 0.0 0.3 43820 7064 ? S 17:22 0:00 /usr/sbin/postdrop -r
root 1892 0.0 0.0 0 0 ? I 17:24 0:00 [kworker/0:0-events]
root 1900 0.0 0.1 18012 2336 ? Ss 17:24 0:00 sudo supervisorctl status
root 1901 0.0 0.3 43956 7064 ? S 17:24 0:00 sendmail -t
root 1902 0.0 0.3 43820 6880 ? S 17:24 0:00 /usr/sbin/postdrop -r
root 1914 0.0 0.1 18012 2400 ? Ss 17:26 0:00 sudo supervisorctl status
root 1915 0.0 0.3 43956 7036 ? S 17:26 0:00 sendmail -t
root 1916 0.0 0.3 43820 6956 ? S 17:26 0:00 /usr/sbin/postdrop -r
After a few hours of operation, the server collapses, CPU usage is 100%, and the only thing I can do is to reboot.
The next morning I thought: “Every two minutes… Tries to sudo but fails…
Tries to send an email… this sounds like monit”. To verify my suspicion
without too much effort, I removed the /etc/monit/conf.d/lino.conf file
and then rebooted… and the issue was gone! This file is generated by
getlino, so the culprit is getlino! This explains why the problem
reappeared even after rebuilding the server (i.e. creating a completely new VPS
with a virgin Debian) and reinstalling the Lino sites that were hosted on this
server.
A look at the /var/log/monit.log file sheds more light on the issue:
[2026-04-23T12:28:45+0000] info : 'status' status succeeded (0) -- sudo: PERM_SUDOERS: setresuid(-1, 1, -1): Operation not permitted
sudo: unable to open /etc/sudoers: Operation not permitted
sudo: error initializing audit plugin sudoers_audit
Checking supervisor status...
The supervisor status looks OK
[2026-04-23T12:28:45+0000] error : Mail: No mail servers are defined -- please see the 'set mailserver' statement in the manual
[2026-04-23T12:28:45+0000] error : Event queue is full
[2026-04-23T12:28:45+0000] error : Aborting event - queue over quota
[2026-04-23T12:28:46+0000] error : 'lumi' mem usage of 100.0% matches resource limit [mem usage > 75.0%]
[2026-04-23T12:28:46+0000] error : Mail: No mail servers are defined -- please see the 'set mailserver' statement in the manual
[2026-04-23T12:28:47+0000] error : Event queue is full
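A quick way to see which messages dominate a log like this one is to tally them by level and text. The following is only a minimal sketch, assuming the `[timestamp] level : message` layout visible in the excerpt above; the function name `tally_monit_log` is my own invention and not part of monit or getlino.

```python
import re
from collections import Counter

# Assumed line layout, taken from the excerpt above:
#   [2026-04-23T12:28:45+0000] error : Event queue is full
LINE_RE = re.compile(r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s*:\s*(?P<msg>.*)$")


def tally_monit_log(lines):
    """Count (level, message) pairs.

    Lines without a leading [timestamp] (for example the output of the
    checked program) are skipped.
    """
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.rstrip("\n"))
        if match:
            counts[(match.group("level"), match.group("msg"))] += 1
    return counts


# A few lines copied from the excerpt above, used as sample input.
sample = [
    "[2026-04-23T12:28:45+0000] error : Event queue is full",
    "[2026-04-23T12:28:45+0000] error : Aborting event - queue over quota",
    "Checking supervisor status...",
    "[2026-04-23T12:28:47+0000] error : Event queue is full",
]

for (level, msg), n in tally_monit_log(sample).most_common():
    print(n, level, msg)
```

Passing an open file object for /var/log/monit.log instead of `sample` would give the same tally for the whole log, since the function only iterates over lines.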
Here is the content of the /etc/monit/conf.d/lino.conf file:
# generated by getlino
set alert root@localhost with reminder on 2 cycles
check program status with path /usr/local/bin/healthcheck.sh
    if status != 0 then alert
check device ROOT with path /
    if SPACE usage > 95% then alert
check system $HOST
    if memory usage > 75% for 5 cycles then alert
Yes, something is going wrong when monit tries to execute “check device ROOT with path / if SPACE usage > 95% then alert”. It’s some silly permission problem. But what exactly? And why did this wreak such havoc? And how can I avoid it in the future?
Observations:
- We don’t want getlino to touch /etc/monit/monitrc. We assume monit to be correctly installed.
- We don’t want getlino to touch
I restored the /etc/monit/conf.d/lino.conf file except for the line “set alert root@localhost with reminder on 2 cycles” and rebooted. The issue reappeared, so this line does not seem to be the cause.
“cron wakes up every minute, examining all stored crontabs, checking each command to see if it should be run in the current minute. When executing commands, any output is mailed to the owner of the crontab (or to the user named in the MAILTO environment variable in the crontab, if such exists) from the owner of the crontab (or from the email address given in the MAILFROM environment variable in the crontab, if such exists). The children copies of cron running these processes have their name coerced to uppercase, as will be seen in the syslog and ps(1) output.”
In /etc/aliases I have a single entry:
luc@lumi:~$ cat /etc/aliases
root: luc
And I have a ~/.forward file with my real email address.
But when I send an email to root@localhost, the smart host thinks it is spam:
$ echo "Test" | mail -s "Test 2" root@localhost
Here are the relevant log entries:
apr 27 02:11:37 lumi postfix/pickup[1925]: 8BD7448423: uid=1000 from=<luc@lumi>
apr 27 02:11:37 lumi postfix/cleanup[1932]: 8BD7448423:
message-id=<20260427021137.8BD7448423@lumi>
apr 27 02:11:37 lumi postfix/qmgr[1926]: 8BD7448423: from=<luc@lumi>,
size=302, nrcpt=1 (queue active)
apr 27 02:11:38 lumi postfix/smtp[1934]: 8BD7448423: to=<root@localhost>,
relay=smtp.zone.eu[85.234.244.110]:587, delay=0.55,
delays=0.02/0.03/0.35/0.16, dsn=5.0.0, status=bounced (host
smtp.zone.eu[85.234.244.110] said: 550 This message was classified as SPAM and
may not be delivered (in reply to end of DATA command))
apr 27 02:11:38 lumi postfix/cleanup[1932]: 1EECA48424:
message-id=<20260427021138.1EECA48424@lumi>
apr 27 02:11:38 lumi postfix/bounce[1936]: 8BD7448423: sender non-delivery
notification: 1EECA48424
apr 27 02:11:38 lumi postfix/qmgr[1926]: 1EECA48424: from=<>, size=2207,
nrcpt=1 (queue active)
apr 27 02:11:38 lumi postfix/qmgr[1926]: 8BD7448423: removed
apr 27 02:11:38 lumi postfix/smtp[1934]: 1EECA48424: to=<luc@lumi>,
relay=smtp.zone.eu[85.234.244.79]:587, delay=0.67, delays=0/0/0.36/0.31,
dsn=2.0.0, status=sent (250 Message queued as 19dccb4dd94000f038)
apr 27 02:11:38 lumi postfix/qmgr[1926]: 1EECA48424: removed
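To pick the bounces out of a longer journal, the delivery records can be parsed by script. This is a minimal sketch, assuming that each journal entry is a single line (the excerpt above is wrapped for display) and that delivery records carry `to=<...>`, `dsn=` and `status=` fields as shown. The names `parse_delivery` and `bounced` are hypothetical helpers, not part of Postfix.

```python
import re

# Assumed shape of a delivery record, based on the excerpt above:
#   ... postfix/smtp[PID]: QUEUEID: to=<rcpt>, ..., dsn=X.Y.Z, status=WORD (...)
DELIVERY_RE = re.compile(
    r"postfix/[\w-]+\[\d+\]: (?P<queue_id>[0-9A-Za-z]+): "
    r"to=<(?P<rcpt>[^>]*)>.*?dsn=(?P<dsn>\d+\.\d+\.\d+), "
    r"status=(?P<status>\w+)"
)


def parse_delivery(line):
    """Return queue_id, rcpt, dsn and status of a delivery record, else None."""
    match = DELIVERY_RE.search(line)
    return match.groupdict() if match else None


def bounced(lines):
    """Yield the parsed delivery records whose status is 'bounced'."""
    for line in lines:
        record = parse_delivery(line)
        if record and record["status"] == "bounced":
            yield record
```

Applied to the entries above, only the record for queue ID 8BD7448423 (to root@localhost, dsn 5.0.0) would be reported, because the bounce notification 1EECA48424 itself was delivered with status=sent.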
I made some adjustments in the /etc/postfix/main.cf