systemd does not consistently capture and log stdout

I have a minimal script managed by a oneshot systemd service. Sometimes the stdout echoed by this script appears in the systemd logs. Sometimes it does not. It randomly appears about 50% of the time.

Systemd service:

> cat ~/.config/systemd/user/simple.service
[Unit]
Description=Simple service

[Service]
ExecStart=/bin/sh -c "/home/miles/temp/simple.sh"
Type=oneshot
RemainAfterExit=1

Inner script:

> cat /home/miles/temp/simple.sh
#!/usr/bin/env bash

echo Starting simple
echo exiting
exit 0

I restart the service with:

> systemctl --user restart simple.service

Sometimes the expected script output appears:

> systemctl status --user simple.service
● simple.service - Simple service
     Loaded: loaded (/home/miles/.config/systemd/user/simple.service; static)
     Active: active (exited) since Thu 2023-11-16 19:54:59 PST; 1min 57s ago
    Process: 3617756 ExecStart=/bin/sh -c /home/miles/temp/simple.sh (code=exited, status=0/SUCCESS)
   Main PID: 3617756 (code=exited, status=0/SUCCESS)
        CPU: 4ms

Nov 16 19:54:59 miles-desk systemd[3064]: Starting Simple service...
Nov 16 19:54:59 miles-desk sh[3617757]: Starting simple
Nov 16 19:54:59 miles-desk sh[3617757]: exiting
Nov 16 19:54:59 miles-desk systemd[3064]: Finished Simple service.

And then sometimes when I restart, the script output is missing:

> systemctl status --user simple.service
● simple.service - Simple service
     Loaded: loaded (/home/miles/.config/systemd/user/simple.service; static)
     Active: active (exited) since Thu 2023-11-16 19:58:57 PST; 29s ago
    Process: 3621103 ExecStart=/bin/sh -c /home/miles/temp/simple.sh (code=exited, status=0/SUCCESS)
   Main PID: 3621103 (code=exited, status=0/SUCCESS)
        CPU: 4ms

Nov 16 19:58:57 miles-desk systemd[3064]: Starting Simple service...
Nov 16 19:58:57 miles-desk systemd[3064]: Finished Simple service.

But then after a few more restarts, I’ll see the output again:

● simple.service - Simple service
     Loaded: loaded (/home/miles/.config/systemd/user/simple.service; static)
     Active: active (exited) since Thu 2023-11-16 20:01:53 PST; 6s ago
    Process: 3622119 ExecStart=/bin/sh -c /home/miles/temp/simple.sh (code=exited, status=0/SUCCESS)
   Main PID: 3622119 (code=exited, status=0/SUCCESS)
        CPU: 4ms

Nov 16 20:01:53 miles-desk systemd[3064]: Starting Simple service...
Nov 16 20:01:53 miles-desk sh[3622120]: Starting simple
Nov 16 20:01:53 miles-desk sh[3622120]: exiting
Nov 16 20:01:53 miles-desk systemd[3064]: Finished Simple service.

It seems totally random whether output appears. Perhaps there’s a flushing issue, but I’d expect systemd to flush all output once the service finishes.

Version info:

> systemd --version
systemd 249 (249.11-0ubuntu3.11)
+PAM +AUDIT +SELINUX +APPARMOR +IMA +SMACK +SECCOMP +GCRYPT +GNUTLS +OPENSSL +ACL +BLKID +CURL +ELFUTILS +FIDO2 +IDN2 -IDN +IPTC +KMOD +LIBCRYPTSETUP +LIBFDISK +PCRE2 -PWQUALITY -P11KIT -QRENCODE +BZIP2 +LZ4 +XZ +ZLIB +ZSTD -XKBCOMMON +UTMP +SYSVINIT default-hierarchy=unified

> uname -a
Linux miles-desk 6.2.0-36-generic #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct  9 15:34:04 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
Asked By: MilesF

||

I suspect you’re hitting the race condition documented in journald is unable to attribute messages incoming from processes that exited to their cgroup, due to /proc vs SCM_CREDS race #2913:

It looks like journalctl –user-unit=… is dropping log lines that happen immediately before a user job exits.

where Lennart Poettering commented on the FreeDesktop.org bug:

This is a race. We get the client PID with SCM_CREDENTIALS and then try to read _CMDLINE/_COMM/_EXEC and the cgroupd ata from /proc quickly but at that time the process might already have died.

To fix this properly we need SCM_COMM or so, to get the process cmdline/comm/exe, and SCM_CGROUPS to get the cgroup data in a race free fashion. Until this is added to the kernel this will continue to be racy. These items are listed on the plumbers wishlist, because we need this from the kernel folks.

One workaround that I found from Evgeny Vereshchagin’s answer was:

[Unit]
Description=Simple service

[Service]
SyslogIdentifier=simple-service
ExecStart=/home/miles/temp/simple.sh
Type=oneshot
RemainAfterExit=1

… the important change being the addition of SyslogIdentifier=simple-service, but I also simplified the /bin/sh -c "/home/miles/temp/simple.sh" line. There’s no need to invoke another shell with /bin/sh. Simply ensure that simple.sh is executable; it already has a functional sh-bang line.

The output was actually in the journal before, but wasn’t associated with a unit — only tagged with sh from syslog:

Nov 17 11:58:00 workstation sh[409442]: Starting simple
Nov 17 11:58:00 workstation sh[409442]: exiting

Using SyslogIdentifier is a workaround, because the output is still not associated with the unit, but it’s more easily found in the journal:

$ journalctl --user SYSLOG_IDENTIFIER=simple-service
-- Logs begin at Fri 2023-06-09 08:11:26 EDT, end at Fri 2023-11-17 12:16:10 EST. --
Nov 17 12:16:10 workstation simple-service[409845]: Starting simple
Nov 17 12:16:10 workstation simple-service[409845]: exiting
Answered By: Jeff Schaller
Categories: Answers Tags:
Answers are sorted by their score. The answer accepted by the question owner as the best is marked with
at the top-right corner.