Why can the waitpid system call only be used with child processes?

The man page wait(2) states that the waitpid system call returns the ECHILD error if the specified process is not a child of the calling process. Why is this? Would waiting on a non-child process create some sort of security issue? Is there a technical reason why implementing waiting on a non-child process would be difficult or impossible?

Asked By: Tanner Swett

||

Because of how waitpid works. On a POSIX system, a signal (SIGCHLD) is delivered to a parent process when one of its child processes dies. At a high level, all waitpid is doing is blocking until a SIGCHLD signal is delivered for the process (or one of the processes) specified. You can’t wait on arbitrary processes, because the SIGCHLD signal would never be delivered for them.
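To see the behavior from the question in practice, here is a minimal sketch. The helper names try_wait and reap_own_child are made up for illustration and are not part of any API. It assumes a POSIX system where SIGCHLD has its default disposition.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: attempt a non-blocking waitpid() on pid.
 * Returns 0 if the call succeeded, otherwise the errno it failed with. */
int try_wait(pid_t pid)
{
    int status;

    if (waitpid(pid, &status, WNOHANG) == -1)
        return errno;   /* ECHILD when pid is not a child of the caller */
    return 0;
}

/* Hypothetical helper: fork a child that exits with exit_code, then reap it.
 * Returns the child's exit status, or -1 if anything went wrong. */
int reap_own_child(int exit_code)
{
    int status;
    pid_t child = fork();

    if (child == -1)
        return -1;
    if (child == 0)
        _exit(exit_code);   /* child: terminate immediately */

    /* Parent: this works because the process really is our child. */
    if (waitpid(child, &status, 0) != child)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling try_wait() with any PID that is not one of your children (your own PID is the simplest case) should return ECHILD, while reap_own_child(7) should return 7.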

Answered By: godlygeek

godlygeek’s answer is good for understanding how the system works, but the question that inevitably follows is:

How to determine if a process has gone away?

A portable way to wait on a process that is not your child (for example, one in another process group or session) is to use kill(), unintuitive as that sounds. You can’t use the wait family of functions because the SIGCHLD signal is never delivered to your process, nor can you get the exit status. kill(), however, can tell you whether a specific process still exists if you pass 0 as the signal: nothing is actually sent, but the existence and permission checks are still performed. The return value boils down to this: 0 means the process exists and would accept signals from your process; -1 with errno set to EPERM means the process exists but you are not permitted to signal it; -1 with errno set to ESRCH means no such process exists, i.e. it has gone away.

Some sample C code that checks once per second to see if an arbitrary process is gone:

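/* Needs <errno.h>, <signal.h> and <unistd.h>; pid is the process to watch. */

/* Signal 0 sends nothing; it only checks existence and permission. */
int res = kill(pid, 0);
while (res == 0 || (res < 0 && errno == EPERM))
{
    /* Still there (possibly owned by another user); check again in a second. */
    sleep(1);

    res = kill(pid, 0);
}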

You can similarly experiment with the kill command:

kill -0 <pid>

That passes pid and 0 to kill(). Many shells provide kill as a built-in, so this check does not need to start a new process, which makes it much cheaper than running an external command such as ps.
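If you want the three kill() outcomes spelled out and a bound on how long you poll, something along these lines might work. This is only a sketch: process_exists and wait_for_exit are hypothetical names, and the one-second interval is carried over from the loop above. It assumes pid is positive, since zero and negative values make kill() address process groups instead of a single process.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if pid exists, 0 if it does not,
 * and -1 if kill() failed for some other reason. */
int process_exists(pid_t pid)
{
    if (kill(pid, 0) == 0)
        return 1;       /* exists and we may signal it */
    if (errno == EPERM)
        return 1;       /* exists, but we lack permission to signal it */
    if (errno == ESRCH)
        return 0;       /* no such process */
    return -1;          /* unexpected error */
}

/* Hypothetical helper: poll once per second until pid is gone or
 * max_seconds have passed.
 * Returns 0 if the process is gone, 1 on timeout, -1 on error. */
int wait_for_exit(pid_t pid, unsigned max_seconds)
{
    unsigned waited;

    for (waited = 0; ; waited++) {
        int alive = process_exists(pid);

        if (alive <= 0)
            return alive;           /* 0: gone, -1: error */
        if (waited >= max_seconds)
            return 1;               /* still running after the deadline */
        sleep(1);
    }
}
```

Two caveats with this approach: a process that has exited but has not yet been reaped by its parent (a zombie) still counts as existing, and PIDs can be reused, so a long poll may end up watching an unrelated process that was given the same PID.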

Answered By: CubicleSoft