Why does Linux need both pid_max and threads-max?
I understand the difference between /proc/sys/kernel/pid_max and /proc/sys/kernel/threads-max. There’s a good explanation in the answer to "Understanding the differences between pid_max, ulimit -u and thread_max":
/proc/sys/kernel/pid_max has nothing to do with the maximum number of processes that can be run at any given time. It is, in fact, the maximum numerical PROCESS IDENTIFIER that can be assigned by the kernel.
In the Linux kernel, a process and a thread are one and the same. They’re handled the same way by the kernel. They both occupy a slot in the task_struct data structure. A thread, by common terminology, is in Linux a process that shares resources with another process (they will also share a thread group ID). A thread in the Linux kernel is largely a conceptual construct as far as the scheduler is concerned.
Now that you understand that the kernel largely does not differentiate between a thread and a process, it should make more sense that /proc/sys/kernel/threads-max is actually the maximum number of elements contained in the data structure task_struct. Which is the data structure that contains the list of processes, or as they can be called, tasks.
However, effectively, both limit the maximum number of concurrent threads on a host. This number will be – to my understanding – the minimum of pid_max and threads-max. So why are both needed?
I understand that the default value of pid_max is based on the number of possible CPUs of the machine, while the default of threads-max is derived from the number of memory pages. But since both have the same effect, couldn’t Linux just have one value that would be the minimum of both?
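For reference, both values can be read straight from the two procfs files named above. The following is a minimal Python sketch; the helper names parse_limit and read_limit are made up for illustration, and the final line only prints the question's working assumption (the smaller of the two), which is not something the kernel itself computes.

```python
from pathlib import Path

PROC_KERNEL = Path("/proc/sys/kernel")  # where both sysctl files live on Linux


def parse_limit(text: str) -> int:
    """Parse the contents of a sysctl file such as pid_max into an int."""
    return int(text.strip())


def read_limit(name: str, base: Path = PROC_KERNEL) -> int:
    """Read one integer sysctl (e.g. "pid_max") from the given directory."""
    return parse_limit((base / name).read_text())


if __name__ == "__main__":
    try:
        pid_max = read_limit("pid_max")
        threads_max = read_limit("threads-max")
    except OSError as exc:  # not Linux, or procfs is not mounted
        print(f"could not read limits: {exc}")
    else:
        print(f"pid_max     = {pid_max}")
        print(f"threads-max = {threads_max}")
        # The question's assumption: the smaller value bounds concurrency.
        print(f"smaller of the two = {min(pid_max, threads_max)}")
```

The base parameter exists only so the helper can be pointed at a different directory when experimenting; on a real system the default is the one that matters.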
These settings don’t have the same effect:
threads-max limits the number of processes which can be instantiated simultaneously;
pid_max limits the identifier assigned to processes.
threads-max limits the amount of memory that can end up allocated to task_struct instances. pid_max determines when pids roll around (if ever).
Constraining pid_max doesn’t have an effect on memory consumption (as far as I’m aware, unless lots of pids end up stored as text), and can end up affecting performance since finding a new pid is harder once pid_max has been reached. A lower pid_max also increases the likelihood of pid reuse within a given time period.
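To make the distinction concrete, here is a toy Python model of the two limits. It is only a sketch and not the kernel's allocator: the class ToyTaskTable and its methods are hypothetical, and it assumes pids are handed out in increasing order from 1 up to pid_max - 1 and then wrap around, skipping any pid still in use (the real kernel also reserves low pids after a wrap, which is ignored here).

```python
class ToyTaskTable:
    """Toy model of the two limits; NOT how the kernel allocates pids."""

    def __init__(self, pid_max: int, threads_max: int) -> None:
        self.pid_max = pid_max          # pids are drawn from 1 .. pid_max - 1
        self.threads_max = threads_max  # cap on simultaneously live tasks
        self.live = set()               # pids currently in use
        self.last_pid = 0               # most recently assigned pid

    def spawn(self) -> int:
        # threads-max: refuse to create more live tasks than allowed.
        if len(self.live) >= self.threads_max:
            raise RuntimeError("threads-max reached: too many live tasks")
        # pid_max: scan forward from the last pid, wrapping around.
        for offset in range(1, self.pid_max):
            candidate = (self.last_pid + offset - 1) % (self.pid_max - 1) + 1
            if candidate not in self.live:
                self.live.add(candidate)
                self.last_pid = candidate
                return candidate
        raise RuntimeError("pid space exhausted: every pid is in use")

    def exit(self, pid: int) -> None:
        self.live.remove(pid)


if __name__ == "__main__":
    table = ToyTaskTable(pid_max=5, threads_max=2)
    first, second = table.spawn(), table.spawn()
    print("spawned", first, second)
    try:
        table.spawn()  # a third live task exceeds threads_max
    except RuntimeError as exc:
        print("refused:", exc)
    # Short-lived tasks keep consuming identifiers until they wrap.
    for old in (first, second, 3):
        table.exit(old)
        print("exit", old, "-> new pid", table.spawn())
```

With pid_max=5 and threads_max=2, at most two tasks are ever alive, yet the identifiers still advance through 3 and 4 and then wrap back to 1. That is the sense in which threads-max bounds how many tasks exist at once, while pid_max only bounds how soon identifiers get reused.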