What is the rationale for the change of syscall calling convention in new Linuxes?

Quoting from https://www.kernel.org/doc/Documentation/process/adding-syscalls.rst:

At least on 64-bit x86, it will be a hard requirement from v4.17
onwards to not call system call functions in the kernel. It uses a
different calling convention for system calls where struct pt_regs
is decoded on-the-fly in a syscall wrapper which then hands processing
over to the actual syscall function. This means that only those
parameters which are actually needed for a specific syscall are passed
on during syscall entry, instead of filling in six CPU registers with
random user space content all the time (which may cause serious
trouble down the call chain).

What serious trouble down the call chain is the last parenthesized clause referring to?

To me it seems stupid not to load the six registers in the generic lead-up to the syscall. Forcing each syscall wrapper to do it makes the wrappers larger and turns the syscall functions into a new special case. So I'm wondering what the "serious trouble" is with having unintended user content in unused argument registers.

Asked By: Petr Skocik


One of the concerns wasn't so much the arbitrary register values themselves, but the fact that they get copied to the kernel stack. Unused registers can thus be used to write arbitrary caller-controlled values to the stack, with no checks.

These values on the stack could potentially be used in a more complex attack. That’s why removing this possibility seemed like a good idea.

Kees Cook's 4.17 summary also mentions the possible influence of these register values on speculative execution.

Answered By: Stephen Kitt