How should Load Average be calculated on a CPU with Efficiency Cores?
I recently received a MacBook Pro with an M1 Pro CPU, which has 2 "efficiency" cores and 8 performance cores. When I run htop/btop/top I get a load average of >2 because the process scheduler always assigns certain lower-demand processes to the efficiency cores, which results in those cores always running at 60 to 100% capacity.
I feel like the 2 efficiency cores reduce the utility of the load average metric, which was already reduced due to multiple cores. Back in the dim, distant past we had single-core CPUs on which the load average made intuitive sense. However, now we have two types of CPU core in a single system, and my most recent phone has three different types of core: efficiency, performance, and a single ultra-performance core.
How should such a new load average be calculated? Are there any ongoing efforts to redefine a general system-load metric?
Since efficiency cores are made to run low-priority processes, perhaps it makes sense to exclude them from the default metric? Then divide the remaining load value among the non-efficiency CPUs.
For instance, take a load average of 3.4. Subtract 2 for the efficiency cores, leaving 1.4. Then divide by the number of performance cores: 1.4 / 8 = 0.175.
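To make that concrete, here is a rough Python sketch of the idea (the core counts are hard-coded for this machine, and normalized_load is just a name I made up; os.getloadavg() reads the standard 1/5/15-minute values):

```python
import os

# Hard-coded for this machine: M1 Pro with 2 efficiency + 8 performance cores.
EFFICIENCY_CORES = 2
PERFORMANCE_CORES = 8

def normalized_load():
    """Proposed metric: subtract the efficiency cores, then express the
    remainder as a fraction of the performance cores."""
    return tuple(max(avg - EFFICIENCY_CORES, 0.0) / PERFORMANCE_CORES
                 for avg in os.getloadavg())

# A 1-minute load average of 3.4 becomes (3.4 - 2) / 8 = 0.175.
print(normalized_load())
```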
First, we should talk about what "load" actually is: it’s the number of running processes, plus the number of processes that are currently in what is called "uninterruptible sleep", i.e. mostly those waiting for data from peripheral IO.
This means that if you have a 16-core machine (no matter how "strong" or "weak" the individual cores are) and you run 100 processes that each hammer IO (say, by touching pages of memory-mapped files, or by being paged in from swap as needed), you can get a load of 100. So this load average is not primarily a function of how busy your CPU cores are – it tries to be a rough measure of how loaded your system as a whole is.
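On Linux you can watch the quantity that actually gets averaged: the count of tasks that are runnable (state R) or in uninterruptible sleep (state D). A minimal sketch, assuming a Linux /proc filesystem (it counts processes rather than every kernel thread, which is close enough to show the idea):

```python
import glob

def instantaneous_load():
    """Count processes in state R (runnable) or D (uninterruptible sleep),
    i.e. the instantaneous quantity the load average smooths over time."""
    count = 0
    for stat in glob.glob('/proc/[0-9]*/stat'):
        try:
            with open(stat) as f:
                # The state letter is the first field after the
                # parenthesised command name.
                state = f.read().rsplit(')', 1)[1].split()[0]
        except OSError:
            continue  # the process exited while we were reading
        if state in ('R', 'D'):
            count += 1
    return count

print(instantaneous_load())
```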
It’s not really useful to compare loads across different machines. It does make sense to compare them on the same machine – if your smoothed load decreases, you are catching up on work; if it increases, work is coming in faster than you can serve it. ("Average" is strictly speaking the wrong word: if you have an electrical-engineering background, these are exponentially weighted moving averages, i.e. the outputs of a simple single-pole IIR filter.) But you can’t put two different machines next to each other and expect their loads to be sensibly comparable – and as such, redefining load to be more universally useful across different CPU architectures is kind of a moot endeavour. It was never meant to be that.
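For completeness, that filter is easy to write down: roughly every sampling interval, the kernel blends the instantaneous task count into the previous value with a fixed decay factor. A minimal sketch with a 5-second sample interval and a 1-minute time constant, as Linux uses for the first of the three numbers (update_load is just an illustrative name):

```python
import math

SAMPLE_INTERVAL = 5.0    # seconds between samples (Linux samples roughly every 5 s)
TIME_CONSTANT = 60.0     # the "1-minute" figure; 300 and 900 give the other two

DECAY = math.exp(-SAMPLE_INTERVAL / TIME_CONSTANT)

def update_load(previous, active_tasks):
    """One step of the exponentially weighted moving average
    (equivalently: one step of a single-pole IIR low-pass filter)."""
    return previous * DECAY + active_tasks * (1.0 - DECAY)

# A steady 4 runnable tasks pulls the smoothed value towards 4.
load = 0.0
for _ in range(24):      # 24 samples * 5 s = 2 minutes
    load = update_load(load, 4)
print(round(load, 2))    # ≈ 3.46, still converging towards 4.0
```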
I feel like the 2 efficiency cores reduce the utility of the load average metric
Now, knowing you asked the question assuming that "load" is "CPU load" alone (i.e. the share of CPUs not idling at any given moment), we’ll have to address that instead of the Linux "load" measure.
That CPU utilization was already nearly useless in the past, ever since CPU cores became dynamically frequency-scaled; that hit consumer computers (laptops) around 2000 with AMD’s PowerNow!. 50% busy time on a core running at 1.2 GHz is simply not worth the same as 50% on a core running at 3.9 GHz; both can exist in the same system at the same time, and both can change frequencies within the window over which the long-term averages are calculated – and they will do so exactly when the CPU utilization crosses thresholds and there are enough tasks in a ready state. So, yeah, no. I honestly treat "CPU utilization" as nothing more than "if it’s below maybe 90%, my system can definitely do more CPU work; beyond that, it’s hard to tell and depends on the work".
So, frankly, it hasn’t been an overly useful metric for the last ~20 years or so.
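If you want a feel for what a utilization percentage is actually worth on a given machine at a given moment, you at least have to look at the current clocks as well – on Linux, for example, via the cpufreq sysfs files (the exact paths and their availability depend on the cpufreq driver; this is only a rough sketch):

```python
import glob

def current_core_clocks_khz():
    """Read each core's current clock from cpufreq sysfs (Linux).
    A "50% busy" core at 1,200,000 kHz does far less work than a
    "50% busy" core boosting to 3,900,000 kHz."""
    clocks = {}
    for path in glob.glob('/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_cur_freq'):
        cpu = path.split('/')[5]          # e.g. 'cpu3'
        with open(path) as f:
            clocks[cpu] = int(f.read())   # value is in kHz
    return clocks

print(current_core_clocks_khz())
```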
Back in the dim, distant past we had single-core CPUs on which the load average made intuitive sense.
Frequency scaling was already a thing back then, and "the CPU is waiting for storage or network 90% of the time" still looks like low utilization, but really means "of the workload I’m trying to serve, I’m at 100% saturation". There’s very limited intuition to be had there – since the early 1990s at the latest, CPUs, RAM and peripheral buses have been decoupled, which made measuring the time the CPU was not idle a rather questionable way of describing system load. All it tells you is whether one of the many limited resources in your computer is exhausted, and the percentage is nowhere near proportional to the actual ability to do more work, because a computer at 0.1·(core count) CPU utilization is probably not running its cores at high clock rates, so under higher load the computer suddenly becomes more capable. The same thing, only more complex and harder to predict, happens when the computer has heterogeneous CPU cores (and that’s not a new invention – hyperthreading already brought that to consumer computers around 15 years ago, and NUMA machines were a thing long before that).
How should such a new load average be calculated?
Frankly: not at all. Stay with the current metric! The current load metric serves one purpose: making the workload situation on the same machine comparable over time. And it continues to do that; sensible cross-platform comparisons were never really possible anyway.
Are there any ongoing efforts to redefine a general system-load metric?
People who actually care about how utilized their systems are, so they can dimension them correctly for their workloads (and not just, as for a laptop, whether the system they have can do what it should do fast enough), do the opposite of boiling system utilization down to a workload-independent single number: they look at what the bottleneck(s) are for that particular workload (or for the necessary range of workloads), and then assign the necessary additional resources – and "CPU time" is just one of those resources.