Is low CPU idle caused by bad disks in this case?
Low CPU idle can be caused by a variety of factors, including insufficient RAM or a slow hard disk drive.
Our RHEL server has enough RAM, but dmesg shows a couple of errors about the disk drives.
We suspect the disks, for example sdk and sdc, because dmesg shows errors such as [sdk] tag#0 Add. Sense: Unrecovered read error
Here are the details from the sar command, which shows the CPU idle values:
09:43:56 AM CPU %user %nice %system %iowait %steal %idle
09:44:01 AM all 98.57 0.00 0.62 0.00 0.00 0.80
09:44:06 AM all 98.26 0.00 0.92 0.01 0.00 0.81
09:44:11 AM all 97.29 0.00 1.66 0.01 0.00 1.03
09:44:16 AM all 92.81 0.00 6.06 0.03 0.00 1.10
09:44:21 AM all 92.31 0.00 6.43 0.05 0.00 1.21
Average: all 95.85 0.00 3.14 0.02 0.00 0.99
09:44:21 AM CPU %user %nice %system %iowait %steal %idle
09:44:22 AM all 96.52 0.00 3.10 0.00 0.00 0.38
09:44:22 AM 0 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 1 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 2 100.00 0.00 0.00 0.00 0.00 0.00
09:44:22 AM 3 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 4 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 5 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 6 97.98 0.00 2.02 0.00 0.00 0.00
09:44:22 AM 7 97.98 0.00 2.02 0.00 0.00 0.00
09:44:22 AM 8 98.99 0.00 1.01 0.00 0.00 0.00
09:44:22 AM 9 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 10 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 11 98.02 0.00 0.99 0.00 0.00 0.99
09:44:22 AM 12 97.00 0.00 1.00 0.00 0.00 2.00
09:44:22 AM 13 96.97 0.00 3.03 0.00 0.00 0.00
09:44:22 AM 14 98.02 0.00 0.99 0.00 0.00 0.99
09:44:22 AM 15 94.00 0.00 6.00 0.00 0.00 0.00
09:44:22 AM 16 83.00 0.00 16.00 0.00 0.00 1.00
09:44:22 AM 17 98.00 0.00 1.00 0.00 0.00 1.00
09:44:22 AM 18 96.97 0.00 2.02 0.00 0.00 1.01
09:44:22 AM 19 96.00 0.00 4.00 0.00 0.00 0.00
09:44:22 AM 20 97.98 0.00 1.01 0.00 0.00 1.01
09:44:22 AM 21 95.05 0.00 4.95 0.00 0.00 0.00
09:44:22 AM 22 94.95 0.00 5.05 0.00 0.00 0.00
09:44:22 AM 23 98.99 0.00 1.01 0.00 0.00 0.00
09:44:22 AM 24 98.99 0.00 1.01 0.00 0.00 0.00
09:44:22 AM 25 99.00 0.00 1.00 0.00 0.00 0.00
09:44:22 AM 26 98.99 0.00 1.01 0.00 0.00 0.00
09:44:22 AM 27 98.99 0.00 1.01 0.00 0.00 0.00
09:44:22 AM 28 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 29 98.00 0.00 2.00 0.00 0.00 0.00
09:44:22 AM 30 94.95 0.00 5.05 0.00 0.00 0.00
09:44:22 AM 31 97.03 0.00 1.98 0.00 0.00 0.99
09:44:22 AM 32 98.02 0.00 1.98 0.00 0.00 0.00
09:44:22 AM 33 99.00 0.00 1.00 0.00 0.00 0.00
09:44:22 AM 34 98.00 0.00 1.00 0.00 0.00 1.00
09:44:22 AM 35 97.98 0.00 2.02 0.00 0.00 0.00
09:44:22 AM 36 94.00 0.00 5.00 0.00 0.00 1.00
09:44:22 AM 37 98.02 0.00 0.99 0.00 0.00 0.99
09:44:22 AM 38 97.98 0.00 1.01 0.00 0.00 1.01
09:44:22 AM 39 89.00 0.00 11.00 0.00 0.00 0.00
09:44:22 AM 40 83.00 0.00 13.00 0.00 0.00 4.00
09:44:22 AM 41 97.00 0.00 3.00 0.00 0.00 0.00
09:44:22 AM 42 91.92 0.00 8.08 0.00 0.00 0.00
09:44:22 AM 43 94.06 0.00 5.94 0.00 0.00 0.00
09:44:22 AM 44 92.93 0.00 7.07 0.00 0.00 0.00
09:44:22 AM 45 97.00 0.00 3.00 0.00 0.00 0.00
09:44:22 AM 46 99.00 0.00 1.00 0.00 0.00 0.00
09:44:22 AM 47 98.99 0.00 1.01 0.00 0.00 0.00
sar -B 2 5
09:44:24 AM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
09:44:26 AM 14852.00 71776.00 101443.50 0.00 216420.00 0.00 0.00 0.00 0.00
09:44:28 AM 14336.00 184.00 5123.00 0.00 47167.50 0.00 0.00 0.00 0.00
09:44:30 AM 14418.00 203778.00 67194.50 0.00 132952.50 0.00 0.00 0.00 0.00
09:44:32 AM 14352.00 220796.00 2475.00 0.00 59666.00 0.00 0.00 0.00 0.00
09:44:34 AM 13318.00 56996.00 16290.00 0.00 9599.00 0.00 0.00 0.00 0.00
Average: 14255.20 110706.00 38505.20 0.00 93161.00 0.00 0.00 0.00 0.00
From the vmstat command:
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
65 0 3505188 6265864 4828612 304096576 0 0 137 127 0 0 49 1 50 0 0
63 1 3505188 6068484 4828660 304294848 0 0 12292 41500 95782 88751 98 2 1 0 0
66 0 3505188 5933464 4828672 304429248 0 0 14668 130968 85788 90844 97 2 1 0 0
r: The number of processes waiting for run time.
From the kernel messages we get:
[117426425.532990] blk_update_request: critical medium error, dev sdc, sector 116127985
[117426431.038365] sd 0:0:3:0: [sdc] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[117426431.038374] sd 0:0:3:0: [sdc] tag#0 Sense Key : Medium Error [current] [descriptor]
[117426431.038378] sd 0:0:3:0: [sdc] tag#0 Add. Sense: Unrecovered read error
[117426431.038383] sd 0:0:3:0: [sdc] tag#0 CDB: Read(16) 88 00 00 00 00 00 06 eb f8 f0 00 00 00 08 00 00
[117426431.038386] blk_update_request: critical medium error, dev sdc, sector 116127985
[139602560.596832] traps: polkitd[27641] general protection ip:7f7996318cf2 sp:7ffe7a28e5b0 error:0 in libmozjs-17.0.so[7f79961da000+3b3000]
[144770588.094226] sd 0:0:11:0: [sdk] tag#0 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
[144770588.094238] sd 0:0:11:0: [sdk] tag#0 Sense Key : Medium Error [current] [descriptor]
[144770588.094242] sd 0:0:11:0: [sdk] tag#0 Add. Sense: Unrecovered read error
[144770588.094248] sd 0:0:11:0: [sdk] tag#0 CDB: Read(16) 88 00 00 00 00 00 01 15 20 00 00 00 02 00 00 00
So, based on the above output, does it make sense that the root cause of the very low CPU idle is the disk errors we see in the kernel messages?
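Based on the dmesg timestamps, which count seconds since boot, about 316 days passed between the two disk errors in your logs, so no, they’re not the reason your system isn’t idling. The sar output points the same way: %iowait is essentially zero while %user averages around 96%, so the CPUs are busy running user-space code rather than waiting on I/O.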
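To check the gap between the errors yourself, here is a minimal sketch that subtracts the two dmesg timestamps. The values are copied from the kernel messages in the question; the constant and function names are only illustrative.

```python
# dmesg timestamps are seconds since boot; both values are copied
# from the kernel messages quoted in the question.
SDC_ERROR_TS = 117426431.038386  # last sdc "critical medium error" line
SDK_ERROR_TS = 144770588.094226  # first sdk "FAILED Result" line

SECONDS_PER_DAY = 86_400


def gap_in_days(earlier: float, later: float) -> float:
    """Return the time between two seconds-since-boot stamps, in days."""
    return (later - earlier) / SECONDS_PER_DAY


gap = gap_in_days(SDC_ERROR_TS, SDK_ERROR_TS)
print(f"{gap:.1f} days between the sdc and sdk errors")
```

Two isolated read errors that far apart are worth following up on for the health of the disks, but they cannot explain a CPU that is busy right now.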
As an aside, note that "r: The number of processes waiting for run time." isn’t accurate: in vmstat, the r column shows the number of runnable processes, i.e. the number of processes either running or waiting to run. If you have many logical CPUs then a high number here isn’t a problem.
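As a rough illustration, the r values can be put in proportion to the CPU count. This sketch assumes 48 logical CPUs, since the sar output lists CPUs 0 through 47, and uses the three vmstat samples from the question; the helper name is made up for this example.

```python
# The three vmstat sample lines from the question (headers omitted).
VMSTAT_LINES = """\
65 0 3505188 6265864 4828612 304096576 0 0 137 127 0 0 49 1 50 0 0
63 1 3505188 6068484 4828660 304294848 0 0 12292 41500 95782 88751 98 2 1 0 0
66 0 3505188 5933464 4828672 304429248 0 0 14668 130968 85788 90844 97 2 1 0 0
"""

# Assumption: 48 logical CPUs, because sar reports CPUs 0 through 47.
LOGICAL_CPUS = 48


def runnable_per_cpu(vmstat_text: str, cpus: int) -> list[float]:
    """Divide the r column (first field) of each data line by the CPU count."""
    ratios = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        # Skip blank lines and header lines, which do not start with a number.
        if not fields or not fields[0].isdigit():
            continue
        ratios.append(int(fields[0]) / cpus)
    return ratios


ratios = runnable_per_cpu(VMSTAT_LINES, LOGICAL_CPUS)
for ratio in ratios:
    print(f"{ratio:.2f} runnable processes per logical CPU")
```

Since r counts running processes as well as waiting ones, a ratio a little above 1 means every CPU has work and a modest number of processes are queued behind it, which matches the near-zero %idle in sar.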