cronjob to watch for runaway processes and kill them
I have a runaway Ruby process – I know exactly how I trigger it.
Point is, it got me thinking about runaway processes (CPU usage or memory usage).
- How would one monitor runaway processes with cron? grep / top / ulimit?
- Can one notify the user via the command line if something like this happens?
- What alternatives are there to Monit?
Instead of writing a script yourself you could use the verynice utility. Its main focus is on dynamic process renicing but it also has the option to kill runaway processes and is easily configured.
The more conventional way to do this is to impose hard limits via ulimit; it can even stop a fork bomb. As Marcel Stimberg said, verynice is a similar utility, but it focuses solely on the nice value rather than, say, limiting memory usage, which was part of your question.
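As a rough illustration of the ulimit approach, the sketch below runs a single command under a CPU-time and virtual-memory cap. The function name run_limited and the example values are hypothetical, and it assumes a shell whose ulimit builtin supports -t and -v (bash and dash do; POSIX itself only specifies -f). The limits are set inside a subshell, so the calling shell is unaffected.

```shell
#!/bin/sh
# Sketch only: run_limited is a hypothetical helper, not a standard command.
# Usage: run_limited CPU_SECONDS MEM_KB COMMAND [ARGS...]
# The parenthesised function body runs in a subshell, so the limits apply
# only to COMMAND and its children, not to the caller.
run_limited() (
    cpu_seconds=$1
    mem_kb=$2
    shift 2
    # CPU time in seconds; the kernel signals the process once it is exceeded
    ulimit -t "$cpu_seconds" || exit 125
    # Virtual memory in KiB; allocations beyond this limit fail
    ulimit -v "$mem_kb" || exit 125
    exec "$@"
)

# Example (hypothetical command): at most 60 s of CPU and 1 GiB of memory
#   run_limited 60 1048576 ruby my_script.rb
```

Because the limits are inherited by child processes, the same two ulimit calls could also go into a wrapper script or a login profile if they should apply more broadly.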
Here is a script that looks for all processes that have used more than 3h of CPU time and then kills them. The first awk command filters the processes; here, it keeps those that are not owned by root. We first send each matching process the terminate signal (-TERM) to ask it to exit nicely. If it is still present after 3 seconds, we kill it unconditionally (-KILL).
#!/bin/tcsh
# Get a list of all processes that are not owned by root
set processes = `ps -ef --no-headers | awk '($1 != "root") {print $2}'`

# Iterate over the list and send the TERM signal to long-running processes
foreach process ($processes)
    # Get the CPU time of the current process, formatted as [DD-]hh:mm:ss
    set cputime = `ps -p $process --no-headers -o cputime | tail -n 1`
    # Convert the CPU time to whole hours, counting a leading day field if present
    set cputime_hours = `echo $cputime | awk -F'[-:]' '{ if (NF == 4) print $1*24 + $2; else print $1 + 0 }'`
    # If the CPU time is at least 3 hours, ask the process to exit
    if ($cputime_hours >= 3) then
        kill -TERM $process
    endif
end

# Give them time to exit cleanly ($#processes is the number of PIDs in the list)
if ($#processes > 0) then
    sleep 3
endif

# Kill those that are left
foreach process ($processes)
    # Get the CPU time of the current process (empty if it has already exited)
    set cputime = `ps -p $process --no-headers -o cputime | tail -n 1`
    # Convert the CPU time to whole hours, counting a leading day field if present
    set cputime_hours = `echo $cputime | awk -F'[-:]' '{ if (NF == 4) print $1*24 + $2; else print $1 + 0 }'`
    # If the CPU time is at least 3 hours, kill the process
    if ($cputime_hours >= 3) then
        kill -KILL $process
    endif
end
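For readers who would rather work in sh than tcsh, the two building blocks of the script can be sketched as small functions. This is only an illustration under some assumptions: the names cputime_to_seconds and term_then_kill are made up, ps is assumed to print CPU time as [DD-]hh:mm:ss (as procps does), and the 3-second default grace period simply mirrors the script above. Converting to seconds also allows thresholds finer than whole hours.

```shell
#!/bin/sh
# Sketch only: both helper names are hypothetical.

# Convert a ps cputime value ([DD-]hh:mm:ss) to whole seconds.
# Anything that does not match that shape yields 0.
cputime_to_seconds() {
    printf '%s\n' "$1" | awk -F'[-:]' '
        NF == 4 { s = (($1 * 24 + $2) * 60 + $3) * 60 + $4; printf "%d\n", s; next }
        NF == 3 { s = ($1 * 60 + $2) * 60 + $3; printf "%d\n", s; next }
                { print 0 }'
}

# Send TERM to a PID, wait up to GRACE seconds (default 3) for it to
# disappear, and send KILL only if it is still there afterwards.
term_then_kill() {
    pid=$1
    grace=${2:-3}
    # If TERM cannot be delivered, the process is gone or not ours to signal
    kill -TERM "$pid" 2>/dev/null || return 0
    while [ "$grace" -gt 0 ]; do
        # kill -0 sends no signal; it only checks that the PID can be signalled
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done
    kill -0 "$pid" 2>/dev/null && kill -KILL "$pid" 2>/dev/null
    return 0
}

# Example: terminate PID 12345 if it has used at least 3 hours of CPU time
#   t=$(ps -p 12345 --no-headers -o cputime)
#   [ "$(cputime_to_seconds "$t")" -ge 10800 ] && term_then_kill 12345
```

Checking with kill -0 once per second lets the function return as soon as the process has exited instead of always sleeping for the full grace period.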
Create that file as root, e.g. as /root/kill-old-processes, and make it executable, e.g. with:
chmod 750 /root/kill-old-processes
You can then add it to root's crontab by running (as root):
crontab -e
and adding the following line at the end:
4,9,14,19,24,29,34,39,44,49,54,59 * * * * /root/kill-old-processes >> /var/log/kill-old-processes.log 2>&1
This particular line will run the script every five minutes at the given minutes past each hour, every day.
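The minute list can probably be written more compactly. Cron implementations derived from Vixie cron (including cronie) accept a range with a step, so, assuming one of those is in use, the following entry should fire at the same minutes (4, 9, 14 and so on up to 59):

```
4-59/5 * * * * /root/kill-old-processes >> /var/log/kill-old-processes.log 2>&1
```

If the crontab is rejected when saving, the installed cron does not support this syntax and the explicit list is the safe choice.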
Short note: the script is written for tcsh, so please install that shell if it isn't installed yet.