When should I not kill -9 a process?
I am always very hesitant to run kill -9, but I see other admins do it almost routinely.
I figure there is probably a sensible middle ground, so:
- When and why should kill -9 be used? When and why not?
- What should be tried before doing it?
- What kind of debugging of a “hung” process could cause further problems?
Generally, you should use kill (short for kill -s TERM, or on most systems kill -15) before kill -9 (kill -s KILL) to give the target process a chance to clean up after itself. (Processes can’t catch or ignore SIGKILL, but they can and often do catch SIGTERM.) If you don’t give the process a chance to finish what it’s doing and clean up, it may leave corrupted files (or other state) around that it won’t be able to understand once restarted.
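To see why the gentler signal matters, here is a minimal sketch of a process that catches SIGTERM and tidies up after itself. The function name run_worker and the scratch file are made up for illustration; the point is only that the handler gets a chance to run on kill or kill -15, and never on kill -9.

```shell
# Hypothetical worker: creates a scratch file and removes it when asked to terminate.
run_worker() {
    scratch=$1
    : > "$scratch"                        # pretend this is in-progress state
    trap 'rm -f "$scratch"; exit 0' TERM  # runs on kill / kill -15, never on kill -9
    while :; do
        sleep 1 &                         # sleep in the background and wait on it,
        wait $!                           # so the trap fires promptly
    done
}
```

Started as run_worker /tmp/scratch.$$ & and then sent a plain kill $!, the worker removes its scratch file before exiting; sent kill -9 $! instead, the file is left behind.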
strace/truss, ltrace and gdb are generally good ideas for looking at why a stuck process is stuck. (truss -u on Solaris is particularly helpful; I find ltrace too often presents arguments to library calls in an unusable format.) Solaris also has useful /proc-based tools, some of which have been ported to Linux. (pstack is often helpful.)
Never, ever do a kill -9 1. Also avoid doing a kill on certain processes like mount. When I have to kill a lot of processes (say, for example, an X session gets hung and I have to kill all the processes of a certain user), I reverse the order of the processes. For example:
ps -ef | remove all processes not matching a certain criteria | awk '{print $2}' | ruby -e 'STDIN.readlines.reverse.each { |a| puts "kill -9 #{a}" }' | bash
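The same reversal can be done without Ruby. The sketch below is my own variation, not the pipeline above: it only prints the commands so you can inspect them before piping them to a shell, and it defaults to TERM rather than KILL as a first attempt. The function name reverse_kill_commands and its optional signal argument are assumptions for illustration.

```shell
# Read PIDs (one per line) on stdin and print kill commands in reverse order.
# Usage: ... | reverse_kill_commands [SIGNAL]   (SIGNAL defaults to TERM)
reverse_kill_commands() {
    sig=${1:-TERM}
    awk -v sig="$sig" '
        $1 ~ /^[0-9]+$/ { pids[++n] = $1 }   # skip headers and anything not a PID
        END { for (i = n; i >= 1; i--) print "kill -s " sig " " pids[i] }
    '
}
```

For example, ps -u someuser -o pid= | reverse_kill_commands shows what would be sent; append | sh once the list looks right.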
Keep in mind that kill does not itself stop a process and release its resources. All it does is send a signal (here SIGKILL) to the process; you could still wind up with a process that’s hung.
Randal Schwartz used to frequently post “Useless use of (x)” on lists. One such post was about kill -9. It includes reasons and a recipe to follow. Here is a reconstructed version (quoted below).
(Quote abomination)
No no no. Don’t use kill -9.
It doesn’t give the process a chance to cleanly:
1) shut down socket connections
2) clean up temp files
3) inform its children that it is going away
4) reset its terminal characteristics
and so on and so on and so on.
Generally, send 15, and wait a second or two, and if that doesn’t work, send 2, and if that doesn’t work, send 1. If that doesn’t, REMOVE THE BINARY because the program is badly behaved!
Don’t use kill -9. Don’t bring out the combine harvester just to tidy up the flower pot.
Just another Useless Use of Usenet,
(.signature)
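The recipe in the quote (15, then 2, then 1, with a pause in between) can be sketched as a small function. This is only an illustration of the quoted order; the name polite_kill and the pause argument are mine, and it deliberately stops short of both kill -9 and removing the binary.

```shell
# Send TERM, then INT, then HUP, pausing between attempts (the order given in the quote).
# Returns 0 as soon as the process is gone, 1 if it survived all three.
# Usage: polite_kill PID [SECONDS_TO_WAIT]
polite_kill() {
    pid=$1
    pause=${2:-2}
    for sig in TERM INT HUP; do
        # kill fails if the PID is gone (or is not ours to signal)
        kill -s "$sig" "$pid" 2>/dev/null || return 0
        sleep "$pause"
        # signal 0 only tests for existence; note it still succeeds for an unreaped zombie
        kill -0 "$pid" 2>/dev/null || return 0
    done
    return 1
}
```

A return value of 1 means the process ignored all three signals, which is the point at which the quote would have you question the program rather than reach for -9.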
From a programmer’s point of view, it should always be OK to do kill -9, just like it should always be OK to shut down by pulling the power cable. It may be anti-social, and leave some recovery to do, but it ought to work, and is a power tool for the impatient.
I say this as someone who will try plain kill (15) first, because it does give a program a chance to do some cleanup, perhaps just writing "exiting on sig 15" to a log. But I won’t accept any complaint about ill-behaviour on a kill -9.
The reason:
- You cannot prevent customers from doing silly things.
- Random kill -9 testing is a good and fair test scenario.
- If your system doesn’t handle it, your system is broken.
However, not all the software we use is ideal.
Furthermore, whenever you use kill -9 there is a risk of losing data, regardless of how robust the code is.
Not mentioned in all the other answers is a case where kill -9 doesn’t work at all: when a process is <defunct> and cannot be killed:
How can I kill a <defunct> process whose parent is init?
What is defunct for a process and why it doesn’t get killed?
So before you attempt to kill -9 a <defunct> process, run ps -ef to see what its parent is, and attempt -15 (TERM) or -2 (INT) and lastly -9 (KILL) on its parent.
Note: ps -ef lists every process in full format, including the PPID (parent PID) column.
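Rather than reading the parent off the ps -ef listing by eye, the lookup can be scripted. The helpers below are a rough sketch: the names parent_of and list_zombies are invented here, and the stat output column is assumed to be available (it is on Linux procps and the BSDs, but it is not in POSIX).

```shell
# Print the parent PID of a process (empty output if the PID does not exist).
parent_of() {
    ps -o ppid= -p "$1" | tr -d ' '
}

# List zombie processes as "PID PPID COMMAND"; their STAT column starts with Z.
list_zombies() {
    ps -e -o pid= -o ppid= -o stat= -o comm= | awk '$3 ~ /^Z/ { print $1, $2, $4 }'
}
```

With these, kill -s TERM "$(parent_of 12345)" would signal the parent of a defunct process with the (made-up) PID 12345. Check the result first: if the parent turns out to be PID 1, do not signal it.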
Later edit and caution: Proceed with caution when killing processes, their parents or their children, because they may leave files open or corrupted, connections unfinished, databases corrupted, and so on. Unless you know what kill -9 does to a process, use it only as a last resort, and if you need to run kill, try the signals specified above before using -9 (KILL).
I use kill -9 in much the same way that I throw kitchen implements in the dishwasher: if a kitchen implement is ruined by the dishwasher then I don’t want it.
The same goes for most programs (even databases): if I can’t kill them without things going haywire, I don’t really want to use them. (And if you happen to use one of these non-databases that encourages you to pretend they have persisted data when they haven’t: well, I guess it is time you start thinking about what you are doing).
Because in the real world stuff can go down at any time for any reason.
People should write software that is tolerant to crashes. In particular on servers. You should learn how to design software that assumes that things will break, crash etc.
The same goes for desktop software. When I want to shut down my browser it usually takes AGES to shut down. There is nothing my browser needs to do that should take more than at most a couple of seconds. When I ask it to shut down it should manage to do that immediately. When it doesn’t, well, then we pull out kill -9 and make it.
Why you do not want to kill -9 a process normally
According to man 7 signal:
The signals SIGKILL and SIGSTOP cannot be caught, blocked, or ignored.
This means that the application that receives either of these signals cannot “catch” them to do any shutdown behavior.
What you should do before running kill -9 on a process
Before sending the signal to the process, you should:
- Ensure that the process isn’t busy (i.e. doing “work”); sending a kill -9 to the process will essentially result in the loss of whatever it was working on.
- If the process is a non-responsive database, ensure that it has flushed its caches first. Some databases support sending other signals to the process to force the flushing of its cache.
Killing processes willy-nilly is not a smooth move: data can be lost, and poorly-designed apps can break themselves in subtle ways that cannot be fixed without a reinstall. But it completely depends on knowing what is and what is not safe in a given situation, and what would be at risk. The user should have some idea what a process is, or should be, doing and what its constraints are (disk IOPS, rss/swap) and be able to estimate how much time a long-running process should take (say a file copy, mp3 reencoding, email migration, backup, [your favorite timesink here]).
Furthermore, sending SIGKILL to a pid is no guarantee of killing it. If it’s stuck in a syscall or already zombied (Z in ps), it may continue to be zombied. This often happens when you ^Z a long-running process and forget to bg it before trying to kill -9 it. A simple fg will reconnect stdin/stdout and probably unblock the process, usually followed by the process terminating. If it’s stuck elsewhere or in some other form of kernel deadlock, only a reboot may be able to remove the process. (Zombie processes are already dead once SIGKILL has been processed by the kernel, and no further userland code will run; there’s usually a kernel reason, similar to being “blocked” waiting on a syscall to finish, for the process not terminating.)
Also, if you want to kill a process and all of its children, get into the habit of calling kill with the negated PID, not just the PID itself. There’s no guarantee of SIGHUP, SIGPIPE, SIGINT or other signals cleaning up after it, and having a bunch of disowned processes to clean up (remember mongrel?) is annoying.
Bonus evil: kill -9 -1 is slightly more damaging than kill -9 1. (Don’t do either as root unless you want to see what happens on a throw-away, non-important VM.)
I’ve created a script that helps automate this issue.
It is based on my complete answer to a very similar question at Stack Overflow.
You can read all the explanations there. To summarize, I would recommend just SIGTERM and SIGKILL, or even SIGTERM, SIGINT and SIGKILL. However, I give more options in the complete answer.
Please feel free to download (clone) it from the GitHub repository for killgracefully.