Do I own a watchdog?
Quite often when I reboot, I get the following error message:
kernel: watchdog watchdog0: watchdog did not stop!
I tried to find out more about watchdog by running man watchdog, but it says there is no manual entry. I tried yum list watchdog and found that it was not installed. However, when I looked in the /dev directory, I found two watchdogs: watchdog and watchdog0.
I am curious. Do I actually own any watchdogs? Why does the kernel complain that it did not stop when I do a reboot?
Most modern PC hardware includes watchdog timer facilities. You can read more about them on Wikipedia: Watchdog Timers. Also, from the Linux kernel docs:
excerpt – https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt
A Watchdog Timer (WDT) is a hardware circuit that can reset the
computer system in case of a software fault. You probably knew that
already.

Usually a userspace daemon will notify the kernel watchdog driver via
the /dev/watchdog special device file that userspace is still alive,
at regular intervals. When such a notification occurs, the driver
will usually tell the hardware watchdog that everything is in order,
and that the watchdog should wait for yet another little while to
reset the system. If userspace fails (RAM error, kernel bug,
whatever), the notifications cease to occur, and the hardware watchdog
will reset the system (causing a reboot) after the timeout occurs.

The Linux watchdog API is a rather ad-hoc construction and different
drivers implement different, and sometimes incompatible, parts of it.
This file is an attempt to document the existing usage and allow
future driver writers to use it as a reference.
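To make the keepalive mechanism described above a bit more concrete, here is a minimal sketch in Python of what a userspace daemon does. The function name pet_watchdog and its parameters are my own, purely for illustration. It relies on two behaviours from the kernel watchdog API document: any write to the device counts as a keepalive, and drivers that support "magic close" stop the timer if the character V is written just before the device is closed. As far as I understand it, closing the device without the timer having been stopped is what leads to kernel messages like the one in the question.

```python
import time


def pet_watchdog(dev, pings, interval=1.0, sleep=time.sleep):
    """Send `pings` keepalives to an already-open watchdog device,
    then write the magic close character so the timer can be stopped.

    `dev` is any binary file-like object; the caller opens and closes it.
    """
    try:
        for _ in range(pings):
            dev.write(b"\0")  # any write counts as a keepalive
            dev.flush()
            sleep(interval)
    finally:
        # 'V' is the "magic close" character; drivers that support it
        # disarm the watchdog when the device is closed right afterwards.
        dev.write(b"V")
        dev.flush()
```

On a real machine you would call it as root with something like open("/dev/watchdog", "wb", buffering=0) as the device, keeping the interval well below the timeout. Be careful when trying this: on most drivers, opening the device arms the timer, and some drivers or kernel builds cannot be stopped again once started.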
This SO Q&A, titled "Who is refreshing hardware watchdog in Linux?", covers the link between the Linux kernel and the hardware watchdog timer.
What about the watchdog package?
The description in the RPM makes this pretty clear, IMO. The watchdog
daemon can either act as a software watchdog or can interact with the hardware implementation.
excerpt from RPM description
The watchdog program can be used as a powerful software watchdog
daemon or may be alternately used with a hardware watchdog device such
as the IPMI hardware watchdog driver interface to a resident Baseboard
Management Controller (BMC). watchdog periodically writes to
/dev/watchdog; the interval between writes to /dev/watchdog is
configurable through settings in the watchdog sysconfig file.

This configuration file is also used to set the watchdog to be used as
a hardware watchdog instead of its default software watchdog
operation. In either case, if the device is open but not written to
within the configured time period, the watchdog timer expiration will
trigger a machine reboot. When operating as a software watchdog, the
ability to reboot will depend on the state of the machine and
interrupts.

When operating as a hardware watchdog, the machine will experience a
hard reset (or whatever action was configured to be taken upon
watchdog timer expiration) initiated by the BMC.
As a side note, to search your manuals, you can do:
man -k watchdog
Any manual page that has the word in its name or description will show up in your console. If you expect many results, you may want to pipe the output through less, as in:
man -k watchdog | less
In our case here, though, you probably won’t get much more than 2 or 3 entries.
One of those entries is a utility named wdctl, which shows the current status and setup of the watchdog. Here is an example from one of the Jetson boards I’m working with:
$ wdctl
Device: /dev/watchdog
Identity: Tegra WDT [version 1]
Timeout: 120 seconds
Pre-timeout: 0 seconds
FLAG           DESCRIPTION                STATUS  BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply           0            0
MAGICCLOSE     Supports magic close char       0            0
SETTIMEOUT     Set timeout (in seconds)        0            0
The Timeout entry tells you how long the watchdog will wait without a keepalive before forcing a reboot.
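If you want to check that timeout from a script, the header lines of the wdctl output are easy to pick apart. This is only a rough sketch; the function names parse_wdctl and timeout_seconds are made up for this example, and it assumes the "Key: value" layout shown above, which may differ between util-linux versions.

```python
import re


def parse_wdctl(text):
    """Return the 'Key: value' header fields of wdctl output as a dict."""
    fields = {}
    for line in text.splitlines():
        # Header lines look like "Timeout: 120 seconds"; the flag table
        # rows have no colon after the first word and are skipped.
        match = re.match(r"^([A-Za-z-]+):\s+(.*)$", line)
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


def timeout_seconds(text):
    """Extract the Timeout value in seconds, or None if it is absent."""
    value = parse_wdctl(text).get("Timeout")
    if value is None:
        return None
    return int(value.split()[0])
```

You could feed it the text captured from running wdctl (which usually needs root), for example via subprocess.run(["wdctl"], capture_output=True, text=True).stdout.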
On newer Linux distributions, the watchdog is controlled through systemd. Look at /etc/systemd/system.conf, where you can find a couple of parameters (usually commented out by default):
[Manager]
...
#RuntimeWatchdogSec=0
#ShutdownWatchdogSec=10min
...
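To see at a glance whether those two settings are active or still commented out, something like the following sketch would do. The helper name watchdog_settings is hypothetical, and it only looks at the text you give it; drop-in files under /etc/systemd/system.conf.d/ can also set these values, so treat the result as a hint rather than the final word.

```python
import re

WATCHDOG_KEYS = ("RuntimeWatchdogSec", "ShutdownWatchdogSec")


def watchdog_settings(conf_text):
    """Map each watchdog key found to a (value, active) tuple.

    `active` is False when the line is commented out, in which case the
    value shown is just the default listed in the file.
    """
    settings = {}
    for line in conf_text.splitlines():
        match = re.match(r"^\s*(#?)\s*(\w+)\s*=\s*(.*)$", line)
        if match and match.group(2) in WATCHDOG_KEYS:
            commented = match.group(1) == "#"
            settings[match.group(2)] = (match.group(3).strip(), not commented)
    return settings
```

For example, watchdog_settings(open("/etc/systemd/system.conf").read()) on a stock install would typically report both keys as inactive.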
Note: If it is easier for you, you could also use the apropos command line tool instead of man -k. It does the same thing, and the -k is not required in this case.
apropos watchdog
Also, the keyword (watchdog in this example) can be a regular expression by default.