If you remember the column I wrote back in November, when I was
baffled by the periodic reboots of a Windows box that I manage and was
trying my hand at various DOS commands to determine when and why the
system was rebooting, you might be interested in knowing the cause of
the problem.
The system was rebooting every other Thursday without leaving any
evidence except the notices in the Event Viewer stating that the
reboot that had occurred minutes before was unexpected. The problem
had been going on for months by the time we realized it was happening
on a regular schedule. Ironically, the system was rebooting in the middle of my
staff meetings -- until we moved the clocks back an hour on November
4th. I scheduled a reminder so that my BlackBerry would tell me to log
in just prior to the next expected occurrence of the reboot, but all I
was able to determine was that everything looked normal until the
system froze. The processes looked normal, performance was great and
no error messages or warnings were in evidence. A few minutes later,
the system was back up with the usual notice about the unexpected
reboot.
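
The "is this really regular?" question can be checked mechanically once you have a list of reboot times. What follows is only a minimal sketch: the timestamps are made up for illustration (they are not the actual times from this server), and the function names, the 14-day period and the 30-minute tolerance are all assumptions.

```python
from datetime import datetime, timedelta


def reboot_gaps(timestamps):
    """Return the gaps between consecutive reboot times, oldest first."""
    ordered = sorted(timestamps)
    return [later - earlier for earlier, later in zip(ordered, ordered[1:])]


def looks_periodic(timestamps,
                   period=timedelta(days=14),
                   tolerance=timedelta(minutes=30)):
    """True if every gap is within `tolerance` of a whole multiple of `period`.

    Allowing multiples means one unrecorded reboot does not hide the pattern.
    """
    gaps = reboot_gaps(timestamps)
    if not gaps:
        return False
    for gap in gaps:
        multiples = round(gap / period)
        if multiples < 1 or abs(gap - multiples * period) > tolerance:
            return False
    return True


def next_expected(timestamps, period=timedelta(days=14)):
    """Predict the next reboot as one period after the latest one seen."""
    return max(timestamps) + period


# Hypothetical reboot times (all Thursdays), invented for this example.
reboots = [
    datetime(2007, 9, 13, 10, 2),
    datetime(2007, 9, 27, 10, 1),
    datetime(2007, 10, 11, 10, 3),
    datetime(2007, 10, 25, 10, 2),
]

print(looks_periodic(reboots))
print(next_expected(reboots))
```

A reminder set a few minutes before the value returned by `next_expected` is the scripted equivalent of the phone reminder described above.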
The first thing I do any time a strange problem occurs is enter a few
search terms in Google looking for other people who have seen and,
hopefully, solved the same problem. This time, however, I was getting
nothing and, frankly, terms like "Windows" and "reboot" are going to
generate more hits than I'd have time to review even if I were to spend
the rest of my life reviewing them. Of course, I entered "two weeks"
and "2 weeks", but I wasn't getting anywhere. It struck me as very
strange that anything on this server that I manage would be causing
such regular reboots. What, after all, would be both so regular and
so badly behaved as to unceremoniously crash the system every two weeks?
I know enough about Windows to look for scheduled jobs that might have
developed an evil second nature and enough about the applications on
this particular server to know how they work. Nothing seemed to fit
the bill. It was as if my serious and unusually considerate boss
were throwing a cup of coffee across the room -- and doing so at
precise two-week intervals.
The answer slowly came to the surface when we searched on "14 days"
instead of "two weeks". First, we found this comment, posted years
earlier by someone else whose Windows system was similarly rebooting
every two weeks. It was followed by a response suggesting that the
poster check the UPS, if one was involved:
"I have a Dell PowerEdge 2650 running W2K server SP4 which has been
rebooting itself for several months now. This always occurs every
14 days, and always at the same time of day. This has not been a big
issue for the users, but I would like to clear it up before it
becomes one."
A little more digging and we learned that (some) APC UPS devices
perform a self-test every 14 days. We checked our system. Yes, it
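was plugged into an APC UPS and, further, it seemed that the device's
battery was going bad. A self-test briefly runs the load on battery, so
a failing battery would presumably drop power to the server for an
instant -- enough to cause an unexpected reboot. We scheduled a time to
move the system off the small UPS and plug it into lab power (protected
by a room-sized UPS device). Problem resolved.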
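
A fixed 14-day timer would also account for the reboots drifting out of my staff meetings once the clocks changed. The sketch below assumes that the UPS counts elapsed time rather than wall-clock time, and the year, the US Eastern offsets and the 10:00 start time are all invented for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for a US Eastern site, before and after the
# clocks go back; neither the time zone nor the times come from the server.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Hypothetical self-test on a Thursday before the clock change.
last_test = datetime(2007, 10, 25, 10, 0, tzinfo=EDT)

# Exactly 14 days of elapsed time later, shown in the post-change offset.
next_test = (last_test + timedelta(days=14)).astimezone(EST)

print(last_test.strftime("%A %Y-%m-%d %H:%M %Z"))
print(next_test.strftime("%A %Y-%m-%d %H:%M %Z"))
```

Under these assumptions the second self-test still falls on a Thursday, but an hour earlier by the wall clock.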