Unix Tip: Using SCAT (Solaris CAT) for analyzing crash dumps
ITworld 04/23/2008
Sandra Henry-Stocker, ITworld.com
Want to do something with those crash dump files other than remove them? Want
to extract some useful information without a lot of work? Take a look at Sun's
free crash analysis tool, scat.
While "scat" in this context stands for "Solaris Crash Analysis
Tool", I find myself drawing an analogy to the other meaning of the word.
"Scat" refers to the dung of certain animals. Analysis of animal scat
can tell you a lot about the health and the diet of the particular animals and
can be useful in identifying the species as well. Similarly, scat analysis of
the system variety can provide useful information for diagnosing the ailments
of your Solaris servers -- in particular, what caused them to crash.
Examining the "droppings" of a system can require considerable skill
if you end up delving deeply into the crash dump files. At the same time, even
the simplest use of the scat command can provide you with some of the most telling
information about a crash. You simply give the tool the numeric identifier of
the crash dump you want to know about. If your crash dump files are unix.0 and
vmcore.0 (i.e., the first set of a possible collection of crash dump files),
for example, you would issue commands such as these:
# cd /var/crash/boson
# /opt/SUNWscat/bin/scat 0
Notice that, since the scat program wants only the numeric identifier for the
crash dump files, you must first cd to the directory in which these files are
stored.
The unix.X and vmcore.X files are generated when a Solaris system panics and
are normally stored in /var/crash/`uname -n`.
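Before starting scat, it can help to see which dump numbers are actually complete in that directory. The following is only a minimal sketch: the function name list_dump_ids is made up for illustration, and it assumes a POSIX-style shell such as ksh or bash (not the older Bourne /bin/sh) along with the unix.X and vmcore.X naming described above.

```shell
# list_dump_ids [DIR]
# Print the numeric identifier of every complete crash dump in DIR,
# i.e. every N for which both unix.N and vmcore.N exist.
# Hypothetical helper for illustration; it is not part of scat.
list_dump_ids() {
    dir=${1:-/var/crash/$(uname -n)}
    for f in "$dir"/vmcore.*; do
        [ -f "$f" ] || continue           # the glob matched nothing
        n=${f##*.}                         # keep only the text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;       # ignore non-numeric suffixes
        esac
        if [ -f "$dir/unix.$n" ]; then
            echo "$n"
        fi
    done
}
```

Any number it prints can then be handed to scat from within that directory, as in the example above.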
The initial output from scat will identify the system and provide information
on the date and time the system crashed, how long the system had been up at
the time, the system type, hostid, panic string, etc. Here's an example:
Solaris[TM] CAT 4.1 (build 526) for Solaris 9 64-bit SPARC(sun4u)
Copyright © 2003 Sun Microsystems, Inc. All rights reserved.
Patents Pending. Use is subject to license terms.
Sun Microsystems proprietary - DO NOT RE-DISTRIBUTE!
Feedback regarding the tool should be sent to SolarisCAT_Feedback@Sun.COM
opening vmcore.0 ...dumphdr...symtab...core...done
loading core data: modules...panic...memory...time...misc...done
loading stabs...read_type_db: Wrong number of lines in database, or database
doesn't end in a newline
unable to load any stabs file
patches... - NOT AVAILABLE (No such file or directory) done
core file: /var/crash/boson/vmcore.0
user: Super-User (root:0)
release: 5.9 (64-bit)
version: Generic_112233-11
machine: sun4u
node name: boson
domain: lab.particles.org
hw_provider: Sun_Microsystems
system type: SUNW,Sun-Fire-V210
hostid: 837844c7
time of crash: Tue Apr 22 11:49:52 EDT 2008
age of system: 22 hours 5 minutes 4.48 seconds
panic cpu: 0 (ncpus: 1)
panic string: free: freeing free block, dev:0x200000016e, block:32032, ino:6057255,
fs:/homes
running sanity checks.../etc/system...ndd...sysent...misc...done
SolarisCAT(vmcore.0)>
All of this information is derived from the vmcore.X file. As you can see from
the display above, this particular core dump was generated when the system panicked
with a "freeing free block" problem. Notice that you are left at the
scat command prompt. If you want to delve further into the contents of your
crash dump files, type "help" at this prompt and view all the scat
commands at your disposal. Type "help" followed by a specific command
(e.g., "help proc") and you'll get a brief description of what that
particular command can do for you.
The proc command, for example, can tell you about the processes that were running
at the time your system crashed. These processes are listed by default in reverse
PID order.
SolarisCAT(vmcore.0)> proc
addr pid ppid uid size rss swresv time command
------------- ------ ------ ------ ---------- -------- -------- ------ ---------
0x30002674a98 7774 613 0 8323072 4161536 3055616 148 /usr/local/samba/bin/smbd
-D -s/usr/local/samba/lib/smb.conf
0x300122e2ac0 7759 613 0 8151040 3923968 2883584 16 /usr/local/samba/bin/smbd
-D -s/usr/local/samba/lib/smb.conf
... truncated ...
0x30003d75480 402 1 0 1892352 1523712 294912 1 /usr/sadm/lib/smc/bin/smcboot
0x30003d5e058 395 1 25 4554752 1589248 745472 0 /usr/lib/sendmail -Ac -q15m
0x30003bef460 393 1 0 4587520 2211840 753664 5 /usr/lib/sendmail -bd -q15m
0x30003d5f488 376 1 0 1081344 811008 163840 3 /usr/lib/utmpd
0x30003d74a68 352 1 0 3530752 1212416 393216 0 /usr/lib/power/powerd
0x30000f60a28 318 1 0 1146880 933888 131072 0 /bin/sh ./ssdgrptd exec
0x30003c97468 305 1 0 2121728 1212416 319488 2 /usr/lib/inet/xntpd
addr pid ppid uid size rss swresv time command
------------- ------ ------ ------ ---------- -------- -------- ------ ---------
0x30003c8e040 283 1 0 3776512 1646592 1302528 90118 /usr/sbin/ssmon
0x30003c96a50 279 1 0 9306112 2514944 1769472 19 /usr/sbin/ssserver
0x30003bee030 256 1 0 27656192 2596864 1138688 57 /usr/sbin/nscd
0x30003c8ea58 243 1 0 2506752 1703936 466944 7 /usr/sbin/cron
0x30003c96038 240 1 0 18874368 2170880 2711552 7 /usr/sbin/syslogd
0x30000f60010 225 1 0 7217152 2400256 1146880 170 /usr/lib/autofs/automountd
0x300020c4a40 217 1 0 2260992 1572864 598016 3 /usr/lib/nfs/lockd
0x300020c5458 213 1 1 4677632 1974272 876544 2 /usr/lib/nfs/statd
0x300020c4028 201 1 0 2629632 2048000 835584 12 /usr/sbin/inetd -s
0x300020a6020 183 1 0 2195456 1392640 671744 2 /usr/lib/netsvc/yp/ypbind
0x30003beea48 173 1 0 4431872 1277952 737280 3 /usr/sbin/keyserv
0x300020a6a38 170 1 0 2236416 1261568 663552 157 /usr/sbin/rpcbind
0x30000e08018 84 1 0 1875968 638976 212992 0 /etc/opt/SUNWconn/atm/bin/ilmid
0x300020a7450 82 1 0 2457600 720896 360448 0 /etc/opt/SUNWconn/atm/bin/atmsnmpd
-n
0x30000e08a30 58 1 0 7716864 2818048 1556480 147 /usr/lib/picl/picld
0x30000e09448 53 1 0 15695872 1400832 655360 6 /usr/lib/sysevent/syseventd
0x30000f92008 3 0 0 Kernel Kernel Kernel 23134 fsflush
0x30000f92a20 2 0 0 Kernel Kernel Kernel 0 pageout
0x30000f93438 1 0 0 1277952 720896 221184 9 /etc/init -
0x1438518 0 0 0 Kernel Kernel Kernel 46 sched
If you prefer, you can organize this display by sorting on a different field.
The "proc sort size" command sorts the processes based on the size
field while "proc sort command" sorts on the command string. Using
"proc sort -r pid", you will get the default proc display, but sorted
in the reverse order (lower PIDs first).
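If you have captured a proc listing in a text file (for example, by copying it from your terminal), you can also summarize it outside of scat with standard tools. The sketch below is an assumption-laden illustration rather than a scat feature: the function name top_rss is invented, and it relies on the nine-column layout shown above, with the PID in the second column, rss in the sixth, and the command in the ninth. It skips header lines, wrapped command arguments, and kernel processes whose sizes are reported as "Kernel".

```shell
# top_rss FILE [COUNT]
# From a saved "proc" listing, print the COUNT processes (default 5)
# with the largest rss values as "rss pid command", largest first.
# Hypothetical helper; assumes the column layout of the listing above.
top_rss() {
    awk '$1 ~ /^0x[0-9a-f]+$/ && $6 ~ /^[0-9]+$/ { print $6, $2, $9 }' "$1" |
        sort -k1,1nr |
        head -n "${2:-5}"
}
```

Calling it as "top_rss proc.txt 3" (where proc.txt is whatever file you saved the listing to) would show the three largest processes by rss.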
If you want to see a process tree display for a particular PID, use "proc
tree" followed by the PID you're interested in examining.
SolarisCAT(vmcore.0)> proc tree 402
1 /etc/init -
402 /usr/sadm/lib/smc/bin/smcboot
410 /usr/sadm/lib/smc/bin/smcboot
407 /usr/sadm/lib/smc/bin/smcboot
The analyze command repeats the information displayed when you start scat and
then adds details about the thread that panicked, including its flags and its
stack trace, as you can see below.
SolarisCAT(vmcore.0)> analyze
PANIC: free: freeing free block, dev:0x%lx, block:%ld, ino:%lu, fs:%s
core file: /var/crash/boson/vmcore.0
user: Super-User (root:0)
release: 5.9 (64-bit)
version: Generic_112233-11
machine: sun4u
node name: boson
domain: lab.particles.org
hw_provider: Sun_Microsystems
system type: SUNW,Sun-Fire-V210
hostid: 837844c7
time of crash: Tue Apr 22 11:49:52 EDT 2008
age of system: 22 hours 5 minutes 4.48 seconds
panic cpu: 0 (ncpus: 1)
panic string: free: freeing free block, dev:0x200000016e, block:32032, ino:6057255,
fs:/homes
1 cpu
==== printing for generic panic information ====
cpu 0 had the panic
==== panic thread: 0x2a1003f7d40 ==== cpu: 0 ====
==== panic kernel thread: 0x2a1003f7d40 pid: 0 on cpu: 0 ====
cmd: sched
t_stk: 0x2a1003f7b50 sp: 0x1437751 t_stkbase: 0x2a1003f4000
t_pri: 60(SYS) pctcpu: 0.000000 t_lwp: 0x0
t_procp: 0x1438518(proc_sched) p_as: 0x1438400(kas)
last cpuid: 0
idle: 50 ticks (0.50 seconds)
start: Mon Apr 21 13:45:07 2008
age: 79485 seconds (22 hours 4 minutes 45 seconds)
stime: 2132 (22 hours 4 minutes 43.16 seconds earlier)
tstate: TS_ONPROC - thread is being run on a processor
tflg: T_TALLOCSTK - thread structure allocated from stk
T_DONTBLOCK - for lockfs
T_PANIC - thread initiated a system panic
tpflg: none set
tsched: TS_LOAD - thread is in memory
TS_DONT_SWAP - thread/LWP should not be swapped
TS_SIGNALLED - thread was awakened by cv_signal()
pflag: SSYS - system resident process
SLOAD - in core
SLOCK - process cannot be swapped
pc: 0x104a720 unix:panicsys+0x44: call unix:setjmp
startpc: 0x11a53f8 ufs:ufs_thread_delete+0x0: save %sp, -0xd0, %sp
unix:panicsys+0x44 (0x14a3158, 0x2a1003f74a0, 0x1438120, 0x1, 0x0, 0x0)
unix:vpanic+0xcc (0x14a3158, 0x2a1003f74a0, 0x30000260708, 0x1, 0x14a05f30,
0x0)
genunix:vcmn_err+0x18 (0x3, 0x14a3158, 0x2a1003f74a0, 0x0, 0x0, 0x300000488c8)
ufs:real_panic_v - frame recycled
ufs:ufs_fault_v+0xa8 (0x2a1003f7738, 0x14a3158, 0x2a1003f74a0, 0xfa4, 0x1dac0,
0x3b9aca00)
ufs:ufs_fault+0x1c (0x2a1003f7738, 0x14a3158, 0x200000016e, 0x7d20, 0x5c6d27,
0x30003f880d4)
ufs:free - frame recycled
ufs:indirtrunc+0x2a0 (0xffffffffffffffff, 0x300157ae000, 0xffffffffffffffff,
0xc, 0x10, 0xffffffffffffffff)
ufs:indirtrunc+0x260 (0xe4, 0x300032c2000, 0x72639, 0xc, 0x10, 0x0)
ufs:ufs_itrunc+0x650 (0x2a1003f7698, 0x89ddd58, 0x8, 0x300008e1f28, 0x0, 0x4)
ufs:ufs_trans_itrunc+0x170 (0x0, 0x0, 0x300008e1f28, 0xffbf, 0x300075fa3c8,
0x30000048910)
ufs:ufs_delete+0x364 (0x0, 0x300075fa308, 0x300075fa3c8, 0x14a3c70, 0x2a1003f7d40,
0x0)
ufs:ufs_thread_delete+0xc4 (0x3000026e828, 0x0, 0x30003bee030, 0x1438518, 0x16,
0x0)
unix:thread_start+0x4 (0x3000026e828, 0x0, 0x0, 0x0, 0x0, 0x0)
-- end of kernel thread's stack --
SolarisCAT(vmcore.0)>
As you can see, we are getting into considerably denser information. However,
it's not too hard to pick out additional evidence that the crash was due to
a ufs failure: most of the frames in the panic thread's stack belong to ufs
routines. It takes considerable familiarity with the kernel functions listed
in the analyze output to identify the sequence as the removal of a large file.
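One quick way to back up that impression is to count the stack frames per kernel module in a saved copy of the analyze output. This is a rough sketch only: the function name frames_by_module is made up, and it assumes that each frame line begins with module:function, optionally followed by +0x and an offset, as in the listing above.

```shell
# frames_by_module FILE
# Count stack frames per kernel module in saved "analyze" output and
# print "count module", most frequent module first.
# Hypothetical helper; assumes frame lines start with module:function[+0xOFFSET].
frames_by_module() {
    awk '$1 ~ /^[a-z][a-z0-9_]*:[A-Za-z_][A-Za-z0-9_]*(\+0x[0-9a-f]+)?$/ {
             split($1, part, ":")      # part[1] is the module name
             count[part[1]]++
         }
         END { for (m in count) print count[m], m }' "$1" |
        sort -k1,1nr -k2,2
}
```

Run against the trace shown above, this counts 10 ufs frames, 3 for unix and 1 for genunix.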
You exit scat by typing "quit".
If you issue the scat command without an argument, you will be examining the
running kernel. Scat will remind you of this fact by displaying the word "live"
in its prompt.
The SUNWscat package (version 4.1) can be downloaded here.
It installs in /opt and includes quite a bit of user documentation. The larger
of the two versions of scat includes a patch database which identifies known kernel
modules within crash dumps. I was using the "lite" version in this column.
Advanced use of scat requires an in-depth understanding of the Solaris kernel.
However, you can get a lot of useful information by using just the basic commands.
Sandra Henry-Stocker has been administering Unix systems
for more than 18 years. She describes herself as "USL"
(Unix as a second language) but remembers enough English
to write books and buy groceries. She
currently works for TeleCommunication Systems, a wireless
communications company, in Annapolis, Maryland, where no
one else necessarily shares any of her opinions. She lives
with her second family on a small farm on Maryland's
Eastern Shore. Send comments and suggestions to bugfarm@gmail.com.