
Unix Tip: Using SCAT (Solaris CAT) for analyzing crash dumps

ITworld 04/23/2008

Sandra Henry-Stocker, ITworld.com


Want to do something with those crash dump files other than remove them? Want to extract some useful information without a lot of work? Take a look at Sun's free crash analysis tool, scat.

While "scat" in this context stands for "Solaris Crash Analysis Tool", I find myself drawing an analogy to the other meaning of the word: the dung of certain animals. Analysis of animal scat can tell you a lot about the health and diet of the animals that left it and can help identify the species as well. Similarly, scat analysis of the system variety can provide useful information for diagnosing the ailments of your Solaris servers -- in particular, what caused them to crash.

Examining the "droppings" of a system can require considerable skill if you end up delving deeply into the crash dump files. At the same time, even the simplest use of the scat command can provide you with some of the most telling information about a crash. You simply give the tool the numeric identifier of the crash dump you want to examine. If your crash dump files are unix.0 and vmcore.0 (i.e., the first set of a possible collection of crash dump files), you would issue commands such as these:

  # cd /var/crash/boson
  # /opt/SUNWscat/bin/scat 0
Notice that, since the scat program wants only the numeric identifier for the crash dump files, you must first cd to the directory in which these files are stored.
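
Since scat is given only the number, it helps to know which numbered pairs are actually present before you start. The following is a minimal sketch, not part of scat itself: list_dump_ids is a hypothetical helper name, and the function assumes the usual unix.N / vmcore.N naming described above.

```shell
#!/bin/sh
# list_dump_ids DIR: print the numeric suffix of every unix.N file in DIR
# that has a matching vmcore.N, one per line, in ascending order.
# (Hypothetical helper; the name is not part of scat or Solaris.)
list_dump_ids() {
    dir=$1
    for f in "$dir"/unix.*; do
        [ -f "$f" ] || continue          # the glob matched nothing
        n=${f##*.}                       # text after the last dot
        case $n in
            ''|*[!0-9]*) continue ;;     # skip non-numeric suffixes
        esac
        if [ -f "$dir/vmcore.$n" ]; then
            echo "$n"
        fi
    done | sort -n
}
```

Calling list_dump_ids /var/crash/boson would print one line per complete dump pair; any of those numbers can then be passed to scat after you cd to that directory.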

The unix.X and vmcore.X files are written by savecore when a Solaris system reboots after a panic and are normally stored in /var/crash/`uname -n`. The initial output from scat identifies the system and reports the date and time of the crash, how long the system had been up at the time, the system type, hostid, panic string, and so on. Here's an example:

  Solaris[TM] CAT 4.1 (build 526) for Solaris 9 64-bit SPARC(sun4u)
  Copyright © 2003 Sun Microsystems, Inc. All rights reserved.
  Patents Pending. Use is subject to license terms.
  Sun Microsystems proprietary - DO NOT RE-DISTRIBUTE!
  Feedback regarding the tool should be sent to SolarisCAT_Feedback@Sun.COM
  opening vmcore.0 ...dumphdr...symtab...core...done
  loading core data: modules...panic...memory...time...misc...done
  loading stabs...read_type_db: Wrong number of lines in database, or database doesn't end in a newline
  unable to load any stabs file
  patches... - NOT AVAILABLE (No such file or directory) done
  core file: /var/crash/boson/vmcore.0
  user: Super-User (root:0)
  release: 5.9 (64-bit)
  version: Generic_112233-11
  machine: sun4u
  node name: boson
  domain: lab.particles.org
  hw_provider: Sun_Microsystems
  system type: SUNW,Sun-Fire-V210
  hostid: 837844c7
  time of crash: Tue Apr 22 11:49:52 EDT 2008
  age of system: 22 hours 5 minutes 4.48 seconds
  panic cpu: 0 (ncpus: 1)
  panic string: free: freeing free block, dev:0x200000016e, block:32032, ino:6057255, fs:/homes
  running sanity checks.../etc/system...ndd...sysent...misc...done
  SolarisCAT(vmcore.0)>
All of this information is derived from the vmcore.X file. As you can see from the display above, this particular core dump was generated when the system panicked with a "freeing free block" problem. Notice that you are left at the scat command prompt. If you want to delve further into the contents of your crash dump files, type "help" at this prompt and view all the scat commands at your disposal. Type "help" followed by a specific command (e.g., "help proc") and you'll get a brief description of what that particular command can do for you.
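
If you keep a copy of this startup summary (pasted from the terminal into a text file, for instance), individual fields are easy to pull out for comparison across several dumps. The sketch below assumes the "name: value" layout shown above; header_field is a hypothetical helper name and nothing here is a scat feature.

```shell
#!/bin/sh
# header_field NAME: read a saved scat startup summary on stdin and print
# the value of the first "NAME: value" line.
# (Hypothetical helper for post-processing saved output.)
header_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop leading indentation
            prefix = key ": "
            if (index(line, prefix) == 1) {     # line starts with "NAME: "
                print substr(line, length(prefix) + 1)
                exit
            }
        }'
}
```

For example, header_field "time of crash" < summary.txt would print just the timestamp, where summary.txt is whatever file you saved the output to.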

The proc command, for example, can tell you about the processes that were running at the time your system crashed. These processes are listed by default in reverse PID order.

  SolarisCAT(vmcore.0)> proc
  addr pid ppid uid size rss swresv time command
  ------------- ------ ------ ------ ---------- -------- -------- ------ ---------
  0x30002674a98 7774 613 0 8323072 4161536 3055616 148 /usr/local/samba/bin/smbd -D -s/usr/local/samba/lib/smb.conf
  0x300122e2ac0 7759 613 0 8151040 3923968 2883584 16 /usr/local/samba/bin/smbd -D -s/usr/local/samba/lib/smb.conf
  ... truncated ...
  0x30003d75480 402 1 0 1892352 1523712 294912 1 /usr/sadm/lib/smc/bin/smcboot
  0x30003d5e058 395 1 25 4554752 1589248 745472 0 /usr/lib/sendmail -Ac -q15m
  0x30003bef460 393 1 0 4587520 2211840 753664 5 /usr/lib/sendmail -bd -q15m
  0x30003d5f488 376 1 0 1081344 811008 163840 3 /usr/lib/utmpd
  0x30003d74a68 352 1 0 3530752 1212416 393216 0 /usr/lib/power/powerd
  0x30000f60a28 318 1 0 1146880 933888 131072 0 /bin/sh ./ssdgrptd exec
  0x30003c97468 305 1 0 2121728 1212416 319488 2 /usr/lib/inet/xntpd
  addr pid ppid uid size rss swresv time command
  ------------- ------ ------ ------ ---------- -------- -------- ------ ---------
  0x30003c8e040 283 1 0 3776512 1646592 1302528 90118 /usr/sbin/ssmon
  0x30003c96a50 279 1 0 9306112 2514944 1769472 19 /usr/sbin/ssserver
  0x30003bee030 256 1 0 27656192 2596864 1138688 57 /usr/sbin/nscd
  0x30003c8ea58 243 1 0 2506752 1703936 466944 7 /usr/sbin/cron
  0x30003c96038 240 1 0 18874368 2170880 2711552 7 /usr/sbin/syslogd
  0x30000f60010 225 1 0 7217152 2400256 1146880 170 /usr/lib/autofs/automountd
  0x300020c4a40 217 1 0 2260992 1572864 598016 3 /usr/lib/nfs/lockd
  0x300020c5458 213 1 1 4677632 1974272 876544 2 /usr/lib/nfs/statd
  0x300020c4028 201 1 0 2629632 2048000 835584 12 /usr/sbin/inetd -s
  0x300020a6020 183 1 0 2195456 1392640 671744 2 /usr/lib/netsvc/yp/ypbind
  0x30003beea48 173 1 0 4431872 1277952 737280 3 /usr/sbin/keyserv
  0x300020a6a38 170 1 0 2236416 1261568 663552 157 /usr/sbin/rpcbind
  0x30000e08018 84 1 0 1875968 638976 212992 0 /etc/opt/SUNWconn/atm/bin/ilmid
  0x300020a7450 82 1 0 2457600 720896 360448 0 /etc/opt/SUNWconn/atm/bin/atmsnmpd -n
  0x30000e08a30 58 1 0 7716864 2818048 1556480 147 /usr/lib/picl/picld
  0x30000e09448 53 1 0 15695872 1400832 655360 6 /usr/lib/sysevent/syseventd
  0x30000f92008 3 0 0 Kernel Kernel Kernel 23134 fsflush
  0x30000f92a20 2 0 0 Kernel Kernel Kernel 0 pageout
  0x30000f93438 1 0 0 1277952 720896 221184 9 /etc/init -
  0x1438518 0 0 0 Kernel Kernel Kernel 46 sched
If you prefer, you can organize this display by sorting on a different field. The "proc sort size" command sorts the processes based on the size field while "proc sort command" sorts on the command string. Using "proc sort -r pid", you will get the default proc display, but sorted in the reverse order (lower PIDs first).
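
Sorting happens inside scat, but totals are easier to compute from a saved copy of the listing. As a rough sketch, assuming the nine-column layout shown above (addr, pid, ppid, uid, size, rss, swresv, time, command) and a listing pasted into a file, the hypothetical helper below adds up the rss column for commands matching a pattern. The values are summed in whatever units scat reports.

```shell
#!/bin/sh
# total_rss [PATTERN]: read a saved "proc" listing on stdin and print the sum
# of the rss column for rows whose command matches PATTERN (default: all rows).
# (Hypothetical helper; assumes rss is column 6 and the command starts in column 9.)
total_rss() {
    awk -v pat="${1:-.}" '
        # Process rows start with a hex address. Header lines, wrapped
        # continuation lines, and "Kernel" rows fail these tests and are skipped.
        $1 ~ /^0x[0-9a-f]+$/ && $6 ~ /^[0-9]+$/ && $9 ~ pat {
            sum += $6
        }
        END { printf "%.0f\n", sum }'
}
```

With the two smbd rows shown above, total_rss smbd reports 8085504 (4161536 + 3923968).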

If you want to see a process tree display for a particular PID, use "proc tree" followed by the PID you're interested in examining.

  SolarisCAT(vmcore.0)> proc tree 402
  1 /etc/init -
  402 /usr/sadm/lib/smc/bin/smcboot
  410 /usr/sadm/lib/smc/bin/smcboot
  407 /usr/sadm/lib/smc/bin/smcboot
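
Because the plain proc listing carries both pid and ppid columns, the same parent-child relationships can be read from a saved copy of it. This is only a sketch under the same assumptions as before (a listing pasted into a file, nine columns as shown); children_of is a hypothetical helper name.

```shell
#!/bin/sh
# children_of PPID: read a saved "proc" listing on stdin and print the PID
# and command line of every process whose parent is PPID.
# (Hypothetical helper; assumes pid and ppid are columns 2 and 3.)
children_of() {
    awk -v parent="$1" '
        # Only rows that start with a hex address describe a process.
        $1 ~ /^0x[0-9a-f]+$/ && $3 == parent {
            cmd = $9
            for (i = 10; i <= NF; i++) cmd = cmd " " $i   # rejoin arguments
            print $2, cmd
        }'
}
```

Arguments that scat wrapped onto a following line are not rejoined; only what sits on the process row itself is printed.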
The analyze command will repeat the initial information displayed when you start scat, but will then proceed with some additional information, like what you see below.

  SolarisCAT(vmcore.0)> analyze
  PANIC: free: freeing free block, dev:0x%lx, block:%ld, ino:%lu, fs:%s

  core file: /var/crash/boson/vmcore.0
  user: Super-User (root:0)
  release: 5.9 (64-bit)
  version: Generic_112233-11
  machine: sun4u
  node name: boson
  domain: lab.particles.org
  hw_provider: Sun_Microsystems
  system type: SUNW,Sun-Fire-V210
  hostid: 835026a9
  time of crash: Tue Apr 22 11:49:52 EDT 2008
  age of system: 22 hours 5 minutes 4.48 seconds
  panic cpu: 0 (ncpus: 1)
  panic string: free: freeing free block, dev:0x200000016e, block:32032, ino:6057255, fs:/homes

  1 cpu
  ==== printing for generic panic information ====
  cpu 0 had the panic

  ==== panic thread: 0x2a1003f7d40 ==== cpu: 0 ====
  ==== panic kernel thread: 0x2a1003f7d40 pid: 0 on cpu: 0 ====
  cmd: sched
  t_stk: 0x2a1003f7b50 sp: 0x1437751 t_stkbase: 0x2a1003f4000
  t_pri: 60(SYS) pctcpu: 0.000000 t_lwp: 0x0
  t_procp: 0x1438518(proc_sched) p_as: 0x1438400(kas)
  last cpuid: 0
  idle: 50 ticks (0.50 seconds)
  start: Mon Apr 21 13:45:07 2008
  age: 79485 seconds (22 hours 4 minutes 45 seconds)
  stime: 2132 (22 hours 4 minutes 43.16 seconds earlier)
  tstate: TS_ONPROC - thread is being run on a processor
  tflg: T_TALLOCSTK - thread structure allocated from stk
  T_DONTBLOCK - for lockfs
  T_PANIC - thread initiated a system panic
  tpflg: none set
  tsched: TS_LOAD - thread is in memory
  TS_DONT_SWAP - thread/LWP should not be swapped
  TS_SIGNALLED - thread was awakened by cv_signal()
  pflag: SSYS - system resident process
  SLOAD - in core
  SLOCK - process cannot be swapped
  pc: 0x104a720 unix:panicsys+0x44: call unix:setjmp
  startpc: 0x11a53f8 ufs:ufs_thread_delete+0x0: save %sp, -0xd0, %sp
  unix:panicsys+0x44 (0x14a3158, 0x2a1003f74a0, 0x1438120, 0x1, 0x0, 0x0)
  unix:vpanic+0xcc (0x14a3158, 0x2a1003f74a0, 0x30000260708, 0x1, 0x14a05f30, 0x0)
  genunix:vcmn_err+0x18 (0x3, 0x14a3158, 0x2a1003f74a0, 0x0, 0x0, 0x300000488c8)
  ufs:real_panic_v - frame recycled
  ufs:ufs_fault_v+0xa8 (0x2a1003f7738, 0x14a3158, 0x2a1003f74a0, 0xfa4, 0x1dac0, 0x3b9aca00)
  ufs:ufs_fault+0x1c (0x2a1003f7738, 0x14a3158, 0x200000016e, 0x7d20, 0x5c6d27, 0x30003f880d4)
  ufs:free - frame recycled
  ufs:indirtrunc+0x2a0 (0xffffffffffffffff, 0x300157ae000, 0xffffffffffffffff, 0xc, 0x10, 0xffffffffffffffff)
  ufs:indirtrunc+0x260 (0xe4, 0x300032c2000, 0x72639, 0xc, 0x10, 0x0)
  ufs:ufs_itrunc+0x650 (0x2a1003f7698, 0x89ddd58, 0x8, 0x300008e1f28, 0x0, 0x4)
  ufs:ufs_trans_itrunc+0x170 (0x0, 0x0, 0x300008e1f28, 0xffbf, 0x300075fa3c8, 0x30000048910)
  ufs:ufs_delete+0x364 (0x0, 0x300075fa308, 0x300075fa3c8, 0x14a3c70, 0x2a1003f7d40, 0x0)
  ufs:ufs_thread_delete+0xc4 (0x3000026e828, 0x0, 0x30003bee030, 0x1438518, 0x16, 0x0)
  unix:thread_start+0x4 (0x3000026e828, 0x0, 0x0, 0x0, 0x0, 0x0)
  -- end of kernel thread's stack --

  SolarisCAT(vmcore.0)>
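As you can see, we are getting into considerably denser information. However, it's not too hard to pick out additional evidence that the crash was due to a ufs failure: most of the frames in the stack trace belong to the ufs module. It takes considerable familiarity with the kernel functions listed in the analyze output to identify the sequence as the removal of a large file. Read from the bottom up, the trace runs from ufs_thread_delete through ufs_delete and ufs_itrunc to indirtrunc before reaching free and the panic.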
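
Two small checks can make a trace like this easier to read. The sketch below assumes the stack trace was pasted into a file; call_chain and hex_to_dec are hypothetical helper names, not scat commands. The first lists the module:function of each frame with the oldest call first, and the second converts a hex argument to decimal so it can be compared with the panic string.

```shell
#!/bin/sh
# call_chain: read a saved scat stack trace on stdin and print the
# module:function of each frame, oldest call first.
# (Hypothetical helper for post-processing saved output.)
call_chain() {
    awk '
        # A frame line starts with "module:function", optionally "+0xOFFSET".
        # Wrapped argument lines and "pc:"-style labels do not match.
        $1 ~ /^[a-z_0-9]+:[A-Za-z_0-9]+(\+0x[0-9a-f]+)?$/ {
            name = $1
            sub(/\+0x[0-9a-f]+$/, "", name)     # strip the offset
            frames[++n] = name
        }
        END { for (i = n; i >= 1; i--) print frames[i] }'
}

# hex_to_dec HEX: print a 0x-prefixed value in decimal.
hex_to_dec() {
    printf '%d\n' "$1"
}
```

In the ufs:ufs_fault frame above, the third argument is the dev value from the panic string as is, and the next two convert to the other values it reports: hex_to_dec 0x7d20 prints 32032 (block) and hex_to_dec 0x5c6d27 prints 6057255 (ino).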

You exit scat by typing "quit". If you issue the scat command without an argument, you will be examining the running kernel. Scat will remind you of this fact by displaying the word "live" in its prompt, as shown here:

  SolarisCAT(live)>
The SUNWscat package (version 4.1) is available as a free download from Sun. It installs in /opt and includes quite a bit of user documentation. The larger of the two versions of scat includes a patch database which identifies known kernel modules within crash dumps. I was using the "lite" version in this column. Advanced use of scat requires an in-depth understanding of the Solaris kernel. However, you can get a lot of useful information by using just the basic commands.


Sandra Henry-Stocker has been administering Unix systems for more than 18 years. She describes herself as "USL" (Unix as a second language) but remembers enough English to write books and buy groceries. She currently works for TeleCommunication Systems, a wireless communications company, in Annapolis, Maryland, where no one else necessarily shares any of her opinions. She lives with her second family on a small farm on Maryland's Eastern Shore. Send comments and suggestions to bugfarm@gmail.com.





Copyright © Computerworld, Inc. All rights reserved

Reproduction in whole or in part in any form or medium without express written permission of Computerworld Inc. is prohibited. Computerworld and Computerworld.com and the respective logos are trademarks of International Data Group Inc.