Thursday, May 2, 2013

Using GDB to explore a core file

Before we go any further, please note that these instructions were written for CentOS 6.3.  They should work in a similar way on Red Hat Enterprise Linux and Fedora.

Recently, I've been experiencing issues with pacemaker immediately after upgrading from one version to another.  To make a long story short, crmd, which is part of the pacemaker package, was crashing every 15 minutes and dumping a core file.  To be fair, the service itself stayed up and simply kept respawning child processes, so even though it complained and dumped core, it was robust enough to continue operating.  Still, finding the cause meant reading the details of the core dump with gdb.

The question now is how to get the error details out of the core file.  First, make sure the debuginfo repository is defined for our OS, for example in a .repo file under /etc/yum.repos.d/:

# DEBUGINFO
[debuginfo]
name=debuginfo
baseurl=http://debuginfo.centos.org/$releasever/$basearch
enabled=0
gpgcheck=0
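
For reference, here is one way the stanza above could be written to disk from a shell.  This is only a sketch: the function name write_debuginfo_repo and the file name debuginfo.repo in the usage line are my own choices, not something yum requires.  The quoted 'EOF' keeps the shell from expanding $releasever and $basearch, which yum needs to see literally.

```shell
# Write the [debuginfo] stanza to the file given as $1.
# Hypothetical helper; yum only cares that the file ends in .repo
# and lives in a directory it reads, such as /etc/yum.repos.d/.
write_debuginfo_repo() {
    # The quoted delimiter ('EOF') disables variable expansion, so
    # $releasever and $basearch reach the file unchanged.
    cat > "$1" <<'EOF'
# DEBUGINFO
[debuginfo]
name=debuginfo
baseurl=http://debuginfo.centos.org/$releasever/$basearch
enabled=0
gpgcheck=0
EOF
}
```

As root, it would be called like this:

# write_debuginfo_repo /etc/yum.repos.d/debuginfo.repo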

You will need a tool called 'debuginfo-install', which is part of the yum-utils package:

# yum install yum-utils

Install gdb:

# yum install gdb
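
Before moving on, it can be worth confirming that both tools actually landed on the PATH.  The helper below is a small sketch of my own (the name missing_tools is made up); it prints every command name that the shell cannot find and prints nothing when all of them are present.

```shell
# Print each command named in the arguments that is NOT found on PATH.
# Hypothetical helper, not part of yum-utils or gdb.
missing_tools() {
    for tool in "$@"; do
        # command -v succeeds only if the shell can resolve the name
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

For the two tools used in this post:

# missing_tools gdb debuginfo-install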

Run gdb against your executable and its core file.  For example:

# gdb /usr/libexec/pacemaker/crmd ./core.26688

If any debuginfo packages are missing, gdb will tell you which ones and how to install them.  For example:

Missing separate debuginfos, use: debuginfo-install audit-libs-2.2-2.el6.x86_64

Run the suggested command to get the right debuginfo files, adding --enablerepo=debuginfo because the repository above is disabled by default (enabled=0):

# debuginfo-install audit-libs-2.2-2.el6.x86_64 --enablerepo=debuginfo
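
When gdb reports several missing packages, copying each hint by hand gets tedious.  As a rough sketch, the filter below reads gdb's output on stdin and prints one ready-to-run command per distinct hint.  The function name missing_debuginfo_cmds is my own, and the repository id "debuginfo" is taken from the stanza earlier in this post; adjust it if yours is named differently.

```shell
# Turn gdb's "Missing separate debuginfos" hints (read from stdin) into
# debuginfo-install commands that enable the [debuginfo] repository.
# Hypothetical helper; it only rewrites text and installs nothing itself.
missing_debuginfo_cmds() {
    sed -n 's/^Missing separate debuginfos, use: \(debuginfo-install .*\)$/\1 --enablerepo=debuginfo/p' |
        sort -u   # gdb may repeat the same hint; keep one copy of each
}
```

One possible way to feed it, assuming gdb prints the same hints when run non-interactively with -batch:

# gdb -batch -ex bt /usr/libexec/pacemaker/crmd ./core.26688 2>&1 | missing_debuginfo_cmds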

Once you have all the debuginfo files, run gdb again:

# gdb /usr/libexec/pacemaker/crmd ./core.26688
...
Core was generated by `/usr/libexec/pacemaker/crmd'.
Program terminated with signal 6, Aborted.
#0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
Missing separate debuginfos, use: debuginfo-install libtool-ltdl-2.2.6-15.5.el6.x86_64
(gdb)


At this point you are at the gdb prompt.  Use "bt" to print the backtrace, that is, the chain of function calls that led up to the crash.

(gdb) bt
#0  0x00007f81896ac8a5 in raise (sig=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x00007f81896ae085 in abort () at abort.c:92
#2  0x00007f818bb8a56b in crm_abort (file=0x7f818bba9d58 "xml.c", function=0x7f818bbab6b4 "string2xml", line=650,
    assert_condition=0x7f818bbaa01a "String parsing error", do_core=<value optimized out>, do_fork=<value optimized out>) at utils.c:1073
#3  0x00007f818bb933af in string2xml (
    input=0x1e745f8 "...
#4  0x00007f818b76a2fc in lrmd_ipc_dispatch (buffer=<value optimized out>, length=<value optimized out>, userdata=0x1e72910) at lrmd_client.c:310
#5  0x00007f818bba2e90 in mainloop_gio_callback (gio=<value optimized out>, condition=G_IO_IN, data=0x1e73be0) at mainloop.c:585
#6  0x00007f8188fbbf0e in g_main_dispatch (context=0x1d4f120) at gmain.c:1960
#7  IA__g_main_context_dispatch (context=0x1d4f120) at gmain.c:2513
#8  0x00007f8188fbf938 in g_main_context_iterate (context=0x1d4f120, block=1, dispatch=1, self=<value optimized out>) at gmain.c:2591
#9  0x00007f8188fbfd55 in IA__g_main_loop_run (loop=0x1e734a0) at gmain.c:2799
#10 0x00000000004052ce in crmd_init () at main.c:154
#11 0x00000000004055cc in main (argc=1, argv=0x7fffe77a4f88) at main.c:120
(gdb)
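
A full backtrace is noisy, so when sharing it or comparing several cores it can help to boil it down to just the call chain.  The filter below is a sketch of my own (frame_functions is a made-up name): it reads a backtrace on stdin and prints the frame number and function name of each frame, skipping the continuation lines that hold wrapped arguments.

```shell
# Reduce a gdb backtrace (read from stdin) to "#N function" lines.
# Hypothetical helper; it relies only on the frame layout shown above.
frame_functions() {
    awk '/^#[0-9]+ / {
        # Frames look like "#N ADDRESS in FUNC (...)" or, when gdb
        # omits the address, "#N FUNC (...)".
        fn = ($3 == "in") ? $4 : $2
        print $1, fn
    }'
}
```

It can be combined with gdb's batch mode so the whole thing runs without the interactive prompt:

# gdb -batch -ex bt /usr/libexec/pacemaker/crmd ./core.26688 2>/dev/null | frame_functions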


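This is the information we were looking for: crmd aborted in string2xml (xml.c, line 650) with a "String parsing error" while lrmd_ipc_dispatch was handling a message.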

Thanks to Andrew Beekhof of http://clusterlabs.org/ for pointing me in the right direction with GDB.