[sf-lug] Debugging system issues...

Thu Jan 31 15:20:16 PST 2008

This is from a guy I went to school with, and a good friend of mine,
Peter Petrakis (he lives in my old apartment).  Peter is an expert in
debugging system issues.  He works for Stratus, which makes the
highest-availability Linux servers you can purchase in the world.  You
can blow up the server with a bomb, set it on fire, pop a drive out,
or snap the CPU pins while it is running, and you can still be sure
your data and transactions continue in real time.  It is amazing
technology, and he can't tell me everything they do or how they
accomplish it, for NDA reasons.  However, he is an expert on debugging
systems, and I present to you an email exchange from last year...

---------- Forwarded message ----------
From: Peter Petrakis <peter.petrakis at gmail.com>
Date: Jun 7, 2007 3:53 PM
Subject: Re: Kernel Module Compilation and debugging..
To: Kristian Hermansen <kristian.hermansen at gmail.com>
Cc: vinay.perneti at gmail.com

Hi,

On 6/7/07, Kristian Hermansen <kristian.hermansen at gmail.com> wrote:
> I'm trying to hack the nfs modules in the pNFS 2.6.17 custom kernel. I
> added a couple of printk's for debugging purposes. Do I have to
> rebuild the whole kernel or is there a way to just rebuild a specific
> module?

You can just rebuild the module. Check out its Makefile. If you want
to do the work outside the kernel tree, just trace the files the
Makefile sources for variables, make your own file with those
definitions, and edit the Makefile to use that instead. Making your
own svn repo to track your changes might be a good idea too.

> Also, are there any tools for debugging kernel modules?

0) diskdump
1) crash
2) systemtap
3) kprobes

#1 is the most effective. You can debug core dumps with crash, and
you can also run crash on the live kernel to inspect the current state
of things, even change variables. It's not very user-friendly, but
once you get the hang of it you'll be all set.

> Is it possible to step through the code or is there any way to see the
> execution flow?

Possible, yes; effective, possibly :-) SystemTap/kprobes can trap on
the entry and exit of a function call, but you have to write a
SystemTap script that:
  0) traps the entry of the function and gets its name
  1) evaluates its variables
  2) traps the return of the function
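SystemTap and kprobes run inside the kernel, so a self-contained example can't show them directly. As a rough user-space analogy only, here is the three-step pattern above done by hand in C: a wrapper records the function name on entry, evaluates the arguments, and records the return value. The names nfs_like_lookup() and traced_lookup() are hypothetical; this is neither SystemTap script syntax nor the kprobes API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical function we want to observe; stands in for a kernel routine. */
int nfs_like_lookup(int dir_id, const char *name)
{
    return dir_id * 100 + (int)strlen(name);
}

/* Hand-written "probe" around nfs_like_lookup(), writing its trace into out.
 *   step 0: trap the entry and record the function name
 *   step 1: evaluate (format) the argument values
 *   step 2: trap the return and record the result */
int traced_lookup(int dir_id, const char *name, char *out, size_t outlen)
{
    assert(out != NULL && outlen > 0);

    /* Steps 0 and 1: entry record with the function name and arguments. */
    char entry[128];
    snprintf(entry, sizeof entry, "enter nfs_like_lookup\n  dir_id=%d name=%s\n",
             dir_id, name);

    int ret = nfs_like_lookup(dir_id, name);

    /* Step 2: return record, appended after the entry record. */
    snprintf(out, outlen, "%sreturn nfs_like_lookup = %d\n", entry, ret);
    return ret;
}
```

Even this toy takes about a dozen lines to observe a single call, which is the trade-off described next.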

While it's not invasive, you'll write more lines of code in this
script than you would by just sprinkling printks into the kernel
functions you're interested in. It makes sense to use SystemTap to
trace code you're not directly working on, but for your own module
it's just more efficient to write printk("(%s):(%d):INFO...\n",
__FUNCTION__, __LINE__);
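For comparison, the same one-line habit carries over to user-space C. The sketch below is a hypothetical TRACE() macro (not a kernel or libc API) that formats the same "(function):(line):INFO" prefix as the printk above, but into a buffer so the result can be checked. It uses C99's __func__; __FUNCTION__ is the older GCC spelling of the same thing. In the kernel you would keep using printk, typically with a log level such as KERN_INFO.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical trace macro mirroring printk("(%s):(%d):INFO...\n", ...).
 * __func__ expands to the enclosing function's name and __LINE__ to the
 * line of the invocation; the text goes into buf instead of a log. */
#define TRACE(buf, len, msg) \
    snprintf((buf), (len), "(%s):(%d):INFO %s\n", __func__, __LINE__, (msg))

/* Example function that leaves a trace of where it was entered. */
int do_work(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    TRACE(buf, len, "entered");
    return __LINE__ - 1; /* line TRACE() was invoked on, returned for checking */
}
```

Because the compiler fills in __func__ and __LINE__, the same macro can be pasted into any function without edits, which is what makes it cheaper than a probe script for code you own.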

This also gets a little easier if you're running something like
RHEL 5.0 or Fedora: diskdump and crash are configured for you. It just
so happens to be based on 2.6.16, so all your debugging symbols are
built for use with crash. If you haven't picked up a copy of Linux
Device Drivers, 3rd Edition, well, get it. You can also read it online
and download it as a PDF for free from O'Reilly's website.

When it comes to hacking the Linux kernel, understanding the code is
more important than any tool.

Peter

http://people.redhat.com/anderson/.crash_whitepaper/index.html
http://fedoraproject.org/wiki/SystemTap
http://sourceware.org/systemtap/kprobes/

> Thanks,
>
> -Vinay
>
> --
> Kristian Hermansen
>

--
www.alphalinux.org
del.icio.us/peter.petrakis

-- 
Kristian Erik Hermansen
"Know something about everything and everything about something."