Linux kernel debugging

Setting up a VM

Rebooting a physical machine to load a new kernel is a waste of time, so it's better to use a VM. I found it easiest to install Arch Linux inside a VirtualBox VM.

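Once the initial setup is done, you'll want to install the base-devel group, which includes (among other useful things) gcc, make, and sudo. It's helpful to set up a user that can sudo: add it to the wheel group, set a password, and then use visudo to uncomment the %wheel line in the sudoers file: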

useradd -m -G wheel u
passwd u
visudo

Then you can turn on port forwarding for SSH, run the VM headless, and connect from the host with ssh -p 2222 u@localhost. Inside the <Network><Adapter> element of the VM's .vbox settings file, add

<NAT>
  <Forwarding name="SSH" proto="1" hostport="2222" guestport="22"/>
</NAT>

If your terminal is wonky (e.g. backspace doesn't render correctly),

export TERM=xterm

ought to fix that right up.

Snapshots

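The VBoxManage snapshot command is very handy for restoring the VM to a previous working state. For example, take a snapshot (VBoxManage snapshot <vm> take <name>) before rebooting into a new kernel, so that you can go back to it with the restore subcommand if the new kernel won't boot.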

Building the kernel

We'll build the Linux kernel the same way that the one in the linux package is built. For this, we need the PKGBUILD file, associated patches, and other files. The simplest way to obtain them is with the asp utility:

asp export core/linux

This will create a linux directory.

Running

makepkg -so

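in this directory (-s installs missing build dependencies, -o stops after downloading, extracting, and preparing the sources) will probably fail the first time, because the signatures can't be verified. Verify and import the signing keys using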

gpg --recv-keys A1B2... C3D4...

and rerun makepkg -so, which will prepare the sources in the src/linux-... directory.

If you've ever built the kernel from source, you'll know that configuring it is half the fun. Thus, we're going to skip as much manual configuration as possible. The other half of the fun is waiting for the kernel to build, so we're going to use a minimal configuration to keep the build time down. To do this, first modprobe all the modules you'll want to use with the new kernel. Then, from the kernel source directory,

make localmodconfig

will disable any modules in the configuration that are not currently loaded. I've found that localmodconfig doesn't always work perfectly, but when it does work, it's extremely helpful. It may be tempting to run make olddefconfig first to bring the .config file up to date, but in my experience this reliably breaks localmodconfig. If you find yourself needing to tweak some kernel settings,

make menuconfig

is your friend.

Then build and install the new kernel with

makepkg -efi

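Here -e reuses the existing src directory instead of extracting it again (so your edits and .config survive), -f overwrites a previously built package, and -i installs the result. The first build will take a while. Run the command again each time you make a change, and then reboot the VM to boot from the updated kernel. Subsequent builds should take only a fraction of the time.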

Starting from userspace

To talk to the kernel, you'll need an entry point from userspace. You can use strace (from the strace package) to see which system calls a program makes and which of them fail.

For example,

strace ls /test/path

might result in

...
open("/test/path", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = -1 EACCES (Permission denied)
...

Then you could write a wrapper around this failing call to try to understand the problem better:

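#define _GNU_SOURCE /* exposes O_DIRECTORY and O_CLOEXEC even under strict -std= modes */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s PATH\n", argv[0]);
        return 2;
    }

    /* Same flags as the failing call reported by strace. */
    int result = open(argv[1], O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC);
    int errsv = errno; /* save errno right away; it only means something if result is -1 */

    /* Prints the return value, the saved errno, and its description. */
    printf("%d %d %s\n", result, errsv, strerror(errsv));

    return 0;
}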

Making the kernel speak

The kernel is in a privileged position, making it more challenging to debug than programs in userspace. After all, how do you step forward in the kernel if the kernel itself is paused and can't pass your key presses to the debugger process? There are out-of-band solutions for this, but it's often easier to start by printing useful data.

printk

The bluntest tool in the toolbox is printk. It's a lot like printf, but the format string is prefixed by a log level:

printk(KERN_WARNING "%s %s\n", name, root);

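This will cause the desired message to be written to the kernel log buffer, which you can read using dmesg (dmesg -w keeps following it as new messages arrive). If you're using dmesg with color output, it treats the text before the first colon (:) as a subsystem prefix and colors it differently from the rest of the message.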
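Note that there is no comma between the log level and the format string in the printk call above. The level is itself a string literal, and the C compiler glues adjacent literals together. The following is a userspace sketch of that mechanism: the two macros are copied from the kernel's include/linux/kern_levels.h so that the example compiles outside the kernel, and format_level is a hypothetical helper written for this illustration, not a kernel function. It only handles the numeric levels.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the kernel's definitions, repeated here only so that
 * this sketch builds in userspace. */
#define KERN_SOH     "\001"        /* ASCII Start Of Header */
#define KERN_WARNING KERN_SOH "4"  /* warning conditions */

/* Hypothetical helper: returns the level digit encoded at the start of
 * a format string, or -1 if the string carries no numeric level prefix. */
int format_level(const char *fmt) {
    assert(fmt != NULL);
    if (fmt[0] == '\001' && fmt[1] >= '0' && fmt[1] <= '7')
        return fmt[1] - '0';
    return -1;
}
```

Calling format_level(KERN_WARNING "%s %s\n") yields 4, because after literal concatenation the format string starts with the byte 0x01 followed by the character '4'. A string without a prefix yields -1.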

dump_stack

Often it's not clear where a piece of kernel code is being called from. In that case, calling

dump_stack();

will output an entire stack trace, like

dump_stack+0x63/0x83
ovl_dir_open+0x37/0x120 [overlay]
do_dentry_open+0x205/0x2e0
? ovl_dir_fsync+0x140/0x140 [overlay]
vfs_open+0x4c/0x70
path_openat+0x282/0x1170
? unlock_page_memcg+0x29/0x60
? page_add_file_rmap+0x5b/0x140
? filemap_map_pages+0x233/0x410
do_filp_open+0x91/0x100
? __alloc_fd+0xc9/0x180
do_sys_open+0x147/0x210
SyS_open+0x1e/0x20
entry_SYSCALL_64_fastpath+0x1a/0xa4

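The text in square brackets indicates the module containing the function, and the two numbers after each symbol are the offset into the function and the function's total size. According to a Stack Overflow answer, the question marks indicate that those entries are unreliable: the address was found on the stack, but the unwinder could not confirm that it belongs to the current call chain, so it may be left over from an earlier call.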
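When a trace gets long, it can help to pull the fields out of each line mechanically, for instance to list only the reliable entries. The following is a userspace sketch, assuming lines shaped like the ones above (an optional "? ", then symbol+offset/size, then an optional module in square brackets). The struct trace_entry type and the parse_trace_line function are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct trace_entry {
    int reliable;          /* 0 if the line started with "? " */
    char symbol[64];       /* function name */
    unsigned long offset;  /* offset of the address into the function */
    unsigned long size;    /* total size of the function */
    char module[32];       /* empty string if no [module] was printed */
};

/* Returns 0 on success, -1 if the line does not have the expected shape. */
int parse_trace_line(const char *line, struct trace_entry *e) {
    assert(line != NULL && e != NULL);
    memset(e, 0, sizeof *e);
    e->reliable = 1;

    /* Skip leading indentation. */
    while (*line == ' ' || *line == '\t')
        line++;

    /* A leading "? " marks an entry the unwinder could not verify. */
    if (line[0] == '?' && line[1] == ' ') {
        e->reliable = 0;
        line += 2;
    }

    /* %lx accepts the 0x prefix; the [module] part is optional. */
    int n = sscanf(line, "%63[^+]+%lx/%lx [%31[^]]]",
                   e->symbol, &e->offset, &e->size, e->module);
    return n >= 3 ? 0 : -1;
}
```

For the line "ovl_dir_open+0x37/0x120 [overlay]" this fills in the symbol ovl_dir_open, offset 0x37, size 0x120, and module overlay, with reliable set to 1. For "? __alloc_fd+0xc9/0x180" the module stays empty and reliable is 0.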

Finding symbols

To find where a symbol is defined, the Elixir cross-referencer from Bootlin (formerly Free Electrons) is a blessing. For example, we can see that in Linux 4.14.11, ovl_dir_fsync is a function defined in /fs/overlayfs/readdir.c.

Memory allocation

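Sometimes you'll need to allocate a scratch buffer, for example to call dentry_path_raw, which writes the path into a buffer supplied by the caller. A whole page can be allocated using __get_free_page. You should then free the page using free_page so you don't leak memory. Note that dentry_path_raw returns a pointer somewhere inside the buffer rather than to its start, so print the returned pointer but free the original one. For example,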

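char *buf = (char *)__get_free_page(GFP_KERNEL); /* usual flag when sleeping is allowed */
if (buf) { /* the allocation can fail */
    /* The path is built from the end of buf; p points into buf, or is an ERR_PTR. */
    char *p = dentry_path_raw(filp->f_path.dentry, buf, PAGE_SIZE);
    if (!IS_ERR(p))
        printk(KERN_WARNING "path: %s\n", p);
    free_page((unsigned long)buf); /* free the page itself, not p */
}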
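The reason the pointer returned by dentry_path_raw differs from the buffer you passed in is that the path is assembled back to front: the kernel starts at the leaf dentry and walks up towards the root, prepending one name at a time. The following is a userspace imitation of that buffer convention, not the kernel's code. The function name build_path_backwards is hypothetical, and it reports a too-small buffer with NULL, whereas the kernel function returns an error pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Takes the path components leaf first, writes the path at the END of
 * buf, and returns a pointer to where the string starts inside buf.
 * Returns NULL if the buffer is too small. */
char *build_path_backwards(const char *const *leaf_to_root, size_t n,
                           char *buf, size_t buflen) {
    assert(leaf_to_root != NULL && buf != NULL);
    if (buflen < 1)
        return NULL;

    char *p = buf + buflen;
    *--p = '\0';  /* the terminator goes in the very last byte */

    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(leaf_to_root[i]);
        /* Need room for the name plus its leading slash. */
        if ((size_t)(p - buf) < len + 1)
            return NULL;
        p -= len;
        memcpy(p, leaf_to_root[i], len);
        *--p = '/';
    }
    return p;
}
```

With the components "path" and "test" (leaf first) and a 32-byte buffer, the function returns a pointer 21 bytes into the buffer, where the string "/test/path" begins. The bytes before that point are never written, which is why the start of the buffer is not a usable string and why it is the original buffer, not the returned pointer, that has to be freed.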