March 3, 2023

Installing kernel modules faster with multithread XZ

As a kernel developer, everyday I need to compile and install custom kernels, and any improvement in this workflow means to be more productive. While installing my fresh compiled modules, I noticed that it would be stuck in amdgpu compression for some time:

XZ      /usr/lib/modules/6.2.0-tonyk/kernel/drivers/gpu/drm/amd/amdgpu/amdgpu.ko.xz

XZ format

My target machine is the Steam Deck, that uses .xz for compressing the modules. Giving that we want gamers to be able to install as many games as possible, the OS shouldn’t waste much disk space. amdgpu, when compiled with debug symbols can use a good hunk of space. Here’s the comparison of disk size of the module uncompressed, and then with .zst and .xz compression:

360M amdgpu.ko
61M  amdgpu.ko.zst
38M  amdgpu.ko.xz

This more compact module comes with a cost: more CPU time for compression.

Multithread compression

When I opened htop, I saw that only a lonely thread was doing the hard work to compress amdgpu, even that compression is a task easily parallelizable. I then hacked scripts/Makefile.modinst so XZ would use as many threads as possible, with the option -T0. In my main build machine, modules_install was running 4 times faster!

# before the patch
$ time make modules_install -j16
Executed in  100.08 secs

# after the patch
$ time make  modules_install -j16
Executed in   28.60 secs

Then, I submitted a patch to make this default for everyone: [PATCH] kbuild: modinst: Enable multithread xz compression

However, as Masahiro Yamada noticed, we shouldn’t be spawning numerous threads in the build system without the user request. Until today we specify manually how many threads we should run with make -jX.

Hopefully, Nathan Chancellor suggested that the same results can be achieved using XZ_OPT=-T0, so we still can benefit from this without the patch. I experimented with different -TX and -jY values, but in my notebook the most efficient values were X = Y = nproc. You can check some results bellow:

$ make modules_install
174.83 secs

$ make modules_install -j8
100.55 secs

$ make modules_install XZ_OPT=-T0
81.51 secs

$ make modules_install -j8 XZ_OPT=-T0
53.22 sec

© André Almeida 2022
Licensed as CC BY 4.0

Powered by Hugo & Kiss.