Linux Kernel 6.6 Hardening

After building a number of Linux kernels for embedded systems that emphasize a layered security approach, I have decided to share my knowledge and recommendations on a hardened build configuration with the world. My intent is to draw attention to important build options that you should consider for use in your own projects.

Not all options are available on all architectures. I will address the ones I’ve seen while configuring the kernel for the popular x86_64 and arm64 architectures. I’ll also make an effort to keep each entry in this post short and to the point because there are quite a few security-relevant options.

Finally, I want to remind the reader that a good security posture involves considering basic security principles in each layer of your solution. It should go without saying that you should disable kernel features you don’t need to reduce the attack surface. Further, a secure kernel won’t do you any good if kernel compromise isn’t an important part of your threat model. For example, a hardened kernel won’t protect your web application database from injection attacks, nor will it make much difference if your web app is running as uncontained, unconstrained root. Resisting a local privilege escalation as part of an exploit chain? That, a hardened kernel can do.

I hope that you find this listing useful!

CONFIG_LOCALVERSION

Please, please label your kernel. Include your product name. With so many configuration options, and with code likely merged in from a board support package (BSP), there is no such thing as a “standard” kernel. Please add an identifier to set the context for future developers, even if you don’t enable access to /proc/config.gz. It costs nothing.
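To confirm that the label made it into a running kernel, a short sketch like the one below can split the release string reported by uname into its numeric base and whatever follows it. The helper name split_release and the example suffix -acme-gateway are hypothetical. The reported suffix also contains any EXTRAVERSION text such as -rc1, not only the value of CONFIG_LOCALVERSION.

```python
import platform


def split_release(release: str) -> tuple:
    """Split a kernel release string into (numeric base, trailing label)."""
    # The base is the leading run of digits and dots, for example "6.6.8".
    end = 0
    while end < len(release) and (release[end].isdigit() or release[end] == "."):
        end += 1
    return release[:end], release[end:]


if __name__ == "__main__":
    base, label = split_release(platform.release())
    print(f"base version: {base}")
    print(f"label: {label or '(none, this kernel is unlabeled)'}")
```

A kernel built with a local version of -acme-gateway on top of 6.6.8 would report the release string 6.6.8-acme-gateway, which the helper splits into the two parts.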

CONFIG_CROSS_MEMORY_ATTACH

Disable this feature unless you plan to run a debugger on the system. It allows privileged processes to directly access the memory assigned to another process without its permission. This is useful to attackers for a number of reasons, such as code injection, reflective loading, or more generally manipulating the behavior of a process without leaving evidence behind on the filesystem.

Note that this is only one means of attaching to a process. Ptrace, for example, would still be available. This configuration option only controls the presence of process_vm_readv and friends.

CONFIG_BPF_SYSCALL

Controls whether we should build the BPF feature. BPF programs run directly in the kernel and are roughly equivalent to kernel-mode code injection. Applications should not be allowed to load kernel code unless you have a really good reason because it’s a clear path to potential kernel compromise.

This feature was originally used to accelerate packet filtering applications, but today some applications like web browsers and OpenSSH can use the BPF extension of seccomp to restrict permissible syscalls and reduce the kernel attack surface.

Run cat /proc/*/status > statuses and search the resultant file for the string “Seccomp_filters” to see which processes are using BPF filters for enhanced security. General purpose systems should leave BPF enabled. Otherwise, the feature should be disabled.
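The same search can be scripted. The following is a minimal sketch, assuming a kernel that reports the Seccomp_filters field in /proc/<pid>/status. The helper names status_field and seccomp_filter_count are illustrative, not part of any existing tool.

```python
import glob


def status_field(status_text: str, key: str):
    """Return the value of one field from a /proc/<pid>/status dump, or None."""
    for line in status_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name == key:
            return value.strip()
    return None


def seccomp_filter_count(status_text: str) -> int:
    """Number of seccomp BPF filters attached to the process (0 if not reported)."""
    value = status_field(status_text, "Seccomp_filters")
    return int(value) if value is not None else 0


if __name__ == "__main__":
    for path in sorted(glob.glob("/proc/[0-9]*/status")):
        try:
            with open(path, encoding="utf-8", errors="replace") as handle:
                text = handle.read()
        except OSError:
            continue  # the process exited while we were scanning
        count = seccomp_filter_count(text)
        if count > 0:
            print(f"{status_field(text, 'Name')}: {count} filter(s)")
```

If the script prints nothing on your target, that is a hint (not proof) that nothing on the system depends on seccomp BPF filters.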

CONFIG_BPF_UNPRIV_DEFAULT_OFF

If you enable BPF, then also enable this option to restrict access to privileged processes only. Most daemons will configure BPF and then drop privileges, and therefore won’t be affected by this restriction.
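At runtime this option shows up as the default value of the kernel.unprivileged_bpf_disabled sysctl. The sketch below interprets that value following the kernel's sysctl documentation; the function name and the wording of the descriptions are illustrative.

```python
# Meanings follow the kernel sysctl documentation for unprivileged_bpf_disabled.
UNPRIV_BPF_MEANINGS = {
    0: "unprivileged bpf() calls are allowed",
    1: "unprivileged bpf() calls are disabled and cannot be re-enabled until reboot",
    2: "unprivileged bpf() calls are disabled, but an administrator can change this",
}


def describe_unprivileged_bpf(raw: str) -> str:
    """Translate the sysctl value into a human-readable description."""
    value = int(raw.strip())
    return UNPRIV_BPF_MEANINGS.get(value, f"unrecognized value {value}")


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/unprivileged_bpf_disabled") as handle:
            print(describe_unprivileged_bpf(handle.read()))
    except OSError:
        # The file is absent when the kernel was built without the bpf() syscall.
        print("sysctl not present")
```

With CONFIG_BPF_UNPRIV_DEFAULT_OFF enabled the sysctl starts at 2 rather than 0, so an administrator can still relax it later if a workload turns out to need unprivileged BPF.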

CONFIG_PREEMPT

This option enables a fully preemptible kernel, keeping critical sections to a minimum, at the expense of lower throughput. It provides lower kernel latency because the kernel responds to events faster and more consistently. However, a highly relaxed preempt-me-anytime model may expose timing-dependent kernel bugs. This option should be disabled unless you need the lowest possible latency.

CONFIG_NAMESPACES

Namespacing is an important isolation feature, and incidentally a security feature, used by container software like Flatpak, LXD and Docker. However, if you aren’t using these tools, then there’s no reason to enable namespace support. To check for processes using namespaces, run lsns as root for a complete listing.
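If lsns isn't installed on the target, a rough equivalent can be sketched by comparing the namespace links under /proc/<pid>/ns with those of the current process. This is only an approximation of what lsns reports; the function names are illustrative, and it needs to run as root to see every process.

```python
import os


def read_namespaces(pid: str) -> dict:
    """Map namespace type (mnt, net, pid, ...) to its identifier for one process."""
    ns_dir = f"/proc/{pid}/ns"
    result = {}
    try:
        for name in os.listdir(ns_dir):
            result[name] = os.readlink(os.path.join(ns_dir, name))
    except OSError:
        pass  # the process exited, or we lack permission to inspect it
    return result


def differing_namespaces(reference: dict, candidate: dict) -> list:
    """Namespace types where the candidate is in a different namespace."""
    return sorted(
        name
        for name, ident in candidate.items()
        if name in reference and reference[name] != ident
    )


if __name__ == "__main__" and os.path.isdir("/proc"):
    reference = read_namespaces("self")
    for pid in sorted(filter(str.isdigit, os.listdir("/proc")), key=int):
        changed = differing_namespaces(reference, read_namespaces(pid))
        if changed:
            print(pid, ", ".join(changed))
```

No output means every process you can see shares your namespaces, which suggests that nothing on the system is relying on namespace support.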

CONFIG_KALLSYMS_ALL

This configuration parameter forces all symbols to be available. Unless you really need this for some specialized reason (livepatch, debugging), disable it because there’s a minor security benefit to concealing the locations of internal symbols that aren’t explicitly exported.

CONFIG_IO_URING

Disable this very popular attack surface. It’s rarely used, the apps that use it fall back to normal file IO, and a large share of the kernel exploits submitted to Google’s kernel bug bounty program in 2022 relied on flaws in this interface. Disable to remove this obvious bug farm.

CONFIG_PROFILING

Disable because it’s unlikely that you’re interested in profiling the kernel.

CONFIG_KEXEC

Past implementations of the kexec feature failed to authenticate the new code, which undermines the guarantees that secure boot and friends try to provide. kexec is also subject to hardware initialization problems.

CONFIG_64BIT

It’s strongly recommended that you compile your kernel in 64-bit mode if the target hardware supports this. The wider address space makes ASLR and KASLR much more effective, which raises the bar for attackers who wish to perform code-reuse attacks.

CONFIG_X86_UMIP

The kernel documentation explicitly labels this as an important security feature that disables instructions that “unnecessarily reveal too much information about the hardware state.” It should always be enabled.

CONFIG_X86_KERNEL_IBT

This adds ENDBR instructions that frustrate code reuse attacks in the kernel by forcing all indirect branches to terminate at a pre-approved location on the newest x86 processors. This feature should always be enabled unless you know your CPU doesn’t implement ENDBR instructions or you’re using proprietary binary drivers that are missing the requisite ENDBR landing zones. These would cause a trap at runtime.

CONFIG_X86_INTEL_TSX_MODE_OFF

This configuration should always be set to disable TSX instructions. TSX isn’t used aside from special purpose-built experimental applications, but TSX absolutely provides an information side-channel that can be used to compromise a system.

CONFIG_X86_USER_SHADOW_STACK

Enable this feature to increase the difficulty of exploiting memory corruption in userspace applications. This only works for applications compiled to use this feature in advance, and it’s a new feature on x86 CPUs. I typically disable this feature because it requires userspace to cooperate for the setting to be meaningful, and I don’t know of apps that do this by default.

CONFIG_RELOCATABLE

This is an essential option that must be enabled to have strong KASLR, an important defense against code reuse attacks in the kernel.

CONFIG_RANDOMIZE_BASE

CONFIG_RANDOMIZE_BASE is the actual switch that enables KASLR and is absolutely an essential exploit mitigation. Disable this configuration option only if security is not a priority for your kernel. A randomized kernel base requires attackers to discover the location of the kernel in memory.
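Even with this option compiled in, KASLR can be switched off at boot with the nokaslr kernel parameter. A small sketch that checks the running kernel's command line for it follows; the function name is illustrative, and the absence of nokaslr does not by itself prove that the base was randomized.

```python
def kaslr_disabled_by_cmdline(cmdline: str) -> bool:
    """True if the nokaslr boot parameter appears on the kernel command line."""
    return "nokaslr" in cmdline.split()


if __name__ == "__main__":
    try:
        with open("/proc/cmdline") as handle:
            disabled = kaslr_disabled_by_cmdline(handle.read())
        print("nokaslr present" if disabled else "nokaslr not present")
    except OSError:
        print("could not read /proc/cmdline")
```

On a product build it is worth checking the bootloader configuration for this parameter, since it is sometimes left over from a debugging session.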

CONFIG_RANDOMIZE_MEMORY

For the same reason as above, this feature should be enabled to properly and completely randomize the kernel memory layout to the maximum extent possible.

CONFIG_LIVEPATCH

Livepatch is a feature that allows fixing kernel bugs without restarting the running kernel. Because this feature can be used to modify the running kernel, it should be disabled to maximize kernel integrity, unless of course you really need this feature in your application. I recommend that this is disabled for a security-focused kernel build.

CONFIG_PAGE_TABLE_ISOLATION

When a user process runs, its address space might include parts of the Linux kernel. This isn’t normally a problem because that kernel code cannot be modified by a normal application, but it’s a potential side channel attack and can be used to perform speculation in the kernel, which must be denied in a secure system. Enable this feature.

CONFIG_RETPOLINE

This is the standard mitigation protecting against speculative execution attacks by preventing the CPU from speculating on kernel data in such a way that it might be revealed to userspace via timing attacks. This feature should always be enabled, unless you know that a better mitigation exists for your CPU or that your processor is not affected.

CONFIG_SLS

This is a recommended security feature that guards against straight-line speculation attacks. It isn’t clear at this time from the literature how much kernel developers should be worried about such attacks.

CONFIG_SECCOMP

Enable this flag to provide access to seccomp. As discussed previously, this feature limits the ways that a process can interact with the kernel, reducing the exposure of potential security flaws. Many popular applications make use of this feature. It should be enabled unless you know you aren’t using any seccomp-enabled apps.

CONFIG_STACKPROTECTOR

This is the original mitigation of the classic stack buffer overflow. Decades old, this mitigation still succeeds at preventing simple bugs or at least raising the bar for exploitation for a virtually insignificant runtime cost. Please enable.

CONFIG_STACKPROTECTOR_STRONG

Apply Stack Protector to more cases. This is a good idea and should be enabled for most builds.

CONFIG_MODULES

Preventing kernel modification includes both positive, intended modifications such as loading kernel modules and negative, unintended changes performed in exploitation. For the ultimate security, compile in all modules the system requires and forbid module loading or kernel patching/debugging of any kind. On most practical systems, I usually enable modules.

CONFIG_SLAB_FREELIST_RANDOM

SLAB is a generic name for the kernel memory allocator. This configuration option increases the randomization of the kernel heap, which helps to frustrate kernel heap corruption attacks which rely on predictable memory layout.

CONFIG_SLAB_FREELIST_HARDENED

Enable this option to add mitigations to resist corruption of the memory allocator structures. This low-cost option should always be enabled in security builds.

CONFIG_RANDOM_KMALLOC_CACHES

This option changes the way the cache works in the kernel memory allocator to resist heap spraying techniques common to kernel heap corruption-based exploitation. This option should be enabled.

CONFIG_USERFAULTFD

Disable this option to prevent building the kernel with support for userfaultfd, a rarely-used syscall that is essential to exploiting some kernel race conditions by giving control of faults to a userspace process upon request. QEMU VM live migration is the only use case for this syscall that I know.

CONFIG_SECURITY_DMESG_RESTRICT

Normally, all users can run dmesg to access the system log. Blocking access to the log in the kernel for all but users with admin privileges makes sense, especially because the log can contain kernel pointers or leak information about the kernel’s state. It’s a good idea to enable this feature to limit dmesg to privileged users.

CONFIG_HARDENED_USERCOPY

Some kernel exploits rely on tricking the kernel into accessing userspace data when it should only be accessing kernel data. This mitigation adds extra steps to verify the requested copy operation and is very effective at squashing many types of kernel bugs. It should always be enabled.

CONFIG_FORTIFY_SOURCE

CONFIG_FORTIFY_SOURCE automatically adds length checks to some operations, reducing the opportunities for buffer overflows to occur. It is a classic buffer overflow mitigation and should always be enabled.

CONFIG_SECURITY_YAMA

The name of this configuration option suggests little about its behavior, but right now it limits access to ptrace to child processes. This is very important to prevent code injection and lateral movement into other processes and should always be enabled.
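Building Yama in is only half of the picture, because its behavior is selected at runtime through the kernel.yama.ptrace_scope sysctl. The sketch below interprets the four documented levels. The function name and description strings are illustrative; the meanings follow the kernel's Yama documentation.

```python
# Levels as described in the kernel's Yama documentation.
PTRACE_SCOPE_MEANINGS = {
    0: "classic ptrace permissions (any process with the same uid can attach)",
    1: "restricted: a process may only attach to its descendants",
    2: "admin-only: attaching requires CAP_SYS_PTRACE",
    3: "no attach: ptrace attach is disabled entirely",
}


def describe_ptrace_scope(raw: str) -> str:
    """Translate the ptrace_scope sysctl value into a description."""
    value = int(raw.strip())
    return PTRACE_SCOPE_MEANINGS.get(value, f"unrecognized value {value}")


if __name__ == "__main__":
    try:
        with open("/proc/sys/kernel/yama/ptrace_scope") as handle:
            print(describe_ptrace_scope(handle.read()))
    except OSError:
        print("Yama is not available on this kernel")
```

The child-process restriction described above corresponds to level 1. A system that never needs a debugger could go further and set level 2 or 3.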

CONFIG_INIT_ON_ALLOC_DEFAULT_ON

Some kernel bugs rely on using memory before it has been initialized, leaving those values to be potentially anything and possibly attacker controlled. This option forces all allocations to be zeroized before use. Because memory is usually written to after allocation, adding an extra write doesn’t impose much yet prevents some types of attacks on the kernel. Enable this feature.

CONFIG_INIT_ON_FREE_DEFAULT_ON

In addition to zeroing allocated memory, memory can be zeroized when it’s freed in the kernel. This incurs significant additional cost because, unlike with CONFIG_INIT_ON_ALLOC_DEFAULT_ON, the memory is not about to be accessed anyway, so there is no upcoming write to hide the extra one behind. It’s more secure to enable this feature, but I generally choose to disable it because of the performance penalty.
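Both options only set a default. The init_on_alloc= and init_on_free= boot parameters can override them, so the effective setting depends on the kernel command line as well. The following sketch works out the effective value under a simplifying assumption: it only recognizes the values 0 and 1, while the kernel accepts a few more spellings. The function name and the defaults used in the demo are hypothetical.

```python
def effective_init_flag(cmdline: str, param: str, config_default: bool) -> bool:
    """Combine the compiled-in default with any override on the command line."""
    enabled = config_default
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key == param and value in ("0", "1"):
            enabled = value == "1"  # a later occurrence overrides an earlier one
    return enabled


if __name__ == "__main__":
    # Assumed build defaults mirroring the choices above: alloc on, free off.
    defaults = {"init_on_alloc": True, "init_on_free": False}
    try:
        with open("/proc/cmdline") as handle:
            cmdline = handle.read()
    except OSError:
        cmdline = ""
    for param, default in defaults.items():
        print(param, "=", int(effective_init_flag(cmdline, param, default)))
```

This also means init_on_free can be left off in the build and turned on per deployment with init_on_free=1 where the performance cost is acceptable.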

CONFIG_ZERO_CALL_USED_REGS

After exploiting a kernel bug, in the presence of the default NX protection on supported hardware, one must usually use ROP or JOP to build a weird machine to perform malicious tasks. This configuration option should be enabled to eliminate many useful gadgets in building such a weird machine. It works by zeroing any registers used as temporaries in a function before it exits. It also prevents some types of information leaks.

CONFIG_LIST_HARDENED

Enable this feature and the associated CONFIG_BUG_ON_DATA_CORRUPTION option to cause the kernel to stop when some kinds of linked list corruption are detected. This reduces viable attacks on the kernel.

CONFIG_FTRACE

ftrace presents yet another opportunity to change the behavior or timing characteristics of the kernel. To reduce exposure to potential bugs, it should be disabled unless you know that you need this feature.

CONFIG_STRICT_DEVMEM

This option checks access to the /dev/mem device to protect it from accidental or intentional access to data that isn’t a hardware device. This protects against bugs and possibly their exploitation. Consider the companion option CONFIG_IO_STRICT_DEVMEM to further restrict access to memory-mapped devices.

CONFIG_ARM64_SW_TTBR0_PAN

PAN (Privileged Access Never) is similar to CONFIG_HARDENED_USERCOPY in that it reduces the opportunity for the kernel to access low-privileged data unintentionally and represents an additional obstacle to exploit developers. This feature should be enabled.

CONFIG_DEFAULT_MMAP_MIN_ADDR

Some errant kernel pointers point to low addresses, such as when a null pointer was manipulated via pointer arithmetic. This feature allows low memory to be blocked off from allocation so such errant pointers can be detected when access is attempted, before attacker-controlled data can be loaded there. It should be set to 65536 (64 KiB) on almost all platforms.
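The value chosen here becomes the default of the vm.mmap_min_addr sysctl, which can be checked on a running system. The sketch below flags values that fall under a 64 KiB floor or that are not page-aligned. The floor of 65536 is an assumption taken from the figure the kernel's own help text calls reasonable for x86, and the function name is illustrative.

```python
import mmap


def check_mmap_min_addr(value: int, page_size: int = mmap.PAGESIZE,
                        floor: int = 65536) -> list:
    """Return a list of human-readable concerns about an mmap_min_addr value."""
    concerns = []
    if value < floor:
        concerns.append(f"{value} is below the assumed floor of {floor}")
    if value % page_size != 0:
        concerns.append(f"{value} is not a multiple of the page size {page_size}")
    return concerns


if __name__ == "__main__":
    try:
        with open("/proc/sys/vm/mmap_min_addr") as handle:
            current = int(handle.read().strip())
        for concern in check_mmap_min_addr(current) or ["no concerns"]:
            print(concern)
    except OSError:
        print("could not read vm.mmap_min_addr")
```

Some 32-bit platforms need a lower value than this, so treat the floor as a parameter to adjust rather than a rule.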

CONFIG_DEBUG_FS

debugfs is the wild west of exported driver information. It should be disabled because this feature was instrumental to many exploits on Android, and the contents aren’t strictly policed for security.
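To put the list to work, the recommendations can be checked against a kernel .config file. The following is a minimal sketch with a deliberately partial table of x86_64 options taken from the entries above. The script name audit_config.py, the table, and the function names are all illustrative; extend or trim the table to match your own decisions.

```python
import sys

# Partial table from this post: "y" means enable, "n" means disable.
WANTED = {
    "CONFIG_RANDOMIZE_BASE": "y",
    "CONFIG_RANDOMIZE_MEMORY": "y",
    "CONFIG_STACKPROTECTOR_STRONG": "y",
    "CONFIG_SLAB_FREELIST_HARDENED": "y",
    "CONFIG_HARDENED_USERCOPY": "y",
    "CONFIG_FORTIFY_SOURCE": "y",
    "CONFIG_INIT_ON_ALLOC_DEFAULT_ON": "y",
    "CONFIG_IO_URING": "n",
    "CONFIG_USERFAULTFD": "n",
    "CONFIG_KEXEC": "n",
    "CONFIG_DEBUG_FS": "n",
}


def parse_config(text: str) -> dict:
    """Parse .config text into {symbol: value}; unset symbols map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[len("# "):-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, _, value = line.partition("=")
            options[name] = value
    return options


def audit(options: dict, wanted: dict = WANTED) -> list:
    """Return (symbol, expected, actual) for every mismatch."""
    findings = []
    for name, expected in wanted.items():
        actual = options.get(name, "n")  # symbols missing from the file count as off
        if actual != expected:
            findings.append((name, expected, actual))
    return findings


if __name__ == "__main__":
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as handle:
            results = audit(parse_config(handle.read()))
        for name, expected, actual in results:
            print(f"{name}: want {expected}, found {actual}")
        print(f"{len(results)} finding(s)")
    else:
        print("usage: audit_config.py /path/to/.config")
```

Options that don't exist on your architecture show up as findings, so read the output as a list of things to review rather than a pass/fail verdict.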
