Linux kernel source tree
Alexander Graf 07d2490297 kexec: enable CMA based contiguous allocation
When booting a new kernel with kexec_file, the kernel picks a target
location that the new kernel should live at, then allocates random pages,
checks whether any of those pages magically happen to coincide with the
target address range and, if so, uses them for that range.

For every page allocated this way, it then creates a page list that the
relocation code - code that executes while all CPUs are off and we are
just about to jump into the new kernel - copies to its final memory
location.  We cannot put the pages there earlier, because chances are
pretty good that at least some page in the target range is already in use
by the currently running Linux environment.  The copy runs on a single
CPU at RAM rate, which takes around 4-50 ms per 100 MiB.
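
As a rough illustration of the copy cost quoted above, the sketch below
scales the 4-50 ms per 100 MiB figure to a payload size.  The helper name
copy_time_ms() and the 400 MiB payload in the usage note are assumptions
made for this example; they do not come from the patch.

```c
#include <assert.h>

#define MIB (1024.0 * 1024.0)

/*
 * Hypothetical helper: estimated hot-phase copy time in milliseconds for
 * `bytes` of payload, given a copy rate expressed in ms per 100 MiB.
 */
static double copy_time_ms(double bytes, double ms_per_100mib)
{
	assert(bytes >= 0.0 && ms_per_100mib >= 0.0);
	return bytes / (100.0 * MIB) * ms_per_100mib;
}
```

Under these assumptions, a 400 MiB payload would spend roughly
copy_time_ms(400 * MIB, 4.0) = 16 ms to copy_time_ms(400 * MIB, 50.0) =
200 ms in the relocation copy.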

All of this is inefficient and error prone.

To successfully kexec, we need to quiesce all devices of the outgoing
kernel so they don't scribble over the new kernel's memory.  We have seen
cases where that does not happen properly (*cough* GIC *cough*) and hence
the new kernel was corrupted.  This started a month-long journey to
root-cause failing kexecs before we eventually saw the memory corruption,
because the new kernel was corrupted severely enough that it could not
emit output to tell us about the fact that it was corrupted.  By
allocating memory for the next kernel from a memory range that is
guaranteed to be free of scribbling, we can boot the next kernel up to a
point where it is at least able to detect corruption and maybe even stop
it before it becomes severe.  This increases the chance of successful
kexecs.

Since kexec was introduced, Linux has gained the CMA framework, which can
perform physically contiguous memory allocations while keeping that memory
available for movable memory when it is not needed for contiguous
allocations.  The default CMA allocator is for DMA allocations.

This patch adds logic to the kexec file loader to attempt to place the
target payload at a location allocated from CMA.  If successful, it uses
that memory range directly instead of creating copy instructions for the
hot phase.  To ensure that there is a safety net in case anything goes
wrong with the CMA allocation, it also adds a flag that lets user space
force-disable CMA allocations.
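
The placement decision described above can be modelled in a few lines.
This is only a sketch under stated assumptions, not the patch's code: the
flag SKETCH_NO_CMA, the pool structure standing in for a CMA area, and
place_segment() are all hypothetical names invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this sketch; not taken from kernel headers. */
#define SKETCH_NO_CMA 0x1u

enum placement { PLACED_CONTIGUOUS, PLACED_COPY_LIST };

struct segment {
	size_t len;		/* payload size in bytes */
	enum placement how;	/* where the payload ended up */
	size_t copy_bytes;	/* bytes left to copy in the hot phase */
};

/* Stand-in for a contiguous allocator: just tracks remaining space. */
struct pool {
	size_t free_bytes;
};

static bool pool_alloc(struct pool *p, size_t len)
{
	if (len == 0 || len > p->free_bytes)
		return false;
	p->free_bytes -= len;
	return true;
}

/*
 * Try the contiguous pool first unless user space opted out; otherwise
 * fall back to the copy list, which defers the copy to the hot phase.
 */
static void place_segment(struct segment *seg, struct pool *cma,
			  unsigned int flags)
{
	assert(seg != NULL && cma != NULL);

	if (!(flags & SKETCH_NO_CMA) && pool_alloc(cma, seg->len)) {
		seg->how = PLACED_CONTIGUOUS;
		seg->copy_bytes = 0;	/* already at its final location */
		return;
	}

	seg->how = PLACED_COPY_LIST;
	seg->copy_bytes = seg->len;	/* relocation code must copy it */
}
```

In this model a segment that fits the pool needs no hot-phase copy, while
a failed allocation or the opt-out flag silently falls back to the
existing copy path, so the load still succeeds either way.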

Using CMA allocations has two advantages:

  1) Faster by 4-50 ms per 100 MiB. There is no longer any need to copy
     in the hot phase.
  2) More robust. Even if some page is accidentally still in use for DMA,
     the new kernel image is safe from that access because it resides in
     a memory region that the old kernel considers allocated, so the new
     kernel has a chance to reinitialize that component.

Link: https://lkml.kernel.org/r/20250610085327.51817-1-graf@amazon.com
Signed-off-by: Alexander Graf <graf@amazon.com>
Acked-by: Baoquan He <bhe@redhat.com>
Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
Cc: Zhongkun He <hezhongkun.hzk@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-08-02 12:01:38 -07:00
Documentation stackdepot: make max number of pools boot-time configurable 2025-08-02 12:01:38 -07:00
LICENSES LICENSES: add CC0-1.0 license text 2025-05-21 14:54:17 +02:00
arch kexec: enable CMA based contiguous allocation 2025-08-02 12:01:38 -07:00
block block-6.16-20250626 2025-06-27 09:02:33 -07:00
certs sign-file,extract-cert: use pkcs11 provider for OPENSSL MAJOR >= 3 2024-09-20 19:52:48 +03:00
crypto lib/raid6: replace custom zero page with ZERO_PAGE 2025-07-09 22:57:54 -07:00
drivers relayfs: abolish prev_padding 2025-07-09 22:57:51 -07:00
fs ocfs2: avoid potential ABBA deadlock by reordering tl_inode lock 2025-07-19 19:08:27 -07:00
include kexec: enable CMA based contiguous allocation 2025-08-02 12:01:38 -07:00
init init/Kconfig: restore CONFIG_BROKEN help text 2025-08-02 12:01:37 -07:00
io_uring io_uring-6.16-20250630 2025-06-30 16:32:43 -07:00
ipc - The 3 patch series "hung_task: extend blocking task stacktrace dump to 2025-05-31 19:12:53 -07:00
kernel kexec: enable CMA based contiguous allocation 2025-08-02 12:01:38 -07:00
lib stackdepot: make max number of pools boot-time configurable 2025-08-02 12:01:38 -07:00
mm vfs-6.16-rc5.fixes 2025-07-04 09:06:49 -07:00
net Including fixes from Bluetooth. 2025-07-03 09:18:55 -07:00
rust Driver core fixes for 6.16-rc3 2025-06-18 14:31:16 -07:00
samples samples: enhance hung_task detector test with read-write semaphore support 2025-07-19 19:08:26 -07:00
scripts coccinelle: misc: secs_to_jiffies: implement context and report modes 2025-07-19 19:08:25 -07:00
security selinux: change security_compute_sid to return the ssid or tsid on match 2025-06-19 16:13:16 -04:00
sound ALSA: hda/realtek: Fix built-in mic on ASUS VivoBook X507UAR 2025-06-26 08:02:44 +02:00
tools delaytop: add psi info to show system delay 2025-07-19 19:08:29 -07:00
usr usr/include: openrisc: don't HDRTEST bpf_perf_event.h 2025-05-12 15:03:17 +09:00
virt Merge branch 'kvm-lockdep-common' into HEAD 2025-05-28 06:29:17 -04:00
.clang-format Linux 6.15-rc5 2025-05-06 16:39:25 +10:00
.clippy.toml rust: clean Rust 1.88.0's warning about `clippy::disallowed_macros` configuration 2025-05-07 00:11:47 +02:00
.cocciconfig
.editorconfig .editorconfig: remove trim_trailing_whitespace option 2024-06-13 16:47:52 +02:00
.get_maintainer.ignore MAINTAINERS: Retire Ralf Baechle 2024-11-12 15:48:59 +01:00
.gitattributes .gitattributes: set diff driver for Rust source code files 2023-05-31 17:48:25 +02:00
.gitignore .gitignore: ignore Python compiled bytecode 2025-04-24 10:12:46 -06:00
.mailmap mailmap: update Sachin Mokashi's email address 2025-07-09 22:57:56 -07:00
.pylintrc docs: add a .pylintrc file with sys path for docs scripts 2025-04-09 12:10:33 -06:00
.rustfmt.toml
COPYING
CREDITS CREDITS: Add entry for Shannon Nelson 2025-06-21 07:34:28 -07:00
Kbuild drm: ensure drm headers are self-contained and pass kernel-doc 2025-02-12 10:44:43 +02:00
Kconfig io_uring: Rename KConfig to Kconfig 2025-02-19 14:53:27 -07:00
MAINTAINERS MAINTAINERS: add lib/raid6/ to "SOFTWARE RAID" 2025-07-09 22:57:55 -07:00
Makefile Linux 6.16-rc5 2025-07-06 14:10:26 -07:00
README README: Fix spelling 2024-03-18 03:36:32 -06:00

README

Linux kernel
============

There are several guides for kernel developers and users. These guides can
be rendered in a number of formats, like HTML and PDF. Please read
Documentation/admin-guide/README.rst first.

In order to build the documentation, use ``make htmldocs`` or
``make pdfdocs``.  The formatted documentation can also be read online at:

    https://www.kernel.org/doc/html/latest/

There are various text files in the Documentation/ subdirectory,
several of them using the reStructuredText markup notation.

Please read the Documentation/process/changes.rst file, as it contains the
requirements for building and running the kernel, and information about
the problems which may result from upgrading your kernel.