mirror of https://github.com/torvalds/linux.git
Patch series "mm/khugepaged: fix dirty page handling for MADV_COLLAPSE",
v5.
MADV_COLLAPSE on file-backed mappings fails with -EINVAL when TEXT pages
are dirty. This affects scenarios like package/container updates or
executing binaries immediately after writing them, etc.
The issue is that collapse_file() triggers async writeback and returns
SCAN_FAIL (maps to -EINVAL), expecting khugepaged to revisit later. But
MADV_COLLAPSE is synchronous and userspace expects immediate success or
a clear retry signal.
Reproduction:
- Compile or copy 2MB-aligned executable to XFS/ext4 FS
- Call MADV_COLLAPSE on .text section
- First call fails with -EINVAL (text pages dirty from copy)
- Second call succeeds (async writeback completed)
Issue Report:
https://lore.kernel.org/all/4e26fe5e-7374-467c-a333-9dd48f85d7cc@amd.com
This patch (of 2):
When collapse_file encounters dirty or writeback pages in file-backed
mappings, it currently returns SCAN_FAIL which maps to -EINVAL. This is
misleading as EINVAL suggests invalid arguments, whereas dirty/writeback
pages represent transient conditions that may resolve on retry.
Introduce SCAN_PAGE_DIRTY_OR_WRITEBACK to cover both dirty and writeback
states, mapping it to -EAGAIN. For MADV_COLLAPSE, this provides userspace
with a clear signal that retry may succeed after writeback completes. For
khugepaged, this is harmless as it will naturally revisit the range during
periodic scans after async writeback completes.
Link: https://lkml.kernel.org/r/20260118190939.8986-2-shivankg@amd.com
Link: https://lkml.kernel.org/r/20260118190939.8986-4-shivankg@amd.com
Fixes:
|
||
|---|---|---|
| .. | ||
| events | ||
| misc | ||
| stages | ||
| bpf_probe.h | ||
| define_custom_trace.h | ||
| define_trace.h | ||
| perf.h | ||
| syscall.h | ||
| trace_custom_events.h | ||
| trace_events.h | ||