mirror of https://github.com/torvalds/linux.git
1302 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
e2ee2e9b15 |
KVM/arm64 updates for 6.14
* New features:
- Support for non-protected guest in protected mode, achieving near
feature parity with the non-protected mode
- Support for the EL2 timers as part of the ongoing NV support
- Allow control of hardware tracing for nVHE/hVHE
* Improvements, fixes and cleanups:
- Massive cleanup of the debug infrastructure, making it a bit less
awkward and definitely easier to maintain. This should pave the
way for further optimisations
- Complete rewrite of pKVM's fixed-feature infrastructure, aligning
it with the rest of KVM and making the code easier to follow
- Large simplification of pKVM's memory protection infrastructure
- Better handling of RES0/RES1 fields for memory-backed system
registers
- Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
from a pretty nasty timer bug
- Small collection of cleanups and low-impact fixes
-----BEGIN PGP SIGNATURE-----
iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmeYqJcQHHdpbGxAa2Vy
bmVsLm9yZwAKCRC3rHDchMFjNLUhCACxUTMVQXhfW3qbh0UQxPd7XXvjI+Hm7SPS
wDuVTle4jrFVGHxuZqtgWLmx8hD7bqO965qmFgbevKlwsRY33onH2nbH4i4AcwbA
jcdM4yMHZI4+Qmnb4G5ZJ89IwjAhHPZTBOV5KRhyHQ/qtRciHHtOgJde7II9fd68
uIESg4SSSyUzI47YSEHmGVmiBIhdQhq2qust0m6NPFalEGYstPbpluPQ6R1CsDqK
v14TIAW7t0vSPucBeODxhA5gEa2JsvNi+sqA+DF/ELH2ZqpkuR7rofgMGblaXCSD
JXa5xamRB9dI5zi8vatwfOzYlog+/gzmPqMh/9JXpiDGHxJe0vlz
=tQ8F
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull KVM/arm64 updates from Will Deacon:
"New features:
- Support for non-protected guest in protected mode, achieving near
feature parity with the non-protected mode
- Support for the EL2 timers as part of the ongoing NV support
- Allow control of hardware tracing for nVHE/hVHE
Improvements, fixes and cleanups:
- Massive cleanup of the debug infrastructure, making it a bit less
awkward and definitely easier to maintain. This should pave the way
for further optimisations
- Complete rewrite of pKVM's fixed-feature infrastructure, aligning
it with the rest of KVM and making the code easier to follow
- Large simplification of pKVM's memory protection infrastructure
- Better handling of RES0/RES1 fields for memory-backed system
registers
- Add a workaround for Qualcomm's Snapdragon X CPUs, which suffer
from a pretty nasty timer bug
- Small collection of cleanups and low-impact fixes"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (87 commits)
arm64/sysreg: Get rid of TRFCR_ELx SysregFields
KVM: arm64: nv: Fix doc header layout for timers
KVM: arm64: nv: Apply RESx settings to sysreg reset values
KVM: arm64: nv: Always evaluate HCR_EL2 using sanitising accessors
KVM: arm64: Fix selftests after sysreg field name update
coresight: Pass guest TRFCR value to KVM
KVM: arm64: Support trace filtering for guests
KVM: arm64: coresight: Give TRBE enabled state to KVM
coresight: trbe: Remove redundant disable call
arm64/sysreg/tools: Move TRFCR definitions to sysreg
tools: arm64: Update sysreg.h header files
KVM: arm64: Drop pkvm_mem_transition for host/hyp donations
KVM: arm64: Drop pkvm_mem_transition for host/hyp sharing
KVM: arm64: Drop pkvm_mem_transition for FF-A
KVM: arm64: Explicitly handle BRBE traps as UNDEFINED
KVM: arm64: vgic: Use str_enabled_disabled() in vgic_v3_probe()
arm64: kvm: Introduce nvhe stack size constants
KVM: arm64: Fix nVHE stacktrace VA bits mask
KVM: arm64: Fix FEAT_MTE in pKVM
Documentation: Update the behaviour of "kvm-arm.mode"
...
|
|
|
|
9c5968db9e |
The various patchsets are summarized below. Plus of course many
indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec. - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones. - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest. - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code. - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups. - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code. - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c. - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator. - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading. - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated (https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/). Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED). - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled. - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL. - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests. - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size. - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic. - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated. - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated. - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed. - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic. - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy. - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic. - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions. - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed. - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting. - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface. - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior. - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi "introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors." - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram. - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal. - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance. - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation. - "Cleanup for memfd_create()" from Isaac Manjarres does that thing. - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration. - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices. - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZ5a+cwAKCRDdBJ7gKXxA jtoyAP9R58oaOKPJuTizEKKXvh/RpMyD6sYcz/uPpnf+cKTZxQEAqfVznfWlw/Lz uC3KRZYhmd5YrxU4o+qjbzp9XWX/xAE= =Ib2s -----END PGP SIGNATURE----- Merge tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "The various patchsets are summarized below. Plus of course many indivudual patches which are described in their changelogs. - "Allocate and free frozen pages" from Matthew Wilcox reorganizes the page allocator so we end up with the ability to allocate and free zero-refcount pages. So that callers (ie, slab) can avoid a refcount inc & dec - "Support large folios for tmpfs" from Baolin Wang teaches tmpfs to use large folios other than PMD-sized ones - "Fix mm/rodata_test" from Petr Tesarik performs some maintenance and fixes for this small built-in kernel selftest - "mas_anode_descend() related cleanup" from Wei Yang tidies up part of the mapletree code - "mm: fix format issues and param types" from Keren Sun implements a few minor code cleanups - "simplify split calculation" from Wei Yang provides a few fixes and a test for the mapletree code - "mm/vma: make more mmap logic userland testable" from Lorenzo Stoakes continues the work of moving vma-related code into the (relatively) new mm/vma.c - "mm/page_alloc: gfp flags cleanups for alloc_contig_*()" from David Hildenbrand cleans up and rationalizes handling of gfp flags in the page allocator - "readahead: Reintroduce fix for improper RA window sizing" from Jan Kara is a second attempt at fixing a readahead window sizing issue. It should reduce the amount of unnecessary reading - "synchronously scan and reclaim empty user PTE pages" from Qi Zheng addresses an issue where "huge" amounts of pte pagetables are accumulated: https://lore.kernel.org/lkml/cover.1718267194.git.zhengqi.arch@bytedance.com/ Qi's series addresses this windup by synchronously freeing PTE memory within the context of madvise(MADV_DONTNEED) - "selftest/mm: Remove warnings found by adding compiler flags" from Muhammad Usama Anjum fixes some build warnings in the selftests code when optional compiler warnings are enabled - "mm: don't use __GFP_HARDWALL when migrating remote pages" from David Hildenbrand tightens the allocator's observance of __GFP_HARDWALL - "pkeys kselftests improvements" from Kevin Brodsky implements various fixes and cleanups in the MM selftests code, mainly pertaining to the pkeys tests - "mm/damon: add sample modules" from SeongJae Park enhances DAMON to estimate application working set size - "memcg/hugetlb: Rework memcg hugetlb charging" from Joshua Hahn provides some cleanups to memcg's hugetlb charging logic - "mm/swap_cgroup: remove global swap cgroup lock" from Kairui Song removes the global swap cgroup lock. A speedup of 10% for a tmpfs-based kernel build was demonstrated - "zram: split page type read/write handling" from Sergey Senozhatsky has several fixes and cleaups for zram in the area of zram_write_page(). A watchdog softlockup warning was eliminated - "move pagetable_*_dtor() to __tlb_remove_table()" from Kevin Brodsky cleans up the pagetable destructor implementations. A rare use-after-free race is fixed - "mm/debug: introduce and use VM_WARN_ON_VMG()" from Lorenzo Stoakes simplifies and cleans up the debugging code in the VMA merging logic - "Account page tables at all levels" from Kevin Brodsky cleans up and regularizes the pagetable ctor/dtor handling. This results in improvements in accounting accuracy - "mm/damon: replace most damon_callback usages in sysfs with new core functions" from SeongJae Park cleans up and generalizes DAMON's sysfs file interface logic - "mm/damon: enable page level properties based monitoring" from SeongJae Park increases the amount of information which is presented in response to DAMOS actions - "mm/damon: remove DAMON debugfs interface" from SeongJae Park removes DAMON's long-deprecated debugfs interfaces. Thus the migration to sysfs is completed - "mm/hugetlb: Refactor hugetlb allocation resv accounting" from Peter Xu cleans up and generalizes the hugetlb reservation accounting - "mm: alloc_pages_bulk: small API refactor" from Luiz Capitulino removes a never-used feature of the alloc_pages_bulk() interface - "mm/damon: extend DAMOS filters for inclusion" from SeongJae Park extends DAMOS filters to support not only exclusion (rejecting), but also inclusion (allowing) behavior - "Add zpdesc memory descriptor for zswap.zpool" from Alex Shi introduces a new memory descriptor for zswap.zpool that currently overlaps with struct page for now. This is part of the effort to reduce the size of struct page and to enable dynamic allocation of memory descriptors - "mm, swap: rework of swap allocator locks" from Kairui Song redoes and simplifies the swap allocator locking. A speedup of 400% was demonstrated for one workload. As was a 35% reduction for kernel build time with swap-on-zram - "mm: update mips to use do_mmap(), make mmap_region() internal" from Lorenzo Stoakes reworks MIPS's use of mmap_region() so that mmap_region() can be made MM-internal - "mm/mglru: performance optimizations" from Yu Zhao fixes a few MGLRU regressions and otherwise improves MGLRU performance - "Docs/mm/damon: add tuning guide and misc updates" from SeongJae Park updates DAMON documentation - "Cleanup for memfd_create()" from Isaac Manjarres does that thing - "mm: hugetlb+THP folio and migration cleanups" from David Hildenbrand provides various cleanups in the areas of hugetlb folios, THP folios and migration - "Uncached buffered IO" from Jens Axboe implements the new RWF_DONTCACHE flag which provides synchronous dropbehind for pagecache reading and writing. To permite userspace to address issues with massive buildup of useless pagecache when reading/writing fast devices - "selftests/mm: virtual_address_range: Reduce memory" from Thomas Weißschuh fixes and optimizes some of the MM selftests" * tag 'mm-stable-2025-01-26-14-59' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm/compaction: fix UBSAN shift-out-of-bounds warning s390/mm: add missing ctor/dtor on page table upgrade kasan: sw_tags: use str_on_off() helper in kasan_init_sw_tags() tools: add VM_WARN_ON_VMG definition mm/damon/core: use str_high_low() helper in damos_wmark_wait_us() seqlock: add missing parameter documentation for raw_seqcount_try_begin() mm/page-writeback: consolidate wb_thresh bumping logic into __wb_calc_thresh mm/page_alloc: remove the incorrect and misleading comment zram: remove zcomp_stream_put() from write_incompressible_page() mm: separate move/undo parts from migrate_pages_batch() mm/kfence: use str_write_read() helper in get_access_type() selftests/mm/mkdirty: fix memory leak in test_uffdio_copy() kasan: hw_tags: Use str_on_off() helper in kasan_init_hw_tags() selftests/mm: virtual_address_range: avoid reading from VM_IO mappings selftests/mm: vm_util: split up /proc/self/smaps parsing selftests/mm: virtual_address_range: unmap chunks after validation selftests/mm: virtual_address_range: mmap() without PROT_WRITE selftests/memfd/memfd_test: fix possible NULL pointer dereference mm: add FGP_DONTCACHE folio creation flag mm: call filemap_fdatawrite_range_kick() after IOCB_DONTCACHE issue ... |
|
|
|
44d46b76c3 |
mm: add build-time option for hotplug memory default online type
Memory hotplug presently auto-onlines memory into a zone the kernel deems
appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y.
The memhp_default_state boot param enables runtime config, but it's not
possible to do this at build-time.
Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with
CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param.
Selections:
CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE
=> mhp_default_online_type = "offline"
Memory will not be onlined automatically.
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO
=> mhp_default_online_type = "online"
Memory will be onlined automatically in a zone deemed.
appropriate by the kernel.
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL
=> mhp_default_online_type = "online_kernel"
Memory will be onlined automatically.
The zone may allow kernel data (e.g. ZONE_NORMAL).
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE
=> mhp_default_online_type = "online_movable"
Memory will be onlined automatically.
The zone will be ZONE_MOVABLE.
Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing
default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior.
Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use
CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO.
[gourry@gourry.net: update KConfig comments]
Link: https://lkml.kernel.org/r/20241226182918.648799-1-gourry@gourry.net
Link: https://lkml.kernel.org/r/20241220210709.300066-1-gourry@gourry.net
Signed-off-by: Gregory Price <gourry@gourry.net>
Acked-by: David Hildenbrand <david@redhat.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: WANG Xuerui <kernel@xen0n.name>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
|
|
647d69605c |
pci-v6.14-changes
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmeTr8wUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxrMw//TJXH+U6x5LhYvBPD/KZ20ecGHqaA
eGXrbHAasYbU1CfW7HM0onR8NffOIGoYvQrtefjQAln0w6rTvyFO0xJKLP15vMfN
hnj+y1WWtKwAkSpu10Cl9nTj8uYRNNSQeoy5kS+1diwuXdby/DlgQONO2APSe9zd
KMPXJcqSfDJlM5zHrcqqtlxauO9KHInLCc/iutd85AKjvcjOoNHNeZE0pTC0C3gE
sXYHDqJiS3zdEG6X6mWFo3OzI/Q/7NGlHJ2j0CQaObsgQ9yA7eWkez25ifwZcugc
TPtjm8DhaDo9/zx0NV9c2dPauHRC6NYUjAflMPK7Aye/41BE1Ag5Ka+tMDgC2i/N
TbfBxSeArhjnjY+eZwRhrJNNC58TtHTUs69TO7Dbmuwr7cp99MIEDAYI5V6LFAdk
plKqn1h8FztW5QKRPCgmzy6KTE+WPytiGAGAQFxzIGYkV/QqyvFaVs8FIyOJUIFM
aDSa6Xy5WLGxmPZ9hPapzEm4ws/HTRpFjNgi/d4rRG5RWMwAxZZa44s9eldhN1D/
ZwEmF2rJ+U8S7Q+mXPHDlwcsHe5APCbiaTEp4X+e3LNe0i9oxhhaUWG6LrDDmTlQ
tU5j5daHiBa0nTDL1lfaayJlYX/oJ+IYQrIYzGbnivZv4ZVdPnuWSsOsMOEiLhEt
4QqCoanqf0mCn2A=
=O62O
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Batch sizing of multiple BARs while memory decoding is disabled
instead of disabling/enabling decoding for each BAR individually;
this optimizes virtualized environments where toggling decoding
enable is expensive (Alex Williamson)
- Add host bridge .enable_device() and .disable_device() hooks for
bridges that need to configure things like Requester ID to StreamID
mapping when enabling devices (Frank Li)
- Extend struct pci_ecam_ops with .enable_device() and
.disable_device() hooks so drivers that use pci_host_common_probe()
instead of their own .probe() have a way to set the
.enable_device() callbacks (Marc Zyngier)
- Drop 'No bus range found' message so we don't complain when DTs
don't specify the default 'bus-range = <0x00 0xff>' (Bjorn Helgaas)
- Rename the drivers/pci/of_property.c struct of_pci_range to
of_pci_range_entry to avoid confusion with the global of_pci_range
in include/linux/of_address.h (Bjorn Helgaas)
Driver binding:
- Update resource request API documentation to encourage callers to
supply a driver name when requesting resources (Philipp Stanner)
- Export pci_intx_unmanaged() and pcim_intx() (always managed) so
callers of pci_intx() (which is sometimes managed) can explicitly
choose the one they need (Philipp Stanner)
- Convert drivers from pci_intx() to always-managed pcim_intx() or
never-managed pci_intx_unmanaged(): amd_sfh, ata (ahci, ata_piix,
pata_rdc, sata_sil24, sata_sis, sata_uli, sata_vsc), bnx2x, bna,
ntb, qtnfmac, rtsx, tifm_7xx1, vfio, xen-pciback (Philipp Stanner)
- Remove pci_intx_unmanaged() since pci_intx() is now always
unmanaged and pcim_intx() is always managed (Philipp Stanner)
Error handling:
- Unexport pcie_read_tlp_log() to encourage drivers to use PCI core
logging rather than building their own (Ilpo Järvinen)
- Move TLP Log handling to its own file (Ilpo Järvinen)
- Store number of supported End-End TLP Prefixes always so we can
read the correct number of DWORDs from the TLP Prefix Log (Ilpo
Järvinen)
- Read TLP Prefixes in addition to the Header Log in
pcie_read_tlp_log() (Ilpo Järvinen)
- Add pcie_print_tlp_log() to consolidate printing of TLP Header and
Prefix Log (Ilpo Järvinen)
- Quirk the Intel Raptor Lake-P PIO log size to accommodate vendor
BIOSes that don't configure it correctly (Takashi Iwai)
ASPM:
- Save parent L1 PM Substates config so when we restore it along with
an endpoint's config, the parent info isn't junk (Jian-Hong Pan)
Power management:
- Avoid D3 for Root Ports on TUXEDO Sirius Gen1 with old BIOS because
the system can't wake up from suspend (Werner Sembach)
Endpoint framework:
- Destroy the EPC device in devm_pci_epc_destroy(), which previously
didn't call devres_release() (Zijun Hu)
- Finish virtual EP removal in pci_epf_remove_vepf(), which
previously caused a subsequent pci_epf_add_vepf() to fail with
-EBUSY (Zijun Hu)
- Write BAR_MASK before iATU registers in pci_epc_set_bar() so we
don't depend on the BAR_MASK reset value being larger than the
requested BAR size (Niklas Cassel)
- Prevent changing BAR size/flags in pci_epc_set_bar() to prevent
reads from bypassing the iATU if we reduced the BAR size (Niklas
Cassel)
- Verify address alignment when programming iATU so we don't attempt
to write bits that are read-only because of the BAR size, which
could lead to directing accesses to the wrong address (Niklas
Cassel)
- Implement artpec6 pci_epc_features so we can rely on all drivers
supporting it so we can use it in EPC core code (Niklas Cassel)
- Check for BARs of fixed size to prevent endpoint drivers from
trying to change their size (Niklas Cassel)
- Verify that requested BAR size is a power of two when endpoint
driver sets the BAR (Niklas Cassel)
Endpoint framework tests:
- Clear pci-epf-test dma_chan_rx, not dma_chan_tx, after freeing
dma_chan_rx (Mohamed Khalfella)
- Correct the DMA MEMCPY test so it doesn't fail if the Endpoint
supports both DMA_PRIVATE and DMA_MEMCPY (Manivannan Sadhasivam)
- Add pci-epf-test and pci_endpoint_test support for capabilities
(Niklas Cassel)
- Add Endpoint test for consecutive BARs (Niklas Cassel)
- Remove redundant comparison from Endpoint BAR test because a > 1MB
BAR can always be exactly covered by iterating with a 1MB buffer
(Hans Zhang)
- Move and convert PCI Endpoint tests from tools/pci to Kselftests
(Manivannan Sadhasivam)
Apple PCIe controller driver:
- Convert StreamID mapping configuration from a bus notifier to the
.enable_device() and .disable_device() callbacks (Marc Zyngier)
Freescale i.MX6 PCIe controller driver:
- Add Requester ID to StreamID mapping configuration when enabling
devices (Frank Li)
- Use DWC core suspend/resume functions for imx6 (Frank Li)
- Add suspend/resume support for i.MX8MQ, i.MX8Q, and i.MX95 (Richard
Zhu)
- Add DT compatible string 'fsl,imx8q-pcie-ep' and driver support for
i.MX8Q series (i.MX8QM, i.MX8QXP, and i.MX8DXL) Endpoints (Frank
Li)
- Add DT binding for optional i.MX95 Refclk and driver support to
enable it if the platform hasn't enabled it (Richard Zhu)
- Configure PHY based on controller being in Root Complex or Endpoint
mode (Frank Li)
- Rely on dbi2 and iATU base addresses from DT via
dw_pcie_get_resources() instead of hardcoding them (Richard Zhu)
- Deassert apps_reset in imx_pcie_deassert_core_reset() since it is
asserted in imx_pcie_assert_core_reset() (Richard Zhu)
- Add missing reference clock enable or disable logic for IMX6SX,
IMX7D, IMX8MM (Richard Zhu)
- Remove redundant imx7d_pcie_init_phy() since
imx7d_pcie_enable_ref_clk() does the same thing (Richard Zhu)
Freescale Layerscape PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead
of syscon_regmap_lookup_by_phandle() followed by
of_property_read_u32_array() (Krzysztof Kozlowski)
Marvell MVEBU PCIe controller driver:
- Add MODULE_DEVICE_TABLE() to enable module autoloading (Liao Chen)
MediaTek PCIe Gen3 controller driver:
- Use clk_bulk_prepare_enable() instead of separate
clk_bulk_prepare() and clk_bulk_enable() (Lorenzo Bianconi)
- Rearrange reset assert/deassert so they're both done in the
*_power_up() callbacks (Lorenzo Bianconi)
- Document that Airoha EN7581 requires PHY init and power-on before
PHY reset deassert, unlike other MediaTek Gen3 controllers (Lorenzo
Bianconi)
- Move Airoha EN7581 post-reset delay from the en7581 clock .enable()
method to mtk_pcie_en7581_power_up() (Lorenzo Bianconi)
- Sleep instead of delay during Airoha EN7581 power-up, since this is
a non-atomic context (Lorenzo Bianconi)
- Skip PERST# assertion on Airoha EN7581 during probe and
suspend/resume to avoid a hardware defect (Lorenzo Bianconi)
- Enable async probe to reduce system startup time (Douglas Anderson)
Microchip PolarFlare PCIe controller driver:
- Set up the inbound address translation based on whether the
platform allows coherent or non-coherent DMA (Daire McNamara)
- Update DT binding such that platforms are DMA-coherent by default
and must specify 'dma-noncoherent' if needed (Conor Dooley)
Mobiveil PCIe controller driver:
- Convert mobiveil-pcie.txt to YAML and update 'interrupt-names'
and 'reg-names' (Frank Li)
Qualcomm PCIe controller driver:
- Add DT SM8550 and SM8650 optional 'global' interrupt for link
events (Neil Armstrong)
- Add DT 'compatible' strings for IPQ5424 PCIe controller (Manikanta
Mylavarapu)
- If 'global' IRQ is supported for detection of Link Up events, tell
DWC core not to wait for link up (Krishna chaitanya chundru)
Renesas R-Car PCIe controller driver:
- Avoid passing stack buffer as resource name (King Dix)
Rockchip PCIe controller driver:
- Simplify clock and reset handling by using bulk interfaces (Anand
Moon)
- Pass typed rockchip_pcie (not void) pointer to
rockchip_pcie_disable_clocks() (Anand Moon)
- Return -ENOMEM, not success, when pci_epc_mem_alloc_addr() fails
(Dan Carpenter)
Rockchip DesignWare PCIe controller driver:
- Use dll_link_up IRQ to detect Link Up and enumerate devices so
users don't have to manually rescan (Niklas Cassel)
- Tell DWC core not to wait for link up since the 'sys' interrupt is
required and detects Link Up events (Niklas Cassel)
Synopsys DesignWare PCIe controller driver:
- Don't wait for link up in DWC core if driver can detect Link Up
event (Krishna chaitanya chundru)
- Update ICC and OPP votes after Link Up events (Krishna chaitanya
chundru)
- Always stop link in dw_pcie_suspend_noirq(), which is required at
least for i.MX8QM to re-establish link on resume (Richard Zhu)
- Drop racy and unnecessary LTSSM state check before sending
PME_TURN_OFF message in dw_pcie_suspend_noirq() (Richard Zhu)
- Add struct of_pci_range.parent_bus_addr for devices that need their
immediate parent bus address, not the CPU address, e.g., to program
an internal Address Translation Unit (iATU) (Frank Li)
TI DRA7xx PCIe controller driver:
- Simplify by using syscon_regmap_lookup_by_phandle_args() instead of
syscon_regmap_lookup_by_phandle() followed by
of_parse_phandle_with_fixed_args() or of_property_read_u32_index()
(Krzysztof Kozlowski)
Xilinx Versal CPM PCIe controller driver:
- Add DT binding and driver support for Xilinx Versal CPM5
(Thippeswamy Havalige)
MicroSemi Switchtec management driver:
- Add Microchip PCI100X device IDs (Rakesh Babu Saladi)
Miscellaneous:
- Move reset related sysfs code from pci.c to pci-sysfs.c where other
similar code lives (Ilpo Järvinen)
- Simplify reset_method_store() memory management by using __free()
instead of explicit kfree() cleanup (Ilpo Järvinen)
- Constify struct bin_attribute for sysfs, VPD, P2PDMA, and the IBM
ACPI hotplug driver (Thomas Weißschuh)
- Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT (Dongdong
Zhang)
- Correct documentation of the 'config_acs=' kernel parameter
(Akihiko Odaki)"
* tag 'pci-v6.14-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (111 commits)
PCI: Batch BAR sizing operations
dt-bindings: PCI: microchip,pcie-host: Allow dma-noncoherent
PCI: microchip: Set inbound address translation for coherent or non-coherent mode
Documentation: Fix pci=config_acs= example
PCI: Remove redundant PCI_VSEC_HDR and PCI_VSEC_HDR_LEN_SHIFT
PCI: Don't include 'pm_wakeup.h' directly
selftests: pci_endpoint: Migrate to Kselftest framework
selftests: Move PCI Endpoint tests from tools/pci to Kselftests
misc: pci_endpoint_test: Fix IOCTL return value
dt-bindings: PCI: qcom: Document the IPQ5424 PCIe controller
dt-bindings: PCI: qcom,pcie-sm8550: Document 'global' interrupt
dt-bindings: PCI: mobiveil: Convert mobiveil-pcie.txt to YAML
PCI: switchtec: Add Microchip PCI100X device IDs
misc: pci_endpoint_test: Remove redundant 'remainder' test
misc: pci_endpoint_test: Add consecutive BAR test
misc: pci_endpoint_test: Add support for capabilities
PCI: endpoint: pci-epf-test: Add support for capabilities
PCI: endpoint: pci-epf-test: Fix check for DMA MEMCPY test
PCI: endpoint: pci-epf-test: Set dma_chan_rx pointer to NULL on error
PCI: dwc: Simplify config resource lookup
...
|
|
|
|
e8744fbc83 |
tracing updates for v6.14:
- Cleanup with guard() and free() helpers There were several places in the code that had a lot of "goto out" in the error paths to either unlock a lock or free some memory that was allocated. But this is error prone. Convert the code over to use the guard() and free() helpers that let the compiler unlock locks or free memory when the function exits. - Update the Rust tracepoint code to use the C code too There was some duplication of the tracepoint code for Rust that did the same logic as the C code. Add a helper that makes it possible for both algorithms to use the same logic in one place. - Add poll to trace event hist files It is useful to know when an event is triggered, or even with some filtering. Since hist files of events get updated when active and the event is triggered, allow applications to poll the hist file and wake up when an event is triggered. This will let the application know that the event it is waiting for happened. - Add :mod: command to enable events for current or future modules The function tracer already has a way to enable functions to be traced in modules by writing ":mod:<module>" into set_ftrace_filter. That will enable either all the functions for the module if it is loaded, or if it is not, it will cache that command, and when the module is loaded that matches <module>, its functions will be enabled. This also allows init functions to be traced. But currently events do not have that feature. Add the command where if ':mod:<module>' is written into set_event, then either all the modules events are enabled if it is loaded, or cache it so that the module's events are enabled when it is loaded. This also works from the kernel command line, where "trace_event=:mod:<module>", when the module is loaded at boot up, its events will be enabled then. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ5EbMxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qkZsAP9Amgx9frSbR1pn1t0I3wVnQx7khgOu s/b8Ro+vjTx1/QD/RN2AA7f+HK4F27w3Aqfrs0nKXAPtXWsJ9Epp8raG5w8= =Pg+4 -----END PGP SIGNATURE----- Merge tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing updates from Steven Rostedt: - Cleanup with guard() and free() helpers There were several places in the code that had a lot of "goto out" in the error paths to either unlock a lock or free some memory that was allocated. But this is error prone. Convert the code over to use the guard() and free() helpers that let the compiler unlock locks or free memory when the function exits. - Update the Rust tracepoint code to use the C code too There was some duplication of the tracepoint code for Rust that did the same logic as the C code. Add a helper that makes it possible for both algorithms to use the same logic in one place. - Add poll to trace event hist files It is useful to know when an event is triggered, or even with some filtering. Since hist files of events get updated when active and the event is triggered, allow applications to poll the hist file and wake up when an event is triggered. This will let the application know that the event it is waiting for happened. - Add :mod: command to enable events for current or future modules The function tracer already has a way to enable functions to be traced in modules by writing ":mod:<module>" into set_ftrace_filter. That will enable either all the functions for the module if it is loaded, or if it is not, it will cache that command, and when the module is loaded that matches <module>, its functions will be enabled. This also allows init functions to be traced. But currently events do not have that feature. Add the command where if ':mod:<module>' is written into set_event, then either all the modules events are enabled if it is loaded, or cache it so that the module's events are enabled when it is loaded. This also works from the kernel command line, where "trace_event=:mod:<module>", when the module is loaded at boot up, its events will be enabled then. * tag 'trace-v6.14-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (26 commits) tracing: Fix output of set_event for some cached module events tracing: Fix allocation of printing set_event file content tracing: Rename update_cache() to update_mod_cache() tracing: Fix #if CONFIG_MODULES to #ifdef CONFIG_MODULES selftests/ftrace: Add test that tests event :mod: commands tracing: Cache ":mod:" events for modules not loaded yet tracing: Add :mod: command to enabled module events selftests/tracing: Add hist poll() support test tracing/hist: Support POLLPRI event for poll on histogram tracing/hist: Add poll(POLLIN) support on hist file tracing: Fix using ret variable in tracing_set_tracer() tracepoint: Reduce duplication of __DO_TRACE_CALL tracing/string: Create and use __free(argv_free) in trace_dynevent.c tracing: Switch trace_stat.c code over to use guard() tracing: Switch trace_stack.c code over to use guard() tracing: Switch trace_osnoise.c code over to use guard() and __free() tracing: Switch trace_events_synth.c code over to use guard() tracing: Switch trace_events_filter.c code over to use guard() tracing: Switch trace_events_trigger.c code over to use guard() tracing: Switch trace_events_hist.c code over to use guard() ... |
|
|
|
d0f93ac2c3 |
Documentation changes this time around include:
- Quite a bit of Chinese and Spanish translation work. - Clarifying that Git commit IDs >12chars are OK - A new nvme-multipath document - A reorganization of the admin-guide top-level page to make it readable - Clarification of the role of Acked-by and maintainer discretion on their acceptance. - Some reorganization of debugging-oriented docs. ...and typo fixes, documentation updates, etc. as usual. -----BEGIN PGP SIGNATURE----- iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmeOp8EPHGNvcmJldEBs d24ubmV0AAoJEBdDWhNsDH5YipUH/iffvlVYuqoVdPUFWdmsiNjwOCRE2MIfp8qO tPTRRHJAny+NlFT0IWlGUbLNoNXtvpN47YlkaeAjdrsjASerfpwzje7t4Z1B+jWT 0YwGBCvDIGasfRCx7D14+w5aqkEEynfsy+QurwcuDxcHMQGwt7ZCuTNOVO6BULEr L++BMwqapUr5IemP4ItQqDVVF9sp6bWEhaOnTTJCLU6oG23uUSSA/59sJmwDJUk7 6J3VGO1An4Jte9WX7qkVrSBNO5cOOhaFiFXIeNxfOioOPctBwxKiHDJnzVud8ipz R+tnUI/8hEvyJ7GZFezyZxmMnFs0P2DEYAkaN+hBs/nUjx0dKUg= =YxaS -----END PGP SIGNATURE----- Merge tag 'docs-6.14' of git://git.lwn.net/linux Pull Documentation updates from Jonathan Corbet: - Quite a bit of Chinese and Spanish translation work - Clarifying that Git commit IDs >12chars are OK - A new nvme-multipath document - A reorganization of the admin-guide top-level page to make it readable - Clarification of the role of Acked-by and maintainer discretion on their acceptance - Some reorganization of debugging-oriented docs ... and typo fixes, documentation updates, etc as usual * tag 'docs-6.14' of git://git.lwn.net/linux: (50 commits) Documentation: Fix x86_64 UEFI outdated references to elilo Documentation/sysctl: Add timer_migration to kernel.rst docs/mm: Physical memory: Remove zone_t docs: submitting-patches: clarify that signers may use their discretion on tags docs: submitting-patches: clarify difference between Acked-by and Reviewed-by docs: submitting-patches: clarify Acked-by and introduce "# Suffix" Documentation: bug-hunting.rst: remove odd contact information docs/zh_CN: Add sak index Chinese translation doc: module: DEFAULT_SYMBOL_NAMESPACE must be defined before #includes doc: module: Fix documented type of namespace Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst docs/zh_CN: Add landlock index Chinese translation Documentation: Fix typo localmodonfig -> localmodconfig overlayfs.rst: Fix and improve grammar docs/zh_CN: Add siphash index Chinese translation docs/zh_CN: Add security IMA-templates Chinese translation docs/zh_CN: Add security digsig Chinese translation Align git commit ID abbreviation guidelines and checks docs: process: submitting-patches: split canonical patch format section docs/zh_CN: Add security lsm Chinese translation ... |
|
|
|
b388face5f |
Documentation: Fix pci=config_acs= example
The documentation currently says:
config_acs=
Format:
<ACS flags>@<pci_dev>[; ...]
Specify one or more PCI devices (in the format
specified above) optionally prepended with flags
and separated by semicolons. The respective
capabilities will be enabled, disabled or
unchanged based on what is specified in
flags.
(...)
For example,
pci=config_acs=10x
would configure all devices that support
ACS to enable P2P Request Redirect, disable
Translation Blocking, and leave Source
Validation unchanged from whatever power-up
or firmware set it to.
See the complete documentation at:
https://www.kernel.org/doc/html/latest/admin-guide/kernel-parameters.html
However, a flag specification always needs to be suffixed with "@" and
a PCI valid device address, which is missing in this example. Also, to
configure all devices that support ACS, the flag needs to be suffixed
with "@pci:0:0", for the ACS support to be enabled.
Fix the documentation so the example is correct.
Link: https://lore.kernel.org/r/20240915-acs-v1-1-b9ee536ee9bd@daynix.com
Signed-off-by: Akihiko Odaki <akihiko.odaki@daynix.com>
[kwilczynski: commit log]
Signed-off-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
|
|
|
9f3ee94e70 |
RCU pull request for v6.14
This pull request contains the following branches:
fixes.2024.12.14a: Misc fixes, check if IRQs are disabled in rcu_exp_need_qs(),
instrument KCSAN exclusive-writer assertions, add extra WARN_ON_ONCE() check,
set the cpu_no_qs.b.exp under lock, warn if callback enqueued on offline CPU.
rcutorture.2024.12.14a: Torture-test updates, add rcutorture.preempt_duration kernel
module parameter, make the TREE03 scenario do preemption, improve pooling timeouts
for rcu_torture_writer(), improve output of "Failure/close-call rcutorture reader
segments", add some reader-state debugging checks, update doc of polled APIs, add
extra diagnostics for per-reader-segment preemption.
srcu.2024.12.14a: SRCU updates, improve doc for srcu_read_lock() in terms of return
value, fix typo in comments, remove redundant GP sequence checks in the
srcu_funnel_gp_start.
torture-test.2024.12.14a: Add an extra test for sched_clock(), improve testing
on unresponsive systems.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEEu6QRe/mAUYNn5U0PBYqkjnKWLM8FAmeGprwACgkQBYqkjnKW
LM+QUwv/VVwYKI3f9eH4AcjijIFufmsP3I5REfY3s7+a6BItwZrulwhGK+mWWE+9
nKjsrrjw3sv3dEvdaUfZxwiLQBfJmWdUUGTZ748qyCpidlo4wB7OW/BR+Pn4ZiB/
rWkn28fHdDxUJV+nQGbGC82EiGrLC9XYlbTbnh9VzGEjcyKIIU3Dw5tGXzEVMn5w
Tc6H6jWg8+fXxIdmhdEkjpH+rS9H160Lt1bGeGadI3LMdmMj89x5u+i6gheT83WL
FBBwgNVITWPTwfQFyK4wuRcKzi/UIrRdQIU+2xqJKs6NeWhwqhFDfW8FP5brBI2o
f7fFQA+CbP/oRCqXCaZKmB3i/xGeJUsJ/IJ992jq61TCLNoc3LDxovYdaHfUJgph
W/8KHUc8oZMQU4CjGJkj30jnpBLPwZeZuJvuTfQZl5QuBeUivIMg89cXLpE9Vnny
yof3pm++Fru8wJ0rooq3ef2A5vblpoBQcnYelQV2EJisMCOkd+P5PNVA2viTrG8F
QGfmDzm1
=EwPt
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Uladzislau Rezki:
"Misc fixes:
- check if IRQs are disabled in rcu_exp_need_qs()
- instrument KCSAN exclusive-writer assertions
- add extra WARN_ON_ONCE() check
- set the cpu_no_qs.b.exp under lock
- warn if callback enqueued on offline CPU
Torture-test updates:
- add rcutorture.preempt_duration kernel module parameter
- make the TREE03 scenario do preemption
- improve pooling timeouts for rcu_torture_writer()
- improve output of "Failure/close-call rcutorture reader segments"
- add some reader-state debugging checks
- update doc of polled APIs
- add extra diagnostics for per-reader-segment preemption
- add an extra test for sched_clock()
- improve testing on unresponsive systems
SRCU updates:
- improve doc for srcu_read_lock() in terms of return value
- fix typo in comments
- remove redundant GP sequence checks in the srcu_funnel_gp_start"
* tag 'rcu.release.v6.14' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (31 commits)
srcu: Remove redundant GP sequence checks in srcu_funnel_gp_start
srcu: Fix typo s/srcu_check_read_flavor()/__srcu_check_read_flavor()/
srcu: Guarantee non-negative return value from srcu_read_lock()
MAINTAINERS: Update RCU git tree
rcu: Add lockdep_assert_irqs_disabled() to rcu_exp_need_qs()
rcu: Add KCSAN exclusive-writer assertions for rdp->cpu_no_qs.b.exp
rcu: Make preemptible rcu_exp_handler() check idempotency
rcu: Replace open-coded rcu_exp_need_qs() from rcu_exp_handler() with call
rcu: Move rcu_report_exp_rdp() setting of ->cpu_no_qs.b.exp under lock
rcu: Make rcu_report_exp_cpu_mult() caller acquire lock
rcu: Report callbacks enqueued on offline CPU blind spot
rcutorture: Use symbols for SRCU reader flavors
rcutorture: Add per-reader-segment preemption diagnostics
rcutorture: Read CPU ID for decoration protected by both reader types
rcutorture: Add preempt_count() to rcutorture_one_extend_check() diagnostics
rcutorture: Add parameters to control polled/conditional wait interval
rcutorture: Add documentation for recent conditional and polled APIs
rcutorture: Ignore attempts to test preemption and forward progress
rcutorture: Make rcutorture_one_extend() check reader state
rcutorture: Pretty-print rcutorture reader segments
...
|
|
|
|
62de6e1685 |
Scheduler enhancements for v6.14:
- Fair scheduler (SCHED_FAIR) enhancements:
- Behavioral improvements:
- Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)
- Delayed-dequeue enhancements & fixes: (Vincent Guittot)
- Rename h_nr_running into h_nr_queued
- Add new cfs_rq.h_nr_runnable
- Use the new cfs_rq.h_nr_runnable
- Removed unsued cfs_rq.h_nr_delayed
- Rename cfs_rq.idle_h_nr_running into h_nr_idle
- Remove unused cfs_rq.idle_nr_running
- Rename cfs_rq.nr_running into nr_queued
- Do not try to migrate delayed dequeue task
- Fix variable declaration position
- Encapsulate set custom slice in a __setparam_fair() function
- Fixes:
- Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
- Fix CPU bandwidth limit bypass during CPU hotplug (Vishal Chourasia)
- Cleanups:
- Clean up in migrate_degrades_locality() to improve
readability (Peter Zijlstra)
- Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
- Update comments after sched_tick() rename (Sebastian Andrzej Siewior)
- Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
(Valentin Schneider)
- Deadline scheduler (SCHED_DL) enhancements:
- Restore dl_server bandwidth on non-destructive root domain
changes (Juri Lelli)
- Correctly account for allocated bandwidth during
hotplug (Juri Lelli)
- Check bandwidth overflow earlier for hotplug (Juri Lelli)
- Clean up goto label in pick_earliest_pushable_dl_task()
(John Stultz)
- Consolidate timer cancellation (Wander Lairson Costa)
- Load-balancer enhancements:
- Improve performance by prioritizing migrating eligible
tasks in sched_balance_rq() (Hao Jia)
- Do not compute NUMA Balancing stats unnecessarily during
load-balancing (K Prateek Nayak)
- Do not compute overloaded status unnecessarily during
load-balancing (K Prateek Nayak)
- Generic scheduling code enhancements:
- Use READ_ONCE() in task_on_rq_queued(), to consistently use
the WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)
- Isolated CPUs support enhancements: (Waiman Long)
- Make "isolcpus=nohz" equivalent to "nohz_full"
- Consolidate housekeeping cpumasks that are always identical
- Remove HK_TYPE_SCHED
- Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE
- RSEQ enhancements:
- Validate read-only fields under DEBUG_RSEQ config
(Mathieu Desnoyers)
- PSI enhancements:
- Fix race when task wakes up before psi_sched_switch()
adjusts flags (Chengming Zhou)
- IRQ time accounting performance enhancements: (Yafang Shao)
- Define sched_clock_irqtime as static key
- Don't account irq time if sched_clock_irqtime is disabled
- Virtual machine scheduling enhancements:
- Don't try to catch up excess steal time (Suleiman Souhlal)
- Heterogenous x86 CPU scheduling enhancements: (K Prateek Nayak)
- Convert "sysctl_sched_itmt_enabled" to boolean
- Use guard() for itmt_update_mutex
- Move the "sched_itmt_enabled" sysctl to debugfs
- Remove x86_smt_flags and use cpu_smt_flags directly
- Use x86_sched_itmt_flags for PKG domain unconditionally
- Debugging code & instrumentation enhancements:
- Change need_resched warnings to pr_err() (David Rientjes)
- Print domain name in /proc/schedstat (K Prateek Nayak)
- Fix value reported by hot tasks pulled in /proc/schedstat (Peter Zijlstra)
- Report the different kinds of imbalances in /proc/schedstat (Swapnil Sapkal)
- Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
- Update Schedstat version to 17 (Swapnil Sapkal)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmePSRcRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1hrdBAAjYiLl5Q8SHM0xnl+kbvuUkCTgEB/gSgA
mfrZtHRUgRZuA89NZ9NljlCkQSlsLTOjnpNuaeFzs529GMg9iemc99dbnz3BP5F3
V5qpYvWe7yIkJ3hd0TOGLmYEPMNQaAW57YBOrxcPjWNLJ4cr9iMdccVA1OQtcmqD
ZUh3nibv81QI8HDmT2G+figxEIqH3yBV1+SmEIxbrdkQpIJ5702Ng6+0KQK5TShN
xwjFELWZUl2TfkoCc4nkIpkImV6cI1DvXSw1xK6gbb1xEVOrsmFW3TYFw4trKHBu
2RBG4wtmzNjh+12GmSdIBJHogPNcay+JIJW9EG/unT7jirqzkkeP1X2eJEbh+X1L
CMa7GsD9Vy72jCzeJDMuiy7bKfG/MiKUtDXrAZQDo2atbw7H88QOzMuTE5a5WSV+
tRxXGI/dgFVOk+JQUfctfJbYeXjmG8GAflawvXtGDAfDZsja6M+65fH8p0AOgW1E
HHmXUzAe2E2xQBiSok/DYHPQeCDBAjoJvU93YhGiXv8UScb2UaD4BAfzfmc8P+Zs
Eox6444ah5U0jiXmZ3HU707n1zO+Ql4qKoyyMJzSyP+oYHE/Do7NYTElw2QovVdN
FX/9Uae8T4ttA/5lFe7FNoXgKvSxXDKYyKLZcysjVrWJF866Ui/TWtmxA6w8Osn7
sfucuLawLPM=
=5ZNW
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Fair scheduler (SCHED_FAIR) enhancements:
- Behavioral improvements:
- Untangle NEXT_BUDDY and pick_next_task() (Peter Zijlstra)
- Delayed-dequeue enhancements & fixes: (Vincent Guittot)
- Rename h_nr_running into h_nr_queued
- Add new cfs_rq.h_nr_runnable
- Use the new cfs_rq.h_nr_runnable
- Removed unsued cfs_rq.h_nr_delayed
- Rename cfs_rq.idle_h_nr_running into h_nr_idle
- Remove unused cfs_rq.idle_nr_running
- Rename cfs_rq.nr_running into nr_queued
- Do not try to migrate delayed dequeue task
- Fix variable declaration position
- Encapsulate set custom slice in a __setparam_fair() function
- Fixes:
- Fix race between yield_to() and try_to_wake_up() (Tianchen Ding)
- Fix CPU bandwidth limit bypass during CPU hotplug (Vishal
Chourasia)
- Cleanups:
- Clean up in migrate_degrades_locality() to improve readability
(Peter Zijlstra)
- Mark m*_vruntime() with __maybe_unused (Andy Shevchenko)
- Update comments after sched_tick() rename (Sebastian Andrzej
Siewior)
- Remove CONFIG_CFS_BANDWIDTH=n definition of cfs_bandwidth_used()
(Valentin Schneider)
Deadline scheduler (SCHED_DL) enhancements:
- Restore dl_server bandwidth on non-destructive root domain changes
(Juri Lelli)
- Correctly account for allocated bandwidth during hotplug (Juri
Lelli)
- Check bandwidth overflow earlier for hotplug (Juri Lelli)
- Clean up goto label in pick_earliest_pushable_dl_task() (John
Stultz)
- Consolidate timer cancellation (Wander Lairson Costa)
Load-balancer enhancements:
- Improve performance by prioritizing migrating eligible tasks in
sched_balance_rq() (Hao Jia)
- Do not compute NUMA Balancing stats unnecessarily during
load-balancing (K Prateek Nayak)
- Do not compute overloaded status unnecessarily during
load-balancing (K Prateek Nayak)
Generic scheduling code enhancements:
- Use READ_ONCE() in task_on_rq_queued(), to consistently use the
WRITE_ONCE() updated ->on_rq field (Harshit Agarwal)
Isolated CPUs support enhancements: (Waiman Long)
- Make "isolcpus=nohz" equivalent to "nohz_full"
- Consolidate housekeeping cpumasks that are always identical
- Remove HK_TYPE_SCHED
- Unify HK_TYPE_{TIMER|TICK|MISC} to HK_TYPE_KERNEL_NOISE
RSEQ enhancements:
- Validate read-only fields under DEBUG_RSEQ config (Mathieu
Desnoyers)
PSI enhancements:
- Fix race when task wakes up before psi_sched_switch() adjusts flags
(Chengming Zhou)
IRQ time accounting performance enhancements: (Yafang Shao)
- Define sched_clock_irqtime as static key
- Don't account irq time if sched_clock_irqtime is disabled
Virtual machine scheduling enhancements:
- Don't try to catch up excess steal time (Suleiman Souhlal)
Heterogenous x86 CPU scheduling enhancements: (K Prateek Nayak)
- Convert "sysctl_sched_itmt_enabled" to boolean
- Use guard() for itmt_update_mutex
- Move the "sched_itmt_enabled" sysctl to debugfs
- Remove x86_smt_flags and use cpu_smt_flags directly
- Use x86_sched_itmt_flags for PKG domain unconditionally
Debugging code & instrumentation enhancements:
- Change need_resched warnings to pr_err() (David Rientjes)
- Print domain name in /proc/schedstat (K Prateek Nayak)
- Fix value reported by hot tasks pulled in /proc/schedstat (Peter
Zijlstra)
- Report the different kinds of imbalances in /proc/schedstat
(Swapnil Sapkal)
- Move sched domain name out of CONFIG_SCHED_DEBUG (Swapnil Sapkal)
- Update Schedstat version to 17 (Swapnil Sapkal)"
* tag 'sched-core-2025-01-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (48 commits)
rseq: Fix rseq unregistration regression
psi: Fix race when task wakes up before psi_sched_switch() adjusts flags
sched, psi: Don't account irq time if sched_clock_irqtime is disabled
sched: Don't account irq time if sched_clock_irqtime is disabled
sched: Define sched_clock_irqtime as static key
sched/fair: Do not compute overloaded status unnecessarily during lb
sched/fair: Do not compute NUMA Balancing stats unnecessarily during lb
x86/topology: Use x86_sched_itmt_flags for PKG domain unconditionally
x86/topology: Remove x86_smt_flags and use cpu_smt_flags directly
x86/itmt: Move the "sched_itmt_enabled" sysctl to debugfs
x86/itmt: Use guard() for itmt_update_mutex
x86/itmt: Convert "sysctl_sched_itmt_enabled" to boolean
sched/core: Prioritize migrating eligible tasks in sched_balance_rq()
sched/debug: Change need_resched warnings to pr_err
sched/fair: Encapsulate set custom slice in a __setparam_fair() function
sched: Fix race between yield_to() and try_to_wake_up()
docs: Update Schedstat version to 17
sched/stats: Print domain name in /proc/schedstat
sched: Move sched domain name out of CONFIG_SCHED_DEBUG
sched: Report the different kinds of imbalances in /proc/schedstat
...
|
|
|
|
b355247df1 |
tracing: Cache ":mod:" events for modules not loaded yet
When the :mod: command is written into /sys/kernel/tracing/set_event (or that file within an instance), if the module specified after the ":mod:" is not yet loaded, it will store that string internally. When the module is loaded, it will enable the events as if the module was loaded when the string was written into the set_event file. This can also be useful to enable events that are in the init section of the module, as the events are enabled before the init section is executed. This also works on the kernel command line: trace_event=:mod:<module> Will enable the events for <module> when it is loaded. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://lore.kernel.org/20250116143533.514730995@goodmis.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> |
|
|
|
d635ccdb43 |
mm: shmem: add a kernel command line to change the default huge policy for tmpfs
Now the tmpfs can allow to allocate any sized large folios, and the default huge policy is still preferred to be 'never'. Due to tmpfs not behaving like other file systems in some cases as previously explained by David[1]: : I think I raised this in the past, but tmpfs/shmem is just like any : other file system .. except it sometimes really isn't and behaves much : more like (swappable) anonymous memory. (or mlocked files) : : There are many systems out there that run without swap enabled, or with : extremely minimal swap (IIRC until recently kubernetes was completely : incompatible with swapping). Swap can even be disabled today for shmem : using a mount option. : : That's a big difference to all other file systems where you are : guaranteed to have backend storage where you can simply evict under : memory pressure (might temporarily fail, of course). : : I *think* that's the reason why we have the "huge=" parameter that also : controls the THP allocations during page faults (IOW possible memory : over-allocation). Maybe also because it was a new feature, and we only : had a single THP size. Thus adding a new command line to change the default huge policy will be helpful to use the large folios for tmpfs, which is similar to the 'transparent_hugepage_shmem' cmdline for shmem. [1] https://lore.kernel.org/all/cbadd5fe-69d5-4c21-8eb8-3344ed36c721@redhat.com/ Link: https://lkml.kernel.org/r/ff390b2656f0d39649547f8f2cbb30fcb7e7be2d.1732779148.git.baolin.wang@linux.alibaba.com Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
769b83735d |
Documentation/kernel-parameters: Fix a reference to vga-softcursor.rst
A very minor oversight that dates all the way back to rst migration in
commit
|
|
|
|
e8440c1e2d |
Documentation: Update the behaviour of "kvm-arm.mode"
Commit
|
|
|
|
1146f7429f |
Documentation/kernel-parameters: Fix a typo in kvm.enable_virt_at_load text
s/lode/load/ Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/r/20241202120416.6054-5-bp@kernel.org |
|
|
|
282e06cc8f |
rcutorture: Add parameters to control polled/conditional wait interval
This commit adds rcutorture module parameters gp_cond_wi, gp_cond_wi_exp, gp_poll_wi, and gp_poll_wi_exp to control the wait interval for conditional, conditional expedited, polled, and polled expedited grace periods, respectively. When rcu_torture_writer() is testing these types of grace periods, hrtimers are used to randomly wait up to the specified number of microseconds, but with nanosecond granularity. In the case of conditional grace periods (get_state_synchronize_rcu() and cond_synchronize_rcu(), for example) there is just one wait. For polled grace periods (start_poll_synchronize_rcu() and poll_state_synchronize_rcu(), for example), there is a repeated series of waits until the grace period ends. For normal grace periods, the default is 16 jiffies (for example, 16,000 microseconds on a HZ=1000 system) and for expedited grace periods the default is 128 microseconds. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> |
|
|
|
cae7f6319e |
rcutorture: Add documentation for recent conditional and polled APIs
This commit adds kernel-parameters.txt documentation for rcutorture's (relatively) new gp_cond_exp, gp_cond_full, gp_cond_exp, gp_poll, gp_poll_exp, gp_poll_full, and gp_poll_exp module parameters. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> |
|
|
|
584975ccb7 |
rcutorture: Add random real-time preemption
This commit adds the rcutorture.preempt_duration kernel module parameter, which gives the real-time preemption duration in milliseconds (zero to disable, which is the default) and also the rcutorture.preempt_interval module parameter, which gives the interval between successive preemptions, also in milliseconds, defaulting to one second. The CPU to preempt is chosen at random from those online at that time. Races between preempting a given CPU and that CPU going offline are ignored, and preemption is forgone when this occurs. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> |
|
|
|
05453d36a2 |
Merge branch 'linus' into x86/cleanups, to resolve conflict
These two commits interact: upstream: |
|
|
|
ab0e7f2076 |
Documentation: Merge x86-specific boot options doc into kernel-parameters.txt
Documentation/arch/x86/x86_64/boot-options.rst is causing unnecessary confusion by being a second place where one can put x86 boot options. Move them into the main one. Drop removed ones like "acpi=ht", while at it. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Link: https://lore.kernel.org/r/20241202190011.11979-1-bp@kernel.org |
|
|
|
1174b9344b |
sched/isolation: Make "isolcpus=nohz" equivalent to "nohz_full"
The "isolcpus=nohz" boot parameter and flag were used to disable tick when running a single task. Nowsdays, this "nohz" flag is seldomly used as it is included as part of the "nohz_full" parameter. Extend this flag to cover other kernel noises disabled by the "nohz_full" parameter to make them equivalent. This also eliminates the need to use both the "isolcpus" and the "nohz_full" parameters to fully isolated a given set of CPUs. Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/r/20241030175253.125248-3-longman@redhat.com |
|
|
|
f66e4a9965 |
sched/core: Update kernel boot parameters for LAZY preempt.
Update the documentation for the `preempt=' parameter which now also
accepts `lazy'.
Fixes:
|
|
|
|
9ad55a67a7 |
soundwire updates for 6.13
- structure optimization of few bus structures and header updates
- support for 2.0 disco spec
- amd driver updates for acp revision, refactoring code and support for
acp6.3
- soft reset support for cadence driver
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE+vs47OPLdNbVcHzyfBQHDyUjg0cFAmdEgiMACgkQfBQHDyUj
g0dVBA/+MWwHs+Sl7LMSmkpGsfAsmSbD2il+v+9WcVnaQpl/dgv8EXPGbafBBgK/
AlVUvLCNdbwY93wCb/2xdGPOJS699D7AtdJnUEppcL2VsMtEbgQxyG0OSekRVH0c
NxVLNPVLQFQnZayh7MNflQNVrXJyEqUJg8n0G9G1KT7jTeMavejYhqmhN7TKNtLD
vJzF79QFC2n7+f7jK9+d2pJlhW5V3XUyQCRF6FipftKbuZN+ciVh9kjnAf1GjPsi
qpv7kRZ3ttZiYW+/8FjJxqChnT10b/ahRDwJTXE+uGhqxHD9Cjo/GYrzUtQQbDR2
uvZ6+o0UxhN3HR5Dq09FJYPluHpt8S/s/wZ0dj+dXlvPR82qT6LA9LP16BFwYj3S
36/DpGwJBYg3tsmwECKbY08t3aI1d8nXNKG0tXbkEU3RUWVeOJOLAyXbwYQ9DRGN
k3RbTTEZiw223FlgAk9dzCI6mMuekdh20UWVH7iZwUl8ZvJhWNdWiZOV4uaUcGZS
fmJ6JE7cM1ntv5rXjKIhhnTnoL5Z+3es3PjLxj8PE7VNC8Dlln67FF1NuoDd0uF0
jWA13iNUOKgytsx2jxAxWnU8S3SAPjB1+GD65ovMxH+b9xtgwhtmCcpySJaG4/Pn
P7F7dx1+bK8gbmc5xJf8ZddYeDF/Nb/493trk+Sf+zZSs+hevRY=
=3Ob7
-----END PGP SIGNATURE-----
Merge tag 'soundwire-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire
Pull soundwire updates from Vinod Koul:
- structure optimization of few bus structures and header updates
- support for 2.0 disco spec
- amd driver updates for acp revision, refactoring code and support for
acp6.3
- soft reset support for cadence driver
* tag 'soundwire-6.13-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (24 commits)
soundwire: Minor formatting fixups in sdw.h header
soundwire: Update the includes on the sdw.h header
soundwire: cadence: clear MCP BLOCK_WAKEUP in init
soundwire: cadence: add soft-reset on startup
soundwire: intel_auxdevice: add kernel parameter for mclk divider
soundwire: mipi-disco: add support for DP0/DPn 'lane-list' property
soundwire: mipi-disco: add new properties from 2.0 spec
soundwire: mipi-disco: add comment on DP0-supported property
soundwire: mipi-disco: add support for peripheral channelprepare timeout
soundwire: mipi_disco: add support for clock-scales property
soundwire: mipi-disco: add error handling for property array read
soundwire: mipi-disco: remove DPn audio-modes
soundwire: optimize sdw_dpn_prop
soundwire: optimize sdw_dp0_prop
soundwire: optimize sdw_slave_prop
soundwire: optimize sdw_bus structure
soundwire: optimize sdw_master_prop
soundwire: optimize sdw_stream_runtime memory layout
soundwire: mipi_disco: add MIPI-specific property_read_bool() helpers
soundwire: Correct some typos in comments
...
|
|
|
|
1746db26f8 |
pci-v6.13-changes
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmdE14wUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vxMPRAAslaEhHZ06cU/I+BA0UrMJBbzOw+/
XM2XUojxWaNMYSBPVXbtSBrfFMnox4G3hFBPK0T0HiWoc7wGx/TUVJk65ioqM8ug
gS/U3NjSlqlnH8NHxKrb/2t0tlMvSll9WwumOD9pMFeMGFOS3fAgUk+fBqXFYsI/
RsVRMavW9BucZ0yMHpgr0KGLPSt3HK/E1h0NLO+TN6dpFcoIq3XimKFyk1QQQgiR
V3W21JMwjw+lDnUAsijU+RBYi5Fj6Rpqig/biRnzagVE6PJOci3ZJEBE7dGqm4LM
UlgG6Ql/eK+bb3fPhcXxVmscj5XlEfbesX5PUzTmuj79Wq5l9hpy+0c654G79y8b
rGiEVGM0NxmRdbuhWQUM2EsffqFlkFu7MN3gH0tP0Z0t3VTXfBcGrQJfqCcSCZG3
5IwGdEE2kmGb5c3RApZrm+HCXdxhb3Nwc3P8c27eXDT4eqHWDJag4hzLETNBdIrn
Rsbgry6zzAVA6lLT0uasUlWerq/I6OrueJvnEKRGKDtbw/JL6PLveR1Rvsc//cQD
Tu4FcG81bldQTUOdHEgFyJgmSu77Gvfs5RZBV0cEtcCBc33uGJne08kOdGD4BwWJ
dqN3wJFh5yX4jlMGmBDw0KmFIwKstfUCIoDE4Kjtal02CURhz5ZCDVGNPnSUKN0C
hflVX0//cRkHc5g=
=2Otz
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI updates from Bjorn Helgaas:
"Enumeration:
- Make pci_stop_dev() and pci_destroy_dev() safe so concurrent
callers can't stop a device multiple times, even as we migrate from
the global pci_rescan_remove_lock to finer-grained locking (Keith
Busch)
- Improve pci_walk_bus() implementation by making it recursive and
moving locking up to avoid need for a 'locked' parameter (Keith
Busch)
- Unexport pci_walk_bus_locked(), which is only used internally by
the PCI core (Keith Busch)
- Detect some Thunderbolt chips that are built-in and hence
'trustworthy' by a heuristic since the 'ExternalFacingPort' and
'usb4-host-interface' ACPI properties are not quite enough (Esther
Shimanovich)
Resource management:
- Use PCI bus addresses (not CPU addresses) in 'ranges' properties
when building dynamic DT nodes so systems where PCI and CPU
addresses differ work correctly (Andrea della Porta)
- Tidy resource sizing and assignment with helpers to reduce
redundancy (Ilpo Järvinen)
- Improve pdev_sort_resources() 'bogus alignment' warning to be more
specific (Ilpo Järvinen)
Driver binding:
- Convert driver .remove_new() callbacks to .remove() again to finish
the conversion from returning 'int' to being 'void' (Sergio
Paracuellos)
- Export pcim_request_all_regions(), a managed interface to request
all BARs (Philipp Stanner)
- Replace pcim_iomap_regions_request_all() with
pcim_request_all_regions(), and pcim_iomap_table()[n] with
pcim_iomap(n), in the following drivers: ahci, crypto qat, crypto
octeontx2, intel_th, iwlwifi, ntb idt, serial rp2, ALSA korg1212
(Philipp Stanner)
- Remove the now unused pcim_iomap_regions_request_all() (Philipp
Stanner)
- Export pcim_iounmap_region(), a managed interface to unmap and
release a PCI BAR (Philipp Stanner)
- Replace pcim_iomap_regions(mask) with pcim_iomap_region(n), and
pcim_iounmap_regions(mask) with pcim_iounmap_region(n), in the
following drivers: fpga dfl-pci, block mtip32xx, gpio-merrifield,
cavium (Philipp Stanner)
Error handling:
- Add sysfs 'reset_subordinate' to reset the entire hierarchy below a
bridge; previously Secondary Bus Reset could only be used when
there was a single device below a bridge (Keith Busch)
- Warn if we reset a running device where the driver didn't register
pci_error_handlers notification callbacks (Keith Busch)
ASPM:
- Disable ASPM L1 before touching L1 PM Substates to follow the spec
closer and avoid a CPU load timeout on some platforms (Ajay
Agarwal)
- Set devices below Intel VMD to D0 before enabling ASPM L1 Substates
as required per spec for all L1 Substates changes (Jian-Hong Pan)
Power management:
- Enable starfive controller runtime PM before probing host bridge
(Mayank Rana)
- Enable runtime power management for host bridges (Krishna chaitanya
chundru)
Power control:
- Use of_platform_device_create() instead of of_platform_populate()
to create pwrctl platform devices so we can control it based on the
child nodes (Manivannan Sadhasivam)
- Create pwrctrl platform devices only if there's a relevant power
supply property (Manivannan Sadhasivam)
- Add device link from the pwrctl supplier to the PCI dev to ensure
pwrctl drivers are probed before the PCI dev driver; this avoids a
race where pwrctl could change device power state while the PCI
driver was active (Manivannan Sadhasivam)
- Find pwrctl device for removal with of_find_device_by_node()
instead of searching all children of the parent (Manivannan
Sadhasivam)
- Rename 'pwrctl' to 'pwrctrl' to match new bandwidth controller
('bwctrl') and hotplug files (Bjorn Helgaas)
Bandwidth control:
- Add read/modify/write locking for Link Control 2, which is used to
manage Link speed (Ilpo Järvinen)
- Extract Link Bandwidth Management Status check into
pcie_lbms_seen(), where it can be shared between the bandwidth
controller and quirks that use it to help retrain failed links
(Ilpo Järvinen)
- Re-add Link Bandwidth notification support with updates to address
the reasons it was previously reverted (Alexandru Gagniuc, Ilpo
Järvinen)
- Add pcie_set_target_speed() and related functionality so drivers
can manage PCIe Link speed based on thermal or other constraints
(Ilpo Järvinen)
- Add a thermal cooling driver to throttle PCIe Links via the
existing thermal management framework (Ilpo Järvinen)
- Add a userspace selftest for the PCIe bandwidth controller (Ilpo
Järvinen)
PCI device hotplug:
- Add hotplug controller driver for Marvell OCTEON multi-function
device where function 0 has a management console interface to
enable/disable and provision various personalities for the other
functions (Shijith Thotton)
- Retain a reference to the pci_bus for the lifetime of a pci_slot to
avoid a use-after-free when the thunderbolt driver resets USB4 host
routers on boot, causing hotplug remove/add of downstream docks or
other devices (Lukas Wunner)
- Remove unused cpcihp struct cpci_hp_controller_ops.hardware_test
(Guilherme Giacomo Simoes)
- Remove unused cpqphp struct ctrl_dbg.ctrl (Christophe JAILLET)
- Use pci_bus_read_dev_vendor_id() instead of hand-coded presence
detection in cpqphp (Ilpo Järvinen)
- Simplify cpqphp enumeration, which is already simple-minded and
doesn't handle devices below hot-added bridges (Ilpo Järvinen)
Virtualization:
- Add ACS quirk for Wangxun FF5xxx NICs, which don't advertise an ACS
capability but do isolate functions as though PCI_ACS_RR and
PCI_ACS_CR were set, so the functions can be in independent IOMMU
groups (Mengyuan Lou)
TLP Processing Hints (TPH):
- Add and document TLP Processing Hints (TPH) support so drivers can
enable and disable TPH and the kernel can save/restore TPH
configuration (Wei Huang)
- Add TPH Steering Tag support so drivers can retrieve Steering Tag
values associated with specific CPUs via an ACPI _DSM to improve
performance by directing DMA writes closer to their consumers (Wei
Huang)
Data Object Exchange (DOE):
- Wait up to 1 second for DOE Busy bit to clear before writing a
request to the mailbox to avoid failures if the mailbox is still
busy from a previous transfer (Gregory Price)
Endpoint framework:
- Skip attempts to allocate from endpoint controller memory window if
the requested size is larger than the window (Damien Le Moal)
- Add and document pci_epc_mem_map() and pci_epc_mem_unmap() to
handle controller-specific size and alignment constraints, and add
test cases to the endpoint test driver (Damien Le Moal)
- Implement dwc pci_epc_ops.align_addr() so pci_epc_mem_map() can
observe DWC-specific alignment requirements (Damien Le Moal)
- Synchronously cancel command handler work in endpoint test before
cleaning up DMA and BARs (Damien Le Moal)
- Respect endpoint page size in dw_pcie_ep_align_addr() (Niklas
Cassel)
- Use dw_pcie_ep_align_addr() in dw_pcie_ep_raise_msi_irq() and
dw_pcie_ep_raise_msix_irq() instead of open coding the equivalent
(Niklas Cassel)
- Avoid NULL dereference if Modem Host Interface Endpoint lacks
'mmio' DT property (Zhongqiu Han)
- Release PCI domain ID of Endpoint controller parent (not controller
itself) and before unregistering the controller, to avoid
use-after-free (Zijun Hu)
- Clear secondary (not primary) EPC in pci_epc_remove_epf() when
removing the secondary controller associated with an NTB (Zijun Hu)
Cadence PCIe controller driver:
- Lower severity of 'phy-names' message (Bartosz Wawrzyniak)
Freescale i.MX6 PCIe controller driver:
- Fix suspend/resume support on i.MX6QDL, which has a hardware
erratum that prevents use of L2 (Stefan Eichenberger)
Intel VMD host bridge driver:
- Add 0xb60b and 0xb06f Device IDs for client SKUs (Nirmal Patel)
MediaTek PCIe Gen3 controller driver:
- Update mediatek-gen3 DT binding to require the exact number of
clocks for each SoC (Fei Shao)
- Add support for DT 'max-link-speed' and 'num-lanes' properties to
restrict the link speed and width (AngeloGioacchino Del Regno)
Microchip PolarFlare PCIe controller driver:
- Add DT and driver support for using either of the two PolarFire
Root Ports (Conor Dooley)
NVIDIA Tegra194 PCIe controller driver:
- Move endpoint controller cleanups that depend on refclk from the
host to the notifier that tells us the host has deasserted PERST#,
when refclk should be valid (Manivannan Sadhasivam)
Qualcomm PCIe controller driver:
- Add qcom SAR2130P DT binding with an additional clock (Dmitry
Baryshkov)
- Enable MSI interrupts if 'global' IRQ is supported, since a
previous commit unintentionally masked them (Manivannan Sadhasivam)
- Move endpoint controller cleanups that depend on refclk from the
host to the notifier that tells us the host has deasserted PERST#,
when refclk should be valid (Manivannan Sadhasivam)
- Add DT binding and driver support for IPQ9574, with Synopsys IP
v5.80a and Qcom IP 1.27.0 (devi priya)
- Move the OPP "operating-points-v2" table from the
qcom,pcie-sm8450.yaml DT binding to qcom,pcie-common.yaml, where it
can be used by other Qcom platforms (Qiang Yu)
- Add 'global' SPI interrupt for events like link-up, link-down to
qcom,pcie-x1e80100 DT binding so we can start enumeration when the
link comes up (Qiang Yu)
- Disable ASPM L0s for qcom,pcie-x1e80100 since the PHY is not tuned
to support this (Qiang Yu)
- Add ops_1_21_0 for SC8280X family SoC, which doesn't use the
'iommu-map' DT property and doesn't need BDF-to-SID translation
(Qiang Yu)
Rockchip PCIe controller driver:
- Define ROCKCHIP_PCIE_AT_SIZE_ALIGN to replace magic 256 endpoint
.align value (Damien Le Moal)
- When unmapping an endpoint window, compute the region index instead
of searching for it, and verify that the address was mapped (Damien
Le Moal)
- When mapping an endpoint window, verify that the address hasn't
been mapped already (Damien Le Moal)
- Implement pci_epc_ops.align_addr() for rockchip-ep (Damien Le Moal)
- Fix MSI IRQ data mapping to observe the alignment constraint, which
fixes intermittent page faults in memcpy_toio() and memcpy_fromio()
(Damien Le Moal)
- Rename rockchip_pcie_parse_ep_dt() to
rockchip_pcie_ep_get_resources() for consistency with similar DT
interfaces (Damien Le Moal)
- Skip the unnecessary link train in rockchip_pcie_ep_probe() and do
it only in the endpoint start operation (Damien Le Moal)
- Implement pci_epc_ops.stop_link() to disable link training and
controller configuration (Damien Le Moal)
- Attempt link training at 5 GT/s when both partners support it
(Damien Le Moal)
- Add a handler for PERST# signal so we can detect host-initiated
resets and start link training after PERST# is deasserted (Damien
Le Moal)
Synopsys DesignWare PCIe controller driver:
- Clear outbound address on unmap so dw_pcie_find_index() won't match
an ATU index that was already unmapped (Damien Le Moal)
- Use of_property_present() instead of of_property_read_bool() when
testing for presence of non-boolean DT properties (Rob Herring)
- Advertise 1MB size if endpoint supports Resizable BARs, which was
inadvertently lost in v6.11 (Niklas Cassel)
TI J721E PCIe driver:
- Add PCIe support for J722S SoC (Siddharth Vadapalli)
- Delay PCIE_T_PVPERL_MS (100 ms), not just PCIE_T_PERST_CLK_US (100
us), before deasserting PERST# to ensure power and refclk are
stable (Siddharth Vadapalli)
TI Keystone PCIe controller driver:
- Set the 'ti,keystone-pcie' mode so v3.65a devices work in Root
Complex mode (Kishon Vijay Abraham I)
- Try to avoid unrecoverable SError for attempts to issue config
transactions when the link is down; this is racy but the best we
can do (Kishon Vijay Abraham I)
Miscellaneous:
- Reorganize kerneldoc parameter names to match order in function
signature (Julia Lawall)
- Fix sysfs reset_method_store() memory leak (Todd Kjos)
- Simplify pci_create_slot() (Ilpo Järvinen)
- Fix incorrect printf format specifiers in pcitest (Luo Yifan)"
* tag 'pci-v6.13-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (127 commits)
PCI: rockchip-ep: Handle PERST# signal in EP mode
PCI: rockchip-ep: Improve link training
PCI: rockship-ep: Implement the pci_epc_ops::stop_link() operation
PCI: rockchip-ep: Refactor endpoint link training enable
PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() MSI-X hiding
PCI: rockchip-ep: Refactor rockchip_pcie_ep_probe() memory allocations
PCI: rockchip-ep: Rename rockchip_pcie_parse_ep_dt()
PCI: rockchip-ep: Fix MSI IRQ data mapping
PCI: rockchip-ep: Implement the pci_epc_ops::align_addr() operation
PCI: rockchip-ep: Improve rockchip_pcie_ep_map_addr()
PCI: rockchip-ep: Improve rockchip_pcie_ep_unmap_addr()
PCI: rockchip-ep: Use a macro to define EP controller .align feature
PCI: rockchip-ep: Fix address translation unit programming
PCI/pwrctrl: Rename pwrctrl functions and structures
PCI/pwrctrl: Rename pwrctl files to pwrctrl
PCI/pwrctl: Remove pwrctl device without iterating over all children of pwrctl parent
PCI/pwrctl: Ensure that pwrctl drivers are probed before PCI client drivers
PCI/pwrctl: Create pwrctl device only if at least one power supply is present
PCI/pwrctl: Use of_platform_device_create() to create pwrctl devices
tools: PCI: Fix incorrect printf format specifiers
...
|
|
|
|
e06635e26c |
slab updates for 6.13
-----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEEe7vIQRWZI0iWSE3xu+CwddJFiJoFAmdERvEACgkQu+CwddJF iJre6Af9EBMVQiWJrmoMOjbGLqLgmZzSXRNxR862WGn4D/wesA1HmSlWgEn54hgc GIYIeD++v4JaIRNH0yZqb2UBSKjF/rYPDkKstnqgFaVakLoDrwkkwV2n3Gk5BEgR m/SzLGgoDWKR65I/oMpL6e2KrMOfMfjpB31qiVvdlaQd2Nv/5rw+gUVylxhNIZEH W11N3IC+e9hmgT3ZBpTmHeqNrlXE1+USWPrp/HV05Ndz6yf97JnP4Wr9f9pcyN3R aflLHR38+Q9cCfO7y8wNqtYvIV/kbqgdaqD76frSgalC4Lmz9+L+TZ2NuENCPoGj Xdbip2z+iffWhvqM+qooOLVxR0XqTA== =Sepb -----END PGP SIGNATURE----- Merge tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab updates from Vlastimil Babka: - Add new slab_strict_numa boot parameter to enforce per-object memory policies on top of slab folio policies, for systems where saving cost of remote accesses is more important than minimizing slab allocation overhead (Christoph Lameter) - Fix for freeptr_offset alignment check being too strict for m68k (Geert Uytterhoeven) - krealloc() fixes for not violating __GFP_ZERO guarantees on krealloc() when slub_debug (redzone and object tracking) is enabled (Feng Tang) - Fix a memory leak in case sysfs registration fails for a slab cache, and also no longer fail to create the cache in that case (Hyeonggon Yoo) - Fix handling of detected consistency problems (due to buggy slab user) with slub_debug enabled, so that it does not cause further list corruption bugs (yuan.gao) - Code cleanup and kerneldocs polishing (Zhen Lei, Vlastimil Babka) * tag 'slab-for-6.13-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: slab: Fix too strict alignment check in create_cache() mm/slab: Allow cache creation to proceed even if sysfs registration fails mm/slub: Avoid list corruption when removing a slab from the full list mm/slub, kunit: Add testcase for krealloc redzone and zeroing mm/slub: Improve redzone check and zeroing for krealloc() mm/slub: Consider kfence case for get_orig_size() SLUB: Add support for per object memory policies mm, slab: add kerneldocs for common SLAB_ flags mm/slab: remove duplicate check in create_cache() mm/slub: Move krealloc() and related code to slub.c mm/kasan: Don't store metadata inside kmalloc object when slub_debug_orig_size is on |
|
|
|
5c00ff742b |
- The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection algorithm.
This leads to improved memory savings.
- Wei Yang has gone to town on the mapletree code, contributing several
series which clean up the implementation:
- "refine mas_mab_cp()"
- "Reduce the space to be cleared for maple_big_node"
- "maple_tree: simplify mas_push_node()"
- "Following cleanup after introduce mas_wr_store_type()"
- "refine storing null"
- The series "selftests/mm: hugetlb_fault_after_madv improvements" from
David Hildenbrand fixes this selftest for s390.
- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
implements some rationaizations and cleanups in the page mapping code.
- The series "mm: optimize shadow entries removal" from Shakeel Butt
optimizes the file truncation code by speeding up the handling of shadow
entries.
- The series "Remove PageKsm()" from Matthew Wilcox completes the
migration of this flag over to being a folio-based flag.
- The series "Unify hugetlb into arch_get_unmapped_area functions" from
Oscar Salvador implements a bunch of consolidations and cleanups in the
hugetlb code.
- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
takes away the wp-fault time practice of turning a huge zero page into
small pages. Instead we replace the whole thing with a THP. More
consistent cleaner and potentiall saves a large number of pagefaults.
- The series "percpu: Add a test case and fix for clang" from Andy
Shevchenko enhances and fixes the kernel's built in percpu test code.
- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
optimizes mremap() by avoiding doing things which we didn't need to do.
- The series "Improve the tmpfs large folio read performance" from
Baolin Wang teaches tmpfs to copy data into userspace at the folio size
rather than as individual pages. A 20% speedup was observed.
- The series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON splitting.
- The series "memcg-v1: fully deprecate charge moving" from Shakeel Butt
removes the long-deprecated memcgv2 charge moving feature.
- The series "fix error handling in mmap_region() and refactor" from
Lorenzo Stoakes cleanup up some of the mmap() error handling and
addresses some potential performance issues.
- The series "x86/module: use large ROX pages for text allocations" from
Mike Rapoport teaches x86 to use large pages for read-only-execute
module text.
- The series "page allocation tag compression" from Suren Baghdasaryan
is followon maintenance work for the new page allocation profiling
feature.
- The series "page->index removals in mm" from Matthew Wilcox remove
most references to page->index in mm/. A slow march towards shrinking
struct page.
- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
interface tests" from Andrew Paniakin performs maintenance work for
DAMON's self testing code.
- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
improves zswap's batching of compression and decompression. It is a
step along the way towards using Intel IAA hardware acceleration for
this zswap operation.
- The series "kasan: migrate the last module test to kunit" from
Sabyrzhan Tasbolatov completes the migration of the KASAN built-in tests
over to the KUnit framework.
- The series "implement lightweight guard pages" from Lorenzo Stoakes
permits userapace to place fault-generating guard pages within a single
VMA, rather than requiring that multiple VMAs be created for this.
Improved efficiencies for userspace memory allocators are expected.
- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
tracepoints to provide increased visibility into memcg stats flushing
activity.
- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
fixes a zram buglet which potentially affected performance.
- The series "mm: add more kernel parameters to control mTHP" from
Maíra Canal enhances our ability to control/configuremultisize THP from
the kernel boot command line.
- The series "kasan: few improvements on kunit tests" from Sabyrzhan
Tasbolatov has a couple of fixups for the KASAN KUnit tests.
- The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
from Kairui Song optimizes list_lru memory utilization when lockdep is
enabled.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZzwFqgAKCRDdBJ7gKXxA
jkeuAQCkl+BmeYHE6uG0hi3pRxkupseR6DEOAYIiTv0/l8/GggD/Z3jmEeqnZaNq
xyyenpibWgUoShU2wZ/Ha8FE5WDINwg=
=JfWR
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- The series "zram: optimal post-processing target selection" from
Sergey Senozhatsky improves zram's post-processing selection
algorithm. This leads to improved memory savings.
- Wei Yang has gone to town on the mapletree code, contributing several
series which clean up the implementation:
- "refine mas_mab_cp()"
- "Reduce the space to be cleared for maple_big_node"
- "maple_tree: simplify mas_push_node()"
- "Following cleanup after introduce mas_wr_store_type()"
- "refine storing null"
- The series "selftests/mm: hugetlb_fault_after_madv improvements" from
David Hildenbrand fixes this selftest for s390.
- The series "introduce pte_offset_map_{ro|rw}_nolock()" from Qi Zheng
implements some rationaizations and cleanups in the page mapping
code.
- The series "mm: optimize shadow entries removal" from Shakeel Butt
optimizes the file truncation code by speeding up the handling of
shadow entries.
- The series "Remove PageKsm()" from Matthew Wilcox completes the
migration of this flag over to being a folio-based flag.
- The series "Unify hugetlb into arch_get_unmapped_area functions" from
Oscar Salvador implements a bunch of consolidations and cleanups in
the hugetlb code.
- The series "Do not shatter hugezeropage on wp-fault" from Dev Jain
takes away the wp-fault time practice of turning a huge zero page
into small pages. Instead we replace the whole thing with a THP. More
consistent cleaner and potentiall saves a large number of pagefaults.
- The series "percpu: Add a test case and fix for clang" from Andy
Shevchenko enhances and fixes the kernel's built in percpu test code.
- The series "mm/mremap: Remove extra vma tree walk" from Liam Howlett
optimizes mremap() by avoiding doing things which we didn't need to
do.
- The series "Improve the tmpfs large folio read performance" from
Baolin Wang teaches tmpfs to copy data into userspace at the folio
size rather than as individual pages. A 20% speedup was observed.
- The series "mm/damon/vaddr: Fix issue in
damon_va_evenly_split_region()" fro Zheng Yejian fixes DAMON
splitting.
- The series "memcg-v1: fully deprecate charge moving" from Shakeel
Butt removes the long-deprecated memcgv2 charge moving feature.
- The series "fix error handling in mmap_region() and refactor" from
Lorenzo Stoakes cleanup up some of the mmap() error handling and
addresses some potential performance issues.
- The series "x86/module: use large ROX pages for text allocations"
from Mike Rapoport teaches x86 to use large pages for
read-only-execute module text.
- The series "page allocation tag compression" from Suren Baghdasaryan
is followon maintenance work for the new page allocation profiling
feature.
- The series "page->index removals in mm" from Matthew Wilcox remove
most references to page->index in mm/. A slow march towards shrinking
struct page.
- The series "damon/{self,kunit}tests: minor fixups for DAMON debugfs
interface tests" from Andrew Paniakin performs maintenance work for
DAMON's self testing code.
- The series "mm: zswap swap-out of large folios" from Kanchana Sridhar
improves zswap's batching of compression and decompression. It is a
step along the way towards using Intel IAA hardware acceleration for
this zswap operation.
- The series "kasan: migrate the last module test to kunit" from
Sabyrzhan Tasbolatov completes the migration of the KASAN built-in
tests over to the KUnit framework.
- The series "implement lightweight guard pages" from Lorenzo Stoakes
permits userapace to place fault-generating guard pages within a
single VMA, rather than requiring that multiple VMAs be created for
this. Improved efficiencies for userspace memory allocators are
expected.
- The series "memcg: tracepoint for flushing stats" from JP Kobryn uses
tracepoints to provide increased visibility into memcg stats flushing
activity.
- The series "zram: IDLE flag handling fixes" from Sergey Senozhatsky
fixes a zram buglet which potentially affected performance.
- The series "mm: add more kernel parameters to control mTHP" from
Maíra Canal enhances our ability to control/configuremultisize THP
from the kernel boot command line.
- The series "kasan: few improvements on kunit tests" from Sabyrzhan
Tasbolatov has a couple of fixups for the KASAN KUnit tests.
- The series "mm/list_lru: Split list_lru lock into per-cgroup scope"
from Kairui Song optimizes list_lru memory utilization when lockdep
is enabled.
* tag 'mm-stable-2024-11-18-19-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (215 commits)
cma: enforce non-zero pageblock_order during cma_init_reserved_mem()
mm/kfence: add a new kunit test test_use_after_free_read_nofault()
zram: fix NULL pointer in comp_algorithm_show()
memcg/hugetlb: add hugeTLB counters to memcg
vmstat: call fold_vm_zone_numa_events() before show per zone NUMA event
mm: mmap_lock: check trace_mmap_lock_$type_enabled() instead of regcount
zram: ZRAM_DEF_COMP should depend on ZRAM
MAINTAINERS/MEMORY MANAGEMENT: add document files for mm
Docs/mm/damon: recommend academic papers to read and/or cite
mm: define general function pXd_init()
kmemleak: iommu/iova: fix transient kmemleak false positive
mm/list_lru: simplify the list_lru walk callback function
mm/list_lru: split the lock to per-cgroup scope
mm/list_lru: simplify reparenting and initial allocation
mm/list_lru: code clean up for reparenting
mm/list_lru: don't export list_lru_add
mm/list_lru: don't pass unnecessary key parameters
kasan: add kunit tests for kmalloc_track_caller, kmalloc_node_track_caller
kasan: change kasan_atomics kunit test as KUNIT_CASE_SLOW
kasan: use EXPORT_SYMBOL_IF_KUNIT to export symbols
...
|
|
|
|
fcc79e1714 |
Networking changes for 6.13.
The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core
----
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL
knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read
access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code
--------------------------------------------
- Add small file operations for debugfs, to reduce the struct ops size.
- Refactoring and optimization for the implementation of page_frag API,
This is a preparatory work to consolidate the page_frag
implementation.
Netfilter
---------
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users
the option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent
CI improvements.
BPF
---
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols
---------
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after close,
the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per device
neigh lists.
Driver API
----------
- Introduce a unified interface to configure transmission H/W shaping,
and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling
-----------------
- forwarding: introduce deferred commands, to simplify
the cleanup phase
Drivers
-------
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- adds support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Adds representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- adds support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implements page pool support
- macsec:
- inherit lower device's features and TSO limits when offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- Add the dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- adds clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4
9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL
wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/
vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX
jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF
b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v
kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie
BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl
lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u
Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx
eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf
THcvVM+bupEZ
=GzPr
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Paolo Abeni:
"The most significant set of changes is the per netns RTNL. The new
behavior is disabled by default, regression risk should be contained.
Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its
default value from PTP_1588_CLOCK_KVM, as the first is intended to be
a more reliable replacement for the latter.
Core:
- Started a very large, in-progress, effort to make the RTNL lock
scope per network-namespace, thus reducing the lock contention
significantly in the containerized use-case, comprising:
- RCU-ified some relevant slices of the FIB control path
- introduce basic per netns locking helpers
- namespacified the IPv4 address hash table
- remove rtnl_register{,_module}() in favour of
rtnl_register_many()
- refactor rtnl_{new,del,set}link() moving as much validation as
possible out of RTNL lock
- convert all phonet doit() and dumpit() handlers to RCU
- convert IPv4 addresses manipulation to per-netns RTNL
- convert virtual interface creation to per-netns RTNL
the per-netns lock infrastructure is guarded by the
CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim.
- Introduce NAPI suspension, to efficiently switching between busy
polling (NAPI processing suspended) and normal processing.
- Migrate the IPv4 routing input, output and control path from direct
ToS usage to DSCP macros. This is a work in progress to make ECN
handling consistent and reliable.
- Add drop reasons support to the IPv4 rotue input path, allowing
better introspection in case of packets drop.
- Make FIB seqnum lockless, dropping RTNL protection for read access.
- Make inet{,v6} addresses hashing less predicable.
- Allow providing timestamp OPT_ID via cmsg, to correlate TX packets
and timestamps
Things we sprinkled into general kernel code:
- Add small file operations for debugfs, to reduce the struct ops
size.
- Refactoring and optimization for the implementation of page_frag
API, This is a preparatory work to consolidate the page_frag
implementation.
Netfilter:
- Optimize set element transactions to reduce memory consumption
- Extended netlink error reporting for attribute parser failure.
- Make legacy xtables configs user selectable, giving users the
option to configure iptables without enabling any other config.
- Address a lot of false-positive RCU issues, pointed by recent CI
improvements.
BPF:
- Put xsk sockets on a struct diet and add various cleanups. Overall,
this helps to bump performance by 12% for some workloads.
- Extend BPF selftests to increase coverage of XDP features in
combination with BPF cpumap.
- Optimize and homogenize bpf_csum_diff helper for all archs and also
add a batch of new BPF selftests for it.
- Extend netkit with an option to delegate skb->{mark,priority}
scrubbing to its BPF program.
- Make the bpf_get_netns_cookie() helper available also to tc(x) BPF
programs.
Protocols:
- Introduces 4-tuple hash for connected udp sockets, speeding-up
significantly connected sockets lookup.
- Add a fastpath for some TCP timers that usually expires after
close, the socket lock contention.
- Add inbound and outbound xfrm state caches to speed up state
lookups.
- Avoid sending MPTCP advertisements on stale subflows, reducing
risks on loosing them.
- Make neighbours table flushing more scalable, maintaining per
device neigh lists.
Driver API:
- Introduce a unified interface to configure transmission H/W
shaping, and expose it to user-space via generic-netlink.
- Add support for per-NAPI config via netlink. This makes napi
configuration persistent across queues removal and re-creation.
Requires driver updates, currently supported drivers are:
nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice.
- Add ethtool support for writing SFP / PHY firmware blocks.
- Track RSS context allocation from ethtool core.
- Implement support for mirroring to DSA CPU port, via TC mirror
offload.
- Consolidate FDB updates notification, to avoid duplicates on
device-specific entries.
- Expose DPLL clock quality level to the user-space.
- Support master-slave PHY config via device tree.
Tests and tooling:
- forwarding: introduce deferred commands, to simplify the cleanup
phase
Drivers:
- Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic,
Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the
IRQs and queues to NAPI IDs, allowing busy polling and better
introspection.
- Ethernet high-speed NICs:
- nVidia/Mellanox:
- mlx5:
- a large refactor to implement support for cross E-Switch
scheduling
- refactor H/W conter management to let it scale better
- H/W GRO cleanups
- Intel (100G, ice)::
- add support for ethtool reset
- implement support for per TX queue H/W shaping
- AMD/Solarflare:
- implement per device queue stats support
- Broadcom (bnxt):
- improve wildcard l4proto on IPv4/IPv6 ntuple rules
- Marvell Octeon:
- Add representor support for each Resource Virtualization Unit
(RVU) device.
- Hisilicon:
- add support for the BMC Gigabit Ethernet
- IBM (EMAC):
- driver cleanup and modernization
- Cisco (VIC):
- raise the queues number limit to 256
- Ethernet virtual:
- Google vNIC:
- implement page pool support
- macsec:
- inherit lower device's features and TSO limits when
offloading
- virtio_net:
- enable premapped mode by default
- support for XDP socket(AF_XDP) zerocopy TX
- wireguard:
- set the TSO max size to be GSO_MAX_SIZE, to aggregate larger
packets.
- Ethernet NICs embedded and virtual:
- Broadcom ASP:
- enable software timestamping
- Freescale:
- add enetc4 PF driver
- MediaTek: Airoha SoC:
- implement BQL support
- RealTek r8169:
- enable TSO by default on r8168/r8125
- implement extended ethtool stats
- Renesas AVB:
- enable TX checksum offload
- Synopsys (stmmac):
- support header splitting for vlan tagged packets
- move common code for DWMAC4 and DWXGMAC into a separate FPE
module.
- add dwmac driver support for T-HEAD TH1520 SoC
- Synopsys (xpcs):
- driver refactor and cleanup
- TI:
- icssg_prueth: add VLAN offload support
- Xilinx emaclite:
- add clock support
- Ethernet switches:
- Microchip:
- implement support for the lan969x Ethernet switch family
- add LAN9646 switch support to KSZ DSA driver
- Ethernet PHYs:
- Marvel: 88q2x: enable auto negotiation
- Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2
- PTP:
- Add support for the Amazon virtual clock device
- Add PtP driver for s390 clocks
- WiFi:
- mac80211
- EHT 1024 aggregation size for transmissions
- new operation to indicate that a new interface is to be added
- support radio separation of multi-band devices
- move wireless extension spy implementation to libiw
- Broadcom:
- brcmfmac: optional LPO clock support
- Microchip:
- add support for Atmel WILC3000
- Qualcomm (ath12k):
- firmware coredump collection support
- add debugfs support for a multitude of statistics
- Qualcomm (ath5k):
- Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support
- Realtek:
- rtw88: 8821au and 8812au USB adapters support
- rtw89: add thermal protection
- rtw89: fine tune BT-coexsitence to improve user experience
- rtw89: firmware secure boot for WiFi 6 chip
- Bluetooth
- add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and
0x13d3:0x3623
- add Realtek RTL8852BE support for id Foxconn 0xe123
- add MediaTek MT7920 support for wireless module ids
- btintel_pcie: add handshake between driver and firmware
- btintel_pcie: add recovery mechanism
- btnxpuart: add GPIO support to power save feature"
* tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits)
mm: page_frag: fix a compile error when kernel is not compiled
Documentation: tipc: fix formatting issue in tipc.rst
selftests: nic_performance: Add selftest for performance of NIC driver
selftests: nic_link_layer: Add selftest case for speed and duplex states
selftests: nic_link_layer: Add link layer selftest for NIC driver
bnxt_en: Add FW trace coredump segments to the coredump
bnxt_en: Add a new ethtool -W dump flag
bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr()
bnxt_en: Add functions to copy host context memory
bnxt_en: Do not free FW log context memory
bnxt_en: Manage the FW trace context memory
bnxt_en: Allocate backing store memory for FW trace logs
bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem()
bnxt_en: Refactor bnxt_free_ctx_mem()
bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type
bnxt_en: Update firmware interface spec to 1.10.3.85
selftests/bpf: Add some tests with sockmap SK_PASS
bpf: fix recursive lock when verdict program return SK_PASS
wireguard: device: support big tcp GSO
wireguard: selftests: load nf_conntrack if not present
...
|
|
|
|
c3cda60e83 |
Another moderately busy cycle in docsland:
- Work on Chinese translations has picked up again. Happily, they are
maintaining the existing translations and not just adding new ones.
- Some maintenance of the Japanese and Italian translations as well.
- The removal of the venerable "dontdiff" file. It has long outlived its
usefulness and contained entries ("parse.*") that would actively mask
actual source change.
- The addition of enforcement information to the code-of-conduct
documentation.
Along with some build-system fixes and a lot of typo and language fixes.
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEIw+MvkEiF49krdp9F0NaE2wMflgFAmc7eD4PHGNvcmJldEBs
d24ubmV0AAoJEBdDWhNsDH5YSp4H/2zknNZNhxAtWbF1L/MprjVgh5OtS0xEI8SR
Klks8pHm9Dg5sg3EciJ9Jt7C3ZdPANOb7K4ykL2w2TKLgZbIMUa6FIqKbASqbryX
0t3nTn0gvkVMEtLlNLw4M1QIUox55fxLKUMV0MxcTAkvmFnG6XJl2gzGoL/SrI/h
19QDAKZZn2+S7Yow8MAdfef+ILu1Y9ms/4pumeUXHgVPJO7HDMCS85zQGU3tAB2n
HgR4RRSXNsfXvW/rxx2YvGtJ3SZWnZM7NVbWcb25i8Wu/uBDOzoSW7uFRRad67cP
d0MiHrB9RqltHGaJpEUisKLpTExd/GEZlTL+ILbXDROT+BHdLDQ=
=ndvR
-----END PGP SIGNATURE-----
Merge tag 'docs-6.13' of git://git.lwn.net/linux
Pull documentation updates from Jonathan Corbet:
"Another moderately busy cycle in docsland:
- Work on Chinese translations has picked up again. Happily, they are
maintaining the existing translations and not just adding new ones.
- Some maintenance of the Japanese and Italian translations as well.
- The removal of the venerable "dontdiff" file. It has long outlived
its usefulness and contained entries ("parse.*") that would
actively mask actual source change.
- The addition of enforcement information to the code-of-conduct
documentation.
Along with some build-system fixes and a lot of typo and language
fixes"
* tag 'docs-6.13' of git://git.lwn.net/linux: (52 commits)
Documentation/CoC: spell out enforcement for unacceptable behaviors
docs: fix typos and whitespace in Documentation/process/backporting.rst
docs/zh_CN: fix one sentence in llvm.rst
docs: bug-bisect: add a note about bisecting -next
docs/zh_CN: add the translation of kbuild/llvm.rst
Documentation: Fix incorrect paths/magic in magic numbers rst
Documentation/maintainer-tip: Fix typos
Documentation: Improve crash_kexec_post_notifiers description
Docs/zh_CN: Translate physical_memory.rst to Simplified Chinese
Documentation: admin: reorganize kernel-parameters intro
docs/zh_CN: update the translation of process/programming-language.rst
docs/zh_CN: update the translation of mm/page_owner.rst
docs/zh_CN: update the translation of mm/page_table_check.rst
docs/zh_CN: update the translation of mm/overcommit-accounting.rst
docs/zh_CN: update the translation of mm/admon/faq.rst
docs/zh_CN: update the translation of mm/active_mm.rst
docs/zh_CN: update the translation of mm/hmm.rst
docs: remove Documentation/dontdiff
docs/zh_CN: Add a entry in Chinese glossary
Docs/zh_CN: Fix the pfn calculation error in page_tables.rst
...
|
|
|
|
8cdf2d1903 |
RCU pull request for v6.13
SRCU:
- Introduction of the new SRCU-lite flavour with a new pair of
srcu_read_[un]lock_lite() APIs. In practice the read side using
this flavour becomes lighter by removing a full memory barrier on
LOCK and a full memory barrier on UNLOCK. This comes at the
expense of a higher latency write side with two (in the best case
of a snaphot of unused read-sides) or more RCU grace periods on
the update side which now assumes by itself the whole full
ordering guarantee against the LOCK/UNLOCK counters on both
indexes, along with the accesses performed inside.
Uretprobes is a known potential user.
Note this doesn't replace the default normal flavour of SRCU which
still behaves the same as usual.
- Add testing of SRCU-lite through rcutorture and rcuscale
- Various cleanups on the way.
FIXES:
- Allow short-circuiting RCU-TASKS-RUDE grace periods on architectures
that have sane noinstr boundaries forbidding tracing on low-level
idle and kernel entry code. RCU-TASKS is enough on such configurations
because it involves an RCU grace period that waits for all idle
tasks to either schedule out voluntarily or enter into RCU
unwatched noinstr code.
- Allow and test start_poll_synchronize_rcu() with IRQs disabled.
- Mention rcuog kthreads in relevant documentation and Kconfig help
- Various fixes and consolidations
RCUTORTURE:
- Add --no-affinity on tools to leave the affinity setting of guests
up to the user.
- Add guest_os_delay parameter to rcuscale for better warm-up
control.
- Fix and improve some rcuscale error handling.
- Various cleanups and fixes
STALL:
- Remove dead code
- Stop dumping tasks if a stalled grace period eventually ended
midway as that only produces confusing output.
- Optimize detection of stalling CPUs and avoid useless node
locking otherwise.
NOCB:
- Fix rcu_barrier() hang due to a race against callbacks
deoffloading. This is not yet used, except by rcutorture, and
waits for its promised cpusets interface.
- Remove leftover function declaration
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEd76+gtGM8MbftQlOhSRUR1COjHcFAmc6gP0ACgkQhSRUR1CO
jHcHfw/5AWg5wiapwJtLO9KNdtELflTTbT/NhhqwYVReHnOSvtPNwWgo984T3jYJ
xikE4Ccn5Nu4zJVbTOtmwJ/RP6WWP1I28LgoTCdcz9BB9b+CRLogV/dR5r5uZbhD
+jqXRAzDhEifR0pcfSK28MkXoh+puXMg4C78f7xtT1Oe3Gr67RLf6xvE59gHJrDg
QrPStdwhOn2bhmbKcflw1bHYqpypL09P2WHuRLmsJJUMUGIHTohK05lJOkD3hV9g
HTxOecNmeF/r8NyN8l/ERJgKmwDukIG02xih8UMEtqDEl04IxZFHbCfB6yyIsKDT
fTFxnRCHnm/PxIKRA5ENvyg/6uArMJ0xuSTZRG4K5v0nx7okR8gbCPmwiwn1m5w3
+/oppjCmG/gRgyiOytuEGKfaN9q/oJqQgeS7j8WruWj9V68FYUKr6COfQByw0xOc
H6ftaLGeFHgHxk3nua2wFrfMtQhucYAMGAlVK82yd7Q1EFW47kzleO8w/HSvfrBt
trX+9HZ77GVVmREJMstnIWRr5mbPtUf8yRZdA5bBrlEYz0A/ToNaFACid0fsaMC2
Dbo9Q+wDqL2wwOpjZy+MA3k1IVyDdUTuOQmPt57LmFTxUNZ+AQQlJcrhrUqWVvdM
Nne2EHdqCHADKd7g3i17HtvpTsapz+Qakpzx8UsPqNtfo1DSd5A=
=MWrw
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Frederic Weisbecker:
"SRCU:
- Introduction of the new SRCU-lite flavour with a new pair of
srcu_read_[un]lock_lite() APIs. In practice the read side using
this flavour becomes lighter by removing a full memory barrier on
LOCK and a full memory barrier on UNLOCK. This comes at the expense
of a higher latency write side with two (in the best case of a
snaphot of unused read-sides) or more RCU grace periods on the
update side which now assumes by itself the whole full ordering
guarantee against the LOCK/UNLOCK counters on both indexes, along
with the accesses performed inside.
Uretprobes is a known potential user.
Note this doesn't replace the default normal flavour of SRCU which
still behaves the same as usual.
- Add testing of SRCU-lite through rcutorture and rcuscale
- Various cleanups on the way.
Fixes:
- Allow short-circuiting RCU-TASKS-RUDE grace periods on
architectures that have sane noinstr boundaries forbidding tracing
on low-level idle and kernel entry code. RCU-TASKS is enough on
such configurations because it involves an RCU grace period that
waits for all idle tasks to either schedule out voluntarily or
enter into RCU unwatched noinstr code.
- Allow and test start_poll_synchronize_rcu() with IRQs disabled.
- Mention rcuog kthreads in relevant documentation and Kconfig help
- Various fixes and consolidations
rcutorture:
- Add --no-affinity on tools to leave the affinity setting of guests
up to the user.
- Add guest_os_delay parameter to rcuscale for better warm-up
control.
- Fix and improve some rcuscale error handling.
- Various cleanups and fixes
stall:
- Remove dead code
- Stop dumping tasks if a stalled grace period eventually ended
midway as that only produces confusing output.
- Optimize detection of stalling CPUs and avoid useless node locking
otherwise.
NOCB:
- Fix rcu_barrier() hang due to a race against callbacks
deoffloading. This is not yet used, except by rcutorture, and waits
for its promised cpusets interface.
- Remove leftover function declaration"
* tag 'rcu.release.v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (42 commits)
rcuscale: Remove redundant WARN_ON_ONCE() splat
rcuscale: Do a proper cleanup if kfree_scale_init() fails
srcu: Unconditionally record srcu_read_lock_lite() in ->srcu_reader_flavor
srcu: Check for srcu_read_lock_lite() across all CPUs
srcu: Remove smp_mb() from srcu_read_unlock_lite()
rcutorture: Avoid printing cpu=-1 for no-fault RCU boost failure
rcuscale: Add guest_os_delay module parameter
refscale: Correct affinity check
torture: Add --no-affinity parameter to kvm.sh
rcu/nocb: Fix missed RCU barrier on deoffloading
rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu
rcu/srcutiny: don't return before reenabling preemption
rcu-tasks: Remove open-coded one-byte cmpxchg() emulation
doc: Remove kernel-parameters.txt entry for rcutorture.read_exit
rcutorture: Test start-poll primitives with interrupts disabled
rcu: Permit start_poll_synchronize_rcu*() with interrupts disabled
rcu: Allow short-circuiting of synchronize_rcu_tasks_rude()
doc: Add rcuog kthreads to kernel-per-CPU-kthreads.rst
rcu: Add rcuog kthreads to RCU_NOCB_CPU help text
rcu: Use the BITS_PER_LONG macro
...
|
|
|
|
ba1f9c8fe3 |
arm64 updates for 6.13:
* Support for running Linux in a protected VM under the Arm Confidential
Compute Architecture (CCA)
* Guarded Control Stack user-space support. Current patches follow the
x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
patches (already on the list) will add support for clone3() allowing
finer-grained control of the shadow stack size and placement from libc
* AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
getting close with the upcoming dpISA support)
* Other arch features:
- In-kernel use of the memcpy instructions, FEAT_MOPS (previously only
exposed to user; uaccess support not merged yet)
- MTE: hugetlbfs support and the corresponding kselftests
- Optimise CRC32 using the PMULL instructions
- Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG
- Optimise the kernel TLB flushing to use the range operations
- POE/pkey (permission overlays): further cleanups after bringing the
signal handler in line with the x86 behaviour for 6.12
* arm64 perf updates:
- Support for the NXP i.MX91 PMU in the existing IMX driver
- Support for Ampere SoCs in the Designware PCIe PMU driver
- Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC
- Support for Samsung's 'Mongoose' CPU PMU
- Support for PMUv3.9 finer-grained userspace counter access control
- Switch back to platform_driver::remove() now that it returns 'void'
- Add some missing events for the CXL PMU driver
* Miscellaneous arm64 fixes/cleanups:
- Page table accessors cleanup: type updates, drop unused macros,
reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
check addresses before runtime P4D/PUD folding
- Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
FEAT_ECV for the generic timers) allowing Linux to boot with
firmware deployments that don't set SCTLR_EL3.ECVEn
- ACPI/arm64: tighten the check for the array of platform timer
structures and adjust the error handling procedure in
gtdt_parse_timer_block()
- Optimise the cache flush for the uprobes xol slot (skip if no
change) and other uprobes/kprobes cleanups
- Fix the context switching of tpidrro_el0 when kpti is enabled
- Dynamic shadow call stack fixes
- Sysreg updates
- Various arm64 kselftest improvements
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmc5POIACgkQa9axLQDI
XvEDYA//a3eeNkgMuGdnSCVcLz+zy+oNwAwboG/4X1DqL8jiCbI4npwugPx95RIA
YZOUvo9T2aL3OyefpUHll4gFHqx9OwoZIig2F70TEUmlPsGUbh0KBkdfQF3xZPdl
EwV0kHSGEqMWMBwsGJGwgCYrUaf1MUQzh1GBl7VJ2ts5XsJBaBeOyKkysij26wtZ
V+aHq2IUx7qQS7+HC/4P6IoHxKziFcsCMovaKaynP4cw9xXBQbDMcNlHEwndOMyk
pu2zrv7GG0j3KQuVP/2Alf5FKhmI0GVGP/6Nc/zsOmw96w8Kf7HfzEtkHawr2aRq
rqg/c9ivzDn1p+fUBo4ZYtrRk4IAY+yKu6hdzdLTP5+bQrBTWTO9rjQVBm9FAGYT
sCdEj1NqzvExvNHD7X6ut/GJ05lmce3K+qeSXSEysN9gqiT3eomYWMXrD2V2lxzb
rIDDcb/icfaqjt14Mksh19r/rzNeq7noj9CGSmcqw0BHZfHzl38Lai6pdfYzCNyn
vCM/c4c1D/WWX8/lifO1JZVbhDk1jy82Iphg2KEhL8iKPxDsKBBZLmYuU1oa7tMo
WryGAz9+GQwd+W9chFuaOEtMnzvW2scEJ5Eb2fEf0Qj0aEurkL+C9dZR6o1GN77V
DBUxtU628Ef4PJJGfbNCwZzdd8UPYG3a/mKfQQ3dz0oz2LySlW4=
=wDot
-----END PGP SIGNATURE-----
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for running Linux in a protected VM under the Arm
Confidential Compute Architecture (CCA)
- Guarded Control Stack user-space support. Current patches follow the
x86 ABI of implicitly creating a shadow stack on clone(). Subsequent
patches (already on the list) will add support for clone3() allowing
finer-grained control of the shadow stack size and placement from
libc
- AT_HWCAP3 support (not running out of HWCAP2 bits yet but we are
getting close with the upcoming dpISA support)
- Other arch features:
- In-kernel use of the memcpy instructions, FEAT_MOPS (previously
only exposed to user; uaccess support not merged yet)
- MTE: hugetlbfs support and the corresponding kselftests
- Optimise CRC32 using the PMULL instructions
- Support for FEAT_HAFT enabling ARCH_HAS_NONLEAF_PMD_YOUNG
- Optimise the kernel TLB flushing to use the range operations
- POE/pkey (permission overlays): further cleanups after bringing
the signal handler in line with the x86 behaviour for 6.12
- arm64 perf updates:
- Support for the NXP i.MX91 PMU in the existing IMX driver
- Support for Ampere SoCs in the Designware PCIe PMU driver
- Support for Marvell's 'PEM' PCIe PMU present in the 'Odyssey' SoC
- Support for Samsung's 'Mongoose' CPU PMU
- Support for PMUv3.9 finer-grained userspace counter access
control
- Switch back to platform_driver::remove() now that it returns
'void'
- Add some missing events for the CXL PMU driver
- Miscellaneous arm64 fixes/cleanups:
- Page table accessors cleanup: type updates, drop unused macros,
reorganise arch_make_huge_pte() and clean up pte_mkcont(), sanity
check addresses before runtime P4D/PUD folding
- Command line override for ID_AA64MMFR0_EL1.ECV (advertising the
FEAT_ECV for the generic timers) allowing Linux to boot with
firmware deployments that don't set SCTLR_EL3.ECVEn
- ACPI/arm64: tighten the check for the array of platform timer
structures and adjust the error handling procedure in
gtdt_parse_timer_block()
- Optimise the cache flush for the uprobes xol slot (skip if no
change) and other uprobes/kprobes cleanups
- Fix the context switching of tpidrro_el0 when kpti is enabled
- Dynamic shadow call stack fixes
- Sysreg updates
- Various arm64 kselftest improvements
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (168 commits)
arm64: tls: Fix context-switching of tpidrro_el0 when kpti is enabled
kselftest/arm64: Try harder to generate different keys during PAC tests
kselftest/arm64: Don't leak pipe fds in pac.exec_sign_all()
arm64/ptrace: Clarify documentation of VL configuration via ptrace
kselftest/arm64: Corrupt P0 in the irritator when testing SSVE
acpi/arm64: remove unnecessary cast
arm64/mm: Change protval as 'pteval_t' in map_range()
kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c
kselftest/arm64: Add FPMR coverage to fp-ptrace
kselftest/arm64: Expand the set of ZA writes fp-ptrace does
kselftets/arm64: Use flag bits for features in fp-ptrace assembler code
kselftest/arm64: Enable build of PAC tests with LLVM=1
kselftest/arm64: Check that SVCR is 0 in signal handlers
selftests/mm: Fix unused function warning for aarch64_write_signal_pkey()
kselftest/arm64: Fix printf() compiler warnings in the arm64 syscall-abi.c tests
kselftest/arm64: Fix printf() warning in the arm64 MTE prctl() test
kselftest/arm64: Fix printf() compiler warnings in the arm64 fp tests
kselftest/arm64: Fix build with stricter assemblers
arm64/scs: Drop unused prototype __pi_scs_patch_vmlinux()
arm64/scs: Deal with 64-bit relative offsets in FDE frames
...
|
|
|
|
d8dfba2c60 | Merge branches 'rcu/fixes', 'rcu/nocb', 'rcu/torture', 'rcu/stall' and 'rcu/srcu' into rcu/dev | |
|
|
a79993b5fc |
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR (net-6.12-rc8). Conflicts: tools/testing/selftests/net/.gitignore |
|
|
|
27184f8905 |
tpm: Opt-in in disable PCR integrity protection
The initial HMAC session feature added TPM bus encryption and/or integrity
protection to various in-kernel TPM operations. This can cause performance
bottlenecks with IMA, as it heavily utilizes PCR extend operations.
In order to mitigate this performance issue, introduce a kernel
command-line parameter to the TPM driver for disabling the integrity
protection for PCR extend operations (i.e. TPM2_PCR_Extend).
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Link: https://lore.kernel.org/linux-integrity/20241015193916.59964-1-zohar@linux.ibm.com/
Fixes:
|
|
|
|
0a116dc86d |
doc: Remove kernel-parameters.txt entry for rcutorture.read_exit
There is only ever the one read-exit task, and there is no module parameter named rcutorture.read_exit, so remove the bogus documentation. Instead, use rcutorture.read_exit_burst to enable/disable read-exit race testing. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> |
|
|
|
43349fc4d8 |
rcutorture: Add srcu_read_lock_lite() support to rcutorture.reader_flavor
This commit causes bit 0x4 of rcutorture.reader_flavor to select the new srcu_read_lock_lite() and srcu_read_unlock_lite() functions. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> |
|
|
|
95a5de2154 |
rcutorture: Add reader_flavor parameter for SRCU readers
This commit adds an rcutorture.reader_flavor parameter whose bits correspond to reader flavors. For example, SRCU's readers are 0x1 for normal and 0x2 for NMI-safe. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> |
|
|
|
12079a59ce |
net: Implement fault injection forcing skb reallocation
Introduce a fault injection mechanism to force skb reallocation. The primary goal is to catch bugs related to pointer invalidation after potential skb reallocation. The fault injection mechanism aims to identify scenarios where callers retain pointers to various headers in the skb but fail to reload these pointers after calling a function that may reallocate the data. This type of bug can lead to memory corruption or crashes if the old, now-invalid pointers are used. By forcing reallocation through fault injection, we can stress-test code paths and ensure proper pointer management after potential skb reallocations. Add a hook for fault injection in the following functions: * pskb_trim_rcsum() * pskb_may_pull_reason() * pskb_trim() As the other fault injection mechanism, protect it under a debug Kconfig called CONFIG_FAIL_SKB_REALLOC. This patch was *heavily* inspired by Jakub's proposal from: https://lore.kernel.org/all/20240719174140.47a868e6@kernel.org/ CC: Akinobu Mita <akinobu.mita@gmail.com> Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Akinobu Mita <akinobu.mita@gmail.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Guillaume Nault <gnault@redhat.com> Link: https://patch.msgid.link/20241107-fault_v6-v6-1-1b82cb6ecacd@debian.org Signed-off-by: Paolo Abeni <pabeni@redhat.com> |
|
|
|
24f9cd195f |
mm: shmem: override mTHP shmem default with a kernel parameter
Add the ``thp_shmem=`` kernel command line to allow specifying the default policy of each supported shmem hugepage size. The kernel parameter accepts the following format: thp_shmem=<size>[KMG],<size>[KMG]:<policy>;<size>[KMG]-<size>[KMG]:<policy> For example, thp_shmem=16K-64K:always;128K,512K:inherit;256K:advise;1M-2M:never;4M-8M:within_size Some GPUs may benefit from using huge pages. Since DRM GEM uses shmem to allocate anonymous pageable memory, it's essential to control the huge page allocation policy for the internal shmem mount. This control can be achieved through the ``transparent_hugepage_shmem=`` parameter. Beyond just setting the allocation policy, it's crucial to have granular control over the size of huge pages that can be allocated. The GPU may support only specific huge page sizes, and allocating pages larger/smaller than those sizes would be ineffective. Link: https://lkml.kernel.org/r/20241101165719.1074234-6-mcanal@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
9490428111 |
mm: shmem: control THP support through the kernel command line
Patch series "mm: add more kernel parameters to control mTHP", v5. This series introduces four patches related to the kernel parameters controlling mTHP and a fifth patch replacing `strcpy()` for `strscpy()` in the file `mm/huge_memory.c`. The first patch is a straightforward documentation update, correcting the format of the kernel parameter ``thp_anon=``. The second, third, and fourth patches focus on controlling THP support for shmem via the kernel command line. The second patch introduces a parameter to control the global default huge page allocation policy for the internal shmem mount. The third patch moves a piece of code to a shared header to ease the implementation of the fourth patch. Finally, the fourth patch implements a parameter similar to ``thp_anon=``, but for shmem. The goal of these changes is to simplify the configuration of systems that rely on mTHP support for shmem. For instance, a platform with a GPU that benefits from huge pages may want to enable huge pages for shmem. Having these kernel parameters streamlines the configuration process and ensures consistency across setups. This patch (of 4): Add a new kernel command line to control the hugepage allocation policy for the internal shmem mount, ``transparent_hugepage_shmem``. The parameter is similar to ``transparent_hugepage`` and has the following format: transparent_hugepage_shmem=<policy> where ``<policy>`` is one of the seven valid policies available for shmem. Configuring the default huge page allocation policy for the internal shmem mount can be beneficial for DRM GPU drivers. Just as CPU architectures, GPUs can also take advantage of huge pages, but this is possible only if DRM GEM objects are backed by huge pages. Since GEM uses shmem to allocate anonymous pageable memory, having control over the default huge page allocation policy allows for the exploration of huge pages use on GPUs that rely on GEM objects backed by shmem. Link: https://lkml.kernel.org/r/20241101165719.1074234-2-mcanal@igalia.com Link: https://lkml.kernel.org/r/20241101165719.1074234-4-mcanal@igalia.com Signed-off-by: Maíra Canal <mcanal@igalia.com> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: dri-devel@lists.freedesktop.org Cc: Hugh Dickins <hughd@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: kernel-dev@igalia.com Cc: Lance Yang <ioworker0@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
652e1a5146 |
mm: fix docs for the kernel parameter ``thp_anon=``
If we add ``thp_anon=32,64K:always`` to the kernel command line, we
will see the following error:
[ 0.000000] huge_memory: thp_anon=32,64K:always: error parsing string, ignoring setting
This happens because the correct format isn't ``thp_anon=<size>,<size>[KMG]:<state>```,
as [KMG] must follow each number to especify its unit. So, the correct
format is ``thp_anon=<size>[KMG],<size>[KMG]:<state>```.
Therefore, adjust the documentation to reflect the correct format of the
parameter ``thp_anon=``.
Link: https://lkml.kernel.org/r/20241101165719.1074234-3-mcanal@igalia.com
Fixes:
|
|
|
|
7d6094e62c |
Documentation: Improve crash_kexec_post_notifiers description
The crash_kexec_post_notifiers description could be improved a bit, by clarifying its upsides (yes, there are some!) and be more descriptive about the downsides, specially mentioning code that enables the option unconditionally, like Hyper-V[0], PowerPC (fadump)[1] and more recently, AMD SEV-SNP[2]. [0] Commit |
|
|
|
f7c80fad6c |
SLUB: Add support for per object memory policies
The old SLAB allocator used to support memory policies on a per
allocation bases. In SLUB the memory policies are applied on a
per page frame / folio bases. Doing so avoids having to check memory
policies in critical code paths for kmalloc and friends.
This worked on general well on Intel/AMD/PowerPC because the
interconnect technology is mature and can minimize the latencies
through intelligent caching even if a small object is not
placed optimally.
However, on ARM we have an emergence of new NUMA interconnect
technology based more on embedded devices. Caching of remote content
can currently be ineffective using the standard building blocks / mesh
available on that platform. Such architectures benefit if each slab
object is individually placed according to memory policies
and other restrictions.
This patch adds another kernel parameter
slab_strict_numa
If that is set then a static branch is activated that will cause
the hotpaths of the allocator to evaluate the current memory
allocation policy. Each object will be properly placed by
paying the price of extra processing and SLUB will no longer
defer to the page allocator to apply memory policies at the
folio level.
This patch improves performance of memcached running
on Ampere Altra 2P system (ARM Neoverse N1 processor)
by 3.6% due to accurate placement of small kernel objects.
Tested-by: Huang Shijie <shijie@os.amperecomputing.com>
Signed-off-by: Christoph Lameter (Ampere) <cl@gentwo.org>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
|
|
|
|
f7e1d19105 |
Documentation/tracing: Mention that RESET_ATTACK_MITIGATION can clear memory
At the 2024 Linux Plumbers Conference, I was talking with Hans de Goede about the persistent buffer to display traces from previous boots. He mentioned that UEFI can clear memory. In my own tests I have not seen this. He later informed me that it requires the config option: CONFIG_RESET_ATTACK_MITIGATION It appears that setting this will allow the memory to be cleared on boot up, which will definitely clear out the trace of the previous boot. Add this information under the trace_instance in kernel-parameters.txt to let people know that this can cause issues. Link: https://lore.kernel.org/all/20170825155019.6740-2-ard.biesheuvel@linaro.org/ Reported-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20241007131653.35837081@gandalf.local.home |
|
|
|
cbcb7edd09 |
soundwire: intel_auxdevice: add kernel parameter for mclk divider
Add a kernel parameter to work-around discrepancies between hardware and platform firmware, it's not unusual to see e.g. 38.4MHz listed in _DSD properties as the SoundWire clock source, but the hardware may be based on a 19.2 MHz mclk source. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Link: https://lore.kernel.org/r/20241004021850.9758-1-yung-chuan.liao@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> |
|
|
|
a94452112c |
arm64/idreg: Add overrride for GCS
Hook up an override for GCS, allowing it to be disabled from the command line by specifying arm64.nogcs in case there are problems. Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-17-222b78d87eee@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> |
|
|
|
f69767a1ad |
PCI: Add TLP Processing Hints (TPH) support
Add support for PCIe TLP Processing Hints (TPH) support (see PCIe r6.2, sec 6.17). Add TPH register definitions in pci_regs.h, including the TPH Requester capability register, TPH Requester control register, TPH Completer capability, and the ST fields of MSI-X entry. Introduce pcie_enable_tph() and pcie_disable_tph(), enabling drivers to toggle TPH support and configure specific ST mode as needed. Also add a new kernel parameter, "pci=notph", allowing users to disable TPH support across the entire system. Link: https://lore.kernel.org/r/20241002165954.128085-2-wei.huang2@amd.com Co-developed-by: Jing Liu <jing2.liu@intel.com> Co-developed-by: Paul Luse <paul.e.luse@linux.intel.com> Co-developed-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Jing Liu <jing2.liu@intel.com> Signed-off-by: Paul Luse <paul.e.luse@linux.intel.com> Signed-off-by: Eric Van Tassell <Eric.VanTassell@amd.com> Signed-off-by: Wei Huang <wei.huang2@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com> Reviewed-by: Somnath Kotur <somnath.kotur@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Reviewed-by: Lukas Wunner <lukas@wunner.de> |
|
|
|
3efc57369a |
x86:
* KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or deleted.
The former does not have particularly noticeable overhead, but Intel's
TDX will require the guest to re-accept private pages if they are
dropped from the secure EPT, which is a non starter. Actually,
the only reason why this is not already being done is a bug which
was never fully investigated and caused VM instability with assigned
GeForce GPUs, so allow userspace to opt into the new behavior.
* Advertise AVX10.1 to userspace (effectively prep work for the "real" AVX10
functionality that is on the horizon).
* Rework common MSR handling code to suppress errors on userspace accesses to
unsupported-but-advertised MSRs. This will allow removing (almost?) all of
KVM's exemptions for userspace access to MSRs that shouldn't exist based on
the vCPU model (the actual cleanup is non-trivial future work).
* Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC) splits the
64-bit value into the legacy ICR and ICR2 storage, whereas Intel (APICv)
stores the entire 64-bit value at the ICR offset.
* Fix a bug where KVM would fail to exit to userspace if one was triggered by
a fastpath exit handler.
* Add fastpath handling of HLT VM-Exit to expedite re-entering the guest when
there's already a pending wake event at the time of the exit.
* Fix a WARN caused by RSM entering a nested guest from SMM with invalid guest
state, by forcing the vCPU out of guest mode prior to signalling SHUTDOWN
(the SHUTDOWN hits the VM altogether, not the nested guest)
* Overhaul the "unprotect and retry" logic to more precisely identify cases
where retrying is actually helpful, and to harden all retry paths against
putting the guest into an infinite retry loop.
* Add support for yielding, e.g. to honor NEED_RESCHED, when zapping rmaps in
the shadow MMU.
* Refactor pieces of the shadow MMU related to aging SPTEs in prepartion for
adding multi generation LRU support in KVM.
* Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is enabled,
i.e. when the CPU has already flushed the RSB.
* Trace the per-CPU host save area as a VMCB pointer to improve readability
and cleanup the retrieval of the SEV-ES host save area.
* Remove unnecessary accounting of temporary nested VMCB related allocations.
* Set FINAL/PAGE in the page fault error code for EPT violations if and only
if the GVA is valid. If the GVA is NOT valid, there is no guest-side page
table walk and so stuffing paging related metadata is nonsensical.
* Fix a bug where KVM would incorrectly synthesize a nested VM-Exit instead of
emulating posted interrupt delivery to L2.
* Add a lockdep assertion to detect unsafe accesses of vmcs12 structures.
* Harden eVMCS loading against an impossible NULL pointer deref (really truly
should be impossible).
* Minor SGX fix and a cleanup.
* Misc cleanups
Generic:
* Register KVM's cpuhp and syscore callbacks when enabling virtualization in
hardware, as the sole purpose of said callbacks is to disable and re-enable
virtualization as needed.
* Enable virtualization when KVM is loaded, not right before the first VM
is created. Together with the previous change, this simplifies a
lot the logic of the callbacks, because their very existence implies
virtualization is enabled.
* Fix a bug that results in KVM prematurely exiting to userspace for coalesced
MMIO/PIO in many cases, clean up the related code, and add a testcase.
* Fix a bug in kvm_clear_guest() where it would trigger a buffer overflow _if_
the gpa+len crosses a page boundary, which thankfully is guaranteed to not
happen in the current code base. Add WARNs in more helpers that read/write
guest memory to detect similar bugs.
Selftests:
* Fix a goof that caused some Hyper-V tests to be skipped when run on bare
metal, i.e. NOT in a VM.
* Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES guest.
* Explicitly include one-off assets in .gitignore. Past Sean was completely
wrong about not being able to detect missing .gitignore entries.
* Verify userspace single-stepping works when KVM happens to handle a VM-Exit
in its fastpath.
* Misc cleanups
-----BEGIN PGP SIGNATURE-----
iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmb201AUHHBib256aW5p
QHJlZGhhdC5jb20ACgkQv/vSX3jHroOM1gf+Ij7dpCh0KwoNYlHfW2aCHAv3PqQd
cKMDSGxoCernbJEyPO/3qXNUK+p4zKedk3d92snW3mKa+cwxMdfthJ3i9d7uoNiw
7hAgcfKNHDZGqAQXhx8QcVF3wgp+diXSyirR+h1IKrGtCCmjMdNC8ftSYe6voEkw
VTVbLL+tER5H0Xo5UKaXbnXKDbQvWLXkdIqM8dtLGFGLQ2PnF/DdMP0p6HYrKf1w
B7LBu0rvqYDL8/pS82mtR3brHJXxAr9m72fOezRLEUbfUdzkTUi/b1vEe6nDCl0Q
i/PuFlARDLWuetlR0VVWKNbop/C/l4EmwCcKzFHa+gfNH3L9361Oz+NzBw==
=Q7kz
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull x86 kvm updates from Paolo Bonzini:
"x86:
- KVM currently invalidates the entirety of the page tables, not just
those for the memslot being touched, when a memslot is moved or
deleted.
This does not traditionally have particularly noticeable overhead,
but Intel's TDX will require the guest to re-accept private pages
if they are dropped from the secure EPT, which is a non starter.
Actually, the only reason why this is not already being done is a
bug which was never fully investigated and caused VM instability
with assigned GeForce GPUs, so allow userspace to opt into the new
behavior.
- Advertise AVX10.1 to userspace (effectively prep work for the
"real" AVX10 functionality that is on the horizon)
- Rework common MSR handling code to suppress errors on userspace
accesses to unsupported-but-advertised MSRs
This will allow removing (almost?) all of KVM's exemptions for
userspace access to MSRs that shouldn't exist based on the vCPU
model (the actual cleanup is non-trivial future work)
- Rework KVM's handling of x2APIC ICR, again, because AMD (x2AVIC)
splits the 64-bit value into the legacy ICR and ICR2 storage,
whereas Intel (APICv) stores the entire 64-bit value at the ICR
offset
- Fix a bug where KVM would fail to exit to userspace if one was
triggered by a fastpath exit handler
- Add fastpath handling of HLT VM-Exit to expedite re-entering the
guest when there's already a pending wake event at the time of the
exit
- Fix a WARN caused by RSM entering a nested guest from SMM with
invalid guest state, by forcing the vCPU out of guest mode prior to
signalling SHUTDOWN (the SHUTDOWN hits the VM altogether, not the
nested guest)
- Overhaul the "unprotect and retry" logic to more precisely identify
cases where retrying is actually helpful, and to harden all retry
paths against putting the guest into an infinite retry loop
- Add support for yielding, e.g. to honor NEED_RESCHED, when zapping
rmaps in the shadow MMU
- Refactor pieces of the shadow MMU related to aging SPTEs in
prepartion for adding multi generation LRU support in KVM
- Don't stuff the RSB after VM-Exit when RETPOLINE=y and AutoIBRS is
enabled, i.e. when the CPU has already flushed the RSB
- Trace the per-CPU host save area as a VMCB pointer to improve
readability and cleanup the retrieval of the SEV-ES host save area
- Remove unnecessary accounting of temporary nested VMCB related
allocations
- Set FINAL/PAGE in the page fault error code for EPT violations if
and only if the GVA is valid. If the GVA is NOT valid, there is no
guest-side page table walk and so stuffing paging related metadata
is nonsensical
- Fix a bug where KVM would incorrectly synthesize a nested VM-Exit
instead of emulating posted interrupt delivery to L2
- Add a lockdep assertion to detect unsafe accesses of vmcs12
structures
- Harden eVMCS loading against an impossible NULL pointer deref
(really truly should be impossible)
- Minor SGX fix and a cleanup
- Misc cleanups
Generic:
- Register KVM's cpuhp and syscore callbacks when enabling
virtualization in hardware, as the sole purpose of said callbacks
is to disable and re-enable virtualization as needed
- Enable virtualization when KVM is loaded, not right before the
first VM is created
Together with the previous change, this simplifies a lot the logic
of the callbacks, because their very existence implies
virtualization is enabled
- Fix a bug that results in KVM prematurely exiting to userspace for
coalesced MMIO/PIO in many cases, clean up the related code, and
add a testcase
- Fix a bug in kvm_clear_guest() where it would trigger a buffer
overflow _if_ the gpa+len crosses a page boundary, which thankfully
is guaranteed to not happen in the current code base. Add WARNs in
more helpers that read/write guest memory to detect similar bugs
Selftests:
- Fix a goof that caused some Hyper-V tests to be skipped when run on
bare metal, i.e. NOT in a VM
- Add a regression test for KVM's handling of SHUTDOWN for an SEV-ES
guest
- Explicitly include one-off assets in .gitignore. Past Sean was
completely wrong about not being able to detect missing .gitignore
entries
- Verify userspace single-stepping works when KVM happens to handle a
VM-Exit in its fastpath
- Misc cleanups"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (127 commits)
Documentation: KVM: fix warning in "make htmldocs"
s390: Enable KVM_S390_UCONTROL config in debug_defconfig
selftests: kvm: s390: Add VM run test case
KVM: SVM: let alternatives handle the cases when RSB filling is required
KVM: VMX: Set PFERR_GUEST_{FINAL,PAGE}_MASK if and only if the GVA is valid
KVM: x86/mmu: Use KVM_PAGES_PER_HPAGE() instead of an open coded equivalent
KVM: x86/mmu: Add KVM_RMAP_MANY to replace open coded '1' and '1ul' literals
KVM: x86/mmu: Fold mmu_spte_age() into kvm_rmap_age_gfn_range()
KVM: x86/mmu: Morph kvm_handle_gfn_range() into an aging specific helper
KVM: x86/mmu: Honor NEED_RESCHED when zapping rmaps and blocking is allowed
KVM: x86/mmu: Add a helper to walk and zap rmaps for a memslot
KVM: x86/mmu: Plumb a @can_yield parameter into __walk_slot_rmaps()
KVM: x86/mmu: Move walk_slot_rmaps() up near for_each_slot_rmap_range()
KVM: x86/mmu: WARN on MMIO cache hit when emulating write-protected gfn
KVM: x86/mmu: Detect if unprotect will do anything based on invalid_list
KVM: x86/mmu: Subsume kvm_mmu_unprotect_page() into the and_retry() version
KVM: x86: Rename reexecute_instruction()=>kvm_unprotect_and_retry_on_failure()
KVM: x86: Update retry protection fields when forcing retry on emulation failure
KVM: x86: Apply retry protection to "unprotect on failure" path
KVM: x86: Check EMULTYPE_WRITE_PF_TO_SP before unprotecting gfn
...
|
|
|
|
af9c191ac2 |
ring-buffer: Updates for v6.12:
- Merged v6.11-rc3 into trace/ring-buffer/core
The v6.10 ring buffer pull request was not made due to Mathieu Desnoyers
making a comment to the pull request. Mathieu and I resolved it on IRC,
but we did not let Linus know that it was resolved. Linus did not do the
pull thinking it still had some unresolved issues.
The ring buffer work for 6.12 was dependent on both this pull request as
well as the reserve_mem kernel command line option that was going upstream
through the memory management tree. The ring buffer repo was being used by
others so it could not be rebased. In order to continue the work, the
v6.11-rc3 branch was pulled in to get access to the reserve_mem work.
This has the 6.11 pull request that did not make it into 6.11, which was:
tracing/ring-buffer: Have persistent buffer across reboots
This allows for the tracing instance ring buffer to stay persistent across
reboots. The way this is done is by adding to the kernel command line:
trace_instance=boot_map@0x285400000:12M
This will reserve 12 megabytes at the address 0x285400000, and then map
the tracing instance "boot_map" ring buffer to that memory. This will
appear as a normal instance in the tracefs system:
/sys/kernel/tracing/instances/boot_map
A user could enable tracing in that instance, and on reboot or kernel
crash, if the memory is not wiped by the firmware, it will recreate the
trace in that instance. For example, if one was debugging a shutdown of a
kernel reboot:
# cd /sys/kernel/tracing
# echo function > instances/boot_map/current_tracer
# reboot
[..]
# cd /sys/kernel/tracing
# tail instances/boot_map/trace
swapper/0-1 [000] d..1. 164.549800: restore_boot_irq_mode <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549801: native_restore_boot_irq_mode <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549802: disconnect_bsp_APIC <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549811: hpet_disable <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549812: iommu_shutdown_noop <-native_machine_restart
swapper/0-1 [000] d..1. 164.549813: native_machine_emergency_restart <-__do_sys_reboot
swapper/0-1 [000] d..1. 164.549813: tboot_shutdown <-native_machine_emergency_restart
swapper/0-1 [000] d..1. 164.549820: acpi_reboot <-native_machine_emergency_restart
swapper/0-1 [000] d..1. 164.549821: acpi_reset <-acpi_reboot
swapper/0-1 [000] d..1. 164.549822: acpi_os_write_port <-acpi_reboot
On reboot, the buffer is examined to make sure it is valid. The validation
check even steps through every event to make sure the meta data of the
event is correct. If any test fails, it will simply reset the buffer, and
the buffer will be empty on boot.
The new changes for 6.12 are:
- Allow the tracing persistent boot buffer to use the "reserve_mem" option
Instead of having the admin find a physical address to store the persistent
buffer, which can be very tedious if they have to administrate several
different machines, allow them to use the "reserve_mem" option that will
find a location for them. It is not as reliable because of KASLR, as the
loading of the kernel in different locations can cause the memory
allocated to be inconsistent. Booting with "nokaslr" can make reserve_mem
more reliable.
- Have function graph tracer handle offsets from a previous boot.
The ring buffer output from a previous boot may have different addresses
due to kaslr. Have the function graph tracer handle these by using the
delta from the previous boot to the new boot address space.
- Only reset the saved meta offset when the buffer is started or reset
In the persistent memory meta data, it holds the previous address space
information, so that it can calculate the delta to have function tracing
work. But this gets updated after being read to hold the new address
space. But if the buffer isn't used for that boot, on reboot, the delta is
now calculated from the previous boot and not the boot that holds the data
in the ring buffer. This causes the functions not to be shown. Do not save
the address space information of the current kernel until it is being
recorded.
- Add a magic variable to test the valid meta data
Add a magic variable in the meta data that can also be used for
validation. The validator of the previous buffer doesn't need this magic
data, but it can be used if the meta data is changed by a new kernel, which
may have the same format that passes the validator but is used
differently. This magic number can also be used as a "versioning" of the
meta data.
- Align user space mapped ring buffer sub buffers to improve TLB entries
Linus mentioned that the mapped ring buffer sub buffers were misaligned
between the meta page and the sub-buffers, so that if the sub-buffers were
bigger than PAGE_SIZE, it wouldn't allow the TLB to use bigger entries.
- Add new kernel command line "traceoff" to disable tracing on boot for instances
If tracing is enabled for a boot instance, there needs a way to be able to
disable it on boot so that new events do not get entered into the ring
buffer and be mixed with events from a previous boot, as that can be
confusing.
- Allow trace_printk() to go to other instances
Currently, trace_printk() can only go to the top level instance. When
debugging with a persistent buffer, it is really useful to be able to add
trace_printk() to go to that buffer, so that you have access to them after
a crash.
- Do not use "bin_printk()" for traces to a boot instance
The bin_printk() saves only a pointer to the printk format in the ring
buffer, as the reader of the buffer can still have access to it. But this
is not the case if the buffer is from a previous boot. If the
trace_printk() is going to a "persistent" buffer, it will use the slower
version that writes the printk format into the buffer.
- Add command line option to allow trace_printk() to go to an instance
Allow the kernel command line to define which instance the trace_printk()
goes to, instead of forcing the admin to set it for every boot via the
tracefs options.
- Start a document that explains how to use tracefs to debug the kernel
- Add some more kernel selftests to test user mapped ring buffer
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZu/PxxQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6qowiAQCx86Nm48aCACjrvGWCFb+jgQZn8QdO
MeK15Fcc5C3b5gEAkJkDKqtul7ybI9+vq+3yNzdl7pO7Y7+pCNzz3PfVaQA=
=Ce81
-----END PGP SIGNATURE-----
Merge tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull ring-buffer updates from Steven Rostedt:
- tracing/ring-buffer: persistent buffer across reboots
This allows for the tracing instance ring buffer to stay persistent
across reboots. The way this is done is by adding to the kernel
command line:
trace_instance=boot_map@0x285400000:12M
This will reserve 12 megabytes at the address 0x285400000, and then
map the tracing instance "boot_map" ring buffer to that memory. This
will appear as a normal instance in the tracefs system:
/sys/kernel/tracing/instances/boot_map
A user could enable tracing in that instance, and on reboot or kernel
crash, if the memory is not wiped by the firmware, it will recreate
the trace in that instance. For example, if one was debugging a
shutdown of a kernel reboot:
# cd /sys/kernel/tracing
# echo function > instances/boot_map/current_tracer
# reboot
[..]
# cd /sys/kernel/tracing
# tail instances/boot_map/trace
swapper/0-1 [000] d..1. 164.549800: restore_boot_irq_mode <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549801: native_restore_boot_irq_mode <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549802: disconnect_bsp_APIC <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549811: hpet_disable <-native_machine_shutdown
swapper/0-1 [000] d..1. 164.549812: iommu_shutdown_noop <-native_machine_restart
swapper/0-1 [000] d..1. 164.549813: native_machine_emergency_restart <-__do_sys_reboot
swapper/0-1 [000] d..1. 164.549813: tboot_shutdown <-native_machine_emergency_restart
swapper/0-1 [000] d..1. 164.549820: acpi_reboot <-native_machine_emergency_restart
swapper/0-1 [000] d..1. 164.549821: acpi_reset <-acpi_reboot
swapper/0-1 [000] d..1. 164.549822: acpi_os_write_port <-acpi_reboot
On reboot, the buffer is examined to make sure it is valid. The
validation check even steps through every event to make sure the meta
data of the event is correct. If any test fails, it will simply reset
the buffer, and the buffer will be empty on boot.
- Allow the tracing persistent boot buffer to use the "reserve_mem"
option
Instead of having the admin find a physical address to store the
persistent buffer, which can be very tedious if they have to
administrate several different machines, allow them to use the
"reserve_mem" option that will find a location for them. It is not as
reliable because of KASLR, as the loading of the kernel in different
locations can cause the memory allocated to be inconsistent. Booting
with "nokaslr" can make reserve_mem more reliable.
- Have function graph tracer handle offsets from a previous boot.
The ring buffer output from a previous boot may have different
addresses due to kaslr. Have the function graph tracer handle these
by using the delta from the previous boot to the new boot address
space.
- Only reset the saved meta offset when the buffer is started or reset
In the persistent memory meta data, it holds the previous address
space information, so that it can calculate the delta to have
function tracing work. But this gets updated after being read to hold
the new address space. But if the buffer isn't used for that boot, on
reboot, the delta is now calculated from the previous boot and not
the boot that holds the data in the ring buffer. This causes the
functions not to be shown. Do not save the address space information
of the current kernel until it is being recorded.
- Add a magic variable to test the valid meta data
Add a magic variable in the meta data that can also be used for
validation. The validator of the previous buffer doesn't need this
magic data, but it can be used if the meta data is changed by a new
kernel, which may have the same format that passes the validator but
is used differently. This magic number can also be used as a
"versioning" of the meta data.
- Align user space mapped ring buffer sub buffers to improve TLB
entries
Linus mentioned that the mapped ring buffer sub buffers were
misaligned between the meta page and the sub-buffers, so that if the
sub-buffers were bigger than PAGE_SIZE, it wouldn't allow the TLB to
use bigger entries.
- Add new kernel command line "traceoff" to disable tracing on boot for
instances
If tracing is enabled for a boot instance, there needs a way to be
able to disable it on boot so that new events do not get entered into
the ring buffer and be mixed with events from a previous boot, as
that can be confusing.
- Allow trace_printk() to go to other instances
Currently, trace_printk() can only go to the top level instance. When
debugging with a persistent buffer, it is really useful to be able to
add trace_printk() to go to that buffer, so that you have access to
them after a crash.
- Do not use "bin_printk()" for traces to a boot instance
The bin_printk() saves only a pointer to the printk format in the
ring buffer, as the reader of the buffer can still have access to it.
But this is not the case if the buffer is from a previous boot. If
the trace_printk() is going to a "persistent" buffer, it will use the
slower version that writes the printk format into the buffer.
- Add command line option to allow trace_printk() to go to an instance
Allow the kernel command line to define which instance the
trace_printk() goes to, instead of forcing the admin to set it for
every boot via the tracefs options.
- Start a document that explains how to use tracefs to debug the kernel
- Add some more kernel selftests to test user mapped ring buffer
* tag 'trace-ring-buffer-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (28 commits)
selftests/ring-buffer: Handle meta-page bigger than the system
selftests/ring-buffer: Verify the entire meta-page padding
tracing/Documentation: Start a document on how to debug with tracing
tracing: Add option to set an instance to be the trace_printk destination
tracing: Have trace_printk not use binary prints if boot buffer
tracing: Allow trace_printk() to go to other instance buffers
tracing: Add "traceoff" flag to boot time tracing instances
ring-buffer: Align meta-page to sub-buffers for improved TLB usage
ring-buffer: Add magic and struct size to boot up meta data
ring-buffer: Don't reset persistent ring-buffer meta saved addresses
tracing/fgraph: Have fgraph handle previous boot function addresses
tracing: Allow boot instances to use reserve_mem boot memory
tracing: Fix ifdef of snapshots to not prevent last_boot_info file
ring-buffer: Use vma_pages() helper function
tracing: Fix NULL vs IS_ERR() check in enable_instances()
tracing: Add last boot delta offset for stack traces
tracing: Update function tracing output for previous boot buffer
tracing: Handle old buffer mappings for event strings and functions
tracing/ring-buffer: Add last_boot_info file to boot instance
ring-buffer: Save text and data locations in mapped meta data
...
|
|
|
|
617a814f14 |
ALong with the usual shower of singleton patches, notable patch series in
this pull request are:
"Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
"Some cleanups for shmem" from Baolin Wang. No functional changes - mode
code reuse, better function naming, logic simplifications.
"mm: some small page fault cleanups" from Josef Bacik. No functional
changes - code cleanups only.
"Various memory tiering fixes" from Zi Yan. A small fix and a little
cleanup.
"mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
"Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This
is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at all
used 16k. Useful for some system tuning things, but partivularly useful
for "the dynamic kernel stack project".
"kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov.
Teaches kmemleak to detect leaksage of percpu memory.
"mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
"mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David
Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work
correctly by design rather than by accident.
"mm: remove arch_make_page_accessible()" from David Hildenbrand. Some
folio conversions which make arch_make_page_accessible() unneeded.
"mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel.
Cleans up and fixes our handling of the resetting of the cgroup/process
peak-memory-use detector.
"Make core VMA operations internal and testable" from Lorenzo Stoakes.
Rationalizaion and encapsulation of the VMA manipulation APIs. With a
view to better enable testing of the VMA functions, even from a
userspace-only harness.
"mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in
the zswap global shrinker, resulting in improved performance.
"mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in
some missing info in /proc/zoneinfo.
"mm: replace follow_page() by folio_walk" from David Hildenbrand. Code
cleanups and rationalizations (conversion to folio_walk()) resulting in
the removal of follow_page().
"improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some
tuning to improve zswap's dynamic shrinker. Significant reductions in
swapin and improvements in performance are shown.
"mm: Fix several issues with unaccepted memory" from Kirill Shutemov.
Improvements to the new unaccepted memory feature,
"mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX
PUDs. This was missing, although nobody seems to have notied yet.
"Introduce a store type enum for the Maple tree" from Sidhartha Kumar.
Cleanups and modest performance improvements for the maple tree library
code.
"memcg: further decouple v1 code from v2" from Shakeel Butt. Move more
cgroup v1 remnants away from the v2 memcg code.
"memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds
various warnings telling users that memcg v1 features are deprecated.
"mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li.
Greatly improves the success rate of the mTHP swap allocation.
"mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate
per-arch implementations of numa_memblk code into generic code.
"mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
"support large folio swap-out and swap-in for shmem" from Baolin Wang.
With this series we no longer split shmem large folios into simgle-page
folios when swapping out shmem.
"mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance
improvements and code reductions for gigantic folios.
"support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
"mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
"Increase the number of bits available in page_type" from Matthew Wilcox.
Increases the number of bits available in page_type!
"Simplify the page flags a little" from Matthew Wilcox. Many legacy page
flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
"mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An
optimization which permits us to avoid writing/reading zero-filled zswap
pages to backing store.
"Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window
which occurs when a MAP_FIXED operqtion is occurring during an unrelated
vma tree walk.
"mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the
vma_merge() functionality, making ot cleaner, more testable and better
tested.
"misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor
fixups of DAMON selftests and kunit tests.
"mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code
cleanups and folio conversions.
"Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups
for shmem controls and stats.
"mm: count the number of anonymous THPs per size" from Barry Song. Expose
additional anon THP stats to userspace for improved tuning.
"mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio
conversions and removal of now-unused page-based APIs.
"replace per-quota region priorities histogram buffer with per-context
one" from SeongJae Park. DAMON histogram rationalization.
"Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae
Park. DAMON documentation updates.
"mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve
related doc and warn" from Jason Wang: fixes usage of page allocator
__GFP_NOFAIL and GFP_ATOMIC flags.
"mm: split underused THPs" from Yu Zhao. Improve THP=always policy - this
was overprovisioning THPs in sparsely accessed memory areas.
"zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add
support for zram run-time compression algorithm tuning.
"mm: Care about shadow stack guard gap when getting an unmapped area" from
Mark Brown. Fix up the various arch_get_unmapped_area() implementations
to better respect guard areas.
"Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of
mem_cgroup_iter() and various code cleanups.
"mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
"resource: Fix region_intersects() vs add_memory_driver_managed()" from
Huang Ying. Fix a bug in region_intersects() for systems with CXL memory.
"mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a
couple more code paths to correctly recover from the encountering of
poisoned memry.
"mm: enable large folios swap-in support" from Barry Song. Support the
swapin of mTHP memory into appropriately-sized folios, rather than into
single-page folios.
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA
jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7
AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo=
=s0T+
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Along with the usual shower of singleton patches, notable patch series
in this pull request are:
- "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds
consistency to the APIs and behaviour of these two core allocation
functions. This also simplifies/enables Rustification.
- "Some cleanups for shmem" from Baolin Wang. No functional changes -
mode code reuse, better function naming, logic simplifications.
- "mm: some small page fault cleanups" from Josef Bacik. No
functional changes - code cleanups only.
- "Various memory tiering fixes" from Zi Yan. A small fix and a
little cleanup.
- "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and
simplifications and .text shrinkage.
- "Kernel stack usage histogram" from Pasha Tatashin and Shakeel
Butt. This is a feature, it adds new feilds to /proc/vmstat such as
$ grep kstack /proc/vmstat
kstack_1k 3
kstack_2k 188
kstack_4k 11391
kstack_8k 243
kstack_16k 0
which tells us that 11391 processes used 4k of stack while none at
all used 16k. Useful for some system tuning things, but
partivularly useful for "the dynamic kernel stack project".
- "kmemleak: support for percpu memory leak detect" from Pavel
Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory.
- "mm: memcg: page counters optimizations" from Roman Gushchin. "3
independent small optimizations of page counters".
- "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from
David Hildenbrand. Improves PTE/PMD splitlock detection, makes
powerpc/8xx work correctly by design rather than by accident.
- "mm: remove arch_make_page_accessible()" from David Hildenbrand.
Some folio conversions which make arch_make_page_accessible()
unneeded.
- "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David
Finkel. Cleans up and fixes our handling of the resetting of the
cgroup/process peak-memory-use detector.
- "Make core VMA operations internal and testable" from Lorenzo
Stoakes. Rationalizaion and encapsulation of the VMA manipulation
APIs. With a view to better enable testing of the VMA functions,
even from a userspace-only harness.
- "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix
issues in the zswap global shrinker, resulting in improved
performance.
- "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill
in some missing info in /proc/zoneinfo.
- "mm: replace follow_page() by folio_walk" from David Hildenbrand.
Code cleanups and rationalizations (conversion to folio_walk())
resulting in the removal of follow_page().
- "improving dynamic zswap shrinker protection scheme" from Nhat
Pham. Some tuning to improve zswap's dynamic shrinker. Significant
reductions in swapin and improvements in performance are shown.
- "mm: Fix several issues with unaccepted memory" from Kirill
Shutemov. Improvements to the new unaccepted memory feature,
- "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on
DAX PUDs. This was missing, although nobody seems to have notied
yet.
- "Introduce a store type enum for the Maple tree" from Sidhartha
Kumar. Cleanups and modest performance improvements for the maple
tree library code.
- "memcg: further decouple v1 code from v2" from Shakeel Butt. Move
more cgroup v1 remnants away from the v2 memcg code.
- "memcg: initiate deprecation of v1 features" from Shakeel Butt.
Adds various warnings telling users that memcg v1 features are
deprecated.
- "mm: swap: mTHP swap allocator base on swap cluster order" from
Chris Li. Greatly improves the success rate of the mTHP swap
allocation.
- "mm: introduce numa_memblks" from Mike Rapoport. Moves various
disparate per-arch implementations of numa_memblk code into generic
code.
- "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly
improves the performance of munmap() of swap-filled ptes.
- "support large folio swap-out and swap-in for shmem" from Baolin
Wang. With this series we no longer split shmem large folios into
simgle-page folios when swapping out shmem.
- "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice
performance improvements and code reductions for gigantic folios.
- "support shmem mTHP collapse" from Baolin Wang. Adds support for
khugepaged's collapsing of shmem mTHP folios.
- "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect()
performance regression due to the addition of mseal().
- "Increase the number of bits available in page_type" from Matthew
Wilcox. Increases the number of bits available in page_type!
- "Simplify the page flags a little" from Matthew Wilcox. Many legacy
page flags are now folio flags, so the page-based flags and their
accessors/mutators can be removed.
- "mm: store zero pages to be swapped out in a bitmap" from Usama
Arif. An optimization which permits us to avoid writing/reading
zero-filled zswap pages to backing store.
- "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race
window which occurs when a MAP_FIXED operqtion is occurring during
an unrelated vma tree walk.
- "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of
the vma_merge() functionality, making ot cleaner, more testable and
better tested.
- "misc fixups for DAMON {self,kunit} tests" from SeongJae Park.
Minor fixups of DAMON selftests and kunit tests.
- "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang.
Code cleanups and folio conversions.
- "Shmem mTHP controls and stats improvements" from Ryan Roberts.
Cleanups for shmem controls and stats.
- "mm: count the number of anonymous THPs per size" from Barry Song.
Expose additional anon THP stats to userspace for improved tuning.
- "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more
folio conversions and removal of now-unused page-based APIs.
- "replace per-quota region priorities histogram buffer with
per-context one" from SeongJae Park. DAMON histogram
rationalization.
- "Docs/damon: update GitHub repo URLs and maintainer-profile" from
SeongJae Park. DAMON documentation updates.
- "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and
improve related doc and warn" from Jason Wang: fixes usage of page
allocator __GFP_NOFAIL and GFP_ATOMIC flags.
- "mm: split underused THPs" from Yu Zhao. Improve THP=always policy.
This was overprovisioning THPs in sparsely accessed memory areas.
- "zram: introduce custom comp backends API" frm Sergey Senozhatsky.
Add support for zram run-time compression algorithm tuning.
- "mm: Care about shadow stack guard gap when getting an unmapped
area" from Mark Brown. Fix up the various arch_get_unmapped_area()
implementations to better respect guard areas.
- "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability
of mem_cgroup_iter() and various code cleanups.
- "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge
pfnmap support.
- "resource: Fix region_intersects() vs add_memory_driver_managed()"
from Huang Ying. Fix a bug in region_intersects() for systems with
CXL memory.
- "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches
a couple more code paths to correctly recover from the encountering
of poisoned memry.
- "mm: enable large folios swap-in support" from Barry Song. Support
the swapin of mTHP memory into appropriately-sized folios, rather
than into single-page folios"
* tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits)
zram: free secondary algorithms names
uprobes: turn xol_area->pages[2] into xol_area->page
uprobes: introduce the global struct vm_special_mapping xol_mapping
Revert "uprobes: use vm_special_mapping close() functionality"
mm: support large folios swap-in for sync io devices
mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios
mm: fix swap_read_folio_zeromap() for large folios with partial zeromap
mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries
set_memory: add __must_check to generic stubs
mm/vma: return the exact errno in vms_gather_munmap_vmas()
memcg: cleanup with !CONFIG_MEMCG_V1
mm/show_mem.c: report alloc tags in human readable units
mm: support poison recovery from copy_present_page()
mm: support poison recovery from do_cow_fault()
resource, kunit: add test case for region_intersects()
resource: make alloc_free_mem_region() works for iomem_resource
mm: z3fold: deprecate CONFIG_Z3FOLD
vfio/pci: implement huge_fault support
mm/arm64: support large pfn mappings
mm/x86: support large pfn mappings
...
|
|
|
|
eec91e22fe |
IOMMU Updates for Linux v6.12
Including:
- Core changes:
- Allow ATS on VF when parent device is identity mapped.
- Optimize unmap path on ARM io-pagetable implementation.
- Use of_property_present().
- ARM-SMMU changes:
- SMMUv2:
- Devicetree binding updates for Qualcomm MMU-500 implementations.
- Extend workarounds for broken Qualcomm hypervisor to avoid
touching features that are not available (e.g. 16KiB page
support, reserved context banks).
- SMMUv3:
- Support for NVIDIA's custom virtual command queue hardware.
- Fix Stage-2 stall configuration and extend tests to cover this
area.
- A bunch of driver cleanups, including simplification of the
master rbtree code.
- Plus minor cleanups and fixes across both drivers.
- Intel VT-d changes:
- Retire si_domain and convert to use static identity domain.
- Batched IOTLB/dev-IOTLB invalidation.
- Small code refactoring and cleanups.
- AMD-Vi changes:
- Cleanup and refactoring of io-pagetable code.
- Add parameter to limit the used io-pagesizes.
- Other cleanups and fixes.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEr9jSbILcajRFYWYyK/BELZcBGuMFAmboAtoACgkQK/BELZcB
GuNidQ//WOhwVQZmdS6vnU2vu//LwFE7Q7PsRYPW2QhFri1eurKo6jxMNBtgUsXu
fPTSEBM7/lhagRgb29ycrbOYoavkEnUiIMX7vRsjl9tVkqd/GKNTrMuUC+QPiBYQ
ASkStmEUW6Zvye4rWyUxiCJIFIA5wm74wSOOQ6X2Wg3WMo51njrj1DK/k2H5JenJ
RTmIA9Ynef2py38xWDd0UE/psvKrzA5uug4IP0E0v014i36cSEVrH7hjztMfd8Sc
2dUuJ8eUUtLTo1ffTcmxoTvUBjBzJOzeSQrFfaDZDgyqayt6JoSKeX1DV/nCI8kc
ftg0pe37Zr3mndgQC7wNyUO1GOmkJl+GpMFyJTG8wpnBc0tr+TDn1o6QERymcRxA
kn62n4vxxjWoRSKt3di7hNM0Uuwj8/z/cIbDSTNbSov4fDuuz0xppdcA/ewKATv0
VgmpP5OyIFZXM+mR4Vem2hZQQ3wPOsJAFVWS1ROtYQFgiimrGf+w9et8rEU4pmp5
Ve4rSmka60NLdE6i1JNqx4sRrRsdJJ55knI77nHrt0TZkbMzA/JG1UT3TbbMJTtd
v5dviMMOXLpcKQLgqlde8QWOEjT6VUw/fbU640iyzhrWAm8fWDBefrSv6JLhevQ4
fBajoaej89cd9DkkEJiSTiyGig8QkY3HFaqDo3u5g/sBBrMBZas=
=1QvI
-----END PGP SIGNATURE-----
Merge tag 'iommu-updates-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu updates from Joerg Roedel:
"Core changes:
- Allow ATS on VF when parent device is identity mapped
- Optimize unmap path on ARM io-pagetable implementation
- Use of_property_present()
ARM-SMMU changes:
- SMMUv2:
- Devicetree binding updates for Qualcomm MMU-500 implementations
- Extend workarounds for broken Qualcomm hypervisor to avoid
touching features that are not available (e.g. 16KiB page
support, reserved context banks)
- SMMUv3:
- Support for NVIDIA's custom virtual command queue hardware
- Fix Stage-2 stall configuration and extend tests to cover this
area
- A bunch of driver cleanups, including simplification of the
master rbtree code
- Minor cleanups and fixes across both drivers
Intel VT-d changes:
- Retire si_domain and convert to use static identity domain
- Batched IOTLB/dev-IOTLB invalidation
- Small code refactoring and cleanups
AMD-Vi changes:
- Cleanup and refactoring of io-pagetable code
- Add parameter to limit the used io-pagesizes
- Other cleanups and fixes"
* tag 'iommu-updates-v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: (77 commits)
dt-bindings: arm-smmu: Add compatible for QCS8300 SoC
iommu/amd: Test for PAGING domains before freeing a domain
iommu/amd: Fix argument order in amd_iommu_dev_flush_pasid_all()
iommu/amd: Add kernel parameters to limit V1 page-sizes
iommu/arm-smmu-v3: Reorganize struct arm_smmu_ctx_desc_cfg
iommu/arm-smmu-v3: Add types for each level of the CD table
iommu/arm-smmu-v3: Shrink the cdtab l1_desc array
iommu/arm-smmu-v3: Do not use devm for the cd table allocations
iommu/arm-smmu-v3: Remove strtab_base/cfg
iommu/arm-smmu-v3: Reorganize struct arm_smmu_strtab_cfg
iommu/arm-smmu-v3: Add types for each level of the 2 level stream table
iommu/arm-smmu-v3: Add arm_smmu_strtab_l1/2_idx()
iommu/arm-smmu-qcom: apply num_context_bank fixes for SDM630 / SDM660
iommu/arm-smmu-v3: Use the new rb tree helpers
dt-bindings: arm-smmu: document the support on SA8255p
iommu/tegra241-cmdqv: Do not allocate vcmdq until dma_set_mask_and_coherent
iommu/tegra241-cmdqv: Drop static at local variable
iommu/tegra241-cmdqv: Fix ioremap() error handling in probe()
iommu/amd: Do not set the D bit on AMD v2 table entries
iommu/amd: Correct the reported page sizes from the V1 table
...
|
|
|
|
067610ebaa |
RCU pull request for v6.12
This pull request contains the following branches:
context_tracking.15.08.24a: Rename context tracking state related
symbols and remove references to "dynticks" in various context
tracking state variables and related helpers; force
context_tracking_enabled_this_cpu() to be inlined to avoid
leaving a noinstr section.
csd.lock.15.08.24a: Enhance CSD-lock diagnostic reports; add an API
to provide an indication of ongoing CSD-lock stall.
nocb.09.09.24a: Update and simplify RCU nocb code to handle
(de-)offloading of callbacks only for offline CPUs; fix RT
throttling hrtimer being armed from offline CPU.
rcutorture.14.08.24a: Remove redundant rcu_torture_ops get_gp_completed
fields; add SRCU ->same_gp_state and ->get_comp_state
functions; add generic test for NUM_ACTIVE_*RCU_POLL* for
testing RCU and SRCU polled grace periods; add CFcommon.arch
for arch-specific Kconfig options; print number of update types
in rcu_torture_write_types();
add rcutree.nohz_full_patience_delay testing to the TREE07
scenario; add a stall_cpu_repeat module parameter to test
repeated CPU stalls; add argument to limit number of CPUs a
guest OS can use in torture.sh;
rcustall.09.09.24a: Abbreviate RCU CPU stall warnings during CSD-lock
stalls; Allow dump_cpu_task() to be called without disabling
preemption; defer printing stall-warning backtrace when holding
rcu_node lock.
srcu.12.08.24a: Make SRCU gp seq wrap-around faster; add KCSAN checks
for concurrent updates to ->srcu_n_exp_nodelay and
->reschedule_count which are used in heuristics governing
auto-expediting of normal SRCU grace periods and
grace-period-state-machine delays; mark idle SRCU-barrier
callbacks to help identify stuck SRCU-barrier callback.
rcu.tasks.14.08.24a: Remove RCU Tasks Rude asynchronous APIs as they
are no longer used; stop testing RCU Tasks Rude asynchronous
APIs; fix access to non-existent percpu regions; check
processor-ID assumptions during chosen CPU calculation for
callback enqueuing; update description of rtp->tasks_gp_seq
grace-period sequence number; add rcu_barrier_cb_is_done()
to identify whether a given rcu_barrier callback is stuck;
mark idle Tasks-RCU-barrier callbacks; add
*torture_stats_print() functions to print detailed
diagnostics for Tasks-RCU variants; capture start time of
rcu_barrier_tasks*() operation to help distinguish a hung
barrier operation from a long series of barrier operations.
rcu_scaling_tests.15.08.24a:
refscale: Add a TINY scenario to support tests of Tiny RCU
and Tiny SRCU; Optimize process_durations() operation;
rcuscale: Dump stacks of stalled rcu_scale_writer() instances;
dump grace-period statistics when rcu_scale_writer() stalls;
mark idle RCU-barrier callbacks to identify stuck RCU-barrier
callbacks; print detailed grace-period and barrier diagnostics
on rcu_scale_writer() hangs for Tasks-RCU variants; warn if
async module parameter is specified for RCU implementations
that do not have async primitives such as RCU Tasks Rude;
make all writer tasks report upon hang; tolerate repeated
GFP_KERNEL failure in rcu_scale_writer(); use special allocator
for rcu_scale_writer(); NULL out top-level pointers to heap
memory to avoid double-free bugs on modprobe failures; maintain
per-task instead of per-CPU callbacks count to avoid any issues
with migration of either tasks or callbacks; constify struct
ref_scale_ops.
fixes.12.08.24a: Use system_unbound_wq for kfree_rcu work to avoid
disturbing isolated CPUs.
misc.11.08.24a: Warn on unexpected rcu_state.srs_done_tail state;
Better define "atomic" for list_replace_rcu() and
hlist_replace_rcu() routines; annotate struct
kvfree_rcu_bulk_data with __counted_by().
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQSi2tPIQIc2VEtjarIAHS7/6Z0wpQUCZt8+8wAKCRAAHS7/6Z0w
pTqoAPwPN//tlEoJx2PRs6t0q+nD1YNvnZawPaRmdzgdM8zJogD+PiSN+XhqRr80
jzyvMDU4Aa0wjUNP3XsCoaCxo7L/lQk=
=bZ9z
-----END PGP SIGNATURE-----
Merge tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux
Pull RCU updates from Neeraj Upadhyay:
"Context tracking:
- rename context tracking state related symbols and remove references
to "dynticks" in various context tracking state variables and
related helpers
- force context_tracking_enabled_this_cpu() to be inlined to avoid
leaving a noinstr section
CSD lock:
- enhance CSD-lock diagnostic reports
- add an API to provide an indication of ongoing CSD-lock stall
nocb:
- update and simplify RCU nocb code to handle (de-)offloading of
callbacks only for offline CPUs
- fix RT throttling hrtimer being armed from offline CPU
rcutorture:
- remove redundant rcu_torture_ops get_gp_completed fields
- add SRCU ->same_gp_state and ->get_comp_state functions
- add generic test for NUM_ACTIVE_*RCU_POLL* for testing RCU and SRCU
polled grace periods
- add CFcommon.arch for arch-specific Kconfig options
- print number of update types in rcu_torture_write_types()
- add rcutree.nohz_full_patience_delay testing to the TREE07 scenario
- add a stall_cpu_repeat module parameter to test repeated CPU stalls
- add argument to limit number of CPUs a guest OS can use in
torture.sh
rcustall:
- abbreviate RCU CPU stall warnings during CSD-lock stalls
- Allow dump_cpu_task() to be called without disabling preemption
- defer printing stall-warning backtrace when holding rcu_node lock
srcu:
- make SRCU gp seq wrap-around faster
- add KCSAN checks for concurrent updates to ->srcu_n_exp_nodelay and
->reschedule_count which are used in heuristics governing
auto-expediting of normal SRCU grace periods and
grace-period-state-machine delays
- mark idle SRCU-barrier callbacks to help identify stuck
SRCU-barrier callback
rcu tasks:
- remove RCU Tasks Rude asynchronous APIs as they are no longer used
- stop testing RCU Tasks Rude asynchronous APIs
- fix access to non-existent percpu regions
- check processor-ID assumptions during chosen CPU calculation for
callback enqueuing
- update description of rtp->tasks_gp_seq grace-period sequence
number
- add rcu_barrier_cb_is_done() to identify whether a given
rcu_barrier callback is stuck
- mark idle Tasks-RCU-barrier callbacks
- add *torture_stats_print() functions to print detailed diagnostics
for Tasks-RCU variants
- capture start time of rcu_barrier_tasks*() operation to help
distinguish a hung barrier operation from a long series of barrier
operations
refscale:
- add a TINY scenario to support tests of Tiny RCU and Tiny
SRCU
- optimize process_durations() operation
rcuscale:
- dump stacks of stalled rcu_scale_writer() instances and
grace-period statistics when rcu_scale_writer() stalls
- mark idle RCU-barrier callbacks to identify stuck RCU-barrier
callbacks
- print detailed grace-period and barrier diagnostics on
rcu_scale_writer() hangs for Tasks-RCU variants
- warn if async module parameter is specified for RCU implementations
that do not have async primitives such as RCU Tasks Rude
- make all writer tasks report upon hang
- tolerate repeated GFP_KERNEL failure in rcu_scale_writer()
- use special allocator for rcu_scale_writer()
- NULL out top-level pointers to heap memory to avoid double-free
bugs on modprobe failures
- maintain per-task instead of per-CPU callbacks count to avoid any
issues with migration of either tasks or callbacks
- constify struct ref_scale_ops
Fixes:
- use system_unbound_wq for kfree_rcu work to avoid disturbing
isolated CPUs
Misc:
- warn on unexpected rcu_state.srs_done_tail state
- better define "atomic" for list_replace_rcu() and
hlist_replace_rcu() routines
- annotate struct kvfree_rcu_bulk_data with __counted_by()"
* tag 'rcu.release.v6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (90 commits)
rcu: Defer printing stall-warning backtrace when holding rcu_node lock
rcu/nocb: Remove superfluous memory barrier after bypass enqueue
rcu/nocb: Conditionally wake up rcuo if not already waiting on GP
rcu/nocb: Fix RT throttling hrtimer armed from offline CPU
rcu/nocb: Simplify (de-)offloading state machine
context_tracking: Tag context_tracking_enabled_this_cpu() __always_inline
context_tracking, rcu: Rename rcu_dyntick trace event into rcu_watching
rcu: Update stray documentation references to rcu_dynticks_eqs_{enter, exit}()
rcu: Rename rcu_momentary_dyntick_idle() into rcu_momentary_eqs()
rcu: Rename rcu_implicit_dynticks_qs() into rcu_watching_snap_recheck()
rcu: Rename dyntick_save_progress_counter() into rcu_watching_snap_save()
rcu: Rename struct rcu_data .exp_dynticks_snap into .exp_watching_snap
rcu: Rename struct rcu_data .dynticks_snap into .watching_snap
rcu: Rename rcu_dynticks_zero_in_eqs() into rcu_watching_zero_in_eqs()
rcu: Rename rcu_dynticks_in_eqs_since() into rcu_watching_snap_stopped_since()
rcu: Rename rcu_dynticks_in_eqs() into rcu_watching_snap_in_eqs()
rcu: Rename rcu_dynticks_eqs_online() into rcu_watching_online()
context_tracking, rcu: Rename rcu_dynticks_curr_cpu_in_eqs() into rcu_is_watching_curr_cpu()
context_tracking, rcu: Rename rcu_dynticks_task*() into rcu_task*()
refscale: Constify struct ref_scale_ops
...
|