linux

Commit Graph

Author	SHA1	Message	Date
Alexander Gordeev	3ace3c4214	Merge branch 'pci-device-recovery' into features Niklas Schnelle says: =================== This patch series enhances the introspectability of the PCI device recovery for firmware. Until now when Linux performs recovery in response to a firmware error report. For example, until now firmware debug data would have no indication if the recovery was successfull or if it failed, for example due to KVM pass-through. Improve on this by reporting recovery status as well as some debug information such as device driver name and s390dbf/pci_msg/sprintf logs via the SCLP Write Event Data Action Qualifier 2 (Log Data provided) mechanism. =================== Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-18 16:06:24 +01:00
Sumanth Korikkar	388cf16d90	s390/diag: Move diag.c to diag specific folder Move implementation of s390 diagnose code to diag specific folder. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-17 12:46:14 +01:00
Sumanth Korikkar	90e6f191e1	s390/diag324: Retrieve power readings via diag 0x324 Retrieve electrical power readings for resources in a computing environment via diag 0x324. diag 0x324 stores the power readings in the power information block (pib). Provide power readings from pib via diag324 ioctl interface. diag324 ioctl provides new pib to the user only if the threshold time has passed since the last call. Otherwise, cache data is returned. When there are no active readers, cleanup of pib buffer is performed. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-17 12:46:14 +01:00
Sumanth Korikkar	2478d43ed6	s390/diag: Create misc device /dev/diag Create a misc device /dev/diag to fetch diagnose specific information from the kernel and provide it to userspace. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-17 12:46:14 +01:00
Heiko Carstens	f8107a8be0	s390/mm: Simplify noexec page protection handling By default page protection definitions like PAGE_RX have the _PAGE_NOEXEC bit set. For older machines without the instruction execution protection facility this bit is not allowed to be used in page table entries, and therefore must be removed. This is done at a couple of page table walkers, but also at some but not all page table modification functions like ptep_modify_prot_commit(). Avoid all of this and change the page, segment and region3 protection definitions so that the noexec bit is masked out automatically if the instruction execution-protection facility is not available. This is similar to what also various other architectures do which had to solve the same problem. Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-17 12:46:13 +01:00
Niklas Schnelle	4c41a48f5f	s390/pci: Add pci_msg debug view to PCI report Using the newly introduced debug_dump() mechanism add formatted content of pci_debug_msg_id to the PCI report. The formatting is based on the existing sprintf format but removes caller pointer and area index and adds an column header. This will allow the platform to collect this log data together with hardware errors. This sets the reverse flag such that the newest log entries get added to the PCI report even if not all debug log entries fit. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Co-developed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-16 16:14:27 +01:00
Niklas Schnelle	dc18c81a57	s390/debug: Add a reverse mode for debug_dump() In this mode debug_dump() writes the debug log starting at the newest entry followed by earlier entries. To this end add a debug_prev_entry() helper analogous to debug_next_entry() a helper to get the latest entry which is one before the active entry and a helper to iterate either forward or backward. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Co-developed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-16 16:14:26 +01:00
Niklas Schnelle	5f952dae48	s390/debug: Add debug_dump() to write debug view to a string buffer The debug_dump() function allows to get the content of a debug log and view pair in a string buffer. One future application of this is to provide debug logs to the platform to be collected with hardware error logs during recovery. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Co-developed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-16 16:14:26 +01:00
Niklas Schnelle	460c52a57f	s390/debug: Split private data alloc/free out of file operations Split the allocation respectively freeing of file_private_info_t out of open() respectively close(). This will be used in a follow on change to access to debug views without going through the s390dbf filesystem. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-16 16:14:26 +01:00
Niklas Schnelle	7832b3047d	s390/debug: Simplify and document debug_next_entry() logic Contrary to convention debug_next_entry() returns a falsy 0 value if there are more entries and a truthy 1 value when there are no more entries. As there is only one caller just reverse this logic to be less surprising and document the behavior in a kdoc comment. Also replace the goto with an early return. In the future this allows using it in a do {} while (debug_next_entry(...)) loop. Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-16 16:14:26 +01:00
Vasily Gorbik	912a0d3523	s390: Remove __bootdata annotations from declarations For consistency, remove the `__bootdata` and `__bootdata_preserved` section annotations from variable declarations in header files. Section annotations should be applied to definitions, not declarations. This change moves the annotations to the variable definitions in the corresponding source files. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-15 16:19:04 +01:00
Alexander Gordeev	5fa49dd8e5	s390/ipl: Fix never less than zero warning DEFINE_IPL_ATTR_STR_RW() macro produces "unsigned 'len' is never less than zero." warning when sys_vmcmd_on_*_store() callbacks are defined. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202412081614.5uel8F6W-lkp@intel.com/ Fixes: `247576bf62` ("s390/ipl: Do not accept z/VM CP diag X'008' cmds longer than max length") Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-11 18:05:49 +01:00
Heiko Carstens	7ad0075005	s390/setup: Cleanup stack_alloc() and stack_free() Some small cleanups to stack_alloc() and stack_free(): - Rename ret to stack to reflect what the variable is used for - Whitespace removal Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-10 15:41:58 +01:00
Heiko Carstens	27939d6cde	s390/Kconfig: Select VMAP_STACK unconditionally There is no point in supporting !VMAP_STACK kernel builds. VMAP_STACK has proven to work since many years. Also, since KASAN_VMALLOC is supported, kernels built with !VMAP_STACK are completely untested. Therefore select VMAP_STACK unconditionally and remove all config options and code required for !VMAP_STACK builds. Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-12-10 15:41:58 +01:00
Namhyung Kim	6057b90ecc	perf/core: Export perf_exclude_event() While at it, rename the same function in s390 cpum_sf PMU. Signed-off-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Tested-by: Ravi Bangoria <ravi.bangoria@amd.com> Reviewed-by: Ravi Bangoria <ravi.bangoria@amd.com> Acked-by: Thomas Richter <tmricht@linux.ibm.com> Link: https://lore.kernel.org/r/20241203180441.1634709-2-namhyung@kernel.org	2024-12-09 15:50:31 +01:00
Ingo Molnar	bcfd5f644c	Linux 6.13-rc1 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmdM4ygeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGURgIAIpjH8kH2NS3bdqK 65MBoKZ8qstZcQyo7H68sCkMyaspvDyePznmkDrWym/FyIOVg4FQ/sXes9xxLACu 2zy9WG+bAmZvpQ/xCqJZK9WklbXwvRXW5c5i+SB1kFTMhhdLqCpwxRnaQyIVMnmO dIAtJxDr1eYpOCEmibEbVfYyj9SUhBcvk4qznV5yeW50zOYzv0OJU9BwAuxkShxV NXqMpXoy1Ye5GJ2KB8u/VEccVpywR0c6bHlvaTnPZxOBxrZF/FbVQ6PzEO+j4/aX 3TWgSa5jrVwRksnll8YqIkNSWR10u3kOLgDax/S0G8opktTFIB/EiQ84AVN0Tjme PrwJSWs= =tjyG -----END PGP SIGNATURE----- Merge tag 'v6.13-rc1' into perf/core, to refresh the branch Signed-off-by: Ingo Molnar <mingo@kernel.org>	2024-12-02 11:52:59 +01:00
Linus Torvalds	509f806f7f	more s390 updates for 6.13 merge window - Add swap entry for hugetlbfs support - Add PTE_MARKER support for hugetlbs mappings; this fixes a regression (possible page fault loop) which was introduced when support for UFFDIO_POISON for hugetlbfs was added - Add ARCH_HAS_PREEMPT_LAZY and PREEMPT_DYNAMIC support - Mark IRQ entries in entry code, so that stack tracers can filter out the non-IRQ parts of stack traces. This fixes stack depot capacity limit warnings, since without filtering the number of unique stack traces is huge - In PCI code fix leak of struct zpci_dev object, and fix potential double remove of hotplug slot - Fix pagefault_disable() / pagefault_enable() unbalance in arch_stack_user_walk_common() - A couple of inline assembly optimizations, more cmpxchg() to try_cmpxchg() conversions, and removal of usages of xchg() and cmpxchg() on one and two byte memory areas - Various other small improvements and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmdJ3WoACgkQIg7DeRsp bsJt1RAAtlkbeN4+eVeYM4vBwHvgfAY/5Ii2wdHO2qwPHqBVkRtsqrmyewE/tVCF PZsYBXDDrzyAtLMqjlNGDQ1QexNLn4BELgSIysr45mxwMq1W33BiXvb8I5uK/V/7 /TcW2s1daJKKrbk+HBA8ZTwna5SeUSoZuh9y/n9SKVC4rRkWdeL7G1RRNQtafDlg aELCo17iHDZNoHeoRStOimZqVBwko6IQqQH4DCx2S4+J6nKQBGRyzGWIkLRoUxr6 MgNLrxekWjkoqAnXM0Ztb7LYg6AS/iOuGbqg/xLi1VJSWNCIf9zLpDs++SdFHoTU n4Cj07IHR4OLQ1YB+EX2uPY7rJw0tPt0g/dgmYYi3uP88hJ7VYFOtfJx/UGlid2q 3l7wXNwtg+CJtw0Ey+21cMdmnOffxH9c3nBPahe7zK5k1GKjXDOfWEcmucG0zW5K qYI5m7vAZAX4ve1362DOgJei/1uxGuMQQZsobHpwfhcGXzLZ2AZY45Ls86nQzHua KpupybWQe70hQYk9hUw+M/ShChuH8dhnPjx51T0r/0E0BdU6Q20xLPLWx/2jRzUb FlFg7WtVw2y45eQCFPbtVsoVzDCpfpfgTw5rrDsjFf/twS0E3ubmTC1rLr4YB+5m 5cjPys/SYpQWUi3wQFTQ6dL3w0+vWXlQmTi5ChcxTZF2ytwP+yg= =cfmM -----END PGP SIGNATURE----- Merge tag 's390-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Heiko Carstens: - Add swap entry for hugetlbfs support - Add PTE_MARKER support for hugetlbs mappings; this fixes a regression (possible page fault loop) which was introduced when support for UFFDIO_POISON for hugetlbfs was added - Add ARCH_HAS_PREEMPT_LAZY and PREEMPT_DYNAMIC support - Mark IRQ entries in entry code, so that stack tracers can filter out the non-IRQ parts of stack traces. This fixes stack depot capacity limit warnings, since without filtering the number of unique stack traces is huge - In PCI code fix leak of struct zpci_dev object, and fix potential double remove of hotplug slot - Fix pagefault_disable() / pagefault_enable() unbalance in arch_stack_user_walk_common() - A couple of inline assembly optimizations, more cmpxchg() to try_cmpxchg() conversions, and removal of usages of xchg() and cmpxchg() on one and two byte memory areas - Various other small improvements and cleanups * tag 's390-6.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (27 commits) Revert "s390/mm: Allow large pages for KASAN shadow mapping" s390/spinlock: Use flag output constraint for arch_cmpxchg_niai8() s390/spinlock: Use R constraint for arch_load_niai4() s390/spinlock: Generate shorter code for arch_spin_unlock() s390/spinlock: Remove condition code clobber from arch_spin_unlock() s390/spinlock: Use symbolic names in inline assemblies s390: Support PREEMPT_DYNAMIC s390/pci: Fix potential double remove of hotplug slot s390/pci: Fix leak of struct zpci_dev when zpci_add_device() fails s390/mm/hugetlbfs: Add missing includes s390/mm: Add PTE_MARKER support for hugetlbfs mappings s390/mm: Introduce region-third and segment table swap entries s390/mm: Introduce region-third and segment table entry present bits s390/mm: Rearrange region-third and segment table entry SW bits KVM: s390: Increase size of union sca_utility to four bytes KVM: s390: Remove one byte cmpxchg() usage KVM: s390: Use try_cmpxchg() instead of cmpxchg() loops s390/ap: Replace xchg() with WRITE_ONCE() s390/mm: Allow large pages for KASAN shadow mapping s390: Add ARCH_HAS_PREEMPT_LAZY support ...	2024-11-29 10:40:52 -08:00
Linus Torvalds	fcc79e1714	Networking changes for 6.13. The most significant set of changes is the per netns RTNL. The new behavior is disabled by default, regression risk should be contained. Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its default value from PTP_1588_CLOCK_KVM, as the first is intended to be a more reliable replacement for the latter. Core ---- - Started a very large, in-progress, effort to make the RTNL lock scope per network-namespace, thus reducing the lock contention significantly in the containerized use-case, comprising: - RCU-ified some relevant slices of the FIB control path - introduce basic per netns locking helpers - namespacified the IPv4 address hash table - remove rtnl_register{,_module}() in favour of rtnl_register_many() - refactor rtnl_{new,del,set}link() moving as much validation as possible out of RTNL lock - convert all phonet doit() and dumpit() handlers to RCU - convert IPv4 addresses manipulation to per-netns RTNL - convert virtual interface creation to per-netns RTNL the per-netns lock infra is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim. - Introduce NAPI suspension, to efficiently switching between busy polling (NAPI processing suspended) and normal processing. - Migrate the IPv4 routing input, output and control path from direct ToS usage to DSCP macros. This is a work in progress to make ECN handling consistent and reliable. - Add drop reasons support to the IPv4 rotue input path, allowing better introspection in case of packets drop. - Make FIB seqnum lockless, dropping RTNL protection for read access. - Make inet{,v6} addresses hashing less predicable. - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets and timestamps Things we sprinkled into general kernel code -------------------------------------------- - Add small file operations for debugfs, to reduce the struct ops size. - Refactoring and optimization for the implementation of page_frag API, This is a preparatory work to consolidate the page_frag implementation. Netfilter --------- - Optimize set element transactions to reduce memory consumption - Extended netlink error reporting for attribute parser failure. - Make legacy xtables configs user selectable, giving users the option to configure iptables without enabling any other config. - Address a lot of false-positive RCU issues, pointed by recent CI improvements. BPF --- - Put xsk sockets on a struct diet and add various cleanups. Overall, this helps to bump performance by 12% for some workloads. - Extend BPF selftests to increase coverage of XDP features in combination with BPF cpumap. - Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it. - Extend netkit with an option to delegate skb->{mark,priority} scrubbing to its BPF program. - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs. Protocols --------- - Introduces 4-tuple hash for connected udp sockets, speeding-up significantly connected sockets lookup. - Add a fastpath for some TCP timers that usually expires after close, the socket lock contention. - Add inbound and outbound xfrm state caches to speed up state lookups. - Avoid sending MPTCP advertisements on stale subflows, reducing risks on loosing them. - Make neighbours table flushing more scalable, maintaining per device neigh lists. Driver API ---------- - Introduce a unified interface to configure transmission H/W shaping, and expose it to user-space via generic-netlink. - Add support for per-NAPI config via netlink. This makes napi configuration persistent across queues removal and re-creation. Requires driver updates, currently supported drivers are: nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice. - Add ethtool support for writing SFP / PHY firmware blocks. - Track RSS context allocation from ethtool core. - Implement support for mirroring to DSA CPU port, via TC mirror offload. - Consolidate FDB updates notification, to avoid duplicates on device-specific entries. - Expose DPLL clock quality level to the user-space. - Support master-slave PHY config via device tree. Tests and tooling ----------------- - forwarding: introduce deferred commands, to simplify the cleanup phase Drivers ------- - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic, Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the IRQs and queues to NAPI IDs, allowing busy polling and better introspection. - Ethernet high-speed NICs: - nVidia/Mellanox: - mlx5: - a large refactor to implement support for cross E-Switch scheduling - refactor H/W conter management to let it scale better - H/W GRO cleanups - Intel (100G, ice):: - adds support for ethtool reset - implement support for per TX queue H/W shaping - AMD/Solarflare: - implement per device queue stats support - Broadcom (bnxt): - improve wildcard l4proto on IPv4/IPv6 ntuple rules - Marvell Octeon: - Adds representor support for each Resource Virtualization Unit (RVU) device. - Hisilicon: - adds support for the BMC Gigabit Ethernet - IBM (EMAC): - driver cleanup and modernization - Cisco (VIC): - raise the queues number limit to 256 - Ethernet virtual: - Google vNIC: - implements page pool support - macsec: - inherit lower device's features and TSO limits when offloading - virtio_net: - enable premapped mode by default - support for XDP socket(AF_XDP) zerocopy TX - wireguard: - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger packets. - Ethernet NICs embedded and virtual: - Broadcom ASP: - enable software timestamping - Freescale: - add enetc4 PF driver - MediaTek: Airoha SoC: - implement BQL support - RealTek r8169: - enable TSO by default on r8168/r8125 - implement extended ethtool stats - Renesas AVB: - enable TX checksum offload - Synopsys (stmmac): - support header splitting for vlan tagged packets - move common code for DWMAC4 and DWXGMAC into a separate FPE module. - Add the dwmac driver support for T-HEAD TH1520 SoC - Synopsys (xpcs): - driver refactor and cleanup - TI: - icssg_prueth: add VLAN offload support - Xilinx emaclite: - adds clock support - Ethernet switches: - Microchip: - implement support for the lan969x Ethernet switch family - add LAN9646 switch support to KSZ DSA driver - Ethernet PHYs: - Marvel: 88q2x: enable auto negotiation - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2 - PTP: - Add support for the Amazon virtual clock device - Add PtP driver for s390 clocks - WiFi: - mac80211 - EHT 1024 aggregation size for transmissions - new operation to indicate that a new interface is to be added - support radio separation of multi-band devices - move wireless extension spy implementation to libiw - Broadcom: - brcmfmac: optional LPO clock support - Microchip: - add support for Atmel WILC3000 - Qualcomm (ath12k): - firmware coredump collection support - add debugfs support for a multitude of statistics - Qualcomm (ath5k): - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support - Realtek: - rtw88: 8821au and 8812au USB adapters support - rtw89: add thermal protection - rtw89: fine tune BT-coexsitence to improve user experience - rtw89: firmware secure boot for WiFi 6 chip - Bluetooth - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and 0x13d3:0x3623 - add Realtek RTL8852BE support for id Foxconn 0xe123 - add MediaTek MT7920 support for wireless module ids - btintel_pcie: add handshake between driver and firmware - btintel_pcie: add recovery mechanism - btnxpuart: add GPIO support to power save feature Signed-off-by: Paolo Abeni <pabeni@redhat.com> -----BEGIN PGP SIGNATURE----- iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmc8sukSHHBhYmVuaUBy ZWRoYXQuY29tAAoJECkkeY3MjxOkLEYQAIMM6Qjh0bh3Byr3gOS1xZzXG+APLjP4 9Jr0p3i+X53i90jvVqzeVO5FTc95MVHSKZ3kvPkDMXSLUaEJxocNHCI5Dzl/2/qL wWdpUB6/ou+jKB4Bn6Z8OvVODT7qrr0tVa9M2/fuKWrIsOU/ntIhG8EhnGddk5U/ vKPSf5PUIb81uNRnF58VusY3wrT1dEoh9VfJYxL+ST+inPxjEAMy6Y+lmlsjGaSX jrS+Pp9KYiUwl3Qt0AQs+cG4OHkJdjbnChrfosWwpkiyddO8klVq06+wX/TiSzfF b9VZtBfy/GZs3lkE1mQkcILdtX5pP3YHQdpsuxFfVI0JHVszx2ck7WdoRux/8F0v kKZsYcO7bH9I1wMFP66Ff9hIbdEQaeucK+KdDkXyPNMfP91Vzmfjii8IBxOC36Ie BbOeFUrXyTxxJ2u0vf/X9JtIq8bcrkNrSd1n1jlGPMqG3FVzsY95+Oi4qfsyeUbl lS1PlVTqPMPFdX54HnxM3y2rJjhd7iXhkvmtuXNjRFThXlOiK3maAPWlM1aZ3b8u Vjs4JFUsW0tleZG+RzANjsGjXbf7AiPUGLZt+acem0K+fcjG4i5aGIAJrxwa/ORx eG74IZRt5cOI371W7gNLGHjwnuge8tFPgOWcRP2eozNm7jvMYALBejYS7eWUTvaf THcvVM+bupEZ =GzPr -----END PGP SIGNATURE----- Merge tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most significant set of changes is the per netns RTNL. The new behavior is disabled by default, regression risk should be contained. Notably the new config knob PTP_1588_CLOCK_VMCLOCK will inherit its default value from PTP_1588_CLOCK_KVM, as the first is intended to be a more reliable replacement for the latter. Core: - Started a very large, in-progress, effort to make the RTNL lock scope per network-namespace, thus reducing the lock contention significantly in the containerized use-case, comprising: - RCU-ified some relevant slices of the FIB control path - introduce basic per netns locking helpers - namespacified the IPv4 address hash table - remove rtnl_register{,_module}() in favour of rtnl_register_many() - refactor rtnl_{new,del,set}link() moving as much validation as possible out of RTNL lock - convert all phonet doit() and dumpit() handlers to RCU - convert IPv4 addresses manipulation to per-netns RTNL - convert virtual interface creation to per-netns RTNL the per-netns lock infrastructure is guarded by the CONFIG_DEBUG_NET_SMALL_RTNL knob, disabled by default ad interim. - Introduce NAPI suspension, to efficiently switching between busy polling (NAPI processing suspended) and normal processing. - Migrate the IPv4 routing input, output and control path from direct ToS usage to DSCP macros. This is a work in progress to make ECN handling consistent and reliable. - Add drop reasons support to the IPv4 rotue input path, allowing better introspection in case of packets drop. - Make FIB seqnum lockless, dropping RTNL protection for read access. - Make inet{,v6} addresses hashing less predicable. - Allow providing timestamp OPT_ID via cmsg, to correlate TX packets and timestamps Things we sprinkled into general kernel code: - Add small file operations for debugfs, to reduce the struct ops size. - Refactoring and optimization for the implementation of page_frag API, This is a preparatory work to consolidate the page_frag implementation. Netfilter: - Optimize set element transactions to reduce memory consumption - Extended netlink error reporting for attribute parser failure. - Make legacy xtables configs user selectable, giving users the option to configure iptables without enabling any other config. - Address a lot of false-positive RCU issues, pointed by recent CI improvements. BPF: - Put xsk sockets on a struct diet and add various cleanups. Overall, this helps to bump performance by 12% for some workloads. - Extend BPF selftests to increase coverage of XDP features in combination with BPF cpumap. - Optimize and homogenize bpf_csum_diff helper for all archs and also add a batch of new BPF selftests for it. - Extend netkit with an option to delegate skb->{mark,priority} scrubbing to its BPF program. - Make the bpf_get_netns_cookie() helper available also to tc(x) BPF programs. Protocols: - Introduces 4-tuple hash for connected udp sockets, speeding-up significantly connected sockets lookup. - Add a fastpath for some TCP timers that usually expires after close, the socket lock contention. - Add inbound and outbound xfrm state caches to speed up state lookups. - Avoid sending MPTCP advertisements on stale subflows, reducing risks on loosing them. - Make neighbours table flushing more scalable, maintaining per device neigh lists. Driver API: - Introduce a unified interface to configure transmission H/W shaping, and expose it to user-space via generic-netlink. - Add support for per-NAPI config via netlink. This makes napi configuration persistent across queues removal and re-creation. Requires driver updates, currently supported drivers are: nVidia/Mellanox mlx4 and mlx5, Broadcom brcm and Intel ice. - Add ethtool support for writing SFP / PHY firmware blocks. - Track RSS context allocation from ethtool core. - Implement support for mirroring to DSA CPU port, via TC mirror offload. - Consolidate FDB updates notification, to avoid duplicates on device-specific entries. - Expose DPLL clock quality level to the user-space. - Support master-slave PHY config via device tree. Tests and tooling: - forwarding: introduce deferred commands, to simplify the cleanup phase Drivers: - Updated several drivers - Amazon vNic, Google vNic, Microsoft vNic, Intel e1000e and Broadcom Tigon3 - to use netdev-genl to link the IRQs and queues to NAPI IDs, allowing busy polling and better introspection. - Ethernet high-speed NICs: - nVidia/Mellanox: - mlx5: - a large refactor to implement support for cross E-Switch scheduling - refactor H/W conter management to let it scale better - H/W GRO cleanups - Intel (100G, ice):: - add support for ethtool reset - implement support for per TX queue H/W shaping - AMD/Solarflare: - implement per device queue stats support - Broadcom (bnxt): - improve wildcard l4proto on IPv4/IPv6 ntuple rules - Marvell Octeon: - Add representor support for each Resource Virtualization Unit (RVU) device. - Hisilicon: - add support for the BMC Gigabit Ethernet - IBM (EMAC): - driver cleanup and modernization - Cisco (VIC): - raise the queues number limit to 256 - Ethernet virtual: - Google vNIC: - implement page pool support - macsec: - inherit lower device's features and TSO limits when offloading - virtio_net: - enable premapped mode by default - support for XDP socket(AF_XDP) zerocopy TX - wireguard: - set the TSO max size to be GSO_MAX_SIZE, to aggregate larger packets. - Ethernet NICs embedded and virtual: - Broadcom ASP: - enable software timestamping - Freescale: - add enetc4 PF driver - MediaTek: Airoha SoC: - implement BQL support - RealTek r8169: - enable TSO by default on r8168/r8125 - implement extended ethtool stats - Renesas AVB: - enable TX checksum offload - Synopsys (stmmac): - support header splitting for vlan tagged packets - move common code for DWMAC4 and DWXGMAC into a separate FPE module. - add dwmac driver support for T-HEAD TH1520 SoC - Synopsys (xpcs): - driver refactor and cleanup - TI: - icssg_prueth: add VLAN offload support - Xilinx emaclite: - add clock support - Ethernet switches: - Microchip: - implement support for the lan969x Ethernet switch family - add LAN9646 switch support to KSZ DSA driver - Ethernet PHYs: - Marvel: 88q2x: enable auto negotiation - Microchip: add support for LAN865X Rev B1 and LAN867X Rev C1/C2 - PTP: - Add support for the Amazon virtual clock device - Add PtP driver for s390 clocks - WiFi: - mac80211 - EHT 1024 aggregation size for transmissions - new operation to indicate that a new interface is to be added - support radio separation of multi-band devices - move wireless extension spy implementation to libiw - Broadcom: - brcmfmac: optional LPO clock support - Microchip: - add support for Atmel WILC3000 - Qualcomm (ath12k): - firmware coredump collection support - add debugfs support for a multitude of statistics - Qualcomm (ath5k): - Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support - Realtek: - rtw88: 8821au and 8812au USB adapters support - rtw89: add thermal protection - rtw89: fine tune BT-coexsitence to improve user experience - rtw89: firmware secure boot for WiFi 6 chip - Bluetooth - add Qualcomm WCN785x support for ids Foxconn 0xe0fc/0xe0f3 and 0x13d3:0x3623 - add Realtek RTL8852BE support for id Foxconn 0xe123 - add MediaTek MT7920 support for wireless module ids - btintel_pcie: add handshake between driver and firmware - btintel_pcie: add recovery mechanism - btnxpuart: add GPIO support to power save feature" * tag 'net-next-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1475 commits) mm: page_frag: fix a compile error when kernel is not compiled Documentation: tipc: fix formatting issue in tipc.rst selftests: nic_performance: Add selftest for performance of NIC driver selftests: nic_link_layer: Add selftest case for speed and duplex states selftests: nic_link_layer: Add link layer selftest for NIC driver bnxt_en: Add FW trace coredump segments to the coredump bnxt_en: Add a new ethtool -W dump flag bnxt_en: Add 2 parameters to bnxt_fill_coredump_seg_hdr() bnxt_en: Add functions to copy host context memory bnxt_en: Do not free FW log context memory bnxt_en: Manage the FW trace context memory bnxt_en: Allocate backing store memory for FW trace logs bnxt_en: Add a 'force' parameter to bnxt_free_ctx_mem() bnxt_en: Refactor bnxt_free_ctx_mem() bnxt_en: Add mem_valid bit to struct bnxt_ctx_mem_type bnxt_en: Update firmware interface spec to 1.10.3.85 selftests/bpf: Add some tests with sockmap SK_PASS bpf: fix recursive lock when verdict program return SK_PASS wireguard: device: support big tcp GSO wireguard: selftests: load nf_conntrack if not present ...	2024-11-21 08:28:08 -08:00
Vasily Gorbik	45c9f2b856	s390/entry: Mark IRQ entries to fix stack depot warnings The stack depot filters out everything outside of the top interrupt context as an uninteresting or irrelevant part of the stack traces. This helps with stack trace de-duplication, avoiding an explosion of saved stack traces that share the same IRQ context code path but originate from different randomly interrupted points, eventually exhausting the stack depot. Filtering uses in_irqentry_text() to identify functions within the .irqentry.text and .softirqentry.text sections, which then become the last stack trace entries being saved. While __do_softirq() is placed into the .softirqentry.text section by common code, populating .irqentry.text is architecture-specific. Currently, the .irqentry.text section on s390 is empty, which prevents stack depot filtering and de-duplication and could result in warnings like: Stack depot reached limit capacity WARNING: CPU: 0 PID: 286113 at lib/stackdepot.c:252 depot_alloc_stack+0x39a/0x3c8 with PREEMPT and KASAN enabled. Fix this by moving the IO/EXT interrupt handlers from .kprobes.text into the .irqentry.text section and updating the kprobes blacklist to include the .irqentry.text section. This is done only for asynchronous interrupts and explicitly not for program checks, which are synchronous and where the context beyond the program check is important to preserve. Despite machine checks being somewhat in between, they are extremely rare, and preserving context when possible is also of value. SVCs and Restart Interrupts are not relevant, one being always at the boundary to user space and the other being a one-time thing. IRQ entries filtering is also optionally used in ftrace function graph, where the same logic applies. Cc: stable@vger.kernel.org # 5.15+ Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-21 12:44:07 +01:00
Thomas Richter	7bc1ee28f4	s390/cpum_sf: Simplify release of SDBs and SDBTs Free_sampling_buffer() releases the Sampling Data Buffers (SDBs) and Sampling Data Buffer Table (SDBTs) allocated at event initialization. Both buffers are of PAGE_SIZE bytes. Each SDBT consists of 512 entries. The first 511 entries point to SDBs the last entry points to a successor SDBT. The last SDBT in the list points to the origin of all SDBTs. SDBTs do not contain holes, that is an entry always points to a SDB. If less than 511 SDBs have been allocation, the last entry points to the origin SDBT. Simplify the release of the SDBs and SDBTs, walk along the SDBT chain, release SDBs and SDBTs and stop when reaching the origin again. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-21 12:44:07 +01:00
Heiko Carstens	588a9836a4	s390/stacktrace: Use break instead of return statement arch_stack_walk_user_common() contains a return statement instead of a break statement in case store_ip() fails while trying to store a callchain entry of a user space process. This may lead to a missing pagefault_enable() call. If this happens any subsequent page fault of the process won't be resolved by the page fault handler and this in turn will lead to the process being killed. Use a break instead of a return statement to fix this. Fixes: `ebd912ff99` ("s390/stacktrace: Merge perf_callchain_user() and arch_stack_walk_user()") Cc: stable@vger.kernel.org Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-21 12:44:07 +01:00
Niklas Schnelle	897614f90f	s390/debug: Pass in and enforce output buffer size for format handlers The s390dbf format handler rely on being passed an output buffer sized such that their output will always fit and then use plain sprintf() to write to it. While only supplied data from other kernel components this still potentially allows buffer overwrite if callers are not careful. Instead just pass in the size of the output buffer and use scnprintf() instead of sprintf() and strscpy() instead of strcpy(). The latter also allows us to get rid of a separate strlen() call. Signed-off-by: Niklas Schnelle <schnelle@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-21 12:44:06 +01:00
Linus Torvalds	aad3a0d084	ftrace updates for v6.13: - Merged tag ftrace-v6.12-rc4 There was a fix to locking in register_ftrace_graph() for shadow stacks that was sent upstream. But this code was also being rewritten, and the locking fix was needed. Merging this fix was required to continue the work. - Restructure the function graph shadow stack to prepare it for use with kretprobes With the goal of merging the shadow stack logic of function graph and kretprobes, some more restructuring of the function shadow stack is required. Move out function graph specific fields from the fgraph infrastructure and store it on the new stack variables that can pass data from the entry callback to the exit callback. Hopefully, with this change, the merge of kretprobes to use fgraph shadow stacks will be ready by the next merge window. - Make shadow stack 4k instead of using PAGE_SIZE. Some architectures have very large PAGE_SIZE values which make its use for shadow stacks waste a lot of memory. - Give shadow stacks its own kmem cache. When function graph is started, every task on the system gets a shadow stack. In the future, shadow stacks may not be 4K in size. Have it have its own kmem cache so that whatever size it becomes will still be efficient in allocations. - Initialize profiler graph ops as it will be needed for new updates to fgraph - Convert to use guard(mutex) for several ftrace and fgraph functions - Add more comments and documentation - Show function return address in function graph tracer Add an option to show the caller of a function at each entry of the function graph tracer, similar to what the function tracer does. - Abstract out ftrace_regs from being used directly like pt_regs ftrace_regs was created to store a partial pt_regs. It holds only the registers and stack information to get to the function arguments and return values. On several archs, it is simply a wrapper around pt_regs. But some users would access ftrace_regs directly to get the pt_regs which will not work on all archs. Make ftrace_regs an abstract structure that requires all access to its fields be through accessor functions. - Show how long it takes to do function code modifications When code modification for function hooks happen, it always had the time recorded in how long it took to do the conversion. But this value was never exported. Recently the code was touched due to new ROX modification handling that caused a large slow down in doing the modifications and had a significant impact on boot times. Expose the timings in the dyn_ftrace_total_info file. This file was created a while ago to show information about memory usage and such to implement dynamic function tracing. It's also an appropriate file to store the timings of this modification as well. This will make it easier to see the impact of changes to code modification on boot up timings. - Other clean ups and small fixes -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZztrUxQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qnnNAQD6w4q9VQ7oOE2qKLqtnj87h4c1GqKn SPkpEfC3n/ATEAD/fnYjT/eOSlHiGHuD/aTA+U/bETrT99bozGM/4mFKEgY= =6nCa -----END PGP SIGNATURE----- Merge tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ftrace updates from Steven Rostedt: - Restructure the function graph shadow stack to prepare it for use with kretprobes With the goal of merging the shadow stack logic of function graph and kretprobes, some more restructuring of the function shadow stack is required. Move out function graph specific fields from the fgraph infrastructure and store it on the new stack variables that can pass data from the entry callback to the exit callback. Hopefully, with this change, the merge of kretprobes to use fgraph shadow stacks will be ready by the next merge window. - Make shadow stack 4k instead of using PAGE_SIZE. Some architectures have very large PAGE_SIZE values which make its use for shadow stacks waste a lot of memory. - Give shadow stacks its own kmem cache. When function graph is started, every task on the system gets a shadow stack. In the future, shadow stacks may not be 4K in size. Have it have its own kmem cache so that whatever size it becomes will still be efficient in allocations. - Initialize profiler graph ops as it will be needed for new updates to fgraph - Convert to use guard(mutex) for several ftrace and fgraph functions - Add more comments and documentation - Show function return address in function graph tracer Add an option to show the caller of a function at each entry of the function graph tracer, similar to what the function tracer does. - Abstract out ftrace_regs from being used directly like pt_regs ftrace_regs was created to store a partial pt_regs. It holds only the registers and stack information to get to the function arguments and return values. On several archs, it is simply a wrapper around pt_regs. But some users would access ftrace_regs directly to get the pt_regs which will not work on all archs. Make ftrace_regs an abstract structure that requires all access to its fields be through accessor functions. - Show how long it takes to do function code modifications When code modification for function hooks happen, it always had the time recorded in how long it took to do the conversion. But this value was never exported. Recently the code was touched due to new ROX modification handling that caused a large slow down in doing the modifications and had a significant impact on boot times. Expose the timings in the dyn_ftrace_total_info file. This file was created a while ago to show information about memory usage and such to implement dynamic function tracing. It's also an appropriate file to store the timings of this modification as well. This will make it easier to see the impact of changes to code modification on boot up timings. - Other clean ups and small fixes * tag 'ftrace-v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (22 commits) ftrace: Show timings of how long nop patching took ftrace: Use guard to take ftrace_lock in ftrace_graph_set_hash() ftrace: Use guard to take the ftrace_lock in release_probe() ftrace: Use guard to lock ftrace_lock in cache_mod() ftrace: Use guard for match_records() fgraph: Use guard(mutex)(&ftrace_lock) for unregister_ftrace_graph() fgraph: Give ret_stack its own kmem cache fgraph: Separate size of ret_stack from PAGE_SIZE ftrace: Rename ftrace_regs_return_value to ftrace_regs_get_return_value selftests/ftrace: Fix check of return value in fgraph-retval.tc test ftrace: Use arch_ftrace_regs() for ftrace_regs_*() macros ftrace: Consolidate ftrace_regs accessor functions for archs using pt_regs ftrace: Make ftrace_regs abstract from direct use fgragh: No need to invoke the function call_filter_check_discard() fgraph: Simplify return address printing in function graph tracer function_graph: Remove unnecessary initialization in ftrace_graph_ret_addr() function_graph: Support recording and printing the function return address ftrace: Have calltime be saved in the fgraph storage ftrace: Use a running sleeptime instead of saving on shadow stack fgraph: Use fgraph data to store subtime for profiler ...	2024-11-20 11:34:10 -08:00
Linus Torvalds	0352387523	First step of consolidating the VDSO data page handling: The VDSO data page handling is architecture specific for historical reasons, but there is no real technical reason to do so. Aside of that VDSO data has become a dump ground for various mechanisms and fail to provide a clear separation of the functionalities. Clean this up by: * consolidating the VDSO page data by getting rid of architecture specific warts especially in x86 and PowerPC. * removing the last includes of header files which are pulling in other headers outside of the VDSO namespace. * seperating timekeeping and other VDSO data accordingly. Further consolidation of the VDSO page handling is done in subsequent changes scheduled for the next merge window. This also lays the ground for expanding the VDSO time getters for independent PTP clocks in a generic way without making every architecture add support seperately. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7kyoTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoVBjD/9awdN2YeCGIM9rlHIktUdNRmRSL2SL 6av1CPffN5DenONYTXWrDYPkC4yfjUwIs8H57uzFo10yA7RQ/Qfq+O68k5GnuFew jvpmmYSZ6TT21AmAaCIhn+kdl9YbEJFvN2AWH85Bl29k9FGB04VzJlQMMjfEZ1a5 Mhwv+cfYNuPSZmU570jcxW2XgbyTWlLZBByXX/Tuz9bwpmtszba507bvo45x6gIP twaWNzrsyJpdXfMrfUnRiChN8jHlDN7I6fgQvpsoRH5FOiVwIFo0Ip2rKbk+ONfD W/rcU5oeqRIxRVDHzf2Sv8WPHMCLRv01ZHBcbJOtgvZC3YiKgKYoeEKabu9ZL1BH 6VmrxjYOBBFQHOYAKPqBuS7BgH5PmtMbDdSZXDfRaAKaCzhCRysdlWW7z48r2R// zPufb7J6Tle23AkuZWhFjvlGgSBl4zxnTFn31HYOyQps3TMI4y50Z2DhE/EeU8a6 DRl8/k1KQVDUZ6udJogS5kOr1J8pFtUPrA2uhR8UyLdx7YKiCzcdO1qWAjtXlVe8 oNpzinU+H9bQqGe9IyS7kCG9xNaCRZNkln5Q1WfnkTzg5f6ihfaCvIku3l4bgVpw 3HmcxYiC6RxQB+ozwN7hzCCKT4L9aMhr/457TNOqRkj2Elw3nvJ02L4aI86XAKLE jwO9Fkp9qcCxCw== =q5eD -----END PGP SIGNATURE----- Merge tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull vdso data page handling updates from Thomas Gleixner: "First steps of consolidating the VDSO data page handling. The VDSO data page handling is architecture specific for historical reasons, but there is no real technical reason to do so. Aside of that VDSO data has become a dump ground for various mechanisms and fail to provide a clear separation of the functionalities. Clean this up by: - consolidating the VDSO page data by getting rid of architecture specific warts especially in x86 and PowerPC. - removing the last includes of header files which are pulling in other headers outside of the VDSO namespace. - seperating timekeeping and other VDSO data accordingly. Further consolidation of the VDSO page handling is done in subsequent changes scheduled for the next merge window. This also lays the ground for expanding the VDSO time getters for independent PTP clocks in a generic way without making every architecture add support seperately" * tag 'timers-vdso-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (42 commits) x86/vdso: Add missing brackets in switch case vdso: Rename struct arch_vdso_data to arch_vdso_time_data powerpc: Split systemcfg struct definitions out from vdso powerpc: Split systemcfg data out of vdso data page powerpc: Add kconfig option for the systemcfg page powerpc/pseries/lparcfg: Use num_possible_cpus() for potential processors powerpc/pseries/lparcfg: Fix printing of system_active_processors powerpc/procfs: Propagate error of remap_pfn_range() powerpc/vdso: Remove offset comment from 32bit vdso_arch_data x86/vdso: Split virtual clock pages into dedicated mapping x86/vdso: Delete vvar.h x86/vdso: Access vdso data without vvar.h x86/vdso: Move the rng offset to vsyscall.h x86/vdso: Access rng vdso data without vvar.h x86/vdso: Access timens vdso data without vvar.h x86/vdso: Allocate vvar page from C code x86/vdso: Access rng data from kernel without vvar x86/vdso: Place vdso_data at beginning of vvar page x86/vdso: Use __arch_get_vdso_data() to access vdso data x86/mm/mmap: Remove arch_vma_name() ...	2024-11-19 16:09:13 -08:00
Linus Torvalds	5c2b050848	A set of updates for the interrupt subsystem: - Tree wide: * Make nr_irqs static to the core code and provide accessor functions to remove existing and prevent future aliasing problems with local variables or function arguments of the same name. - Core code: * Prevent freeing an interrupt in the devres code which is not managed by devres in the first place. * Use seq_put_decimal_ull_width() for decimal values output in /proc/interrupts which increases performance significantly as it avoids parsing the format strings over and over. * Optimize raising the timer and hrtimer soft interrupts by using the 'set bit only' variants instead of the combined version which checks whether ksoftirqd should be woken up. The latter is a pointless exercise as both soft interrupts are raised in the context of the timer interrupt and therefore never wake up ksoftirqd. * Delegate timer/hrtimer soft interrupt processing to a dedicated thread on RT. Timer and hrtimer soft interrupts are always processed in ksoftirqd on RT enabled kernels. This can lead to high latencies when other soft interrupts are delegated to ksoftirqd as well. The separate thread allows to run them seperately under a RT scheduling policy to reduce the latency overhead. - Drivers: * New drivers or extensions of existing drivers to support Renesas RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt chips * Support for multi-cluster GICs on MIPS. MIPS CPUs can come with multiple CPU clusters, where each CPU cluster has its own GIC (Generic Interrupt Controller). This requires to access the GIC of a remote cluster through a redirect register block. This is encapsulated into a set of helper functions to keep the complexity out of the actual code paths which handle the GIC details. * Support for encrypted guests in the ARM GICV3 ITS driver The ITS page needs to be shared with the hypervisor and therefore must be decrypted. * Small cleanups and fixes all over the place -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmc7ggcTHHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYoaf7D/9G6FgJXx/60zqnpnOr9Yx0hxjaI47x PFyCd3P05qyVMBYXfI99vrSKuVdMZXJ/fH5L83y+sOaTASyLTzg37igZycIDJzLI FnHh/m/+UA8k2aIC5VUiNAjne2RLaTZiRN15uEHFVjByC5Y+YTlCNUE4BBhg5RfQ hKmskeffWdtui3ou13CSNvbFn+pmqi4g6n1ysUuLhiwM2E5b1rZMprcCOnun/cGP IdUQsODNWTTv9eqPJez985M6A1x2SCGNv7Z73h58B9N0pBRPEC1xnhUnCJ1sA0cJ pnfde2C1lztEjYbwDngy0wgq0P6LINjQ5Ma2YY2F2hTMsXGJxGPDZm24/u5uR46x N/gsOQMXqw6f5yvbiS7Asx9WzR6ry8rJl70QRgTyozz7xxJTaiNm2HqVFe2wc+et Q/BzaKdhmUJj1GMZmqD2rrgwYeDcb4wWYNtwjM4PVHHxYlJVq0mEF1kLLS8YDyjf HuGPVqtSkt3E0+Br3FKcv5ltUQP8clXbudc6L1u98YBfNK12hW8L+c3YSvIiFoYM ZOAeANPM7VtQbP2Jg2q81Dd3CShImt5jqL2um+l8g7+mUE7l9gyuO/w/a5dQ57+b kx7mHHIW2zCeHrkZZbRUYzI2BJfMCCOVN4Ax5OZxTLnLsL9VEehy8NM8QYT4TS8R XmTOYW3U9XR3gw== =JqxC -----END PGP SIGNATURE----- Merge tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: "Tree wide: - Make nr_irqs static to the core code and provide accessor functions to remove existing and prevent future aliasing problems with local variables or function arguments of the same name. Core code: - Prevent freeing an interrupt in the devres code which is not managed by devres in the first place. - Use seq_put_decimal_ull_width() for decimal values output in /proc/interrupts which increases performance significantly as it avoids parsing the format strings over and over. - Optimize raising the timer and hrtimer soft interrupts by using the 'set bit only' variants instead of the combined version which checks whether ksoftirqd should be woken up. The latter is a pointless exercise as both soft interrupts are raised in the context of the timer interrupt and therefore never wake up ksoftirqd. - Delegate timer/hrtimer soft interrupt processing to a dedicated thread on RT. Timer and hrtimer soft interrupts are always processed in ksoftirqd on RT enabled kernels. This can lead to high latencies when other soft interrupts are delegated to ksoftirqd as well. The separate thread allows to run them seperately under a RT scheduling policy to reduce the latency overhead. Drivers: - New drivers or extensions of existing drivers to support Renesas RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt chips - Support for multi-cluster GICs on MIPS. MIPS CPUs can come with multiple CPU clusters, where each CPU cluster has its own GIC (Generic Interrupt Controller). This requires to access the GIC of a remote cluster through a redirect register block. This is encapsulated into a set of helper functions to keep the complexity out of the actual code paths which handle the GIC details. - Support for encrypted guests in the ARM GICV3 ITS driver The ITS page needs to be shared with the hypervisor and therefore must be decrypted. - Small cleanups and fixes all over the place" * tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) irqchip/riscv-aplic: Prevent crash when MSI domain is missing genirq/proc: Use seq_put_decimal_ull_width() for decimal values softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT. timers: Use __raise_softirq_irqoff() to raise the softirq. hrtimer: Use __raise_softirq_irqoff() to raise the softirq riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers irqchip: Add T-HEAD C900 ACLINT SSWI driver dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK irqchip/mips-gic: Prevent indirect access to clusters without CPU cores irqchip/mips-gic: Multi-cluster support irqchip/mips-gic: Setup defaults in each cluster irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic() irqchip/mips-gic: Replace open coded online CPU iterations genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show() genirq/devres: Don't free interrupt which is not managed by devres irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool() irqchip/aspeed-intc: Add AST27XX INTC support dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC ...	2024-11-19 15:54:19 -08:00
Linus Torvalds	f41dac3efb	Performance events changes for v6.13: - Uprobes: - Add BPF session support (Jiri Olsa) - Switch to RCU Tasks Trace flavor for better performance (Andrii Nakryiko) - Massively increase uretprobe SMP scalability by SRCU-protecting the uretprobe lifetime (Andrii Nakryiko) - Kill xol_area->slot_count (Oleg Nesterov) - Core facilities: - Implement targeted high-frequency profiling by adding the ability for an event to "pause" or "resume" AUX area tracing (Adrian Hunter) - VM profiling/sampling: - Correct perf sampling with guest VMs (Colton Lewis) - New hardware support: - x86/intel: Add PMU support for Intel ArrowLake-H CPUs (Dapeng Mi) - Misc fixes and enhancements: - x86/intel/pt: Fix buffer full but size is 0 case (Adrian Hunter) - x86/amd: Warn only on new bits set (Breno Leitao) - x86/amd/uncore: Avoid a false positive warning about snprintf truncation in amd_uncore_umc_ctx_init (Jean Delvare) - uprobes: Re-order struct uprobe_task to save some space (Christophe JAILLET) - x86/rapl: Move the pmu allocation out of CPU hotplug (Kan Liang) - x86/rapl: Clean up cpumask and hotplug (Kan Liang) - uprobes: Deuglify xol_get_insn_slot/xol_free_insn_slot paths (Oleg Nesterov) Signed-off-by: Ingo Molnar <mingo@kernel.org> -----BEGIN PGP SIGNATURE----- iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmc7eKERHG1pbmdvQGtl cm5lbC5vcmcACgkQEnMQ0APhK1i57A/+KQ6TrIoICVTE+BPlDfUw8NU+N3DagVb0 dzoyDxlDRsnsYzeXZipPn+3IitX1w+DrGxBNIojSoiFVCLnHIKgo4uHbj7cVrR7J fBTVSnoJ94SGAk5ySebvLwMLce/YhXBeHK2lx6W/pI6acNcxzDfIabjjETeqltUo g7hmT9lo10pzZEZyuUfYX9khlWBxda1dKHc9pMIq7baeLe4iz/fCGlJ0K4d4M4z3 NPZw239Np6iHUwu3Lcs4gNKe4rcDe7Bt47hpedemHe0Y+7c4s2HaPxbXWxvDtE76 mlsg93i28f8SYxeV83pREn0EOCptXcljhiek+US+GR7NSbltMnV+uUiDfPKIE9+Y vYP/DYF9hx73FsOucEFrHxYYcePorn3pne5/khBYWdQU6TnlrBYWpoLQsjgCKTTR 4JhCFlBZ5cDpc6ihtpwCwVTQ4Q/H7vM1XOlDwx0hPhcIPPHDreaQD/wxo61jBdXf PY0EPAxh3BcQxfPYuDS+XiYjQ8qO8MtXMKz5bZyHBZlbHwccV6T4ExjsLKxFk5As 6BG8pkBWLg7drXAgVdleIY0ux+34w/Zzv7gemdlQxvWLlZrVvpjiG93oU3PTpZeq A2UD9eAOuXVD6+HsF/dmn88sFmcLWbrMskFWujkvhEUmCvSGAnz3YSS/mLEawBiT 2xI8xykNWSY= =ItOT -----END PGP SIGNATURE----- Merge tag 'perf-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull performance events updates from Ingo Molnar: "Uprobes: - Add BPF session support (Jiri Olsa) - Switch to RCU Tasks Trace flavor for better performance (Andrii Nakryiko) - Massively increase uretprobe SMP scalability by SRCU-protecting the uretprobe lifetime (Andrii Nakryiko) - Kill xol_area->slot_count (Oleg Nesterov) Core facilities: - Implement targeted high-frequency profiling by adding the ability for an event to "pause" or "resume" AUX area tracing (Adrian Hunter) VM profiling/sampling: - Correct perf sampling with guest VMs (Colton Lewis) New hardware support: - x86/intel: Add PMU support for Intel ArrowLake-H CPUs (Dapeng Mi) Misc fixes and enhancements: - x86/intel/pt: Fix buffer full but size is 0 case (Adrian Hunter) - x86/amd: Warn only on new bits set (Breno Leitao) - x86/amd/uncore: Avoid a false positive warning about snprintf truncation in amd_uncore_umc_ctx_init (Jean Delvare) - uprobes: Re-order struct uprobe_task to save some space (Christophe JAILLET) - x86/rapl: Move the pmu allocation out of CPU hotplug (Kan Liang) - x86/rapl: Clean up cpumask and hotplug (Kan Liang) - uprobes: Deuglify xol_get_insn_slot/xol_free_insn_slot paths (Oleg Nesterov)" * tag 'perf-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (32 commits) perf/core: Correct perf sampling with guest VMs perf/x86: Refactor misc flag assignments perf/powerpc: Use perf_arch_instruction_pointer() perf/core: Hoist perf_instruction_pointer() and perf_misc_flags() perf/arm: Drop unused functions uprobes: Re-order struct uprobe_task to save some space perf/x86/amd/uncore: Avoid a false positive warning about snprintf truncation in amd_uncore_umc_ctx_init perf/x86/intel: Do not enable large PEBS for events with aux actions or aux sampling perf/x86/intel/pt: Add support for pause / resume perf/core: Add aux_pause, aux_resume, aux_start_paused perf/x86/intel/pt: Fix buffer full but size is 0 case uprobes: SRCU-protect uretprobe lifetime (with timeout) uprobes: allow put_uprobe() from non-sleepable softirq context perf/x86/rapl: Clean up cpumask and hotplug perf/x86/rapl: Move the pmu allocation out of CPU hotplug uprobe: Add support for session consumer uprobe: Add data pointer to consumer handlers perf/x86/amd: Warn only on new bits set uprobes: fold xol_take_insn_slot() into xol_get_insn_slot() uprobes: kill xol_area->slot_count ...	2024-11-19 13:34:06 -08:00
Yabin Cui	b9c44b9147	perf/core: Save raw sample data conditionally based on sample type Currently, space for raw sample data is always allocated within sample records for both BPF output and tracepoint events. This leads to unused space in sample records when raw sample data is not requested. This patch enforces checking sample type of an event in perf_sample_save_raw_data(). So raw sample data will only be saved if explicitly requested, reducing overhead when it is not needed. Fixes: `0a9081cf0a` ("perf/core: Add perf_sample_save_raw_data() helper") Signed-off-by: Yabin Cui <yabinc@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240515193610.2350456-2-yabinc@google.com	2024-11-19 09:23:42 +01:00
Linus Torvalds	0338cd9c22	s390 updates for 6.13 merge window - Add firmware sysfs interface which allows user space to retrieve the dump area size of the machine - Add 'measurement_chars_full' CHPID sysfs attribute to make the complete associated Channel-Measurements Characteristics Block available - Add virtio-mem support - Move gmap aka KVM page fault handling from the main fault handler to KVM code. This is the first step to make s390 KVM page fault handling similar to other architectures. With this first step the main fault handler does not have any special handling anymore, and therefore convert it to support LOCK_MM_AND_FIND_VMA - With gcc 14 s390 support for flag output operand support for inline assemblies was added. This allows for several optimizations - Provide a cmpxchg inline assembly which makes use of this, and provide all variants of arch_try_cmpxchg() so that the compiler can generate slightly better code - Convert a few cmpxchg() loops to try_cmpxchg() loops - Similar to x86 add a CC_OUT() helper macro (and other macros), and convert all inline assemblies to make use of them, so that depending on compiler version better code can be generated - List installed host-key hashes in sysfs if the machine supports the Query Ultravisor Keys UVC - Add 'Retrieve Secret' ioctl which allows user space in protected execution guests to retrieve previously stored secrets from the Ultravisor - Add pkey-uv module which supports the conversion of Ultravisor retrievable secrets to protected keys - Extend the existing paes cipher to exploit the full AES-XTS hardware acceleration introduced with message-security assist extension 10 - Convert hopefully all sysfs show functions to use sysfs_emit() so that the constant flow of such patches stop - For PCI devices make use of the newly added Topology ID attribute to enable whole card multi-function support despite the change to PCHID per port. Additionally improve the overall robustness and usability of the multifunction support - Various other small improvements, fixes, and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmc3Y9oACgkQIg7DeRsp bsJigQ//fcZ3NqA6rARWYoVNEEzUfvDha1LchhAV4aBUu5cIZFc/SQKxMuACVELh wW7RKCWhGLML5c/cPjke4ECBJiFYI/MQNB3xkDl1i2FDyUNs1Fdq9Be3Y0uXXO+U TxvSYiPm3p/Gik8G2KhDPivqPQmrF7o2KNyRWqPBdqRl5U4NLnwJpCMbddP/PTdI 2ytJ2OGuXo3djzibXldUbik4UG6hXUqGzeIMbrOG8ZiFCeznVck/OHydoLR4MKBy MyrmqCxTu/p7gpTanccpTQR+uC5lodxad4kMh86CV3w41HhrWV1z912eNdsz6MMR B8kGPx5D0juXtUbB0Mn0kdM6Kak5/BaSA58HRNJz9AMa5MVOj+YTAmlTN5E7uGzg graPE3ilwEgj0pArdhwyhIEnVGP381NyhTbMDhTUhRB6lMJVyN5202YZCieezr/u dIyurno1T0T8if1B6n7tQQprIVSQDthzE8lCAtYrll86vLIbiXGxCg2yaVLEz1aL ptUZ84/bT29G8XivZAeDLjzRSwde+l5pkZWd3rBmdHC8FCH8Epiy/ZB5ozpJ1u02 fViqheeTsTC/nR6DlwylF4YET6QVPYgLOUZCnBQJnTsVRFtBpAXIaHyvOJYNuxUN ybtsgzJ59bMES8DpBCIibBoJOD1vyoWoeXu06bhGuMT+wahCwgE= =v+um -----END PGP SIGNATURE----- Merge tag 's390-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Heiko Carstens: - Add firmware sysfs interface which allows user space to retrieve the dump area size of the machine - Add 'measurement_chars_full' CHPID sysfs attribute to make the complete associated Channel-Measurements Characteristics Block available - Add virtio-mem support - Move gmap aka KVM page fault handling from the main fault handler to KVM code. This is the first step to make s390 KVM page fault handling similar to other architectures. With this first step the main fault handler does not have any special handling anymore, and therefore convert it to support LOCK_MM_AND_FIND_VMA - With gcc 14 s390 support for flag output operand support for inline assemblies was added. This allows for several optimizations: - Provide a cmpxchg inline assembly which makes use of this, and provide all variants of arch_try_cmpxchg() so that the compiler can generate slightly better code - Convert a few cmpxchg() loops to try_cmpxchg() loops - Similar to x86 add a CC_OUT() helper macro (and other macros), and convert all inline assemblies to make use of them, so that depending on compiler version better code can be generated - List installed host-key hashes in sysfs if the machine supports the Query Ultravisor Keys UVC - Add 'Retrieve Secret' ioctl which allows user space in protected execution guests to retrieve previously stored secrets from the Ultravisor - Add pkey-uv module which supports the conversion of Ultravisor retrievable secrets to protected keys - Extend the existing paes cipher to exploit the full AES-XTS hardware acceleration introduced with message-security assist extension 10 - Convert hopefully all sysfs show functions to use sysfs_emit() so that the constant flow of such patches stop - For PCI devices make use of the newly added Topology ID attribute to enable whole card multi-function support despite the change to PCHID per port. Additionally improve the overall robustness and usability of the multifunction support - Various other small improvements, fixes, and cleanups * tag 's390-6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (133 commits) s390/cio/ioasm: Convert to use flag output macros s390/cio/qdio: Convert to use flag output macros s390/sclp: Convert to use flag output macros s390/dasd: Convert to use flag output macros s390/boot/physmem: Convert to use flag output macros s390/pci: Convert to use flag output macros s390/kvm: Convert to use flag output macros s390/extmem: Convert to use flag output macros s390/string: Convert to use flag output macros s390/diag: Convert to use flag output macros s390/irq: Convert to use flag output macros s390/smp: Convert to use flag output macros s390/uv: Convert to use flag output macros s390/pai: Convert to use flag output macros s390/mm: Convert to use flag output macros s390/cpu_mf: Convert to use flag output macros s390/cpcmd: Convert to use flag output macros s390/topology: Convert to use flag output macros s390/time: Convert to use flag output macros s390/pageattr: Convert to use flag output macros ...	2024-11-18 17:45:41 -08:00
Colton Lewis	04782e6391	perf/core: Hoist perf_instruction_pointer() and perf_misc_flags() For clarity, rename the arch-specific definitions of these functions to perf_arch_* to denote they are arch-specifc. Define the generic-named functions in one place where they can call the arch-specific ones as needed. Signed-off-by: Colton Lewis <coltonlewis@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Acked-by: Madhavan Srinivasan <maddy@linux.ibm.com> Acked-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lore.kernel.org/r/20241113190156.2145593-3-coltonlewis@google.com	2024-11-14 10:40:01 +01:00
Heiko Carstens	a6122f690a	s390/diag: Convert to use flag output macros Use flag output macros in inline asm to allow for better code generation if the compiler has support for the flag output constraint. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-13 14:31:33 +01:00
Heiko Carstens	5a5897d65b	s390/irq: Convert to use flag output macros Use flag output macros in inline asm to allow for better code generation if the compiler has support for the flag output constraint. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-13 14:31:32 +01:00
Heiko Carstens	ca6dd1faa0	s390/cpcmd: Convert to use flag output macros Use flag output macros in inline asm to allow for better code generation if the compiler has support for the flag output constraint. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-13 14:31:32 +01:00
Heiko Carstens	d4e50cfe9c	s390/topology: Convert to use flag output macros Use flag output macros in inline asm to allow for better code generation if the compiler has support for the flag output constraint. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-13 14:31:31 +01:00
Heiko Carstens	eade39cc72	s390/sthyi: Convert to use flag output macros Use flag output macros in inline asm to allow for better code generation if the compiler has support for the flag output constraint. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-13 14:31:31 +01:00
Masahiro Yamada	182c02a6cd	s390/syscalls: Convert filechk to if_changed The filechk macro always executes the syscalltbl script (and discards the output if there are no changes). Using if_changed is more efficient because it avoids running the script when the target is up-to-date and the command remains unchanged. All other architectures use if_changed for generating syscall headers. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20241111134603.2063226-3-masahiroy@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-12 14:01:30 +01:00
Masahiro Yamada	e17aca2005	s390/syscalls: Remove unnecessary argument of filechk_syshdr The filechk_syshdr macro receives $@ in both cases, making the argument redundant. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20241111134603.2063226-2-masahiroy@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-12 14:01:30 +01:00
Masahiro Yamada	0708967e2d	s390/syscalls: Avoid creation of arch/arch/ directory Building the kernel with ARCH=s390 creates a weird arch/arch/ directory. $ find arch/arch arch/arch arch/arch/s390 arch/arch/s390/include arch/arch/s390/include/generated arch/arch/s390/include/generated/asm arch/arch/s390/include/generated/uapi arch/arch/s390/include/generated/uapi/asm The root cause is 'targets' in arch/s390/kernel/syscalls/Makefile, where the relative path is incorrect. Strictly speaking, 'targets' was not necessary in the first place because this Makefile uses 'filechk' instead of 'if_changed'. However, this commit keeps it, as it will be useful when converting 'filechk' to 'if_changed' later. Fixes: `5c75824d91` ("s390/syscalls: add Makefile to generate system call header files") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Link: https://lore.kernel.org/r/20241111134603.2063226-1-masahiroy@kernel.org Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-12 14:01:29 +01:00
Heiko Carstens	42898f74b2	s390/perf_cpum_cf: Convert to use local64_try_cmpxchg() Convert local64_cmpxchg() usages to local64_try_cmpxchg() in order to generate slightly better code. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-12 14:01:29 +01:00
Heiko Carstens	e449399ffd	s390/perf_cpum_sf: Convert to use try_cmpxchg128() Convert cmpxchg128() usages to try_cmpxchg128() in order to generate slightly better code. Reviewed-by: Juergen Christ <jchrist@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-12 14:01:29 +01:00
Alexander Egorenkov	01bfb451a3	s390/dump: Add firmware sysfs attribute for dump area size Dump tools from s390-tools such as zipl need to know the correct dump area size of the machine they run on in order to be able to create valid standalone dumper images. Therefore, allow it to be obtained through the new sysfs read-only attribute /sys/firmware/dump/dump_area_size. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Suggested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Alexander Egorenkov <egorenar@linux.ibm.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-11-12 14:01:27 +01:00
Christian Göttsche	6140be90ec	fs/xattr: add at family syscalls Add the four syscalls setxattrat(), getxattrat(), listxattrat() and removexattrat(). Those can be used to operate on extended attributes, especially security related ones, either relative to a pinned directory or on a file descriptor without read access, avoiding a /proc/<pid>/fd/<fd> detour, requiring a mounted procfs. One use case will be setfiles(8) setting SELinux file contexts ("security.selinux") without race conditions and without a file descriptor opened with read access requiring SELinux read permission. Use the do_{name}at() pattern from fs/open.c. Pass the value of the extended attribute, its length, and for setxattrat(2) the command (XATTR_CREATE or XATTR_REPLACE) via an added struct xattr_args to not exceed six syscall arguments and not merging the AT_ and XATTR_* flags. [AV: fixes by Christian Brauner folded in, the entire thing rebased on top of {filename,file}_...xattr() primitives, treatment of empty pathnames regularized. As the result, AT_EMPTY_PATH+NULL handling is cheap, so f...(2) can use it] Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Link: https://lore.kernel.org/r/20240426162042.191916-1-cgoettsche@seltendoof.de Reviewed-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Christian Brauner <brauner@kernel.org> CC: x86@kernel.org CC: linux-alpha@vger.kernel.org CC: linux-kernel@vger.kernel.org CC: linux-arm-kernel@lists.infradead.org CC: linux-ia64@vger.kernel.org CC: linux-m68k@lists.linux-m68k.org CC: linux-mips@vger.kernel.org CC: linux-parisc@vger.kernel.org CC: linuxppc-dev@lists.ozlabs.org CC: linux-s390@vger.kernel.org CC: linux-sh@vger.kernel.org CC: sparclinux@vger.kernel.org CC: linux-fsdevel@vger.kernel.org CC: audit@vger.kernel.org CC: linux-arch@vger.kernel.org CC: linux-api@vger.kernel.org CC: linux-security-module@vger.kernel.org CC: selinux@vger.kernel.org [brauner: slight tweaks] Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2024-11-06 12:59:44 -05:00
Thomas Weißschuh	98333a84e3	s390/vdso: Drop LBASE_VDSO This constant is always "0", providing no value and making the logic harder to understand. Also prepare for a consolidation of the vdso linkerscript logic by aligning it with other architectures. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/all/20241010-vdso-generic-base-v1-3-b64f0842d512@linutronix.de	2024-11-02 12:37:32 +01:00
Thomas Richter	f55bd479d8	s390/cpum_sf: Fix and protect memory allocation of SDBs with mutex Reservation of the PMU hardware is done at first event creation and is protected by a pair of mutex_lock() and mutex_unlock(). After reservation of the PMU hardware the memory required for the PMUs the event is to be installed on is allocated by allocate_buffers() and alloc_sampling_buffer(). This done outside of the mutex protection. Without mutex protection two or more concurrent invocations of perf_event_init() may run in parallel. This can lead to allocation of Sample Data Blocks (SDBs) multiple times for the same PMU. Prevent this and protect memory allocation of SDBs by mutex. Fixes: `8a6fe8f21e` ("s390/cpum_sf: Use refcount_t instead of atomic_t") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-31 10:50:06 +01:00
Sven Schnelle	2d7de7a301	s390/time: Add PtP driver Add a small PtP driver which allows user space to get the values of the physical and tod clock. This allows programs like chrony to use STP as clock source and steer the kernel clock. The physical clock can be used as a debugging aid to get the clock without any additional offsets like STP steering or LPAR offset. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://patch.msgid.link/20241023065601.449586-3-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-30 17:02:39 -07:00
Sven Schnelle	f247fd22e9	s390/time: Add clocksource id to TOD clock To allow specifying the clock source in the upcoming PtP driver, add a clocksource ID to the s390 TOD clock. Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Link: https://patch.msgid.link/20241023065601.449586-2-svens@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2024-10-30 17:02:38 -07:00
Claudio Imbrenda	e8d8d97218	s390: Remove gmap pointer from lowcore Remove the gmap pointer from lowcore, since it is not used anymore. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-9-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:49:19 +01:00
Claudio Imbrenda	05066cafa9	s390/mm/fault: Handle guest-related program interrupts in KVM Any program interrupt that happens in the host during the execution of a KVM guest will now short circuit the fault handler and return to KVM immediately. Guest fault handling (including pfault) will happen entirely inside KVM. When sie64a() returns zero, current->thread.gmap_int_code will contain the program interrupt number that caused the exit, or zero if the exit was not caused by a host program interrupt. KVM will now take care of handling all guest faults in vcpu_post_run(). Since gmap faults will not be visible by the rest of the kernel, remove GMAP_FAULT, the linux fault handlers for secure execution faults, the exception table entries for the sie instruction, the nop padding after the sie instruction, and all other references to guest faults from the s390 code. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Co-developed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-6-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:49:18 +01:00
Claudio Imbrenda	f96cb0d61d	s390/entry: Remove __GMAP_ASCE and use _PIF_GUEST_FAULT again Now that the guest ASCE is passed as a parameter to __sie64a(), _PIF_GUEST_FAULT can be used again to determine whether the fault was a guest or host fault. Since the guest ASCE will not be taken from the gmap pointer in lowcore anymore, __GMAP_ASCE can be removed. For the same reason the guest ASCE needs now to be saved into the cr1 save area unconditionally. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Acked-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/r/20241022120601.167009-2-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:49:18 +01:00
Thomas Richter	f4d5e64c5c	s390/cpum_sf: Rework call to sf_disable() Setup_pmc_cpu() function body consists of one single switch statement with two cases PMC_INIT and PMC_RELEASE. In both of these cases sf_disable() is invoked to turn off the CPU Measurement sampling facility. Move sf_disable() out of the switch statement. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:20 +01:00
Thomas Richter	a0bd7dacbd	s390/cpum_sf: Handle CPU hotplug remove during sampling CPU hotplug remove handling triggers the following function call sequence: CPUHP_AP_PERF_S390_SF_ONLINE --> s390_pmu_sf_offline_cpu() ... CPUHP_AP_PERF_ONLINE --> perf_event_exit_cpu() The s390 CPUMF sampling CPU hotplug handler invokes: s390_pmu_sf_offline_cpu() +--> cpusf_pmu_setup() +--> setup_pmc_cpu() +--> deallocate_buffers() This function de-allocates all sampling data buffers (SDBs) allocated for that CPU at event initialization. It also clears the PMU_F_RESERVED bit. The CPU is gone and can not be sampled. With the event still being active on the removed CPU, the CPU event hotplug support in kernel performance subsystem triggers the following function calls on the removed CPU: perf_event_exit_cpu() +--> perf_event_exit_cpu_context() +--> __perf_event_exit_context() +--> __perf_remove_from_context() +--> event_sched_out() +--> cpumsf_pmu_del() +--> cpumsf_pmu_stop() +--> hw_perf_event_update() to stop and remove the event. During removal of the event, the sampling device driver tries to read out the remaining samples from the sample data buffers (SDBs). But they have already been freed (and may have been re-assigned). This may lead to a use after free situation in which case the samples are most likely invalid. In the best case the memory has not been reassigned and still contains valid data. Remedy this situation and check if the CPU is still in reserved state (bit PMU_F_RESERVED set). In this case the SDBs have not been released an contain valid data. This is always the case when the event is removed (and no CPU hotplug off occured). If the PMU_F_RESERVED bit is not set, the SDB buffers are gone. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:18 +01:00
Thomas Richter	0c25132396	s390/cpum_sf: Fix format string in pr_err() Fix format string in pr_err() and use the built-in hexadecimal prefix %#x to display a number with a leading hexadecimal indicator 0x. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:18 +01:00
Thomas Richter	f2e9d46ac6	s390/cpum_sf: Use sf_buffer_available() Use sf_buffer_available() consistently throughtout the code to test for the existence of sampling buffer. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:18 +01:00
Thomas Richter	de6d22ccdc	s390/cpum_sf: Consistently use goto out for function exit When the sampling buffer allocation fails in __hw_perf_event_init(), jump to the end of the function and return the result. This is consistent with the other error handling and return conditions in this function. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Reviewed-By: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:18 +01:00
Thomas Richter	db417646fe	s390/cpum_sf: Do not re-enable event after deletion Event delete removes an event from the event list, but common code invokes the PMU's enable function later on. This happens in event_sched_out() and leads to the following call sequence: event_sched_out() +--> cpumsf_pmu_del() +--> cpumsf_pmu_enable() In cpumsf_pmu_enable() return immediately when the event is not active. Also remove an unneeded if clause. That if() statement is only reached when flag PMU_F_IN_USE has been set in cpumsf_pmu_add(). And this function also sets cpuhw->event to a valid value. Remove WARN_ON_ONCE() statement which never triggered. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:18 +01:00
Steffen Eiden	f00469a642	s390/uv: Retrieve UV secrets sysfs support Reflect the updated content in the query information UVC to the sysfs at /sys/firmware/query * new UV-query sysfs entry for the maximum number of retrievable secrets the UV can store for one secure guest. * new UV-query sysfs entry for the maximum number of association secrets the UV can store for one secure guest. * max_secrets contains the sum of max association and max retrievable secrets. Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241024062638.1465970-7-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:17 +01:00
Steffen Eiden	7c9137af20	s390/uv: Retrieve UV secrets support Provide a kernel API to retrieve secrets from the UV secret store. Add two new functions: * `uv_get_secret_metadata` - get metadata for a given secret identifier * `uv_retrieve_secret` - get the secret value for the secret index With those two functions one can extract the secret for a given secret id, if the secret is retrievable. Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241024084107.2418186-1-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:16 +01:00
Steffen Eiden	28a51ee8eb	s390/uv: Provide host-key hashes in sysfs Utilize the new Query Ultravisor Keys UVC to give user space the information which host-keys are installed on the system. Create a new sysfs directory 'firmware/uv/keys' that contains the hash of the host-key and the backup host-key of that system. Additionally, the file 'all' contains the response from the UVC possibly containing more key-hashes than currently known. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241023075529.2561384-1-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:16 +01:00
Steffen Eiden	bb4ad73a28	s390/uv: Refactor uv-sysfs creation Streamline the sysfs generation to make it more extensible. Add a function to create a sysfs entry in the uv-sysfs dir. Use this function for the query directory. Reviewed-by: Janosch Frank <frankja@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Signed-off-by: Steffen Eiden <seiden@linux.ibm.com> Link: https://lore.kernel.org/r/20241015113940.3088249-2-seiden@linux.ibm.com Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-29 11:17:16 +01:00
Mete Durlu	401721a54c	s390/ipl: Switch over to sysfs_emit() Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over sprintf for presenting attributes to user space. Convert the left-over uses in the s390/ipl code. Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:24 +02:00
Mete Durlu	c50262498d	s390/nospec: Switch over to sysfs_emit() Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over sprintf for presenting attributes to user space. Convert the left-over uses in the s390/nospec-sysfs code. Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:24 +02:00
Mete Durlu	d151f8f788	s390/perf_event: Switch over to sysfs_emit() Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over sprintf for presenting attributes to user space. Convert the left-over uses in the s390/perf_event code. Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:24 +02:00
Mete Durlu	5b2a85a24b	s390/smp: Switch over to sysfs_emit() Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over sprintf for presenting attributes to user space. Convert the left-over uses in the s390/smp code. Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:24 +02:00
Mete Durlu	b4b920cded	s390/time: Switch over to sysfs_emit() Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over sprintf for presenting attributes to user space. Convert the left-over uses in the s390/time code. Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:23 +02:00
Mete Durlu	f94de4f17b	s390/topology: Switch over to sysfs_emit() Per Documentation/filesystems/sysfs.rst, sysfs_emit() is preferred over sprintf for presenting attributes to user space. Convert the left-over uses in the s390/topology code. Signed-off-by: Mete Durlu <meted@linux.ibm.com> Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:23 +02:00
David Hildenbrand	82a0fcb1ad	s390/kdump: Provide is_kdump_kernel() implementation s390 sets "elfcorehdr_addr = ELFCORE_ADDR_MAX;" early during setup_arch() to deactivate the "elfcorehdr= kernel" parameter, resulting in is_kdump_kernel() returning "false". During vmcore_init()->elfcorehdr_alloc(), if on a dump kernel and allocation succeeded, elfcorehdr_addr will be set to a valid address and is_kdump_kernel() will consequently return "true". is_kdump_kernel() should return a consistent result during all boot stages, and properly return "true" if in a kdump environment - just like it is done on powerpc where "false" is indicated in fadump environments, as added in commit `b098f1c323` ("powerpc/fadump: make is_kdump_kernel() return false when fadump is active"). Similarly provide a custom is_kdump_kernel() implementation that will only return "true" in kdump environments, and will do so consistently during boot. Update the documentation of dump_available(). Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Alexander Egorenkov <egorenar@linux.ibm.com> Link: https://lore.kernel.org/r/20241023090651.1115507-1-david@redhat.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:23 +02:00
Heiko Carstens	e6ebf0d651	s390: Fix various typos Run codespell on arch/s390 and drivers/s390 and fix all typos. Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-25 16:03:23 +02:00
Bart Van Assche	951248383a	s390/irq: Switch to irq_get_nr_irqs() Use the irq_get_nr_irqs() function instead of the global variable 'nr_irqs'. Prepare for changing 'nr_irqs' from an exported global variable into a variable with file scope. Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20241015190953.1266194-6-bvanassche@acm.org	2024-10-16 21:56:57 +02:00
Thomas Weißschuh	3aa8881ebd	s390/vdso: Remove timekeeper includes Since the generic VDSO clock mode storage is used, this header file is unused and can be removed. This avoids including a non-VDSO header while building the VDSO, which can lead to compilation errors. Also drop the comment which is out of date and in the wrong place. Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Link: https://lore.kernel.org/all/20241010-vdso-generic-arch_update_vsyscall-v1-6-7fe5a3ea4382@linutronix.de	2024-10-15 17:50:29 +02:00
Steven Rostedt	7888af4166	ftrace: Make ftrace_regs abstract from direct use ftrace_regs was created to hold registers that store information to save function parameters, return value and stack. Since it is a subset of pt_regs, it should only be used by its accessor functions. But because pt_regs can easily be taken from ftrace_regs (on most archs), it is tempting to use it directly. But when running on other architectures, it may fail to build or worse, build but crash the kernel! Instead, make struct ftrace_regs an empty structure and have the architectures define __arch_ftrace_regs and all the accessor functions will typecast to it to get to the actual fields. This will help avoid usage of ftrace_regs directly. Link: https://lore.kernel.org/all/20241007171027.629bdafd@gandalf.local.home/ Cc: "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org> Cc: "x86@kernel.org" <x86@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Naveen N Rao <naveen@kernel.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/20241008230628.958778821@goodmis.org Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2024-10-10 20:18:01 -04:00
Antonia Jonas	f626e79bfe	s390/cpum_cf: Correct typo CYLCE Signed-off-by: Antonia Jonas <antonia@toertel.de> Link: https://lore.kernel.org/r/20241003115648.26188-1-antonia@toertel.de Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-10 15:32:44 +02:00
Thomas Richter	31d1d8a35e	s390/cpum_sf: Set bit PMU_F_ENABLED enabled after lpp() invocation Set PMU enabled bit after lpp() call. This ensures the proper task PID is loaded by the llp instruction into Program Parameter register from where it is copied to each sampling data buffer (SDB) sample entry after the sampling was enabled. The barrier() instruction is not needed as cpumsf_pmu_enable() changes a CPU specific variable. Only the CPU that task is running on changes structure members. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-10-10 15:32:43 +02:00
Linus Torvalds	e08d227840	more s390 updates for 6.12 merge window - Clean up and improve vdso code: use SYM_* macros for function and data annotations, add CFI annotations to fix GDB unwinding, optimize the chacha20 implementation - Add vfio-ap driver feature advertisement for use by libvirt and mdevctl -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmb39nkACgkQjYWKoQLX FBiacggAlHrwDYIZdr7bd+0bJa8Wq5STrdo1XiohQhCrRg+4rXRaquEQszWVheRk qi/9u+MKJFtbd4VMbf+WqEi2P4Pnn5cByv+As3B4RMR0Tp3uQK5TlDbh7wbJ+pIG AQQoL5D2z05rvbYgc1Wvt/3lgrfs2EzUY3cAyIMmFwSp0On+psDuwuFVtWzH0jCr fxCwX7mrF6OoBMR0QSUxAvoOujUhd4CANk3n6rNxJfi2AC/gX/ZkFUrV5a3FvJ2Z X7fQgDc3rivsm5wFuoBvlrH5J1jhmWtsg8Tiygos7BBehpQ5NUsJMDGS8kR+avbF iYKg1oMy5/ZXekBjqkEMO6Vp3nCrdg== =Idf7 -----END PGP SIGNATURE----- Merge tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Clean up and improve vdso code: use SYM_* macros for function and data annotations, add CFI annotations to fix GDB unwinding, optimize the chacha20 implementation - Add vfio-ap driver feature advertisement for use by libvirt and mdevctl * tag 's390-6.12-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/vfio-ap: Driver feature advertisement s390/vdso: Use one large alternative instead of an alternative branch s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects tools: Add additional SYM_*() stubs to linkage.h s390/vdso: Use macros for annotation of asm functions s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack() s390/vdso: Fix comment within __arch_chacha20_blocks_nostack() s390/vdso: Get rid of permutation constants	2024-09-28 09:11:46 -07:00
Al Viro	cb787f4ac0	[tree-wide] finally take no_llseek out no_llseek had been defined to NULL two years ago, in commit `868941b144` ("fs: remove no_llseek") To quote that commit, At -rc1 we'll need do a mechanical removal of no_llseek - git grep -l -w no_llseek \| grep -v porting.rst \| while read i; do sed -i '/\<no_llseek\>/d' $i done would do it. Unfortunately, that hadn't been done. Linus, could you do that now, so that we could finally put that thing to rest? All instances are of the form .llseek = no_llseek, so it's obviously safe. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2024-09-27 08:18:43 -07:00
Linus Torvalds	348325d644	asm-generic updates for 6.12 These are only two small patches, one cleanup for arch/alpha and a preparation patch cleaning up the handling of runtime constants in the linker scripts. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEiK/NIGsWEZVxh/FrYKtH/8kJUicFAmboHV0ACgkQYKtH/8kJ UifHfhAAqTHHxxe+HiphGBPHN0ODyLVUs7fOQHtLOSmJlQa6x1TCR/+1nL1kTDbe j6EcIRxZrllQZ+jZBA8z2XsAmjjBLUxCB4yu6oxYJh8OdFyqeVM/myZEr2TAyb0o A3D9b+rfnY8sr9XaFHSHGWbh4c33cGQhACumHVAjtPvU06Voskq4pAf9ZnpGkNBe AdKNTVG6+w84dKUNuzXcexP8d7SnsXNfd6T9+evtW/M+fziWzs3aPQr+GZED96E5 8IRldXi2nzIwm9LT5IzZAt+QvpVb2Qob1+rej9p5WpptGp840CROTo61SwaYHCMV DDxTlmADsApWJQ3B5gDu6QS2jXT4eeOrY3JI2baeCyOV6auj15UXKiWc2QVoHOVU 6+PzlSFuLatI6WsxXfOcD0o3bfQXMKS6zCC/4eD7Y/SmmMqBbL5+d9sU5lwkiOFl swoswF4HTwo5d6NdkSuJOt6KA/V8a68lBhKYBXHu2yuLi/LDNOaipEvBHQLzfnlY 91e5DtDiHK9CYDNkwiR+bV9rQnhA535JSlfR8VtpU/SJTTjyF+dkt9JGPdivXoIA 8Zv+DN/oyrahUtCrgzzPXahOuBrfD/WfIajsvpEK6vNPuBhscsZFg/thc70FMIXo qn8Dmpi/CnDWFNOy0xO0cbYWrGBGn9E7kzbSZ78tUIjPUmmEKfk= =OOMl -----END PGP SIGNATURE----- Merge tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic Pull asm-generic updates from Arnd Bergmann: "These are only two small patches, one cleanup for arch/alpha and a preparation patch cleaning up the handling of runtime constants in the linker scripts" * tag 'asm-generic-6.12' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic: runtime constants: move list of constants to vmlinux.lds.h alpha: no need to include asm/xchg.h twice	2024-09-26 11:54:40 -07:00
Heiko Carstens	d714abee5f	s390/vdso: Use one large alternative instead of an alternative branch Replace the alternative branch with a larger alternative that contains both paths. That way the two paths are closer together and it is easier to change both paths if the need should arise. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-23 17:57:04 +02:00
Heiko Carstens	c902b578ee	s390/vdso: Use SYM_DATA_START_LOCAL()/SYM_DATA_END() for data objects Use SYM_DATA_START_LOCAL()/SYM_DATA_END() in vgetrandom-chacha.S so that the constants end up in an object with correct size: readelf -Ws vgetrandom-chacha.o Num: Value Size Type Bind Vis Ndx Name ... 5: 0000000000000000 32 OBJECT LOCAL DEFAULT 5 chacha20_constants Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-23 17:57:04 +02:00
Jens Remus	d361390d9f	s390/vdso: Use macros for annotation of asm functions Use the macros SYM_FUNC_START() and SYM_FUNC_END() to annotate the functions in assembly. Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-23 17:57:04 +02:00
Jens Remus	5cccfc8be6	s390/vdso: Add CFI annotations to __arch_chacha20_blocks_nostack() This allows proper unwinding, for instance when using a debugger such as GDB. Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-23 17:57:04 +02:00
Heiko Carstens	ff35a3f0ca	s390/vdso: Fix comment within __arch_chacha20_blocks_nostack() Fix comment within __arch_chacha20_blocks_nostack() so the comment reflects what the code is doing. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-23 17:57:04 +02:00
Heiko Carstens	8e391ae060	s390/vdso: Get rid of permutation constants The three byte masks for VECTOR PERMUTE are not needed, since the instruction VECTOR SHIFT LEFT DOUBLE BY BYTE can be used to implement the required "rotate left" easily. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-23 17:57:04 +02:00
Linus Torvalds	1ec6d09789	s390 updates for 6.12 merge window - Optimize ftrace and kprobes code patching and avoid stop machine for kprobes if sequential instruction fetching facility is available - Add hiperdispatch feature to dynamically adjust CPU capacity in vertical polarization to improve scheduling efficiency and overall performance. Also add infrastructure for handling warning track interrupts (WTI), allowing for graceful CPU preemption - Rework crypto code pkey module and split it into separate, independent modules for sysfs, PCKMO, CCA, and EP11, allowing modules to load only when the relevant hardware is available - Add hardware acceleration for HMAC modes and the full AES-XTS cipher, utilizing message-security assist extensions (MSA) 10 and 11. It introduces new shash implementations for HMAC-SHA224/256/384/512 and registers the hardware-accelerated AES-XTS cipher as the preferred option. Also add clear key token support - Add MSA 10 and 11 processor activity instrumentation counters to perf and update PAI Extension 1 NNPA counters - Cleanup cpu sampling facility code and rework debug/WARN_ON_ONCE statements - Add support for SHA3 performance enhancements introduced with MSA 12 - Add support for the query authentication information feature of MSA 13 and introduce the KDSA CPACF instruction. Provide query and query authentication information in sysfs, enabling tools like cpacfinfo to present this data in a human-readable form - Update kernel disassembler instructions - Always enable EXPOLINE_EXTERN if supported by the compiler to ensure kpatch compatibility - Add missing warning handling and relocated lowcore support to the early program check handler - Optimize ftrace_return_address() and avoid calling unwinder - Make modules use kernel ftrace trampolines - Strip relocs from the final vmlinux ELF file to make it roughly 2 times smaller - Dump register contents and call trace for early crashes to the console - Generate ptdump address marker array dynamically - Fix rcu_sched stalls that might occur when adding or removing large amounts of pages at once to or from the CMM balloon - Fix deadlock caused by recursive lock of the AP bus scan mutex - Unify sync and async register save areas in entry code - Cleanup debug prints in crypto code - Various cleanup and sanitizing patches for the decompressor - Various small ftrace cleanups -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmbsZawACgkQjYWKoQLX FBg+Ogf+NiKPfvI14NcTwnOHB6qz8ApPdGfN9bNVtQxtK3epeAvtj0cMonAuKpRg xckTRRd8y0guhCT7Q2+WitSgA5eYDn+u9/Ux5YuKUdUdXolQ0D64BJNtVeEFkmJj s+Lesb8cVI9T2VBZOpuF9lJigfsDALBkFroqN4MDudDeahS+qy33bAc0OoqYNXHo S6OwPK1/tEG9O/oTN2V4mN+aP0B3/dl7Msezb0gfAXQJA+WUAyMNK0RHvoG9uzaa BWAyWWYABj6woGZEAQAzXcbzkQiRPixTqZVe6e4YndXhIlEnB/Z2AQFdTpT9V7En eOmmve3QuJa0hkF9q4H/anvOMPntTg== =Xagq -----END PGP SIGNATURE----- Merge tag 's390-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Optimize ftrace and kprobes code patching and avoid stop machine for kprobes if sequential instruction fetching facility is available - Add hiperdispatch feature to dynamically adjust CPU capacity in vertical polarization to improve scheduling efficiency and overall performance. Also add infrastructure for handling warning track interrupts (WTI), allowing for graceful CPU preemption - Rework crypto code pkey module and split it into separate, independent modules for sysfs, PCKMO, CCA, and EP11, allowing modules to load only when the relevant hardware is available - Add hardware acceleration for HMAC modes and the full AES-XTS cipher, utilizing message-security assist extensions (MSA) 10 and 11. It introduces new shash implementations for HMAC-SHA224/256/384/512 and registers the hardware-accelerated AES-XTS cipher as the preferred option. Also add clear key token support - Add MSA 10 and 11 processor activity instrumentation counters to perf and update PAI Extension 1 NNPA counters - Cleanup cpu sampling facility code and rework debug/WARN_ON_ONCE statements - Add support for SHA3 performance enhancements introduced with MSA 12 - Add support for the query authentication information feature of MSA 13 and introduce the KDSA CPACF instruction. Provide query and query authentication information in sysfs, enabling tools like cpacfinfo to present this data in a human-readable form - Update kernel disassembler instructions - Always enable EXPOLINE_EXTERN if supported by the compiler to ensure kpatch compatibility - Add missing warning handling and relocated lowcore support to the early program check handler - Optimize ftrace_return_address() and avoid calling unwinder - Make modules use kernel ftrace trampolines - Strip relocs from the final vmlinux ELF file to make it roughly 2 times smaller - Dump register contents and call trace for early crashes to the console - Generate ptdump address marker array dynamically - Fix rcu_sched stalls that might occur when adding or removing large amounts of pages at once to or from the CMM balloon - Fix deadlock caused by recursive lock of the AP bus scan mutex - Unify sync and async register save areas in entry code - Cleanup debug prints in crypto code - Various cleanup and sanitizing patches for the decompressor - Various small ftrace cleanups * tag 's390-6.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (84 commits) s390/crypto: Display Query and Query Authentication Information in sysfs s390/crypto: Add Support for Query Authentication Information s390/crypto: Rework RRE and RRF CPACF inline functions s390/crypto: Add KDSA CPACF Instruction s390/disassembler: Remove duplicate instruction format RSY_RDRU s390/boot: Move boot_printk() code to own file s390/boot: Use boot_printk() instead of sclp_early_printk() s390/boot: Rename decompressor_printk() to boot_printk() s390/boot: Compile all files with the same march flag s390: Use MARCH_HAS__FEATURES defines s390: Provide MARCH_HAS__FEATURES defines s390/facility: Disable compile time optimization for decompressor code s390/boot: Increase minimum architecture to z10 s390/als: Remove obsolete comment s390/sha3: Fix SHA3 selftests failures s390/pkey: Add AES xts and HMAC clear key token support s390/cpacf: Add MSA 10 and 11 new PCKMO functions s390/mm: Add cond_resched() to cmm_alloc/free_pages() s390/pai_ext: Update PAI extension 1 counters s390/pai_crypto: Add support for MSA 10 and 11 pai counters ...	2024-09-21 09:02:54 -07:00
Linus Torvalds	617a814f14	ALong with the usual shower of singleton patches, notable patch series in this pull request are: "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. "mm: split underused THPs" from Yu Zhao. Improve THP=always policy - this was overprovisioning THPs in sparsely accessed memory areas. "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZu1BBwAKCRDdBJ7gKXxA jlWNAQDYlqQLun7bgsAN4sSvi27VUuWv1q70jlMXTfmjJAvQqwD/fBFVR6IOOiw7 AkDbKWP2k0hWPiNJBGwoqxdHHx09Xgo= =s0T+ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "Along with the usual shower of singleton patches, notable patch series in this pull request are: - "Align kvrealloc() with krealloc()" from Danilo Krummrich. Adds consistency to the APIs and behaviour of these two core allocation functions. This also simplifies/enables Rustification. - "Some cleanups for shmem" from Baolin Wang. No functional changes - mode code reuse, better function naming, logic simplifications. - "mm: some small page fault cleanups" from Josef Bacik. No functional changes - code cleanups only. - "Various memory tiering fixes" from Zi Yan. A small fix and a little cleanup. - "mm/swap: remove boilerplate" from Yu Zhao. Code cleanups and simplifications and .text shrinkage. - "Kernel stack usage histogram" from Pasha Tatashin and Shakeel Butt. This is a feature, it adds new feilds to /proc/vmstat such as $ grep kstack /proc/vmstat kstack_1k 3 kstack_2k 188 kstack_4k 11391 kstack_8k 243 kstack_16k 0 which tells us that 11391 processes used 4k of stack while none at all used 16k. Useful for some system tuning things, but partivularly useful for "the dynamic kernel stack project". - "kmemleak: support for percpu memory leak detect" from Pavel Tikhomirov. Teaches kmemleak to detect leaksage of percpu memory. - "mm: memcg: page counters optimizations" from Roman Gushchin. "3 independent small optimizations of page counters". - "mm: split PTE/PMD PT table Kconfig cleanups+clarifications" from David Hildenbrand. Improves PTE/PMD splitlock detection, makes powerpc/8xx work correctly by design rather than by accident. - "mm: remove arch_make_page_accessible()" from David Hildenbrand. Some folio conversions which make arch_make_page_accessible() unneeded. - "mm, memcg: cg2 memory{.swap,}.peak write handlers" fro David Finkel. Cleans up and fixes our handling of the resetting of the cgroup/process peak-memory-use detector. - "Make core VMA operations internal and testable" from Lorenzo Stoakes. Rationalizaion and encapsulation of the VMA manipulation APIs. With a view to better enable testing of the VMA functions, even from a userspace-only harness. - "mm: zswap: fixes for global shrinker" from Takero Funaki. Fix issues in the zswap global shrinker, resulting in improved performance. - "mm: print the promo watermark in zoneinfo" from Kaiyang Zhao. Fill in some missing info in /proc/zoneinfo. - "mm: replace follow_page() by folio_walk" from David Hildenbrand. Code cleanups and rationalizations (conversion to folio_walk()) resulting in the removal of follow_page(). - "improving dynamic zswap shrinker protection scheme" from Nhat Pham. Some tuning to improve zswap's dynamic shrinker. Significant reductions in swapin and improvements in performance are shown. - "mm: Fix several issues with unaccepted memory" from Kirill Shutemov. Improvements to the new unaccepted memory feature, - "mm/mprotect: Fix dax puds" from Peter Xu. Implements mprotect on DAX PUDs. This was missing, although nobody seems to have notied yet. - "Introduce a store type enum for the Maple tree" from Sidhartha Kumar. Cleanups and modest performance improvements for the maple tree library code. - "memcg: further decouple v1 code from v2" from Shakeel Butt. Move more cgroup v1 remnants away from the v2 memcg code. - "memcg: initiate deprecation of v1 features" from Shakeel Butt. Adds various warnings telling users that memcg v1 features are deprecated. - "mm: swap: mTHP swap allocator base on swap cluster order" from Chris Li. Greatly improves the success rate of the mTHP swap allocation. - "mm: introduce numa_memblks" from Mike Rapoport. Moves various disparate per-arch implementations of numa_memblk code into generic code. - "mm: batch free swaps for zap_pte_range()" from Barry Song. Greatly improves the performance of munmap() of swap-filled ptes. - "support large folio swap-out and swap-in for shmem" from Baolin Wang. With this series we no longer split shmem large folios into simgle-page folios when swapping out shmem. - "mm/hugetlb: alloc/free gigantic folios" from Yu Zhao. Nice performance improvements and code reductions for gigantic folios. - "support shmem mTHP collapse" from Baolin Wang. Adds support for khugepaged's collapsing of shmem mTHP folios. - "mm: Optimize mseal checks" from Pedro Falcato. Fixes an mprotect() performance regression due to the addition of mseal(). - "Increase the number of bits available in page_type" from Matthew Wilcox. Increases the number of bits available in page_type! - "Simplify the page flags a little" from Matthew Wilcox. Many legacy page flags are now folio flags, so the page-based flags and their accessors/mutators can be removed. - "mm: store zero pages to be swapped out in a bitmap" from Usama Arif. An optimization which permits us to avoid writing/reading zero-filled zswap pages to backing store. - "Avoid MAP_FIXED gap exposure" from Liam Howlett. Fixes a race window which occurs when a MAP_FIXED operqtion is occurring during an unrelated vma tree walk. - "mm: remove vma_merge()" from Lorenzo Stoakes. Major rotorooting of the vma_merge() functionality, making ot cleaner, more testable and better tested. - "misc fixups for DAMON {self,kunit} tests" from SeongJae Park. Minor fixups of DAMON selftests and kunit tests. - "mm: memory_hotplug: improve do_migrate_range()" from Kefeng Wang. Code cleanups and folio conversions. - "Shmem mTHP controls and stats improvements" from Ryan Roberts. Cleanups for shmem controls and stats. - "mm: count the number of anonymous THPs per size" from Barry Song. Expose additional anon THP stats to userspace for improved tuning. - "mm: finish isolate/putback_lru_page()" from Kefeng Wang: more folio conversions and removal of now-unused page-based APIs. - "replace per-quota region priorities histogram buffer with per-context one" from SeongJae Park. DAMON histogram rationalization. - "Docs/damon: update GitHub repo URLs and maintainer-profile" from SeongJae Park. DAMON documentation updates. - "mm/vdpa: correct misuse of non-direct-reclaim __GFP_NOFAIL and improve related doc and warn" from Jason Wang: fixes usage of page allocator __GFP_NOFAIL and GFP_ATOMIC flags. - "mm: split underused THPs" from Yu Zhao. Improve THP=always policy. This was overprovisioning THPs in sparsely accessed memory areas. - "zram: introduce custom comp backends API" frm Sergey Senozhatsky. Add support for zram run-time compression algorithm tuning. - "mm: Care about shadow stack guard gap when getting an unmapped area" from Mark Brown. Fix up the various arch_get_unmapped_area() implementations to better respect guard areas. - "Improve mem_cgroup_iter()" from Kinsey Ho. Improve the reliability of mem_cgroup_iter() and various code cleanups. - "mm: Support huge pfnmaps" from Peter Xu. Extends the usage of huge pfnmap support. - "resource: Fix region_intersects() vs add_memory_driver_managed()" from Huang Ying. Fix a bug in region_intersects() for systems with CXL memory. - "mm: hwpoison: two more poison recovery" from Kefeng Wang. Teaches a couple more code paths to correctly recover from the encountering of poisoned memry. - "mm: enable large folios swap-in support" from Barry Song. Support the swapin of mTHP memory into appropriately-sized folios, rather than into single-page folios" * tag 'mm-stable-2024-09-20-02-31' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (416 commits) zram: free secondary algorithms names uprobes: turn xol_area->pages[2] into xol_area->page uprobes: introduce the global struct vm_special_mapping xol_mapping Revert "uprobes: use vm_special_mapping close() functionality" mm: support large folios swap-in for sync io devices mm: add nr argument in mem_cgroup_swapin_uncharge_swap() helper to support large folios mm: fix swap_read_folio_zeromap() for large folios with partial zeromap mm/debug_vm_pgtable: Use pxdp_get() for accessing page table entries set_memory: add __must_check to generic stubs mm/vma: return the exact errno in vms_gather_munmap_vmas() memcg: cleanup with !CONFIG_MEMCG_V1 mm/show_mem.c: report alloc tags in human readable units mm: support poison recovery from copy_present_page() mm: support poison recovery from do_cow_fault() resource, kunit: add test case for region_intersects() resource: make alloc_free_mem_region() works for iomem_resource mm: z3fold: deprecate CONFIG_Z3FOLD vfio/pci: implement huge_fault support mm/arm64: support large pfn mappings mm/x86: support large pfn mappings ...	2024-09-21 07:29:05 -07:00
Heiko Carstens	b920aa77be	s390/vdso: Wire up getrandom() vdso implementation Provide the s390 specific vdso getrandom() architecture backend. _vdso_rng_data required data is placed within the _vdso_data vvar page, by using a hardcoded offset larger than vdso_data. As required the chacha20 implementation does not write to the stack. The implementation follows more or less the arm64 implementations and makes use of vector instructions. It has a fallback to the getrandom() system call for machines where the vector facility is not installed. The check if the vector facility is installed, as well as an optimization for machines with the vector-enhancements facility 2, is implemented with alternatives, avoiding runtime checks. Note that __kernel_getrandom() is implemented without the vdso user wrapper which would setup a stack frame for odd cases (aka very old glibc variants) where the caller has not done that. All callers of __kernel_getrandom() are required to setup a stack frame, like the C ABI requires it. The vdso testcases vdso_test_getrandom and vdso_test_chacha pass. Benchmark on a z16: $ ./vdso_test_getrandom bench-single vdso: 25000000 times in 0.493703559 seconds syscall: 25000000 times in 6.584025337 seconds Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2024-09-13 20:57:31 +02:00
Heiko Carstens	c1ae1b4ef5	s390/vdso: Move vdso symbol handling to separate header file The vdso.h header file, which is included at many places, includes generated header files. This can easily lead to recursive header file inclusions if the vdso code is changed. Therefore move the vdso symbol code, which requires the generated header files, to a separate header file, and include it at the two locations which require it. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2024-09-13 17:28:36 +02:00
Heiko Carstens	e10863fffe	s390/vdso: Allow alternatives in vdso code Implement the infrastructure required to allow alternatives in vdso code. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2024-09-13 17:28:36 +02:00
Heiko Carstens	013e984397	s390/alternatives: Remove ALT_FACILITY_EARLY Patch all alternatives which depend on facilities from the decompressor. There is no technical reason which enforces to split patching of such alternatives to the decompressor and the kernel. This simplifies alternative handling a bit, since one alternative type is removed. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>	2024-09-13 17:28:36 +02:00
Finn Callies	9fed8d7c46	s390/crypto: Display Query and Query Authentication Information in sysfs Displays the query (fc=0) and query authentication information (fc=127) as binary in sysfs per CPACF instruction. Files are located in /sys/devices/system/cpu/cpacf/. These information can be fetched via asm already except for PCKMO because this instruction is privileged. To offer a unified interface all CPACF instructions will have this information displayed in sysfs in files <instruction>_query_raw and <instruction>_query_auth_info_raw. A new tool introduced into s390-tools called cpacfinfo will use this information to convert and display in human readable form. Suggested-by: Harald Freudenberger <freude@linux.ibm.com> Reviewed-by: Harald Freudenberger <freude@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Finn Callies <fcallies@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-12 14:13:27 +02:00
Jens Remus	ab22f8d908	s390/disassembler: Remove duplicate instruction format RSY_RDRU Instruction format RSY_RDRU is a duplicate of RSY_RURD2. Use the latter, as it follows the s390-specific conventions for instruction format naming used in binutils. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-09-12 14:13:27 +02:00
Heiko Carstens	ebcc369f18	s390: Use MARCH_HAS__FEATURES defines Replace CONFIG_HAVE_MARCH__FEATURES with MARCH_HAS_*_FEATURES everywhere so code gets compiled correctly depending on if the target is the kernel or the decompressor. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-09-07 17:12:42 +02:00
Thomas Richter	0114009953	s390/pai_ext: Update PAI extension 1 counters Update the internal array of PAI extension 1 NNPA counter string table to support specialized processor instrumentation assist instructions. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-09-05 15:17:23 +02:00
Thomas Richter	4ae48555d0	s390/pai_crypto: Add support for MSA 10 and 11 pai counters Update the internal array of PAI crypto counter string table with new counters supported with Message Security Assist extension (MSA) 10 and MSA 11. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Finn Callies <fcallies@linux.ibm.com> Tested-by: Finn Callies <fcallies@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>	2024-09-05 15:17:23 +02:00
Mike Rapoport (Microsoft)	46bcce5031	arch, mm: move definition of node_data to generic code Every architecture that supports NUMA defines node_data in the same way: struct pglist_data *node_data[MAX_NUMNODES]; No reason to keep multiple copies of this definition and its forward declarations, especially when such forward declaration is the only thing in include/asm/mmzone.h for many architectures. Add definition and declaration of node_data to generic code and drop architecture-specific versions. Link: https://lkml.kernel.org/r/20240807064110.1003856-8-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Acked-by: Davidlohr Bueso <dave@stgolabs.net> Tested-by: Zi Yan <ziy@nvidia.com> # for x86_64 and arm64 Tested-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [arm64 + CXL via QEMU] Acked-by: Dan Williams <dan.j.williams@intel.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Borislav Petkov <bp@alien8.de> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Rob Herring (Arm) <robh@kernel.org> Cc: Samuel Holland <samuel.holland@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-03 21:15:28 -07:00
David Hildenbrand	85a7e5432d	s390/uv: convert gmap_destroy_page() from follow_page() to folio_walk Let's get rid of another follow_page() user and perform the UV calls under PTL -- which likely should be fine. No need for an additional reference while holding the PTL: uv_destroy_folio() and uv_convert_from_secure_folio() raise the refcount, so any concurrent make_folio_secure() would see an unexpted reference and cannot set PG_arch_1 concurrently. Do we really need a writable PTE? Likely yes, because the "destroy" part is, in comparison to the export, a destructive operation. So we'll keep the writability check for now. We'll lose the secretmem check from follow_page(). Likely we don't care about that here. Link: https://lkml.kernel.org/r/20240802155524.517137-9-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Matthew Wilcox <willy@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-01 20:26:01 -07:00
David Hildenbrand	3290ef3c7f	s390/uv: drop arch_make_page_accessible() All code was converted to using arch_make_folio_accessible(), let's drop arch_make_page_accessible(). Link: https://lkml.kernel.org/r/20240729183844.388481-4-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Janosch Frank <frankja@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-09-01 20:25:53 -07:00
Mete Durlu	ea31f1c6b4	s390/hiperdispatch: Add hiperdispatch debug counters Add three counters to follow and understand hiperdispatch behavior; * adjustment_count (amount of capacity adjustments triggered) * greedy_time_ms (time spent while all cpus are on high capacity) * conservative_time_ms (time spent while only entitled cpus are on high capacity) These counters can be found under /sys/kernel/debug/s390/hiperdispatch/ Time counters are in <msec> format and only cover the time spent when hiperdispatch is active. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Mete Durlu	441cc6f5b6	s390/hiperdispatch: Add hiperdispatch debug attributes Add two attributes for debug purposes. They can be found under; /sys/devices/system/cpu/hiperdispatch/ * hd_stime_threshold : allows user to adjust steal time threshold * hd_delay_factor : allows user to adjust delay factor of hiperdispatch work (after topology updates, delayed work is always delayed extra by this factor) hd_stime_threshold can have values between 0-100 as it represents a percentage value. hd_delay_factor can have values greater than 1. It is multiplied with the default delay to achieve a longer interval, pushing back the next hiperdispatch adjustment after a topology update. Ex: if delay interval is 250ms and the delay factor is 4; delayed interval is now 1000ms(1sec). After each capacity adjustment or topology change, work has a delayed interval of 1 sec for one interval. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Mete Durlu	b9271a5334	s390/hiperdispatch: Add hiperdispatch sysctl interface Expose hiperdispatch controls via sysctl. The user can now toggle hiperdispatch via assigning 0 or 1 to s390.hiperdispatch attribute. When hiperdipatch is toggled on, it tries to adjust CPU capacities, while system is in vertical polarization to gain performance benefits from different CPU polarizations. Disabling hiperdispatch reverts the CPU capacities to their default (HIGH_CAPACITY) and stops the dynamic adjustments. Introduce a kconfig option HIPERDISPATCH_ON which allows users to use hiperdispatch by default on vertical polarization. Using the sysctl attribute s390.hiperdispatch would overwrite this behavior. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Mete Durlu	1e5aa12d47	s390/hiperdispatch: Add trace events Add trace events to debug hiperdispatch behavior and track domain rebuilding. Two events provide information about the decision making of hiperdispatch and the adjustments made. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Co-developed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Mete Durlu	c0d4ba054f	s390/hiperdispatch: Add steal time averaging The measurements done by hiperdispatch can have sudden spikes and dips during run time. To prevent these outliers effecting the decision making process and causing adjustment overhead, use weighted average of the steal time. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Co-developed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Mete Durlu	6843d6d97c	s390/hiperdispatch: Introduce hiperdispatch When LPAR is in vertical polarization, CPUs get different polarization values, namely vertical high, vertical medium and vertical low. These values represent the likelyhood of the CPU getting physical runtime. Vertical high CPUs will always get runtime and others get varying runtime depending on the load the CEC is under. Vertical high and vertical medium CPUs are considered the CPUs which the current LPAR has the entitlement to run on. The vertical lows are on the other hand are borrowed CPUs which would only be given to the LPAR by hipervisor when the other LPARs are not utilizing them. Using the CPU capacities, hint linux scheduler when it should prioritise vertical high and vertical medium CPUs over vertical low CPUs. By tracking various system statistics hiperdispatch determines when to adjust cpu capacities. After each adjustment, rebuilding of scheduler domains is necessary to notify the scheduler about capacity changes but since this operation is costly it should be done as sparsely as possible. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Co-developed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Mete Durlu	26ceef523d	s390/smp: Add cpu capacities Linux scheduler allows architectures to assign capacity values to individual CPUs. This hints scheduler the performance difference between CPUs and allows more efficient task distribution them. Implement helper methods to set and get CPU capacities for s390. This is particularly helpful in vertical polarization configurations of LPARs. On vertical polarization an LPARs CPUs can get different polarization values depending on the CEC configuration. CPUs with different polarization values can perform different from each other, using CPU capacities this can be reflected to linux scheduler. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Tobias Huschle	7e627f8193	s390/topology: Add config option to switch to vertical during boot By default, all systems on s390 start in horizontal cpu polarization. Selecting the new config option SCHED_TOPOLOGY_VERTICAL allows to build a kernel that switches to vertical polarization during boot. Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Tobias Huschle	9dd333e7af	s390/topology: Add sysctl handler for polarization Provide an additional path to set the polarization of the system, such that a user no longer relies on the sysfs interface only and is able configure the polarization for every reboot via sysctl control files. The new sysctl can be set as follows: - s390.polarization=0 for horizontal polarization - s390.polarization=1 for vertical polarization Acked-by: Heiko Carstens <hca@linux.ibm.com> Tested-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Tobias Huschle	307b675cf0	s390/wti: Add debugfs file to display missed grace periods per cpu Introduce a new debug file which allows to determine how many warning track grace periods were missed on each CPU. The new file can be found as /sys/kernel/debug/s390/wti It is formatted as: CPU0 CPU1 [...] CPUx xyz xyz [...] xyz Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:35 +02:00
Tobias Huschle	42419bcdfd	s390/wti: Add wti accounting for missed grace periods A virtual CPU that has received a warning-track interrupt may fail to acknowledge the interrupt within the warning-track grace period. While this is usually not a problem, it will become necessary to investigate if there is a large number of such missed warning-track interrupts. Therefore, it is necessary to track these events. The information is tracked through the s390 debug facility and can be found under /sys/kernel/debug/s390dbf/wti/. The hex_ascii output is formatted as: <pid> <symbol> The values pid and current psw are collected when a warning track interrupt is received. Symbol is either the kernel symbol matching the collected psw or redacted to <user> when running in user space. Each line represents the currently executing process when a warning track interrupt was received which was then not acknowledged within its grace period. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Tobias Huschle	cafeff5a03	s390/wti: Prepare graceful CPU pre-emption on wti reception When a warning track interrupt is received, the kernel has only a very limited amount of time to make sure, that the CPU can be yielded as gracefully as possible before being pre-empted by the hypervisor. The interrupt handler for the wti therefore unparks a kernel thread which has being created on boot re-using the CPU hotplug kernel thread infrastructure. These threads exist per CPU and are assigned the highest possible real-time priority. This makes sure, that said threads will execute as soon as possible as the scheduler should pre-empt any other running user tasks to run the real-time thread. Furthermore, the interrupt handler disables all I/O interrupts to prevent additional interrupt processing on the soon-preempted CPU. Interrupt handlers are likely to take kernel locks, which in the worst case, will be kept while the interrupt handler is pre-empted from itself underlying physical CPU. In that case, all tasks or interrupt handlers on other CPUs would have to wait for the pre-empted CPU being dispatched again. By preventing further interrupt processing, this risk is minimized. Once the CPU gets dispatched again, the real-time kernel thread regains control, reenables interrupts and parks itself again. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Tobias Huschle	2c6c9ccc76	s390/wti: Introduce infrastructure for warning track interrupt The warning-track interrupt (wti) provides a notification that the receiving CPU will be pre-empted from its physical CPU within a short time frame. This time frame is called grace period and depends on the machine type. Giving up the CPU on time may prevent a task to get stuck while holding a resource. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Vasily Gorbik	36dff49b96	s390/ftrace: Avoid extra serialization for graph caller patching The only context where ftrace_enable_ftrace_graph_caller() or ftrace_disable_ftrace_graph_caller() is called also calls ftrace_arch_code_modify_post_process(), which already performs text_poke_sync_lock(). ftrace_run_update_code() arch_ftrace_update_code() ftrace_modify_all_code() ftrace_enable_ftrace_graph_caller()/ftrace_disable_ftrace_graph_caller() ftrace_arch_code_modify_post_process() text_poke_sync_lock() Remove the redundant serialization. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Vasily Gorbik	5200614080	s390/ftrace: Use get/copy_from_kernel_nofault consistently Use get/copy_from_kernel_nofault to access the kernel text consistently. Replace memcmp() in ftrace_init_nop() to ensure that in case of inconsistencies in the 'mcount' table, the kernel reports a failure instead of potentially crashing. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Vasily Gorbik	efd9cd019e	s390/ftrace: Avoid trampolines if possible When a sequential instruction fetching facility is present, it is safe to patch ftrace NOPs in function prologues. All of them are 8-byte aligned. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Vasily Gorbik	30799152c3	s390/kprobes: Avoid stop machine if possible Avoid stop machine on kprobes arm/disarm when sequential instruction fetching is present. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Vasily Gorbik	bb91ed0ee3	s390/setup: Recognize sequential instruction fetching facility When sequential instruction fetching facility is present, certain guarantees are provided for code patching. In particular, atomic overwrites within 8 aligned bytes is safe from an instruction-fetching point of view. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Sven Schnelle	ee3daf7c05	s390/entry: Unify save_area_sync and save_area_async In the past two save areas existed because interrupt handlers and system call / program check handlers where entered with interrupts enabled. To prevent a handler from overwriting the save areas from the previous handler, interrupts used the async save area, while system call and program check handler used the sync save area. Since the removal of critical section cleanup from entry.S, handlers are entered with interrupts disabled. When the interrupts are re-enabled, the save area is no longer need. Therefore merge both save areas into one. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:34 +02:00
Vasily Gorbik	acb684d3b0	s390/disassembler: Add instructions Add more instructions to the kernel disassembler. Reviewed-by: Jens Remus <jremus@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:33 +02:00
Jens Remus	73c81973b4	s390/disassembler: Use proper format specifiers for operand values Treat register numbers as unsigned. Treat signed operand values as signed. This resolves multiple instances of the Cppcheck warning: warning: %i in format string (no. 1) requires 'int' but the argument type is 'unsigned int'. [invalidPrintfArgType_sint] Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-29 22:56:33 +02:00
Vasily Gorbik	a84dd0d8ae	s390/ftrace: Avoid calling unwinder in ftrace_return_address() ftrace_return_address() is called extremely often from performance-critical code paths when debugging features like CONFIG_TRACE_IRQFLAGS are enabled. For example, with debug_defconfig, ftrace selftests on my LPAR currently execute ftrace_return_address() as follows: ftrace_return_address(0) - 0 times (common code uses __builtin_return_address(0) instead) ftrace_return_address(1) - 2,986,805,401 times (with this patch applied) ftrace_return_address(2) - 140 times ftrace_return_address(>2) - 0 times The use of __builtin_return_address(n) was replaced by return_address() with an unwinder call by commit `cae74ba8c2` ("s390/ftrace: Use unwinder instead of __builtin_return_address()") because __builtin_return_address(n) simply walks the stack backchain and doesn't check for reaching the stack top. For shallow stacks with fewer than "n" frames, this results in reads at low addresses and random memory accesses. While calling the fully functional unwinder "works", it is very slow for this purpose. Moreover, potentially following stack switches and walking past IRQ context is simply wrong thing to do for ftrace_return_address(). Reimplement return_address() to essentially be __builtin_return_address(n) with checks for reaching the stack top. Since the ftrace_return_address(n) argument is always a constant, keep the implementation in the header, allowing both GCC and Clang to unroll the loop and optimize it to the bare minimum. Fixes: `cae74ba8c2` ("s390/ftrace: Use unwinder instead of __builtin_return_address()") Cc: stable@vger.kernel.org Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-27 20:16:48 +02:00
Vasily Gorbik	d759be2823	s390/ftrace: Use kernel ftrace trampoline for modules Now that both the kernel modules area and the kernel image itself are located within 4 GB, there is no longer a need to maintain a separate ftrace_plt trampoline. Use the existing trampoline in the kernel. Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-27 20:16:48 +02:00
Vasily Gorbik	017f1f0d39	s390/ftrace: Remove unused ftrace_plt_template* Unused since commit `b860b9346e` ("s390/ftrace: remove dead code"). Reviewed-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-27 20:16:48 +02:00
Heiko Carstens	6708948e36	s390/early: Dump register contents and call trace for early crashes If the early program check handler cannot resolve a program check dump register contents and a call trace to the console before loading a disabled wait psw. This makes debugging much easier. Emit an extra message with early_printk() for cases where regular printk() via the early console is not yet working so that at least some information is available. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-22 19:28:11 +02:00
Heiko Carstens	0bc6a69f5f	s390/early: Add __init to __do_early_pgm_check() __do_early_pgm_check() is a function which is only needed during early setup code. Mark it __init in order to save a few bytes. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-22 19:28:11 +02:00
Thomas Richter	b495e71015	s390/cpum_sf: Remove WARN_ON_ONCE statements Remove WARN_ON_ONCE statements. These have not triggered in the past. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-22 19:28:11 +02:00
Thomas Richter	14a34130e0	s390/cpum_sf: Rework debug_sprintf_event() messages Rework debug messages: - Remove most of the debug_sprintf_event() invocations. - Do not split string format statements - Remove colon after function name. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-22 19:28:10 +02:00
Alexander Gordeev	1642285e51	s390/boot: Fix KASLR base offset off by __START_KERNEL bytes Symbol offsets to the KASLR base do not match symbol address in the vmlinux image. That is the result of setting the KASLR base to the beginning of .text section as result of an optimization. Revert that optimization and allocate virtual memory for the whole kernel image including __START_KERNEL bytes as per the linker script. That allows keeping the semantics of the KASLR base offset in sync with other architectures. Rename __START_KERNEL to TEXT_OFFSET, since it represents the offset of the .text section within the kernel image, rather than a virtual address. Still skip mapping TEXT_OFFSET bytes to save memory on pgtables and provoke exceptions in case an attempt to access this area is made, as no kernel symbol may reside there. In case CONFIG_KASAN is enabled the location counter might exceed the value of TEXT_OFFSET, while the decompressor linker script forcefully resets it to TEXT_OFFSET, which leads to a sections overlap link failure. Use MAX() expression to avoid that. Reported-by: Omar Sandoval <osandov@osandov.com> Closes: https://lore.kernel.org/linux-s390/ZnS8dycxhtXBZVky@telecaster.dhcp.thefacebook.com/ Fixes: `56b1069c40` ("s390/boot: Rework deployment of the kernel image") Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-22 19:24:13 +02:00
Thomas Richter	6d9a732d8a	s390/cpum_sf: Ignore qsi() return code qsi() executes the instruction qsi (query sample information) and stores the result of the query in a sample information block pointed to by the function argument. The instruction does not change the condition code register. The return code is always zero. No need to check for errors. Remove now unreferenced macros PMC_FAILURE and RS_INIT_FAILURE_QSI. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-21 16:17:01 +02:00
Thomas Richter	742a755716	s390/cpum_sf: Ignore lsctl() return code in sf_disable() sf_disable() returns the condition code of instruction lsctl (load sampling controls). However the parameter to lsctl() in sf_disable() is a sample control block containing all zeroes. This invocation of lsctl() does not fail and returns always zero even when there is no authorization for sampling on the machine. In short, sampling can be always turned off. Ignore the return code of sf_disable() and change the function return to void. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-21 16:17:01 +02:00
Alexander Gordeev	a3ca27c405	s390/mm: Prevent lowcore vs identity mapping overlap The identity mapping position in virtual memory is randomized together with the kernel mapping. That position can never overlap with the lowcore even when the lowcore is relocated. Prevent overlapping with the lowcore to allow independent positioning of the identity mapping. With the current value of the alternative lowcore address of 0x70000 the overlap could happen in case the identity mapping is placed at zero. This is a prerequisite for uncoupling of randomization base of kernel image and identity mapping in virtual memory. Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-21 16:14:45 +02:00
Jann Horn	92a10d3861	runtime constants: move list of constants to vmlinux.lds.h Refactor the list of constant variables into a macro. This should make it easier to add more constants in the future. Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will@kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-08-19 09:48:14 +02:00
Heiko Carstens	85878ff1b3	s390/entry: Move early_pgm_check_handler() to init text section Save some bytes and move early_pgm_check_handler() to init text section. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Heiko Carstens	3c4d0ae067	s390/traps: Handle early warnings gracefully Add missing warning handling to the early program check handler. This way a warning is printed to the console as soon as the early console is setup, and the kernel continues to boot. Before this change a disabled wait psw was loaded instead and the machine was silently stopped without giving an idea about what happened. Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Heiko Carstens	f101b305a7	s390/entry: Make early program check handler relocated lowcore aware Add the missing pieces so the early program check handler also works with a relocated lowcore. Right now the result of an early program check in case of a relocated lowcore would be a program check loop. Fixes: `8f1e70adb1` ("s390/boot: Add cmdline option to relocate lowcore") Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Heiko Carstens	f2bb5b97b5	s390/entry: Move early program check handler to entry.S Have all program check handlers in one file to make future changes easy. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Thomas Richter	e09e58f425	s390/cpum_sf: Use variable name cpuhw consistently All functions but setup_pmc_cpu() use a local variable named cpuhw to refer to struct cpu_hw_sf. In setup_pmc_cpu() rename variable cpusf to cpuhw. This makes the naming scheme consistent with all other functions. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Thomas Richter	6bc565a99e	s390/cpum_sf: Define and initialize variable Define and initialize a variable in one place. Remove space between cast and variable. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Thomas Richter	b201828290	s390/cpum_sf: Use hwc as variable consistently In hw_perf_event_update() and cpumsf_pmu_enable() use variable hwc consistently to access event's hardware related data. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Thomas Richter	d4559eabc1	s390/cpum_cf: Move defines from header file to source file The macros PERF_CPUM_CF_MAX_CTR and PERF_EVENT_CPUM_CF_DIAG are used in only one source file arch/s390/kernel/perf_cpum_cf.c. Move these defines from the header file arch/s390/include/asm/perf_event.h to the only user. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Thomas Richter	52d6ef92a4	s390/cpum_sf: Move defines from header file to source file Some defines in common header file arch/s390/include/asm/perf_event.h are only used in one source file arch/s390/kernel/perf_cpum_sf.c. Move these defines from header to source file. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:53 +02:00
Thomas Richter	501cab2b1d	s390/cpum_sf: Rename macro to consistent prefix Rename macro SAMPLE_FREQ_MODE to SAMPL_FREQ_MODE to make its prefix consistent with all other macro starting with prefix SAMPL_XXX. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:52 +02:00
Thomas Richter	ea95be4bda	s390/cpum_sf: Remove unused defines REG_NONE and REG_OVERFLOW Member hw_perf_event::reg.reg is set but never used, so remove it. Defines REG_NONE and REG_OVERFLOW are not referenced anymore. The initialization to zero takes place in function perf_event_alloc() where ... event = kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL \| __GFP_ZERO, node); ... makes sure memory allocated for the event is zero'ed. This is done in the kernel's common code in kernel/events/core.c The struct perf_event contains member hw_perf_event as in struct perf_event { .... struct hw_perf_event hw; .... }; This contained sub-structure is also initialized to zero. No functional change. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:52 +02:00
Thomas Richter	8a6fe8f21e	s390/cpum_sf: Use refcount_t instead of atomic_t Replace atomic_t by refcount_t for reference counting of events. Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-08-07 20:52:52 +02:00
Heiko Carstens	75c10d5377	s390/vmlinux.lds.S: Move ro_after_init section behind rodata section The .data.rel.ro and .got section were added between the rodata and ro_after_init data section, which adds an RW mapping in between all RO mapping of the kernel image: ---[ Kernel Image Start ]--- 0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X 0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X 0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX 0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX 0x000003ffe1300000-0x000003ffe1331000 196K PTE RO NX 0x000003ffe1331000-0x000003ffe13b3000 520K PTE RW NX <--- 0x000003ffe13b3000-0x000003ffe13d5000 136K PTE RO NX 0x000003ffe13d5000-0x000003ffe1400000 172K PTE RW NX 0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX 0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX 0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX 0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX ---[ Kernel Image End ]--- Move the ro_after_init data section again right behind the rodata section to prevent interleaving RO and RW mappings: ---[ Kernel Image Start ]--- 0x000003ffe0000000-0x000003ffe0e00000 14M PMD RO X 0x000003ffe0e00000-0x000003ffe0ec7000 796K PTE RO X 0x000003ffe0ec7000-0x000003ffe0f00000 228K PTE RO NX 0x000003ffe0f00000-0x000003ffe1300000 4M PMD RO NX 0x000003ffe1300000-0x000003ffe1353000 332K PTE RO NX 0x000003ffe1353000-0x000003ffe1400000 692K PTE RW NX 0x000003ffe1400000-0x000003ffe1500000 1M PMD RW NX 0x000003ffe1500000-0x000003ffe1700000 2M PTE RW NX 0x000003ffe1700000-0x000003ffe1800000 1M PMD RW NX 0x000003ffe1800000-0x000003ffe187e000 504K PTE RW NX ---[ Kernel Image End ]--- Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-31 16:30:20 +02:00
Heiko Carstens	0a34c027a3	s390/alternatives: Remove unused empty header file Remove the unused and empty arch/s390/kernel/alternative.h header file which was added by mistake. Fixes: `5ade5be4ed` ("s390: Add infrastructure to patch lowcore accesses") Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-31 16:30:20 +02:00
Heiko Carstens	4734406c39	s390/fpu: Re-add exception handling in load_fpu_state() With the recent rewrite of the fpu code exception handling for the lfpc instruction within load_fpu_state() was erroneously removed. Add it again to prevent that loading invalid floating point register values cause an unhandled specification exception. Fixes: `8c09871a95` ("s390/fpu: limit save and restore to used registers") Cc: stable@vger.kernel.org Reported-by: Aristeu Rozanski <aris@redhat.com> Tested-by: Aristeu Rozanski <aris@redhat.com> Reviewed-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-31 16:30:20 +02:00
Linus Torvalds	65ad409e63	more s390 updates for 6.11 merge window - Fix KMSAN build breakage caused by the conflict between s390 and mm-stable trees - Add KMSAN page markers for ptdump - Add runtime constant support - Fix __pa/__va for modules under non-GPL licenses by exporting necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage problems - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement Counter Facility code when the counter set size is zero - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable its functionality by default - Support allocation of multiple MSI interrupts per device and improve logging of architecture-specific limitations - Add support for lowcore relocation as a debugging feature to catch all null ptr dereferences in the kernel address space, improving detection beyond the current implementation's limited write access protection - Clean up and rework CPU alternatives to allow for callbacks and early patching for the lowcore relocation -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmajrGUACgkQjYWKoQLX FBgZjwf/X8suh1Gm2qO47hdGURusOKmEa6GjYaihKzOi2I5yAWVXGsAYX7QtXI4X fxbKyuGJIvq7LOrIojN1JOCPGkDRztgMddqDJI7WljuRiw6dcd4L5tbaPgCKEv3Q AQcoq+Aeg1L5xnuNFPdQXl6+Fy2lTFqJCkUl+uW05pGAn2R212dYG3HB41TpwOtJ Sv2R5+yD9TQCKnHyuCQqaGf7d6SQTcVeBj8zrqVmcyduNK+BYYMOwlJ/UTRzeZEX 3DmQg/TdAkxXf0jZ+vrNILEfHlIvwDAhFjdoAXXL0TX4lx2cHLx9AiqNxYhUprsG 0gutc/nLq2FxhqofoJ0z9TCdb1Ef7w== =skva -----END PGP SIGNATURE----- Merge tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull more s390 updates from Vasily Gorbik: - Fix KMSAN build breakage caused by the conflict between s390 and mm-stable trees - Add KMSAN page markers for ptdump - Add runtime constant support - Fix __pa/__va for modules under non-GPL licenses by exporting necessary vm_layout struct with EXPORT_SYMBOL to prevent linkage problems - Fix an endless loop in the CF_DIAG event stop in the CPU Measurement Counter Facility code when the counter set size is zero - Remove the PROTECTED_VIRTUALIZATION_GUEST config option and enable its functionality by default - Support allocation of multiple MSI interrupts per device and improve logging of architecture-specific limitations - Add support for lowcore relocation as a debugging feature to catch all null ptr dereferences in the kernel address space, improving detection beyond the current implementation's limited write access protection - Clean up and rework CPU alternatives to allow for callbacks and early patching for the lowcore relocation * tag 's390-6.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (39 commits) s390: Remove protvirt and kvm config guards for uv code s390/boot: Add cmdline option to relocate lowcore s390/kdump: Make kdump ready for lowcore relocation s390/entry: Make system_call() ready for lowcore relocation s390/entry: Make ret_from_fork() ready for lowcore relocation s390/entry: Make __switch_to() ready for lowcore relocation s390/entry: Make restart_int_handler() ready for lowcore relocation s390/entry: Make mchk_int_handler() ready for lowcore relocation s390/entry: Make int handlers ready for lowcore relocation s390/entry: Make pgm_check_handler() ready for lowcore relocation s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro s390/entry: Add base register to SIEEXIT macro s390/entry: Add base register to MBEAR macro s390/entry: Make __sie64a() ready for lowcore relocation s390/head64: Make startup code ready for lowcore relocation s390: Add infrastructure to patch lowcore accesses s390/atomic_ops: Disable flag outputs constraint for GCC versions below 14.2.0 s390/entry: Move SIE indicator flag to thread info s390/nmi: Simplify ptregs setup s390/alternatives: Remove alternative facility list ...	2024-07-26 10:47:53 -07:00
Joel Granados	78eb4ea25c	sysctl: treewide: constify the ctl_table argument of proc_handlers const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer\|interval)_handler\|sched_(rt\|rr)_handler\|rds_tcp_skbuf_handler\|proc_sctp_do_(hmac_alg\|rto_min\|rto_max\|udp_port\|alpha_beta\|auth\|probe_interval)"; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int write, void buffer, size_t lenp, loff_t ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int write, void buffer, size_t lenp, loff_t ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void , size_t , loff_t ); @r4@ identifier func, ctl; @@ int func( - struct ctl_table ctl + const struct ctl_table ctl ,int , void , size_t , loff_t ); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void buffer, size_t lenp, loff_t ppos); ``` Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>	2024-07-24 20:59:29 +02:00
Janosch Frank	6dc2e98d5f	s390: Remove protvirt and kvm config guards for uv code Removing the CONFIG_PROTECTED_VIRTUALIZATION_GUEST ifdefs and config option as well as CONFIG_KVM ifdefs in uv files. Having this configurable has been more of a pain than a help. It's time to remove the ifdefs and the config option. Signed-off-by: Janosch Frank <frankja@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:33 +02:00
Sven Schnelle	97cee3dd4a	s390/kdump: Make kdump ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in store_status() and __do_machine_kdump(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	361f6ec2fe	s390/entry: Make system_call() ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in system_call(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	9b3dcae128	s390/entry: Make ret_from_fork() ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in ret_from_fork(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	7cc86dee44	s390/entry: Make __switch_to() ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in __switch_to(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	4064b71112	s390/entry: Make restart_int_handler() ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in restart_int_handler(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	0001b7bbc5	s390/entry: Make mchk_int_handler() ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in mcck_int_handler(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	bd2c55b307	s390/entry: Make int handlers ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in the ext/io interrupt handlers. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	9e1e275fa2	s390/entry: Make pgm_check_handler() ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in pgm_check_handler(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	86e08d64ee	s390/entry: Add base register to CHECK_VMAP_STACK/CHECK_STACK macro In preparation of having lowcore at different address than zero, add the base register to CHECK_VMAP_STACK and CHECK_STACK. No functional change, because %r0 is passed to the macro. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	6908f8f916	s390/entry: Add base register to SIEEXIT macro In preparation of having lowcore at different address than zero, add the base register to SIEEXIT. No functional change, because %r0 is passed to the macro. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	ca2f0a26c4	s390/entry: Add base register to MBEAR macro In preparation of having lowcore at different address than zero, add the base register to MBEAR. No functional change, because %r0 is passed to the macro. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	12184a4676	s390/entry: Make __sie64a() ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in __sie64a(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	39e8c5d6a4	s390/head64: Make startup code ready for lowcore relocation In preparation of having lowcore at different address than zero, add the base register to all lowcore accesses in startup_continue(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Sven Schnelle	5ade5be4ed	s390: Add infrastructure to patch lowcore accesses The s390 architecture defines two special per-CPU data pages called the "prefix area". In s390-linux terminology this is usually called "lowcore". This memory area contains system configuration data like old/new PSW's for system call/interrupt/machine check handlers and lots of other data. It is normally mapped to logical address 0. This area can only be accessed when in supervisor mode. This means that kernel code can dereference NULL pointers, because accesses to address 0 are allowed. Parts of lowcore can be write protected, but read accesses and write accesses outside of the write protected areas are not caught. To remove this limitation for debugging and testing, remap lowcore to another address and define a function get_lowcore() which simply returns the address where lowcore is mapped at. This would normally introduce a pointer dereference (=memory read). As lowcore is used for several very often used variables, add code to patch this function during runtime, so we avoid the memory reads. For C code get_lowcore() has to be used, for assembly code it is the GET_LC macro. When using this macro/function a reference is added to alternative patching. All these locations will be patched to the actual lowcore location when the kernel is booted or a module is loaded. To make debugging/bisecting problems easier, this patch adds all the infrastructure but the lowcore address is still hardwired to 0. This way the code can be converted on a per function basis, and the functionality is enabled in a patch after all the functions have been converted. Note that this requires at least z16 because the old lpsw instruction only allowed a 12 bit displacement. z16 introduced lpswey which allows 20 bits (signed), so the lowcore can effectively be mapped from address 0 - 0x7e000. To use 0x7e000 as address, a 6 byte lgfi instruction would have to be used in the alternative. To save two bytes, llilh can be used, but this only allows to set bits 16-31 of the address. In order to use the llilh instruction, use 0x70000 as alternative lowcore address. This is still large enough to catch NULL pointer dereferences into large arrays. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:32 +02:00
Heiko Carstens	fc8eac33ad	s390/entry: Move SIE indicator flag to thread info CIF_SIE indicates if a thread is running in SIE context. This is the state of a thread and not the CPU. Therefore move this indicator to thread info. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Heiko Carstens	213400c4af	s390/nmi: Simplify ptregs setup The low level machine check handler code fills the ptregs structure partially with the register contents present at machine check handler entry and partially with contents from the machine check save area. In case of a machine check the contents of all general purpose registers are saved by the CPU to the machine check save area. Therefore simplify the code and fill the ptregs structure by only using the machine check save area as source. Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Heiko Carstens	beb8cee06f	s390/alternatives: Remove alternative facility list The alternative and the normal facility list are always identical. Remove the alternative facility list, which allows to simplify the alternatives code. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Heiko Carstens	47837a5c74	s390/nospec: Push down alternative handling The nospec implementation is deeply integrated into the alternatives code: only for nospec an alternative facility list is implemented and used by the alternative code, while it is modified by nospec specific needs. Push down the nospec alternative handling into the nospec by introducing a new alternative type and a specific nospec callback to decide if alternatives should be applied. Also introduce a new global nobp variable which together with facility 82 can be used to decide if nobp is enabled or not. Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Sven Schnelle	7f9d85998f	s390/alternatives: Allow early alternative patching in decompressor Add the required code to patch alternatives early in the decompressor. This is required for the upcoming lowcore relocation changes, where alternatives for facility 193 need to get patched before lowcore alternatives. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Co-developed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Heiko Carstens	b3e0c5f734	s390/alternatives: Rework to allow for callbacks Rework alternatives to allow for callbacks. With this every alternative entry has additional data encoded: - When (aka context) an alternative is supposed to be applied - The type of an alternative, which allows for type specific handling and callbacks - Extra type specific payload (patch information), which can be passed to callbacks in order to decide if an alternative should be applied or not With this only the "late" context is implemented, which means there is no change to the previous behaviour. All code is just converted to the more generic new infrastructure. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Heiko Carstens	ace76fac94	s390/alternatives: Move text sync functions Move all text sync functions from alternative.c to processor.c. This way there is only minimal code left in alternative.c left, which is a prerequisite to use the C file within boot code as well. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Heiko Carstens	c77f7354c4	s390/alternatives: Merge both alternative header files The two alternative header files must stay in sync. This is easier to achieve within one header file. Therefore merge both of them and have only one file, like most other architectures. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Heiko Carstens	9be999a612	s390/alternatives: Use consistent naming The alternative code is using the words facility and feature for the same. Rename facility to more generic feature everywhere to have consistent naming. Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Tested-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Sven Schnelle	035248a784	s390/alternatives: Remove noaltinstr option The current Kernel doesn't boot without alternative patching on z16 machines. To avoid such bugs in the future, remove the option disable alternative patching. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Sven Schnelle	d3604ffba1	s390: Move CIF flags to struct pcpu To allow testing flags for offline CPUs, move the CIF flags to struct pcpu. To avoid having to calculate the array index for each access, add a pointer to the pcpu member for the current cpu to lowcore. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Sven Schnelle	90fc5ac282	s390/smp: Switch pcpu_devices to percpu In preparation of moving the CIF flags from lowcore to pcpu_devices, convert the pcpu_devices array to use the percpu infrastructure. This is required because using the pcpu_devices array as it is would introduce a performance penalty due to the fact that CPU flags for multiple CPUs would end up in the same cacheline. Note that a pointer to the pcpu struct of the IPL CPU is still required. This is because a restart interrupt can be triggered on an offline CPU. s390 stores the percpu offset in lowcore, but offline CPUs have no lowcore area allocated. So percpu data cannot be used from an offline CPU and we need to get the pcpu pointer for the IPL cpu from somewhere else. Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Sven Schnelle	a795eeaf85	s390/smp: Handle restart interrupt on ipl cpu The current smp code allows to trigger a restart interrupt on CPUs offline in linux. To allow using the percpu infrastructure instead of the pcpu_devices array, switch to the ipl cpu which is always online before calling do_restart(). Reviewed-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:31 +02:00
Thomas Richter	e6ce1f12d7	s390/cpum_cf: Fix endless loop in CF_DIAG event stop Event CF_DIAG reads out complete counter sets using stcctm instruction. This is done at event start time when the process starts execution and at event stop time when the process is removed from the CPU. During removal the difference of each counter in the counter sets is calculated and saved as raw data in the ring buffer. This works fine unless the number of counters in a counter set is zero. This may happen for the extended counter set. This set is machine specific and the size of the counter set can be zero even when extended counter set is authorized for read access. This case is not handled. cfdiag_diffctr() checks authorization of the extended counter set. If true the functions assumes the extended counter set has been saved in a data buffer. However this is not the case, cfdiag_getctrset() does not save a counter set with counter set size of zero. This mismatch causes an endless loop in the counter set readout during event stop handling. The calculation of the difference of the counters in each counter now verifies the size of the counter set is non-zero. A counter set with size zero is skipped. Fixes: `a029a4eab3` ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility") Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 16:02:30 +02:00
Vasily Gorbik	e188e5d5ff	s390/setup: Fix __pa/__va for modules under non-GPL licenses The struct vm_layout contains fields used in __pa/__va calculations. Such fundamental things have to be exported with EXPORT_SYMBOL to avoid breakages of out-of-tree modules under non-GPL licenses. Fixes: `7de0446f0b` ("s390/boot: Make identity mapping base address explicit") Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 15:54:58 +02:00
Heiko Carstens	57e29f4c89	s390: Add runtime constant support Implement the runtime constant infrastructure for s390, allowing the dcache d_hash() function to be generated using as a constant for hash table address followed by shift by a constant of the hash index. This is the s390 variant of commit `94a2bc0f61` ("arm64: add 'runtime constant' support") and commit `e3c92e8171` ("runtime constants: add x86 architecture support"). Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-23 15:54:58 +02:00
Linus Torvalds	fbc90c042c	- 875fa64577da ("mm/hugetlb_vmemmap: fix race with speculative PFN walkers") is known to cause a performance regression (https://lore.kernel.org/all/3acefad9-96e5-4681-8014-827d6be71c7a@linux.ibm.com/T/#mfa809800a7862fb5bdf834c6f71a3a5113eb83ff). Yu has a fix which I'll send along later via the hotfixes branch. - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page(), vm_map_pages() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd\|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Is anyone reading this stuff? If so, email me! - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZp2C+QAKCRDdBJ7gKXxA joTkAQDvjqOoFStqk4GU3OXMYB7WCU/ZQMFG0iuu1EEwTVDZ4QEA8CnG7seek1R3 xEoo+vw0sWWeLV3qzsxnCA1BJ8cTJA8= =z0Lf -----END PGP SIGNATURE----- Merge tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - In the series "mm: Avoid possible overflows in dirty throttling" Jan Kara addresses a couple of issues in the writeback throttling code. These fixes are also targetted at -stable kernels. - Ryusuke Konishi's series "nilfs2: fix potential issues related to reserved inodes" does that. This should actually be in the mm-nonmm-stable tree, along with the many other nilfs2 patches. My bad. - More folio conversions from Kefeng Wang in the series "mm: convert to folio_alloc_mpol()" - Kemeng Shi has sent some cleanups to the writeback code in the series "Add helper functions to remove repeated code and improve readability of cgroup writeback" - Kairui Song has made the swap code a little smaller and a little faster in the series "mm/swap: clean up and optimize swap cache index". - In the series "mm/memory: cleanly support zeropage in vm_insert_page(), vm_map_pages() and vmf_insert_mixed()" David Hildenbrand has reworked the rather sketchy handling of the use of the zeropage in MAP_SHARED mappings. I don't see any runtime effects here - more a cleanup/understandability/maintainablity thing. - Dev Jain has improved selftests/mm/va_high_addr_switch.c's handling of higher addresses, for aarch64. The (poorly named) series is "Restructure va_high_addr_switch". - The core TLB handling code gets some cleanups and possible slight optimizations in Bang Li's series "Add update_mmu_tlb_range() to simplify code". - Jane Chu has improved the handling of our fake-an-unrecoverable-memory-error testing feature MADV_HWPOISON in the series "Enhance soft hwpoison handling and injection". - Jeff Johnson has sent a billion patches everywhere to add MODULE_DESCRIPTION() to everything. Some landed in this pull. - In the series "mm: cleanup MIGRATE_SYNC_NO_COPY mode", Kefeng Wang has simplified migration's use of hardware-offload memory copying. - Yosry Ahmed performs more folio API conversions in his series "mm: zswap: trivial folio conversions". - In the series "large folios swap-in: handle refault cases first", Chuanhua Han inches us forward in the handling of large pages in the swap code. This is a cleanup and optimization, working toward the end objective of full support of large folio swapin/out. - In the series "mm,swap: cleanup VMA based swap readahead window calculation", Huang Ying has contributed some cleanups and a possible fixlet to his VMA based swap readahead code. - In the series "add mTHP support for anonymous shmem" Baolin Wang has taught anonymous shmem mappings to use multisize THP. By default this is a no-op - users must opt in vis sysfs controls. Dramatic improvements in pagefault latency are realized. - David Hildenbrand has some cleanups to our remaining use of page_mapcount() in the series "fs/proc: move page_mapcount() to fs/proc/internal.h". - David also has some highmem accounting cleanups in the series "mm/highmem: don't track highmem pages manually". - Build-time fixes and cleanups from John Hubbard in the series "cleanups, fixes, and progress towards avoiding "make headers"". - Cleanups and consolidation of the core pagemap handling from Barry Song in the series "mm: introduce pmd\|pte_needs_soft_dirty_wp helpers and utilize them". - Lance Yang's series "Reclaim lazyfree THP without splitting" has reduced the latency of the reclaim of pmd-mapped THPs under fairly common circumstances. A 10x speedup is seen in a microbenchmark. It does this by punting to aother CPU but I guess that's a win unless all CPUs are pegged. - hugetlb_cgroup cleanups from Xiu Jianfeng in the series "mm/hugetlb_cgroup: rework on cftypes". - Miaohe Lin's series "Some cleanups for memory-failure" does just that thing. - Someone other than SeongJae has developed a DAMON feature in Honggyu Kim's series "DAMON based tiered memory management for CXL memory". This adds DAMON features which may be used to help determine the efficiency of our placement of CXL/PCIe attached DRAM. - DAMON user API centralization and simplificatio work in SeongJae Park's series "mm/damon: introduce DAMON parameters online commit function". - In the series "mm: page_type, zsmalloc and page_mapcount_reset()" David Hildenbrand does some maintenance work on zsmalloc - partially modernizing its use of pageframe fields. - Kefeng Wang provides more folio conversions in the series "mm: remove page_maybe_dma_pinned() and page_mkclean()". - More cleanup from David Hildenbrand, this time in the series "mm/memory_hotplug: use PageOffline() instead of PageReserved() for !ZONE_DEVICE". It "enlightens memory hotplug more about PageOffline() pages" and permits the removal of some virtio-mem hacks. - Barry Song's series "mm: clarify folio_add_new_anon_rmap() and __folio_add_anon_rmap()" is a cleanup to the anon folio handling in preparation for mTHP (multisize THP) swapin. - Kefeng Wang's series "mm: improve clear and copy user folio" implements more folio conversions, this time in the area of large folio userspace copying. - The series "Docs/mm/damon/maintaier-profile: document a mailing tool and community meetup series" tells people how to get better involved with other DAMON developers. From SeongJae Park. - A large series ("kmsan: Enable on s390") from Ilya Leoshkevich does that. - David Hildenbrand sends along more cleanups, this time against the migration code. The series is "mm/migrate: move NUMA hinting fault folio isolation + checks under PTL". - Jan Kara has found quite a lot of strangenesses and minor errors in the readahead code. He addresses this in the series "mm: Fix various readahead quirks". - SeongJae Park's series "selftests/damon: test DAMOS tried regions and {min,max}_nr_regions" adds features and addresses errors in DAMON's self testing code. - Gavin Shan has found a userspace-triggerable WARN in the pagecache code. The series "mm/filemap: Limit page cache size to that supported by xarray" addresses this. The series is marked cc:stable. - Chengming Zhou's series "mm/ksm: cmp_and_merge_page() optimizations and cleanup" cleans up and slightly optimizes KSM. - Roman Gushchin has separated the memcg-v1 and memcg-v2 code - lots of code motion. The series (which also makes the memcg-v1 code Kconfigurable) are "mm: memcg: separate legacy cgroup v1 code and put under config option" and "mm: memcg: put cgroup v1-specific memcg data under CONFIG_MEMCG_V1" - Dan Schatzberg's series "Add swappiness argument to memory.reclaim" adds an additional feature to this cgroup-v2 control file. - The series "Userspace controls soft-offline pages" from Jiaqi Yan permits userspace to stop the kernel's automatic treatment of excessive correctable memory errors. In order to permit userspace to monitor and handle this situation. - Kefeng Wang's series "mm: migrate: support poison recover from migrate folio" teaches the kernel to appropriately handle migration from poisoned source folios rather than simply panicing. - SeongJae Park's series "Docs/damon: minor fixups and improvements" does those things. - In the series "mm/zsmalloc: change back to per-size_class lock" Chengming Zhou improves zsmalloc's scalability and memory utilization. - Vivek Kasireddy's series "mm/gup: Introduce memfd_pin_folios() for pinning memfd folios" makes the GUP code use FOLL_PIN rather than bare refcount increments. So these paes can first be moved aside if they reside in the movable zone or a CMA block. - Andrii Nakryiko has added a binary ioctl()-based API to /proc/pid/maps for much faster reading of vma information. The series is "query VMAs from /proc/<pid>/maps". - In the series "mm: introduce per-order mTHP split counters" Lance Yang improves the kernel's presentation of developer information related to multisize THP splitting. - Michael Ellerman has developed the series "Reimplement huge pages without hugepd on powerpc (8xx, e500, book3s/64)". This permits userspace to use all available huge page sizes. - In the series "revert unconditional slab and page allocator fault injection calls" Vlastimil Babka removes a performance-affecting and not very useful feature from slab fault injection. * tag 'mm-stable-2024-07-21-14-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (411 commits) mm/mglru: fix ineffective protection calculation mm/zswap: fix a white space issue mm/hugetlb: fix kernel NULL pointer dereference when migrating hugetlb folio mm/hugetlb: fix possible recursive locking detected warning mm/gup: clear the LRU flag of a page before adding to LRU batch mm/numa_balancing: teach mpol_to_str about the balancing mode mm: memcg1: convert charge move flags to unsigned long long alloc_tag: fix page_ext_get/page_ext_put sequence during page splitting lib: reuse page_ext_data() to obtain codetag_ref lib: add missing newline character in the warning message mm/mglru: fix overshooting shrinker memory mm/mglru: fix div-by-zero in vmpressure_calc_level() mm/kmemleak: replace strncpy() with strscpy() mm, page_alloc: put should_fail_alloc_page() back behing CONFIG_FAIL_PAGE_ALLOC mm, slab: put should_failslab() back behind CONFIG_SHOULD_FAILSLAB mm: ignore data-race in __swap_writepage hugetlbfs: ensure generic_hugetlb_get_unmapped_area() returns higher address than mmap_min_addr mm: shmem: rename mTHP shmem counters mm: swap_state: use folio_alloc_mpol() in __read_swap_cache_async() mm/migrate: putback split folios when numa hint migration fails ...	2024-07-21 17:15:46 -07:00
Linus Torvalds	1c7d0c3af5	s390 updates for 6.11 merge window - Remove restrictions on PAI NNPA and crypto counters, enabling concurrent per-task and system-wide sampling and counting events - Switch to GENERIC_CPU_DEVICES by setting up the CPU present mask in the architecture code and letting the generic code handle CPU bring-up - Add support for the diag204 busy indication facility to prevent undesirable blocking during hypervisor logical CPU utilization queries. Implement results caching - Improve the handling of Store Data SCLP events by suppressing unnecessary warning, preventing buffer release in I/O during failures, and adding timeout handling for Store Data requests to address potential firmware issues - Provide optimized __arch_hweight() implementations - Remove the unnecessary CPU KOBJ_CHANGE uevents generated during topology updates, as they are unused and also not present on other architectures - Cleanup atomic_ops, optimize __atomic_set() for small values and __atomic_cmpxchg_bool() for compilers supporting flag output constraint - Couple of cleanups for KVM: - Move and improve KVM struct definitions for DAT tables from gaccess.c to a new header - Pass the asce as parameter to sie64a() - Make the crdte() and cspg() page table handling wrappers return a boolean to indicate success, like the other existing "compare and swap" wrappers - Add documentation for HWCAP flags - Switch to obtaining total RAM pages from memblock instead of totalram_pages() during mm init, to ensure correct calculation of zero page size, when defer_init is enabled - Refactor lowcore access and switch to using the get_lowcore() function instead of the S390_lowcore macro - Cleanups for PG_arch_1 and folio handling in UV and hugetlb code - Add missing MODULE_DESCRIPTION() macros - Fix VM_FAULT_HWPOISON handling in do_exception() -----BEGIN PGP SIGNATURE----- iQEzBAABCAAdFiEE3QHqV+H2a8xAv27vjYWKoQLXFBgFAmaYGegACgkQjYWKoQLX FBjCwwf/aRYoLIXCa9/nHGWFiUjZm6xBgVwZh55bXjfNG9TI2J9UZSsYlOFGUJKl gvD2Ym+LqAejK8R4EUHkfD6ftaKMQuIxNDoedxhwuSpfOQ2mZ5teu0MxTh8QcUAx 4Y2w5XEeCuqE3SuoZ4SJa58K4rGl4cFpPsKNa8ofdzH1ZLFNe8Wqzis4kh0htqLb FtPj6nsgfzQ5kg14rVkGxCa4CqoFxonXgsA6nH6xZLbxKUInyq8uV44UBQ+aJq5v dsdzZ5XuAJHN2FpBuuOYQYZYw3XIy/kka7o4EjffORi5SGCRMWO4Zt0P6HXaNkh6 xV8EEO8myeo7rV8dnrk1V4yGjGJmfA== =3IGY -----END PGP SIGNATURE----- Merge tag 's390-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 updates from Vasily Gorbik: - Remove restrictions on PAI NNPA and crypto counters, enabling concurrent per-task and system-wide sampling and counting events - Switch to GENERIC_CPU_DEVICES by setting up the CPU present mask in the architecture code and letting the generic code handle CPU bring-up - Add support for the diag204 busy indication facility to prevent undesirable blocking during hypervisor logical CPU utilization queries. Implement results caching - Improve the handling of Store Data SCLP events by suppressing unnecessary warning, preventing buffer release in I/O during failures, and adding timeout handling for Store Data requests to address potential firmware issues - Provide optimized __arch_hweight() implementations - Remove the unnecessary CPU KOBJ_CHANGE uevents generated during topology updates, as they are unused and also not present on other architectures - Cleanup atomic_ops, optimize __atomic_set() for small values and __atomic_cmpxchg_bool() for compilers supporting flag output constraint - Couple of cleanups for KVM: - Move and improve KVM struct definitions for DAT tables from gaccess.c to a new header - Pass the asce as parameter to sie64a() - Make the crdte() and cspg() page table handling wrappers return a boolean to indicate success, like the other existing "compare and swap" wrappers - Add documentation for HWCAP flags - Switch to obtaining total RAM pages from memblock instead of totalram_pages() during mm init, to ensure correct calculation of zero page size, when defer_init is enabled - Refactor lowcore access and switch to using the get_lowcore() function instead of the S390_lowcore macro - Cleanups for PG_arch_1 and folio handling in UV and hugetlb code - Add missing MODULE_DESCRIPTION() macros - Fix VM_FAULT_HWPOISON handling in do_exception() * tag 's390-6.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (54 commits) s390/mm: Fix VM_FAULT_HWPOISON handling in do_exception() s390/kvm: Move bitfields for dat tables s390/entry: Pass the asce as parameter to sie64a() s390/sthyi: Use cached data when diag is busy s390/sthyi: Move diag operations s390/hypfs_diag: Diag204 busy loop s390/diag: Add busy-indication-facility requirements s390/diag: Diag204 add busy return errno s390/diag: Return errno's from diag204 s390/sclp: Diag204 busy indication facility detection s390/atomic_ops: Make use of flag output constraint s390/atomic_ops: Improve __atomic_set() for small values s390/atomic_ops: Use symbolic names s390/smp: Switch to GENERIC_CPU_DEVICES s390/hwcaps: Add documentation for HWCAP flags s390/pgtable: Make crdte() and cspg() return a value s390/topology: Remove CPU KOBJ_CHANGE uevents s390/sclp: Add timeout to Store Data requests s390/sclp: Prevent release of buffer in I/O s390/sclp: Suppress unnecessary Store Data warning ...	2024-07-18 15:41:45 -07:00
Claudio Imbrenda	723ac2d6ba	s390/entry: Pass the asce as parameter to sie64a() Pass the guest ASCE explicitly as parameter, instead of having sie64a() take it from lowcore. This removes hidden state from lowcore, and makes things look cleaner. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Nico Boehr <nrb@linux.ibm.com> Link: https://lore.kernel.org/r/20240703155900.103783-2-imbrenda@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-10 19:50:45 +02:00
Mete Durlu	b7a5e5dfbd	s390/sthyi: Use cached data when diag is busy When sthyi is being emulated, data from diag204 is used. If diag204 returns busy, previously cached sthyi info block is returned to the caller and cache expiry is set to expired. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-10 19:50:45 +02:00
Mete Durlu	6fdf72c9a9	s390/sthyi: Move diag operations Move diag204 related operations to their own functions for better error handling and better readability. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-10 19:50:45 +02:00
Mete Durlu	df7e714d6d	s390/diag: Diag204 add busy return errno When diag204-busy-indication facility is installed, diag204 can return '8' which means device is busy and no operation is done. Add check for return codes of diag204 call. Return error codes according to diag204 return codes. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-10 19:50:45 +02:00
Mete Durlu	bb9be93acb	s390/diag: Return errno's from diag204 Return different errno's from diag204 to allow users to handle them accordingly. Instead of returning -1 regardless of the failing condition, return -EINVAL on invalid memory address and -EOPNOTSUPP when diag instruction fails. Acked-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Tobias Huschle <huschle@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-10 19:50:44 +02:00
Sven Schnelle	4a39f12e75	s390/smp: Switch to GENERIC_CPU_DEVICES Instead of setting up non-boot CPUs early in architecture code, only setup the cpu present mask and let the generic code handle cpu bringup. Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-07-10 19:50:44 +02:00
Ilya Leoshkevich	7e17eac28a	s390/unwind: disable KMSAN checks The unwind code can read uninitialized frames. Furthermore, even in the good case, KMSAN does not emit shadow for backchains. Therefore disable it for the unwinding functions. Link: https://lkml.kernel.org/r/20240621113706.315500-37-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-07-03 19:30:25 -07:00
Ilya Leoshkevich	c1057a707a	s390/traps: unpoison the kernel_stack_overflow()'s pt_regs This is normally done by the generic entry code, but the kernel_stack_overflow() flow bypasses it. Link: https://lkml.kernel.org/r/20240621113706.315500-34-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-07-03 19:30:25 -07:00
Ilya Leoshkevich	0cfd60a6a1	s390/ftrace: unpoison ftrace_regs in kprobe_ftrace_handler() s390 uses assembly code to initialize ftrace_regs and call kprobe_ftrace_handler(). Therefore, from the KMSAN's point of view, ftrace_regs is poisoned on kprobe_ftrace_handler() entry. This causes KMSAN warnings when running the ftrace testsuite. Fix by trusting the assembly code and always unpoisoning ftrace_regs in kprobe_ftrace_handler(). Link: https://lkml.kernel.org/r/20240621113706.315500-30-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Heiko Carstens <hca@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-07-03 19:30:24 -07:00
Ilya Leoshkevich	1f4cf63979	s390/diag: unpoison diag224() output buffer Diagnose 224 stores 4k bytes, which currently cannot be deduced from the inline assembly constraints. This leads to KMSAN false positives. Fix the constraints by using a 4k-sized struct instead of a raw pointer. While at it, prettify them too. Link: https://lkml.kernel.org/r/20240621113706.315500-29-iii@linux.ibm.com Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Suggested-by: Heiko Carstens <hca@linux.ibm.com> Reviewed-by: Alexander Potapenko <glider@google.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Christoph Lameter <cl@linux.com> Cc: David Rientjes <rientjes@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: <kasan-dev@googlegroups.com> Cc: Marco Elver <elver@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org> Cc: Pekka Enberg <penberg@kernel.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>	2024-07-03 19:30:24 -07:00
Mete Durlu	d6d1aa519c	s390/topology: Remove CPU KOBJ_CHANGE uevents s390 generates KOBJ_CHANGE uevents on CPUs whenever a topology update occurs. These uevents currently have no users and they are also not present on other architectures. As they are not necessary, remove these extra uevents. Suggested-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by: Mete Durlu <meted@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-07-02 10:17:16 +02:00
Arnd Bergmann	5daf62da52	s390: remove native mmap2() syscall The mmap2() syscall has never been used on 64-bit s390x and should have been removed as part of `5a79859ae0` ("s390: remove 31 bit support"). Remove it now. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-06-25 15:57:37 +02:00
Arnd Bergmann	d3882564a7	syscalls: fix compat_sys_io_pgetevents_time64 usage Using sys_io_pgetevents() as the entry point for compat mode tasks works almost correctly, but misses the sign extension for the min_nr and nr arguments. This was addressed on parisc by switching to compat_sys_io_pgetevents_time64() in commit `6431e92fc8` ("parisc: io_pgetevents_time64() needs compat syscall in 32-bit compat mode"), as well as by using more sophisticated system call wrappers on x86 and s390. However, arm64, mips, powerpc, sparc and riscv still have the same bug. Change all of them over to use compat_sys_io_pgetevents_time64() like parisc already does. This was clearly the intention when the function was originally added, but it got hooked up incorrectly in the tables. Cc: stable@vger.kernel.org Fixes: `48166e6ea4` ("y2038: add 64-bit time_t syscalls to all 32-bit architectures") Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Signed-off-by: Arnd Bergmann <arnd@arndb.de>	2024-06-25 15:57:20 +02:00
Sven Schnelle	81f907b246	s390/mm: Remove duplicate get_lowcore() calls Assign the output from get_lowcore() to a local variable, so the code is easier to read. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-06-18 17:01:33 +02:00
Sven Schnelle	15428734e1	s390/idle: Remove duplicate get_lowcore() calls Assign the output from get_lowcore() to a local variable, so the code is easier to read. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-06-18 17:01:33 +02:00
Sven Schnelle	46c3031108	s390/vtime: Remove duplicate get_lowcore() calls Assign the output from get_lowcore() to a local variable, so the code is easier to read. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-06-18 17:01:33 +02:00
Sven Schnelle	eb28ec2b2e	s390/smp: Remove duplicate get_lowcore() calls Assign the output from get_lowcore() to a local variable, so the code is easier to read. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-06-18 17:01:33 +02:00
Sven Schnelle	d7c3ebc49e	s390/nmi: Remove duplicate get_lowcore() calls Assign the output from get_lowcore() to a local variable, so the code is easier to read. Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-06-18 17:01:33 +02:00
Sven Schnelle	208da1d5fc	s390: Replace S390_lowcore by get_lowcore() Replace all S390_lowcore usages in arch/s390/ by get_lowcore(). Acked-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Sven Schnelle <svens@linux.ibm.com> Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>	2024-06-18 17:01:33 +02:00
Thomas Richter	582cc1b28e	s390/pai_ext: Enable per-task and system-wide sampling event The PMU for PAI NNPA counters enforces the following restriction: - No per-task context for PAI sampling event NNPA_ALL - No multiple system-wide PAI sampling event NNPA_ALL Both restrictions are removed. One or more per-task sampling events are supported. Also one or more system-wide sampling events are supported. Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-06-07 16:49:08 +02:00
Thomas Richter	3f9ff4c5a0	s390/pai_ext: Enable per-task counting event The PMU for PAI NNPA counters enforces the following restriction: - No per-task context for PAI NNPA counters. This restriction is removed. One or more per-task/system-wide counting events can now be active at the same time while one system wide sampling event is active. Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-06-07 16:49:08 +02:00
Thomas Richter	14e3768435	s390/pai_ext: Enable concurrent system-wide counting/sampling The PMU for PAI NNPA counters enforces the following restriction: - No system wide counting while system wide sampling is active. This restriction is removed. One or more system wide counting events can now be active at the same time while at most one system wide sampling event is active. Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-06-07 16:49:07 +02:00
Thomas Richter	9f66572f28	s390/pai_crypto: Enable per-task and system-wide sampling event The PMU for PAI crypto counters enforces the following restrictions: - No per-task context for PAI crypto sampling event CRYPTO_ALL - No multiple system-wide PAI crypto sampling event CRYPTO_ALL Both restrictions are removed. One or more per-task sampling events are supported. Also one or more system-wide sampling events are supported. Example for per-task context of sampling event CRYPTO_ALL: # perf record -e pai_crypto/CRYPTO_ALL/ -- true Example for system-wide context of sampling event CRYPTO_ALL: # perf record -e pai_crypto/CRYPTO_ALL/ -a -- sleep 4 Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>	2024-06-07 16:49:07 +02:00

... 2 3 4 5 6 ...

4108 Commits