mirror of https://github.com/torvalds/linux.git
489 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
7cd122b552 |
Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
Reference is grabbed and not released, but it's not actually _stored_
anywhere. That works, but it's hard to follow and verify; among other
things, we have no way to tell _which_ of the increments is intended
to be an unpaired one. Worse, on removal we need to decide whether
the reference had already been dropped, which can be non-trivial if
that removal is on umount and we need to figure out if this dentry is
pinned due to e.g. unlink() not done. Usually that is handled by using
kill_litter_super() as ->kill_sb(), but there are open-coded special
cases of the same (consider e.g. /proc/self).
Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
marking those "leaked" dentries. Having it set claims responsibility
for +1 in refcount.
The end result this series is aiming for:
* get these unbalanced dget() and dput() replaced with new primitives that
would, in addition to adjusting refcount, set and clear persistency flag.
* instead of having kill_litter_super() mess with removing the remaining
"leaked" references (e.g. for all tmpfs files that hadn't been removed
prior to umount), have the regular shrink_dcache_for_umount() strip
DCACHE_PERSISTENT of all dentries, dropping the corresponding
reference if it had been set. After that kill_litter_super() becomes
an equivalent of kill_anon_super().
Doing that in a single step is not feasible - it would affect too many places
in too many filesystems. It has to be split into a series.
This work has really started early in 2024; quite a few preliminary pieces
have already gone into mainline. This chunk is finally getting to the
meat of that stuff - infrastructure and most of the conversions to it.
Some pieces are still sitting in the local branches, but the bulk of
that stuff is here.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaTEq1wAKCRBZ7Krx/gZQ
643uAQC1rRslhw5l7OjxEpIYbGG4M+QaadN4Nf5Sr2SuTRaPJQD/W4oj/u4C2eCw
Dd3q071tqyvm/PXNgN2EEnIaxlFUlwc=
=rKq+
-----END PGP SIGNATURE-----
Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull persistent dentry infrastructure and conversion from Al Viro:
"Some filesystems use a kinda-sorta controlled dentry refcount leak to
pin dentries of created objects in dcache (and undo it when removing
those). A reference is grabbed and not released, but it's not actually
_stored_ anywhere.
That works, but it's hard to follow and verify; among other things, we
have no way to tell _which_ of the increments is intended to be an
unpaired one. Worse, on removal we need to decide whether the
reference had already been dropped, which can be non-trivial if that
removal is on umount and we need to figure out if this dentry is
pinned due to e.g. unlink() not done. Usually that is handled by using
kill_litter_super() as ->kill_sb(), but there are open-coded special
cases of the same (consider e.g. /proc/self).
Things get simpler if we introduce a new dentry flag
(DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
claims responsibility for +1 in refcount.
The end result this series is aiming for:
- get these unbalanced dget() and dput() replaced with new primitives
that would, in addition to adjusting refcount, set and clear
persistency flag.
- instead of having kill_litter_super() mess with removing the
remaining "leaked" references (e.g. for all tmpfs files that hadn't
been removed prior to umount), have the regular
shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
dropping the corresponding reference if it had been set. After that
kill_litter_super() becomes an equivalent of kill_anon_super().
Doing that in a single step is not feasible - it would affect too many
places in too many filesystems. It has to be split into a series.
This work has really started early in 2024; quite a few preliminary
pieces have already gone into mainline. This chunk is finally getting
to the meat of that stuff - infrastructure and most of the conversions
to it.
Some pieces are still sitting in the local branches, but the bulk of
that stuff is here"
* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
d_make_discardable(): warn if given a non-persistent dentry
kill securityfs_recursive_remove()
convert securityfs
get rid of kill_litter_super()
convert rust_binderfs
convert nfsctl
convert rpc_pipefs
convert hypfs
hypfs: swich hypfs_create_u64() to returning int
hypfs: switch hypfs_create_str() to returning int
hypfs: don't pin dentries twice
convert gadgetfs
gadgetfs: switch to simple_remove_by_name()
convert functionfs
functionfs: switch to simple_remove_by_name()
functionfs: fix the open/removal races
functionfs: need to cancel ->reset_work in ->kill_sb()
functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
functionfs: don't abuse ffs_data_closed() on fs shutdown
convert selinuxfs
...
|
|
|
|
8f7aa3d3c7 |
Networking changes for 6.19.
Core & protocols
----------------
- Replace busylock at the Tx queuing layer with a lockless list. Resulting
in a 300% (4x) improvement on heavy TX workloads, sending twice the
number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue. Normally we perform queue re-selection when flow comes out
of idle, but under extreme circumstances the flows may be constantly
busy. Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already did
for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly
quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for packets.
- Make MPTCP use Rx backlog processing. This lowers the lock pressure,
improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor fit
for modern container workloads, where limits are imposed using cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily
aggressive rcvbuf growth and overshot when the connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC 5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API
----------
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics,
as defined by the OPEN Alliance's "Advanced diagnostic features
for 100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads IPsec
and performs RSS.
- Add info to devlink params whether the current setting is the default
or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame duplication
for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers
--------------
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback
for reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which supports
Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as other
drivers) to avoid wasting resources if feature is unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211 debugfs
interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkveRQACgkQMUZtbf5S
IrvY7A/+Nb0o4BxLHjPkAl1m3t3q2d0Y29B7SNkwnwEtxAV8EkNeZ3GWrdtDnTQY
MYhmc7LEzvz8/lihapr7UJkcokzSASUV54hbez5jDBKC8EEoyUk8FdWDPerwlcRI
zmCFNAVFyh9GX8i7wcrzKbDTHT5+GZLbSlGl9U5mhLsDdRlJgH7d8PJ7vWcmtLFY
XN0paDyaeHfCl8wReWNAYx4C/I0ODOvlscpO0tnAKhB0ngJbQCKY2t6tn3rOYdif
ZSQ5KwVRnJtQ4fYOFMOy9+FSCjVXtyrxF8KLxD+mqom2ZhmO00UpOMl09tqhq3uT
WnvwoHUVBt6F+iITHwg5kMgIDPUq1kpUvL4S4UbVSuUm9ZKD+4KRU2ZHRBYMx+MU
bsqmtY8/IULClUoRz+tZhltA8eb0NEqNZE2JPOFDiJHn1YiCCkFwxibhir893oM3
sB7x65D7LQI2ty2BBGVGYnwYDPtyaxOA/s3WTwPvLEi3+Y/TGNIIrS9lBLA4U+Yr
Gi93WQGVjttMmVyaHgXBUGmi3L52hvolm0AZ8zSRGrnIEpecjhly2KfYuaOzuxXC
IHEQ6AFLdRh6JzafXGb/mQwGCHNmhwsY8A49i94fakWQamaL/L6A+1dyPu4LXMqi
NwqCmlVb/LKGlfNG+V4wT27srJ+yBA2Vk3tpR1sZQQytFh0LKHI=
=UoDR
-----END PGP SIGNATURE-----
Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Replace busylock at the Tx queuing layer with a lockless list.
Resulting in a 300% (4x) improvement on heavy TX workloads, sending
twice the number of packets per second, for half the cpu cycles.
- Allow constantly busy flows to migrate to a more suitable CPU/NIC
queue.
Normally we perform queue re-selection when flow comes out of idle,
but under extreme circumstances the flows may be constantly busy.
Add sysctl to allow periodic rehashing even if it'd risk packet
reordering.
- Optimize the NAPI skb cache, make it larger, use it in more paths.
- Attempt returning Tx skbs to the originating CPU (like we already
did for Rx skbs).
- Various data structure layout and prefetch optimizations from Eric.
- Remove ktime_get() from the recvmsg() fast path, ktime_get() is
sadly quite expensive on recent AMD machines.
- Extend threaded NAPI polling to allow the kthread busy poll for
packets.
- Make MPTCP use Rx backlog processing. This lowers the lock
pressure, improving the Rx performance.
- Support memcg accounting of MPTCP socket memory.
- Allow admin to opt sockets out of global protocol memory accounting
(using a sysctl or BPF-based policy). The global limits are a poor
fit for modern container workloads, where limits are imposed using
cgroups.
- Improve heuristics for when to kick off AF_UNIX garbage collection.
- Allow users to control TCP SACK compression, and default to 33% of
RTT.
- Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
unnecessarily aggressive rcvbuf growth and overshot when the
connection RTT is low.
- Preserve skb metadata space across skb_push / skb_pull operations.
- Support for IPIP encapsulation in the nftables flowtable offload.
- Support appending IP interface information to ICMP messages (RFC
5837).
- Support setting max record size in TLS (RFC 8449).
- Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
- Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
- Let users configure the number of write buffers in SMC.
- Add new struct sockaddr_unsized for sockaddr of unknown length,
from Kees.
- Some conversions away from the crypto_ahash API, from Eric Biggers.
- Some preparations for slimming down struct page.
- YAML Netlink protocol spec for WireGuard.
- Add a tool on top of YAML Netlink specs/lib for reporting commonly
computed derived statistics and summarized system state.
Driver API:
- Add CAN XL support to the CAN Netlink interface.
- Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
defined by the OPEN Alliance's "Advanced diagnostic features for
100BASE-T1 automotive Ethernet PHYs" specification.
- Add DPLL phase-adjust-gran pin attribute (and implement it in
zl3073x).
- Refactor xfrm_input lock to reduce contention when NIC offloads
IPsec and performs RSS.
- Add info to devlink params whether the current setting is the
default or a user override. Allow resetting back to default.
- Add standard device stats for PSP crypto offload.
- Leverage DSA frame broadcast to implement simple HSR frame
duplication for a lot of switches without dedicated HSR offload.
- Add uAPI defines for 1.6Tbps link modes.
Device drivers:
- Add Motorcomm YT921x gigabit Ethernet switch support.
- Add MUCSE driver for N500/N210 1GbE NIC series.
- Convert drivers to support dedicated ops for timestamping control,
and away from the direct IOCTL handling. While at it support GET
operations for PHY timestamping.
- Add (and convert most drivers to) a dedicated ethtool callback for
reading the Rx ring count.
- Significant refactoring efforts in the STMMAC driver, which
supports Synopsys turn-key MAC IP integrated into a ton of SoCs.
- Ethernet high-speed NICs:
- Broadcom (bnxt):
- support PPS in/out on all pins
- Intel (100G, ice, idpf):
- ice: implement standard ethtool and timestamping stats
- i40e: support setting the max number of MAC addresses per VF
- iavf: support RSS of GTP tunnels for 5G and LTE deployments
- nVidia/Mellanox (mlx5):
- reduce downtime on interface reconfiguration
- disable being an XDP redirect target by default (same as
other drivers) to avoid wasting resources if feature is
unused
- Meta (fbnic):
- add support for Linux-managed PCS on 25G, 50G, and 100G links
- Wangxun:
- support Rx descriptor merge, and Tx head writeback
- support Rx coalescing offload
- support 25G SPF and 40G QSFP modules
- Ethernet virtual:
- Google (gve):
- allow ethtool to configure rx_buf_len
- implement XDP HW RX Timestamping support for DQ descriptor
format
- Microsoft vNIC (mana):
- support HW link state events
- handle hardware recovery events when probing the device
- Ethernet NICs consumer, and embedded:
- usbnet: add support for Byte Queue Limits (BQL)
- AMD (amd-xgbe):
- add device selftests
- NXP (enetc):
- add i.MX94 support
- Broadcom integrated MACs (bcmgenet, bcmasp):
- bcmasp: add support for PHY-based Wake-on-LAN
- Broadcom switches (b53):
- support port isolation
- support BCM5389/97/98 and BCM63XX ARL formats
- Lantiq/MaxLinear switches:
- support bridge FDB entries on the CPU port
- use regmap for register access
- allow user to enable/disable learning
- support Energy Efficient Ethernet
- support configuring RMII clock delays
- add tagging driver for MaxLinear GSW1xx switches
- Synopsys (stmmac):
- support using the HW clock in free running mode
- add Eswin EIC7700 support
- add Rockchip RK3506 support
- add Altera Agilex5 support
- Cadence (macb):
- cleanup and consolidate descriptor and DMA address handling
- add EyeQ5 support
- TI:
- icssg-prueth: support AF_XDP
- Airoha access points:
- add missing Ethernet stats and link state callback
- add AN7583 support
- support out-of-order Tx completion processing
- Power over Ethernet:
- pd692x0: preserve PSE configuration across reboots
- add support for TPS23881B devices
- Ethernet PHYs:
- Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
- Support 50G SerDes and 100G interfaces in Linux-managed PHYs
- micrel:
- support for non PTP SKUs of lan8814
- enable in-band auto-negotiation on lan8814
- realtek:
- cable testing support on RTL8224
- interrupt support on RTL8221B
- motorcomm: support for PHY LEDs on YT853
- microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
- mscc: support for PHY LED control
- CAN drivers:
- m_can: add support for optional reset and system wake up
- remove can_change_mtu() obsoleted by core handling
- mcp251xfd: support GPIO controller functionality
- Bluetooth:
- add initial support for PASTa
- WiFi:
- split ieee80211.h file, it's way too big
- improvements in VHT radiotap reporting, S1G, Channel Switch
Announcement handling, rate tracking in mesh networks
- improve multi-radio monitor mode support, and add a cfg80211
debugfs interface for it
- HT action frame handling on 6 GHz
- initial chanctx work towards NAN
- MU-MIMO sniffer improvements
- WiFi drivers:
- RealTek (rtw89):
- support USB devices RTL8852AU and RTL8852CU
- initial work for RTL8922DE
- improved injection support
- Intel:
- iwlwifi: new sniffer API support
- MediaTek (mt76):
- WED support for >32-bit DMA
- airoha NPU support
- regdomain improvements
- continued WiFi7/MLO work
- Qualcomm/Atheros:
- ath10k: factory test support
- ath11k: TX power insertion support
- ath12k: BSS color change support
- ath12k: statistics improvements
- brcmfmac: Acer A1 840 tablet quirk
- rtl8xxxu: 40 MHz connection fixes/support"
* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
net: page_pool: sanitise allocation order
net: page pool: xa init with destroy on pp init
net/mlx5e: Support XDP target xmit with dummy program
net/mlx5e: Update XDP features in switch channels
selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
wireguard: netlink: generate netlink code
wireguard: uapi: generate header with ynl-gen
wireguard: uapi: move flag enums
wireguard: uapi: move enum wg_cmd
wireguard: netlink: add YNL specification
selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
selftests: drv-net: introduce Iperf3Runner for measurement use cases
selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
Documentation: net: dsa: mention simple HSR offload helpers
Documentation: net: dsa: mention availability of RedBox
...
|
|
|
|
784faa8eca |
Rust changes for v6.19
Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io), and
it is also used by the Rust compiler itself. While the amount of code
is substantial, there should not be many updates needed for these
crates, and even if there are, they should not be too big, e.g. +7k
-3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for doctests.
Examples (i.e. doctests) may want to do things like show public items
and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code as
possible (this is part of why running Clippy on doctests is important
for us, e.g. for safety comments, which userspace Rust does not
support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension trait
and a new 'fmt!' macro which replaces the previous 'core' import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' less significant bits of the wrapped
type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>:🆕:<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can be constructed from simple non-constant expressions or,
for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations (with
both their backing type and other 'Bounded's with a compatible
backing type), casts to change the backing type, extending/shrinking
and infallible/fallible conversions from/to primitives as applicable.
- 'rbtree' module: add immutable cursor ('Cursor').
It enables to use just an immutable tree reference where appropriate.
The existing fully-featured mutable cursor is renamed to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version in
Linux follows Debian Stable releases, with Debian 13 being the first
one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to contribute
for a few years now, so it is a no-op change in practice.
And a few other cleanups and improvements.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmks0WkACgkQGXyLc2ht
IW3QQg/+LpBmrz0ZSKH24kcX3x/hpfA2Erd4cmn+qjXev9RSAM1bt9xf3dsAhhFd
BStUpf0aglHOSEuWAvNHqEb5+yu+6qy6EFXXqH0ASexHK93t77Jbztjtf3SMlykT
N/lSJ+LWw2WiRT0NRWoTfKaEWzZQ8j9fi9Jb/IGNZGdNMryisVUYWqzLwNupPuK+
RMcEitHdO2NWjyodk2GGRyYQ7+XxQgbXZoxtgeubPSrrmGuGTXV42RlQKC2KHPx3
gz6CwcO3Xd0bGHHSgP32QDtGRJtniO8iXBKxiooT+ys+M83fTKbwNrIrW3tHdheY
765qsd/NvUmAkcgTCoLqj5biU6LCsepyimNg1vf4pYFohBoTaGeN+UqzbXBrSjy2
pmrgxwMRVHsYz+zoSKAVKJl7ASba5BXFdI4Whgfqwwc9So/X7uyujIYXGbRoznCV
W5vu7OUboBy26NvcsPrf6BqWcsJEpGV/M4z2UBRjAoJTRGQMcm/ckuo/GfYm3yW+
bUW62UmVCdY5crpo7XPH/G4ZGBR/k3p9dLVt8OJxEoTlfw4KDE5BszJoXmejZqdi
9LEMhzTWwoFp9NspQuEGdYdfGRlfG6XXqrwGZtQI+dlc4RvFEgBBu2Lxotq+Ods0
EfCVCJQjWmyCodVdJ/QqbCRFuXtOFLr/hPdWnvlrRxVkPtF2CDw=
=9nM+
-----END PGP SIGNATURE-----
Merge tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull Rust updates from Miguel Ojeda:
"Toolchain and infrastructure:
- Add support for 'syn'.
Syn is a parsing library for parsing a stream of Rust tokens into a
syntax tree of Rust source code.
Currently this library is geared toward use in Rust procedural
macros, but contains some APIs that may be useful more generally.
'syn' allows us to greatly simplify writing complex macros such as
'pin-init' (Benno has already prepared the 'syn'-based version). We
will use it in the 'macros' crate too.
'syn' is the most downloaded Rust crate (according to crates.io),
and it is also used by the Rust compiler itself. While the amount
of code is substantial, there should not be many updates needed for
these crates, and even if there are, they should not be too big,
e.g. +7k -3k lines across the 3 crates in the last year.
'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
I only modified their code to remove a third dependency
('unicode-ident') and to add the SPDX identifiers. The code can be
easily verified to exactly match upstream with the provided
scripts.
They are all licensed under "Apache-2.0 OR MIT", like the other
vendored 'alloc' crate we had for a while.
Please see the merge commit with the cover letter for more context.
- Allow 'unreachable_pub' and 'clippy::disallowed_names' for
doctests.
Examples (i.e. doctests) may want to do things like show public
items and use names such as 'foo'.
Nevertheless, we still try to keep examples as close to real code
as possible (this is part of why running Clippy on doctests is
important for us, e.g. for safety comments, which userspace Rust
does not support yet but we are stricter).
'kernel' crate:
- Replace our custom 'CStr' type with 'core::ffi::CStr'.
Using the standard library type reduces our custom code footprint,
and we retain needed custom functionality through an extension
trait and a new 'fmt!' macro which replaces the previous 'core'
import.
This started in 6.17 and continued in 6.18, and we finally land the
replacement now. This required quite some stamina from Tamir, who
split the changes in steps to prepare for the flag day change here.
- Replace 'kernel::c_str!' with C string literals.
C string literals were added in Rust 1.77, which produce '&CStr's
(the 'core' one), so now we can write:
c"hi"
instead of:
c_str!("hi")
- Add 'num' module for numerical features.
It includes the 'Integer' trait, implemented for all primitive
integer types.
It also includes the 'Bounded' integer wrapping type: an integer
value that requires only the 'N' least significant bits of the
wrapped type to be encoded:
// An unsigned 8-bit integer, of which only the 4 LSBs are used.
let v = Bounded::<u8, 4>:🆕:<15>();
assert_eq!(v.get(), 15);
'Bounded' is useful to e.g. enforce guarantees when working with
bitfields that have an arbitrary number of bits.
Values can also be constructed from simple non-constant expressions
or, for more complex ones, validated at runtime.
'Bounded' also comes with comparison and arithmetic operations
(with both their backing type and other 'Bounded's with a
compatible backing type), casts to change the backing type,
extending/shrinking and infallible/fallible conversions from/to
primitives as applicable.
- 'rbtree' module: add immutable cursor ('Cursor').
It enables to use just an immutable tree reference where
appropriate. The existing fully-featured mutable cursor is renamed
to 'CursorMut'.
kallsyms:
- Fix wrong "big" kernel symbol type read from procfs.
'pin-init' crate:
- A couple minor fixes (Benno asked me to pick these patches up for
him this cycle).
Documentation:
- Quick Start guide: add Debian 13 (Trixie).
Debian Stable is now able to build Linux, since Debian 13 (released
2025-08-09) packages Rust 1.85.0, which is recent enough.
We are planning to propose that the minimum supported Rust version
in Linux follows Debian Stable releases, with Debian 13 being the
first one we upgrade to, i.e. Rust 1.85.
MAINTAINERS:
- Add entry for the new 'num' module.
- Remove Alex as Rust maintainer: he hasn't had the time to
contribute for a few years now, so it is a no-op change in
practice.
And a few other cleanups and improvements"
* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
rust: macros: support `proc-macro2`, `quote` and `syn`
rust: syn: enable support in kbuild
rust: syn: add `README.md`
rust: syn: remove `unicode-ident` dependency
rust: syn: add SPDX License Identifiers
rust: syn: import crate
rust: quote: enable support in kbuild
rust: quote: add `README.md`
rust: quote: add SPDX License Identifiers
rust: quote: import crate
rust: proc-macro2: enable support in kbuild
rust: proc-macro2: add `README.md`
rust: proc-macro2: remove `unicode_ident` dependency
rust: proc-macro2: add SPDX License Identifiers
rust: proc-macro2: import crate
rust: kbuild: support using libraries in `rustc_procmacro`
rust: kbuild: support skipping flags in `rustc_test_library`
rust: kbuild: add proc macro library support
rust: kbuild: simplify `--cfg` handling
rust: kbuild: introduce `core-flags` and `core-skip_flags`
...
|
|
|
|
68e83f3472 |
tools: ynl-gen: add regeneration comment
Add a comment on regeneration to the generated files. The comment is placed after the YNL-GEN line[1], as to not interfere with ynl-regen.sh's detection logic. [1] and after the optional YNL-ARG line. Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/ Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net Signed-off-by: Jakub Kicinski <kuba@kernel.org> |
|
|
|
4433d8e25d |
convert rust_binderfs
Parallel to binderfs stuff: * use simple_start_creating()/simple_done_creating()/d_make_persistent() instead of manual inode_lock()/lookup_noperm()/d_instanitate()/inode_unlock(). * allocate inode first - simpler cleanup that way. * use simple_recursive_removal() instead of open-coding it. * switch to kill_anon_super() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
f56b131723 |
rust: rbtree: add immutable cursor
Sometimes we may need to iterate over, or find an element in a read only (or read mostly) red-black tree, and in that case we don't need a mutable reference to the tree, which we'll however have to take to be able to use the current (mutable) cursor implementation. This patch adds a simple immutable cursor implementation to RBTree, which enables us to use an immutable tree reference. The existing (fully featured) cursor implementation is renamed to CursorMut, while retaining its functionality. The only existing user of the [mutable] cursor for RBTrees (binder) is updated to match the changes. Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20251014123339.2492210-1-vitaly.wool@konsulko.se [ Applied `rustfmt`. Added intra-doc link. Fixed unclosed example. Fixed docs description. Fixed typo and other formatting nits. - Miguel ] Signed-off-by: Miguel Ojeda <ojeda@kernel.org> |
|
|
|
b89aa54482 |
convert binderfs
Objects are created either by d_alloc_name()+d_add() (in binderfs_ctl_create()) or by simple_start_creating()+d_instantiate(). Removals are by simple_recurisive_removal(). Switch d_add()/d_instantiate() to d_make_persistent() + dput(). Voila - kill_litter_super() is not needed anymore. Fold dput()+unlocking the parent into simple_done_creating(), while we are at it. NOTE: return value of binderfs_create_file() is borrowed; it may get stored in proc->binderfs_entry. See binder_release()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
02da8d2c09 |
binderfs_binder_ctl_create(): kill a bogus check
It's called once, during binderfs mount, right after allocating root dentry. Checking that it hadn't been already called is only obfuscating things. Looks like that bogosity had been copied from devpts... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
185d241c88 |
binderfs: use simple_start_creating()
binderfs_binder_device_create() gets simpler, binderfs_create_dentry() simply goes away... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
d90eeb8ecd |
binder: remove "invalid inc weak" check
There are no scenarios where a weak increment is invalid on binder_node.
The only possible case where it could be invalid is if the kernel
delivers BR_DECREFS to the process that owns the node, and then
increments the weak refcount again, effectively "reviving" a dead node.
However, that is not possible: when the BR_DECREFS command is delivered,
the kernel removes and frees the binder_node. The fact that you were
able to call binder_inc_node_nilocked() implies that the node is not yet
destroyed, which implies that BR_DECREFS has not been delivered to
userspace, so incrementing the weak refcount is valid.
Note that it's currently possible to trigger this condition if the owner
calls BINDER_THREAD_EXIT while node->has_weak_ref is true. This causes
BC_INCREFS on binder_ref instances to fail when they should not.
Cc: stable@vger.kernel.org
Fixes:
|
|
|
|
3b83f5d5e7 |
rust: replace `CStr` with `core::ffi::CStr`
`kernel::ffi::CStr` was introduced in commit |
|
|
|
0dac8cf44b |
rust_binder: use `core::ffi::CStr` method names
Prepare for `core::ffi::CStr` taking the place of `kernel::str::CStr` by
avoiding methods that only exist on the latter.
This backslid in commit
|
|
|
|
5aed9677e5 |
rust_binder: use `kernel::fmt`
Reduce coupling to implementation details of the formatting machinery by
avoiding direct use for `core`'s formatting traits and macros.
This backslid in commit
|
|
|
|
d9252f1be2 |
rust_binder: remove trailing comma
This prepares for a later commit in which we introduce a custom formatting macro; that macro doesn't handle trailing commas so just remove this one. Reviewed-by: Alice Ryhl <aliceryhl@google.com> Signed-off-by: Tamir Duberstein <tamird@gmail.com> Link: https://patch.msgid.link/20251018-cstr-core-v18-2-9378a54385f8@gmail.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org> |
|
|
|
7557f18994 |
binder: Fix missing kernel-doc entries in binder.c
Fix several kernel-doc warnings in `drivers/android/binder.c` caused by undocumented struct members and function parameters. In particular, add missing documentation for the `@thread` parameter in binder_free_buf_locked(). Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com> Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
b5ce7a5cc5 |
rust_binder: report freeze notification only when fully frozen
Binder only sends out freeze notifications when ioctl_freeze() completes and the process has become fully frozen. However, if a freeze notification is registered during the freeze operation, then it registers an initial state of 'frozen'. This is a problem because if the freeze operation fails, then the listener is not told about that state change, leading to lost updates. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
99559e5bb4 |
rust_binder: don't delete FreezeListener if there are pending duplicates
When userspace issues commands to a freeze listener, it identifies it using a cookie. Normally this cookie uniquely identifies a freeze listener, but when userspace clears a listener with the intent of deleting it, it's allowed to "regret" clearing it and create a new freeze listener for the same node using the same cookie. (IMO this was an API mistake, but userspace relies on it.) Currently if the active freeze listener gets fully deleted while there are still pending duplicates, then the code incorrectly deletes the pending duplicates too. To fix this, do not delete the entry if there are still pending duplicates. Since the current data structure requires a main freeze listener, we convert one pending duplicate into the primary listener in this scenario. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
bfe144da06 |
rust_binder: freeze_notif_done should resend if wrong state
Consider the following scenario: 1. A freeze notification is delivered to thread 1. 2. The process becomes frozen or unfrozen. 3. The message for step 2 is delivered to thread 2 and ignored because there is already a pending notification from step 1. 4. Thread 1 acknowledges the notification from step 1. In this case, step 4 should ensure that the message ignored in step 3 is resent as it can now be delivered. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
c7c090af37 |
rust_binder: remove warning about orphan mappings
This condition occurs if a thread dies while processing a transaction. We should not print anything in this scenario. Signed-off-by: Alice Ryhl <aliceryhl@google.com> Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
7e69a24b6b |
rust_binder: clean `clippy::mem_replace_with_default` warning
Clippy reports:
error: replacing a value of type `T` with `T::default()` is better expressed using `core::mem::take`
--> drivers/android/binder/node.rs:690:32
|
690 | _unused_capacity = mem::replace(&mut inner.freeze_list, KVVec::new());
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider using: `core::mem::take(&mut inner.freeze_list)`
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#mem_replace_with_default
= note: `-D clippy::mem-replace-with-default` implied by `-D warnings`
= help: to override `-D warnings` add `#[allow(clippy::mem_replace_with_default)]`
The suggestion seems fine, thus apply it.
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
|
|
eafedbc7c0 |
rust_binder: add Rust Binder driver
We're generally not proponents of rewrites (nasty uncomfortable things that make you late for dinner!). So why rewrite Binder? Binder has been evolving over the past 15+ years to meet the evolving needs of Android. Its responsibilities, expectations, and complexity have grown considerably during that time. While we expect Binder to continue to evolve along with Android, there are a number of factors that currently constrain our ability to develop/maintain it. Briefly those are: 1. Complexity: Binder is at the intersection of everything in Android and fulfills many responsibilities beyond IPC. It has become many things to many people, and due to its many features and their interactions with each other, its complexity is quite high. In just 6kLOC it must deliver transactions to the right threads. It must correctly parse and translate the contents of transactions, which can contain several objects of different types (e.g., pointers, fds) that can interact with each other. It controls the size of thread pools in userspace, and ensures that transactions are assigned to threads in ways that avoid deadlocks where the threadpool has run out of threads. It must track refcounts of objects that are shared by several processes by forwarding refcount changes between the processes correctly. It must handle numerous error scenarios and it combines/nests 13 different locks, 7 reference counters, and atomic variables. Finally, It must do all of this as fast and efficiently as possible. Minor performance regressions can cause a noticeably degraded user experience. 2. Things to improve: Thousand-line functions [1], error-prone error handling [2], and confusing structure can occur as a code base grows organically. After more than a decade of development, this codebase could use an overhaul. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n2896 [2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n3658 3. Security critical: Binder is a critical part of Android's sandboxing strategy. Even Android's most de-privileged sandboxes (e.g. the Chrome renderer, or SW Codec) have direct access to Binder. More than just about any other component, it's important that Binder provide robust security, and itself be robust against security vulnerabilities. It's #1 (high complexity) that has made continuing to evolve Binder and resolving #2 (tech debt) exceptionally difficult without causing #3 (security issues). For Binder to continue to meet Android's needs, we need better ways to manage (and reduce!) complexity without increasing the risk. The biggest change is obviously the choice of programming language. We decided to use Rust because it directly addresses a number of the challenges within Binder that we have faced during the last years. It prevents mistakes with ref counting, locking, bounds checking, and also does a lot to reduce the complexity of error handling. Additionally, we've been able to use the more expressive type system to encode the ownership semantics of the various structs and pointers, which takes the complexity of managing object lifetimes out of the hands of the programmer, reducing the risk of use-after-frees and similar problems. Rust has many different pointer types that it uses to encode ownership semantics into the type system, and this is probably one of the most important aspects of how it helps in Binder. The Binder driver has a lot of different objects that have complex ownership semantics; some pointers own a refcount, some pointers have exclusive ownership, and some pointers just reference the object and it is kept alive in some other manner. With Rust, we can use a different pointer type for each kind of pointer, which enables the compiler to enforce that the ownership semantics are implemented correctly. Another useful feature is Rust's error handling. Rust allows for more simplified error handling with features such as destructors, and you get compilation failures if errors are not properly handled. This means that even though Rust requires you to spend more lines of code than C on things such as writing down invariants that are left implicit in C, the Rust driver is still slightly smaller than C binder: Rust is 5.5kLOC and C is 5.8kLOC. (These numbers are excluding blank lines, comments, binderfs, and any debugging facilities in C that are not yet implemented in the Rust driver. The numbers include abstractions in rust/kernel/ that are unlikely to be used by other drivers than Binder.) Although this rewrite completely rethinks how the code is structured and how assumptions are enforced, we do not fundamentally change *how* the driver does the things it does. A lot of careful thought has gone into the existing design. The rewrite is aimed rather at improving code health, structure, readability, robustness, security, maintainability and extensibility. We also include more inline documentation, and improve how assumptions in the code are enforced. Furthermore, all unsafe code is annotated with a SAFETY comment that explains why it is correct. We have left the binderfs filesystem component in C. Rewriting it in Rust would be a large amount of work and requires a lot of bindings to the file system interfaces. Binderfs has not historically had the same challenges with security and complexity, so rewriting binderfs seems to have lower value than the rest of Binder. Correctness and feature parity ------------------------------ Rust binder passes all tests that validate the correctness of Binder in the Android Open Source Project. We can boot a device, and run a variety of apps and functionality without issues. We have performed this both on the Cuttlefish Android emulator device, and on a Pixel 6 Pro. As for feature parity, Rust binder currently implements all features that C binder supports, with the exception of some debugging facilities. The missing debugging facilities will be added before we submit the Rust implementation upstream. Tracepoints ----------- I did not include all of the tracepoints as I felt that the mechansim for making C access fields of Rust structs should be discussed on list separately. I also did not include the support for building Rust Binder as a module since that requires exporting a bunch of additional symbols on the C side. Original RFC Link with old benchmark numbers: https://lore.kernel.org/r/20231101-rust-binder-v1-0-08ba9197f637@google.com Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com> Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com> Co-developed-by: Matt Gilbride <mattgilbride@google.com> Signed-off-by: Matt Gilbride <mattgilbride@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250919-rust-binder-v2-1-a384b09f28dd@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
3ebcd3460c |
binder: fix double-free in dbitmap
A process might fail to allocate a new bitmap when trying to expand its
proc->dmap. In that case, dbitmap_grow() fails and frees the old bitmap
via dbitmap_free(). However, the driver calls dbitmap_free() again when
the same process terminates, leading to a double-free error:
==================================================================
BUG: KASAN: double-free in binder_proc_dec_tmpref+0x2e0/0x55c
Free of addr ffff00000b7c1420 by task kworker/9:1/209
CPU: 9 UID: 0 PID: 209 Comm: kworker/9:1 Not tainted 6.17.0-rc6-dirty #5 PREEMPT
Hardware name: linux,dummy-virt (DT)
Workqueue: events binder_deferred_func
Call trace:
kfree+0x164/0x31c
binder_proc_dec_tmpref+0x2e0/0x55c
binder_deferred_func+0xc24/0x1120
process_one_work+0x520/0xba4
[...]
Allocated by task 448:
__kmalloc_noprof+0x178/0x3c0
bitmap_zalloc+0x24/0x30
binder_open+0x14c/0xc10
[...]
Freed by task 449:
kfree+0x184/0x31c
binder_inc_ref_for_node+0xb44/0xe44
binder_transaction+0x29b4/0x7fbc
binder_thread_write+0x1708/0x442c
binder_ioctl+0x1b50/0x2900
[...]
==================================================================
Fix this issue by marking proc->map NULL in dbitmap_free().
Cc: stable@vger.kernel.org
Fixes:
|
|
|
|
8a61a53b07 |
binder: add tracepoint for netlink reports
Add a tracepoint to capture the same details that are being sent through the generic netlink interface during transaction failures. This provides a useful debugging tool to observe the events independently from the netlink listeners. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250727182932.2499194-6-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
f37b55ded8 |
binder: add transaction_report feature entry
Add "transaction_report" to the binderfs feature list, to help userspace determine if the "BINDER_CMD_REPORT" generic netlink api is supported by the binder driver. Signed-off-by: Li Li <dualli@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250727182932.2499194-5-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
63740349eb |
binder: introduce transaction reports via netlink
Introduce a generic netlink multicast event to report binder transaction failures to userspace. This allows subscribers to monitor these events and take appropriate actions, such as stopping a misbehaving application that is spamming a service with huge amount of transactions. The multicast event contains full details of the failed transactions, including the sender/target PIDs, payload size and specific error code. This interface is defined using a YAML spec, from which the UAPI and kernel headers and source are auto-generated. Signed-off-by: Li Li <dualli@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250727182932.2499194-4-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
5cd0645b43 |
binder: add t->is_async and t->is_reply
Replace the t->need_reply flag with the more descriptive t->is_async and and t->is_reply flags. The 'need_reply' flag was only used for debugging purposes and the new flags can be used to distinguish between the type of transactions too: sync, async and reply. For now, only update the logging in print_binder_transaction_ilocked(). However, the new flags can be used in the future to replace the current patterns and improve readability. e.g.: - if (!reply && !(tr->flags & TF_ONE_WAY)) + if (t->is_async) This patch is in preparation for binder's generic netlink implementation and no functional changes are intended. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250727182932.2499194-3-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
4afc5bf0a1 |
binder: pre-allocate binder_transaction
Move the allocation of 'struct binder_transaction' to the beginning of the binder_transaction() function, along with the initialization of all the members that are known at that time. This minor refactoring helps to consolidate the usage of transaction information at later points. This patch is in preparation for binder's generic netlink implementation and no functional changes are intended. Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250727182932.2499194-2-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
4d2604833e |
binder: remove MODULE_LICENSE()
The MODULE_LICENSE() macro is intended for drivers that can be built as loadable modules. The binder driver is always built-in, using this macro here is unnecessary and potentially confusing. Remove it. Cc: Salvatore Bonaccorso <carnil@debian.org> Signed-off-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250817135034.3692902-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
0d5ec7919f |
Char / Misc / IIO / other driver updates for 6.17-rc1
Here is the big set of char/misc/iio and other smaller driver subsystems
for 6.17-rc1. It's a big set this time around, with the huge majority
being in the iio subsystem with new drivers and dts files being added
there.
Highlights include:
- IIO driver updates, additions, and changes making more code const
and cleaning up some init logic
- bus_type constant conversion changes
- misc device test functions added
- rust miscdevice minor fixup
- unused function removals for some drivers
- mei driver updates
- mhi driver updates
- interconnect driver updates
- Android binder updates and test infrastructure added
- small cdx driver updates
- small comedi fixes
- small nvmem driver updates
- small pps driver updates
- some acrn virt driver fixes for printk messages
- other small driver updates
All of these have been in linux-next with no reported issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaIeYDg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+yk3dwCdF6xoEVZaDkLM5IF3XKWWgdYxjNsAoKUy2kUx
YtmVh4S0tMmugfeRGac7
=3Wxi
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / IIO / other driver updates from Greg KH:
"Here is the big set of char/misc/iio and other smaller driver
subsystems for 6.17-rc1. It's a big set this time around, with the
huge majority being in the iio subsystem with new drivers and dts
files being added there.
Highlights include:
- IIO driver updates, additions, and changes making more code const
and cleaning up some init logic
- bus_type constant conversion changes
- misc device test functions added
- rust miscdevice minor fixup
- unused function removals for some drivers
- mei driver updates
- mhi driver updates
- interconnect driver updates
- Android binder updates and test infrastructure added
- small cdx driver updates
- small comedi fixes
- small nvmem driver updates
- small pps driver updates
- some acrn virt driver fixes for printk messages
- other small driver updates
All of these have been in linux-next with no reported issues"
* tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
binder: Use seq_buf in binder_alloc kunit tests
binder: Add copyright notice to new kunit files
misc: ti_fpc202: Switch to of_fwnode_handle()
bus: moxtet: Use dev_fwnode()
pc104: move PC104 option to drivers/Kconfig
drivers: virt: acrn: Don't use %pK through printk
comedi: fix race between polling and detaching
interconnect: qcom: Add Milos interconnect provider driver
dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC
mei: more prints with client prefix
mei: bus: use cldev in prints
bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
bus: mhi: host: Detect events pointing to unexpected TREs
bus: mhi: host: pci_generic: Add Foxconn T99W696 modem
bus: mhi: host: Use str_true_false() helper
bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance.
bus: mhi: host: Fix endianness of BHI vector table
bus: mhi: host: pci_generic: Disable runtime PM for QDU100
bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640
dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio'
...
|
|
|
|
2d9c1336ed |
VFS-related cleanups in various places (mostly of the "that really can't
happen" or "there's a better way to do it" variety) -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIRK0QAKCRBZ7Krx/gZQ 66/LAPoCvj5nAZH41F1VfyinA6V96kKsAazjrG7ttpWenu+6GAD/e9YQIAtYro0Z 6f6EWTgrrEZqpOgc9kfHJq60m/TnSg8= =Ojq4 -----END PGP SIGNATURE----- Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull misc VFS updates from Al Viro: "VFS-related cleanups in various places (mostly of the "that really can't happen" or "there's a better way to do it" variety)" * tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: gpib: use file_inode() binder_ioctl_write_read(): simplify control flow a bit secretmem: move setting O_LARGEFILE and bumping users' count to the place where we create the file apparmor: file never has NULL f_path.mnt landlock: opened file never has a negative dentry |
|
|
|
fa3f79e82d |
binder: Use seq_buf in binder_alloc kunit tests
Replace instances of snprintf with seq_buf functions, as suggested by
Kees [1].
[1] https://lore.kernel.org/all/202507160743.15E8044@keescook/
Fixes:
|
|
|
|
8a8d47e86c |
binder: Add copyright notice to new kunit files
Clean up for the binder_alloc kunit test series. Add a copyright notice
to new files, as suggested by Carlos [1].
[1] https://lore.kernel.org/all/CAFuZdDLD=3CBOLSWw3VxCf7Nkf884SSNmt1wresQgxgBwED=eQ@mail.gmail.com/
Fixes:
|
|
|
|
d1934ed980 |
binder: encapsulate individual alloc test cases
Each case tested by the binder allocator test is defined by 3 parameters: the end alignment type of each requested buffer allocation, whether those buffers share the front or back pages of the allotted address space, and the order in which those buffers should be released. The alignment type represents how a binder buffer may be laid out within or across page boundaries and relative to other buffers, and it's used along with whether the buffers cover part (sharing the front pages) of or all (sharing the back pages) of the vma to calculate the sizes passed into each test. binder_alloc_test_alloc recursively generates each possible arrangement of alignment types and then tests that the binder_alloc code tracks pages correctly when those buffers are allocated and then freed in every possible order at both ends of the address space. While they provide comprehensive coverage, they are poor candidates to be represented as KUnit test cases, which must be statically enumerated. For 5 buffers and 5 end alignment types, the test case array would have 750,000 entries. This change structures the recursive calls into meaningful test cases so that failures are easier to interpret. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-7-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
f6544dcdd0 |
binder: Convert binder_alloc selftests to KUnit
Convert the existing binder_alloc_selftest tests into KUnit tests. These tests allocate and free an exhaustive combination of buffers with various sizes and alignments. This change allows them to be run without blocking or otherwise interfering with other processes in binder. This test is refactored into more meaningful cases in the subsequent patch. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-6-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
5e024582f4 |
binder: Scaffolding for binder_alloc KUnit tests
Add setup and teardown for testing binder allocator code with KUnit. Include minimal test cases to verify that tests are initialized correctly. Tested-by: Rae Moar <rmoar@google.com> Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-5-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
4328a52642 |
binder: Store lru freelist in binder_alloc
Store a pointer to the free pages list that the binder allocator should use for a process inside of struct binder_alloc. This change allows binder allocator code to be tested and debugged deterministically while a system is using binder; i.e., without interfering with other binder processes and independently of the shrinker. This is necessary to convert the current binder_alloc_selftest into a kunit test that does not rely on hijacking an existing binder_proc to run. A binder process's binder_alloc->freelist should not be changed after it is initialized. A sole exception is the process that runs the existing binder_alloc selftest. Its freelist can be temporarily replaced for the duration of the test because it runs as a single thread before any pages can be added to the global binder freelist, and the test frees every page it allocates before dropping the binder_selftest_lock. This exception allows the existing selftest to be used to check for regressions, but it will be dropped when the binder_alloc tests are converted to kunit in a subsequent patch in this series. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-3-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
bea3e7bfa2 |
binder: Fix selftest page indexing
The binder allocator selftest was only checking the last page of buffers that ended on a page boundary. Correct the page indexing to account for buffers that are not page-aligned. Signed-off-by: Tiffany Yang <ynaffit@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250714185321.2417234-2-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
01afddcac6 |
binder: use guards for plain mutex- and spinlock-protected sections
Use 'guard(mutex)' and 'guard(spinlock)' for plain (i.e. non-scoped) mutex- and spinlock-protected sections, respectively, thus making locking a bit simpler. Briefly tested with 'stress-ng --binderfs'. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Acked-by: Carlos Llamas <cmllamas@google.com> Link: https://lore.kernel.org/r/20250626073054.7706-2-dmantipov@yandex.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
1da2dca2fb |
binder: use kstrdup() in binderfs_binder_device_create()
In 'binderfs_binder_device_create()', use 'kstrdup()' to copy the newly created device's name, thus making the former a bit simpler. Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru> Acked-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: "Tiffany Y. Yang" <ynaffit@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250626073054.7706-1-dmantipov@yandex.ru Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
1664a91025 |
kill binderfs_remove_file()
don't try to open-code simple_recursive_removal(), especially when you miss things like d_invalidate()... Reviewed-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
421d3a860d |
binder: Remove unused binder lock events
Trace events can take up to 5K each when they are defined, regardless if
they are used or not. The binder lock events: binder_lock, binder_locked
and binder_unlock are no longer used.
Remove them.
Fixes:
|
|
|
|
899385b04b |
binder: fix reversed pid/tid in log
The "pid:tid" format is used consistently throughout the driver's logs with the exception of this one place where the arguments are reversed. Let's fix that. Also, collapse a multi-line comment into a single line. Cc: Steven Moreland <smoreland@google.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250605141930.1069438-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
5a6acd563a |
binder_ioctl_write_read(): simplify control flow a bit
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
c26f4fbd58 |
Char/Misc/IIO pull request for 6.16-rc1
Here is the big char/misc/iio and other small driver subsystem pull
request for 6.16-rc1.
Overall, a lot of individual changes, but nothing major, just the normal
constant forward progress of new device support and cleanups to existing
subsystems. Highlights in here are:
- Large IIO driver updates and additions and device tree changes
- Android binder bugfixes and logfile fixes
- mhi driver updates
- comedi driver updates
- counter driver updates and additions
- coresight driver updates and additions
- echo driver removal as there are no in-kernel users of it
- nvmem driver updates
- spmi driver updates
- new amd-sbi driver "subsystem" and drivers added
- rust miscdriver binding documentation fix
- other small driver fixes and updates (uio, w1, acrn, hpet, xillybus,
cardreader drivers, fastrpc and others.)
All of these have been in linux-next for quite a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaEKg5Q8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ykUyACgmAzrzKMoQUwwhQ6ed2l7tHdrlOcAoIORI1/x
pNqQdrE1EbmAAyl47IN4
=ts6J
-----END PGP SIGNATURE-----
Merge tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc / iio driver updates from Greg KH:
"Here is the big char/misc/iio and other small driver subsystem pull
request for 6.16-rc1.
Overall, a lot of individual changes, but nothing major, just the
normal constant forward progress of new device support and cleanups to
existing subsystems. Highlights in here are:
- Large IIO driver updates and additions and device tree changes
- Android binder bugfixes and logfile fixes
- mhi driver updates
- comedi driver updates
- counter driver updates and additions
- coresight driver updates and additions
- echo driver removal as there are no in-kernel users of it
- nvmem driver updates
- spmi driver updates
- new amd-sbi driver "subsystem" and drivers added
- rust miscdriver binding documentation fix
- other small driver fixes and updates (uio, w1, acrn, hpet,
xillybus, cardreader drivers, fastrpc and others)
All of these have been in linux-next for quite a while with no
reported problems"
* tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (390 commits)
binder: fix yet another UAF in binder_devices
counter: microchip-tcb-capture: Add watch validation support
dt-bindings: iio: adc: Add ROHM BD79100G
iio: adc: add support for Nuvoton NCT7201
dt-bindings: iio: adc: add NCT7201 ADCs
iio: chemical: Add driver for SEN0322
dt-bindings: trivial-devices: Document SEN0322
iio: adc: ad7768-1: reorganize driver headers
iio: bmp280: zero-init buffer
iio: ssp_sensors: optimalize -> optimize
HID: sensor-hub: Fix typo and improve documentation
iio: admv1013: replace redundant ternary operator with just len
iio: chemical: mhz19b: Fix error code in probe()
iio: adc: at91-sama5d2: use IIO_DECLARE_BUFFER_WITH_TS
iio: accel: sca3300: use IIO_DECLARE_BUFFER_WITH_TS
iio: adc: ad7380: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: adc: ad4695: rename AD4695_MAX_VIN_CHANNELS
iio: adc: ad4695: use IIO_DECLARE_DMA_BUFFER_WITH_TS
iio: introduce IIO_DECLARE_BUFFER_WITH_TS macros
iio: make IIO_DMA_MINALIGN minimum of 8 bytes
...
|
|
|
|
6d5b940e1e |
vfs-6.16-rc1.async.dir
-----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBN6wAKCRCRxhvAZXjc ok32AQD9DTiSCAoVg+7s+gSBuLTi8drPTN++mCaxdTqRh5WpRAD9GVyrGQT0s6LH eo9bm8d1TAYjilEWM0c0K0TxyQ7KcAA= =IW7H -----END PGP SIGNATURE----- Merge tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs directory lookup updates from Christian Brauner: "This contains cleanups for the lookup_one*() family of helpers. We expose a set of functions with names containing "lookup_one_len" and others without the "_len". This difference has nothing to do with "len". It's rater a historical accident that can be confusing. The functions without "_len" take a "mnt_idmap" pointer. This is found in the "vfsmount" and that is an important question when choosing which to use: do you have a vfsmount, or are you "inside" the filesystem. A related question is "is permission checking relevant here?". nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len functions. They pass nop_mnt_idmap and refuse to work on filesystems which have any other idmap. This work changes nfsd and cachefile to use the lookup_one family of functions and to explictily pass &nop_mnt_idmap which is consistent with all other vfs interfaces used where &nop_mnt_idmap is explicitly passed. The remaining uses of the "_one" functions do not require permission checks so these are renamed to be "_noperm" and the permission checking is removed. This series also changes these lookup function to take a qstr instead of separate name and len. In many cases this simplifies the call" * tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: VFS: change lookup_one_common and lookup_noperm_common to take a qstr Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS VFS: rename lookup_one_len family to lookup_noperm and remove permission check cachefiles: Use lookup_one() rather than lookup_one_len() nfsd: Use lookup_one() rather than lookup_one_len() VFS: improve interface for lookup_one functions |
|
|
|
9857af0fcf |
binder: fix yet another UAF in binder_devices
Commit |
|
|
|
57483a3627 |
binder: Create safe versions of binder log files
Binder defines several seq_files that can be accessed via debugfs or binderfs. Some of these files (e.g., 'state' and 'transactions') contain more granular information about binder's internal state that is helpful for debugging, but they also leak userspace address data through user-defined 'cookie' or 'ptr' values. Consequently, access to these files must be heavily restricted. Add two new files, 'state_hashed' and 'transactions_hashed', that reproduce the information in the original files but use the kernel's raw pointer obfuscation to hash any potential user addresses. This approach allows systems to grant broader access to the new files without having to change the security policy around the existing ones. In practice, userspace populates these fields with user addresses, but within the driver, these values only serve as unique identifiers for their associated binder objects. Consequently, binder logs can obfuscate these values and still retain meaning. While this strategy prevents leaking information about the userspace memory layout in the existing log files, it also decouples log messages about binder objects from their user-defined identifiers. Acked-by: Carlos Llamas <cmllamas@google.com> Tested-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com> Link: https://lore.kernel.org/r/20250510013435.1520671-7-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
91f1bbaa78 |
binder: Refactor binder_node print synchronization
The binder driver outputs information about each dead binder node by iterating over the dead nodes list, and it prints the state of each live node in the system by traversing each binder_proc's proc->nodes tree. Both cases require similar logic to maintain the global lock ordering while accessing each node. Create a helper function to synchronize around printing binder nodes in a list. Opportunistically make minor cosmetic changes to binder print functions. Acked-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://lore.kernel.org/r/20250510013435.1520671-5-ynaffit@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |
|
|
|
8c0a559825 |
binder: fix use-after-free in binderfs_evict_inode()
Running 'stress-ng --binderfs 16 --timeout 300' under KASAN-enabled
kernel, I've noticed the following:
BUG: KASAN: slab-use-after-free in binderfs_evict_inode+0x1de/0x2d0
Write of size 8 at addr ffff88807379bc08 by task stress-ng-binde/1699
CPU: 0 UID: 0 PID: 1699 Comm: stress-ng-binde Not tainted 6.14.0-rc7-g586de92313fc-dirty #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x1c2/0x2a0
? __pfx_dump_stack_lvl+0x10/0x10
? __pfx__printk+0x10/0x10
? __pfx_lock_release+0x10/0x10
? __virt_addr_valid+0x18c/0x540
? __virt_addr_valid+0x469/0x540
print_report+0x155/0x840
? __virt_addr_valid+0x18c/0x540
? __virt_addr_valid+0x469/0x540
? __phys_addr+0xba/0x170
? binderfs_evict_inode+0x1de/0x2d0
kasan_report+0x147/0x180
? binderfs_evict_inode+0x1de/0x2d0
binderfs_evict_inode+0x1de/0x2d0
? __pfx_binderfs_evict_inode+0x10/0x10
evict+0x524/0x9f0
? __pfx_lock_release+0x10/0x10
? __pfx_evict+0x10/0x10
? do_raw_spin_unlock+0x4d/0x210
? _raw_spin_unlock+0x28/0x50
? iput+0x697/0x9b0
__dentry_kill+0x209/0x660
? shrink_kill+0x8d/0x2c0
shrink_kill+0xa9/0x2c0
shrink_dentry_list+0x2e0/0x5e0
shrink_dcache_parent+0xa2/0x2c0
? __pfx_shrink_dcache_parent+0x10/0x10
? __pfx_lock_release+0x10/0x10
? __pfx_do_raw_spin_lock+0x10/0x10
do_one_tree+0x23/0xe0
shrink_dcache_for_umount+0xa0/0x170
generic_shutdown_super+0x67/0x390
kill_litter_super+0x76/0xb0
binderfs_kill_super+0x44/0x90
deactivate_locked_super+0xb9/0x130
cleanup_mnt+0x422/0x4c0
? lockdep_hardirqs_on+0x9d/0x150
task_work_run+0x1d2/0x260
? __pfx_task_work_run+0x10/0x10
resume_user_mode_work+0x52/0x60
syscall_exit_to_user_mode+0x9a/0x120
do_syscall_64+0x103/0x210
? asm_sysvec_apic_timer_interrupt+0x1a/0x20
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0xcac57b
Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8
RSP: 002b:00007ffecf4226a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007ffecf422720 RCX: 0000000000cac57b
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffecf422850
RBP: 00007ffecf422850 R08: 0000000028d06ab1 R09: 7fffffffffffffff
R10: 3fffffffffffffff R11: 0000000000000246 R12: 00007ffecf422718
R13: 00007ffecf422710 R14: 00007f478f87b658 R15: 00007ffecf422830
</TASK>
Allocated by task 1705:
kasan_save_track+0x3e/0x80
__kasan_kmalloc+0x8f/0xa0
__kmalloc_cache_noprof+0x213/0x3e0
binderfs_binder_device_create+0x183/0xa80
binder_ctl_ioctl+0x138/0x190
__x64_sys_ioctl+0x120/0x1b0
do_syscall_64+0xf6/0x210
entry_SYSCALL_64_after_hwframe+0x77/0x7f
Freed by task 1705:
kasan_save_track+0x3e/0x80
kasan_save_free_info+0x46/0x50
__kasan_slab_free+0x62/0x70
kfree+0x194/0x440
evict+0x524/0x9f0
do_unlinkat+0x390/0x5b0
__x64_sys_unlink+0x47/0x50
do_syscall_64+0xf6/0x210
entry_SYSCALL_64_after_hwframe+0x77/0x7f
This 'stress-ng' workload causes the concurrent deletions from
'binder_devices' and so requires full-featured synchronization
to prevent list corruption.
I've found this issue independently but pretty sure that syzbot did
the same, so Reported-by: and Closes: should be applicable here as well.
Cc: stable@vger.kernel.org
Reported-by: syzbot+353d7b75658a95aa955a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=353d7b75658a95aa955a
Fixes:
|
|
|
|
4f822ad5ee |
Linux 6.15-rc4
-----BEGIN PGP SIGNATURE----- iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9 FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ b/dCGNg= =xDGd -----END PGP SIGNATURE----- Merge 6.15-rc4 into char-misc-next We need the char-misc fixes in here as well. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> |