Commit Graph

489 Commits

Author SHA1 Message Date
Linus Torvalds 7cd122b552 Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
dentries of created objects in dcache (and undo it when removing those).
 Reference is grabbed and not released, but it's not actually _stored_
 anywhere.  That works, but it's hard to follow and verify; among other
 things, we have no way to tell _which_ of the increments is intended
 to be an unpaired one.  Worse, on removal we need to decide whether
 the reference had already been dropped, which can be non-trivial if
 that removal is on umount and we need to figure out if this dentry is
 pinned due to e.g. unlink() not done.  Usually that is handled by using
 kill_litter_super() as ->kill_sb(), but there are open-coded special
 cases of the same (consider e.g. /proc/self).
 
 Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
 marking those "leaked" dentries.  Having it set claims responsibility
 for +1 in refcount.
 
 The end result this series is aiming for:
 
 * get these unbalanced dget() and dput() replaced with new primitives that
   would, in addition to adjusting refcount, set and clear persistency flag.
 * instead of having kill_litter_super() mess with removing the remaining
   "leaked" references (e.g. for all tmpfs files that hadn't been removed
   prior to umount), have the regular shrink_dcache_for_umount() strip
   DCACHE_PERSISTENT of all dentries, dropping the corresponding
   reference if it had been set.  After that kill_litter_super() becomes
   an equivalent of kill_anon_super().
 
 Doing that in a single step is not feasible - it would affect too many places
 in too many filesystems.  It has to be split into a series.
 
 This work has really started early in 2024; quite a few preliminary pieces
 have already gone into mainline.  This chunk is finally getting to the
 meat of that stuff - infrastructure and most of the conversions to it.
 
 Some pieces are still sitting in the local branches, but the bulk of
 that stuff is here.
 
 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaTEq1wAKCRBZ7Krx/gZQ
 643uAQC1rRslhw5l7OjxEpIYbGG4M+QaadN4Nf5Sr2SuTRaPJQD/W4oj/u4C2eCw
 Dd3q071tqyvm/PXNgN2EEnIaxlFUlwc=
 =rKq+
 -----END PGP SIGNATURE-----

Merge tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull persistent dentry infrastructure and conversion from Al Viro:
 "Some filesystems use a kinda-sorta controlled dentry refcount leak to
  pin dentries of created objects in dcache (and undo it when removing
  those). A reference is grabbed and not released, but it's not actually
  _stored_ anywhere.

  That works, but it's hard to follow and verify; among other things, we
  have no way to tell _which_ of the increments is intended to be an
  unpaired one. Worse, on removal we need to decide whether the
  reference had already been dropped, which can be non-trivial if that
  removal is on umount and we need to figure out if this dentry is
  pinned due to e.g. unlink() not done. Usually that is handled by using
  kill_litter_super() as ->kill_sb(), but there are open-coded special
  cases of the same (consider e.g. /proc/self).

  Things get simpler if we introduce a new dentry flag
  (DCACHE_PERSISTENT) marking those "leaked" dentries. Having it set
  claims responsibility for +1 in refcount.

  The end result this series is aiming for:

   - get these unbalanced dget() and dput() replaced with new primitives
     that would, in addition to adjusting refcount, set and clear
     persistency flag.

   - instead of having kill_litter_super() mess with removing the
     remaining "leaked" references (e.g. for all tmpfs files that hadn't
     been removed prior to umount), have the regular
     shrink_dcache_for_umount() strip DCACHE_PERSISTENT of all dentries,
     dropping the corresponding reference if it had been set. After that
     kill_litter_super() becomes an equivalent of kill_anon_super().

  Doing that in a single step is not feasible - it would affect too many
  places in too many filesystems. It has to be split into a series.

  This work has really started early in 2024; quite a few preliminary
  pieces have already gone into mainline. This chunk is finally getting
  to the meat of that stuff - infrastructure and most of the conversions
  to it.

  Some pieces are still sitting in the local branches, but the bulk of
  that stuff is here"

* tag 'pull-persistency' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (54 commits)
  d_make_discardable(): warn if given a non-persistent dentry
  kill securityfs_recursive_remove()
  convert securityfs
  get rid of kill_litter_super()
  convert rust_binderfs
  convert nfsctl
  convert rpc_pipefs
  convert hypfs
  hypfs: swich hypfs_create_u64() to returning int
  hypfs: switch hypfs_create_str() to returning int
  hypfs: don't pin dentries twice
  convert gadgetfs
  gadgetfs: switch to simple_remove_by_name()
  convert functionfs
  functionfs: switch to simple_remove_by_name()
  functionfs: fix the open/removal races
  functionfs: need to cancel ->reset_work in ->kill_sb()
  functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
  functionfs: don't abuse ffs_data_closed() on fs shutdown
  convert selinuxfs
  ...
2025-12-05 14:36:21 -08:00
Linus Torvalds 8f7aa3d3c7 Networking changes for 6.19.
Core & protocols
 ----------------
 
  - Replace busylock at the Tx queuing layer with a lockless list. Resulting
    in a 300% (4x) improvement on heavy TX workloads, sending twice the
    number of packets per second, for half the cpu cycles.
 
  - Allow constantly busy flows to migrate to a more suitable CPU/NIC
    queue. Normally we perform queue re-selection when flow comes out
    of idle, but under extreme circumstances the flows may be constantly
    busy. Add sysctl to allow periodic rehashing even if it'd risk packet
    reordering.
 
  - Optimize the NAPI skb cache, make it larger, use it in more paths.
 
  - Attempt returning Tx skbs to the originating CPU (like we already did
    for Rx skbs).
 
  - Various data structure layout and prefetch optimizations from Eric.
 
  - Remove ktime_get() from the recvmsg() fast path, ktime_get() is sadly
    quite expensive on recent AMD machines.
 
  - Extend threaded NAPI polling to allow the kthread busy poll for packets.
 
  - Make MPTCP use Rx backlog processing. This lowers the lock pressure,
    improving the Rx performance.
 
  - Support memcg accounting of MPTCP socket memory.
 
  - Allow admin to opt sockets out of global protocol memory accounting
    (using a sysctl or BPF-based policy). The global limits are a poor fit
    for modern container workloads, where limits are imposed using cgroups.
 
  - Improve heuristics for when to kick off AF_UNIX garbage collection.
 
  - Allow users to control TCP SACK compression, and default to 33% of RTT.
 
  - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid unnecessarily
    aggressive rcvbuf growth and overshot when the connection RTT is low.
 
  - Preserve skb metadata space across skb_push / skb_pull operations.
 
  - Support for IPIP encapsulation in the nftables flowtable offload.
 
  - Support appending IP interface information to ICMP messages (RFC 5837).
 
  - Support setting max record size in TLS (RFC 8449).
 
  - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.
 
  - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.
 
  - Let users configure the number of write buffers in SMC.
 
  - Add new struct sockaddr_unsized for sockaddr of unknown length,
    from Kees.
 
  - Some conversions away from the crypto_ahash API, from Eric Biggers.
 
  - Some preparations for slimming down struct page.
 
  - YAML Netlink protocol spec for WireGuard.
 
  - Add a tool on top of YAML Netlink specs/lib for reporting commonly
    computed derived statistics and summarized system state.
 
 Driver API
 ----------
 
  - Add CAN XL support to the CAN Netlink interface.
 
  - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics,
    as defined by the OPEN Alliance's "Advanced diagnostic features
    for 100BASE-T1 automotive Ethernet PHYs" specification.
 
  - Add DPLL phase-adjust-gran pin attribute (and implement it in zl3073x).
 
  - Refactor xfrm_input lock to reduce contention when NIC offloads IPsec
    and performs RSS.
 
  - Add info to devlink params whether the current setting is the default
    or a user override. Allow resetting back to default.
 
  - Add standard device stats for PSP crypto offload.
 
  - Leverage DSA frame broadcast to implement simple HSR frame duplication
    for a lot of switches without dedicated HSR offload.
 
  - Add uAPI defines for 1.6Tbps link modes.
 
 Device drivers
 --------------
 
  - Add Motorcomm YT921x gigabit Ethernet switch support.
 
  - Add MUCSE driver for N500/N210 1GbE NIC series.
 
  - Convert drivers to support dedicated ops for timestamping control,
    and away from the direct IOCTL handling. While at it support GET
    operations for PHY timestamping.
 
  - Add (and convert most drivers to) a dedicated ethtool callback
    for reading the Rx ring count.
 
  - Significant refactoring efforts in the STMMAC driver, which supports
    Synopsys turn-key MAC IP integrated into a ton of SoCs.
 
  - Ethernet high-speed NICs:
    - Broadcom (bnxt):
      - support PPS in/out on all pins
    - Intel (100G, ice, idpf):
      - ice: implement standard ethtool and timestamping stats
      - i40e: support setting the max number of MAC addresses per VF
      - iavf: support RSS of GTP tunnels for 5G and LTE deployments
    - nVidia/Mellanox (mlx5):
      - reduce downtime on interface reconfiguration
      - disable being an XDP redirect target by default (same as other
        drivers) to avoid wasting resources if feature is unused
    - Meta (fbnic):
      - add support for Linux-managed PCS on 25G, 50G, and 100G links
    - Wangxun:
      - support Rx descriptor merge, and Tx head writeback
      - support Rx coalescing offload
      - support 25G SPF and 40G QSFP modules
 
  - Ethernet virtual:
    - Google (gve):
      - allow ethtool to configure rx_buf_len
      - implement XDP HW RX Timestamping support for DQ descriptor format
    - Microsoft vNIC (mana):
      - support HW link state events
      - handle hardware recovery events when probing the device
 
  - Ethernet NICs consumer, and embedded:
    - usbnet: add support for Byte Queue Limits (BQL)
    - AMD (amd-xgbe):
      - add device selftests
    - NXP (enetc):
      - add i.MX94 support
    - Broadcom integrated MACs (bcmgenet, bcmasp):
      - bcmasp: add support for PHY-based Wake-on-LAN
    - Broadcom switches (b53):
      - support port isolation
      - support BCM5389/97/98 and BCM63XX ARL formats
    - Lantiq/MaxLinear switches:
      - support bridge FDB entries on the CPU port
      - use regmap for register access
      - allow user to enable/disable learning
      - support Energy Efficient Ethernet
      - support configuring RMII clock delays
      - add tagging driver for MaxLinear GSW1xx switches
    - Synopsys (stmmac):
      - support using the HW clock in free running mode
      - add Eswin EIC7700 support
      - add Rockchip RK3506 support
      - add Altera Agilex5 support
    - Cadence (macb):
      - cleanup and consolidate descriptor and DMA address handling
      - add EyeQ5 support
    - TI:
      - icssg-prueth: support AF_XDP
    - Airoha access points:
      - add missing Ethernet stats and link state callback
      - add AN7583 support
      - support out-of-order Tx completion processing
    - Power over Ethernet:
      - pd692x0: preserve PSE configuration across reboots
      - add support for TPS23881B devices
 
  - Ethernet PHYs:
    - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
    - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
    - micrel:
      - support for non PTP SKUs of lan8814
      - enable in-band auto-negotiation on lan8814
    - realtek:
      - cable testing support on RTL8224
      - interrupt support on RTL8221B
    - motorcomm: support for PHY LEDs on YT853
    - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
    - mscc: support for PHY LED control
 
  - CAN drivers:
    - m_can: add support for optional reset and system wake up
    - remove can_change_mtu() obsoleted by core handling
    - mcp251xfd: support GPIO controller functionality
 
  - Bluetooth:
    - add initial support for PASTa
 
  - WiFi:
    - split ieee80211.h file, it's way too big
    - improvements in VHT radiotap reporting, S1G, Channel Switch
      Announcement handling, rate tracking in mesh networks
    - improve multi-radio monitor mode support, and add a cfg80211 debugfs
      interface for it
    - HT action frame handling on 6 GHz
    - initial chanctx work towards NAN
    - MU-MIMO sniffer improvements
 
  - WiFi drivers:
    - RealTek (rtw89):
      - support USB devices RTL8852AU and RTL8852CU
      - initial work for RTL8922DE
      - improved injection support
    - Intel:
      - iwlwifi: new sniffer API support
    - MediaTek (mt76):
      - WED support for >32-bit DMA
      - airoha NPU support
      - regdomain improvements
      - continued WiFi7/MLO work
    - Qualcomm/Atheros:
      - ath10k: factory test support
      - ath11k: TX power insertion support
      - ath12k: BSS color change support
      - ath12k: statistics improvements
    - brcmfmac: Acer A1 840 tablet quirk
    - rtl8xxxu: 40 MHz connection fixes/support
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmkveRQACgkQMUZtbf5S
 IrvY7A/+Nb0o4BxLHjPkAl1m3t3q2d0Y29B7SNkwnwEtxAV8EkNeZ3GWrdtDnTQY
 MYhmc7LEzvz8/lihapr7UJkcokzSASUV54hbez5jDBKC8EEoyUk8FdWDPerwlcRI
 zmCFNAVFyh9GX8i7wcrzKbDTHT5+GZLbSlGl9U5mhLsDdRlJgH7d8PJ7vWcmtLFY
 XN0paDyaeHfCl8wReWNAYx4C/I0ODOvlscpO0tnAKhB0ngJbQCKY2t6tn3rOYdif
 ZSQ5KwVRnJtQ4fYOFMOy9+FSCjVXtyrxF8KLxD+mqom2ZhmO00UpOMl09tqhq3uT
 WnvwoHUVBt6F+iITHwg5kMgIDPUq1kpUvL4S4UbVSuUm9ZKD+4KRU2ZHRBYMx+MU
 bsqmtY8/IULClUoRz+tZhltA8eb0NEqNZE2JPOFDiJHn1YiCCkFwxibhir893oM3
 sB7x65D7LQI2ty2BBGVGYnwYDPtyaxOA/s3WTwPvLEi3+Y/TGNIIrS9lBLA4U+Yr
 Gi93WQGVjttMmVyaHgXBUGmi3L52hvolm0AZ8zSRGrnIEpecjhly2KfYuaOzuxXC
 IHEQ6AFLdRh6JzafXGb/mQwGCHNmhwsY8A49i94fakWQamaL/L6A+1dyPu4LXMqi
 NwqCmlVb/LKGlfNG+V4wT27srJ+yBA2Vk3tpR1sZQQytFh0LKHI=
 =UoDR
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core & protocols:

   - Replace busylock at the Tx queuing layer with a lockless list.

     Resulting in a 300% (4x) improvement on heavy TX workloads, sending
     twice the number of packets per second, for half the cpu cycles.

   - Allow constantly busy flows to migrate to a more suitable CPU/NIC
     queue.

     Normally we perform queue re-selection when flow comes out of idle,
     but under extreme circumstances the flows may be constantly busy.

     Add sysctl to allow periodic rehashing even if it'd risk packet
     reordering.

   - Optimize the NAPI skb cache, make it larger, use it in more paths.

   - Attempt returning Tx skbs to the originating CPU (like we already
     did for Rx skbs).

   - Various data structure layout and prefetch optimizations from Eric.

   - Remove ktime_get() from the recvmsg() fast path, ktime_get() is
     sadly quite expensive on recent AMD machines.

   - Extend threaded NAPI polling to allow the kthread busy poll for
     packets.

   - Make MPTCP use Rx backlog processing. This lowers the lock
     pressure, improving the Rx performance.

   - Support memcg accounting of MPTCP socket memory.

   - Allow admin to opt sockets out of global protocol memory accounting
     (using a sysctl or BPF-based policy). The global limits are a poor
     fit for modern container workloads, where limits are imposed using
     cgroups.

   - Improve heuristics for when to kick off AF_UNIX garbage collection.

   - Allow users to control TCP SACK compression, and default to 33% of
     RTT.

   - Add tcp_rcvbuf_low_rtt sysctl to let datacenter users avoid
     unnecessarily aggressive rcvbuf growth and overshot when the
     connection RTT is low.

   - Preserve skb metadata space across skb_push / skb_pull operations.

   - Support for IPIP encapsulation in the nftables flowtable offload.

   - Support appending IP interface information to ICMP messages (RFC
     5837).

   - Support setting max record size in TLS (RFC 8449).

   - Remove taking rtnl_lock from RTM_GETNEIGHTBL and RTM_SETNEIGHTBL.

   - Use a dedicated lock (and RCU) in MPLS, instead of rtnl_lock.

   - Let users configure the number of write buffers in SMC.

   - Add new struct sockaddr_unsized for sockaddr of unknown length,
     from Kees.

   - Some conversions away from the crypto_ahash API, from Eric Biggers.

   - Some preparations for slimming down struct page.

   - YAML Netlink protocol spec for WireGuard.

   - Add a tool on top of YAML Netlink specs/lib for reporting commonly
     computed derived statistics and summarized system state.

  Driver API:

   - Add CAN XL support to the CAN Netlink interface.

   - Add uAPI for reporting PHY Mean Square Error (MSE) diagnostics, as
     defined by the OPEN Alliance's "Advanced diagnostic features for
     100BASE-T1 automotive Ethernet PHYs" specification.

   - Add DPLL phase-adjust-gran pin attribute (and implement it in
     zl3073x).

   - Refactor xfrm_input lock to reduce contention when NIC offloads
     IPsec and performs RSS.

   - Add info to devlink params whether the current setting is the
     default or a user override. Allow resetting back to default.

   - Add standard device stats for PSP crypto offload.

   - Leverage DSA frame broadcast to implement simple HSR frame
     duplication for a lot of switches without dedicated HSR offload.

   - Add uAPI defines for 1.6Tbps link modes.

  Device drivers:

   - Add Motorcomm YT921x gigabit Ethernet switch support.

   - Add MUCSE driver for N500/N210 1GbE NIC series.

   - Convert drivers to support dedicated ops for timestamping control,
     and away from the direct IOCTL handling. While at it support GET
     operations for PHY timestamping.

   - Add (and convert most drivers to) a dedicated ethtool callback for
     reading the Rx ring count.

   - Significant refactoring efforts in the STMMAC driver, which
     supports Synopsys turn-key MAC IP integrated into a ton of SoCs.

   - Ethernet high-speed NICs:
      - Broadcom (bnxt):
         - support PPS in/out on all pins
      - Intel (100G, ice, idpf):
         - ice: implement standard ethtool and timestamping stats
         - i40e: support setting the max number of MAC addresses per VF
         - iavf: support RSS of GTP tunnels for 5G and LTE deployments
      - nVidia/Mellanox (mlx5):
         - reduce downtime on interface reconfiguration
         - disable being an XDP redirect target by default (same as
           other drivers) to avoid wasting resources if feature is
           unused
      - Meta (fbnic):
         - add support for Linux-managed PCS on 25G, 50G, and 100G links
      - Wangxun:
         - support Rx descriptor merge, and Tx head writeback
         - support Rx coalescing offload
         - support 25G SPF and 40G QSFP modules

   - Ethernet virtual:
      - Google (gve):
         - allow ethtool to configure rx_buf_len
         - implement XDP HW RX Timestamping support for DQ descriptor
           format
      - Microsoft vNIC (mana):
         - support HW link state events
         - handle hardware recovery events when probing the device

   - Ethernet NICs consumer, and embedded:
      - usbnet: add support for Byte Queue Limits (BQL)
      - AMD (amd-xgbe):
         - add device selftests
      - NXP (enetc):
         - add i.MX94 support
      - Broadcom integrated MACs (bcmgenet, bcmasp):
         - bcmasp: add support for PHY-based Wake-on-LAN
      - Broadcom switches (b53):
         - support port isolation
         - support BCM5389/97/98 and BCM63XX ARL formats
      - Lantiq/MaxLinear switches:
         - support bridge FDB entries on the CPU port
         - use regmap for register access
         - allow user to enable/disable learning
         - support Energy Efficient Ethernet
         - support configuring RMII clock delays
         - add tagging driver for MaxLinear GSW1xx switches
      - Synopsys (stmmac):
         - support using the HW clock in free running mode
         - add Eswin EIC7700 support
         - add Rockchip RK3506 support
         - add Altera Agilex5 support
      - Cadence (macb):
         - cleanup and consolidate descriptor and DMA address handling
         - add EyeQ5 support
      - TI:
         - icssg-prueth: support AF_XDP
      - Airoha access points:
         - add missing Ethernet stats and link state callback
         - add AN7583 support
         - support out-of-order Tx completion processing
      - Power over Ethernet:
         - pd692x0: preserve PSE configuration across reboots
         - add support for TPS23881B devices

   - Ethernet PHYs:
      - Open Alliance OATC14 10BASE-T1S PHY cable diagnostic support
      - Support 50G SerDes and 100G interfaces in Linux-managed PHYs
      - micrel:
         - support for non PTP SKUs of lan8814
         - enable in-band auto-negotiation on lan8814
      - realtek:
         - cable testing support on RTL8224
         - interrupt support on RTL8221B
      - motorcomm: support for PHY LEDs on YT853
      - microchip: support for LAN867X Rev.D0 PHYs w/ SQI and cable diag
      - mscc: support for PHY LED control

   - CAN drivers:
      - m_can: add support for optional reset and system wake up
      - remove can_change_mtu() obsoleted by core handling
      - mcp251xfd: support GPIO controller functionality

   - Bluetooth:
      - add initial support for PASTa

   - WiFi:
      - split ieee80211.h file, it's way too big
      - improvements in VHT radiotap reporting, S1G, Channel Switch
        Announcement handling, rate tracking in mesh networks
      - improve multi-radio monitor mode support, and add a cfg80211
        debugfs interface for it
      - HT action frame handling on 6 GHz
      - initial chanctx work towards NAN
      - MU-MIMO sniffer improvements

   - WiFi drivers:
      - RealTek (rtw89):
         - support USB devices RTL8852AU and RTL8852CU
         - initial work for RTL8922DE
         - improved injection support
      - Intel:
         - iwlwifi: new sniffer API support
      - MediaTek (mt76):
         - WED support for >32-bit DMA
         - airoha NPU support
         - regdomain improvements
         - continued WiFi7/MLO work
      - Qualcomm/Atheros:
         - ath10k: factory test support
         - ath11k: TX power insertion support
         - ath12k: BSS color change support
         - ath12k: statistics improvements
      - brcmfmac: Acer A1 840 tablet quirk
      - rtl8xxxu: 40 MHz connection fixes/support"

* tag 'net-next-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1381 commits)
  net: page_pool: sanitise allocation order
  net: page pool: xa init with destroy on pp init
  net/mlx5e: Support XDP target xmit with dummy program
  net/mlx5e: Update XDP features in switch channels
  selftests/tc-testing: Test CAKE scheduler when enqueue drops packets
  net/sched: sch_cake: Fix incorrect qlen reduction in cake_drop
  wireguard: netlink: generate netlink code
  wireguard: uapi: generate header with ynl-gen
  wireguard: uapi: move flag enums
  wireguard: uapi: move enum wg_cmd
  wireguard: netlink: add YNL specification
  selftests: drv-net: Fix tolerance calculation in devlink_rate_tc_bw.py
  selftests: drv-net: Fix and clarify TC bandwidth split in devlink_rate_tc_bw.py
  selftests: drv-net: Set shell=True for sysfs writes in devlink_rate_tc_bw.py
  selftests: drv-net: Use Iperf3Runner in devlink_rate_tc_bw.py
  selftests: drv-net: introduce Iperf3Runner for measurement use cases
  selftests: drv-net: Add devlink_rate_tc_bw.py to TEST_PROGS
  net: ps3_gelic_net: Use napi_alloc_skb() and napi_gro_receive()
  Documentation: net: dsa: mention simple HSR offload helpers
  Documentation: net: dsa: mention availability of RedBox
  ...
2025-12-03 17:24:33 -08:00
Linus Torvalds 784faa8eca Rust changes for v6.19
Toolchain and infrastructure:
 
  - Add support for 'syn'.
 
      Syn is a parsing library for parsing a stream of Rust tokens into a
      syntax tree of Rust source code.
 
      Currently this library is geared toward use in Rust procedural
      macros, but contains some APIs that may be useful more generally.
 
    'syn' allows us to greatly simplify writing complex macros such as
    'pin-init' (Benno has already prepared the 'syn'-based version). We
    will use it in the 'macros' crate too.
 
    'syn' is the most downloaded Rust crate (according to crates.io), and
    it is also used by the Rust compiler itself. While the amount of code
    is substantial, there should not be many updates needed for these
    crates, and even if there are, they should not be too big, e.g. +7k
    -3k lines across the 3 crates in the last year.
 
    'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
    I only modified their code to remove a third dependency
    ('unicode-ident') and to add the SPDX identifiers. The code can be
    easily verified to exactly match upstream with the provided scripts.
 
    They are all licensed under "Apache-2.0 OR MIT", like the other
    vendored 'alloc' crate we had for a while.
 
    Please see the merge commit with the cover letter for more context.
 
  - Allow 'unreachable_pub' and 'clippy::disallowed_names' for doctests.
 
    Examples (i.e. doctests) may want to do things like show public items
    and use names such as 'foo'.
 
    Nevertheless, we still try to keep examples as close to real code as
    possible (this is part of why running Clippy on doctests is important
    for us, e.g. for safety comments, which userspace Rust does not
    support yet but we are stricter).
 
 'kernel' crate:
 
  - Replace our custom 'CStr' type with 'core::ffi::CStr'.
 
    Using the standard library type reduces our custom code footprint,
    and we retain needed custom functionality through an extension trait
    and a new 'fmt!' macro which replaces the previous 'core' import.
 
    This started in 6.17 and continued in 6.18, and we finally land the
    replacement now. This required quite some stamina from Tamir, who
    split the changes in steps to prepare for the flag day change here.
 
  - Replace 'kernel::c_str!' with C string literals.
 
    C string literals were added in Rust 1.77, which produce '&CStr's
    (the 'core' one), so now we can write:
 
        c"hi"
 
    instead of:
 
        c_str!("hi")
 
  - Add 'num' module for numerical features.
 
    It includes the 'Integer' trait, implemented for all primitive
    integer types.
 
    It also includes the 'Bounded' integer wrapping type: an integer
    value that requires only the 'N' less significant bits of the wrapped
    type to be encoded:
 
        // An unsigned 8-bit integer, of which only the 4 LSBs are used.
        let v = Bounded::<u8, 4>:🆕:<15>();
        assert_eq!(v.get(), 15);
 
    'Bounded' is useful to e.g. enforce guarantees when working with
    bitfields that have an arbitrary number of bits.
 
    Values can be constructed from simple non-constant expressions or,
    for more complex ones, validated at runtime.
 
    'Bounded' also comes with comparison and arithmetic operations (with
    both their backing type and other 'Bounded's with a compatible
    backing type), casts to change the backing type, extending/shrinking
    and infallible/fallible conversions from/to primitives as applicable.
 
  - 'rbtree' module: add immutable cursor ('Cursor').
 
    It enables to use just an immutable tree reference where appropriate.
    The existing fully-featured mutable cursor is renamed to 'CursorMut'.
 
 kallsyms:
 
  - Fix wrong "big" kernel symbol type read from procfs.
 
 'pin-init' crate:
 
  - A couple minor fixes (Benno asked me to pick these patches up for
    him this cycle).
 
 Documentation:
 
  - Quick Start guide: add Debian 13 (Trixie).
 
    Debian Stable is now able to build Linux, since Debian 13 (released
    2025-08-09) packages Rust 1.85.0, which is recent enough.
 
    We are planning to propose that the minimum supported Rust version in
    Linux follows Debian Stable releases, with Debian 13 being the first
    one we upgrade to, i.e. Rust 1.85.
 
 MAINTAINERS:
 
  - Add entry for the new 'num' module.
 
  - Remove Alex as Rust maintainer: he hasn't had the time to contribute
    for a few years now, so it is a no-op change in practice.
 
 And a few other cleanups and improvements.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEPjU5OPd5QIZ9jqqOGXyLc2htIW0FAmks0WkACgkQGXyLc2ht
 IW3QQg/+LpBmrz0ZSKH24kcX3x/hpfA2Erd4cmn+qjXev9RSAM1bt9xf3dsAhhFd
 BStUpf0aglHOSEuWAvNHqEb5+yu+6qy6EFXXqH0ASexHK93t77Jbztjtf3SMlykT
 N/lSJ+LWw2WiRT0NRWoTfKaEWzZQ8j9fi9Jb/IGNZGdNMryisVUYWqzLwNupPuK+
 RMcEitHdO2NWjyodk2GGRyYQ7+XxQgbXZoxtgeubPSrrmGuGTXV42RlQKC2KHPx3
 gz6CwcO3Xd0bGHHSgP32QDtGRJtniO8iXBKxiooT+ys+M83fTKbwNrIrW3tHdheY
 765qsd/NvUmAkcgTCoLqj5biU6LCsepyimNg1vf4pYFohBoTaGeN+UqzbXBrSjy2
 pmrgxwMRVHsYz+zoSKAVKJl7ASba5BXFdI4Whgfqwwc9So/X7uyujIYXGbRoznCV
 W5vu7OUboBy26NvcsPrf6BqWcsJEpGV/M4z2UBRjAoJTRGQMcm/ckuo/GfYm3yW+
 bUW62UmVCdY5crpo7XPH/G4ZGBR/k3p9dLVt8OJxEoTlfw4KDE5BszJoXmejZqdi
 9LEMhzTWwoFp9NspQuEGdYdfGRlfG6XXqrwGZtQI+dlc4RvFEgBBu2Lxotq+Ods0
 EfCVCJQjWmyCodVdJ/QqbCRFuXtOFLr/hPdWnvlrRxVkPtF2CDw=
 =9nM+
 -----END PGP SIGNATURE-----

Merge tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull Rust updates from Miguel Ojeda:
 "Toolchain and infrastructure:

   - Add support for 'syn'.

     Syn is a parsing library for parsing a stream of Rust tokens into a
     syntax tree of Rust source code.

     Currently this library is geared toward use in Rust procedural
     macros, but contains some APIs that may be useful more generally.

     'syn' allows us to greatly simplify writing complex macros such as
     'pin-init' (Benno has already prepared the 'syn'-based version). We
     will use it in the 'macros' crate too.

     'syn' is the most downloaded Rust crate (according to crates.io),
     and it is also used by the Rust compiler itself. While the amount
     of code is substantial, there should not be many updates needed for
     these crates, and even if there are, they should not be too big,
     e.g. +7k -3k lines across the 3 crates in the last year.

     'syn' requires two smaller dependencies: 'quote' and 'proc-macro2'.
     I only modified their code to remove a third dependency
     ('unicode-ident') and to add the SPDX identifiers. The code can be
     easily verified to exactly match upstream with the provided
     scripts.

     They are all licensed under "Apache-2.0 OR MIT", like the other
     vendored 'alloc' crate we had for a while.

     Please see the merge commit with the cover letter for more context.

   - Allow 'unreachable_pub' and 'clippy::disallowed_names' for
     doctests.

     Examples (i.e. doctests) may want to do things like show public
     items and use names such as 'foo'.

     Nevertheless, we still try to keep examples as close to real code
     as possible (this is part of why running Clippy on doctests is
     important for us, e.g. for safety comments, which userspace Rust
     does not support yet but we are stricter).

  'kernel' crate:

   - Replace our custom 'CStr' type with 'core::ffi::CStr'.

     Using the standard library type reduces our custom code footprint,
     and we retain needed custom functionality through an extension
     trait and a new 'fmt!' macro which replaces the previous 'core'
     import.

     This started in 6.17 and continued in 6.18, and we finally land the
     replacement now. This required quite some stamina from Tamir, who
     split the changes in steps to prepare for the flag day change here.

   - Replace 'kernel::c_str!' with C string literals.

     C string literals were added in Rust 1.77, which produce '&CStr's
     (the 'core' one), so now we can write:

         c"hi"

     instead of:

         c_str!("hi")

   - Add 'num' module for numerical features.

     It includes the 'Integer' trait, implemented for all primitive
     integer types.

     It also includes the 'Bounded' integer wrapping type: an integer
     value that requires only the 'N' least significant bits of the
     wrapped type to be encoded:

         // An unsigned 8-bit integer, of which only the 4 LSBs are used.
         let v = Bounded::<u8, 4>:🆕:<15>();
         assert_eq!(v.get(), 15);

     'Bounded' is useful to e.g. enforce guarantees when working with
     bitfields that have an arbitrary number of bits.

     Values can also be constructed from simple non-constant expressions
     or, for more complex ones, validated at runtime.

     'Bounded' also comes with comparison and arithmetic operations
     (with both their backing type and other 'Bounded's with a
     compatible backing type), casts to change the backing type,
     extending/shrinking and infallible/fallible conversions from/to
     primitives as applicable.

   - 'rbtree' module: add immutable cursor ('Cursor').

     It enables to use just an immutable tree reference where
     appropriate. The existing fully-featured mutable cursor is renamed
     to 'CursorMut'.

  kallsyms:

   - Fix wrong "big" kernel symbol type read from procfs.

  'pin-init' crate:

   - A couple minor fixes (Benno asked me to pick these patches up for
     him this cycle).

  Documentation:

   - Quick Start guide: add Debian 13 (Trixie).

     Debian Stable is now able to build Linux, since Debian 13 (released
     2025-08-09) packages Rust 1.85.0, which is recent enough.

     We are planning to propose that the minimum supported Rust version
     in Linux follows Debian Stable releases, with Debian 13 being the
     first one we upgrade to, i.e. Rust 1.85.

  MAINTAINERS:

   - Add entry for the new 'num' module.

   - Remove Alex as Rust maintainer: he hasn't had the time to
     contribute for a few years now, so it is a no-op change in
     practice.

  And a few other cleanups and improvements"

* tag 'rust-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux: (53 commits)
  rust: macros: support `proc-macro2`, `quote` and `syn`
  rust: syn: enable support in kbuild
  rust: syn: add `README.md`
  rust: syn: remove `unicode-ident` dependency
  rust: syn: add SPDX License Identifiers
  rust: syn: import crate
  rust: quote: enable support in kbuild
  rust: quote: add `README.md`
  rust: quote: add SPDX License Identifiers
  rust: quote: import crate
  rust: proc-macro2: enable support in kbuild
  rust: proc-macro2: add `README.md`
  rust: proc-macro2: remove `unicode_ident` dependency
  rust: proc-macro2: add SPDX License Identifiers
  rust: proc-macro2: import crate
  rust: kbuild: support using libraries in `rustc_procmacro`
  rust: kbuild: support skipping flags in `rustc_test_library`
  rust: kbuild: add proc macro library support
  rust: kbuild: simplify `--cfg` handling
  rust: kbuild: introduce `core-flags` and `core-skip_flags`
  ...
2025-12-03 14:16:49 -08:00
Asbjørn Sloth Tønnesen 68e83f3472 tools: ynl-gen: add regeneration comment
Add a comment on regeneration to the generated files.

The comment is placed after the YNL-GEN line[1], as to not interfere
with ynl-regen.sh's detection logic.

[1] and after the optional YNL-ARG line.

Link: https://lore.kernel.org/r/aR5m174O7pklKrMR@zx2c4.com/
Suggested-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Asbjørn Sloth Tønnesen <ast@fiberby.net>
Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20251120174429.390574-3-ast@fiberby.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25 19:20:42 -08:00
Al Viro 4433d8e25d convert rust_binderfs
Parallel to binderfs stuff:
	* use simple_start_creating()/simple_done_creating()/d_make_persistent()
instead of manual inode_lock()/lookup_noperm()/d_instanitate()/inode_unlock().
	* allocate inode first - simpler cleanup that way.
	* use simple_recursive_removal() instead of open-coding it.
	* switch to kill_anon_super()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-17 23:59:27 -05:00
Vitaly Wool f56b131723 rust: rbtree: add immutable cursor
Sometimes we may need to iterate over, or find an element in a read
only (or read mostly) red-black tree, and in that case we don't need a
mutable reference to the tree, which we'll however have to take to be
able to use the current (mutable) cursor implementation.

This patch adds a simple immutable cursor implementation to RBTree,
which enables us to use an immutable tree reference. The existing
(fully featured) cursor implementation is renamed to CursorMut,
while retaining its functionality.

The only existing user of the [mutable] cursor for RBTrees (binder) is
updated to match the changes.

Signed-off-by: Vitaly Wool <vitaly.wool@konsulko.se>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251014123339.2492210-1-vitaly.wool@konsulko.se
[ Applied `rustfmt`. Added intra-doc link. Fixed unclosed example.
  Fixed docs description. Fixed typo and other formatting nits.
    - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-11-16 21:56:57 +01:00
Al Viro b89aa54482 convert binderfs
Objects are created either by d_alloc_name()+d_add() (in
binderfs_ctl_create()) or by simple_start_creating()+d_instantiate().
Removals are by simple_recurisive_removal().

Switch d_add()/d_instantiate() to d_make_persistent() + dput().
Voila - kill_litter_super() is not needed anymore.

Fold dput()+unlocking the parent into simple_done_creating(), while
we are at it.

NOTE: return value of binderfs_create_file() is borrowed; it may get
stored in proc->binderfs_entry.  See binder_release()...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:05 -05:00
Al Viro 02da8d2c09 binderfs_binder_ctl_create(): kill a bogus check
It's called once, during binderfs mount, right after allocating
root dentry.  Checking that it hadn't been already called is
only obfuscating things.

Looks like that bogosity had been copied from devpts...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Al Viro 185d241c88 binderfs: use simple_start_creating()
binderfs_binder_device_create() gets simpler, binderfs_create_dentry() simply
goes away...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-11-16 01:35:04 -05:00
Alice Ryhl d90eeb8ecd binder: remove "invalid inc weak" check
There are no scenarios where a weak increment is invalid on binder_node.
The only possible case where it could be invalid is if the kernel
delivers BR_DECREFS to the process that owns the node, and then
increments the weak refcount again, effectively "reviving" a dead node.

However, that is not possible: when the BR_DECREFS command is delivered,
the kernel removes and frees the binder_node. The fact that you were
able to call binder_inc_node_nilocked() implies that the node is not yet
destroyed, which implies that BR_DECREFS has not been delivered to
userspace, so incrementing the weak refcount is valid.

Note that it's currently possible to trigger this condition if the owner
calls BINDER_THREAD_EXIT while node->has_weak_ref is true. This causes
BC_INCREFS on binder_ref instances to fail when they should not.

Cc: stable@vger.kernel.org
Fixes: 457b9a6f09 ("Staging: android: add binder driver")
Reported-by: Yu-Ting Tseng <yutingtseng@google.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20251015-binder-weak-inc-v1-1-7914b092c371@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-22 08:04:15 +02:00
Tamir Duberstein 3b83f5d5e7 rust: replace `CStr` with `core::ffi::CStr`
`kernel::ffi::CStr` was introduced in commit d126d23801 ("rust: str:
add `CStr` type") in November 2022 as an upstreaming of earlier work
that was done in May 2021[0]. That earlier work, having predated the
inclusion of `CStr` in `core`, largely duplicated the implementation of
`std::ffi::CStr`.

`std::ffi::CStr` was moved to `core::ffi::CStr` in Rust 1.64 in
September 2022. Hence replace `kernel::str::CStr` with `core::ffi::CStr`
to reduce our custom code footprint, and retain needed custom
functionality through an extension trait.

Add `CStr` to `ffi` and the kernel prelude.

Link: faa3cbcca0 [0]
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Danilo Krummrich <dakr@kernel.org>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-16-9378a54385f8@gmail.com
[ Removed assert that would now depend on the Rust version. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-22 07:47:27 +02:00
Tamir Duberstein 0dac8cf44b rust_binder: use `core::ffi::CStr` method names
Prepare for `core::ffi::CStr` taking the place of `kernel::str::CStr` by
avoiding methods that only exist on the latter.

This backslid in commit eafedbc7c0 ("rust_binder: add Rust Binder
driver").

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-4-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein 5aed9677e5 rust_binder: use `kernel::fmt`
Reduce coupling to implementation details of the formatting machinery by
avoiding direct use for `core`'s formatting traits and macros.

This backslid in commit eafedbc7c0 ("rust_binder: add Rust Binder
driver").

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-3-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Tamir Duberstein d9252f1be2 rust_binder: remove trailing comma
This prepares for a later commit in which we introduce a custom
formatting macro; that macro doesn't handle trailing commas so just
remove this one.

Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Signed-off-by: Tamir Duberstein <tamird@gmail.com>
Link: https://patch.msgid.link/20251018-cstr-core-v18-2-9378a54385f8@gmail.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
2025-10-20 04:04:23 +02:00
Kriish Sharma 7557f18994 binder: Fix missing kernel-doc entries in binder.c
Fix several kernel-doc warnings in `drivers/android/binder.c` caused by
undocumented struct members and function parameters.

In particular, add missing documentation for the `@thread` parameter in
binder_free_buf_locked().

Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-13 11:08:25 +02:00
Alice Ryhl b5ce7a5cc5 rust_binder: report freeze notification only when fully frozen
Binder only sends out freeze notifications when ioctl_freeze() completes
and the process has become fully frozen. However, if a freeze
notification is registered during the freeze operation, then it
registers an initial state of 'frozen'. This is a problem because if
the freeze operation fails, then the listener is not told about that
state change, leading to lost updates.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-13 11:06:20 +02:00
Alice Ryhl 99559e5bb4 rust_binder: don't delete FreezeListener if there are pending duplicates
When userspace issues commands to a freeze listener, it identifies it
using a cookie. Normally this cookie uniquely identifies a freeze
listener, but when userspace clears a listener with the intent of
deleting it, it's allowed to "regret" clearing it and create a new
freeze listener for the same node using the same cookie. (IMO this was
an API mistake, but userspace relies on it.)

Currently if the active freeze listener gets fully deleted while there
are still pending duplicates, then the code incorrectly deletes the
pending duplicates too. To fix this, do not delete the entry if there
are still pending duplicates.

Since the current data structure requires a main freeze listener, we
convert one pending duplicate into the primary listener in this
scenario.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-13 11:06:18 +02:00
Alice Ryhl bfe144da06 rust_binder: freeze_notif_done should resend if wrong state
Consider the following scenario:
1. A freeze notification is delivered to thread 1.
2. The process becomes frozen or unfrozen.
3. The message for step 2 is delivered to thread 2 and ignored because
   there is already a pending notification from step 1.
4. Thread 1 acknowledges the notification from step 1.
In this case, step 4 should ensure that the message ignored in step 3 is
resent as it can now be delivered.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-13 11:06:16 +02:00
Alice Ryhl c7c090af37 rust_binder: remove warning about orphan mappings
This condition occurs if a thread dies while processing a transaction.
We should not print anything in this scenario.

Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-13 11:06:12 +02:00
Miguel Ojeda 7e69a24b6b rust_binder: clean `clippy::mem_replace_with_default` warning
Clippy reports:

    error: replacing a value of type `T` with `T::default()` is better expressed using `core::mem::take`
       --> drivers/android/binder/node.rs:690:32
        |
    690 |             _unused_capacity = mem::replace(&mut inner.freeze_list, KVVec::new());
        |                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: consider using: `core::mem::take(&mut inner.freeze_list)`
        |
        = help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#mem_replace_with_default
        = note: `-D clippy::mem-replace-with-default` implied by `-D warnings`
        = help: to override `-D warnings` add `#[allow(clippy::mem_replace_with_default)]`

The suggestion seems fine, thus apply it.

Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-13 10:57:41 +02:00
Alice Ryhl eafedbc7c0 rust_binder: add Rust Binder driver
We're generally not proponents of rewrites (nasty uncomfortable things
that make you late for dinner!). So why rewrite Binder?

Binder has been evolving over the past 15+ years to meet the evolving
needs of Android. Its responsibilities, expectations, and complexity
have grown considerably during that time. While we expect Binder to
continue to evolve along with Android, there are a number of factors
that currently constrain our ability to develop/maintain it. Briefly
those are:

1. Complexity: Binder is at the intersection of everything in Android and
   fulfills many responsibilities beyond IPC. It has become many things
   to many people, and due to its many features and their interactions
   with each other, its complexity is quite high. In just 6kLOC it must
   deliver transactions to the right threads. It must correctly parse
   and translate the contents of transactions, which can contain several
   objects of different types (e.g., pointers, fds) that can interact
   with each other. It controls the size of thread pools in userspace,
   and ensures that transactions are assigned to threads in ways that
   avoid deadlocks where the threadpool has run out of threads. It must
   track refcounts of objects that are shared by several processes by
   forwarding refcount changes between the processes correctly.  It must
   handle numerous error scenarios and it combines/nests 13 different
   locks, 7 reference counters, and atomic variables. Finally, It must
   do all of this as fast and efficiently as possible. Minor performance
   regressions can cause a noticeably degraded user experience.

2. Things to improve: Thousand-line functions [1], error-prone error
   handling [2], and confusing structure can occur as a code base grows
   organically. After more than a decade of development, this codebase
   could use an overhaul.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n2896
[2]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/android/binder.c?h=v6.5#n3658

3. Security critical: Binder is a critical part of Android's sandboxing
   strategy. Even Android's most de-privileged sandboxes (e.g. the
   Chrome renderer, or SW Codec) have direct access to Binder. More than
   just about any other component, it's important that Binder provide
   robust security, and itself be robust against security
   vulnerabilities.

It's #1 (high complexity) that has made continuing to evolve Binder and
resolving #2 (tech debt) exceptionally difficult without causing #3
(security issues). For Binder to continue to meet Android's needs, we
need better ways to manage (and reduce!) complexity without increasing
the risk.

The biggest change is obviously the choice of programming language. We
decided to use Rust because it directly addresses a number of the
challenges within Binder that we have faced during the last years. It
prevents mistakes with ref counting, locking, bounds checking, and also
does a lot to reduce the complexity of error handling. Additionally,
we've been able to use the more expressive type system to encode the
ownership semantics of the various structs and pointers, which takes the
complexity of managing object lifetimes out of the hands of the
programmer, reducing the risk of use-after-frees and similar problems.

Rust has many different pointer types that it uses to encode ownership
semantics into the type system, and this is probably one of the most
important aspects of how it helps in Binder. The Binder driver has a lot
of different objects that have complex ownership semantics; some
pointers own a refcount, some pointers have exclusive ownership, and
some pointers just reference the object and it is kept alive in some
other manner. With Rust, we can use a different pointer type for each
kind of pointer, which enables the compiler to enforce that the
ownership semantics are implemented correctly.

Another useful feature is Rust's error handling. Rust allows for more
simplified error handling with features such as destructors, and you get
compilation failures if errors are not properly handled. This means that
even though Rust requires you to spend more lines of code than C on
things such as writing down invariants that are left implicit in C, the
Rust driver is still slightly smaller than C binder: Rust is 5.5kLOC and
C is 5.8kLOC. (These numbers are excluding blank lines, comments,
binderfs, and any debugging facilities in C that are not yet implemented
in the Rust driver. The numbers include abstractions in rust/kernel/
that are unlikely to be used by other drivers than Binder.)

Although this rewrite completely rethinks how the code is structured and
how assumptions are enforced, we do not fundamentally change *how* the
driver does the things it does. A lot of careful thought has gone into
the existing design. The rewrite is aimed rather at improving code
health, structure, readability, robustness, security, maintainability
and extensibility. We also include more inline documentation, and
improve how assumptions in the code are enforced. Furthermore, all
unsafe code is annotated with a SAFETY comment that explains why it is
correct.

We have left the binderfs filesystem component in C. Rewriting it in
Rust would be a large amount of work and requires a lot of bindings to
the file system interfaces. Binderfs has not historically had the same
challenges with security and complexity, so rewriting binderfs seems to
have lower value than the rest of Binder.

Correctness and feature parity
------------------------------

Rust binder passes all tests that validate the correctness of Binder in
the Android Open Source Project. We can boot a device, and run a variety
of apps and functionality without issues. We have performed this both on
the Cuttlefish Android emulator device, and on a Pixel 6 Pro.

As for feature parity, Rust binder currently implements all features
that C binder supports, with the exception of some debugging facilities.
The missing debugging facilities will be added before we submit the Rust
implementation upstream.

Tracepoints
-----------

I did not include all of the tracepoints as I felt that the mechansim
for making C access fields of Rust structs should be discussed on list
separately. I also did not include the support for building Rust Binder
as a module since that requires exporting a bunch of additional symbols
on the C side.

Original RFC Link with old benchmark numbers:
	https://lore.kernel.org/r/20231101-rust-binder-v1-0-08ba9197f637@google.com

Co-developed-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Co-developed-by: Matt Gilbride <mattgilbride@google.com>
Signed-off-by: Matt Gilbride <mattgilbride@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250919-rust-binder-v2-1-a384b09f28dd@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-19 09:40:46 +02:00
Carlos Llamas 3ebcd3460c binder: fix double-free in dbitmap
A process might fail to allocate a new bitmap when trying to expand its
proc->dmap. In that case, dbitmap_grow() fails and frees the old bitmap
via dbitmap_free(). However, the driver calls dbitmap_free() again when
the same process terminates, leading to a double-free error:

  ==================================================================
  BUG: KASAN: double-free in binder_proc_dec_tmpref+0x2e0/0x55c
  Free of addr ffff00000b7c1420 by task kworker/9:1/209

  CPU: 9 UID: 0 PID: 209 Comm: kworker/9:1 Not tainted 6.17.0-rc6-dirty #5 PREEMPT
  Hardware name: linux,dummy-virt (DT)
  Workqueue: events binder_deferred_func
  Call trace:
   kfree+0x164/0x31c
   binder_proc_dec_tmpref+0x2e0/0x55c
   binder_deferred_func+0xc24/0x1120
   process_one_work+0x520/0xba4
  [...]

  Allocated by task 448:
   __kmalloc_noprof+0x178/0x3c0
   bitmap_zalloc+0x24/0x30
   binder_open+0x14c/0xc10
  [...]

  Freed by task 449:
   kfree+0x184/0x31c
   binder_inc_ref_for_node+0xb44/0xe44
   binder_transaction+0x29b4/0x7fbc
   binder_thread_write+0x1708/0x442c
   binder_ioctl+0x1b50/0x2900
  [...]
  ==================================================================

Fix this issue by marking proc->map NULL in dbitmap_free().

Cc: stable@vger.kernel.org
Fixes: 15d9da3f81 ("binder: use bitmap for faster descriptor lookup")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Tiffany Yang <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250915221248.3470154-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-09-18 17:20:00 +02:00
Carlos Llamas 8a61a53b07 binder: add tracepoint for netlink reports
Add a tracepoint to capture the same details that are being sent through
the generic netlink interface during transaction failures. This provides
a useful debugging tool to observe the events independently from the
netlink listeners.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-6-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19 12:53:02 +02:00
Li Li f37b55ded8 binder: add transaction_report feature entry
Add "transaction_report" to the binderfs feature list, to help userspace
determine if the "BINDER_CMD_REPORT" generic netlink api is supported by
the binder driver.

Signed-off-by: Li Li <dualli@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-5-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19 12:53:01 +02:00
Li Li 63740349eb binder: introduce transaction reports via netlink
Introduce a generic netlink multicast event to report binder transaction
failures to userspace. This allows subscribers to monitor these events
and take appropriate actions, such as stopping a misbehaving application
that is spamming a service with huge amount of transactions.

The multicast event contains full details of the failed transactions,
including the sender/target PIDs, payload size and specific error code.
This interface is defined using a YAML spec, from which the UAPI and
kernel headers and source are auto-generated.

Signed-off-by: Li Li <dualli@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-4-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19 12:53:01 +02:00
Carlos Llamas 5cd0645b43 binder: add t->is_async and t->is_reply
Replace the t->need_reply flag with the more descriptive t->is_async and
and t->is_reply flags. The 'need_reply' flag was only used for debugging
purposes and the new flags can be used to distinguish between the type
of transactions too: sync, async and reply.

For now, only update the logging in print_binder_transaction_ilocked().
However, the new flags can be used in the future to replace the current
patterns and improve readability. e.g.:

  - if (!reply && !(tr->flags & TF_ONE_WAY))
  + if (t->is_async)

This patch is in preparation for binder's generic netlink implementation
and no functional changes are intended.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-3-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19 12:53:01 +02:00
Carlos Llamas 4afc5bf0a1 binder: pre-allocate binder_transaction
Move the allocation of 'struct binder_transaction' to the beginning of
the binder_transaction() function, along with the initialization of all
the members that are known at that time. This minor refactoring helps to
consolidate the usage of transaction information at later points.

This patch is in preparation for binder's generic netlink implementation
and no functional changes are intended.

Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250727182932.2499194-2-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-19 12:53:01 +02:00
Carlos Llamas 4d2604833e binder: remove MODULE_LICENSE()
The MODULE_LICENSE() macro is intended for drivers that can be built as
loadable modules. The binder driver is always built-in, using this macro
here is unnecessary and potentially confusing. Remove it.

Cc: Salvatore Bonaccorso <carnil@debian.org>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250817135034.3692902-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-08-18 11:49:19 +02:00
Linus Torvalds 0d5ec7919f Char / Misc / IIO / other driver updates for 6.17-rc1
Here is the big set of char/misc/iio and other smaller driver subsystems
 for 6.17-rc1.  It's a big set this time around, with the huge majority
 being in the iio subsystem with new drivers and dts files being added
 there.
 
 Highlights include:
   - IIO driver updates, additions, and changes making more code const
     and cleaning up some init logic
   - bus_type constant conversion changes
   - misc device test functions added
   - rust miscdevice minor fixup
   - unused function removals for some drivers
   - mei driver updates
   - mhi driver updates
   - interconnect driver updates
   - Android binder updates and test infrastructure added
   - small cdx driver updates
   - small comedi fixes
   - small nvmem driver updates
   - small pps driver updates
   - some acrn virt driver fixes for printk messages
   - other small driver updates
 
 All of these have been in linux-next with no reported issues.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaIeYDg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+yk3dwCdF6xoEVZaDkLM5IF3XKWWgdYxjNsAoKUy2kUx
 YtmVh4S0tMmugfeRGac7
 =3Wxi
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc / IIO / other driver updates from Greg KH:
 "Here is the big set of char/misc/iio and other smaller driver
  subsystems for 6.17-rc1. It's a big set this time around, with the
  huge majority being in the iio subsystem with new drivers and dts
  files being added there.

  Highlights include:
   - IIO driver updates, additions, and changes making more code const
     and cleaning up some init logic
   - bus_type constant conversion changes
   - misc device test functions added
   - rust miscdevice minor fixup
   - unused function removals for some drivers
   - mei driver updates
   - mhi driver updates
   - interconnect driver updates
   - Android binder updates and test infrastructure added
   - small cdx driver updates
   - small comedi fixes
   - small nvmem driver updates
   - small pps driver updates
   - some acrn virt driver fixes for printk messages
   - other small driver updates

  All of these have been in linux-next with no reported issues"

* tag 'char-misc-6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (292 commits)
  binder: Use seq_buf in binder_alloc kunit tests
  binder: Add copyright notice to new kunit files
  misc: ti_fpc202: Switch to of_fwnode_handle()
  bus: moxtet: Use dev_fwnode()
  pc104: move PC104 option to drivers/Kconfig
  drivers: virt: acrn: Don't use %pK through printk
  comedi: fix race between polling and detaching
  interconnect: qcom: Add Milos interconnect provider driver
  dt-bindings: interconnect: document the RPMh Network-On-Chip Interconnect in Qualcomm Milos SoC
  mei: more prints with client prefix
  mei: bus: use cldev in prints
  bus: mhi: host: pci_generic: Add Telit FN990B40 modem support
  bus: mhi: host: Detect events pointing to unexpected TREs
  bus: mhi: host: pci_generic: Add Foxconn T99W696 modem
  bus: mhi: host: Use str_true_false() helper
  bus: mhi: host: pci_generic: Add support for EM929x and set MRU to 32768 for better performance.
  bus: mhi: host: Fix endianness of BHI vector table
  bus: mhi: host: pci_generic: Disable runtime PM for QDU100
  bus: mhi: host: pci_generic: Fix the modem name of Foxconn T99W640
  dt-bindings: interconnect: qcom,msm8998-bwmon: Allow 'nonposted-mmio'
  ...
2025-07-29 09:52:01 -07:00
Linus Torvalds 2d9c1336ed VFS-related cleanups in various places (mostly of the "that really can't
happen" or "there's a better way to do it" variety)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCaIRK0QAKCRBZ7Krx/gZQ
 66/LAPoCvj5nAZH41F1VfyinA6V96kKsAazjrG7ttpWenu+6GAD/e9YQIAtYro0Z
 6f6EWTgrrEZqpOgc9kfHJq60m/TnSg8=
 =Ojq4
 -----END PGP SIGNATURE-----

Merge tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs

Pull misc VFS updates from Al Viro:
 "VFS-related cleanups in various places (mostly of the "that really
  can't happen" or "there's a better way to do it" variety)"

* tag 'pull-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  gpib: use file_inode()
  binder_ioctl_write_read(): simplify control flow a bit
  secretmem: move setting O_LARGEFILE and bumping users' count to the place where we create the file
  apparmor: file never has NULL f_path.mnt
  landlock: opened file never has a negative dentry
2025-07-28 10:32:20 -07:00
Tiffany Yang fa3f79e82d binder: Use seq_buf in binder_alloc kunit tests
Replace instances of snprintf with seq_buf functions, as suggested by
Kees [1].

[1] https://lore.kernel.org/all/202507160743.15E8044@keescook/

Fixes: d1934ed980 ("binder: encapsulate individual alloc test cases")
Suggested-by: Kees Cook <kees@kernel.org>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Tiffany Yang <ynaffit@google.com>
Reviewed-by: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/r/20250722234508.232228-2-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24 11:42:43 +02:00
Tiffany Yang 8a8d47e86c binder: Add copyright notice to new kunit files
Clean up for the binder_alloc kunit test series. Add a copyright notice
to new files, as suggested by Carlos [1].

[1] https://lore.kernel.org/all/CAFuZdDLD=3CBOLSWw3VxCf7Nkf884SSNmt1wresQgxgBwED=eQ@mail.gmail.com/

Fixes: 5e024582f4 ("binder: Scaffolding for binder_alloc KUnit tests")
Suggested-by: Carlos Llamas <cmllamas@google.com>
Cc: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Tiffany Yang <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250722234508.232228-1-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-24 11:42:43 +02:00
Tiffany Yang d1934ed980 binder: encapsulate individual alloc test cases
Each case tested by the binder allocator test is defined by 3 parameters:
the end alignment type of each requested buffer allocation, whether those
buffers share the front or back pages of the allotted address space, and
the order in which those buffers should be released. The alignment type
represents how a binder buffer may be laid out within or across page
boundaries and relative to other buffers, and it's used along with
whether the buffers cover part (sharing the front pages) of or all
(sharing the back pages) of the vma to calculate the sizes passed into
each test.

binder_alloc_test_alloc recursively generates each possible arrangement
of alignment types and then tests that the binder_alloc code tracks pages
correctly when those buffers are allocated and then freed in every
possible order at both ends of the address space. While they provide
comprehensive coverage, they are poor candidates to be represented as
KUnit test cases, which must be statically enumerated. For 5 buffers and
5 end alignment types, the test case array would have 750,000 entries.
This change structures the recursive calls into meaningful test cases so
that failures are easier to interpret.

Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-7-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16 14:11:59 +02:00
Tiffany Yang f6544dcdd0 binder: Convert binder_alloc selftests to KUnit
Convert the existing binder_alloc_selftest tests into KUnit tests. These
tests allocate and free an exhaustive combination of buffers with
various sizes and alignments. This change allows them to be run without
blocking or otherwise interfering with other processes in binder.

This test is refactored into more meaningful cases in the subsequent
patch.

Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-6-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16 14:11:59 +02:00
Tiffany Yang 5e024582f4 binder: Scaffolding for binder_alloc KUnit tests
Add setup and teardown for testing binder allocator code with KUnit.
Include minimal test cases to verify that tests are initialized
correctly.

Tested-by: Rae Moar <rmoar@google.com>
Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-5-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16 14:11:58 +02:00
Tiffany Yang 4328a52642 binder: Store lru freelist in binder_alloc
Store a pointer to the free pages list that the binder allocator should
use for a process inside of struct binder_alloc. This change allows
binder allocator code to be tested and debugged deterministically while
a system is using binder; i.e., without interfering with other binder
processes and independently of the shrinker. This is necessary to
convert the current binder_alloc_selftest into a kunit test that does
not rely on hijacking an existing binder_proc to run.

A binder process's binder_alloc->freelist should not be changed after
it is initialized. A sole exception is the process that runs the
existing binder_alloc selftest. Its freelist can be temporarily replaced
for the duration of the test because it runs as a single thread before
any pages can be added to the global binder freelist, and the test frees
every page it allocates before dropping the binder_selftest_lock. This
exception allows the existing selftest to be used to check for
regressions, but it will be dropped when the binder_alloc tests are
converted to kunit in a subsequent patch in this series.

Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-3-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16 14:11:58 +02:00
Tiffany Yang bea3e7bfa2 binder: Fix selftest page indexing
The binder allocator selftest was only checking the last page of buffers
that ended on a page boundary. Correct the page indexing to account for
buffers that are not page-aligned.

Signed-off-by: Tiffany Yang <ynaffit@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250714185321.2417234-2-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16 14:11:58 +02:00
Dmitry Antipov 01afddcac6 binder: use guards for plain mutex- and spinlock-protected sections
Use 'guard(mutex)' and 'guard(spinlock)' for plain (i.e. non-scoped)
mutex- and spinlock-protected sections, respectively, thus making
locking a bit simpler. Briefly tested with 'stress-ng --binderfs'.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250626073054.7706-2-dmantipov@yandex.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16 14:11:20 +02:00
Dmitry Antipov 1da2dca2fb binder: use kstrdup() in binderfs_binder_device_create()
In 'binderfs_binder_device_create()', use 'kstrdup()' to copy the
newly created device's name, thus making the former a bit simpler.

Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: "Tiffany Y. Yang" <ynaffit@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250626073054.7706-1-dmantipov@yandex.ru
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-16 14:11:20 +02:00
Al Viro 1664a91025 kill binderfs_remove_file()
don't try to open-code simple_recursive_removal(), especially when
you miss things like d_invalidate()...

Reviewed-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-07-02 22:36:52 -04:00
Steven Rostedt 421d3a860d binder: Remove unused binder lock events
Trace events can take up to 5K each when they are defined, regardless if
they are used or not. The binder lock events: binder_lock, binder_locked
and binder_unlock are no longer used.

Remove them.

Fixes: a60b890f60 ("binder: remove global binder lock")
Signed-off-by: "Steven Rostedt (Google)" <rostedt@goodmis.org>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250612093408.3b7320fa@batman.local.home
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-24 17:23:59 +01:00
Carlos Llamas 899385b04b binder: fix reversed pid/tid in log
The "pid:tid" format is used consistently throughout the driver's logs
with the exception of this one place where the arguments are reversed.
Let's fix that. Also, collapse a multi-line comment into a single line.

Cc: Steven Moreland <smoreland@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250605141930.1069438-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-24 17:23:55 +01:00
Al Viro 5a6acd563a binder_ioctl_write_read(): simplify control flow a bit
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2025-06-17 18:04:54 -04:00
Linus Torvalds c26f4fbd58 Char/Misc/IIO pull request for 6.16-rc1
Here is the big char/misc/iio and other small driver subsystem pull
 request for 6.16-rc1.
 
 Overall, a lot of individual changes, but nothing major, just the normal
 constant forward progress of new device support and cleanups to existing
 subsystems.  Highlights in here are:
   - Large IIO driver updates and additions and device tree changes
   - Android binder bugfixes and logfile fixes
   - mhi driver updates
   - comedi driver updates
   - counter driver updates and additions
   - coresight driver updates and additions
   - echo driver removal as there are no in-kernel users of it
   - nvmem driver updates
   - spmi driver updates
   - new amd-sbi driver "subsystem" and drivers added
   - rust miscdriver binding documentation fix
   - other small driver fixes and updates (uio, w1, acrn, hpet, xillybus,
     cardreader drivers, fastrpc and others.)
 
 All of these have been in linux-next for quite a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCaEKg5Q8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ykUyACgmAzrzKMoQUwwhQ6ed2l7tHdrlOcAoIORI1/x
 pNqQdrE1EbmAAyl47IN4
 =ts6J
 -----END PGP SIGNATURE-----

Merge tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc

Pull char / misc / iio driver updates from Greg KH:
 "Here is the big char/misc/iio and other small driver subsystem pull
  request for 6.16-rc1.

  Overall, a lot of individual changes, but nothing major, just the
  normal constant forward progress of new device support and cleanups to
  existing subsystems. Highlights in here are:

   - Large IIO driver updates and additions and device tree changes

   - Android binder bugfixes and logfile fixes

   - mhi driver updates

   - comedi driver updates

   - counter driver updates and additions

   - coresight driver updates and additions

   - echo driver removal as there are no in-kernel users of it

   - nvmem driver updates

   - spmi driver updates

   - new amd-sbi driver "subsystem" and drivers added

   - rust miscdriver binding documentation fix

   - other small driver fixes and updates (uio, w1, acrn, hpet,
     xillybus, cardreader drivers, fastrpc and others)

  All of these have been in linux-next for quite a while with no
  reported problems"

* tag 'char-misc-6.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (390 commits)
  binder: fix yet another UAF in binder_devices
  counter: microchip-tcb-capture: Add watch validation support
  dt-bindings: iio: adc: Add ROHM BD79100G
  iio: adc: add support for Nuvoton NCT7201
  dt-bindings: iio: adc: add NCT7201 ADCs
  iio: chemical: Add driver for SEN0322
  dt-bindings: trivial-devices: Document SEN0322
  iio: adc: ad7768-1: reorganize driver headers
  iio: bmp280: zero-init buffer
  iio: ssp_sensors: optimalize -> optimize
  HID: sensor-hub: Fix typo and improve documentation
  iio: admv1013: replace redundant ternary operator with just len
  iio: chemical: mhz19b: Fix error code in probe()
  iio: adc: at91-sama5d2: use IIO_DECLARE_BUFFER_WITH_TS
  iio: accel: sca3300: use IIO_DECLARE_BUFFER_WITH_TS
  iio: adc: ad7380: use IIO_DECLARE_DMA_BUFFER_WITH_TS
  iio: adc: ad4695: rename AD4695_MAX_VIN_CHANNELS
  iio: adc: ad4695: use IIO_DECLARE_DMA_BUFFER_WITH_TS
  iio: introduce IIO_DECLARE_BUFFER_WITH_TS macros
  iio: make IIO_DMA_MINALIGN minimum of 8 bytes
  ...
2025-06-06 11:50:47 -07:00
Linus Torvalds 6d5b940e1e vfs-6.16-rc1.async.dir
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCaDBN6wAKCRCRxhvAZXjc
 ok32AQD9DTiSCAoVg+7s+gSBuLTi8drPTN++mCaxdTqRh5WpRAD9GVyrGQT0s6LH
 eo9bm8d1TAYjilEWM0c0K0TxyQ7KcAA=
 =IW7H
 -----END PGP SIGNATURE-----

Merge tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs

Pull vfs directory lookup updates from Christian Brauner:
 "This contains cleanups for the lookup_one*() family of helpers.

  We expose a set of functions with names containing "lookup_one_len"
  and others without the "_len". This difference has nothing to do with
  "len". It's rater a historical accident that can be confusing.

  The functions without "_len" take a "mnt_idmap" pointer. This is found
  in the "vfsmount" and that is an important question when choosing
  which to use: do you have a vfsmount, or are you "inside" the
  filesystem. A related question is "is permission checking relevant
  here?".

  nfsd and cachefiles *do* have a vfsmount but *don't* use the non-_len
  functions. They pass nop_mnt_idmap and refuse to work on filesystems
  which have any other idmap.

  This work changes nfsd and cachefile to use the lookup_one family of
  functions and to explictily pass &nop_mnt_idmap which is consistent
  with all other vfs interfaces used where &nop_mnt_idmap is explicitly
  passed.

  The remaining uses of the "_one" functions do not require permission
  checks so these are renamed to be "_noperm" and the permission
  checking is removed.

  This series also changes these lookup function to take a qstr instead
  of separate name and len. In many cases this simplifies the call"

* tag 'vfs-6.16-rc1.async.dir' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
  VFS: change lookup_one_common and lookup_noperm_common to take a qstr
  Use try_lookup_noperm() instead of d_hash_and_lookup() outside of VFS
  VFS: rename lookup_one_len family to lookup_noperm and remove permission check
  cachefiles: Use lookup_one() rather than lookup_one_len()
  nfsd: Use lookup_one() rather than lookup_one_len()
  VFS: improve interface for lookup_one functions
2025-05-26 08:02:43 -07:00
Carlos Llamas 9857af0fcf binder: fix yet another UAF in binder_devices
Commit e77aff5528 ("binderfs: fix use-after-free in binder_devices")
addressed a use-after-free where devices could be released without first
being removed from the binder_devices list. However, there is a similar
path in binder_free_proc() that was missed:

  ==================================================================
  BUG: KASAN: slab-use-after-free in binder_remove_device+0xd4/0x100
  Write of size 8 at addr ffff0000c773b900 by task umount/467
  CPU: 12 UID: 0 PID: 467 Comm: umount Not tainted 6.15.0-rc7-00138-g57483a362741 #9 PREEMPT
  Hardware name: linux,dummy-virt (DT)
  Call trace:
   binder_remove_device+0xd4/0x100
   binderfs_evict_inode+0x230/0x2f0
   evict+0x25c/0x5dc
   iput+0x304/0x480
   dentry_unlink_inode+0x208/0x46c
   __dentry_kill+0x154/0x530
   [...]

  Allocated by task 463:
   __kmalloc_cache_noprof+0x13c/0x324
   binderfs_binder_device_create.isra.0+0x138/0xa60
   binder_ctl_ioctl+0x1ac/0x230
  [...]

  Freed by task 215:
   kfree+0x184/0x31c
   binder_proc_dec_tmpref+0x33c/0x4ac
   binder_deferred_func+0xc10/0x1108
   process_one_work+0x520/0xba4
  [...]
  ==================================================================

Call binder_remove_device() within binder_free_proc() to ensure the
device is removed from the binder_devices list before being kfreed.

Cc: stable@vger.kernel.org
Fixes: 12d909cac1 ("binderfs: add new binder devices to binder_devices")
Reported-by: syzbot+4af454407ec393de51d6@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=4af454407ec393de51d6
Tested-by: syzbot+4af454407ec393de51d6@syzkaller.appspotmail.com
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://lore.kernel.org/r/20250524220758.915028-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-25 11:25:07 +02:00
Tiffany Y. Yang 57483a3627 binder: Create safe versions of binder log files
Binder defines several seq_files that can be accessed via debugfs or
binderfs. Some of these files (e.g., 'state' and 'transactions')
contain more granular information about binder's internal state that
is helpful for debugging, but they also leak userspace address data
through user-defined 'cookie' or 'ptr' values. Consequently, access
to these files must be heavily restricted.

Add two new files, 'state_hashed' and 'transactions_hashed', that
reproduce the information in the original files but use the kernel's
raw pointer obfuscation to hash any potential user addresses. This
approach allows systems to grant broader access to the new files
without having to change the security policy around the existing ones.

In practice, userspace populates these fields with user addresses, but
within the driver, these values only serve as unique identifiers for
their associated binder objects. Consequently, binder logs can
obfuscate these values and still retain meaning. While this strategy
prevents leaking information about the userspace memory layout in the
existing log files, it also decouples log messages about binder
objects from their user-defined identifiers.

Acked-by: Carlos Llamas <cmllamas@google.com>
Tested-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com>
Link: https://lore.kernel.org/r/20250510013435.1520671-7-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21 14:39:16 +02:00
Tiffany Y. Yang 91f1bbaa78 binder: Refactor binder_node print synchronization
The binder driver outputs information about each dead binder node by
iterating over the dead nodes list, and it prints the state of each live
node in the system by traversing each binder_proc's proc->nodes tree.
Both cases require similar logic to maintain the global lock ordering
while accessing each node.

Create a helper function to synchronize around printing binder nodes in
a list. Opportunistically make minor cosmetic changes to binder print
functions.

Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: "Tiffany Y. Yang" <ynaffit@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250510013435.1520671-5-ynaffit@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21 14:39:16 +02:00
Dmitry Antipov 8c0a559825 binder: fix use-after-free in binderfs_evict_inode()
Running 'stress-ng --binderfs 16 --timeout 300' under KASAN-enabled
kernel, I've noticed the following:

BUG: KASAN: slab-use-after-free in binderfs_evict_inode+0x1de/0x2d0
Write of size 8 at addr ffff88807379bc08 by task stress-ng-binde/1699

CPU: 0 UID: 0 PID: 1699 Comm: stress-ng-binde Not tainted 6.14.0-rc7-g586de92313fc-dirty #13
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x1c2/0x2a0
 ? __pfx_dump_stack_lvl+0x10/0x10
 ? __pfx__printk+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? __virt_addr_valid+0x18c/0x540
 ? __virt_addr_valid+0x469/0x540
 print_report+0x155/0x840
 ? __virt_addr_valid+0x18c/0x540
 ? __virt_addr_valid+0x469/0x540
 ? __phys_addr+0xba/0x170
 ? binderfs_evict_inode+0x1de/0x2d0
 kasan_report+0x147/0x180
 ? binderfs_evict_inode+0x1de/0x2d0
 binderfs_evict_inode+0x1de/0x2d0
 ? __pfx_binderfs_evict_inode+0x10/0x10
 evict+0x524/0x9f0
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_evict+0x10/0x10
 ? do_raw_spin_unlock+0x4d/0x210
 ? _raw_spin_unlock+0x28/0x50
 ? iput+0x697/0x9b0
 __dentry_kill+0x209/0x660
 ? shrink_kill+0x8d/0x2c0
 shrink_kill+0xa9/0x2c0
 shrink_dentry_list+0x2e0/0x5e0
 shrink_dcache_parent+0xa2/0x2c0
 ? __pfx_shrink_dcache_parent+0x10/0x10
 ? __pfx_lock_release+0x10/0x10
 ? __pfx_do_raw_spin_lock+0x10/0x10
 do_one_tree+0x23/0xe0
 shrink_dcache_for_umount+0xa0/0x170
 generic_shutdown_super+0x67/0x390
 kill_litter_super+0x76/0xb0
 binderfs_kill_super+0x44/0x90
 deactivate_locked_super+0xb9/0x130
 cleanup_mnt+0x422/0x4c0
 ? lockdep_hardirqs_on+0x9d/0x150
 task_work_run+0x1d2/0x260
 ? __pfx_task_work_run+0x10/0x10
 resume_user_mode_work+0x52/0x60
 syscall_exit_to_user_mode+0x9a/0x120
 do_syscall_64+0x103/0x210
 ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0xcac57b
Code: c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 31 f6 e9 05 00 00 00 0f 1f 44 00 00 f3 0f 1e fa b8
RSP: 002b:00007ffecf4226a8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 00007ffecf422720 RCX: 0000000000cac57b
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00007ffecf422850
RBP: 00007ffecf422850 R08: 0000000028d06ab1 R09: 7fffffffffffffff
R10: 3fffffffffffffff R11: 0000000000000246 R12: 00007ffecf422718
R13: 00007ffecf422710 R14: 00007f478f87b658 R15: 00007ffecf422830
 </TASK>

Allocated by task 1705:
 kasan_save_track+0x3e/0x80
 __kasan_kmalloc+0x8f/0xa0
 __kmalloc_cache_noprof+0x213/0x3e0
 binderfs_binder_device_create+0x183/0xa80
 binder_ctl_ioctl+0x138/0x190
 __x64_sys_ioctl+0x120/0x1b0
 do_syscall_64+0xf6/0x210
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

Freed by task 1705:
 kasan_save_track+0x3e/0x80
 kasan_save_free_info+0x46/0x50
 __kasan_slab_free+0x62/0x70
 kfree+0x194/0x440
 evict+0x524/0x9f0
 do_unlinkat+0x390/0x5b0
 __x64_sys_unlink+0x47/0x50
 do_syscall_64+0xf6/0x210
 entry_SYSCALL_64_after_hwframe+0x77/0x7f

This 'stress-ng' workload causes the concurrent deletions from
'binder_devices' and so requires full-featured synchronization
to prevent list corruption.

I've found this issue independently but pretty sure that syzbot did
the same, so Reported-by: and Closes: should be applicable here as well.

Cc: stable@vger.kernel.org
Reported-by: syzbot+353d7b75658a95aa955a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=353d7b75658a95aa955a
Fixes: e77aff5528 ("binderfs: fix use-after-free in binder_devices")
Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
Acked-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250517170957.1317876-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-21 14:38:49 +02:00
Greg Kroah-Hartman 4f822ad5ee Linux 6.15-rc4
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCgA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmgOrWseHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGFyIH/AhXcuA8y8rk43mo
 t+0GO7JR4dnr4DIl74GgDjCXlXiKCT7EXMfD/ABdofTxV4Pbyv+pUODlg1E6eO9U
 C1WWM5PPNBGDDEVSQ3Yu756nr0UoiFhvW0R6pVdou5cezCWAtIF9LTN8DEUgis0u
 EUJD9+/cHAMzfkZwabjm/HNsa1SXv2X47MzYv/PdHKr0htEPcNHF4gqBrBRdACGy
 FJtaCKhuPf6TcDNXOFi5IEWMXrugReRQmOvrXqVYGa7rfUFkZgsAzRY6n/rUN5Z9
 FAgle4Vlv9ohVYj9bXX8b6wWgqiKRpoN+t0PpRd6G6ict1AFBobNGo8LH3tYIKqZ
 b/dCGNg=
 =xDGd
 -----END PGP SIGNATURE-----

Merge 6.15-rc4 into char-misc-next

We need the char-misc fixes in here as well.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-28 09:45:00 +02:00