linux

Commit Graph

Author	SHA1	Message	Date
David Wei	804bf334d0	net: Proxy netdev_queue_get_dma_dev for leased queues Extend netdev_queue_get_dma_dev to return the physical device of the real rxq for DMA in case the queue was leased. This allows memory providers like io_uring zero-copy or devmem to bind to the physically leased rxq via virtual devices such as netkit. Signed-off-by: David Wei <dw@davidwei.uk> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-7-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
David Wei	0caa9a8dde	net: Proxy net_mp_{open,close}_rxq for leased queues When a process in a container wants to setup a memory provider, it will use the virtual netdev and a leased rxq, and call net_mp_{open,close}_rxq to try and restart the queue. At this point, proxy the queue restart on the real rxq in the physical netdev. For memory providers (io_uring zero-copy rx and devmem), it causes the real rxq in the physical netdev to be filled from a memory provider that has DMA mapped memory from a process within a container. Signed-off-by: David Wei <dw@davidwei.uk> Co-developed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-6-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Daniel Borkmann	ff8889ff91	net, ethtool: Disallow leased real rxqs to be resized Similar to AF_XDP, do not allow queues in a physical netdev to be resized by ethtool -L when they are leased. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-5-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Daniel Borkmann	9e2103f361	net: Add lease info to queue-get response Populate nested lease info to the queue-get response that returns the ifindex, queue id with type and optionally netns id if the device resides in a different netns. Example with ynl client: # ip a [...] 4: enp10s0f0np0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 xdp/id:24 qdisc mq state UP group default qlen 1000 link/ether e8:eb:d3:a3:43:f6 brd ff:ff:ff:ff:ff:ff inet 10.0.0.2/24 scope global enp10s0f0np0 valid_lft forever preferred_lft forever inet6 fe80::eaeb:d3ff:fea3:43f6/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ethtool -i enp10s0f0np0 driver: mlx5_core [...] # ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-get \ --json '{"ifindex": 4, "id": 15, "type": "rx"}' {'id': 15, 'ifindex': 4, 'lease': {'ifindex': 8, 'netns-id': 0, 'queue': {'id': 1, 'type': 'rx'}}, 'napi-id': 8227, 'type': 'rx', 'xsk': {}} # ip netns list foo (id: 0) # ip netns exec foo ip a [...] 8: nk@NONE: <BROADCAST,MULTICAST,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff inet6 fe80::200:ff:fe00:0/64 scope link proto kernel_ll valid_lft forever preferred_lft forever [...] # ip netns exec foo ethtool -i nk driver: netkit [...] # ip netns exec foo ls /sys/class/net/nk/queues/ rx-0 rx-1 tx-0 # ip netns exec foo ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-get \ --json '{"ifindex": 8, "id": 1, "type": "rx"}' {'id': 1, 'ifindex': 8, 'type': 'rx'} Note that the caller of netdev_nl_queue_fill_one() holds the netdevice lock. For the queue-get we do not lock both devices. When queues get {un,}leased, both devices are locked, thus if __netif_get_rx_queue_peer() returns true, the peer pointer points to a valid device. The netns-id is fetched via peernet2id_alloc() similarly as done in OVS. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260115082603.219152-4-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Daniel Borkmann	31127dedde	net: Implement netdev_nl_queue_create_doit Implement netdev_nl_queue_create_doit which creates a new rx queue in a virtual netdev and then leases it to a rx queue in a physical netdev. Example with ynl client: # ./pyynl/cli.py \ --spec ~/netlink/specs/netdev.yaml \ --do queue-create \ --json '{"ifindex": 8, "type": "rx", "lease": {"ifindex": 4, "queue": {"type": "rx", "id": 15}}}' {'id': 1} Note that the netdevice locking order is always from the virtual to the physical device. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-3-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Daniel Borkmann	a5546e18f7	net: Add queue-create operation Add a ynl netdev family operation called queue-create that creates a new queue on a netdevice: name: queue-create attribute-set: queue flags: [admin-perm] do: request: attributes: - ifindex - type - lease reply: &queue-create-op attributes: - id This is a generic operation such that it can be extended for various use cases in future. Right now it is mandatory to specify ifindex, the queue type which is enforced to rx and a lease. The newly created queue id is returned to the caller. A queue from a virtual device can have a lease which refers to another queue from a physical device. This is useful for memory providers and AF_XDP operations which take an ifindex and queue id to allow applications to bind against virtual devices in containers. The lease couples both queues together and allows to proxy the operations from a virtual device in a container to the physical device. In future, the nested lease attribute can be lifted and made optional for other use-cases such as dynamic queue creation for physical netdevs. The lack of lease and the specification of the physical device as an ifindex will imply that we need a real queue to be allocated. Similarly, the queue type enforcement to rx can then be lifted as well to support tx. An early implementation had only driver-specific integration [0], but in order for other virtual devices to reuse, it makes sense to have this as a generic API in core net. For leasing queues, the virtual netdev must have real_num_rx_queue less than num_rx_queues at the time of calling queue-create. The queue-type must be rx as only rx queues are supported for leasing for now. We also enforce that the queue-create ifindex must point to a virtual device, and that the nested lease attribute's ifindex must point to a physical device. The nested lease attribute set contains a netns-id attribute which is currently only intended for dumping as part of the queue-get operation. Also, it is modeled as an s32 type similarly as done elsewhere in the stack. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Co-developed-by: David Wei <dw@davidwei.uk> Signed-off-by: David Wei <dw@davidwei.uk> Link: https://bpfconf.ebpf.io/bpfconf2025/bpfconf2025_material/lsfmmbpf_2025_netkit_borkmann.pdf [0] Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115082603.219152-2-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 11:58:49 +01:00
Paolo Abeni	4515ec4ad5	Merge branch 'net-hinic3-pf-initialization' Fan Gong says: ==================== net: hinic3: PF initialization This is [1/3] part of hinic3 Ethernet driver second submission. With this patch hinic3 becomes a complete Ethernet driver with pf and vf. The driver parts contained in this patch: Add support for PF framework based on the VF code. Add PF management interfaces to communicate with HW. Add 8 netdev ops to configure NIC features. Support mac filter to unicast and multicast. Add HW event handler to manage port and link status. V01: https://lore.kernel.org/netdev/cover.1760502478.git.zhuyikai1@h-partners.com/ V02: https://lore.kernel.org/netdev/cover.1760685059.git.zhuyikai1@h-partners.com/ V03: https://lore.kernel.org/netdev/cover.1761362580.git.zhuyikai1@h-partners.com/ V04: https://lore.kernel.org/netdev/cover.1761711549.git.zhuyikai1@h-partners.com/ V05: https://lore.kernel.org/netdev/cover.1762414088.git.zhuyikai1@h-partners.com/ V06: https://lore.kernel.org/netdev/cover.1762581665.git.zhuyikai1@h-partners.com/ V07: https://lore.kernel.org/netdev/cover.1763555878.git.zhuyikai1@h-partners.com/ V08: https://lore.kernel.org/netdev/cover.1767495881.git.zhuyikai1@h-partners.com/ V09: https://lore.kernel.org/netdev/cover.1767707500.git.zhuyikai1@h-partners.com/ V10: https://lore.kernel.org/netdev/cover.1767861236.git.zhuyikai1@h-partners.com/ ==================== Link: https://patch.msgid.link/cover.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:34 +01:00
Fan Gong	cb36f89b10	hinic3: Add HW event handler Add HINIC3_INIT_UP flags to trace netdev open status. Add port module event handler. Add link status event type(FAULT, PCIE link down, heart lost, mgmt watchdog). Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/53a2b928136998f740d597bbd45ca1740b95538f.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	aebd95b00a	hinic3: Add mac filter ops Add ops to support unicast and multicast to filter mac address in packets sending and receiving. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/9ea618ad5d6c404e222e9114e08e913a73177705.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	b35a6fd37a	hinic3: Add adaptive IRQ coalescing with DIM DIM offers a way to adjust the coalescing settings based on load. As hinic3 rx and tx share interrupts, we only need to base dim on rx stats. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/af96c20a836800a5972a09cdaf520029d976ad48.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	0f9e2d9574	hinic3: Add .ndo_vlan_rx_add/kill_vid and .ndo_validate_addr Implement following callback function: .ndo_vlan_rx_add_vid .ndo_vlan_rx_kill_vid .ndo_validate_addr Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/70be187afcaa7b38981d114c088ffdc2cba0b4f1.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	2467a04660	hinic3: Add .ndo_features_check As we cannot solve packets with multiple stacked vlan, so we use .ndo_features_check to check for these packets and return a smaller feature without offload features. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/3879b20b7ffa20106a3f8f56dbf2d5eb389f260a.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	721df7639c	hinic3: Add .ndo_set_features and .ndo_fix_features Implement following callback function: .ndo_set_features .ndo_fix_features The .ndo_set_features function includes five features: rx_csum, tso, lro, rx_cvlan and vlan_filter. Add these new features in netdev_feature_init. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/682734a08fde421413048bf70057dafe3cbe8497.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	f47872bed4	hinic3: Add .ndo_tx_timeout and .ndo_get_stats64 Implement following callback function: .ndo_tx_timeout .ndo_get_stats64 Use a work queue to trace tx_timeout callback and dump necessary debug information. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/ec34d2ff9b142e1e142e47700714533baf7e659c.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	a30cc9b277	hinic3: Add PF management interfaces Add management and communication pathways between PF and HW. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/447e72b1c5f255ea5d79ecf96c8dac58011e604d.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Fan Gong	53200a8605	hinic3: Add PF framework Add support for PF framework based on the VF code. Co-developed-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Zhu Yikai <zhuyikai1@h-partners.com> Signed-off-by: Fan Gong <gongfan1@huawei.com> Link: https://patch.msgid.link/4796d6076bbad0f096eb10261ab3c7d989c5f7b7.1768375903.git.zhuyikai1@h-partners.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2026-01-20 10:34:31 +01:00
Jakub Kicinski	c5e7b1d1cc	Merge branch 'net-mlx5e-save-per-channel-async-icosq-in-default' Tariq Toukan says: ==================== net/mlx5e: Save per-channel async ICOSQ in default This series by William reduces the default number of SQs in a channel from 3 down to 2, by not creating the async ICOSQ (asynchronous internal-communication-operations send-queue). This significantly improves the latency of channel configuration operations, like interface up (create channels), interface down (destroy channels), and channels reconfiguration (create new set, destroy old one). This reduces the per-channel memory usage, saves hardware resources, in addition to the improved latency. This significantly speeds up the setup/config stage on systems with high number of channels or many netdevs, in particular systems with hundreds or K's of SFs. The two remaining default SQs per channel after this series: 1 TXQ SQ (for traffic), and 1 ICOSQ (for internal communication operations with the device). Perf numbers: NIC: Connect-X7. Test: Latency of interface up + down operations. Measured 20% speedup. Saving ~0.36 sec for 248 channels (~1.45 msec per channel). ==================== Link: https://patch.msgid.link/1768376800-1607672-1-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:26:45 -08:00
William Tu	abed42f9cd	net/mlx5e: Conditionally create async ICOSQ The async ICOSQ is only required by TLS RX (for re-sync flow) and XSK TX. Create it only when these features are enabled instead of always allocating it. This reduces per-channel memory usage, saves hardware resources, improves latency, and decreases the default number of SQs (from 3 to 2) and CQs (from 4 to 3). It also speeds up channel open/close operations for a netdev when async ICOSQ is not needed. Currently when TLS RX is enabled, there is no channel reset triggered. As a result, async ICOSQ allocation is not triggered, causing a NULL pointer crash. One solution is to do channel reset every time when toggling TLS RX. However, it's not straightforward as the offload state matters only on connection creation, and can go on beyond the channels reset. Instead, introduce a new field 'ktls_rx_was_enabled': if TLS RX is enabled for the first time: reset channels, create async ICOSQ, set the field. From that point on, no need to reset channels for any TLS RX enable/disable. Async ICOSQ will always be needed. For XSK TX, async ICOSQ is used in wakeup control and is guaranteed to have async ICOSQ allocated. This improves the latency of interface up/down operations when it applies. Perf numbers: NIC: Connect-X7. Test: Latency of interface up + down operations. Measured 20% speedup. Saving ~0.36 sec for 248 channels (~1.45 msec per channel). Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-5-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:26:42 -08:00
William Tu	1b080bd748	net/mlx5e: Move async ICOSQ to dynamic allocation Dynamically allocate async ICOSQ. ICO (Internal Communication Operations) is for driver to communicate with the HW, and it's not used for traffic. Currently mlx5 driver has sync and async ICO send queues. The async ICOSQ means that it's not necessarily under NAPI context protection. The patch is in preparation for the later patch to detect its usage and enable it when necessary. Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-4-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:26:42 -08:00
William Tu	56aca3e0f7	net/mlx5e: Use regular ICOSQ for triggering NAPI Before the cited commit, ICOSQ is used to post NOP WQE to trigger hardware interrupt and start NAPI, but this mechanism suffers from a race condition: mlx5e_alloc_rx_mpwqe may post UMR WQEs to ICOSQ _before_ NOP WQE is posted. The cited commit fixes the issue by replacing ICOSQ with async ICOSQ, as a new way to post the NOP WQE to trigger the hardware interrupt and NAPI. The patch changes it back by replacing async ICOSQ with regular ICOSQ, for the purpose of saving memory in later patches, and solves the issue by adding a new SQ state, MLX5E_SQ_STATE_LOCK_NEEDED for syncing the start of NAPI. What it does: - Switch trigger path from async ICOSQ to regular ICOSQ to reduce need for async SQ. - Introduce MLX5E_SQ_STATE_LOCK_NEEDED and mlx5e_icosq_sync_lock(), unlock() to prevent the race where UMR WQEs could be posted before the NOP WQE used to trigger NAPI. - Use synchronize_net() once per trigger cycle to quiesce in-flight softirqs before serializing the NOP WQE and any UMR postings via the ICOSQ lock. - Wrap ICOSQ UMR posting in en_rx.c and xsk/rx.c with the new conditional lock. The conditional locking approach is critical for performance: always locking would impose unnecessary overhead. Synchronization is not needed between regular NAPI cycles once the channel is activated and running. The lock is only required to protect against the race during channel activation—specifically, when the very first NOP WQE is posted to trigger NAPI. After that initial trigger, normal NAPI polling handles subsequent work without contention. The MLX5E_SQ_STATE_LOCK_NEEDED flag ensures we pay the synchronization cost only when necessary. Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-3-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:26:42 -08:00
William Tu	ea945f4f39	net/mlx5e: Move async ICOSQ lock into ICOSQ struct Move the async_icosq spinlock from the mlx5e_channel structure into the mlx5e_icosq structure itself for better encapsulation and for later patch to also use it for other icosq use cases. Changes: - Add spinlock_t lock field to struct mlx5e_icosq - Remove async_icosq_lock field from struct mlx5e_channel - Initialize the new lock in mlx5e_open_icosq() - Update all lock usage in ktls_rx.c and en_main.c to use sq->lock instead of c->async_icosq_lock Signed-off-by: William Tu <witu@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1768376800-1607672-2-git-send-email-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:26:42 -08:00
Vimlesh Kumar	3b85d5f856	octeon_ep: reset firmware ready status Add support to reset firmware ready status when the driver is removed(either in unload or unbind) Signed-off-by: Sathesh Edara <sedara@marvell.com> Signed-off-by: Shinas Rasheed <srasheed@marvell.com> Signed-off-by: Vimlesh Kumar <vimleshk@marvell.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115092048.870237-1-vimleshk@marvell.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:11:07 -08:00
Jakub Kicinski	3d4375c2a9	Merge branch 'net-thunderbolt-various-improvements' Mika Westerberg says: ==================== net: thunderbolt: Various improvements This series improves the Thunderbolt networking driver so that it should work with the bonding driver. The discussion that started this patch series can be read below: https://lore.kernel.org/netdev/CAFJzfF9N4Hak23sc-zh0jMobbkjK7rg4odhic1DQ1cC+=MoQoA@mail.gmail.com/ v2: https://lore.kernel.org/20260109122606.3586895-1-mika.westerberg@linux.intel.com v1: https://lore.kernel.org/20251127131521.2580237-1-mika.westerberg@linux.intel.com ==================== Link: https://patch.msgid.link/20260115115646.328898-1-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:10:00 -08:00
Ian MacDonald	7a3d3279a5	net: thunderbolt: Allow reading link settings In order to use Thunderbolt networking as part of bonding device it needs to support ->get_link_ksettings() ethtool operation, so that the bonding driver can read the link speed and the related attributes. Add support for this to the driver. Signed-off-by: Ian MacDonald <ian@netstatz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://patch.msgid.link/20260115115646.328898-5-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:09:58 -08:00
Mika Westerberg	2e62e5565b	bonding: 3ad: Add support for SPEED_80000 Add support for ethtool SPEED_80000. This is needed to allow Thunderbolt/USB4 networking driver to be used with the bonding driver. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20260115115646.328898-4-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:09:58 -08:00
Mika Westerberg	a9927022c4	net: ethtool: Add support for 80Gbps speed USB4 v2 link used in peer-to-peer networking is symmetric 80Gbps so in order to support reading this link speed, add support for it to ethtool. Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260115115646.328898-3-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:09:58 -08:00
Mika Westerberg	8411d7286b	net: thunderbolt: Allow changing MAC address of the device The MAC address we use is based on a suggestion in the USB4 Inter-domain spec but it is not really used in the USB4NET protocol. It is more targeted for the upper layers of the network stack. There is no reason why it should not be changed by the userspace for example if needed for bonding. Reported-by: Ian MacDonald <ian@netstatz.com> Closes: https://lore.kernel.org/netdev/CAFJzfF9N4Hak23sc-zh0jMobbkjK7rg4odhic1DQ1cC+=MoQoA@mail.gmail.com/ Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Link: https://patch.msgid.link/20260115115646.328898-2-mika.westerberg@linux.intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:09:57 -08:00
Jakub Kicinski	456083e7f1	Merge branch 'dpll-support-mode-switching' Ivan Vecera says: ==================== dpll: support mode switching This series adds support for switching the working mode (automatic vs manual) of a DPLL device via netlink. Currently, the DPLL subsystem allows userspace to retrieve the current working mode but lacks the mechanism to configure it. Userspace is also unaware of which modes a specific device actually supports, as it currently assumes only the active mode is supported. The series addresses these limitations by: 1. Introducing .supported_modes_get() callback to allow drivers to report all modes capable of running on the device. 2. Introducing .mode_set() callback and updating the netlink policy to allow userspace to request a mode change. 3. Implementing these callbacks in the zl3073x driver, enabling dynamic switching between automatic and manual modes. ==================== Link: https://patch.msgid.link/20260114122726.120303-1-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:05:04 -08:00
Ivan Vecera	d6df0dea24	dpll: zl3073x: Implement device mode setting support Add support for .supported_modes_get() and .mode_set() callbacks to enable switching between manual and automatic modes via netlink. Implement .supported_modes_get() to report available modes based on the current hardware configuration: * manual mode is always supported * automatic mode is supported unless the dpll channel is configured in NCO (Numerically Controlled Oscillator) mode Implement .mode_set() to handle the specific logic required when transitioning between modes: 1) Transition to manual: * If a valid reference is currently active, switch the hardware to ref-lock mode (force lock to that reference). * If no reference is valid and the DPLL is unlocked, switch to freerun. * Otherwise, switch to Holdover. 2) Transition to automatic: * If the currently selected reference pin was previously marked as non-selectable (likely during a previous manual forcing operation), restore its priority and selectability in the hardware. * Switch the hardware to Automatic selection mode. Signed-off-by: Ivan Vecera <ivecera@redhat.com> Reviewed-by: Prathosh Satish <Prathosh.Satish@microchip.com> Link: https://patch.msgid.link/20260114122726.120303-4-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:04:57 -08:00
Ivan Vecera	e3f6c65192	dpll: add dpll_device op to set working mode Currently, userspace can retrieve the DPLL working mode but cannot configure it. This prevents changing the device operation, such as switching from manual to automatic mode and vice versa. Add a new callback .mode_set() to struct dpll_device_ops. Extend the netlink policy and device-set command handling to process the DPLL_A_MODE attribute. Update the netlink YAML specification to include the mode attribute in the device-set operation. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260114122726.120303-3-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:04:53 -08:00
Ivan Vecera	b1f99cc886	dpll: add dpll_device op to get supported modes Currently, the DPLL subsystem assumes that the only supported mode is the one currently active on the device. When dpll_msg_add_mode_supported() is called, it relies on ops->mode_get() and reports that single mode to userspace. This prevents users from discovering other modes the device might be capable of. Add a new callback .supported_modes_get() to struct dpll_device_ops. This allows drivers to populate a bitmap indicating all modes supported by the hardware. Update dpll_msg_add_mode_supported() to utilize this new callback: * if ops->supported_modes_get is defined, use it to retrieve the full bitmap of supported modes. * if not defined, fall back to the existing behavior: retrieve the current mode via ops->mode_get and set the corresponding bit in the bitmap. Finally, iterate over the bitmap and add a DPLL_A_MODE_SUPPORTED netlink attribute for every set bit, accurately reporting the device's capabilities to userspace. Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Link: https://patch.msgid.link/20260114122726.120303-2-ivecera@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 12:04:53 -08:00
Christophe Leroy (CS GROUP)	d321d505ed	selftests: net: csum: Fix printk format in recv_get_packet_csum_status() Following warning is encountered when building selftests on powerpc/32. CC csum csum.c: In function 'recv_get_packet_csum_status': csum.c:710:50: warning: format '%lu' expects argument of type 'long unsigned int', but argument 4 has type 'size_t' {aka 'unsigned int'} [-Wformat=] 710 \| error(1, 0, "cmsg: len=%lu expected=%lu", \| ~~^ \| \| \| long unsigned int \| %u 711 \| cm->cmsg_len, CMSG_LEN(sizeof(struct tpacket_auxdata))); \| ~~~~~~~~~~~~ \| \| \| size_t {aka unsigned int} csum.c:710:63: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=] 710 \| error(1, 0, "cmsg: len=%lu expected=%lu", \| ~~^ \| \| \| long unsigned int \| %u cm->cmsg_len has type __kernel_size_t and CMSG() macro has the type returned by sizeof() which is size_t. size_t is 'unsigned int' on some platforms and 'unsigned long' on other ones so use %zu instead of %lu. The code in question was introduced by commit `91a7de8560` ("selftests/net: add csum offload test"). Signed-off-by: Christophe Leroy (CS GROUP) <chleroy@kernel.org> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Link: https://patch.msgid.link/8b69b40826553c1dd500d9d25e45883744f3f348.1768556791.git.chleroy@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 10:09:47 -08:00
Jakub Kicinski	2f52f835b4	Merge branch 'dsa-mxl-gsw1xx-support-r-g-mii-slew-rate-configuration' Alexander Sverdlin says: ==================== dsa: mxl-gsw1xx: Support R(G)MII slew rate configuration Maxlinear GSW1xx switches offer slew rate configuration bits for R(G)MII interface. The default state of the configuration bits is "normal", while "slow" can be used to reduce the radiated emissions. Add the support for the latter option into the driver as well as the new DT bindings. ==================== Link: https://patch.msgid.link/20260114104509.618984-1-alexander.sverdlin@siemens.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 10:08:28 -08:00
Alexander Sverdlin	dbf24ab58f	net: dsa: mxl-gsw1xx: Support R(G)MII slew rate configuration Support newly introduced maxlinear,slew-rate-txc and maxlinear,slew-rate-txd device tree properties to configure R(G)MII interface pins' slew rate. It might be used to reduce the radiated emissions. Reviewed-by: Daniel Golle <daniel@makrotopia.org> Tested-by: Daniel Golle <daniel@makrotopia.org> Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://patch.msgid.link/20260114104509.618984-3-alexander.sverdlin@siemens.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 10:08:24 -08:00
Alexander Sverdlin	4cc265663d	dt-bindings: net: dsa: lantiq,gswip: add MaxLinear R(G)MII slew rate Add new maxlinear,slew-rate-txc and maxlinear,slew-rate-txd uint32 properties. The properties are only applicable for ports in R(G)MII mode and allow for slew rate reduction in comparison to "normal" default configuration with the purpose to reduce radiated emissions. Signed-off-by: Alexander Sverdlin <alexander.sverdlin@siemens.com> Reviewed-by: Rob Herring (Arm) <robh@kernel.org> Link: https://patch.msgid.link/20260114104509.618984-2-alexander.sverdlin@siemens.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 10:08:24 -08:00
Jakub Kicinski	8f762b2e88	Merge branch 'ipv6-more-data-race-annotations' Eric Dumazet says: ==================== ipv6: more data-race annotations Inspired by one unrelated syzbot report. This series adds missing (and boring) data-race annotations in IPv6. Only the first patch adds sysctl_ipv6_flowlabel group to speedup ip6_make_flowlabel() a bit. ==================== Link: https://patch.msgid.link/20260115094141.3124990-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:46 -08:00
Eric Dumazet	f062e8e251	ipv6: annotate data-races in net/ipv6/route.c sysctls are read while their values can change, add READ_ONCE() annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:43 -08:00
Eric Dumazet	978b67d283	ipv6: exthdrs: annotate data-race over multiple sysctl Following four sysctls can change under us, add missing READ_ONCE(). - ipv6.sysctl.max_dst_opts_len - ipv6.sysctl.max_dst_opts_cnt - ipv6.sysctl.max_hbh_opts_len - ipv6.sysctl.max_hbh_opts_cnt Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:43 -08:00
Eric Dumazet	12eddc6857	ipv6: annotate data-races around sysctl.ip6_rt_gc_interval Add READ_ONCE() on lockless reads of net->ipv6.sysctl.ip6_rt_gc_interval Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	5ade47c974	ipv6: annotate data-races over sysctl.flowlabel_reflect Add missing READ_ONCE() when reading ipv6.sysctl.flowlabel_reflect, as its value can be changed under us. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	03e9d91dd6	ipv6: annotate data-races in ip6_multipath_hash_{policy,fields}() Add missing READ_ONCE() when reading sysctl values. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	3681282530	ipv6: annotate date-race in ipv6_can_nonlocal_bind() Add a missing READ_ONCE(), and add const qualifiers to the two parameters. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	ded139b59b	ipv6: annotate data-races from ip6_make_flowlabel() Use READ_ONCE() to read sysctl values in ip6_make_flowlabel() and ip6_make_flowlabel() Add a const qualifier to 'struct net' parameters. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Eric Dumazet	e82a347d92	ipv6: add sysctl_ipv6_flowlabel group Group together following struct netns_sysctl_ipv6 fields: - flowlabel_consistency - auto_flowlabels - flowlabel_state_ranges After this patch, ip6_make_flowlabel() uses a single cache line to fetch auto_flowlabels and flowlabel_state_ranges (instead of two before the patch). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260115094141.3124990-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-19 09:56:42 -08:00
Jakub Kicinski	b4e486e2c4	Merge branch 'net-convert-drivers-to-get_rx_ring_count-part-2' Breno Leitao says: ==================== net: convert drivers to .get_rx_ring_count (part 2) Commit `84eaf4359c` ("net: ethtool: add get_rx_ring_count callback to optimize RX ring queries") added specific support for GRXRINGS callback, simplifying .get_rxnfc. Remove the handling of GRXRINGS in .get_rxnfc() by moving it to the new .get_rx_ring_count(). This simplifies the RX ring count retrieval and aligns the following drivers with the new ethtool API for querying RX ring parameters. * engleder/tsnep * mediatek * amazon/ena * microchip/lan743x * amd/xgbe * chelsio/cxgb4 * wangxun/txgbe * cadence/macb All of these change were compile-tested only. ==================== Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-0-b3e1b58bced5@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-17 18:10:19 -08:00
Breno Leitao	c4279332f4	net: txgbe: convert to use .get_rx_ring_count Use the newly introduced .get_rx_ring_count ethtool ops callback instead of handling ETHTOOL_GRXRINGS directly in .get_rxnfc(). Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-9-b3e1b58bced5@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-17 18:10:16 -08:00
Breno Leitao	d1c7ed5dfa	net: macb: convert to use .get_rx_ring_count Use the newly introduced .get_rx_ring_count ethtool ops callback instead of handling ETHTOOL_GRXRINGS directly in .get_rxnfc(). Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-8-b3e1b58bced5@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-17 18:10:16 -08:00
Breno Leitao	ceec168d03	net: cxgb4: convert to use .get_rx_ring_count Use the newly introduced .get_rx_ring_count ethtool ops callback instead of handling ETHTOOL_GRXRINGS directly in .get_rxnfc(). Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-7-b3e1b58bced5@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-17 18:10:16 -08:00
Breno Leitao	507353bf84	net: xgbe: convert to use .get_rx_ring_count Use the newly introduced .get_rx_ring_count ethtool ops callback instead of handling ETHTOOL_GRXRINGS directly in .get_rxnfc(). Since ETHTOOL_GRXRINGS was the only command handled by xgbe_get_rxnfc(), remove the function entirely. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-6-b3e1b58bced5@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-17 18:10:16 -08:00
Breno Leitao	05ba304486	net: lan743x: convert to use .get_rx_ring_count Use the newly introduced .get_rx_ring_count ethtool ops callback instead of handling ETHTOOL_GRXRINGS directly in .get_rxnfc(). Since ETHTOOL_GRXRINGS was the only command handled by lan743x_ethtool_get_rxnfc(), remove the function entirely. Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://patch.msgid.link/20260115-grxring_big_v2-v1-5-b3e1b58bced5@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2026-01-17 18:10:16 -08:00

1 2 3 4 5 ...

1413636 Commits All Branches Search

1413636 Commits

All Branches