Commit Graph

185 Commits

Author SHA1 Message Date
Larysa Zaremba 114f398d48 ice: Fix ice_xdp_xmit() when XDP TX queue number is not sufficient
The original patch added the static branch to handle the situation,
when assigning an XDP TX queue to every CPU is not possible,
so they have to be shared.

However, in the XDP transmit handler ice_xdp_xmit(), an error was
returned in such cases even before static condition was checked,
thus making queue sharing still impossible.

Fixes: 22bf877e52 ("ice: introduce XDP_TX fallback path")
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/r/20220919134346.25030-1-larysa.zaremba@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-09-21 17:33:29 -07:00
Jacob Keller f020481be5 ice: track Tx timestamp stats similar to other Intel drivers
Several Intel networking drivers which support PTP track when Tx timestamps
are skipped or when they timeout without a timestamp from hardware. The
conditions which could cause these events are rare, but it can be useful to
know when and how often they occur.

Implement similar statistics for the ice driver, tx_hwtstamp_skipped,
tx_hwtstamp_timeouts, and tx_hwtstamp_flushed.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-08-16 09:04:38 -07:00
Przemyslaw Patynowski 01658aeead ice: Fix tunnel checksum offload with fragmented traffic
Fix checksum offload on VXLAN tunnels.
In case, when mpls protocol is not used, set l4 header to transport
header of skb. This fixes case, when user tries to offload checksums
of VXLAN tunneled traffic.

Steps for reproduction (requires link partner with tunnels):
ip l s enp130s0f0 up
ip a f enp130s0f0
ip a a 10.10.110.2/24 dev enp130s0f0
ip l s enp130s0f0 mtu 1600
ip link add vxlan12_sut type vxlan id 12 group 238.168.100.100 dev enp130s0f0 dstport 4789
ip l s vxlan12_sut up
ip a a 20.10.110.2/24 dev vxlan12_sut
iperf3 -c 20.10.110.1 #should connect

Offload params: td_offset, cd_tunnel_params were
corrupted, due to l4 header pointing wrong address. NIC would then drop
those packets internally, due to incorrect TX descriptor data,
which increased GLV_TEPC register.

Fixes: 69e66c04c6 ("ice: Add mpls+tso support")
Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com>
Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com>
Signed-off-by: Jedrzej Jagielski <jedrzej.jagielski@intel.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-07-26 13:15:36 -07:00
Joe Damato 69e66c04c6 ice: Add mpls+tso support
Attempt to add mpls+tso support.

I don't have ice hardware available to test myself, but I just implemented
this feature in i40e and thought it might be useful to implement for ice
while this is fresh in my brain.

Hoping some one at intel will be able to test this on my behalf.

Signed-off-by: Joe Damato <jdamato@fastly.com>
Tested-by: Gurucharan <gurucharanx.g@intel.com> (A Contingent worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-04-12 11:05:09 -07:00
Maciej Fijalkowski 457a02f03e ice: avoid XDP checks in ice_clean_tx_irq()
Commit 9610bd988d ("ice: optimize XDP_TX workloads") introduced Tx IRQ
cleaning routine dedicated for XDP rings. Currently it is impossible to
call ice_clean_tx_irq() against XDP ring, so it is safe to drop
ice_ring_is_xdp() calls in there.

Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>  (A Contingent Worker at Intel)
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-09 10:05:27 -08:00
Jacob Keller b03d519d34 ice: store VF pointer instead of VF ID
The VSI structure contains a vf_id field used to associate a VSI with a
VF. This is used mainly for ICE_VSI_VF as well as partially for
ICE_VSI_CTRL associated with the VFs.

This API was designed with the idea that VFs are stored in a simple
array that was expected to be static throughout most of the driver's
life.

We plan on refactoring VF storage in a few key ways:

  1) converting from a simple static array to a hash table
  2) using krefs to track VF references obtained from the hash table
  3) use RCU to delay release of VF memory until after all references
     are dropped

This is motivated by the goal to ensure that the lifetime of VF
structures is accounted for, and prevent various use-after-free bugs.

With the existing vsi->vf_id, the reference tracking for VFs would
become somewhat convoluted, because each VSI maintains a vf_id field
which will then require performing a look up. This means all these flows
will require reference tracking and proper usage of rcu_read_lock, etc.

We know that the VF VSI will always be backed by a valid VF structure,
because the VSI is created during VF initialization and removed before
the VF is destroyed. Rely on this and store a reference to the VF in the
VSI structure instead of storing a VF ID. This will simplify the usage
and avoid the need to perform lookups on the hash table in the future.

For ICE_VSI_VF, it is expected that vsi->vf is always non-NULL after
ice_vsi_alloc succeeds. Because of this, use WARN_ON when checking if a
vsi->vf pointer is valid when dealing with VF VSIs. This will aid in
debugging code which violates this assumption and avoid more disastrous
panics.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-03-03 08:46:47 -08:00
David S. Miller adc27288f2 Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/next-queue
Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2022-02-09

This series contains updates to ice driver only.

Brett adds support for QinQ. This begins with code refactoring and
re-organization of VLAN configuration functions to allow for
introduction of VSI VLAN ops to enable setting and calling of
respective operations based on device support of single or double
VLANs. Implementations are added for outer VLAN support.

To support QinQ, the device must be set to double VLAN mode (DVM).
In order for this to occur, the DDP package and NVM must also support
DVM. Functions to determine compatibility and properly configure the
device are added as well as setting the proper bits to advertise and
utilize the proper offloads. Support for VIRTCHNL_VF_OFFLOAD_VLAN_V2
is also included to allow for VF to negotiate and utilize this
functionality.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2022-02-10 11:00:13 +00:00
Jakub Kicinski 1127170d45 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2022-02-09

We've added 126 non-merge commits during the last 16 day(s) which contain
a total of 201 files changed, 4049 insertions(+), 2215 deletions(-).

The main changes are:

1) Add custom BPF allocator for JITs that pack multiple programs into a huge
   page to reduce iTLB pressure, from Song Liu.

2) Add __user tagging support in vmlinux BTF and utilize it from BPF
   verifier when generating loads, from Yonghong Song.

3) Add per-socket fast path check guarding from cgroup/BPF overhead when
   used by only some sockets, from Pavel Begunkov.

4) Continued libbpf deprecation work of APIs/features and removal of their
   usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko
   and various others.

5) Improve BPF instruction set documentation by adding byte swap
   instructions and cleaning up load/store section, from Christoph Hellwig.

6) Switch BPF preload infra to light skeleton and remove libbpf dependency
   from it, from Alexei Starovoitov.

7) Fix architecture-agnostic macros in libbpf for accessing syscall
   arguments from BPF progs for non-x86 architectures,
   from Ilya Leoshkevich.

8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be
   of 16-bit field with anonymous zero padding, from Jakub Sitnicki.

9) Add new bpf_copy_from_user_task() helper to read memory from a different
   task than current. Add ability to create sleepable BPF iterator progs,
   from Kenny Yu.

10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and
    utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski.

11) Generate temporary netns names for BPF selftests to avoid naming
    collisions, from Hangbin Liu.

12) Implement bpf_core_types_are_compat() with limited recursion for
    in-kernel usage, from Matteo Croce.

13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5
    to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor.

14) Misc minor fixes to libbpf and selftests from various folks.

* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits)
  selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup
  bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide
  libbpf: Fix compilation warning due to mismatched printf format
  selftests/bpf: Test BPF_KPROBE_SYSCALL macro
  libbpf: Add BPF_KPROBE_SYSCALL macro
  libbpf: Fix accessing the first syscall argument on s390
  libbpf: Fix accessing the first syscall argument on arm64
  libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL
  selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390
  libbpf: Fix accessing syscall arguments on riscv
  libbpf: Fix riscv register names
  libbpf: Fix accessing syscall arguments on powerpc
  selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro
  libbpf: Add PT_REGS_SYSCALL_REGS macro
  selftests/bpf: Fix an endianness issue in bpf_syscall_macro test
  bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE
  bpf: Fix leftover header->pages in sparc and powerpc code.
  libbpf: Fix signedness bug in btf_dump_array_data()
  selftests/bpf: Do not export subtest as standalone test
  bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures
  ...
====================

Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-02-09 18:40:56 -08:00
Brett Creeley 0d54d8f7a1 ice: Add hot path support for 802.1Q and 802.1ad VLAN offloads
Currently the driver only supports 802.1Q VLAN insertion and stripping.
However, once Double VLAN Mode (DVM) is fully supported, then both 802.1Q
and 802.1ad VLAN insertion and stripping will be supported. Unfortunately
the VSI context parameters only allow for one VLAN ethertype at a time
for VLAN offloads so only one or the other VLAN ethertype offload can be
supported at once.

To support this, multiple changes are needed.

Rx path changes:

[1] In DVM, the Rx queue context l2tagsel field needs to be cleared so
the outermost tag shows up in the l2tag2_2nd field of the Rx flex
descriptor. In Single VLAN Mode (SVM), the l2tagsel field should remain
1 to support SVM configurations.

[2] Modify the ice_test_staterr() function to take a __le16 instead of
the ice_32b_rx_flex_desc union pointer so this function can be used for
both rx_desc->wb.status_error0 and rx_desc->wb.status_error1.

[3] Add the new inline function ice_get_vlan_tag_from_rx_desc() that
checks if there is a VLAN tag in l2tag1 or l2tag2_2nd.

[4] In ice_receive_skb(), add a check to see if NETIF_F_HW_VLAN_STAG_RX
is enabled in netdev->features. If it is, then this is the VLAN
ethertype that needs to be added to the stripping VLAN tag. Since
ice_fix_features() prevents CTAG_RX and STAG_RX from being enabled
simultaneously, the VLAN ethertype will only ever be 802.1Q or 802.1ad.

Tx path changes:

[1] In DVM, the VLAN tag needs to be placed in the l2tag2 field of the Tx
context descriptor. The new define ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN was
added to the list of tx_flags to handle this case.

[2] When the stack requests the VLAN tag to be offloaded on Tx, the
driver needs to set either ICE_TX_FLAGS_HW_OUTER_SINGLE_VLAN or
ICE_TX_FLAGS_HW_VLAN, so the tag is inserted in l2tag2 or l2tag1
respectively. To determine which location to use, set a bit in the Tx
ring flags field during ring allocation that can be used to determine
which field to use in the Tx descriptor. In DVM, always use l2tag2,
and in SVM, always use l2tag1.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-02-09 09:24:45 -08:00
Alexander Lobakin ee803dca96 ice: respect metadata in legacy-rx/ice_construct_skb()
In "legacy-rx" mode represented by ice_construct_skb(), we can
still use XDP (and XDP metadata), but after XDP_PASS the metadata
will be lost as it doesn't get copied to the skb.
Copy it along with the frame headers. Account its size on skb
allocation, and when copying just treat it as a part of the frame
and do a pull after to "move" it to the "reserved" zone.
Point net_prefetch() to xdp->data_meta instead of data. This won't
change anything when the meta is not here, but will save some cache
misses otherwise.

Suggested-by: Jesper Dangaard Brouer <brouer@redhat.com>
Suggested-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2022-01-31 09:43:35 -08:00
Maciej Fijalkowski 126cdfe100 ice: xsk: Improve AF_XDP ZC Tx and use batching API
Apply the logic that was done for regular XDP from commit 9610bd988d
("ice: optimize XDP_TX workloads") to the ZC side of the driver. On top
of that, introduce batching to Tx that is inspired by i40e's
implementation with adjustments to the cleaning logic - take into the
account NAPI budget in ice_clean_xdp_irq_zc().

Separating the stats structs onto separate cache lines seemed to improve
the performance.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Link: https://lore.kernel.org/bpf/20220125160446.78976-8-maciej.fijalkowski@intel.com
2022-01-27 17:25:33 +01:00
Maciej Fijalkowski 86e3f78c8d ice: xsk: Avoid potential dead AF_XDP Tx processing
Commit 9610bd988d ("ice: optimize XDP_TX workloads") introduced
@next_dd and @next_rs to ice_tx_ring struct. Currently, their state is
not restored in ice_clean_tx_ring(), which was not causing any troubles
as the XDP rings are gone after we're done with XDP prog on interface.

For upcoming usage of mentioned fields in AF_XDP, this might expose us
to a potential dead Tx side. Scenario would look like following (based
on xdpsock):

- two xdpsock instances are spawned in Tx mode
- one of them is killed
- XDP prog is kept on interface due to the other xdpsock still running
  * this means that XDP rings stayed in place
- xdpsock is launched again on same queue id that was terminated on
- @next_dd and @next_rs setting is bogus, therefore transmit side is
  broken

To protect us from the above, restore the initial @next_rs and @next_dd
values when cleaning the Tx ring.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220125160446.78976-7-maciej.fijalkowski@intel.com
2022-01-27 17:25:33 +01:00
Maciej Fijalkowski a4e186693c ice: Remove likely for napi_complete_done
Remove the likely before napi_complete_done as this is the unlikely case
when busy-poll is used. Removing this has a positive performance impact
for busy-poll and no negative impact to the regular case.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20220125160446.78976-2-maciej.fijalkowski@intel.com
2022-01-27 17:25:32 +01:00
David S. Miller e63a023489 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-12-30

The following pull-request contains BPF updates for your *net-next* tree.

We've added 72 non-merge commits during the last 20 day(s) which contain
a total of 223 files changed, 3510 insertions(+), 1591 deletions(-).

The main changes are:

1) Automatic setrlimit in libbpf when bpf is memcg's in the kernel, from Andrii.

2) Beautify and de-verbose verifier logs, from Christy.

3) Composable verifier types, from Hao.

4) bpf_strncmp helper, from Hou.

5) bpf.h header dependency cleanup, from Jakub.

6) get_func_[arg|ret|arg_cnt] helpers, from Jiri.

7) Sleepable local storage, from KP.

8) Extend kfunc with PTR_TO_CTX, PTR_TO_MEM argument support, from Kumar.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-12-31 14:35:40 +00:00
Alexander Lobakin 5ce6663158 ice: switch to napi_build_skb()
napi_build_skb() reuses per-cpu NAPI skbuff_head cache in order
to save some cycles on freeing/allocating skbuff_heads on every
new Rx or completed Tx.
ice driver runs Tx completion polling cycle right before the Rx
one and uses napi_consume_skb() to feed the cache with skbuff_heads
of completed entries, so it's never empty and always warm at that
moment. Switch to the napi_build_skb() to relax mm pressure on
heavy Rx.

Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com>
Reviewed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-28 09:42:33 -08:00
Jakub Kicinski 8b3f913322 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
include/net/sock.h
  commit 8f905c0e73 ("inet: fully convert sk->sk_rx_dst to RCU rules")
  commit 43f51df417 ("net: move early demux fields close to sk_refcnt")
  https://lore.kernel.org/all/20211222141641.0caa0ab3@canb.auug.org.au/

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-12-23 16:09:58 -08:00
Maciej Fijalkowski 617f3e1b58 ice: xsk: allocate separate memory for XDP SW ring
Currently, the zero-copy data path is reusing the memory region that was
initially allocated for an array of struct ice_rx_buf for its own
purposes. This is error prone as it is based on the ice_rx_buf struct
always being the same size or bigger than what the zero-copy path needs.
There can also be old values present in that array giving rise to errors
when the zero-copy path uses it.

Fix this by freeing the ice_rx_buf region and allocating a new array for
the zero-copy path that has the right length and is initialized to zero.

Fixes: 57f7f8b6bc ("ice: Use xdp_buf instead of rx_buf for xsk zero-copy")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-17 11:09:04 -08:00
Jesse Brandeburg 9c99d099f7 ice: use modern kernel API for kick
The kernel gained a new interface for drivers to use to combine tail
bump (doorbell) and BQL updates, attempt to use those new interfaces.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:49:25 -08:00
Jesse Brandeburg cc14db11c8 ice: use prefetch methods
The kernel provides some prefetch mechanisms to speed up commonly
cold cache line accesses during receive processing. Since these are
software structures it helps to have these strategically placed
prefetches.

Be careful to call BQL prefetch complete only for non XDP queues.

Co-developed-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:46:28 -08:00
Jesse Brandeburg 1c96c16858 ice: update to newer kernel API
Use the netif_tx_* API from netdevice.h which has simpler parameters.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-12-15 08:45:28 -08:00
Paolo Abeni c8064e5b4a bpf: Let bpf_warn_invalid_xdp_action() report more info
In non trivial scenarios, the action id alone is not sufficient to
identify the program causing the warning. Before the previous patch,
the generated stack-trace pointed out at least the involved device
driver.

Let's additionally include the program name and id, and the relevant
device name.

If the user needs additional infos, he can fetch them via a kernel
probe, leveraging the arguments added here.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Toke Høiland-Jørgensen <toke@redhat.com>
Link: https://lore.kernel.org/bpf/ddb96bb975cbfddb1546cf5da60e77d5100b533c.1638189075.git.pabeni@redhat.com
2021-12-13 22:28:27 +01:00
Gustavo A. R. Silva 6f3323536a ice: use devm_kcalloc() instead of devm_kzalloc()
Use 2-factor multiplication argument form devm_kcalloc() instead
of devm_kzalloc().

Link: https://github.com/KSPP/linux/issues/162
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19 10:53:28 -07:00
Jesse Brandeburg 23be7075b3 ice: fix software generating extra interrupts
The driver tried to work around missing completion events that occurred
while interrupts are disabled, by triggering a software interrupt
whenever we exit polling (but we had to have polled at least once).

This was causing a *lot* of extra interrupts for some workloads like
NVMe over TCP, which resulted in regressions in performance. It was also
visible when polling didn't prevent interrupts when busy_poll was
enabled.

Fix the extra interrupts by utilizing our previously unused 3rd ITR
(interrupt throttle) index and set it to 20K interrupts per second, and
then trigger a software interrupt within that rate limit.

While here, slightly refactor the code to avoid an overwrite of a local
variable in the case of wb_en = true.

Fixes: b7306b42be ("ice: manage interrupts during poll exit")
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19 10:45:17 -07:00
Jesse Brandeburg d8eb7ad5e4 ice: update dim usage and moderation
The driver was having trouble with unreliable latency when doing single
threaded ping-pong tests. This was root caused to the DIM algorithm
landing on a too slow interrupt value, which caused high latency, and it
was especially present when queues were being switched frequently by the
scheduler as happens on default setups today.

In attempting to improve this, we allow the upper rate limit for
interrupts to move to rate limit of 4 microseconds as a max, which means
that no vector can generate more than 250,000 interrupts per second. The
old config was up to 100,000. The driver previously tried to program the
rate limit too frequently and if the receive and transmit side were both
active on the same vector, the INTRL would be set incorrectly, and this
change fixes that issue as a side effect of the redesign.

This driver will operate from now on with a slightly changed DIM table
with more emphasis towards latency sensitivity by having more table
entries with lower latency than with high latency (high being >= 64
microseconds).

The driver also resets the DIM algorithm state with a new stats set when
there is no work done and the data becomes stale (older than 1 second),
for the respective receive or transmit portion of the interrupt.

Add a new helper for setting rate limit, which will be used more
in a followup patch.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-19 10:45:16 -07:00
Maciej Fijalkowski 22bf877e52 ice: introduce XDP_TX fallback path
Under rare circumstances there might be a situation where a requirement
of having XDP Tx queue per CPU could not be fulfilled and some of the Tx
resources have to be shared between CPUs. This yields a need for placing
accesses to xdp_ring inside a critical section protected by spinlock.
These accesses happen to be in the hot path, so let's introduce the
static branch that will be triggered from the control plane when driver
could not provide Tx queue dedicated for XDP on each CPU.

Currently, the design that has been picked is to allow any number of XDP
Tx queues that is at least half of a count of CPUs that platform has.
For lower number driver will bail out with a response to user that there
were not enough Tx resources that would allow configuring XDP. The
sharing of rings is signalled via static branch enablement which in turn
indicates that lock for xdp_ring accesses needs to be taken in hot path.

Approach based on static branch has no impact on performance of a
non-fallback path. One thing that is needed to be mentioned is a fact
that the static branch will act as a global driver switch, meaning that
if one PF got out of Tx resources, then other PFs that ice driver is
servicing will suffer. However, given the fact that HW that ice driver
is handling has 1024 Tx queues per each PF, this is currently an
unlikely scenario.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:03 -07:00
Maciej Fijalkowski 9610bd988d ice: optimize XDP_TX workloads
Optimize Tx descriptor cleaning for XDP. Current approach doesn't
really scale and chokes when multiple flows are handled.

Introduce two ring fields, @next_dd and @next_rs that will keep track of
descriptor that should be looked at when the need for cleaning arise and
the descriptor that should have the RS bit set, respectively.

Note that at this point the threshold is a constant (32), but it is
something that we could make configurable.

First thing is to get away from setting RS bit on each descriptor. Let's
do this only once NTU is higher than the currently @next_rs value. In
such case, grab the tx_desc[next_rs], set the RS bit in descriptor and
advance the @next_rs by a 32.

Second thing is to clean the Tx ring only when there are less than 32
free entries. For that case, look up the tx_desc[next_dd] for a DD bit.
This bit is written back by HW to let the driver know that xmit was
successful. It will happen only for those descriptors that had RS bit
set. Clean only 32 descriptors and advance the DD bit.

Actual cleaning routine is moved from ice_napi_poll() down to the
ice_xmit_xdp_ring(). It is safe to do so as XDP ring will not get any
SKBs in there that would rely on interrupts for the cleaning. Nice side
effect is that for rare case of Tx fallback path (that next patch is
going to introduce) we don't have to trigger the SW irq to clean the
ring.

With those two concepts, ring is kept at being almost full, but it is
guaranteed that driver will be able to produce Tx descriptors.

This approach seems to work out well even though the Tx descriptors are
produced in one-by-one manner. Test was conducted with the ice HW
bombarded with packets from HW generator, configured to generate 30
flows.

Xdp2 sample yields the following results:
<snip>
proto 17:   79973066 pkt/s
proto 17:   80018911 pkt/s
proto 17:   80004654 pkt/s
proto 17:   79992395 pkt/s
proto 17:   79975162 pkt/s
proto 17:   79955054 pkt/s
proto 17:   79869168 pkt/s
proto 17:   79823947 pkt/s
proto 17:   79636971 pkt/s
</snip>

As that sample reports the Rx'ed frames, let's look at sar output.
It says that what we Rx'ed we do actually Tx, no noticeable drops.
Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s txcmp/s  rxmcst/s   %ifutil
Average:       ens4f1 79842324.00 79842310.40 4678261.17 4678260.38 0.00      0.00      0.00     38.32

with tx_busy staying calm.

When compared to a state before:
Average:        IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s txcmp/s  rxmcst/s   %ifutil
Average:       ens4f1 90919711.60 42233822.60 5327326.85 2474638.04 0.00      0.00      0.00     43.64

it can be observed that the amount of txpck/s is almost doubled, meaning
that the performance is improved by around 90%. All of this due to the
drops in the driver, previously the tx_busy stat was bumped at a 7mpps
rate.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:03 -07:00
Maciej Fijalkowski eb087cd828 ice: propagate xdp_ring onto rx_ring
With rings being split, it is now convenient to introduce a pointer to
XDP ring within the Rx ring. For XDP_TX workloads this means that
xdp_rings array access will be skipped, which was executed per each
processed frame.

Also, read the XDP prog once per NAPI and if prog is present, set up the
local xdp_ring pointer. Reading prog a single time was discussed in [1]
with some concern raised by Toke around dispatcher handling and having
the need for going through the RCU grace period in the ndo_bpf driver
callback, but ice currently is torning down NAPI instances regardless of
the prog presence on VSI.

Although the pointer to XDP ring introduced to Rx ring makes things a
lot slimmer/simpler, I still feel that single prog read per NAPI
lifetime is beneficial.

Further patch that will introduce the fallback path will also get a
profit from that as xdp_ring pointer will be set during the XDP rings
setup.

[1]: https://lore.kernel.org/bpf/87k0oseo6e.fsf@toke.dk/

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:03 -07:00
Maciej Fijalkowski a55e16fa33 ice: do not create xdp_frame on XDP_TX
xdp_frame is not needed for XDP_TX data path in ice driver case.
For this data path cleaning of sent descriptor will not happen anywhere
outside of the driver, which means that carrying the information about
the underlying memory model via xdp_frame will not be used. Therefore,
this conversion can be simply dropped, which would relieve CPU a bit.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:03 -07:00
Maciej Fijalkowski e72bba2135 ice: split ice_ring onto Tx/Rx separate structs
While it was convenient to have a generic ring structure that served
both Tx and Rx sides, next commits are going to introduce several
Tx-specific fields, so in order to avoid hurting the Rx side, let's
pull out the Tx ring onto new ice_tx_ring and ice_rx_ring structs.

Rx ring could be handled by the old ice_ring which would reduce the code
churn within this patch, but this would make things asymmetric.

Make the union out of the ring container within ice_q_vector so that it
is possible to iterate over newly introduced ice_tx_ring.

Remove the @size as it's only accessed from control path and it can be
calculated pretty easily.

Change definitions of ice_update_ring_stats and
ice_fetch_u64_stats_per_ring so that they are ring agnostic and can be
used for both Rx and Tx rings.

Sizes of Rx and Tx ring structs are 256 and 192 bytes, respectively. In
Rx ring xdp_rxq_info occupies its own cacheline, so it's the major
difference now.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-15 07:39:02 -07:00
Grzegorz Nitka f5396b8a66 ice: switchdev slow path
Slow path means allowing packet to go from uplink to representor
and from representor to correct VF on Rx site and from VF to
representor and to uplink on Tx site.

To accomplish this driver, has to set correct Tx descriptor. When
packet is sent from representor to VF, destination should be
set to VF VSI. When packet is sent from uplink port destination
should be uplink to bypass switch infrastructure and send packet
outside.

On Rx site driver should check source VSI field from Rx descriptor
and based on that forward packed to correct netdev. To allow
this there is a target netdevs table in control plane VSI
struct.

Co-developed-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Michal Swiatkowski <michal.swiatkowski@linux.intel.com>
Signed-off-by: Grzegorz Nitka <grzegorz.nitka@intel.com>
Tested-by: Sandeep Penigalapati <sandeep.penigalapati@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-10-07 10:41:42 -07:00
Dave Ertman 2a87bd73e5 ice: Add DSCP support
Implement code to handle submission of APP TLV's
containing DSCP to TC mapping.

The first such mapping received on an interface
will cause that PF to switch to L3 DSCP QoS mode,
apply the default config for that mode, and apply
the received mapping.

Only one such mapping will be allowed per DSCP value,
and when the last DSCP mapping is deleted, the PF
will switch back into L2 VLAN QoS mode, applying the
appropriate default QoS settings.

L3 DSCP QoS mode will only be allowed in SW DCBx
mode, in other words, when the FW LLDP engine is
disabled.  Commands that break this mutual exclusivity
will be blocked.

Co-developed-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Gurucharan G <gurucharanx.g@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-09-28 09:42:04 -07:00
David S. Miller e1289cfb63 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2021-06-28

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 12 day(s) which contain
a total of 56 files changed, 394 insertions(+), 380 deletions(-).

The main changes are:

1) XDP driver RCU cleanups, from Toke Høiland-Jørgensen and Paul E. McKenney.

2) Fix bpf_skb_change_proto() IPv4/v6 GSO handling, from Maciej Żenczykowski.

3) Fix false positive kmemleak report for BPF ringbuf alloc, from Rustam Kovhaev.

4) Fix x86 JIT's extable offset calculation for PROBE_LDX NULL, from Ravi Bangoria.

5) Enable libbpf fallback probing with tracing under RHEL7, from Jonathan Edwards.

6) Clean up x86 JIT to remove unused cnt tracking from EMIT macro, from Jiri Olsa.

7) Netlink cleanups for libbpf to please Coverity, from Kumar Kartikeya Dwivedi.

8) Allow to retrieve ancestor cgroup id in tracing programs, from Namhyung Kim.

9) Fix lirc BPF program query to use user-provided prog_cnt, from Sean Young.

10) Add initial libbpf doc including generated kdoc for its API, from Grant Seltzer.

11) Make xdp_rxq_info_unreg_mem_model() more robust, from Jakub Kicinski.

12) Fix up bpfilter startup log-level to info level, from Gary Lin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-06-28 15:28:03 -07:00
Jesse Brandeburg 3089cf6d3c ice: add tracepoints
This patch is modeled after one by Scott Peterson for i40e.

Add tracepoints to the driver, via a new file ice_trace.h and some new
trace calls added in interesting places in the driver. Add some tracing
for DIMLIB to help debug interrupt moderation problems.

Performance should not be affected, and this can be very useful
for debugging and adding new trace events to paths in the future.

Note eBPF programs can attach to these events, as well as perf
can count them since we're attaching to the events subsystem
in the kernel.

Co-developed-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-25 08:32:18 -07:00
Toke Høiland-Jørgensen 49589b23d5 intel: Remove rcu_read_lock() around XDP program invocation
The Intel drivers all have rcu_read_lock()/rcu_read_unlock() pairs around
XDP program invocations. However, the actual lifetime of the objects
referred by the XDP program invocation is longer, all the way through to
the call to xdp_do_flush(), making the scope of the rcu_read_lock() too
small. This turns out to be harmless because it all happens in a single
NAPI poll cycle (and thus under local_bh_disable()), but it makes the
rcu_read_lock() misleading.

Rather than extend the scope of the rcu_read_lock(), just get rid of it
entirely. With the addition of RCU annotations to the XDP_REDIRECT map
types that take bh execution into account, lockdep even understands this to
be safe, so there's really no reason to keep it around.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Jesper Dangaard Brouer <brouer@redhat.com> # i40e
Cc: Jesse Brandeburg <jesse.brandeburg@intel.com>
Cc: Tony Nguyen <anthony.l.nguyen@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20210624160609.292325-12-toke@redhat.com
2021-06-24 19:44:34 +02:00
Jesse Brandeburg dda90cb90a ice: report hash type such as L2/L3/L4
The hardware is reporting the type of the hash used for RSS
as a PTYPE field in the receive descriptor. Use this value to set
the skb packet hash type by extending the hash type table to
cover all 10-bits of possible values (requiring some variables
to be changed from u8 to u16), and then use that table to convert
to one of the possible values in enum pkt_hash_types.

While we're here, remove the unused ptype struct value, which
makes table init easier for the zero entries, and use ranged
initializer to remove a bunch of code (works with gcc and clang).

Without this change, the kernel will recalculate the hash in software,
which can consume extra CPU cycles.

Co-developed-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-18 08:59:46 -07:00
Jacob Keller ea9b847cda ice: enable transmit timestamps for E810 devices
Add support for enabling Tx timestamp requests for outgoing packets on
E810 devices.

The ice hardware can support multiple outstanding Tx timestamp requests.
When sending a descriptor to hardware, a Tx timestamp request is made by
setting a request bit, and assigning an index that represents which Tx
timestamp index to store the timestamp in.

Hardware makes no effort to synchronize the index use, so it is up to
software to ensure that Tx timestamp indexes are not re-used before the
timestamp is reported back.

To do this, introduce a Tx timestamp tracker which will keep track of
currently in-use indexes.

In the hot path, if a packet has a timestamp request, an index will be
requested from the tracker. Unfortunately, this does require a lock as
the indexes are shared across all queues on a PHY. There are not enough
indexes to reliably assign only 1 to each queue.

For the E810 devices, the timestamp indexes are not shared across PHYs,
so each port can have its own tracking.

Once hardware captures a timestamp, an interrupt is fired. In this
interrupt, trigger a new work item that will figure out which timestamp
was completed, and report the timestamp back to the stack.

This function loops through the Tx timestamp indexes and checks whether
there is now a valid timestamp. If so, it clears the PHY timestamp
indication in the PHY memory, locks and removes the SKB and bit in the
tracker, then reports the timestamp to the stack.

It is possible in some cases that a timestamp request will be initiated
but never completed. This might occur if the packet is dropped by
software or hardware before it reaches the PHY.

Add a task to the periodic work function that will check whether
a timestamp request is more than a few seconds old. If so, the timestamp
index is cleared in the PHY, and the SKB is released.

Just as with Rx timestamps, the Tx timestamps are only 40 bits wide, and
use the same overall logic for extending to 64 bits of nanoseconds.

With this change, E810 devices should be able to perform basic PTP
functionality.

Future changes will extend the support to cover the E822-based devices.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-11 08:47:41 -07:00
Dave Ertman f9f83202b7 ice: Allow all LLDP packets from PF to Tx
Currently in the ice driver, the check whether to
allow a LLDP packet to egress the interface from the
PF_VSI is being based on the SKB's priority field.
It checks to see if the packets priority is equal to
TC_PRIO_CONTROL.  Injected LLDP packets do not always
meet this condition.

SCAPY defaults to a sk_buff->protocol value of ETH_P_ALL
(0x0003) and does not set the priority field.  There will
be other injection methods (even ones used by end users)
that will not correctly configure the socket so that
SKB fields are correctly populated.

Then ethernet header has to have to correct value for
the protocol though.

Add a check to also allow packets whose ethhdr->h_proto
matches ETH_P_LLDP (0x88CC).

Fixes: 0c3a6101ff ("ice: Allow egress control packets from PF_VSI")
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-04 07:37:48 -07:00
Magnus Karlsson 89d65df024 ice: add correct exception tracing for XDP
Add missing exception tracing to XDP when a number of different
errors can occur. The support was only partial. Several errors
where not logged which would confuse the user quite a lot not
knowing where and why the packets disappeared.

Fixes: efc2214b60 ("ice: Add support for XDP")
Fixes: 2d4238f556 ("ice: Add support for AF_XDP")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-06-03 08:38:37 -07:00
Jesse Brandeburg d59684a07e ice: refactor ITR data structures
Use a dedicated bitfield in order to both increase
the amount of checking around the length of ITR writes
as well as simplify the checks of dynamic mode.

Basically unpack the "high bit means dynamic" logic
into bitfields.

Also, remove some unused ITR defines.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14 17:00:06 -07:00
Jesse Brandeburg b7306b42be ice: manage interrupts during poll exit
The driver would occasionally miss that there were outstanding
descriptors to clean when exiting busy/napi poll. This issue has
been in the code since the introduction of the ice driver.

Attempt to "catch" any remaining work by triggering a software
interrupt when exiting napi poll or busy-poll. This will not
cause extra interrupts in the case of normal execution.

This issue was found when running sfnt-pingpong, with busy
poll enabled, and typically with larger I/O sizes like > 8192,
the program would occasionally report > 1 second maximums
to complete a ping pong.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14 17:00:05 -07:00
Jacob Keller cdf1f1f169 ice: replace custom AIM algorithm with kernel's DIM library
The ice driver has support for adaptive interrupt moderation, an
algorithm for tuning the interrupt rate dynamically. This algorithm
is based on various assumptions about ring size, socket buffer size,
link speed, SKB overhead, ethernet frame overhead and more.

The Linux kernel has support for a dynamic interrupt moderation
algorithm known as "dimlib". Replace the custom driver-specific
implementation of dynamic interrupt moderation with the kernel's
algorithm.

The Intel hardware has a different hardware implementation than the
originators of the dimlib code had to work with, which requires the
driver to use a slightly different set of inputs for the actual
moderation values, while getting all the advice from dimlib of
better/worse, shift left or right.

The change made for this implementation is to use a pair of values
for each of the 5 "slots" that the dimlib moderation expects, and
the driver will program those pairs when dimlib recommends a slot to
use. The currently implementation uses two tables, one for receive
and one for transmit, and the pairs of values in each slot set the
maximum delay of an interrupt and a maximum number of interrupts per
second (both expressed in microseconds).

There are two separate kinds of bugs fixed by using DIMLIB, one is
UDP single stream send was too slow, and the other is that 8K
ping-pong was going to the most aggressive moderation and has much
too high latency.

The overall result of using DIMLIB is that we meet or exceed our
performance expectations set based on the old algorithm.

Co-developed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-04-14 17:00:05 -07:00
Anirudh Venkataramanan e97fb1aea9 ice: Consolidate VSI state and flags
struct ice_vsi has two fields, state and flags which seem to
be serving the same purpose. Consolidate them into one field
'state'.

enum ice_state is used to represent state information of the PF.
While some of these enum values can be use to represent VSI state,
it makes more sense to represent VSI state with its own enum. So
derive a new enum ice_vsi_state from ice_vsi_flags and ice_state
and use it. Also rename enum ice_state to ice_pf_state for clarity.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-31 14:21:28 -07:00
David S. Miller 241949e488 Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Alexei Starovoitov says:

====================
pull-request: bpf-next 2021-03-24

The following pull-request contains BPF updates for your *net-next* tree.

We've added 37 non-merge commits during the last 15 day(s) which contain
a total of 65 files changed, 3200 insertions(+), 738 deletions(-).

The main changes are:

1) Static linking of multiple BPF ELF files, from Andrii.

2) Move drop error path to devmap for XDP_REDIRECT, from Lorenzo.

3) Spelling fixes from various folks.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 16:30:46 -07:00
David S. Miller efd13b71a3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Signed-off-by: David S. Miller <davem@davemloft.net>
2021-03-25 15:31:22 -07:00
Qi Zhang d6218317e2 ice: Check FDIR program status for AVF
Enable returning FDIR completion status by checking the
ctrl_vsi Rx queue descriptor value.

To enable returning FDIR completion status from ctrl_vsi Rx queue,
COMP_Queue and COMP_Report of FDIR filter programming descriptor
needs to be properly configured. After program request sent to ctrl_vsi
Tx queue, ctrl_vsi Rx queue interrupt will be triggered and
completion status will be returned.

Driver will first issue request in ice_vc_fdir_add_fltr(), then
pass FDIR context to the background task in interrupt service routine
ice_vc_fdir_irq_handler() and finally deal with them in
ice_flush_fdir_ctx(). ice_flush_fdir_ctx() will check the descriptor's
value, fdir context, and then send back virtual channel message to VF
by calling ice_vc_add_fdir_fltr_post(). An additional timer will be
setup in case of hardware interrupt timeout.

Signed-off-by: Yahui Cao <yahui.cao@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Qi Zhang <qi.z.zhang@intel.com>
Tested-by: Chen Bo <BoX.C.Chen@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-22 11:32:12 -07:00
Lorenzo Bianconi fdc13979f9 bpf, devmap: Move drop error path to devmap for XDP_REDIRECT
We want to change the current ndo_xdp_xmit drop semantics because it will
allow us to implement better queue overflow handling. This is working
towards the larger goal of a XDP TX queue-hook. Move XDP_REDIRECT error
path handling from each XDP ethernet driver to devmap code. According to
the new APIs, the driver running the ndo_xdp_xmit pointer, will break tx
loop whenever the hw reports a tx error and it will just return to devmap
caller the number of successfully transmitted frames. It will be devmap
responsibility to free dropped frames.

Move each XDP ndo_xdp_xmit capable driver to the new APIs:

- veth
- virtio-net
- mvneta
- mvpp2
- socionext
- amazon ena
- bnxt
- freescale (dpaa2, dpaa)
- xen-frontend
- qede
- ice
- igb
- ixgbe
- i40e
- mlx5
- ti (cpsw, cpsw-new)
- tun
- sfc

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Ioana Ciornei <ioana.ciornei@nxp.com>
Reviewed-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Reviewed-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Edward Cree <ecree.xilinx@gmail.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Link: https://lore.kernel.org/bpf/ed670de24f951cfd77590decf0229a0ad7fd12f6.1615201152.git.lorenzo@kernel.org
2021-03-18 16:38:51 +01:00
Maciej Fijalkowski 89861c485c ice: move headroom initialization to ice_setup_rx_ctx
ice_rx_offset(), that is supposed to initialize the Rx buffer headroom,
relies on ICE_RX_FLAGS_RING_BUILD_SKB flag as well as XDP prog presence.

Currently, the callsite of mentioned function is placed incorrectly
within ice_setup_rx_ring() where Rx ring's build skb flag is not
set yet. This causes the XDP_REDIRECT to be partially broken due to
inability to create xdp_frame in the headroom space, as the headroom is
0.

Fix this by moving ice_rx_offset() to ice_setup_rx_ctx() after the flag
setting.

Fixes: f1b1f409bf ("ice: store the result of ice_rx_offset() onto ice_ring")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-03-12 07:43:46 -08:00
Maciej Fijalkowski f1b1f409bf ice: store the result of ice_rx_offset() onto ice_ring
Output of ice_rx_offset() is based on ethtool's priv flag setting, which
when changed, causes PF reset (disables napi, frees irqs, loads
different Rx mem model, etc.). This means that within napi its result is
constant and there is no reason to call it per each processed frame.

Add new 'rx_offset' field to ice_ring that is meant to hold the
ice_rx_offset() result and use it within ice_clean_rx_irq().
Furthermore, use it within ice_alloc_mapped_page().

Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12 10:36:57 -08:00
Maciej Fijalkowski 29b82f2a09 ice: move skb pointer from rx_buf to rx_ring
Similar thing has been done in i40e, as there is no real need for having
the sk_buff pointer in each rx_buf. Non-eop frames can be simply handled
on that pointer moved upwards to rx_ring.

Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12 10:18:40 -08:00
Maciej Fijalkowski 59c97d1b51 ice: simplify ice_run_xdp
There's no need for 'result' variable, we can directly return the
internal status based on action returned by xdp prog.

Reviewed-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Kiran Bhandare <kiranx.bhandare@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-12 10:14:30 -08:00
Chinh T Cao fc2d1165d4 ice: Refactor DCB related variables out of the ice_port_info struct
Refactor the DCB related variables out of the ice_port_info_struct. The
goal is to make the ice_port_info struct cleaner.

Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Co-developed-by: Dave Ertman <david.m.ertman@intel.com>
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 16:27:01 -08:00
Jesse Brandeburg 1d9f7ca324 ice: fix writeback enable logic
The writeback enable logic was incorrectly implemented (due to
misunderstanding what the side effects of the implementation would be
during polling).

Fix this logic issue, while implementing a new feature allowing the user
to control the writeback frequency using the knobs for controlling
interrupt throttling that we already have.  Basically if you leave
adaptive interrupts enabled, the writeback frequency will be varied even
if busy_polling or if napi-poll is in use.  If the interrupt rates are
set to a fixed value by ethtool -C and adaptive is off, the driver will
allow the user-set interrupt rate to guide how frequently the hardware
will complete descriptors to the driver.

Effectively the user will get a control over the hardware efficiency,
allowing the choice between immediate interrupts or delayed up to a
maximum of the interrupt rate, even when interrupts are disabled
during polling.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Co-developed-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-02-08 16:27:01 -08:00
Alexander Lobakin a79afa78e6 net: use the new dev_page_is_reusable() instead of private versions
Now we can remove a bunch of identical functions from the drivers and
make them use common dev_page_is_reusable(). All {,un}likely() checks
are omitted since it's already present in this helper.
Also update some comments near the call sites.

Suggested-by: David Rientjes <rientjes@google.com>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Cc: John Hubbard <jhubbard@nvidia.com>
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
Reviewed-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-02-04 18:20:14 -08:00
Jakub Kicinski c358f95205 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
drivers/net/can/dev.c
  b552766c87 ("can: dev: prevent potential information leak in can_fill_info()")
  3e77f70e73 ("can: dev: move driver related infrastructure into separate subdir")
  0a042c6ec9 ("can: dev: move netlink related code into seperate file")

  Code move.

drivers/net/ethernet/mellanox/mlx5/core/en_ethtool.c
  57ac4a31c4 ("net/mlx5e: Correctly handle changing the number of queues when the interface is down")
  214baf2287 ("net/mlx5e: Support HTB offload")

  Adjacent code changes

net/switchdev/switchdev.c
  20776b465c ("net: switchdev: don't set port_obj_info->handled true when -EOPNOTSUPP")
  ffb68fc58e ("net: switchdev: remove the transaction structure from port object notifiers")
  bae33f2b5a ("net: switchdev: remove the transaction structure from port attributes")

  Transaction parameter gets dropped otherwise keep the fix.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2021-01-28 17:09:31 -08:00
Nick Nunley 1b0b0b581b ice: Implement flow for IPv6 next header (extension header)
This patch is based on a similar change to i40e by Slawomir Laba:
"i40e: Implement flow for IPv6 next header (extension header)".

When a packet contains an IPv6 header with next header which is
an extension header and not a protocol one, the kernel function
skb_transport_header called with such sk_buff will return a
pointer to the extension header and not to the TCP one.

The above explained call caused a problem with packet processing
for skb with encapsulation for tunnel with ICE_TX_CTX_EIPT_IPV6.
The extension header was not skipped at all.

The ipv6_skip_exthdr function does check if next header of the IPV6
header is an extension header and doesn't modify the l4_proto pointer
if it points to a protocol header value so its safe to omit the
comparison of exthdr and l4.hdr pointers. The ipv6_skip_exthdr can
return value -1. This means that the skipping process failed
and there is something wrong with the packet so it will be dropped.

Fixes: a4e82a81f5 ("ice: Add support for tunnel offloads")
Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Tony Brelinski <tonyx.brelinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2021-01-26 10:43:49 -08:00
Lorenzo Bianconi be9df4aff6 net, xdp: Introduce xdp_prepare_buff utility routine
Introduce xdp_prepare_buff utility routine to initialize per-descriptor
xdp_buff fields (e.g. xdp_buff pointers). Rely on xdp_prepare_buff() in
all XDP capable drivers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/45f46f12295972a97da8ca01990b3e71501e9d89.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08 13:39:24 -08:00
Lorenzo Bianconi 43b5169d83 net, xdp: Introduce xdp_init_buff utility routine
Introduce xdp_init_buff utility routine to initialize xdp_buff fields
const over NAPI iterations (e.g. frame_sz or rxq pointer). Rely on
xdp_init_buff in all XDP capable drivers.

Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Alexander Duyck <alexanderduyck@fb.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Shay Agroskin <shayagr@amazon.com>
Acked-by: Martin Habets <habetsm.xilinx@gmail.com>
Acked-by: Camelia Groza <camelia.groza@nxp.com>
Acked-by: Marcin Wojtas <mw@semihalf.com>
Link: https://lore.kernel.org/bpf/7f8329b6da1434dc2b05a77f2e800b29628a8913.1608670965.git.lorenzo@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2021-01-08 13:39:24 -08:00
Jakub Kicinski 46d5e62dd3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
xdp_return_frame_bulk() needs to pass a xdp_buff
to __xdp_return().

strlcpy got converted to strscpy but here it makes no
functional difference, so just keep the right code.

Conflicts:
	net/netfilter/nf_tables_api.c

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2020-12-11 22:29:38 -08:00
Björn Töpel 1beb7830d3 ice: avoid premature Rx buffer reuse
The page recycle code, incorrectly, relied on that a page fragment
could not be freed inside xdp_do_redirect(). This assumption leads to
that page fragments that are used by the stack/XDP redirect can be
reused and overwritten.

To avoid this, store the page count prior invoking xdp_do_redirect().

Fixes: efc2214b60 ("ice: Add support for XDP")
Reported-and-analyzed-by: Li RongQing <lirongqing@baidu.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: George Kuruvinakunnel <george.kuruvinakunnel@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-12-09 15:26:58 -08:00
Björn Töpel b02e5a0ebb xsk: Propagate napi_id to XDP socket Rx path
Add napi_id to the xdp_rxq_info structure, and make sure the XDP
socket pick up the napi_id in the Rx path. The napi_id is used to find
the corresponding NAPI structure for socket busy polling.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://lore.kernel.org/bpf/20201130185205.196029-7-bjorn.topel@gmail.com
2020-12-01 00:09:25 +01:00
David S. Miller 150f29f5e6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2020-09-01

The following pull-request contains BPF updates for your *net-next* tree.

There are two small conflicts when pulling, resolve as follows:

1) Merge conflict in tools/lib/bpf/libbpf.c between 88a8212028 ("libbpf: Factor
   out common ELF operations and improve logging") in bpf-next and 1e891e513e
   ("libbpf: Fix map index used in error message") in net-next. Resolve by taking
   the hunk in bpf-next:

        [...]
        scn = elf_sec_by_idx(obj, obj->efile.btf_maps_shndx);
        data = elf_sec_data(obj, scn);
        if (!scn || !data) {
                pr_warn("elf: failed to get %s map definitions for %s\n",
                        MAPS_ELF_SEC, obj->path);
                return -EINVAL;
        }
        [...]

2) Merge conflict in drivers/net/ethernet/mellanox/mlx5/core/en/xsk/rx.c between
   9647c57b11 ("xsk: i40e: ice: ixgbe: mlx5: Test for dma_need_sync earlier for
   better performance") in bpf-next and e20f0dbf20 ("net/mlx5e: RX, Add a prefetch
   command for small L1_CACHE_BYTES") in net-next. Resolve the two locations by retaining
   net_prefetch() and taking xsk_buff_dma_sync_for_cpu() from bpf-next. Should look like:

        [...]
        xdp_set_data_meta_invalid(xdp);
        xsk_buff_dma_sync_for_cpu(xdp, rq->xsk_pool);
        net_prefetch(xdp->data);
        [...]

We've added 133 non-merge commits during the last 14 day(s) which contain
a total of 246 files changed, 13832 insertions(+), 3105 deletions(-).

The main changes are:

1) Initial support for sleepable BPF programs along with bpf_copy_from_user() helper
   for tracing to reliably access user memory, from Alexei Starovoitov.

2) Add BPF infra for writing and parsing TCP header options, from Martin KaFai Lau.

3) bpf_d_path() helper for returning full path for given 'struct path', from Jiri Olsa.

4) AF_XDP support for shared umems between devices and queues, from Magnus Karlsson.

5) Initial prep work for full BPF-to-BPF call support in libbpf, from Andrii Nakryiko.

6) Generalize bpf_sk_storage map & add local storage for inodes, from KP Singh.

7) Implement sockmap/hash updates from BPF context, from Lorenz Bauer.

8) BPF xor verification for scalar types & add BPF link iterator, from Yonghong Song.

9) Use target's prog type for BPF_PROG_TYPE_EXT prog verification, from Udip Pant.

10) Rework BPF tracing samples to use libbpf loader, from Daniel T. Lee.

11) Fix xdpsock sample to really cycle through all buffers, from Weqaar Janjua.

12) Improve type safety for tun/veth XDP frame handling, from Maciej Żenczykowski.

13) Various smaller cleanups and improvements all over the place.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-01 13:22:59 -07:00
Magnus Karlsson 1742b3d528 xsk: i40e: ice: ixgbe: mlx5: Pass buffer pool to driver instead of umem
Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with the buffer pool instead. This in preparation for
extending the functionality of the zero-copy mode so that umems can be
shared between queues on the same netdev and also between netdevs. In
this commit, only an umem reference has been added to the buffer pool
struct. But later commits will add other entities to it. These are
going to be entities that are different between different queue ids
and netdevs even though the umem is shared between them.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
2020-08-31 21:15:03 +02:00
Tariq Toukan f468f21b7a net: Take common prefetch code structure into a function
Many device drivers use the same prefetch code structure to
deal with small L1 cacheline size.
Take this code into a function and call it from the drivers.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-08-26 15:55:53 -07:00
Tony Nguyen 7dbc63f0a5 ice: Misc minor fixes
This is a collection of minor fixes including typos, white space, and
style. No functional changes.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2020-08-01 08:44:04 -07:00
Kiran Patil 0a37abfa01 ice: port fix for chk_linearlize
This is a port of commit 248de22e63 ("i40e/i40evf: Account for frags
split over multiple descriptors in check linearize")

As part of testing workloads (read/write) using larger IO size (128K)
tx_timeout is observed and whenever it happens, it was due to
tx_linearize.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-08-01 08:44:04 -07:00
Jesse Brandeburg a4c493fea5 ice: remove page_reuse statistic
The page reuse statistic wasn't even being displayed to the user, even
though the driver counted it. Don't waste the struct space and hot-path
cycles since the driver doesn't display it.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
2020-08-01 08:43:59 -07:00
Tony Nguyen 6221595fc5 ice: fix unused parameter warning
Depending on PAGE_SIZE, the following unused parameter warning can be
reported:

drivers/net/ethernet/intel/ice/ice_txrx.c: In function ‘ice_rx_frame_truesize’:
drivers/net/ethernet/intel/ice/ice_txrx.c:513:21: warning: unused parameter ‘size’ [-Wunused-parameter]
        unsigned int size)

The 'size' variable is used only when PAGE_SIZE >= 8192. Add __maybe_unused
to remove the warning.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
2020-07-29 08:38:56 -07:00
Brett Creeley 2bb19d6e07 ice: Fix transmit for all software offloaded VLANs
Currently the driver does not recognize when there is an 802.1AD VLAN
tag right after the dmac/smac (outermost VLAN tag). If any DCB map is
applied and/or DCB is enabled this is causing the hardware to insert a
VLAN 0 tag after the 802.1AD VLAN tag that is already in the packet.
Fix this by preventing VLAN tag 0 from being added when any VLAN is
already present after dmac/smac (software offloaded) or skb (hardware
offloaded).

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-31 03:38:20 -07:00
Henry Tieman cac2a27cd9 ice: Support IPv4 Flow Director filters
Support the addition and deletion of IPv4 filters.

Supported fields are: src-ip, dst-ip, src-port, and dst-port
Supported flow-types are: tcp4, udp4, sctp4, ip4

Example usage:

ethtool -N eth0 flow-type tcp4 src-ip 192.168.0.55 dst-ip 172.16.0.55 \
src-port 16 dst-port 12 action 32

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-22 21:36:27 -07:00
Henry Tieman 148beb6120 ice: Initialize Flow Director resources
Flow Director allows for redirection based on ntuple rules. Rules are
programmed using the ethtool set-ntuple interface. Supported actions are
redirect to queue and drop.

Setup the initial framework to process Flow Director filters. Create and
allocate resources to manage and program filters to the hardware. Filters
are processed via a sideband interface; a control VSI is created to manage
communication and process requests through the sideband. Upon allocation of
resources, update the hardware tables to accept perfect filters.

Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-22 21:26:37 -07:00
Tony Nguyen 5757cc7c8b ice: Rename build_ctob to ice_build_ctob
To make the function easier to identify as being part of the ice driver,
prepend ice to the function name.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Karol Kolacinski 88865fc4bb ice: Fix casting issues
Change min() macros to min_t() which has compare type specified and it
helps avoid precision loss.

In some cases there was precision loss during calls or assignments.
Some fields in structs were unnecessarily large and gave multiple
warnings.

There were also some minor type differences which are now fixed as well as
some cases where a simple cast was needed.

Callers were were passing data that is a u16 to
ice_sched_cfg_node_bw_alloc() but the function was truncating that to a u8.
Fix that by changing the function to take a u16.

Signed-off-by: Karol Kolacinski <karol.kolacinski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:04 -07:00
Tony Nguyen a4e82a81f5 ice: Add support for tunnel offloads
Create a boost TCAM entry for each tunnel port in order to get a tunnel
PTYPE. Update netdev feature flags and implement the appropriate logic to
get and set values for hardware offloads.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Henry Tieman <henry.w.tieman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-05-21 22:10:03 -07:00
Jesper Dangaard Brouer d4ecdbf7aa ice: Add XDP frame size to driver
This driver uses different memory models depending on PAGE_SIZE at
compile time. For PAGE_SIZE 4K it uses page splitting, meaning for
normal MTU frame size is 2048 bytes (and headroom 192 bytes). For
larger MTUs the driver still use page splitting, by allocating
order-1 pages (8192 bytes) for RX frames. For PAGE_SIZE larger than
4K, driver instead advance its rx_buffer->page_offset with the frame
size "truesize".

For XDP frame size calculations, this mean that in PAGE_SIZE larger
than 4K mode the frame_sz change on a per packet basis. For the page
split 4K PAGE_SIZE mode, xdp.frame_sz is more constant and can be
updated once outside the main NAPI loop.

The default setting in the driver uses build_skb(), which provides
the necessary headroom and tailroom for XDP-redirect in RX-frame
(in both modes).

There is one complication, which is legacy-rx mode (configurable via
ethtool priv-flags). There are zero headroom in this mode, which is a
requirement for XDP-redirect to work. The conversion to xdp_frame
(convert_to_xdp_frame) will detect this insufficient space, and
xdp_do_redirect() call will fail. This is deemed acceptable, as it
allows other XDP actions to still work in legacy-mode. In
legacy-mode + larger PAGE_SIZE due to lacking tailroom, we also
accept that xdp_adjust_tail shrink doesn't work.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Alexander Duyck <alexander.duyck@gmail.com>
Link: https://lore.kernel.org/bpf/158945347002.97035.328088795813704587.stgit@firesoul
2020-05-14 21:21:56 -07:00
Bruce Allan 4e83fc934e ice: replace "fallthrough" comments with fallthrough reserved word
"fallthrough" comments are used in switch case statements to explicitly
indicate the code is intended to fall through to the following statement.
Different variants of "fallthough" are acceptable, e.g. "fall through",
"fallthrough", "Fall-through".  The GCC compiler has an optional warning
(-Wimplicit-fallthrough[=n]) to warn when such a comment is not present;
the default version of which is enabled when compiling the Linux kernel.

There have been recent discussions in kernel mailing lists regarding
replacing non-standardized "fallthrough" comments with the pseudo-reserved
word 'fallthrough' which will be defined as __attribute__ ((fallthrough))
for versions of gcc that support it (i.e. gcc 7 and newer) or as a nop
for versions that do not.  Replace "fallthrough" comments with fallthrough
reserved word.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-15 17:03:12 -08:00
Bruce Allan 752eee0678 ice: remove unnecessary fallthrough comments
Fallthrough comments are used to explicitly indicate the code is intended
to flow from one case statement to the next in a switch statement rather
than break out of the switch statement.  They are only needed when a case
has one or more statements to execute before falling through to the next
case, not when there is a list of cases for which the same statement(s)
should be executed.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-15 16:56:48 -08:00
Tony Nguyen 4ee656bba8 ice: Trivial fixes
This is a collection of trivial fixes including fixing whitespace, typos,
function headers, reverse Christmas tree, etc.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12 11:49:12 -08:00
Tony Nguyen 0a6ea04e3b ice: Remove possible null dereference
Commit 1f45ebe0d8 ("ice: add extra check for null Rx descriptor") moved
the call to ice_construct_skb() under a null check as Coverity reported a
possible use of null skb. However, the original call was not deleted, do so
now.

Fixes: 1f45ebe0d8 ("ice: add extra check for null Rx descriptor")
Reported-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-02-12 11:48:51 -08:00
Mitch Williams 1f45ebe0d8 ice: add extra check for null Rx descriptor
In the case where the hardware gives us a null Rx descriptor, it is
theoretically possible that we could call one of our skb-construction
functions with no data pointer, which would cause a panic.

In real life, this will never happen - we only get null RX
descriptors as the final descriptor in a chain of otherwise-valid
descriptors. When this happens, the skb will be extant and we'll just
call ice_add_rx_frag(), which can deal with empty data buffers.

Unfortunately, Coverity does not have intimate knowledge of our
hardware, so we must add a check here.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-03 16:08:33 -08:00
Brett Creeley a54e3b8cff ice: Support UDP segmentation offload
Based on the work done by Alex Duyck on other Intel drivers, add code to
support UDP segmentation offload (USO) for the ice driver.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2020-01-03 16:08:32 -08:00
Krzysztof Kazimierczak 133f4883f9 ice: Get rid of ice_cleanup_header
ice_cleanup_hdrs() has been stripped of most of its content, it only serves
as a wrapper for eth_skb_pad(). We can get rid of it altogether and
simplify the codebase.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-06 16:41:45 -08:00
Maciej Fijalkowski aaf27254fd ice: add build_skb() support
Driver is now prepared for building the skb around the existing Rx
buffer, so introduce the ice_build_skb responsible for it. Make use of
XDP's data_meta as well.

I've observed around 30% less CPU consumption with build_skb Rx path, in
comparison to legacy Rx. What stands behind such result is the avoidance
of flow_dissector (which we were diving into via eth_get_headlen) and no
memcpy calls.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 13:09:59 -08:00
Maciej Fijalkowski 59bb080805 ice: introduce frame padding computation logic
Take into account the underlying architecture specific settings and
based on that calculate the possible padding that can be supplied.
Typically, for x86 and standard MTU size we will end up with 192 bytes
of headroom. This is the same behavior as our other drivers have and we
can dedicate it for XDP purposes.

Furthermore, introduce the Rx ring flag for indicating whether build_skb
is used on particular. Based on that invoke the routines for padding
calculation.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 13:09:50 -08:00
Maciej Fijalkowski 7237f5b0db ice: introduce legacy Rx flag
Add an ethtool "legacy-rx" priv flag for toggling the Rx path. This
control knob will be mainly used for build_skb usage as well as buffer
size/MTU manipulation.

In preparation for adding build_skb support in a way that it takes
care of how we set the values of max_frame and rx_buf_len fields of
struct ice_vsi. Specifically, in this patch mentioned fields are set to
values that will allow us to provide headroom and tailroom in-place.

This can be mostly broken down onto following:
- for legacy-rx "on" ethtool control knob, old behaviour is kept;
- for standard 1500 MTU size configure the buffer of size 1536, as
  network stack is expecting the NET_SKB_PAD to be provided and
  NET_IP_ALIGN can have a non-zero value (these can be typically equal
  to 32 and 2, respectively);
- for larger MTUs go with max_frame set to 9k and configure the 3k
  buffer in case when PAGE_SIZE of underlying arch is less than 8k; 3k
  buffer is implying the need for order 1 page, so that our page
  recycling scheme can still be applied;

With that said, substitute the hardcoded ICE_RXBUF_2048 and PAGE_SIZE
values in DMA API that we're making use of with rx_ring->rx_buf_len and
ice_rx_pg_size(rx_ring). The latter is an introduced helper for
determining the page size based on its order (which was figured out via
ice_rx_pg_order). Last but not least, take care of truesize calculation.

In the followup patch the headroom/tailroom computation logic will be
introduced.

This change aligns the buffer and frame configuration with other Intel
drivers, most importantly with iavf.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 13:09:46 -08:00
Krzysztof Kazimierczak 2d4238f556 ice: Add support for AF_XDP
Add zero copy AF_XDP support.  This patch adds zero copy support for
Tx and Rx; code for zero copy is added to ice_xsk.h and ice_xsk.c.

For Tx, implement ndo_xsk_wakeup. As with other drivers, reuse
existing XDP Tx queues for this task, since XDP_REDIRECT guarantees
mutual exclusion between different NAPI contexts based on CPU ID. In
turn, a netdev can XDP_REDIRECT to another netdev with a different
NAPI context, since the operation is bound to a specific core and each
core has its own hardware ring.

For Rx, allocate frames as MEM_TYPE_ZERO_COPY on queues that AF_XDP is
enabled.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Co-developed-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 12:01:55 -08:00
Krzysztof Kazimierczak 0891d6d4b1 ice: Move common functions to ice_txrx_lib.c
In preparation of AF XDP, move functions that will be used both by skb and
zero-copy paths to a new file called ice_txrx_lib.c.  This allows us to
avoid using ifdefs to control the staticness of said functions.

Move other functions (ice_rx_csum, ice_rx_hash and ice_ptype_to_htype)
called only by the moved ones to the new file as well.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 11:45:05 -08:00
Maciej Fijalkowski efc2214b60 ice: Add support for XDP
Add support for XDP. Implement ndo_bpf and ndo_xdp_xmit.  Upon load of
an XDP program, allocate additional Tx rings for dedicated XDP use.
The following actions are supported: XDP_TX, XDP_DROP, XDP_REDIRECT,
XDP_PASS, and XDP_ABORTED.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-11-04 10:23:59 -08:00
Jesse Brandeburg d27525ec1f ice: small efficiency fixes
Add a small bit of efficiency to the code by adding a
prefetch of the port_info structure in order to help
avoid a cache miss a little later on in execution.

Also add an unlikely statement to a branch which
generally will never happen in normal operation.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-05 08:13:40 -07:00
Jesse Brandeburg 6503b65930 ice: move code closer together
This is a simple patch to move the assignment to a local variable
closer to the site where the local variable is used.  This
can help readability and also maybe performance, although the
performance enhancement is really dependent upon the compiler.

No functional change.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-05 08:13:40 -07:00
Jesse Brandeburg 2fb0821fd5 ice: clean up arguments
There are a couple of functions that don't need two arguments
passed in when the second argument already had access to
the pointer pointed to by the first.

Remove the unnecessary arguments.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-09-05 08:13:40 -07:00
Dave Ertman 0c3a6101ff ice: Allow egress control packets from PF_VSI
For control packets (i.e. LLDP packets) to be able to egress
from the main VSI, a bit has to be set in the TX_descriptor.
This should only be done for the main VSI and only if the
FW LLDP agent is disabled.  A bit to allow this also has to
be set in the VSI context.

Add the logic to add the necessary bits in the VSI context
for the PF_VSI and the TX_descriptors for control packets
egressing the PF_VSI.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-23 09:17:45 -07:00
Mitch Williams ac6f733a7b ice: allow empty Rx descriptors
In some circumstances, the hardware will hand us a receive descriptor
which has no data attached, but is otherwise valid. The receive code was
improperly ignoring these descriptors, which result in an infinite loop.

To fix this, change the receive code to process all descriptors,
regardless of the size of the associated data. Add checks to the
memory-handling functions to allow for zero size.

Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20 14:30:37 -07:00
Brett Creeley 2ab28bb04c ice: Set WB_ON_ITR when we don't re-enable interrupts
Currently when busy polling is enabled we aren't setting/enabling
WB_ON_ITR in the driver. This doesn't break the driver, but it does
cause issues. If we don't enable WB_ON_ITR mode we will still get
write-backs from hardware during polling when a cache line has been
filled, but if a cache line is not filled we will not get the
write-back because WB_ON_ITR is not set. Fix this by enabling
WB_ON_ITR in the driver when interrupts are disabled.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20 14:21:21 -07:00
Brett Creeley 9118fcd525 ice: Assume that more than one Rx queue is rare in ice_napi_poll
Currently we divide budget by the number of Rx queues per Rx ring
container in ice_napi_poll even if there is only 1. This is an
unnecessary divide for the normal case of 1 Rx ring per Rx ring
container. Fix this by using an unlikely() call in the case where we
actually need to divide.

Also, we will always set budget_per_ring even if there are no Rx rings
in the Rx ring container so we don't need to initialize it to 0.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-08-20 12:28:35 -07:00
Brett Creeley ba880734ba ice: Remove unnecessary flag ICE_FLAG_MSIX_ENA
This flag is not needed and is called every time we re-enable interrupts
in the hotpath so remove it. Also remove ice_vsi_req_irq() because it
was a wrapper function for ice_vsi_req_irq_msix() whose sole purpose was
checking the ICE_FLAG_MSIX_ENA flag.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:41:01 -07:00
Brett Creeley a1e9968593 ice: Remove duplicate code in ice_alloc_rx_bufs
Currently if the call to ice_alloc_mapped_page() fails we jump to the
no_buf label, possibly call ice_release_rx_desc(), and return true
indicating that there is more work to do. In the success case we just
fall out of the while loop, possibly call ice_alloc_mapped_page(), and
return false saying we exhausted cleaned_count. This flow can be
improved by breaking if ice_alloc_mapped_page() fails and then the flow
outside of the while loop is the same for the failure and success case.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:52 -07:00
Brett Creeley cb7db35641 ice: Only bump Rx tail and release buffers once per napi_poll
Currently we bump the Rx tail and release/give buffers to hardware every
16 descriptors. This causes us to bump Rx tail up to 4 times per
napi_poll call. Also we are always bumping tail on an odd index and this
is a problem because hardware ignores the lower 3 bits in the QRX_TAIL
register. This is making it so hardware sees tail bumps only every 8
descriptors. Instead lets only bump Rx tail once per napi_poll if
the value aligns with hardware's expectations of the lower 3 bits being
cleared. Also only release/give Rx buffers once per napi_poll call.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-07-31 13:40:30 -07:00
Matthew Wilcox (Oracle) d7840976e3 net: Use skb accessors in network drivers
In preparation for unifying the skb_frag and bio_vec, use the fine
accessors which already exist and use skb_frag_t instead of
struct skb_frag_struct.

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2019-07-22 20:47:56 -07:00
Anirudh Venkataramanan 2f2da36ebf ice: Trivial cosmetic changes
This patch mostly capitalizes abbreviations in code comments. Fixed some
typos and removed some unnecessary newlines as well.

Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-30 10:57:55 -07:00
Bruce Allan e65e9e1566 ice: Resolve static analysis warning
Some static analysis tools can complain when doing a bitop assignment using
operands of different sizes. Fix that.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2019-05-29 02:36:58 -07:00