Make use of the "shared port table" to convert i40e to the new
infra.
i40e did not have any reference tracking, locking is also dodgy
because rtnl gets released while talking to FW, so port may get
removed from the table while it's getting added etc.
On the good side i40e does not seem to be using the ports for
TX so we can remove the table from the driver state completely.
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This takes care of all of the trivial W=1 fixes in the Intel
Ethernet drivers, which allows developers and maintainers to
build more of the networking tree with more complete warning
checks.
There are three classes of kdoc warnings fixed:
- cannot understand function prototype: 'x'
- Excess function parameter 'x' description in 'y'
- Function parameter or member 'x' not described in 'y'
All of the changes were trivial comment updates on
function headers.
Inspired by Lee Jones' series of wireless work to do the same.
Compile tested only, and passes simple test of
$ git ls-files *.[ch] | egrep drivers/net/ethernet/intel | \
xargs scripts/kernel-doc -none
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The i40e NIC supports two flavors of HW descriptors, 16 and 32
byte. The latter has, obviously, room for more offloading
information. However, the only fields of the 32B HW descriptor that is
being used by the driver, is also available in the 16B descriptor.
In other words; Reading and writing 32 bytes instead of 16 byte is a
waste of bus bandwidth.
This commit starts using 16 byte descriptors instead of 32 byte
descriptors.
For AF_XDP the rx_drop benchmark was improved by 2%.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After commit 759dc4a7e6 ("i40e: initialize our affinity_mask
based on cpu_possible_mask"), NAPI IRQ affinity_mask is bind to
all possible CPUs, not a fixed CPU
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Rename the AF_XDP zero-copy driver interface functions to better
reflect what they do after the replacement of umems with buffer
pools in the previous commit. Mostly it is about replacing the
umem name from the function names with xsk_buff and also have
them take the a buffer pool pointer instead of a umem. The
various ring functions have also been renamed in the process so
that they have the same naming convention as the internal
functions in xsk_queue.h. This so that it will be clearer what
they do and also for consistency.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-3-git-send-email-magnus.karlsson@intel.com
Replace the explicit umem reference passed to the driver in AF_XDP
zero-copy mode with the buffer pool instead. This in preparation for
extending the functionality of the zero-copy mode so that umems can be
shared between queues on the same netdev and also between netdevs. In
this commit, only an umem reference has been added to the buffer pool
struct. But later commits will add other entities to it. These are
going to be entities that are different between different queue ids
and netdevs even though the umem is shared between them.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Link: https://lore.kernel.org/bpf/1598603189-32145-2-git-send-email-magnus.karlsson@intel.com
Now that BPF program/link management is centralized in generic net_device
code, kernel code never queries program id from drivers, so
XDP_QUERY_PROG/XDP_QUERY_PROG_HW commands are unnecessary.
This patch removes all the implementations of those commands in kernel, along
the xdp_attachment_query().
This patch was compile-tested on allyesconfig.
Signed-off-by: Andrii Nakryiko <andriin@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200722064603.3350758-10-andriin@fb.com
The method struct pci_error_handlers.error_detected() is defined and
documented as taking an 'enum pci_channel_state' for the second argument,
but most drivers use 'pci_channel_state_t' instead.
This 'pci_channel_state_t' is not a typedef for the enum but a typedef for
a bitwise type in order to have better/stricter typechecking.
Consolidate everything by using 'pci_channel_state_t' in the method's
definition, in the related helpers and in the drivers.
Enforce use of 'pci_channel_state_t' by replacing 'enum pci_channel_state'
with an anonymous 'enum'.
Note: Currently, from a typechecking point of view this patch changes
nothing because only the constants defined by the enum are bitwise, not the
enum itself (sparse doesn't have the notion of 'bitwise enum'). This may
change in some not too far future, hence the patch.
[bhelgaas: squash in
https://lore.kernel.org/r/20200702162651.49526-3-luc.vanoostenryck@gmail.comhttps://lore.kernel.org/r/20200702162651.49526-4-luc.vanoostenryck@gmail.com]
Link: https://lore.kernel.org/r/20200702162651.49526-2-luc.vanoostenryck@gmail.com
Signed-off-by: Luc Van Oostenryck <luc.vanoostenryck@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Prior to this, only Rx and Tx ring statistics were accounted for.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
After OS requests to down a link on a physical network port, the
traffic is no longer being processed but the physical link with
a link partner is still established.
Currently there is a feature (Link down on close) which allows
to physically bring the link down (after OS request).
With this patch new feature with similar capability is introduced:
TOTAL_PORT_SHUTDOWN
Allows to physically disable the link on the NIC's port.
If enabled, (after link down request from the OS)
no link, traffic or led activity is possible on that port.
If I40E_FLAG_TOTAL_PORT_SHUTDOWN is enabled, the
I40E_FLAG_LINK_DOWN_ON_CLOSE_ENABLED must be explicitly forced to
true and cannot be disabled at that time.
The functionalities are exclusive in terms of configuration, but
they also have similar behavior (allowing to disable physical link
of the port), with following differences:
- LINK_DOWN_ON_CLOSE_ENABLED is configurable at host OS run-time
and is supported by whole family of 7xx Intel Ethernet Controllers
- TOTAL_PORT_SHUTDOWN may be enabled only before OS loads (in BIOS)
only if motherboard's BIOS and NIC's FW has support of it
- when LINK_DOWN_ON_CLOSE_ENABLED is used, the link is being brought
down by sending phy_type=0 to NIC's FW
- when TOTAL_PORT_SHUTDOWN is used, phy_type is not altered, instead
the link is being brought down by clearing bit
(I40E_AQ_PHY_ENABLE_LINK) in abilities field of
i40e_aq_set_phy_config structure
Introduced changes:
- new private flag I40E_FLAG_TOTAL_PORT_SHUTDOWN for handling the
feature
- probe of NVM if the feature was enabled at driver's port
initialization
- special handling on link-down procedure to let FW physically
shutdown the port if the feature was enabled
Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Convert all the remaining 'fall through" code comments to the newer
'fallthrough;' keyword.
Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2020-06-25
This series contains updates to i40e driver and removes the individual
driver versions from all of the Intel wired LAN drivers.
Shiraz moves the client header so that it can easily be shared between
the i40e LAN driver and i40iw RDMA driver.
Jesse cleans up the unused defines, since they are just dead weight.
Alek reduces the unreasonably long wait time for a PF reset after reboot
by using jiffies to limit the maximum wait time for the PF reset to
succeed. Added additional logging to let the user know when the driver
transitions into recovery mode. Adds new device support for our 5 Gbps
NICs.
Todd adds a check to see if MFS is set after warm reboot and notifies
the user when MFS is set to anything lower than the default value.
Arkadiusz fixes a possible race condition, where were holding a
spin-lock while in atomic context.
v2: removed code comments that were no longer applicable in patch 2 of
the series. Also removed 'inline' from patch 4 and patch 8 of the
series. Also re-arranged code to be able to remove the forward
function declarations. Dropped patch 9 of the series, while the
author works on cleaning up the commit message.
v3: Updated patch 8 description to answer Jakub's questions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
As with other networking drivers, remove the unnecessary driver version
from the Intel drivers. The ethtool driver information and module version
will then report the kernel version instead.
For ixgbe, i40e and ice drivers, the driver passes the driver version to
the firmware to confirm that we are up and running. So we now pass the
value of UTS_RELEASE to the firmware. This adminq call is required per
the HAS document. The Device then sends an indication to the BMC that the
PF driver is present. This is done using Host NC Driver Status Indication
in NC-SI Get Link command or via the Host Network Controller Driver Status
Change AEN.
What the BMC may do with this information is implementation-dependent, but
this is a standard NC-SI 1.1 command we honor per the HAS.
CC: Bruce Allan <bruce.w.allan@intel.com>
CC: Jesse Brandeburg <jesse.brandeburg@intel.com>
CC: Alek Loktionov <aleksandr.loktionov@intel.com>
CC: Kevin Liedtke <kevin.d.liedtke@intel.com>
CC: Aaron Rowden <aaron.f.rowden@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Co-developed-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
A customer was chain-booting to provision his systems and one of the
steps was setting MFS. MFS isn't cleared by normal warm reboots
(clearing requires a GLOBR) and there was no indication of why Jumbo
Frame receives were failing.
Add a warning if MFS is set to anything lower than the default.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Detect and log information about pre-recovery mode when firmware
transitions to a recovery mode.
When a firmware transitions to a recovery mode it stores a number
of unexpected EMP resets in one of its registers. The number of EMP
resets ranging from 0x21 to 0x2A indicates that FW transitions
to recovery mode. Use these values to emit log entry about transition
process. Previously the pre-recovery mode may not have been detected
and there was no log entry when NIC was in pre-recovery mode.
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use jiffies to limit max waiting time for PF reset to succeed.
Previous wait loop was unreliable. It required unreasonably long time
to wait for PF reset after reboot when NIC was about to enter
recovery mode
Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove all the unused defines as they are just dead weight.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
READ_ONCE should be used when reading rings prior to accessing the
statistics pointer. Introduce this as well as the corresponding WRITE_ONCE
usage when allocating and freeing the rings, to ensure protected access.
Signed-off-by: Ciara Loftus <ciara.loftus@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since snprintf() returns the would-be-output size instead of the
actual output size, the succeeding calls may go beyond the given
buffer limit. Fix it by replacing with scnprintf().
Signed-off-by: Takashi Iwai <tiwai@suse.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Remove MEM_TYPE_ZERO_COPY in favor of the new MEM_TYPE_XSK_BUFF_POOL
APIs. The AF_XDP zero-copy rx_bi ring is now simply a struct xdp_buff
pointer.
v4->v5: Fixed "warning: Excess function parameter 'bi' description in
'i40e_construct_skb_zc'". (Jakub)
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-9-bjorn.topel@gmail.com
Continuing the path to support MEM_TYPE_XSK_BUFF_POOL, the AF_XDP
zero-copy/sk_buff rx_bi rings are now separate. Functions to properly
allocate the different rings are added as well.
v3->v4: Made i40e_fd_handle_status() static. (kbuild test robot)
v4->v5: Fix kdoc for i40e_clean_programming_status(). (Jakub)
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Cc: intel-wired-lan@lists.osuosl.org
Link: https://lore.kernel.org/bpf/20200520192103.355233-8-bjorn.topel@gmail.com
Move the AF_XDP zero-copy driver interface to its own include file
called xdp_sock_drv.h. This, hopefully, will make it more clear for
NIC driver implementors to know what functions to use for zero-copy
support.
v4->v5: Fix -Wmissing-prototypes by include header file. (Jakub)
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Link: https://lore.kernel.org/bpf/20200520192103.355233-4-bjorn.topel@gmail.com
The '==' expression itself is bool, no need to convert it to bool again.
This fixes the following coccicheck warning:
drivers/net/ethernet/intel/i40e/i40e_main.c:1614:52-57: WARNING:
conversion to bool not needed here
drivers/net/ethernet/intel/i40e/i40e_main.c:11439:52-57: WARNING:
conversion to bool not needed here
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The scope of function .ndo_tx_timeout was changed to include the hang
queue when a TX timeout event occurs. See commit 0290bd291c
("netdev: pass the stuck queue to the timeout handler") for more
details. Now, drivers don't need to identify which queue is stopped.
Drivers can simply use the queue index provided by dev_watchdog and
execute all actions needed to restore network traffic. This commit do
some cleanups into Intel i40e driver to remove a redundant loop to find
stopped queue.
Signed-off-by: Julio Faracco <jcfaracco@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Use synchronize_rcu to wait until the XSK wakeup function finishes
before destroying the resources it uses:
1. i40e_down already calls synchronize_rcu. On i40e_down either
__I40E_VSI_DOWN or __I40E_CONFIG_BUSY is set. Check the latter in
i40e_xsk_wakeup (the former is already checked there).
2. After switching the XDP program, call synchronize_rcu to let
i40e_xsk_wakeup exit before the XDP program is freed.
3. Changing the number of channels brings the interface down (see
i40e_prep_for_reset and i40e_pf_quiesce_all_vsi).
4. Disabling UMEM sets __I40E_CONFIG_BUSY, too.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20191217162023.16011-4-maximmi@mellanox.com
Changing a network device MTU can be a fairly frequent operation, and
failure to change the MTU is reflected to user-space properly, both by
an appropriate message as well as by looking at whether the device's MTU
matches the configuration.
Demote the prints to debug prints by using netdev_dbg(), making all
Intel wired LAN drivers consistent, since they used a mixture of PCI
device and network device prints before.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Implement the VF stats gathering via the kernel via ndo_get_vf_stats().
The driver will show per-VF stats in the output of the command:
ip -s link show dev <PF>
Testing Hints:
ip -s link show dev eth0
will return non-zero VF stats.
...
vf 0 MAC 00:55:aa:00:55:aa, spoof checking on, link-state enable, trust off
RX: bytes packets mcast bcast
128000 1000 104 104
TX: bytes packets
128000 1000
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Based on a series from Alexander Duyck this change adds UDP segmentation
offload support to the i40e driver.
CC: Alexander Duyck <alexander.h.duyck@intel.com>
CC: Willem de Bruijn <willemb@google.com>
Signed-off-by: Josh Hunt <johunt@akamai.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In i40e_setup_macvlans if i40e_setup_channel fails the allocated memory
for ch should be released.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch prepares ground for the next VF MAC address change fix.
It lets untrusted VF to delete any VF mac filter, but it still
doesn't let untrusted VF to add mac filter not setup by PF.
It removes information duplication in num_mac mac filters counter.
And improves exact h/w mac filters usage checking in the
i40e_check_vf_permission() function by counting mac2add_cnt.
It also improves logging because now all mac addresses will be validated
first and corresponding messages will be logged.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Since commit 92418fb147 ("i40e/i40evf: Use usec value instead of reg
value for ITR defines") the driver tracks the interrupt throttling
intervals in single usec units, although the actual ITRN/ITR0 registers are
programmed in 2 usec units. Most register programming flows in the driver
correctly handle the conversion, although it is currently not applied when
the registers are initialized to their default values. Most of the time
this doesn't present a problem since the default values are usually
immediately overwritten through the standard adaptive throttling mechanism,
or updated manually by the user, but if adaptive throttling is disabled and
the interval values are left alone then the incorrect value will persist.
Since the intended default interval of 50 usecs (vs. 100 usecs as
programmed) performs better for most traffic workloads, this can lead to
performance regressions.
This patch adds the correct conversion when writing the initial values to
the ITRN registers.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When variable "req_fec, fec, an" are empty,
dmesg shows log with "Requested FEC: , Negotiated FEC: , Autoneg:".
Add link dmesg log for cards without FEC.
Signed-off-by: Czeslaw Zagorski <czeslawx.zagorski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The "Negotiated" string in i40e_print_link_message() function was missed.
This string has been added to the dmesg and small refactoring done removing
common substrings and unifying link status message format.
Without this patch it was not clear that FEC is related to negotiated FEC.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
While testing VF spawn/destroy the following panic occurred.
BUG: unable to handle kernel NULL pointer dereference at 0000000000000029
[...]
Workqueue: i40e i40e_service_task [i40e]
RIP: 0010:i40e_sync_vsi_filters+0x6fd/0xc60 [i40e]
[...]
Call Trace:
? __switch_to_asm+0x35/0x70
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x35/0x70
? _cond_resched+0x15/0x30
i40e_sync_filters_subtask+0x56/0x70 [i40e]
i40e_service_task+0x382/0x11b0 [i40e]
? __switch_to_asm+0x41/0x70
? __switch_to_asm+0x41/0x70
process_one_work+0x1a7/0x3b0
worker_thread+0x30/0x390
? create_worker+0x1a0/0x1a0
kthread+0x112/0x130
? kthread_bind+0x30/0x30
ret_from_fork+0x35/0x40
Investigation revealed a race where pf->vf[vsi->vf_id].trusted may get
accessed by the watchdog via i40e_sync_filters_subtask() although
i40e_free_vfs() already free'd pf->vf.
To avoid this the call to i40e_sync_vsi_filters() in
i40e_sync_filters_subtask() needs to be guarded by __I40E_VF_DISABLE,
which is also used by i40e_free_vfs().
Note: put the __I40E_VF_DISABLE check after the
__I40E_MACVLAN_SYNC_PENDING check as the latter is more likely to
trigger.
CC: stable@vger.kernel.org
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Daniel Borkmann says:
====================
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add the ability to use unaligned chunks in the AF_XDP umem. By
relaxing where the chunks can be placed, it allows to use an
arbitrary buffer size and place whenever there is a free
address in the umem. Helps more seamless DPDK AF_XDP driver
integration. Support for i40e, ixgbe and mlx5e, from Kevin and
Maxim.
2) Addition of a wakeup flag for AF_XDP tx and fill rings so the
application can wake up the kernel for rx/tx processing which
avoids busy-spinning of the latter, useful when app and driver
is located on the same core. Support for i40e, ixgbe and mlx5e,
from Magnus and Maxim.
3) bpftool fixes for printf()-like functions so compiler can actually
enforce checks, bpftool build system improvements for custom output
directories, and addition of 'bpftool map freeze' command, from Quentin.
4) Support attaching/detaching XDP programs from 'bpftool net' command,
from Daniel.
5) Automatic xskmap cleanup when AF_XDP socket is released, and several
barrier/{read,write}_once fixes in AF_XDP code, from Björn.
6) Relicense of bpf_helpers.h/bpf_endian.h for future libbpf
inclusion as well as libbpf versioning improvements, from Andrii.
7) Several new BPF kselftests for verifier precision tracking, from Alexei.
8) Several BPF kselftest fixes wrt endianess to run on s390x, from Ilya.
9) And more BPF kselftest improvements all over the place, from Stanislav.
10) Add simple BPF map op cache for nfp driver to batch dumps, from Jakub.
11) AF_XDP socket umem mapping improvements for 32bit archs, from Ivan.
12) Add BPF-to-BPF call and BTF line info support for s390x JIT, from Yauheni.
13) Small optimization in arm64 JIT to spare 1 insns for BPF_MOD, from Jerin.
14) Fix an error check in bpf_tcp_gen_syncookie() helper, from Petar.
15) Various minor fixes and cleanups, from Nathan, Masahiro, Masanari,
Peter, Wei, Yue.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Driver waits after issuing a reset. When a reset takes too long a driver
gives up. Implemented by invoking PF reset in a loop. After defined
number of unsuccessful PF reset trials it returns error.
Without this patch PF reset fails when NIC is in recovery mode.
So make i40e_set_mac_type() public. i40e driver requires i40e_set_mac_type()
to be public. It is required for recovery mode handling. Without this patch
recovery mode could not be detected in i40e_probe().
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch removes function i40e_update_dcb_config(). Instead of
i40e_update_dcb_config() we use i40e_init_dcb(), which implements the
correct NVM read.
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The stats structure for the VEB switch statistics is reset periodically,
but the tc_stats are not reset at the same time.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Function check_recovery_mode had wrong if statement.
Now we check proper FWS1B register values, which are responsible for
the recovery mode. Recovery mode has 4 values for x710 and 2 for x722.
That's why we need 6 different flags which are defined in the code.
Now in the if statement, we recognize type of mac address
and register value.
Without those changes driver could show wrong state.
Signed-off-by: Adrian Podlawski <adrian.podlawski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When i40e_configure_tx_ring(vsi->tx_rings[i]) returns an error, we should
exit from i40e_vsi_configure_tx and return the error, instead of continuing
to check whether xdp is enable, and configure the xdp transmit ring.
Signed-off-by: huhai <huhai@kylinos.cn>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This commit replaces ndo_xsk_async_xmit with ndo_xsk_wakeup. This new
ndo provides the same functionality as before but with the addition of
a new flags field that is used to specifiy if Rx, Tx or both should be
woken up. The previous ndo only woke up Tx, as implied by the
name. The i40e and ixgbe drivers (which are all the supported ones)
are updated with this new interface.
This new ndo will be used by the new need_wakeup functionality of XDP
sockets that need to be able to wake up both Rx and Tx driver
processing.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Add log when PF is entering and leaving allmulti mode. The
change of PF state is visible in dmesg now. Without this commit,
entering and leaving allmulti mode is not logged in dmesg.
Signed-off-by: Czeslaw Zagorski <czeslawx.zagorski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add logging for disable-fw-lldp flag by ethtool. Added check
for I40E_FLAG_DISABLE_FW_LLDP and logging state in dmesg.
Without this commit there was no clear statement in dmesg
about FW LLDP state in dmesg.
Signed-off-by: Slawomir Laba <slawomirx.laba@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes missing call of i40e_update_veb_stats() in function
i40e_get_ethtool_stats() to update stats data of VEB and VEB TC
counters before they are written into ethtool buffer.
Before the patch ethtool counters may fell behind interface counters.
Signed-off-by: Dmitrii Golovanov <dmitrii.golovanov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Instead of using to_pci_dev + pci_get_drvdata,
use dev_get_drvdata to make code simpler.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
And any other existing fields in this structure that refer to tc.
Specifically:
* tc_cls_flower_offload_flow_rule() to flow_cls_offload_flow_rule().
* TC_CLSFLOWER_* to FLOW_CLS_*.
* tc_cls_common_offload to tc_cls_common_offload.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch updates flow_block_cb_setup_simple() to use the flow block API.
Several drivers are also adjusted to use it.
This patch introduces the per-driver list of flow blocks to account for
blocks that are already in use.
Remove tc_block_offload alias.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers do the same thing to set up the flow block callbacks, this
patch adds a helper function to do this.
This preparation patch reduces the number of changes to adapt the
existing drivers to use the flow block callback API.
This new helper function takes a flow block list per-driver, which is
set to NULL until this driver list is used.
This patch also introduces the flow_block_command and
flow_block_binder_type enumerations, which are renamed to use
FLOW_BLOCK_* in follow up patches.
There are three definitions (aliases) in order to reduce the number of
updates in this patch, which go away once drivers are fully adapted to
use this flow block API.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables macvlan offloads for i40e. The idea is to use
channels as macvlan interfaces. The channels are VSIs of
type VMDQ. When the first macvlan is created, the maximum number of
channels possible are created. From then on, as a macvlan interface
is created, a macvlan filter is added to these already created
channels (VSIs).
This patch utilizes subordinate device traffic classes to make queue
groups(channels) available for an upper device like a macvlan.
Steps to configure macvlan offloads:
1. ethtool -K ethx l2-fwd-offload on
2. ip link add link ethx name macvlan1 type macvlan
3. ip addr add <address> dev macvlan1
4. ip link set macvlan1 up
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It was found that the string that prints our copyright was
not up to date. Updating to reflect our copyright.
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Changing descriptor count via 'ethtool -G' is not persistent across resets.
When PF reset occurs, we roll back to the default value of vsi->num_desc,
which is used then in i40e_alloc_rings to set descriptor count. XDP does a
PF reset so when user has changed the descriptor count and load XDP
program, the default count will be back there.
To fix this:
* introduce new VSI members - num_tx_desc and num_rx_desc in favour of
num_desc
* set them in i40e_set_ringparam to user's values
* set them to default values in i40e_set_num_rings_in_vsi only when they
don't have previous values
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes reading f/w LLDP agent status at DCB init time.
It's done by removing direct NVM reading in i40e_update_dcb_config()
and checking whether f/w LLDP agent is disabled via
I40E_FLAG_DISABLE_FW_LLDP flag in i40e_init_pf_dcb(). The function
i40e_update_dcb_config() in i40e_main.c is a temporary solution which
will be later renamed to i40e_init_dcb() in the i40e_dcb module. Also
logging was extended to make visible if f/w LLDP agent is running or not
and always log a message when DCB was not initialized. Without this
patch for new f/w versions f/w LLDP agent status was always read
from NVM as disabled and DCB initialization failed without
clear reason in logs.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Generate log entry when TC0 is created or deleted.
Log entry is generated during main VSI setup.
Before there was no log info about adding or deleting TC0.
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The tx_errors statistic was being calculated twice in
i40e_update_eth_stats.
This appears to be as of commit 201db2898f2c ("i40e: add missing VSI
statistics", 2014-03-25).
Remove the extra i40e_stat_update32 call for GLV_TEPC.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes the problem with a kernel panic occurring when trying
to bind the i40e driver to a non-i40e port. The problem is fixed by
checking if the BAR size in the device is large enough by reading the
highest register.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add veb array access boundary checks.
Ensure veb array index is smaller than I40E_MAX_VEB.
Signed-off-by: Piotr Kwapulinski <piotr.kwapulinski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
New device ids are created to support X710 backplane and SFP+ cards.
This patch adds in i40e driver support for 2.5GbaseT and 5GbaseT speed.
It's implemented by checking I40E_CAP_PHY_TYPE_2_5GBASE_T,
I40E_CAP_PHY_TYPE_5GBASE_T bits from f/w and setting corresponding bits
in ethtool link ksettings supported and advertising masks.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a bug where changing VLAN to 0 was not set until VF
restart.
Now we are setting pvid info to 0 when we have to change VLAN to 0.
Without this change when VF VLAN was changed to 0 nothing happened until
VF restart. For changing to VLAN different than 0 it worked correctly.
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
New device ids are created to support X710/XXV710 N3000 cards.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces "recovery mode" to the i40e driver. It is
part of a new Any2Any idea of upgrading the firmware. In this
approach, it is required for the driver to have support for
"transition firmware", that is used for migrating from structured
to flat firmware image. In this new, very basic mode, i40e driver
must be able to handle particular IOCTL calls from the NVM Update
Tool and run a small set of AQ commands.
These additional AQ commands are part of the interface used by
the NVMUpdate tool. The NVMUpdate tool contains all of the
necessary logic to reference these new AQ commands. The end user
experience remains the same, they are using the NVMUpdate tool to
update the NVM contents.
Signed-off-by: Alice Michael <alice.michael@intel.com>
Signed-off-by: Piotr Marczak <piotr.marczak@intel.com>
Tested-by: Don Buchholz <donald.buchholz@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Printing each devices PCI vendor and device ID has the advantage of
easily revealing what hardware we're dealing with exactly. It's no
longer necessary to match the PCI bus information to the lspci output.
Helps with bug reports where no lspci output is available.
Output before
i40e 0000:08:00.0: fw 6.1.49420 api 1.7 nvm 6.80 0x80003c64 1.2007.0
and after
i40e 0000:08:00.0: fw 6.1.49420 api 1.7 nvm 6.80 0x80003c64 1.2007.0 [8086:1572] [8086:0004]
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Just bumping the version number appropriately.
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This code implements driver code changes necessary for LLDP
Agent support. Modified i40e_aq_start_lldp() and
i40e_aq_stop_lldp() adding false parameter whether LLDP state
should be persistent across power cycles.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
TX MDD events reported on the PF are the result of the
PF misconfiguring a descriptor and not because of "bad actions"
by anything else. No need to reset now because if it
results in a Tx hang, the Tx hang check will take care of it.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch changes the driver behavior when detecting a VF MDD event.
It now disables the VF after one event, which indicates a hw detected
problem in the VF. Before this change, the PF would allow a couple of
events before doing the reset.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Added the API version in the error message for clarity.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Removed misleading messages when untrusted VF tries to
add more addresses than NIC limit
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Modify the i40e_init_dcb to return the correct error when LLDP or DCBX
is not in operational state.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the case where PTP is running on the hardware clock, but the kernel
system time is not being synced, a device reset can mess up the clock
time.
This occurs because we reset the clock time based on the kernel time
every reset. This causes us to potentially completely reset the PTP
time, and can cause unexpected behavior in programs like ptp4l.
Avoid this by saving the PTP time prior to device reset, and then
restoring using that time after the reset.
Directly restoring the PTP time we saved isn't perfect, because time
should have continued running, but the clock will essentially be stopped
during the reset. This is still better than the current solution of
assuming that the PTP HW clock is synced to the CLOCK_REALTIME.
We can do even better, by saving the ktime and calculating
a differential, using ktime_get(). This is based on CLOCK_MONOTONIC, and
allows us to get a fairly precise measure of the time difference between
saving and restoring the time.
Using this, we can update the saved PTP time, and use that as the value
to write to the hardware clock registers. This, of course is not perfect.
However, it does help ensure that the PTP time is restored as close as
feasible to the time it should have been if the reset had not occurred.
During device initialization, continue using the system time as the
source for the creation of the PTP clock, since this is the best known
current time source at driver load.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Modifying the VLAN stripping options when a port VLAN is configured
will break traffic for the VSI, and conceptually doesn't make sense,
so don't allow this.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces DDP (Dynamic Device Personalization) which allows
loading profiles that change the way internal parser interprets processed
frames. To load DDP profiles it utilizes ethtool flash feature. The files
with recipes must be located in /var/lib/firmware directory. Afterwards
the recipe can be loaded by invoking:
ethtool -f <if_name> <file_name> 100
ethtool -f <if_name> - 100
See further details of this feature in the i40e documentation, or
visit
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
The driver shall verify DDP profile can be loaded in accordance with
the rules:
* Package with Group ID 0 are exclusive and can only be loaded the first.
* Packages with Group ID 0x01-0xFE can only be loaded simultaneously
with the packages from the same group.
* Packages with Group ID 0xFF are compatible with all other packages.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Added a new local variable in the i40e_setup_tc function named
old_queue_pairs so num_queue_pairs can be restored to the correct
value in case configuring queue channels fails. Additionally, moved
the exit label in the i40e_setup_tc function so the if (need_reset)
block can be executed.
Also, fixed data packing in the i40e_setup_tc function.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In commit f3fef2b6e1 ("i40e: Remove umem from VSI") a regression was
introduced; When the VSI was reset, the setup code would try to enable
AF_XDP ZC unconditionally (as long as there was a umem placed in the
netdev._rx struct). Here, we add a bitmap to the VSI that tracks if a
certain queue pair has been "zero-copy enabled" via the ndo_bpf. The
bitmap is used in i40e_xsk_umem, and enables zero-copy if and only if
XDP is enabled, the corresponding qid in the bitmap is set and the
umem is non-NULL.
Fixes: f3fef2b6e1 ("i40e: Remove umem from VSI")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The i40e_xsk_umem function was explicitly inlined in i40e.h. There is
no reason for that, so move it to i40e_main.c instead.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Three conflicts, one of which, for marvell10g.c is non-trivial and
requires some follow-up from Heiner or someone else.
The issue is that Heiner converted the marvell10g driver over to
use the generic c45 code as much as possible.
However, in 'net' a bug fix appeared which makes sure that a new
local mask (MDIO_AN_10GBT_CTRL_ADV_NBT_MASK) with value 0x01e0
is cleared.
Signed-off-by: David S. Miller <davem@davemloft.net>
When the driver clears the XDP xmit ring due to re-configuration or
teardown, in-progress ndo_xdp_xmit must be taken into consideration.
The ndo_xdp_xmit function is typically called from a NAPI context that
the driver does not control. Therefore, we must be careful not to
clear the XDP ring, while the call is on-going. This patch adds a
synchronize_rcu() to wait for napi(s) (preempt-disable regions and
softirqs), prior clearing the queue. Further, the __I40E_CONFIG_BUSY
flag is checked in the ndo_xdp_xmit implementation to avoid touching
the XDP xmit queue during re-configuration.
Fixes: d9314c474d ("i40e: add support for XDP_REDIRECT")
Fixes: 123cecd427 ("i40e: added queue pair disable/enable functions")
Reported-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When the RX rings are created they are also populated with buffers
so that packets can be received. Usually these are kernel buffers,
but for AF_XDP in zero-copy mode, these are user-space buffers and
in this case the application might not have sent down any buffers
to the driver at this point. And if no buffers are allocated at ring
creation time, no packets can be received and no interrupts will be
generated so the NAPI poll function that allocates buffers to the
rings will never get executed.
To rectify this, we kick the NAPI context of any queue with an
attached AF_XDP zero-copy socket in two places in the code. Once
after an XDP program has loaded and once after the umem is registered.
This take care of both cases: XDP program gets loaded first then AF_XDP
socket is created, and the reverse, AF_XDP socket is created first,
then XDP program is loaded.
Fixes: 0a714186d3 ("i40e: add AF_XDP zero-copy Rx support")
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Commit c9b47cc1fa ("xsk: fix bug when trying to use both copy and
zero-copy on one queue id") moved the umem query code to the AF_XDP
core, and therefore removed the need to query the netdevice for a
umem.
This patch removes XDP_QUERY_XSK_UMEM and all code that implement that
behavior, which is just dead code.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch wraps the dissector key and mask - that flower uses to
represent the matching side - around the flow_match structure.
To avoid a follow up patch that would edit the same LoCs in the drivers,
this patch also wraps this new flow match structure around the flow rule
object. This new structure will also contain the flow actions in follow
up patches.
This introduces two new interfaces:
bool flow_rule_match_key(rule, dissector_id)
that returns true if a given matching key is set on, and:
flow_rule_match_XYZ(rule, &match);
To fetch the matching side XYZ into the match container structure, to
retrieve the key and the mask with one single call.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
There are several statements that have incorrect levels of indentation,
fix these.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Convert spaces to tabs to get correct alignment.
Found with the help of Coccinelle.
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds support for setting/getting FEC configuration
using ethtool options:
set/show-priv-flags rs-fec/base-r-fec
set/show-fec off/rs/baser/auto for kernels version >= 4.14
Signed-off-by: Damian Dybek <damian.dybek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Limiting RSS queues number to online CPUs number in order to
avoid issues with creating misconfigured RSS queues.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Drivers may not be able to support certain FDB entries, and an error
code is insufficient to give clear hints as to the reasons of rejection.
In order to make it possible to communicate the rejection reason, extend
ndo_fdb_add() with an extack argument. Adapt the existing
implementations of ndo_fdb_add() to take the parameter (and ignore it).
Pass the extack parameter when invoking ndo_fdb_add() from rtnl_fdb_add().
Signed-off-by: Petr Machata <petrm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
We already need to zero out memory for dma_alloc_coherent(), as such
using dma_zalloc_coherent() is superflous. Phase it out.
This change was generated with the following Coccinelle SmPL patch:
@ replace_dma_zalloc_coherent @
expression dev, size, data, handle, flags;
@@
-dma_zalloc_coherent(dev, size, handle, flags)
+dma_alloc_coherent(dev, size, handle, flags)
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
[hch: re-ran the script on the latest tree]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2018-12-20
This series contains updates to e100, igb, ixgbe, i40e and ice drivers.
I replaced spinlocks for mutex locks to reduce the latency on CPU0 for
igb when updating the statistics. This work was based off a patch
provided by Jan Jablonsky, which was against an older version of the igb
driver.
Jesus adjusts the receive packet buffer size from 32K to 30K when
running in QAV mode, to stay within 60K for total packet buffer size for
igb.
Vinicius adds igb kernel documentation regarding the CBS algorithm and
its implementation in the i210 family of NICs.
YueHaibing from Huawei fixed the e100 driver that was potentially
passing a NULL pointer, so use the kernel macro IS_ERR_OR_NULL()
instead.
Konstantin Khorenko fixes i40e where we were not setting up the
neigh_priv_len in our net_device, which caused the driver to read beyond
the neighbor entry allocated memory.
Miroslav Lichvar extends the PTP gettime() to read the system clock by
adding support for PTP_SYS_OFFSET_EXTENDED ioctl in i40e.
Young Xiao fixed the ice driver to only enable NAPI on q_vectors that
actually have transmit and receive rings.
Kai-Heng Feng fixes an igb issue that when placed in suspend mode, the
NIC does not wake up when a cable is plugged in. This was due to the
driver not setting PME during runtime suspend.
Stephen Douthit enables the ixgbe driver allow DSA devices to use the
MII interface to talk to switches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Out of bound read reported by KASan.
i40iw_net_event() reads unconditionally 16 bytes from
neigh->primary_key while the memory allocated for
"neighbour" struct is evaluated in neigh_alloc() as
tbl->entry_size + dev->neigh_priv_len
where "dev" is a net_device.
But the driver does not setup dev->neigh_priv_len and
we read beyond the neigh entry allocated memory,
so the patch in the next mail fixes this.
Signed-off-by: Konstantin Khorenko <khorenko@virtuozzo.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Lots of conflicts, by happily all cases of overlapping
changes, parallel adds, things of that nature.
Thanks to Stephen Rothwell, Saeed Mahameed, and others
for their guidance in these resolutions.
Signed-off-by: David S. Miller <davem@davemloft.net>
Drivers may not be able to implement a VLAN addition or reconfiguration.
In those cases it's desirable to explain to the user that it was
rejected (and why).
To that end, add extack argument to ndo_bridge_setlink. Adapt all users
to that change.
Following patches will use the new argument in the bridge driver.
Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
A previous commit moved the ether_addr_copy() in i40e_set_mac() before
the mac filter del/add to avoid a race. However it wasn't taken into
account that this alters the mac address being handed to
i40e_del_mac_filter().
Also changed i40e_add_mac_filter() to operate on netdev->dev_addr,
hopefully that makes the code easier to read.
Fixes: 458867b2ca ("i40e: don't remove netdev->dev_addr when syncing uc list")
Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Trivial conflict in net/core/filter.c, a locally computed
'sdif' is now an argument to the function.
Signed-off-by: David S. Miller <davem@davemloft.net>
In __i40e_del_filter function, the flag __I40E_MACVLAN_SYNC_PENDING for
the PF state is wrongly set for the VSI. Deleting any of the MAC filters
has caused the incorrect syncing for the PF. Fix it by setting this state
flag to the intended PF.
CC: stable <stable@vger.kernel.org>
Signed-off-by: Lihong Yang <lihong.yang@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If a TX hang occurs, we attempt to recover by incrementally resetting.
If we're starved for CPU time, it's possible the reset doesn't actually
complete (or even fire) before another tx_timeout fires causing us to
fly through the different resets without actually doing them.
This adds a bit to set and check if a timeout recovery is already
pending and, if so, bail out of tx_timeout. The bit will get cleared at
the end of i40e_rebuild when reset is complete.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch allows disabling FW LLDP agent on X722 devices.
It also changes a source of information for this feature from
pf->hw_features to pf->hw.flags which are set in i40e_init_adminq.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The version numbers have not been kept up to date and this is
an effort to ammend that.
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Using strncpy allows destination buffer to be not null terminated
after the copying takes place. strlcpy ensures that's not the
case by explicitly setting last element in the buffer as '\0'.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
lockdep_assert_held() is better suited to checking locking requirements,
since it won't get confused when someone else holds the lock. This is
also a step towards possibly removing spin_is_locked().
Signed-off-by: Lance Roy <ldr709@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The assignment of the feature flag NETIF_F_NTUPLE and NETIF_F_HW_TC
occurs prior to the initial setup of the local hw_features variable.
This means the features are set as user-changeable, but are not set in
the currently active feature list. This results in the features being
disabled at the driver's initial load.
Move the assignment after the initial assignment of hw_features, and
assign to the local variable. This ensures that NETIF_F_NTUPLE and
NETIF_F_HW_TC are marked as user-changeable, and also enables them by
default when the driver loads.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since commit bacd75cfac ("i40e/i40evf: Add capability exchange for
outer checksum", 2017-04-06) the i40e driver has not reported support
for IP-in-IP offloads. This likely occurred due to a bad rebase, as the
commit extracts hw_enc_features into its own variable. As part of this
change, it dropped the NETIF_F_FSO_IPXIP flags from the
netdev->hw_enc_features. This was unfortunately not caught during code
review.
Fix this by adding back the missing feature flags.
For reference, NETIF_F_GSO_IPXIP4 was added in commit 7e13318daa
("net: define gso types for IPx over IPv4 and IPv6", 2016-05-20),
replacing NETIF_F_GSO_IPIP and NETIF_F_GSO_SIT.
NETIF_F_GSO_IPXIP6 was added in commit bf2d1df395 ("intel: Add support
for IPv6 IP-in-IP offload", 2016-05-20).
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAlvPV7IUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyaUg//WnCaRIu2oKOp8c/bplZJDW5eT10d
oYAN9qeyptU9RYrg4KBNbZL9UKGFTk3AoN5AUjrk8njxc/dY2ra/79esOvZyyYQy
qLXBvrXKg3yZnlNlnyBneGSnUVwv/kl2hZS+kmYby2YOa8AH/mhU0FIFvsnfRK2I
XvwABFm2ZYvXCqh3e5HXaHhOsR88NQ9In0AXVC7zHGqv1r/bMVn2YzPZHL/zzMrF
mS79tdBTH+shSvchH9zvfgIs+UEKvvjEJsG2liwMkcQaV41i5dZjSKTdJ3EaD/Y2
BreLxXRnRYGUkBqfcon16Yx+P6VCefDRLa+RhwYO3dxFF2N4ZpblbkIdBATwKLjL
npiGc6R8yFjTmZU0/7olMyMCm7igIBmDvWPcsKEE8R4PezwoQv6YKHBMwEaflIbl
Rv4IUqjJzmQPaA0KkRoAVgAKHxldaNqno/6G1FR2gwz+fr68p5WSYFlQ3axhvTjc
bBMJpB/fbp9WmpGJieTt6iMOI6V1pnCVjibM5ZON59WCFfytHGGpbYW05gtZEod4
d/3yRuU53JRSj3jQAQuF1B6qYhyxvv5YEtAQqIFeHaPZ67nL6agw09hE+TlXjWbE
rTQRShflQ+ydnzIfKicFgy6/53D5hq7iH2l7HwJVXbXRQ104T5DB/XHUUTr+UWQn
/Nkhov32/n6GjxQ=
=58I4
-----END PGP SIGNATURE-----
Merge tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI updates from Bjorn Helgaas:
- Fix ASPM link_state teardown on removal (Lukas Wunner)
- Fix misleading _OSC ASPM message (Sinan Kaya)
- Make _OSC optional for PCI (Sinan Kaya)
- Don't initialize ASPM link state when ACPI_FADT_NO_ASPM is set
(Patrick Talbert)
- Remove x86 and arm64 node-local allocation for host bridge structures
(Punit Agrawal)
- Pay attention to device-specific _PXM node values (Jonathan Cameron)
- Support new Immediate Readiness bit (Felipe Balbi)
- Differentiate between pciehp surprise and safe removal (Lukas Wunner)
- Remove unnecessary pciehp includes (Lukas Wunner)
- Drop pciehp hotplug_slot_ops wrappers (Lukas Wunner)
- Tolerate PCIe Slot Presence Detect being hardwired to zero to
workaround broken hardware, e.g., the Wilocity switch/wireless device
(Lukas Wunner)
- Unify pciehp controller & slot structs (Lukas Wunner)
- Constify hotplug_slot_ops (Lukas Wunner)
- Drop hotplug_slot_info (Lukas Wunner)
- Embed hotplug_slot struct into users instead of allocating it
separately (Lukas Wunner)
- Initialize PCIe port service drivers directly instead of relying on
initcall ordering (Keith Busch)
- Restore PCI config state after a slot reset (Keith Busch)
- Save/restore DPC config state along with other PCI config state
(Keith Busch)
- Reference count devices during AER handling to avoid race issue with
concurrent hot removal (Keith Busch)
- If an Upstream Port reports ERR_FATAL, don't try to read the Port's
config space because it is probably unreachable (Keith Busch)
- During error handling, use slot-specific reset instead of secondary
bus reset to avoid link up/down issues on hotplug ports (Keith Busch)
- Restore previous AER/DPC handling that does not remove and
re-enumerate devices on ERR_FATAL (Keith Busch)
- Notify all drivers that may be affected by error recovery resets
(Keith Busch)
- Always generate error recovery uevents, even if a driver doesn't have
error callbacks (Keith Busch)
- Make PCIe link active reporting detection generic (Keith Busch)
- Support D3cold in PCIe hierarchies during system sleep and runtime,
including hotplug and Thunderbolt ports (Mika Westerberg)
- Handle hpmemsize/hpiosize kernel parameters uniformly, whether slots
are empty or occupied (Jon Derrick)
- Remove duplicated include from pci/pcie/err.c and unused variable
from cpqphp (YueHaibing)
- Remove driver pci_cleanup_aer_uncorrect_error_status() calls (Oza
Pawandeep)
- Uninline PCI bus accessors for better ftracing (Keith Busch)
- Remove unused AER Root Port .error_resume method (Keith Busch)
- Use kfifo in AER instead of a local version (Keith Busch)
- Use threaded IRQ in AER bottom half (Keith Busch)
- Use managed resources in AER core (Keith Busch)
- Reuse pcie_port_find_device() for AER injection (Keith Busch)
- Abstract AER interrupt handling to disconnect error injection (Keith
Busch)
- Refactor AER injection callbacks to simplify future improvments
(Keith Busch)
- Remove unused Netronome NFP32xx Device IDs (Jakub Kicinski)
- Use bitmap_zalloc() for dma_alias_mask (Andy Shevchenko)
- Add switch fall-through annotations (Gustavo A. R. Silva)
- Remove unused Switchtec quirk variable (Joshua Abraham)
- Fix pci.c kernel-doc warning (Randy Dunlap)
- Remove trivial PCI wrappers for DMA APIs (Christoph Hellwig)
- Add Intel GPU device IDs to spurious interrupt quirk (Bin Meng)
- Run Switchtec DMA aliasing quirk only on NTB endpoints to avoid
useless dmesg errors (Logan Gunthorpe)
- Update Switchtec NTB documentation (Wesley Yung)
- Remove redundant "default n" from Kconfig (Bartlomiej Zolnierkiewicz)
- Avoid panic when drivers enable MSI/MSI-X twice (Tonghao Zhang)
- Add PCI support for peer-to-peer DMA (Logan Gunthorpe)
- Add sysfs group for PCI peer-to-peer memory statistics (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA scatterlist mapping interface (Logan
Gunthorpe)
- Add PCI configfs/sysfs helpers for use by peer-to-peer users (Logan
Gunthorpe)
- Add PCI peer-to-peer DMA driver writer's documentation (Logan
Gunthorpe)
- Add block layer flag to indicate driver support for PCI peer-to-peer
DMA (Logan Gunthorpe)
- Map Infiniband scatterlists for peer-to-peer DMA if they contain P2P
memory (Logan Gunthorpe)
- Register nvme-pci CMB buffer as PCI peer-to-peer memory (Logan
Gunthorpe)
- Add nvme-pci support for PCI peer-to-peer memory in requests (Logan
Gunthorpe)
- Use PCI peer-to-peer memory in nvme (Stephen Bates, Steve Wise,
Christoph Hellwig, Logan Gunthorpe)
- Cache VF config space size to optimize enumeration of many VFs
(KarimAllah Ahmed)
- Remove unnecessary <linux/pci-ats.h> include (Bjorn Helgaas)
- Fix VMD AERSID quirk Device ID matching (Jon Derrick)
- Fix Cadence PHY handling during probe (Alan Douglas)
- Signal Cadence Endpoint interrupts via AXI region 0 instead of last
region (Alan Douglas)
- Write Cadence Endpoint MSI interrupts with 32 bits of data (Alan
Douglas)
- Remove redundant controller tests for "device_type == pci" (Rob
Herring)
- Document R-Car E3 (R8A77990) bindings (Tho Vu)
- Add device tree support for R-Car r8a7744 (Biju Das)
- Drop unused mvebu PCIe capability code (Thomas Petazzoni)
- Add shared PCI bridge emulation code (Thomas Petazzoni)
- Convert mvebu to use shared PCI bridge emulation (Thomas Petazzoni)
- Add aardvark Root Port emulation (Thomas Petazzoni)
- Support 100MHz/200MHz refclocks for i.MX6 (Lucas Stach)
- Add initial power management for i.MX7 (Leonard Crestez)
- Add PME_Turn_Off support for i.MX7 (Leonard Crestez)
- Fix qcom runtime power management error handling (Bjorn Andersson)
- Update TI dra7xx unaligned access errata workaround for host mode as
well as endpoint mode (Vignesh R)
- Fix kirin section mismatch warning (Nathan Chancellor)
- Remove iproc PAXC slot check to allow VF support (Jitendra Bhivare)
- Quirk Keystone K2G to limit MRRS to 256 (Kishon Vijay Abraham I)
- Update Keystone to use MRRS quirk for host bridge instead of open
coding (Kishon Vijay Abraham I)
- Refactor Keystone link establishment (Kishon Vijay Abraham I)
- Simplify and speed up Keystone link training (Kishon Vijay Abraham I)
- Remove unused Keystone host_init argument (Kishon Vijay Abraham I)
- Merge Keystone driver files into one (Kishon Vijay Abraham I)
- Remove redundant Keystone platform_set_drvdata() (Kishon Vijay
Abraham I)
- Rename Keystone functions for uniformity (Kishon Vijay Abraham I)
- Add Keystone device control module DT binding (Kishon Vijay Abraham
I)
- Use SYSCON API to get Keystone control module device IDs (Kishon
Vijay Abraham I)
- Clean up Keystone PHY handling (Kishon Vijay Abraham I)
- Use runtime PM APIs to enable Keystone clock (Kishon Vijay Abraham I)
- Clean up Keystone config space access checks (Kishon Vijay Abraham I)
- Get Keystone outbound window count from DT (Kishon Vijay Abraham I)
- Clean up Keystone outbound window configuration (Kishon Vijay Abraham
I)
- Clean up Keystone DBI setup (Kishon Vijay Abraham I)
- Clean up Keystone ks_pcie_link_up() (Kishon Vijay Abraham I)
- Fix Keystone IRQ status checking (Kishon Vijay Abraham I)
- Add debug messages for all Keystone errors (Kishon Vijay Abraham I)
- Clean up Keystone includes and macros (Kishon Vijay Abraham I)
- Fix Mediatek unchecked return value from devm_pci_remap_iospace()
(Gustavo A. R. Silva)
- Fix Mediatek endpoint/port matching logic (Honghui Zhang)
- Change Mediatek Root Port Class Code to PCI_CLASS_BRIDGE_PCI (Honghui
Zhang)
- Remove redundant Mediatek PM domain check (Honghui Zhang)
- Convert Mediatek to pci_host_probe() (Honghui Zhang)
- Fix Mediatek MSI enablement (Honghui Zhang)
- Add Mediatek system PM support for MT2712 and MT7622 (Honghui Zhang)
- Add Mediatek loadable module support (Honghui Zhang)
- Detach VMD resources after stopping root bus to prevent orphan
resources (Jon Derrick)
- Convert pcitest build process to that used by other tools (iio, perf,
etc) (Gustavo Pimentel)
* tag 'pci-v4.20-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (140 commits)
PCI/AER: Refactor error injection fallbacks
PCI/AER: Abstract AER interrupt handling
PCI/AER: Reuse existing pcie_port_find_device() interface
PCI/AER: Use managed resource allocations
PCI: pcie: Remove redundant 'default n' from Kconfig
PCI: aardvark: Implement emulated root PCI bridge config space
PCI: mvebu: Convert to PCI emulated bridge config space
PCI: mvebu: Drop unused PCI express capability code
PCI: Introduce PCI bridge emulated config space common logic
PCI: vmd: Detach resources after stopping root bus
nvmet: Optionally use PCI P2P memory
nvmet: Introduce helper functions to allocate and free request SGLs
nvme-pci: Add support for P2P memory in requests
nvme-pci: Use PCI p2pmem subsystem to manage the CMB
IB/core: Ensure we map P2P memory correctly in rdma_rw_ctx_[init|destroy]()
block: Add PCI P2P flag for request queue
PCI/P2PDMA: Add P2P DMA driver writer's documentation
docs-rst: Add a new directory for PCI documentation
PCI/P2PDMA: Introduce configfs/sysfs enable attribute helpers
PCI/P2PDMA: Add PCI p2pmem DMA mappings to adjust the bus offset
...
After bfcb79fca1 ("PCI/ERR: Run error recovery callbacks for all affected
devices"), AER errors are always cleared by the PCI core and drivers don't
need to do it themselves.
Remove calls to pci_cleanup_aer_uncorrect_error_status() from device
driver error recovery functions.
Signed-off-by: Oza Pawandeep <poza@codeaurora.org>
[bhelgaas: changelog, remove PCI core changes, remove unused variables]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
When isup is false meaning that interface is going to shut down
set new speed to 0 to avoid double 'NIC Link is Down' messages.
Signed-off-by: Sergey Nemov <sergey.nemov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In order to slightly simplify the code use the variables for pf and hw
that are declared in i40e_set_mac function.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This resolves an issue where the VF link state was not being updated
when the PF is down or up, and the VF link state would always show
that it is running.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We recently updated all our SPDX identifiers to correctly
indicate our net/ethernet/intel/* drivers were always released
and intended to be released under GPL v2, but the MODULE_LICENSE
declaration was never updated.
Fix the MODULE_LICENSE to be GPL v2, for all our drivers.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Daniel Borkmann says:
====================
pull-request: bpf-next 2018-09-01
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Add AF_XDP zero-copy support for i40e driver (!), from Björn and Magnus.
2) BPF verifier improvements by giving each register its own liveness
chain which allows to simplify and getting rid of skip_callee() logic,
from Edward.
3) Add bpf fs pretty print support for percpu arraymap, percpu hashmap
and percpu lru hashmap. Also add generic percpu formatted print on
bpftool so the same can be dumped there, from Yonghong.
4) Add bpf_{set,get}sockopt() helper support for TCP_SAVE_SYN and
TCP_SAVED_SYN options to allow reflection of tos/tclass from received
SYN packet, from Nikita.
5) Misc improvements to the BPF sockmap test cases in terms of cgroup v2
interaction and removal of incorrect shutdown() calls, from John.
6) Few cleanups in xdp_umem_assign_dev() and xdpsock samples, from Prashant.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hold the rtnl lock when we're clearing interrupt scheme
in i40e_shutdown and in i40e_remove.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
If our card has been put in an unstable state due to
other drivers interacting with it, speed settings
might be incorrect. If incorrect, forcefully reset them
on open to known default values.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When XDP is enabled, the driver will report incorrect
statistics. Received frames will reported as transmitted frames.
This commits fixes the i40e implementation of ndo_get_stats64 (struct
net_device_ops), so that iproute2 will report correct statistics
(e.g. when running "ip -stats link show dev eth0") even when XDP is
enabled.
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Fixes: 74608d17fe ("i40e: add support for XDP_TX action")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Static analysis tools report a problem from original driver submission.
Removing unnecessary check in condition.
Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
NAPI context. This means pulling egress packets from the Tx ring,
placing the frames on the NIC HW descriptor ring and completing sent
frames back to the application via the completion ring.
The regular XDP Tx ring is used for AF_XDP as well. This rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.
As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.
As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
i40e_xsk.c.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
queue.
All AF_XDP specific functions are added to a new file, i40e_xsk.c.
Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
will allocate a new buffer and copy the zero-copy frame prior passing
it to the kernel stack.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add functions for queue pair enable/disable. Instead of resetting the
whole device, only the affected queue pair is disabled or enabled.
This plumbing is used in a later commit, when zero-copy AF_XDP support
is introduced.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
If interface is connected to switch port configured for DCB there are
TX timeouts when bringing up interface. Problem started appearing after
adding in i40e driver code mqprio hardware offload mode. In function
i40e_vsi_configure_bw_alloc was added resetting BW rate which should
be executing when mqprio qdisc is removed but was also when there was
no mqprio qdisc added and DCB was enabled. In this patch was added
additional check for DCB flag so now when DCB is enabled the correct
DCB configs from before mqprio patch are restored.
Signed-off-by: Martyna Szapar <martyna.szapar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 114790 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch configures FEC setting in i40e_force_link_state().
For some reason setting this field was overlooked thus causing
25G link to be configured incorrectly.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Function call to i40e_prep_for_reset() is duplicated in
i40e_shutdown routine and gets called before
i40e_enable_mc_magic_wake() which blocks it from being executed
correctly on system reboot or shutdown because adminq is already
disabled by first i40e_prep_for_reset() call.
Two register write calls are also duplicated.
Signed-off-by: Sergey Nemov <sergey.nemov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
prog_attached of struct netdev_bpf should have been superseded
by simply setting prog_id long time ago, but we kept it around
to allow offloading drivers to communicate attachment mode (drv
vs hw). Subsequently drivers were also allowed to report back
attachment flags (prog_flags), and since nowadays only programs
attached will XDP_FLAGS_HW_MODE can get offloaded, we can tell
the attachment mode from the flags driver reports. Remove
prog_attached member.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Pass the extact struct from a tc qdisc add to the block bind function and,
in turn, to the setup_tc ndo of binding device via the tc_block_offload
struct. Pass this back to any block callback registrations to allow
netlink logging of fails in the bind process.
Signed-off-by: John Hurley <john.hurley@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-EOPNOTSUPP is the error value that should be reported if a flower
command is not supported by a driver. Fix it in couple of Intel drivers.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
random_ether_addr is a #define for eth_random_addr which is
generally preferred in kernel code by ~3:1
Convert the uses of random_ether_addr to enable removing the #define
Miscellanea:
o Convert &vfmac[0] to equivalent vfmac and avoid unnecessary line wrap
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the ndo_xdp_flush call implementation i40e_xdp_flush
as no callers of ndo_xdp_flush are left.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
This patch fixes a hardware reset support in VF driver.
It is needed because when a hardware reset is detected
adapter->state is in __I40EVF_RESETTING state before
i40evf_reset_task is called. Without this patch
unloading VF driver after a hardware reset ends
with a system crash.
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a recalculation of number of MSI-X
vectors for VMDq in the case where we have less
vectors available than we would want to reserve for
VMDq.
It fixes the issue where we recalculate vectors left
and vectors wanted but we didn't take into account
the reduced number of queue pairs per VSI.
Signed-off-by: Patryk Małek <patryk.malek@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix warnings regarding restricted __be32 type usage by strictly
specifying the type of the ipv4 address being printed in the dev_err
statement.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The expectation of the ops VIRTCHNL_OP_ENABLE_QUEUES and
VIRTCHNL_OP_DISABLE_QUEUES is that the queue map sent by
the VF is taken into account when enabling/disabling
queues in the VF VSI. This patch makes sure that happens.
By breaking out the individual queue set up functions so
that they can be called directly from the i40e_virtchnl_pf.c
file, only the queues as specified by the queue bit map that
accompanies the enable/disable queues ops will be handled.
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This fixes at least 2 issues I have found with the UDP tunnel filter
configuration.
The first issue is the fact that the tunnels didn't have any sort of mutual
exclusion in place to prevent an update from racing with a user request to
add/remove a port. As such you could request to add and remove a port
before the port update code had a chance to respond which would result in a
very confusing result. To address it I have added 2 changes. First I added
the RTNL mutex wrapper around our updating of the pending, port, and
filter_index bits. Second I added logic so that we cannot use a port that
has a pending deletion since we need to free the space in hardware before
we can allow software to reuse it.
The second issue addressed is the fact that we were not recording the
actual filter index provided to us by the admin queue. As a result we were
deleting filters that were not associated with the actual filter we wanted
to delete. To fix that I added a filter_index member to the UDP port
tracking structure.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Recent versions of the Linux kernel now warn about incorrect parameter
definitions for function comments. Fix up several function comments to
correctly reflect the current function arguments. This cleans up the
warnings and helps ensure our documentation is accurate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
After many years of having a ~30 line copyright and license header to our
source files, we are finally able to reduce that to one line with the
advent of the SPDX identifier.
Also caught a few files missing the SPDX license identifier, so fixed
them up.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The driver now acts upon the XDP_REDIRECT return action. Two new ndos
are implemented, ndo_xdp_xmit and ndo_xdp_flush.
XDP_REDIRECT action enables XDP program to redirect frames to other
netdevs.
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we suspend and resume, we need to clear and re-enable the interrupt
scheme. This was previously not done while holding the RTNL lock, which
could be problematic, because we are actually destroying and re-creating
queues.
Hold the RTNL lock for the entire sequence of preparing for reset, and
when resuming. This additionally protects the flags related to interrupt
scheme under RTNL lock so that their modification is properly threaded.
This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The iWarp client flags are all potentially changed when the RTNL lock is
not held, so they should not be part of the pf->flags variable. Instead,
move them into the state field so that we can use atomic bit operations.
This is part of a larger effort to remove cmpxchg64 in
i40e_set_priv_flags()
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This flag is modified outside of the RTNL lock and thus should not be
part of the pf->flags variable.
Use a state bit instead, so that we can use atomic bit operations.
This is part of a larger effort to remove cmpxchg64 in
i40e_set_priv_flags()
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The two Flow Directory auto disable flags are used at run time to mark
when the flow director features needed to be disabled. Thus the flags
could change even when the RTNL lock is not held.
They also have some code constructions which really should be
test_and_set or test_and_clear using atomic bit operations.
Create new state fields to mark this, and stop including them in
pf->flags.
This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This flag is modified during run time, possibly even when the RTNL lock
is not held. Additionally it has a few places which should be using
test_and_set or test_and_clear atomic bit operations.
Create a new state bit __I40E_UDP_SYNC_PENDING and use it instead of the
ole I40E_FLAG_UDP_FILTER_SYNC flag.
This is part of a larger effort to remove the need for using cmpxchg64
in i40e_set_priv_flags.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The I40E_FLAG_FILTER_SYNC flag is modified during run time possibly when
the RTNL lock is not held. Thus, it should not be part of pf->flags, and
instead should be using atomic bit operations in the pf->state field.
Create a __I40E_MACVLAN_SYNC_PENDING state bit, and use it instead of
the old I40E_FLAG_FILTER_SYNC flag.
This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add the SPDX identifiers to all the Intel wired LAN driver files, as
outlined in Documentation/process/license-rules.rst.
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
During suspend client MSIx vectors are freed while they are still
in use causing a crash on entering S3.
Fix this calling client close before freeing up its MSIx vectors.
Also update the client MSIx vectors on resume before client
open is called.
Fixes commit b980c0634f ("i40e: shutdown all IRQs and disable MSI-X
when suspended")
Reported-by: Stefan Assmann <sassmann@redhat.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When we re-enable ATR we need to restore the input set for TCPv4
filters, in order for ATR to function correctly. We already do this for
the normal case of re-enabling ATR when disabling ntuple support.
However, when re-enabling ATR after the last TCPv4 filter is removed (but
when ntuple support is still active), we did not restore the TCPv4
filter input set.
This can cause problems if the TCPv4 filters from FDir had changed the
input set, as ATR will no longer behave as expected.
When clearing the ATR auto-disable flag, make sure we restore the TCPv4
input set to avoid this.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A future patch needs to expand on the logic for re-enabling ATR. Doing
so would cause some code to break the 80-character line limit.
To reduce the level of indentation, factor out helper functions for
re-enabling ATR and SB rules.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When hardware has trouble with a particular filter, we delete it from
the list. Unfortunately, we did not properly update the per-filter
statistic when doing so.
Create a helper function to handle this, and properly reduce the
necessary counter so that it tracks the number of active filters
properly.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We used to use the function i40e_vlan_rx_register as a way to hook
into the now defunct .ndo_vlan_rx_register netdev hook. This was
removed but we kept the function around because we still used it
internally to control enabling or disabling of VLAN stripping.
As pointed out in upstream review, VLAN stripping is only used in a
single location and the previous function is quite small, just inline
it into i40e_restore_vlan() rather than carrying the function
separately.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces new ethtool private flag used for
forcing true link state. Function i40e_force_link_state that implements
this functionality was added, it sets phy_type = 0 in order to
work-around firmware's LESM. False positive error messages were
suppressed.
The ndo_open() should not succeed if there were issues with forcing link
state to be UP.
Added I40E_PHY_TYPES_BITMASK define with all phy types OR-ed together in
one bitmask. Added after phy type definition, so it will be hard to
forget to include new phy types to the bitmask.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch provides support to add or delete cloud filter for queue
channels created for ADq on VF.
We are using the HW's cloud filter feature and programming it to act
as a TC filter applied to a group of queues.
There are two possible modes for a VF when applying a cloud filter
1. Basic Mode: Intended to apply filters that don't need a VF to be
Trusted. This would include the following
Dest MAC + L4 port
Dest MAC + VLAN + L4 port
2. Advanced Mode: This mode is only for filters with combination that
requires VF to be Trusted.
Dest IP + L4 port
When cloud filters are applied on a trusted VF and for some reason
the same VF is later made as untrusted then all cloud filters
will be deleted. All cloud filters has to be re-applied in
such a case.
Cloud filters are also deleted when queue channel is deleted.
Testing-Hints:
=============
1. Adding Basic Mode filter should be possible on a VF in
Non-Trusted mode.
2. In Advanced mode all filters should be able to be created.
Steps:
======
1. Enable ADq and create TCs using TC mqprio command
2. Apply cloud filter.
3. Turn-off the spoof check.
4. Pass traffic.
Example:
========
1. tc qdisc add dev enp4s2 root mqprio num_tc 4 map 0 0 0 0 1 2 2 3\
queues 2@0 2@2 1@4 1@5 hw 1 mode channel
2. tc qdisc add dev enp4s2 ingress
3. ethtool -K enp4s2 hw-tc-offload on
4. ip link set ens261f0 vf 0 spoofchk off
5. tc filter add dev enp4s2 protocol ip parent ffff: prio 1 flower\
dst_ip 192.168.3.5/32 ip_proto udp dst_port 25 skip_sw hw_tc 2
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The MAC, FW Version and NPAR check used to determine
if shutting off the FW LLDP engine is supported is not
using the usual feature check mechanism.
This patch fixes the problem by moving the feature check
to i40e_sw_init in order to set a flag in pf->hw_features
that ethtool will use for priv_flags disable operation.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Broadcast filters can now cause overflow promiscuous to trigger when
adding "too many" VLANs to all the ports of a device and the driver
needs a way to exit overflow promiscuous once triggered.
Currently the driver looks to see if there are "too many" filters and/or
we have any failed filters to determine when it is safe to exit overflow
promiscuous. If we trigger overflow promiscuous with broadcast filters,
any new filters added will be "auto-failed" until we exit overflow
promiscuous. Since the user can't manually remove the failed broadcast
filters for VLANs (nor should we expect the user to do such), there is
no way to exit overflow promiscuous without reloading the driver.
The easiest way to do this is to remove the shortcut to "auto-fail"
filters in overflow promiscuous. If the user removes the VLANs, the
failed filters will be removed and since we're no longer "auto-failing"
new filters, we'll eventually get a good set of filters and exit
overflow promiscuous.
This has the side benefit of making filter state more explicit in that
if a filter says it's failed we know for a fact it failed and not just
assuming it will if we're in overflow promiscuous. This is nice because
if the user removes some filters and then adds some, even if we're in
overflow promiscuous, the filter might succeed; we were just assuming it
won't because the user hasn't rectified other existing failed filters.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This code here is quite complex and easy to screw up. Let's see if we
can't improve the readability and maintainability a bit. This refactors
out promisc_changed into two variables 'old_overflow' and 'new_overflow'
which makes it a bit clearer when we're concerned about when and how
overflow promiscuous is changed. This also makes so that we no longer
need to pass a boolean pointer to i40e_aqc_add_filters. Instead we can
simply check if we changed the overflow promiscuous flag since the
function start.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When adding a bunch of VLANs to all the ports on a device, it's possible
to run out of space for broadcast filters. The driver should trigger
overflow promiscuous in this circumstance to prevent traffic from being
unexpectedly dropped.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Could a Bad Person do Bad Things to a server if they found these
addresses printed in the log? Who knows? But let's not take that risk.
Remove pointers from a bunch of printks. In some cases, I was able to
adjust the message to indicate whether or not the value was null. In
others, I just removed the entire message as there was really no hope of
saving it.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fixes the following sparse warning:
drivers/net/ethernet/intel/i40e/i40e_main.c:5440:5: warning:
symbol 'i40e_get_link_speed' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch replaces the existing mechanism for determining the correct
value to program for adaptive ITR with yet another new and more
complicated approach.
The basic idea from a 30K foot view is that this new approach will push the
Rx interrupt moderation up so that by default it starts in low latency and
is gradually pushed up into a higher latency setup as long as doing so
increases the number of packets processed, if the number of packets drops
to 4 to 1 per packet we will reset and just base our ITR on the size of the
packets being received. For Tx we leave it floating at a high interrupt
delay and do not pull it down unless we start processing more than 112
packets per interrupt. If we start exceeding that we will cut our interrupt
rates in half until we are back below 112.
The side effect of these patches are that we will be processing more
packets per interrupt. This is both a good and a bad thing as it means we
will not be blocking processing in the case of things like pktgen and XDP,
but we will also be consuming a bit more CPU in the cases of things such as
network throughput tests using netperf.
One delta from this versus the ixgbe version of the changes is that I have
made the interrupt moderation a bit more aggressive when we are in bulk
mode by moving our "goldilocks zone" up from 48 to 96 to 56 to 112. The
main motivation behind moving this is to address the fact that we need to
update less frequently, and have more fine grained control due to the
separate Tx and Rx ITR times.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch is mostly prep-work for replacing the current approach to
programming the dynamic aka adaptive ITR. Specifically here what we are
doing is splitting the Tx and Rx ITR each into two separate values.
The first value current_itr represents the current value of the register.
The second value target_itr represents the desired value of the register.
The general plan by doing this is to allow for deferring the update of the
ITR value under certain circumstances. For now we will work with what we
have, but in the future I hope to change the behavior so that we always
only update one ITR at a time using some simple logic to determine which
ITR requires an update.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds necessary delay for 4.33 firmware to recover after
EMP reset. Without this patch driver occasionally reinitializes
structures too quickly to communicate with firmware after EMP reset
causing AdminQ to timeout.
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The rings are already split out into Tx and Rx rings so it doesn't make
sense to have any single ring store both a Tx and Rx itr_setting value.
Since that is the case drop the pair in favor of storing just a single ITR
value.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fix the number of queues per enabled TC and report available queues
to the kernel without having to limit them to the max RSS limit so
they are available to be mapped for XPS. This allows a queue per
processing thread available for handling traffic for the given
traffic class.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch suppresses the message about invalid TC mapping and wrong
selected TX queue. The root cause of this bug was setting too many
TC queue pairs on huge multiprocessor machines. When quantity of the
TC queue pairs is exceeding MSI-X vectors count then TX queue number
can be selected beyond actual TX queues amount.
Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The drivers for i40e and i40evf had a reg_idx value stored in the q_vector
that was going completely unused. I can only assume this was copied over
from ixgbe and nobody knew how to use it.
I'm going to make use of the value to avoid having to compute the vector
and thus the register index for multiple paths throughout the drivers.
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When implementing support for IP_USER_FLOW filters, we correctly
programmed a filter for both the non fragmented IPv4/Other filter, as
well as the fragmented IPv4 filters. However, we did not properly
program the input set for fragmented IPv4 PCTYPE. This meant that the
filters would almost certainly not match, unless the user specified all
of the flow types.
Add support to program the fragmented IPv4 filter input set. Since we
always program these filters together, we'll assume that the two input
sets must match, and will thus always program the input sets to the same
value.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
kdump fails in the system when used in conjunction with Ethernet driver
X722/X710. This is mainly because when we are resource constrained i.e.
when we have just one online_cpus, we are enabling VMDq and iWARP. It
doesn't make sense to enable them with just one CPU and starve kdump
for lack of IRQs.
So don't enable VMDq or iWARP when we just have a single CPU.
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Using ethtool --set-priv-flags disable-fw-lldp <on/off> is persistent
across reboots/reloads so we need some mechanism in the driver to detect
if it's on or off on init so we can set the ethtool private flag
appropriately. Without this, every time the driver is reloaded the flag
will default to off regardless of whether it's on or off in FW.
We detect this by first attempting to program DCB and if AQ fails
returning I40E_AQ_RC_EPERM, we know that LLDP is disabled in FW.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Implement the private flag disable-fw-lldp for ethtool
to disable the processing of LLDP packets by the FW.
This will stop the FW from consuming LLDPDU and cause
them to be sent up the stack.
The FW is also being configured to apply a default DCB
configuration on link up.
Toggling the value of this flag will also cause a PF reset.
Disabling FW DCB will also disable DCBx.
Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Make use of tc_cls_can_offload_and_chain0() to set extack msg in case
ethtool tc offload flag is not set or chain unsupported.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since TC block changes drivers are required to check if
the TC hw offload flag is set on the interface themselves.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Fixes: 44ae12a768 ("net: sched: move the can_offload check from binding phase to rule insertion phase")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix recreating the channel VSIs during the reset flow to reconfigure
the Tx rings and the queue context associated with the channel VSI.
Also update the next_base_queue for the VSI while rebuilding the
channel VSIs after a reset.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Omit an extra message for a memory allocation failure in this function.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that i40e_vsi_config_tc() has the pf and hw variable defined, use
them, instead of dereferencing vsi->back. Much easier to read.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The driver (and the entire netdev layer for that matter) assumes
that TC0 will always be present in our DCB configuration.
Unfortunately, this isn't always the case. Rather than fail to
configure the VSI, let's go ahead and try to make it work, even
though DCB will end up being disabled by the kernel.
If the driver fails to configure DCB, the driver queries what's
valid, then writes that back to the hardware, always forcing TC0.
This fixes a bug where the driver could fail to adhere to ETS BW
allocations if 8 TCs were configured on the switch.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In VFs, there is a known issue which can cause writebacks
to not occur when interrupts are disabled and there are
less than 4 descriptors resulting in TX timeout. Timeout
can also occur due to lost interrupt.
The current implementation for detecting and recovering
from hung queues in the PF is problematic because it actually
actively encourages lost interrupts. By triggering a SW
interrupt, interrupts are forced on. If we are already in
napi_poll and an interrupt fires, napi_poll will not be
rescheduled and the interrupt is effectively lost; thereby
potentially *causing* hung queues.
This patch checks whether packets are being processed between
every watchdog cycle and determine potential hung queue and
fires triggers SW interrupt only for that particular queue.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In some weird circumstances with DCB enabled, the firmware can fail to
configure the VSI, leaving us with zero traffic classes. Check for this
state when we configure RSS to avoid a panic.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds new I40E_NVMUPD_GET_AQ_EVENT state to allow
retrieval of AdminQ events as a result of AdminQ commands sent
to firmware.
Add preservation flags support on X722 devices for NVM update
AdminQ function wrapper. Add new parameter and handling to
nvmupdate admin queue function intended to allow nvmupdate tool
to configure the preservation flags in the AdminQ command.
This is required to implement FlatNVM on X722 devices.
Signed-off-by: Pawel Jablonski <pawel.jablonski@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bump the i40e driver from 2.1.14 to 2.3.2.
Bump the i40evf driver from 3.0.1 to 3.2.2
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When filter configuration is not supported, drivers should return
-EOPNOTSUPP so the core can react correctly.
Fixes: 2f4b411a3d ("i40e: Enable cloud filters via tc-flower")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In some circumstances, such as with bridging, it is possible that the
stack will add a devices own MAC address to its unicast address list.
If, later, the stack deletes this address, then the i40e driver will
receive a request to remove this address.
The driver stores its current MAC address as part of the MAC/VLAN hash
array, since it is convenient and matches exactly how the hardware
expects to be told which traffic to receive.
This causes a problem, since for more devices, the MAC address is stored
separately, and requests to delete a unicast address should not have the
ability to remove the filter for the MAC address.
Fix this by forcing a check on every address sync to ensure we do not
remove the device address.
There is a very narrow possibility of a race between .set_mac and
.set_rx_mode, if we don't change netdev->dev_addr before updating our
internal MAC list in .set_mac. This might be possible if .set_rx_mode is
going to remove MAC "XYZ" from the list, at the same time as .set_mac
changes our dev_addr to MAC "XYZ", we might possibly queue a delete,
then an add in .set_mac, then queue a delete in .set_rx_mode's
dev_uc_sync and then update netdev->dev_addr. We can avoid this by
moving the copy into dev_addr prior to the changes to the MAC filter
list.
A similar race on the other side does not cause problems, as if we're
changing our MAC form A to B, and we race with .set_rx_mode, it could
queue a delete from A, we'd update our address, and allow the delete.
This seems like a race, but in reality we're about to queue a delete of
A anyways, so it would not cause any issues.
A race in the initialization code is unlikely because the netdevice has
not yet been fully initialized and the stack should not be adding or
removing addresses yet.
Note that we don't (yet) need similar code for the VF driver because it
does not make use of __dev_uc_sync and __dev_mc_sync, but instead roles
its own method for handling updates to the MAC/VLAN list, which already
has code to protect against removal of the hardware address.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since UDP based filters are not supported via big buffer cloud
filters, remove UDP support. Also change a few return types to
indicate unsupported vs invalid configuration.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adding cloud filters could fail for a number of reasons,
unsupported filter fields for example, which fails during
validation of fields itself. This will not result in admin
command errors and converting the admin queue status to posix
error code using i40e_aq_rc_to_posix would result in incorrect
error values. If the failure was due to AQ error itself,
reporting that correctly is handled in the inner function.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with i40e as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
After a reset we rebuild the VSIs which is going to clobber any
promiscuous settings we had before reset. This makes it so that we
restore the promiscuous settings we had before reset.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
It is safe to remove the upper limit of 64 queues on a channel
VSI. The upper bound is determined by the VSI's num_queue_pairs
and gets validated when the queue mapping info through mqprio
interface is subject to bound checking in the driver.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new
convention.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
ndo_xdp is a control path callback for setting up XDP in the
driver. We can reuse it for other forms of communication
between the eBPF stack and the drivers. Rename the callback
and associated structures and definitions.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>
Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch enables tc-flower based hardware offloads. tc flower
filter provided by the kernel is configured as driver specific
cloud filter. The patch implements functions and admin queue
commands needed to support cloud filters in the driver and
adds cloud filters to configure these tc-flower filters.
The classification function of the filter is to direct matched
packets to a traffic class. The hardware traffic class is set
based on the the classid reserved in the range :ffe0 - :ffef.
Match Dst MAC and route to TC0:
prio 1 flower dst_mac 3c:fd:fe:a0:d6:70 skip_sw\
hw_tc 1
Match Dst IPv4,Dst Port and route to TC1:
prio 2 flower dst_ip 192.168.3.5/32\
ip_proto udp dst_port 25 skip_sw\
hw_tc 2
Match Dst IPv6,Dst Port and route to TC1:
prio 3 flower dst_ip fe8::200:1\
ip_proto udp dst_port 66 skip_sw\
hw_tc 2
Delete tc flower filter:
Example:
Flow Director Sideband is disabled while configuring cloud filters
via tc-flower and until any cloud filter exists.
Unsupported matches when cloud filters are added using enhanced
big buffer cloud filter mode of underlying switch include:
1. source port and source IP
2. Combined MAC address and IP fields.
3. Not specifying L4 port
These filter matches can however be used to redirect traffic to
the main VSI (tc 0) which does not require the enhanced big buffer
cloud filter support.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Introduce the cloud filter data structure and cleanup of cloud
filters associated with the device.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add definitions for L4 filters and switch modes based on cloud filters
modes and extend the set switch config command to include the
additional cloud filter mode.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Add mapping of TCs with the seids of the channel VSIs. TC0
will be mapped to the main VSI seid and all other TCs are
mapped to the seid of the corresponding channel VSI.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Whether or not there are vectors_left, we only need to redistribute
our vectors if we didn't get as many as we requested. With the current
check, the code will try to redistribute even if we did in fact get all
the vectors we requested - this can happen when we have more CPUs than
we do vectors. This restores an earlier check to be sure we only
redistribute if we didn't get the full count we requested.
Fixes: 4ce20abc64 (i40e: fix MSI-X vector redistribution if hw limit is reached)
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
A cleanup of the PM code left an incorrect #ifdef in place, leading
to a harmless build warning:
drivers/net/ethernet/intel/i40e/i40e_main.c:12223:12: error: 'i40e_resume' defined but not used [-Werror=unused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:12185:12: error: 'i40e_suspend' defined but not used [-Werror=unused-function]
It's easier to use __maybe_unused attributes here, since you
can't pick the wrong one.
Fixes: 0e5d3da400 ("i40e: use newer generic PM support instead of legacy PM callbacks")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Acked-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Please do not apply this to mainline directly, instead please re-run the
coccinelle script shown below and apply its output.
For several reasons, it is desirable to use {READ,WRITE}_ONCE() in
preference to ACCESS_ONCE(), and new code is expected to use one of the
former. So far, there's been no reason to change most existing uses of
ACCESS_ONCE(), as these aren't harmful, and changing them results in
churn.
However, for some features, the read/write distinction is critical to
correct operation. To distinguish these cases, separate read/write
accessors must be used. This patch migrates (most) remaining
ACCESS_ONCE() instances to {READ,WRITE}_ONCE(), using the following
coccinelle script:
----
// Convert trivial ACCESS_ONCE() uses to equivalent READ_ONCE() and
// WRITE_ONCE()
// $ make coccicheck COCCI=/home/mark/once.cocci SPFLAGS="--include-headers" MODE=patch
virtual patch
@ depends on patch @
expression E1, E2;
@@
- ACCESS_ONCE(E1) = E2
+ WRITE_ONCE(E1, E2)
@ depends on patch @
expression E;
@@
- ACCESS_ONCE(E)
+ READ_ONCE(E)
----
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: davem@davemloft.net
Cc: linux-arch@vger.kernel.org
Cc: mpe@ellerman.id.au
Cc: shuah@kernel.org
Cc: snitzer@redhat.com
Cc: thor.thayer@linux.intel.com
Cc: tj@kernel.org
Cc: viro@zeniv.linux.org.uk
Cc: will.deacon@arm.com
Link: http://lkml.kernel.org/r/1508792849-3115-19-git-send-email-paulmck@linux.vnet.ibm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-10-17
This series contains updates to i40e and ethtool.
Alan provides most of the changes in this series which are mainly fixes
and cleanups. Renamed the ethtool "cmd" variable to "ks", since the new
ethtool API passes us ksettings structs instead of command structs.
Cleaned up an ifdef that was not accomplishing anything. Added function
header comments to provide better documentation. Fixed two issues in
i40e_get_link_ksettings(), by calling
ethtool_link_ksettings_zero_link_mode() to ensure the advertising and
link masks are cleared before we start setting bits. Cleaned up and fixed
code comments which were incorrect. Separated the setting of autoneg in
i40e_phy_types_to_ethtool() into its own conditional to clarify what PHYs
support and advertise autoneg, and makes it easier to add new PHY types in
the future. Added ethtool functionality to intersect two link masks
together to find the common ground between them. Overhauled i40e to
ensure that the new ethtool API macros are being used, instead of the
old ones. Fixed the usage of unsigned 64-bit division which is not
supported on all architectures.
Sudheer adds support for 25G Active Optical Cables (AOC) and Active Copper
Cables (ACC) PHY types.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation for unconditionally passing the struct timer_list pointer to
all timer callbacks, switch to using the new timer_setup() and from_timer()
to pass the timer pointer explicitly. Switches test of .data field to
.function, since .data will be going away.
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 52eb1ff93e98 ("i40e: Add support setting TC max bandwidth rates")
and commit 1ea6f21ae530 ("i40e: Refactor VF BW rate limiting") add some
needed functionality for TC bandwidth rate limiting. Unfortunately they
introduce several usages of unsigned 64-bit division which needs to be
handled special by the kernel to support all architectures.
Fixes: 52eb1ff93e98 ("i40e: Add support setting TC max bandwidth
rates")
Fixes: 1ea6f21ae530 ("i40e: Refactor VF BW rate limiting")
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch enables setting up maximum Tx rates for the traffic
classes in i40e. The maximum rate is offloaded to the hardware through
the mqprio framework by specifying the mode option as 'channel' and
shaper option as 'bw_rlimit' and is configured for the VSI. Configuring
minimum Tx rate limit is not supported in the device. The minimum
usable value for Tx rate is 50Mbps.
Example:
# tc qdisc add dev eth0 root mqprio num_tc 2 map 0 0 0 0 1 1 1 1\
queues 4@0 4@4 hw 1 mode channel shaper bw_rlimit\
max_rate 4Gbit 5Gbit
To dump the bandwidth rates:
# tc qdisc show dev eth0
qdisc mqprio 804a: root tc 2 map 0 0 0 0 1 1 1 1 0 0 0 0 0 0 0 0
queues:(0:3) (4:7)
mode:channel
shaper:bw_rlimit max_rate:4Gbit 5Gbit
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch refactors the BW rate limiting for Tx traffic
on the VF to be reused in the next patch for rate limiting Tx
traffic for the VSIs on the PF as well.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The i40e driver is modified to enable the new mqprio hardware
offload mode and factor the TCs and queue configuration by
creating channel VSIs. In this mode, the priority to traffic
class mapping and the user specified queue ranges are used
to configure the traffic classes by setting the mode option to
'channel'.
Example:
map 0 0 0 0 1 2 2 3 queues 2@0 2@2 1@4 1@5\
hw 1 mode channel
qdisc mqprio 8038: root tc 4 map 0 0 0 0 1 2 2 3 0 0 0 0 0 0 0 0
queues:(0:1) (2:3) (4:4) (5:5)
mode:channel
shaper:dcb
The HW channels created are removed and all the queue configuration
is set to default when the qdisc is detached from the root of the
device.
This patch also disables setting up channels via ethtool (ethtool -L)
when the TCs are configured using mqprio scheduler.
The patch also limits setting ethtool Rx flow hash indirection
(ethtool -X eth0 equal N) to max queues configured via mqprio.
The Rx flow hash indirection input through ethtool should be
validated so that it is within in the queue range configured via
tc/mqprio. The bound checking is achieved by reporting the current
rss size to the kernel when queues are configured via mqprio.
Example:
map 0 0 0 1 0 2 3 0 queues 2@0 4@2 8@6 11@14\
hw 1 mode channel
Cannot set RX flow hash configuration: Invalid argument
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch sets up the infrastructure for offloading TCs and
queue configurations to the hardware by creating HW channels(VSI).
A new channel is created for each of the traffic class
configuration offloaded via mqprio framework except for the first TC
(TC0). TC0 for the main VSI is also reconfigured as per user provided
queue parameters. Queue counts that are not power-of-2 are handled by
reconfiguring RSS by reprogramming LUTs using the queue count value.
This patch also handles configuring the TX rings for the channels,
setting up the RX queue map for channel.
Also, the channels so created are removed and all the queue
configuration is set to default when the qdisc is detached from the
root of the device.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Introduce a macro for the bit setting the PF reset flag and
update its usages. This makes it easier to use this flag
in functions to be introduced in future without encountering
checkpatch issues related to alignment and line over 80
characters.
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a typo in i40e_vsi_alloc_arrays() documentation.
The first parameter name should be "vsi" instead of "type".
Signed-off-by: Rami Rosen <rami.rosen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Recently, the kernel gained support for enabling XPS and QoS at the
same time. Thus, we no longer need to worry about the number of
traffic classes when enabling XPS.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The lrxq thresh value tells hardware to immediately interrupt when there
are fewer than N*64 packets left in the ring.
Counter intuitively, empirical testing has shown that decreasing this
value from 2 to 1, and thus changing from an immediate interrupt at
fewer than 128 descriptors down to 64 descriptors causes a small
increase in the maximum total packets per second we can receive. This
increase occurs even when we're polling with interrupts masked, as the
hardware must still handle interrupts internally even if we've disabled
them in software.
Also reduce the value for any VFs we allocate.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In the past we changed driver behavior to not clear the PBA when
re-enabling interrupts. This change was motivated by the flawed belief
that clearing the PBA would cause a lost interrupt if a receive
interrupt occurred while interrupts were disabled.
According to empirical testing this isn't the case. Additionally, the
data sheet specifically says that we should set the CLEARPBA bit when
re-enabling interrupts in a polling setup.
This reverts commit 40d72a5098 ("i40e/i40evf: don't lose interrupts")
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The ITR register expects to be programmed in units of 2 microseconds.
Because of this, all of the drivers I40E_ITR_* constants are in terms of
this 2 microsecond register.
Unfortunately, the rx_itr_default value is expected to be programmed in
microseconds.
Effectively the driver defaults to an ITR value of half the expected
value (in terms of minimum microseconds between interrupts).
Fix this by changing the default values to be calculated using
ITR_REG_TO_USEC macro which indicates that we're converting from the
register units into microseconds.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch implements the PCI error handler reset_prepare and reset_done.
This allows us to handle function level reset. Without this patch we
are unable to perform and recover from an FLR correctly and this will cause
VFs to be unable to recover from an FLR on the PF.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When there is no space for more flow director filters and user requested to
add a new one it is rejected by firmware and automatically removed from the
filter list maintained by driver. This behaviour is correct. Afterwards
existing filter can be removed making free slot for the new one. This
however causes the newly added filter to be accepted by firmware but
removed from driver filter list resulting in not showing after issuing
'ethtool -n <dev_name>'.
This happened due to not clearing the variable pf->fd_inv which stores
filter number to be removed from the list when firmware refused to add the
requested filter. It caused the filter with this specific ID to be
constantly removed once it was added to the list although it has been
accepted by firmware and effectively applied to the NIC.
It was fixed by clearing pf->fd_inv variable after removal of the filter
from the list when it was rejected by firmware.
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch causes error message to be displayed when NIC detects
insertion of module that does not meet thermal requirements.
Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch removes some code that was accidentally added to
the wrong function with a merge error. Fixes: c53934c6d1
("i40e: fix: do not sleep in netdev_ops")
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When using set_bit and friends, we should be using actual
bitmaps, and fix all the locations where we might access
it.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Starting with XL710 FW 5.3 PTP L4 was disabled for XL710 due to a bug. The
bug has since been resolved in XL710 FW >6.0 and PTP L4 can now be
re-enabled on those devices with updated firmware.
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Currently, when setting up the IRQ for a q_vector, we set an affinity
hint based on the v_idx of that q_vector. Meaning a loop iterates on
v_idx, which is an incremental value, and the cpumask is created based
on this value.
This is a problem in systems with multiple logical CPUs per core (like in
simultaneous multithreading (SMT) scenarios). If we disable some logical
CPUs, by turning SMT off for example, we will end up with a sparse
cpu_online_mask, i.e., only the first CPU in a core is online, and
incremental filling in q_vector cpumask might lead to multiple offline
CPUs being assigned to q_vectors.
Example: if we have a system with 8 cores each one containing 8 logical
CPUs (SMT == 8 in this case), we have 64 CPUs in total. But if SMT is
disabled, only the 1st CPU in each core remains online, so the
cpu_online_mask in this case would have only 8 bits set, in a sparse way.
In general case, when SMT is off the cpu_online_mask has only C bits set:
0, 1*N, 2*N, ..., C*(N-1) where
C == # of cores;
N == # of logical CPUs per core.
In our example, only bits 0, 8, 16, 24, 32, 40, 48, 56 would be set.
Instead, we should only assign hints for CPUs which are online. Even
better, the kernel already provides a function, cpumask_local_spread()
which takes an index and returns a CPU, spreading the interrupts across
local NUMA nodes first, and then remote ones if necessary.
Since we generally have a 1:1 mapping between vectors and CPUs, there
is no real advantage to spreading vectors to local CPUs first. In order
to avoid mismatch of the default XPS hints, we'll pass -1 so that it
spreads across all CPUs without regard to the node locality.
Note that we don't need to change the q_vector->affinity_mask as this is
initialized to cpu_possible_mask, until an actual affinity is set and
then notified back to us.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
By default, our devices do source pruning, that is, they drop receive
packets that have the source MAC matching one of the receive filters.
Unfortunately, this breaks ARP monitoring in channel bonding, as the
bonding driver expects devices to receive ARPs containing their own
source address.
Add an ethtool private flag to control this feature.
Also, remove the netif_running() check when we process our private
flags. It's OK to reset when the device is closed and in most cases we
need the reset the apply these changes.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Enable i40e to pass traffic with VLAN tags using the 802.1ad ethernet
protocol ID (0x88a8).
This requires NIC firmware providing version 1.7 of the API. With
older NIC firmware 802.1ad tagged packets will continue to be dropped.
No VLAN offloads nor RSS are supported for 802.1ad VLANs.
Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When a machine has more CPUs than queue pairs, e.g. 512 cores, the
counting gets a little funky and turns off Flow Director with the
message:
not enough queues for Flow Director. Flow Director feature is disabled
This patch limits the number of lan queues initially allocated to
be sure we have some left for FD and other features.
Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The i40e driver now supports two different devices with two different
firmware versions. So be smart about how we handle these. Move the FW
version macros to the appropriate header file, and add a convenience
macro that checks the version based on the device. Then use this macro
to check whether or not the driver can use the new link info API.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On some platforms with a large number of CPUs, we will allocate many IRQ
vectors. When hibernating, the system will attempt to migrate all of the
vectors back to CPU0 when shutting down all the other CPUs. It is
possible that we have so many vectors that it cannot re-assign them to
CPU0. This is even more likely if we have many devices installed in one
platform.
The end result is failure to hibernate, as it is not possible to
shutdown the CPUs. We can avoid this by disabling MSI-X and clearing our
interrupt scheme when the device is suspended. A more ideal solution
would be some method for the stack to properly handle this for all
drivers, rather than on a case-by-case basis for each driver to fix
itself.
However, until this more ideal solution exists, we can do our part and
shutdown our IRQs during suspend, which should allow systems with
a large number of CPUs to safely suspend or hibernate.
It may be worth investigating if we should shut down even further when
we suspend as it may make the path cleaner, but this was the minimum fix
for the hibernation issue mentioned here.
Testing-hints:
This affects systems with a large number of CPUs, and with multiple
devices enabled. Without this change, those platforms are unable to
hibernate at all.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Although the service task does check the suspended status before
running, it might already be part way through running when we go to
suspend. Lets ensure that the service task is stopped and will not be
restarted again until we finish resuming. This ensures that service task
code does not cause strange interactions with the suspend/resume
handlers.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
When handling suspend and resume callbacks we want to make sure that (a)
we don't suspend again if we're already suspended and (b) we don't
resume again if we're already resuming. Lets make sure we test_and_set
the __I40E_SUSPENDED bit in i40e_suspend which ensures that a suspend
call when already suspended will exit early. Additionally, if
__I40E_SUSPENDED is not set when we begin resuming, exit early as well.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Stop using the old legacy PM support, since we now have stable support
for the newer generic PM callbacks.
This has several advantages. First, we no longer have to manage our
own pci_save_state() and power changes, as it's preferred to have the
PCI stack do this. Second, these routines get called for both hibernate
and suspend to ram, so we can have the driver properly handle all the
suspend/resume flows that it needs to.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
We currently (mis)use the __I40E_RECOVERY_PENDING bit to determine when
we should actually request a new IRQ in i40e_setup_misc_vector().
This led to a design mistake where we open-coded the re-setup of the
miscellaneous vector in i40e_resume() instead of using the function
provided. If we did not open-code this and instead tried to use the
i40e_setup_misc_vector() function, it would lead to never reallocating
the IRQ.
This would lead to a second i40e_suspend() call failing to free the
vector due to a NULL pointer dereference.
A future patch is going to re-work how the i40e_suspend() and
i40e_resume() flows work to clear all IRQ vectors, which would require
us to use i40e_setup_misc_vector() directly. Since during this time the
__I40E_RECOVERY_PENDING bit is set, we'll never re-allocate the vector.
Rather than leaving the open-coded setup in i40e_resume() lets just fix
the problem properly in i40e_setup_misc_vector().
Introduce a new state bit which indicates when the IRQ has been
assigned, which will be set when i40e_setup_misc_vector is first called.
This ultimately resolves the issue of re-requesting the vector, without
overloading the __I40E_RECOVERY_PENDING state. This ensures that the
suspend/resume cycle can use the setup function instead of open-coding
the re-request during resume.
Additionally, since the only callers of i40e_stop_misc_vector also want
to free it, move this code directly into the function to avoid
duplication. Due to the new functionality, rename it to
i40e_free_misc_vector().
This lets us drop the extra calls to free and re-enable the vector
during i40e_suspend() and i40e_resume(). We don't need to call
i40e_setup_misc_Vector() in i40e_resume() because it gets called by the
i40e_rebuild() call.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
An errata with GLQF_PCNT causes it to not wrap as expected. This
can cause an error in flow director statistics. This patch resets
affected counters just after reading.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Fortville and Fort Park devices are often on different firmware release
schedules. This change relaxes the minor version warning message,
so it is only displayed for older FW warning version for old
firmware Fortville 3 or earlier.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This commit replaces usage of vsi->back in i40e_print_link_message()
(which is actually a PF pointer) with temp variable.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e_print_link_message() is intended to compare new
link state with current link state and print log message
only if the new state is different from current state.
However in current driver the new state does not get updated
when link is going down because of the if condition. When an
interface is brought down, vsi->state is set to I40E_VSI_DOWN
in i40e_vsi_close() and later i40e_print_link_message() does
not get invoked in i40e_link_event due to if condition. Hence
link down message doesn't appear when link is going down. The
down state is seen later during i40e_open() and old state
gets printed. The actual link state doesn't get updated in
i40e_close() or i40e_open() but when i40e_handle_link_event is
called inside i40e_clean_adminq_subtask.
This change allows i40e_print_link_message() to be called when
interface is going down and keeps the state information updated.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In current driver, when ifconfig ethx up is done, the link state
doesn't transition to UP inside i40e_open(). It changes after AQ
command response is handled in i40e_handle_link_event().
When pf->hw.phy.link_info.link_info is DOWN inside i40e_open(),
The state is transient and invalid. So log message gets printed
based on incorrect info (i.e link_info and an_info).
This commit removes check for unqualified module inside
i40e_up_complete(). The existing check in i40e_handle_link_event()
logs the error message based on correct link state information.
Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
On older kernels a call to irq_set_affinity_hint does not guarantee that
the IRQ affinity will be set. If nothing else on the system sets the IRQ
affinity this can result in a bug in the i40e_napi_poll() routine where
we notice that our interrupt fired on the "wrong" CPU according to our
internal affinity_mask variable.
This results in a bug where we continuously tell NAPI to stop polling to
move the interrupt to a new CPU, but the CPU never changes because our
affinity mask does not match the actual mask setup for the IRQ.
The root problem is a mismatched affinity mask value. So lets initialize
the value to cpu_possible_mask instead. This ensures that prior to the
first time we get an IRQ affinity notification we'll have the mask set
to include every possible CPU.
We use cpu_possible_mask instead of cpu_online_mask since the former is
almost certainly never going to change, while the later might change
after we've made a copy.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since commit 3ffa037d7f ("i40e: Set XPS bit mask to zero in DCB mode")
we've tried to reset the XPS settings by building a custom
empty CPU mask.
This workaround is not necessary because we're not really removing the
XPS setting, but simply setting it so that no CPU is valid.
Second, we shorten the code further by using zalloc_cpumask_var instead
of a separate call to bitmap_zero().
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes an issue where an error return value is
set, but without an immediate exit, the value can be overwritten
by the following code execution. The condition at this point
is not fatal, so remove the error assignment and comment the
intent for future code maintainers
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch improves the system log message. The log message will
be expanded to include the FEC mode the FW requested before link
was established.
Signed-off-by: Mariusz Stachura <mariusz.stachura@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In new versions of GCC since 7.x a new warning exists which warns when
a string is truncated before all of the format can be completed.
When we setup VMDQ netdev names we are copying a pre-existing interface
name which could be up to 15 characters in length. Since we also add
4 bytes, v, the literal %, the d and a \0 null, we would overrun the
available size unless snprintf truncated for us.
The snprintf call will of course truncate on the end, so lets instead
modify the code to force truncation of the copied netdev name by
4 characters, to create enough space for the 4 bytes we're adding.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
According to the header file cpumask.h, we shouldn't be directly copying
a cpumask_t, since its a bitmap and might not be copied correctly. Lets
use the provided cpumask_copy() function instead.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
In i40e_vsi_add_vlan we treat attempting to add VID=0 as an error,
because it does not do what the caller might expect. We already special
case VID=0 in i40e_vlan_rx_add_vid so that we avoid this error when
adding the VLAN.
This special casing is necessary so that we do not add the VLAN=0 filter
since we don't want to stop receiving untagged traffic. Unfortunately,
not all callers of i40e_vsi_add_vlan are aware of this, including when
we add VLANs from a VF device.
Rather than special casing every single caller of i40e_vsi_add_vlan,
lets just move this check internally. This makes the code simpler
because the caller does not need to be aware of how VLAN=0 is special,
and we don't forget to add this check in new places.
This fixes a harmless error message displaying when adding a VLAN from
within a VF. The message was meaningless but there is no reason to
confuse end users and system administrators, and this is now avoided.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a problem with the HW ATR eviction feature where the
NVM setting was incorrect. This patch detects the issue on X720
adapters and disables the feature if the NVM setting is incorrect.
Without this patch, HW ATR Evict feature does not work on broken NVMs
and is not detected either. If the HW ATR Evict feature is disabled
the SW Eviction feature will take effect.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Since commit b499ffb0a2 ("i40e: Look up MAC address in Open Firmware
or IDPROM"), we've had support for obtaining the MAC address
form Open Firmware or IDPROM.
This code relied on sending the Open Firmware address directly to the
device firmware instead of relying on our MAC/VLAN filter list. Thus,
a work around was introduced in commit b1b15df592 ("i40e: Explicitly
write platform-specific mac address after PF reset")
We refactored the Open Firmware address enablement code in the ill-named
commit 41c4c2b50d ("i40e: allow look-up of MAC address from Open
Firmware or IDPROM")
Since this refactor, we no longer even set I40E_FLAG_PF_MAC. Further, we
don't need this work around, because we actually store the MAC address
as part of the MAC/VLAN filter hash. Thus, we will restore the address
correctly upon reset.
The refactor above failed to revert the workaround, so do that now.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The number of flags found in pf->flags has grown quite large, and there
are a lot of different types of flags. Most of the flags are simply
hardware features which are enabled on some firmware or some MAC types.
Other flags are dynamic run-time flags which enable or disable certain
features of the driver.
Separate these two types of flags into pf->hw_features and pf->flags.
The hw_features list will contain a set of features which are enabled at
init time. This will not contain toggles or otherwise dynamically
changing features. These flags should not need atomic protections, as
they will be set once during init and then be essentially read only.
Everything else will remain in the flags variable. These flags may be
modified at any time during run time. A future patch may wish to convert
these flags into set_bit/clear_bit/test_bit or similar approach to
ensure atomic correctness.
The I40E_FLAG_MFP_ENABLED flag may be a good fit for hw_features but
currently is used by ethtool in the private flags settings, and thus has
been left as part of flags.
Additionally, I40E_FLAG_DCB_CAPABLE may be a good fit for the
hw_features but this patch has not tried to untangle it yet.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The X722 pf flag setup should happen before the VMDq RSS queue count is
initialized for VMDq VSI to get the right number of queues for RSS in
case of X722 devices.
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Now that the kernel supports double VLAN tags, we should at least play
nice. Adjust the max packet size to account for two VLAN tags, not just
one.
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Get rid of struct tc_to_netdev which is now just unnecessary container
and rather pass per-type structures down to drivers directly.
Along with that, consolidate the naming of per-type structure variables
in cls_*.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Change the return value from -EINVAL to -EOPNOTSUPP. The rest of the
drivers have it like that, so be aligned.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
As ndo_setup_tc is generic offload op for whole tc subsystem, does not
really make sense to have cls-specific args. So move them under
cls_common structurure which is embedded in all cls structs.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since the type is always present, push it to be a separate argument to
ndo_setup_tc. On the way, name the type enum and use it for arg type.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fill the XDP prog_id with the id just like we do in other XDP enabled
drivers such as ixgbe. This is needed so that on dump we can retrieve
the attached program based on the id, and dump BPF insns, opcodes, etc
back to user space. Only XDP driver missing this is currently i40e.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The i40e driver attempts to display the UDP tunnel name by doing a check
against the type, where for non-zero types we use "vxlan" and for zero
type we use "geneve". This is not future proof, because if new tunnel
types get added, we'll incorrectly label them. It also depends on the
value of UDP_TUNNEL_TYPE_GENEVE == 0, which is brittle.
Instead, replace this with a function that can return a constant string
depending on the type. For now we'll use "unknown" for types we don't
know about, and we can expand this in the future if new types get added.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Compiler reported several places where driver compared
signed and unsigned types. Cast or change the types to remove
the warnings.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
During certain events such as a CORER, multiple devices will run a work
task to handle some cleanup. This can cause issues due to
a single-threaded workqueue which can mean that a device doesn't cleanup
in time. Prevent this by removing the single-threaded restriction on the
module workqueue. This avoids the need to add more complex yielding
logic in our service task routine. This is also similar to what other
drivers such as fm10k do.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes a problem found in systems when entering
S4 state. This patch fixes the problem by ensuring that
the misc vector's IRQ is disabled as well. Without this
patch a stack trace can be seen upon entering S4 state.
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>