Commit Graph

414 Commits

Author SHA1 Message Date
Jesse Brandeburg 0bcd952fee ethernet/intel: consolidate NAPI and NAPI exit
While reviewing code, I noticed that Eric Dumazet recommends that
drivers check the return code of napi_complete_done, and use that
to decide to enable interrupts or not when exiting poll.  One of
the Intel drivers was already fixed (ixgbe).

Upon looking at the Intel drivers as a whole, we are handling our
polling and NAPI exit in a few different ways based on whether we
have multiqueue and whether we have Tx cleanup included. Several
drivers had the bug of exiting NAPI with return 0, which appears
to mess up the accounting in the stack.

Consolidate all the NAPI routines to do best known way of exiting
and to just mostly look like each other.
1) check return code of napi_complete_done to control interrupt enable
2) return the actual amount of work done.
3) return budget immediately if need NAPI poll again

Tested the changes on e1000e with a high interrupt rate set, and
it shows about an 8% reduction in the CPU utilization when busy
polling because we aren't re-enabling interrupts when we're about
to be polled.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-21 10:35:23 -08:00
Jan Sokolowski 8554768c2c i40e: Use a local variable for readability
Use a local variable to make the code a bit more readable.

Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-14 10:56:33 -08:00
Jacob Keller a9e510589d intel-ethernet: software timestamp skbs as late as possible
Many of the Intel Ethernet drivers call skb_tx_timestamp() earlier than
necessary. Move the calls to this function to the latest point possible,
just prior to notifying hardware of the new Tx packet when we bump the
tail register.

This affects i40e, iavf, igb, igc, and ixgbe.

The e100, e1000, e1000e, fm10k, and ice drivers already call the
skb_tx_timestamp() function just prior to indicating the Tx packet to
hardware, so they do not need to be changed.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-11-07 09:47:01 -08:00
Eric Dumazet 55469bc6b5 drivers: net: remove <net/busy_poll.h> inclusion when not needed
Drivers using generic NAPI interface no longer need to include
<net/busy_poll.h>, since busy polling was moved to core networking
stack long ago.

See commit 79e7fff47b ("net: remove support for per driver
ndo_busy_poll()") for reference.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-10-25 16:20:02 -07:00
Björn Töpel 411dc16ff1 i40e: clean zero-copy XDP Rx ring on shutdown/reset
Outstanding Rx descriptors are temporarily stored on a stash/reuse
queue. When/if the HW rings comes up again, entries from the stash are
used to re-populate the ring.

The latter required some restructuring of the allocation scheme for
the AF_XDP zero-copy implementation. There is now a fast, and a slow
allocation. The "fast allocation" is used from the fast-path and
obtains free buffers from the fill ring and the internal recycle
mechanism. The "slow allocation" is only used in ring setup, and
obtains buffers from the fill ring and the stash (if any).

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-25 13:15:16 -07:00
Björn Töpel 9dbb137045 i40e: clean zero-copy XDP Tx ring on shutdown/reset
When the zero-copy enabled XDP Tx ring is torn down, due to
configuration changes, outstanding frames on the hardware descriptor
ring are queued on the completion ring.

The completion ring has a back-pressure mechanism that will guarantee
that there is sufficient space on the ring.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-09-25 13:10:24 -07:00
Magnus Karlsson 1328dcddbd i40e: add AF_XDP zero-copy Tx support
This patch adds zero-copy Tx support for AF_XDP sockets. It implements
the ndo_xsk_async_xmit netdev ndo and performs all the Tx logic from a
NAPI context. This means pulling egress packets from the Tx ring,
placing the frames on the NIC HW descriptor ring and completing sent
frames back to the application via the completion ring.

The regular XDP Tx ring is used for AF_XDP as well. This rationale for
this is as follows: XDP_REDIRECT guarantees mutual exclusion between
different NAPI contexts based on CPU id. In other words, a netdev can
XDP_REDIRECT to another netdev with a different NAPI context, since
the operation is bound to a specific core and each core has its own
hardware ring.

As the AF_XDP Tx action is running in the same NAPI context and using
the same ring, it will also be protected from XDP_REDIRECT actions
with the exact same mechanism.

As with AF_XDP Rx, all AF_XDP Tx specific functions are added to
i40e_xsk.c.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Magnus Karlsson a96e747273 i40e: move common Tx functions to i40e_txrx_common.h
This patch prepares for the upcoming zero-copy Tx functionality, by
moving common functions and refactor chunks of code into re-usable
functions, used both by the regular path and zero-copy path.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Björn Töpel 0a714186d3 i40e: add AF_XDP zero-copy Rx support
This patch adds zero-copy Rx support for AF_XDP sockets. Instead of
allocating buffers of type MEM_TYPE_PAGE_SHARED, the Rx frames are
allocated as MEM_TYPE_ZERO_COPY when AF_XDP is enabled for a certain
queue.

All AF_XDP specific functions are added to a new file, i40e_xsk.c.

Note that when AF_XDP zero-copy is enabled, the XDP action XDP_PASS
will allocate a new buffer and copy the zero-copy frame prior passing
it to the kernel stack.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Björn Töpel 20a739dbef i40e: move common Rx functions to i40e_txrx_common.h
This patch prepares for the upcoming zero-copy Rx functionality, by
moving/changing linkage of common functions, used both by the regular
path and zero-copy path.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Björn Töpel 6d7aad1da2 i40e: refactor Rx path for re-use
In this commit, the Rx path is refactored some, as a step torwards the
introduction AF_XDP Rx zero-copy.

The page re-use counter is moved into the i40e_reuse_rx_page, instead
of bumping the counter in many places. The Rx buffer page clearing is
moved for better readability. Lastely, functions to update statistics
and bump the XDP Tx ring are introduced.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-08-29 12:25:53 -07:00
Gustavo A. R. Silva f7c3ca2da4 i40e_txrx: mark expected switch fall-through
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 114791 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-08-07 17:54:20 -07:00
Jesper Dangaard Brouer 2e68931238 i40e: split XDP_TX tail and XDP_REDIRECT map flushing
The driver was combining the XDP_TX tail flush and XDP_REDIRECT
map flushing (xdp_do_flush_map).  This is suboptimal, these two
flush operations should be kept separate.

It looks like the mistake was copy-pasted from ixgbe.

Fixes: d9314c474d ("i40e: add support for XDP_REDIRECT")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Björn Töpel <bjorn.topel@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-28 14:27:52 +09:00
Daniel Borkmann c51818d5b7 bpf, xdp, i40e: fix i40e_build_skb skb reserve and truesize
Using skb_reserve(skb, I40E_SKB_PAD + (xdp->data - xdp->data_hard_start))
is clearly wrong since I40E_SKB_PAD already points to the offset where
the original xdp->data was sitting since xdp->data_hard_start is defined
as xdp->data - i40e_rx_offset(rx_ring) where latter offsets to I40E_SKB_PAD
when build skb is used.

However, also before cc5b114dcf ("bpf, i40e: add meta data support")
this seems broken since bpf_xdp_adjust_head() helper could have been used
to alter headroom and enlarge / shrink the frame and with that the assumption
that the xdp->data remains unchanged does not hold and would push a bogus
packet to upper stack.

ixgbe got this right in 9247080816 ("ixgbe: add XDP support for pass and
drop actions"). In any case, fix it by removing the I40E_SKB_PAD from both
skb_reserve() and truesize calculation.

Fixes: cc5b114dcf ("bpf, i40e: add meta data support")
Fixes: 0c8493d90b ("i40e: add XDP support for pass and drop actions")
Reported-by: Keith Busch <keith.busch@linux.intel.com>
Reported-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Tested-by: Keith Busch <keith.busch@linux.intel.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-20 07:24:28 +09:00
David S. Miller fd129f8941 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:

====================
pull-request: bpf-next 2018-06-05

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Add a new BPF hook for sendmsg similar to existing hooks for bind and
   connect: "This allows to override source IP (including the case when it's
   set via cmsg(3)) and destination IP:port for unconnected UDP (slow path).
   TCP and connected UDP (fast path) are not affected. This makes UDP support
   complete, that is, connected UDP is handled by connect hooks, unconnected
   by sendmsg ones.", from Andrey.

2) Rework of the AF_XDP API to allow extending it in future for type writer
   model if necessary. In this mode a memory window is passed to hardware
   and multiple frames might be filled into that window instead of just one
   that is the case in the current fixed frame-size model. With the new
   changes made this can be supported without having to add a new descriptor
   format. Also, core bits for the zero-copy support for AF_XDP have been
   merged as agreed upon, where i40e bits will be routed via Jeff later on.
   Various improvements to documentation and sample programs included as
   well, all from Björn and Magnus.

3) Given BPF's flexibility, a new program type has been added to implement
   infrared decoders. Quote: "The kernel IR decoders support the most
   widely used IR protocols, but there are many protocols which are not
   supported. [...] There is a 'long tail' of unsupported IR protocols,
   for which lircd is need to decode the IR. IR encoding is done in such
   a way that some simple circuit can decode it; therefore, BPF is ideal.
   [...] user-space can define a decoder in BPF, attach it to the rc
   device through the lirc chardev.", from Sean.

4) Several improvements and fixes to BPF core, among others, dumping map
   and prog IDs into fdinfo which is a straight forward way to correlate
   BPF objects used by applications, removing an indirect call and therefore
   retpoline in all map lookup/update/delete calls by invoking the callback
   directly for 64 bit archs, adding a new bpf_skb_cgroup_id() BPF helper
   for tc BPF programs to have an efficient way of looking up cgroup v2 id
   for policy or other use cases. Fixes to make sure we zero tunnel/xfrm
   state that hasn't been filled, to allow context access wrt pt_regs in
   32 bit archs for tracing, and last but not least various test cases
   for fixes that landed in bpf earlier, from Daniel.

5) Get rid of the ndo_xdp_flush API and extend the ndo_xdp_xmit with
   a XDP_XMIT_FLUSH flag instead which allows to avoid one indirect
   call as flushing is now merged directly into ndo_xdp_xmit(), from Jesper.

6) Add a new bpf_get_current_cgroup_id() helper that can be used in
   tracing to retrieve the cgroup id from the current process in order
   to allow for e.g. aggregation of container-level events, from Yonghong.

7) Two follow-up fixes for BTF to reject invalid input values and
   related to that also two test cases for BPF kselftests, from Martin.

8) Various API improvements to the bpf_fib_lookup() helper, that is,
   dropping MPLS bits which are not fully hashed out yet, rejecting
   invalid helper flags, returning error for unsupported address
   families as well as renaming flowlabel to flowinfo, from David.

9) Various fixes and improvements to sockmap BPF kselftests in particular
   in proper error detection and data verification, from Prashant.

10) Two arm32 BPF JIT improvements. One is to fix imm range check with
    regards to whether immediate fits into 24 bits, and a naming cleanup
    to get functions related to rsh handling consistent to those handling
    lsh, from Wang.

11) Two compile warning fixes in BPF, one for BTF and a false positive
    to silent gcc in stack_map_get_build_id_offset(), from Arnd.

12) Add missing seg6.h header into tools include infrastructure in order
    to fix compilation of BPF kselftests, from Mathieu.

13) Several formatting cleanups in the BPF UAPI helper description that
    also fix an error during rst2man compilation, from Quentin.

14) Hide an unused variable in sk_msg_convert_ctx_access() when IPv6 is
    not built into the kernel, from Yue.

15) Remove a useless double assignment in dev_map_enqueue(), from Colin.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2018-06-05 12:42:19 -04:00
Jesper Dangaard Brouer 763ea096f3 i40e: remove ndo_xdp_flush call i40e_xdp_flush
Remove the ndo_xdp_flush call implementation i40e_xdp_flush
as no callers of ndo_xdp_flush are left.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-06-05 14:03:15 +02:00
Daniel Borkmann cc5b114dcf bpf, i40e: add meta data support
Add support for XDP meta data when using build skb variant of
the i40e driver. Implementation is analogous to the existing
ixgbe and ixgbevf support for meta data from 366a88fe2f ("bpf,
ixgbe: add meta data support") and be8333322e ("ixgbevf: Add
support for meta data"). With the build skb variant we get
192 bytes of extra headroom which can be used for encaps or
meta data.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Tested-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-06-04 10:23:58 -07:00
Jesper Dangaard Brouer cdb57ed07f i40e: implement flush flag for ndo_xdp_xmit
When passed the XDP_XMIT_FLUSH flag i40e_xdp_xmit now performs the
same kind of ring tail update as in i40e_xdp_flush.  The advantage is
that all the necessary checks have been performed and xdp_ring can be
updated, instead of having to perform the exact same steps/checks in
i40e_xdp_flush

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-03 08:11:34 -07:00
Jesper Dangaard Brouer 42b3346898 xdp: add flags argument to ndo_xdp_xmit API
This patch only change the API and reject any use of flags. This is an
intermediate step that allows us to implement the flush flag operation
later, for each individual driver in a separate patch.

The plan is to implement flush operation via XDP_XMIT_FLUSH flag
and then remove XDP_XMIT_FLAGS_NONE when done.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-06-03 08:11:34 -07:00
Jesper Dangaard Brouer 735fc4054b xdp: change ndo_xdp_xmit API to support bulking
This patch change the API for ndo_xdp_xmit to support bulking
xdp_frames.

When kernel is compiled with CONFIG_RETPOLINE, XDP sees a huge slowdown.
Most of the slowdown is caused by DMA API indirect function calls, but
also the net_device->ndo_xdp_xmit() call.

Benchmarked patch with CONFIG_RETPOLINE, using xdp_redirect_map with
single flow/core test (CPU E5-1650 v4 @ 3.60GHz), showed
performance improved:
 for driver ixgbe: 6,042,682 pps -> 6,853,768 pps = +811,086 pps
 for driver i40e : 6,187,169 pps -> 6,724,519 pps = +537,350 pps

With frames avail as a bulk inside the driver ndo_xdp_xmit call,
further optimizations are possible, like bulk DMA-mapping for TX.

Testing without CONFIG_RETPOLINE show the same performance for
physical NIC drivers.

The virtual NIC driver tun sees a huge performance boost, as it can
avoid doing per frame producer locking, but instead amortize the
locking cost over the bulk.

V2: Fix compile errors reported by kbuild test robot <lkp@intel.com>
V4: Isolated ndo, driver changes and callers.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-05-24 18:36:15 -07:00
Jacob Keller f5254429e1 i40e/i40evf: cleanup incorrect function doxygen comments
Recent versions of the Linux kernel now warn about incorrect parameter
definitions for function comments. Fix up several function comments to
correctly reflect the current function arguments. This cleans up the
warnings and helps ensure our documentation is accurate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-04-30 09:09:04 -07:00
Jeff Kirsher 51dce24bcd net: intel: Cleanup the copyright/license headers
After many years of having a ~30 line copyright and license header to our
source files, we are finally able to reduce that to one line with the
advent of the SPDX identifier.

Also caught a few files missing the SPDX license identifier, so fixed
them up.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@oracle.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-27 14:00:04 -04:00
Jesper Dangaard Brouer 44fa2dbd47 xdp: transition into using xdp_frame for ndo_xdp_xmit
Changing API ndo_xdp_xmit to take a struct xdp_frame instead of struct
xdp_buff.  This brings xdp_return_frame and ndp_xdp_xmit in sync.

This builds towards changing the API further to become a bulk API,
because xdp_buff is not a queue-able object while xdp_frame is.

V4: Adjust for commit 59655a5b6c ("tuntap: XDP_TX can use native XDP")
V7: Adjust for commit d9314c474d ("i40e: add support for XDP_REDIRECT")

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:30 -04:00
Jesper Dangaard Brouer 039930945a xdp: transition into using xdp_frame for return API
Changing API xdp_return_frame() to take struct xdp_frame as argument,
seems like a natural choice. But there are some subtle performance
details here that needs extra care, which is a deliberate choice.

When de-referencing xdp_frame on a remote CPU during DMA-TX
completion, result in the cache-line is change to "Shared"
state. Later when the page is reused for RX, then this xdp_frame
cache-line is written, which change the state to "Modified".

This situation already happens (naturally) for, virtio_net, tun and
cpumap as the xdp_frame pointer is the queued object.  In tun and
cpumap, the ptr_ring is used for efficiently transferring cache-lines
(with pointers) between CPUs. Thus, the only option is to
de-referencing xdp_frame.

It is only the ixgbe driver that had an optimization, in which it can
avoid doing the de-reference of xdp_frame.  The driver already have
TX-ring queue, which (in case of remote DMA-TX completion) have to be
transferred between CPUs anyhow.  In this data area, we stored a
struct xdp_mem_info and a data pointer, which allowed us to avoid
de-referencing xdp_frame.

To compensate for this, a prefetchw is used for telling the cache
coherency protocol about our access pattern.  My benchmarks show that
this prefetchw is enough to compensate the ixgbe driver.

V7: Adjust for commit d9314c474d ("i40e: add support for XDP_REDIRECT")
V8: Adjust for commit bd658dda42 ("net/mlx5e: Separate dma base address
and offset in dma_sync call")

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:29 -04:00
Jesper Dangaard Brouer b411ef1102 i40e: convert to use generic xdp_frame and xdp_return_frame API
Also convert driver i40e, which very recently got XDP_REDIRECT support
in commit d9314c474d ("i40e: add support for XDP_REDIRECT").

V7: This patch got added in V7 of this patchset.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-04-17 10:50:28 -04:00
Björn Töpel d9314c474d i40e: add support for XDP_REDIRECT
The driver now acts upon the XDP_REDIRECT return action. Two new ndos
are implemented, ndo_xdp_xmit and ndo_xdp_flush.

XDP_REDIRECT action enables XDP program to redirect frames to other
netdevs.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 14:17:10 -07:00
Björn Töpel 8ce29c679a i40e: tweak page counting for XDP_REDIRECT
This commit tweaks the page counting for XDP_REDIRECT to function
properly. XDP_REDIRECT support will be added in a future commit.

The current page counting scheme assumes that the reference count
cannot decrease until the received frame is sent to the upper layers
of the networking stack. This assumption does not hold for the
XDP_REDIRECT action, since a page (pointed out by xdp_buff) can have
its reference count decreased via the xdp_do_redirect call.

To work around that, we now start off by a large page count and then
don't allow a refcount less than two.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 14:10:04 -07:00
Jacob Keller 134201aead i40e: move AUTO_DISABLED flags into the state field
The two Flow Directory auto disable flags are used at run time to mark
when the flow director features needed to be disabled. Thus the flags
could change even when the RTNL lock is not held.

They also have some code constructions which really should be
test_and_set or test_and_clear using atomic bit operations.

Create new state fields to mark this, and stop including them in
pf->flags.

This is part of a larger effort to remove the need for cmpxchg64 in
i40e_set_priv_flags().

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-03-26 13:46:30 -07:00
Jeff Kirsher ae06c70b13 intel: add SPDX identifiers to all the Intel drivers
Add the SPDX identifiers to all the Intel wired LAN driver files, as
outlined in Documentation/process/license-rules.rst.

Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2018-03-23 12:18:21 -04:00
Alan Brady 04d4105174 i40e/i40evf: use SW variables for hang detection
The i40e_detect_recover_hung function uses the i40e_get_tx_pending
function to determine if there are packets stalled on the ring.
i40e_get_tx_pending calculates the pending packets using the head
writeback value and HW tail.  If the queue is stopped and we lose the
interrupt to update our next_to_clean then we a) won't get another
interrupt to clean because queue is stopped b) we won't catch the
problem with i40e_detect_recover_hung because the HW values look like
there's no packets waiting to be transmitted.  Using the SW values we
can catch the issue because next_to_clean will be out of sync with head
writeback.

This has the added benefit being less CPU intensive because we don't
need to reach into the hardware to get the values.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-26 12:33:27 -08:00
Alexander Duyck a0073a4b8b i40e/i40evf: Add support for new mechanism of updating adaptive ITR
This patch replaces the existing mechanism for determining the correct
value to program for adaptive ITR with yet another new and more
complicated approach.

The basic idea from a 30K foot view is that this new approach will push the
Rx interrupt moderation up so that by default it starts in low latency and
is gradually pushed up into a higher latency setup as long as doing so
increases the number of packets processed, if the number of packets drops
to 4 to 1 per packet we will reset and just base our ITR on the size of the
packets being received. For Tx we leave it floating at a high interrupt
delay and do not pull it down unless we start processing more than 112
packets per interrupt. If we start exceeding that we will cut our interrupt
rates in half until we are back below 112.

The side effect of these patches are that we will be processing more
packets per interrupt. This is both a good and a bad thing as it means we
will not be blocking processing in the case of things like pktgen and XDP,
but we will also be consuming a bit more CPU in the cases of things such as
network throughput tests using netperf.

One delta from this versus the ixgbe version of the changes is that I have
made the interrupt moderation a bit more aggressive when we are in bulk
mode by moving our "goldilocks zone" up from 48 to 96 to 56 to 112. The
main motivation behind moving this is to address the fact that we need to
update less frequently, and have more fine grained control due to the
separate Tx and Rx ITR times.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 11:50:10 -08:00
Alexander Duyck 556fdfd6e6 i40e/i40evf: Split container ITR into current_itr and target_itr
This patch is mostly prep-work for replacing the current approach to
programming the dynamic aka adaptive ITR. Specifically here what we are
doing is splitting the Tx and Rx ITR each into two separate values.

The first value current_itr represents the current value of the register.

The second value target_itr represents the desired value of the register.

The general plan by doing this is to allow for deferring the update of the
ITR value under certain circumstances. For now we will work with what we
have, but in the future I hope to change the behavior so that we always
only update one ITR at a time using some simple logic to determine which
ITR requires an update.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 11:43:50 -08:00
Alexander Duyck 92418fb147 i40e/i40evf: Use usec value instead of reg value for ITR defines
Instead of using the register value for the defines when setting up the
ring ITR we can just use the actual values and avoid the use of shifts and
macros to translate between the values we have and the values we want.

This helps to make the code more readable as we can quickly translate from
one value to the other.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 11:29:47 -08:00
Alexander Duyck 4ff17929e6 i40e/i40evf: Don't bother setting the CLEARPBA bit
The CLEARPBA bit in the dynamic interrupt control register actually has
no effect either way on the hardware. As per errata 28 in the XL710
specification update the interrupt is actually cleared any time the
register is written with the INTENA_MSK bit set to 0. As such the act of
toggling the enable bit actually will trigger the interrupt being
cleared and could lead to potential lost events if auto-masking is
not enabled.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:56:16 -08:00
Alexander Duyck 71dc371993 i40e/i40evf: Clean up logic for adaptive ITR
The logic for dynamic ITR update is confusing at best as there were odd
paths chosen for how to find the rings associated with a given queue based
on the vector index and other inconsistencies throughout the code.

This patch is an attempt to clean up the logic so that we can more easily
understand what is going on. Specifically if there is a Rx or Tx ring that
is enabled in dynamic mode on the q_vector it is allowed to override the
other side of the interrupt moderation. While it isn't correct all this
patch is doing is cleaning up the logic for now so that when we come
through and fix it we can more easily identify that this is wrong.

The other big change made here is that we replace references to:
	vsi->rx_rings[q_vector->v_idx]->itr_setting
with:
	q_vector->rx.ring->itr_setting

The general idea is we can avoid the long pointer chase since just
accessing q_vector->rx.ring is a single pointer access versus having to
chase down vsi->rx_rings, and then finding the pointer in the array, and
finally chasing down the itr_setting from there.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:35:49 -08:00
Alexander Duyck 40588ca651 i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx
The rings are already split out into Tx and Rx rings so it doesn't make
sense to have any single ring store both a Tx and Rx itr_setting value.
Since that is the case drop the pair in favor of storing just a single ITR
value.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 10:27:12 -08:00
Alan Brady 11a350c965 i40e: fix typo in function description
'bufer' should be 'buffer'

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-02-12 09:55:34 -08:00
Alexander Duyck a3f9fb5ef3 i40e/i40evf: Record ITR register location in the q_vector
The drivers for i40e and i40evf had a reg_idx value stored in the q_vector
that was going completely unused. I can only assume this was copied over
from ixgbe and nobody knew how to use it.

I'm going to make use of the value to avoid having to compute the vector
and thus the register index for multiple paths throughout the drivers.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-26 13:23:46 -08:00
Sudheer Mogilappagari 07d44190a3 i40e/i40evf: Detect and recover hung queue scenario
In VFs, there is a known issue which can cause writebacks
to not occur when interrupts are disabled and there are
less than 4 descriptors resulting in TX timeout. Timeout
can also occur due to lost interrupt.

The current implementation for detecting and recovering
from hung queues in the PF is problematic because it actually
actively encourages lost interrupts.  By triggering a SW
interrupt, interrupts are forced on.  If we are already in
napi_poll and an interrupt fires, napi_poll will not be
rescheduled and the interrupt is effectively lost; thereby
potentially *causing* hung queues.

This patch checks whether packets are being processed between
every watchdog cycle and determine potential hung queue and
fires triggers SW interrupt only for that particular queue.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-23 11:29:19 -08:00
David S. Miller a0ce093180 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-01-09 10:37:00 -05:00
Jesper Dangaard Brouer 871288248d i40e: setup xdp_rxq_info
The i40e driver has a special "FDIR" RX-ring (I40E_VSI_FDIR) which is
a sideband channel for configuring/updating the flow director tables.
This (i40e_vsi_)type does not invoke XDP-ebpf code.

As suggested by Björn (V2): Instead of marking this I40E_VSI_FDIR RX-ring
a special case, reverse the logic and only select RX-rings of type
I40E_VSI_MAIN to register xdp_rxq_info's for.

Driver hook points for xdp_rxq_info:
 * reg  : i40e_setup_rx_descriptors (via i40e_vsi_setup_rx_resources)
 * unreg: i40e_free_rx_resources    (via i40e_vsi_free_rx_resources)

Tested on actual hardware with samples/bpf program.

V2: Fixed bug in i40e_set_ringparam (memset zero) + match on I40E_VSI_MAIN.
V4: Update patch desc that got out-of-sync with code.

Cc: intel-wired-lan@lists.osuosl.org
Cc: Björn Töpel <bjorn.topel@intel.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-01-05 15:21:21 -08:00
Alexander Duyck 248de22e63 i40e/i40evf: Account for frags split over multiple descriptors in check linearize
The original code for __i40e_chk_linearize didn't take into account the
fact that if a fragment is 16K in size or larger it has to be split over 2
descriptors and the smaller of those 2 descriptors will be on the trailing
edge of the transmit. As a result we can get into situations where we didn't
catch requests that could result in a Tx hang.

This patch takes care of that by subtracting the length of all but the
trailing edge of the stale fragment before we test for sum. By doing this
we can guarantee that we have all cases covered, including the case of a
fragment that spans multiple descriptors. We don't need to worry about
checking the inner portions of this since 12K is the maximum aligned DMA
size and that is larger than any MSS will ever be since the MTU limit for
jumbos is something on the order of 9K.

Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2018-01-03 08:47:29 -08:00
Brian King 52c6912fde i40e: Use smp_rmb rather than read_barrier_depends
The original issue being fixed in this patch was seen with the ixgbe
driver, but the same issue exists with i40e as well, as the code is
very similar. read_barrier_depends is not sufficient to ensure
loads following it are not speculatively loaded out of order
by the CPU, which can result in stale data being loaded, causing
potential system crashes.

Cc: stable <stable@vger.kernel.org>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-11-21 23:43:21 -08:00
Alexander Duyck aa250f1186 i40e/i40evf: Revert "i40e/i40evf: bump tail only in multiples of 8"
This reverts commit 11f29003d6.

I am reverting this as I am fairly certain this can result in a memory leak
when combined with the current page recycling scheme. Specifically we end
up attempting to allocate fewer buffers than we recycled and this results
in us rewinding the next to alloc pointer which leads to leaks when we
overwrite the rx_buffer_info when processing the next frame.

Fixes: 11f29003d6 ("i40e/i40evf: bump tail only in multiples of 8")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-31 10:41:10 -07:00
David S. Miller e1ea2f9856 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Several conflicts here.

NFP driver bug fix adding nfp_netdev_is_nfp_repr() check to
nfp_fl_output() needed some adjustments because the code block is in
an else block now.

Parallel additions to net/pkt_cls.h and net/sch_generic.h

A bug fix in __tcp_retransmit_skb() conflicted with some of
the rbtree changes in net-next.

The tc action RCU callback fixes in 'net' had some overlap with some
of the recent tcf_block reworking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-30 21:09:24 +09:00
Alexander Duyck 62b4c6694d i40e: Add programming descriptors to cleaned_count
This patch updates the i40e driver to include programming descriptors in
the cleaned_count. Without this change it becomes possible for us to leak
memory as we don't trigger a large enough allocation when the time comes to
allocate new buffers and we end up overwriting a number of rx_buffers equal
to the number of programming descriptors we encountered.

Fixes: 0e626ff7cc ("i40e: Fix support for flow director programming status")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-26 07:42:58 -07:00
Alexander Duyck 10781348ca i40e: Fix incorrect use of tx_itr_setting when checking for Rx ITR setup
It looks like there was either a copy/paste error or just a typo that
resulted in the Tx ITR setting being used to determine if we were using
adaptive Rx interrupt moderation or not.

This patch fixes the typo.

Fixes: 65e87c0398 ("i40evf: support queue-specific settings for interrupt moderation")
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-26 07:42:58 -07:00
David S. Miller f8ddadc4db Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
There were quite a few overlapping sets of changes here.

Daniel's bug fix for off-by-ones in the new BPF branch instructions,
along with the added allowances for "data_end > ptr + x" forms
collided with the metadata additions.

Along with those three changes came veritifer test cases, which in
their final form I tried to group together properly.  If I had just
trimmed GIT's conflict tags as-is, this would have split up the
meta tests unnecessarily.

In the socketmap code, a set of preemption disabling changes
overlapped with the rename of bpf_compute_data_end() to
bpf_compute_data_pointers().

Changes were made to the mv88e6060.c driver set addr method
which got removed in net-next.

The hyperv transport socket layer had a locking change in 'net'
which overlapped with a change of socket state macro usage
in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-10-22 13:39:14 +01:00
Alexander Duyck 2b9478ffc5 i40e: Fix memory leak related filter programming status
It looks like we weren't correctly placing the pages from buffers that had
been used to return a filter programming status back on the ring. As a
result they were being overwritten and tracking of the pages was lost.

This change works to correct that by incorporating part of
i40e_put_rx_buffer into the programming status handler code. As a result we
should now be correctly placing the pages for those buffers on the
re-allocation list instead of letting them stay in place.

Fixes: 0e626ff7cc ("i40e: Fix support for flow director programming status")
Reported-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Anders K Pedersen <akp@cohaesio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-10 08:04:36 -07:00
Jacob Keller 11f29003d6 i40e/i40evf: bump tail only in multiples of 8
Hardware only fetches descriptors on cachelines of 8, essentially
ignoring the lower 3 bits of the tail register. Thus, it is pointless to
bump tail by an unaligned access as the hardware will ignore some of the
new descriptors we allocated. Thus, it's ideal if we can ensure tail
writes are always aligned to 8.

At first, it seems like we'd already do this, since we allocate
descriptors in batches which are a multiple of 8. Since we'd always
increment by a multiple of 8, it seems like the value should always be
aligned.

However, this ignores allocation failures. If we fail to allocate
a buffer, our tail register will become unaligned. Once it has become
unaligned it will essentially be stuck unaligned until a buffer
allocation happens to fail at the exact amount necessary to re-align it.

We can do better, by simply rounding down the number of buffers we're
about to allocate (cleaned_count) such that "next_to_clean
+ cleaned_count" is rounded to the nearest multiple of 8.

We do this by calculating how far off that value is and subtracting it
from the cleaned_count. This essentially defers allocation of buffers if
they're going to be ignored by hardware anyways, and re-aligns our
next_to_use and tail values after a failure to allocate a descriptor.

This calculation ensures that we always align the tail writes in a way
the hardware expects and don't unnecessarily allocate buffers which
won't be fetched immediately.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09 14:26:29 -07:00
Jacob Keller dbadbbe235 i40e/i40evf: always set the CLEARPBA flag when re-enabling interrupts
In the past we changed driver behavior to not clear the PBA when
re-enabling interrupts. This change was motivated by the flawed belief
that clearing the PBA would cause a lost interrupt if a receive
interrupt occurred while interrupts were disabled.

According to empirical testing this isn't the case. Additionally, the
data sheet specifically says that we should set the CLEARPBA bit when
re-enabling interrupts in a polling setup.

This reverts commit 40d72a5098 ("i40e/i40evf: don't lose interrupts")

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-09 14:21:26 -07:00
Jacob Keller a5340d933e i40e: ignore skb->xmit_more when deciding to set RS bit
Since commit 6a7fded776 ("i40e: Fix RS bit update in Tx path and
disable force WB workaround") we've tried to "optimize" setting the
RS bit based around skb->xmit_more. This same logic was refactored
in commit 1dc8b53879 ("i40e: Reorder logic for coalescing RS bits"),
but ultimately was not functionally changed.

Using skb->xmit_more in this way is incorrect, because in certain
circumstances we may see a large number of skbs in sequence with
xmit_more set. This leads to a performance loss as the hardware does not
writeback anything for those packets, which delays the time it takes for
us to respond to the stack transmit requests. This significantly impacts
UDP performance, especially when layered with multiple devices, such as
bonding, VLANs, and vnet setups.

This was not noticed until now because it is difficult to create a setup
which reproduces the issue. It was discovered in a UDP_STREAM test in
a VM, connected using a vnet device to a bridge, which is connected to
a bonded pair of X710 ports in active-backup mode with a VLAN. These
layered devices seem to compound the number of skbs transmitted at once
by the qdisc. Additionally, the problem can be masked by reducing the
ITR value.

Since the original commit does not provide strong justification for this
RS bit "optimization", revert to the previous behavior of setting the RS
bit every 4th packet.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-10-06 08:11:32 -07:00
Jacob Keller 2b634bb068 i40e/i40evf: rename bytes_per_int to bytes_per_usec
This value is not calculating bytes_per_int, which would actually just
be bytes/ITR_COUNTDOWN_START, but rather it's calculating bytes/usecs.

Rename the variable for clarity so that future developers understand
what the value is actually calculating.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-09-29 12:50:58 -07:00
Daniel Borkmann de8f3a83b0 bpf: add meta pointer for direct access
This work enables generic transfer of metadata from XDP into skb. The
basic idea is that we can make use of the fact that the resulting skb
must be linear and already comes with a larger headroom for supporting
bpf_xdp_adjust_head(), which mangles xdp->data. Here, we base our work
on a similar principle and introduce a small helper bpf_xdp_adjust_meta()
for adjusting a new pointer called xdp->data_meta. Thus, the packet has
a flexible and programmable room for meta data, followed by the actual
packet data. struct xdp_buff is therefore laid out that we first point
to data_hard_start, then data_meta directly prepended to data followed
by data_end marking the end of packet. bpf_xdp_adjust_head() takes into
account whether we have meta data already prepended and if so, memmove()s
this along with the given offset provided there's enough room.

xdp->data_meta is optional and programs are not required to use it. The
rationale is that when we process the packet in XDP (e.g. as DoS filter),
we can push further meta data along with it for the XDP_PASS case, and
give the guarantee that a clsact ingress BPF program on the same device
can pick this up for further post-processing. Since we work with skb
there, we can also set skb->mark, skb->priority or other skb meta data
out of BPF, thus having this scratch space generic and programmable
allows for more flexibility than defining a direct 1:1 transfer of
potentially new XDP members into skb (it's also more efficient as we
don't need to initialize/handle each of such new members). The facility
also works together with GRO aggregation. The scratch space at the head
of the packet can be multiple of 4 byte up to 32 byte large. Drivers not
yet supporting xdp->data_meta can simply be set up with xdp->data_meta
as xdp->data + 1 as bpf_xdp_adjust_meta() will detect this and bail out,
such that the subsequent match against xdp->data for later access is
guaranteed to fail.

The verifier treats xdp->data_meta/xdp->data the same way as we treat
xdp->data/xdp->data_end pointer comparisons. The requirement for doing
the compare against xdp->data is that it hasn't been modified from it's
original address we got from ctx access. It may have a range marking
already from prior successful xdp->data/xdp->data_end pointer comparisons
though.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-09-26 13:36:44 -07:00
Jacob Keller 742c987575 i40e/i40evf: avoid dynamic ITR updates when polling or low packet rate
The dynamic ITR algorithm depends on a calculation of usecs which
assumes that the interrupts have been firing constantly at the interrupt
throttle rate. This is not guaranteed because we could have a low packet
rate, or have been polling in software.

We'll estimate whether this is the case by using jiffies to determine if
we've been too long. If the time difference of jiffies is larger we are
guaranteed to have an incorrect calculation. If the time difference of
jiffies is smaller we might have been polling some but the difference
shouldn't affect the calculation too much.

This ensures that we don't get stuck in BULK latency during certain rare
situations where we receive bursts of packets that force us into NAPI
polling.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:15:24 -07:00
Jacob Keller 0a2c7722be i40e/i40evf: remove ULTRA latency mode
Since commit c56625d597 ("i40e/i40evf: change dynamic interrupt
thresholds") a new higher latency ITR setting called I40E_ULTRA_LATENCY
was added with a cryptic comment about how it was meant for adjusting Rx
more aggressively when streaming small packets.

This mode was attempting to calculate packets per second and then kick
in when we have a huge number of small packets.

Unfortunately, the ULTRA setting was kicking in for workloads it wasn't
intended for including single-thread UDP_STREAM workloads.

This wasn't caught for a variety of reasons. First, the ip_defrag
routines were improved somewhat which makes the UDP_STREAM test still
reasonable at 10GbE, even when dropped down to 8k interrupts a second.
Additionally, some other obvious workloads appear to work fine, such
as TCP_STREAM.

The number 40k doesn't make sense for a number of reasons. First, we
absolutely can do more than 40k packets per second. Second, we calculate
the value inline in an integer, which sometimes can overflow resulting
in using incorrect values.

If we fix this overflow it makes it even more likely that we'll enter
ULTRA mode which is the opposite of what we want.

The ULTRA mode was added originally as a way to reduce CPU utilization
during a small packet workload where we weren't keeping up anyways. It
should never have been kicking in during these other workloads.

Given the issues outlined above, let's remove the ULTRA latency mode. If
necessary, a better solution to the CPU utilization issue for small
packet workloads will be added in a future patch.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:12:15 -07:00
Jacob Keller 6d9777298b i40e: invert logic for checking incorrect cpu vs irq affinity
In commit 96db776a36 ("i40e/vf: fix interrupt affinity bug")
we added some code to force exit of polling in case we did
not have the correct CPU. This is important since it was possible for
the IRQ affinity to be changed while the CPU is pegged at 100%. This can
result in the polling routine being stuck on the wrong CPU until
traffic finally stops.

Unfortunately, the implementation, "if the CPU is correct, exit as
normal, otherwise, fall-through to the end-polling exit" is incredibly
confusing to reason about. In this case, the normal flow looks like the
exception, while the exception actually occurs far away from the if
statement and comment.

We recently discovered and fixed a bug in this code because we were
incorrectly initializing the affinity mask.

Re-write the code so that the exceptional case is handled at the check,
rather than having the logic be spread through the regular exit flow.
This does end up with minor code duplication, but the resulting code is
much easier to reason about.

The new logic is identical, but inverted. If we are running on a CPU not
in our affinity mask, we'll exit polling. However, the code flow is much
easier to understand.

Note that we don't actually have to check for MSI-X, because in the MSI
case we'll only have one q_vector, but its default affinity mask should
be correct as it includes all CPUs when it's initialized. Further, we
could at some point add code to setup the notifier for the non-MSI-X
case and enable this workaround for that case too, if desired, though
there isn't much gain since its unlikely to be the common case.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:10:48 -07:00
Jacob Keller 9254c0e34e i40e: move enabling icr0 into i40e_update_enable_itr
If we don't have MSI-X enabled, we handle interrupts on all icr0. This
is a special case, so let's move the conditional into
i40e_update_enable_itr() in order to make i40e_napi_poll easier to
read about.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-08-27 16:07:13 -07:00
David S. Miller 3118e6e19d Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The UDP offload conflict is dealt with by simply taking what is
in net-next where we have removed all of the UFO handling code
entirely.

The TCP conflict was a case of local variables in a function
being removed from both net and net-next.

In netvsc we had an assignment right next to where a missing
set of u64 stats sync object inits were added.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-09 16:28:45 -07:00
Florian Fainelli 7d6d067790 i40e: Initialize 64-bit statistics TX ring seqcount
On 32-bit hosts and with CONFIG_DEBUG_LOCK_ALLOC we should be seeing a
lockdep splat indicating this seqcount is not correctly initialized, fix
that.

Fixes: 980e9b1186 ("i40e: Add support for 64 bit netstats")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-08-01 20:06:06 -07:00
Jesse Brandeburg b85c94b617 i40e/i40evf: remove mismatched type warnings
Compiler reported several places where driver compared
signed and unsigned types.  Cast or change the types to remove
the warnings.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-07-26 03:25:20 -07:00
Jesse Brandeburg 601a2e7ac5 i40e/i40evf: make IPv6 ATR code clearer
This just reorders some local vars and makes the code flow
clearer.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-07-26 03:25:20 -07:00
Björn Töpel 74608d17fe i40e: add support for XDP_TX action
This patch adds proper XDP_TX action support. For each Tx ring, an
additional XDP Tx ring is allocated and setup. This version does the
DMA mapping in the fast-path, which will penalize performance for
IOMMU enabled systems. Further, debugfs support is not wired up for
the XDP Tx rings.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20 18:17:09 -07:00
Björn Töpel 0c8493d90b i40e: add XDP support for pass and drop actions
This commit adds basic XDP support for i40e derived NICs. All XDP
actions will end up in XDP_DROP.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-20 18:17:09 -07:00
David S. Miller 0ddead90b2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
The conflicts were two cases of overlapping changes in
batman-adv and the qed driver.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-15 11:59:32 -04:00
Jacob Keller 6964e53f55 i40e: fix handling of HW ATR eviction
A recent commit to refactor the driver and remove the hw_disabled_flags
field accidentally introduced two regressions. First, we overwrote
pf->flags which removed various key flags including the MSI-X settings.

Additionally, it was intended that we have now two flags,
HW_ATR_EVICT_CAPABLE and HW_ATR_EVICT_ENABLED, but this was not done,
and we accidentally were mis-using HW_ATR_EVICT_CAPABLE everywhere.

This patch adds the missing piece, HW_ATR_EVICT_ENABLED, and safely
updates pf->flags instead of overwriting it.

Without this patch we will have many problems including disabling MSI-X
support, and we'll attempt to use HW ATR eviction on devices which do
not support it.

Fixes: 47994c119a ("i40e: remove hw_disabled_flags in favor of using separate flag bits", 2017-04-19)
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-12 18:53:02 -04:00
David S. Miller 216fe8f021 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Just some simple overlapping changes in marvell PHY driver
and the DSA core code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2017-06-06 22:20:08 -04:00
Björn Töpel 2aae918c7a i40e/i40evf: proper update of the page_offset field
In f8b45b74cc ("i40e/i40evf: Use build_skb to build frames")
i40e_build_skb updates the page_offset field with an incorrect offset,
which can lead to data corruption. This patch updates page_offset
correctly, by properly setting truesize.

Note that the bug only appears on architectures where PAGE_SIZE is
8192 or larger.

Fixes: f8b45b74cc ("i40e/i40evf: Use build_skb to build frames")
Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Acked-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-06-06 02:49:15 -07:00
Jacob Keller 0bc0706b46 i40e: check for Tx timestamp timeouts during watchdog
The i40e driver has logic to handle only one Tx timestamp at a time,
using a state bit lock to avoid multiple requests at once.

It may be possible, if incredibly unlikely, that a Tx timestamp event is
requested but never completes. Since we use an interrupt scheme to
determine when the Tx timestamp occurred we would never clear the state
bit in this case.

Add an i40e_ptp_tx_hang() function similar to the already existing
i40e_ptp_rx_hang() function. This function runs in the watchdog routine
and makes sure we eventually recover from this case instead of
permanently disabling Tx timestamps.

Note: there is no currently known way to cause this without hacking the
driver code to force it.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-05-31 03:12:06 -07:00
Jacob Keller 2955faca04 i40e: add statistic indicating number of skipped Tx timestamps
The i40e driver can only handle one Tx timestamp request at a time.
This means it is possible for an application timestamp request to be
ignored.

There is no easy way for an administrator to determine if this occurred.
Add a new statistic which tracks this, tx_hwtstamp_skipped.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-05-31 03:09:14 -07:00
Jacob Keller 69077577af i40e: avoid permanent lock of *_PTP_TX_IN_PROGRESS
The i40e driver uses a bit lock to indicate when a Tx timestamp is in
progress to avoid attempting to timestamp multiple packets at once. This
is required because hardware only has registers to handle one request at
a time.

There is a corner case where we failed to cleanup the bit lock after
a failed transmit. This can potentially result in a state bit being
locked forever.

Add some cleanup code to i40e_xmit_frame_ring to check and make sure we
cleanup incase of these failures. We also modify i40e_tx_map to return
an error code indication DMA failure.

Reported-by: Reported-by: David Mirabito <davidm@metamako.com>
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-05-31 03:08:09 -07:00
Jacob Keller 47994c119a i40e: remove hw_disabled_flags in favor of using separate flag bits
The hw_disabled_flags field was added as a way of signifying that
a feature was automatically or temporarily disabled. However, we
actually only use this for FDir features. Replace its use with new
_AUTO_DISABLED flags instead. This is more readable, because you aren't
setting an *_ENABLED flag to *disable* the feature.

Additionally, clean up a few areas where we used these bits. First, we
don't really need to set the auto-disable flag for ATR if we're fully
disabling the feature via ethtool.

Second, we should always clear the auto-disable bits in case they somehow
got set when the feature was disabled. However, avoid displaying
a message that we've re-enabled the feature.

Third, we shouldn't be re-enabling ATR in the SB ntuple add flow,
because it might have been disabled due to space constraints. Instead,
we should just wait for the fdir_check_and_reenable to be called by the
watchdog.

Overall, this change allows us to simplify some code by removing an
extra field we didn't need, and the result should make it more clear as
to what we're actually doing with these flags.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30 04:53:58 -07:00
Jacob Keller 0da36b9774 i40e: use DECLARE_BITMAP for state fields
Instead of assuming our flags fit within an unsigned long, use
DECLARE_BITMAP which will ensure that we always allocate enough space.
Additionally, use __I40E_STATE_SIZE__ markers as the last element of the
enumeration so that the size of the BITMAP is compile-time assigned
rather than programmer-time assigned. This ensures that potential future
flag additions do not actually overrun the array. This is especially
important as 32bit systems would only have 32bit longs instead of 64bit
longs as we generally have assumed in the prior code.

This change also removes a dereference of the state fields throughout
the code, so it does have a bit of code churn. The conversions were
automated using sed replacements with an alternation

  s/&(vsi->back|vsi|pf)->state/\1->state/
  s/&adapter->vsi.state/adapter->vsi.state/

For debugfs, we modify the printing so that we can display chunks of the
state value on new lines. This ensures that we can print the entire set
of state values. Additionally, we now print them as 08lx to ensure that
they display nicely.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30 04:48:13 -07:00
Jacob Keller d19cb64b92 i40e: separate PF and VSI state flags
Avoid using the same named flags for both vsi->state and pf->state. This
makes code review easier, as it is more likely that future authors will
use the correct state field when checking bits. Previous commits already
found issues with at least one check, and possibly others may be
incorrect.

This reduces confusion as it is more clear what each flag represents,
and which flags are valid for which state field.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-30 04:42:30 -07:00
Scott Peterson ed0980c440 i40e/i40evf: Add tracepoints
This patch adds tracepoints to the i40e and i40evf drivers to which
BPF programs can be attached for feature testing and verification.
It's expected that an attached BPF program will identify and count or
log some interesting subset of traffic. The bcc-tools package is
helpful there for containing all the BPF arcana in a handy Python
wrapper. Though you can make these tracepoints log trace messages, the
messages themselves probably won't be very useful (other to verify the
tracepoint is being called while you're debugging your BPF program).

The idea here is that tracepoints have such low performance cost when
disabled that we can leave these in the upstream drivers. This may
eventually enable the instrumentation of unmodified customer systems
should the need arise to verify a NIC feature is working as expected.
In general this enables one set of feature verification tools to be
used on these drivers whether they're built with the kernel or
separately.

Users are advised against using these tracepoints for anything other
than a diagnostic tool. They have a performance impact when enabled,
and their exact placement and form may change as we see how well they
work in practice for the purposes above.

Change-ID: Id6014a7322c0e6d08068114dd20bd156f2f6435e
Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-19 16:47:31 -07:00
Alexander Duyck 0e626ff7cc i40e: Fix support for flow director programming status
This patch fixes an issue I introduced when I converted the code over to
using the length field to determine if a descriptor was done or not. It
turns out that we are also processing programming descriptors in the Rx
path and need to have these processed even though the length field will be
0 on these packets.  What will happen with a programming descriptor is that
we will receive a descriptor that has the SPH bit set, and the header
length and packet length fields cleared.

To account for this we should be checking for the bit for split header
being set even though we aren't actually using header split. This bit is
set in the length field to indicate if a programming descriptor response is
contained in the descriptor. Since we don't support header split we don't
need to perform the extra checks of using a fixed value for the entire
length field.

In addition I am moving the function for checking if a filter is a
programming status filter into the i40e_txrx.c file since there is no
longer support for FCoE it doesn't make sense to keep this file in i40e.h.

Change-ID: I12c359c3dc70adb9d6b92b27324bb2c7f04c1a06
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-19 16:36:18 -07:00
Alexander Duyck f8b45b74cc i40e/i40evf: Use build_skb to build frames
This patch is meant to improve the performance of the Rx path.
Specifically by using build_skb we have several distinct advantages.

In the case of small frames we were previously using a copy-break approach.
This means that we were allocating a page fragment to use for skb->head,
and were having to copy the packet into that region.  Both of those calls
are now avoided since we just build the skb around the data.

In the case of large frames the gains are much more significant.
Specifically we were having to allocate skb->head, and copy the headers as
before.  However in addition we were having to parse the header using
eth_get_headlen which could be quite expensive.  All of this is avoided by
building the frame around the data.  I have seen gains as high as 30% when
using VXLAN for instance due to just header pulling overhead.

Finally with all this in place it also sets us up to start looking at
enabling XDP.  Specifically we now have a path in which the data is in the
page and the frame is built around it.  So if we parse it with XDP before
we call build_skb we can take care of any necessary processing there.

Change-ID: Id4bdd618e94473d41f892417e5d8019639e421e3
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08 02:53:51 -07:00
Alexander Duyck ca9ec0888d i40e/i40evf: Add support for padding start of frames
This patch adds padding to the start of frames to make room for headroom
for us to eventually start using build_skb.  Right now we guarantee at
least NET_SKB_PAD + NET_IP_ALIGN, however we allocate more space if more is
available.  For example on x86 the headroom should be 192 bytes.

On systems that have too large of a cache line size to support storing 1.5K
padding and shared info we default to using 3K buffers and reserve
everything that isn't used for skb_shared_info or the data buffer for
headroom.

Change-ID: I33c641c9a1ea10cf7cc484c2d20985368d2d709a
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08 02:53:51 -07:00
Alexander Duyck 98efd69493 i40e/i40evf: Add support for using order 1 pages with a 3K buffer
There are situations where adding padding to the front and back of an Rx
buffer will require that we add additional padding.  Specifically if
NET_IP_ALIGN is non-zero, or the MTU size is larger than 7.5K we would need
to use 2K buffers which leaves us with no room for the padding.

To preemptively address these cases I am adding support for 3K buffers to
the Rx path so that we can provide the additional padding needed in the
event of NET_IP_ALIGN being non-zero or a cache line being greater than 64.

Change-ID: I938bc1ba611285428df39a613cd66f98e60b55c7
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08 02:53:50 -07:00
Alan Brady 17daabb5e8 i40e: Simplify i40e_detect_recover_hung_queue logic
This patch greatly reduces the unneeded complexity in the
i40e_detect_recover_hung_queue code path.  The previous implementation
set a 'hung bit' which would then get cleared while polling.  If the
detection routine was called a second time with the bit already set, we
would issue a software interrupt.  This patch makes it such that if
interrupts are disabled and we have pending TX descriptors, we trigger a
software interrupt since in, the worst case, queues are already clean
and we have an extra interrupt.

Additionally this patch removes the workaround for lost interrupts as
calling napi_reschedule in this context can cause software interrupts to
fire on the wrong CPU.

Change-ID: Iae108582a3ceb6229ed1d22e4ed6e69cf97aad8d
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08 02:53:49 -07:00
Alexander Duyck e8c5f7231c i40e: Swap use of pf->flags and pf->hw_disabled_flags for ATR Eviction
This is a minor cleanup so that we are always updating pf->flags when we
make a change to the private flags instead of updating a mix of either
pf->flags and/or pf->hw_disabled_flags.

In addition I went through and cleaned out all the spots where we were
using the X722 define in regards to this flag.

Lastly since we changed the logic I went through and flushed out any
redundancy and cleaned up the handling of the flags in the Tx path.

Change-ID: I79ff95a7272bb2533251ff11ef91e89ccb80b610
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08 02:53:49 -07:00
Jacob Keller a346fb836c i40e: update error message when trying to add invalid filters
Re-word the error message displayed when adding a filter with an
invalid flow type. Additionally, report a distinct error message when
the IPv4 protocol is at fault.

Change-ID: Iba3d85b87f8d383c97c8bdd180df34a6adf3ee67
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-04-08 02:53:48 -07:00
Alexander Duyck fa2343e903 i40e/i40evf: Break i40e_fetch_rx_buffer up to allow for reuse of frag code
This patch is meant to clean up the code in preparation for us adding
support for build_skb.  Specifically we deconstruct i40e_fetch_buffer into
several functions so that those functions can later be reused when we add a
path for build_skb.

Specifically with this change we split out the code for adding a page to an
exiting skb.

Change-ID: Iab1efbab6b8b97cb60ab9fdd0be1d37a056a154d
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-29 02:15:06 -07:00
Alexander Duyck a0cfc3130e i40e/i40evf: Pull out code for cleaning up Rx buffers
This patch pulls out the code responsible for handling buffer recycling and
page counting and distributes it through several functions.  This allows us
to commonize the bits that handle either freeing or recycling the buffers.

As far as the page count tracking one change to the logic is that
pagecnt_bias is decremented as soon as we call i40e_get_rx_buffer.  It is
then the responsibility of the function that pulls the data to either
increment the pagecnt_bias if the buffer can be recycled as-is, or to
update page_offset so that we are pointing at the correct location for
placement of the next buffer.

Change-ID: Ibac576360cb7f0b1627f2a993d13c1a8a2bf60af
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-29 02:15:06 -07:00
Alexander Duyck 9a064128fc i40e/i40evf: Pull code for grabbing and syncing rx_buffer from fetch_buffer
This patch pulls the code responsible for fetching the Rx buffer and
synchronizing DMA into a function, specifically called i40e_get_rx_buffer.

The general idea is to allow for better code reuse by pulling this out of
i40e_fetch_rx_buffer.  We dropped a couple of prefetches since the time
between the prefetch being called and the data being accessed was too small
to be useful.

Change-ID: I4885fce4b2637dbedc8e16431169d23d3d7e79b9
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-29 02:15:06 -07:00
Alexander Duyck d57c0e08c7 i40e/i40evf: Use length to determine if descriptor is done
This change makes it so that we use the length of the packet instead of the
DD status bit to determine if a new descriptor is ready to be processed.
The obvious advantage is that it cuts down on reads as we don't really even
need the DD bit if going from a 0 to a non-zero value on size is enough to
inform us that the packet has been completed.

Change-ID: Iebdf9cdb36c454ef092df27199b92ad09c374231
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-29 02:15:06 -07:00
Alexander Duyck 9eed69a914 i40e: Drop FCoE code from core driver files
Looking over the code for FCoE it looks like the Rx path has been broken at
least since the last major Rx refactor almost a year ago.  It seems like
FCoE isn't supported for any of the Fortville/Fortpark hardware so there
isn't much point in carrying the code around, especially if it is broken
and untested.

Change-ID: I892de8fa551cb129ce2361e738ff82ce55fa229e
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-27 16:47:43 -07:00
Alexander Duyck a5b268e4b1 i40e/i40evf: Clean-up process_skb_fields
This is a minor clean-up to make the i40e/i40evf process_skb_fields
function look a little more like what we have in igb.  The Rx checksum
function called out a need for skb->protocol but I can't see where it
actually needs it.  I am assuming this is something that was likely
refactored out some time ago as the Rx checksum code has gone through a few
rewrites.

Change-ID: I0b4668a34d90b61b66ded7c7c26e19a3e2d06251
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-27 16:47:43 -07:00
Alexander Duyck 741b8b832a i40e/i40evf: Fix use after free in Rx cleanup path
We need to reset skb back to NULL when we have freed it in the Rx cleanup
path.  I found one spot where this wasn't occurring so this patch fixes it.

Change-ID: Iaca68934200732cd4a63eb0bd83b539c95f8c4dd
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-27 16:45:14 -07:00
Alexander Duyck 1793668c3b i40e/i40evf: Update code to better handle incrementing page count
Update the driver code so that we do bulk updates of the page reference
count instead of just incrementing it by one reference at a time.  The
advantage to doing this is that we cut down on atomic operations and
this in turn should give us a slight improvement in cycles per packet.
In addition if we eventually move this over to using build_skb the gains
will be more noticeable.

I also found and fixed a store forwarding stall from where we were
assigning "*new_buff = *old_buff".  By breaking it up into individual
copies we can avoid this and as a result the performance is slightly
improved.

Change-ID: I1d3880dece4133eca3c32423b04a5467321ccc52
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-27 16:45:13 -07:00
Jacob Keller f223c8752a i40e: add support for SCTPv4 FDir filters
Enable FDir filters for SCTPv4 packets using the ethtool ntuple
interface to enable filters. The ethtool API does not allow masking on
the verification tag.

Change-Id: I093e88a8143994c7e6f4b7b17a0bd5cf861d18e4
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-23 21:13:33 -07:00
Jacob Keller 0e588de17f i40e: implement support for flexible word payload
Add support for flexible payloads passed via ethtool user-def field.
This support is somewhat limited due to hardware design. The input set
can only be programmed once per filter type, and the flexible offset is
part of this filter input set. This means that the user cannot program
both a regular and a flexible filter at the same time for a given flow
type. Additionally, the user may not program two flexible filters of the
same flow type with different offsets, although they are allowed to
configure different values at that offset location.

We support a single flexible word (2byte) value per protocol type, and
we handle the FLX_PIT register using a list of flexible entries so that
each flow type may be configured separately.

Due to hardware implementation, the flexible data is offset from the
start of the packet payload, and thus may not be in part of the header
data. For this reason, the offset provided by the user defined data is
interpreted as a byte offset from the start of the matching payload.
Previous implementations have tried to represent the offset as from the
start of the frame, but this is not feasible because header sizes may
change due to options.

Change-Id: 36ed27995e97de63f9aea5ade5778ff038d6f811
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-23 21:13:33 -07:00
Jacob Keller 097dbf5250 i40e: add counters for UDP/IPv4 and IPv4 filters
In preparation for adding code to properly check the mask values, we
will need to know the number of active filters for each type. Add
counters for each filter type. Rename the already existing fd_tcp_rule
to fd_tcp4_filter_cnt to match the style of other names. To avoid style
warnings, avoid assigning multiple parameters at once, and fix up one
other case where we did so previously.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:22 -07:00
Jacob Keller 377cc24980 i40e: exit ATR mode only when adding TCP/IPv4 filter succeeds
Move ATR exit check after we have sent the TCP/IPv4 filter to the ring
successfully. This avoids an issue where we potentially update the
filter count without actually succeeding in adding the filter. Now, we
only increment the fd_tcp_rule after we've succeeded. Additionally, we
will re-enable ATR mode only after deletion of the filter is actually
posted to the FDIR ring.

Change-ID: If5c1dea422081cc5e2de65618b01b4c3bf6bd586
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:20 -07:00
Jacob Keller e5187ee3ee i40e: return immediately when failing to add fdir filter
Instead of setting err=true and checking this to determine when to free
the raw_packet near the end of the function, simply kfree and return
immediately. The resulting code is a bit cleaner and has one less
variable. This also resolves a subtle bug in the ipv4 case which could
fail to add the first filter and then never free the memory, resulting
in a small memory leak.

Change-ID: I7583aac033481dc794b4acaa14445059c8930ff1
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed-by: Avinash Dayanand <avinash.dayanand@intel.com>
Reviewed-by: Alan Brady <alan.brady@intel.com>
Reviewed-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:20 -07:00
Jacob Keller 8ce43dce6f i40e: don't use arrays for (src|dst)_ip
The code originally included src_ip and dst_ip with enough space to
support ipv6 filters. However, no actual support for ipv6 filters has
been implemented. Thus, remove the arrays and just use __be32 values.
Should ipv6 support be added in the future, we can replace these with
a union that has sizes for both values.

Change-Id: I1bc04032244a80eb6ebc8a4e6c723a4a665c1dd5
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-20 16:45:20 -07:00
Harshitha Ramamurthy b77ac97593 i40e: rename auto_disable_flags to hw_disabled_flags
A previous commit introduced a field that tracks the features
that are disabled due to HW resource limitations as opposed
to the featured disabled by the user. This patch changes the
name of the field to make it more readable since it might get
confusing when looking at code containing both the flags
field and the auto_disable_features field together.

Change-ID: Idcc9888659698f6fe3ccff17c8c3f09b5026f708
Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-15 02:02:02 -07:00
Alexander Duyck 59605bc096 i40e/i40evf: Add support for mapping pages with DMA attributes
This patch adds support for DMA_ATTR_SKIP_CPU_SYNC and
DMA_ATTR_WEAK_ORDERING. By enabling both of these for the Rx path we
are able to see performance improvements on architectures that implement
either one due to the fact that page mapping and unmapping only has to
sync what is actually being used instead of the entire buffer. In addition
by enabling the weak ordering attribute enables a performance improvement
for architectures that can associate a memory ordering with a DMA buffer
such as Sparc.

Change-ID: If176824e8231c5b24b8a5d55b339a6026738fc75
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-03-15 02:00:14 -07:00
Jacob Keller b9c015d421 i40e: mark the value passed to csum_replace_by_diff as __wsum
Fix, or rather, avoid a sparse warning caused by the fact that
csum_replace_by_diff expects to receive a __wsum value. Since the
calculation appears to work, simply typecast the passed paylen value to
__wsum to avoid the warning.

This seems pretty fishy since __wsum was obviously annotated as
a separate type on purpose, so this throws the entire calculation into
question. Since it currently appears to behave as expected, the typecast
is probably safe.

Change-ID: I4fdc5cddd589abc16098176e8a61127e761488f4
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-18 20:35:36 -08:00
Carolyn Wyborny 3c234c4709 i40e: Fix Adaptive ITR enabling
This patch fixes a bug introduced with the addition of the per queue
ITR feature support in ethtool.  With that addition, there were
functions added which converted the ITR settings to binary values.
The IS_ENABLED macros that run on those values check whether a bit
is set or not and with the value being binary, the bit check always
returned ITR disabled which prevents any updating of the ITR rate.
This patch fixes the problem by changing the functions to return the
current ITR value instead and renaming it to better reflect
its function.  These functions now provide a value which will be
accurately asessed and update the ITR as intended.

Change-ID: I14f1d088d052e27f652aaa3113e186415ddea1fc
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-18 20:35:36 -08:00
Jacob Keller a158aeaf5b i40e: update comment explaining where FDIR buffers are freed
The original comment implies that the only location where the raw_packet
buffer will be freed is in i40e_clean_tx_ring() which is incorrect. In
fact this isn't even the normal case. Update the comment explaining
where the memory is freed.

Change-ID: Ie0defc35ed1c3af183f81fdc60b6d783707a5595
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-11 20:39:01 -08:00
Scott Peterson 9b37c93731 i40e/i40evf: eliminate i40e_pull_tail()
Reorganize the i40e_pull_tail() logic, doing it in i40e_add_rx_frag()
where it's cheaper.  The igb driver does this the same way.

Also renames i40e_page_is_reserved() to reflect what it actually
tests.

Change-ID: Icd9cc507aae1fcdc02308b3a09034111b4c24071
Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-11 20:39:01 -08:00
Scott Peterson e72e56597b i40e/i40evf: Moves skb from i40e_rx_buffer to i40e_ring
This patch reduces the size of struct i40e_rx_buffer by one pointer,
and makes the i40e driver a little more consistent with the igb driver
in terms of packets that span buffers.

We do this by moving the skb field from struct i40e_rx_buffer to
struct i40e_ring. We pass the skb we already have (or NULL if we
don't) to i40e_fetch_rx_buffer(), which skips the skb allocation if we
already have one for this packet.

Change-ID: I4ad48a531844494ba0c5d8e1a62209a057f661b0
Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-11 20:39:01 -08:00
Scott Peterson 7987dcd7b9 i40e/i40evf: Limit DMA sync of RX buffers to actual packet size
On packet RX, we perform a DMA sync for CPU before passing the
packet up.  Here we limit that sync to the actual length of the
incoming packet, rather than always syncing the entire buffer.

Change-ID: I626aaf6c37275a8ce9e81efcaa773f327b331487
Signed-off-by: Scott Peterson <scott.d.peterson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-11 20:39:01 -08:00
Alexander Duyck 52ea3e8020 i40e: Quick refactor to start moving data off stack and into Tx buffer info
This patch does some quick work to pull some of the data off of the stack
and hopefully start storing it in the Tx buffer info section of the Tx
ring.  Ideally we should be moving away from having to store much of
anything on the stack and can just maintain it all in the descriptor rings.

Change-ID: I4b4715ea1920e122502482b3f9e56a9a6cb1e9fe
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2017-02-02 22:41:10 -08:00
Alexander Duyck 6beb84a73e i40e/i40evf: napi_poll must return the work done
Currently the function i40e_napi-poll() returns 0 when it clean completely
the Rx rings, but this foul budget accounting in core code.

Fix this by returning the actual work done, capped to budget - 1, since
the core doesn't allow to return the full budget when the driver modifies
the NAPI status

This is based on a similar change that was made for the ixgbe driver by
Paolo Abeni.

Change-ID: Ic3d93ad2fa2fc8ce3164bc461e69367da0f9173b
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-12-06 20:46:13 -08:00
Alexander Duyck 1dc8b53879 i40e: Reorder logic for coalescing RS bits
This patch reorders the logic at the end of i40e_tx_map to address the
fact that the logic was rather convoluted and much larger than it needed
to be.

In order to try and coalesce the code paths I have updated some of the
comments and repurposed some of the variables in order to reduce
unnecessary overhead.

This patch does the following:
1.  Quit tracking skb->xmit_more with a flag, just max out packet_stride
2.  Drop tail_bump and do_rs and instead just use desc_count and td_cmd
3.  Pull comments from ixgbe that make need for wmb() more explicit.

Change-ID: Ic7da85ec75043c634e87fef958109789bcc6317c
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-31 14:26:40 -07:00
Alexander Duyck 4b8164467b i40e: Add common function for finding VSI by type
This patch adds a common method for finding a VSI by type.  The main
motivation for doing this is that the Flow Director path actually had two
ways of handling this, one stopped on first match and one did not.  This
patch makes it so that all callers of this function will get the same
approach for finding a VSI.

Change-ID: Ibf25de8acd8466582520694424aa87da66965fbd
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Signed-off-by: Bimmy Pujari <bimmy.pujari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-31 14:26:40 -07:00
Jacob Keller 124905012d i40e: replace PTP Rx timestamp hang logic
The current Rx timestamp hang logic is not very robust because it does
not notice a register is hung until all four timestamps have been
latched and we wait a full 5 seconds. Replace this logic with a newer Rx
hang detection based on storing the jiffies when we first notice
a receive timestamp event. We store each register's time separately,
along with a flag indicating if it is currently latched. Upon first
transitioning to latch, we will update the latch_events[i] jiffies
value. This indicates the time we first noticed this event. The watchdog
routine will simply check that the either the flag has been cleared, or
we have passed at least one second. In this case, it is able to clear
the Rx timestamp register under the assumption that it was for a dropped
frame. The benefit if this strategy is that we should be able to
detect and clear out stalled RXTIME_H registers before we exhaust the
supply of 4, and avoid complete stall of Rx timestamp events.

Change-ID: Id55458c0cd7a5dd0c951ff2b8ac0b2509364131f
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-31 14:26:40 -07:00
Jacob Keller 144ed17630 i40e: correct check for reading TSYNINDX from the receive descriptor
When hardware has taken a timestamp for a received packet, it indicates
which RXTIME register the timestamp was placed in by some bits in the
receive descriptor. It uses 3 bits, one to indicate if the descriptor
index is valid (ie: there was a timestamp) and 2 bits to indicate which
of the 4 registers to read. However, the driver currently does not check
the TSYNVALID bit and only checks the index. It assumes a zero index
means no timestamp, and a non zero index means a timestamp occurred.
While this appears to be true, it prevents ever reading a timestamp in
RXTIME[0], and causes the first timestamp the device captures to be
ignored.

Fix this by using the TSYNVALID bit correctly as the true indicator of
whether the packet has an associated timestamp.

Also rename the variable rsyn to tsyn as this is more descriptive and
matches the register names.

Change-ID: I4437e8f3a3df2c2ddb458b0fb61420f3dafc4c12
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-31 14:26:40 -07:00
Alexander Duyck 99dad8b34c i40e: Drop redundant Rx descriptor processing code
This patch cleans up several pieces of redundant code in the Rx clean-up
paths.

The first bit is that hdr_addr and the status_err_len portions of the Rx
descriptor represent the same value.  As such there is no point in setting
them to 0 before setting them to 0.  I'm dropping the second spot where we
are updating the value to 0 so that we only have 1 write for this value
instead of 2.

The second piece is the checking for the DD bit in the packet.  We only
need to check for a non-zero value for the status_err_len because if the
device is done with the descriptor it will have written something back and
the DD is just one piece of it.  In addition I have moved the reading of
the Rx descriptor bits related to rx_ptype down so that they are actually
below the dma_rmb() call so that we are guaranteed that we don't have any
funky 64b on 32b calls causing any ordering issues.

Change-ID: I256e44a025d3c64a7224aaaec37c852bfcb1871b
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-28 23:28:39 -07:00
Alan Brady 96db776a36 i40e/i40evf: fix interrupt affinity bug
There exists a bug in which a 'perfect storm' can occur and cause
interrupts to fail to be correctly affinitized. This causes unexpected
behavior and has a substantial impact on performance when it happens.

The bug occurs if there is heavy traffic, any number of CPUs that have
an i40e interrupt are pegged at 100%, and the interrupt afffinity for
those CPUs is changed.  Instead of moving to the new CPU, the interrupt
continues to be polled while there is heavy traffic.

The bug is most readily realized as the driver is first brought up and
all interrupts start on CPU0. If there is heavy traffic and the
interrupt starts polling before the interrupt is affinitized, the
interrupt will be stuck on CPU0 until traffic stops. The bug, however,
can also be wrought out more simply by affinitizing all the interrupts
to a single CPU and then attempting to move any of those interrupts off
while there is heavy traffic.

This patch fixes the bug by registering for update notifications from
the kernel when the interrupt affinity changes. When that fires, we
cache the intended affinity mask. Then, while polling, if the cpu is
pegged at 100% and we failed to clean the rings, we check to make sure
we have the correct affinity and stop polling if we're firing on the
wrong CPU.  When the kernel successfully moves the interrupt, it will
start polling on the correct CPU. The performance impact is minimal
since the only time this section gets executed is when performance is
already compromised by the CPU.

Change-ID: I4410a880159b9dba1f8297aa72bef36dca34e830
Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-28 23:28:39 -07:00
Alexander Duyck e1da71ca88 i40e: Drop code for unsupported flow types
We cannot currently support SCTP in the hardware, and IPV4_FLOW is not used
anywhere by the software so we can go through and drop the functionality
related to these two flow types.

In addition we cannot support masking based on the protocol value so if the
user is expecting a value other than TCP or UDP we should simply return an
error rather then trying to allocate a filter for a rule that will only
partially match what the user requested.

Change-ID: I10d52bb97d8104d76255fe244551814ff9531a63
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-28 23:28:39 -07:00
Alexander Duyck ed245406da i40e: Rewrite Flow Director busy wait loop
We can reorder the busy wait loop at the start of the Flow Director
transmit function to reduce the overall code size while still retaining the
same functionality.  As such I am taking advantage of the opportunity to do
so.

Change-ID: I34c403ca001953c6ac9816e65d5305e73d869026
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-10-28 23:28:39 -07:00
Jacob Keller 65e87c0398 i40evf: support queue-specific settings for interrupt moderation
In commit a75e8005d5 ("i40e: queue-specific settings for interrupt
moderation") the i40e driver gained support for setting interrupt
moderation values per queue. This patch adds support for this feature
to the i40evf driver as well. In addition, a few changes are made to
the i40e implementation to add function header documentation comments,
as well.

This behaves in a similar fashion to the implementation in i40e. Thus,
requesting the moderation value when no queue is provided will report
queue 0 value, while setting the value without a queue will set all
queues at once.

Change-ID: I1f310a57c8e6c84a8524c178d44d1b7a6d3a848e
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-24 22:50:23 -07:00
Alexander Duyck 1eb846ac90 i40e: Strip out debugfs hook for Flow Director filter programming
This  interface was only ever meant for debug only. Since it is not
supposed to be here we are removing it.

Change-ID: Id771a1e5e7d3e2b4b7f56591b61fb48c921e1d04
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-24 22:41:47 -07:00
Alexander Duyck 5e02f28373 i40e: Split Flow Director descriptor config into separate function
In an effort to improve code readability I am splitting the Flow Director
filter configuration out into a separate function like we have done for the
standard xmit path.  The general idea is to provide a single block of code
that translates the flow specification into a proper Flow Director
descriptor.

Change-ID: Id355ad8030c4e6c72c57504fa09de60c976a8ffe
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-24 22:40:03 -07:00
Alexander Duyck e486bdfd7c i40e/i40evf: Add txring_txq function to match fm10k and ixgbe
This patch adds a txring_txq function which allows us to convert a
i40e_ring/i40evf_ring to a netdev_tx_queue structure.  This way we
can avoid having to make a multi-line function call for all the spots
that need access to this.

Change-ID: Ic063b71d8b92ea406d2c32e798c8e2b02809d65b
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-24 22:22:17 -07:00
Alexander Duyck 64bfd68eae i40e: Fix Flow Director raw_buf cleanup
The Tx cleanup flow was incorrectly assuming it could check for the flow
director bits after it had unmapped the buffer.  However in this case it
results in us trying to free a raw_buf as though it is an sk_buff.

To fix this I am moving up the flag test for the FD_SB bit so that when
find a non-NULL skb or raw_buf value we then check the flag and use the
appropriate call to free the buffer.

Change-ID: I6284034ba1ea87c9922e56f6eb3181f7f09bddde
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-24 22:15:12 -07:00
Jacob Keller 234dc4e676 i40e: cleanup ATR auto_disable_flags use
Some locations that disable ATR accidentally used the "full" disable by
disabling the flag in the standard flags field. This incorrectly forces
ATR off permanently instead of temporarily disabling it. In addition,
some code locations accidentally set the ATR flag enabled when they only
meant to clear the auto_disable_flags. This results in ignoring the
user's ethtool private flag settings.

Additionally, when disabling ATR via ethtool, we did not perform a flush
of the FD table. This results in the previously assigned ATR rules still
functioning which was not expected.

Cleanup all these areas so that automatic disable uses only the
auto_disable_flag. Fix the flush code so that we can trigger a flush
even when we've disabled ATR and SB support, as otherwise the flush
doesn't work. Fix ethtool setting to actually request a flush. Fix
NETIF_F_NTUPLE flag to only clear the auto_disable setting and not
enable the full feature.

Change-ID: Ib2486111f8031bd16943e9308757b276305c03b5
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-24 19:31:48 -07:00
Alexander Duyck 841493a3f6 i40e: Limit TX descriptor count in cases where frag size is greater than 16K
The i40e driver was incorrectly assuming that we would always be pulling
no more than 1 descriptor from each fragment.  It is in fact possible for
us to end up with the case where 2 descriptors worth of data may be pulled
when a frame is larger than one of the pieces generated when aligning the
payload to either 4K or pieces smaller than 16K.

To adjust for this we just need to make certain to test all the way to the
end of the fragments as it is possible for us to span 2 descriptors in the
block before us so we need to guarantee that even the last 6 descriptors
have enough data to fill a full frame.

Change-ID: Ic2ecb4d6b745f447d334e66c14002152f50e2f99
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-09-22 22:33:41 -07:00
Carolyn Wyborny ffeac83685 i40e: refactor tail_bump check
This patch refactors tail bump check.

Change-ID: Ide0e19171d67d90cb2b06b8dcd4fa791ae120160
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-08-19 21:24:28 -07:00
David S. Miller da54bb13c0 Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Conflicts:
	drivers/net/ethernet/intel/i40e/i40e_main.c

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2016-07-22

This series contains updates to i40e and i40evf.

Heinrich Schuchardt found a possible null pointer being dereferenced in
i40e_debug_aq(), fixed the issue by doing the variable assignment after
we are sure the pointer is not null.

Avinash fixed an issue when link was down, we were not showing the
correct advertised link modes.

Mitch cleans up a useless initializer since the variable is assigned
right away.  Refactors the receive filter handling to properly track
filter adds and deletes so the driver will not lose filters during a
reset and up/down cycles.  Also added a tracking mechanism so that the
driver knows when to enter and leave promiscuous mode.

Catherine removes a device id which is not needed (or used).  Moves
a mutex lock since we need to lock the client list around the
i40e_client_release() call to prevent the release from interrupting
the client instances while they are being added.

Joshua adds Hyper-V specific VF device ids.

Amitoj Kaur Chawla cleans up a redundant memset() call before a memcpy().

Stefan Assmann adds the missing link advertise for some x710 NICs.

Tushar Dave fixes and issue found on SPARC, where a PF reset clears MAC
filters and if a platform-specific MAC address is used, the driver has
to explicitly write default MAC address to MAC filters otherwise all
incoming traffic destined to the default MAC address will be dropped
after reset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-07-25 10:43:07 -07:00
Mitch Williams 88dc9e6fed i40e/i40evf: remove useless initializer
This initializer isn't needed because the variable is assigned right
away.

Change-ID: I6ce3edb3f4e0364db248a7a0bcc62ca95c01d941
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-22 00:07:03 -07:00
Alexander Duyck 858296c878 i40e/i40evf: Fix i40e_rx_checksum
There are a couple of issues I found in i40e_rx_checksum while doing some
recent testing.  As a result I have found the Rx checksum logic is pretty
much broken and returning that the checksum is valid for tunnels in cases
where it is not.

First the inner types are not the correct values to use to test for if a
tunnel is present or not.  In addition the inner protocol types are not a
bitmask as such performing an OR of the values doesn't make sense.  I have
instead changed the code so that the inner protocol types are used to
determine if we report CHECKSUM_UNNECESSARY or not.  For anything that does
not end in UDP, TCP, or SCTP it doesn't make much sense to report a
checksum offload since it won't contain a checksum anyway.

This leaves us with the need to set the csum_level based on some value.
For that purpose I am using the tunnel_type field.  If the tunnel type is
GRENAT or greater then this means we have a GRE or UDP tunnel with an inner
header.  In the case of GRE or UDP we will have a possible checksum present
so for this reason it should be safe to set the csum_level to 1 to indicate
that we are reporting the state of the inner header.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-07-14 23:17:45 -07:00
Alexander Duyck bf2d1df395 intel: Add support for IPv6 IP-in-IP offload
This patch adds support for offloading IPXIP6 type packets that represent
either IPv4 or IPv6 encapsulated inside of an IPv6 outer IP header.  In
addition with this change we should also be able to support FOU
encapsulated traffic with outer IPv6 headers.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 19:25:52 -04:00
Tom Herbert 7e13318daa net: define gso types for IPx over IPv4 and IPv6
This patch defines two new GSO definitions SKB_GSO_IPXIP4 and
SKB_GSO_IPXIP6 along with corresponding NETIF_F_GSO_IPXIP4 and
NETIF_F_GSO_IPXIP6. These are used to described IP in IP
tunnel and what the outer protocol is. The inner protocol
can be deduced from other GSO types (e.g. SKB_GSO_TCPV4 and
SKB_GSO_TCPV6). The GSO types of SKB_GSO_IPIP and SKB_GSO_SIT
are removed (these are both instances of SKB_GSO_IPXIP4).
SKB_GSO_IPXIP6 will be used when support for GSO with IP
encapsulation over IPv6 is added.

Signed-off-by: Tom Herbert <tom@herbertland.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-05-20 18:03:15 -04:00
Mitch Williams a876c3ba59 i40e/i40evf: properly report Rx packet hash
This logic is inverted. If the RXHASH flag is set, then we should go
ahead and call skb_set_hash.

Change-ID: Ib2e30356dced1d3e939c8061ab6ad5bd94197e7c
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-14 00:07:40 -07:00
Jesse Brandeburg 1a557afc4d i40e: Refactor receive routine
This is part 1 of the Rx refactor series, just including
changes to i40e.

This refactor aligns the receive routine with the one in
ixgbe which was highly optimized.  This reduces the code
we have to maintain and allows for (hopefully) more readable
and maintainable RX hot path.

In order to do this:
- consolidate the receive path into a single function that doesn't
  use packet split but *does* use pages for Rx buffers.
- remove the old _1buf routine
- consolidate several routines into helper functions
- remove ethtool control over packet split

Change-ID: I5ca100721de65992aa0114f8b4bac844b84758e0
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 21:53:16 -07:00
Jesse Brandeburg b32bfa1724 i40e: Drop packet split receive routine
As part of preparation for the rx-refactor, remove the
packet split receive routine and ancillary code.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 18:52:06 -07:00
Jesse Brandeburg f8a952cb40 i40e/i40evf: Refactor tunnel interpretation
Refactor the interpretation of a tunnel.  This removes
some code and lets us start using the hardware's parsing.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-05 18:24:10 -07:00
Alexander Duyck 1c7b4a23d1 i40e/i40evf: Add support for GSO partial with UDP_TUNNEL_CSUM and GRE_CSUM
This patch makes it so that i40e and i40evf can use GSO_PARTIAL to support
segmentation for frames with checksums enabled in outer headers.  As a
result we can now send data over these types of tunnels at over 20Gb/s
versus the 12Gb/s that was previously possible on my system.

The advantage with the i40e parts is that this offload is mostly
transparent as the hardware still deals with the inner and/or outer IPv4
headers so the IP ID is still incrementing for both when this offload is
performed.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-05-01 17:05:16 -07:00
Jesse Brandeburg a149f2c323 i40e/i40evf: Only offload VLAN tag if enabled
The driver was offloading the VLAN tag into the skb
any time there was a VLAN tag and the hardware stripping was
enabled.  Just check to make sure it's enabled before put_tag.

Change-Id: Ife95290c06edd9a616393b38679923938b382241
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-27 13:05:47 -07:00
Alexander Duyck 577389a5db i40e/i40evf: Add support for IPIP and SIT offloads
Looking over the documentation it turns out enabling IPIP and SIT offloads
for i40e is pretty straightforward.  As such I decided to enable them with
this patch.  In my testing I am seeing an improvement of 8 to 10 Gb/s
for IPIP and SIT tunnels with this offload enabled.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-26 03:29:49 -07:00
David S. Miller 1602f49b58 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts were two cases of simple overlapping changes,
nothing serious.

In the UDP case, we need to add a hlist_add_tail_rcu()
to linux/rculist.h, because we've moved UDP socket handling
away from using nulls lists.

Signed-off-by: David S. Miller <davem@davemloft.net>
2016-04-23 18:51:33 -04:00
Alexander Duyck 3f3f7cb875 i40e/i40evf: Limit TSO to 7 descriptors for payload instead of 8 per packet
This patch addresses a bug introduced based on my interpretation of the
XL710 datasheet.  Specifically section 8.4.1 states that "A single transmit
packet may span up to 8 buffers (up to 8 data descriptors per packet
including both the header and payload buffers)."  It then later goes on to
say that each segment for a TSO obeys the previous rule, however it then
refers to TSO header and the segment payload buffers.

I believe the actual limit for fragments with TSO and a skbuff that has
payload data in the header portion of the buffer is actually only 7
fragments as the skb->data portion counts as 2 buffers, one for the TSO
header, and one for a segment payload buffer.

Fixes: 2d37490b82 ("i40e/i40evf: Rewrite logic for 8 descriptor per packet check")
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-13 19:59:23 -07:00
Jesse Brandeburg 1f15d66712 i40e/i40evf: Faster RX via avoiding FCoE
As it turns out, calling into other files from hot path hurts
performance a lot.  In this case the majority of the time we
call "check FCoE" and the packet is *not* FCoE, but this call
was taking 5% of our total cycles spent on receive.

Change-ID: I080552c26e7060bc7b78504dc2763f6f0b3d8c76
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06 18:26:23 -07:00
Jesse Brandeburg 84b079928a i40e/i40evf: Drop unused tx_ring argument
Some of the tx_ring arguments can be deleted since they are not used.

Change-ID: I99275b0f191d7f63ec2f05061919904940c36f31
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06 18:15:17 -07:00
Jesse Brandeburg d1bd743b5b i40e/i40evf: Move stack var deeper
A local variable could move down inside the context where it is used.

Change-ID: I9caba9e1eacf921037077f2665cbce83fd8e95d6
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-06 18:10:57 -07:00
Alexander Duyck 24d41e5e2c i40e/i40evf: Fix TSO checksum pseudo-header adjustment
With IPv4 and IPv6 now using the same format for checksums based on the
length of the frame we need to update the i40e and i40evf drivers so that
they correctly account for lengths greater than or equal to 64K.

With this patch the driver should now correctly update checksums for frames
up to 16776960 in length which should be more than large enough for all
possible TSO frames in the near future.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05 20:34:51 -07:00
Jesse Brandeburg 4ea623922d i40e/i40evf: Fix casting in transmit code
Simple cast to fix a sparse warning.

Fixes: commit 5453205cd0 ("i40e/i40evf: Enable support for
SKB_GSO_UDP_TUNNEL_CSUM")

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05 02:05:34 -07:00
Alexander Duyck a619afe814 i40e/i40evf: Add support for bulk free in Tx cleanup
This patch enables bulk Tx clean for skbs.  In order to enable it we need
to pass the napi_budget value as that is used to determine if we are truly
running in NAPI mode or if we are simply calling the routine from netpoll
with a budget of 0.  In order to avoid adding too many more variables I
thought it best to pass the VSI directly in a fashion similar to what we do
on igb and ixgbe with the q_vector.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05 01:58:53 -07:00
Alexander Duyck f2edaaaa39 i40e/i40evf: Fix handling of boolean logic in polling routines
In the polling routines for i40e and i40evf we were using bitwise operators
to avoid the side effects of the logical operators, specifically the fact
that if the first case is true with "||" we skip the second case, or if it
is false with "&&" we skip the second case.  This fixes an earlier patch
that converted the bitwise operators over to the logical operators and
instead replaces the entire thing with just an if statement since it should
be more readable what we are trying to do this way.

Fixes: 1a36d7fadd ("i40e/i40evf: use logical operators, not bitwise")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05 01:54:12 -07:00
Alexander Duyck 5c4654daf2 i40e/i40evf: Allow up to 12K bytes of data per Tx descriptor instead of 8K
From what I can tell the practical limitation on the size of the Tx data
buffer is the fact that the Tx descriptor is limited to 14 bits.  As such
we cannot use 16K as is typically used on the other Intel drivers.  However
artificially limiting ourselves to 8K can be expensive as this means that
we will consume up to 10 descriptors (1 context, 1 for header, and 9 for
payload, non-8K aligned) in a single send.

I propose that we can reduce this by increasing the maximum data for a 4K
aligned block to 12K.  We can reduce the descriptors used for a 32K aligned
block by 1 by increasing the size like this.  In addition we still have the
4K - 1 of space that is still unused.  We can use this as a bit of extra
padding when dealing with data that is not aligned to 4K.

By aligning the descriptors after the first to 4K we can improve the
efficiency of PCIe accesses as we can avoid using byte enables and can fetch
full TLP transactions after the first fetch of the buffer.  This helps to
improve PCIe efficiency.  Below is the results of testing before and after
with this patch:

Recv   Send   Send                         Utilization      Service Demand
Socket Socket Message  Elapsed             Send     Recv    Send    Recv
Size   Size   Size     Time    Throughput  local    remote  local   remote
bytes  bytes  bytes    secs.   10^6bits/s  % S      % U     us/KB   us/KB
Before:
87380  16384  16384    10.00     33682.24  20.27    -1.00   0.592   -1.00
After:
87380  16384  16384    10.00     34204.08  20.54    -1.00   0.590   -1.00

So the net result of this patch is that we have a small gain in throughput
due to a reduction in overhead for putting together the frame.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-04-05 01:28:44 -07:00
Kan Liang a75e8005d5 i40e: queue-specific settings for interrupt moderation
For i40e driver, each vector has its own ITR register. However, there
are no concept of queue-specific settings in the driver proper. Only
global variable is used to store ITR values. That will cause problems
especially when resetting the vector. The specific ITR values could be
lost.
This patch move rx_itr_setting and tx_itr_setting to i40e_ring to store
specific ITR register for each queue.
i40e_get_coalesce and i40e_set_coalesce are also modified accordingly to
support queue-specific settings. To make it compatible with old ethtool,
if user doesn't specify the queue number, i40e_get_coalesce will return
queue 0's value. While i40e_set_coalesce will apply value to all queues.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Acked-by: Shannon Nelson <shannon.nelson@intel.com>
Acked-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2016-02-19 22:54:10 -05:00
Alexander Duyck 3bc67973e8 i40e/i40evf: Move Tx checksum closer to TSO
On all of the other Intel drivers we place checksum close to TSO as they
have a significant amount in common and it can help to reduce the decision
tree for how to handle the frame as the first check in TSO is to see if
checksumming is offloaded, and if it is not we can skip _BOTH_ TSO and Tx
checksum offload based on a single check.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 23:30:19 -08:00
Alexander Duyck 2d37490b82 i40e/i40evf: Rewrite logic for 8 descriptor per packet check
This patch is meant to rewrite the logic for how we determine if we can
transmit the frame or if it needs to be linearized.

The previous code for this function was using a mix of division and modulus
division as a part of computing if we need to take the slow path.  Instead
I have replaced this by simply working with a sliding window which will
tell us if the frame would be capable of causing a single packet to span
several descriptors.

The logic for the scan is fairly simple.  If any given group of 6 fragments
is less than gso_size - 1 then it is possible for us to have one byte
coming out of the first fragment, 6 fragments, and one or more bytes coming
out of the last fragment.  This gives us a total of 8 fragments
which exceeds what we can allow so we send such frames to be linearized.

Arguably the use of modulus might be more exact as the approach I propose
may generate some false positives.  However the likelihood of us taking much
of a hit for those false positives is fairly low, and I would rather not
add more overhead in the case where we are receiving a frame composed of 4K
pages.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 23:27:05 -08:00
Alexander Duyck 4ec441df25 i40e/i40evf: Break up xmit_descriptor_count from maybe_stop_tx
In an upcoming patch I would like to have access to the descriptor count
used for the data portion of the frame.  For this reason I am splitting up
the descriptor count function from the function that stops the ring.

Also in order to try and reduce unnecessary duplication of code I am moving
the slow-path portions of the code out of being inline calls so that we can
just jump to them and process them instead of having to build them into
each function that calls them.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 23:23:51 -08:00
Alexander Duyck ffcc55c0c2 i40e: Add support for ATR w/ IPv6 extension headers
This patch updates the code for determining the L4 protocol and L3 header
length so that when IPv6 extension headers are being used we can determine
the offset and type of the L4 protocol.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 15:57:26 -08:00
Alexander Duyck 6b037cd465 i40e: Fix ATR in relation to tunnels
This patch contains a number of fixes to make certain that we are using
the correct protocols when parsing both the inner and outer headers of a
frame that is mixed between IPv4 and IPv6 for inner and outer.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
2016-02-18 15:01:35 -08:00