linux/drivers
Jakub Kicinski 02a97e02c6 mlx5-updates-2022-10-24

Merge tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2022-10-24

SW steering updates from Yevgeny Kliteynik:

1) First four patches: small fixes / optimizations for SW steering:

 - Patch 1: Don't abort destroy flow if failed to destroy table - continue
   and free everything else.
 - Patches 2 and 3 deal with fast teardown:
    + Skip sync during fast teardown, as PCI device is not there any more.
    + Check device state when polling CQ - otherwise SW steering keeps polling
      the CQ forever, because nobody is there to flush it.
 - Patch 4: Remove an unneeded function argument.
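
The fast-teardown polling fix in patch 3 can be pictured with a minimal
user-space sketch. It is not the driver's code: every type and function name
below (sketch_dev, sketch_cq, sketch_wait_cqe) is hypothetical, and the
max_polls bound is an assumption added so the example always terminates.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the mlx5 driver's types. */
enum dev_state { DEV_UP, DEV_INTERNAL_ERROR };

struct sketch_dev {
	enum dev_state state;
};

struct sketch_cq {
	unsigned int pending;	/* completions waiting to be consumed */
};

enum poll_result { POLL_OK = 0, POLL_DEV_GONE = -1, POLL_TIMEOUT = -2 };

/*
 * Poll the CQ until one completion is consumed. Without the device-state
 * check, an empty CQ on a dead device would be polled until the bound
 * expires (or forever, if there were no bound).
 */
static enum poll_result sketch_wait_cqe(const struct sketch_dev *dev,
					struct sketch_cq *cq,
					unsigned long max_polls)
{
	unsigned long i;

	for (i = 0; i < max_polls; i++) {
		if (cq->pending) {
			cq->pending--;
			return POLL_OK;
		}
		/* Nothing to consume: bail out if nobody can flush the CQ. */
		if (dev->state == DEV_INTERNAL_ERROR)
			return POLL_DEV_GONE;
	}
	return POLL_TIMEOUT;
}
```

The check sits after the attempt to consume a completion, so entries that are
already in the CQ are still drained even when the device has gone away.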

2) Deal with the hiccups that we get during rules insertion/deletion,
which sometimes reach 1/4 of a second. While insertion/deletion rate
improvement was not the focus here, it still is a by-product of removing these
hiccups.

Another by-product is the reduced standard deviation in measuring the duration
of rules insertion/deletion bursts.

In the testing we add K rules (warm-up phase), and then continuously do
insertion/deletion bursts of N rules.
During the test execution, the driver measures hiccups (amount and duration)
and total time for insertion/deletion of a batch of rules.

Here are some numbers, before and after these patches:

+--------------------------------------------+-----------------+----------------+
|                                            |   Create rules  |  Delete rules  |
|                                            +--------+--------+--------+-------+
|                                            | Before |  After | Before | After |
+--------------------------------------------+--------+--------+--------+-------+
| Max hiccup [msec]                          |    253 |     42 |    254 |    68 |
+--------------------------------------------+--------+--------+--------+-------+
| Avg duration of 10K rules add/remove [msec]| 140.07 | 124.32 | 106.99 | 99.51 |
+--------------------------------------------+--------+--------+--------+-------+
| Num of hiccups per 100K rules add/remove   |   7.77 |   7.97 |  12.60 | 11.57 |
+--------------------------------------------+--------+--------+--------+-------+
| Avg hiccup duration [msec]                 |  36.92 |  33.25 |  36.15 | 33.74 |
+--------------------------------------------+--------+--------+--------+-------+
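
To read the table as relative changes, the before/after pairs can be fed
through a small helper. This is only a sketch: the function name
reduction_pct is hypothetical, and the only inputs intended for it are the
values from the table above.

```c
/*
 * Relative reduction in percent between a "before" and an "after" value.
 * A negative result means the metric went up rather than down.
 */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}
```

For example, reduction_pct(253, 42) gives the relative drop of the max hiccup
on rule creation, and reduction_pct(7.77, 7.97) comes out slightly negative
because the number of hiccups per 100K created rules rose a little.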

 - Patch 5: Allocate a short array on the stack instead of dynamically, since it
   is destroyed at the end of the function.
 - Patch 6: Rather than cleaning the corresponding chunk's section of
   ste_arrays on chunk deletion, initialize these areas upon chunk creation.
   Chunk destruction tends to come in large batches (during pool syncing),
   so instead of doing huge memory initialization during pool sync,
   we amortize this by doing small initializations on chunk creation.
 - Patch 7: In order to simplify the error flow and allow cleaner addition
   of new pools, handle creation/destruction of all the domain's memory pools
   and other memory-related fields in separate init/uninit functions.
 - Patch 8: During rehash, write each table row immediately instead of waiting
   for the whole table to be ready and writing it all - saves allocations
   of ste_send_info structures and improves performance.
 - Patch 9: Instead of allocating/freeing send info objects dynamically,
   manage them in a pool. The number of send info objects doesn't depend on
   the number of rules, so after pre-populating the pool with an initial batch
   of send info objects, the pool is not expected to grow.
   This way we save alloc/free during writing STEs to ICM, which by itself can
   sometimes take up to 40 msec.
 - Patch 10: Allocate icm_chunks from their own slab allocator, which lowered
   the alloc/free "hiccups" frequency.
 - Patch 11: Similar to patch 10, allocate htbl from its own slab allocator.
 - Patch 12: Lower sync threshold for ICM hot memory - set the threshold for
   sync to 1/4 of the pool instead of 1/2 of the pool. Although we will have
   more syncs, each sync will be shorter and will help with insertion rate
   stability. Also, notice that the overall number of hiccups wasn't increased,
   thanks to all the other patches.
 - Patch 13: Keep track of hot ICM chunks in an array instead of a list.
   After steering sync, we traverse the hot list and finally free all the
   chunks. It appears that traversing a long list takes an unusually long time
   due to cache misses on many entries, which causes a big "hiccup" during
   rule insertion. This patch replaces the list with a pre-allocated array that
   stores only the bookkeeping information that is needed to later free the
   chunks in their buddy allocator.
 - Patch 14: Remove the unneeded buddy used_list - we don't need to have the
   list of used chunks, we only need the total amount of used memory.
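
The ideas behind patches 12 to 14 can be sketched together in user-space C.
This is an illustration under stated assumptions, not the driver's
implementation: all names (hot_entry, hot_pool, hot_pool_mark_hot,
hot_pool_sync) are hypothetical, the ">= 1/4 of the pool" comparison is one
possible reading of the threshold, and the buddy free is reduced to
decrementing a byte counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping for one hot chunk; not the driver's layout. */
struct hot_entry {
	size_t offset;	/* where the chunk starts inside the pool */
	size_t size;	/* chunk size in bytes */
};

struct hot_pool {
	size_t pool_size;	/* total pool size in bytes */
	size_t hot_bytes;	/* bytes waiting for the next sync */
	size_t used_bytes;	/* a running total, no list of used chunks */
	size_t num_hot;
	size_t max_hot;
	struct hot_entry *hot;	/* pre-allocated, contiguous in memory */
};

static int hot_pool_init(struct hot_pool *p, size_t pool_size, size_t max_hot)
{
	p->hot = calloc(max_hot, sizeof(*p->hot));
	if (!p->hot)
		return -1;
	p->pool_size = pool_size;
	p->hot_bytes = 0;
	p->used_bytes = 0;
	p->num_hot = 0;
	p->max_hot = max_hot;
	return 0;
}

static void hot_pool_destroy(struct hot_pool *p)
{
	free(p->hot);
	p->hot = NULL;
}

/* Record a released chunk as hot. Returns 1 when a sync should run now. */
static int hot_pool_mark_hot(struct hot_pool *p, size_t offset, size_t size)
{
	assert(p->num_hot < p->max_hot);
	p->hot[p->num_hot].offset = offset;
	p->hot[p->num_hot].size = size;
	p->num_hot++;
	p->hot_bytes += size;
	/* Assumed threshold: sync once a quarter of the pool is hot. */
	return p->hot_bytes >= p->pool_size / 4 || p->num_hot == p->max_hot;
}

/* Walk the array linearly and release every hot chunk. */
static size_t hot_pool_sync(struct hot_pool *p)
{
	size_t i, freed = 0;

	for (i = 0; i < p->num_hot; i++) {
		/* A real allocator would hand offset/size back to its buddy. */
		p->used_bytes -= p->hot[i].size;
		freed++;
	}
	p->num_hot = 0;
	p->hot_bytes = 0;
	return freed;
}
```

Because the entries sit next to each other in one allocation, the walk in
hot_pool_sync touches memory sequentially instead of chasing list pointers,
which is the cache behaviour patch 13 is after.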

* tag 'mlx5-updates-2022-10-24' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux:
  net/mlx5: DR, Remove the buddy used_list
  net/mlx5: DR, Keep track of hot ICM chunks in an array instead of list
  net/mlx5: DR, Lower sync threshold for ICM hot memory
  net/mlx5: DR, Allocate htbl from its own slab allocator
  net/mlx5: DR, Allocate icm_chunks from their own slab allocator
  net/mlx5: DR, Manage STE send info objects in pool
  net/mlx5: DR, In rehash write the line in the entry immediately
  net/mlx5: DR, Handle domain memory resources init/uninit separately
  net/mlx5: DR, Initialize chunk's ste_arrays at chunk creation
  net/mlx5: DR, For short chains of STEs, avoid allocating ste_arr dynamically
  net/mlx5: DR, Remove unneeded argument from dr_icm_chunk_destroy
  net/mlx5: DR, Check device state when polling CQ
  net/mlx5: DR, Fix the SMFS sync_steering for fast teardown
  net/mlx5: DR, In destroy flow, free resources even if FW command failed
====================

Link: https://lore.kernel.org/r/20221027145643.6618-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-10-28 22:07:48 -07:00
..
accessibility
acpi platform-drivers-x86 for v6.1-2 2022-10-25 12:05:08 -07:00
amba
android Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
ata ata: ahci_qoriq: Fix compilation warning 2022-10-18 08:02:14 +09:00
atm
auxdisplay
base Interrupt subsystem updates: 2022-10-12 10:23:24 -07:00
bcma wireless-next patches for v6.2 2022-10-28 18:31:40 -07:00
block block-6.1-2022-10-20 2022-10-21 15:14:14 -07:00
bluetooth
bus
cdrom
char This push fixes an issue exposed by the recent change to feed 2022-10-17 10:20:04 -07:00
clk This is the final part of the clk patches for this merge window. 2022-10-16 11:08:19 -07:00
clocksource A boring time, timekeeping, timers update: 2022-10-10 10:16:00 -07:00
comedi
connector
counter
cpufreq cpufreq: sun50i: Switch to use dev_err_probe() helper 2022-10-18 16:22:26 +05:30
cpuidle RISC-V Patches for the 6.1 Merge Window, Part 1 2022-10-09 13:24:01 -07:00
crypto This update includes the following changes: 2022-10-10 13:04:25 -07:00
cxl
dax libnvdimm for 6.1 2022-10-14 18:41:41 -07:00
dca
devfreq
dio
dma treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
dma-buf
edac Merge patch series "Use composable cache instead of L2 cache" 2022-10-13 11:07:13 -07:00
eisa
extcon
firewire
firmware efi: runtime: Don't assume virtual mappings are missing if VA == PA == 0 2022-10-21 11:09:41 +02:00
fpga
fsi
gnss
gpio Interrupt subsystem updates: 2022-10-12 10:23:24 -07:00
gpu 17 hotfixes, mainly for MM. 5 are cc:stable and the remainder address 2022-10-21 12:33:03 -07:00
greybus
hid for-linus-2022102101 2022-10-21 17:41:57 -07:00
hsi
hte
hv hyperv-next for 6.1 2022-10-10 13:59:01 -07:00
hwmon - Use the correct CPU capability clearing function on the error path in 2022-10-23 10:01:34 -07:00
hwspinlock
hwtracing
i2c i2c: mlxbf: depend on ACPI; clean away ifdeffage 2022-10-21 07:59:35 +02:00
i3c i3c: master: Remove the wrong place of reattach. 2022-10-12 23:45:29 +02:00
idle
iio
infiniband treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
input Input updates for 6.1 merge window: 2022-10-11 10:53:25 -07:00
interconnect
iommu iommu/vt-d: Clean up si_domain in the init_dmars() error path 2022-10-21 10:49:35 +02:00
ipack
irqchip Interrupt subsystem updates: 2022-10-12 10:23:24 -07:00
isdn mISDN: hfcpci: Fix use-after-free bug in hfcpci_softirq 2022-10-09 19:11:54 +01:00
leds leds: simatic-ipc-leds-gpio: fix incorrect LED to GPIO mapping 2022-10-24 11:32:10 +02:00
macintosh powerpc updates for 6.1 2022-10-09 14:05:15 -07:00
mailbox
mcb
md dm clone: Fix typo in block_device format specifier 2022-10-18 17:17:48 -04:00
media media: vivid: set num_in/outputs to 0 if not supported 2022-10-25 16:43:34 +01:00
memory
memstick
message
mfd Revert "mfd: syscon: Remove repetition of the regmap_get_val_endian()" 2022-10-23 12:04:56 -07:00
misc treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
mmc Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
most
mtd Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
mux
net mlx5-updates-2022-10-24 2022-10-28 22:07:48 -07:00
nfc nfc: s3fwrn5: use devm_clk_get_optional_enabled() helper 2022-10-28 11:27:59 +01:00
ntb
nubus
nvdimm libnvdimm for 6.1 2022-10-14 18:41:41 -07:00
nvme block-6.1-2022-10-20 2022-10-21 15:14:14 -07:00
nvmem
of Devicetree updates for v6.1: 2022-10-10 13:13:51 -07:00
opp
parisc parisc architecture fixes and updates for kernel v6.1-rc1: 2022-10-14 12:10:01 -07:00
parport
pci Revert "PCI: tegra: Use PCI_CONF1_EXT_ADDRESS() macro" 2022-10-17 12:11:09 -05:00
pcmcia
peci
perf arm64 fixes: 2022-10-14 12:38:03 -07:00
phy pci-v6.1-changes 2022-10-11 11:08:18 -07:00
pinctrl pinctrl: ocelot: Fix incorrect trigger of the interrupt. 2022-10-18 10:42:10 +02:00
platform platform/x86/intel: pmc/core: Add Raptor Lake support to pmc core driver 2022-10-24 11:39:27 +02:00
pnp Merge branches 'acpi-apei', 'acpi-wakeup', 'acpi-reboot' and 'acpi-thermal' 2022-10-10 18:11:11 +02:00
power
powercap Scheduler changes for v6.1: 2022-10-10 09:10:28 -07:00
pps
ps3
ptp ptp: ocp: remove flash image header check fallback 2022-10-24 13:10:40 +01:00
pwm
rapidio
ras
regulator
remoteproc
reset
rpmsg
rtc RTC for 6.1 2022-10-14 18:36:42 -07:00
s390 s390 updates for the 6.1 merge window #2 2022-10-14 11:36:05 -07:00
sbus
scsi scsi: mpt3sas: re-do lost mpt3sas DMA mask fix 2022-10-25 00:33:16 -07:00
sh
siox
slimbus
soc Merge patch series "Use composable cache instead of L2 cache" 2022-10-13 11:07:13 -07:00
soundwire
spi spi: Fixes for v6.1 2022-10-26 17:38:46 -07:00
spmi
ssb
staging wireless-next patches for v6.2 2022-10-28 18:31:40 -07:00
target treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
tc
tee - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
thermal thermal: intel_powerclamp: Use first online CPU as control_cpu 2022-10-15 19:33:57 +02:00
thunderbolt treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
tty parisc architecture fixes and updates for kernel v6.1-rc1: 2022-10-14 12:10:01 -07:00
ufs
uio
usb - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in 2022-10-10 17:53:04 -07:00
vdpa virtio: fixes, features 2022-10-10 14:02:53 -07:00
vfio VFIO updates for v6.1-rc1 2022-10-12 14:46:48 -07:00
vhost virtio: fixes, features 2022-10-10 14:02:53 -07:00
video Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
virt
virtio virtio_pci: use irq to detect interrupt support 2022-10-13 09:33:03 -04:00
vlynq
w1
watchdog linux-watchdog 6.1-rc2 tag 2022-10-21 12:25:39 -07:00
xen xen: branch for v6.1-rc2 2022-10-21 14:43:09 -07:00
zorro
Kconfig
Makefile