linux/lib
Bitao Hu d7037381d0 watchdog/softlockup: Low-overhead detection of interrupt storm
The following softlockup is caused by an interrupt storm, but this cannot
be identified from the call tree, because the call tree is just a snapshot
and doesn't fully capture the behavior of the CPU during the soft lockup.
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  ...
  Call trace:
    __do_softirq+0xa0/0x37c
    __irq_exit_rcu+0x108/0x140
    irq_exit+0x14/0x20
    __handle_domain_irq+0x84/0xe0
    gic_handle_irq+0x80/0x108
    el0_irq_naked+0x50/0x58

Therefore, it is necessary to report CPU utilization during the
softlockup_threshold period (one report every sample_period, for a total
of 5 reports), like this:
  watchdog: BUG: soft lockup - CPU#28 stuck for 23s! [fio:83921]
  CPU#28 Utilization every 4s during lockup:
    #1: 0% system, 0% softirq, 100% hardirq, 0% idle
    #2: 0% system, 0% softirq, 100% hardirq, 0% idle
    #3: 0% system, 0% softirq, 100% hardirq, 0% idle
    #4: 0% system, 0% softirq, 100% hardirq, 0% idle
    #5: 0% system, 0% softirq, 100% hardirq, 0% idle
  ...
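
As a rough illustration of how one line of the report above could be derived, the following user-space sketch turns cumulative per-category CPU time, sampled at the start and end of one sample period, into percentages. It is not the kernel implementation: the enum, the function names, and the use of nanosecond counters are assumptions made for this example only.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical category indices; the names are illustrative only. */
enum util_kind { UTIL_SYSTEM, UTIL_SOFTIRQ, UTIL_HARDIRQ, UTIL_IDLE, UTIL_KINDS };

/*
 * Convert cumulative per-category CPU time (in nanoseconds, an assumption)
 * sampled at the start and end of one sample period into whole-number
 * percentages of that period.
 */
static void util_percent(const uint64_t prev[UTIL_KINDS],
                         const uint64_t cur[UTIL_KINDS],
                         uint64_t period_ns,
                         unsigned int pct[UTIL_KINDS])
{
    for (int i = 0; i < UTIL_KINDS; i++) {
        uint64_t delta = cur[i] - prev[i];

        /* Guard against a zero-length period to avoid dividing by zero. */
        pct[i] = period_ns ? (unsigned int)(delta * 100 / period_ns) : 0;
    }
}

/* Format one report line in the style shown in the commit message. */
static int format_util_line(char *buf, size_t len, int index,
                            const unsigned int pct[UTIL_KINDS])
{
    return snprintf(buf, len,
                    "#%d: %u%% system, %u%% softirq, %u%% hardirq, %u%% idle",
                    index, pct[UTIL_SYSTEM], pct[UTIL_SOFTIRQ],
                    pct[UTIL_HARDIRQ], pct[UTIL_IDLE]);
}
```

Calling util_percent() once per sample period and keeping the last five results would give the five lines of the report; a CPU that spent the whole 4s period in hardirq context yields 100% hardirq and 0% elsewhere.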

This is helpful in determining whether an interrupt storm has occurred or
in identifying the cause of the softlockup. The criteria for determination
are as follows:

  a. If the hardirq utilization is high, then an interrupt storm should be
     considered and the root cause cannot be determined from the call tree.
  b. If the softirq utilization is high, then the call tree might not
     necessarily point at the root cause.
  c. If the system utilization is high, then analyzing the root
     cause from the call tree is possible in most cases.
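
The three criteria above can be sketched as a small triage helper that a reader of the report might apply to each sample. The commit message does not define what counts as "high", so the 50% threshold, the enum, and the function name below are assumptions for illustration, not part of the kernel change.

```c
/* Hypothetical threshold for "high" utilization; not defined by the source. */
#define HIGH_UTIL_PCT 50u

/* Illustrative outcomes matching criteria a, b and c. */
enum lockup_hint {
    HINT_INTERRUPT_STORM,   /* a: hardirq high, call tree won't show the cause */
    HINT_SOFTIRQ_HEAVY,     /* b: softirq high, call tree may be misleading */
    HINT_CALL_TREE_USABLE,  /* c: system high, call tree usually sufficient */
    HINT_INCONCLUSIVE       /* none of the categories dominates */
};

/* Classify one utilization sample; hardirq is checked first, as in the list. */
static enum lockup_hint classify_sample(unsigned int system_pct,
                                        unsigned int softirq_pct,
                                        unsigned int hardirq_pct)
{
    if (hardirq_pct >= HIGH_UTIL_PCT)
        return HINT_INTERRUPT_STORM;
    if (softirq_pct >= HIGH_UTIL_PCT)
        return HINT_SOFTIRQ_HEAVY;
    if (system_pct >= HIGH_UTIL_PCT)
        return HINT_CALL_TREE_USABLE;
    return HINT_INCONCLUSIVE;
}
```

For the sample report in this message (0% system, 0% softirq, 100% hardirq), the helper returns HINT_INTERRUPT_STORM.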

The mechanism requires a considerable amount of global storage space
when configured for the maximum number of CPUs. Therefore, add a
SOFTLOCKUP_DETECTOR_INTR_STORM Kconfig knob that defaults to "yes"
if the max number of CPUs is <= 128.

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Liu Song <liusong@linux.alibaba.com>
Link: https://lore.kernel.org/r/20240411074134.30922-5-yaoma@linux.alibaba.com
2024-04-12 17:08:05 +02:00
..
crypto crypto: lib/mpi - Fix unexpected pointer access in mpi_ec_init 2023-12-22 12:30:19 +08:00
dim linux/dim: Do nothing if no time delta between samples 2023-05-09 11:06:45 +02:00
fonts fbdev fixes and cleanups for 6.9-rc1: 2024-03-22 10:09:08 -07:00
kunit kunit: test: Log the correct filter string in executor_test 2024-02-27 15:25:50 -07:00
lz4
lzo
math mul_u64_u64_div_u64: increase precision by conditionally swapping a and b 2024-03-12 13:09:22 -07:00
pldmfw lib: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:53 -07:00
raid6 - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min 2024-03-14 18:03:09 -07:00
reed_solomon treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_fortify string: Remove strlcpy() 2024-01-19 11:59:11 -08:00
vdso vdso: Improve cmd_vdso_check to check all dynamic relocations 2023-03-21 21:15:34 +01:00
xz arch: Remove Itanium (IA-64) architecture 2023-09-11 08:13:17 +00:00
zlib_deflate lib/zlib: DFLTCC deflate does not write all available bits for Z_NO_FLUSH 2023-02-27 17:00:14 -08:00
zlib_dfltcc lib/zlib: remove redundation assignement of avail_in dfltcc_gdht() 2023-02-02 22:50:10 -08:00
zlib_inflate lib/zlib: Split deflate and inflate states for DFLTCC 2023-02-02 22:50:09 -08:00
zstd zstd: Fix array-index-out-of-bounds UBSAN warning 2023-11-14 17:12:52 -08:00
.gitignore
Kconfig PCI: Move pci_iomap.c to drivers/pci/ 2024-02-12 10:35:40 -06:00
Kconfig.debug watchdog/softlockup: Low-overhead detection of interrupt storm 2024-04-12 17:08:05 +02:00
Kconfig.kasan treewide: update LLVM Bugzilla links 2024-02-22 15:38:51 -08:00
Kconfig.kcsan Kernel concurrency sanitizer (KCSAN) updates for v6.3 2023-02-25 13:02:20 -08:00
Kconfig.kfence mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
Kconfig.kgdb vt: remove superfluous CONFIG_HW_CONSOLE 2024-01-27 19:03:51 -08:00
Kconfig.kmsan mm/slab: remove CONFIG_SLAB from all Kconfig and Makefile 2023-12-05 11:14:40 +01:00
Kconfig.ubsan ubsan: Disable signed integer overflow sanitizer on GCC < 8 2024-03-18 11:24:14 -07:00
Makefile pci-v6.9-changes 2024-03-14 10:58:27 -07:00
argv_split.c argv_split: fix kernel-doc warnings 2023-09-19 13:21:33 -07:00
ashldi3.c
ashrdi3.c
asn1_decoder.c
asn1_encoder.c
assoc_array.c assoc_array: fix the return value in assoc_array_insert_mid_shortcut() 2024-03-12 13:09:23 -07:00
atomic64.c
atomic64_test.c
audit.c
base64.c
bcd.c
bch.c lib/bch.c: use bitrev instead of internal logic 2023-08-18 10:18:58 -07:00
bitfield_kunit.c
bitmap-str.c lib/bitmap: split-out string-related operations to a separate files 2023-10-14 20:25:22 -07:00
bitmap.c cpumask: add cpumask_weight_andnot() 2024-02-01 13:06:40 +01:00
bitrev.c
bootconfig-data.S
bootconfig.c
bsearch.c
btree.c btree: remove MODULE_LICENSE in non-modules 2023-04-13 13:13:54 -07:00
bucket_locks.c
bug.c cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG 2023-01-31 15:01:45 +01:00
build_OID_registry
buildid.c - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min 2024-03-14 18:03:09 -07:00
bust_spinlocks.c
check_signature.c
checksum.c
checksum_kunit.c kunit: Fix again checksum tests on big endian CPUs 2024-02-29 09:16:02 -08:00
closure.c closures: CLOSURE_CALLBACK() to fix type punning 2023-11-24 00:29:58 -05:00
clz_ctz.c lib/clz_ctz.c: Fix __clzdi2() and __ctzdi2() for 32-bit kernels 2023-08-25 13:22:10 -07:00
clz_tab.c
cmdline.c
cmdline_kunit.c lib/cmdline: Fix an invalid format specifier in an assertion msg 2024-02-27 15:25:56 -07:00
cmpdi2.c
compat_audit.c
cpu_rmap.c lib: cpu_rmap: Fix potential use-after-free in irq_cpu_rmap_release() 2023-06-07 21:25:00 -07:00
cpumask.c bitmap patches for v6.7 2023-11-03 07:08:36 -10:00
cpumask_kunit.c cpumask: re-introduce constant-sized cpumask optimizations 2023-03-05 14:30:34 -08:00
crc-ccitt.c lib: crc_ccitt_false() is identical to crc_itu_t() 2023-12-29 12:22:26 -08:00
crc-itu-t.c
crc-t10dif.c
crc4.c
crc7.c
crc8.c
crc16.c
crc32.c
crc32defs.h
crc32test.c
crc64-rocksoft.c
crc64.c
ctype.c
debug_info.c
debug_locks.c
debugobjects.c debugobjects: Stop accessing objects after releasing hash bucket lock 2023-11-22 10:41:46 +01:00
dec_and_lock.c perf: Fix perf_event_pmu_context serialization 2023-01-31 20:37:18 +01:00
decompress.c
decompress_bunzip2.c
decompress_inflate.c decompressor: provide missing prototypes 2023-06-09 17:44:17 -07:00
decompress_unlz4.c
decompress_unlzma.c
decompress_unlzo.c
decompress_unxz.c arch: Remove Itanium (IA-64) architecture 2023-09-11 08:13:17 +00:00
decompress_unzstd.c decompressor: provide missing prototypes 2023-06-09 17:44:17 -07:00
devmem_is_allowed.c lib: devmem_is_allowed: include linux/io.h 2023-06-09 17:44:15 -07:00
devres.c PCI: Move PCI-specific devres code to drivers/pci/ 2024-02-12 10:36:17 -06:00
dhry.h lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_1.c lib: dhry: use ktime_ms_delta() helper 2024-02-22 15:38:52 -08:00
dhry_2.c lib: add Dhrystone benchmark test 2023-02-02 22:50:01 -08:00
dhry_run.c lib: dhry: remove unneeded <linux/mutex.h> 2024-02-22 15:38:52 -08:00
digsig.c
dump_stack.c dump_stack: Do not get cpu_sync for panic CPU 2024-02-07 17:23:19 +01:00
dynamic_debug.c dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace() 2024-03-06 13:07:39 -08:00
dynamic_queue_limits.c net: dqs: add NIC stall detector based on BQL 2024-03-08 10:23:26 +00:00
earlycpio.c
errname.c parisc: Drop the HP-UX ENOSYM and EREMOTERELEASE error codes 2023-11-25 09:43:18 +01:00
error-inject.c lib: error-inject: remove error checking for debugfs_create_dir() 2023-08-18 10:18:55 -07:00
errseq.c
extable.c
fault-inject-usercopy.c
fault-inject.c fault-inject: allow configuration via configfs 2023-04-13 07:38:54 -06:00
fdt.c
fdt_addresses.c
fdt_empty_tree.c
fdt_ro.c
fdt_rw.c
fdt_strerror.c
fdt_sw.c
fdt_wip.c
find_bit.c cpumask: introduce for_each_cpu_or 2023-03-19 10:02:04 -07:00
find_bit_benchmark.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
flex_proportions.c flex_proportions: remove unused fprop_local_single 2024-02-22 15:38:52 -08:00
fortify_kunit.c fortify: Improve buffer overflow reporting 2024-02-29 13:38:02 -08:00
fw_table.c lib/firmware_table: Provide buffer length argument to cdat_table_parse() 2024-03-13 00:03:21 -07:00
gen_crc32table.c
gen_crc64table.c
genalloc.c Devicetree include cleanups for v6.6: 2023-08-30 17:04:28 -07:00
generic-radix-tree.c lib/generic-radix-tree.c: Make nodes more reasonably sized 2024-03-13 21:22:26 -04:00
glob.c
globtest.c
group_cpus.c lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly 2023-12-06 16:12:46 -08:00
hashtable_test.c lib/hashtable_test.c: add test for the hashtable structure 2023-02-08 14:28:17 -07:00
hexdump.c
hweight.c
idr.c ida: Fix crash in ida_free when the bitmap is empty 2023-12-21 10:02:28 -08:00
inflate.c
interval_tree.c interval-tree: Add a utility to iterate over spans in an interval tree 2022-11-29 16:34:15 -04:00
interval_tree_test.c
iomap.c
iomap_copy.c
iommu-helper.c
iov_iter.c vfs-6.9.misc 2024-03-11 09:38:17 -07:00
irq_poll.c
irq_regs.c
is_signed_type_kunit.c lib: assume char is unsigned 2022-11-19 00:56:15 +01:00
is_single_threaded.c
kasprintf.c
kfifo.c
klist.c
kobject.c Revert "kobject: Remove redundant checks for whether ktype is NULL" 2024-02-08 16:39:25 +00:00
kobject_uevent.c kobject: reduce uevent_sock_mutex scope 2024-02-17 16:20:41 +01:00
kstrtox.c kstrtox: consistently use _tolower() 2023-08-21 13:46:25 -07:00
kstrtox.h
kunit_iov_iter.c iov_iter: Kunit tests for page extraction 2023-09-09 15:11:49 -07:00
libcrc32c.c libcrc32c: remove crc32c_impl 2023-04-17 18:01:23 +02:00
linear_ranges.c
list-test.c list: test: Test the klist structure 2023-03-31 09:21:35 -06:00
list_debug.c list: Introduce CONFIG_LIST_HARDENED 2023-08-15 14:57:25 -07:00
list_sort.c
llist.c llist: add llist_del_first_this() 2023-10-16 12:44:06 -04:00
locking-selftest-hardirq.h
locking-selftest-mutex.h
locking-selftest-rlock-hardirq.h
locking-selftest-rlock-softirq.h
locking-selftest-rlock.h
locking-selftest-rsem.h
locking-selftest-rtmutex.h
locking-selftest-softirq.h
locking-selftest-spin-hardirq.h
locking-selftest-spin-softirq.h
locking-selftest-spin.h
locking-selftest-wlock-hardirq.h
locking-selftest-wlock-softirq.h
locking-selftest-wlock.h
locking-selftest-wsem.h
locking-selftest.c lockdep/selftests: Use SBRM APIs for wait context tests 2023-07-26 12:29:13 +02:00
lockref.c lockref: stop doing cpu_relax in the cmpxchg loop 2023-01-13 14:35:38 -06:00
logic_iomem.c
logic_pio.c minmax: add in_range() macro 2023-08-24 16:20:18 -07:00
lru_cache.c lru_cache: remove unused lc_private, lc_set, lc_index_of 2022-11-22 19:38:39 -07:00
lshrdi3.c
lwq.c lib: add light-weight queuing mechanism. 2023-10-16 12:44:06 -04:00
maple_tree.c - Sumanth Korikkar has taught s390 to allocate hotplug-time page frames 2024-03-14 17:43:30 -07:00
memcat_p.c
memcpy_kunit.c Revert "kunit: memcpy: Split slow memcpy tests into MEMCPY_SLOW_KUNIT_TEST" 2024-03-18 11:24:15 -07:00
memory-notifier-error-inject.c
memregion.c
memweight.c
muldi3.c
net_utils.c mac_pton: Clean up the header inclusions 2023-06-06 13:18:32 +02:00
netdev-notifier-error-inject.c
nlattr.c netlink: add nla be16/32 types to minlen array 2024-02-22 19:01:55 -08:00
nmi_backtrace.c nmi_backtrace: allow excluding an arbitrary CPU 2023-08-18 10:19:00 -07:00
notifier-error-inject.c lib: remove error checking for debugfs_create_dir() 2023-08-18 10:18:55 -07:00
notifier-error-inject.h
objagg.c
objpool.c lib: objpool: fix head overrun on RK3588 SBC 2023-12-01 14:53:55 +09:00
of-reconfig-notifier-error-inject.c
oid_registry.c lib/oid_registry.c: remove redundant assignment to variable num 2022-11-18 13:55:06 -08:00
once.c
overflow_kunit.c overflow: Change DEFINE_FLEX to take __counted_by member 2024-03-22 16:25:31 -07:00
packing.c lib: packing: remove MODULE_LICENSE in non-modules 2023-03-09 23:08:04 -08:00
parman.c
parser.c lib: parser: update documentation for match_NUMBER functions 2023-03-02 21:54:22 -08:00
percpu-refcount.c percpu-refcount: Use call_rcu_hurry() for atomic switch 2022-11-30 13:16:40 -08:00
percpu_counter.c percpu_counter: extend _limited_add() to negative amounts 2023-10-18 14:34:14 -07:00
percpu_test.c
plist.c
pm-notifier-error-inject.c
polynomial.c
radix-tree.c radix tree: remove unused variable 2023-08-21 13:07:22 -07:00
radix-tree.h radix-tree: move declarations to header 2023-06-12 11:31:50 -07:00
random32.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
ratelimit.c
rbtree.c lib/rbtree: use '+' instead of '|' for setting color. 2023-04-18 16:39:33 -07:00
rbtree_test.c
rcuref.c locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath() 2023-10-10 10:14:27 +02:00
ref_tracker.c lib/ref_tracker: remove warnings in case of allocation failure 2023-06-05 15:28:42 -07:00
refcount.c
rhashtable.c rhashtable: Allow rhashtable to be used from irq-safe contexts 2022-12-09 10:42:56 +00:00
sbitmap.c sbitmap: remove stale comment in sbq_calc_wake_batch 2024-01-15 07:23:50 -07:00
scatterlist.c scatterlist: add missing function params to kernel-doc 2023-09-19 13:21:33 -07:00
seq_buf.c seq_buf: Fix kernel documentation 2024-02-15 12:17:28 -05:00
sg_pool.c
sg_split.c
siphash.c
siphash_kunit.c siphash: Convert selftest to KUnit 2022-11-01 10:04:52 -07:00
slub_kunit.c linux-kselftest-kunit-next-6.2-rc1 2022-12-12 16:42:57 -08:00
smp_processor_id.c
sort.c lib/sort: optimize heapsort with double-pop variation 2024-02-22 15:38:52 -08:00
stackdepot.c lib/stackdepot: off by one in depot_fetch_stack() 2024-03-04 17:01:17 -08:00
stackinit_kunit.c - Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min 2024-03-14 18:03:09 -07:00
stmp_device.c
strcat_kunit.c string: Add Kunit tests for strcat() family 2023-05-16 14:08:02 -07:00
string.c string: Allow 2-argument strscpy() 2024-02-20 20:47:32 -08:00
string_helpers.c lib/string_helpers: Add flags param to string_get_size() 2024-02-29 22:34:42 -08:00
string_helpers_kunit.c string: Convert helpers selftest to KUnit 2024-03-05 01:55:28 -08:00
string_kunit.c string: Convert selftest to KUnit 2024-03-05 01:55:28 -08:00
strncpy_from_user.c
strnlen_user.c
strscpy_kunit.c fortify: Short-circuit known-safe calls to strscpy() 2022-11-01 10:04:52 -07:00
syscall.c
test-kstrtox.c
test_bitmap.c lib/bitmap: Introduce bitmap_scatter() and bitmap_gather() helpers 2024-03-11 09:36:11 +00:00
test_bitops.c
test_bits.c
test_blackhole_dev.c net: blackhole_dev: fix build warning for ethh set but not used 2024-02-05 12:30:54 +00:00
test_bpf.c test_bpf: Rename second ALU64_SMOD_X to ALU64_SMOD_K 2023-12-09 21:27:54 -08:00
test_debug_virtual.c
test_dynamic_debug.c
test_firmware.c firmware_loader: Expand Firmware upload error codes with firmware invalid error 2023-11-24 18:09:19 -08:00
test_fprobe.c fprobe: Pass return address to the handlers 2023-06-06 21:39:55 +09:00
test_fpu.c
test_free_pages.c
test_hash.c
test_hexdump.c treewide: use get_random_u32_inclusive() when possible 2022-11-18 02:18:02 +01:00
test_hmm.c lib: replace kmap() with kmap_local_page() 2023-08-18 10:18:50 -07:00
test_hmm_uapi.h hmm-tests: add test for migrate_device_range() 2022-10-12 18:51:50 -07:00
test_ida.c Quite a lot of kexec work this time around. Many singleton patches in 2024-01-09 11:46:20 -08:00
test_kmod.c lib/test_kmod: fix kernel-doc warnings 2024-02-02 10:21:26 -08:00
test_kprobes.c test_kprobes: Add recursed kprobe test case 2023-02-21 08:52:42 +09:00
test_linear_ranges.c lib/test_linear_ranges: Use LINEAR_RANGE() 2022-11-16 13:32:32 +00:00
test_list_sort.c treewide: use get_random_u32_below() instead of deprecated function 2022-11-18 02:15:15 +01:00
test_lockup.c
test_maple_tree.c test_maple_tree: testing the cyclic allocation 2024-02-21 09:34:26 +01:00
test_memcat_p.c
test_meminit.c mm, treewide: introduce NR_PAGE_ORDERS 2024-01-08 15:27:15 -08:00
test_min_heap.c treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
test_module.c
test_objagg.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
test_objpool.c lib: test_objpool: make global variables static 2023-11-10 19:59:04 +09:00
test_parman.c
test_printf.c lib/vsprintf: declare no_hash_pointers in sprintf.h 2023-08-21 13:46:24 -07:00
test_ref_tracker.c lib/ref_tracker: improve printing stats 2023-06-05 15:28:42 -07:00
test_rhashtable.c Kill sched.h dependency on rcupdate.h 2023-12-27 11:50:20 -05:00
test_scanf.c lib: test_scanf: Add explicit type cast to result initialization in test_number_prefix() 2023-08-16 11:47:29 +02:00
test_sort.c
test_static_key_base.c
test_static_keys.c
test_sysctl.c sysctl: Remove the now superfluous sentinel elements from ctl_table array 2023-12-28 04:57:57 -08:00
test_ubsan.c ubsan: Reintroduce signed overflow sanitizer 2024-02-20 20:44:49 -08:00
test_user_copy.c
test_uuid.c
test_vmalloc.c lib/test_vmalloc.c: use unsigned long constant 2024-03-04 17:01:22 -08:00
test_xarray.c XArray: add cmpxchg order test 2024-02-22 10:24:48 -08:00
textsearch.c
timerqueue.c
trace_readwrite.c lib/trace_readwrite.c:: replace asm-generic/io with linux/io 2023-12-29 12:22:29 -08:00
ts_bm.c lib/ts_bm: add helper to reduce indentation and improve readability 2023-07-27 13:45:51 +02:00
ts_fsm.c
ts_kmp.c
ubsan.c ubsan: Reintroduce signed overflow sanitizer 2024-02-20 20:44:49 -08:00
ubsan.h ubsan: Reintroduce signed overflow sanitizer 2024-02-20 20:44:49 -08:00
ucmpdi2.c
ucs2_string.c lib/ucs2_string: Add UCS-2 strscpy function 2023-09-13 10:18:42 -07:00
usercopy.c uaccess: Add speculation barrier to copy_from_user() 2023-02-21 14:45:22 -08:00
uuid.c treewide: use get_random_bytes() when possible 2022-10-11 17:42:58 -06:00
vsprintf.c lib/vsprintf: Fix %pfwf when current node refcount == 0 2023-12-06 11:06:59 +01:00
win_minmax.c lib/win_minmax: use /* notation for regular comments 2023-01-11 16:14:21 -08:00
xarray.c xarray: Document necessary flag in alloc functions 2023-09-05 19:01:38 -04:00
xxhash.c