linux/tools/perf/bench
Dirk Gouders 99476fa085 perf bench sched pipe: fix enforced blocking reads in worker_thread
The function worker_thread() is written in a way that roughly
doubles the expected number of context switches, because it enforces
blocking reads:

 Performance counter stats for 'perf bench sched pipe':

         2,000,004      context-switches

      11.859548321 seconds time elapsed

       0.674871000 seconds user
       8.076890000 seconds sys

The result of this behavior is that the blocking reads by far dominate
the performance analysis of 'perf bench sched pipe':

Samples: 78K of event 'cycles:P', Event count (approx.): 27964965844
Overhead  Command     Shared Object         Symbol
  25.28%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   8.11%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   2.82%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

It is unclear from the code whether that behavior is intended, but the
log says that Ingo Molnar, at least, aimed to mimic lmbench's lat_ctx,
which does not handle the pipe ends that way
(https://sourceforge.net/p/lmbench/code/HEAD/tree/trunk/lmbench2/src/lat_ctx.c)

Fix worker_thread() by always first feeding the write ends of the pipes
and then trying to read.
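
The write-first ordering described above can be illustrated with a
minimal, standalone sketch. This is not the code from sched-pipe.c: the
names run_pingpong() and pingpong_side() are hypothetical, and the sketch
assumes two processes connected by two pipes, each exchanging one int per
iteration. Because both sides write before they read, a message is often
already waiting when a side reaches read(), so the read does not always
have to block.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: one side of the ping-pong.
 * It always feeds the write end first and only then tries to read.
 * Returns the number of completed iterations, or -1 on a short or
 * failed read/write.
 */
static int pingpong_side(int rfd, int wfd, int loops)
{
	int m = 0;

	for (int i = 0; i < loops; i++) {
		if (write(wfd, &m, sizeof(m)) != (ssize_t)sizeof(m))
			return -1;
		if (read(rfd, &m, sizeof(m)) != (ssize_t)sizeof(m))
			return -1;
	}
	return loops;
}

/*
 * Hypothetical driver: fork a child and exchange 'loops' messages in
 * each direction. Returns the parent's completed iterations, or -1 if
 * setup fails or either side reports an error.
 */
int run_pingpong(int loops)
{
	int a[2], b[2];	/* a: parent -> child, b: child -> parent */
	int status, done;
	pid_t pid;

	assert(loops >= 0);

	if (pipe(a))
		return -1;
	if (pipe(b)) {
		close(a[0]);
		close(a[1]);
		return -1;
	}

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: reads from pipe a, writes to pipe b. */
		close(a[1]);
		close(b[0]);
		_exit(pingpong_side(a[0], b[1], loops) == loops ? 0 : 1);
	}

	/* Parent: reads from pipe b, writes to pipe a. */
	close(a[0]);
	close(b[1]);
	done = pingpong_side(b[0], a[1], loops);
	close(a[1]);
	close(b[0]);

	if (waitpid(pid, &status, 0) != pid)
		return -1;
	if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return done;
}
```

Calling run_pingpong(1000000) from a small main() and measuring it with
perf stat is one way to compare this ordering against a variant in which
one side reads before it writes; the sketch itself makes no claim about
the resulting numbers.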

This roughly halves the context switches and runtime of a plain
'perf bench sched pipe' run:

 Performance counter stats for 'perf bench sched pipe':

         1,005,770      context-switches

       6.033448041 seconds time elapsed

       0.423142000 seconds user
       4.519829000 seconds sys

The blocking reads also no longer dominate the analysis to the extreme
shown above:

Samples: 40K of event 'cycles:P', Event count (approx.): 14309364879
Overhead  Command     Shared Object         Symbol
  12.20%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   9.23%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   3.68%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

Signed-off-by: Dirk Gouders <dirk@gouders.net>
Acked-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/20250323140316.19027-2-dirk@gouders.net
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
2025-03-23 23:20:37 -07:00
Build perf bench: Make bench its own library 2024-06-26 11:07:28 -07:00
bench.h perf bench uprobe: Add uretprobe variant of uprobe benchmarks 2024-04-12 17:54:02 -03:00
breakpoint.c perf bench breakpoint: Skip run if no breakpoints available 2023-08-23 08:39:02 -03:00
epoll-ctl.c tools/perf: Fix perf bench epoll to enable the run when some CPU's are offline 2024-06-13 21:27:26 -07:00
epoll-wait.c perf bench: Fix undefined behavior in cmpworker() 2025-01-18 10:14:36 -08:00
evlist-open-close.c perf evlist: Rename cpus to user_requested_cpus 2022-04-01 16:19:35 -03:00
find-bit-bench.c perf bench: Avoid NDEBUG warning 2023-04-04 09:39:56 -03:00
futex-hash.c tools/perf: Fix perf bench futex to enable the run when some CPU's are offline 2024-06-13 21:26:58 -07:00
futex-lock-pi.c tools/perf: Fix perf bench futex to enable the run when some CPU's are offline 2024-06-13 21:26:58 -07:00
futex-requeue.c tools/perf: Fix perf bench futex to enable the run when some CPU's are offline 2024-06-13 21:26:58 -07:00
futex-wake-parallel.c tools/perf: Fix timing issue with parallel threads in perf bench wake-up-parallel 2024-06-13 21:27:49 -07:00
futex-wake.c tools/perf: Fix perf bench futex to enable the run when some CPU's are offline 2024-06-13 21:26:58 -07:00
futex.h Revert "perf bench futex: Add support for 32-bit systems with 64-bit time_t" 2021-11-01 11:42:54 -03:00
inject-buildid.c perf bench: Remove reference to cmd_inject 2024-12-18 16:24:33 -03:00
kallsyms-parse.c
mem-functions.c
mem-memcpy-arch.h
mem-memcpy-x86-64-asm-def.h tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench' 2023-05-17 10:42:19 -03:00
mem-memcpy-x86-64-asm.S tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench' 2023-05-17 10:42:19 -03:00
mem-memset-arch.h
mem-memset-x86-64-asm-def.h tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench' 2023-05-17 10:42:19 -03:00
mem-memset-x86-64-asm.S tools headers: Update the copy of x86's mem{cpy,set}_64.S used in 'perf bench' 2023-05-17 10:42:19 -03:00
numa.c perf header: Move is_cpu_online to numa bench 2024-11-16 16:36:47 -03:00
pmu-scan.c perf pmu: Abstract alias/event struct 2023-08-24 10:42:46 -03:00
sched-messaging.c perf bench messaging: Kill child processes when exit abnormally in process mode 2023-09-26 21:47:12 -07:00
sched-pipe.c perf bench sched pipe: fix enforced blocking reads in worker_thread 2025-03-23 23:20:37 -07:00
sched-seccomp-notify.c perf bench sched-seccomp-notify: Fix spelling mistake "synchronious" -> "synchronous" 2023-12-05 15:48:52 -03:00
synthesize.c perf tool: Constify tool pointers 2024-08-12 18:05:14 -03:00
syscall.c perf bench: Fix perf bench syscall loop count 2025-03-05 09:19:23 -08:00
uprobe.c perf bench uprobe: Add uretprobe variant of uprobe benchmarks 2024-04-12 17:54:02 -03:00