Commit Graph

203 Commits

Author SHA1 Message Date
Linus Torvalds 5779de8d36 rtla updaets for v6.19:
- Officially add Tomas Glozar as a maintainer to RTLA tool
 
 - Add for_each_monitored_cpu() helper
 
   In multiple places, RTLA tools iterate over the list of CPUs running
   tracer threads.
 
   Use single helper instead of repeating the for/if combination.
 
 - Remove unused variable option_index in argument parsing
 
   RTLA tools use getopt_long() for argument parsing. For its last
   argument, an unused variable "option_index" is passed.
 
   Remove the variable and pass NULL to getopt_long() to shorten
   the naturally long parsing functions, and make them more readable.
 
 - Fix unassigned nr_cpus after code consolidation
 
   In recent code consolidation, timerlat tool cleanup, previously
   implemented separately for each tool, was moved to a common function
   timerlat_free().
 
   The cleanup relies on nr_cpus being set. This was not done in the new
   function, leaving the variable uninitialized.
 
   Initialize the variable properly, and remove silencing of compiler
   warning for uninitialized variables.
 
 - Stop tracing on user latency in BPF mode
 
   Despite the name, rtla-timerlat's -T/--thread option sets timerlat's
   stop_tracing_total_us option, which also stops tracing on
   return-from-user latency, not only on thread latency.
 
   Implement the same behavior also in BPF sample collection stop tracing
   handler to avoid a discrepancy and restore correspondence of behavior
   with the equivalent option of cyclictest.
 
 - Fix threshold actions always triggering
 
   A bug in threshold action logic caused the action to execute even
   if tracing did not stop because of threshold.
 
   Fix the logic to stop correctly.
 
 - Fix few minor issues in tests
 
   Extend tests that were shown to need it to 5s, fix osnoise test
   calling timerlat by mistake, and use new, more reliable output
   checking in timerlat's "top stop at failed action" test.
 
 - Do not print usage on argument parsing error
 
   RTLA prints the entire usage message on encountering errors in
   argument parsing, like a malformed CPU list.
 
   The usage message has gotten too long. Instead of printing it,
   use newly added fatal() helper function to simply exit with
   the error message, excluding the usage.
 
 - Fix unintuitive -C/--cgroup interface
 
   "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that
   being a common way to specify an option with argument. Moreover,
   using them fails silently and no cgroup is set.
 
   Create new helper function to unify the handling of all such options
   and allow all of:
 
   -Xsomething
   -X=something
   -X something
 
   as well as the equivalent for the long option.
 
 - Fix -a overriding -t argument filename
 
   Fix a bug where -a following -t custom_file.txt overrides the custom
   filename with the default timerlat_trace.txt.
 
 - Stop tracing correctly on multiple events at once
 
   In some race scenarios, RTLA BPF sample collection might send multiple
   stop tracing events via the BPF ringbuffer at once.
 
   Compare the number of events for != 0 instead of == 1 to cover for
   this scenario and stop tracing properly.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaS9bxBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhrgAP0a/AtsL9+IFXAK5JK8aO1XWApVyK9n
 48FRZWu/jrupuAD7BO+EHazmPEourNaUqYPeuymwxT+4O47RH1Q/aasLQwo=
 =RvNH
 -----END PGP SIGNATURE-----

Merge tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull rtla trace tooling updates from Steven Rostedt:

 - Officially add Tomas Glozar as a maintainer to RTLA tool

 - Add for_each_monitored_cpu() helper

   In multiple places, RTLA tools iterate over the list of CPUs running
   tracer threads.

   Use single helper instead of repeating the for/if combination.

 - Remove unused variable option_index in argument parsing

   RTLA tools use getopt_long() for argument parsing. For its last
   argument, an unused variable "option_index" is passed.

   Remove the variable and pass NULL to getopt_long() to shorten the
   naturally long parsing functions, and make them more readable.

 - Fix unassigned nr_cpus after code consolidation

   In recent code consolidation, timerlat tool cleanup, previously
   implemented separately for each tool, was moved to a common function
   timerlat_free().

   The cleanup relies on nr_cpus being set. This was not done in the new
   function, leaving the variable uninitialized.

   Initialize the variable properly, and remove silencing of compiler
   warning for uninitialized variables.

 - Stop tracing on user latency in BPF mode

   Despite the name, rtla-timerlat's -T/--thread option sets timerlat's
   stop_tracing_total_us option, which also stops tracing on
   return-from-user latency, not only on thread latency.

   Implement the same behavior also in BPF sample collection stop
   tracing handler to avoid a discrepancy and restore correspondence of
   behavior with the equivalent option of cyclictest.

 - Fix threshold actions always triggering

   A bug in threshold action logic caused the action to execute even if
   tracing did not stop because of threshold.

   Fix the logic to stop correctly.

 - Fix few minor issues in tests

   Extend tests that were shown to need it to 5s, fix osnoise test
   calling timerlat by mistake, and use new, more reliable output
   checking in timerlat's "top stop at failed action" test.

 - Do not print usage on argument parsing error

   RTLA prints the entire usage message on encountering errors in
   argument parsing, like a malformed CPU list.

   The usage message has gotten too long. Instead of printing it, use
   newly added fatal() helper function to simply exit with the error
   message, excluding the usage.

 - Fix unintuitive -C/--cgroup interface

   "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that
   being a common way to specify an option with argument. Moreover,
   using them fails silently and no cgroup is set.

   Create new helper function to unify the handling of all such options
   and allow all of:

     -Xsomething
     -X=something
     -X something

   as well as the equivalent for the long option.

 - Fix -a overriding -t argument filename

   Fix a bug where -a following -t custom_file.txt overrides the custom
   filename with the default timerlat_trace.txt.

 - Stop tracing correctly on multiple events at once

   In some race scenarios, RTLA BPF sample collection might send
   multiple stop tracing events via the BPF ringbuffer at once.

   Compare the number of events for != 0 instead of == 1 to cover for
   this scenario and stop tracing properly.

* tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla/timerlat: Exit top main loop on any non-zero wait_retval
  rtla/tests: Don't rely on matching ^1ALL
  rtla: Fix -a overriding -t argument
  rtla: Fix -C/--cgroup interface
  tools/rtla: Replace osnoise_hist_usage("...") with fatal("...")
  tools/rtla: Replace osnoise_top_usage("...") with fatal("...")
  tools/rtla: Replace timerlat_hist_usage("...") with fatal("...")
  tools/rtla: Replace timerlat_top_usage("...") with fatal("...")
  tools/rtla: Add fatal() and replace error handling pattern
  rtla/tests: Fix osnoise test calling timerlat
  rtla/tests: Extend action tests to 5s
  tools/rtla: Fix --on-threshold always triggering
  rtla/timerlat_bpf: Stop tracing on user latency
  tools/rtla: Fix unassigned nr_cpus
  tools/rtla: Remove unused optional option_index
  tools/rtla: Add for_each_monitored_cpu() helper
  MAINTAINERS: Add Tomas Glozar as a maintainer to RTLA tool
2025-12-05 09:34:01 -08:00
Crystal Wood 3138df6f0c rtla/timerlat: Exit top main loop on any non-zero wait_retval
Comparing to exactly 1 will fail if more than one ring buffer
event was seen since the last call to timerlat_bpf_wait(), which
can happen in some race scenarios.

Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-5-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Crystal Wood 61f1fd5d69 rtla/tests: Don't rely on matching ^1ALL
The timerlat "top stop at failed action" test was relying on "ALL" being
printed immediately after the "1" from the threshold action.  Besides being
fragile, this depends on stdbuf behavior, which is easy to miss when
recreating the test outside of the framework for debugging purposes.

Instead, use the expected/unexpected text mechanism from the
corresponding osnoise test.

Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-2-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Ivan Pravdin ddb6e42494 rtla: Fix -a overriding -t argument
When running rtla as

    `rtla <timerlat|osnoise> <top|hist> -t custom_file.txt -a 100`

-a options override trace output filename specified by -t option.
Running the command above will create <timerlat|osnoise>_trace.txt file
instead of custom_file.txt. Fix this by making sure that -a option does
not override trace output filename even if it's passed after trace
output filename is specified.

Fixes: 173a3b0148 ("rtla/timerlat: Add the automatic trace option")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/b6ae60424050b2c1c8709e18759adead6012b971.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Ivan Pravdin 7b71f3a698 rtla: Fix -C/--cgroup interface
Currently, user can only specify cgroup to the tracer's thread the
following ways:

    `-C[cgroup]`
    `-C[=cgroup]`
    `--cgroup[=cgroup]`

If user tries to specify cgroup as `-C [cgroup]` or `--cgroup [cgroup]`,
the parser silently fails and rtla's cgroup is used for the tracer
threads.

To make interface more user-friendly, allow user to specify cgroup in
the aforementioned way, i.e. `-C [cgroup]` and `--cgroup [cgroup]`.

Refactor identical logic between -t/--trace and -C/--cgroup into a
common function.

Change documentation to reflect this user interface change.

Fixes: a957cbc025 ("rtla: Add -C cgroup support")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 49c1579419 tools/rtla: Replace osnoise_hist_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace osnoise_hist_usage("...") with fatal("...") on errors.

Remove the already unused 'usage' argument from osnoise_hist_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-6-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 92b5b55e5e tools/rtla: Replace osnoise_top_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace osnoise_top_usage("...") with fatal("...") on errors.

Remove the already unused 'usage' argument from osnoise_top_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-5-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 8f4264e046 tools/rtla: Replace timerlat_hist_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace timerlat_hist_usage("...\n") with fatal("...") on errors.

Remove the already unused 'usage' argument from timerlat_hist_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-4-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 4e5e7210f9 tools/rtla: Replace timerlat_top_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace timerlat_top_usage("...\n") with fatal("...") on errors.

Remove the already unused 'usage' argument from timerlat_top_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-3-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 8cbb25db81 tools/rtla: Add fatal() and replace error handling pattern
The code contains some technical debt in error handling,
which complicates the consolidation of duplicated code.

Introduce an fatal() function to replace the common pattern of
err_msg() followed by exit(EXIT_FAILURE), reducing the length of an
already long function.

Further patches using fatal() follow.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Tomas Glozar 34c170ae5c rtla/tests: Fix osnoise test calling timerlat
osnoise test "top stop at failed action" is calling timerlat instead of
osnoise by mistake.

Fix it so that it calls the correct RTLA subcommand.

Fixes: 05b7e10687 ("tools/rtla: Add remaining support for osnoise actions")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:00 +01:00
Tomas Glozar d649e9f04c rtla/tests: Extend action tests to 5s
In non-BPF mode, it takes up to 1 second for RTLA to notice that tracing
has been stopped. That means that action tests cannot have a 1 second
duration, as the SIGALRM will be racing with the threshold overflow.

Previously, non-BPF mode actions were buggy and always executed
the action, even when stopping on duration or SIGINT, preventing
this issue from manifesting. Now that this has been fixed, the tests
have become flaky, and this has to be adjusted.

Fixes: 4e26f84abf ("rtla/tests: Add tests for actions")
Fixes: 05b7e10687 ("tools/rtla: Add remaining support for osnoise actions")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:29:01 +01:00
Tomas Glozar 417bd0d502 tools/rtla: Fix --on-threshold always triggering
Commit 8d933d5c89 ("rtla/timerlat: Add continue action") moved the
code performing on-threshold actions (enabled through --on-threshold
option) to inside the RTLA main loop.

The condition in the loop does not check whether the threshold was
actually exceeded or if stop tracing was requested by the user through
SIGINT or duration. This leads to a bug where on-threshold actions are
always performed, even when the threshold was not hit.

(BPF mode is not affected, since it uses a different condition in the
while loop.)

Add a condition that checks for !stop_tracing before executing the
actions. Also, fix incorrect brackets in hist_main_loop to match the
semantics of top_main_loop.

Fixes: 8d933d5c89 ("rtla/timerlat: Add continue action")
Fixes: 2f3172f9dd ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Reviewed-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:55 +01:00
Tomas Glozar e4240db933 rtla/timerlat_bpf: Stop tracing on user latency
rtla-timerlat allows a *thread* latency threshold to be set via the
-T/--thread option. However, the timerlat tracer calls this *total*
latency (stop_tracing_total_us), and stops tracing also when the
return-to-user latency is over the threshold.

Change the behavior of the timerlat BPF program to reflect what the
timerlat tracer is doing, to avoid discrepancy between stopping
collecting data in the BPF program and stopping tracing in the timerlat
tracer.

Cc: stable@vger.kernel.org
Fixes: e34293ddce ("rtla/timerlat: Add BPF skeleton to collect samples")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251006143100.137255-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:55 +01:00
Costa Shulyupin b4275b2301 tools/rtla: Fix unassigned nr_cpus
In recently introduced timerlat_free(),
the variable 'nr_cpus' is not assigned.

Assign it with sysconf(_SC_NPROCESSORS_CONF) as done elsewhere.
Remove the culprit: -Wno-maybe-uninitialized. The rest of the
code is clean.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Fixes: 2f3172f9dd ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Link: https://lore.kernel.org/r/20251002170846.437888-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:54 +01:00
Costa Shulyupin 671314fce1 tools/rtla: Remove unused optional option_index
The longindex argument of getopt_long() is optional
and tied to the unused local variable option_index.

Remove it to shorten the four longest functions
and make the code neater.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:53 +01:00
Costa Shulyupin 04fa6bf373 tools/rtla: Add for_each_monitored_cpu() helper
The rtla tools have many instances of iterating over CPUs while
checking if they are monitored.

Add a for_each_monitored_cpu() helper macro to make the code
more readable and reduce code duplication.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:53 +01:00
Zhang Chujun 53afec2c8f tracing/tools: Fix incorrcet short option in usage text for --threads
The help message incorrectly listed '-t' as the short option for
--threads, but the actual getopt_long configuration uses '-e'.
This mismatch can confuse users and lead to incorrect command-line
usage. This patch updates the usage string to correctly show:
	"-e, --threads NRTHR"
to match the implementation.

Note: checkpatch.pl reports a false-positive spelling warning on
'Run', which is intentional.

Link: https://patch.msgid.link/20251106031040.1869-1-zhangchujun@cmss.chinamobile.com
Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-07 07:59:37 -05:00
Linus Torvalds d9f24f8e60 rtla: Updates for v6.18
- This update is mostly just consolidating code between osnoise/timerlat
   and top/hist for easier maintenance and less future divergence.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaN/guhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qqU9AQCO+u+Qmx678DCfDJo9X1UPDtS/bM5f
 r30X1pwYfZ3nNAEA47hbkVFcryFJZbrIPxuTGb0GSM36PHAxmch4QAwBqgs=
 =qzZh
 -----END PGP SIGNATURE-----

Merge tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tools updates from Steven Rostedt

 - This is mostly just consolidating code between osnoise/timerlat and
   top/hist for easier maintenance and less future divergence

* tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tools/rtla: Add remaining support for osnoise actions
  tools/rtla: Add test engine support for unexpected output
  tools/rtla: Fix -A option name in test comment
  tools/rtla: Consolidate code between osnoise/timerlat and hist/top
  tools/rtla: Create common_apply_config()
  tools/rtla: Move top/hist params into common struct
  tools/rtla: Consolidate common parameters into shared structure
2025-10-05 09:38:26 -07:00
Wander Lairson Costa 2227f273b7 rtla/actions: Fix condition for buffer reallocation
The condition to check if the actions buffer needs to be resized was
incorrect. The check `self->size >= self->len` would evaluate to
true on almost every call to `actions_new()`, causing the buffer to
be reallocated unnecessarily each time an action was added.

Fix the condition to `self->len >= self.size`, ensuring
that the buffer is only resized when it is actually full.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com
Fixes: 6ea082b171 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Wander Lairson Costa <wander@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 06:01:20 -04:00
Ivan Pravdin b1e0ff7209 rtla: Fix buffer overflow in actions_parse
Currently, tests 3 and 13-22 in tests/timerlat.t fail with error:

    *** buffer overflow detected ***: terminated
    timeout: the monitored command dumped core

The result of running `sudo make check` is

    tests/timerlat.t (Wstat: 0 Tests: 22 Failed: 11)
      Failed tests:  3, 13-22
    Files=3, Tests=34, 140 wallclock secs ( 0.07 usr  0.01 sys + 27.63 cusr
    27.96 csys = 55.67 CPU)
    Result: FAIL

Fix buffer overflow in actions_parse to avoid this error. After this
change, the tests results are

    tests/hwnoise.t ... ok
    tests/osnoise.t ... ok
    tests/timerlat.t .. ok
    All tests successful.
    Files=3, Tests=34, 186 wallclock secs ( 0.06 usr  0.01 sys + 41.10 cusr
    44.38 csys = 85.55 CPU)
    Result: PASS

Link: https://lore.kernel.org/164ffc2ec8edacaf1295789dad82a07817b6263d.1757034919.git.ipravdin.official@gmail.com
Fixes: 6ea082b171 ("rtla/timerlat: Add action on threshold feature")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 05:57:29 -04:00
Crystal Wood 05b7e10687 tools/rtla: Add remaining support for osnoise actions
The basic functionality came with the consolidation; now hook up the
command line options, and add documentation and tests.

Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-8-crwood@redhat.com
Reviewed-by: Tomas Glozar  <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:53:48 -04:00
Crystal Wood 3cd6b18d10 tools/rtla: Add test engine support for unexpected output
Add a check() parameter to indicate which text must not appear in the
output.

Simplify the code so that we can print failures as they happen rather
than trying to figure out what went wrong after printing "not ok".  This
also means that "not ok" gets printed after the info rather than before,
which seems more intuitive anyway.

Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-7-crwood@redhat.com
Reviewed-by: Tomas Glozar  <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:53:21 -04:00
Crystal Wood c4e30c22ba tools/rtla: Fix -A option name in test comment
This was changed to --on-threshold when the patches were applied.

Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-6-crwood@redhat.com
Reviewed-by: Tomas Glozar  <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:53:07 -04:00
Crystal Wood 2f3172f9dd tools/rtla: Consolidate code between osnoise/timerlat and hist/top
Currently a lot of code is duplicated between the different rtla tools,
making maintenance more difficult, and encouraging divergence such as
features that are only implemented for certain tools even though they
could be more broadly applicable.

Merge the various main() functions into a common run_tool() with an ops
struct for tool-specific details.

Implement enough support for actions on osnoise to not need to keep the
old params->trace_output path.

Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-5-crwood@redhat.com
Reviewed-by: Tomas Glozar  <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:52:57 -04:00
Crystal Wood 263d7eacf8 tools/rtla: Create common_apply_config()
Merge the common bits of osnoise_apply_config() and
timerlat_apply_config().  Put the result in a new common.c, and move
enough things to common.h so that common.c does not need to include
osnoise.h.

Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-4-crwood@redhat.com
Reviewed-by: Tomas Glozar  <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:52:46 -04:00
Crystal Wood 5742bf62e6 tools/rtla: Move top/hist params into common struct
The hist members were very similar between timerlat and top, so
just use one common hist struct.

output_divisor, quiet, and pretty printing are pretty generic
concepts that can go in the main struct even if not every
specific tool (currently) uses them.

Cc: John Kacur <jkacur@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-3-crwood@redhat.com
Reviewed-by: Tomas Glozar  <tglozar@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:52:17 -04:00
Costa Shulyupin 344823886e tools/rtla: Consolidate common parameters into shared structure
timerlat_params and osnoise_params structures contain 15 identical
fields.

Introduce a new header common.h and define a common_params structure to
consolidate shared fields, reduce code duplication, and enhance
maintainability.

Cc: John Kacur <jkacur@redhat.com>
Link: https://lore.kernel.org/20250907022325.243930-2-crwood@redhat.com
Reviewed-by: Tomas Glozar  <tglozar@redhat.com>
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Signed-off-by: Crystal Wood <crwood@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-09-27 04:51:18 -04:00
Tao Chen 7b128f1d53 rtla: Check pkg-config install
The tool pkg-config used to check libtraceevent and libtracefs, if not
installed, it will report the libs not found, even though they have
already been installed.

Before:
libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel
libtracefs is missing. Please install libtracefs-dev/libtracefs-devel

After:
Makefile.config:10: *** Error: pkg-config needed by libtraceevent/libtracefs is missing
on this system, please install it.

Link: https://lore.kernel.org/20250808040527.2036023-2-chen.dylane@linux.dev
Fixes: 01474dc706 ("tools/rtla: Use tools/build makefiles to build rtla")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-19 20:32:54 -04:00
Tao Chen 26ebba25e2 tools/latency-collector: Check pkg-config install
The tool pkg-config used to check libtraceevent and libtracefs, if not
installed, it will report the libs not found, even though they have
already been installed.

Before:
libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel
libtracefs is missing. Please install libtracefs-dev/libtracefs-devel

After:
Makefile.config:10: *** Error: pkg-config needed by libtraceevent/libtracefs is missing
on this system, please install it.

Link: https://lore.kernel.org/20250808040527.2036023-1-chen.dylane@linux.dev
Fixes: 9d56c88e52 ("tools/tracing: Use tools/build makefiles on latency-collector")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-08-19 20:32:54 -04:00
Tomas Glozar a80db1f857 rtla/tests: Test timerlat -P option using actions
The -P option is used to set priority of osnoise and timerlat threads.

Extend the test for -P with --on-threshold calling a script that looks
for running timerlat threads and checks if their priority is set
correctly.

As --on-threshold is only supported by timerlat at the moment, this is
only implemented there so far.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250725133817.59237-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28 10:22:39 -04:00
Tomas Glozar 892ae5f806 rtla/tests: Add grep checks for base test cases
Checking for patterns in rtla output with grep was added to test rtla
actions. Add grep checks also for base tests where applicable.

Also fix trace event histogram trigger check to use the correct syntax
for the command-line option so that the test passes with the grep check.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Link: https://lore.kernel.org/20250725133817.59237-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-28 10:22:38 -04:00
Tomas Glozar 04f837165b rtla/tests: Limit duration to maximum of 10s
Many of the original rtla tests included durations of 1 minute and 30
seconds. Experience has shown this is unnecessary, since 10 seconds as
waiting time for samples to appear.

Change duration of all rtla tests to at most 10 seconds. This speeds up
testing significantly.

Before:
$ make check
All tests successful.
Files=3, Tests=54, 536 wallclock secs
( 0.03 usr  0.00 sys + 20.31 cusr 22.02 csys = 42.36 CPU)
Result: PASS

After:
$ make check
...
All tests successful.
Files=3, Tests=54, 196 wallclock secs
( 0.03 usr  0.01 sys + 20.28 cusr 20.68 csys = 41.00 CPU)
Result: PASS

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-9-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:57 -04:00
Tomas Glozar 4e26f84abf rtla/tests: Add tests for actions
Add a bunch of tests covering most of both --on-threshold and --on-end.
Parts sensitive to implementation of hist/top are tested for both.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-8-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:57 -04:00
Tomas Glozar 916a9c5b03 rtla/tests: Check rtla output with grep
Add argument to the check command in the test suite that takes a regular
expression that the output of rtla command is checked against. This
allows testing for specific information in rtla output in addition
to checking the return value.

Two minor improvements are included: running rtla with "eval" so that
arguments with spaces can be passed to it via shell quotations, and
the stdout of pushd and popd is suppressed to clean up the test output.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-7-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:57 -04:00
Tomas Glozar 3aadb65db5 rtla/timerlat: Add action on end feature
Implement actions on end next to actions on threshold. A new option,
--on-end is added, parallel to --on-threshold. Instead of being
executed whenever a latency threshold is reached, it is executed at the
end of the measurement.

For example:

$ rtla timerlat hist -d 5s --on-end trace

will save the trace output at the end.

All actions supported by --on-threshold are also supported by --on-end,
except for continue, which does nothing with --on-end.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-6-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:57 -04:00
Tomas Glozar 8d933d5c89 rtla/timerlat: Add continue action
Introduce option to resume tracing after a latency threshold overflow.
The option is implemented as an action named "continue".

Example:
$ rtla timerlat top -q -T 200 -d 1s --on-threshold \
exec,command="echo Threshold" --on-threshold continue
Threshold
Threshold
Threshold
                                     Timer Latency
...

The feature is supported for both hist and top. After the continue
action is executed, processing of the list of actions is stopped and
tracing is resumed.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-5-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:57 -04:00
Tomas Glozar 3b78670e3a rtla/timerlat_bpf: Allow resuming tracing
Currently, rtla-timerlat BPF program uses a global variable stored in a
.bss section to store whether tracing has been stopped.

Move the information to a separate map, so that it is easily writable
from userspace, and add a function that clears the value, resuming
tracing after it has been stopped.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-4-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:56 -04:00
Tomas Glozar 6ea082b171 rtla/timerlat: Add action on threshold feature
Extend the functionality provided by the -t/--trace option, which
triggers saving the contents of a tracefs buffer after tracing is
stopped, to support implementing arbitrary actions.

A new option, --on-threshold, is added, taking an argument
that further specifies the action. Actions added in this patch are:

- trace[,file=<filename>]: Saves tracefs buffer, optionally taking a
filename.
- signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might
be specified instead of number to send signal to parent process.
- shell,command=<command>: Execute shell command.

Multiple actions may be specified and will be executed in order,
including multiple actions of the same type. Trace output requested via
-t and -a now adds a trace action to the end of the list.

If an action fails, the following actions are not executed. For
example, this command:

$ rtla timerlat -T 20 --on-threshold trace \
--on-threshold shell,command="grep ipi_send timerlat_trace.txt" \
--on-threshold signal,num=2,pid=parent

will send signal 2 (SIGINT) to parent process, but only if saved trace
contains the text "ipi_send".

This way, the feature can be used for flexible reactions on latency
spikes, and allows combining rtla with other tooling like perf.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:56 -04:00
Tomas Glozar 8b6cbcac76 rtla/timerlat: Introduce enum timerlat_tracing_mode
After the introduction of BPF-based sample collection, rtla-timerlat
effectively runs in one of three modes:

- Pure BPF mode, with tracefs only being used to set up the timerlat
tracer. Sample processing and stop on threshold are handled by BPF.

- tracefs mode. BPF is unsupported or kernel is lacking the necessary
trace event (osnoise:timerlat_sample). Stop on theshold is handled by
timerlat tracer stopping tracing in all instances.

- BPF/tracefs mixed mode - BPF is used for sample collection for top or
histogram, tracefs is used for trace output and/or auto-analysis. Stop
on threshold is handled both through BPF program, which stops sample
collection for top/histogram and wakes up rtla, and by timerlat
tracer, which stops tracing for trace output/auto-analysis instances.

Add enum timerlat_tracing_mode, with three values:

- TRACING_MODE_BPF
- TRACING_MODE_TRACEFS
- TRACING_MODE_MIXED

Those represent the modes described above. A field of this type is added
to struct timerlat_params, named "mode", replacing the no_bpf variable.
params->mode is set in timerlat_{top,hist}_parse_args to
TRACING_MODE_BPF or TRACING_MODE_MIXED based on whether trace output
and/or auto-analysis is requested. timerlat_{top,hist}_main then checks
if BPF is not unavailable or disabled, in that case, it sets
params->mode to TRACING_MODE_TRACEFS.

A condition is added to timerlat_apply_config that skips setting
timerlat tracer thresholds if params->mode is TRACING_MODE_BPF (those
are unnecessary, since they only turn off tracing, which is already
turned off in that case, since BPF is used to collect samples).

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Chang Yin <cyin@redhat.com>
Cc: Costa Shulyupin <costa.shul@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/20250626123405.1496931-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-25 16:43:56 -04:00
Linus Torvalds 472c5f736b tracing tools updates for v6.16:
- Set distinctive value for failed tests
 
   When running "make check" that performs tests on rtla the failure is
   checked by examining the output. Instead have the tool return an error
   status if it exceeds the threadhold.
 
 - Define __NR_sched_setattr for LoongArch
 
   Define __NR_sched_setattr to allow this to build for LoongArch.
 
 - Define _GNU_SOURCE for timerlat_bpf.c
 
   Due to modifications of struct sched_attr in utils.h when _GNU_SOURCE is
   not defined, this can cause errors for timerlat_bpf_init() and breakage in
   BPF sample collection mode.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDeTzBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qokRAP0XJzos+uvQtkGrqiX5SB/rn1s3/tiD
 nZagARyiV06BAwEA+NNzqFyx/BLUwMnpx/HFTnIMGXbRVWCVAEeL3t77zgk=
 =dx7t
 -----END PGP SIGNATURE-----

Merge tag 'trace-tools-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tools updates from Steven Rostedt:

 - Set distinctive value for failed tests

   When running "make check" that performs tests on rtla the failure is
   checked by examining the output. Instead have the tool return an
   error status if it exceeds the threadhold.

 - Define __NR_sched_setattr for LoongArch

   Define __NR_sched_setattr to allow this to build for LoongArch.

 - Define _GNU_SOURCE for timerlat_bpf.c

   Due to modifications of struct sched_attr in utils.h when _GNU_SOURCE
   is not defined, this can cause errors for timerlat_bpf_init() and
   breakage in BPF sample collection mode.

* tag 'trace-tools-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla: Define _GNU_SOURCE in timerlat_bpf.c
  rtla: Define __NR_sched_setattr for LoongArch
  rtla: Set distinctive exit value for failed tests
2025-05-29 20:59:52 -07:00
Tomas Glozar 8020361d51 rtla: Define _GNU_SOURCE in timerlat_bpf.c
Newer versions of glibc include a definition of struct sched_attr in
bits/sched.h (included through sched.h which is included by rtla).
Commit 0eecee3406 ("tools/rtla: fix collision with glibc
sched_attr/sched_set_attr") has modified the definition of struct
sched_attr in utils.h, so that it is only applied with older versions of
glibc that do not define it, in order to prevent build failure.

The definition in bits/sched.h depends on _GNU_SOURCE.
timerlat_bpf.c does not define _GNU_SOURCE, making it fall back to the
definition in utils.h. The latter has two fields less, leading to
shifted offsets of struct timerlat_params in timerlat_bpf_init.

Because of the shift, timerlat_bpf_init incorrectly reads
params->entries as 0 for timerlat-hist and disables the creation of
histogram maps, causing breakage in BPF sample collection mode:

$ rtla timerlat hist -d 1s
Error pulling BPF data

Fix the issue by also defining _GNU_SOURCE in timerlat_bpf.c.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Link: https://lore.kernel.org/20250430144651.621766-1-tglozar@redhat.com
Fixes: e34293ddce ("rtla/timerlat: Add BPF skeleton to collect samples")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-07 17:25:46 -04:00
Tiezhu Yang 6a38c51a25 rtla: Define __NR_sched_setattr for LoongArch
When executing "make -C tools/tracing/rtla" on LoongArch, there exists
the following error:

  src/utils.c:237:24: error: '__NR_sched_setattr' undeclared

Just define __NR_sched_setattr for LoongArch if not exist.

Link: https://lore.kernel.org/20250422074917.25771-1-yangtiezhu@loongson.cn
Reported-by: Haiyong Sun <sunhaiyong@loongson.cn>
Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-07 16:35:31 -04:00
Costa Shulyupin 18682166f6 rtla: Set distinctive exit value for failed tests
A test is considered failed when a sample trace exceeds the threshold.
Failed tests return the same exit code as passed tests, requiring test
frameworks to determine the result by searching for "hit stop tracing"
in the output.

Assign a distinct exit code for failed tests to enable the use of shell
expressions and seamless integration with testing frameworks without the
need to parse output.

Add enum type for return value.

Update `make check`.

Cc: Daniel Bristot de Oliveira <bristot@kernel.org>
Cc: John Kacur <jkacur@redhat.com>
Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Cc: Eder Zulian <ezulian@redhat.com>
Cc: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Jan Stancek <jstancek@redhat.com>
Link: https://lore.kernel.org/20250417185757.2194541-1-costa.shul@redhat.com
Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-05-07 15:36:19 -04:00
Tomas Glozar 770840a0e7 Documentation/rtla: Include BPF sample collection
Add dependencies needed to build rtla with BPF sample collection support
to README, and document both ways of sample collection in the manpages.

Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Link: https://lore.kernel.org/r/20250311114936.148012-5-tglozar@redhat.com
2025-04-14 10:42:55 -06:00
Linus Torvalds 4fa118e5b7 tracing tooling updates for 6.15:
- Allow RTLA to collect data via BPF
 
   The current implementation of rtla uses libtracefs and libtraceevent to
   pull sample events generated by the timerlat tracer from the trace
   buffer. rtla then processes the sample by updating the histogram and
   summary (current, maximum, minimum, and sum values) as well as checks
   if tracing has been stopped due to threshold overflow.
 
   In use cases where a large number of samples is being generated, that
   is, with measurements running on many CPUs and with a low interval,
   this sample processing design causes a significant CPU load on the rtla
   side. Furthermore, with >100 CPUs and 100us interval, rtla was reported
   as not being able to keep up with the samples and dropping most of them,
   leading to it being unusable.
 
   Change the way the timerlat trace processes samples by attaching
   a BPF program to the trace event using the BPF skeleton feature of bpftool.
   Unlike the current implementation, the BPF implementation does not check
   whether tracing is stopped (in BPF mode, tracing is always off to improve
   performance), but waits for a write to a BPF ringbuffer instead. This allows
   rtla to exit immediately when a threshold is violated, without waiting
   for the next iteration of the while loop.
 
   If the requirements for the BPF implementation are not met, either at
   build time or at run time, the current implementation is used as
   fallback. Which implementation is being used can be seen when running
   rtla timerlat with "-D" option. rtla can be forced to run in non-BPF
   mode by setting the RTLA_NO_BPF option to 1, for debugging purposes.
 
 - Fix LD_FLAGS from being dropped in build
 
 - Refactor code to remove duplication of save_trace_to_file
 
 - Always set options and do not rely on default settings
 
   Do not rely on the default kernel settings of the tracers when
   starting. They could have been changed by the user which gives
   inconsistent results. Always set the options that rtla expects.
 
 - Add creation of ctags and TAGS for traversing code
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+WBgRQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qg54AQDCOChaSSBiUkD0VoPKIeDMlPfvO5Qz
 Xvrst5gtopfKFgEA12/9Lll/sh1eoc4saeGBooNY48HBUMjmX3KNFB14PQg=
 =aHFu
 -----END PGP SIGNATURE-----

Merge tag 'trace-tools-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing tooling updates from Steven Rostedt:

 - Allow RTLA to collect data via BPF

   The current implementation of rtla uses libtracefs and libtraceevent
   to pull sample events generated by the timerlat tracer from the trace
   buffer. rtla then processes the sample by updating the histogram and
   summary (current, maximum, minimum, and sum values) as well as checks
   if tracing has been stopped due to threshold overflow.

   In use cases where a large number of samples is being generated, that
   is, with measurements running on many CPUs and with a low interval,
   this sample processing design causes a significant CPU load on the
   rtla side. Furthermore, with >100 CPUs and 100us interval, rtla was
   reported as not being able to keep up with the samples and dropping
   most of them, leading to it being unusable.

   Change the way the timerlat trace processes samples by attaching a
   BPF program to the trace event using the BPF skeleton feature of
   bpftool. Unlike the current implementation, the BPF implementation
   does not check whether tracing is stopped (in BPF mode, tracing is
   always off to improve performance), but waits for a write to a BPF
   ringbuffer instead. This allows rtla to exit immediately when a
   threshold is violated, without waiting for the next iteration of the
   while loop.

   If the requirements for the BPF implementation are not met, either at
   build time or at run time, the current implementation is used as
   fallback. Which implementation is being used can be seen when running
   rtla timerlat with "-D" option. rtla can be forced to run in non-BPF
   mode by setting the RTLA_NO_BPF option to 1, for debugging purposes.

 - Fix LD_FLAGS from being dropped in build

 - Refactor code to remove duplication of save_trace_to_file

 - Always set options and do not rely on default settings

   Do not rely on the default kernel settings of the tracers when
   starting. They could have been changed by the user which gives
   inconsistent results. Always set the options that rtla expects.

 - Add creation of ctags and TAGS for traversing code

* tag 'trace-tools-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla: Add the ability to create ctags and etags
  rtla/tests: Test setting default options
  rtla/tests: Reset osnoise options before check
  rtla: Always set all tracer options
  rtla/osnoise: Set OSNOISE_WORKLOAD to true
  rtla: Unify apply_config between top and hist
  rtla/osnoise: Unify params struct
  rtla: Fix segfault in save_trace_to_file call
  tools/build: Use SYSTEM_BPFTOOL for system bpftool
  rtla: Refactor save_trace_to_file
  tools/rv: Keep user LDFLAGS in build
  rtla/timerlat: Test BPF mode
  rtla/timerlat_top: Use BPF to collect samples
  rtla/timerlat_top: Move divisor to update
  rtla/timerlat_hist: Use BPF to collect samples
  rtla/timerlat: Add BPF skeleton to collect samples
  rtla: Add optional dependency on BPF tooling
  tools/build: Add bpftool-skeletons feature test
  rtla/timerlat: Unify params struct
2025-03-27 17:03:01 -07:00
John Kacur 732032692f rtla: Add the ability to create ctags and etags
- Add the ability to create and remove ctags and etags, using the following
make tags
make TAGS
make tags_clean

- fix a comment in Makefile.rtla with the correct spelling and don't
  imply that the ability to create an rtla tarball will be removed

Cc: Tomas Glozar <tglozar@redhat.com>
Cc: "Luis Claudio R . Goncalves" <lgoncalv@redhat.com>
Link: https://lore.kernel.org/20250321175053.29048-1-jkacur@redhat.com
Signed-off-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-26 10:39:40 -04:00
Tomas Glozar a86150f310 rtla/tests: Test setting default options
Add function to test engine to test with pre-set osnoise options, and
use it to test whether osnoise period (as an example) is set correctly.

The test works by pre-setting a high period of 10 minutes and stop on
threshold. Thus, it is easy to check whether rtla is properly resetting
the period to default: if it is, the test will complete on time, since
the first sample will overflow the threshold. If not, it will time out.

Cc: Luis Goncalves <lgoncalv@redhat.com>
Link: https://lore.kernel.org/20250320092500.101385-7-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-26 10:36:39 -04:00
Tomas Glozar 6c6182728a rtla/tests: Reset osnoise options before check
Remove any dangling tracing instances from previous improperly exited
runs of rtla, and reset osnoise options to default before running a test
case.

This ensures that the test results are deterministic. Specific test
cases checked that rtla behaves correctly even when the tracer state is
not clean will be added later.

Cc: John Kacur <jkacur@redhat.com>
Cc: Luis Goncalves <lgoncalv@redhat.com>
Link: https://lore.kernel.org/20250320092500.101385-6-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-26 10:36:39 -04:00
Tomas Glozar 0122938a7a rtla: Always set all tracer options
rtla currently only sets tracer options that are explicitly set by the
user, with the exception of OSNOISE_WORKLOAD.

This leads to improper behavior in case rtla is run with those options
not set to the default value. rtla does reset them to the original
value upon exiting, but that does not protect it from starting with
non-default values set either by an improperly exited rtla or by another
user of the tracers.

For example, after running this command:

$ echo 1 > /sys/kernel/tracing/osnoise/stop_tracing_us

all runs of rtla will stop at the 1us threshold, even if not requested
by the user:

$ rtla osnoise hist
Index   CPU-000   CPU-001
1             8         5
2             5         9
3             1         2
4             6         1
5             2         1
6             0         1
8             1         1
12            0         1
14            1         0
15            1         0
over:         0         0
count:       25        21
min:          1         1
avg:       3.68      3.05
max:         15        12
rtla osnoise hit stop tracing

Fix the problem by setting the default value for all tracer options if
the user has not provided their own value.

For most of the options, it's enough to just drop the if clause checking
for the value being set. For cpus, "all" is used as the default value,
and for osnoise default period and runtime, default values of
the osnoise_data variable in trace_osnoise.c are used.

Cc: Luis Goncalves <lgoncalv@redhat.com>
Link: https://lore.kernel.org/20250320092500.101385-5-tglozar@redhat.com
Fixes: 1eceb2fc2c ("rtla/osnoise: Add osnoise top mode")
Fixes: 829a6c0b56 ("rtla/osnoise: Add the hist mode")
Fixes: a828cd18bc ("rtla: Add timerlat tool and timelart top mode")
Fixes: 1eeb6328e8 ("rtla/timerlat: Add timerlat hist mode")
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
Reviewed-by: John Kacur <jkacur@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-03-26 10:36:39 -04:00