linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	5779de8d36	rtla updaets for v6.19: - Officially add Tomas Glozar as a maintainer to RTLA tool - Add for_each_monitored_cpu() helper In multiple places, RTLA tools iterate over the list of CPUs running tracer threads. Use single helper instead of repeating the for/if combination. - Remove unused variable option_index in argument parsing RTLA tools use getopt_long() for argument parsing. For its last argument, an unused variable "option_index" is passed. Remove the variable and pass NULL to getopt_long() to shorten the naturally long parsing functions, and make them more readable. - Fix unassigned nr_cpus after code consolidation In recent code consolidation, timerlat tool cleanup, previously implemented separately for each tool, was moved to a common function timerlat_free(). The cleanup relies on nr_cpus being set. This was not done in the new function, leaving the variable uninitialized. Initialize the variable properly, and remove silencing of compiler warning for uninitialized variables. - Stop tracing on user latency in BPF mode Despite the name, rtla-timerlat's -T/--thread option sets timerlat's stop_tracing_total_us option, which also stops tracing on return-from-user latency, not only on thread latency. Implement the same behavior also in BPF sample collection stop tracing handler to avoid a discrepancy and restore correspondence of behavior with the equivalent option of cyclictest. - Fix threshold actions always triggering A bug in threshold action logic caused the action to execute even if tracing did not stop because of threshold. Fix the logic to stop correctly. - Fix few minor issues in tests Extend tests that were shown to need it to 5s, fix osnoise test calling timerlat by mistake, and use new, more reliable output checking in timerlat's "top stop at failed action" test. - Do not print usage on argument parsing error RTLA prints the entire usage message on encountering errors in argument parsing, like a malformed CPU list. The usage message has gotten too long. Instead of printing it, use newly added fatal() helper function to simply exit with the error message, excluding the usage. - Fix unintuitive -C/--cgroup interface "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that being a common way to specify an option with argument. Moreover, using them fails silently and no cgroup is set. Create new helper function to unify the handling of all such options and allow all of: -Xsomething -X=something -X something as well as the equivalent for the long option. - Fix -a overriding -t argument filename Fix a bug where -a following -t custom_file.txt overrides the custom filename with the default timerlat_trace.txt. - Stop tracing correctly on multiple events at once In some race scenarios, RTLA BPF sample collection might send multiple stop tracing events via the BPF ringbuffer at once. Compare the number of events for != 0 instead of == 1 to cover for this scenario and stop tracing properly. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaS9bxBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qhrgAP0a/AtsL9+IFXAK5JK8aO1XWApVyK9n 48FRZWu/jrupuAD7BO+EHazmPEourNaUqYPeuymwxT+4O47RH1Q/aasLQwo= =RvNH -----END PGP SIGNATURE----- Merge tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull rtla trace tooling updates from Steven Rostedt: - Officially add Tomas Glozar as a maintainer to RTLA tool - Add for_each_monitored_cpu() helper In multiple places, RTLA tools iterate over the list of CPUs running tracer threads. Use single helper instead of repeating the for/if combination. - Remove unused variable option_index in argument parsing RTLA tools use getopt_long() for argument parsing. For its last argument, an unused variable "option_index" is passed. Remove the variable and pass NULL to getopt_long() to shorten the naturally long parsing functions, and make them more readable. - Fix unassigned nr_cpus after code consolidation In recent code consolidation, timerlat tool cleanup, previously implemented separately for each tool, was moved to a common function timerlat_free(). The cleanup relies on nr_cpus being set. This was not done in the new function, leaving the variable uninitialized. Initialize the variable properly, and remove silencing of compiler warning for uninitialized variables. - Stop tracing on user latency in BPF mode Despite the name, rtla-timerlat's -T/--thread option sets timerlat's stop_tracing_total_us option, which also stops tracing on return-from-user latency, not only on thread latency. Implement the same behavior also in BPF sample collection stop tracing handler to avoid a discrepancy and restore correspondence of behavior with the equivalent option of cyclictest. - Fix threshold actions always triggering A bug in threshold action logic caused the action to execute even if tracing did not stop because of threshold. Fix the logic to stop correctly. - Fix few minor issues in tests Extend tests that were shown to need it to 5s, fix osnoise test calling timerlat by mistake, and use new, more reliable output checking in timerlat's "top stop at failed action" test. - Do not print usage on argument parsing error RTLA prints the entire usage message on encountering errors in argument parsing, like a malformed CPU list. The usage message has gotten too long. Instead of printing it, use newly added fatal() helper function to simply exit with the error message, excluding the usage. - Fix unintuitive -C/--cgroup interface "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that being a common way to specify an option with argument. Moreover, using them fails silently and no cgroup is set. Create new helper function to unify the handling of all such options and allow all of: -Xsomething -X=something -X something as well as the equivalent for the long option. - Fix -a overriding -t argument filename Fix a bug where -a following -t custom_file.txt overrides the custom filename with the default timerlat_trace.txt. - Stop tracing correctly on multiple events at once In some race scenarios, RTLA BPF sample collection might send multiple stop tracing events via the BPF ringbuffer at once. Compare the number of events for != 0 instead of == 1 to cover for this scenario and stop tracing properly. * tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla/timerlat: Exit top main loop on any non-zero wait_retval rtla/tests: Don't rely on matching ^1ALL rtla: Fix -a overriding -t argument rtla: Fix -C/--cgroup interface tools/rtla: Replace osnoise_hist_usage("...") with fatal("...") tools/rtla: Replace osnoise_top_usage("...") with fatal("...") tools/rtla: Replace timerlat_hist_usage("...") with fatal("...") tools/rtla: Replace timerlat_top_usage("...") with fatal("...") tools/rtla: Add fatal() and replace error handling pattern rtla/tests: Fix osnoise test calling timerlat rtla/tests: Extend action tests to 5s tools/rtla: Fix --on-threshold always triggering rtla/timerlat_bpf: Stop tracing on user latency tools/rtla: Fix unassigned nr_cpus tools/rtla: Remove unused optional option_index tools/rtla: Add for_each_monitored_cpu() helper MAINTAINERS: Add Tomas Glozar as a maintainer to RTLA tool	2025-12-05 09:34:01 -08:00
Crystal Wood	3138df6f0c	rtla/timerlat: Exit top main loop on any non-zero wait_retval Comparing to exactly 1 will fail if more than one ring buffer event was seen since the last call to timerlat_bpf_wait(), which can happen in some race scenarios. Signed-off-by: Crystal Wood <crwood@redhat.com> Link: https://lore.kernel.org/r/20251112152529.956778-5-crwood@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Crystal Wood	61f1fd5d69	rtla/tests: Don't rely on matching ^1ALL The timerlat "top stop at failed action" test was relying on "ALL" being printed immediately after the "1" from the threshold action. Besides being fragile, this depends on stdbuf behavior, which is easy to miss when recreating the test outside of the framework for debugging purposes. Instead, use the expected/unexpected text mechanism from the corresponding osnoise test. Signed-off-by: Crystal Wood <crwood@redhat.com> Link: https://lore.kernel.org/r/20251112152529.956778-2-crwood@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Ivan Pravdin	ddb6e42494	rtla: Fix -a overriding -t argument When running rtla as `rtla <timerlat\|osnoise> <top\|hist> -t custom_file.txt -a 100` -a options override trace output filename specified by -t option. Running the command above will create <timerlat\|osnoise>_trace.txt file instead of custom_file.txt. Fix this by making sure that -a option does not override trace output filename even if it's passed after trace output filename is specified. Fixes: `173a3b0148` ("rtla/timerlat: Add the automatic trace option") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/b6ae60424050b2c1c8709e18759adead6012b971.1762186418.git.ipravdin.official@gmail.com [ use capital letter in subject, as required by tracing subsystem ] Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Ivan Pravdin	7b71f3a698	rtla: Fix -C/--cgroup interface Currently, user can only specify cgroup to the tracer's thread the following ways: `-C[cgroup]` `-C[=cgroup]` `--cgroup[=cgroup]` If user tries to specify cgroup as `-C [cgroup]` or `--cgroup [cgroup]`, the parser silently fails and rtla's cgroup is used for the tracer threads. To make interface more user-friendly, allow user to specify cgroup in the aforementioned way, i.e. `-C [cgroup]` and `--cgroup [cgroup]`. Refactor identical logic between -t/--trace and -C/--cgroup into a common function. Change documentation to reflect this user interface change. Fixes: `a957cbc025` ("rtla: Add -C cgroup support") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com [ use capital letter in subject, as required by tracing subsystem ] Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Costa Shulyupin	49c1579419	tools/rtla: Replace osnoise_hist_usage("...") with fatal("...") A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace osnoise_hist_usage("...") with fatal("...") on errors. Remove the already unused 'usage' argument from osnoise_hist_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-6-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Costa Shulyupin	92b5b55e5e	tools/rtla: Replace osnoise_top_usage("...") with fatal("...") A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace osnoise_top_usage("...") with fatal("...") on errors. Remove the already unused 'usage' argument from osnoise_top_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-5-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Costa Shulyupin	8f4264e046	tools/rtla: Replace timerlat_hist_usage("...") with fatal("...") A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace timerlat_hist_usage("...\n") with fatal("...") on errors. Remove the already unused 'usage' argument from timerlat_hist_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-4-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Costa Shulyupin	4e5e7210f9	tools/rtla: Replace timerlat_top_usage("...") with fatal("...") A long time ago, when the usage help was short, it was a favor to the user to show it on error. Now that the usage help has become very long, it is too noisy to dump the complete help text for each typo after the error message itself. Replace timerlat_top_usage("...\n") with fatal("...") on errors. Remove the already unused 'usage' argument from timerlat_top_usage(). Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-3-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Costa Shulyupin	8cbb25db81	tools/rtla: Add fatal() and replace error handling pattern The code contains some technical debt in error handling, which complicates the consolidation of duplicated code. Introduce an fatal() function to replace the common pattern of err_msg() followed by exit(EXIT_FAILURE), reducing the length of an already long function. Further patches using fatal() follow. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251011082738.173670-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:27 +01:00
Tomas Glozar	34c170ae5c	rtla/tests: Fix osnoise test calling timerlat osnoise test "top stop at failed action" is calling timerlat instead of osnoise by mistake. Fix it so that it calls the correct RTLA subcommand. Fixes: `05b7e10687` ("tools/rtla: Add remaining support for osnoise actions") Reviewed-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20251007095341.186923-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:30:00 +01:00
Tomas Glozar	d649e9f04c	rtla/tests: Extend action tests to 5s In non-BPF mode, it takes up to 1 second for RTLA to notice that tracing has been stopped. That means that action tests cannot have a 1 second duration, as the SIGALRM will be racing with the threshold overflow. Previously, non-BPF mode actions were buggy and always executed the action, even when stopping on duration or SIGINT, preventing this issue from manifesting. Now that this has been fixed, the tests have become flaky, and this has to be adjusted. Fixes: `4e26f84abf` ("rtla/tests: Add tests for actions") Fixes: `05b7e10687` ("tools/rtla: Add remaining support for osnoise actions") Reviewed-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20251007095341.186923-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-21 10:29:01 +01:00
Tomas Glozar	417bd0d502	tools/rtla: Fix --on-threshold always triggering Commit `8d933d5c89` ("rtla/timerlat: Add continue action") moved the code performing on-threshold actions (enabled through --on-threshold option) to inside the RTLA main loop. The condition in the loop does not check whether the threshold was actually exceeded or if stop tracing was requested by the user through SIGINT or duration. This leads to a bug where on-threshold actions are always performed, even when the threshold was not hit. (BPF mode is not affected, since it uses a different condition in the while loop.) Add a condition that checks for !stop_tracing before executing the actions. Also, fix incorrect brackets in hist_main_loop to match the semantics of top_main_loop. Fixes: `8d933d5c89` ("rtla/timerlat: Add continue action") Fixes: `2f3172f9dd` ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top") Reviewed-by: Crystal Wood <crwood@redhat.com> Reviewed-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20251007095341.186923-1-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-20 13:15:55 +01:00
Tomas Glozar	e4240db933	rtla/timerlat_bpf: Stop tracing on user latency rtla-timerlat allows a thread latency threshold to be set via the -T/--thread option. However, the timerlat tracer calls this total latency (stop_tracing_total_us), and stops tracing also when the return-to-user latency is over the threshold. Change the behavior of the timerlat BPF program to reflect what the timerlat tracer is doing, to avoid discrepancy between stopping collecting data in the BPF program and stopping tracing in the timerlat tracer. Cc: stable@vger.kernel.org Fixes: `e34293ddce` ("rtla/timerlat: Add BPF skeleton to collect samples") Reviewed-by: Wander Lairson Costa <wander@redhat.com> Link: https://lore.kernel.org/r/20251006143100.137255-1-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-20 13:15:55 +01:00
Costa Shulyupin	b4275b2301	tools/rtla: Fix unassigned nr_cpus In recently introduced timerlat_free(), the variable 'nr_cpus' is not assigned. Assign it with sysconf(_SC_NPROCESSORS_CONF) as done elsewhere. Remove the culprit: -Wno-maybe-uninitialized. The rest of the code is clean. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Fixes: `2f3172f9dd` ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top") Link: https://lore.kernel.org/r/20251002170846.437888-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-20 13:15:54 +01:00
Costa Shulyupin	671314fce1	tools/rtla: Remove unused optional option_index The longindex argument of getopt_long() is optional and tied to the unused local variable option_index. Remove it to shorten the four longest functions and make the code neater. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251002123553.389467-2-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-20 13:15:53 +01:00
Costa Shulyupin	04fa6bf373	tools/rtla: Add for_each_monitored_cpu() helper The rtla tools have many instances of iterating over CPUs while checking if they are monitored. Add a for_each_monitored_cpu() helper macro to make the code more readable and reduce code duplication. Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Link: https://lore.kernel.org/r/20251002123553.389467-1-costa.shul@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com>	2025-11-20 13:15:53 +01:00
Zhang Chujun	53afec2c8f	tracing/tools: Fix incorrcet short option in usage text for --threads The help message incorrectly listed '-t' as the short option for --threads, but the actual getopt_long configuration uses '-e'. This mismatch can confuse users and lead to incorrect command-line usage. This patch updates the usage string to correctly show: "-e, --threads NRTHR" to match the implementation. Note: checkpatch.pl reports a false-positive spelling warning on 'Run', which is intentional. Link: https://patch.msgid.link/20251106031040.1869-1-zhangchujun@cmss.chinamobile.com Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-11-07 07:59:37 -05:00
Linus Torvalds	d9f24f8e60	rtla: Updates for v6.18 - This update is mostly just consolidating code between osnoise/timerlat and top/hist for easier maintenance and less future divergence. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaN/guhQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qqU9AQCO+u+Qmx678DCfDJo9X1UPDtS/bM5f r30X1pwYfZ3nNAEA47hbkVFcryFJZbrIPxuTGb0GSM36PHAxmch4QAwBqgs= =qzZh -----END PGP SIGNATURE----- Merge tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tools updates from Steven Rostedt - This is mostly just consolidating code between osnoise/timerlat and top/hist for easier maintenance and less future divergence * tag 'trace-tools-v6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tools/rtla: Add remaining support for osnoise actions tools/rtla: Add test engine support for unexpected output tools/rtla: Fix -A option name in test comment tools/rtla: Consolidate code between osnoise/timerlat and hist/top tools/rtla: Create common_apply_config() tools/rtla: Move top/hist params into common struct tools/rtla: Consolidate common parameters into shared structure	2025-10-05 09:38:26 -07:00
Wander Lairson Costa	2227f273b7	rtla/actions: Fix condition for buffer reallocation The condition to check if the actions buffer needs to be resized was incorrect. The check `self->size >= self->len` would evaluate to true on almost every call to `actions_new()`, causing the buffer to be reallocated unnecessarily each time an action was added. Fix the condition to `self->len >= self.size`, ensuring that the buffer is only resized when it is actually full. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250915181101.52513-1-wander@redhat.com Fixes: `6ea082b171` ("rtla/timerlat: Add action on threshold feature") Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 06:01:20 -04:00
Ivan Pravdin	b1e0ff7209	rtla: Fix buffer overflow in actions_parse Currently, tests 3 and 13-22 in tests/timerlat.t fail with error: * buffer overflow detected *: terminated timeout: the monitored command dumped core The result of running `sudo make check` is tests/timerlat.t (Wstat: 0 Tests: 22 Failed: 11) Failed tests: 3, 13-22 Files=3, Tests=34, 140 wallclock secs ( 0.07 usr 0.01 sys + 27.63 cusr 27.96 csys = 55.67 CPU) Result: FAIL Fix buffer overflow in actions_parse to avoid this error. After this change, the tests results are tests/hwnoise.t ... ok tests/osnoise.t ... ok tests/timerlat.t .. ok All tests successful. Files=3, Tests=34, 186 wallclock secs ( 0.06 usr 0.01 sys + 41.10 cusr 44.38 csys = 85.55 CPU) Result: PASS Link: https://lore.kernel.org/164ffc2ec8edacaf1295789dad82a07817b6263d.1757034919.git.ipravdin.official@gmail.com Fixes: `6ea082b171` ("rtla/timerlat: Add action on threshold feature") Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 05:57:29 -04:00
Crystal Wood	05b7e10687	tools/rtla: Add remaining support for osnoise actions The basic functionality came with the consolidation; now hook up the command line options, and add documentation and tests. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-8-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 04:53:48 -04:00
Crystal Wood	3cd6b18d10	tools/rtla: Add test engine support for unexpected output Add a check() parameter to indicate which text must not appear in the output. Simplify the code so that we can print failures as they happen rather than trying to figure out what went wrong after printing "not ok". This also means that "not ok" gets printed after the info rather than before, which seems more intuitive anyway. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-7-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 04:53:21 -04:00
Crystal Wood	c4e30c22ba	tools/rtla: Fix -A option name in test comment This was changed to --on-threshold when the patches were applied. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-6-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 04:53:07 -04:00
Crystal Wood	2f3172f9dd	tools/rtla: Consolidate code between osnoise/timerlat and hist/top Currently a lot of code is duplicated between the different rtla tools, making maintenance more difficult, and encouraging divergence such as features that are only implemented for certain tools even though they could be more broadly applicable. Merge the various main() functions into a common run_tool() with an ops struct for tool-specific details. Implement enough support for actions on osnoise to not need to keep the old params->trace_output path. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-5-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 04:52:57 -04:00
Crystal Wood	263d7eacf8	tools/rtla: Create common_apply_config() Merge the common bits of osnoise_apply_config() and timerlat_apply_config(). Put the result in a new common.c, and move enough things to common.h so that common.c does not need to include osnoise.h. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-4-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 04:52:46 -04:00
Crystal Wood	5742bf62e6	tools/rtla: Move top/hist params into common struct The hist members were very similar between timerlat and top, so just use one common hist struct. output_divisor, quiet, and pretty printing are pretty generic concepts that can go in the main struct even if not every specific tool (currently) uses them. Cc: John Kacur <jkacur@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-3-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 04:52:17 -04:00
Costa Shulyupin	344823886e	tools/rtla: Consolidate common parameters into shared structure timerlat_params and osnoise_params structures contain 15 identical fields. Introduce a new header common.h and define a common_params structure to consolidate shared fields, reduce code duplication, and enhance maintainability. Cc: John Kacur <jkacur@redhat.com> Link: https://lore.kernel.org/20250907022325.243930-2-crwood@redhat.com Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Signed-off-by: Crystal Wood <crwood@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-09-27 04:51:18 -04:00
Tao Chen	7b128f1d53	rtla: Check pkg-config install The tool pkg-config used to check libtraceevent and libtracefs, if not installed, it will report the libs not found, even though they have already been installed. Before: libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel libtracefs is missing. Please install libtracefs-dev/libtracefs-devel After: Makefile.config:10: *** Error: pkg-config needed by libtraceevent/libtracefs is missing on this system, please install it. Link: https://lore.kernel.org/20250808040527.2036023-2-chen.dylane@linux.dev Fixes: `01474dc706` ("tools/rtla: Use tools/build makefiles to build rtla") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-08-19 20:32:54 -04:00
Tao Chen	26ebba25e2	tools/latency-collector: Check pkg-config install The tool pkg-config used to check libtraceevent and libtracefs, if not installed, it will report the libs not found, even though they have already been installed. Before: libtraceevent is missing. Please install libtraceevent-dev/libtraceevent-devel libtracefs is missing. Please install libtracefs-dev/libtracefs-devel After: Makefile.config:10: *** Error: pkg-config needed by libtraceevent/libtracefs is missing on this system, please install it. Link: https://lore.kernel.org/20250808040527.2036023-1-chen.dylane@linux.dev Fixes: `9d56c88e52` ("tools/tracing: Use tools/build makefiles on latency-collector") Signed-off-by: Tao Chen <chen.dylane@linux.dev> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-08-19 20:32:54 -04:00
Tomas Glozar	a80db1f857	rtla/tests: Test timerlat -P option using actions The -P option is used to set priority of osnoise and timerlat threads. Extend the test for -P with --on-threshold calling a script that looks for running timerlat threads and checks if their priority is set correctly. As --on-threshold is only supported by timerlat at the moment, this is only implemented there so far. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250725133817.59237-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-28 10:22:39 -04:00
Tomas Glozar	892ae5f806	rtla/tests: Add grep checks for base test cases Checking for patterns in rtla output with grep was added to test rtla actions. Add grep checks also for base tests where applicable. Also fix trace event histogram trigger check to use the correct syntax for the command-line option so that the test passes with the grep check. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Link: https://lore.kernel.org/20250725133817.59237-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-28 10:22:38 -04:00
Tomas Glozar	04f837165b	rtla/tests: Limit duration to maximum of 10s Many of the original rtla tests included durations of 1 minute and 30 seconds. Experience has shown this is unnecessary, since 10 seconds as waiting time for samples to appear. Change duration of all rtla tests to at most 10 seconds. This speeds up testing significantly. Before: $ make check All tests successful. Files=3, Tests=54, 536 wallclock secs ( 0.03 usr 0.00 sys + 20.31 cusr 22.02 csys = 42.36 CPU) Result: PASS After: $ make check ... All tests successful. Files=3, Tests=54, 196 wallclock secs ( 0.03 usr 0.01 sys + 20.28 cusr 20.68 csys = 41.00 CPU) Result: PASS Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-9-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:57 -04:00
Tomas Glozar	4e26f84abf	rtla/tests: Add tests for actions Add a bunch of tests covering most of both --on-threshold and --on-end. Parts sensitive to implementation of hist/top are tested for both. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-8-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:57 -04:00
Tomas Glozar	916a9c5b03	rtla/tests: Check rtla output with grep Add argument to the check command in the test suite that takes a regular expression that the output of rtla command is checked against. This allows testing for specific information in rtla output in addition to checking the return value. Two minor improvements are included: running rtla with "eval" so that arguments with spaces can be passed to it via shell quotations, and the stdout of pushd and popd is suppressed to clean up the test output. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-7-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:57 -04:00
Tomas Glozar	3aadb65db5	rtla/timerlat: Add action on end feature Implement actions on end next to actions on threshold. A new option, --on-end is added, parallel to --on-threshold. Instead of being executed whenever a latency threshold is reached, it is executed at the end of the measurement. For example: $ rtla timerlat hist -d 5s --on-end trace will save the trace output at the end. All actions supported by --on-threshold are also supported by --on-end, except for continue, which does nothing with --on-end. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:57 -04:00
Tomas Glozar	8d933d5c89	rtla/timerlat: Add continue action Introduce option to resume tracing after a latency threshold overflow. The option is implemented as an action named "continue". Example: $ rtla timerlat top -q -T 200 -d 1s --on-threshold \ exec,command="echo Threshold" --on-threshold continue Threshold Threshold Threshold Timer Latency ... The feature is supported for both hist and top. After the continue action is executed, processing of the list of actions is stopped and tracing is resumed. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-5-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:57 -04:00
Tomas Glozar	3b78670e3a	rtla/timerlat_bpf: Allow resuming tracing Currently, rtla-timerlat BPF program uses a global variable stored in a .bss section to store whether tracing has been stopped. Move the information to a separate map, so that it is easily writable from userspace, and add a function that clears the value, resuming tracing after it has been stopped. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-4-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:56 -04:00
Tomas Glozar	6ea082b171	rtla/timerlat: Add action on threshold feature Extend the functionality provided by the -t/--trace option, which triggers saving the contents of a tracefs buffer after tracing is stopped, to support implementing arbitrary actions. A new option, --on-threshold, is added, taking an argument that further specifies the action. Actions added in this patch are: - trace[,file=<filename>]: Saves tracefs buffer, optionally taking a filename. - signal,num=<sig>,pid=<pid>: Sends signal to process. "parent" might be specified instead of number to send signal to parent process. - shell,command=<command>: Execute shell command. Multiple actions may be specified and will be executed in order, including multiple actions of the same type. Trace output requested via -t and -a now adds a trace action to the end of the list. If an action fails, the following actions are not executed. For example, this command: $ rtla timerlat -T 20 --on-threshold trace \ --on-threshold shell,command="grep ipi_send timerlat_trace.txt" \ --on-threshold signal,num=2,pid=parent will send signal 2 (SIGINT) to parent process, but only if saved trace contains the text "ipi_send". This way, the feature can be used for flexible reactions on latency spikes, and allows combining rtla with other tooling like perf. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-3-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:56 -04:00
Tomas Glozar	8b6cbcac76	rtla/timerlat: Introduce enum timerlat_tracing_mode After the introduction of BPF-based sample collection, rtla-timerlat effectively runs in one of three modes: - Pure BPF mode, with tracefs only being used to set up the timerlat tracer. Sample processing and stop on threshold are handled by BPF. - tracefs mode. BPF is unsupported or kernel is lacking the necessary trace event (osnoise:timerlat_sample). Stop on theshold is handled by timerlat tracer stopping tracing in all instances. - BPF/tracefs mixed mode - BPF is used for sample collection for top or histogram, tracefs is used for trace output and/or auto-analysis. Stop on threshold is handled both through BPF program, which stops sample collection for top/histogram and wakes up rtla, and by timerlat tracer, which stops tracing for trace output/auto-analysis instances. Add enum timerlat_tracing_mode, with three values: - TRACING_MODE_BPF - TRACING_MODE_TRACEFS - TRACING_MODE_MIXED Those represent the modes described above. A field of this type is added to struct timerlat_params, named "mode", replacing the no_bpf variable. params->mode is set in timerlat_{top,hist}_parse_args to TRACING_MODE_BPF or TRACING_MODE_MIXED based on whether trace output and/or auto-analysis is requested. timerlat_{top,hist}_main then checks if BPF is not unavailable or disabled, in that case, it sets params->mode to TRACING_MODE_TRACEFS. A condition is added to timerlat_apply_config that skips setting timerlat tracer thresholds if params->mode is TRACING_MODE_BPF (those are unnecessary, since they only turn off tracing, which is already turned off in that case, since BPF is used to collect samples). Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Chang Yin <cyin@redhat.com> Cc: Costa Shulyupin <costa.shul@redhat.com> Cc: Crystal Wood <crwood@redhat.com> Cc: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/20250626123405.1496931-2-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-07-25 16:43:56 -04:00
Linus Torvalds	472c5f736b	tracing tools updates for v6.16: - Set distinctive value for failed tests When running "make check" that performs tests on rtla the failure is checked by examining the output. Instead have the tool return an error status if it exceeds the threadhold. - Define __NR_sched_setattr for LoongArch Define __NR_sched_setattr to allow this to build for LoongArch. - Define _GNU_SOURCE for timerlat_bpf.c Due to modifications of struct sched_attr in utils.h when _GNU_SOURCE is not defined, this can cause errors for timerlat_bpf_init() and breakage in BPF sample collection mode. -----BEGIN PGP SIGNATURE----- iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaDeTzBQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qokRAP0XJzos+uvQtkGrqiX5SB/rn1s3/tiD nZagARyiV06BAwEA+NNzqFyx/BLUwMnpx/HFTnIMGXbRVWCVAEeL3t77zgk= =dx7t -----END PGP SIGNATURE----- Merge tag 'trace-tools-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tools updates from Steven Rostedt: - Set distinctive value for failed tests When running "make check" that performs tests on rtla the failure is checked by examining the output. Instead have the tool return an error status if it exceeds the threadhold. - Define __NR_sched_setattr for LoongArch Define __NR_sched_setattr to allow this to build for LoongArch. - Define _GNU_SOURCE for timerlat_bpf.c Due to modifications of struct sched_attr in utils.h when _GNU_SOURCE is not defined, this can cause errors for timerlat_bpf_init() and breakage in BPF sample collection mode. * tag 'trace-tools-v6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla: Define _GNU_SOURCE in timerlat_bpf.c rtla: Define __NR_sched_setattr for LoongArch rtla: Set distinctive exit value for failed tests	2025-05-29 20:59:52 -07:00
Tomas Glozar	8020361d51	rtla: Define _GNU_SOURCE in timerlat_bpf.c Newer versions of glibc include a definition of struct sched_attr in bits/sched.h (included through sched.h which is included by rtla). Commit `0eecee3406` ("tools/rtla: fix collision with glibc sched_attr/sched_set_attr") has modified the definition of struct sched_attr in utils.h, so that it is only applied with older versions of glibc that do not define it, in order to prevent build failure. The definition in bits/sched.h depends on _GNU_SOURCE. timerlat_bpf.c does not define _GNU_SOURCE, making it fall back to the definition in utils.h. The latter has two fields less, leading to shifted offsets of struct timerlat_params in timerlat_bpf_init. Because of the shift, timerlat_bpf_init incorrectly reads params->entries as 0 for timerlat-hist and disables the creation of histogram maps, causing breakage in BPF sample collection mode: $ rtla timerlat hist -d 1s Error pulling BPF data Fix the issue by also defining _GNU_SOURCE in timerlat_bpf.c. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250430144651.621766-1-tglozar@redhat.com Fixes: `e34293ddce` ("rtla/timerlat: Add BPF skeleton to collect samples") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-05-07 17:25:46 -04:00
Tiezhu Yang	6a38c51a25	rtla: Define __NR_sched_setattr for LoongArch When executing "make -C tools/tracing/rtla" on LoongArch, there exists the following error: src/utils.c:237:24: error: '__NR_sched_setattr' undeclared Just define __NR_sched_setattr for LoongArch if not exist. Link: https://lore.kernel.org/20250422074917.25771-1-yangtiezhu@loongson.cn Reported-by: Haiyong Sun <sunhaiyong@loongson.cn> Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-05-07 16:35:31 -04:00
Costa Shulyupin	18682166f6	rtla: Set distinctive exit value for failed tests A test is considered failed when a sample trace exceeds the threshold. Failed tests return the same exit code as passed tests, requiring test frameworks to determine the result by searching for "hit stop tracing" in the output. Assign a distinct exit code for failed tests to enable the use of shell expressions and seamless integration with testing frameworks without the need to parse output. Add enum type for return value. Update `make check`. Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: John Kacur <jkacur@redhat.com> Cc: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com> Cc: Eder Zulian <ezulian@redhat.com> Cc: Dan Carpenter <dan.carpenter@linaro.org> Cc: Jan Stancek <jstancek@redhat.com> Link: https://lore.kernel.org/20250417185757.2194541-1-costa.shul@redhat.com Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-05-07 15:36:19 -04:00
Tomas Glozar	770840a0e7	Documentation/rtla: Include BPF sample collection Add dependencies needed to build rtla with BPF sample collection support to README, and document both ways of sample collection in the manpages. Signed-off-by: Tomas Glozar <tglozar@redhat.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Reviewed-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Link: https://lore.kernel.org/r/20250311114936.148012-5-tglozar@redhat.com	2025-04-14 10:42:55 -06:00
Linus Torvalds	4fa118e5b7	tracing tooling updates for 6.15: - Allow RTLA to collect data via BPF The current implementation of rtla uses libtracefs and libtraceevent to pull sample events generated by the timerlat tracer from the trace buffer. rtla then processes the sample by updating the histogram and summary (current, maximum, minimum, and sum values) as well as checks if tracing has been stopped due to threshold overflow. In use cases where a large number of samples is being generated, that is, with measurements running on many CPUs and with a low interval, this sample processing design causes a significant CPU load on the rtla side. Furthermore, with >100 CPUs and 100us interval, rtla was reported as not being able to keep up with the samples and dropping most of them, leading to it being unusable. Change the way the timerlat trace processes samples by attaching a BPF program to the trace event using the BPF skeleton feature of bpftool. Unlike the current implementation, the BPF implementation does not check whether tracing is stopped (in BPF mode, tracing is always off to improve performance), but waits for a write to a BPF ringbuffer instead. This allows rtla to exit immediately when a threshold is violated, without waiting for the next iteration of the while loop. If the requirements for the BPF implementation are not met, either at build time or at run time, the current implementation is used as fallback. Which implementation is being used can be seen when running rtla timerlat with "-D" option. rtla can be forced to run in non-BPF mode by setting the RTLA_NO_BPF option to 1, for debugging purposes. - Fix LD_FLAGS from being dropped in build - Refactor code to remove duplication of save_trace_to_file - Always set options and do not rely on default settings Do not rely on the default kernel settings of the tracers when starting. They could have been changed by the user which gives inconsistent results. Always set the options that rtla expects. - Add creation of ctags and TAGS for traversing code -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZ+WBgRQccm9zdGVkdEBn b29kbWlzLm9yZwAKCRAp5XQQmuv6qg54AQDCOChaSSBiUkD0VoPKIeDMlPfvO5Qz Xvrst5gtopfKFgEA12/9Lll/sh1eoc4saeGBooNY48HBUMjmX3KNFB14PQg= =aHFu -----END PGP SIGNATURE----- Merge tag 'trace-tools-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing tooling updates from Steven Rostedt: - Allow RTLA to collect data via BPF The current implementation of rtla uses libtracefs and libtraceevent to pull sample events generated by the timerlat tracer from the trace buffer. rtla then processes the sample by updating the histogram and summary (current, maximum, minimum, and sum values) as well as checks if tracing has been stopped due to threshold overflow. In use cases where a large number of samples is being generated, that is, with measurements running on many CPUs and with a low interval, this sample processing design causes a significant CPU load on the rtla side. Furthermore, with >100 CPUs and 100us interval, rtla was reported as not being able to keep up with the samples and dropping most of them, leading to it being unusable. Change the way the timerlat trace processes samples by attaching a BPF program to the trace event using the BPF skeleton feature of bpftool. Unlike the current implementation, the BPF implementation does not check whether tracing is stopped (in BPF mode, tracing is always off to improve performance), but waits for a write to a BPF ringbuffer instead. This allows rtla to exit immediately when a threshold is violated, without waiting for the next iteration of the while loop. If the requirements for the BPF implementation are not met, either at build time or at run time, the current implementation is used as fallback. Which implementation is being used can be seen when running rtla timerlat with "-D" option. rtla can be forced to run in non-BPF mode by setting the RTLA_NO_BPF option to 1, for debugging purposes. - Fix LD_FLAGS from being dropped in build - Refactor code to remove duplication of save_trace_to_file - Always set options and do not rely on default settings Do not rely on the default kernel settings of the tracers when starting. They could have been changed by the user which gives inconsistent results. Always set the options that rtla expects. - Add creation of ctags and TAGS for traversing code * tag 'trace-tools-v6.15' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: rtla: Add the ability to create ctags and etags rtla/tests: Test setting default options rtla/tests: Reset osnoise options before check rtla: Always set all tracer options rtla/osnoise: Set OSNOISE_WORKLOAD to true rtla: Unify apply_config between top and hist rtla/osnoise: Unify params struct rtla: Fix segfault in save_trace_to_file call tools/build: Use SYSTEM_BPFTOOL for system bpftool rtla: Refactor save_trace_to_file tools/rv: Keep user LDFLAGS in build rtla/timerlat: Test BPF mode rtla/timerlat_top: Use BPF to collect samples rtla/timerlat_top: Move divisor to update rtla/timerlat_hist: Use BPF to collect samples rtla/timerlat: Add BPF skeleton to collect samples rtla: Add optional dependency on BPF tooling tools/build: Add bpftool-skeletons feature test rtla/timerlat: Unify params struct	2025-03-27 17:03:01 -07:00
John Kacur	732032692f	rtla: Add the ability to create ctags and etags - Add the ability to create and remove ctags and etags, using the following make tags make TAGS make tags_clean - fix a comment in Makefile.rtla with the correct spelling and don't imply that the ability to create an rtla tarball will be removed Cc: Tomas Glozar <tglozar@redhat.com> Cc: "Luis Claudio R . Goncalves" <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250321175053.29048-1-jkacur@redhat.com Signed-off-by: John Kacur <jkacur@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-03-26 10:39:40 -04:00
Tomas Glozar	a86150f310	rtla/tests: Test setting default options Add function to test engine to test with pre-set osnoise options, and use it to test whether osnoise period (as an example) is set correctly. The test works by pre-setting a high period of 10 minutes and stop on threshold. Thus, it is easy to check whether rtla is properly resetting the period to default: if it is, the test will complete on time, since the first sample will overflow the threshold. If not, it will time out. Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250320092500.101385-7-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: John Kacur <jkacur@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-03-26 10:36:39 -04:00
Tomas Glozar	6c6182728a	rtla/tests: Reset osnoise options before check Remove any dangling tracing instances from previous improperly exited runs of rtla, and reset osnoise options to default before running a test case. This ensures that the test results are deterministic. Specific test cases checked that rtla behaves correctly even when the tracer state is not clean will be added later. Cc: John Kacur <jkacur@redhat.com> Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250320092500.101385-6-tglozar@redhat.com Signed-off-by: Tomas Glozar <tglozar@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-03-26 10:36:39 -04:00
Tomas Glozar	0122938a7a	rtla: Always set all tracer options rtla currently only sets tracer options that are explicitly set by the user, with the exception of OSNOISE_WORKLOAD. This leads to improper behavior in case rtla is run with those options not set to the default value. rtla does reset them to the original value upon exiting, but that does not protect it from starting with non-default values set either by an improperly exited rtla or by another user of the tracers. For example, after running this command: $ echo 1 > /sys/kernel/tracing/osnoise/stop_tracing_us all runs of rtla will stop at the 1us threshold, even if not requested by the user: $ rtla osnoise hist Index CPU-000 CPU-001 1 8 5 2 5 9 3 1 2 4 6 1 5 2 1 6 0 1 8 1 1 12 0 1 14 1 0 15 1 0 over: 0 0 count: 25 21 min: 1 1 avg: 3.68 3.05 max: 15 12 rtla osnoise hit stop tracing Fix the problem by setting the default value for all tracer options if the user has not provided their own value. For most of the options, it's enough to just drop the if clause checking for the value being set. For cpus, "all" is used as the default value, and for osnoise default period and runtime, default values of the osnoise_data variable in trace_osnoise.c are used. Cc: Luis Goncalves <lgoncalv@redhat.com> Link: https://lore.kernel.org/20250320092500.101385-5-tglozar@redhat.com Fixes: `1eceb2fc2c` ("rtla/osnoise: Add osnoise top mode") Fixes: `829a6c0b56` ("rtla/osnoise: Add the hist mode") Fixes: `a828cd18bc` ("rtla: Add timerlat tool and timelart top mode") Fixes: `1eeb6328e8` ("rtla/timerlat: Add timerlat hist mode") Signed-off-by: Tomas Glozar <tglozar@redhat.com> Reviewed-by: John Kacur <jkacur@redhat.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>	2025-03-26 10:36:39 -04:00

1 2 3 4 5

203 Commits