Compare commits

...

110 Commits

Author SHA1 Message Date
Linus Torvalds d1d36025a6 Probes for v6.19
- fprobe: Performance enhancement of the fprobe using rhltable
   . fprobe: use rhltable for fprobe_ip_table. The fprobe IP table has
     been converted to use an rhltable for improved performance when
     dealing with a large number of probed functions.
   . Fix a suspicious RCU usage warning of the above change in the
     fprobe entry handler.
   . Remove an unused local variable of the above change.
   . Fix to initialize fprobe_ip_table in core_initcall().
 
 - fprobe: Performance optimization of fprobe by ftrace
   . fprobe: Use ftrace instead of fgraph for entry only probes. This
     avoids the unneeded overhead of fgraph stack setup.
   . Also update fprobe selftest for entry-only probe.
   . fprobe: Use ftrace only if CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
     WITH_REGS is defined.
 
 - probes: Cleanup probe event subsystems.
   . uprobe/eprobe: Allocate traceprobe_parse_context per probe instead
     of each probe argument parsing. This reduces memory allocation/free
     of temporary working memory.
   . uprobes: Cleanup code using __free().
   . eprobes: Cleanup code using __free().
   . probes: Cleanup code using __free(trace_probe_log_clear) to clear
     error log automatically.
   . probes: Replace strcpy() with memcpy() in __trace_probe_log_err().
 -----BEGIN PGP SIGNATURE-----
 
 iQFPBAABCgA5FiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmkvhSsbHG1hc2FtaS5o
 aXJhbWF0c3VAZ21haWwuY29tAAoJENv7B78FKz8bWhEH/23XM5Msjy5vopB+ECZb
 iCj8SkWrQzfiCBILUqxCkZdfJHFomGPHewxvxIOWdb7evtHuy0Ypne/Uw/TMAtAh
 xvDQmu03IV2jO7h7GExsnEh0nX0upYg4IVmN0sCSSWSfgLLTWO9ICClavV9adcva
 ZR+5TdZbK+W59n+ejxA9OMDt1G+nz1Ls9Qhx9ktf7odkJzBkQGPq/heZuPbF3+6k
 Vj2IHTuqWobDDt+ekKOBRWNh9cS61ybxvsr/vmkT6s904ortP6mZa3zEYPRVOUNG
 WJ/KGJwvExTcaG/Dy2g6q8tam1Bidx9/S6klyOGXQXxvaIT1VtBc66HzAUfso6jg
 yIc=
 =w6Kq
 -----END PGP SIGNATURE-----

Merge tag 'probes-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes updates from Masami Hiramatsu:
 "fprobe performance enhancement using rhltable:
   - use rhltable for fprobe_ip_table. The fprobe IP table has been
     converted to use an rhltable for improved performance when dealing
     with a large number of probed functions (see the sketch after this
     message)
   - Fix a suspicious RCU usage warning of the above change in the
     fprobe entry handler
   - Remove an unused local variable of the above change
   - Fix to initialize fprobe_ip_table in core_initcall()

  Performance optimization of fprobe by ftrace:
   - Use ftrace instead of fgraph for entry only probes. This avoids the
     unneeded overhead of fgraph stack setup
   - Also update fprobe selftest for entry-only probe
   - fprobe: Use ftrace only if CONFIG_DYNAMIC_FTRACE_WITH_ARGS or
     WITH_REGS is defined

  Cleanup probe event subsystems:
   - Allocate traceprobe_parse_context per probe instead of each probe
     argument parsing. This reduces memory allocation/free of temporary
     working memory
   - Cleanup code using __free()
   - Replace strcpy() with memcpy() in __trace_probe_log_err()"
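
As an aside on the rhltable API behind the conversion above, here is a
minimal sketch of the pattern (hypothetical names; not the actual fprobe
code):

  #include <linux/rhashtable.h>

  struct ip_entry {
          unsigned long           ip;     /* hash key */
          struct rhlist_head      node;   /* rhltable linkage */
  };

  static const struct rhashtable_params ip_params = {
          .key_len     = sizeof(unsigned long),
          .key_offset  = offsetof(struct ip_entry, ip),
          .head_offset = offsetof(struct ip_entry, node),
          .automatic_shrinking = true,
  };

  static struct rhltable ip_table; /* rhltable_init(&ip_table, &ip_params) */

  /* Lookups are lock-free but must run under rcu_read_lock(); entries
   * sharing one key hang off a single bucket list. */
  static void handle_entries_at(unsigned long ip)
  {
          struct rhlist_head *list, *pos;
          struct ip_entry *e;

          rcu_read_lock();
          list = rhltable_lookup(&ip_table, &ip, ip_params);
          rhl_for_each_entry_rcu(e, pos, list, node)
                  ; /* invoke the handler for e */
          rcu_read_unlock();
  }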

* tag 'probes-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: fprobe: use ftrace if CONFIG_DYNAMIC_FTRACE_WITH_ARGS
  lib/test_fprobe: add testcase for mixed fprobe
  tracing: fprobe: optimization for entry only case
  tracing: fprobe: Fix to init fprobe_ip_table earlier
  tracing: fprobe: Remove unused local variable
  tracing: probes: Replace strcpy() with memcpy() in __trace_probe_log_err()
  tracing: fprobe: fix suspicious rcu usage in fprobe_entry
  tracing: uprobe: eprobes: Allocate traceprobe_parse_context per probe
  tracing: uprobes: Cleanup __trace_uprobe_create() with __free()
  tracing: eprobe: Cleanup eprobe event using __free()
  tracing: probes: Use __free() for trace_probe_log
  tracing: fprobe: use rhltable for fprobe_ip_table
2025-12-05 10:55:47 -08:00
Linus Torvalds 2e8c1c6a50 ktest: Fix for v6.19:
- Fix incorrect variable in error message in config-bisect.pl
 
   If the old config file fails to get copied as the last good or bad
   config file, then it fails the program and prints an error message.
   But the variable used to print the old config's name was incorrect.
   It was $config when it should have been $output_config.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaTL5PxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qnWBAQDtZ+UNtbeNR2eHzbcQ1+ENi0aWGwF9
 e93hKyvAWkMgWAEAklDIdstyCaSQQgq3X4ilv1kaG1eu+KSWNnyhmqnyqAM=
 =KPW2
 -----END PGP SIGNATURE-----

Merge tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest

Pull ktest fix from Steven Rostedt:

 - Fix incorrect variable in error message in config-bisect.pl

   If the old config file fails to get copied as the last good or bad
   config file, then it fails the program and prints an error message.

   But the variable used to print the old config's name was incorrect.
   It was $config when it should have been $output_config.

* tag 'ktest-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
  ktest.pl: Fix uninitialized var in config-bisect.pl
2025-12-05 10:53:43 -08:00
Linus Torvalds 2ba59045fb - Add helper functions for allocations
The allocation of the per CPU buffer descriptor, the buffer page
   descriptors and the buffer page data itself can be pretty ugly.
   Add some helper macros and a function to have the code that allocates
   buffer pages and such look a little cleaner.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaTL3JxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qvDgAP9HFxPe2EqGspnY0RungWDs3yCxqlUp
 Eqz7SaI9GCXdXgD/TKiz3YjNVxZveeDU6QHWsDl4svoBzjSAsaeTkXD+OQ8=
 =siR0
 -----END PGP SIGNATURE-----

Merge tag 'trace-ringbuffer-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull trace ring-buffer cleanup from Steven Rostedt:

 - Add helper functions for allocations

   The allocation of the per CPU buffer descriptor, the buffer page
   descriptors and the buffer page data itself can be pretty ugly.

   Add some helper macros and a function to have the code that allocates
   buffer pages and such look a little cleaner.

* tag 'trace-ringbuffer-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  ring-buffer: Add helper functions for allocations
2025-12-05 10:50:24 -08:00
Linus Torvalds 0b1b4a3d8e Runtime verifier updates for v6.19:
- Adapt the ftracetest script to be run from a different folder
 
   This uses the already existing OPT_TEST_DIR but extends it further to run
   independent tests, then adds an --rv flag to allow using the script for
   testing RV (mostly) independently of ftrace.
 
 - Add basic RV selftests in selftests/verification for more validations
 
   Add more validations for available/enabled monitors and reactors. This
   could have caught the bug that introduced the kernel panic fixed earlier.
   Tests use ftracetest.
 
 - Convert react() function in reactor to use va_list directly
 
   Use a central helper to handle the variadic arguments. Clean up macros
   and mark functions as static.
 
 - Add lockdep annotations to reactors to have lockdep complain of errors
 
   These warn if the reactors are called from improper context, which is
   useful when developing new reactors. This highlights a warning in the
   panic reactor that is related to the printk subsystem and not to RV.
 
 - Convert core RV code to use lock guards and __free helpers
 
   This completely removes goto statements.
 
 - Fix compilation if !CONFIG_RV_REACTORS
 
   Fix the warning by keeping the LTL monitor variable always static.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaTBoVxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qtWpAQDxPQAJQvBZ41l9q9Cis7PqGGezT4Nv
 g6Fh/ydMOlJCsQD/R0Xd5JxPmBI8FLCwCfqHo7wYKUhP8GfL/ORPEWhU2gI=
 =EEot
 -----END PGP SIGNATURE-----

Merge tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull runtime verifier updates from Steven Rostedt:

 - Adapt the ftracetest script to be run from a different folder

   This uses the already existing OPT_TEST_DIR but extends it further to
   run independent tests, then adds an --rv flag to allow using the
   script for testing RV (mostly) independently of ftrace.

 - Add basic RV selftests in selftests/verification for more validations

   Add more validations for available/enabled monitors and reactors.
   This could have caught the bug that introduced the kernel panic fixed
   earlier. Tests use ftracetest.

 - Convert react() function in reactor to use va_list directly

   Use a central helper to handle the variadic arguments (a generic
   sketch of the pattern follows this list). Clean up macros and mark
   functions as static.

 - Add lockdep annotations to reactors to have lockdep complain of
   errors

   These warn if the reactors are called from improper context, which is
   useful when developing new reactors. This highlights a warning in the
   panic reactor that is related to the printk subsystem and not to RV.

 - Convert core RV code to use lock guards and __free helpers

   This completely removes goto statements.

 - Fix compilation if !CONFIG_RV_REACTORS

   Fix the warning by keeping the LTL monitor variable always static.
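
Since the va_list conversion above follows the classic C split between a
variadic front end and a v-variant that does the work, here is a generic
sketch of the pattern (reactor names hypothetical; not the actual RV code):

  #include <linux/stdarg.h>
  #include <linux/printk.h>

  /* The central helper takes the va_list once... */
  static void reactor_vmsg(const char *fmt, va_list args)
  {
          vprintk(fmt, args);
  }

  /* ...so each variadic caller only marshals its arguments. */
  static void reactor_msg(const char *fmt, ...)
  {
          va_list args;

          va_start(args, fmt);
          reactor_vmsg(fmt, args);
          va_end(args);
  }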

* tag 'trace-rv-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix compilation if !CONFIG_RV_REACTORS
  rv: Convert to use __free
  rv: Convert to use lock guard
  rv: Add explicit lockdep context for reactors
  rv: Make rv_reacting_on() static
  rv: Pass va_list to reactors
  selftests/verification: Add initial RV tests
  selftest/ftrace: Generalise ftracetest to use with RV
2025-12-05 10:17:00 -08:00
Linus Torvalds 0771cee974 ftrace fixes for v6.19:
- Fix regression of pid filtering of function graph tracer
 
   When the function graph tracer allowed multiple instances of
   graph tracing using subops, the filtering by pid broke.
 
   The ftrace_ops->private that was used for pid filtering wasn't
   updated on creation.
 
   The wrong function entry callback was used when pid filtering was
   enabled when the function graph tracer started, which meant that
   the pid filtering wasn't happening.
 
 - Remove no longer needed ftrace_trace_task()
 
   With PID filtering working via ftrace_pids_enabled() and fgraph_pid_func(),
   the coarse-grained ftrace_trace_task() check in graph_entry() is obsolete.
 
   It was only a fallback for uninitialized op->private (now fixed), and its
   removal ensures consistent PID filtering with standard function tracing.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaS90FhQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qrqMAQDbU53VhvZ6rE0pNvu0Tlk+LDCu3gxg
 F2wisWr65389OgD/VFLTVRjCZh1iY7FFWjAPGRCMbetljmMgK5vpH6XSigA=
 =VKaD
 -----END PGP SIGNATURE-----

Merge tag 'ftrace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull ftrace updates from Steven Rostedt:

 - Fix regression of pid filtering of function graph tracer

   When the function graph tracer allowed multiple instances of graph
   tracing using subops, the filtering by pid broke.

   The ftrace_ops->private that was used for pid filtering wasn't
   updated on creation.

   The wrong function entry callback was used when pid filtering was
   enabled when the function graph tracer started, which meant that
   the pid filtering wasn't happening.

 - Remove no longer needed ftrace_trace_task()

   With PID filtering working via ftrace_pids_enabled() and
   fgraph_pid_func(), the coarse-grained ftrace_trace_task()
   check in graph_entry() is obsolete.

   It was only a fallback for uninitialized op->private (now fixed),
   and its removal ensures consistent PID filtering with standard
   function tracing.

* tag 'ftrace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  fgraph: Remove coarse PID filtering from graph_entry()
  fgraph: Check ftrace_pids_enabled on registration for early filtering
  fgraph: Initialize ftrace_ops->private for function graph ops
2025-12-05 10:13:04 -08:00
Linus Torvalds 69c5079b49 tracing updates for v6.19:
- Merge branch shared with kprobes on extending trace options
 
   The trace options were defined by a 32 bit variable. This limits the
   tracing instances to have a total of 32 different options. As that limit
   has been hit, and more options are being added, increase the option mask
   to a 64 bit number, doubling the number of options available.
 
   As this is required for the kprobe topic branches as well as the tracing
   topic branch, a separate branch was created and merged into both.
 
 - Make trace_user_fault_read() available for the rest of tracing
 
   The function trace_user_fault_read() is used by trace_marker file read to
   allow reading user space to be done fast and without locking or
   allocations. Make this available so that the system call trace events can
   use it too.
 
 - Have system call trace events read user space values
 
   Now that the system call trace events callbacks are called in a faultable
   context, take advantage of this and read the user space buffers for
   various system calls. For example, show the path name of the openat system
   call instead of just showing the pointer to that path name in user space.
   Also show the contents of the buffer of the write system call. Several
   system call trace events are updated to make tracing into a lightweight
   strace tool for all applications in the system.
 
 - Update perf system call tracing to do the same
 
 - Add a config and a syscall_user_buf_size file to control the size of the buffer
 
   Limit the amount of data that can be read from user space. The default
   size is 63 bytes but that can be expanded to 165 bytes.
 
 - Allow the persistent ring buffer to print system calls normally
 
   The persistent ring buffer prints trace events by their type and ignores
   the print_fmt. This is because the print_fmt may change from kernel to
   kernel. As the system call output is fixed by the system call ABI itself,
   there's no reason to limit that. This makes reading the system call events
   in the persistent ring buffer much nicer and easier to understand.
 
 - Add options to show text offset to function profiler
 
   The function profiler that counts the number of times a function is hit
   currently lists all functions by name and offset. But this becomes
   ambiguous when there are several functions with the same name. Add a
   tracing option that changes the output to be that of _text+offset
   instead. Now a user space tool can use this information to map the
   _text+offset to the unique function it is counting.
 
 - Report bad dynamic event command
 
   If a bad command is passed to the dynamic_events file, report it properly
   in the error log.
 
 - Clean up tracer options
 
   Clean up the tracer option code a bit, by removing some useless code and
   also using switch statements instead of a series of if statements.
 
 - Have tracing options be instance specific
 
   Tracers can have their own options (function tracer, irqsoff tracer,
   function graph tracer, etc). But now that the same tracer can be enabled
   in multiple trace instances, their options are still global. The API is
   per instance, yet changing an option in one instance affects the others.
   This isn't even consistent, as the options take effect differently
   depending on when a tracer was started in an instance. Make the options
   for instances only affect the instance they are changed under.
 
 - Optimize pid_list lock contention
 
   Whenever the pid_list is read, it uses a spin lock. This happens at every
   sched switch. Taking the lock at sched switch can be removed by instead
   using a seqlock counter.
 
 - Clean up the trace trigger structures
 
   The trigger code uses two different structures to implement a single
   trigger. This was due to trying to reuse code for the two different types
   of triggers (always on trigger, and count limited trigger). But by adding
   a single field to one structure, the other structure could be absorbed
   into the first structure, making the code easier to understand.
 
 - Create a bulk garbage collector for trace triggers
 
   If user space has triggers for several hundreds of events and then removes
   them, it can take several seconds to complete. This is because each
   removal calls the slow tracepoint_synchronize_unregister() that can take
   hundreds of milliseconds to complete. Instead, create a helper thread that
   will do the clean up. When a trigger is removed, it will create the
   kthread if it isn't already created, and then add the trigger to a llist.
   The kthread will take the items off the llist, call
   tracepoint_synchronize_unregister(), and then remove the items it took
   off. It will then check if there are more items to free before sleeping.
 
   This lets user space finish removing all these triggers in less than a
   second.
 
 - Allow function tracing of some of the tracing infrastructure code
 
   Because the tracing code can cause recursion issues if it is traced by the
   function tracer, the entire tracing directory disables function tracing.
   But not all of tracing causes issues if it is traced. Namely, the event
   tracing code. Add a config that enables some of the tracing code to be
   traced to help in debugging it. Note, when this is enabled, it does add
   noise to general function tracing, especially if events are enabled as
   well (which is a common case).
 
 - Add boot-time backup instance for persistent buffer
 
   The persistent ring buffer is used mostly for kernel crash analysis in the
   field. One issue is that if there's a crash, the data in the persistent
   ring buffer must be read before tracing can begin using it. This slows
   down the boot process. Once tracing starts in the persistent ring buffer,
   the old data must be freed, as the addresses no longer match and old
   events can't share the buffer with new events.
 
   Create a way to create a backup buffer that copies the persistent ring
   buffer at boot up. Then after a crash, the always on tracer can begin
   immediately as well as the normal boot process while the crash analysis
   tooling uses the backup buffer. After the backup buffer is finished being
   read, it can be removed.
 
 - Enable function graph args and return address options at the same time
 
   Currently, when reading of arguments in the function graph tracer is
   enabled, the option to record the parent function in the entry event
   cannot be enabled. Update the code so that it can.
 
 - Add new struct_offset() helper macro
 
   Add a new macro that takes a pointer to a structure and a name of one of
   its members and it will return the offset of that member. This allows the
   ring buffer code to simplify the following:
 
   From:  size = struct_size(entry, buf, cnt - sizeof(entry->id));
     To:  size = struct_offset(entry, id) + cnt;
 
   There should be other simplifications that this macro can help out with as
   well.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaS9xqxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qj6tAQD4MR1lsE3XpH09asO4CDDfhbtRSQVD
 o8bVKVihWx/j5gD/XezjqE2Q2+DO6dhnsQY6pbtNdXoKgaMuQJGA+dvPsQc=
 =HilC
 -----END PGP SIGNATURE-----

Merge tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing updates from Steven Rostedt:

 - Extend tracing option mask to 64 bits

   The trace options were defined by a 32 bit variable. This limits the
   tracing instances to have a total of 32 different options. As that
   limit has been hit, and more options are being added, increase the
   option mask to a 64 bit number, doubling the number of options
   available.

   As this is required for the kprobe topic branches as well as the
   tracing topic branch, a separate branch was created and merged into
   both.

 - Make trace_user_fault_read() available for the rest of tracing

   The function trace_user_fault_read() is used by trace_marker file
   read to allow reading user space to be done fast and without locking
   or allocations. Make this available so that the system call trace
   events can use it too.

 - Have system call trace events read user space values

   Now that the system call trace events callbacks are called in a
   faultable context, take advantage of this and read the user space
   buffers for various system calls. For example, show the path name of
   the openat system call instead of just showing the pointer to that
   path name in user space. Also show the contents of the buffer of the
   write system call. Several system call trace events are updated to
    make tracing into a lightweight strace tool for all applications in
   the system.

 - Update perf system call tracing to do the same

 - Add a config and a syscall_user_buf_size file to control the size of
   the buffer

   Limit the amount of data that can be read from user space. The
   default size is 63 bytes but that can be expanded to 165 bytes.

 - Allow the persistent ring buffer to print system calls normally

   The persistent ring buffer prints trace events by their type and
   ignores the print_fmt. This is because the print_fmt may change from
   kernel to kernel. As the system call output is fixed by the system
   call ABI itself, there's no reason to limit that. This makes reading
   the system call events in the persistent ring buffer much nicer and
   easier to understand.

 - Add options to show text offset to function profiler

   The function profiler that counts the number of times a function is
    hit currently lists all functions by name and offset. But this
   becomes ambiguous when there are several functions with the same
   name.

   Add a tracing option that changes the output to be that of
   '_text+offset' instead. Now a user space tool can use this
   information to map the '_text+offset' to the unique function it is
   counting.

 - Report bad dynamic event command

   If a bad command is passed to the dynamic_events file, report it
   properly in the error log.

 - Clean up tracer options

   Clean up the tracer option code a bit, by removing some useless code
   and also using switch statements instead of a series of if
   statements.

 - Have tracing options be instance specific

   Tracers can have their own options (function tracer, irqsoff tracer,
   function graph tracer, etc). But now that the same tracer can be
   enabled in multiple trace instances, their options are still global.
    The API is per instance, yet changing an option in one instance
    affects the others. This isn't even consistent, as the options take
    effect differently depending on when a tracer was started in an
    instance. Make the options for instances only affect the instance
    they are changed under.

 - Optimize pid_list lock contention

   Whenever the pid_list is read, it uses a spin lock. This happens at
    every sched switch. Taking the lock at sched switch can be removed by
    instead using a seqlock counter (see the sketch after this list).

 - Clean up the trace trigger structures

   The trigger code uses two different structures to implement a single
    trigger. This was due to trying to reuse code for the two different
    types of triggers (always on trigger, and count limited trigger). But
    by adding a single field to one structure, the other structure could
    be absorbed into the first structure, making the code easier to
    understand.

 - Create a bulk garbage collector for trace triggers

   If user space has triggers for several hundreds of events and then
   removes them, it can take several seconds to complete. This is
   because each removal calls tracepoint_synchronize_unregister() that
   can take hundreds of milliseconds to complete.

   Instead, create a helper thread that will do the clean up. When a
   trigger is removed, it will create the kthread if it isn't already
   created, and then add the trigger to a llist. The kthread will take
   the items off the llist, call tracepoint_synchronize_unregister(),
    and then remove the items it took off. It will then check if there
    are more items to free before sleeping.

    This lets user space finish removing all these triggers in less than
    a second.

 - Allow function tracing of some of the tracing infrastructure code

   Because the tracing code can cause recursion issues if it is traced
    by the function tracer, the entire tracing directory disables function
   tracing. But not all of tracing causes issues if it is traced.
   Namely, the event tracing code. Add a config that enables some of the
   tracing code to be traced to help in debugging it. Note, when this is
   enabled, it does add noise to general function tracing, especially if
   events are enabled as well (which is a common case).

 - Add boot-time backup instance for persistent buffer

   The persistent ring buffer is used mostly for kernel crash analysis
   in the field. One issue is that if there's a crash, the data in the
   persistent ring buffer must be read before tracing can begin using
   it. This slows down the boot process. Once tracing starts in the
    persistent ring buffer, the old data must be freed, as the addresses
    no longer match and old events can't share the buffer with new
    events.

   Create a way to create a backup buffer that copies the persistent
   ring buffer at boot up. Then after a crash, the always on tracer can
   begin immediately as well as the normal boot process while the crash
   analysis tooling uses the backup buffer. After the backup buffer is
   finished being read, it can be removed.

 - Enable function graph args and return address options at the same
   time

    Currently, when reading of arguments in the function graph tracer is
    enabled, the option to record the parent function in the entry event
    cannot be enabled. Update the code so that it can.

 - Add new struct_offset() helper macro

   Add a new macro that takes a pointer to a structure and a name of one
   of its members and it will return the offset of that member. This
   allows the ring buffer code to simplify the following:

   From:  size = struct_size(entry, buf, cnt - sizeof(entry->id));
     To:  size = struct_offset(entry, id) + cnt;

   There should be other simplifications that this macro can help out
    with as well.
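
As a sketch of the seqlock-counter pattern mentioned in the pid_list item
above (illustrative only; not the actual trace_pid_list code):

  #include <linux/seqlock.h>
  #include <linux/spinlock.h>
  #include <linux/bitmap.h>
  #include <linux/threads.h>

  static DEFINE_SPINLOCK(pid_lock);
  static seqcount_spinlock_t pid_seq =
          SEQCNT_SPINLOCK_ZERO(pid_seq, &pid_lock);
  static DECLARE_BITMAP(pid_bits, PID_MAX_DEFAULT);

  /* Hot path (sched switch): no lock taken, just retry on writer race. */
  static bool pid_is_traced(unsigned int pid)
  {
          unsigned int seq;
          bool ret;

          do {
                  seq = read_seqcount_begin(&pid_seq);
                  ret = test_bit(pid, pid_bits);
          } while (read_seqcount_retry(&pid_seq, seq));

          return ret;
  }

  /* Slow path (filter updates) still serializes on the spinlock. */
  static void pid_trace_set(unsigned int pid)
  {
          spin_lock(&pid_lock);
          write_seqcount_begin(&pid_seq);
          set_bit(pid, pid_bits);
          write_seqcount_end(&pid_seq);
          spin_unlock(&pid_lock);
  }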

* tag 'trace-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (42 commits)
  overflow: Introduce struct_offset() to get offset of member
  function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
  tracing: Add boot-time backup of persistent ring buffer
  ftrace: Allow tracing of some of the tracing code
  tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
  tracing: Add bulk garbage collection of freeing event_trigger_data
  tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
  tracing: Merge struct event_trigger_ops into struct event_command
  tracing: Remove get_trigger_ops() and add count_func() from trigger ops
  tracing: Show the tracer options in boot-time created instance
  ftrace: Avoid redundant initialization in register_ftrace_direct
  tracing: Remove unused variable in tracing_trace_options_show()
  fgraph: Make fgraph_no_sleep_time signed
  tracing: Convert function graph set_flags() to use a switch() statement
  tracing: Have function graph tracer option sleep-time be per instance
  tracing: Move graph-time out of function graph options
  tracing: Have function graph tracer option funcgraph-irqs be per instance
  trace/pid_list: optimize pid_list->lock contention
  tracing: Have function graph tracer define options per instance
  tracing: Have function tracer define options per instance
  ...
2025-12-05 09:51:37 -08:00
Linus Torvalds 36492b7141 Detect unused tracepoints for v6.19:
If a tracepoint is defined but never used (TRACE_EVENT() created but no
 trace_<tracepoint>() called), it can take up to or more than 5K of memory
 each. This can add up as there are around a hundred unused tracepoints with
 various configs. That is 500K of wasted memory.
 
 Add a make build parameter of "UT=1" to have the build warn if an unused
 tracepoint is detected in the build. This allows detection of unused
 tracepoints to be upstream so that outreachy and the mentoring project can
 have new developers look for fixing them, without having these warnings
 suddenly show up when someone upgrades their kernel. When all known unused
 tracepoints are removed, then the "UT=1" build parameter can be removed and
 unused tracepoints will always warn. This will catch new unused tracepoints
 after the current ones have been removed.
 
 - Separate out elf functions from sorttable.c
 
   Move out the ELF parsing functions from sorttable.c so that the tracing
   tooling can use it.
 
 - Add a tracepoint verifier tool to the build process
 
   If "UT=1" is added to the make command line, any unused tracepoints will
   trigger a warning at build time.
 
 - Do not warn about unused tracepoints for tracepoints that are exported
 
   There are several cases where a tracepoint is created by the kernel and
   used by modules. Since there's no easy way to detect if these are truly
   unused when the users are in modules, if a tracepoint is exported, assume
   it will eventually be used by a module. Note, there are not many exported
   tracepoints, so ignoring them should not be a problem.
 
 - Have building of modules also detect unused tracepoints
 
   Do not only check the main vmlinux for unused tracepoints, also check
   modules. If a module is defining a tracepoint it should be using it.
 
 - Add the tracepoint-update program to the ignore file
 
   The new tracepoint-update program needs to be ignored by git.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaS9iLxQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qk4mAP96T/IPPjox1Fd7r/Dpm+JNfYom8AZ8
 WGNL06+aEKRWZwEAqc+u/9k3r964k+pKQ7qwL3ZslG2ALSOdKbFXHpsPpw8=
 =R/qK
 -----END PGP SIGNATURE-----

Merge tag 'tracepoints-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull unused tracepoints update from Steven Rostedt:
 "Detect unused tracepoints.

  If a tracepoint is defined but never used (TRACE_EVENT() created but
  no trace_<tracepoint>() called), it can take up to or more than 5K of
  memory each. This can add up as there are around a hundred unused
  tracepoints with various configs. That is 500K of wasted memory.

  Add a make build parameter of "UT=1" to have the build warn if an
  unused tracepoint is detected in the build. This allows detection of
  unused tracepoints to be upstream so that outreachy and the mentoring
  project can have new developers look for fixing them, without having
  these warnings suddenly show up when someone upgrades their kernel.

  When all known unused tracepoints are removed, then the "UT=1" build
  parameter can be removed and unused tracepoints will always warn. This
  will catch new unused tracepoints after the current ones have been
  removed.

  Summary:

   - Separate out elf functions from sorttable.c

     Move out the ELF parsing functions from sorttable.c so that the
     tracing tooling can use it.

   - Add a tracepoint verifier tool to the build process

      If "UT=1" is added to the make command line, any unused
     tracepoints will trigger a warning at build time.

   - Do not warn about unused tracepoints for tracepoints that are
     exported

      There are several cases where a tracepoint is created by the kernel
      and used by modules. Since there's no easy way to detect if these
      are truly unused when the users are in modules, if a tracepoint is
      exported, assume it will eventually be used by a module. Note, there
      are not many exported tracepoints, so ignoring them should not be a
      problem.

   - Have building of modules also detect unused tracepoints

     Do not only check the main vmlinux for unused tracepoints, also
     check modules. If a module is defining a tracepoint it should be
     using it.

   - Add the tracepoint-update program to the ignore file

     The new tracepoint-update program needs to be ignored by git"

* tag 'tracepoints-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  scripts: add tracepoint-update to the list of ignores files
  tracing: Add warnings for unused tracepoints for modules
  tracing: Allow tracepoint-update.c to work with modules
  tracepoint: Do not warn for unused event that is exported
  tracing: Add a tracepoint verification check at build time
  sorttable: Move ELF parsing into scripts/elf-parse.[ch]
2025-12-05 09:37:41 -08:00
Linus Torvalds 5779de8d36 rtla updates for v6.19:
- Officially add Tomas Glozar as a maintainer to RTLA tool
 
 - Add for_each_monitored_cpu() helper
 
   In multiple places, RTLA tools iterate over the list of CPUs running
   tracer threads.
 
   Use a single helper instead of repeating the for/if combination.
 
 - Remove unused variable option_index in argument parsing
 
   RTLA tools use getopt_long() for argument parsing. For its last
   argument, an unused variable "option_index" is passed.
 
   Remove the variable and pass NULL to getopt_long() to shorten
   the naturally long parsing functions, and make them more readable.
 
 - Fix unassigned nr_cpus after code consolidation
 
   In recent code consolidation, timerlat tool cleanup, previously
   implemented separately for each tool, was moved to a common function
   timerlat_free().
 
   The cleanup relies on nr_cpus being set. This was not done in the new
   function, leaving the variable uninitialized.
 
   Initialize the variable properly, and remove silencing of compiler
   warning for uninitialized variables.
 
 - Stop tracing on user latency in BPF mode
 
   Despite the name, rtla-timerlat's -T/--thread option sets timerlat's
   stop_tracing_total_us option, which also stops tracing on
   return-from-user latency, not only on thread latency.
 
   Implement the same behavior also in BPF sample collection stop tracing
   handler to avoid a discrepancy and restore correspondence of behavior
   with the equivalent option of cyclictest.
 
 - Fix threshold actions always triggering
 
   A bug in threshold action logic caused the action to execute even
   if tracing did not stop because of threshold.
 
   Fix the logic to stop correctly.
 
 - Fix a few minor issues in tests
 
   Extend tests that were shown to need it to 5s, fix osnoise test
   calling timerlat by mistake, and use new, more reliable output
   checking in timerlat's "top stop at failed action" test.
 
 - Do not print usage on argument parsing error
 
   RTLA prints the entire usage message on encountering errors in
   argument parsing, like a malformed CPU list.
 
   The usage message has gotten too long. Instead of printing it,
   use the newly added fatal() helper function to simply exit with
   the error message, excluding the usage.
 
 - Fix unintuitive -C/--cgroup interface
 
   "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that
   being a common way to specify an option with an argument. Moreover,
   using them fails silently and no cgroup is set.
 
   Create new helper function to unify the handling of all such options
   and allow all of:
 
   -Xsomething
   -X=something
   -X something
 
   as well as the equivalent for the long option.
 
 - Fix -a overriding -t argument filename
 
   Fix a bug where -a following -t custom_file.txt overrides the custom
   filename with the default timerlat_trace.txt.
 
 - Stop tracing correctly on multiple events at once
 
   In some race scenarios, RTLA BPF sample collection might send multiple
   stop tracing events via the BPF ringbuffer at once.
 
   Compare the number of events for != 0 instead of == 1 to cover this
   scenario and stop tracing properly.
 -----BEGIN PGP SIGNATURE-----
 
 iIoEABYKADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCaS9bxBQccm9zdGVkdEBn
 b29kbWlzLm9yZwAKCRAp5XQQmuv6qhrgAP0a/AtsL9+IFXAK5JK8aO1XWApVyK9n
 48FRZWu/jrupuAD7BO+EHazmPEourNaUqYPeuymwxT+4O47RH1Q/aasLQwo=
 =RvNH
 -----END PGP SIGNATURE-----

Merge tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull rtla trace tooling updates from Steven Rostedt:

 - Officially add Tomas Glozar as a maintainer to RTLA tool

 - Add for_each_monitored_cpu() helper

   In multiple places, RTLA tools iterate over the list of CPUs running
   tracer threads.

    Use a single helper instead of repeating the for/if combination.

 - Remove unused variable option_index in argument parsing

   RTLA tools use getopt_long() for argument parsing. For its last
   argument, an unused variable "option_index" is passed.

   Remove the variable and pass NULL to getopt_long() to shorten the
    naturally long parsing functions, and make them more readable (see
    the sketch after this list).

 - Fix unassigned nr_cpus after code consolidation

   In recent code consolidation, timerlat tool cleanup, previously
   implemented separately for each tool, was moved to a common function
   timerlat_free().

   The cleanup relies on nr_cpus being set. This was not done in the new
   function, leaving the variable uninitialized.

   Initialize the variable properly, and remove silencing of compiler
   warning for uninitialized variables.

 - Stop tracing on user latency in BPF mode

   Despite the name, rtla-timerlat's -T/--thread option sets timerlat's
   stop_tracing_total_us option, which also stops tracing on
   return-from-user latency, not only on thread latency.

   Implement the same behavior also in BPF sample collection stop
   tracing handler to avoid a discrepancy and restore correspondence of
   behavior with the equivalent option of cyclictest.

 - Fix threshold actions always triggering

   A bug in threshold action logic caused the action to execute even if
   tracing did not stop because of threshold.

   Fix the logic to stop correctly.

 - Fix a few minor issues in tests

   Extend tests that were shown to need it to 5s, fix osnoise test
   calling timerlat by mistake, and use new, more reliable output
   checking in timerlat's "top stop at failed action" test.

 - Do not print usage on argument parsing error

   RTLA prints the entire usage message on encountering errors in
   argument parsing, like a malformed CPU list.

   The usage message has gotten too long. Instead of printing it, use
    the newly added fatal() helper function to simply exit with the error
   message, excluding the usage.

 - Fix unintuitive -C/--cgroup interface

   "-C cgroup" and "--cgroup cgroup" are invalid syntax, despite that
    being a common way to specify an option with an argument. Moreover,
   using them fails silently and no cgroup is set.

   Create new helper function to unify the handling of all such options
   and allow all of:

     -Xsomething
     -X=something
     -X something

   as well as the equivalent for the long option.

 - Fix -a overriding -t argument filename

   Fix a bug where -a following -t custom_file.txt overrides the custom
   filename with the default timerlat_trace.txt.

 - Stop tracing correctly on multiple events at once

   In some race scenarios, RTLA BPF sample collection might send
   multiple stop tracing events via the BPF ringbuffer at once.

    Compare the number of events for != 0 instead of == 1 to cover this
    scenario and stop tracing properly.
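
For reference, getopt_long(3) accepts NULL as its final longindex argument,
which is what the option_index cleanup relies on, and optional arguments
("-C something") are not consumed automatically, which is roughly what the
cgroup fix has to work around. A userspace sketch (names hypothetical; not
the actual rtla code):

  #include <getopt.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
          static const struct option longopts[] = {
                  { "cgroup", optional_argument, NULL, 'C' },
                  { NULL, 0, NULL, 0 },
          };
          int c;

          /* Pass NULL instead of an unused &option_index. */
          while ((c = getopt_long(argc, argv, "C::", longopts, NULL)) != -1) {
                  if (c == 'C') {
                          const char *arg = optarg;

                          /* "-C something": peek at the next argv entry,
                           * since getopt only attaches the value in the
                           * -Cvalue / --cgroup=value forms. */
                          if (!arg && optind < argc && argv[optind][0] != '-')
                                  arg = argv[optind++];
                          printf("cgroup = %s\n", arg ? arg : "(default)");
                  }
          }
          return 0;
  }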

* tag 'trace-tools-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rtla/timerlat: Exit top main loop on any non-zero wait_retval
  rtla/tests: Don't rely on matching ^1ALL
  rtla: Fix -a overriding -t argument
  rtla: Fix -C/--cgroup interface
  tools/rtla: Replace osnoise_hist_usage("...") with fatal("...")
  tools/rtla: Replace osnoise_top_usage("...") with fatal("...")
  tools/rtla: Replace timerlat_hist_usage("...") with fatal("...")
  tools/rtla: Replace timerlat_top_usage("...") with fatal("...")
  tools/rtla: Add fatal() and replace error handling pattern
  rtla/tests: Fix osnoise test calling timerlat
  rtla/tests: Extend action tests to 5s
  tools/rtla: Fix --on-threshold always triggering
  rtla/timerlat_bpf: Stop tracing on user latency
  tools/rtla: Fix unassigned nr_cpus
  tools/rtla: Remove unused optional option_index
  tools/rtla: Add for_each_monitored_cpu() helper
  MAINTAINERS: Add Tomas Glozar as a maintainer to RTLA tool
2025-12-05 09:34:01 -08:00
Linus Torvalds ed1b409137 hardening updates for v6.19-rc1
- string: Add missing kernel-doc return descriptions (Kriish Sharma)
 
 - Update some mis-typed allocations
 
 - Enable GCC diagnostic context for value-tracking warnings
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaS9E5QAKCRA2KwveOeQk
 u5lYAQDEXFBD3+X+k9LNuPS/FLpz5sEI0SOI4lD8xDEjhtmygAD+LVV8yRf6ajPA
 5O2f4hbKnP5+4XHwSiG+CV7QpAgHHwo=
 =6GEw
 -----END PGP SIGNATURE-----

Merge tag 'hardening-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull hardening updates from Kees Cook:

 - string: Add missing kernel-doc return descriptions (Kriish Sharma)

 - Update some mis-typed allocations

   These correct some accidentally wrong types used in allocations (that
   didn't affect the resulting size) that never got picked up from the
   batch I sent a few months ago.

 - Enable GCC diagnostic context for value-tracking warnings

   This results in better GCC diagnostics for the value range tracking,
   so we can get better visibility into where those values are coming
   from when we get out-of-bounds warnings at compile time.

* tag 'hardening-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kbuild: Enable GCC diagnostic context for value-tracking warnings
  string: Add missing kernel-doc return descriptions
  media: iris: Cast iris_hfi_gen2_get_instance() allocation type
  drm/plane: Remove const qualifier from plane->modifiers allocation type
  comedi: Adjust range_table_list allocation type
2025-12-05 09:11:02 -08:00
Linus Torvalds 3ee37abbbd pstore update for v6.19-rc1
- pstore/ram: Update module parameters from platform data (Tzung-Bi Shih)
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYKAB0WIQRSPkdeREjth1dHnSE2KwveOeQkuwUCaS9EPwAKCRA2KwveOeQk
 uyEDAQDZTrI547b000g8gjAWpQpQWB32wkNZOP4ANafAGQXLDwD9G8YxKQFRX+HJ
 mfkvgK8JqOiPjsnvbPJv/F2VdBaQHAk=
 =FDkt
 -----END PGP SIGNATURE-----

Merge tag 'pstore-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull pstore update from Kees Cook:

 - pstore/ram: Update module parameters from platform data (Tzung-Bi Shih)

* tag 'pstore-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  pstore/ram: Update module parameters from platform data
2025-12-05 09:08:13 -08:00
Linus Torvalds 5d45c729ed configfs changes for v6.19
-----BEGIN PGP SIGNATURE-----
 
 iQJKBAABCgA0FiEEEsH5R1a/fCoV1sAS4bgaPnkoY3cFAmku+SoWHGEuaGluZGJv
 cmdAa2VybmVsLm9yZwAKCRDhuBo+eShjd5XpEACgVpU27v59kWhw+vIPQeNfrCzo
 L8Is/w4a0IkBbaeLvp7JhjshQ9ViEuWFJE7un9RVxOm0YC9sisGZYAfNks71nFYE
 /AAwocY4MrCLcDHwtRm5N2eykz6DtpFKorl6Go2WEUN5ye5ulDO7cpMTG5knKKb2
 CiaifWx/oRqsqQxdeA53c8fnlGKJ/R0KJXDwslj2G4tGnyz1YwbrpQdT3QNYjSWb
 hxyoet36Mprylrrfn9LBnGJuCwCXmAcxWAn/vtGwr7SDoL0o4XhcVfcnHblclDvr
 ZVcKWKpesLfqNjBeO0GsBMabPKn6wZvscvtPNh09x57MxfYLPLXnsD+rMmeMOd0P
 X3Wv7aJjvBWsgSOONVI2gRizNilYq8lnzE1BdPMenlxDk8DIh2blGeA2SzZmpvM/
 8tkpv7x/hYKXmcxGyaUPcYIqMCnXkbVHDGI05DJLCRttF21XIDPpQBuVpApMETzD
 nBAZO7sVatprmEi/+n4C8rCu7B5VuSSFW5Q/eO2QeVJmXIoPPG86b78sedIzIRHO
 rH9ox3HWgDTwi4GuBJmf7Qn8/lurS6QncVZCI91cXUnOhf1juACSRMVOOvCP9lio
 M7Dmd3U2QkkkE6h/TwM2o7wBBrzlCLRPL7TeF9ft8m29g+SU9jR8N0/fo2yJUA4F
 EZ9xxiwzNLkLTkqBtg==
 =TV16
 -----END PGP SIGNATURE-----

Merge tag 'configfs-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux

Pull configfs updates from Andreas Hindborg:
 "Two commits changing constness of the configfs vtable pointers. We
  plan to follow up with changes at call sites down the road"

* tag 'configfs-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/a.hindborg/linux:
  configfs: Constify ct_item_ops in struct config_item_type
  configfs: Constify ct_group_ops in struct config_item_type
2025-12-05 08:59:41 -08:00
Steven Rostedt d3042cbe84 ktest.pl: Fix uninitialized var in config-bisect.pl
The error path of copying the old config used the wrong variable in the
error message:

 $ mkdir /tmp/build
 $ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad
 $ chmod 0 /tmp/build
 $ ./tools/testing/ktest/config-bisect.pl -b /tmp/build config-good /tmp/config-bad good
 cp /tmp/build//.config config-good.tmp ... [0 seconds] FAILED!
 Use of uninitialized value $config in concatenation (.) or string at ./tools/testing/ktest/config-bisect.pl line 744.
 failed to copy  to config-good.tmp

When it should have shown:

 failed to copy /tmp/build//.config to config-good.tmp

Cc: stable@vger.kernel.org
Cc: John 'Warthog9' Hawley <warthog9@kernel.org>
Fixes: 0f0db06599 ("ktest: Add standalone config-bisect.pl program")
Link: https://patch.msgid.link/20251203180924.6862bd26@gandalf.local.home
Reported-by: "John W. Krahn" <jwkrahn@shaw.ca>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2025-12-03 18:25:18 -05:00
Steven Rostedt b1e7a590a0 ring-buffer: Add helper functions for allocations
The allocation of the per CPU buffer descriptor, the buffer page
descriptors and the buffer page data itself can be pretty ugly:

  kzalloc_node(ALIGN(sizeof(struct buffer_page), cache_line_size()),
               GFP_KERNEL, cpu_to_node(cpu));

And the data pages:

  page = alloc_pages_node(cpu_to_node(cpu),
                          GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_COMP | __GFP_ZERO, order);
  if (!page)
	return NULL;
  bpage->page = page_address(page);
  rb_init_page(bpage->page);

Add helper functions to make the code easier to read.

This does make all allocations of the data page (bpage->page) use the
__GFP_RETRY_MAYFAIL flag (and not just the bulk allocator), which is
actually better, as allocating the data page for ring buffer tracing
should try hard but not trigger the OOM killer.
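
A sketch of the shape such helpers can take, pulled straight from the two
snippets above (helper names here are hypothetical):

  /* Allocate a cache-line-aligned, zeroed descriptor on the CPU's node. */
  static void *rb_alloc_meta(size_t size, int cpu)
  {
          return kzalloc_node(ALIGN(size, cache_line_size()),
                              GFP_KERNEL, cpu_to_node(cpu));
  }

  /* Allocate and initialize the data page for a buffer page. */
  static int rb_alloc_page_data(struct buffer_page *bpage, int cpu, int order)
  {
          struct page *page;

          page = alloc_pages_node(cpu_to_node(cpu),
                                  GFP_KERNEL | __GFP_RETRY_MAYFAIL |
                                  __GFP_COMP | __GFP_ZERO, order);
          if (!page)
                  return -ENOMEM;

          bpage->page = page_address(page);
          rb_init_page(bpage->page);
          return 0;
  }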

Link: https://lore.kernel.org/all/CAHk-=wjMMSAaqTjBSfYenfuzE1bMjLj+2DLtLWJuGt07UGCH_Q@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251125121153.35c07461@gandalf.local.home
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-12-02 15:49:35 -05:00
Gabriele Monaco bbaacdc339 rv: Fix compilation if !CONFIG_RV_REACTORS
The kernel test robot spotted a compilation error if reactors are
disabled.

Fix the warning by keeping the LTL monitor variable always static.

Cc: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://patch.msgid.link/20251113150618.185479-2-gmonaco@redhat.com
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511131948.vxi5mdjU-lkp@intel.com/
Fixes: 4f739ed19d ("rv: Pass va_list to reactors")
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-12-02 12:33:37 -05:00
Nam Cao b30f635bb6 rv: Convert to use __free
Convert to use __free to tidy up the code.
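
For context, __free() from <linux/cleanup.h> ties a destructor to a local
variable so every return path frees it automatically; a minimal sketch
(struct and helpers hypothetical):

  #include <linux/cleanup.h>
  #include <linux/slab.h>

  struct foo { int x; };                  /* hypothetical */

  static int do_something(int arg)
  {
          struct foo *f __free(kfree) = kzalloc(sizeof(*f), GFP_KERNEL);

          if (!f)
                  return -ENOMEM;

          f->x = arg;
          if (f->x < 0)
                  return -EINVAL;         /* f is kfree()d automatically */

          return 0;                       /* ... here as well */
  }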

Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/62854e2fcb8f8dd2180a98a9700702dcf89a6980.1763370183.git.namcao@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-12-02 07:28:32 +01:00
Nam Cao 8db3790c4d rv: Convert to use lock guard
Convert to use lock guard to tidy up the code.
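
For context, guard() from <linux/cleanup.h> releases the lock when the
scope ends, so error paths need no unlock label; a minimal sketch
(lock and function hypothetical):

  #include <linux/cleanup.h>
  #include <linux/mutex.h>

  static DEFINE_MUTEX(example_mutex);     /* hypothetical lock */

  static int do_locked_work(int arg)
  {
          guard(mutex)(&example_mutex);   /* unlocked on every return */

          if (arg < 0)
                  return -EINVAL;         /* no goto unlock needed */

          return 0;
  }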

Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/dbefeb868093c40d4b29fd6b57294a6aa011b719.1763370183.git.namcao@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-12-02 07:28:20 +01:00
Steven Rostedt f6ed9c5d31 overflow: Introduce struct_offset() to get offset of member
The trace_marker_raw file in tracefs takes a buffer from user space that
contains an id as well as a raw data string which is usually a binary
structure. The structure used has the following:

	struct raw_data_entry {
		struct trace_entry	ent;
		unsigned int		id;
		char			buf[];
	};

Since the passed-in "cnt" variable covers both the size of id and the size
of buf, the code to allocate the location on the ring buffer had:

   size = struct_size(entry, buf, cnt - sizeof(entry->id));

Which is quite ugly and hard to understand. Instead, add a helper macro
called struct_offset() which then changes the above to the simple and easy
to understand:

   size = struct_offset(entry, id) + cnt;

This will likely come in handy for other use cases too.
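
One plausible definition, in the same typeof() style as struct_size() in
<linux/overflow.h> (the merged macro may differ in detail):

  /* Byte offset of @member within the struct that @ptr points to. */
  #define struct_offset(ptr, member) offsetof(typeof(*(ptr)), member)

With that, struct_offset(entry, id) + cnt reads as "everything before id,
plus the cnt bytes covering id and buf".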

Link: https://lore.kernel.org/all/CAHk-=whYZVoEdfO1PmtbirPdBMTV9Nxt9f09CK0k6S+HJD3Zmg@mail.gmail.com/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://patch.msgid.link/20251126145249.05b1770a@gandalf.local.home
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-27 20:18:05 -05:00
Christophe JAILLET f7f7809869 configfs: Constify ct_item_ops in struct config_item_type
Make 'ct_item_ops' const in struct config_item_type.
This allows constification of many structures which hold some function
pointers.
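
A sketch of what this enables at definition sites (struct layout
abbreviated; callback name hypothetical):

  struct config_item_type {
          struct module                           *ct_owner;
          const struct configfs_item_operations   *ct_item_ops;  /* now const */
          /* ... */
  };

  /* Users can now place their ops tables in rodata: */
  static void my_release(struct config_item *item) { /* ... */ }

  static const struct configfs_item_operations my_item_ops = {
          .release = my_release,
  };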

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/f43cb57418a7f59e883be8eedc7d6abe802a2094.1761390472.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
2025-11-27 12:03:27 +01:00
Christophe JAILLET f2f36500a6 configfs: Constify ct_group_ops in struct config_item_type
Make 'ct_group_ops' const in struct config_item_type.
This allows constification of many structures which hold some function
pointers.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Reviewed-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/6b720cf407e8a6d30f35beb72e031b2553d1ab7e.1761390472.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Andreas Hindborg <a.hindborg@kernel.org>
2025-11-27 12:03:27 +01:00
Shengming Hu c264534c39 fgraph: Remove coarse PID filtering from graph_entry()
With PID filtering working via ftrace_pids_enabled() and fgraph_pid_func,
the coarse-grained ftrace_trace_task() check in graph_entry() is obsolete.

It was only a fallback for uninitialized op->private (now fixed), and its
removal ensures consistent PID filtering with standard function tracing.

Also remove unused ftrace_trace_task() definition from trace.h.

Cc: <wang.yaxin@zte.com.cn>
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <zhang.run@zte.com.cn>
Cc: <yang.yang29@zte.com.cn>
Link: https://patch.msgid.link/20251126173552333XoJZN20143fWbsdTEtWoU@zte.com.cn
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:41:35 -05:00
Shengming Hu 1650a1b6cb fgraph: Check ftrace_pids_enabled on registration for early filtering
When registering ftrace_graph, check if ftrace_pids_enabled is active.
If enabled, assign entryfunc to fgraph_pid_func to ensure filtering
is performed before executing the saved original entry function.

Cc: stable@vger.kernel.org
Cc: <wang.yaxin@zte.com.cn>
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <zhang.run@zte.com.cn>
Cc: <yang.yang29@zte.com.cn>
Link: https://patch.msgid.link/20251126173331679XGVF98NLhyLJRdtNkVZ6w@zte.com.cn
Fixes: df3ec5da6a ("function_graph: Add pid tracing back to function graph tracer")
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:41:16 -05:00
Shengming Hu b5d6d3f73d fgraph: Initialize ftrace_ops->private for function graph ops
The ftrace_pids_enabled(op) check relies on op->private being properly
initialized, but fgraph_ops's underlying ftrace_ops->private was left
uninitialized. This caused ftrace_pids_enabled() to always return false,
effectively disabling PID filtering for function graph tracing.

Fix this by copying src_ops->private to dst_ops->private in
fgraph_init_ops(), ensuring PID filter state is correctly propagated.
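
In code terms the fix is a one-line propagation during initialization
(function shape assumed here for illustration):

  static void fgraph_init_ops(struct ftrace_ops *dst_ops,
                              struct ftrace_ops *src_ops)
  {
          /* ... existing initialization of dst_ops ... */

          /* Carry over the PID-filter state so that
           * ftrace_pids_enabled(dst_ops) sees the real data. */
          dst_ops->private = src_ops->private;
  }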

Cc: stable@vger.kernel.org
Cc: <wang.yaxin@zte.com.cn>
Cc: <mhiramat@kernel.org>
Cc: <mark.rutland@arm.com>
Cc: <mathieu.desnoyers@efficios.com>
Cc: <zhang.run@zte.com.cn>
Cc: <yang.yang29@zte.com.cn>
Fixes: c132be2c4f ("function_graph: Have the instances use their own ftrace_ops for filtering")
Link: https://patch.msgid.link/20251126172926004y3hC8QyU4WFOjBkU_UxLC@zte.com.cn
Signed-off-by: Shengming Hu <hu.shengming@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:38:21 -05:00
pengdonglin f83ac7544f function_graph: Enable funcgraph-args and funcgraph-retaddr to work simultaneously
Currently, the funcgraph-args and funcgraph-retaddr features are
mutually exclusive. This patch resolves this limitation by allowing
funcgraph-retaddr to have an args array.

To verify the change, use perf to trace vfs_write with both options
enabled:

Before:
 # perf ftrace -G vfs_write --graph-opts args,retaddr
   ......
   down_read() { /* <-n_tty_write+0xa3/0x540 */
     __cond_resched(); /* <-down_read+0x12/0x160 */
     preempt_count_add(); /* <-down_read+0x3b/0x160 */
     preempt_count_sub(); /* <-down_read+0x8b/0x160 */
   }

After:
 # perf ftrace -G vfs_write --graph-opts args,retaddr
   ......
   down_read(sem=0xffff8880100bea78) { /* <-n_tty_write+0xa3/0x540 */
     __cond_resched(); /* <-down_read+0x12/0x160 */
     preempt_count_add(val=1); /* <-down_read+0x3b/0x160 */
     preempt_count_sub(val=1); /* <-down_read+0x8b/0x160 */
   }

Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Xiaoqin Zhang <zhangxiaoqin@xiaomi.com>
Link: https://patch.msgid.link/20251125093425.2563849-1-dolinux.peng@gmail.com
Signed-off-by: pengdonglin <pengdonglin@xiaomi.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:30 -05:00
Masami Hiramatsu (Google) 20e7168326 tracing: Add boot-time backup of persistent ring buffer
Currently, the persistent ring buffer instance needs to be read before
using it. This means we have to wait for user space to boot up and dump
the persistent ring buffer. However, in that case we cannot start
tracing on it from the kernel cmdline.

To solve this limitation, this adds an option which allows creating
a trace instance as a backup of the persistent ring buffer at boot.
If the user specifies trace_instance=<BACKUP>=<PERSIST_RB> then the
<BACKUP> instance is made as a copy of the <PERSIST_RB> instance.

For example, the below kernel cmdline records all syscalls, scheduler
and interrupt events on the persistent ring buffer `boot_map` but
before starting the tracing, it makes a `backup` instance from the
`boot_map`. Thus, the `backup` instance has the previous boot events.

'reserve_mem=12M:4M:trace trace_instance=boot_map@trace,syscalls:*,sched:*,irq:* trace_instance=backup=boot_map'

As you can see, this just makes a copy of the entire reserved area and
makes a backup instance on it. So you can release (or shrink) the
backup instance after using it, to save memory.

  /sys/kernel/tracing/instances # free
                total        used        free      shared  buff/cache   available
  Mem:        1999284       55704     1930520       10132       13060     1914628
  Swap:             0           0           0
  /sys/kernel/tracing/instances # rmdir backup/
  /sys/kernel/tracing/instances # free
                total        used        free      shared  buff/cache   available
  Mem:        1999284       40640     1945584       10132       13060     1929692
  Swap:             0           0           0

Note: since there is no reason to make a copy of an empty buffer, this
backup only accepts a persistent ring buffer as the original instance.
Also, since this backup is based on vmalloc(), it does not support
user-space mmap().

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/176377150002.219692.9425536150438129267.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:30 -05:00
Steven Rostedt f93a7d0cac ftrace: Allow tracing of some of the tracing code
There are times when tracing the tracing infrastructure can be useful for
debugging the tracing code. Currently all files in the tracing directory
are set to "notrace" their functions.

Add a new config option FUNCTION_SELF_TRACING that will allow some of the
files in the tracing infrastructure to be traced. It requires a config
option to enable because it will add noise to the function tracer if
events and other tracing features are enabled. Tracing functions and
events together
is quite common, so not tracing the event code should be the default.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20251120181514.736f2d5f@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:30 -05:00
Steven Rostedt 400ddf1dbe tracing: Use strim() in trigger_process_regex() instead of skip_spaces()
The function trigger_process_regex() is called by a few functions, where
only one calls strim() on the buffer passed to it. That leaves the other
functions not trimming the end of the buffer passed in and making it a
little inconsistent.

Remove the strim() from event_trigger_regex_write() and have
trigger_process_regex() use strim() instead of skip_spaces(). The buff
variable is not passed in as const, so it can be modified.
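
For reference, a rough sketch of the difference between the two string
helpers (illustrative, not part of the patch):

	char buf[] = "  traceon  ";

	/* skip_spaces() only advances past leading whitespace;
	 * the trailing spaces stay in the buffer. */
	char *p = skip_spaces(buf);	/* p points at "traceon  " */

	/* strim() also trims trailing whitespace in place, then
	 * returns a pointer past the leading whitespace. */
	char *q = strim(buf);		/* q points at "traceon" */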

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20251125214032.323747707@kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:30 -05:00
Steven Rostedt 61d445af0a tracing: Add bulk garbage collection of freeing event_trigger_data
The event trigger data requires a full tracepoint_synchronize_unregister()
call before freeing. That call can take 100s of milliseconds to complete.
In order to allow for bulk freeing of the trigger data, it cannot call
tracepoint_synchronize_unregister() for every individual trigger data
element being freed.

Create a kthread the first time a trigger data element is freed, and
have it use the lockless llist to collect the list of data to free, run
tracepoint_synchronize_unregister(), then free everything in the list.

By freeing hundreds of event_trigger_data elements together, it only
requires two runs of the synchronization function, and not hundreds of
runs. This speeds up the operation by orders of magnitude (milliseconds
instead of several seconds).
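
A minimal sketch of the pattern described above (the llist field and
list names are illustrative, not the actual patch):

	static LLIST_HEAD(trigger_free_list);

	/* producer side: queue the data lock-free, then wake the kthread */
	llist_add(&data->free_node, &trigger_free_list);

	/* GC kthread: grab the whole batch at once, synchronize once,
	 * then free every element in it */
	struct event_trigger_data *iter, *tmp;
	struct llist_node *batch = llist_del_all(&trigger_free_list);

	tracepoint_synchronize_unregister();
	llist_for_each_entry_safe(iter, tmp, batch, free_node)
		kfree(iter);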

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20251125214032.151674992@kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:30 -05:00
Steven Rostedt 78c7051394 tracing: Remove unneeded event_mutex lock in event_trigger_regex_release()
In event_trigger_regex_release(), the only code is:

	mutex_lock(&event_mutex);
	if (file->f_mode & FMODE_READ)
		seq_release(inode, file);
	mutex_unlock(&event_mutex);

	return 0;

There's nothing special about the file->f_mode or the seq_release() that
requires any locking. Remove the unnecessary locks.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20251125214031.975879283@kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:29 -05:00
Steven Rostedt b052d70f7c tracing: Merge struct event_trigger_ops into struct event_command
Now that there's pretty much a one to one mapping between the struct
event_trigger_ops and struct event_command, there's no reason to have two
different structures. Merge the function pointers of event_trigger_ops
into event_command.

There's one exception in trace_events_hist.c for the
event_hist_trigger_named_ops. This has special logic for the init and free
function pointers for "named histograms". In this case, allocate the
cmd_ops of the event_trigger_data and set it to the proper init and free
functions, which are used to initialize and free the event_trigger_data
respectively. Have the free function and the init function (on failure)
free the cmd_ops of the data element.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251125200932.446322765@kernel.org
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:29 -05:00
Steven Rostedt bdafb4d4cb tracing: Remove get_trigger_ops() and add count_func() from trigger ops
The struct event_command has a callback function called get_trigger_ops().
This callback returns the "trigger_ops" to use for the trigger. These ops
define the trigger function, how to init the trigger, how to print the
trigger and how to free it.

The only reason there's a callback function to get these ops is because
some triggers have two types of operations. One is an "always on"
operation, and the other is a "count down" operation, used when a user
passes in a parameter saying how many times the trigger should execute.
For example:

  echo stacktrace:5 > events/kmem/kmem_cache_alloc/trigger

It will trigger the stacktrace for the first 5 times the kmem_cache_alloc
event is hit.

Instead of having two different trigger_ops, since the only difference
between them is the trigger itself (the print, init and free functions
are all the same), just use a single ops that the event_command points
to, and add a count_func function field to the trigger_ops.

When a trigger is added to an event, if there's a count attached to it and
the trigger ops has the count_func field, the data allocated to represent
this trigger will have a new flag set called COUNT.

Then when the trigger executes, it will check if the COUNT data flag is
set, and if so, it will call the ops count_func(). If that returns false,
it returns without executing the trigger.
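
In rough pseudo-C, the execution path becomes (flag and field names here
are illustrative, not the actual patch):

	/* on each event hit */
	if ((data->flags & EVENT_TRIGGER_FL_COUNT) &&
	    !data->cmd_ops->count_func(data))
		return;		/* countdown exhausted: skip the trigger */

	data->cmd_ops->trigger(data);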

This removes the need for duplicate event_trigger_ops structures.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251125200932.274566147@kernel.org
Reviewed-by: Tom Zanussi <zanussi@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:29 -05:00
Masami Hiramatsu (Google) 23c0e9cc76 tracing: Show the tracer options in boot-time created instance
Since tracer_init_tracefs_work_func() only updates the tracer options
for the global_trace, the instances created by the kernel cmdline
do not have those options.

Fix this by updating the tracer options for those boot-time created
instances so that those options are shown.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/176354112555.2356172.3989277078358802353.stgit@mhiramat.tok.corp.google.com
Fixes: 428add559b ("tracing: Have tracer option be instance specific")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:29 -05:00
Menglong Dong 7a6735cc9b ftrace: Avoid redundant initialization in register_ftrace_direct
The FTRACE_OPS_FL_INITIALIZED flag is cleared in register_ftrace_direct,
which can cause it to be initialized again by ftrace_ops_init() even if
it is already initialized. This seems to be no big deal, but let's still
fix it.

Link: https://patch.msgid.link/20251110121808.1559240-1-dongml2@chinatelecom.cn
Fixes: f64dd4627e ("ftrace: Add multi direct register/unregister interface")
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:28 -05:00
Steven Rostedt 49c1364c7c tracing: Remove unused variable in tracing_trace_options_show()
The flags and opts used in tracing_trace_options_show() now come directly
from the trace array "current_trace_flags" and not from the current_trace.
The variable "trace" was still being assigned tr->current_trace but never
used. This caused a warning in clang.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251117120637.43ef995d@gandalf.local.home
Reported-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Tested-by: Andy Shevchenko <andriy.shevchenko@intel.com>
Closes: https://lore.kernel.org/all/aRtHWXzYa8ijUIDa@black.igk.intel.com/
Fixes: 428add559b ("tracing: Have tracer option be instance specific")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:28 -05:00
Steven Rostedt ac87b220a6 fgraph: Make fgraph_no_sleep_time signed
The variable fgraph_no_sleep_time changed from being a boolean to being a
counter. A check is made to make sure that it never goes below zero. But
since the variable is unsigned, the comparison against zero is always
false, so the check never triggers even when the counter does go below
zero.

Make the variable a signed int so that checking it going below zero
works.
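
The underlying C pitfall, in a nutshell (illustrative, not the kernel
code):

	unsigned int x = 0;

	x--;			/* wraps to UINT_MAX rather than -1 */
	if (x < 0)		/* always false for an unsigned type, */
		handle_underflow();	/* so this can never run */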

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251125104751.4c9c7f28@gandalf.local.home
Fixes: 5abb6ccb58 ("tracing: Have function graph tracer option sleep-time be per instance")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/all/aR1yRQxDmlfLZzoo@stanley.mountain/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-26 15:13:28 -05:00
Kees Cook 7454048db2 kbuild: Enable GCC diagnostic context for value-tracking warnings
Enable GCC 16's coming "-fdiagnostics-show-context=N" option[1] to
provide enhanced diagnostic information for value-tracking warnings,
which displays the control flow chain leading to the diagnostic. This
covers our existing use of -Wrestrict and -Wstringop-overread, and
gets us closer to enabling -Warray-bounds, -Wstringop-overflow, and
-Wstringop-truncation, so we can track the rationale for a warning,
letting us more quickly identify actual issues versus what have in the
past looked like false positives. Fixes based on this work have already
been landing, e.g.:

  4a6f18f286 ("net/mlx4_core: Avoid impossible mlx4_db_alloc() order value")
  8a39f1c870 ("ovl: Check for NULL d_inode() in ovl_dentry_upper()")
  e5f7e4e0a4 ("drm/amdgpu/atom: Work around vbios NULL offset false positive")

The context depth ("=N") provides the immediate decision path that led
to the problematic code location, showing conditional checks and branch
decisions that caused the warning. This will help us understand why
GCC's value-tracking analysis triggered the warning and makes it easier
to determine whether warnings are legitimate issues or false positives.

For example, an array bounds warning will now show the conditional
statements (like "if (i >= 4)") that established the out-of-bounds access
range, directly connecting the control flow to the warning location.
This is particularly valuable when GCC's interprocedural analysis can
generate warnings that are difficult to understand without seeing the
inferred control flow.

While my testing has shown that "=1" reports enough for finding
the origin of most bounds issues, I have used "=2" here just to be
conservative. Build time measurements with this option off, =1, and =2
are all within noise of each other, so there seems to be no harm in "turning
it up". If we need to, we can make this value configurable in the future.

Link: https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=6faa3cfe60ff9769d1bebfffdd2c7325217d7389 [1]
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://patch.msgid.link/20251121184342.it.626-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-24 12:44:05 -08:00
Kriish Sharma 645b9ad2dc string: Add missing kernel-doc return descriptions
While running kernel-doc validation on linux-next, warnings were emitted
for functions in include/linux/string.h due to missing return value
documentation:

    Warning: include/linux/string.h:375 No description found for return value of 'kbasename'
    Warning: include/linux/string.h:560 No description found for return value of 'strstarts'

This patch adds the missing return value descriptions for both functions
and clears the related kernel-doc warnings.
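
The additions follow the standard kernel-doc "Return:" form; roughly
(the wording here is illustrative):

	/**
	 * strstarts - does @str start with @prefix?
	 * @str: string to examine
	 * @prefix: prefix to look for
	 *
	 * Return: true if @str starts with @prefix, false otherwise.
	 */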

Signed-off-by: Kriish Sharma <kriish.sharma2006@gmail.com>
Reviewed-by: Andy Shevchenko <andy@kernel.org>
Link: https://patch.msgid.link/20251118184828.2621595-1-kriish.sharma2006@gmail.com
Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-24 12:44:05 -08:00
Kees Cook fbcc2150aa media: iris: Cast iris_hfi_gen2_get_instance() allocation type
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "struct iris_inst *", but the returned type is
"struct iris_inst_hfi_gen2 *". The allocation is intentionally larger as
the first member of struct iris_inst_hfi_gen2 is struct iris_inst, so
this is by design. Cast the allocation type to match the assignment.
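
The pattern being preserved looks roughly like this (a sketch, not the
driver's actual code):

	struct iris_inst_hfi_gen2 {
		struct iris_inst inst;	/* must remain the first member */
		/* gen2-specific members follow */
	};

	/* allocate the larger wrapper, return the embedded base object */
	struct iris_inst_hfi_gen2 *gen2 = kzalloc(sizeof(*gen2), GFP_KERNEL);

	return gen2 ? &gen2->inst : NULL;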

Link: https://patch.msgid.link/20250426061526.work.106-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-24 12:44:05 -08:00
Kees Cook 961c989c5f drm/plane: Remove const qualifier from plane->modifiers allocation type
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The assigned type is "uint64_t *", but the returned type, while matching,
will be const qualified. As there is no general way to remove const
qualifiers, adjust the allocation type to match the assignment.

Link: https://patch.msgid.link/20250426061325.work.665-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-24 12:43:52 -08:00
Kees Cook 5146f56dee comedi: Adjust range_table_list allocation type
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)

The returned type is "struct comedi_lrange **", but the assigned type,
while technically matching, is const qualified. Since there is no general
way to remove const qualifiers, switch the returned type to match the
assigned type. No change in allocation size results.

Link: https://patch.msgid.link/20250426061015.work.971-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-24 12:43:28 -08:00
Crystal Wood 3138df6f0c rtla/timerlat: Exit top main loop on any non-zero wait_retval
Comparing to exactly 1 will fail if more than one ring buffer
event was seen since the last call to timerlat_bpf_wait(), which
can happen in some race scenarios.

Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-5-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Crystal Wood 61f1fd5d69 rtla/tests: Don't rely on matching ^1ALL
The timerlat "top stop at failed action" test was relying on "ALL" being
printed immediately after the "1" from the threshold action.  Besides being
fragile, this depends on stdbuf behavior, which is easy to miss when
recreating the test outside of the framework for debugging purposes.

Instead, use the expected/unexpected text mechanism from the
corresponding osnoise test.

Signed-off-by: Crystal Wood <crwood@redhat.com>
Link: https://lore.kernel.org/r/20251112152529.956778-2-crwood@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Ivan Pravdin ddb6e42494 rtla: Fix -a overriding -t argument
When running rtla as

    `rtla <timerlat|osnoise> <top|hist> -t custom_file.txt -a 100`

the -a option overrides the trace output filename specified by the -t
option. Running the command above will create the
<timerlat|osnoise>_trace.txt file instead of custom_file.txt. Fix this
by making sure that the -a option does not override the trace output
filename even if it is passed after the trace output filename is
specified.

Fixes: 173a3b0148 ("rtla/timerlat: Add the automatic trace option")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/b6ae60424050b2c1c8709e18759adead6012b971.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Ivan Pravdin 7b71f3a698 rtla: Fix -C/--cgroup interface
Currently, the user can only specify a cgroup for the tracer's threads
in the following ways:

    `-C[cgroup]`
    `-C[=cgroup]`
    `--cgroup[=cgroup]`

If the user tries to specify the cgroup as `-C [cgroup]` or
`--cgroup [cgroup]`, the parser silently fails and rtla's own cgroup is
used for the tracer threads.

To make the interface more user-friendly, allow the user to specify the
cgroup in the aforementioned ways, i.e. `-C [cgroup]` and
`--cgroup [cgroup]`.

Refactor identical logic between -t/--trace and -C/--cgroup into a
common function.

Change documentation to reflect this user interface change.

Fixes: a957cbc025 ("rtla: Add -C cgroup support")
Signed-off-by: Ivan Pravdin <ipravdin.official@gmail.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/16132f1565cf5142b5fbd179975be370b529ced7.1762186418.git.ipravdin.official@gmail.com
[ use capital letter in subject, as required by tracing subsystem ]
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 49c1579419 tools/rtla: Replace osnoise_hist_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace osnoise_hist_usage("...") with fatal("...") on errors.

Remove the already unused 'usage' argument from osnoise_hist_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-6-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 92b5b55e5e tools/rtla: Replace osnoise_top_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace osnoise_top_usage("...") with fatal("...") on errors.

Remove the already unused 'usage' argument from osnoise_top_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-5-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 8f4264e046 tools/rtla: Replace timerlat_hist_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace timerlat_hist_usage("...\n") with fatal("...") on errors.

Remove the already unused 'usage' argument from timerlat_hist_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-4-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 4e5e7210f9 tools/rtla: Replace timerlat_top_usage("...") with fatal("...")
A long time ago, when the usage help was short, it was a favor
to the user to show it on error. Now that the usage help has
become very long, it is too noisy to dump the complete help text
for each typo after the error message itself.

Replace timerlat_top_usage("...\n") with fatal("...") on errors.

Remove the already unused 'usage' argument from timerlat_top_usage().

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-3-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Costa Shulyupin 8cbb25db81 tools/rtla: Add fatal() and replace error handling pattern
The code contains some technical debt in error handling,
which complicates the consolidation of duplicated code.

Introduce a fatal() function to replace the common pattern of
err_msg() followed by exit(EXIT_FAILURE), reducing the length of an
already long function.

Further patches using fatal() follow.
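
A minimal user-space sketch of such a helper (using vfprintf() here in
place of rtla's err_msg()):

	#include <stdarg.h>
	#include <stdio.h>
	#include <stdlib.h>

	static void fatal(const char *fmt, ...)
	{
		va_list ap;

		va_start(ap, fmt);
		vfprintf(stderr, fmt, ap);	/* err_msg() equivalent */
		va_end(ap);
		exit(EXIT_FAILURE);
	}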

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251011082738.173670-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:27 +01:00
Tomas Glozar 34c170ae5c rtla/tests: Fix osnoise test calling timerlat
osnoise test "top stop at failed action" is calling timerlat instead of
osnoise by mistake.

Fix it so that it calls the correct RTLA subcommand.

Fixes: 05b7e10687 ("tools/rtla: Add remaining support for osnoise actions")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-3-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:30:00 +01:00
Tomas Glozar d649e9f04c rtla/tests: Extend action tests to 5s
In non-BPF mode, it takes up to 1 second for RTLA to notice that tracing
has been stopped. That means that action tests cannot have a 1 second
duration, as the SIGALRM will be racing with the threshold overflow.

Previously, non-BPF mode actions were buggy and always executed
the action, even when stopping on duration or SIGINT, preventing
this issue from manifesting. Now that this has been fixed, the tests
have become flaky, and this has to be adjusted.

Fixes: 4e26f84abf ("rtla/tests: Add tests for actions")
Fixes: 05b7e10687 ("tools/rtla: Add remaining support for osnoise actions")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-2-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-21 10:29:01 +01:00
Tomas Glozar 417bd0d502 tools/rtla: Fix --on-threshold always triggering
Commit 8d933d5c89 ("rtla/timerlat: Add continue action") moved the
code performing on-threshold actions (enabled through --on-threshold
option) to inside the RTLA main loop.

The condition in the loop does not check whether the threshold was
actually exceeded or if stop tracing was requested by the user through
SIGINT or duration. This leads to a bug where on-threshold actions are
always performed, even when the threshold was not hit.

(BPF mode is not affected, since it uses a different condition in the
while loop.)

Add a condition that checks for !stop_tracing before executing the
actions. Also, fix incorrect brackets in hist_main_loop to match the
semantics of top_main_loop.

Fixes: 8d933d5c89 ("rtla/timerlat: Add continue action")
Fixes: 2f3172f9dd ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Reviewed-by: Crystal Wood <crwood@redhat.com>
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251007095341.186923-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:55 +01:00
Tomas Glozar e4240db933 rtla/timerlat_bpf: Stop tracing on user latency
rtla-timerlat allows a *thread* latency threshold to be set via the
-T/--thread option. However, the timerlat tracer calls this *total*
latency (stop_tracing_total_us), and stops tracing also when the
return-to-user latency is over the threshold.

Change the behavior of the timerlat BPF program to reflect what the
timerlat tracer is doing, to avoid discrepancy between stopping
collecting data in the BPF program and stopping tracing in the timerlat
tracer.

Cc: stable@vger.kernel.org
Fixes: e34293ddce ("rtla/timerlat: Add BPF skeleton to collect samples")
Reviewed-by: Wander Lairson Costa <wander@redhat.com>
Link: https://lore.kernel.org/r/20251006143100.137255-1-tglozar@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:55 +01:00
Costa Shulyupin b4275b2301 tools/rtla: Fix unassigned nr_cpus
In the recently introduced timerlat_free(), the variable 'nr_cpus' is
never assigned.

Assign it with sysconf(_SC_NPROCESSORS_CONF), as done elsewhere.
Remove the culprit -Wno-maybe-uninitialized flag; the rest of the
code is clean.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Fixes: 2f3172f9dd ("tools/rtla: Consolidate code between osnoise/timerlat and hist/top")
Link: https://lore.kernel.org/r/20251002170846.437888-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:54 +01:00
Costa Shulyupin 671314fce1 tools/rtla: Remove unused optional option_index
The longindex argument of getopt_long() is optional
and tied to the unused local variable option_index.

Remove it to shorten the four longest functions
and make the code neater.

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-2-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:53 +01:00
Costa Shulyupin 04fa6bf373 tools/rtla: Add for_each_monitored_cpu() helper
The rtla tools have many instances of iterating over CPUs while
checking if they are monitored.

Add a for_each_monitored_cpu() helper macro to make the code
more readable and reduce code duplication.
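
A sketch of what such a macro could look like (the field and variable
names are assumptions, not rtla's actual layout):

	/* iterate over CPUs, skipping the ones not being monitored */
	#define for_each_monitored_cpu(cpu, params)			\
		for ((cpu) = 0; (cpu) < nr_cpus; (cpu)++)		\
			if (CPU_ISSET((cpu), &(params)->monitored_cpus))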

Signed-off-by: Costa Shulyupin <costa.shul@redhat.com>
Reviewed-by: Tomas Glozar <tglozar@redhat.com>
Link: https://lore.kernel.org/r/20251002123553.389467-1-costa.shul@redhat.com
Signed-off-by: Tomas Glozar <tglozar@redhat.com>
2025-11-20 13:15:53 +01:00
Steven Rostedt bc089c4725 tracing: Convert function graph set_flags() to use a switch() statement
Currently the set_flags() of the function graph tracer has a bunch of:

  if (bit == FLAG1) {
	[..]
  }

  if (bit == FLAG2) {
	[..]
  }

To clean it up a bit, convert it over to a switch statement.
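
i.e. roughly:

  switch (bit) {
  case FLAG1:
	[..]
	break;
  case FLAG2:
	[..]
	break;
  }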

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251114192319.117123664@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-14 14:30:55 -05:00
Steven Rostedt 5abb6ccb58 tracing: Have function graph tracer option sleep-time be per instance
Currently the option to have function graph tracer to ignore time spent
when a task is sleeping is global when the interface is per-instance.
Changing the value in one instance will affect the results of another
instance that is also running the function graph tracer. This can lead to
confusing results.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251114192318.950255167@kernel.org
Fixes: c132be2c4f ("function_graph: Have the instances use their own ftrace_ops for filtering")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-14 14:30:55 -05:00
Steven Rostedt 4132886e1b tracing: Move graph-time out of function graph options
The option "graph-time" affects the function profiler when it is using the
function graph infrastructure. It has nothing to do with the function
graph tracer itself. The option only affects the global function profiler
and does nothing to the function graph tracer.

Move it out of the function graph tracer options and make it a global
option that is only available at the top level instance.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251114192318.781711154@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-14 14:30:55 -05:00
Steven Rostedt 6479325eca tracing: Have function graph tracer option funcgraph-irqs be per instance
Currently the option to trace interrupts in the function graph tracer is
global when the interface is per-instance. Changing the value in one
instance will affect the results of another instance that is also running
the function graph tracer. This can lead to confusing results.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251114192318.613867934@kernel.org
Fixes: c132be2c4f ("function_graph: Have the instances use their own ftrace_ops for filtering")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-14 14:30:54 -05:00
Steven Rostedt 0d5077c73a MAINTAINERS: Add Tomas Glozar as a maintainer to RTLA tool
Tomas will start taking over managing the changes to the Real-time Linux
Analysis (RTLA) tool. Make him officially one of the maintainers.

Also update the RTLA entry to include the linux-kernel mailing list as
well as list the patchwork and git repository that the patches will go
through.

Cc: Tomas Glozar <tglozar@redhat.com>
Cc: Gabriele Monaco <gmonaco@redhat.com>
Cc: John Kacur <jkacur@redhat.com>
Cc: Crystal Wood <crwood@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: https://patch.msgid.link/20251112113556.47ec9d12@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-14 12:24:41 -05:00
Yongliang Gao 97e047f44d trace/pid_list: optimize pid_list->lock contention
When the system has many cores and task switching is frequent,
setting set_ftrace_pid can cause frequent pid_list->lock contention
and high system sys usage.

For example, in a 288-core VM environment, we observed 267 CPUs
experiencing contention on pid_list->lock, with stack traces showing:

 #4 [ffffa6226fb4bc70] native_queued_spin_lock_slowpath at ffffffff99cd4b7e
 #5 [ffffa6226fb4bc90] _raw_spin_lock_irqsave at ffffffff99cd3e36
 #6 [ffffa6226fb4bca0] trace_pid_list_is_set at ffffffff99267554
 #7 [ffffa6226fb4bcc0] trace_ignore_this_task at ffffffff9925c288
 #8 [ffffa6226fb4bcd8] ftrace_filter_pid_sched_switch_probe at ffffffff99246efe
 #9 [ffffa6226fb4bcf0] __schedule at ffffffff99ccd161

Replace the existing spinlock with a seqlock to allow concurrent readers
while maintaining write exclusivity.
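
The read side then becomes the usual seqlock retry loop; a sketch (the
lookup helper name is illustrative):

	unsigned int seq;
	bool ret;

	do {
		seq = read_seqbegin(&pid_list->lock);
		ret = pid_in_list(pid_list, pid);	/* lockless lookup */
	} while (read_seqretry(&pid_list->lock, seq));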

Link: https://patch.msgid.link/20251113000252.1058144-1-leonylgao@gmail.com
Reviewed-by: Huang Cun <cunhuang@tencent.com>
Signed-off-by: Yongliang Gao <leonylgao@tencent.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-13 15:15:54 -05:00
Steven Rostedt e29aa918a9 tracing: Have function graph tracer define options per instance
Currently the function graph tracer's options are saved via a global mask
when they should be per instance. Use the new infrastructure to define a
"default_flags" field in the tracer structure that is used for the top
level instance as well as new ones.

Currently the global mask causes confusion:

  # cd /sys/kernel/tracing
  # mkdir instances/foo
  # echo function_graph > instances/foo/current_tracer
  # echo 1 > options/funcgraph-args
  # echo function_graph > current_tracer
  # cat trace
[..]
 2)               |          _raw_spin_lock_irq(lock=0xffff96b97dea16c0) {
 2)   0.422 us    |            do_raw_spin_lock(lock=0xffff96b97dea16c0);
 7)               |              rcu_sched_clock_irq(user=0) {
 2)   1.478 us    |          }
 7)   0.758 us    |                rcu_is_cpu_rrupt_from_idle();
 2)   0.647 us    |          enqueue_hrtimer(timer=0xffff96b97dea2058, base=0xffff96b97dea1740, mode=0);
 # cat instances/foo/options/funcgraph-args
 1
 # cat instances/foo/trace
[..]
 4)               |  __x64_sys_read() {
 4)               |    ksys_read() {
 4)   0.755 us    |      fdget_pos();
 4)               |      vfs_read() {
 4)               |        rw_verify_area() {
 4)               |          security_file_permission() {
 4)               |            apparmor_file_permission() {
 4)               |              common_file_perm() {
 4)               |                aa_file_perm() {
 4)               |                  rcu_read_lock_held() {
[..]

The above shows that updating the "funcgraph-args" option at the top level
instance also updates the "funcgraph-args" option in the instance, but
because the update is only done by the instance that gets changed (as it
should be), it's confusing to see that the option is already set in the
other instance.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251111232429.641030027@kernel.org
Fixes: c132be2c4f ("function_graph: Have the instances use their own ftrace_ops for filtering")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-13 15:08:17 -05:00
Steven Rostedt 76680d0d28 tracing: Have function tracer define options per instance
Currently the function tracer's options are saved via a global mask when
it should be per instance. Use the new infrastructure to define a
"default_flags" field in the tracer structure that is used for the top
level instance as well as new ones.

Currently the global mask causes confusion:

  # cd /sys/kernel/tracing
  # mkdir instances/foo
  # echo function > instances/foo/current_tracer
  # echo 1 > options/func-args
  # echo function > current_tracer
  # cat trace
[..]
  <idle>-0       [005] d..3.  1050.656187: rcu_needs_cpu() <-tick_nohz_next_event
  <idle>-0       [005] d..3.  1050.656188: get_next_timer_interrupt(basej=0x10002dbad, basem=0xf45fd7d300) <-tick_nohz_next_event
  <idle>-0       [005] d..3.  1050.656189: _raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt
  <idle>-0       [005] d..4.  1050.656190: do_raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt
  <idle>-0       [005] d..4.  1050.656191: _raw_spin_lock_nested(lock=0xffff8944bdf5f140, subclass=1) <-__get_next_timer_interrupt
 # cat instances/foo/options/func-args
 1
 # cat instances/foo/trace
[..]
  kworker/4:1-88      [004] ...1.   298.127735: next_zone <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127736: first_online_pgdat <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127738: next_online_pgdat <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127739: fold_diff <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127741: round_jiffies_relative <-vmstat_update
[..]

The above shows that updating the "func-args" option at the top level
instance also updates the "func-args" option in the instance, but because
the update is only done by the instance that gets changed (as it should
be), it's confusing to see that the option is already set in the other
instance.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251111232429.470883736@kernel.org
Fixes: f20a580627 ("ftrace: Allow instances to use function tracing")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-12 09:59:54 -05:00
Steven Rostedt 428add559b tracing: Have tracer option be instance specific
Tracers can specify options to modify their behavior. This logic was
added before instances were created and the tracer flags were global
variables. After instances were created, where a tracer may exist in more
than one instance, the flags were not updated from being global into
being instance specific. This causes confusion with these options. For
example, the function tracer has an option to enable function arguments:

  # cd /sys/kernel/tracing
  # mkdir instances/foo
  # echo function > instances/foo/current_tracer
  # echo 1 > options/func-args
  # echo function > current_tracer
  # cat trace
[..]
  <idle>-0       [005] d..3.  1050.656187: rcu_needs_cpu() <-tick_nohz_next_event
  <idle>-0       [005] d..3.  1050.656188: get_next_timer_interrupt(basej=0x10002dbad, basem=0xf45fd7d300) <-tick_nohz_next_event
  <idle>-0       [005] d..3.  1050.656189: _raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt
  <idle>-0       [005] d..4.  1050.656190: do_raw_spin_lock(lock=0xffff8944bdf5de80) <-__get_next_timer_interrupt
  <idle>-0       [005] d..4.  1050.656191: _raw_spin_lock_nested(lock=0xffff8944bdf5f140, subclass=1) <-__get_next_timer_interrupt
 # cat instances/foo/options/func-args
 1
 # cat instances/foo/trace
[..]
  kworker/4:1-88      [004] ...1.   298.127735: next_zone <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127736: first_online_pgdat <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127738: next_online_pgdat <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127739: fold_diff <-refresh_cpu_vm_stats
  kworker/4:1-88      [004] ...1.   298.127741: round_jiffies_relative <-vmstat_update
[..]

The above shows that setting "func-args" in the top level instance also
set it in the instance "foo", but since the interface of the trace flags
is per instance, the update didn't take effect in the "foo" instance.

Update the infrastructure to allow tracers to add a "default_flags" field
in the tracer structure that can be set instead of "flags" which will make
the flags per instance. If a tracer needs to keep the flags global (like
blktrace), keeping the "flags" field set will keep the old behavior.

This does not update function or the function graph tracers. That will be
handled later.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251111232429.305317942@kernel.org
Fixes: f20a580627 ("ftrace: Allow instances to use function tracing")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-12 09:59:54 -05:00
Menglong Dong cd06078a38 tracing: fprobe: use ftrace if CONFIG_DYNAMIC_FTRACE_WITH_ARGS
For now, we use ftrace for the fprobe if fp->exit_handler does not exist
and CONFIG_DYNAMIC_FTRACE_WITH_REGS is enabled.

However, CONFIG_DYNAMIC_FTRACE_WITH_REGS is not supported by some
architectures, such as arm. What we need in the fprobe is the function
arguments, so we can use ftrace for fprobe if
CONFIG_DYNAMIC_FTRACE_WITH_ARGS is enabled.

Therefore, use ftrace if either CONFIG_DYNAMIC_FTRACE_WITH_REGS or
CONFIG_DYNAMIC_FTRACE_WITH_ARGS is enabled.

Link: https://lore.kernel.org/all/20251103063434.47388-1-dongml2@chinatelecom.cn/

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-11 22:32:10 +09:00
Menglong Dong 08ed5c81f6 lib/test_fprobe: add testcase for mixed fprobe
Add a testcase for the fprobe which hooks the same target with two
fprobes: entry only, and entry+exit. The two fprobes are registered in
different orders.

fgraph and ftrace are both used for the fprobe, and this testcase covers
the mixed situation.

Link: https://lore.kernel.org/all/20251015083238.2374294-3-dongml2@chinatelecom.cn/

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-11 22:32:09 +09:00
Menglong Dong 2c67dc457b tracing: fprobe: optimization for entry only case
For now, fgraph is used for the fprobe even if we only need to trace the
entry. However, the performance of ftrace is better than fgraph, and we
can use ftrace_ops for this case.

The performance of kprobe-multi then increases from 54M/s to 69M/s.
Before this commit:

  $ ./benchs/run_bench_trigger.sh kprobe-multi
  kprobe-multi   :   54.663 ± 0.493M/s

After this commit:

  $ ./benchs/run_bench_trigger.sh kprobe-multi
  kprobe-multi   :   69.447 ± 0.143M/s

Mitigations are disabled during the benchmark runs above.

Link: https://lore.kernel.org/all/20251015083238.2374294-2-dongml2@chinatelecom.cn/

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-11 22:32:09 +09:00
Masami Hiramatsu (Google) e667152e00 tracing: fprobe: Fix to init fprobe_ip_table earlier
Since the fprobe_ip_table is used from module unloading in
the failure path of load_module(), it must be initialized
earlier than late_initcall(). Otherwise,
fprobe_module_callback() will use an uninitialized spinlock of
fprobe_ip_table.

Initialize fprobe_ip_table in core_initcall(), which is the same
timing as ftrace.

Link: https://lore.kernel.org/all/175939434403.3665022.13030530757238556332.stgit@mhiramat.tok.corp.google.com/

Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202509301440.be4b3631-lkp@intel.com
Fixes: e5a4cc28a052 ("tracing: fprobe: use rhltable for fprobe_ip_table")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Menglong Dong <menglong8.dong@gmail.com>
2025-11-11 22:32:09 +09:00
Thomas Weißschuh 69d8895cb9 rv: Add explicit lockdep context for reactors
Reactors can be called from any context through tracepoints.
When developing reactors, care needs to be taken to only call APIs which
are safe. As the tracepoints used during testing may not actually be
called from restrictive contexts, lockdep may not be helpful.

Add explicit overrides to help lockdep find invalid code patterns.

The usage of LD_WAIT_FREE will trigger lockdep warnings in the panic
reactor. These are indeed valid warnings but they are out of scope for
RV and will instead be fixed by the printk subsystem.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Acked-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20251014-rv-lockdep-v1-3-0b9e51919ea8@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-11-11 13:18:56 +01:00
Thomas Weißschuh 68f63cea46 rv: Make rv_reacting_on() static
There are no external users left.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20251014-rv-lockdep-v1-2-0b9e51919ea8@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-11-11 13:18:56 +01:00
Thomas Weißschuh 4f739ed19d rv: Pass va_list to reactors
The only thing the reactors can do with the passed in varargs is to
convert it into a va_list. Do that in a central helper instead.
It simplifies the reactors, removes some hairy macro-generated code
and introduces a convenient hook point to modify reactor behavior.

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Link: https://lore.kernel.org/r/20251014-rv-lockdep-v1-1-0b9e51919ea8@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-11-11 13:18:55 +01:00
Gabriele Monaco 0c0cd931a0 selftests/verification: Add initial RV tests
Add a series of tests to validate the RV tracefs API and basic
functionality.

* available monitors:
    Check that all monitors (from the monitors folder) appear as
    available and have a description. Works with nested monitors.

* enable/disable:
    Enable and disable all monitors and validate both the enabled file
    and the enabled_monitors. Check that enabling container monitors
    enables all nested monitors.

* reactors:
    Set all reactors and validate the setting, also for nested monitors.

* wwnr with printk:
    wwnr is broken on purpose, run it with a load and check that the
    printk reactor works. Also validate disabling reacting_on or
    monitoring_on prevents reactions.

These tests use the ftracetest suite.

Acked-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20251017115203.140080-3-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-11-11 13:18:55 +01:00
Gabriele Monaco a0aa283c53 selftest/ftrace: Generalise ftracetest to use with RV
The ftracetest script is a fairly complete test framework for
tracefs-like subsystems, but it can only be used for ftrace selftests.

If OPT_TEST_DIR is provided and includes a function file, use that as
test directory going forward rather than just grabbing tests from it.

Generalise function names like initialize_ftrace to initialize_system.

Add the --rv argument to set up the test for rv, basically changing the
trace directory to $TRACING_DIR/rv and displaying an error if that
cannot be found.

This prepares for rv selftests inclusion.

Link: https://lore.kernel.org/r/20251017115203.140080-2-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2025-11-11 12:40:15 +01:00
Masami Hiramatsu (Google) 7157062bb4 tracing: Report wrong dynamic event command
Report wrong dynamic event type in the command via error_log.
-----
 # echo "z hoge" > /sys/kernel/tracing/dynamic_events
 sh: write error: Invalid argument
 # cat /sys/kernel/tracing/error_log
 [   22.977022] dynevent: error: No matching dynamic event type
   Command: z hoge
            ^
-----

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/176278970056.343441.10528135217342926645.stgit@devnote2
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-10 19:26:14 -05:00
Steven Rostedt 3a0d5bc76f tracing: Use switch statement instead of ifs in set_tracer_flag()
The "mask" passed in to set_trace_flag() has a single bit set. The
function then checks if the mask is equal to one of the option masks and
performs the appropriate function associated to that option.

Instead of having a bunch of "if ()" statement, use a "switch ()"
statement instead to make it cleaner and a bit more optimal.

No functional changes.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251106003501.890298562@kernel.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-10 14:33:51 -05:00
Steven Rostedt 9c5053083e tracing: Exit out immediately after update_marker_trace()
The call to update_marker_trace() in set_tracer_flag() performs the update
to the tr->trace_flags. There's no reason to perform it again after it is
called. Return immediately instead.

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251106003501.726406870@kernel.org
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-10 14:33:43 -05:00
Steven Rostedt 5aa0d18df0 tracing: Have add_tracer_options() error pass up to callers
The function add_tracer_options() can fail, but currently its return
value is ignored. Pass the status of add_tracer_options() up to adding a
new tracer as well as to when an instance is created. Have the instance
creation fail if add_tracer_options() fails.

Only print a warning for the top level instance, like it does with other
failures.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251105161935.375299297@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-10 14:28:31 -05:00
Steven Rostedt c7bed15ccf tracing: Remove dummy options and flags
When a tracer does not define its own flags, dummy options and flags are
used so that the values are always valid. There are not that many
locations that reference these values, so having dummy versions just
complicates the code. Remove the dummy values and just check for NULL
when appropriate.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20251105161935.206093132@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-10 14:27:04 -05:00
Steven Rostedt a10e6e6818 tracing: Hide __NR_utimensat and _NR_mq_timedsend when not defined
Some architectures (riscv-32) do not define __NR_utimensat and
_NR_mq_timedsend, and the build fails when they are used.

Hide them in "ifdef"s.

Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251104205310.00a1db9a@batman.local.home
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202511031239.ZigDcWzY-lkp@intel.com/
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-10 14:23:53 -05:00
Steven Rostedt 2f294c35c0 Merge branch 'topic/func-profiler-offset' of git://git.kernel.org/pub/scm/linux/kernel/git/mhiramat/linux into trace/trace/core
Updates to the function profiler add new options to tracefs. The options
are currently defined by an enum as flags. The added options bring the
number of options over 32, which means they can no longer be held in a 32
bit enum. The TRACE_ITER_* flags are converted to a macro TRACE_ITER(*) to
allow the creation of options to still be done by macros.

This change is intrusive, as it affects all TRACE_ITER* options throughout
the trace code. Merge the branch that added these options and converted
the TRACE_ITER_* enum into a TRACE_ITER(*) macro, to allow the topic
branches to still be developed without conflict.

Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-11-04 10:12:32 -05:00
Masami Hiramatsu (Google) 1149fcf759 tracing: Add an option to show symbols in _text+offset for function profiler
The function profiler shows the hit count of each function using its
symbol name. However, there are some same-name local symbols which we
cannot distinguish.
To solve this issue, this introduces an option to show the symbols
in "_text+OFFSET" format. This also avoids exposing the random shift of
KASLR. The functions in modules are shown as "MODNAME+OFFSET", where the
offset is from ".text".

E.g. for the kernel text symbols, pass vmlinux and the output to
addr2line to find the actual function and source info:

  $ addr2line -fie vmlinux _text+3078208
  __balance_callbacks
  kernel/sched/core.c:5064

for modules, specify the module file and .text+OFFSET:

  $ addr2line -fie samples/trace_events/trace-events-sample.ko .text+8224
  do_simple_thread_func
  samples/trace_events/trace-events-sample.c:23

Link: https://lore.kernel.org/all/176187878064.994619.8878296550240416558.stgit@devnote2/

Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-04 21:44:18 +09:00
Masami Hiramatsu (Google) bbec8e28ca tracing: Allow tracer to add more than 32 options
Since enum trace_iterator_flags is 32bit, the max number of option
flags is limited to 32, and it is fully used now. To add
a new option, we need to expand it.

So replace TRACE_ITER_##flag with a TRACE_ITER(flag) macro which
uses a 64bit bitmask.
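
A plausible shape for the new macro (illustrative; see the patch for the
real definition):

	#define TRACE_ITER(flag)	(1ULL << TRACE_ITER_##flag##_BIT)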

Link: https://lore.kernel.org/all/176187877103.994619.166076000668757232.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-04 21:44:00 +09:00
Masami Hiramatsu (Google) 90e69d291d tracing: fprobe: Remove unused local variable
The 'ret' local variable in fprobe_remove_node_in_module() was used
for checking the error state in the loop, but commit dfe0d675df82
("tracing: fprobe: use rhltable for fprobe_ip_table") removed the loop.
So we don't need it anymore.

Link: https://lore.kernel.org/all/175867358989.600222.6175459620045800878.stgit@devnote2/

Fixes: e5a4cc28a052 ("tracing: fprobe: use rhltable for fprobe_ip_table")
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Menglong Dong <menglong8.dong@gmail.com>
2025-11-01 01:10:29 +09:00
Thorsten Blum cbe1e1241a tracing: probes: Replace strcpy() with memcpy() in __trace_probe_log_err()
strcpy() is deprecated; use memcpy() instead.
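
The usual replacement pattern, when the length is already known (a
sketch with illustrative names):

	size_t len = strlen(err_text);

	/* memcpy() with len + 1 also copies the NUL terminator,
	 * making the copy length explicit */
	memcpy(command, err_text, len + 1);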

Link: https://lore.kernel.org/all/20250820214717.778243-3-thorsten.blum@linux.dev/

Link: https://github.com/KSPP/linux/issues/88
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-01 01:10:29 +09:00
Menglong Dong ceb5d8d367 tracing: fprobe: fix suspicious rcu usage in fprobe_entry
rcu_read_lock() is not needed in fprobe_entry, but rcu_dereference_check()
is used in rhltable_lookup(), which causes a suspicious RCU usage warning:

  WARNING: suspicious RCU usage
  6.17.0-rc1-00001-gdfe0d675df82 #1 Tainted: G S
  -----------------------------
  include/linux/rhashtable.h:602 suspicious rcu_dereference_check() usage!
  ......
  stack backtrace:
  CPU: 1 UID: 0 PID: 4652 Comm: ftracetest Tainted: G S
  Tainted: [S]=CPU_OUT_OF_SPEC, [I]=FIRMWARE_WORKAROUND
  Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015
  Call Trace:
   <TASK>
   dump_stack_lvl+0x7c/0x90
   lockdep_rcu_suspicious+0x14f/0x1c0
   __rhashtable_lookup+0x1e0/0x260
   ? __pfx_kernel_clone+0x10/0x10
   fprobe_entry+0x9a/0x450
   ? __lock_acquire+0x6b0/0xca0
   ? find_held_lock+0x2b/0x80
   ? __pfx_fprobe_entry+0x10/0x10
   ? __pfx_kernel_clone+0x10/0x10
   ? lock_acquire+0x14c/0x2d0
   ? __might_fault+0x74/0xc0
   function_graph_enter_regs+0x2a0/0x550
   ? __do_sys_clone+0xb5/0x100
   ? __pfx_function_graph_enter_regs+0x10/0x10
   ? _copy_to_user+0x58/0x70
   ? __pfx_kernel_clone+0x10/0x10
   ? __x64_sys_rt_sigprocmask+0x114/0x180
   ? __pfx___x64_sys_rt_sigprocmask+0x10/0x10
   ? __pfx_kernel_clone+0x10/0x10
   ftrace_graph_func+0x87/0xb0

As we discussed in [1], fix this by using guard(rcu)() in fprobe_entry()
to protect rhltable_lookup() and rhl_for_each_entry_rcu() with
rcu_read_lock() and suppress this warning.
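
i.e. roughly (guard() is the scope-based helper from <linux/cleanup.h>;
treat the exact identifiers as illustrative):

	guard(rcu)();	/* takes rcu_read_lock(); automatically released
			 * when the enclosing scope is left */
	head = rhltable_lookup(&fprobe_ip_table, &ip, fprobe_rht_params);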

Link: https://lore.kernel.org/all/20250904062729.151931-1-dongml2@chinatelecom.cn/

Link: https://lore.kernel.org/all/20250829021436.19982-1-dongml2@chinatelecom.cn/ [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Closes: https://lore.kernel.org/oe-lkp/202508281655.54c87330-lkp@intel.com
Fixes: dfe0d675df82 ("tracing: fprobe: use rhltable for fprobe_ip_table")
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-01 01:10:29 +09:00
Masami Hiramatsu (Google) 84404ce71a tracing: uprobe: eprobes: Allocate traceprobe_parse_context per probe
Since traceprobe_parse_context is reusable among a probe's arguments,
it is more efficient to allocate it outside of the loop parsing each
probe argument, as the kprobe and fprobe events do.

Link: https://lore.kernel.org/all/175509541393.193596.16330324746701582114.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-01 01:10:29 +09:00
Masami Hiramatsu (Google) 8b658df206 tracing: uprobes: Cleanup __trace_uprobe_create() with __free()
Use __free() to clean up the ugly gotos in __trace_uprobe_create().
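
The __free() pattern in general (a sketch using <linux/cleanup.h>):

	char *buf __free(kfree) = kzalloc(size, GFP_KERNEL);

	if (!buf)
		return -ENOMEM;
	/* no goto/kfree needed: buf is freed automatically on every
	 * return path once it goes out of scope */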

Link: https://lore.kernel.org/all/175509540482.193596.6541098946023873304.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-01 01:10:29 +09:00
Masami Hiramatsu (Google) 0d6edbc9a4 tracing: eprobe: Cleanup eprobe event using __free()
Use __free(trace_event_probe_cleanup) to remove unneeded gotos and clean
up the last part of trace_eprobe_parse_filter().

Link: https://lore.kernel.org/all/175509539571.193596.4674012182718751429.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-01 01:10:29 +09:00
Masami Hiramatsu (Google) f959ecdfcb tracing: probes: Use __free() for trace_probe_log
Use __free() for trace_probe_log_clear() to clean up the error log interface.

Link: https://lore.kernel.org/all/175509538609.193596.16646724647358218778.stgit@devnote2/

Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-01 01:10:28 +09:00
Menglong Dong 0de4c70d04 tracing: fprobe: use rhltable for fprobe_ip_table
For now, all the kernel functions that are hooked by an fprobe are added to
the hash table "fprobe_ip_table". Its key is the function address, and its
value is a "struct fprobe_hlist_node".

The hash table has a fixed bucket count of FPROBE_IP_TABLE_SIZE, which is
256. This means the hash table lookup overhead grows linearly once more
than 256 functions are probed. When we try to hook all the kernel
functions, the overhead becomes huge.

Therefore, replace the hash table with rhltable to reduce the overhead.
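
A sketch of what the conversion looks like, assuming the probed address in
struct fprobe_hlist_node is the key (parameter values illustrative); an
rhltable is a resizing rhashtable whose entries may share a key, so lookup
stays O(1) no matter how many functions are hooked:

	static const struct rhashtable_params fprobe_rht_params = {
		.head_offset		= offsetof(struct fprobe_hlist_node, hlist),
		.key_offset		= offsetof(struct fprobe_hlist_node, addr),
		.key_len		= sizeof(unsigned long),
		.automatic_shrinking	= true,
	};

	static struct rhltable fprobe_ip_table;

	static int fprobe_add_node(struct fprobe_hlist_node *node)
	{
		/* duplicate keys are fine: several fprobes can hook one address */
		return rhltable_insert(&fprobe_ip_table, &node->hlist,
				       fprobe_rht_params);
	}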

Link: https://lore.kernel.org/all/20250819031825.55653-1-dongml2@chinatelecom.cn/

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-11-01 01:10:28 +09:00
Bartosz Golaszewski b21f90e2e4 scripts: add tracepoint-update to the list of ignored files
The new program for removing unused tracepoints is not ignored as it
should be. Add it to the local .gitignore.

Cc: Vladimir Oltean <vladimir.oltean@nxp.com>
Cc: Jacob Keller <jacob.e.keller@intel.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Link: https://lore.kernel.org/20251029120709.24669-1-brgl@bgdev.pl
Fixes: e30f8e61e2 ("tracing: Add a tracepoint verification check at build time")
Signed-off-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-29 08:46:05 -04:00
Steven Rostedt 25bd47a592 tracing: Have persistent ring buffer print syscalls normally
The persistent ring buffer from a previous boot has to be careful when
printing events, as the print formats of arbitrary events can contain
pointers to strings and such that are no longer available.

Ftrace static events (like the function tracer event) are stable and are
printed normally.

System call event formats are also stable. Allow them to be printed
normally as well:

Instead of:

  <...>-1       [005] ...1.    57.240405: sys_enter_waitid: __syscall_nr=0xf7 (247) which=0x1 (1) upid=0x499 (1177) infop=0x7ffd5294d690 (140725988939408) options=0x5 (5) ru=0x0 (0)
  <...>-1       [005] ...1.    57.240433: sys_exit_waitid: __syscall_nr=0xf7 (247) ret=0x0 (0)
  <...>-1       [005] ...1.    57.240437: sys_enter_rt_sigprocmask: __syscall_nr=0xe (14) how=0x2 (2) nset=0x7ffd5294d7c0 (140725988939712) oset=0x0 (0) sigsetsize=0x8 (8)
  <...>-1       [005] ...1.    57.240438: sys_exit_rt_sigprocmask: __syscall_nr=0xe (14) ret=0x0 (0)
  <...>-1       [005] ...1.    57.240442: sys_enter_close: __syscall_nr=0x3 (3) fd=0x4 (4)
  <...>-1       [005] ...1.    57.240463: sys_exit_close: __syscall_nr=0x3 (3) ret=0x0 (0)
  <...>-1       [005] ...1.    57.240485: sys_enter_openat: __syscall_nr=0x101 (257) dfd=0xffffffffffdfff9c (-2097252) filename=(0xffff8b81639ca01c) flags=0x80000 (524288) mode=0x0 (0) __filename_val=/run/systemd/reboot-param
  <...>-1       [005] ...1.    57.240555: sys_exit_openat: __syscall_nr=0x101 (257) ret=0xffffffffffdffffe (-2097154)
  <...>-1       [005] ...1.    57.240571: sys_enter_openat: __syscall_nr=0x101 (257) dfd=0xffffffffffdfff9c (-2097252) filename=(0xffff8b81639ca01c) flags=0x80000 (524288) mode=0x0 (0) __filename_val=/run/systemd/reboot-param
  <...>-1       [005] ...1.    57.240620: sys_exit_openat: __syscall_nr=0x101 (257) ret=0xffffffffffdffffe (-2097154)
  <...>-1       [005] ...1.    57.240629: sys_enter_writev: __syscall_nr=0x14 (20) fd=0x3 (3) vec=0x7ffd5294ce50 (140725988937296) vlen=0x7 (7)
  <...>-1       [005] ...1.    57.242281: sys_exit_writev: __syscall_nr=0x14 (20) ret=0x24 (36)
  <...>-1       [005] ...1.    57.242286: sys_enter_reboot: __syscall_nr=0xa9 (169) magic1=0xfee1dead (4276215469) magic2=0x28121969 (672274793) cmd=0x1234567 (19088743) arg=0x0 (0)

Have:

  <...>-1       [000] ...1.    91.446011: sys_waitid(which: 1, upid: 0x4d2, infop: 0x7ffdccdadfd0, options: 5, ru: 0)
  <...>-1       [000] ...1.    91.446042: sys_waitid -> 0x0
  <...>-1       [000] ...1.    91.446045: sys_rt_sigprocmask(how: 2, nset: 0x7ffdccdae100, oset: 0, sigsetsize: 8)
  <...>-1       [000] ...1.    91.446047: sys_rt_sigprocmask -> 0x0
  <...>-1       [000] ...1.    91.446051: sys_close(fd: 4)
  <...>-1       [000] ...1.    91.446073: sys_close -> 0x0
  <...>-1       [000] ...1.    91.446095: sys_openat(dfd: 18446744073709551516, filename: 139732544945794 "/run/systemd/reboot-param", flags: O_RDONLY|O_CLOEXEC)
  <...>-1       [000] ...1.    91.446165: sys_openat -> 0xfffffffffffffffe
  <...>-1       [000] ...1.    91.446182: sys_openat(dfd: 18446744073709551516, filename: 139732544945794 "/run/systemd/reboot-param", flags: O_RDONLY|O_CLOEXEC)
  <...>-1       [000] ...1.    91.446233: sys_openat -> 0xfffffffffffffffe
  <...>-1       [000] ...1.    91.446242: sys_writev(fd: 3, vec: 0x7ffdccdad790, vlen: 7)
  <...>-1       [000] ...1.    91.447877: sys_writev -> 0x24
  <...>-1       [000] ...1.    91.447883: sys_reboot(magic1: 0xfee1dead, magic2: 0x28121969, cmd: 0x1234567, arg: 0)

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231149.097404581@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:11:00 -04:00
Steven Rostedt b6e5d971fc tracing: Check for printable characters when printing field dyn strings
When the "fields" option is enabled, it prints each trace event field
based on its type. But a dynamic array and a dynamic string can both have
a "char *" type. Printing it as a string can cause escape characters to be
printed and mess up the output of the trace.

For dynamic strings, test if there are any non-printable characters, and
if so, print both the string with the non printable characters as '.', and
the print the hex value of the array.
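
A sketch of the kind of test involved (helper name hypothetical, isprint()
from <linux/ctype.h>):

	static bool dyn_string_printable(const char *str, int len)
	{
		int i;

		/* every byte up to the nul must be printable */
		for (i = 0; i < len && str[i]; i++) {
			if (!isprint(str[i]))
				return false;
		}
		return true;
	}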

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231148.929243047@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:59 -04:00
Steven Rostedt 64b627c8da tracing: Add parsing of flags to the sys_enter_openat trace event
Add some logic to give the openat system call trace event a bit more
human-readable information:

   syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7f0053dc121c "/etc/ld.so.cache", flags: O_RDONLY|O_CLOEXEC, mode: 0000

The above is output from "perf script" and now shows the flags used by the
openat system call.

Since the output from tracing is generated in the kernel, it can also omit
the mode field when it is not used (when flags does not contain
O_CREAT|O_TMPFILE):

   touch-1185    [002] ...1.  1291.690154: sys_openat(dfd: 4294967196, filename: 139785545139344 "/usr/lib/locale/locale-archive", flags: O_RDONLY|O_CLOEXEC)
   touch-1185    [002] ...1.  1291.690504: sys_openat(dfd: 18446744073709551516, filename: 140733603151330 "/tmp/x", flags: O_WRONLY|O_CREAT|O_NOCTTY|O_NONBLOCK, mode: 0666)

As system calls have a fixed ABI, their trace events can be extended. This
currently only updates the openat system call, but others may be extended
in the future.
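
A sketch of this style of flag decoding in user space (table and helper
hypothetical, mirroring the O_RDONLY|O_CLOEXEC output above):

	#include <fcntl.h>
	#include <stdio.h>

	static const struct { int bit; const char *name; } oflags[] = {
		{ O_WRONLY, "O_WRONLY" },     { O_RDWR, "O_RDWR" },
		{ O_CREAT, "O_CREAT" },       { O_NOCTTY, "O_NOCTTY" },
		{ O_NONBLOCK, "O_NONBLOCK" }, { O_CLOEXEC, "O_CLOEXEC" },
	};

	static void print_open_flags(int flags)
	{
		const char *sep = "";
		size_t i;

		if (!(flags & (O_WRONLY | O_RDWR))) {
			printf("O_RDONLY");	/* O_RDONLY is 0, not a bit */
			sep = "|";
		}
		for (i = 0; i < sizeof(oflags) / sizeof(oflags[0]); i++) {
			if ((flags & oflags[i].bit) == oflags[i].bit) {
				printf("%s%s", sep, oflags[i].name);
				sep = "|";
			}
		}
	}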

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231148.763161484@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:59 -04:00
Steven Rostedt 32e0f607ac tracing: Add trace_seq_pop() and seq_buf_pop()
In order to allow an interface to remove an added character from the
trace_seq and seq_buf descriptors, add helper functions trace_seq_pop()
and seq_buf_pop().
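
A typical use, sketched: when joining items with a separator, always append
the separator and pop the final one off rather than special-casing the last
iteration:

	static void print_list(struct trace_seq *s, const char **names, int nr)
	{
		int i;

		for (i = 0; i < nr; i++)
			trace_seq_printf(s, "%s,", names[i]);
		if (nr)
			trace_seq_pop(s);	/* drop the trailing ',' */
	}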

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231148.594898736@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:59 -04:00
Steven Rostedt e77ad6da90 tracing: Show printable characters in syscall arrays
When displaying the contents of the user space data passed to the kernel,
instead of just showing the array values, also print any printable
content.

Instead of just:

  bash-1113    [003] .....  3433.290654: sys_write(fd: 2, buf: 0x555a8deeddb0 (72:6f:6f:74:40:64:65:62:69:61:6e:2d:78:38:36:2d:36:34:3a:7e:23:20), count: 0x16)

Display:

  bash-1113    [003] .....  3433.290654: sys_write(fd: 2, buf: 0x555a8deeddb0 (72:6f:6f:74:40:64:65:62:69:61:6e:2d:78:38:36:2d:36:34:3a:7e:23:20) "root@debian-x86-64:~# ", count: 0x16)

This only affects tracing and does not affect perf, as it only updates the
output generated by the kernel; perf's output is produced in user space.
This may change with an update to libtraceevent, which would then give
perf this output as well.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231148.429422865@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:59 -04:00
Steven Rostedt 299ea67e6a tracing: Add a config and syscall_user_buf_size file to limit amount written
When a system call copies user space data into the ring buffer, it can
copy up to 511 bytes. This can waste precious ring buffer space if the
user isn't interested in the output. Add a new file
"syscall_user_buf_size" that gets initialized to a new config option,
CONFIG_SYSCALL_BUF_SIZE_DEFAULT, which defaults to 63.

The config also is used to limit how much perf can read from user space.

Also lower the max down to 165, as this isn't meant to record everything
that a system call may be passing through to the kernel; 165 is more than
enough.

The reason for 165 is that adding one byte for the nul terminator, plus
room to append the "..." string, turns it into 170 bytes. Since up to 3
arguments may need to be saved, and 3 * 170 is 510, this fits nicely in
512 bytes (a power of 2).

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231148.260068913@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:59 -04:00
Steven Rostedt baa031b7bd tracing: Allow syscall trace events to read more than one user parameter
Allow more than one field of a syscall trace event to read user space.
Build on top of the user_mask by allowing more than one bit to be set,
where each bit corresponds to an entry in the @args array of the syscall
metadata. Each argument in @args that is to be read gets a dynamic
array/string field associated with it.

Note that reading multiple fields from user space is not supported if the
user_arg_size field is set in the syscall metadata. That field can only be
used when a single field is read from user space, as it is an index naming
the syscall argument that holds the size of the data to read from user
space; it becomes ambiguous if the system call reads more than one field.
Currently this is not an issue.

If a syscall event happens to enable two fields to read user space and
also sets the user_arg_size field, it will trigger a warning at boot and
the user_arg_size field will be cleared.

The per-CPU buffer that is used to read the user space addresses is now
broken up into 3 sections of 168 bytes each. The reason for 168 is that it
is the biggest portion of 512 bytes divided by 3 that is 8-byte aligned.
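
The sizing arithmetic, spelled out (macro names hypothetical, ALIGN_DOWN()
from <linux/align.h>):

	#define FAULT_BUF_TOTAL		512	/* per-CPU buffer size */
	#define FAULT_BUF_SECTIONS	3	/* up to three user space args */

	/* 512 / 3 = 170; rounding down to a multiple of 8 gives 168 */
	#define FAULT_BUF_SECTION_SZ \
		ALIGN_DOWN(FAULT_BUF_TOTAL / FAULT_BUF_SECTIONS, 8)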

The max amount copied into the ring buffer from user space is now only 128
bytes, which is plenty. When reading user space, it still reads 167
(168-1) bytes and uses the remainder to know whether it should append
"..." to the end.

This will allow the event to look like this:

  sys_renameat2(olddfd: 0xffffff9c, oldname: 0x7ffe02facdff "/tmp/x", newdfd: 0xffffff9c, newname: 0x7ffe02face06 "/tmp/y", flags: 1)

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231148.095789277@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:58 -04:00
Steven Rostedt 011ea0501d tracing: Display some syscall arrays as strings
Some of the system calls that read a fixed length of memory from a user
space address are passing strings, not arrays. Take a bit away from the
nb_args field in the syscall metadata to use as a flag denoting that the
system call's user_arg_size data is a string. nb_args should never be more
than 6, so 7 bits is plenty to hold that number. The new user_arg_is_str
flag, when set, causes the data at the user space address to be displayed
as a string rather than an array.
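
A sketch of the resulting metadata layout (field placement assumed):

	struct syscall_metadata_sketch {
		const char	*name;
		/* system calls take at most 6 args, so 7 bits are ample */
		unsigned short	nb_args:7;
		/* the user_arg_size data is a nul-terminated string */
		unsigned short	user_arg_is_str:1;
		/* ... */
	};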

This will allow the output to look like this:

  sys_sethostname(name: 0x5584310eb2a0 "debian", len: 6)

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231147.930550359@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:58 -04:00
Steven Rostedt b4f7624cfc tracing: Have system call events record user array data
For system call events that have a length field, add a "user_arg_size"
parameter to the system call metadata. It denotes the index into the args
array of the argument holding the size of the arg that the user_mask field
has a bit set for.

The "user_mask" bit denotes the arg that points to an array in the user
space address space. If a system call event has both the user_mask field
and the user_arg_size field set, the content of that address is recorded
into the trace event, up to the size defined by SYSCALL_FAULT_BUF_SZ - 1.

This allows the output to look like:

  sys_write(fd: 0xa, buf: 0x5646978d13c0 (01:00:05:00:00:00:00:00:01:87:55:89:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00), count: 0x20)

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231147.763528474@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:58 -04:00
Steven Rostedt 2e82e256df perf: tracing: Have perf system calls read user space
Allow some of the system call events to read user space buffers. Instead
of just showing the pointer into user space, allow perf events to also
record the content of those pointers. For example:

  # perf record -e syscalls:sys_enter_openat ls /usr/bin
  [..]
  # perf script
      ls    1024 [005]    52.902721: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbae321c "/etc/ld.so.cache", flags: 0x00080000, mode: 0x00000000
      ls    1024 [005]    52.902899: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaae140 "/lib/x86_64-linux-gnu/libselinux.so.1", flags: 0x00080000, mode: 0x00000000
      ls    1024 [005]    52.903471: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaae690 "/lib/x86_64-linux-gnu/libcap.so.2", flags: 0x00080000, mode: 0x00000000
      ls    1024 [005]    52.903946: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaaebe0 "/lib/x86_64-linux-gnu/libc.so.6", flags: 0x00080000, mode: 0x00000000
      ls    1024 [005]    52.904629: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dbaaf110 "/lib/x86_64-linux-gnu/libpcre2-8.so.0", flags: 0x00080000, mode: 0x00000000
      ls    1024 [005]    52.906985: syscalls:sys_enter_openat: dfd: 0xffffffffffffff9c, filename: 0x7fc1dba92904 "/proc/filesystems", flags: 0x00080000, mode: 0x00000000
      ls    1024 [005]    52.907323: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x7fc1dba19490 "/usr/lib/locale/locale-archive", flags: 0x00080000, mode: 0x00000000
      ls    1024 [005]    52.907746: syscalls:sys_enter_openat: dfd: 0xffffff9c, filename: 0x556fb888dcd0 "/usr/bin", flags: 0x00090800, mode: 0x00000000

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231147.593925979@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:58 -04:00
Steven Rostedt bd1b80fba7 perf: tracing: Simplify perf_sysenter_enable/disable() with guards
Use guard(mutex)(&syscall_trace_lock) for perf_sysenter_enable() and
perf_sysenter_disable() as well as for the perf_sysexit_enable() and
perf_sysexit_disable(). This will make it easier to update these functions
with other code that has early exit handling.
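
A sketch of the shape of the change (function body abridged); the guard
unlocks syscall_trace_lock on every return path, so later patches can add
early exits freely:

	static int perf_sysenter_enable(struct trace_event_call *call)
	{
		int num = ((struct syscall_metadata *)call->data)->syscall_nr;

		guard(mutex)(&syscall_trace_lock);

		if (!sys_perf_refcount_enter &&
		    register_trace_sys_enter(perf_syscall_enter, NULL))
			return -ENOSYS;	/* lock dropped automatically */

		sys_perf_refcount_enter++;
		set_bit(num, enabled_perf_enter_syscalls);
		return 0;
	}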

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231147.429583335@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:58 -04:00
Steven Rostedt a544d9a66b tracing: Have syscall trace events read user space string
As of commit 654ced4a13 ("tracing: Introduce tracepoint_is_faultable()"),
system call trace events allow faulting in user space memory. Have some of
the system call trace events take advantage of this.

Use the trace_user_fault_read() logic to read the buffer from user space,
and instead of saving just the pointer to the buffer in the system call
event, also save the string it points to.

The syscall event has its nb_args shortened from an int to a short (where
even a u8 would be plenty big enough), and the two freed bytes are used
for "user_mask". The new "user_mask" field stores the index into the
"args" field array of the argument whose address should be read from user
space. This value is set to 0 if the system call event does not need to
read user space for a field. The mask can be used to know whether the
event may fault. Only one bit set in user_mask is supported at this time.

This allows the output to look like this:

 sys_access(filename: 0x7f8c55368470 "/etc/ld.so.preload", mode: 4)
 sys_execve(filename: 0x564ebcf5a6b8 "/usr/bin/emacs", argv: 0x7fff357c0300, envp: 0x564ebc4a4820)

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231147.261867956@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:58 -04:00
Steven Rostedt a9f1687264 tracing: Make trace_user_fault_read() exposed to rest of tracing
The write to the trace_marker file is a critical section where it cannot
take locks nor allocate memory. To read from user space, it allocates a
per-CPU buffer when the trace_marker file is opened; when the write system
call is then performed, it uses the following method to read from user
space:

	preempt_disable();
	buffer = per_cpu_ptr(cpu_buffers, cpu);
	do {
		cnt = nr_context_switches_cpu();
		migrate_disable();
		preempt_enable();
		ret = copy_from_user(buffer, ptr, len);
		preempt_disable();
		migrate_enable();
	} while (!ret && cnt != nr_context_switches_cpu());
	if (!ret)
		ring_buffer_write(buffer);
	preempt_enable();

It records the number of context switches for the current CPU, enables
preemption, copies from user space, disables preemption, and then checks
whether the number of context switches changed. If it did not, the buffer
is valid; otherwise the buffer may have been corrupted and the read from
user space must be tried again.

The system call trace events are now faultable and have the same
restrictions as the trace_marker write. For system calls to read a user
space buffer (for example, to read the filename of the openat system
call), the same logic is needed. Instead of copying the code over to the
system call trace events, make the code generic so that the system call
trace events can use it. The following API is added internally to the
tracing subsystem (these functions are only exposed within the tracing
subsystem and are not to be used outside of it):

  trace_user_fault_init() - initializes a trace_user_buf_info descriptor
       that will allocate the per CPU buffers to copy from user space into.

  trace_user_fault_destroy() - used to free the allocations made by
       trace_user_fault_init().

  trace_user_fault_get() - update the ref count of the info descriptor to
       allow more than one user to use the same descriptor.

  trace_user_fault_put() - decrement the ref count.

  trace_user_fault_read() - performs the read described above from user
      space into the per-CPU buffer (see the usage sketch after this
      list). preempt_disable() is expected before calling this function,
      and preemption must remain disabled while the returned buffer is in
      use.
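
A usage sketch; these are internal functions, so the types and signatures
below are assumed purely for illustration:

	static struct trace_user_buf_info tinfo;  /* assumed descriptor type */

	static void record_user_string(const char __user *uptr, size_t size)
	{
		size_t read_size = 0;
		char *buf;

		preempt_disable();		/* required by the API */
		buf = trace_user_fault_read(&tinfo, uptr, size, &read_size);
		if (buf) {
			/* copy buf into the ring buffer event here */
		}
		preempt_enable();	/* buf must not be used after this */
	}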

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Takaya Saeki <takayas@google.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ian Rogers <irogers@google.com>
Cc: Douglas Raillard <douglas.raillard@arm.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Link: https://lore.kernel.org/20251028231147.096570057@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-28 20:10:57 -04:00
Steven Rostedt 01ecf7af00 tracing: Add warnings for unused tracepoints for modules
If a module has TRACE_EVENT() but does not use it, add a warning about it
at build time.

Currently, the build must be made by adding "UT=1" to the make command
line in order for this to trigger.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas.schier@linux.dev>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/20251022004453.422000794@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-24 16:43:15 -04:00
Steven Rostedt eec3516b25 tracing: Allow tracepoint-update.c to work with modules
In order for tracepoint-update.c to work with modules, it cannot error out
if both "__tracepoint_check" and "__tracepoints_strings" are not found.
When enabled, the vmlinux.o may be required to have both, but modules only
have these sections if they have tracepoints. Modules without tracepoints
will not have either. They should not fail to build because of that.

If one section exists the other one should too. Note, if a module defines
a tracepoint but doesn't use any, it can cause this to fail.

Add a new "--module" parameter to tracepoint-update to be used when
running on module code. It will not error out if this is set and both
sections are missing. If this is set, and only the "__tracepoint_check"
section is missing, it means the module has defined tracepoints but none
of them are used. In that case, it prints a warning that the module has
only unused tracepoints and exits normally to not fail the build.

If the "__tracepoint_check" section exists but not the
"__tracepoint_strings", then that is an error and should fail the build.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas.schier@linux.dev>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/20251022004453.255696445@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-24 16:43:15 -04:00
Steven Rostedt faf938153c tracepoint: Do not warn for unused event that is exported
There are a few generic events that may only be used by modules. They are
defined and then exported with EXPORT_TRACEPOINT*(). Mark events that are
exported as being used, even though they still waste memory in the kernel
proper.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas.schier@linux.dev>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/20251022004453.089254920@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-24 16:43:14 -04:00
Steven Rostedt e30f8e61e2 tracing: Add a tracepoint verification check at build time
If a tracepoint is defined via DECLARE_TRACE() or TRACE_EVENT() but never
called (via its trace_<tracepoint>() function), its metadata is still kept
in memory and not discarded.

When created via TRACE_EVENT() the situation is worse, because
TRACE_EVENT() creates metadata that can be around 5k per trace event. Each
unused trace event therefore wastes several thousand bytes.

Add a verifier that, at every call site, injects a string with the name of
the tracepoint being called into the discarded section
"__tracepoint_check". For every builtin tracepoint that is used, its name
(which is saved in the in-memory section "__tracepoint_strings") will
therefore also appear in the "__tracepoint_check" section.

Add a new program, tracepoint-update, that is run at build time. It is
executed on vmlinux.o before the __tracepoint_check section is discarded
(the section is discarded before vmlinux is created). The program builds
an array of each string in the __tracepoint_check section and sorts it. It
then walks the strings in the __tracepoint_strings section and does a
binary search to check whether each name is in the __tracepoint_check
section. If it is not, the tracepoint is unused and a warning is printed.
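
A sketch of the verification pass (ELF section extraction elided):

	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	static int cmpstr(const void *a, const void *b)
	{
		return strcmp(*(char * const *)a, *(char * const *)b);
	}

	/* checked: names from __tracepoint_check; defined: names from
	 * __tracepoint_strings. Anything defined but never checked is unused.
	 */
	static void report_unused(char **checked, size_t nr_checked,
				  char **defined, size_t nr_defined)
	{
		size_t i;

		qsort(checked, nr_checked, sizeof(*checked), cmpstr);
		for (i = 0; i < nr_defined; i++) {
			if (!bsearch(&defined[i], checked, nr_checked,
				     sizeof(*checked), cmpstr))
				printf("warning: tracepoint '%s' is unused.\n",
				       defined[i]);
		}
	}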

Note, this currently only handles tracepoints that are builtin and not in
modules.

Enabling this currently with a given config produces:

warning: tracepoint 'sched_move_numa' is unused.
warning: tracepoint 'sched_stick_numa' is unused.
warning: tracepoint 'sched_swap_numa' is unused.
warning: tracepoint 'pelt_hw_tp' is unused.
warning: tracepoint 'pelt_irq_tp' is unused.
warning: tracepoint 'rcu_preempt_task' is unused.
warning: tracepoint 'rcu_unlock_preempted_task' is unused.
warning: tracepoint 'xdp_bulk_tx' is unused.
warning: tracepoint 'xdp_redirect_map' is unused.
warning: tracepoint 'xdp_redirect_map_err' is unused.
warning: tracepoint 'vma_mas_szero' is unused.
warning: tracepoint 'vma_store' is unused.
warning: tracepoint 'hugepage_set_pmd' is unused.
warning: tracepoint 'hugepage_set_pud' is unused.
warning: tracepoint 'hugepage_update_pmd' is unused.
warning: tracepoint 'hugepage_update_pud' is unused.
warning: tracepoint 'block_rq_remap' is unused.
warning: tracepoint 'xhci_dbc_handle_event' is unused.
warning: tracepoint 'xhci_dbc_handle_transfer' is unused.
warning: tracepoint 'xhci_dbc_gadget_ep_queue' is unused.
warning: tracepoint 'xhci_dbc_alloc_request' is unused.
warning: tracepoint 'xhci_dbc_free_request' is unused.
warning: tracepoint 'xhci_dbc_queue_request' is unused.
warning: tracepoint 'xhci_dbc_giveback_request' is unused.
warning: tracepoint 'tcp_ao_wrong_maclen' is unused.
warning: tracepoint 'tcp_ao_mismatch' is unused.
warning: tracepoint 'tcp_ao_key_not_found' is unused.
warning: tracepoint 'tcp_ao_rnext_request' is unused.
warning: tracepoint 'tcp_ao_synack_no_key' is unused.
warning: tracepoint 'tcp_ao_snd_sne_update' is unused.
warning: tracepoint 'tcp_ao_rcv_sne_update' is unused.

Some of the above are totally unused, but others are unused only because
their "trace_" calls sit inside configs; in those cases the defined
tracepoints should be inside the same configs. Others are
architecture-specific but defined in generic code, where they should
either be moved to the architecture or be surrounded by #ifdef for the
architectures they are for.

This tool could be updated to process modules in the future.

I'd like to thank Mathieu Desnoyers for suggesting using strings instead
of pointers, as using pointers in vmlinux.o would have required handling
relocations, which would have meant implementing almost a full-featured
linker.

To enable this check, run the build with: make UT=1

Note, when all the existing unused tracepoints have been removed from the
build, the "UT=1" switch will be removed and this check will always be
enabled when tracepoints are configured, warning on any new unused
tracepoints. The reason it isn't always enabled now is that it would
introduce a lot of warnings for the current unused tracepoints, and all
bisects for those warnings would end at this commit.

Link: https://lore.kernel.org/all/20250528114549.4d8a5e03@gandalf.local.home/

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas.schier@linux.dev>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/20251022004452.920728129@kernel.org
Suggested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> # for using strings instead of pointers
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-24 16:43:14 -04:00
Steven Rostedt b055f4c431 sorttable: Move ELF parsing into scripts/elf-parse.[ch]
In order to share the ELF parsing that is in sorttable.c so that other
programs can use the same code, move it into elf-parse.c and elf-parse.h.

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas.schier@linux.dev>
Cc: Nick Desaulniers <nick.desaulniers+lkml@gmail.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Link: https://lore.kernel.org/20251022004452.752298788@kernel.org
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-24 16:43:14 -04:00
Tzung-Bi Shih b692553573 pstore/ram: Update module parameters from platform data
Update module parameters `mem_type` and `ramoops_ecc` from platform data
so that they are available through /sys/module/ramoops/parameters/.

`ramoops_dump_oops` isn't included as it has been deprecated.

Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://patch.msgid.link/20251023143755.26204-1-tzungbi@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
2025-10-23 08:56:04 -07:00
96 changed files with 4282 additions and 2162 deletions

View File

@@ -48,7 +48,7 @@
 If not set, tracer threads keep their default priority. For rtla user threads, it is set to SCHED_FIFO with priority 95. For kernel threads, see *osnoise* and *timerlat* tracer documentation for the running kernel version.
-**-C**, **--cgroup**\[*=cgroup*]
+**-C**, **--cgroup** \[*cgroup*]
 Set a *cgroup* to the tracer's threads. If the **-C** option is passed without arguments, the tracer's thread will inherit **rtla**'s *cgroup*. Otherwise, the threads will be placed on the *cgroup* passed to the option.

View File

@@ -366,6 +366,14 @@ of ftrace. Here is a list of some of the key files:
 	for each function. The displayed address is the patch-site address
 	and can differ from /proc/kallsyms address.
+  syscall_user_buf_size:
+	Some system call trace events will record the data from a user
+	space address that one of the parameters point to. The amount of
+	data per event is limited. This file holds the max number of bytes
+	that will be recorded into the ring buffer to hold this data.
+	The max value is currently 165.
   dyn_ftrace_total_info:
 	This file is for debugging purposes. The number of functions that

View File

@@ -21802,8 +21802,12 @@ F: tools/testing/selftests/rtc/
 Real-time Linux Analysis (RTLA) tools
 M: Steven Rostedt <rostedt@goodmis.org>
+M: Tomas Glozar <tglozar@redhat.com>
 L: linux-trace-kernel@vger.kernel.org
+L: linux-kernel@vger.kernel.org
 S: Maintained
+Q: https://patchwork.kernel.org/project/linux-trace-kernel/list/
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 F: Documentation/tools/rtla/
 F: tools/tracing/rtla/
@@ -22696,6 +22700,7 @@ F: Documentation/trace/rv/
 F: include/linux/rv.h
 F: include/rv/
 F: kernel/trace/rv/
+F: tools/testing/selftests/verification/
 F: tools/verification/
 
 RUST

View File

@@ -810,6 +810,25 @@ ifdef CONFIG_FUNCTION_TRACER
 CC_FLAGS_FTRACE := -pg
 endif
+
+ifdef CONFIG_TRACEPOINTS
+# To check for unused tracepoints (tracepoints that are defined but never
+# called), run with:
+#
+#   make UT=1
+#
+# Each unused tracepoints can take up to 5KB of memory in the running kernel.
+# It is best to remove any that are not used.
+#
+# This command line option will be removed when all current unused
+# tracepoints are removed.
+ifeq ("$(origin UT)", "command line")
+  WARN_ON_UNUSED_TRACEPOINTS := $(UT)
+endif
+endif # CONFIG_TRACEPOINTS
+export WARN_ON_UNUSED_TRACEPOINTS
 
 include $(srctree)/arch/$(SRCARCH)/Makefile
 
 ifdef need-config
@@ -940,6 +959,9 @@ KBUILD_CFLAGS += $(call cc-option,-fzero-init-padding-bits=all)
 # for the randomize_kstack_offset feature. Disable it for all compilers.
 KBUILD_CFLAGS += $(call cc-option, -fno-stack-clash-protection)
+
+# Get details on warnings generated due to GCC value tracking.
+KBUILD_CFLAGS += $(call cc-option, -fdiagnostics-show-context=2)
 
 # Clear used registers at func exit (to reduce data lifetime and ROP gadgets).
 ifdef CONFIG_ZERO_CALL_USED_REGS
 KBUILD_CFLAGS += -fzero-call-used-regs=used-gpr
@@ -1784,6 +1806,8 @@ help:
 	@echo ' c: extra checks in the configuration stage (Kconfig)'
 	@echo ' e: warnings are being treated as errors'
 	@echo ' Multiple levels can be combined with W=12 or W=123'
+	@echo ' make UT=1 [targets] Warn if a tracepoint is defined but not used.'
+	@echo ' [ This will be removed when all current unused tracepoints are eliminated. ]'
 	@$(if $(dtstree), \
 		echo ' make CHECK_DTBS=1 [targets] Check all generated dtb files against schema'; \
 		echo ' This can be applied both to "dtbs" and to individual "foo.dtb" targets' ; \

View File

@@ -199,7 +199,7 @@ static int ni_670x_auto_attach(struct comedi_device *dev,
 	const struct comedi_lrange **range_table_list;
 
 	range_table_list = kmalloc_array(32,
-					 sizeof(struct comedi_lrange *),
+					 sizeof(*range_table_list),
					 GFP_KERNEL);
 	if (!range_table_list)
 		return -ENOMEM;

View File

@@ -425,7 +425,7 @@ static int __drm_universal_plane_init(struct drm_device *dev,
 	plane->modifier_count = format_modifier_count;
 	plane->modifiers = kmalloc_array(format_modifier_count,
-					 sizeof(format_modifiers[0]),
+					 sizeof(*plane->modifiers),
					 GFP_KERNEL);
 
 	if (format_modifier_count && !plane->modifiers) {

View File

@@ -1212,5 +1212,5 @@ void iris_hfi_gen2_command_ops_init(struct iris_core *core)
 struct iris_inst *iris_hfi_gen2_get_instance(void)
 {
-	return kzalloc(sizeof(struct iris_inst_hfi_gen2), GFP_KERNEL);
+	return (struct iris_inst *)kzalloc(sizeof(struct iris_inst_hfi_gen2), GFP_KERNEL);
 }

View File

@@ -598,7 +598,7 @@ static void detach_attrs(struct config_item * item)
 static int populate_attrs(struct config_item *item)
 {
 	const struct config_item_type *t = item->ci_type;
-	struct configfs_group_operations *ops;
+	const struct configfs_group_operations *ops;
 	struct configfs_attribute *attr;
 	struct configfs_bin_attribute *bin_attr;
 	int error = 0;

View File

@@ -30,7 +30,7 @@ struct configfs_buffer {
 	size_t count;
 	loff_t pos;
 	char * page;
-	struct configfs_item_operations * ops;
+	const struct configfs_item_operations *ops;
 	struct mutex mutex;
 	int needs_read_fill;
 	bool read_in_progress;

View File

@@ -864,6 +864,8 @@ static int ramoops_probe(struct platform_device *pdev)
 	ramoops_console_size = pdata->console_size;
 	ramoops_pmsg_size = pdata->pmsg_size;
 	ramoops_ftrace_size = pdata->ftrace_size;
+	mem_type = pdata->mem_type;
+	ramoops_ecc = pdata->ecc_info.ecc_size;
 
 	pr_info("using 0x%lx@0x%llx, ecc: %d\n",
 		cxt->size, (unsigned long long)cxt->phys_addr,

View File

@@ -1065,6 +1065,7 @@
 	*(.no_trim_symbol) \
 	/* ld.bfd warns about .gnu.version* even when not emitted */ \
 	*(.gnu.version*) \
+	*(__tracepoint_check) \
 
 #define DISCARDS \
 	/DISCARD/ : { \

View File

@@ -64,8 +64,8 @@ extern void config_item_put(struct config_item *);
 struct config_item_type {
 	struct module *ct_owner;
-	struct configfs_item_operations *ct_item_ops;
-	struct configfs_group_operations *ct_group_ops;
+	const struct configfs_item_operations *ct_item_ops;
+	const struct configfs_group_operations *ct_group_ops;
 	struct configfs_attribute **ct_attrs;
 	struct configfs_bin_attribute **ct_bin_attrs;
 };

View File

@@ -7,6 +7,7 @@
 #include <linux/ftrace.h>
 #include <linux/rcupdate.h>
 #include <linux/refcount.h>
+#include <linux/rhashtable.h>
 #include <linux/slab.h>
 
 struct fprobe;
@@ -26,7 +27,7 @@ typedef void (*fprobe_exit_cb)(struct fprobe *fp, unsigned long entry_ip,
  * @fp: The fprobe which owns this.
  */
 struct fprobe_hlist_node {
-	struct hlist_node hlist;
+	struct rhlist_head hlist;
 	unsigned long addr;
 	struct fprobe *fp;
 };

View File

@@ -1167,17 +1167,14 @@ static inline void ftrace_init(void) { }
  */
 struct ftrace_graph_ent {
 	unsigned long func; /* Current function */
-	int depth;
+	unsigned long depth;
 } __packed;
 
 /*
  * Structure that defines an entry function trace with retaddr.
- * It's already packed but the attribute "packed" is needed
- * to remove extra padding at the end.
  */
 struct fgraph_retaddr_ent {
-	unsigned long func; /* Current function */
-	int depth;
+	struct ftrace_graph_ent ent;
 	unsigned long retaddr; /* Return address */
 } __packed;

View File

@@ -458,6 +458,18 @@ static inline size_t __must_check size_sub(size_t minuend, size_t subtrahend)
 #define struct_size_t(type, member, count) \
 	struct_size((type *)NULL, member, count)
 
+/**
+ * struct_offset() - Calculate the offset of a member within a struct
+ * @p: Pointer to the struct
+ * @member: Name of the member to get the offset of
+ *
+ * Calculates the offset of a particular @member of the structure pointed
+ * to by @p.
+ *
+ * Return: number of bytes to the location of @member.
+ */
+#define struct_offset(p, member) (offsetof(typeof(*(p)), member))
+
 /**
  * __DEFINE_FLEX() - helper macro for DEFINE_FLEX() family.
  * Enables caller macro to pass arbitrary trailing expressions

View File

@@ -88,7 +88,7 @@ union rv_task_monitor {
 struct rv_reactor {
 	const char *name;
 	const char *description;
-	__printf(1, 2) void (*react)(const char *msg, ...);
+	__printf(1, 0) void (*react)(const char *msg, va_list args);
 	struct list_head list;
 };
 #endif
@@ -102,7 +102,7 @@ struct rv_monitor {
 	void (*reset)(void);
 #ifdef CONFIG_RV_REACTORS
 	struct rv_reactor *reactor;
-	__printf(1, 2) void (*react)(const char *msg, ...);
+	__printf(1, 0) void (*react)(const char *msg, va_list args);
 #endif
 	struct list_head list;
 	struct rv_monitor *parent;
@@ -116,13 +116,14 @@ int rv_get_task_monitor_slot(void);
 void rv_put_task_monitor_slot(int slot);
 
 #ifdef CONFIG_RV_REACTORS
-bool rv_reacting_on(void);
 int rv_unregister_reactor(struct rv_reactor *reactor);
 int rv_register_reactor(struct rv_reactor *reactor);
+__printf(2, 3)
+void rv_react(struct rv_monitor *monitor, const char *msg, ...);
 #else
-static inline bool rv_reacting_on(void)
+__printf(2, 3)
+static inline void rv_react(struct rv_monitor *monitor, const char *msg, ...)
 {
-	return false;
 }
 #endif /* CONFIG_RV_REACTORS */

View File

@@ -149,6 +149,23 @@ static inline void seq_buf_commit(struct seq_buf *s, int num)
 	}
 }
 
+/**
+ * seq_buf_pop - pop off the last written character
+ * @s: the seq_buf handle
+ *
+ * Removes the last written character to the seq_buf @s.
+ *
+ * Returns the last character or -1 if it is empty.
+ */
+static inline int seq_buf_pop(struct seq_buf *s)
+{
+	if (!s->len)
+		return -1;
+
+	s->len--;
+	return (unsigned int)s->buffer[s->len];
+}
+
 extern __printf(2, 3)
 int seq_buf_printf(struct seq_buf *s, const char *fmt, ...);
 extern __printf(2, 0)

View File

@@ -371,6 +371,10 @@ static inline void memzero_explicit(void *s, size_t count)
  * kbasename - return the last part of a pathname.
  *
  * @path: path to extract the filename from.
+ *
+ * Returns:
+ * Pointer to the filename portion inside @path. If no '/' exists,
+ * returns @path unchanged.
  */
 static inline const char *kbasename(const char *path)
 {
@@ -556,6 +560,9 @@ static __always_inline size_t str_has_prefix(const char *str, const char *prefix
  * strstarts - does @str start with @prefix?
  * @str: string to examine
  * @prefix: prefix to look for.
+ *
+ * Returns:
+ * True if @str begins with @prefix. False in all other cases.
  */
 static inline bool strstarts(const char *str, const char *prefix)
 {

View File

@@ -80,6 +80,19 @@ static inline bool trace_seq_has_overflowed(struct trace_seq *s)
 	return s->full || seq_buf_has_overflowed(&s->seq);
 }
 
+/**
+ * trace_seq_pop - pop off the last written character
+ * @s: trace sequence descriptor
+ *
+ * Removes the last written character to the trace_seq @s.
+ *
+ * Returns the last character or -1 if it is empty.
+ */
+static inline int trace_seq_pop(struct trace_seq *s)
+{
+	return seq_buf_pop(&s->seq);
+}
+
 /*
  * Currently only defined when tracing is enabled.
  */

View File

@@ -221,6 +221,15 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 		__do_trace_##name(args); \
 	}
 
+/*
+ * When a tracepoint is used, it's name is added to the __tracepoint_check
+ * section. This section is only used at build time to make sure all
+ * defined tracepoints are used. It is discarded after the build.
+ */
+# define TRACEPOINT_CHECK(name) \
+	static const char __used __section("__tracepoint_check") \
+	__trace_check_##name[] = #name;
+
 /*
  * Make sure the alignment of the structure in the __tracepoints section will
  * not add unwanted padding between the beginning of the section and the
@@ -270,6 +279,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
 	static inline void __do_trace_##name(proto) \
 	{ \
+		TRACEPOINT_CHECK(name) \
 		if (cond) { \
 			guard(preempt_notrace)(); \
 			__DO_TRACE_CALL(name, TP_ARGS(args)); \
@@ -289,6 +299,7 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 	__DECLARE_TRACE_COMMON(name, PARAMS(proto), PARAMS(args), PARAMS(data_proto)) \
 	static inline void __do_trace_##name(proto) \
 	{ \
+		TRACEPOINT_CHECK(name) \
 		guard(rcu_tasks_trace)(); \
 		__DO_TRACE_CALL(name, TP_ARGS(args)); \
 	} \
@@ -371,10 +382,12 @@ static inline struct tracepoint *tracepoint_ptr_deref(tracepoint_ptr_t *p)
 	__DEFINE_TRACE_EXT(_name, NULL, PARAMS(_proto), PARAMS(_args));
 
 #define EXPORT_TRACEPOINT_SYMBOL_GPL(name) \
+	TRACEPOINT_CHECK(name) \
 	EXPORT_SYMBOL_GPL(__tracepoint_##name); \
 	EXPORT_SYMBOL_GPL(__traceiter_##name); \
 	EXPORT_STATIC_CALL_GPL(tp_func_##name)
 #define EXPORT_TRACEPOINT_SYMBOL(name) \
+	TRACEPOINT_CHECK(name) \
 	EXPORT_SYMBOL(__tracepoint_##name); \
 	EXPORT_SYMBOL(__traceiter_##name); \
 	EXPORT_STATIC_CALL(tp_func_##name)

View File

@@ -16,34 +16,19 @@
 #include <linux/bug.h>
 #include <linux/sched.h>
 
-#ifdef CONFIG_RV_REACTORS
-#define DECLARE_RV_REACTING_HELPERS(name, type) \
-static void cond_react_##name(type curr_state, type event) \
-{ \
-	if (!rv_reacting_on() || !rv_##name.react) \
-		return; \
-	rv_##name.react("rv: monitor %s does not allow event %s on state %s\n", \
-			#name, \
-			model_get_event_name_##name(event), \
-			model_get_state_name_##name(curr_state)); \
-}
-#else /* CONFIG_RV_REACTOR */
-#define DECLARE_RV_REACTING_HELPERS(name, type) \
-static void cond_react_##name(type curr_state, type event) \
-{ \
-	return; \
-}
-#endif
-
 /*
  * Generic helpers for all types of deterministic automata monitors.
  */
 #define DECLARE_DA_MON_GENERIC_HELPERS(name, type) \
 \
-DECLARE_RV_REACTING_HELPERS(name, type) \
+static void react_##name(type curr_state, type event) \
+{ \
+	rv_react(&rv_##name, \
+		 "rv: monitor %s does not allow event %s on state %s\n", \
+		 #name, \
+		 model_get_event_name_##name(event), \
+		 model_get_state_name_##name(curr_state)); \
+} \
 \
 /* \
  * da_monitor_reset_##name - reset a monitor and setting it to init state \
@@ -126,7 +111,7 @@ da_event_##name(struct da_monitor *da_mon, enum events_##name event) \
 	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) { \
 		next_state = model_get_next_state_##name(curr_state, event); \
 		if (next_state == INVALID_STATE) { \
-			cond_react_##name(curr_state, event); \
+			react_##name(curr_state, event); \
 			trace_error_##name(model_get_state_name_##name(curr_state), \
 					   model_get_event_name_##name(event)); \
 			return false; \
@@ -165,7 +150,7 @@ static inline bool da_event_##name(struct da_monitor *da_mon, struct task_struct
 	for (int i = 0; i < MAX_DA_RETRY_RACING_EVENTS; i++) { \
 		next_state = model_get_next_state_##name(curr_state, event); \
 		if (next_state == INVALID_STATE) { \
-			cond_react_##name(curr_state, event); \
+			react_##name(curr_state, event); \
 			trace_error_##name(tsk->pid, \
 					   model_get_state_name_##name(curr_state), \
 					   model_get_event_name_##name(event)); \


@ -16,23 +16,9 @@
#error "Please include $(MODEL_NAME).h generated by rvgen" #error "Please include $(MODEL_NAME).h generated by rvgen"
#endif #endif
#ifdef CONFIG_RV_REACTORS
#define RV_MONITOR_NAME CONCATENATE(rv_, MONITOR_NAME) #define RV_MONITOR_NAME CONCATENATE(rv_, MONITOR_NAME)
static struct rv_monitor RV_MONITOR_NAME; static struct rv_monitor RV_MONITOR_NAME;
static void rv_cond_react(struct task_struct *task)
{
if (!rv_reacting_on() || !RV_MONITOR_NAME.react)
return;
RV_MONITOR_NAME.react("rv: "__stringify(MONITOR_NAME)": %s[%d]: violation detected\n",
task->comm, task->pid);
}
#else
static void rv_cond_react(struct task_struct *task)
{
}
#endif
static int ltl_monitor_slot = RV_PER_TASK_MONITOR_INIT; static int ltl_monitor_slot = RV_PER_TASK_MONITOR_INIT;
static void ltl_atoms_fetch(struct task_struct *task, struct ltl_monitor *mon); static void ltl_atoms_fetch(struct task_struct *task, struct ltl_monitor *mon);
@ -98,7 +84,8 @@ static void ltl_monitor_destroy(void)
static void ltl_illegal_state(struct task_struct *task, struct ltl_monitor *mon) static void ltl_illegal_state(struct task_struct *task, struct ltl_monitor *mon)
{ {
CONCATENATE(trace_error_, MONITOR_NAME)(task); CONCATENATE(trace_error_, MONITOR_NAME)(task);
rv_cond_react(task); rv_react(&RV_MONITOR_NAME, "rv: "__stringify(MONITOR_NAME)": %s[%d]: violation detected\n",
task->comm, task->pid);
} }
static void ltl_attempt_start(struct task_struct *task, struct ltl_monitor *mon) static void ltl_attempt_start(struct task_struct *task, struct ltl_monitor *mon)


@ -16,6 +16,9 @@
* @name: name of the syscall * @name: name of the syscall
* @syscall_nr: number of the syscall * @syscall_nr: number of the syscall
* @nb_args: number of parameters it takes * @nb_args: number of parameters it takes
* @user_arg_is_str: set if the argument associated with @user_arg_size is a string
* @user_arg_size: index of the @args entry holding the size of the user space data to read
* @user_mask: mask of the @args entries that will read user space
* @types: list of types as strings * @types: list of types as strings
* @args: list of args as strings (args[i] matches types[i]) * @args: list of args as strings (args[i] matches types[i])
* @enter_fields: list of fields for syscall_enter trace event * @enter_fields: list of fields for syscall_enter trace event
@ -25,7 +28,10 @@
struct syscall_metadata { struct syscall_metadata {
const char *name; const char *name;
int syscall_nr; int syscall_nr;
int nb_args; u8 nb_args:7;
u8 user_arg_is_str:1;
s8 user_arg_size;
short user_mask;
const char **types; const char **types;
const char **args; const char **args;
struct list_head enter_fields; struct list_head enter_fields;
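As a hedged illustration of how the new fields might be populated (semantics inferred from the kernel-doc above; the values are hypothetical, not taken from this patch): for write(fd, buf, count), the buffer argument at index 1 reads user space, and the amount to read comes from the count argument at index 2:

	static struct syscall_metadata meta_write_example = {
		.name		 = "sys_write",
		.nb_args	 = 3,
		.user_arg_is_str = 0,		/* buf is raw data, not a string */
		.user_arg_size	 = 2,		/* args[2] (count) holds the size to read */
		.user_mask	 = BIT(1),	/* args[1] (buf) points to user space */
	};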


@ -342,6 +342,20 @@ config DYNAMIC_FTRACE_WITH_JMP
depends on DYNAMIC_FTRACE_WITH_DIRECT_CALLS depends on DYNAMIC_FTRACE_WITH_DIRECT_CALLS
depends on HAVE_DYNAMIC_FTRACE_WITH_JMP depends on HAVE_DYNAMIC_FTRACE_WITH_JMP
config FUNCTION_SELF_TRACING
bool "Function trace tracing code"
depends on FUNCTION_TRACER
help
Normally all the tracing code is marked notrace, so the function
tracer will ignore all the tracing functions. Sometimes it is useful
for debugging to trace some of the tracing infrastructure itself.
Enable this to allow some of the tracing infrastructure to be traced
by the function tracer. Note, this will likely add noise to function
tracing if events and other tracing features are enabled along with
function tracing.
If unsure, say N.
config FPROBE config FPROBE
bool "Kernel Function Probe (fprobe)" bool "Kernel Function Probe (fprobe)"
depends on HAVE_FUNCTION_GRAPH_FREGS && HAVE_FTRACE_GRAPH_FUNC depends on HAVE_FUNCTION_GRAPH_FREGS && HAVE_FTRACE_GRAPH_FUNC
@ -587,6 +601,20 @@ config FTRACE_SYSCALLS
help help
Basic tracer to catch the syscall entry and exit events. Basic tracer to catch the syscall entry and exit events.
config TRACE_SYSCALL_BUF_SIZE_DEFAULT
int "System call user read max size"
range 0 165
default 63
depends on FTRACE_SYSCALLS
help
Some system call trace events will record the data from a user
space address that one of the parameters points to. The amount of
data recorded per event is limited. This option sets that limit,
which also bounds how much user space data perf can read.
For a tracing instance, this size may be changed by writing into
its syscall_user_buf_size file.
config TRACER_SNAPSHOT config TRACER_SNAPSHOT
bool "Create a snapshot trace buffer" bool "Create a snapshot trace buffer"
select TRACER_MAX_TRACE select TRACER_MAX_TRACE


@ -16,6 +16,23 @@ obj-y += trace_selftest_dynamic.o
endif endif
endif endif
# Allow some files to be function traced
ifdef CONFIG_FUNCTION_SELF_TRACING
CFLAGS_trace_output.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_seq.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_stat.o = $(CC_FLAGS_FTRACE)
CFLAGS_tracing_map.o = $(CC_FLAGS_FTRACE)
CFLAGS_synth_event_gen_test.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_events.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_syscalls.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_events_filter.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_events_trigger.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_events_synth.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_events_hist.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_events_user.o = $(CC_FLAGS_FTRACE)
CFLAGS_trace_dynevent.o = $(CC_FLAGS_FTRACE)
endif
ifdef CONFIG_FTRACE_STARTUP_TEST ifdef CONFIG_FTRACE_STARTUP_TEST
CFLAGS_trace_kprobe_selftest.o = $(CC_FLAGS_FTRACE) CFLAGS_trace_kprobe_selftest.o = $(CC_FLAGS_FTRACE)
obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe_selftest.o obj-$(CONFIG_KPROBE_EVENTS) += trace_kprobe_selftest.o


@ -1738,7 +1738,7 @@ static enum print_line_t print_one_line(struct trace_iterator *iter,
t = te_blk_io_trace(iter->ent); t = te_blk_io_trace(iter->ent);
what = (t->action & ((1 << BLK_TC_SHIFT) - 1)) & ~__BLK_TA_CGROUP; what = (t->action & ((1 << BLK_TC_SHIFT) - 1)) & ~__BLK_TA_CGROUP;
long_act = !!(tr->trace_flags & TRACE_ITER_VERBOSE); long_act = !!(tr->trace_flags & TRACE_ITER(VERBOSE));
log_action = classic ? &blk_log_action_classic : &blk_log_action; log_action = classic ? &blk_log_action_classic : &blk_log_action;
has_cg = t->action & __BLK_TA_CGROUP; has_cg = t->action & __BLK_TA_CGROUP;
@ -1803,9 +1803,9 @@ blk_tracer_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
/* don't output context-info for blk_classic output */ /* don't output context-info for blk_classic output */
if (bit == TRACE_BLK_OPT_CLASSIC) { if (bit == TRACE_BLK_OPT_CLASSIC) {
if (set) if (set)
tr->trace_flags &= ~TRACE_ITER_CONTEXT_INFO; tr->trace_flags &= ~TRACE_ITER(CONTEXT_INFO);
else else
tr->trace_flags |= TRACE_ITER_CONTEXT_INFO; tr->trace_flags |= TRACE_ITER(CONTEXT_INFO);
} }
return 0; return 0;
} }
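The TRACE_ITER_VERBOSE to TRACE_ITER(VERBOSE) conversion here goes together with the later change of trace_flags from u32 to u64. The macro itself is not shown in this diff, but it presumably builds a 64-bit mask from the flag's bit number, along the lines of:

	/* assumed definition, not quoted from this patch */
	#define TRACE_ITER(flag)	(1ULL << TRACE_ITER_##flag##_BIT)

Keeping the flag name as the macro argument lets all call sites stay width-agnostic while the underlying storage widens past 32 bits.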


@ -498,9 +498,6 @@ void *fgraph_retrieve_parent_data(int idx, int *size_bytes, int depth)
return get_data_type_data(current, offset); return get_data_type_data(current, offset);
} }
/* Both enabled by default (can be cleared by function_graph tracer flags) */
bool fgraph_sleep_time = true;
#ifdef CONFIG_DYNAMIC_FTRACE #ifdef CONFIG_DYNAMIC_FTRACE
/* /*
* archs can override this function if they must do something * archs can override this function if they must do something
@ -1019,15 +1016,11 @@ void fgraph_init_ops(struct ftrace_ops *dst_ops,
mutex_init(&dst_ops->local_hash.regex_lock); mutex_init(&dst_ops->local_hash.regex_lock);
INIT_LIST_HEAD(&dst_ops->subop_list); INIT_LIST_HEAD(&dst_ops->subop_list);
dst_ops->flags |= FTRACE_OPS_FL_INITIALIZED; dst_ops->flags |= FTRACE_OPS_FL_INITIALIZED;
dst_ops->private = src_ops->private;
} }
#endif #endif
} }
void ftrace_graph_sleep_time_control(bool enable)
{
fgraph_sleep_time = enable;
}
/* /*
* Simply points to ftrace_stub, but with the proper protocol. * Simply points to ftrace_stub, but with the proper protocol.
* Defined by the linker script in linux/vmlinux.lds.h * Defined by the linker script in linux/vmlinux.lds.h
@ -1098,7 +1091,7 @@ ftrace_graph_probe_sched_switch(void *ignore, bool preempt,
* Does the user want to count the time a function was asleep. * Does the user want to count the time a function was asleep.
* If so, do not update the time stamps. * If so, do not update the time stamps.
*/ */
if (fgraph_sleep_time) if (!fgraph_no_sleep_time)
return; return;
timestamp = trace_clock_local(); timestamp = trace_clock_local();
@ -1376,6 +1369,13 @@ int register_ftrace_graph(struct fgraph_ops *gops)
ftrace_graph_active++; ftrace_graph_active++;
/* Always save the function, and reset at unregistering */
gops->saved_func = gops->entryfunc;
#ifdef CONFIG_DYNAMIC_FTRACE
if (ftrace_pids_enabled(&gops->ops))
gops->entryfunc = fgraph_pid_func;
#endif
if (ftrace_graph_active == 2) if (ftrace_graph_active == 2)
ftrace_graph_disable_direct(true); ftrace_graph_disable_direct(true);
@ -1395,8 +1395,6 @@ int register_ftrace_graph(struct fgraph_ops *gops)
} else { } else {
init_task_vars(gops->idx); init_task_vars(gops->idx);
} }
/* Always save the function, and reset at unregistering */
gops->saved_func = gops->entryfunc;
gops->ops.flags |= FTRACE_OPS_FL_GRAPH; gops->ops.flags |= FTRACE_OPS_FL_GRAPH;


@ -10,6 +10,7 @@
#include <linux/kprobes.h> #include <linux/kprobes.h>
#include <linux/list.h> #include <linux/list.h>
#include <linux/mutex.h> #include <linux/mutex.h>
#include <linux/rhashtable.h>
#include <linux/slab.h> #include <linux/slab.h>
#include <linux/sort.h> #include <linux/sort.h>
@ -41,60 +42,68 @@
* - RCU hlist traversal under disabling preempt * - RCU hlist traversal under disabling preempt
*/ */
static struct hlist_head fprobe_table[FPROBE_TABLE_SIZE]; static struct hlist_head fprobe_table[FPROBE_TABLE_SIZE];
static struct hlist_head fprobe_ip_table[FPROBE_IP_TABLE_SIZE]; static struct rhltable fprobe_ip_table;
static DEFINE_MUTEX(fprobe_mutex); static DEFINE_MUTEX(fprobe_mutex);
static struct fgraph_ops fprobe_graph_ops;
/* static u32 fprobe_node_hashfn(const void *data, u32 len, u32 seed)
* Find the first fprobe in the hlist. It will be iterated twice in the entry
* probe, once to correct the total required size, and a second time to
* call back the user handlers.
* Thus the hlist in the fprobe_table must be sorted and a new probe needs
* to be added *before* the first fprobe.
*/
static struct fprobe_hlist_node *find_first_fprobe_node(unsigned long ip)
{ {
struct fprobe_hlist_node *node; return hash_ptr(*(unsigned long **)data, 32);
struct hlist_head *head;
head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
hlist_for_each_entry_rcu(node, head, hlist,
lockdep_is_held(&fprobe_mutex)) {
if (node->addr == ip)
return node;
}
return NULL;
} }
NOKPROBE_SYMBOL(find_first_fprobe_node);
static int fprobe_node_cmp(struct rhashtable_compare_arg *arg,
const void *ptr)
{
unsigned long key = *(unsigned long *)arg->key;
const struct fprobe_hlist_node *n = ptr;
return n->addr != key;
}
static u32 fprobe_node_obj_hashfn(const void *data, u32 len, u32 seed)
{
const struct fprobe_hlist_node *n = data;
return hash_ptr((void *)n->addr, 32);
}
static const struct rhashtable_params fprobe_rht_params = {
.head_offset = offsetof(struct fprobe_hlist_node, hlist),
.key_offset = offsetof(struct fprobe_hlist_node, addr),
.key_len = sizeof_field(struct fprobe_hlist_node, addr),
.hashfn = fprobe_node_hashfn,
.obj_hashfn = fprobe_node_obj_hashfn,
.obj_cmpfn = fprobe_node_cmp,
.automatic_shrinking = true,
};
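As a minimal sketch of the rhltable API these params drive (condensed from the patterns used later in this patch; do_something() is a hypothetical callback): hashfn hashes a bare key, obj_hashfn hashes the key embedded in an object, and the two must agree so that key lookups land in the bucket where objects were inserted.

	struct fprobe_hlist_node *node;
	struct rhlist_head *head, *pos;
	unsigned long ip = addr;	/* hypothetical key: a probed address */

	/* insert (serialized by fprobe_mutex) */
	rhltable_insert(&fprobe_ip_table, &node->hlist, fprobe_rht_params);

	/* lookup and walk every entry sharing the key (under RCU) */
	rcu_read_lock();
	head = rhltable_lookup(&fprobe_ip_table, &ip, fprobe_rht_params);
	rhl_for_each_entry_rcu(node, pos, head, hlist)
		do_something(node);
	rcu_read_unlock();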
/* Node insertion and deletion requires the fprobe_mutex */ /* Node insertion and deletion requires the fprobe_mutex */
static void insert_fprobe_node(struct fprobe_hlist_node *node) static int insert_fprobe_node(struct fprobe_hlist_node *node)
{ {
unsigned long ip = node->addr;
struct fprobe_hlist_node *next;
struct hlist_head *head;
lockdep_assert_held(&fprobe_mutex); lockdep_assert_held(&fprobe_mutex);
next = find_first_fprobe_node(ip); return rhltable_insert(&fprobe_ip_table, &node->hlist, fprobe_rht_params);
if (next) {
hlist_add_before_rcu(&node->hlist, &next->hlist);
return;
}
head = &fprobe_ip_table[hash_ptr((void *)ip, FPROBE_IP_HASH_BITS)];
hlist_add_head_rcu(&node->hlist, head);
} }
/* Return true if there are synonyms */ /* Return true if there are synonyms */
static bool delete_fprobe_node(struct fprobe_hlist_node *node) static bool delete_fprobe_node(struct fprobe_hlist_node *node)
{ {
lockdep_assert_held(&fprobe_mutex); lockdep_assert_held(&fprobe_mutex);
bool ret;
/* Avoid double deleting */ /* Avoid double deleting */
if (READ_ONCE(node->fp) != NULL) { if (READ_ONCE(node->fp) != NULL) {
WRITE_ONCE(node->fp, NULL); WRITE_ONCE(node->fp, NULL);
hlist_del_rcu(&node->hlist); rhltable_remove(&fprobe_ip_table, &node->hlist,
fprobe_rht_params);
} }
return !!find_first_fprobe_node(node->addr);
rcu_read_lock();
ret = !!rhltable_lookup(&fprobe_ip_table, &node->addr,
fprobe_rht_params);
rcu_read_unlock();
return ret;
} }
/* Check existence of the fprobe */ /* Check existence of the fprobe */
@ -246,12 +255,128 @@ static inline int __fprobe_kprobe_handler(unsigned long ip, unsigned long parent
return ret; return ret;
} }
static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops, #if defined(CONFIG_DYNAMIC_FTRACE_WITH_ARGS) || defined(CONFIG_DYNAMIC_FTRACE_WITH_REGS)
struct ftrace_regs *fregs) /* ftrace_ops callback, this processes fprobes which have only entry_handler. */
static void fprobe_ftrace_entry(unsigned long ip, unsigned long parent_ip,
struct ftrace_ops *ops, struct ftrace_regs *fregs)
{
struct fprobe_hlist_node *node;
struct rhlist_head *head, *pos;
struct fprobe *fp;
int bit;
bit = ftrace_test_recursion_trylock(ip, parent_ip);
if (bit < 0)
return;
/*
* ftrace_test_recursion_trylock() disables preemption, but
* rhltable_lookup() checks whether rcu_read_lock() is held.
* So we take rcu_read_lock() here.
*/
rcu_read_lock();
head = rhltable_lookup(&fprobe_ip_table, &ip, fprobe_rht_params);
rhl_for_each_entry_rcu(node, pos, head, hlist) {
if (node->addr != ip)
break;
fp = READ_ONCE(node->fp);
if (unlikely(!fp || fprobe_disabled(fp) || fp->exit_handler))
continue;
if (fprobe_shared_with_kprobes(fp))
__fprobe_kprobe_handler(ip, parent_ip, fp, fregs, NULL);
else
__fprobe_handler(ip, parent_ip, fp, fregs, NULL);
}
rcu_read_unlock();
ftrace_test_recursion_unlock(bit);
}
NOKPROBE_SYMBOL(fprobe_ftrace_entry);
static struct ftrace_ops fprobe_ftrace_ops = {
.func = fprobe_ftrace_entry,
.flags = FTRACE_OPS_FL_SAVE_ARGS,
};
static int fprobe_ftrace_active;
static int fprobe_ftrace_add_ips(unsigned long *addrs, int num)
{
int ret;
lockdep_assert_held(&fprobe_mutex);
ret = ftrace_set_filter_ips(&fprobe_ftrace_ops, addrs, num, 0, 0);
if (ret)
return ret;
if (!fprobe_ftrace_active) {
ret = register_ftrace_function(&fprobe_ftrace_ops);
if (ret) {
ftrace_free_filter(&fprobe_ftrace_ops);
return ret;
}
}
fprobe_ftrace_active++;
return 0;
}
static void fprobe_ftrace_remove_ips(unsigned long *addrs, int num)
{
lockdep_assert_held(&fprobe_mutex);
fprobe_ftrace_active--;
if (!fprobe_ftrace_active)
unregister_ftrace_function(&fprobe_ftrace_ops);
if (num)
ftrace_set_filter_ips(&fprobe_ftrace_ops, addrs, num, 1, 0);
}
static bool fprobe_is_ftrace(struct fprobe *fp)
{
return !fp->exit_handler;
}
#ifdef CONFIG_MODULES
static void fprobe_set_ips(unsigned long *ips, unsigned int cnt, int remove,
int reset)
{
ftrace_set_filter_ips(&fprobe_graph_ops.ops, ips, cnt, remove, reset);
ftrace_set_filter_ips(&fprobe_ftrace_ops, ips, cnt, remove, reset);
}
#endif
#else
static int fprobe_ftrace_add_ips(unsigned long *addrs, int num)
{
return -ENOENT;
}
static void fprobe_ftrace_remove_ips(unsigned long *addrs, int num)
{
}
static bool fprobe_is_ftrace(struct fprobe *fp)
{
return false;
}
#ifdef CONFIG_MODULES
static void fprobe_set_ips(unsigned long *ips, unsigned int cnt, int remove,
int reset)
{
ftrace_set_filter_ips(&fprobe_graph_ops.ops, ips, cnt, remove, reset);
}
#endif
#endif /* !CONFIG_DYNAMIC_FTRACE_WITH_ARGS && !CONFIG_DYNAMIC_FTRACE_WITH_REGS */
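To illustrate the split this introduces (a hedged sketch; the handler signature follows the current fprobe API rather than being quoted from this patch): a probe with only an entry_handler now takes the lighter ftrace path, while adding an exit_handler routes it through fgraph.

	static int my_entry(struct fprobe *fp, unsigned long ip,
			    unsigned long ret_ip, struct ftrace_regs *fregs,
			    void *data)
	{
		return 0;
	}

	static struct fprobe my_fp = {
		.entry_handler = my_entry,
		/*
		 * No .exit_handler: fprobe_is_ftrace() is true, so
		 * register_fprobe_ips() attaches via fprobe_ftrace_ops
		 * instead of the fgraph-based fprobe_graph_ops, skipping
		 * the fgraph shadow-stack setup entirely.
		 */
	};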
/* fgraph_ops callback, this processes fprobes which have exit_handler. */
static int fprobe_fgraph_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
struct ftrace_regs *fregs)
{ {
struct fprobe_hlist_node *node, *first;
unsigned long *fgraph_data = NULL; unsigned long *fgraph_data = NULL;
unsigned long func = trace->func; unsigned long func = trace->func;
struct fprobe_hlist_node *node;
struct rhlist_head *head, *pos;
unsigned long ret_ip; unsigned long ret_ip;
int reserved_words; int reserved_words;
struct fprobe *fp; struct fprobe *fp;
@ -260,14 +385,12 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
if (WARN_ON_ONCE(!fregs)) if (WARN_ON_ONCE(!fregs))
return 0; return 0;
first = node = find_first_fprobe_node(func); guard(rcu)();
if (unlikely(!first)) head = rhltable_lookup(&fprobe_ip_table, &func, fprobe_rht_params);
return 0;
reserved_words = 0; reserved_words = 0;
hlist_for_each_entry_from_rcu(node, hlist) { rhl_for_each_entry_rcu(node, pos, head, hlist) {
if (node->addr != func) if (node->addr != func)
break; continue;
fp = READ_ONCE(node->fp); fp = READ_ONCE(node->fp);
if (!fp || !fp->exit_handler) if (!fp || !fp->exit_handler)
continue; continue;
@ -278,15 +401,14 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
reserved_words += reserved_words +=
FPROBE_HEADER_SIZE_IN_LONG + SIZE_IN_LONG(fp->entry_data_size); FPROBE_HEADER_SIZE_IN_LONG + SIZE_IN_LONG(fp->entry_data_size);
} }
node = first;
if (reserved_words) { if (reserved_words) {
fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long)); fgraph_data = fgraph_reserve_data(gops->idx, reserved_words * sizeof(long));
if (unlikely(!fgraph_data)) { if (unlikely(!fgraph_data)) {
hlist_for_each_entry_from_rcu(node, hlist) { rhl_for_each_entry_rcu(node, pos, head, hlist) {
if (node->addr != func) if (node->addr != func)
break; continue;
fp = READ_ONCE(node->fp); fp = READ_ONCE(node->fp);
if (fp && !fprobe_disabled(fp)) if (fp && !fprobe_disabled(fp) && !fprobe_is_ftrace(fp))
fp->nmissed++; fp->nmissed++;
} }
return 0; return 0;
@ -299,14 +421,14 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
*/ */
ret_ip = ftrace_regs_get_return_address(fregs); ret_ip = ftrace_regs_get_return_address(fregs);
used = 0; used = 0;
hlist_for_each_entry_from_rcu(node, hlist) { rhl_for_each_entry_rcu(node, pos, head, hlist) {
int data_size; int data_size;
void *data; void *data;
if (node->addr != func) if (node->addr != func)
break; continue;
fp = READ_ONCE(node->fp); fp = READ_ONCE(node->fp);
if (!fp || fprobe_disabled(fp)) if (unlikely(!fp || fprobe_disabled(fp) || fprobe_is_ftrace(fp)))
continue; continue;
data_size = fp->entry_data_size; data_size = fp->entry_data_size;
@ -334,7 +456,7 @@ static int fprobe_entry(struct ftrace_graph_ent *trace, struct fgraph_ops *gops,
/* If any exit_handler is set, data must be used. */ /* If any exit_handler is set, data must be used. */
return used != 0; return used != 0;
} }
NOKPROBE_SYMBOL(fprobe_entry); NOKPROBE_SYMBOL(fprobe_fgraph_entry);
static void fprobe_return(struct ftrace_graph_ret *trace, static void fprobe_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops, struct fgraph_ops *gops,
@ -373,7 +495,7 @@ static void fprobe_return(struct ftrace_graph_ret *trace,
NOKPROBE_SYMBOL(fprobe_return); NOKPROBE_SYMBOL(fprobe_return);
static struct fgraph_ops fprobe_graph_ops = { static struct fgraph_ops fprobe_graph_ops = {
.entryfunc = fprobe_entry, .entryfunc = fprobe_fgraph_entry,
.retfunc = fprobe_return, .retfunc = fprobe_return,
}; };
static int fprobe_graph_active; static int fprobe_graph_active;
@ -449,25 +571,18 @@ static int fprobe_addr_list_add(struct fprobe_addr_list *alist, unsigned long ad
return 0; return 0;
} }
static void fprobe_remove_node_in_module(struct module *mod, struct hlist_head *head, static void fprobe_remove_node_in_module(struct module *mod, struct fprobe_hlist_node *node,
struct fprobe_addr_list *alist) struct fprobe_addr_list *alist)
{ {
struct fprobe_hlist_node *node; if (!within_module(node->addr, mod))
int ret = 0; return;
if (delete_fprobe_node(node))
hlist_for_each_entry_rcu(node, head, hlist, return;
lockdep_is_held(&fprobe_mutex)) { /*
if (!within_module(node->addr, mod)) * If failed to update alist, just continue to update hlist.
continue; * Therefore, at list user handler will not hit anymore.
if (delete_fprobe_node(node)) */
continue; fprobe_addr_list_add(alist, node->addr);
/*
* If we failed to update alist, just continue to update hlist.
* Therefore, at least the user handler will not hit anymore.
*/
if (!ret)
ret = fprobe_addr_list_add(alist, node->addr);
}
} }
/* Handle module unloading to manage fprobe_ip_table. */ /* Handle module unloading to manage fprobe_ip_table. */
@ -475,8 +590,9 @@ static int fprobe_module_callback(struct notifier_block *nb,
unsigned long val, void *data) unsigned long val, void *data)
{ {
struct fprobe_addr_list alist = {.size = FPROBE_IPS_BATCH_INIT}; struct fprobe_addr_list alist = {.size = FPROBE_IPS_BATCH_INIT};
struct fprobe_hlist_node *node;
struct rhashtable_iter iter;
struct module *mod = data; struct module *mod = data;
int i;
if (val != MODULE_STATE_GOING) if (val != MODULE_STATE_GOING)
return NOTIFY_DONE; return NOTIFY_DONE;
@ -487,12 +603,19 @@ static int fprobe_module_callback(struct notifier_block *nb,
return NOTIFY_DONE; return NOTIFY_DONE;
mutex_lock(&fprobe_mutex); mutex_lock(&fprobe_mutex);
for (i = 0; i < FPROBE_IP_TABLE_SIZE; i++) rhltable_walk_enter(&fprobe_ip_table, &iter);
fprobe_remove_node_in_module(mod, &fprobe_ip_table[i], &alist); do {
rhashtable_walk_start(&iter);
while ((node = rhashtable_walk_next(&iter)) && !IS_ERR(node))
fprobe_remove_node_in_module(mod, node, &alist);
rhashtable_walk_stop(&iter);
} while (node == ERR_PTR(-EAGAIN));
rhashtable_walk_exit(&iter);
if (alist.index > 0) if (alist.index > 0)
ftrace_set_filter_ips(&fprobe_graph_ops.ops, fprobe_set_ips(alist.addrs, alist.index, 1, 0);
alist.addrs, alist.index, 1, 0);
mutex_unlock(&fprobe_mutex); mutex_unlock(&fprobe_mutex);
kfree(alist.addrs); kfree(alist.addrs);
@ -725,11 +848,23 @@ int register_fprobe_ips(struct fprobe *fp, unsigned long *addrs, int num)
mutex_lock(&fprobe_mutex); mutex_lock(&fprobe_mutex);
hlist_array = fp->hlist_array; hlist_array = fp->hlist_array;
ret = fprobe_graph_add_ips(addrs, num); if (fprobe_is_ftrace(fp))
ret = fprobe_ftrace_add_ips(addrs, num);
else
ret = fprobe_graph_add_ips(addrs, num);
if (!ret) { if (!ret) {
add_fprobe_hash(fp); add_fprobe_hash(fp);
for (i = 0; i < hlist_array->size; i++) for (i = 0; i < hlist_array->size; i++) {
insert_fprobe_node(&hlist_array->array[i]); ret = insert_fprobe_node(&hlist_array->array[i]);
if (ret)
break;
}
/* roll back already-inserted nodes on error */
if (ret) {
for (i--; i >= 0; i--)
delete_fprobe_node(&hlist_array->array[i]);
}
} }
mutex_unlock(&fprobe_mutex); mutex_unlock(&fprobe_mutex);
@ -813,7 +948,10 @@ int unregister_fprobe(struct fprobe *fp)
} }
del_fprobe_hash(fp); del_fprobe_hash(fp);
fprobe_graph_remove_ips(addrs, count); if (fprobe_is_ftrace(fp))
fprobe_ftrace_remove_ips(addrs, count);
else
fprobe_graph_remove_ips(addrs, count);
kfree_rcu(hlist_array, rcu); kfree_rcu(hlist_array, rcu);
fp->hlist_array = NULL; fp->hlist_array = NULL;
@ -825,3 +963,10 @@ int unregister_fprobe(struct fprobe *fp)
return ret; return ret;
} }
EXPORT_SYMBOL_GPL(unregister_fprobe); EXPORT_SYMBOL_GPL(unregister_fprobe);
static int __init fprobe_initcall(void)
{
rhltable_init(&fprobe_ip_table, &fprobe_rht_params);
return 0;
}
core_initcall(fprobe_initcall);


@ -534,7 +534,9 @@ static int function_stat_headers(struct seq_file *m)
static int function_stat_show(struct seq_file *m, void *v) static int function_stat_show(struct seq_file *m, void *v)
{ {
struct trace_array *tr = trace_get_global_array();
struct ftrace_profile *rec = v; struct ftrace_profile *rec = v;
const char *refsymbol = NULL;
char str[KSYM_SYMBOL_LEN]; char str[KSYM_SYMBOL_LEN];
#ifdef CONFIG_FUNCTION_GRAPH_TRACER #ifdef CONFIG_FUNCTION_GRAPH_TRACER
static struct trace_seq s; static struct trace_seq s;
@ -554,7 +556,29 @@ static int function_stat_show(struct seq_file *m, void *v)
return 0; return 0;
#endif #endif
kallsyms_lookup(rec->ip, NULL, NULL, NULL, str); if (tr->trace_flags & TRACE_ITER(PROF_TEXT_OFFSET)) {
unsigned long offset;
if (core_kernel_text(rec->ip)) {
refsymbol = "_text";
offset = rec->ip - (unsigned long)_text;
} else {
struct module *mod;
guard(rcu)();
mod = __module_text_address(rec->ip);
if (mod) {
refsymbol = mod->name;
/* Calculate offset from module's text entry address. */
offset = rec->ip - (unsigned long)mod->mem[MOD_TEXT].base;
}
}
if (refsymbol)
snprintf(str, sizeof(str), " %s+%#lx", refsymbol, offset);
}
if (!refsymbol)
kallsyms_lookup(rec->ip, NULL, NULL, NULL, str);
seq_printf(m, " %-30.30s %10lu", str, rec->counter); seq_printf(m, " %-30.30s %10lu", str, rec->counter);
#ifdef CONFIG_FUNCTION_GRAPH_TRACER #ifdef CONFIG_FUNCTION_GRAPH_TRACER
@ -838,6 +862,8 @@ static int profile_graph_entry(struct ftrace_graph_ent *trace,
return 1; return 1;
} }
bool fprofile_no_sleep_time;
static void profile_graph_return(struct ftrace_graph_ret *trace, static void profile_graph_return(struct ftrace_graph_ret *trace,
struct fgraph_ops *gops, struct fgraph_ops *gops,
struct ftrace_regs *fregs) struct ftrace_regs *fregs)
@ -863,7 +889,7 @@ static void profile_graph_return(struct ftrace_graph_ret *trace,
calltime = rettime - profile_data->calltime; calltime = rettime - profile_data->calltime;
if (!fgraph_sleep_time) { if (fprofile_no_sleep_time) {
if (current->ftrace_sleeptime) if (current->ftrace_sleeptime)
calltime -= current->ftrace_sleeptime - profile_data->sleeptime; calltime -= current->ftrace_sleeptime - profile_data->sleeptime;
} }
@ -6075,7 +6101,7 @@ int register_ftrace_direct(struct ftrace_ops *ops, unsigned long addr)
new_hash = NULL; new_hash = NULL;
ops->func = call_direct_funcs; ops->func = call_direct_funcs;
ops->flags = MULTI_FLAGS; ops->flags |= MULTI_FLAGS;
ops->trampoline = FTRACE_REGS_ADDR; ops->trampoline = FTRACE_REGS_ADDR;
ops->direct_call = addr; ops->direct_call = addr;


@ -3,6 +3,7 @@
* Copyright (C) 2021 VMware Inc, Steven Rostedt <rostedt@goodmis.org> * Copyright (C) 2021 VMware Inc, Steven Rostedt <rostedt@goodmis.org>
*/ */
#include <linux/spinlock.h> #include <linux/spinlock.h>
#include <linux/seqlock.h>
#include <linux/irq_work.h> #include <linux/irq_work.h>
#include <linux/slab.h> #include <linux/slab.h>
#include "trace.h" #include "trace.h"
@ -126,7 +127,7 @@ bool trace_pid_list_is_set(struct trace_pid_list *pid_list, unsigned int pid)
{ {
union upper_chunk *upper_chunk; union upper_chunk *upper_chunk;
union lower_chunk *lower_chunk; union lower_chunk *lower_chunk;
unsigned long flags; unsigned int seq;
unsigned int upper1; unsigned int upper1;
unsigned int upper2; unsigned int upper2;
unsigned int lower; unsigned int lower;
@ -138,14 +139,16 @@ bool trace_pid_list_is_set(struct trace_pid_list *pid_list, unsigned int pid)
if (pid_split(pid, &upper1, &upper2, &lower) < 0) if (pid_split(pid, &upper1, &upper2, &lower) < 0)
return false; return false;
raw_spin_lock_irqsave(&pid_list->lock, flags); do {
upper_chunk = pid_list->upper[upper1]; seq = read_seqcount_begin(&pid_list->seqcount);
if (upper_chunk) { ret = false;
lower_chunk = upper_chunk->data[upper2]; upper_chunk = pid_list->upper[upper1];
if (lower_chunk) if (upper_chunk) {
ret = test_bit(lower, lower_chunk->data); lower_chunk = upper_chunk->data[upper2];
} if (lower_chunk)
raw_spin_unlock_irqrestore(&pid_list->lock, flags); ret = test_bit(lower, lower_chunk->data);
}
} while (read_seqcount_retry(&pid_list->seqcount, seq));
return ret; return ret;
} }
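The conversion trades the reader-side spinlock for a raw-spinlock-backed seqcount: readers become lockless and retry instead of blocking, and only writers still serialize on pid_list->lock. A minimal sketch of the pairing, restating the pattern used in this hunk:

	/* writer: take the lock, bump the sequence around the update */
	raw_spin_lock_irqsave(&pid_list->lock, flags);
	write_seqcount_begin(&pid_list->seqcount);
	/* ... modify the chunk tree ... */
	write_seqcount_end(&pid_list->seqcount);
	raw_spin_unlock_irqrestore(&pid_list->lock, flags);

	/* reader: lockless, retries if a writer raced with it */
	do {
		seq = read_seqcount_begin(&pid_list->seqcount);
		/* ... read the chunk tree ... */
	} while (read_seqcount_retry(&pid_list->seqcount, seq));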
@ -178,6 +181,7 @@ int trace_pid_list_set(struct trace_pid_list *pid_list, unsigned int pid)
return -EINVAL; return -EINVAL;
raw_spin_lock_irqsave(&pid_list->lock, flags); raw_spin_lock_irqsave(&pid_list->lock, flags);
write_seqcount_begin(&pid_list->seqcount);
upper_chunk = pid_list->upper[upper1]; upper_chunk = pid_list->upper[upper1];
if (!upper_chunk) { if (!upper_chunk) {
upper_chunk = get_upper_chunk(pid_list); upper_chunk = get_upper_chunk(pid_list);
@ -199,6 +203,7 @@ int trace_pid_list_set(struct trace_pid_list *pid_list, unsigned int pid)
set_bit(lower, lower_chunk->data); set_bit(lower, lower_chunk->data);
ret = 0; ret = 0;
out: out:
write_seqcount_end(&pid_list->seqcount);
raw_spin_unlock_irqrestore(&pid_list->lock, flags); raw_spin_unlock_irqrestore(&pid_list->lock, flags);
return ret; return ret;
} }
@ -230,6 +235,7 @@ int trace_pid_list_clear(struct trace_pid_list *pid_list, unsigned int pid)
return -EINVAL; return -EINVAL;
raw_spin_lock_irqsave(&pid_list->lock, flags); raw_spin_lock_irqsave(&pid_list->lock, flags);
write_seqcount_begin(&pid_list->seqcount);
upper_chunk = pid_list->upper[upper1]; upper_chunk = pid_list->upper[upper1];
if (!upper_chunk) if (!upper_chunk)
goto out; goto out;
@ -250,6 +256,7 @@ int trace_pid_list_clear(struct trace_pid_list *pid_list, unsigned int pid)
} }
} }
out: out:
write_seqcount_end(&pid_list->seqcount);
raw_spin_unlock_irqrestore(&pid_list->lock, flags); raw_spin_unlock_irqrestore(&pid_list->lock, flags);
return 0; return 0;
} }
@ -340,8 +347,10 @@ static void pid_list_refill_irq(struct irq_work *iwork)
again: again:
raw_spin_lock(&pid_list->lock); raw_spin_lock(&pid_list->lock);
write_seqcount_begin(&pid_list->seqcount);
upper_count = CHUNK_ALLOC - pid_list->free_upper_chunks; upper_count = CHUNK_ALLOC - pid_list->free_upper_chunks;
lower_count = CHUNK_ALLOC - pid_list->free_lower_chunks; lower_count = CHUNK_ALLOC - pid_list->free_lower_chunks;
write_seqcount_end(&pid_list->seqcount);
raw_spin_unlock(&pid_list->lock); raw_spin_unlock(&pid_list->lock);
if (upper_count <= 0 && lower_count <= 0) if (upper_count <= 0 && lower_count <= 0)
@ -370,6 +379,7 @@ static void pid_list_refill_irq(struct irq_work *iwork)
} }
raw_spin_lock(&pid_list->lock); raw_spin_lock(&pid_list->lock);
write_seqcount_begin(&pid_list->seqcount);
if (upper) { if (upper) {
*upper_next = pid_list->upper_list; *upper_next = pid_list->upper_list;
pid_list->upper_list = upper; pid_list->upper_list = upper;
@ -380,6 +390,7 @@ static void pid_list_refill_irq(struct irq_work *iwork)
pid_list->lower_list = lower; pid_list->lower_list = lower;
pid_list->free_lower_chunks += lcnt; pid_list->free_lower_chunks += lcnt;
} }
write_seqcount_end(&pid_list->seqcount);
raw_spin_unlock(&pid_list->lock); raw_spin_unlock(&pid_list->lock);
/* /*
@ -419,6 +430,7 @@ struct trace_pid_list *trace_pid_list_alloc(void)
init_irq_work(&pid_list->refill_irqwork, pid_list_refill_irq); init_irq_work(&pid_list->refill_irqwork, pid_list_refill_irq);
raw_spin_lock_init(&pid_list->lock); raw_spin_lock_init(&pid_list->lock);
seqcount_raw_spinlock_init(&pid_list->seqcount, &pid_list->lock);
for (i = 0; i < CHUNK_ALLOC; i++) { for (i = 0; i < CHUNK_ALLOC; i++) {
union upper_chunk *chunk; union upper_chunk *chunk;


@ -76,6 +76,7 @@ union upper_chunk {
}; };
struct trace_pid_list { struct trace_pid_list {
seqcount_raw_spinlock_t seqcount;
raw_spinlock_t lock; raw_spinlock_t lock;
struct irq_work refill_irqwork; struct irq_work refill_irqwork;
union upper_chunk *upper[UPPER1_SIZE]; // 1 or 2K in size union upper_chunk *upper[UPPER1_SIZE]; // 1 or 2K in size


@ -401,6 +401,41 @@ static void free_buffer_page(struct buffer_page *bpage)
kfree(bpage); kfree(bpage);
} }
/*
* For best performance, allocate cpu buffer data cache-line sized
* and on the owning CPU's NUMA node.
*/
#define alloc_cpu_buffer(cpu) (struct ring_buffer_per_cpu *) \
kzalloc_node(ALIGN(sizeof(struct ring_buffer_per_cpu), \
cache_line_size()), GFP_KERNEL, cpu_to_node(cpu));
#define alloc_cpu_page(cpu) (struct buffer_page *) \
kzalloc_node(ALIGN(sizeof(struct buffer_page), \
cache_line_size()), GFP_KERNEL, cpu_to_node(cpu));
static struct buffer_data_page *alloc_cpu_data(int cpu, int order)
{
struct buffer_data_page *dpage;
struct page *page;
gfp_t mflags;
/*
* __GFP_RETRY_MAYFAIL flag makes sure that the allocation fails
* gracefully without invoking oom-killer and the system is not
* destabilized.
*/
mflags = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_COMP | __GFP_ZERO;
page = alloc_pages_node(cpu_to_node(cpu), mflags, order);
if (!page)
return NULL;
dpage = page_address(page);
rb_init_page(dpage);
return dpage;
}
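The three helpers share one idea: per-CPU bookkeeping is allocated cache-line sized and on the owning CPU's NUMA node, so hot-path accesses stay node-local and avoid false sharing. As a generic sketch of that pattern for some hypothetical struct foo:

	/* cache-line-aligned size, memory taken from cpu's node */
	struct foo *f = kzalloc_node(ALIGN(sizeof(struct foo), cache_line_size()),
				     GFP_KERNEL, cpu_to_node(cpu));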
/* /*
* We need to fit the time_stamp delta into 27 bits. * We need to fit the time_stamp delta into 27 bits.
*/ */
@ -2204,7 +2239,6 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
struct ring_buffer_cpu_meta *meta = NULL; struct ring_buffer_cpu_meta *meta = NULL;
struct buffer_page *bpage, *tmp; struct buffer_page *bpage, *tmp;
bool user_thread = current->mm != NULL; bool user_thread = current->mm != NULL;
gfp_t mflags;
long i; long i;
/* /*
@ -2218,13 +2252,6 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
if (i < nr_pages) if (i < nr_pages)
return -ENOMEM; return -ENOMEM;
/*
* __GFP_RETRY_MAYFAIL flag makes sure that the allocation fails
* gracefully without invoking oom-killer and the system is not
* destabilized.
*/
mflags = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
/* /*
* If a user thread allocates too much, and si_mem_available() * If a user thread allocates too much, and si_mem_available()
* reports there's enough memory, even though there is not. * reports there's enough memory, even though there is not.
@ -2241,10 +2268,8 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
meta = rb_range_meta(buffer, nr_pages, cpu_buffer->cpu); meta = rb_range_meta(buffer, nr_pages, cpu_buffer->cpu);
for (i = 0; i < nr_pages; i++) { for (i = 0; i < nr_pages; i++) {
struct page *page;
bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()), bpage = alloc_cpu_page(cpu_buffer->cpu);
mflags, cpu_to_node(cpu_buffer->cpu));
if (!bpage) if (!bpage)
goto free_pages; goto free_pages;
@ -2267,13 +2292,10 @@ static int __rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
bpage->range = 1; bpage->range = 1;
bpage->id = i + 1; bpage->id = i + 1;
} else { } else {
page = alloc_pages_node(cpu_to_node(cpu_buffer->cpu), int order = cpu_buffer->buffer->subbuf_order;
mflags | __GFP_COMP | __GFP_ZERO, bpage->page = alloc_cpu_data(cpu_buffer->cpu, order);
cpu_buffer->buffer->subbuf_order); if (!bpage->page)
if (!page)
goto free_pages; goto free_pages;
bpage->page = page_address(page);
rb_init_page(bpage->page);
} }
bpage->order = cpu_buffer->buffer->subbuf_order; bpage->order = cpu_buffer->buffer->subbuf_order;
@ -2324,14 +2346,12 @@ static int rb_allocate_pages(struct ring_buffer_per_cpu *cpu_buffer,
static struct ring_buffer_per_cpu * static struct ring_buffer_per_cpu *
rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu) rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
{ {
struct ring_buffer_per_cpu *cpu_buffer __free(kfree) = NULL; struct ring_buffer_per_cpu *cpu_buffer __free(kfree) =
alloc_cpu_buffer(cpu);
struct ring_buffer_cpu_meta *meta; struct ring_buffer_cpu_meta *meta;
struct buffer_page *bpage; struct buffer_page *bpage;
struct page *page;
int ret; int ret;
cpu_buffer = kzalloc_node(ALIGN(sizeof(*cpu_buffer), cache_line_size()),
GFP_KERNEL, cpu_to_node(cpu));
if (!cpu_buffer) if (!cpu_buffer)
return NULL; return NULL;
@ -2347,8 +2367,7 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
init_waitqueue_head(&cpu_buffer->irq_work.full_waiters); init_waitqueue_head(&cpu_buffer->irq_work.full_waiters);
mutex_init(&cpu_buffer->mapping_lock); mutex_init(&cpu_buffer->mapping_lock);
bpage = kzalloc_node(ALIGN(sizeof(*bpage), cache_line_size()), bpage = alloc_cpu_page(cpu);
GFP_KERNEL, cpu_to_node(cpu));
if (!bpage) if (!bpage)
return NULL; return NULL;
@ -2370,13 +2389,10 @@ rb_allocate_cpu_buffer(struct trace_buffer *buffer, long nr_pages, int cpu)
rb_meta_buffer_update(cpu_buffer, bpage); rb_meta_buffer_update(cpu_buffer, bpage);
bpage->range = 1; bpage->range = 1;
} else { } else {
page = alloc_pages_node(cpu_to_node(cpu), int order = cpu_buffer->buffer->subbuf_order;
GFP_KERNEL | __GFP_COMP | __GFP_ZERO, bpage->page = alloc_cpu_data(cpu, order);
cpu_buffer->buffer->subbuf_order); if (!bpage->page)
if (!page)
goto fail_free_reader; goto fail_free_reader;
bpage->page = page_address(page);
rb_init_page(bpage->page);
} }
INIT_LIST_HEAD(&cpu_buffer->reader_page->list); INIT_LIST_HEAD(&cpu_buffer->reader_page->list);
@ -6464,7 +6480,6 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
struct ring_buffer_per_cpu *cpu_buffer; struct ring_buffer_per_cpu *cpu_buffer;
struct buffer_data_read_page *bpage = NULL; struct buffer_data_read_page *bpage = NULL;
unsigned long flags; unsigned long flags;
struct page *page;
if (!cpumask_test_cpu(cpu, buffer->cpumask)) if (!cpumask_test_cpu(cpu, buffer->cpumask))
return ERR_PTR(-ENODEV); return ERR_PTR(-ENODEV);
@ -6486,22 +6501,16 @@ ring_buffer_alloc_read_page(struct trace_buffer *buffer, int cpu)
arch_spin_unlock(&cpu_buffer->lock); arch_spin_unlock(&cpu_buffer->lock);
local_irq_restore(flags); local_irq_restore(flags);
if (bpage->data) if (bpage->data) {
goto out; rb_init_page(bpage->data);
} else {
page = alloc_pages_node(cpu_to_node(cpu), bpage->data = alloc_cpu_data(cpu, cpu_buffer->buffer->subbuf_order);
GFP_KERNEL | __GFP_NORETRY | __GFP_COMP | __GFP_ZERO, if (!bpage->data) {
cpu_buffer->buffer->subbuf_order); kfree(bpage);
if (!page) { return ERR_PTR(-ENOMEM);
kfree(bpage); }
return ERR_PTR(-ENOMEM);
} }
bpage->data = page_address(page);
out:
rb_init_page(bpage->data);
return bpage; return bpage;
} }
EXPORT_SYMBOL_GPL(ring_buffer_alloc_read_page); EXPORT_SYMBOL_GPL(ring_buffer_alloc_read_page);


@ -13,13 +13,9 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/rv.h> #include <linux/rv.h>
__printf(1, 2) static void rv_panic_reaction(const char *msg, ...) __printf(1, 0) static void rv_panic_reaction(const char *msg, va_list args)
{ {
va_list args;
va_start(args, msg);
vpanic(msg, args); vpanic(msg, args);
va_end(args);
} }
static struct rv_reactor rv_panic = { static struct rv_reactor rv_panic = {


@ -12,13 +12,9 @@
#include <linux/init.h> #include <linux/init.h>
#include <linux/rv.h> #include <linux/rv.h>
__printf(1, 2) static void rv_printk_reaction(const char *msg, ...) __printf(1, 0) static void rv_printk_reaction(const char *msg, va_list args)
{ {
va_list args;
va_start(args, msg);
vprintk_deferred(msg, args); vprintk_deferred(msg, args);
va_end(args);
} }
static struct rv_reactor rv_printk = { static struct rv_reactor rv_printk = {


@ -375,15 +375,13 @@ static ssize_t monitor_enable_write_data(struct file *filp, const char __user *u
if (retval) if (retval)
return retval; return retval;
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
if (val) if (val)
retval = rv_enable_monitor(mon); retval = rv_enable_monitor(mon);
else else
retval = rv_disable_monitor(mon); retval = rv_disable_monitor(mon);
mutex_unlock(&rv_interface_lock);
return retval ? : count; return retval ? : count;
} }
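guard(mutex)(&m) from <linux/cleanup.h> takes the mutex and releases it automatically when the enclosing scope exits, which is what lets the converted functions below drop their unlock labels and return directly from any point. A sketch (illustrative only; something_failed() is a hypothetical predicate):

	static int example(void)
	{
		guard(mutex)(&rv_interface_lock);	/* released on every return path */

		if (something_failed())
			return -EINVAL;			/* mutex dropped here too */
		return 0;
	}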
@ -422,35 +420,27 @@ static const struct file_operations interface_desc_fops = {
static int create_monitor_dir(struct rv_monitor *mon, struct rv_monitor *parent) static int create_monitor_dir(struct rv_monitor *mon, struct rv_monitor *parent)
{ {
struct dentry *root = parent ? parent->root_d : get_monitors_root(); struct dentry *root = parent ? parent->root_d : get_monitors_root();
const char *name = mon->name; struct dentry *dir __free(rv_remove) = rv_create_dir(mon->name, root);
struct dentry *tmp; struct dentry *tmp;
int retval; int retval;
mon->root_d = rv_create_dir(name, root); if (!dir)
if (!mon->root_d)
return -ENOMEM; return -ENOMEM;
tmp = rv_create_file("enable", RV_MODE_WRITE, mon->root_d, mon, &interface_enable_fops); tmp = rv_create_file("enable", RV_MODE_WRITE, dir, mon, &interface_enable_fops);
if (!tmp) { if (!tmp)
retval = -ENOMEM; return -ENOMEM;
goto out_remove_root;
}
tmp = rv_create_file("desc", RV_MODE_READ, mon->root_d, mon, &interface_desc_fops); tmp = rv_create_file("desc", RV_MODE_READ, dir, mon, &interface_desc_fops);
if (!tmp) { if (!tmp)
retval = -ENOMEM; return -ENOMEM;
goto out_remove_root;
}
retval = reactor_populate_monitor(mon); retval = reactor_populate_monitor(mon, dir);
if (retval) if (retval)
goto out_remove_root; return retval;
mon->root_d = no_free_ptr(dir);
return 0; return 0;
out_remove_root:
rv_remove(mon->root_d);
return retval;
} }
/* /*
@ -568,7 +558,7 @@ static void disable_all_monitors(void)
struct rv_monitor *mon; struct rv_monitor *mon;
int enabled = 0; int enabled = 0;
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
list_for_each_entry(mon, &rv_monitors_list, list) list_for_each_entry(mon, &rv_monitors_list, list)
enabled += __rv_disable_monitor(mon, false); enabled += __rv_disable_monitor(mon, false);
@ -581,8 +571,6 @@ static void disable_all_monitors(void)
*/ */
tracepoint_synchronize_unregister(); tracepoint_synchronize_unregister();
} }
mutex_unlock(&rv_interface_lock);
} }
static int enabled_monitors_open(struct inode *inode, struct file *file) static int enabled_monitors_open(struct inode *inode, struct file *file)
@ -623,7 +611,7 @@ static ssize_t enabled_monitors_write(struct file *filp, const char __user *user
if (!len) if (!len)
return count; return count;
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
retval = -EINVAL; retval = -EINVAL;
@ -644,13 +632,11 @@ static ssize_t enabled_monitors_write(struct file *filp, const char __user *user
else else
retval = rv_disable_monitor(mon); retval = rv_disable_monitor(mon);
if (!retval) if (retval)
retval = count; return retval;
return count;
break;
} }
mutex_unlock(&rv_interface_lock);
return retval; return retval;
} }
@ -737,7 +723,7 @@ static ssize_t monitoring_on_write_data(struct file *filp, const char __user *us
if (retval) if (retval)
return retval; return retval;
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
if (val) if (val)
turn_monitoring_on_with_reset(); turn_monitoring_on_with_reset();
@ -750,8 +736,6 @@ static ssize_t monitoring_on_write_data(struct file *filp, const char __user *us
*/ */
tracepoint_synchronize_unregister(); tracepoint_synchronize_unregister();
mutex_unlock(&rv_interface_lock);
return count; return count;
} }
@ -784,28 +768,26 @@ int rv_register_monitor(struct rv_monitor *monitor, struct rv_monitor *parent)
return -EINVAL; return -EINVAL;
} }
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
list_for_each_entry(r, &rv_monitors_list, list) { list_for_each_entry(r, &rv_monitors_list, list) {
if (strcmp(monitor->name, r->name) == 0) { if (strcmp(monitor->name, r->name) == 0) {
pr_info("Monitor %s is already registered\n", monitor->name); pr_info("Monitor %s is already registered\n", monitor->name);
retval = -EEXIST; return -EEXIST;
goto out_unlock;
} }
} }
if (parent && rv_is_nested_monitor(parent)) { if (parent && rv_is_nested_monitor(parent)) {
pr_info("Parent monitor %s is already nested, cannot nest further\n", pr_info("Parent monitor %s is already nested, cannot nest further\n",
parent->name); parent->name);
retval = -EINVAL; return -EINVAL;
goto out_unlock;
} }
monitor->parent = parent; monitor->parent = parent;
retval = create_monitor_dir(monitor, parent); retval = create_monitor_dir(monitor, parent);
if (retval) if (retval)
goto out_unlock; return retval;
/* keep children close to the parent for easier visualisation */ /* keep children close to the parent for easier visualisation */
if (parent) if (parent)
@ -813,9 +795,7 @@ int rv_register_monitor(struct rv_monitor *monitor, struct rv_monitor *parent)
else else
list_add_tail(&monitor->list, &rv_monitors_list); list_add_tail(&monitor->list, &rv_monitors_list);
out_unlock: return 0;
mutex_unlock(&rv_interface_lock);
return retval;
} }
/** /**
@ -826,13 +806,12 @@ int rv_register_monitor(struct rv_monitor *monitor, struct rv_monitor *parent)
*/ */
int rv_unregister_monitor(struct rv_monitor *monitor) int rv_unregister_monitor(struct rv_monitor *monitor)
{ {
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
rv_disable_monitor(monitor); rv_disable_monitor(monitor);
list_del(&monitor->list); list_del(&monitor->list);
destroy_monitor_dir(monitor); destroy_monitor_dir(monitor);
mutex_unlock(&rv_interface_lock);
return 0; return 0;
} }
@ -840,39 +819,36 @@ int __init rv_init_interface(void)
{ {
struct dentry *tmp; struct dentry *tmp;
int retval; int retval;
struct dentry *root_dir __free(rv_remove) = rv_create_dir("rv", NULL);
rv_root.root_dir = rv_create_dir("rv", NULL); if (!root_dir)
if (!rv_root.root_dir) return 1;
goto out_err;
rv_root.monitors_dir = rv_create_dir("monitors", rv_root.root_dir); rv_root.monitors_dir = rv_create_dir("monitors", root_dir);
if (!rv_root.monitors_dir) if (!rv_root.monitors_dir)
goto out_err; return 1;
tmp = rv_create_file("available_monitors", RV_MODE_READ, rv_root.root_dir, NULL, tmp = rv_create_file("available_monitors", RV_MODE_READ, root_dir, NULL,
&available_monitors_ops); &available_monitors_ops);
if (!tmp) if (!tmp)
goto out_err; return 1;
tmp = rv_create_file("enabled_monitors", RV_MODE_WRITE, rv_root.root_dir, NULL, tmp = rv_create_file("enabled_monitors", RV_MODE_WRITE, root_dir, NULL,
&enabled_monitors_ops); &enabled_monitors_ops);
if (!tmp) if (!tmp)
goto out_err; return 1;
tmp = rv_create_file("monitoring_on", RV_MODE_WRITE, rv_root.root_dir, NULL, tmp = rv_create_file("monitoring_on", RV_MODE_WRITE, root_dir, NULL,
&monitoring_on_fops); &monitoring_on_fops);
if (!tmp) if (!tmp)
goto out_err; return 1;
retval = init_rv_reactors(rv_root.root_dir); retval = init_rv_reactors(root_dir);
if (retval) if (retval)
goto out_err; return 1;
turn_monitoring_on(); turn_monitoring_on();
return 0; rv_root.root_dir = no_free_ptr(root_dir);
out_err: return 0;
rv_remove(rv_root.root_dir);
printk(KERN_ERR "RV: Error while creating the RV interface\n");
return 1;
} }


@ -17,6 +17,8 @@ struct rv_interface {
#define rv_create_file tracefs_create_file #define rv_create_file tracefs_create_file
#define rv_remove tracefs_remove #define rv_remove tracefs_remove
DEFINE_FREE(rv_remove, struct dentry *, if (_T) rv_remove(_T));
#define MAX_RV_MONITOR_NAME_SIZE 32 #define MAX_RV_MONITOR_NAME_SIZE 32
#define MAX_RV_REACTOR_NAME_SIZE 32 #define MAX_RV_REACTOR_NAME_SIZE 32
@ -30,10 +32,10 @@ bool rv_is_container_monitor(struct rv_monitor *mon);
bool rv_is_nested_monitor(struct rv_monitor *mon); bool rv_is_nested_monitor(struct rv_monitor *mon);
#ifdef CONFIG_RV_REACTORS #ifdef CONFIG_RV_REACTORS
int reactor_populate_monitor(struct rv_monitor *mon); int reactor_populate_monitor(struct rv_monitor *mon, struct dentry *root);
int init_rv_reactors(struct dentry *root_dir); int init_rv_reactors(struct dentry *root_dir);
#else #else
static inline int reactor_populate_monitor(struct rv_monitor *mon) static inline int reactor_populate_monitor(struct rv_monitor *mon, struct dentry *root)
{ {
return 0; return 0;
} }
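The new DEFINE_FREE(rv_remove, ...) hooks tracefs_remove() into the scope-based cleanup machinery: a dentry declared with __free(rv_remove) is removed automatically on any early return, and no_free_ptr() transfers ownership out once setup succeeds. A condensed sketch of the pattern, mirroring create_monitor_dir() above (setup_files() stands in for the real file-creation calls):

	struct dentry *dir __free(rv_remove) = rv_create_dir("name", root);

	if (!dir)
		return -ENOMEM;
	if (setup_files(dir))		/* hypothetical failure: dir auto-removed */
		return -ENOMEM;

	mon->root_d = no_free_ptr(dir);	/* success: keep it, disarm the cleanup */
	return 0;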


@ -61,6 +61,7 @@
* printk * printk
*/ */
#include <linux/lockdep.h>
#include <linux/slab.h> #include <linux/slab.h>
#include "rv.h" #include "rv.h"
@ -232,9 +233,7 @@ monitor_reactors_write(struct file *file, const char __user *user_buf,
seq_f = file->private_data; seq_f = file->private_data;
mon = seq_f->private; mon = seq_f->private;
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
retval = -EINVAL;
list_for_each_entry(reactor, &rv_reactors_list, list) { list_for_each_entry(reactor, &rv_reactors_list, list) {
if (strcmp(ptr, reactor->name) != 0) if (strcmp(ptr, reactor->name) != 0)
@ -242,13 +241,10 @@ monitor_reactors_write(struct file *file, const char __user *user_buf,
monitor_swap_reactors(mon, reactor); monitor_swap_reactors(mon, reactor);
retval = count; return count;
break;
} }
mutex_unlock(&rv_interface_lock); return -EINVAL;
return retval;
} }
/* /*
@ -309,18 +305,14 @@ static int __rv_register_reactor(struct rv_reactor *reactor)
*/ */
int rv_register_reactor(struct rv_reactor *reactor) int rv_register_reactor(struct rv_reactor *reactor)
{ {
int retval = 0;
if (strlen(reactor->name) >= MAX_RV_REACTOR_NAME_SIZE) { if (strlen(reactor->name) >= MAX_RV_REACTOR_NAME_SIZE) {
pr_info("Reactor %s has a name longer than %d\n", pr_info("Reactor %s has a name longer than %d\n",
reactor->name, MAX_RV_MONITOR_NAME_SIZE); reactor->name, MAX_RV_MONITOR_NAME_SIZE);
return -EINVAL; return -EINVAL;
} }
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
retval = __rv_register_reactor(reactor); return __rv_register_reactor(reactor);
mutex_unlock(&rv_interface_lock);
return retval;
} }
/** /**
@ -331,9 +323,8 @@ int rv_register_reactor(struct rv_reactor *reactor)
*/ */
int rv_unregister_reactor(struct rv_reactor *reactor) int rv_unregister_reactor(struct rv_reactor *reactor)
{ {
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
list_del(&reactor->list); list_del(&reactor->list);
mutex_unlock(&rv_interface_lock);
return 0; return 0;
} }
@ -347,7 +338,7 @@ static bool __read_mostly reacting_on;
* *
* Returns 1 if on, 0 otherwise. * Returns 1 if on, 0 otherwise.
*/ */
bool rv_reacting_on(void) static bool rv_reacting_on(void)
{ {
/* Ensures that concurrent monitors read consistent reacting_on */ /* Ensures that concurrent monitors read consistent reacting_on */
smp_rmb(); smp_rmb();
@ -389,7 +380,7 @@ static ssize_t reacting_on_write_data(struct file *filp, const char __user *user
if (retval) if (retval)
return retval; return retval;
mutex_lock(&rv_interface_lock); guard(mutex)(&rv_interface_lock);
if (val) if (val)
turn_reacting_on(); turn_reacting_on();
@ -402,8 +393,6 @@ static ssize_t reacting_on_write_data(struct file *filp, const char __user *user
*/ */
tracepoint_synchronize_unregister(); tracepoint_synchronize_unregister();
mutex_unlock(&rv_interface_lock);
return count; return count;
} }
@ -416,14 +405,15 @@ static const struct file_operations reacting_on_fops = {
/** /**
* reactor_populate_monitor - creates per monitor reactors file * reactor_populate_monitor - creates per monitor reactors file
* @mon: The monitor. * @mon: The monitor.
* @root: The directory of the monitor.
* *
* Returns 0 if successful, error otherwise. * Returns 0 if successful, error otherwise.
*/ */
int reactor_populate_monitor(struct rv_monitor *mon) int reactor_populate_monitor(struct rv_monitor *mon, struct dentry *root)
{ {
struct dentry *tmp; struct dentry *tmp;
tmp = rv_create_file("reactors", RV_MODE_WRITE, mon->root_d, mon, &monitor_reactors_ops); tmp = rv_create_file("reactors", RV_MODE_WRITE, root, mon, &monitor_reactors_ops);
if (!tmp) if (!tmp)
return -ENOMEM; return -ENOMEM;
@ -438,7 +428,7 @@ int reactor_populate_monitor(struct rv_monitor *mon)
/* /*
* Nop reactor register * Nop reactor register
*/ */
__printf(1, 2) static void rv_nop_reaction(const char *msg, ...) __printf(1, 0) static void rv_nop_reaction(const char *msg, va_list args)
{ {
} }
@ -450,30 +440,42 @@ static struct rv_reactor rv_nop = {
int init_rv_reactors(struct dentry *root_dir) int init_rv_reactors(struct dentry *root_dir)
{ {
struct dentry *available, *reacting;
int retval; int retval;
available = rv_create_file("available_reactors", RV_MODE_READ, root_dir, NULL, struct dentry *available __free(rv_remove) =
&available_reactors_ops); rv_create_file("available_reactors", RV_MODE_READ, root_dir,
if (!available) NULL, &available_reactors_ops);
goto out_err;
reacting = rv_create_file("reacting_on", RV_MODE_WRITE, root_dir, NULL, &reacting_on_fops); struct dentry *reacting __free(rv_remove) =
if (!reacting) rv_create_file("reacting_on", RV_MODE_WRITE, root_dir, NULL, &reacting_on_fops);
goto rm_available;
if (!reacting || !available)
return -ENOMEM;
retval = __rv_register_reactor(&rv_nop); retval = __rv_register_reactor(&rv_nop);
if (retval) if (retval)
goto rm_reacting; return retval;
turn_reacting_on(); turn_reacting_on();
retain_and_null_ptr(available);
retain_and_null_ptr(reacting);
return 0; return 0;
}
rm_reacting:
rv_remove(reacting); void rv_react(struct rv_monitor *monitor, const char *msg, ...)
rm_available: {
rv_remove(available); static DEFINE_WAIT_OVERRIDE_MAP(rv_react_map, LD_WAIT_FREE);
out_err: va_list args;
return -ENOMEM;
if (!rv_reacting_on() || !monitor->react)
return;
va_start(args, msg);
lock_map_acquire_try(&rv_react_map);
monitor->react(msg, args);
lock_map_release(&rv_react_map);
va_end(args);
} }
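With rv_react() now centralizing the reacting_on/react checks and doing va_start()/va_end() itself, a reactor receives a ready-made va_list instead of variadic arguments. A hypothetical reactor under the new convention (a sketch; the field names follow struct rv_reactor):

	__printf(1, 0) static void my_log_reaction(const char *msg, va_list args)
	{
		vprintk_deferred(msg, args);
	}

	static struct rv_reactor rv_my_log = {
		.name = "my_log",
		.description = "hypothetical example reactor",
		.react = my_log_reaction,
	};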

File diff suppressed because it is too large.


@@ -22,6 +22,7 @@
 #include <linux/ctype.h>
 #include <linux/once_lite.h>
 #include <linux/ftrace_regs.h>
+#include <linux/llist.h>
 
 #include "pid_list.h"
@@ -131,6 +132,8 @@ enum trace_type {
 #define HIST_STACKTRACE_SIZE	(HIST_STACKTRACE_DEPTH * sizeof(unsigned long))
 #define HIST_STACKTRACE_SKIP	5
 
+#define SYSCALL_FAULT_USER_MAX	165
+
 /*
  * syscalls are special, and need special handling, this is why
  * they are not included in trace_entries.h
@@ -216,7 +219,7 @@ struct array_buffer {
 	int			cpu;
 };
 
-#define TRACE_FLAGS_MAX_SIZE	32
+#define TRACE_FLAGS_MAX_SIZE	64
 
 struct trace_options {
 	struct tracer		*tracer;
@@ -390,7 +393,8 @@ struct trace_array {
 	int			buffer_percent;
 	unsigned int		n_err_log_entries;
 	struct tracer		*current_trace;
-	unsigned int		trace_flags;
+	struct tracer_flags	*current_trace_flags;
+	u64			trace_flags;
 	unsigned char		trace_flags_index[TRACE_FLAGS_MAX_SIZE];
 	unsigned int		flags;
 	raw_spinlock_t		start_lock;
@@ -404,6 +408,7 @@ struct trace_array {
 	struct list_head	systems;
 	struct list_head	events;
 	struct list_head	marker_list;
+	struct list_head	tracers;
 	struct trace_event_file *trace_marker_file;
 	cpumask_var_t		tracing_cpumask; /* only trace on set CPUs */
 	/* one per_cpu trace_pipe can be opened by only one user */
@@ -430,6 +435,7 @@ struct trace_array {
 	int			function_enabled;
 #endif
 	int			no_filter_buffering_ref;
+	unsigned int		syscall_buf_sz;
 	struct list_head	hist_vars;
 #ifdef CONFIG_TRACER_SNAPSHOT
 	struct cond_snapshot	*cond_snapshot;
@@ -448,6 +454,7 @@ enum {
 	TRACE_ARRAY_FL_LAST_BOOT	= BIT(2),
 	TRACE_ARRAY_FL_MOD_INIT		= BIT(3),
 	TRACE_ARRAY_FL_MEMMAP		= BIT(4),
+	TRACE_ARRAY_FL_VMALLOC		= BIT(5),
 };
 
 #ifdef CONFIG_MODULES
@@ -631,9 +638,10 @@ struct tracer {
 				       u32 old_flags, u32 bit, int set);
 	/* Return 0 if OK with change, else return non-zero */
 	int			(*flag_changed)(struct trace_array *tr,
-						u32 mask, int set);
+						u64 mask, int set);
 	struct tracer		*next;
 	struct tracer_flags	*flags;
+	struct tracer_flags	*default_flags;
 	int			enabled;
 	bool			print_max;
 	bool			allow_instances;
@@ -937,8 +945,6 @@ static __always_inline bool ftrace_hash_empty(struct ftrace_hash *hash)
 #define TRACE_GRAPH_PRINT_FILL_SHIFT	28
 #define TRACE_GRAPH_PRINT_FILL_MASK	(0x3 << TRACE_GRAPH_PRINT_FILL_SHIFT)
 
-extern void ftrace_graph_sleep_time_control(bool enable);
-
 #ifdef CONFIG_FUNCTION_PROFILER
 extern void ftrace_graph_graph_time_control(bool enable);
 #else
@@ -958,7 +964,8 @@ extern int __trace_graph_entry(struct trace_array *tr,
 extern int __trace_graph_retaddr_entry(struct trace_array *tr,
 				struct ftrace_graph_ent *trace,
 				unsigned int trace_ctx,
-				unsigned long retaddr);
+				unsigned long retaddr,
+				struct ftrace_regs *fregs);
 extern void __trace_graph_return(struct trace_array *tr,
 				 struct ftrace_graph_ret *trace,
 				 unsigned int trace_ctx,
@@ -1109,7 +1116,8 @@ static inline void ftrace_graph_addr_finish(struct fgraph_ops *gops, struct ftra
 #endif /* CONFIG_DYNAMIC_FTRACE */
 
 extern unsigned int fgraph_max_depth;
-extern bool fgraph_sleep_time;
+extern int fgraph_no_sleep_time;
+extern bool fprofile_no_sleep_time;
 
 static inline bool
 ftrace_graph_ignore_func(struct fgraph_ops *gops, struct ftrace_graph_ent *trace)
@@ -1154,11 +1162,6 @@ struct ftrace_func_command {
 			char *params, int enable);
 };
 extern bool ftrace_filter_param __initdata;
-static inline int ftrace_trace_task(struct trace_array *tr)
-{
-	return this_cpu_read(tr->array_buffer.data->ftrace_ignore_pid) !=
-		FTRACE_PID_IGNORE;
-}
 extern int ftrace_is_dead(void);
 int ftrace_create_function_files(struct trace_array *tr,
 				 struct dentry *parent);
@@ -1176,10 +1179,6 @@ void ftrace_clear_pids(struct trace_array *tr);
 int init_function_trace(void);
 void ftrace_pid_follow_fork(struct trace_array *tr, bool enable);
 #else
-static inline int ftrace_trace_task(struct trace_array *tr)
-{
-	return 1;
-}
 static inline int ftrace_is_dead(void) { return 0; }
 static inline int
 ftrace_create_function_files(struct trace_array *tr,
@@ -1345,11 +1344,11 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
 # define FUNCTION_FLAGS					\
 		C(FUNCTION,	"function-trace"),	\
 		C(FUNC_FORK,	"function-fork"),
-# define FUNCTION_DEFAULT_FLAGS		TRACE_ITER_FUNCTION
+# define FUNCTION_DEFAULT_FLAGS		TRACE_ITER(FUNCTION)
 #else
 # define FUNCTION_FLAGS
 # define FUNCTION_DEFAULT_FLAGS		0UL
-# define TRACE_ITER_FUNC_FORK		0UL
+# define TRACE_ITER_FUNC_FORK_BIT	-1
 #endif
 
 #ifdef CONFIG_STACKTRACE
@@ -1359,6 +1358,24 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
 # define STACK_FLAGS
 #endif
 
+#ifdef CONFIG_FUNCTION_PROFILER
+# define PROFILER_FLAGS					\
+		C(PROF_TEXT_OFFSET,	"prof-text-offset"),
+# ifdef CONFIG_FUNCTION_GRAPH_TRACER
+#  define FPROFILE_FLAGS				\
+		C(GRAPH_TIME,		"graph-time"),
+#  define FPROFILE_DEFAULT_FLAGS	TRACE_ITER(GRAPH_TIME)
+# else
+#  define FPROFILE_FLAGS
+#  define FPROFILE_DEFAULT_FLAGS	0UL
+# endif
+#else
+# define PROFILER_FLAGS
+# define FPROFILE_FLAGS
+# define FPROFILE_DEFAULT_FLAGS		0UL
+# define TRACE_ITER_PROF_TEXT_OFFSET_BIT	-1
+#endif
+
 /*
  * trace_iterator_flags is an enumeration that defines bit
  * positions into trace_flags that controls the output.
@@ -1391,13 +1408,15 @@ extern int trace_get_user(struct trace_parser *parser, const char __user *ubuf,
 		C(MARKERS,		"markers"),		\
 		C(EVENT_FORK,		"event-fork"),		\
 		C(TRACE_PRINTK,		"trace_printk_dest"),	\
-		C(COPY_MARKER,		"copy_trace_marker"),\
+		C(COPY_MARKER,		"copy_trace_marker"),	\
 		C(PAUSE_ON_TRACE,	"pause-on-trace"),	\
 		C(HASH_PTR,		"hash-ptr"),	/* Print hashed pointer */ \
 		FUNCTION_FLAGS					\
 		FGRAPH_FLAGS					\
 		STACK_FLAGS					\
-		BRANCH_FLAGS
+		BRANCH_FLAGS					\
+		PROFILER_FLAGS					\
+		FPROFILE_FLAGS
 
 /*
  * By defining C, we can make TRACE_FLAGS a list of bit names
@@ -1413,20 +1432,17 @@ enum trace_iterator_bits {
 };
 
 /*
- * By redefining C, we can make TRACE_FLAGS a list of masks that
- * use the bits as defined above.
+ * And use TRACE_ITER(flag) to define the bit masks.
  */
-#undef C
-#define C(a, b) TRACE_ITER_##a = (1 << TRACE_ITER_##a##_BIT)
-
-enum trace_iterator_flags { TRACE_FLAGS };
+#define TRACE_ITER(flag)	\
+	(TRACE_ITER_##flag##_BIT < 0 ? 0 : 1ULL << (TRACE_ITER_##flag##_BIT))
 
 /*
  * TRACE_ITER_SYM_MASK masks the options in trace_flags that
  * control the output of kernel symbols.
  */
 #define TRACE_ITER_SYM_MASK \
-	(TRACE_ITER_PRINT_PARENT|TRACE_ITER_SYM_OFFSET|TRACE_ITER_SYM_ADDR)
+	(TRACE_ITER(PRINT_PARENT)|TRACE_ITER(SYM_OFFSET)|TRACE_ITER(SYM_ADDR))
 
 extern struct tracer nop_trace;
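
Aside: the trick in the hunk above is that option masks are no longer pre-expanded into a second enum. TRACE_ITER(flag) computes the mask from the bit enum on demand, and an option that is configured out defines its _BIT macro as -1 so the mask folds to 0 at compile time, which is why the 1ULL shift and the u64 trace_flags widening travel together. A standalone, compilable sketch of the same X-macro pattern (names invented for the demo):

#include <stdio.h>
#include <stdint.h>

/* X-macro list of option names; real code makes entries conditional
 * on config, which is where the -1 sentinel below comes from. */
#define FLAGS					\
    C(PRINT_PARENT, "print-parent"),	\
    C(SYM_OFFSET,   "sym-offset"),		\
    C(BRANCH,       "branch"),

/* First expansion: bit positions. */
#define C(a, b) ITER_##a##_BIT
enum { FLAGS ITER_LAST_BIT };
#undef C

/* A compiled-out option defines its _BIT as a negative sentinel... */
#define ITER_FUNC_FORK_BIT	-1

/* ...so the mask macro folds it to 0 and tests become no-ops. The
 * 1ULL keeps this correct once more than 32 bits are in use. */
#define ITER(flag) \
    (ITER_##flag##_BIT < 0 ? 0 : 1ULL << (ITER_##flag##_BIT))

int main(void)
{
    uint64_t flags = ITER(PRINT_PARENT) | ITER(BRANCH);

    printf("BRANCH set:    %d\n", !!(flags & ITER(BRANCH)));
    printf("FUNC_FORK set: %d\n", !!(flags & ITER(FUNC_FORK))); /* always 0 */
    return 0;
}
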
@@ -1435,7 +1451,7 @@ extern int enable_branch_tracing(struct trace_array *tr);
 extern void disable_branch_tracing(void);
 static inline int trace_branch_enable(struct trace_array *tr)
 {
-	if (tr->trace_flags & TRACE_ITER_BRANCH)
+	if (tr->trace_flags & TRACE_ITER(BRANCH))
 		return enable_branch_tracing(tr);
 	return 0;
 }
@@ -1531,6 +1547,23 @@ void trace_buffered_event_enable(void);
 
 void early_enable_events(struct trace_array *tr, char *buf, bool disable_first);
 
+struct trace_user_buf;
+struct trace_user_buf_info {
+	struct trace_user_buf __percpu	*tbuf;
+	size_t				size;
+	int				ref;
+};
+
+typedef int (*trace_user_buf_copy)(char *dst, const char __user *src,
+				   size_t size, void *data);
+
+int trace_user_fault_init(struct trace_user_buf_info *tinfo, size_t size);
+int trace_user_fault_get(struct trace_user_buf_info *tinfo);
+int trace_user_fault_put(struct trace_user_buf_info *tinfo);
+void trace_user_fault_destroy(struct trace_user_buf_info *tinfo);
+char *trace_user_fault_read(struct trace_user_buf_info *tinfo,
+			    const char __user *ptr, size_t size,
+			    trace_user_buf_copy copy_func, void *data);
+
 static inline void
 __trace_event_discard_commit(struct trace_buffer *buffer,
			     struct ring_buffer_event *event)
@@ -1752,13 +1785,13 @@ extern void clear_event_triggers(struct trace_array *tr);
 
 enum {
 	EVENT_TRIGGER_FL_PROBE		= BIT(0),
+	EVENT_TRIGGER_FL_COUNT		= BIT(1),
 };
 
 struct event_trigger_data {
 	unsigned long			count;
 	int				ref;
 	int				flags;
-	const struct event_trigger_ops	*ops;
 	struct event_command		*cmd_ops;
 	struct event_filter __rcu	*filter;
 	char				*filter_str;
@@ -1769,6 +1802,7 @@ struct event_trigger_data {
 	char				*name;
 	struct list_head		named_list;
 	struct event_trigger_data	*named_data;
+	struct llist_node		llist;
 };
 
 /* Avoid typos */
@@ -1783,6 +1817,10 @@ struct enable_trigger_data {
 	bool				hist;
 };
 
+bool event_trigger_count(struct event_trigger_data *data,
+			 struct trace_buffer *buffer, void *rec,
+			 struct ring_buffer_event *event);
+
 extern int event_enable_trigger_print(struct seq_file *m,
 				      struct event_trigger_data *data);
 extern void event_enable_trigger_free(struct event_trigger_data *data);
@@ -1845,64 +1883,6 @@ extern void event_trigger_unregister(struct event_command *cmd_ops,
 extern void event_file_get(struct trace_event_file *file);
 extern void event_file_put(struct trace_event_file *file);
 
-/**
- * struct event_trigger_ops - callbacks for trace event triggers
- *
- * The methods in this structure provide per-event trigger hooks for
- * various trigger operations.
- *
- * The @init and @free methods are used during trigger setup and
- * teardown, typically called from an event_command's @parse()
- * function implementation.
- *
- * The @print method is used to print the trigger spec.
- *
- * The @trigger method is the function that actually implements the
- * trigger and is called in the context of the triggering event
- * whenever that event occurs.
- *
- * All the methods below, except for @init() and @free(), must be
- * implemented.
- *
- * @trigger: The trigger 'probe' function called when the triggering
- *	event occurs. The data passed into this callback is the data
- *	that was supplied to the event_command @reg() function that
- *	registered the trigger (see struct event_command) along with
- *	the trace record, rec.
- *
- * @init: An optional initialization function called for the trigger
- *	when the trigger is registered (via the event_command reg()
- *	function). This can be used to perform per-trigger
- *	initialization such as incrementing a per-trigger reference
- *	count, for instance. This is usually implemented by the
- *	generic utility function @event_trigger_init() (see
- *	trace_event_triggers.c).
- *
- * @free: An optional de-initialization function called for the
- *	trigger when the trigger is unregistered (via the
- *	event_command @reg() function). This can be used to perform
- *	per-trigger de-initialization such as decrementing a
- *	per-trigger reference count and freeing corresponding trigger
- *	data, for instance. This is usually implemented by the
- *	generic utility function @event_trigger_free() (see
- *	trace_event_triggers.c).
- *
- * @print: The callback function invoked to have the trigger print
- *	itself. This is usually implemented by a wrapper function
- *	that calls the generic utility function @event_trigger_print()
- *	(see trace_event_triggers.c).
- */
-struct event_trigger_ops {
-	void			(*trigger)(struct event_trigger_data *data,
-					   struct trace_buffer *buffer,
-					   void *rec,
-					   struct ring_buffer_event *rbe);
-	int			(*init)(struct event_trigger_data *data);
-	void			(*free)(struct event_trigger_data *data);
-	int			(*print)(struct seq_file *m,
-					 struct event_trigger_data *data);
-};
-
 /**
  * struct event_command - callbacks and data members for event commands
  *
@@ -1952,7 +1932,7 @@ struct event_trigger_ops {
  *
  * @reg: Adds the trigger to the list of triggers associated with the
  *	event, and enables the event trigger itself, after
- *	initializing it (via the event_trigger_ops @init() function).
+ *	initializing it (via the event_command @init() function).
  *	This is also where commands can use the @trigger_type value to
  *	make the decision as to whether or not multiple instances of
  *	the trigger should be allowed. This is usually implemented by
@@ -1961,7 +1941,7 @@ struct event_trigger_ops {
  *
  * @unreg: Removes the trigger from the list of triggers associated
  *	with the event, and disables the event trigger itself, after
- *	initializing it (via the event_trigger_ops @free() function).
+ *	initializing it (via the event_command @free() function).
  *	This is usually implemented by the generic utility function
  *	@unregister_trigger() (see trace_event_triggers.c).
 *
@@ -1975,12 +1955,41 @@ struct event_trigger_ops {
  *	ignored. This is usually implemented by the generic utility
  *	function @set_trigger_filter() (see trace_event_triggers.c).
  *
- * @get_trigger_ops: The callback function invoked to retrieve the
- *	event_trigger_ops implementation associated with the command.
- *	This callback function allows a single event_command to
- *	support multiple trigger implementations via different sets of
- *	event_trigger_ops, depending on the value of the @param
- *	string.
+ * All the methods below, except for @init() and @free(), must be
+ * implemented.
+ *
+ * @trigger: The trigger 'probe' function called when the triggering
+ *	event occurs. The data passed into this callback is the data
+ *	that was supplied to the event_command @reg() function that
+ *	registered the trigger (see struct event_command) along with
+ *	the trace record, rec.
+ *
+ * @count_func: If defined and a numeric parameter is passed to the
+ *	trigger, then this function will be called before @trigger
+ *	is called. If this function returns false, then @trigger is not
+ *	executed.
+ *
+ * @init: An optional initialization function called for the trigger
+ *	when the trigger is registered (via the event_command reg()
+ *	function). This can be used to perform per-trigger
+ *	initialization such as incrementing a per-trigger reference
+ *	count, for instance. This is usually implemented by the
+ *	generic utility function @event_trigger_init() (see
+ *	trace_event_triggers.c).
+ *
+ * @free: An optional de-initialization function called for the
+ *	trigger when the trigger is unregistered (via the
+ *	event_command @reg() function). This can be used to perform
+ *	per-trigger de-initialization such as decrementing a
+ *	per-trigger reference count and freeing corresponding trigger
+ *	data, for instance. This is usually implemented by the
+ *	generic utility function @event_trigger_free() (see
+ *	trace_event_triggers.c).
+ *
+ * @print: The callback function invoked to have the trigger print
+ *	itself. This is usually implemented by a wrapper function
+ *	that calls the generic utility function @event_trigger_print()
+ *	(see trace_event_triggers.c).
  */
 struct event_command {
 	struct list_head	list;
@@ -2001,7 +2010,18 @@ struct event_command {
 	int			(*set_filter)(char *filter_str,
 					      struct event_trigger_data *data,
 					      struct trace_event_file *file);
-	const struct event_trigger_ops *(*get_trigger_ops)(char *cmd, char *param);
+	void			(*trigger)(struct event_trigger_data *data,
+					   struct trace_buffer *buffer,
+					   void *rec,
+					   struct ring_buffer_event *rbe);
+	bool			(*count_func)(struct event_trigger_data *data,
					      struct trace_buffer *buffer,
					      void *rec,
					      struct ring_buffer_event *rbe);
+	int			(*init)(struct event_trigger_data *data);
+	void			(*free)(struct event_trigger_data *data);
+	int			(*print)(struct seq_file *m,
					 struct event_trigger_data *data);
 };
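
Aside: the net effect of the hunk above is an API flattening. Instead of each command indirecting through a separately selected event_trigger_ops vtable (chosen by get_trigger_ops()), the callbacks live directly on the command, and the one former reason for distinct ops tables (counted vs. uncounted triggers) becomes an optional count_func gate. A compact userspace model of the resulting dispatch, with invented names:

#include <stdbool.h>
#include <stdio.h>

struct trigger_data;

/* "After": one command struct carries all callbacks directly. */
struct command {
    const char *name;
    void (*trigger)(struct trigger_data *data);
    bool (*count_func)(struct trigger_data *data);  /* optional gate */
};

struct trigger_data {
    const struct command *cmd;
    long count;     /* -1 means unlimited */
    bool counted;   /* set when the user passed a count parameter */
};

static bool countdown(struct trigger_data *data)
{
    if (!data->count)
        return false;
    if (data->count != -1)
        data->count--;
    return true;
}

static void say_hi(struct trigger_data *data)
{
    printf("%s fired\n", data->cmd->name);
}

/* Mirrors the kernel's new dispatch helper: gate through count_func
 * only when a count was supplied. */
static void dispatch(struct trigger_data *data)
{
    if (data->counted && data->cmd->count_func &&
        !data->cmd->count_func(data))
        return;
    data->cmd->trigger(data);
}

int main(void)
{
    const struct command hi = { "hi", say_hi, countdown };
    struct trigger_data data = { &hi, 2, true };

    for (int i = 0; i < 4; i++)
        dispatch(&data);    /* fires only twice */
    return 0;
}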
 /**
@@ -2022,7 +2042,7 @@ struct event_command {
  *	either committed or discarded. At that point, if any commands
  *	have deferred their triggers, those commands are finally
  *	invoked following the close of the current event. In other
- *	words, if the event_trigger_ops @func() probe implementation
+ *	words, if the event_command @func() probe implementation
  *	itself logs to the trace buffer, this flag should be set,
  *	otherwise it can be left unspecified.
  *
@@ -2064,8 +2084,8 @@ extern const char *__stop___tracepoint_str[];
 
 void trace_printk_control(bool enabled);
 void trace_printk_start_comm(void);
-int trace_keep_overwrite(struct tracer *tracer, u32 mask, int set);
-int set_tracer_flag(struct trace_array *tr, unsigned int mask, int enabled);
+int trace_keep_overwrite(struct tracer *tracer, u64 mask, int set);
+int set_tracer_flag(struct trace_array *tr, u64 mask, int enabled);
 
 /* Used from boot time tracer */
 extern int trace_set_options(struct trace_array *tr, char *option);
@@ -2248,4 +2268,25 @@ static inline int rv_init_interface(void)
  */
 #define FTRACE_TRAMPOLINE_MARKER  ((unsigned long) INT_MAX)
 
+/*
+ * This is used to get the address of the args array based on
+ * the type of the entry.
+ */
+#define FGRAPH_ENTRY_ARGS(e)						\
+	({								\
+		unsigned long *_args;					\
+		struct ftrace_graph_ent_entry *_e = e;			\
+									\
+		if (IS_ENABLED(CONFIG_FUNCTION_GRAPH_RETADDR) &&	\
+		    e->ent.type == TRACE_GRAPH_RETADDR_ENT) {		\
+			struct fgraph_retaddr_ent_entry *_re;		\
+									\
+			_re = (typeof(_re))_e;				\
+			_args = _re->args;				\
+		} else {						\
+			_args = _e->args;				\
+		}							\
+		_args;							\
+	})
+
 #endif /* _LINUX_KERNEL_TRACE_H */
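
Aside: FGRAPH_ENTRY_ARGS() is a GNU statement expression; the ({ ... }) block runs ordinary statements and yields the value of its last expression, which lets the macro pick the right args layout per entry type without a helper function. A userspace reduction of the same idiom, with made-up entry types (the cast relies on both layouts sharing the same leading member, as the kernel's do):

#include <stdio.h>

struct ent { int type; };

struct plain_entry {
    struct ent ent;
    long args[2];
};

struct retaddr_entry {
    struct ent ent;
    long retaddr;
    long args[2];
};

#define RETADDR_TYPE 1

/* Statement expression: select the args array from the runtime type. */
#define ENTRY_ARGS(e)						\
    ({								\
        long *_args;						\
        struct plain_entry *_e = (e);				\
        if (_e->ent.type == RETADDR_TYPE) {			\
            struct retaddr_entry *_re;				\
            _re = (typeof(_re))_e;				\
            _args = _re->args;					\
        } else {						\
            _args = _e->args;					\
        }							\
        _args;							\
    })

int main(void)
{
    struct retaddr_entry re = { { RETADDR_TYPE }, 0xbeef, { 7, 9 } };

    /* Both layouts share the leading 'ent', so reading type through
     * the plain_entry view is well-defined here. */
    printf("first arg: %ld\n", ENTRY_ARGS((struct plain_entry *)&re)[0]);
    return 0;
}
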

View File

@@ -144,9 +144,16 @@ static int create_dyn_event(const char *raw_command)
 		if (!ret || ret != -ECANCELED)
 			break;
 	}
-	mutex_unlock(&dyn_event_ops_mutex);
-	if (ret == -ECANCELED)
+	if (ret == -ECANCELED) {
+		static const char *err_msg[] = {"No matching dynamic event type"};
+
+		/* Wrong dynamic event. Leave an error message. */
+		tracing_log_err(NULL, "dynevent", raw_command, err_msg, 0, 0);
 		ret = -EINVAL;
+	}
+	mutex_unlock(&dyn_event_ops_mutex);
 
 	return ret;
 }

View File

@@ -80,11 +80,11 @@ FTRACE_ENTRY(funcgraph_entry, ftrace_graph_ent_entry,
 	F_STRUCT(
 		__field_struct(	struct ftrace_graph_ent,	graph_ent	)
 		__field_packed(	unsigned long,	graph_ent,	func		)
-		__field_packed(	unsigned int,	graph_ent,	depth		)
+		__field_packed(	unsigned long,	graph_ent,	depth		)
 		__dynamic_array(unsigned long,	args				)
 	),
 
-	F_printk("--> %ps (%u)", (void *)__entry->func, __entry->depth)
+	F_printk("--> %ps (%lu)", (void *)__entry->func, __entry->depth)
 );
 
 #ifdef CONFIG_FUNCTION_GRAPH_RETADDR
@@ -95,13 +95,14 @@ FTRACE_ENTRY_PACKED(fgraph_retaddr_entry, fgraph_retaddr_ent_entry,
 	TRACE_GRAPH_RETADDR_ENT,
 
 	F_STRUCT(
-		__field_struct(	struct fgraph_retaddr_ent,	graph_ent	)
-		__field_packed(	unsigned long,	graph_ent,	func		)
-		__field_packed(	unsigned int,	graph_ent,	depth		)
-		__field_packed(	unsigned long,	graph_ent,	retaddr		)
+		__field_struct(	struct fgraph_retaddr_ent,	graph_rent	)
+		__field_packed(	unsigned long,	graph_rent.ent,	func		)
+		__field_packed(	unsigned long,	graph_rent.ent,	depth		)
+		__field_packed(	unsigned long,	graph_rent,	retaddr		)
+		__dynamic_array(unsigned long,	args				)
 	),
 
-	F_printk("--> %ps (%u) <- %ps", (void *)__entry->func, __entry->depth,
+	F_printk("--> %ps (%lu) <- %ps", (void *)__entry->func, __entry->depth,
 		(void *)__entry->retaddr)
 );

View File

@@ -61,6 +61,9 @@ static void trace_event_probe_cleanup(struct trace_eprobe *ep)
 	kfree(ep);
 }
 
+DEFINE_FREE(trace_event_probe_cleanup, struct trace_eprobe *,
+	    if (!IS_ERR_OR_NULL(_T)) trace_event_probe_cleanup(_T))
+
 static struct trace_eprobe *to_trace_eprobe(struct dyn_event *ev)
 {
 	return container_of(ev, struct trace_eprobe, devent);
@@ -197,10 +200,10 @@ static struct trace_eprobe *alloc_event_probe(const char *group,
					      struct trace_event_call *event,
					      int nargs)
 {
-	struct trace_eprobe *ep;
+	struct trace_eprobe *ep __free(trace_event_probe_cleanup) = NULL;
 	const char *event_name;
 	const char *sys_name;
-	int ret = -ENOMEM;
+	int ret;
 
 	if (!event)
 		return ERR_PTR(-ENODEV);
@@ -211,25 +214,22 @@ static struct trace_eprobe *alloc_event_probe(const char *group,
 	ep = kzalloc(struct_size(ep, tp.args, nargs), GFP_KERNEL);
 	if (!ep) {
 		trace_event_put_ref(event);
-		goto error;
+		return ERR_PTR(-ENOMEM);
 	}
 	ep->event = event;
 	ep->event_name = kstrdup(event_name, GFP_KERNEL);
 	if (!ep->event_name)
-		goto error;
+		return ERR_PTR(-ENOMEM);
 	ep->event_system = kstrdup(sys_name, GFP_KERNEL);
 	if (!ep->event_system)
-		goto error;
+		return ERR_PTR(-ENOMEM);
 
 	ret = trace_probe_init(&ep->tp, this_event, group, false, nargs);
 	if (ret < 0)
-		goto error;
+		return ERR_PTR(ret);
 
 	dyn_event_init(&ep->devent, &eprobe_dyn_event_ops);
-	return ep;
-error:
-	trace_event_probe_cleanup(ep);
-	return ERR_PTR(ret);
+	return_ptr(ep);
 }
 
 static int eprobe_event_define_fields(struct trace_event_call *event_call)
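
Aside: alloc_event_probe() now uses the cleanup.h factory shape: every early return frees the half-built object through the __free() guard, and only return_ptr() hands ownership out without running the cleanup. The same shape in userspace GNU C, with hypothetical types (note the kernel's return_ptr() bundles the return statement itself; the take_ptr() below only yields the value):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct probe { char *name; };

static void probe_cleanup(struct probe **p)
{
    if (*p) {
        free((*p)->name);
        free(*p);
    }
}
#define __free_probe __attribute__((cleanup(probe_cleanup)))

/* Null the guarded slot and return the old value, disarming the guard. */
#define take_ptr(p) ({ typeof(p) _v = (p); (p) = NULL; _v; })

static struct probe *alloc_probe(const char *name)
{
    struct probe *p __free_probe = calloc(1, sizeof(*p));

    if (!p)
        return NULL;
    p->name = strdup(name);
    if (!p->name)
        return NULL;        /* guard frees p; no goto-error ladder */

    return take_ptr(p);     /* success: ownership moves to the caller */
}

int main(void)
{
    struct probe *p = alloc_probe("demo");

    printf("%s\n", p ? p->name : "alloc failed");
    if (p) {
        free(p->name);
        free(p);
    }
    return 0;
}
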
@@ -484,13 +484,6 @@ static void eprobe_trigger_func(struct event_trigger_data *data,
 	__eprobe_trace_func(edata, rec);
 }
 
-static const struct event_trigger_ops eprobe_trigger_ops = {
-	.trigger		= eprobe_trigger_func,
-	.print			= eprobe_trigger_print,
-	.init			= eprobe_trigger_init,
-	.free			= eprobe_trigger_free,
-};
-
 static int eprobe_trigger_cmd_parse(struct event_command *cmd_ops,
				    struct trace_event_file *file,
				    char *glob, char *cmd,
@@ -513,12 +506,6 @@ static void eprobe_trigger_unreg_func(char *glob,
 
 }
 
-static const struct event_trigger_ops *eprobe_trigger_get_ops(char *cmd,
-							      char *param)
-{
-	return &eprobe_trigger_ops;
-}
-
 static struct event_command event_trigger_cmd = {
 	.name			= "eprobe",
 	.trigger_type		= ETT_EVENT_EPROBE,
@@ -527,8 +514,11 @@ static struct event_command event_trigger_cmd = {
 	.reg			= eprobe_trigger_reg_func,
 	.unreg			= eprobe_trigger_unreg_func,
 	.unreg_all		= NULL,
-	.get_trigger_ops	= eprobe_trigger_get_ops,
 	.set_filter		= NULL,
+	.trigger		= eprobe_trigger_func,
+	.print			= eprobe_trigger_print,
+	.init			= eprobe_trigger_init,
+	.free			= eprobe_trigger_free,
 };
 
 static struct event_trigger_data *
@@ -548,7 +538,6 @@ new_eprobe_trigger(struct trace_eprobe *ep, struct trace_event_file *file)
 
 	trigger->flags = EVENT_TRIGGER_FL_PROBE;
 	trigger->count = -1;
-	trigger->ops = &eprobe_trigger_ops;
 
 	/*
 	 * EVENT PROBE triggers are not registered as commands with
@@ -801,25 +790,6 @@ find_and_get_event(const char *system, const char *event_name)
 	return NULL;
 }
 
-static int trace_eprobe_tp_update_arg(struct trace_eprobe *ep, const char *argv[], int i)
-{
-	struct traceprobe_parse_context *ctx __free(traceprobe_parse_context) = NULL;
-	int ret;
-
-	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
-	if (!ctx)
-		return -ENOMEM;
-
-	ctx->event = ep->event;
-	ctx->flags = TPARG_FL_KERNEL | TPARG_FL_TEVENT;
-	ret = traceprobe_parse_probe_arg(&ep->tp, i, argv[i], ctx);
-
-	/* Handle symbols "@" */
-	if (!ret)
-		ret = traceprobe_update_arg(&ep->tp.args[i]);
-
-	return ret;
-}
-
 static int trace_eprobe_parse_filter(struct trace_eprobe *ep, int argc, const char *argv[])
 {
 	struct event_filter *dummy = NULL;
@@ -856,13 +826,10 @@ static int trace_eprobe_parse_filter(struct trace_eprobe *ep, int argc, const ch
 	ret = create_event_filter(top_trace_array(), ep->event, ep->filter_str,
				  true, &dummy);
 	free_event_filter(dummy);
-	if (ret)
-		goto error;
-
-	return 0;
-error:
-	kfree(ep->filter_str);
-	ep->filter_str = NULL;
+	if (ret) {
+		kfree(ep->filter_str);
+		ep->filter_str = NULL;
+	}
 	return ret;
 }
@@ -874,31 +841,33 @@ static int __trace_eprobe_create(int argc, const char *argv[])
	 * Fetch args (no space):
	 *  <name>=$<field>[:TYPE]
	 */
+	struct traceprobe_parse_context *ctx __free(traceprobe_parse_context) = NULL;
+	struct trace_eprobe *ep __free(trace_event_probe_cleanup) = NULL;
+	const char *trlog __free(trace_probe_log_clear) = NULL;
 	const char *event = NULL, *group = EPROBE_EVENT_SYSTEM;
 	const char *sys_event = NULL, *sys_name = NULL;
 	struct trace_event_call *event_call;
 	char *buf1 __free(kfree) = NULL;
 	char *buf2 __free(kfree) = NULL;
 	char *gbuf __free(kfree) = NULL;
-	struct trace_eprobe *ep = NULL;
 	int ret = 0, filter_idx = 0;
 	int i, filter_cnt;
 
 	if (argc < 2 || argv[0][0] != 'e')
 		return -ECANCELED;
 
-	trace_probe_log_init("event_probe", argc, argv);
+	trlog = trace_probe_log_init("event_probe", argc, argv);
 
 	event = strchr(&argv[0][1], ':');
 	if (event) {
 		gbuf = kmalloc(MAX_EVENT_NAME_LEN, GFP_KERNEL);
 		if (!gbuf)
-			goto mem_error;
+			return -ENOMEM;
 		event++;
 		ret = traceprobe_parse_event_name(&event, &group, gbuf,
						  event - argv[0]);
 		if (ret)
-			goto parse_error;
+			return -EINVAL;
 	}
 
 	trace_probe_log_set_index(1);
@@ -906,18 +875,18 @@ static int __trace_eprobe_create(int argc, const char *argv[])
 
 	buf2 = kmalloc(MAX_EVENT_NAME_LEN, GFP_KERNEL);
 	if (!buf2)
-		goto mem_error;
+		return -ENOMEM;
 
 	ret = traceprobe_parse_event_name(&sys_event, &sys_name, buf2, 0);
 	if (ret || !sys_event || !sys_name) {
 		trace_probe_log_err(0, NO_EVENT_INFO);
-		goto parse_error;
+		return -EINVAL;
 	}
 
 	if (!event) {
 		buf1 = kstrdup(sys_event, GFP_KERNEL);
 		if (!buf1)
-			goto mem_error;
+			return -ENOMEM;
 		event = buf1;
 	}
@@ -933,8 +902,7 @@ static int __trace_eprobe_create(int argc, const char *argv[])
 	if (argc - 2 > MAX_TRACE_ARGS) {
 		trace_probe_log_set_index(2);
 		trace_probe_log_err(0, TOO_MANY_ARGS);
-		ret = -E2BIG;
-		goto error;
+		return -E2BIG;
 	}
 
 	scoped_guard(mutex, &event_mutex) {
@@ -948,29 +916,39 @@ static int __trace_eprobe_create(int argc, const char *argv[])
 		trace_probe_log_err(0, BAD_ATTACH_EVENT);
 		/* This must return -ENOMEM or missing event, else there is a bug */
 		WARN_ON_ONCE(ret != -ENOMEM && ret != -ENODEV);
-		ep = NULL;
-		goto error;
+		return ret;
 	}
 
 	if (filter_idx) {
 		trace_probe_log_set_index(filter_idx);
 		ret = trace_eprobe_parse_filter(ep, filter_cnt, argv + filter_idx);
 		if (ret)
-			goto parse_error;
+			return -EINVAL;
 	} else
 		ep->filter_str = NULL;
 
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	ctx->event = ep->event;
+	ctx->flags = TPARG_FL_KERNEL | TPARG_FL_TEVENT;
+
 	argc -= 2; argv += 2;
 	/* parse arguments */
 	for (i = 0; i < argc; i++) {
 		trace_probe_log_set_index(i + 2);
-		ret = trace_eprobe_tp_update_arg(ep, argv, i);
+
+		ret = traceprobe_parse_probe_arg(&ep->tp, i, argv[i], ctx);
+
+		/* Handle symbols "@" */
+		if (!ret)
+			ret = traceprobe_update_arg(&ep->tp.args[i]);
+
 		if (ret)
-			goto error;
+			return ret;
 	}
 
 	ret = traceprobe_set_print_fmt(&ep->tp, PROBE_PRINT_EVENT);
 	if (ret < 0)
-		goto error;
+		return ret;
 
 	init_trace_eprobe_call(ep);
 	scoped_guard(mutex, &event_mutex) {
 		ret = trace_probe_register_event_call(&ep->tp);
@@ -979,25 +957,16 @@ static int __trace_eprobe_create(int argc, const char *argv[])
			trace_probe_log_set_index(0);
			trace_probe_log_err(0, EVENT_EXIST);
 		}
-		goto error;
+		return ret;
 	}
 
 	ret = dyn_event_add(&ep->devent, &ep->tp.event->call);
 	if (ret < 0) {
		trace_probe_unregister_event_call(&ep->tp);
-		goto error;
+		return ret;
 	}
+	/* To avoid freeing registered eprobe event, clear ep. */
+	ep = NULL;
 	}
-
-	trace_probe_log_clear();
-	return ret;
-
-mem_error:
-	ret = -ENOMEM;
-	goto error;
-parse_error:
-	ret = -EINVAL;
-error:
-	trace_probe_log_clear();
-	trace_event_probe_cleanup(ep);
 	return ret;
 }

View File

@@ -845,13 +845,13 @@ static int __ftrace_event_enable_disable(struct trace_event_file *file,
 		if (soft_disable)
			set_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &file->flags);
 
-		if (tr->trace_flags & TRACE_ITER_RECORD_CMD) {
+		if (tr->trace_flags & TRACE_ITER(RECORD_CMD)) {
			cmd = true;
			tracing_start_cmdline_record();
			set_bit(EVENT_FILE_FL_RECORDED_CMD_BIT, &file->flags);
 		}
 
-		if (tr->trace_flags & TRACE_ITER_RECORD_TGID) {
+		if (tr->trace_flags & TRACE_ITER(RECORD_TGID)) {
			tgid = true;
			tracing_start_tgid_record();
			set_bit(EVENT_FILE_FL_RECORDED_TGID_BIT, &file->flags);

View File

@@ -5696,7 +5696,7 @@ static void hist_trigger_show(struct seq_file *m,
		seq_puts(m, "\n\n");
 
	seq_puts(m, "# event histogram\n#\n# trigger info: ");
-	data->ops->print(m, data);
+	data->cmd_ops->print(m, data);
	seq_puts(m, "#\n\n");
 
	hist_data = data->private_data;
@@ -6018,7 +6018,7 @@ static void hist_trigger_debug_show(struct seq_file *m,
		seq_puts(m, "\n\n");
 
	seq_puts(m, "# event histogram\n#\n# trigger info: ");
-	data->ops->print(m, data);
+	data->cmd_ops->print(m, data);
	seq_puts(m, "#\n\n");
 
	hist_data = data->private_data;
@@ -6328,20 +6328,21 @@ static void event_hist_trigger_free(struct event_trigger_data *data)
		free_hist_pad();
 }
 
-static const struct event_trigger_ops event_hist_trigger_ops = {
-	.trigger		= event_hist_trigger,
-	.print			= event_hist_trigger_print,
-	.init			= event_hist_trigger_init,
-	.free			= event_hist_trigger_free,
-};
-
 static int event_hist_trigger_named_init(struct event_trigger_data *data)
 {
+	int ret;
+
	data->ref++;
 
	save_named_trigger(data->named_data->name, data);
 
-	return event_hist_trigger_init(data->named_data);
+	ret = event_hist_trigger_init(data->named_data);
+	if (ret < 0) {
+		kfree(data->cmd_ops);
+		data->cmd_ops = &trigger_hist_cmd;
+	}
+	return ret;
 }
 
 static void event_hist_trigger_named_free(struct event_trigger_data *data)
@@ -6353,24 +6354,14 @@ static void event_hist_trigger_named_free(struct event_trigger_data *data)
 
	data->ref--;
	if (!data->ref) {
+		struct event_command *cmd_ops = data->cmd_ops;
+
		del_named_trigger(data);
		trigger_data_free(data);
+		kfree(cmd_ops);
	}
 }
 
-static const struct event_trigger_ops event_hist_trigger_named_ops = {
-	.trigger		= event_hist_trigger,
-	.print			= event_hist_trigger_print,
-	.init			= event_hist_trigger_named_init,
-	.free			= event_hist_trigger_named_free,
-};
-
-static const struct event_trigger_ops *event_hist_get_trigger_ops(char *cmd,
-								  char *param)
-{
-	return &event_hist_trigger_ops;
-}
-
 static void hist_clear(struct event_trigger_data *data)
 {
	struct hist_trigger_data *hist_data = data->private_data;
@@ -6564,13 +6555,24 @@ static int hist_register_trigger(char *glob,
		data->paused = true;
 
	if (named_data) {
+		struct event_command *cmd_ops;
+
		data->private_data = named_data->private_data;
		set_named_trigger_data(data, named_data);
-		data->ops = &event_hist_trigger_named_ops;
+		/* Copy the command ops and update some of the functions */
+		cmd_ops = kmalloc(sizeof(*cmd_ops), GFP_KERNEL);
+		if (!cmd_ops) {
+			ret = -ENOMEM;
+			goto out;
+		}
+		*cmd_ops = *data->cmd_ops;
+		cmd_ops->init = event_hist_trigger_named_init;
+		cmd_ops->free = event_hist_trigger_named_free;
+		data->cmd_ops = cmd_ops;
	}
 
-	if (data->ops->init) {
-		ret = data->ops->init(data);
+	if (data->cmd_ops->init) {
+		ret = data->cmd_ops->init(data);
		if (ret < 0)
			goto out;
	}
@@ -6684,8 +6686,8 @@ static void hist_unregister_trigger(char *glob,
		}
	}
 
-	if (test && test->ops->free)
-		test->ops->free(test);
+	if (test && test->cmd_ops->free)
+		test->cmd_ops->free(test);
 
	if (hist_data->enable_timestamps) {
		if (!hist_data->remove || test)
@@ -6737,8 +6739,8 @@ static void hist_unreg_all(struct trace_event_file *file)
			update_cond_flag(file);
			if (hist_data->enable_timestamps)
				tracing_set_filter_buffering(file->tr, false);
-			if (test->ops->free)
-				test->ops->free(test);
+			if (test->cmd_ops->free)
+				test->cmd_ops->free(test);
		}
	}
 }
@@ -6914,8 +6916,11 @@ static struct event_command trigger_hist_cmd = {
	.reg			= hist_register_trigger,
	.unreg			= hist_unregister_trigger,
	.unreg_all		= hist_unreg_all,
-	.get_trigger_ops	= event_hist_get_trigger_ops,
	.set_filter		= set_trigger_filter,
+	.trigger		= event_hist_trigger,
+	.print			= event_hist_trigger_print,
+	.init			= event_hist_trigger_init,
+	.free			= event_hist_trigger_free,
 };
 
 __init int register_trigger_hist_cmd(void)
@@ -6947,66 +6952,6 @@ hist_enable_trigger(struct event_trigger_data *data,
	}
 }
 
-static void
-hist_enable_count_trigger(struct event_trigger_data *data,
-			  struct trace_buffer *buffer, void *rec,
-			  struct ring_buffer_event *event)
-{
-	if (!data->count)
-		return;
-
-	if (data->count != -1)
-		(data->count)--;
-
-	hist_enable_trigger(data, buffer, rec, event);
-}
-
-static const struct event_trigger_ops hist_enable_trigger_ops = {
-	.trigger		= hist_enable_trigger,
-	.print			= event_enable_trigger_print,
-	.init			= event_trigger_init,
-	.free			= event_enable_trigger_free,
-};
-
-static const struct event_trigger_ops hist_enable_count_trigger_ops = {
-	.trigger		= hist_enable_count_trigger,
-	.print			= event_enable_trigger_print,
-	.init			= event_trigger_init,
-	.free			= event_enable_trigger_free,
-};
-
-static const struct event_trigger_ops hist_disable_trigger_ops = {
-	.trigger		= hist_enable_trigger,
-	.print			= event_enable_trigger_print,
-	.init			= event_trigger_init,
-	.free			= event_enable_trigger_free,
-};
-
-static const struct event_trigger_ops hist_disable_count_trigger_ops = {
-	.trigger		= hist_enable_count_trigger,
-	.print			= event_enable_trigger_print,
-	.init			= event_trigger_init,
-	.free			= event_enable_trigger_free,
-};
-
-static const struct event_trigger_ops *
-hist_enable_get_trigger_ops(char *cmd, char *param)
-{
-	const struct event_trigger_ops *ops;
-	bool enable;
-
-	enable = (strcmp(cmd, ENABLE_HIST_STR) == 0);
-
-	if (enable)
-		ops = param ? &hist_enable_count_trigger_ops :
-			&hist_enable_trigger_ops;
-	else
-		ops = param ? &hist_disable_count_trigger_ops :
-			&hist_disable_trigger_ops;
-
-	return ops;
-}
-
 static void hist_enable_unreg_all(struct trace_event_file *file)
 {
	struct event_trigger_data *test, *n;
@@ -7016,8 +6961,8 @@ static void hist_enable_unreg_all(struct trace_event_file *file)
			list_del_rcu(&test->list);
			update_cond_flag(file);
			trace_event_trigger_enable_disable(file, 0);
-			if (test->ops->free)
-				test->ops->free(test);
+			if (test->cmd_ops->free)
+				test->cmd_ops->free(test);
		}
	}
 }
@@ -7029,8 +6974,12 @@ static struct event_command trigger_hist_enable_cmd = {
	.reg			= event_enable_register_trigger,
	.unreg			= event_enable_unregister_trigger,
	.unreg_all		= hist_enable_unreg_all,
-	.get_trigger_ops	= hist_enable_get_trigger_ops,
	.set_filter		= set_trigger_filter,
+	.trigger		= hist_enable_trigger,
+	.count_func		= event_trigger_count,
+	.print			= event_enable_trigger_print,
+	.init			= event_trigger_init,
+	.free			= event_enable_trigger_free,
 };
 
 static struct event_command trigger_hist_disable_cmd = {
@@ -7040,8 +6989,12 @@ static struct event_command trigger_hist_disable_cmd = {
	.reg			= event_enable_register_trigger,
	.unreg			= event_enable_unregister_trigger,
	.unreg_all		= hist_enable_unreg_all,
-	.get_trigger_ops	= hist_enable_get_trigger_ops,
	.set_filter		= set_trigger_filter,
+	.trigger		= hist_enable_trigger,
+	.count_func		= event_trigger_count,
+	.print			= event_enable_trigger_print,
+	.init			= event_trigger_init,
+	.free			= event_enable_trigger_free,
 };
 
 static __init void unregister_trigger_hist_enable_disable_cmds(void)

View File

@@ -359,7 +359,7 @@ static enum print_line_t print_synth_event(struct trace_iterator *iter,
		fmt = synth_field_fmt(se->fields[i]->type);
 
		/* parameter types */
-		if (tr && tr->trace_flags & TRACE_ITER_VERBOSE)
+		if (tr && tr->trace_flags & TRACE_ITER(VERBOSE))
			trace_seq_printf(s, "%s ", fmt);
 
		snprintf(print_fmt, sizeof(print_fmt), "%%s=%s%%s", fmt);

View File

@@ -6,6 +6,7 @@
 */
 
 #include <linux/security.h>
+#include <linux/kthread.h>
 #include <linux/module.h>
 #include <linux/ctype.h>
 #include <linux/mutex.h>
@@ -17,15 +18,77 @@
 static LIST_HEAD(trigger_commands);
 static DEFINE_MUTEX(trigger_cmd_mutex);
 
+static struct task_struct *trigger_kthread;
+static struct llist_head trigger_data_free_list;
+static DEFINE_MUTEX(trigger_data_kthread_mutex);
+
+/* Bulk garbage collection of event_trigger_data elements */
+static int trigger_kthread_fn(void *ignore)
+{
+	struct event_trigger_data *data, *tmp;
+	struct llist_node *llnodes;
+
+	/* Once this task starts, it lives forever */
+	for (;;) {
+		set_current_state(TASK_INTERRUPTIBLE);
+
+		if (llist_empty(&trigger_data_free_list))
+			schedule();
+
+		__set_current_state(TASK_RUNNING);
+
+		llnodes = llist_del_all(&trigger_data_free_list);
+
+		/* make sure current triggers exit before free */
+		tracepoint_synchronize_unregister();
+
+		llist_for_each_entry_safe(data, tmp, llnodes, llist)
+			kfree(data);
+	}
+
+	return 0;
+}
+
 void trigger_data_free(struct event_trigger_data *data)
 {
	if (data->cmd_ops->set_filter)
		data->cmd_ops->set_filter(NULL, data, NULL);
 
-	/* make sure current triggers exit before free */
-	tracepoint_synchronize_unregister();
+	if (unlikely(!trigger_kthread)) {
+		guard(mutex)(&trigger_data_kthread_mutex);
+		/* Check again after taking mutex */
+		if (!trigger_kthread) {
+			struct task_struct *kthread;
 
-	kfree(data);
+			kthread = kthread_create(trigger_kthread_fn, NULL,
+						 "trigger_data_free");
+			if (!IS_ERR(kthread))
+				WRITE_ONCE(trigger_kthread, kthread);
+		}
+	}
+
+	if (!trigger_kthread) {
+		/* Do it the slow way */
+		tracepoint_synchronize_unregister();
+		kfree(data);
+		return;
+	}
+
+	llist_add(&data->llist, &trigger_data_free_list);
+	wake_up_process(trigger_kthread);
+}
+
+static inline void data_ops_trigger(struct event_trigger_data *data,
+				    struct trace_buffer *buffer, void *rec,
+				    struct ring_buffer_event *event)
+{
+	const struct event_command *cmd_ops = data->cmd_ops;
+
+	if (data->flags & EVENT_TRIGGER_FL_COUNT) {
+		if (!cmd_ops->count_func(data, buffer, rec, event))
+			return;
+	}
+
+	cmd_ops->trigger(data, buffer, rec, event);
 }
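
Aside: the new trigger_data_free() path is the classic llist pattern: producers push onto a lock-free singly linked list with llist_add() from any context, and a single kthread drains the whole list with llist_del_all(), paying the expensive synchronize-then-free cost once per batch. A userspace model of that push/del_all pair using C11 atomics (the kernel versions are cmpxchg loops of the same shape; everything here is demo code, not kernel API):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

struct node {
    struct node *next;
    int payload;
};

static _Atomic(struct node *) free_list;

/* llist_add() analog: lock-free push onto the head. */
static void list_add(struct node *n)
{
    struct node *head = atomic_load(&free_list);

    do {
        n->next = head;
    } while (!atomic_compare_exchange_weak(&free_list, &head, n));
}

/* llist_del_all() analog: detach the whole chain in one exchange. */
static struct node *list_del_all(void)
{
    return atomic_exchange(&free_list, NULL);
}

static void *collector(void *arg)
{
    struct node *n = list_del_all();

    (void)arg;
    /* The kernel does tracepoint_synchronize_unregister() here, once
     * per drained batch, before freeing the nodes. */
    while (n) {
        struct node *next = n->next;

        printf("freeing payload %d\n", n->payload);
        free(n);
        n = next;
    }
    return NULL;
}

int main(void)
{
    pthread_t t;

    for (int i = 0; i < 3; i++) {
        struct node *n = malloc(sizeof(*n));

        if (!n)
            break;
        n->payload = i;
        list_add(n);
    }
    pthread_create(&t, NULL, collector, NULL);
    pthread_join(t, NULL);
    return 0;
}
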
 /**
@@ -70,7 +133,7 @@ event_triggers_call(struct trace_event_file *file,
		if (data->paused)
			continue;
		if (!rec) {
-			data->ops->trigger(data, buffer, rec, event);
+			data_ops_trigger(data, buffer, rec, event);
			continue;
		}
		filter = rcu_dereference_sched(data->filter);
@@ -80,7 +143,7 @@ event_triggers_call(struct trace_event_file *file,
			tt |= data->cmd_ops->trigger_type;
			continue;
		}
-		data->ops->trigger(data, buffer, rec, event);
+		data_ops_trigger(data, buffer, rec, event);
	}
	return tt;
 }
@@ -122,7 +185,7 @@ event_triggers_post_call(struct trace_event_file *file,
		if (data->paused)
			continue;
		if (data->cmd_ops->trigger_type & tt)
-			data->ops->trigger(data, NULL, NULL, NULL);
+			data_ops_trigger(data, NULL, NULL, NULL);
	}
 }
 EXPORT_SYMBOL_GPL(event_triggers_post_call);
@@ -191,7 +254,7 @@ static int trigger_show(struct seq_file *m, void *v)
	}
 
	data = list_entry(v, struct event_trigger_data, list);
-	data->ops->print(m, data);
+	data->cmd_ops->print(m, data);
 
	return 0;
 }
@@ -245,7 +308,8 @@ int trigger_process_regex(struct trace_event_file *file, char *buff)
	char *command, *next;
	struct event_command *p;
 
-	next = buff = skip_spaces(buff);
+	next = buff = strim(buff);
+
	command = strsep(&next, ": \t");
	if (next) {
		next = skip_spaces(next);
@@ -282,8 +346,6 @@ static ssize_t event_trigger_regex_write(struct file *file,
	if (IS_ERR(buf))
		return PTR_ERR(buf);
 
-	strim(buf);
-
	guard(mutex)(&event_mutex);
 
	event_file = event_file_file(file);
@@ -300,13 +362,9 @@ static ssize_t event_trigger_regex_write(struct file *file,
 
 static int event_trigger_regex_release(struct inode *inode, struct file *file)
 {
-	mutex_lock(&event_mutex);
-
	if (file->f_mode & FMODE_READ)
		seq_release(inode, file);
 
-	mutex_unlock(&event_mutex);
-
	return 0;
 }
@@ -378,7 +436,37 @@ __init int unregister_event_command(struct event_command *cmd)
 }
 
 /**
- * event_trigger_print - Generic event_trigger_ops @print implementation
+ * event_trigger_count - Optional count function for event triggers
+ * @data: Trigger-specific data
+ * @buffer: The ring buffer that the event is being written to
+ * @rec: The trace entry for the event, NULL for unconditional invocation
+ * @event: The event meta data in the ring buffer
+ *
+ * For triggers that can take a count parameter that doesn't do anything
+ * special, they can use this function to assign to their .count_func
+ * field.
+ *
+ * This simply does a count down of the @data->count field.
+ *
+ * If the @data->count is greater than zero, it will decrement it.
+ *
+ * Returns false if @data->count is zero, otherwise true.
+ */
+bool event_trigger_count(struct event_trigger_data *data,
+			 struct trace_buffer *buffer, void *rec,
+			 struct ring_buffer_event *event)
+{
+	if (!data->count)
+		return false;
+
+	if (data->count != -1)
+		(data->count)--;
+
+	return true;
+}
+
+/**
+ * event_trigger_print - Generic event_command @print implementation
 * @name: The name of the event trigger
 * @m: The seq_file being printed to
 * @data: Trigger-specific data
@@ -413,7 +501,7 @@ event_trigger_print(const char *name, struct seq_file *m,
 }
 
 /**
- * event_trigger_init - Generic event_trigger_ops @init implementation
+ * event_trigger_init - Generic event_command @init implementation
 * @data: Trigger-specific data
 *
 * Common implementation of event trigger initialization.
@@ -430,7 +518,7 @@ int event_trigger_init(struct event_trigger_data *data)
 }
 
 /**
- * event_trigger_free - Generic event_trigger_ops @free implementation
+ * event_trigger_free - Generic event_command @free implementation
 * @data: Trigger-specific data
 *
 * Common implementation of event trigger de-initialization.
@@ -492,8 +580,8 @@ clear_event_triggers(struct trace_array *tr)
			list_for_each_entry_safe(data, n, &file->triggers, list) {
				trace_event_trigger_enable_disable(file, 0);
				list_del_rcu(&data->list);
-				if (data->ops->free)
-					data->ops->free(data);
+				if (data->cmd_ops->free)
+					data->cmd_ops->free(data);
			}
		}
	}
@@ -556,8 +644,8 @@ static int register_trigger(char *glob,
			return -EEXIST;
	}
 
-	if (data->ops->init) {
-		ret = data->ops->init(data);
+	if (data->cmd_ops->init) {
+		ret = data->cmd_ops->init(data);
		if (ret < 0)
			return ret;
	}
@@ -595,8 +683,8 @@ static bool try_unregister_trigger(char *glob,
	}
 
	if (data) {
-		if (data->ops->free)
-			data->ops->free(data);
+		if (data->cmd_ops->free)
+			data->cmd_ops->free(data);
 
		return true;
	}
@@ -807,9 +895,13 @@ int event_trigger_separate_filter(char *param_and_filter, char **param,
 * @private_data: User data to associate with the event trigger
 *
 * Allocate an event_trigger_data instance and initialize it. The
- * @cmd_ops are used along with the @cmd and @param to get the
- * trigger_ops to assign to the event_trigger_data. @private_data can
- * also be passed in and associated with the event_trigger_data.
+ * @cmd_ops defines how the trigger will operate. If @param is set,
+ * and @cmd_ops->trigger_ops->count_func is non NULL, then the
+ * data->count is set to @param and before the trigger is executed, the
+ * @cmd_ops->trigger_ops->count_func() is called. If that function returns
+ * false, the @cmd_ops->trigger_ops->trigger() function will not be called.
+ * @private_data can also be passed in and associated with the
+ * event_trigger_data.
 *
 * Use trigger_data_free() to free an event_trigger_data object.
 *
@@ -821,18 +913,16 @@ struct event_trigger_data *trigger_data_alloc(struct event_command *cmd_ops,
					      void *private_data)
 {
	struct event_trigger_data *trigger_data;
-	const struct event_trigger_ops *trigger_ops;
-
-	trigger_ops = cmd_ops->get_trigger_ops(cmd, param);
 
	trigger_data = kzalloc(sizeof(*trigger_data), GFP_KERNEL);
	if (!trigger_data)
		return NULL;
 
	trigger_data->count = -1;
-	trigger_data->ops = trigger_ops;
	trigger_data->cmd_ops = cmd_ops;
	trigger_data->private_data = private_data;
+	if (param && cmd_ops->count_func)
+		trigger_data->flags |= EVENT_TRIGGER_FL_COUNT;
+
	INIT_LIST_HEAD(&trigger_data->list);
	INIT_LIST_HEAD(&trigger_data->named_list);
@ -1271,31 +1361,28 @@ traceon_trigger(struct event_trigger_data *data,
tracing_on(); tracing_on();
} }
static void static bool
traceon_count_trigger(struct event_trigger_data *data, traceon_count_func(struct event_trigger_data *data,
struct trace_buffer *buffer, void *rec, struct trace_buffer *buffer, void *rec,
struct ring_buffer_event *event) struct ring_buffer_event *event)
{ {
struct trace_event_file *file = data->private_data; struct trace_event_file *file = data->private_data;
if (file) { if (file) {
if (tracer_tracing_is_on(file->tr)) if (tracer_tracing_is_on(file->tr))
return; return false;
} else { } else {
if (tracing_is_on()) if (tracing_is_on())
return; return false;
} }
if (!data->count) if (!data->count)
return; return false;
if (data->count != -1) if (data->count != -1)
(data->count)--; (data->count)--;
if (file) return true;
tracer_tracing_on(file->tr);
else
tracing_on();
} }
static void static void
@ -1319,31 +1406,28 @@ traceoff_trigger(struct event_trigger_data *data,
tracing_off(); tracing_off();
} }
static void static bool
traceoff_count_trigger(struct event_trigger_data *data, traceoff_count_func(struct event_trigger_data *data,
struct trace_buffer *buffer, void *rec, struct trace_buffer *buffer, void *rec,
struct ring_buffer_event *event) struct ring_buffer_event *event)
{ {
struct trace_event_file *file = data->private_data; struct trace_event_file *file = data->private_data;
if (file) { if (file) {
if (!tracer_tracing_is_on(file->tr)) if (!tracer_tracing_is_on(file->tr))
return; return false;
} else { } else {
if (!tracing_is_on()) if (!tracing_is_on())
return; return false;
} }
if (!data->count) if (!data->count)
return; return false;
if (data->count != -1) if (data->count != -1)
(data->count)--; (data->count)--;
if (file) return true;
tracer_tracing_off(file->tr);
else
tracing_off();
} }
static int static int
@ -1360,58 +1444,18 @@ traceoff_trigger_print(struct seq_file *m, struct event_trigger_data *data)
data->filter_str); data->filter_str);
} }
static const struct event_trigger_ops traceon_trigger_ops = {
.trigger = traceon_trigger,
.print = traceon_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops traceon_count_trigger_ops = {
.trigger = traceon_count_trigger,
.print = traceon_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops traceoff_trigger_ops = {
.trigger = traceoff_trigger,
.print = traceoff_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops traceoff_count_trigger_ops = {
.trigger = traceoff_count_trigger,
.print = traceoff_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops *
onoff_get_trigger_ops(char *cmd, char *param)
{
const struct event_trigger_ops *ops;
/* we register both traceon and traceoff to this callback */
if (strcmp(cmd, "traceon") == 0)
ops = param ? &traceon_count_trigger_ops :
&traceon_trigger_ops;
else
ops = param ? &traceoff_count_trigger_ops :
&traceoff_trigger_ops;
return ops;
}
static struct event_command trigger_traceon_cmd = { static struct event_command trigger_traceon_cmd = {
.name = "traceon", .name = "traceon",
.trigger_type = ETT_TRACE_ONOFF, .trigger_type = ETT_TRACE_ONOFF,
.parse = event_trigger_parse, .parse = event_trigger_parse,
.reg = register_trigger, .reg = register_trigger,
.unreg = unregister_trigger, .unreg = unregister_trigger,
.get_trigger_ops = onoff_get_trigger_ops,
.set_filter = set_trigger_filter, .set_filter = set_trigger_filter,
.trigger = traceon_trigger,
.count_func = traceon_count_func,
.print = traceon_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
}; };
static struct event_command trigger_traceoff_cmd = { static struct event_command trigger_traceoff_cmd = {
@ -1421,8 +1465,12 @@ static struct event_command trigger_traceoff_cmd = {
.parse = event_trigger_parse, .parse = event_trigger_parse,
.reg = register_trigger, .reg = register_trigger,
.unreg = unregister_trigger, .unreg = unregister_trigger,
.get_trigger_ops = onoff_get_trigger_ops,
.set_filter = set_trigger_filter, .set_filter = set_trigger_filter,
.trigger = traceoff_trigger,
.count_func = traceoff_count_func,
.print = traceoff_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
}; };
#ifdef CONFIG_TRACER_SNAPSHOT #ifdef CONFIG_TRACER_SNAPSHOT
@ -1439,20 +1487,6 @@ snapshot_trigger(struct event_trigger_data *data,
tracing_snapshot(); tracing_snapshot();
} }
static void
snapshot_count_trigger(struct event_trigger_data *data,
struct trace_buffer *buffer, void *rec,
struct ring_buffer_event *event)
{
if (!data->count)
return;
if (data->count != -1)
(data->count)--;
snapshot_trigger(data, buffer, rec, event);
}
static int static int
register_snapshot_trigger(char *glob, register_snapshot_trigger(char *glob,
struct event_trigger_data *data, struct event_trigger_data *data,
@ -1484,34 +1518,18 @@ snapshot_trigger_print(struct seq_file *m, struct event_trigger_data *data)
data->filter_str); data->filter_str);
} }
static const struct event_trigger_ops snapshot_trigger_ops = {
.trigger = snapshot_trigger,
.print = snapshot_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops snapshot_count_trigger_ops = {
.trigger = snapshot_count_trigger,
.print = snapshot_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops *
snapshot_get_trigger_ops(char *cmd, char *param)
{
return param ? &snapshot_count_trigger_ops : &snapshot_trigger_ops;
}
static struct event_command trigger_snapshot_cmd = { static struct event_command trigger_snapshot_cmd = {
.name = "snapshot", .name = "snapshot",
.trigger_type = ETT_SNAPSHOT, .trigger_type = ETT_SNAPSHOT,
.parse = event_trigger_parse, .parse = event_trigger_parse,
.reg = register_snapshot_trigger, .reg = register_snapshot_trigger,
.unreg = unregister_snapshot_trigger, .unreg = unregister_snapshot_trigger,
.get_trigger_ops = snapshot_get_trigger_ops,
.set_filter = set_trigger_filter, .set_filter = set_trigger_filter,
.trigger = snapshot_trigger,
.count_func = event_trigger_count,
.print = snapshot_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
}; };
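Note: snapshot (and stacktrace below) point .count_func at a shared event_trigger_count helper instead of a private wrapper. Its body is not shown in these hunks, but the deleted snapshot_count_trigger/stacktrace_count_trigger code implies it is simply the common countdown converted to the new bool contract — roughly:

        /* Presumed shape of the shared countdown helper, inferred from the
         * removed per-trigger wrappers: */
        static bool event_trigger_count(struct event_trigger_data *data,
                                        struct trace_buffer *buffer, void *rec,
                                        struct ring_buffer_event *event)
        {
                if (!data->count)
                        return false;
                if (data->count != -1)
                        (data->count)--;
                return true;
        }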
static __init int register_trigger_snapshot_cmd(void) static __init int register_trigger_snapshot_cmd(void)
@ -1558,20 +1576,6 @@ stacktrace_trigger(struct event_trigger_data *data,
trace_dump_stack(STACK_SKIP); trace_dump_stack(STACK_SKIP);
} }
static void
stacktrace_count_trigger(struct event_trigger_data *data,
struct trace_buffer *buffer, void *rec,
struct ring_buffer_event *event)
{
if (!data->count)
return;
if (data->count != -1)
(data->count)--;
stacktrace_trigger(data, buffer, rec, event);
}
static int static int
stacktrace_trigger_print(struct seq_file *m, struct event_trigger_data *data) stacktrace_trigger_print(struct seq_file *m, struct event_trigger_data *data)
{ {
@ -1579,26 +1583,6 @@ stacktrace_trigger_print(struct seq_file *m, struct event_trigger_data *data)
data->filter_str); data->filter_str);
} }
static const struct event_trigger_ops stacktrace_trigger_ops = {
.trigger = stacktrace_trigger,
.print = stacktrace_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops stacktrace_count_trigger_ops = {
.trigger = stacktrace_count_trigger,
.print = stacktrace_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
};
static const struct event_trigger_ops *
stacktrace_get_trigger_ops(char *cmd, char *param)
{
return param ? &stacktrace_count_trigger_ops : &stacktrace_trigger_ops;
}
static struct event_command trigger_stacktrace_cmd = { static struct event_command trigger_stacktrace_cmd = {
.name = "stacktrace", .name = "stacktrace",
.trigger_type = ETT_STACKTRACE, .trigger_type = ETT_STACKTRACE,
@ -1606,8 +1590,12 @@ static struct event_command trigger_stacktrace_cmd = {
.parse = event_trigger_parse, .parse = event_trigger_parse,
.reg = register_trigger, .reg = register_trigger,
.unreg = unregister_trigger, .unreg = unregister_trigger,
.get_trigger_ops = stacktrace_get_trigger_ops,
.set_filter = set_trigger_filter, .set_filter = set_trigger_filter,
.trigger = stacktrace_trigger,
.count_func = event_trigger_count,
.print = stacktrace_trigger_print,
.init = event_trigger_init,
.free = event_trigger_free,
}; };
static __init int register_trigger_stacktrace_cmd(void) static __init int register_trigger_stacktrace_cmd(void)
@ -1642,24 +1630,24 @@ event_enable_trigger(struct event_trigger_data *data,
set_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &enable_data->file->flags); set_bit(EVENT_FILE_FL_SOFT_DISABLED_BIT, &enable_data->file->flags);
} }
static void static bool
event_enable_count_trigger(struct event_trigger_data *data, event_enable_count_func(struct event_trigger_data *data,
struct trace_buffer *buffer, void *rec, struct trace_buffer *buffer, void *rec,
struct ring_buffer_event *event) struct ring_buffer_event *event)
{ {
struct enable_trigger_data *enable_data = data->private_data; struct enable_trigger_data *enable_data = data->private_data;
if (!data->count) if (!data->count)
return; return false;
/* Skip if the event is in a state we want to switch to */ /* Skip if the event is in a state we want to switch to */
if (enable_data->enable == !(enable_data->file->flags & EVENT_FILE_FL_SOFT_DISABLED)) if (enable_data->enable == !(enable_data->file->flags & EVENT_FILE_FL_SOFT_DISABLED))
return; return false;
if (data->count != -1) if (data->count != -1)
(data->count)--; (data->count)--;
event_enable_trigger(data, buffer, rec, event); return true;
} }
int event_enable_trigger_print(struct seq_file *m, int event_enable_trigger_print(struct seq_file *m,
@ -1704,34 +1692,6 @@ void event_enable_trigger_free(struct event_trigger_data *data)
} }
} }
static const struct event_trigger_ops event_enable_trigger_ops = {
.trigger = event_enable_trigger,
.print = event_enable_trigger_print,
.init = event_trigger_init,
.free = event_enable_trigger_free,
};
static const struct event_trigger_ops event_enable_count_trigger_ops = {
.trigger = event_enable_count_trigger,
.print = event_enable_trigger_print,
.init = event_trigger_init,
.free = event_enable_trigger_free,
};
static const struct event_trigger_ops event_disable_trigger_ops = {
.trigger = event_enable_trigger,
.print = event_enable_trigger_print,
.init = event_trigger_init,
.free = event_enable_trigger_free,
};
static const struct event_trigger_ops event_disable_count_trigger_ops = {
.trigger = event_enable_count_trigger,
.print = event_enable_trigger_print,
.init = event_trigger_init,
.free = event_enable_trigger_free,
};
int event_enable_trigger_parse(struct event_command *cmd_ops, int event_enable_trigger_parse(struct event_command *cmd_ops,
struct trace_event_file *file, struct trace_event_file *file,
char *glob, char *cmd, char *param_and_filter) char *glob, char *cmd, char *param_and_filter)
@ -1861,8 +1821,8 @@ int event_enable_register_trigger(char *glob,
} }
} }
if (data->ops->init) { if (data->cmd_ops->init) {
ret = data->ops->init(data); ret = data->cmd_ops->init(data);
if (ret < 0) if (ret < 0)
return ret; return ret;
} }
@ -1902,30 +1862,8 @@ void event_enable_unregister_trigger(char *glob,
} }
} }
if (data && data->ops->free) if (data && data->cmd_ops->free)
data->ops->free(data); data->cmd_ops->free(data);
}
static const struct event_trigger_ops *
event_enable_get_trigger_ops(char *cmd, char *param)
{
const struct event_trigger_ops *ops;
bool enable;
#ifdef CONFIG_HIST_TRIGGERS
enable = ((strcmp(cmd, ENABLE_EVENT_STR) == 0) ||
(strcmp(cmd, ENABLE_HIST_STR) == 0));
#else
enable = strcmp(cmd, ENABLE_EVENT_STR) == 0;
#endif
if (enable)
ops = param ? &event_enable_count_trigger_ops :
&event_enable_trigger_ops;
else
ops = param ? &event_disable_count_trigger_ops :
&event_disable_trigger_ops;
return ops;
} }
static struct event_command trigger_enable_cmd = { static struct event_command trigger_enable_cmd = {
@ -1934,8 +1872,12 @@ static struct event_command trigger_enable_cmd = {
.parse = event_enable_trigger_parse, .parse = event_enable_trigger_parse,
.reg = event_enable_register_trigger, .reg = event_enable_register_trigger,
.unreg = event_enable_unregister_trigger, .unreg = event_enable_unregister_trigger,
.get_trigger_ops = event_enable_get_trigger_ops,
.set_filter = set_trigger_filter, .set_filter = set_trigger_filter,
.trigger = event_enable_trigger,
.count_func = event_enable_count_func,
.print = event_enable_trigger_print,
.init = event_trigger_init,
.free = event_enable_trigger_free,
}; };
static struct event_command trigger_disable_cmd = { static struct event_command trigger_disable_cmd = {
@ -1944,8 +1886,12 @@ static struct event_command trigger_disable_cmd = {
.parse = event_enable_trigger_parse, .parse = event_enable_trigger_parse,
.reg = event_enable_register_trigger, .reg = event_enable_register_trigger,
.unreg = event_enable_unregister_trigger, .unreg = event_enable_unregister_trigger,
.get_trigger_ops = event_enable_get_trigger_ops,
.set_filter = set_trigger_filter, .set_filter = set_trigger_filter,
.trigger = event_enable_trigger,
.count_func = event_enable_count_func,
.print = event_enable_trigger_print,
.init = event_trigger_init,
.free = event_enable_trigger_free,
}; };
static __init void unregister_trigger_enable_disable_cmds(void) static __init void unregister_trigger_enable_disable_cmds(void)


@ -632,7 +632,7 @@ print_fentry_event(struct trace_iterator *iter, int flags,
trace_seq_printf(s, "%s: (", trace_probe_name(tp)); trace_seq_printf(s, "%s: (", trace_probe_name(tp));
if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET)) if (!seq_print_ip_sym_offset(s, field->ip, flags))
goto out; goto out;
trace_seq_putc(s, ')'); trace_seq_putc(s, ')');
@ -662,12 +662,12 @@ print_fexit_event(struct trace_iterator *iter, int flags,
trace_seq_printf(s, "%s: (", trace_probe_name(tp)); trace_seq_printf(s, "%s: (", trace_probe_name(tp));
if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET)) if (!seq_print_ip_sym_offset(s, field->ret_ip, flags))
goto out; goto out;
trace_seq_puts(s, " <- "); trace_seq_puts(s, " <- ");
if (!seq_print_ip_sym(s, field->func, flags & ~TRACE_ITER_SYM_OFFSET)) if (!seq_print_ip_sym_no_offset(s, field->func, flags))
goto out; goto out;
trace_seq_putc(s, ')'); trace_seq_putc(s, ')');


@ -154,11 +154,11 @@ static int function_trace_init(struct trace_array *tr)
if (!tr->ops) if (!tr->ops)
return -ENOMEM; return -ENOMEM;
func = select_trace_function(func_flags.val); func = select_trace_function(tr->current_trace_flags->val);
if (!func) if (!func)
return -EINVAL; return -EINVAL;
if (!handle_func_repeats(tr, func_flags.val)) if (!handle_func_repeats(tr, tr->current_trace_flags->val))
return -ENOMEM; return -ENOMEM;
ftrace_init_array_ops(tr, func); ftrace_init_array_ops(tr, func);
@ -459,14 +459,14 @@ func_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
u32 new_flags; u32 new_flags;
/* Do nothing if already set. */ /* Do nothing if already set. */
if (!!set == !!(func_flags.val & bit)) if (!!set == !!(tr->current_trace_flags->val & bit))
return 0; return 0;
/* We can change this flag only when not running. */ /* We can change this flag only when not running. */
if (tr->current_trace != &function_trace) if (tr->current_trace != &function_trace)
return 0; return 0;
new_flags = (func_flags.val & ~bit) | (set ? bit : 0); new_flags = (tr->current_trace_flags->val & ~bit) | (set ? bit : 0);
func = select_trace_function(new_flags); func = select_trace_function(new_flags);
if (!func) if (!func)
return -EINVAL; return -EINVAL;
@ -491,7 +491,7 @@ static struct tracer function_trace __tracer_data =
.init = function_trace_init, .init = function_trace_init,
.reset = function_trace_reset, .reset = function_trace_reset,
.start = function_trace_start, .start = function_trace_start,
.flags = &func_flags, .default_flags = &func_flags,
.set_flag = func_set_flag, .set_flag = func_set_flag,
.allow_instances = true, .allow_instances = true,
#ifdef CONFIG_FTRACE_SELFTEST #ifdef CONFIG_FTRACE_SELFTEST
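Note: the function tracer's options move from the file-global func_flags into the trace instance — .flags becomes .default_flags (the template), and lookups go through tr->current_trace_flags, so each instance toggles options independently. A one-line sketch of the per-instance test (func_option_set is illustrative; TRACE_FUNC_OPT_STACK stands in for any tracer_opt bit):

        /* Check a tracer option on a specific instance, not globally. */
        static bool func_option_set(struct trace_array *tr, u32 bit)
        {
                return (tr->current_trace_flags->val & bit) == bit;
        }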


@ -16,9 +16,12 @@
#include "trace.h" #include "trace.h"
#include "trace_output.h" #include "trace_output.h"
/* When set, irq functions will be ignored */ /* When set, irq functions might be ignored */
static int ftrace_graph_skip_irqs; static int ftrace_graph_skip_irqs;
/* Do not record function time when task is sleeping */
int fgraph_no_sleep_time;
struct fgraph_cpu_data { struct fgraph_cpu_data {
pid_t last_pid; pid_t last_pid;
int depth; int depth;
@ -33,14 +36,19 @@ struct fgraph_ent_args {
unsigned long args[FTRACE_REGS_MAX_ARGS]; unsigned long args[FTRACE_REGS_MAX_ARGS];
}; };
struct fgraph_retaddr_ent_args {
struct fgraph_retaddr_ent_entry ent;
/* Force the sizeof of args[] to have FTRACE_REGS_MAX_ARGS entries */
unsigned long args[FTRACE_REGS_MAX_ARGS];
};
struct fgraph_data { struct fgraph_data {
struct fgraph_cpu_data __percpu *cpu_data; struct fgraph_cpu_data __percpu *cpu_data;
/* Place to preserve last processed entry. */ /* Place to preserve last processed entry. */
union { union {
struct fgraph_ent_args ent; struct fgraph_ent_args ent;
/* TODO allow retaddr to have args */ struct fgraph_retaddr_ent_args rent;
struct fgraph_retaddr_ent_entry rent;
}; };
struct ftrace_graph_ret_entry ret; struct ftrace_graph_ret_entry ret;
int failed; int failed;
@ -85,11 +93,6 @@ static struct tracer_opt trace_opts[] = {
/* Include sleep time (scheduled out) between entry and return */ /* Include sleep time (scheduled out) between entry and return */
{ TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) }, { TRACER_OPT(sleep-time, TRACE_GRAPH_SLEEP_TIME) },
#ifdef CONFIG_FUNCTION_PROFILER
/* Include time within nested functions */
{ TRACER_OPT(graph-time, TRACE_GRAPH_GRAPH_TIME) },
#endif
{ } /* Empty entry */ { } /* Empty entry */
}; };
@ -97,13 +100,13 @@ static struct tracer_flags tracer_flags = {
/* Don't display overruns, proc, or tail by default */ /* Don't display overruns, proc, or tail by default */
.val = TRACE_GRAPH_PRINT_CPU | TRACE_GRAPH_PRINT_OVERHEAD | .val = TRACE_GRAPH_PRINT_CPU | TRACE_GRAPH_PRINT_OVERHEAD |
TRACE_GRAPH_PRINT_DURATION | TRACE_GRAPH_PRINT_IRQS | TRACE_GRAPH_PRINT_DURATION | TRACE_GRAPH_PRINT_IRQS |
TRACE_GRAPH_SLEEP_TIME | TRACE_GRAPH_GRAPH_TIME, TRACE_GRAPH_SLEEP_TIME,
.opts = trace_opts .opts = trace_opts
}; };
static bool tracer_flags_is_set(u32 flags) static bool tracer_flags_is_set(struct trace_array *tr, u32 flags)
{ {
return (tracer_flags.val & flags) == flags; return (tr->current_trace_flags->val & flags) == flags;
} }
/* /*
@ -162,20 +165,32 @@ int __trace_graph_entry(struct trace_array *tr,
int __trace_graph_retaddr_entry(struct trace_array *tr, int __trace_graph_retaddr_entry(struct trace_array *tr,
struct ftrace_graph_ent *trace, struct ftrace_graph_ent *trace,
unsigned int trace_ctx, unsigned int trace_ctx,
unsigned long retaddr) unsigned long retaddr,
struct ftrace_regs *fregs)
{ {
struct ring_buffer_event *event; struct ring_buffer_event *event;
struct trace_buffer *buffer = tr->array_buffer.buffer; struct trace_buffer *buffer = tr->array_buffer.buffer;
struct fgraph_retaddr_ent_entry *entry; struct fgraph_retaddr_ent_entry *entry;
int size;
/* If fregs is defined, add FTRACE_REGS_MAX_ARGS long size words */
size = sizeof(*entry) + (FTRACE_REGS_MAX_ARGS * !!fregs * sizeof(long));
event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_RETADDR_ENT, event = trace_buffer_lock_reserve(buffer, TRACE_GRAPH_RETADDR_ENT,
sizeof(*entry), trace_ctx); size, trace_ctx);
if (!event) if (!event)
return 0; return 0;
entry = ring_buffer_event_data(event); entry = ring_buffer_event_data(event);
entry->graph_ent.func = trace->func; entry->graph_rent.ent = *trace;
entry->graph_ent.depth = trace->depth; entry->graph_rent.retaddr = retaddr;
entry->graph_ent.retaddr = retaddr;
#ifdef CONFIG_HAVE_FUNCTION_ARG_ACCESS_API
if (fregs) {
for (int i = 0; i < FTRACE_REGS_MAX_ARGS; i++)
entry->args[i] = ftrace_regs_get_argument(fregs, i);
}
#endif
trace_buffer_unlock_commit_nostack(buffer, event); trace_buffer_unlock_commit_nostack(buffer, event);
return 1; return 1;
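Note: the reservation size above is worth unpacking — multiplying by !!fregs collapses the argument area to zero when no register state was captured, so retaddr entries without args stay as small as before. As a worked example, assuming FTRACE_REGS_MAX_ARGS is 6 and 64-bit longs:

        /* size = sizeof(*entry) + (FTRACE_REGS_MAX_ARGS * !!fregs * sizeof(long))
         *
         * fregs != NULL:  sizeof(*entry) + 6 * 1 * 8 = sizeof(*entry) + 48
         * fregs == NULL:  sizeof(*entry) + 6 * 0 * 8 = sizeof(*entry)
         */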
@ -184,17 +199,21 @@ int __trace_graph_retaddr_entry(struct trace_array *tr,
int __trace_graph_retaddr_entry(struct trace_array *tr, int __trace_graph_retaddr_entry(struct trace_array *tr,
struct ftrace_graph_ent *trace, struct ftrace_graph_ent *trace,
unsigned int trace_ctx, unsigned int trace_ctx,
unsigned long retaddr) unsigned long retaddr,
struct ftrace_regs *fregs)
{ {
return 1; return 1;
} }
#endif #endif
static inline int ftrace_graph_ignore_irqs(void) static inline int ftrace_graph_ignore_irqs(struct trace_array *tr)
{ {
if (!ftrace_graph_skip_irqs || trace_recursion_test(TRACE_IRQ_BIT)) if (!ftrace_graph_skip_irqs || trace_recursion_test(TRACE_IRQ_BIT))
return 0; return 0;
if (tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_IRQS))
return 0;
return in_hardirq(); return in_hardirq();
} }
@ -232,22 +251,20 @@ static int graph_entry(struct ftrace_graph_ent *trace,
return 1; return 1;
} }
if (!ftrace_trace_task(tr))
return 0;
if (ftrace_graph_ignore_func(gops, trace)) if (ftrace_graph_ignore_func(gops, trace))
return 0; return 0;
if (ftrace_graph_ignore_irqs()) if (ftrace_graph_ignore_irqs(tr))
return 0; return 0;
if (fgraph_sleep_time) { if (fgraph_no_sleep_time &&
/* Only need to record the calltime */ !tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME)) {
ftimes = fgraph_reserve_data(gops->idx, sizeof(ftimes->calltime));
} else {
ftimes = fgraph_reserve_data(gops->idx, sizeof(*ftimes)); ftimes = fgraph_reserve_data(gops->idx, sizeof(*ftimes));
if (ftimes) if (ftimes)
ftimes->sleeptime = current->ftrace_sleeptime; ftimes->sleeptime = current->ftrace_sleeptime;
} else {
/* Only need to record the calltime */
ftimes = fgraph_reserve_data(gops->idx, sizeof(ftimes->calltime));
} }
if (!ftimes) if (!ftimes)
return 0; return 0;
@ -263,9 +280,10 @@ static int graph_entry(struct ftrace_graph_ent *trace,
trace_ctx = tracing_gen_ctx(); trace_ctx = tracing_gen_ctx();
if (IS_ENABLED(CONFIG_FUNCTION_GRAPH_RETADDR) && if (IS_ENABLED(CONFIG_FUNCTION_GRAPH_RETADDR) &&
tracer_flags_is_set(TRACE_GRAPH_PRINT_RETADDR)) { tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_RETADDR)) {
unsigned long retaddr = ftrace_graph_top_ret_addr(current); unsigned long retaddr = ftrace_graph_top_ret_addr(current);
ret = __trace_graph_retaddr_entry(tr, trace, trace_ctx, retaddr); ret = __trace_graph_retaddr_entry(tr, trace, trace_ctx,
retaddr, fregs);
} else { } else {
ret = __graph_entry(tr, trace, trace_ctx, fregs); ret = __graph_entry(tr, trace, trace_ctx, fregs);
} }
@ -333,11 +351,15 @@ void __trace_graph_return(struct trace_array *tr,
trace_buffer_unlock_commit_nostack(buffer, event); trace_buffer_unlock_commit_nostack(buffer, event);
} }
static void handle_nosleeptime(struct ftrace_graph_ret *trace, static void handle_nosleeptime(struct trace_array *tr,
struct ftrace_graph_ret *trace,
struct fgraph_times *ftimes, struct fgraph_times *ftimes,
int size) int size)
{ {
if (fgraph_sleep_time || size < sizeof(*ftimes)) if (size < sizeof(*ftimes))
return;
if (!fgraph_no_sleep_time || tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME))
return; return;
ftimes->calltime += current->ftrace_sleeptime - ftimes->sleeptime; ftimes->calltime += current->ftrace_sleeptime - ftimes->sleeptime;
@ -366,7 +388,7 @@ void trace_graph_return(struct ftrace_graph_ret *trace,
if (!ftimes) if (!ftimes)
return; return;
handle_nosleeptime(trace, ftimes, size); handle_nosleeptime(tr, trace, ftimes, size);
calltime = ftimes->calltime; calltime = ftimes->calltime;
@ -379,6 +401,7 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
struct ftrace_regs *fregs) struct ftrace_regs *fregs)
{ {
struct fgraph_times *ftimes; struct fgraph_times *ftimes;
struct trace_array *tr;
int size; int size;
ftrace_graph_addr_finish(gops, trace); ftrace_graph_addr_finish(gops, trace);
@ -392,7 +415,8 @@ static void trace_graph_thresh_return(struct ftrace_graph_ret *trace,
if (!ftimes) if (!ftimes)
return; return;
handle_nosleeptime(trace, ftimes, size); tr = gops->private;
handle_nosleeptime(tr, trace, ftimes, size);
if (tracing_thresh && if (tracing_thresh &&
(trace_clock_local() - ftimes->calltime < tracing_thresh)) (trace_clock_local() - ftimes->calltime < tracing_thresh))
@ -441,7 +465,7 @@ static int graph_trace_init(struct trace_array *tr)
{ {
int ret; int ret;
if (tracer_flags_is_set(TRACE_GRAPH_ARGS)) if (tracer_flags_is_set(tr, TRACE_GRAPH_ARGS))
tr->gops->entryfunc = trace_graph_entry_args; tr->gops->entryfunc = trace_graph_entry_args;
else else
tr->gops->entryfunc = trace_graph_entry; tr->gops->entryfunc = trace_graph_entry;
@ -451,6 +475,12 @@ static int graph_trace_init(struct trace_array *tr)
else else
tr->gops->retfunc = trace_graph_return; tr->gops->retfunc = trace_graph_return;
if (!tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_IRQS))
ftrace_graph_skip_irqs++;
if (!tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME))
fgraph_no_sleep_time++;
/* Make gops functions visible before we start tracing */ /* Make gops functions visible before we start tracing */
smp_mb(); smp_mb();
@ -468,10 +498,6 @@ static int ftrace_graph_trace_args(struct trace_array *tr, int set)
{ {
trace_func_graph_ent_t entry; trace_func_graph_ent_t entry;
/* Do nothing if the current tracer is not this tracer */
if (tr->current_trace != &graph_trace)
return 0;
if (set) if (set)
entry = trace_graph_entry_args; entry = trace_graph_entry_args;
else else
@ -492,6 +518,16 @@ static int ftrace_graph_trace_args(struct trace_array *tr, int set)
static void graph_trace_reset(struct trace_array *tr) static void graph_trace_reset(struct trace_array *tr)
{ {
if (!tracer_flags_is_set(tr, TRACE_GRAPH_PRINT_IRQS))
ftrace_graph_skip_irqs--;
if (WARN_ON_ONCE(ftrace_graph_skip_irqs < 0))
ftrace_graph_skip_irqs = 0;
if (!tracer_flags_is_set(tr, TRACE_GRAPH_SLEEP_TIME))
fgraph_no_sleep_time--;
if (WARN_ON_ONCE(fgraph_no_sleep_time < 0))
fgraph_no_sleep_time = 0;
tracing_stop_cmdline_record(); tracing_stop_cmdline_record();
unregister_ftrace_graph(tr->gops); unregister_ftrace_graph(tr->gops);
} }
@ -634,13 +670,9 @@ get_return_for_leaf(struct trace_iterator *iter,
* Save current and next entries for later reference * Save current and next entries for later reference
* if the output fails. * if the output fails.
*/ */
if (unlikely(curr->ent.type == TRACE_GRAPH_RETADDR_ENT)) { int size = min_t(int, sizeof(data->rent), iter->ent_size);
data->rent = *(struct fgraph_retaddr_ent_entry *)curr;
} else {
int size = min((int)sizeof(data->ent), (int)iter->ent_size);
memcpy(&data->ent, curr, size); memcpy(&data->rent, curr, size);
}
/* /*
* If the next event is not a return type, then * If the next event is not a return type, then
* we only care about what type it is. Otherwise we can * we only care about what type it is. Otherwise we can
@ -703,7 +735,7 @@ print_graph_irq(struct trace_iterator *iter, unsigned long addr,
addr >= (unsigned long)__irqentry_text_end) addr >= (unsigned long)__irqentry_text_end)
return; return;
if (tr->trace_flags & TRACE_ITER_CONTEXT_INFO) { if (tr->trace_flags & TRACE_ITER(CONTEXT_INFO)) {
/* Absolute time */ /* Absolute time */
if (flags & TRACE_GRAPH_PRINT_ABS_TIME) if (flags & TRACE_GRAPH_PRINT_ABS_TIME)
print_graph_abs_time(iter->ts, s); print_graph_abs_time(iter->ts, s);
@ -723,7 +755,7 @@ print_graph_irq(struct trace_iterator *iter, unsigned long addr,
} }
/* Latency format */ /* Latency format */
if (tr->trace_flags & TRACE_ITER_LATENCY_FMT) if (tr->trace_flags & TRACE_ITER(LATENCY_FMT))
print_graph_lat_fmt(s, ent); print_graph_lat_fmt(s, ent);
} }
@ -777,7 +809,7 @@ print_graph_duration(struct trace_array *tr, unsigned long long duration,
struct trace_seq *s, u32 flags) struct trace_seq *s, u32 flags)
{ {
if (!(flags & TRACE_GRAPH_PRINT_DURATION) || if (!(flags & TRACE_GRAPH_PRINT_DURATION) ||
!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO)) !(tr->trace_flags & TRACE_ITER(CONTEXT_INFO)))
return; return;
/* No real data, just filling the column with spaces */ /* No real data, just filling the column with spaces */
@ -818,7 +850,7 @@ static void print_graph_retaddr(struct trace_seq *s, struct fgraph_retaddr_ent_e
trace_seq_puts(s, " /*"); trace_seq_puts(s, " /*");
trace_seq_puts(s, " <-"); trace_seq_puts(s, " <-");
seq_print_ip_sym(s, entry->graph_ent.retaddr, trace_flags | TRACE_ITER_SYM_OFFSET); seq_print_ip_sym_offset(s, entry->graph_rent.retaddr, trace_flags);
if (comment) if (comment)
trace_seq_puts(s, " */"); trace_seq_puts(s, " */");
@ -964,7 +996,7 @@ print_graph_entry_leaf(struct trace_iterator *iter,
trace_seq_printf(s, "%ps", (void *)ret_func); trace_seq_printf(s, "%ps", (void *)ret_func);
if (args_size >= FTRACE_REGS_MAX_ARGS * sizeof(long)) { if (args_size >= FTRACE_REGS_MAX_ARGS * sizeof(long)) {
print_function_args(s, entry->args, ret_func); print_function_args(s, FGRAPH_ENTRY_ARGS(entry), ret_func);
trace_seq_putc(s, ';'); trace_seq_putc(s, ';');
} else } else
trace_seq_puts(s, "();"); trace_seq_puts(s, "();");
@ -1016,7 +1048,7 @@ print_graph_entry_nested(struct trace_iterator *iter,
args_size = iter->ent_size - offsetof(struct ftrace_graph_ent_entry, args); args_size = iter->ent_size - offsetof(struct ftrace_graph_ent_entry, args);
if (args_size >= FTRACE_REGS_MAX_ARGS * sizeof(long)) if (args_size >= FTRACE_REGS_MAX_ARGS * sizeof(long))
print_function_args(s, entry->args, func); print_function_args(s, FGRAPH_ENTRY_ARGS(entry), func);
else else
trace_seq_puts(s, "()"); trace_seq_puts(s, "()");
@ -1054,7 +1086,7 @@ print_graph_prologue(struct trace_iterator *iter, struct trace_seq *s,
/* Interrupt */ /* Interrupt */
print_graph_irq(iter, addr, type, cpu, ent->pid, flags); print_graph_irq(iter, addr, type, cpu, ent->pid, flags);
if (!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO)) if (!(tr->trace_flags & TRACE_ITER(CONTEXT_INFO)))
return; return;
/* Absolute time */ /* Absolute time */
@ -1076,7 +1108,7 @@ print_graph_prologue(struct trace_iterator *iter, struct trace_seq *s,
} }
/* Latency format */ /* Latency format */
if (tr->trace_flags & TRACE_ITER_LATENCY_FMT) if (tr->trace_flags & TRACE_ITER(LATENCY_FMT))
print_graph_lat_fmt(s, ent); print_graph_lat_fmt(s, ent);
return; return;
@ -1198,11 +1230,14 @@ print_graph_entry(struct ftrace_graph_ent_entry *field, struct trace_seq *s,
/* /*
* print_graph_entry() may consume the current event, * print_graph_entry() may consume the current event,
* thus @field may become invalid, so we need to save it. * thus @field may become invalid, so we need to save it.
* sizeof(struct ftrace_graph_ent_entry) is very small, * This function is shared by ftrace_graph_ent_entry and
* it can be safely saved at the stack. * fgraph_retaddr_ent_entry, the size of the latter one
* is larger, but it is very small and can be safely saved
* at the stack.
*/ */
struct ftrace_graph_ent_entry *entry; struct ftrace_graph_ent_entry *entry;
u8 save_buf[sizeof(*entry) + FTRACE_REGS_MAX_ARGS * sizeof(long)]; struct fgraph_retaddr_ent_entry *rentry;
u8 save_buf[sizeof(*rentry) + FTRACE_REGS_MAX_ARGS * sizeof(long)];
/* The ent_size is expected to be as big as the entry */ /* The ent_size is expected to be as big as the entry */
if (iter->ent_size > sizeof(save_buf)) if (iter->ent_size > sizeof(save_buf))
@ -1431,12 +1466,17 @@ print_graph_function_flags(struct trace_iterator *iter, u32 flags)
} }
#ifdef CONFIG_FUNCTION_GRAPH_RETADDR #ifdef CONFIG_FUNCTION_GRAPH_RETADDR
case TRACE_GRAPH_RETADDR_ENT: { case TRACE_GRAPH_RETADDR_ENT: {
struct fgraph_retaddr_ent_entry saved; /*
* ftrace_graph_ent_entry and fgraph_retaddr_ent_entry have
* similar functions and memory layouts. The only difference
* is that the latter one has an extra retaddr member, so
* they can share most of the logic.
*/
struct fgraph_retaddr_ent_entry *rfield; struct fgraph_retaddr_ent_entry *rfield;
trace_assign_type(rfield, entry); trace_assign_type(rfield, entry);
saved = *rfield; return print_graph_entry((struct ftrace_graph_ent_entry *)rfield,
return print_graph_entry((struct ftrace_graph_ent_entry *)&saved, s, iter, flags); s, iter, flags);
} }
#endif #endif
case TRACE_GRAPH_RET: { case TRACE_GRAPH_RET: {
@ -1459,7 +1499,8 @@ print_graph_function_flags(struct trace_iterator *iter, u32 flags)
static enum print_line_t static enum print_line_t
print_graph_function(struct trace_iterator *iter) print_graph_function(struct trace_iterator *iter)
{ {
return print_graph_function_flags(iter, tracer_flags.val); struct trace_array *tr = iter->tr;
return print_graph_function_flags(iter, tr->current_trace_flags->val);
} }
static enum print_line_t static enum print_line_t
@ -1495,7 +1536,7 @@ static void print_lat_header(struct seq_file *s, u32 flags)
static void __print_graph_headers_flags(struct trace_array *tr, static void __print_graph_headers_flags(struct trace_array *tr,
struct seq_file *s, u32 flags) struct seq_file *s, u32 flags)
{ {
int lat = tr->trace_flags & TRACE_ITER_LATENCY_FMT; int lat = tr->trace_flags & TRACE_ITER(LATENCY_FMT);
if (lat) if (lat)
print_lat_header(s, flags); print_lat_header(s, flags);
@ -1535,7 +1576,10 @@ static void __print_graph_headers_flags(struct trace_array *tr,
static void print_graph_headers(struct seq_file *s) static void print_graph_headers(struct seq_file *s)
{ {
print_graph_headers_flags(s, tracer_flags.val); struct trace_iterator *iter = s->private;
struct trace_array *tr = iter->tr;
print_graph_headers_flags(s, tr->current_trace_flags->val);
} }
void print_graph_headers_flags(struct seq_file *s, u32 flags) void print_graph_headers_flags(struct seq_file *s, u32 flags)
@ -1543,10 +1587,10 @@ void print_graph_headers_flags(struct seq_file *s, u32 flags)
struct trace_iterator *iter = s->private; struct trace_iterator *iter = s->private;
struct trace_array *tr = iter->tr; struct trace_array *tr = iter->tr;
if (!(tr->trace_flags & TRACE_ITER_CONTEXT_INFO)) if (!(tr->trace_flags & TRACE_ITER(CONTEXT_INFO)))
return; return;
if (tr->trace_flags & TRACE_ITER_LATENCY_FMT) { if (tr->trace_flags & TRACE_ITER(LATENCY_FMT)) {
/* print nothing if the buffers are empty */ /* print nothing if the buffers are empty */
if (trace_empty(iter)) if (trace_empty(iter))
return; return;
@ -1613,17 +1657,56 @@ void graph_trace_close(struct trace_iterator *iter)
static int static int
func_graph_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set) func_graph_set_flag(struct trace_array *tr, u32 old_flags, u32 bit, int set)
{ {
if (bit == TRACE_GRAPH_PRINT_IRQS) /*
ftrace_graph_skip_irqs = !set; * The function profiler gets updated even if function graph
* isn't the current tracer. Handle it separately.
*/
#ifdef CONFIG_FUNCTION_PROFILER
if (bit == TRACE_GRAPH_SLEEP_TIME && (tr->flags & TRACE_ARRAY_FL_GLOBAL) &&
!!set == fprofile_no_sleep_time) {
if (set) {
fgraph_no_sleep_time--;
if (WARN_ON_ONCE(fgraph_no_sleep_time < 0))
fgraph_no_sleep_time = 0;
fprofile_no_sleep_time = false;
} else {
fgraph_no_sleep_time++;
fprofile_no_sleep_time = true;
}
}
#endif
if (bit == TRACE_GRAPH_SLEEP_TIME) /* Do nothing if the current tracer is not this tracer */
ftrace_graph_sleep_time_control(set); if (tr->current_trace != &graph_trace)
return 0;
if (bit == TRACE_GRAPH_GRAPH_TIME) /* Do nothing if already set. */
ftrace_graph_graph_time_control(set); if (!!set == !!(tr->current_trace_flags->val & bit))
return 0;
if (bit == TRACE_GRAPH_ARGS) switch (bit) {
case TRACE_GRAPH_SLEEP_TIME:
if (set) {
fgraph_no_sleep_time--;
if (WARN_ON_ONCE(fgraph_no_sleep_time < 0))
fgraph_no_sleep_time = 0;
} else {
fgraph_no_sleep_time++;
}
break;
case TRACE_GRAPH_PRINT_IRQS:
if (set)
ftrace_graph_skip_irqs--;
else
ftrace_graph_skip_irqs++;
if (WARN_ON_ONCE(ftrace_graph_skip_irqs < 0))
ftrace_graph_skip_irqs = 0;
break;
case TRACE_GRAPH_ARGS:
return ftrace_graph_trace_args(tr, set); return ftrace_graph_trace_args(tr, set);
}
return 0; return 0;
} }
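Note: with options now per instance, the globals ftrace_graph_skip_irqs and fgraph_no_sleep_time act as reference counts — every instance running with the option cleared holds one reference, taken in graph_trace_init, dropped in graph_trace_reset, and adjusted here on toggle. A sketch of the invariant being maintained (graph_opt_refcount is an illustrative helper, not in the patch):

        /* Keep a global counter in step with one instance's option bit. */
        static void graph_opt_refcount(int *counter, bool now_set)
        {
                if (now_set) {
                        (*counter)--;   /* option re-enabled: drop our reference */
                        if (WARN_ON_ONCE(*counter < 0))
                                *counter = 0;
                } else {
                        (*counter)++;   /* option disabled: take a reference */
                }
        }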
@ -1660,7 +1743,7 @@ static struct tracer graph_trace __tracer_data = {
.reset = graph_trace_reset, .reset = graph_trace_reset,
.print_line = print_graph_function, .print_line = print_graph_function,
.print_header = print_graph_headers, .print_header = print_graph_headers,
.flags = &tracer_flags, .default_flags = &tracer_flags,
.set_flag = func_graph_set_flag, .set_flag = func_graph_set_flag,
.allow_instances = true, .allow_instances = true,
#ifdef CONFIG_FTRACE_SELFTEST #ifdef CONFIG_FTRACE_SELFTEST


@ -63,7 +63,7 @@ irq_trace(void)
#ifdef CONFIG_FUNCTION_GRAPH_TRACER #ifdef CONFIG_FUNCTION_GRAPH_TRACER
static int irqsoff_display_graph(struct trace_array *tr, int set); static int irqsoff_display_graph(struct trace_array *tr, int set);
# define is_graph(tr) ((tr)->trace_flags & TRACE_ITER_DISPLAY_GRAPH) # define is_graph(tr) ((tr)->trace_flags & TRACE_ITER(DISPLAY_GRAPH))
#else #else
static inline int irqsoff_display_graph(struct trace_array *tr, int set) static inline int irqsoff_display_graph(struct trace_array *tr, int set)
{ {
@ -485,8 +485,8 @@ static int register_irqsoff_function(struct trace_array *tr, int graph, int set)
{ {
int ret; int ret;
/* 'set' is set if TRACE_ITER_FUNCTION is about to be set */ /* 'set' is set if TRACE_ITER(FUNCTION) is about to be set */
if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER_FUNCTION))) if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER(FUNCTION))))
return 0; return 0;
if (graph) if (graph)
@ -515,7 +515,7 @@ static void unregister_irqsoff_function(struct trace_array *tr, int graph)
static int irqsoff_function_set(struct trace_array *tr, u32 mask, int set) static int irqsoff_function_set(struct trace_array *tr, u32 mask, int set)
{ {
if (!(mask & TRACE_ITER_FUNCTION)) if (!(mask & TRACE_ITER(FUNCTION)))
return 0; return 0;
if (set) if (set)
@ -536,7 +536,7 @@ static inline int irqsoff_function_set(struct trace_array *tr, u32 mask, int set
} }
#endif /* CONFIG_FUNCTION_TRACER */ #endif /* CONFIG_FUNCTION_TRACER */
static int irqsoff_flag_changed(struct trace_array *tr, u32 mask, int set) static int irqsoff_flag_changed(struct trace_array *tr, u64 mask, int set)
{ {
struct tracer *tracer = tr->current_trace; struct tracer *tracer = tr->current_trace;
@ -544,7 +544,7 @@ static int irqsoff_flag_changed(struct trace_array *tr, u32 mask, int set)
return 0; return 0;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (mask & TRACE_ITER_DISPLAY_GRAPH) if (mask & TRACE_ITER(DISPLAY_GRAPH))
return irqsoff_display_graph(tr, set); return irqsoff_display_graph(tr, set);
#endif #endif
@ -582,10 +582,10 @@ static int __irqsoff_tracer_init(struct trace_array *tr)
save_flags = tr->trace_flags; save_flags = tr->trace_flags;
/* non overwrite screws up the latency tracers */ /* non overwrite screws up the latency tracers */
set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1); set_tracer_flag(tr, TRACE_ITER(OVERWRITE), 1);
set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1); set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), 1);
/* without pause, we will produce garbage if another latency occurs */ /* without pause, we will produce garbage if another latency occurs */
set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, 1); set_tracer_flag(tr, TRACE_ITER(PAUSE_ON_TRACE), 1);
tr->max_latency = 0; tr->max_latency = 0;
irqsoff_trace = tr; irqsoff_trace = tr;
@ -605,15 +605,15 @@ static int __irqsoff_tracer_init(struct trace_array *tr)
static void __irqsoff_tracer_reset(struct trace_array *tr) static void __irqsoff_tracer_reset(struct trace_array *tr)
{ {
int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT; int lat_flag = save_flags & TRACE_ITER(LATENCY_FMT);
int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE; int overwrite_flag = save_flags & TRACE_ITER(OVERWRITE);
int pause_flag = save_flags & TRACE_ITER_PAUSE_ON_TRACE; int pause_flag = save_flags & TRACE_ITER(PAUSE_ON_TRACE);
stop_irqsoff_tracer(tr, is_graph(tr)); stop_irqsoff_tracer(tr, is_graph(tr));
set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag); set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), lat_flag);
set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag); set_tracer_flag(tr, TRACE_ITER(OVERWRITE), overwrite_flag);
set_tracer_flag(tr, TRACE_ITER_PAUSE_ON_TRACE, pause_flag); set_tracer_flag(tr, TRACE_ITER(PAUSE_ON_TRACE), pause_flag);
ftrace_reset_array_ops(tr); ftrace_reset_array_ops(tr);
irqsoff_busy = false; irqsoff_busy = false;


@ -31,7 +31,7 @@ static void ftrace_dump_buf(int skip_entries, long cpu_file)
old_userobj = tr->trace_flags; old_userobj = tr->trace_flags;
/* don't look at user memory in panic mode */ /* don't look at user memory in panic mode */
tr->trace_flags &= ~TRACE_ITER_SYM_USEROBJ; tr->trace_flags &= ~TRACE_ITER(SYM_USEROBJ);
kdb_printf("Dumping ftrace buffer:\n"); kdb_printf("Dumping ftrace buffer:\n");
if (skip_entries) if (skip_entries)


@ -1584,7 +1584,7 @@ print_kprobe_event(struct trace_iterator *iter, int flags,
trace_seq_printf(s, "%s: (", trace_probe_name(tp)); trace_seq_printf(s, "%s: (", trace_probe_name(tp));
if (!seq_print_ip_sym(s, field->ip, flags | TRACE_ITER_SYM_OFFSET)) if (!seq_print_ip_sym_offset(s, field->ip, flags))
goto out; goto out;
trace_seq_putc(s, ')'); trace_seq_putc(s, ')');
@ -1614,12 +1614,12 @@ print_kretprobe_event(struct trace_iterator *iter, int flags,
trace_seq_printf(s, "%s: (", trace_probe_name(tp)); trace_seq_printf(s, "%s: (", trace_probe_name(tp));
if (!seq_print_ip_sym(s, field->ret_ip, flags | TRACE_ITER_SYM_OFFSET)) if (!seq_print_ip_sym_offset(s, field->ret_ip, flags))
goto out; goto out;
trace_seq_puts(s, " <- "); trace_seq_puts(s, " <- ");
if (!seq_print_ip_sym(s, field->func, flags & ~TRACE_ITER_SYM_OFFSET)) if (!seq_print_ip_sym_no_offset(s, field->func, flags))
goto out; goto out;
trace_seq_putc(s, ')'); trace_seq_putc(s, ')');


@ -420,7 +420,7 @@ static int seq_print_user_ip(struct trace_seq *s, struct mm_struct *mm,
} }
mmap_read_unlock(mm); mmap_read_unlock(mm);
} }
if (ret && ((sym_flags & TRACE_ITER_SYM_ADDR) || !file)) if (ret && ((sym_flags & TRACE_ITER(SYM_ADDR)) || !file))
trace_seq_printf(s, " <" IP_FMT ">", ip); trace_seq_printf(s, " <" IP_FMT ">", ip);
return !trace_seq_has_overflowed(s); return !trace_seq_has_overflowed(s);
} }
@ -433,9 +433,9 @@ seq_print_ip_sym(struct trace_seq *s, unsigned long ip, unsigned long sym_flags)
goto out; goto out;
} }
trace_seq_print_sym(s, ip, sym_flags & TRACE_ITER_SYM_OFFSET); trace_seq_print_sym(s, ip, sym_flags & TRACE_ITER(SYM_OFFSET));
if (sym_flags & TRACE_ITER_SYM_ADDR) if (sym_flags & TRACE_ITER(SYM_ADDR))
trace_seq_printf(s, " <" IP_FMT ">", ip); trace_seq_printf(s, " <" IP_FMT ">", ip);
out: out:
@ -569,7 +569,7 @@ static int
lat_print_timestamp(struct trace_iterator *iter, u64 next_ts) lat_print_timestamp(struct trace_iterator *iter, u64 next_ts)
{ {
struct trace_array *tr = iter->tr; struct trace_array *tr = iter->tr;
unsigned long verbose = tr->trace_flags & TRACE_ITER_VERBOSE; unsigned long verbose = tr->trace_flags & TRACE_ITER(VERBOSE);
unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS; unsigned long in_ns = iter->iter_flags & TRACE_FILE_TIME_IN_NS;
unsigned long long abs_ts = iter->ts - iter->array_buffer->time_start; unsigned long long abs_ts = iter->ts - iter->array_buffer->time_start;
unsigned long long rel_ts = next_ts - iter->ts; unsigned long long rel_ts = next_ts - iter->ts;
@ -636,7 +636,7 @@ int trace_print_context(struct trace_iterator *iter)
trace_seq_printf(s, "%16s-%-7d ", comm, entry->pid); trace_seq_printf(s, "%16s-%-7d ", comm, entry->pid);
if (tr->trace_flags & TRACE_ITER_RECORD_TGID) { if (tr->trace_flags & TRACE_ITER(RECORD_TGID)) {
unsigned int tgid = trace_find_tgid(entry->pid); unsigned int tgid = trace_find_tgid(entry->pid);
if (!tgid) if (!tgid)
@ -647,7 +647,7 @@ int trace_print_context(struct trace_iterator *iter)
trace_seq_printf(s, "[%03d] ", iter->cpu); trace_seq_printf(s, "[%03d] ", iter->cpu);
if (tr->trace_flags & TRACE_ITER_IRQ_INFO) if (tr->trace_flags & TRACE_ITER(IRQ_INFO))
trace_print_lat_fmt(s, entry); trace_print_lat_fmt(s, entry);
trace_print_time(s, iter, iter->ts); trace_print_time(s, iter, iter->ts);
@ -661,7 +661,7 @@ int trace_print_lat_context(struct trace_iterator *iter)
struct trace_entry *entry, *next_entry; struct trace_entry *entry, *next_entry;
struct trace_array *tr = iter->tr; struct trace_array *tr = iter->tr;
struct trace_seq *s = &iter->seq; struct trace_seq *s = &iter->seq;
unsigned long verbose = (tr->trace_flags & TRACE_ITER_VERBOSE); unsigned long verbose = (tr->trace_flags & TRACE_ITER(VERBOSE));
u64 next_ts; u64 next_ts;
next_entry = trace_find_next_entry(iter, NULL, &next_ts); next_entry = trace_find_next_entry(iter, NULL, &next_ts);
@ -950,7 +950,9 @@ static void print_fields(struct trace_iterator *iter, struct trace_event_call *c
int offset; int offset;
int len; int len;
int ret; int ret;
int i;
void *pos; void *pos;
char *str;
list_for_each_entry_reverse(field, head, link) { list_for_each_entry_reverse(field, head, link) {
trace_seq_printf(&iter->seq, " %s=", field->name); trace_seq_printf(&iter->seq, " %s=", field->name);
@ -977,8 +979,29 @@ static void print_fields(struct trace_iterator *iter, struct trace_event_call *c
trace_seq_puts(&iter->seq, "<OVERFLOW>"); trace_seq_puts(&iter->seq, "<OVERFLOW>");
break; break;
} }
pos = (void *)iter->ent + offset; str = (char *)iter->ent + offset;
trace_seq_printf(&iter->seq, "%.*s", len, (char *)pos); /* Check if there are any non-printable strings */
for (i = 0; i < len; i++) {
if (str[i] && !(isascii(str[i]) && isprint(str[i])))
break;
}
if (i < len) {
for (i = 0; i < len; i++) {
if (isascii(str[i]) && isprint(str[i]))
trace_seq_putc(&iter->seq, str[i]);
else
trace_seq_putc(&iter->seq, '.');
}
trace_seq_puts(&iter->seq, " (");
for (i = 0; i < len; i++) {
if (i)
trace_seq_putc(&iter->seq, ':');
trace_seq_printf(&iter->seq, "%02x", str[i]);
}
trace_seq_putc(&iter->seq, ')');
} else {
trace_seq_printf(&iter->seq, "%.*s", len, str);
}
break; break;
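Note on the new escaping above: a field whose bytes are all printable (or padding NULs — the scan skips str[i] == 0) still prints verbatim, while a single unprintable byte switches the whole field to dotted ASCII plus a hex dump. For example, a 4-byte field containing { 0x61, 0x62, 0x00, 0x7f } would render as:

        field=ab.. (61:62:00:7f)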
case FILTER_PTR_STRING: case FILTER_PTR_STRING:
if (!iter->fmt_size) if (!iter->fmt_size)
@ -1127,7 +1150,7 @@ static void print_fn_trace(struct trace_seq *s, unsigned long ip,
if (args) if (args)
print_function_args(s, args, ip); print_function_args(s, args, ip);
if ((flags & TRACE_ITER_PRINT_PARENT) && parent_ip) { if ((flags & TRACE_ITER(PRINT_PARENT)) && parent_ip) {
trace_seq_puts(s, " <-"); trace_seq_puts(s, " <-");
seq_print_ip_sym(s, parent_ip, flags); seq_print_ip_sym(s, parent_ip, flags);
} }
@ -1417,7 +1440,7 @@ static enum print_line_t trace_user_stack_print(struct trace_iterator *iter,
trace_seq_puts(s, "<user stack trace>\n"); trace_seq_puts(s, "<user stack trace>\n");
if (tr->trace_flags & TRACE_ITER_SYM_USEROBJ) { if (tr->trace_flags & TRACE_ITER(SYM_USEROBJ)) {
struct task_struct *task; struct task_struct *task;
/* /*
* we do the lookup on the thread group leader, * we do the lookup on the thread group leader,


@ -16,6 +16,17 @@ extern int
seq_print_ip_sym(struct trace_seq *s, unsigned long ip, seq_print_ip_sym(struct trace_seq *s, unsigned long ip,
unsigned long sym_flags); unsigned long sym_flags);
static inline int seq_print_ip_sym_offset(struct trace_seq *s, unsigned long ip,
unsigned long sym_flags)
{
return seq_print_ip_sym(s, ip, sym_flags | TRACE_ITER(SYM_OFFSET));
}
static inline int seq_print_ip_sym_no_offset(struct trace_seq *s, unsigned long ip,
unsigned long sym_flags)
{
return seq_print_ip_sym(s, ip, sym_flags & ~TRACE_ITER(SYM_OFFSET));
}
extern void trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset); extern void trace_seq_print_sym(struct trace_seq *s, unsigned long address, bool offset);
extern int trace_print_context(struct trace_iterator *iter); extern int trace_print_context(struct trace_iterator *iter);
extern int trace_print_lat_context(struct trace_iterator *iter); extern int trace_print_lat_context(struct trace_iterator *iter);


@ -156,7 +156,7 @@ static const struct fetch_type *find_fetch_type(const char *type, unsigned long
static struct trace_probe_log trace_probe_log; static struct trace_probe_log trace_probe_log;
extern struct mutex dyn_event_ops_mutex; extern struct mutex dyn_event_ops_mutex;
void trace_probe_log_init(const char *subsystem, int argc, const char **argv) const char *trace_probe_log_init(const char *subsystem, int argc, const char **argv)
{ {
lockdep_assert_held(&dyn_event_ops_mutex); lockdep_assert_held(&dyn_event_ops_mutex);
@ -164,6 +164,7 @@ void trace_probe_log_init(const char *subsystem, int argc, const char **argv)
trace_probe_log.argc = argc; trace_probe_log.argc = argc;
trace_probe_log.argv = argv; trace_probe_log.argv = argv;
trace_probe_log.index = 0; trace_probe_log.index = 0;
return subsystem;
} }
void trace_probe_log_clear(void) void trace_probe_log_clear(void)
@ -214,7 +215,7 @@ void __trace_probe_log_err(int offset, int err_type)
p = command; p = command;
for (i = 0; i < trace_probe_log.argc; i++) { for (i = 0; i < trace_probe_log.argc; i++) {
len = strlen(trace_probe_log.argv[i]); len = strlen(trace_probe_log.argv[i]);
strcpy(p, trace_probe_log.argv[i]); memcpy(p, trace_probe_log.argv[i], len);
p[len] = ' '; p[len] = ' ';
p += len + 1; p += len + 1;
} }


@ -578,11 +578,13 @@ struct trace_probe_log {
int index; int index;
}; };
void trace_probe_log_init(const char *subsystem, int argc, const char **argv); const char *trace_probe_log_init(const char *subsystem, int argc, const char **argv);
void trace_probe_log_set_index(int index); void trace_probe_log_set_index(int index);
void trace_probe_log_clear(void); void trace_probe_log_clear(void);
void __trace_probe_log_err(int offset, int err); void __trace_probe_log_err(int offset, int err);
DEFINE_FREE(trace_probe_log_clear, const char *, if (_T) trace_probe_log_clear())
#define trace_probe_log_err(offs, err) \ #define trace_probe_log_err(offs, err) \
__trace_probe_log_err(offs, TP_ERR_##err) __trace_probe_log_err(offs, TP_ERR_##err)
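Note: trace_probe_log_init() now returns the subsystem name purely to feed the new DEFINE_FREE() guard — a caller binds the return value to a __free(trace_probe_log_clear) variable and the error log is cleared automatically on every exit path. The usage, as adopted by __trace_uprobe_create later in this diff:

        /* The log now lives exactly as long as this scope: */
        const char *trlog __free(trace_probe_log_clear)
                = trace_probe_log_init("trace_uprobe", argc, argv);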


@ -41,7 +41,7 @@ static void stop_func_tracer(struct trace_array *tr, int graph);
static int save_flags; static int save_flags;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER #ifdef CONFIG_FUNCTION_GRAPH_TRACER
# define is_graph(tr) ((tr)->trace_flags & TRACE_ITER_DISPLAY_GRAPH) # define is_graph(tr) ((tr)->trace_flags & TRACE_ITER(DISPLAY_GRAPH))
#else #else
# define is_graph(tr) false # define is_graph(tr) false
#endif #endif
@ -247,8 +247,8 @@ static int register_wakeup_function(struct trace_array *tr, int graph, int set)
{ {
int ret; int ret;
/* 'set' is set if TRACE_ITER_FUNCTION is about to be set */ /* 'set' is set if TRACE_ITER(FUNCTION) is about to be set */
if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER_FUNCTION))) if (function_enabled || (!set && !(tr->trace_flags & TRACE_ITER(FUNCTION))))
return 0; return 0;
if (graph) if (graph)
@ -277,7 +277,7 @@ static void unregister_wakeup_function(struct trace_array *tr, int graph)
static int wakeup_function_set(struct trace_array *tr, u32 mask, int set) static int wakeup_function_set(struct trace_array *tr, u32 mask, int set)
{ {
if (!(mask & TRACE_ITER_FUNCTION)) if (!(mask & TRACE_ITER(FUNCTION)))
return 0; return 0;
if (set) if (set)
@ -324,7 +324,7 @@ __trace_function(struct trace_array *tr,
trace_function(tr, ip, parent_ip, trace_ctx, NULL); trace_function(tr, ip, parent_ip, trace_ctx, NULL);
} }
static int wakeup_flag_changed(struct trace_array *tr, u32 mask, int set) static int wakeup_flag_changed(struct trace_array *tr, u64 mask, int set)
{ {
struct tracer *tracer = tr->current_trace; struct tracer *tracer = tr->current_trace;
@ -332,7 +332,7 @@ static int wakeup_flag_changed(struct trace_array *tr, u32 mask, int set)
return 0; return 0;
#ifdef CONFIG_FUNCTION_GRAPH_TRACER #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (mask & TRACE_ITER_DISPLAY_GRAPH) if (mask & TRACE_ITER(DISPLAY_GRAPH))
return wakeup_display_graph(tr, set); return wakeup_display_graph(tr, set);
#endif #endif
@ -681,8 +681,8 @@ static int __wakeup_tracer_init(struct trace_array *tr)
save_flags = tr->trace_flags; save_flags = tr->trace_flags;
/* non overwrite screws up the latency tracers */ /* non overwrite screws up the latency tracers */
set_tracer_flag(tr, TRACE_ITER_OVERWRITE, 1); set_tracer_flag(tr, TRACE_ITER(OVERWRITE), 1);
set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, 1); set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), 1);
tr->max_latency = 0; tr->max_latency = 0;
wakeup_trace = tr; wakeup_trace = tr;
@ -725,15 +725,15 @@ static int wakeup_dl_tracer_init(struct trace_array *tr)
static void wakeup_tracer_reset(struct trace_array *tr) static void wakeup_tracer_reset(struct trace_array *tr)
{ {
int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT; int lat_flag = save_flags & TRACE_ITER(LATENCY_FMT);
int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE; int overwrite_flag = save_flags & TRACE_ITER(OVERWRITE);
stop_wakeup_tracer(tr); stop_wakeup_tracer(tr);
/* make sure we put back any tasks we are tracing */ /* make sure we put back any tasks we are tracing */
wakeup_reset(tr); wakeup_reset(tr);
set_tracer_flag(tr, TRACE_ITER_LATENCY_FMT, lat_flag); set_tracer_flag(tr, TRACE_ITER(LATENCY_FMT), lat_flag);
set_tracer_flag(tr, TRACE_ITER_OVERWRITE, overwrite_flag); set_tracer_flag(tr, TRACE_ITER(OVERWRITE), overwrite_flag);
ftrace_reset_array_ops(tr); ftrace_reset_array_ops(tr);
wakeup_busy = false; wakeup_busy = false;
} }

File diff suppressed because it is too large


@ -533,21 +533,26 @@ static int register_trace_uprobe(struct trace_uprobe *tu)
return ret; return ret;
} }
DEFINE_FREE(free_trace_uprobe, struct trace_uprobe *, if (_T) free_trace_uprobe(_T))
/* /*
* Argument syntax: * Argument syntax:
* - Add uprobe: p|r[:[GRP/][EVENT]] PATH:OFFSET[%return][(REF)] [FETCHARGS] * - Add uprobe: p|r[:[GRP/][EVENT]] PATH:OFFSET[%return][(REF)] [FETCHARGS]
*/ */
static int __trace_uprobe_create(int argc, const char **argv) static int __trace_uprobe_create(int argc, const char **argv)
{ {
struct traceprobe_parse_context *ctx __free(traceprobe_parse_context) = NULL;
struct trace_uprobe *tu __free(free_trace_uprobe) = NULL;
const char *trlog __free(trace_probe_log_clear) = NULL;
const char *event = NULL, *group = UPROBE_EVENT_SYSTEM; const char *event = NULL, *group = UPROBE_EVENT_SYSTEM;
char *arg, *filename, *rctr, *rctr_end, *tmp; struct path path __free(path_put) = {};
unsigned long offset, ref_ctr_offset; unsigned long offset, ref_ctr_offset;
char *filename __free(kfree) = NULL;
char *arg, *rctr, *rctr_end, *tmp;
char *gbuf __free(kfree) = NULL; char *gbuf __free(kfree) = NULL;
char *buf __free(kfree) = NULL; char *buf __free(kfree) = NULL;
enum probe_print_type ptype; enum probe_print_type ptype;
struct trace_uprobe *tu;
bool is_return = false; bool is_return = false;
struct path path;
int i, ret; int i, ret;
ref_ctr_offset = 0; ref_ctr_offset = 0;
@ -565,7 +570,7 @@ static int __trace_uprobe_create(int argc, const char **argv)
if (argc < 2) if (argc < 2)
return -ECANCELED; return -ECANCELED;
trace_probe_log_init("trace_uprobe", argc, argv); trlog = trace_probe_log_init("trace_uprobe", argc, argv);
if (argc - 2 > MAX_TRACE_ARGS) { if (argc - 2 > MAX_TRACE_ARGS) {
trace_probe_log_set_index(2); trace_probe_log_set_index(2);
@ -585,10 +590,8 @@ static int __trace_uprobe_create(int argc, const char **argv)
/* Find the last occurrence, in case the path contains ':' too. */ /* Find the last occurrence, in case the path contains ':' too. */
arg = strrchr(filename, ':'); arg = strrchr(filename, ':');
if (!arg || !isdigit(arg[1])) { if (!arg || !isdigit(arg[1]))
kfree(filename);
return -ECANCELED; return -ECANCELED;
}
trace_probe_log_set_index(1); /* filename is the 2nd argument */ trace_probe_log_set_index(1); /* filename is the 2nd argument */
@ -596,14 +599,11 @@ static int __trace_uprobe_create(int argc, const char **argv)
ret = kern_path(filename, LOOKUP_FOLLOW, &path); ret = kern_path(filename, LOOKUP_FOLLOW, &path);
if (ret) { if (ret) {
trace_probe_log_err(0, FILE_NOT_FOUND); trace_probe_log_err(0, FILE_NOT_FOUND);
kfree(filename);
trace_probe_log_clear();
return ret; return ret;
} }
if (!d_is_reg(path.dentry)) { if (!d_is_reg(path.dentry)) {
trace_probe_log_err(0, NO_REGULAR_FILE); trace_probe_log_err(0, NO_REGULAR_FILE);
ret = -EINVAL; return -EINVAL;
goto fail_address_parse;
} }
/* Parse reference counter offset if specified. */ /* Parse reference counter offset if specified. */
@ -611,16 +611,14 @@ static int __trace_uprobe_create(int argc, const char **argv)
if (rctr) { if (rctr) {
rctr_end = strchr(rctr, ')'); rctr_end = strchr(rctr, ')');
if (!rctr_end) { if (!rctr_end) {
ret = -EINVAL;
rctr_end = rctr + strlen(rctr); rctr_end = rctr + strlen(rctr);
trace_probe_log_err(rctr_end - filename, trace_probe_log_err(rctr_end - filename,
REFCNT_OPEN_BRACE); REFCNT_OPEN_BRACE);
goto fail_address_parse; return -EINVAL;
} else if (rctr_end[1] != '\0') { } else if (rctr_end[1] != '\0') {
ret = -EINVAL;
trace_probe_log_err(rctr_end + 1 - filename, trace_probe_log_err(rctr_end + 1 - filename,
BAD_REFCNT_SUFFIX); BAD_REFCNT_SUFFIX);
goto fail_address_parse; return -EINVAL;
} }
*rctr++ = '\0'; *rctr++ = '\0';
@ -628,7 +626,7 @@ static int __trace_uprobe_create(int argc, const char **argv)
ret = kstrtoul(rctr, 0, &ref_ctr_offset); ret = kstrtoul(rctr, 0, &ref_ctr_offset);
if (ret) { if (ret) {
trace_probe_log_err(rctr - filename, BAD_REFCNT); trace_probe_log_err(rctr - filename, BAD_REFCNT);
goto fail_address_parse; return ret;
} }
} }
@ -640,8 +638,7 @@ static int __trace_uprobe_create(int argc, const char **argv)
is_return = true; is_return = true;
} else { } else {
trace_probe_log_err(tmp - filename, BAD_ADDR_SUFFIX); trace_probe_log_err(tmp - filename, BAD_ADDR_SUFFIX);
ret = -EINVAL; return -EINVAL;
goto fail_address_parse;
} }
} }
@ -649,7 +646,7 @@ static int __trace_uprobe_create(int argc, const char **argv)
ret = kstrtoul(arg, 0, &offset); ret = kstrtoul(arg, 0, &offset);
if (ret) { if (ret) {
trace_probe_log_err(arg - filename, BAD_UPROBE_OFFS); trace_probe_log_err(arg - filename, BAD_UPROBE_OFFS);
goto fail_address_parse; return ret;
} }
/* setup a probe */ /* setup a probe */
@ -657,12 +654,12 @@ static int __trace_uprobe_create(int argc, const char **argv)
if (event) { if (event) {
gbuf = kmalloc(MAX_EVENT_NAME_LEN, GFP_KERNEL); gbuf = kmalloc(MAX_EVENT_NAME_LEN, GFP_KERNEL);
if (!gbuf) if (!gbuf)
goto fail_mem; return -ENOMEM;
ret = traceprobe_parse_event_name(&event, &group, gbuf, ret = traceprobe_parse_event_name(&event, &group, gbuf,
event - argv[0]); event - argv[0]);
if (ret) if (ret)
goto fail_address_parse; return ret;
} }
if (!event) { if (!event) {
@ -671,7 +668,7 @@ static int __trace_uprobe_create(int argc, const char **argv)
tail = kstrdup(kbasename(filename), GFP_KERNEL); tail = kstrdup(kbasename(filename), GFP_KERNEL);
if (!tail) if (!tail)
goto fail_mem; return -ENOMEM;
ptr = strpbrk(tail, ".-_"); ptr = strpbrk(tail, ".-_");
if (ptr) if (ptr)
@ -679,7 +676,7 @@ static int __trace_uprobe_create(int argc, const char **argv)
buf = kmalloc(MAX_EVENT_NAME_LEN, GFP_KERNEL); buf = kmalloc(MAX_EVENT_NAME_LEN, GFP_KERNEL);
if (!buf) if (!buf)
goto fail_mem; return -ENOMEM;
snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p', tail, offset); snprintf(buf, MAX_EVENT_NAME_LEN, "%c_%s_0x%lx", 'p', tail, offset);
event = buf; event = buf;
kfree(tail); kfree(tail);
@ -693,51 +690,36 @@ static int __trace_uprobe_create(int argc, const char **argv)
ret = PTR_ERR(tu); ret = PTR_ERR(tu);
/* This must return -ENOMEM otherwise there is a bug */ /* This must return -ENOMEM otherwise there is a bug */
WARN_ON_ONCE(ret != -ENOMEM); WARN_ON_ONCE(ret != -ENOMEM);
goto fail_address_parse; return ret;
} }
tu->offset = offset; tu->offset = offset;
tu->ref_ctr_offset = ref_ctr_offset; tu->ref_ctr_offset = ref_ctr_offset;
tu->path = path; tu->path = path;
tu->filename = filename; /* Clear @path so that it will not be freed by path_put() */
memset(&path, 0, sizeof(path));
tu->filename = no_free_ptr(filename);
ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx)
return -ENOMEM;
ctx->flags = (is_return ? TPARG_FL_RETURN : 0) | TPARG_FL_USER;
/* parse arguments */ /* parse arguments */
for (i = 0; i < argc; i++) { for (i = 0; i < argc; i++) {
struct traceprobe_parse_context *ctx __free(traceprobe_parse_context)
= kzalloc(sizeof(*ctx), GFP_KERNEL);
if (!ctx) {
ret = -ENOMEM;
goto error;
}
ctx->flags = (is_return ? TPARG_FL_RETURN : 0) | TPARG_FL_USER;
trace_probe_log_set_index(i + 2); trace_probe_log_set_index(i + 2);
ret = traceprobe_parse_probe_arg(&tu->tp, i, argv[i], ctx); ret = traceprobe_parse_probe_arg(&tu->tp, i, argv[i], ctx);
if (ret) if (ret)
goto error; return ret;
} }
ptype = is_ret_probe(tu) ? PROBE_PRINT_RETURN : PROBE_PRINT_NORMAL; ptype = is_ret_probe(tu) ? PROBE_PRINT_RETURN : PROBE_PRINT_NORMAL;
ret = traceprobe_set_print_fmt(&tu->tp, ptype); ret = traceprobe_set_print_fmt(&tu->tp, ptype);
if (ret < 0) if (ret < 0)
goto error; return ret;
ret = register_trace_uprobe(tu); ret = register_trace_uprobe(tu);
if (!ret) if (!ret)
goto out; tu = NULL;
error:
free_trace_uprobe(tu);
out:
trace_probe_log_clear();
return ret;
fail_mem:
ret = -ENOMEM;
fail_address_parse:
trace_probe_log_clear();
path_put(&path);
kfree(filename);
return ret; return ret;
} }
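Note: the whole uprobe-create rewrite is one pattern applied repeatedly — goto-based error labels replaced by scope-bound cleanup, with no_free_ptr() used where ownership genuinely transfers. A condensed sketch of that shape (valid_name() and obj are illustrative):

        /* Scope-based cleanup: every early return releases name; the one
         * success path hands ownership off and disarms the guard. */
        char *name __free(kfree) = kstrdup(src, GFP_KERNEL);
        if (!name)
                return -ENOMEM;
        if (!valid_name(name))
                return -EINVAL;         /* name kfree()d automatically */
        obj->name = no_free_ptr(name);  /* ownership moved: no auto-free */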


@ -12,7 +12,8 @@
static struct kunit *current_test; static struct kunit *current_test;
static u32 rand1, entry_val, exit_val; static u32 rand1, entry_only_val, entry_val, exit_val;
static u32 entry_only_count, entry_count, exit_count;
/* Use indirect calls to avoid inlining the target functions */ /* Use indirect calls to avoid inlining the target functions */
static u32 (*target)(u32 value); static u32 (*target)(u32 value);
@ -190,6 +191,101 @@ static void test_fprobe_skip(struct kunit *test)
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp)); KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp));
} }
/* Handler for fprobe entry only case */
static notrace int entry_only_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
struct ftrace_regs *fregs, void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
KUNIT_EXPECT_EQ(current_test, ip, target_ip);
entry_only_count++;
entry_only_val = (rand1 / div_factor);
return 0;
}
static notrace int fprobe_entry_multi_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
struct ftrace_regs *fregs,
void *data)
{
KUNIT_EXPECT_FALSE(current_test, preemptible());
KUNIT_EXPECT_EQ(current_test, ip, target_ip);
entry_count++;
entry_val = (rand1 / div_factor);
return 0;
}
static notrace void fprobe_exit_multi_handler(struct fprobe *fp, unsigned long ip,
unsigned long ret_ip,
struct ftrace_regs *fregs,
void *data)
{
unsigned long ret = ftrace_regs_get_return_value(fregs);
KUNIT_EXPECT_FALSE(current_test, preemptible());
KUNIT_EXPECT_EQ(current_test, ip, target_ip);
KUNIT_EXPECT_EQ(current_test, ret, (rand1 / div_factor));
exit_count++;
exit_val = ret;
}
static void check_fprobe_multi(struct kunit *test)
{
entry_only_count = entry_count = exit_count = 0;
entry_only_val = entry_val = exit_val = 0;
target(rand1);
/* Verify all handlers were called */
KUNIT_EXPECT_EQ(test, 1, entry_only_count);
KUNIT_EXPECT_EQ(test, 1, entry_count);
KUNIT_EXPECT_EQ(test, 1, exit_count);
/* Verify values are correct */
KUNIT_EXPECT_EQ(test, (rand1 / div_factor), entry_only_val);
KUNIT_EXPECT_EQ(test, (rand1 / div_factor), entry_val);
KUNIT_EXPECT_EQ(test, (rand1 / div_factor), exit_val);
}
/* Test multiple fprobes hooking the same target function */
static void test_fprobe_multi(struct kunit *test)
{
struct fprobe fp1 = {
.entry_handler = fprobe_entry_multi_handler,
.exit_handler = fprobe_exit_multi_handler,
};
struct fprobe fp2 = {
.entry_handler = entry_only_handler,
};
current_test = test;
/* Test Case 1: Register in order 1 -> 2 */
KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp1, "fprobe_selftest_target", NULL));
KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp2, "fprobe_selftest_target", NULL));
check_fprobe_multi(test);
/* Unregister all */
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp1));
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp2));
/* Test Case 2: Register in order 2 -> 1 */
KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp2, "fprobe_selftest_target", NULL));
KUNIT_EXPECT_EQ(test, 0, register_fprobe(&fp1, "fprobe_selftest_target", NULL));
check_fprobe_multi(test);
/* Unregister all */
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp1));
KUNIT_EXPECT_EQ(test, 0, unregister_fprobe(&fp2));
}
static unsigned long get_ftrace_location(void *func) static unsigned long get_ftrace_location(void *func)
{ {
unsigned long size, addr = (unsigned long)func; unsigned long size, addr = (unsigned long)func;
@ -217,6 +313,7 @@ static struct kunit_case fprobe_testcases[] = {
KUNIT_CASE(test_fprobe_syms), KUNIT_CASE(test_fprobe_syms),
KUNIT_CASE(test_fprobe_data), KUNIT_CASE(test_fprobe_data),
KUNIT_CASE(test_fprobe_skip), KUNIT_CASE(test_fprobe_skip),
KUNIT_CASE(test_fprobe_multi),
{} {}
}; };
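The new test exercises the entry-only fast path from this series: an fprobe with no exit_handler can be attached through ftrace directly, skipping the fgraph shadow-stack setup. A minimal sketch of an entry-only probe module, assuming CONFIG_FPROBE=y; the probed symbol kernel_clone and all names here are illustrative:

	#include <linux/fprobe.h>
	#include <linux/module.h>

	static int my_entry(struct fprobe *fp, unsigned long ip,
			    unsigned long ret_ip, struct ftrace_regs *fregs,
			    void *data)
	{
		pr_info("entered %pS\n", (void *)ip);
		return 0;
	}

	/* No exit_handler, so this probe qualifies for the ftrace-only path */
	static struct fprobe fp = { .entry_handler = my_entry };

	static int __init sample_init(void)
	{
		return register_fprobe(&fp, "kernel_clone", NULL);
	}

	static void __exit sample_exit(void)
	{
		unregister_fprobe(&fp);
	}

	module_init(sample_init);
	module_exit(sample_exit);
	MODULE_LICENSE("GPL");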

scripts/.gitignore vendored

@ -11,4 +11,5 @@
/sign-file /sign-file
/sorttable /sorttable
/target.json /target.json
/tracepoint-update
/unifdef /unifdef


@ -11,6 +11,10 @@ hostprogs-always-$(CONFIG_MODULE_SIG_FORMAT) += sign-file
hostprogs-always-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert hostprogs-always-$(CONFIG_SYSTEM_EXTRA_CERTIFICATE) += insert-sys-cert
hostprogs-always-$(CONFIG_RUST_KERNEL_DOCTESTS) += rustdoc_test_builder hostprogs-always-$(CONFIG_RUST_KERNEL_DOCTESTS) += rustdoc_test_builder
hostprogs-always-$(CONFIG_RUST_KERNEL_DOCTESTS) += rustdoc_test_gen hostprogs-always-$(CONFIG_RUST_KERNEL_DOCTESTS) += rustdoc_test_gen
hostprogs-always-$(CONFIG_TRACEPOINTS) += tracepoint-update
sorttable-objs := sorttable.o elf-parse.o
tracepoint-update-objs := tracepoint-update.o elf-parse.o
ifneq ($(or $(CONFIG_X86_64),$(CONFIG_X86_32)),) ifneq ($(or $(CONFIG_X86_64),$(CONFIG_X86_32)),)
always-$(CONFIG_RUST) += target.json always-$(CONFIG_RUST) += target.json
@ -25,6 +29,8 @@ generate_rust_target-rust := y
rustdoc_test_builder-rust := y rustdoc_test_builder-rust := y
rustdoc_test_gen-rust := y rustdoc_test_gen-rust := y
HOSTCFLAGS_tracepoint-update.o = -I$(srctree)/tools/include
HOSTCFLAGS_elf-parse.o = -I$(srctree)/tools/include
HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include HOSTCFLAGS_sorttable.o = -I$(srctree)/tools/include
HOSTLDLIBS_sorttable = -lpthread HOSTLDLIBS_sorttable = -lpthread
HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include


@ -28,6 +28,10 @@ ccflags-remove-y := $(CC_FLAGS_CFI)
.module-common.o: $(srctree)/scripts/module-common.c FORCE .module-common.o: $(srctree)/scripts/module-common.c FORCE
$(call if_changed_rule,cc_o_c) $(call if_changed_rule,cc_o_c)
ifneq ($(WARN_ON_UNUSED_TRACEPOINTS),)
cmd_check_tracepoint = $(objtree)/scripts/tracepoint-update --module $<;
endif
quiet_cmd_ld_ko_o = LD [M] $@ quiet_cmd_ld_ko_o = LD [M] $@
cmd_ld_ko_o = \ cmd_ld_ko_o = \
$(LD) -r $(KBUILD_LDFLAGS) \ $(LD) -r $(KBUILD_LDFLAGS) \
@ -57,6 +61,7 @@ if_changed_except = $(if $(call newer_prereqs_except,$(2))$(cmd-check), \
ifdef CONFIG_DEBUG_INFO_BTF_MODULES ifdef CONFIG_DEBUG_INFO_BTF_MODULES
+$(if $(newer-prereqs),$(call cmd,btf_ko)) +$(if $(newer-prereqs),$(call cmd,btf_ko))
endif endif
+$(call cmd,check_tracepoint)
targets += $(modules:%.o=%.ko) $(modules:%.o=%.mod.o) .module-common.o targets += $(modules:%.o=%.ko) $(modules:%.o=%.mod.o) .module-common.o

scripts/elf-parse.c Normal file

@ -0,0 +1,198 @@
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include "elf-parse.h"
struct elf_funcs elf_parser;
/*
* Get the whole file as a programming convenience in order to avoid
* malloc+lseek+read+free of many pieces. If successful, then mmap
* avoids copying unused pieces; else just read the whole file.
* Open for both read and write.
*/
static void *map_file(char const *fname, size_t *size)
{
int fd;
struct stat sb;
void *addr = NULL;
fd = open(fname, O_RDWR);
if (fd < 0) {
perror(fname);
return NULL;
}
if (fstat(fd, &sb) < 0) {
perror(fname);
goto out;
}
if (!S_ISREG(sb.st_mode)) {
fprintf(stderr, "not a regular file: %s\n", fname);
goto out;
}
addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
fprintf(stderr, "Could not mmap file: %s\n", fname);
goto out;
}
*size = sb.st_size;
out:
close(fd);
return addr;
}
static int elf_parse(const char *fname, void *addr, uint32_t types)
{
Elf_Ehdr *ehdr = addr;
uint16_t type;
switch (ehdr->e32.e_ident[EI_DATA]) {
case ELFDATA2LSB:
elf_parser.r = rle;
elf_parser.r2 = r2le;
elf_parser.r8 = r8le;
elf_parser.w = wle;
elf_parser.w8 = w8le;
break;
case ELFDATA2MSB:
elf_parser.r = rbe;
elf_parser.r2 = r2be;
elf_parser.r8 = r8be;
elf_parser.w = wbe;
elf_parser.w8 = w8be;
break;
default:
fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
ehdr->e32.e_ident[EI_DATA], fname);
return -1;
}
if (memcmp(ELFMAG, ehdr->e32.e_ident, SELFMAG) != 0 ||
ehdr->e32.e_ident[EI_VERSION] != EV_CURRENT) {
fprintf(stderr, "unrecognized ELF file %s\n", fname);
return -1;
}
type = elf_parser.r2(&ehdr->e32.e_type);
if (!((1 << type) & types)) {
fprintf(stderr, "Invalid ELF type file %s\n", fname);
return -1;
}
switch (ehdr->e32.e_ident[EI_CLASS]) {
case ELFCLASS32: {
elf_parser.ehdr_shoff = ehdr32_shoff;
elf_parser.ehdr_shentsize = ehdr32_shentsize;
elf_parser.ehdr_shstrndx = ehdr32_shstrndx;
elf_parser.ehdr_shnum = ehdr32_shnum;
elf_parser.shdr_addr = shdr32_addr;
elf_parser.shdr_offset = shdr32_offset;
elf_parser.shdr_link = shdr32_link;
elf_parser.shdr_size = shdr32_size;
elf_parser.shdr_name = shdr32_name;
elf_parser.shdr_type = shdr32_type;
elf_parser.shdr_entsize = shdr32_entsize;
elf_parser.sym_type = sym32_type;
elf_parser.sym_name = sym32_name;
elf_parser.sym_value = sym32_value;
elf_parser.sym_shndx = sym32_shndx;
elf_parser.rela_offset = rela32_offset;
elf_parser.rela_info = rela32_info;
elf_parser.rela_addend = rela32_addend;
elf_parser.rela_write_addend = rela32_write_addend;
if (elf_parser.r2(&ehdr->e32.e_ehsize) != sizeof(Elf32_Ehdr) ||
elf_parser.r2(&ehdr->e32.e_shentsize) != sizeof(Elf32_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
return -1;
}
}
break;
case ELFCLASS64: {
elf_parser.ehdr_shoff = ehdr64_shoff;
elf_parser.ehdr_shentsize = ehdr64_shentsize;
elf_parser.ehdr_shstrndx = ehdr64_shstrndx;
elf_parser.ehdr_shnum = ehdr64_shnum;
elf_parser.shdr_addr = shdr64_addr;
elf_parser.shdr_offset = shdr64_offset;
elf_parser.shdr_link = shdr64_link;
elf_parser.shdr_size = shdr64_size;
elf_parser.shdr_name = shdr64_name;
elf_parser.shdr_type = shdr64_type;
elf_parser.shdr_entsize = shdr64_entsize;
elf_parser.sym_type = sym64_type;
elf_parser.sym_name = sym64_name;
elf_parser.sym_value = sym64_value;
elf_parser.sym_shndx = sym64_shndx;
elf_parser.rela_offset = rela64_offset;
elf_parser.rela_info = rela64_info;
elf_parser.rela_addend = rela64_addend;
elf_parser.rela_write_addend = rela64_write_addend;
if (elf_parser.r2(&ehdr->e64.e_ehsize) != sizeof(Elf64_Ehdr) ||
elf_parser.r2(&ehdr->e64.e_shentsize) != sizeof(Elf64_Shdr)) {
fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n",
fname);
return -1;
}
}
break;
default:
fprintf(stderr, "unrecognized ELF class %d %s\n",
ehdr->e32.e_ident[EI_CLASS], fname);
return -1;
}
return 0;
}
int elf_map_machine(void *addr)
{
Elf_Ehdr *ehdr = addr;
return elf_parser.r2(&ehdr->e32.e_machine);
}
int elf_map_long_size(void *addr)
{
Elf_Ehdr *ehdr = addr;
return ehdr->e32.e_ident[EI_CLASS] == ELFCLASS32 ? 4 : 8;
}
void *elf_map(char const *fname, size_t *size, uint32_t types)
{
void *addr;
int ret;
addr = map_file(fname, size);
if (!addr)
return NULL;
ret = elf_parse(fname, addr, types);
if (ret < 0) {
elf_unmap(addr, *size);
return NULL;
}
return addr;
}
void elf_unmap(void *addr, size_t size)
{
munmap(addr, size);
}

scripts/elf-parse.h Normal file

@ -0,0 +1,305 @@
/* SPDX-License-Identifier: GPL-2.0-only */
#ifndef _SCRIPTS_ELF_PARSE_H
#define _SCRIPTS_ELF_PARSE_H
#include <elf.h>
#include <tools/be_byteshift.h>
#include <tools/le_byteshift.h>
typedef union {
Elf32_Ehdr e32;
Elf64_Ehdr e64;
} Elf_Ehdr;
typedef union {
Elf32_Shdr e32;
Elf64_Shdr e64;
} Elf_Shdr;
typedef union {
Elf32_Sym e32;
Elf64_Sym e64;
} Elf_Sym;
typedef union {
Elf32_Rela e32;
Elf64_Rela e64;
} Elf_Rela;
struct elf_funcs {
int (*compare_extable)(const void *a, const void *b);
uint64_t (*ehdr_shoff)(Elf_Ehdr *ehdr);
uint16_t (*ehdr_shstrndx)(Elf_Ehdr *ehdr);
uint16_t (*ehdr_shentsize)(Elf_Ehdr *ehdr);
uint16_t (*ehdr_shnum)(Elf_Ehdr *ehdr);
uint64_t (*shdr_addr)(Elf_Shdr *shdr);
uint64_t (*shdr_offset)(Elf_Shdr *shdr);
uint64_t (*shdr_size)(Elf_Shdr *shdr);
uint64_t (*shdr_entsize)(Elf_Shdr *shdr);
uint32_t (*shdr_link)(Elf_Shdr *shdr);
uint32_t (*shdr_name)(Elf_Shdr *shdr);
uint32_t (*shdr_type)(Elf_Shdr *shdr);
uint8_t (*sym_type)(Elf_Sym *sym);
uint32_t (*sym_name)(Elf_Sym *sym);
uint64_t (*sym_value)(Elf_Sym *sym);
uint16_t (*sym_shndx)(Elf_Sym *sym);
uint64_t (*rela_offset)(Elf_Rela *rela);
uint64_t (*rela_info)(Elf_Rela *rela);
uint64_t (*rela_addend)(Elf_Rela *rela);
void (*rela_write_addend)(Elf_Rela *rela, uint64_t val);
uint32_t (*r)(const uint32_t *);
uint16_t (*r2)(const uint16_t *);
uint64_t (*r8)(const uint64_t *);
void (*w)(uint32_t, uint32_t *);
void (*w8)(uint64_t, uint64_t *);
};
extern struct elf_funcs elf_parser;
static inline uint64_t ehdr64_shoff(Elf_Ehdr *ehdr)
{
return elf_parser.r8(&ehdr->e64.e_shoff);
}
static inline uint64_t ehdr32_shoff(Elf_Ehdr *ehdr)
{
return elf_parser.r(&ehdr->e32.e_shoff);
}
static inline uint64_t ehdr_shoff(Elf_Ehdr *ehdr)
{
return elf_parser.ehdr_shoff(ehdr);
}
#define EHDR_HALF(fn_name) \
static inline uint16_t ehdr64_##fn_name(Elf_Ehdr *ehdr) \
{ \
return elf_parser.r2(&ehdr->e64.e_##fn_name); \
} \
\
static inline uint16_t ehdr32_##fn_name(Elf_Ehdr *ehdr) \
{ \
return elf_parser.r2(&ehdr->e32.e_##fn_name); \
} \
\
static inline uint16_t ehdr_##fn_name(Elf_Ehdr *ehdr) \
{ \
return elf_parser.ehdr_##fn_name(ehdr); \
}
EHDR_HALF(shentsize)
EHDR_HALF(shstrndx)
EHDR_HALF(shnum)
#define SHDR_WORD(fn_name) \
static inline uint32_t shdr64_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.r(&shdr->e64.sh_##fn_name); \
} \
\
static inline uint32_t shdr32_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.r(&shdr->e32.sh_##fn_name); \
} \
\
static inline uint32_t shdr_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.shdr_##fn_name(shdr); \
}
#define SHDR_ADDR(fn_name) \
static inline uint64_t shdr64_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.r8(&shdr->e64.sh_##fn_name); \
} \
\
static inline uint64_t shdr32_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.r(&shdr->e32.sh_##fn_name); \
} \
\
static inline uint64_t shdr_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.shdr_##fn_name(shdr); \
}
#define SHDR_WORD(fn_name) \
static inline uint32_t shdr64_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.r(&shdr->e64.sh_##fn_name); \
} \
\
static inline uint32_t shdr32_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.r(&shdr->e32.sh_##fn_name); \
} \
static inline uint32_t shdr_##fn_name(Elf_Shdr *shdr) \
{ \
return elf_parser.shdr_##fn_name(shdr); \
}
SHDR_ADDR(addr)
SHDR_ADDR(offset)
SHDR_ADDR(size)
SHDR_ADDR(entsize)
SHDR_WORD(link)
SHDR_WORD(name)
SHDR_WORD(type)
#define SYM_ADDR(fn_name) \
static inline uint64_t sym64_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.r8(&sym->e64.st_##fn_name); \
} \
\
static inline uint64_t sym32_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.r(&sym->e32.st_##fn_name); \
} \
\
static inline uint64_t sym_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.sym_##fn_name(sym); \
}
#define SYM_WORD(fn_name) \
static inline uint32_t sym64_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.r(&sym->e64.st_##fn_name); \
} \
\
static inline uint32_t sym32_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.r(&sym->e32.st_##fn_name); \
} \
\
static inline uint32_t sym_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.sym_##fn_name(sym); \
}
#define SYM_HALF(fn_name) \
static inline uint16_t sym64_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.r2(&sym->e64.st_##fn_name); \
} \
\
static inline uint16_t sym32_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.r2(&sym->e32.st_##fn_name); \
} \
\
static inline uint16_t sym_##fn_name(Elf_Sym *sym) \
{ \
return elf_parser.sym_##fn_name(sym); \
}
static inline uint8_t sym64_type(Elf_Sym *sym)
{
return ELF64_ST_TYPE(sym->e64.st_info);
}
static inline uint8_t sym32_type(Elf_Sym *sym)
{
return ELF32_ST_TYPE(sym->e32.st_info);
}
static inline uint8_t sym_type(Elf_Sym *sym)
{
return elf_parser.sym_type(sym);
}
SYM_ADDR(value)
SYM_WORD(name)
SYM_HALF(shndx)
#define __maybe_unused __attribute__((__unused__))
#define RELA_ADDR(fn_name) \
static inline uint64_t rela64_##fn_name(Elf_Rela *rela) \
{ \
return elf_parser.r8((uint64_t *)&rela->e64.r_##fn_name); \
} \
\
static inline uint64_t rela32_##fn_name(Elf_Rela *rela) \
{ \
return elf_parser.r((uint32_t *)&rela->e32.r_##fn_name); \
} \
\
static inline uint64_t __maybe_unused rela_##fn_name(Elf_Rela *rela) \
{ \
return elf_parser.rela_##fn_name(rela); \
}
RELA_ADDR(offset)
RELA_ADDR(info)
RELA_ADDR(addend)
static inline void rela64_write_addend(Elf_Rela *rela, uint64_t val)
{
elf_parser.w8(val, (uint64_t *)&rela->e64.r_addend);
}
static inline void rela32_write_addend(Elf_Rela *rela, uint64_t val)
{
elf_parser.w(val, (uint32_t *)&rela->e32.r_addend);
}
static inline uint32_t rbe(const uint32_t *x)
{
return get_unaligned_be32(x);
}
static inline uint16_t r2be(const uint16_t *x)
{
return get_unaligned_be16(x);
}
static inline uint64_t r8be(const uint64_t *x)
{
return get_unaligned_be64(x);
}
static inline uint32_t rle(const uint32_t *x)
{
return get_unaligned_le32(x);
}
static inline uint16_t r2le(const uint16_t *x)
{
return get_unaligned_le16(x);
}
static inline uint64_t r8le(const uint64_t *x)
{
return get_unaligned_le64(x);
}
static inline void wbe(uint32_t val, uint32_t *x)
{
put_unaligned_be32(val, x);
}
static inline void wle(uint32_t val, uint32_t *x)
{
put_unaligned_le32(val, x);
}
static inline void w8be(uint64_t val, uint64_t *x)
{
put_unaligned_be64(val, x);
}
static inline void w8le(uint64_t val, uint64_t *x)
{
put_unaligned_le64(val, x);
}
void *elf_map(char const *fname, size_t *size, uint32_t types);
void elf_unmap(void *addr, size_t size);
int elf_map_machine(void *addr);
int elf_map_long_size(void *addr);
#endif /* _SCRIPTS_ELF_PARSE_H */
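With the parser split out of sorttable.c, any host tool can link elf-parse.o and use class- and endian-agnostic accessors that dispatch through the elf_parser table set up by elf_map(). A sketch of the intended usage pattern, assuming the header above; list_sections() is illustrative and skips the SHN_XINDEX/SHN_UNDEF corner cases that real callers handle:

	#include <stdio.h>
	#include "elf-parse.h"

	static int list_sections(const char *fname)
	{
		size_t size, entsize;
		Elf_Ehdr *ehdr = elf_map(fname, &size, (1 << ET_EXEC) | (1 << ET_DYN));
		Elf_Shdr *shdr_start, *strsec;
		const char *secstrings;

		if (!ehdr)
			return -1;

		/* elf_map() has initialized elf_parser, so the accessors work */
		shdr_start = (Elf_Shdr *)((char *)ehdr + ehdr_shoff(ehdr));
		entsize = ehdr_shentsize(ehdr);
		strsec = (Elf_Shdr *)((char *)shdr_start + entsize * ehdr_shstrndx(ehdr));
		secstrings = (const char *)ehdr + shdr_offset(strsec);

		for (unsigned int i = 0; i < ehdr_shnum(ehdr); i++) {
			Elf_Shdr *shdr = (Elf_Shdr *)((char *)shdr_start + entsize * i);
			printf("%s\n", secstrings + shdr_name(shdr));
		}
		elf_unmap(ehdr, size);
		return 0;
	}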


@ -209,6 +209,13 @@ kallsymso=
strip_debug= strip_debug=
generate_map= generate_map=
# Use "make UT=1" to trigger warnings on unused tracepoints
case "${WARN_ON_UNUSED_TRACEPOINTS}" in
*1*)
${objtree}/scripts/tracepoint-update vmlinux.o
;;
esac
if is_enabled CONFIG_KALLSYMS; then if is_enabled CONFIG_KALLSYMS; then
true > .tmp_vmlinux0.syms true > .tmp_vmlinux0.syms
kallsyms .tmp_vmlinux0.syms .tmp_vmlinux0.kallsyms kallsyms .tmp_vmlinux0.syms .tmp_vmlinux0.kallsyms


@ -21,10 +21,8 @@
*/ */
#include <sys/types.h> #include <sys/types.h>
#include <sys/mman.h>
#include <sys/stat.h> #include <sys/stat.h>
#include <getopt.h> #include <getopt.h>
#include <elf.h>
#include <fcntl.h> #include <fcntl.h>
#include <stdio.h> #include <stdio.h>
#include <stdlib.h> #include <stdlib.h>
@ -34,8 +32,7 @@
#include <errno.h> #include <errno.h>
#include <pthread.h> #include <pthread.h>
#include <tools/be_byteshift.h> #include "elf-parse.h"
#include <tools/le_byteshift.h>
#ifndef EM_ARCOMPACT #ifndef EM_ARCOMPACT
#define EM_ARCOMPACT 93 #define EM_ARCOMPACT 93
@ -65,335 +62,8 @@
#define EM_LOONGARCH 258 #define EM_LOONGARCH 258
#endif #endif
typedef union {
Elf32_Ehdr e32;
Elf64_Ehdr e64;
} Elf_Ehdr;
typedef union {
Elf32_Shdr e32;
Elf64_Shdr e64;
} Elf_Shdr;
typedef union {
Elf32_Sym e32;
Elf64_Sym e64;
} Elf_Sym;
typedef union {
Elf32_Rela e32;
Elf64_Rela e64;
} Elf_Rela;
static uint32_t (*r)(const uint32_t *);
static uint16_t (*r2)(const uint16_t *);
static uint64_t (*r8)(const uint64_t *);
static void (*w)(uint32_t, uint32_t *);
static void (*w8)(uint64_t, uint64_t *);
typedef void (*table_sort_t)(char *, int); typedef void (*table_sort_t)(char *, int);
static struct elf_funcs {
int (*compare_extable)(const void *a, const void *b);
uint64_t (*ehdr_shoff)(Elf_Ehdr *ehdr);
uint16_t (*ehdr_shstrndx)(Elf_Ehdr *ehdr);
uint16_t (*ehdr_shentsize)(Elf_Ehdr *ehdr);
uint16_t (*ehdr_shnum)(Elf_Ehdr *ehdr);
uint64_t (*shdr_addr)(Elf_Shdr *shdr);
uint64_t (*shdr_offset)(Elf_Shdr *shdr);
uint64_t (*shdr_size)(Elf_Shdr *shdr);
uint64_t (*shdr_entsize)(Elf_Shdr *shdr);
uint32_t (*shdr_link)(Elf_Shdr *shdr);
uint32_t (*shdr_name)(Elf_Shdr *shdr);
uint32_t (*shdr_type)(Elf_Shdr *shdr);
uint8_t (*sym_type)(Elf_Sym *sym);
uint32_t (*sym_name)(Elf_Sym *sym);
uint64_t (*sym_value)(Elf_Sym *sym);
uint16_t (*sym_shndx)(Elf_Sym *sym);
uint64_t (*rela_offset)(Elf_Rela *rela);
uint64_t (*rela_info)(Elf_Rela *rela);
uint64_t (*rela_addend)(Elf_Rela *rela);
void (*rela_write_addend)(Elf_Rela *rela, uint64_t val);
} e;
static uint64_t ehdr64_shoff(Elf_Ehdr *ehdr)
{
return r8(&ehdr->e64.e_shoff);
}
static uint64_t ehdr32_shoff(Elf_Ehdr *ehdr)
{
return r(&ehdr->e32.e_shoff);
}
static uint64_t ehdr_shoff(Elf_Ehdr *ehdr)
{
return e.ehdr_shoff(ehdr);
}
#define EHDR_HALF(fn_name) \
static uint16_t ehdr64_##fn_name(Elf_Ehdr *ehdr) \
{ \
return r2(&ehdr->e64.e_##fn_name); \
} \
\
static uint16_t ehdr32_##fn_name(Elf_Ehdr *ehdr) \
{ \
return r2(&ehdr->e32.e_##fn_name); \
} \
\
static uint16_t ehdr_##fn_name(Elf_Ehdr *ehdr) \
{ \
return e.ehdr_##fn_name(ehdr); \
}
EHDR_HALF(shentsize)
EHDR_HALF(shstrndx)
EHDR_HALF(shnum)
#define SHDR_WORD(fn_name) \
static uint32_t shdr64_##fn_name(Elf_Shdr *shdr) \
{ \
return r(&shdr->e64.sh_##fn_name); \
} \
\
static uint32_t shdr32_##fn_name(Elf_Shdr *shdr) \
{ \
return r(&shdr->e32.sh_##fn_name); \
} \
\
static uint32_t shdr_##fn_name(Elf_Shdr *shdr) \
{ \
return e.shdr_##fn_name(shdr); \
}
#define SHDR_ADDR(fn_name) \
static uint64_t shdr64_##fn_name(Elf_Shdr *shdr) \
{ \
return r8(&shdr->e64.sh_##fn_name); \
} \
\
static uint64_t shdr32_##fn_name(Elf_Shdr *shdr) \
{ \
return r(&shdr->e32.sh_##fn_name); \
} \
\
static uint64_t shdr_##fn_name(Elf_Shdr *shdr) \
{ \
return e.shdr_##fn_name(shdr); \
}
#define SHDR_WORD(fn_name) \
static uint32_t shdr64_##fn_name(Elf_Shdr *shdr) \
{ \
return r(&shdr->e64.sh_##fn_name); \
} \
\
static uint32_t shdr32_##fn_name(Elf_Shdr *shdr) \
{ \
return r(&shdr->e32.sh_##fn_name); \
} \
static uint32_t shdr_##fn_name(Elf_Shdr *shdr) \
{ \
return e.shdr_##fn_name(shdr); \
}
SHDR_ADDR(addr)
SHDR_ADDR(offset)
SHDR_ADDR(size)
SHDR_ADDR(entsize)
SHDR_WORD(link)
SHDR_WORD(name)
SHDR_WORD(type)
#define SYM_ADDR(fn_name) \
static uint64_t sym64_##fn_name(Elf_Sym *sym) \
{ \
return r8(&sym->e64.st_##fn_name); \
} \
\
static uint64_t sym32_##fn_name(Elf_Sym *sym) \
{ \
return r(&sym->e32.st_##fn_name); \
} \
\
static uint64_t sym_##fn_name(Elf_Sym *sym) \
{ \
return e.sym_##fn_name(sym); \
}
#define SYM_WORD(fn_name) \
static uint32_t sym64_##fn_name(Elf_Sym *sym) \
{ \
return r(&sym->e64.st_##fn_name); \
} \
\
static uint32_t sym32_##fn_name(Elf_Sym *sym) \
{ \
return r(&sym->e32.st_##fn_name); \
} \
\
static uint32_t sym_##fn_name(Elf_Sym *sym) \
{ \
return e.sym_##fn_name(sym); \
}
#define SYM_HALF(fn_name) \
static uint16_t sym64_##fn_name(Elf_Sym *sym) \
{ \
return r2(&sym->e64.st_##fn_name); \
} \
\
static uint16_t sym32_##fn_name(Elf_Sym *sym) \
{ \
return r2(&sym->e32.st_##fn_name); \
} \
\
static uint16_t sym_##fn_name(Elf_Sym *sym) \
{ \
return e.sym_##fn_name(sym); \
}
static uint8_t sym64_type(Elf_Sym *sym)
{
return ELF64_ST_TYPE(sym->e64.st_info);
}
static uint8_t sym32_type(Elf_Sym *sym)
{
return ELF32_ST_TYPE(sym->e32.st_info);
}
static uint8_t sym_type(Elf_Sym *sym)
{
return e.sym_type(sym);
}
SYM_ADDR(value)
SYM_WORD(name)
SYM_HALF(shndx)
#define __maybe_unused __attribute__((__unused__))
#define RELA_ADDR(fn_name) \
static uint64_t rela64_##fn_name(Elf_Rela *rela) \
{ \
return r8((uint64_t *)&rela->e64.r_##fn_name); \
} \
\
static uint64_t rela32_##fn_name(Elf_Rela *rela) \
{ \
return r((uint32_t *)&rela->e32.r_##fn_name); \
} \
\
static uint64_t __maybe_unused rela_##fn_name(Elf_Rela *rela) \
{ \
return e.rela_##fn_name(rela); \
}
RELA_ADDR(offset)
RELA_ADDR(info)
RELA_ADDR(addend)
static void rela64_write_addend(Elf_Rela *rela, uint64_t val)
{
w8(val, (uint64_t *)&rela->e64.r_addend);
}
static void rela32_write_addend(Elf_Rela *rela, uint64_t val)
{
w(val, (uint32_t *)&rela->e32.r_addend);
}
/*
* Get the whole file as a programming convenience in order to avoid
* malloc+lseek+read+free of many pieces. If successful, then mmap
* avoids copying unused pieces; else just read the whole file.
* Open for both read and write.
*/
static void *mmap_file(char const *fname, size_t *size)
{
int fd;
struct stat sb;
void *addr = NULL;
fd = open(fname, O_RDWR);
if (fd < 0) {
perror(fname);
return NULL;
}
if (fstat(fd, &sb) < 0) {
perror(fname);
goto out;
}
if (!S_ISREG(sb.st_mode)) {
fprintf(stderr, "not a regular file: %s\n", fname);
goto out;
}
addr = mmap(0, sb.st_size, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
if (addr == MAP_FAILED) {
fprintf(stderr, "Could not mmap file: %s\n", fname);
goto out;
}
*size = sb.st_size;
out:
close(fd);
return addr;
}
static uint32_t rbe(const uint32_t *x)
{
return get_unaligned_be32(x);
}
static uint16_t r2be(const uint16_t *x)
{
return get_unaligned_be16(x);
}
static uint64_t r8be(const uint64_t *x)
{
return get_unaligned_be64(x);
}
static uint32_t rle(const uint32_t *x)
{
return get_unaligned_le32(x);
}
static uint16_t r2le(const uint16_t *x)
{
return get_unaligned_le16(x);
}
static uint64_t r8le(const uint64_t *x)
{
return get_unaligned_le64(x);
}
static void wbe(uint32_t val, uint32_t *x)
{
put_unaligned_be32(val, x);
}
static void wle(uint32_t val, uint32_t *x)
{
put_unaligned_le32(val, x);
}
static void w8be(uint64_t val, uint64_t *x)
{
put_unaligned_be64(val, x);
}
static void w8le(uint64_t val, uint64_t *x)
{
put_unaligned_le64(val, x);
}
/* /*
* Move reserved section indices SHN_LORESERVE..SHN_HIRESERVE out of * Move reserved section indices SHN_LORESERVE..SHN_HIRESERVE out of
* the way to -256..-1, to avoid conflicting with real section * the way to -256..-1, to avoid conflicting with real section
@ -415,13 +85,13 @@ static inline unsigned int get_secindex(unsigned int shndx,
return SPECIAL(shndx); return SPECIAL(shndx);
if (shndx != SHN_XINDEX) if (shndx != SHN_XINDEX)
return shndx; return shndx;
return r(&symtab_shndx_start[sym_offs]); return elf_parser.r(&symtab_shndx_start[sym_offs]);
} }
static int compare_extable_32(const void *a, const void *b) static int compare_extable_32(const void *a, const void *b)
{ {
Elf32_Addr av = r(a); Elf32_Addr av = elf_parser.r(a);
Elf32_Addr bv = r(b); Elf32_Addr bv = elf_parser.r(b);
if (av < bv) if (av < bv)
return -1; return -1;
@ -430,18 +100,15 @@ static int compare_extable_32(const void *a, const void *b)
static int compare_extable_64(const void *a, const void *b) static int compare_extable_64(const void *a, const void *b)
{ {
Elf64_Addr av = r8(a); Elf64_Addr av = elf_parser.r8(a);
Elf64_Addr bv = r8(b); Elf64_Addr bv = elf_parser.r8(b);
if (av < bv) if (av < bv)
return -1; return -1;
return av > bv; return av > bv;
} }
static int compare_extable(const void *a, const void *b) static int (*compare_extable)(const void *a, const void *b);
{
return e.compare_extable(a, b);
}
static inline void *get_index(void *start, int entsize, int index) static inline void *get_index(void *start, int entsize, int index)
{ {
@ -577,7 +244,7 @@ static int (*compare_values)(const void *a, const void *b);
/* Only used for sorting mcount table */ /* Only used for sorting mcount table */
static void rela_write_addend(Elf_Rela *rela, uint64_t val) static void rela_write_addend(Elf_Rela *rela, uint64_t val)
{ {
e.rela_write_addend(rela, val); elf_parser.rela_write_addend(rela, val);
} }
struct func_info { struct func_info {
@ -792,9 +459,9 @@ static int fill_addrs(void *ptr, uint64_t size, void *addrs)
for (; ptr < end; ptr += long_size, addrs += long_size, count++) { for (; ptr < end; ptr += long_size, addrs += long_size, count++) {
if (long_size == 4) if (long_size == 4)
*(uint32_t *)ptr = r(addrs); *(uint32_t *)ptr = elf_parser.r(addrs);
else else
*(uint64_t *)ptr = r8(addrs); *(uint64_t *)ptr = elf_parser.r8(addrs);
} }
return count; return count;
} }
@ -805,9 +472,9 @@ static void replace_addrs(void *ptr, uint64_t size, void *addrs)
for (; ptr < end; ptr += long_size, addrs += long_size) { for (; ptr < end; ptr += long_size, addrs += long_size) {
if (long_size == 4) if (long_size == 4)
w(*(uint32_t *)ptr, addrs); elf_parser.w(*(uint32_t *)ptr, addrs);
else else
w8(*(uint64_t *)ptr, addrs); elf_parser.w8(*(uint64_t *)ptr, addrs);
} }
} }
@ -1111,7 +778,7 @@ static int do_sort(Elf_Ehdr *ehdr,
sym_value(sort_needed_sym) - shdr_addr(sort_needed_sec); sym_value(sort_needed_sym) - shdr_addr(sort_needed_sec);
/* extable has been sorted, clear the flag */ /* extable has been sorted, clear the flag */
w(0, sort_needed_loc); elf_parser.w(0, sort_needed_loc);
rc = 0; rc = 0;
out: out:
@ -1155,8 +822,8 @@ static int do_sort(Elf_Ehdr *ehdr,
static int compare_relative_table(const void *a, const void *b) static int compare_relative_table(const void *a, const void *b)
{ {
int32_t av = (int32_t)r(a); int32_t av = (int32_t)elf_parser.r(a);
int32_t bv = (int32_t)r(b); int32_t bv = (int32_t)elf_parser.r(b);
if (av < bv) if (av < bv)
return -1; return -1;
@ -1175,7 +842,7 @@ static void sort_relative_table(char *extab_image, int image_size)
*/ */
while (i < image_size) { while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i); uint32_t *loc = (uint32_t *)(extab_image + i);
w(r(loc) + i, loc); elf_parser.w(elf_parser.r(loc) + i, loc);
i += 4; i += 4;
} }
@ -1185,7 +852,7 @@ static void sort_relative_table(char *extab_image, int image_size)
i = 0; i = 0;
while (i < image_size) { while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i); uint32_t *loc = (uint32_t *)(extab_image + i);
w(r(loc) - i, loc); elf_parser.w(elf_parser.r(loc) - i, loc);
i += 4; i += 4;
} }
} }
@ -1197,8 +864,8 @@ static void sort_relative_table_with_data(char *extab_image, int image_size)
while (i < image_size) { while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i); uint32_t *loc = (uint32_t *)(extab_image + i);
w(r(loc) + i, loc); elf_parser.w(elf_parser.r(loc) + i, loc);
w(r(loc + 1) + i + 4, loc + 1); elf_parser.w(elf_parser.r(loc + 1) + i + 4, loc + 1);
/* Don't touch the fixup type or data */ /* Don't touch the fixup type or data */
i += sizeof(uint32_t) * 3; i += sizeof(uint32_t) * 3;
@ -1210,8 +877,8 @@ static void sort_relative_table_with_data(char *extab_image, int image_size)
while (i < image_size) { while (i < image_size) {
uint32_t *loc = (uint32_t *)(extab_image + i); uint32_t *loc = (uint32_t *)(extab_image + i);
w(r(loc) - i, loc); elf_parser.w(elf_parser.r(loc) - i, loc);
w(r(loc + 1) - (i + 4), loc + 1); elf_parser.w(elf_parser.r(loc + 1) - (i + 4), loc + 1);
/* Don't touch the fixup type or data */ /* Don't touch the fixup type or data */
i += sizeof(uint32_t) * 3; i += sizeof(uint32_t) * 3;
@ -1223,35 +890,7 @@ static int do_file(char const *const fname, void *addr)
Elf_Ehdr *ehdr = addr; Elf_Ehdr *ehdr = addr;
table_sort_t custom_sort = NULL; table_sort_t custom_sort = NULL;
switch (ehdr->e32.e_ident[EI_DATA]) { switch (elf_map_machine(ehdr)) {
case ELFDATA2LSB:
r = rle;
r2 = r2le;
r8 = r8le;
w = wle;
w8 = w8le;
break;
case ELFDATA2MSB:
r = rbe;
r2 = r2be;
r8 = r8be;
w = wbe;
w8 = w8be;
break;
default:
fprintf(stderr, "unrecognized ELF data encoding %d: %s\n",
ehdr->e32.e_ident[EI_DATA], fname);
return -1;
}
if (memcmp(ELFMAG, ehdr->e32.e_ident, SELFMAG) != 0 ||
(r2(&ehdr->e32.e_type) != ET_EXEC && r2(&ehdr->e32.e_type) != ET_DYN) ||
ehdr->e32.e_ident[EI_VERSION] != EV_CURRENT) {
fprintf(stderr, "unrecognized ET_EXEC/ET_DYN file %s\n", fname);
return -1;
}
switch (r2(&ehdr->e32.e_machine)) {
case EM_AARCH64: case EM_AARCH64:
#ifdef MCOUNT_SORT_ENABLED #ifdef MCOUNT_SORT_ENABLED
sort_reloc = true; sort_reloc = true;
@ -1281,85 +920,37 @@ static int do_file(char const *const fname, void *addr)
break; break;
default: default:
fprintf(stderr, "unrecognized e_machine %d %s\n", fprintf(stderr, "unrecognized e_machine %d %s\n",
r2(&ehdr->e32.e_machine), fname); elf_parser.r2(&ehdr->e32.e_machine), fname);
return -1; return -1;
} }
switch (ehdr->e32.e_ident[EI_CLASS]) { switch (elf_map_long_size(addr)) {
case ELFCLASS32: { case 4:
struct elf_funcs efuncs = { compare_extable = compare_extable_32,
.compare_extable = compare_extable_32,
.ehdr_shoff = ehdr32_shoff,
.ehdr_shentsize = ehdr32_shentsize,
.ehdr_shstrndx = ehdr32_shstrndx,
.ehdr_shnum = ehdr32_shnum,
.shdr_addr = shdr32_addr,
.shdr_offset = shdr32_offset,
.shdr_link = shdr32_link,
.shdr_size = shdr32_size,
.shdr_name = shdr32_name,
.shdr_type = shdr32_type,
.shdr_entsize = shdr32_entsize,
.sym_type = sym32_type,
.sym_name = sym32_name,
.sym_value = sym32_value,
.sym_shndx = sym32_shndx,
.rela_offset = rela32_offset,
.rela_info = rela32_info,
.rela_addend = rela32_addend,
.rela_write_addend = rela32_write_addend,
};
e = efuncs;
long_size = 4; long_size = 4;
extable_ent_size = 8; extable_ent_size = 8;
if (r2(&ehdr->e32.e_ehsize) != sizeof(Elf32_Ehdr) || if (elf_parser.r2(&ehdr->e32.e_ehsize) != sizeof(Elf32_Ehdr) ||
r2(&ehdr->e32.e_shentsize) != sizeof(Elf32_Shdr)) { elf_parser.r2(&ehdr->e32.e_shentsize) != sizeof(Elf32_Shdr)) {
fprintf(stderr, fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", fname); "unrecognized ET_EXEC/ET_DYN file: %s\n", fname);
return -1; return -1;
} }
}
break; break;
case ELFCLASS64: { case 8:
struct elf_funcs efuncs = { compare_extable = compare_extable_64,
.compare_extable = compare_extable_64,
.ehdr_shoff = ehdr64_shoff,
.ehdr_shentsize = ehdr64_shentsize,
.ehdr_shstrndx = ehdr64_shstrndx,
.ehdr_shnum = ehdr64_shnum,
.shdr_addr = shdr64_addr,
.shdr_offset = shdr64_offset,
.shdr_link = shdr64_link,
.shdr_size = shdr64_size,
.shdr_name = shdr64_name,
.shdr_type = shdr64_type,
.shdr_entsize = shdr64_entsize,
.sym_type = sym64_type,
.sym_name = sym64_name,
.sym_value = sym64_value,
.sym_shndx = sym64_shndx,
.rela_offset = rela64_offset,
.rela_info = rela64_info,
.rela_addend = rela64_addend,
.rela_write_addend = rela64_write_addend,
};
e = efuncs;
long_size = 8; long_size = 8;
extable_ent_size = 16; extable_ent_size = 16;
if (r2(&ehdr->e64.e_ehsize) != sizeof(Elf64_Ehdr) || if (elf_parser.r2(&ehdr->e64.e_ehsize) != sizeof(Elf64_Ehdr) ||
r2(&ehdr->e64.e_shentsize) != sizeof(Elf64_Shdr)) { elf_parser.r2(&ehdr->e64.e_shentsize) != sizeof(Elf64_Shdr)) {
fprintf(stderr, fprintf(stderr,
"unrecognized ET_EXEC/ET_DYN file: %s\n", "unrecognized ET_EXEC/ET_DYN file: %s\n",
fname); fname);
return -1; return -1;
} }
}
break; break;
default: default:
fprintf(stderr, "unrecognized ELF class %d %s\n", fprintf(stderr, "unrecognized ELF class %d %s\n",
@ -1398,7 +989,7 @@ int main(int argc, char *argv[])
/* Process each file in turn, allowing deep failure. */ /* Process each file in turn, allowing deep failure. */
for (i = optind; i < argc; i++) { for (i = optind; i < argc; i++) {
addr = mmap_file(argv[i], &size); addr = elf_map(argv[i], &size, (1 << ET_EXEC) | (1 << ET_DYN));
if (!addr) { if (!addr) {
++n_error; ++n_error;
continue; continue;
@ -1407,7 +998,7 @@ int main(int argc, char *argv[])
if (do_file(argv[i], addr)) if (do_file(argv[i], addr))
++n_error; ++n_error;
munmap(addr, size); elf_unmap(addr, size);
} }
return !!n_error; return !!n_error;

scripts/tracepoint-update.c Normal file

@ -0,0 +1,261 @@
// SPDX-License-Identifier: GPL-2.0-only
#include <sys/types.h>
#include <sys/stat.h>
#include <getopt.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>
#include <errno.h>
#include <pthread.h>
#include "elf-parse.h"
static Elf_Shdr *check_data_sec;
static Elf_Shdr *tracepoint_data_sec;
static inline void *get_index(void *start, int entsize, int index)
{
return start + (entsize * index);
}
static int compare_strings(const void *a, const void *b)
{
const char *av = *(const char **)a;
const char *bv = *(const char **)b;
return strcmp(av, bv);
}
struct elf_tracepoint {
Elf_Ehdr *ehdr;
const char **array;
int count;
};
#define REALLOC_SIZE (1 << 10)
#define REALLOC_MASK (REALLOC_SIZE - 1)
static int add_string(const char *str, const char ***vals, int *count)
{
const char **array = *vals;
if (!(*count & REALLOC_MASK)) {
int size = (*count) + REALLOC_SIZE;
array = realloc(array, sizeof(char *) * size);
if (!array) {
fprintf(stderr, "Failed memory allocation\n");
return -1;
}
*vals = array;
}
array[(*count)++] = str;
return 0;
}
/**
* for_each_shdr_str - iterator that reads strings that are in an ELF section.
* @len: "int" to hold the length of the current string
* @ehdr: A pointer to the ehdr of the ELF file
* @sec: The section that has the strings to iterate on
*
* This is a for loop that iterates over all the nul terminated strings
* that are in a given ELF section. The variable "str" will hold
* the current string for each iteration and the passed in @len will
* contain the strlen() of that string.
*/
#define for_each_shdr_str(len, ehdr, sec) \
for (const char *str = (void *)(ehdr) + shdr_offset(sec), \
*end = str + shdr_size(sec); \
len = strlen(str), str < end; \
str += (len) + 1)
static void make_trace_array(struct elf_tracepoint *etrace)
{
Elf_Ehdr *ehdr = etrace->ehdr;
const char **vals = NULL;
int count = 0;
int len;
etrace->array = NULL;
/*
* The __tracepoint_check section is filled with strings of the
* names of tracepoints (in tracepoint_strings). Create an array
* that points to each string and then sort the array.
*/
for_each_shdr_str(len, ehdr, check_data_sec) {
if (!len)
continue;
if (add_string(str, &vals, &count) < 0)
return;
}
/* If CONFIG_TRACEPOINT_VERIFY_USED is not set, there's nothing to do */
if (!count)
return;
qsort(vals, count, sizeof(char *), compare_strings);
etrace->array = vals;
etrace->count = count;
}
static int find_event(const char *str, void *array, size_t size)
{
return bsearch(&str, array, size, sizeof(char *), compare_strings) != NULL;
}
static void check_tracepoints(struct elf_tracepoint *etrace, const char *fname)
{
Elf_Ehdr *ehdr = etrace->ehdr;
int len;
if (!etrace->array)
return;
/*
* The __tracepoints_strings section holds the names of all the
* defined tracepoints. Any name that does not also appear in the
* __tracepoint_check section belongs to an unused tracepoint.
*/
for_each_shdr_str(len, ehdr, tracepoint_data_sec) {
if (!len)
continue;
if (!find_event(str, etrace->array, etrace->count)) {
fprintf(stderr, "warning: tracepoint '%s' is unused", str);
if (fname)
fprintf(stderr, " in module %s\n", fname);
else
fprintf(stderr, "\n");
}
}
free(etrace->array);
}
static void *tracepoint_check(struct elf_tracepoint *etrace, const char *fname)
{
make_trace_array(etrace);
check_tracepoints(etrace, fname);
return NULL;
}
static int process_tracepoints(bool mod, void *addr, const char *fname)
{
struct elf_tracepoint etrace = {0};
Elf_Ehdr *ehdr = addr;
Elf_Shdr *shdr_start;
Elf_Shdr *string_sec;
const char *secstrings;
unsigned int shnum;
unsigned int shstrndx;
int shentsize;
int idx;
int done = 2;
shdr_start = (Elf_Shdr *)((char *)ehdr + ehdr_shoff(ehdr));
shentsize = ehdr_shentsize(ehdr);
shstrndx = ehdr_shstrndx(ehdr);
if (shstrndx == SHN_XINDEX)
shstrndx = shdr_link(shdr_start);
string_sec = get_index(shdr_start, shentsize, shstrndx);
secstrings = (const char *)ehdr + shdr_offset(string_sec);
shnum = ehdr_shnum(ehdr);
if (shnum == SHN_UNDEF)
shnum = shdr_size(shdr_start);
for (int i = 0; done && i < shnum; i++) {
Elf_Shdr *shdr = get_index(shdr_start, shentsize, i);
idx = shdr_name(shdr);
/* locate the __tracepoint_check section in vmlinux */
if (!strcmp(secstrings + idx, "__tracepoint_check")) {
check_data_sec = shdr;
done--;
}
/* locate the __tracepoints_strings section in vmlinux */
if (!strcmp(secstrings + idx, "__tracepoints_strings")) {
tracepoint_data_sec = shdr;
done--;
}
}
/*
* A module may have neither section, but if it has one of them,
* it should have both.
*/
if (mod && !check_data_sec && !tracepoint_data_sec)
return 0;
if (!check_data_sec) {
if (mod) {
fprintf(stderr, "warning: Module %s has only unused tracepoints\n", fname);
/* Do not fail build */
return 0;
}
fprintf(stderr, "no __tracepoint_check in file: %s\n", fname);
return -1;
}
if (!tracepoint_data_sec) {
fprintf(stderr, "no __tracepoint_strings in file: %s\n", fname);
return -1;
}
if (!mod)
fname = NULL;
etrace.ehdr = ehdr;
tracepoint_check(&etrace, fname);
return 0;
}
int main(int argc, char *argv[])
{
int n_error = 0;
size_t size = 0;
void *addr = NULL;
bool mod = false;
if (argc > 1 && strcmp(argv[1], "--module") == 0) {
mod = true;
argc--;
argv++;
}
if (argc < 2) {
if (mod)
fprintf(stderr, "usage: tracepoint-update --module module...\n");
else
fprintf(stderr, "usage: tracepoint-update vmlinux...\n");
return 0;
}
/* Process each file in turn, allowing deep failure. */
for (int i = 1; i < argc; i++) {
addr = elf_map(argv[i], &size, 1 << ET_REL);
if (!addr) {
++n_error;
continue;
}
if (process_tracepoints(mod, addr, argv[i]))
++n_error;
elf_unmap(addr, size);
}
return !!n_error;
}
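A sketch of how the for_each_shdr_str() iterator composes, mirroring the skip-empty convention used by make_trace_array() above; count_strings() is an illustrative helper, not part of the patch:

	static int count_strings(Elf_Ehdr *ehdr, Elf_Shdr *sec)
	{
		int len, n = 0;

		/* "str" is declared by the macro; zero-length entries are padding */
		for_each_shdr_str(len, ehdr, sec) {
			if (len)
				n++;
		}
		return n;
	}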


@ -741,9 +741,9 @@ if ($start) {
die "Can not find file $bad\n"; die "Can not find file $bad\n";
} }
if ($val eq "good") { if ($val eq "good") {
run_command "cp $output_config $good" or die "failed to copy $config to $good\n"; run_command "cp $output_config $good" or die "failed to copy $output_config to $good\n";
} elsif ($val eq "bad") { } elsif ($val eq "bad") {
run_command "cp $output_config $bad" or die "failed to copy $config to $bad\n"; run_command "cp $output_config $bad" or die "failed to copy $output_config to $bad\n";
} }
} }


@ -22,6 +22,7 @@ echo " --fail-unresolved Treat UNRESOLVED as a failure"
echo " -d|--debug Debug mode (trace all shell commands)" echo " -d|--debug Debug mode (trace all shell commands)"
echo " -l|--logdir <dir> Save logs on the <dir>" echo " -l|--logdir <dir> Save logs on the <dir>"
echo " If <dir> is -, all logs output in console only" echo " If <dir> is -, all logs output in console only"
echo " --rv Run RV selftests instead of ftrace ones"
exit $1 exit $1
} }
@ -133,6 +134,10 @@ parse_opts() { # opts
LINK_PTR= LINK_PTR=
shift 2 shift 2
;; ;;
--rv)
RV_TEST=1
shift 1
;;
*.tc) *.tc)
if [ -f "$1" ]; then if [ -f "$1" ]; then
OPT_TEST_CASES="$OPT_TEST_CASES `abspath $1`" OPT_TEST_CASES="$OPT_TEST_CASES `abspath $1`"
@ -152,9 +157,13 @@ parse_opts() { # opts
;; ;;
esac esac
done done
if [ ! -z "$OPT_TEST_CASES" ]; then if [ -n "$OPT_TEST_CASES" ]; then
TEST_CASES=$OPT_TEST_CASES TEST_CASES=$OPT_TEST_CASES
fi fi
if [ -n "$OPT_TEST_DIR" -a -f "$OPT_TEST_DIR"/test.d/functions ]; then
TOP_DIR=$OPT_TEST_DIR
TEST_DIR=$TOP_DIR/test.d
fi
} }
# Parameters # Parameters
@ -190,10 +199,6 @@ fi
TOP_DIR=`absdir $0` TOP_DIR=`absdir $0`
TEST_DIR=$TOP_DIR/test.d TEST_DIR=$TOP_DIR/test.d
TEST_CASES=`find_testcases $TEST_DIR` TEST_CASES=`find_testcases $TEST_DIR`
LOG_TOP_DIR=$TOP_DIR/logs
LOG_DATE=`date +%Y%m%d-%H%M%S`
LOG_DIR=$LOG_TOP_DIR/$LOG_DATE/
LINK_PTR=$LOG_TOP_DIR/latest
KEEP_LOG=0 KEEP_LOG=0
KTAP=0 KTAP=0
DEBUG=0 DEBUG=0
@ -201,14 +206,23 @@ VERBOSE=0
UNSUPPORTED_RESULT=0 UNSUPPORTED_RESULT=0
UNRESOLVED_RESULT=0 UNRESOLVED_RESULT=0
STOP_FAILURE=0 STOP_FAILURE=0
RV_TEST=0
# Parse command-line options # Parse command-line options
parse_opts $* parse_opts $*
LOG_TOP_DIR=$TOP_DIR/logs
LOG_DATE=`date +%Y%m%d-%H%M%S`
LOG_DIR=$LOG_TOP_DIR/$LOG_DATE/
LINK_PTR=$LOG_TOP_DIR/latest
[ $DEBUG -ne 0 ] && set -x [ $DEBUG -ne 0 ] && set -x
# Verify parameters if [ $RV_TEST -ne 0 ]; then
if [ -z "$TRACING_DIR" -o ! -d "$TRACING_DIR" ]; then TRACING_DIR=$TRACING_DIR/rv
errexit "No ftrace directory found" if [ ! -d "$TRACING_DIR" ]; then
err_ret=$err_skip
errexit "rv is not configured in this kernel"
fi
fi fi
# Preparing logs # Preparing logs
@ -419,7 +433,7 @@ trap 'SIG_RESULT=$XFAIL' $SIG_XFAIL
__run_test() { # testfile __run_test() { # testfile
# setup PID and PPID, $$ is not updated. # setup PID and PPID, $$ is not updated.
(cd $TRACING_DIR; read PID _ < /proc/self/stat; set -e; set -x; (cd $TRACING_DIR; read PID _ < /proc/self/stat; set -e; set -x;
checkreq $1; initialize_ftrace; . $1) checkreq $1; initialize_system; . $1)
[ $? -ne 0 ] && kill -s $SIG_FAIL $SIG_PID [ $? -ne 0 ] && kill -s $SIG_FAIL $SIG_PID
} }
@ -496,7 +510,7 @@ for t in $TEST_CASES; do
exit 1 exit 1
fi fi
done done
(cd $TRACING_DIR; finish_ftrace) # for cleanup (cd $TRACING_DIR; finish_system) # for cleanup
prlog "" prlog ""
prlog "# of passed: " `echo $PASSED_CASES | wc -w` prlog "# of passed: " `echo $PASSED_CASES | wc -w`


@ -28,7 +28,7 @@ unmount_tracefs() {
local mount_point="$1" local mount_point="$1"
# Need to make sure the mount isn't busy so that we can umount it # Need to make sure the mount isn't busy so that we can umount it
(cd $mount_point; finish_ftrace;) (cd $mount_point; finish_system;)
cleanup cleanup
} }


@ -104,7 +104,7 @@ clear_dynamic_events() { # reset all current dynamic events
done done
} }
initialize_ftrace() { # Reset ftrace to initial-state initialize_system() { # Reset ftrace to initial-state
# As the initial state, ftrace will be set to nop tracer, # As the initial state, ftrace will be set to nop tracer,
# no events, no triggers, no filters, no function filters, # no events, no triggers, no filters, no function filters,
# no probes, and tracing on. # no probes, and tracing on.
@ -134,8 +134,8 @@ initialize_ftrace() { # Reset ftrace to initial-state
enable_tracing enable_tracing
} }
finish_ftrace() { finish_system() {
initialize_ftrace initialize_system
# And recover it to default. # And recover it to default.
[ -f options/pause-on-trace ] && echo 0 > options/pause-on-trace [ -f options/pause-on-trace ] && echo 0 > options/pause-on-trace
} }


@ -0,0 +1,2 @@
# SPDX-License-Identifier: GPL-2.0-only
logs


@ -0,0 +1,8 @@
# SPDX-License-Identifier: GPL-2.0
all:
TEST_PROGS := verificationtest-ktap
TEST_FILES := test.d settings
EXTRA_CLEAN := $(OUTPUT)/logs/*
include ../lib.mk


@ -0,0 +1 @@
CONFIG_RV=y


@ -0,0 +1 @@
timeout=0


@ -0,0 +1,39 @@
check_requires() { # Check required files, monitors and reactors
for i in "$@" ; do
p=${i%:program}
m=${i%:monitor}
r=${i%:reactor}
if [ $p != $i ]; then
if ! which $p ; then
echo "Required program $p is not found."
exit_unresolved
fi
elif [ $m != $i ]; then
if ! grep -wq $m available_monitors ; then
echo "Required monitor $m is not configured."
exit_unsupported
fi
elif [ $r != $i ]; then
if ! grep -wq $r available_reactors ; then
echo "Required reactor $r is not configured."
exit_unsupported
fi
elif [ ! -e $i ]; then
echo "Required feature interface $i doesn't exist."
exit_unsupported
fi
done
}
initialize_system() { # Reset RV to initial-state
echo > enabled_monitors
for m in monitors/*; do
echo nop > $m/reactors || true
done
echo 1 > monitoring_on
echo 1 > reacting_on || true
}
finish_system() {
initialize_system
}


@ -0,0 +1,75 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-or-later
# description: Test monitor enable/disable
test_simple_monitor() {
local monitor="$1"
local prefix="$2" # nested monitors
echo 1 > "monitors/$prefix$monitor/enable"
grep -q "$monitor$" enabled_monitors
echo 0 > "monitors/$prefix$monitor/enable"
! grep -q "$monitor$" enabled_monitors
echo "$monitor" >> enabled_monitors
grep -q 1 "monitors/$prefix$monitor/enable"
echo "!$monitor" >> enabled_monitors
grep -q 0 "monitors/$prefix$monitor/enable"
}
test_container_monitor() {
local monitor="$1"
local nested
echo 1 > "monitors/$monitor/enable"
grep -q "^$monitor$" enabled_monitors
for nested_dir in "monitors/$monitor"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
grep -q "^$monitor:$nested$" enabled_monitors
done
test -n "$nested"
echo 0 > "monitors/$monitor/enable"
! grep -q "^$monitor$" enabled_monitors
for nested_dir in "monitors/$monitor"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
! grep -q "^$monitor:$nested$" enabled_monitors
done
echo "$monitor" >> enabled_monitors
grep -q 1 "monitors/$monitor/enable"
for nested_dir in "monitors/$monitor"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
grep -q "^$monitor:$nested$" enabled_monitors
done
echo "!$monitor" >> enabled_monitors
grep -q 0 "monitors/$monitor/enable"
for nested_dir in "monitors/$monitor"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
test_simple_monitor "$nested" "$monitor/"
done
}
for monitor_dir in monitors/*; do
monitor=$(basename "$monitor_dir")
if find "$monitor_dir" -mindepth 1 -type d | grep -q .; then
test_container_monitor "$monitor"
else
test_simple_monitor "$monitor"
fi
done
! echo non_existent_monitor > enabled_monitors
! grep -q "^non_existent_monitor$" enabled_monitors


@ -0,0 +1,68 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-or-later
# description: Test monitor reactor setting
# requires: available_reactors
test_monitor_reactor() {
local monitor="$1"
local prefix="$2" # nested monitors
while read -r reactor; do
[ "$reactor" = nop ] && continue
echo "$reactor" > "monitors/$prefix$monitor/reactors"
grep -q "\\[$reactor\\]" "monitors/$prefix$monitor/reactors"
done < available_reactors
echo nop > "monitors/$prefix$monitor/reactors"
grep -q "\\[nop\\]" "monitors/$prefix$monitor/reactors"
}
test_container_monitor() {
local monitor="$1"
local nested
while read -r reactor; do
[ "$reactor" = nop ] && continue
echo "$reactor" > "monitors/$monitor/reactors"
grep -q "\\[$reactor\\]" "monitors/$monitor/reactors"
for nested_dir in "monitors/$monitor"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
grep -q "\\[$reactor\\]" "monitors/$monitor/$nested/reactors"
done
done < available_reactors
test -n "$nested"
echo nop > "monitors/$monitor/reactors"
grep -q "\\[nop\\]" "monitors/$monitor/reactors"
for nested_dir in "monitors/$monitor"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
grep -q "\\[nop\\]" "monitors/$monitor/$nested/reactors"
done
for nested_dir in "monitors/$monitor"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
test_monitor_reactor "$nested" "$monitor/"
done
}
for monitor_dir in monitors/*; do
monitor=$(basename "$monitor_dir")
if find "$monitor_dir" -mindepth 1 -type d | grep -q .; then
test_container_monitor "$monitor"
else
test_monitor_reactor "$monitor"
fi
done
monitor=$(ls /sys/kernel/tracing/rv/monitors -1 | head -n 1)
test -f "monitors/$monitor/reactors"
! echo non_existent_reactor > "monitors/$monitor/reactors"
! grep -q "\\[non_existent_reactor\\]" "monitors/$monitor/reactors"


@ -0,0 +1,18 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-or-later
# description: Check available monitors
for monitor_dir in monitors/*; do
monitor=$(basename "$monitor_dir")
grep -q "^$monitor$" available_monitors
grep -q . "$monitor_dir"/desc
for nested_dir in "$monitor_dir"/*; do
[ -d "$nested_dir" ] || continue
nested=$(basename "$nested_dir")
grep -q "^$monitor:$nested$" available_monitors
grep -q . "$nested_dir"/desc
done
done


@ -0,0 +1,30 @@
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-or-later
# description: Test wwnr monitor with printk reactor
# requires: available_reactors wwnr:monitor printk:reactor stress-ng:program
load() { # returns true if there was a reaction
local lines_before num
num=$((($(nproc) + 1) / 2))
lines_before=$(dmesg | wc -l)
stress-ng --cpu-sched "$num" --timer "$num" -t 5 -q
dmesg | tail -n $((lines_before + 1)) | grep -q "rv: monitor wwnr does not allow event"
}
echo 1 > monitors/wwnr/enable
echo printk > monitors/wwnr/reactors
load
echo 0 > monitoring_on
! load
echo 1 > monitoring_on
load
echo 0 > reacting_on
! load
echo 1 > reacting_on
echo nop > monitors/wwnr/reactors
echo 0 > monitors/wwnr/enable


@ -0,0 +1,8 @@
#!/bin/sh -e
# SPDX-License-Identifier: GPL-2.0-only
#
# ftracetest-ktap: Wrapper to integrate ftracetest with the kselftest runner
#
# Copyright (C) Arm Ltd., 2023
../ftrace/ftracetest -K -v --rv ../verification


@ -18,7 +18,7 @@ export CC AR STRIP PKG_CONFIG LD_SO_CONF_PATH LDCONFIG
FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \ FOPTS := -flto=auto -ffat-lto-objects -fexceptions -fstack-protector-strong \
-fasynchronous-unwind-tables -fstack-clash-protection -fasynchronous-unwind-tables -fstack-clash-protection
WOPTS := -O -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 \ WOPTS := -O -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 \
-Wp,-D_GLIBCXX_ASSERTIONS -Wno-maybe-uninitialized -Wp,-D_GLIBCXX_ASSERTIONS
ifeq ($(CC),clang) ifeq ($(CC),clang)
FOPTS := $(filter-out -flto=auto -ffat-lto-objects, $(FOPTS)) FOPTS := $(filter-out -flto=auto -ffat-lto-objects, $(FOPTS))


@ -268,6 +268,10 @@ int top_main_loop(struct osnoise_tool *tool)
tool->ops->print_stats(tool); tool->ops->print_stats(tool);
if (osnoise_trace_is_off(tool, record)) { if (osnoise_trace_is_off(tool, record)) {
if (stop_tracing)
/* stop tracing requested, do not perform actions */
return 0;
actions_perform(&params->threshold_actions); actions_perform(&params->threshold_actions);
if (!params->threshold_actions.continue_flag) if (!params->threshold_actions.continue_flag)
@ -315,20 +319,22 @@ int hist_main_loop(struct osnoise_tool *tool)
} }
if (osnoise_trace_is_off(tool, tool->record)) { if (osnoise_trace_is_off(tool, tool->record)) {
if (stop_tracing)
/* stop tracing requested, do not perform actions */
break;
actions_perform(&params->threshold_actions); actions_perform(&params->threshold_actions);
if (!params->threshold_actions.continue_flag) { if (!params->threshold_actions.continue_flag)
/* continue flag not set, break */ /* continue flag not set, break */
break; break;
/* continue action reached, re-enable tracing */ /* continue action reached, re-enable tracing */
if (tool->record) if (tool->record)
trace_instance_start(&tool->record->trace); trace_instance_start(&tool->record->trace);
if (tool->aa) if (tool->aa)
trace_instance_start(&tool->aa->trace); trace_instance_start(&tool->aa->trace);
trace_instance_start(&tool->trace); trace_instance_start(&tool->trace);
}
break;
} }
/* is there still any user-threads ? */ /* is there still any user-threads ? */


@ -107,6 +107,10 @@ struct common_params {
struct timerlat_u_params user; struct timerlat_u_params user;
}; };
#define for_each_monitored_cpu(cpu, nr_cpus, common) \
for (cpu = 0; cpu < nr_cpus; cpu++) \
if (!(common)->cpus || CPU_ISSET(cpu, &(common)->monitored_cpus))
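This helper folds the cpu-set filter into the loop header, replacing the open-coded CPU_ISSET() checks in the osnoise_hist printers converted below. A hedged usage sketch, with data and params shaped as in those callers:

	int cpu;

	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
		if (!data->hist[cpu].count)
			continue;
		/* ... print the per-cpu column ... */
	}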
struct tool_ops; struct tool_ops;
/* /*


@ -247,9 +247,7 @@ static void osnoise_hist_header(struct osnoise_tool *tool)
if (!params->common.hist.no_index) if (!params->common.hist.no_index)
trace_seq_printf(s, "Index"); trace_seq_printf(s, "Index");
for (cpu = 0; cpu < data->nr_cpus; cpu++) { for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
continue;
if (!data->hist[cpu].count) if (!data->hist[cpu].count)
continue; continue;
@ -278,9 +276,7 @@ osnoise_print_summary(struct osnoise_params *params,
if (!params->common.hist.no_index) if (!params->common.hist.no_index)
trace_seq_printf(trace->seq, "count:"); trace_seq_printf(trace->seq, "count:");
for (cpu = 0; cpu < data->nr_cpus; cpu++) { for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
continue;
if (!data->hist[cpu].count) if (!data->hist[cpu].count)
continue; continue;
@ -292,9 +288,7 @@ osnoise_print_summary(struct osnoise_params *params,
if (!params->common.hist.no_index) if (!params->common.hist.no_index)
trace_seq_printf(trace->seq, "min: "); trace_seq_printf(trace->seq, "min: ");
for (cpu = 0; cpu < data->nr_cpus; cpu++) { for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
continue;
if (!data->hist[cpu].count) if (!data->hist[cpu].count)
continue; continue;
@ -307,9 +301,7 @@ osnoise_print_summary(struct osnoise_params *params,
if (!params->common.hist.no_index) if (!params->common.hist.no_index)
trace_seq_printf(trace->seq, "avg: "); trace_seq_printf(trace->seq, "avg: ");
for (cpu = 0; cpu < data->nr_cpus; cpu++) { for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
continue;
if (!data->hist[cpu].count) if (!data->hist[cpu].count)
continue; continue;
@ -325,9 +317,7 @@ osnoise_print_summary(struct osnoise_params *params,
if (!params->common.hist.no_index) if (!params->common.hist.no_index)
trace_seq_printf(trace->seq, "max: "); trace_seq_printf(trace->seq, "max: ");
for (cpu = 0; cpu < data->nr_cpus; cpu++) { for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
continue;
if (!data->hist[cpu].count) if (!data->hist[cpu].count)
continue; continue;
@ -362,9 +352,7 @@ osnoise_print_stats(struct osnoise_tool *tool)
trace_seq_printf(trace->seq, "%-6d", trace_seq_printf(trace->seq, "%-6d",
bucket * data->bucket_size); bucket * data->bucket_size);
for (cpu = 0; cpu < data->nr_cpus; cpu++) { for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
continue;
if (!data->hist[cpu].count) if (!data->hist[cpu].count)
continue; continue;
@ -400,9 +388,7 @@ osnoise_print_stats(struct osnoise_tool *tool)
if (!params->common.hist.no_index) if (!params->common.hist.no_index)
trace_seq_printf(trace->seq, "over: "); trace_seq_printf(trace->seq, "over: ");
for (cpu = 0; cpu < data->nr_cpus; cpu++) { for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
continue;
if (!data->hist[cpu].count) if (!data->hist[cpu].count)
continue; continue;
@@ -421,16 +407,16 @@ osnoise_print_stats(struct osnoise_tool *tool)
 /*
  * osnoise_hist_usage - prints osnoise hist usage message
  */
-static void osnoise_hist_usage(char *usage)
+static void osnoise_hist_usage(void)
 {
 	int i;
 
 	static const char * const msg[] = {
 		"",
 		" usage: rtla osnoise hist [-h] [-D] [-d s] [-a us] [-p us] [-r us] [-s us] [-S us] \\",
-		" [-T us] [-t[file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] \\",
+		" [-T us] [-t [file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] \\",
 		" [-c cpu-list] [-H cpu-list] [-P priority] [-b N] [-E N] [--no-header] [--no-summary] \\",
-		" [--no-index] [--with-zeros] [-C[=cgroup_name]] [--warm-up]",
+		" [--no-index] [--with-zeros] [-C [cgroup_name]] [--warm-up]",
 		"",
 		" -h/--help: print this menu",
 		" -a/--auto: set automatic trace mode, stopping the session if argument in us sample is hit",
@@ -441,10 +427,10 @@ static void osnoise_hist_usage(char *usage)
 		" -T/--threshold us: the minimum delta to be considered a noise",
 		" -c/--cpus cpu-list: list of cpus to run osnoise threads",
 		" -H/--house-keeping cpus: run rtla control threads only on the given cpus",
-		" -C/--cgroup[=cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
+		" -C/--cgroup [cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
 		" -d/--duration time[s|m|h|d]: duration of the session",
 		" -D/--debug: print debug info",
-		" -t/--trace[file]: save the stopped trace to [file|osnoise_trace.txt]",
+		" -t/--trace [file]: save the stopped trace to [file|osnoise_trace.txt]",
 		" -e/--event <sys:event>: enable the <sys:event> in the trace instance, multiple -e are allowed",
 		" --filter <filter>: enable a trace event filter to the previous -e event",
 		" --trigger <trigger>: enable a trace event trigger to the previous -e event",
@@ -467,18 +453,12 @@ static void osnoise_hist_usage(char *usage)
 		NULL,
 	};
 
-	if (usage)
-		fprintf(stderr, "%s\n", usage);
-
 	fprintf(stderr, "rtla osnoise hist: a per-cpu histogram of the OS noise (version %s)\n",
 		VERSION);
 
 	for (i = 0; msg[i]; i++)
 		fprintf(stderr, "%s\n", msg[i]);
 
-	if (usage)
-		exit(EXIT_FAILURE);
-
 	exit(EXIT_SUCCESS);
 }
@@ -538,11 +518,8 @@ static struct common_params
 			{0, 0, 0, 0}
 		};
 
-		/* getopt_long stores the option index here. */
-		int option_index = 0;
-
 		c = getopt_long(argc, argv, "a:c:C::b:d:e:E:DhH:p:P:r:s:S:t::T:01234:5:6:7:",
-				long_options, &option_index);
+				long_options, NULL);
 
 		/* detect the end of the options. */
 		if (c == -1)
@@ -557,30 +534,25 @@ static struct common_params
 			params->threshold = 1;
 
 			/* set trace */
-			trace_output = "osnoise_trace.txt";
+			if (!trace_output)
+				trace_output = "osnoise_trace.txt";
 			break;
 		case 'b':
 			params->common.hist.bucket_size = get_llong_from_str(optarg);
 			if (params->common.hist.bucket_size == 0 ||
 			    params->common.hist.bucket_size >= 1000000)
-				osnoise_hist_usage("Bucket size needs to be > 0 and <= 1000000\n");
+				fatal("Bucket size needs to be > 0 and <= 1000000");
 			break;
 		case 'c':
 			retval = parse_cpu_set(optarg, &params->common.monitored_cpus);
 			if (retval)
-				osnoise_hist_usage("\nInvalid -c cpu list\n");
+				fatal("Invalid -c cpu list");
 			params->common.cpus = optarg;
 			break;
 		case 'C':
 			params->common.cgroup = 1;
-			if (!optarg) {
-				/* will inherit this cgroup */
-				params->common.cgroup_name = NULL;
-			} else if (*optarg == '=') {
-				/* skip the = */
-				params->common.cgroup_name = ++optarg;
-			}
+			params->common.cgroup_name = parse_optional_arg(argc, argv);
 			break;
 		case 'D':
 			config_debug = 1;
@@ -588,14 +560,12 @@ static struct common_params
 		case 'd':
 			params->common.duration = parse_seconds_duration(optarg);
 			if (!params->common.duration)
-				osnoise_hist_usage("Invalid -D duration\n");
+				fatal("Invalid -D duration");
 			break;
 		case 'e':
 			tevent = trace_event_alloc(optarg);
-			if (!tevent) {
-				err_msg("Error alloc trace event");
-				exit(EXIT_FAILURE);
-			}
+			if (!tevent)
+				fatal("Error alloc trace event");
 
 			if (params->common.events)
 				tevent->next = params->common.events;
@@ -606,35 +576,33 @@ static struct common_params
 			params->common.hist.entries = get_llong_from_str(optarg);
 			if (params->common.hist.entries < 10 ||
 			    params->common.hist.entries > 9999999)
-				osnoise_hist_usage("Entries must be > 10 and < 9999999\n");
+				fatal("Entries must be > 10 and < 9999999");
 			break;
 		case 'h':
 		case '?':
-			osnoise_hist_usage(NULL);
+			osnoise_hist_usage();
 			break;
 		case 'H':
 			params->common.hk_cpus = 1;
 			retval = parse_cpu_set(optarg, &params->common.hk_cpu_set);
-			if (retval) {
-				err_msg("Error parsing house keeping CPUs\n");
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Error parsing house keeping CPUs");
 			break;
 		case 'p':
 			params->period = get_llong_from_str(optarg);
 			if (params->period > 10000000)
-				osnoise_hist_usage("Period longer than 10 s\n");
+				fatal("Period longer than 10 s");
 			break;
 		case 'P':
 			retval = parse_prio(optarg, &params->common.sched_param);
 			if (retval == -1)
-				osnoise_hist_usage("Invalid -P priority");
+				fatal("Invalid -P priority");
 			params->common.set_sched = 1;
 			break;
 		case 'r':
 			params->runtime = get_llong_from_str(optarg);
 			if (params->runtime < 100)
-				osnoise_hist_usage("Runtime shorter than 100 us\n");
+				fatal("Runtime shorter than 100 us");
 			break;
 		case 's':
 			params->common.stop_us = get_llong_from_str(optarg);
@@ -646,14 +614,8 @@ static struct common_params
 			params->threshold = get_llong_from_str(optarg);
 			break;
 		case 't':
-			if (optarg) {
-				if (optarg[0] == '=')
-					trace_output = &optarg[1];
-				else
-					trace_output = &optarg[0];
-			} else if (optind < argc && argv[optind][0] != '0')
-				trace_output = argv[optind];
-			else
+			trace_output = parse_optional_arg(argc, argv);
+			if (!trace_output)
 				trace_output = "osnoise_trace.txt";
 			break;
 		case '0': /* no header */
@@ -671,23 +633,19 @@ static struct common_params
 		case '4': /* trigger */
 			if (params->common.events) {
 				retval = trace_event_add_trigger(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding trigger %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding trigger %s", optarg);
 			} else {
-				osnoise_hist_usage("--trigger requires a previous -e\n");
+				fatal("--trigger requires a previous -e");
 			}
 			break;
 		case '5': /* filter */
 			if (params->common.events) {
 				retval = trace_event_add_filter(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding filter %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding filter %s", optarg);
 			} else {
-				osnoise_hist_usage("--filter requires a previous -e\n");
+				fatal("--filter requires a previous -e");
 			}
 			break;
 		case '6':
@@ -699,34 +657,28 @@ static struct common_params
 		case '8':
 			retval = actions_parse(&params->common.threshold_actions, optarg,
 					       "osnoise_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		case '9':
 			retval = actions_parse(&params->common.end_actions, optarg,
 					       "osnoise_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		default:
-			osnoise_hist_usage("Invalid option");
+			fatal("Invalid option");
 		}
 	}
 
 	if (trace_output)
 		actions_add_trace_output(&params->common.threshold_actions, trace_output);
 
-	if (geteuid()) {
-		err_msg("rtla needs root permission\n");
-		exit(EXIT_FAILURE);
-	}
+	if (geteuid())
+		fatal("rtla needs root permission");
 
 	if (params->common.hist.no_index && !params->common.hist.with_zeros)
-		osnoise_hist_usage("no-index set and with-zeros not set - it does not make sense");
+		fatal("no-index set and with-zeros not set - it does not make sense");
 
 	return &params->common;
 }


@@ -243,9 +243,7 @@ osnoise_print_stats(struct osnoise_tool *top)
 	osnoise_top_header(top);
 
-	for (i = 0; i < nr_cpus; i++) {
-		if (params->common.cpus && !CPU_ISSET(i, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(i, nr_cpus, &params->common) {
 		osnoise_top_print(top, i);
 	}
@@ -257,14 +255,14 @@ osnoise_print_stats(struct osnoise_tool *top)
 /*
  * osnoise_top_usage - prints osnoise top usage message
  */
-static void osnoise_top_usage(struct osnoise_params *params, char *usage)
+static void osnoise_top_usage(struct osnoise_params *params)
 {
 	int i;
 
 	static const char * const msg[] = {
 		" [-h] [-q] [-D] [-d s] [-a us] [-p us] [-r us] [-s us] [-S us] \\",
-		" [-T us] [-t[file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] \\",
-		" [-c cpu-list] [-H cpu-list] [-P priority] [-C[=cgroup_name]] [--warm-up s]",
+		" [-T us] [-t [file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] \\",
+		" [-c cpu-list] [-H cpu-list] [-P priority] [-C [cgroup_name]] [--warm-up s]",
 		"",
 		" -h/--help: print this menu",
 		" -a/--auto: set automatic trace mode, stopping the session if argument in us sample is hit",
@@ -275,10 +273,10 @@ static void osnoise_top_usage(struct osnoise_params *params, char *usage)
 		" -T/--threshold us: the minimum delta to be considered a noise",
 		" -c/--cpus cpu-list: list of cpus to run osnoise threads",
 		" -H/--house-keeping cpus: run rtla control threads only on the given cpus",
-		" -C/--cgroup[=cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
+		" -C/--cgroup [cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
 		" -d/--duration time[s|m|h|d]: duration of the session",
 		" -D/--debug: print debug info",
-		" -t/--trace[file]: save the stopped trace to [file|osnoise_trace.txt]",
+		" -t/--trace [file]: save the stopped trace to [file|osnoise_trace.txt]",
 		" -e/--event <sys:event>: enable the <sys:event> in the trace instance, multiple -e are allowed",
 		" --filter <filter>: enable a trace event filter to the previous -e event",
 		" --trigger <trigger>: enable a trace event trigger to the previous -e event",
@@ -296,9 +294,6 @@ static void osnoise_top_usage(struct osnoise_params *params, char *usage)
 		NULL,
 	};
 
-	if (usage)
-		fprintf(stderr, "%s\n", usage);
-
 	if (params->mode == MODE_OSNOISE) {
 		fprintf(stderr,
 			"rtla osnoise top: a per-cpu summary of the OS noise (version %s)\n",
@@ -318,9 +313,6 @@ static void osnoise_top_usage(struct osnoise_params *params, char *usage)
 	for (i = 0; msg[i]; i++)
 		fprintf(stderr, "%s\n", msg[i]);
 
-	if (usage)
-		exit(EXIT_FAILURE);
-
 	exit(EXIT_SUCCESS);
 }
@@ -378,11 +370,8 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 			{0, 0, 0, 0}
 		};
 
-		/* getopt_long stores the option index here. */
-		int option_index = 0;
-
 		c = getopt_long(argc, argv, "a:c:C::d:De:hH:p:P:qr:s:S:t::T:0:1:2:3:",
-				long_options, &option_index);
+				long_options, NULL);
 
 		/* Detect the end of the options. */
 		if (c == -1)
@@ -397,24 +386,19 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 			params->threshold = 1;
 
 			/* set trace */
-			trace_output = "osnoise_trace.txt";
+			if (!trace_output)
+				trace_output = "osnoise_trace.txt";
 			break;
 		case 'c':
 			retval = parse_cpu_set(optarg, &params->common.monitored_cpus);
 			if (retval)
-				osnoise_top_usage(params, "\nInvalid -c cpu list\n");
+				fatal("Invalid -c cpu list");
 			params->common.cpus = optarg;
 			break;
 		case 'C':
 			params->common.cgroup = 1;
-			if (!optarg) {
-				/* will inherit this cgroup */
-				params->common.cgroup_name = NULL;
-			} else if (*optarg == '=') {
-				/* skip the = */
-				params->common.cgroup_name = ++optarg;
-			}
+			params->common.cgroup_name = parse_optional_arg(argc, argv);
 			break;
 		case 'D':
 			config_debug = 1;
@@ -422,14 +406,12 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 		case 'd':
 			params->common.duration = parse_seconds_duration(optarg);
 			if (!params->common.duration)
-				osnoise_top_usage(params, "Invalid -d duration\n");
+				fatal("Invalid -d duration");
 			break;
 		case 'e':
 			tevent = trace_event_alloc(optarg);
-			if (!tevent) {
-				err_msg("Error alloc trace event");
-				exit(EXIT_FAILURE);
-			}
+			if (!tevent)
+				fatal("Error alloc trace event");
 
 			if (params->common.events)
 				tevent->next = params->common.events;
@@ -438,25 +420,23 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 			break;
 		case 'h':
 		case '?':
-			osnoise_top_usage(params, NULL);
+			osnoise_top_usage(params);
 			break;
 		case 'H':
 			params->common.hk_cpus = 1;
 			retval = parse_cpu_set(optarg, &params->common.hk_cpu_set);
-			if (retval) {
-				err_msg("Error parsing house keeping CPUs\n");
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Error parsing house keeping CPUs");
 			break;
 		case 'p':
 			params->period = get_llong_from_str(optarg);
 			if (params->period > 10000000)
-				osnoise_top_usage(params, "Period longer than 10 s\n");
+				fatal("Period longer than 10 s");
 			break;
 		case 'P':
 			retval = parse_prio(optarg, &params->common.sched_param);
 			if (retval == -1)
-				osnoise_top_usage(params, "Invalid -P priority");
+				fatal("Invalid -P priority");
 			params->common.set_sched = 1;
 			break;
 		case 'q':
@@ -465,7 +445,7 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 		case 'r':
 			params->runtime = get_llong_from_str(optarg);
 			if (params->runtime < 100)
-				osnoise_top_usage(params, "Runtime shorter than 100 us\n");
+				fatal("Runtime shorter than 100 us");
 			break;
 		case 's':
 			params->common.stop_us = get_llong_from_str(optarg);
@@ -474,14 +454,8 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 			params->common.stop_total_us = get_llong_from_str(optarg);
 			break;
 		case 't':
-			if (optarg) {
-				if (optarg[0] == '=')
-					trace_output = &optarg[1];
-				else
-					trace_output = &optarg[0];
-			} else if (optind < argc && argv[optind][0] != '-')
-				trace_output = argv[optind];
-			else
+			trace_output = parse_optional_arg(argc, argv);
+			if (!trace_output)
 				trace_output = "osnoise_trace.txt";
 			break;
 		case 'T':
@@ -490,23 +464,19 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 		case '0': /* trigger */
 			if (params->common.events) {
 				retval = trace_event_add_trigger(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding trigger %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding trigger %s", optarg);
 			} else {
-				osnoise_top_usage(params, "--trigger requires a previous -e\n");
+				fatal("--trigger requires a previous -e");
 			}
 			break;
 		case '1': /* filter */
 			if (params->common.events) {
 				retval = trace_event_add_filter(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding filter %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding filter %s", optarg);
 			} else {
-				osnoise_top_usage(params, "--filter requires a previous -e\n");
+				fatal("--filter requires a previous -e");
 			}
 			break;
 		case '2':
@@ -518,31 +488,25 @@ struct common_params *osnoise_top_parse_args(int argc, char **argv)
 		case '4':
 			retval = actions_parse(&params->common.threshold_actions, optarg,
 					       "osnoise_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		case '5':
 			retval = actions_parse(&params->common.end_actions, optarg,
 					       "osnoise_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		default:
-			osnoise_top_usage(params, "Invalid option");
+			fatal("Invalid option");
 		}
 	}
 
 	if (trace_output)
 		actions_add_trace_output(&params->common.threshold_actions, trace_output);
 
-	if (geteuid()) {
-		err_msg("osnoise needs root permission\n");
-		exit(EXIT_FAILURE);
-	}
+	if (geteuid())
+		fatal("osnoise needs root permission");
 
 	return &params->common;
 }


@@ -148,6 +148,9 @@ int handle_timerlat_sample(struct trace_event_raw_timerlat_sample *tp_args)
 	} else {
 		update_main_hist(&hist_user, bucket);
 		update_summary(&summary_user, latency, bucket);
+
+		if (thread_threshold != 0 && latency_us >= thread_threshold)
+			set_stop_tracing();
 	}
 
 	return 0;


@@ -126,9 +126,7 @@ int timerlat_enable(struct osnoise_tool *tool)
 	nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
 
-	for (i = 0; i < nr_cpus; i++) {
-		if (params->common.cpus && !CPU_ISSET(i, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(i, nr_cpus, &params->common) {
 		if (save_cpu_idle_disable_state(i) < 0) {
 			err_msg("Could not save cpu idle state.\n");
 			return -1;
@@ -215,16 +213,14 @@ void timerlat_analyze(struct osnoise_tool *tool, bool stopped)
 void timerlat_free(struct osnoise_tool *tool)
 {
 	struct timerlat_params *params = to_timerlat_params(tool->params);
-	int nr_cpus, i;
+	int nr_cpus = sysconf(_SC_NPROCESSORS_CONF);
+	int i;
 
 	timerlat_aa_destroy();
 
 	if (dma_latency_fd >= 0)
 		close(dma_latency_fd);
 
 	if (params->deepest_idle_state >= -1) {
-		for (i = 0; i < nr_cpus; i++) {
-			if (params->common.cpus &&
-			    !CPU_ISSET(i, &params->common.monitored_cpus))
-				continue;
+		for_each_monitored_cpu(i, nr_cpus, &params->common) {
 			restore_cpu_idle_disable_state(i);
 		}
 	}


@@ -305,9 +305,7 @@ static void timerlat_hist_header(struct osnoise_tool *tool)
 	if (!params->common.hist.no_index)
 		trace_seq_printf(s, "Index");
 
-	for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-		if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 		if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 			continue;
@@ -359,9 +357,7 @@ timerlat_print_summary(struct timerlat_params *params,
 	if (!params->common.hist.no_index)
 		trace_seq_printf(trace->seq, "count:");
 
-	for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-		if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 		if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 			continue;
@@ -383,9 +379,7 @@ timerlat_print_summary(struct timerlat_params *params,
 	if (!params->common.hist.no_index)
 		trace_seq_printf(trace->seq, "min: ");
 
-	for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-		if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 		if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 			continue;
@@ -413,9 +407,7 @@ timerlat_print_summary(struct timerlat_params *params,
 	if (!params->common.hist.no_index)
 		trace_seq_printf(trace->seq, "avg: ");
 
-	for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-		if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 		if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 			continue;
@@ -443,9 +435,7 @@ timerlat_print_summary(struct timerlat_params *params,
 	if (!params->common.hist.no_index)
 		trace_seq_printf(trace->seq, "max: ");
 
-	for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-		if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 		if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 			continue;
@@ -490,9 +480,7 @@ timerlat_print_stats_all(struct timerlat_params *params,
 	sum.min_thread = ~0;
 	sum.min_user = ~0;
 
-	for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-		if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 		if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 			continue;
@@ -639,9 +627,7 @@ timerlat_print_stats(struct osnoise_tool *tool)
 		trace_seq_printf(trace->seq, "%-6d",
 				 bucket * data->bucket_size);
 
-		for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-			if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-				continue;
+		for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 			if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 				continue;
@@ -679,9 +665,7 @@ timerlat_print_stats(struct osnoise_tool *tool)
 	if (!params->common.hist.no_index)
 		trace_seq_printf(trace->seq, "over: ");
 
-	for (cpu = 0; cpu < data->nr_cpus; cpu++) {
-		if (params->common.cpus && !CPU_ISSET(cpu, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(cpu, data->nr_cpus, &params->common) {
 		if (!data->hist[cpu].irq_count && !data->hist[cpu].thread_count)
 			continue;
@@ -710,16 +694,16 @@ timerlat_print_stats(struct osnoise_tool *tool)
 /*
  * timerlat_hist_usage - prints timerlat top usage message
  */
-static void timerlat_hist_usage(char *usage)
+static void timerlat_hist_usage(void)
 {
 	int i;
 
-	char *msg[] = {
+	static const char * const msg[] = {
 		"",
 		" usage: [rtla] timerlat hist [-h] [-q] [-d s] [-D] [-n] [-a us] [-p us] [-i us] [-T us] [-s us] \\",
-		" [-t[file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] [-c cpu-list] [-H cpu-list]\\",
+		" [-t [file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] [-c cpu-list] [-H cpu-list]\\",
 		" [-P priority] [-E N] [-b N] [--no-irq] [--no-thread] [--no-header] [--no-summary] \\",
-		" [--no-index] [--with-zeros] [--dma-latency us] [-C[=cgroup_name]] [--no-aa] [--dump-task] [-u|-k]",
+		" [--no-index] [--with-zeros] [--dma-latency us] [-C [cgroup_name]] [--no-aa] [--dump-task] [-u|-k]",
 		" [--warm-up s] [--deepest-idle-state n]",
 		"",
 		" -h/--help: print this menu",
@@ -730,11 +714,11 @@ static void timerlat_hist_usage(char *usage)
 		" -s/--stack us: save the stack trace at the IRQ if a thread latency is higher than the argument in us",
 		" -c/--cpus cpus: run the tracer only on the given cpus",
 		" -H/--house-keeping cpus: run rtla control threads only on the given cpus",
-		" -C/--cgroup[=cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
+		" -C/--cgroup [cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
 		" -d/--duration time[m|h|d]: duration of the session in seconds",
 		" --dump-tasks: prints the task running on all CPUs if stop conditions are met (depends on !--no-aa)",
 		" -D/--debug: print debug info",
-		" -t/--trace[file]: save the stopped trace to [file|timerlat_trace.txt]",
+		" -t/--trace [file]: save the stopped trace to [file|timerlat_trace.txt]",
 		" -e/--event <sys:event>: enable the <sys:event> in the trace instance, multiple -e are allowed",
 		" --filter <filter>: enable a trace event filter to the previous -e event",
 		" --trigger <trigger>: enable a trace event trigger to the previous -e event",
@@ -766,18 +750,12 @@ static void timerlat_hist_usage(char *usage)
 		NULL,
 	};
 
-	if (usage)
-		fprintf(stderr, "%s\n", usage);
-
 	fprintf(stderr, "rtla timerlat hist: a per-cpu histogram of the timer latency (version %s)\n",
 		VERSION);
 
 	for (i = 0; msg[i]; i++)
 		fprintf(stderr, "%s\n", msg[i]);
 
-	if (usage)
-		exit(EXIT_FAILURE);
-
 	exit(EXIT_SUCCESS);
 }
@@ -856,11 +834,8 @@ static struct common_params
 			{0, 0, 0, 0}
 		};
 
-		/* getopt_long stores the option index here. */
-		int option_index = 0;
-
 		c = getopt_long(argc, argv, "a:c:C::b:d:e:E:DhH:i:knp:P:s:t::T:uU0123456:7:8:9\1\2:\3:",
-				long_options, &option_index);
+				long_options, NULL);
 
 		/* detect the end of the options. */
 		if (c == -1)
@@ -878,30 +853,25 @@ static struct common_params
 			params->print_stack = auto_thresh;
 
 			/* set trace */
-			trace_output = "timerlat_trace.txt";
+			if (!trace_output)
+				trace_output = "timerlat_trace.txt";
 			break;
 		case 'c':
 			retval = parse_cpu_set(optarg, &params->common.monitored_cpus);
 			if (retval)
-				timerlat_hist_usage("\nInvalid -c cpu list\n");
+				fatal("Invalid -c cpu list");
 			params->common.cpus = optarg;
 			break;
 		case 'C':
 			params->common.cgroup = 1;
-			if (!optarg) {
-				/* will inherit this cgroup */
-				params->common.cgroup_name = NULL;
-			} else if (*optarg == '=') {
-				/* skip the = */
-				params->common.cgroup_name = ++optarg;
-			}
+			params->common.cgroup_name = parse_optional_arg(argc, argv);
 			break;
 		case 'b':
 			params->common.hist.bucket_size = get_llong_from_str(optarg);
 			if (params->common.hist.bucket_size == 0 ||
 			    params->common.hist.bucket_size >= 1000000)
-				timerlat_hist_usage("Bucket size needs to be > 0 and <= 1000000\n");
+				fatal("Bucket size needs to be > 0 and <= 1000000");
 			break;
 		case 'D':
 			config_debug = 1;
@@ -909,14 +879,12 @@ static struct common_params
 		case 'd':
 			params->common.duration = parse_seconds_duration(optarg);
 			if (!params->common.duration)
-				timerlat_hist_usage("Invalid -D duration\n");
+				fatal("Invalid -D duration");
 			break;
 		case 'e':
 			tevent = trace_event_alloc(optarg);
-			if (!tevent) {
-				err_msg("Error alloc trace event");
-				exit(EXIT_FAILURE);
-			}
+			if (!tevent)
+				fatal("Error alloc trace event");
 
 			if (params->common.events)
 				tevent->next = params->common.events;
@@ -927,19 +895,17 @@ static struct common_params
 			params->common.hist.entries = get_llong_from_str(optarg);
 			if (params->common.hist.entries < 10 ||
 			    params->common.hist.entries > 9999999)
-				timerlat_hist_usage("Entries must be > 10 and < 9999999\n");
+				fatal("Entries must be > 10 and < 9999999");
 			break;
 		case 'h':
 		case '?':
-			timerlat_hist_usage(NULL);
+			timerlat_hist_usage();
 			break;
 		case 'H':
 			params->common.hk_cpus = 1;
 			retval = parse_cpu_set(optarg, &params->common.hk_cpu_set);
-			if (retval) {
-				err_msg("Error parsing house keeping CPUs\n");
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Error parsing house keeping CPUs");
 			break;
 		case 'i':
 			params->common.stop_us = get_llong_from_str(optarg);
@@ -953,12 +919,12 @@ static struct common_params
 		case 'p':
 			params->timerlat_period_us = get_llong_from_str(optarg);
 			if (params->timerlat_period_us > 1000000)
-				timerlat_hist_usage("Period longer than 1 s\n");
+				fatal("Period longer than 1 s");
 			break;
 		case 'P':
 			retval = parse_prio(optarg, &params->common.sched_param);
 			if (retval == -1)
-				timerlat_hist_usage("Invalid -P priority");
+				fatal("Invalid -P priority");
 			params->common.set_sched = 1;
 			break;
 		case 's':
@@ -968,14 +934,8 @@ static struct common_params
 			params->common.stop_total_us = get_llong_from_str(optarg);
 			break;
 		case 't':
-			if (optarg) {
-				if (optarg[0] == '=')
-					trace_output = &optarg[1];
-				else
-					trace_output = &optarg[0];
-			} else if (optind < argc && argv[optind][0] != '-')
-				trace_output = argv[optind];
-			else
+			trace_output = parse_optional_arg(argc, argv);
+			if (!trace_output)
 				trace_output = "timerlat_trace.txt";
 			break;
 		case 'u':
@@ -1005,31 +965,25 @@ static struct common_params
 		case '6': /* trigger */
 			if (params->common.events) {
 				retval = trace_event_add_trigger(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding trigger %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding trigger %s", optarg);
 			} else {
-				timerlat_hist_usage("--trigger requires a previous -e\n");
+				fatal("--trigger requires a previous -e");
 			}
 			break;
 		case '7': /* filter */
 			if (params->common.events) {
 				retval = trace_event_add_filter(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding filter %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding filter %s", optarg);
 			} else {
-				timerlat_hist_usage("--filter requires a previous -e\n");
+				fatal("--filter requires a previous -e");
 			}
 			break;
 		case '8':
 			params->dma_latency = get_llong_from_str(optarg);
-			if (params->dma_latency < 0 || params->dma_latency > 10000) {
-				err_msg("--dma-latency needs to be >= 0 and < 10000");
-				exit(EXIT_FAILURE);
-			}
+			if (params->dma_latency < 0 || params->dma_latency > 10000)
+				fatal("--dma-latency needs to be >= 0 and < 10000");
 			break;
 		case '9':
 			params->no_aa = 1;
@@ -1049,37 +1003,31 @@ static struct common_params
 		case '\5':
 			retval = actions_parse(&params->common.threshold_actions, optarg,
 					       "timerlat_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		case '\6':
 			retval = actions_parse(&params->common.end_actions, optarg,
 					       "timerlat_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		default:
-			timerlat_hist_usage("Invalid option");
+			fatal("Invalid option");
 		}
 	}
 
 	if (trace_output)
 		actions_add_trace_output(&params->common.threshold_actions, trace_output);
 
-	if (geteuid()) {
-		err_msg("rtla needs root permission\n");
-		exit(EXIT_FAILURE);
-	}
+	if (geteuid())
+		fatal("rtla needs root permission");
 
 	if (params->common.hist.no_irq && params->common.hist.no_thread)
-		timerlat_hist_usage("no-irq and no-thread set, there is nothing to do here");
+		fatal("no-irq and no-thread set, there is nothing to do here");
 
 	if (params->common.hist.no_index && !params->common.hist.with_zeros)
-		timerlat_hist_usage("no-index set with with-zeros is not set - it does not make sense");
+		fatal("no-index set with with-zeros is not set - it does not make sense");
 
 	/*
 	 * Auto analysis only happens if stop tracing, thus:
@@ -1088,7 +1036,7 @@ static struct common_params
 		params->no_aa = 1;
 
 	if (params->common.kernel_workload && params->common.user_workload)
-		timerlat_hist_usage("--kernel-threads and --user-threads are mutually exclusive!");
+		fatal("--kernel-threads and --user-threads are mutually exclusive!");
 
 	/*
 	 * If auto-analysis or trace output is enabled, switch from BPF mode to


@@ -459,9 +459,7 @@ timerlat_print_stats(struct osnoise_tool *top)
 	timerlat_top_header(params, top);
 
-	for (i = 0; i < nr_cpus; i++) {
-		if (params->common.cpus && !CPU_ISSET(i, &params->common.monitored_cpus))
-			continue;
+	for_each_monitored_cpu(i, nr_cpus, &params->common) {
 		timerlat_top_print(top, i);
 		timerlat_top_update_sum(top, i, &summary);
 	}
@@ -476,15 +474,15 @@ timerlat_print_stats(struct osnoise_tool *top)
 /*
  * timerlat_top_usage - prints timerlat top usage message
  */
-static void timerlat_top_usage(char *usage)
+static void timerlat_top_usage(void)
 {
 	int i;
 
 	static const char *const msg[] = {
 		"",
 		" usage: rtla timerlat [top] [-h] [-q] [-a us] [-d s] [-D] [-n] [-p us] [-i us] [-T us] [-s us] \\",
-		" [[-t[file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] [-c cpu-list] [-H cpu-list]\\",
-		" [-P priority] [--dma-latency us] [--aa-only us] [-C[=cgroup_name]] [-u|-k] [--warm-up s] [--deepest-idle-state n]",
+		" [[-t [file]] [-e sys[:event]] [--filter <filter>] [--trigger <trigger>] [-c cpu-list] [-H cpu-list]\\",
+		" [-P priority] [--dma-latency us] [--aa-only us] [-C [cgroup_name]] [-u|-k] [--warm-up s] [--deepest-idle-state n]",
 		"",
 		" -h/--help: print this menu",
 		" -a/--auto: set automatic trace mode, stopping the session if argument in us latency is hit",
@@ -495,11 +493,11 @@ static void timerlat_top_usage(char *usage)
 		" -s/--stack us: save the stack trace at the IRQ if a thread latency is higher than the argument in us",
 		" -c/--cpus cpus: run the tracer only on the given cpus",
 		" -H/--house-keeping cpus: run rtla control threads only on the given cpus",
-		" -C/--cgroup[=cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
+		" -C/--cgroup [cgroup_name]: set cgroup, if no cgroup_name is passed, the rtla's cgroup will be inherited",
 		" -d/--duration time[s|m|h|d]: duration of the session",
 		" -D/--debug: print debug info",
 		" --dump-tasks: prints the task running on all CPUs if stop conditions are met (depends on !--no-aa)",
-		" -t/--trace[file]: save the stopped trace to [file|timerlat_trace.txt]",
+		" -t/--trace [file]: save the stopped trace to [file|timerlat_trace.txt]",
 		" -e/--event <sys:event>: enable the <sys:event> in the trace instance, multiple -e are allowed",
 		" --filter <command>: enable a trace event filter to the previous -e event",
 		" --trigger <command>: enable a trace event trigger to the previous -e event",
@@ -524,18 +522,12 @@ static void timerlat_top_usage(char *usage)
 		NULL,
 	};
 
-	if (usage)
-		fprintf(stderr, "%s\n", usage);
-
 	fprintf(stderr, "rtla timerlat top: a per-cpu summary of the timer latency (version %s)\n",
 		VERSION);
 
 	for (i = 0; msg[i]; i++)
 		fprintf(stderr, "%s\n", msg[i]);
 
-	if (usage)
-		exit(EXIT_FAILURE);
-
 	exit(EXIT_SUCCESS);
 }
@@ -606,11 +598,8 @@ static struct common_params
 			{0, 0, 0, 0}
 		};
 
-		/* getopt_long stores the option index here. */
-		int option_index = 0;
-
 		c = getopt_long(argc, argv, "a:c:C::d:De:hH:i:knp:P:qs:t::T:uU0:1:2:345:6:7:",
-				long_options, &option_index);
+				long_options, NULL);
 
 		/* detect the end of the options. */
 		if (c == -1)
@@ -628,7 +617,8 @@ static struct common_params
 			params->print_stack = auto_thresh;
 
 			/* set trace */
-			trace_output = "timerlat_trace.txt";
+			if (!trace_output)
+				trace_output = "timerlat_trace.txt";
 			break;
 		case '5':
@@ -648,18 +638,12 @@ static struct common_params
 		case 'c':
 			retval = parse_cpu_set(optarg, &params->common.monitored_cpus);
 			if (retval)
-				timerlat_top_usage("\nInvalid -c cpu list\n");
+				fatal("Invalid -c cpu list");
 			params->common.cpus = optarg;
 			break;
 		case 'C':
 			params->common.cgroup = 1;
-			if (!optarg) {
-				/* will inherit this cgroup */
-				params->common.cgroup_name = NULL;
-			} else if (*optarg == '=') {
-				/* skip the = */
-				params->common.cgroup_name = ++optarg;
-			}
+			params->common.cgroup_name = optarg;
 			break;
 		case 'D':
 			config_debug = 1;
@@ -667,14 +651,12 @@ static struct common_params
 		case 'd':
 			params->common.duration = parse_seconds_duration(optarg);
 			if (!params->common.duration)
-				timerlat_top_usage("Invalid -d duration\n");
+				fatal("Invalid -d duration");
 			break;
 		case 'e':
 			tevent = trace_event_alloc(optarg);
-			if (!tevent) {
-				err_msg("Error alloc trace event");
-				exit(EXIT_FAILURE);
-			}
+			if (!tevent)
+				fatal("Error alloc trace event");
 
 			if (params->common.events)
 				tevent->next = params->common.events;
@@ -682,15 +664,13 @@ static struct common_params
 			break;
 		case 'h':
 		case '?':
-			timerlat_top_usage(NULL);
+			timerlat_top_usage();
 			break;
 		case 'H':
 			params->common.hk_cpus = 1;
 			retval = parse_cpu_set(optarg, &params->common.hk_cpu_set);
-			if (retval) {
-				err_msg("Error parsing house keeping CPUs\n");
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Error parsing house keeping CPUs");
 			break;
 		case 'i':
 			params->common.stop_us = get_llong_from_str(optarg);
@@ -704,12 +684,12 @@ static struct common_params
 		case 'p':
 			params->timerlat_period_us = get_llong_from_str(optarg);
 			if (params->timerlat_period_us > 1000000)
-				timerlat_top_usage("Period longer than 1 s\n");
+				fatal("Period longer than 1 s");
 			break;
 		case 'P':
 			retval = parse_prio(optarg, &params->common.sched_param);
 			if (retval == -1)
-				timerlat_top_usage("Invalid -P priority");
+				fatal("Invalid -P priority");
 			params->common.set_sched = 1;
 			break;
 		case 'q':
@@ -722,14 +702,8 @@ static struct common_params
 			params->common.stop_total_us = get_llong_from_str(optarg);
 			break;
 		case 't':
-			if (optarg) {
-				if (optarg[0] == '=')
-					trace_output = &optarg[1];
-				else
-					trace_output = &optarg[0];
-			} else if (optind < argc && argv[optind][0] != '-')
-				trace_output = argv[optind];
-			else
+			trace_output = parse_optional_arg(argc, argv);
+			if (!trace_output)
 				trace_output = "timerlat_trace.txt";
 			break;
 		case 'u':
@@ -741,31 +715,25 @@ static struct common_params
 		case '0': /* trigger */
 			if (params->common.events) {
 				retval = trace_event_add_trigger(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding trigger %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding trigger %s", optarg);
 			} else {
-				timerlat_top_usage("--trigger requires a previous -e\n");
+				fatal("--trigger requires a previous -e");
 			}
 			break;
 		case '1': /* filter */
 			if (params->common.events) {
 				retval = trace_event_add_filter(params->common.events, optarg);
-				if (retval) {
-					err_msg("Error adding filter %s\n", optarg);
-					exit(EXIT_FAILURE);
-				}
+				if (retval)
+					fatal("Error adding filter %s", optarg);
 			} else {
-				timerlat_top_usage("--filter requires a previous -e\n");
+				fatal("--filter requires a previous -e");
 			}
 			break;
 		case '2': /* dma-latency */
 			params->dma_latency = get_llong_from_str(optarg);
-			if (params->dma_latency < 0 || params->dma_latency > 10000) {
-				err_msg("--dma-latency needs to be >= 0 and < 10000");
-				exit(EXIT_FAILURE);
-			}
+			if (params->dma_latency < 0 || params->dma_latency > 10000)
+				fatal("--dma-latency needs to be >= 0 and < 10000");
 			break;
		case '3': /* no-aa */
 			params->no_aa = 1;
@@ -785,31 +753,25 @@ static struct common_params
 		case '9':
 			retval = actions_parse(&params->common.threshold_actions, optarg,
 					       "timerlat_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		case '\1':
 			retval = actions_parse(&params->common.end_actions, optarg,
 					       "timerlat_trace.txt");
-			if (retval) {
-				err_msg("Invalid action %s\n", optarg);
-				exit(EXIT_FAILURE);
-			}
+			if (retval)
+				fatal("Invalid action %s", optarg);
 			break;
 		default:
-			timerlat_top_usage("Invalid option");
+			fatal("Invalid option");
 		}
 	}
 
 	if (trace_output)
 		actions_add_trace_output(&params->common.threshold_actions, trace_output);
 
-	if (geteuid()) {
-		err_msg("rtla needs root permission\n");
-		exit(EXIT_FAILURE);
-	}
+	if (geteuid())
+		fatal("rtla needs root permission");
 
 	/*
 	 * Auto analysis only happens if stop tracing, thus:
@@ -818,10 +780,10 @@ static struct common_params
 		params->no_aa = 1;
 
 	if (params->no_aa && params->common.aa_only)
-		timerlat_top_usage("--no-aa and --aa-only are mutually exclusive!");
+		fatal("--no-aa and --aa-only are mutually exclusive!");
 
 	if (params->common.kernel_workload && params->common.user_workload)
-		timerlat_top_usage("--kernel-threads and --user-threads are mutually exclusive!");
+		fatal("--kernel-threads and --user-threads are mutually exclusive!");
 
 	/*
 	 * If auto-analysis or trace output is enabled, switch from BPF mode to
@@ -916,7 +878,7 @@ timerlat_top_bpf_main_loop(struct osnoise_tool *tool)
 		if (!params->common.quiet)
 			timerlat_print_stats(tool);
 
-		if (wait_retval == 1) {
+		if (wait_retval != 0) {
 			/* Stopping requested by tracer */
 			actions_perform(&params->common.threshold_actions);


@@ -51,10 +51,8 @@ static int timerlat_u_main(int cpu, struct timerlat_u_params *params)
 	if (!params->sched_param) {
 		retval = sched_setscheduler(0, SCHED_FIFO, &sp);
-		if (retval < 0) {
-			err_msg("Error setting timerlat u default priority: %s\n", strerror(errno));
-			exit(1);
-		}
+		if (retval < 0)
+			fatal("Error setting timerlat u default priority: %s", strerror(errno));
 	} else {
 		retval = __set_sched_attr(getpid(), params->sched_param);
 		if (retval) {
@@ -78,10 +76,8 @@ static int timerlat_u_main(int cpu, struct timerlat_u_params *params)
 	snprintf(buffer, sizeof(buffer), "osnoise/per_cpu/cpu%d/timerlat_fd", cpu);
 	timerlat_fd = tracefs_instance_file_open(NULL, buffer, O_RDONLY);
-	if (timerlat_fd < 0) {
-		err_msg("Error opening %s:%s\n", buffer, strerror(errno));
-		exit(1);
-	}
+	if (timerlat_fd < 0)
+		fatal("Error opening %s:%s", buffer, strerror(errno));
 
 	debug_msg("User-space timerlat pid %d on cpu %d\n", gettid(), cpu);


@@ -56,6 +56,21 @@ void debug_msg(const char *fmt, ...)
 	fprintf(stderr, "%s", message);
 }
 
+/*
+ * fatal - print an error message and EOL to stderr and exit with ERROR
+ */
+void fatal(const char *fmt, ...)
+{
+	va_list ap;
+
+	va_start(ap, fmt);
+	vfprintf(stderr, fmt, ap);
+	va_end(ap);
+	fprintf(stderr, "\n");
+
+	exit(ERROR);
+}
+
 /*
  * get_llong_from_str - get a long long int from a string
  */
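
fatal() folds the err_msg()-plus-exit() pair that the option parsers above used to open-code into a single call. Two small behavioral details are visible in the conversions: fatal() appends the newline itself, and it exits with ERROR rather than EXIT_FAILURE. A typical call-site conversion from this series:

	/* before: two statements (plus a brace pair) per error exit */
	if (retval) {
		err_msg("Error parsing house keeping CPUs\n");
		exit(EXIT_FAILURE);
	}

	/* after: one call; fatal() adds the trailing newline and exits */
	if (retval)
		fatal("Error parsing house keeping CPUs");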
@@ -959,3 +974,29 @@ int auto_house_keeping(cpu_set_t *monitored_cpus)
 	return 1;
 }
 
+/**
+ * parse_optional_arg - Parse optional argument value
+ *
+ * Parse optional argument value, which can be in the form of:
+ * -sarg, -s/--long=arg, -s/--long arg
+ *
+ * Returns arg value if found, NULL otherwise.
+ */
+char *parse_optional_arg(int argc, char **argv)
+{
+	if (optarg) {
+		if (optarg[0] == '=') {
+			/* skip the = */
+			return &optarg[1];
+		} else {
+			return optarg;
+		}
+	/* parse argument of form -s [arg] and --long [arg] */
+	} else if (optind < argc && argv[optind][0] != '-') {
+		/* consume optind */
+		return argv[optind++];
+	} else {
+		return NULL;
+	}
+}
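
parse_optional_arg() centralizes the optional-argument handling that the -t and -C cases previously duplicated. It only works for options declared with a double colon in the optstring ("t::", "C::" above), since that is what makes getopt_long() leave optarg NULL when no value is attached to the option. A stand-alone sketch of how it is driven (a hypothetical example, not code from the tree):

	#include <getopt.h>
	#include <stdio.h>

	/* provided by utils.c above; relies on getopt's optarg/optind globals */
	char *parse_optional_arg(int argc, char **argv);

	int main(int argc, char **argv)
	{
		static struct option long_options[] = {
			{"trace", optional_argument, 0, 't'},
			{0, 0, 0, 0}
		};
		char *trace_output = NULL;
		int c;

		/* "t::" marks the argument of -t as optional */
		while ((c = getopt_long(argc, argv, "t::", long_options, NULL)) != -1) {
			if (c == 't') {
				/* accepts -tfile, -t=file, --trace=file and -t file */
				trace_output = parse_optional_arg(argc, argv);
				if (!trace_output)
					trace_output = "trace.txt";
			}
		}

		printf("trace output: %s\n", trace_output ? trace_output : "(none)");
		return 0;
	}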


@@ -19,11 +19,13 @@
 extern int config_debug;
 void debug_msg(const char *fmt, ...);
 void err_msg(const char *fmt, ...);
+void fatal(const char *fmt, ...);
 
 long parse_seconds_duration(char *val);
 void get_duration(time_t start_time, char *output, int output_size);
 
 int parse_cpu_list(char *cpu_list, char **monitored_cpus);
+char *parse_optional_arg(int argc, char **argv);
 long long get_llong_from_str(char *start);
 
 static inline void


@@ -37,11 +37,11 @@ check "multiple actions" \
 check "hist stop at failed action" \
 	"osnoise hist -S 2 --on-threshold shell,command='echo -n 1; false' --on-threshold shell,command='echo -n 2'" 2 "^1# RTLA osnoise histogram$"
 check "top stop at failed action" \
-	"timerlat top -T 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
+	"osnoise top -S 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
 check "hist with continue" \
-	"osnoise hist -S 2 -d 1s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+	"osnoise hist -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
 check "top with continue" \
-	"osnoise top -q -S 2 -d 1s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+	"osnoise top -q -S 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
 check "hist with trace output at end" \
 	"osnoise hist -d 1s --on-end trace" 0 "^  Saving trace to osnoise_trace.txt$"
 check "top with trace output at end" \


@@ -58,11 +58,11 @@ check "multiple actions" \
 check "hist stop at failed action" \
 	"timerlat hist -T 2 --on-threshold shell,command='echo -n 1; false' --on-threshold shell,command='echo -n 2'" 2 "^1# RTLA timerlat histogram$"
 check "top stop at failed action" \
-	"timerlat top -T 2 --on-threshold shell,command='echo -n 1; false' --on-threshold shell,command='echo -n 2'" 2 "^1ALL"
+	"timerlat top -T 2 --on-threshold shell,command='echo -n abc; false' --on-threshold shell,command='echo -n defgh'" 2 "^abc" "defgh"
 check "hist with continue" \
-	"timerlat hist -T 2 -d 1s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+	"timerlat hist -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
 check "top with continue" \
-	"timerlat top -q -T 2 -d 1s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
+	"timerlat top -q -T 2 -d 5s --on-threshold shell,command='echo TestOutput' --on-threshold continue" 0 "^TestOutput$"
 check "hist with trace output at end" \
 	"timerlat hist -d 1s --on-end trace" 0 "^  Saving trace to timerlat_trace.txt$"
 check "top with trace output at end" \