linux/kernel/sched
Linus Torvalds 02baaa67d9 sched_ext: Changes for v6.19

Merge tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext updates from Tejun Heo:

 - Improve recovery from misbehaving BPF schedulers.

   When a scheduler puts many tasks with varying affinity restrictions
   on a shared DSQ, CPUs scanning through tasks they cannot run can
   overwhelm the system, causing lockups.

   Bypass mode now uses per-CPU DSQs with a load balancer to avoid this,
   and hooks into the hardlockup detector to attempt recovery.

   Add the scx_cpu0 example scheduler to demonstrate this scenario (a
   minimal sketch of the shared-DSQ pattern follows this list).

 - Add lockless peek operation for DSQs to reduce lock contention for
   schedulers that need to query queue state during load balancing.

 - Allow scx_bpf_reenqueue_local() to be called from anywhere in
   preparation for deprecating cpu_acquire/release() callbacks in favor
   of generic BPF hooks (its classic cpu_release() usage is sketched
   after this list).

 - Prepare for hierarchical scheduler support: add
   scx_bpf_task_set_slice() and scx_bpf_task_set_dsq_vtime() kfuncs,
   make scx_bpf_dsq_insert*() return bool, and wrap kfunc args in
   structs for the future aux__prog parameter (a hedged usage sketch
   follows this list).

 - Implement the cgroup_set_idle() callback to notify BPF schedulers
   when a cgroup's idle state changes (sketched below).

 - Fix migration tasks being incorrectly downgraded from
   stop_sched_class to rt_sched_class across sched_ext enable/disable.
   Applied late because the fix is low risk and the bug, while subtle,
   needs stable backporting.

 - Various fixes and cleanups including cgroup exit ordering,
   SCX_KICK_WAIT reliability, and backward compatibility improvements.
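
A minimal sketch of the shared-DSQ pattern from the first item above: every
task is funneled through one user DSQ regardless of its cpumask, which is
the scenario the bypass-mode rework and the scx_cpu0 example target. This
is not the in-tree scx_cpu0 source, only an illustration built on the
long-standing kfuncs scx_bpf_create_dsq(), scx_bpf_dsq_insert() and
scx_bpf_dsq_move_to_local():

    /* one-queue toy scheduler: illustrative only, not scx_cpu0 */
    #include <scx/common.bpf.h>

    char _license[] SEC("license") = "GPL";

    #define SHARED_DSQ 0

    s32 BPF_STRUCT_OPS_SLEEPABLE(oneq_init)
    {
            /* a single shared user DSQ, allocated on any NUMA node */
            return scx_bpf_create_dsq(SHARED_DSQ, -1);
    }

    void BPF_STRUCT_OPS(oneq_enqueue, struct task_struct *p, u64 enq_flags)
    {
            /* every task lands on the same DSQ, ignoring its affinity */
            scx_bpf_dsq_insert(p, SHARED_DSQ, SCX_SLICE_DFL, enq_flags);
    }

    void BPF_STRUCT_OPS(oneq_dispatch, s32 cpu, struct task_struct *prev)
    {
            /*
             * Each CPU pulls from the shared DSQ. With many tasks carrying
             * narrow cpumasks, a CPU can scan past lots of tasks it is not
             * allowed to run, which is the lockup scenario described above.
             */
            scx_bpf_dsq_move_to_local(SHARED_DSQ);
    }

    SCX_OPS_DEFINE(oneq_ops,
                   .enqueue  = (void *)oneq_enqueue,
                   .dispatch = (void *)oneq_dispatch,
                   .init     = (void *)oneq_init,
                   .name     = "oneq");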
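
The scx_bpf_reenqueue_local() kfunc itself is not new (it takes no
arguments and returns the number of tasks it re-enqueued); what changes
here is that it is no longer restricted to ops.cpu_release(). Its classic
cpu_release() usage, as in the scx_qmap example scheduler, looks roughly
like this (assuming the usual <scx/common.bpf.h> preamble):

    void BPF_STRUCT_OPS(sketch_cpu_release, s32 cpu,
                        struct scx_cpu_release_args *args)
    {
            /*
             * A higher-priority sched class preempted this CPU. Pull back
             * the tasks already queued on its local DSQ so they can be
             * re-enqueued and dispatched elsewhere. Per the item above,
             * this call is no longer limited to this callback.
             */
            u32 cnt = scx_bpf_reenqueue_local();

            bpf_printk("cpu%d released, re-enqueued %u tasks", cpu, cnt);
    }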

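For the hierarchical-scheduler preparation item, a hedged sketch of how the
reshaped kfuncs could be used. The bool return of scx_bpf_dsq_insert*() is
stated above; the signature assumed for scx_bpf_task_set_slice() and the
fallback behaviour are illustrative guesses, not taken from the tree
(SHARED_DSQ is the user DSQ created in the first sketch):

    void BPF_STRUCT_OPS(sketch_enqueue, struct task_struct *p, u64 enq_flags)
    {
            /*
             * Insertion now reports success or failure instead of silently
             * falling back, so a hierarchical scheduler can choose another
             * DSQ itself when the preferred one is unavailable.
             */
            if (!scx_bpf_dsq_insert_vtime(p, SHARED_DSQ, SCX_SLICE_DFL,
                                          p->scx.dsq_vtime, enq_flags))
                    scx_bpf_dsq_insert(p, SCX_DSQ_GLOBAL, SCX_SLICE_DFL,
                                       enq_flags);
    }

    void BPF_STRUCT_OPS(sketch_tick, struct task_struct *p)
    {
            /*
             * Assumed signature: trim the running task's remaining slice in
             * place, e.g. when a parent level of the hierarchy revokes its
             * budget, without having to dequeue and re-insert the task.
             */
            scx_bpf_task_set_slice(p, SCX_SLICE_DFL / 4);
    }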
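
Similarly for the cgroup_set_idle() item: only the callback's purpose is
given above, so the signature below (a cgroup pointer plus a bool) is an
assumption for illustration:

    void BPF_STRUCT_OPS(sketch_cgroup_set_idle, struct cgroup *cgrp,
                        bool idle)
    {
            /*
             * Assumed signature: the cgroup went idle or became active
             * again. A cgroup-aware scheduler could park or resume
             * budgeting for the group's DSQ here.
             */
            bpf_printk("cgroup %llu idle=%d", cgrp->kn->id, idle);
    }
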
* tag 'sched_ext-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (44 commits)
  sched_ext: Fix incorrect sched_class settings for per-cpu migration tasks
  sched_ext: tools: Removing duplicate targets during non-cross compilation
  sched_ext: Use kvfree_rcu() to release per-cpu ksyncs object
  sched_ext: Pass locked CPU parameter to scx_hardlockup() and add docs
  sched_ext: Update comments replacing breather with aborting mechanism
  sched_ext: Implement load balancer for bypass mode
  sched_ext: Factor out abbreviated dispatch dequeue into dispatch_dequeue_locked()
  sched_ext: Factor out scx_dsq_list_node cursor initialization into INIT_DSQ_LIST_CURSOR
  sched_ext: Add scx_cpu0 example scheduler
  sched_ext: Hook up hardlockup detector
  sched_ext: Make handle_lockup() propagate scx_verror() result
  sched_ext: Refactor lockup handlers into handle_lockup()
  sched_ext: Make scx_exit() and scx_vexit() return bool
  sched_ext: Exit dispatch and move operations immediately when aborting
  sched_ext: Simplify breather mechanism with scx_aborting flag
  sched_ext: Use per-CPU DSQs instead of per-node global DSQs in bypass mode
  sched_ext: Refactor do_enqueue_task() local and global DSQ paths
  sched_ext: Use shorter slice in bypass mode
  sched_ext: Mark racy bitfields to prevent adding fields that can't tolerate races
  sched_ext: Minor cleanups to scx_task_iter
  ...
2025-12-03 13:25:39 -08:00
Makefile
autogroup.c cgroup: Rename cgroup lifecycle hooks to cgroup_task_*() 2025-11-03 11:46:18 -10:00
autogroup.h sched: Clean up and standardize #if/#else/#endif markers in sched/autogroup.[ch] 2025-06-13 08:47:14 +02:00
build_policy.c sched_ext: Move internal type and accessor definitions to ext_internal.h 2025-09-03 11:33:28 -10:00
build_utility.c sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
clock.c sched: Clean up and standardize #if/#else/#endif markers in sched/clock.c 2025-06-13 08:47:14 +02:00
completion.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
core.c sched_ext: Changes for v6.19 2025-12-03 13:25:39 -08:00
core_sched.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpuacct.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpudeadline.c sched/deadline: only set free_cpus for online runqueues 2025-10-16 11:13:49 +02:00
cpudeadline.h sched/deadline: only set free_cpus for online runqueues 2025-10-16 11:13:49 +02:00
cpufreq.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpufreq_schedutil.c sched: Clean up and standardize #if/#else/#endif markers in sched/cpufreq_schedutil.c 2025-06-13 08:47:15 +02:00
cpupri.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
cpupri.h sched/smp: Make SMP unconditional 2025-06-13 08:47:18 +02:00
cputime.c seqlock: Change thread_group_cputime() to use scoped_seqlock_read() 2025-10-21 12:31:57 +02:00
deadline.c cgroup: Changes for v6.19 2025-12-03 13:04:07 -08:00
debug.c sched/eevdf: Fix min_vruntime vs avg_vruntime 2025-11-11 12:33:38 +01:00
ext.c sched_ext: Changes for v6.19 2025-12-03 13:25:39 -08:00
ext.h sched_ext: Use cgroup_lock/unlock() to synchronize against cgroup operations 2025-09-03 11:36:07 -10:00
ext_idle.c sched_ext: Wrap kfunc args in struct to prepare for aux__prog 2025-10-13 08:49:29 -10:00
ext_idle.h sched_ext: Always use SMP versions in kernel/sched/ext_idle.h 2025-06-13 14:47:59 -10:00
ext_internal.h sched_ext: Implement load balancer for bypass mode 2025-11-12 06:43:44 -10:00
fair.c Scheduler changes for v6.19: 2025-12-01 21:04:45 -08:00
features.h sched/fair: Proportional newidle balance 2025-11-17 17:13:16 +01:00
idle.c Power management updates for 6.19-rc1 2025-12-02 17:31:22 -08:00
isolation.c sched/isolation: Force housekeeping if isolcpus and nohz_full don't leave any 2025-11-20 20:17:31 +01:00
loadavg.c Merge branch 'tip/sched/urgent' 2025-07-14 17:16:28 +02:00
membarrier.c rseq: Simplify the event notification 2025-11-04 08:30:09 +01:00
pelt.c sched: Clean up and standardize #if/#else/#endif markers in sched/pelt.[ch] 2025-06-13 08:47:17 +02:00
pelt.h sched/fair: Switch to task based throttle model 2025-09-03 10:03:14 +02:00
psi.c sched/psi: Fix psi_seq initialization 2025-08-04 10:51:22 -07:00
rq-offsets.c sched: Make migrate_{en,dis}able() inline 2025-09-25 09:57:16 +02:00
rt.c sched/proxy: Yield the donor task 2025-11-11 12:33:36 +01:00
sched-pelt.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
sched.h sched_ext: Changes for v6.19 2025-12-03 13:25:39 -08:00
smp.h sched: Make clangd usable 2025-06-11 11:20:53 +02:00
stats.c sched/smp: Use the SMP version of schedstats 2025-06-13 08:47:21 +02:00
stats.h sched: Match __task_rq_{,un}lock() 2025-10-16 11:13:54 +02:00
stop_task.c sched: Add support to pick functions to take rf 2025-10-16 11:13:55 +02:00
swait.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00
syscalls.c Updates for the interrupt core and treewide cleanups: 2025-12-02 09:14:26 -08:00
topology.c sched/fair: Proportional newidle balance 2025-11-17 17:13:16 +01:00
wait.c ARM: 2025-07-30 17:14:01 -07:00
wait_bit.c sched: Make clangd usable 2025-06-11 11:20:53 +02:00