sched/mmcid: Use cpumask_weighted_or()

Use cpumask_weighted_or() instead of cpumask_or() followed by
cpumask_weight() on the result, which walks the same bitmap twice. This
results in 10-20% fewer cycles, which reduces the runqueue lock hold time.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.511736272@linutronix.de

@@ -10377,6 +10377,7 @@ void call_trace_sched_update_nr_running(struct rq *rq, int count)
 static inline void mm_update_cpus_allowed(struct mm_struct *mm, const struct cpumask *affmsk)
 {
 	struct cpumask *mm_allowed;
+	unsigned int weight;
 
 	if (!mm)
 		return;
@@ -10387,8 +10388,8 @@ static inline void mm_update_cpus_allowed(struct mm_struct *mm, const struct cpu
 	 */
 	guard(raw_spinlock)(&mm->mm_cid.lock);
 	mm_allowed = mm_cpus_allowed(mm);
-	cpumask_or(mm_allowed, mm_allowed, affmsk);
-	WRITE_ONCE(mm->mm_cid.nr_cpus_allowed, cpumask_weight(mm_allowed));
+	weight = cpumask_weighted_or(mm_allowed, mm_allowed, affmsk);
+	WRITE_ONCE(mm->mm_cid.nr_cpus_allowed, weight);
 }
 
 void sched_mm_cid_exit_signals(struct task_struct *t)
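
For reference, a minimal sketch of the single-pass idea behind
cpumask_weighted_or(): OR the two bitmaps word by word into the destination
and accumulate the popcount of each result word, so the bitmap is walked
once instead of once for the OR and a second time for the weight. This is
an illustration assuming a plain unsigned-long bitmap; the helper name
bitmap_weighted_or_sketch() is hypothetical and this is not the kernel's
actual implementation.

#include <stddef.h>

/*
 * Illustrative only: fold the OR and the population count into a single
 * walk over the bitmap words, rather than one pass for the OR and a
 * second pass to recount the result.
 */
static unsigned int bitmap_weighted_or_sketch(unsigned long *dst,
					      const unsigned long *src1,
					      const unsigned long *src2,
					      size_t nwords)
{
	unsigned int weight = 0;

	for (size_t i = 0; i < nwords; i++) {
		dst[i] = src1[i] | src2[i];
		weight += (unsigned int)__builtin_popcountl(dst[i]);
	}
	return weight;
}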