sched/mmcid: Use cpumask_weighted_or()

Use cpumask_weighted_or() instead of cpumask_or() followed by
cpumask_weight() on the result, which walks the same bitmap twice. This
results in 10-20% fewer cycles, which reduces the runqueue lock hold time.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Yury Norov (NVIDIA) <yury.norov@gmail.com>
Link: https://patch.msgid.link/20251119172549.511736272@linutronix.de

@@ -10377,6 +10377,7 @@ void call_trace_sched_update_nr_running(struct rq *rq, int count)
 static inline void mm_update_cpus_allowed(struct mm_struct *mm, const struct cpumask *affmsk)
 {
 	struct cpumask *mm_allowed;
+	unsigned int weight;
 
 	if (!mm)
 		return;
@@ -10387,8 +10388,8 @@ static inline void mm_update_cpus_allowed(struct mm_struct *mm, const struct cpu
 	 */
 	guard(raw_spinlock)(&mm->mm_cid.lock);
 	mm_allowed = mm_cpus_allowed(mm);
-	cpumask_or(mm_allowed, mm_allowed, affmsk);
-	WRITE_ONCE(mm->mm_cid.nr_cpus_allowed, cpumask_weight(mm_allowed));
+	weight = cpumask_weighted_or(mm_allowed, mm_allowed, affmsk);
+	WRITE_ONCE(mm->mm_cid.nr_cpus_allowed, weight);
 }
 
 void sched_mm_cid_exit_signals(struct task_struct *t)
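
For reference, a minimal sketch of the single-pass idea behind
cpumask_weighted_or(): OR the two bitmaps word by word into the destination
and accumulate the popcount of each result word, so the bitmap is walked
once instead of once for the OR and a second time for the weight. This is
an illustration assuming a plain unsigned-long bitmap; the helper name
bitmap_weighted_or_sketch() is hypothetical and this is not the kernel's
actual implementation.

#include <stddef.h>

/*
 * Illustrative only: fold the OR and the population count into a single
 * walk over the bitmap words, rather than one pass for the OR and a
 * second pass to recount the result.
 */
static unsigned int bitmap_weighted_or_sketch(unsigned long *dst,
					      const unsigned long *src1,
					      const unsigned long *src2,
					      size_t nwords)
{
	unsigned int weight = 0;

	for (size_t i = 0; i < nwords; i++) {
		dst[i] = src1[i] | src2[i];
		weight += (unsigned int)__builtin_popcountl(dst[i]);
	}
	return weight;
}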