x86/microcode: Introduce staging step to reduce late-loading time

As microcode patch sizes continue to grow, late-loading latency spikes can
lead to timeouts and disruptions in running workloads. This trend of
increasing patch sizes is expected to continue, so a foundational solution is
needed to address the issue.

To mitigate the problem, introduce a microcode staging feature. This option
processes most of the microcode update (excluding activation) on
a non-critical path, allowing CPUs to remain operational during the majority
of the update. By offloading work from the critical path, staging can
significantly reduce latency spikes.

Integrate staging as a preparatory step in late-loading. Introduce a new
callback for staging, which is invoked at the beginning of
load_late_stop_cpus(), before CPUs enter the rendezvous phase.

Staging follows an opportunistic model:

  *  If successful, it reduces CPU rendezvous time
  *  Even though it fails, the process falls back to the legacy path to
     finish the loading process but with potentially higher latency.

Extend struct microcode_ops to incorporate staging properties, which will be
implemented in the vendor code separately.

Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Chao Gao <chao.gao@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Tested-by: Anselm Busse <abusse@amazon.de>
Link: https://lore.kernel.org/20250320234104.8288-1-chang.seok.bae@intel.com
This commit is contained in:
Chang S. Bae 2025-09-21 15:48:36 -07:00 committed by Borislav Petkov (AMD)
parent ed44a5625f
commit 7cdda85ed9
2 changed files with 14 additions and 1 deletions

View File

@ -589,6 +589,17 @@ static int load_late_stop_cpus(bool is_safe)
pr_err("You should switch to early loading, if possible.\n");
}
/*
* Pre-load the microcode image into a staging device. This
* process is preemptible and does not require stopping CPUs.
* Successful staging simplifies the subsequent late-loading
* process, reducing rendezvous time.
*
* Even if the transfer fails, the update will proceed as usual.
*/
if (microcode_ops->use_staging)
microcode_ops->stage_microcode();
atomic_set(&late_cpus_in, num_online_cpus());
atomic_set(&offline_in_nmi, 0);
loops_per_usec = loops_per_jiffy / (TICK_NSEC / 1000);

View File

@ -31,10 +31,12 @@ struct microcode_ops {
* See also the "Synchronization" section in microcode_core.c.
*/
enum ucode_state (*apply_microcode)(int cpu);
void (*stage_microcode)(void);
int (*collect_cpu_info)(int cpu, struct cpu_signature *csig);
void (*finalize_late_load)(int result);
unsigned int nmi_safe : 1,
use_nmi : 1;
use_nmi : 1,
use_staging : 1;
};
struct early_load_data {