- A micro-optimization got misplaced as a cleanup:
- Micro-optimize the asm code in secondary_startup_64_no_verify()
- Change global variables to local
- Add missing kernel-doc function parameter descriptions
- Remove unused parameter from a macro
- Remove obsolete Kconfig entry
- Fix comments
- Fix typos, mostly scripted, manually reviewed
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmWb2i8RHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1iFIQ//RjqKWmEBfv0UVCNgtRgkUKOvYVkfhC1R
FykHWbSE+/oDODS7B+gbWqzl9Fq2Oxx9re4KZuMfnojE96KZ6H1flQn7z3UVRUrf
pfMx13E+uyf7qbVZktqH38lUS4s/AHdX2PKCiXlU/0hIkiBdjbAl3ylyqMv7ytIL
Fi2N9iYJN+eLlMkc3A5IK83xNiU8rb0gO6Uywn3nUbqadY/YX2gDpND5kfzRIneR
lTKy4rX3+E65qYB2Ly1wDr7e0Q0rgaTzPctx6twFrxQXK+MsHiartJhM5juND/tU
DEjSW9ISOHlitKEJI/zbdrvJlr5AKDNy2zHYmQQuqY6+YHRamCKqwIjLIPkKj52g
lAbosNwvp/o8W3zUHgUfVZR5hVxN863zV2qa/ehoQ3b/9kNjQC8actILjYEgIVu9
av1sd+nETbjCUABIF9H9uAoRbgc+wQs2nupJZrjvginFz8+WVhgaBdJDMYCNAmjc
fNMjGtRS7YXiIMj09ZAXFThVW302FdbTgggDh/qlQlDOXFu5HRbyuWR+USr4/jkP
qs2G6m/BHDs9HxDRo/no+ccSrUBV5phfhZbO7qwjTf2NJJvPHW+cxGpT00zU2v8A
lgfVI7SDkxwbyi1gacJ054GqEhsWuEdi40ikqxjhL8Oq4xwwsey/PiaIxjkDQx92
Gj3XUSDnGEs=
=kUav
-----END PGP SIGNATURE-----
Merge tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
- Change global variables to local
- Add missing kernel-doc function parameter descriptions
- Remove unused parameter from a macro
- Remove obsolete Kconfig entry
- Fix comments
- Fix typos, mostly scripted, manually reviewed
and a micro-optimization got misplaced as a cleanup:
- Micro-optimize the asm code in secondary_startup_64_no_verify()
* tag 'x86-cleanups-2024-01-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
arch/x86: Fix typos
x86/head_64: Use TESTB instead of TESTL in secondary_startup_64_no_verify()
x86/docs: Remove reference to syscall trampoline in PTI
x86/Kconfig: Remove obsolete config X86_32_SMP
x86/io: Remove the unused 'bw' parameter from the BUILDIO() macro
x86/mtrr: Document missing function parameters in kernel-doc
x86/setup: Make relocated_ramdisk a local variable of relocate_initrd()
infrastructure and remove the former
- Misc other improvements
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmWbIJ0ACgkQEsHwGGHe
VUrOSg//duslGRzbLkrjijYmlb9C9yjdf+7611biRNExxlOUbQvgMX8O5/qnvfhk
ddqcyRJlg6GeooMI3ch25KRqdUbcgI5Zq1ftG7hFiX94gIiWhiHibYxtmnNGsDuy
sq4R3nxqZxpTh2Seajh9AfXS4uINu1wsedLxuSyNiCBk2/BB7zqC3aPCyAtjJDdT
tpPN+dx5s/rmPkdCcdFxwAgSxsohLjZgktdZIUCbVM+blbghDgPV1wjhCUimxUYx
2woqS0zgOZb6eRcZgdXkuaNvOqODOwIZo3iebr6XSIb49YCC/KhuUlDDr/ZZGV3a
CeMR42sa2CEdl0YHsnHEm8LswpUVqL6FBfFnC+TP0bAGRC+L4pNEaS6eQk+ktWf/
+k6bvuBhuPgZG5KdxgQiJB8wogDxCXqQtI7MNLI0IuBS7GC2hjHb35WB6Qb8i+EB
ffyIJYhK0shpVzo9H1efosmLfO+i2RPn5YZiC+S27MVqGbYEkW6713hr56vp58mK
UaxXWoTnfRR9O5Hyf0xEf/qjZFO2zpmQRgXIM2ECzLxn7GNvsPtcHsYJmLJyeUXf
749glVHz8Ue9HsDTCQB35Z9yRT4cVuz2FclwEt51Qlta8QtX8MRaL/l+F3FOrHoo
CPgo/fzSav3Z/XpQpjbC7vI886azQivuX5i5J9LhO5IDc8O3OMs=
=Q5U5
-----END PGP SIGNATURE-----
Merge tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 paravirt updates from Borislav Petkov:
- Replace the paravirt patching functionality using the alternatives
infrastructure and remove the former
- Misc other improvements
* tag 'x86_paravirt_for_v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/alternative: Correct feature bit debug output
x86/paravirt: Remove no longer needed paravirt patching code
x86/paravirt: Switch mixed paravirt/alternative calls to alternatives
x86/alternative: Add indirect call patching
x86/paravirt: Move some functions and defines to alternative.c
x86/paravirt: Introduce ALT_NOT_XEN
x86/paravirt: Make the struct paravirt_patch_site packed
x86/paravirt: Use relative reference for the original instruction offset
In
https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
I wanted to adjust the alternative patching debug output to the new
changes introduced by
da0fe6e68e ("x86/alternative: Add indirect call patching")
but removed the '*' which denotes the ->x86_capability word. The correct
output should be, for example:
[ 0.230071] SMP alternatives: feat: 11*32+15, old: (entry_SYSCALL_64_after_hwframe+0x5a/0x77 (ffffffff81c000c2) len: 16), repl: (ffffffff89ae896a, len: 5) flags: 0x0
while the incorrect one says "... 1132+15" currently.
Add back the '*'.
Fixes: da0fe6e68e ("x86/alternative: Add indirect call patching")
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231206110636.GBZXBVvCWj2IDjVk4c@fat_crate.local
apply_alternatives() treats alternatives with the ALT_FLAG_NOT flag set
special as it optimizes the existing NOPs in place.
Unfortunately, this happens with interrupts enabled and does not provide any
form of core synchronization.
So an interrupt hitting in the middle of the update and using the affected code
path will observe a half updated NOP and crash and burn. The following
3 NOP sequence was observed to expose this crash halfway reliably under QEMU
32bit:
0x90 0x90 0x90
which is replaced by the optimized 3 byte NOP:
0x8d 0x76 0x00
So an interrupt can observe:
1) 0x90 0x90 0x90 nop nop nop
2) 0x8d 0x90 0x90 undefined
3) 0x8d 0x76 0x90 lea -0x70(%esi),%esi
4) 0x8d 0x76 0x00 lea 0x0(%esi),%esi
Where only #1 and #4 are true NOPs. The same problem exists for 64bit obviously.
Disable interrupts around this NOP optimization and invoke sync_core()
before re-enabling them.
Fixes: 270a69c448 ("x86/alternative: Support relocations in alternatives")
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
text_poke_early() does:
local_irq_save(flags);
memcpy(addr, opcode, len);
local_irq_restore(flags);
sync_core();
That's not really correct because the synchronization should happen before
interrupts are re-enabled to ensure that a pending interrupt observes the
complete update of the opcodes.
It's not entirely clear whether the interrupt entry provides enough
serialization already, but moving the sync_core() invocation into interrupt
disabled region does no harm and is obviously correct.
Fixes: 6fffacb303 ("x86/alternatives, jumplabel: Use text_poke_early() before mm_init()")
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/ZT6narvE%2BLxX%2B7Be@windriver.com
Now that paravirt is using the alternatives patching infrastructure,
remove the paravirt patching code.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231210062138.2417-6-jgross@suse.com
Instead of stacking alternative and paravirt patching, use the new
ALT_FLAG_CALL flag to switch those mixed calls to pure alternative
handling.
Eliminate the need to be careful regarding the sequence of alternative
and paravirt patching.
[ bp: Touch up commit message. ]
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20231210062138.2417-5-jgross@suse.com
In order to prepare replacing of paravirt patching with alternative
patching, add the capability to replace an indirect call with a direct
one.
This is done via a new flag ALT_FLAG_CALL as the target of the CALL
instruction needs to be evaluated using the value of the location
addressed by the indirect call.
For convenience, add a macro for a default CALL instruction. In case it
is being used without the new flag being set, it will result in a BUG()
when being executed. As in most cases, the feature used will be
X86_FEATURE_ALWAYS so add another macro ALT_CALL_ALWAYS usable for the
flags parameter of the ALTERNATIVE macros.
For a complete replacement, handle the special cases of calling a nop
function and an indirect call of NULL the same way as paravirt does.
[ bp: Massage commit message, fixup the debug output and clarify flow
more. ]
Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231210062138.2417-4-jgross@suse.com
As a preparation for replacing paravirt patching completely by
alternative patching, move some backend functions and #defines to
the alternatives code and header.
Signed-off-by: Juergen Gross <jgross@suse.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20231129133332.31043-3-jgross@suse.com
Similar to the alternative patching, use a relative reference for original
instruction offset rather than absolute one, which saves 8 bytes for one
PARA_SITE entry on x86_64. As a result, a R_X86_64_PC32 relocation is
generated instead of an R_X86_64_64 one, which also reduces relocation
metadata on relocatable builds. Hardcode the alignment to 4 now.
[ bp: Massage commit message. ]
Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/9e6053107fbaabc0d33e5d2865c5af2c67ec9925.1686301237.git.houwenlong.hwl@antgroup.com
Fei has reported that KASAN triggers during apply_alternatives() on
a 5-level paging machine:
BUG: KASAN: out-of-bounds in rcu_is_watching()
Read of size 4 at addr ff110003ee6419a0 by task swapper/0/0
...
__asan_load4()
rcu_is_watching()
trace_hardirqs_on()
text_poke_early()
apply_alternatives()
...
On machines with 5-level paging, cpu_feature_enabled(X86_FEATURE_LA57)
gets patched. It includes KASAN code, where KASAN_SHADOW_START depends on
__VIRTUAL_MASK_SHIFT, which is defined with cpu_feature_enabled().
KASAN gets confused when apply_alternatives() patches the
KASAN_SHADOW_START users. A test patch that makes KASAN_SHADOW_START
static, by replacing __VIRTUAL_MASK_SHIFT with 56, works around the issue.
Fix it for real by disabling KASAN while the kernel is patching alternatives.
[ mingo: updated the changelog ]
Fixes: 6657fca06e ("x86/mm: Allow to boot without LA57 if CONFIG_X86_5LEVEL=y")
Reported-by: Fei Yang <fei.yang@intel.com>
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20231012100424.1456-1-kirill.shutemov@linux.intel.com
Commit
7825451fa4 ("static_call: Add call depth tracking support")
failed to realize the problem fixed there is not specific to call depth
tracking but applies to all return-thunk uses.
Move the fix to the appropriate place and condition.
Fixes: ee88d363d1 ("x86,static_call: Use alternative RET encoding")
Reported-by: David Kaplan <David.Kaplan@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: <stable@kernel.org>
The following commit
095b8303f3 ("x86/alternative: Make custom return thunk unconditional")
made '__x86_return_thunk' a placeholder value. All code setting
X86_FEATURE_RETHUNK also changes the value of 'x86_return_thunk'. So
the optimization at the beginning of apply_returns() is dead code.
Also, before the above-mentioned commit, the optimization actually had a
bug It bypassed __static_call_fixup(), causing some raw returns to
remain unpatched in static call trampolines. Thus the 'Fixes' tag.
Fixes: d2408e043e ("x86/alternative: Optimize returns patching")
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/16d19d2249d4485d8380fb215ffaae81e6b8119e.1693889988.git.jpoimboe@kernel.org
The following commit deserves special mention:
22dc02f81c Revert "sched/fair: Move unused stub functions to header"
This is in x86/cleanups, because the revert is a re-application of a
number of cleanups that got removed inadvertedly.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmTtDkoRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1jCMw//UvQGM8yxsTa57r0/ZpJHS2++P5pJxOsz
45kBb3aBiDV6idArce4EHpthp3MvF3Pycibp9w0qg//NOtIHTKeagXv52abxsu1W
hmS6gXJZDXZvjO1BFaUlmv97iYtzGfKnQppj32C4tMr9SaP49h3KvOHH1Z8CR3mP
1nZaJJwYIi2qBh7msnmLGG+F0drb85O/dfHdoLX6iVJw9UP4n5nu9u8u1E0iC7J7
2GC6AwP60A0EBRTK9EHQQEYwy9uvdS/TG5f2Qk1VP87KA9TTocs8MyapMG4DQu79
hZKVEGuVQAlV3rYe9cJBNpDx1mTu3rmuMH0G71KEe3T6UcG5QRUiAPm8UfA9prPD
uWjY4zm5o0W3tUio4V1MqqiLFIaBU63WmTY9RyM0QH8Ms8r8GugWKmnrTIuHfEC3
9D+Uhyb5d8ID6qFGLTOvPm0g+v64lnH71qq83PcVJgsmZvUb2XvFA3d/A0h9JzLT
2In/yfU10UsLUFTiNRyAgcLccjaGhliDB2oke9Kp0OyOTSQRcWmiq8kByVxCPImP
auOWWcNXjcuOgjlnziEkMTDuRY12MgUB2If4zhELvdEFibIaaNW5sNCbY2msWaN1
CUD7fcj0L3HZvzujUm72l5hxL2brJMuPwVNJfuOe4T8wzy569d6VJULrd1URBM1B
vfaPs1Dz46Q=
=kiAA
-----END PGP SIGNATURE-----
Merge tag 'x86-cleanups-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 cleanups from Ingo Molnar:
"The following commit deserves special mention:
22dc02f81c Revert "sched/fair: Move unused stub functions to header"
This is in x86/cleanups, because the revert is a re-application of a
number of cleanups that got removed inadvertedly"
[ This also effectively undoes the amd_check_microcode() microcode
declaration change I had done in my microcode loader merge in commit
42a7f6e3ff ("Merge tag 'x86_microcode_for_v6.6_rc1' [...]").
I picked the declaration change by Arnd from this branch instead,
which put it in <asm/processor.h> instead of <asm/microcode.h> like I
had done in my merge resolution - Linus ]
* tag 'x86-cleanups-2023-08-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/platform/uv: Refactor code using deprecated strncpy() interface to use strscpy()
x86/hpet: Refactor code using deprecated strncpy() interface to use strscpy()
x86/platform/uv: Refactor code using deprecated strcpy()/strncpy() interfaces to use strscpy()
x86/qspinlock-paravirt: Fix missing-prototype warning
x86/paravirt: Silence unused native_pv_lock_init() function warning
x86/alternative: Add a __alt_reloc_selftest() prototype
x86/purgatory: Include header for warn() declaration
x86/asm: Avoid unneeded __div64_32 function definition
Revert "sched/fair: Move unused stub functions to header"
x86/apic: Hide unused safe_smp_processor_id() on 32-bit UP
x86/cpu: Fix amd_check_microcode() declaration
There is infrastructure to rewrite return thunks to point to any
random thunk one desires, unwrap that from CALL_THUNKS, which up to
now was the sole user of that.
[ bp: Make the thunks visible on 32-bit and add ifdeffery for the
32-bit builds. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230814121148.775293785@infradead.org
The newly introduced selftest function causes a warning when -Wmissing-prototypes
is enabled:
arch/x86/kernel/alternative.c:1461:32: error: no previous prototype for '__alt_reloc_selftest' [-Werror=missing-prototypes]
Since it's only used locally, add the prototype directly in front of it.
Fixes: 270a69c448 ("x86/alternative: Support relocations in alternatives")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230803082619.1369127-6-arnd@kernel.org
poison_cfi() was introduced in:
9831c6253a ("x86/cfi: Extend ENDBR sealing to kCFI")
... but it's only ever used under CONFIG_X86_KERNEL_IBT=y,
and if that option is disabled, we get:
arch/x86/kernel/alternative.c:1243:13: error: ‘poison_cfi’ defined but not used [-Werror=unused-function]
Guard the definition with CONFIG_X86_KERNEL_IBT.
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sami Tolvanen <samitolvanen@google.com>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Alyssa noticed that when building the kernel with CFI_CLANG+IBT and
booting on IBT enabled hardware to obtain FineIBT, the indirect
functions look like:
__cfi_foo:
endbr64
subl $hash, %r10d
jz 1f
ud2
nop
1:
foo:
endbr64
This is because the compiler generates code for kCFI+IBT. In that case
the caller does the hash check and will jump to +0, so there must be
an ENDBR there. The compiler doesn't know about FineIBT at all; also
it is possible to actually use kCFI+IBT when booting with 'cfi=kcfi'
on IBT enabled hardware.
Having this second ENDBR however makes it possible to elide the CFI
check. Therefore, we should poison this second ENDBR when switching to
FineIBT mode.
Fixes: 931ab63664 ("x86/ibt: Implement FineIBT")
Reported-by: "Milburn, Alyssa" <alyssa.milburn@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lore.kernel.org/r/20230615193722.194131053@infradead.org
Kees noted that IBT sealing could be extended to kCFI.
Fundamentally it is the list of functions that do not have their
address taken and are thus never called indirectly. It doesn't matter
that objtool uses IBT infrastructure to determine this list, once we
have it it can also be used to clobber kCFI hashes and avoid kCFI
indirect calls.
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.494426891%40infradead.org
The current name doesn't reflect what it does very well.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Link: https://lkml.kernel.org/r/20230622144321.427441595%40infradead.org
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double().
The cmpxchg128() family of functions is basically & functionally
the same as cmpxchg_double(), but with a saner interface: instead
of a 6-parameter horror that forced u128 - u64/u64-halves layout
details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128 types.
- Restructure the generated atomic headers, and add
kerneldoc comments for all of the generic atomic{,64,_long}_t
operations. Generated definitions are much cleaner now,
and come with documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering
when taking multiple locks of the same type. This gets rid of
one use of lockdep_set_novalidate_class() in the bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended
variable shadowing generating garbage code on Clang on certain
ARM builds.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmSav3wRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1gDyxAAjCHQjpolrre7fRpyiTDwqzIKT27H04vQ
zrQVlVc42WBnn9pe8LthGy43/RvYvqlZvLoLONA4fMkuYriM6nSMsoZjeUmE+6Rs
QAElQC74P5YvEBOa67VNY3/M7sj22ftDe7ODtVV8OrnPjMk1sQNRvaK025Cs3yig
8MAI//hHGNmyVAp1dPYZMJNqxGCvluReLZ4SaUJFCMrg7YgUXgCBj/5Gi07TlKxn
sT8BFCssoEW/B9FXkh59B1t6FBCZoSy4XSZfsZe0uVAUJ4XDEOO+zBgaWFCedNQT
wP323ryBgMrkzUKA8j2/o5d3QnMA1GcBfHNNlvAl/fOfrxWXzDZnOEY26YcaLMa0
YIuRF/JNbPZlt6DCUVBUEvMPpfNYi18dFN0rat1a6xL2L4w+tm55y3mFtSsg76Ka
r7L2nWlRrAGXnuA+VEPqkqbSWRUSWOv5hT2Mcyb5BqqZRsxBETn6G8GVAzIO6j6v
giyfUdA8Z9wmMZ7NtB6usxe3p1lXtnZ/shCE7ZHXm6xstyZrSXaHgOSgAnB9DcuJ
7KpGIhhSODQSwC/h/J0KEpb9Pr/5jCWmXAQ2DWnZK6ndt1jUfFi8pfK58wm0AuAM
o9t8Mx3o8wZjbMdt6up9OIM1HyFiMx2BSaZK+8f/bWemHQ0xwez5g4k5O5AwVOaC
x9Nt+Tp0Ze4=
=DsYj
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Introduce cmpxchg128() -- aka. the demise of cmpxchg_double()
The cmpxchg128() family of functions is basically & functionally the
same as cmpxchg_double(), but with a saner interface.
Instead of a 6-parameter horror that forced u128 - u64/u64-halves
layout details on the interface and exposed users to complexity,
fragility & bugs, use a natural 3-parameter interface with u128
types.
- Restructure the generated atomic headers, and add kerneldoc comments
for all of the generic atomic{,64,_long}_t operations.
The generated definitions are much cleaner now, and come with
documentation.
- Implement lock_set_cmp_fn() on lockdep, for defining an ordering when
taking multiple locks of the same type.
This gets rid of one use of lockdep_set_novalidate_class() in the
bcache code.
- Fix raw_cpu_generic_try_cmpxchg() bug due to an unintended variable
shadowing generating garbage code on Clang on certain ARM builds.
* tag 'locking-core-2023-06-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (43 commits)
locking/atomic: scripts: fix ${atomic}_dec_if_positive() kerneldoc
percpu: Fix self-assignment of __old in raw_cpu_generic_try_cmpxchg()
locking/atomic: treewide: delete arch_atomic_*() kerneldoc
locking/atomic: docs: Add atomic operations to the driver basic API documentation
locking/atomic: scripts: generate kerneldoc comments
docs: scripts: kernel-doc: accept bitwise negation like ~@var
locking/atomic: scripts: simplify raw_atomic*() definitions
locking/atomic: scripts: simplify raw_atomic_long*() definitions
locking/atomic: scripts: split pfx/name/sfx/order
locking/atomic: scripts: restructure fallback ifdeffery
locking/atomic: scripts: build raw_atomic_long*() directly
locking/atomic: treewide: use raw_atomic*_<op>()
locking/atomic: scripts: add trivial raw_atomic*_<op>()
locking/atomic: scripts: factor out order template generation
locking/atomic: scripts: remove leftover "${mult}"
locking/atomic: scripts: remove bogus order parameter
locking/atomic: xtensa: add preprocessor symbols
locking/atomic: x86: add preprocessor symbols
locking/atomic: sparc: add preprocessor symbols
locking/atomic: sh: add preprocessor symbols
...
While chasing ghosts, I did notice that optimize_nops() was replacing
'REP NOP' aka 'PAUSE' with NOP2. This is clearly not right.
Fixes: 6c480f2221 ("x86/alternative: Rewrite optimize_nops() some")
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/linux-next/20230524130104.GR83892@hirez.programming.kicks-ass.net/
Debugging in the kernel has started slowing down the kernel by a
noticeable amount. The ftrace start up tests are triggering the softlockup
watchdog on some boxes. This is caused by the start up tests that enable
function and function graph tracing several times. Sprinkling
cond_resched() just in the start up test code was not enough to stop the
softlockup from triggering. It would sometimes trigger in the
text_poke_bp_batch() code.
When function tracing enables all functions, it will call
text_poke_queue() to queue the places that need to be patched. Every
256 entries will do a "flush" that calls text_poke_bp_batch() to do the
update of the 256 locations. As this is in a scheduleable context,
calling cond_resched() at the start of text_poke_bp_batch() will ensure
that other tasks could get a chance to run while the patching is
happening. This keeps the softlockup from triggering in the start up
tests.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230531092419.4d051374@rorschach.local.home
Now that we have raw_atomic*_<op>() definitions, there's no need to use
arch_atomic*_<op>() definitions outside of the low-level atomic
definitions.
Move treewide users of arch_atomic*_<op>() over to the equivalent
raw_atomic*_<op>().
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230605070124.3741859-19-mark.rutland@arm.com
By adding support for longer NOPs there are a few more alternatives
that can turn into a single instruction.
Add up to NOP11, the same limit where GNU as .nops also stops
generating longer nops. This is because a number of uarchs have severe
decode penalties for more than 3 prefixes.
[ bp: Sync up with the version in tools/ while at it. ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230515093020.661756940@infradead.org
Instead of decoding each instruction in the return sites range only to
realize that that return site is a jump to the default return thunk
which is needed - X86_FEATURE_RETHUNK is enabled - lift that check
before the loop and get rid of that loop overhead.
Add comments about what gets patched, while at it.
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230512120952.7924-1-bp@alien8.de
Because:
SMP alternatives: ffffffff810026dc: [2:44) optimized NOPs: eb 2a eb 28 cc cc
cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc cc cc cc cc cc cc cc cc
is quite daft, make things more complicated and have the NOP runlength
detection eat the preceding JMP if they both end at the same target.
SMP alternatives: ffffffff810026dc: [0:44) optimized NOPs: eb 2a cc cc cc cc
cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc
cc cc cc cc cc cc cc cc cc cc cc cc cc
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.433132442@infradead.org
Address two issues:
- it no longer hard requires single byte NOP runs - now it accepts any
NOP and NOPL encoded instruction (but not the more complicated 32bit
NOPs).
- it writes a single 'instruction' replacement.
Specifically, ORC unwinder relies on the tail NOP of an alternative to
be a single instruction. In particular, it relies on the inner bytes not
being executed.
Once the max supported NOP length has been reached (currently 8, could easily
be extended to 11 on x86_64), switch to JMP.d8 and INT3 padding to
achieve the same result.
Objtool uses this guarantee in the analysis of alternative/overlapping
CFI state for the ORC unwinder data. Every instruction edge gets a CFI
state and the more instructions the larger the chance of conflicts.
[ bp:
- Add a comment over add_nop() to explain why it does it this way
- Make add_nops() PARAVIRT only as it is used solely there now ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.373412974@infradead.org
A little while ago someone (Kirill) ran into the whole 'alternatives don't
do relocations nonsense' again and I got annoyed enough to actually look
at the code.
Since the whole alternative machinery already fully decodes the
instructions it is simple enough to adjust immediates and displacement
when needed. Specifically, the immediates for IP modifying instructions
(JMP, CALL, Jcc) and the displacement for RIP-relative instructions.
[ bp: Massage comment some more and get rid of third loop in
apply_relocation(). ]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.313857925@infradead.org
Using debug-alternative generates a *LOT* of output, extend it a bit
to select which of the many rewrites it reports on.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Link: https://lore.kernel.org/r/20230208171431.253636689@infradead.org
In order to re-write Jcc.d32 instructions text_poke_bp() needs to be
taught about them.
The biggest hurdle is that the whole machinery is currently made for 5
byte instructions and extending this would grow struct text_poke_loc
which is currently a nice 16 bytes and used in an array.
However, since text_poke_loc contains a full copy of the (s32)
displacement, it is possible to map the Jcc.d32 2 byte opcodes to
Jcc.d8 1 byte opcode for the int3 emulation.
This then leaves the replacement bytes; fudge that by only storing the
last 5 bytes and adding the rule that 'length == 6' instruction will
be prefixed with a 0x0f byte.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20230123210607.115718513@infradead.org
Add a struct alt_instr.flags field which will contain different flags
controlling alternatives patching behavior.
The initial idea was to be able to specify it as a separate macro
parameter but that would mean touching all possible invocations of the
alternatives macros and thus a lot of churn.
What is more, as PeterZ suggested, being able to say ALT_NOT(feature) is
very readable and explains exactly what is meant.
So make the feature field a u32 where the patching flags are the upper
u16 part of the dword quantity while the lower u16 word is the feature.
The highest feature number currently is 0x26a (i.e., word 19) so there
is plenty of space. If that becomes insufficient, the field can be
extended to u64 which will then make struct alt_instr of the nice size
of 16 bytes (14 bytes currently).
There should be no functional changes resulting from this.
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Link: https://lore.kernel.org/r/Y6RCoJEtxxZWwotd@zn.tnic
* Randomize the per-cpu entry areas
Cleanups:
* Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open
coding it
* Move to "native" set_memory_rox() helper
* Clean up pmd_get_atomic() and i386-PAE
* Remove some unused page table size macros
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOc53UACgkQaDWVMHDJ
krCUHw//SGZ+La0hLZLAiAiZTXLZZHpYkOmg1Oj1+11qSU11uZzTFqDpauhaKpRS
cJCSh+D+RXe5e2ipgt0+Zl0hESLt7pJf8258OE4ra0DL/IlyO9uqruAs9Kn3eRS/
Fk76nG8gdEU+JKJqpG02GqOLslYQuIy96n9hpuj1x25b614+uezPfC7S4XEat0NT
MbJQ+jnVDf16aJIJkzT+iSwhubDVeh+bSHeO0SSCzX23WLUqDeg5NvlyxoCHGbBh
UpUTWggV/0pYAkBKRHToeJs8qTWREwuuH/8JGewpe9A0tjdB5wyZfNL2PuracweN
9MauXC3T5f0+Ca4yIIaPq1fF7Ny/PR2dBFihk27rOD0N7tjaZxNwal2pB1sZcmvZ
+PAokjyTPVH5ZXjkMYGGAUe1jyjwr2+TgFSZxhTnDuGtyVQiY4pihGKOifLCX6tv
x6khvYeTBw7wfaDRtKEAf+2kLHYn+71HszHP/8bNKX9T03h+Zf0i1wdZu5xbM5Gc
VK2wR7bCC+UftJJYG0pldcHg2qaF19RBHK2tLwp7zngUv7lTbkKfkgKjre73KV2a
D4b76lrqdUMo6UYwYdw7WtDyarZS4OVLq2DcNhwwMddBCaX8kyN5a4AqwQlZYJ0u
dM+kuMofE8U3yMxmMhJimkZUsj09yLHIqfynY0jbAcU3nhKZZNY=
=wwVF
-----END PGP SIGNATURE-----
Merge tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 mm updates from Dave Hansen:
"New Feature:
- Randomize the per-cpu entry areas
Cleanups:
- Have CR3_ADDR_MASK use PHYSICAL_PAGE_MASK instead of open coding it
- Move to "native" set_memory_rox() helper
- Clean up pmd_get_atomic() and i386-PAE
- Remove some unused page table size macros"
* tag 'x86_mm_for_6.2_v2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (35 commits)
x86/mm: Ensure forced page table splitting
x86/kasan: Populate shadow for shared chunk of the CPU entry area
x86/kasan: Add helpers to align shadow addresses up and down
x86/kasan: Rename local CPU_ENTRY_AREA variables to shorten names
x86/mm: Populate KASAN shadow for entire per-CPU range of CPU entry area
x86/mm: Recompute physical address for every page of per-CPU CEA mapping
x86/mm: Rename __change_page_attr_set_clr(.checkalias)
x86/mm: Inhibit _PAGE_NX changes from cpa_process_alias()
x86/mm: Untangle __change_page_attr_set_clr(.checkalias)
x86/mm: Add a few comments
x86/mm: Fix CR3_ADDR_MASK
x86/mm: Remove P*D_PAGE_MASK and P*D_PAGE_SIZE macros
mm: Convert __HAVE_ARCH_P..P_GET to the new style
mm: Remove pointless barrier() after pmdp_get_lockless()
x86/mm/pae: Get rid of set_64bit()
x86_64: Remove pointless set_64bit() usage
x86/mm/pae: Be consistent with pXXp_get_and_clear()
x86/mm/pae: Use WRITE_ONCE()
x86/mm/pae: Don't (ab)use atomic64
mm/gup: Fix the lockless PMD access
...
Now that text_poke is available before ftrace, remove the
SYSTEM_BOOTING exceptions.
Specifically, this cures a W+X case during boot.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20221025201057.945960823@infradead.org
been long in the making. It is a lighterweight software-only fix for
Skylake-based cores where enabling IBRS is a big hammer and causes a
significant performance impact.
What it basically does is, it aligns all kernel functions to 16 bytes
boundary and adds a 16-byte padding before the function, objtool
collects all functions' locations and when the mitigation gets applied,
it patches a call accounting thunk which is used to track the call depth
of the stack at any time.
When that call depth reaches a magical, microarchitecture-specific value
for the Return Stack Buffer, the code stuffs that RSB and avoids its
underflow which could otherwise lead to the Intel variant of Retbleed.
This software-only solution brings a lot of the lost performance back,
as benchmarks suggest:
https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
That page above also contains a lot more detailed explanation of the
whole mechanism
- Implement a new control flow integrity scheme called FineIBT which is
based on the software kCFI implementation and uses hardware IBT support
where present to annotate and track indirect branches using a hash to
validate them
- Other misc fixes and cleanups
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe
VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb
VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN
gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A
iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y
Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A
ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh
ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd
73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP
i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n
Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/
zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs=
=cRy1
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core updates from Borislav Petkov:
- Add the call depth tracking mitigation for Retbleed which has been
long in the making. It is a lighterweight software-only fix for
Skylake-based cores where enabling IBRS is a big hammer and causes a
significant performance impact.
What it basically does is, it aligns all kernel functions to 16 bytes
boundary and adds a 16-byte padding before the function, objtool
collects all functions' locations and when the mitigation gets
applied, it patches a call accounting thunk which is used to track
the call depth of the stack at any time.
When that call depth reaches a magical, microarchitecture-specific
value for the Return Stack Buffer, the code stuffs that RSB and
avoids its underflow which could otherwise lead to the Intel variant
of Retbleed.
This software-only solution brings a lot of the lost performance
back, as benchmarks suggest:
https://lore.kernel.org/all/20220915111039.092790446@infradead.org/
That page above also contains a lot more detailed explanation of the
whole mechanism
- Implement a new control flow integrity scheme called FineIBT which is
based on the software kCFI implementation and uses hardware IBT
support where present to annotate and track indirect branches using a
hash to validate them
- Other misc fixes and cleanups
* tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits)
x86/paravirt: Use common macro for creating simple asm paravirt functions
x86/paravirt: Remove clobber bitmask from .parainstructions
x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al
x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit
x86/Kconfig: Enable kernel IBT by default
x86,pm: Force out-of-line memcpy()
objtool: Fix weak hole vs prefix symbol
objtool: Optimize elf_dirty_reloc_sym()
x86/cfi: Add boot time hash randomization
x86/cfi: Boot time selection of CFI scheme
x86/ibt: Implement FineIBT
objtool: Add --cfi to generate the .cfi_sites section
x86: Add prefix symbols for function padding
objtool: Add option to generate prefix symbols
objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf
objtool: Slice up elf_create_section_symbol()
kallsyms: Revert "Take callthunks into account"
x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces
x86/retpoline: Fix crash printing warning
x86/paravirt: Fix a !PARAVIRT build warning
...
- Rework the handling of x86_regset for 32 and 64 bit. The original
implementation tried to minimize the allocation size with quite some
hard to understand and fragile tricks. Make it robust and straight
forward by separating the register enumerations for 32 and 64 bit
completely.
- Add a few missing static annotations
- Remove the stale unused setup_once() assembly function
- Address a few minor static analysis and kernel-doc warnings
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUu0ATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUNzEACNn5XbRqxPQZak5XHeJ46/VNVTqTE0
Z7euwF8oP+aAybyDevvm18D2hB9Atn4vU9QJYhnTxBXbCLUNErKrH8FcXdNOBbeC
YdAX7nO5WH8IM+drCMySeK6Tv6rvhnDUtgBzdBSl4NdPXUSOnGo+jHqHfN/Q+/n0
yvbwSoVAjD01sxVZQqKQOrzDgDuR/zlISCVudfS+tR4Rm/CYj0cl+MQS9Z1VM3Z6
7pqyypd5+CyNAD6vTDY/q+ZK0ShfNnU9TIIoGmOB/pc0kLctwIu3MY76Uo2DUgGn
n/ItR9mvYu/QelCwX02VG3aRYJPLRfBa+DjQfZUwZapRz3rsjKtfa8ogpPZTLrSO
o4ht/jxlKKDyNOQKYeL2yy054JR4DkKziilEzw5GZHeH2y66XWudRuWfMwbTdrGc
esP5fSNfZ9uluYl6GCCw6S83RJzQ8aZXRcAy7CJgw2Qb4XE7IOA2jf18x5AYaDUp
4a6HCjbxYkEmKCkzkh9+w5koYruyizMBKMBBh5QsMzH4xp20s/vffHwbZ1tls9Za
eTDC/E+wW9Om3qynRynm0EmcHpa0j+RcmkHOhFcXj6SRLnhzktk4Rrr3vlhardS3
Pc8h3GnE5mFXqS8t3r6/hvMk+6svhSu3RbICiLNU72F/tVLU628ux/WoCKfXZloE
7HxWoVhkTF7eOw==
=DTBQ
-----END PGP SIGNATURE-----
Merge tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Thomas Gleixner:
"A set of x86 cleanups:
- Rework the handling of x86_regset for 32 and 64 bit.
The original implementation tried to minimize the allocation size
with quite some hard to understand and fragile tricks. Make it
robust and straight forward by separating the register enumerations
for 32 and 64 bit completely.
- Add a few missing static annotations
- Remove the stale unused setup_once() assembly function
- Address a few minor static analysis and kernel-doc warnings"
* tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/32: Remove setup_once()
x86/kaslr: Fix process_mem_region()'s return value
x86: Fix misc small issues
x86/boot: Repair kernel-doc for boot_kstrtoul()
x86: Improve formatting of user_regset arrays
x86: Separate out x86_regset for 32 and 64 bit
x86/i8259: Make default_legacy_pic static
x86/tsc: Make art_related_clocksource static
Due to the explicit 'noinline' GCC-7.3 is not able to optimize away the
argument setup of:
apply_ibt_endbr(__ibt_endbr_seal, __ibt_enbr_seal_end);
even when X86_KERNEL_IBT=n and the function is an empty stub, which leads
to link errors due to missing __ibt_endbr_seal* symbols:
ld: arch/x86/kernel/alternative.o: in function `alternative_instructions':
alternative.c:(.init.text+0x15d): undefined reference to `__ibt_endbr_seal_end'
ld: alternative.c:(.init.text+0x164): undefined reference to `__ibt_endbr_seal'
Remove the explicit 'noinline' to help gcc optimize them away.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20221011113803.956808-1-linmiaohe@huawei.com
In order to avoid known hashes (from knowing the boot image),
randomize the CFI hashes with a per-boot random seed.
Suggested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221027092842.765195516@infradead.org
Add the "cfi=" boot parameter to allow people to select a CFI scheme
at boot time. Mostly useful for development / debugging.
Requested-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221027092842.699804264@infradead.org
Implement an alternative CFI scheme that merges both the fine-grained
nature of kCFI but also takes full advantage of the coarse grained
hardware CFI as provided by IBT.
To contrast:
kCFI is a pure software CFI scheme and relies on being able to read
text -- specifically the instruction *before* the target symbol, and
does the hash validation *before* doing the call (otherwise control
flow is compromised already).
FineIBT is a software and hardware hybrid scheme; by ensuring every
branch target starts with a hash validation it is possible to place
the hash validation after the branch. This has several advantages:
o the (hash) load is avoided; no memop; no RX requirement.
o IBT WAIT-FOR-ENDBR state is a speculation stop; by placing
the hash validation in the immediate instruction after
the branch target there is a minimal speculation window
and the whole is a viable defence against SpectreBHB.
o Kees feels obliged to mention it is slightly more vulnerable
when the attacker can write code.
Obviously this patch relies on kCFI, but additionally it also relies
on the padding from the call-depth-tracking patches. It uses this
padding to place the hash-validation while the call-sites are
re-written to modify the indirect target to be 16 bytes in front of
the original target, thus hitting this new preamble.
Notably, there is no hardware that needs call-depth-tracking (Skylake)
and supports IBT (Tigerlake and onwards).
Suggested-by: Joao Moreira (Intel) <joao@overdrivepizza.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20221027092842.634714496@infradead.org
The first argument of WARN() is a condition, so this will use "addr"
as the format string and possibly crash.
Fixes: 3b6c1747da ("x86/retpoline: Add SKL retthunk retpolines")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: https://lore.kernel.org/all/Y1gBoUZrRK5N%2FlCB@kili/
Ensure that retpolines do the proper call accounting so that the return
accounting works correctly.
Specifically; retpolines are used to replace both 'jmp *%reg' and
'call *%reg', however these two cases do not have the same accounting
requirements. Therefore split things up and provide two different
retpoline arrays for SKL.
The 'jmp *%reg' case needs no accounting, the
__x86_indirect_jump_thunk_array[] covers this. The retpoline is
changed to not use the return thunk; it's a simple call;ret construct.
[ strictly speaking it should do:
andq $(~0x1f), PER_CPU_VAR(__x86_call_depth)
but we can argue this can be covered by the fuzz we already have
in the accounting depth (12) vs the RSB depth (16) ]
The 'call *%reg' case does need accounting, the
__x86_indirect_call_thunk_array[] covers this. Again, this retpoline
avoids the use of the return-thunk, in this case to avoid double
accounting.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.996634749@infradead.org
In preparation for call depth tracking on Intel SKL CPUs, make it possible
to patch in a SKL specific return thunk.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.680469665@infradead.org
Mitigating the Intel SKL RSB underflow issue in software requires to
track the call depth. That is every CALL and every RET need to be
intercepted and additional code injected.
The existing retbleed mitigations already include means of redirecting
RET to __x86_return_thunk; this can be re-purposed and RET can be
redirected to another function doing RET accounting.
CALL accounting will use the function padding introduced in prior
patches. For each CALL instruction, the destination symbol's padding
is rewritten to do the accounting and the CALL instruction is adjusted
to call into the padding.
This ensures only affected CPUs pay the overhead of this accounting.
Unaffected CPUs will leave the padding unused and have their 'JMP
__x86_return_thunk' replaced with an actual 'RET' instruction.
Objtool has been modified to supply a .call_sites section that lists
all the 'CALL' instructions. Additionally the paravirt instruction
sites are iterated since they will have been patched from an indirect
call to direct calls (or direct instructions in which case it'll be
ignored).
Module handling and the actual thunk code for SKL will be added in
subsequent steps.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.470877038@infradead.org
The upcoming call thunk patching must hold text_mutex and needs access to
text_poke_copy(), which takes text_mutex.
Provide a _locked postfixed variant to expose the inner workings.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220915111147.159977224@infradead.org
as both vendors suggest
- Clean up pciserial a bit
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmM8AxoACgkQEsHwGGHe
VUrdqQ//c07XggioZRVWASkE5vXggSuM1BzCkf55kzx3Hp20F8ui26cgOW8nmNr9
8pKjcTSGZvZPyZFHPBmZDFqver+viSAA3y+/DopNMd6DIeQWOCtJMFdoGPuPqCMq
qYb3WkGu3i4AcjHxeiYKIlf76w+DfSomk+NspUYKLxGuPH6Hg5MFLwv32c0orh/g
sMM5YCFP4xKtMTrpZ8GXO8+81dU57KwCPnmLy6RrPPCjIHkEmS9KBaib2T9udFvf
DMQVZGEU4DQkl+SXoj5fOn03eQ56kDaH3OyV7JbHUwIXpASAEJZENwBD/HmP02aB
uN+tm9brjyJLPD92FldoJPkcsYBcwS1vx+TBH7/KLsaGZziZ1+Oc8hk6fbAd6yp9
W/ex491SCmdBA0wQelw/fZJ24/QXOT1PGIKl96srH0VPvhQtCESMDbKMdTY4p6Ow
s6GYF+HW6SV5d/jUQQzfmd1D7Ur6RxwvUnRvfDg8/AZ+INQRciMG0aFJa3E1SUJf
TkfXZ12XsL2LABK4MWOJqJsB+DSnEZbjOaPxWIKBhMUHxoc8nIny5ZFZbww5iVii
pUdKEMgicm1zuaiIoQBY9K6oOlR2ixZJDw++NzfId2jwsiR4MeJH78sB61mItMfs
WnwIkNxNfimgvcGtAogTMalJlnN56ukrjjwGrdhJGL3WovQ8qHc=
=Ggh3
-----END PGP SIGNATURE-----
Merge tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 core fixes from Borislav Petkov:
- Make sure an INT3 is slapped after every unconditional retpoline JMP
as both vendors suggest
- Clean up pciserial a bit
* tag 'x86_core_for_v6.1_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86,retpoline: Be sure to emit INT3 after JMP *%\reg
x86/earlyprintk: Clean up pciserial
I encountered some occasional crashes of poke_int3_handler() when
kprobes are set, while accessing desc->vec.
The text poke mechanism claims to have an RCU-like behavior, but it
does not appear that there is any quiescent state to ensure that
nobody holds reference to desc. As a result, the following race
appears to be possible, which can lead to memory corruption.
CPU0 CPU1
---- ----
text_poke_bp_batch()
-> smp_store_release(&bp_desc, &desc)
[ notice that desc is on
the stack ]
poke_int3_handler()
[ int3 might be kprobe's
so sync events are do not
help ]
-> try_get_desc(descp=&bp_desc)
desc = __READ_ONCE(bp_desc)
if (!desc) [false, success]
WRITE_ONCE(bp_desc, NULL);
atomic_dec_and_test(&desc.refs)
[ success, desc space on the stack
is being reused and might have
non-zero value. ]
arch_atomic_inc_not_zero(&desc->refs)
[ might succeed since desc points to
stack memory that was freed and might
be reused. ]
Fix this issue with small backportable patch. Instead of trying to
make RCU-like behavior for bp_desc, just eliminate the unnecessary
level of indirection of bp_desc, and hold the whole descriptor as a
global. Anyhow, there is only a single descriptor at any given
moment.
Fixes: 1f676247f3 ("x86/alternatives: Implement a better poke_int3_handler() completion scheme")
Signed-off-by: Nadav Amit <namit@vmware.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: stable@kernel.org
Link: https://lkml.kernel.org/r/20220920224743.3089-1-namit@vmware.com