mirror of https://github.com/torvalds/linux.git
447 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
94a855111e |
- Add the call depth tracking mitigation for Retbleed which has
been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOZp5EACgkQEsHwGGHe VUrZFxAAvi/+8L0IYSK4mKJvixGbTFjxN/Swo2JVOfs34LqGUT6JaBc+VUMwZxdb VMTFIZ3ttkKEodjhxGI7oGev6V8UfhI37SmO2lYKXpQVjXXnMlv/M+Vw3teE38CN gopi+xtGnT1IeWQ3tc/Tv18pleJ0mh5HKWiW+9KoqgXj0wgF9x4eRYDz1TDCDA/A iaBzs56j8m/FSykZHnrWZ/MvjKNPdGlfJASUCPeTM2dcrXQGJ93+X2hJctzDte0y Nuiw6Y0htfFBE7xoJn+sqm5Okr+McoUM18/CCprbgSKYk18iMYm3ZtAi6FUQZS1A ua4wQCf49loGp15PO61AS5d3OBf5D3q/WihQRbCaJvTVgPp9sWYnWwtcVUuhMllh ZQtBU9REcVJ/22bH09Q9CjBW0VpKpXHveqQdqRDViLJ6v/iI6EFGmD24SW/VxyRd 73k9MBGrL/dOf1SbEzdsnvcSB3LGzp0Om8o/KzJWOomrVKjBCJy16bwTEsCZEJmP i406m92GPXeaN1GhTko7vmF0GnkEdJs1GVCZPluCAxxbhHukyxHnrjlQjI4vC80n Ylc0B3Kvitw7LGJsPqu+/jfNHADC/zhx1qz/30wb5cFmFbN1aRdp3pm8JYUkn+l/ zri2Y6+O89gvE/9/xUhMohzHsWUO7xITiBavewKeTP9GSWybWUs= =cRy1 -----END PGP SIGNATURE----- Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ... |
|
|
|
79ad89123c |
A set of x86 cleanups:
- Rework the handling of x86_regset for 32 and 64 bit. The original
implementation tried to minimize the allocation size with quite some
hard to understand and fragile tricks. Make it robust and straight
forward by separating the register enumerations for 32 and 64 bit
completely.
- Add a few missing static annotations
- Remove the stale unused setup_once() assembly function
- Address a few minor static analysis and kernel-doc warnings
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUu0ATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUNzEACNn5XbRqxPQZak5XHeJ46/VNVTqTE0
Z7euwF8oP+aAybyDevvm18D2hB9Atn4vU9QJYhnTxBXbCLUNErKrH8FcXdNOBbeC
YdAX7nO5WH8IM+drCMySeK6Tv6rvhnDUtgBzdBSl4NdPXUSOnGo+jHqHfN/Q+/n0
yvbwSoVAjD01sxVZQqKQOrzDgDuR/zlISCVudfS+tR4Rm/CYj0cl+MQS9Z1VM3Z6
7pqyypd5+CyNAD6vTDY/q+ZK0ShfNnU9TIIoGmOB/pc0kLctwIu3MY76Uo2DUgGn
n/ItR9mvYu/QelCwX02VG3aRYJPLRfBa+DjQfZUwZapRz3rsjKtfa8ogpPZTLrSO
o4ht/jxlKKDyNOQKYeL2yy054JR4DkKziilEzw5GZHeH2y66XWudRuWfMwbTdrGc
esP5fSNfZ9uluYl6GCCw6S83RJzQ8aZXRcAy7CJgw2Qb4XE7IOA2jf18x5AYaDUp
4a6HCjbxYkEmKCkzkh9+w5koYruyizMBKMBBh5QsMzH4xp20s/vffHwbZ1tls9Za
eTDC/E+wW9Om3qynRynm0EmcHpa0j+RcmkHOhFcXj6SRLnhzktk4Rrr3vlhardS3
Pc8h3GnE5mFXqS8t3r6/hvMk+6svhSu3RbICiLNU72F/tVLU628ux/WoCKfXZloE
7HxWoVhkTF7eOw==
=DTBQ
-----END PGP SIGNATURE-----
Merge tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Thomas Gleixner:
"A set of x86 cleanups:
- Rework the handling of x86_regset for 32 and 64 bit.
The original implementation tried to minimize the allocation size
with quite some hard to understand and fragile tricks. Make it
robust and straight forward by separating the register enumerations
for 32 and 64 bit completely.
- Add a few missing static annotations
- Remove the stale unused setup_once() assembly function
- Address a few minor static analysis and kernel-doc warnings"
* tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/32: Remove setup_once()
x86/kaslr: Fix process_mem_region()'s return value
x86: Fix misc small issues
x86/boot: Repair kernel-doc for boot_kstrtoul()
x86: Improve formatting of user_regset arrays
x86: Separate out x86_regset for 32 and 64 bit
x86/i8259: Make default_legacy_pic static
x86/tsc: Make art_related_clocksource static
|
|
|
|
0ce096db71 |
Linux 6.1-rc6
-----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmN6wAgeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG0EYH/3/RO90NbrFItraN Lzr+d3VdbGjTu8xd1M+PRTmwh3zxLpB+Jwqr0T0A2gzL9B/D+AUPUJdrCVbv9DqS FLJAVqoeV20dNBAHSffOOLPsgCZ+Eu+LzlNN7Iqde0e8cyZICFMNktitui84Xm/i 1NgFVgz9OZ6+aieYvUj3FrFq0p8GTIaC/oybDZrxYKcO8ZzKVMJ11swRw10wwq0g qOOECvV3w7wlQ8upQZkzFxItKFc7EexZI6R4elXeGSJJ9Hlc092dv/zsKB9dwV+k WcwkJrZRoezYXzgGBFxUcQtzi+ethjrPjuJuM1rYLUSIcfIW/0lkaSLgRoBu8D+I 1GfXkXs= =gt6P -----END PGP SIGNATURE----- Merge tag 'v6.1-rc6' into x86/core, to resolve conflicts Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c: # upstream: |
|
|
|
ba54d194f8 |
x86/traps: avoid KMSAN bugs originating from handle_bug()
There is a case in exc_invalid_op handler that is executed outside the irqentry_enter()/irqentry_exit() region when an UD2 instruction is used to encode a call to __warn(). In that case the `struct pt_regs` passed to the interrupt handler is never unpoisoned by KMSAN (this is normally done in irqentry_enter()), which leads to false positives inside handle_bug(). Use kmsan_unpoison_entry_regs() to explicitly unpoison those registers before using them. Link: https://lkml.kernel.org/r/20221102110611.1085175-5-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
6426773410 |
x86: Fix misc small issues
Fix: ./arch/x86/kernel/traps.c: asm/proto.h is included more than once. ./arch/x86/kernel/alternative.c:1610:2-3: Unneeded semicolon. [ bp: Merge into a single patch. ] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/1620902768-53822-1-git-send-email-jiapeng.chong@linux.alibaba.com Link: https://lore.kernel.org/r/20220926054628.116957-1-jiapeng.chong@linux.alibaba.com |
|
|
|
c063a217bc |
x86/percpu: Move current_top_of_stack next to current_task
Extend the struct pcpu_hot cacheline with current_top_of_stack; another very frequently used value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.493038635@infradead.org |
|
|
|
3c516f89e1 |
x86: Add support for CONFIG_CFI_CLANG
With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately
before each function and a check to validate the target function type
before indirect calls:
; type preamble
__cfi_function:
mov <id>, %eax
function:
...
; indirect call check
mov -<id>,%r10d
add -0x4(%r11),%r10d
je .Ltmp1
ud2
.Ltmp1:
call __x86_indirect_thunk_r11
Add error handling code for the ud2 traps emitted for the checks, and
allow CONFIG_CFI_CLANG to be selected on x86_64.
This produces the following oops on CFI failure (generated using lkdtm):
[ 21.441706] CFI failure at lkdtm_indirect_call+0x16/0x20 [lkdtm]
(target: lkdtm_increment_int+0x0/0x10 [lkdtm]; expected type: 0x7e0c52a)
[ 21.444579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
[ 21.445296] CPU: 0 PID: 132 Comm: sh Not tainted
5.19.0-rc8-00020-g9f27360e674c #1
[ 21.445296] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
[ 21.445296] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm]
[ 21.445296] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f
44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8
[ 21.445296] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292
[ 21.445296] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700
[ 21.445296] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450
[ 21.445296] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610
[ 21.445296] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000
[ 21.445296] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002
[ 21.445296] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000)
knlGS:0000000000000000
[ 21.445296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.445296] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0
[ 21.445296] Call Trace:
[ 21.445296] <TASK>
[ 21.445296] lkdtm_CFI_FORWARD_PROTO+0x30/0x50 [lkdtm]
[ 21.445296] direct_entry+0x12d/0x140 [lkdtm]
[ 21.445296] full_proxy_write+0x5d/0xb0
[ 21.445296] vfs_write+0x144/0x460
[ 21.445296] ? __x64_sys_wait4+0x5a/0xc0
[ 21.445296] ksys_write+0x69/0xd0
[ 21.445296] do_syscall_64+0x51/0xa0
[ 21.445296] entry_SYSCALL_64_after_hwframe+0x63/0xcd
[ 21.445296] RIP: 0033:0x7f08b41a6fe1
[ 21.445296] Code: be 07 00 00 00 41 89 c0 e8 7e ff ff ff 44 89 c7 89
04 24 e8 91 c6 02 00 8b 04 24 48 83 c4 68 c3 48 63 ff b8 01 00 00 03
[ 21.445296] RSP: 002b:00007ffcdf65c2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
[ 21.445296] RAX: ffffffffffffffda RBX: 00007f08b4221690 RCX: 00007f08b41a6fe1
[ 21.445296] RDX: 0000000000000012 RSI: 0000000000c738f0 RDI: 0000000000000001
[ 21.445296] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefeffc5ff4e
[ 21.445296] R10: 00007f08b42222b0 R11: 0000000000000246 R12: 0000000000c738f0
[ 21.445296] R13: 0000000000000012 R14: 00007ffcdf65c401 R15: 0000000000c70450
[ 21.445296] </TASK>
[ 21.445296] Modules linked in: lkdtm
[ 21.445296] Dumping ftrace buffer:
[ 21.445296] (ftrace buffer empty)
[ 21.471442] ---[ end trace 0000000000000000 ]---
[ 21.471811] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm]
[ 21.472467] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f
44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8
[ 21.474400] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292
[ 21.474735] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700
[ 21.475664] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450
[ 21.476471] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610
[ 21.477127] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000
[ 21.477959] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002
[ 21.478657] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000)
knlGS:0000000000000000
[ 21.479577] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 21.480307] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0
[ 21.481460] Kernel panic - not syncing: Fatal exception
Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Tested-by: Kees Cook <keescook@chromium.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20220908215504.3686827-23-samitolvanen@google.com
|
|
|
|
42b682a30f |
- A bunch of changes towards streamlining low level asm helpers' calling
conventions so that former can be converted to C eventually - Simplify PUSH_AND_CLEAR_REGS so that it can be used at the system call entry paths instead of having opencoded, slightly different variants of it everywhere - Misc other fixes -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmKLeQEACgkQEsHwGGHe VUqFqQ/6AkVfWa9EMnmOcFcUYHjK7srsv7kzppc2P6ly98QOJFsCYagPRHVHXGZF k4Dezk29j2d4AjVdGot/CpTlRezSe0dmPxTcH5QD+SpiJ8bSgMrnH/0La+No0ypi VabOZgQaHWIUboccpE77oIRdglun/ZnePN3gRdBRtQWgmeQZVWxD6ly6L1Ptp1Lk nBXVMpH2h5agLjulsw7j7PihrbM6RFf3qSw4GkaQAAxooxb2i7qb05sG347lm72l 3ppsHtP80MKCmJpe20O+V+O4Hvq1/XJ18Tin6p1bhqSe0PW2pS5QUN7ziF/5orvH 9p8PVWrrH6kTaK1NJilGYG4eIeyuWhSVnObgFqbe7RIITy5eCYXyaq5PLqVahWFD qk1+Z3nsS6g6BLu10dFACnPq7O+6tVEWsoOZ2D4XJAV/zThbEwE75E4rW6x07gnm s0BzXgtzb0s35L46jzTctc9RtdCRFjZmD+iHXSqjEfH/dyS1tsvXX6z5wBTb5qn3 FQE3sVtZs0e5yIFAfp19hzmweY/Mgu9b1p+IfkhQhInrLyJNwUVsMkpH1WFdkL5/ RZWtURuYO7lE6Iw1wwZPL691A7hx+1cE9YWuEBH2Il6byJa4UWP4azXCx1nbMFKk E5ZDKL3iRsDPVI+k+D6NwBN19ih2LAmT2Mxcg1EOV434LLlkHsk= =P80f -----END PGP SIGNATURE----- Merge tag 'x86_asm_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Borislav Petkov: - A bunch of changes towards streamlining low level asm helpers' calling conventions so that former can be converted to C eventually - Simplify PUSH_AND_CLEAR_REGS so that it can be used at the system call entry paths instead of having opencoded, slightly different variants of it everywhere - Misc other fixes * tag 'x86_asm_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Fix register corruption in compat syscall objtool: Fix STACK_FRAME_NON_STANDARD reloc type linkage: Fix issue with missing symbol size x86/entry: Remove skip_r11rcx x86/entry: Use PUSH_AND_CLEAR_REGS for compat x86/entry: Simplify entry_INT80_compat() x86/mm: Simplify RESERVE_BRK() x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS x86/entry: Don't call error_entry() for XENPV x86/entry: Move CLD to the start of the idtentry macro x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry() x86/entry: Switch the stack after error_entry() returns x86/traps: Use pt_regs directly in fixup_bad_iret() |
|
|
|
0aca53c6b5 |
x86/traps: Use pt_regs directly in fixup_bad_iret()
Always stash the address error_entry() is going to return to, in %r12 and get rid of the void *error_entry_ret; slot in struct bad_iret_stack which was supposed to account for it and pt_regs pushed on the stack. After this, both fixup_bad_iret() and sync_regs() can work on a struct pt_regs pointer directly. [ bp: Rewrite commit message, touch ups. ] Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220503032107.680190-2-jiangshanlai@gmail.com |
|
|
|
7dbde76316 |
x86/mm/cpa: Add support for TDX shared memory
Intel TDX protects guest memory from VMM access. Any memory that is required for communication with the VMM must be explicitly shared. It is a two-step process: the guest sets the shared bit in the page table entry and notifies VMM about the change. The notification happens using MapGPA hypercall. Conversion back to private memory requires clearing the shared bit, notifying VMM with MapGPA hypercall following with accepting the memory with AcceptPage hypercall. Provide a TDX version of x86_platform.guest.* callbacks. It makes __set_memory_enc_pgtable() work right in TDX guest. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-27-kirill.shutemov@linux.intel.com |
|
|
|
9a22bf6deb |
x86/traps: Add #VE support for TDX guest
Virtualization Exceptions (#VE) are delivered to TDX guests due to specific guest actions which may happen in either user space or the kernel: * Specific instructions (WBINVD, for example) * Specific MSR accesses * Specific CPUID leaf accesses * Access to specific guest physical addresses Syscall entry code has a critical window where the kernel stack is not yet set up. Any exception in this window leads to hard to debug issues and can be exploited for privilege escalation. Exceptions in the NMI entry code also cause issues. Returning from the exception handler with IRET will re-enable NMIs and nested NMI will corrupt the NMI stack. For these reasons, the kernel avoids #VEs during the syscall gap and the NMI entry code. Entry code paths do not access TD-shared memory, MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves that might generate #VE. VMM can remove memory from TD at any point, but access to unaccepted (or missing) private memory leads to VM termination, not to #VE. Similarly to page faults and breakpoints, #VEs are allowed in NMI handlers once the kernel is ready to deal with nested NMIs. During #VE delivery, all interrupts, including NMIs, are blocked until TDGETVEINFO is called. It prevents #VE nesting until the kernel reads the VE info. TDGETVEINFO retrieves the #VE info from the TDX module, which also clears the "#VE valid" flag. This must be done before anything else as any #VE that occurs while the valid flag is set escalates to #DF by TDX module. It will result in an oops. Virtual NMIs are inhibited if the #VE valid flag is set. NMI will not be delivered until TDGETVEINFO is called. For now, convert unhandled #VE's (everything, until later in this series) so that they appear just like a #GP by calling the ve_raise_fault() directly. The ve_raise_fault() function is similar to #GP handler and is responsible for sending SIGSEGV to userspace and CPU die and notifying debuggers and other die chain users. Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220405232939.73860-8-kirill.shutemov@linux.intel.com |
|
|
|
775acc82a8 |
x86/traps: Refactor exc_general_protection()
TDX brings a new exception -- Virtualization Exception (#VE). Handling of #VE structurally very similar to handling #GP. Extract two helpers from exc_general_protection() that can be reused for handling #VE. No functional changes. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220405232939.73860-7-kirill.shutemov@linux.intel.com |
|
|
|
8c490b42fe |
Merge branch 'x86/pasid' into x86/core, to resolve conflicts
Conflicts: tools/objtool/arch/x86/decode.c Signed-off-by: Ingo Molnar <mingo@kernel.org> |
|
|
|
991625f3dd |
x86/ibt: Add IBT feature, MSR and #CP handling
The bits required to make the hardware go.. Of note is that, provided the syscall entry points are covered with ENDBR, #CP doesn't need to be an IST because we'll never hit the syscall gap. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.582331711@infradead.org |
|
|
|
a365a65f9c |
x86/traps: Mark do_int3() NOKPROBE_SYMBOL
Since kprobe_int3_handler() is called in do_int3(), probing do_int3()
can cause a breakpoint recursion and crash the kernel. Therefore,
do_int3() should be marked as NOKPROBE_SYMBOL.
Fixes:
|
|
|
|
fa6af69f38 |
x86/traps: Demand-populate PASID MSR via #GP
All tasks start with PASID state disabled. This means that the first time they execute an ENQCMD instruction they will take a #GP fault. Modify the #GP fault handler to check if the "mm" for the task has already been allocated a PASID. If so, try to fix the #GP fault by loading the IA32_PASID MSR. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220207230254.3342514-9-fenghua.yu@intel.com |
|
|
|
cc0356d6a0 |
- Do not #GP on userspace use of CLI/STI but pretend it was a NOP to
keep old userspace from breaking. Adjust the corresponding iopl selftest to that. - Improve stack overflow warnings to say which stack got overflowed and raise the exception stack sizes to 2 pages since overflowing the single page of exception stack is very easy to do nowadays with all the tracing machinery enabled. With that, rip out the custom mapping of AMD SEV's too. - A bunch of changes in preparation for FGKASLR like supporting more than 64K section headers in the relocs tool, correct ORC lookup table size to cover the whole kernel .text and other adjustments. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/uugACgkQEsHwGGHe VUroKw//e8BJ3Aun8bg00FHxfiMGbPYcozjLGDkaoMtMDZ8WlfCUrvtqYICEr8eB UU0eRyygAPI167dre1O9JvAcbilkNTKntaU6qbu/ZVyUwS3+Jkjwsotbqn3xKtkd QDDTDNiCU+beCJ2ZbspbrPgEh13+H0MwMHUfRxZB9Scpmo6aGSEaU3g295f6GX57 VFGJ/LNov5MV1dTD7Pp/h6/Nb+R6WmflKcBzJmQxYuKyKX+g1xsSv0VSga+t+uf3 M9pUkizqTiUxzC2eLgtcEZTqqBHu810E8M76FmhKBUMilsFJT5YAJTiqyahwHXds HYarOFRgcnFuJPd29vn8UHjqeeoi6ru8GtcZYzccEc7U3ku/gXPaDJ9ffmvhs7vU pJX5Um3GiiFm0w/ZZOKDqh78wRAsCKLN+jIoyszuhkkNchZSj/jKfOgdd3EmcZst 6L6rxBA4oRHwNOgM7uVMp+jFeRe1/prR280OWWH0D4QmmuqybThOdO23Iuh/Deth W3qPUH3UQtfSWxGy2yODzJ1ciuGAr/AzJZ9zjg04e3Vl0DkEpyWtLKJiG3ClXZag Nj+3xc4xYH2Aw+M0HRaONk5XVKLpqVjuAfgU5iLQa0YSUbtrR+wCWvY8KgQNbAqK xZmzYzQ89stwVCuGKx10gPsL3jSJ3VCylMfqdHD2Ajmld1yApr0= =DOZU -----END PGP SIGNATURE----- Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to keep old userspace from breaking. Adjust the corresponding iopl selftest to that. - Improve stack overflow warnings to say which stack got overflowed and raise the exception stack sizes to 2 pages since overflowing the single page of exception stack is very easy to do nowadays with all the tracing machinery enabled. With that, rip out the custom mapping of AMD SEV's too. - A bunch of changes in preparation for FGKASLR like supporting more than 64K section headers in the relocs tool, correct ORC lookup table size to cover the whole kernel .text and other adjustments. * tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86/iopl: Adjust to the faked iopl CLI/STI usage vmlinux.lds.h: Have ORC lookup cover entire _etext - _stext x86/boot/compressed: Avoid duplicate malloc() implementations x86/boot: Allow a "silent" kaslr random byte fetch x86/tools/relocs: Support >64K section headers x86/sev: Make the #VC exception stacks part of the default stacks storage x86: Increase exception stack sizes x86/mm/64: Improve stack overflow warnings x86/iopl: Fake iopl(3) CLI/STI usage |
|
|
|
20273d2588 |
- Export sev_es_ghcb_hv_call() so that HyperV Isolation VMs can use it too
- Non-urgent fixes and cleanups -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmF/xXMACgkQEsHwGGHe VUpFohAAn1FcRfgUh4a7SZQudhWaYPye0Yaf9c9acJIDYfls4Qg3ZLvSNGS0QChW pcjNQzr42UymxZKq1t6JGaUlD0vkfW0p+w5wueeIxMltWG0oZXgUPhqWrFTLwBtR g5Gio3Jum1CULCMokS6W4MjJSkTtX5NyYPg+m5Siowy10cbBdYA4wJaKnwGslPT7 4pCDQP5159cjmG9WthKppxUdFy/vql0NJhjxmUkha39eVJ7yLoWvJoubQqqGnqXF XHwFolZGBxm4Ed4XoUjtz4HgI0VD1JOImUBPqnaE/uyrU7bqqywe5/PpZP051xtF anpWBm8KbZFsh220bSRJdFQxQBiXaIA41tfBiqVQhrgPy6TKgq7glhD4/ZjvUAdu DDg2HYEnK3dBAOCa7zIj/+uTijD1nvvuhQblGB2PnvnD2RWWgl+0vZ9Wqspo0EyW ry5V7hGCMC3mgFexTtvwd1hvMJVYrKfyn2XcP9B+zdgpUJ9DprB+g1O1J6NkGe1r SKS6itMokVRd+I+16iFQh0PuywqldbNv9dby6bd+dtvxAcVER2vUA0C7wmjqX4Mx bpftPrNhdNmgQAYlN/tRIfh2t2cFTJnWegVBBErdEfafiqKL9lU8gQlMVgwY10o+ a1ALQ5cUI9Y0xS4cJtfVBVIekqIwEbmniS66iMlMiEJx+Ar6T8g= =Gql9 -----END PGP SIGNATURE----- Merge tag 'x86_sev_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Export sev_es_ghcb_hv_call() so that HyperV Isolation VMs can use it too - Non-urgent fixes and cleanups * tag 'x86_sev_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV x86/sev: Allow #VC exceptions on the VC2 stack x86/sev: Fix stack type check in vc_switch_off_ist() x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c x86/sev: Carve out HV call's return value verification |
|
|
|
783e87b404 |
x86/fpu/xstate: Add XFD #NM handler
If the XFD MSR has feature bits set then #NM will be raised when user space attempts to use an instruction related to one of these features. When the task has no permissions to use that feature, raise SIGILL, which is the same behavior as #UD. If the task has permissions, calculate the new buffer size for the extended feature set and allocate a larger fpstate. In the unlikely case that vzalloc() fails, SIGSEGV is raised. The allocation function will be added in the next step. Provide a stub which fails for now. [ tglx: Updated serialization ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com |
|
|
|
5681981fb7 |
x86/sev: Fix stack type check in vc_switch_off_ist()
The value of STACK_TYPE_EXCEPTION_LAST points to the last _valid_
exception stack. Reflect that in the check done in the
vc_switch_off_ist() function.
Fixes:
|
|
|
|
b56d2795b2 |
x86/fpu: Replace the includes of fpu/internal.h
Now that the file is empty, fixup all references with the proper includes and delete the former kitchen sink. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211015011540.001197214@linutronix.de |
|
|
|
44b979fa30 |
x86/mm/64: Improve stack overflow warnings
Current code has an explicit check for hitting the task stack guard; but overflowing any of the other stacks will get you a non-descript general #DF warning. Improve matters by using get_stack_info_noinstr() to detetrmine if and which stack guard page got hit, enabling a better stack warning. In specific, Michael Wang reported what turned out to be an NMI exception stack overflow, which is now clearly reported as such: [] BUG: NMI stack guard page was hit at 0000000085fd977b (stack is 000000003a55b09e..00000000d8cce1a5) Reported-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Wang <yun.wang@linux.alibaba.com> Link: https://lkml.kernel.org/r/YUTE/NuqnaWbST8n@hirez.programming.kicks-ass.net |
|
|
|
b968e84b50 |
x86/iopl: Fake iopl(3) CLI/STI usage
Since commit |
|
|
|
1423e2660c |
Fixes and improvements for FPU handling on x86:
- Prevent sigaltstack out of bounds writes. The kernel unconditionally
writes the FPU state to the alternate stack without checking whether
the stack is large enough to accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been
updated despite the fact that the FPU state which is stored on the
signal stack has grown over time which causes trouble in the field
when AVX512 is available on a CPU. The kernel does not expose the
minimum requirements for the alternate stack size depending on the
available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well
- A major cleanup of the x86 FPU code. The recent discoveries of XSTATE
related issues unearthed quite some inconsistencies, duplicated code
and other issues.
The fine granular overhaul addresses this, makes the code more robust
and maintainable, which allows to integrate upcoming XSTATE related
features in sane ways.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmDlcpETHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoeP5D/4i+AgYYeiMLgGb+NS7iaKPfoWo6LIz
y3qdTSA0DQaIYbYivWwRO/g0GYdDMXDWeZalFi7eGnVI8O3eOog+22Zrf/y0UINB
KJHdYd4ApWHhs401022y5hexrWQvnV8w1yQCuj/zLm6eC+AVhdwt2AY+IBoRrdUj
wqY97B/4rJNsBvvqTDn9EeDrJA2y0y0Suc7AhIp2BGMI+dpIdxys8RJDamXNWyDL
gJf0YRgUoiIn3AHKb+fgv60AoxfC175NSg/5/y/scFNXqVlW0Up4YCb7pqG9o2Ga
f3XvtWfbw1N5PmUYjFkALwEkzGUbM3v0RA3xLY2j2WlWm9fBPPy59dt+i/h/VKyA
GrA7i7lcIqX8dfVH6XkrReZBkRDSB6t9SZTvV54jAz5fcIZO2Rg++UFUvI/R6GKK
XCcxukYaArwo+IG62iqDszS3gfLGhcor/cviOeULRC5zMUIO4Jah+IhDnifmShtC
M5s9QzrwIRD/XMewGRQmvkiN4kBfE7jFoBQr1J9leCXJKrM+2JQmMzVInuubTQIq
SdlKOaAIn7xtekz+6XdFG9Gmhck0PCLMJMOLNvQkKWI3KqGLRZ+dAWKK0vsCizAx
0BA7ZeB9w9lFT+D8mQCX77JvW9+VNwyfwIOLIrJRHk3VqVpS5qvoiFTLGJJBdZx4
/TbbRZu7nXDN2w==
=Mq1m
-----END PGP SIGNATURE-----
Merge tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fpu updates from Thomas Gleixner:
"Fixes and improvements for FPU handling on x86:
- Prevent sigaltstack out of bounds writes.
The kernel unconditionally writes the FPU state to the alternate
stack without checking whether the stack is large enough to
accomodate it.
Check the alternate stack size before doing so and in case it's too
small force a SIGSEGV instead of silently corrupting user space
data.
- MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never
been updated despite the fact that the FPU state which is stored on
the signal stack has grown over time which causes trouble in the
field when AVX512 is available on a CPU. The kernel does not expose
the minimum requirements for the alternate stack size depending on
the available and enabled CPU features.
ARM already added an aux vector AT_MINSIGSTKSZ for the same reason.
Add it to x86 as well.
- A major cleanup of the x86 FPU code. The recent discoveries of
XSTATE related issues unearthed quite some inconsistencies,
duplicated code and other issues.
The fine granular overhaul addresses this, makes the code more
robust and maintainable, which allows to integrate upcoming XSTATE
related features in sane ways"
* tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits)
x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again
x86/fpu/signal: Let xrstor handle the features to init
x86/fpu/signal: Handle #PF in the direct restore path
x86/fpu: Return proper error codes from user access functions
x86/fpu/signal: Split out the direct restore code
x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing()
x86/fpu/signal: Sanitize the xstate check on sigframe
x86/fpu/signal: Remove the legacy alignment check
x86/fpu/signal: Move initial checks into fpu__restore_sig()
x86/fpu: Mark init_fpstate __ro_after_init
x86/pkru: Remove xstate fiddling from write_pkru()
x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate()
x86/fpu: Remove PKRU handling from switch_fpu_finish()
x86/fpu: Mask PKRU from kernel XRSTOR[S] operations
x86/fpu: Hook up PKRU into ptrace()
x86/fpu: Add PKRU storage outside of task XSAVE buffer
x86/fpu: Dont restore PKRU in fpregs_restore_userspace()
x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi()
x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs()
x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs()
...
|
|
|
|
b2681e791d |
x86/fpu: Rename and sanitize fpu__save/copy()
Both function names are a misnomer. fpu__save() is actually about synchronizing the hardware register state into the task's memory state so that either coredump or a math exception handler can inspect the state at the time where the problem happens. The function guarantees to preserve the register state, while "save" is a common terminology for saving the current state so it can be modified and restored later. This is clearly not the case here. Rename it to fpu_sync_fpstate(). fpu__copy() is used to clone the current task's FPU state when duplicating task_struct. While the register state is a copy the rest of the FPU state is not. Name it accordingly and remove the really pointless @src argument along with the warning which comes along with it. Nothing can ever copy the FPU state of a non-current task. It's clearly just a consequence of arch_dup_task_struct(), but it makes no sense to proliferate that further. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.196727450@linutronix.de |
|
|
|
1dcc917a0e |
x86/idt: Rework IDT setup for boot CPU
A basic IDT setup for the boot CPU has to be done before invoking cpu_init() because that might trigger #GP when accessing certain MSRs. This setup cannot install the IST variants on 64-bit because the TSS setup which is required for ISTs to work happens in cpu_init(). That leaves a theoretical window where a NMI would invoke the ASM entry point which relies on IST being enabled on the kernel stack which is undefined behaviour. This setup logic has never worked correctly, but on the other hand a NMI hitting the boot CPU before it has fully set up the IDT would be fatal anyway. So the small window between the wrong NMI gate and the IST based NMI gate is not really adding a substantial amount of risk. But the setup logic is nevertheless more convoluted than necessary. The recent separation of the TSS setup into a separate function to ensure that setup so it can setup TSS first, then initialize IDT with the IST variants before invoking cpu_init() and get rid of the post cpu_init() IST setup. Move the invocation of cpu_init_exception_handling() ahead of idt_setup_traps() and merge the IST setup into the default setup table. Reported-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210507114000.569244755@linutronix.de |
|
|
|
b1efd0ff4b |
x86/cpu: Init AP exception handling from cpu_init_secondary()
SEV-ES guests require properly setup task register with which the TSS descriptor in the GDT can be located so that the IST-type #VC exception handler which they need to function properly, can be executed. This setup needs to happen before attempting to load microcode in ucode_cpu_init() on secondary CPUs which can cause such #VC exceptions. Simplify the machinery by running that exception setup from a new function cpu_init_secondary() and explicitly call cpu_init_exception_handling() for the boot CPU before cpu_init(). The latter prepares for fixing and simplifying the exception/IST setup on the boot CPU. There should be no functional changes resulting from this patch. [ tglx: Reworked it so cpu_init_exception_handling() stays seperate ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/87k0o6gtvu.ffs@nanos.tec.linutronix.de |
|
|
|
c6536676c7 |
- turn the stack canary into a normal __percpu variable on 32-bit which
gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferrably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCHyJQACgkQEsHwGGHe VUpjiRAAwPZdwwp08ypZuMHR4EhLNru6gYhbAoALGgtYnQjLtn5onQhIeieK+R4L cmZpxHT9OFp5dXHk4kwygaQBsD4pPOiIpm60kye1dN3cSbOORRdkwEoQMpKMZ+5Y kvVsmn7lrwRbp600KdE4G6L5+N6gEgr0r6fMFWWGK3mgVAyCzPexVHgydcp131ch iYMo6/pPDcNkcV/hboVKgx7GISdQ7L356L1MAIW/Sxtw6uD/X4qGYW+kV2OQg9+t nQDaAo7a8Jqlop5W5TQUdMLKQZ1xK8SFOSX/nTS15DZIOBQOGgXR7Xjywn1chBH/ PHLwM5s4XF6NT5VlIA8tXNZjWIZTiBdldr1kJAmdDYacrtZVs2LWSOC0ilXsd08Z EWtvcpHfHEqcuYJlcdALuXY8xDWqf6Q2F7BeadEBAxwnnBg+pAEoLXI/1UwWcmsj wpaZTCorhJpYo2pxXckVdHz2z0LldDCNOXOjjaWU8tyaOBKEK6MgAaYU7e0yyENv mVc9n5+WuvXuivC6EdZ94Pcr/KQsd09ezpJYcVfMDGv58YZrb6XIEELAJIBTu2/B Ua8QApgRgetx+1FKb8X6eGjPl0p40qjD381TADb4rgETPb1AgKaQflmrSTIik+7p O+Eo/4x/GdIi9jFk3K+j4mIznRbUX0cheTJgXoiI4zXML9Jv94w= =bm4S -----END PGP SIGNATURE----- Merge tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Borislav Petkov: - Turn the stack canary into a normal __percpu variable on 32-bit which gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferrably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. * tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, sched: Treat Intel SNC topology as default, COD as exception x86/cpu: Comment Skylake server stepping too x86/cpu: Resort and comment Intel models objtool/x86: Rewrite retpoline thunk calls objtool: Skip magical retpoline .altinstr_replacement objtool: Cache instruction relocs objtool: Keep track of retpoline call sites objtool: Add elf_create_undef_symbol() objtool: Extract elf_symbol_add() objtool: Extract elf_strtab_concat() objtool: Create reloc sections implicitly objtool: Add elf_create_reloc() helper objtool: Rework the elf_rebuild_reloc_section() logic objtool: Fix static_call list generation objtool: Handle per arch retpoline naming objtool: Correctly handle retpoline thunk calls x86/retpoline: Simplify retpolines x86/alternatives: Optimize optimize_nops() x86: Add insn_decode_kernel() x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration ... |
|
|
|
64f8e73de0 |
Support for enhanced split lock detection:
Newer CPUs provide a second mechanism to detect operations with lock prefix which go accross a cache line boundary. Such operations have to take bus lock which causes a system wide performance degradation when these operations happen frequently. The new mechanism is not using the #AC exception. It triggers #DB and is restricted to operations in user space. Kernel side split lock access can only be detected by the #AC based variant. Contrary to the #AC based mechanism the #DB based variant triggers _after_ the instruction was executed. The mechanism is CPUID enumerated and contrary to the #AC version which is based on the magic TEST_CTRL_MSR and model/family based enumeration on the way to become architectural. -----BEGIN PGP SIGNATURE----- iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmCGkr8THHRnbHhAbGlu dXRyb25peC5kZQAKCRCmGPVMDXSYodUKD/9tUXhInR7+1ykEHpMvdmSp48vqY3nc sKmT22pPl+OchnJ62mw3T8gKpBYVleJmcCaY2qVx7hfaVcWApLGJvX4tmfXmv422 XDSJ6b8Os6wfgx5FR//I17z8ZtXnnuKkPrTMoRsQUw2qLq31y6fdQv+GW/cc1Kpw mengjmPE+HnpaKbtuQfPdc4a+UvLjvzBMAlDZPTBPKYrP4FFqYVnUVwyTg5aLVDY gHz4V8+b502RS/zPfTAtE3J848od+NmcUPdFlcG9DVA+hR0Rl0thvruCTFiD2vVh i9DJ7INof5FoJDEzh0dGsD7x+MB6OY8GZyHdUMeGgIRPtWkqrG52feQQIn2YYlaL fB3DlpNv7NIJ/0JMlALvh8S0tEoOcYdHqH+M/3K/zbzecg/FAo+lVo8WciGLPqWs ykUG5/f/OnlTvgB8po1ebJu0h0jHnoK9heWWXk9zWIRVDPXHFOWKW3kSbTTb3icR 9hfjP/SNejpmt9Ju1OTwsgnV7NALIdVX+G5jyIEsjFl31Co1RZNYhHLFvi11FWlQ /ssvFK9O5ZkliocGCAN9+yuOnM26VqWSCE4fis6/2aSgD2Y4Gpvb//cP96SrcNAH u8eXNvGLlniJP3F3JImWIfIPQTrpvQhcU4eZ6NtviXqj/utQXX6c9PZ1PLYpcvUh 9AWF8rwhT8X4oA== =lmi8 -----END PGP SIGNATURE----- Merge tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 bus lock detection updates from Thomas Gleixner: "Support for enhanced split lock detection: Newer CPUs provide a second mechanism to detect operations with lock prefix which go accross a cache line boundary. Such operations have to take bus lock which causes a system wide performance degradation when these operations happen frequently. The new mechanism is not using the #AC exception. It triggers #DB and is restricted to operations in user space. Kernel side split lock access can only be detected by the #AC based variant. Contrary to the #AC based mechanism the #DB based variant triggers _after_ the instruction was executed. The mechanism is CPUID enumerated and contrary to the #AC version which is based on the magic TEST_CTRL_MSR and model/family based enumeration on the way to become architectural" * tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/admin-guide: Change doc for split_lock_detect parameter x86/traps: Handle #DB for bus lock x86/cpufeatures: Enumerate #DB for bus lock detection |
|
|
|
ea5bc7b977 |
Trivial cleanups and fixes all over the place.
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmCGmYIACgkQEsHwGGHe VUr45w/8CSXr7MXaFBj4To0hTWJXSZyF6YGqlZOSJXFcFh4cWTNwfVOoFaV47aDo +HsCNTkGENcKhLrDUWDRiG/Uo46jxtOtl1vhq7U4pGemSYH871XWOKfb5k5XNMwn /uhaHMI4aEfd6bUFnF518NeyRIsD0BdqFj4tB7RbAiyFwdETDX9Tkj/uBKnQ4zon 4tEDoXgThuK5YKK9zVQg5pa7aFp2zg1CAdX/WzBkS8BHVBPXSV0CF97AJYQOM/V+ lUHv+BN3wp97GYHPQMPsbkNr8IuFoe2mIvikwjxg8iOFpzEU1G1u09XV9R+PXByX LclFTRqK/2uU5hJlcsBiKfUuidyErYMRYImbMAOREt2w0ogWVu2zQ7HkjVve25h1 sQPwPudbAt6STbqRxvpmB3yoV4TCYwnF91FcWgEy+rcEK2BDsHCnScA45TsK5I1C kGR1K17pHXprgMZFPveH+LgxewB6smDv+HllxQdSG67LhMJXcs2Epz0TsN8VsXw8 dlD3lGReK+5qy9FTgO7mY0xhiXGz1IbEdAPU4eRBgih13puu03+jqgMaMabvBWKD wax+BWJUrPtetwD5fBPhlS/XdJDnd8Mkv2xsf//+wT0s4p+g++l1APYxeB8QEehm Pd7Mvxm4GvQkfE13QEVIPYQRIXCMH/e9qixtY5SHUZDBVkUyFM0= =bO1i -----END PGP SIGNATURE----- Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Borislav Petkov: "Trivial cleanups and fixes all over the place" * tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove me from IDE/ATAPI section x86/pat: Do not compile stubbed functions when X86_PAT is off x86/asm: Ensure asm/proto.h can be included stand-alone x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files x86/msr: Make locally used functions static x86/cacheinfo: Remove unneeded dead-store initialization x86/process/64: Move cpu_current_top_of_stack out of TSS tools/turbostat: Unmark non-kernel-doc comment x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL() x86/fpu/math-emu: Fix function cast warning x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes x86: Fix various typos in comments, take #2 x86: Remove unusual Unicode characters from comments x86/kaslr: Return boolean values from a function returning bool x86: Fix various typos in comments x86/setup: Remove unused RESERVE_BRK_ARRAY() stacktrace: Move documentation for arch_stack_walk_reliable() to header x86: Remove duplicate TSC DEADLINE MSR definitions |
|
|
|
632a1c209b |
x86/traps: Correct exc_general_protection() and math_error() return paths
Commit |
|
|
|
52fa82c21f |
x86: Add insn_decode_kernel()
Add a helper to decode kernel instructions; there's no point in endlessly repeating those last two arguments. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210326151259.379242587@infradead.org |
|
|
|
ebb1064e7c |
x86/traps: Handle #DB for bus lock
Bus locks degrade performance for the whole system, not just for the CPU that requested the bus lock. Two CPU features "#AC for split lock" and "#DB for bus lock" provide hooks so that the operating system may choose one of several mitigation strategies. #AC for split lock is already implemented. Add code to use the #DB for bus lock feature to cover additional situations with new options to mitigate. split_lock_detect= #AC for split lock #DB for bus lock off Do nothing Do nothing warn Kernel OOPs Warn once per task and Warn once per task and and continues to run. disable future checking When both features are supported, warn in #AC fatal Kernel OOPs Send SIGBUS to user. Send SIGBUS to user When both features are supported, fatal in #AC ratelimit:N Do nothing Limit bus lock rate to N per second in the current non-root user. Default option is "warn". Hardware only generates #DB for bus lock detect when CPL>0 to avoid nested #DB from multiple bus locks while the first #DB is being handled. So no need to handle #DB for bus lock detected in the kernel. #DB for bus lock is enabled by bus lock detection bit 2 in DEBUGCTL MSR while #AC for split lock is enabled by split lock detection bit 29 in TEST_CTRL MSR. Both breakpoint and bus lock in the same instruction can trigger one #DB. The bus lock is handled before the breakpoint in the #DB handler. Delivery of #DB for bus lock in userspace clears DR6[11], which is set by the #DB handler right after reading DR6. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Link: https://lore.kernel.org/r/20210322135325.682257-3-fenghua.yu@intel.com |
|
|
|
6256e668b7 |
x86/kprobes: Use int3 instead of debug trap for single-step
Use int3 instead of debug trap exception for single-stepping the probed instructions. Some instructions which change the ip registers or modify IF flags are emulated because those are not able to be single-stepped by int3 or may allow the interrupt while single-stepping. This actually changes the kprobes behavior. - kprobes can not probe following instructions; int3, iret, far jmp/call which get absolute address as immediate, indirect far jmp/call, indirect near jmp/call with addressing by memory (register-based indirect jmp/call are OK), and vmcall/vmlaunch/vmresume/vmxoff. - If the kprobe post_handler doesn't set before registering, it may not be called in some case even if you set it afterwards. (IOW, kprobe booster is enabled at registration, user can not change it) But both are rare issue, unsupported instructions will not be used in the kernel (or rarely used), and post_handlers are rarely used (I don't see it except for the test code). Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/161469874601.49483.11985325887166921076.stgit@devnote2 |
|
|
|
d9f6e12fb0 |
x86: Fix various typos in comments
Fix ~144 single-word typos in arch/x86/ code comments. Doing this in a single commit should reduce the churn. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: linux-kernel@vger.kernel.org |
|
|
|
0be7f42d6f |
x86/traps: Convert to insn_decode()
Simplify code, no functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210304174237.31945-15-bp@alien8.de |
|
|
|
78a81d88f6 |
x86/sev-es: Introduce ip_within_syscall_gap() helper
Introduce a helper to check whether an exception came from the syscall
gap and use it in the SEV-ES code. Extend the check to also cover the
compatibility SYSCALL entry path.
Fixes:
|
|
|
|
e14fd4ba8f |
x86/split-lock: Avoid returning with interrupts enabled
When a split lock is detected always make sure to disable interrupts before returning from the trap handler. The kernel exit code assumes that all exits run with interrupts disabled, otherwise the SWAPGS sequence can race against interrupts and cause recursing page faults and later panics. The problem will only happen on CPUs with split lock disable functionality, so Icelake Server, Tiger Lake, Snow Ridge, Jacobsville. Fixes: |
|
|
|
1ac0884d54 |
A set of updates for entry/exit handling:
- More generalization of entry/exit functionality
- The consolidation work to reclaim TIF flags on x86 and also for non-x86
specific TIF flags which are solely relevant for syscall related work
and have been moved into their own storage space. The x86 specific part
had to be merged in to avoid a major conflict.
- The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
delivery mode of task work and results in an impressive performance
improvement for io_uring. The non-x86 consolidation of this is going to
come seperate via Jens.
- The selective syscall redirection facility which provides a clean and
efficient way to support the non-Linux syscalls of WINE by catching them
at syscall entry and redirecting them to the user space emulation. This
can be utilized for other purposes as well and has been designed
carefully to avoid overhead for the regular fastpath. This includes the
core changes and the x86 support code.
- Simplification of the context tracking entry/exit handling for the users
of the generic entry code which guarantee the proper ordering and
protection.
- Preparatory changes to make the generic entry code accomodate S390
specific requirements which are mostly related to their syscall restart
mechanism.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAl/XoPoTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoe0tD/4jSKHIogVM9kVpiYfwjDGS1NluaBXn
71ZoASbX9GZebyGandMyF2QP1iJ24ZO0RztBwHEVH6fyomKB2iFNedssCpO9yfWV
3eFRpOvMpbszY2W2bd0QG3GrqaTttjVfB4ahkGLzqeSbchdob6hZpNDYtBZnujA6
GSnrrurfJkCGoQny+yJQYdQJXQU+BIX90B2a2Q+jW123Luy/iHXC1f/krZSA1m14
fC9xYLSUjPphTzh2ZOW+C3DgdjOL5PfAm/6F+DArt4GtLgrEGD7R74aLSFhvetky
dn5QtG+yAsz1i0cc5Wu/JBcT9tOkY92rPYSyLI9bYQUSQ/bMyuprz6oYKj3dubsu
ZSsKPdkNFPIniL4fLdCMWZcIXX5xgnrxKjdgXZXW3gtrcxSns8w8uED3Sh7dgE08
pgIeq67E5g/OB8kJXH1VxdewmeQb9cOmnzzHwNO7TrrGbBKjDTYHNdYOKf1dUTTK
ZX1UjLfGwxTkMYAbQD1k0JGZ2OLRshzSaH5BW/ZKa3bvJW6yYOq+/YT8B8hbJ8U3
vThlO75/55IJxS5r5Y3vZd/IHdsYbPuETD+TA8tNYtPqNZasW8nnk4TYctWqzDuO
/Ka1wvWYid3c6ySznQn4zSyRjr968AfHeZ9YTUMhWufy5waXVmdBMG41u3IKfsVt
osyzNc4EK19/Mg==
=hsjV
-----END PGP SIGNATURE-----
Merge tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry/exit updates from Thomas Gleixner:
"A set of updates for entry/exit handling:
- More generalization of entry/exit functionality
- The consolidation work to reclaim TIF flags on x86 and also for
non-x86 specific TIF flags which are solely relevant for syscall
related work and have been moved into their own storage space. The
x86 specific part had to be merged in to avoid a major conflict.
- The TIF_NOTIFY_SIGNAL work which replaces the inefficient signal
delivery mode of task work and results in an impressive performance
improvement for io_uring. The non-x86 consolidation of this is
going to come seperate via Jens.
- The selective syscall redirection facility which provides a clean
and efficient way to support the non-Linux syscalls of WINE by
catching them at syscall entry and redirecting them to the user
space emulation. This can be utilized for other purposes as well
and has been designed carefully to avoid overhead for the regular
fastpath. This includes the core changes and the x86 support code.
- Simplification of the context tracking entry/exit handling for the
users of the generic entry code which guarantee the proper ordering
and protection.
- Preparatory changes to make the generic entry code accomodate S390
specific requirements which are mostly related to their syscall
restart mechanism"
* tag 'core-entry-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
entry: Add syscall_exit_to_user_mode_work()
entry: Add exit_to_user_mode() wrapper
entry_Add_enter_from_user_mode_wrapper
entry: Rename exit_to_user_mode()
entry: Rename enter_from_user_mode()
docs: Document Syscall User Dispatch
selftests: Add benchmark for syscall user dispatch
selftests: Add kselftest for syscall user dispatch
entry: Support Syscall User Dispatch on common syscall entry
kernel: Implement selective syscall userspace redirection
signal: Expose SYS_USER_DISPATCH si_code type
x86: vdso: Expose sigreturn address on vdso to the kernel
MAINTAINERS: Add entry for common entry code
entry: Fix boot for !CONFIG_GENERIC_ENTRY
x86: Support HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Only define schedule_user() on !HAVE_CONTEXT_TRACKING_OFFSTACK archs
sched: Detect call to schedule from critical entry code
context_tracking: Don't implement exception_enter/exit() on CONFIG_HAVE_CONTEXT_TRACKING_OFFSTACK
context_tracking: Introduce HAVE_CONTEXT_TRACKING_OFFSTACK
x86: Reclaim unused x86 TI flags
...
|
|
|
|
334872a091 |
x86/traps: Attempt to fixup exceptions in vDSO before signaling
vDSO functions can now leverage an exception fixup mechanism similar to kernel exception fixup. For vDSO exception fixup, the initial user is Intel's Software Guard Extensions (SGX), which will wrap the low-level transitions to/from the enclave, i.e. EENTER and ERESUME instructions, in a vDSO function and leverage fixup to intercept exceptions that would otherwise generate a signal. This allows the vDSO wrapper to return the fault information directly to its caller, obviating the need for SGX applications and libraries to juggle signal handlers. Attempt to fixup vDSO exceptions immediately prior to populating and sending signal information. Except for the delivery mechanism, an exception in a vDSO function should be treated like any other exception in userspace, e.g. any fault that is successfully handled by the kernel should not be directly visible to userspace. Although it's debatable whether or not all exceptions are of interest to enclaves, defer to the vDSO fixup to decide whether to do fixup or generate a signal. Future users of vDSO fixup, if there ever are any, will undoubtedly have different requirements than SGX enclaves, e.g. the fixup vs. signal logic can be made function specific if/when necessary. Suggested-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Jethro Beekman <jethro@fortanix.com> Link: https://lkml.kernel.org/r/20201112220135.165028-19-jarkko@kernel.org |
|
|
|
b6be002bcd |
x86/entry: Move nmi entry/exit into common code
Lockdep state handling on NMI enter and exit is nothing specific to X86. It's not any different on other architectures. Also the extra state type is not necessary, irqentry_state_t can carry the necessary information as well. Move it to common code and extend irqentry_state_t to carry lockdep state. [ Ira: Make exit_rcu and lockdep a union as they are mutually exclusive between the IRQ and NMI exceptions, and add kernel documentation for struct irqentry_state_t ] Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20201102205320.1458656-7-ira.weiny@intel.com |
|
|
|
cb05143bdf |
x86/debug: Fix DR_STEP vs ptrace_get_debugreg(6)
Commit |
|
|
|
a195f3d452 |
x86/debug: Only clear/set ->virtual_dr6 for userspace #DB
The ->virtual_dr6 is the value used by ptrace_{get,set}_debugreg(6). A
kernel #DB clearing it could mean spurious malfunction of ptrace()
expectations.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Kyle Huey <me@kylehuey.com>
Link: https://lore.kernel.org/r/20201027093608.028952500@infradead.org
|
|
|
|
2a9baf5ad4 |
x86/debug: Fix BTF handling
The SDM states that #DB clears DEBUGCTLMSR_BTF, this means that when the bit is set for userspace (TIF_BLOCKSTEP) and a kernel #DB happens first, the BTF bit meant for userspace execution is lost. Have the kernel #DB handler restore the BTF bit when it was requested for userspace. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Kyle Huey <me@kylehuey.com> Link: https://lore.kernel.org/r/20201027093607.956147736@infradead.org |
|
|
|
da9803dfd3 |
This feature enhances the current guest memory encryption support
called SEV by also encrypting the guest register state, making the
registers inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared between
the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest so
in order for that exception mechanism to work, the early x86 init code
needed to be made able to handle exceptions, which, in itself, brings
a bunch of very nice cleanups and improvements to the early boot code
like an early page fault handler, allowing for on-demand building of the
identity mapping. With that, !KASLR configurations do not use the EFI
page table anymore but switch to a kernel-controlled one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly
separate from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and behind
static keys to minimize the performance impact on !SEV-ES setups.
Work by Joerg Roedel and Thomas Lendacky and others.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+FiKYACgkQEsHwGGHe
VUqS5BAAlh5mKwtxXMyFyAIHa5tpsgDjbecFzy1UVmZyxN0JHLlM3NLmb+K52drY
PiWjNNMi/cFMFazkuLFHuY0poBWrZml8zRS/mExKgUJC6EtguS9FQnRE9xjDBoWQ
gOTSGJWEzT5wnFqo8qHwlC2CDCSF1hfL8ks3cUFW2tCWus4F9pyaMSGfFqD224rg
Lh/8+arDMSIKE4uH0cm7iSuyNpbobId0l5JNDfCEFDYRigQZ6pZsQ9pbmbEpncs4
rmjDvBA5eHDlNMXq0ukqyrjxWTX4ZLBOBvuLhpyssSXnnu2T+Tcxg09+ZSTyJAe0
LyC9Wfo0v78JASXMAdeH9b1d1mRYNMqjvnBItNQoqweoqUXWz7kvgxCOp6b/G4xp
cX5YhB6BprBW2DXL45frMRT/zX77UkEKYc5+0IBegV2xfnhRsjqQAQaWLIksyEaX
nz9/C6+1Sr2IAv271yykeJtY6gtlRjg/usTlYpev+K0ghvGvTmuilEiTltjHrso1
XAMbfWHQGSd61LNXofvx/GLNfGBisS6dHVHwtkayinSjXNdWxI6w9fhbWVjQ+y2V
hOF05lmzaJSG5kPLrsFHFqm2YcxOmsWkYYDBHvtmBkMZSf5B+9xxDv97Uy9NETcr
eSYk//TEkKQqVazfCQS/9LSm0MllqKbwNO25sl0Tw2k6PnheO2g=
=toqi
-----END PGP SIGNATURE-----
Merge tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 SEV-ES support from Borislav Petkov:
"SEV-ES enhances the current guest memory encryption support called SEV
by also encrypting the guest register state, making the registers
inaccessible to the hypervisor by en-/decrypting them on world
switches. Thus, it adds additional protection to Linux guests against
exfiltration, control flow and rollback attacks.
With SEV-ES, the guest is in full control of what registers the
hypervisor can access. This is provided by a guest-host exchange
mechanism based on a new exception vector called VMM Communication
Exception (#VC), a new instruction called VMGEXIT and a shared
Guest-Host Communication Block which is a decrypted page shared
between the guest and the hypervisor.
Intercepts to the hypervisor become #VC exceptions in an SEV-ES guest
so in order for that exception mechanism to work, the early x86 init
code needed to be made able to handle exceptions, which, in itself,
brings a bunch of very nice cleanups and improvements to the early
boot code like an early page fault handler, allowing for on-demand
building of the identity mapping. With that, !KASLR configurations do
not use the EFI page table anymore but switch to a kernel-controlled
one.
The main part of this series adds the support for that new exchange
mechanism. The goal has been to keep this as much as possibly separate
from the core x86 code by concentrating the machinery in two
SEV-ES-specific files:
arch/x86/kernel/sev-es-shared.c
arch/x86/kernel/sev-es.c
Other interaction with core x86 code has been kept at minimum and
behind static keys to minimize the performance impact on !SEV-ES
setups.
Work by Joerg Roedel and Thomas Lendacky and others"
* tag 'x86_seves_for_v5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (73 commits)
x86/sev-es: Use GHCB accessor for setting the MMIO scratch buffer
x86/sev-es: Check required CPU features for SEV-ES
x86/efi: Add GHCB mappings when SEV-ES is active
x86/sev-es: Handle NMI State
x86/sev-es: Support CPU offline/online
x86/head/64: Don't call verify_cpu() on starting APs
x86/smpboot: Load TSS and getcpu GDT entry before loading IDT
x86/realmode: Setup AP jump table
x86/realmode: Add SEV-ES specific trampoline entry point
x86/vmware: Add VMware-specific handling for VMMCALL under SEV-ES
x86/kvm: Add KVM-specific VMMCALL handling under SEV-ES
x86/paravirt: Allow hypervisor-specific VMMCALL handling under SEV-ES
x86/sev-es: Handle #DB Events
x86/sev-es: Handle #AC Events
x86/sev-es: Handle VMMCALL Events
x86/sev-es: Handle MWAIT/MWAITX Events
x86/sev-es: Handle MONITOR/MONITORX Events
x86/sev-es: Handle INVD Events
x86/sev-es: Handle RDPMC Events
x86/sev-es: Handle RDTSC(P) Events
...
|
|
|
|
857d64485e |
- Fix the #DE oops message string format which confused tools parsing
crash information. (Thomas Gleixner) - Remove an unused variable in the UV5 code which was triggering a build warning with clang. (Mike Travis) -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAl+F5C4ACgkQEsHwGGHe VUrDGg//TcwfoWvlH5sj9X+NnoTyBqr0cDiZ48lqArscKNHb1D1dSvcBq3EULdSZ cCviwohVpSW3VEE9xq+TYhP/7y0eDEhPrHGuI04rm+hxuN98zvsX65w9dJzsN2Q1 yVdB6fe/oq/fIKvEUfGoElhOOrWa60LCNBNfhKugCcfwOg9ljbB0yzXptlnUnApp C24ED0Xr9fTcO3zUfRgO4wm902ljpvC52zBhXBbhjYVgM9om+/93FXoYytqmpjFC WKD+H6iWHkQ7/joDh69+FbQoMK8yEIXEGBEngZfr8TlOIpBlSM/Gn+AwVyLZmg3r 3uFnbzkXNIyMjiGt+w7hjI2CUBwwM96xqOt6KUzSC623GejhRSV1+c+xUsOBJq5e rIUL1ZWaBHcyW5tj6Gcqj/3KzsZcvENiL17nH1qhdo+b5eQFUUgfyWyuBDx2Q9vM rvKk0mOpYdQ4wVL9cWOYErTGMtHUCg61x6+pwZbYJf/EOitnJch7S3c6dkQ9gp/3 P5jzYl9lJADp1IfcIhyaJnL12uR0F2yJ57PsdvSXOr2nuu23bENRFPU64zUL/iWd w8SFIjdPcZ3m+xOL71u2vhkmuomxG2oXnoqWdY7p3Lx2mOQ745A9HIRCyndcKS5V uqTWw3/gVk0FS7JJfLH4tOh4eFxgRpmqFYaJ1Hb+Oy8Le2/vLac= =09k+ -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Fix the #DE oops message string format which confused tools parsing crash information (Thomas Gleixner) - Remove an unused variable in the UV5 code which was triggering a build warning with clang (Mike Travis) * tag 'x86_urgent_for_v5.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/platform/uv: Remove unused variable in UV5 NMI handler x86/traps: Fix #DE Oops message regression |
|
|
|
5f1ec1fd32 |
x86/traps: Fix #DE Oops message regression
The conversion of #DE to the idtentry mechanism introduced a change in the
Ooops message which confuses tools which parse crash information in dmesg.
Remove the underscore from 'divide_error' to restore previous behaviour.
Fixes:
|
|
|
|
a13644f3a5 |
x86/entry/64: Add entry code for #VC handler
The #VC handler needs special entry code because: 1. It runs on an IST stack 2. It needs to be able to handle nested #VC exceptions To make this work, the entry code is implemented to pretend it doesn't use an IST stack. When entered from user-mode or early SYSCALL entry path it switches to the task stack. If entered from kernel-mode it tries to switch back to the previous stack in the IRET frame. The stack found in the IRET frame is validated first, and if it is not safe to use it for the #VC handler, the code will switch to a fall-back stack (the #VC2 IST stack). From there, it can cause nested exceptions again. Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-46-joro@8bytes.org |
|
|
|
885689e47d |
x86/sev-es: Setup per-CPU GHCBs for the runtime handler
The runtime handler needs one GHCB per-CPU. Set them up and map them unencrypted. [ bp: Touchups and simplification. ] Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20200907131613.12703-42-joro@8bytes.org |
|
|
|
d53d9bc0cf |
x86/debug: Change thread.debugreg6 to thread.virtual_dr6
Current usage of thread.debugreg6 is convoluted at best. It starts life as a copy of the hardware DR6 value, but then various bits are cleared and set. Replace this with a new variable thread.virtual_dr6 that is initialized to 0 when DR6 is read and only gains bits, at the same time the actual (on stack) dr6 value which is read from the hardware only gets bits cleared. Suggested-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20200902133201.415372940@infradead.org |