mirror of https://github.com/torvalds/linux.git
18243 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
75f4d9af8b |
iov_iter work; most of that is about getting rid of
direction misannotations and (hopefully) preventing
more of the same for the future.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-----BEGIN PGP SIGNATURE-----
iHQEABYIAB0WIQQqUNBr3gm4hGXdBJlZ7Krx/gZQ6wUCY5ZzQAAKCRBZ7Krx/gZQ
65RZAP4nTkvOn0NZLVFkuGOx8pgJelXAvrteyAuecVL8V6CR4AD40qCVY51PJp8N
MzwiRTeqnGDxTTF7mgd//IB6hoatAA==
=bcvF
-----END PGP SIGNATURE-----
Merge tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull iov_iter updates from Al Viro:
"iov_iter work; most of that is about getting rid of direction
misannotations and (hopefully) preventing more of the same for the
future"
* tag 'pull-iov_iter' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
use less confusing names for iov_iter direction initializers
iov_iter: saner checks for attempt to copy to/from iterator
[xen] fix "direction" argument of iov_iter_kvec()
[vhost] fix 'direction' argument of iov_iter_{init,bvec}()
[target] fix iov_iter_bvec() "direction" argument
[s390] memcpy_real(): WRITE is "data source", not destination...
[s390] zcore: WRITE is "data source", not destination...
[infiniband] READ is "data destination", not source...
[fsi] WRITE is "data source", not destination...
[s390] copy_oldmem_kernel() - WRITE is "data source", not destination
csum_and_copy_to_iter(): handle ITER_DISCARD
get rid of unlikely() on page_copy_sane() calls
|
|
|
|
268325bda5 |
Random number generator updates for Linux 6.2-rc1.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEEq5lC5tSkz8NBJiCnSfxwEqXeA64FAmOU+U8ACgkQSfxwEqXe
A67NnQ//Y5DltmvibyPd7r1TFT2gUYv+Rx3sUV9ZE1NYptd/SWhhcL8c5FZ70Fuw
bSKCa1uiWjOxosjXT1kGrWq3de7q7oUpAPSOGxgxzoaNURIt58N/ajItCX/4Au8I
RlGAScHy5e5t41/26a498kB6qJ441fBEqCYKQpPLINMBAhe8TQ+NVp0rlpUwNHFX
WrUGg4oKWxdBIW3HkDirQjJWDkkAiklRTifQh/Al4b6QDbOnRUGGCeckNOhixsvS
waHWTld+Td8jRrA4b82tUb2uVZ2/b8dEvj/A8CuTv4yC0lywoyMgBWmJAGOC+UmT
ZVNdGW02Jc2T+Iap8ZdsEmeLHNqbli4+IcbY5xNlov+tHJ2oz41H9TZoYKbudlr6
/ReAUPSn7i50PhbQlEruj3eg+M2gjOeh8OF8UKwwRK8PghvyWQ1ScW0l3kUhPIhI
PdIG6j4+D2mJc1FIj2rTVB+Bg933x6S+qx4zDxGlNp62AARUFYf6EgyD6aXFQVuX
RxcKb6cjRuFkzFiKc8zkqg5edZH+IJcPNuIBmABqTGBOxbZWURXzIQvK/iULqZa4
CdGAFIs6FuOh8pFHLI3R4YoHBopbHup/xKDEeAO9KZGyeVIuOSERDxxo5f/ITzcq
APvT77DFOEuyvanr8RMqqh0yUjzcddXqw9+ieufsAyDwjD9DTuE=
=QRhK
-----END PGP SIGNATURE-----
Merge tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random
Pull random number generator updates from Jason Donenfeld:
- Replace prandom_u32_max() and various open-coded variants of it,
there is now a new family of functions that uses fast rejection
sampling to choose properly uniformly random numbers within an
interval:
get_random_u32_below(ceil) - [0, ceil)
get_random_u32_above(floor) - (floor, U32_MAX]
get_random_u32_inclusive(floor, ceil) - [floor, ceil]
Coccinelle was used to convert all current users of
prandom_u32_max(), as well as many open-coded patterns, resulting in
improvements throughout the tree.
I'll have a "late" 6.1-rc1 pull for you that removes the now unused
prandom_u32_max() function, just in case any other trees add a new
use case of it that needs to converted. According to linux-next,
there may be two trivial cases of prandom_u32_max() reintroductions
that are fixable with a 's/.../.../'. So I'll have for you a final
conversion patch doing that alongside the removal patch during the
second week.
This is a treewide change that touches many files throughout.
- More consistent use of get_random_canary().
- Updates to comments, documentation, tests, headers, and
simplification in configuration.
- The arch_get_random*_early() abstraction was only used by arm64 and
wasn't entirely useful, so this has been replaced by code that works
in all relevant contexts.
- The kernel will use and manage random seeds in non-volatile EFI
variables, refreshing a variable with a fresh seed when the RNG is
initialized. The RNG GUID namespace is then hidden from efivarfs to
prevent accidental leakage.
These changes are split into random.c infrastructure code used in the
EFI subsystem, in this pull request, and related support inside of
EFISTUB, in Ard's EFI tree. These are co-dependent for full
functionality, but the order of merging doesn't matter.
- Part of the infrastructure added for the EFI support is also used for
an improvement to the way vsprintf initializes its siphash key,
replacing an sleep loop wart.
- The hardware RNG framework now always calls its correct random.c
input function, add_hwgenerator_randomness(), rather than sometimes
going through helpers better suited for other cases.
- The add_latent_entropy() function has long been called from the fork
handler, but is a no-op when the latent entropy gcc plugin isn't
used, which is fine for the purposes of latent entropy.
But it was missing out on the cycle counter that was also being mixed
in beside the latent entropy variable. So now, if the latent entropy
gcc plugin isn't enabled, add_latent_entropy() will expand to a call
to add_device_randomness(NULL, 0), which adds a cycle counter,
without the absent latent entropy variable.
- The RNG is now reseeded from a delayed worker, rather than on demand
when used. Always running from a worker allows it to make use of the
CPU RNG on platforms like S390x, whose instructions are too slow to
do so from interrupts. It also has the effect of adding in new inputs
more frequently with more regularity, amounting to a long term
transcript of random values. Plus, it helps a bit with the upcoming
vDSO implementation (which isn't yet ready for 6.2).
- The jitter entropy algorithm now tries to execute on many different
CPUs, round-robining, in hopes of hitting even more memory latencies
and other unpredictable effects. It also will mix in a cycle counter
when the entropy timer fires, in addition to being mixed in from the
main loop, to account more explicitly for fluctuations in that timer
firing. And the state it touches is now kept within the same cache
line, so that it's assured that the different execution contexts will
cause latencies.
* tag 'random-6.2-rc1-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/crng/random: (23 commits)
random: include <linux/once.h> in the right header
random: align entropy_timer_state to cache line
random: mix in cycle counter when jitter timer fires
random: spread out jitter callback to different CPUs
random: remove extraneous period and add a missing one in comments
efi: random: refresh non-volatile random seed when RNG is initialized
vsprintf: initialize siphash key using notifier
random: add back async readiness notifier
random: reseed in delayed work rather than on-demand
random: always mix cycle counter in add_latent_entropy()
hw_random: use add_hwgenerator_randomness() for early entropy
random: modernize documentation comment on get_random_bytes()
random: adjust comment to account for removed function
random: remove early archrandom abstraction
random: use random.trust_{bootloader,cpu} command line option only
stackprotector: actually use get_random_canary()
stackprotector: move get_random_canary() into stackprotector.h
treewide: use get_random_u32_inclusive() when possible
treewide: use get_random_u32_{above,below}() instead of manual loop
treewide: use get_random_u32_below() instead of deprecated function
...
|
|
|
|
2f60f83084 |
- Have alternatives patch the same sections in modules as in vmlinux
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOXitsACgkQEsHwGGHe VUp8IA//W0CiKuDcqZcMLzl5A16ZOSNXS1xuxBjFcDhS2JtCb3NZEeXLPRcEWglO HwOZpLq5gc32kSboujT4XqKFnrtcdS94fO+BPwB/xxlM4Y4WWp4JRwbAylzGOOft 5NmZFB35zLMAKDpCogrigYtvav+usZqeCt2SRxAGrK8MuCXLk53OndQdChfJj0+O VzZsd6gdhjCJ20lZzSYiAZWUYE1Ibfd6hch37A/T1bLD8crANWJPV97PCCJivWIH PVx3NTWzCjSX507eX3+v1Nf8a+GpCGcJzJwu8+0o5T6lWrf4vyXF/Evz4jgdUe4i 8ZeeCDTsIgbDT7WhLpM6DS5SvkgrYkCamsSQzFLFdeXwpPlgvogyc6DmoQvPy1sw WFPTYy+HOp/5Slz7GIPN2WdjE5RkfFDQ6a+w72R6YZDlZLYEKCWciZmRwOlrtmR6 K2ujo4ipEK9I7QW9EES2WAAvaM0AsxfjX545T9IDI4W+AXil+m08TFDPwmoag4Ja q0MUTDFAfmvdhAa2rxJuSedbpjYflb/uuYHqX/kdR9syHhtJpiw1TypaVYD89hpL AceNGn1gHMVTcC19+Ey9uoL9+Y3VfazD1UIFuzK/iFIgsieqK6zxTRKo8mZ0NHYE fNnA2sdcteKgaW+d2aKsb0yF1OZ05nfg1YgmUZLbcE1oXmGEJ5A= =expz -----END PGP SIGNATURE----- Merge tag 'x86_alternatives_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 alternative update from Borislav Petkov: "A single alternatives patching fix for modules: - Have alternatives patch the same sections in modules as in vmlinux" * tag 'x86_alternatives_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/alternative: Consistently patch SMP locks in vmlinux and modules |
|
|
|
9196a0ba9f |
- Fix confusing output from /sys/kernel/debug/ras/daemon_active
- Add another MCE severity error case to the Intel error severity table to promote UC and AR errors to panic severity and remove the corresponding code condition doing that. - Make sure the thresholding and deferred error interrupts on AMD SMCA systems clear the all registers reporting an error so that there are no multiple errors logged for the same event -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmOXiPcACgkQEsHwGGHe VUomHQ/9Gj6Go0ILvIEpn3VCC8bRc0nf/SFJ+4BnFBc+GdiN6ePhDwn1of3bPk/d zNuPFGEa1rV7r97MDsggGetvNVVrA6zumoPmLrHPrZSvhRyCW620J43RH+3T4bzz XfCMU6oRJg+F4dFUPnnAnp/6DTwLSe0ofpc2eARlmjdOxTo4MRfxWfwe5XALGGN1 q+8ycB3Gb8cvhjlB61PL7hhjuy6yH29v63vjUMqsyfDmVhXRxY3xymg+4SxalCBf 0Zz8/RRJFHSOwzUPsQUm9kVMN8phhJ/fN0B1wsLDWlt0K5Vx5D19l+wbH4aaTTcF 8bWMMmS43raFuARkcROAEbDrWM2kEo5Qe9eWhZ7HB9wLG9SicJfZIH5Th4ul40V8 4RARSj1ve4vfxzNmmhFf+RL8kdYWLpxlwVJUhdHkiqKTqN8SIQMb/dsNg/7KjFsV N3PSZ0lOEQ5Q2l5fSZoL+auqXgJBD5BUy+Gjk0awZavzZCdI35/LK8xVzrgCgsRk AlAcfvngpyZB7A0aeiwalIszcyjk1cnK8+RwIRtvM0CAYhKP+SVTvq4wHtRiO5HD TfuVgaHSgOyoOD0NdUGM6PXgrBXqnyIYI2me8Gwtg7gzenizBsv/uh8cvWa3DQnG 5NItCYEG7Paa7p3VQDvFqxKldIC1Bjwi2pYYIDOgRiwWGbYzSCU= =sKP5 -----END PGP SIGNATURE----- Merge tag 'ras_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 RAS updates from Borislav Petkov: - Fix confusing output from /sys/kernel/debug/ras/daemon_active - Add another MCE severity error case to the Intel error severity table to promote UC and AR errors to panic severity and remove the corresponding code condition doing that. - Make sure the thresholding and deferred error interrupts on AMD SMCA systems clear the all registers reporting an error so that there are no multiple errors logged for the same event * tag 'ras_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: RAS: Fix return value from show_trace() x86/mce: Use severity table to handle uncorrected errors in kernel x86/MCE/AMD: Clear DFR errors found in THR handler |
|
|
|
40deb5e41a |
* Clarify XSAVE consistency warnings
* Fix up ptrace interface to protection keys register (PKRU) * Avoid undefined compiler behavior with TYPE_ALIGN -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYisACgkQaDWVMHDJ krAJkA//QRChRwyKi1syinXt2SGoSa3mTzP23SyV0TunOfKBiBUreFJ2mMFjsX0h V7SJcu82sCWLHAY6LZRdyiF8zK3Cfzpbgb1QfzBCefE/gU801FhCypqNbQO5Lpdr PEo+naaDOzwDWDt0A6OkAArgb0zfaOGL+OBhuwT7mcUtBz6gCakFqG2BMgOzqD1z SAp0RraoSsFnKFl5Gv44+gkThq8/8yL5tyrJtnGv1jAsbhw9zmloaOue6MNMPJhH 3sFQnML3qeNRozquWWeCPu/hxWuFDitPhwdmNRZrnQ3DyRdDhCZPOjv+tQmxI3EO 5c+UIkMIsRh2nZLwHcM+iO5cWE7lyiAWpgqqArB+r2CFXWK5q2lplhXngBodE9Kr ki/NZ6oEitT3+bLXhCwyc7WKxohl2IlmclJ4AD3Qrp4bzPhfsZebL6nNs/3bxWuF CxJWIKzjtIcgNSEJaDOzFA5CAImq74r/kCW4e11ZXwmOnx6PX1YG6p0C1yknrZYJ bvy8WxureO7OJEcVZfwxpXLYbb+7Q/k/l2DkUdVAvKSCB81uWR4JzEp4oooDxf2j 6x9qT5Mi95FhAHOCmlxwkQJTBCB36LkVF/3ESEOqJmun4F5ghPbMX2JzpBa6jPCS lzkBrzA8MAdmaLHhDO+nd5m8HVY3QBSXDVtRTycmuloeoSeyBno= =An0n -----END PGP SIGNATURE----- Merge tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Dave Hansen: "There are two little fixes in here, one to give better XSAVE warnings and another to address some undefined behavior in offsetof(). There is also a collection of patches to fix some issues with ptrace and the protection keys register (PKRU). PKRU is a real oddity because it is exposed in the XSAVE-related ABIs, but it is generally managed without using XSAVE in the kernel. This fix thankfully came with a selftest to ward off future regressions. Summary: - Clarify XSAVE consistency warnings - Fix up ptrace interface to protection keys register (PKRU) - Avoid undefined compiler behavior with TYPE_ALIGN" * tag 'x86_fpu_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN selftests/vm/pkeys: Add a regression test for setting PKRU through ptrace x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not set x86/fpu: Allow PKRU to be (once again) written by ptrace. x86/fpu: Add a pkru argument to copy_uabi_to_xstate() x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate(). x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate() x86/fpu/xstate: Fix XSTATE_WARN_ON() to emit relevant diagnostics |
|
|
|
1cab145a94 |
Add a sysctl to control the split lock misery mode
-----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYlcACgkQaDWVMHDJ krB+IQ//fLNzHnNmTaFq3TlmF9HiJ926uCu2C3MMma0g8NlFXGLTbeI/UaaER12/ m/N4d+/WPSO1PnsMR36f10Byr8PN2fpbJHGMsHtZ4Y4MTnGycM6JxjDYeFuaSPB5 Cw2IRsO6c7X/dWEVW7hLbHhlG4MpsiX9APt2/PBGpGJm88wL1RDosMKst6430UQK 24JZtFbdyaPnUlo48ql85VkGtdgFHXRnebhM0sX95bVWdSLvNWUSpQAyETp+U9rn CH75pnoKcJspKun5FmdN2n3gix8Rumz8OZuv9e4XAfBl94H4OZ+SeRN4YbKUvzJP PtSCz7PT8VQNsJVCA58TQ+QdmhtKsT4ia0ylDvMhHiozzjUNeeS54qJQSUyPLOqK dBl4hl6BmGMMH2fAZGeoxVmVZMIdLaE0PBECjBEuPAG15IqlxQwTdSeyo0k+S0wV wYUtCqmxOItW3TA8y044zDjCcIN6wiFymBJtjKbAMxz54ONfnUqgAUluXLeE3xim 8UqL/uM869Ptu6sDO6sfROd1K8EA3KXrsmOGZV7s9hp+qGQcxsvUDhePT9EosS/G JcmYspV211FO2fTAAOiCe5SJRkoPw/lRWufjNNNWWd3mawJhDeYujZ2fQAxEThC+ Mf8FyFsbxOdbJ1UatgWs/iLOnVwMJf/E1hraq7mdRuZHbNQm7H4= =yyO+ -----END PGP SIGNATURE----- Merge tag 'x86_splitlock_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 splitlock updates from Dave Hansen: "Add a sysctl to control the split lock misery mode. This enables users to reduce the penalty inflicted on split lock users. There are some proprietary, binary-only games which became entirely unplayable with the old penalty. Anyone opting into the new mode is, of course, more exposed to the DoS nasitness inherent with split locks, but they can play their games again" * tag 'x86_splitlock_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/split_lock: Add sysctl to control the misery mode |
|
|
|
287f037db5 |
Minor cleanups:
* Remove unnecessary arch_has_empty_bitmaps structure memory * Move rescrtl MSR defines into msr-index.h, like normal MSRs -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYhsACgkQaDWVMHDJ krA75w//XmOC929XGMOY7WQL6IZlH62xsJbtb3BhmM24Ho7RHSNQGPD+ukArCb0u V/w50Q4crQrLsIxqWjXkyDQ7w66PvvsAIhYFBEV4kssRli9y173CzJQt/lQfUXL9 T7vG5WY1n4f+vtvmZfwcFaGOPkZ5edp8v1y8Grk3r93ci2VDSk+yvEiq80c+JQoX ZnEYPxGPUpwAVuaysY8wkGCEc4Yln6gtTKzpVPXE18WAs82OeiCWBfldI/+95j3o /5r5asYQpD8bVhtLHi1mepkBAGbeVNWhSJVlOE9HdU9WnzCkNKn1ZXRuXSBlvTeq FPjg6vsBXuz8zQV4Dd3Jk3hWv3H/4sTWsgiyUFdHtz/VlE9M8NjGcE4caOgSuBqR 2ovI/HwdvdYyiZwvNN0fXrnzEn1MliSXDgAscNuxzovJXqdTP2BpUj0SVlZdVs0U 0xba5sZ5A6fh2SwKX7JQYYsEh4gudiixR+D2l5u7EUOiNyfw0DZgWi/ElpvX4ncy QvDIIqlm29A/VkJQAdSHJc0ew+w39M7f3VNfQviLXxudGFuhrg+kXlI1UYGcX/cH 4LEjmE1KCymmq7v+7+zBrHwsVCxr5mi/CZnx+/4Y/2O+xOKJ1U7GQDXWzu/SC+aF tEwqDCldYKjqrfdkmuGXSt2YipkNOC2EBLY32mW7rtTDDIXPSro= =n2UE -----END PGP SIGNATURE----- Merge tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cache resource control updates from Dave Hansen: "These declare the resource control (rectrl) MSRs a bit more normally and clean up an unnecessary structure member: - Remove unnecessary arch_has_empty_bitmaps structure memory - Move rescrtl MSR defines into msr-index.h, like normal MSRs" * tag 'x86_cache_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/resctrl: Move MSR defines into msr-index.h x86/resctrl: Remove arch_has_empty_bitmaps |
|
|
|
2da68a77b9 |
* Introduce a new SGX feature (Asynchrounous Exit Notification)
for bare-metal enclaves and KVM guests to mitigate single-step
attacks
* Increase batching to speed up enclave release
* Replace kmap/kunmap_atomic() calls
-----BEGIN PGP SIGNATURE-----
iQIzBAABCgAdFiEEV76QKkVc4xCGURexaDWVMHDJkrAFAmOXYkEACgkQaDWVMHDJ
krB5Og//Vn0oy0pGhda+LtHJgpa9/qPlzvoZCBxi/6SfLneadE5/g/q2KHbiCgVf
sQ6SEZ0MiVc2SrQcA6CntMO+stJIHG4LqYutygfKDoxXHGzxotzvzTmRV7Qxfhj5
LrPfl4cLWVO/jGDs0XQpOVFykKgdMcg1OjlnQYfriFiIiBkcClC7F0zYrOWAQWW0
z+4h3mlWzyAcBdxrZ9qPVqBMbM3qVKQWeE4D9K2Edfgx1lhQBmvtRdYXTplk08tV
DrfEkG5L189lrwlmbkKT5+pXSTmJqJzBoYyAGOH8n4Wb9aKLdagJErVg0ocXx8uV
ngPFU5vmaZza7EZcQheu8iRfM+zQCrcVjBImrRLyQPgCeMBX7o75axYvu4/bvPkP
3+1/JUL6/m738Fqom4wUKdeoJFw/HLGRyQ36yhZAEzH7wPv7/9Q1zpdxcypE6a+Q
B7UGQNVXV9g5Ivhe44gZIKx/3VL7AthtyCQvhwGQzzm4jX2SwnQKNXy0iKlJr2iI
LyREdYlJsRR1/wMdjnj2QqtnWPRZ5/rzl7bvWqiXa4xyvcgArrBowjMdZBttaItJ
cVK5Aj2bvR3Yc/e9GtPoLvwU5IwtoXgUe1B4DsJtoFoUq7gUGZZcEd5uAYRAk7PX
lyP2LQNxX5i150cxjlSYLLLTNmwvZQ+5PFq+V5+McKbAge8OD8g=
=bIXL
-----END PGP SIGNATURE-----
Merge tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 sgx updates from Dave Hansen:
"The biggest deal in this series is support for a new hardware feature
that allows enclaves to detect and mitigate single-stepping attacks.
There's also a minor performance tweak and a little piece of the
kmap_atomic() -> kmap_local() transition.
Summary:
- Introduce a new SGX feature (Asynchrounous Exit Notification) for
bare-metal enclaves and KVM guests to mitigate single-step attacks
- Increase batching to speed up enclave release
- Replace kmap/kunmap_atomic() calls"
* tag 'x86_sgx_for_6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/sgx: Replace kmap/kunmap_atomic() calls
KVM/VMX: Allow exposing EDECCSSA user leaf function to KVM guest
x86/sgx: Allow enclaves to use Asynchrounous Exit Notification
x86/sgx: Reduce delay and interference of enclave release
|
|
|
|
631aa74442 |
Updates for miscellaneous x86 areas:
- Reserve a new boot loader type for barebox which is usally used on ARM
and MIPS, but can also be utilized as EFI payload on x86 to provide
watchdog-supervised boot up.
- Consolidate the native and compat 32bit signal handling code and split
the 64bit version out into a separate source file
- Switch the ESPFIX random usage to get_random_long().
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUvMQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoQmmD/9xVeaZbBInehnzbsZi4C4WyOMGUg4l
AoZC0QSzp2hFZRwpbu4Df1Zh2VN5nItAhQUvNLfdZv9/GL5VkhO+J5fPEHUbtnQ8
34TujaTHAssyib8uRFTAxxGSz3S2jPRrzUloZ71M+Whx7Fw7Fh8M/t8DmnvnaPtw
uYbBmZd9mZ0Y7BVMoXh70V0nd21PN8a8qQhYRaUD7lyb1w6Tcfzag4J1DXFfP8Lm
ovaf2AW3mgt+RmzIRNqP28weLt/VxFC38H/nZ9Jlc9npfnLTyGfwfOxE0CILfEo+
cYYVbMaIN+vs5kJQaVbvEJvk7oumLC9CvwE6oIL8J0XOs8dbBHkbZPQYW0yVF1/m
rXEd3LBSNhnZIF0aMUoJrBZAI++nGZo0izSu3eGwLZXSbWBVjlzPAqeBJQtqfQ/E
j87IisQjkWeOOSNvBas1bURWa7Gy5QFRCxbJQFfAZjIHhg+fIwxrK0HlSqxUXqK5
PRbc1LsWjUn9TspOC+mRIKrqAfetkohL7BGc+uuslH3uXiMQVAghg37+rSqvAjkn
50d8XxqOd7aC0NOVn8BfxhMf85Ge7z/0r7JJcaLcRY7/CP6S3vTCAgbSjN4+WzfN
sRu5W/m8oLuF8Q9DdgqtqiNrYezhoEKJHZsGoi/IGy6eAYjMxPX/Cl4YysdqV32N
Z55ZeEBwg9KC1g==
=AHdL
-----END PGP SIGNATURE-----
Merge tag 'x86-misc-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull misc x86 updates from Thomas Gleixner:
"Updates for miscellaneous x86 areas:
- Reserve a new boot loader type for barebox which is usally used on
ARM and MIPS, but can also be utilized as EFI payload on x86 to
provide watchdog-supervised boot up.
- Consolidate the native and compat 32bit signal handling code and
split the 64bit version out into a separate source file
- Switch the ESPFIX random usage to get_random_long()"
* tag 'x86-misc-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/espfix: Use get_random_long() rather than archrandom
x86/signal/64: Move 64-bit signal code to its own file
x86/signal/32: Merge native and compat 32-bit signal code
x86/signal: Add ABI prefixes to frame setup functions
x86/signal: Merge get_sigframe()
x86: Remove __USER32_DS
signal/compat: Remove compat_sigset_t override
x86/signal: Remove sigset_t parameter from frame setup functions
x86/signal: Remove sig parameter from frame setup functions
Documentation/x86/boot: Reserve type_of_loader=13 for barebox
|
|
|
|
79ad89123c |
A set of x86 cleanups:
- Rework the handling of x86_regset for 32 and 64 bit. The original
implementation tried to minimize the allocation size with quite some
hard to understand and fragile tricks. Make it robust and straight
forward by separating the register enumerations for 32 and 64 bit
completely.
- Add a few missing static annotations
- Remove the stale unused setup_once() assembly function
- Address a few minor static analysis and kernel-doc warnings
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUu0ATHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoUNzEACNn5XbRqxPQZak5XHeJ46/VNVTqTE0
Z7euwF8oP+aAybyDevvm18D2hB9Atn4vU9QJYhnTxBXbCLUNErKrH8FcXdNOBbeC
YdAX7nO5WH8IM+drCMySeK6Tv6rvhnDUtgBzdBSl4NdPXUSOnGo+jHqHfN/Q+/n0
yvbwSoVAjD01sxVZQqKQOrzDgDuR/zlISCVudfS+tR4Rm/CYj0cl+MQS9Z1VM3Z6
7pqyypd5+CyNAD6vTDY/q+ZK0ShfNnU9TIIoGmOB/pc0kLctwIu3MY76Uo2DUgGn
n/ItR9mvYu/QelCwX02VG3aRYJPLRfBa+DjQfZUwZapRz3rsjKtfa8ogpPZTLrSO
o4ht/jxlKKDyNOQKYeL2yy054JR4DkKziilEzw5GZHeH2y66XWudRuWfMwbTdrGc
esP5fSNfZ9uluYl6GCCw6S83RJzQ8aZXRcAy7CJgw2Qb4XE7IOA2jf18x5AYaDUp
4a6HCjbxYkEmKCkzkh9+w5koYruyizMBKMBBh5QsMzH4xp20s/vffHwbZ1tls9Za
eTDC/E+wW9Om3qynRynm0EmcHpa0j+RcmkHOhFcXj6SRLnhzktk4Rrr3vlhardS3
Pc8h3GnE5mFXqS8t3r6/hvMk+6svhSu3RbICiLNU72F/tVLU628ux/WoCKfXZloE
7HxWoVhkTF7eOw==
=DTBQ
-----END PGP SIGNATURE-----
Merge tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Thomas Gleixner:
"A set of x86 cleanups:
- Rework the handling of x86_regset for 32 and 64 bit.
The original implementation tried to minimize the allocation size
with quite some hard to understand and fragile tricks. Make it
robust and straight forward by separating the register enumerations
for 32 and 64 bit completely.
- Add a few missing static annotations
- Remove the stale unused setup_once() assembly function
- Address a few minor static analysis and kernel-doc warnings"
* tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm/32: Remove setup_once()
x86/kaslr: Fix process_mem_region()'s return value
x86: Fix misc small issues
x86/boot: Repair kernel-doc for boot_kstrtoul()
x86: Improve formatting of user_regset arrays
x86: Separate out x86_regset for 32 and 64 bit
x86/i8259: Make default_legacy_pic static
x86/tsc: Make art_related_clocksource static
|
|
|
|
369013162f |
A set of changes for the x86 APIC code:
- Handle the case where x2APIC is enabled and locked by the BIOS on a
kernel with CONFIG_X86_X2APIC=n gracefully. Instead of a panic which
does not make it to the graphical console during very early boot,
simply disable the local APIC completely and boot with the PIC and very
limited functionality, which allows to diagnose the issue.
- Convert x86 APIC device tree bindings to YAML
- Extend x86 APIC device tree bindings to configure interrupt delivery
mode and handle this in during init. This allows to boot with device
tree on platforms which lack a legacy PIC.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUuYUTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoaTED/9D33bnJesbDVZs31HxLJc/jZED0/Do
dli0wRHWmQx9jpUmTXlKRhhIcUOjPy3Cdz44yoOH14wdJ96qUCBUj8sS9vFO4F7M
CS/eoO77GKG6oXpMvsNC5TcSaZnXAb4UYz5wCV21ZXL6P0izhOivKSqTR222jT6e
afEzQhwWhHZmrkX44F1YvMuc+HP6+swfO635vNtZhKtlA7NeKdHRijGZhrXEhNO/
Pue2xbYVMSLNaRTRtN0Mjm6UvShBLQhbmD/vXrVOCztfzhSfwq0LRC9xXcXmdWCY
XjflM+osQxIUs2WbpL1lohq5VUzTlWVNsZe4YkH5b0xMEO9HkD7apF03p03SIO4n
X37joMbrfPz9ZsmSdaN836YZd74IfQ5wnFFQTVL0BC0M4lZNeAnNcxVr3Mfio4yX
GvYahmyvxHlbWag4SYqVsy15QiNV/xZZZD6uIvBvMCfxoFKw8tBF+9/2Iy+3R+zj
n7q17Y9bLSXwh1Z/9xgwdTs+7SNCpIlZ/5nz8NpBhHaZF2BziICCv2TEKZUXmli3
HHkWM7ikj67zgFMiWLLOZpiYz/vgJEFE9nhlmXEH1RNMIfqom/JG8FN8GE1C9kYV
dmSjOE7x/CdZfJ83BRlTx5j2HfAs7RW4A7IMWPIxNdqEFmhxWnQIHasAfMrHcoIU
pAQ8u/qoduJA4A==
=dpZx
-----END PGP SIGNATURE-----
Merge tag 'x86-apic-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 apic update from Thomas Gleixner:
"A set of changes for the x86 APIC code:
- Handle the case where x2APIC is enabled and locked by the BIOS on a
kernel with CONFIG_X86_X2APIC=n gracefully.
Instead of a panic which does not make it to the graphical console
during very early boot, simply disable the local APIC completely
and boot with the PIC and very limited functionality, which allows
to diagnose the issue
- Convert x86 APIC device tree bindings to YAML
- Extend x86 APIC device tree bindings to configure interrupt
delivery mode and handle this in during init. This allows to boot
with device tree on platforms which lack a legacy PIC"
* tag 'x86-apic-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/of: Add support for boot time interrupt delivery mode configuration
x86/of: Replace printk(KERN_LVL) with pr_lvl()
dt-bindings: x86: apic: Introduce new optional bool property for lapic
dt-bindings: x86: apic: Convert Intel's APIC bindings to YAML schema
x86/of: Remove unused early_init_dt_add_memory_arch()
x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS
|
|
|
|
9d33edb20f |
Updates for the interrupt core and driver subsystem:
- Core:
The bulk is the rework of the MSI subsystem to support per device MSI
interrupt domains. This solves conceptual problems of the current
PCI/MSI design which are in the way of providing support for PCI/MSI[-X]
and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows device
manufactures to provide implementation defined storage for MSI messages
contrary to the uniform and specification defined storage mechanisms for
PCI/MSI and PCI/MSI-X. IMS not only allows to overcome the size limitations
of the MSI-X table, but also gives the device manufacturer the freedom to
store the message in arbitrary places, even in host memory which is shared
with the device.
There have been several attempts to glue this into the current MSI code,
but after lengthy discussions it turned out that there is a fundamental
design problem in the current PCI/MSI-X implementation. This needs some
historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management was
completely different from what we have today in the actively developed
architectures. Interrupt management was completely architecture specific
and while there were attempts to create common infrastructure the
commonalities were rudimentary and just providing shared data structures and
interfaces so that drivers could be written in an architecture agnostic
way.
The initial PCI/MSI[-X] support obviously plugged into this model which
resulted in some basic shared infrastructure in the PCI core code for
setting up MSI descriptors, which are a pure software construct for holding
data relevant for a particular MSI interrupt, but the actual association to
Linux interrupts was completely architecture specific. This model is still
supported today to keep museum architectures and notorious stranglers
alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the kernel,
which was creating yet another architecture specific mechanism and resulted
in an unholy mess on top of the existing horrors of x86 interrupt handling.
The x86 interrupt management code was already an incomprehensible maze of
indirections between the CPU vector management, interrupt remapping and the
actual IO/APIC and PCI/MSI[-X] implementation.
At roughly the same time ARM struggled with the ever growing SoC specific
extensions which were glued on top of the architected GIC interrupt
controller.
This resulted in a fundamental redesign of interrupt management and
provided the today prevailing concept of hierarchical interrupt
domains. This allowed to disentangle the interactions between x86 vector
domain and interrupt remapping and also allowed ARM to handle the zoo of
SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The X86
encapsulation looks like this:
|--- device 1
[Vector]---[Remapping]---[PCI/MSI]--|...
|--- device N
where the remapping domain is an optional component and in case that it is
not available the PCI/MSI[-X] domains have the vector domain as their
parent. This reduced the required interaction between the domains pretty
much to the initialization phase where it is obviously required to
establish the proper parent relation ship in the components of the
hierarchy.
While in most cases the model is strictly representing the chain of IP
blocks and abstracting them so they can be plugged together to form a
hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the hardware
it's clear that the actual PCI/MSI[-X] interrupt controller is not a global
entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the easy
solution of providing "global" PCI/MSI domains which was possible because
the PCI/MSI[-X] handling is uniform across the devices. This also allowed
to keep the existing PCI/MSI[-X] infrastructure mostly unchanged which in
turn made it simple to keep the existing architecture specific management
alive.
A similar problem was created in the ARM world with support for IP block
specific message storage. Instead of going all the way to stack a IP block
specific domain on top of the generic MSI domain this ended in a construct
which provides a "global" platform MSI domain which allows overriding the
irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the MSI
infrastructure in wireless drivers, NTB etc. where support for
implementation specific message storage was just mindlessly glued into the
existing infrastructure. Some of this just works by chance on particular
platforms but will fail in hard to diagnose ways when the driver is used
on platforms where the underlying MSI interrupt management code does not
expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to
allocate or free individual vectors after the initial enablement of
MSI-X. This results in an works by chance implementation of VFIO (PCI
pass-through) where interrupts on the host side are not set up upfront to
avoid resource exhaustion. They are expanded at run-time when the guest
actually tries to use them. The way how this is implemented is that the
host disables MSI-X and then re-enables it with a larger number of
vectors again. That works by chance because most device drivers set up
all interrupts before the device actually will utilize them. But that's
not universally true because some drivers allocate a large enough number
of vectors but do not utilize them until it's actually required,
e.g. for acceleration support. But at that point other interrupts of the
device might be in active use and the MSI-X disable/enable dance can
just result in losing interrupts and therefore hard to diagnose subtle
problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to
utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact that IMS
is not longer providing a uniform storage and configuration model.
The solution to this is to implement the missing step and switch from
global PCI/MSI domains to per device PCI/MSI domains. The resulting
hierarchy then looks like this:
|--- [PCI/MSI] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per device:
|--- [PCI/MSI] device 1
|--- [PCI/IMS] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
|--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt
domains to the new model, provides new interfaces for post-enable
allocation/free of MSI-X interrupts and the base framework for PCI/IMS.
PCI/IMS has been verified with the work in progress IDXD driver.
There is work in progress to convert ARM over which will replace the
platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
"solutions" are in the works as well.
- Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUsygTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoYXiD/40tXKzCzf0qFIqUlZLia1N3RRrwrNC
DVTixuLtR9MrjwE+jWLQILa85SHInV8syXHSd35SzhsGDxkURFGi+HBgVWmysODf
br9VSh3Gi+kt7iXtIwAg8WNWviGNmS3kPksxCko54F0YnJhMY5r5bhQVUBQkwFG2
wES1C9Uzd4pdV2bl24Z+WKL85cSmZ+pHunyKw1n401lBABXnTF9c4f13zC14jd+y
wDxNrmOxeL3mEH4Pg6VyrDuTOURSf3TjJjeEq3EYqvUo0FyLt9I/cKX0AELcZQX7
fkRjrQQAvXNj39RJfeSkojDfllEPUHp7XSluhdBu5aIovSamdYGCDnuEoZ+l4MJ+
CojIErp3Dwj/uSaf5c7C3OaDAqH2CpOFWIcrUebShJE60hVKLEpUwd6W8juplaoT
gxyXRb1Y+BeJvO8VhMN4i7f3232+sj8wuj+HTRTTbqMhkElnin94tAx8rgwR1sgR
BiOGMJi4K2Y8s9Rqqp0Dvs01CW4guIYvSR4YY+WDbbi1xgiev89OYs6zZTJCJe4Y
NUwwpqYSyP1brmtdDdBOZLqegjQm+TwUb6oOaasFem4vT1swgawgLcDnPOx45bk5
/FWt3EmnZxMz99x9jdDn1+BCqAZsKyEbEY1avvhPVMTwoVIuSX2ceTBMLseGq+jM
03JfvdxnueM3gw==
=9erA
-----END PGP SIGNATURE-----
Merge tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq updates from Thomas Gleixner:
"Updates for the interrupt core and driver subsystem:
The bulk is the rework of the MSI subsystem to support per device MSI
interrupt domains. This solves conceptual problems of the current
PCI/MSI design which are in the way of providing support for
PCI/MSI[-X] and the upcoming PCI/IMS mechanism on the same device.
IMS (Interrupt Message Store] is a new specification which allows
device manufactures to provide implementation defined storage for MSI
messages (as opposed to PCI/MSI and PCI/MSI-X that has a specified
message store which is uniform accross all devices). The PCI/MSI[-X]
uniformity allowed us to get away with "global" PCI/MSI domains.
IMS not only allows to overcome the size limitations of the MSI-X
table, but also gives the device manufacturer the freedom to store the
message in arbitrary places, even in host memory which is shared with
the device.
There have been several attempts to glue this into the current MSI
code, but after lengthy discussions it turned out that there is a
fundamental design problem in the current PCI/MSI-X implementation.
This needs some historical background.
When PCI/MSI[-X] support was added around 2003, interrupt management
was completely different from what we have today in the actively
developed architectures. Interrupt management was completely
architecture specific and while there were attempts to create common
infrastructure the commonalities were rudimentary and just providing
shared data structures and interfaces so that drivers could be written
in an architecture agnostic way.
The initial PCI/MSI[-X] support obviously plugged into this model
which resulted in some basic shared infrastructure in the PCI core
code for setting up MSI descriptors, which are a pure software
construct for holding data relevant for a particular MSI interrupt,
but the actual association to Linux interrupts was completely
architecture specific. This model is still supported today to keep
museum architectures and notorious stragglers alive.
In 2013 Intel tried to add support for hot-pluggable IO/APICs to the
kernel, which was creating yet another architecture specific mechanism
and resulted in an unholy mess on top of the existing horrors of x86
interrupt handling. The x86 interrupt management code was already an
incomprehensible maze of indirections between the CPU vector
management, interrupt remapping and the actual IO/APIC and PCI/MSI[-X]
implementation.
At roughly the same time ARM struggled with the ever growing SoC
specific extensions which were glued on top of the architected GIC
interrupt controller.
This resulted in a fundamental redesign of interrupt management and
provided the today prevailing concept of hierarchical interrupt
domains. This allowed to disentangle the interactions between x86
vector domain and interrupt remapping and also allowed ARM to handle
the zoo of SoC specific interrupt components in a sane way.
The concept of hierarchical interrupt domains aims to encapsulate the
functionality of particular IP blocks which are involved in interrupt
delivery so that they become extensible and pluggable. The X86
encapsulation looks like this:
|--- device 1
[Vector]---[Remapping]---[PCI/MSI]--|...
|--- device N
where the remapping domain is an optional component and in case that
it is not available the PCI/MSI[-X] domains have the vector domain as
their parent. This reduced the required interaction between the
domains pretty much to the initialization phase where it is obviously
required to establish the proper parent relation ship in the
components of the hierarchy.
While in most cases the model is strictly representing the chain of IP
blocks and abstracting them so they can be plugged together to form a
hierarchy, the design stopped short on PCI/MSI[-X]. Looking at the
hardware it's clear that the actual PCI/MSI[-X] interrupt controller
is not a global entity, but strict a per PCI device entity.
Here we took a short cut on the hierarchical model and went for the
easy solution of providing "global" PCI/MSI domains which was possible
because the PCI/MSI[-X] handling is uniform across the devices. This
also allowed to keep the existing PCI/MSI[-X] infrastructure mostly
unchanged which in turn made it simple to keep the existing
architecture specific management alive.
A similar problem was created in the ARM world with support for IP
block specific message storage. Instead of going all the way to stack
a IP block specific domain on top of the generic MSI domain this ended
in a construct which provides a "global" platform MSI domain which
allows overriding the irq_write_msi_msg() callback per allocation.
In course of the lengthy discussions we identified other abuse of the
MSI infrastructure in wireless drivers, NTB etc. where support for
implementation specific message storage was just mindlessly glued into
the existing infrastructure. Some of this just works by chance on
particular platforms but will fail in hard to diagnose ways when the
driver is used on platforms where the underlying MSI interrupt
management code does not expect the creative abuse.
Another shortcoming of today's PCI/MSI-X support is the inability to
allocate or free individual vectors after the initial enablement of
MSI-X. This results in an works by chance implementation of VFIO (PCI
pass-through) where interrupts on the host side are not set up upfront
to avoid resource exhaustion. They are expanded at run-time when the
guest actually tries to use them. The way how this is implemented is
that the host disables MSI-X and then re-enables it with a larger
number of vectors again. That works by chance because most device
drivers set up all interrupts before the device actually will utilize
them. But that's not universally true because some drivers allocate a
large enough number of vectors but do not utilize them until it's
actually required, e.g. for acceleration support. But at that point
other interrupts of the device might be in active use and the MSI-X
disable/enable dance can just result in losing interrupts and
therefore hard to diagnose subtle problems.
Last but not least the "global" PCI/MSI-X domain approach prevents to
utilize PCI/MSI[-X] and PCI/IMS on the same device due to the fact
that IMS is not longer providing a uniform storage and configuration
model.
The solution to this is to implement the missing step and switch from
global PCI/MSI domains to per device PCI/MSI domains. The resulting
hierarchy then looks like this:
|--- [PCI/MSI] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
which in turn allows to provide support for multiple domains per
device:
|--- [PCI/MSI] device 1
|--- [PCI/IMS] device 1
[Vector]---[Remapping]---|...
|--- [PCI/MSI] device N
|--- [PCI/IMS] device N
This work converts the MSI and PCI/MSI core and the x86 interrupt
domains to the new model, provides new interfaces for post-enable
allocation/free of MSI-X interrupts and the base framework for
PCI/IMS. PCI/IMS has been verified with the work in progress IDXD
driver.
There is work in progress to convert ARM over which will replace the
platform MSI train-wreck. The cleanup of VFIO, NTB and other creative
"solutions" are in the works as well.
Drivers:
- Updates for the LoongArch interrupt chip drivers
- Support for MTK CIRQv2
- The usual small fixes and updates all over the place"
* tag 'irq-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (134 commits)
irqchip/ti-sci-inta: Fix kernel doc
irqchip/gic-v2m: Mark a few functions __init
irqchip/gic-v2m: Include arm-gic-common.h
irqchip/irq-mvebu-icu: Fix works by chance pointer assignment
iommu/amd: Enable PCI/IMS
iommu/vt-d: Enable PCI/IMS
x86/apic/msi: Enable PCI/IMS
PCI/MSI: Provide pci_ims_alloc/free_irq()
PCI/MSI: Provide IMS (Interrupt Message Store) support
genirq/msi: Provide constants for PCI/IMS support
x86/apic/msi: Enable MSI_FLAG_PCI_MSIX_ALLOC_DYN
PCI/MSI: Provide post-enable dynamic allocation interfaces for MSI-X
PCI/MSI: Provide prepare_desc() MSI domain op
PCI/MSI: Split MSI-X descriptor setup
genirq/msi: Provide MSI_FLAG_MSIX_ALLOC_DYN
genirq/msi: Provide msi_domain_alloc_irq_at()
genirq/msi: Provide msi_domain_ops:: Prepare_desc()
genirq/msi: Provide msi_desc:: Msi_data
genirq/msi: Provide struct msi_map
x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
...
|
|
|
|
9c2b840a3b |
Three small x86 fixes which did not make it into 6.1:
- Remove a superfluous noinline which prevents GCC-7.3 to optimize a stub
function away.
- Allow uprobes on REP NOP and do not treat them like word-sized branch
instructions.
- Make the VDSO symbol export of __vdso_sgx_enter_enclave() depend on
CONFIG_X86_SGX to prevent build fails with newer LLVM versions which
rightfully detect that there is no function behind the symbol.
-----BEGIN PGP SIGNATURE-----
iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOW+sQTHHRnbHhAbGlu
dXRyb25peC5kZQAKCRCmGPVMDXSYoWH5EACPYcRw9PNBLMC6L0MF5G0qCFmLcjqn
Fe8LxLywsKdyT6f1aAcOetIqkwDN/fuUyJHcioKqyqSkNlNeRV2hoZ9OlsBGJ7zC
6HH41ZCrY39liKzMM2JmfxU6XxT74zEt3Fly4G127d78HBi9DYwk8fT6GY8/BOk6
wkeWuczqRY1NNek1SBIciBn/FMZU8UShqjKzQsS1Bpj2Dm2ZvHdVh+P2okp2wl9Z
gMbFN0Jq+8jRWOb4BF0Hx2Fg+WjXZPhT8msDXh8Vnr0u7bchWCljbLvvFST2hfpo
+u/uKeOgOHm0XfUBOQa2WpEpev4M3ve1WFSkmP/0Qe3tcaRabMRDXGezZJSAdf1K
dZV0tQu+4rygzZwEf4ppskxejG7LSvyzrLdebPvzUYFT14C5E22jRxp1+Mpswq28
ZPiw6yc3XXUqboNV3JVNs3PDPBVucSCHfQfUNEfjUayaMhb4w5jQyy93WIffOzVU
0KnXe9XX0MA3e5zVJMXExW4907Iks/K+qNgXtx/8fJnqaECIJInxZfbPmj74ZpfT
6b0sJVt04eFX4uYKoLPpFoP9LFUvzU5eR7e7yuoiSGFh3D3p9bimyR5xhBxNqs8Y
j7XL2i0jY95w6v1kK3Kmgr2L+JCAN2v/JFJ+eIOYQAIb/VkhTfNq/MHL33bDJ1X3
2IrBEgo5tk7VNw==
=oJ/K
-----END PGP SIGNATURE-----
Merge tag 'x86-urgent-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Three small x86 fixes which did not make it into 6.1:
- Remove a superfluous noinline which prevents GCC-7.3 to optimize a
stub function away
- Allow uprobes on REP NOP and do not treat them like word-sized
branch instructions
- Make the VDSO symbol export of __vdso_sgx_enter_enclave() depend on
CONFIG_X86_SGX to prevent build failures with newer LLVM versions
which rightfully detect that there is no function behind the
symbol"
* tag 'x86-urgent-2022-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/vdso: Conditionally export __vdso_sgx_enter_enclave()
uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix
x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubs
|
|
|
|
7d62159919 |
hyperv-next for v6.2
-----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEIbPD0id6easf0xsudhRwX5BBoF4FAmORzR4THHdlaS5saXVA a2VybmVsLm9yZwAKCRB2FHBfkEGgXkqCCACFwHz04iepLE7R8ZZ6BVUhD6uzfzDo s1j7ozOUGUe3vI6q0DElHWVQZgzIzLypVsfWkZToe6jeOU6R48b0tZSFyJCUNwGM ogmS7N8fBdHfY9SBFoUPoziBifXpf3kq4hhX/w+1Lge9CN5Ywc4KjuJb91EAInbs lm47O4KQY8w8A7BbPBHYBueUVWLvgwPRPOS032zqxN1787m2tCxpqkfnImK39kh6 IsBBIZfYsok0H5wldhZXnsARpEOeFF6BoFBXpFPlmnbv2VcK2AfZgTYdA3ESyEgd NyOFDfh6BO07gTR1xCH6gvOpkHwx6xKAkjE36RymdhXS6fhRCRsfahVB =m78g -----END PGP SIGNATURE----- Merge tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Drop unregister syscore from hyperv_cleanup to avoid hang (Gaurav Kohli) - Clean up panic path for Hyper-V framebuffer (Guilherme G. Piccoli) - Allow IRQ remapping to work without x2apic (Nuno Das Neves) - Fix comments (Olaf Hering) - Expand hv_vp_assist_page definition (Saurabh Sengar) - Improvement to page reporting (Shradha Gupta) - Make sure TSC clocksource works when Linux runs as the root partition (Stanislav Kinsburskiy) * tag 'hyperv-next-signed-20221208' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Remove unregister syscore call from Hyper-V cleanup iommu/hyper-v: Allow hyperv irq remapping without x2apic clocksource: hyper-v: Add TSC page support for root partition clocksource: hyper-v: Use TSC PFN getter to map vvar page clocksource: hyper-v: Introduce TSC PFN getter clocksource: hyper-v: Introduce a pointer to TSC page x86/hyperv: Expand definition of struct hv_vp_assist_page PCI: hv: update comment in x86 specific hv_arch_irq_unmask hv: fix comment typo in vmbus_channel/low_latency drivers: hv, hyperv_fb: Untangle and refactor Hyper-V panic notifiers video: hyperv_fb: Avoid taking busy spinlock on panic path hv_balloon: Add support for configurable order free page reporting mm/page_reporting: Add checks for page_reporting_order param |
|
|
|
6e24c88773 |
x86/apic/msi: Enable PCI/IMS
Enable IMS in the domain init and allocation mapping code, but do not enable it on the vector domain as discussed in various threads on LKML. The interrupt remap domains can expand this setting like they do with PCI multi MSI. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232327.022658817@linutronix.de |
|
|
|
4d5a4ccc51 |
x86/apic/msi: Remove arch_create_remap_msi_irq_domain()
and related code which is not longer required now that the interrupt remap code has been converted to MSI parent domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.267353814@linutronix.de |
|
|
|
cc7594ffad |
iommu/amd: Switch to MSI base domains
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.209212272@linutronix.de |
|
|
|
9a945234ab |
iommu/vt-d: Switch to MSI parent domains
Remove the global PCI/MSI irqdomain implementation and provide the required MSI parent ops so the PCI/MSI code can detect the new parent and setup per device domains. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.151226317@linutronix.de |
|
|
|
b6d5fc3a52 |
x86/apic/vector: Provide MSI parent domain
Enable MSI parent domain support in the x86 vector domain and fixup the checks in the iommu implementations to check whether device::msi::domain is the default MSI parent domain. That keeps the existing logic to protect e.g. devices behind VMD working. The interrupt remap PCI/MSI code still works because the underlying vector domain still provides the same functionality. None of the other x86 PCI/MSI, e.g. XEN and HyperV, implementations are affected either. They still work the same way both at the low level and the PCI/MSI implementations they provide. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124232326.034672592@linutronix.de |
|
|
|
3dad5f9ad9 |
genirq/msi: Move IRQ_DOMAIN_MSI_NOMASK_QUIRK to MSI flags
It's truly a MSI only flag and for the upcoming per device MSI domains this must be in the MSI flags so it can be set during domain setup without exposing this quirk outside of x86. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20221124230313.454246167@linutronix.de |
|
|
|
cefa72129e |
uprobes/x86: Allow to probe a NOP instruction with 0x66 prefix
Intel ICC -hotpatch inserts 2-byte "0x66 0x90" NOP at the start of each
function to reserve extra space for hot-patching, and currently it is not
possible to probe these functions because branch_setup_xol_ops() wrongly
rejects NOP with REP prefix as it treats them like word-sized branch
instructions.
Fixes:
|
|
|
|
6606515742 |
x86/bugs: Make sure MSR_SPEC_CTRL is updated properly upon resume from S3
The "force" argument to write_spec_ctrl_current() is currently ambiguous
as it does not guarantee the MSR write. This is due to the optimization
that writes to the MSR happen only when the new value differs from the
cached value.
This is fine in most cases, but breaks for S3 resume when the cached MSR
value gets out of sync with the hardware MSR value due to S3 resetting
it.
When x86_spec_ctrl_current is same as x86_spec_ctrl_base, the MSR write
is skipped. Which results in SPEC_CTRL mitigations not getting restored.
Move the MSR write from write_spec_ctrl_current() to a new function that
unconditionally writes to the MSR. Update the callers accordingly and
rename functions.
[ bp: Rework a bit. ]
Fixes:
|
|
|
|
89e927bbcd |
x86/sgx: Replace kmap/kunmap_atomic() calls
kmap_local_page() is the preferred way to create temporary mappings when it is feasible, because the mappings are thread-local and CPU-local. kmap_local_page() uses per-task maps rather than per-CPU maps. This in effect removes the need to disable preemption on the local CPU while the mapping is active, and thus vastly reduces overall system latency. It is also valid to take pagefaults within the mapped region. The use of kmap_atomic() in the SGX code was not an explicit design choice to disable page faults or preemption, and there is no compelling design reason to using kmap_atomic() vs. kmap_local_page(). Signed-off-by: Kristen Carlson Accardi <kristen@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Reviewed-by: Fabio M. De Francesco <fmdefrancesco@gmail.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Link: https://lore.kernel.org/linux-sgx/Y0biN3%2FJsZMa0yUr@kernel.org/ Link: https://lore.kernel.org/r/20221115161627.4169428-1-kristen@linux.intel.com |
|
|
|
2833275568 |
x86/of: Add support for boot time interrupt delivery mode configuration
Presently, init/boot time interrupt delivery mode is enumerated only for ACPI enabled systems by parsing MADT table or for older systems by parsing MP table. But for OF based x86 systems, it is assumed & hardcoded to be legacy PIC mode. This causes a boot time crash for platforms which do not provide a 8259 compliant legacy PIC. Add support for configuration of init time interrupt delivery mode for x86 OF based systems by introducing a new optional boolean property 'intel,virtual-wire-mode' for the local APIC interrupt-controller node. This property emulates IMCRP Bit 7 of MP feature info byte 2 of MP floating pointer structure. Defaults to legacy PIC mode if absent. Configures it to virtual wire compatibility mode if present. Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221124084143.21841-5-rtanwar@maxlinear.com |
|
|
|
535403323b |
x86/of: Replace printk(KERN_LVL) with pr_lvl()
Use pr_lvl() instead of the deprecated printk(KERN_LVL). Just a upgrade of print utilities usage. no functional changes. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rahul Tanwar <rtanwar@maxlinear.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221124084143.21841-4-rtanwar@maxlinear.com |
|
|
|
9b09927c0c |
x86/of: Remove unused early_init_dt_add_memory_arch()
Recently objtool started complaining about dead code in the object files, in particular vmlinux.o: warning: objtool: early_init_dt_scan_memory+0x191: unreachable instruction when CONFIG_OF=y. Indeed, early_init_dt_scan() is not used on x86 and making it compile (with help of CONFIG_OF) will abrupt the code flow since in the middle of it there is a BUG() instruction. Remove the pointless function. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221124184824.9548-1-andriy.shevchenko@linux.intel.com |
|
|
|
e3998434da |
x86/apic: Handle no CONFIG_X86_X2APIC on systems with x2APIC enabled by BIOS
A kernel that was compiled without CONFIG_X86_X2APIC was unable to boot on platforms that have x2APIC already enabled in the BIOS before starting the kernel. The kernel was supposed to panic with an approprite error message in validate_x2apic() due to the missing X2APIC support. However, validate_x2apic() was run too late in the boot cycle, and the kernel tried to initialize the APIC nonetheless. This resulted in an earlier panic in setup_local_APIC() because the APIC was not registered. In my experiments, a panic message in setup_local_APIC() was not visible in the graphical console, which resulted in a hang with no indication what has gone wrong. Instead of calling panic(), disable the APIC, which results in a somewhat working system with the PIC only (and no SMP). This way the user is able to diagnose the problem more easily. Disabling X2APIC mode is not an option because it's impossible on systems with locked x2APIC. The proper place to disable the APIC in this case is in check_x2apic(), which is called early from setup_arch(). Doing this in __apic_intr_mode_select() is too late. Make check_x2apic() unconditionally available and remove the empty stub. Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Reported-by: Robert Elliott (Servers) <elliott@hpe.com> Signed-off-by: Mateusz Jończyk <mat.jonczyk@o2.pl> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/lkml/d573ba1c-0dc4-3016-712a-cc23a8a33d42@molgen.mpg.de Link: https://lore.kernel.org/lkml/20220911084711.13694-3-mat.jonczyk@o2.pl Link: https://lore.kernel.org/all/20221129215008.7247-1-mat.jonczyk@o2.pl |
|
|
|
ff4c85c053 |
x86/asm/32: Remove setup_once()
After the removal of the stack canary segment setup code, this function does nothing. Signed-off-by: Brian Gerst <brgerst@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221115184328.70874-1-brgerst@gmail.com |
|
|
|
023e59d4ce |
x86/alternative: Remove noinline from __ibt_endbr_seal[_end]() stubs
Due to the explicit 'noinline' GCC-7.3 is not able to optimize away the argument setup of: apply_ibt_endbr(__ibt_endbr_seal, __ibt_enbr_seal_end); even when X86_KERNEL_IBT=n and the function is an empty stub, which leads to link errors due to missing __ibt_endbr_seal* symbols: ld: arch/x86/kernel/alternative.o: in function `alternative_instructions': alternative.c:(.init.text+0x15d): undefined reference to `__ibt_endbr_seal_end' ld: alternative.c:(.init.text+0x164): undefined reference to `__ibt_endbr_seal' Remove the explicit 'noinline' to help gcc optimize them away. Signed-off-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20221011113803.956808-1-linmiaohe@huawei.com |
|
|
|
fea858dc5d |
iommu/hyper-v: Allow hyperv irq remapping without x2apic
If x2apic is not available, hyperv-iommu skips remapping irqs. This breaks root partition which always needs irqs remapped. Fix this by allowing irq remapping regardless of x2apic, and change hyperv_enable_irq_remapping() to return IRQ_REMAP_XAPIC_MODE in case x2apic is missing. Tested with root and non-root hyperv partitions. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Tianyu Lan <Tianyu.Lan@microsoft.com> Reviewed-by: Michael Kelley <mikelley@microsoft.com> Link: https://lore.kernel.org/r/1668715899-8971-1-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> |
|
|
|
97fa21f65c |
x86/resctrl: Move MSR defines into msr-index.h
msr-index.h should contain all MSRs for easier grepping for MSR numbers when dealing with unchecked MSR access warnings, for example. Move the resctrl ones. Prefix IA32_PQR_ASSOC with "MSR_" while at it. No functional changes. Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20221106212923.20699-1-bp@alien8.de |
|
|
|
de4eda9de2 |
use less confusing names for iov_iter direction initializers
READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> |
|
|
|
55228db269 |
x86/fpu: Use _Alignof to avoid undefined behavior in TYPE_ALIGN
WG14 N2350 specifies that it is an undefined behavior to have type definitions within offsetof", see https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2350.htm This specification is also part of C23. Therefore, replace the TYPE_ALIGN macro with the _Alignof builtin to avoid undefined behavior. (_Alignof itself is C11 and the kernel is built with -gnu11). ISO C11 _Alignof is subtly different from the GNU C extension __alignof__. Latter is the preferred alignment and _Alignof the minimal alignment. For long long on x86 these are 8 and 4 respectively. The macro TYPE_ALIGN's behavior matches _Alignof rather than __alignof__. [ bp: Massage commit message. ] Signed-off-by: YingChi Long <me@inclyc.cn> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://lore.kernel.org/r/20220925153151.2467884-1-me@inclyc.cn |
|
|
|
be84d8ed3f |
x86/alternative: Consistently patch SMP locks in vmlinux and modules
alternatives_smp_module_add() restricts patching of SMP lock prefixes to the text address range passed as an argument. For vmlinux, patching all the instructions located between the _text and _etext symbols is allowed. That includes the .text section but also other sections such as .text.hot and .text.unlikely. As per the comment inside the 'struct smp_alt_module' definition, the original purpose of this restriction is to avoid patching the init code because in the case when one boots with a single CPU, the LOCK prefixes to the locking primitives are removed. Later on, when other CPUs are onlined, those LOCK prefixes get added back in but by that time the .init code is very likely removed so patching that would be a bad idea. For modules, the current code only allows patching instructions located inside the .text segment, excluding other sections such as .text.hot or .text.unlikely, which may need patching. Make patching of the kernel core and modules more consistent by allowing all text sections of modules except .init.text to be patched in module_finalize(). For that, use mod->core_layout.base/mod->core_layout.text_size as the address range allowed to be patched, which include all the code sections except the init code. [ bp: Massage and expand commit message. ] Signed-off-by: Julian Pidancet <julian.pidancet@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20221027204906.511277-1-julian.pidancet@oracle.com |
|
|
|
aaa65d17ee |
x86/tsx: Add a feature bit for TSX control MSR support
Support for the TSX control MSR is enumerated in MSR_IA32_ARCH_CAPABILITIES.
This is different from how other CPU features are enumerated i.e. via
CPUID. Currently, a call to tsx_ctrl_is_supported() is required for
enumerating the feature. In the absence of a feature bit for TSX control,
any code that relies on checking feature bits directly will not work.
In preparation for adding a feature bit check in MSR save/restore
during suspend/resume, set a new feature bit X86_FEATURE_TSX_CTRL when
MSR_IA32_TSX_CTRL is present. Also make tsx_ctrl_is_supported() use the
new feature bit to avoid any overhead of reading the MSR.
[ bp: Remove tsx_ctrl_is_supported(), add room for two more feature
bits in word 11 which are coming up in the next merge window. ]
Suggested-by: Andrew Cooper <andrew.cooper3@citrix.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com>
Cc: <stable@kernel.org>
Link: https://lore.kernel.org/r/de619764e1d98afbb7a5fa58424f1278ede37b45.1668539735.git.pawan.kumar.gupta@linux.intel.com
|
|
|
|
894909f95a |
- Do not hold fpregs lock when inheriting FPU permissions because the
fpregs lock disables preemption on RT but fpu_inherit_perms() does spin_lock_irq(), which, on RT, uses rtmutexes and they need to be preemptible. - Check the page offset and the length of the data supplied by userspace for overflow when specifying a set of pages to add to an SGX enclave -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmN6GjgACgkQEsHwGGHe VUpzog/+OIX3ZAZ0EJqg9GgvhacPjww1oPr+DRcpXCFYjk1jTJ3seJc2we+uun0j zYHbgO6BYyP3LdlrSjt8MgosMZGz1s14r9TXc46T8IhvUu0imbUkO9vLcxwL6pJl LJgPIYvBu6IUoVIQVlVr7PrVvUj8nUPc3w/8qmjR91bJAWTeeFvFflvn713jlWBP hLKiUvhdjA08Sp9gjF2drGl+NkSXPPLPHQetKa4BhVYqwDK5hRGBOt51CuDHdUOQ QYaP5JRy435ZsoFGgYq0lOxCXIYDe8rWRBCnDWdi7kjXEYhnKJLj6Fi1SxjD+cZC wDX+LQGFiShJFonGzxbeORBU04Owbz+nLsSeHCQsl/70kAv/W/44BLj+BPl0dit1 XBTUUCr9Wi9VdDTBVJT+EQbD3F5dBn1TO00Z0qzhv3D3gVruUNmv7SDHMoRUyYcy 9LueWCzF9YV1Se6V9gUox9vwTuc09J63IS2zkMm2ahCbfmWTSsx9P5BWLFK3E3Em lPsdZWNJQ7F6f0B3AfRjTDXvaMyzBRYfuZHEaBMq5avDWDFBCyOhc3PqjpKt5wHS URP6M/kOtz1zg8fy/XmMRCfCDBoAm+NfvF4zG9md1GYta7aP74Z824M+FMoXNv7f YcR4mCzpeeiG0hXyywcL+QDpmjlsYCPhe24Gnh/Bb+1g7Huyyc8= =VQD4 -----END PGP SIGNATURE----- Merge tag 'x86_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Do not hold fpregs lock when inheriting FPU permissions because the fpregs lock disables preemption on RT but fpu_inherit_perms() does spin_lock_irq(), which, on RT, uses rtmutexes and they need to be preemptible. - Check the page offset and the length of the data supplied by userspace for overflow when specifying a set of pages to add to an SGX enclave * tag 'x86_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Drop fpregs lock before inheriting FPU permissions x86/sgx: Add overflow check in sgx_validate_offset_length() |
|
|
|
b3883a9a1f |
stackprotector: move get_random_canary() into stackprotector.h
This has nothing to do with random.c and everything to do with stack protectors. Yes, it uses randomness. But many things use randomness. random.h and random.c are concerned with the generation of randomness, not with each and every use. So move this function into the more specific stackprotector.h file where it belongs. Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> |
|
|
|
e8a533cbeb |
treewide: use get_random_u32_inclusive() when possible
These cases were done with this Coccinelle: @@ expression H; expression L; @@ - (get_random_u32_below(H) + L) + get_random_u32_inclusive(L, H + L - 1) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - + E - - E ) @@ expression H; expression L; expression E; @@ get_random_u32_inclusive(L, H - - E - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - - E + F - + E ) @@ expression H; expression L; expression E; expression F; @@ get_random_u32_inclusive(L, H - + E + F - - E ) And then subsequently cleaned up by hand, with several automatic cases rejected if it didn't make sense contextually. Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> |
|
|
|
8032bf1233 |
treewide: use get_random_u32_below() instead of deprecated function
This is a simple mechanical transformation done by: @@ expression E; @@ - prandom_u32_max + get_random_u32_below (E) Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs Reviewed-by: SeongJae Park <sj@kernel.org> # for damon Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> # for infiniband Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> # for arm Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> |
|
|
|
d474d92d70 |
x86/apic: Remove X86_IRQ_ALLOC_CONTIGUOUS_VECTORS
Now that the PCI/MSI core code does early checking for multi-MSI support X86_IRQ_ALLOC_CONTIGUOUS_VECTORS is not required anymore. Remove the flag and rely on MSI_FLAG_MULTI_PCI_MSI. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20221111122015.865042356@linutronix.de |
|
|
|
d7e5aceace |
x86/fpu: Emulate XRSTOR's behavior if the xfeatures PKRU bit is not set
The hardware XRSTOR instruction resets the PKRU register to its hardware
init value (namely 0) if the PKRU bit is not set in the xfeatures mask.
Emulating that here restores the pre-5.14 behavior for PTRACE_SET_REGSET
with NT_X86_XSTATE, and makes sigreturn (which still uses XRSTOR) and
ptrace behave identically. KVM has never used XRSTOR and never had this
behavior, so KVM opts-out of this emulation by passing a NULL pkru pointer
to copy_uabi_to_xstate().
Fixes:
|
|
|
|
4a804c4f83 |
x86/fpu: Allow PKRU to be (once again) written by ptrace.
Move KVM's PKRU handling code in fpu_copy_uabi_to_guest_fpstate() to
copy_uabi_to_xstate() so that it is shared with other APIs that write the
XSTATE such as PTRACE_SETREGSET with NT_X86_XSTATE.
This restores the pre-5.14 behavior of ptrace. The regression can be seen
by running gdb and executing `p $pkru`, `set $pkru = 42`, and `p $pkru`.
On affected kernels (5.14+) the write to the PKRU register (which gdb
performs through ptrace) is ignored.
[ dhansen: removed stable@ tag for now. The ABI was broken for long
enough that this is not urgent material. Let's let it stew
in tip for a few weeks before it's submitted to stable
because there are so many ABIs potentially affected. ]
Fixes:
|
|
|
|
2c87767c35 |
x86/fpu: Add a pkru argument to copy_uabi_to_xstate()
In preparation for moving PKRU handling code out of fpu_copy_uabi_to_guest_fpstate() and into copy_uabi_to_xstate(), add an argument that copy_uabi_from_kernel_to_xstate() can use to pass the canonical location of the PKRU value. For copy_sigframe_from_user_to_xstate() the kernel will actually restore the PKRU value from the fpstate, but pass in the thread_struct's pkru location anyways for consistency. Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-4-khuey%40kylehuey.com |
|
|
|
1c813ce030 |
x86/fpu: Add a pkru argument to copy_uabi_from_kernel_to_xstate().
Both KVM (through KVM_SET_XSTATE) and ptrace (through PTRACE_SETREGSET with NT_X86_XSTATE) ultimately call copy_uabi_from_kernel_to_xstate(), but the canonical locations for the current PKRU value for KVM guests and processes in a ptrace stop are different (in the kvm_vcpu_arch and the thread_state structs respectively). In preparation for eventually handling PKRU in copy_uabi_to_xstate, pass in a pointer to the PKRU location. Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-3-khuey%40kylehuey.com |
|
|
|
6a877d2450 |
x86/fpu: Take task_struct* in copy_sigframe_from_user_to_xstate()
This will allow copy_sigframe_from_user_to_xstate() to grab the address of thread_struct's pkru value in a later patch. Signed-off-by: Kyle Huey <me@kylehuey.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lore.kernel.org/all/20221115230932.7126-2-khuey%40kylehuey.com |
|
|
|
2632daebaf |
x86/cpu: Restore AMD's DE_CFG MSR after resume
DE_CFG contains the LFENCE serializing bit, restore it on resume too.
This is relevant to older families due to the way how they do S3.
Unify and correct naming while at it.
Fixes:
|
|
|
|
d7c2b1f64e |
22 hotfixes. 8 are cc:stable and the remainder address issues which were
introduced post-6.0 or which aren't considered serious enough to justify a -stable backport. -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY27xPAAKCRDdBJ7gKXxA juFXAP4tSmfNDrT6khFhV0l4cS43bluErVNLh32RfXBqse8GYgEA5EPvZkOssLqY 86ejRXFgAArxYC4caiNURUQL+IASvQo= =YVOx -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc hotfixes from Andrew Morton: "22 hotfixes. Eight are cc:stable and the remainder address issues which were introduced post-6.0 or which aren't considered serious enough to justify a -stable backport" * tag 'mm-hotfixes-stable-2022-11-11' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (22 commits) docs: kmsan: fix formatting of "Example report" mm/damon/dbgfs: check if rm_contexts input is for a real context maple_tree: don't set a new maximum on the node when not reusing nodes maple_tree: fix depth tracking in maple_state arch/x86/mm/hugetlbpage.c: pud_huge() returns 0 when using 2-level paging fs: fix leaked psi pressure state nilfs2: fix use-after-free bug of ns_writer on remount x86/traps: avoid KMSAN bugs originating from handle_bug() kmsan: make sure PREEMPT_RT is off Kconfig.debug: ensure early check for KMSAN in CONFIG_KMSAN_WARN x86/uaccess: instrument copy_from_user_nmi() kmsan: core: kmsan_in_runtime() should return true in NMI context mm: hugetlb_vmemmap: include missing linux/moduleparam.h mm/shmem: use page_mapping() to detect page cache for uffd continue mm/memremap.c: map FS_DAX device memory as decrypted Partly revert "mm/thp: carry over dirty bit when thp splits on pmd" nilfs2: fix deadlock in nilfs_count_free_blocks() mm/mmap: fix memory leak in mmap_region() hugetlbfs: don't delete error page from pagecache maple_tree: reorganize testing to restore module testing ... |
|
|
|
727209376f |
x86/split_lock: Add sysctl to control the misery mode
Commit |
|
|
|
36b038791e |
x86/fpu: Drop fpregs lock before inheriting FPU permissions
Mike Galbraith reported the following against an old fork of preempt-rt
but the same issue also applies to the current preempt-rt tree.
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1, name: systemd
preempt_count: 1, expected: 0
RCU nest depth: 0, expected: 0
Preemption disabled at:
fpu_clone
CPU: 6 PID: 1 Comm: systemd Tainted: G E (unreleased)
Call Trace:
<TASK>
dump_stack_lvl
? fpu_clone
__might_resched
rt_spin_lock
fpu_clone
? copy_thread
? copy_process
? shmem_alloc_inode
? kmem_cache_alloc
? kernel_clone
? __do_sys_clone
? do_syscall_64
? __x64_sys_rt_sigprocmask
? syscall_exit_to_user_mode
? do_syscall_64
? syscall_exit_to_user_mode
? do_syscall_64
? syscall_exit_to_user_mode
? do_syscall_64
? exc_page_fault
? entry_SYSCALL_64_after_hwframe
</TASK>
Mike says:
The splat comes from fpu_inherit_perms() being called under fpregs_lock(),
and us reaching the spin_lock_irq() therein due to fpu_state_size_dynamic()
returning true despite static key __fpu_state_size_dynamic having never
been enabled.
Mike's assessment looks correct. fpregs_lock on a PREEMPT_RT kernel disables
preemption so calling spin_lock_irq() in fpu_inherit_perms() is unsafe. This
problem exists since commit
|
|
|
|
bd3d394e36 |
x86, KVM: remove unnecessary argument to x86_virt_spec_ctrl and callers
x86_virt_spec_ctrl only deals with the paravirtualized MSR_IA32_VIRT_SPEC_CTRL now and does not handle MSR_IA32_SPEC_CTRL anymore; remove the corresponding, unused argument. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> |