mirror of https://github.com/torvalds/linux.git
2109 Commits
| Author | SHA1 | Message | Date |
|---|---|---|---|
|
|
ecae0bd517 |
Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Kemeng Shi has contributed some compation maintenance work in the
series "Fixes and cleanups to compaction".
- Joel Fernandes has a patchset ("Optimize mremap during mutual
alignment within PMD") which fixes an obscure issue with mremap()'s
pagetable handling during a subsequent exec(), based upon an
implementation which Linus suggested.
- More DAMON/DAMOS maintenance and feature work from SeongJae Park i the
following patch series:
mm/damon: misc fixups for documents, comments and its tracepoint
mm/damon: add a tracepoint for damos apply target regions
mm/damon: provide pseudo-moving sum based access rate
mm/damon: implement DAMOS apply intervals
mm/damon/core-test: Fix memory leaks in core-test
mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
- In the series "Do not try to access unaccepted memory" Adrian Hunter
provides some fixups for the recently-added "unaccepted memory' feature.
To increase the feature's checking coverage. "Plug a few gaps where
RAM is exposed without checking if it is unaccepted memory".
- In the series "cleanups for lockless slab shrink" Qi Zheng has done
some maintenance work which is preparation for the lockless slab
shrinking code.
- Qi Zheng has redone the earlier (and reverted) attempt to make slab
shrinking lockless in the series "use refcount+RCU method to implement
lockless slab shrink".
- David Hildenbrand contributes some maintenance work for the rmap code
in the series "Anon rmap cleanups".
- Kefeng Wang does more folio conversions and some maintenance work in
the migration code. Series "mm: migrate: more folio conversion and
unification".
- Matthew Wilcox has fixed an issue in the buffer_head code which was
causing long stalls under some heavy memory/IO loads. Some cleanups
were added on the way. Series "Add and use bdev_getblk()".
- In the series "Use nth_page() in place of direct struct page
manipulation" Zi Yan has fixed a potential issue with the direct
manipulation of hugetlb page frames.
- In the series "mm: hugetlb: Skip initialization of gigantic tail
struct pages if freed by HVO" has improved our handling of gigantic
pages in the hugetlb vmmemmep optimizaton code. This provides
significant boot time improvements when significant amounts of gigantic
pages are in use.
- Matthew Wilcox has sent the series "Small hugetlb cleanups" - code
rationalization and folio conversions in the hugetlb code.
- Yin Fengwei has improved mlock()'s handling of large folios in the
series "support large folio for mlock"
- In the series "Expose swapcache stat for memcg v1" Liu Shixin has
added statistics for memcg v1 users which are available (and useful)
under memcg v2.
- Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
prctl so that userspace may direct the kernel to not automatically
propagate the denial to child processes. The series is named "MDWE
without inheritance".
- Kefeng Wang has provided the series "mm: convert numa balancing
functions to use a folio" which does what it says.
- In the series "mm/ksm: add fork-exec support for prctl" Stefan Roesch
makes is possible for a process to propagate KSM treatment across
exec().
- Huang Ying has enhanced memory tiering's calculation of memory
distances. This is used to permit the dax/kmem driver to use "high
bandwidth memory" in addition to Optane Data Center Persistent Memory
Modules (DCPMM). The series is named "memory tiering: calculate
abstract distance based on ACPI HMAT"
- In the series "Smart scanning mode for KSM" Stefan Roesch has
optimized KSM by teaching it to retain and use some historical
information from previous scans.
- Yosry Ahmed has fixed some inconsistencies in memcg statistics in the
series "mm: memcg: fix tracking of pending stats updates values".
- In the series "Implement IOCTL to get and optionally clear info about
PTEs" Peter Xu has added an ioctl to /proc/<pid>/pagemap which permits
us to atomically read-then-clear page softdirty state. This is mainly
used by CRIU.
- Hugh Dickins contributed the series "shmem,tmpfs: general maintenance"
- a bunch of relatively minor maintenance tweaks to this code.
- Matthew Wilcox has increased the use of the VMA lock over file-backed
page faults in the series "Handle more faults under the VMA lock". Some
rationalizations of the fault path became possible as a result.
- In the series "mm/rmap: convert page_move_anon_rmap() to
folio_move_anon_rmap()" David Hildenbrand has implemented some cleanups
and folio conversions.
- In the series "various improvements to the GUP interface" Lorenzo
Stoakes has simplified and improved the GUP interface with an eye to
providing groundwork for future improvements.
- Andrey Konovalov has sent along the series "kasan: assorted fixes and
improvements" which does those things.
- Some page allocator maintenance work from Kemeng Shi in the series
"Two minor cleanups to break_down_buddy_pages".
- In thes series "New selftest for mm" Breno Leitao has developed
another MM self test which tickles a race we had between madvise() and
page faults.
- In the series "Add folio_end_read" Matthew Wilcox provides cleanups
and an optimization to the core pagecache code.
- Nhat Pham has added memcg accounting for hugetlb memory in the series
"hugetlb memcg accounting".
- Cleanups and rationalizations to the pagemap code from Lorenzo
Stoakes, in the series "Abstract vma_merge() and split_vma()".
- Audra Mitchell has fixed issues in the procfs page_owner code's new
timestamping feature which was causing some misbehaviours. In the
series "Fix page_owner's use of free timestamps".
- Lorenzo Stoakes has fixed the handling of new mappings of sealed files
in the series "permit write-sealed memfd read-only shared mappings".
- Mike Kravetz has optimized the hugetlb vmemmap optimization in the
series "Batch hugetlb vmemmap modification operations".
- Some buffer_head folio conversions and cleanups from Matthew Wilcox in
the series "Finish the create_empty_buffers() transition".
- As a page allocator performance optimization Huang Ying has added
automatic tuning to the allocator's per-cpu-pages feature, in the series
"mm: PCP high auto-tuning".
- Roman Gushchin has contributed the patchset "mm: improve performance
of accounted kernel memory allocations" which improves their performance
by ~30% as measured by a micro-benchmark.
- folio conversions from Kefeng Wang in the series "mm: convert page
cpupid functions to folios".
- Some kmemleak fixups in Liu Shixin's series "Some bugfix about
kmemleak".
- Qi Zheng has improved our handling of memoryless nodes by keeping them
off the allocation fallback list. This is done in the series "handle
memoryless nodes more appropriately".
- khugepaged conversions from Vishal Moola in the series "Some
khugepaged folio conversions".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZULEMwAKCRDdBJ7gKXxA
jhQHAQCYpD3g849x69DmHnHWHm/EHQLvQmRMDeYZI+nx/sCJOwEAw4AKg0Oemv9y
FgeUPAD1oasg6CP+INZvCj34waNxwAc=
=E+Y4
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
"Many singleton patches against the MM code. The patch series which are
included in this merge do the following:
- Kemeng Shi has contributed some compation maintenance work in the
series 'Fixes and cleanups to compaction'
- Joel Fernandes has a patchset ('Optimize mremap during mutual
alignment within PMD') which fixes an obscure issue with mremap()'s
pagetable handling during a subsequent exec(), based upon an
implementation which Linus suggested
- More DAMON/DAMOS maintenance and feature work from SeongJae Park i
the following patch series:
mm/damon: misc fixups for documents, comments and its tracepoint
mm/damon: add a tracepoint for damos apply target regions
mm/damon: provide pseudo-moving sum based access rate
mm/damon: implement DAMOS apply intervals
mm/damon/core-test: Fix memory leaks in core-test
mm/damon/sysfs-schemes: Do DAMOS tried regions update for only one apply interval
- In the series 'Do not try to access unaccepted memory' Adrian
Hunter provides some fixups for the recently-added 'unaccepted
memory' feature. To increase the feature's checking coverage. 'Plug
a few gaps where RAM is exposed without checking if it is
unaccepted memory'
- In the series 'cleanups for lockless slab shrink' Qi Zheng has done
some maintenance work which is preparation for the lockless slab
shrinking code
- Qi Zheng has redone the earlier (and reverted) attempt to make slab
shrinking lockless in the series 'use refcount+RCU method to
implement lockless slab shrink'
- David Hildenbrand contributes some maintenance work for the rmap
code in the series 'Anon rmap cleanups'
- Kefeng Wang does more folio conversions and some maintenance work
in the migration code. Series 'mm: migrate: more folio conversion
and unification'
- Matthew Wilcox has fixed an issue in the buffer_head code which was
causing long stalls under some heavy memory/IO loads. Some cleanups
were added on the way. Series 'Add and use bdev_getblk()'
- In the series 'Use nth_page() in place of direct struct page
manipulation' Zi Yan has fixed a potential issue with the direct
manipulation of hugetlb page frames
- In the series 'mm: hugetlb: Skip initialization of gigantic tail
struct pages if freed by HVO' has improved our handling of gigantic
pages in the hugetlb vmmemmep optimizaton code. This provides
significant boot time improvements when significant amounts of
gigantic pages are in use
- Matthew Wilcox has sent the series 'Small hugetlb cleanups' - code
rationalization and folio conversions in the hugetlb code
- Yin Fengwei has improved mlock()'s handling of large folios in the
series 'support large folio for mlock'
- In the series 'Expose swapcache stat for memcg v1' Liu Shixin has
added statistics for memcg v1 users which are available (and
useful) under memcg v2
- Florent Revest has enhanced the MDWE (Memory-Deny-Write-Executable)
prctl so that userspace may direct the kernel to not automatically
propagate the denial to child processes. The series is named 'MDWE
without inheritance'
- Kefeng Wang has provided the series 'mm: convert numa balancing
functions to use a folio' which does what it says
- In the series 'mm/ksm: add fork-exec support for prctl' Stefan
Roesch makes is possible for a process to propagate KSM treatment
across exec()
- Huang Ying has enhanced memory tiering's calculation of memory
distances. This is used to permit the dax/kmem driver to use 'high
bandwidth memory' in addition to Optane Data Center Persistent
Memory Modules (DCPMM). The series is named 'memory tiering:
calculate abstract distance based on ACPI HMAT'
- In the series 'Smart scanning mode for KSM' Stefan Roesch has
optimized KSM by teaching it to retain and use some historical
information from previous scans
- Yosry Ahmed has fixed some inconsistencies in memcg statistics in
the series 'mm: memcg: fix tracking of pending stats updates
values'
- In the series 'Implement IOCTL to get and optionally clear info
about PTEs' Peter Xu has added an ioctl to /proc/<pid>/pagemap
which permits us to atomically read-then-clear page softdirty
state. This is mainly used by CRIU
- Hugh Dickins contributed the series 'shmem,tmpfs: general
maintenance', a bunch of relatively minor maintenance tweaks to
this code
- Matthew Wilcox has increased the use of the VMA lock over
file-backed page faults in the series 'Handle more faults under the
VMA lock'. Some rationalizations of the fault path became possible
as a result
- In the series 'mm/rmap: convert page_move_anon_rmap() to
folio_move_anon_rmap()' David Hildenbrand has implemented some
cleanups and folio conversions
- In the series 'various improvements to the GUP interface' Lorenzo
Stoakes has simplified and improved the GUP interface with an eye
to providing groundwork for future improvements
- Andrey Konovalov has sent along the series 'kasan: assorted fixes
and improvements' which does those things
- Some page allocator maintenance work from Kemeng Shi in the series
'Two minor cleanups to break_down_buddy_pages'
- In thes series 'New selftest for mm' Breno Leitao has developed
another MM self test which tickles a race we had between madvise()
and page faults
- In the series 'Add folio_end_read' Matthew Wilcox provides cleanups
and an optimization to the core pagecache code
- Nhat Pham has added memcg accounting for hugetlb memory in the
series 'hugetlb memcg accounting'
- Cleanups and rationalizations to the pagemap code from Lorenzo
Stoakes, in the series 'Abstract vma_merge() and split_vma()'
- Audra Mitchell has fixed issues in the procfs page_owner code's new
timestamping feature which was causing some misbehaviours. In the
series 'Fix page_owner's use of free timestamps'
- Lorenzo Stoakes has fixed the handling of new mappings of sealed
files in the series 'permit write-sealed memfd read-only shared
mappings'
- Mike Kravetz has optimized the hugetlb vmemmap optimization in the
series 'Batch hugetlb vmemmap modification operations'
- Some buffer_head folio conversions and cleanups from Matthew Wilcox
in the series 'Finish the create_empty_buffers() transition'
- As a page allocator performance optimization Huang Ying has added
automatic tuning to the allocator's per-cpu-pages feature, in the
series 'mm: PCP high auto-tuning'
- Roman Gushchin has contributed the patchset 'mm: improve
performance of accounted kernel memory allocations' which improves
their performance by ~30% as measured by a micro-benchmark
- folio conversions from Kefeng Wang in the series 'mm: convert page
cpupid functions to folios'
- Some kmemleak fixups in Liu Shixin's series 'Some bugfix about
kmemleak'
- Qi Zheng has improved our handling of memoryless nodes by keeping
them off the allocation fallback list. This is done in the series
'handle memoryless nodes more appropriately'
- khugepaged conversions from Vishal Moola in the series 'Some
khugepaged folio conversions'"
[ bcachefs conflicts with the dynamically allocated shrinkers have been
resolved as per Stephen Rothwell in
https://lore.kernel.org/all/20230913093553.4290421e@canb.auug.org.au/
with help from Qi Zheng.
The clone3 test filtering conflict was half-arsed by yours truly ]
* tag 'mm-stable-2023-11-01-14-33' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (406 commits)
mm/damon/sysfs: update monitoring target regions for online input commit
mm/damon/sysfs: remove requested targets when online-commit inputs
selftests: add a sanity check for zswap
Documentation: maple_tree: fix word spelling error
mm/vmalloc: fix the unchecked dereference warning in vread_iter()
zswap: export compression failure stats
Documentation: ubsan: drop "the" from article title
mempolicy: migration attempt to match interleave nodes
mempolicy: mmap_lock is not needed while migrating folios
mempolicy: alloc_pages_mpol() for NUMA policy without vma
mm: add page_rmappable_folio() wrapper
mempolicy: remove confusing MPOL_MF_LAZY dead code
mempolicy: mpol_shared_policy_init() without pseudo-vma
mempolicy trivia: use pgoff_t in shared mempolicy tree
mempolicy trivia: slightly more consistent naming
mempolicy trivia: delete those ancient pr_debug()s
mempolicy: fix migrate_pages(2) syscall return nr_failed
kernfs: drop shared NUMA mempolicy hooks
hugetlbfs: drop shared NUMA mempolicy pretence
mm/damon/sysfs-test: add a unit test for damon_sysfs_set_targets()
...
|
|
|
|
6ed92e559a |
SCSI misc on 20231102
Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc, scsi_debug) plus the usual assorted minor fixes and updates. The major change this time around is a prep patch for rethreading of the driver reset handler API not to take a scsi_cmd structure which starts to reduce various drivers' dependence on scsi_cmd in error handling. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZUORLiYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishQ4WAQDDIhzp /PiJBBtt0U9ii/lYqRLrOVnN0extKEgEGO+FbwEAssKgs+5Jn/7XCgdpSrx8Co3/ 0cPXrZGxs7tFpFWLZjM= =AlRU -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "Updates to the usual drivers (ufs, megaraid_sas, lpfc, target, ibmvfc, scsi_debug) plus the usual assorted minor fixes and updates. The major change this time around is a prep patch for rethreading of the driver reset handler API not to take a scsi_cmd structure which starts to reduce various drivers' dependence on scsi_cmd in error handling" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (132 commits) scsi: ufs: core: Leave space for '\0' in utf8 desc string scsi: ufs: core: Conversion to bool not necessary scsi: ufs: core: Fix race between force complete and ISR scsi: megaraid: Fix up debug message in megaraid_abort_and_reset() scsi: aic79xx: Fix up NULL command in ahd_done() scsi: message: fusion: Initialize return value in mptfc_bus_reset() scsi: mpt3sas: Fix loop logic scsi: snic: Remove useless code in snic_dr_clean_pending_req() scsi: core: Add comment to target_destroy in scsi_host_template scsi: core: Clean up scsi_dev_queue_ready() scsi: pmcraid: Add missing scsi_device_put() in pmcraid_eh_target_reset_handler() scsi: target: core: Fix kernel-doc comment scsi: pmcraid: Fix kernel-doc comment scsi: core: Handle depopulation and restoration in progress scsi: ufs: core: Add support for parsing OPP scsi: ufs: core: Add OPP support for scaling clocks and regulators scsi: ufs: dt-bindings: common: Add OPP table scsi: scsi_debug: Add param to control sdev's allow_restart scsi: scsi_debug: Add debugfs interface to fail target reset scsi: scsi_debug: Add new error injection type: Reset LUN failed ... |
|
|
|
426ee5196d |
sysctl-6.7-rc1
To help make the move of sysctls out of kernel/sysctl.c not incur a size
penalty sysctl has been changed to allow us to not require the sentinel, the
final empty element on the sysctl array. Joel Granados has been doing all this
work. On the v6.6 kernel we got the major infrastructure changes required to
support this. For v6.7-rc1 we have all arch/ and drivers/ modified to remove
the sentinel. Both arch and driver changes have been on linux-next for a bit
less than a month. It is worth re-iterating the value:
- this helps reduce the overall build time size of the kernel and run time
memory consumed by the kernel by about ~64 bytes per array
- the extra 64-byte penalty is no longer inncurred now when we move sysctls
out from kernel/sysctl.c to their own files
For v6.8-rc1 expect removal of all the sentinels and also then the unneeded
check for procname == NULL.
The last 2 patches are fixes recently merged by Krister Johansen which allow
us again to use softlockup_panic early on boot. This used to work but the
alias work broke it. This is useful for folks who want to detect softlockups
super early rather than wait and spend money on cloud solutions with nothing
but an eventual hung kernel. Although this hadn't gone through linux-next it's
also a stable fix, so we might as well roll through the fixes now.
-----BEGIN PGP SIGNATURE-----
iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmVCqKsSHG1jZ3JvZkBr
ZXJuZWwub3JnAAoJEM4jHQowkoinEgYQAIpkqRL85DBwems19Uk9A27lkctwZ6Fc
HdslQCObQTsbuKVimZFP4IL2beUfUE0cfLZCXlzp+4nRDOf6vyhyf3w19jPQtI0Q
YdqwTk9y6G5VjDsb35QK0+UBloY/kZ1H3/LW4uCwjXTuksUGmWW2Qvey35696Scv
hDMLADqKQmdpYxLUaNi9QyYbEAjYtOai2ezg3+i7hTG168t1k/Ab2BxIFrPVsCR2
FAiq05L4ugWjNskdsWBjck05JZsx9SK/qcAxpIPoUm4nGiFNHApXE0E0hs3vsnmn
WIHIbxCQw8ZlUDlmw4S+0YH3NFFzFbWfmW8k2b0f2qZTJm/rU4KiJfcJVknkAUVF
raFox6XDW0AUQ9L/NOUJ9ip5rup57GcFrMYocdJ3PPAvvmHKOb1D1O741p75RRcc
9j7zwfIRrzjPUqzhsQS/GFjdJu3lJNmEBK1AcgrVry6WoItrAzJHKPPDC7TwaNmD
eXpjxMl1sYzzHqtVh4hn+xkUYphj/6gTGMV8zdo+/FopFswgeJW9G8kHtlEWKDPk
MRIKwACmfetP6f3ngHunBg+BOipbjCANL7JI0nOhVOQoaULxCCPx+IPJ6GfSyiuH
AbcjH8DGI7fJbUkBFoF0dsRFZ2gH8ds1PYMbWUJ6x3FtuCuv5iIuvQYoaWU6itm7
6f0KvCogg0fU
=Qf50
-----END PGP SIGNATURE-----
Merge tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull sysctl updates from Luis Chamberlain:
"To help make the move of sysctls out of kernel/sysctl.c not incur a
size penalty sysctl has been changed to allow us to not require the
sentinel, the final empty element on the sysctl array. Joel Granados
has been doing all this work. On the v6.6 kernel we got the major
infrastructure changes required to support this. For v6.7-rc1 we have
all arch/ and drivers/ modified to remove the sentinel. Both arch and
driver changes have been on linux-next for a bit less than a month. It
is worth re-iterating the value:
- this helps reduce the overall build time size of the kernel and run
time memory consumed by the kernel by about ~64 bytes per array
- the extra 64-byte penalty is no longer inncurred now when we move
sysctls out from kernel/sysctl.c to their own files
For v6.8-rc1 expect removal of all the sentinels and also then the
unneeded check for procname == NULL.
The last two patches are fixes recently merged by Krister Johansen
which allow us again to use softlockup_panic early on boot. This used
to work but the alias work broke it. This is useful for folks who want
to detect softlockups super early rather than wait and spend money on
cloud solutions with nothing but an eventual hung kernel. Although
this hadn't gone through linux-next it's also a stable fix, so we
might as well roll through the fixes now"
* tag 'sysctl-6.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (23 commits)
watchdog: move softlockup_panic back to early_param
proc: sysctl: prevent aliased sysctls from getting passed to init
intel drm: Remove now superfluous sentinel element from ctl_table array
Drivers: hv: Remove now superfluous sentinel element from ctl_table array
raid: Remove now superfluous sentinel element from ctl_table array
fw loader: Remove the now superfluous sentinel element from ctl_table array
sgi-xp: Remove the now superfluous sentinel element from ctl_table array
vrf: Remove the now superfluous sentinel element from ctl_table array
char-misc: Remove the now superfluous sentinel element from ctl_table array
infiniband: Remove the now superfluous sentinel element from ctl_table array
macintosh: Remove the now superfluous sentinel element from ctl_table array
parport: Remove the now superfluous sentinel element from ctl_table array
scsi: Remove now superfluous sentinel element from ctl_table array
tty: Remove now superfluous sentinel element from ctl_table array
xen: Remove now superfluous sentinel element from ctl_table array
hpet: Remove now superfluous sentinel element from ctl_table array
c-sky: Remove now superfluous sentinel element from ctl_talbe array
powerpc: Remove now superfluous sentinel element from ctl_table arrays
riscv: Remove now superfluous sentinel element from ctl_table array
x86/vdso: Remove now superfluous sentinel element from ctl_table array
...
|
|
|
|
ca995ce438 |
xen: branch for v6.7-rc1
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZT0uPgAKCRCAXGG7T9hj vmIPAQDiCkJzskB9tgvic6/l7HnOUxcKcUmhHSpXbZ21sx9S6QEA06HEXHgs14Ax +t51DVNMk4bBm5O8I7z650JpoTbN5ww= =Pt1M -----END PGP SIGNATURE----- Merge tag 'for-linus-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - two small cleanup patches - a fix for PCI passthrough under Xen - a four patch series speeding up virtio under Xen with user space backends * tag 'for-linus-6.7-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled xen: privcmd: Add support for ioeventfd xen: evtchn: Allow shared registration of IRQ handers xen: irqfd: Use _IOW instead of the internal _IOC() macro xen: Make struct privcmd_irqfd's layout architecture independent xen/xenbus: Add __counted_by for struct read_buffer and use struct_size() xenbus: fix error exit in xenbus_init() |
|
|
|
3cf3fabccb |
Locking changes in this cycle are:
- Futex improvements:
- Add the 'futex2' syscall ABI, which is an attempt to get away from the
multiplex syscall and adds a little room for extentions, while lifting
some limitations.
- Fix futex PI recursive rt_mutex waiter state bug
- Fix inter-process shared futexes on no-MMU systems
- Use folios instead of pages
- Micro-optimizations of locking primitives:
- Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
architectures, to improve lockref code generation.
- Improve the x86-32 lockref_get_not_zero() main loop by adding
build-time CMPXCHG8B support detection for the relevant lockref code,
and by better interfacing the CMPXCHG8B assembly code with the compiler.
- Introduce arch_sync_try_cmpxchg() on x86 to improve sync_try_cmpxchg()
code generation. Convert some sync_cmpxchg() users to sync_try_cmpxchg().
- Micro-optimize rcuref_put_slowpath()
- Locking debuggability improvements:
- Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
- Enforce atomicity of sched_submit_work(), which is de-facto atomic but
was un-enforced previously.
- Extend <linux/cleanup.h>'s no_free_ptr() with __must_check semantics
- Fix ww_mutex self-tests
- Clean up const-propagation in <linux/seqlock.h> and simplify
the API-instantiation macros a bit.
- RT locking improvements:
- Provide the rt_mutex_*_schedule() primitives/helpers and use them
in the rtmutex code to avoid recursion vs. rtlock on the PI state.
- Add nested blocking lockdep asserts to rt_mutex_lock(), rtlock_lock()
and rwbase_read_lock().
- Plus misc fixes & cleanups
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmU877IRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1g9jw/+N7rxQ78dmFCYh4UWnLCYvuKP0/ivHErG
493JcB8MupuA2tfJHIkDdr4aM2mNq2E61w69/WlZAQWWD6pdOhwgF5Xf5eoEcJm0
vsAhWBGLxihXdtevPuMAx0dEpg3AMp2wc6i5PkN831KdPUgCNsrKq9Bfnfef7/G8
MQTSHjmtba6jxleyxfEa4tE2xe5PJX825nRfkX2e1cf+stkYua+uJFxVxUfxFWGE
4pBy70D9OC7MsJ44WWOA1gwkVtMMiBTmRPNjlP8Gz2GQ0f3ERHRwYk3jDHOPHZI6
0GNt7pE3IMXQn2UuDtfkvv9IFTd+U5qD+APnWIn2ntWXqzGLFqOlmovMrobVn7El
olYDCyweWPG71m1Qblsb1VK2QjRPQVJ9NAEg8RlDHIu2ThxHbMysDVGPVOYnPFq4
S8QFpmldzbNoPU4rDJyT1fAmoUIrusBHkl+Us3yGfC74iM+fHnDEvaSoMZbzEdY1
x/Nocj9XgKEgfXdYzrCWFmZ9xXqHkO25/wDL6yKqBdQtvaEalXuHTT6mQcYxrUPm
Xx1BPan2Jg7p4u2oOFcVtKewUtRH9KBx8qytr5S+JK4PJbrBsixMnr84HLd/3X2V
ykYkO+367T5MTYv4TnJDE5vdurzUqekKSCFPY3skPujPJfdLj1vsPzYf9iMkCLdo
hU2f/R+Wpdk=
=36Ff
-----END PGP SIGNATURE-----
Merge tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Info Molnar:
"Futex improvements:
- Add the 'futex2' syscall ABI, which is an attempt to get away from
the multiplex syscall and adds a little room for extentions, while
lifting some limitations.
- Fix futex PI recursive rt_mutex waiter state bug
- Fix inter-process shared futexes on no-MMU systems
- Use folios instead of pages
Micro-optimizations of locking primitives:
- Improve arch_spin_value_unlocked() on asm-generic ticket spinlock
architectures, to improve lockref code generation
- Improve the x86-32 lockref_get_not_zero() main loop by adding
build-time CMPXCHG8B support detection for the relevant lockref
code, and by better interfacing the CMPXCHG8B assembly code with
the compiler
- Introduce arch_sync_try_cmpxchg() on x86 to improve
sync_try_cmpxchg() code generation. Convert some sync_cmpxchg()
users to sync_try_cmpxchg().
- Micro-optimize rcuref_put_slowpath()
Locking debuggability improvements:
- Improve CONFIG_DEBUG_RT_MUTEXES=y to have a fast-path as well
- Enforce atomicity of sched_submit_work(), which is de-facto atomic
but was un-enforced previously.
- Extend <linux/cleanup.h>'s no_free_ptr() with __must_check
semantics
- Fix ww_mutex self-tests
- Clean up const-propagation in <linux/seqlock.h> and simplify the
API-instantiation macros a bit
RT locking improvements:
- Provide the rt_mutex_*_schedule() primitives/helpers and use them
in the rtmutex code to avoid recursion vs. rtlock on the PI state.
- Add nested blocking lockdep asserts to rt_mutex_lock(),
rtlock_lock() and rwbase_read_lock()
.. plus misc fixes & cleanups"
* tag 'locking-core-2023-10-28' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
futex: Don't include process MM in futex key on no-MMU
locking/seqlock: Fix grammar in comment
alpha: Fix up new futex syscall numbers
locking/seqlock: Propagate 'const' pointers within read-only methods, remove forced type casts
locking/lockdep: Fix string sizing bug that triggers a format-truncation compiler-warning
locking/seqlock: Change __seqprop() to return the function pointer
locking/seqlock: Simplify SEQCOUNT_LOCKNAME()
locking/atomics: Use atomic_try_cmpxchg_release() to micro-optimize rcuref_put_slowpath()
locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
locking/atomic/x86: Introduce arch_sync_try_cmpxchg()
locking/atomic: Add generic support for sync_try_cmpxchg() and its fallback
locking/seqlock: Fix typo in comment
futex/requeue: Remove unnecessary ‘NULL’ initialization from futex_proxy_trylock_atomic()
locking/local, arch: Rewrite local_add_unless() as a static inline function
locking/debug: Fix debugfs API return value checks to use IS_ERR()
locking/ww_mutex/test: Make sure we bail out instead of livelock
locking/ww_mutex/test: Fix potential workqueue corruption
locking/ww_mutex/test: Use prng instead of rng to avoid hangs at bootup
futex: Add sys_futex_requeue()
futex: Add flags2 argument to futex_requeue()
...
|
|
|
|
2c269f42d0 |
xen-pciback: Consider INTx disabled when MSI/MSI-X is enabled
Linux enables MSI-X before disabling INTx, but keeps MSI-X masked until
the table is filled. Then it disables INTx just before clearing MASKALL
bit. Currently this approach is rejected by xen-pciback.
According to the PCIe spec, device cannot use INTx when MSI/MSI-X is
enabled (in other words: enabling MSI/MSI-X implicitly disables INTx).
Change the logic to consider INTx disabled if MSI/MSI-X is enabled. This
applies to three places:
- checking currently enabled interrupts type,
- transition to MSI/MSI-X - where INTx would be implicitly disabled,
- clearing INTx disable bit - which can be allowed even if MSI/MSI-X is
enabled, as device should consider INTx disabled anyway in that case
Fixes:
|
|
|
|
f0d7db7b33 |
xen: privcmd: Add support for ioeventfd
Virtio guests send VIRTIO_MMIO_QUEUE_NOTIFY notification when they need
to notify the backend of an update to the status of the virtqueue. The
backend or another entity, polls the MMIO address for updates to know
when the notification is sent.
It works well if the backend does this polling by itself. But as we move
towards generic backend implementations, we end up implementing this in
a separate user-space program.
Generally, the Virtio backends are implemented to work with the Eventfd
based mechanism. In order to make such backends work with Xen, another
software layer needs to do the polling and send an event via eventfd to
the backend once the notification from guest is received. This results
in an extra context switch.
This is not a new problem in Linux though. It is present with other
hypervisors like KVM, etc. as well. The generic solution implemented in
the kernel for them is to provide an IOCTL call to pass the address to
poll and eventfd, which lets the kernel take care of polling and raise
an event on the eventfd, instead of handling this in user space (which
involves an extra context switch).
This patch adds similar support for xen.
Inspired by existing implementations for KVM, etc..
This also copies ioreq.h header file (only struct ioreq and related
macros) from Xen's source tree (Top commit 5d84f07fe6bf ("xen/pci: drop
remaining uses of bool_t")).
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/b20d83efba6453037d0c099912813c79c81f7714.1697439990.git.viresh.kumar@linaro.org
Signed-off-by: Juergen Gross <jgross@suse.com>
|
|
|
|
9e90e58c11 |
xen: evtchn: Allow shared registration of IRQ handers
Currently the handling of events is supported either in the kernel or userspace, but not both. In order to support fast delivery of interrupts from the guest to the backend, we need to handle the Queue notify part of Virtio protocol in kernel and the rest in userspace. Update the interrupt handler registration flag to IRQF_SHARED for event channels, which would allow multiple entities to bind their interrupt handler for the same event channel port. Also increment the reference count of irq_info when multiple entities try to bind event channel to irqchip, so the unbinding happens only after all the users are gone. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/99b1edfd3147c6b5d22a5139dab5861e767dc34a.1697439990.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
8dd765a5d7 |
xen: Make struct privcmd_irqfd's layout architecture independent
Using indirect pointers in an ioctl command argument means that the
layout is architecture specific, in particular we can't use the same one
from 32-bit compat tasks. The general recommendation is to have __u64
members and use u64_to_user_ptr() to access it from the kernel if we are
unable to avoid the pointers altogether.
Fixes:
|
|
|
|
d3a2b6b48f |
xen/xenbus: Add __counted_by for struct read_buffer and use struct_size()
Prepare for the coming implementation by GCC and Clang of the __counted_by attribute. Flexible array members annotated with __counted_by can have their accesses bounds-checked at run-time via CONFIG_UBSAN_BOUNDS (for array indexing) and CONFIG_FORTIFY_SOURCE (for strcpy/memcpy-family functions). While there, use struct_size() helper, instead of the open-coded version, to calculate the size for the allocation of the whole flexible structure, including of course, the flexible-array member. This code was found with the help of Coccinelle, and audited and fixed manually. Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Jason Andryuk <jandryuk@gmail.com> Link: https://lore.kernel.org/r/ZSRMosLuJJS5Y/io@work Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
44961b81a9 |
xenbus: fix error exit in xenbus_init()
In case an error occurs in xenbus_init(), xen_store_domain_type should
be set to XS_UNKNOWN.
Fix one instance where this action is missing.
Fixes:
|
|
|
|
194605d45d |
scsi: target: Have drivers report if they support direct submissions
In some cases, like with multiple LUN targets or where the target has to respond to transport level requests from the receiving context it can be better to defer cmd submission to a helper thread. If the backend driver blocks on something like request/tag allocation it can block the entire target submission path and other LUs and transport IO on that session. In other cases like single LUN targets with storage that can support all the commands that the target can queue, then it's best to submit the cmd to the backend from the target's cmd receiving context. Subsequent commits will allow the user to config what they prefer, but drivers like loop can't directly submit because they can be called from a context that can't sleep. And, drivers like vhost-scsi can support direct submission, but need to keep their default behavior of deferring execution to avoid possible regressions where the backend can block. Make the drivers tell LIO core if they support direct submissions and their current default, so we can prevent users from misconfiguring the system and initialize devices correctly. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20230928020907.5730-2-michael.christie@oracle.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> |
|
|
|
a5b2aeeac1 |
xen: Remove now superfluous sentinel element from ctl_table array
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) Remove sentinel from balloon_table Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> |
|
|
|
ad0a2e4c2f |
locking/atomic, xen: Use sync_try_cmpxchg() instead of sync_cmpxchg()
Use sync_try_cmpxchg() instead of sync_cmpxchg(*ptr, old, new) == old
in clear_masked_cond(), clear_linked() and
gnttab_end_foreign_access_ref_v1(). x86 CMPXCHG instruction returns
success in ZF flag, so this change saves a compare after cmpxchg
(and related move instruction in front of cmpxchg), improving the
cmpxchg loop in gnttab_end_foreign_access_ref_v1() from:
174: eb 0e jmp 184 <...>
176: 89 d0 mov %edx,%eax
178: f0 66 0f b1 31 lock cmpxchg %si,(%rcx)
17d: 66 39 c2 cmp %ax,%dx
180: 74 11 je 193 <...>
182: 89 c2 mov %eax,%edx
184: 89 d6 mov %edx,%esi
186: 66 83 e6 18 and $0x18,%si
18a: 74 ea je 176 <...>
to:
614: 89 c1 mov %eax,%ecx
616: 66 83 e1 18 and $0x18,%cx
61a: 75 11 jne 62d <...>
61c: f0 66 0f b1 0a lock cmpxchg %cx,(%rdx)
621: 75 f1 jne 614 <...>
No functional change intended.
Signed-off-by: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Juergen Gross <jgross@suse.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
|
|
|
|
87797fad6c |
xen/events: replace evtchn_rwlock with RCU
In unprivileged Xen guests event handling can cause a deadlock with
Xen console handling. The evtchn_rwlock and the hvc_lock are taken in
opposite sequence in __hvc_poll() and in Xen console IRQ handling.
Normally this is no problem, as the evtchn_rwlock is taken as a reader
in both paths, but as soon as an event channel is being closed, the
lock will be taken as a writer, which will cause read_lock() to block:
CPU0 CPU1 CPU2
(IRQ handling) (__hvc_poll()) (closing event channel)
read_lock(evtchn_rwlock)
spin_lock(hvc_lock)
write_lock(evtchn_rwlock)
[blocks]
spin_lock(hvc_lock)
[blocks]
read_lock(evtchn_rwlock)
[blocks due to writer waiting,
and not in_interrupt()]
This issue can be avoided by replacing evtchn_rwlock with RCU in
xen_free_irq(). Note that RCU is used only to delay freeing of the
irq_info memory. There is no RCU based dereferencing or replacement of
pointers involved.
In order to avoid potential races between removing the irq_info
reference and handling of interrupts, set the irq_info pointer to NULL
only when freeing its memory. The IRQ itself must be freed at that
time, too, as otherwise the same IRQ number could be allocated again
before handling of the old instance would have been finished.
This is XSA-441 / CVE-2023-34324.
Fixes:
|
|
|
|
1ec016bfa1 |
xenbus/backend: dynamically allocate the xen-backend shrinker
Use new APIs to dynamically allocate the xen-backend shrinker. Link: https://lkml.kernel.org/r/20230911094444.68966-6-zhengqi.arch@bytedance.com Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Juergen Gross <jgross@suse.com> Cc: Stefano Stabellini <sstabellini@kernel.org> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Abhinav Kumar <quic_abhinavk@quicinc.com> Cc: Alasdair Kergon <agk@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Carlos Llamas <cmllamas@google.com> Cc: Chandan Babu R <chandan.babu@oracle.com> Cc: Chao Yu <chao@kernel.org> Cc: Chris Mason <clm@fb.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christian Koenig <christian.koenig@amd.com> Cc: Chuck Lever <cel@kernel.org> Cc: Coly Li <colyli@suse.de> Cc: Dai Ngo <Dai.Ngo@oracle.com> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: "Darrick J. Wong" <djwong@kernel.org> Cc: Dave Chinner <david@fromorbit.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: David Airlie <airlied@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: David Sterba <dsterba@suse.com> Cc: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Cc: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huang Rui <ray.huang@amd.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Jan Kara <jack@suse.cz> Cc: Jason Wang <jasowang@redhat.com> Cc: Jeff Layton <jlayton@kernel.org> Cc: Jeffle Xu <jefflexu@linux.alibaba.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Josef Bacik <josef@toxicpanda.com> Cc: Kent Overstreet <kent.overstreet@gmail.com> Cc: Kirill Tkhai <tkhai@ya.ru> Cc: Marijn Suijten <marijn.suijten@somainline.org> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Nadav Amit <namit@vmware.com> Cc: Neil Brown <neilb@suse.de> Cc: Olga Kornievskaia <kolga@netapp.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Richard Weinberger <richard@nod.at> Cc: Rob Clark <robdclark@gmail.com> Cc: Rob Herring <robh@kernel.org> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Sean Paul <sean@poorly.run> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Song Liu <song@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com> Cc: Tom Talpey <tom@talpey.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Yue Hu <huyue2@coolpad.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
37510dd566 |
xen: simplify evtchn_do_upcall() call maze
There are several functions involved for performing the functionality of evtchn_do_upcall(): - __xen_evtchn_do_upcall() doing the real work - xen_hvm_evtchn_do_upcall() just being a wrapper for __xen_evtchn_do_upcall(), exposed for external callers - xen_evtchn_do_upcall() calling __xen_evtchn_do_upcall(), too, but without any user Simplify this maze by: - removing the unused xen_evtchn_do_upcall() - removing xen_hvm_evtchn_do_upcall() as the only left caller of __xen_evtchn_do_upcall(), while renaming __xen_evtchn_do_upcall() to xen_evtchn_do_upcall() Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
6c1b980a7e |
dma-maping updates for Linux 6.6
- allow dynamic sizing of the swiotlb buffer, to cater for secure
virtualization workloads that require all I/O to be bounce buffered
(Petr Tesarik)
- move a declaration to a header (Arnd Bergmann)
- check for memory region overlap in dma-contiguous (Binglei Wang)
- remove the somewhat dangerous runtime swiotlb-xen enablement and
unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)
- per-node CMA improvements (Yajun Deng)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCgApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAmTuDHkLHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYOqvhAApMk2/ceTgVH17sXaKE822+xKvgv377O6TlggMeGG
W4zA0KD69DNz0AfaaCc5U5f7n8Ld/YY1RsvkHW4b3jgw+KRTeQr0jjitBgP5kP2M
A1+qxdyJpCTwiPt9s2+JFVPeyZ0s52V6OJODKRG3s0ore55R+U09VySKtASON+q3
GMKfWqQteKC+thg7NkrQ7JUixuo84oICws+rZn4K9ifsX2O0HYW6aMW0feRfZjJH
r0TgqZc4RdPTSaF22oapR9Ls39+7hp/pBvoLm5sBNA3cl5C3X4VWo9ERMU1jW9h+
VYQv39NycUspgskWJmpbU06/+ooYqQlwHSR/vdNusmFIvxo4tf6/UX72YO5F8Dar
ap0wYGauiEwTjSnhVxPTXk3obWyWEsgFAeRnPdTlH2CNmv38QZU2HLb8eU1pcXxX
j+WI2Ewy9z22uBVYiPOKpdW1jkSfmlmfPp/8SbAdua7I3YQ90rQN6AvU06zAi/cL
NQTgO81E4jPkygqAVgS/LeYziWAQ73yM7m9ExThtTgqFtHortwhJ4Fd8XKtvtvEb
viXAZ/WZtQBv/CIKAW98NhgIDP/SPOT8ym6V35WK+kkNFMS6LMSQUfl9GgbHGyFa
n9icMm7BmbDtT1+AKNafG9En4DtAf9M9QNidAVOyfrsIk6S0gZoZwvIStkA7on8a
cNY=
=kVVr
-----END PGP SIGNATURE-----
Merge tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping
Pull dma-maping updates from Christoph Hellwig:
- allow dynamic sizing of the swiotlb buffer, to cater for secure
virtualization workloads that require all I/O to be bounce buffered
(Petr Tesarik)
- move a declaration to a header (Arnd Bergmann)
- check for memory region overlap in dma-contiguous (Binglei Wang)
- remove the somewhat dangerous runtime swiotlb-xen enablement and
unexport is_swiotlb_active (Christoph Hellwig, Juergen Gross)
- per-node CMA improvements (Yajun Deng)
* tag 'dma-mapping-6.6-2023-08-29' of git://git.infradead.org/users/hch/dma-mapping:
swiotlb: optimize get_max_slots()
swiotlb: move slot allocation explanation comment where it belongs
swiotlb: search the software IO TLB only if the device makes use of it
swiotlb: allocate a new memory pool when existing pools are full
swiotlb: determine potential physical address limit
swiotlb: if swiotlb is full, fall back to a transient memory pool
swiotlb: add a flag whether SWIOTLB is allowed to grow
swiotlb: separate memory pool data from other allocator data
swiotlb: add documentation and rename swiotlb_do_find_slots()
swiotlb: make io_tlb_default_mem local to swiotlb.c
swiotlb: bail out of swiotlb_init_late() if swiotlb is already allocated
dma-contiguous: check for memory region overlap
dma-contiguous: support numa CMA for specified node
dma-contiguous: support per-numa CMA for all architectures
dma-mapping: move arch_dma_set_mask() declaration to header
swiotlb: unexport is_swiotlb_active
x86: always initialize xen-swiotlb when xen-pcifront is enabling
xen/pci: add flag for PCI passthrough being possible
|
|
|
|
f8941e6c4c |
xen: privcmd: Add support for irqfd
Xen provides support for injecting interrupts to the guests via the HYPERVISOR_dm_op() hypercall. The same is used by the Virtio based device backend implementations, in an inefficient manner currently. Generally, the Virtio backends are implemented to work with the Eventfd based mechanism. In order to make such backends work with Xen, another software layer needs to poll the Eventfds and raise an interrupt to the guest using the Xen based mechanism. This results in an extra context switch. This is not a new problem in Linux though. It is present with other hypervisors like KVM, etc. as well. The generic solution implemented in the kernel for them is to provide an IOCTL call to pass the interrupt details and eventfd, which lets the kernel take care of polling the eventfd and raising of the interrupt, instead of handling this in user space (which involves an extra context switch). This patch adds support to inject a specific interrupt to guest using the eventfd mechanism, by preventing the extra context switch. Inspired by existing implementations for KVM, etc.. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/8e724ac1f50c2bc1eb8da9b3ff6166f1372570aa.1692697321.git.viresh.kumar@linaro.org Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
442466e04f |
xen/xenbus: Avoid a lockdep warning when adding a watch
The following lockdep warning appears during boot on a Xen dom0 system: [ 96.388794] ====================================================== [ 96.388797] WARNING: possible circular locking dependency detected [ 96.388799] 6.4.0-rc5-default+ #8 Tainted: G EL [ 96.388803] ------------------------------------------------------ [ 96.388804] xenconsoled/1330 is trying to acquire lock: [ 96.388808] ffffffff82acdd10 (xs_watch_rwsem){++++}-{3:3}, at: register_xenbus_watch+0x45/0x140 [ 96.388847] but task is already holding lock: [ 96.388849] ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600 [ 96.388862] which lock already depends on the new lock. [ 96.388864] the existing dependency chain (in reverse order) is: [ 96.388866] -> #2 (&u->msgbuffer_mutex){+.+.}-{3:3}: [ 96.388874] __mutex_lock+0x85/0xb30 [ 96.388885] xenbus_dev_queue_reply+0x48/0x2b0 [ 96.388890] xenbus_thread+0x1d7/0x950 [ 96.388897] kthread+0xe7/0x120 [ 96.388905] ret_from_fork+0x2c/0x50 [ 96.388914] -> #1 (xs_response_mutex){+.+.}-{3:3}: [ 96.388923] __mutex_lock+0x85/0xb30 [ 96.388930] xenbus_backend_ioctl+0x56/0x1c0 [ 96.388935] __x64_sys_ioctl+0x90/0xd0 [ 96.388942] do_syscall_64+0x5c/0x90 [ 96.388950] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 96.388957] -> #0 (xs_watch_rwsem){++++}-{3:3}: [ 96.388965] __lock_acquire+0x1538/0x2260 [ 96.388972] lock_acquire+0xc6/0x2b0 [ 96.388976] down_read+0x2d/0x160 [ 96.388983] register_xenbus_watch+0x45/0x140 [ 96.388990] xenbus_file_write+0x53d/0x600 [ 96.388994] vfs_write+0xe4/0x490 [ 96.389003] ksys_write+0xb8/0xf0 [ 96.389011] do_syscall_64+0x5c/0x90 [ 96.389017] entry_SYSCALL_64_after_hwframe+0x72/0xdc [ 96.389023] other info that might help us debug this: [ 96.389025] Chain exists of: xs_watch_rwsem --> xs_response_mutex --> &u->msgbuffer_mutex [ 96.413429] Possible unsafe locking scenario: [ 96.413430] CPU0 CPU1 [ 96.413430] ---- ---- [ 96.413431] lock(&u->msgbuffer_mutex); [ 96.413432] lock(xs_response_mutex); [ 96.413433] lock(&u->msgbuffer_mutex); [ 96.413434] rlock(xs_watch_rwsem); [ 96.413436] *** DEADLOCK *** [ 96.413436] 1 lock held by xenconsoled/1330: [ 96.413438] #0: ffff888100c92068 (&u->msgbuffer_mutex){+.+.}-{3:3}, at: xenbus_file_write+0x2c/0x600 [ 96.413446] An ioctl call IOCTL_XENBUS_BACKEND_SETUP (record #1 in the report) results in calling xenbus_alloc() -> xs_suspend() which introduces ordering xs_watch_rwsem --> xs_response_mutex. The xenbus_thread() operation (record #2) creates xs_response_mutex --> &u->msgbuffer_mutex. An XS_WATCH write to the xenbus file then results in a complain about the opposite lock order &u->msgbuffer_mutex --> xs_watch_rwsem. The dependency xs_watch_rwsem --> xs_response_mutex is spurious. Avoid it and the warning by changing the ordering in xs_suspend(), first acquire xs_response_mutex and then xs_watch_rwsem. Reverse also the unlocking order in xs_suspend_cancel() for consistency, but keep xs_resume() as is because it needs to have xs_watch_rwsem unlocked only after exiting xs suspend and re-adding all watches. Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230607123624.15739-1-petr.pavlu@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
187b4c0d34 |
xen: Fix one kernel-doc comment
Use colon to separate parameter name from their specific meaning. silence the warning: drivers/xen/grant-table.c:1051: warning: Function parameter or member 'nr_pages' not described in 'gnttab_free_pages' Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6030 Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230731030037.123946-1-yang.lee@linux.alibaba.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
035a69586f |
xen: xenbus: Use helper function IS_ERR_OR_NULL()
Use IS_ERR_OR_NULL() to detect an error pointer or a null pointer open-coding to simplify the code. Signed-off-by: Li Zetao <lizetao1@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230817014736.3094289-1-lizetao1@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
71281ec9c8 |
xen: Switch to use kmemdup() helper
Use kmemdup() helper instead of open-coding to simplify the code. Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Chen Jiahao <chenjiahao16@huawei.com> Link: https://lore.kernel.org/r/20230815092434.1206386-1-ruanjinjie@huawei.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
3e0d473dcb |
xen-pciback: Remove unused function declarations
Commit
|
|
|
|
05ee774122 |
swiotlb: make io_tlb_default_mem local to swiotlb.c
SWIOTLB implementation details should not be exposed to the rest of the kernel. This will allow to make changes to the implementation without modifying non-swiotlb code. To avoid breaking existing users, provide helper functions for the few required fields. As a bonus, using a helper function to initialize struct device allows to get rid of an #ifdef in driver core. Signed-off-by: Petr Tesarik <petr.tesarik.ext@huawei.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> |
|
|
|
c04e989484 |
xen: speed up grant-table reclaim
When a grant entry is still in use by the remote domain, Linux must put it on a deferred list. Normally, this list is very short, because the PV network and block protocols expect the backend to unmap the grant first. However, Qubes OS's GUI protocol is subject to the constraints of the X Window System, and as such winds up with the frontend unmapping the window first. As a result, the list can grow very large, resulting in a massive memory leak and eventual VM freeze. To partially solve this problem, make the number of entries that the VM will attempt to free at each iteration tunable. The default is still 10, but it can be overridden via a module parameter. This is Cc: stable because (when combined with appropriate userspace changes) it fixes a severe performance and stability problem for Qubes OS users. Cc: stable@vger.kernel.org Signed-off-by: Demi Marie Obenour <demi@invisiblethingslab.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230726165354.1252-1-demi@invisiblethingslab.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
58f6259b7a |
xen/evtchn: Introduce new IOCTL to bind static evtchn
Xen 4.17 supports the creation of static evtchns. To allow user space application to bind static evtchns introduce new ioctl "IOCTL_EVTCHN_BIND_STATIC". Existing IOCTL doing more than binding that’s why we need to introduce the new IOCTL to only bind the static event channels. Static evtchns to be available for use during the lifetime of the guest. When the application exits, __unbind_from_irq() ends up being called from release() file operations because of that static evtchns are getting closed. To avoid closing the static event channel, add the new bool variable "is_static" in "struct irq_info" to mark the event channel static when creating the event channel to avoid closing the static evtchn. Also, take this opportunity to remove the open-coded version of the evtchn close in drivers/xen/evtchn.c file and use xen_evtchn_close(). Signed-off-by: Rahul Singh <rahul.singh@arm.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Acked-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/ae7329bf1713f83e4aad4f3fa0f316258c40a3e9.1689677042.git.rahul.singh@arm.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
0d8f7cc805 |
xenbus: check xen_domain in xenbus_probe_initcall
The same way we already do in xenbus_init.
Fixes the following warning:
[ 352.175563] Trying to free already-free IRQ 0
[ 352.177355] WARNING: CPU: 1 PID: 88 at kernel/irq/manage.c:1893 free_irq+0xbf/0x350
[...]
[ 352.213951] Call Trace:
[ 352.214390] <TASK>
[ 352.214717] ? __warn+0x81/0x170
[ 352.215436] ? free_irq+0xbf/0x350
[ 352.215906] ? report_bug+0x10b/0x200
[ 352.216408] ? prb_read_valid+0x17/0x20
[ 352.216926] ? handle_bug+0x44/0x80
[ 352.217409] ? exc_invalid_op+0x13/0x60
[ 352.217932] ? asm_exc_invalid_op+0x16/0x20
[ 352.218497] ? free_irq+0xbf/0x350
[ 352.218979] ? __pfx_xenbus_probe_thread+0x10/0x10
[ 352.219600] xenbus_probe+0x7a/0x80
[ 352.221030] xenbus_probe_thread+0x76/0xc0
Fixes:
|
|
|
|
1599932894 |
xen: branch for v6.5-rc2
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZK/pZgAKCRCAXGG7T9hj vmQlAQD/xi8BUlCe0a7l6kf7+nMkOWmvpVIrmdxrqQ1Wj4c9FAEA0FuI+XXz2sow ov+il7z3UnViGsieeSHTW+Gxdn6Blgc= =LzAo -----END PGP SIGNATURE----- Merge tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a cleanup of the Xen related ELF-notes - a fix for virtio handling in Xen dom0 when running Xen in a VM * tag 'for-linus-6.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent x86/Xen: tidy xen-head.S |
|
|
|
21a235bce1 |
xen/virtio: Fix NULL deref when a bridge of PCI root bus has no parent
When attempting to run Xen on a QEMU/KVM virtual machine with virtio
devices (all x86_64), function xen_dt_get_node() crashes on accessing
bus->bridge->parent->of_node because a bridge of the PCI root bus has no
parent set:
[ 1.694192][ T1] BUG: kernel NULL pointer dereference, address: 0000000000000288
[ 1.695688][ T1] #PF: supervisor read access in kernel mode
[ 1.696297][ T1] #PF: error_code(0x0000) - not-present page
[ 1.696297][ T1] PGD 0 P4D 0
[ 1.696297][ T1] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1.696297][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.3.7-1-default #1 openSUSE Tumbleweed a577eae57964bb7e83477b5a5645a1781df990f0
[ 1.696297][ T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.15.0-0-g2dd4b9b-rebuilt.opensuse.org 04/01/2014
[ 1.696297][ T1] RIP: e030:xen_virtio_restricted_mem_acc+0xd9/0x1c0
[ 1.696297][ T1] Code: 45 0c 83 e8 c9 a3 ea ff 31 c0 eb d7 48 8b 87 40 ff ff ff 48 89 c2 48 8b 40 10 48 85 c0 75 f4 48 8b 82 10 01 00 00 48 8b 40 40 <48> 83 b8 88 02 00 00 00 0f 84 45 ff ff ff 66 90 31 c0 eb a5 48 89
[ 1.696297][ T1] RSP: e02b:ffffc90040013cc8 EFLAGS: 00010246
[ 1.696297][ T1] RAX: 0000000000000000 RBX: ffff888006c75000 RCX: 0000000000000029
[ 1.696297][ T1] RDX: ffff888005ed1000 RSI: ffffc900400f100c RDI: ffff888005ee30d0
[ 1.696297][ T1] RBP: ffff888006c75010 R08: 0000000000000001 R09: 0000000330000006
[ 1.696297][ T1] R10: ffff888005850028 R11: 0000000000000002 R12: ffffffff830439a0
[ 1.696297][ T1] R13: 0000000000000000 R14: ffff888005657900 R15: ffff888006e3e1e8
[ 1.696297][ T1] FS: 0000000000000000(0000) GS:ffff88804a000000(0000) knlGS:0000000000000000
[ 1.696297][ T1] CS: e030 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1.696297][ T1] CR2: 0000000000000288 CR3: 0000000002e36000 CR4: 0000000000050660
[ 1.696297][ T1] Call Trace:
[ 1.696297][ T1] <TASK>
[ 1.696297][ T1] virtio_features_ok+0x1b/0xd0
[ 1.696297][ T1] virtio_dev_probe+0x19c/0x270
[ 1.696297][ T1] really_probe+0x19b/0x3e0
[ 1.696297][ T1] __driver_probe_device+0x78/0x160
[ 1.696297][ T1] driver_probe_device+0x1f/0x90
[ 1.696297][ T1] __driver_attach+0xd2/0x1c0
[ 1.696297][ T1] bus_for_each_dev+0x74/0xc0
[ 1.696297][ T1] bus_add_driver+0x116/0x220
[ 1.696297][ T1] driver_register+0x59/0x100
[ 1.696297][ T1] virtio_console_init+0x7f/0x110
[ 1.696297][ T1] do_one_initcall+0x47/0x220
[ 1.696297][ T1] kernel_init_freeable+0x328/0x480
[ 1.696297][ T1] kernel_init+0x1a/0x1c0
[ 1.696297][ T1] ret_from_fork+0x29/0x50
[ 1.696297][ T1] </TASK>
[ 1.696297][ T1] Modules linked in:
[ 1.696297][ T1] CR2: 0000000000000288
[ 1.696297][ T1] ---[ end trace 0000000000000000 ]---
The PCI root bus is in this case created from ACPI description via
acpi_pci_root_add() -> pci_acpi_scan_root() -> acpi_pci_root_create() ->
pci_create_root_bus() where the last function is called with
parent=NULL. It indicates that no parent is present and then
bus->bridge->parent is NULL too.
Fix the problem by checking bus->bridge->parent in xen_dt_get_node() for
NULL first.
Fixes:
|
|
|
|
6e17c6de3d |
- Yosry Ahmed brought back some cgroup v1 stats in OOM logs.
- Yosry has also eliminated cgroup's atomic rstat flushing. - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability. - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning. - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface. - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree. - Johannes Weiner has done some cleanup work on the compaction code. - David Hildenbrand has contributed additional selftests for get_user_pages(). - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code. - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code. - Huang Ying has done some maintenance on the swap code's usage of device refcounting. - Christoph Hellwig has some cleanups for the filemap/directio code. - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses. - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings. - John Hubbard has a series of fixes to the MM selftesting code. - ZhangPeng continues the folio conversion campaign. - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock. - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8. - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management. - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code. - Vishal Moola also has done some folio conversion work. - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZJejewAKCRDdBJ7gKXxA joggAPwKMfT9lvDBEUnJagY7dbDPky1cSYZdJKxxM2cApGa42gEA6Cl8HRAWqSOh J0qXCzqaaN8+BuEyLGDVPaXur9KirwY= =B7yQ -----END PGP SIGNATURE----- Merge tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ... |
|
|
|
72dc6db7e3 |
workqueue: Ordered workqueue creation cleanups
For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This pull request clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior. -----BEGIN PGP SIGNATURE----- iIQEABYIACwWIQTfIjM1kS57o3GsC/uxYfJx3gVYGQUCZJoKnA4cdGpAa2VybmVs Lm9yZwAKCRCxYfJx3gVYGc5SAQDOtjML7Cx9AYzbY5+nYc0wTebRRTXGeOu7A3Xy j50rVgEAjHgvHLIdmeYmVhCeHOSN4q7Wn5AOwaIqZalOhfLyKQk= =hs79 -----END PGP SIGNATURE----- Merge tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull ordered workqueue creation updates from Tejun Heo: "For historical reasons, unbound workqueues with max concurrency limit of 1 are considered ordered, even though the concurrency limit hasn't been system-wide for a long time. This creates ambiguity around whether ordered execution is actually required for correctness, which was actually confusing for e.g. btrfs (btrfs updates are being routed through the btrfs tree). There aren't that many users in the tree which use the combination and there are pending improvements to unbound workqueue affinity handling which will make inadvertent use of ordered workqueue a bigger loss. This clarifies the situation for most of them by updating the ones which require ordered execution to use alloc_ordered_workqueue(). There are some conversions being routed through subsystem-specific trees and likely a few stragglers. Once they're all converted, workqueue can trigger a warning on unbound + @max_active==1 usages and eventually drop the implicit ordered behavior" * tag 'wq-for-6.5-cleanup-ordered' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: rxrpc: Use alloc_ordered_workqueue() to create ordered workqueues net: qrtr: Use alloc_ordered_workqueue() to create ordered workqueues net: wwan: t7xx: Use alloc_ordered_workqueue() to create ordered workqueues dm integrity: Use alloc_ordered_workqueue() to create ordered workqueues media: amphion: Use alloc_ordered_workqueue() to create ordered workqueues scsi: NCR5380: Use default @max_active for hostdata->work_q media: coda: Use alloc_ordered_workqueue() to create ordered workqueues crypto: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues wifi: ath10/11/12k: Use alloc_ordered_workqueue() to create ordered workqueues wifi: mwifiex: Use default @max_active for workqueues wifi: iwlwifi: Use default @max_active for trans_pcie->rba.alloc_wq xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues virt: acrn: Use alloc_ordered_workqueue() to create ordered workqueues net: octeontx2: Use alloc_ordered_workqueue() to create ordered workqueues net: thunderx: Use alloc_ordered_workqueue() to create ordered workqueues greybus: Use alloc_ordered_workqueue() to create ordered workqueues powerpc, workqueue: Use alloc_ordered_workqueue() to create ordered workqueues |
|
|
|
c33c794828 |
mm: ptep_get() conversion
Convert all instances of direct pte_t* dereferencing to instead use ptep_get() helper. This means that by default, the accesses change from a C dereference to a READ_ONCE(). This is technically the correct thing to do since where pgtables are modified by HW (for access/dirty) they are volatile and therefore we should always ensure READ_ONCE() semantics. But more importantly, by always using the helper, it can be overridden by the architecture to fully encapsulate the contents of the pte. Arch code is deliberately not converted, as the arch code knows best. It is intended that arch code (arm64) will override the default with its own implementation that can (e.g.) hide certain bits from the core code, or determine young/dirty status by mixing in state from another source. Conversion was done using Coccinelle: ---- // $ make coccicheck \ // COCCI=ptepget.cocci \ // SPFLAGS="--include-headers" \ // MODE=patch virtual patch @ depends on patch @ pte_t *v; @@ - *v + ptep_get(v) ---- Then reviewed and hand-edited to avoid multiple unnecessary calls to ptep_get(), instead opting to store the result of a single call in a variable, where it is correct to do so. This aims to negate any cost of READ_ONCE() and will benefit arch-overrides that may be more complex. Included is a fix for an issue in an earlier version of this patch that was pointed out by kernel test robot. The issue arose because config MMU=n elides definition of the ptep helper functions, including ptep_get(). HUGETLB_PAGE=n configs still define a simple huge_ptep_clear_flush() for linking purposes, which dereferences the ptep. So when both configs are disabled, this caused a build error because ptep_get() is not defined. Fix by continuing to do a direct dereference when MMU=n. This is safe because for this config the arch code cannot be trying to virtualize the ptes because none of the ptep helpers are defined. Link: https://lkml.kernel.org/r/20230612151545.3317766-4-ryan.roberts@arm.com Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/oe-kbuild-all/202305120142.yXsNEo6H-lkp@intel.com/ Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Potapenko <glider@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alex Williamson <alex.williamson@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Dave Airlie <airlied@gmail.com> Cc: Dimitri Sivanich <dimitri.sivanich@hpe.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jérôme Glisse <jglisse@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Lorenzo Stoakes <lstoakes@gmail.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miaohe Lin <linmiaohe@huawei.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <muchun.song@linux.dev> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: SeongJae Park <sj@kernel.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Yu Zhao <yuzhao@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> |
|
|
|
4e893b5aa4 |
xen: branch for v6.4-rc4
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZHGVRAAKCRCAXGG7T9hj vqtJAQDizKasLE7tSnfs/FrZ/4xPaDLe3bQifMx2C1dtYCjRcAD/ciZSa1L0WzZP dpEZnlYRzsR3bwLktQEMQFOvlbh1SwE= =K860 -----END PGP SIGNATURE----- Merge tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a double free fix in the Xen pvcalls backend driver - a fix for a regression causing the MSI related sysfs entries to not being created in Xen PV guests - a fix in the Xen blkfront driver for handling insane input data better * tag 'for-linus-6.4-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/pci/xen: populate MSI sysfs entries xen/pvcalls-back: fix double frees with pvcalls_new_active_socket() xen/blkfront: Only check REQ_FUA for writes |
|
|
|
8fafac202d |
xen/pvcalls-back: fix double frees with pvcalls_new_active_socket()
In the pvcalls_new_active_socket() function, most error paths call
pvcalls_back_release_active(fedata->dev, fedata, map) which calls
sock_release() on "sock". The bug is that the caller also frees sock.
Fix this by making every error path in pvcalls_new_active_socket()
release the sock, and don't free it in the caller.
Fixes:
|
|
|
|
715557b02c |
xen/pvcalls: Use alloc_ordered_workqueue() to create ordered workqueues
BACKGROUND ========== When multiple work items are queued to a workqueue, their execution order doesn't match the queueing order. They may get executed in any order and simultaneously. When fully serialized execution - one by one in the queueing order - is needed, an ordered workqueue should be used which can be created with alloc_ordered_workqueue(). However, alloc_ordered_workqueue() was a later addition. Before it, an ordered workqueue could be obtained by creating an UNBOUND workqueue with @max_active==1. This originally was an implementation side-effect which was broken by |
|
|
|
35fab9271b |
xen: branch for v6.4-rc1
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZEolJQAKCRCAXGG7T9hj vuVMAP9B3WLzszen3/XCM2E6sZurtmD+YPkUrbES2AsEE1PH3gEA73ZxM1C+gvKS 5be7Dksgeyqyqrwhb9/VOyHU3pmyrAw= =zirQ -----END PGP SIGNATURE----- Merge tag 'for-linus-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen updates from Juergen Gross: - some cleanups in the Xen blkback driver - fix potential sleeps under lock in various Xen drivers * tag 'for-linus-6.4-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/blkback: move blkif_get_x86_*_req() into blkback.c xen/blkback: simplify free_persistent_gnts() interface xen/blkback: remove stale prototype xen/blkback: fix white space code style issues xen/pvcalls: don't call bind_evtchn_to_irqhandler() under lock xen/scsiback: don't call scsiback_free_translation_entry() under lock xen/pciback: don't call pcistub_device_put() under lock |
|
|
|
888d3c9f7f |
sysctl-6.4-rc1
This pull request goes with only a few sysctl moves from the kernel/sysctl.c file, the rest of the work has been put towards deprecating two API calls which incur recursion and prevent us from simplifying the registration process / saving memory per move. Most of the changes have been soaking on linux-next since v6.3-rc3. I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's feedback that we should see if we could *save* memory with these moves instead of incurring more memory. We currently incur more memory since when we move a syctl from kernel/sysclt.c out to its own file we end up having to add a new empty sysctl used to register it. To achieve saving memory we want to allow syctls to be passed without requiring the end element being empty, and just have our registration process rely on ARRAY_SIZE(). Without this, supporting both styles of sysctls would make the sysctl registration pretty brittle, hard to read and maintain as can be seen from Meng Tang's efforts to do just this [0]. Fortunately, in order to use ARRAY_SIZE() for all sysctl registrations also implies doing the work to deprecate two API calls which use recursion in order to support sysctl declarations with subdirectories. And so during this development cycle quite a bit of effort went into this deprecation effort. I've annotated the following two APIs are deprecated and in few kernel releases we should be good to remove them: * register_sysctl_table() * register_sysctl_paths() During this merge window we should be able to deprecate and unexport register_sysctl_paths(), we can probably do that towards the end of this merge window. Deprecating register_sysctl_table() will take a bit more time but this pull request goes with a few example of how to do this. As it turns out each of the conversions to move away from either of these two API calls *also* saves memory. And so long term, all these changes *will* prove to have saved a bit of memory on boot. The way I see it then is if remove a user of one deprecated call, it gives us enough savings to move one kernel/sysctl.c out from the generic arrays as we end up with about the same amount of bytes. Since deprecating register_sysctl_table() and register_sysctl_paths() does not require maintainer coordination except the final unexport you'll see quite a bit of these changes from other pull requests, I've just kept the stragglers after rc3. Most of these changes have been soaking on linux-next since around rc3. [0] https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org -----BEGIN PGP SIGNATURE----- iQJGBAABCgAwFiEENnNq2KuOejlQLZofziMdCjCSiKcFAmRHAjQSHG1jZ3JvZkBr ZXJuZWwub3JnAAoJEM4jHQowkoinTzgQAI/uKHKi0VlUR1l2Psl0XbseUVueuyj3 ZDxSJpbVUmsoDf2MlLjzB8mYE3ricnNTDbLr7qOyA6pXdM1N0mY5LQmRVRu8/ffd 2T1hQ5pl7YnJdWP5dPhcF9Y+jnu1tjX1MW5DS4fzllwK7FnD86HuIruGq52RAPS/ /FH+BD9eodLWWXk6A/o2GFqoWxPKQI0GLxEYWa7Hg7yt8E/3PQL9QsRzn8i6U+HW BrN/+G3YD1VCCzXu0UAeXnm+i1Z7CdvqNdZuSkvE3DObiZ5WpOS+/i7FrDB7zdiu zAbHaifHnDPtcK3w2ZodbLAAwEWD/mG4iwIjE2kgIMVYxBv7TFDBRREXAWYAevIT UUuZnWDQsGaWdjywrebaUycEfd6dytKyan0fTXgMFkcoWRjejhitfdM2iZDdQROg q453p4HqOw4vTrhy4ov4zOX7J3EFiBzpZdl+SmLqcXk+jbLVb/Q9snUWz1AFtHBl gHoP5bS82uVktGG3MsObjgTzYYMQjO9YGIrVuW1VP9uWs8WaoWx6M9FQJIIhtwE+ h6wG2s7CjuFWnS0/IxWmDOn91QyUn1w7ohiz9TuvYj/5GLSBpBDGCJHsNB5T2WS1 qbQRaZ2Kg3j9TeyWfXxdlxBx7bt3ni+J/IXDY0zom2sTpGHKl8D2g5AzmEXJDTpl kd7Z3gsmwhDh =0U0W -----END PGP SIGNATURE----- Merge tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux Pull sysctl updates from Luis Chamberlain: "This only does a few sysctl moves from the kernel/sysctl.c file, the rest of the work has been put towards deprecating two API calls which incur recursion and prevent us from simplifying the registration process / saving memory per move. Most of the changes have been soaking on linux-next since v6.3-rc3. I've slowed down the kernel/sysctl.c moves due to Matthew Wilcox's feedback that we should see if we could *save* memory with these moves instead of incurring more memory. We currently incur more memory since when we move a syctl from kernel/sysclt.c out to its own file we end up having to add a new empty sysctl used to register it. To achieve saving memory we want to allow syctls to be passed without requiring the end element being empty, and just have our registration process rely on ARRAY_SIZE(). Without this, supporting both styles of sysctls would make the sysctl registration pretty brittle, hard to read and maintain as can be seen from Meng Tang's efforts to do just this [0]. Fortunately, in order to use ARRAY_SIZE() for all sysctl registrations also implies doing the work to deprecate two API calls which use recursion in order to support sysctl declarations with subdirectories. And so during this development cycle quite a bit of effort went into this deprecation effort. I've annotated the following two APIs are deprecated and in few kernel releases we should be good to remove them: - register_sysctl_table() - register_sysctl_paths() During this merge window we should be able to deprecate and unexport register_sysctl_paths(), we can probably do that towards the end of this merge window. Deprecating register_sysctl_table() will take a bit more time but this pull request goes with a few example of how to do this. As it turns out each of the conversions to move away from either of these two API calls *also* saves memory. And so long term, all these changes *will* prove to have saved a bit of memory on boot. The way I see it then is if remove a user of one deprecated call, it gives us enough savings to move one kernel/sysctl.c out from the generic arrays as we end up with about the same amount of bytes. Since deprecating register_sysctl_table() and register_sysctl_paths() does not require maintainer coordination except the final unexport you'll see quite a bit of these changes from other pull requests, I've just kept the stragglers after rc3" Link: https://lkml.kernel.org/r/ZAD+cpbrqlc5vmry@bombadil.infradead.org [0] * tag 'sysctl-6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux: (29 commits) fs: fix sysctls.c built mm: compaction: remove incorrect #ifdef checks mm: compaction: move compaction sysctl to its own file mm: memory-failure: Move memory failure sysctls to its own file arm: simplify two-level sysctl registration for ctl_isa_vars ia64: simplify one-level sysctl registration for kdump_ctl_table utsname: simplify one-level sysctl registration for uts_kern_table ntfs: simplfy one-level sysctl registration for ntfs_sysctls coda: simplify one-level sysctl registration for coda_table fs/cachefiles: simplify one-level sysctl registration for cachefiles_sysctls xfs: simplify two-level sysctl registration for xfs_table nfs: simplify two-level sysctl registration for nfs_cb_sysctls nfs: simplify two-level sysctl registration for nfs4_cb_sysctls lockd: simplify two-level sysctl registration for nlm_sysctls proc_sysctl: enhance documentation xen: simplify sysctl registration for balloon md: simplify sysctl registration hv: simplify sysctl registration scsi: simplify sysctl registration with register_sysctl() csky: simplify alignment sysctl registration ... |
|
|
|
b68ee1c613 |
SCSI misc on 20230426
Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target, mpi3mr, hisi_sas, arcmsr). The major core change is the constification of the host templates (which touches everything) along with other minor fixups and clean ups. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZEmJACYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU4FAP0WYhFC rkbY203/+ErUuwvOKum0VwJKUowCaUD0MBwScAD+Ok/NWobmjdXUBbPUbvVkr+hE 8B/xs9hodX+1fVJcVG0= =fS/j -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: "Updates to the usual drivers (megaraid_sas, scsi_debug, lpfc, target, mpi3mr, hisi_sas, arcmsr). The major core change is the constification of the host templates (which touches everything) along with other minor fixups and clean ups" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: mcq: Use pointer arithmetic in ufshcd_send_command() scsi: ufs: mcq: Annotate ufshcd_inc_sq_tail() appropriately scsi: cxlflash: s/semahpore/semaphore/ scsi: lpfc: Silence an incorrect device output scsi: mpi3mr: Use IRQ save variants of spinlock to protect chain frame allocation scsi: scsi_debug: Fix missing error code in scsi_debug_init() scsi: hisi_sas: Work around build failure in suspend function scsi: lpfc: Fix ioremap issues in lpfc_sli4_pci_mem_setup() scsi: mpt3sas: Fix an issue when driver is being removed scsi: mpt3sas: Remove HBA BIOS version in the kernel log scsi: target: core: Fix invalid memory access scsi: scsi_debug: Drop sdebug_queue scsi: scsi_debug: Only allow sdebug_max_queue be modified when no shosts scsi: scsi_debug: Use scsi_host_busy() in delay_store() and ndelay_store() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in stop_all_queued() scsi: scsi_debug: Use blk_mq_tagset_busy_iter() in sdebug_blk_mq_poll() scsi: scsi_debug: Dynamically allocate sdebug_queued_cmd scsi: scsi_debug: Use scsi_block_requests() to block queues scsi: scsi_debug: Protect block_unblock_all_queues() with mutex scsi: scsi_debug: Change shost list lock to a mutex ... |
|
|
|
c66bb48edd |
xen/pvcalls: don't call bind_evtchn_to_irqhandler() under lock
bind_evtchn_to_irqhandler() shouldn't be called under spinlock, as it can sleep. This requires to move the calls of create_active() out of the locked regions. This is no problem, as the worst which could happen would be a spurious call of the interrupt handler, causing a spurious wake_up(). Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Reviewed-by: Stefano Stabellini <sstabellini@kernel.org> Link: https://lore.kernel.org/r/20230403092711.15285-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
b2c042cc80 |
xen/scsiback: don't call scsiback_free_translation_entry() under lock
scsiback_free_translation_entry() shouldn't be called under spinlock, as it can sleep. This requires to split removing a translation entry from the v2p list from actually calling kref_put() for the entry. Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20230328084602.20729-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
fae65ef3a1 |
xen/pciback: don't call pcistub_device_put() under lock
pcistub_device_put() shouldn't be called under spinlock, as it can sleep. For this reason pcistub_device_get_pci_dev() needs to be modified: instead of always calling pcistub_device_get() just do the call of pcistub_device_get() only if it is really needed. This removes the need to call pcistub_device_put(). Reported-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/lkml/Y+JUIl64UDmdkboh@kadam/ Signed-off-by: Juergen Gross <jgross@suse.com> Reviewed-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com> Link: https://lore.kernel.org/r/20230328084549.20695-1-jgross@suse.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
9f17a75b2d |
xen: simplify sysctl registration for balloon
register_sysctl_table() is a deprecated compatibility wrapper. register_sysctl_init() can do the directory creation for you so just use that. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> |
|
|
|
073828e954 |
ACPI: processor: Fix evaluating _PDC method when running as Xen dom0
In ACPI systems, the OS can direct power management, as opposed to the
firmware. This OS-directed Power Management is called OSPM. Part of
telling the firmware that the OS going to direct power management is
making ACPI "_PDC" (Processor Driver Capabilities) calls. These _PDC
methods must be evaluated for every processor object. If these _PDC
calls are not completed for every processor it can lead to
inconsistency and later failures in things like the CPU frequency
driver.
In a Xen system, the dom0 kernel is responsible for system-wide power
management. The dom0 kernel is in charge of OSPM. However, the
number of CPUs available to dom0 can be different than the number of
CPUs physically present on the system.
This leads to a problem: the dom0 kernel needs to evaluate _PDC for
all the processors, but it can't always see them.
In dom0 kernels, ignore the existing ACPI method for determining if a
processor is physically present because it might not be accurate.
Instead, ask the hypervisor for this information.
Fix this by introducing a custom function to use when running as Xen
dom0 in order to check whether a processor object matches a CPU that's
online. Such checking is done using the existing information fetched
by the Xen pCPU subsystem, extending it to also store the ACPI ID.
This ensures that _PDC method gets evaluated for all physically online
CPUs, regardless of the number of CPUs made available to dom0.
Fixes:
|
|
|
|
0eb392ec09 |
xen: branch for v6.3-rc3
-----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQRTLbB6QfY48x44uB6AXGG7T9hjvgUCZBQKJwAKCRCAXGG7T9hj vuVgAQDhvr5mBFNqFxIfTnE8+oEsnYb0OgmR+9U3h+ECDB0P0gEAmR1fAee441YE 2DWOAlvjmqoI2K8DTTabizXvm7x3bQk= =jcYl -----END PGP SIGNATURE----- Merge tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - cleanup for xen time handling - enable the VGA console in a Xen PVH dom0 - cleanup in the xenfs driver * tag 'for-linus-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen: remove unnecessary (void*) conversions x86/PVH: obtain VGA console info in Dom0 x86/xen/time: cleanup xen_tsc_safe_clocksource xen: update arch/x86/include/asm/xen/cpuid.h |
|
|
|
355c3d6135 |
scsi: xen-scsiback: Remove default fabric ops callouts
Remove callouts that are identical to the default implementations in TCM Core. Acked-by: Juergen Gross <jgross@suse.com> Signed-off-by: Dmitry Bogdanov <d.bogdanov@yadro.com> Link: https://lore.kernel.org/r/20230313181110.20566-10-d.bogdanov@yadro.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> |
|
|
|
7ad2c39860 |
xen: remove unnecessary (void*) conversions
Pointer variables of void * type do not require type cast. Signed-off-by: Yu Zhe <yuzhe@nfschina.com> Reviewed-by: Juergen Gross <jgross@suse.com> Link: https://lore.kernel.org/r/20230316083954.4223-1-yuzhe@nfschina.com Signed-off-by: Juergen Gross <jgross@suse.com> |
|
|
|
a93e884edf |
Driver core changes for 6.3-rc1
Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work falls
into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be moved
into read-only memory (i.e. const) The recent work with Rust has
pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only making
things safer overall. This is the contuation of that work (started
last release with kobject changes) in moving struct bus_type to be
constant. We didn't quite make it for this release, but the
remaining patches will be finished up for the release after this
one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
6oeFOjD3JDju3cQsfGgd
=Su6W
-----END PGP SIGNATURE-----
Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the large set of driver core changes for 6.3-rc1.
There's a lot of changes this development cycle, most of the work
falls into two different categories:
- fw_devlink fixes and updates. This has gone through numerous review
cycles and lots of review and testing by lots of different devices.
Hopefully all should be good now, and Saravana will be keeping a
watch for any potential regression on odd embedded systems.
- driver core changes to work to make struct bus_type able to be
moved into read-only memory (i.e. const) The recent work with Rust
has pointed out a number of areas in the driver core where we are
passing around and working with structures that really do not have
to be dynamic at all, and they should be able to be read-only
making things safer overall. This is the contuation of that work
(started last release with kobject changes) in moving struct
bus_type to be constant. We didn't quite make it for this release,
but the remaining patches will be finished up for the release after
this one, but the groundwork has been laid for this effort.
Other than that we have in here:
- debugfs memory leak fixes in some subsystems
- error path cleanups and fixes for some never-able-to-be-hit
codepaths.
- cacheinfo rework and fixes
- Other tiny fixes, full details are in the shortlog
All of these have been in linux-next for a while with no reported
problems"
[ Geert Uytterhoeven points out that that last sentence isn't true, and
that there's a pending report that has a fix that is queued up - Linus ]
* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
OPP: fix error checking in opp_migrate_dentry()
debugfs: update comment of debugfs_rename()
i3c: fix device.h kernel-doc warnings
dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
Revert "driver core: add error handling for devtmpfs_create_node()"
Revert "devtmpfs: add debug info to handle()"
Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
driver core: cpu: don't hand-override the uevent bus_type callback.
devtmpfs: remove return value of devtmpfs_delete_node()
devtmpfs: add debug info to handle()
driver core: add error handling for devtmpfs_create_node()
driver core: bus: update my copyright notice
driver core: bus: add bus_get_dev_root() function
driver core: bus: constify bus_unregister()
driver core: bus: constify some internal functions
driver core: bus: constify bus_get_kset()
driver core: bus: constify bus_register/unregister_notifier()
driver core: remove private pointer from struct bus_type
...
|
|
|
|
3822a7c409 |
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter". These filters provide users
with finer-grained control over DAMOS's actions. SeongJae has also done
some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series "mm:
support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with his
series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings. The previous BPF-based approach had
shortcomings. See "mm: In-kernel support for memory-deny-write-execute
(MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a per-node
basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage during
compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in ths
series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's series
"mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
"fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest of
the kernel in the series "Simplify the external interface for GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the series
"mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
=MlGs
-----END PGP SIGNATURE-----
Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
memfd creation time, with the option of sealing the state of the X
bit.
- Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
thread-safe for pmd unshare") which addresses a rare race condition
related to PMD unsharing.
- Several folioification patch serieses from Matthew Wilcox, Vishal
Moola, Sidhartha Kumar and Lorenzo Stoakes
- Johannes Weiner has a series ("mm: push down lock_page_memcg()")
which does perform some memcg maintenance and cleanup work.
- SeongJae Park has added DAMOS filtering to DAMON, with the series
"mm/damon/core: implement damos filter".
These filters provide users with finer-grained control over DAMOS's
actions. SeongJae has also done some DAMON cleanup work.
- Kairui Song adds a series ("Clean up and fixes for swap").
- Vernon Yang contributed the series "Clean up and refinement for maple
tree".
- Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
adds to MGLRU an LRU of memcgs, to improve the scalability of global
reclaim.
- David Hildenbrand has added some userfaultfd cleanup work in the
series "mm: uffd-wp + change_protection() cleanups".
- Christoph Hellwig has removed the generic_writepages() library
function in the series "remove generic_writepages".
- Baolin Wang has performed some maintenance on the compaction code in
his series "Some small improvements for compaction".
- Sidhartha Kumar is doing some maintenance work on struct page in his
series "Get rid of tail page fields".
- David Hildenbrand contributed some cleanup, bugfixing and
generalization of pte management and of pte debugging in his series
"mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
swap PTEs".
- Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
flag in the series "Discard __GFP_ATOMIC".
- Sergey Senozhatsky has improved zsmalloc's memory utilization with
his series "zsmalloc: make zspage chain size configurable".
- Joey Gouly has added prctl() support for prohibiting the creation of
writeable+executable mappings.
The previous BPF-based approach had shortcomings. See "mm: In-kernel
support for memory-deny-write-execute (MDWE)".
- Waiman Long did some kmemleak cleanup and bugfixing in the series
"mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
- T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
"mm: multi-gen LRU: improve".
- Jiaqi Yan has provided some enhancements to our memory error
statistics reporting, mainly by presenting the statistics on a
per-node basis. See the series "Introduce per NUMA node memory error
statistics".
- Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
regression in compaction via his series "Fix excessive CPU usage
during compaction".
- Christoph Hellwig does some vmalloc maintenance work in the series
"cleanup vfree and vunmap".
- Christoph Hellwig has removed block_device_operations.rw_page() in
ths series "remove ->rw_page".
- We get some maple_tree improvements and cleanups in Liam Howlett's
series "VMA tree type safety and remove __vma_adjust()".
- Suren Baghdasaryan has done some work on the maintainability of our
vm_flags handling in the series "introduce vm_flags modifier
functions".
- Some pagemap cleanup and generalization work in Mike Rapoport's
series "mm, arch: add generic implementation of pfn_valid() for
FLATMEM" and "fixups for generic implementation of pfn_valid()"
- Baoquan He has done some work to make /proc/vmallocinfo and
/proc/kcore better represent the real state of things in his series
"mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
- Jason Gunthorpe rationalized the GUP system's interface to the rest
of the kernel in the series "Simplify the external interface for
GUP".
- SeongJae Park wishes to migrate people from DAMON's debugfs interface
over to its sysfs interface. To support this, we'll temporarily be
printing warnings when people use the debugfs interface. See the
series "mm/damon: deprecate DAMON debugfs interface".
- Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
and clean-ups" series.
- Huang Ying has provided a dramatic reduction in migration's TLB flush
IPI rates with the series "migrate_pages(): batch TLB flushing".
- Arnd Bergmann has some objtool fixups in "objtool warning fixes".
* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
include/linux/migrate.h: remove unneeded externs
mm/memory_hotplug: cleanup return value handing in do_migrate_range()
mm/uffd: fix comment in handling pte markers
mm: change to return bool for isolate_movable_page()
mm: hugetlb: change to return bool for isolate_hugetlb()
mm: change to return bool for isolate_lru_page()
mm: change to return bool for folio_isolate_lru()
objtool: add UACCESS exceptions for __tsan_volatile_read/write
kmsan: disable ftrace in kmsan core code
kasan: mark addr_has_metadata __always_inline
mm: memcontrol: rename memcg_kmem_enabled()
sh: initialize max_mapnr
m68k/nommu: add missing definition of ARCH_PFN_OFFSET
mm: percpu: fix incorrect size in pcpu_obj_full_size()
maple_tree: reduce stack usage with gcc-9 and earlier
mm: page_alloc: call panic() when memoryless node allocation fails
mm: multi-gen LRU: avoid futile retries
migrate_pages: move THP/hugetlb migration support check to simplify code
migrate_pages: batch flushing TLB
migrate_pages: share more code between _unmap and _move
...
|
|
|
|
06e1a81c48 |
A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
- Preparatory refactoring and cleanup work in the efivar layer by Johan,
which is needed to accommodate the Snapdragon arm64 laptops that
expose their EFI variable store via a TEE secure world API.
- Enhancements to the EFI memory map handling so that Xen dom0 can
safely access EFI configuration tables (Demi Marie)
- Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
table, so that firmware that is generated with ENDBR/BTI landing pads
will be mapped with enforcement enabled.
- Clean up how we check and print the EFI revision exposed by the
firmware.
- Incorporate EFI memory attributes protocol definition contributed by
Evgeniy and wire it up in the EFI zboot code. This ensures that these
images can execute under new and stricter rules regarding the default
memory permissions for EFI page allocations. (More work is in progress
here)
- CPER header cleanup by Dan Williams
- Use a raw spinlock to protect the EFI runtime services stack on arm64
to ensure the correct semantics under -rt. (Pierre)
- EFI framebuffer quirk for Lenovo Ideapad by Darrell.
-----BEGIN PGP SIGNATURE-----
iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
f/tLec86
=+pOz
-----END PGP SIGNATURE-----
Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi
Pull EFI updates from Ard Biesheuvel:
"A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon (Andy)
- Preparatory refactoring and cleanup work in the efivar layer, which
is needed to accommodate the Snapdragon arm64 laptops that expose
their EFI variable store via a TEE secure world API (Johan)
- Enhancements to the EFI memory map handling so that Xen dom0 can
safely access EFI configuration tables (Demi Marie)
- Wire up the newly introduced IBT/BTI flag in the EFI memory
attributes table, so that firmware that is generated with ENDBR/BTI
landing pads will be mapped with enforcement enabled
- Clean up how we check and print the EFI revision exposed by the
firmware
- Incorporate EFI memory attributes protocol definition and wire it
up in the EFI zboot code (Evgeniy)
This ensures that these images can execute under new and stricter
rules regarding the default memory permissions for EFI page
allocations (More work is in progress here)
- CPER header cleanup (Dan Williams)
- Use a raw spinlock to protect the EFI runtime services stack on
arm64 to ensure the correct semantics under -rt (Pierre)
- EFI framebuffer quirk for Lenovo Ideapad (Darrell)"
* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
arm64: efi: Make efi_rt_lock a raw_spinlock
efi: Add mixed-mode thunk recipe for GetMemoryAttributes
efi: x86: Wire up IBT annotation in memory attributes table
efi: arm64: Wire up BTI annotation in memory attributes table
efi: Discover BTI support in runtime services regions
efi/cper, cxl: Remove cxl_err.h
efi: Use standard format for printing the EFI revision
efi: Drop minimum EFI version check at boot
efi: zboot: Use EFI protocol to remap code/data with the right attributes
efi/libstub: Add memory attribute protocol definitions
efi: efivars: prevent double registration
efi: verify that variable services are supported
efivarfs: always register filesystem
efi: efivars: add efivars printk prefix
efi: Warn if trying to reserve memory under Xen
efi: Actually enable the ESRT under Xen
efi: Apply allowlist to EFI configuration tables when running under Xen
efi: xen: Implement memory descriptor lookup based on hypercall
efi: memmap: Disregard bogus entries instead of returning them
...
|