Commit Graph

8372 Commits

Author SHA1 Message Date
Christophe Leroy 830213786c powerpc/ftrace: directly call of function graph tracer by ftrace caller
Modify function graph tracer to be handled directly by the standard
ftrace caller.

This is made possible as powerpc now supports
CONFIG_DYNAMIC_FTRACE_WITH_ARGS.

This change simplifies the call of function graph ftrace.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/04d196585ff81bde06a000bd9c633a33a5b21130.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy 0c81ed5ed4 powerpc/ftrace: Refactor ftrace_{en/dis}able_ftrace_graph_caller
ftrace_enable_ftrace_graph_caller() and
ftrace_disable_ftrace_graph_caller() have common code.

They will have even more common code after following patch.

Refactor into a single ftrace_modify_ftrace_graph_caller() function.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f37785a531f1a8f201e1b3da45997a5c77e9d820.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy c75388a8ce powerpc/ftrace: Prepare PPC64's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS
In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change ftrace_caller()
to handle LIVEPATCH the same way as frace_caller_regs().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/850817333cc76593699032e8e9a70d8c36e1af1e.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy d95bf254be powerpc/ftrace: Prepare PPC32's ftrace_caller() for CONFIG_DYNAMIC_FTRACE_WITH_ARGS
In order to implement CONFIG_DYNAMIC_FTRACE_WITH_ARGS, change
ftrace_caller() stack layout to match struct pt_regs.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/da9734eba504998fb914aca12131c9f6bf6120a8.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy 7bdb478c1d powerpc/ftrace: Simplify PPC32's return_to_handler()
return_to_handler() was copied from PPC64. For PPC32 it
just needs to save r3 and r4, and doesn't require any nop
after the bl.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aab39b77b34fb2c4ed08ed01c547b6ed13643788.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:11 +11:00
Christophe Leroy 7875bc9b07 powerpc/ftrace: Don't save again LR in ftrace_regs_caller() on PPC32
PPC32 mcount() caller already saves LR on stack,
no need to save it again.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/eadcfc770b4f1e35535ffb85e28e858a2c31dec4.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy a4520b2527 powerpc/ftrace: Add support for livepatch to PPC32
PPC64 needs some special logic to properly set up the TOC.
See commit 85baa09549 ("powerpc/livepatch: Add live patching support
on ppc64le") for details.

PPC32 doesn't have TOC so it doesn't need that logic, so adding
LIVEPATCH support is straight forward.

Add CONFIG_LIVEPATCH_64 and move livepatch stack logic into that item.

Livepatch sample modules all work.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/63cb094125b6a6038c65eeac2abaabbabe63addd.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy 0c850965d6 powerpc/module_32: Fix livepatching for RO modules
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.

R_PPC_ADDR16_LO, R_PPC_ADDR16_HI, R_PPC_ADDR16_HA and R_PPC_REL24 are
the types generated by the kpatch-build userspace tool or klp-convert
kernel tree observed applying a relocation to a post-init module.

Use patch_instruction() to patch those relocations.

Commit 8734b41b3e ("powerpc/module_64: Fix livepatching for
RO modules") did similar change in module_64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/d5697157cb7dba3927e19aa17c915a83bc550bb2.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy 27e21e8f12 powerpc/32: Remove _ENTRY() macro
_ENTRY() is now redundant with _GLOBAL(). Remove it.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/62a35f8dde2bb74c8d0d7a5430cce07a5a3a6fb6.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy 1231816373 powerpc/32: Remove remaining .stabs annotations
STABS debug format has been superseded long time ago by DWARF.

Remove the few remaining .stabs annotations from old 32 bits code.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/68932ec2ba6b868d35006b96e90f0890f3da3c05.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07 21:03:10 +11:00
Christophe Leroy 9bb162fa26 powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCE
Allthough kernel text is always mapped with BATs, we still have
inittext mapped with pages, so TLB miss handling is required
when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set.

The final solution should be to set a BAT that also maps inittext
but that BAT then needs to be cleared at end of init, and it will
require more changes to be able to do it properly.

As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big
deal so let's fix it simply for now to enable easy stable application.

Fixes: 035b19a15a ("powerpc/32s: Always map kernel text and rodata with BATs")
Cc: stable@vger.kernel.org # v5.11+
Reported-by: Maxime Bizon <mbizon@freebox.fr>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu
2022-02-07 17:18:47 +11:00
Sourabh Jain 7c5ed82b80 powerpc: Set crashkernel offset to mid of RMA region
On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.

The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for the kernel to allocate memory for other system
resources.

Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.

This patch is dependent on changes to paca allocation for boot CPU. It
expect boot CPU to discover 1T segment support which is introduced by
the patch posted here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html

Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com>
Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com
2022-02-07 15:26:12 +11:00
Christophe Leroy 4634bf4455 powerpc/603: Clear C bit when PTE is read only
On book3s/32 MMU, PP bits don't offer kernel RO protection,
kernel pages are always RW.

However, on the 603 a page fault is always generated when the
C bit (change bit = dirty bit) is not set.

Enforce kernel RO protection by clearing C bit in TLB miss
handler when the page doesn't have _PAGE_RW flag.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/bbb13848ff0100a76ee9ea95118058c30ae95f2c.1643613343.git.christophe.leroy@csgroup.eu
2022-02-03 22:56:44 +11:00
Christophe Leroy 9872cbfb45 powerpc/603: Remove outdated comment
Since commit 84de6ab0e9 ("powerpc/603: don't handle PAGE_ACCESSED
in TLB miss handlers.") page table is not updated anymore by
TLB miss handlers.

Remove the comment.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/38b1ffefd2146fa56bf8aa605d476ad9736bbb37.1643613296.git.christophe.leroy@csgroup.eu
2022-02-03 22:38:13 +11:00
Wedson Almeida Filho d4be60fe66 powerpc/module_64: use module_init_section instead of patching names
Without this patch, module init sections are disabled by patching their
names in arch-specific code when they're loaded (which prevents code in
layout_sections from finding init sections). This patch uses the new
arch-specific module_init_section instead.

This allows modules that have .init_array sections to have the
initialisers properly called (on load, before init). Without this patch,
the initialisers are not called because .init_array is renamed to
_init_array, and thus isn't found by code in find_module_sections().

Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220202055123.2144842-1-wedsonaf@google.com
2022-02-03 22:20:37 +11:00
Fabiano Rosas b53c861059 powerpc: Fix debug print in smp_setup_cpu_maps
When figuring out the number of threads, the debug message prints "1
thread" for the first iteration of the loop, instead of the actual
number of threads calculated from the length of the
"ibm,ppc-interrupt-server#s" property.

  * /cpus/PowerPC,POWER8@20...
    ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG
    thread 0 -> cpu 0 (hard id 32)
    thread 1 -> cpu 1 (hard id 33)
    thread 2 -> cpu 2 (hard id 34)
    thread 3 -> cpu 3 (hard id 35)
    thread 4 -> cpu 4 (hard id 36)
    thread 5 -> cpu 5 (hard id 37)
    thread 6 -> cpu 6 (hard id 38)
    thread 7 -> cpu 7 (hard id 39)
  * /cpus/PowerPC,POWER8@28...
    ibm,ppc-interrupt-server#s -> 8 threads
    thread 0 -> cpu 8 (hard id 40)
    thread 1 -> cpu 9 (hard id 41)
    thread 2 -> cpu 10 (hard id 42)
    thread 3 -> cpu 11 (hard id 43)
    thread 4 -> cpu 12 (hard id 44)
    thread 5 -> cpu 13 (hard id 45)
    thread 6 -> cpu 14 (hard id 46)
    thread 7 -> cpu 15 (hard id 47)
(...)

Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210120181847.952106-1-farosas@linux.ibm.com
2022-02-03 20:24:59 +11:00
Michael Ellerman 2e7f1e2b30 powerpc/64: Move paca allocation later in boot
Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size()
and paca allocation.

The first is that on a Radix capable machine but with "disable_radix" on
the command line, there is a window during early boot where
early_radix_enabled() is true, even though it will later become false.

  early_init_devtree:                       <- early_radix_enabled() = false
    early_init_dt_scan_cpus:                <- early_radix_enabled() = false
        ...
        check_cpu_pa_features:              <- early_radix_enabled() = false
        ...                               ^ <- early_radix_enabled() = TRUE
        allocate_paca:                    | <- early_radix_enabled() = TRUE
            ...                           |
            ppc64_bolted_size:            | <- early_radix_enabled() = TRUE
                if (early_radix_enabled())| <- early_radix_enabled() = TRUE
                    return ULONG_MAX;     |
        ...                               |
    ...                                   | <- early_radix_enabled() = TRUE
    ...                                   | <- early_radix_enabled() = TRUE
    mmu_early_init_devtree()              V
    ...                                     <- early_radix_enabled() = false

This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's
paca allocation, even though later it will return a different value.
This is not currently a bug because the paca allocation is also limited
by the RMA size, but that is very fragile.

The second issue is that when using the Hash MMU, when we call
ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet
detected whether 1T segments are available. That causes
ppc64_bolted_size() to return 256MB, even if the machine can actually
support up to 1T. This is usually OK, we generally have space below
256MB for one paca, but for a kdump kernel placed above 256MB it causes
the boot to fail.

At boot we cannot discover all the features of the machine
instantaneously, so there will always be some periods where we have
incomplete knowledge of the system. However both the above problems stem
from the fact that we allocate the boot CPU's paca (and paca pointers
array) before we decide which MMU we are using, or discover its exact
features.

Moving the paca allocation slightly later still can solve both the
issues described above, and means for a normal boot we don't do any
permanent allocations until after we've discovered the MMU.

Note that although we move the boot CPU's paca allocation later, we
still have a temporary paca (boot_paca) accessible via r13, so code that
does read only access to paca fields is safe. The only risk is that some
code writes to the boot_paca, and that write will then be lost when we
switch away from the boot_paca later in early_setup().

The additional code that runs before the paca allocation is primarily
mmu_early_init_devtree(), which is scanning the device tree and
populating globals and cur_cpu_spec with MMU related flags. I do not see
any additional code that writes to paca fields.

[1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhjain@linux.ibm.com
[2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhjain@linux.ibm.com

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au
2022-02-02 20:32:11 +11:00
Michal Suchanek b2a6f60435 powerpc: add link stack flush mitigation status in debugfs.
The link stack flush status is not visible in debugfs. It can be enabled
even when count cache flush is disabled. Add separate file for its
status.

Signed-off-by: Michal Suchanek <msuchanek@suse.de>
[mpe: Update for change to link_stack_flush_type]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20191127220959.6208-1-msuchanek@suse.de
2022-01-31 13:45:23 +11:00
Nicholas Piggin 8defc2a5dd powerpc/64s/interrupt: Fix decrementer storm
The decrementer exception can fail to be cleared when the interrupt
returns in the case where the decrementer wraps with the next timer
still beyond decrementer_max. This results in a decrementer interrupt
storm. This is triggerable with small decrementer system with hard
and soft watchdogs disabled.

Fix this by always programming the decrementer if there was no timer.

Fixes: 0faf20a1ad ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use")
Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220124143930.3923442-1-npiggin@gmail.com
2022-01-25 16:50:10 +11:00
Linus Torvalds dd81e1c7d5 powerpc fixes for 5.17 #2
- A series of bpf fixes, including an oops fix and some codegen fixes.
 
  - Fix a regression in syscall_get_arch() for compat processes.
 
  - Fix boot failure on some 32-bit systems with KASAN enabled.
 
  - A couple of other build/minor fixes.
 
 Thanks to: Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa, Johan Almbladh,
 Maxime Bizon, Naveen N. Rao, Nicholas Piggin.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHtOJgTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgLMeD/9eBHqMy+yhXfTw9mjAf8rVlsdY6Xjq
 2LYgHCJpeP0VjxBm8GcyFtAfJc778LyykU18xsi0WVsuwPmVdVwDfZpzxWJrYqeq
 6AuK9uRZAbEx6gZVCsWlk3Em7GyUEqHm5PTxKsDfbzh+hBTNQ8CcKKbCVmEZ+loA
 nWdaWJOZMpE4hQOx0Ph5JRAos/ZUEt2I57qZkRgvIEV+hroFrmdP1rnncdalnLHN
 pXsEn380Q+rpH+48HIpaTpl5+R8CvgFnnlI6/luZC6HoA8K6jph8FdOkkI9eU1v6
 TgdNMiNVIe1wRFzVp+nH7SGqL3l9X/UPS57hL52X+ccpHoioBW9CLPha06g/+UKT
 kJU3mYZg3jvQGrM9H/2R3P97UH9SwHE5SH0xcXj3ItH1NpOKpMO0h5UuWx+Ze8Wo
 zO80xbYLbALtEfx8JMCf95rnUVFL0n55xVJN+7+fs2hU3Be/ZpIyHfDujlHlbd1m
 o3KRhkg1a0RyGsun6Gd2vwLGaBbwYNWvgHM9IDeiAFdJ7HFpp1YrVGpQRkAIUbJS
 bzYiRczjbSof7c9WqLyKNatgNFO/JPI5vi1iNQ3u9Oh9fzjpv5Zok4vK534mOnqU
 QnhiqLJbY+yBpoAvSePGQTCCzDAnJq0r53n0d1O/2T1GcpX9MKM4YlL+Wu4m0yLn
 RcbOCTkmoqWZEw==
 =KjWT
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:

 - A series of bpf fixes, including an oops fix and some codegen fixes.

 - Fix a regression in syscall_get_arch() for compat processes.

 - Fix boot failure on some 32-bit systems with KASAN enabled.

 - A couple of other build/minor fixes.

Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa,
Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin.

* tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/64s: Mask SRR0 before checking against the masked NIP
  powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64
  powerpc/32s: Fix kasan_init_region() for KASAN
  powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32
  powerpc/audit: Fix syscall_get_arch()
  powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06
  tools/bpf: Rename 'struct event' to avoid naming conflict
  powerpc/bpf: Update ldimm64 instructions during extra pass
  powerpc32/bpf: Fix codegen for bpf-to-bpf calls
  bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
2022-01-23 17:52:42 +02:00
Muchun Song 359745d783 proc: remove PDE_DATA() completely
Remove PDE_DATA() completely and replace it with pde_data().

[akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c]
[akpm@linux-foundation.org: now fix it properly]

Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Gladkov <gladkov.alexey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22 08:33:37 +02:00
Linus Torvalds f4484d138b Merge branch 'akpm' (patches from Andrew)
Merge more updates from Andrew Morton:
 "55 patches.

  Subsystems affected by this patch series: percpu, procfs, sysctl,
  misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2,
  hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits)
  lib: remove redundant assignment to variable ret
  ubsan: remove CONFIG_UBSAN_OBJECT_SIZE
  kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR
  lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB
  btrfs: use generic Kconfig option for 256kB page size limit
  arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB
  configs: introduce debug.config for CI-like setup
  delayacct: track delays from memory compact
  Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact
  delayacct: cleanup flags in struct task_delay_info and functions use it
  delayacct: fix incomplete disable operation when switch enable to disable
  delayacct: support swapin delay accounting for swapping without blkio
  panic: remove oops_id
  panic: use error_report_end tracepoint on warnings
  fs/adfs: remove unneeded variable make code cleaner
  FAT: use io_schedule_timeout() instead of congestion_wait()
  hfsplus: use struct_group_attr() for memcpy() region
  nilfs2: remove redundant pointer sbufs
  fs/binfmt_elf: use PT_LOAD p_align values for static PIE
  const_structs.checkpatch: add frequently used ops structs
  ...
2022-01-20 10:41:01 +02:00
Kefeng Wang 20c0357646 mm: percpu: add generic pcpu_populate_pte() function
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to
populate pte, this patch adds a generic pcpu populate pte function,
pcpu_populate_pte(), which is marked __weak and used on most
architectures, but it is overridden on x86, which has its own
implementation.

Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang 23f917169e mm: percpu: add generic pcpu_fc_alloc/free funciton
With the previous patch, we could add a generic pcpu first chunk
allocate and free function to cleanup the duplicated definations on each
architecture.

Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Kefeng Wang 1ca3fb3abd mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedef
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu
first chunk allocation will call it to alloc memblock on the
corresponding node by it, this is prepare for the next patch.

Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20 08:52:52 +02:00
Nicholas Piggin aee101d7b9 powerpc/64s: Mask SRR0 before checking against the masked NIP
Commit 314f6c23dd ("powerpc/64s: Mask NIP before checking against
SRR0") masked off the low 2 bits of the NIP value in the interrupt
stack frame in case they are non-zero and mis-compare against a SRR0
register value of a CPU which always reads back 0 from the 2 low bits
which are reserved.

This now causes the opposite problem that an implementation which does
implement those bits in SRR0 will mis-compare against the masked NIP
value in which they have been cleared. QEMU is one such implementation,
and this is allowed by the architecture.

This can be triggered by sigfuz by setting low bits of PT_NIP in the
signal context.

Fix this for now by masking the SRR0 bits as well. Cleaner is probably
to sanitise these values before putting them in registers or stack, but
this is the quick and backportable fix.

Fixes: 314f6c23dd ("powerpc/64s: Mask NIP before checking against SRR0")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220117134403.2995059-1-npiggin@gmail.com
2022-01-18 10:25:18 +11:00
Linus Torvalds 35ce8ae9ae Merge branch 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull signal/exit/ptrace updates from Eric Biederman:
 "This set of changes deletes some dead code, makes a lot of cleanups
  which hopefully make the code easier to follow, and fixes bugs found
  along the way.

  The end-game which I have not yet reached yet is for fatal signals
  that generate coredumps to be short-circuit deliverable from
  complete_signal, for force_siginfo_to_task not to require changing
  userspace configured signal delivery state, and for the ptrace stops
  to always happen in locations where we can guarantee on all
  architectures that the all of the registers are saved and available on
  the stack.

  Removal of profile_task_ext, profile_munmap, and profile_handoff_task
  are the big successes for dead code removal this round.

  A bunch of small bug fixes are included, as most of the issues
  reported were small enough that they would not affect bisection so I
  simply added the fixes and did not fold the fixes into the changes
  they were fixing.

  There was a bug that broke coredumps piped to systemd-coredump. I
  dropped the change that caused that bug and replaced it entirely with
  something much more restrained. Unfortunately that required some
  rebasing.

  Some successes after this set of changes: There are few enough calls
  to do_exit to audit in a reasonable amount of time. The lifetime of
  struct kthread now matches the lifetime of struct task, and the
  pointer to struct kthread is no longer stored in set_child_tid. The
  flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is
  removed. Issues where task->exit_code was examined with
  signal->group_exit_code should been examined were fixed.

  There are several loosely related changes included because I am
  cleaning up and if I don't include them they will probably get lost.

  The original postings of these changes can be found at:
     https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org
     https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org

  I trimmed back the last set of changes to only the obviously correct
  once. Simply because there was less time for review than I had hoped"

* 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits)
  ptrace/m68k: Stop open coding ptrace_report_syscall
  ptrace: Remove unused regs argument from ptrace_report_syscall
  ptrace: Remove second setting of PT_SEIZED in ptrace_attach
  taskstats: Cleanup the use of task->exit_code
  exit: Use the correct exit_code in /proc/<pid>/stat
  exit: Fix the exit_code for wait_task_zombie
  exit: Coredumps reach do_group_exit
  exit: Remove profile_handoff_task
  exit: Remove profile_task_exit & profile_munmap
  signal: clean up kernel-doc comments
  signal: Remove the helper signal_group_exit
  signal: Rename group_exit_task group_exec_task
  coredump: Stop setting signal->group_exit_task
  signal: Remove SIGNAL_GROUP_COREDUMP
  signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process
  signal: Make coredump handling explicit in complete_signal
  signal: Have prepare_signal detect coredumps using signal->core_state
  signal: Have the oom killer detect coredumps using signal->core_state
  exit: Move force_uaccess back into do_exit
  exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit
  ...
2022-01-17 05:49:30 +02:00
Linus Torvalds f56caedaf9 Merge branch 'akpm' (patches from Andrew)
Merge misc updates from Andrew Morton:
 "146 patches.

  Subsystems affected by this patch series: kthread, ia64, scripts,
  ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak,
  dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap,
  memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb,
  userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp,
  ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and
  damon)"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits)
  mm/damon: hide kernel pointer from tracepoint event
  mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log
  mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging
  mm/damon/dbgfs: remove an unnecessary variable
  mm/damon: move the implementation of damon_insert_region to damon.h
  mm/damon: add access checking for hugetlb pages
  Docs/admin-guide/mm/damon/usage: update for schemes statistics
  mm/damon/dbgfs: support all DAMOS stats
  Docs/admin-guide/mm/damon/reclaim: document statistics parameters
  mm/damon/reclaim: provide reclamation statistics
  mm/damon/schemes: account how many times quota limit has exceeded
  mm/damon/schemes: account scheme actions that successfully applied
  mm/damon: remove a mistakenly added comment for a future feature
  Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts
  Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning
  Docs/admin-guide/mm/damon/usage: remove redundant information
  Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks
  mm/damon: convert macro functions to static inline functions
  mm/damon: modify damon_rand() macro to static inline function
  mm/damon: move damon_rand() definition into damon.h
  ...
2022-01-15 20:37:06 +02:00
Aneesh Kumar K.V 21b084fdf2 mm/mempolicy: wire up syscall set_mempolicy_home_node
Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
Cc: Ben Widawsky <ben.widawsky@intel.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Feng Tang <feng.tang@intel.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Huang Ying <ying.huang@intel.com>
Cc: <linux-api@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-15 16:30:30 +02:00
Linus Torvalds 29ec39fcf1 powerpc updates for 5.17
- Optimise radix KVM guest entry/exit by 2x on Power9/Power10.
 
  - Allow firmware to tell us whether to disable the entry and uaccess flushes on Power10
    or later CPUs.
 
  - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits.
 
  - Several fixes and improvements to our hard lockup watchdog.
 
  - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit.
 
  - Allow building the 64-bit Book3S kernel without hash MMU support, ie. Radix only.
 
  - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit).
 
  - Add new encodings for perf_mem_data_src.mem_hops field, and use them on Power10.
 
  - A series of small performance improvements to 64-bit interrupt entry.
 
  - Several commits fixing issues when building with the clang integrated assembler.
 
  - Many other small features and fixes.
 
 Thanks to: Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell, Arnd Bergmann,
 Athira Rajeev, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christoph Hellwig,
 Daniel Axtens, David Yang, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren,
 Hari Bathini, Jason Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent
 Dufour, Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh Kamboju,
 Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child, Oliver O'Halloran, Peiwei
 Hu, Randy Dunlap, Ravi Bangoria, Rob Herring, Russell Currey, Sachin Sant, Sean
 Christopherson, Segher Boessenkool, Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang
 wangx, Yang Guang.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmHhVFMTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgKwzD/9UUEZzWyzMVRJvP9FPZByN2M8czxHJ
 tuqEuVqnfks8ad8tfm2ebng5t8ZuVASBQU2fpPA1+lpdvprgZN5RFGMRh729vskn
 2aHQPmFvFObNbXOgCoXzk+C5xYi3zoRMVM968neSPBneYo+xDicn/zN5CHAgsjhX
 +baemJQ7/xzwLiZgTHe8fWw3nTk3IbPBpha59SdTvR8Moy6I4O8CDPIYEm3U3/J3
 x14ZRETqjksL7YOzEBk0avm1dDZRw/johz29oRYSmCj7dyy5OqrkPwokJiRY90eA
 1lVdofDc0zElaSWkVGzKdSWRUIXjKIVdtejvDeEvl6H/mI6q4TVZE8rFmn+3Rvgf
 9q0iKtmw5Kn11cqgY/pgEGmxnQtIdAodNfI/t939E7+O5LbcznuYUiy0J/kTD/vl
 Xduotg2dsCI+5ukf1wrk2wt9LhqZL+ziOeaBhyDM4orV8T3HBYL6zWBptun//IGO
 lK6TvvCHSYnGqY4bnrAmiOnbbEtnP6nN3zbcXgSvPM0wCRHPIEqd0NRXtfISo32d
 vBPq1neXWo4wrRJj9X3yOuP+5fEA4I+hB3yrCJOkcEcz+8NhlboQXU7raVsJL+bd
 kze75H8hwX7kE71oJFFl13LbSNABgiLFARTBXKfvdQA2iLdR0Snvm+OouvwWRPo/
 Po7Nm3zqdLc/1A==
 =BxhQ
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc updates from Michael Ellerman:

 - Optimise radix KVM guest entry/exit by 2x on Power9/Power10.

 - Allow firmware to tell us whether to disable the entry and uaccess
   flushes on Power10 or later CPUs.

 - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits.

 - Several fixes and improvements to our hard lockup watchdog.

 - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit.

 - Allow building the 64-bit Book3S kernel without hash MMU support, ie.
   Radix only.

 - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit).

 - Add new encodings for perf_mem_data_src.mem_hops field, and use them
   on Power10.

 - A series of small performance improvements to 64-bit interrupt entry.

 - Several commits fixing issues when building with the clang integrated
   assembler.

 - Many other small features and fixes.

Thanks to Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell,
Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christophe JAILLET,
Christophe Leroy, Christoph Hellwig, Daniel Axtens, David Yang, Erhard
Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren, Hari Bathini, Jason
Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour,
Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh
Kamboju, Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child,
Oliver O'Halloran, Peiwei Hu, Randy Dunlap, Ravi Bangoria, Rob Herring,
Russell Currey, Sachin Sant, Sean Christopherson, Segher Boessenkool,
Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang wangx, and Yang
Guang.

* tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (240 commits)
  powerpc/xmon: Dump XIVE information for online-only processors.
  powerpc/opal: use default_groups in kobj_type
  powerpc/cacheinfo: use default_groups in kobj_type
  powerpc/sched: Remove unused TASK_SIZE_OF
  powerpc/xive: Add missing null check after calling kmalloc
  powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API
  selftests/powerpc: Add a test of sigreturning to an unaligned address
  powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
  powerpc/64s: Mask NIP before checking against SRR0
  powerpc/perf: Fix spelling of "its"
  powerpc/32: Fix boot failure with GCC latent entropy plugin
  powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests
  powerpc/code-patching: Move code patching selftests in its own file
  powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h
  powerpc/code-patching: Move patch_exception() outside code-patching.c
  powerpc/code-patching: Use test_trampoline for prefixed patch test
  powerpc/code-patching: Fix patch_branch() return on out-of-range failure
  powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling
  powerpc/code-patching: Fix unmap_patch_area() error handling
  powerpc/code-patching: Fix error handling in do_patch_instruction()
  ...
2022-01-14 15:17:26 +01:00
Linus Torvalds 4eb766f64d Devicetree updates for v5.17:
Bindings:
 - DT schema conversions for Samsung clocks, RNG bindings, Qcom Command
   DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl, Tegra I2C
   and BPMP, pwm-vibrator, Arm DSU, and Cadence macb
 
 - DT schema conversions for Broadcom platforms: interrupt controllers,
   STB GPIO, STB waketimer, STB reset, iProc MDIO mux, iProc PCIe,
   Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON, SYSTEMPORT, AMAC,
   Northstar 2 PCIe PHY, GENET, moca PHY, GISB arbiter, and SATA
 
 - Add binding schemas for Tegra210 EMC table, TI DC-DC converters,
 
 - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues
 
 - More fixes due to 'unevaluatedProperties' enabling
 
 - Data type fixes and clean-ups of binding examples found in preparation
   to move to validating DTB files directly (instead of intermediate YAML
   representation.
 
 - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus
 
 - Add various new compatible strings
 
 DT core:
 - Silence a warning for overlapping reserved memory regions
 
 - Reimplement unittest overlay tracking
 
 - Fix stack frame size warning in unittest
 
 - Clean-ups of early FDT scanning functions
 
 - Fix handling of "linux,usable-memory-range" on EFI booted systems
 
 - Add support for 'fail' status on CPU nodes
 
 - Improve error message in of_phandle_iterator_next()
 
 - kbuild: Disable duplicate unit-address warnings for disabled nodes
 -----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCgAuFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmHfCdcQHHJvYmhAa2Vy
 bmVsLm9yZwAKCRD6+121jbxhw+UZD/0ZMQQ6VF20MW7Gg0bOutd8Q6Q6opjrCG5c
 nLW5mv8Q+um3sI1ZpwdMI4zAfCmTfeL13ZM9KtJKlJ0o41bgId+kZsezy4I2rN9+
 sE1CwA4TninKTJsUkmyQX4fgJRUZ95Eubryfb07sy7nbK3LZQ+t18R5tzVBDpzy4
 7hy4eM6mlMxgIJDi7EUboLZslkMM4TGGutLsk5C5T5V5lcWSt3Jj5WZtl5k4Wykq
 j4i9mU+GGTZi0nGAJQ7lNoLPatZDSVQx5tzNV/Wi8hSwZbn0Kycu+IuWZyihILz/
 9lzB/7tv8fl+xkTaJ5xxaY05HcDeX02yCLzh3PfAHRYdbQ2EkFoaKqJ81SLfAq5t
 aH87v41wFSrjzynxpppqswXOdqI/jofrHrGlQldnw0VHGTjEfDbyZGRQFPHmuzTG
 gXaSNKCxppG7ThpXarfu7D4TdYV75n+cBOsC/BBopYgIS2+xmjDA3t5Scks1/4NX
 1Hfq9IMF9iYJYc/GNXBWcOrLn9d1ILYt6HrKRQar1NIEFH1Lt0c2aw5WsyvOZ4zx
 aLHLSbEwnl+2wleyGB9YQkFaaF7N6qcid3u9KFRJP6nTojoaeQaIi3MR9F3LVReZ
 LV5YfWEcij1zc+lzwgHc6+8bbgFxrKgOC2IL/B6u93u/BO0wmF/54kbEZKaLyX8d
 a7Iii4IYFw==
 =2g8v
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull devicetree updates from Rob Herring:
 "Bindings:

   - DT schema conversions for Samsung clocks, RNG bindings, Qcom
     Command DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl,
     Tegra I2C and BPMP, pwm-vibrator, Arm DSU, and Cadence macb

   - DT schema conversions for Broadcom platforms: interrupt
     controllers, STB GPIO, STB waketimer, STB reset, iProc MDIO mux,
     iProc PCIe, Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON,
     SYSTEMPORT, AMAC, Northstar 2 PCIe PHY, GENET, moca PHY, GISB
     arbiter, and SATA

   - Add binding schemas for Tegra210 EMC table, TI DC-DC converters,

   - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues

   - More fixes due to 'unevaluatedProperties' enabling

   - Data type fixes and clean-ups of binding examples found in
     preparation to move to validating DTB files directly (instead of
     intermediate YAML representation.

   - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus

   - Add various new compatible strings

  DT core:

   - Silence a warning for overlapping reserved memory regions

   - Reimplement unittest overlay tracking

   - Fix stack frame size warning in unittest

   - Clean-ups of early FDT scanning functions

   - Fix handling of "linux,usable-memory-range" on EFI booted systems

   - Add support for 'fail' status on CPU nodes

   - Improve error message in of_phandle_iterator_next()

   - kbuild: Disable duplicate unit-address warnings for disabled nodes"

* tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (114 commits)
  dt-bindings: net: mdio: Drop resets/reset-names child properties
  dt-bindings: clock: samsung: convert S5Pv210 to dtschema
  dt-bindings: clock: samsung: convert Exynos5410 to dtschema
  dt-bindings: clock: samsung: convert Exynos5260 to dtschema
  dt-bindings: clock: samsung: extend Exynos7 bindings with UFS
  dt-bindings: clock: samsung: convert Exynos7 to dtschema
  dt-bindings: clock: samsung: convert Exynos5433 to dtschema
  dt-bindings: i2c: maxim,max96712: Add bindings for Maxim Integrated MAX96712
  dt-bindings: iio: adi,ltc2983: Fix 64-bit property sizes
  dt-bindings: power: maxim,max17040: Fix incorrect type for 'maxim,rcomp'
  dt-bindings: interrupt-controller: arm,gic-v3: Fix 'interrupts' cell size in example
  dt-bindings: iio/magnetometer: yamaha,yas530: Fix invalid 'interrupts' in example
  dt-bindings: clock: imx5: Drop clock consumer node from example
  dt-bindings: Drop required 'interrupt-parent'
  dt-bindings: net: ti,dp83869: Drop value on boolean 'ti,max-output-impedance'
  dt-bindings: net: wireless: mt76: Fix 8-bit property sizes
  dt-bindings: PCI: snps,dw-pcie-ep: Drop conflicting 'max-functions' schema
  dt-bindings: i2c: st,stm32-i2c: Make each example a separate entry
  dt-bindings: net: stm32-dwmac: Make each example a separate entry
  dt-bindings: net: Cleanup MDIO node schemas
  ...
2022-01-12 16:47:05 -08:00
Linus Torvalds 48a60bdb2b - Add a set of thread_info.flags accessors which snapshot it before
accesing it in order to prevent any potential data races, and convert
 all users to those new accessors
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmHcgFoACgkQEsHwGGHe
 VUqXeRAAvcNEfFw6BvXeGfFTxKmOrsRtu2WCkAkjvamyhXMCrjBqqHlygLJFCH5i
 2mc6HBohzo4vBFcgi3R5tVkGazqlthY1KUM9Jpk7rUuUzi0phTH7n/MafZOm9Es/
 BHYcAAyT/NwZRbCN0geccIzBtbc4xr8kxtec7vkRfGDx8B9/uFN86xm7cKAaL62G
 UDs0IquDPKEns3A7uKNuvKztILtuZWD1WcSkbOULJzXgLkb+cYKO1Lm9JK9rx8Ds
 8tjezrJgOYGLQyyv0i3pWelm3jCZOKUChPslft0opvVUbrNd8piehvOm9CWopHcB
 QsYOWchnULTE9o4ZAs/1PkxC0LlFEWZH8bOLxBMTDVEY+xvmDuj1PdBUpncgJbOh
 dunHzsvaWproBSYUXA9nKhZWTVGl+CM8Ks7jXjl3IPynLd6cpYZ/5gyBVWEX7q3e
 8htG95NzdPPo7doxMiNSKGSmSm0Np1TJ/i89vsYeGfefsvsq53Fyjhu7dIuTWHmU
 2YUe6qHs6dF9x1bkHAAZz6T9Hs4BoGQBcXUnooT9JbzVdv2RfTPsrawdu8dOnzV1
 RhwCFdFcll0AIEl0T9fCYzUI/Ga8ZS0roXs5NZ4wl0lwr0BGFwiU8WC1FUdGsZo9
 0duaa0Tpv0OWt6rIMMB/E9QsqCDsQ4CMHuQpVVw+GOO5ux9kMms=
 =v6Xn
 -----END PGP SIGNATURE-----

Merge tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull thread_info flag accessor helper updates from Borislav Petkov:
 "Add a set of thread_info.flags accessors which snapshot it before
  accesing it in order to prevent any potential data races, and convert
  all users to those new accessors"

* tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  powerpc: Snapshot thread flags
  powerpc: Avoid discarding flags in system_call_exception()
  openrisc: Snapshot thread flags
  microblaze: Snapshot thread flags
  arm64: Snapshot thread flags
  ARM: Snapshot thread flags
  alpha: Snapshot thread flags
  sched: Snapshot thread flags
  entry: Snapshot thread flags
  x86: Snapshot thread flags
  thread_info: Add helpers to snapshot thread flags
2022-01-10 11:34:10 -08:00
Linus Torvalds 9b9e211360 arm64 updates for 5.17:
- KCSAN enabled for arm64.
 
 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.
 
 - Some more SVE clean-ups and refactoring in preparation for SME support
   (scalable matrix extensions).
 
 - BTI clean-ups (SYM_FUNC macros etc.)
 
 - arm64 atomics clean-up and codegen improvements.
 
 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal
   square root estimate).
 
 - Use SHA3 instructions to speed-up XOR.
 
 - arm64 unwind code refactoring/unification.
 
 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1
   (potentially set by a hypervisor; user-space already does this).
 
 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.
 
 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmHXNtYACgkQa9axLQDI
 XvHBGw/+OVGdbORxwrU+uRb7N6qIJkrW/mmM4x1KLo1i+REZLb8/VlXm0xC60FG+
 39x6FSVkRr+lLDfTqpQsOez5FpdsvOe9Fc4L3bwniDg+EPo7x65VmP2dw/Ae2q0i
 87xyWCczx5hFEPF/1sb1R1pm3bTXjeklBkdv+OXhwflLOwpCp1J8z8WJK8qJVFX6
 CmuE6Q4fDQr0ghl9Nf8DiAr20mHDh8wMKNUJOg4waaQOOCta6q1oJ3qfz6E9z1eW
 zEE3dfZgBCx7HCRc3KGgzT7H4Ces3BYvhBYP6bJRliVI88XdPiM4MfdGL4UIb27Q
 NLAdr+FVzk/YLzMHtxSfkT10nBqoOPWUTckLu9jIIl5cpBX73Wiz7jfzBvqFmC/y
 opSFMZ3lwQPM5WAPtAlZptA3GPPySeInVmvUgB7IQ+1Q1T1n8ri1y5hzTYC4Sc/g
 amJI1rXf1Al8+2zFBggr6Up+EOnfV9nAwrzLXkRlASsfmvY4dnVWg3NWfBqtEHAq
 VuZCecSgawxuSlpmJ4VGbLrBFaz18bn9EzujR5fFvi5Qcg1CMFOROi2+6IynopNV
 IS0R8j6fwgQPA5lcnNIPeJRRkQoqO4l8bPDzeXEny0BSw313EgBSo9aQtnjyIJbp
 BTuDHARKs+/NvDPvd8GQkxNPgwJnVOL9pdgNAolEu1/k7JtnIS0=
 =ecyi
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:

 - KCSAN enabled for arm64.

 - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD.

 - Some more SVE clean-ups and refactoring in preparation for SME
   support (scalable matrix extensions).

 - BTI clean-ups (SYM_FUNC macros etc.)

 - arm64 atomics clean-up and codegen improvements.

 - HWCAPs for FEAT_AFP (alternate floating point behaviour) and
   FEAT_RPRESS (increased precision of reciprocal estimate and
   reciprocal square root estimate).

 - Use SHA3 instructions to speed-up XOR.

 - arm64 unwind code refactoring/unification.

 - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP ==
   1 (potentially set by a hypervisor; user-space already does this).

 - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU,
   Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups.

 - Other fixes and clean-ups; highlights: fix the handling of erratum
   1418040, correct the calculation of the nomap region boundaries,
   introduce io_stop_wc() mapped to the new DGH instruction (data
   gathering hint).

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits)
  arm64: Use correct method to calculate nomap region boundaries
  arm64: Drop outdated links in comments
  arm64: perf: Don't register user access sysctl handler multiple times
  drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check
  perf/smmuv3: Fix unused variable warning when CONFIG_OF=n
  arm64: errata: Fix exec handling in erratum 1418040 workaround
  arm64: Unhash early pointer print plus improve comment
  asm-generic: introduce io_stop_wc() and add implementation for ARM64
  arm64: Ensure that the 'bti' macro is defined where linkage.h is included
  arm64: remove __dma_*_area() aliases
  docs/arm64: delete a space from tagged-address-abi
  arm64: Enable KCSAN
  kselftest/arm64: Add pidbench for floating point syscall cases
  arm64/fp: Add comments documenting the usage of state restore functions
  kselftest/arm64: Add a test program to exercise the syscall ABI
  kselftest/arm64: Allow signal tests to trigger from a function
  kselftest/arm64: Parameterise ptrace vector length information
  arm64/sve: Minor clarification of ABI documentation
  arm64/sve: Generalise vector length configuration prctl() for SME
  arm64/sve: Make sysctl interface for SVE reusable by SME
  ...
2022-01-10 08:49:37 -08:00
Greg Kroah-Hartman 2bdf3f9e9d powerpc/cacheinfo: use default_groups in kobj_type
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field.  Move the powerpc cacheinfo sysfs code to use default_groups
field which has been the preferred way since aa30f47cf6 ("kobject: Add
support for default attribute groups to kobj_type") so that we can soon
get rid of the obsolete default_attrs field.

Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20220104155450.1291277-1-gregkh@linuxfoundation.org
2022-01-05 10:58:23 +11:00
Michael Ellerman fd1eaaaaa6 powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings
When CONFIG_PPC_RFI_SRR_DEBUG=y we check the SRR values before returning
from interrupts. This is done in asm using EMIT_BUG_ENTRY, and passing
BUGFLAG_WARNING.

However that fails to create an exception table entry for the warning,
and so do_program_check() fails the exception table search and proceeds
to call _exception(), resulting in an oops like:

  Oops: Exception in kernel mode, sig: 5 [#1]
  LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
  Modules linked in:
  CPU: 2 PID: 1204 Comm: sigreturn_unali Tainted: P                  5.16.0-rc2-00194-g91ca3d4f77c5 #12
  NIP:  c00000000000c5b0 LR: 0000000000000000 CTR: 0000000000000000
  ...
  NIP [c00000000000c5b0] system_call_common+0x150/0x268
  LR [0000000000000000] 0x0
  Call Trace:
  [c00000000db73e10] [c00000000000c558] system_call_common+0xf8/0x268 (unreliable)
  ...
  Instruction dump:
  7cc803a6 888d0931 2c240000 4082001c 38800000 988d0931 e8810170 e8a10178
  7c9a03a6 7cbb03a6 7d7a02a6 e9810170 <7f0b6088> 7d7b02a6 e9810178 7f0b6088

We should instead use EMIT_WARN_ENTRY, which creates an exception table
entry for the warning, allowing the warning to be correctly recognised,
and the code to resume after printing the warning.

Note however that because this warning is buried deep in the interrupt
return path, we are not able to recover from it (due to MSR_RI being
clear), so we still end up in die() with an unrecoverable exception.

Fixes: 59dc5bfca0 ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid")
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221135101.2085547-2-mpe@ellerman.id.au
2021-12-25 10:56:01 +11:00
Michael Ellerman 314f6c23dd powerpc/64s: Mask NIP before checking against SRR0
When CONFIG_PPC_RFI_SRR_DEBUG=y we check that NIP and SRR0 match when
returning from interrupts. This can trigger falsely if NIP has either of
its two low bits set via sigreturn or ptrace, while SRR0 has its low two
bits masked in hardware.

As a quick fix make sure to mask the low bits before doing the check.

Fixes: 59dc5bfca0 ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid")
Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com>
Link: https://lore.kernel.org/r/20211221135101.2085547-1-mpe@ellerman.id.au
2021-12-25 10:55:55 +11:00
Christophe Leroy bba496656a powerpc/32: Fix boot failure with GCC latent entropy plugin
Boot fails with GCC latent entropy plugin enabled.

This is due to early boot functions trying to access 'latent_entropy'
global data while the kernel is not relocated at its final
destination yet.

As there is no way to tell GCC to use PTRRELOC() to access it,
disable latent entropy plugin in early_32.o and feature-fixups.o and
code-patching.o

Fixes: 38addce8b6 ("gcc-plugins: Add latent_entropy plugin")
Cc: stable@vger.kernel.org # v4.9+
Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=215217
Link: https://lore.kernel.org/r/2bac55483b8daf5b1caa163a45fa5f9cdbe18be4.1640178426.git.christophe.leroy@csgroup.eu
2021-12-23 22:36:58 +11:00
Alexey Kardashevskiy d51f86cfd8 powerpc/mm: Switch obsolete dssall to .long
The dssall ("Data Stream Stop All") instruction is obsolete altogether
with other Data Cache Instructions since ISA 2.03 (year 2006).

LLVM IAS does not support it but PPC970 seems to be using it.
This switches dssall to .long as there is no much point in fixing LLVM.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
2021-12-23 22:35:13 +11:00
Daniel Axtens d72c4a36d7 powerpc/64/asm: Do not reassign labels
The LLVM integrated assembler really does not like us reassigning things
to the same label:

<instantiation>:7:9: error: invalid reassignment of non-absolute variable 'fs_label'

This happens across a bunch of platforms:
https://github.com/ClangBuiltLinux/linux/issues/1043
https://github.com/ClangBuiltLinux/linux/issues/1008
https://github.com/ClangBuiltLinux/linux/issues/920
https://github.com/ClangBuiltLinux/linux/issues/1050

There is no hope of getting this fixed in LLVM (see
https://github.com/ClangBuiltLinux/linux/issues/1043#issuecomment-641571200
and https://bugs.llvm.org/show_bug.cgi?id=47798#c1 )
so if we want to build with LLVM_IAS, we need to hack
around it ourselves.

For us the big problem comes from this:

\#define USE_FIXED_SECTION(sname)				\
	fs_label = start_##sname;				\
	fs_start = sname##_start;				\
	use_ftsec sname;

\#define USE_TEXT_SECTION()
	fs_label = start_text;					\
	fs_start = text_start;					\
	.text

and in particular fs_label.

This works around it by not setting those 'variables' and requiring
that users of the variables instead track for themselves what section
they are in. This isn't amazing, by any stretch, but it gets us further
in the compilation.

Note that even though users have to keep track of the section, using
a wrong one produces an error with both binutils and llvm which prevents
from using wrong section at the compile time:

llvm error example:

AS      arch/powerpc/kernel/head_64.o
<unknown>:0: error: Cannot represent a difference across sections
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

binutils error example:

/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S: Assembler messages:
/home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S:1974: Error: can't resolve `system_call_common' {.text section} - `start_r
eal_vectors' {.head.text.real_vectors section}
make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-5-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alexey Kardashevskiy fd98395797 powerpc/64/asm: Inline BRANCH_TO_C000
It is used just once and does not really help with readability, remove it.

Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-4-aik@ozlabs.ru
2021-12-23 22:35:12 +11:00
Alan Modra a3ad84da07 powerpc/toc: Future proof kernel toc
This patch future-proofs the kernel against linker changes that might
put the toc pointer at some location other than .got+0x8000, by
replacing __toc_start+0x8000 with .TOC. throughout.  If the kernel's
idea of the toc pointer doesn't agree with the linker, bad things
happen.

prom_init.c code relocating its toc is also changed so that a symbolic
__prom_init_toc_start toc-pointer relative address is calculated
rather than assuming that it is always at toc-pointer - 0x8000.  The
length calculations loading values from the toc are also avoided.
It's a little incestuous to do that with unreloc_toc picking up
adjusted values (which is fine in practice, they both adjust by the
same amount if all goes well).

I've also changed the way .got is aligned in vmlinux.lds and
zImage.lds, mostly so that dumping out section info by objdump or
readelf plainly shows the alignment is 256.  This linker script
feature was added 2005-09-27, available in FSF binutils releases from
2.17 onwards.  Should be safe to use in the kernel, I think.

Finally, put *(.got) before the prom_init.o entry which only needs
*(.toc), so that the GOT header goes in the correct place.  I don't
believe this makes any difference for the kernel as it would for
dynamic objects being loaded by ld.so.  That change is just to stop
lusers who blindly copy kernel scripts being led astray.  Of course,
this change needs the prom_init.c changes.

Some notes on .toc and .got.

.toc is a compiler generated section of addresses.  .got is a linker
generated section of addresses, generally built when the linker sees
R_*_*GOT* relocations.  In the case of powerpc64 ld.bfd, there are
multiple generated .got sections, one per input object file.  So you
can somewhat reasonably write in a linker script an input section
statement like *prom_init.o(.got .toc) to mean "the .got and .toc
section for files matching *prom_init.o".  On other architectures that
doesn't make sense, because the linker generally has just one .got
section.  Even on powerpc64, note well that the GOT entries for
prom_init.o may be merged with GOT entries from other objects.  That
means that if prom_init.o references, say, _end via some GOT
relocation, and some other object also references _end via a GOT
relocation, the GOT entry for _end may be in the range
__prom_init_toc_start to __prom_init_toc_end and if the kernel does
something special to GOT/TOC entries in that range then the value of
_end as seen by objects other than prom_init.o will be affected.  On
the other hand the GOT entry for _end may not be in the range
__prom_init_toc_start to __prom_init_toc_end.  Which way it turns out
is deterministic but a detail of linker operation that should not be
relied on.

A feature of ld.bfd is that input .toc (and .got) sections matching
one linker input section statement may be sorted, to put entries used
by small-model code first, near the toc base.  This is why scripts for
powerpc64 normally use *(.got .toc) rather than *(.got) *(.toc), since
the first form allows more freedom to sort.

Another feature of ld.bfd is that indirect addressing sequences using
the GOT/TOC may be edited by the linker to relative addressing.  In
many cases relative addressing would be emitted by gcc for
-mcmodel=medium if you appropriately decorate variable declarations
with non-default visibility.

The original patch is here:
https://lore.kernel.org/linuxppc-dev/20210310034813.GM6042@bubble.grove.modra.org/

Signed-off-by: Alan Modra <amodra@au1.ibm.com>
[aik: removed non-relocatable which is gone in 24d33ac5b8]
[aik: added <=2.24 check]
[aik: because of llvm-as, kernel_toc_addr() uses "mr" instead of global register variable]
Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211221055904.555763-2-aik@ozlabs.ru
2021-12-23 22:35:01 +11:00
Nick Child d276960d92 powerpc/kernel: Add __init attribute to eligible functions
Some functions defined in `arch/powerpc/kernel` (and one in `arch/powerpc/
kexec`) are deserving of an `__init` macro attribute. These functions are
only called by other initialization functions and therefore should inherit
the attribute.
Also, change function declarations in header files to include `__init`.

Signed-off-by: Nick Child <nick.child@ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211216220035.605465-2-nick.child@ibm.com
2021-12-23 22:33:10 +11:00
Linus Torvalds 713ab911f2 powerpc fixes for 5.16 #4
Fix a recently introduced oops at boot on 85xx in some configurations.
 
 Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX.
 
 Thanks to: Joe Lawrence, Russell Currey, Xiaoming Ni.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmG+q+UTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgG5sEACVsb39+Q8w0GKJxUGR0Wri1InJTkLC
 UWpd9PVzG6n8+Epl0f+H5cMlv5V6xaDzTsNXEf27xIACMlPSVHVLDW8u/RFPizdp
 fUDUweGUvK/M0GQW89nRV29DjdNjMmUq1KQFM6eEgYN/0nS3HqF27jamiaFhiZ+e
 pJl1g2dy/DTWrFz3V/8b0nv6kyJE8GlpnCmTchDsPNDbzWQVJXVfM7IB/9N+EVna
 MQAZCyKSOafTgNRPOW+Sm4oF64z+Yn4cxURVKfzbM+0Cd8pNIcGZ7McXrnp0DR4t
 bLGeYEzyR1bPc3VTt7b+jafDP41AakRllKAFz65+pdYOcV1hTRM4inbLVPT5Bmz6
 7ANlX/w4BwYWFk3mWW14BoxU02FlRKRddXhusq7x2wIqvtn1KI8Qm1xpneiGpcW+
 GEe+k9ONGowXp/bc47/Co6dPSHKRRHUC3MsgbDv2Y7M5G9vsetckdQzA2bIqYSx8
 r1ZXNiMwQarPeB6FrGvNbekyCsIhJkk6chhtL0k27BVOFqYiHZtaqBs+RCQYPdC8
 Aih8WOEqdldbItFI3MihyKULucgkVpdnm80Ja67qRthuJICF9OML3mdrt83bxEIp
 fu4Bsw8dygVKJWScZj8AUC/ZBxOrrTGyZR1RvsHpaAiEfinMkxxTnXjqNDiSZWXt
 DKKLx6pfSFrokw==
 =Nmp5
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix a recently introduced oops at boot on 85xx in some configurations.

  Fix crashes when loading some livepatch modules with
  STRICT_MODULE_RWX.

  Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni"

* tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/module_64: Fix livepatching for RO modules
  powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
2021-12-19 11:31:14 -08:00
Rob Herring 1f012283e9 of/fdt: Rework early_init_dt_scan_memory() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_memory() to be called directly and use libfdt.

Cc: John Crispin <john@phrozen.org>
Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linux-mips@vger.kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211215150102.1303588-1-robh@kernel.org
2021-12-16 16:07:52 -06:00
Rob Herring d665881d21 of/fdt: Rework early_init_dt_scan_root() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_root()
to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-3-robh@kernel.org
2021-12-16 16:07:48 -06:00
Rob Herring 60f20d84dc of/fdt: Rework early_init_dt_scan_chosen() to call directly
Use of the of_scan_flat_dt() function predates libfdt and is discouraged
as libfdt provides a nicer set of APIs. Rework
early_init_dt_scan_chosen() to be called directly and use libfdt.

Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Frank Rowand <frowand.list@gmail.com>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Rob Herring <robh@kernel.org>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Link: https://lore.kernel.org/r/20211118181213.1433346-2-robh@kernel.org
2021-12-16 16:07:41 -06:00
Nicholas Piggin af47d79b04 powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interrupts
Reading the CFAR register is quite costly (~20 cycles on POWER9). It is
a good idea to have for most synchronous interrupts, but for async ones
it is much less important.

Doorbell, external, and decrementer interrupts are the important
asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible,
because it might be a guest exit that requires CFAR preserved. But the
important pseries interrupts can avoid loading CFAR.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-7-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin 0faf20a1ad powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use
Enabling MSR[EE] in interrupt handlers while interrupts are still soft
masked allows PMIs to profile interrupt handlers to some degree, beyond
what SIAR latching allows.

When perf is not being used, this is almost useless work. It requires an
extra mtmsrd in the irq handler, and it also opens the door to masked
interrupts hitting and requiring replay, which is more expensive than
just taking them directly. This effect can be noticable in high IRQ
workloads.

Avoid enabling MSR[EE] unless perf is currently in use. This saves about
60 cycles (or 8%) on a simple decrementer interrupt microbenchmark.
Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%.

This does prevent the soft-nmi interrupt being taken in these handlers,
but that's not too reliable anyway. The SMP watchdog will continue to be
the reliable way to catch lockups.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Nicholas Piggin ff0b0d6e1a powerpc/64s/interrupt: handle MSR EE and RI in interrupt entry wrapper
The mtmsrd to enable MSR[RI] can be combined with the mtmsrd to enable
MSR[EE] in interrupt entry code, for those interrupts which enable EE.
This helps performance of important synchronous interrupts (e.g., page
faults).

This is similar to what commit dd152f70bd ("powerpc/64s: system call
avoid setting MSR[RI] until we set MSR[EE]") does for system calls.

Do this by enabling EE and RI together at the beginning of the entry
wrapper if PACA_IRQ_HARD_DIS is clear, and only enabling RI if it is
set.

Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones
leave it unchanged, so by default they always get EE=1 unless they have
interrupted a caller that is hard disabled. When the sync interrupt
later calls interrupt_cond_local_irq_enable(), it will not require
another mtmsrd because MSR[EE] was already enabled here.

This avoids one mtmsrd L=1 for synchronous interrupts on 64s, which
saves about 20 cycles on POWER9. And for kernel-mode interrupts, both
synchronous and asynchronous, this saves an additional 40 cycles due to
the mtmsrd being moved ahead of mfspr SPRN_AMR, which prevents a SPR
scoreboard stall.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20210922145452.352571-3-npiggin@gmail.com
2021-12-16 21:31:45 +11:00
Russell Currey 8734b41b3e powerpc/module_64: Fix livepatching for RO modules
Livepatching a loaded module involves applying relocations through
apply_relocate_add(), which attempts to write to read-only memory when
CONFIG_STRICT_MODULE_RWX=y.  Work around this by performing these
writes through the text poke area by using patch_instruction().

R_PPC_REL24 is the only relocation type generated by the kpatch-build
userspace tool or klp-convert kernel tree that I observed applying a
relocation to a post-init module.

A more comprehensive solution is planned, but using patch_instruction()
for R_PPC_REL24 on should serve as a sufficient fix.

This does have a performance impact, I observed ~15% overhead in
module_load() on POWER8 bare metal with checksum verification off.

Fixes: c35717c71e ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX")
Cc: stable@vger.kernel.org # v5.14+
Reported-by: Joe Lawrence <joe.lawrence@redhat.com>
Signed-off-by: Russell Currey <ruscur@russell.cc>
Tested-by: Joe Lawrence <joe.lawrence@redhat.com>
[mpe: Check return codes from patch_instruction()]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au
2021-12-14 23:13:03 +11:00
Eric W. Biederman 0e25498f8c exit: Add and use make_task_dead.
There are two big uses of do_exit.  The first is it's design use to be
the guts of the exit(2) system call.  The second use is to terminate
a task after something catastrophic has happened like a NULL pointer
in kernel code.

Add a function make_task_dead that is initialy exactly the same as
do_exit to cover the cases where do_exit is called to handle
catastrophic failure.  In time this can probably be reduced to just a
light wrapper around do_task_dead. For now keep it exactly the same so
that there will be no behavioral differences introducing this new
concept.

Replace all of the uses of do_exit that use it for catastraphic
task cleanup with make_task_dead to make it clear what the code
is doing.

As part of this rename rewind_stack_do_exit
rewind_stack_and_make_dead.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-13 12:04:45 -06:00
Peter Zijlstra 1614b2b11f arch: Make ARCH_STACKWALK independent of STACKTRACE
Make arch_stack_walk() available for ARCH_STACKWALK architectures
without it being entangled in STACKTRACE.

Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
[Mark: rebase, drop unnecessary arm change]
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Paul Walmsley <paul.walmsley@sifive.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-10 14:06:03 +00:00
Hari Bathini 06e629c25d powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panic
In panic path, fadump is triggered via a panic notifier function.
Before calling panic notifier functions, smp_send_stop() gets called,
which stops all CPUs except the panic'ing CPU. Commit 8389b37dff
("powerpc: stop_this_cpu: remove the cpu from the online map.") and
again commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
started marking CPUs as offline while stopping them. So, if a kernel
has either of the above commits, vmcore captured with fadump via panic
path would not process register data for all CPUs except the panic'ing
CPU. Sample output of crash-utility with such vmcore:

  # crash vmlinux vmcore
  ...
        KERNEL: vmlinux
      DUMPFILE: vmcore  [PARTIAL DUMP]
          CPUS: 1
          DATE: Wed Nov 10 09:56:34 EST 2021
        UPTIME: 00:00:42
  LOAD AVERAGE: 2.27, 0.69, 0.24
         TASKS: 183
      NODENAME: XXXXXXXXX
       RELEASE: 5.15.0+
       VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021
       MACHINE: ppc64le  (2500 Mhz)
        MEMORY: 8 GB
         PANIC: "Kernel panic - not syncing: sysrq triggered crash"
           PID: 3394
       COMMAND: "bash"
          TASK: c0000000150a5f80  [THREAD_INFO: c0000000150a5f80]
           CPU: 1
         STATE: TASK_RUNNING (PANIC)

  crash> p -x __cpu_online_mask
  __cpu_online_mask = $1 = {
    bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>
  crash>
  crash> p -x __cpu_active_mask
  __cpu_active_mask = $2 = {
    bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}
  }
  crash>

While this has been the case since fadump was introduced, the issue
was not identified for two probable reasons:

  - In general, the bulk of the vmcores analyzed were from crash
    due to exception.

  - The above did change since commit 8341f2f222 ("sysrq: Use
    panic() to force a crash") started using panic() instead of
    deferencing NULL pointer to force a kernel crash. But then
    commit de6e5d3841 ("powerpc: smp_send_stop do not offline
    stopped CPUs") stopped marking CPUs as offline till kernel
    commit bab26238bb ("powerpc: Offline CPU in stop_this_cpu()")
    reverted that change.

To ensure post processing register data of all other CPUs happens
as intended, let panic() function take the crash friendly path (read
crash_smp_send_stop()) with the help of crash_kexec_post_notifiers
option. Also, as register data for all CPUs is captured by f/w, skip
IPI callbacks here for fadump, to avoid any complications in finding
the right backtraces.

Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Hari Bathini 219572d2fc powerpc: handle kdump appropriately with crash_kexec_post_notifiers option
Kdump can be triggered after panic_notifers since commit f06e5153f4
("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump
after panic_notifers") introduced crash_kexec_post_notifiers option.
But using this option would mean smp_send_stop(), that marks all other
CPUs as offline, gets called before kdump is triggered. As a result,
kdump routines fail to save other CPUs' registers. To fix this, kdump
friendly crash_smp_send_stop() function was introduced with kernel
commit 0ee59413c9 ("x86/panic: replace smp_send_stop() with kdump
friendly version in panic path"). Override this kdump friendly weak
function to handle crash_kexec_post_notifiers option appropriately
on powerpc.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hari Bathini <hbathini@linux.ibm.com>
[Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com
2021-12-09 22:41:22 +11:00
Christophe Leroy 07b863aef5 powerpc/inst: Define ppc_inst_t as u32 on PPC32
Unlike PPC64 ABI, PPC32 uses the stack to pass a parameter defined
as a struct, even when the struct has a single simple element.

To avoid that, define ppc_inst_t as u32 on PPC32.

Keep it as 'struct ppc_inst' when __CHECKER__ is defined so that
sparse can perform type checking.

Also revert commit 511eea5e2c ("powerpc/kprobes: Fix Oops by passing
ppc_inst as a pointer to emulate_step() on ppc32") as now the
instruction to be emulated is passed as a register to emulate_step().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/c6d0c46f598f76ad0b0a88bc0d84773bd921b17c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy c545b9f040 powerpc/inst: Define ppc_inst_t
In order to stop using 'struct ppc_inst' on PPC32,
define a ppc_inst_t typedef.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:21 +11:00
Christophe Leroy 4f6a025201 powerpc/kuap: Wire-up KUAP on 85xx in 32 bits mode.
This adds KUAP support to 85xx in 32 bits mode.
This is done by reading the content of SPRN_MAS1 and checking
the TID at the time user pgtable is loaded.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/f8696f8980ca1532ada3a2f0e0a03e756269c7fe.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy fcf9bb6d32 powerpc/kuap: Wire-up KUAP on 40x
This adds KUAP support to 40x. This is done by checking
the content of SPRN_PID at the time user pgtable is loaded.

40x doesn't have KUEP, but KUAP implies KUEP because when the
PID doesn't match the page's PID, the page cannot be read nor
executed.

So KUEP is now automatically selected when KUAP is selected and
disabled when KUAP is disabled.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/aaefa91897ddc42ac11019dc0e1d1a525bd08e90.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy f6fad4fb55 powerpc/kuap: Wire-up KUAP on 44x
This adds KUAP support to 44x. This is done by checking
the content of SPRN_PID at the time it is read and written
into SPRN_MMUCR.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/7d6c3f1978a26feada74b084f651e8cf1e3b3a47.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:20 +11:00
Christophe Leroy 43afcf8f01 powerpc: Add KUAP support for BOOKE and 40x
On booke/40x we don't have segments like book3s/32.
On booke/40x we don't have access protection groups like 8xx.

Use the PID register to provide user access protection.
Kernel address space can be accessed with any PID.
User address space has to be accessed with the PID of the user.
User PID is always not null.

Everytime the kernel is entered, set PID register to 0 and
restore PID register when returning to user.

Everytime kernel needs to access user data, PID is restored
for the access.

In TLB miss handlers, check the PID and bail out to data storage
exception when PID is 0 and accessed address is in user space.

Note that also forbids execution of user text by kernel except
when user access is unlocked. But this shouldn't be a problem
as the kernel is not supposed to ever run user text.

This patch prepares the infrastructure but the real activation of KUAP
is done by following patches for each processor type one by one.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/5d65576a8e31e9480415785a180c92dd4e72306d.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy 42e03bc524 powerpc/kuap: Prepare for supporting KUAP on BOOK3E/64
Also call kuap_lock() and kuap_save_and_lock() from
interrupt functions with CONFIG_PPC64.

For book3s/64 we keep them empty as it is done in assembly.

Also do the locked assert when switching task unless it is
book3s/64.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1cbf94e26e6d6e2e028fd687588a7e6622d454a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy 047a6fd401 powerpc/config: Add CONFIG_BOOKE_OR_40x
We have many functionnalities common to 40x and BOOKE, it leads to
many places with #if defined(CONFIG_BOOKE) || defined(CONFIG_40x).

We are going to add a few more with KUAP for booke/40x, so create
a new symbol which is defined when either BOOKE or 40x is defined.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/9a3dbd60924cb25c9f944d3d8205ac5a0d15e229.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy 937fb7003e powerpc/kuap: Add kuap_lock()
Add kuap_lock() and call it when entering interrupts from user.

It is called kuap_lock() as it is similar to kuap_save_and_lock()
without the save.

However book3s/32 already have a kuap_lock(). Rename it
kuap_lock_addr().

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/4437e2deb9f6f549f7089d45e9c6f96a7e77905a.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:19 +11:00
Christophe Leroy 70428da94c powerpc/32s: Save content of sr0 to avoid 'mfsr'
Calling 'mfsr' to get the content of segment registers is heavy,
in addition it requires clearing of the 'reserved' bits.

In order to avoid this operation, save it in mm context and in
thread struct.

The saved sr0 is the one used by kernel, this means that on
locking entry it can be used as is.

For unlocking, the only thing to do is to clear SR_NX.

This improves null_syscall selftest by 12 cycles, ie 4%.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/b02baf2ed8f09bad910dfaeeb7353b2ae6830525.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy 526d4a4c77 powerpc/32s: Do kuep_lock() and kuep_unlock() in assembly
When interrupt and syscall entries where converted to C, KUEP locking
and unlocking was also converted. It improved performance by unrolling
the loop, and allowed easily implementing boot time deactivation of
KUEP.

However, null_syscall selftest shows that KUEP is still heavy
(361 cycles with KUEP, 212 cycles without).

A way to improve more is to group 'mtsr's together, instead of
repeating 'addi' + 'mtsr' several times.

In order to do that, more registers need to be available. In C, GCC
will always be able to provide the requested number of registers, but
at the cost of saving some data on the stack, which is counter
performant here.

So let's do it in assembly, when we have full control of which
register can be used. It also has the advantage of locking earlier
and unlocking later and it helps GCC generating less tricky code.
The only drawback is to make boot time deactivation less straight
forward and require 'hand' instruction patching.

Group 'mtsr's by 4.

With this change, null_syscall selftest reports 336 cycles. Without
the change it was 361 cycles, that's a 7% reduction.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/115cb279e9b9948dfd93a065e047081c59e3a2a6.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy dc3a0e5b83 powerpc/book3e: Activate KUEP at all time
On book3e,
- When using 64 bits PTE: User pages don't have the SX bit defined
so KUEP is always active.
- When using 32 bits PTE: Implement KUEP by clearing SX bit during
TLB miss for user pages. The impact is minimal and worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/e376b114283fb94504e2aa2de846780063252cde.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy ee2631603f powerpc/44x: Activate KUEP at all time
On 44x, KUEP is implemented by clearing SX bit during TLB miss
for user pages. The impact is minimal and not worth neither
boot time nor build time selection.

Activate it at all time.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/2414d662558e7fb27d1ed41c8e47c591d576acac.1634627931.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:17 +11:00
Christophe Leroy 06e7cbc29e powerpc/40x: Map 32Mbytes of memory at startup
As reported by Carlo, 16Mbytes is not enough with modern kernels
that tend to be a bit big, so map another 16M page at boot.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/89b5f974a7fa5011206682cd092e2c905530ff46.1632755552.git.christophe.leroy@csgroup.eu
2021-12-09 22:41:16 +11:00
Nicholas Piggin 387e220a2e powerpc/64s: Move hash MMU support code under CONFIG_PPC_64S_HASH_MMU
Compiling out hash support code when CONFIG_PPC_64S_HASH_MMU=n saves
128kB kernel image size (90kB text) on powernv_defconfig minus KVM,
350kB on pseries_defconfig minus KVM, 40kB on a tiny config.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
[mpe: Fixup defined(ARCH_HAS_MEMREMAP_COMPAT_ALIGN), which needs CONFIG.
      Fix radix_enabled() use in setup_initial_memory_limit(). Add some
      stubs to reduce number of ifdefs.]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-18-npiggin@gmail.com
2021-12-09 22:41:13 +11:00
Nicholas Piggin c28573744b powerpc/64s: Make hash MMU support configurable
This adds Kconfig selection which allows 64s hash MMU support to be
disabled. It can be disabled if radix support is enabled, the minimum
supported CPU type is POWER9 (or higher), and KVM is not selected.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-17-npiggin@gmail.com
2021-12-09 22:40:24 +11:00
Nicholas Piggin ffbe5d21d1 powerpc/64: pcpu setup avoid reading mmu_linear_psize on 64e or radix
Radix never sets mmu_linear_psize so it's always 4K, which causes pcpu
atom_size to always be PAGE_SIZE. 64e sets it to 1GB always.

Make paths for these platforms to be explicit about what value they set
atom_size to.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-12-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin 310dce6201 powerpc/64s: Make flush_and_reload_slb a no-op when radix is enabled
The radix test can exclude slb_flush_all_realmode() from being called
because flush_and_reload_slb() is only expected to flush ERAT when
called by flush_erat(), which is only on pre-ISA v3.0 CPUs that do not
support radix.

This helps the later change to make hash support configurable to not
introduce runtime changes to radix mode behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-9-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Nicholas Piggin 935b534c24 powerpc/64s: Move and rename do_bad_slb_fault as it is not hash specific
slb.c is hash-specific SLB management, but do_bad_slb_fault deals with
segment interrupts that occur with radix MMU as well.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211201144153.2456614-5-npiggin@gmail.com
2021-12-02 22:57:23 +11:00
Kees Cook 62ea67e319 powerpc/signal32: Use struct_group() to zero spe regs
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memset(), avoid intentionally writing across
neighboring fields.

Add a struct_group() for the spe registers so that memset() can correctly reason
about the size:

   In function 'fortify_memset_chk',
       inlined from 'restore_user_regs.part.0' at arch/powerpc/kernel/signal_32.c:539:3:
   >> include/linux/fortify-string.h:195:4: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
     195 |    __write_overflow_field();
         |    ^~~~~~~~~~~~~~~~~~~~~~~~

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211118203604.1288379-1-keescook@chromium.org
2021-12-02 10:39:00 +11:00
Mark Rutland 985faa7868 powerpc: Snapshot thread flags
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Generally this is unlikely to cause a
problem in practice, but it is somewhat unsound, and KCSAN will
legitimately warn that there is a data race.

To avoid such issues, a snapshot of the flags has to be taken prior to
using them. Some places already use READ_ONCE() for that, others do not.

Convert them all to the new flag accessor helpers.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Link: https://lore.kernel.org/r/20211129130653.2037928-11-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Mark Rutland 08b0af5b2a powerpc: Avoid discarding flags in system_call_exception()
Some thread flags can be set remotely, and so even when IRQs are disabled,
the flags can change under our feet. Thus, when setting flags we must use
an atomic operation rather than a plain read-modify-write sequence, as a
plain read-modify-write may discard flags which are concurrently set by a
remote thread, e.g.

	// task A			// task B
	tmp = A->thread_info.flags;
					set_tsk_thread_flag(A, NEWFLAG_B);
	tmp |= NEWFLAG_A;
	A->thread_info.flags = tmp;

arch/powerpc/kernel/interrupt.c's system_call_exception() sets
_TIF_RESTOREALL in the thread info flags with a read-modify-write, which
may result in other flags being discarded.

Elsewhere in the file it uses clear_bits() to atomically remove flag bits,
so use set_bits() here for consistency with those.

There may be reasons (e.g. instrumentation) that prevent the use of
set_thread_flag() and clear_thread_flag() here, which would otherwise be
preferable.

Fixes: ae7aaecc3f ("powerpc/64s: system call rfscv workaround for TM bugs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Eirik Fuller <efuller@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nicholas Piggin <npiggin@gmail.com>
Link: https://lore.kernel.org/r/20211129130653.2037928-10-mark.rutland@arm.com
2021-12-01 00:06:44 +01:00
Christophe Leroy f1797e4de1 powerpc/modules: Don't WARN on first module allocation attempt
module_alloc() first tries to allocate module text within 24 bits direct
jump from kernel text, and tries a wider allocation if first one fails.

When first allocation fails the following is observed in kernel logs:

  vmap allocation for size 2400256 failed: use vmalloc=<size> to increase size
  systemd-udevd: vmalloc error: size 2395133, vm_struct allocation failed, mode:0xcc0(GFP_KERNEL), nodemask=(null)
  CPU: 0 PID: 127 Comm: systemd-udevd Tainted: G        W         5.15.5-gentoo-PowerMacG4 #9
  Call Trace:
  [e2a53a50] [c0ba0048] dump_stack_lvl+0x80/0xb0 (unreliable)
  [e2a53a70] [c0540128] warn_alloc+0x11c/0x2b4
  [e2a53b50] [c0531be8] __vmalloc_node_range+0xd8/0x64c
  [e2a53c10] [c00338c0] module_alloc+0xa0/0xac
  [e2a53c40] [c027a368] load_module+0x2ae0/0x8148
  [e2a53e30] [c027fc78] sys_finit_module+0xfc/0x130
  [e2a53f30] [c0035098] ret_from_syscall+0x0/0x28
  ...

Add __GFP_NOWARN flag to first allocation so that no warning appears
when it fails.

Reported-by: Erhard Furtner <erhard_f@mailbox.org>
Fixes: 2ec13df167 ("powerpc/modules: Load modules closer to kernel text")
Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/93c9b84d6ec76aaf7b4f03468e22433a6d308674.1638267035.git.christophe.leroy@csgroup.eu
2021-11-30 22:44:32 +11:00
Nicholas Piggin aebd1fb45c powerpc: flexible GPR range save/restore macros
Introduce macros that operate on a (start, end) range of GPRs, which
reduces lines of code and need to do mental arithmetic while reading the
code.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Segher Boessenkool <segher@kernel.crashing.org>
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211022061322.2671178-1-npiggin@gmail.com
2021-11-29 23:15:20 +11:00
Nicholas Piggin e012c49998 powerpc/watchdog: help remote CPUs to flush NMI printk output
The printk layer at the moment does not seem to have a good way to force
flush printk messages that are created in NMI context, except in the
panic path.

NMI-context printk messages normally get to the console with irq_work,
but that won't help if the CPU is stuck with irqs disabled, as can be
the case for hard lockup watchdog messages.

The watchdog currently flushes the printk buffers after detecting a
lockup on remote CPUs, but they may not have processed their NMI IPI
yet by that stage, or they may have self-detected a lockup in which
case they won't go via this NMI IPI path.

Improve the situation by having NMI-context mark a flag if it called
printk, and have watchdog timer interrupts check if that flag was set
and try to flush if it was. Latency is not a big problem because we
were already stuck for a while, just need to try to make sure the
messages eventually make it out.

Depends-on: 5d5e4522a7 ("printk: restore flushing of NMI buffers on remote CPUs after NMI backtraces")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211119113146.752759-6-npiggin@gmail.com
2021-11-29 23:08:43 +11:00
Christophe Leroy 57dd3a7bdf powerpc: Don't bother about .data..Lubsan sections
Since commit 9a427556fb ("vmlinux.lds.h: catch compound literals
into data and BSS") .data..Lubsan sections are taken into account
in DATA_MAIN which is included in DATA_DATA macro.

No need to take care of them anymore in powerpc vmlinux.lds.S

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/3eb14570612eef17e01bb67f14a4450136001794.1637840601.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy 7dfbfb87c2 powerpc/ftrace: Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on PPC32
Unlike PPC64, PPC32 doesn't require any special compiler option
to get _mcount() call not clobbering registers.

Provide ftrace_regs_caller() and ftrace_regs_call() and activate
HAVE_DYNAMIC_FTRACE_WITH_REGS.

That's heavily copied from ftrace_64_mprofile.S

For the time being leave livepatching aside, it will come with
following patch.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1862dc7719855cc2a4eec80920d94c955877557e.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy c93d4f6ecf powerpc/ftrace: Add module_trampoline_target() for PPC32
module_trampoline_target() is used by __ftrace_modify_call().

Implement it for PPC32 so that CONFIG_DYNAMIC_FTRACE_WITH_REGS
can be activated on PPC32 as well.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/42345f464fb465f0fc76f3090e250be8fc1729f0.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Christophe Leroy 88670fdb26 powerpc/ftrace: No need to read LR from stack in _mcount()
All functions calling _mcount do it exactly the same way, with the
following sequence of instructions:

	c07de788:       7c 08 02 a6     mflr    r0
	c07de78c:       90 01 00 04     stw     r0,4(r1)
	c07de790:       4b 84 13 65     bl      c001faf4 <_mcount>

Allthough LR is pushed on stack, it is still in r0 while entering
_mcount().

Function arguments are in r3-r10, so r11 and r12 are still available
at that point.

Do like PPC64 and use r12 to move LR into CTR, so that r0 is preserved
and doesn't need to be restored from the stack.

While at it, bring back the EXPORT_SYMBOL at the end of _mcount.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/24a3ba7db388537c44a038026f926d885372e6d3.1635423081.git.christophe.leroy@csgroup.eu
2021-11-29 22:49:29 +11:00
Michael Ellerman ab85a27395 powerpc: Mark probe_machine() __init and static
Prior to commit b1923caa6e ("powerpc: Merge 32-bit and 64-bit
setup_arch()") probe_machine() was called from setup_32/64.c and lived
in setup-common.c. But now it's only called from setup-common.c so it
can be static and __init, and we don't need the declaration in
machdep.h either.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-6-mpe@ellerman.id.au
2021-11-29 22:49:26 +11:00
Michael Ellerman a4ac0d249a powerpc/smp: Move setup_profiling_timer() under CONFIG_PROFILING
setup_profiling_timer() is only needed when CONFIG_PROFILING is enabled.

Fixes the following W=1 warning when CONFIG_PROFILING=n:
  linux/arch/powerpc/kernel/smp.c:1638:5: error: no previous prototype for ‘setup_profiling_timer’

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211124093254.1054750-5-mpe@ellerman.id.au
2021-11-29 22:49:23 +11:00
Linus Torvalds 7b65b798a6 powerpc fixes for 5.16 #3
Fix KVM using a Power9 instruction on earlier CPUs, which could lead to the host SLB being
 incorrectly invalidated and a subsequent host crash.
 
 Fix kernel hardlockup on vmap stack overflow on 32-bit.
 
 Thanks to: Christophe Leroy, Nicholas Piggin, Fabiano Rosas.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEJFGtCPCthwEv2Y/bUevqPMjhpYAFAmGh96MTHG1wZUBlbGxl
 cm1hbi5pZC5hdQAKCRBR6+o8yOGlgGHDEACu37n/NLFp3qzHyrk4fHN6sUQ0wGsE
 oroKeIrTP2+JPsoEDd6Jd9oQmqmVLmVTp4y/Lu6AAJbK4p1He1iPIxIbZn12NwqY
 /siCihV76PXR8748g1j8r3a3UYDmmvmllAIf9JdAn2I4kKPE3J37/1NEGSqZyJ80
 HzkheTczUweANhVuG1RDh3WvnAKMuOvtrT/YgetVILig23oiUORQkZH5wcpScaCN
 qShnqKSuRt8HWaF3uRh6D0sf4o0BWfZ6Pjyi7R/T7s2dQVgz0uCtu4MuyGirx/0J
 uU3yDEZD24y6doyi6HUgAFGeGYqJXky2XDOMrgV6CNGFty9HFp6GgN6gtX353pZa
 9Xplw4kg9UJwGcyk1BvgdQBHdc83y5zmr8c2PfhgiP/5rfEyCTYtHgZKUu5LLW8Y
 ECYkSNnlAT68SancCmiXgos1heIiwB7922cRWJANfGEjItNBFldtb2QZio2bWOUX
 g4gZ8HN2P0OmOxV1nnNkmIe+ZKLeHPN2N/FR7pUdkFk2+BLWFx+SbrceCCicwMF4
 CUUzll8FMFmVAGjFivG/axzqlDWMY5lrDsFZamaWN3VQEo7TwL6l8/cDcFZF4QBq
 viLWbi7sfQm6M3tZXFS465sG4SpUQ4lI8PKptuz5DMdr6WKisTLR7OF0klxsEwZj
 ZcT8a/wzUIZ53w==
 =MBp+
 -----END PGP SIGNATURE-----

Merge tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Michael Ellerman:
 "Fix KVM using a Power9 instruction on earlier CPUs, which could lead
  to the host SLB being incorrectly invalidated and a subsequent host
  crash.

  Fix kernel hardlockup on vmap stack overflow on 32-bit.

  Thanks to Christophe Leroy, Nicholas Piggin, and Fabiano Rosas"

* tag 'powerpc-5.16-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/32: Fix hardlockup on vmap stack overflow
  KVM: PPC: Book3S HV: Prevent POWER7/8 TLB flush flushing SLB
2021-11-27 10:06:15 -08:00
Linus Torvalds b501b85957 asm-generic: syscall table updates
André Almeida sends an update for the newly added futex_waitv
 syscall that was initially only added to a few architectures.
 
 Some additional ones have since made it through architecture
 maintainer trees, this finishes the remaining ones.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEo6/YBQwIrVS28WGKmmx57+YAGNkFAmGfrmIACgkQmmx57+YA
 GNl0eg/9Eu2iliGxiQPzPw93a3DCYr7eiylbhHbgzvCbUtDL4cTE/0fcFGCYUoVu
 k5UT/5ja3Q4z8wE+MUbxXf6igr+ANKJNwomKksTJV1Q6wKF23k5vUEd2ZXGyvrSk
 F+2X0bpB0wZzmMBh98L6dWsEaDJ//8iRZNW4ErmPR4mp25iDqhzZ1k2RoRg9mYoM
 /Lw6u5zlBiqAw5nnMc6BxPo8gJGwgcKDu4VDngpGDVGlvBHgd7Kf1AKi1+aBumml
 WzVXEEdUp3LTj7O/8oU3dKPuZMhl+k/mKUvHn/cieG00FxfMpOnxj0gdaUGgG0O4
 imhQquXtPueq/W3Uod+MhVulKR6AziEOsPhJayMGopLReE0spKDxcplC/OSbMhGj
 0iA6auLdHpHBCg3KWZDj0Sjf2NwL9UCFVDrk8XAQQo7bY2U5LQbcbr0PS7iLlpzN
 gIb+r2k3Azwk/Iqnvl2/SR1cZTx5KxfxCydT03vXaYQXhc5l+u4RvjOZcSGgZXqd
 gzC78vd6fJGKyVeaL2xb6/w7qmA4DFd6sQAbgMlVFRRS6AIJrIdDEZAQztSJ+uYK
 p9A2qzeXz+ftzonySbPHmV4RBGjSVVecUBjRmmDfKNbqmFUjjnCwCv+LeDXh4IS6
 mjmSA/9liYdkHUaujwCLhOEskuCx9ZPeAsqmzXMER0BDzReVlH4=
 =Vy2A
 -----END PGP SIGNATURE-----

Merge tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic

Pull asm-generic syscall table update from Arnd Bergmann:
 "André Almeida sends an update for the newly added futex_waitv syscall
  that was initially only added to a few architectures.

  Some additional ones have since made it through architecture
  maintainer trees, this finishes the remaining ones"

* tag 'asm-generic-5.16-2' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
  futex: Wireup futex_waitv syscall
2021-11-25 10:41:28 -08:00
André Almeida a0eb2da92b futex: Wireup futex_waitv syscall
Wireup futex_waitv syscall for all remaining archs.

Signed-off-by: André Almeida <andrealmeid@collabora.com>
Acked-by: Max Filippov <jcmvbkbc@gmail.com>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2021-11-25 14:26:12 +01:00
Nicholas Piggin 3d030e3018 powerpc/watchdog: Fix wd_smp_last_reset_tb reporting
wd_smp_last_reset_tb now gets reset by watchdog_smp_panic() as part of
marking CPUs stuck and removing them from the pending mask before it
begins any printing. This causes last reset times reported to be off.

Fix this by reading it into a local variable before it gets reset.

Fixes: 76521c4b02 ("powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211125103346.1188958-1-npiggin@gmail.com
2021-11-25 21:48:39 +11:00
Nicholas Piggin 1f01bf9076 powerpc/watchdog: read TB close to where it is used
When taking watchdog actions, printing messages, comparing and
re-setting wd_smp_last_reset_tb, etc., read TB close to the point of use
and under wd_smp_lock or printing lock (if applicable).

This should keep timebase mostly monotonic with kernel log messages, and
could prevent (in theory) a laggy CPU updating wd_smp_last_reset_tb to
something a long way in the past, and causing other CPUs to appear to be
stuck.

These additional TB reads are all slowpath (lockup has been detected),
so performance does not matter.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-5-npiggin@gmail.com
2021-11-25 11:25:34 +11:00
Nicholas Piggin 76521c4b02 powerpc/watchdog: Avoid holding wd_smp_lock over printk and smp_send_nmi_ipi
There is a deadlock with the console_owner lock and the wd_smp_lock:

CPU x takes the console_owner lock
CPU y takes a watchdog timer interrupt and takes __wd_smp_lock
CPU x takes a soft-NMI interrupt, detects deadlock, spins on __wd_smp_lock
CPU y detects deadlock, tries to print something and spins on console_owner
-> deadlock

Change the watchdog locking scheme so wd_smp_lock protects the watchdog
internal data, but "reporting" (printing, issuing NMI IPIs, taking any
action outside of watchdog) uses a non-waiting exclusion. If a CPU detects
a problem but can not take the reporting lock, it just returns because
something else is already reporting. It will try again at some point.

Typically hard lockup watchdog report usefulness is not impacted due to
failure to spewing a large enough amount of data in as short a time as
possible, but by messages getting garbled.

Laurent debugged this and found the deadlock, and this patch is based on
his general approach to avoid expensive operations while holding the lock.
With the addition of the reporting exclusion.

Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com>
[np: rework to add reporting exclusion update changelog]
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-4-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin 858c93c315 powerpc/watchdog: tighten non-atomic read-modify-write access
Most updates to wd_smp_cpus_pending are under lock except the watchdog
interrupt bit clear.

This can race with non-atomic RMW updates to the mask under lock, which
can happen in two instances:

Firstly, if another CPU detects this one is stuck, removes it from the
mask, mask becomes empty and is re-filled with non-atomic stores. This
is okay because it would re-fill the mask with this CPU's bit clear
anyway (because this CPU is now stuck), so it doesn't matter that the
bit clear update got "lost". Add a comment for this.

Secondly, if another CPU detects a different CPU is stuck and removes it
from the pending mask with a non-atomic store to bytes which also
include the bit of this CPU. This case can result in the bit clear being
lost and the end result being the bit is set. This should be so rare it
hardly matters, but to make things simpler to reason about just avoid
the non-atomic access for that case.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-3-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Nicholas Piggin 5dad4ba68a powerpc/watchdog: Fix missed watchdog reset due to memory ordering race
It is possible for all CPUs to miss the pending cpumask becoming clear,
and then nobody resetting it, which will cause the lockup detector to
stop working. It will eventually expire, but watchdog_smp_panic will
avoid doing anything if the pending mask is clear and it will never be
reset.

Order the cpumask clear vs the subsequent test to close this race.

Add an extra check for an empty pending mask when the watchdog fires and
finds its bit still clear, to try to catch any other possible races or
bugs here and keep the watchdog working. The extra test in
arch_touch_nmi_watchdog is required to prevent the new warning from
firing off.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Reviewed-by: Laurent Dufour <ldufour@linux.ibm.com>
Debugged-by: Laurent Dufour <ldufour@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211110025056.2084347-2-npiggin@gmail.com
2021-11-25 11:25:33 +11:00
Peiwei Hu 869fb7e5ae powerpc/prom_init: Fix improper check of prom_getprop()
prom_getprop() can return PROM_ERROR. Binary operator can not identify
it.

Fixes: 94d2dde738 ("[POWERPC] Efika: prune fixups and make them more carefull")
Signed-off-by: Peiwei Hu <jlu.hpw@foxmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/tencent_BA28CC6897B7C95A92EB8C580B5D18589105@qq.com
2021-11-25 11:25:33 +11:00
Nathan Lynch dd5cde457a powerpc/rtas: rtas_busy_delay_time() kernel-doc
Provide API documentation for rtas_busy_delay_time(), explaining why we
return the same value for 9900 and -2.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-3-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch 38f7b7067d powerpc/rtas: rtas_busy_delay() improvements
Generally RTAS cannot block, and in PAPR it is required to return control
to the OS within a few tens of microseconds. In order to support operations
which may take longer to complete, many RTAS primitives can return
intermediate -2 ("busy") or 990x ("extended delay") values, which indicate
that the OS should reattempt the same call with the same arguments at some
point in the future.

Current versions of PAPR are less than clear about this, but the intended
meanings of these values in more detail are:

RTAS_BUSY (-2): RTAS has suspended a potentially long-running operation in
order to meet its latency obligation and give the OS the opportunity to
perform other work. RTAS can resume making progress as soon as the OS
reattempts the call.

RTAS_EXTENDED_DELAY_{MIN...MAX} (9900-9905): RTAS must wait for an external
event to occur or for internal contention to resolve before it can complete
the requested operation. The value encodes a non-binding hint as to roughly
how long the OS should wait before calling again, but the OS is allowed to
reattempt the call sooner or even immediately.

Linux of course must take its own CPU scheduling obligations into account
when handling these statuses; e.g. a task which receives an RTAS_BUSY
status should check whether to reschedule before it attempts the RTAS call
again to avoid starving other tasks.

rtas_busy_delay() is a helper function that "consumes" a busy or extended
delay status. Common usage:

    int rc;

    do {
        rc = rtas_call(rtas_token("some-function"), ...);
    } while (rtas_busy_delay(rc));

    /* convert rc to Linux error value, etc */

If rc is a busy or extended delay status, the caller can rely on
rtas_busy_delay() to perform an appropriate sleep or reschedule and return
nonzero. Other statuses are handled normally by the caller.

The current implementation of rtas_busy_delay() both oversleeps and
overuses the CPU:

*  It performs msleep() for all 990x and even when no delay is
   suggested (-2), but this is understood to actually sleep for two jiffies
   minimum in practice (20ms with HZ=100). 9900 (1ms) and 9901 (10ms)
   appear to be the most common extended delay statuses, and the
   oversleeping measurably lengthens DLPAR operations, which perform
   many RTAS calls.

*  It does not sleep on 990x unless need_resched() is true, causing code
   like the loop above to needlessly retry, wasting CPU time.

Alter the logic to align better with the intended meanings:

*  When passed RTAS_BUSY, perform cond_resched() and return without
   sleeping. The caller should reattempt immediately

*  Always sleep when passed an extended delay status, using usleep_range()
   for precise shorter sleeps. Limit the sleep time to one second even
   though there are higher architected values.

Change rtas_busy_delay()'s return type to bool to better reflect its usage,
and add kernel-doc.

rtas_busy_delay_time() is unchanged, even though it "incorrectly" returns 1
for RTAS_BUSY. There are users of that API with open-coded delay loops in
sensitive contexts that will have to be taken on an individual basis.

Brief results for addition and removal of 5GB memory on a small P9 PowerVM
partition follow. Load was generated with stress-ng --cpu N. For add,
elapsed time is greatly reduced without significant change in the number of
RTAS calls or time spent on CPU. For remove, elapsed time is modestly
reduced, with significant reductions in RTAS calls and time spent on CPU.

With no competing workload (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             1,935      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
-            609.99 msec task-clock                #    0.183 CPUs utilized            ( +-  0.19% )
+             1,956      probe:rtas_call           #    0.003 M/sec                    ( +-  0.17% )
+            618.56 msec task-clock                #    0.278 CPUs utilized            ( +-  0.11% )

-            3.3322 +- 0.0670 seconds time elapsed  ( +-  2.01% )
+            2.2222 +- 0.0416 seconds time elapsed  ( +-  1.87% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             6,224      probe:rtas_call           #    0.008 M/sec                    ( +-  2.57% )
-            750.36 msec task-clock                #    0.190 CPUs utilized            ( +-  2.01% )
+               843      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
+            250.66 msec task-clock                #    0.068 CPUs utilized            ( +-  0.17% )

-            3.9394 +- 0.0890 seconds time elapsed  ( +-  2.26% )
+             3.678 +- 0.113 seconds time elapsed  ( +-  3.07% )

With all CPUs 100% busy (- before, + after):

  Performance counter stats for 'bash -c echo "memory add count 20" > /sys/kernel/dlpar' (10 runs):

-             2,979      probe:rtas_call           #    0.003 M/sec                    ( +-  0.12% )
-          1,096.62 msec task-clock                #    0.105 CPUs utilized            ( +-  0.10% )
+             2,981      probe:rtas_call           #    0.003 M/sec                    ( +-  0.22% )
+          1,095.26 msec task-clock                #    0.154 CPUs utilized            ( +-  0.21% )

-            10.476 +- 0.104 seconds time elapsed  ( +-  1.00% )
+            7.1124 +- 0.0865 seconds time elapsed  ( +-  1.22% )

  Performance counter stats for 'bash -c echo "memory remove count 20" > /sys/kernel/dlpar' (10 runs):

-             2,702      probe:rtas_call           #    0.004 M/sec                    ( +-  4.00% )
-            722.71 msec task-clock                #    0.067 CPUs utilized            ( +-  2.41% )
+             1,246      probe:rtas_call           #    0.003 M/sec                    ( +-  0.25% )
+            487.73 msec task-clock                #    0.049 CPUs utilized            ( +-  0.20% )

-            10.829 +- 0.163 seconds time elapsed  ( +-  1.51% )
+            9.9887 +- 0.0866 seconds time elapsed  ( +-  0.87% )

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211117060259.957178-2-nathanl@linux.ibm.com
2021-11-25 11:25:33 +11:00
Nathan Lynch 53cadf7dee powerpc/rtas: kernel-doc fixes
Fix the following issues reported by kernel-doc:

$ scripts/kernel-doc -v -none arch/powerpc/kernel/rtas.c
arch/powerpc/kernel/rtas.c:810: info: Scanning doc for function rtas_activate_firmware
arch/powerpc/kernel/rtas.c:818: warning: contents before sections
arch/powerpc/kernel/rtas.c:841: info: Scanning doc for function rtas_call_reentrant
arch/powerpc/kernel/rtas.c:893: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
 * Find a specific pseries error log in an RTAS extended event log.

Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211116215806.928235-1-nathanl@linux.ibm.com
2021-11-25 11:25:32 +11:00
Oliver O'Halloran 157616f3c2 powerpc/eeh: Use a goto for recovery failures
The EEH recovery logic in eeh_handle_normal_event() has some pretty strange
flow control. If we remove all the actual recovery logic we're left with
the following skeleton:

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result != PCI_ERS_RESULT_DISCONNECT) {
		...
	}

	if (result == PCI_ERS_RESULT_NONE) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_CAN_RECOVER) {
		...
	}

	if (result == PCI_ERS_RESULT_NEED_RESET) {
		...
	}

	if ((result == PCI_ERS_RESULT_RECOVERED) ||
	    (result == PCI_ERS_RESULT_NONE)) {
		...
		goto out;
	}

	/*
	 * unsuccessful recovery / PCI_ERS_RESULT_DISCONECTED
	 * handling is here.
	 */
	...

	out:
	...

Most of the "if () { ... }" blocks above change "result" to
PCI_ERS_RESULT_DISCONNECTED if an error occurs in that recovery step. This
makes the control flow a bit confusing since it breaks the early-exit
pattern that is generally used in Linux. In any case we end up handling the
error in the final else block so why not just jump there directly? Doing so
also allows us to de-indent a bunch of code.

No functional changes.

[dja: rebase on top of linux-next + my preceeding refactor,
      move clearing the EEH_DEV_NO_HANDLER bit above the first goto so that
      it is always clear in the error handler code as it was before.]

Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-2-dja@axtens.net
2021-11-25 11:25:31 +11:00
Daniel Axtens 10b34ece13 powerpc/eeh: Small refactor of eeh_handle_normal_event()
The control flow of eeh_handle_normal_event() is a bit tricky.

Break out one of the error handling paths - rather than be in an else
block, we'll make it part of the regular body of the function and put a
'goto out;' in the true limb of the if.

Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/20211015070628.1331635-1-dja@axtens.net
2021-11-25 11:25:31 +11:00
Julia Lawall a1d2b210ff powerpc/btext: add missing of_node_put
for_each_node_by_type performs an of_node_get on each iteration, so
a break out of the loop requires an of_node_put.

A simplified version of the semantic patch that fixes this problem is as
follows (http://coccinelle.lip6.fr):

// <smpl>
@@
local idexpression n;
expression e;
@@

 for_each_node_by_type(n,...) {
   ...
(
   of_node_put(n);
|
   e = n
|
+  of_node_put(n);
?  break;
)
   ...
 }
... when != n
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Link: https://lore.kernel.org/r/1448051604-25256-6-git-send-email-Julia.Lawall@lip6.fr
2021-11-25 11:25:29 +11:00