Commit Graph

1100 Commits

Author SHA1 Message Date
Conor Dooley 6b913e3da8
RISC-V: rework comments in ISA string parser
I have found these comments to not be at all helpful whenever I look at
the parser. Further, the comments in the default case (single letter
parser) are not quite right either.
Group the comments into a larger one at the start of each case, that
attempts to explain things at a higher level.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230607-headpiece-tannery-83ed5cc4856a@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-21 07:45:16 -07:00
Conor Dooley 069b0d5170
RISC-V: validate riscv,isa at boot, not during ISA string parsing
Since riscv_fill_hwcap() now only iterates over possible cpus, the
basic validation of whether riscv,isa contains "rv<width>" can be moved
to riscv_early_of_processor_hartid().

Further, "ima" support is required by the kernel, so reject any CPU not
fitting the bill.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230607-guts-blurry-67e711acf328@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-21 07:45:15 -07:00
Conor Dooley 2ac8743437
RISC-V: split early & late of_node to hartid mapping
Some back and forth with Drew [1] about riscv_fill_hwcap() resulted in
the realisation that it is not very useful to parse the DT & perform
validation of riscv,isa every time we would like to get the id for a
hart.

Although it is no longer called in riscv_fill_hwcap(),
riscv_of_processor_hartid() is called in several other places.
Notably in setup_smp() it forms part of the logic for filling the mask
of possible CPUs. Since a possible CPU must have passed this basic
validation of riscv,isa, a repeat validation is not required.

Rename riscv_of_processor_id() to riscv_early_of_processor_id(),
which will be called from setup_smp() & introduce a new
riscv_of_processor_id() which makes use of the pre-populated mask of
possible cpus.

Link: https://lore.kernel.org/linux-riscv/xvdswl3iyikwvamny7ikrxo2ncuixshtg3f6uucjahpe3xpc5c@ud4cz4fkg5dj/ [1]
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230607-glade-pastel-d8cbd9d9f3c6@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-21 07:45:14 -07:00
Conor Dooley fed14be476
RISC-V: simplify register width check in ISA string parsing
Saving off the `isa` pointer to a temp variable, followed by checking if
it has been incremented is a bit of an odd pattern. Perhaps it was done
to avoid a funky looking if statement mixed with the ifdeffery.

Now that we use IS_ENABLED() here just return from the parser as soon as
we detect a mismatch between the string and the currently running
kernel.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Sunil V L <sunilvl@ventanamicro.com>
Link: https://lore.kernel.org/r/20230607-splatter-bacterium-a75bb9f0d0b7@spud
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-21 07:45:13 -07:00
Fangrui Song 4681dacade
riscv: replace deprecated scall with ecall
scall is a deprecated alias for ecall. ecall is used in several places,
so there is no assembler compatibility concern.

Signed-off-by: Fangrui Song <maskray@google.com>
Link: https://lore.kernel.org/r/20230423223210.126948-1-maskray@google.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20 09:02:09 -07:00
Tiezhu Yang 58b1294dd1
riscv: uprobes: Restore thread.bad_cause
thread.bad_cause is saved in arch_uprobe_pre_xol(), it should be restored
in arch_uprobe_{post,abort}_xol() accordingly, otherwise the save operation
is meaningless, this change is similar with x86 and powerpc.

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Fixes: 74784081aa ("riscv: Add uprobes supported")
Link: https://lore.kernel.org/r/1682214146-3756-1-git-send-email-yangtiezhu@loongson.cn
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-20 08:20:23 -07:00
Palmer Dabbelt 16252e018a
Merge patch series "RISC-V: Export Zba, Zbb to usermode via hwprobe"
Evan Green <evan@rivosinc.com> says:

This change detects the presence of Zba, Zbb, and Zbs extensions and exports
them per-hart to userspace via the hwprobe mechanism. Glibc can then use
these in setting up hwcaps-based library search paths.

There's a little bit of extra housekeeping here: the first change adds
Zba and Zbs to the set of extensions the kernel recognizes, and the second
change starts tracking ISA features per-hart (in addition to the ANDed
mask of features across all harts which the kernel uses to make
decisions). Now that we track the ISA information per-hart, we could
even fix up /proc/cpuinfo to accurately report extension per-hart,
though I've left that out of this series for now.

* b4-shazam-merge:
  RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
  RISC-V: Track ISA extensions per hart
  RISC-V: Add Zba, Zbs extension probing

Link: https://lore.kernel.org/r/20230509182504.2997252-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19 14:34:40 -07:00
Evan Green c0baf32103
RISC-V: hwprobe: Expose Zba, Zbb, and Zbs
Add two new bits to the IMA_EXT_0 key for ZBA, ZBB, and ZBS extensions.
These are accurately reported per CPU.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230509182504.2997252-4-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19 09:53:10 -07:00
Evan Green 82e9c66e81
RISC-V: Track ISA extensions per hart
The kernel maintains a mask of ISA extensions ANDed together across all
harts. Let's also keep a bitmap of ISA extensions for each CPU. Although
the kernel is currently unlikely to enable a feature that exists only on
some CPUs, we want the ability to report asymmetric CPU extensions
accurately to usermode.

Note that riscv_fill_hwcaps() runs before the per_cpu_offsets are built,
which is why I've used a [NR_CPUS] array rather than per_cpu() data.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230509182504.2997252-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19 09:51:22 -07:00
Evan Green c6699baf10
RISC-V: Add Zba, Zbs extension probing
Add the Zba address bit manipulation extension and Zbs single bit
instructions extension into those the kernel is aware of and maintains
in its riscv_isa bitmap.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230509182504.2997252-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19 09:51:16 -07:00
Sunil V L 3b426d4b5b
RISC-V: ACPI : Fix for usage of pointers in different address space
The arch specific __acpi_map_table can be wrapper around either
early_memremap or early_ioremap. But early_memremap
routine works with normal pointers whereas __acpi_map_table expects
pointers in iomem address space. This causes kernel test bot to fail
while using the sparse tool. Fix the issue by using early_ioremap and
similar fix done for __acpi_unmap_table.

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202305201427.I7QhPjNW-lkp@intel.com/
Fixes: a91a9ffbd3 ("RISC-V: Add support to build the ACPI core")
Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230607112417.782085-2-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19 09:27:58 -07:00
Song Shuai c6399b8930
riscv: hibernation: Remove duplicate call of suspend_restore_csrs
The suspend_restore_csrs is called in both __hibernate_cpu_resume
and the `else` of subsequent swsusp_arch_suspend.

Removing the first call makes both suspend_{save,restore}_csrs
left in swsusp_arch_suspend for clean code.

Fixes: c031721001 ("RISC-V: Add arch functions to support hibernation/suspend-to-disk")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: JeeHeng Sia <jeeheng.sia@starfivetech.com>
Signed-off-by: Song Shuai <songshuaishuai@tinylab.org>
Link: https://lore.kernel.org/r/20230522025020.285042-1-songshuaishuai@tinylab.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19 09:27:57 -07:00
Song Shuai 650ea2a1dd
riscv: hibernation: Replace jalr with jr before suspend_restore_regs
No need to link the x1/ra reg via jalr before suspend_restore_regs
So it's better to replace jalr with jr.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Reviewed-by: JeeHeng Sia <jeeheng.sia@starfivetech.com >
Link: https://lore.kernel.org/r/20230519060854.214138-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-19 09:27:56 -07:00
Ben Dooks c818fea83d
riscv: say disabling zicbom if no or bad riscv,cbom-block-size found
If Zicbom is present but there was no riscv,cbom-blocks-size property found
during the cpu feeatures probe, or the cbom-block-size is not valid, then
the extension will be disabled. Make the print explicitly say this is
disabled to ensure that there is no confusion about what is being done.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230317134512.254627-1-ben.dooks@codethink.co.uk
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-14 07:17:34 -07:00
Palmer Dabbelt d5e45e810e
Merge patch series "riscv: Add vector ISA support"
Andy Chiu <andy.chiu@sifive.com> says:

This is the v21 patch series for adding Vector extension support in
Linux. Please refer to [1] for the introduction of the patchset. The
v21 patch series was aimed to solve build issues from v19, provide usage
guideline for the prctl interface, and address review comments on v20.

Thank every one who has been reviewing, suggesting on the topic. Hope
this get a step closer to the final merge.

* b4-shazam-merge: (27 commits)
  selftests: add .gitignore file for RISC-V hwprobe
  selftests: Test RISC-V Vector prctl interface
  riscv: Add documentation for Vector
  riscv: Enable Vector code to be built
  riscv: detect assembler support for .option arch
  riscv: Add sysctl to set the default vector rule for new processes
  riscv: Add prctl controls for userspace vector management
  riscv: hwcap: change ELF_HWCAP to a function
  riscv: KVM: Add vector lazy save/restore support
  riscv: kvm: Add V extension to KVM ISA
  riscv: prevent stack corruption by reserving task_pt_regs(p) early
  riscv: signal: validate altstack to reflect Vector
  riscv: signal: Report signal frame size to userspace via auxv
  riscv: signal: Add sigcontext save/restore for vector
  riscv: signal: check fp-reserved words unconditionally
  riscv: Add ptrace vector support
  riscv: Allocate user's vector context in the first-use trap
  riscv: Add task switch support for vector
  riscv: Introduce struct/helpers to save/restore per-task Vector state
  riscv: Introduce riscv_v_vsize to record size of Vector context
  ...

Link: https://lore.kernel.org/r/20230605110724.21391-1-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:17:09 -07:00
Andy Chiu 7ca7a7b9b6
riscv: Add sysctl to set the default vector rule for new processes
To support Vector extension, the series exports variable-length vector
registers on the signal frame. However, this potentially breaks abi if
processing vector registers is required in the signal handler for old
binaries. For example, there is such need if user-level context switch
is triggerred via signals[1].

For this reason, it is best to leave a decision to distro maintainers,
where the enablement of userspace Vector for new launching programs can
be controlled. Developers may also need the switch to experiment with.
The parameter is configurable through sysctl interface so a distro may
turn off Vector early at init script if the break really happens in the
wild.

The switch will only take effects on new execve() calls once set. This
will not effect existing processes that do not call execve(), nor
processes which has been set with a non-default vstate_ctrl by making
explicit PR_RISCV_V_SET_CONTROL prctl() calls.

Link: https://lore.kernel.org/all/87cz4048rp.fsf@all.your.base.are.belong.to.us/
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Vincent Chen <vincent.chen@sifive.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-23-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:54 -07:00
Andy Chiu 1fd96a3e9d
riscv: Add prctl controls for userspace vector management
This patch add two riscv-specific prctls, to allow usespace control the
use of vector unit:

 * PR_RISCV_V_SET_CONTROL: control the permission to use Vector at next,
   or all following execve for a thread. Turning off a thread's Vector
   live is not possible since libraries may have registered ifunc that
   may execute Vector instructions.
 * PR_RISCV_V_GET_CONTROL: get the same permission setting for the
   current thread, and the setting for following execve(s).

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Greentime Hu <greentime.hu@sifive.com>
Reviewed-by: Vincent Chen <vincent.chen@sifive.com>
Link: https://lore.kernel.org/r/20230605110724.21391-22-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:53 -07:00
Andy Chiu 50724efcb3
riscv: hwcap: change ELF_HWCAP to a function
Using a function is flexible to represent ELF_HWCAP. So the kernel may
encode hwcap reflecting supported hardware features just at the moment of
the start of each program.

This will be helpful when we introduce prctl/sysctl interface to control
per-process availability of Vector extension in following patches.
Programs started with V disabled should see V masked off in theirs
ELF_HWCAP.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230605110724.21391-21-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:52 -07:00
Greentime Hu c7cdd96eca
riscv: prevent stack corruption by reserving task_pt_regs(p) early
Early function calls, such as setup_vm(), relocate_enable_mmu(),
soc_early_init() etc, are free to operate on stack. However,
PT_SIZE_ON_STACK bytes at the head of the kernel stack are purposedly
reserved for the placement of per-task register context pointed by
task_pt_regs(p). Those functions may corrupt task_pt_regs if we overlap
the $sp with it. In fact, we had accidentally corrupted sstatus.VS in some
tests, treating the kernel to save V context before V was actually
allocated, resulting in a kernel panic.

Thus, we should skip PT_SIZE_ON_STACK for $sp before making C function
calls from the top-level assembly.

Co-developed-by: ShihPo Hung <shihpo.hung@sifive.com>
Signed-off-by: ShihPo Hung <shihpo.hung@sifive.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-18-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:49 -07:00
Andy Chiu 76e22fdc2c
riscv: signal: validate altstack to reflect Vector
Some extensions, such as Vector, dynamically change footprint on a
signal frame, so MINSIGSTKSZ is no longer accurate. For example, an
RV64V implementation with vlen = 512 may occupy 2K + 40 + 12 Bytes of a
signal frame with the upcoming support. And processes that do not
execute any vector instructions do not need to reserve the extra
sigframe. So we need a way to guard the allocation size of the sigframe
at process runtime according to current status of V.

Thus, provide the function sigaltstack_size_valid() to validate its size
based on current allocation status of supported extensions.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-17-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:48 -07:00
Vincent Chen e92f469b07
riscv: signal: Report signal frame size to userspace via auxv
The vector register belongs to the signal context. They need to be stored
and restored as entering and leaving the signal handler. According to the
V-extension specification, the maximum length of the vector registers can
be 2^16. Hence, if userspace refers to the MINSIGSTKSZ to create a
sigframe, it may not be enough. To resolve this problem, this patch refers
to the commit 94b07c1f8c
("arm64: signal: Report signal frame size to userspace via auxv") to enable
userspace to know the minimum required sigframe size through the auxiliary
vector and use it to allocate enough memory for signal context.

Note that auxv always reports size of the sigframe as if V exists for
all starting processes, whenever the kernel has CONFIG_RISCV_ISA_V. The
reason is that users usually reference this value to allocate an
alternative signal stack, and the user may use V anytime. So the user
must reserve a space for V-context in sigframe in case that the signal
handler invokes after the kernel allocating V.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-16-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:48 -07:00
Greentime Hu 8ee0b41898
riscv: signal: Add sigcontext save/restore for vector
This patch facilitates the existing fp-reserved words for placement of
the first extension's context header on the user's sigframe. A context
header consists of a distinct magic word and the size, including the
header itself, of an extension on the stack. Then, the frame is followed
by the context of that extension, and then a header + context body for
another extension if exists. If there is no more extension to come, then
the frame must be ended with a null context header. A special case is
rv64gc, where the kernel support no extensions requiring to expose
additional regfile to the user. In such case the kernel would place the
null context header right after the first reserved word of
__riscv_q_ext_state when saving sigframe. And the kernel would check if
all reserved words are zeros when a signal handler returns.

__riscv_q_ext_state---->|	|<-__riscv_extra_ext_header
			~	~
	.reserved[0]--->|0	|<-	.reserved
		<-------|magic	|<-	.hdr
		|	|size	|_______ end of sc_fpregs
		|	|ext-bdy|
		|	~	~
	+)size	------->|magic	|<- another context header
			|size	|
			|ext-bdy|
			~	~
			|magic:0|<- null context header
			|size:0	|

The vector registers will be saved in datap pointer. The datap pointer
will be allocated dynamically when the task needs in kernel space. On
the other hand, datap pointer on the sigframe will be set right after
the __riscv_v_ext_state data structure.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Suggested-by: Vineet Gupta <vineetg@rivosinc.com>
Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-15-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:47 -07:00
Andy Chiu a45cedaa1a
riscv: signal: check fp-reserved words unconditionally
In order to let kernel/user locate and identify an extension context on
the existing sigframe, we are going to utilize reserved space of fp and
encode the information there. And since the sigcontext has already
preserved a space for fp context w or w/o CONFIG_FPU, we move those
reserved words checking/setting routine back into generic code.

This commit also undone an additional logical change carried by the
refactor commit 007f5c3589
("Refactor FPU code in signal setup/return procedures"). Originally we
did not restore fp context if restoring of gpr have failed. And it was
fine on the other side. In such way the kernel could keep the regfiles
intact, and potentially react at the failing point of restore.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-14-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:46 -07:00
Greentime Hu 0c59922c76
riscv: Add ptrace vector support
This patch adds ptrace support for riscv vector. The vector registers will
be saved in datap pointer of __riscv_v_ext_state. This pointer will be set
right after the __riscv_v_ext_state data structure then it will be put in
ubuf for ptrace system call to get or set. It will check if the datap got
from ubuf is set to the correct address or not when the ptrace system call
is trying to set the vector registers.

Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-13-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:45 -07:00
Andy Chiu cd05483724
riscv: Allocate user's vector context in the first-use trap
Vector unit is disabled by default for all user processes. Thus, a
process will take a trap (illegal instruction) into kernel at the first
time when it uses Vector. Only after then, the kernel allocates V
context and starts take care of the context for that user process.

Suggested-by: Richard Henderson <richard.henderson@linaro.org>
Link: https://lore.kernel.org/r/3923eeee-e4dc-0911-40bf-84c34aee962d@linaro.org
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230605110724.21391-12-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:44 -07:00
Greentime Hu 3a2df6323d
riscv: Add task switch support for vector
This patch adds task switch support for vector. It also supports all
lengths of vlen.

Suggested-by: Andrew Waterman <andrew@sifive.com>
Co-developed-by: Nick Knight <nick.knight@sifive.com>
Signed-off-by: Nick Knight <nick.knight@sifive.com>
Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Co-developed-by: Ruinland Tsai <ruinland.tsai@sifive.com>
Signed-off-by: Ruinland Tsai <ruinland.tsai@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230605110724.21391-11-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:43 -07:00
Greentime Hu 7017858eb2
riscv: Introduce riscv_v_vsize to record size of Vector context
This patch is used to detect the size of CPU vector registers and use
riscv_v_vsize to save the size of all the vector registers. It assumes all
harts has the same capabilities in a SMP system. If a core detects VLENB
that is different from the boot core, then it warns and turns off V
support for user space.

Co-developed-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-9-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:41 -07:00
Guo Ren 74abe5a39d
riscv: Disable Vector Instructions for kernel itself
Disable vector instructions execution for kernel mode at its entrances.
This helps find illegal uses of vector in the kernel space, which is
similar to the fpu.

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Co-developed-by: Vincent Chen <vincent.chen@sifive.com>
Signed-off-by: Vincent Chen <vincent.chen@sifive.com>
Co-developed-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Signed-off-by: Han-Kuan Chen <hankuan.chen@sifive.com>
Co-developed-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-7-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:39 -07:00
Greentime Hu 6b53382872
riscv: Clear vector regfile on bootup
clear vector registers on boot if kernel supports V.

Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-6-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:38 -07:00
Andy Chiu 162e4df137
riscv: hwprobe: Add support for probing V in RISCV_HWPROBE_KEY_IMA_EXT_0
Probing kernel support for Vector extension is available now. This only
add detection for V only. Extenions like Zvfh, Zk are not in this scope.

Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-4-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:36 -07:00
Guo Ren dc6667a4e7
riscv: Extending cpufeature.c to detect V-extension
Add V-extension into riscv_isa_ext_keys array and detect it with isa
string parsing.

Signed-off-by: Guo Ren <ren_guo@c-sky.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Greentime Hu <greentime.hu@sifive.com>
Suggested-by: Vineet Gupta <vineetg@rivosinc.com>
Co-developed-by: Andy Chiu <andy.chiu@sifive.com>
Signed-off-by: Andy Chiu <andy.chiu@sifive.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230605110724.21391-3-andy.chiu@sifive.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-08 07:16:35 -07:00
Palmer Dabbelt 748462b59f
Merge patch series "riscv: allow case-insensitive ISA string parsing"
Yangyu Chen <cyy@cyyself.name> says:

This patchset allows case-insensitive ISA string parsing, which is
needed in the ACPI environment. As the RISC-V Hart Capabilities Table
(RHCT) description in UEFI Forum ECR[1] shows the format of the ISA
string is defined in the RISC-V unprivileged specification[2]. However,
the RISC-V unprivileged specification defines the ISA naming strings are
case-insensitive while the current ISA string parser in the kernel only
accepts lowercase letters. In this case, the kernel should allow
case-insensitive ISA string parsing. Moreover, this reason has been
discussed in Conor's patch[3]. And I have also checked the current ISA
string parsing in the recent ACPI support patch[4] will also call
`riscv_fill_hwcap` function as DT we use now.

The original motivation for my patch v1[5] is that some SoC generators
will provide generated DT with illegal ISA string in dt-binding such as
rocket-chip, which will even cause kernel panic in some cases as I
mentioned in v1[5]. Now, the rocket-chip has been fixed in PR #3333[6].
However, when using some specific version of rocket-chip with
illegal ISA string in DT, this patchset will also work for parsing
uppercase letters correctly in DT, thus will have better compatibility.

In summary, this patch not only works for case-insensitive ISA string
parsing to meet the requirements in ECR[1] but also can be a workaround
for some specific versions of rocket-chip.

* b4-shazam-merge:
  dt-bindings: riscv: drop invalid comment about riscv,isa lower-case reasoning
  riscv: allow case-insensitive ISA string parsing

Link: https://lore.kernel.org/r/tencent_E6911C8D71F5624E432A1AFDF86804C3B509@qq.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-06 15:19:33 -07:00
Yangyu Chen 255b34d799
riscv: allow case-insensitive ISA string parsing
According to RISC-V Hart Capabilities Table (RHCT) description in UEFI
Forum ECR, the format of the ISA string is defined in the RISC-V
unprivileged specification which is case-insensitive. However, the
current ISA string parser in the kernel does not support ISA strings
with uppercase letters.

This patch modifies the ISA string parser in the kernel to support
case-insensitive ISA string parsing.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Link: https://lore.kernel.org/r/tencent_B30EED51C7235CA1988890E5C658BE35C107@qq.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-06 15:19:30 -07:00
Linus Torvalds 5321d1b1af RISC-V Fixes for 6.4-rc5
* A build warning fix for BUILTIN_DTB=y.
 * Hibernation support is hidden behind NONPORTABLE, as it depends on
   some undocumented early boot behavior and breaks on most platforms.
 * A fix for relocatable kernels on systems with early boot errata.
 * A fix to properly handle perf callchains for kernel tracepoints.
 * A pair of fixes for NAPOT to avoid inconsistencies between PTEs and
   handle hardware that sets arbitrary A/D bits.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmR6BjUTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYibzID/44s3JWdxb+bGDz75YaZwUFioGUBsJ+
 rAgzyjWRIf+kVE+QwN4xz80McADohyE/Vkc/auT8LJsQfJfvHx9SR50w64aQX03T
 4mDThpKC9O6RHYwMqskIQdJT1lakM33EufoUc7I3CiUwv/uv4i8QpoKaFZ7KBZkD
 SVFNTF45hN+WYi74Mutczj0M4kyP4ThmlEZT5DqtS+elpJxPbcu8PhxLVcuSjhjy
 Hiwxe2W8H2xkyxGJWl41xGlCF4i2DVAuyistXjToJQhj95ww6HBojNyqM50i/B8M
 6N9uP4NBvR+lguyR1DTBOke/030QV3Ag4+UeKXW038Ysz1sE00Pe+PQoJlROEZIz
 W/HnncFkG0Gd4Qg/swHArx44TAVAKL3ZuTw9nGAbPWTUKia67lslCsah7Gva3Gvu
 NmMrvkKJca6SFU1bPN/zH99d5ei8x451LaWE2CbECFCkCMHsVYc61CZqzlCeG7Sz
 CnQoEjfYa9HPoilP2BFlQcGHV9fSKyGGf7dUODDt/tawZhhR8j84HhQCC4Wb9K1q
 hwl11MoKQ2nZEGcrfvm/hMw8rwA+PYts3QRjIDfq1HnyOdv2fho6snJU7aXfFrRZ
 83YPGu6uwaoyTkC+fxinJybSStwa5a1mXhIJmRS/mamBiDCZnc6cIUEFVv8XbcIe
 lH7XxVhDe7/XyA==
 =G1H6
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Palmer Dabbelt:

 - A build warning fix for BUILTIN_DTB=y

 - Hibernation support is hidden behind NONPORTABLE, as it depends on
   some undocumented early boot behavior and breaks on most platforms

 - A fix for relocatable kernels on systems with early boot errata

 - A fix to properly handle perf callchains for kernel tracepoints

 - A pair of fixes for NAPOT to avoid inconsistencies between PTEs and
   handle hardware that sets arbitrary A/D bits

* tag 'riscv-for-linus-6.4-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Implement missing huge_ptep_get
  riscv: Fix huge_ptep_set_wrprotect when PTE is a NAPOT
  riscv: perf: Fix callchain parse error with kernel tracepoint events
  riscv: Fix relocatable kernels with early alternatives using -fno-pie
  RISC-V: mark hibernation as nonportable
  riscv: Fix unused variable warning when BUILTIN_DTB is set
2023-06-02 13:47:36 -04:00
Sunil V L 714aa1d1c8
RISC-V: time.c: Add ACPI support for time_init()
On ACPI based platforms, timer related information is
available in RHCT. Add ACPI based probe support to the
timer initialization.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-20-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:13 -07:00
Sunil V L 0b144c8189
RISC-V: cpu: Enable cpuinfo for ACPI systems
On ACPI based platforms, few details like ISA need to be read
from the ACPI table. Enable cpuinfo on ACPI based systems.

ACPI has nothing similar to DT compatible property for each CPU.
Hence, cpuinfo will not print "uarch".

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-16-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:09 -07:00
Sunil V L 396c018332
RISC-V: cpufeature: Add ACPI support in riscv_fill_hwcap()
On ACPI based systems, the information about the hart
like ISA is provided by the RISC-V Hart Capabilities Table (RHCT).
Enable filling up hwcap structure based on the information in RHCT.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-15-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:08 -07:00
Sunil V L 914d6f44fc
RISC-V: only iterate over possible CPUs in ISA string parser
During boot we call riscv_of_processor_hartid() for each hart that we
add to the possible cpus list. Repeating the call again here is not
required, if we iterate over the list of possible CPUs, rather than the
list of all CPUs.

The call to of_property_read_string() for "riscv,isa" cannot fail
either, as it has previously succeeded in riscv_of_processor_hartid(),
but leaving in the error checking makes the operation of the loop more
obvious & provides leeway for future refactoring of
riscv_of_processor_hartid().

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Co-developed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-14-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:07 -07:00
Sunil V L ce92546cd6
RISC-V: smpboot: Add ACPI support in setup_smp()
Enable SMP boot on ACPI based platforms by using the RINTC
structures in the MADT table.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-13-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:06 -07:00
Sunil V L 61946127ab
RISC-V: smpboot: Create wrapper setup_smp()
setup_smp() currently assumes DT-based platforms. To enable ACPI,
first make this a wrapper function and move existing code to
a separate DT-specific function.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-12-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:05 -07:00
Sunil V L f995611994
RISC-V: ACPI: Cache and retrieve the RINTC structure
RINTC structures in the MADT provide mapping between the hartid
and the CPU. This is required many times even at run time like
cpuinfo. So, instead of parsing the ACPI table every time, cache
the RINTC structures and provide a function to get the correct
RINTC structure for a given cpu.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-10-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:03 -07:00
Sunil V L 724f4c0df7
RISC-V: Add ACPI initialization in setup_arch()
Initialize the ACPI core for RISC-V during boot.

ACPI tables and interpreter are initialized based on
the information passed from the firmware and the value of
the kernel parameter 'acpi'.

With ACPI support added for RISC-V, the kernel parameter 'acpi'
is also supported on RISC-V. Hence, update the documentation.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-9-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:03 -07:00
Sunil V L a91a9ffbd3
RISC-V: Add support to build the ACPI core
Enable ACPI core for RISC-V after adding architecture-specific
interfaces and header files required to build the ACPI core.

1) Couple of header files are required unconditionally by the ACPI
core. Add empty acenv.h and cpu.h header files.

2) If CONFIG_PCI is enabled, a few PCI related interfaces need to
be provided by the architecture. Define dummy interfaces for now
so that build succeeds. Actual implementation will be added when
PCI support is added for ACPI along with external interrupt
controller support.

3) A few globals and memory mapping related functions specific
to the architecture need to be provided.

Signed-off-by: Sunil V L <sunilvl@ventanamicro.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-7-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:45:01 -07:00
Jisheng Zhang 24fc18087f
riscv: move sbi_init() earlier before jump_label_init()
We call jump_label_init() in setup_arch() is to use static key
mechanism earlier, but riscv jump label relies on the sbi functions,
If we enable static key before sbi_init(), the code path looks like:
  static_branch_enable()
    ..
      arch_jump_label_transform()
        patch_text_nosync()
          flush_icache_range()
            flush_icache_all()
              sbi_remote_fence_i() for CONFIG_RISCV_SBI case
                __sbi_rfence()

Since sbi isn't initialized, so NULL deference! Here is a typical
panic log:

[    0.000000] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
[    0.000000] Oops [#1]
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.18.0-rc7+ #79
[    0.000000] Hardware name: riscv-virtio,qemu (DT)
[    0.000000] epc : 0x0
[    0.000000]  ra : sbi_remote_fence_i+0x1e/0x26
[    0.000000] epc : 0000000000000000 ra : ffffffff80005826 sp : ffffffff80c03d50
[    0.000000]  gp : ffffffff80ca6178 tp : ffffffff80c0ad80 t0 : 6200000000000000
[    0.000000]  t1 : 0000000000000000 t2 : 62203a6b746e6972 s0 : ffffffff80c03d60
[    0.000000]  s1 : ffffffff80001af6 a0 : 0000000000000000 a1 : 0000000000000000
[    0.000000]  a2 : 0000000000000000 a3 : 0000000000000000 a4 : 0000000000000000
[    0.000000]  a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000000080200
[    0.000000]  s2 : ffffffff808b3e48 s3 : ffffffff808bf698 s4 : ffffffff80cb2818
[    0.000000]  s5 : 0000000000000001 s6 : ffffffff80c9c345 s7 : ffffffff80895aa0
[    0.000000]  s8 : 0000000000000001 s9 : 000000000000007f s10: 0000000000000000
[    0.000000]  s11: 0000000000000000 t3 : ffffffff80824d08 t4 : 0000000000000022
[    0.000000]  t5 : 000000000000003d t6 : 0000000000000000
[    0.000000] status: 0000000000000100 badaddr: 0000000000000000 cause: 000000000000000c
[    0.000000] ---[ end trace 0000000000000000 ]---
[    0.000000] Kernel panic - not syncing: Attempted to kill the idle task!
[    0.000000] ---[ end Kernel panic - not syncing: Attempted to kill the idle task! ]---

Fix this issue by moving sbi_init() earlier before jump_label_init()

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Anup Patel <anup@brainfault.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230515054928.2079268-2-sunilvl@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-06-01 08:44:56 -07:00
Alexandre Ghiti 8dc2a7e802
riscv: Fix relocatable kernels with early alternatives using -fno-pie
Early alternatives are called with the mmu disabled, and then should not
access any global symbols through the GOT since it requires relocations,
relocations that we do before but *virtually*. So only use medany code
model for this early code.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com> # booted on nezha & unmatched
Fixes: 39b3307294 ("riscv: Introduce CONFIG_RELOCATABLE")
Link: https://lore.kernel.org/r/20230526154630.289374-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-31 07:07:07 -07:00
Linus Torvalds 2d1bcbc6cd Probes fixes for 6.4-rc1:
- Initialize 'ret' local variables on fprobe_handler() to fix the smatch
   warning. With this, fprobe function exit handler is not working
   randomly.
 
 - Fix to use preempt_enable/disable_notrace for rethook handler to
   prevent recursive call of fprobe exit handler (which is based on
   rethook)
 
 - Fix recursive call issue on fprobe_kprobe_handler().
 
 - Fix to detect recursive call on fprobe_exit_handler().
 
 - Fix to make all arch-dependent rethook code notrace.
   (the arch-independent code is already notrace)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCgAdFiEEh7BulGwFlgAOi5DV2/sHvwUrPxsFAmRmKgQACgkQ2/sHvwUr
 PxvlCgf+OJk5O9IJlTgqDV6JNPsTzFS7qqyAyQmZW9Bj8STfWAIRxa0zeGbZE58K
 5LwgzAj+SqzYRwIvzzZ3xsA5j7f1Wj7wG0TQgmpnIW+hprwDrLsUhoZ5s1D/Ojel
 A4rAnqCrgnh5m5SenU2QCUngGKn004j4RASaZvRELDyvyIkBSqNhswCH8ZWGPror
 KuCu5AmEnFagYl0lmNL3H2aCITAg3QEK+fE6iR+lYsqfR3xbs4YAcqiylHBdY0wX
 ssK7LVdRmv7O6TxSj4P2ohDvLJP3eL9bVirsJpg0OVbqWJCs65T2rJJjXiKojYXf
 vSVWFJFK5oV98ZHfXTG9R7x0DEwc+g==
 =jO68
 -----END PGP SIGNATURE-----

Merge tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull probes fixes from Masami Hiramatsu:

 - Initialize 'ret' local variables on fprobe_handler() to fix the
   smatch warning. With this, fprobe function exit handler is not
   working randomly.

 - Fix to use preempt_enable/disable_notrace for rethook handler to
   prevent recursive call of fprobe exit handler (which is based on
   rethook)

 - Fix recursive call issue on fprobe_kprobe_handler()

 - Fix to detect recursive call on fprobe_exit_handler()

 - Fix to make all arch-dependent rethook code notrace (the
   arch-independent code is already notrace)"

* tag 'probes-fixes-v6.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rethook, fprobe: do not trace rethook related functions
  fprobe: add recursion detection in fprobe_exit_handler
  fprobe: make fprobe_kprobe_handler recursion free
  rethook: use preempt_{disable, enable}_notrace in rethook_trampoline_handler
  tracing: fprobe: Initialize ret valiable to fix smatch error
2023-05-18 09:04:45 -07:00
Ze Gao 571a2a50a8 rethook, fprobe: do not trace rethook related functions
These functions are already marked as NOKPROBE to prevent recursion and
we have the same reason to blacklist them if rethook is used with fprobe,
since they are beyond the recursion-free region ftrace can guard.

Link: https://lore.kernel.org/all/20230517034510.15639-5-zegao@tencent.com/

Fixes: f3a112c0c4 ("x86,rethook,kprobes: Replace kretprobe with rethook on x86")
Signed-off-by: Ze Gao <zegao@tencent.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2023-05-18 07:08:01 +09:00
Thomas Gleixner 72b11aa7f8 riscv: Switch to hotplug core state synchronization
Switch to the CPU hotplug core state tracking and synchronization
mechanim. No functional change intended.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
Tested-by: Helge Deller <deller@gmx.de> # parisc
Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> # Steam Deck
Link: https://lore.kernel.org/r/20230512205256.916055844@linutronix.de
2023-05-15 13:44:59 +02:00
Alexandre Ghiti 3b90b09af5
riscv: Fix orphan section warnings caused by kernel/pi
kernel/pi gives rise to a lot of new sections that end up orphans: the
first attempt to fix that tried to enumerate them all in the linker
script, but kernel test robot with a random config keeps finding more of
them.

So prefix all those sections with .init.pi instead of only .init in
order to be able to easily catch them all in the linker script.

Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304301606.Cgp113Ha-lkp@intel.com/
Fixes: 26e7aacb83 ("riscv: Allow to downgrade paging mode from the command line")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230504120759.18730-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-09 18:20:23 -07:00
Linus Torvalds 982365a8f5 RISC-V Patches for the 6.4 Merge Window, Part 2
* Support for hibernation.
 * .rela.dyn has been moved to init.
 * A fix for the SBI probing to allow for implementation-defined
   behavior.
 * Various other fixes and cleanups throughout the tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRVHRATHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiearD/9tUL5STN3icSO58t2EBAmp4CuyBqWo
 KVhOmLmvZqz259GeqfcRsHANszLTwRPzyWxHQJGugPzAphZu3ukQRR8BEDTwwZJO
 toIhv9hXZ4RAu8Chi6Fs/J1WyYVyqSneGTk68xXBXOmm1MWaqU91z92Q5bJGfWqy
 yBSPOTMFvnHHAOdhIXigxLl+z0Y9EV013L18aesHArnuDHIgPGSF9UI6slQ7ThNV
 PhR+VsApd3Ho7+njOzK+mn+1afICKXXGAtmrPjyEt+nE4LmaJc/XY471SPTSlr3U
 BLWm3jmVTK/0peZxce4I2H6k3gz21PiSAy21E+26Bp2+lZD1iWH601eUyasLY88n
 FYXF5VQNvwMx8Ba/yN4VmQ8M25eJ7s7AKWvGa6VLwu0iHxGWmePqoaFuI6JaSXON
 TzJFJDN9xAaBf4Jt7c2c4X9tPJTEFZu6V51AaDDJllw/IJicwHNlNskZUsfvmqqb
 wE/fF6VtcrvEoeKvizOyZGXMs6Wgg6soufL0Ve8rD12U6ZBknVkGruQxF7B+JYsJ
 Ri6ndfKuguMRm6hZmJlVCfFULtm+D6wFczWmmfF562AFISAticib8u/kPz3jAGCu
 GbozEi333FFLBat2QpPK9zL0sH6tj7GCT3ppJjpjUtCmGPyyZuD8zT3rgTxSc8pe
 fp1EE13A2rsU3A==
 =xoqj
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.4-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Support for hibernation

 - The .rela.dyn section has been moved to the init area

 - A fix for the SBI probing to allow for implementation-defined
   behavior

 - Various other fixes and cleanups throughout the tree

* tag 'riscv-for-linus-6.4-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  RISC-V: include cpufeature.h in cpufeature.c
  riscv: Move .rela.dyn to the init sections
  dt-bindings: riscv: explicitly mention assumption of Zicsr & Zifencei support
  riscv: compat_syscall_table: Fixup compile warning
  RISC-V: fixup in-flight collision with ARCH_WANT_OPTIMIZE_VMEMMAP rename
  RISC-V: fix sifive and thead section mismatches in errata
  RISC-V: Align SBI probe implementation with spec
  riscv: mm: remove redundant parameter of create_fdt_early_page_table
  riscv: Adjust dependencies of HAVE_DYNAMIC_FTRACE selection
  RISC-V: Add arch functions to support hibernation/suspend-to-disk
  RISC-V: mm: Enable huge page support to kernel_page_present() function
  RISC-V: Factor out common code of __cpu_resume_enter()
  RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function
2023-05-05 12:23:33 -07:00
Paolo Bonzini d4fba4dfdc KVM/riscv changes for 6.4
- ONE_REG interface to enable/disable SBI extensions
 - Zbb extension for Guest/VM
 - AIA CSR virtualization
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEZdn75s5e6LHDQ+f/rUjsVaLHLAcFAmRCxY4ACgkQrUjsVaLH
 LAdedxAAjRfjD8vZWyFkFrMT4O1dlCV50hzh5f+zIjPXuKbTed7+zhmDkhnydrWo
 3DldflXV0igZ7LQ7D8KgV9HXHwJzt6TtrDBRIocQwj84udEfwaldNhWcwywYo0vw
 mzuyyEKEC0motg8cTZtfCjykg0wh0+0JmCAeu0BlVE7nWbnann1BuHLDCa7xoJ2t
 t7t5h3HfTk51/Sa/8DQ7JC13hkcSgrPf9pgekAHqLazwbZ7EbDjlp8Wo15tBE18d
 u/HaHCEg+J4bNEljtnwGlO+7wRXhiC/sJUIrxdZBO6tbMEds0j+juUPtTx8vOjKD
 RnnmoBi2/Md5KINhGhZCX1xCf/3jPGk9sjcXruyuDIXTF0AeAx5DZqFagv/6YrFz
 UWaI8tBcnMRFNIHp0PMZPDQJcz0Igtp+wVM/Q0cda/W9voLkAejezK72s+XIODFm
 991zp4MrNdEvmN4DEbcf6JGwU1t/KdBmQ8eEOav4htk6oqlYvBZdP/0ZQOvNDtwK
 tYPstaILUY8Mn0gwF21+UKmmjhZZdyDiG90nhWJcwXIe+v9Wae4ZYUmM4gcw+W4K
 iuO90YjrgA7ZFGXHuIUB3R3v+L27nJlkVes8Gb+QfOUErrs+R/oF+7IPjqFg/KwK
 Axaqmt0Pc3P8TymNt+eRHxPHNIpkF2zjQF5MI+qmYcAuGagwXFE=
 =1Xil
 -----END PGP SIGNATURE-----

Merge tag 'kvm-riscv-6.4-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv changes for 6.4

- ONE_REG interface to enable/disable SBI extensions
- Zbb extension for Guest/VM
- AIA CSR virtualization
2023-05-05 06:11:48 -04:00
Conor Dooley c2d3c8441e
RISC-V: include cpufeature.h in cpufeature.c
Automation complains:
warning: symbol '__pcpu_scope_misaligned_access_speed' was not declared. Should it be static?

cpufeature.c doesn't actually include the header of the same name, as it
had not previously used anything from it.
The per-cpu variable is declared there, so include it to silence the
complaints.

Fixes: 62a31d6e38 ("RISC-V: hwprobe: Support probing of misaligned access performance")
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230420-wound-gizzard-2b2b589d9bea@spud
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-01 17:19:27 -07:00
Alexandre Ghiti 4db9e253e7
riscv: Move .rela.dyn to the init sections
The recent introduction of relocatable kernels prepared the move of
.rela.dyn to the init section, but actually forgot to do so, so do it
here.

Before this patch: "Freeing unused kernel image (initmem) memory: 2592K"
After this patch:  "Freeing unused kernel image (initmem) memory: 6288K"

The difference corresponds to the size of the .rela.dyn section:
"[42] .rela.dyn         RELA             ffffffff8197e798  0127f798
       000000000039c660  0000000000000018   A      47     0     8"

Fixes: 559d1e45a1 ("riscv: Use --emit-relocs in order to move .rela.dyn in init")
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230428120932.22735-1-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-01 17:00:32 -07:00
Guo Ren f9c4bbddec
riscv: compat_syscall_table: Fixup compile warning
../arch/riscv/kernel/compat_syscall_table.c:12:41: warning: initialized
field overwritten [-Woverride-init]
   12 | #define __SYSCALL(nr, call)      [nr] = (call),
      |                                         ^
../include/uapi/asm-generic/unistd.h:567:1: note: in expansion of macro
'__SYSCALL'
  567 | __SYSCALL(__NR_semget, sys_semget)

Fixes: 59c10c52f5 ("riscv: compat: syscall: Add compat_sys_call_table implementation")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reported-by: kernel test robot <lkp@intel.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Drew Fustini <dfustini@baylibre.com>
Link: https://lore.kernel.org/r/20230501223353.2833899-1-dfustini@baylibre.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-05-01 15:42:34 -07:00
Andrew Jones 41cad8284d
RISC-V: Align SBI probe implementation with spec
sbi_probe_extension() is specified with "Returns 0 if the given SBI
extension ID (EID) is not available, or 1 if it is available unless
defined as any other non-zero value by the implementation."
Additionally, sbiret.value is a long. Fix the implementation to
ensure any nonzero long value is considered a success, rather
than only positive int values.

Fixes: b9dcd9e415 ("RISC-V: Add basic support for SBI v0.2")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230427163626.101042-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 13:04:50 -07:00
Palmer Dabbelt 38dab744f7
Merge patch series "RISC-V Hibernation Support"
Sia Jee Heng <jeeheng.sia@starfivetech.com> says:

This series adds RISC-V Hibernation/suspend to disk support.
Low level Arch functions were created to support hibernation.
swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
cpu state onto the stack, then calling swsusp_save() to save the memory
image.

Arch specific hibernation header is implemented and is utilized by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch specific hibernation header consists of satp, hartid,
and the cpu_resume address. The kernel built version is also need to be
saved into the hibernation image header to making sure only the same
kernel is restore when resume.

swsusp_arch_resume() creates a temporary page table that covering only
the linear map. It copies the restore code to a 'safe' page, then start to
restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.

To enable hibernation/suspend to disk into RISCV, the below config
need to be enabled:
- CONFIG_HIBERNATION
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

At high-level, this series includes the following changes:
1) Change suspend_save_csrs() and suspend_restore_csrs()
   to public function as these functions are common to
   suspend/hibernation. (patch 1)
2) Refactor the common code in the __cpu_resume_enter() function and
   __hibernate_cpu_resume() function. The common code are used by
   hibernation and suspend. (patch 2)
3) Enhance kernel_page_present() function to support huge page. (patch 3)
4) Add arch/riscv low level functions to support
   hibernation/suspend to disk. (patch 4)

* b4-shazam-merge:
  RISC-V: Add arch functions to support hibernation/suspend-to-disk
  RISC-V: mm: Enable huge page support to kernel_page_present() function
  RISC-V: Factor out common code of __cpu_resume_enter()
  RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function

Link: https://lore.kernel.org/r/20230330064321.1008373-1-jeeheng.sia@starfivetech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 11:27:33 -07:00
Sia Jee Heng c031721001
RISC-V: Add arch functions to support hibernation/suspend-to-disk
Low level Arch functions were created to support hibernation.
swsusp_arch_suspend() relies code from __cpu_suspend_enter() to write
cpu state onto the stack, then calling swsusp_save() to save the memory
image.

Arch specific hibernation header is implemented and is utilized by the
arch_hibernation_header_restore() and arch_hibernation_header_save()
functions. The arch specific hibernation header consists of satp, hartid,
and the cpu_resume address. The kernel built version is also need to be
saved into the hibernation image header to making sure only the same
kernel is restore when resume.

swsusp_arch_resume() creates a temporary page table that covering only
the linear map. It copies the restore code to a 'safe' page, then start
to restore the memory image. Once completed, it restores the original
kernel's page table. It then calls into __hibernate_cpu_resume()
to restore the CPU context. Finally, it follows the normal hibernation
path back to the hibernation core.

To enable hibernation/suspend to disk into RISCV, the below config
need to be enabled:
- CONFIG_HIBERNATION
- CONFIG_ARCH_HIBERNATION_HEADER
- CONFIG_ARCH_HIBERNATION_POSSIBLE

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230330064321.1008373-5-jeeheng.sia@starfivetech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 11:25:13 -07:00
Sia Jee Heng fcb89863d1
RISC-V: Factor out common code of __cpu_resume_enter()
The cpu_resume() function is very similar for the suspend to disk and
suspend to ram cases. Factor out the common code into suspend_restore_csrs
macro and suspend_restore_regs macro.

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230330064321.1008373-3-jeeheng.sia@starfivetech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 11:25:11 -07:00
Sia Jee Heng 0def12f321
RISC-V: Change suspend_save_csrs and suspend_restore_csrs to public function
Currently suspend_save_csrs() and suspend_restore_csrs() functions are
statically defined in the suspend.c. Change the function's attribute
to public so that the functions can be used by hibernation as well.

Signed-off-by: Sia Jee Heng <jeeheng.sia@starfivetech.com>
Reviewed-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Reviewed-by: Mason Huo <mason.huo@starfivetech.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230330064321.1008373-2-jeeheng.sia@starfivetech.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-29 11:25:10 -07:00
Linus Torvalds 89d77f71f4 RISC-V Patches for the 6.4 Merge Window, Part 1
* Support for runtime detection of the Svnapot extension.
 * Support for Zicboz when clearing pages.
 * We've moved to GENERIC_ENTRY.
 * Support for !MMU on rv32 systems.
 * The linear region is now mapped via huge pages.
 * Support for building relocatable kernels.
 * Support for the hwprobe interface.
 * Various fixes and cleanups throughout the tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmRL5rcTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYibpcD/0RnmO+N2OJxsJXf0KtHv4LlChAFaMZ
 mfcsU8lv8r3Rz1USJGyVoE57885R+iUw1664ic6Gj9Ll9/A+BDVyqlNeo1BZ7nnv
 6hZawSh8XGMyCJoatjaCSMW6VKObsSpHXLoA0mxtj06w1XhtpUnzjv4SZQqBYxC2
 7+/cfy6l3uGdSKQ0R402sF8PE+l3HthhO+Cw9NYHQZisAHEQrfFpXRnrovhs+vX0
 aVxoWo8bmIhhNke2jh6dnGhfFfAs+UClbaKgZfe8af6feboo+Tal3+OibiEy1K1j
 hDQ3w/G5jAdwSqnNPdXzpk4srskUOhP9is8AG79vCasMxybQIBfZcc7/kLmmQX+2
 xt1EoDVD/lSO1p+CWRautLXEsInWbpBYaSJie7WcR4SHe8S7/nomTDlwkJHx5cma
 mkSYHJKNwCbamDTI3gXg8nrScbxsRnJQsQUolFDwAeRz7AYVwtqVh8VxAWqAdU3q
 xUNKrUpCAzNC3d5GL7pmRfZrqjpQhuFXkHFSy85vaCPuckBu926OzxpKBmX4Kea1
 qLYWfxv78bcwuY47FWJKcd97Ib63iBYDgarJxvrHrwDaHV2xjBOmdapNPUc2PswT
 a938enbYYnJHIbuSmbeNBPF4iF6nKUXshyfZu7tCZl6MzsXloUckGdm++j97Bpvr
 g6G3ZP6STSQBmw==
 =oxQd
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for runtime detection of the Svnapot extension

 - Support for Zicboz when clearing pages

 - We've moved to GENERIC_ENTRY

 - Support for !MMU on rv32 systems

 - The linear region is now mapped via huge pages

 - Support for building relocatable kernels

 - Support for the hwprobe interface

 - Various fixes and cleanups throughout the tree

* tag 'riscv-for-linus-6.4-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (57 commits)
  RISC-V: hwprobe: Explicity check for -1 in vdso init
  RISC-V: hwprobe: There can only be one first
  riscv: Allow to downgrade paging mode from the command line
  dt-bindings: riscv: add sv57 mmu-type
  RISC-V: hwprobe: Remove __init on probe_vendor_features()
  riscv: Use --emit-relocs in order to move .rela.dyn in init
  riscv: Check relocations at compile time
  powerpc: Move script to check relocations at compile time in scripts/
  riscv: Introduce CONFIG_RELOCATABLE
  riscv: Move .rela.dyn outside of init to avoid empty relocations
  riscv: Prepare EFI header for relocatable kernels
  riscv: Unconditionnally select KASAN_VMALLOC if KASAN
  riscv: Fix ptdump when KASAN is enabled
  riscv: Fix EFI stub usage of KASAN instrumented strcmp function
  riscv: Move DTB_EARLY_BASE_VA to the kernel address space
  riscv: Rework kasan population functions
  riscv: Split early and final KASAN population functions
  riscv: Use PUD/P4D/PGD pages for the linear mapping
  riscv: Move the linear mapping creation in its own function
  riscv: Get rid of riscv_pfn_base variable
  ...
2023-04-28 16:55:39 -07:00
Linus Torvalds f20730efbd SMP cross-CPU function-call updates for v6.4:
- Remove diagnostics and adjust config for CSD lock diagnostics
 
  - Add a generic IPI-sending tracepoint, as currently there's no easy
    way to instrument IPI origins: it's arch dependent and for some
    major architectures it's not even consistently available.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK438RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1jJ5Q/5AZ0HGpyqwdFK8GmGznyu5qjP5HwV9pPq
 gZQScqSy4tZEeza4TFMi83CoXSg9uJ7GlYJqqQMKm78LGEPomnZtXXC7oWvTA9M5
 M/jAvzytmvZloSCXV6kK7jzSejMHhag97J/BjTYhZYQpJ9T+hNC87XO6J6COsKr9
 lPIYqkFrIkQNr6B0U11AQfFejRYP1ics2fnbnZL86G/zZAc6x8EveM3KgSer2iHl
 KbrO+xcYyGY8Ef9P2F72HhEGFfM3WslpT1yzqR3sm4Y+fuMG0oW3qOQuMJx0ZhxT
 AloterY0uo6gJwI0P9k/K4klWgz81Tf/zLb0eBAtY2uJV9Fo3YhPHuZC7jGPGAy3
 JusW2yNYqc8erHVEMAKDUsl/1KN4TE2uKlkZy98wno+KOoMufK5MA2e2kPPqXvUi
 Jk9RvFolnWUsexaPmCftti0OCv3YFiviVAJ/t0pchfmvvJA2da0VC9hzmEXpLJVF
 25nBTV/1uAOrWvOpCyo3ElrC2CkQVkFmK5rXMDdvf6ib0Nid4vFcCkCSLVfu+ePB
 11mi7QYro+CcnOug1K+yKogUDmsZgV/u1kUwgQzTIpZ05Kkb49gUiXw9L2RGcBJh
 yoDoiI66KPR7PWQ2qBdQoXug4zfEEtWG0O9HNLB0FFRC3hu7I+HHyiUkBWs9jasK
 PA5+V7HcQRk=
 =Wp7f
 -----END PGP SIGNATURE-----

Merge tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull SMP cross-CPU function-call updates from Ingo Molnar:

 - Remove diagnostics and adjust config for CSD lock diagnostics

 - Add a generic IPI-sending tracepoint, as currently there's no easy
   way to instrument IPI origins: it's arch dependent and for some major
   architectures it's not even consistently available.

* tag 'smp-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  trace,smp: Trace all smp_function_call*() invocations
  trace: Add trace_ipi_send_cpu()
  sched, smp: Trace smp callback causing an IPI
  smp: reword smp call IPI comment
  treewide: Trace IPIs sent via smp_send_reschedule()
  irq_work: Trace self-IPIs sent via arch_irq_work_raise()
  smp: Trace IPIs sent via arch_send_call_function_ipi_mask()
  sched, smp: Trace IPIs sent via send_call_function_single_ipi()
  trace: Add trace_ipi_send_cpumask()
  kernel/smp: Make csdlock_debug= resettable
  locking/csd_lock: Remove per-CPU data indirection from CSD lock debugging
  locking/csd_lock: Remove added data from CSD lock debugging
  locking/csd_lock: Add Kconfig option for csd_debug default
2023-04-28 15:03:43 -07:00
Linus Torvalds 2aff7c706c Objtool changes for v6.4:
- Mark arch_cpu_idle_dead() __noreturn, make all architectures & drivers that did
    this inconsistently follow this new, common convention, and fix all the fallout
    that objtool can now detect statically.
 
  - Fix/improve the ORC unwinder becoming unreliable due to UNWIND_HINT_EMPTY ambiguity,
    split it into UNWIND_HINT_END_OF_STACK and UNWIND_HINT_UNDEFINED to resolve it.
 
  - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code.
 
  - Generate ORC data for __pfx code
 
  - Add more __noreturn annotations to various kernel startup/shutdown/panic functions.
 
  - Misc improvements & fixes.
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmRK1x0RHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1ghxQ/+IkCynMYtdF5OG9YwbcGJqsPSfOPMEcEM
 pUSFYg+gGPBDT/fJfcVSqvUtdnWbLC2kXt9yiswXz3X3J2nmNkBk5YKQftsNDcul
 TmKeqIIAK51XTncpegKH0EGnOX63oZ9Vxa8CTPdDlb+YF23Km2FoudGRI9F5qbUd
 LoraXqGYeiaeySkGyWmZVl6Uc8dIxnMkTN3H/oI9aB6TOrsi059hAtFcSaFfyemP
 c4LqXXCH7k2baiQt+qaLZ8cuZVG/+K5r2N2cmjO5kmJc6ynIaFnfMe4XxZLjp5LT
 /PulYI15bXkvSARKx5CRh/CDHMOx5Blw+ASO0RhWbdy0WH4ZhhcaVF5AeIpPW86a
 1LBcz97rMp72WmvKgrJeVO1r9+ll4SI6/YKGJRsxsCMdP3hgFpqntXyVjTFNdTM1
 0gH6H5v55x06vJHvhtTk8SR3PfMTEM2fRU5jXEOrGowoGifx+wNUwORiwj6LE3KQ
 SKUdT19RNzoW3VkFxhgk65ThK1S7YsJUKRoac3YdhttpqqqtFV//erenrZoR4k/p
 vzvKy68EQ7RCNyD5wNWNFe0YjeJl5G8gQ8bUm4Xmab7djjgz+pn4WpQB8yYKJLAo
 x9dqQ+6eUbw3Hcgk6qQ9E+r/svbulnAL0AeALAWK/91DwnZ2mCzKroFkLN7napKi
 fRho4CqzrtM=
 =NwEV
 -----END PGP SIGNATURE-----

Merge tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull objtool updates from Ingo Molnar:

 - Mark arch_cpu_idle_dead() __noreturn, make all architectures &
   drivers that did this inconsistently follow this new, common
   convention, and fix all the fallout that objtool can now detect
   statically

 - Fix/improve the ORC unwinder becoming unreliable due to
   UNWIND_HINT_EMPTY ambiguity, split it into UNWIND_HINT_END_OF_STACK
   and UNWIND_HINT_UNDEFINED to resolve it

 - Fix noinstr violations in the KCSAN code and the lkdtm/stackleak code

 - Generate ORC data for __pfx code

 - Add more __noreturn annotations to various kernel startup/shutdown
   and panic functions

 - Misc improvements & fixes

* tag 'objtool-core-2023-04-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits)
  x86/hyperv: Mark hv_ghcb_terminate() as noreturn
  scsi: message: fusion: Mark mpt_halt_firmware() __noreturn
  x86/cpu: Mark {hlt,resume}_play_dead() __noreturn
  btrfs: Mark btrfs_assertfail() __noreturn
  objtool: Include weak functions in global_noreturns check
  cpu: Mark nmi_panic_self_stop() __noreturn
  cpu: Mark panic_smp_self_stop() __noreturn
  arm64/cpu: Mark cpu_park_loop() and friends __noreturn
  x86/head: Mark *_start_kernel() __noreturn
  init: Mark start_kernel() __noreturn
  init: Mark [arch_call_]rest_init() __noreturn
  objtool: Generate ORC data for __pfx code
  x86/linkage: Fix padding for typed functions
  objtool: Separate prefix code from stack validation code
  objtool: Remove superfluous dead_end_function() check
  objtool: Add symbol iteration helpers
  objtool: Add WARN_INSN()
  scripts/objdump-func: Support multiple functions
  context_tracking: Fix KCSAN noinstr violation
  objtool: Add stackleak instrumentation to uaccess safe list
  ...
2023-04-28 14:02:54 -07:00
Linus Torvalds cb6fe2ceb6 Devicetree updates for v6.4, part 2:
- First part of DT header detangling dropping cpu.h from of_device.h
   and replacing some includes with forward declarations. A handful of
   drivers needed some adjustment to their includes as a result.
 
 - Refactor of_device.h to be used by bus drivers rather than various
   device drivers. This moves non-bus related functions out of
   of_device.h. The end goal is for of_platform.h and of_device.h to stop
   including each other.
 
 - Refactor open coded parsing of "ranges" in some bus drivers to use DT
   address parsing functions
 
 - Add some new address parsing functions of_property_read_reg(),
   of_range_count(), and of_range_to_resource() in preparation to convert
   more open coded parsing of DT addresses to use them.
 
 - Treewide clean-ups to use of_property_read_bool() and
   of_property_present() as appropriate. The ones here are the ones
   that didn't get picked up elsewhere.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEktVUI4SxYhzZyEuo+vtdtY28YcMFAmRIOrkACgkQ+vtdtY28
 YcN9WA//R+QrmSPExhfgio5y+aOJDWucqnAcyAusPctLcF7h7j0CdzpwaSRkdaH4
 KiLjeyt6tKn8wt8w7m/+SmCsSYXPn81GH/Y5I2F40x6QMrY3cVOXUsulKQA+6ZjZ
 PmW3bMcz0Dw9IhUK3R/WX96+9UdoytKg5qoTzNzPTKpvKA1yHa/ogl2FnHJS5W+8
 Rxz+1oJ70VMIWGpBOc0acHuB2S0RHZ46kPKkPTBgFYEwtmJ8qobvV3r3uQapNaIP
 2jnamPu0tAaQoSaJKKSulToziT+sd1sNB+9oyu/kP+t3PXzq4qwp2Gr4jzUYKs4A
 ZF3DPhMR3YLLN41g/L3rtB0T/YIS287sZRuaLhCqldNpRerSDk4b0HRAksGk1XrI
 HqYXjWPbRxqYiIUWkInfregSTYJfGPxeLfLKrawNO34/eEV4JrkSKy8d0AJn04EK
 jTRqI3L7o23ZPxs29uH/3+KK90J3emPZkF7GWVJTEAMsM8jYZduGh7EpsttJLaz/
 QnxbTBm9295ahIdCfo/OQhqjWnaNhpbTzf31pyrBZ/itXV7gQ0xjwqPwiyFwI+o/
 F/r81xqdwQ3Ni8MKt2c7zLyVA95JHPe95KQ3GrDXR68aByJr4RuhKG8Y2Pj1VOb3
 V+Hsu5uhwKrK7Yqe+rHDnJBO00OCO8nwbWhMy2xVxoTkSFCjDmo=
 =89Zj
 -----END PGP SIGNATURE-----

Merge tag 'devicetree-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux

Pull more devicetree updates from Rob Herring:

 - First part of DT header detangling dropping cpu.h from of_device.h
   and replacing some includes with forward declarations. A handful of
   drivers needed some adjustment to their includes as a result.

 - Refactor of_device.h to be used by bus drivers rather than various
   device drivers. This moves non-bus related functions out of
   of_device.h. The end goal is for of_platform.h and of_device.h to
   stop including each other.

 - Refactor open coded parsing of "ranges" in some bus drivers to use DT
   address parsing functions

 - Add some new address parsing functions of_property_read_reg(),
   of_range_count(), and of_range_to_resource() in preparation to
   convert more open coded parsing of DT addresses to use them.

 - Treewide clean-ups to use of_property_read_bool() and
   of_property_present() as appropriate. The ones here are the ones that
   didn't get picked up elsewhere.

* tag 'devicetree-for-6.4-2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (34 commits)
  bus: tegra-gmi: Replace of_platform.h with explicit includes
  hte: Use of_property_present() for testing DT property presence
  w1: w1-gpio: Use of_property_read_bool() for boolean properties
  virt: fsl: Use of_property_present() for testing DT property presence
  soc: fsl: Use of_property_present() for testing DT property presence
  sbus: display7seg: Use of_property_read_bool() for boolean properties
  sparc: Use of_property_read_bool() for boolean properties
  sparc: Use of_property_present() for testing DT property presence
  bus: mvebu-mbus: Remove open coded "ranges" parsing
  of/address: Add of_property_read_reg() helper
  of/address: Add of_range_count() helper
  of/address: Add support for 3 address cell bus
  of/address: Add of_range_to_resource() helper
  of: unittest: Add bus address range parsing tests
  of: Drop cpu.h include from of_device.h
  OPP: Adjust includes to remove of_device.h
  irqchip: loongson-eiointc: Add explicit include for cpuhotplug.h
  cpuidle: Adjust includes to remove of_device.h
  cpufreq: sun50i: Add explicit include for cpu.h
  cpufreq: Adjust includes to remove of_device.h
  ...
2023-04-27 10:09:05 -07:00
Andrew Jones b09313dd2e
RISC-V: hwprobe: Explicity check for -1 in vdso init
id_bitsmash is unsigned. We need to explicitly check for -1, rather
than use > 0.

Fixes: aa5af0aa90 ("RISC-V: Add hwprobe vDSO function and data")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230426141333.10063-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-26 08:58:34 -07:00
Andrew Jones 08dc107594
RISC-V: hwprobe: There can only be one first
Only capture the first cpu_id in order for the comparison
below to be of any use.

Fixes: ea3de9ce8a ("RISC-V: Add a syscall for HW probing")
Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230426141333.10063-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-26 08:58:33 -07:00
Alexandre Ghiti 26e7aacb83
riscv: Allow to downgrade paging mode from the command line
Add 2 early command line parameters that allow to downgrade satp mode
(using the same naming as x86):
- "no5lvl": use a 4-level page table (down from sv57 to sv48)
- "no4lvl": use a 3-level page table (down from sv57/sv48 to sv39)

Note that going through the device tree to get the kernel command line
works with ACPI too since the efi stub creates a device tree anyway with
the command line.

In KASAN kernels, we can't use the libfdt that early in the boot process
since we are not ready to execute instrumented functions. So instead of
using the "generic" libfdt, we compile our own versions of those functions
that are not instrumented and that are prefixed so that they do not
conflict with the generic ones. We also need the non-instrumented versions
of the string functions and the prefixed versions of memcpy/memmove.

This is largely inspired by commit aacd149b62 ("arm64: head: avoid
relocating the kernel twice for KASLR") from which I removed compilation
flags that were not relevant to RISC-V at the moment (LTO, SCS). Also
note that we have to link with -z norelro to avoid ld.lld to throw a
warning with the new .got sections, like in commit 311bea3cb9 ("arm64:
link with -z norelro for LLD or aarch64-elf").

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230424092313.178699-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-26 07:30:52 -07:00
Evan Green bb3f89487f
RISC-V: hwprobe: Remove __init on probe_vendor_features()
probe_vendor_features() is now called from smp_callin(), which is not
__init code and runs during cpu hotplug events. Remove the
__init_or_module decoration from it and the functions it calls to avoid
walking into outer space.

Fixes: 62a31d6e38 ("RISC-V: hwprobe: Support probing of misaligned access performance")
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230420194934.1871356-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-25 21:58:42 -07:00
Linus Torvalds e7989789c6 Timers and timekeeping updates:
- Improve the VDSO build time checks to cover all dynamic relocations
 
     VDSO does not allow dynamic relcations, but the build time check is
     incomplete and fragile.
 
     It's based on architectures specifying the relocation types to search
     for and does not handle R_*_NONE relocation entries correctly.
     R_*_NONE relocations are injected by some GNU ld variants if they fail
     to determine the exact .rel[a]/dyn_size to cover trailing zeros.
     R_*_NONE relocations must be ignored by dynamic loaders, so they
     should be ignored in the build time check too.
 
     Remove the architecture specific relocation types to check for and
     validate strictly that no other relocations than R_*_NONE end up
     in the VSDO .so file.
 
   - Prefer signal delivery to the current thread for
     CLOCK_PROCESS_CPUTIME_ID based posix-timers
 
     Such timers prefer to deliver the signal to the main thread of a
     process even if the context in which the timer expires is the current
     task. This has the downside that it might wake up an idle thread.
 
     As there is no requirement or guarantee that the signal has to be
     delivered to the main thread, avoid this by preferring the current
     task if it is part of the thread group which shares sighand.
 
     This not only avoids waking idle threads, it also distributes the
     signal delivery in case of multiple timers firing in the context
     of different threads close to each other better.
 
   - Align the tick period properly (again)
 
     For a long time the tick was starting at CLOCK_MONOTONIC zero, which
     allowed users space applications to either align with the tick or to
     place a periodic computation so that it does not interfere with the
     tick. The alignement of the tick period was more by chance than by
     intention as the tick is set up before a high resolution clocksource is
     installed, i.e. timekeeping is still tick based and the tick period
     advances from there.
 
     The early enablement of sched_clock() broke this alignement as the time
     accumulated by sched_clock() is taken into account when timekeeping is
     initialized. So the base value now(CLOCK_MONOTONIC) is not longer a
     multiple of tick periods, which breaks applications which relied on
     that behaviour.
 
     Cure this by aligning the tick starting point to the next multiple of
     tick periods, i.e 1000ms/CONFIG_HZ.
 
  - A set of NOHZ fixes and enhancements
 
    - Cure the concurrent writer race for idle and IO sleeptime statistics
 
      The statitic values which are exposed via /proc/stat are updated from
      the CPU local idle exit and remotely by cpufreq, but that happens
      without any form of serialization. As a consequence sleeptimes can be
      accounted twice or worse.
 
      Prevent this by restricting the accumulation writeback to the CPU
      local idle exit and let the remote access compute the accumulated
      value.
 
    - Protect idle/iowait sleep time with a sequence count
 
      Reading idle/iowait sleep time, e.g. from /proc/stat, can race with
      idle exit updates. As a consequence the readout may result in random
      and potentially going backwards values.
 
      Protect this by a sequence count, which fixes the idle time
      statistics issue, but cannot fix the iowait time problem because
      iowait time accounting races with remote wake ups decrementing the
      remote runqueues nr_iowait counter. The latter is impossible to fix,
      so the only way to deal with that is to document it properly and to
      remove the assertion in the selftest which triggers occasionally due
      to that.
 
    - Restructure struct tick_sched for better cache layout
 
    - Some small cleanups and a better cache layout for struct tick_sched
 
  - Implement the missing timer_wait_running() callback for POSIX CPU timers
 
    For unknown reason the introduction of the timer_wait_running() callback
    missed to fixup posix CPU timers, which went unnoticed for almost four
    years.
 
    While initially only targeted to prevent livelocks between a timer
    deletion and the timer expiry function on PREEMPT_RT enabled kernels, it
    turned out that fixing this for mainline is not as trivial as just
    implementing a stub similar to the hrtimer/timer callbacks.
 
    The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled systems
    there is a livelock issue independent of RT.
 
    CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU timers
    out from hard interrupt context to task work, which is handled before
    returning to user space or to a VM. The expiry mechanism moves the
    expired timers to a stack local list head with sighand lock held. Once
    sighand is dropped the task can be preempted and a task which wants to
    delete a timer will spin-wait until the expiry task is scheduled back
    in. In the worst case this will end up in a livelock when the preempting
    task and the expiry task are pinned on the same CPU.
 
    The timer wheel has a timer_wait_running() mechanism for RT, which uses
    a per CPU timer-base expiry lock which is held by the expiry code and the
    task waiting for the timer function to complete blocks on that lock.
 
    This does not work in the same way for posix CPU timers as there is no
    timer base and expiry for process wide timers can run on any task
    belonging to that process, but the concept of waiting on an expiry lock
    can be used too in a slightly different way.
 
    Add a per task mutex to struct posix_cputimers_work, let the expiry task
    hold it accross the expiry function and let the deleting task which
    waits for the expiry to complete block on the mutex.
 
    In the non-contended case this results in an extra mutex_lock()/unlock()
    pair on both sides.
 
    This avoids spin-waiting on a task which is scheduled out, prevents the
    livelock and cures the problem for RT and !RT systems.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGrj4THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoZhdEAC/lwfDWCnTXHC8ExQQRDIVNyXmDlLb
 EHB8ZY7Wc4gNZ8UEXEOLOXJHMG9bsbtPGctVewJwRGnXZWKVhpPwQba6kCRycyX0
 0J6l5DlvUaGGrpoOzOZwgETRmtIZE9tEArZR8xlfRScYd93a7yLhwIjO8JaV9vKs
 IQpAQMeJ/ysp6gHrS59qakYfoHU/ERUAu3Tk4GqHUtPtcyz3nX3eTlLWV8LySqs+
 00qr2yc0bQFUFoKzTCxtM8lcEi9ja9SOj1rw28348O+BXE4d0HC12Ie7eU/CDN2Y
 OAlWYxVjy4LMh24LDrRQKTzoVqx9MXDx2g+09B3t8NK5LgeS+EJIjujDhZF147/H
 5y906nplZUKa8BiZW5Rpm/HKH8tFI80T9XWSQCRBeMgTEJyRyRU1yASAwO4xw+dY
 Dn3tGmFGymcV/72o4ic9JFKQd8cTSxPjEJS3qqzMkEAtyI/zPBmKxj/Tce50OH40
 6FSZq1uU21ZQzszwSHISwgFtNr75laUSK4Z1te5OhPOOz+C7O9YqHvqS/1jwhPj2
 tMd8X17fRW3UTUBlBj+zqxqiEGBl/Yk2AvKrJIXGUtfWYCtjMJ7ieCf0kZ7NSVJx
 9ewubA0gqseMD783YomZsy8LLtMKnhclJeslUOVb1oKs1q/WF1R/k6qjy9vUwYaB
 nIJuHl8mxSetag==
 =SVnj
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timers and timekeeping updates from Thomas Gleixner:

 - Improve the VDSO build time checks to cover all dynamic relocations

   VDSO does not allow dynamic relocations, but the build time check is
   incomplete and fragile.

   It's based on architectures specifying the relocation types to search
   for and does not handle R_*_NONE relocation entries correctly.
   R_*_NONE relocations are injected by some GNU ld variants if they
   fail to determine the exact .rel[a]/dyn_size to cover trailing zeros.
   R_*_NONE relocations must be ignored by dynamic loaders, so they
   should be ignored in the build time check too.

   Remove the architecture specific relocation types to check for and
   validate strictly that no other relocations than R_*_NONE end up in
   the VSDO .so file.

 - Prefer signal delivery to the current thread for
   CLOCK_PROCESS_CPUTIME_ID based posix-timers

   Such timers prefer to deliver the signal to the main thread of a
   process even if the context in which the timer expires is the current
   task. This has the downside that it might wake up an idle thread.

   As there is no requirement or guarantee that the signal has to be
   delivered to the main thread, avoid this by preferring the current
   task if it is part of the thread group which shares sighand.

   This not only avoids waking idle threads, it also distributes the
   signal delivery in case of multiple timers firing in the context of
   different threads close to each other better.

 - Align the tick period properly (again)

   For a long time the tick was starting at CLOCK_MONOTONIC zero, which
   allowed users space applications to either align with the tick or to
   place a periodic computation so that it does not interfere with the
   tick. The alignement of the tick period was more by chance than by
   intention as the tick is set up before a high resolution clocksource
   is installed, i.e. timekeeping is still tick based and the tick
   period advances from there.

   The early enablement of sched_clock() broke this alignement as the
   time accumulated by sched_clock() is taken into account when
   timekeeping is initialized. So the base value now(CLOCK_MONOTONIC) is
   not longer a multiple of tick periods, which breaks applications
   which relied on that behaviour.

   Cure this by aligning the tick starting point to the next multiple of
   tick periods, i.e 1000ms/CONFIG_HZ.

 - A set of NOHZ fixes and enhancements:

     * Cure the concurrent writer race for idle and IO sleeptime
       statistics

       The statitic values which are exposed via /proc/stat are updated
       from the CPU local idle exit and remotely by cpufreq, but that
       happens without any form of serialization. As a consequence
       sleeptimes can be accounted twice or worse.

       Prevent this by restricting the accumulation writeback to the CPU
       local idle exit and let the remote access compute the accumulated
       value.

     * Protect idle/iowait sleep time with a sequence count

       Reading idle/iowait sleep time, e.g. from /proc/stat, can race
       with idle exit updates. As a consequence the readout may result
       in random and potentially going backwards values.

       Protect this by a sequence count, which fixes the idle time
       statistics issue, but cannot fix the iowait time problem because
       iowait time accounting races with remote wake ups decrementing
       the remote runqueues nr_iowait counter. The latter is impossible
       to fix, so the only way to deal with that is to document it
       properly and to remove the assertion in the selftest which
       triggers occasionally due to that.

     * Restructure struct tick_sched for better cache layout

     * Some small cleanups and a better cache layout for struct
       tick_sched

 - Implement the missing timer_wait_running() callback for POSIX CPU
   timers

   For unknown reason the introduction of the timer_wait_running()
   callback missed to fixup posix CPU timers, which went unnoticed for
   almost four years.

   While initially only targeted to prevent livelocks between a timer
   deletion and the timer expiry function on PREEMPT_RT enabled kernels,
   it turned out that fixing this for mainline is not as trivial as just
   implementing a stub similar to the hrtimer/timer callbacks.

   The reason is that for CONFIG_POSIX_CPU_TIMERS_TASK_WORK enabled
   systems there is a livelock issue independent of RT.

   CONFIG_POSIX_CPU_TIMERS_TASK_WORK=y moves the expiry of POSIX CPU
   timers out from hard interrupt context to task work, which is handled
   before returning to user space or to a VM. The expiry mechanism moves
   the expired timers to a stack local list head with sighand lock held.
   Once sighand is dropped the task can be preempted and a task which
   wants to delete a timer will spin-wait until the expiry task is
   scheduled back in. In the worst case this will end up in a livelock
   when the preempting task and the expiry task are pinned on the same
   CPU.

   The timer wheel has a timer_wait_running() mechanism for RT, which
   uses a per CPU timer-base expiry lock which is held by the expiry
   code and the task waiting for the timer function to complete blocks
   on that lock.

   This does not work in the same way for posix CPU timers as there is
   no timer base and expiry for process wide timers can run on any task
   belonging to that process, but the concept of waiting on an expiry
   lock can be used too in a slightly different way.

   Add a per task mutex to struct posix_cputimers_work, let the expiry
   task hold it accross the expiry function and let the deleting task
   which waits for the expiry to complete block on the mutex.

   In the non-contended case this results in an extra
   mutex_lock()/unlock() pair on both sides.

   This avoids spin-waiting on a task which is scheduled out, prevents
   the livelock and cures the problem for RT and !RT systems

* tag 'timers-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  posix-cpu-timers: Implement the missing timer_wait_running callback
  selftests/proc: Assert clock_gettime(CLOCK_BOOTTIME) VS /proc/uptime monotonicity
  selftests/proc: Remove idle time monotonicity assertions
  MAINTAINERS: Remove stale email address
  timers/nohz: Remove middle-function __tick_nohz_idle_stop_tick()
  timers/nohz: Add a comment about broken iowait counter update race
  timers/nohz: Protect idle/iowait sleep time under seqcount
  timers/nohz: Only ever update sleeptime from idle exit
  timers/nohz: Restructure and reshuffle struct tick_sched
  tick/common: Align tick period with the HZ tick.
  selftests/timers/posix_timers: Test delivery of signals across threads
  posix-timers: Prefer delivery of signals to the current thread
  vdso: Improve cmd_vdso_check to check all dynamic relocations
2023-04-25 11:22:46 -07:00
Linus Torvalds 3f614ab563 Interrupt core and drivers updates:
- Core:
 
    - Add tracepoints for tasklet callbacks which makes it possible to
      analyze individual tasklet functions instead of guess working
      from the overall duration of tasklet processing
 
    - Ensure that secondary interrupt threads have their affinity adjusted
      correctly.
 
  - Drivers:
 
    - A large rework of the RISC-V IPI management to prepare for a new
      RISC-V interrupt architecture
 
    - Small fixes and enhancements all over the place
 
    - Removal of support for various obsolete hardware platforms and the
      related code
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmRGnqsTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoUsSD/9IHF2HogDvMq+9dBqmqQMrryiLOIad
 dne9PvhZu6Cww60WVRbYA5dvmyRx3oi9vHb5xrqjEgEXwCGyNGUU9K6seqzqwTjr
 BuhokcbeimCVUBsF9/6x0k50tRSRP0oCLA49WDJ+uaXyICII+y+p+qkQOQmP6UTx
 sCpA6Y51RpO7eAcxiMqLa2XgiixQCFZvRXRmO0a0DcxY3DhOSz6PbecTWcY43jtX
 CpHiNZkeiVmLOAmbfPF/mBBRczt9BzYTx3C/NA2TTXwwA2Mcw7p2Vmh3JL2cTWzc
 nD6nvarsTkOk9T8LkT8uEk/ovalwXtTn+Z8yYrcI3o2I89y4cat56haz/Y2tOTFG
 D5fUXHIFTV8jsBUUL2Ai+3PCjoSzd1jbqua7fa8496FqS2FyZjNsHeuzIUXRyQd9
 2/VF+sT5NQ6ytYzgiUuoO13VcI6e6Hc3mwmbd3RhKMf+epZQ9ifx9KcLlokWcxcS
 bdJSHWz6Zos3hH+GRilXmgi16xNN7eaYxEtg0FPUBuB2zWYzZwreY2uvlZGqYpVG
 OKTncko7TeDOR8PXybWXXce6VhKxhMHgpHOdFMFm4lIqDzpbMmyYjNaXdxFqhyGM
 s/FTxPOdEMwapWBGr5Fhumepgdmujc2USZArnIPvnzwF5mUje+U1Pg4xHeLYF4lU
 Taaw4Jc5OvAD2A==
 =EWF0
 -----END PGP SIGNATURE-----

Merge tag 'irq-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull interrupt updates from Thomas Gleixner:
 "Core:

   - Add tracepoints for tasklet callbacks which makes it possible to
     analyze individual tasklet functions instead of guess working from
     the overall duration of tasklet processing

   - Ensure that secondary interrupt threads have their affinity
     adjusted correctly

  Drivers:

   - A large rework of the RISC-V IPI management to prepare for a new
     RISC-V interrupt architecture

   - Small fixes and enhancements all over the place

   - Removal of support for various obsolete hardware platforms and the
     related code"

* tag 'irq-core-2023-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
  irqchip/st: Remove stih415/stih416 and stid127 platforms support
  irqchip/gic-v3: Add Rockchip 3588001 erratum workaround
  genirq: Update affinity of secondary threads
  softirq: Add trace points for tasklet entry/exit
  irqchip/loongson-pch-pic: Fix pch_pic_acpi_init calling
  irqchip/loongson-pch-pic: Fix registration of syscore_ops
  irqchip/loongson-eiointc: Fix registration of syscore_ops
  irqchip/loongson-eiointc: Fix incorrect use of acpi_get_vec_parent
  irqchip/loongson-eiointc: Fix returned value on parsing MADT
  irqchip/riscv-intc: Add empty irq_eoi() for chained irq handlers
  RISC-V: Use IPIs for remote icache flush when possible
  RISC-V: Use IPIs for remote TLB flush when possible
  RISC-V: Allow marking IPIs as suitable for remote FENCEs
  RISC-V: Treat IPIs as normal Linux IRQs
  irqchip/riscv-intc: Allow drivers to directly discover INTC hwnode
  RISC-V: Clear SIP bit only when using SBI IPI operations
  irqchip/irq-sifive-plic: Add syscore callbacks for hibernation
  irqchip: Use of_property_read_bool() for boolean properties
  irqchip/bcm-6345-l1: Request memory region
  irqchip/gicv3: Workaround for NVIDIA erratum T241-FABRIC-4
  ...
2023-04-25 11:16:08 -07:00
Anup Patel 8fe6f7e14c RISC-V: Detect AIA CSRs from ISA string
We have two extension names for AIA ISA support: Smaia (M-mode AIA CSRs)
and Ssaia (S-mode AIA CSRs).

We extend the ISA string parsing to detect Smaia and Ssaia extensions.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-21 17:45:42 +05:30
Palmer Dabbelt 310c33dc7a
Merge patch series "Introduce 64b relocatable kernel"
Alexandre Ghiti <alexghiti@rivosinc.com> says:

After multiple attempts, this patchset is now based on the fact that the
64b kernel mapping was moved outside the linear mapping.

The first patch allows to build relocatable kernels but is not selected
by default. That patch is a requirement for KASLR.
The second and third patches take advantage of an already existing powerpc
script that checks relocations at compile-time, and uses it for riscv.

* b4-shazam-merge:
  riscv: Use --emit-relocs in order to move .rela.dyn in init
  riscv: Check relocations at compile time
  powerpc: Move script to check relocations at compile time in scripts/
  riscv: Introduce CONFIG_RELOCATABLE
  riscv: Move .rela.dyn outside of init to avoid empty relocations
  riscv: Prepare EFI header for relocatable kernels

Link: https://lore.kernel.org/r/20230329045329.64565-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:47:45 -07:00
Alexandre Ghiti 39b3307294
riscv: Introduce CONFIG_RELOCATABLE
This config allows to compile 64b kernel as PIE and to relocate it at
any virtual address at runtime: this paves the way to KASLR.
Runtime relocation is possible since relocation metadata are embedded into
the kernel.

Note that relocating at runtime introduces an overhead even if the
kernel is loaded at the same address it was linked at and that the compiler
options are those used in arm64 which uses the same RELA relocation
format.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-4-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:46:30 -07:00
Alexandre Ghiti 69a90d2fe1
riscv: Move .rela.dyn outside of init to avoid empty relocations
This is a preparatory patch for relocatable kernels: .rela.dyn should be
in .init but doing so actually produces empty relocations, so this should
be a temporary commit until we find a solution.

This issue was reported here [1].

[1] https://lore.kernel.org/all/4a6fc7a3-9697-a49b-0941-97f32194b0d7@ghiti.fr/.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-3-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:46:29 -07:00
Alexandre Ghiti 55de1e4ad4
riscv: Prepare EFI header for relocatable kernels
ld does not handle relocations correctly as explained here [1],
a fix for that was proposed by Nelson there but we have to support older
toolchains and then provide this fix.

Note that llvm does not need this fix and is then excluded.

[1] https://sourceware.org/pipermail/binutils/2023-March/126690.html

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329045329.64565-2-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:46:28 -07:00
Palmer Dabbelt 2667e3673f
Merge patch series "RISC-V kasan rework"
Alexandre Ghiti <alexghiti@rivosinc.com> says:

As described in patch 2, our current kasan implementation is intricate,
so I tried to simplify the implementation and mimic what arm64/x86 are
doing.

In addition it fixes UEFI bootflow with a kasan kernel and kasan inline
instrumentation: all kasan configurations were tested on a large ubuntu
kernel with success with KASAN_KUNIT_TEST and KASAN_MODULE_TEST.

inline ubuntu config + uefi:
 sv39: OK
 sv48: OK
 sv57: OK

outline ubuntu config + uefi:
 sv39: OK
 sv48: OK
 sv57: OK

Actually 1 test always fails with KASAN_KUNIT_TEST that I have to check:
KASAN failure expected in "set_bit(nr, addr)", but none occurrred

Note that Palmer recently proposed to remove COMMAND_LINE_SIZE from the
userspace abi
https://lore.kernel.org/lkml/20221211061358.28035-1-palmer@rivosinc.com/T/
so that we can finally increase the command line to fit all kasan kernel
parameters.

All of this should hopefully fix the syzkaller riscv build that has been
failing for a few months now, any test is appreciated and if I can help
in any way, please ask.

* b4-shazam-merge:
  riscv: Unconditionnally select KASAN_VMALLOC if KASAN
  riscv: Fix ptdump when KASAN is enabled
  riscv: Fix EFI stub usage of KASAN instrumented strcmp function
  riscv: Move DTB_EARLY_BASE_VA to the kernel address space
  riscv: Rework kasan population functions
  riscv: Split early and final KASAN population functions

Link: https://lore.kernel.org/r/20230203075232.274282-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:24:56 -07:00
Alexandre Ghiti 617955ca6e
riscv: Fix EFI stub usage of KASAN instrumented strcmp function
The EFI stub must not use any KASAN instrumented code as the kernel
proper did not initialize the thread pointer and the mapping for the
KASAN shadow region.

Avoid using the generic strcmp function, instead use the one in
drivers/firmware/efi/libstub/string.c.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20230203075232.274282-5-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-19 07:24:52 -07:00
Palmer Dabbelt eb04e72b34
Merge patch series "RISC-V Hardware Probing User Interface"
Evan Green <evan@rivosinc.com> says:

There's been a bunch of off-list discussions about this, including at
Plumbers.  The original plan was to do something involving providing an
ISA string to userspace, but ISA strings just aren't sufficient for a
stable ABI any more: in order to parse an ISA string users need the
version of the specifications that the string is written to, the version
of each extension (sometimes at a finer granularity than the RISC-V
releases/versions encode), and the expected use case for the ISA string
(ie, is it a U-mode or M-mode string).  That's a lot of complexity to
try and keep ABI compatible and it's probably going to continue to grow,
as even if there's no more complexity in the specifications we'll have
to deal with the various ISA string parsing oddities that end up all
over userspace.

Instead this patch set takes a very different approach and provides a set
of key/value pairs that encode various bits about the system.  The big
advantage here is that we can clearly define what these mean so we can
ensure ABI stability, but it also allows us to encode information that's
unlikely to ever appear in an ISA string (see the misaligned access
performance, for example).  The resulting interface looks a lot like
what arm64 and x86 do, and will hopefully fit well into something like
ACPI in the future.

The actual user interface is a syscall, with a vDSO function in front of
it. The vDSO function can answer some queries without a syscall at all,
and falls back to the syscall for cases it doesn't have answers to.
Currently we prepopulate it with an array of answers for all keys and
a CPU set of "all CPUs". This can be adjusted as necessary to provide
fast answers to the most common queries.

An example series in glibc exposing this syscall and using it in an
ifunc selector for memcpy can be found at [1].

I was asked about the performance delta between this and something like
sysfs. I created a small test program and ran it on a Nezha D1
Allwinner board. Doing each operation 100000 times and dividing, these
operations take the following amount of time:
 - open()+read()+close() of /sys/kernel/cpu_byteorder: 3.8us
 - access("/sys/kernel/cpu_byteorder", R_OK): 1.3us
 - riscv_hwprobe() vDSO and syscall: .0094us
 - riscv_hwprobe() vDSO with no syscall: 0.0091us

These numbers get farther apart if we query multiple keys, as sysfs will
scale linearly with the number of keys, where the dedicated syscall
stays the same. To frame these numbers, I also did a tight
fork/exec/wait loop, which I measured as 4.8ms. So doing 4
open/read/close operations is a delta of about 0.3%, versus a single vDSO
call is a delta of essentially zero.

[1] https://patchwork.ozlabs.org/project/glibc/list/?series=343050

* b4-shazam-merge:
  RISC-V: Add hwprobe vDSO function and data
  selftests: Test the new RISC-V hwprobe interface
  RISC-V: hwprobe: Support probing of misaligned access performance
  RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
  RISC-V: Add a syscall for HW probing
  RISC-V: Move struct riscv_cpuinfo to new header

Link: https://lore.kernel.org/r/20230407231103.2622178-1-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 19:49:51 -07:00
Evan Green aa5af0aa90
RISC-V: Add hwprobe vDSO function and data
Add a vDSO function __vdso_riscv_hwprobe, which can sit in front of the
riscv_hwprobe syscall and answer common queries. We stash a copy of
static answers for the "all CPUs" case in the vDSO data page. This data
is private to the vDSO, so we can decide later to change what's stored
there or under what conditions we defer to the syscall. Currently all
data can be discovered at boot, so the vDSO function answers all queries
when the cpumask is set to the "all CPUs" hint.

There's also a boolean in the data that lets the vDSO function know that
all CPUs are the same. In that case, the vDSO will also answer queries
for arbitrary CPU masks in addition to the "all CPUs" hint.

Signed-off-by: Evan Green <evan@rivosinc.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-7-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 15:48:18 -07:00
Evan Green 62a31d6e38
RISC-V: hwprobe: Support probing of misaligned access performance
This allows userspace to select various routines to use based on the
performance of misaligned access on the target hardware.

Rather than adding DT bindings, this change taps into the alternatives
mechanism used to probe CPU errata. Add a new function pointer alongside
the vendor-specific errata_patch_func() that probes for desirable errata
(otherwise known as "features"). Unlike the errata_patch_func(), this
function is called on each CPU as it comes up, so it can save
feature information per-CPU.

The T-head C906 has fast unaligned access, both as defined by GCC [1],
and in performing a basic benchmark, which determined that byte copies
are >50% slower than a misaligned word copy of the same data size (source
for this test at [2]):

bytecopy size f000 count 50000 offset 0 took 31664899 us
wordcopy size f000 count 50000 offset 0 took 5180919 us
wordcopy size f000 count 50000 offset 1 took 13416949 us

[1] https://github.com/gcc-mirror/gcc/blob/master/gcc/config/riscv/riscv.cc#L353
[2] https://pastebin.com/EPXvDHSW

Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-5-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 15:48:16 -07:00
Evan Green 00e76e2c6a
RISC-V: hwprobe: Add support for RISCV_HWPROBE_BASE_BEHAVIOR_IMA
We have an implicit set of base behaviors that userspace depends on,
which are mostly defined in various ISA specifications.

Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-4-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 15:48:15 -07:00
Evan Green ea3de9ce8a
RISC-V: Add a syscall for HW probing
We don't have enough space for these all in ELF_HWCAP{,2} and there's no
system call that quite does this, so let's just provide an arch-specific
one to probe for hardware capabilities.  This currently just provides
m{arch,imp,vendor}id, but with the key-value pairs we can pass more in
the future.

Co-developed-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-3-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 15:48:14 -07:00
Evan Green ff77cf5b2e
RISC-V: Move struct riscv_cpuinfo to new header
In preparation for tracking and exposing microarchitectural details to
userspace (like whether or not unaligned accesses are fast), move the
riscv_cpuinfo struct out to its own new cpufeatures.h header. It will
need to be used by more than just cpu.c.

Signed-off-by: Evan Green <evan@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Paul Walmsley <paul.walmsley@sifive.com>
Link: https://lore.kernel.org/r/20230407231103.2622178-2-evan@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-18 15:48:03 -07:00
Alexandre Ghiti f158162607
riscv: Do not set initial_boot_params to the linear address of the dtb
early_init_dt_verify() is already called in parse_dtb() and since the dtb
address does not change anymore (it is now in the fixmap region), no need
to reset initial_boot_params by calling early_init_dt_verify() again.

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230329081932.79831-3-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-13 18:14:33 -07:00
Alexandre Ghiti ef69d2559f
riscv: Move early dtb mapping into the fixmap region
riscv establishes 2 virtual mappings:

- early_pg_dir maps the kernel which allows to discover the system
  memory
- swapper_pg_dir installs the final mapping (linear mapping included)

We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
mapping was not carried over in swapper_pg_dir. It happens that
early_init_fdt_scan_reserved_mem() must be called before swapper_pg_dir is
setup otherwise we could allocate reserved memory defined in the dtb.
And this function initializes reserved_mem variable with addresses that
lie in the early_pg_dir dtb mapping: when those addresses are reused
with swapper_pg_dir, this mapping does not exist and then we trap.

The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
must be called before swapper_pg_dir is set up otherwise we could
allocate in reserved memory defined in the dtb.

So move the dtb mapping in the fixmap region which is established in
early_pg_dir and handed over to swapper_pg_dir.

Fixes: 922b0375fc ("riscv: Fix memblock reservation for device tree blob")
Fixes: 8f3a2b4a96 ("RISC-V: Move DT mapping outof fixmap")
Fixes: 50e63dd8ed ("riscv: fix reserved memory setup")
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230329081932.79831-2-alexghiti@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-13 18:14:26 -07:00
Rob Herring 06d9066976 riscv: cacheinfo: Adjust includes to remove of_device.h
Now that of_cpu_device_node_get() is defined in of.h, of_device.h is just
implicitly including other includes, and is no longer needed. Adjust the
include files with what was implicitly included by of_device.h (cpu.h and
of.h) and drop including of_device.h.

Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-9-581e2605fe47@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-13 17:46:34 -05:00
Rob Herring a0418108d7 riscv: Add explicit include for cpu.h
Removing the include of cpu.h from of_device.h (included by
of_platform.h) causes an error in setup.c:

arch/riscv/kernel/setup.c:313:22: error: arithmetic on a pointer to an incomplete type 'typeof(struct cpu)' (aka 'struct cpu')

The of_platform.h header is not necessary either, so it can be dropped.

Link: https://lore.kernel.org/r/20230329-dt-cpu-header-cleanups-v1-8-581e2605fe47@kernel.org
Signed-off-by: Rob Herring <robh@kernel.org>
2023-04-13 17:46:34 -05:00
Song Shuai 6a24915145
Revert "riscv: Set more data to cacheinfo"
This reverts commit baf7cbd94b.

There are some duplicate cache attributes populations executed
in both ci_leaf_init() and later cache_setup_properties().

Revert the commit baf7cbd94b ("riscv: Set more data to cacheinfo")
to setup only the level and type attributes at this early place.

Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230308064734.512457-1-suagrfillet@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-11 16:07:53 -07:00
Mathis Salmen 8d73648274
riscv: add icache flush for nommu sigreturn trampoline
In a NOMMU kernel, sigreturn trampolines are generated on the user
stack by setup_rt_frame. Currently, these trampolines are not instruction
fenced, thus their visibility to ifetch is not guaranteed.

This patch adds a flush_icache_range in setup_rt_frame to fix this
problem.

Signed-off-by: Mathis Salmen <mathis.salmen@matsal.de>
Fixes: 6bd33e1ece ("riscv: add nommu support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230406101130.82304-1-mathis.salmen@matsal.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-11 12:52:27 -07:00
Björn Töpel 9c2598d435
riscv: entry: Save a0 prior syscall_enter_from_user_mode()
The RISC-V calling convention passes the first argument, and the
return value in the a0 register. For this reason, the a0 register
needs some extra care; When handling syscalls, the a0 register is
saved into regs->orig_a0, so a0 can be properly restored for,
e.g. interrupted syscalls.

This functionality was broken with the introduction of the generic
entry patches. Here, a0 was saved into orig_a0 after calling
syscall_enter_from_user_mode(), which can change regs->a0 for some
paths, incorrectly restoring a0.

This is resolved, by saving a0 prior doing the
syscall_enter_from_user_mode() call.

Fixes: f0bddf5058 ("riscv: entry: Convert to generic entry")
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Tested-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reported-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Tested-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20230403065207.1070974-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-04-11 08:25:29 -07:00
Anup Patel fb0f3d281b RISC-V: Allow marking IPIs as suitable for remote FENCEs
To do remote FENCEs (i.e. remote TLB flushes) using IPI calls on the
RISC-V kernel, we need hardware mechanism to directly inject IPI from
the supervisor mode (i.e. RISC-V kernel) instead of using SBI calls.

The upcoming AIA IMSIC devices allow direct IPI injection from the
supervisor mode (i.e. RISC-V kernel). To support this, we extend the
riscv_ipi_set_virq_range() function so that IPI provider (i.e. irqchip
drivers can mark IPIs as suitable for remote FENCEs.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-5-apatel@ventanamicro.com
2023-04-08 11:26:24 +01:00
Anup Patel 832f15f426 RISC-V: Treat IPIs as normal Linux IRQs
Currently, the RISC-V kernel provides arch specific hooks (i.e.
struct riscv_ipi_ops) to register IPI handling methods. The stats
gathering of IPIs is also arch specific in the RISC-V kernel.

Other architectures (such as ARM, ARM64, and MIPS) have moved away
from custom arch specific IPI handling methods. Currently, these
architectures have Linux irqchip drivers providing a range of Linux
IRQ numbers to be used as IPIs and IPI triggering is done using
generic IPI APIs. This approach allows architectures to treat IPIs
as normal Linux IRQs and IPI stats gathering is done by the generic
Linux IRQ subsystem.

We extend the RISC-V IPI handling as-per above approach so that arch
specific IPI handling methods (struct riscv_ipi_ops) can be removed
and the IPI handling is done through the Linux IRQ subsystem.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-4-apatel@ventanamicro.com
2023-04-08 11:26:24 +01:00
Anup Patel 0c60a31ce6 irqchip/riscv-intc: Allow drivers to directly discover INTC hwnode
Various RISC-V drivers (such as SBI IPI, SBI Timer, SBI PMU, and
KVM RISC-V) don't have associated DT node but these drivers need
standard per-CPU (local) interrupts defined by the RISC-V privileged
specification.

We add riscv_get_intc_hwnode() in arch/riscv which allows RISC-V
drivers not having DT node to discover INTC hwnode which in-turn
helps these drivers to map per-CPU (local) interrupts provided
by the INTC driver.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-3-apatel@ventanamicro.com
2023-04-08 11:26:24 +01:00
Anup Patel 3ee92565b8 RISC-V: Clear SIP bit only when using SBI IPI operations
The software interrupt pending (i.e. [M|S]SIP) bit is writeable for
S-mode but read-only for M-mode so we clear this bit only when using
SBI IPI operations.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Bin Meng <bmeng.cn@gmail.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://lore.kernel.org/r/20230328035223.1480939-2-apatel@ventanamicro.com
2023-04-08 11:26:23 +01:00
Palmer Dabbelt e45d6a52fe
Merge patch series "riscv: Add GENERIC_ENTRY support"
guoren@kernel.org <guoren@kernel.org> says:

From: Guo Ren <guoren@linux.alibaba.com>

The patches convert riscv to use the generic entry infrastructure from
kernel/entry/*. Some optimization for entry.S with new .macro and merge
ret_from_kernel_thread into ret_from_fork.

* b4-shazam-merge:
  riscv: entry: Consolidate general regs saving/restoring
  riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork
  riscv: entry: Remove extra level wrappers of trace_hardirqs_{on,off}
  riscv: entry: Convert to generic entry
  riscv: entry: Add noinstr to prevent instrumentation inserted
  riscv: ptrace: Remove duplicate operation

Link: https://lore.kernel.org/r/20230222033021.983168-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-24 13:34:43 -07:00
Valentin Schneider 4c8c3c7f70 treewide: Trace IPIs sent via smp_send_reschedule()
To be able to trace invocations of smp_send_reschedule(), rename the
arch-specific definitions of it to arch_smp_send_reschedule() and wrap it
into an smp_send_reschedule() that contains a tracepoint.

Changes to include the declaration of the tracepoint were driven by the
following coccinelle script:

  @func_use@
  @@
  smp_send_reschedule(...);

  @include@
  @@
  #include <trace/events/ipi.h>

  @no_include depends on func_use && !include@
  @@
    #include <...>
  +
  + #include <trace/events/ipi.h>

[csky bits]
[riscv bits]
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230307143558.294354-6-vschneid@redhat.com
2023-03-24 11:01:28 +01:00
Jisheng Zhang 45b32b946a
riscv: entry: Consolidate general regs saving/restoring
Consolidate the saving/restoring GPs (except zero, ra, sp, gp,
tp and t0) into save_from_x6_to_x31/restore_from_x6_to_x31 macros.

No functional change intended.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-8-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 08:47:03 -07:00
Jisheng Zhang ab9164dae2
riscv: entry: Consolidate ret_from_kernel_thread into ret_from_fork
The ret_from_kernel_thread() behaves similarly with ret_from_fork(),
the only difference is whether call the fn(arg) or not, this can be
achieved by testing fn is NULL or not, I.E s0 is 0 or not. Many
architectures have done the same thing, it makes entry.S more clean.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-7-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 08:47:02 -07:00
Jisheng Zhang 0bf298ad2b
riscv: entry: Remove extra level wrappers of trace_hardirqs_{on,off}
Since riscv is converted to generic entry, there's no need for the
extra wrappers of trace_hardirqs_{on,off}.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-6-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 08:47:01 -07:00
Guo Ren f0bddf5058
riscv: entry: Convert to generic entry
This patch converts riscv to use the generic entry infrastructure from
kernel/entry/*. The generic entry makes maintainers' work easier and
codes more elegant. Here are the changes:

 - More clear entry.S with handle_exception and ret_from_exception
 - Get rid of complex custom signal implementation
 - Move syscall procedure from assembly to C, which is much more
   readable.
 - Connect ret_from_fork & ret_from_kernel_thread to generic entry.
 - Wrap with irqentry_enter/exit and syscall_enter/exit_from_user_mode
 - Use the standard preemption code instead of custom

Suggested-by: Huacai Chen <chenhuacai@kernel.org>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Tested-by: Yipeng Zou <zouyipeng@huawei.com>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Ben Hutchings <ben@decadent.org.uk>
Link: https://lore.kernel.org/r/20230222033021.983168-5-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 08:47:00 -07:00
Guo Ren d0db02c628
riscv: entry: Add noinstr to prevent instrumentation inserted
Without noinstr the compiler is free to insert instrumentation (think
all the k*SAN, KCov, GCov, ftrace etc..) which can call code we're not
yet ready to run this early in the entry path, for instance it could
rely on RCU which isn't on yet, or expect lockdep state. (by peterz)

Link: https://lore.kernel.org/linux-riscv/YxcQ6NoPf3AH0EXe@hirez.programming.kicks-ass.net/
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Jisheng Zhang <jszhang@kernel.org>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230222033021.983168-4-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 08:46:59 -07:00
Guo Ren 8574bf8d0d
riscv: ptrace: Remove duplicate operation
The TIF_SYSCALL_TRACE is controlled by a common code, see
kernel/ptrace.c and include/linux/thread_info.h.

clear_task_syscall_work(child, SYSCALL_TRACE);

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230222033021.983168-3-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-23 08:46:58 -07:00
Fangrui Song aff69273af vdso: Improve cmd_vdso_check to check all dynamic relocations
The actual intention is that no dynamic relocation exists in the VDSO. For
this the VDSO build validates that the resulting .so file does not have any
relocations which are specified via $(ARCH_REL_TYPE_ABS) per architecture,
which is fragile as e.g. ARM64 lacks an entry for R_AARCH64_RELATIVE. Aside
of that ARCH_REL_TYPE_ABS is a misnomer as it checks for relative
relocations too.

However, some GNU ld ports produce unneeded R_*_NONE relocation entries. If
a port fails to determine the exact .rel[a].dyn size, the trailing zeros
become R_*_NONE relocations. E.g. ld's powerpc port recently fixed
https://sourceware.org/bugzilla/show_bug.cgi?id=29540). R_*_NONE are
generally a no-op in the dynamic loaders. So just ignore them.

Remove the ARCH_REL_TYPE_ABS defines and just validate that the resulting
.so file does not contain any R_* relocation entries except R_*_NONE.

Signed-off-by: Fangrui Song <maskray@google.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for aarch64
Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> # for vDSO, aarch64
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Link: https://lore.kernel.org/r/20230310190750.3323802-1-maskray@google.com
2023-03-21 21:15:34 +01:00
Palmer Dabbelt 4b740779ac
Merge patch series "RISC-V: Apply Zicboz to clear_page"
Andrew Jones <ajones@ventanamicro.com> says:

When the Zicboz extension is available we can more rapidly zero naturally
aligned Zicboz block sized chunks of memory. As pages are always page
aligned and are larger than any Zicboz block size will be, then
clear_page() appears to be a good candidate for the extension. While cycle
count and energy consumption should also be considered, we can be pretty
certain that implementing clear_page() with the Zicboz extension is a win
by comparing the new dynamic instruction count with its current count[1].
Doing so we see that the new count is just over a quarter of the old count
(see patch6's commit message for more details).

For those of you who reviewed v1[2], you may be looking for the memset()
patches. As pointed out in v1, and a couple follow-up emails, it's not
clear that patching memset() is a win yet. When I get a chance to test
on real hardware with a comprehensive benchmark collection then I can
post the memset() patches separately (assuming the benchmarks show it's
worthwhile).

* b4-shazam-merge:
  RISC-V: KVM: Expose Zicboz to the guest
  RISC-V: KVM: Provide UAPI for Zicboz block size
  RISC-V: Use Zicboz in clear_page when available
  RISC-V: cpufeatures: Put the upper 16 bits of patch ID to work
  RISC-V: Add Zicboz detection and block size parsing
  dt-bindings: riscv: Document cboz-block-size
  RISC-V: Factor out body of riscv_init_cbom_blocksize loop
  RISC-V: alternatives: Support patching multiple insns in assembly

Link: https://lore.kernel.org/r/20230224162631.405473-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-15 07:11:08 -07:00
Andrew Jones ab0f77465e
RISC-V: Use Zicboz in clear_page when available
Using memset() to zero a 4K page takes 563 total instructions, where
20 are branches. clear_page(), with Zicboz and a 64 byte block size,
takes 169 total instructions, where 4 are branches and 33 are nops.
Even though the block size is a variable, thanks to alternatives, we
can still implement a Duff device without having to do any preliminary
calculations. This is achieved by using the alternatives' cpufeature
value (the upper 16 bits of patch_id). The value used is the maximum
zicboz block size order accepted at the patch site. This enables us
to stop patching / unrolling when 4K bytes have been zeroed (we would
loop and continue after 4K if the page size would be larger)

For 4K pages, unrolling 16 times allows block sizes of 64 and 128 to
only loop a few times and larger block sizes to not loop at all. Since
cbo.zero doesn't take an offset, we also need an 'add' after each
instruction, making the loop body 112 to 160 bytes. Hopefully this
is small enough to not cause icache misses.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-14 21:26:06 -07:00
Andrew Jones d25f256332
RISC-V: cpufeatures: Put the upper 16 bits of patch ID to work
cpufeature IDs are consecutive integers starting at 26, so a 32-bit
patch ID allows an aircraft carrier load of feature IDs. Repurposing
the upper 16 bits still leaves a boat load of feature IDs and gains
16 bits which may be used to control patching on a per patch-site
basis.

This will be initially used in Zicboz's application to clear_page(),
as Zicboz's block size must also be considered. In that case, the
upper 16-bit value's role will be to convey the maximum block size
which the Zicboz clear_page() implementation supports.

cpufeature patch sites which need to check for the existence or
absence of other cpufeatures may also be able to make use of this.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-6-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-14 21:26:05 -07:00
Andrew Jones 7ea5a73617
RISC-V: Add Zicboz detection and block size parsing
Parse "riscv,cboz-block-size" from the DT by piggybacking on Zicbom's
riscv_init_cbom_blocksize(). Additionally check the DT for the presence
of the "zicboz" extension and, when it's present, validate the parsed
cboz block size as we do Zicbom's cbom block size with
riscv_isa_extension_check().

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230224162631.405473-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-14 21:26:04 -07:00
Palmer Dabbelt 73bde0ca0a
Merge patch series "riscv: alternative/cpufeature related cleanups"
Andrew Jones <ajones@ventanamicro.com> says:

This series has no intended functional change. These cleanups were
found while renaming errata_id to patch_id in order to better
convey that its purpose is larger than errata (it's also for
cpufeatures).

* b4-shazam-merge:
  riscv: cpufeature: Drop errata_list.h and other unused includes
  riscv: lib: Include hwcap.h directly
  riscv: alternatives: Rename errata_id to patch_id
  riscv: alternatives: Remove unnecessary define and unused struct
  riscv: Rename Kconfig.erratas to Kconfig.errata
  riscv: Clarify RISCV_ALTERNATIVE help text

Link: https://lore.kernel.org/r/20230224154601.88163-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-14 20:51:34 -07:00
Andrew Jones 816a697441
riscv: cpufeature: Drop errata_list.h and other unused includes
Drop errata_list.h, since cpufeature.c includes hwcap.h directly to
get cpufeature IDs. And, while there, prune the rest of the unused
includes too.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-7-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-14 20:51:25 -07:00
Andrew Jones ff19a8dee1
riscv: alternatives: Rename errata_id to patch_id
Alternatives are used for both errata and cpufeatures. Use a more
generic name, 'patch_id', as in "ID of code patching site", to
avoid confusion when alternatives are used for cpufeatures.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230224154601.88163-5-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-14 20:51:23 -07:00
Palmer Dabbelt 4a4c459872
Merge patch series "riscv, mm: detect svnapot cpu support at runtime"
Qinglin Pan <panqinglin00@gmail.com> says:

Svnapot is a RISC-V extension for marking contiguous 4K pages as a non-4K
page. This patch set is for using Svnapot in hugetlb fs and huge vmap.

This patchset adds a Kconfig item for using Svnapot in
"Platform type"->"SVNAPOT extension support". Its default value is on,
and people can set it off if they don't allow kernel to detect Svnapot
hardware support and leverage it.

Tested on:
  - qemu rv64 with "Svnapot support" off and svnapot=true.
  - qemu rv64 with "Svnapot support" on and svnapot=true.
  - qemu rv64 with "Svnapot support" off and svnapot=false.
  - qemu rv64 with "Svnapot support" on and svnapot=false.

* b4-shazam-merge:
  riscv: mm: support Svnapot in huge vmap
  riscv: mm: support Svnapot in hugetlb page
  riscv: mm: modify pte format for Svnapot

Link: https://lore.kernel.org/r/20230209131647.17245-1-panqinglin00@gmail.com
[Palmer: fix up the feature ordering in the merge]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 18:13:45 -08:00
Conor Dooley 2a8db5ec4a
RISC-V: Don't check text_mutex during stop_machine
We're currently using stop_machine() to update ftrace & kprobes, which
means that the thread that takes text_mutex during may not be the same
as the thread that eventually patches the code.  This isn't actually a
race because the lock is still held (preventing any other concurrent
accesses) and there is only one thread running during stop_machine(),
but it does trigger a lockdep failure.

This patch just elides the lockdep check during stop_machine.

Fixes: c15ac4fd60 ("riscv/ftrace: Add dynamic function tracer support")
Suggested-by: Steven Rostedt <rostedt@goodmis.org>
Reported-by: Changbin Du <changbin.du@gmail.com>
Signed-off-by: Palmer Dabbelt <palmerdabbelt@google.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230303143754.4005217-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 14:58:51 -08:00
Alexandre Ghiti 76950340cf
riscv: Use READ_ONCE_NOCHECK in imprecise unwinding stack mode
When CONFIG_FRAME_POINTER is unset, the stack unwinding function
walk_stackframe randomly reads the stack and then, when KASAN is enabled,
it can lead to the following backtrace:

[    0.000000] ==================================================================
[    0.000000] BUG: KASAN: stack-out-of-bounds in walk_stackframe+0xa6/0x11a
[    0.000000] Read of size 8 at addr ffffffff81807c40 by task swapper/0
[    0.000000]
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.2.0-12919-g24203e6db61f #43
[    0.000000] Hardware name: riscv-virtio,qemu (DT)
[    0.000000] Call Trace:
[    0.000000] [<ffffffff80007ba8>] walk_stackframe+0x0/0x11a
[    0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff80c49c80>] dump_stack_lvl+0x22/0x36
[    0.000000] [<ffffffff80c3783e>] print_report+0x198/0x4a8
[    0.000000] [<ffffffff80099ecc>] init_param_lock+0x26/0x2a
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff8015f68a>] kasan_report+0x9a/0xc8
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff80007c4a>] walk_stackframe+0xa2/0x11a
[    0.000000] [<ffffffff8006e99c>] desc_make_final+0x80/0x84
[    0.000000] [<ffffffff8009a04e>] stack_trace_save+0x88/0xa6
[    0.000000] [<ffffffff80099fc2>] filter_irq_stacks+0x72/0x76
[    0.000000] [<ffffffff8006b95e>] devkmsg_read+0x32a/0x32e
[    0.000000] [<ffffffff8015ec16>] kasan_save_stack+0x28/0x52
[    0.000000] [<ffffffff8006e998>] desc_make_final+0x7c/0x84
[    0.000000] [<ffffffff8009a04a>] stack_trace_save+0x84/0xa6
[    0.000000] [<ffffffff8015ec52>] kasan_set_track+0x12/0x20
[    0.000000] [<ffffffff8015f22e>] __kasan_slab_alloc+0x58/0x5e
[    0.000000] [<ffffffff8015e7ea>] __kmem_cache_create+0x21e/0x39a
[    0.000000] [<ffffffff80e133ac>] create_boot_cache+0x70/0x9c
[    0.000000] [<ffffffff80e17ab2>] kmem_cache_init+0x6c/0x11e
[    0.000000] [<ffffffff80e00fd6>] mm_init+0xd8/0xfe
[    0.000000] [<ffffffff80e011d8>] start_kernel+0x190/0x3ca
[    0.000000]
[    0.000000] The buggy address belongs to stack of task swapper/0
[    0.000000]  and is located at offset 0 in frame:
[    0.000000]  stack_trace_save+0x0/0xa6
[    0.000000]
[    0.000000] This frame has 1 object:
[    0.000000]  [32, 56) 'c'
[    0.000000]
[    0.000000] The buggy address belongs to the physical page:
[    0.000000] page:(____ptrval____) refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x81a07
[    0.000000] flags: 0x1000(reserved|zone=0)
[    0.000000] raw: 0000000000001000 ff600003f1e3d150 ff600003f1e3d150 0000000000000000
[    0.000000] raw: 0000000000000000 0000000000000000 00000001ffffffff
[    0.000000] page dumped because: kasan: bad access detected
[    0.000000]
[    0.000000] Memory state around the buggy address:
[    0.000000]  ffffffff81807b00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000]  ffffffff81807b80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000] >ffffffff81807c00: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 00 00 f3
[    0.000000]                                            ^
[    0.000000]  ffffffff81807c80: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000]  ffffffff81807d00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[    0.000000] ==================================================================

Fix that by using READ_ONCE_NOCHECK when reading the stack in imprecise
mode.

Fixes: 5d8544e2d0 ("RISC-V: Generic library routines and assembly")
Reported-by: Chathura Rajapaksha <chathura.abeyrathne.lk@gmail.com>
Link: https://lore.kernel.org/all/CAD7mqryDQCYyJ1gAmtMm8SASMWAQ4i103ptTb0f6Oda=tPY2=A@mail.gmail.com/
Suggested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Link: https://lore.kernel.org/r/20230308091639.602024-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-09 14:50:35 -08:00
Josh Poimboeuf 071c44e427 sched/idle: Mark arch_cpu_idle_dead() __noreturn
Before commit 076cbf5d2163 ("x86/xen: don't let xen_pv_play_dead()
return"), in Xen, when a previously offlined CPU was brought back
online, it unexpectedly resumed execution where it left off in the
middle of the idle loop.

There were some hacks to make that work, but the behavior was surprising
as do_idle() doesn't expect an offlined CPU to return from the dead (in
arch_cpu_idle_dead()).

Now that Xen has been fixed, and the arch-specific implementations of
arch_cpu_idle_dead() also don't return, give it a __noreturn attribute.

This will cause the compiler to complain if an arch-specific
implementation might return.  It also improves code generation for both
caller and callee.

Also fixes the following warning:

  vmlinux.o: warning: objtool: do_idle+0x25f: unreachable instruction

Reported-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/60d527353da8c99d4cf13b6473131d46719ed16d.1676358308.git.jpoimboe@kernel.org
Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org>
2023-03-08 08:44:28 -08:00
Qinglin Pan 23ad288aaf
riscv: mm: modify pte format for Svnapot
Add one alternative to enable/disable svnapot support, enable this static
key when "svnapot" is in the "riscv,isa" field of fdt and SVNAPOT compile
option is set. It will influence the behavior of has_svnapot. All code
dependent on svnapot should make sure that has_svnapot return true firstly.

Modify PTE definition for Svnapot, and creates some functions in pgtable.h
to mark a PTE as napot and check if it is a Svnapot PTE. Until now, only
64KB napot size is supported in spec, so some macros has only 64KB version.

Signed-off-by: Qinglin Pan <panqinglin00@gmail.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230209131647.17245-2-panqinglin00@gmail.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-07 19:39:15 -08:00
Palmer Dabbelt e18048da9b
RISC-V: Stop emitting attributes
The RISC-V ELF attributes don't contain any useful information.  New
toolchains ignore them, but they frequently trip up various older/mixed
toolchains.  So just turn them off.

Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230223224605.6995-1-palmer@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-03-06 15:55:25 -08:00
Linus Torvalds 01687e7c93 RISC-V Patches for the 6.3 Merge Window, Part 1
There's a bunch of fixes/cleanups throughout the tree as usual, but we
 also have a handful of new features.
 
 * Various improvements to the extension detection and alternative
   patching infrastructure.
 * Zbb-optimized string routines.
 * Support for cpu-capacity in the RISC-V DT bindings.
 * Zicbom no longer depends on toolchain support.
 * Some performance and code size improvements to ftrace.
 * Support for ARCH_WANT_LD_ORPHAN_WARN.
 * Oops now contain the faulting instruction.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmP49coTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiTFnEACuQtGJWSwzH+ORswVyItKzqHcBMU3t
 rWHTFxQ7MdWgQO8nrWUtSypGY4n0DFTCe9w4H3tQFRDaTXbI+ycFjidEDt3eJCMb
 n6WiGuZpdKVS81CQ0Es4dTWQ1i/28fe1851CGK/PkybXdrPPofdCJ9k3Wepxflb/
 2UYxRDyjKt3KbJ2OmN2oF8Ek1rrsGhIC/Dhbdb2JsGZhYF10ZYjquaOLs31WbHMG
 O+n/N/JfZRAif1MDQ71ygAm9KV0kGqe/wcRtsJGETwJ8U3I/cjn2mAGd8BRdy4iL
 9GFmTmi8q27ntUbakikNz3b4aE9xVnLDvRIyOciI3l8rQjrFAsfnQbuRwlaq6BVJ
 BF3e6nAjkcLj23FhbROTlfncEOzrklbNZ+uQIuvyffAUjDoePw9x7o0r+qj7FnOY
 WMfNecJMeE5OGVBqHSVFEcAMlN6uYu6wqbEipEpc+8sTg+w1LM0bUVNhV86/BrnL
 bh+4+7MPYtg45vy2Y8AuPUBFqR2uCekDpbxciCEGsaIzUYRas2zrt9UkWGjKs1VV
 q0qeLSNNA1wBq+q6FprTceipFQIqD5KnmI2GMucF6v4YFg5AzeSOpRc6aeqcs7Z2
 +ApShSOFPjjntZbcpTgkvhrPExr0Jel0xY7YSazUUqY0xOHUwGNBEh/E4rzsRLxr
 qvUpFAIZT60dfQ==
 =XgYl
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:
 "There's a bunch of fixes/cleanups throughout the tree as usual, but we
  also have a handful of new features:

   - Various improvements to the extension detection and alternative
     patching infrastructure

   - Zbb-optimized string routines

   - Support for cpu-capacity in the RISC-V DT bindings

   - Zicbom no longer depends on toolchain support

   - Some performance and code size improvements to ftrace

   - Support for ARCH_WANT_LD_ORPHAN_WARN

   - Oops now contain the faulting instruction"

* tag 'riscv-for-linus-6.3-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (67 commits)
  RISC-V: add a spin_shadow_stack declaration
  riscv: mm: hugetlb: Enable ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP
  riscv: Add header include guards to insn.h
  riscv: alternative: proceed one more instruction for auipc/jalr pair
  riscv: Avoid enabling interrupts in die()
  riscv, mm: Perform BPF exhandler fixup on page fault
  RISC-V: take text_mutex during alternative patching
  riscv: hwcap: Don't alphabetize ISA extension IDs
  RISC-V: fix ordering of Zbb extension
  riscv: jump_label: Fixup unaligned arch_static_branch function
  RISC-V: Only provide the single-letter extensions in HWCAP
  riscv: mm: fix regression due to update_mmu_cache change
  scripts/decodecode: Add support for RISC-V
  riscv: Add instruction dump to RISC-V splats
  riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
  riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
  riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
  riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
  riscv: lds: define RUNTIME_DISCARD_EXIT
  RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
  ...
2023-02-25 11:14:08 -08:00
Linus Torvalds a93e884edf Driver core changes for 6.3-rc1
Here is the large set of driver core changes for 6.3-rc1.
 
 There's a lot of changes this development cycle, most of the work falls
 into two different categories:
   - fw_devlink fixes and updates.  This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.
   - driver core changes to work to make struct bus_type able to be moved
     into read-only memory (i.e. const)  The recent work with Rust has
     pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only making
     things safer overall.  This is the contuation of that work (started
     last release with kobject changes) in moving struct bus_type to be
     constant.  We didn't quite make it for this release, but the
     remaining patches will be finished up for the release after this
     one, but the groundwork has been laid for this effort.
 
 Other than that we have in here:
   - debugfs memory leak fixes in some subsystems
   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.
   - cacheinfo rework and fixes
   - Other tiny fixes, full details are in the shortlog
 
 All of these have been in linux-next for a while with no reported
 problems.
 
 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
 -----BEGIN PGP SIGNATURE-----
 
 iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCY/ipdg8cZ3JlZ0Brcm9h
 aC5jb20ACgkQMUfUDdst+ynL3gCgwzbcWu0So3piZyLiJKxsVo9C2EsAn3sZ9gN6
 6oeFOjD3JDju3cQsfGgd
 =Su6W
 -----END PGP SIGNATURE-----

Merge tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core

Pull driver core updates from Greg KH:
 "Here is the large set of driver core changes for 6.3-rc1.

  There's a lot of changes this development cycle, most of the work
  falls into two different categories:

   - fw_devlink fixes and updates. This has gone through numerous review
     cycles and lots of review and testing by lots of different devices.
     Hopefully all should be good now, and Saravana will be keeping a
     watch for any potential regression on odd embedded systems.

   - driver core changes to work to make struct bus_type able to be
     moved into read-only memory (i.e. const) The recent work with Rust
     has pointed out a number of areas in the driver core where we are
     passing around and working with structures that really do not have
     to be dynamic at all, and they should be able to be read-only
     making things safer overall. This is the contuation of that work
     (started last release with kobject changes) in moving struct
     bus_type to be constant. We didn't quite make it for this release,
     but the remaining patches will be finished up for the release after
     this one, but the groundwork has been laid for this effort.

  Other than that we have in here:

   - debugfs memory leak fixes in some subsystems

   - error path cleanups and fixes for some never-able-to-be-hit
     codepaths.

   - cacheinfo rework and fixes

   - Other tiny fixes, full details are in the shortlog

  All of these have been in linux-next for a while with no reported
  problems"

[ Geert Uytterhoeven points out that that last sentence isn't true, and
  that there's a pending report that has a fix that is queued up - Linus ]

* tag 'driver-core-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (124 commits)
  debugfs: drop inline constant formatting for ERR_PTR(-ERROR)
  OPP: fix error checking in opp_migrate_dentry()
  debugfs: update comment of debugfs_rename()
  i3c: fix device.h kernel-doc warnings
  dma-mapping: no need to pass a bus_type into get_arch_dma_ops()
  driver core: class: move EXPORT_SYMBOL_GPL() lines to the correct place
  Revert "driver core: add error handling for devtmpfs_create_node()"
  Revert "devtmpfs: add debug info to handle()"
  Revert "devtmpfs: remove return value of devtmpfs_delete_node()"
  driver core: cpu: don't hand-override the uevent bus_type callback.
  devtmpfs: remove return value of devtmpfs_delete_node()
  devtmpfs: add debug info to handle()
  driver core: add error handling for devtmpfs_create_node()
  driver core: bus: update my copyright notice
  driver core: bus: add bus_get_dev_root() function
  driver core: bus: constify bus_unregister()
  driver core: bus: constify some internal functions
  driver core: bus: constify bus_get_kset()
  driver core: bus: constify bus_register/unregister_notifier()
  driver core: remove private pointer from struct bus_type
  ...
2023-02-24 12:58:55 -08:00
Linus Torvalds 3822a7c409 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X bit.
 
 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.
 
 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes
 
 - Johannes Weiner has a series ("mm: push down lock_page_memcg()") which
   does perform some memcg maintenance and cleanup work.
 
 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".  These filters provide users
   with finer-grained control over DAMOS's actions.  SeongJae has also done
   some DAMON cleanup work.
 
 - Kairui Song adds a series ("Clean up and fixes for swap").
 
 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".
 
 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series.  It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.
 
 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".
 
 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".
 
 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".
 
 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".
 
 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series "mm:
   support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with swap
   PTEs".
 
 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".
 
 - Sergey Senozhatsky has improved zsmalloc's memory utilization with his
   series "zsmalloc: make zspage chain size configurable".
 
 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.  The previous BPF-based approach had
   shortcomings.  See "mm: In-kernel support for memory-deny-write-execute
   (MDWE)".
 
 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".
 
 - T.J.  Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".
 
 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a per-node
   basis.  See the series "Introduce per NUMA node memory error
   statistics".
 
 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage during
   compaction".
 
 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".
 
 - Christoph Hellwig has removed block_device_operations.rw_page() in ths
   series "remove ->rw_page".
 
 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".
 
 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier functions".
 
 - Some pagemap cleanup and generalization work in Mike Rapoport's series
   "mm, arch: add generic implementation of pfn_valid() for FLATMEM" and
   "fixups for generic implementation of pfn_valid()"
 
 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".
 
 - Jason Gunthorpe rationalized the GUP system's interface to the rest of
   the kernel in the series "Simplify the external interface for GUP".
 
 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface.  To support this, we'll temporarily be
   printing warnings when people use the debugfs interface.  See the series
   "mm/damon: deprecate DAMON debugfs interface".
 
 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.
 
 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".
 
 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".
 -----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCY/PoPQAKCRDdBJ7gKXxA
 jlvpAPsFECUBBl20qSue2zCYWnHC7Yk4q9ytTkPB/MMDrFEN9wD/SNKEm2UoK6/K
 DmxHkn0LAitGgJRS/W9w81yrgig9tAQ=
 =MlGs
 -----END PGP SIGNATURE-----

Merge tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm

Pull MM updates from Andrew Morton:

 - Daniel Verkamp has contributed a memfd series ("mm/memfd: add
   F_SEAL_EXEC") which permits the setting of the memfd execute bit at
   memfd creation time, with the option of sealing the state of the X
   bit.

 - Peter Xu adds a patch series ("mm/hugetlb: Make huge_pte_offset()
   thread-safe for pmd unshare") which addresses a rare race condition
   related to PMD unsharing.

 - Several folioification patch serieses from Matthew Wilcox, Vishal
   Moola, Sidhartha Kumar and Lorenzo Stoakes

 - Johannes Weiner has a series ("mm: push down lock_page_memcg()")
   which does perform some memcg maintenance and cleanup work.

 - SeongJae Park has added DAMOS filtering to DAMON, with the series
   "mm/damon/core: implement damos filter".

   These filters provide users with finer-grained control over DAMOS's
   actions. SeongJae has also done some DAMON cleanup work.

 - Kairui Song adds a series ("Clean up and fixes for swap").

 - Vernon Yang contributed the series "Clean up and refinement for maple
   tree".

 - Yu Zhao has contributed the "mm: multi-gen LRU: memcg LRU" series. It
   adds to MGLRU an LRU of memcgs, to improve the scalability of global
   reclaim.

 - David Hildenbrand has added some userfaultfd cleanup work in the
   series "mm: uffd-wp + change_protection() cleanups".

 - Christoph Hellwig has removed the generic_writepages() library
   function in the series "remove generic_writepages".

 - Baolin Wang has performed some maintenance on the compaction code in
   his series "Some small improvements for compaction".

 - Sidhartha Kumar is doing some maintenance work on struct page in his
   series "Get rid of tail page fields".

 - David Hildenbrand contributed some cleanup, bugfixing and
   generalization of pte management and of pte debugging in his series
   "mm: support __HAVE_ARCH_PTE_SWP_EXCLUSIVE on all architectures with
   swap PTEs".

 - Mel Gorman and Neil Brown have removed the __GFP_ATOMIC allocation
   flag in the series "Discard __GFP_ATOMIC".

 - Sergey Senozhatsky has improved zsmalloc's memory utilization with
   his series "zsmalloc: make zspage chain size configurable".

 - Joey Gouly has added prctl() support for prohibiting the creation of
   writeable+executable mappings.

   The previous BPF-based approach had shortcomings. See "mm: In-kernel
   support for memory-deny-write-execute (MDWE)".

 - Waiman Long did some kmemleak cleanup and bugfixing in the series
   "mm/kmemleak: Simplify kmemleak_cond_resched() & fix UAF".

 - T.J. Alumbaugh has contributed some MGLRU cleanup work in his series
   "mm: multi-gen LRU: improve".

 - Jiaqi Yan has provided some enhancements to our memory error
   statistics reporting, mainly by presenting the statistics on a
   per-node basis. See the series "Introduce per NUMA node memory error
   statistics".

 - Mel Gorman has a second and hopefully final shot at fixing a CPU-hog
   regression in compaction via his series "Fix excessive CPU usage
   during compaction".

 - Christoph Hellwig does some vmalloc maintenance work in the series
   "cleanup vfree and vunmap".

 - Christoph Hellwig has removed block_device_operations.rw_page() in
   ths series "remove ->rw_page".

 - We get some maple_tree improvements and cleanups in Liam Howlett's
   series "VMA tree type safety and remove __vma_adjust()".

 - Suren Baghdasaryan has done some work on the maintainability of our
   vm_flags handling in the series "introduce vm_flags modifier
   functions".

 - Some pagemap cleanup and generalization work in Mike Rapoport's
   series "mm, arch: add generic implementation of pfn_valid() for
   FLATMEM" and "fixups for generic implementation of pfn_valid()"

 - Baoquan He has done some work to make /proc/vmallocinfo and
   /proc/kcore better represent the real state of things in his series
   "mm/vmalloc.c: allow vread() to read out vm_map_ram areas".

 - Jason Gunthorpe rationalized the GUP system's interface to the rest
   of the kernel in the series "Simplify the external interface for
   GUP".

 - SeongJae Park wishes to migrate people from DAMON's debugfs interface
   over to its sysfs interface. To support this, we'll temporarily be
   printing warnings when people use the debugfs interface. See the
   series "mm/damon: deprecate DAMON debugfs interface".

 - Andrey Konovalov provided the accurately named "lib/stackdepot: fixes
   and clean-ups" series.

 - Huang Ying has provided a dramatic reduction in migration's TLB flush
   IPI rates with the series "migrate_pages(): batch TLB flushing".

 - Arnd Bergmann has some objtool fixups in "objtool warning fixes".

* tag 'mm-stable-2023-02-20-13-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (505 commits)
  include/linux/migrate.h: remove unneeded externs
  mm/memory_hotplug: cleanup return value handing in do_migrate_range()
  mm/uffd: fix comment in handling pte markers
  mm: change to return bool for isolate_movable_page()
  mm: hugetlb: change to return bool for isolate_hugetlb()
  mm: change to return bool for isolate_lru_page()
  mm: change to return bool for folio_isolate_lru()
  objtool: add UACCESS exceptions for __tsan_volatile_read/write
  kmsan: disable ftrace in kmsan core code
  kasan: mark addr_has_metadata __always_inline
  mm: memcontrol: rename memcg_kmem_enabled()
  sh: initialize max_mapnr
  m68k/nommu: add missing definition of ARCH_PFN_OFFSET
  mm: percpu: fix incorrect size in pcpu_obj_full_size()
  maple_tree: reduce stack usage with gcc-9 and earlier
  mm: page_alloc: call panic() when memoryless node allocation fails
  mm: multi-gen LRU: avoid futile retries
  migrate_pages: move THP/hugetlb migration support check to simplify code
  migrate_pages: batch flushing TLB
  migrate_pages: share more code between _unmap and _move
  ...
2023-02-23 17:09:35 -08:00
Linus Torvalds 06e1a81c48 A healthy mix of EFI contributions this time:
- Performance tweaks for efifb earlycon by Andy
 
 - Preparatory refactoring and cleanup work in the efivar layer by Johan,
   which is needed to accommodate the Snapdragon arm64 laptops that
   expose their EFI variable store via a TEE secure world API.
 
 - Enhancements to the EFI memory map handling so that Xen dom0 can
   safely access EFI configuration tables (Demi Marie)
 
 - Wire up the newly introduced IBT/BTI flag in the EFI memory attributes
   table, so that firmware that is generated with ENDBR/BTI landing pads
   will be mapped with enforcement enabled.
 
 - Clean up how we check and print the EFI revision exposed by the
   firmware.
 
 - Incorporate EFI memory attributes protocol definition contributed by
   Evgeniy and wire it up in the EFI zboot code. This ensures that these
   images can execute under new and stricter rules regarding the default
   memory permissions for EFI page allocations. (More work is in progress
   here)
 
 - CPER header cleanup by Dan Williams
 
 - Use a raw spinlock to protect the EFI runtime services stack on arm64
   to ensure the correct semantics under -rt. (Pierre)
 
 - EFI framebuffer quirk for Lenovo Ideapad by Darrell.
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmPzuwsACgkQw08iOZLZ
 jyS7dwwAm95DlDxFIQi4FmTm2mqJws9PyDrkfaAK1CoyqCgeOLQT2FkVolgr8jne
 pwpwCTXtYP8y0BZvdQEIjpAq/BHKaD3GJSPfl7lo+pnUu68PpsFWaV6EdT33KKfj
 QeF0MnUvrqUeTFI77D+S0ZW2zxdo9eCcahF3TPA52/bEiiDHWBF8Qm9VHeQGklik
 zoXA15ft3mgITybgjEA0ncGrVZiBMZrYoMvbdkeoedfw02GN/eaQn8d2iHBtTDEh
 3XNlo7ONX0v50cjt0yvwFEA0AKo0o7R1cj+ziKH/bc4KjzIiCbINhy7blroSq+5K
 YMlnPHuj8Nhv3I+MBdmn/nxRCQeQsE4RfRru04hfNfdcqjAuqwcBvRXvVnjWKZHl
 CmUYs+p/oqxrQ4BjiHfw0JKbXRsgbFI6o3FeeLH9kzI9IDUPpqu3Ma814FVok9Ai
 zbOCrJf5tEtg5tIavcUESEMBuHjEafqzh8c7j7AAqbaNjlihsqosDy9aYoarEi5M
 f/tLec86
 =+pOz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "A healthy mix of EFI contributions this time:

   - Performance tweaks for efifb earlycon (Andy)

   - Preparatory refactoring and cleanup work in the efivar layer, which
     is needed to accommodate the Snapdragon arm64 laptops that expose
     their EFI variable store via a TEE secure world API (Johan)

   - Enhancements to the EFI memory map handling so that Xen dom0 can
     safely access EFI configuration tables (Demi Marie)

   - Wire up the newly introduced IBT/BTI flag in the EFI memory
     attributes table, so that firmware that is generated with ENDBR/BTI
     landing pads will be mapped with enforcement enabled

   - Clean up how we check and print the EFI revision exposed by the
     firmware

   - Incorporate EFI memory attributes protocol definition and wire it
     up in the EFI zboot code (Evgeniy)

     This ensures that these images can execute under new and stricter
     rules regarding the default memory permissions for EFI page
     allocations (More work is in progress here)

   - CPER header cleanup (Dan Williams)

   - Use a raw spinlock to protect the EFI runtime services stack on
     arm64 to ensure the correct semantics under -rt (Pierre)

   - EFI framebuffer quirk for Lenovo Ideapad (Darrell)"

* tag 'efi-next-for-v6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (24 commits)
  firmware/efi sysfb_efi: Add quirk for Lenovo IdeaPad Duet 3
  arm64: efi: Make efi_rt_lock a raw_spinlock
  efi: Add mixed-mode thunk recipe for GetMemoryAttributes
  efi: x86: Wire up IBT annotation in memory attributes table
  efi: arm64: Wire up BTI annotation in memory attributes table
  efi: Discover BTI support in runtime services regions
  efi/cper, cxl: Remove cxl_err.h
  efi: Use standard format for printing the EFI revision
  efi: Drop minimum EFI version check at boot
  efi: zboot: Use EFI protocol to remap code/data with the right attributes
  efi/libstub: Add memory attribute protocol definitions
  efi: efivars: prevent double registration
  efi: verify that variable services are supported
  efivarfs: always register filesystem
  efi: efivars: add efivars printk prefix
  efi: Warn if trying to reserve memory under Xen
  efi: Actually enable the ESRT under Xen
  efi: Apply allowlist to EFI configuration tables when running under Xen
  efi: xen: Implement memory descriptor lookup based on hypercall
  efi: memmap: Disregard bogus entries instead of returning them
  ...
2023-02-23 14:41:48 -08:00
Linus Torvalds 5b7c4cabbb Networking changes for 6.3.
Core
 ----
 
  - Add dedicated kmem_cache for typical/small skb->head, avoid having
    to access struct page at kfree time, and improve memory use.
 
  - Introduce sysctl to set default RPS configuration for new netdevs.
 
  - Define Netlink protocol specification format which can be used
    to describe messages used by each family and auto-generate parsers.
    Add tools for generating kernel data structures and uAPI headers.
 
  - Expose all net/core sysctls inside netns.
 
  - Remove 4s sleep in netpoll if carrier is instantly detected on boot.
 
  - Add configurable limit of MDB entries per port, and port-vlan.
 
  - Continue populating drop reasons throughout the stack.
 
  - Retire a handful of legacy Qdiscs and classifiers.
 
 Protocols
 ---------
 
  - Support IPv4 big TCP (TSO frames larger than 64kB).
 
  - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
    on socket by socket basis.
 
  - Track and report in procfs number of MPTCP sockets used.
 
  - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP
    path manager.
 
  - IPv6: don't check net.ipv6.route.max_size and rely on garbage
    collection to free memory (similarly to IPv4).
 
  - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).
 
  - ICMP: add per-rate limit counters.
 
  - Add support for user scanning requests in ieee802154.
 
  - Remove static WEP support.
 
  - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
    reporting.
 
  - WiFi 7 EHT channel puncturing support (client & AP).
 
 BPF
 ---
 
  - Add a rbtree data structure following the "next-gen data structure"
    precedent set by recently added linked list, that is, by using
    kfunc + kptr instead of adding a new BPF map type.
 
  - Expose XDP hints via kfuncs with initial support for RX hash and
    timestamp metadata.
 
  - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key
    to better support decap on GRE tunnel devices not operating
    in collect metadata.
 
  - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.
 
  - Remove the need for trace_printk_lock for bpf_trace_printk
    and bpf_trace_vprintk helpers.
 
  - Extend libbpf's bpf_tracing.h support for tracing arguments of
    kprobes/uprobes and syscall as a special case.
 
  - Significantly reduce the search time for module symbols
    by livepatch and BPF.
 
  - Enable cpumasks to be used as kptrs, which is useful for tracing
    programs tracking which tasks end up running on which CPUs in
    different time intervals.
 
  - Add support for BPF trampoline on s390x and riscv64.
 
  - Add capability to export the XDP features supported by the NIC.
 
  - Add __bpf_kfunc tag for marking kernel functions as kfuncs.
 
  - Add cgroup.memory=nobpf kernel parameter option to disable BPF
    memory accounting for container environments.
 
 Netfilter
 ---------
 
  - Remove the CLUSTERIP target. It has been marked as obsolete
    for years, and we still have WARN splats wrt. races of
    the out-of-band /proc interface installed by this target.
 
  - Add 'destroy' commands to nf_tables. They are identical to
    the existing 'delete' commands, but do not return an error if
    the referenced object (set, chain, rule...) did not exist.
 
 Driver API
 ----------
 
  - Improve cpumask_local_spread() locality to help NICs set the right
    IRQ affinity on AMD platforms.
 
  - Separate C22 and C45 MDIO bus transactions more clearly.
 
  - Introduce new DCB table to control DSCP rewrite on egress.
 
  - Support configuration of Physical Layer Collision Avoidance (PLCA)
    Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
    shared medium Ethernet.
 
  - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
    preemption of low priority frames by high priority frames.
 
  - Add support for controlling MACSec offload using netlink SET.
 
  - Rework devlink instance refcounts to allow registration and
    de-registration under the instance lock. Split the code into multiple
    files, drop some of the unnecessarily granular locks and factor out
    common parts of netlink operation handling.
 
  - Add TX frame aggregation parameters (for USB drivers).
 
  - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
    messages with notifications for debug.
 
  - Allow offloading of UDP NEW connections via act_ct.
 
  - Add support for per action HW stats in TC.
 
  - Support hardware miss to TC action (continue processing in SW from
    a specific point in the action chain).
 
  - Warn if old Wireless Extension user space interface is used with
    modern cfg80211/mac80211 drivers. Do not support Wireless Extensions
    for Wi-Fi 7 devices at all. Everyone should switch to using nl80211
    interface instead.
 
  - Improve the CAN bit timing configuration. Use extack to return error
    messages directly to user space, update the SJW handling, including
    the definition of a new default value that will benefit CAN-FD
    controllers, by increasing their oscillator tolerance.
 
 New hardware / drivers
 ----------------------
 
  - Ethernet:
    - nVidia BlueField-3 support (control traffic driver)
    - Ethernet support for imx93 SoCs
    - Motorcomm yt8531 gigabit Ethernet PHY
    - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
    - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
    - Amlogic gxl MDIO mux
 
  - WiFi:
    - RealTek RTL8188EU (rtl8xxxu)
    - Qualcomm Wi-Fi 7 devices (ath12k)
 
  - CAN:
    - Renesas R-Car V4H
 
 Drivers
 -------
 
  - Bluetooth:
    - Set Per Platform Antenna Gain (PPAG) for Intel controllers.
 
  - Ethernet NICs:
    - Intel (1G, igc):
      - support TSN / Qbv / packet scheduling features of i226 model
    - Intel (100G, ice):
      - use GNSS subsystem instead of TTY
      - multi-buffer XDP support
      - extend support for GPIO pins to E823 devices
    - nVidia/Mellanox:
      - update the shared buffer configuration on PFC commands
      - implement PTP adjphase function for HW offset control
      - TC support for Geneve and GRE with VF tunnel offload
      - more efficient crypto key management method
      - multi-port eswitch support
    - Netronome/Corigine:
      - add DCB IEEE support
      - support IPsec offloading for NFP3800
    - Freescale/NXP (enetc):
      - enetc: support XDP_REDIRECT for XDP non-linear buffers
      - enetc: improve reconfig, avoid link flap and waiting for idle
      - enetc: support MAC Merge layer
    - Other NICs:
      - sfc/ef100: add basic devlink support for ef100
      - ionic: rx_push mode operation (writing descriptors via MMIO)
      - bnxt: use the auxiliary bus abstraction for RDMA
      - r8169: disable ASPM and reset bus in case of tx timeout
      - cpsw: support QSGMII mode for J721e CPSW9G
      - cpts: support pulse-per-second output
      - ngbe: add an mdio bus driver
      - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
      - r8152: handle devices with FW with NCM support
      - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
      - virtio-net: support multi buffer XDP
      - virtio/vsock: replace virtio_vsock_pkt with sk_buff
      - tsnep: XDP support
 
  - Ethernet high-speed switches:
    - nVidia/Mellanox (mlxsw):
      - add support for latency TLV (in FW control messages)
    - Microchip (sparx5):
      - separate explicit and implicit traffic forwarding rules, make
        the implicit rules always active
      - add support for egress DSCP rewrite
      - IS0 VCAP support (Ingress Classification)
      - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.)
      - ES2 VCAP support (Egress Access Control)
      - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1)
 
  - Ethernet embedded switches:
    - Marvell (mv88e6xxx):
      - add MAB (port auth) offload support
      - enable PTP receive for mv88e6390
    - NXP (ocelot):
      - support MAC Merge layer
      - support for the the vsc7512 internal copper phys
    - Microchip:
      - lan9303: convert to PHYLINK
      - lan966x: support TC flower filter statistics
      - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
      - lan937x: support Credit Based Shaper configuration
      - ksz9477: support Energy Efficient Ethernet
    - other:
      - qca8k: convert to regmap read/write API, use bulk operations
      - rswitch: Improve TX timestamp accuracy
 
  - Intel WiFi (iwlwifi):
    - EHT (Wi-Fi 7) rate reporting
    - STEP equalizer support: transfer some STEP (connection to radio
      on platforms with integrated wifi) related parameters from the
      BIOS to the firmware.
 
  - Qualcomm 802.11ax WiFi (ath11k):
    - IPQ5018 support
    - Fine Timing Measurement (FTM) responder role support
    - channel 177 support
 
  - MediaTek WiFi (mt76):
    - per-PHY LED support
    - mt7996: EHT (Wi-Fi 7) support
    - Wireless Ethernet Dispatch (WED) reset support
    - switch to using page pool allocator
 
  - RealTek WiFi (rtw89):
    - support new version of Bluetooth co-existance
 
  - Mobile:
    - rmnet: support TX aggregation.
 
 Signed-off-by: Jakub Kicinski <kuba@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmP1VIYACgkQMUZtbf5S
 IrvsChAApz0rNL/sPKxXTEfxZ1tN7D3sYxYKQPomxvl5BV+MvicrLddJy3KmzEFK
 nnJNO3nuRNuH422JQ/ylZ4mGX1opa6+5QJb0UINImXUI7Fm8HHBIuPGkv7d5CheZ
 7JexFqjPJXUy9nPyh1Rra+IA9AcRd2U7jeGEZR38wb99bHJQj5Bzdk20WArEB0el
 n44aqg49LXH71bSeXRz77x5SjkwVtYiccQxLcnmTbjLU2xVraLvI2J+wAhHnVXWW
 9lrU1+V4Ex2Xcd1xR0L0cHeK+meP1TrPRAeF+JDpVI3a/zJiE7cZjfHdG/jH5xWl
 leZJqghVozrZQNtewWWO7XhUFhMDgFu3W/1vNLjSHPZEqaz1JpM67J1+ql6s63l4
 LMWoXbcYZz+SL9ZRCoPkbGue/5fKSHv8/Jl9Sh58+eTS+c/zgN8uFGRNFXLX1+EP
 n8uvt985PxMd6x1+dHumhOUzxnY4Sfi1vjitSunTsNFQ3Cmp4SO0IfBVJWfLUCuC
 xz5hbJGJJbSpvUsO+HWyCg83E5OWghRE/Onpt2jsQSZCrO9HDg4FRTEf3WAMgaqc
 edb5KfbRZPTJQM08gWdluXzSk1nw3FNP2tXW4XlgUrEbjb+fOk0V9dQg2gyYTxQ1
 Nhvn8ZQPi6/GMMELHAIPGmmW1allyOGiAzGlQsv8EmL+OFM6WDI=
 =xXhC
 -----END PGP SIGNATURE-----

Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Jakub Kicinski:
 "Core:

   - Add dedicated kmem_cache for typical/small skb->head, avoid having
     to access struct page at kfree time, and improve memory use.

   - Introduce sysctl to set default RPS configuration for new netdevs.

   - Define Netlink protocol specification format which can be used to
     describe messages used by each family and auto-generate parsers.
     Add tools for generating kernel data structures and uAPI headers.

   - Expose all net/core sysctls inside netns.

   - Remove 4s sleep in netpoll if carrier is instantly detected on
     boot.

   - Add configurable limit of MDB entries per port, and port-vlan.

   - Continue populating drop reasons throughout the stack.

   - Retire a handful of legacy Qdiscs and classifiers.

  Protocols:

   - Support IPv4 big TCP (TSO frames larger than 64kB).

   - Add IP_LOCAL_PORT_RANGE socket option, to control local port range
     on socket by socket basis.

   - Track and report in procfs number of MPTCP sockets used.

   - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path
     manager.

   - IPv6: don't check net.ipv6.route.max_size and rely on garbage
     collection to free memory (similarly to IPv4).

   - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986).

   - ICMP: add per-rate limit counters.

   - Add support for user scanning requests in ieee802154.

   - Remove static WEP support.

   - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate
     reporting.

   - WiFi 7 EHT channel puncturing support (client & AP).

  BPF:

   - Add a rbtree data structure following the "next-gen data structure"
     precedent set by recently added linked list, that is, by using
     kfunc + kptr instead of adding a new BPF map type.

   - Expose XDP hints via kfuncs with initial support for RX hash and
     timestamp metadata.

   - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to
     better support decap on GRE tunnel devices not operating in collect
     metadata.

   - Improve x86 JIT's codegen for PROBE_MEM runtime error checks.

   - Remove the need for trace_printk_lock for bpf_trace_printk and
     bpf_trace_vprintk helpers.

   - Extend libbpf's bpf_tracing.h support for tracing arguments of
     kprobes/uprobes and syscall as a special case.

   - Significantly reduce the search time for module symbols by
     livepatch and BPF.

   - Enable cpumasks to be used as kptrs, which is useful for tracing
     programs tracking which tasks end up running on which CPUs in
     different time intervals.

   - Add support for BPF trampoline on s390x and riscv64.

   - Add capability to export the XDP features supported by the NIC.

   - Add __bpf_kfunc tag for marking kernel functions as kfuncs.

   - Add cgroup.memory=nobpf kernel parameter option to disable BPF
     memory accounting for container environments.

  Netfilter:

   - Remove the CLUSTERIP target. It has been marked as obsolete for
     years, and we still have WARN splats wrt races of the out-of-band
     /proc interface installed by this target.

   - Add 'destroy' commands to nf_tables. They are identical to the
     existing 'delete' commands, but do not return an error if the
     referenced object (set, chain, rule...) did not exist.

  Driver API:

   - Improve cpumask_local_spread() locality to help NICs set the right
     IRQ affinity on AMD platforms.

   - Separate C22 and C45 MDIO bus transactions more clearly.

   - Introduce new DCB table to control DSCP rewrite on egress.

   - Support configuration of Physical Layer Collision Avoidance (PLCA)
     Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of
     shared medium Ethernet.

   - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing
     preemption of low priority frames by high priority frames.

   - Add support for controlling MACSec offload using netlink SET.

   - Rework devlink instance refcounts to allow registration and
     de-registration under the instance lock. Split the code into
     multiple files, drop some of the unnecessarily granular locks and
     factor out common parts of netlink operation handling.

   - Add TX frame aggregation parameters (for USB drivers).

   - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning
     messages with notifications for debug.

   - Allow offloading of UDP NEW connections via act_ct.

   - Add support for per action HW stats in TC.

   - Support hardware miss to TC action (continue processing in SW from
     a specific point in the action chain).

   - Warn if old Wireless Extension user space interface is used with
     modern cfg80211/mac80211 drivers. Do not support Wireless
     Extensions for Wi-Fi 7 devices at all. Everyone should switch to
     using nl80211 interface instead.

   - Improve the CAN bit timing configuration. Use extack to return
     error messages directly to user space, update the SJW handling,
     including the definition of a new default value that will benefit
     CAN-FD controllers, by increasing their oscillator tolerance.

  New hardware / drivers:

   - Ethernet:
      - nVidia BlueField-3 support (control traffic driver)
      - Ethernet support for imx93 SoCs
      - Motorcomm yt8531 gigabit Ethernet PHY
      - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA)
      - Microchip LAN8841 PHY (incl. cable diagnostics and PTP)
      - Amlogic gxl MDIO mux

   - WiFi:
      - RealTek RTL8188EU (rtl8xxxu)
      - Qualcomm Wi-Fi 7 devices (ath12k)

   - CAN:
      - Renesas R-Car V4H

  Drivers:

   - Bluetooth:
      - Set Per Platform Antenna Gain (PPAG) for Intel controllers.

   - Ethernet NICs:
      - Intel (1G, igc):
         - support TSN / Qbv / packet scheduling features of i226 model
      - Intel (100G, ice):
         - use GNSS subsystem instead of TTY
         - multi-buffer XDP support
         - extend support for GPIO pins to E823 devices
      - nVidia/Mellanox:
         - update the shared buffer configuration on PFC commands
         - implement PTP adjphase function for HW offset control
         - TC support for Geneve and GRE with VF tunnel offload
         - more efficient crypto key management method
         - multi-port eswitch support
      - Netronome/Corigine:
         - add DCB IEEE support
         - support IPsec offloading for NFP3800
      - Freescale/NXP (enetc):
         - support XDP_REDIRECT for XDP non-linear buffers
         - improve reconfig, avoid link flap and waiting for idle
         - support MAC Merge layer
      - Other NICs:
         - sfc/ef100: add basic devlink support for ef100
         - ionic: rx_push mode operation (writing descriptors via MMIO)
         - bnxt: use the auxiliary bus abstraction for RDMA
         - r8169: disable ASPM and reset bus in case of tx timeout
         - cpsw: support QSGMII mode for J721e CPSW9G
         - cpts: support pulse-per-second output
         - ngbe: add an mdio bus driver
         - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing
         - r8152: handle devices with FW with NCM support
         - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation
         - virtio-net: support multi buffer XDP
         - virtio/vsock: replace virtio_vsock_pkt with sk_buff
         - tsnep: XDP support

   - Ethernet high-speed switches:
      - nVidia/Mellanox (mlxsw):
         - add support for latency TLV (in FW control messages)
      - Microchip (sparx5):
         - separate explicit and implicit traffic forwarding rules, make
           the implicit rules always active
         - add support for egress DSCP rewrite
         - IS0 VCAP support (Ingress Classification)
         - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS
           etc.)
         - ES2 VCAP support (Egress Access Control)
         - support for Per-Stream Filtering and Policing (802.1Q,
           8.6.5.1)

   - Ethernet embedded switches:
      - Marvell (mv88e6xxx):
         - add MAB (port auth) offload support
         - enable PTP receive for mv88e6390
      - NXP (ocelot):
         - support MAC Merge layer
         - support for the the vsc7512 internal copper phys
      - Microchip:
         - lan9303: convert to PHYLINK
         - lan966x: support TC flower filter statistics
         - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x
         - lan937x: support Credit Based Shaper configuration
         - ksz9477: support Energy Efficient Ethernet
      - other:
         - qca8k: convert to regmap read/write API, use bulk operations
         - rswitch: Improve TX timestamp accuracy

   - Intel WiFi (iwlwifi):
      - EHT (Wi-Fi 7) rate reporting
      - STEP equalizer support: transfer some STEP (connection to radio
        on platforms with integrated wifi) related parameters from the
        BIOS to the firmware.

   - Qualcomm 802.11ax WiFi (ath11k):
      - IPQ5018 support
      - Fine Timing Measurement (FTM) responder role support
      - channel 177 support

   - MediaTek WiFi (mt76):
      - per-PHY LED support
      - mt7996: EHT (Wi-Fi 7) support
      - Wireless Ethernet Dispatch (WED) reset support
      - switch to using page pool allocator

   - RealTek WiFi (rtw89):
      - support new version of Bluetooth co-existance

   - Mobile:
      - rmnet: support TX aggregation"

* tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits)
  page_pool: add a comment explaining the fragment counter usage
  net: ethtool: fix __ethtool_dev_mm_supported() implementation
  ethtool: pse-pd: Fix double word in comments
  xsk: add linux/vmalloc.h to xsk.c
  sefltests: netdevsim: wait for devlink instance after netns removal
  selftest: fib_tests: Always cleanup before exit
  net/mlx5e: Align IPsec ASO result memory to be as required by hardware
  net/mlx5e: TC, Set CT miss to the specific ct action instance
  net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG
  net/mlx5: Refactor tc miss handling to a single function
  net/mlx5: Kconfig: Make tc offload depend on tc skb extension
  net/sched: flower: Support hardware miss to tc action
  net/sched: flower: Move filter handle initialization earlier
  net/sched: cls_api: Support hardware miss to tc action
  net/sched: Rename user cookie and act cookie
  sfc: fix builds without CONFIG_RTC_LIB
  sfc: clean up some inconsistent indentings
  net/mlx4_en: Introduce flexible array to silence overflow warning
  net: lan966x: Fix possible deadlock inside PTP
  net/ulp: Remove redundant ->clone() test in inet_clone_ulp().
  ...
2023-02-21 18:24:12 -08:00
Palmer Dabbelt b19aa282c5
Merge patch series "riscv: Dump faulting instructions in oops handler"
Björn Töpel <bjorn@kernel.org> says:

From: Björn Töpel <bjorn@rivosinc.com>

RISC-V does not dump faulting instructions in the oops handler. This
series adds "Code:" dumps to the oops output together with
scripts/decodecode support.

* b4-shazam-merge:
  scripts/decodecode: Add support for RISC-V
  riscv: Add instruction dump to RISC-V splats

Link: https://lore.kernel.org/r/20230119074738.708301-1-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:51 -08:00
Jisheng Zhang 91612cfb17
riscv: alternative: proceed one more instruction for auipc/jalr pair
If we patched auipc + jalr pair, we'd better proceed one more
instruction. Andrew pointed out "There's not a problem now, since
we're only adding a fixup for jal, not jalr, but we should
future-proof this and there's no reason to revisit an already fixed-up
instruction anyway."

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20230115162811.3146-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:50 -08:00
Palmer Dabbelt f3af3b0039
Merge patch series "riscv: improve link and support ARCH_WANT_LD_ORPHAN_WARN"
Jisheng Zhang <jszhang@kernel.org> says:

This series tries to improve link time handling of riscv:
patch1 adds the missing RUNTIME_DISCARD_EXIT as suggested by Masahiro.

Similar as other architectures such as x86, arm64 and so on, enable
ARCH_WANT_LD_ORPHAN_WARN to enable linker orphan warnings to prevent
from missing any new sections in future. So the following two patches
are preparation ones, and the last patch finally selects
ARCH_WANT_LD_ORPHAN_WARN

* b4-shazam-merge:
  riscv: select ARCH_WANT_LD_ORPHAN_WARN for !XIP_KERNEL
  riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
  riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
  riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
  riscv: lds: define RUNTIME_DISCARD_EXIT

Link: https://lore.kernel.org/r/20230119155417.2600-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:49 -08:00
Mattias Nissler 130aee3fd9
riscv: Avoid enabling interrupts in die()
While working on something else, I noticed that the kernel would start
accepting interrupts again after crashing in an interrupt handler. Since
the kernel is already in inconsistent state, enabling interrupts is
dangerous and opens up risk of kernel state deteriorating further.
Interrupts do get enabled via what looks like an unintended side effect of
spin_unlock_irq, so switch to the more cautious
spin_lock_irqsave/spin_unlock_irqrestore instead.

Fixes: 76d2a0493a ("RISC-V: Init and Halt Code")
Signed-off-by: Mattias Nissler <mnissler@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230215144828.3370316-1-mnissler@rivosinc.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:45 -08:00
Conor Dooley 9493e6f3ce
RISC-V: take text_mutex during alternative patching
Guenter reported a splat during boot, that Samuel pointed out was the
lockdep assertion failing in patch_insn_write():

WARNING: CPU: 0 PID: 0 at arch/riscv/kernel/patch.c:63 patch_insn_write+0x222/0x2f6
epc : patch_insn_write+0x222/0x2f6
 ra : patch_insn_write+0x21e/0x2f6
epc : ffffffff800068c6 ra : ffffffff800068c2 sp : ffffffff81803df0
 gp : ffffffff81a1ab78 tp : ffffffff81814f80 t0 : ffffffffffffe000
 t1 : 0000000000000001 t2 : 4c45203a76637369 s0 : ffffffff81803e40
 s1 : 0000000000000004 a0 : 0000000000000000 a1 : ffffffffffffffff
 a2 : 0000000000000004 a3 : 0000000000000000 a4 : 0000000000000001
 a5 : 0000000000000000 a6 : 0000000000000000 a7 : 0000000052464e43
 s2 : ffffffff80b4889c s3 : 000000000000082c s4 : ffffffff80b48828
 s5 : 0000000000000828 s6 : ffffffff8131a0a0 s7 : 0000000000000fff
 s8 : 0000000008000200 s9 : ffffffff8131a520 s10: 0000000000000018
 s11: 000000000000000b t3 : 0000000000000001 t4 : 000000000000000d
 t5 : ffffffffd8180000 t6 : ffffffff81803bc8
status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff800068c6>] patch_insn_write+0x222/0x2f6
[<ffffffff80006a36>] patch_text_nosync+0xc/0x2a
[<ffffffff80003b86>] riscv_cpufeature_patch_func+0x52/0x98
[<ffffffff80003348>] _apply_alternatives+0x46/0x86
[<ffffffff80c02d36>] apply_boot_alternatives+0x3c/0xfa
[<ffffffff80c03ad8>] setup_arch+0x584/0x5b8
[<ffffffff80c0075a>] start_kernel+0xa2/0x8f8

This issue was exposed by 702e64550b ("riscv: fpu: switch has_fpu() to
riscv_has_extension_likely()"), as it is the patching in has_fpu() that
triggers the splats in Guenter's report.

Take the text_mutex before doing any code patching to satisfy lockdep.

Fixes: ff689fd21c ("riscv: add RISC-V Svpbmt extension support")
Fixes: a35707c3d8 ("riscv: add memory-type errata for T-Head")
Fixes: 1a0e5dbd37 ("riscv: sifive: Add SiFive alternative ports")
Reported-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/all/20230212154333.GA3760469@roeck-us.net/
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Tested-by: Guenter Roeck <linux@roeck-us.net>
Link: https://lore.kernel.org/r/20230212194735.491785-1-conor@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:33 -08:00
Heiko Stuebner 1eac28201a
RISC-V: fix ordering of Zbb extension
As Andrew reported,
    Zb* comes after Zi* according 27.11 "Subset Naming Convention"
so fix the ordering accordingly.

Reported-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Tested-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230208225328.1636017-2-heiko@sntech.de
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 17:21:21 -08:00
Björn Töpel eb165bfa8e
riscv: Add instruction dump to RISC-V splats
Add instruction dump (Code:) output to RISC-V splats. Dump 16b
parcels.

An example:
  Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
  Oops [#1]
  Modules linked in:
  CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.2.0-rc3-00302-g840ff44c571d-dirty #27
  Hardware name: riscv-virtio,qemu (DT)
  epc : kernel_init+0xc8/0x10e
   ra : kernel_init+0x70/0x10e
  epc : ffffffff80bd9a40 ra : ffffffff80bd99e8 sp : ff2000000060bec0
   gp : ffffffff81730b28 tp : ff6000007ff00000 t0 : 7974697275636573
   t1 : 0000000000000000 t2 : 3030303270393d6e s0 : ff2000000060bee0
   s1 : ffffffff81732028 a0 : 0000000000000000 a1 : ff60000080dd1780
   a2 : 0000000000000002 a3 : ffffffff8176a470 a4 : 0000000000000000
   a5 : 000000000000000a a6 : 0000000000000081 a7 : ff60000080dd1780
   s2 : 0000000000000000 s3 : 0000000000000000 s4 : 0000000000000000
   s5 : 0000000000000000 s6 : 0000000000000000 s7 : 0000000000000000
   s8 : 0000000000000000 s9 : 0000000000000000 s10: 0000000000000000
   s11: 0000000000000000 t3 : ffffffff81186018 t4 : 0000000000000022
   t5 : 000000000000003d t6 : 0000000000000000
  status: 0000000200000120 badaddr: 0000000000000000 cause: 000000000000000f
  [<ffffffff80003528>] ret_from_exception+0x0/0x16
  Code: 862a d179 608c a517 0069 0513 2be5 d0ef db2e 47a9 (c11c) a517
  ---[ end trace 0000000000000000 ]---
  Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b
  SMP: stopping secondary CPUs
  ---[ end Kernel panic - not syncing: Attempted to kill init! exitcode=0x0000000b ]---

Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/r/20230119074738.708301-2-bjorn@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:53:53 -08:00
Jisheng Zhang 0ed0031b09
riscv: vmlinux.lds.S: explicitly catch .init.bss sections from EFI stub
When enabling linker orphan section warning, I got warnings similar as
below:
ld.lld: warning:
./drivers/firmware/efi/libstub/lib.a(efi-stub-helper.stub.o):(.init.bss)
is being placed in '.init.bss'

Catch the sections so that we can enable linker orphan section warning.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:06 -08:00
Jisheng Zhang b13e64d941
riscv: vmlinux.lds.S: explicitly catch .riscv.attributes sections
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.riscv.attributes' from
`init/main.o' being placed in section `.riscv.attributes'

While I don't see any usage of .riscv.attributes sections' in kernel
now, just catch the sections so that we can enable linker orphan
section warning.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-4-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:05 -08:00
Jisheng Zhang e5973191a8
riscv: vmlinux.lds.S: explicitly catch .rela.dyn symbols
When enabling linker orphan section warning, I got warnings similar as
below:
riscv64-linux-gnu-ld: warning: orphan section `.rela.text' from
`init/main.o' being placed in section `.rela.dyn'

Use the approach similar as ARM64 does and declare it in vmlinux.lds.S

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230119155417.2600-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:04 -08:00
Jisheng Zhang 7e92586c28
riscv: lds: define RUNTIME_DISCARD_EXIT
riscv discards .exit.* sections at run-time but doesn't define
RUNTIME_DISCARD_EXIT. However, the .exit.* sections are still allocated
and kept even if the generic DISCARDS would discard the sections due
to missing RUNTIME_DISCARD_EXIT, because the DISCARD sits at the end of
the linker script. Add the missing RUNTIME_DISCARD_EXIT define so that
it still works if we move DISCARD up or even at the beginning of the
linker script.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Suggested-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230119155417.2600-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-21 16:29:03 -08:00
Linus Torvalds 560b803067 Updates for timekeeping, timers and clockevent/source drivers:
Core:
 
     - Yet another round of improvements to make the clocksource watchdog
       more robust:
 
       	 - Relax the clocksource-watchdog skew criteria to match the NTP
            criteria.
 
 	 - Temporarily skip the watchdog when high memory latencies are
 	   detected which can lead to false-positives.
 
 	 - Provide an option to enable TSC skew detection even on systems
            where TSC is marked as reliable.
 
       Sigh!
 
     - Initialize the restart block in the nanosleep syscalls to be directed
       to the no restart function instead of doing a partial setup on entry.
 
       This prevents an erroneous restart_syscall() invocation from
       corrupting user space data. While such a situation is clearly a user
       space bug, preventing this is a correctness issue and caters to the
       least suprise principle.
 
     - Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
       to align it with the nanosleep semantics.
 
   Drivers:
 
     - The obligatory new driver bindings for Mediatek, Rockchip and RISC-V
       variants.
 
     - Add support for the C3STOP misfeature to the RISC-V timer to handle
       the case where the timer stops in deeper idle state.
 
     - Set up a static key in the RISC-V timer correctly before first use.
 
     - The usual small improvements and fixes all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmPzV+cTHHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYoYlDEACMrjN2F6qeiOW94t4nQ3qP1M9AMSgO
 OihC04XuM14/3tEviu/cUOd60wYcUQ/kfI5C+IL35ezeP2w9lnuKqeFpG7aDOa33
 5F3isDPamJdXZEZs44CW15brR6dqDlEi5acKee/TtFV9mN6xNhzxM64IaFqecPmW
 P+BTwunB8xwquY8RzsHXor/GOGb6mqWQIPoHEPnywTDe/xQYWt0Exzi7ch6HQr5Z
 ZzHG6X4h6UTNimjay6L4qsRQWILmPIg4Z5IlycWMQ8qDFM0lbnIJqkG4JwceolI6
 aRQyLe3NQFcPYgq3ue+SNm4RckYn4NbAa1zFm0d5VDgKp4xW1sxvtkxOJuxjaOw2
 /rLkHkmyuVvCeTMAySfxrwnszAoM505CHC6CEYc1xELbeCkROFUaymtVyNFnnTru
 V/Jt/T2Gyx6tOrafX7u+djUjv9figddRpNbskVZvEi3Ztq4MQ069nK3oSUqtP5vO
 INApNg4lq6s8aGqVE+Kp9+CKwGqZqI4MdxQMNMAmCRLPon6apActVawbj18qO/wS
 qblQ0cbF8a16itlQ3V68qmhcPh6EZOuq8II4etNq6U0ulV9712WfMbat3z53LG94
 QNkAmZ3/wui93I+Q2NPxhf5ybJFQZhR0SOtVO6xIdTgOntkODwzzGu9UapfD8mLb
 k5BpWnH8CoUgiw==
 =I67j
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for timekeeping, timers and clockevent/source drivers:

  Core:

   - Yet another round of improvements to make the clocksource watchdog
     more robust:

       - Relax the clocksource-watchdog skew criteria to match the NTP
         criteria.

       - Temporarily skip the watchdog when high memory latencies are
         detected which can lead to false-positives.

       - Provide an option to enable TSC skew detection even on systems
         where TSC is marked as reliable.

     Sigh!

   - Initialize the restart block in the nanosleep syscalls to be
     directed to the no restart function instead of doing a partial
     setup on entry.

     This prevents an erroneous restart_syscall() invocation from
     corrupting user space data. While such a situation is clearly a
     user space bug, preventing this is a correctness issue and caters
     to the least suprise principle.

   - Ignore the hrtimer slack for realtime tasks in schedule_hrtimeout()
     to align it with the nanosleep semantics.

  Drivers:

   - The obligatory new driver bindings for Mediatek, Rockchip and
     RISC-V variants.

   - Add support for the C3STOP misfeature to the RISC-V timer to handle
     the case where the timer stops in deeper idle state.

   - Set up a static key in the RISC-V timer correctly before first use.

   - The usual small improvements and fixes all over the place"

* tag 'timers-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits)
  clocksource/drivers/timer-sun4i: Add CLOCK_EVT_FEAT_DYNIRQ
  clocksource/drivers/em_sti: Mark driver as non-removable
  clocksource/drivers/sh_tmu: Mark driver as non-removable
  clocksource/drivers/riscv: Patch riscv_clock_next_event() jump before first use
  clocksource/drivers/timer-microchip-pit64b: Add delay timer
  clocksource/drivers/timer-microchip-pit64b: Select driver only on ARM
  dt-bindings: timer: sifive,clint: add comaptibles for T-Head's C9xx
  dt-bindings: timer: mediatek,mtk-timer: add MT8365
  clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
  clocksource/drivers/sh_cmt: Mark driver as non-removable
  clocksource/drivers/timer-microchip-pit64b: Drop obsolete dependency on COMPILE_TEST
  clocksource/drivers/riscv: Increase the clock source rating
  clocksource/drivers/timer-riscv: Set CLOCK_EVT_FEAT_C3STOP based on DT
  dt-bindings: timer: Add bindings for the RISC-V timer device
  RISC-V: time: initialize hrtimer based broadcast clock event device
  dt-bindings: timer: rk-timer: Add rktimer for rv1126
  time/debug: Fix memory leak with using debugfs_lookup()
  clocksource: Enable TSC watchdog checking of HPET and PMTMR only when requested
  posix-timers: Use atomic64_try_cmpxchg() in __update_gt_cputime()
  clocksource: Verify HPET and PMTMR when TSC unverified
  ...
2023-02-21 09:45:13 -08:00
Linus Torvalds 1f2d9ffc7a Scheduler updates in this cycle are:
- Improve the scalability of the CFS bandwidth unthrottling logic
    with large number of CPUs.
 
  - Fix & rework various cpuidle routines, simplify interaction with
    the generic scheduler code. Add __cpuidle methods as noinstr to
    objtool's noinstr detection and fix boatloads of cpuidle bugs & quirks.
 
  - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS,
    to query previously issued registrations.
 
  - Limit scheduler slice duration to the sysctl_sched_latency period,
    to improve scheduling granularity with a large number of SCHED_IDLE
    tasks.
 
  - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
    but also enable them to prevent a cascade of followup problems and
    repeat warnings.
 
  - Fix the rescheduling logic in prio_changed_dl().
 
  - Micro-optimize cpufreq and sched-util methods.
 
  - Micro-optimize ttwu_runnable()
 
  - Micro-optimize the idle-scanning in update_numa_stats(),
    select_idle_capacity() and steal_cookie_task().
 
  - Update the RSEQ code & self-tests
 
  - Constify various scheduler methods
 
  - Remove unused methods
 
  - Refine __init tags
 
  - Documentation updates
 
  - ... Misc other cleanups, fixes
 
 Signed-off-by: Ingo Molnar <mingo@kernel.org>
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmPzbJwRHG1pbmdvQGtl
 cm5lbC5vcmcACgkQEnMQ0APhK1iIvA//ZcEaB8Z6ChLRQjM+bsaudKJu3pdLQbPK
 iYbP8Da+LsAfxbEfYuGV3m+jIp0LlBOtsI/EezxQrXV+V7FvNyAX9Y00eEu/zlj8
 7Jn3LMy/DBYTwH7LwVdcU0MyIVI8ZPc6WNnkx0LOtGZn8n+qfHPSDzcP3CW+a5AV
 UvllPYpYyEmsX0Eby7CF4Ue8mSmbViw/xR3rNr8ZSve0c25XzKabw8O9kE3jiHxP
 d/zERJoAYeDyYUEuZqhfn5dTlB4an4IjNEkAfRE5SQ09RA8Gkxsa5Ar8gob9e9M1
 eQsdd4/bdhnrkM8L5qDZczqmgCTZ2bukQrxkBXhRDhLgoFxwAn77b+2ZjmIW3Lae
 AyGqRcDSg1q2oxaYm5ZiuO/t26aDOZu9vPHyHRDGt95EGbZlrp+GgeePyfCigJYz
 UmPdZAAcHdSymnnnlcvdG37WVvaVkpgWZzd8LbtBi23QR+Zc4WQ2IlgnUS5WKNNf
 VOBcAcP6E1IslDotZDQCc2dPFFQoQQEssVooyUc5oMytm7BsvxXLOeHG+Ncu/8uc
 H+U8Qn8jnqTxJbC5hkWQIJlhVKCq2FJrHxxySYTKROfUNcDgCmxboFeAcXTCIU1K
 T0S+sdoTS/CvtLklRkG0j6B8N4N98mOd9cFwUV3tX+/gMLMep3hCQs5L76JagvC5
 skkQXoONNaM=
 =l1nN
 -----END PGP SIGNATURE-----

Merge tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler updates from Ingo Molnar:

 - Improve the scalability of the CFS bandwidth unthrottling logic with
   large number of CPUs.

 - Fix & rework various cpuidle routines, simplify interaction with the
   generic scheduler code. Add __cpuidle methods as noinstr to objtool's
   noinstr detection and fix boatloads of cpuidle bugs & quirks.

 - Add new ABI: introduce MEMBARRIER_CMD_GET_REGISTRATIONS, to query
   previously issued registrations.

 - Limit scheduler slice duration to the sysctl_sched_latency period, to
   improve scheduling granularity with a large number of SCHED_IDLE
   tasks.

 - Debuggability enhancement on sys_exit(): warn about disabled IRQs,
   but also enable them to prevent a cascade of followup problems and
   repeat warnings.

 - Fix the rescheduling logic in prio_changed_dl().

 - Micro-optimize cpufreq and sched-util methods.

 - Micro-optimize ttwu_runnable()

 - Micro-optimize the idle-scanning in update_numa_stats(),
   select_idle_capacity() and steal_cookie_task().

 - Update the RSEQ code & self-tests

 - Constify various scheduler methods

 - Remove unused methods

 - Refine __init tags

 - Documentation updates

 - Misc other cleanups, fixes

* tag 'sched-core-2023-02-20' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (110 commits)
  sched/rt: pick_next_rt_entity(): check list_entry
  sched/deadline: Add more reschedule cases to prio_changed_dl()
  sched/fair: sanitize vruntime of entity being placed
  sched/fair: Remove capacity inversion detection
  sched/fair: unlink misfit task from cpu overutilized
  objtool: mem*() are not uaccess safe
  cpuidle: Fix poll_idle() noinstr annotation
  sched/clock: Make local_clock() noinstr
  sched/clock/x86: Mark sched_clock() noinstr
  x86/pvclock: Improve atomic update of last_value in pvclock_clocksource_read()
  x86/atomics: Always inline arch_atomic64*()
  cpuidle: tracing, preempt: Squash _rcuidle tracing
  cpuidle: tracing: Warn about !rcu_is_watching()
  cpuidle: lib/bug: Disable rcu_is_watching() during WARN/BUG
  cpuidle: drivers: firmware: psci: Dont instrument suspend code
  KVM: selftests: Fix build of rseq test
  exit: Detect and fix irq disabled state in oops
  cpuidle, arm64: Fix the ARM64 cpuidle logic
  cpuidle: mvebu: Fix duplicate flags assignment
  sched/fair: Limit sched slice duration
  ...
2023-02-20 17:41:08 -08:00
Jakub Kicinski ee8d72a157 bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCY+/uBgAKCRDbK58LschI
 g0ngAPwJHd1RicBuy2C4fLv0nGKZtmYZBAnTGlI2RisPxU6BRwEAwUDLHuc5K6nR
 j261okOxOy/MRxdN1NhmR6Qe7nMyQAk=
 =tYU+
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2023-02-17

We've added 64 non-merge commits during the last 7 day(s) which contain
a total of 158 files changed, 4190 insertions(+), 988 deletions(-).

The main changes are:

1) Add a rbtree data structure following the "next-gen data structure"
   precedent set by recently-added linked-list, that is, by using
   kfunc + kptr instead of adding a new BPF map type, from Dave Marchevsky.

2) Add a new benchmark for hashmap lookups to BPF selftests,
   from Anton Protopopov.

3) Fix bpf_fib_lookup to only return valid neighbors and add an option
   to skip the neigh table lookup, from Martin KaFai Lau.

4) Add cgroup.memory=nobpf kernel parameter option to disable BPF memory
   accouting for container environments, from Yafang Shao.

5) Batch of ice multi-buffer and driver performance fixes,
   from Alexander Lobakin.

6) Fix a bug in determining whether global subprog's argument is
   PTR_TO_CTX, which is based on type names which breaks kprobe progs,
   from Andrii Nakryiko.

7) Prep work for future -mcpu=v4 LLVM option which includes usage of
   BPF_ST insn. Thus improve BPF_ST-related value tracking in verifier,
   from Eduard Zingerman.

8) More prep work for later building selftests with Memory Sanitizer
   in order to detect usages of undefined memory, from Ilya Leoshkevich.

9) Fix xsk sockets to check IFF_UP earlier to avoid a NULL pointer
   dereference via sendmsg(), from Maciej Fijalkowski.

10) Implement BPF trampoline for RV64 JIT compiler, from Pu Lehui.

11) Fix BPF memory allocator in combination with BPF hashtab where it could
    corrupt special fields e.g. used in bpf_spin_lock, from Hou Tao.

12) Fix LoongArch BPF JIT to always use 4 instructions for function
    address so that instruction sequences don't change between passes,
    from Hengqi Chen.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
  selftests/bpf: Add bpf_fib_lookup test
  bpf: Add BPF_FIB_LOOKUP_SKIP_NEIGH for bpf_fib_lookup
  riscv, bpf: Add bpf trampoline support for RV64
  riscv, bpf: Add bpf_arch_text_poke support for RV64
  riscv, bpf: Factor out emit_call for kernel and bpf context
  riscv: Extend patch_text for multiple instructions
  Revert "bpf, test_run: fix &xdp_frame misplacement for LIVE_FRAMES"
  selftests/bpf: Add global subprog context passing tests
  selftests/bpf: Convert test_global_funcs test to test_loader framework
  bpf: Fix global subprog context argument resolution logic
  LoongArch, bpf: Use 4 instructions for function address in JIT
  bpf: bpf_fib_lookup should not return neigh in NUD_FAILED state
  bpf: Disable bh in bpf_test_run for xdp and tc prog
  xsk: check IFF_UP earlier in Tx path
  Fix typos in selftest/bpf files
  selftests/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  samples/bpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  bpftool: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Use bpf_{btf,link,map,prog}_get_info_by_fd()
  libbpf: Introduce bpf_{btf,link,map,prog}_get_info_by_fd()
  ...
====================

Link: https://lore.kernel.org/r/20230217221737.31122-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-02-20 16:31:14 -08:00
Pu Lehui 5e57fb7b0b riscv: Extend patch_text for multiple instructions
Extend patch_text for multiple instructions. This is the preparaiton for
multiple instructions text patching in riscv BPF trampoline, and may be
useful for other scenario.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Björn Töpel <bjorn@rivosinc.com>
Link: https://lore.kernel.org/bpf/20230215135205.1411105-2-pulehui@huaweicloud.com
2023-02-17 21:45:30 +01:00
Heiko Stuebner 8658db0a4a
RISC-V: move some stray __RISCV_INSN_FUNCS definitions from kprobes
The __RISCV_INSN_FUNCS originally declared riscv_insn_is_* functions inside
the kprobes implementation. This got moved into a central header in
commit ec5f908775 ("RISC-V: Move riscv_insn_is_* macros into a common header").

Though it looks like I overlooked two of them, so fix that. FENCE itself is
an instruction defined directly by its own opcode, while the created
riscv_isn_is_system function covers all instructions defined under the SYSTEM
opcode.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230113211955.3534431-1-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-15 20:32:31 -08:00
Palmer Dabbelt ec6311919e
Merge patch series "riscv: Optimize function trace"
guoren@kernel.org <guoren@kernel.org> says:

From: Guo Ren <guoren@linux.alibaba.com>

The previous ftrace detour implementation fc76b8b8011 ("riscv: Using
PATCHABLE_FUNCTION_ENTRY instead of MCOUNT") contain three problems.

 - The most horrible bug is preemption panic which found by Andy [1].
   Let's disable preemption for ftrace first, and Andy could continue
   the ftrace preemption work.
 - The "-fpatchable-function-entry= CFLAG" wasted code size
   !RISCV_ISA_C.
 - The ftrace detour implementation wasted code size.
 - When livepatching, the trampoline (ftrace_regs_caller) would not
   return to <func_prolog+12> but would rather jump to the new function.
   So, "REG_L ra, -SZREG(sp)" would not run and the original return
   address would not be restored. The kernel is likely to hang or crash
   as a result. (Found by Evgenii Shatokhin [4])

[Palmer: The first three patches in this series are pretty concrete
fixes, so I'm pulling them ahead of the rest of the series.]

* b4-shazam-merge:
  riscv: ftrace: Reduce the detour code size to half
  riscv: ftrace: Remove wasted nops for !RISCV_ISA_C
  riscv: ftrace: Fixup panic by disabling preemption

Link: https://lore.kernel.org/r/20230112090603.1295340-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-15 10:59:54 -08:00
Guo Ren 6724a76cff
riscv: ftrace: Reduce the detour code size to half
Use a temporary register to reduce the size of detour code from 16 bytes to
8 bytes. The previous implementation is from 'commit afc76b8b80 ("riscv:
Using PATCHABLE_FUNCTION_ENTRY instead of MCOUNT")'.

Before the patch:
<func_prolog>:
 0: REG_S  ra, -SZREG(sp)
 4: auipc  ra, ?
 8: jalr   ?(ra)
12: REG_L  ra, -SZREG(sp)
 (func_boddy)

After the patch:
<func_prolog>:
 0: auipc  t0, ?
 4: jalr   t0, ?(t0)
 (func_boddy)

This patch not just reduces the size of detour code, but also fixes an
important issue:

An Ftrace callback registered with FTRACE_OPS_FL_IPMODIFY flag can
actually change the instruction pointer, e.g. to "replace" the given
kernel function with a new one, which is needed for livepatching, etc.

In this case, the trampoline (ftrace_regs_caller) would not return to
<func_prolog+12> but would rather jump to the new function. So, "REG_L
ra, -SZREG(sp)" would not run and the original return address would not
be restored. The kernel is likely to hang or crash as a result.

This can be easily demonstrated if one tries to "replace", say,
cmdline_proc_show() with a new function with the same signature using
instruction_pointer_set(&fregs->regs, new_func_addr) in the Ftrace
callback.

Link: https://lore.kernel.org/linux-riscv/20221122075440.1165172-1-suagrfillet@gmail.com/
Link: https://lore.kernel.org/linux-riscv/d7d5730b-ebef-68e5-5046-e763e1ee6164@yadro.com/
Co-developed-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Song Shuai <suagrfillet@gmail.com>
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Cc: Evgenii Shatokhin <e.shatokhin@yadro.com>
Reviewed-by: Evgenii Shatokhin <e.shatokhin@yadro.com>
Link: https://lore.kernel.org/r/20230112090603.1295340-4-guoren@kernel.org
Cc: stable@vger.kernel.org
Fixes: 10626c32e3 ("riscv/ftrace: Add basic support")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-15 10:58:21 -08:00
Palmer Dabbelt 3863f2a431
Merge patch series "dt-bindings: Add a cpu-capacity property for RISC-V"
Conor Dooley <conor@kernel.org> says:

From: Conor Dooley <conor.dooley@microchip.com>

Ever since RISC-V starting using generic arch topology code, the code
paths for cpu-capacity have been there but there's no binding defined to
actually convey the information. Defining the same property as used on
arm seems to be the only logical thing to do, so do it.

[Palmer: This is on top of the fix required to make it work, which
itself wasn't merged until late in the 6.2 cycle and thus pulls in
various other fixes.]

* b4-shazam-merge:
  dt-bindings: riscv: add a capacity-dmips-mhz cpu property
  dt-bindings: arm: move cpu-capacity to a shared loation
  riscv: Move call to init_cpu_topology() to later initialization stage
  riscv/kprobe: Fix instruction simulation of JALR
  riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
  MAINTAINERS: add an IRC entry for RISC-V
  RISC-V: fix compile error from deduplicated __ALTERNATIVE_CFG_2
  dt-bindings: riscv: fix single letter canonical order
  dt-bindings: riscv: fix underscore requirement for multi-letter extensions
  riscv: uaccess: fix type of 0 variable on error in get_user()
  riscv, kprobes: Stricter c.jr/c.jalr decoding

Link: https://lore.kernel.org/r/20230104180513.1379453-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-14 19:29:46 -08:00
Lad Prabhakar 3aff0403f8 clocksource/drivers/riscv: Get rid of clocksource_arch_init() callback
Having a clocksource_arch_init() callback always sets vdso_clock_mode to
VDSO_CLOCKMODE_ARCHTIMER if GENERIC_GETTIMEOFDAY is enabled, this is
required for the riscv-timer.

This works for platforms where just riscv-timer clocksource is present.
On platforms where other clock sources are available we want them to
register with vdso_clock_mode set to VDSO_CLOCKMODE_NONE.

On the Renesas RZ/Five SoC OSTM block can be used as clocksource [0], to
avoid multiple clock sources being registered as VDSO_CLOCKMODE_ARCHTIMER
move setting of vdso_clock_mode in the riscv-timer driver instead of doing
this in clocksource_arch_init() callback as done similarly for ARM/64
architecture.

[0] drivers/clocksource/renesas-ostm.c

Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221229224601.103851-1-prabhakar.mahadev-lad.rj@bp.renesas.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-02-13 13:10:17 +01:00
Conor Dooley 8b3b8fbb48 RISC-V: time: initialize hrtimer based broadcast clock event device
Similarly to commit 022eb8ae8b ("ARM: 8938/1: kernel: initialize
broadcast hrtimer based clock event device"), RISC-V needs to initiate
hrtimer based broadcast clock event device before C3STOP can be used.
Otherwise, the introduction of C3STOP for the RISC-V arch timer in
commit 232ccac1bd ("clocksource/drivers/riscv: Events are stopped
during CPU suspend") leaves us without any broadcast timer registered.
This prevents the kernel from entering oneshot mode, which breaks timer
behaviour, for example clock_nanosleep().

A test app that sleeps each cpu for 6, 5, 4, 3 ms respectively, HZ=250
& C3STOP enabled, the sleep times are rounded up to the next jiffy:
== CPU: 1 ==      == CPU: 2 ==      == CPU: 3 ==      == CPU: 4 ==
Mean: 7.974992    Mean: 7.976534    Mean: 7.962591    Mean: 3.952179
Std Dev: 0.154374 Std Dev: 0.156082 Std Dev: 0.171018 Std Dev: 0.076193
Hi: 9.472000      Hi: 10.495000     Hi: 8.864000      Hi: 4.736000
Lo: 6.087000      Lo: 6.380000      Lo: 4.872000      Lo: 3.403000
Samples: 521      Samples: 521      Samples: 521      Samples: 521

Link: https://lore.kernel.org/linux-riscv/YzYTNQRxLr7Q9JR0@spud/
Fixes: 232ccac1bd ("clocksource/drivers/riscv: Events are stopped during CPU suspend")
Suggested-by: Samuel Holland <samuel@sholland.org>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Samuel Holland <samuel@sholland.org>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230103141102.772228-2-apatel@ventanamicro.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
2023-02-13 13:10:16 +01:00
Guo Ren eb7423273c
riscv: kprobe: Fixup misaligned load text
The current kprobe would cause a misaligned load for the probe point.
This patch fixup it with two half-word loads instead.

Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/linux-riscv/878rhig9zj.fsf@all.your.base.are.belong.to.us/
Reported-by: Bjorn Topel <bjorn.topel@gmail.com>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230204063531.740220-1-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-09 11:34:29 -08:00
Ard Biesheuvel cf1d2ffcc6 efi: Discover BTI support in runtime services regions
Add the generic plumbing to detect whether or not the runtime code
regions were constructed with BTI/IBT landing pads by the firmware,
permitting the OS to enable enforcement when mapping these regions into
the OS's address space.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
2023-02-04 09:19:02 +01:00
Liu Shixin cb80242cc6
riscv: stacktrace: Fix missing the first frame
When running kfence_test, I found some testcases failed like this:

 # test_out_of_bounds_read: EXPECTATION FAILED at mm/kfence/kfence_test.c:346
 Expected report_matches(&expect) to be true, but is false
 not ok 1 - test_out_of_bounds_read

The corresponding call-trace is:

 BUG: KFENCE: out-of-bounds read in kunit_try_run_case+0x38/0x84

 Out-of-bounds read at 0x(____ptrval____) (32B right of kfence-#10):
  kunit_try_run_case+0x38/0x84
  kunit_generic_run_threadfn_adapter+0x12/0x1e
  kthread+0xc8/0xde
  ret_from_exception+0x0/0xc

The kfence_test using the first frame of call trace to check whether the
testcase is succeed or not. Commit 6a00ef4493 ("riscv: eliminate
unreliable __builtin_frame_address(1)") skip first frame for all
case, which results the kfence_test failed. Indeed, we only need to skip
the first frame for case (task==NULL || task==current).

With this patch, the call-trace will be:

 BUG: KFENCE: out-of-bounds read in test_out_of_bounds_read+0x88/0x19e

 Out-of-bounds read at 0x(____ptrval____) (1B left of kfence-#7):
  test_out_of_bounds_read+0x88/0x19e
  kunit_try_run_case+0x38/0x84
  kunit_generic_run_threadfn_adapter+0x12/0x1e
  kthread+0xc8/0xde
  ret_from_exception+0x0/0xc

Fixes: 6a00ef4493 ("riscv: eliminate unreliable __builtin_frame_address(1)")
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Tested-by: Samuel Holland <samuel@sholland.org>
Link: https://lore.kernel.org/r/20221207025038.1022045-1-liushixin2@huawei.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-02 19:33:05 -08:00
Guo Ren 87f48c7ccc
riscv: kprobe: Fixup kernel panic when probing an illegal position
The kernel would panic when probed for an illegal position. eg:

(CONFIG_RISCV_ISA_C=n)

echo 'p:hello kernel_clone+0x16 a0=%a0' >> kprobe_events
echo 1 > events/kprobes/hello/enable
cat trace

Kernel panic - not syncing: stack-protector: Kernel stack
is corrupted in: __do_sys_newfstatat+0xb8/0xb8
CPU: 0 PID: 111 Comm: sh Not tainted
6.2.0-rc1-00027-g2d398fe49a4d #490
Hardware name: riscv-virtio,qemu (DT)
Call Trace:
[<ffffffff80007268>] dump_backtrace+0x38/0x48
[<ffffffff80c5e83c>] show_stack+0x50/0x68
[<ffffffff80c6da28>] dump_stack_lvl+0x60/0x84
[<ffffffff80c6da6c>] dump_stack+0x20/0x30
[<ffffffff80c5ecf4>] panic+0x160/0x374
[<ffffffff80c6db94>] generic_handle_arch_irq+0x0/0xa8
[<ffffffff802deeb0>] sys_newstat+0x0/0x30
[<ffffffff800158c0>] sys_clone+0x20/0x30
[<ffffffff800039e8>] ret_from_syscall+0x0/0x4
---[ end Kernel panic - not syncing: stack-protector:
Kernel stack is corrupted in: __do_sys_newfstatat+0xb8/0xb8 ]---

That is because the kprobe's ebreak instruction broke the kernel's
original code. The user should guarantee the correction of the probe
position, but it couldn't make the kernel panic.

This patch adds arch_check_kprobe in arch_prepare_kprobe to prevent an
illegal position (Such as the middle of an instruction).

Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20230201040604.3390509-1-guoren@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-01 20:49:55 -08:00
Palmer Dabbelt 9daca9a5b9
Merge patch series "riscv: improve boot time isa extensions handling"
Jisheng Zhang <jszhang@kernel.org> says:

Generally, riscv ISA extensions are fixed for any specific hardware
platform, so a hart's features won't change after booting, this
chacteristic makes it straightforward to use a static branch to check
a specific ISA extension is supported or not to optimize performance.

However, some ISA extensions such as SVPBMT and ZICBOM are handled
via. the alternative sequences.

Basically, for ease of maintenance, we prefer to use static branches
in C code, but recently, Samuel found that the static branch usage in
cpu_relax() breaks building with CONFIG_CC_OPTIMIZE_FOR_SIZE[1]. As
Samuel pointed out, "Having a static branch in cpu_relax() is
problematic because that function is widely inlined, including in some
quite complex functions like in the VDSO. A quick measurement shows
this static branch is responsible by itself for around 40% of the jump
table."

Samuel's findings pointed out one of a few downsides of static branches
usage in C code to handle ISA extensions detected at boot time:
static branch's metadata in the __jump_table section, which is not
discarded after ISA extensions are finalized, wastes some space.

I want to try to solve the issue for all possible dynamic handling of
ISA extensions at boot time. Inspired by Mark[2], this patch introduces
riscv_has_extension_*() helpers, which work like static branches but
are patched using alternatives, thus the metadata can be freed after
patching.

[1]https://lore.kernel.org/linux-riscv/20220922060958.44203-1-samuel@sholland.org/
[2]https://lore.kernel.org/linux-arm-kernel/20220912162210.3626215-8-mark.rutland@arm.com/
[3]https://lore.kernel.org/linux-riscv/20221130225614.1594256-1-heiko@sntech.de/

* b4-shazam-merge:
  riscv: remove riscv_isa_ext_keys[] array and related usage
  riscv: KVM: Switch has_svinval() to riscv_has_extension_unlikely()
  riscv: cpu_relax: switch to riscv_has_extension_likely()
  riscv: alternative: patch alternatives in the vDSO
  riscv: switch to relative alternative entries
  riscv: module: Add ADD16 and SUB16 rela types
  riscv: module: move find_section to module.h
  riscv: fpu: switch has_fpu() to riscv_has_extension_likely()
  riscv: introduce riscv_has_extension_[un]likely()
  riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions
  riscv: hwcap: make ISA extension ids can be used in asm
  riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
  riscv: move riscv_noncoherent_supported() out of ZICBOM probe

Link: https://lore.kernel.org/r/20230128172856.3814-1-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-02-01 19:36:25 -08:00
Jisheng Zhang 03966594e1
riscv: remove riscv_isa_ext_keys[] array and related usage
All users have switched to riscv_has_extension_*, remove unused
definitions, vars and related setting code.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230128172856.3814-14-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:45 -08:00
Jisheng Zhang cabfd146b3
riscv: alternative: patch alternatives in the vDSO
Make it possible to use alternatives in the vDSO, so that better
implementations can be used if possible.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Link: https://lore.kernel.org/r/20230128172856.3814-11-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:42 -08:00
Jisheng Zhang 8d23e94a44
riscv: switch to relative alternative entries
Instead of using absolute addresses for both the old instrucions and
the alternative instructions, use offsets relative to the alt_entry
values. So this not only cuts the size of the alternative entry, but
also meets the prerequisite for patching alternatives in the vDSO,
since absolute alternative entries are subject to dynamic relocation,
which is incompatible with the vDSO building.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-10-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:41 -08:00
Andrew Jones 1bc400ffb5
riscv: module: Add ADD16 and SUB16 rela types
To prepare for 16-bit relocation types to be emitted in alternatives
add support for ADD16 and SUB16.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-9-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:40 -08:00
Jisheng Zhang e0c267e03b
riscv: module: move find_section to module.h
Move find_section() to module.h so that the implementation can be shared
by the alternatives code. This will allow us to use alternatives in
the vdso.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20230128172856.3814-8-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:39 -08:00
Jisheng Zhang 4bf8860760
riscv: cpufeature: extend riscv_cpufeature_patch_func to all ISA extensions
riscv_cpufeature_patch_func() currently only scans a limited set of
cpufeatures, explicitly defined with macros. Extend it to probe for all
ISA extensions.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-5-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:36 -08:00
Jisheng Zhang 191b27c7c0
riscv: cpufeature: detect RISCV_ALTERNATIVES_EARLY_BOOT earlier
Currently riscv_cpufeature_patch_func() does nothing at the
RISCV_ALTERNATIVES_EARLY_BOOT stage. Add a check to detect whether we
are in this stage and exit early. This will allow us to use
riscv_cpufeature_patch_func() for scanning of all ISA extensions.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-3-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:34 -08:00
Jisheng Zhang abcc445acd
riscv: move riscv_noncoherent_supported() out of ZICBOM probe
It's a bit weird to call riscv_noncoherent_supported() each time when
insmoding a module. Move the calling out of feature patch func.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230128172856.3814-2-jszhang@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 23:29:33 -08:00
Palmer Dabbelt 75ab93a244
Merge patch series "Zbb string optimizations"
Heiko Stuebner <heiko@sntech.de> says:

From: Heiko Stuebner <heiko.stuebner@vrull.eu>

This series still tries to allow optimized string functions for specific
extensions. The last approach of using an inline base function to hold
the alternative calls did cause some issues in a number of places

So instead of that we're now just using an alternative j at the beginning
of the generic function to jump to a separate place inside the function
itself.

* b4-shazam-merge:
  RISC-V: add zbb support to string functions
  RISC-V: add infrastructure to allow different str* implementations

Link: https://lore.kernel.org/r/20230113212301.3534711-1-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 11:43:27 -08:00
Heiko Stuebner b6fcdb191e
RISC-V: add zbb support to string functions
Add handling for ZBB extension and add support for using it as a
variant for optimized string functions.

Support for the Zbb-str-variants is limited to the GNU-assembler
for now, as LLVM has not yet acquired the functionality to
selectively change the arch option in assembler code.
This is still under review at
    https://reviews.llvm.org/D123515

Co-developed-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Christoph Muellner <christoph.muellner@vrull.eu>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230113212301.3534711-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 11:43:24 -08:00
Heiko Stuebner 56e0790c7f
RISC-V: add infrastructure to allow different str* implementations
Depending on supported extensions on specific RISC-V cores,
optimized str* functions might make sense.

This adds basic infrastructure to allow patching the function calls
via alternatives later on.

The Linux kernel provides standard implementations for string functions
but when architectures want to extend them, they need to provide their
own.

The added generic string functions are done in assembler (taken from
disassembling the main-kernel functions for now) to allow us to control
the used registers and extend them with optimized variants.

This doesn't override the compiler's use of builtin replacements. So still
first of all the compiler will select if a builtin will be better suitable
i.e. for known strings. For all regular cases we will want to later
select possible optimized variants and in the worst case fall back to the
generic implemention added with this change.

Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20230113212301.3534711-2-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-31 11:43:23 -08:00
Ingo Molnar 57a30218fa Linux 6.2-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmPW7E8eHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGf7MIAI0JnHN9WvtEukSZ
 E6j6+cEGWxsvD6q0g3GPolaKOCw7hlv0pWcFJFcUAt0jebspMdxV2oUGJ8RYW7Lg
 nCcHvEVswGKLAQtQSWw52qotW6fUfMPsNYYB5l31sm1sKH4Cgss0W7l2HxO/1LvG
 TSeNHX53vNAZ8pVnFYEWCSXC9bzrmU/VALF2EV00cdICmfvjlgkELGXoLKJJWzUp
 s63fBHYGGURSgwIWOKStoO6HNo0j/F/wcSMx8leY8qDUtVKHj4v24EvSgxUSDBER
 ch3LiSQ6qf4sw/z7pqruKFthKOrlNmcc0phjiES0xwwGiNhLv0z3rAhc4OM2cgYh
 SDc/Y/c=
 =zpaD
 -----END PGP SIGNATURE-----

Merge tag 'v6.2-rc6' into sched/core, to pick up fixes

Pick up fixes before merging another batch of cpuidle updates.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
2023-01-31 15:01:20 +01:00
Ley Foon Tan c1d6105869
riscv: Move call to init_cpu_topology() to later initialization stage
If "capacity-dmips-mhz" is present in a CPU DT node,
topology_parse_cpu_capacity() will fail to allocate memory.  arm64, with
which this code path is shared, does not call
topology_parse_cpu_capacity() until later in boot where memory
allocation is available.  While "capacity-dmips-mhz" is not yet a valid
property on RISC-V, invalid properties should be ignored rather than
cause issues.  Move init_cpu_topology(), which calls
topology_parse_cpu_capacity(), to a later initialization stage, to match
arm64.

As a side effect of this change, RISC-V is "protected" from changes to
core topology code that would work on arm64 where memory allocation is
safe but on RISC-V isn't.

Fixes: 03f11f03db ("RISC-V: Parse cpu topology during boot.")
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Ley Foon Tan <leyfoon.tan@starfivetech.com>
Link: https://lore.kernel.org/r/20230105033705.3946130-1-leyfoon.tan@starfivetech.com
[Palmer: use Conor's commit text]
Link: https://lore.kernel.org/linux-riscv/20230104183033.755668-1-pierre.gondois@arm.com/T/#me592d4c8b9508642954839f0077288a353b0b9b2
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-25 07:20:00 -08:00
Liao Chang ca0254998b
riscv/kprobe: Fix instruction simulation of JALR
Set kprobe at 'jalr 1140(ra)' of vfs_write results in the following
crash:

[   32.092235] Unable to handle kernel access to user memory without uaccess routines at virtual address 00aaaaaad77b1170
[   32.093115] Oops [#1]
[   32.093251] Modules linked in:
[   32.093626] CPU: 0 PID: 135 Comm: ftracetest Not tainted 6.2.0-rc2-00013-gb0aa5e5df0cb-dirty #16
[   32.093985] Hardware name: riscv-virtio,qemu (DT)
[   32.094280] epc : ksys_read+0x88/0xd6
[   32.094855]  ra : ksys_read+0xc0/0xd6
[   32.095016] epc : ffffffff801cda80 ra : ffffffff801cdab8 sp : ff20000000d7bdc0
[   32.095227]  gp : ffffffff80f14000 tp : ff60000080f9cb40 t0 : ffffffff80f13e80
[   32.095500]  t1 : ffffffff8000c29c t2 : ffffffff800dbc54 s0 : ff20000000d7be60
[   32.095716]  s1 : 0000000000000000 a0 : ffffffff805a64ae a1 : ffffffff80a83708
[   32.095921]  a2 : ffffffff80f160a0 a3 : 0000000000000000 a4 : f229b0afdb165300
[   32.096171]  a5 : f229b0afdb165300 a6 : ffffffff80eeebd0 a7 : 00000000000003ff
[   32.096411]  s2 : ff6000007ff76800 s3 : fffffffffffffff7 s4 : 00aaaaaad77b1170
[   32.096638]  s5 : ffffffff80f160a0 s6 : ff6000007ff76800 s7 : 0000000000000030
[   32.096865]  s8 : 00ffffffc3d97be0 s9 : 0000000000000007 s10: 00aaaaaad77c9410
[   32.097092]  s11: 0000000000000000 t3 : ffffffff80f13e48 t4 : ffffffff8000c29c
[   32.097317]  t5 : ffffffff8000c29c t6 : ffffffff800dbc54
[   32.097505] status: 0000000200000120 badaddr: 00aaaaaad77b1170 cause: 000000000000000d
[   32.098011] [<ffffffff801cdb72>] ksys_write+0x6c/0xd6
[   32.098222] [<ffffffff801cdc06>] sys_write+0x2a/0x38
[   32.098405] [<ffffffff80003c76>] ret_from_syscall+0x0/0x2

Since the rs1 and rd might be the same one, such as 'jalr 1140(ra)',
hence it requires obtaining the target address from rs1 followed by
updating rd.

Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Liao Chang <liaochang1@huawei.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230116064342.2092136-1-liaochang1@huawei.com
[Palmer: Pick Guo's cleanup]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-24 21:38:19 -08:00
Jisheng Zhang 9d5567ccf9
riscv: fix jal offsets in patched alternatives
Alternatives live in a different section, so offsets used by jal
instruction will point to wrong locations after the patch got applied.

Similar to arm64, adjust the location to consider that offset.

Co-developed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20230113212205.3534622-1-heiko@sntech.de
Fixes: 27c653c065 ("RISC-V: fix auipc-jalr addresses in patched alternatives")
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-24 18:43:56 -08:00
Greg Kroah-Hartman 2e4a4e3628 cacheinfo and arch_topology updates for v6.3
The main change is to build the cache topology information for all
 the CPUs from the primary CPU. Currently the cacheinfo for secondary CPUs
 is created during the early boot on the respective CPU itself. Preemption
 and interrupts are disabled at this stage. On PREEMPT_RT kernels, allocating
 memory and even parsing the PPTT table for ACPI based systems triggers a:
   'BUG: sleeping function called from invalid context'
 
 To prevent this bug, the cacheinfo is now allocated from the primary CPU
 when preemption and interrupts are enabled and before booting secondary
 CPUs. The cache levels/leaves are computed from DT/ACPI PPTT information
 only, without relying on any architecture specific mechanism if done so
 early.
 
 The other minor change included here is to handle shared caches at
 different levels when not all the CPUs on the system have the same
 cache hierarchy.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEEunHlEgbzHrJD3ZPhAEG6vDF+4pgFAmPKg60ACgkQAEG6vDF+
 4pi91hAAoSluqjbUHzzCW+OIIjKAAvQrAw6bsKGvSpcUfYno1Lry+9y76L6TMYSy
 OPtiKGxcJyzhdlCwIpJzgaX9nTz7uiu70euNZiAp11XA2KlphtLoI3TMUa60jD4i
 ZGfn9UiAp719Vog5m3CmZXjHZ6drI0HloL8ZWTl4VDATUu5pfcx4uYPT2o63Xc62
 k5QglaRJWFhFAJ+R6R9vQS2zfeMI9xvehl72445wb8pxxPW2f91dvBhJqJgKlziw
 gHKx+D1DnpAUd+v+7HAEmzjXKlY6JnQybmBHmRayllVAa8kGUtvhTcBlRGNsNBzR
 m7VBFKq+eSk7VgxOgka1qXVtHUrlaEWf5qWnG+w4XEiE1VgzNagjaFRaGQQneKI/
 z3yNKG8Xjp+3BdSz0pUDJVEWFnnjueAUEh6/xODEXnWdX166abQZslLIHCvmcnM8
 q7blasuj2mxyCZFC1tyK9WHI7/KCe0cmHbdau3qs0j9bvhzfdB3DwMLsdRjXQTOv
 8FVX0Z5EKY9/bW2oqCg/mb3KOWbmFX2ZHho4cds3IV+9GGB8JkD/6b8vpGejmmx2
 E2vInzhP3gLd9WiWQWDjg4+aklE29P/nDAA8BSPnW3TtEGAFJMZZQRgNlCZnBW56
 Tx2/lE0VD5/UX+1MqSFGchi+KEX3mcZykcra9VNt+uZH26a8gBE=
 =nH4b
 -----END PGP SIGNATURE-----

Merge tag 'archtopo-cacheinfo-updates-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into driver-core-next

Sudeep writes:
  "cacheinfo and arch_topology updates for v6.3

   The main change is to build the cache topology information for all
   the CPUs from the primary CPU. Currently the cacheinfo for secondary CPUs
   is created during the early boot on the respective CPU itself. Preemption
   and interrupts are disabled at this stage. On PREEMPT_RT kernels, allocating
   memory and even parsing the PPTT table for ACPI based systems triggers a:
     'BUG: sleeping function called from invalid context'

   To prevent this bug, the cacheinfo is now allocated from the primary CPU
   when preemption and interrupts are enabled and before booting secondary
   CPUs. The cache levels/leaves are computed from DT/ACPI PPTT information
   only, without relying on any architecture specific mechanism if done so
   early.

   The other minor change included here is to handle shared caches at
   different levels when not all the CPUs on the system have the same
   cache hierarchy."

* tag 'archtopo-cacheinfo-updates-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux:
  cacheinfo: Fix shared_cpu_map to handle shared caches at different levels
  arch_topology: Build cacheinfo from primary CPU
  ACPI: PPTT: Update acpi_find_last_cache_level() to acpi_get_cache_info()
  ACPI: PPTT: Remove acpi_find_cache_levels()
  cacheinfo: Check 'cache-unified' property to count cache leaves
  cacheinfo: Return error code in init_of_cache_level()
  cacheinfo: Use RISC-V's init_cache_level() as generic OF implementation
2023-01-20 13:40:04 +01:00
Palmer Dabbelt 61a9b71290
Merge patch series "Putting some basic order on isa extension lists"
This cleans up the ISA string handling to more closely match a version
of the ISA spec.  This is visible in /proc/cpuinfo and the ordering
changes may break something in userspace, but these orderings have
changed before without issues so with any luck that's still the case.

This also adds documentation so userspace has a better idea of what is
intended when it comes to compatibility for /proc/cpuinfo, which should
help everyone as this will likely keep changing.

* b4-shazam-merge:
  Documentation: riscv: add a section about ISA string ordering in /proc/cpuinfo
  RISC-V: resort all extensions in consistent orders
  RISC-V: clarify ISA string ordering rules in cpu.c

Link: https://lore.kernel.org/r/20221205144525.2148448-1-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-19 16:41:04 -08:00
Masahiro Yamada 5b89c6f9b2
riscv: fix -Wundef warning for CONFIG_RISCV_BOOT_SPINWAIT
Since commit 80b6093b55 ("kbuild: add -Wundef to KBUILD_CPPFLAGS
for W=1 builds"), building with W=1 detects misuse of #if.

  $ make W=1 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu- arch/riscv/kernel/
    [snip]
    AS      arch/riscv/kernel/head.o
  arch/riscv/kernel/head.S:329:5: warning: "CONFIG_RISCV_BOOT_SPINWAIT" is not defined, evaluates to 0 [-Wundef]
    329 | #if CONFIG_RISCV_BOOT_SPINWAIT
        |     ^~~~~~~~~~~~~~~~~~~~~~~~~~

CONFIG_RISCV_BOOT_SPINWAIT is a bool option. #ifdef should be used.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Fixes: 2ffc48fc70 ("RISC-V: Move spinwait booting method to its own config")
Link: https://lore.kernel.org/r/20230106161213.2374093-1-masahiroy@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-19 16:37:11 -08:00
Mike Kravetz e9adcfecf5 mm: remove zap_page_range and create zap_vma_pages
zap_page_range was originally designed to unmap pages within an address
range that could span multiple vmas.  While working on [1], it was
discovered that all callers of zap_page_range pass a range entirely within
a single vma.  In addition, the mmu notification call within zap_page
range does not correctly handle ranges that span multiple vmas.  When
crossing a vma boundary, a new mmu_notifier_range_init/end call pair with
the new vma should be made.

Instead of fixing zap_page_range, do the following:
- Create a new routine zap_vma_pages() that will remove all pages within
  the passed vma.  Most users of zap_page_range pass the entire vma and
  can use this new routine.
- For callers of zap_page_range not passing the entire vma, instead call
  zap_page_range_single().
- Remove zap_page_range.

[1] https://lore.kernel.org/linux-mm/20221114235507.294320-2-mike.kravetz@oracle.com/
Link: https://lkml.kernel.org/r/20230104002732.232573-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Suggested-by: Peter Xu <peterx@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Peter Xu <peterx@redhat.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>	[s390]
Reviewed-by: Christoph Hellwig <hch@lst.de>
Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nadav Amit <nadav.amit@gmail.com>
Cc: Palmer Dabbelt <palmer@dabbelt.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-18 17:12:55 -08:00
Pierre Gondois 5944ce092b arch_topology: Build cacheinfo from primary CPU
commit 3fcbf1c77d ("arch_topology: Fix cache attributes detection
in the CPU hotplug path")
adds a call to detect_cache_attributes() to populate the cacheinfo
before updating the siblings mask. detect_cache_attributes() allocates
memory and can take the PPTT mutex (on ACPI platforms). On PREEMPT_RT
kernels, on secondary CPUs, this triggers a:
  'BUG: sleeping function called from invalid context' [1]
as the code is executed with preemption and interrupts disabled.

The primary CPU was previously storing the cache information using
the now removed (struct cpu_topology).llc_id:
commit 5b8dc787ce ("arch_topology: Drop LLC identifier stash from
the CPU topology")

allocate_cache_info() tries to build the cacheinfo from the primary
CPU prior secondary CPUs boot, if the DT/ACPI description
contains cache information.
If allocate_cache_info() fails, then fallback to the current state
for the cacheinfo allocation. [1] will be triggered in such case.

When unplugging a CPU, the cacheinfo memory cannot be freed. If it
was, then the memory would be allocated early by the re-plugged
CPU and would trigger [1].

Note that populate_cache_leaves() might be called multiple times
due to populate_leaves being moved up. This is required since
detect_cache_attributes() might be called with per_cpu_cacheinfo(cpu)
being allocated but not populated.

[1]:
 | BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:46
 | in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/111
 | preempt_count: 1, expected: 0
 | RCU nest depth: 1, expected: 1
 | 3 locks held by swapper/111/0:
 |  #0:  (&pcp->lock){+.+.}-{3:3}, at: get_page_from_freelist+0x218/0x12c8
 |  #1:  (rcu_read_lock){....}-{1:3}, at: rt_spin_trylock+0x48/0xf0
 |  #2:  (&zone->lock){+.+.}-{3:3}, at: rmqueue_bulk+0x64/0xa80
 | irq event stamp: 0
 | hardirqs last  enabled at (0):  0x0
 | hardirqs last disabled at (0):  copy_process+0x5dc/0x1ab8
 | softirqs last  enabled at (0):  copy_process+0x5dc/0x1ab8
 | softirqs last disabled at (0):  0x0
 | Preemption disabled at:
 |  migrate_enable+0x30/0x130
 | CPU: 111 PID: 0 Comm: swapper/111 Tainted: G        W          6.0.0-rc4-rt6-[...]
 | Call trace:
 |  __kmalloc+0xbc/0x1e8
 |  detect_cache_attributes+0x2d4/0x5f0
 |  update_siblings_masks+0x30/0x368
 |  store_cpu_topology+0x78/0xb8
 |  secondary_start_kernel+0xd0/0x198
 |  __secondary_switched+0xb0/0xb4

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230104183033.755668-7-pierre.gondois@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-01-18 09:58:40 +00:00
Conor Dooley 80c200b34e
RISC-V: resort all extensions in consistent orders
Ordering between each and every list of extensions is wildly
inconsistent. Per discussion on the lists pick the following policy:

- The array defining order in /proc/cpuinfo follows a narrow
  interpretation of the ISA specifications, described in a comment
  immediately presiding it.

- All other lists of extensions are sorted alphabetically.

This will hopefully allow for easier review & future additions, and
reduce conflicts between patchsets as the number of extensions grows.

Link: https://lore.kernel.org/all/20221129144742.2935581-2-conor.dooley@microchip.com/
Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221205144525.2148448-3-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-17 22:05:32 -08:00
Conor Dooley 99e2266f24
RISC-V: clarify ISA string ordering rules in cpu.c
While the current list of rules may have been accurate when created
it now lacks some clarity in the face of isa-manual updates. Instead of
trying to continuously align this rule-set with the one in the
specifications, change the role of this comment.

This particular comment is important, as the array it "decorates"
defines the order in which the ISA string appears to userspace in
/proc/cpuinfo.

Re-jig and strengthen the wording to provide contributors with a set
order in which to add entries & note why this particular struct needs
more attention than others.

While in the area, add some whitespace and tweak some wording for
readability's sake.

Suggested-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221205144525.2148448-2-conor.dooley@microchip.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-17 22:05:31 -08:00
Pierre Gondois c3719bd9ee cacheinfo: Use RISC-V's init_cache_level() as generic OF implementation
RISC-V's implementation of init_of_cache_level() is following
the Devicetree Specification v0.3 regarding caches, cf.:
- s3.7.3 'Internal (L1) Cache Properties'
- s3.8 'Multi-level and Shared Cache Nodes'

Allow reusing the implementation by moving it.

Also make 'levels', 'leaves' and 'level' unsigned int.

Signed-off-by: Pierre Gondois <pierre.gondois@arm.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Link: https://lore.kernel.org/r/20230104183033.755668-2-pierre.gondois@arm.com
Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2023-01-17 21:59:52 +00:00
Peter Zijlstra 89b3098703 arch/idle: Change arch_cpu_idle() behavior: always exit with IRQs disabled
Current arch_cpu_idle() is called with IRQs disabled, but will return
with IRQs enabled.

However, the very first thing the generic code does after calling
arch_cpu_idle() is raw_local_irq_disable(). This means that
architectures that can idle with IRQs disabled end up doing a
pointless 'enable-disable' dance.

Therefore, push this IRQ disabling into the idle function, meaning
that those architectures can avoid the pointless IRQ state flipping.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Gautham R. Shenoy <gautham.shenoy@amd.com>
Acked-by: Mark Rutland <mark.rutland@arm.com> [arm64]
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Guo Ren <guoren@kernel.org>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.618076436@infradead.org
2023-01-13 11:48:15 +01:00
Peter Zijlstra 2b5a0e425e objtool/idle: Validate __cpuidle code as noinstr
Idle code is very like entry code in that RCU isn't available. As
such, add a little validation.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Tony Lindgren <tony@atomide.com>
Tested-by: Ulf Hansson <ulf.hansson@linaro.org>
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Link: https://lore.kernel.org/r/20230112195540.373461409@infradead.org
2023-01-13 11:48:15 +01:00
Björn Töpel b2d473a601
riscv, kprobes: Stricter c.jr/c.jalr decoding
In the compressed instruction extension, c.jr, c.jalr, c.mv, and c.add
is encoded the following way (each instruction is 16b):

---+-+-----------+-----------+--
100 0 rs1[4:0]!=0       00000 10 : c.jr
100 1 rs1[4:0]!=0       00000 10 : c.jalr
100 0  rd[4:0]!=0 rs2[4:0]!=0 10 : c.mv
100 1  rd[4:0]!=0 rs2[4:0]!=0 10 : c.add

The following logic is used to decode c.jr and c.jalr:

  insn & 0xf007 == 0x8002 => instruction is an c.jr
  insn & 0xf007 == 0x9002 => instruction is an c.jalr

When 0xf007 is used to mask the instruction, c.mv can be incorrectly
decoded as c.jr, and c.add as c.jalr.

Correct the decoding by changing the mask from 0xf007 to 0xf07f.

Fixes: c22b0bcb1d ("riscv: Add kprobes supported")
Signed-off-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20230102160748.1307289-1-bjorn@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2023-01-05 12:30:41 -08:00
Heiko Stuebner 27c653c065
RISC-V: fix auipc-jalr addresses in patched alternatives
Alternatives live in a different section, so addresses used by call
functions will point to wrong locations after the patch got applied.

Similar to arm64, adjust the location to consider that offset.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Tested-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-13-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29 06:59:52 -08:00
Heiko Stuebner debe28d0a5
RISC-V: kprobes: use central defined funct3 constants
Don't redefine values that are already available in the central header
asm/insn.h . Use the values from there instead.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-9-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29 06:59:48 -08:00
Heiko Stuebner c9c1af3f18
RISC-V: rename parse_asm.h to insn.h
The current parse_asm header should become a more centralized place
for everything concerning parsing and constructing instructions.

We already have a header insn-def.h similar to aarch64, so rename
parse_asm.h to insn.h (again similar to aarch64) to show that it's
meant for more than simple instruction parsing.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-8-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29 06:59:47 -08:00
Heiko Stuebner ec5f908775
RISC-V: Move riscv_insn_is_* macros into a common header
Right now the riscv kernel has (at least) two independent sets
of functions to check if an encoded instruction is of a specific
type. One in kgdb and one kprobes simulate-insn code.

More parts of the kernel will probably need this in the future,
so instead of allowing this duplication to go on further,
move macros that do the function declaration in a common header,
similar to at least aarch64.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-7-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29 06:59:46 -08:00
Heiko Stuebner bf0cc402d7
RISC-V: add prefix to all constants/macros in parse_asm.h
Some of the constants and macros already have suitable RV_, RVG_ or
RVC_ prefixes.

Extend this to the rest of the file as well, as we want to use these
things in a broader scope soon.

Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Signed-off-by: Heiko Stuebner <heiko.stuebner@vrull.eu>
Link: https://lore.kernel.org/r/20221223221332.4127602-3-heiko@sntech.de
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-29 06:59:42 -08:00
Linus Torvalds 7a5189c58b KVM/riscv changes for 6.2
* Allow unloading KVM module
 
 * Allow KVM user-space to set mvendorid, marchid, and mimpid
 
 * Several fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmOhy+QUHHBib256aW5p
 QHJlZGhhdC5jb20ACgkQv/vSX3jHroOdUwf+K3i8RHW1H8TF/JSrn1I6nURNLYhb
 2wXzl3esOsfswtn6dxEvLEXivcKmD2G9bLpa2UIa3vw1Plg9tdce9IJ5qDodtxVL
 mlISMUSgMNy+lelKJiG+l5Ld4oJ4HUY0yw/p3Ml9WUpra98UCB0sJ+FsqXr4ndi9
 LxkQJrNyZkQcRH2IXjQhKjdjkepFTmkhKs/uCxAZvW9zfUmGX0dcp9W22PTbsapQ
 IcaBKdVaNN3TXNSIdDCM2Iv+oBN7gJn1CbgFxhkp4L8eE5PvRjFw0QooFMn2TjDw
 VflP3gIs/41+5tnoPWXGAkKFe/Z5aJjGjx6Yx0WnEEgoAG47RUHYsKIUjw==
 =8ejV
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull RISC-V kvm updates from Paolo Bonzini:

 - Allow unloading KVM module

 - Allow KVM user-space to set mvendorid, marchid, and mimpid

 - Several fixes and cleanups

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  RISC-V: KVM: Add ONE_REG interface for mvendorid, marchid, and mimpid
  RISC-V: KVM: Save mvendorid, marchid, and mimpid when creating VCPU
  RISC-V: Export sbi_get_mvendorid() and friends
  RISC-V: KVM: Move sbi related struct and functions to kvm_vcpu_sbi.h
  RISC-V: KVM: Use switch-case in kvm_riscv_vcpu_set/get_reg()
  RISC-V: KVM: Remove redundant includes of asm/csr.h
  RISC-V: KVM: Remove redundant includes of asm/kvm_vcpu_timer.h
  RISC-V: KVM: Fix reg_val check in kvm_riscv_vcpu_set_reg_config()
  RISC-V: KVM: Simplify kvm_arch_prepare_memory_region()
  RISC-V: KVM: Exit run-loop immediately if xfer_to_guest fails
  RISC-V: KVM: use vma_lookup() instead of find_vma_intersection()
  RISC-V: KVM: Add exit logic to main.c
2022-12-21 18:52:15 -08:00
Linus Torvalds eb67d239f3 RISC-V Patches for the 6.2 Merge Window, Part 1
* Support for the T-Head PMU via the perf subsystem.
 * ftrace support for rv32.
 * Support for non-volatile memory devices.
 * Various fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmOZ6WsTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiWGcD/wLGiHq3ekQhl5D+CaA1WlJ5XzQFfY2
 bv1ZCZGdjuiv66jiMlmEsbpfUCk3bSAIjCO3MHQNDmTuPJztCHVJXOHbZFWItzzO
 soW4nXHKW1sGHa7hDLGQUPkltA48OdPoyqEDvlnpyEWFT+2xHwdFEURWE85FXGeq
 ZzFSKUQqX/V52n9TS4M4QtmNnQatR3TgIs8ttzD4JqwWFBbp4/iBfIGt6n3W24XH
 9lKWikO4YOYUPl0KVIakM4d8NmX7g+7vhCKWavLke1fF/IQOlyWwA0eM8ryj33OG
 L1nFkqfF3mCw9i72WHftlc0rAgVqcYS8ntnQkPNpt2zPp3xFjDwEy+XiZrRE+sAp
 m5Ma2Tkw7G3ueBtXwP1yo+EKa7PrVFbCRD/rEpLJAC6+9ktvc7cYs39E08O+wrwT
 qkYThDolovqMOqfOq6afEGy5lfIa5U00vxK+3MXiE3eLEjHSJhwTXadUbwyMjJWE
 zOwA6p5NfDFzklESSNTtIBY85Zlh/g2q6GWCy7yBQnlaSdbpDxcnAlSZipq66Iqm
 9ytdZiHid4BIRQxr5qyXTB184BvFnWNRs9NGhCj38uLEnuxwSChzwoh/WPDxLNte
 U9ouvwJO5U2qAZsMGJhY8W2s/9WvWpSqRhSMA/nnNV1Hh+URFz8rFXAln6kNn//v
 j+cYGCyjLnO1hg==
 =4Ak2
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.2-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support for the T-Head PMU via the perf subsystem

 - ftrace support for rv32

 - Support for non-volatile memory devices

 - Various fixes and cleanups

* tag 'riscv-for-linus-6.2-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (52 commits)
  Documentation: RISC-V: patch-acceptance: s/implementor/implementer
  Documentation: RISC-V: Mention the UEFI Standards
  Documentation: RISC-V: Allow patches for non-standard behavior
  Documentation: RISC-V: Fix a typo in patch-acceptance
  riscv: Fixup compile error with !MMU
  riscv: Fix P4D_SHIFT definition for 3-level page table mode
  riscv: Apply a static assert to riscv_isa_ext_id
  RISC-V: Add some comments about the shadow and overflow stacks
  RISC-V: Align the shadow stack
  RISC-V: Ensure Zicbom has a valid block size
  RISC-V: Introduce riscv_isa_extension_check
  RISC-V: Improve use of isa2hwcap[]
  riscv: Don't duplicate _ALTERNATIVE_CFG* macros
  riscv: alternatives: Drop the underscores from the assembly macro names
  riscv: alternatives: Don't name unused macro parameters
  riscv: Don't duplicate __ALTERNATIVE_CFG in __ALTERNATIVE_CFG_2
  riscv: mm: call best_map_size many times during linear-mapping
  riscv: Move cast inside kernel_mapping_[pv]a_to_[vp]a
  riscv: Fix crash during early errata patching
  riscv: boot: add zstd support
  ...
2022-12-14 15:23:49 -08:00
Linus Torvalds fc4c9f4504 EFI updates for v6.2:
- Refactor the zboot code so that it incorporates all the EFI stub
   logic, rather than calling the decompressed kernel as a EFI app.
 - Add support for initrd= command line option to x86 mixed mode.
 - Allow initrd= to be used with arbitrary EFI accessible file systems
   instead of just the one the kernel itself was loaded from.
 - Move some x86-only handling and manipulation of the EFI memory map
   into arch/x86, as it is not used anywhere else.
 - More flexible handling of any random seeds provided by the boot
   environment (i.e., systemd-boot) so that it becomes available much
   earlier during the boot.
 - Allow improved arch-agnostic EFI support in loaders, by setting a
   uniform baseline of supported features, and adding a generic magic
   number to the DOS/PE header. This should allow loaders such as GRUB or
   systemd-boot to reduce the amount of arch-specific handling
   substantially.
 - (arm64) Run EFI runtime services from a dedicated stack, and use it to
   recover from synchronous exceptions that might occur in the firmware
   code.
 - (arm64) Ensure that we don't allocate memory outside of the 48-bit
   addressable physical range.
 - Make EFI pstore record size configurable
 - Add support for decoding CXL specific CPER records
 -----BEGIN PGP SIGNATURE-----
 
 iQGzBAABCgAdFiEE+9lifEBpyUIVN1cpw08iOZLZjyQFAmOTQ1cACgkQw08iOZLZ
 jyQRkAv+LqaZFWeVwhAQHiw/N3RnRM0nZHea6++D2p1y/ZbCpwv3pdLl2YHQ1KmW
 wDG9Nr4C1ITLtfy1YZKeYpwloQtq9S1GZDWnFpVv/hdo7L924eRAwIlxowWn1OnP
 ruxv2PaYXyb0plh1YD1f6E1BqrfUOtajET55Kxs9ZsxmnMtDpIX3NiYy4LKMBIZC
 +Eywt41M3uBX+wgmSujFBMVVJjhOX60WhUYXqy0RXwDKOyrz/oW5td+eotSCreB6
 FVbjvwQvUdtzn4s1FayOMlTrkxxLw4vLhsaUGAdDOHd3rg3sZT9Xh1HqFFD6nss6
 ZAzAYQ6BzdiV/5WSB9meJe+BeG1hjTNKjJI6JPO2lctzYJqlnJJzI6JzBuH9vzQ0
 dffLB8NITeEW2rphIh+q+PAKFFNbXWkJtV4BMRpqmzZ/w7HwupZbUXAzbWE8/5km
 qlFpr0kmq8GlVcbXNOFjmnQVrJ8jPYn+O3AwmEiVAXKZJOsMH0sjlXHKsonme9oV
 Sk71c6Em
 =JEXz
 -----END PGP SIGNATURE-----

Merge tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI updates from Ard Biesheuvel:
 "Another fairly sizable pull request, by EFI subsystem standards.

  Most of the work was done by me, some of it in collaboration with the
  distro and bootloader folks (GRUB, systemd-boot), where the main focus
  has been on removing pointless per-arch differences in the way EFI
  boots a Linux kernel.

   - Refactor the zboot code so that it incorporates all the EFI stub
     logic, rather than calling the decompressed kernel as a EFI app.

   - Add support for initrd= command line option to x86 mixed mode.

   - Allow initrd= to be used with arbitrary EFI accessible file systems
     instead of just the one the kernel itself was loaded from.

   - Move some x86-only handling and manipulation of the EFI memory map
     into arch/x86, as it is not used anywhere else.

   - More flexible handling of any random seeds provided by the boot
     environment (i.e., systemd-boot) so that it becomes available much
     earlier during the boot.

   - Allow improved arch-agnostic EFI support in loaders, by setting a
     uniform baseline of supported features, and adding a generic magic
     number to the DOS/PE header. This should allow loaders such as GRUB
     or systemd-boot to reduce the amount of arch-specific handling
     substantially.

   - (arm64) Run EFI runtime services from a dedicated stack, and use it
     to recover from synchronous exceptions that might occur in the
     firmware code.

   - (arm64) Ensure that we don't allocate memory outside of the 48-bit
     addressable physical range.

   - Make EFI pstore record size configurable

   - Add support for decoding CXL specific CPER records"

* tag 'efi-next-for-v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: (43 commits)
  arm64: efi: Recover from synchronous exceptions occurring in firmware
  arm64: efi: Execute runtime services from a dedicated stack
  arm64: efi: Limit allocations to 48-bit addressable physical region
  efi: Put Linux specific magic number in the DOS header
  efi: libstub: Always enable initrd command line loader and bump version
  efi: stub: use random seed from EFI variable
  efi: vars: prohibit reading random seed variables
  efi: random: combine bootloader provided RNG seed with RNG protocol output
  efi/cper, cxl: Decode CXL Error Log
  efi/cper, cxl: Decode CXL Protocol Error Section
  efi: libstub: fix efi_load_initrd_dev_path() kernel-doc comment
  efi: x86: Move EFI runtime map sysfs code to arch/x86
  efi: runtime-maps: Clarify purpose and enable by default for kexec
  efi: pstore: Add module parameter for setting the record size
  efi: xen: Set EFI_PARAVIRT for Xen dom0 boot on all architectures
  efi: memmap: Move manipulation routines into x86 arch tree
  efi: memmap: Move EFI fake memmap support into x86 arch tree
  efi: libstub: Undeprecate the command line initrd loader
  efi: libstub: Add mixed mode support to command line initrd loader
  efi: libstub: Permit mixed mode return types other than efi_status_t
  ...
2022-12-13 14:31:47 -08:00
Linus Torvalds 0a1d4434db Updates for timers, timekeeping and drivers:
- Core:
 
    - The timer_shutdown[_sync]() infrastructure:
 
      Tearing down timers can be tedious when there are circular
      dependencies to other things which need to be torn down. A prime
      example is timer and workqueue where the timer schedules work and the
      work arms the timer.
 
      What needs to prevented is that pending work which is drained via
      destroy_workqueue() does not rearm the previously shutdown
      timer. Nothing in that shutdown sequence relies on the timer being
      functional.
 
      The conclusion was that the semantics of timer_shutdown_sync() should
      be:
 
 	- timer is not enqueued
     	- timer callback is not running
     	- timer cannot be rearmed
 
      Preventing the rearming of shutdown timers is done by discarding rearm
      attempts silently. A warning for the case that a rearm attempt of a
      shutdown timer is detected would not be really helpful because it's
      entirely unclear how it should be acted upon. The only way to address
      such a case is to add 'if (in_shutdown)' conditionals all over the
      place. This is error prone and in most cases of teardown not required
      all.
 
    - The real fix for the bluetooth HCI teardown based on
      timer_shutdown_sync().
 
      A larger scale conversion to timer_shutdown_sync() is work in
      progress.
 
    - Consolidation of VDSO time namespace helper functions
 
    - Small fixes for timer and timerqueue
 
  - Drivers:
 
    - Prevent integer overflow on the XGene-1 TVAL register which causes
      an never ending interrupt storm.
 
    - The usual set of new device tree bindings
 
    - Small fixes and improvements all over the place
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCgAxFiEEQp8+kY+LLUocC4bMphj1TA10mKEFAmOUuC0THHRnbHhAbGlu
 dXRyb25peC5kZQAKCRCmGPVMDXSYodpZD/9kCDi009n65QFF1J4kE5aZuABbRMtO
 7sy66fJpDyB/MtcbPPH29uzQUEs1VMTQVB+ZM+7e1YGoxSWuSTzeoFH+yK1w4tEZ
 VPbOcvUEjG0esKUehwYFeOjSnIjy6M1Y41aOUaDnq00/azhfTrzLxQA1BbbFbkpw
 S7u2hllbyRJ8KdqQyV9cVpXmze6fcpdtNhdQeoA7qQCsSPnJ24MSpZ/PG9bAovq8
 75IRROT7CQRd6AMKAVpA9Ov8ak9nbY3EgQmoKcp5ZXfXz8kD3nHky9Lste7djgYB
 U085Vwcelt39V5iXevDFfzrBYRUqrMKOXIf2xnnoDNeF5Jlj5gChSNVZwTLO38wu
 RFEVCjCjuC41GQJWSck9LRSYdriW/htVbEE8JLc6uzUJGSyjshgJRn/PK4HjpiLY
 AvH2rd4rAap/rjDKvfWvBqClcfL7pyBvavgJeyJ8oXyQjHrHQwapPcsMFBm0Cky5
 soF0Lr3hIlQ9u+hwUuFdNZkY9mOg09g9ImEjW1AZTKY0DfJMc5JAGjjSCfuopVUN
 Uf/qqcUeQPSEaC+C9xiFs0T3svYFxBqpgPv4B6t8zAnozon9fyZs+lv5KdRg4X77
 qX395qc6PaOSQlA7gcxVw3vjCPd0+hljXX84BORP7z+uzcsomvIH1MxJepIHmgaJ
 JrYbSZ5qzY5TTA==
 =JlDe
 -----END PGP SIGNATURE-----

Merge tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer updates from Thomas Gleixner:
 "Updates for timers, timekeeping and drivers:

  Core:

   - The timer_shutdown[_sync]() infrastructure:

     Tearing down timers can be tedious when there are circular
     dependencies to other things which need to be torn down. A prime
     example is timer and workqueue where the timer schedules work and
     the work arms the timer.

     What needs to prevented is that pending work which is drained via
     destroy_workqueue() does not rearm the previously shutdown timer.
     Nothing in that shutdown sequence relies on the timer being
     functional.

     The conclusion was that the semantics of timer_shutdown_sync()
     should be:
	- timer is not enqueued
    	- timer callback is not running
    	- timer cannot be rearmed

     Preventing the rearming of shutdown timers is done by discarding
     rearm attempts silently.

     A warning for the case that a rearm attempt of a shutdown timer is
     detected would not be really helpful because it's entirely unclear
     how it should be acted upon. The only way to address such a case is
     to add 'if (in_shutdown)' conditionals all over the place. This is
     error prone and in most cases of teardown not required all.

   - The real fix for the bluetooth HCI teardown based on
     timer_shutdown_sync().

     A larger scale conversion to timer_shutdown_sync() is work in
     progress.

   - Consolidation of VDSO time namespace helper functions

   - Small fixes for timer and timerqueue

  Drivers:

   - Prevent integer overflow on the XGene-1 TVAL register which causes
     an never ending interrupt storm.

   - The usual set of new device tree bindings

   - Small fixes and improvements all over the place"

* tag 'timers-core-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (34 commits)
  dt-bindings: timer: renesas,cmt: Add r8a779g0 CMT support
  dt-bindings: timer: renesas,tmu: Add r8a779g0 support
  clocksource/drivers/arm_arch_timer: Use kstrtobool() instead of strtobool()
  clocksource/drivers/timer-ti-dm: Fix missing clk_disable_unprepare in dmtimer_systimer_init_clock()
  clocksource/drivers/timer-ti-dm: Clear settings on probe and free
  clocksource/drivers/timer-ti-dm: Make timer_get_irq static
  clocksource/drivers/timer-ti-dm: Fix warning for omap_timer_match
  clocksource/drivers/arm_arch_timer: Fix XGene-1 TVAL register math error
  clocksource/drivers/timer-npcm7xx: Enable timer 1 clock before use
  dt-bindings: timer: nuvoton,npcm7xx-timer: Allow specifying all clocks
  dt-bindings: timer: rockchip: Add rockchip,rk3128-timer
  clockevents: Repair kernel-doc for clockevent_delta2ns()
  clocksource/drivers/ingenic-ost: Define pm functions properly in platform_driver struct
  clocksource/drivers/sh_cmt: Access registers according to spec
  vdso/timens: Refactor copy-pasted find_timens_vvar_page() helper into one copy
  Bluetooth: hci_qca: Fix the teardown problem for real
  timers: Update the documentation to reflect on the new timer_shutdown() API
  timers: Provide timer_shutdown[_sync]()
  timers: Add shutdown mechanism to the internal functions
  timers: Split [try_to_]del_timer[_sync]() to prepare for shutdown mode
  ...
2022-12-12 12:52:02 -08:00
Palmer Dabbelt c3ec1e8964
Merge patch series "RISC-V: Align the shadow stack"
Palmer Dabbelt <palmer@rivosinc.com> says:

This contains a pair of cleanups that depend on a fix that has already
landed upstream.

* b4-shazam-merge:
  RISC-V: Add some comments about the shadow and overflow stacks
  RISC-V: Align the shadow stack
  riscv: fix race when vmap stack overflow

Link: https://lore.kernel.org/r/20221130023515.20217-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-12 09:30:37 -08:00
Palmer Dabbelt de57ecc476
RISC-V: Add some comments about the shadow and overflow stacks
It took me a while to page all this back in when trying to review the
recent spin_shadow_stack, so I figured I'd just write up some comments.

Reviewed-by: Guo Ren <guoren@kernel.org>
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221130023515.20217-2-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-12 09:30:35 -08:00
Palmer Dabbelt b003b3b77d
RISC-V: Align the shadow stack
The standard RISC-V ABIs all require 16-byte stack alignment.  We're
only calling that one function on the shadow stack so I doubt it'd
result in a real issue, but might as well keep this lined up.

Fixes: 31da94c25a ("riscv: add VMAP_STACK overflow detection")
Reviewed-by: Jisheng Zhang <jszhang@kernel.org>
Link: https://lore.kernel.org/r/20221130023515.20217-1-palmer@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-12 09:30:34 -08:00
Palmer Dabbelt 59a582ad13
Merge patch series "RISC-V: Ensure Zicbom has a valid block size"
Andrew Jones <ajones@ventanamicro.com> says:

When a DT puts zicbom in the isa string, but does not provide a block
size, ALT_CMO_OP() will attempt to do cache operations on address
zero since the start address will be ANDed with zero. We can't simply
BUG() in riscv_init_cbom_blocksize() when we fail to find a block
size because the failure will happen before logging works, leaving
users to scratch their heads as to why the boot hung. Instead, ensure
Zicbom is disabled and output an error which will hopefully alert
people that the DT needs to be fixed. While at it, add a check that
the block size is a power-of-2 too.

* b4-shazam-merge:
  RISC-V: Ensure Zicbom has a valid block size
  RISC-V: Introduce riscv_isa_extension_check
  RISC-V: Improve use of isa2hwcap[]

Link: https://lore.kernel.org/r/20221129143447.49714-1-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09 19:13:01 -08:00
Andrew Jones 9daaca4a44
RISC-V: Ensure Zicbom has a valid block size
When a DT puts zicbom in the isa string, but does not provide a block
size, ALT_CMO_OP() will attempt to do cache operations on address
zero since the start address will be ANDed with zero. We can't simply
BUG() in riscv_init_cbom_blocksize() when we fail to find a block
size because the failure will happen before logging works, leaving
users to scratch their heads as to why the boot hung. Instead, ensure
Zicbom is disabled and output an error which will hopefully alert
people that the DT needs to be fixed. While at it, add a check that
the block size is a power-of-2 too.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20221129143447.49714-4-ajones@ventanamicro.com
[Palmer: base on 5c20a3a9df ("RISC-V: Fix compilation without RISCV_ISA_ZICBOM"]
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09 19:12:52 -08:00
Andrew Jones fb0ff0a95d
RISC-V: Introduce riscv_isa_extension_check
Currently any isa extension found in the isa string is set in the
isa bitmap. An isa extension set in the bitmap indicates that the
extension is present and may be used (a.k.a is enabled). However,
when an extension cannot be used due to missing dependencies or
errata it should not be added to the bitmap. Introduce a function
where additional checks may be placed in order to determine if an
extension should be enabled or not.

Note, the checks may simply indicate an issue with the DT, but,
since extensions may be used in early boot, it's not always possible
to simply produce an error at the point the issue is determined.
It's best to keep the extension disabled and produce an error.

No functional change intended, as the function is only introduced
and always returns true. A later patch will provide checks for an
isa extension.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20221129143447.49714-3-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09 19:09:09 -08:00
Andrew Jones 726855549c
RISC-V: Improve use of isa2hwcap[]
Improve isa2hwcap[] by removing it from static storage, as
riscv_fill_hwcap() is only called once, and by reducing its size
from 256 bytes to 26. The latter improvement is possible because
isa2hwcap[] will never be indexed with capital letters and we can
precompute the offsets from 'a'.

No functional change intended.

Signed-off-by: Andrew Jones <ajones@ventanamicro.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Heiko Stuebner <heiko@sntech.de>
Link: https://lore.kernel.org/r/20221129143447.49714-2-ajones@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-09 19:09:08 -08:00
Palmer Dabbelt e1ceb09641
Merge patch "RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path"
I'm merging this in as a single patch to make it easier to handle the
backports.

* b4-shazam-merge:
  RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path

Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08 15:31:44 -08:00
Andrew Bresticker b0f4c74ead
RISC-V: Fix unannoted hardirqs-on in return to userspace slow-path
The return to userspace path in entry.S may enable interrupts without the
corresponding lockdep annotation, producing a splat[0] when DEBUG_LOCKDEP
is enabled. Simply calling __trace_hardirqs_on() here gets a bit messy
due to the use of RA to point back to ret_from_exception, so just move
the whole slow-path loop into C. It's more readable and it lets us use
local_irq_{enable,disable}(), avoiding the need for manual annotations
altogether.

[0]:
  ------------[ cut here ]------------
  DEBUG_LOCKS_WARN_ON(!lockdep_hardirqs_enabled())
  WARNING: CPU: 2 PID: 1 at kernel/locking/lockdep.c:5512 check_flags+0x10a/0x1e0
  Modules linked in:
  CPU: 2 PID: 1 Comm: init Not tainted 6.1.0-rc4-00160-gb56b6e2b4f31 #53
  Hardware name: riscv-virtio,qemu (DT)
  epc : check_flags+0x10a/0x1e0
  ra : check_flags+0x10a/0x1e0
  <snip>
   status: 0000000200000100 badaddr: 0000000000000000 cause: 0000000000000003
  [<ffffffff808edb90>] lock_is_held_type+0x78/0x14e
  [<ffffffff8003dae2>] __might_resched+0x26/0x22c
  [<ffffffff8003dd24>] __might_sleep+0x3c/0x66
  [<ffffffff80022c60>] get_signal+0x9e/0xa70
  [<ffffffff800054a2>] do_notify_resume+0x6e/0x422
  [<ffffffff80003c68>] ret_from_exception+0x0/0x10
  irq event stamp: 44512
  hardirqs last  enabled at (44511): [<ffffffff808f901c>] _raw_spin_unlock_irqrestore+0x54/0x62
  hardirqs last disabled at (44512): [<ffffffff80008200>] __trace_hardirqs_off+0xc/0x14
  softirqs last  enabled at (44472): [<ffffffff808f9fbe>] __do_softirq+0x3de/0x51e
  softirqs last disabled at (44467): [<ffffffff80017760>] irq_exit+0xd6/0x104
  ---[ end trace 0000000000000000 ]---
  possible reason: unannotated irqs-on.

Signed-off-by: Andrew Bresticker <abrestic@rivosinc.com>
Fixes: 3c46979829 ("riscv: Enable LOCKDEP_SUPPORT & fixup TRACE_IRQFLAGS_SUPPORT")
Link: https://lore.kernel.org/r/20221111223108.1976562-1-abrestic@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-08 15:31:41 -08:00
Ard Biesheuvel d9f26ae731 Linux 6.1-rc8
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmONI6weHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG9xgH/jqXGuMoO1ikfmGb
 7oY0W/f69G9V/e0DxFLvnIjhFgCUzdnNsmD4jQJA4x6QsxwLWuvpI282Ez+bHV5T
 U4RPsxJZIIMsXE2lKM9BRgeLzDdCt0aK4Pj+3x2x7NZC5cWFSQ8PyQJkCwg+0PQo
 u8Ly+GO8c4RUMf4/rrAZQq16qZUqGDaGm1EJhtSoa+KiR81LmUUmbDIK9Mr53rmQ
 wou+95XhibwMWr17WgXA28bTgYqn9UGr67V3qvTH2LC7GW8BCoKvn+3wh6TVhlWj
 dsWplXgcOP0/OHvSC5Sb1Uibk5Gx3DlIzYa6OfNZQuZ5xmQqm9kXjW8lmYpWFHy/
 38/5HWc=
 =EuoA
 -----END PGP SIGNATURE-----

Merge tag 'v6.1-rc8' into efi/next

Linux 6.1-rc8
2022-12-07 19:08:57 +01:00
Anup Patel a1a44e227c RISC-V: Export sbi_get_mvendorid() and friends
The sbi_get_mvendorid(), sbi_get_marchid(), and sbi_get_mimpid()
can be used by KVM module so let us export these functions.

Signed-off-by: Anup Patel <apatel@ventanamicro.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Reviewed-by: Andrew Jones <ajones@ventanamicro.com>
Acked-by: Palmer Dabbelt <palmer@rivosinc.com>
Signed-off-by: Anup Patel <anup@brainfault.org>
2022-12-07 09:17:38 +05:30
Guo Ren 7ecdadf7f8
riscv: stacktrace: Make walk_stackframe cross pt_regs frame
The current walk_stackframe with FRAME_POINTER would stop unwinding at
ret_from_exception:
  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
  CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
  Call Trace:
  [<ffffffe0002038c8>] walk_stackframe+0x0/0xee
  [<ffffffe000aecf48>] show_stack+0x32/0x4a
  [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e
  [<ffffffe000af1648>] dump_stack+0x14/0x1c
  [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138
  [<ffffffe000239aec>] __might_sleep+0x10/0x18
  [<ffffffe000afe3fe>] down_read+0x22/0xa4
  [<ffffffe000207588>] do_page_fault+0xb0/0x2fe
  [<ffffffe000201b80>] ret_from_exception+0x0/0xc

The optimization would help walk_stackframe cross the pt_regs frame and
get more backtrace of debug info:
  BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:1518
  in_atomic(): 0, irqs_disabled(): 1, non_block: 0, pid: 1, name: init
  CPU: 0 PID: 1 Comm: init Not tainted 5.10.113-00021-g15c15974895c-dirty #192
  Call Trace:
  [<ffffffe0002038c8>] walk_stackframe+0x0/0xee
  [<ffffffe000aecf48>] show_stack+0x32/0x4a
  [<ffffffe000af1618>] dump_stack_lvl+0x72/0x8e
  [<ffffffe000af1648>] dump_stack+0x14/0x1c
  [<ffffffe000239ad2>] ___might_sleep+0x12e/0x138
  [<ffffffe000239aec>] __might_sleep+0x10/0x18
  [<ffffffe000afe3fe>] down_read+0x22/0xa4
  [<ffffffe000207588>] do_page_fault+0xb0/0x2fe
  [<ffffffe000201b80>] ret_from_exception+0x0/0xc
  [<ffffffe000613c06>] riscv_intc_irq+0x1a/0x72
  [<ffffffe000201b80>] ret_from_exception+0x0/0xc
  [<ffffffe00033f44a>] vma_link+0x54/0x160
  [<ffffffe000341d7a>] mmap_region+0x2cc/0x4d0
  [<ffffffe000342256>] do_mmap+0x2d8/0x3ac
  [<ffffffe000326318>] vm_mmap_pgoff+0x70/0xb8
  [<ffffffe00032638a>] vm_mmap+0x2a/0x36
  [<ffffffe0003cfdde>] elf_map+0x72/0x84
  [<ffffffe0003d05f8>] load_elf_binary+0x69a/0xec8
  [<ffffffe000376240>] bprm_execve+0x246/0x53a
  [<ffffffe00037786c>] kernel_execve+0xe8/0x124
  [<ffffffe000aecdf2>] run_init_process+0xfa/0x10c
  [<ffffffe000aece16>] try_to_run_init_process+0x12/0x3c
  [<ffffffe000afa920>] kernel_init+0xb4/0xf8
  [<ffffffe000201b80>] ret_from_exception+0x0/0xc

Here is the error injection test code for the above output:
 drivers/irqchip/irq-riscv-intc.c:
 static asmlinkage void riscv_intc_irq(struct pt_regs *regs)
 {
        unsigned long cause = regs->cause & ~CAUSE_IRQ_FLAG;
+       u32 tmp; __get_user(tmp, (u32 *)0);

Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221109064937.3643993-3-guoren@kernel.org
[Palmer: use SYM_CODE_*]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05 18:13:34 -08:00
Guo Ren 5c3022e4a6
riscv: stacktrace: Fixup ftrace_graph_ret_addr retp argument
The 'retp' is a pointer to the return address on the stack, so we
must pass the current return address pointer as the 'retp'
argument to ftrace_push_return_trace(). Not parent function's
return address on the stack.

Fixes: b785ec129b ("riscv/ftrace: Add HAVE_FUNCTION_GRAPH_RET_ADDR_PTR support")
Signed-off-by: Guo Ren <guoren@linux.alibaba.com>
Signed-off-by: Guo Ren <guoren@kernel.org>
Link: https://lore.kernel.org/r/20221109064937.3643993-2-guoren@kernel.org
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05 16:58:01 -08:00
Li Huafei cbc32023dd
RISC-V: kexec: Fix memory leak of elf header buffer
This is reported by kmemleak detector:

unreferenced object 0xff2000000403d000 (size 4096):
  comm "kexec", pid 146, jiffies 4294900633 (age 64.792s)
  hex dump (first 32 bytes):
    7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00  .ELF............
    04 00 f3 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000566ca97c>] kmemleak_vmalloc+0x3c/0xbe
    [<00000000979283d8>] __vmalloc_node_range+0x3ac/0x560
    [<00000000b4b3712a>] __vmalloc_node+0x56/0x62
    [<00000000854f75e2>] vzalloc+0x2c/0x34
    [<00000000e9a00db9>] crash_prepare_elf64_headers+0x80/0x30c
    [<0000000067e8bf48>] elf_kexec_load+0x3e8/0x4ec
    [<0000000036548e09>] kexec_image_load_default+0x40/0x4c
    [<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322
    [<0000000040c62c03>] ret_from_syscall+0x0/0x2

In elf_kexec_load(), a buffer is allocated via vzalloc() to store elf
headers.  While it's not freed back to system when kdump kernel is
reloaded or unloaded, or when image->elf_header is successfully set and
then fails to load kdump kernel for some reason. Fix it by freeing the
buffer in arch_kimage_file_post_load_cleanup().

Fixes: 8acea455fa ("RISC-V: Support for kexec_file on panic")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221104095658.141222-2-lihuafei1@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05 10:59:58 -08:00
Li Huafei 96df59b1ae
RISC-V: kexec: Fix memory leak of fdt buffer
This is reported by kmemleak detector:

unreferenced object 0xff60000082864000 (size 9588):
  comm "kexec", pid 146, jiffies 4294900634 (age 64.788s)
  hex dump (first 32 bytes):
    d0 0d fe ed 00 00 12 ed 00 00 00 48 00 00 11 40  ...........H...@
    00 00 00 28 00 00 00 11 00 00 00 02 00 00 00 00  ...(............
  backtrace:
    [<00000000f95b17c4>] kmemleak_alloc+0x34/0x3e
    [<00000000b9ec8e3e>] kmalloc_order+0x9c/0xc4
    [<00000000a95cf02e>] kmalloc_order_trace+0x34/0xb6
    [<00000000f01e68b4>] __kmalloc+0x5c2/0x62a
    [<000000002bd497b2>] kvmalloc_node+0x66/0xd6
    [<00000000906542fa>] of_kexec_alloc_and_setup_fdt+0xa6/0x6ea
    [<00000000e1166bde>] elf_kexec_load+0x206/0x4ec
    [<0000000036548e09>] kexec_image_load_default+0x40/0x4c
    [<0000000079fbe1b4>] sys_kexec_file_load+0x1c4/0x322
    [<0000000040c62c03>] ret_from_syscall+0x0/0x2

In elf_kexec_load(), a buffer is allocated via kvmalloc() to store fdt.
While it's not freed back to system when kexec kernel is reloaded or
unloaded.  Then memory leak is caused.  Fix it by introducing riscv
specific function arch_kimage_file_post_load_cleanup(), and freeing the
buffer there.

Fixes: 6261586e0c ("RISC-V: Add kexec_file support")
Signed-off-by: Li Huafei <lihuafei1@huawei.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Liao Chang <liaochang1@huawei.com>
Link: https://lore.kernel.org/r/20221104095658.141222-1-lihuafei1@huawei.com
Cc: stable@vger.kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-05 10:59:51 -08:00
Palmer Dabbelt 9704beac94
Merge patch series "Support VMCOREINFO export for RISCV64"
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES,
VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for
vmcore.

* b4-shazam-merge:
  Documentation: kdump: describe VMCOREINFO export for RISCV64
  RISC-V: Add arch_crash_save_vmcoreinfo support

Link: https://lore.kernel.org/r/20221026144208.373504-1-xianting.tian@linux.alibaba.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02 16:49:06 -08:00
Xianting Tian 649d6b1019
RISC-V: Add arch_crash_save_vmcoreinfo support
Add arch_crash_save_vmcoreinfo(), which exports VM layout(MODULES,
VMALLOC, VMEMMAP ranges and KERNEL_LINK_ADDR), va bits and ram base for
vmcore.

Default pagetable levels and PAGE_OFFSET aren't same for different
kernel version as below. For pagetable levels, it sets sv57 by default
and falls back to setting sv48 at boot time if sv57 is not supported by
the hardware.

For ram base, the default value is 0x80200000 for qemu riscv64 env and,
for example, is 0x200000 on the XuanTie 910 CPU.

 * Linux Kernel 5.18 ~
 *      PGTABLE_LEVELS = 5
 *      PAGE_OFFSET = 0xff60000000000000
 * Linux Kernel 5.17 ~
 *      PGTABLE_LEVELS = 4
 *      PAGE_OFFSET = 0xffffaf8000000000
 * Linux Kernel 4.19 ~
 *      PGTABLE_LEVELS = 3
 *      PAGE_OFFSET = 0xffffffe000000000

Since these configurations change from time to time and version to
version, it is preferable to export them via vmcoreinfo than to change
the crash's code frequently, it can simplify the development of crash
tool.

Signed-off-by: Xianting Tian <xianting.tian@linux.alibaba.com>
Tested-by: Deepak Gupta <debug@rivosinc.com>
Tested-by: Guo Ren <guoren@kernel.org>
Acked-by: Baoquan He <bhe@redhat.com>
Link: https://lore.kernel.org/r/20221026144208.373504-2-xianting.tian@linux.alibaba.com
[Palmer: wrap commit text]
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02 16:48:22 -08:00
Binglei Wang b57c2f1240
riscv: add riscv rethook implementation
Implement the kretprobes on riscv arch by using rethook machenism
which abstracts general kretprobe info into a struct rethook_node
to be embedded in the struct kretprobe_instance.

Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Binglei Wang <l3b2w1@gmail.com>
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Link: https://lore.kernel.org/r/20221025151831.1097417-1-conor@kernel.org
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02 13:04:05 -08:00
Palmer Dabbelt 5f66e18755
Merge patch series "RISC-V: Dynamic ftrace support for RV32I"
Jamie Iles <jamie@jamieiles.com> says:

This series enables dynamic ftrace support for RV32I bringing it to
parity with RV64I.  Most of the work is already there, this is largely
just assembly fixes to handle register sizes, correct handling of the
psABI calling convention and Kconfig change.

Validated with all ftrace boot time self test with qemu for RV32I and
RV64I in addition to real tracing on an RV32I FPGA design.

* b4-shazam-merge:
  RISC-V: enable dynamic ftrace for RV32I
  RISC-V: preserve a1 in mcount
  RISC-V: reduce mcount save space on RV32
  RISC-V: use REG_S/REG_L for mcount

Link: https://lore.kernel.org/r/20221115200832.706370-1-jamie@jamieiles.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2022-12-02 10:04:41 -08:00