linux

Commit Graph

Author	SHA1	Message	Date
Linus Torvalds	1896ce8eb6	Optimize fsverity with 2-way interleaved hashing Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and make fsverity use it for faster file data verification. This improves fsverity performance on many x86_64 and arm64 processors. Later, I plan to make dm-verity use this too. -----BEGIN PGP SIGNATURE----- iIoEABYIADIWIQSacvsUNc7UX4ntmEPzXCl4vpKOKwUCaNg4/RQcZWJpZ2dlcnNA a2VybmVsLm9yZwAKCRDzXCl4vpKOK4fMAP9Xz00JNDfJ+mOVHIYOhAlWFGnug0X1 cvoRf4QXchNlbwD9HTJQQDQXnbsPy3QPrUVfl2FqCW7c6vRlBJijhD6j4wE= =6dCR -----END PGP SIGNATURE----- Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux Pull interleaved SHA-256 hashing support from Eric Biggers: "Optimize fsverity with 2-way interleaved hashing Add support for 2-way interleaved SHA-256 hashing to lib/crypto/, and make fsverity use it for faster file data verification. This improves fsverity performance on many x86_64 and arm64 processors. Later, I plan to make dm-verity use this too" * tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux: fsverity: Use 2-way interleaved SHA-256 hashing when supported fsverity: Remove inode parameter from fsverity_hash_block() lib/crypto: tests: Add tests and benchmark for sha256_finup_2x() lib/crypto: x86/sha256: Add support for 2-way interleaved hashing lib/crypto: arm64/sha256: Add support for 2-way interleaved hashing lib/crypto: sha256: Add support for 2-way interleaved hashing	2025-09-29 15:55:20 -07:00
Eric Biggers	bc6d6a4172	lib/crypto: x86/sha256: Add support for 2-way interleaved hashing Add an implementation of sha256_finup_2x_arch() for x86_64. It interleaves the computation of two SHA-256 hashes using the x86 SHA-NI instructions. dm-verity and fs-verity will take advantage of this for greatly improved performance on capable CPUs. This increases the throughput of SHA-256 hashing 4096-byte messages by the following amounts on the following CPUs: Intel Ice Lake (server): 4% Intel Sapphire Rapids: 38% Intel Emerald Rapids: 38% AMD Zen 1 (Threadripper 1950X): 84% AMD Zen 4 (EPYC 9B14): 98% AMD Zen 5 (Ryzen 9 9950X): 64% For now, this seems to benefit AMD more than Intel. This seems to be because current AMD CPUs support concurrent execution of the SHA-NI instructions, but unfortunately current Intel CPUs don't, except for the sha256msg2 instruction. Hopefully future Intel CPUs will support SHA-NI on more execution ports. Zen 1 supports 2 concurrent sha256rnds2, and Zen 4 supports 4 concurrent sha256rnds2, which suggests that even better performance may be achievable on Zen 4 by interleaving more than two hashes. However, doing so poses a number of trade-offs, and furthermore Zen 5 goes back to supporting "only" 2 concurrent sha256rnds2. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250915160819.140019-4-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-09-17 13:09:40 -05:00
Eric Biggers	5012bd2dc6	lib/crypto: Drop inline from all _mod_init_arch() functions Drop 'inline' from all the _mod_init_arch() functions so that the compiler will warn about any bugs where they are unused due to not being wired up properly. (There are no such bugs currently, so this just establishes a more robust convention for the future. Of course, these functions also tend to get inlined anyway, regardless of the keyword.) Link: https://lore.kernel.org/r/20250816020457.432040-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-08-27 08:15:35 -07:00
Eric Biggers	640d31ea83	lib/crypto: sha256: Use underlying functions instead of crypto_simd_usable() Since sha256_kunit tests the fallback code paths without using crypto_simd_disabled_for_test, make the SHA-256 code just use the underlying may_use_simd() and irq_fpu_usable() functions directly instead of crypto_simd_usable(). This eliminates an unnecessary layer. While doing this, also add likely() annotations, and fix a minor inconsistency where the static keys in the sha256.h files were in a different place than in the corresponding sha1.h and sha512.h files. Link: https://lore.kernel.org/r/20250731223510.136650-1-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-08-26 12:52:27 -04:00
Eric Biggers	a8c60a9aca	lib/crypto: x86/sha256: Move static_call above kernel-mode FPU section As I did for sha512_blocks(), reorganize x86's sha256_blocks() to be just a static_call. To achieve that, for each assembly function add a C function that handles the kernel-mode FPU section and fallback. While this increases total code size slightly, the amount of code actually executed on a given system does not increase, and it is slightly more efficient since it eliminates the extra static_key. It also makes the assembly functions be called with standard direct calls instead of static calls, eliminating the need for ANNOTATE_NOENDBR. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250704023958.73274-2-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-04 10:23:55 -07:00
Eric Biggers	e96cb9507f	lib/crypto: sha256: Consolidate into single module Consolidate the CPU-based SHA-256 code into a single module, following what I did with SHA-512: - Each arch now provides a header file lib/crypto/$(SRCARCH)/sha256.h, replacing lib/crypto/$(SRCARCH)/sha256.c. The header defines sha256_blocks() and optionally sha256_mod_init_arch(). It is included by lib/crypto/sha256.c, and thus the code gets built into the single libsha256 module, with proper inlining and dead code elimination. - sha256_blocks_generic() is moved from lib/crypto/sha256-generic.c into lib/crypto/sha256.c. It's now a static function marked with __maybe_unused, so the compiler automatically eliminates it in any cases where it's not used. - Whether arch-optimized SHA-256 is buildable is now controlled centrally by lib/crypto/Kconfig instead of by lib/crypto/$(SRCARCH)/Kconfig. The conditions for enabling it remain the same as before, and it remains enabled by default. - Any additional arch-specific translation units for the optimized SHA-256 code (such as assembly files) are now compiled by lib/crypto/Makefile instead of lib/crypto/$(SRCARCH)/Makefile. Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250630160645.3198-13-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>	2025-07-04 10:23:11 -07:00

6 Commits