Computer Architecture: Optimizing FMA() with SIMD and loop unrolling

209 views
Apr 23, 2024
30:42

On Linux on ARMv8, this video optimizes a floating-point multiply-and-add function in three steps. First, by improving the C code at the assembly level. Then, by adding SIMD instructions from the base AArch64/A-profile instruction set. Finally, by unrolling the loop with an unroll factor of 2. Other items discussed: how to compile and link multiple binaries from the command line (with GCC), how to cross-compile C code into assembly, tips and tricks for optimizing assembly code, and post-indexed addressing for memory operations in ARM. Prerequisites: an understanding of ARMv8 assembly, SIMD, linear algebra, and loop unrolling.
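As a rough illustration of the last step, here is a minimal C sketch of unrolling by a factor of 2. The description does not give the video's actual function signature, so everything here is an assumption: the names `fma_scalar` and `fma_unrolled2` are hypothetical, and the element-wise form `out[i] = a[i] * b[i] + c[i]` is one plausible reading of "multiply and add".

```c
#include <stddef.h>

/* Hypothetical baseline: one multiply-add per loop iteration. */
void fma_scalar(const float *a, const float *b, const float *c,
                float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] * b[i] + c[i];
}

/* Hypothetical unrolled version (factor 2): two multiply-adds per
 * iteration, which halves the number of loop-counter updates and
 * branches. A cleanup step handles the last element when n is odd. */
void fma_unrolled2(const float *a, const float *b, const float *c,
                   float *out, size_t n)
{
    size_t i = 0;

    for (; i + 1 < n; i += 2) {
        out[i]     = a[i]     * b[i]     + c[i];
        out[i + 1] = a[i + 1] * b[i + 1] + c[i + 1];
    }

    if (i < n) /* leftover element for odd n */
        out[i] = a[i] * b[i] + c[i];
}
```

Compiling a file like this with `gcc -O2 -S` emits the generated assembly instead of an object file, which is one way to compare the two loops. On an x86 host this would need an AArch64 cross-compiler to produce ARMv8 output; the exact toolchain used in the video is not stated here.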

