IACR News item: 19 July 2025
Felix Carvalho Rodrigues, Décio Gazzoni Filho, Gora Adj, Isaac A. Canales-Martínez, Jorge Chávez-Saab, Julio López, Michael Scott, Francisco Rodríguez-Henríquez
Finite field arithmetic is central to several cryptographic algorithms on embedded devices like the ARM Cortex-M4, particularly for elliptic curve and isogeny-based cryptography. However, rapid algorithm evolution, driven by initiatives such as NIST’s post-quantum standardization, might frequently render hand-optimized implementations obsolete.
We address this challenge with m4-modarith, a library generating C code with inline assembly for the Cortex-M4 that rivals custom-tuned assembly,
enabling agile development in this ever-changing landscape.
Our generated modular multiplications obtains fast performances, competitive with hand-optimized assembly implementations published in the literature, even outperforming some of them for Curve25519.
Two contributions are pivotal to this success.
First, we introduce a novel multiplication strategy that matches the memory access complexity of the operand caching method while being applicable to a larger cache size for Cortex-M4 implementations. Second, we generalize an efficient pseudo-Mersenne reduction strategy, and formally prove its correctness and applicability for most primes of cryptographic interest.
Our generator allowed agile optimization of SQIsign’s NIST PQC Round 2 submission, improving level 1 verification from 123 Mcycles to only 54 Mcycles, a $2.3\times$ speedup.
As an additional case study, we use our generator to improve performance of portable implementations of RFC~7748 by up to $2.2\times$.
Additional news items may be found on the IACR news page.