CryptoDB
Isaac A. Canales-Martínez
Publications and invited talks
Year
Venue
Title
2025
TCHES
Generation of Fast Finite Field Arithmetic forCortex-M4 with ECDH and SQIsign Applications
Abstract
Finite field arithmetic is central to several cryptographic algorithms on embedded devices like the ARM Cortex-M4, particularly for elliptic curve and isogenybased cryptography. However, rapid algorithm evolution, driven by initiatives such as NIST’s post-quantum standardization, might frequently render hand-optimized implementations obsolete. We address this challenge with m4-modarith, a library generating C code with inline assembly for the Cortex-M4 that rivals custom-tuned assembly, enabling agile development in this ever-changing landscape. Our generated modular multiplications obtains fast performances, competitive with hand-optimized assembly implementations published in the literature, even outperforming some of them for Curve25519. Two contributions are pivotal to this success. First, we introduce a novel multiplication strategy that matches the memory access complexity of the operand caching method while being applicable to a larger cache size for Cortex-M4 implementations. Second, we generalize an efficient pseudo-Mersenne reduction strategy, and formally prove its correctness and applicability for most primes of cryptographic interest. Our generator allowed agile optimization of SQIsign’s NIST PQC Round 2 submission, improving level 1 verification from 123 Mcycles to only 54 Mcycles, a 2.3x speedup. As an additional case study, we use our generator to improve performance of portable implementations of RFC 7748 by up to 2.2x.
2024
TCHES
Optimized One-Dimensional SQIsign Verification on Intel and Cortex-M4
Abstract
SQIsign is a well-known post-quantum signature scheme due to its small combined signature and public-key size. However, SQIsign suffers from notably long signing times, and verification times are not short either. To improve this, recent research has explored both one-dimensional and two-dimensional variants of SQIsign, each with distinct characteristics. In particular, SQIsign2D’s efficient signing and verification times have made it a focal point of recent research. However, the absence of an optimized one-dimensional verification implementation hampers a thorough comparison between these different variants. This work bridges this gap in the literature: we provide a state-of-the-art implementation of one-dimensional SQIsign verification, including novel optimizations. We report a record-breaking one-dimensional SQIsign verification time of 8.55 Mcycles on a Raptor Lake Intel processor, closely matching SQIsign2D on the same processor. For uncompressed signatures, the signature size doubles and we verify in only 5.6 Mcycles. Taking advantage of the inherent parallelism available in isogeny computations, we present 5-core variants that can go as low as 1.3 Mcycles. Furthermore, we present the first implementation that supports both 32-bit and 64-bit processors. It includes optimized assembly code for the Cortex-M4 and has been integrated with the pqm4 project. Our results motivate further research into one-dimensional SQIsign, as it boasts unique features among isogeny-based schemes.
Coauthors
- Marius A. Aardal (1)
- Gora Adj (2)
- Arwa Alblooshi (1)
- Diego F. Aranha (1)
- Isaac A. Canales-Martínez (2)
- Jorge Chavez-Saab (2)
- Décio Gazzoni Filho (1)
- Décio Luiz Gazzoni Filho (1)
- Julio López (1)
- Krijn Reijnders (1)
- Felix Carvalho Rodrigues (1)
- Francisco Rodríguez-Henríquez (2)
- Michael Scott (1)