International Association for Cryptologic Research

International Association
for Cryptologic Research


Paper: Multi-moduli NTTs for Saber on Cortex-M3 and Cortex-M4

Amin Abdulrahman , Ruhr University Bochum
Jiun-Peng Chen , Academia Sinica
Yu-Jia Chen , IKV Technology
Vincent Hwang , National Taiwan University and Academia Sinica
Matthias J. Kannwischer , Academia Sinica
Bo-Yin Yang , Academia Sinica
Search ePrint
Search Google
Abstract: The U.S. National Institute of Standards and Technology (NIST) has designated ARM microcontrollers as an important benchmarking platform for its Post-Quantum Cryptography standardization process (NISTPQC). In view of this, we explore the design space of the NISTPQC finalist Saber on the Cortex-M4 and its close relation, the Cortex-M3. In the process, we investigate various optimization strategies and memory-time tradeoffs for number-theoretic transforms (NTTs). Recent work by Chung et al. has shown that NTT multiplication is superior compared to Toom--Cook multiplication for unprotected Saber implementations on the Cortex-M4 in terms of speed. However, it remains unclear if NTT multiplication can outperform Toom--Cook in masked implementations of Saber. Additionally, it is an open question if Saber with NTTs can outperform Toom--Cook in terms of stack usage. We answer both questions in the affirmative. Additionally, we present a Cortex-M3 implementation of Saber using NTTs outperforming an existing Toom--Cook implementation. Our stack-optimized unprotected M4 implementation uses around the same amount of stack as the most stack-optimized implementation using Toom--Cook while being 33%-41% faster. Our speed-optimized masked M4 implementation is 16% faster than the fastest masked implementation using Toom--Cook. For the Cortex-M3, we outperform existing implementations by 29%-35% in speed. We conclude that for both stack- and speed-optimization purposes, one should base polynomial multiplications in Saber on the NTT rather than Toom--Cook for the Cortex-M4 and Cortex-M3. In particular, in many cases, composite moduli NTTs perform best.
  title={Multi-moduli NTTs for Saber on Cortex-M3 and Cortex-M4},
  journal={IACR Transactions on Cryptographic Hardware and Embedded Systems},
  author={Amin Abdulrahman and Jiun-Peng Chen and Yu-Jia Chen and Vincent Hwang and Matthias J. Kannwischer and Bo-Yin Yang},