International Association for Cryptologic Research

International Association
for Cryptologic Research

IACR News item: 25 December 2023

Vincent Hwang, YoungBeom Kim, Seog Chung Seo
ePrint Report ePrint Report
We optimize the number-theoretic transforms (NTTs) in Dilithium — a digital signature scheme recently standardized by the National Institute of Standards and Technology (NIST) — on Cortex-M3 and 8-bit AVR. The core novelty is the exploration of micro-architectural insights for modular multiplications. Recent work [Becker, Hwang, Kannwischer, Yang and Yang, Volume 2022 (1), Transactions on Cryptographic Hardware and Embedded Systems, 2022] found a correspondence between Montgomery and Barrett multiplications by relating modular reductions to integer approximations and demonstrated that Barrett multiplication is more favorable than Montgomery multiplication by absorbing the subtraction to the low multiplication. We first point out the benefit of Barrett multiplication when long and high multiplication instructions are unavailable, unusable, or slow. We then generalize the notion of integer approximations and improve the emulation of high multiplications used in Barrett multiplication.

Compared to the state-of-the-art assembly-optimized implementations on Cortex-M3, our constant-time NTT/iNTT are 1.38−1.51 times faster and our variable-time NTT/iNTT are 1.10−1.21 times faster. On our 8-bit AVR, we outperform Montgomery-based C implementations of NTT/iNTT by 6.37−7.27 times by simply switching to the proposed Barrett-based implementation. We additionally implement Barrett-based NTT/iNTT in assembly and obtain 14.10− 14.42 times faster code.

For the overall scheme, we provide speed-optimized implementations for Dilithium parameter sets dilithium2 and dilithium3 on Cortex-M3, and stack-optimized implementations for all parameter sets on Cortex-M3 and 8-bit AVR. We briefly compare the performance of speed-optimized dilithium3. Compared to the state-of-the-art assembly implementation on Cortex-M3, our assembly implementation reduces the key generation, signature generation, and signature verification cycles by 2.30%, 23.29%, and 0.69%. In the 8-bit AVR environment, our Barrett-based C implementation reduces the key generation, signature generation, and signature verification cycles by 45.09%, 56.80%, and 50.40%, respectively, and our assembly-optimized implementation reduces the cycles of each operation by 48.85%, 61.70%, and 55.08%, respectively.
Expand

Additional news items may be found on the IACR news page.