## CryptoDB

### Paper: NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2

Authors: Chi-Ming Marvin Chung , Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan Vincent Hwang , Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan Matthias J. Kannwischer , Max Planck Institute for Security and Privacy, Bochum, Germany Gregor Seiler , IBM Research – Zurich, Rüschlikon, Switzerland; ETH Zurich, Zurich, Switzerland Cheng-Jhih Shih , Academia Sinica, Taipei, Taiwan; National Taiwan University, Taipei, Taiwan Bo-Yin Yang , Academia Sinica, Taipei, Taiwan DOI: 10.46586/tches.v2021.i2.159-188 URL: https://tches.iacr.org/index.php/TCHES/article/view/8791 Search ePrint Search Google In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the Number-theoretic transform (NTT). We obtain superior performance compared to the previous state of the art implementations using Toom–Cook multiplication on both NIST’s primary software optimization targets AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: On the Cortex-M4, we use 32-bit NTT-based polynomial multiplication, while on Intel we use two 16-bit NTT-based polynomial multiplications and combine the products using the Chinese Remainder Theorem (CRT).For Saber, the performance gain is particularly pronounced. On Cortex-M4, the Saber NTT-based matrix-vector multiplication is 61% faster than the Toom–Cook multiplication resulting in 22% fewer cycles for Saber encapsulation. For NTRU, the speed-up is less impressive, but still NTT-based multiplication performs better than Toom–Cook for all parameter sets on Cortex-M4. The NTT-based polynomial multiplication for NTRU-HRSS is 10% faster than Toom–Cook which results in a 6% cost reduction for encapsulation. On AVX2, we obtain speed-ups for three out of four NTRU parameter sets.As a further illustration, we also include code for AVX2 and Cortex-M4 for the Chinese Association for Cryptologic Research competition award winner LAC (also a NIST round 2 candidate) which outperforms existing code.
##### BibTeX
@article{tches-2021-30796,
title={NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2},
journal={IACR Transactions on Cryptographic Hardware and Embedded Systems},
publisher={Ruhr-Universität Bochum},
volume={2021, Issue 2},
pages={159-188},
url={https://tches.iacr.org/index.php/TCHES/article/view/8791},
doi={10.46586/tches.v2021.i2.159-188},
author={Chi-Ming Marvin Chung and Vincent Hwang and Matthias J. Kannwischer and Gregor Seiler and Cheng-Jhih Shih and Bo-Yin Yang},
year=2021
}