International Association for Cryptologic Research

International Association
for Cryptologic Research

IACR News item: 13 October 2023

Vincent Hwang, Chi-Ting Liu, Bo-Yin Yang
ePrint Report ePrint Report
In this paper, we explore the cost of vectorization for polynomial multiplication with coefficients in Zq for an odd prime q. If there is a large power of two dividing q−1, we can apply radix-2 Cooley–Tukey fast Fourier transforms to multiply polynomials in Zq[x]. The radix-2 nature admits efficient vectorization. Conversely, if 2 is the only power of two dividing q−1, we can apply Schönhage’s and Nussbaumer’s FFTs to craft radix-2 roots of unity, but these double the number of coefficients. We show how to avoid this doubling while maintaining vectorization friendliness with Good–Thomas, Rader’s, and Bruun’s FFTs. In particular, we exploit the existing Fermat-prime factor of q − 1 for Rader’s FFT and the power-of-two factor of q + 1 for Bruun’s FFT. We implement these ideas for the NTRU Prime instances ntrulpr761/sntrup761, operating over the coefficient ring Z4591 on a Cortex-A72. sntrup761 is currently used in OpenSSH 9.0 by default. Our polynomial multiplication outperforms the state-of-the-art vector-optimized implementation by 6.1×. For ntrulpr761, our keygen, encap, and decap are 2.98×, 2.79×, and 3.07× faster than the state-of-the-art vector-optimized implementation. For sntrup761, we outperform the reference implementation significantly.
Expand

Additional news items may be found on the IACR news page.