IACR News item: 13 October 2023
Vincent Hwang, Chi-Ting Liu, Bo-Yin Yang
ePrint Report
In this paper, we explore the cost of vectorization for polynomial multiplication with coefficients in Zq for an odd prime q. If there is a large power of two dividing q−1, we can apply radix-2 Cooley–Tukey fast Fourier transforms to multiply polynomials in Zq[x]. The radix-2 nature admits efficient vectorization. Conversely, if 2 is the only power of two dividing q−1, we can apply Schönhage’s and Nussbaumer’s FFTs to craft radix-2 roots of unity, but these double the number of coefficients. We show how to avoid this doubling while maintaining vectorization friendliness with Good–Thomas, Rader’s, and Bruun’s FFTs. In particular, we exploit the existing Fermat-prime factor of q − 1 for Rader’s FFT and the power-of-two factor of q + 1 for Bruun’s FFT. We implement these ideas for the NTRU Prime instances ntrulpr761/sntrup761, operating over the coefficient ring Z4591 on a Cortex-A72. sntrup761 is currently used in OpenSSH 9.0 by default.
Our polynomial multiplication outperforms the state-of-the-art vector-optimized implementation by 6.1×. For ntrulpr761, our keygen, encap, and decap are 2.98×, 2.79×, and 3.07× faster than the state-of-the-art vector-optimized implementation. For sntrup761, we outperform the reference implementation significantly.
Additional news items may be found on the IACR news page.