International Association for Cryptologic Research

International Association
for Cryptologic Research

IACR News item: 24 March 2023

Wai-Kong Lee, Raymond K. Zhao, Ron Steinfeld, Amin Sakzad, Seong Oun Hwang
ePrint Report ePrint Report
The US National Institute of Standards and Technology initiated a standardization process for post-quantum cryptography in 2017, with the aim of selecting key encapsulation mechanisms and signature schemes that can withstand the threat from emerging quantum computers. In 2022, Falcon was selected as one of the standard signature schemes, eventually attracting effort to optimize the implementation of Falcon on various hardware architectures for practical applications. Recently, Mitaka was proposed as an alternative to Falcon, allowing parallel execution of most of its operations. These recent advancements motivate us to develop high throughput implementations of Falcon and Mitaka signature schemes on Graphics Processing Units (GPUs), a massively parallel architecture widely available on cloud service platforms. In this paper, we propose the first parallel implementation of Falcon on various GPUs. An iterative version of the sampling process in Falcon, which is also the most time-consuming Falcon operation, was developed. This allows us to implement Falcon signature generation without relying on expensive recursive function calls on GPUs. In addition, we propose a parallel random samples generation approach to accelerate the performance of Mitaka on GPUs. We evaluate our implementation techniques on state-of-the-art GPU architectures (RTX 3080, A100, T4 and V100). Experimental results show that our Falcon-512 implementation achieves 58, 595 signatures/second and 2, 721, 562 verifications/second on an A100 GPU, which is 20.03× and 29.51× faster than the highly optimized AVX2 implementation on CPU. Our Mitaka implementation achieves 161, 985 signatures/second and 1, 421, 046 verifications/second on the same GPU. Due to the adoption of a parallelizable sampling process, Mitaka signature generation enjoys ≈ 2 – 20× higher throughput than Falcon on various GPUs. The high throughput signature generation and verification achieved by this work can be very useful in various emerging applications, including the Internet of Things.
Expand

Additional news items may be found on the IACR news page.