International Association for Cryptologic Research

CryptoDB

ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches

Authors:
Tian Zhou, School of Cyber Security, University of Science and Technology of China, Hefei, China
Fangyu Zheng, School of Cryptology, University of Chinese Academy of Sciences, Beijing, China
Guang Fan, Ant Group, Hangzhou, China
Lipeng Wan, School of Cryptology, University of Chinese Academy of Sciences, Beijing, China
Wenxu Tang, School of Cyber Security, University of Science and Technology of China, Hefei, China
Yixuan Song, Ant Group, Hangzhou, China
Yi Bian, School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China
Jingqiang Lin, School of Cyber Security, University of Science and Technology of China, Hefei, China; Beijing Research Institute, University of Science and Technology of China, Beijing, China
Download:
DOI: 10.46586/tches.v2024.i2.25-63
URL: https://tches.iacr.org/index.php/TCHES/article/view/11420
Abstract: The remarkable performance of AI accelerators offers promising opportunities for accelerating cryptographic algorithms, particularly lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation, overlooking the intricate internal mechanisms of these devices. Consequently, a significant share of computational resources remains underutilized. In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. First, we propose two innovative approaches that efficiently break down Kyber’s NTT into iterative matrix multiplications, resulting in approximately a 75% reduction in costs compared to the state-of-the-art scanning-based methods. Second, by reverse-engineering the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Although our full Kyber implementation is throughput-oriented, its execution latency remains acceptable: for instance, it ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 at peak throughput.
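The abstract's starting point, that Kyber's NTT is a linear map and can therefore be written as matrix multiplication, can be illustrated with a minimal Python sketch. This is not the paper's iteration-based decomposition and does not touch Tensor Cores; it only checks, under the transform definition in the Kyber specification (modulus 3329, root of unity 17), that a single 128x128 matrix applied to the even and odd coefficients reproduces the usual butterfly NTT. The function names `ntt_reference` and `ntt_matmul` are hypothetical and chosen for illustration.

```python
Q = 3329      # Kyber modulus
ZETA = 17     # primitive 256th root of unity mod Q (Kyber specification)

def bitrev7(x):
    """Reverse the 7-bit binary representation of x."""
    return int(format(x, "07b")[::-1], 2)

def ntt_reference(f):
    """Layer-by-layer butterfly NTT with 7 layers, as in the Kyber spec."""
    f = list(f)
    i, length = 1, 128
    while length >= 2:
        for start in range(0, 256, 2 * length):
            zeta = pow(ZETA, bitrev7(i), Q)
            i += 1
            for j in range(start, start + length):
                t = zeta * f[j + length] % Q
                f[j + length] = (f[j] - t) % Q
                f[j] = (f[j] + t) % Q
        length //= 2
    return f

# Twiddle matrix: W[i][j] = ZETA^((2*bitrev7(i) + 1) * j) mod Q.
W = [[pow(ZETA, (2 * bitrev7(i) + 1) * j, Q) for j in range(128)]
     for i in range(128)]

def ntt_matmul(f):
    """Same transform as two 128x128 matrix-vector products mod Q."""
    out = [0] * 256
    for parity in (0, 1):            # even coefficients, then odd ones
        half = f[parity::2]
        for i, row in enumerate(W):
            out[2 * i + parity] = sum(w * c for w, c in zip(row, half)) % Q
    return out

# Arbitrary test polynomial (illustrative values only).
f = [(7 * k * k + 3 * k + 1) % Q for k in range(256)]
assert ntt_matmul(f) == ntt_reference(f)
```

A single dense 128x128 product costs far more multiplications than the butterfly network, which is why matrix-based designs split the transform into several smaller products that fit the accelerator's tile sizes; how the paper does this is described in the full text, not here.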
BibTeX
@article{tches-2024-34044,
  title={ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches},
  journal={IACR Transactions on Cryptographic Hardware and Embedded Systems},
  publisher={Ruhr-Universität Bochum},
  volume={2024},
  number={2},
  pages={25-63},
  url={https://tches.iacr.org/index.php/TCHES/article/view/11420},
  doi={10.46586/tches.v2024.i2.25-63},
  author={Tian Zhou and Fangyu Zheng and Guang Fan and Lipeng Wan and Wenxu Tang and Yixuan Song and Yi Bian and Jingqiang Lin},
  year=2024
}