International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

AN OPTIMIZED HARDWARE ARCHITECTURE OF MONTGOMERY MULTIPLICATION ALGORITHM

Authors:
Miaoqing Huang
Kris Gaj
Soonhak Kwon
Tarek El-Ghazawi
Download:
URL: http://eprint.iacr.org/2007/228
Search ePrint
Search Google
Abstract: Montgomery multiplication is one of the fundamental operations used in cryptographic algorithms, such as RSA and Elliptic Curve Cryptosystems. At CHES 1999, Tenca and Koc introduced a now-classical architecture for implementing Montgomery multiplication in hardware. With parameters optimized for minimum latency, this architecture performs a single Montgomery multiplication in approximately 2n clock cycles, where n is the size of operands in bits. In this paper we propose and discuss an optimized hardware architecture performing the same operation in approximately n clock cycles. Our architecture is based on pre-computing partial results using two possible assumptions regarding the most significant bit of the previous word, and is only marginally more demanding in terms of the circuit area. The new radix-2 architecture can be extended for the case of radix-4, while preserving a factor of two speed-up over the corresponding radix-4 design by Tenca, Todorov, and Koc from CHES 2001. Our architecture has been verified by modeling it in Verilog-HDL, implementing it using Xilinx Virtex-II 6000 FPGA, and experimentally testing it using SRC-6 reconfigurable computer.
BibTeX
@misc{eprint-2007-13509,
  title={AN OPTIMIZED HARDWARE ARCHITECTURE OF MONTGOMERY MULTIPLICATION ALGORITHM},
  booktitle={IACR Eprint archive},
  keywords={public-key cryptography / Montgomery multiplication,MWR2MM Algorithm,Field Programmable Gate Arrays},
  url={http://eprint.iacr.org/2007/228},
  note={Not published mqhuang@gwu.edu 13676 received 12 Jun 2007},
  author={Miaoqing Huang and Kris Gaj and Soonhak Kwon and Tarek El-Ghazawi},
  year=2007
}