International Association for Cryptologic Research


CryptoDB

Zhiguo Wan

Publications and invited talks

Year | Venue | Title
2025 | TCHES | FusionMSM: A Collision-Free and Arithmetic-Optimized FPGA-based Accelerator for Multi-Scalar Multiplication
A zero-knowledge proof (ZKP) is an effective cryptographic primitive that allows one party to prove the correctness of a statement to another party without disclosing any additional information. It plays a central role in applications such as blockchain transactions and cryptocurrencies. However, ZKP implementations are dominated by their most time-consuming task, Multi-Scalar Multiplication (MSM). Existing works and evaluation criteria emphasize speed but overlook area overhead. In this paper, we design FusionMSM, an FPGA-based accelerator that not only reduces overall latency but also lowers area overhead. We attribute the bottleneck of MSM to a three-layer pyramid: finite field arithmetic, point operations on elliptic curves, and scheduling. For modular arithmetic, we propose an efficient non-Montgomery modular multiplier that uses a hybrid multiplication strategy and an optimized multi-bit LUT-based modular reduction. It achieves 1.11× less area and a 2.00× speed-up over the modular multipliers used in prior ZKP acceleration works. For point operations, we design a unified, fully pipelined point addition unit that runs at 500 MHz, the highest frequency among reported works. On top of that, we present a greedy mechanism to resolve potential collisions, which reduces the idle cycles of the point addition unit and improves its utilization. To the best of our knowledge, FusionMSM achieves the best performance among FPGA-based and ASIC-based works for input sizes from $2^{18}$ to $2^{26}$. For input size $2^{20}$, FusionMSM needs only 12.4% of the time of Hardcaml, 24.54% of the time of PipeMSM on FPGA, and 36.41% of the time of the ASIC-based PipeZK. It also uses fewer resources, with reductions of 90.93% in URAMs, 35.24% in FFs, and 47.59% in CARRY8s. Compared to GPU-based implementations, FusionMSM delivers comparable performance at a lower power of 24.5 W.
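The abstract does not say which MSM algorithm FusionMSM implements. MSM computes $Q = \sum_i k_i P_i$ for scalars $k_i$ and elliptic-curve points $P_i$, and MSM accelerators commonly use the bucket (Pippenger) method, in which most of the work is point additions into buckets. The Python sketch below (3.8+ for `pow(x, -1, m)`) illustrates that method on a hypothetical toy curve $y^2 = x^3 + 7$ over $\mathbb{F}_{10007}$; the curve, the window width `c`, the scalar width `bits`, and all function names are illustrative assumptions, not FusionMSM's design.

```python
# Hypothetical illustration of the bucket (Pippenger) method for MSM on a toy
# curve y^2 = x^3 + A*x + B over F_P. Not FusionMSM's actual design; real ZKP
# curves use 254- to 381-bit primes.
P = 10007          # small prime with P % 4 == 3
A, B = 0, 7

def ec_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None                                        # Q + (-Q) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P   # tangent slope (doubling)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P    # chord slope (addition)
    x3 = (lam * lam - x1 - x2) % P
    y3 = (lam * (x1 - x3) - y1) % P
    return (x3, y3)

def ec_mul(k, pt):
    """Reference double-and-add scalar multiplication."""
    acc = None
    while k:
        if k & 1:
            acc = ec_add(acc, pt)
        pt = ec_add(pt, pt)
        k >>= 1
    return acc

def find_points(count):
    """Collect `count` curve points with y != 0 by scanning x = 1, 2, ..."""
    pts, x = [], 1
    while len(pts) < count:
        rhs = (x ** 3 + A * x + B) % P
        y = pow(rhs, (P + 1) // 4, P)   # square-root candidate, valid because P % 4 == 3
        if y != 0 and y * y % P == rhs:
            pts.append((x, y))
        x += 1
    return pts

def msm_pippenger(scalars, points, c=4, bits=16):
    """Compute sum_i scalars[i] * points[i] with c-bit windows (scalars < 2**bits)."""
    result = None
    for w in reversed(range((bits + c - 1) // c)):
        for _ in range(c):                     # shift the running result left by c bits
            result = ec_add(result, result)
        buckets = [None] * (1 << c)            # buckets[d] = sum of points whose digit is d
        for k, pt in zip(scalars, points):
            d = (k >> (w * c)) & ((1 << c) - 1)
            if d:
                buckets[d] = ec_add(buckets[d], pt)
        running = window_sum = None            # running-sum trick: sum_d d * buckets[d]
        for d in range((1 << c) - 1, 0, -1):
            running = ec_add(running, buckets[d])
            window_sum = ec_add(window_sum, running)
        result = ec_add(result, window_sum)
    return result
```

The bucket accumulation loop is where a hardware design would stream points into a pipelined point adder; the running-sum pass then turns the buckets into $\sum_d d \cdot B_d$ using only additions.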
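The abstract names a non-Montgomery modular multiplier with multi-bit LUT-based reduction but does not give its algorithm. One generic way such a reduction can work is to split the high half of the product into $k$-bit chunks, look up each chunk's precomputed residue $c \cdot 2^{n + ik} \bmod p$, add the lookups to the low half, and finish with a few conditional subtractions. The Python sketch below assumes this scheme; `build_tables`, `lut_modmul`, and the chunk width `k` are hypothetical, and Python's built-in multiplication stands in for the paper's hybrid multiplier.

```python
# Hypothetical sketch of a non-Montgomery modular multiplier with multi-bit
# LUT-based reduction. Not the paper's circuit: Python's `*` stands in for its
# hybrid multiplier, and the table layout is an illustrative assumption.

def build_tables(p, k):
    """Precompute T_i[c] = c * 2**(n + i*k) mod p for each k-bit chunk i of the high half."""
    n = p.bit_length()
    m = (n + k - 1) // k                      # number of chunks covering the n-bit high half
    tables = [[(c << (n + i * k)) % p for c in range(1 << k)] for i in range(m)]
    return n, k, tables

def lut_modmul(a, b, p, ctx):
    """Return a * b mod p for 0 <= a, b < p using table lookups instead of division."""
    n, k, tables = ctx
    z = a * b                                 # full product, z < p**2 < 2**(2n)
    r = z & ((1 << n) - 1)                    # low n bits are kept as-is
    hi = z >> n                               # high part, hi < 2**n
    for t in tables:
        r += t[hi & ((1 << k) - 1)]           # one lookup per k-bit chunk
        hi >>= k
    # Now r < 2**n + len(tables) * p <= (len(tables) + 2) * p, so a few
    # conditional subtractions finish the reduction.
    while r >= p:
        r -= p
    return r
```

Larger `k` means fewer lookups and a shorter final correction but exponentially larger tables ($2^k$ entries per chunk), which is the kind of area/latency trade-off a multi-bit LUT design has to balance.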
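The abstract does not define "collision" or spell out the greedy mechanism. In pipelined bucket-method designs, a collision usually means a read-after-write hazard: a new point targets a bucket whose previous addition is still inside the point adder's pipeline. The Python sketch below simulates one hypothetical greedy policy under that assumption, issuing the first non-colliding operation from a small lookahead window instead of stalling on the head; `schedule`, `depth`, and `window` are illustrative names, not the paper's.

```python
from collections import deque

# Hypothetical model of collision-aware scheduling for a pipelined point adder.
# "Collision" is assumed to mean issuing an addition into a bucket whose previous
# addition has not yet left the pipeline; the greedy policy below is illustrative.

def schedule(bucket_ids, depth, greedy=True, window=8):
    """Simulate issue order; return ({op index: issue cycle}, issue cycles used).

    Each cycle at most one op issues. A bucket stays busy for `depth` cycles
    after an addition into it is issued.
    """
    pending = deque(enumerate(bucket_ids))   # (op index, bucket id) in arrival order
    free_at = {}                             # bucket id -> first cycle it is free again
    issued = {}
    cycle = 0
    while pending:
        # In-order issue may only try the head; greedy scans a lookahead window.
        limit = min(window, len(pending)) if greedy else 1
        for i in range(limit):
            op, b = pending[i]
            if free_at.get(b, 0) <= cycle:   # no collision: bucket is free
                del pending[i]
                issued[op] = cycle
                free_at[b] = cycle + depth
                break
        # If nothing issued, this cycle is an idle bubble in the adder.
        cycle += 1
    return issued, cycle
```

On the bucket sequence `[3, 3, 3, 5, 5, 1, 2, 4, 6, 7]` with `depth=4`, in-order issue uses 19 issue cycles because each repeated bucket stalls the head, while the greedy policy fills those bubbles with independent buckets and uses 10.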