Debiao He

CryptoDB

Debiao He

Publications and invited talks

Year

Venue

Title

2025

TCHES

SimdMSM: SIMD-accelerated Multi-Scalar Multiplication Framework for zkSNARKs Abstract

Rui Jiang Cong Peng Min Luo Rongmao Chen Debiao He

Multi-scalar multiplication (MSM) is the primary building block in many pairing-based zero-knowledge proof (ZKP) systems. MSM at large scales has become the main bottleneck in ZKP implementations. Inspired by existing SIMD-accelerated work, we are focused on accelerating MSM computing efficiency using SIMD instructions in a single CPU environment. First, we propose a SIMD-accelerated MSM computing architecture with no write conflicts and constant memory overheads. This architecture utilizes multithreading to achieve task-level and loop-level parallelism and employs a three-tier buffer mechanism to maximize the utilization of the SIMD engine. Instanced with AVX512-IFMA instructions, we implement six SIMD elliptic curve arithmetic engines for different point addition in three coordinate systems and two groups. Moreover, we integrate our AVX-MSM implementation into the libsnark library, naming it AVX-ZK. In more detail, point deduplication and “Three-Stage” memory optimization are proposed to address problems existing in practical applications. Based on the RELIC library, our performance results on the BLS12-381 curve show that our AVX-MSM achieves up to 27.86x speedup over the most popular Pippenger algorithm. Compared with libsnark, our AVX-ZK implementation achieves over 11.53x (up to 20.26x) speedup under standard benchmarks.

2024

PKC

Parameter-Hiding Order-Revealing Encryption without Pairings Abstract

Cong Peng Rongmao Chen Yi Wang Debiao He Xinyi Huang

Order-Revealing Encryption (ORE) provides a practical solution for conducting range queries over encrypted data. Achieving a desirable privacy-efficiency tradeoff in designing ORE schemes has posed a significant challenge. At Asiacrypt 2018, Cash et al. proposed Parameter-hiding ORE (pORE), which specifically targets scenarios where the data distribution shape is known, but the underlying parameters (such as mean and variance) need to be protected. However, existing pORE constructions rely on impractical bilinear maps, limiting their real-world applicability. In this work, we propose an alternative and efficient method for constructing pORE using identification schemes. By leveraging the map-invariance property of identification schemes, we eliminate the need for pairing computations during ciphertext comparison. Specifically, we instantiate our framework with the pairing-free Schnorr identification scheme and demonstrate that our proposed pORE scheme reduces ciphertext size by approximately 31.25\% and improves encryption and comparison efficiency by over two times compared to the current state-of-the-art pORE construction. Our work provides a more efficient alternative to existing pORE constructions and could be viewed as a step towards making pORE a viable choice for practical applications.

2024

TCHES

Load-Balanced Parallel Implementation on GPUs for Multi-Scalar Multiplication Algorithm Abstract

Yutian Chen Cong Peng Yu Dai Min Luo Debiao He

Multi-scalar multiplication (MSM) is an important building block in most of elliptic-curve-based zero-knowledge proof systems, such as Groth16 and PLONK. Recently, Lu et al. proposed cuZK, a new parallel MSM algorithm on GPUs. In this paper, we revisit this scheme and present a new GPU-based implementation to further improve the performance of MSM algorithm. First, we propose a novel method for mapping scalars into Pippenger’s bucket indices, largely reducing the number of buckets compared to the original Pippenger algorithm. Second, in the case that memory is sufficient, we develop a new efficient algorithm based on homogeneous coordinates in the bucket accumulation phase. Moreover, our accumulation phase is load-balanced, which means the parallel speedup ratio is almost linear growth as the number of device threads increases. Finally, we also propose a parallel layered reduction algorithm for the bucket aggregation phase, whose time complexity remains at the logarithmic level of the number of buckets. The implementation results over the BLS12-381 curve on the V100 graphics card show that our proposed algorithm achieves up to 1.998x, 1.821x and 1.818x speedup compared to cuZK at scales of 221, 222, and 223, respectively.

2024

ASIACRYPT

Bounded Collusion-Resistant Registered Functional Encryption for Circuits Abstract

Yijian Zhang Jie Chen Debiao He Yuqing Zhang

As an emerging primitive, Registered Functional Encryption (RFE) eliminates the key-escrow issue that threatens numerous works for functional encryption, by replacing the trusted authority with a transparent key curator and allowing each user to sample their decryption keys locally. In this work, we present a new black-box approach to construct RFE for all polynomial-sized circuits. It considers adaptive simulation-based security in the bounded collusion model (Gorbunov et al. - CRYPTO'12), where the security can be ensured only if there are no more than Q >= 1 corrupted users and Q is fixed at the setup phase. Unlike earlier works, we do not employ unpractical Indistinguishability Obfuscation (iO). Conversely, it can be extended to support unbounded users, which is previously only known from iO. Technically, our general compiler exploits garbled circuits and a novel variant of slotted Registered Broadcast Encryption (RBE), namely global slotted RBE. This primitive is similar to slotted RBE, but needs optimally compact public parameters and ciphertext, so as to satisfy the efficiency requirement of the resulting RFE. Then we present two concrete global slotted RBE from pairings and lattices, respectively. With proposed compiler, we hence obtain two bounded collusion-resistant RFE schemes. Here, the first scheme relies on k-Lin assumption, while the second one supports unbounded users under LWE and evasive LWE assumptions.

2024

ASIACRYPT

Revisiting Pairing-Friendly Curves with Embedding Degrees 10 and 14 Abstract

Yu Dai Debiao He Cong Peng Zhijian Yang Chang-an Zhao

Since 2015, there has been a significant decrease in the asymptotic complexity of computing discrete logarithms in finite fields. As a result, the key sizes of many mainstream pairing-friendly curves have to be updated to maintain the desired security level. In PKC'20, Guillevic conducted a comprehensive assessment of the security of a series of pairing-friendly curves with embedding degrees ranging from $9$ to $17$. In this paper, we focus on five pairing-friendly curves with embedding degrees 10 and 14 at the 128-bit security level, with BW14-351 emerging as the most competitive candidate. First, we extend the optimized formula for the optimal pairing on BW13-310, a 128-bit secure curve with a prime $p$ in 310 bits and embedding degree $13$, to our target curves. This generalization allows us to compute the optimal pairing in approximately $\log r/(2\varphi(k))$ Miller iterations, where $r$ and $k$ are the order of pairing groups and the embedding degree respectively. Second, we develop optimized algorithms for cofactor multiplication for $\G_1$ and $\G_2$, as well as subgroup membership testing for $\G_2$ on these curves. Finally, we provide detailed performance comparisons between BW14-351 and other popular curves on a 64-bit platform in terms of pairing computation, hashing to $\G_1$ and $\G_2$, group exponentiations, and subgroup membership testings. Our results demonstrate that BW14-351 is a strong candidate for building pairing-based cryptographic protocols.