CryptoDB
Debiao He
Publications and invited talks
    Year
  
  
    Venue
  
  
    Title
  
    2025
  
  
    TCHES
  
  
    SimdMSM: SIMD-accelerated Multi-Scalar Multiplication Framework for zkSNARKs
            
      Abstract    
    
Multi-scalar multiplication (MSM) is the primary building block in many pairing-based zero-knowledge proof (ZKP) systems. MSM at large scales has become the main bottleneck in ZKP implementations. Inspired by existing SIMD-accelerated work, we are focused on accelerating MSM computing efficiency using SIMD instructions in a single CPU environment. First, we propose a SIMD-accelerated MSM computing architecture with no write conflicts and constant memory overheads. This architecture utilizes multithreading to achieve task-level and loop-level parallelism and employs a three-tier buffer mechanism to maximize the utilization of the SIMD engine. Instanced with AVX512-IFMA instructions, we implement six SIMD elliptic curve arithmetic engines for different point addition in three coordinate systems and two groups. Moreover, we integrate our AVX-MSM implementation into the libsnark library, naming it AVX-ZK. In more detail, point deduplication and “Three-Stage” memory optimization are proposed to address problems existing in practical applications. Based on the RELIC library, our performance results on the BLS12-381 curve show that our AVX-MSM achieves up to 27.86x speedup over the most popular Pippenger algorithm. Compared with libsnark, our AVX-ZK implementation achieves over 11.53x (up to 20.26x) speedup under standard benchmarks.
  
    2024
  
  
    PKC
  
  
    Parameter-Hiding Order-Revealing Encryption without Pairings
            
      Abstract    
    
Order-Revealing Encryption (ORE) provides a practical solution for conducting range queries over encrypted data.  Achieving a desirable privacy-efficiency tradeoff in designing ORE schemes has posed a significant challenge. At Asiacrypt 2018, Cash et al. proposed Parameter-hiding ORE (pORE), which specifically targets scenarios where the data distribution shape is known, but the underlying parameters (such as mean and variance) need to be protected. However, existing pORE constructions rely on impractical bilinear maps, limiting their real-world applicability. In this work, we propose an alternative and efficient method for constructing pORE using identification schemes. By leveraging the map-invariance property of identification schemes, we eliminate the need for pairing computations during ciphertext comparison. Specifically, we instantiate our framework with the pairing-free Schnorr identification scheme and demonstrate that our proposed pORE scheme reduces ciphertext size by approximately 31.25\% and improves encryption and comparison efficiency by over two times compared to the current state-of-the-art pORE construction. Our work provides a more efficient alternative to existing pORE constructions and could be viewed as a step towards making pORE a viable choice for practical applications.
  
    2024
  
  
    TCHES
  
  
    Load-Balanced Parallel Implementation on GPUs for Multi-Scalar Multiplication Algorithm
            
      Abstract    
    
Multi-scalar multiplication (MSM) is an important building block in most of elliptic-curve-based zero-knowledge proof systems, such as Groth16 and PLONK. Recently, Lu et al. proposed cuZK, a new parallel MSM algorithm on GPUs. In this paper, we revisit this scheme and present a new GPU-based implementation to further improve the performance of MSM algorithm. First, we propose a novel method for mapping scalars into Pippenger’s bucket indices, largely reducing the number of buckets compared to the original Pippenger algorithm. Second, in the case that memory is sufficient, we develop a new efficient algorithm based on homogeneous coordinates in the bucket accumulation phase. Moreover, our accumulation phase is load-balanced, which means the parallel speedup ratio is almost linear growth as the number of device threads increases. Finally, we also propose a parallel layered reduction algorithm for the bucket aggregation phase, whose time complexity remains at the logarithmic level of the number of buckets. The implementation results over the BLS12-381 curve on the V100 graphics card show that our proposed algorithm achieves up to 1.998x, 1.821x and 1.818x speedup compared to cuZK at scales of 221, 222, and 223, respectively.
  
    2024
  
  
    ASIACRYPT
  
  
    Bounded Collusion-Resistant Registered Functional Encryption for Circuits
            
      Abstract    
    
As an emerging primitive, Registered Functional Encryption (RFE) eliminates the key-escrow issue that threatens numerous works for functional encryption, by replacing the trusted authority with a transparent key curator and allowing each user to sample their decryption keys locally. In this work, we present a new black-box approach to construct RFE for all polynomial-sized circuits. It considers adaptive simulation-based security in the bounded collusion model (Gorbunov et al. - CRYPTO'12), where the security can be ensured only if there are no more than Q >= 1 corrupted users and Q is fixed at the setup phase. Unlike earlier works, we do not employ unpractical Indistinguishability Obfuscation (iO). Conversely, it can be extended to support unbounded users, which is previously only known from iO.
Technically, our general compiler exploits garbled circuits and a novel variant of slotted Registered Broadcast Encryption (RBE), namely global slotted RBE. This primitive is similar to slotted RBE, but needs optimally compact public parameters and ciphertext, so as to satisfy the efficiency requirement of the resulting RFE. Then we present two concrete global slotted RBE from pairings and lattices, respectively. With proposed compiler, we hence obtain two bounded collusion-resistant RFE schemes. Here, the first scheme relies on k-Lin assumption, while the second one supports unbounded users under LWE and evasive LWE assumptions.
  
    2024
  
  
    ASIACRYPT
  
  
    Revisiting Pairing-Friendly Curves with Embedding Degrees 10 and 14
            
      Abstract    
    
Since 2015, there has been a significant decrease in the asymptotic complexity of computing discrete logarithms in finite fields. As a result, the key sizes of many mainstream pairing-friendly curves have to be updated to maintain the desired security level. In PKC'20, Guillevic conducted a comprehensive assessment of the security of a series of pairing-friendly curves with embedding degrees ranging from $9$ to $17$.  In this paper, we focus on five pairing-friendly curves with embedding degrees 10 and 14 at the 128-bit security level, with BW14-351 emerging as the most competitive candidate. First, we extend the optimized formula for the optimal pairing on BW13-310, a 128-bit secure curve with a prime $p$ in 310 bits and embedding degree $13$, to our target curves. This generalization allows us to compute the optimal pairing in approximately $\log r/(2\varphi(k))$ Miller iterations, where $r$ and $k$ are the order of pairing groups and the embedding degree respectively. Second, we develop optimized algorithms for cofactor multiplication for $\G_1$ and $\G_2$, as well as subgroup membership testing for $\G_2$ on these curves. Finally, we provide detailed performance comparisons between BW14-351 and other popular curves on a 64-bit platform in terms of pairing computation, hashing to $\G_1$ and $\G_2$, group exponentiations, and subgroup membership testings. Our results demonstrate that BW14-351 is a strong candidate for building pairing-based cryptographic protocols.
  Coauthors
- Jie Chen (1)
- Rongmao Chen (2)
- Yutian Chen (1)
- Yu Dai (2)
- Debiao He (5)
- Xinyi Huang (1)
- Rui Jiang (1)
- Min Luo (2)
- Cong Peng (4)
- Yi Wang (1)
- Zhijian Yang (1)
- Yijian Zhang (1)
- Yuqing ZHANG (1)
- Chang-An Zhao (1)
