International Association for Cryptologic Research

International Association
for Cryptologic Research


Cong Peng


Parameter-Hiding Order-Revealing Encryption without Pairings
Order-Revealing Encryption (ORE) provides a practical solution for conducting range queries over encrypted data. Achieving a desirable privacy-efficiency tradeoff in designing ORE schemes has posed a significant challenge. At Asiacrypt 2018, Cash et al. proposed Parameter-hiding ORE (pORE), which specifically targets scenarios where the data distribution shape is known, but the underlying parameters (such as mean and variance) need to be protected. However, existing pORE constructions rely on impractical bilinear maps, limiting their real-world applicability. In this work, we propose an alternative and efficient method for constructing pORE using identification schemes. By leveraging the map-invariance property of identification schemes, we eliminate the need for pairing computations during ciphertext comparison. Specifically, we instantiate our framework with the pairing-free Schnorr identification scheme and demonstrate that our proposed pORE scheme reduces ciphertext size by approximately 31.25\% and improves encryption and comparison efficiency by over two times compared to the current state-of-the-art pORE construction. Our work provides a more efficient alternative to existing pORE constructions and could be viewed as a step towards making pORE a viable choice for practical applications.
Load-Balanced Parallel Implementation on GPUs for Multi-Scalar Multiplication Algorithm
Multi-scalar multiplication (MSM) is an important building block in most of elliptic-curve-based zero-knowledge proof systems, such as Groth16 and PLONK. Recently, Lu et al. proposed cuZK, a new parallel MSM algorithm on GPUs. In this paper, we revisit this scheme and present a new GPU-based implementation to further improve the performance of MSM algorithm. First, we propose a novel method for mapping scalars into Pippenger’s bucket indices, largely reducing the number of buckets compared to the original Pippenger algorithm. Second, in the case that memory is sufficient, we develop a new efficient algorithm based on homogeneous coordinates in the bucket accumulation phase. Moreover, our accumulation phase is load-balanced, which means the parallel speedup ratio is almost linear growth as the number of device threads increases. Finally, we also propose a parallel layered reduction algorithm for the bucket aggregation phase, whose time complexity remains at the logarithmic level of the number of buckets. The implementation results over the BLS12-381 curve on the V100 graphics card show that our proposed algorithm achieves up to 1.998x, 1.821x and 1.818x speedup compared to cuZK at scales of 221, 222, and 223, respectively.


Rongmao Chen (1)
Yutian Chen (1)
Yu Dai (1)
Debiao He (2)
Xinyi Huang (1)
Min Luo (1)
Cong Peng (2)
Yi Wang (1)