International Association for Cryptologic Research


CryptoDB

Li Xiaolin

Publications

Year
Venue
Title
2025
TCHES
POTA: A Pipelined Oblivious Transfer Acceleration Architecture for Secure Multi-Party Computation
With the rapid development and deployment of machine learning (ML) and big data technologies, which rely heavily on sensitive user data for training and inference, ensuring privacy and data security has become a pressing challenge. Addressing this issue requires methods that safeguard sensitive information while preserving the correctness of computational results. Secure multi-party computation (MPC), a representative application of cryptographic techniques, offers a technical solution by enabling privacy-preserving computation, and has been widely applied in scenarios such as cloud-based inference and other privacy-sensitive tasks. However, MPC also introduces significant performance overhead, which limits its wider adoption. Our analysis reveals that the foundational building block of MPC, the oblivious transfer (OT) protocols, collectively accounts for up to 96.64% of the execution time. This is because OT protocols are constrained by low network bandwidth and weak compute engines. To address these challenges, we propose POTA, a high-performance pipelined OT hardware acceleration architecture supporting the silent OT protocol. In the POTA design, we develop efficient subsystems targeting the two most compute-intensive parts of silent OT: the construction of the puncturable pseudorandom function (PPRF), and the large matrix-vector multiplications under the learning parity with noise (LPN) assumption. In addition, to address the performance overhead caused by data transfer between POTA and the host CPU, we design a host-accelerator execution pipeline that hides the considerable transmission latency. Furthermore, we design a modular multiplication module over a finite field to generate the more complex correlations required by MPC protocols. Finally, we implement a POTA prototype on Xilinx VCU129 FPGAs.
Experimental results demonstrate that under various network settings, POTA achieves significant speedups, with maximum improvements of 192.57x for basic operations and 597.57x for convolutional neural networks (CNN).
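The PPRF construction the abstract refers to is conventionally realized as a GGM tree: a root seed is expanded by a length-doubling PRG into 2^d leaves, and a "punctured" key (the sibling seeds along one root-to-leaf path) lets the receiver recompute every leaf except one. The minimal Python sketch below illustrates that structure only; it is not POTA's design. SHA-256 stands in for the AES-based PRG a hardware pipeline would use, and all function names are hypothetical.

```python
import hashlib

def prg(seed: bytes) -> tuple:
    """Length-doubling PRG: one 16-byte seed -> two 16-byte child seeds.
    (SHA-256 here is an illustrative stand-in for a hardware AES-based PRG.)"""
    left = hashlib.sha256(seed + b"L").digest()[:16]
    right = hashlib.sha256(seed + b"R").digest()[:16]
    return left, right

def ggm_expand(root: bytes, depth: int) -> list:
    """Expand a root seed into all 2**depth leaves of the GGM tree."""
    level = [root]
    for _ in range(depth):
        nxt = []
        for s in level:
            l, r = prg(s)
            nxt.extend((l, r))
        level = nxt
    return level

def puncture(root: bytes, depth: int, alpha: int) -> list:
    """Punctured key for leaf alpha: the sibling seed at each level
    of the path from the root down to leaf alpha (MSB-first bits)."""
    key = []
    node = root
    for i in range(depth):
        l, r = prg(node)
        bit = (alpha >> (depth - 1 - i)) & 1
        key.append(r if bit == 0 else l)   # keep the off-path sibling
        node = l if bit == 0 else r        # descend along the path
    return key

def eval_punctured(key: list, depth: int, alpha: int) -> list:
    """Recover every leaf except leaf alpha from the punctured key."""
    leaves = [None] * (1 << depth)
    for i, sib in enumerate(key):
        bit = (alpha >> (depth - 1 - i)) & 1
        # The sibling subtree covers all leaves that share alpha's first
        # i bits but have bit i flipped.
        prefix = ((alpha >> (depth - i)) << 1) | (bit ^ 1)
        base = prefix << (depth - 1 - i)
        for j, leaf in enumerate(ggm_expand(sib, depth - 1 - i)):
            leaves[base + j] = leaf
    return leaves
```

In silent OT, the sender holds the full tree while the receiver holds only the punctured key, so the leaf at index alpha is the single value hidden from the receiver; the subsequent LPN-based matrix-vector multiplication compresses these correlated leaves into many pseudorandom OT correlations.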

Coauthors

Liu Hongwei (1)
Sun Ninghui (1)
Hao Qinfen (1)
Yan Wei (1)
Li Xiaolin (1)
Zhang Yong (1)
Liu Yong (1)