IACR News item: 20 October 2025
Jung Hee Cheon, Minsik Kang, Junho Lee
Encrypted matrix multiplication (MM) is a fundamental primitive in privacy-preserving machine learning and encrypted data search, but it remains a significant performance bottleneck. Recently, Bae et al.~(Crypto’24) and Park~(Eurocrypt’25) introduced novel algorithms for ciphertext–plaintext (CPMM) and ciphertext–ciphertext (CCMM) matrix multiplications. These algorithms reduce encrypted MM operations to plaintext matrix multiplications (PPMM), enabling implementation through highly optimized BLAS libraries.
While these reduction-based methods offer significant improvements, their applicability is limited to scenarios where the matrix dimension $d$ is comparable to the ring dimension $N$ in RLWE-based CKKS schemes. As a result, they fall short for matrix multiplications involving small or medium-sized matrices.
We extend the reduction-based CPMM/CCMM to small-sized matrix operations by batching instances. We use the Slots-in-Coefficient (SinC) encoding, in which a ring element is represented by a polynomial whose coefficients are given by the Discrete Fourier Transform of the matrix entries at the same position across the batch. This encoding reduces encrypted batch MM algorithms to a small number of batch PPMMs, which can be efficiently accelerated by BLAS libraries. Our batch encrypted MM flexibly accommodates diverse matrix dimensions and batch sizes independent of the ring dimension $N$, thereby extending its applicability to practical real-world settings.
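To make the batching idea concrete, here is a minimal NumPy sketch under our own assumptions about the data layout: it illustrates only the encoding step (a DFT across the batch axis at each matrix position), not the paper's full reduction to batch PPMM. The function names and shape conventions (`sinc_style_encode`, a `(B, d, d)` batch) are hypothetical, chosen for illustration.

```python
import numpy as np

def sinc_style_encode(batch):
    """Illustrative SinC-style layout (hypothetical, not the paper's exact encoding).

    batch: array of shape (B, d, d), i.e. B matrices of size d x d.
    For each matrix position (i, j), take the DFT of the length-B vector of
    entries at that position across the batch; the transformed values serve
    as polynomial coefficients. Returns an array of shape (d, d, B).
    """
    # One length-B DFT per matrix position, computed along the batch axis.
    return np.fft.fft(batch, axis=0).transpose(1, 2, 0)

def sinc_style_decode(coeffs):
    """Inverse of the sketch above: recover the batch of matrices."""
    return np.fft.ifft(coeffs.transpose(2, 0, 1), axis=0).real

# Toy round trip: 4 batches of 2x2 matrices survive encode/decode.
B, d = 4, 2
batch = np.random.rand(B, d, d)
assert np.allclose(sinc_style_decode(sinc_style_encode(batch)), batch)

# The plaintext side then reduces to batched PPMM, which NumPy dispatches
# to optimized BLAS loops: (B, d, d) @ (B, d, d) -> (B, d, d).
other = np.random.rand(B, d, d)
products = batch @ other
```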
For two $d \times d$ matrices with $N/d$ batches, our batch CPMM and CCMM algorithms achieve complexity $O(d^2N)$, improving upon Bae et al.~at $O(dN^2)$ and Jiang et al.~(CCS’18) at $O(d^2 N\log (N))$. We further extend our techniques to rectangular matrices, achieving $O(dN^2)$ for multiplying a $d \times N$ matrix by an $N \times N$ matrix, improving on previous $O(N^3)$ methods. A proof-of-concept implementation validates these improvements: multiplying 128 batches of $64 \times 64$ matrices takes $0.20$s (CPMM) and $0.91$s (CCMM), yielding $205\times$ and $64\times$ speedups over previous methods. For a $64 \times 2048$ by $2048 \times 2048$ multiplication, our CCMM completes in $7.8$s, achieving a $28\times$ speedup compared to Park's algorithm.
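As a back-of-the-envelope check of the $O(d^2N)$ figure (our own sanity check, not a step from the paper): the $N/d$ independent $d \times d$ products cost, with schoolbook multiplication,
\[
\underbrace{\frac{N}{d}}_{\text{batches}} \cdot \underbrace{d^{3}}_{\text{per } d \times d \text{ MM}} \;=\; d^{2}N
\]
scalar multiplications in total, matching the claimed complexity of the reduction to batch PPMM.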