IACR News
Here you can see all recent updates to the IACR webpage. These updates are also available:
28 June 2024
Shuichi Katsumata, Michael Reichle, Kaoru Takemure
ePrint Report
T-out-of-N threshold signatures have recently seen a renewed interest, with various types now available, each offering different tradeoffs.
However, one property that has remained elusive is adaptive security. When we target thresholdizing existing efficient signatures schemes based on the Fiat-Shamir paradigm such as Schnorr, the elusive nature becomes clear. This class of signature schemes typically rely on the forking lemma to prove unforgeability. That is, an adversary is rewound and run twice within the security game. Such a proof is at odds with adaptive security, as the reduction must be ready to answer 2(T-1) secret key shares in total, implying that it can reconstruct the full secret key. Indeed, prior works either assumed strong idealized models such as the algebraic group model (AGM) or modified the underlying signature scheme so as not to rely on rewinding based proofs.
In this work, we propose a new proof technique to construct adaptively secure threshold signatures for existing rewinding-based Fiat-Shamir signatures. As a result, we obtain the following: 1. The first adaptively secure 5 round lattice-based threshold signature under the MLWE and MSIS assumptions in the ROM. The resulting signature is a standard signature of Raccoon, a lattice-based signature scheme by del Pino et al., submitted to the additional NIST call for proposals. 2. The first adaptively secure 5 round threshold signature under the DL assumption in the ROM. The resulting signature is a standard Schnorr signature. To the best of our knowledge, this is the first adaptively secure threshold signature based on DL even assuming stronger models like AGM.
Our work is inspired by the recent statically secure lattice-based 3 round threshold signature by del Pino et al. (Eurocrypt~2024) based on Raccoon. While they relied on so-called one-time additive masks to solve lattice specific issues, we notice that these masks can also be a useful tool to achieve adaptive security. At a very high level, we use these masks throughout the signing protocol to carefully control the information the adversary can learn from the signing transcripts. Intuitively, this allows the reduction to return a total of 2(T-1) randomly sampled secret key shares to the adversary consistently and without being detected, resolving the above paradoxical situation. Lastly, by allowing the parties to maintain a simple state, we can compress our 5 round schemes into 4 rounds.
In this work, we propose a new proof technique to construct adaptively secure threshold signatures for existing rewinding-based Fiat-Shamir signatures. As a result, we obtain the following: 1. The first adaptively secure 5 round lattice-based threshold signature under the MLWE and MSIS assumptions in the ROM. The resulting signature is a standard signature of Raccoon, a lattice-based signature scheme by del Pino et al., submitted to the additional NIST call for proposals. 2. The first adaptively secure 5 round threshold signature under the DL assumption in the ROM. The resulting signature is a standard Schnorr signature. To the best of our knowledge, this is the first adaptively secure threshold signature based on DL even assuming stronger models like AGM.
Our work is inspired by the recent statically secure lattice-based 3 round threshold signature by del Pino et al. (Eurocrypt~2024) based on Raccoon. While they relied on so-called one-time additive masks to solve lattice specific issues, we notice that these masks can also be a useful tool to achieve adaptive security. At a very high level, we use these masks throughout the signing protocol to carefully control the information the adversary can learn from the signing transcripts. Intuitively, this allows the reduction to return a total of 2(T-1) randomly sampled secret key shares to the adversary consistently and without being detected, resolving the above paradoxical situation. Lastly, by allowing the parties to maintain a simple state, we can compress our 5 round schemes into 4 rounds.
Animesh Singh, Sikhar Patranabis, Debdeep Mukhopadhyay
ePrint Report
An oblivious pseudorandom function (OPRF) is a two-party protocol in which a party holds an input and the other party holds the PRF key, such that the party having the input only learns the PRF output and the party having the key would not learn the input. Now, in a threshold oblivious pseudorandom function (TOPRF) protocol, a PRF key K is initially shared among T servers. A client can obtain a PRF value by interacting with t(≤ T) servers but is unable to compute the same with up to (t − 1) servers. In this paper, we present a practically efficient homomorphic encryption (HE)-based post-quantum secure TOPRF protocol. Our proposed approach, which is based on a novel use of threshold HE, is agnostic of the underlying PRF and outperforms existing fully homomorphic encryption (FHE)-based approaches for TOPRF computation by several orders of magnitude in terms of running time. The FHE-based approaches require bootstrapping, a computationally extensive operation, and the primary bottleneck for evaluating large-depth circuits. Whereas, our proposed approach is based on a multi-party computation (MPC) protocol that uses a threshold additive HE scheme based on Regev’s cryptosystem (J’ACM 2009) alternative to FHE-based approaches. Concretely, we show a novel replacement of bootstrapping required in traditional FHE schemes by a threshold additive HE-based interactive protocol that performs masked decryption followed by table look-ups, jointly performed by a group of servers holding secret shares of the HE decryption key. Finally, We present a practical validation of our approach by realizing an AES-based TOPRF with an evaluation time of less than 1 second on consumer-grade server(s).
Uddipana Dowerah, Subhranil Dutta, Frank Hartmann, Aikaterini Mitrokotsa, Sayantan Mukherjee, Tapas Pal
ePrint Report
Privacy is a major concern in large-scale digital applications, such as cloud-computing, machine learning services, and access control. Users want to protect not only their plain data but also their associated attributes (e.g., age, location, etc). Functional encryption (FE) is a cryptographic tool that allows fine-grained access control over encrypted data. However, existing FE fall short as they are either inefficient and far from reality or they leak sensitive user-specific information.
We propose SACfe, a novel attribute-based FE scheme that provides secure, fine-grained access control and hides both the user’s attributes and the function applied to the data, while preserving the data’s confidentiality. Moreover, it enables users to encrypt unbounded-length messages along with an arbitrary number of hidden attributes into ciphertexts. We design SACfe, a protocol for performing linear computation on encrypted data while enforcing access control based on inner product predicates. We show how SACfe can be used for online biometric authentication for privacy-preserving access control. As an additional contribution, we introduce an attribute-based linear FE for unbounded length of messages and functions where access control is realized by monotone span programs. We implement our protocols using the CiFEr cryptographic library and show its efficiency for practical settings.
We propose SACfe, a novel attribute-based FE scheme that provides secure, fine-grained access control and hides both the user’s attributes and the function applied to the data, while preserving the data’s confidentiality. Moreover, it enables users to encrypt unbounded-length messages along with an arbitrary number of hidden attributes into ciphertexts. We design SACfe, a protocol for performing linear computation on encrypted data while enforcing access control based on inner product predicates. We show how SACfe can be used for online biometric authentication for privacy-preserving access control. As an additional contribution, we introduce an attribute-based linear FE for unbounded length of messages and functions where access control is realized by monotone span programs. We implement our protocols using the CiFEr cryptographic library and show its efficiency for practical settings.
Yijing Ning, Jiankuo Dong, Jingqiang Lin, Fangyu Zheng, Yu Fu, Zhenjiang Dong, Fu Xiao
ePrint Report
$SPHINCS^+$, one of the Post-Quantum Cryptography Digital Signature Algorithms (PQC-DSA) selected by NIST in the third round, features very short public and private key lengths but faces significant performance challenges compared to other post-quantum cryptographic schemes, limiting its suitability for real-world applications. To address these challenges, we propose the GPU-based paRallel Accelerated $SPHINCS^+$ (GRASP), which leverages GPU technology to enhance the efficiency of $SPHINCS^+$ signing and verification processes. We propose an adaptable parallelization strategy for $SPHINCS^+$, analyzing its signing and verification processes to identify critical sections for efficient parallel execution. Utilizing CUDA, we perform bottom-up optimizations, focusing on memory access patterns and hypertree computation, to enhance GPU resource utilization. These efforts, combined with kernel fusion technology, result in significant improvements in throughput and overall performance. Extensive experimentation demonstrates that our optimized CUDA implementation of $SPHINCS^+$ achieves superior performance. Specifically, our GRASP scheme delivers throughput improvements ranging from 1.37× to 5.13× compared to state-of-the-art GPU-based solutions and surpasses the NIST reference implementation by over three orders of magnitude, highlighting a significant performance advantage.
Ananya Appan, David Heath, Ling Ren
ePrint Report
Oblivious RAM (ORAM) allows a client to securely outsource memory storage to an untrusted server. It has been shown that no ORAM can simultaneously achieve small bandwidth blow-up, small client storage, and a single roundtrip of latency.
We consider a weakening of the RAM model, which we call the Single Access Machine (SAM) model. In the SAM model, each memory slot can be written to at most once and read from at most once. We adapt existing tree-based ORAM to obtain an oblivious SAM (OSAM) that has $O(\log n)$ bandwidth blow-up (which we show is optimal), small client storage, and a single roundtrip.
OSAM unlocks improvements to oblivious data structures/algorithms. For instance, we achieve oblivious unbalanced binary trees (e.g. tries, splay trees). By leveraging splay trees, we obtain a notion of caching ORAM, where an access in the worst case incurs amortized $O(\log^2 n)$ bandwidth blow-up and $O(\log n)$ roundtrips, but in many common cases (e.g. sequential scans) incurs only amortized $O(\log n)$ bandwidth blow-up and $O(1)$ roundtrips. We also give new oblivious graph algorithms, including computing minimum spanning trees and single source shortest paths, in which the OSAM client reads/writes $O(|E| \cdot \log |E|)$ words using $O(|E|)$ roundtrips, where $|E|$ is the number of edges. This improves over prior custom solutions by a log factor.
At a higher level, OSAM provides a general model for oblivious computation. We construct a programming interface around OSAM that supports arbitrary pointer-manipulating programs such that dereferencing a pointer to an object incurs $O(\log d \log n)$ bandwidth blowup and $O(\log d)$ roundtrips, where $d$ is the number of pointers to that object. This new interface captures a wide variety of data structures and algorithms (e.g., trees, tries, doubly-linked lists) while matching or exceeding prior best asymptotic results. It both unifies much of our understanding of oblivious computation and allows the programmer to write oblivious algorithms combining various common data structures/algorithms and beyond.
We consider a weakening of the RAM model, which we call the Single Access Machine (SAM) model. In the SAM model, each memory slot can be written to at most once and read from at most once. We adapt existing tree-based ORAM to obtain an oblivious SAM (OSAM) that has $O(\log n)$ bandwidth blow-up (which we show is optimal), small client storage, and a single roundtrip.
OSAM unlocks improvements to oblivious data structures/algorithms. For instance, we achieve oblivious unbalanced binary trees (e.g. tries, splay trees). By leveraging splay trees, we obtain a notion of caching ORAM, where an access in the worst case incurs amortized $O(\log^2 n)$ bandwidth blow-up and $O(\log n)$ roundtrips, but in many common cases (e.g. sequential scans) incurs only amortized $O(\log n)$ bandwidth blow-up and $O(1)$ roundtrips. We also give new oblivious graph algorithms, including computing minimum spanning trees and single source shortest paths, in which the OSAM client reads/writes $O(|E| \cdot \log |E|)$ words using $O(|E|)$ roundtrips, where $|E|$ is the number of edges. This improves over prior custom solutions by a log factor.
At a higher level, OSAM provides a general model for oblivious computation. We construct a programming interface around OSAM that supports arbitrary pointer-manipulating programs such that dereferencing a pointer to an object incurs $O(\log d \log n)$ bandwidth blowup and $O(\log d)$ roundtrips, where $d$ is the number of pointers to that object. This new interface captures a wide variety of data structures and algorithms (e.g., trees, tries, doubly-linked lists) while matching or exceeding prior best asymptotic results. It both unifies much of our understanding of oblivious computation and allows the programmer to write oblivious algorithms combining various common data structures/algorithms and beyond.
Ferhat Karakoç, Betül Güvenç Paltun, Leyli Karaçay, Ömer Tuna, Ramin Fuladi, Utku Gülen
ePrint Report
Enhancing privacy in federal learning (FL) without considering robustness can create an open door for attacks such as poisoning attacks on the FL process. Thus, addressing both the privacy and security aspects simultaneously becomes vital. Although, there are a few solutions addressing both privacy and security in the literature in recent years, they have some drawbacks such as requiring two non-colluding servers, heavy cryptographic operations, or peer-to-peer communication topology. In this paper, we introduce a novel framework that allows the server to run some analysis for detection and mitigation of attacks towards the FL process, while satisfying the confidentiality requirements for the training data against the server. We evaluate the effectiveness of the framework in terms of security and privacy by performing experiments on some concrete examples. We also provide two instantiations of the framework with two different secure aggregation protocols to give a more concrete view how the framework works and we analyse the computation and communication overhead of the framework.
Dung Bui, Geoffroy Couteau, Nikolas Melissaris
ePrint Report
In this note, we introduce structured-seed local pseudorandom generators, a relaxation of local pseudorandom generators. We provide constructions of this primitive under the sparse-LPN assumption, and explore its implications.
Martin Zbudila, Erik Pohle, Aysajan Abidin, Bart Preneel
ePrint Report
Secure multi-party computation (MPC) in a three-party, honest majority scenario is currently the state-of-the-art for running machine learning algorithms in a privacy-preserving manner. For efficiency reasons, fixed-point arithmetic is widely used to approximate computation over decimal numbers. After multiplication in fixed-point arithmetic, truncation is required to keep the result's precision. In this paper, we present an efficient three-party truncation protocol secure in the presence of an active adversary without pre-processing and improve on the current state-of-the-art in MPC over rings using replicated secret sharing (RSS). By adding an efficient consistency check, we lift the efficient but only passively secure three-party truncation protocol from the ABY3 framework by Mohassel and Rindal into the malicious setting without pre-processed data. Our benchmark indicates performance improvements of an order of magnitude in the offline phase for a single batch training. Finally, we apply our protocol to a real-world application for diagnostic prediction based on publicly available ECG heartbeat data. We achieve an improvement by a factor of two in the total throughput for both LAN and WAN settings.
Paula Arnold, Sebastian Berndt, Thomas Eisenbarth, Maximilian Orlt
ePrint Report
While passive side-channel attacks and active fault attacks have been studied intensively in the last few decades, strong attackers combining these attacks have only been studied relatively recently. Due to its simplicity, most countermeasures against passive attacks are based on additive sharing. Unfortunately, extending these countermeasures against faults often leads to quite a significant performance penalty, either due to the use of expensive cryptographic operations or a large number of shares due to massive duplication. Just recently, Berndt, Eisenbarth, Gourjon, Faust, Orlt, and Seker thus proposed to use polynomial sharing against combined attackers (CRYPTO 2023). While they construct gadgets secure against combined attackers using only a linear number of shares, the overhead introduced might still be too large for practical scenarios.
In this work, we show how the overhead of nearly all known constructions using polynomial sharing can be reduced by nearly half by embedding two secrets in the coefficients of one polynomial at the expense of increasing the degree of the polynomial by one. We present a very general framework that allows adapting these constructions to this new sharing scheme and prove the security of this approach against purely passive side-channel attacks, purely active fault attacks, and combined attacks. Furthermore, we present new gadgets allowing us to operate upon the different secrets in a number of useful ways.
In this work, we show how the overhead of nearly all known constructions using polynomial sharing can be reduced by nearly half by embedding two secrets in the coefficients of one polynomial at the expense of increasing the degree of the polynomial by one. We present a very general framework that allows adapting these constructions to this new sharing scheme and prove the security of this approach against purely passive side-channel attacks, purely active fault attacks, and combined attacks. Furthermore, we present new gadgets allowing us to operate upon the different secrets in a number of useful ways.
Reyhaneh Rabaninejad, Behzad Abdolmaleki, Sebastian Ramacher, Daniel Slamanig, Antonis Michalas
ePrint Report
Self-sovereign identity (SSI) systems empower users to (anonymously) establish and verify their identity when accessing both digital and real-world resources, emerging as a promising privacy-preserving solution for user-centric identity management. Recent work by Maram et al. proposes the privacy-preserving Sybil-resistant decentralized SSI system CanDID (IEEE S&P 2021). While this is an important step, notable shortcomings undermine its efficacy. The two most significant among them being the following: First, unlinkability breaks in the presence of a single malicious issuer. Second, it introduces interactiveness, as the users are required to communicate each time with issuers to collect credentials intended for use in interactions with applications. This contradicts the goal of SSI, whose aim is to give users full control over their identities. This paper first introduces the concept of publicly verifiable attribute-based threshold anonymous counting tokens (tACT). Unlike recent approaches confined to centralized settings (Benhamouda et al., ASIACRYPT 2023), tACT operates in a distributed-trust environment. Accompanied by a formal security model and a provably secure instantiation, tACT introduces a novel dimension to token issuance, which, we believe, holds independent interest. Next, the paper leverages the proposed tACT scheme to construct an efficient Sybil-resistant SSI system. This system supports various functionalities, including threshold issuance, unlinkable multi-show selective disclosure, and non-interactive, non-transferable credentials that offer constant-size credentials. Finally, our benchmark results show an efficiency improvement in our construction when compared to CanDID all while accommodating a greater number of issuers and additionally reducing to a one-round protocol that can be run in parallel with all issuers.
Feixiang Zhao, Huaxiong Wang, Jian Weng
ePrint Report
Proxy re-encryption is a cryptosystem that achieves efficient encrypted data sharing by allowing a proxy to transform a ciphertext encrypted under one key into another ciphertext under a different key. Homomorphic proxy re-encryption (HPRE) extends this concept by integrating homomorphic encryption, allowing not only the sharing of encrypted data but also the homomorphic computations on such data. The existing HPRE schemes, however, are limited to a single or bounded number of hops of ciphertext re-encryptions. To address this limitation, this paper introduces a novel lattice-based, unbounded multi-hop fully homomorphic proxy re-encryption (FHPRE) scheme, with constant-size ciphertexts. Our FHPRE scheme supports an unbounded number of reencryption operations and enables arbitrary homomorphic computations over original, re-encrypted, and evaluated ciphertexts. Additionally, we propose a potential application of our FHPRE scheme in the form of a non-interactive, constant-size multi-user computation system for cloud computing environments.
Ghada Almashaqbeh, Sixia Chen, Alexander Russell
ePrint Report
Layer-two blockchain protocols emerged to address scalability issues related to fees, storage cost, and confirmation delay of on-chain transactions. They aggregate off-chain transactions into a fewer on-chain ones, thus offering immediate settlement and reduced transaction fees. To preserve security of the underlying ledger, layer-two protocols often work in a collateralized model; resources are committed on-chain to backup off-chain activities. A fundamental challenge that arises in this setup is determining a policy for establishing, committing, and replenishing the collateral in a way that maximizes the value of settled transactions.
In this paper, we study this problem under two settings that model collateralized layer-two protocols. The first is a general model in which a party has an on-chain collateral $C$ with a policy to decide on whether to settle or discard each incoming transaction. The policy also specifies when to replenish $C$ based on the remaining collateral value. The second model considers a discrete setup in which $C$ is divided among $k$ wallets, each of which is of size $C/k$, such that when a wallet is full, and so cannot settle any incoming transactions, it will be replenished. We devise several online policies for these models, and show how competitive they are compared to optimal (offline) policies that have full knowledge of the incoming transaction stream. To the best of our knowledge, we are the first to study and formulate online competitive policies for collateral and wallet management in the blockchain setting.
In this paper, we study this problem under two settings that model collateralized layer-two protocols. The first is a general model in which a party has an on-chain collateral $C$ with a policy to decide on whether to settle or discard each incoming transaction. The policy also specifies when to replenish $C$ based on the remaining collateral value. The second model considers a discrete setup in which $C$ is divided among $k$ wallets, each of which is of size $C/k$, such that when a wallet is full, and so cannot settle any incoming transactions, it will be replenished. We devise several online policies for these models, and show how competitive they are compared to optimal (offline) policies that have full knowledge of the incoming transaction stream. To the best of our knowledge, we are the first to study and formulate online competitive policies for collateral and wallet management in the blockchain setting.
Nicholas Michel, Mohamed E. Najd, Ghada Almashaqbeh
ePrint Report
Automated market makers (AMMs) are a form of decentralized cryptocurrency exchanges and considered a prime example of Decentralized Finance (DeFi) applications. Their popularity and high trading activity have resulted in millions of on-chain transactions leading to serious scalability issues. In this paper, we address the on-chain storage overhead problem of AMMs by utilizing a new sidechain architecture as a layer 2 solution, building a system called ammBoost. Our system reduces the amount of on-chain transactions, boosts throughput, and supports blockchain pruning. We devise several techniques to enable layer 2 processing for AMMs while preserving correctness and security of the underlying AMM. We also build a proof-of-concept of ammBoost for a Uniswap-inspired use case to empirically evaluate its performance. Our experiments show that ammBoost decreases the gas cost by 94.53% and the chain growth by at least 80%, and that it can support up to 500x of the daily traffic volume observed for Uniswap in practice.
Zahra Motaqy, Mohamed E. Najd, Ghada Almashaqbeh
ePrint Report
Cryptocurrencies and blockchain technology provide an innovative model for reshaping digital services. Driven by the movement toward Web 3.0, recent systems started to provide distributed services, such as computation outsourcing or file storage, on top of the currency exchange medium. By allowing anyone to join and collect cryptocurrency payments for serving others, these systems create decentralized markets for trading digital resources. Yet, there is still a big gap between the promise of these markets and their practical viability. Existing initiatives are still early-stage and have already encountered security and efficiency obstacles. At the same time, existing work around promising ideas, specifically sidechains, fall short in exploiting their full potential in addressing these problems.
To bridge this gap, we propose chainBoost, a secure performance booster for decentralized resource markets. It expedites service related operations, reduces the blockchain size, and supports flexible service-payment exchange modalities at low overhead. At its core, chainBoost employs a sidechain, that has a (security and semantic) mutual-dependence with the mainchain, to which the system offloads heavy/frequent operations. To enable it, we develop a novel sidechain architecture composed of temporary and permanent blocks, a block suppression mechanism to prune the sidechain, a syncing protocol to permit arbitrary data exchange between the two chains, and an autorecovery protocol to support robustness and resilience. We analyze the security of chainBoost, and implement a proof-of-concept prototype for a distributed file storage market as a use case. For a market handling around 2000 transactions per round, our experiments show up to 11x improvement in throughput and 94% reduction in confirmation time. They also show that chainBoost can reduce the main blockchain size by about 90%, and that it outperforms comparable optimistic rollup solutions by reducing transaction finality by 99.7%.
To bridge this gap, we propose chainBoost, a secure performance booster for decentralized resource markets. It expedites service related operations, reduces the blockchain size, and supports flexible service-payment exchange modalities at low overhead. At its core, chainBoost employs a sidechain, that has a (security and semantic) mutual-dependence with the mainchain, to which the system offloads heavy/frequent operations. To enable it, we develop a novel sidechain architecture composed of temporary and permanent blocks, a block suppression mechanism to prune the sidechain, a syncing protocol to permit arbitrary data exchange between the two chains, and an autorecovery protocol to support robustness and resilience. We analyze the security of chainBoost, and implement a proof-of-concept prototype for a distributed file storage market as a use case. For a market handling around 2000 transactions per round, our experiments show up to 11x improvement in throughput and 94% reduction in confirmation time. They also show that chainBoost can reduce the main blockchain size by about 90%, and that it outperforms comparable optimistic rollup solutions by reducing transaction finality by 99.7%.
Archisman Ghosh, Md. Abdur Rahman, Debayan Das, Santosh Ghosh, Shreyas Sen
ePrint Report
Mathematically secured cryptographic implementations leak critical information in terms of power, EM emanations, etc. Several circuit-level countermeasures are proposed to hinder side channel leakage at the source. Circuit-level countermeasures (e.g., IVR, STELLAR, WDDL, etc) are often preferred as they are generic and have low overhead. They either dither the voltage randomly or attenuate the meaningful signature at $V_{DD}$ port. Although any digital implementation has two generic ports, namely clock and $V_{DD}$, circuit-level countermeasures primarily focus on $V_{DD}$ port, and countermeasures using the clock are mainly unexplored. System-level clock randomization is ineffective due to post-processing techniques. This work, for the first time, presents clock-based countermeasures by providing a controlled slew that exploits the inherent variability of digital circuits in terms of power consumption and transforms power/EM emanation into a complex function of data and slew. Due to this, minimum traces-to-disclosure (MTD) improves by 100$\times$ with respect to the unprotected one.
Moreover, the slewed clock reduces the leaky frequency, and the clock randomization countermeasure is more effective as it becomes more difficult} to post-process in the frequency domain. Clock slew and randomization together have a cumulative effect(1800x) more than the multiplication of individual techniques (100x & 5x respectively). In brief, this paper presents a clock-level generic synthesizable countermeasure technique that improved the minimum-traces-to-disclosure (MTD) by 1800$\times$ and incurs only 11% area overhead, $<3\%$ power overhead (measured) and $<6\%$ performance overhead (measured). Moreover, this can be easily combined with other power-port-based mitigation techniques for enhanced security.
Alan Li, Qingkai Liang, Mo Dong
ePrint Report
As deep learning is being widely adopted across various domains, ensuring the integrity of models has become increasingly crucial. Despite the recent advances in Zero-Knowledge Machine Learning (ZKML) techniques, proving the inference over large ML models is still prohibitive. To enable practical ZKML, model simplification techniques like pruning and quantization should be applied without hesitation. Contrary to conventional belief, recent development in ML space have demonstrated that these simplification techniques not only condense complex models into forms with sparse, low-bit weight matrices, but also maintain exceptionally high model accuracies that matches its unsimplified counterparts.
While such transformed models seem inherently ZK-friendly, directly applying existing ZK proof frameworks still lead to suboptimal inference proving performance. To make ZKML truly practical, a quantization-and-pruning-aware ZKML framework is needed. In this paper, we propose SpaGKR, a novel sparsity-aware ZKML framework that is proven to surpass capabilities of existing ZKML methods. SpaGKR is a general framework that is widely applicable to any computation structure where sparsity arises. It is designed to be modular - all existing GKR-based ZKML frameworks can be seamlessly integrated with it to get remarkable compounding performance enhancements. We tailor SpaGKR specifically to the most commonly-used neural network structure - the linear layer, and propose the SpaGKR-LS protocol that achieves asymptotically optimal prover time. Notably, when applying SpaGKR-LS to a special series of simplified model - ternary network, it achieves further efficiency gains by additionally leveraging the low-bit nature of model parameters.
While such transformed models seem inherently ZK-friendly, directly applying existing ZK proof frameworks still lead to suboptimal inference proving performance. To make ZKML truly practical, a quantization-and-pruning-aware ZKML framework is needed. In this paper, we propose SpaGKR, a novel sparsity-aware ZKML framework that is proven to surpass capabilities of existing ZKML methods. SpaGKR is a general framework that is widely applicable to any computation structure where sparsity arises. It is designed to be modular - all existing GKR-based ZKML frameworks can be seamlessly integrated with it to get remarkable compounding performance enhancements. We tailor SpaGKR specifically to the most commonly-used neural network structure - the linear layer, and propose the SpaGKR-LS protocol that achieves asymptotically optimal prover time. Notably, when applying SpaGKR-LS to a special series of simplified model - ternary network, it achieves further efficiency gains by additionally leveraging the low-bit nature of model parameters.
Senegue Gomez Nyamsi, Laurian Guimagang Azebaze, Emmanuel Fouotsa
ePrint Report
Since the advent of pairing based cryptography, many researchers have developed several techniques and variants of pairings to optimise the speed of pairing computations. The selection of the elliptic curve for a given pairing based protocol is crucial for operations in the first and second pairing groups of points of the elliptic curve and for many cryptographic schemes. A new variant of superoptimal pairing was proposed in 2023, namely x-superoptimal pairing on curves with odd prime embedding degrees BW13-310 and BW19-286. This paper extends the definition of the x-superoptimal pairing on elliptic curves with even embedding degrees BW10-511 and BW14-351 at 128 bits security level. We provide a suitable formula of the x-superoptimal pairing on BW10-511 and BW14-351 where the Miller loop is about $13.5\%$ and $21.6\%$ faster than the optimal ate pairing on BW10-511 and BW14-351 respectively. The correctness of the x-superoptimal pairing on BW10-511 and BW14-351 and bilinearity has been verified by a Magma code.
Rui Gao, Zhiguo Wan, Yuncong Hu, Huaqun Wang
ePrint Report
A range proof serves as a protocol for the prover to prove to the verifier that a committed number lies in a specified range, such as $[0,2^n)$, without disclosing the actual value. Range proofs find extensive application in various domains. However, the efficiency of many existing schemes diminishes significantly when confronted with batch proofs encompassing multiple elements.
To improve the scalability and efficiency, we propose MissileProof, a vector range proof scheme, proving that every element in the committed vector is within $[0,2^n)$. We first reduce this argument to a bi-to-univariate SumCheck problem and a bivariate polynomial ZeroTest problem. Then generalizing the idea of univariate SumCheck PIOP, we design a bi-to-univariate SumCheck PIOP. By introducing a random polynomial, we construct the bivariate polynomial ZeroTest using a univariate polynomial ZeroTest and a univariate polynomial SumCheck PIOP. Finally, combining the PIOP for vector range proof, a KZG-based polynomial commitment scheme and the Fiat-Shamir transformation, we get a zero-knowledge succinct non-interactive vector range proof.
Compared with existing schemes, our scheme has the optimal proof size ($O(1)$), the optimal commitment length ($O(1)$),
and the optimal verification time ($O(1)$), at the expense of slightly sacrificing proof time ($O(l\log l\cdot n\log n)$ operations on the prime field for FFT and $O(ln)$ group exponentiations in $\mathbb{G}$). Moreover, we implemented an anti-money-laundering stateless blockchain based on the MissileProof. The gas consumption of the verification smart contract is reduced by 85%.
Mingfei Yu, Giovanni De Micheli
ePrint Report
Fully homomorphic encryption (FHE) enables secure data processing without compromising data access, but its computational cost and slower execution compared to plaintext operations pose challenges. The growing interest in FHE-based secure computation necessitates the acceleration of homomorphic computations. While existing research primarily targets the reduction of the multiplicative depth (MD) of homomorphic circuits, this paper addresses the trade-off between MD reduction and the increase in multiplicative complexity (MC), a critical gap often overlooked during circuit optimization and potentially resulting in suboptimal outcomes. Three contributions are presented: (a) an exact synthesis paradigm for optimal homomorphic circuit implementations, (b) an efficient heuristic algorithm named MC-aware MD minimization, and (c) a homomorphic circuit optimization flow combining MC-aware MD minimization with existing MD reduction techniques. Experimental results demonstrate a 21.32% average reduction in homomorphic computation time and showcase significantly improved efficiency in circuit optimization.
Jung Hee Cheon, Hyeongmin Choe, Minsik Kang, Jaehyung Kim
ePrint Report
The RNS variant of the CKKS scheme (SAC 2018) is widely implemented due to its computational efficiency. However, the current optimized implementations of the RNS-CKKS scheme have a limitation when choosing the ciphertext modulus. It requires the scale factors to be approximately equal to a factor (or a product of factors) of the ciphertext modulus. This restriction causes inefficiency when the scale factor is not close to the power of the machine's word size, wasting the machine's computation budget.
In this paper, we solve this implementation-side issue algorithmically by introducing \emph{Grafting}, a ciphertext modulus management system. In Grafting, we mitigate the link between the ciphertext modulus and the application-dependent scale factor. We efficiently enable rescaling by an arbitrary amount of bits by suggesting a method managing the ciphertext modulus with mostly word-sized factors. Thus, we can fully utilize the machine architecture with word-sized factors of the ciphertext modulus while keeping the application-dependent scale factors. This also leads to hardware-friendly RNS-CKKS implementation as a side effect. Furthermore, we apply our technique to Tuple-CKKS multiplication (CCS 2023), solving a restriction due to small scale factors.
Our proof-of-concept implementation shows that the overall complexity of RNS-CKKS is almost proportional to the number of coprime factors comprising the ciphertext modulus, of size smaller than the machine's word size. This results in a substantial speed-up from Grafting: $17$-$51$% faster homomorphic multiplications and $43$% faster CoeffsToSlots in bootstrapping, implemented based on the HEaaN library. We estimate that the computational gain could range up to $1.71\times$ speed-up for the current parameters used in the RNS-CKKS libraries.
In this paper, we solve this implementation-side issue algorithmically by introducing \emph{Grafting}, a ciphertext modulus management system. In Grafting, we mitigate the link between the ciphertext modulus and the application-dependent scale factor. We efficiently enable rescaling by an arbitrary amount of bits by suggesting a method managing the ciphertext modulus with mostly word-sized factors. Thus, we can fully utilize the machine architecture with word-sized factors of the ciphertext modulus while keeping the application-dependent scale factors. This also leads to hardware-friendly RNS-CKKS implementation as a side effect. Furthermore, we apply our technique to Tuple-CKKS multiplication (CCS 2023), solving a restriction due to small scale factors.
Our proof-of-concept implementation shows that the overall complexity of RNS-CKKS is almost proportional to the number of coprime factors comprising the ciphertext modulus, of size smaller than the machine's word size. This results in a substantial speed-up from Grafting: $17$-$51$% faster homomorphic multiplications and $43$% faster CoeffsToSlots in bootstrapping, implemented based on the HEaaN library. We estimate that the computational gain could range up to $1.71\times$ speed-up for the current parameters used in the RNS-CKKS libraries.