IACR News
Here you can see all recent updates to the IACR webpage. These updates are also available:
05 February 2024
Xiaohai Dai, Guanxiong Wang, Jiang Xiao, Zhengxuan Guo, Rui Hao, Xia Xie, Hai Jin
ePrint Report
To improve the throughput of Byzantine Fault Tolerance (BFT) consensus protocols, the Directed Acyclic Graph (DAG) topology has been introduced to parallel data processing, leading to the development of DAG-based BFT consensus. However, existing DAG-based works heavily rely on Reliable Broadcast (RBC) protocols for block broadcasting, which introduces significant latency due to the three communication steps involved in each RBC. For instance, DAGRider, a representative DAG-based protocol, exhibits a best latency of 12 steps, considerably higher than non-DAG protocols like PBFT, which only requires 3 steps. To tackle this issue, we propose LightDAG, which replaces RBC with lightweight broadcasting protocols such as Consistent Broadcast (CBC) and Plain Broadcast (PBC). Since CBC and PBC can be implemented in two and one communication steps, respectively, LightDAG achieves low latency.
In our proposal, we present two variants of LightDAG, namely LightDAG1 and LightDAG2, each providing a trade-off between the best latency and the expected worst latency. In LightDAG1, every block is broadcast using CBC, which exhibits a best latency of 5 steps and an expected worst latency of 14 steps. Since CBC cannot guarantee the totality property, we design a block retrieval mechanism in LightDAG1 to assist replicas in retrieving missing blocks. LightDAG2 utilizes a combination of PBC and CBC for block broadcasting, resulting in a best latency of 4 steps and an expected worst latency of $12(t+1)$ steps, where $t$ represents the number of actual Byzantine replicas. Since a Byzantine replica may equivocate through PBC, LightDAG2 prohibits blocks from directly referencing contradictory blocks. To ensure liveness, we propose a mechanism to identify and exclude Byzantine replicas if they engage in equivocation attacks. Extensive experiments have been conducted to evaluate LightDAG, and the results demonstrate its feasibility and efficiency.
Suvradip Chakraborty, Stanislav Peceny, Srinivasan Raghuraman, Peter Rindal
ePrint Report
Secure merge considers the problem of combining two sorted lists into a single sorted secret-shared list. Merge is a fundamental building block for many real-world applications. For example, secure merge can implement a large number of SQL-like database joins, which are essential for almost any data processing task such as privacy-preserving fraud detection, ad conversion rates, data deduplication, and many more.
We present two constructions with communication bandwidth and rounds tradeoff. Logstar, our bandwidth-optimized construction, takes inspiration from Falk and Ostrovsky (ITC, 2021) and runs in $O(n\log^*n)$ time and communication with $O(\log n)$ rounds. In particular, for all conceivable $n$, the $\log^*n$ factor will be equal to the constant $2$ and therefore we achieve a near-linear running time. Median, our rounds-optimized construction, builds on the classic parallel median-based merge approach of Valiant (SIAM J. Comput., 1975), and requires $O(n \log^c n)$, $1
We introduce two additional constructions that merge input lists of different sizes. SquareRootMerge, merges lists of sizes $n^{\frac{1}{2}}$ and $n$, and runs in $O(n)$ time and communication with $O(\log n)$ rounds. CubeRootMerge is inspired by Blunk et al.'s (2022) construction and merges lists of sizes $n^{\frac{1}{3}}$ and $n$. It runs in $O(n)$ time and communication with $O(1)$ rounds.
We optimize our constructions for concrete efficiency. Today, concretely efficient secure merge protocols rely on standard techniques such as GMW or generic sorting. These approaches require a $O(n \log n)$ sized circuit of $O(\log n)$ depth. In contrast, our constructions are efficient and achieve superior asymptotics. We benchmark our constructions and obtain significant improvements. For example, Logstar reduces bandwidth costs $\approx 3.3\times$ and Median reduces rounds $\approx2.43\times$.
We present two constructions with communication bandwidth and rounds tradeoff. Logstar, our bandwidth-optimized construction, takes inspiration from Falk and Ostrovsky (ITC, 2021) and runs in $O(n\log^*n)$ time and communication with $O(\log n)$ rounds. In particular, for all conceivable $n$, the $\log^*n$ factor will be equal to the constant $2$ and therefore we achieve a near-linear running time. Median, our rounds-optimized construction, builds on the classic parallel median-based merge approach of Valiant (SIAM J. Comput., 1975), and requires $O(n \log^c n)$, $1
We introduce two additional constructions that merge input lists of different sizes. SquareRootMerge, merges lists of sizes $n^{\frac{1}{2}}$ and $n$, and runs in $O(n)$ time and communication with $O(\log n)$ rounds. CubeRootMerge is inspired by Blunk et al.'s (2022) construction and merges lists of sizes $n^{\frac{1}{3}}$ and $n$. It runs in $O(n)$ time and communication with $O(1)$ rounds.
We optimize our constructions for concrete efficiency. Today, concretely efficient secure merge protocols rely on standard techniques such as GMW or generic sorting. These approaches require a $O(n \log n)$ sized circuit of $O(\log n)$ depth. In contrast, our constructions are efficient and achieve superior asymptotics. We benchmark our constructions and obtain significant improvements. For example, Logstar reduces bandwidth costs $\approx 3.3\times$ and Median reduces rounds $\approx2.43\times$.
Pousali Dey, Pratyay Mukherjee, Swagata Sasmal, Rohit Sinha
ePrint Report
Threshold symmetric encryption (TSE), introduced by Agrawal et al. [DiSE, CCS 2018], provides scalable and decentralized solution for symmetric encryption by ensuring that the secret-key stays distributed at all times. They avoid having a single point of attack or failure, while achieving the necessary security requirements. TSE was further improved by Christodorescu et al. [ATSE, CCS 2021] to support an amortization feature which enables a “more privileged” client to encrypt records in bulk by interacting only once with the key servers, while decryption must be performed individually for each record, potentially by a “less privileged” client. However, typical enterprises collect or generate data once and query it several times over its lifecycle in various data processing pipelines; i.e., enterprise workloads are often decryption heavy! ATSE does not meet the bar for this setting because of linear interaction / computation (in the number of records to be decrypted) – our experiments show that ATSE provides a sub-par throughput of a few hundred records / sec.
We observe that a large class of queries read a subsequence of records (e.g. a time window) from the database. With this access structure in mind, we build a new TSE scheme which allows for both encryption and decryption with flexible granularity, in that a client’s interactions with the key servers is at most logarithmic in the number of records. Our idea is to employ a binary-tree access structure over the data, where only one interaction is needed to decrypt all ciphertexts within a sub-tree, and thus only log-many for any arbitrary size sub-sequence. Our scheme incorporates ideas from binary-tree encryption by Canetti et al. [Eurocrypt 2003] and its variants, and carefully merges that with Merkle-tree commitments to fit into the TSE setting. We formalize this notion as hierarchical threshold symmetric-key encryption (HiSE), and argue that our construction satisfies all essential TSE properties, such as correctness, privacy and authenticity with respect to our definition. Our analysis relies on a well-known XDH assumption and a new assumption, that we call $\ell$-masked BDDH, over asymmetric bilinear pairing in the programmable random oracle model. We also show that our new assumption does hold in generic group model.
We provide an open-source implementation of HiSE. For practical parameters, we see 65$\times$ improvement in latency and throughput over ATSE. HiSE can decrypt over 6K records / sec on server-grade hardware, but the logarithmic overhead in HiSE’s encryption (not decryption) only lets us encrypt up to 3K records / sec (about 3-4.5$\times$ slowdown) and incurs roughly 500 bytes of ciphertext expansion per record – while reducing this penalty is an important future work, we believe HiSE can offer an acceptable tradeoff in practice.
We observe that a large class of queries read a subsequence of records (e.g. a time window) from the database. With this access structure in mind, we build a new TSE scheme which allows for both encryption and decryption with flexible granularity, in that a client’s interactions with the key servers is at most logarithmic in the number of records. Our idea is to employ a binary-tree access structure over the data, where only one interaction is needed to decrypt all ciphertexts within a sub-tree, and thus only log-many for any arbitrary size sub-sequence. Our scheme incorporates ideas from binary-tree encryption by Canetti et al. [Eurocrypt 2003] and its variants, and carefully merges that with Merkle-tree commitments to fit into the TSE setting. We formalize this notion as hierarchical threshold symmetric-key encryption (HiSE), and argue that our construction satisfies all essential TSE properties, such as correctness, privacy and authenticity with respect to our definition. Our analysis relies on a well-known XDH assumption and a new assumption, that we call $\ell$-masked BDDH, over asymmetric bilinear pairing in the programmable random oracle model. We also show that our new assumption does hold in generic group model.
We provide an open-source implementation of HiSE. For practical parameters, we see 65$\times$ improvement in latency and throughput over ATSE. HiSE can decrypt over 6K records / sec on server-grade hardware, but the logarithmic overhead in HiSE’s encryption (not decryption) only lets us encrypt up to 3K records / sec (about 3-4.5$\times$ slowdown) and incurs roughly 500 bytes of ciphertext expansion per record – while reducing this penalty is an important future work, we believe HiSE can offer an acceptable tradeoff in practice.
Jeroen van de Graaf, Arjen K. Lenstra
ePrint Report
Almost all practical cryptographic protocols are based on computational or ad-hoc assumptions. Assessing the strengths of these assumptions is therefore a key factor in evaluating the risks of the systems using them. As a service to (and by) cryptographic researchers and practitioners, we propose to create Delphi, a public database where researchers document their opinions and beliefs about the strengths of the most important assumptions. We believe this effort will be of great value when deciding which cryptographic primitives to keep or start using.
Philippe Chartier, Michel Koskas, Mohammed Lemou, Florian Méhats
ePrint Report
In the context of fully-homomorphic-encryption, we consider the representation of large integers by their decomposition over a product of rings (through the Chinese Remainder Theorem) and introduce a new algorithm for the determination of the sign solely through the knowledge of ring-components. We then prove that our algorithm delivers a correct result with a very high probability.
Philippe Chartier, Michel Koskas, Mohammed Lemou, Florian Méhats
ePrint Report
At the core of fully homomorphic encryption lies a procedure to refresh the ciphertexts whose noise component has grown too big. The efficiency of the so-called bootstrap is of paramount importance as it is usually regarded as the main bottleneck towards a real-life deployment of fully homomorphic crypto-systems. In two of the fastest implementations so far, the space of messages is limited to binary integers. If the message space is extended to the discretized torus $T_{p_i}$ or equivalently to $Z_{p_i}$ with values of $p_i$ large as compared to the dimension of the quotient ring in which the operations are realised, the bootstrap delivers incorrect results with far too high probability. As a consequence, the use of a residue numeral system to address large integers modulo $p=p_1 \times \ldots \times p_\kappa$ would be of limited interest in practical situations without the following remedy: rather than increasing the polynomial degree and thus the computational cost, we introduce here a novel and simple technique (hereafter referred to as ``collapsing") which, by grouping the components of the mask, attenuates both rounding errors and computational costs, and greatly helps to sharpen the correctness of the bootstrap. We then rigorously estimate the probability of success as well as the output error and determine practical parameters to reach a given correctness threshold.
Aurélien Dupin, Simon Abelard
ePrint Report
The problem of Broadcast Encryption (BE) consists in broadcasting an encrypted message to a large number of users or receiving devices in such a way that the emitter of the message can control which of the users can or cannot decrypt it.
Since the early 1990's, the design of BE schemes has received significant interest and many different concepts were proposed. A major breakthrough was achieved by Naor, Naor and Lotspiech (CRYPTO 2001) by partitioning cleverly the set of authorized users and associating a symmetric key to each subset. Since then, while there have been many advances in public-key based BE schemes, mostly based on bilinear maps, little was made on symmetric cryptography.
In this paper, we design a new symmetric-based BE scheme, named $\Sigma\Pi$BE, that relies on logic optimization and consensual security assumptions. It is competitive with the work of Naor et al. and provides a different tradeoff: the bandwidth requirement is significantly lowered at the cost of an increase in the key storage.
Since the early 1990's, the design of BE schemes has received significant interest and many different concepts were proposed. A major breakthrough was achieved by Naor, Naor and Lotspiech (CRYPTO 2001) by partitioning cleverly the set of authorized users and associating a symmetric key to each subset. Since then, while there have been many advances in public-key based BE schemes, mostly based on bilinear maps, little was made on symmetric cryptography.
In this paper, we design a new symmetric-based BE scheme, named $\Sigma\Pi$BE, that relies on logic optimization and consensual security assumptions. It is competitive with the work of Naor et al. and provides a different tradeoff: the bandwidth requirement is significantly lowered at the cost of an increase in the key storage.
Robin Geelen
ePrint Report
Numerous algorithms in homomorphic encryption require an operation that moves the slots of a ciphertext to the coefficients of a different ciphertext. We describe an FFT-like method for decomposing this slot-to-coefficient transformation (and its inverse) for BGV and BFV. The proposed method is specific to power-of-two cyclotomic rings and can handle both fully and sparsely packed slots. Previously, such a method was only known for non-power-of-two cyclotomic rings.
Our algorithm admits more freedom in the complexity-depth trade-off than prior works. Moreover, it brings down the computational complexity of the slot-to-coefficient transformation from a linear to a logarithmic number of FHE operations in the best case, which is shown via a detailed complexity analysis. We also provide a proof-of-concept implementation in the Magma bootstrapping library.
Our algorithm admits more freedom in the complexity-depth trade-off than prior works. Moreover, it brings down the computational complexity of the slot-to-coefficient transformation from a linear to a logarithmic number of FHE operations in the best case, which is shown via a detailed complexity analysis. We also provide a proof-of-concept implementation in the Magma bootstrapping library.
Patrick Derbez, Marie Euler
ePrint Report
This paper focuses on equivalences between Generalised Feistel Networks (GFN) of type-II. We introduce a new definition of equivalence which captures the concept that two GFNs are identical up to re-labelling of the inputs/outputs, and give a procedure to test this equivalence relation. Such two GFNs are therefore cryptographically equivalent for several classes of attacks. It induces a reduction of the space of possible GFNs: the set of the $(k!)^2$ possible even-odd GFNs with $2k$ branches can be partitioned into $k!$ different classes.
This result can be very useful when looking for an optimal GFN regarding specific computationally intensive properties, such as the minimal number of active S-boxes in a differential trail. We also show that in several previous papers, many GFN candidates are redundant as they belong to only a few classes. Because of this reduction of candidates, we are also able to suggest better permutations than the one of WARP: they reach 64 active S-boxes in one round less and still have the same diffusion round that WARP. Finally, we also point out a new family of permutations with good diffusion properties.
This result can be very useful when looking for an optimal GFN regarding specific computationally intensive properties, such as the minimal number of active S-boxes in a differential trail. We also show that in several previous papers, many GFN candidates are redundant as they belong to only a few classes. Because of this reduction of candidates, we are also able to suggest better permutations than the one of WARP: they reach 64 active S-boxes in one round less and still have the same diffusion round that WARP. Finally, we also point out a new family of permutations with good diffusion properties.
02 February 2024
Antonio Flórez-Gutiérrez, Yosuke Todo
ePrint Report
In some linear key recovery attacks, the function which determines the value of the linear approximation from the plaintext, ciphertext and key is replaced by a similar map in order to improve the time or memory complexity at the cost of a data complexity increase. We propose a general framework for key recovery map substitution, and introduce Walsh spectrum puncturing, which consists of removing carefully-chosen coefficients from the Walsh spectrum of this map. The capabilities of this technique are illustrated by describing improved attacks on reduced-round Serpent (including the first 12-round attack on the 192-bit key variant), GIFT-128 and NOEKEON, as well as the full DES.
Samuel Stevens, Emily Wenger, Cathy Yuanchen Li, Niklas Nolte, Eshika Saxena, Francois Charton, Kristin Lauter
ePrint Report
Learning with Errors (LWE) is a hard math problem underlying post-quantum cryptography (PQC) systems for key exchange and digital signatures, recently standardized by NIST. Prior work [Wenger et al., 2022; Li et al., 2023a;b] proposed new machine learning (ML)-based attacks on LWE problems with small, sparse secrets, but these attacks require millions of LWE samples to train on and take days to recover secrets. We propose three key methods—better pre-processing, angular embeddings and model pre-training—to improve these attacks, speeding up preprocessing by 25× and improving model sample efficiency by 10×. We demonstrate for the first time that pre-training improves and reduces
the cost of ML attacks on LWE. Our architecture improvements enable scaling to larger-dimension LWE problems: this work is the first instance of ML attacks recovering sparse binary secrets in dimension n = 1024, the smallest dimension used in practice for homomorphic encryption applications of LWE where sparse binary secrets are proposed.
Shing Hing William Cheng, Chitchanok Chuengsatiansup, Daniel Genkin, Dallas McNeil, Toby Murray, Yuval Yarom, Zhiyuan Zhang
ePrint Report
Speculative out-of-order execution is a strategy of masking execution latency by allowing younger instructions to execute before older instructions. While originally considered to be innocuous, speculative out-of-order execution was brought into the spotlight with the 2018 publication of the Spectre and Meltdown attacks. These attacks demonstrated that microarchitectural side channels can leak sensitive data accessed by speculatively executed instructions that are not part of the normal program execution. Since then, a significant effort has been vested in investigating how microarchitectural side channels can leak data from speculatively executed instructions and how to control this leakage. However, much less is known about how speculative out-of-order execution affects microarchitectural side-channel attacks.
In this paper, we investigate how speculative out-of-order execution affects the Evict+Time cache attack. Evict+Time is based on the observation that cache misses are slower than cache hits, hence by measuring the execution time of code, an attacker can determine if a cache miss occurred during the execution. We demonstrate that, due to limited resources for tracking out-of-order execution, under certain conditions an attacker can gain more fine-grained information and determine whether a cache miss occurred in part of the executed code.
Based on the observation, we design the Evict+Spec+Time attack, a variant of Evict+Time that can learn not only whether a cache miss occurred, but also in which part of the victim code it occurred. We demonstrate that Evict+Spec+Time is an order of magnitude more efficient than Evict+Time when attacking a T-table-based implementation of AES. We further show an Evict+Spec+Time attack on an S-box-based implementation of AES, recovering the key with as little as 14389 decryptions. To the best of our knowledge, ours is the first successful Evict+Time attack on such a victim.
In this paper, we investigate how speculative out-of-order execution affects the Evict+Time cache attack. Evict+Time is based on the observation that cache misses are slower than cache hits, hence by measuring the execution time of code, an attacker can determine if a cache miss occurred during the execution. We demonstrate that, due to limited resources for tracking out-of-order execution, under certain conditions an attacker can gain more fine-grained information and determine whether a cache miss occurred in part of the executed code.
Based on the observation, we design the Evict+Spec+Time attack, a variant of Evict+Time that can learn not only whether a cache miss occurred, but also in which part of the victim code it occurred. We demonstrate that Evict+Spec+Time is an order of magnitude more efficient than Evict+Time when attacking a T-table-based implementation of AES. We further show an Evict+Spec+Time attack on an S-box-based implementation of AES, recovering the key with as little as 14389 decryptions. To the best of our knowledge, ours is the first successful Evict+Time attack on such a victim.
Charles Bouillaguet, Julia Sauvage
ePrint Report
Biscuit is a recent multivariate signature scheme based on the MPC-in-the-Head paradigm. It has been submitted to the NIST competition for additional signature schemes. Signatures are derived from a zero-knowledge proof of knowledge of the solution of a structured polynomial system. This extra structure enables efficient proofs and compact signatures. This short note demonstrates that it also makes these polynomial systems easier to solve than random ones. As a consequence, the original parameters of Biscuit failed to meet the required security levels and had to be upgraded.
Thorben Moos, Sayandeep Saha, François-Xavier Standaert
ePrint Report
Fault injection attacks are a serious concern for cryptographic hardware. Adversaries may extract sensitive information from the faulty output that is produced by a cryptographic circuit after actively disturbing its computation. Alternatively, the information whether an output would have been faulty, even if it is withheld from being released, may be exploited. The former class of attacks, which requires the collection of faulty outputs, such as Differential Fault Analysis (DFA), then either exploits some knowledge about the position of the injected fault or about its value. The latter class of attacks, which can be applied without ever obtaining faulty outputs, such as Statistical Ineffective Fault Attacks (SIFA), then either exploits a dependency between the effectiveness of the fault injection and the value to be faulted (e.g., an LSB stuck-at-0 only affecting odd numbers), denoted as SIFA-1, or a conditional propagation of a faulted value based on a sensitive intermediate (e.g., multiplication of a faulted value by 0 prevents propagation), denoted as SIFA-2. The aptitude of additive masking schemes, which were designed to prevent side-channel analysis, to also thwart fault attacks is typically assumed to be limited. Common fault models, such as toggle/bit-flip, stuck-at-0 or stuck-at-1 survive the recombination of Boolean shares well enough for generic attacks to succeed. More precisely, injecting a fault into one or multiple Boolean shares often results in the same, or at least a predictable, error appearing in the sensitive variable after recombination. In this work, we show that additive masking in prime-order fields breaks such relationships, causing frequently exploited biases to decrease exponentially in the number of shares. As a result, prime masking offers surprisingly strong protection against generic statistical attacks, which require a dependency between the effectiveness of an injected fault and the secret variable that is manipulated, such as SIFA-1. Operation-dependent statistical attacks, such as SIFA-2 and Fault Template Attacks (FTA), may still be performed against certain prime-field structures, even if they are masked with many shares. Yet, we analyze the corresponding cases and are able to provide specific guidelines on how to avoid vulnerabilities either at the cipher design or implementation level by making informed decisions about the primes, non-linear mappings and masked gadgets used. Since prime-field masking appears to be one of the rare instances of affordable countermeasures that naturally provide sound protection against sidechannel analysis and certain fault injection attacks, we believe there is a strong incentive for developing new ciphers to leverage these advantages.
Jonathan Komada Eriksen, Antonin Leroux
ePrint Report
This work introduces several algorithms related to the computation of orientations in endomorphism rings of supersingular elliptic curves. This problem boils down to representing integers by ternary quadratic forms, and it is at the heart of several results regarding the security of oriented-curves in isogeny-based cryptography.
Our main contribution is to show that there exists efficient algorithms that can solve this problem for quadratic orders of discriminant $n$ up to $O(p^{4/3})$. Our approach improves upon previous results by increasing this bound from $O(p)$ to $O(p^{4/3})$ and removing some heuristics.
We introduce several variants of our new algorithm and provide a careful analysis of their asymptotic running time (without heuristic when it is possible). The best proven asymptotic complexity of one of our variant is $O(n^{3/4}/p)$ in average. The best heuristic variant has a complexity of $O(p^{1/3})$ for big enough $n$.
We then introduce several results regarding the computation of ideals between oriented orders. The first application of this is a simplification of the known reduction from vectorization to computing the endomorphism ring, removing the assumption on the factorization of the discriminant. As a second application, we relate the problem of computing fixed-degree isogenies between supersingular curves to the problem of computing orientations in endomorphism rings, and we show that for a large range of degree $d$, our new algorithms improve on the state-of-the-art, and in important special cases, the range of degree $d$ for which there exist a polynomial-time algorithm is increased. In the most special case we consider, when both curves are oriented by a small degree endomorphism, we show heuristically that our techniques allow the computation of isogenies of any degree, assuming they exist.
Charlotte Hoffmann, Pavel Hubáček, Svetlana Ivanova
ePrint Report
A Proof of Exponentiation (PoE) allows a prover to efficiently convince a verifier that $y=x^e$ in some group of unknown order. PoEs are the basis for practical constructions of Verifiable Delay Functions (VDFs), which, in turn, are important for various higher-level protocols in distributed computing. In applications such as distributed consensus, many PoEs are generated regularly, motivating protocols for secure aggregation of batches of statements into a few statements to improve the efficiency for both parties. Rotem (TCC 2021) recently presented two such generic batch PoEs.
In this work, we introduce two batch PoEs that outperform both proposals of Rotem and we evaluate their practicality. First, we show that the two batch PoEs of Rotem can be combined to improve the overall efficiency by at least a factor of two. Second, we revisit the work of Bellare, Garay, and Rabin (EUROCRYPT 1998) on batch verification of digital signatures and show that, under the low order assumption, their bucket test can be securely adapted to the setting of groups of unknown order. The resulting batch PoE quickly outperforms the state of the art in the expected number of group multiplications with the growing number of instances, and it decreases the cost of batching by an order of magnitude already for hundreds of thousands of instances. Importantly, it is the first batch PoE that significantly decreases both the proof size and complexity of verification. Our experimental evaluations show that even a non-optimized implementation achieves such improvements, which would match the demands of real-life systems requiring large-scale PoE processing.
Finally, even though our proof techniques are conceptually similar to Rotem, we give an improved analysis of the application of the low order assumption towards secure batching of PoE instances, resulting in a tight reduction, which is important when setting the security parameter in practice.
In this work, we introduce two batch PoEs that outperform both proposals of Rotem and we evaluate their practicality. First, we show that the two batch PoEs of Rotem can be combined to improve the overall efficiency by at least a factor of two. Second, we revisit the work of Bellare, Garay, and Rabin (EUROCRYPT 1998) on batch verification of digital signatures and show that, under the low order assumption, their bucket test can be securely adapted to the setting of groups of unknown order. The resulting batch PoE quickly outperforms the state of the art in the expected number of group multiplications with the growing number of instances, and it decreases the cost of batching by an order of magnitude already for hundreds of thousands of instances. Importantly, it is the first batch PoE that significantly decreases both the proof size and complexity of verification. Our experimental evaluations show that even a non-optimized implementation achieves such improvements, which would match the demands of real-life systems requiring large-scale PoE processing.
Finally, even though our proof techniques are conceptually similar to Rotem, we give an improved analysis of the application of the low order assumption towards secure batching of PoE instances, resulting in a tight reduction, which is important when setting the security parameter in practice.
Maria Corte-Real Santos, Craig Costello, Benjamin Smith
ePrint Report
We give an alternative derivation of (N,N)-isogenies between fast Kummer surfaces which complements existing works based on the theory of theta functions. We use this framework to produce explicit formulae for the case of N = 3, and show that the resulting algorithms are more efficient than all prior (3,3)-isogeny algorithms.
Xuanming Liu, Zhelei Zhou, Yinghao Wang, Bingsheng Zhang, Xiaohu Yang
ePrint Report
The notion of collaborative zk-SNARK is introduced by Ozdemir and Boneh (USENIX 2022), which allows multiple parties to jointly create a zk-SNARK proof over distributed secrets (also known as the witness).
This approach ensures the privacy of the witness, as no corrupted servers involved in the proof generation can learn anything about the honest servers' witness.
Later, Garg et al. continued the study, focusing on how to achieve faster proof generation (USENIX 2023).
However, their approach requires a powerful server that is responsible for the most resource-intensive computations and communications during the proof generation.
This requirement results in a scalability bottleneck, making their protocols unable to handle large-scale circuits.
In this work, we address this issue by lifting a zk-SNARK called Libra (Crypto 2019) to a collaborative zk-SNARK and achieve a fully distributed proof generation, where all servers take roughly the same portion of the total workload. Further, our protocol can be adapted to be secure against a malicious adversary by incorporating some verification mechanisms. With 128 consumer machines and a 4Gbps network, we successfully generate a proof for a data-parallel circuit containing $2^{23}$ gates in merely 2.5 seconds and take only 0.5 GB memory for each server. This represents a $19\times$ speed-up, compared to a local Libra prover. Our benchmark further indicates an impressive 877$\times$ improvement in running time and a 992$\times$ enhancement in communication compared to the implementation in previous work. Furthermore, our protocol is capable of handling larger circuits, making it scalable in practice.
In this work, we address this issue by lifting a zk-SNARK called Libra (Crypto 2019) to a collaborative zk-SNARK and achieve a fully distributed proof generation, where all servers take roughly the same portion of the total workload. Further, our protocol can be adapted to be secure against a malicious adversary by incorporating some verification mechanisms. With 128 consumer machines and a 4Gbps network, we successfully generate a proof for a data-parallel circuit containing $2^{23}$ gates in merely 2.5 seconds and take only 0.5 GB memory for each server. This represents a $19\times$ speed-up, compared to a local Libra prover. Our benchmark further indicates an impressive 877$\times$ improvement in running time and a 992$\times$ enhancement in communication compared to the implementation in previous work. Furthermore, our protocol is capable of handling larger circuits, making it scalable in practice.
Xiaohai Dai, Zhaonan Zhang, Jiang Xiao, Jingtao Yue, Xia Xie, Hai Jin
ePrint Report
To enable parallel processing, the Directed Acyclic Graph (DAG) structure is introduced to the design of asynchronous Byzantine Fault Tolerant (BFT) consensus protocols, known as DAG-based BFT. Existing DAG-based BFT protocols operate in successive waves, with each wave containing three or four Reliable Broadcast (RBC) rounds to broadcast data, resulting in high latency due to the three communication steps required in each RBC. For instance, Tusk, a state-of-the-art DAG-based BFT protocol, has a good-case latency of 7 communication steps and an expected worst latency of 21 communication steps.
To reduce latency, we propose GradedDAG, a new DAG-based BFT consensus protocol based on our adapted RBC called Graded RBC (GRBC) and the Consistent Broadcast (CBC), with each wave consisting of only one GRBC round and one CBC round. Through GRBC, a replica can deliver data with a grade of 1 or 2, and a non-faulty replica delivering the data with grade 2 can ensure that more than 2/3 of replicas have delivered the same data. Meanwhile, through CBC, data delivered by different non-faulty replicas must be identical. In each wave, a block in the GRBC round will be elected as the leader. If a leader block has been delivered with grade 2, it and all its ancestor blocks can be committed. GradedDAG offers a good-case latency of 4 communication steps and an expected worst latency of 7.5 communication steps, significantly lower than the state-of-theart. Experimental results demonstrate GradedDAG’s feasibility and efficiency.
To reduce latency, we propose GradedDAG, a new DAG-based BFT consensus protocol based on our adapted RBC called Graded RBC (GRBC) and the Consistent Broadcast (CBC), with each wave consisting of only one GRBC round and one CBC round. Through GRBC, a replica can deliver data with a grade of 1 or 2, and a non-faulty replica delivering the data with grade 2 can ensure that more than 2/3 of replicas have delivered the same data. Meanwhile, through CBC, data delivered by different non-faulty replicas must be identical. In each wave, a block in the GRBC round will be elected as the leader. If a leader block has been delivered with grade 2, it and all its ancestor blocks can be committed. GradedDAG offers a good-case latency of 4 communication steps and an expected worst latency of 7.5 communication steps, significantly lower than the state-of-theart. Experimental results demonstrate GradedDAG’s feasibility and efficiency.
Gilad Asharov, Koki Hamada, Dai Ikarashi, Ryo Kikuchi, Ariel Nof, Benny Pinkas, Junichi Tomida
ePrint Report
We implement a secure platform for statistical analysis over multiple organizations and multiple datasets. We provide a suite of protocols for different variants of JOIN and GROUP-BY operations. JOIN allows combining data from multiple datasets based on a common column. GROUP-BY allows aggregating rows that have the same values in a column or a set of columns, and then apply some aggregation summary on the rows (such as sum, count, median, etc.). Both operations are fundamental tools for relational databases. One example use case of our platform is in data marketing in which an analyst would join purchase histories and membership information, and then obtain statistics, such as "Which products were bought by people earning this much per annum?"
Both JOIN and GROUP-BY involve many variants, and we design protocols for several common procedures. In particular, we propose a novel group-by-median protocol that has not been known so far. Our protocols rely on sorting protocols, and work in the honest majority setting and against malicious adversaries. To the best of our knowledge, this is the first implementation of JOIN and GROUP-BY protocols secure against a malicious adversary.
Both JOIN and GROUP-BY involve many variants, and we design protocols for several common procedures. In particular, we propose a novel group-by-median protocol that has not been known so far. Our protocols rely on sorting protocols, and work in the honest majority setting and against malicious adversaries. To the best of our knowledge, this is the first implementation of JOIN and GROUP-BY protocols secure against a malicious adversary.