IACR News
If you have a news item you wish to distribute, it should be sent to the communications secretary. See also the events database for conference announcements.
Here you can see all recent updates to the IACR webpage.
06 August 2021
Quoc Huy Do, Pedram Hosseyni, Ralf Kuesters, Guido Schmitz, Nils Wenzler, Tim Wuertele
Payment is an essential part of e-commerce. Merchants usually rely on third parties, so-called payment processors, who take care of transferring the payment from the customer to the merchant. How a payment processor interacts with the customer and the merchant varies widely. Each payment processor typically invents its own protocol that has to be integrated into the merchant's application and provides the user with a new, potentially unknown and confusing user experience.
Pushed by major companies, including Apple, Google, Mastercard, and Visa, the W3C is currently developing a new set of standards to unify the online checkout process and streamline the user's payment experience. The main idea is to integrate payment as a native functionality into web browsers, referred to as the Web Payment APIs. While this new checkout process will indeed be simple and convenient from an end-user perspective, the technical realization requires rather significant changes to browsers.
Many major browsers, such as Chrome, Firefox, Edge, Safari, and Opera, already implement these new standards, and many payment processors, such as Google Pay, Apple Pay, or Stripe, support the use of Web Payment APIs for payments. The ecosystem is constantly growing, meaning that the Web Payment APIs will likely be used by millions of people worldwide.
So far, there has been no in-depth security analysis of these new standards. In this paper, we present the first such analysis of the Web Payment APIs standards, a rigorous formal analysis. It is based on the Web Infrastructure Model (WIM), the most comprehensive model of the web infrastructure to date, which we extend, among other things, to integrate the new payment functionality into the generic browser model.
Our analysis reveals two new critical vulnerabilities that allow a malicious merchant to over-charge an unsuspecting customer. We have verified our attacks using the Chrome implementation and reported these problems to the W3C as well as the Chrome developers, who have acknowledged these problems. Moreover, we propose fixes to the standard, which by now have been adopted by the W3C and Chrome, and prove that the fixed Web Payment APIs indeed satisfy strong security properties.
Mojtaba Rafiee
A Multi-Client Functional Encryption (MCFE) scheme for set intersection is a cryptographic primitive that enables an evaluator to learn the intersection of the sets of a pre-determined number of clients, without needing to learn the plaintext set of any individual client. In this paper, we propose a flexible version of MCFE for set intersection, called Flexible Multi-Client Functional Encryption for Set Intersection (FMCFE). In our FMCFE scheme, the evaluator can learn the intersection of any flexible choice of sets (instead of all sets). Accordingly, we redefine the syntax and security notions of MCFE schemes for FMCFE schemes.
Solving the multi-client set intersection problem in polynomial time, such that only the intersection result (and no additional information) is revealed, is an open problem in the literature. In this paper, we propose a relaxed solution using FMCFE schemes that solves secure set intersection in polynomial time.
We argue that this relaxation is necessary for the practical use of secure multi-client set intersection.
We also show that our scheme achieves adaptive indistinguishability-based security under passive corruption. Our proof relies on the Symmetric eXternal Diffie-Hellman (SXDH) assumption in the standard model.
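The target functionality is easy to see in code. Below is a minimal sketch of what the evaluator learns, assuming (purely for illustration, and unlike an actual FMCFE scheme) that all clients share a single PRF key: each client deterministically encodes its set, and the evaluator intersects the encodings of any flexible choice of clients without seeing plaintexts. All names and parameters here are ours, not the paper's.

```python
# Toy illustration of the FMCFE *functionality* (not the paper's scheme):
# each client deterministically encodes its set under a shared PRF key, and
# the evaluator intersects the encodings of any chosen subset of clients.
# A real FMCFE scheme avoids a single shared key and provides
# indistinguishability-based security; this sketch only shows the interface.
import hmac, hashlib

SHARED_KEY = b"demo-key"  # hypothetical stand-in for the scheme's keys

def encode_set(elements):
    """Client-side: PRF-encode every element of the private set."""
    return {hmac.new(SHARED_KEY, e.encode(), hashlib.sha256).hexdigest()
            for e in elements}

def evaluate_intersection(encoded_sets):
    """Evaluator: intersect any flexible choice of encoded sets."""
    return set.intersection(*encoded_sets)  # encodings of common elements only

clients = [{"alice", "bob", "carol"}, {"bob", "carol", "dave"}, {"bob", "erin"}]
enc = [encode_set(s) for s in clients]
# Flexible choice: intersect clients 0 and 1 only, ignoring client 2.
print(len(evaluate_intersection([enc[0], enc[1]])))  # 2 common elements
```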
Endre (Silur) Abraham
Mainstream hash functions such as SHA or BLAKE, while generally efficient in their implementations, are not suitable for zero-knowledge Boolean or arithmetic circuits due to their reliance on CPU-oriented designs. As a candidate hash function that relies only on trivial arithmetic and can be generalized to zero-knowledge circuits, the Ajtai lattice SIS hasher has been proposed. In this paper we review Micciancio's R-SIS generalization and discuss its circuit complexity; we then show how this R-SIS hasher can be used as a universal dynamic hash accumulator with constant-time update and revocation complexity that can run on 16-bit hardware as well as in smart contracts.
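For background, an Ajtai-style SIS hash is just a matrix-vector product over $\mathbb{Z}_q$ applied to a binary input. Here is a minimal sketch with toy parameters of our own choosing; the R-SIS variant discussed in the paper works over polynomial rings for efficiency, which this sketch does not capture.

```python
# Minimal sketch of an Ajtai-style SIS hash: h(x) = A x mod q for a binary
# input vector x. Toy parameters only (far too small for security).
import secrets

Q = 257          # toy modulus
N, M = 8, 64     # output dimension n, input length m

# Public random matrix A in Z_q^{n x m}
A = [[secrets.randbelow(Q) for _ in range(M)] for _ in range(N)]

def sis_hash(bits):
    """Hash an M-bit message: one matrix-vector product mod q."""
    assert len(bits) == M and all(b in (0, 1) for b in bits)
    return tuple(sum(a * b for a, b in zip(row, bits)) % Q for row in A)

msg = [secrets.randbelow(2) for _ in range(M)]
print(sis_hash(msg))  # an n-vector over Z_q; a collision yields a short SIS solution
```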
Aydin Abadi, Steven J. Murdoch, Thomas Zacharias
Private Set Intersection protocols (PSIs) allow parties to compute the intersection of their private sets, such that nothing about the sets' elements beyond the intersection is revealed. PSIs have a variety of applications, primarily in efficiently supporting data sharing in a privacy-preserving manner. At Eurocrypt 2019, Ghosh and Nilges proposed three efficient PSIs based on the polynomial representation of sets and proved their security against active adversaries. In this work, we show that these three PSIs are susceptible to several serious attacks. The attacks let an adversary (1) learn the correct intersection while making its victim believe that the intersection is empty, (2) learn a certain element of its victim's set beyond the intersection, and (3) delete multiple elements of its victim's input set. We explain why the proofs did not identify these attacks and propose a set of mitigations.
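The polynomial representation of sets underlying these protocols is simple to illustrate: a set maps to the polynomial whose roots are its elements, and the intersection corresponds to a polynomial gcd. The sketch below shows only this plain (non-private) arithmetic over a toy prime field; the blinding used in the actual protocols, and the attacks on it, are omitted.

```python
# Polynomial representation of sets: S -> p_S(x) = prod_{s in S} (x - s)
# over F_q; the intersection is the root set of gcd(p_A, p_B).
Q = 2**31 - 1  # a prime field modulus (toy choice)

def poly_from_set(S):
    p = [1]  # coefficients in ascending order
    for s in S:
        p = [(a - s * b) % Q for a, b in zip([0] + p, p + [0])]  # p *= (x - s)
    return p

def poly_mod(a, b):
    """Remainder of a divided by b over F_q."""
    a = a[:]
    while len(a) >= len(b) and any(a):
        c = a[-1] * pow(b[-1], -1, Q) % Q
        for i in range(len(b)):
            a[len(a) - len(b) + i] = (a[len(a) - len(b) + i] - c * b[i]) % Q
        while a and a[-1] == 0:
            a.pop()
    return a or [0]

def poly_gcd(a, b):
    while any(b):
        a, b = b, poly_mod(a, b)
    return a

pA, pB = poly_from_set({3, 5, 7}), poly_from_set({5, 7, 11})
g = poly_gcd(pA, pB)
print(len(g) - 1)  # degree 2 = size of the intersection {5, 7}
```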
Zi-Yuan Liu, Yi-Fan Tseng, Raylin Tso, Masahiro Mambo, Yu-Chi Chen
With the rapid development of cloud computing, an increasing number of companies are adopting cloud storage to reduce overhead. However, to ensure the privacy of sensitive data, the uploaded data need to be encrypted before being outsourced to the cloud. The concept of public-key encryption with keyword search (PEKS) was introduced by Boneh \textit{et al.} to provide flexible usage of the encrypted data. Unfortunately, most PEKS schemes are not secure against inside keyword guessing attacks (IKGA), so the keyword information of the trapdoor may be leaked to the adversary. To solve this issue, Huang and Li presented public-key authenticated encryption with keyword search (PAEKS), in which the trapdoor generated by the receiver is only valid for authenticated ciphertexts. Following their seminal work, many PAEKS schemes have been introduced to enhance its security, and some further consider upcoming quantum attacks. However, our cryptanalysis indicates that these schemes in fact cannot withstand IKGA. To resist attacks from quantum adversaries while supporting privacy-preserving search functionality, we first introduce a novel generic PAEKS construction in this work. We further present the first quantum-resistant PAEKS instantiation, based on lattices. The security proofs show that our instantiation not only satisfies the basic requirements but also achieves an enhanced security notion, namely multi-ciphertext and multi-trapdoor indistinguishability. Furthermore, the comparative results indicate that with only a modest additional cost, the instantiation provides stronger security, making it suitable for more diverse application environments.
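To see why IKGA is devastating for plain PEKS, consider the following deliberately insecure strawman (our own toy, not any published scheme): since anyone can encrypt keywords under the receiver's public key and the server can run the Test algorithm, an inside server simply enumerates a small keyword space against a trapdoor.

```python
# Strawman PEKS-like scheme to illustrate the inside keyword guessing
# attack (IKGA). Here the "encryption" is a deterministic public hash,
# so a trapdoor is publicly checkable against keyword guesses.
import hashlib

PK = b"receiver-public-key"                   # hypothetical public key
KEYWORDS = ["invoice", "salary", "secret-project"]  # small keyword space

def enc(keyword):        # anyone can produce a searchable ciphertext
    return hashlib.sha256(PK + keyword.encode()).hexdigest()

def trapdoor(keyword):   # strawman: the receiver's trapdoor matches enc()
    return enc(keyword)

def test(ct, td):        # server-side search
    return ct == td

td = trapdoor("salary")  # receiver asks the server to search for "salary"
# IKGA: the inside server encrypts every keyword guess and tests it.
recovered = next(kw for kw in KEYWORDS if test(enc(kw), td))
print(recovered)         # 'salary' -- the trapdoor leaks the keyword
```

PAEKS blocks exactly this: a ciphertext is only valid for Test if it was authenticated by the sender, so the server cannot manufacture test ciphertexts on its own.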
03 August 2021
Jean-Sebastien Coron, Agnese Gini
At Crypto '99, Nguyen and Stern described a lattice-based algorithm for solving the hidden subset sum problem, a variant of the classical subset sum problem where the $n$ weights are also hidden. As an application, they showed how to break the Boyko et al. fast generator of random pairs $(x,g^x \pmod{p})$. The Nguyen-Stern algorithm works quite well in practice for moderate values of $n$, but its complexity is exponential in $n$. A polynomial-time variant was recently described at Crypto 2020, based on a multivariate technique, but the approach is heuristic only. In this paper, we describe a proven polynomial-time algorithm for solving the hidden subset-sum problem, based on statistical learning. In addition, we show that the statistical approach is also quite efficient in practice: using the FastICA algorithm, we can reach $n=250$ in reasonable time.
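For context, the Boyko et al. generator has exactly the hidden-subset-sum structure these attacks exploit: after a one-time precomputation of pairs $(\alpha_i, g^{\alpha_i})$, each output pair is produced from a random subset using only multiplications. A toy sketch, with illustrative parameters far below cryptographic size:

```python
# Sketch of the Boyko et al.-style fast generator: the outputs x are hidden
# subset sums of the secret weights alpha_i, which the attacks recover.
import secrets

p = 2**127 - 1            # toy prime modulus
g = 3                     # illustrative base
n = 16                    # number of precomputed weights (paper reaches n=250)

alphas = [secrets.randbelow(p - 1) for _ in range(n)]
powers = [pow(g, a, p) for a in alphas]   # one-time precomputation

def fast_pair():
    """Output (x, g^x mod p) using only multiplications, no exponentiation."""
    x, gx = 0, 1
    for a, ga in zip(alphas, powers):
        if secrets.randbelow(2):          # hidden subset-sum coefficient
            x = (x + a) % (p - 1)
            gx = gx * ga % p
    return x, gx                          # attacker sees many x's, not the alphas

x, gx = fast_pair()
assert gx == pow(g, x, p)
```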
Gilles Macario-Rat, Jacques Patarin
In this paper, we present a new perturbation for the design of multivariate schemes that we call ``Pepper''.
From this idea, we present some efficient multivariate signature schemes with explicit parameters that resist all known attacks. In particular, they resist the two main (and often very powerful) attacks in this area: the Gröbner attacks (to compute a solution of the system derived from the public key) and the MinRank attacks (to recover the secret key). Pepper can also be seen as a new perturbation that can be used to strengthen many other multivariate schemes.
The ``Pepper'' perturbation works only for public key equations of degree (at least) 3. Despite this, the size of the public key may still be reasonable, since we can use larger fields (and perhaps also non-dense equations). Furthermore, the size of the signatures can be very short.
Arush Chhatrapati
In this compilational work, we combine various techniques from classical cryptography and steganography to construct ciphers that conceal multiple plaintexts in a single ciphertext. We name these "multi-ciphers". Most notably, we construct and cryptanalyze a Four-In-One-Cipher: the first cipher which conceals four separate plaintexts in a single ciphertext. Following a brief overview of classical cryptography and steganography, we consider strategies that can be used to creatively combine these two fields to construct multi-ciphers. Finally, we cryptanalyze three multi-ciphers which were constructed using the techniques described in this paper. This cryptanalysis relies both on traditional algorithms used to decode classical ciphers and on new algorithms which we use to extract the additional plaintexts concealed by the multi-ciphers. We implement these algorithms in Python and provide code snippets. The primary goal of this work is to introduce readers who might otherwise be unfamiliar with classical cryptography and steganography to a new perspective that lies at the intersection of these two fields. The ideas presented in this paper could prove useful in teaching cryptography, statistics, mathematics, and computer science to future generations in a unique, interdisciplinary fashion. This work might also serve as a source of creative inspiration for other cipher-making, code-breaking enthusiasts.
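As a flavor of the genre, here is a toy "two-in-one cipher" of our own (not one of the paper's constructions): two plaintexts are interleaved letter by letter and the result is Caesar-encrypted, so undoing the substitution layer and de-interleaving recovers both messages from one ciphertext.

```python
# Toy two-in-one cipher: steganographic interleaving plus a Caesar layer.
from itertools import chain, zip_longest

SHIFT = 7  # Caesar key

def caesar(text, shift):
    return "".join(chr((ord(c) - 65 + shift) % 26 + 65) for c in text)

def encrypt(p1, p2):
    interleaved = "".join(chain.from_iterable(zip_longest(p1, p2, fillvalue="X")))
    return caesar(interleaved, SHIFT)

def decrypt(ct):
    inter = caesar(ct, -SHIFT)
    return inter[0::2], inter[1::2]   # even positions: p1, odd positions: p2

ct = encrypt("ATTACKATDAWN", "HOLDPOSITION")
print(ct)
print(decrypt(ct))  # ('ATTACKATDAWN', 'HOLDPOSITION')
```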
Nils Wisiol
We present the LP-PUF, a novel, Arbiter PUF-based, CMOS-compatible strong PUF design. We explain the motivation behind the design choices for the LP-PUF and show evaluation results demonstrating that the LP-PUF has good uniqueness, low bias, and fair bit-sensitivity and reliability values. Furthermore, based on analyses and discussion of the LR and splitting attacks, reliability attacks, and the MLP attack, we argue that the LP-PUF has the potential to be secure against known PUF modeling attacks, which motivates a discussion of the limitations of our study and of future work on the LP-PUF.
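For readers unfamiliar with the building block, the sketch below simulates a plain Arbiter PUF under the standard linear additive delay model (response = sign of $w \cdot \Phi(c)$), the very linearity that LR and MLP modeling attacks exploit. It is background only and does not implement the LP-PUF design itself.

```python
# Plain Arbiter PUF simulation under the linear additive delay model.
import random

random.seed(1)
N = 64                                              # challenge stages
w = [random.gauss(0.0, 1.0) for _ in range(N + 1)]  # stage delay parameters

def phi(challenge):
    """Feature map: phi_i = prod_{j>=i} (1 - 2 c_j), plus a constant term."""
    feats, prod = [], 1.0
    for c in reversed(challenge):
        prod *= 1 - 2 * c
        feats.append(prod)
    return list(reversed(feats)) + [1.0]

def response(challenge):
    f = phi(challenge)
    return 1 if sum(wi * fi for wi, fi in zip(w, f)) > 0 else 0

c = [random.randint(0, 1) for _ in range(N)]
print(response(c))  # a single challenge-response pair
```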
Lejla Batina, Łukasz Chmielewski, Björn Haase, Niels Samwel, Peter Schwabe
This paper describes an ECC implementation computing the X25519 key-exchange protocol on the ARM Cortex-M4 microcontroller. This software comes with extensive mitigations against various side-channel and fault attacks and is, to the best of our knowledge, the first to claim affordable protection against multiple classes of attacks that are motivated by distinct real-world application scenarios. We also present the results of a comprehensive side-channel evaluation. We distinguish between X25519 with ephemeral keys and X25519 with static keys and show that the overhead to protect the two is about 36% and 239%, respectively. While this might seem a high price to pay for security, we also show that even our (most protected) static implementation is more efficient than widely deployed ECC cryptographic libraries, which offer far fewer protections.
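The ephemeral/static distinction is a protocol-level one. The sketch below shows it using the pyca/cryptography package; it says nothing about the paper's protected Cortex-M4 assembly, but it indicates why the static setting needs heavier countermeasures: a long-term scalar hands the attacker a fresh side-channel trace on every run.

```python
# X25519 usage modes, illustrated with the pyca/cryptography package.
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey

# Static mode: a long-term key reused across many exchanges; every run leaks
# another trace of the same scalar, hence the much higher protection overhead.
static_priv = X25519PrivateKey.generate()

def static_exchange(peer_public):
    return static_priv.exchange(peer_public)

# Ephemeral mode: a fresh scalar per exchange limits the attacker to one
# trace per key, so cheaper countermeasures suffice.
def ephemeral_exchange(peer_public):
    return X25519PrivateKey.generate().exchange(peer_public)

peer = X25519PrivateKey.generate()
shared1 = static_exchange(peer.public_key())
shared2 = ephemeral_exchange(peer.public_key())
print(len(shared1), len(shared2))  # 32-byte shared secrets
```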
28 July 2021
Yevgeniy Dodis, Siyao Guo, Noah Stephens-Davidowitz, Zhiye Xie
In this work, we characterize online linear extractors. In other words, given a matrix $A \in \mathbb{F}_2^{n \times n}$, we study the convergence of the iterated process $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X} $, where $\mathbf{X} \sim D$ is repeatedly sampled independently from some fixed (but unknown) distribution $D$ with (min)-entropy at least $k$. Here, we think of $\mathbf{S} \in \{0,1\}^n$ as the state of an online extractor, and $\mathbf{X} \in \{0,1\}^n$ as its input.
As our main result, we show that the state $\mathbf{S}$ converges to the uniform distribution for all input distributions $D$ with entropy $k > 0$ if and only if the matrix $A$ has no non-trivial invariant subspace (i.e., a non-zero subspace $V \subsetneq \mathbb{F}_2^n$ such that $AV \subseteq V$). In other words, a matrix $A$ yields an online linear extractor if and only if $A$ has no non-trivial invariant subspace. For example, the linear transformation corresponding to multiplication by a generator of the field $\mathbb{F}_{2^n}$ yields a good online linear extractor. Furthermore, for any such matrix convergence takes at most $\widetilde{O}(n^2(k+1)/k^2)$ steps.
We also study the more general notion of condensing---that is, we ask when this process converges to a distribution with entropy at least $\ell$, when the input distribution has entropy greater than $k$. (Extractors correspond to the special case $\ell = n$.) We show that a matrix gives a good condenser if there are relatively few vectors $\mathbf{w} \in \mathbb{F}_2^n$ such that $\mathbf{w}, A^T\mathbf{w}, \ldots, (A^T)^{n-k-1} \mathbf{w}$ are linearly dependent. As an application, we show that the very simple cyclic rotation transformation $A(x_1,\ldots, x_n) = (x_n,x_1,\ldots, x_{n-1})$ condenses to $\ell = n-1$ bits for any $k > 1$ if $n$ is a prime satisfying a certain simple number-theoretic condition.
Our proofs are Fourier-analytic and rely on a novel lemma, which gives a tight bound on the product of certain Fourier coefficients of any entropic distribution.
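The iterated process is easy to simulate directly. The sketch below, with toy parameters of our own choosing, runs $\mathbf{S} \leftarrow A\mathbf{S} \oplus \mathbf{X}$ for the cyclic rotation map and an i.i.d. biased source, and prints how the empirical bias of a single state bit decays with the number of steps. Note that rotation fixes the all-ones vector, a non-trivial invariant subspace, so by the paper's criterion it is only a condenser, not an extractor; this single-bit statistic does not distinguish the two.

```python
# Direct simulation of S <- A S xor X with A the cyclic rotation map.
import random

random.seed(0)
N = 13  # a prime dimension, echoing the cyclic-rotation condenser example

def step(s):
    x = [1 if random.random() < 0.9 else 0 for _ in range(N)]  # weak input
    rotated = [s[-1]] + s[:-1]                                  # A = rotation
    return [a ^ b for a, b in zip(rotated, x)]

def first_bit_after(steps):
    s = [0] * N
    for _ in range(steps):
        s = step(s)
    return s[0]

for steps in (1, 5, 50):
    ones = sum(first_bit_after(steps) for _ in range(2000))
    print(f"{steps:3d} steps: empirical bias {abs(ones / 2000 - 0.5):.3f}")
```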
Nir Bitansky, Zvika Brakerski
In classical commitments, statistical binding means that for almost any commitment transcript there is at most one possible opening. While quantum commitments (for classical messages) sometimes have benefits over their classical counterparts (e.g.\ in terms of assumptions), they provide a weaker notion of binding: essentially, that the sender cannot open a given commitment to a random value with probability noticeably greater than $1/2$.
We introduce a notion of classical binding for quantum commitments which provides guarantees analogous to the classical case. In our notion, the receiver performs a (partial) measurement of the quantum commitment string, and the outcome of this measurement determines a single value that the sender may open. We expect that our notion can replace classical commitments in various settings, leaving the security proof essentially unchanged. As an example we show a soundness proof for the GMW zero-knowledge proof system.
We construct a non-interactive quantum commitment scheme which is classically statistically-binding and has a classical opening, based on the existence of any post-quantum one-way function. Prior candidates had inherently quantum openings and were not classically binding. In contrast, we show that it is impossible to achieve classical binding for statistically hiding commitments, regardless of assumption or round complexity.
Our scheme is simply Naor's commitment scheme (which classically requires a common random string, CRS), but executed in superposition over all possible values of the CRS, and repeated several times. We hope that this technique for using quantum communication to remove a CRS may find other uses.
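For reference, the classical Naor commitment that the scheme executes in superposition looks as follows. The sketch uses a hash-based stand-in for the PRG $G: \{0,1\}^n \to \{0,1\}^{3n}$ and toy parameters of our choosing.

```python
# Classical Naor bit commitment: commit to 0 as G(s), to 1 as G(s) xor r,
# where r is the CRS / receiver's first message. Because |r| = 3n, with high
# probability no pair of seeds satisfies G(s) xor G(s') = r, so the
# commitment is statistically binding.
import secrets, hashlib

N = 16  # seed bytes; the PRG stretches to 3N bytes

def G(seed):  # hash-based PRG stand-in (a real proof needs a genuine PRG)
    out = b"".join(hashlib.sha256(seed + bytes([i])).digest() for i in range(3))
    return out[: 3 * N]

def commit(bit, r):
    s = secrets.token_bytes(N)
    c = G(s) if bit == 0 else bytes(a ^ b for a, b in zip(G(s), r))
    return c, s  # c is the commitment, s the opening

def verify(bit, c, s, r):
    expect = G(s) if bit == 0 else bytes(a ^ b for a, b in zip(G(s), r))
    return c == expect

r = secrets.token_bytes(3 * N)       # CRS (superposed over in the paper)
c, opening = commit(1, r)
print(verify(1, c, opening, r))      # True
print(verify(0, c, opening, r))      # False: the opening is bound to one bit
```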
Masayuki Fukumitsu, Shingo Hasegawa
Multisignature schemes are attractive for use in cryptographic applications such as the blockchain. Although lattice-based constructions of multisignature schemes exist as candidate quantum-secure multisignatures, no multisignature scheme whose security is proven in the quantum random oracle model (QROM), rather than the classical random oracle model (CROM), is known.
In this paper, we propose the first lattice-based multisignature scheme whose security is proven in the QROM. Although our proposed scheme is based on the Dilithium-QROM signature, whose security is proven in the QROM, its proof technique cannot be directly applied to the multisignature setting. The difficulty in proving security in the QROM lies in how to program the random oracle in the security proof. To solve this problem, we develop several proof techniques for the QROM. First, we employ the searching-query technique of Targhi and Unruh to convert Dilithium-QROM to the multisignature setting. Second, we develop a new programming technique for the QROM, since conventional programming techniques seem not to work in the multisignature setting in the QROM. We combine the programming technique of Unruh with that of Liu and Zhandry. The new technique enables us to program the random oracle in the QROM and to construct the signing oracle in the security proof.
Léo Ducas, Wessel van Woerden
Until recently, lattice reduction attacks on NTRU lattices were thought to behave similarly to attacks on (ring-)LWE lattices with the same parameters. However, several works (Albrecht-Bai-Ducas 2016, Kirchner-Fouque 2017) showed a significant gap for large moduli $q$, the so-called overstretched regime of NTRU.
With the NTRU scheme being a finalist in the NIST PQC competition, it is important to understand ---both asymptotically and concretely--- where the fatigue point lies exactly, i.e. at which $q$ the overstretched regime begins. Unfortunately, the analysis by Kirchner and Fouque is based on an impossibility argument, which only results in an asymptotic upper bound on the fatigue point. It also does not really {\em explain} how lattice reduction actually recovers secret-key information.
We propose a new analysis that asymptotically improves on that of Kirchner and Fouque, narrowing down the fatigue point for ternary NTRU from $q \leq n^{2.783+o(1)}$ to $q=n^{2.484+o(1)}$, and finally explaining the mechanism behind this phenomenon. We push this analysis further to a concrete one, settling the fatigue point at $q \approx 0.004 \cdot n^{2.484}$, and allowing precise hardness predictions in the overstretched regime. These predictions are backed by extensive experiments.
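As a quick illustration of the concrete estimate, one can evaluate $q \approx 0.004 \cdot n^{2.484}$ for a few dimensions (the example dimensions are ours, not parameter sets from the paper):

```python
# Evaluating the paper's concrete fatigue-point estimate q ~ 0.004 * n^2.484.
import math

for n in (256, 509, 1024):
    q = 0.004 * n ** 2.484
    print(f"n = {n:4d}: fatigue point q ~ {q:.3e} (~ 2^{math.log2(q):.1f})")
```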
Hanno Becker, Jose Maria Bermudo Mera, Angshuman Karmakar, Joseph Yiu, Ingrid Verbauwhede
High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work. Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that, although the Cortex-M55 is single-issue, in-order, and offers only 8 vector registers compared to 32 on A-profile SIMD architectures such as Arm® Neon technology and the Scalable Vector Extension (SVE), careful register management and instruction scheduling allow us to obtain a 3× to 5× performance improvement over already highly optimized implementations on the Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
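For orientation, the Toom-Cook/Karatsuba strategy at the heart of the first contribution splits operands and trades multiplications for additions. Below is a minimal recursive Karatsuba for polynomials with low-precision coefficients, without any of the paper's memory-layout or vectorization techniques.

```python
# Minimal recursive Karatsuba polynomial multiplication with coefficients
# mod 2^13, echoing the low-precision setting of schemes like Saber.
Q = 1 << 13

def poly_add(a, b):
    return [(x + y) % Q for x, y in zip(a, b)]

def karatsuba(a, b):
    n = len(a)                      # assumes len(a) == len(b), power of two
    if n <= 4:                      # schoolbook base case
        res = [0] * (2 * n - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                res[i + j] = (res[i + j] + ai * bj) % Q
        return res
    h = n // 2
    a0, a1, b0, b1 = a[:h], a[h:], b[:h], b[h:]
    z0 = karatsuba(a0, b0)                          # low part
    z2 = karatsuba(a1, b1)                          # high part
    zm = karatsuba(poly_add(a0, a1), poly_add(b0, b1))
    z1 = [(m - x - y) % Q for m, x, y in zip(zm, z0, z2)]  # middle part
    res = [0] * (2 * n - 1)
    for i, v in enumerate(z0): res[i] = (res[i] + v) % Q
    for i, v in enumerate(z1): res[i + h] = (res[i + h] + v) % Q
    for i, v in enumerate(z2): res[i + 2 * h] = (res[i + 2 * h] + v) % Q
    return res

a = [i % Q for i in range(1, 9)]
b = [(3 * i) % Q for i in range(1, 9)]
print(karatsuba(a, b))  # degree-14 product of two degree-7 polynomials
```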
Annapurna Valiveti, Srinivas Vivek
Masking using randomised lookup tables is a popular countermeasure for side-channel attacks, particularly at small masking orders. An advantage of this class of countermeasures for masking S-boxes, compared to ISW-based masking, is that it supports pre-processing and thus significantly reduces the amount of computation to be done after the unmasked inputs are available. Indeed, the online computation can be as fast as a single table lookup. But the size of the randomised lookup table increases linearly with the masking order, and hence the RAM required to store pre-processed tables becomes infeasible for higher masking orders. Demonstrating the feasibility of full pre-processing of higher-order lookup-table-based masking schemes on resource-constrained devices has therefore remained an open problem.
In this work, we solve the above problem by implementing a higher-order lookup-table-based scheme using an amount of RAM that is essentially independent of the masking order. More concretely, we reduce the amount of RAM needed for the table-based scheme of Coron et al. (TCHES 2018) by a factor approximately equal to the number of shares. Our technique is based on the use of a PRG to minimise the randomness complexity of ISW-based masking schemes, as proposed by Ishai et al. (ICALP 2013) and Coron et al. (Eurocrypt 2020). Hence we show that for lookup-table-based masking schemes, the use of a PRG not only reduces the randomness complexity (now logarithmic in the size of the S-box) but also the memory complexity, without any significant increase in the overall running time. We have implemented in software the higher-order table-based masking scheme of Coron et al. (TCHES 2018) at tenth order, with full pre-processing of a single execution of all the AES S-boxes, on an ARM Cortex-M4 device with 256 KB of RAM. Our technique requires only 41.2 KB of RAM, whereas the original scheme would have needed 440 KB. Moreover, our 8-bit implementation results demonstrate that the online execution time of our variant is about 1.5 times faster than the 8-bit bitsliced masked implementation of AES-128.
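The first-order version of table-based masking makes the offline/online split concrete: all the work goes into building a randomized table, and the online phase is a single lookup. A minimal sketch of our own (the scheme above generalizes this to higher orders, where the tables dominate RAM usage):

```python
# First-order lookup-table masking: offline, build T[x] = S(x ^ r_in) ^ r_out;
# online, perform one lookup on the masked input.
import secrets

SBOX = [(x * 7 + 3) % 256 for x in range(256)]  # toy S-box stand-in

def preprocess():
    """Offline phase: pick masks and build the randomized table."""
    r_in, r_out = secrets.randbelow(256), secrets.randbelow(256)
    table = [SBOX[x ^ r_in] ^ r_out for x in range(256)]
    return table, r_in, r_out

def masked_sbox_online(x_masked, table):
    """Online phase: just one table lookup on the masked input."""
    return table[x_masked]

table, r_in, r_out = preprocess()
x = 0xAB
x_masked = x ^ r_in                      # the input arrives masked by r_in
y_masked = masked_sbox_online(x_masked, table)
assert y_masked ^ r_out == SBOX[x]       # the output is masked by r_out
```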
Elias Rohrer, Florian Tschorsch
In order to propagate transactions and blocks, today's blockchain systems rely on unstructured peer-to-peer overlay networks. In such networks, broadcast is known to be an inefficient operation in terms of message complexity and overhead. In addition to the impact on system performance, inefficient or delayed block propagation may have severe consequences for the security and fairness of the consensus layer. In contrast, the Kadcast protocol is a structured peer-to-peer protocol for block and transaction propagation in blockchain networks. Kadcast utilizes the well-known overlay topology of Kademlia to realize an efficient broadcast operation with tunable overhead. We study the security and privacy of the Kadcast protocol based on probabilistic models and analyze its resilience to packet losses and node failures. Moreover, we evaluate Kadcast's block delivery performance, broadcast reliability, efficiency, and security based on advanced network simulations. Lastly, we introduce a QUIC-based prototype implementation of the Kadcast protocol and show its merits through deployment in a large-scale cloud-based testbed.
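The broadcast mechanism can be sketched compactly: a node covers its Kademlia buckets by forwarding the block to one peer per bucket together with a height value telling that peer which smaller subtree to cover in turn. The toy below, with simplifying assumptions of our own (no FEC, no redundancy parameter, a flat node list instead of routing tables), illustrates only this delegation structure.

```python
# Bucket-based broadcast in the spirit of Kadcast.
import random

random.seed(3)
B = 8                                   # ID bits; 2^B possible node IDs
nodes = random.sample(range(2**B), 32)

def bucket(of, peer):                   # Kademlia bucket = log2 of XOR distance
    return (of ^ peer).bit_length() - 1

def broadcast(sender, height, delivered):
    delivered.add(sender)
    for h in range(height - 1, -1, -1):
        peers = [p for p in nodes
                 if bucket(sender, p) == h and p not in delivered]
        if peers:  # delegate the subtree at distance [2^h, 2^{h+1}) to one peer
            broadcast(random.choice(peers), h, delivered)

delivered = set()
broadcast(nodes[0], B, delivered)
print(f"{len(delivered)}/{len(nodes)} nodes reached")
```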
Amin Abdulrahman, Jiun-Peng Chen, Yu-Jia Chen, Vincent Hwang, Matthias J. Kannwischer, Bo-Yin Yang
The U.S. National Institute of Standards and Technology (NIST) has designated ARM microcontrollers as an important benchmarking platform for its Post-Quantum Cryptography standardization process (NISTPQC). In view of this, we explore the design space of the NISTPQC finalist Saber on the Cortex-M4 and its close relation, the Cortex-M3. In the process, we investigate various optimization strategies and memory-time tradeoffs for number-theoretic transforms (NTTs).
Recent work by Chung et al. has shown that NTT multiplication is superior to Toom--Cook multiplication for unprotected Saber implementations on the Cortex-M4 in terms of speed. However, it remains unclear if NTT multiplication can outperform Toom--Cook in masked implementations of Saber. Additionally, it is an open question if Saber with NTTs can outperform Toom--Cook in terms of stack usage. We answer both questions in the affirmative. Additionally, we present a Cortex-M3 implementation of Saber using NTTs outperforming an existing Toom--Cook implementation. Our stack-optimized unprotected M4 implementation uses around the same amount of stack as the most stack-optimized implementation using Toom--Cook while being 33%-41% faster. Our speed-optimized masked M4 implementation is 16% faster than the fastest masked implementation using Toom--Cook. For the Cortex-M3, we outperform existing implementations by 29%-35% in speed.
We conclude that for both stack- and speed-optimization purposes, one should base polynomial multiplications in Saber on the NTT rather than Toom--Cook for the Cortex-M4 and Cortex-M3. In particular, in many cases, composite moduli NTTs perform best.
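For reference, NTT-based multiplication transforms both operands, multiplies pointwise, and transforms back. Below is a minimal radix-2 sketch over an NTT-friendly toy prime; Saber's own power-of-two modulus is NTT-unfriendly, which is exactly why implementations resort to tricks such as the composite-moduli NTTs mentioned above.

```python
# Minimal radix-2 NTT-based polynomial multiplication over Z_q.
Q = 257                          # prime with N | Q - 1
N = 16
ROOT = pow(3, (Q - 1) // N, Q)   # primitive N-th root of unity mod Q

def ntt(a, root):
    if len(a) == 1:
        return a[:]
    even = ntt(a[0::2], root * root % Q)
    odd = ntt(a[1::2], root * root % Q)
    out, w = [0] * len(a), 1
    for i in range(len(a) // 2):
        t = w * odd[i] % Q
        out[i] = (even[i] + t) % Q
        out[i + len(a) // 2] = (even[i] - t) % Q
        w = w * root % Q
    return out

def poly_mul(a, b):
    fa, fb = ntt(a, ROOT), ntt(b, ROOT)
    fc = [x * y % Q for x, y in zip(fa, fb)]
    c = ntt(fc, pow(ROOT, -1, Q))          # inverse transform
    inv_n = pow(N, -1, Q)
    return [x * inv_n % Q for x in c]

a = [1, 2, 3, 4] + [0] * 12   # zero-padded so the full product fits
b = [5, 6, 7] + [0] * 13
print(poly_mul(a, b)[:6])     # coefficients of (1+2x+3x^2+4x^3)(5+6x+7x^2)
```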
Dana Dachman-Soled, Huijing Gong, Hunter Kippen, Aria Shahverdi
We consider the Learning Parity with Noise (LPN) problem with sparse secret, where the secret vector $\textbf{s}$ of dimension $n$ has Hamming weight at most $k$. We are interested in algorithms with asymptotic improvement in the $\textit{exponent}$ beyond the state of the art. Prior work in this setting presented algorithms with runtime $n^{c \cdot k}$ for constant $c < 1$, obtaining a constant factor improvement over brute force search, which runs in time ${n \choose k}$. We obtain the following results:
- We first consider the $\textit{constant}$ error rate setting, and in this case present a new algorithm that leverages a subroutine from the acclaimed BKW algorithm [Blum, Kalai, Wasserman, J.~ACM '03] as well as techniques from Fourier analysis for $p$-biased distributions. Our algorithm achieves asymptotic improvement in the exponent compared to prior work, when the sparsity $k = k(n) = \frac{n}{\log^{1+ 1/c}(n)}$, where $c \in o(\log \log(n))$ and $c \in \omega(1)$. The runtime and sample complexity of this algorithm are approximately the same.
- We next consider the $\textit{low noise}$ setting, where the error is subconstant. We present a new algorithm in this setting that requires only a $\textit{polynomial}$ number of samples and achieves asymptotic improvement in the exponent compared to prior work, when the sparsity $k = \frac{1}{\eta} \cdot \frac{\log(n)}{\log(f(n))}$ and noise rate of $\eta \neq 1/2$ and $\eta^2 = \left(\frac{\log(n)}{n} \cdot f(n)\right)$, for $f(n) \in \omega(1) \cap n^{o(1)}$. To obtain the improvement in sample complexity, we create subsets of samples using the $\textit{design}$ of Nisan and Wigderson [J.~Comput.~Syst.~Sci. '94], so that any two subsets have a small intersection, while the number of subsets is large. Each of these subsets is used to generate a single $p$-biased sample for the Fourier analysis step. We then show that this allows us to bound the covariance of pairs of samples, which is sufficient for the Fourier analysis.
- Finally, we show that our first algorithm extends to the setting where the noise rate is very high, $1/2 - o(1)$, and in this case can be used as a subroutine to obtain new algorithms for learning DNFs and Juntas. Our algorithms achieve asymptotic improvement in the exponent for certain regimes. For DNFs of size $s$ with approximation factor $\epsilon$ this regime is when $\log \frac{s}{\epsilon} \in \omega \left( \frac{c}{\log n \log \log c}\right)$, and $\log \frac{s}{\epsilon} \in n^{1 - o(1)}$, for $c \in n^{1 - o(1)}$. For $k$-Juntas, the regime is when $k \in \omega \left( \frac{c}{\log n \log \log c}\right)$, and $k \in n^{1 - o(1)}$, for $c \in n^{1 - o(1)}$.
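The problem setting itself is compact enough to state in code: samples $(\mathbf{a}, \langle \mathbf{a}, \mathbf{s} \rangle + e \bmod 2)$ with a secret of Hamming weight at most $k$. The sketch below (toy parameters ours) only generates instances; the algorithms above go far beyond a few lines.

```python
# Generating LPN samples with a sparse secret, as defined in the abstract.
import random

random.seed(42)
n, k, eta = 64, 4, 0.125   # dimension, sparsity, noise rate (toy values)

support = random.sample(range(n), k)   # sparse secret: k random positions
s = [1 if i in support else 0 for i in range(n)]

def lpn_sample():
    a = [random.randint(0, 1) for _ in range(n)]
    e = 1 if random.random() < eta else 0
    b = (sum(ai * si for ai, si in zip(a, s)) + e) % 2
    return a, b

samples = [lpn_sample() for _ in range(10)]
print(sum(b for _, b in samples))  # labels look balanced despite the sparse s
```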
Ye Dong, Xiaojun Chen, Kaiyun Li, Dakui Wang, Shuai Zeng
\textit{Privacy} and \textit{Byzantine-robustness} are two major concerns of federated learning (FL), but mitigating both threats simultaneously is highly challenging: privacy-preserving strategies prohibit access to individual model updates to avoid leakage, while Byzantine-robust methods require access for comprehensive mathematical analysis. Moreover, most Byzantine-robust methods only work in the \textit{honest-majority} setting.
We present $\mathsf{FLOD}$, a novel oblivious defender for private Byzantine-robust FL in the dishonest-majority setting. Basically, we propose a novel Hamming distance-based aggregation method to resist $>1/2$ Byzantine attacks using a small \textit{root-dataset} and \textit{server-model} for bootstrapping trust. Furthermore, we employ two non-colluding servers and use additive homomorphic encryption ($\mathsf{AHE}$) and secure two-party computation (2PC) primitives to construct efficient privacy-preserving building blocks for secure aggregation, in which we propose two novel in-depth variants of Beaver multiplication triples (MT) to significantly reduce the overhead of bit-to-arithmetic ($\mathsf{Bit2A}$) conversion and vector weighted-sum aggregation ($\mathsf{VSWA}$). Experiments on real-world and synthetic datasets demonstrate our effectiveness and efficiency: (i) $\mathsf{FLOD}$ defeats known Byzantine attacks with a negligible effect on accuracy and convergence, (ii) achieves a reduction of $\approx 2\times$ in the offline (resp. online) overhead of $\mathsf{Bit2A}$ and $\mathsf{VSWA}$ compared to $\mathsf{ABY}$-$\mathsf{AHE}$ (resp. $\mathsf{ABY}$-$\mathsf{MT}$) based methods (NDSS'15), and (iii) reduces total online communication and run-time by $167$-$1416\times$ and $3.1$-$7.4\times$ compared to $\mathsf{FLGUARD}$ (Crypto Eprint 2021/025).
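As a plaintext intuition for the aggregation step (the exact weighting rule below is our own guess for illustration; FLOD performs this step obliviously under AHE/2PC, none of which appears here): compare each client update's sign vector to the server-model update derived from the root-dataset, and down-weight updates that are far from it in Hamming distance.

```python
# Plaintext sketch of Hamming-distance-based aggregation in the spirit of
# FLOD. The weighting rule is a hypothetical choice, not the paper's.
import random

random.seed(7)
D = 100                                    # model dimension

def signs(update):
    return [1 if u >= 0 else 0 for u in update]

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

server_update = [random.gauss(0, 1) for _ in range(D)]   # from root-dataset
clients = [[random.gauss(0, 1) for _ in range(D)] for _ in range(8)]
clients[0] = [-u for u in server_update]   # a Byzantine client, sign-flipped

ref = signs(server_update)
weights = [max(D - 2 * hamming(signs(c), ref), 0) for c in clients]
total = sum(weights) or 1
aggregate = [sum(w * c[i] for w, c in zip(weights, clients)) / total
             for i in range(D)]
print(weights)  # the flipped client receives weight 0
```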