International Association for Cryptologic Research

IACR News

Updates on the COVID-19 situation are on the Announcement channel.

Here you can see all recent updates to the IACR webpage. These updates are also available:

  • via RSS feed
  • via Twitter
  • via Weibo
  • via Facebook

28 July 2021

Léo Ducas, Wessel van Woerden
ePrint Report
Until recently, lattice reduction attacks on NTRU lattices were thought to behave similarly to attacks on (ring-)LWE lattices with the same parameters. However, several works (Albrecht-Bai-Ducas 2016, Kirchner-Fouque 2017) showed a significant gap for large moduli $q$, the so-called overstretched regime of NTRU.

With the NTRU scheme being a finalist in the NIST PQC competition, it is important to understand ---both asymptotically and concretely--- where the fatigue point lies exactly, i.e. at which $q$ the overstretched regime begins. Unfortunately, the analysis by Kirchner and Fouque is based on an impossibility argument, which yields only an asymptotic upper bound on the fatigue point. It also does not really {\em explain} how lattice reduction actually recovers secret-key information.

We propose a new analysis that asymptotically improves on that of Kirchner and Fouque, narrowing down the fatigue point for ternary NTRU from $q \leq n^{2.783+o(1)}$ to $q=n^{2.484+o(1)}$, and finally explaining the mechanism behind this phenomenon. We push this analysis further to a concrete one, settling the fatigue point at $q \approx 0.004 \cdot n^{2.484}$, and allowing precise hardness predictions in the overstretched regime. These predictions are backed by extensive experiments.
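To make the concrete estimate tangible, here is a minimal Python sketch evaluating the fatigue-point formula $q \approx 0.004 \cdot n^{2.484}$ stated in the abstract; the sample dimensions are illustrative, not taken from the paper.

```python
# Minimal sketch: evaluate the concrete fatigue-point estimate
# q ~ 0.004 * n^2.484 for ternary NTRU, as stated in the abstract.
# The example dimensions below are illustrative only.

def fatigue_point(n: int) -> float:
    """Estimated modulus q at which ternary NTRU becomes overstretched."""
    return 0.004 * n ** 2.484

for n in (509, 677, 821):
    print(f"n = {n}: overstretched for q above ~{fatigue_point(n):.3e}")
```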
Hanno Becker, Jose Maria Bermudo Mera, Angshuman Karmakar, Joseph Yiu, Ingrid Verbauwhede
ePrint Report
High-degree, low-precision polynomial arithmetic is a fundamental computational primitive underlying structured lattice-based cryptography. Its algorithmic properties and suitability for implementation on different compute platforms are an active area of research, and this article contributes to this line of work. Firstly, we present memory-efficiency and performance improvements for the Toom-Cook/Karatsuba polynomial multiplication strategy. Secondly, we provide implementations of those improvements on the Arm® Cortex®-M4 CPU, as well as the newer Cortex-M55 processor, the first M-profile core implementing the M-profile Vector Extension (MVE), also known as Arm® Helium™ technology. We also implement the Number Theoretic Transform (NTT) on the Cortex-M55 processor. We show that, despite the core being single-issue, in-order, and offering only 8 vector registers compared to 32 on A-profile SIMD architectures like Arm® Neon™ technology and the Scalable Vector Extension (SVE), careful register management and instruction scheduling yield a 3× to 5× performance improvement over already highly optimized implementations on the Cortex-M4, while maintaining the low area and energy profile necessary for use in the embedded market. Finally, as a real-world application, we integrate our multiplication techniques into the post-quantum key-encapsulation mechanism Saber.
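As a reference for the multiplication strategy being optimized, here is a minimal, unoptimized Karatsuba polynomial multiplication sketch in Python; it shows the recursion whose memory behaviour the paper improves, not the authors' Cortex-M code.

```python
# Plain Karatsuba polynomial multiplication sketch (coefficients mod q).
# Inputs a, b are equal-length coefficient lists, length a power of two.
# e.g. karatsuba([1, 2, 3, 4], [5, 6, 7, 8]) == schoolbook product mod q.

def karatsuba(a, b, q=8192):
    n = len(a)
    if n <= 8:  # schoolbook base case
        c = [0] * (2 * n - 1)
        for i, ai in enumerate(a):
            for j, bj in enumerate(b):
                c[i + j] = (c[i + j] + ai * bj) % q
        return c
    h = n // 2
    z0 = karatsuba(a[:h], b[:h], q)                        # low halves
    z2 = karatsuba(a[h:], b[h:], q)                        # high halves
    zm = karatsuba([(x + y) % q for x, y in zip(a[:h], a[h:])],
                   [(x + y) % q for x, y in zip(b[:h], b[h:])], q)
    z1 = [(m - x - y) % q for m, x, y in zip(zm, z0, z2)]  # middle term
    c = [0] * (2 * n - 1)
    for i, v in enumerate(z0):
        c[i] = (c[i] + v) % q
    for i, v in enumerate(z1):
        c[i + h] = (c[i + h] + v) % q
    for i, v in enumerate(z2):
        c[i + 2 * h] = (c[i + 2 * h] + v) % q
    return c
```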
Annapurna Valiveti, Srinivas Vivek
ePrint Report
Masking using randomised lookup tables is a popular countermeasure for side-channel attacks, particularly at small masking orders. An advantage of this class of countermeasures for masking S-boxes, compared to ISW-based masking, is that it supports pre-processing and thus significantly reduces the amount of computation to be done after the unmasked inputs are available. Indeed, the online computation can be as fast as a single table lookup. But the size of the randomised lookup table increases linearly with the masking order, and hence the RAM required to store pre-processed tables becomes infeasible at higher masking orders. Demonstrating the feasibility of full pre-processing of higher-order lookup table-based masking schemes on resource-constrained devices has therefore remained an open problem.

In this work, we solve the above problem by implementing a higher-order lookup table-based scheme using an amount of RAM that is essentially independent of the masking order. More concretely, we reduce the RAM needed for the table-based scheme of Coron et al. (TCHES 2018) by a factor approximately equal to the number of shares. Our technique is based on the use of a PRG to minimise the randomness complexity of ISW-based masking schemes, as proposed by Ishai et al. (ICALP 2013) and Coron et al. (Eurocrypt 2020). Hence we show that for lookup table-based masking schemes, the use of a PRG not only reduces the randomness complexity (now logarithmic in the size of the S-box) but also the memory complexity, without any significant increase in the overall running time. We have implemented in software the higher-order table-based masking scheme of Coron et al. (TCHES 2018) at the tenth order, with full pre-processing of a single execution of all the AES S-boxes, on an ARM Cortex-M4 device with 256 KB of RAM. Our technique requires only 41.2 KB of RAM, whereas the original scheme would have needed 440 KB. Moreover, our 8-bit implementation results demonstrate that the online execution time of our variant is about 1.5 times faster than the 8-bit bitsliced masked implementation of AES-128.
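For intuition, the following toy Python sketch shows the pre-processing/online split for first-order table-based masking of a placeholder 4-bit S-box; the higher-order scheme of Coron et al. generalizes this with one table per set of shares, which is what drives the memory cost discussed above.

```python
# Toy first-order table-based masking: precompute an input-shifted,
# output-masked S-box table offline so the online phase is one lookup.
# The S-box below is a placeholder, not AES.

import secrets

SBOX = [(x * 7 + 3) % 16 for x in range(16)]  # placeholder 4-bit S-box

def preprocess():
    """Offline phase: draw masks and build the randomised table."""
    r_in = secrets.randbelow(16)
    r_out = secrets.randbelow(16)
    table = [SBOX[x ^ r_in] ^ r_out for x in range(16)]
    return r_in, r_out, table

def online(masked_x, table):
    """Online phase: a single lookup; masked_x = x ^ r_in,
    and the result equals SBOX[x] ^ r_out."""
    return table[masked_x]

r_in, r_out, table = preprocess()
x = 9
assert online(x ^ r_in, table) ^ r_out == SBOX[x]
```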
Elias Rohrer, Florian Tschorsch
ePrint Report
In order to propagate transactions and blocks, today’s blockchain systems rely on unstructured peer-to-peer overlay networks. In such networks, broadcast is known to be an inefficient operation in terms of message complexity and overhead. In addition to the impact on the system performance, inefficient or delayed block propagation may have severe consequences regarding security and fairness of the consensus layer. In contrast, the Kadcast protocol is a structured peer-to-peer protocol for block and transaction propagation in blockchain networks. Kadcast utilizes the well-known overlay topology of Kademlia to realize an efficient broadcast operation with tunable overhead. We study the security and privacy of the Kadcast protocol based on probabilistic models and analyze its resilience to packet losses and node failures. Moreover, we evaluate Kadcast’s block delivery performance, broadcast reliability, efficiency, and security based on advanced network simulations. Lastly, we introduce a QUIC-based prototype implementation of the Kadcast protocol and show its merits through deployment in a large-scale cloud-based testbed.
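To illustrate the broadcast structure, here is a simplified Python sketch of Kademlia-style bucket delegation as used by Kadcast: the initiator forwards the block to one delegate per XOR-distance bucket, and each delegate recursively covers only its smaller buckets. The redundancy and FEC that Kadcast adds for resilience are omitted, and the toy topology is an assumption for the demo.

```python
# Simplified Kademlia-style broadcast delegation (no redundancy/FEC).
# IDs are small integers; buckets[v] maps a bucket index to the peers
# node v knows in that bucket, where the bucket of p relative to v is
# the index of the highest differing bit of v ^ p.

def bucket_index(a: int, b: int) -> int:
    """Highest differing bit between two node IDs (XOR metric)."""
    return (a ^ b).bit_length() - 1

def broadcast(node: int, height: int, buckets: dict, delivered: set):
    """`node` is responsible for all buckets strictly below `height`:
    it picks one delegate per bucket and hands each a smaller height."""
    delivered.add(node)
    for h in range(height - 1, -1, -1):
        peers = buckets.get(node, {}).get(h, [])
        if peers:
            broadcast(peers[0], h, buckets, delivered)

# Toy 4-node network over a 2-bit ID space:
buckets = {
    0b00: {1: [0b10], 0: [0b01]},
    0b10: {0: [0b11]},
}
seen = set()
broadcast(0b00, 2, buckets, seen)
print(sorted(seen))  # [0, 1, 2, 3] -- every node reached once
```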
Amin Abdulrahman, Jiun-Peng Chen, Yu-Jia Chen, Vincent Hwang, Matthias J. Kannwischer, Bo-Yin Yang
ePrint Report
The U.S. National Institute of Standards and Technology (NIST) has designated ARM microcontrollers as an important benchmarking platform for its Post-Quantum Cryptography standardization process (NISTPQC). In view of this, we explore the design space of the NISTPQC finalist Saber on the Cortex-M4 and its close relation, the Cortex-M3. In the process, we investigate various optimization strategies and memory-time tradeoffs for number-theoretic transforms (NTTs).

Recent work by Chung et al. has shown that NTT multiplication is superior to Toom--Cook multiplication for unprotected Saber implementations on the Cortex-M4 in terms of speed. However, it remains unclear whether NTT multiplication can outperform Toom--Cook in masked implementations of Saber. It is also an open question whether Saber with NTTs can outperform Toom--Cook in terms of stack usage. We answer both questions in the affirmative. Additionally, we present a Cortex-M3 implementation of Saber using NTTs that outperforms an existing Toom--Cook implementation. Our stack-optimized unprotected M4 implementation uses around the same amount of stack as the most stack-optimized Toom--Cook implementation while being 33%-41% faster. Our speed-optimized masked M4 implementation is 16% faster than the fastest masked implementation using Toom--Cook. For the Cortex-M3, we outperform existing implementations by 29%-35% in speed.

We conclude that for both stack- and speed-optimization purposes, one should base polynomial multiplications in Saber on the NTT rather than Toom--Cook for the Cortex-M4 and Cortex-M3. In particular, in many cases, composite moduli NTTs perform best.
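For readers unfamiliar with the transform at the heart of these results, here is a minimal recursive NTT sketch in Python over a toy prime modulus. Saber's own moduli are powers of two, so NTT-based Saber implementations compute over a larger NTT-friendly modulus and reduce afterwards; this sketch shows only the butterfly structure of the transform itself.

```python
# Minimal recursive Cooley-Tukey NTT over Z_q. `w` must be a primitive
# len(a)-th root of unity mod q, and len(a) a power of two.

def ntt(a, q, w):
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], q, w * w % q)
    odd = ntt(a[1::2], q, w * w % q)
    out = [0] * n
    t = 1  # running twiddle factor w^i
    for i in range(n // 2):
        out[i] = (even[i] + t * odd[i]) % q
        out[i + n // 2] = (even[i] - t * odd[i]) % q
        t = t * w % q
    return out

# Toy example: q = 17, w = 4 is a primitive 4th root of unity (4^2 = -1).
print(ntt([1, 2, 3, 4], 17, 4))  # [10, 7, 15, 6]
```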
Dana Dachman-Soled, Huijing Gong, Hunter Kippen, Aria Shahverdi
ePrint Report
We consider the Learning Parity with Noise (LPN) problem with sparse secret, where the secret vector $\textbf{s}$ of dimension $n$ has Hamming weight at most $k$. We are interested in algorithms with asymptotic improvement in the $\textit{exponent}$ beyond the state of the art. Prior work in this setting presented algorithms with runtime $n^{c \cdot k}$ for constant $c < 1$, obtaining a constant-factor improvement over brute-force search, which runs in time ${n \choose k}$. We obtain the following results:

- We first consider the $\textit{constant}$ error rate setting, and in this case present a new algorithm that leverages a subroutine from the acclaimed BKW algorithm [Blum, Kalai, Wasserman, J.~ACM '03] as well as techniques from Fourier analysis for $p$-biased distributions. Our algorithm achieves asymptotic improvement in the exponent compared to prior work when the sparsity $k = k(n) = \frac{n}{\log^{1+ 1/c}(n)}$, where $c \in o(\log \log(n))$ and $c \in \omega(1)$. The runtime and sample complexity of this algorithm are approximately the same.

- We next consider the $\textit{low noise}$ setting, where the error is subconstant. We present a new algorithm in this setting that requires only a $\textit{polynomial}$ number of samples and achieves asymptotic improvement in the exponent compared to prior work when the sparsity $k = \frac{1}{\eta} \cdot \frac{\log(n)}{\log(f(n))}$, for noise rate $\eta \neq 1/2$ with $\eta^2 = \left(\frac{\log(n)}{n} \cdot f(n)\right)$ and $f(n) \in \omega(1) \cap n^{o(1)}$. To obtain the improvement in sample complexity, we create subsets of samples using the $\textit{design}$ of Nisan and Wigderson [J.~Comput.~Syst.~Sci. '94], so that any two subsets have a small intersection while the number of subsets is large. Each of these subsets is used to generate a single $p$-biased sample for the Fourier analysis step. We then show that this allows us to bound the covariance of pairs of samples, which is sufficient for the Fourier analysis.

- Finally, we show that our first algorithm extends to the setting where the noise rate is very high, $1/2 - o(1)$, in which case it can be used as a subroutine to obtain new algorithms for learning DNFs and Juntas. Our algorithms achieve asymptotic improvement in the exponent for certain regimes. For DNFs of size $s$ with approximation factor $\epsilon$, this regime is when $\log \frac{s}{\epsilon} \in \omega \left( \frac{c}{\log n \log \log c}\right)$ and $\log \frac{s}{\epsilon} \in n^{1 - o(1)}$, for $c \in n^{1 - o(1)}$. For $k$-Juntas, the regime is when $k \in \omega \left( \frac{c}{\log n \log \log c}\right)$ and $k \in n^{1 - o(1)}$, for $c \in n^{1 - o(1)}$.
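As a baseline for the runtime comparisons above, here is a toy Python brute-force search over all ${n \choose k}$ sparse supports; the paper's algorithms improve on this exponent, and the sketch only makes the reference point concrete (the error threshold is an illustrative parameter).

```python
# Toy brute-force baseline for sparse-secret LPN: try every k-sparse
# support, costing about binom(n, k) candidate checks.

from itertools import combinations

def brute_force(A, b, k, max_errors):
    """A: list of 0/1 sample vectors, b: noisy inner-product labels.
    Return the first k-sparse support explaining all but max_errors
    labels, or None."""
    n = len(A[0])
    for support in combinations(range(n), k):
        errors = sum(
            ((sum(a[i] for i in support) & 1) != bi)
            for a, bi in zip(A, b)
        )
        if errors <= max_errors:
            return support
    return None
```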
Ye Dong, Xiaojun Chen, Kaiyun Li, Dakui Wang, Shuai Zeng
ePrint Report
\textit{Privacy} and \textit{Byzantine-robustness} are two major concerns of federated learning (FL), but mitigating both threats simultaneously is highly challenging: privacy-preserving strategies prohibit access to individual model updates to avoid leakage, while Byzantine-robust methods require such access for comprehensive mathematical analysis. Moreover, most Byzantine-robust methods only work in the \textit{honest-majority} setting.

We present $\mathsf{FLOD}$, a novel oblivious defender for private Byzantine-robust FL in the dishonest-majority setting. At its core, we propose a novel Hamming distance-based aggregation method to resist $>1/2$ Byzantine attacks, using a small \textit{root-dataset} and \textit{server-model} to bootstrap trust. Furthermore, we employ two non-colluding servers and use additive homomorphic encryption ($\mathsf{AHE}$) and secure two-party computation (2PC) primitives to construct efficient privacy-preserving building blocks for secure aggregation, in which we propose two novel in-depth variants of Beaver multiplication triples (MT) to significantly reduce the overhead of bit-to-arithmetic ($\mathsf{Bit2A}$) conversion and vector weighted sum aggregation ($\mathsf{VSWA}$). Experiments on real-world and synthetic datasets demonstrate our effectiveness and efficiency: (i) $\mathsf{FLOD}$ defeats known Byzantine attacks with a negligible effect on accuracy and convergence, (ii) achieves a reduction of $\approx 2\times$ in the offline (resp. online) overhead of $\mathsf{Bit2A}$ and $\mathsf{VSWA}$ compared to $\mathsf{ABY}$-$\mathsf{AHE}$ (resp. $\mathsf{ABY}$-$\mathsf{MT}$) based methods (NDSS'15), and (iii) reduces total online communication and run-time by $167$-$1416\times$ and $3.1$-$7.4\times$ compared to $\mathsf{FLGUARD}$ (Crypto ePrint 2021/025).
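For intuition, the following plaintext NumPy sketch shows one plausible form of Hamming distance-based aggregation against a trusted server-model reference. The specific weighting/clipping rule is an illustrative assumption, not the paper's exact formula, and in $\mathsf{FLOD}$ the whole computation runs obliviously under AHE/2PC between the two servers.

```python
# Plaintext toy of Hamming distance-based aggregation: each client
# update is sign-quantized to bits, scored by Hamming distance to the
# trusted server-model update, and weighted accordingly. The clipping
# rule below is illustrative only.

import numpy as np

def aggregate(client_updates, server_update):
    ref_bits = server_update >= 0
    d = ref_bits.size
    weights = []
    for u in client_updates:
        hamming = np.count_nonzero((u >= 0) != ref_bits)
        weights.append(max(d // 2 - hamming, 0))  # far from reference -> 0
    total = sum(weights) or 1
    return sum(w * u for w, u in zip(weights, client_updates)) / total
```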
Kaizhan Lin, Jianming Lin, Weize Wang, Chang-an Zhao
ePrint Report
In recent years, the isogeny-based protocol supersingular isogeny Diffie-Hellman (SIDH) has become highly attractive for its small public key size. In addition, public-key compression makes the supersingular isogeny key encapsulation scheme (SIKE) more competitive in the NIST post-quantum cryptography standardization effort. However, compared to other post-quantum protocols, the computational cost of SIDH is relatively high, and so is that of public-key compression. Moreover, the storage required for the pairing computations and discrete logarithms that speed up the current implementation of key compression is somewhat large.

In this paper, we improve the performance of the public-key compression of SIDH, especially the efficiency and storage of the pairing computation involved. Our experimental results show that the memory requirement for the pairing computation is reduced by a factor of about 1.31, while the key generation of SIDH is $3.99\%\sim 5.95\%$ faster than the current state-of-the-art. Besides, in the case of Bob, we present another method to further reduce the storage cost, although its speedup is less pronounced than the former.
Naila Mukhtar, Lejla Batina, Stjepan Picek, Yinan Kong
ePrint Report
Deep learning-based side-channel analysis performance heavily depends on the dataset size and the number of instances in each target class. Both small and imbalanced datasets might lead to unsuccessful side-channel attacks. The attack performance can be improved by generating traces synthetically from the obtained data instances instead of collecting them from the target device. Unfortunately, generating synthetic traces that have the characteristics of the actual traces from random noise is a difficult and cumbersome task. This research proposes a novel data augmentation approach based on conditional generative adversarial networks (cGAN) and Siamese networks, thereby enhancing the attack capability. We present a quantitative comparative machine learning-based side-channel analysis between a real raw signal leakage dataset and an artificially augmented leakage dataset. The analysis is performed on leakage datasets for both symmetric and public-key cryptographic implementations. We also investigate the effect of non-convergent networks on the generation of fake leakage signals using two cGAN-based deep learning models. The analysis shows that the proposed data augmentation model results in a well-converged network that generates realistic leakage traces, which can be used to mount deep learning-based side-channel analysis successfully even when the dataset available from the device is not optimal. Our results show potential in breaking datasets enhanced with ``faked'' leakage traces, which could change the way we perform deep learning-based side-channel analysis.
Sabrina Kunzweiler, Yan Bo Ti, Charlotte Weitkämper
ePrint Report
We present a polynomial-time adaptive attack on the genus-2 variant of the SIDH protocol (G2SIDH) and describe an improvement to its secret selection procedure. G2SIDH is a generalisation of the supersingular isogeny Diffie-Hellman key exchange to the genus-2 setting, proposed by Flynn and Ti. G2SIDH achieves the same security as SIDH while using fields a third of the size. We give a thorough analysis of the keyspace of G2SIDH and improve the secret selection by using symplectic bases for the torsion subgroups. This allows near-uniform sampling of secrets without needing to solve multiple linear congruences, as suggested by Flynn-Ti. The proposed adaptive attack on G2SIDH is able to recover the secret when furnished with an oracle that returns a single bit of information. We ensure that the maliciously generated information provided by the attacker cannot be detected by simple countermeasures such as checking the Weil pairing or the order of the given points. We demonstrate this attack and show that it is able to recover the secret isogeny in all cases of G2SIDH using a symplectic basis, before extending the strategy to arbitrary bases.
Jia Xu, Yiwen Gao, Hoon Wei Lim, Hongbing Wang, Ee-Chien Chang
ePrint Report
A $(1,n)$-robust combiner combines $n$ cryptographic primitives to construct a new primitive of the same type, and guarantees that if any of the ingredient primitives is secure, then the resulting primitive is secure. In the past two decades, robust combiners for various crypto primitives (e.g. public-key encryption, oblivious transfer) have been proposed. Very recently, more works on robust combiners for post-quantum key encapsulation mechanisms have appeared, aiming at multiple layers of defence against the future threat of Shor's algorithm running on powerful quantum computers. However, such a combination of $n$ crypto primitives typically sums up the running times of all ingredient primitives, introducing a linear overhead in time complexity. This may be a significant burden on the server side, since the server has to run the key encapsulation mechanism (or key exchange protocol) with every online client.

We propose the first robust combiner (of KEMs) with $O(1)$ \emph{amortized} complexity overhead, which not only breaks the linear boundary but also achieves optimal complexity. Our experiments confirm that the performance overhead of our robust combiner of $n$ KEMs is constant (i.e. $O(1)$) rather than linear (i.e. $O(n)$). The cost is that the resulting KEM has to maintain a secret dynamic state of fixed, linear size (i.e. $O(n)$). We call such a KEM a stateful key encapsulation mechanism (SKEM). SKEM is suitable for two users (or devices) who have \emph{frequent} secure communications (e.g. via VPN or SSH). We also formally define the security formulation for SKEM and prove the security of our proposed SKEM scheme in the standard model.
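As context for the linear overhead being eliminated, here is a minimal Python sketch of the classic run-them-all KEM combiner, hashing all $n$ shared secrets into one key. The `kem.encaps(pk)` interface is an assumed placeholder; the paper's stateful construction avoids paying for all $n$ encapsulations on every exchange.

```python
# Classic (1, n)-robust KEM combiner baseline: run every ingredient KEM
# and hash the concatenated shared secrets, so the combined key is
# secure if any single KEM is. Overhead is O(n) per encapsulation --
# exactly the cost the paper's stateful SKEM amortizes to O(1).

import hashlib

def combined_encaps(kems, pks):
    """kems: objects with an assumed encaps(pk) -> (ct, ss) method."""
    ciphertexts, shared = [], []
    for kem, pk in zip(kems, pks):
        ct, ss = kem.encaps(pk)
        ciphertexts.append(ct)
        shared.append(ss)
    key = hashlib.sha256(b"".join(shared)).digest()
    return ciphertexts, key
```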
George Teseleanu
ePrint Report
Concurrent signatures allow two entities to produce two ambiguous signatures that become binding once an extra piece of information (called the keystone) is released. Such a signature scheme was developed by Chen \emph{et al.}, but it restricts signers to using the same public parameters. We describe and analyse a new concurrent signature that allows users to sign documents even if they use different underlying hard problems when generating their public parameters.

27 July 2021

Kai Gellert, Tobias Handirk
ePrint Report
The TLS 1.3 session resumption handshake enables a client and a server to resume a previous connection via a shared secret established during a previous session. In practice, this is often done via session tickets, where the server provides a "self-encrypted" ticket containing the shared secret to its clients. A client may resume its session by sending the ticket to the server, which allows the server to retrieve the shared secret stored within the ticket.

Usually, a ticket is only accepted by the server that issued it. In practice, however, servers that share the same hostname often share the same key material for ticket encryption. The concept of a server accepting a ticket that was issued by a different server is known as session resumption across hostnames (SRAH). In 2020, Sy et al. showed in an empirical analysis that, by using SRAH, the time to load a webpage can be reduced by up to 31% when visiting the page for the very first time. Despite its performance advantages, the TLS 1.3 specification currently discourages the use of SRAH.
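To make the ticket mechanism concrete, here is a minimal Python sketch of a "self-encrypted" ticket using AES-GCM from the `cryptography` package; the field layout is illustrative, not the TLS 1.3 wire format. Under SRAH, the ticket key would be shared by all servers for the same hostname.

```python
# Sketch of a "self-encrypted" session ticket: the server seals the
# resumption secret under a ticket key that only it (and, under SRAH,
# other servers for the same hostname) holds.

import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

TICKET_KEY = AESGCM.generate_key(bit_length=128)  # shared per hostname

def issue_ticket(resumption_secret: bytes) -> bytes:
    nonce = os.urandom(12)
    return nonce + AESGCM(TICKET_KEY).encrypt(nonce, resumption_secret, b"")

def redeem_ticket(ticket: bytes) -> bytes:
    nonce, ct = ticket[:12], ticket[12:]
    return AESGCM(TICKET_KEY).decrypt(nonce, ct, b"")

secret = os.urandom(32)
assert redeem_ticket(issue_ticket(secret)) == secret
```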

In this work, we formally investigate which security guarantees can be achieved when using SRAH. To this end, we provide the first formalization of SRAH and analyze its security in the multi-stage key exchange model (Dowling et al.; JoC 2021), which proved useful in previous analyses of TLS handshakes. We find that an adversary can break authentication if clients do not specify the intended receiver of their first protocol message. However, if the intended receiver is specified by the client, we prove that SRAH is secure in the multi-stage key exchange model.
Announcement
Dear Cryptographers,

Here you can find a compilation of mentoring videos with Q&As on questions such as:
  • How do you prepare a good talk?
  • Was there a time when you doubted yourself?
  • How do you find a research topic?
  • And many, many more, all answered by people who have been through it before you; there will be many familiar faces.
The organizers:
  • Peihan Miao
  • Tal Rabin
  • Xiao Wang

25 July 2021

Dan Boneh, Hart Montgomery, Ananth Raghunathan
ePrint Report
We construct an algebraic pseudorandom function (PRF) that is more efficient than the classic Naor-Reingold algebraic PRF. Our PRF is the result of adapting the cascade construction, which is the basis of HMAC, to the algebraic setting. To do so, we define an augmented cascade and prove it secure when the underlying PRF satisfies a property called parallel security. We then use the augmented cascade to build new algebraic PRFs. The algebraic structure of our PRF leads to an efficient large-domain verifiable random function (VRF) and a large-domain simulatable VRF.
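For readers unfamiliar with the cascade, this short Python sketch shows the plain (non-algebraic) construction with HMAC-SHA256 standing in for the underlying PRF: each block's output becomes the next key. The paper's augmented cascade adapts this idea to algebraic PRFs; this sketch is only the baseline structure.

```python
# Plain cascade PRF sketch: fold input blocks through an underlying PRF,
# feeding each output back as the next key (the structure behind HMAC).

import hmac, hashlib

def cascade_prf(key: bytes, blocks) -> bytes:
    k = key
    for x in blocks:
        k = hmac.new(k, x, hashlib.sha256).digest()  # k <- PRF(k, x)
    return k

tag = cascade_prf(b"\x00" * 32, [b"block-1", b"block-2"])
```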
Gachon University, Korea
Job Posting
Post-doctoral fellow position in the Department of Computer Engineering at Gachon University in the field of information security, including cryptography, for an appointment of at least two years. Applicants should hold a Ph.D. degree as of August 31, 2021. Please email, with subject 'Postdoc position', a statement of research, CV, recommendation letters or referees, and publication records to sohwang (at) gachon.ac.kr.

Closing date for applications: 7 August 2021

Contact: Professor Seong Oun Hwang at sohwang (at) gachon.ac.kr

More information: https://ai-security.github.io/index_e.htm


23 July 2021

Hanno Becker, Vincent Hwang, Matthias J. Kannwischer, Bo-Yin Yang, Shang-Yi Yang
ePrint Report
We present new speed records on the Arm-v8A architecture for the lattice-based schemes Dilithium, Kyber, and Saber. The core novelty in this paper is the combination of Montgomery multiplication and Barrett reduction resulting in “Barrett multiplication” which allows particularly efficient modular one-known-factor multiplication using the Arm-v8A Neon vector instructions. These novel techniques combined with fast two-unknown-factor Montgomery multiplication, Barrett reduction sequences, and interleaved multi-stage butterflies result in significantly faster code. We also introduce “asymmetric multiplication” which is an improved technique for caching the results of the incomplete NTT, used e.g. for matrix-to-vector polynomial multiplication. Our implementations target the Arm Cortex-A72 CPU, on which our speed is 1.7× that of the state-of-the-art matrix-to-vector polynomial multiplication in Kyber [Nguyen–Gaj 2021]. For Saber, NTTs are far superior to Toom–Cook multiplication on the Arm-v8A architecture, outrunning the matrix-to-vector polynomial multiplication by 2.1×. On the Apple M1, our matrix-vector products run 2.1× and 1.9× faster for Kyber and Saber respectively.
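For intuition, here is a scalar Python sketch of Barrett reduction with a precomputed reciprocal; the paper's "Barrett multiplication" fuses this idea with the modular multiplication itself in Neon vector code on packed 16-bit lanes, which this integer sketch does not attempt to reproduce. The modulus in the example is Kyber's $q = 3329$.

```python
# Scalar Barrett reduction sketch: approximate x / q by a precomputed
# reciprocal so that reduction costs only multiplications and a shift.
# Valid for 0 <= x < 2^shift; at most one correction step is needed.

def barrett_reduce(x: int, q: int, shift: int = 32) -> int:
    m = (1 << shift) // q          # precomputed floor(2^shift / q)
    t = (x * m) >> shift           # approximate quotient floor(x / q)
    r = x - t * q
    return r - q if r >= q else r

assert barrett_reduce(123456, 3329) == 123456 % 3329
```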
Karim Lounis
ePrint Report
Due to the heterogeneity and the particular security requirements of the IoT (Internet of Things), developing secure, low-cost, and lightweight authentication protocols has become a serious challenge. This has motivated the research community to design and develop new authentication protocols that meet IoT requirements. An interesting hardware technology, called PUFs (Physical Unclonable Functions), has been the subject of many publications on lightweight, low-cost, and secure-by-design authentication protocols over the past six years. In 2020, a lightweight PUF-based authenticated key-exchange (AKE) scheme was proposed, claiming to provide mutual authentication and key establishment. The protocol was later demonstrated to be vulnerable to a spoofing attack, whereby an attacker can compromise the authentication claims made during the execution of the protocol. Recently, some researchers have contested the validity of that attack, attributing it to a misunderstanding of security protocol specification principles. In this paper, we show how both the authentication claim and the key-establishment claim of the protocol can be compromised by spoofing the server and fooling the meter.
Alan Szepieniec
ePrint Report
This paper proposes the use of Legendre symbols as component gates in the design of ciphers tailored for use in cryptographic proof systems. Legendre symbols correspond to high-degree maps, but can be evaluated much faster. As a result, a cipher that uses Legendre symbols can offer the same security as one that uses high-degree maps but without incurring the penalty of a comparatively slow evaluation time.

After discussing the design considerations induced by the use of Legendre symbol gates, we present a concrete design that follows this strategy, along with an elaborate security analysis thereof. This cipher is called Grendel.
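As a reference for the component gate in question, here is a short Python evaluation of the Legendre symbol via Euler's criterion; inside a proof system the symbol can instead be supplied as a witness and checked with low-degree constraints, which is roughly the cost asymmetry the abstract exploits.

```python
# Legendre symbol of a mod an odd prime p via Euler's criterion:
# a^((p-1)/2) mod p is 1 for nonzero squares and p-1 for non-squares.

def legendre(a: int, p: int) -> int:
    s = pow(a, (p - 1) // 2, p)
    return -1 if s == p - 1 else s  # 0 if p divides a

assert legendre(2, 7) == 1   # 2 = 3^2 mod 7 is a square
assert legendre(3, 7) == -1  # 3 is a non-square mod 7
```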
Elena Fuchs, Kristin Lauter, Matthew Litman, Austin Tran
ePrint Report
Cryptographic hash functions from expander graphs were proposed by Charles, Goren, and Lauter in [CGL], based on the hardness of finding paths in the graph. In this paper, we propose a new candidate hash function based on the hardness of finding paths in the graph of Markoff triples modulo $p$. These graphs have been studied extensively in number theory and various other fields, and yet finding paths in them remains difficult. We discuss the hardness of finding paths between points based on the structure of the Markoff graphs. We investigate several possible avenues for attack and estimate their running time to be greater than $O(p)$. In particular, we analyze a recent groundbreaking proof in [BGS1] that such graphs are connected, and discuss how this proof gives an algorithm for finding paths.
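To fix ideas, here is a small Python sketch of stepping through the Markoff graph mod $p$ via the Vieta involutions on solutions of $x^2 + y^2 + z^2 = 3xyz$. How a CGL-style hash would derive moves from input bits is left abstract; the move sequence below is illustrative only.

```python
# Walk in the Markoff graph mod p: triples satisfy
# x^2 + y^2 + z^2 = 3xyz (mod p), and each Vieta involution replaces
# one coordinate by the other root of its quadratic.

def vieta(t, i, p):
    x, y, z = t
    if i == 0:
        return ((3 * y * z - x) % p, y, z)
    if i == 1:
        return (x, (3 * x * z - y) % p, z)
    return (x, y, (3 * x * y - z) % p)

def on_surface(t, p):
    x, y, z = t
    return (x * x + y * y + z * z - 3 * x * y * z) % p == 0

p = 101
t = (1, 1, 1)  # satisfies the Markoff equation
for move in (0, 1, 2, 0, 1):  # illustrative move sequence
    t = vieta(t, move, p)
    assert on_surface(t, p)
print(t)
```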