IACR News
Here you can see all recent updates to the IACR webpage.
30 January 2024
Jung Hee Cheon, Hyeongmin Choe, Alain Passelègue, Damien Stehlé, Elias Suvanto
ePrint Report
A new security model for fully homomorphic encryption (FHE), called INDCPA-D security and introduced by Li and Micciancio [Eurocrypt'21], strengthens INDCPA security by giving the attacker access to a decryption oracle for ciphertexts for which it should know the underlying plaintexts. These include ciphertexts that it (honestly) encrypted and those obtained from the latter by evaluating circuits that it chose. Li and Micciancio singled out the CKKS FHE scheme for approximate data [Asiacrypt'17] by giving an INDCPA-D attack on it and (erroneously) claiming that INDCPA and INDCPA-D security coincide for FHEs on exact data.
We correct the widespread belief according to which INDCPA-D attacks are specific to approximate homomorphic computations. Indeed, the equivalence formally proved by Li and Micciancio assumes that the schemes are not only exact but also have a negligible probability of incorrect decryption. However, almost all competitive implementations of exact FHE schemes forgo such strong correctness guarantees: they analyze correctness heuristically and allow noticeable probabilities of incorrect decryption.
We exploit this imperfect correctness to mount efficient indistinguishability and key-recovery attacks against all major exact FHE schemes. We illustrate their strength by concretely breaking the default BFV implementation of OpenFHE and simulating an attack for the default parameter set of the CGGI implementation of TFHE-rs (the attack is too expensive to be run on commodity desktops, because of the cost of CGGI bootstrapping). Our attacks extend to threshold versions of the exact FHE schemes, when the correctness is similarly loose.
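In slightly more detail (our paraphrase of the Li-Micciancio game; the bookkeeping conventions are ours, not taken from the abstract above): the challenger keeps a secret bit $b$ and a list $S$ of triples $(m_0, m_1, c)$. On an encryption query $(m_0, m_1)$ it returns $c \gets \mathrm{Enc}(m_b)$ and appends $(m_0, m_1, c)$ to $S$; on an evaluation query $(g, i_1, \dots, i_k)$ it returns $c \gets \mathrm{Eval}(g, S[i_1].c, \dots, S[i_k].c)$ and appends $(g(S[i_1].m_0, \dots), g(S[i_1].m_1, \dots), c)$; on a decryption query $i$ it returns $\mathrm{Dec}(S[i].c)$ if $S[i].m_0 = S[i].m_1$ and $\bot$ otherwise. The adversary wins if it guesses $b$. When decryption never fails, the decryption oracle only returns values the adversary already knows, which is why INDCPA-D was believed to collapse to INDCPA for exact schemes; when decryption can fail with noticeable probability, the returned values depend on the secret-key-dependent noise, and this is precisely what the attacks above exploit.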
Emanuele Bellini, David Gerault, Matteo Protopapa, Matteo Rossi
ePrint Report
The search for differential characteristics on block ciphers is a difficult combinatorial problem. In this paper, we investigate the performance of an AI-originated technique, Single-Player Monte-Carlo Tree Search (SP-MCTS), in finding good differential characteristics on ARX ciphers, with an application to the block cipher SPECK. In order to make this approach competitive, we include several heuristics, such as the combination of forward and backward searches, and achieve significantly faster results than state-of-the-art works that are not based on automatic solvers.
We reach 9, 11, 13, 13 and 15 rounds for SPECK32, SPECK48, SPECK64, SPECK96 and SPECK128, respectively. In order to build our algorithm, we revisit Lipmaa and Moriai's algorithm for listing all optimal differential transitions through modular addition, and propose a variant that enumerates all transitions with probability close to the optimal one (up to a fixed threshold), while fixing a minor bug in the original algorithm.
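For reference, the Lipmaa-Moriai criterion and probability formula that the paper revisits can be checked in a few lines of Python. The sketch below is a straightforward transcription of their validity test and $2^{-w}$ probability formula (Lipmaa-Moriai, FSE 2001), not the enumeration variant proposed in the paper; it is included only to make the object being enumerated concrete.

```python
import random

def xdp_add(alpha, beta, gamma, n):
    """XOR-differential probability of addition mod 2^n (Lipmaa-Moriai formula)."""
    mask = (1 << n) - 1
    # eq(a, b, c): bit i is 1 iff a, b, c agree on bit i
    eq = lambda a, b, c: (~(a ^ b) & ~(a ^ c)) & mask
    # Validity test: the differential is possible iff this expression is zero
    if eq(alpha << 1, beta << 1, gamma << 1) & (alpha ^ beta ^ gamma ^ (beta << 1)) & mask:
        return 0.0
    # Probability 2^{-w}, w = #bit positions (excluding the MSB) where alpha, beta, gamma differ
    w = bin(~eq(alpha, beta, gamma) & (mask >> 1)).count("1")
    return 2.0 ** (-w)

def xdp_add_exhaustive(alpha, beta, gamma, n):
    """Brute-force check for small n."""
    mask = (1 << n) - 1
    cnt = sum(1 for x in range(1 << n) for y in range(1 << n)
              if (((x ^ alpha) + (y ^ beta)) ^ (x + y)) & mask == gamma)
    return cnt / (1 << (2 * n))

if __name__ == "__main__":
    random.seed(0)
    n = 4
    for _ in range(20):
        a, b, c = (random.randrange(1 << n) for _ in range(3))
        assert abs(xdp_add(a, b, c, n) - xdp_add_exhaustive(a, b, c, n)) < 1e-12
    print("Lipmaa-Moriai formula matches exhaustive search for n =", n)
```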
Award
Dear IACR members,
Nominations for the 2024 Test-of-Time award (for papers published in 2009) will be accepted until Feb 28, 2024.
Details for the nomination process can be found here: https://www.iacr.org/testoftime/nomination.html
29 January 2024
New self-orthogonal codes from weakly regular plateaued functions and their application in LCD codes
Melike Çakmak, Ahmet Sınak, Oğuz Yayla
ePrint Report
A linear code is considered self-orthogonal if it is contained within its dual code. Self-orthogonal codes have applications in linear complementary dual (LCD) codes, quantum codes, and so on. The construction of self-orthogonal codes from functions over finite fields has been studied in the literature. In this paper, we generalize the construction method given by Heng et al. (2023) to weakly regular plateaued functions. We first construct several families of ternary self-orthogonal codes from weakly regular plateaued unbalanced functions. Then we use the self-orthogonal codes to construct new families of ternary LCD codes. As a consequence, we obtain (almost) optimal ternary self-orthogonal codes and LCD codes. Moreover, we propose two new classes of $p$-ary linear codes from weakly regular plateaued unbalanced functions over finite fields of odd characteristic.
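As a quick sanity check of the definition (the tetracode example below is standard and not taken from the paper): a linear code with generator matrix $G$ over $\mathbb{F}_3$ is self-orthogonal exactly when every pair of rows of $G$ is orthogonal, i.e. $GG^T \equiv 0 \pmod 3$.

```python
# Toy check of self-orthogonality over GF(3): all pairs of generator rows
# (including a row with itself) must be orthogonal modulo 3.

def is_self_orthogonal(G, p=3):
    k = len(G)
    return all(
        sum(G[i][t] * G[j][t] for t in range(len(G[0]))) % p == 0
        for i in range(k) for j in range(k)
    )

if __name__ == "__main__":
    # The [4,2,3] tetracode over GF(3) is self-dual, hence self-orthogonal.
    tetracode = [
        [1, 1, 1, 0],
        [0, 1, 2, 1],  # 2 == -1 mod 3
    ]
    print(is_self_orthogonal(tetracode))  # True
```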
Akira Ito, Rei Ueno, Naofumi Homma
ePrint Report
In conventional deep-learning-based side-channel attacks (DL-SCAs), an attacker trains a model by updating parameters to minimize the negative log-likelihood (NLL) loss function. Although a decrease in NLL improves DL-SCA performance, the reasons for this improvement remain unclear because of the lack of a formal analysis. To address this open problem, this paper explores the relationship between NLL and the attack success rate (SR) through an information-theoretical analysis of DL-SCAs with an NLL loss function. To this end, we introduce a communication channel for DL-SCAs and derive an inequality that links model outputs to the SR. Our inequality states that the mutual information between the model output and the intermediate value, which we name the latent perceived information (LPI), provides an upper bound on the SR of a DL-SCA with a trained neural network. Subsequently, we examine the conjecture by Ito et al. on the relationship between the effective perceived information (EPI) and the SR, and clarify the conditions under which it holds from the perspective of LPI. Our analysis reveals that a decrease in NLL correlates with an increase in LPI, indicating that the model's capability to extract intermediate-value information from traces is enhanced. In addition, the results indicate that the LPI bounds the SR from above, and that a higher upper bound on the SR can directly translate into a higher SR if the selection function satisfies certain conditions, such as bijectivity. Finally, these theoretical insights are validated through attack experiments on neural network models applied to AES software and hardware implementations with masking countermeasures.
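As a rough illustration of the kind of quantity involved, the sketch below computes the classical perceived-information-style estimate from model output probabilities under a uniform prior. It is used here only as an empirical surrogate for a mutual-information measure between model output and intermediate value, and is not claimed to be the paper's exact definition of LPI.

```python
import numpy as np

def perceived_information(probs, labels, num_classes):
    """
    probs:  (N, num_classes) array of model outputs p_model(z | trace_i)
    labels: (N,) true intermediate values z_i
    Returns H(Z) + mean_i log2 p_model(z_i | trace_i), assuming a uniform prior on Z.
    """
    h_z = np.log2(num_classes)                       # entropy of a uniform intermediate value
    ll = np.log2(probs[np.arange(len(labels)), labels] + 1e-30)
    return h_z + ll.mean()

if __name__ == "__main__":
    # Toy usage: an uninformative (uniform) model scores ~0 bits,
    # a perfectly confident and correct model scores ~H(Z) = 8 bits.
    N, C = 1000, 256
    labels = np.random.randint(0, C, size=N)
    uniform = np.full((N, C), 1.0 / C)
    perfect = np.zeros((N, C)); perfect[np.arange(N), labels] = 1.0
    print(perceived_information(uniform, labels, C))  # ~ 0.0
    print(perceived_information(perfect, labels, C))  # ~ 8.0
```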
Elette Boyle, Ilan Komargodski, Neekon Vafa
ePrint Report
We study the complexity of memory checkers with computational security and prove the first general tight lower bound.
Memory checkers, first introduced over 30 years ago by Blum, Evans, Gemmel, Kannan, and Naor (FOCS '91, Algorithmica '94), allow a user to store and maintain a large memory on a remote and unreliable server by using small trusted local storage. The user can issue instructions to the server and after every instruction, obtain either the correct value or a failure (but not an incorrect answer) with high probability. The main complexity measure of interest is the size of the local storage and the number of queries the memory checker makes upon every logical instruction. The most efficient known construction has query complexity $O(\log n/\log\log n)$ and local space proportional to a computational security parameter, assuming one-way functions, where $n$ is the logical memory size. Dwork, Naor, Rothblum, and Vaikuntanathan (TCC '09) showed that for a restricted class of ``deterministic and non-adaptive'' memory checkers, this construction is optimal, up to constant factors. However, going beyond the small class of deterministic and non-adaptive constructions has remained a major open problem.
In this work, we fully resolve the complexity of memory checkers by showing that any construction with local space $p$ and query complexity $q$ must satisfy $$ p \ge \frac{n}{(\log n)^{O(q)}} \;. $$ This implies, as a special case, that $q\ge \Omega(\log n/\log\log n)$ in any scheme, assuming that $p\le n^{1-\varepsilon}$ for $\varepsilon>0$. The bound applies to any scheme with computational security, completeness $2/3$, and inverse polynomial in $n$ soundness (all of which make our lower bound only stronger). We further extend the lower bound to schemes where the read complexity $q_r$ and write complexity $q_w$ differ. For instance, we show the tight bound that if $q_r=O(1)$ and $p\le n^{1-\varepsilon}$ for $\varepsilon>0$, then $q_w\ge n^{\Omega(1)}$. This is the first lower bound, for any non-trivial class of constructions, showing a read-write query complexity trade-off.
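For concreteness, the first special case follows by rearranging the main bound (writing $c$ for the hidden constant): $$ p \ge \frac{n}{(\log n)^{cq}} \ \text{ and } \ p \le n^{1-\varepsilon} \implies (\log n)^{cq} \ge n^{\varepsilon} \implies q \ge \frac{\varepsilon}{c} \cdot \frac{\log n}{\log\log n}. $$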
Our proof is via a delicate compression argument showing that a ``too good to be true'' memory checker can be used to compress random bits of information. We draw inspiration from tools recently developed for lower bounds for relaxed locally decodable codes. However, our proof itself significantly departs from these works, necessitated by the differences between settings.
Gaurav Panwar, Roopa Vishwanathan, George Torres, Satyajayant Misra
ePrint Report
Payment channel networks are a promising solution to the scalability challenge of blockchains and are designed for significantly increased transaction throughput compared to the layer-one blockchain. Since payment channel networks are essentially decentralized peer-to-peer networks, routing transactions is a fundamental challenge. Payment channel networks have some unique security and privacy requirements that make pathfinding challenging: for instance, the network topology is not publicly known, and sender/receiver privacy should be preserved, in addition to providing atomicity guarantees for payments. In this paper, we present an efficient privacy-preserving routing protocol, SPRITE, for payment channel networks that supports concurrent transactions. By finding paths offline and processing transactions online, SPRITE can process transactions in just two rounds, which is more efficient than prior work. We evaluate SPRITE's performance using Lightning Network data and prove its security using the Universal Composability framework. In contrast to current cutting-edge methods that achieve rapid transactions, our approach reduces the message complexity of the system by three orders of magnitude while maintaining similar latencies.
Stephen Meredith Williams
ePrint Report
In its standard form, the AKS primality testing algorithm is deterministic and polynomial time, but too slow to be of practical use. By dropping its deterministic attribute, it can be accelerated to the point of being practically useful, though it remains much slower than the widely used Miller-Rabin-Selfridge-Monier (MRSM) algorithm based on Fermat's Little Theorem or the Solovay-Strassen algorithm based on the Euler Criterion. The change made, in the last stage of AKS, is to check a modular equation not for a long sequence of values but for a single one. Another change is to reduce, arbitrarily, the size of the parameter $r$ giving the degree of the polynomial used for the last stage.
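For context, the last-stage check in question is the AKS congruence $(X+a)^n \equiv X^n + a \pmod{X^r - 1,\, n}$. The sketch below tests it for a single value of $a$, in the spirit of the modification described above; the choice of $r$ and the test values are illustrative only and do not reflect the paper's parameter selection (a full AKS test also includes gcd and perfect-power checks, omitted here).

```python
# Hedged sketch: test (X + a)^n == X^n + a  (mod X^r - 1, n) for a SINGLE value of a.

def polymul_mod(f, g, r, n):
    """Multiply polynomials f, g (coefficient lists of length r) mod (X^r - 1, n)."""
    res = [0] * r
    for i, fi in enumerate(f):
        if fi == 0:
            continue
        for j, gj in enumerate(g):
            res[(i + j) % r] = (res[(i + j) % r] + fi * gj) % n
    return res

def polypow_mod(base, e, r, n):
    """Compute base^e mod (X^r - 1, n) by square-and-multiply."""
    result = [1] + [0] * (r - 1)
    while e:
        if e & 1:
            result = polymul_mod(result, base, r, n)
        base = polymul_mod(base, base, r, n)
        e >>= 1
    return result

def aks_single_check(num, r, a=2):
    """True iff (X + a)^num == X^num + a mod (X^r - 1, num). Primes always pass."""
    lhs = polypow_mod([a % num, 1] + [0] * (r - 2), num, r, num)
    rhs = [0] * r
    rhs[num % r] = (rhs[num % r] + 1) % num
    rhs[0] = (rhs[0] + a) % num
    return lhs == rhs

if __name__ == "__main__":
    r = 17  # illustrative small parameter, not a recommended choice
    for n in [97, 91, 561, 7919]:  # 91 = 7*13, 561 is a Carmichael number
        print(n, aks_single_check(n, r))  # primes print True; composites are expected to fail
```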
Daniel Collins, Loïs Huguenin-Dumittan, Ngoc Khanh Nguyen, Nicolas Rolin, Serge Vaudenay
ePrint Report
The Signal protocol and its X3DH key exchange core are regularly used by billions of people in applications like WhatsApp but are unfortunately not quantum-secure. Thus, designing an efficient and post-quantum secure X3DH alternative is paramount. Notably, X3DH supports asynchronicity, as parties can immediately derive keys after uploading them to a central server, and deniability, allowing parties to plausibly deny having completed key exchange. To satisfy these constraints, existing post-quantum X3DH proposals use ring signatures (or, equivalently, a form of designated-verifier signatures) to provide authentication without compromising deniability as regular signatures would. Existing ring signature schemes, however, have some drawbacks. Notably, they are not generally proven secure in the quantum random oracle model (QROM), so the quantum security of the proposed parameters is unclear and likely weaker than claimed. In addition, they are generally slower than standard primitives like KEMs.
In this work, we propose an efficient, deniable and post-quantum X3DH-like protocol that we call K-Waay, that does not rely on ring signatures. At its core, K-Waay uses a split-KEM, a primitive introduced by Brendel et al. [SAC 2020], to provide Diffie-Hellman-like implicit authentication and secrecy guarantees. Along the way, we revisit the formalism of Brendel et al. and identify that additional security properties are required to prove a split-KEM-based protocol secure. We instantiate split-KEM by building a protocol based on the Frodo key exchange protocol relying on the plain LWE assumption: our proofs might be of independent interest as we show it satisfies our novel unforgeability and deniability security notions. Finally, we complement our theoretical results by thoroughly benchmarking both K-Waay and existing X3DH protocols. Our results show even when using plain LWE and a conservative choice of parameters that K-Waay is significantly faster than previous work.
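For readers unfamiliar with the primitive, a split-KEM (in the syntax of Brendel et al.) lets the sender encapsulate under its own secret key and the receiver's public key, while the receiver decapsulates under its own secret key and the sender's public key. The toy Diffie-Hellman instantiation below only illustrates this interface; it is not secure as parameterized and it is not the LWE-based construction used in K-Waay.

```python
import hashlib, secrets

# Toy split-KEM over a prime modulus, purely to illustrate the interface.
# These parameters are NOT secure and this is NOT the paper's LWE construction.
P = 2**127 - 1
G = 3

def keygen():
    sk = secrets.randbelow(P - 2) + 1
    pk = pow(G, sk, P)
    return sk, pk

def encaps(sk_sender, pk_receiver):
    """Sender uses its own secret key and the receiver's public key."""
    shared = pow(pk_receiver, sk_sender, P)
    key = hashlib.sha256(shared.to_bytes(16, "big")).digest()
    ct = b""  # plain Diffie-Hellman needs no ciphertext; a real split-KEM sends one
    return ct, key

def decaps(sk_receiver, pk_sender, ct):
    """Receiver uses its own secret key and the sender's public key."""
    shared = pow(pk_sender, sk_receiver, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

if __name__ == "__main__":
    sk_a, pk_a = keygen()
    sk_b, pk_b = keygen()
    ct, k_a = encaps(sk_a, pk_b)
    k_b = decaps(sk_b, pk_a, ct)
    assert k_a == k_b
    print("both parties derive the same key")
```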
Kaartik Bhushan, Sai Lakshmi Bhavana Obbattu, Manoj Prabhakaran, Rajeev Raghunath
ePrint Report
In recent breakthrough results, novel use of garbled circuits yielded constructions for several primitives like Identity-Based Encryption (IBE) and 2-round secure multi-party computation, based on standard assumptions in public-key cryptography. While the techniques in these different results have many common elements, these works did not offer a modular abstraction that could be used across them.
Our main contribution is to introduce a novel notion of obfuscation, called Reach-Restricted Reactive Program Obfuscation (R3PO), that captures the essence of these constructions and exposes additional capabilities. We provide a powerful composition theorem whose proof fully encapsulates the use of garbled circuits in these works. As an illustration of the potential of R3PO, and as an important contribution of independent interest, we present a variant of Multi-Authority Attribute-Based Encryption (MA-ABE) that can be based on (single-authority) CP-ABE in a black-box manner, using only standard cryptographic assumptions (e.g., DDH). This is in stark contrast to the existing constructions for MA-ABE, which rely on the random oracle model and/or support only limited policy classes.
Charles Gouert, Nektarios Georgios Tsoutsos
ePrint Report
Homomorphic encryption is a powerful privacy-preserving technology that is notoriously difficult to configure and use, even for experts. The key difficulties include restrictive programming models of homomorphic schemes and choosing suitable parameters for an application. In this tutorial, we outline methodologies to solve these issues and allow for conversion of any application to the encrypted domain using both leveled and fully homomorphic encryption.
The first approach, called Walrus, is suitable for arithmetic-intensive applications with limited depth and applications with high throughput requirements. Walrus provides an intuitive programming interface and handles parameterization automatically by analyzing the application and gathering statistics such as homomorphic noise growth to derive a parameter set tuned specifically for the application. We provide an in-depth example of this approach in the form of a neural network inference as well as guidelines for using Walrus effectively.
Conversely, the second approach (HELM) takes existing HDL designs and converts them to the encrypted domain for secure outsourcing on powerful cloud servers. Unlike Walrus, HELM supports FHE backends and is well-suited for complex applications. At a high level, HELM consumes netlists and is capable of performing logic gate operations homomorphically on encryptions of individual bits. HELM incorporates both CPU and GPU acceleration by taking advantage of the inherent parallelism provided by Boolean circuits. As a case study, we walk through the process of taking an off-the-shelf HDL design in the form of AES-128 decryption and running it in the encrypted domain with HELM.
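To make the gate-by-gate model concrete, the sketch below shows the kind of netlist-evaluation loop such a tool performs. The ciphertexts are replaced by a trivial stand-in wrapping a plain bit so that the sketch stays self-contained and runnable; in HELM each gate would instead be a homomorphic Boolean operation from the FHE backend, and the netlist format and names here are ours, not the tool's.

```python
# Sketch of netlist-style evaluation: every wire carries a "ciphertext" and every
# gate is applied homomorphically. Ct is a trivial stand-in for an encrypted bit.

class Ct:
    def __init__(self, bit): self.bit = bit & 1

GATES = {
    "AND": lambda a, b: Ct(a.bit & b.bit),
    "XOR": lambda a, b: Ct(a.bit ^ b.bit),
    "OR":  lambda a, b: Ct(a.bit | b.bit),
    "NOT": lambda a:    Ct(1 - a.bit),
}

def eval_netlist(netlist, inputs):
    """netlist: list of (gate, output_wire, input_wires...) in topological order."""
    wires = {name: Ct(bit) for name, bit in inputs.items()}
    for gate, out, *ins in netlist:
        wires[out] = GATES[gate](*(wires[w] for w in ins))
    return wires

if __name__ == "__main__":
    # One-bit full adder as a tiny netlist: sum = a^b^cin, cout = (a&b) | (cin&(a^b))
    netlist = [
        ("XOR", "t",    "a", "b"),
        ("XOR", "sum",  "t", "cin"),
        ("AND", "g",    "a", "b"),
        ("AND", "p",    "t", "cin"),
        ("OR",  "cout", "g", "p"),
    ]
    out = eval_netlist(netlist, {"a": 1, "b": 1, "cin": 1})
    print(out["sum"].bit, out["cout"].bit)  # 1 1
```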
Alex Pellegrini, Giovanni Tognolini
ePrint Report
We analyse HWQCS, a code-based signature scheme presented at ICISC 2023, which uses quasi-cyclic low-density parity-check (QC-LDPC) codes. The scheme introduces high Hamming weight errors and signs each message using a fresh ephemeral secret key rather than a single long-term secret key, so as to avoid known attacks on QC-LDPC signature schemes.
In this paper, we show that the signatures of HWQCS leak substantial information concerning the ephemeral keys and formally describe this behaviour. Furthermore, we show that for each security level, we can exploit the leakage to efficiently reconstruct partial secret data from very few signatures, and finally mount a universal forgery attack.
Marina Checri, Renaud Sirdey, Aymen Boudguiga, Jean-Paul Bultel, Antoine Choffrut
ePrint Report
In their seminal 2021 paper, Li and Micciancio presented a passive attack against the CKKS approximate FHE scheme and introduced the notion of CPAD security. The current status quo is that this line of attacks does not apply to ``exact'' FHE. In this paper, we challenge this status quo by exhibiting a CPAD key recovery attack on the linearly homomorphic Regev cryptosystem, which easily generalizes to other xHE schemes such as BFV, BGV and TFHE, showing that these cryptosystems are not CPAD secure in their basic form. We also show that existing threshold variants of BFV, BGV and CKKS are particularly exposed to CPAD attackers and would be CPAD-insecure without the addition of smudging noise after partial decryption. Finally, we successfully implement our attack against several mainstream FHE libraries and discuss a number of natural countermeasures as well as their consequences in terms of FHE practice, security and efficiency. The attack itself is quite practical, as it typically takes less than an hour on an average laptop PC, requiring a few thousand ciphertexts as well as up to around a million evaluations/decryptions, to perform a full key recovery.
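To give a flavour of the failure mechanism being exploited, the toy simulation below uses a Regev-style symmetric LWE scheme with exaggerated noise: summing honestly generated encryptions of 0 accumulates noise, and the first wrong answer from the decryption oracle tells the attacker that the secret-key-dependent noise has just crossed the decryption threshold. The actual key-recovery attacks in the paper are considerably more involved; the parameters below are illustrative only.

```python
import random

q, n, sigma = 3329, 16, 40.0   # sigma deliberately large so a failure appears quickly

def keygen():
    return [random.randrange(q) for _ in range(n)]

def enc(s, m):
    a = [random.randrange(q) for _ in range(n)]
    e = round(random.gauss(0, sigma))
    b = (sum(ai * si for ai, si in zip(a, s)) + e + m * (q // 2)) % q
    return a, b

def dec(s, ct):
    a, b = ct
    phase = (b - sum(ai * si for ai, si in zip(a, s))) % q
    return 1 if q // 4 <= phase < 3 * q // 4 else 0

def add(c1, c2):
    """Homomorphic addition of two ciphertexts."""
    return [(x + y) % q for x, y in zip(c1[0], c2[0])], (c1[1] + c2[1]) % q

if __name__ == "__main__":
    random.seed(0)
    s = keygen()
    acc = enc(s, 0)
    for k in range(2, 200_000):
        acc = add(acc, enc(s, 0))   # keep adding fresh encryptions of 0
        if dec(s, acc) == 1:        # the decryption oracle returns a wrong plaintext
            print(f"sum of {k} encryptions of 0 decrypts incorrectly")
            break
```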
Shihe Ma, Tairong Huang, Anyu Wang, Xiaoyun Wang
ePrint Report
The BGV scheme is one of the most popular FHE schemes for computing homomorphic integer arithmetic.
The bootstrapping technique of BGV is necessary to evaluate arbitrarily deep circuits homomorphically.
However, the BGV bootstrapping performs poorly for large plaintext prime $p$ due to its digit removal procedure exhibiting a computational complexity of at least $O(\sqrt{p})$.
In this paper, we propose optimizations for the digit removal procedure with large $p$ by leveraging the properties of null polynomials over the ring $\mathbb{Z}_{p^e}$.
Specifically, we demonstrate that it is possible to construct low-degree null polynomials based on two observations of the input to the digit removal procedure:
1) the support size of the input can be upper-bounded by $(2B+1)^2$; 2) the size of the lower digits to be removed can be upper-bounded by $B$.
Here $B$ can be controlled within a narrow interval $[22,23]$ in our parameter selection, making the degree of these null polynomials much smaller than $p$ for large values of $p$.
These low-degree null polynomials can significantly reduce the polynomial degrees during homomorphic digit removal, thereby decreasing both running time and capacity consumption.
Theoretically, our optimizations reduce the computational cost of extracting a single digit from $O(\sqrt{pe})$ (by Chen and Han) or $O(\sqrt{p}\sqrt[4]{e})$ (by Geelen et al.) to $\min(2B+1,\sqrt{\lceil e/t\rceil(2B+1)})$ for some $t\ge 1$.
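In the simplest possible terms (this only illustrates the basic mechanism; the paper's null polynomials over $\mathbb{Z}_{p^e}$ are more refined): if the input is known to lie in a set $S$ of size at most $(2B+1)^2$, then $\prod_{a\in S}(x-a)$ vanishes on every admissible input, so polynomial identities of degree roughly $|S|$, rather than degree on the order of $p$, suffice.

```python
# Toy check: a polynomial of degree |S| that vanishes on every element of a small
# support set S, modulo p^e. Parameters are illustrative.

p, e, B = 8191, 2, 23
mod = p**e
S = range(-B, B + 1)   # illustrative small support of size 2B+1

def null_poly_eval(x):
    """Evaluate prod_{a in S} (x - a) mod p^e."""
    acc = 1
    for a in S:
        acc = acc * (x - a) % mod
    return acc

assert all(null_poly_eval(x) == 0 for x in S)
print(f"degree-{2 * B + 1} polynomial vanishes on the whole support, vs. p = {p}")
```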
We implement and benchmark our method on HElib with $p=17,127,257,8191$ and $65537$.
With our optimized digit removal, we achieve a bootstrapping throughput $1.38\sim151$ times that in HElib, with the speedup increasing with the value of $p$.
For $p=65537$, we accelerate the digit removal step by 80 times and reduce the bootstrapping time from more than 12 hours to less than 14 minutes.
Quinten Norga, Jan-Pieter D'Anvers, Suparna Kundu, Ingrid Verbauwhede
ePrint Report
The conversion between arithmetic and Boolean mask representations (A2B & B2A) is a crucial component for side-channel resistant implementations of lattice-based cryptography.
In this paper, we present a first- and high-order masked, unified hardware implementation which can perform both A2B & B2A conversions. We optimize the operation on several layers of abstraction, applicable to any protection order.
First, we propose novel higher-order algorithms for the secure addition and B2A operation. This is achieved through, among others, an improved method for repeated masked modular reduction and through the X2B operation, which can be viewed as a conversion from any type of additive masking to its Boolean representation. This allows for the removal of a full secure addition during B2A post-processing.
Compared to prior work, our $B2A_q$ requires 51/46/45 % less fresh randomness at first through third protection order when implemented in software or hardware.
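For background, the sketch below shows the classic first-order Boolean-to-arithmetic conversion of Goubin (CHES 2001), which works for power-of-two moduli. The $B2A_q$ of this work targets arbitrary protection orders and a prime modulus, so this is context for the operation being optimized rather than the proposed algorithm.

```python
import random

# Goubin's first-order B2A: given Boolean shares (xp, r) with x = xp ^ r, compute an
# arithmetic share A with x = (A + r) mod 2^K, without ever recombining x.

K = 16
MASK = (1 << K) - 1

def goubin_b2a(xp, r):
    g = random.randrange(1 << K)   # fresh random mask
    t = xp ^ g
    t = (t - g) & MASK
    t ^= xp
    g ^= r
    a = xp ^ g
    a = (a - g) & MASK
    a ^= t
    return a                       # satisfies x == (a + r) mod 2^K

if __name__ == "__main__":
    for _ in range(1000):
        x = random.randrange(1 << K)
        r = random.randrange(1 << K)
        a = goubin_b2a(x ^ r, r)
        assert (a + r) & MASK == x
    print("Goubin B2A: all checks passed")
```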
Secondly, on the circuit level, we successfully introduce half-cycle data paths and demonstrate how careful, manual masking is a superior approach for masking highly non-linear operations and providing first- and high-order security. Our techniques significantly reduce the high latency and fresh randomness overhead, typically introduced by glitch-resistant masking schemes and universally composable gadgets, including HPC3 by Knichel et al. presented at CCS 2022. Compared to state-of-the-art algorithms and masking techniques, our unified and high-throughput hardware implementation requires up to 89/84/86 % fewer clock cycles and 78/71/55 % fewer fresh random bits.
We show detailed performance results for first-, second- and third-order protected implementations on FPGA. Our proposed algorithms are proven secure in the glitch-extended probing model, and their implementations are validated via practical lab analysis using the TVLA methodology. We experimentally show that both our first- and second-order masked implementations are hardened against univariate and multivariate attacks using 100 million traces, for each mode of operation.
CISPA Helmholtz Center for Information Security, Saarbrücken, Germany
Job Posting
We offer two postdoc positions in the research group of Julian Loss at CISPA Helmholtz Center for Information Security in Saarbrücken, Germany. The positions are for two years, funded by the ERC project CRYPTOSYSTEMS.
Our research focuses on cryptography and its applications to distributed computing, e.g., making consensus and blockchain protocols more secure and scalable.
The candidate should have a strong background in cryptography or distributed computing.
Closing date for applications:
Contact: Julian Loss
More information: https://www.julianloss.com
26 January 2024
Milan, Italy, 2 December - 6 December 2024
TCC
Event date: 2 December to 6 December 2024
Wenhui Wu, Muzhou Li, Meiqin Wang
ePrint Report
PRESENT is an ultra-lightweight block cipher designed by Bogdanov et al., and has been widely studied since its proposal. It supports 80-bit and 128-bit keys, and the two variants are referred to as PRESENT-80 and PRESENT-128, respectively. Up to now, linear cryptanalysis is the most effective method for attacking this cipher, especially when accelerated with the pruned Walsh transform. Combining the pruned Walsh transform with multiple linear attacks, one can recover the right key for 28-round PRESENT-80 and -128. Later, this method was further improved with the affine pruned Walsh transform by adding more zeros to the Walsh spectrum through rejecting some data, leading to a 29-round attack on PRESENT-128 with the full codebook.
In this paper, we follow the affine pruned Walsh transform accelerated linear method and propose 29-round attacks on both PRESENT-80 and PRESENT-128 without using the full codebook. Both attacks rely on a statistical model depicting the distribution of the experimental correlation when some data are artificially rejected in its computation. In addition, a detailed analysis of the complexity reduction for each linear hull used in attacking PRESENT is provided and supported by an automatic tool. Our 29-round attack on PRESENT-80 mainly benefits from this tool. To the best of our knowledge, both attacks are the best ones on PRESENT so far.
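For background on the role of the Walsh transform here: the correlations of a Boolean function with all linear masks are exactly its normalized Walsh spectrum, computable with a fast Walsh-Hadamard transform, and pruned or affine pruned variants skip coefficients that are known or forced to be zero. The minimal, unpruned sketch below computes the spectrum of one output bit of the PRESENT S-box (taken from the cipher's specification); everything else is illustrative and not the key-recovery procedure of the paper.

```python
def fwht(a):
    """In-place fast Walsh-Hadamard transform of a list of length 2^m."""
    h = 1
    while h < len(a):
        for i in range(0, len(a), 2 * h):
            for j in range(i, i + h):
                x, y = a[j], a[j + h]
                a[j], a[j + h] = x + y, x - y
        h *= 2
    return a

def correlations(f, m):
    """cor(b) = 2^{-m} * sum_x (-1)^{f(x) ^ <b,x>} for every linear mask b."""
    table = [1 - 2 * f(x) for x in range(1 << m)]   # (-1)^{f(x)}
    spectrum = fwht(table)
    return [w / (1 << m) for w in spectrum]

if __name__ == "__main__":
    # Correlations of every linear mask with the least-significant output bit of the PRESENT S-box.
    SBOX = [0xC, 5, 6, 0xB, 9, 0, 0xA, 0xD, 3, 0xE, 0xF, 8, 4, 7, 1, 2]
    f = lambda x: SBOX[x] & 1
    print(correlations(f, 4))
```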
Matthias J. Kannwischer, Markus Krausz, Richard Petri, Shang-Yi Yang
ePrint Report
In July 2022, the US National Institute of Standards and Technology (NIST) announced the first set of Post-Quantum Cryptography standards: Kyber, Dilithium, Falcon, and SPHINCS+. Shortly after, NIST published a call for proposals for additional post-quantum signature schemes to complement their initial portfolio. In 2023, 50 submissions were received, and 40 were accepted as round-1 candidates for future standardization.
In this paper, we study the suitability and performance of said candidates on the popular Arm Cortex-M4 microcontroller. We integrate the suitable implementations into the benchmarking framework pqm4 and provide benchmarking results on the STM32L4R5ZI featuring 640 KB of RAM. pqm4 currently includes reference implementations for 15 submissions and M4-optimized implementations for five submissions. For the remaining candidates, we describe the reasons that hinder integration - the predominant ones being large key sizes or excessive memory consumption.
While the performance of reference implementations is rather meaningless and often does not correlate with the performance of well-optimized implementations, this work provides some first indication of which schemes are most promising on microcontrollers. The publicly available implementations in pqm4 also provide a good starting point for future optimization efforts.
Initially, we were hoping for a much higher code quality than for the initial submissions to NIST's previous PQC project. However, we were grossly disappointed: half of the submissions make use of dynamic memory allocation, often for no apparent reason; many implementations produce compiler warnings, sometimes hinting at more serious issues; many implementations do not pass simple memory checks, e.g., with valgrind; and multiple implementations make use of static memory.
Yong Liu, Yuejun Liu, Yongbin Zhou, Yiwen Gao, Zehua Qiao, Huaxin Wang
ePrint Report
Post-Quantum Cryptography (PQC) was proposed due to the potential threat of quantum computer attacks against conventional public-key cryptosystems, and four PQC algorithms, including CRYSTALS-Dilithium (Dilithium for short), have so far been selected for NIST standardization. However, the selected algorithms are still vulnerable to side-channel attacks in practice, and their physical security needs to be further evaluated.
This study introduces two efficient power analysis attacks, the optimized fast two-stage approach and the single-bit approach, aimed at reducing the key guess space in NTT polynomial multiplication on an STM32F405 device (ARM Cortex-M4 core).
Our findings reveal that the optimized approach outperforms the conservative approach and the fast two-stage approach proposed in ICCD 2021 by factors of 519 and 88, respectively.
Similarly, the single-bit approach demonstrates speedups of 365 and 62 times compared to these two approaches, respectively.
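For readers unfamiliar with the setting, the toy simulation below illustrates the generic correlation power analysis (CPA) step that attacks on NTT coefficient multiplications build on, under a simple Hamming-weight leakage model. It is a textbook sketch, not the paper's two-stage or single-bit procedures; the modulus $q = 3329$, the trace counts, and the noise level are illustrative.

```python
import numpy as np

q = 3329
HW = np.array([bin(i).count("1") for i in range(q)])   # Hamming-weight lookup table
rng = np.random.default_rng(1)

def simulate_traces(secret, n_traces, noise=1.0):
    """Model the leakage of s * x mod q as HW(product) plus Gaussian noise."""
    x = rng.integers(0, q, size=n_traces)               # known inputs to the multiplication
    leak = HW[(secret * x) % q] + rng.normal(0, noise, n_traces)
    return x, leak

def cpa_recover(x, leak, candidates):
    """Rank coefficient guesses by absolute Pearson correlation with the traces."""
    best, best_corr = None, -1.0
    for s in candidates:
        model = HW[(s * x) % q]
        corr = abs(np.corrcoef(model, leak)[0, 1])
        if corr > best_corr:
            best, best_corr = s, corr
    return best, best_corr

if __name__ == "__main__":
    secret = 1234                                        # one secret NTT coefficient
    x, leak = simulate_traces(secret, n_traces=2000)
    guess, corr = cpa_recover(x, leak, candidates=range(1, q))
    print(guess, round(corr, 3))                         # typically recovers 1234 with high correlation
```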