Post-Quantum Authenticated Encryption against Chosen-Ciphertext Side-Channel Attacks
Over the last years, the side-channel analysis of Post-Quantum Cryptography (PQC) candidates in the NIST standardization initiative has received increased attention. In particular, it has been shown that some post-quantum Key Encapsulation Mechanisms (KEMs) are vulnerable to Chosen-Ciphertext Side-Channel Attacks (CC-SCA). These powerful attacks target the re-encryption step in the Fujisaki-Okamoto (FO) transform, which is commonly used to achieve CCA security in such schemes. To sufficiently protect PQC KEMs on embedded devices against such a powerful CC-SCA, masking at increasingly higher order is required, which induces a considerable overhead. In this work, we propose a conceptually simple construction, the EtS KEM, that alleviates the impact of CC-SCA. It uses the EtS paradigm introduced by An, Dodis and Rabin at EUROCRYPT '02, and instantiates a post-quantum authenticated KEM in the outsider-security model. While the construction is generic, we apply it to the CRYSTALS-Kyber KEM, relying on the CRYSTALS-Dilithium and Falcon signature schemes, and show that a CC-SCA-protected EtS KEM version of CRYSTALS-Kyber requires less than 10% of the cycles required for the CC-SCA-protected FO-based KEM. We additionally show that the cost of protecting the EtS KEM against fault injection attacks, necessarily due to the added signature verification, remains negligible compared to the large cost of masking the FO transform at higher orders. Lastly, we discuss relevant embedded use cases for our EtS KEM construction.
Post-Quantum Authenticated Encryption against Chosen-Ciphertext Side-Channel Attacks
Over the last years, the side-channel analysis of Post-Quantum Cryptography (PQC) candidates in the NIST standardization initiative has received increased attention. In particular, it has been shown that some post-quantum Key Encapsulation Mechanisms (KEMs) are vulnerable to Chosen-Ciphertext Side-Channel Attacks (CC-SCA). These powerful attacks target the re-encryption step in the Fujisaki-Okamoto (FO) transform, which is commonly used to achieve CCA security in such schemes. To sufficiently protect PQC KEMs on embedded devices against such a powerful CC-SCA, masking at increasingly higher order is required, which induces a considerable overhead. In this work, we propose to use a conceptually simple construction, the ΕtS KEM, that alleviates the impact of CC-SCA. It uses the Encrypt-then-Sign (EtS) paradigm introduced by Zheng at ISW ’97 and further analyzed by An, Dodis and Rabin at EUROCRYPT ’02, and instantiates a postquantum authenticated KEM in the outsider-security model. While the construction is generic, we apply it to the CRYSTALS-Kyber KEM, relying on the CRYSTALSDilithium and Falcon signature schemes. We show that a CC-SCA-protected EtS KEM version of CRYSTALS-Kyber requires less than 10% of the cycles required for the CC-SCA-protected FO-based KEM, at the cost of additional data/communication overhead. We additionally show that the cost of protecting the EtS KEM against fault injection attacks, necessarily due to the added signature verification, remains negligible compared to the large cost of masking the FO transform at higher orders. Lastly, we discuss relevant embedded use cases for our EtS KEM construction.
Masking Kyber: First- and Higher-Order Implementations 📺
In the final phase of the post-quantum cryptography standardization effort, the focus has been extended to include the side-channel resistance of the candidates. While some schemes have been already extensively analyzed in this regard, there is no such study yet of the finalist Kyber.In this work, we demonstrate the first completely masked implementation of Kyber which is protected against first- and higher-order attacks. To the best of our knowledge, this results in the first higher-order masked implementation of any post-quantum secure key encapsulation mechanism algorithm. This is realized by introducing two new techniques. First, we propose a higher-order algorithm for the one-bit compression operation. This is based on a masked bit-sliced binary-search that can be applied to prime moduli. Second, we propose a technique which enables one to compare uncompressed masked polynomials with compressed public polynomials. This avoids the costly masking of the ciphertext compression while being able to be instantiated at arbitrary orders.We show performance results for first-, second- and third-order protected implementations on the Arm Cortex-M0+ and Cortex-M4F. Notably, our implementation of first-order masked Kyber decapsulation requires 3.1 million cycles on the Cortex-M4F. This is a factor 3.5 overhead compared to the unprotected optimized implementationin pqm4. We experimentally show that the first-order implementation of our new modules on the Cortex-M0+ is hardened against attacks using 100 000 traces and mechanically verify the security in a fine-grained leakage model using the verification tool scVerif.
High-Speed Masking for Polynomial Comparison in Lattice-based KEMs 📺
With the NIST post-quantum standardization competition entering the second round, the interest in practical implementation results of the remaining NIST candidates is steadily growing. Especially implementations on embedded devices are often not protected against side-channel attacks, such as differential power analysis. In this regard, the application of countermeasures against side-channel attacks to candidates of the NIST standardization process is still an understudied topic. Our work aims to contribute to the NIST competition by enabling a more realistic judgment of the overhead cost introduced by side-channel countermeasures that are applied to lattice-based KEMs that achieve CCA-security based on the Fujisaki-Okamoto transform. We present a novel higher-order masking scheme that enables an efficient comparison of polynomials as previous techniques based on arithmetic-to-Boolean conversions renders this (generally inexpensive) component extremely expensive in the masked case. Our approach has linear complexity in the number of shares compared to quadratic complexity of previous contributions and it applies to lattice based schemes with prime modulus. It comes with a proof in the probing model and an efficient implementation on an ARM Cortex-M4F microcontroller which was defined as a preferred evaluation platform for embedded implementations by NIST. Our algorithm can be executed in only 1.5-2.2 milliseconds on the target platform (depending on the masking order) and is therefore well suited even for lightweight applications. While in previous work, practical side-channel experiments were conducted using only 5,000 - 100,000 power traces, we confirm the absence of first-order leakage in this work by collecting 1 million power traces and applying the t-test methodology.
Glitch-Resistant Masking Revisited 📺
Implementing the masking countermeasure in hardware is a delicate task. Various solutions have been proposed for this purpose over the last years: we focus on Threshold Implementations (TIs), Domain-Oriented Masking (DOM), the Unified Masking Approach (UMA) and Generic Low Latency Masking (GLM). The latter generally come with innovative ideas to cope with physical defaults such as glitches. Yet, and in contrast to the situation in software-oriented masking, these schemes have not been formally proven at arbitrary security orders and their composability properties were left unclear. So far, only a 2-cycle implementation of the seminal masking scheme by Ishai, Sahai and Wagner has been shown secure and composable in the robust probing model – a variation of the probing model aimed to capture physical defaults such as glitches – for any number of shares.In this paper, we argue that this lack of proofs for TIs, DOM, UMA and GLM makes the interpretation of their security guarantees difficult as the number of shares increases. For this purpose, we first put forward that the higher-order variants of all these schemes are affected by (local or composability) security flaws in the (robust) probing model, due to insufficient refreshing. We then show that composability and robustness against glitches cannot be analyzed independently. We finally detail how these abstract flaws translate into concrete (experimental) attacks, and discuss the additional constraints robust probing security implies on the need of registers. Despite not systematically leading to improved complexities at low security orders, e.g., with respect to the required number of measurements for a successful attack, we argue that these weaknesses provide a case for the need of security proofs in the robust probing model (or a similar abstraction) at higher security orders.
Multi-Tuple Leakage Detection and the Dependent Signal Issue 📺
Leakage detection is a common tool to quickly assess the security of a cryptographic implementation against side-channel attacks. The Test Vector Leakage Assessment (TVLA) methodology using Welch’s t-test, proposed by Cryptography Research, is currently the most popular example of such tools, thanks to its simplicity and good detection speed compared to attack-based evaluations. However, as any statistical test, it is based on certain assumptions about the processed samples and its detection performances strongly depend on parameters like the measurement’s Signal-to-Noise Ratio (SNR), their degree of dependency, and their density, i.e., the ratio between the amount of informative and non-informative points in the traces. In this paper, we argue that the correct interpretation of leakage detection results requires knowledge of these parameters which are a priori unknown to the evaluator, and, therefore, poses a non-trivial challenge to evaluators (especially if restricted to only one test). For this purpose, we first explore the concept of multi-tuple detection, which is able to exploit differences between multiple informative points of a trace more effectively than tests relying on the minimum p-value of concurrent univariate tests. To this end, we map the common Hotelling’s T2-test to the leakage detection setting and, further, propose a specialized instantiation of it which trades computational overheads for a dependency assumption. Our experiments show that there is not one test that is the optimal choice for every leakage scenario. Second, we highlight the importance of the assumption that the samples at each point in time are independent, which is frequently considered in leakage detection, e.g., with Welch’s t-test. Using simulated and practical experiments, we show that (i) this assumption is often violated in practice, and (ii) deviations from it can affect the detection performances, making the correct interpretation of the results more difficult. Finally, we consolidate our findings by providing guidelines on how to use a combination of established and newly-proposed leakage detection tools to infer the measurements parameters. This enables a better interpretation of the tests’ results than the current state-of-the-art (yet still relying on heuristics for the most challenging evaluation scenarios).
Efficiently Masking Binomial Sampling at Arbitrary Orders for Lattice-Based Crypto
With the rising popularity of lattice-based cryptography, the Learning with Errors (LWE) problem has emerged as a fundamental core of numerous encryption and key exchange schemes. Many LWE-based schemes have in common that they require sampling from a discrete Gaussian distribution which comes with a number of challenges for the practical instantiation of those schemes. One of these is the inclusion of countermeasures against a physical side-channel adversary. While several works discuss the protection of samplers against timing leaks, only few publications explore resistance against other side-channels, e.g., power. The most recent example of a protected binomial sampler (as used in key encapsulation mechanisms to sufficiently approximate Gaussian distributions) from CHES 2018 is restricted to a first-order adversary and cannot be easily extended to higher protection orders.In this work, we present the first protected binomial sampler which provides provable security against a side-channel adversary at arbitrary orders. Our construction relies on a new conversion between Boolean and arithmetic (B2A) masking schemes for prime moduli which outperforms previous algorithms significantly for the relevant parameters, and is paired with a new masked bitsliced sampler allowing secure and efficient sampling even at larger protection orders. Since our proposed solution supports arbitrary moduli, it can be utilized in a large variety of lattice-based constructions, like NewHope, LIMA, Saber, Kyber, HILA5, or Ding Key Exchange.
Practical CCA2-Secure and Masked Ring-LWE Implementation 📺
During the last years public-key encryption schemes based on the hardness of ring-LWE have gained significant popularity. For real-world security applications assuming strong adversary models, a number of practical issues still need to be addressed. In this work we thus present an instance of ring-LWE encryption that is protected against active attacks (i.e., adaptive chosen-ciphertext attacks) and equipped with countermeasures against side-channel analysis. Our solution is based on a postquantum variant of the Fujisaki-Okamoto (FO) transform combined with provably secure first-order masking. To protect the key and message during decryption, we developed a masked binomial sampler that secures the re-encryption process required by FO. Our work shows that CCA2-secured RLWE-based encryption can be achieved with reasonable performance on constrained devices but also stresses that the required transformation and handling of decryption errors implies a performance overhead that has been overlooked by the community so far. With parameters providing 233 bits of quantum security, our implementation requires 4,176,684 cycles for encryption and 25,640,380 cycles for decryption with masking and hiding countermeasures on a Cortex-M4F. The first-order security of our masked implementation is also practically verified using the non-specific t-test evaluation methodology.
Leakage Detection with the x2-Test 📺
We describe how Pearson’s χ2-test can be used as a natural complement to Welch’s t-test for black box leakage detection. In particular, we show that by using these two tests in combination, we can mitigate some of the limitations due to the moment-based nature of existing detection techniques based on Welch’s t-test (e.g., for the evaluation of higher-order masked implementations with insufficient noise). We also show that Pearson’s χ2-test is naturally suited to analyze threshold implementations with information lying in multiple statistical moments, and can be easily extended to a distinguisher for key recovery attacks. As a result, we believe the proposed test and methodology are interesting complementary ingredients of the side-channel evaluation toolbox, for black box leakage detection and non-profiled attacks, and as a preliminary before more demanding advanced analyses.
Gimli : A Cross-Platform Permutation
This paper presents Gimli, a 384-bit permutation designed to achieve high security with high performance across a broad range of platforms, including 64-bit Intel/AMD server CPUs, 64-bit and 32-bit ARM smartphone CPUs, 32-bit ARM microcontrollers, 8-bit AVR microcontrollers, FPGAs, ASICs without side-channel protection, and ASICs with side-channel protection.
- Eurocrypt 2021
- CHES 2021
- CHES 2020
- CHES 2019
- Mélissa Azouaoui (2)
- Florian Bache (1)
- Daniel J. Bernstein (1)
- Joppe W. Bos (1)
- Erik Boss (1)
- Olivier Bronchain (1)
- Sebastian Faust (1)
- Marc Gourjon (1)
- Vincent Grosso (1)
- Tim Güneysu (5)
- Stefan Kölbl (1)
- Yulia Kuzovkova (2)
- Gregor Leander (1)
- Stefan Lucks (1)
- Pedro Maat Costa Massolino (1)
- Florian Mendel (1)
- Thorben Moos (1)
- Amir Moradi (6)
- Kashif Nawaz (1)
- Tobias Oder (3)
- Clara Paglialonga (3)
- Thomas Pöppelmann (1)
- Joost Renes (1)
- Bastian Richter (1)
- Peter Schwabe (1)
- François-Xavier Standaert (4)
- Yosuke Todo (1)
- Christine van Vredendaal (1)
- Benoît Viguier (1)
- Christine van Vredendaal (2)