International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Papers from Transactions on Cryptographic Hardware and Embedded Systems 2019

Year
Venue
Title
2019
TCHES
3-Share Threshold Implementation of AES S-box without Fresh Randomness 📺
Threshold implementation is studied as a countermeasure against sidechannel attack. There had been no threshold implementation for the AES and Keccak S-boxes that satisfies an important property called uniformity. In the conventional implementations, intermediate values are remasked to compensate for the lack of uniformity. The remasking consumes thousands of fresh random bits and its implementation cost is a serious concern. Daemen recently proposed a 3-share uniform threshold implementation of the Keccak S-box. This is enabled by a new technique called the changing of the guards which can be applied to any invertible functions. Subsequently, Wegener et al. proposed a 4-share threshold implementation of the AES S-box based on the changing of the guards technique. However, a 3-share threshold implementation of AES S-box remains open. The difficulty stays in 2-input multiplication, used in decomposed S-box representations, which is non-invertible because of different input and output sizes. In this study, this problem is addressed by introducing a certain generalization of the changing of the guards technique. The proposed method provides a generic way to construct a uniform sharing for a target function having different input and output sizes. The key idea is to transform a target function into an invertible one by adding additional inputs and outputs. Based on the proposed technique, the first 3-share threshold implementation of AES S-box without fresh randomness is presented. Performance evaluation and simulation-based leakage assessment of the implementation are also presented.
2019
TCHES
A Comprehensive Study of Deep Learning for Side-Channel Analysis 📺
Recently, several studies have been published on the application of deep learning to enhance Side-Channel Attacks (SCA). These seminal works have practically validated the soundness of the approach, especially against implementations protected by masking or by jittering. Concurrently, important open issues have emerged. Among them, the relevance of machine (and thereby deep) learning based SCA has been questioned in several papers based on the lack of relation between the accuracy, a typical performance metric used in machine learning, and common SCA metrics like the Guessing entropy or the key-discrimination success rate. Also, the impact of the classical side-channel counter-measures on the efficiency of deep learning has been questioned, in particular by the semi-conductor industry. Both questions enlighten the importance of studying the theoretical soundness of deep learning in the context of side-channel and of developing means to quantify its efficiency, especially with respect to the optimality bounds published so far in the literature for side-channel leakage exploitation. The first main contribution of this paper directly concerns the latter point. It is indeed proved that minimizing the Negative Log Likelihood (NLL for short) loss function during the training of deep neural networks is actually asymptotically equivalent to maximizing the Perceived Information introduced by Renauld et al. at EUROCRYPT 2011 as a lower bound of the Mutual Information between the leakage and the target secret. Hence, such a training can be considered as an efficient and effective estimation of the PI, and thereby of the MI (known to be complex to accurately estimate in the context of secure implementations). As a second direct consequence of our main contribution, it is argued that, in a side-channel exploitation context, choosing the NLL loss function to drive the training is sound from an information theory point of view. As a third contribution, classical counter-measures like Boolean masking or execution flow shuffling, initially dedicated to classical SCA, are proved to stay sound against deep Learning based attacks.
2019
TCHES
Analysis and Improvement of Differential Computation Attacks against Internally-Encoded White-Box Implementations 📺
White-box cryptography is the last security barrier for a cryptographic software implementation deployed in an untrusted environment. The principle of internal encodings is a commonly used white-box technique to protect block cipher implementations. It consists in representing an implementation as a network of look-up tables which are then encoded using randomly generated bijections (the internal encodings). When this approach is implemented based on nibble (i.e. 4-bit wide) encodings, the protected implementation has been shown to be vulnerable to differential computation analysis (DCA). The latter is essentially an adaptation of differential power analysis techniques to computation traces consisting of runtime information, e.g., memory accesses, of the target software. In order to thwart DCA, it has then been suggested to use wider encodings, and in particular byte encodings, at least to protect the outer rounds of the block cipher which are the prime targets of DCA.In this work, we provide an in-depth analysis of when and why DCA works. We pinpoint the properties of the target variables and the encodings that make the attack (in)feasible. In particular, we show that DCA can break encodings wider than 4-bit, such as byte encodings. Additionally, we propose new DCA-like attacks inspired from side-channel analysis techniques. Specifically, we describe a collision attack particularly effective against the internal encoding countermeasure. We also investigate mutual information analysis (MIA) which naturally applies in this context. Compared to the original DCA, these attacks are also passive and they require very limited knowledge of the attacked implementation, but they achieve significant improvements in terms of trace complexity. All the analyses of our work are experimentally backed up with various attack simulation results. We also verified the practicability of our analyses and attack techniques against a publicly available white-box AES implementation protected with byte encodings –which DCA has failed to break before– and against a “masked” white-box AES implementation –which intends to resist DCA.
2019
TCHES
AuCPace: Efficient verifier-based PAKE protocol tailored for the IIoT 📺
Increasingly connectivity becomes integrated in products and devices that previously operated in a stand-alone setting. This observation holds for many consumer applications in the so-called "Internet of Things" (IoT) as well as for corresponding industry applications (IIoT), such as industrial process sensors. Often the only practicable means for authentication of human users is a password. The security of password-based authentication schemes frequently forms the weakest point of the security infrastructure. In this paper we first explain why a tailored protocol designed for the IIoT use case is considered necessary. The differences between IIoT and the conventional Internet use-cases result in largely modified threats and require special procedures for allowing both, convenient and secure use in the highly constrained industrial setting. Specifically the use of a verifier-based password-authenticated key-exchange (V-PAKE) protocol as a hedge against public-key-infrastructure (PKI) failures is considered important. Availability concerns for the case of failures of (part of) the communication infrastructure makes local storage of access credentials mandatory. The larger threat of physical attacks makes it important to use memory-hard password hashing. This paper presents a corresponding tailored protocol, AuCPace, together with a security proof within the Universal Composability (UC) framework considering fully adaptive adversaries. We also introduce a new security notion of partially augmented PAKE that provides specific performance advantages and makes them suitable for a larger set of IIoT applications. We also present an actual instantiation of our protocol, AuCPace25519, and present performance results on ARM Cortex-M0 and Cortex-M4 microcontrollers. Our implementation realizes new speed-records for PAKE and X25519 Diffie-Hellman for the ARM Cortex M4 architecture.
2019
TCHES
Best Information is Most Successful 📺
Using information-theoretic tools, this paper establishes a mathematical link between the probability of success of a side-channel attack and the minimum number of queries to reach a given success rate, valid for any possible distinguishing rule and with the best possible knowledge on the attacker’s side. This link is a lower bound on the number of queries highly depends on Shannon’s mutual information between the traces and the secret key. This leads us to derive upper bounds on the mutual information that are as tight as possible and can be easily calculated. It turns out that, in the case of an additive white Gaussian noise, the bound on the probability of success of any attack is directly related to the signal to noise ratio. This leads to very easy computations and predictions of the success rate in any leakage model.
2019
TCHES
Bluethunder: A 2-level Directional Predictor Based Side-Channel Attack against SGX 📺
Software Guard Extension (SGX) is a hardware-based trusted execution environment (TEE) implemented in recent Intel commodity processors. By isolating the memory of security-critical applications from untrusted software, this mechanism provides users with a strongly shielded environment called enclave for executing programs safely. However, recent studies have demonstrated that SGX enclaves are vulnerable to side-channel attacks. In order to deal with these attacks, several protection techniques have been studied and utilized.In this paper, we explore a new pattern history table (PHT) based side-channel attack against SGX named Bluethunder, which can bypass existing protection techniques and reveal the secret information inside an enclave. Comparing to existing PHT-based attacks (such as Branchscope [ERAG+18]), Bluethunder abuses the 2-level directional predictor in the branch prediction unit, on top of which we develop an exploitation methodology to disclose the input-dependent control flow in an enclave. Since the cost of training the 2-level predictor is pretty low, Bluethunder can achieve a high bandwidth during the attack. We evaluate our attacks on two case studies: extracting the format string information in the vfprintf function in the Intel SGX SDK and attacking the implementation of RSA decryption algorithm in mbed TLS. Both attacks show that Bluethunder can recover fine-grained information inside an enclave with low training overhead, which outperforms the latest PHT-based side channel attack (Branchscope) by 52×. Specifically, in the second attack, Bluethunder can recover the RSA private key with 96.76% accuracy in a single run.
2019
TCHES
Cache vs. Key-Dependency: Side Channeling an Implementation of Pilsung 📺
Over the past two decades, cache attacks have been identified as a threat to the security of cipher implementations. These attacks recover secret information by combining observations of the victim cache accesses with the knowledge of the internal structure of the cipher. So far, cache attacks have been applied to ciphers that have fixed state transformations, leaving open the question of whether using secret, key-dependent transformations enhances the security against such attacks. In this paper we investigate this question. We look at an implementation of the North Korean cipher Pilsung, as reverse-engineered by Kryptos Logic. Like AES, Pilsung is a permutation-substitution cipher, but unlike AES, both the substitution and the permutation steps in Pilsung depend on the key, and are not known to the attacker. We analyze Pilsung and design a cache-based attack. We improve the state of the art by developing techniques for reversing secret-dependent transformations. Our attack, which requires an average of eight minutes on a typical laptop computer, demonstrates that secret transformations do not necessarily protect ciphers against side channel attacks.
2019
TCHES
Cache-Timing Attacks on RSA Key Generation 📺
During the last decade, constant-time cryptographic software has quickly transitioned from an academic construct to a concrete security requirement for real-world libraries. Most of OpenSSL’s constant-time code paths are driven by cryptosystem implementations enabling a dedicated flag at runtime. This process is perilous, with several examples emerging in the past few years of the flag either not being set or software defects directly mishandling the flag. In this work, we propose a methodology to analyze security-critical software for side-channel insecure code path traversal. Applying our methodology to OpenSSL, we identify three new code paths during RSA key generation that potentially leak critical algorithm state. Exploiting one of these leaks, we design, implement, and mount a single trace cache-timing attack on the GCD computation step. We overcome several hurdles in the process, including but not limited to: (1) granularity issues due to word-size operands to the GCD function; (2) bulk processing of desynchronized trace data; (3) non-trivial error rate during information extraction; and (4) limited high-confidence information on the modulus factors. Formulating lattice problem instances after obtaining and processing this limited information, our attack achieves roughly a 27% success rate for key recovery using the empirical data from 10K trials.
2019
TCHES
CAS-Lock: A Security-Corruptibility Trade-off Resilient Logic Locking Scheme 📺
Logic locking has recently been proposed as a solution for protecting gatelevel semiconductor intellectual property (IP). However, numerous attacks have been mounted on this technique, which either compromise the locking key or restore the original circuit functionality. SAT attacks leverage golden IC information to rule out all incorrect key classes, while bypass and removal attacks exploit the limited output corruptibility and/or structural traces of SAT-resistant locking schemes. In this paper, we propose a new lightweight locking technique: CAS-Lock (cascaded locking) which nullifies both SAT and bypass attacks, while simultaneously maintaining nontrivial output corruptibility. This property of CAS-Lock is in stark contrast to the well-accepted notion that there is an inherent trade-off between output corruptibility and SAT resistance. We theoretically and experimentally validate the SAT resistance of CAS-Lock, and show that it reduces the attack to brute-force, regardless of its construction. Further, we evaluate its resistance to recently proposed approximate SAT attacks (i.e., AppSAT). We also propose a modified version of CAS-Lock (mirrored CAS-Lock or M-CAS) to protect against removal attacks. M-CAS allows a trade-off evaluation between removal attack and SAT attack resiliency, while incurring minimal area overhead. We also show how M-CAS parameters such as the implemented Boolean function and selected key can be tuned by the designer so that a desired level of protection against all known attacks can be achieved.
2019
TCHES
Consolidating Security Notions in Hardware Masking 📺
In this paper, we revisit the security conditions of masked hardware implementations. We describe a new, succinct, information-theoretic condition called d-glitch immunity which is both necessary and sufficient for security in the presence of glitches. We show that this single condition includes, but is not limited to, previous security notions such as those used in higher-order threshold implementations and in abstractions using ideal gates. As opposed to these previously known necessary conditions, our new condition is also sufficient. On the other hand, it excludes avoidable notions such as uniformity. We also treat the notion of (strong) noninterference from an information-theoretic point-of-view in order to unify the different security concepts and pave the way to the verification of composability in the presence of glitches. We conclude the paper by demonstrating how the condition can be used as an efficient and highly generic flaw detection mechanism for a variety of functions and schemes based on different operations.
2019
TCHES
Covert Gates: Protecting Integrated Circuits with Undetectable Camouflaging 📺
Integrated circuit (IC) camouflaging has emerged as a promising solution for protecting semiconductor intellectual property (IP) against reverse engineering. Existing methods of camouflaging are based on standard cells that can assume one of many Boolean functions, either through variation of transistor threshold voltage or contact configurations. Unfortunately, such methods lead to high area, delay and power overheads, and are vulnerable to invasive as well as non-invasive attacks based on Boolean satisfiability/VLSI testing. In this paper, we propose, fabricate, and demonstrate a new cell camouflaging strategy, termed as ‘covert gate’ that leverages doping and dummy contacts to create camouflaged cells that are indistinguishable from regular standard cells under modern imaging techniques. We perform a comprehensive security analysis of covert gate, and show that it achieves high resiliency against SAT and test-based attacks at very low overheads. We also derive models to characterize the covert cells, and develop measures to incorporate them into a gate-level design. Simulation results of overheads and attacks are presented on benchmark circuits.
2019
TCHES
Deep Learning to Evaluate Secure RSA Implementations 📺
This paper presents the results of several successful profiled side-channel attacks against a secure implementation of the RSA algorithm. The implementation was running on a ARM Core SC 100 completed with a certified EAL4+ arithmetic co-processor. The analyses have been conducted by three experts’ teams, each working on a specific attack path and exploiting information extracted either from the electromagnetic emanation or from the power consumption. A particular attention is paid to the description of all the steps that are usually followed during a security evaluation by a laboratory, including the acquisitions and the observations preprocessing which are practical issues usually put aside in the literature. Remarkably, the profiling portability issue is also taken into account and different device samples are involved for the profiling and testing phases. Among other aspects, this paper shows the high potential of deep learning attacks against secure implementations of RSA and raises the need for dedicated countermeasures.
2019
TCHES
Electromagnetic Information Extortion from Electronic Devices Using Interceptor and Its Countermeasure 📺
The problem of information leakage through electromagnetic waves for various devices has been extensively discussed in literature. Conventionally, devices that are under such a threat suffer from potential electromagnetic information leakage during their operation. Further, the information inside the devices can be obtained by monitoring the electromagnetic waves leaking at the boundaries of the devices. The leakage of electromagnetic waves, however, was not observed for some devices, and such devices were not the target of the threat discussed above. In light of this circumstance, this paper discusses an “interceptor” that forces the leakage of information through electromagnetic waves, even from devices in which potential electromagnetic leakage does not occur. The proposed interceptor is a small circuit consisting of an affordable semiconductor chip and wiring and is powered by electromagnetic waves that irradiate from the outside of a device as its driving energy. The distance at which information is obtained is controlled by increasing the intensity of the irradiated electromagnetic waves. The paper presents the structure of the circuit for implementing the proposed interceptor to be used in major input–output devices and cryptographic modules, mounting a pathway designed on the basis of the construction method onto each device. Moreover, it is shown that it is possible to forcefully cause information leakage through electromagnetic waves. To detect the aforementioned threat, the paper also focuses on the changes in a device itself and the surrounding electromagnetic environment as a result of mounting an interceptor and considers a method of detecting an interceptor by both passive and active monitoring methods.
2019
TCHES
Error Amplification in Code-based Cryptography 📺
Code-based cryptography is one of the main techniques enabling cryptographic primitives in a post-quantum scenario. In particular, the MDPC scheme is a basic scheme from which many other schemes have been derived. These schemes rely on iterative decoding in the decryption process and thus have a certain small probability p of having a decryption (decoding) error.In this paper we show a very fundamental and important property of code-based encryption schemes. Given one initial error pattern that fails to decode, the time needed to generate another message that fails to decode is strictly much less than 1/p. We show this by developing a method for fast generation of undecodable error patterns (error pattern chaining), which additionally proves that a measure of closeness in ciphertext space can be exploited through its strong linkage to the difficulty of decoding these messages. Furthermore, if side-channel information is also available (time to decode), then the initial error pattern no longer needs to be given since one can be easily generated in this case.These observations are fundamentally important because they show that a, say, 128- bit encryption scheme is not inherently safe from reaction attacks even if it employs a decoder with a failure rate of 2−128. In fact, unless explicit protective measures are taken, having a failure rate at all – of any magnitude – can pose a security problem because of the error amplification effect of our method.A key-recovery reaction attack was recently shown on the MDPC scheme as well as similar schemes, taking advantage of decoding errors in order to recover the secret key. It was also shown that knowing the number of iterations in the iterative decoding step, which could be received in a timing attack, would also enable and enhance such an attack. In this paper we apply our error pattern chaining method to show how to improve the performance of such reaction attacks in the CPA case. We show that after identifying a single decoding error (or a decoding step taking more time than expected in a timing attack), we can adaptively create new error patterns that have a much higher decoding error probability than for a random error. This leads to a significant improvement of the attack based on decoding errors in the CPA case and it also gives the strongest known attack on MDPC-like schemes, both with and without using side-channel information.
2019
TCHES
Exploring the Effect of Device Aging on Static Power Analysis Attacks 📺
Vulnerability of cryptographic devices to side-channel analysis attacks, and in particular power analysis attacks has been extensively studied in the recent years. Among them, static power analysis attacks have become relevant with moving towards smaller technology nodes for which the static power is comparable to the dynamic power of a chip, or even dominant in future technology generations. The magnitude of the static power of a chip depends on the physical characteristics of transistors (e.g., the dimensions) as well as operating conditions (e.g., the temperature) and the electrical specifications such as the threshold voltage. In fact, the electrical specifications of transistors deviate from their originally intended ones during device lifetime due to aging mechanisms. Although device aging has been extensively investigated from reliability point of view, the impact of aging on the security of devices, and in particular on the vulnerability of devices to power analysis attacks are yet to be considered.This paper fills the gap and investigates how device aging can affect the susceptibility of a chip exposed to static power analysis attacks. To this end, we conduct both, simulation and practical experiments on real silicon. The experimental results are extracted from a realization of the PRESENT cipher fabricated using a 65nm commercial standard cell library. The results show that the amount of exploitable leakage through the static power consumption as a side channel is reduced when the device is aged. This can be considered as a positive development which can (even slightly) harden such static power analysis attacks. Additionally, this result is of great interest to static power side-channel adversaries since state-of-the-art leakage current measurements are conducted over long time periods under increased working temperatures and supply voltages to amplify the exploitable information, which certainly fuels aging-related device degradation.
2019
TCHES
Fast and simple constant-time hashing to the BLS12-381 elliptic curve 📺
Pairing-friendly elliptic curves in the Barreto-Lynn-Scott family are seeing a resurgence in popularity because of the recent result of Kim and Barbulescu that improves attacks against other pairing-friendly curve families. One particular Barreto-Lynn-Scott curve, called BLS12-381, is the locus of significant development and deployment effort, especially in blockchain applications. This effort has sparked interest in using the BLS12-381 curve for BLS signatures, which requires hashing to one of the groups of the bilinear pairing defined by BLS12-381.While there is a substantial body of literature on the problem of hashing to elliptic curves, much of this work does not apply to Barreto-Lynn-Scott curves. Moreover, the work that does apply has the unfortunate property that fast implementations are complex, while simple implementations are slow.In this work, we address these issues. First, we show a straightforward way of adapting the “simplified SWU” map of Brier et al. to BLS12-381. Second, we describe optimizations to this map that both simplify its implementation and improve its performance; these optimizations may be of interest in other contexts. Third, we implement and evaluate. We find that our work yields constant-time hash functions that are simple to implement, yet perform within 9% of the fastest, non–constant-time alternatives, which require much more complex implementations.
2019
TCHES
Fast constant-time gcd computation and modular inversion 📺
This paper introduces streamlined constant-time variants of Euclid’s algorithm, both for polynomial inputs and for integer inputs. As concrete applications, this paper saves time in (1) modular inversion for Curve25519, which was previously believed to be handled much more efficiently by Fermat’s method, and (2) key generation for the ntruhrss701 and sntrup4591761 lattice-based cryptosystems.
2019
TCHES
Fast, Furious and Insecure: Passive Keyless Entry and Start Systems in Modern Supercars 📺
The security of immobiliser and Remote Keyless Entry systems has been extensively studied over many years. Passive Keyless Entry and Start systems, which are currently deployed in luxury vehicles, have not received much attention besides relay attacks. In this work we fully reverse engineer a Passive Keyless Entry and Start system and perform a thorough analysis of its security.Our research reveals several security weaknesses. Specifically, we document the use of an inadequate proprietary cipher using 40-bit keys, the lack of mutual authentication in the challenge-response protocol, no firmware readout protection features enabled and the absence of security partitioning.In order to validate our findings, we implement a full proof of concept attack allowing us to clone a Tesla Model S key fob in a matter of seconds with low cost commercial off the shelf equipment. Our findings most likely apply to other manufacturers of luxury vehicles including McLaren, Karma and Triumph motorcycles as they all use the same system developed by Pektron.
2019
TCHES
Fully Automated Differential Fault Analysis on Software Implementations of Block Ciphers 📺
Differential Fault Analysis (DFA) is considered as the most popular fault analysis method. While there are techniques that provide a fault analysis automation on the cipher level to some degree, it can be shown that when it comes to software implementations, there are new vulnerabilities, which cannot be found by observing the cipher design specification.This work bridges the gap by providing a fully automated way to carry out DFA on assembly implementations of symmetric block ciphers. We use a customized data flow graph to represent the program and develop a novel fault analysis methodology to capture the program behavior under faults. We establish an effective description of DFA as constraints that are passed to an SMT solver. We create a tool that takes assembly code as input, analyzes the dependencies among instructions, automatically attacks vulnerable instructions using SMT solver and outputs the attack details that recover the last round key (and possibly the earlier keys). We support our design with evaluations on lightweight ciphers SIMON, SPECK, and PRIDE, and a current NIST standard, AES. By automated assembly analysis, we were able to find new efficient DFA attacks on SPECK and PRIDE, exploiting implementation specific vulnerabilities, and previously published DFA on SIMON and AES. Moreover, we present a novel DFA on multiplication operation that has never been shown for symmetric block ciphers before. Our experimental evaluation also shows reasonable execution times that are scalable to current cipher designs and can easily outclass the manual analysis. Moreover, we present a method to check the countermeasure-protected implementations in a way that helps implementers to decide how many rounds should be protected. We note that this is the first work that automatically carries out DFA on cipher implementations without any plaintext or ciphertext information and therefore, can be generally applied to any input data to the cipher.
2019
TCHES
Glitch-Resistant Masking Revisited 📺
Best paper award CHES 2019
Implementing the masking countermeasure in hardware is a delicate task. Various solutions have been proposed for this purpose over the last years: we focus on Threshold Implementations (TIs), Domain-Oriented Masking (DOM), the Unified Masking Approach (UMA) and Generic Low Latency Masking (GLM). The latter generally come with innovative ideas to cope with physical defaults such as glitches. Yet, and in contrast to the situation in software-oriented masking, these schemes have not been formally proven at arbitrary security orders and their composability properties were left unclear. So far, only a 2-cycle implementation of the seminal masking scheme by Ishai, Sahai and Wagner has been shown secure and composable in the robust probing model – a variation of the probing model aimed to capture physical defaults such as glitches – for any number of shares.In this paper, we argue that this lack of proofs for TIs, DOM, UMA and GLM makes the interpretation of their security guarantees difficult as the number of shares increases. For this purpose, we first put forward that the higher-order variants of all these schemes are affected by (local or composability) security flaws in the (robust) probing model, due to insufficient refreshing. We then show that composability and robustness against glitches cannot be analyzed independently. We finally detail how these abstract flaws translate into concrete (experimental) attacks, and discuss the additional constraints robust probing security implies on the need of registers. Despite not systematically leading to improved complexities at low security orders, e.g., with respect to the required number of measurements for a successful attack, we argue that these weaknesses provide a case for the need of security proofs in the robust probing model (or a similar abstraction) at higher security orders.
2019
TCHES
Implementing RLWE-based Schemes Using an RSA Co-Processor 📺
We repurpose existing RSA/ECC co-processors for (ideal) lattice-based cryptography by exploiting the availability of fast long integer multiplication. Such co-processors are deployed in smart cards in passports and identity cards, secured microcontrollers and hardware security modules (HSM). In particular, we demonstrate an implementation of a variant of the Module-LWE-based Kyber Key Encapsulation Mechanism (KEM) that is tailored for high performance on a commercially available smart card chip (SLE 78). To benefit from the RSA/ECC co-processor we use Kronecker substitution in combination with schoolbook and Karatsuba polynomial multiplication. Moreover, we speed-up symmetric operations in our Kyber variant using the AES co-processor to implement a PRNG and a SHA-256 co-processor to realise hash functions. This allows us to execute CCA-secure Kyber768 key generation in 79.6 ms, encapsulation in 102.4 ms and decapsulation in 132.7 ms.
2019
TCHES
Improved Heuristics for Short Linear Programs 📺
In this article, we propose new heuristics for minimising the amount of XOR gates required to compute a system of linear equations in GF(2). We first revisit the well known Boyar-Peralta strategy and argue that a proper randomisation process during the selection phases can lead to great improvements. We then propose new selection criteria and explain their rationale. Our new methods outperform state-of-the-art algorithms such as Paar or Boyar-Peralta (or open synthesis tools such as Yosys) when tested on random matrices with various densities. They can be applied to matrices of reasonable sizes (up to about 32 × 32). Notably, we provide a new implementation record for the matrix underlying the MixColumns function of the AES block cipher, requiring only 94 XORs.
2019
TCHES
Improving CEMA using Correlation Optimization 📺
Sensitive cryptographic information, e.g. AES secret keys, can be extracted from the electromagnetic (EM) leakages unintentionally emitted by a device using techniques such as Correlation Electromagnetic Analysis (CEMA). In this paper, we introduce Correlation Optimization (CO), a novel approach that improves CEMA attacks by formulating the selection of useful EM leakage samples in a trace as a machine learning optimization problem. To this end, we propose the correlation loss function, which aims to maximize the Pearson correlation between a set of EM traces and the true AES key during training. We show that CO works with high-dimensional and noisy traces, regardless of time-domain trace alignment and without requiring prior knowledge of the power consumption characteristics of the cryptographic hardware. We evaluate our approach using the ASCAD benchmark dataset and a custom dataset of EM leakages from an Arduino Duemilanove, captured with a USRP B200 SDR. Our results indicate that the masked AES implementation used in all three ASCAD datasets can be broken with a shallow Multilayer Perceptron model, whilst requiring only 1,000 test traces on average. A similar methodology was employed to break the unprotected AES implementation from our custom dataset, using 22,000 unaligned and unfiltered test traces.
2019
TCHES
Leaky Noise: New Side-Channel Attack Vectors in Mixed-Signal IoT Devices 📺
Microcontrollers and SoC devices have widely been used in Internet of Things applications. This also brings the question whether they lead to new security threats unseen in traditional computing systems. In fact, almost all modern SoC chips, particularly in the IoT domain, contain both analog and digital components, for various sensing and transmission tasks. Traditional remote-accessible online systems do not have this property, which can potentially become a security vulnerability. In this paper we demonstrate that such mixed-signal components, namely ADCs, expose a new security threat that allows attackers with ADC access to deduce the activity of a CPU in the system. To prove the leakage, we perform leakage assessment on three individual microcontrollers from two different vendors with various ADC settings. After showing a correlation of CPU activity with ADC noise, we continue with a leakage assessment of modular exponentiation and AES. It is shown that for all of these devices, leakage occurs for at least one algorithm and configuration of the ADC. Finally, we show a full key recovery attack on AES that works despite of the limited ADC sampling rate. These results imply that even remotely accessible microcontroller systems should be equipped with proper countermeasures against power analysis attacks, or restrict access to ADC data.
2019
TCHES
Lightweight Authenticated Encryption Mode of Operation for Tweakable Block Ciphers 📺
The use of a small block length is a common strategy when designing lightweight (tweakable) block ciphers (TBCs), and several 64-bit primitives have been proposed. However, when such a 64-bit primitive is used for an authenticated encryption with birthday-bound security, it has only 32-bit data complexity, which is subject to practical attacks. To employ a short block length without compromising security, we propose PFB, a lightweight TBC-based authenticated encryption with associated data mode, which achieves beyond-birthday-bound security. For this purpose, we extend iCOFB, which is originally defined with a tweakable random function. Unlike iCOFB, the proposed method can be instantiated with a TBC using a fixed tweak length and can handle variable-length data. Moreover, its security bound is improved and independent of the data length; this improves the key lifetime, particularly in lightweight blocks with a small size. The proposed method also covers a broader class of feedback functions because of the generalization presented in our proof. We evaluate the concrete hardware performances of PFB, which benefits from the small block length and shows particularly good performances in threshold implementation.
2019
TCHES
M&M: Masks and Macs against Physical Attacks 📺
Cryptographic implementations on embedded systems need to be protected against physical attacks. Today, this means that apart from incorporating countermeasures against side-channel analysis, implementations must also withstand fault attacks and combined attacks. Recent proposals in this area have shown that there is a big tradeoff between the implementation cost and the strength of the adversary model. In this work, we introduce a new combined countermeasure M&M that combines Masking with information-theoretic MAC tags and infective computation. It works in a stronger adversary model than the existing scheme ParTI, yet is a lot less costly to implement than the provably secure MPC-based scheme CAPA. We demonstrate M&M with a SCA- and DFA-secure implementation of the AES block cipher. We evaluate the side-channel leakage of the second-order secure design with a non-specific t-test and use simulation to validate the fault resistance.
2019
TCHES
Make Some Noise. Unleashing the Power of Convolutional Neural Networks for Profiled Side-channel Analysis 📺
Profiled side-channel analysis based on deep learning, and more precisely Convolutional Neural Networks, is a paradigm showing significant potential. The results, although scarce for now, suggest that such techniques are even able to break cryptographic implementations protected with countermeasures. In this paper, we start by proposing a new Convolutional Neural Network instance able to reach high performance for a number of considered datasets. We compare our neural network with the one designed for a particular dataset with masking countermeasure and we show that both are good designs but also that neither can be considered as a superior to the other one.Next, we address how the addition of artificial noise to the input signal can be actually beneficial to the performance of the neural network. Such noise addition is equivalent to the regularization term in the objective function. By using this technique, we are able to reduce the number of measurements needed to reveal the secret key by orders of magnitude for both neural networks. Our new convolutional neural network instance with added noise is able to break the implementation protected with the random delay countermeasure by using only 3 traces in the attack phase. To further strengthen our experimental results, we investigate the performance with a varying number of training samples, noise levels, and epochs. Our findings show that adding noise is beneficial throughout all training set sizes and epochs.
2019
TCHES
Methodology for Efficient CNN Architectures in Profiling Attacks 📺
The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Previous works have shown the benefit of using convolutional neural networks (CNN) to limit the effect of some countermeasures such as desynchronization. Compared with template attacks, deep learning techniques can deal with trace misalignment and the high dimensionality of the data. Pre-processing is no longer mandatory. However, the performance of attacks depends to a great extent on the choice of each hyperparameter used to configure a CNN architecture. Hence, we cannot perfectly harness the potential of deep neural networks without a clear understanding of the network’s inner-workings. To reduce this gap, we propose to clearly explain the role of each hyperparameters during the feature selection phase using some specific visualization techniques including Weight Visualization, Gradient Visualization and Heatmaps. By highlighting which features are retained by filters, heatmaps come in handy when a security evaluator tries to interpret and understand the efficiency of CNN. We propose a methodology for building efficient CNN architectures in terms of attack efficiency and network complexity, even in the presence of desynchronization. We evaluate our methodology using public datasets with and without desynchronization. In each case, our methodology outperforms the previous state-of-the-art CNN models while significantly reducing network complexity. Our networks are up to 25 times more efficient than previous state-of-the-art while their complexity is up to 31810 times smaller. Our results show that CNN networks do not need to be very complex to perform well in the side-channel context.
2019
TCHES
Multi-Tuple Leakage Detection and the Dependent Signal Issue 📺
Leakage detection is a common tool to quickly assess the security of a cryptographic implementation against side-channel attacks. The Test Vector Leakage Assessment (TVLA) methodology using Welch’s t-test, proposed by Cryptography Research, is currently the most popular example of such tools, thanks to its simplicity and good detection speed compared to attack-based evaluations. However, as any statistical test, it is based on certain assumptions about the processed samples and its detection performances strongly depend on parameters like the measurement’s Signal-to-Noise Ratio (SNR), their degree of dependency, and their density, i.e., the ratio between the amount of informative and non-informative points in the traces. In this paper, we argue that the correct interpretation of leakage detection results requires knowledge of these parameters which are a priori unknown to the evaluator, and, therefore, poses a non-trivial challenge to evaluators (especially if restricted to only one test). For this purpose, we first explore the concept of multi-tuple detection, which is able to exploit differences between multiple informative points of a trace more effectively than tests relying on the minimum p-value of concurrent univariate tests. To this end, we map the common Hotelling’s T2-test to the leakage detection setting and, further, propose a specialized instantiation of it which trades computational overheads for a dependency assumption. Our experiments show that there is not one test that is the optimal choice for every leakage scenario. Second, we highlight the importance of the assumption that the samples at each point in time are independent, which is frequently considered in leakage detection, e.g., with Welch’s t-test. Using simulated and practical experiments, we show that (i) this assumption is often violated in practice, and (ii) deviations from it can affect the detection performances, making the correct interpretation of the results more difficult. Finally, we consolidate our findings by providing guidelines on how to use a combination of established and newly-proposed leakage detection tools to infer the measurements parameters. This enables a better interpretation of the tests’ results than the current state-of-the-art (yet still relying on heuristics for the most challenging evaluation scenarios).
2019
TCHES
New Circuit Minimization Techniques for Smaller and Faster AES SBoxes 📺
In this paper we consider various methods and techniques to find the smallest circuit realizing a given linear transformation on n input signals and m output signals, with a constraint of a maximum depth, maxD, of the circuit. Additional requirements may include that input signals can arrive to the circuit with different delays, and output signals may be requested to be ready at a different depth. We apply these methods and also improve previous results in order to find hardware circuits for forward, inverse, and combined AES SBoxes, and for each of them we provide the fastest and smallest combinatorial circuits. Additionally, we propose a novel technique with “floating multiplexers” to minimize the circuit for the combined SBox, where we have two different linear matrices (forward and inverse) combined with multiplexers. The resulting AES SBox solutions are the fastest and smallest to our knowledge.
2019
TCHES
New Insights to Key Derivation for Tamper-Evident Physical Unclonable Functions 📺
Several publications presented tamper-evident Physical Unclonable Functions (PUFs) for secure storage of cryptographic keys and tamper-detection. Unfortunately, previously published PUF-based key derivation schemes do not sufficiently take into account the specifics of the underlying application, i.e., an attacker that tampers with the physical parameters of the PUF outside of an idealized noise error model. This is a notable extension of existing schemes for PUF key derivation, as they are typically concerned about helper data leakage, i.e., by how much the PUF’s entropy is diminished when gaining access to its helper data.To address the specifics of tamper-evident PUFs, we formalize the aspect of tamper-sensitivity, thereby providing a new tool to rate by how much an attacker is allowed to tamper with the PUF. This complements existing criteria such as effective number of secret bits for entropy and failure rate for reliability. As a result, it provides a fair comparison among different schemes and independent of the PUF implementation, as its unit is based on the noise standard deviation of the underlying PUF measurement. To overcome the limitations of previous schemes, we then propose an Error-Correcting Code (ECC) based on the Lee metric, i.e., a distance metric well-suited to describe the distance between q-ary symbols as output from an equidistant quantization, i.e., a higher-order alphabet PUF. This novel approach is required, as the underlying symbols’ bits are not i.i.d. which hinders applying previous state-of-the-art approaches. We present the concept for our scheme and demonstrate its feasibility based on an empirical PUF distribution. The benefits of our approach are an increase by over 21% in effective secret bit compared to previous approaches based on equidistant quantization. At the same time, we improve tamper-sensitivity compared to an equiprobable quantization while ensuring similar reliability and entropy. Hence, this work opens up a new direction of how to interpret the PUF output and details a practically relevant scheme outperforming all previous constructions.
2019
TCHES
Non-Profiled Deep Learning-based Side-Channel attacks with Sensitivity Analysis 📺
Deep Learning has recently been introduced as a new alternative to perform Side-Channel analysis [MPP16]. Until now, studies have been focused on applying Deep Learning techniques to perform Profiled Side-Channel attacks where an attacker has a full control of a profiling device and is able to collect a large amount of traces for different key values in order to characterize the device leakage prior to the attack. In this paper we introduce a new method to apply Deep Learning techniques in a Non-Profiled context, where an attacker can only collect a limited number of side-channel traces for a fixed unknown key value from a closed device. We show that by combining key guesses with observations of Deep Learning metrics, it is possible to recover information about the secret key. The main interest of this method is that it is possible to use the power of Deep Learning and Neural Networks in a Non-Profiled scenario. We show that it is possible to exploit the translation-invariance property of Convolutional Neural Networks [CDP17] against de-synchronized traces also during Non-Profiled side-channel attacks. In this case, we show that this method can outperform classic Non-Profiled attacks such as Correlation Power Analysis. We also highlight that it is possible to break masked implementations in black-box, without leakages combination pre-preprocessing and with no assumptions nor knowledge about the masking implementation. To carry the attack, we introduce metrics based on Sensitivity Analysis that can reveal both the secret key value as well as points of interest, such as leakages and masks locations in the traces. The results of our experiments demonstrate the interests of this new method and show that this attack can be performed in practice.
2019
TCHES
Novel Side-Channel Attacks on Quasi-Cyclic Code-Based Cryptography 📺
Chou suggested a constant-time implementation for quasi-cyclic moderatedensity parity-check (QC-MDPC) code-based cryptography to mitigate timing attacks at CHES 2016. This countermeasure was later found to become vulnerable to a differential power analysis (DPA) in private syndrome computation, as described by Rossi et al. at CHES 2017. The proposed DPA, however, still could not completely recover accurate secret indices, requiring further solving linear equations to obtain entire secret information. In this paper, we propose a multiple-trace attack which enables to completely recover accurate secret indices. We further propose a singletrace attack which can even work when using ephemeral keys or applying Rossi et al.’s DPA countermeasures. Our experiments show that the BIKE and LEDAcrypt may become vulnerable to our proposed attacks. The experiments are conducted using power consumption traces measured from ChipWhisperer-Lite XMEGA (8-bit processor) and ChipWhisperer UFO STM32F3 (32-bit processor) target boards.
2019
TCHES
NTTRU: Truly Fast NTRU Using NTT 📺
We present NTTRU – an IND-CCA2 secure NTRU-based key encapsulation scheme that uses the number theoretic transform (NTT) over the cyclotomic ring Z7681[X]/(X768−X384+1) and produces public keys and ciphertexts of approximately 1.25 KB at the 128-bit security level. The number of cycles on a Skylake CPU of our constant-time AVX2 implementation of the scheme for key generation, encapsulation and decapsulation is approximately 6.4K, 6.1K, and 7.9K, which is more than 30X, 5X, and 8X faster than these respective procedures in the NTRU schemes that were submitted to the NIST post-quantum standardization process. These running times are also, by a large margin, smaller than those for all the other schemes in the NIST process as well as the KEMs based on elliptic curve Diffie-Hellman. We additionally give a simple transformation that allows one to provably deal with small decryption errors in OW-CPA encryption schemes (such as NTRU) when using them to construct an IND-CCA2 key encapsulation.
2019
TCHES
On-Device Power Analysis Across Hardware Security Domains. 📺
Side-channel power analysis is a powerful method of breaking secure cryptographic algorithms, but typically power analysis is considered to require specialized measurement equipment on or near the device. Assuming an attacker first gained the ability to run code on the unsecure side of a device, they could trigger encryptions and use the on-board ADC to capture power traces of that hardware encryption engine.This is demonstrated on a SAML11 which contains a M23 core with a TrustZone-M implementation as the hardware security barrier. This attack requires 160 × 106 traces, or approximately 5 GByte of data. This attack does not use any external measurement equipment, entirely performing the power analysis using the ADC on-board the microcontroller under attack. The attack is demonstrated to work both from the non-secure and secure environment on the chip, being a demonstration of a cross-domain power analysis attack.To understand the effect of noise and sample rate reduction, an attack is mounted on the SAML11 hardware AES peripheral using classic external equipment, and results are compared for various sample rates and hardware setups. A discussion on how users of this device can help prevent such remote attacks is also presented, along with metrics that can be used in evaluating other devices. Complete copies of all recorded power traces and scripts used by the authors are publicly presented.
2019
TCHES
Power Analysis on NTRU Prime 📺
This paper applies a variety of power analysis techniques to several implementations of NTRU Prime, a Round 2 submission to the NIST PQC Standardization Project. The techniques include vertical correlation power analysis, horizontal indepth correlation power analysis, online template attacks, and chosen-input simple power analysis. The implementations include the reference one, the one optimized using smladx, and three protected ones. Adversaries in this study can fully recover private keys with one single trace of short observation span, with few template traces from a fully controlled device similar to the target and no a priori power model, or sometimes even with the naked eye. The techniques target the constant-time generic polynomial multiplications in the product scanning method. Though in this work they focus on the decapsulation, they also work on the key generation and encapsulation of NTRU Prime. Moreover, they apply to the ideal-lattice-based cryptosystems where each private-key coefficient comes from a small set of possibilities.
2019
TCHES
Practical Evaluation of Protected Residue Number System Scalar Multiplication 📺
The Residue Number System (RNS) arithmetic is gaining grounds in public key cryptography, because it offers fast, efficient and secure implementations over large prime fields or rings of integers. In this paper, we propose a generic, thorough and analytic evaluation approach for protected scalar multiplication implementations with RNS and traditional Side Channel Attack (SCA) countermeasures in an effort to assess the SCA resistance of RNS. This paper constitutes the first robust evaluation of RNS software for Elliptic Curve Cryptography against electromagnetic (EM) side-channel attacks. Four different countermeasures, namely scalar and point randomization, random base permutations and random moduli operation sequence, are implemented and evaluated using the Test Vector Leakage Assessment (TVLA) and template attacks. More specifically, variations of RNS-based Montgomery Powering Ladder scalar multiplication algorithms are evaluated on an ARM Cortex A8 processor using an EM probe for acquisition of the traces. We show experimentally and theoretically that new bounds should be put forward when TVLA evaluations on public key algorithms are performed. On the security of RNS, our data and location dependent template attacks show that even protected implementations are vulnerable to these attacks. A combination of RNS-based countermeasures is the best way to protect against side-channel leakage.
2019
TCHES
Recovering the CTR_DRBG state in 256 traces 📺
The NIST CTR_DRBG specification prescribes a maximum size on each random number request, limiting the number of encryptions in CTR mode with the same key to 4 096. Jaffe’s attack on AES in CTR mode without knowledge of the nonce from CHES 2007 requires 216 traces, which is safely above this recommendation. In this work, we exhibit an attack that requires only 256 traces, which is well within the NIST limits. We use simulated traces to investigate the success probability as a function of the signal-to-noise ratio. We also demonstrate its success in practice by attacking an AES-CTR implementation on a Cortex-M4 among others and recovering both the key and nonce. Our traces and code are made openly available for reproducibility.
2019
TCHES
Reducing a Masked Implementation’s Effective Security Order with Setup Manipulations 📺
Couplings are a type of physical default that can violate the independence assumption needed for the secure implementation of the masking countermeasure. Two recent works by De Cnudde et al. put forward qualitatively that couplings can cause information leakages of lower order than theoretically expected. However, the (quantitative) amplitude of these lower-order leakages (e.g., measured as the amplitude of a detection metric such as Welch’s T statistic) was usually lower than the one of the (theoretically expected) dth order leakages. So the actual security level of these implementations remained unaffected. In addition, in order to make the couplings visible, the authors sometimes needed to amplify them internally (e.g., by tweaking the placement and routing or iterating linear operations on the shares). In this paper, we first show that the amplitude of low-order leakages in masked implementations can be amplified externally, by tweaking side-channel measurement setups in a way that is under control of a power analysis adversary. Our experiments put forward that the “effective security order” of both hardware (FPGA) and software (ARM-32) implementations can be reduced, leading to concrete reductions of their security level. For this purpose, we move from the detection-based analyzes of previous works to attack-based evaluations, allowing to confirm the exploitability of the lower-order leakages that we amplify. We also provide a tentative explanation for these effects based on couplings, and describe a model that can be used to predict them in function of the measurement setup’s external resistor and implementation’s supply voltage. We posit that the effective security orders observed are mainly due to “externally-amplified couplings” that can be systematically exploited by actual adversaries.
2019
TCHES
Return of the Hidden Number Problem. A Widespread and Novel Key Extraction Attack on ECDSA and DSA 📺
Side channels have long been recognized as a threat to the security of cryptographic applications. Implementations can unintentionally leak secret information through many channels, such as microarchitectural state changes in processors, changes in power consumption, or electromagnetic radiation. As a result of these threats, many implementations have been hardened to defend against these attacks. Despite these mitigations, this work presents a novel side-channel attack against ECDSA and DSA. The attack targets a common implementation pattern that is found in many cryptographic libraries. In fact, about half of the libraries that were tested exhibited the vulnerable pattern. This pattern is exploited in a full proof of concept attack against OpenSSL, demonstrating that it is possible to extract a 256-bit ECDSA private key using a simple cache attack after observing only a few thousand signatures. The target of this attack is a previously unexplored part of (EC)DSA signature generation, which explains why mitigations are lacking and the issue is so widespread. Finally, estimates are provided for the minimum number of signatures needed to perform the attack, and countermeasures are suggested to protect against this attack.
2019
TCHES
Sapphire: A Configurable Crypto-Processor for Post-Quantum Lattice-based Protocols 📺
Public key cryptography protocols, such as RSA and elliptic curve cryptography, will be rendered insecure by Shor’s algorithm when large-scale quantum computers are built. Cryptographers are working on quantum-resistant algorithms, and lattice-based cryptography has emerged as a prime candidate. However, high computational complexity of these algorithms makes it challenging to implement lattice-based protocols on low-power embedded devices. To address this challenge, we present Sapphire – a lattice cryptography processor with configurable parameters. Efficient sampling, with a SHA-3-based PRNG, provides two orders of magnitude energy savings; a single-port RAM-based number theoretic transform memory architecture is proposed, which provides 124k-gate area savings; while a low-power modular arithmetic unit accelerates polynomial computations. Our test chip was fabricated in TSMC 40nm low-power CMOS process, with the Sapphire cryptographic core occupying 0.28 mm2 area consisting of 106k logic gates and 40.25 KB SRAM. Sapphire can be programmed with custom instructions for polynomial arithmetic and sampling, and it is coupled with a low-power RISC-V micro-processor to demonstrate NIST Round 2 lattice-based CCA-secure key encapsulation and signature protocols Frodo, NewHope, qTESLA, CRYSTALS-Kyber and CRYSTALS-Dilithium, achieving up to an order of magnitude improvement in performance and energy-efficiency compared to state-of-the-art hardware implementations. All key building blocks of Sapphire are constant-time and secure against timing and simple power analysis side-channel attacks. We also discuss how masking-based DPA countermeasures can be implemented on the Sapphire core without any changes to the hardware.
2019
TCHES
Secure Data Retrieval on the Cloud: Homomorphic Encryption meets Coresets 📺
Secure report is the problem of a client that retrieves all records matching specified attributes from a database table at the server (e.g. cloud), as in SQL SELECT queries, but where the query and the database are encrypted. Here, only the client has the secret key, but still the server is expected to compute and return the encrypted result. Secure report is theoretically possible with Fully Homomorphic Encryption (FHE). However, the current state-of-the-art solutions are realized by a polynomial of degree that is at least linear in the number m of records, which is too slow in practice even for very small databases. We present the first solution that is realized by a polynomial that attains degree independent of the number of records m, as well as the first implementation of an FHE solution to Secure report. This is by suggesting a novel paradigm that forges a link between cryptography and modern data summarization techniques known as coresets (core-sets), and sketches in particular. The key idea is to compute only a coreset of the desired report. Since the coreset is small, the client can quickly decode the desired report that the server computes after decrypting the coreset. We implemented our main reporting system in an open source library. This is the first implemented system that can answer such database queries when processing only FHE encrypted data and queries. As our analysis promises, the experimental results show that we can run Secure report queries on billions records in minutes on an Amazon EC2 server, compared to less than a hundred-thousands in previous FHE based solutions.
2019
TCHES
Secure Physical Enclosures from Covers with Tamper-Resistance 📺
Ensuring physical security of multiple-chip embedded systems on a PCB is challenging, since the attacker can control the device in a hostile environment. To detect physical intruders as part of a layered approach to security, it is common to create a physical security boundary that is difficult to penetrate or remove, e.g., enclosures created from tamper-respondent envelopes or covers. Their physical integrity is usually checked by active sensing, i.e., a battery-backed circuit continuously monitors the enclosure. However, adoption is often hampered by the disadvantages of a battery and due to specialized equipment which is required to create the enclosure. In contrast, we present a batteryless tamper-resistant cover made from standard flexPCB technology, i.e., a commercially widespread, scalable, and proven technology. The cover comprises a fine mesh of electrodes and an evaluation unit underneath the cover checks their integrity by detecting short and open circuits. Additionally, it measures the capacitances between the electrodes of the mesh. Once its preliminary integrity is confirmed, a cryptographic key is derived from the capacitive measurements representing a PUF, to decrypt and authenticate sensitive data of the enclosed system. We demonstrate the feasibility of our concept, provide details on the layout, electrical properties of the cover, and explain the underlying security architecture. Practical results including statistics over a set of 115 flexPCB covers, physical attacks, and environmental testing support our design rationale. Hence, our work opens up a new direction of counteracting physical tampering without the need of batteries, while aiming at a physical security level comparable to FIPS 140-2 level 3.
2019
TCHES
Security on Plastics: Fake or Real? 📺
Electronic devices on plastic foil, also referred to as flexible electronics, are making their way into mainstream applications. In the near future, flexible electronic labels can be embedded in smart blisters, but also used as mainstream technology for flexible medical patches. A key technology for flexible electronics is based on thin-film transistors, which have the potential to be manufactured at low cost, making them an ideal candidate for these applications. Yet, up to now, no-one is taking digital security into account in the design of flexible electronics.In this paper, we present, to our knowledge, the first cryptographic core on plastic foil. Two main research challenges arise. The first challenge is related to the reliability of the circuit, which typically decreases when the circuit area increases. By integrating cryptographic modules, we explore the limits of the technology, since the smallest lightweight block ciphers feature a larger area than the largest digital circuit on flex foil reported up to now. The second challenge is related to key hiding. The relatively large features on the chip and the fact that electronic chips on plastics are used as bare dies, i.e. they are not packaged, make it easy to read out the value of the stored secret key. Because there is no dedicated non-volatile memory technology yet, existing methods for writing data to the flexible chip after fabrication are based on wire cutting with a laser or inkjet printing. With these techniques, however, it is extremely easy to “see” the value of the secret key under a microscope. We propose a novel solution that allows us to invisibly program the key after fabrication.
2019
TCHES
Shaping the Glitch: Optimizing Voltage Fault Injection Attacks 📺
Voltage fault injection is a powerful active side channel attack that modifies the execution-flow of a device by creating disturbances on the power supply line. The attack typically aims at skipping security checks or generating side-channels that gradually leak sensitive data, including the firmware code. In this paper we propose a new voltage fault injection technique that generates fully arbitrary voltage glitch waveforms using off-the-shelf and low cost equipment. To show the effectiveness of our setup, we present new, unpublished firmware extraction attacks on six microcontrollers from three major manufacturers: STMicroelectronics, Texas Instruments and Renesas Electronics that, in 2016 declared a market of $1.5 billion, $800 million and $2.5 billion on units sold, respectively. Among the presented attacks, the most challenging ones exploit multiple vulnerabilities and inject over one million glitches, heavily leveraging on the performance and repeatability of the new proposed technique. We perform a thorough evaluation of arbitrary glitch waveforms by comparing the attack performance against two other major V-FI techniques in the literature. Along a responsible disclosure policy, all the vulnerabilities have been timely reported to the manufacturers.
2019
TCHES
Share-slicing: Friend or Foe? 📺
Masking is a well loved and widely deployed countermeasure against side channel attacks, in particular in software. Under certain assumptions (w.r.t. independence and noise level), masking provably prevents attacks up to a certain security order and leads to a predictable increase in the number of required leakages for successful attacks beyond this order. The noise level in typical processors where software masking is used may not be very high, thus low masking orders are not sufficient for real world security. Higher order masking however comes at a great cost, and therefore a number techniques have been published over the years that make such implementations more efficient via parallelisation in the form of bit or share slicing. We take two highly regarded schemes (ISW and Barthe et al.), and some corresponding open source implementations that make use of share slicing, and discuss their true security on an ARM Cortex-M0 and an ARM Cortex-M3 processor (both from the LPC series). We show that micro-architectural features of the M0 and M3 undermine the independence assumptions made in masking proofs and thus their theoretical guarantees do not translate into practice (even worse it seems unpredictable at which order leaks can be expected). Our results demonstrate how difficult it is to link theoretical security proofs to practical real-world security guarantees.
2019
TCHES
SITM: See-In-The-Middle Side-Channel Assisted Middle Round Differential Cryptanalysis on SPN Block Ciphers 📺
Side-channel analysis constitutes a powerful attack vector against cryptographic implementations. Techniques such as power and electromagnetic side-channel analysis have been extensively studied to provide an efficient way to recover the secret key used in cryptographic algorithms. To protect against such attacks, countermeasure designers have developed protection methods, such as masking and hiding, to make the attacks harder. However, due to significant overheads, these protections are sometimes deployed only at the beginning and the end of encryption, which are the main targets for side-channel attacks.In this paper, we present a methodology for side-channel assisted differential cryptanalysis attack to target middle rounds of block cipher implementations. Such method presents a powerful attack vector against designs that normally only protect the beginning and end rounds of ciphers. We generalize the attack to SPN based ciphers and calculate the effort the attacker needs to recover the secret key. We provide experimental results on 8-bit and 32-bit microcontrollers. We provide case studies on state-of-the-art symmetric block ciphers, such as AES, SKINNY, and PRESENT. Furthermore, we show how to attack shuffling-protected implementations.
2019
TCHES
SMT Attack: Next Generation Attack on Obfuscated Circuits with Capabilities and Performance Beyond the SAT Attacks 📺
In this paper, we introduce the Satisfiability Modulo Theory (SMT) attack on obfuscated circuits. The proposed attack is the superset of Satisfiability (SAT) attack, with many additional features. It uses one or more theory solvers in addition to its internal SAT solver. For this reason, it is capable of modeling far more complex behaviors and could formulate much stronger attacks. In this paper, we illustrate that the use of theory solvers enables the SMT to carry attacks that are not possible by SAT formulated attacks. As an example of its capabilities, we use the SMT attack to break a recent obfuscation scheme that uses key values to alter delay properties (setup and hold time) of a circuit to remain SAT hard. Considering that the logic delay is not a Boolean logical property, the targeted obfuscation mechanism is not breakable by a SAT attack. However, in this paper, we illustrate that the proposed SMT attack, by deploying a simple graph theory solver, can model and break this obfuscation scheme in few minutes. We describe how the SMT attack could be used in one of four different attack modes: (1) We explain how SMT attack could be reduced to a SAT attack, (2) how the SMT attack could be carried out in Eager, and (3) Lazy approach, and finally (4) we introduce the Accelerated SMT (AccSMT) attack that offers significant speed-up to SAT attack. Additionally, we explain how AccSMT attack could be used as an approximate attack when facing SMT-Hard obfuscation schemes.
2019
TCHES
Software Toolkit for HFE-based Multivariate Schemes 📺
In 2017, NIST shook the cryptographic world by starting a process for standardizing post-quantum cryptography. Sixty-four submissions have been considered for the first round of the on-going NIST Post-Quantum Cryptography (PQC) process. Multivariate cryptography is a classical post-quantum candidate that turns to be the most represented in the signature category. At this stage of the process, it is of primary importance to investigate efficient implementations of the candidates. This article presents MQsoft, an efficient library which permits to implement HFE-based multivariate schemes submitted to the NIST PQC process such as GeMSS, Gui and DualModeMS. The library is implemented in C targeting Intel 64-bit processors and using avx2 set instructions. We present performance results for our library and its application to GeMSS, Gui and DualModeMS. In particular, we optimize several crucial parts for these schemes. These include root finding for HFE polynomials and evaluation of multivariate quadratic systems in F2. We propose a new method which accelerates root finding for specific HFE polynomials by a factor of two. For GeMSS and Gui, we obtain a speed-up of a factor between 2 and 19 for the keypair generation, between 1.2 and 2.5 for the signature generation, and between 1.6 and 2 for the verifying process. We have also improved the arithmetic in F2n by a factor of 4 compared to the NTL library. Moreover, a large part of our implementation is protected against timing attacks.
2019
TCHES
Static Power SCA of Sub-100 nm CMOS ASICs and the Insecurity of Masking Schemes in Low-Noise Environments 📺
Semiconductor technology scaling faced tough engineering challenges while moving towards and beyond the deep sub-micron range. One of the most demanding issues, limiting the shrinkage process until the present day, is the difficulty to control the leakage currents in nanometer-scaled field-effect transistors. Previous articles have shown that this source of energy dissipation, at least in case of digital CMOS logic, can successfully be exploited as a side-channel to recover the secrets of cryptographic implementations. In this work, we present the first fair technology comparison with respect to static power side-channel measurements on real silicon and demonstrate that the effect of down-scaling on the potency of this security threat is huge. To this end, we designed two ASICs in sub-100nm CMOS nodes (90 nm, 65 nm) and got them fabricated by one of the leading foundries. Our experiments, which we performed at different operating conditions, show consistently that the ASIC technology with the smaller minimum feature size (65 nm) indeed exhibits substantially more informative leakages (factor of ~10) than the 90nm one, even though all targeted instances have been derived from identical RTL code. However, the contribution of this work extends well beyond a mere technology comparison. With respect to the real-world impact of static power attacks, we present the first realistic scenarios that allow to perform a static power side-channel analysis (including noise reduction) without requiring control over the clock signal of the target. Furthermore, as a follow-up to some proof-of-concept work indicating the vulnerability of masking schemes to static powerattacks, we perform a detailed study on how the reduction of the noise level in static leakage measurements affects the security provided by masked implementations. As a result of this study, we do not only find out that the threat for masking schemes is indeed real, but also that common leakage assessment techniques, such as the Welch’s t-test, together with essentially any moment-based analysis of the leakage traces, is simply not sufficient in low-noise contexts. In fact, we are able to show that either a conversion (resp. compression) of the leakage order or the recently proposed X2 test need to be considered in assessment and attack to avoid false negatives.
2019
TCHES
TEDT, a Leakage-Resist AEAD Mode for High Physical Security Applications 📺
We propose TEDT, a new Authenticated Encryption with Associated Data (AEAD) mode leveraging Tweakable Block Ciphers (TBCs). TEDT provides the following features: (i) It offers full leakage-resistance, that is, it limits the exploitability of physical leakages via side-channel attacks, even if these leakages happen during every message encryption and decryption operation. Moreover, the leakage integrity bound is asymptotically optimal in the multi-user setting. (ii) It offers nonce misuse-resilience, that is, the repetition of nonces does not impact the security of ciphertexts produced with fresh nonces. (iii) It can be implemented with a remarkably low energy cost when strong resistance to side-channel attacks is needed, supports online encryption and handles static and incremental associated data efficiently. Concretely, TEDT encourages so-called leveled implementations, in which two TBCs are implemented: the first one needs strong and energy demanding protections against side-channel attacks but is used in a limited way, while the other only requires weak and energy-efficient protections and performs the bulk of the computation. As a result, TEDT leads to more energy-efficient implementations compared to traditional AEAD schemes, whose side-channel security requires to uniformly protect every (T)BC execution.
2019
TCHES
The Curse of Class Imbalance and Conflicting Metrics with Machine Learning for Side-channel Evaluations 📺
We concentrate on machine learning techniques used for profiled sidechannel analysis in the presence of imbalanced data. Such scenarios are realistic and often occurring, for instance in the Hamming weight or Hamming distance leakage models. In order to deal with the imbalanced data, we use various balancing techniques and we show that most of them help in mounting successful attacks when the data is highly imbalanced. Especially, the results with the SMOTE technique are encouraging, since we observe some scenarios where it reduces the number of necessary measurements more than 8 times. Next, we provide extensive results on comparison of machine learning and side-channel metrics, where we show that machine learning metrics (and especially accuracy as the most often used one) can be extremely deceptive. This finding opens a need to revisit the previous works and their results in order to properly assess the performance of machine learning in side-channel analysis.
2019
TCHES
The Interpose PUF: Secure PUF Design against State-of-the-art Machine Learning Attacks 📺
The design of a silicon Strong Physical Unclonable Function (PUF) that is lightweight and stable, and which possesses a rigorous security argument, has been a fundamental problem in PUF research since its very beginnings in 2002. Various effective PUF modeling attacks, for example at CCS 2010 and CHES 2015, have shown that currently, no existing silicon PUF design can meet these requirements. In this paper, we introduce the novel Interpose PUF (iPUF) design, and rigorously prove its security against all known machine learning (ML) attacks, including any currently known reliability-based strategies that exploit the stability of single CRPs (we are the first to provide a detailed analysis of when the reliability based CMA-ES attack is successful and when it is not applicable). Furthermore, we provide simulations and confirm these in experiments with FPGA implementations of the iPUF, demonstrating its practicality. Our new iPUF architecture so solves the currently open problem of constructing practical, silicon Strong PUFs that are secure against state-of-the-art ML attacks.
2019
TCHES
Towards Globally Optimized Masking: From Low Randomness to Low Noise Rate 📺
We improve the state-of-the-art masking schemes in two important directions. First, we propose a new masked multiplication algorithm that satisfies a recently introduced notion called Probe-Isolating Non-Interference (PINI). It captures a sufficient requirement for designing masked implementations in a trivial way, by combining PINI multiplications and linear operations performed share by share. Our improved algorithm has the best reported randomness complexity for large security orders (while the previous PINI multiplication was best for small orders). Second, we analyze the security of most existing multiplication algorithms in the literature against so-called horizontal attacks, which aim to reduce the noise of the actual leakages measured by an adversary, by combining the information of multiple target intermediate values. For this purpose, we leave the (abstract) probing model and consider a specialization of the (more realistic) noisy leakage / random probing models. Our (still partially heuristic but quantitative) analysis allows confirming the improved security of an algorithm by Battistello et al. from CHES 2016 in this setting. We then use it to propose new improved algorithms, leading to better tradeoffs between randomness complexity and noise rate, and suggesting the possibility to design efficient masked multiplication algorithms with constant noise rate in F2.