Papers from Transactions on Cryptographic Hardware and Embedded Systems 2021

CryptoDB

Papers from Transactions on Cryptographic Hardware and Embedded Systems 2021

Year

Venue

Title

2021

TCHES

A Compact Hardware Implementation of CCA-Secure Key Exchange Mechanism CRYSTALS-KYBER on FPGA 📺 Abstract

Yufei Xing Shuguo Li

Post-quantum cryptosystems should be prepared before the advent of powerful quantum computers to ensure information secure in our daily life. In 2016 a post-quantum standardization contest was launched by National Institute of Standards and Technology (NIST), and there have been lots of works concentrating on evaluation of these candidate protocols, mainly in pure software or through hardware-software co-design methodology on different platforms. As the contest progresses to third round in July 2020 with only 7 finalists and 8 alternate candidates remained, more dedicated and specific hardware designs should be considered to illustrate the intrinsic property of a certain protocol and achieve better performance. To this end, we present a standalone hardware design of CRYSTALS-KYBER, amodule learning-with-errors (MLWE) based key exchange mechanism (KEM) protocol within the 7 finalists on FPGA platform. Through elaborate scheduling of sampling and number theoretic transform (NTT) related calculations, decent performance is achieved with limited hardware resources. The way that Encode/Decode and the tweaked Fujisaki-Okamoto transform are implemented is demonstrated in detail. Analysis about minimizing memory footprint is also given out. In summary, we realize the adaptive chosen ciphertext attack (CCA) secure Kyber with all selectable module dimension k on the smallest Xilinx Artix-7 device. Our design computes key-generation, encapsulation (encryption) and decapsulation (decryption and reencryption) phase in 3768/5079/6668 cycles when k = 2, 6316/7925/10049 cycles when k = 3, and 9380/11321/13908 cycles when k = 4, consuming 7412/6785 LUTs, 4644/3981 FFs, 2126/1899 slices, 2/2 DSPs and 3/3 BRAMs in server/client with 6.2/6.0 ns critical path delay, outperforming corresponding high level synthesis (HLS) based designs or hardware-software co-designs to a large extent.

2021

TCHES

A Side-Channel Attack on a Masked IND-CCA Secure Saber KEM Implementation 📺 Abstract

Kalle Ngo Elena Dubrova Qian Guo Thomas Johansson

In this paper, we present a side-channel attack on a first-order masked implementation of IND-CCA secure Saber KEM. We show how to recover both the session key and the long-term secret key from 24 traces using a deep neural network created at the profiling stage. The proposed message recovery approach learns a higher-order model directly, without explicitly extracting random masks at each execution. This eliminates the need for a fully controllable profiling device which is required in previous attacks on masked implementations of LWE/LWR-based PKEs/KEMs. We also present a new secret key recovery approach based on maps from error-correcting codes that can compensate for some errors in the recovered message. In addition, we discovered a previously unknown leakage point in the primitive for masked logical shifting on arithmetic shares.

2021

TCHES

A White-Box Masking Scheme Resisting Computational and Algebraic Attacks 📺 Abstract

Okan Seker Thomas Eisenbarth Maciej Liskiewicz

White-box cryptography attempts to protect cryptographic secrets in pure software implementations. Due to their high utility, white-box cryptosystems (WBC) are deployed by the industry even though the security of these constructions is not well defined. A major breakthrough in generic cryptanalysis of WBC was Differential Computation Analysis (DCA), which requires minimal knowledge of the underlying white-box protection and also thwarts many obfuscation methods. To avert DCA, classic masking countermeasures originally intended to protect against highly related side-channel attacks have been proposed for use in WBC. However, due to the controlled environment of WBCs, new algebraic attacks against classic masking schemes have quickly been found. These algebraic DCA attacks break all classic masking countermeasures efficiently, as they are independent of the masking order.In this work, we propose a novel generic masking scheme that can resist both DCA and algebraic DCA attacks. The proposed scheme extends the seminal work by Ishai et al. which is probing secure and thus resists DCA, to also resist algebraic attacks. To prove the security of our scheme, we demonstrate the connection between two main security notions in white-box cryptography: probing security and prediction security. Resistance of our masking scheme to DCA is proven for an arbitrary order of protection, using the well-known strong non-interference notion by Barthe et al. Our masking scheme also resists algebraic attacks, which we show concretely for first and second-order algebraic protection. Moreover, we present an extensive performance analysis and quantify the overhead of our scheme, for a proof-of-concept protection of an AES implementation.

2021

TCHES

AES-LBBB: AES Mode for Lightweight and BBB-Secure Authenticated Encryption 📺 Abstract

Yusuke Naito Yu Sasaki Takeshi Sugawara

In this paper, a new lightweight authenticated encryption scheme AESLBBB is proposed, which was designed to provide backward compatibility with advanced encryption standard (AES) as well as high security and low memory. The primary design goal, backward compatibility, is motivated by the fact that AES accelerators are now very common for devices in the field; we are interested in designing an efficient and highly secure mode of operation that exploits the best of those AES accelerators. The backward compatibility receives little attention in the NIST lightweight cryptography standardization process, in which only 3 out of 32 round-2 candidates are based on AES. Our mode, LBBB, is inspired by the design of ALE in the sense that the internal state size is a minimum 2n bits when using a block cipher of length n bits for the key and data. Unfortunately, there is no security proof of ALE, and forgery attacks have been found on ALE. In LBBB, we introduce an additional feed from block cipher’s output to the key state via a certain permutation λ, which enables us to prove beyond-birthday-bound (BBB) security. We then specify its AES instance, AES-LBBB, and evaluate its performance for (i) software implementation on a microcontroller with an AES coprocessor and (ii) hardware implementation for an application-specific integrated circuit (ASIC) to show that AES-LBBB performs better than the current state-of-the-art Remus-N2 with AES-128.

2021

TCHES

An Instruction Set Extension to Support Software-Based Masking 📺 Abstract

Si Gao Johann Großschädl Ben Marshall Daniel Page Thinh Pham Francesco Regazzoni

In both hardware and software, masking can represent an effective means of hardening an implementation against side-channel attack vectors such as Differential Power Analysis (DPA). Focusing on software, however, the use of masking can present various challenges: specifically, it often 1) requires significant effort to translate any theoretical security properties into practice, and, even then, 2) imposes a significant overhead in terms of efficiency. To address both challenges, this paper explores the use of an Instruction Set Extension (ISE) to support masking in software-based implementations of a range of (symmetric) cryptographic kernels including AES: we design, implement, and evaluate such an ISE, using RISC-V as the base ISA. Our ISE-supported first-order masked implementation of AES, for example, is an order of magnitude more efficient than a software-only alternative with respect to both execution latency and memory footprint; this renders it comparable to an unmasked implementation using the same metrics, but also first-order secure.

2021

TCHES

Analysis and Comparison of Table-based Arithmetic to Boolean Masking 📺 Abstract

Michiel Van Beirendonck Jan-Pieter D’Anvers Ingrid Verbauwhede

Masking is a popular technique to protect cryptographic implementations against side-channel attacks and comes in several variants including Boolean and arithmetic masking. Some masked implementations require conversion between these two variants, which is increasingly the case for masking of post-quantum encryption and signature schemes. One way to perform Arithmetic to Boolean (A2B) mask conversion is a table-based approach first introduced by Coron and Tchulkine, and later corrected and adapted by Debraize in CHES 2012. In this work, we show both analytically and experimentally that the table-based A2B conversion algorithm proposed by Debraize does not achieve the claimed resistance against differential power analysis due to a non-uniform masking of an intermediate variable. This non-uniformity is hard to find analytically but leads to clear leakage in experimental validation. To address the non-uniform masking issue, we propose two new A2B conversions: one that maintains efficiency at the cost of additional memory and one that trades efficiency for a reduced memory footprint. We give analytical and experimental evidence for their security, and will make their implementations, which are shown to be free from side-channel leakage in 100.000 power traces collected on the ARM Cortex-M4, available online. We conclude that when designing side-channel protection mechanisms, it is of paramount importance to perform both a theoretical analysis and an experimental validation of the method.

2021

TCHES

Attacking and Defending Masked Polynomial Comparison for Lattice-Based Cryptography 📺 Abstract

Shivam Bhasin Jan-Pieter D’Anvers Daniel Heinz Thomas Pöppelmann Michiel Van Beirendonck

In this work, we are concerned with the hardening of post-quantum key encapsulation mechanisms (KEM) against side-channel attacks, with a focus on the comparison operation required for the Fujisaki-Okamoto (FO) transform. We identify critical vulnerabilities in two proposals for masked comparison and successfully attack the masked comparison algorithms from TCHES 2018 and TCHES 2020. To do so, we use first-order side-channel attacks and show that the advertised security properties do not hold. Additionally, we break the higher-order secured masked comparison from TCHES 2020 using a collision attack, which does not require side-channel information. To enable implementers to spot such flaws in the implementation or underlying algorithms, we propose a framework that is designed to test the re-encryption step of the FO transform for information leakage. Our framework relies on a specifically parametrized t-test and would have identified the previously mentioned flaws in the masked comparison. Our framework can be used to test both the comparison itself and the full decapsulation implementation.

2021

TCHES

Batching CSIDH Group Actions using AVX-512 📺 Abstract

Hao Cheng Georgios Fotiadis Johann Großschädl Peter Y. A. Ryan Peter B. Rønne

Commutative Supersingular Isogeny Diffie-Hellman (or CSIDH for short) is a recently-proposed post-quantum key establishment scheme that belongs to the family of isogeny-based cryptosystems. The CSIDH protocol is based on the action of an ideal class group on a set of supersingular elliptic curves and comes with some very attractive features, e.g. the ability to serve as a “drop-in” replacement for the standard elliptic curve Diffie-Hellman protocol. Unfortunately, the execution time of CSIDH is prohibitively high for many real-world applications, mainly due to the enormous computational cost of the underlying group action. Consequently, there is a strong demand for optimizations that increase the efficiency of the class group action evaluation, which is not only important for CSIDH, but also for related cryptosystems like the signature schemes CSI-FiSh and SeaSign. In this paper, we explore how the AVX-512 vector extensions (incl. AVX-512F and AVX-512IFMA) can be utilized to optimize constant-time evaluation of the CSIDH-512 class group action with the goal of, respectively, maximizing throughput and minimizing latency. We introduce different approaches for batching group actions and computing them in SIMD fashion on modern Intel processors. In particular, we present a hybrid batching technique that, when combined with optimized (8 × 1)-way prime-field arithmetic, increases the throughput by a factor of 3.64 compared to a state-of-the-art (non-vectorized) x64 implementation. On the other hand, vectorization in a 2-way fashion aimed to reduce latency makes our AVX-512 implementation of the group action evaluation about 1.54 times faster than the state-of-the-art. To the best of our knowledge, this paper is the first to demonstrate the high potential of using vector instructions to increase the throughput (resp. decrease the latency) of constant-time CSIDH.

2021

TCHES

Breaking CAS-Lock and Its Variants by Exploiting Structural Traces 📺 Abstract

Abhrajit Sengupta Nimisha Limaye Ozgur Sinanoglu

Logic locking is a prominent solution to protect against design intellectual property theft. However, there has been a decade-long cat-and-mouse game between defenses and attacks. A turning point in logic locking was the development of miterbased Boolean satisfiability (SAT) attack that steered the research in the direction of developing SAT-resilient schemes. These schemes, however achieved SAT resilience at the cost of low output corruption. Recently, cascaded locking (CAS-Lock) [SXTF20a] was proposed that provides non-trivial output corruption all-the-while maintaining resilience to the SAT attack. Regardless of the theoretical properties, we revisit some of the assumptions made about its implementation, especially about security-unaware synthesis tools, and subsequently expose a set of structural vulnerabilities that can be exploited to break these schemes. We propose our attacks on baseline CAS-Lock as well as mirrored CAS (M-CAS), an improved version of CAS-Lock. We furnish extensive simulation results of our attacks on ISCAS’85 and ITC’99 benchmarks, where we show that CAS-Lock/M-CAS can be broken with ∼94% success rate. Further, we open-source all implementation scripts, locked circuits, and attack scripts for the community. Finally, we discuss the pitfalls of point function-based locking techniques including Anti-SAT [XS18] and Stripped Functionality Logic Locking(SFLL-HD) [YSN+17], which suffer from similar implementation issues.

2021

TCHES

Breaking Masked Implementations with Many Shares on 32-bit Software Platforms: or When the Security Order Does Not Matter 📺 Abstract

Olivier Bronchain François-Xavier Standaert

We explore the concrete side-channel security provided by state-of-theart higher-order masked software implementations of the AES and the (candidate to the NIST Lightweight Cryptography competition) Clyde, in ARM Cortex-M0 and M3 devices. Rather than looking for possibly reduced security orders (as frequently considered in the literature), we directly target these implementations by assuming their maximum security order and aim at reducing their noise level thanks to multivariate, horizontal and analytical attacks. Our investigations point out that the Cortex-M0 device has so limited physical noise that masking is close to ineffective. The Cortex-M3 shows a better trend but still requires a large number of shares to provide strong security guarantees. Practically, we first exhibit a full 128-bit key recovery in less than 10 traces for a 6-share masked AES implementation running on the Cortex-M0 requiring 232 enumeration power. A similar attack performed against the Cortex-M3 with 5 shares require 1,000 measurements with 244 enumeration power. We then show the positive impact of lightweight block ciphers with limited number of AND gates for side-channel security, and compare our attacks against a masked Clyde with the best reported attacks of the CHES 2020 CTF. We complement these experiments with a careful information theoretic analysis, which allows interpreting our results. We also discuss our conclusions under the umbrella of “backwards security evaluations” recently put forwards by Azouaoui et al. We finally extrapolate the evolution of the proposed attack complexities in the presence of additional countermeasures using the local random probing model proposed at CHES 2020.

2021

TCHES

Chosen Ciphertext k-Trace Attacks on Masked CCA2 Secure Kyber 📺 Abstract

Mike Hamburg Julius Hermelink Robert Primas Simona Samardjiska Thomas Schamberger Silvan Streit Emanuele Strieder Christine van Vredendaal

Single-trace attacks are a considerable threat to implementations of classic public-key schemes, and their implications on newer lattice-based schemes are still not well understood. Two recent works have presented successful single-trace attacks targeting the Number Theoretic Transform (NTT), which is at the heart of many lattice-based schemes. However, these attacks either require a quite powerful side-channel adversary or are restricted to specific scenarios such as the encryption of ephemeral secrets. It is still an open question if such attacks can be performed by simpler adversaries while targeting more common public-key scenarios. In this paper, we answer this question positively. First, we present a method for crafting ring/module-LWE ciphertexts that result in sparse polynomials at the input of inverse NTT computations, independent of the used private key. We then demonstrate how this sparseness can be incorporated into a side-channel attack, thereby significantly improving noise resistance of the attack compared to previous works. The effectiveness of our attack is shown on the use-case of CCA2 secure Kyber k-module-LWE, where k ∈ {2, 3, 4}. Our k-trace attack on the long-term secret can handle noise up to a σ ≤ 1.2 in the noisy Hamming weight leakage model, also for masked implementations. A 2k-trace variant for Kyber1024 even allows noise σ ≤ 2.2 also in the masked case, with more traces allowing us to recover keys up to σ ≤ 2.7. Single-trace attack variants have a noise tolerance depending on the Kyber parameter set, ranging from σ ≤ 0.5 to σ ≤ 0.7. As a comparison, similar previous attacks in the masked setting were only successful with σ ≤ 0.5.

2021

TCHES

Classic McEliece on the ARM Cortex-M4 📺 Abstract

Ming-Shing Chen Tung Chou

This paper presents a constant-time implementation of Classic McEliece for ARM Cortex-M4. Specifically, our target platform is stm32f4-Discovery, a development board on which the amount of SRAM is not even large enough to hold the public key of the smallest parameter sets of Classic McEliece. Fortunately, the flash memory is large enough, so we use it to store the public key. For the level-1 parameter sets mceliece348864 and mceliece348864f, our implementation takes 582 199 cycles for encapsulation and 2 706 681 cycles for decapsulation. Compared to the level-1 parameter set of FrodoKEM, our encapsulation time is more than 80 times faster, and our decapsulation time is more than 17 times faster. For the level-3 parameter sets mceliece460896 and mceliece460896f, our implementation takes 1 081 335 cycles for encapsulation and 6 535 186 cycles for decapsulation. In addition, our implementation is also able to carry out key generation for the level-1 parameter sets and decapsulation for level-5 parameter sets on the board.

2021

TCHES

Combining Optimization Objectives: New Modeling Attacks on Strong PUFs 📺 Abstract

Johannes Tobisch Anita Aghaie Georg T. Becker

Strong Physical Unclonable Functions (PUFs), as a promising security primitive, are supposed to be a lightweight alternative to classical cryptography for purposes such as device authentication. Most of the proposed candidates, however, have been plagued by modeling attacks breaking their security claims. The Interpose PUF (iPUF), which has been introduced at CHES 2019, was explicitly designed with state-of-the-art modeling attacks in mind and is supposed to be impossible to break by classical and reliability attacks. In this paper, we analyze its vulnerability to reliability attacks. Despite the increased difficulty, these attacks are still feasible, against the original authors’ claim. We explain how adding constraints to the modeling objective streamlines reliability attacks and allows us to model all individual components of an iPUF successfully. In order to build a practical attack, we give several novel contributions. First, we demonstrate that reliability attacks can be performed not only with covariance matrix adaptation evolution strategy (CMA-ES) but also with gradient-based optimization. Second, we show that the switch to gradient-based reliability attacks makes it possible to combine reliability attacks, weight constraints, and Logistic Regression (LR) into a single optimization objective. This framework makes modeling attacks more efficient, as it exploits knowledge of responses and reliability information at the same time. Third, we show that a differentiable model of the iPUF exists and how it can be utilized in a combined reliability attack. We confirm that iPUFs are harder to break than regular XOR Arbiter PUFs. However, we are still able to break (1,10)-iPUF instances, which were originally assumed to be secure, with less than 107 PUF response queries.

2021

TCHES

Countermeasures against Static Power Attacks: – Comparing Exhaustive Logic Balancing and Other Protection Schemes in 28 nm CMOS – 📺 Abstract

Thorben Moos Amir Moradi

In recent years it has been demonstrated convincingly that the standby power of a CMOS chip reveals information about the internally stored and processed data. Thus, for adversaries who seek to extract secrets from cryptographic devices via side-channel analysis, the static power has become an attractive quantity to obtain. Most works have focused on the destructive side of this subject by demonstrating attacks. In this work, we examine potential solutions to protect circuits from silently leaking sensitive information during idle times. We focus on countermeasures that can be implemented using any common digital standard cell library and do not consider solutions that require full-custom or analog design flow. In particular, we evaluate and compare a set of five distinct standard-cell-based hiding countermeasures, including both, randomization and equalization techniques. We then combine the hiding countermeasures with state-of-the-art hardware masking in order to amplify the noise level and achieve a high resistance against attacks. An important part of our contribution is the proposal and evaluation of the first ever standard-cell-based balancing scheme which achieves perfect data-independence on paper, i.e., in absence of intra-die process variations and aging effects. We call our new countermeasure Exhaustive Logic Balancing (ELB). While this scheme, applied to a threshold implementation, provides the highest level of resistance in our experiments, it may not be the most cost effective option due to the significant resource overhead associated. All evaluated countermeasures and combinations thereof are applied to a serialized hardware implementation of the PRESENT block cipher and realized as cryptographic co-processors on a 28nm CMOS ASIC prototype. Our experimental results are obtained through real-silicon measurements of a fabricated die of the ASIC in a temperature-controlled environment using a source measure unit (SMU). We believe that our elaborate comparison serves as a useful guideline for hardware designers to find a proper tradeoff between security and cost for almost any application.

2021

TCHES

Cross-Device Profiled Side-Channel Attack with Unsupervised Domain Adaptation 📺 Abstract

Pei Cao Chi Zhang Xiangjun Lu Dawu Gu

Deep learning (DL)-based techniques have recently proven to be very successful when applied to profiled side-channel attacks (SCA). In a real-world profiled SCA scenario, attackers gain knowledge about the target device by getting access to a similar device prior to the attack. However, most state-of-the-art literature performs only proof-of-concept attacks, where the traces intended for profiling and attacking are acquired consecutively on the same fully-controlled device. This paper reminds that even a small discrepancy between the profiling and attack traces (regarded as domain discrepancy) can cause a successful single-device attack to completely fail. To address the issue of domain discrepancy, we propose a Cross-Device Profiled Attack (CDPA), which introduces an additional fine-tuning phase after establishing a pretrained model. The fine-tuning phase is designed to adjust the pre-trained network, such that it can learn a hidden representation that is not only discriminative but also domain-invariant. In order to obtain domain-invariance, we adopt a maximum mean discrepancy (MMD) loss as a constraint term of the classic cross-entropy loss function. We show that the MMD loss can be easily calculated and embedded in a standard convolutional neural network. We evaluate our strategy on both publicly available datasets and multiple devices (eight Atmel XMEGA 8-bit microcontrollers and three SAKURA-G evaluation boards). The results demonstrate that CDPA can improve the performance of the classic DL-based SCA by orders of magnitude, which significantly eliminates the impact of domain discrepancy caused by different devices.

2021

TCHES

CTIDH: faster constant-time CSIDH 📺 Abstract

Gustavo Banegas Daniel J. Bernstein Fabio Campos Tung Chou Tanja Lange Michael Meyer Benjamin Smith Jana Sotáková

This paper introduces a new key space for CSIDH and a new algorithm for constant-time evaluation of the CSIDH group action. The key space is not useful with previous algorithms, and the algorithm is not useful with previous key spaces, but combining the new key space with the new algorithm produces speed records for constant-time CSIDH. For example, for CSIDH-512 with a 256-bit key space, the best previous constant-time results used 789000 multiplications and more than 200 million Skylake cycles; this paper uses 438006 multiplications and 125.53 million cycles.

2021

TCHES

Cutting Through the Complexity of Reverse Engineering Embedded Devices 📺 Abstract

Sam L. Thomas Jan Van den Herrewegen Georgios Vasilakis Zitai Chen Mihai Ordean Flavio D. Garcia

Performing security analysis of embedded devices is a challenging task. They present many difficulties not usually found when analyzing commodity systems: undocumented peripherals, esoteric instruction sets, and limited tool support. Thus, a significant amount of reverse engineering is almost always required to analyze such devices. In this paper, we present Incision, an architecture and operating-system agnostic reverse engineering framework. Incision tackles the problem of reducing the upfront effort to analyze complex end-user devices. It combines static and dynamic analyses in a feedback loop, enabling information from each to be used in tandem to improve our overall understanding of the firmware analyzed. We use Incision to analyze a variety of devices and firmware. Our evaluation spans firmware based on three RTOSes, an automotive ECU, and a 4G/LTE baseband. We demonstrate that Incision does not introduce significant complexity to the standard reverse engineering process and requires little manual effort to use. Moreover, its analyses produce correct results with high confidence and are robust across different OSes and ISAs.

2021

TCHES

Denial-of-Service on FPGA-based Cloud Infrastructures — Attack and Defense 📺 Abstract

Tuan La Khoa Pham Joseph Powell Dirk Koch

This paper presents attacks targeting the FPGAs of AWS F1 instances at the electrical level through power-hammering, where excessive dynamic power is used to crash FPGA instances. We demonstrate different power-hammering attacks that pass all AWS security fences implemented on F1 instances, including the FPGA vendor design rule checks. In addition, we fingerprint the FPGA instances to observe the responsiveness of the instances, which indicates a successful denial-of-service attack. Most importantly, we provide an FPGA virus scanner framework, which was improved to support large datacenter FPGAs for preventing such attacks, including virtually all currently demonstrated side-channel attacks. Our experiments showed that an AWS F1 instance crashes immediately by starting an FPGA design demanding 369W. By using FPGA-fingerprinting, we found that crashed instances are unavailable for about one to over 200 hours.

2021

TCHES

DL-LA: Deep Learning Leakage Assessment: A modern roadmap for SCA evaluations 📺 Abstract

Thorben Moos Felix Wegener Amir Moradi

In recent years, deep learning has become an attractive ingredient to side-channel analysis (SCA) due to its potential to improve the success probability or enhance the performance of certain frequently executed tasks. One task that is commonly assisted by machine learning techniques is the profiling of a device’s leakage behavior in order to carry out a template attack. At CHES 2019, deep learning has also been applied to non-profiled scenarios for the first time, extending its reach within SCA beyond template attacks. The proposed method, called DDLA, has some tempting advantages over traditional SCA due to merits inherited from (convolutional) neural networks. Most notably, it greatly reduces the need for pre-processing steps< when the SCA traces are misaligned or when the leakage is of a multivariate nature. However, similar to traditional attack scenarios the success of this approach highly depends on the correct choice of a leakage model and the intermediate value to target. In this work we explore, for the first time in literature, whether deep learning can similarly be used as an instrument to advance another crucial (non-profiled) discipline of SCA which is inherently independent of leakage models and targeted intermediates, namely leakage assessment. In fact, given the simple classification-based nature of common leakage assessment techniques, in particular distinguishing two groups fixed-vs-random or fixed-vs-fixed, it comes as a surprise that machine learning has not been brought into this context, yet. Our contribution is the development of the first full leakage assessment methodology based on deep learning. It gives the evaluator the freedom to not worry about location, alignment and statistical order of the leakages and easily covers multivariate and horizontal patterns as well. We test our approach against a number of case studies based on FPGA, ASIC and μC implementations of the PRESENT block cipher, equipped with state-of-the-art SCA countermeasures. Our results clearly show that the proposed methodology and network structures are robust across all case studies and outperform the classical detection approaches (t-test and X2-test) in all considered scenarios.

2021

TCHES

Efficiency through Diversity in Ensemble Models applied to Side-Channel Attacks: – A Case Study on Public-Key Algorithms – 📺 Abstract

Gabriel Zaid Lilian Bossuet Amaury Habrard Alexandre Venelli

Deep Learning based Side-Channel Attacks (DL-SCA) are considered as fundamental threats against secure cryptographic implementations. Side-channel attacks aim to recover a secret key using the least number of leakage traces. In DL-SCA, this often translates in having a model with the highest possible accuracy. Increasing an attack’s accuracy is particularly important when an attacker targets public-key cryptographic implementations where the recovery of each secret key bits is directly related to the model’s accuracy. Commonly used in the deep learning field, ensemble models are a well suited method that combine the predictions of multiple models to increase the ensemble accuracy by reducing the correlation between their errors. Linked to this correlation, the diversity is considered as an indicator of the ensemble model performance. In this paper, we propose a new loss, namely Ensembling Loss (EL), that generates an ensemble model which increases the diversity between the members. Based on the mutual information between the ensemble model and its related label, we theoretically demonstrate how the ensemble members interact during the training process. We also study how an attack’s accuracy gain translates to a drastic reduction of the remaining time complexity of a side-channel attacks through multiple scenarios on public-key implementations. Finally, we experimentally evaluate the benefits of our new learning metric on RSA and ECC secure implementations. The Ensembling Loss increases by up to 6.8% the performance of the ensemble model while the remaining brute-force is reduced by up to 222 operations depending on the attack scenario.

2021

TCHES

Fault Attacks on CCA-secure Lattice KEMs 📺 Abstract

Peter Pessl Lukas Prokop

NIST’s post-quantum standardization effort very recently entered its final round. This makes studying the implementation-security aspect of the remaining candidates an increasingly important task, as such analyses can aid in the final selection process and enable appropriately secure wider deployment after standardization. However, lattice-based key-encapsulation mechanisms (KEMs), which are prominently represented among the finalists, have thus far received little attention when it comes to fault attacks.Interestingly, many of these KEMs exhibit structural similarities. They can be seen as variants of the encryption scheme of Lyubashevsky, Peikert, and Rosen, and employ the Fujisaki-Okamoto transform (FO) to achieve CCA2 security. The latter involves re-encrypting a decrypted plaintext and testing the ciphertexts for equivalence. This corresponds to the classic countermeasure of computing the inverse operation and hence prevents many fault attacks.In this work, we show that despite this inherent protection, practical fault attacks are still possible. We present an attack that requires a single instruction-skipping fault in the decoding process, which is run as part of the decapsulation. After observing if this fault actually changed the outcome (effective fault) or if the correct result is still returned (ineffective fault), we can set up a linear inequality involving the key coefficients. After gathering enough of these inequalities by faulting many decapsulations, we can solve for the key using a bespoke statistical solving approach. As our attack only requires distinguishing effective from ineffective faults, various detection-based countermeasures, including many forms of double execution, can be bypassed.We apply this attack to Kyber and NewHope, both of which belong to the aforementioned class of schemes. Using fault simulations, we show that, e.g., 6,500 faulty decapsulations are required for full key recovery on Kyber512. To demonstrate practicality, we use clock glitches to attack Kyber running on a Cortex M4. As we argue that other schemes of this class, such as Saber, might also be susceptible, the presented attack clearly shows that one cannot rely on the FO transform’s fault deterrence and that proper countermeasures are still needed.

2021

TCHES

FIVER – Robust Verification of Countermeasures against Fault Injections 📺 Abstract

Jan Richter-Brockmann Aein Rezaei Shahmirzadi Pascal Sasdrich Amir Moradi Tim Güneysu

Fault Injection Analysis is seen as a powerful attack against implementations of cryptographic algorithms. Over the last two decades, researchers proposed a plethora of countermeasures to secure such implementations. However, the design process and implementation are still error-prone, complex, and manual tasks which require long-standing experience in hardware design and physical security. Moreover, the validation of the claimed security is often only done by empirical testing in a very late stage of the design process. To prevent such empirical testing strategies, approaches based on formal verification are applied instead providing the designer early feedback.In this work, we present a fault verification framework to validate the security of countermeasures against fault-injection attacks designed for ICs. The verification framework works on netlist-level, parses the given digital circuit into a model based on Binary Decision Diagrams, and performs symbolic fault injections. This verification approach constitutes a novel strategy to evaluate protected hardware designs against fault injections offering new opportunities as performing full analyses under a given fault models.Eventually, we apply the proposed verification framework to real-world implementations of well-established countermeasures against fault-injection attacks. Here, we consider protected designs of the lightweight ciphers CRAFT and LED-64 as well as AES. Due to several optimization strategies, our tool is able to perform more than 90 million fault injections in a single-round CRAFT design and evaluate the security in under 50 min while the symbolic simulation approach considers all 2128 primary inputs.

2021

TCHES

Higher-Order Lookup Table Masking in Essentially Constant Memory 📺 Abstract

Annapurna Valiveti Srinivas Vivek

Masking using randomised lookup tables is a popular countermeasure for side-channel attacks, particularly at small masking orders. An advantage of this class of countermeasures for masking S-boxes compared to ISW-based masking is that it supports pre-processing and thus significantly reducing the amount of computation to be done after the unmasked inputs are available. Indeed, the “online” computation can be as fast as just a table lookup. But the size of the randomised lookup table increases linearly with the masking order, and hence the RAM memory required to store pre-processed tables becomes infeasible for higher masking orders. Hence demonstrating the feasibility of full pre-processing of higher-order lookup table-based masking schemes on resource-constrained devices has remained an open problem. In this work, we solve the above problem by implementing a higher-order lookup table-based scheme using an amount of RAM memory that is essentially independent of the masking order. More concretely, we reduce the amount of RAM memory needed for the table-based scheme of Coron et al. (TCHES 2018) approximately by a factor equal to the number of shares. Our technique is based upon the use of pseudorandom number generator (PRG) to minimise the randomness complexity of ISW-based masking schemes proposed by Ishai et al. (ICALP 2013) and Coron et al. (Eurocrypt 2020). Hence we show that for lookup table-based masking schemes, the use of a PRG not only reduces the randomness complexity (now logarithmic in the size of the S-box) but also the memory complexity, and without any significant increase in the overall running time. We have implemented in software the higher-order table-based masking scheme of Coron et al. (TCHES 2018) at tenth order with full pre-processing of a single execution of all the AES S-boxes on a ARM Cortex-M4 device that has 256 KB RAM memory. Our technique requires only 41.2 KB of RAM memory, whereas the original scheme would have needed 440 KB. Moreover, our 8-bit implementation results demonstrate that the online execution time of our variant is about 1.5 times faster compared to the 8-bit bitsliced masked implementation of AES-128.

2021

TCHES

Improved Leakage-Resistant Authenticated Encryption based on Hardware AES Coprocessors 📺 Abstract

Olivier Bronchain Charles Momin Thomas Peters François-Xavier Standaert

We revisit Unterstein et al.’s leakage-resilient authenticated encryption scheme from CHES 2020. Its main goal is to enable secure software updates by leveraging unprotected (e.g., AES, SHA256) coprocessors available on low-end microcontrollers. We show that the design of this scheme ignores an important attack vector that can significantly reduce its security claims, and that the evaluation of its leakage-resilient PRF is quite sensitive to minor variations of its measurements, which can easily lead to security overstatements. We then describe and analyze a new mode of operation for which we propose more conservative security parameters and show that it competes with the CHES 2020 one in terms of performances. As an additional bonus, our solution relies only on AES-128 coprocessors, an

2021

TCHES

Inconsistency of Simulation and Practice in Delay-based Strong PUFs 📺 Abstract

Anita Aghaie Amir Moradi

The developments in the areas of strong Physical Unclonable Functions (PUFs) predicate an ongoing struggle between designers and attackers. Such a combat motivated the atmosphere of open research, hence enhancing PUF designs in the presence of Machine Learning (ML) attacks. As an example of this controversy, at CHES 2019, a novel delay-based PUF (iPUF) has been introduced and claimed to be resistant against various ML and reliability attacks. At CHES 2020, a new divide-and-conquer modeling attack (splitting iPUF) has been presented showing the vulnerability of even large iPUF variants.Such attacks and analyses are naturally examined purely in the simulation domain, where some metrics like uniformity are assumed to be ideal. This assumption is motivated by a common belief that implementation defects (such as bias) may ease the attacks. In this paper, we highlight the critical role of uniformity in the success of ML attacks, and for the first time present a case where the bias originating from implementation defects hardens certain learning problems in complex PUF architectures. We present the result of our investigations conducted on a cluster of 100 Xilinx Artix 7 FPGAs, showing the incapability of the splitting iPUF attack to model even small iPUF instances when facing a slight non-uniformity. In fact, our findings imply that non-ideal conditions due to implementation defects should also be considered when developing an attack vector on complex PUF architectures like iPUF. On the other hand, we observe a relatively low uniqueness even when following the suggestions made by the iPUF’s original authors with respect to the FPGA implementations, which indeed questions the promised physical unclonability.

2021

TCHES

Information Leakages in Code-based Masking: A Unified Quantification Approach 📺 Abstract

Wei Cheng Sylvain Guilley Claude Carlet Jean-Luc Danger Sihem Mesnager

This paper presents a unified approach to quantifying the information leakages in the most general code-based masking schemes. Specifically, by utilizing a uniform representation, we highlight first that all code-based masking schemes’ side-channel resistance can be quantified by an all-in-one framework consisting of two easy-tocompute parameters (the dual distance and the number of conditioned codewords) from a coding-theoretic perspective. In particular, we use signal-to-noise ratio (SNR) and mutual information (MI) as two complementary metrics, where a closed-form expression of SNR and an approximation of MI are proposed by connecting both metrics to the two coding-theoretic parameters. Secondly, considering the connection between Reed-Solomon code and SSS (Shamir’s Secret Sharing) scheme, the SSS-based masking is viewed as a particular case of generalized code-based masking. Hence as a straightforward application, we evaluate the impact of public points on the side-channel security of SSS-based masking schemes, namely the polynomial masking, and enhance the SSS-based masking by choosing optimal public points for it. Interestingly, we show that given a specific security order, more shares in SSS-based masking leak more information on secrets in an information-theoretic sense. Finally, our approach provides a systematic method for optimizing the side-channel resistance of every code-based masking. More precisely, this approach enables us to select optimal linear codes (parameters) for the generalized code-based masking by choosing appropriate codes according to the two coding-theoretic parameters. Summing up, we provide a best-practice guideline for the application of code-based masking to protect cryptographic implementations.

2021

TCHES

Learning Parity with Physical Noise: Imperfections, Reductions and FPGA Prototype 📺 Abstract

Davide Bellizia Clément Hoffmann Dina Kamel Hanlin Liu Pierrick Méaux François-Xavier Standaert Yu Yu

Hard learning problems are important building blocks for the design of various cryptographic functionalities such as authentication protocols and post-quantum public key encryption. The standard implementations of such schemes add some controlled errors to simple (e.g., inner product) computations involving a public challenge and a secret key. Hard physical learning problems formalize the potential gains that could be obtained by leveraging inexact computing to directly generate erroneous samples. While they have good potential for improving the performances and physical security of more conventional samplers when implemented in specialized integrated circuits, it remains unknown whether physical defaults that inevitably occur in their instantiation can lead to security losses, nor whether their implementation can be viable on standard platforms such as FPGAs. We contribute to these questions in the context of the Learning Parity with Physical Noise (LPPN) problem by: (1) exhibiting new (output) data dependencies of the error probabilities that LPPN samples may suffer from; (2) formally showing that LPPN instances with such dependencies are as hard as the standard LPN problem; (3) analyzing an FPGA prototype of LPPN processor that satisfies basic security and performance requirements.

2021

TCHES

Let’s Take it Offline: Boosting Brute-Force Attacks on iPhone’s User Authentication through SCA 📺 Abstract

Oleksiy Lisovets David Knichel Thorben Moos Amir Moradi

In recent years, smartphones have become an increasingly important storage facility for personal sensitive data ranging from photos and credentials up to financial and medical records like credit cards and person’s diseases. Trivially, it is critical to secure this information and only provide access to the genuine and authenticated user. Smartphone vendors have already taken exceptional care to protect user data by the means of various software and hardware security features like code signing, authenticated boot chain, dedicated co-processor and integrated cryptographic engines with hardware fused keys. Despite these obstacles, adversaries have successfully broken through various software protections in the past, leaving only the hardware as the last standing barrier between the attacker and user data. In this work, we build upon existing software vulnerabilities and break through the final barrier by performing the first publicly reported physical Side-Channel Analysis (SCA) attack on an iPhone in order to extract the hardware-fused devicespecific User Identifier (UID) key. This key – once at hand – allows the adversary to perform an offline brute-force attack on the user passcode employing an optimized and scalable implementation of the Key Derivation Function (KDF) on a Graphics Processing Unit (GPU) cluster. Once the passcode is revealed, the adversary has full access to all user data stored on the device and possibly in the cloud.As the software exploit enables acquisition and processing of hundreds of millions oftraces, this work further shows that an attacker being able to query arbitrary many chosen-data encryption/decryption requests is a realistic model, even for compact systems with advanced software protections, and emphasizes the need for assessing resilience against SCA for a very high number of traces.

2021

TCHES

LifeLine for FPGA Protection: Obfuscated Cryptography for Real-World Security 📺 Abstract

Florian Stolz Nils Albartus Julian Speith Simon Klix Clemens Nasenberg Aiden Gula Marc Fyrbiak Christof Paar Tim Güneysu Russell Tessier

Over the last decade attacks have repetitively demonstrated that bitstream protection for SRAM-based FPGAs is a persistent problem without a satisfying solution in practice. Hence, real-world hardware designs are prone to intellectual property infringement and malicious manipulation as they are not adequately protected against reverse-engineering.In this work, we first review state-of-the-art solutions from industry and academia and demonstrate their ineffectiveness with respect to reverse-engineering and design manipulation. We then describe the design and implementation of novel hardware obfuscation primitives based on the intrinsic structure of FPGAs. Based on our primitives, we design and implement LifeLine, a hardware design protection mechanism for FPGAs using hardware/software co-obfuscated cryptography. We show that LifeLine offers effective protection for a real-world adversary model, requires minimal integration effort for hardware designers, and retrofits to already deployed (and so far vulnerable) systems.

2021

TCHES

Low-Latency Keccak at any Arbitrary Order 📺 Abstract

Sara Zarei Aein Rezaei Shahmirzadi Hadi Soleimany Raziyeh Salarifard Amir Moradi

Correct application of masking on hardware implementation of cryptographic primitives necessitates the instantiation of registers in order to achieve the non-completeness (commonly said to stop the propagation of glitches). This sometimes leads to a high latency overhead, making the implementation not necessarily suitable for the underlying application. As a concrete example, this holds for Keccak. Application of d + 1 Domain Oriented Masking (DOM) on a round-based implementation of Keccak leads to the introduction of two register stages per round, i.e., two times higher latency. On the other hand, Rhythmic-Keccak, introduced in CHES 2018, unrolls two rounds to half the latency compared to an unprotected ordinary round-based implementation. To that end, td + 1 masking is used which requires a notable area, and – apart from the difficulty to construct – its extension to higher orders seems beyond the bounds of feasibility.In this paper, we focus on d + 1 masking and introduce a methodology which enables us to stay with the latency of an unprotected round-based implementation, i.e., one register stage per round. While being secure under glitch-extended probing model, we provide a general design where the desired security order can be easily adjusted without any effect on the above-given latency. Compared to the Rhythmic-Keccak, the synthesis results show that our first-order design is able to accomplish the entire operations of Keccak-f[200] in the same period of time while decreasing the area by 74.5%. Notably, our implementations achieve around 30% less delay compared to the corresponding original DOM-Keccak designs.

2021

TCHES

Machine Learning of Physical Unclonable Functions using Helper Data: Revealing a Pitfall in the Fuzzy Commitment Scheme 📺 Abstract

Emanuele Strieder Christoph Frisch Michael Pehl

Physical Unclonable Functions (PUFs) are used in various key-generation schemes and protocols. Such schemes are deemed to be secure even for PUFs with challenge-response behavior, as long as no responses and no reliability information about the PUF are exposed. This work, however, reveals a pitfall in these constructions: When using state-of-the-art helper data algorithms to correct noisy PUF responses, an attacker can exploit the publicly accessible helper data and challenges. We show that with this public information and the knowledge of the underlying error correcting code, an attacker can break the security of the system: The redundancy in the error correcting code reveals machine learnable features and labels. Learning these features and labels results in a predictive model for the dependencies between different challenge-response pairs (CRPs) without direct access to the actual PUF response. We provide results based on simulated data of a k-SUM PUF model and an Arbiter PUF model. We also demonstrate the attack for a k-SUM PUF model generated from real data and discuss the impact on more recent PUF constructions such as the Multiplexer PUF and the Interpose PUF. The analysis reveals that especially the frequently used repetition code is vulnerable: For a SUM-PUF in combination with a repetition code, e.g., already the observation of 800 challenges and helper data bits suffices to reduce the entropy of the key down to one bit. The analysis also shows that even other linear block codes like the BCH, the Reed-Muller, or the Single Parity Check code are affected by the problem. The code-dependent insights we gain from the analysis allow us to suggest mitigation strategies for the identified attack. While the shown vulnerability advances Machine Learning (ML) towards realistic attacks on key-storage systems with PUFs, our analysis also facilitates a better understanding and evaluation of existing approaches and protocols with PUFs. Therefore, it brings the community one step closer to a more complete leakage assessment of PUFs.

2021

TCHES

Masking in Fine-Grained Leakage Models: Construction, Implementation and Verification 📺 Abstract

Gilles Barthe Marc Gourjon Benjamin Grégoire Maximilian Orlt Clara Paglialonga Lars Porth

We propose a new approach for building efficient, provably secure, and practically hardened implementations of masked algorithms. Our approach is based on a Domain Specific Language in which users can write efficient assembly implementations and fine-grained leakage models. The latter are then used as a basis for formal verification, allowing for the first time formal guarantees for a broad range of device-specific leakage effects not addressed by prior work. The practical benefits of our approach are demonstrated through a case study of the PRESENT S-Box: we develop a highly optimized and provably secure masked implementation, and show through practical evaluation based on TVLA that our implementation is practically resilient. Our approach significantly narrows the gap between formal verification of masking and practical security.

2021

TCHES

Masking Kyber: First- and Higher-Order Implementations 📺 Abstract

Joppe W. Bos Marc Gourjon Joost Renes Tobias Schneider Christine van Vredendaal

In the final phase of the post-quantum cryptography standardization effort, the focus has been extended to include the side-channel resistance of the candidates. While some schemes have been already extensively analyzed in this regard, there is no such study yet of the finalist Kyber.In this work, we demonstrate the first completely masked implementation of Kyber which is protected against first- and higher-order attacks. To the best of our knowledge, this results in the first higher-order masked implementation of any post-quantum secure key encapsulation mechanism algorithm. This is realized by introducing two new techniques. First, we propose a higher-order algorithm for the one-bit compression operation. This is based on a masked bit-sliced binary-search that can be applied to prime moduli. Second, we propose a technique which enables one to compare uncompressed masked polynomials with compressed public polynomials. This avoids the costly masking of the ciphertext compression while being able to be instantiated at arbitrary orders.We show performance results for first-, second- and third-order protected implementations on the Arm Cortex-M0+ and Cortex-M4F. Notably, our implementation of first-order masked Kyber decapsulation requires 3.1 million cycles on the Cortex-M4F. This is a factor 3.5 overhead compared to the unprotected optimized implementationin pqm4. We experimentally show that the first-order implementation of our new modules on the Cortex-M0+ is hardened against attacks using 100 000 traces and mechanically verify the security in a fine-grained leakage model using the verification tool scVerif.

2021

TCHES

My other car is your car: compromising the Tesla Model X keyless entry system 📺 Abstract

★ CHES 2021 Best Paper Award

Lennert Wouters Benedikt Gierlichs Bart Preneel

This paper documents a practical security evaluation of the Tesla Model X keyless entry system. In contrast to other works, the keyless entry system analysed in this paper employs secure symmetric-key and public-key cryptographic primitives implemented by a Common Criteria certified Secure Element. We document the internal workings of this system, covering the key fob, the body control module and the pairing protocol. Additionally, we detail our reverse engineering techniques and document several security issues. The identified issues in the key fob firmware update mechanism and the key fob pairing protocol allow us to bypass all of the cryptographic security measures put in place. To demonstrate the practical impact of our research we develop a fully remote Proof-of-Concept attack that allows to gain access to the vehicle’s interior in a matter of minutes and pair a modified key fob, allowing to drive off. Our attack is not a relay attack, as our new key fob allows us to start the car anytime anywhere. Finally, we provide an analysis of the update performed by Tesla to mitigate our findings. Our work highlights how the increased complexity and connectivity of vehicular systems can result in a larger and easier to exploit attack surface.

2021

TCHES

New First-Order Secure AES Performance Records 📺 Abstract

Aein Rezaei Shahmirzadi Dušan Božilov Amir Moradi

Being based on a sound theoretical basis, masking schemes are commonly applied to protect cryptographic implementations against Side-Channel Analysis (SCA) attacks. Constructing SCA-protected AES, as the most widely deployed block cipher, has been naturally the focus of several research projects, with a direct application in industry. The majority of SCA-secure AES implementations introduced to the community opted for low area and latency overheads considering Application-Specific Integrated Circuit (ASIC) platforms. Albeit a few, those which particularly targeted Field Programmable Gate Arrays (FPGAs) as the implementation platform yield either a low throughput or a not-highly secure design.In this work, we fill this gap by introducing first-order glitch-extended probing secure masked AES implementations highly optimized for FPGAs, which support both encryption and decryption. Compared to the state of the art, our designs efficiently map the critical non-linear parts of the masked S-box into the built-in Block RAMs (BRAMs).The most performant variant of our constructions accomplishes five first-order secure AES encryptions/decryptions simultaneously in 50 clock cycles. Compared to the equivalent state-of-the-art designs, this leads to at least 70% reduction in utilization of FPGA resources (slices) at the cost of occupying BRAMs. Last but not least, we provide a wide range of such secure and efficient implementations supporting a large set of applications, ranging from low-area to high-throughput.

2021

TCHES

Novel Key Recovery Attack on Secure ECDSA Implementation by Exploiting Collisions between Unknown Entries 📺 Abstract

Sunghyun Jin Sangyub Lee Sung Min Cho HeeSeok Kim Seokhie Hong

In this paper, we propose a novel key recovery attack against secure ECDSA signature generation employing regular table-based scalar multiplication. Our attack exploits novel leakage, denoted by collision information, which can be constructed by iteratively determining whether two entries loaded from the table are the same or not through side-channel collision analysis. Without knowing the actual value of the table entries, an adversary can recover the private key of ECDSA by finding the condition for which several nonces are linearly dependent by exploiting only the collision information. We show that this condition can be satisfied practically with a reasonable number of digital signatures and corresponding traces. Furthermore, we also show that all entries in the pre-computation table can be recovered using the recovered private key and a sufficient number of digital signatures based on the collision information. As case studies, we find that fixed-base comb and T_SM scalar multiplication are vulnerable to our attack. Finally, we verify that our attack is a real threat by conducting an experiment with power consumption traces acquired during T_SM scalar multiplication operations on an ARM Cortex-M based microcontroller. We also provide the details for validation process.

2021

TCHES

NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2 📺 Abstract

Chi-Ming Marvin Chung Vincent Hwang Matthias J. Kannwischer Gregor Seiler Cheng-Jhih Shih Bo-Yin Yang

In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the Number-theoretic transform (NTT). We obtain superior performance compared to the previous state of the art implementations using Toom–Cook multiplication on both NIST’s primary software optimization targets AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: On the Cortex-M4, we use 32-bit NTT-based polynomial multiplication, while on Intel we use two 16-bit NTT-based polynomial multiplications and combine the products using the Chinese Remainder Theorem (CRT).For Saber, the performance gain is particularly pronounced. On Cortex-M4, the Saber NTT-based matrix-vector multiplication is 61% faster than the Toom–Cook multiplication resulting in 22% fewer cycles for Saber encapsulation. For NTRU, the speed-up is less impressive, but still NTT-based multiplication performs better than Toom–Cook for all parameter sets on Cortex-M4. The NTT-based polynomial multiplication for NTRU-HRSS is 10% faster than Toom–Cook which results in a 6% cost reduction for encapsulation. On AVX2, we obtain speed-ups for three out of four NTRU parameter sets.As a further illustration, we also include code for AVX2 and Cortex-M4 for the Chinese Association for Cryptologic Research competition award winner LAC (also a NIST round 2 candidate) which outperforms existing code.

2021

TCHES

Online Template Attacks: Revisited 📺 Abstract

Alejandro Cabrera Aldaya Billy Bob Brumley

An online template attack (OTA) is a powerful technique previously used to attack elliptic curve scalar multiplication algorithms. This attack has only been analyzed in the realm of power consumption and EM side channels, where the signals leak related to the value being processed. However, microarchitecture signals have no such feature, invalidating some assumptions from previous OTA works.In this paper, we revisit previous OTA descriptions, proposing a generic framework and evaluation metrics for any side-channel signal. Our analysis reveals OTA features not previously considered, increasing its application scenarios and requiring a fresh countermeasure analysis to prevent it.In this regard, we demonstrate that OTAs can work in the backward direction, allowing to mount an augmented projective coordinates attack with respect to the proposal by Naccache, Smart and Stern (Eurocrypt 2004). This demonstrates that randomizing the initial targeted algorithm state does not prevent the attack as believed in previous works.We analyze three libraries libgcrypt, mbedTLS, and wolfSSL using two microarchitecture side channels. For the libgcrypt case, we target its EdDSA implementation using Curve25519 twist curve. We obtain similar results for mbedTLS and wolfSSL with curve secp256r1. For each library, we execute extensive attack instances that are able to recover the complete scalar in all cases using a single trace.This work demonstrates that microarchitecture online template attacks are also very powerful in this scenario, recovering secret information without knowing a leakage model. This highlights the importance of developing secure-by-default implementations, instead of fix-on-demand ones.

2021

TCHES

Optimizing BIKE for the Intel Haswell and ARM Cortex-M4 📺 Abstract

Ming-Shing Chen Tung Chou Markus Krausz

BIKE is a key encapsulation mechanism that entered the third round of the NIST post-quantum cryptography standardization process. This paper presents two constant-time implementations for BIKE, one tailored for the Intel Haswell and one tailored for the ARM Cortex-M4. Our Haswell implementation is much faster than the avx2 implementation written by the BIKE team: for bikel1, the level-1 parameter set, we achieve a 1.39x speedup for decapsulation (which is the slowest operation) and a 1.33x speedup for the sum of all operations. For bikel3, the level-3 parameter set, we achieve a 1.5x speedup for decapsulation and a 1.46x speedup for the sum of all operations. Our M4 implementation is more than two times faster than the non-constant-time implementation portable written by the BIKE team. The speedups are achieved by both algorithm-level and instruction-level optimizations.

2021

TCHES

Over 100x Faster Bootstrapping in Fully Homomorphic Encryption through Memory-centric Optimization with GPUs 📺 Abstract

Wonkyung Jung Sangpyo Kim Jung Ho Ahn Jung Hee Cheon Younho Lee

Fully Homomorphic encryption (FHE) has been gaining in popularity as an emerging means of enabling an unlimited number of operations in an encrypted message without decryption. A major drawback of FHE is its high computational cost. Specifically, a bootstrapping step that refreshes the noise accumulated through consequent FHE operations on the ciphertext can even take minutes of time. This significantly limits the practical use of FHE in numerous real applications.By exploiting the massive parallelism available in FHE, we demonstrate the first instance of the implementation of a GPU for bootstrapping CKKS, one of the most promising FHE schemes supporting the arithmetic of approximate numbers. Through analyzing CKKS operations, we discover that the major performance bottleneck is their high main-memory bandwidth requirement, which is exacerbated by leveraging existing optimizations targeted to reduce the required computation. These observations motivate us to utilize memory-centric optimizations such as kernel fusion and reordering primary functions extensively.Our GPU implementation shows a 7.02× speedup for a single CKKS multiplication compared to the state-of-the-art GPU implementation and an amortized bootstrapping time of 0.423us per bit, which corresponds to a speedup of 257× over a single-threaded CPU implementation. By applying this to logistic regression model training, we achieved a 40.0× speedup compared to the previous 8-thread CPU implementation with the same data.

2021

TCHES

Pay Attention to Raw Traces: A Deep Learning Architecture for End-to-End Profiling Attacks 📺 Abstract

Xiangjun Lu Chi Zhang Pei Cao Dawu Gu Haining Lu

With the renaissance of deep learning, the side-channel community also notices the potential of this technology, which is highly related to the profiling attacks in the side-channel context. Many papers have recently investigated the abilities of deep learning in profiling traces. Some of them also aim at the countermeasures (e.g., masking) simultaneously. Nevertheless, so far, all of these papers work with an (implicit) assumption that the number of time samples in raw traces can be reduced before the profiling, i.e., the position of points of interest (PoIs) can be manually located. This is arguably the most challenging part of a practical black-box analysis targeting an implementation protected by masking. Therefore, we argue that to fully utilize the potential of deep learning and get rid of any manual intervention, the end-to-end profiling directly mapping raw traces to target intermediate values is demanded.In this paper, we propose a neural network architecture that consists of encoders, attention mechanisms and a classifier, to conduct the end-to-end profiling. The networks built by our architecture could directly classify the traces that contain a large number of time samples (i.e., raw traces without manual feature extraction) while whose underlying implementation is protected by masking. We validate our networks on several public datasets, i.e., DPA contest v4 and ASCAD, where over 100,000 time samples are directly used in profiling. To our best knowledge, we are the first that successfully carry out end-to-end profiling attacks. The results on the datasets indicate that our networks could get rid of the tricky manual feature extraction. Moreover, our networks perform even systematically better (w.r.t. the number of traces in attacks) than those trained on the reduced traces. These validations imply our approach is not only a first but also a concrete step towards end-to-end profiling attacks in the side-channel context.

2021

TCHES

Probing Security through Input-Output Separation and Revisited Quasilinear Masking 📺 Abstract

Dahmun Goudarzi Thomas Prest Matthieu Rivain Damien Vergnaud

The probing security model is widely used to formally prove the security of masking schemes. Whenever a masked implementation can be proven secure in this model with a reasonable leakage rate, it is also provably secure in a realistic leakage model known as the noisy leakage model. This paper introduces a new framework for the composition of probing-secure circuits. We introduce the security notion of input-output separation (IOS) for a refresh gadget. From this notion, one can easily compose gadgets satisfying the classical probing security notion –which does not ensure composability on its own– to obtain a region probing secure circuit. Such a circuit is secure against an adversary placing up to t probes in each gadget composing the circuit, which ensures a tight reduction to the more realistic noisy leakage model. After introducing the notion and proving our composition theorem, we compare our approach to the composition approaches obtained with the (Strong) Non-Interference (S/NI) notions as well as the Probe-Isolating Non-Interference (PINI) notion. We further show that any uniform SNI gadget achieves the IOS security notion, while the converse is not true. We further describe a refresh gadget achieving the IOS property for any linear sharing with a quasilinear complexity Θ(n log n) and a O(1/ log n) leakage rate (for an n-size sharing). This refresh gadget is a simplified version of the quasilinear SNI refresh gadget proposed by Battistello, Coron, Prouff, and Zeitoun (ePrint 2016). As an application of our composition framework, we revisit the quasilinear-complexity masking scheme of Goudarzi, Joux and Rivain (Asiacrypt 2018). We improve this scheme by generalizing it to any base field (whereas the original proposal only applies to field with nth powers of unity) and by taking advantage of our composition approach. We further patch a flaw in the original security proof and extend it from the random probing model to the stronger region probing model. Finally, we present some application of this extended quasilinear masking scheme to AES and MiMC and compare the obtained performances.

2021

TCHES

Provably Secure Hardware Masking in the Transition- and Glitch-Robust Probing Model: Better Safe than Sorry 📺 Abstract

Gaëtan Cassiers François-Xavier Standaert

There exists many masking schemes to protect implementations of cryptographic operations against side-channel attacks. It is common practice to analyze the security of these schemes in the probing model, or its variant which takes into account physical effects such as glitches and transitions. Although both effects exist in practice and cause leakage, masking schemes implemented in hardware are often only analyzed for security against glitches. In this work, we fill this gap by proving sufficient conditions for the security of hardware masking schemes against transitions, leading to the design of new masking schemes and a proof of security for an existing masking scheme in presence of transitions. Furthermore, we give similar results in the stronger model where the effects of glitches and transitions are combined.

2021

TCHES

Rainbow on Cortex-M4 📺 Abstract

Tung Chou Matthias J. Kannwischer Bo-Yin Yang

We present the first Cortex-M4 implementation of the NISTPQC signature finalist Rainbow. We target the Giant Gecko EFM32GG11B which comes with 512 kB of RAM which can easily accommodate the keys of RainbowI.We present fast constant-time bitsliced F16 multiplication allowing multiplication of 32 field elements in 32 clock cycles. Additionally, we introduce a new way of computing the public map P in the verification procedure allowing vastly faster signature verification.Both the signing and verification procedures of our implementation are by far the fastest among the NISTPQC signature finalists. Signing of rainbowIclassic requires roughly 957 000 clock cycles which is 4× faster than the state of the art Dilithium2 implementation and 45× faster than Falcon-512. Verification needs about 239 000 cycles which is 5× and 2× faster respectively. The cost of signing can be further decreased by 20% when storing the secret key in a bitsliced representation.

2021

TCHES

RASSLE: Return Address Stack based Side-channel LEakage 📺 Abstract

Anirban Chakraborty Sarani Bhattacharya Manaar Alam Sikhar Patranabis Debdeep Mukhopadhyay

Microarchitectural attacks on computing systems often stem from simple artefacts in the underlying architecture. In this paper, we focus on the Return Address Stack (RAS), a small hardware stack present in modern processors to reduce the branch miss penalty by storing the return addresses of each function call. The RAS is useful to handle specifically the branch predictions for the RET instructions which are not accurately predicted by the typical branch prediction units. In particular, we envisage a spy process who crafts an overflow condition in the RAS by filling it with arbitrary return addresses, and wrestles with a concurrent process to establish a timing side channel between them. We call this attack principle, RASSLE 1 (Return Address Stack based Side-channel Leakage), which an adversary can launch on modern processors by first reverse engineering the RAS using a generic methodology exploiting the established timing channel. Subsequently, we show three concrete attack scenarios: i) How a spy can establish a covert channel with another co-residing process? ii) How RASSLE can be utilized to determine the secret key of the P-384 curves in OpenSSL (v1.1.1 library)? iii) How an Elliptic Curve Digital Signature Algorithm (ECDSA) secret key on P-256 curve of OpenSSL can be revealed using Lattice Attack on partially leaked nonces with the aid of RASSLE? In this attack, we show that the OpenSSL implementation of scalar multiplication on this curve has varying number of add-and-sub function calls, which depends on the secret scalar bits. We demonstrate through several experiments that the number of add-and-sub function calls can be used to template the secret bit, which can be picked up by the spy using the principles of RASSLE. Finally, we demonstrate a full end-to-end attack on OpenSSL ECDSA using curve parameters of curve P-256. In this part of our experiments with RASSLE, we abuse the deadline scheduler policy to attain perfect synchronization between the spy and victim, without any aid of induced synchronization from the victim code. This synchronization and timing leakage through RASSLE is sufficient to retrieve the Most Significant Bits (MSB) of the ephemeral nonces used while signature generation, from which we subsequently retrieve the secret signing key of the sender applying the Hidden Number Problem. 1RASSLE is a non-standard spelling for wrestle.

2021

TCHES

Reinforcement Learning for Hyperparameter Tuning in Deep Learning-based Side-channel Analysis 📺 Abstract

Jorai Rijsdijk Lichao Wu Guilherme Perin Stjepan Picek

Deep learning represents a powerful set of techniques for profiling sidechannel analysis. The results in the last few years show that neural network architectures like multilayer perceptron and convolutional neural networks give strong attack performance where it is possible to break targets protected with various countermeasures. Considering that deep learning techniques commonly have a plethora of hyperparameters to tune, it is clear that such top attack results can come with a high price in preparing the attack. This is especially problematic as the side-channel community commonly uses random search or grid search techniques to look for the best hyperparameters.In this paper, we propose to use reinforcement learning to tune the convolutional neural network hyperparameters. In our framework, we investigate the Q-Learning paradigm and develop two reward functions that use side-channel metrics. We mount an investigation on three commonly used datasets and two leakage models where the results show that reinforcement learning can find convolutional neural networks exhibiting top performance while having small numbers of trainable parameters. We note that our approach is automated and can be easily adapted to different datasets. Several of our newly developed architectures outperform the current state-of-the-art results. Finally, we make our source code publicly available. https://github.com/AISyLab/Reinforcement-Learning-for-SCA

2021

TCHES

Revealing the Weakness of Addition Chain Based Masked SBox Implementations 📺 Abstract

Jingdian Ming Huizhong Li Yongbin Zhou Wei Cheng Zehua Qiao

Addition chain is a well-known approach for implementing higher-order masked SBoxes. However, this approach induces more computations of intermediate monomials over F2n, which in turn leak more information related to the sensitive variables and may decrease its side-channel resistance consequently. In this paper, we introduce a new notion named polygon degree to measure the resistance of monomial computations. With the help of this notion, we select several typical addition chain implementations with the strongest or the weakest resistance. In practical experiments based on an ARM Cortex-M4 architecture, we collect power and electromagnetic traces in consideration of different noise levels. The results show that the resistance of the weakest masked SBox implementation is close to that of an unprotected implementation, while the strongest one can also be broken with fewer than 1,500 traces due to extra leakages. Moreover, we study the resistance of addition chain implementations against profiled attacks. We find that some monomials with smaller output size leak more information than the SBox output. The work by Duc et al. at JOC 2019 showed that for a balanced function, the smaller the output size is, the less information is leaked. Thus, our attacks demonstrate that this property of balanced functions does not apply to unbalanced functions.

2021

TCHES

Revisiting the functional bootstrap in TFHE 📺 Abstract

Antonio Guimarães Edson Borin Diego F. Aranha

The FHEW cryptosystem introduced the idea that an arbitrary function can be evaluated within the bootstrap procedure as a table lookup. The faster bootstraps of TFHE strengthened this approach, which was later named Functional Bootstrap (Boura et al., CSCML’19). From then on, little effort has been made towards defining efficient ways of using it to implement functions with high precision. In this paper, we introduce two methods to combine multiple functional bootstraps to accelerate the evaluation of reasonably large look-up tables and highly precise functions. We thoroughly analyze and experimentally validate the error propagation in both methods, as well as in the functional bootstrap itself. We leverage the multi-value bootstrap of Carpov et al. (CT-RSA’19) to accelerate (single) lookup table evaluation, and we improve it by lowering the complexity of its error variance growth from quadratic to linear in the value of the output base. Compared to previous literature using TFHE’s functional bootstrap, our methods are up to 2.49 times faster than the lookup table evaluation of Carpov et al. (CT-RSA’19) and up to 3.19 times faster than the 32-bit integer comparison of Bourse et al. (CT-RSA’20). Compared to works using logic gates, we achieved speedups of up to 6.98, 8.74, and 3.55 times over 8-bit implementations of the functions ReLU, Addition, and Maximum, respectively.

2021

TCHES

ROTed: Random Oblivious Transfer for embedded devices 📺 Abstract

P. Branco L. Fiolhais M. Goulão P. Martins P. Mateus L. Sousa

Oblivious Transfer (OT) is a fundamental primitive in cryptography, supporting protocols such as Multi-Party Computation and Private Set Intersection (PSI), that are used in applications like contact discovery, remote diagnosis and contact tracing. Due to its fundamental nature, it is utterly important that its execution is secure even if arbitrarily composed with other instances of the same, or other protocols. This property can be guaranteed by proving its security under the Universal Composability model. Herein, a 3-round Random Oblivious Transfer (ROT) protocol is proposed, which achieves high computational efficiency, in the Random Oracle Model. The security of the protocol is based on the Ring Learning With Errors assumption (for which no quantum solver is known). ROT is the basis for OT extensions and, thus, achieves wide applicability, without the overhead of compiling ROTs from OTs. Finally, the protocol is implemented in a server-class Intel processor and four application-class ARM processors, all with different architectures. The usage of vector instructions provides on average a 40% speedup. The implementation shows that our proposal is at least one order of magnitude faster than the state-of-the-art, and is suitable for a wide range of applications in embedded systems, IoT, desktop, and servers. From a memory footprint perspective, there is a small increase (16%) when compared to the state-of-the-art. This increase is marginal and should not prevent the usage of the proposed protocol in a multitude of devices. In sum, the proposal achieves up to 37k ROTs/s in an Intel server-class processor and up to 5k ROTs/s in an ARM application-class processor. A PSI application, using the proposed ROT, is up to 6.6 times faster than related art.

2021

TCHES

Scabbard: a suite of efficient learning with rounding key-encapsulation mechanisms 📺 Abstract

Jose Maria Bermudo Mera Angshuman Karmakar Suparna Kundu Ingrid Verbauwhede

In this paper, we introduce Scabbard, a suite of post-quantum keyencapsulation mechanisms. Our suite contains three different schemes Florete, Espada, and Sable based on the hardness of module- or ring-learning with rounding problem. In this work, we first show how the latest advancements on lattice-based cryptographycan be utilized to create new better schemes and even improve the state-of-the-art on post-quantum cryptography. We put particular focus on designing schemes that can optimally exploit the parallelism offered by certain hardware platforms and are also suitable for resource constrained devices. We show that this can be achieved without compromising the security of the schemes or penalizing their performance on other platforms.To substantiate our claims, we provide optimized implementations of our three new schemes on a wide range of platforms including general-purpose Intel processors using both portable C and vectorized instructions, embedded platforms such as Cortex-M4 microcontrollers, and hardware platforms such as FPGAs. We show that on each platform, our schemes can outperform the state-of-the-art in speed, memory footprint, or area requirements.

2021

TCHES

SEAL-Embedded: A Homomorphic Encryption Library for the Internet of Things 📺 Abstract

Deepika Natarajan Wei Dai

The growth of the Internet of Things (IoT) has led to concerns over the lack of security and privacy guarantees afforded by IoT systems. Homomorphic encryption (HE) is a promising privacy-preserving solution to allow devices to securely share data with a cloud backend; however, its high memory consumption and computational overhead have limited its use on resource-constrained embedded devices. To address this problem, we present SEAL-Embedded, the first HE library targeted for embedded devices, featuring the CKKS approximate homomorphic encryption scheme. SEAL-Embedded employs several computational and algorithmic optimizations along with a detailed memory re-use scheme to achieve memory efficient, high performance CKKS encoding and encryption on embedded devices without any sacrifice of security. We additionally provide an “adapter” server module to convert data encrypted by SEAL-Embedded to be compatible with the Microsoft SEAL library for homomorphic encryption, enabling an end-to-end solution for building privacy-preserving applications. For a polynomial ring degree of 4096, using RNS primes of 30 or fewer bits, our library can be configured to use between 64–137 KB of RAM and 1–264 KB of flash data, depending on developer-selected configurations and tradeoffs. Using these parameters, we evaluate SEAL-Embedded on two different IoT platforms with high performance, memory efficient, and balanced configurations of the library for asymmetric and symmetric encryption. With 136 KB of RAM, SEAL-Embedded can perform asymmetric encryption of 2048 single-precision numbers in 77 ms on the Azure Sphere Cortex-A7 and 737 ms on the Nordic nRF52840 Cortex-M4.

2021

TCHES

Second-Order SCA Security with almost no Fresh Randomness 📺 Abstract

Aein Rezaei Shahmirzadi Amir Moradi

Masking schemes are among the most popular countermeasures against Side-Channel Analysis (SCA) attacks. Realization of masked implementations on hardware faces several difficulties including dealing with glitches. Threshold Implementation (TI) is known as the first strategy with provable security in presence of glitches. In addition to the desired security order d, TI defines the minimum number of shares to also depend on the algebraic degree of the target function. This may lead to unaffordable implementation costs for higher orders.For example, at least five shares are required to protect the smallest nonlinear function against second-order attacks. By cuttingsuch a dependency, the successor schemes are able to achieve the same security level by just d + 1 shares, at the cost of high demand for fresh randomness, particularly at higher orders. In this work, we provide a methodology to realize the second-order glitch-extended probing-secure implementation of a group of quadratic functions with three shares and no fresh randomness. This allows us to construct second-order secure implementations of several cryptographic primitives with very limited number of fresh masks, including Keccak, SKINNY, Midori, PRESENT, and PRINCE.

2021

TCHES

Secure, Accurate, and Practical Narrow-Band Ranging System 📺 Abstract

Aysajan Abidin Mohieddine El Soussi Jac Romme Pepijn Boer Dave Singelée Christian Bachmann

Relay attacks pose a serious security threat to wireless systems, such as, contactless payment systems, keyless entry systems, or smart access control systems. Distance bounding protocols, which allow an entity to not only authenticate another entity but also determine whether it is physically close by, effectively mitigate relay attacks. However, secure implementation of distance bounding protocols, especially of the time critical challenge-response phase, has been a challenging task. In this paper, we design and implement a secure and accurate distance bounding protocol based on Narrow-Band signals, such as Bluetooth Low Energy (BLE), to particularly mitigate relay attacks. Narrow-Band ranging, specifically, phase-based ranging, enables accurate distance measurement, but it is vulnerable to phase rollover attacks. In our solution, we mitigate phase rollover attacks by also measuring time-of-flight (ToF) to detect the delay introduced by such attacks. Therefore, our protocol effectively combines the best of both worlds: phase-based ranging for accuracy and time-of-flight (ToF) measurement for security. To demonstrate the feasibility and practicality of our solution, we prototype it on NXP KW36 BLE chips and evaluate its performance and relay attack resistance. The obtained precision and accuracy of the presented ranging solution are 2.5 cm and 30 cm, respectively, in wireless measurements.

2021

TCHES

Security and Trust in Open Source Security Tokens 📺 Abstract

Marc Schink Alexander Wagner Florian Unterstein Johann Heyszl

Using passwords for authentication has been proven vulnerable in countless security incidents. Hardware security tokens effectively prevent most password-related security issues and improve security indisputably. However, we would like to highlight that there are new threats from attackers with physical access which need to be discussed. Supply chain adversaries may manipulate devices on a large scale and install backdoors before they even reach end users. In evil maid scenarios, specific devices may even be attacked while already in use. Hence, we thoroughly investigate the security and trustworthiness of seven commercially available open source security tokens, including devices from the two market leaders: SoloKeys and Nitrokey. Unfortunately, we identify and practically verify significant vulnerabilities in all seven examined tokens. Some of them are based on severe, previously undiscovered, vulnerabilities of two major microcontrollers which are used at a large scale in various products. Our findings clearly emphasize the significant threat from supply chain and evil maid scenarios since the attacks are practical and only require moderate attacker efforts. Fortunately, we are able to describe software-based countermeasures as effective improvements to retrofit the examined devices. To improve the security and trustworthiness of future security tokens, we also derive important general design recommendations.

2021

TCHES

Side-Channel Protections for Picnic Signatures 📺 Abstract

Diego F. Aranha Sebastian Berndt Thomas Eisenbarth Okan Seker Akira Takahashi Luca Wilke Greg Zaverucha

We study masking countermeasures for side-channel attacks against signature schemes constructed from the MPC-in-the-head paradigm, specifically when the MPC protocol uses preprocessing. This class of signature schemes includes Picnic, an alternate candidate in the third round of the NIST post-quantum standardization project. The only previously known approach to masking MPC-in-the-head signatures suffers from interoperability issues and increased signature sizes. Further, we present a new attack to demonstrate that known countermeasures are not sufficient when the MPC protocol uses a preprocessing phase, as in Picnic3.We overcome these challenges by showing how to mask the underlying zero-knowledge proof system due to Katz–Kolesnikov–Wang (CCS 2018) for any masking order, and by formally proving that our approach meets the standard security notions of non-interference for masking countermeasures. As a case study, we apply our masking technique to Picnic. We then implement different masked versions of Picnic signing providing first order protection for the ARM Cortex M4 platform, and quantify the overhead of these different masking approaches. We carefully analyze the side-channel risk of hashing operations, and give optimizations that reduce the CPU cost of protecting hashing in Picnic by a factor of five. The performance penalties of the masking countermeasures ranged from 1.8 to 5.5, depending on the degree of masking applied to hash function invocations.

2021

TCHES

Speed Reading in the Dark: Accelerating Functional Encryption for Quadratic Functions with Reprogrammable Hardware 📺 Abstract

Milad Bahadori Kimmo Järvinen Tilen Marc Miha Stopar

Functional encryption is a new paradigm for encryption where decryption does not give the entire plaintext but only some function of it. Functional encryption has great potential in privacy-enhancing technologies but suffers from excessive computational overheads. We introduce the first hardware accelerator that supports functional encryption for quadratic functions. Our accelerator is implemented on a reprogrammable system-on-chip following the hardware/software codesign methogology. We benchmark our implementation for two privacy-preserving machine learning applications: (1) classification of handwritten digits from the MNIST database and (2) classification of clothes images from the Fashion MNIST database. In both cases, classification is performed with encrypted images. We show that our implementation offers speedups of over 200 times compared to a published software implementation and permits applications which are unfeasible with software-only solutions.

2021

TCHES

Structural Attack (and Repair) of Diffused-Input-Blocked-Output White-Box Cryptography 📺 Abstract

Claude Carlet Sylvain Guilley Sihem Mesnager

In some practical enciphering frameworks, operational constraints may require that a secret key be embedded into the cryptographic algorithm. Such implementations are referred to as White-Box Cryptography (WBC). One technique consists of the algorithm’s tabulation specialized for its key, followed by obfuscating the resulting tables. The obfuscation consists of the application of invertible diffusion and confusion layers at the interface between tables so that the analysis of input/output does not provide exploitable information about the concealed key material.Several such protections have been proposed in the past and already cryptanalyzed thanks to a complete WBC scheme analysis. In this article, we study a particular pattern for local protection (which can be leveraged for robust WBC); we formalize it as DIBO (for Diffused-Input-Blocked-Output). This notion has been explored (albeit without having been nicknamed DIBO) in previous works. However, we notice that guidelines to adequately select the invertible diffusion ∅and the blocked bijections B were missing. Therefore, all choices for ∅ and B were assumed as suitable. Actually, we show that most configurations can be attacked, and we even give mathematical proof for the attack. The cryptanalysis tool is the number of zeros in a Walsh-Hadamard spectrum. This “spectral distinguisher” improves on top of the previously known one (Sasdrich, Moradi, Güneysu, at FSE 2016). However, we show that such an attack does not work always (even if it works most of the time).Therefore, on the defense side, we give a straightforward rationale for the WBC implementations to be secure against such spectral attacks: the random diffusion part ∅ shall be selected such that the rank of each restriction to bytes is full. In AES’s case, this seldom happens if ∅ is selected at random as a linear bijection of F322. Thus, specific care shall be taken. Notice that the entropy of the resulting ∅ (suitable for WBC against spectral attacks) is still sufficient to design acceptable WBC schemes.

2021

TCHES

The SPEEDY Family of Block Ciphers: Engineering an Ultra Low-Latency Cipher from Gate Level for Secure Processor Architectures 📺 Abstract

Gregor Leander Thorben Moos Amir Moradi Shahram Rasoolzadeh

We introduce SPEEDY, a family of ultra low-latency block ciphers. We mix engineering expertise into each step of the cipher’s design process in order to create a secure encryption primitive with an extremely low latency in CMOS hardware. The centerpiece of our constructions is a high-speed 6-bit substitution box whose coordinate functions are realized as two-level NAND trees. In contrast to other low-latency block ciphers such as PRINCE, PRINCEv2, MANTIS and QARMA, we neither constrain ourselves by demanding decryption at low overhead, nor by requiring a super low area or energy. This freedom together with our gate- and transistor-level considerations allows us to create an ultra low-latency cipher which outperforms all known solutions in single-cycle encryption speed. Our main result, SPEEDY-6-192, is a 6-round 192-bit block and 192-bit key cipher which can be executed faster in hardware than any other known encryption primitive (including Gimli in Even-Mansour scheme and the Orthros pseudorandom function) and offers 128-bit security. One round more, i.e., SPEEDY-7-192, provides full 192-bit security. SPEEDY primarily targets hardware security solutions embedded in high-end CPUs, where area and energy restrictions are secondary while high performance is the number one priority.

2021

TCHES

Time-Memory Analysis of Parallel Collision Search Algorithms 📺 Abstract

Monika Trimoska Sorina Ionica Gilles Dequen

Parallel versions of collision search algorithms require a significant amount of memory to store a proportion of the points computed by the pseudo-random walks. Implementations available in the literature use a hash table to store these points and allow fast memory access. We provide theoretical evidence that memory is an important factor in determining the runtime of this method. We propose to replace the traditional hash table by a simple structure, inspired by radix trees, which saves space and provides fast look-up and insertion. In the case of many-collision search algorithms, our variant has a constant-factor improved runtime. We give benchmarks that show the linear parallel performance of the attack on elliptic curves discrete logarithms and improved running times for meet-in-the-middle applications.

2021

TCHES

Timing Black-Box Attacks: Crafting Adversarial Examples through Timing Leaks against DNNs on Embedded Devices 📺 Abstract

Tsunato Nakai Daisuke Suzuki Takeshi Fujino

Deep neural networks (DNNs) have been applied to various industries. In particular, DNNs on embedded devices have attracted considerable interest because they allow real-time and distributed processing on site. However, adversarial examples (AEs), which add small perturbations to the input data of DNNs to cause misclassification, are serious threats to DNNs. In this paper, a novel black-box attack is proposed to craft AEs based only on processing time, i.e., the side-channel leaks from DNNs on embedded devices. Unlike several existing black-box attacks that utilize output probability, the proposed attack exploits the relationship between the number of activated nodes and processing time without using training data, model architecture, parameters, substitute models, or output probability. The perturbations for AEs are determined by the differential processing time based on the input data of the DNNs in the proposed attack. The experimental results show that the AEs of the proposed attack effectively cause an increase in the number of activated nodes and the misclassification of one of the incorrect labels against the DNNs on a microcontroller unit. Moreover, these results indicate that the attack can evade gradient-masking and confidence reduction countermeasures, which conceal the output probability, to prevent the crafting of AEs against several black-box attacks. Finally, the countermeasures against the attack are implemented and evaluated to clarify that the implementation of an activation function with data-dependent timing leaks is the cause of the proposed attack.

2021

TCHES

Yoroi: Updatable Whitebox Cryptography 📺 Abstract

Yuji Koike Takanori Isobe

Whitebox cryptography aims to provide security in the whitebox setting where the adversary has unlimited access to the implementation and its environment. In order to ensure security in the whitebox setting, it should prevent key extraction attacks and code-lifting attacks, in which the adversary steals the original cryptographic implementation instead of the key, and utilizes it as a big key. Although recent published ciphers such as SPACE, SPNbox, and Whiteblock successfully achieve security against the key extraction attacks, they only provide mitigation of codelifting attack by the so-called space hardness and incompressibility properties of the underlying tables as the space-hard/incompressible table might be eventually stolen by continuous leakage. The complete prevention of such attacks may need to periodically update the secret key. However, that entails high costs and might introduce an additional vulnerability into the system due to the necessity for the reencryption of all data by the updated key. In this paper, we introduce a new property, denominated longevity, for whitebox cryptography. This property enhances security against code-lifting attacks with continuous leakage by updating incompressible tables instead of the secret key. We propose a family of new whitebox-secure block ciphers Yoroi that has the longevity property in addition to the space hardness. By updating its implementation periodically, Yoroi provides constant security against code-lifting attacks without key updating. Moreover, the performance of Yoroi is competitive with existing ciphers implementations in the blackbox and whitebox context.

International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB