Amir Moradi

CryptoDB

Amir Moradi

Publications and invited talks

Year

Venue

Title

2025

TCHES

Design and Implementation of a Physically Secure Open-Source FPGA and Toolchain Abstract

Sergej Meschkov Daniel Lammers Mehdi B. Tahoori Amir Moradi

The increasing prevalence of security breaches highlights the importanceof robust hardware security measures. Among these breaches, physical attacks– such as Side-Channel Analysis ( SCA) and Fault Injection (FI ) attacks – posea significant challenge for security-sensitive applications. To ensure robust systemsecurity throughout its lifecycle, hardware security updates are indispensable alongsidesoftware security patches. Programmable hardware plays a pivotal role in establishinga robust hardware root-of-trust, serving to effectively mitigate various hardwaresecurity threats. In this paper, we propose a methodology for the design of areconfigurable fabric and the corresponding mapping toolchain, specifically tailoredto hardware security. This approach offers resistance to various malicious physicalattacks, including SCA and FI , addressing each threat individually. As a case study,we propose a resulting fabric that implements a combination of first-order BooleanMasking and hiding countermeasures to provide strong protection against SCA attacksand enables the detection of fault injection attempts. In particular, we present howreconfigurable secure gadgets can be realized employing a reformed variant of theLUT-based Masked Dual-Rail with Pre-charge Logic (LMDPL) hardware maskingscheme and a modified version of Wave Dynamic Differential Logic ( WDDL) tobe composed into a fabric. We also show how any basic Hardware DescriptionLanguage ( HDL) design is automatically mapped to the primitives of our fabric,embedding provable hardware security, and bypassing the necessity for hardwaresecurity proficiency in this process. It is worth mentioning that our fabric requiresapproximately 85% less area to map a secure design compared to conventional FieldProgrammable Gate Arrays ( FPGAs). A practical security evaluation of our securefabric implementation on a real FPGA target board, using Test Vector LeakageAssessment (TVLA), demonstrated no SCA leakage over 100 million traces.

2025

TCHES

Constant-Cycle Hardware Private Circuits Abstract

Daniel Lammers Nicolai Müller Siemen Dhooghe Amir Moradi

The efficient implementation of Boolean masking with minimal overhead in terms of latency has become a critical topic due to the increasing demand for physically secure yet high-performance cryptographic primitives. However, achieving low latency in masked circuits while ensuring that glitches and transitions do not compromise their security remains a significant challenge. State-of-the-art multiplication gadgets, such as the recently introduced HPC4 (CHES 2024), offer composable security against glitches and transitions, as proven under the robust d-probing model. However, these gadgets require at least one clock cycle per computation, resulting in a latency overhead that increases with the algebraic degree. In contrast, LMDPL gadgets (CHES 2014 & CHES 2020) can achieve fixed latency independent of the algebraic degree, effectively addressing this issue. However, they are limited to two shares, and extending them to guarantee composable security at order d with d + 1 shares is considered an open challenge.In this work, we introduce Constant-Cycle Hardware Private Circuits (CCHPC), a novel hardware masking scheme built on the concept of LUT-based Masked Dual-Rail with Pre-charge Logic (LMDPL). Specifically, CCHPC achieves a fixed latency of d clock cycles by masking a Boolean function of arbitrary algebraic degree with d + 1 shares. CCHPC gadgets are secure and trivially composable, as formally proven under the Robust but Relaxed d-probing model (CHES 2024). Using CCHPC gadgets, we design a masked Advanced Encryption Standard (AES) encryption core which can be instantiated for an arbitrary number of d + 1 shares with a total latency of 11 + d clock cycles.

2025

TCHES

Fault Injection Evaluation with Statistical Analysis: How to Deal with Nearly Fabricated Large Circuits Abstract

Felix Uhle Nicolai Müller Amir Moradi

A critical aspect of securing cryptographic hardware is their resistance to Fault Injection (FI) attacks, which involve the successful injection of faults into the system in operation. Specifically, a hardware design must be resilient to wellestablished fault injection techniques, including voltage or clock glitching, laser fault injections, and the more recently introduced Electromagnetic Fault Injection (EMFI). Ideally, the protection level must be verified before the chip is fabricated. Although initial efforts to verify the resistance of hardware designs against fault injection have been made, analyzing the security of practical designs with realistic gate counts under fault injections that affect multiple gates or the entire circuit state remains a significant challenge. This scenario, however, is considered more realistic than assessing resistance to a fixed, relatively small number of faults. In this work, we introduce FIESTA, a versatile automated framework for analyzing the resistance of hardware circuits under the general random fault model. By leveraging a nonexhaustive approach, FIESTA is capable of evaluating larger designs compared to state-of-the-art tools, while maintaining a reasonable level of confidence. FIESTA supports various adversary models, allowing customized resistance analysis against specific adversaries. In particular, we present a concrete procedure for evaluating more realistic precise adversaries, based on practical observations. Using FIESTA, we assessed the resistance of several (protected) Advanced Encryption Standard (AES) cores.

2024

TCHES

JustSTART: How to Find an RSA Authentication Bypass on Xilinx UltraScale(+) with Fuzzing Abstract

Maik Ender Felix Hahn Marc Fyrbiak Amir Moradi Christof Paar

Fuzzing is a well-established technique in the software domain to uncover bugs and vulnerabilities. Yet, applications of fuzzing for security vulnerabilities in hardware systems are scarce, as principal reasons are requirements for design information access, i.e., HDL source code. Moreover, observation of internal hardware state during runtime is typically an ineffective information source, as its documentation is often not publicly available. In addition, such observation during runtime is also inefficient due to bandwidth-limited analysis interfaces, i.e., JTAG, and minimal introspection of hardware-internal modules.In this work, we investigate fuzzing for Xilinx 7-Series and UltraScale(+) FPGA configuration engines, the control plane governing the (secure) bitstream configuration within the FPGA. Our goal is to examine the effectiveness of fuzzing to analyze and document the opaque inner workings of FPGA configuration engines, with a primary emphasis on identifying security vulnerabilities. Using only the publicly available hardware chip and dispersed documentation, we first design and implement ConFuzz, an advanced FPGA configuration engine fuzzing and rapid prototyping framework. Based on our detailed understanding of the bitstream file format, we then systematically define 3 novel key fuzzing strategies for Xilinx FPGA configuration engines. Moreover, our strategies are executed through mutational structure-aware fuzzers and incorporate various novel custom-tailored, FPGA-specific optimizations to reduce search space. Our evaluation reveals previously undocumented behavior within the configuration engine, including critical findings such as system crashes leading to unresponsive states of the whole FPGA. In addition, our investigations not only lead to the rediscovery of the recent starbleed attack but also uncover a novel unpatchable vulnerability, denoted as JustSTART (CVE-2023-20570), capable of circumventing RSA authentication for Xilinx UltraScale(+). Note that we also discuss effective countermeasures by secure FPGA settings to prevent aforementioned attacks.

2024

CIC

Randomness Generation for Secure Hardware Masking – Unrolled Trivium to the Rescue Abstract

Gaëtan Cassiers Loïc Masure Charles Momin Thorben Moos Amir Moradi François-Xavier Standaert

<p>Masking is a prominent strategy to protect cryptographic implementations against side-channel analysis. Its popularity arises from the exponential security gains that can be achieved for (approximately) quadratic resource utilization. Many variants of the countermeasure tailored for different optimization goals have been proposed. The common denominator among all of them is the implicit demand for robust and high entropy randomness. Simply assuming that uniformly distributed random bits are available, without taking the cost of their generation into account, leads to a poor understanding of the efficiency vs. security tradeoff of masked implementations. This is especially relevant in case of hardware masking schemes which are known to consume large amounts of random bits per cycle due to parallelism. Currently, there seems to be no consensus on how to most efficiently derive many pseudo-random bits per clock cycle from an initial seed and with properties suitable for masked hardware implementations. In this work, we evaluate a number of building blocks for this purpose and find that hardware-oriented stream ciphers like Trivium and its reduced-security variant Bivium B outperform most competitors when implemented in an unrolled fashion. Unrolled implementations of these primitives enable the flexible generation of many bits per cycle, which is crucial for satisfying the large randomness demands of state-of-the-art masking schemes. According to our analysis, only Linear Feedback Shift Registers (LFSRs), when also unrolled, are capable of producing long non-repetitive sequences of random-looking bits at a higher rate per cycle for the same or lower cost as Trivium and Bivium B. Yet, these instances do not provide black-box security as they generate only linear outputs. We experimentally demonstrate that using multiple output bits from an LFSR in the same masked implementation can violate probing security and even lead to harmful randomness cancellations. Circumventing these problems, and enabling an independent analysis of randomness generation and masking, requires the use of cryptographically stronger primitives like stream ciphers. As a result of our studies, we provide an evidence-based estimate for the cost of securely generating $n$ fresh random bits per cycle. Depending on the desired level of black-box security and operating frequency, this cost can be as low as $20n$ to $30n$ ASIC gate equivalents (GE) or $3n$ to $4n$ FPGA look-up tables (LUTs), where $n$ is the number of random bits required. Our results demonstrate that the cost per bit is (sometimes significantly) lower than estimated in previous works, incentivizing parallelism whenever exploitable. This provides further motivation to potentially move low randomness usage from a primary to a secondary design goal in hardware masking research. </p>

2024

TCHES

A Deep Analysis of two Glitch-Free Hardware Masking Schemes SESYM and LMDPL Abstract

Nicolai Müller Daniel Lammers Amir Moradi

In the context of masking, which is the dominant technique for protecting cryptographic hardware designs against Side-Channel Analysis (SCA) attacks, the focus has long been on the design of masking schemes that guarantee provable security in the presence of glitches. Unfortunately, achieving this comes at the cost of increased latency, since registers are required to stop glitch propagation. Previous work has attempted to reduce latency by eliminating registers, but the exponential increase in area makes such approaches impractical. Some relatively new attempts have used Dual-Rail Pre-charge (DRP) logic styles to avoid glitches in algorithmically masked circuits. Promising approaches in this area include LUT-based Masked Dual-Rail with Pre-charge Logic (LMDPL) and Self-Synchronized Masking (SESYM), presented at CHES 2020 and CHES 2022 respectively. Both schemes allow masking of arbitrary functions with only one cycle latency. However, even if glitches no longer occur, there are other physical defaults that may violate the security of a glitch-free masked circuit. The imbalanced delay of dual rails is a known security problem for DRP logic styles such as Wave Dynamic Differential Logic (WDDL), but is not covered by the known security models, e.g., robust probing model.In this work, we illustrate that imbalanced signal delays pose a threat to the security of algorithmically masked circuits implemented with DRP logic, both in theory and practice. Notably, we underscore the security of LMDPL even when delays are taken into account, contrasting with the vulnerability observed in SESYM under similar conditions. Consequently, our findings highlight the critical importance of addressing imbalanced delays in the design of masked circuits using DRP logic. In particular, our findings motivate the need for an appropriate security model, and imply that relying solely on the probing security model and avoiding glitches may be insufficient to construct secure circuits.

2024

TCHES

Automated Generation of Fault-Resistant Circuits Abstract

Nicolai Müller Amir Moradi

Fault Injection (FI) attacks, which involve intentionally introducing faults into a system to cause it to behave in an unintended manner, are widely recognized and pose a significant threat to the security of cryptographic primitives implemented in hardware, making fault tolerance an increasingly critical concern. However, protecting cryptographic hardware primitives securely and efficiently, even with wellestablished and documented methods such as redundant computation, can be a timeconsuming, error-prone, and expertise-demanding task. In this research, we present a comprehensive and fully-automated software solution for the Automated Generation of Fault-Resistant Circuits (AGEFA). Our application employs a generic and extensively researched methodology for the secure integration of countermeasures based on Error-Correcting Codes (ECCs) into cryptographic hardware circuits. Our software tool allows designers without hardware security expertise to develop fault-tolerant hardware circuits with pre-defined correction capabilities under a comprehensive fault adversary model. Moreover, our tool applies to masked designs without violating the masking security requirements, in particular to designs generated by the tool AGEMA. We evaluate the effectiveness of our approach through experiments on various block ciphers and demonstrate its ability to produce fault-tolerant circuits. Additionally, we assess the security of examples generated by AGEFA against Side-Channel Analysis (SCA) and FI using state-of-the-art leakage and fault evaluation tools.

2024

TCHES

PoMMES: Prevention of Micro-architectural Leakages in Masked Embedded Software Abstract

Jannik Zeitschner Amir Moradi

Software solutions to address computational challenges are ubiquitous in our daily lives. One specific application area where software is often used is in embedded systems, which, like other digital electronic devices, are vulnerable to side-channel analysis attacks. Although masking is the most common countermeasure and provides a solid theoretical foundation for ensuring security, recent research has revealed a crucial gap between theoretical and real-world security. This shortcoming stems from the micro-architectural effects of the underlying micro-processor. Common security models used to formally verify masking schemes such as the d-probing model fully ignore the micro-architectural leakages that lead to a set of instructions that unintentionally recombine the shares. Manual generation of masked assembly code that remains secure in the presence of such micro-architectural recombinations often involves trial and error, and is non-trivial even for experts.Motivated by this, we present PoMMES, which enables inexperienced software developers to automatically compile masked functions written in a high-level programming language into assembly code, while preserving the theoretically proven security in practice. Compared to the state of the art, based on a general model for microarchitectural effects, our scheme allows the generation of practically secure masked software at arbitrary security orders for in-order processors. The major contribution of PoMMES is its micro-architecture aware register allocation algorithm, which is one of the crucial steps during the compilation process. In addition to simulation-based assessments that we conducted by open-source tools dedicated to evaluating masked software implementations, we confirm the effectiveness of the PoMMES-generated codes through experimental analysis. We present the result of power consumption based leakage assessments of several case studies running on a Cortex M0+ micro-controller, which is commonly deployed in industry.

2024

TCHES

Another Evidence to not Employ Customized Masked Hardware: Identifying and Fixing Flaws in SCARV Abstract

Felix Uhle Florian Stolz Amir Moradi

As a well-studied countermeasure against side-channel analysis attacks, there is a general interest in applying masking to different cryptographic functions executed on different platforms. On the one hand, despite their high performance, masked hardware implementations are dedicated to specific algorithms, making them inflexible. On the other hand, applying masking on software involves serious challenges including significant overhead in terms of efficiency and difficulties to maintain theoretical security guarantees in practice. As a result, a line of research has been devoted to enable masked operations in flexible platforms (i.e., microprocessors) by including some masked modules in their hardware, hence a combination of flexibility and performance. In such scenarios, RISC-V is a natural choice as hardware can be adjusted to the extended instruction set. One such attempt presented at CHES 2021 is known as SCARV, which extends the Instruction Set Architecture (ISA) of a RISC-V core with a rich number of first-order masked operations on both Boolean and arithmetic masked operands. In this work, we conduct a comprehensive analysis of SCARV. Instead of relying on empirical measurements to demonstrate security, we utilize tool-assisted evaluations. Through these evaluations, we identified a couple of design flaws that lead to leakage in the masked implementations hosted by the corresponding processor. These flaws are primarily due to the lack of composability of cascaded components. While heuristic and ad-hoc design principles can result in secure, small, and efficient designs, they lack formal security proofs, which may lead to security flaws, like those we report here. Consequently, this work serves as a motivation for using composable masked modules and tool-assisted evaluations when constructing complex circuits.

2024

TCHES

Robust but Relaxed Probing Model Abstract

Nicolai Müller Amir Moradi

Masking has become a widely applied and heavily researched method to protect cryptographic implementations against Side-Channel Analysis (SCA) attacks. The success of masking is primarily attributed to its strong theoretical foundation enabling it to formally prove security by modeling physical properties through socalled probing models. Specifically, the robust d-probing model enables us to prove the security for arbitrarily masked hardware circuits, manually or with the assistance of automated tools, even when considering the imperfect nature of physical hardware, including the occurrence of physical defaults such as glitches. However, the generic strategy employed by the robust d-probing model comes with a downside: It tends to over-conservatively model the information leakage caused by glitches meaning that the robust d-probing model considers glitches that can never occur in practice. This implies that in theory, an adversary could gain more information than she would obtain in practice. From a designer’s perspective, this entails that (1) securely designed hardware circuits may need to be withdrawn due to potential insecurity under the robust d-probing model and (2) designs that satisfy the security requirements of the robust d-probing model may incur unnecessary overhead, such as increased circuit size or latency.In this work, we refine the formal treatment of glitches within the robust d-probing model to address glitches more accurately within a formal adversary model. Unlike the robust d-probing model, our approach considers glitches based on the operations performed and the data processed, ensuring that only manifesting glitches are accounted for. As a result, we introduce the Robust but Relaxed (RR) d-probing model, a formal adversary model maintaining the same level of security as the robust d-probing model but without the overly conservative treatment of glitches. Leveraging our new model, we prove the security of LUT-based Masked Dual-Rail with Pre-charge Logic (LMDPL) gadgets, a class of physically secure gadgets reported as insecure based on the robust d-probing model. We provide manual proofs and automated security evaluations employing an updated version of PROLEAD capable of verifying the security of masked circuits under our new model.

2024

TCHES

Static Leakage in Dual-Rail Precharge Logics Abstract

Bijan Fadaeinia Thorben Moos Amir Moradi

In recent research studies, an observable dependency has been found between the static power consumption of a Complementary Metal-Oxide-Semiconductor (CMOS) chip and its internally stored and processed data. For the most part, these studies have focused on utilizing the leakage currents as a side channel to conduct key-recovery attacks on cryptographic devices. There are two main reasons why information leakage through the static power side channel is considered particularly harmful for the security of implementations, namely 1) the low influence of noise due to averaging over time and 2) the ability to target secrets even outside of the time window that they are actively computed upon (data is leaked for as long as it is saved anywhere in the circuit). Hence, developing effective countermeasures against this threat is of significant importance for the security of cryptographic hardware. Hiding techniques known as Dual-Rail Precharge (DRP) logic have been proposed and studied in literature as an instrument to equalize a circuit’s dynamic power consumption irrespective of the processed data. The specific instance called improved Masked Dual-Rail Precharge Logic (iMDPL) is – despite its high overhead – known as one of the most potent and attractive DRP-based Side-Channel Analysis (SCA) countermeasures. While its ability to prevent data extraction through the dynamic power consumption is well studied and documented, we thoroughly analyze its susceptibility to Static Power Side-Channel Analysis (SPSCA) attacks in this work. To conduct our study we have taped-out a custom Application-Specific Integrated Circuit (ASIC) prototype in 65nm CMOS technology which contains multiple cryptographic co-processors protected by iMDPL, partially combined with other countermeasures. Additionally, it contains circuits protected by a new variant of iMDPL that we specifically hardened against SPSCA, which we call Static Robust iMDPL (SRiMDPL). Our careful experiments performed in a controlled environment under exploitation of voltage and temperature dependencies show that SRiMDPL circuits combined with modern hardware masking offer an extremely high level of security against both dynamic and static power SCA attacks. While the cost of such combinations is admittedly significant (≈ 108 kGE post-layout area for a corresponding PRESENT core), we obtain the strongest combined resistance to both power side channels that has been experimentally demonstrated on real silicon so far. In summary, we believe that our analysis can assist hardware designers in making important decisions on the trade-offs between cost and security that such countermeasures facilitate.

2023

TCHES

PROLEAD_SW: Probing-Based Software Leakage Detection for ARM Binaries Abstract

Jannik Zeitschner Nicolai Müller Amir Moradi

A decisive contribution to the all-embracing protection of cryptographic software, especially on embedded devices, is the protection against Side-Channel Analysis (SCA) attacks. Masking countermeasures can usually be integrated into the software during the design phase. In theory, this should provide reliable protection against such physical attacks. However, the correct application of masking is a non-trivial task that often causes even experts to make mistakes. In addition to human-caused errors, micro-architectural Central Processing Unit (CPU) effects can lead even a seemingly theoretically correct implementation to fail to satisfy the desired level of security in practice. This originates from different components of< the underlying CPU which complicates the tracing of leakage back to a particular source and hence avoids making general and device-independent statements about its security.PROLEAD has recently been presented at CHES 2022 and has originally been developed as a simulation-based tool to evaluate masked hardware designs. In this work, we adapt PROLEAD for the evaluation of masked software, and enable the transfer of the already known benefits of PROLEAD into the software world. These include (1) evaluation of larger designs compared to the state of the art, e.g. a full Advanced Encryption Standard (AES) masked implementation, and (2) formal verification under our new generic leakage model for CPUs. Concretely, we formalize leakages, observed across different CPU architectures, into a generic abstraction model that includes all these leakages and is therefore independent of a specific CPU design. Our resulting tool PROLEAD_SW allows to provide a formal statement on the security based on the derived generic model. As a concrete result, using PROLEAD_SW we evaluated the security of several publicly available masked software implementations in our new generic leakage model and reveal multiple vulnerabilities.

2023

TCHES

Deep Learning Side-Channel Collision Attack Abstract

Marvin Staib Amir Moradi

With the breakthrough of Deep Neural Networks, many fields benefited from its enormously increasing performance. Although there is an increasing trend to utilize Deep Learning (DL) for Side-Channel Analysis (SCA) attacks, previous works made specific assumptions for the attack to work. Especially the concept of template attacks is widely adapted while not much attention was paid to other attack strategies. In this work, we present a new methodology, that is able to exploit side-channel collisions in a black-box setting. In particular, our attack is performed in a non-profiled setting and requires neither a hypothetical power model (or let’s say a many-to-one function) nor details about the underlying implementation. While the existing non-profiled DL attacks utilize training metrics to distinguish the correct key, our attack is more efficient by training a model that can be applied to recover multiple key portions, e.g., bytes. In order to perform our attack on raw traces instead of pre-selected samples, we further introduce a DL-based technique that can localize input-dependent leakages in masked implementations, e.g., the leakages associated to one byte of the cipher state in case of AES. We validated our approach by targeting several publicly available power consumption datasets measured from implementations protected by different masking schemes. As a concrete example, we demonstrate how to successfully recover the key bytes of the ASCAD dataset with only a single trained model in a non-profiled setting.

2022

TCHES

Generic Hardware Private Circuits: Towards Automated Generation of Composable Secure Gadgets Abstract

David Knichel Pascal Sasdrich Amir Moradi

With an increasing number of mobile devices and their high accessibility, protecting the implementation of cryptographic functions in the presence of physical adversaries has become more relevant than ever. Over the last decade, a lion’s share of research in this area has been dedicated to developing countermeasures at an algorithmic level. Here, masking has proven to be a promising approach due to the possibility of formally proving the implementation’s security solely based on its algorithmic description by elegantly modeling the circuit behavior. Theoretically verifying the security of masked circuits becomes more and more challenging with increasing circuit complexity. This motivated the introduction of security notions that enable masking of single gates while still guaranteeing the security when the masked gates are composed. Systematic approaches to generate these masked gates – commonly referred to as gadgets – were restricted to very simple gates like 2-input AND gates. Simply substituting such small gates by a secure gadget usually leads to a large overhead in terms of fresh randomness and additional latency (register stages) being introduced to the design.In this work, we address these problems by presenting a generic framework to construct trivially composable and secure hardware gadgets for arbitrary vectorial Boolean functions, enabling the transformation of much larger sub-circuits into gadgets. In particular, we present a design methodology to generate first-order secure masked gadgets which is well-suited for integration into existing Electronic Design Automation (EDA) tools for automated hardware masking as only the Boolean function expression is required. Furthermore, we practically verify our findings by conducting several case studies and show that our methodology outperforms various other masking schemes in terms of introduced latency or fresh randomness – especially for large circuits.

2022

TCHES

Automated Generation of Masked Hardware Abstract

David Knichel Amir Moradi Nicolai Müller Pascal Sasdrich

Masking has been recognized as a sound and secure countermeasure for cryptographic implementations, protecting against physical side-channel attacks. Even though many different masking schemes have been presented over time, design and implementation of protected cryptographic Integrated Circuits (ICs) remains a challenging task. More specifically, correct and efficient implementation usually requires manual interactions accompanied by longstanding experience in hardware design and physical security. To this end, design and implementation of masked hardware often proves to be an error-prone task for engineers and practitioners. As a result, our novel tool for automated generation of masked hardware (AGEMA) allows even inexperienced engineers and hardware designers to create secure and efficient masked cryptograhic circuits originating from an unprotected design. More precisely, exploiting the concepts of Probe-Isolating Non-Interference (PINI) for secure composition of masked circuits, our tool provides various processing techniques to transform an unprotected design into a secure one, eventually accelerating and safeguarding the process of masking cryptographic hardware. Ultimately, we evaluate our tool in several case studies, emphasizing different trade-offs for the transformation techniques with respect to common performance metrics, such as latency, area, andrandomness.

2022

TCHES

Cryptanalysis of Efficient Masked Ciphers: Applications to Low Latency Abstract

Tim Beyne Siemen Dhooghe Amir Moradi Aein Rezaei Shahmirzadi

This work introduces second-order masked implementation of LED, Midori, Skinny, and Prince ciphers which do not require fresh masks to be updated at every clock cycle. The main idea lies on a combination of the constructions given by Shahmirzadi and Moradi at CHES 2021, and the theory presented by Beyne et al. at Asiacrypt 2020. The presented masked designs only use a minimal number of shares, i.e., three to achieve second-order security, and we make use of a trick to pair a couple of S-boxes to reduce their latency. The theoretical security analyses of our constructions are based on the linear-cryptanalytic properties of the underlying masked primitive as well as SILVER, the leakage verification tool presented at Asiacrypt 2020. To improve this cryptanalytic analysis, we use the noisy probing model which allows for the inclusion of noise in the framework of Beyne et al. We further provide FPGA-based experimental security analysis confirming second-order protection of our masked implementations.

2022

TCHES

Transitional Leakage in Theory and Practice: Unveiling Security Flaws in Masked Circuits Abstract

Nicolai Müller David Knichel Pascal Sasdrich Amir Moradi

Accelerated by the increased interconnection of highly accessible devices, the demand for effective and efficient protection of hardware designs against Side-Channel Analysis (SCA) is ever rising, causing its topical relevance to remain immense in both, academia and industry. Among a wide range of proposed countermeasures against SCA, masking is a highly promising candidate due to its sound foundations and well-understood security requirements. In addition, formal adversary models have been introduced, aiming to accurately capture real-world attack scenarios while remaining sufficiently simple to efficiently reason about the SCA resilience of designs. Here, the d-probing model is the most prominent and well-studied adversary model. Its extension, introduced as the robust d-probing model, covers physical defaults occurring in hardware implementations, particularly focusing on combinational recombinations (glitches), memory recombinations (transitions), and routing recombinations (coupling).With increasing complexity of modern cryptographic designs and logic circuits, formal security verification becomes ever more cumbersome. This started to spark innovative research on automated verification frameworks. Unfortunately, these verification frameworks mostly focus on security verification of hardware circuits in the presence of glitches, but remain limited in identification and verification of transitional leakage. To this end, we extend SILVER, a recently proposed tool for formal security verification of masked logic circuits, to also detect and verify information leakage resulting from combinations of glitches and transitions. Based on extensive case studies, we further confirm the accuracy and practical relevance of our methodology when assessing and verifying information leakage in hardware implementations.

2022

TCHES

Composable Gadgets with Reused Fresh Masks: First-Order Probing-Secure Hardware Circuits with only 6 Fresh Masks Abstract

David Knichel Amir Moradi

Albeit its many benefits, masking cryptographic hardware designs has proven to be a non-trivial and error-prone task, even for experienced engineers. Masked variants of atomic logic gates, like AND or XOR – commonly referred to as gadgets – aim to facilitate the process of masking large circuits by offering free composition while sustaining the overall design’s security in the d-probing adversary model. A wide variety of research has already been conducted to (i) find formal properties a gadget must fulfill to guarantee composability and (ii) construct gadgets that fulfill these properties, while minimizing overhead requirements. In all existing composition frameworks like NI/SNI/PINI and all corresponding gadget realizations, the security argument relies on the fact that each gadget requires individual fresh randomness. Naturally, this approach leads to very high randomness requirements of the resulting composed circuit. In this work, we present composable gadgets with reused fresh masks (COMAR), allowing the composition of any first-order secure hardware circuit utilizing only 6 fresh masks in total. By construction, our newly presented gadgets render individual fresh randomness unnecessary, while retaining free composition and first-order security in the robust probing model. More precisely, we give an instantiation of gadgets realizing arbitrary XOR and AND gates with an arbitrary number of inputs which can be trivially extended to all basic logic gates. With these, we break the linear dependency between the number of (non-linear) gates in a circuit and the randomness requirements, hence offering the designers the possibility to highly optimize a masked circuit’s randomness requirements while keeping error susceptibility to a minimum.

2022

TCHES

Beware of Insufficient Redundancy: An Experimental Evaluation of Code-based FI Countermeasures Abstract

Timo Bartkewitz Sven Bettendorf Thorben Moos Amir Moradi Falk Schellenberg

Fault injection attacks pose a serious threat to cryptographic implementations. Countermeasures beyond sensors and shields usually deploy some form of redundancy to detect or even correct errors. A few years ago, a novel design methodology called Impeccable Circuits has been introduced on how to correctly integrate Concurrent Error Detection (CED) schemes, based on Error-Detection Codes (EDCs), into cryptographic hardware circuits. The underlying adversary model limits attackers to inject at most t single-bit faults. By additionally considering the propagation of faults in combinational circuits, the countermeasure guarantees detection of any faulty computation caused by up to t single-bit faults.In this work, we present an experimental analysis of the Impeccable Circuits countermeasure and its underlying assumptions in modern semiconductor technology. More precisely, we have taken hardware implementations of the lightweight block cipher SKINNY equipped with various forms of the EDC-based CED schemes and realized them as cryptographic co-processors on a 40nm ASIC to experimentally evaluate their resistance to Laser Fault Injection (LFI) attacks. In short, our results show that it is fairly simple to overcome the protection offered by the integrated countermeasures when the length of the code n is smaller than twice its rank k (i.e., no full redundancy). This is not caused by any flaw in the underlying design methodology or concept, but merely demonstrates how easily the defined adversary model can be overcome. In our case, a standard black-box scan over the target using a common single-shot LFI setup is sufficient to occasionally inject more single-bit faults than those bounded by the underlying adversary model when n < 2k. The probability of such events proved to be large enough to perform successful key-recovery attacks via Differential Fault Analysis (DFA) in a matter of hours. Thus, we caution against limiting the redundancy in code-based FI countermeasures to less than the number of bits per word, especially in nanometer technologies, and point out that less-complex countermeasures like duplication showed a higher level of resistance in our experiments at a lower cost.

2022

TCHES

Randomness Optimization for Gadget Compositions in Higher-Order Masking Abstract

Jakob Feldtkeller David Knichel Pascal Sasdrich Amir Moradi Tim Güneysu

Physical characteristics of electronic devices, leaking secret and sensitive information to an adversary with physical access, pose a long-known threat to cryptographic hardware implementations. Among a variety of proposed countermeasures against such Side-Channel Analysis attacks, masking has emerged as a promising, but often costly, candidate. Furthermore, the manual realization of masked implementations has proven error-prone and often introduces flaws, possibly resulting in insecure circuits. In the context of automatic masking, a new line of research emerged, aiming to replace each physical gate with a secure gadget that fulfills well-defined properties, guaranteeing security when interconnected to a large circuit. Unfortunately, those gadgets introduce a significant amount of additional overhead into the design, in terms of area, latency, and randomness requirements.In this work, we present a novel approach to reduce the demands for randomness in such gadget-composed circuits by reusing randomness across gadgets while maintaining security in the probing adversary model. To this end, we embedded the corresponding optimization passes into an Electronic Design Automation toolchain, able to construct, optimize, and implement masked circuits, starting from an unprotected design. As such, our security-aware optimization offers an additional building block for existing or new Electronic Design Automation frameworks, where security is considered a first-class design constraint.

2022

TCHES

PROLEAD: A Probing-Based Hardware Leakage Detection Tool Abstract

Nicolai Müller Amir Moradi

Even today, Side-Channel Analysis attacks pose a serious threat to the security of cryptographic implementations fabricated with low-power and nanoscale feature technologies. Fortunately, the masking countermeasures offer reliable protection against such attacks based on simple security assumptions. However, the practical application of masking to a cryptographic algorithm is not trivial, and the designer may overlook possible security flaws, especially when masking a complex circuit. Moreover, abstract models like probing security allow formal verification tools to evaluate masked implementations. However, this is computationally too expensive when dealing with circuits that are not based on composable gadgets. Unfortunately, using composable gadgets comes at some area overhead. As a result, such tools can only evaluate subcircuits, not their compositions, which can become the Achilles’ heel of such masked implementations.In this work, we apply logic simulations to evaluate the security of masked implementations which are not necessarily based on composable gadgets. We developed PROLEAD, an automated tool analyzing the statistical independence of simulated intermediates probed by a robust probing adversary. Compared to the state of the art, our approach (1) does not require any power model as only the state of a gate-level netlist is simulated, (2) can handle masked full cipher implementations, and (3) can detect flaws related to the combined occurrence of glitches and transitions as well as higher-order multivariate leakages. With PROLEAD, we can evaluate masked mplementations that are too complex for existing formal verification tools while being in line with the robust probing model. Through PROLEAD, we have detected security flaws in several publicly-available masked implementations, which have been claimed to be robust probing secure.

2022

TCHES

Low-Latency and Low-Randomness Second-Order Masked Cubic Functions Abstract

Aein Rezaei Shahmirzadi Siemen Dhooghe Amir Moradi

Masking schemes are the most popular countermeasure to mitigate Side-Channel Analysis (SCA) attacks. Compared to software, their hardware implementations require certain considerations with respect to physical defaults, such as glitches. To counter this extended leakage effect, the technique known as Threshold Implementation (TI) has proven to be a reliable solution. However, its efficiency, namely the number of shares, is tied to the algebraic degree of the target function. As a result, the application of TI may lead to unaffordable implementation costs. This dependency is relaxed by the successor schemes where the minimum number of d + 1 shares suffice for dth-order protection independent of the function’s algebraic degree. By this, although the number of input shares is reduced, the implementation costs are not necessarily low due to their high demand for fresh randomness. It becomes even more challenging when a joint low-latency and low-randomness cost is desired. In this work, we provide a methodology to realize the second-order glitch-extended probing-secure implementation of cubic functions with three shares while allowing to reuse fresh randomness. This enables us to construct low-latency second-order secure implementations of several popular lightweight block ciphers, including Skinny, Midori, and Prince, with a very limited number of fresh masks. Notably, compared to state-of-the-art equivalent implementations, our designs lower the latency in terms of the number of clock cycles while keeping randomness costs low.

2021

TCHES

New First-Order Secure AES Performance Records 📺 Abstract

Aein Rezaei Shahmirzadi Dušan Božilov Amir Moradi

Being based on a sound theoretical basis, masking schemes are commonly applied to protect cryptographic implementations against Side-Channel Analysis (SCA) attacks. Constructing SCA-protected AES, as the most widely deployed block cipher, has been naturally the focus of several research projects, with a direct application in industry. The majority of SCA-secure AES implementations introduced to the community opted for low area and latency overheads considering Application-Specific Integrated Circuit (ASIC) platforms. Albeit a few, those which particularly targeted Field Programmable Gate Arrays (FPGAs) as the implementation platform yield either a low throughput or a not-highly secure design.In this work, we fill this gap by introducing first-order glitch-extended probing secure masked AES implementations highly optimized for FPGAs, which support both encryption and decryption. Compared to the state of the art, our designs efficiently map the critical non-linear parts of the masked S-box into the built-in Block RAMs (BRAMs).The most performant variant of our constructions accomplishes five first-order secure AES encryptions/decryptions simultaneously in 50 clock cycles. Compared to the equivalent state-of-the-art designs, this leads to at least 70% reduction in utilization of FPGA resources (slices) at the cost of occupying BRAMs. Last but not least, we provide a wide range of such secure and efficient implementations supporting a large set of applications, ranging from low-area to high-throughput.

2021

TCHES

Let’s Take it Offline: Boosting Brute-Force Attacks on iPhone’s User Authentication through SCA 📺 Abstract

Oleksiy Lisovets David Knichel Thorben Moos Amir Moradi

In recent years, smartphones have become an increasingly important storage facility for personal sensitive data ranging from photos and credentials up to financial and medical records like credit cards and person’s diseases. Trivially, it is critical to secure this information and only provide access to the genuine and authenticated user. Smartphone vendors have already taken exceptional care to protect user data by the means of various software and hardware security features like code signing, authenticated boot chain, dedicated co-processor and integrated cryptographic engines with hardware fused keys. Despite these obstacles, adversaries have successfully broken through various software protections in the past, leaving only the hardware as the last standing barrier between the attacker and user data. In this work, we build upon existing software vulnerabilities and break through the final barrier by performing the first publicly reported physical Side-Channel Analysis (SCA) attack on an iPhone in order to extract the hardware-fused devicespecific User Identifier (UID) key. This key – once at hand – allows the adversary to perform an offline brute-force attack on the user passcode employing an optimized and scalable implementation of the Key Derivation Function (KDF) on a Graphics Processing Unit (GPU) cluster. Once the passcode is revealed, the adversary has full access to all user data stored on the device and possibly in the cloud.As the software exploit enables acquisition and processing of hundreds of millions oftraces, this work further shows that an attacker being able to query arbitrary many chosen-data encryption/decryption requests is a realistic model, even for compact systems with advanced software protections, and emphasizes the need for assessing resilience against SCA for a very high number of traces.

2021

TCHES

Inconsistency of Simulation and Practice in Delay-based Strong PUFs 📺 Abstract

Anita Aghaie Amir Moradi

The developments in the areas of strong Physical Unclonable Functions (PUFs) predicate an ongoing struggle between designers and attackers. Such a combat motivated the atmosphere of open research, hence enhancing PUF designs in the presence of Machine Learning (ML) attacks. As an example of this controversy, at CHES 2019, a novel delay-based PUF (iPUF) has been introduced and claimed to be resistant against various ML and reliability attacks. At CHES 2020, a new divide-and-conquer modeling attack (splitting iPUF) has been presented showing the vulnerability of even large iPUF variants.Such attacks and analyses are naturally examined purely in the simulation domain, where some metrics like uniformity are assumed to be ideal. This assumption is motivated by a common belief that implementation defects (such as bias) may ease the attacks. In this paper, we highlight the critical role of uniformity in the success of ML attacks, and for the first time present a case where the bias originating from implementation defects hardens certain learning problems in complex PUF architectures. We present the result of our investigations conducted on a cluster of 100 Xilinx Artix 7 FPGAs, showing the incapability of the splitting iPUF attack to model even small iPUF instances when facing a slight non-uniformity. In fact, our findings imply that non-ideal conditions due to implementation defects should also be considered when developing an attack vector on complex PUF architectures like iPUF. On the other hand, we observe a relatively low uniqueness even when following the suggestions made by the iPUF’s original authors with respect to the FPGA implementations, which indeed questions the promised physical unclonability.

2021

TCHES

DL-LA: Deep Learning Leakage Assessment: A modern roadmap for SCA evaluations 📺 Abstract

Thorben Moos Felix Wegener Amir Moradi

In recent years, deep learning has become an attractive ingredient to side-channel analysis (SCA) due to its potential to improve the success probability or enhance the performance of certain frequently executed tasks. One task that is commonly assisted by machine learning techniques is the profiling of a device’s leakage behavior in order to carry out a template attack. At CHES 2019, deep learning has also been applied to non-profiled scenarios for the first time, extending its reach within SCA beyond template attacks. The proposed method, called DDLA, has some tempting advantages over traditional SCA due to merits inherited from (convolutional) neural networks. Most notably, it greatly reduces the need for pre-processing steps< when the SCA traces are misaligned or when the leakage is of a multivariate nature. However, similar to traditional attack scenarios the success of this approach highly depends on the correct choice of a leakage model and the intermediate value to target. In this work we explore, for the first time in literature, whether deep learning can similarly be used as an instrument to advance another crucial (non-profiled) discipline of SCA which is inherently independent of leakage models and targeted intermediates, namely leakage assessment. In fact, given the simple classification-based nature of common leakage assessment techniques, in particular distinguishing two groups fixed-vs-random or fixed-vs-fixed, it comes as a surprise that machine learning has not been brought into this context, yet. Our contribution is the development of the first full leakage assessment methodology based on deep learning. It gives the evaluator the freedom to not worry about location, alignment and statistical order of the leakages and easily covers multivariate and horizontal patterns as well. We test our approach against a number of case studies based on FPGA, ASIC and μC implementations of the PRESENT block cipher, equipped with state-of-the-art SCA countermeasures. Our results clearly show that the proposed methodology and network structures are robust across all case studies and outperform the classical detection approaches (t-test and X2-test) in all considered scenarios.

2021

TCHES

Second-Order SCA Security with almost no Fresh Randomness 📺 Abstract

Aein Rezaei Shahmirzadi Amir Moradi

Masking schemes are among the most popular countermeasures against Side-Channel Analysis (SCA) attacks. Realization of masked implementations on hardware faces several difficulties including dealing with glitches. Threshold Implementation (TI) is known as the first strategy with provable security in presence of glitches. In addition to the desired security order d, TI defines the minimum number of shares to also depend on the algebraic degree of the target function. This may lead to unaffordable implementation costs for higher orders.For example, at least five shares are required to protect the smallest nonlinear function against second-order attacks. By cuttingsuch a dependency, the successor schemes are able to achieve the same security level by just d + 1 shares, at the cost of high demand for fresh randomness, particularly at higher orders. In this work, we provide a methodology to realize the second-order glitch-extended probing-secure implementation of a group of quadratic functions with three shares and no fresh randomness. This allows us to construct second-order secure implementations of several cryptographic primitives with very limited number of fresh masks, including Keccak, SKINNY, Midori, PRESENT, and PRINCE.

2021

TCHES

Countermeasures against Static Power Attacks: – Comparing Exhaustive Logic Balancing and Other Protection Schemes in 28 nm CMOS – 📺 Abstract

Thorben Moos Amir Moradi

In recent years it has been demonstrated convincingly that the standby power of a CMOS chip reveals information about the internally stored and processed data. Thus, for adversaries who seek to extract secrets from cryptographic devices via side-channel analysis, the static power has become an attractive quantity to obtain. Most works have focused on the destructive side of this subject by demonstrating attacks. In this work, we examine potential solutions to protect circuits from silently leaking sensitive information during idle times. We focus on countermeasures that can be implemented using any common digital standard cell library and do not consider solutions that require full-custom or analog design flow. In particular, we evaluate and compare a set of five distinct standard-cell-based hiding countermeasures, including both, randomization and equalization techniques. We then combine the hiding countermeasures with state-of-the-art hardware masking in order to amplify the noise level and achieve a high resistance against attacks. An important part of our contribution is the proposal and evaluation of the first ever standard-cell-based balancing scheme which achieves perfect data-independence on paper, i.e., in absence of intra-die process variations and aging effects. We call our new countermeasure Exhaustive Logic Balancing (ELB). While this scheme, applied to a threshold implementation, provides the highest level of resistance in our experiments, it may not be the most cost effective option due to the significant resource overhead associated. All evaluated countermeasures and combinations thereof are applied to a serialized hardware implementation of the PRESENT block cipher and realized as cryptographic co-processors on a 28nm CMOS ASIC prototype. Our experimental results are obtained through real-silicon measurements of a fabricated die of the ASIC in a temperature-controlled environment using a source measure unit (SMU). We believe that our elaborate comparison serves as a useful guideline for hardware designers to find a proper tradeoff between security and cost for almost any application.

2021

TCHES

Low-Latency Keccak at any Arbitrary Order 📺 Abstract

Sara Zarei Aein Rezaei Shahmirzadi Hadi Soleimany Raziyeh Salarifard Amir Moradi

Correct application of masking on hardware implementation of cryptographic primitives necessitates the instantiation of registers in order to achieve the non-completeness (commonly said to stop the propagation of glitches). This sometimes leads to a high latency overhead, making the implementation not necessarily suitable for the underlying application. As a concrete example, this holds for Keccak. Application of d + 1 Domain Oriented Masking (DOM) on a round-based implementation of Keccak leads to the introduction of two register stages per round, i.e., two times higher latency. On the other hand, Rhythmic-Keccak, introduced in CHES 2018, unrolls two rounds to half the latency compared to an unprotected ordinary round-based implementation. To that end, td + 1 masking is used which requires a notable area, and – apart from the difficulty to construct – its extension to higher orders seems beyond the bounds of feasibility.In this paper, we focus on d + 1 masking and introduce a methodology which enables us to stay with the latency of an unprotected round-based implementation, i.e., one register stage per round. While being secure under glitch-extended probing model, we provide a general design where the desired security order can be easily adjusted without any effect on the above-given latency. Compared to the Rhythmic-Keccak, the synthesis results show that our first-order design is able to accomplish the entire operations of Keccak-f[200] in the same period of time while decreasing the area by 74.5%. Notably, our implementations achieve around 30% less delay compared to the corresponding original DOM-Keccak designs.

2021

TCHES

FIVER – Robust Verification of Countermeasures against Fault Injections 📺 Abstract

Jan Richter-Brockmann Aein Rezaei Shahmirzadi Pascal Sasdrich Amir Moradi Tim Güneysu

Fault Injection Analysis is seen as a powerful attack against implementations of cryptographic algorithms. Over the last two decades, researchers proposed a plethora of countermeasures to secure such implementations. However, the design process and implementation are still error-prone, complex, and manual tasks which require long-standing experience in hardware design and physical security. Moreover, the validation of the claimed security is often only done by empirical testing in a very late stage of the design process. To prevent such empirical testing strategies, approaches based on formal verification are applied instead providing the designer early feedback.In this work, we present a fault verification framework to validate the security of countermeasures against fault-injection attacks designed for ICs. The verification framework works on netlist-level, parses the given digital circuit into a model based on Binary Decision Diagrams, and performs symbolic fault injections. This verification approach constitutes a novel strategy to evaluate protected hardware designs against fault injections offering new opportunities as performing full analyses under a given fault models.Eventually, we apply the proposed verification framework to real-world implementations of well-established countermeasures against fault-injection attacks. Here, we consider protected designs of the lightweight ciphers CRAFT and LED-64 as well as AES. Due to several optimization strategies, our tool is able to perform more than 90 million fault injections in a single-round CRAFT design and evaluate the security in under 50 min while the symbolic simulation approach considers all 2128 primary inputs.

2021

TCHES

The SPEEDY Family of Block Ciphers: Engineering an Ultra Low-Latency Cipher from Gate Level for Secure Processor Architectures 📺 Abstract

Gregor Leander Thorben Moos Amir Moradi Shahram Rasoolzadeh

We introduce SPEEDY, a family of ultra low-latency block ciphers. We mix engineering expertise into each step of the cipher’s design process in order to create a secure encryption primitive with an extremely low latency in CMOS hardware. The centerpiece of our constructions is a high-speed 6-bit substitution box whose coordinate functions are realized as two-level NAND trees. In contrast to other low-latency block ciphers such as PRINCE, PRINCEv2, MANTIS and QARMA, we neither constrain ourselves by demanding decryption at low overhead, nor by requiring a super low area or energy. This freedom together with our gate- and transistor-level considerations allows us to create an ultra low-latency cipher which outperforms all known solutions in single-cycle encryption speed. Our main result, SPEEDY-6-192, is a 6-round 192-bit block and 192-bit key cipher which can be executed faster in hardware than any other known encryption primitive (including Gimli in Even-Mansour scheme and the Orthros pseudorandom function) and offers 128-bit security. One round more, i.e., SPEEDY-7-192, provides full 192-bit security. SPEEDY primarily targets hardware security solutions embedded in high-end CPUs, where area and energy restrictions are secondary while high performance is the number one priority.

2020

JOFC

Spin Me Right Round Rotational Symmetry for FPGA-Specific AES: Extended Version Abstract

Felix Wegener Lauren De Meyer Amir Moradi

The effort in reducing the area of AES implementations has largely been focused on application-specific integrated circuits (ASICs) in which a tower field construction leads to a small design of the AES S-box. In contrast, a naive implementation of the AES S-box has been the status-quo on field-programmable gate arrays (FPGAs). A similar discrepancy holds for masking schemes—a well-known side-channel analysis countermeasure—which are commonly optimized to achieve minimal area in ASICs. In this paper, we demonstrate a representation of the AES S-box exploiting rotational symmetry which leads to a 50% reduction in the area footprint on FPGA devices. We present new AES implementations which improve on the state-of-the-art and explore various trade-offs between area and latency. For instance, at the cost of increasing 4.5 times the latency, one of our design variants requires 25% less look-up tables (LUTs) than the smallest known AES on Xilinx FPGAs by Sasdrich and Güneysu at ASAP 2016. We further explore the protection of such implementations against side-channel attacks. We introduce a generic methodology for masking any n -bit Boolean functions of degree t with protection order d . The methodology is exact for first-order and heuristic for higher orders. Its application to our new construction of the AES S-box allows us to improve previous results and introduce the smallest first-order masked AES implementation on Xilinx FPGAs, to date.

2020

TOSC

SKINNY-AEAD and SKINNY-Hash 📺 Abstract

Christof Beierle Jérémy Jean Stefan Kölbl Gregor Leander Amir Moradi Thomas Peyrin Yu Sasaki Pascal Sasdrich Siang Meng Sim

We present the family of authenticated encryption schemes SKINNY-AEAD and the family of hashing schemes SKINNY-Hash. All of the schemes employ a member of the SKINNY family of tweakable block ciphers, which was presented at CRYPTO 2016, as the underlying primitive. In particular, for authenticated encryption, we show how to instantiate members of SKINNY in the Deoxys-I-like ΘCB3 framework to fulfill the submission requirements of the NIST lightweight cryptography standardization process. For hashing, we use SKINNY to build a function with larger internal state and employ it in a sponge construction. To highlight the extensive amount of third-party analysis that SKINNY obtained since its publication, we briefly survey the existing cryptanalysis results for SKINNY-128-256 and SKINNY-128-384 as of February 2020. In the last part of the paper, we provide a variety of ASIC implementations of our schemes and propose new simple SKINNY-AEAD and SKINNY-Hash variants with a reduced number of rounds while maintaining a very comfortable security margin. https://csrc.nist.gov/Projects/Lightweight-Cryptography

2020

ASIACRYPT

SILVER - Statistical Independence and Leakage Verification 📺 Abstract

David Knichel Pascal Sasdrich Amir Moradi

Implementing cryptographic functions securely in the presence of physical adversaries is still a challenge although a lion's share of research in the physical security domain has been put in development of countermeasures. Among several protection schemes, masking has absorbed the most attention of research in both academic and industrial communities, due to its theoretical foundation allowing to provide proofs or model the achieved security level. In return, masking schemes are difdicult to implement as the implementation process often is manual, complex, and error-prone. This motivated the need for formal verification tools that allow the designers and engineers to analyze and verify the designs before manufacturing. In this work, we present a new framework to analyze and verify masked implementations against various security notions using different security models as reference. In particular, our framework { which directly processes the resulting gate-level netlist of a hardware synthesis { particularly relies on Reduced Ordered Binary Decision Diagrams (ROBDDs) and the concept of statistical independence of probability distributions. Compared to existing tools, our framework captivates due to its simplicity, accuracy, and functionality while still having a reasonable efficiency for many applications and common use-cases.

2020

TCHES

Re-Consolidating First-Order Masking Schemes: Nullifying Fresh Randomness 📺 Abstract

Aein Rezaei Shahmirzadi Amir Moradi

Application of masking, known as the most robust and reliable countermeasure to side-channel analysis attacks, on various cryptographic algorithms has dedicated a lion’s share of research to itself. The difficulty originates from the fact that the overhead of application of such an algorithmic-level countermeasure might not be affordable. This includes the area- and latency overheads and the amount of fresh randomness required to fulfill the resulting design’s security properties. There are already techniques applicable in hardware platforms that consider glitches into account. Among them, classical threshold implementations force the designers to use at least three shares in the underlying masking. The other schemes, which can deal with two shares, often necessitates the use of fresh randomness.Here, in this work, we present a technique allowing us to use two shares to realize the first-order glitch-extended probing secure masked realization of several functions, including the S-box of Midori, PRESENT, PRINCE, and AES ciphers without any fresh randomness.

2019

TCHES

Glitch-Resistant Masking Revisited 📺 ★ Abstract

Thorben Moos Amir Moradi Tobias Schneider François-Xavier Standaert

Implementing the masking countermeasure in hardware is a delicate task. Various solutions have been proposed for this purpose over the last years: we focus on Threshold Implementations (TIs), Domain-Oriented Masking (DOM), the Unified Masking Approach (UMA) and Generic Low Latency Masking (GLM). The latter generally come with innovative ideas to cope with physical defaults such as glitches. Yet, and in contrast to the situation in software-oriented masking, these schemes have not been formally proven at arbitrary security orders and their composability properties were left unclear. So far, only a 2-cycle implementation of the seminal masking scheme by Ishai, Sahai and Wagner has been shown secure and composable in the robust probing model – a variation of the probing model aimed to capture physical defaults such as glitches – for any number of shares.In this paper, we argue that this lack of proofs for TIs, DOM, UMA and GLM makes the interpretation of their security guarantees difficult as the number of shares increases. For this purpose, we first put forward that the higher-order variants of all these schemes are affected by (local or composability) security flaws in the (robust) probing model, due to insufficient refreshing. We then show that composability and robustness against glitches cannot be analyzed independently. We finally detail how these abstract flaws translate into concrete (experimental) attacks, and discuss the additional constraints robust probing security implies on the need of registers. Despite not systematically leading to improved complexities at low security orders, e.g., with respect to the required number of measurements for a successful attack, we argue that these weaknesses provide a case for the need of security proofs in the robust probing model (or a similar abstraction) at higher security orders.

2019

TOSC

CRAFT: Lightweight Tweakable Block Cipher with Efficient Protection Against DFA Attacks 📺 Abstract

Christof Beierle Gregor Leander Amir Moradi Shahram Rasoolzadeh

Traditionally, countermeasures against physical attacks are integrated into the implementation of cryptographic primitives after the algorithms have been designed for achieving a certain level of cryptanalytic security. This picture has been changed by the introduction of PICARO, ZORRO, and FIDES, where efficient protection against Side-Channel Analysis (SCA) attacks has been considered in their design. In this work we present the tweakable block cipher CRAFT: the efficient protection of its implementations against Differential Fault Analysis (DFA) attacks has been one of the main design criteria, while we provide strong bounds for its security in the related-tweak model. Considering the area footprint of round-based hardware implementations, CRAFT outperforms the other lightweight ciphers with the same state and key size. This holds not only for unprotected implementations but also when fault-detection facilities, side-channel protection, and their combination are integrated into the implementation. In addition to supporting a 64-bit tweak, CRAFT has the additional property that the circuit realizing the encryption can support the decryption functionality as well with very little area overhead.

2019

TCHES

Exploring the Effect of Device Aging on Static Power Analysis Attacks 📺 Abstract

Naghmeh Karimi Thorben Moos Amir Moradi

Vulnerability of cryptographic devices to side-channel analysis attacks, and in particular power analysis attacks has been extensively studied in the recent years. Among them, static power analysis attacks have become relevant with moving towards smaller technology nodes for which the static power is comparable to the dynamic power of a chip, or even dominant in future technology generations. The magnitude of the static power of a chip depends on the physical characteristics of transistors (e.g., the dimensions) as well as operating conditions (e.g., the temperature) and the electrical specifications such as the threshold voltage. In fact, the electrical specifications of transistors deviate from their originally intended ones during device lifetime due to aging mechanisms. Although device aging has been extensively investigated from reliability point of view, the impact of aging on the security of devices, and in particular on the vulnerability of devices to power analysis attacks are yet to be considered.This paper fills the gap and investigates how device aging can affect the susceptibility of a chip exposed to static power analysis attacks. To this end, we conduct both, simulation and practical experiments on real silicon. The experimental results are extracted from a realization of the PRESENT cipher fabricated using a 65nm commercial standard cell library. The results show that the amount of exploitable leakage through the static power consumption as a side channel is reduced when the device is aged. This can be considered as a positive development which can (even slightly) harden such static power analysis attacks. Additionally, this result is of great interest to static power side-channel adversaries since state-of-the-art leakage current measurements are conducted over long time periods under increased working temperatures and supply voltages to amplify the exploitable information, which certainly fuels aging-related device degradation.

2018

TCHES

Hardware Masking, Revisited 📺 Abstract

Thomas De Cnudde Maik Ender Amir Moradi

MaskingHardware masking schemes have shown many advances in the past few years. Through a series of publications their implementation cost has dropped significantly and flaws have been fixed where present. Despite these advancements it seems that a limit has been reached when implementing masking schemes on FPGA platforms. Indeed, even with a correct transition from the masking scheme to the masking realization (i.e., when the implementation is not buggy) it has been shown that the implementation can still exhibit unexpected leakage, e.g., through variations in placement and routing. In this work, we show that the reason for such unexpected leakages is the violation of an underlying assumption made by all masking schemes, i.e., that the leakage of the circuit is a linear sum of leakages associated to each share. In addition to the theory of VLSI which supports our claim, we perform a wide range of experiments based on an FPGA) to find out under what circumstances this causes a masked hardware implementation to show undesirable leakage. We further illustrate case studies, where publicly-known secure designs exhibit first-order leakage when being operated at certain conditions.

2018

TCHES

Leakage Detection with the x2-Test 📺 Abstract

Amir Moradi Bastian Richter Tobias Schneider François-Xavier Standaert

We describe how Pearson’s χ2-test can be used as a natural complement to Welch’s t-test for black box leakage detection. In particular, we show that by using these two tests in combination, we can mitigate some of the limitations due to the moment-based nature of existing detection techniques based on Welch’s t-test (e.g., for the evaluation of higher-order masked implementations with insufficient noise). We also show that Pearson’s χ2-test is naturally suited to analyze threshold implementations with information lying in multiple statistical moments, and can be easily extended to a distinguisher for key recovery attacks. As a result, we believe the proposed test and methodology are interesting complementary ingredients of the side-channel evaluation toolbox, for black box leakage detection and non-profiled attacks, and as a preliminary before more demanding advanced analyses.

2018

TCHES

Spin Me Right Round Rotational Symmetry for FPGA-Specific AES Abstract

Lauren De Meyer Amir Moradi Felix Wegener

The effort in reducing the area of AES implementations has largely been focused on Application-Specific Integrated Circuits (ASICs) in which a tower field construction leads to a small design of the AES S-box. In contrast, a naïve implementation of the AES S-box has been the status-quo on Field-Programmable Gate Arrays (FPGAs). A similar discrepancy holds for masking schemes – a wellknown side-channel analysis countermeasure – which are commonly optimized to achieve minimal area in ASICs.In this paper we demonstrate a representation of the AES S-box exploiting rotational symmetry which leads to a 50% reduction of the area footprint on FPGA devices. We present new AES implementations which improve on the state of the art and explore various trade-offs between area and latency. For instance, at the cost of increasing 4.5 times the latency, one of our design variants requires 25% less look-up tables (LUTs) than the smallest known AES on Xilinx FPGAs by Sasdrich and Güneysu at ASAP 2016. We further explore the protection of such implementations against first-order side-channel analysis attacks. Targeting the small area footprint on FPGAs, we introduce a heuristic-based algorithm to find a masking of a given function with d + 1 shares. Its application to our new construction of the AES S-box allows us to introduce the smallest masked AES implementation on Xilinx FPGAs, to-date.

2017

ASIACRYPT

The First Thorough Side-Channel Hardware Trojan

Maik Ender Samaneh Ghandali Amir Moradi Christof Paar

2017

CHES

Bit-Sliding: A Generic Technique for Bit-Serial Implementations of SPN-based Primitives Abstract

Jérémy Jean Amir Moradi Thomas Peyrin Pascal Sasdrich

Area minimization is one of the main efficiency criterion for lightweight encryption primitives. While reducing the implementation data path is a natural strategy for achieving this goal, Substitution-Permutation Network (SPN) ciphers are usually hard to implement in a bit-serial way (1-bit data path). More generally, this is hard for any data path smaller than its Sbox size, since many scan flip-flops would be required for storage, which are more area-expensive than regular flip-flops.In this article, we propose the first strategy to obtain extremely small bit-serial ASIC implementations of SPN primitives. Our technique, which we call bit-sliding, is generic and offers many new interesting implementation trade-offs. It manages to minimize the area by reducing the data path to a single bit, while avoiding the use of many scan flip-flops.Following this general architecture, we could obtain the first bit-serial and the smallest implementation of AES-128 to date (1560 GE for encryption only, and 1738 GE for encryption and decryption with IBM 130 nm standard-cell library), greatly improving over the smallest known implementations (about 30% decrease), making AES-128 competitive to many ciphers specifically designed for lightweight cryptography. To exhibit the generality of our strategy, we also applied it to the PRESENT and SKINNY block ciphers, again offering the smallest implementations of these ciphers thus far, reaching an area as low as 1065 GE for a 64-bit block 128-bit key cipher. It is also to be noted that our bit-sliding seems to obtain very good power consumption figures, which makes this implementation strategy a good candidate for passive RFID tags.

2016

CRYPTO

ParTI - Towards Combined Hardware Countermeasures Against Side-Channel and Fault-Injection Attacks 📺

Tobias Schneider Amir Moradi Tim Güneysu

2016

CRYPTO

The SKINNY Family of Block Ciphers and Its Low-Latency Variant MANTIS 📺

Christof Beierle Jérémy Jean Stefan Kölbl Gregor Leander Amir Moradi Thomas Peyrin Yu Sasaki Pascal Sasdrich Siang Meng Sim

2016

FSE

White-Box Cryptography in the Gray Box - - A Hardware Implementation and its Side Channels -

Pascal Sasdrich Amir Moradi Tim Güneysu