Tim Güneysu

CryptoDB

Tim Güneysu

Publications and invited talks

Year

Venue

Title

2025

EUROCRYPT

INDIANA - Verifying (Random) Probing Security through Indistinguishability Analysis Abstract

Christof Beierle Jakob Feldtkeller Anna Guinet Tim Güneysu Gregor Leander Jan Richter-Brockmann Pascal Sasdrich

While masking is a widely used defense against passive side-channel attacks, its secure implementation in hardware continues to be a manual, complex, and error-prone process. This paper introduces INDIANA, a comprehensive security verification methodology for hardware masking. Our results include a hardware verification tool, enabling a complete analysis of simulation-based security in the glitch-extended probing model and intra-cycle estimations for leakage probabilities in the random probing model. Notably, INDIANA is the first framework to analyze arbitrary masked circuits in both models, even at the scale of full SPN cipher rounds (e.g., AES), while delivering exact verification results. To achieve accurate and comprehensive verification, we propose a partitionable probing distinguisher that allows for fast validation of probe tuples, surpassing current methods that rely on statistical independence. Furthermore, our approach naturally supports extensions to the random probing model by utilizing Fast Fourier-Hadamard Transformations (FHTs). Benchmark results show that INDIANA competes effectively with leading probing model verification tools, such as ironMask, maskVerif, and VERICA. INDIANA is also the first tool that is capable to provide intra-cycle estimations of random probing leakage probabilities for large-scale masked circuits.

2025

TCHES

KeyVisor – A Lightweight ISA Extension for Protected Key Handles with CPU-enforced Usage Policies Abstract

Fabian Schwarz Jan Philipp Thoma Christian Rossow Tim Güneysu

The confidentiality of cryptographic keys is essential for the security of protection schemes used for communication, file encryption, and outsourced compu- tation. Beyond cryptanalytic attacks, adversaries can steal keys from memory via software exploits or side channels, enabling them to, e.g., manipulate confidential information or impersonate key owners. Therefore, existing defenses protect keys in dedicated devices or isolated memory, or store them only in encrypted form. However, these designs often provide unfavorable tradeoffs, sacrificing performance, fine-grained access control, or deployability.In this paper, we present KeyVisor, a lightweight Instruction Set Architecture ( ISA) extension that securely offloads the handling of symmetric crypto keys to the CPU. KeyVisor provides CPU instructions that enable applications to request protected key handles and perform AEAD cipher operations on them. The underlying keys are accessible only by KeyVisor, and thus never leak to memory. KeyVisor’s direct CPU integration enables fast crypto operations and hardware-enforced key usage restrictions, e.g., keys usable only for de-/encryption, with a limited lifetime, or with a process binding. Furthermore, privileged software, e.g., the monitor firmware of TEEs, can revoke keys or bind them to a specific process/TEE. We implement KeyVisor for RISC-V based on RocketChip, evaluate its performance, and demonstrate real-world use cases, including key-value databases, automotive feature licensing, and a read-only network middlebox.

2025

TCHES

HADES: Automated Hardware Design Exploration for Cryptographic Primitives Abstract

Fabian Buschkowski Georg Land Niklas Höher Jan Richter-Brockmann Pascal Sasdrich Tim Güneysu

While formal constructions for cryptographic schemes have steadily evolved and emerged over the past decades, the design and implementation of efficient and secure hardware instances are still mostly manual, tedious, and intuition-driven processes. With the increasing complexity of modern cryptography, e.g., Post-Quantum Cryptography (PQC) schemes, and consideration of physical implementation attacks, e.g., Side-Channel Analysis (SCA), the design space often grows exorbitantly without developers being able to weigh all design options.This emphasizes the evident necessity for tool-assisted Design Space Exploration (DSE) for efficient and secure cryptographic hardware. To address this demand, we present the HADES framework. This tool systematically traverses the design space driven by security requirements, rapidly predicts user-defined performance metrics, e.g., area footprint or cycle-accurate latency, and instantiates the most suitable candidate in a synthesizable Hardware Description Language (HDL).We demonstrate the capabilities of our framework by applying our proof-of-concept implementation to a wide-ranging selection of symmetric and PQC schemes, including the ChaCha20 stream cipher and the PQC standard Kyber. Notably, for these schemes, we present the first hardware implementations featuring arbitrary-order masking.

2025

TCHES

pracy: A Practical Compiler for Attribute-Based Encryption in Python Abstract

Sven Argo Marloes Venema Adrian Ackermann Tim Güneysu

Attribute-based encryption (ABE) is a versatile primitive that has been considered in many applications to enforce access control cryptographically. To actually benefit from ABE in practice, we require implementations of schemes that satisfy all the properties that are needed. Many theoretical advancements have been made to attain such properties, ultimately resulting in powerful abstractions such as pair encodings. To build an ABE scheme, we use a compiler (in the theoretical sense), which transforms a provably secure pair encoding scheme into a provably secure ABE scheme. Although several such compilers have been introduced, they all abstract away many details that are relevant for engineers, which can hinder the implementation of schemes in practice.To address this problem, we propose pracy, which is a tool that automatically implements an ABE scheme from an input pair encoding scheme. To achieve this, we first note that we need to overcome a general issue in any automation efforts – including automated optimization and security analysis – in the field of pairing-based cryptography. In particular, there exist no parsers that properly model the interaction between the predicate and the pair encodings. Therefore, we devise a new formal model and type system, which capture this interaction in a way that is compatible with automated implementation efforts. To illustrate the feasibility of our model and system, we construct pracy, which is a (practical) compiler in Python that can implement ABE schemes in multiple target programming languages such as Python and C/C++. With pracy, we not only make the implementation of ABE schemes from pair encodings more accessible to practitioners, we realize the potential that pair encodings have to simplify implementation efforts.

2025

TCHES

ABE Cubed: Advanced Benchmarking Extensions for ABE Squared Abstract

Sven Argo Marloes Venema Doreen Riepel Tim Güneysu Diego F. Aranha

Since attribute-based encryption (ABE) was proposed in 2005, it has established itself as a valuable tool in the enforcement of access control. For practice, it is important that ABE satisfies many desirable properties such as multi-authority and negations support. Nowadays, we can attain these properties simultaneously, but none of these schemes have been implemented. Furthermore, although simpler schemes have been optimized extensively on a structural level, there is still much room for improvement for these more advanced schemes. However, even if we had schemes with such structural improvements, we would not have a way to benchmark and compare them fairly to measure the effect of such improvements. The only framework that aims to achieve this goal, ABE Squared (TCHES ’22), was designed with simpler schemes in mind.In this work, we propose the ABE Cubed framework, which provides advanced benchmarking extensions for ABE Squared. To motivate our framework, we first apply structural improvements to the decentralized ciphertext-policy ABE scheme supporting negations presented by Riepel, Venema and Verma (ACM CCS ’24), which results in five new schemes with the same properties. We use these schemes to uncover and bridge the gaps in the ABE Squared framework. In particular, we observe that advanced schemes depend on more “variables” that affect the schemes’ efficiency in different dimensions. Whereas ABE Squared only considered one dimension (as was sufficient for the schemes considered there), we devise a benchmarking strategy that allows us to analyze the schemes in multiple dimensions. As a result, we obtain a more complete overview on the computational efficiency of the schemes, and ultimately, this allows us to make better-founded choices about which schemes provide the best efficiency trade-offs for practice.

2025

TCHES

A New Perspective on Key Switching for BGV-like Schemes Abstract

Johannes Mono Tim Güneysu

Fully homomorphic encryption is a promising approach when computing on encrypted data, especially when sensitive data is involved. For BFV, BGV, and CKKS, three state-of-the-art encryption schemes, the most costly homomorphic primitive is the so-called key switching. While a decent amount of research has been devoted to optimizing other aspects of these schemes, key switching has gone largely untouched. One exception has been a recent work [KLSS23] introducing a new double-decomposition technique. Their contributions are an important addition to the current state-of-the-art with one flaw: They take a limited perspective on key switching parameters and their asymptotic complexity which leads to incorrect conclusions about how effective their approach really is. In our work, we deep dive into key switching and correct, enhance, and improve the current state-of-the-art. We provide a new perspective on key switching parameters for the single- and doubledecomposition techniques, respectively, and show that the former outperforms the latter in most scenarios. Additionally, we revisit an idea by Gentry, Halevi, and Smart [GHS12b] and reduce the number of multiplications.

2024

TCHES

HAETAE: Shorter Lattice-Based Fiat-Shamir Signatures Abstract

Jung Hee Cheon Hyeongmin Choe Julien Devevey Tim Güneysu Dongyeon Hong Markus Krausz Georg Land Marc Möller Damien Stehlé MinJune Yi

We present HAETAE (Hyperball bimodAl modulE rejecTion signAture schemE), a new lattice-based signature scheme. Like the NIST-selected Dilithium signature scheme, HAETAE is based on the Fiat-Shamir with Aborts paradigm, but our design choices target an improved complexity/compactness compromise that is highly relevant for many space-limited application scenarios. We primarily focus on reducing signature and verification key sizes so that signatures fit into one TCP or UDP datagram while preserving a high level of security against a variety of attacks. As a result, our scheme has signature and verification key sizes up to 39% and 25% smaller, respectively, compared than Dilithium. We provide a portable, constanttime reference implementation together with an optimized implementation using AVX2 instructions and an implementation with reduced stack size for the Cortex-M4. Moreover, we describe how to efficiently protect HAETAE against implementation attacks such as side-channel analysis, making it an attractive candidate for use in IoT and other embedded systems.

2024

TCHES

Correction Fault Attacks on Randomized CRYSTALS-Dilithium Abstract

Elisabeth Krahmer Peter Pessl Georg Land Tim Güneysu

After NIST’s selection of Dilithium as the primary future standard for quantum-secure digital signatures, increased efforts to understand its implementation security properties are required to enable widespread adoption on embedded devices. Concretely, there are still many open questions regarding the susceptibility of Dilithium to fault attacks. This is especially the case for Dilithium’s randomized (or hedged) signing mode, which, likely due to devastating implementation attacks on the deterministic mode, was selected as the default by NIST.This work takes steps towards closing this gap by presenting two new key-recovery fault attacks on randomized/hedged Dilithium. Both attacks are based on the idea< of correcting faulty signatures after signing. A successful correction yields the value of a secret intermediate that carries information on the key. After gathering many faulty signatures and corresponding correction values, it is possible to solve for thesigning key via either simple linear algebra or lattice-reduction techniques. Our first attack extends a previously published attack based on an instruction-skipping fault to the randomized setting. Our second attack injects faults in the matrix A, which is part of the public key. As such, it is not sensitive to side-channel leakage and has, potentially for this reason, not seen prior analysis regarding faults.We show that for Dilithium2, the attacks allow key recovery with as little as 1024 and 512 faulty signatures, with each signature generated by injecting a single targeted fault. We also demonstrate how our attacks can be adapted to circumvent several popular fault countermeasures with a moderate increase in the computational runtime and the number of required faulty signatures. These results are verified using both simulated faults and clock glitches on an ARM-based standard microcontroller. The presented attacks demonstrate that also randomized Dilithium can be subject to diverse fault attacks, that certain countermeasures might be easily bypassed, and that potential fault targets reach beyond side-channel sensitive operations. Still, many further operations are likely also susceptible, implying the need for increased analysis efforts in the future.

2024

TCHES

Combined Threshold Implementation Abstract

Jakob Feldtkeller Jan Richter-Brockmann Pascal Sasdrich Tim Güneysu

Physical security is an important aspect of devices for which an adversary can manipulate the physical execution environment. Recently, more and more attention has been directed towards a security model that combines the capabilities of passive and active physical attacks, i.e., an adversary that performs fault-injection and side-channel analysis at the same time. Implementing countermeasures against such a powerful adversary is not only costly but also requires the skillful combination of masking and redundancy to counteract all reciprocal effects.In this work, we propose a new methodology to generate combined-secure circuits. We show how to transform Threshold Implementation (TI)-like constructions to resist any adversary with the capability to tamper with internal gates and probe internal wires. For the resulting protection scheme, we can prove the combined security in a well-established theoretical security model.Since the transformation preserves the advantages of TI-like structures, the resulting circuits prove to be more efficient in the number of required bits of randomness (up to 100%), the latency in clock cycles (up to 40%), and even the area for pipelined designs (up to 40%) than the state of the art for an adversary restricted to manipulating a single gate and probing a single wire.

2024

TCHES

Improved Circuit Synthesis with Multi-Value Bootstrapping for FHEW-like Schemes Abstract

Johannes Mono Kamil Kluczniak Tim Güneysu

In recent years, the research community has made great progress in improving techniques for privacy-preserving computation, such as fully homomorphic encryption (FHE). Despite the progress, there remain open challenges, mainly in performance and usability, to further advance the adoption of these technologies. This work provides multiple contributions that improve the current state-of-the-art in both areas. More specifically, we significantly simplify the multi-value bootstrapping by Carpov, Izabachène, and Mollimard [CIM19] for Boolean-based FHE schemes such as FHEW or TFHE, making the concept usable in practice. Based on our simplifications, we implement an easy-to-use interface for multi-value bootstrapping in the open-source library FHE-Deck [fhe23], derive new parameter sets for multi-bit encryptions with state-of-the-art security, and build a toolset that translates high-level code to multi-bit operations on encrypted data using circuit synthesis. We propose and integrate the first non-trivial FHE-specific optimizations for privacy-preserving circuit synthesis: look-up table (LUT) grouping and adder substitution. Using LUT grouping, we reduce the number of bootstrapping operations by almost 40% on average, while for adder substitution, we reduce the number of required bootstrappings by up to 85% for certain use cases. Overall, the execution time is up to 4.2x faster with all optimizations enabled compared to previous state-of-the-art circuit synthesis.

2024

ASIACRYPT

Formal Definition and Verification for Combined Random Fault and Random Probing Security Abstract

Sonia Belaïd Jakob Feldtkeller Tim Güneysu Anna Guinet Jan Richter-Brockmann Matthieu Rivain Pascal Sasdrich Abdul Rahman Taleb

In our highly digitalized world, an adversary is not constrained to purely digital attacks but can monitor or influence the physical execution environment of a target computing device. Such side-channel or fault-injection analysis poses a significant threat to otherwise secure cryptographic implementations. Hence, it is important to consider additional adversarial capabilities when analyzing the security of cryptographic implementations besides the default black-box model. For side-channel analysis, this is done by providing the adversary with knowledge of some internal values, while for fault-injection analysis the capabilities of the adversaries include manipulation of some internal values. In this work, we extend probabilistic security models for physical attacks, by introducing a general random probing model and a general random fault model to capture arbitrary leakage and fault distributions, as well as the combination of these models. Our aim is to enable a more accurate modeling of low-level physical effects. We then analyze important properties, such as the impact of adversarial knowledge on faults and compositions, and provide tool-based formal verification methods that allow the security assessment of design components. These methods are introduced as extension of previous tools VERICA and IronMask which are implemented, evaluated and compared.

2024

TCHES

FANNG-MPC: Framework for Artificial Neural Networks and Generic MPC Abstract

Najwa Aaraj Abdelrahaman Aly Tim Güneysu Chiara Marcolla Johannes Mono Rogerio Paludo Iván Santos-González Mireia Scholz Eduardo Soria-Vazquez Victor Sucasas Ajith Suresh

In this work, we introduce FANNG-MPC, a versatile secure multi-party computation framework capable to offer active security for privacy-preserving machine learning as a service (MLaaS). Derived from the now deprecated SCALE-MAMBA, FANNG is a data-oriented fork, featuring novel set of libraries and instructions for realizing private neural networks, effectively reviving the popular framework. To the best of our knowledge, FANNG is the first MPC framework to offer actively secure MLaaS in the dishonest majority setting.FANNG goes beyond SCALE-MAMBA by decoupling offline and online phases and materializing the dealer model in software, enabling a separate set of entities to produce offline material. The framework incorporates database support, a new instruction set for pre-processed material, including garbled circuits and convolutional and matrix multiplication triples. FANNG also implements novel private comparison protocols and an optimized library supporting Neural Network functionality. All our theoretical claims are substantiated by an extensive evaluation using an open-sourced implementation, including the private inference of popular neural networks like LeNet and VGG16.

2023

PKC

A Holistic Approach Towards Side-Channel Secure Fixed-Weight Polynomial Sampling Abstract

Markus Krausz Georg Land Jan Richter-Brockmann Tim Güneysu

The sampling of polynomials with fixed weight is a procedure required by round-4 Key Encapsulation Mechanisms (KEMs) for Post-Quantum Cryptography (PQC) standardization (BIKE, HQC, McEliece) as well as NTRU, Streamlined NTRU Prime, and NTRU LPRrime. Recent attacks have shown in this context that side-channel leakage of sampling methods can be exploited for key recoveries. While countermeasures regarding such timing attacks have already been presented, still, there is no comprehensive work covering solutions that are also secure against power side channels. To close this gap, the contribution of this work is threefold: First, we analyze requirements for the different use cases of fixed weight sampling. Second, we demonstrate how all known sampling methods can be implemented securely against timing and power/EM side channels and propose performance-enhancing modifications. Furthermore, we propose a new, comparison based methodology that outperforms existing methods in the masked setting for the three round-4 KEMs BIKE, HQC, and McEliece. Third, we present bitsliced and arbitrary-order masked software implementations and benchmarked them for all relevant cryptographic schemes to be able to infer recommendations for each use case. Additionally, we provide a hardware implementation of our new method as a case study and analyze the feasibility of implementing the other approaches in hardware.

2023

TCHES

Gadget-based Masking of Streamlined NTRU Prime Decapsulation in Hardware Abstract

Georg Land Adrian Marotzke Jan Richter-Brockmann Tim Güneysu

Streamlined NTRU Prime is a lattice-based Key Encapsulation Mechanism (KEM) that is, together with X25519, the default algorithm in OpenSSH 9. Based on lattice assumptions, it is assumed to be secure also against attackers with access to< large-scale quantum computers. While Post-Quantum Cryptography (PQC) schemes have been subject to extensive research in recent years, challenges remain with respect to protection mechanisms against attackers that have additional side-channel information, such as the power consumption of a device processing secret data. As a countermeasure to such attacks, masking has been shown to be a promising and effective approach. For public-key schemes, including any recent PQC schemes, usually, a mixture of Boolean and arithmetic techniques is applied on an algorithmic level. Our generic hardware implementation of Streamlined NTRU Prime decapsulation, however, follows an idea that until now was assumed to be solely applicable efficiently to symmetric cryptography: gadget-based masking. The hardware design is transformed into a secure implementation by replacing each gate with a composable secure gadget that operates on uniform random shares of secret values. In our work, we show the feasibility of applying this approach also to PQC schemes and present the first Public-Key Cryptography (PKC) – pre- and post-quantum – implementation masked with the gadget-based approach considering several trade-offs and design choices. By the nature of gadget-based masking, the implementation can be instantiated at arbitrary masking order. We synthesize our implementation both for Artix-7 Field-Programmable Gate Arrays (FPGAs) and 45nm Application-Specific Integrated Circuits (ASICs), yielding practically feasible results regarding the area, randomness requirement, and latency. We verify the side-channel security of our implementation using formal verification on the one hand, and practically using Test Vector Leakage Assessment (TVLA) on the other. Finally, we also analyze the applicability of our concept to Kyber and Dilithium, which will be standardized by the National Institute of Standards and Technology (NIST).

2023

TCHES

A Tale of Snakes and Horses: Amplifying Correlation Power Analysis on Quadratic Maps Abstract

Anna Guinet Georg Land Ioan Gabriel Bucur Tim Güneysu

We study the success probabilities of two variants of Correlation Power Analysis (CPA) to retrieve multiple secret bits. The target is a permutation-based symmetric cryptographic construction with a quadratic map as an S-box. More precisely, we focus on the nonlinear mapping X used in the Xoodoo and Keccak-p permutations, which is affine equivalent to the nonlinear mapping of Ascon. We thus consider three-bit and five-bit S-boxes. Our leakage model is the difference in power consumption of register cells before and after one round. It reflects that, >in hardware, the aforesaid cryptographic algorithms are usually implemented by deploying a round-based architecture. The power consumption difference depends on whether the targeted bits in the register flip. In particular, we describe two attacks based on the CPA methodology. First, we start with a standard CPA approach, for which, to the best of our knowledge, we are the first to point out the differences between attacking a three-bit and a five-bit S-box. For CPA, the highest correlation coefficient is the most likely secret hypothesis. We improve on this with our novel combined Correlation Power Analysis (combined CPA), or Snake attack, which uses quadratic map cryptanalysis (e.g., of the function X) to achieve a better attack in terms of the number of traces required and computational complexity. For the Snake attack, sums of absolute or squared values of correlation coefficients are used to determine the most likely guess. As a result, we effectively show that our proposed Snake attack can recover the secret in n22 (or n22+n) intermediate results, compared to 22n for the CPA, where n is the length of the targeted S-Box. We collected power measurements from a hardware setup to demonstrate practical attack success probabilities according to the rank of the correct secret hypothesis both for Xoodoo and Keccak-p. In addition, we explain our success probabilities thanks to the Henery model developed for horse races. In short, after performing 16,896 attacks, the Snake attack or combined CPA on Xoodoo consistently recovers the correct secret bits with each attack using 43,860 traces on average and with only 12 correlations, compared to 61,380 traces for the standard CPA attack with 64 correlations. The Snake attack requires about one-fifth as many correlation values as the standard CPA. For Keccak-p, the difference is more drastic: the Snake attack recovers the secret bits invariably after 771,600 traces with just 20 correlations. In contrast, the standard CPA attack operates on 1,024 correlations (about fifty times more than the Snake attack) and requires 1,223,400 traces.

2022

TCHES

Racing BIKE: Improved Polynomial Multiplication and Inversion in Hardware Abstract

Jan Richter-Brockmann Ming-Shing Chen Santosh Ghosh Tim Güneysu

BIKE is a Key Encapsulation Mechanism selected as an alternate candidate in NIST’s PQC standardization process, in which performance plays a significant role in the third round. This paper presents FPGA implementations of BIKE with the best area-time performance reported in literature. We optimize two key arithmetic operations, which are the sparse polynomial multiplication and the polynomial inversion. Our sparse multiplier achieves time-constancy for sparse polynomials of indefinite Hamming weight used in BIKE’s encapsulation. The polynomial inversion is based on the extended Euclidean algorithm, which is unprecedented in current BIKE implementations. Our optimized design results in a 5.5 times faster key generation compared to previous implementations based on Fermat’s little theorem.Besides the arithmetic optimizations, we present a united hardware design of BIKE with shared resources and shared sub-modules among KEM functionalities. On Xilinx Artix-7 FPGAs, our light-weight implementation consumes only 3 777 slices and performs a key generation, encapsulation, and decapsulation in 3 797 μs, 443 μs, and 6 896 μs, respectively. Our high-speed design requires 7 332 slices and performs the three KEM operations in 1 672 μs, 132 μs, and 1 892 μs, respectively.

2022

TCHES

Randomness Optimization for Gadget Compositions in Higher-Order Masking Abstract

Jakob Feldtkeller David Knichel Pascal Sasdrich Amir Moradi Tim Güneysu

Physical characteristics of electronic devices, leaking secret and sensitive information to an adversary with physical access, pose a long-known threat to cryptographic hardware implementations. Among a variety of proposed countermeasures against such Side-Channel Analysis attacks, masking has emerged as a promising, but often costly, candidate. Furthermore, the manual realization of masked implementations has proven error-prone and often introduces flaws, possibly resulting in insecure circuits. In the context of automatic masking, a new line of research emerged, aiming to replace each physical gate with a secure gadget that fulfills well-defined properties, guaranteeing security when interconnected to a large circuit. Unfortunately, those gadgets introduce a significant amount of additional overhead into the design, in terms of area, latency, and randomness requirements.In this work, we present a novel approach to reduce the demands for randomness in such gadget-composed circuits by reusing randomness across gadgets while maintaining security in the probing adversary model. To this end, we embedded the corresponding optimization passes into an Electronic Design Automation toolchain, able to construct, optimize, and implement masked circuits, starting from an unprotected design. As such, our security-aware optimization offers an additional building block for existing or new Electronic Design Automation frameworks, where security is considered a first-class design constraint.

2022

TCHES

VERICA - Verification of Combined Attacks: Automated formal verification of security against simultaneous information leakage and tampering Abstract

Jan Richter-Brockmann Jakob Feldtkeller Pascal Sasdrich Tim Güneysu

Physical attacks, including passive Side-Channel Analysis and active Fault Injection Analysis, are considered among the most powerful threats against physical cryptographic implementations. These attacks are well known and research provides many specialized countermeasures to protect cryptographic implementations against them. Still, only a limited number of combined countermeasures, i.e., countermeasures that protect implementations against multiple attacks simultaneously, were proposed in the past. Due to increasing complexity and reciprocal effects, design of efficient and reliable combined countermeasures requires longstanding expertise in hardware design and security. With the help of formal security specifications and adversary models, automated verification can streamline development cycles, increase quality, and facilitate development of robust cryptographic implementations.In this work, we revise and refine formal security notions for combined protection mechanisms and specifically embed them in the context of hardware implementations. Based on this, we present the first automated verification framework that can verify physical security properties of hardware circuits with respect to combined physical attacks. To this end, we conduct several case studies to demonstrate the capabilities and advantages of our framework, analyzing secure building blocks (gadgets), S-boxes build from Toffoli gates, and the ParTI scheme. For the first time, we reveal security flaws in analyzed structures due to reciprocal effects, highlighting the importance of continuously integrating security verification into modern design and development cycles.

2022

TCHES

Risky Translations: Securing TLBs against Timing Side Channels Abstract

Florian Stolz Jan Philipp Thoma Pascal Sasdrich Tim Güneysu

Microarchitectural side-channel vulnerabilities in modern processors are known to be a powerful attack vector that can be utilized to bypass common security boundaries like memory isolation. As shown by recent variants of transient execution attacks related to Spectre and Meltdown, those side channels allow to leak data from the microarchitecture to the observable architectural state. The vast majority of attacks currently build on the cache-timing side channel, since it is easy to exploit and provides a reliable, fine-grained communication channel. Therefore, many proposals for side-channel secure cache architectures have been made. However, caches are not the only source of side-channel leakage in modern processors and mitigating the cache side channel will inevitably lead to attacks exploiting other side channels. In this work, we focus on defeating side-channel attacks based on page translations.It has been shown that the Translation Lookaside Buffer (TLB) can be exploited in a very similar fashion to caches. Since the main caches and the TLB share many features in their architectural design, the question arises whether existing countermeasures against cache-timing attacks can be used to secure the TLB. We analyze state-ofthe-art proposals for side-channel secure cache architectures and investigate their applicability to TLB side channels. We find that those cache countermeasures are notdirectly applicable to TLBs, and propose TLBcoat, a new side-channel secure TLB architecture. We provide evidence of TLB side-channel leakage on RISC-V-based Linux systems, and demonstrate that TLBcoat prevents this leakage. We implement TLBcoat using the gem5 simulator and evaluate its performance using the PARSEC benchmark suite.

2021

TCHES

LifeLine for FPGA Protection: Obfuscated Cryptography for Real-World Security 📺 Abstract

Florian Stolz Nils Albartus Julian Speith Simon Klix Clemens Nasenberg Aiden Gula Marc Fyrbiak Christof Paar Tim Güneysu Russell Tessier

Over the last decade attacks have repetitively demonstrated that bitstream protection for SRAM-based FPGAs is a persistent problem without a satisfying solution in practice. Hence, real-world hardware designs are prone to intellectual property infringement and malicious manipulation as they are not adequately protected against reverse-engineering.In this work, we first review state-of-the-art solutions from industry and academia and demonstrate their ineffectiveness with respect to reverse-engineering and design manipulation. We then describe the design and implementation of novel hardware obfuscation primitives based on the intrinsic structure of FPGAs. Based on our primitives, we design and implement LifeLine, a hardware design protection mechanism for FPGAs using hardware/software co-obfuscated cryptography. We show that LifeLine offers effective protection for a real-world adversary model, requires minimal integration effort for hardware designers, and retrofits to already deployed (and so far vulnerable) systems.

2021

TCHES

FIVER – Robust Verification of Countermeasures against Fault Injections 📺 Abstract

Jan Richter-Brockmann Aein Rezaei Shahmirzadi Pascal Sasdrich Amir Moradi Tim Güneysu

Fault Injection Analysis is seen as a powerful attack against implementations of cryptographic algorithms. Over the last two decades, researchers proposed a plethora of countermeasures to secure such implementations. However, the design process and implementation are still error-prone, complex, and manual tasks which require long-standing experience in hardware design and physical security. Moreover, the validation of the claimed security is often only done by empirical testing in a very late stage of the design process. To prevent such empirical testing strategies, approaches based on formal verification are applied instead providing the designer early feedback.In this work, we present a fault verification framework to validate the security of countermeasures against fault-injection attacks designed for ICs. The verification framework works on netlist-level, parses the given digital circuit into a model based on Binary Decision Diagrams, and performs symbolic fault injections. This verification approach constitutes a novel strategy to evaluate protected hardware designs against fault injections offering new opportunities as performing full analyses under a given fault models.Eventually, we apply the proposed verification framework to real-world implementations of well-established countermeasures against fault-injection attacks. Here, we consider protected designs of the lightweight ciphers CRAFT and LED-64 as well as AES. Due to several optimization strategies, our tool is able to perform more than 90 million fault injections in a single-round CRAFT design and evaluate the security in under 50 min while the symbolic simulation approach considers all 2128 primary inputs.

2020

TCHES

High-Speed Masking for Polynomial Comparison in Lattice-based KEMs 📺 Abstract

Florian Bache Clara Paglialonga Tobias Oder Tobias Schneider Tim Güneysu

With the NIST post-quantum standardization competition entering the second round, the interest in practical implementation results of the remaining NIST candidates is steadily growing. Especially implementations on embedded devices are often not protected against side-channel attacks, such as differential power analysis. In this regard, the application of countermeasures against side-channel attacks to candidates of the NIST standardization process is still an understudied topic. Our work aims to contribute to the NIST competition by enabling a more realistic judgment of the overhead cost introduced by side-channel countermeasures that are applied to lattice-based KEMs that achieve CCA-security based on the Fujisaki-Okamoto transform. We present a novel higher-order masking scheme that enables an efficient comparison of polynomials as previous techniques based on arithmetic-to-Boolean conversions renders this (generally inexpensive) component extremely expensive in the masked case. Our approach has linear complexity in the number of shares compared to quadratic complexity of previous contributions and it applies to lattice based schemes with prime modulus. It comes with a proof in the probing model and an efficient implementation on an ARM Cortex-M4F microcontroller which was defined as a preferred evaluation platform for embedded implementations by NIST. Our algorithm can be executed in only 1.5-2.2 milliseconds on the target platform (depending on the masking order) and is therefore well suited even for lightweight applications. While in previous work, practical side-channel experiments were conducted using only 5,000 - 100,000 power traces, we confirm the absence of first-order leakage in this work by collecting 1 million power traces and applying the t-test methodology.

2020

TCHES

Side-Channel Analysis of the Xilinx Zynq UltraScale+ Encryption Engine 📺 Abstract

Benjamin Hettwer Sebastien Leger Daniel Fennes Stefan Gehrer Tim Güneysu

The Xilinx Zynq UltraScale+ (ZU+) is a powerful and flexible System-on- Chip (SoC) computing platform for next generation applications such as autonomous driving or industrial Internet-of-Things (IoT) based on 16 nm production technology. The devices are equipped with a secure boot mechanism in order to provide confidentiality, integrity, and authenticity of the configuration files that are loaded during power-up. This includes a dedicated encryption engine which features a protocol-based countermeasure against passive Side-Channel Attacks (SCAs) called key rolling. The mechanism ensures that the same key is used only for a certain number of data blocks that has to be defined by the user. However, a suitable choice for the key rolling parameter depends on the power leakage behavior of the chip and is not published by the manufacturer. To close this gap, this paper presents the first publicly known side-channel analysis of the ZU+ encryption unit. We conduct a black-box reverse engineering of the internal hardware architecture of the encryption engine using Electromagnetic (EM) measurements from a decoupling capacitor of the power supply. Then, we illustrate a sophisticated methodology that involves the first five rounds of an AES encryption to attack the 256-bit secret key. We apply the elaborated attack strategy using several new Deep Learning (DL)-based evaluation methods for cryptographic implementations. Even though we are unable to recover all bytes of the secret key, the experimental results still allow us to provide concrete recommendations for the key rolling parameter under realistic conditions. This eventually helps to configure the secure boot mechanism of the ZU+ and similar devices appropriately.

2019

PKC

Efficiently Masking Binomial Sampling at Arbitrary Orders for Lattice-Based Crypto Abstract

Tobias Schneider Clara Paglialonga Tobias Oder Tim Güneysu

With the rising popularity of lattice-based cryptography, the Learning with Errors (LWE) problem has emerged as a fundamental core of numerous encryption and key exchange schemes. Many LWE-based schemes have in common that they require sampling from a discrete Gaussian distribution which comes with a number of challenges for the practical instantiation of those schemes. One of these is the inclusion of countermeasures against a physical side-channel adversary. While several works discuss the protection of samplers against timing leaks, only few publications explore resistance against other side-channels, e.g., power. The most recent example of a protected binomial sampler (as used in key encapsulation mechanisms to sufficiently approximate Gaussian distributions) from CHES 2018 is restricted to a first-order adversary and cannot be easily extended to higher protection orders.In this work, we present the first protected binomial sampler which provides provable security against a side-channel adversary at arbitrary orders. Our construction relies on a new conversion between Boolean and arithmetic (B2A) masking schemes for prime moduli which outperforms previous algorithms significantly for the relevant parameters, and is paired with a new masked bitsliced sampler allowing secure and efficient sampling even at larger protection orders. Since our proposed solution supports arbitrary moduli, it can be utilized in a large variety of lattice-based constructions, like NewHope, LIMA, Saber, Kyber, HILA5, or Ding Key Exchange.

2018

TCHES

Practical CCA2-Secure and Masked Ring-LWE Implementation 📺 Abstract

Tobias Oder Tobias Schneider Thomas Pöppelmann Tim Güneysu

During the last years public-key encryption schemes based on the hardness of ring-LWE have gained significant popularity. For real-world security applications assuming strong adversary models, a number of practical issues still need to be addressed. In this work we thus present an instance of ring-LWE encryption that is protected against active attacks (i.e., adaptive chosen-ciphertext attacks) and equipped with countermeasures against side-channel analysis. Our solution is based on a postquantum variant of the Fujisaki-Okamoto (FO) transform combined with provably secure first-order masking. To protect the key and message during decryption, we developed a masked binomial sampler that secures the re-encryption process required by FO. Our work shows that CCA2-secured RLWE-based encryption can be achieved with reasonable performance on constrained devices but also stresses that the required transformation and handling of decryption errors implies a performance overhead that has been overlooked by the community so far. With parameters providing 233 bits of quantum security, our implementation requires 4,176,684 cycles for encryption and 25,640,380 cycles for decryption with masking and hiding countermeasures on a Cortex-M4F. The first-order security of our masked implementation is also practically verified using the non-specific t-test evaluation methodology.

2018

TCHES

Standard Lattice-Based Key Encapsulation on Embedded Devices Abstract

James Howe Tobias Oder Markus Krausz Tim Güneysu

Lattice-based cryptography is one of the most promising candidates being considered to replace current public-key systems in the era of quantum computing. In 2016, Bos et al. proposed the key exchange scheme FrodoCCS, that is also a submission to the NIST post-quantum standardization process, modified as a key encapsulation mechanism (FrodoKEM). The security of the scheme is based on standard lattices and the learning with errors problem. Due to the large parameters, standard latticebased schemes have long been considered impractical on embedded devices. The FrodoKEM proposal actually comes with parameters that bring standard lattice-based cryptography within reach of being feasible on constrained devices. In this work, we take the final step of efficiently implementing the scheme on a low-cost FPGA and microcontroller devices and thus making conservative post-quantum cryptography practical on small devices. Our FPGA implementation of the decapsulation (the computationally most expensive operation) needs 7,220 look-up tables (LUTs), 3,549 flip-flops (FFs), a single DSP, and only 16 block RAM modules. The maximum clock frequency is 162 MHz and it takes 20.7 ms for the execution of the decapsulation. Our microcontroller implementation has a 66% reduced peak stack usage in comparison to the reference implementation and needs 266 ms for key pair generation, 284 ms for encapsulation, and 286 ms for decapsulation. Our results contribute to the practical evaluation of a post-quantum standardization candidate.

2016

CRYPTO

ParTI - Towards Combined Hardware Countermeasures Against Side-Channel and Fault-Injection Attacks 📺

Tobias Schneider Amir Moradi Tim Güneysu

2016

FSE

White-Box Cryptography in the Gray Box - - A Hardware Implementation and its Side Channels -

Pascal Sasdrich Amir Moradi Tim Güneysu

2016

CHES

Strong 8-bit Sboxes with Efficient Masking in Hardware 📺

Erik Boss Vincent Grosso Tim Güneysu Gregor Leander Amir Moradi Tobias Schneider