International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Recently updated IACR publications

CryptoDB is periodically updated by manual and automatic processes. Whenever a paper is added or modified it will appear in this list, e.g., when a video appears.

A separate history of changes tracks schema and process changes. There is further information about CryptoDB in the documentation.

Year
Venue
Title
2024
TCHES
High-Performance Design Patterns and File Formats for Side-Channel Analysis
Data and instruction dependent power consumption can reveal cryptographic secrets by means of Side-Channel Analysis (SCA). Consequently, manufacturers and evaluation labs perform thorough testing of cryptographic implementations to confirm their security. Unfortunately, the computation and storage needs for the resulting measurement data can be substantial and at times, limit the scope of their analyses. Therefore, it is surprising that only few publications study the efficient computation and storage of side-channel analysis related data.To address this gap, we discuss high-performance design patterns and how they align with characteristics of different file formats. More specifically, we perform an in-depth analysis of common side-channel analysis algorithms and how they can be implemented for maximum performance. At the same time, we focus on storage requirements and how to reduce them, by applying compression and chunking.In addition, we investigate and benchmark popular SCA frameworks. Moreover, we propose SCARR, a proof of concept SCA framework based on the file format Zarr, that outperforms all considered frameworks in several common algorithms (SNR, TVLA, CPA, MIA) by a factor of about two compared to the thus far fastest framework for a given profile. Most notably, in all tested scenarios, we are faster even with file compression, than other frameworks without compression. We are convinced that the presented design patterns and comparative study will benefit the greater side-channel community, help practitioners to improve their own frameworks, and reduce data storage requirements, associated costs, and lower computation/energy demands of SCA, as required to perform more testing at scale.
2024
TCHES
Defeating Low-Cost Countermeasures against Side-Channel Attacks in Lattice-based Encryption: A Case Study on Crystals-Kyber
In an effort to circumvent the high cost of standard countermeasures against side-channel attacks in post-quantum cryptography, some works have developed low-cost detection-based countermeasures. These countermeasures try to detect maliciously generated input ciphertexts and react to them by discarding the ciphertext or secret key. In this work, we take a look at two previously proposed low-cost countermeasures: the ciphertext sanity check and the decapsulation failure check, and demonstrate successful attacks on these schemes. We show that the first countermeasure can be broken with little to no overhead, while the second countermeasure requires a more elaborate attack strategy that relies on valid chosen ciphertexts. Thus, in this work, we propose the first chosen-ciphertext based side-channel attack that only relies on valid ciphertexts for key recovery. As part of this attack, a third contribution of our paper is an improved solver that retrieves the secret key from linear inequalities constructed using side-channel leakage from the decryption procedure. Our solver is an improvement over the state-of-the-art Belief Propagation solvers by Pessl and Prokop, and later Delvaux. Our method is simpler, easier to understand and has lower computational complexity, while needing less than half the inequalities compared to previous methods.
2024
TCHES
SHAPER: A General Architecture for Privacy-Preserving Primitives in Secure Machine Learning
Secure multi-party computation and homomorphic encryption are two primary security primitives in privacy-preserving machine learning, whose wide adoption is, nevertheless, constrained by the computation and network communication overheads. This paper proposes a hybrid Secret-sharing and Homomorphic encryption Architecture for Privacy-pERsevering machine learning (SHAPER). SHAPER protects sensitive data in encrypted or randomly shared domains instead of relying on a trusted third party. The proposed algorithm-protocol-hardware co-design methodology explores techniques such as plaintext Single Instruction Multiple Data (SIMD) and fine-grained scheduling, to minimize end-to-end latency in various network settings. SHAPER also supports secure domain computing acceleration and the conversion between mainstream privacy-preserving primitives, making it ready for general and distinctive data characteristics. SHAPER is evaluated by FPGA prototyping with a comprehensive hyper-parameter exploration, demonstrating a 94x speed-up over CPU clusters on large-scale logistic regression training tasks.
2024
TCHES
Carry Your Fault: A Fault Propagation Attack on Side-Channel Protected LWE-based KEM
Post-quantum cryptographic (PQC) algorithms, especially those based on the learning with errors (LWE) problem, have been subjected to several physical attacks in the recent past. Although the attacks broadly belong to two classes – passive side-channel attacks and active fault attacks, the attack strategies vary significantly due to the inherent complexities of such algorithms. Exploring further attack surfaces is, therefore, an important step for eventually securing the deployment of these algorithms. Also, it is mportant to test the robustness of the already proposed countermeasures in this regard. In this work, we propose a new fault attack on side-channel secure masked implementation of LWE-based key-encapsulation mechanisms (KEMs) exploiting fault propagation. The attack typically originates due to an algorithmic modification widely used to enable masking, namely the Arithmetic-to-Boolean (A2B) conversion. We exploit the data dependency of the adder carry chain in A2B and extract sensitive information, albeit masking (of arbitrary order) being present. As a practical demonstration of the exploitability of this information leakage, we show key recovery attacks of Kyber, although the leakage also exists for other schemes like Saber. The attack on Kyber targets the decapsulation module and utilizes Belief Propagation (BP) for key recovery. To the best of our knowledge, it is the first attack exploiting an algorithmic component introduced to ease masking rather than only exploiting the randomness introduced by masking to obtain desired faults (as done by Delvaux [Del22]). Finally, we performed both simulated and electromagnetic (EM) fault-based practical validation of the attack for an open-source first-order secure Kyber implementation running on an STM32 platform.
2024
TCHES
Impact of the Flicker Noise on the Ring Oscillator-based TRNGs
Ring Oscillators (RO) are often used in true random number generators (TRNG). Their jittered clock signal, used as randomness source, originates from thermal and flicker noises. While thermal noise jitter is generally used as the main source of randomness, flicker noise jitter is not due to its autocorrelation. This work aims at qualitatively settling the issue of the influence of flicker noise in TRNGs, as its impact increases in newer technology nodes. For this, we built a RO behavioural model, which generates time series equivalent to a jittered RO signal. It is then used to generate the output of an elementary RO-TRNG. Despite general expectations, the autocorrelation inside the output bit stream is reduced when the amplitude of flicker noise increases. The model shows that this effect is caused by the sampling of the jittered signal by the second oscillator, which hides the behaviour of the absolute jitter, causes resetting of the perceived phase, and suppresses any memory effect. The inclusion of flicker noise as a legitimate noise source can increase the TRNG output bit rate by a factor of four for the same output entropy rate. This observation opens new perspectives towards more efficient stochastic models of the RO-TRNGs.
2024
TCHES
JustSTART: How to Find an RSA Authentication Bypass on Xilinx UltraScale(+) with Fuzzing
Fuzzing is a well-established technique in the software domain to uncover bugs and vulnerabilities. Yet, applications of fuzzing for security vulnerabilities in hardware systems are scarce, as principal reasons are requirements for design information access, i.e., HDL source code. Moreover, observation of internal hardware state during runtime is typically an ineffective information source, as its documentation is often not publicly available. In addition, such observation during runtime is also inefficient due to bandwidth-limited analysis interfaces, i.e., JTAG, and minimal introspection of hardware-internal modules.In this work, we investigate fuzzing for Xilinx 7-Series and UltraScale(+) FPGA configuration engines, the control plane governing the (secure) bitstream configuration within the FPGA. Our goal is to examine the effectiveness of fuzzing to analyze and document the opaque inner workings of FPGA configuration engines, with a primary emphasis on identifying security vulnerabilities. Using only the publicly available hardware chip and dispersed documentation, we first design and implement ConFuzz, an advanced FPGA configuration engine fuzzing and rapid prototyping framework. Based on our detailed understanding of the bitstream file format, we then systematically define 3 novel key fuzzing strategies for Xilinx FPGA configuration engines. Moreover, our strategies are executed through mutational structure-aware fuzzers and incorporate various novel custom-tailored, FPGA-specific optimizations to reduce search space. Our evaluation reveals previously undocumented behavior within the configuration engine, including critical findings such as system crashes leading to unresponsive states of the whole FPGA. In addition, our investigations not only lead to the rediscovery of the recent starbleed attack but also uncover a novel unpatchable vulnerability, denoted as JustSTART (CVE-2023-20570), capable of circumventing RSA authentication for Xilinx UltraScale(+). Note that we also discuss effective countermeasures by secure FPGA settings to prevent aforementioned attacks.
2024
TCHES
CASA: A Compact and Scalable Accelerator for Approximate Homomorphic Encryption
Approximate arithmetic-based homomorphic encryption (HE) scheme CKKS [CKKS17] is arguably the most suitable one for real-world data-privacy applications due to its wider computation range than other HE schemes such as BGV [BGV14], FV and BFV [Bra12, FV12]. However, the most crucial homomorphic operation of CKKS called key-switching induces a great amount of computational burden in actual deployment situations, and creates scalability challenges for hardware acceleration. In this paper, we present a novel Compact And Scalable Accelerator (CASA) for CKKS on the field-programmable gate array (FPGA) platform. The proposed CASA addresses the aforementioned computational and scalability challenges in homomorphic operations, including key-exchange, homomorphic multiplication, homomorphic addition, and rescaling.On the architecture layer, we propose a new design methodology for efficient acceleration of CKKS. We design this novel hardware architecture by carefully studying the homomorphic operation patterns and data dependency amongst the primitive oracles. The homomorphic operations are efficiently mapped into an accelerator with simple control and smooth operation, which brings benefits for scalable implementation and enhanced pipeline and parallel processing (even with the potential for further improvement).On the component layer, we carry out a detailed and extensive study and present novel micro-architectures for primitive function modules, including memory bank, number theoretic transform (NTT) module, modulus switching bank, and dyadic multiplication and accumulation.On the arithmetic layer, we develop a new partially reduction-free modular arithmetic technique to eliminate part of the reduction cost over different prime moduli within the moduli chain of the Residue Number System (RNS). The proposed structure can support arbitrary numbers of security primes of CKKS during key exchange, which offers better security options for adopting the scalable design methodology.As a proof-of-concept, we implement CASA on the FPGA platform and compare it with state-of-the-art designs. The implementation results showcase the superior performance of the proposed CASA in many aspects such as compact area, scalable architecture, and overall better area-time complexities.In particular, we successfully implement CASA on a mainstream resource-constrained Artix-7 FPGA. To the authors’ best knowledge, this is the first compact CKKS accelerator implemented on an Artix-7 device, e.g., CASA achieves a 10.8x speedup compared with the state-of-the-art CPU implementations (with power consumption of only 5.8%). Considering the power-delay product metric, CASA also achieves 138x and 105x improvement compared with the recent GPU implementation.
2024
TCHES
Compact Circuits for Efficient Möbius Transform
The Möbius transform is a linear circuit used to compute the evaluations of a Boolean function over all points on its input domain. The operation is very useful in finding the solution of a system of polynomial equations over GF(2) for obvious reasons. However the operation, although linear, needs exponential number of logic operations (around n · 2n−1 bit xors) for an n-variable Boolean function. As such, the only known hardware circuit to efficiently compute the Möbius Transform requires silicon area that is exponential in n. For Boolean functions whose algebraic degree is bound by some parameter d, recursive definitions of the Möbius Transform exist that requires only O(nd+1) space in software. However converting the mathematical definition of this space-efficient algorithm into a hardware architecture is a non-trivial task, primarily because the recursion calls notionally lead to a depth-first search in a transition graph that requires context switches at each recursion call for which straightforward mapping to hardware is difficult. In this paper we look to overcome these very challenges in an engineering sense. We propose a space efficient sequential hardware circuit for the Möbius Transform that requires only polynomial circuit area (i.e. O(nd+1)) provided the algebraic degree of the Boolean function is limited to d. We show how this circuit can be used as a component to efficiently solve polynomial equations of degree at most d by using fast exhaustive search. We propose three different circuit architectures for this, each of which uses the Möbius Transform circuit as a core component. We show that asymptotically, all the solutions of a system of m polynomials in n unknowns and algebraic degree d over GF(2) can be found using a circuit of silicon area proportional to m · nd+1 and circuit depth proportional to 2 · log2(n − d).In the second part of the paper we introduce a fourth hardware solver that additionally aims to achieve energy efficiency. The main idea is to reduce the solution space to a small enough value by parallel application of Möbius Transform circuits over the first few equations of the system. This is done so that one can check individually whether the vectors of this reduced solution space satisfy each of the remaining equations of the system using lower power consumption. The new circuit has area also bound by m · nd+1 and has circuit depth proportional to d · log2 n. We also show that further optimizations with respect to energy consumption may be obtained by using depth-bound Möbius circuits that exponentially decrease run time at the cost of additional logic area and depth.
2024
TCHES
Load-Balanced Parallel Implementation on GPUs for Multi-Scalar Multiplication Algorithm
Multi-scalar multiplication (MSM) is an important building block in most of elliptic-curve-based zero-knowledge proof systems, such as Groth16 and PLONK. Recently, Lu et al. proposed cuZK, a new parallel MSM algorithm on GPUs. In this paper, we revisit this scheme and present a new GPU-based implementation to further improve the performance of MSM algorithm. First, we propose a novel method for mapping scalars into Pippenger’s bucket indices, largely reducing the number of buckets compared to the original Pippenger algorithm. Second, in the case that memory is sufficient, we develop a new efficient algorithm based on homogeneous coordinates in the bucket accumulation phase. Moreover, our accumulation phase is load-balanced, which means the parallel speedup ratio is almost linear growth as the number of device threads increases. Finally, we also propose a parallel layered reduction algorithm for the bucket aggregation phase, whose time complexity remains at the logarithmic level of the number of buckets. The implementation results over the BLS12-381 curve on the V100 graphics card show that our proposed algorithm achieves up to 1.998x, 1.821x and 1.818x speedup compared to cuZK at scales of 221, 222, and 223, respectively.
2024
TCHES
Gleeok: A Family of Low-Latency PRFs and its Applications to Authenticated Encryption
In this paper, we propose a new family of low-latency pseudorandom functions (PRFs), dubbed Gleeok.Gleeok utilizes three 128-bit branches to achieve a 256-bit key size while maintaining low latency. The first two branches are specifically designed to defend against statistical attacks, especially for differential attacks, while the third branch provides resilience against algebraic attacks. This unique design enables Gleeok to offer ultralow latency while supporting 256-bit keys, setting it apart from existing ciphers dedicated to low-latency requirements. In addition, we propose wide-block variants having three 256-bit branches. We also present an application of Gleeok to short-input authenticated encryption which is crucial for memory encryption and various realtime communication applications. Furthermore, we present comprehensive hardware implementation results that establish the capabilities of Gleeok and demonstrate its competitiveness against related schemes in the literature. In particular, Gleeok achieves a minimum latency of roughly 360 ps with the NanGate 15 nm cell library and is thus on par with related low-latency schemes that only feature 128-bit keys while maintaining minimal overhead when equipped in an authenticated mode of operation.
2024
TCHES
OBSCURE: Versatile Software Obfuscation from a Lightweight Secure Element
Software obfuscation is a powerful tool to protect the intellectual property or secret keys inside programs. Strong software obfuscation is crucial in the context of untrusted execution environments (e.g., subject to malware infection) or to face potentially malicious users trying to reverse-engineer a sensitive program. Unfortunately, the state-of-the-art of pure software-based obfuscation (including white-box cryptography) is either insecure or infeasible in practice.This work introduces OBSCURE, a versatile framework for practical and cryptographically strong software obfuscation relying on a simple stateless secure element (to be embedded, for example, in a protected hardware chip or a token). Based on the foundational result by Goyal et al. from TCC 2010, our scheme enjoys provable security guarantees, and further focuses on practical aspects, such as efficient execution of the obfuscated programs, while maintaining simplicity of the secure element. In particular, we propose a new rectangular universalization technique, which is also of independent interest. We provide an implementation of OBSCURE taking as input a program source code written in a subset of the C programming language. This ensures usability and a broad range of applications of our framework. We benchmark the obfuscation on simple software programs as well as on cryptographic primitives, hence highlighting the possible use cases of the framework as an alternative to pure software-based white-box implementations.
2024
TCHES
A Low-Latency High-Order Arithmetic to Boolean Masking Conversion
Masking, an effective countermeasure against side-channel attacks, is commonly applied in modern cryptographic implementations. Considering cryptographic algorithms that utilize both Boolean and arithmetic masking, the conversion algorithm between arithmetic masking and Boolean masking is required. Conventional high-order arithmetic masking to Boolean masking conversion algorithms based on Boolean circuits suffer from performance overhead, especially in terms of hardware implementation. In this work, we analyze high latency for the conversion and propose an improved high-order A2B conversion algorithm. For the conversion of 16-bit variables, the hardware latency can be reduced by 47% in the best scenario. For the case study of second-order 32-bit conversion, the implementation results show that the improved scheme reduces the clock cycle latency by 42% in hardware and achieves a 30% speed performance improvement in software. Theoretically, a security proof of arbitrary order is provided for the proposed high-order A2B conversion. Experimental validations are performed to verify the second-order DPA resistance of second-order implementation. The Test Vector Leakage Assessment does not observe side-channel leakage for hardware and software implementations.
2024
TCHES
Laser-Based Command Injection Attacks on Voice-Controlled Microphone Arrays
Voice-controlled (VC) systems, such as mobile phones and smart speakers, enable users to operate smart devices through voice commands. Previous works (e.g., LightCommands) show that attackers can trigger VC systems to respond to various audio commands by injecting light signals. However, LightCommands only discusses attacks on devices with a single microphone, while new devices typically use microphone arrays with sensor fusion technology for better capturing sound from different distances. By replicating LightCommands’s experiments on the new devices, we find that simply extending the light scope (just as they do) to overlap multiple microphone apertures is inadequate to wake up the device with sensor fusion. Adapting LightCommands’s approach to microphone arrays is challenging due to their requirement for multiple sound amplifiers, and each amplifier requires an independent power driver with unique settings. The number of additional devices increases with the microphone aperture count, significantly increasing the complexity of implementing and deploying the attack equipment. With a growing number of devices adopting sensor fusion to distinguish the sound location, it is essential to propose new approaches to adapting the light injection attacks to these new devices. To address these problems, we propose a lightweight microphone array laser injection solution called LCMA (Laser Commands for Microphone Array), which can use a single laser controller to manipulate multiple laser points and simultaneously target all the apertures of a microphone array and input light waves at different frequencies. Our key design is to propose a new PWM (Pulse Width Modulation) based control signal algorithm that can be implemented on a single MCU and directly control multiple lasers via different PWM output channels. Moreover, LCMA can be remotely configured via BLE (Bluetooth Low Energy). These features allow our solution to be deployed on a drone to covertly attack the targets hidden inside the building. Using LCMA, we successfully attack 29 devices. The experiment results show that LCMA is robust on the newest devices such as the iPhone 15, and the control panel of the Tesla Model Y.
2024
TCHES
UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication
White-box cryptography (WBC) seeks to protect secret keys even if the attacker has full control over the execution environment. One of the techniques to hide the key is space hardness approach, which conceals the key into a large lookup table generated from a reliable small block cipher. Despite its provable security, space-hard WBC also suffers from heavy performance overhead when executed on general purpose hardware platform, hundreds of magnitude slower than conventional block ciphers. Specifically, recent studies adopt nested substitution permutation network (NSPN) to construct dedicated white-box block cipher [BIT16], whose performance is limited by a massive number of rounds, nested loop dependency and high-dimension dynamic maximal distance separable (MDS) matrices.To address these limitations, we put forward UpWB, an uncoupled and efficient accelerator for NSPN-structure WBC. We propose holistic optimization techniques across timing schedule, algorithms and operators. For the high-level timing schedule, we propose a fine-grained task partition (FTP) mechanism to decouple the parameteroriented nested loop with different trip counts. The FTP mechanism narrows down the idle time for synchronization and avoids the extra usage of FIFO, which efficiently increases the computation throughput. For the optimization of arithmetic operators, we devise a flexible and vectorized modular multiplier (VMM) based on the complexity-reduced Montgomery algorithm, which can process multi-precision variable data, multi-size matrix-vector multiplication and different irreducible polynomials. Then, a configurable matrix-vector multiplication (MVM) architecture with diagonal-major dataflow is presented to handle the dynamic MDS matrix. The multi-scale (Inv)Mixcolumns are also unified in a compact manner by intensively sharing the common sub-operations and customizing the constant multiplier.To verify the proposed methodology, we showcase the unified design implementation for three recent families of WBCs, including SPNbox-8/16/24/32, Yoroi-16/32 and WARX-16. Evaluated on FPGA platform, UpWB outperforms the optimized software counterpart (executed on 3.2 GHz Intel CPU with AES-NI and AVX2 instructions) by 7x to 30x in terms of computation throughput. Synthesized under TSMC 28nm technology, 36x to 164x improvement of computation throughput is achieved when UpWB operates at the maximum frequency of 1.3 GHz and consumes a modest area 0.14 mm2. Besides, the proposed VMM also offers about 30% improvement of area efficiency without pulling flexibility down when compared to state-of-the-art work.
2024
TCHES
TPMScan: A wide-scale study of security-relevant properties of TPM 2.0 chips
The Trusted Platform Module (TPM) is a widely deployed computer component that provides increased protection of key material during cryptographic operations, secure storage, and support for a secure boot with a remotely attestable state of the target machine. A systematic study of the TPM ecosystem, its cryptographic properties, and the orderliness of vulnerability mitigation is missing despite its pervasive deployment – likely due to the black-box nature of the implementations. We collected metadata, RSA and ECC cryptographic keys, and performance characteristics from 78 different TPM versions manufactured by 6 vendors, including recent Pluton-based iTPMs, to systematically analyze TPM implementations.Surprisingly, a high rate of changes with a detectable impact on generated secrets, the timing of cryptographic operations, and frequent off-chip generation of Endorsement Keys were observed. Our analysis of public artifacts for TPM-related products certified under Common Criteria (CC) and FIPS 140 showed relatively high popularity of TPMs but without explanation for these changes in cryptographic implementations. Despite TPMs being commonly certified to CC EAL4+, serious vulnerabilities like ROCA or TPM-Fail were discovered in the past. We found a range of additional unreported nonce leakages in ECDSA, ECSCHNORR, and ECDAA algorithms in dTPMs and fTPMs of three vendors. The most serious discovered leakage allows extraction of the private key of certain Intel’s fTPM versions using only nine signatures with no need for any side-channel information, making the vulnerability retrospectively exploitable despite a subsequent firmware update. Unreported timing leakages were discovered in the implementations of ECC algorithms on multiple Nuvoton TPMs, and other previously reported leakages were confirmed. The analysis also unveiled incompleteness of vulnerability reporting and subsequent mitigation with missing clear information about the affected versions and inconsistent fixes.
2024
TCHES
Quantum Circuit Reconstruction from Power Side-Channel Attacks on Quantum Computer Controllers
The interest in quantum computing has grown rapidly in recent years, and with it grows the importance of securing quantum circuits. A novel type of threat to quantum circuits that dedicated attackers could launch are power trace attacks. To address this threat, this paper presents first formalization and demonstration of using power traces to unlock and steal quantum circuit secrets. With access to power traces, attackers can recover information about the control pulses sent to quantum computers. From the control pulses, the gate level description of the circuits, and eventually the secret algorithms can be reverse engineered. This work demonstrates how and what information could be recovered. This work uses algebraic reconstruction from power traces to realize two new types of single trace attacks: per-channel and total power attacks. The former attack relies on per-channel measurements to perform a brute-force attack to reconstruct the quantum circuits. The latter attack performs a single-trace attack using Mixed-Integer Linear Programming optimization. Through the use of algebraic reconstruction, this work demonstrates that quantum circuit secrets can be stolen with high accuracy. Evaluation on 32 real benchmark quantum circuits shows that our technique is highly effective at reconstructing quantum circuits. The findings not only show the veracity of the potential attacks, but also the need to develop new means to protect quantum circuits from power trace attacks. Throughout this work real control pulse information from real quantum computers is used to demonstrate potential attacks based on simulation of collection of power traces.
2024
TCHES
Impeccable Keccak: Towards Fault Resilient SPHINCS+ Implementations
The standardization of the hash-based digital signature scheme SPHINCS+ proceeds faster than initially expected. This development seems to be welcomed by practitioners who appreciate the high confidence in SPHINCS+’s security assumptions and its reliance on well-known hash functions. However, the implementation security of SPHINCS+ leaves many questions unanswered, due to its proneness to fault injection attacks. Previous works have shown, that even imprecise fault injections on the signature generation are sufficient for universal forgery. This led the SPHINCS+ team to promote the usage of hardware countermeasures against such attacks. Since the majority of operations in SPHINCS+ is dedicated to the computation of the Keccak function, we focus on its security. At the core, hardware countermeasures against fault injection attacks are almost exclusively based on redundancy. For hash functions such as Keccak, straightforward instance- or time-redundancy is expensive in terms of chip area or latency. Further, for applications that must withstand powerful fault adversaries, these simple forms of redundancy are not sufficient. To this end, we propose our impeccable Keccak design. It is based on the methodology presented in the original Impeccable Circuits paper by Aghaie et al. from 2018. On the way, we show potential pitfalls when designing impeccable circuits and how the concept of active security can be applied to impeccable circuits. To the best of our knowledge, we are the first to provide proofs of active security for impeccable circuits. Further, we show a novel way to implement non-linear functions without look-up tables. We use our findings to design an impeccable Keccak. Assuming an adversary with the ability to flip single bits, our design detects all attacks with three and less flipped bits. Attacks from adversaries who are able to flip four or more bits are still detected with a high probability. Thus, our design is one of the most resilient designs published so far and the only Keccak design that is provably secure within a bit-flip model. At an area overhead of factor 3.2, our design is competitive with state-of-the-art designs with less resilience.
2024
TCHES
High-Performance Hardware Implementation of MPCitH and Picnic3
Picnic is a post-quantum digital signature, the security of which relies solely on symmetric-key primitives such as block ciphers and hash functions instead of number theoretic assumptions. One of the main concerns of Picnic is the large signature size. Although Katz et al.’s protocol (MPCitH-PP) significantly reduces the size of Picnic, the involvement of more parties in MPCitH-PP leads to longer signing/verification times and more hardware resources. This poses new challenges for implementing high-performance Picnic on resource-constrained FPGAs. So far as we know, current works on the hardware implementation of MPCitH-based signatures are compatible with 3 parties only. In this work, we investigate the optimization of the implementation of MPCitH-PP and successfully deploying MPCitH-PP with more than three parties on resource-constrained FPGAs, e.g., Xilinx Artix-7 and Kintex-7, for the first time. In particular, we propose a series of optimizations, which include pipelining and parallel optimization for MPCitH-PP and the optimization of the underlying symmetric primitives. Besides, we make a slight modification to the computation of the offline commitment, which can further reduce the number of computations of Keccak. These optimizations significantly improve the hardware performance of Picnic3. Signing messages on our FPGA takes 0.047 ms for the L1 security level, outperforming Picnic1 with hardware by a factor of about 5.3, which is the fastest implementation of post-quantum signatures as far as we know. Our FPGA implementation for the L5 security level takes 0.146 ms beating Picnic1 by a factor of 8.5, and outperforming Sphincs by a factor of 17.3.
2024
TCHES
SDitH in Hardware
This work presents the first hardware realisation of the Syndrome-Decodingin-the-Head (SDitH) signature scheme, which is a candidate in the NIST PQC process for standardising post-quantum secure digital signature schemes. SDitH’s hardness is based on conservative code-based assumptions, and it uses the Multi-Party-Computation-in-the-Head (MPCitH) construction. This is the first hardware design of a code-based signature scheme based on traditional decoding problems and only the second for MPCitH constructions, after Picnic. This work presents optimised designs to achieve the best area efficiency, which we evaluate using the Time-Area Product (TAP) metric. This work also proposes a novel hardware architecture by dividing the signature generation algorithm into two phases, namely offline and online phases for optimising the overall clock cycle count. The hardware designs for key generation, signature generation, and signature verification are parameterised for all SDitH parameters, including the NIST security levels, both syndrome decoding base fields (GF256 and GF251), and thus conforms to the SDitH specifications. The hardware design further supports secret share splitting, and the hypercube optimisation which can be applied in this and multiple other NIST PQC candidates. The results of this work result in a hardware design with a drastic reducing in clock cycles compared to the optimised AVX2 software implementation, in the range of 2-4x for most operations. Our key generation outperforms software drastically, giving a 11-17x reduction in runtime, despite the significantly faster clock speed. On Artix 7 FPGAs we can perform key generation in 55.1 Kcycles, signature generation in 6.7 Mcycles, and signature verification in 8.6 Mcycles for NIST L1 parameters, which increase for GF251, and for L3 and L5 parameters.
2024
TCHES
Nibbling MAYO: Optimized Implementations for AVX2 and Cortex-M4
MAYO is a popular high-calorie condiment as well as an auspicious candidate in the ongoing NIST competition for additional post-quantum signature schemes achieving competitive signature and public key sizes. In this work, we present high-speed implementations of MAYO using the AVX2 and Armv7E-M instruction sets targeting recent x86 platforms and the Arm Cortex-M4. Moreover, the main contribution of our work is showing that MAYO can be even faster when switching from a bitsliced representation of keys to a nibble-sliced representation. While the bitsliced representation was primarily motivated by faster arithmetic on microcontrollers, we show that it is not necessary for achieving high performance on Cortex-M4. On Cortex-M4, we instead propose to implement the large matrix multiplications of MAYO using the Method of the Four Russians (M4R), which allows us to achieve better performance than when using the bitsliced approach. This results in up to 21% faster signing. For AVX2, the change in representation allows us to implement the arithmetic much faster using shuffle instructions. Signing takes up to 3.2x fewer cycles and key generation and verification enjoy similar speedups. This shows that MAYO is competitive with lattice-based signature schemes on x86 CPUs, and a factor of 2-6 slower than lattice-based signature schemes on Cortex-M4 (which can still be considered competitive).
2024
TCHES
Masking Floating-Point Number Multiplication and Addition of Falcon: First- and Higher-order Implementations and Evaluations
In this paper, we provide the first masking scheme for floating-point number multiplication and addition to defend against recent side-channel attacks on Falcon’s pre-image vector computation. Our approach involves a masked nonzero check gadget that securely identifies whether a shared value is zero. This gadget can be utilized for various computations such as rounding the mantissa, computing the sticky bit, checking the equality of two values, and normalizing a number. To support the masked floating-point number addition, we also developed a masked shift and a masked normalization gadget. Our masking design provides both first- and higherorder mask protection, and we demonstrate the theoretical security by proving the (Strong)-Non-Interference properties in the probing model. To evaluate the performance of our approach, we implemented unmasked, first-order, and second-order algorithms on an Arm Cortex-M4 processor, providing cycle counts and the number of random bytes used. We also report the time for one complete signing process with our countermeasure on an Intel-Core CPU. In addition, we assessed the practical security of our approach by conducting the test vector leakage assessment (TVLA) to validate the effectiveness of our protection. Specifically, our TVLA experiment results for second-order masking passed the test in 100,000 measured traces.
2024
TCHES
MiRitH: Efficient Post-Quantum Signatures from MinRank in the Head
Since 2016’s NIST call for standardization of post-quantum cryptographic primitives, developing efficient post-quantum secure digital signature schemes has become a highly active area of research. The difficulty in constructing such schemes is evidenced by NIST reopening the call in 2022 for digital signature schemes, because of missing diversity in existing proposals. In this work, we introduce the new postquantum digital signature scheme MiRitH. As direct successor of a scheme recently developed by Adj, Rivera-Zamarripa and Verbel (Africacrypt ’23), it is based on the hardness of the MinRank problem and follows the MPC-in-the-Head paradigm. We revisit the initial proposal, incorporate design-level improvements and provide more efficient parameter sets. We also provide the missing justification for the quantum security of all parameter sets following NIST metrics. In this context we design a novel Grover-amplified quantum search algorithm for solving the MinRank problem that outperforms a naive quantum brute-force search for the solution.MiRitH obtains signatures of size 5.7 kB for NIST category I security and therefore competes for the smallest signatures among any post-quantum signature following the MPCitH paradigm.At the same time MiRitH offers competitive signing and verification timings compared to the state of the art. To substantiate those claims we provide extensive implementations. This includes a reference implementation as well as optimized constant-time implementations for Intel processors (AVX2), and for the ARM (NEON) architecture. The speedup of our optimized AVX2 implementation relies mostly on a redesign of the finite field arithmetic, improving over existing implementations as well as an improved memory management.
2024
TCHES
eLIMInate: a Leakage-focused ISE for Masked Implementation
Even given a state-of-the-art masking scheme, masked software implementation of some cryptography functionality can pose significant challenges stemming, e.g., from simultaneous requirements for efficiency and security. In this paper we design an Instruction Set Extension (ISE) to address a specific element of said challenge, namely the elimination of leakage stemming from architectural and microarchitectural overwriting. Conceptually, the ISE allows a leakage-focused behavioural hint to be communicated from software to the micro-architecture: using it informs how computation is realised when applied to masking-specific data, which then offers an opportunity to eliminate associated leakage. We develop prototype, latencyand area-optimised implementations of the ISE design based on the RISC-V Ibex core. Using them, we demonstrate that use of the ISE can close the gap between assumptions about and actual behaviour of a device and thereby deliver an improved security guarantee.
2024
TCHES
Exploiting Small-Norm Polynomial Multiplication with Physical Attacks: Application to CRYSTALS-Dilithium
We present a set of physical profiled attacks against CRYSTALS-Dilithium that accumulate noisy knowledge on secret keys over multiple signatures, finally leading to a full key recovery attack. The methodology is composed of two steps. The first step consists of observing or inserting a bias in the posterior distribution of sensitive variables. The second step is an information processing phase which is based on belief propagation and effectively exploits that bias. The proposed concrete attacks rely on side-channel information, induced faults or possibly a combination of the two. Interestingly, the adversary benefits most from this previous knowledge when targeting the released signatures, however, the latter are not strictly necessary. We show that the combination of a physical attack with the binary knowledge of acceptance or rejection of a signature also leads to exploitable information on the secret key. Finally, we demonstrate that this approach is also effective against shuffled implementations of CRYSTALS-Dilithium.
2024
TCHES
Distribution of Signal to Noise Ratio and Application to Leakage Detection
In the context of side-channel attacks, the Signal to Noise Ratio (SNR) is a widely used metric for characterizing the information leaked by a device when handling sensitive variables. In this paper, we derive the probability density function (p.d.f.) of the signal to noise ratio (SNR) for the byte value and Hamming Weight (HW) models, when the number of traces per class is large and the target SNR is small. These findings are subsequently employed to establish an SNR threshold, guaranteeing minimal occurrences of false positives. Then, these results are used to derive the theoretical number of traces that are required to remain below pre-defined false negative and false positive rates. The sampling complexity of the T-test, ρ-test and SNR is evaluated for the byte value and HW leakage model by simulations and compared to the theoretical predictions. This allows to establish the most pertinent strategy to make use of each of these detection techniques.
2024
TCHES
Efficient ASIC Architecture for Low Latency Classic McEliece Decoding
Post-quantum cryptography addresses the increasing threat that quantum computing poses to modern communication systems. Among the available “quantum-resistant” systems, the Classic McEliece key encapsulation mechanism (KEM) is positioned as a conservative choice with strong security guarantees. Building upon the code-based Niederreiter cryptosystem, this KEM enables high performance encapsulation and decapsulation and is thus ideally suited for applications such as the acceleration of server workloads. However, until now, no ASIC architecture is available for low latency computation of Classic McEliece operations. Therefore, the present work targets the design, implementation and optimization of a tailored ASIC architecture for low latency Classic McEliece decoding. An efficient ASIC design is proposed, which was implemented and manufactured in a 22 nm FDSOI CMOS technology node. We also introduce a novel inversionless architecture for the computation of error-locator polynomials as well as a systolic array for combined syndrome computation and polynomial evaluation. With these approaches, the associated optimized architecture improves the latency of computing error-locator polynomials by 47% and the overall decoding latency by 27% compared to a state-of-the-art reference, while requiring only 25% of the area.
2024
TCHES
Unlock the Door to my Secrets, but don’t Forget to Glitch: A comprehensive analysis of flash erase suppression attacks
In this work, we look into an attack vector known as flash erase suppression. Many microcontrollers have a feature that allows the debug interface protection to be deactivated after wiping the entire flash memory. The flash erase suppression attack exploits this feature by glitching the mass erase, allowing unlimited access to the data stored in flash memory. This type of attack was presented in a confined context by Schink et al. at CHES 2021. In this paper, we investigate whether this generic attack vector poses a serious threat to real-world products. For this to be true, the success rate of the attack must be sufficiently high, as otherwise, device unique secrets might be erased. Further, the applicability to different devices, different glitching setups, cost, and limitations must be explored. We present the first in-depth analysis of this attack vector. Our study yields that realistic attacks on devices from multiple vendors are possible. As countermeasures can hardly be retrofitted with software, our findings should be considered by users when choosing microcontrollers for security-relevant products or for protection of intellectual property (IP), as well by hardware designers when creating next generation microcontrollers.
2024
TCHES
A Highly-efficient Lattice-based Post-Quantum Cryptography Processor for IoT Applications
Lattice-Based Cryptography (LBC) schemes, like CRYSTALS-Kyber and CRYSTALS-Dilithium, have been selected to be standardized in the NIST Post-Quantum Cryptography standard. However, implementing these schemes in resourceconstrained Internet-of-Things (IoT) devices is challenging, considering efficiency, power consumption, area overhead, and flexibility to support various operations and parameter settings. Some existing ASIC designs that prioritize lower power and area can not achieve optimal performance efficiency, which are not practical for battery-powered devices. Custom hardware accelerators in prior co-processor and processor designs have limited applications and flexibility, incurring significant area and power overheads for IoT devices. To address these challenges, this paper presents an efficient lattice-based cryptography processor with customized Single-Instruction-Multiple-Data (SIMD) instruction. First, our proposed SIMD architecture supports efficient parallel execution of various polynomial operations in 256-bit mode and acceleration of Keccak in 320-bit mode, both utilizing efficiently reused resources. Additionally, we introduce data shuffling hardware units to resolve data dependencies within SIMD data. To further enhance performance, we design a dual-issue path for memory accesses and corresponding software design methodologies to reduce the impact of data load/store blocking. Through a hardware/software co-design approach, our proposed processor achieves high efficiency in supporting all operations in lattice-based cryptography schemes. Evaluations of Kyber and Dilithium show our proposed processor achieves over 10x speedup compared with the baseline RISC-V processor and over 5x speedup versus ARM Cortex M4 implementations, making it a promising solution for securing IoT communications and storage. Moreover, Silicon synthesis results show our design can run at 200 MHz with 2.01 mW for Kyber KEM 512 and 2.13 mW for Dilithium 2, which outperforms state-of-the-art works in terms of PPAP (Performance x Power x Area).
2024
TCHES
Revisiting Keccak and Dilithium Implementations on ARMv7-M
Keccak is widely used in lattice-based cryptography (LBC) and its impact to the overall running time in LBC scheme can be predominant on platforms lacking dedicated SHA-3 instructions. This holds true on embedded devices for Kyber and Dilithium, two LBC schemes selected by NIST to be standardized as quantumsafe cryptographic algorithms. While extensive work has been done to optimize the polynomial arithmetic in these schemes, it was generally assumed that Keccak implementations were already optimal and left little room for enhancement.In this paper, we revisit various optimization techniques for both Keccak and Dilithium on two ARMv7-M processors, i.e., Cortex-M3 and M4. For Keccak, we improve its efficiency using two architecture-specific optimizations, namely lazy rotation and memory access pipelining, on ARMv7-M processors. These optimizations yield performance gains of up to 24.78% and 21.4% for the largest Keccak permutation instance on Cortex-M3 and M4, respectively. As for Dilithium, we first apply the multi-moduli NTT for the small polynomial multiplication cti on Cortex-M3. Then, we thoroughly integrate the efficient Plantard arithmetic to the 16-bit NTTs for computing the small polynomial multiplications csi and cti on Cortex-M3 and M4. We show that the multi-moduli NTT combined with the efficient Plantard arithmetic could obtain significant speed-ups for the small polynomial multiplications of Dilithium on Cortex-M3. Combining all the aforementioned optimizations for both Keccak and Dilithium, we obtain 15.44% ∼ 23.75% and 13.94% ∼ 15.52% speed-ups for Dilithium on Cortex-M3 and M4, respectively. Furthermore, we also demonstrate that the Keccak optimizations yield 13.35% to 15.00% speed-ups for Kyber, and our Keccak optimizations decrease the proportion of time spent on hashing in Dilithium and Kyber by 2.46% ∼ 5.03% on Cortex-M4.
2024
TCHES
ConvKyber: Unleashing the Power of AI Accelerators for Faster Kyber with Novel Iteration-based Approaches
The remarkable performance capabilities of AI accelerators offer promising opportunities for accelerating cryptographic algorithms, particularly in the context of lattice-based cryptography. However, current approaches to leveraging AI accelerators often remain at a rudimentary level of implementation, overlooking the intricate internal mechanisms of these devices. Consequently, a significant number of computational resources is underutilized.In this paper, we present a comprehensive exploration of NVIDIA Tensor Cores and introduce a novel framework tailored specifically for Kyber. Firstly, we propose two innovative approaches that efficiently break down Kyber’s NTT into iterative matrix multiplications, resulting in approximately a 75% reduction in costs compared to the state-of-the-art scanning-based methods. Secondly, by reversing the internal mechanisms, we precisely manipulate the internal resources of Tensor Cores using assembly-level code instead of inefficient standard interfaces, eliminating memory accesses and redundant function calls. Finally, building upon our highly optimized NTT, we provide a complete implementation for all parameter sets of Kyber. Our implementation surpasses the state-of-the-art Tensor Core based work, achieving remarkable speed-ups of 1.93x, 1.65x, 1.22x and 3.55x for polyvec_ntt, KeyGen, Enc and Dec in Kyber-1024, respectively. Even when considering execution latency, our throughput-oriented full Kyber implementation maintains an acceptable execution latency. For instance, the execution latency ranges from 1.02 to 5.68 milliseconds for Kyber-1024 on R3080 when achieving the peak throughput.
2024
TCHES
Single trace HQC shared key recovery with SASCA
This paper presents practicable single trace attacks against the Hamming Quasi-Cyclic (HQC) Key Encapsulation Mechanism. These attacks are the first Soft Analytical Side-Channel Attacks (SASCA) against code-based cryptography. We mount SASCA based on Belief Propagation (BP) on several steps of HQC’s decapsulation process. Firstly, we target the Reed-Solomon (RS) decoder involved in the HQC publicly known code. We perform simulated attacks under Hamming weight leakage model, and reach excellent accuracies (superior to 0.9) up to a high noise level (σ = 3), thanks to a re-decoding strategy. In a real case attack scenario, on a STM32F407, this attack leads to a perfect success rate. Secondly, we conduct an analogous attack against the RS encoder used during the re-encryption step required by the Fujisaki-Okamoto-like transform. Both in simulation and practical instances, results are satisfactory and this attack represents a threat to the security of HQC. Finally, we analyze the strength of countermeasures based on masking and shuffling strategies. In line with previous SASCA literature targeting Kyber, we show that masking HQC is a limited countermeasure against BP attacks, as well as shuffling countermeasures adapted from Kyber. We evaluate the “full shuffling” strategy which thwarts our attack by introducing sufficient combinatorial complexity. Eventually, we highlight the difficulty of protecting the current RS encoder with a shuffling strategy. A possible countermeasure would be to consider another encoding algorithm for the scheme to support a full shuffling. Since the encoding subroutine is only a small part of the implementation, it would come at a small cost.
2024
EUROCRYPT
Perfect (Parallel) Broadcast in Constant Expected Rounds via Statistical VSS
We study broadcast protocols in the information-theoretic model under optimal conditions, where the number of corruptions $t$ is at most one-third of the parties, $n$. While worst-case $\Omega(n)$ round broadcast protocols are known to be impossible to achieve, protocols with an expected constant number of rounds have been demonstrated since the seminal work of Feldman and Micali [STOC'88]. Communication complexity for such protocols has gradually improved over the years, reaching $O(nL)$ plus expected $O(n^4\log n)$ for broadcasting a message of size $L$ bits. This paper presents a perfectly secure broadcast protocol with expected constant rounds and communication complexity of $O(nL)$ plus expected $O(n^3 \log^2n)$ bits. In addition, we consider the problem of parallel broadcast, where $n$ senders, each wish to broadcast a message of size $L$. We show a parallel broadcast protocol with expected constant rounds and communication complexity of $O(n^2L)$ plus expected $O(n^3 \log^2n)$ bits. Our protocol is optimal (up to expectation) for messages of length $L \in \Omega(n \log^2 n)$. Our main contribution is a framework for obtaining \emph{perfectly} secure broadcast with an expected constant number of rounds from a \emph{statistically} secure verifiable secret sharing. Moreover, we provide a new statistically secure verifiable secret sharing where the broadcast cost per participant is reduced from $O(n \log n)$ bits to only $O(\poly \log n)$ bits. All our protocols are adaptively secure.
2024
EUROCRYPT
Concurrently Secure Blind Schnorr Signatures
Many applications of blind signatures, e.g. in blockchains, require compatibility of the resulting signatures with the existing system. This makes blind issuing of Schnorr signatures (now being standardized and supported by major cryptocurrencies) desirable. Concurrent security of the signing protocol is required to thwart denial-of-service attacks. We present a concurrently secure blind-signing protocol for Schnorr signatures, using the standard primitives NIZK and PKE and assuming that Schnorr signatures themselves are unforgeable. Our protocol is the first to be compatible with standard Schnorr implementations over 256-bit elliptic curves. We cast our scheme as a generalization of blind and partially blind signatures: we introduce the notion of predicate blind signatures, in which the signer can define a predicate that the blindly signed message must satisfy. We provide implementations and benchmarks for various choices of primitives and scenarios, such as blindly signing Bitcoin transactions only when they meet certain conditions specified by the signer.
2024
EUROCRYPT
Software with Certified Deletion
Is it possible to prove the deletion of a computer program after having executed it? While this task is clearly impossible using classical information alone, the laws of quantum mechanics may admit a solution to this problem. In this work, we propose a new approach to answer this question, using quantum information. In the interactive settings, we present the first fully-secure solution for blind delegation with certified deletion, assuming post-quantum hardness of the learning with errors (LWE) problem. In the non-interactive settings, we propose a construction of obfuscation with certified deletion, assuming post-quantum iO and one-way functions. Our main technical contribution is a new deletion theorem for subspace coset states [Vidick and Zhang, EUROCRYPT'21, Coladangelo et al., CRYPTO'21], which enables a generic compiler that adds the certified deletion guarantee to a variety of cryptographic primitives. In addition to our main result, this allows us to obtain a host of new primitives, such as functional encryption with certified deletion and secure software leasing for an interesting class of programs. In fact, we are able for the first time to achieve a stronger notion of secure software leasing, where even a dishonest evaluator cannot evaluate the program after returning it.
2024
PKC
A Refined Hardness Estimation of LWE in Two-step Mode
Recently, researchers have proposed many LWE estimators, such as lattice-estimator (Albrecht et al, Asiacrypt 2017) and leaky-LWE-Estimator (Dachman-Soled et al, Crypto 2020), while the latter has already been used in estimating the security level of Kyber and Dilithium using only BKZ. However, we prove in this paper that solving LWE by combining a lattice reduction step (by LLL or BKZ) and a target vector searching step (by enumeration or sieving), which we call a Two-step mode, is more efficient than using only BKZ. Moreover, we give a refined LWE estimator in Two-step mode by analyzing the relationship between the probability distribution of the target vector and the solving success rate in a Two-step mode LWE solving algorithm. While the latest Two-step estimator for LWE, which is the “primal-bdd” mode in lattice-estimator1, does not take into account some up-to-date results and lacks a thorough theoretical analysis. Under the same gate-count model, our estimation for NIST PQC standards drops by 2.1∼3.4 bits (2.2∼4.6 bits while considering more flexible blocksize and jump strategy) compared with leaky-LWE-Estimator. Furthermore, we also give a conservative estimation for LWE from the Two-step solving algorithm. Compared with the Core-SVP model, which is used in previous conservative estimations, our estimation relies on weaker assumptions and outputs higher evaluation results than the Core-SVP model. For NIST PQC standards, our conservative estimation is 4.17∼8.11 bits higher than the Core-SVP estimation. Hence our estimator can give a closer estimation for both upper bound and lower bound of LWE hardness.
2024
EUROCRYPT
Can Alice and Bob Guarantee Output to Carol?
In the setting of solitary output computations, only a single designated party learns the output of some function applied to the private inputs of all participating parties with the guarantee that nothing beyond the output is revealed. The setting of solitary output functionalities is a special case of secure multiparty computation, which allows a set of mutually distrusting parties to compute some function of their private inputs. The computation should guarantee some security properties, such as correctness, privacy, fairness, and output delivery. Full security captures all these properties together. Solitary output computation is a common setting that has become increasingly important, as it is relevant to many real-world scenarios, such as federated learning and set disjointness. In the set-disjointness problem, a set of parties with private datasets wish to convey to another party whether they have a common input. In this work, we investigate the limits of achieving set-disjointness which already has numerous applications and whose feasibility (under non-trivial conditions) was left open in the work of Halevi et al. (TCC 2019). Towards resolving this, we completely characterize the set of Boolean functions that can be computed in the three-party setting in the face of a malicious adversary that corrupts up to two of the parties. As a corollary, we characterize the family of set-disjointness functions that can be computed in this setting, providing somewhat surprising results regarding this family and resolving the open question posed by Halevi et al.
2024
EUROCRYPT
Evaluating the security of CRYSTALS-Dilithium in the quantum random oracle model
In the wake of recent progress on quantum computing hardware, the National Institute of Standards and Technology (NIST) is standardizing cryptographic protocols that are resistant to attacks by quantum adversaries. The primary digital signature scheme that NIST has chosen is CRYSTALS-Dilithium. The hardness of this scheme is based on the hardness of three computational problems: Module Learning with Errors (MLWE), Module Short Integer Solution (MSIS), and SelfTargetMSIS. MLWE and MSIS have been well-studied and are widely believed to be secure. However, SelfTargetMSIS is novel and, though classically as hard as MSIS, its quantum hardness is unclear. In this paper, we provide the first proof of the hardness of SelfTargetMSIS via a reduction from MLWE in the Quantum Random Oracle Model (QROM). Our proof uses recently developed techniques in quantum reprogramming and rewinding. A central part of our approach is a proof that a certain hash function, derived from the MSIS problem, is collapsing. From this approach, we deduce a new security proof for Dilithium under appropriate parameter settings. Compared to the previous work by Kiltz, Lyubashevsky, and Schaffner (EUROCRYPT 2018) that gave the only other rigorous security proof for a variant of Dilithium, our proof has the advantage of being applicable under the condition q = 1 mod 2n, where q denotes the modulus and n the dimension of the underlying algebraic ring. This condition is part of the original Dilithium proposal and is crucial for the efficient implementation of the scheme. We provide new secure parameter sets for Dilithium under the condition q = 1 mod 2n, finding that our public key size and signature size are about 2.9 times and 1.3 times larger, respectively, than those proposed by Kiltz et al. at the same security level.
2024
EUROCRYPT
Generalized Feistel Ciphers for Efficient Prime Field Masking
A recent work from Eurocrypt 2023 suggests that prime-field masking has excellent potential to improve the efficiency vs. security tradeoff of masked implementations against side-channel attacks, especially in contexts where physical leakages show low noise. We pick up on the main open challenge that this seed result leads to, namely the design of an optimized prime cipher able to take advantage of this potential. Given the interest of tweakable block ciphers with cheap inverses in many leakage-resistant designs, we start by describing the FPM (Feistel for Prime Masking) family of tweakable block ciphers based on a generalized Feistel structure. We then propose a first instantiation of FPM, which we denote as small-pSquare. It builds on the recent observation that the square operation (which is non-linear in Fp) can lead to masked gadgets that are more efficient than those for multiplication, and is tailored for efficient masked implementations in hardware. We analyze the mathematical security of the FPM family of ciphers and the small-pSquare instance, trying to isolate the parts of our study that can be re-used for other instances. We additionally evaluate the implementation features of small-pSquare by comparing the efficiency vs. security tradeoff of masked FPGA circuits against those of a state-of-the art binary cipher, namely SKINNY, confirming significant gains in relevant contexts.
2024
EUROCRYPT
Tight Indistinguishability Bounds for the XOR of Independent Random Permutations by Fourier Analysis
The XOR of two independent permutations (XoP) is a well-known construction for achieving security beyond the birthday bound when implementing a pseudorandom function using a block cipher (i.e., a pseudorandom permutation). The idealized construction (where the permutations are uniformly chosen and independent) and its variants have been extensively analyzed over nearly 25 years. The best-known asymptotic information-theoretic indistinguishability bound for the XoP construction is $O(q/2^{1.5n})$, derived by Eberhard in~2017. A generalization of the XoP construction outputs the XOR of $r \geq 2$ independent permutations, and has also received significant attention in both the single-user and multi-user settings. In particular, for $r = 3$, the best-known bound (obtained by Choi et al. [ASIACRYPT'22]) is about $q^2/2^{2.5 n}$ in the single-user setting and $\sqrt{u} q_{\max}^2/2^{2.5 n}$ in the multi-user setting (where $u$ is the number of users and $q_{\max}$ is the number of queries per user). In this paper, we prove an indistinguishability bound of $q/2^{(r - 0.5)n}$ for the (generalized) XoP construction in the single-user setting, and a bound of $\sqrt{u} q_{\max}/2^{(r - 0.5)n}$ in the multi-user setting. In particular, for $r=2$, we obtain the bounds $q/2^{1.5n}$ and $\sqrt{u} q_{\max}/2^{1.5n}$ in single-user and multi-user settings, respectively. For $r=3$ the corresponding bounds are $q/2^{2.5n}$ and $\sqrt{u} q_{\max}/2^{2.5n}$. All of these bounds hold assuming $q < 2^{n}/2$ (or $q_{\max} < 2^{n}/2$). Compared to previous works, we improve all the best-known bounds for the (generalized) XoP construction in the multi-user setting, and the best-known bounds for the generalized XoP construction for $r \geq 3$ in the single-user setting (assuming $q \geq 2^{n/2}$). For the basic two-permutation XoP construction in the single-user setting, our concrete bound of $q/2^{1.5n}$ stands in contrast to the asymptotic bound of $O(q/2^{1.5n})$ by Eberhard. Since all of our bounds are matched (up to constant factors) for $q > 2^{n/2}$ by attacks published by Patarin in 2008 (and their generalizations to the multi-user setting), they are all tight. We obtain our results by Fourier analysis of Boolean functions. Most of our technical work involves bounding (sums of) Fourier coefficients of the density function associated with sampling without replacement. While the proof of Eberhard relies on similar bounds, our proof is elementary and significantly simpler.
2024
EUROCRYPT
A Direct PRF Construction from Kolmogorov Complexity
While classic results in the 1980s establish that one-way functions (OWFs) imply the existence of pseudorandom generators (PRGs) which in turn imply pseudorandom functions (PRFs), the constructions (most notably the one from OWFs to PRGs) is complicated and inefficient. Consequently, researchers have developed alternative \emph{direct} constructions of PRFs from various different concrete hardness assumptions. In this work, we continue this thread of work and demonstrate the first direct constructions of PRFs from average-case hardness of the time-bounded Kolmogorov complexity problem $\mktp[s]$, where given a threshold, $s(\cdot)$, and a polynomial time-bound, $t(\cdot)$, $\mktp[s]$ denotes the language consisting of strings $x$ with $t$-bounded Kolmogorov complexity, $K^t(x)$, bounded by $s(|x|)$. In more detail, we demonstrate a direct PRF construction with quasi-polynomial security from mild average-case of hardness of $\mktp[2^{O(\sqrt{\log n})}]$ w.r.t the uniform distribution. We note that by earlier results, this assumption is known to be equivalent to the existence of quasi-polynomially secure OWFs; as such, our results yield the first direct (quasi-polynomially secure) PRF constructions from a natural hardness assumptions that also is known to be implied by (quasi-polynomially secure) PRFs. Perhaps surprisingly, we show how to make use of the Nisan-Wigderson PRG construction to get a cryptographic, as opposed to a complexity-theoretic, PRG.
2024
TOSC
Finding Impossible Differentials in ARX Ciphers under Weak Keys
Impossible differential cryptanalysis is very important in the field of symmetric ciphers. Currently, there are many automatic search approaches to find impossible differentials. However, these methods have two underlying assumptions: Markov cipher assumption and key independence assumption. Actually, these two assumptions are not true in ARX ciphers, especially lightweight ones. In this paper, we study the impossible differentials in ARX cipher under weak keys for the first time. Firstly, we propose several accurate difference propagation properties on consecutive two and three modular additions. Then, these properties are applied to four typical local constructions composed of two consecutive modular additions, two modular additions with a rotation operation, xoring secret key or constant in the middle, to find impossible differentials under weak keys or special constants. What’s more, we propose a more accurate difference propagation property on three consecutive modular additions. It can be used to find impossible differentials on more complex local constructions under weak keys or special constants. In practical ciphers, these impossible differentials on local constructions can be used to find contradictions. Lastly, combining our new findings with traditional automatic search methods for impossible differentials, we propose a framework to find impossible differentials in ARX ciphers under weak keys. As applications, we apply the framework to SPECK-32/64, LEA and CHAM-64/128. As a result, we find two 8-round impossible differentials for SPECK-32/64 under 260 weak keys, and one 11-round impossible differential for LEA under 2k−1 weak keys, where k is the key size. These impossible differentials can start from any round. Furthermore, we find two 22-round impossible differentials for CHAM-64/128 under 2127 weak keys starting from certain rounds. As far as we know, all these impossible differentials are longer than previous ones.
2024
TOSC
Solving Degree Bounds for Iterated Polynomial Systems
For Arithmetization-Oriented ciphers and hash functions Gröbner basis attacks are generally considered as the most competitive attack vector. Unfortunately, the complexity of Gröbner basis algorithms is only understood for special cases, and it is needless to say that these cases do not apply to most cryptographic polynomial systems. Therefore, cryptographers have to resort to experiments, extrapolations and hypotheses to assess the security of their designs. One established measure to quantify the complexity of linear algebra-based Gröbner basis algorithms is the so-called solving degree. Caminata &amp; Gorla revealed that under a certain genericity condition on a polynomial system the solving degree is always upper bounded by the Castelnuovo-Mumford regularity and henceforth by the Macaulay bound, which only takes the degrees and number of variables of the input polynomials into account. In this paper we extend their framework to iterated polynomial systems, the standard polynomial model for symmetric ciphers and hash functions. In particular, we prove solving degree bounds for various attacks on MiMC, Feistel-MiMC, Feistel-MiMC-Hash, Hades and GMiMC. Our bounds fall in line with the hypothesized complexity of Gröbner basis attacks on these designs, and to the best of our knowledge this is the first time that a mathematical proof for these complexities is provided. Moreover, by studying polynomials with degree falls we can prove lower bounds on the Castelnuovo-Mumford regularity for attacks on MiMC, Feistel-MiMC and Feistel-MiMCHash provided that only a few solutions of the corresponding iterated polynomial system originate from the base field. Hence, regularity-based solving degree estimations can never surpass a certain threshold, a desirable property for cryptographic polynomial systems.
2024
TOSC
Equivalence of Generalised Feistel Networks
This paper focuses on equivalences between Generalised Feistel Networks (GFN) of type-II. We introduce a new definition of equivalence which captures the concept that two GFNs are identical up to re-labelling of the inputs/outputs, and give a procedure to test this equivalence relation. Such two GFNs are therefore cryptographically equivalent for several classes of attacks. It induces a reduction o the space of possible GFNs: the set of the (k!)2 possible even-odd GFNs with 2k branches can be partitioned into k! different classes.This result can be very useful when looking for an optimal GFN regarding specific computationally intensive properties, such as the minimal number of active S-boxes in a differential trail. We also show that in several previous papers, many GFN candidates are redundant as they belong to only a few classes. Because of this reduction of candidates, we are also able to suggest better permutations than the one of WARP: they reach 64 active S-boxes in one round less and still have the same diffusion round that WARP. Finally, we also point out a new family of permutations with good diffusion properties.
2024
TOSC
Design of a Linear Layer Optimised for Bitsliced 32-bit Implementation
The linear layer of block ciphers plays an important role in their security In particular, ciphers designed following the wide-trail strategy use the branch number of the linear layer to derive bounds on the probability of linear and differential trails. At FSE 2014, the LS-design construction was introduced as a simple and regular structure to design bitsliced block ciphers. It considers the internal state as a bit matrix, and applies alternatively an identical S-Box on all the columns, and an identical L-Box on all the lines. Security bounds are derived from the branch number of the L-Box.In this paper, we focus on bitsliced linear layers inspired by the LS-design construction and the Spook AEAD algorithm. We study the construction of bitsliced linear transformations with efficient implementations using XORs and rotations (optimized for bitsliced ciphers implemented on 32-bit processors), and a high branch number. In order to increase the density of the activity patterns, the linear layer is designed on the whole state, rather than using multiple parallel copies of an L-Box. Our main result is a linear layer for 128-bit ciphers with branch number 21, improving upon the best 32-bit transformation with branch number 12, and the one of Spook with branch number 16.
2024
TOSC
Tightening Leakage Resilience of the Suffix Keyed Sponge
Lightweight cryptographic constructions are often optimized on multiple aspects that put the security bounds to the limit. In this respect, it is important to obtain security bounds that are tight and give an accurate and exact indication of the generic security. However, whereas for black-box security bounds it has become common practice to argue tightness of security bounds, for leakage resilience security bounds this is not the case. This is unfortunate, as for leakage resilience results, tightness is even more important as there is already a lossiness incurred in capturing the actual leakage by a theoretical model in the first place.In this work, we consider the SuKS (Suffix Keyed Sponge) PRF construction and investigate tightness of the leakage resilience bound of Dobraunig and Mennink (ToSC 2019). We observe that, although their black-box security result is tight, their leakage resilience bound is not tight in their bounded leakage term λ. We observe that this is caused by the fact that parts of the security bound contain a term covering multicollisions and a term covering leakage, but an adversary is unable to combine both. We next consider improved security of the SuKS for two types of leakage: fixed position leakage, where the adversary directly learns the value of λ bits of a secret state, and Hamming weight leakage, where the Hamming weight of a fixed part of the state is leaked. For fixed position leakage, a very generous form of bounded leakage, we improve the original bound by making wise use of the multicollision limit function of Daemen et al. (ASIACRYPT 2017). For the more realistic setting of Hamming weight leakage, we structurally revisit the multicollision limit function analysis by including Hamming weight in the computation, a problem that is difficult on its own due to the non-uniform character of this type of leakage. In both cases, we improve and tighten the leakage resilience bound of Dobraunig and Mennink. The improved bound for the SuKS has immediate consequences for the leakage resilience of the NIST lightweight cryptography competition finalist ISAP v2, an authenticated encryption scheme that uses the SuKS internally.
2024
TOSC
Constructing Committing and Leakage-Resilient Authenticated Encryption
The main goal of this work is to construct authenticated encryption (AE) hat is both committing and leakage-resilient. As a first approach for this we consider generic composition as a well-known method for constructing AE schemes. While the leakage resilience of generic composition schemes has already been analyzed by Barwell et al. (Asiacrypt’17), for committing security this is not the case. We fill this gap by providing a separate analysis of the generic composition paradigms with respect to committing security, giving both positive and negative results: By means of a concrete attack, we show that Encrypt-then-MAC is not committing. Furthermore, we prove that Encrypt-and-MAC is committing, given that the underlying schemes satisfy security notions we introduce for this purpose. We later prove these new notions achievable by providing schemes that satisfy them. MAC-then-Encrypt turns out to be more difficult due to the fact that the tag is not outputted alongside the ciphertext as it is done for the other two composition methods. Nevertheless, we give a detailed heuristic analysis of MAC-then-Encrypt with respect to committing security, leaving a definite result as an open task for future work. Our results, in combination with the fact that only Encrypt-then-MAC yields leakage-resilient AE schemes, show that one cannot obtain AE schemes that are both committing and leakage-resilient via generic composition. As a second approach for constructing committing and leakage-resilient AE, we develop a generic transformation that turns an arbitrary AE scheme into one that fulfills both properties. The transformation relies on a keyed function that is both binding, i.e., it is hard to find key-input pairs that result in the same output, and leakage-resilient pseudorandom.
2024
TOSC
XDRBG: A Proposed Deterministic Random Bit Generator Based on Any XOF
A deterministic random bit generator (DRBG) generates pseudorandom bits from an unpredictable seed, i.e., a seed drawn from any random source with sufficient entropy. The current paper formalizes a security notion for a DRBG, in which an attacker may make any legal sequence of requests to the DRBG and sometimes compromise the DRBG state, but should still not be able to distingush DRBG outputs from ideal random bits. The paper proposes XDRBG, a new DRBG based on any eXtendable Output Function (XOF) and proves the security of the XDRBG in the ideal-XOF model. The proven bounds are tight, as demonstrated by matching attacks. The paper also discusses the security of XDRBG against quantum attackers. Finally, the paper proposes concrete instantiations of XDRBG, employing either the SHAKE128 or the SHAKE256 XDRBG. Alternative instantiations suitable for lightweight applications can be based on ASCON.
2024
TOSC
2024
TOSC
Building PRFs from TPRPs: Beyond the Block and the Tweak Length Bounds
A secure n-bit tweakable block cipher (TBC) using t-bit tweaks can be modeled as a tweakable uniform random permutation, where each tweak defines an independent random n-bit permutation. When an input to this tweakable permutation is fixed, it can be viewed as a perfectly secure t-bit random function. On the other hand, when a tweak is fixed, it can be viewed as a perfectly secure n-bit random permutation, and it is well known that the sum of two random permutations is pseudorandom up to 2n queries.A natural question is whether one can construct a pseudorandom function (PRF) beyond the block and the tweak length bounds using a small number of calls to the underlying tweakable permutations. A straightforward way of constructing a PRF from tweakable permutations is to xor the outputs from two tweakable permutations with c bits of the input to each permutation fixed. Using the multi-user security of the sum of two permutations, one can prove that the (t + n − c)-to-n bit PRF is secure up to 2n+c queries.In this paper, we propose a family of PRF constructions based on tweakable permutations, dubbed XoTPc, achieving stronger security than the straightforward construction. XoTPc is parameterized by c, giving a (t + n − c)-to-n bit PRF. When t &lt; 3n and c = t/3 , XoTPt/3 becomes an (n + 2t/3 )-to-n bit pseudorandom function, which is secure up to 2n+2t/3 queries. It provides security beyond the block and the tweak length bounds, making two calls to the underlying tweakable permutations. In order to prove the security of XoTPc, we extend Mirror theory to q ≫ 2n, where q is the number of equations. From a practical point of view, our construction can be used to construct TBC-based MAC finalization functions and CTR-type encryption modes with stronger provable security compared to existing schemes.
2024
TOSC
Permutation-Based Hashing Beyond the Birthday Bound
It is known that the sponge construction is tightly indifferentiable from a random oracle up to around 2c/2 queries, where c is the capacity. In particular, it cannot provide generic security better than half of the underlying permutation size. In this paper, we aim to achieve hash function security beating this barrier. We present a hashing mode based on two b-bit permutations named the double sponge. The double sponge can be seen as the sponge embedded within the double block length hashing paradigm, making two permutation calls in parallel interleaved with an efficient mixing function. Similarly to the sponge, the permutation size is split as b = r+c, and the underlying compression function absorbs r bits at a time. We prove that the double sponge is indifferentiable from a random oracle up to around 22c/3 queries. This means that the double sponge achieves security beyond the birthday bound in the capacity. In addition, if c &gt; 3b/4, the double sponge beats the birthday bound in the primitive size, to our knowledge being the first hashing mode based on a permutation that accomplices this feature.
2024
TOSC
Small Stretch Problem of the DCT Scheme and How to Fix It
DCT is a beyond-birthday-bound (BBB) deterministic authenticated encryption (DAE) mode proposed by Forler et al. in ACISP 2016, ensuring integrity by redundancy. The instantiation of DCT employs the BRW polynomial, which is more efficient than the usual polynomial in GCM by reducing half of the multiplication operations. However, we show that DCT suffers from a small stretch problem similar to GCM. When the stretch length τ is small, choosing a special m-block message, we can reduce the number of queries required by a successful forgery to O(2τ/m). We emphasize that this attack efficiently balances space and time complexity but does not contradict the security bounds of DCT. Finally, we propose an improved scheme named Robust DCT (RDCT) with a minor change to DCT, which improves the security when τ is small and makes it resist the above attack.
2024
TOSC
Key Committing Attacks against AES-based AEAD Schemes
Recently, there has been a surge of interest in the security of authenticated encryption with associated data (AEAD) within the context of key commitment frameworks. Security within this framework ensures that a ciphertext chosen by an adversary does not decrypt to two different sets of key, nonce, and associated data. Despite this increasing interest, the security of several widely deployed AEAD schemes has not been thoroughly examined within this framework. In this work, we assess the key committing security of several AEAD schemes. First, the AEGIS family, which emerged as a winner in the Competition for Authenticated Encryption: Security, Applicability, and Robustness (CAESAR), and has been proposed to standardization at the IETF. A now outdated version of the draft standard suggested that AEGIS could qualify as a fully committing AEAD scheme; we prove that it is not the case by proposing a novel attack applicable to all variants, which has been experimentally verified. We also exhibit a key committing attack on Rocca-S. Our attacks are executed within the FROB game setting, which is known to be one of the most stringent key committing frameworks. This implies that they remain valid in other, more relaxed frameworks, such as CMT-1, CMT-4, and so forth. Finally, we show that applying the same attack techniques to Rocca and Tiaoxin-346 does not compromise their key-committing security. This observation provides valuable insights into the design of such secure round update functions for AES-based AEAD schemes.
2024
TOSC
Improved Meet-in-the-Middle Nostradamus Attacks on AES-like Hashing
The Nostradamus attack was originally proposed as a security vulnerability for a hash function by Kelsey and Kohno at EUROCRYPT 2006. It requires the attacker to commit to a hash value y of an iterated hash function H. Subsequently, upon being provided with a message prefix P, the adversary’s task is to identify a suffix S such that H(P∥S) equals y. Kelsey and Kohno demonstrated a herding attack requiring O(√n · 22n/3) evaluations of the compression function of H, where n represents the output and state size of the hash, placing this attack between preimage attacks and collision searches in terms of complexity. At ASIACRYPT 2022, Benedikt et al. transform Kelsey and Kohno’s attack into a quantum variant, decreasing the time complexity from O(√n · 22n/3) to O( 3√n · 23n/7). At ToSC 2023, Zhang et al. proposed the first dedicated Nostradamus attack on AES-like hashing in both classical and quantum settings. In this paper, we have made revisions to the multi-target technique incorporated into the meet-in-the-middle automatic search framework. This modification leads to a decrease in time complexity during the online linking phase, effectively reducing the overall attack time complexity in both classical and quantum scenarios. Specifically, we can achieve more rounds in the classical setting and reduce the time complexity for the same round in the quantum setting.
2024
TOSC
Cryptanalysis of QARMAv2
QARMAv2 is a general-purpose and hardware-oriented family of lightweight tweakable block ciphers (TBCs) introduced in ToSC 2023. QARMAv2, as a redesign of QARMAv1 with a longer tweak and tighter security margins, is also designed to be suitable for cryptographic memory protection and control flow integrity. The designers of QARMAv2 provided a relatively comprehensive security analysis in the design specification, e.g., some bounds for the number of attacked rounds in differential and boomerang analysis, together with some concrete impossible differential, zerocorrelation, and integral distinguishers. As one of the first third-party cryptanalysis of QARMAv2, Hadipour et al., [HGSE24] significantly improved the integral distinguishers of QARMAv2, and provided the longest concrete distinguishers of QARMAv2 up to now. However, they provided no key recovery attack based on their distinguishers. This paper delves into the cryptanalysis of QARMAv2 to enhance our understanding of its security. Given that the integral distinguishers of QARMAv2 are the longest concrete distinguishers for this cipher so far, we focus on integral attack. To this end, we first further improve the automatic tool introduced by Hadipour et al. [HSE23,HGSE24] for finding integral distinguishers of TBCs following the TWEAKEY framework. This new tool exploits the MixColumns property of QARMAv2 to find integral distinguishers more suitable for key recovery attacks. Then, we combine several techniques for integral key recovery attacks, e.g., Meet-in-the-middle and partial-sum techniques to build a fine-grained integral key recovery attack on QARMAv2. Notably, we demonstrate how to leverage the low data complexity of the integral distinguishers of QARMAv2 to reduce the memory complexity of the meet-in-the-middle technique. As a result, we successfully present the first concrete key recovery attacks on reduced-round versions of QARMAv2. This includes attacking 13 rounds of QARMAv2-64-128 with a single tweak block (T = 1), 14 rounds of QARMAv2-64-128 with two independent tweak blocks (T = 2), and 16 rounds of QARMAv2-128-256 with two independent tweak blocks (T = 2), all in an unbalanced setting. Our attacks do not compromise the claimed security of QARMAv2, but they shed more light on the cryptanalysis of this cipher.
2024
TOSC
Algebraic Attack on FHE-Friendly Cipher HERA Using Multiple Collisions
Fully homomorphic encryption (FHE) is an advanced cryptography technique to allow computations (i.e., addition and multiplication) over encrypted data. After years of effort, the performance of FHE has been significantly improved and it has moved from theory to practice. The transciphering framework is another important technique in FHE to address the issue of ciphertext expansion and reduce the client-side computational overhead. To apply the transciphering framework to the CKKS FHE scheme, a new transciphering framework called the Real-to-Finite-Field (RtF) framework and a corresponding FHE-friendly symmetric-key primitive called HERA were proposed at ASIACRYPT 2021. Although HERA has a very similar structure to AES, it is considerably different in the following aspects: 1) the power map x → x3 is used as the S-box; 2) a randomized key schedule is used; 3) it is over a prime field Fp with p &gt; 216. In this work, we perform the first third-party cryptanalysis of HERA, by showing how to mount new algebraic attacks with multiple collisions in the round keys. Specifically, according to the special way to randomize the round keys in HERA, we find it possible to peel off the last nonlinear layer by using collisions in the last-round key and a simple property of the power map. In this way, we could construct an overdefined system of equations of a much lower degree in the key, and efficiently solve the system via the linearization technique. As a esult, for HERA with 192 and 256 bits of security, respectively, we could break some parameters under the same assumption made by designers that the algebra constant ω for Gaussian elimination is ω = 2, i.e., Gaussian elimination on an n × n matrix takes O(nω) field operations. If using more conservative choices like ω ∈ {2.8, 3}, our attacks can also successfully reduce the security margins of some variants of HERA to only 1 round. However, the security of HERA with 80 and 128 bits of security is not affected by our attacks due to the high cost to find multiple collisions. In any case, our attacks reveal a weakness of HERA caused by the randomized key schedule and its small state size.
2024
TOSC
Improved Search for Integral, Impossible Differential and Zero-Correlation Attacks: Application to Ascon, ForkSKINNY, SKINNY, MANTIS, PRESENT and QARMAv2
Integral, impossible-differential (ID), and zero-correlation (ZC) attacks are three of the most important attacks on block ciphers. However, manually finding these attacks can be a daunting task, which is why automated methods are becoming increasingly important. Most automatic tools regarding integral, ZC, and ID attacks have focused only on finding distinguishers rather than complete attacks. At EUROCRYPT 2023, Hadipour et al. proposed a generic and efficient constraint programming (CP) model based on satisfiability for finding ID, ZC, and integral distinguishers. This new model can be extended to a unified CP model for finding full key recovery attacks. However, it has limitations, including determining the contradiction location beforehand and a cell-wise model unsuitable for weakly aligned ciphers like Ascon and PRESENT. They also deferred developing a CP model for the partial-sum technique in key recovery as future work.In this paper, we enhance Hadipour et al.’s method in several ways. First, we remove the limitation of determining the contradiction location in advance. Second, we show how to extend the distinguisher model to a bit-wise model, considering the internal structure of S-boxes and keeping the model based on satisfiability. Third, we introduce a CP model for the partial-sum technique for the first time. To show the usefulness and versatility of our approach, we apply it to various designs, from strongly aligned ones like ForkSKINNY and QARMAv2 to weakly aligned ones such as Ascon and PRESENT, yielding significantly improved results. To mention a few of our results, we improve the integral distinguisher of QARMAv2-128 (resp. QARMAv2-64) by 7 (resp. 5) rounds, and the integral distinguisher of ForkSKINNY by 1 round, only thanks to our cell-wise distinguisher modelings. By using our new bit-wise modeling, our tool can find a group of 2155 5-round ID and ZC distinguishers for Ascon in only one run, taking a few minutes on a regular laptop. The new CP model for the partial-sum technique enhances integral attacks on all SKINNY variants, notably improving the best attack on SKINNY-n-n in the single-key setting by 1 round. We also enhance ID attacks on ForkSKINNY and provide the first analysis of this cipher in a limited reduced-round setting. Our methods are generic and applicable to other block ciphers.
2024
EUROCRYPT
Reduction from sparse LPN to LPN, Dual Attack 3.0
The security of code-based cryptography relies primarily on the hardness of decoding generic linear codes. Until very recently, all the best algorithms for solving the decoding problem were information set decoders ($\mathsf{ISD}$). However, recently a new algorithm called \RLPN-decoding which relies on a completely different approach was introduced and it has been shown that \RLPN \ outperforms significantly $\mathsf{ISD}$ decoders for a rather large range of rates. This \RLPN \ decoder relies on two ingredients, first reducing decoding to some underlying \LPN{} problem, and then computing efficiently many parity-checks of small weight when restricted to some positions. We revisit \RLPN-decoding by noticing that, in this algorithm, decoding is in fact reduced to a sparse-\LPN{} problem, namely with a secret whose Hamming weight is small. Our new approach consists this time in making an additional reduction from sparse-\LPN{} to plain-\LPN{} with a coding approach inspired by $\mathsf{coded}$-$\mathsf{BKW}$. It outperforms significantly the $\mathsf{ISD}$'s and \RLPN \ for code rates smaller than $\hardestrate$. This algorithm can be viewed as the code based cryptography cousin of recent dual attacks in lattice based cryptography. We depart completely from the traditional analysis of this kind of algorithm which uses a certain number of independence assumptions that have been strongly questioned recently in the latter domain. We give instead a formula for the \LPNs noise relying on duality which allows to analyze the behavior of the algorithm by relying only on the analysis of a certain weight distribution. By using only a minimal assumption whose validity has been verified experimentally we are able to justify the correctness of our algorithm. This key tool, namely the duality formula, can be readily adapted to the lattice setting and is shown to give a simple explanation for some phenomena observed on dual attacks in lattices in \cite{DP23}.
2024
EUROCRYPT
A General Framework for Lattice-Based ABE Using Evasive Inner-Product Functional Encryption
We present a general framework for constructing attribute-based encryption (ABE) schemes for arbitrary function class based on lattices from two ingredients, i) a noisy linear secret sharing scheme for the class and ii) a new type of inner-product functional encryption (IPFE) scheme, termed *evasive* IPFE, which we introduce in this work. We propose lattice-based evasive IPFE schemes and establish their security under simple conditions based on variants of evasive learning with errors (LWE) assumption recently proposed by Wee [EUROCRYPT '22] and Tsabary [CRYPTO '22]. Our general framework is modular and conceptually simple, reducing the task of constructing ABE to that of constructing noisy linear secret sharing schemes, a more lightweight primitive. The versatility of our framework is demonstrated by three new ABE schemes based on variants of the evasive LWE assumption. - We obtain two ciphertext-policy ABE schemes for all polynomial-size circuits with a predetermined depth bound. One of these schemes has *succinct* ciphertexts and secret keys, of size polynomial in the depth bound, rather than the circuit size. This eliminates the need for tensor LWE, another new assumption, from the previous state-of-the-art construction by Wee [EUROCRYPT '22]. - We develop ciphertext-policy and key-policy ABE schemes for deterministic finite automata (DFA) and logspace Turing machines (L). They are the first lattice-based public-key ABE schemes supporting uniform models of computation. Previous lattice-based schemes for uniform computation were limited to the secret-key setting or offered only weaker security against bounded collusion. Lastly, the new primitive of evasive IPFE serves as the lattice-based counterpart of pairing-based IPFE, enabling the application of techniques developed in pairing-based ABE constructions to lattice-based constructions. We believe it is of independent interest and may find other applications.
2024
EUROCRYPT
Garbled Circuit Lookup Tables with Logarithmic Number of Ciphertexts
Garbled Circuit (GC) is a basic technique for practical secure computation. GC handles Boolean circuits; it consumes significant network bandwidth to transmit encoded gate truth tables, each of which scales with the computational security parameter $\kappa$. GC optimizations that reduce bandwidth consumption are valuable. It is natural to consider a generalization of Boolean two-input one-output gates (represented by $4$-row one-column lookup tables, LUTs) to arbitrary $N$-row $m$-column LUTs. Known techniques for this do not scale, with naive size-$O(Nm\kappa)$ garbled LUT being the most practical approach in many scenarios. Our novel garbling scheme -- logrow -- implements GC LUTs while sending only a logarithmic in $N$ number of ciphertexts! Specifically, let $n = \lceil \log_2 N \rceil$. We allow the GC parties to evaluate a LUT for $(n-1)\kappa + nm\kappa + Nm$ bits of communication. logrow is compatible with modern GC advances, e.g. half gates and free XOR. Our work improves state-of-the-art GC handling of several interesting applications, such as privacy-preserving machine learning, floating-point arithmetic, and DFA evaluation.
2024
EUROCRYPT
Universal Composable Password Authenticated Key Exchange for the Post-Quantum World
In this paper, we construct the first password authenticated key exchange (PAKE) scheme from isogenies with Universal Composable (UC) security in the random oracle model (ROM). We also construct the first two PAKE schemes with UC security in the quantum random oracle model (QROM), one is based on the learning with error (LWE) assumption, and the other is based on the group-action decisional Diffie-Hellman (GA-DDH) assumption in the isogeny setting. To obtain our UC-secure PAKE scheme in ROM, we propose a generic construction of PAKE from basic lossy public key encryption (LPKE) and CCA-secure PKE. We also introduce a new variant of LPKE, named extractable LPKE (eLPKE). By replacing the basic LPKE with eLPKE, our generic construction of PAKE achieves UC security in QROM. The LPKE and eLPKE have instantiations not only from LWE but also from GA-DDH, which admit four specific PAKE schemes with UC security in ROM or QROM, based on LWE or GA-DDH.
2024
EUROCRYPT
Bulletproofs++: Next Generation Confidential Transactions via Reciprocal Set Membership Arguments
Zero-knowledge proofs are a cryptographic cornerstone of privacy-preserving technologies such as “Confidential Transactions” (CT), which aims at hiding monetary amounts in cryptocurrency transactions. Due to its asymptotically logarithmic proof size and transparent setup, most state-of-the-art CT protocols use the Bulletproofs (BP) zero-knowledge proof system for set membership proofs such as range proofs. However, even taking into account recent efficiency improvements, BP comes with a serious overhead in terms of concrete proof size as well as verifier running time and thus puts a large burden on practical deployments of CT and its extensions. In this work, we introduce Bulletproofs++ (BP++), a drop-in replacement for BP that improves its concrete efficiency and compactness significantly. As for BP, the security of BP++ relies only on the hardness of the discrete logarithm problem in the random oracle model, and BP++ retains all features of Bulletproofs including transparent setup and support for proof aggregation, multi-party proving and batch verification. Asymptotically, BP++ range proofs require only O(n/ log n) group scalar multiplications compared to O(n) for BP and BP+. At the heart of our construction are novel techniques for permutation and set membership, enabling highly efficient proofs of statements encoded as arithmetic circuits. Concretely, a single BP++ range proof to establish that a committed value is in a 64-bit range (as commonly required by CT) is just 416 bytes over a 256-bit elliptic curve, 38% smaller than an equivalent BP and 27% smaller than BP+. When instantiated on the secp256k1 curve as used in Bitcoin, our benchmarks show that proving is about 5 times faster than BP and verification is about 3 times faster than BP+. When aggregating 32 range proofs, proving and verification are about 9.5 times and 5.5 times faster, respectively.
2024
EUROCRYPT
Improved Differential Meet-In-The-Middle Cryptanalysis
In this paper, we extend the applicability of differential meet-in-the-middle attacks, proposed at Crypto 2023, to truncated differentials, and in addition, we introduce three new ideas to improve this type of attack: we show how to add longer structures than the original paper, we show how to improve the key recovery steps by introducing some probability in them, and we combine this type of attacks with the state-test technique, that was introduced in the context of impossible differential attacks. Furthermore, we have developed a MILP-based tool to automate the search for a truncated differential-MITM attack with optimized overall complexity, incorporating some of the proposed improvements. Thanks to this, we can build the best known attacks on the cipher CRAFT, reaching 23 rounds against 21 previously; we provide a new attack on 23 round SKINNY-64-192, and we improve the best attacks on SKINNY-128-384.
2024
EUROCRYPT
The Exact Multi-User Security of (Tweakable) Key Alternating Ciphers with a Single Permutation
We prove the tight multi-user (mu) security of the (tweakable) key alternating ciphers (KACs) for any round r with a single permutation and r-wise independent subkeys, providing a more realistic provable-security foundation for block ciphers. After Chen and Steinberger proved the single-user (su) tight security bound of r-round KAC in 2014, its extension under more realistic conditions has become a new research challenge. The state-of-the-art includes (i) single permutation by Yu et al., (ii) mu-security by Hoang and Tessaro, and (iii) correlated subkeys by Tessaro and Zhang. However, the previous works considered these conditions independently, and the tight security bound of r-round KACs with all of these conditions is an open research problem. We address it by giving the new mu-bound with an n-bit message space, approximately q*((p+rq)/(2^n))^r , wherein p and q are the number of primitive and construction queries, respectively. The bound ensures the security up to the O(2^(rn/(r+1))) query complexity and is tight, matching the conven- tional upper bound. Moreover, our result easily extends to the r-round tweakable KAC when its subkeys generated by a tweak function is r-wise independent. The proof is based on the re-sampling method originally proposed for the mu-security analysis of the triple encryption. Its extension to any rounds is the core technique enabling the new bound.
2024
EUROCRYPT
Plover: Masking-Friendly Hash-and-Sign Lattice Signatures
We introduce a toolkit for transforming lattice-based hash-and-sign signature schemes into masking-friendly signatures secure in the t-probing model. Until now, efficiently masking lattice-based hash-and-sign schemes was an open problem unsuccessful attempts such as Mitaka. Our toolkit includes noise flooding to mitigate statistical leaks and an extended Strong Non-Interfering probing security property (SNIu) for masking gadgets to handle unshared inputs. Our main conceptual contribution lies in finding a systematic way to use noise flooding within the hash-and-sign paradigm. Our main technical contribution is to formalize, prove, instantiate and implement a hash-and-sign scheme based on these techniques. We showcase the efficiency of our techniques in a signature scheme, Plover-RLWE, based on (hint) Ring-LWE. It is the first lattice-based masked hash-and-sign scheme with quasi-linear complexity O(d log d) in the number of shares d. Our performances are competitive with the state-of-the-art masking-friendly signature, the Fiat-Shamir scheme Raccoon.
2024
EUROCRYPT
New Limits of Provable Security and Applications to ElGamal Encryption
We provide new results showing that ElGamal encryption cannot be proven CCA1-secure -- a long-standing open problem in cryptography. Our result follows from a very broad, meta-reduction-based impossibility result on random self-reducible relations with efficiently re-randomizable witnesses. The techniques that we develop allow, for the first time, to provide impossibility results for very weak security notions where the challenger outputs fresh challenge statements at the end of the security game. This can be used to finally tackle encryption-type definitions that have remained elusive in the past. We show that our results have broad applicability by casting several known cryptographic setups as instances of random self-reducible and re-randomizable relations. These setups include general semi-homomorphic PKE and the large class of certified homomorphic one-way bijections. As a result, we also obtain new impossibility results for the IND-CCA1 security of the PKEs of Paillier and Damgard-Jurik, and many one-more inversion assumptions like the one-more DLOG or the one-more RSA assumption.
2024
EUROCRYPT
Partial Sums Meet FFT: Improved Attack on 6-Round AES
The partial sums cryptanalytic technique was introduced in 2000 by Ferguson et al., who used it to break 6-round AES with time complexity of $2^{52}$ S-box computations -- a record that has not been beaten ever since. In 2014, Todo and Aoki showed that for 6-round AES, partial sums can be replaced by a technique based on the Fast Fourier Transform (FFT), leading to an attack with a comparable complexity. In this paper we show that the partial sums technique can be combined with an FFT-based technique, to get the best of the two worlds. Using our combined technique, we obtain an attack on 6-round AES with complexity of about $2^{46.4}$ additions. We fully implemented the attack experimentally, along with the partial sums attack and the Todo-Aoki attack, and confirmed that our attack improves the best known attack on 6-round AES by a factor of more than 32. We expect that our technique can be used to significantly enhance numerous attacks that exploit the partial sums technique. To demonstrate this, we use our technique to improve the best known attack on 7-round Kuznyechik by a factor of more than 80, and to reduce the complexity of the best known attack on the full MISTY1 from $2^{69.5}$ to $2^{67}$.
2024
EUROCRYPT
Symmetric Signcryption and E2EE Messaging in Keybase
We introduce a new cryptographic primitive called symmetric signcryption, which differs from traditional signcryption because the sender and recipient share a secret key. We prove that a natural composition of symmetric encryption and signatures achieves strong notions of security against attackers that can learn and control many keys. We then identify that the core encryption algorithm of the Keybase encrypted messaging protocol can be modeled as a symmetric signcryption scheme. We prove the security of this algorithm, though our proof requires assuming non-standard, brittle security properties of the underlying primitives.
2024
EUROCRYPT
Updatable Public Key Encryption, Revisited
We revisit Updatable Public-Key Encryption (UPKE), which was introduced as a practical mechanism for building forward-secure cryptographic protocols. We begin by observing that all UPKE notions to date are neither syntactically flexible nor secure enough for the most important multi-party protocols motivating UPKE. We provide an intuitive taxonomy of UPKE properties -- some partially or completely overlooked in the past -- along with an overview of known (explicit and implicit) UPKE constructions. We then introduce a formal UPKE definition capturing all intuitive properties needed for multi-party protocols. Next, we provide a practical pairing-based construction for which we provide concrete bounds under a standard assumption in the random oracle and the algebraic group model. The efficiency profile of the scheme compares very favorably with existing UPKE constructions (despite the added flexibility and stronger security). For example, when used to improve the forward security of the Messaging Layer Security protocol [RFC9420], our new UPKE construction requires less than 1.5% of the bandwidth of the next-most efficient UPKE construction satisfying the strongest UPKE notion considered so far.
2024
EUROCRYPT
Approximate Lower Bound Arguments
Suppose a prover, in possession of a large body of valuable evidence, wants to quickly convince a verifier by presenting only a small portion of the evidence. We define an Approximate Lower Bound Argument, or ALBA, which allows the prover to do just that: to succinctly prove knowledge of a large number of elements satisfying a predicate (or, more generally, elements of a sufficient total weight when a predicate is generalized to a weight function). The argument is approximate because there is a small gap between what the prover actually knows and what the verifier is convinced the prover knows. This gap enables very efficient schemes. We present noninteractive constructions of ALBA in the random oracle and Uniform Random String models and show that our proof sizes are nearly optimal. We also show how our constructions can be made particularly communication-efficient when the evidence is distributed among multiple provers working together, which is of practical importance when ALBA is applied to a decentralized setting. We demonstrate two very different applications of ALBAs: for large-scale decentralized signatures and for achieving universal composability in general-purpose succinct proof systems (SNARKs).
2024
EUROCRYPT
Integrating Causality in Messaging Channels
Causal reasoning plays an important role in the comprehension of communication, but it has been elusive so far how causality should be properly preserved by instant messaging services. To the best of our knowledge, causality preservation is not even treated as a desired security property by most (if not all) existing secure messaging protocols like Signal. This is probably due to the intuition that causality seems already preserved when all received messages are intact and displayed according to their sending order. Our starting point is to notice that this intuition is wrong. Until now, for messaging channels (where conversations take place), both the proper causality model and the provably secure constructions have been left open. Our work fills this gap, with the goal to facilitate the formal understanding of causality preservation in messaging. First, we focus on the common two-user secure messaging channels and model the desired causality preservation property. We take the popular Signal protocol as an example and analyze the causality security of its cryptographic core (the double-ratchet mechanism). We show its inadequacy with a simple causality attack, then fix it such that the resulting Signal channel is causality-preserving, even in a strong sense that guarantees post-compromise security. Our fix is actually generic: it can be applied to any bidirectional channel to gain strong causality security. Then, we model causality security for the so-called message franking channels. Such a channel additionally enables end users to report individual abusive messages to a server (e.g., the service provider), where this server relays the end-to-end-encrypted communication between users. Causality security in this setting further allows the server to retrieve the necessary causal dependencies of each reported message, essentially extending isolated reported messages to message flows. This has great security merit for dispute resolution, because a benign message may be deemed abusive when isolated from the context. As an example, we apply our model to analyze Facebook’s message franking scheme. We show that a malicious user can easily trick Facebook (i.e., the server) to accuse an innocent user. Then we fix this issue by amending the underlying message franking channel to preserve the desired causality.
2024
EUROCRYPT
Isogeny problems with level structure
Given two elliptic curves and the degree of an isogeny between them, finding the isogeny is believed to be a difficult problem—upon which rests the security of nearly any isogeny-based scheme. If, however, to the data above we add information about the behavior of the isogeny on a large enough subgroup, the problem can become easy, as recent cryptanalyses on SIDH have shown. Between the restriction of the isogeny to a full N‑torsion subgroup and no "torsion information" at all lies a spectrum of interesting intermediate problems, raising the question of how easy or hard each of them is. Here we explore modular isogeny problems where the torsion information is masked by the action of a group of 2×2 matrices. We give reductions between these problems, classify them by their difficulty, and link them to security assumptions found in the literature.
2024
EUROCRYPT
Jolt: SNARKs for Virtual Machines via Lookups
Succinct Non-interactive Arguments of Knowledge (SNARKs) allow an untrusted prover to establish that it correctly ran some “witness-checking procedure” on a witness. A zkVM (short for zero-knowledge Virtual Machine) is a SNARK that allows the witness-checking procedure to be specified as a computer program written in the assembly language of a specific instruction set architecture (ISA). A front-end converts computer programs into a lower-level representation such as an arithmetic circuit or generalization thereof. A SNARK for circuit- satisfiability can then be applied to the resulting circuit. We describe a new front-end technique called Jolt that applies to a variety of ISAs. Jolt arguably realizes a vision called the lookup singularity, which seeks to produce circuits that only perform lookups into pre-determined lookup tables. The circuits output by Jolt primarily perform lookups into a gigantic lookup table, of size more than 2^128, which depends only on the ISA. The validity of the lookups are proved via a new lookup argument described in a companion work called Lasso. Although size 2^128 tables are vastly too large to materialize in full, the tables arising in Jolt are structured, avoiding costs that grow linearly with the table size. We describe performance and auditability benefits of Jolt compared to prior zkVMs, focusing on the popular RISC-V ISA as a concrete example. The dominant cost for the Jolt prover applied to this ISA (on 64-bit data types) is cryptographically committing to about six 256-bit field elements per step of the RISC-V CPU. This compares favorably to prior zkVM provers, even those focused on far simpler VMs.
2024
EUROCRYPT
Unlocking the lookup singularity with Lasso
This paper introduces Lasso, a new family of lookup arguments, which allow an untrusted prover to commit to a vector $a \in \mathbb{F}^m$ and prove that all entries of $a$ reside in some predetermined table $t \in \mathbb{F}^n$. Lasso's performance characteristics unlock the so-called ``lookup singularity''. Lasso works with any multilinear polynomial commitment scheme, and provides the following efficiency properties. (1) For $m$ lookups into a table of size $n$, Lasso's prover commits to just $m+n$ field elements. Moreover, the committed field elements are \emph{small}, meaning that, no matter how big the field $\mathbb{F}$ is, they are all in the set $\{0, \dots, m\}$. When using a multiexponentiation-based commitment scheme, this results in the prover's costs dominated by only $O(m+n)$ group \emph{operations} (e.g., elliptic curve point additions), plus the cost to prove an evaluation of a multilinear polynomial whose evaluations over the Boolean hypercube are the table entries. This represents a significant improvement in prover costs over prior lookup arguments (e.g., plookup, Halo2's lookups, lookup arguments based on logarithmic derivatives). (2) Unlike all prior lookup arguments, if the table $t$ is structured (in a precise sense that we define), then no party needs to commit to $t$, enabling the use of much larger tables than prior works (e.g., of size $2^{128}$ or larger). Moreover, Lasso's prover only ``pays'' in runtime for table entries that are accessed by the lookup operations. This applies to tables commonly used to implement range checks, bitwise operations, big-number arithmetic, and even transitions of a full-fledged CPU such as RISC-V. Specifically, for any integer parameter $c>1$, Lasso's prover's dominant cost is committing to $3 \cdot c \cdot m + c \cdot n^{1/c}$ field elements. Furthermore, all these field elements are ``small'', meaning they are in the set $\{0, \dots, \max\{m, n^{1/c}, q\}-1\}$, where $q$ is the maximum value in any of the sub-tables that collectively capture $t$ (in a precise manner that we define).
2024
EUROCRYPT
Massive Superpoly Recovery with a Meet-in-the-middle Framework -- Improved Cube Attacks on Trivium and Kreyvium
The cube attack extracts the information of secret key bits by recovering the coefficient called superpoly in the output bit with respect to a subset of plaintexts/IV, which is called a cube. While the division property provides an efficient way to detect the structure of the superpoly, superpoly recovery could still be prohibitively costly if the number of rounds is sufficiently high. In particular, Core Monomial Prediction (CMP) was proposed at ASIACRYPT 2022 as a scaled-down version of Monomial Prediction (MP), which sacrifices accuracy for efficiency but ultimately gets stuck at 848 rounds of \trivium. In this paper, we provide new insights into CMP by elucidating the algebraic meaning to the core monomial trails. We prove that it is sufficient to recover the superpoly by extracting all the core monomial trails, an approach based solely on CMP, thus demonstrating that CMP can achieve perfect accuracy as MP does. We further reveal that CMP is still MP in essence, but with variable substitutions on the target function. Inspired by the divide-and-conquer strategy that has been widely used in previous literature, we design a meet-in-the-middle (MITM) framework, in which the CMP-based approach can be embedded to achieve a speedup. To illustrate the power of these new techniques, we apply the MITM framework to \trivium, \grain and \kreyvium. As a result, not only can the previous computational cost of superpoly recovery be reduced (e.g., 5x faster for superpoly recovery on 192-round \grain), but we also succeed in recovering superpolies for up to 851 rounds of \trivium and up to 899 rounds of \kreyvium. This surpasses the previous best results by respectively 3 and 4 rounds. Using the memory-efficient M\"obius transform proposed at EUROCRYPT 2021, we can perform key recovery attacks on target ciphers, even though the superpoly may contain over $2^{40}$ monomials. This leads to the best key recovery attacks on the target ciphers.
2024
EUROCRYPT
Circuit Bootstrapping: Faster and Smaller
We present a novel circuit bootstrapping algorithm that outperforms the state-of-the-art TFHE method with 9.9× speedup and 15.6× key size reduction. These improvements can be attributed to two technical contributions. Firstly, we redesigned the circuit bootstrapping workflow to operate exclusively under the ring ciphertext type, which eliminates the need of conversion between LWE and RLWE ciphertexts. Secondly, we improve the LMKC+ blind rotation algorithm by reducing the number of automorphisms, then propose the first automorphism type multi-value functional bootstrapping. These automorphism-based techniques lead to further key size optimization, and are of independent interest besides circuit bootstrapping. Based our new circuit bootstrapping we can evaluate AES-128 in 26.2s (single thread), achieving 10.3× speedup compared with the state-of-the-art TFHE-based approach.
2024
EUROCRYPT
Anamorphic Encryption, Revisited
Anamorphic encryption refers to an enhanced version of an established PKE scheme which can be set up with an additional so-called double key, shared by sender and receiver. This protects against a dictator that can force the receiver to reveal the secret keys for the PKE scheme, but who is oblivious about the existence of the double key. We identify two limitations of the original model by Persiano, Phan, and Yung (EUROCRYPT 2022). First, in their definition a double key can only be generated once, together with a key-pair. This has the drawback that a receiver who wants to use the anamorphic mode after a dictator comes to power, needs to deploy a new key-pair, a potentially suspicious act. Second, a receiver cannot distinguish whether or not a ciphertext contains a covert message. In this work we propose a new model that overcomes these limitations. First, we allow to associate multiple double keys to a key-pair, after its deployment. This also enables deniability in case the double key only depends on the public key. Second, we propose a natural robustness notion, which guarantees that anamorphically decrypting a regularly encrypted message results in a special symbol indicating that no covert message is contained, which also eliminates certain attacks. Finally, to instantiate our new, stronger definition of anamorphic encryption, we provide generic and concrete constructions. Concretely, we show that ElGamal and Cramer-Shoup satisfy a new condition, selective randomness recoverability, which enables robust anamorphic extensions, and we also provide a robust anamorphic extension for RSA-OAEP.
2024
JOFC
Time-Space Lower Bounds for Finding Collisions in Merkle–Damgård Hash Functions
We revisit the problem of finding B -block-long collisions in Merkle–Damgård Hash Functions in the auxiliary-input random oracle model, in which an attacker gets a piece of S -bit advice about the random oracle and makes T oracle queries. Akshima, Cash, Drucker and Wee (CRYPTO 2020), based on the work of Coretti, Dodis, Guo and Steinberger (EUROCRYPT 2018), showed a simple attack for $$2\le B\le T$$ 2 ≤ B ≤ T (with respect to a random salt). The attack achieves advantage $$\widetilde{\Omega }(STB/2^n+T^2/2^n)$$ Ω ~ ( S T B / 2 n + T 2 / 2 n ) where n is the output length of the random oracle. They conjectured that this attack is optimal. However, this so-called STB conjecture was only proved for $$B\approx T$$ B ≈ T and $$B=2$$ B = 2 . Very recently, Ghoshal and Komargodski (CRYPTO 2022) confirmed the STB conjecture for all constant values of B and provided an $$\widetilde{O}(S^4TB^2/2^n+T^2/2^n)$$ O ~ ( S 4 T B 2 / 2 n + T 2 / 2 n ) bound for all choices of B . In this work, we prove an $$\widetilde{O}((STB/2^n)\cdot \max \{1,ST^2/2^n\}+ T^2/2^n)$$ O ~ ( ( S T B / 2 n ) · max { 1 , S T 2 / 2 n } + T 2 / 2 n ) bound for every $$2< B < T$$ 2 < B < T . Our bound confirms the STB conjecture for $$ST^2\le 2^n$$ S T 2 ≤ 2 n and is optimal up to a factor of S for $$ST^2>2^n$$ S T 2 > 2 n (note as $$T^2$$ T 2 is always at most $$2^n$$ 2 n , otherwise finding a collision is trivial by the birthday attack). Our result subsumes all previous upper bounds for all ranges of parameters except for $$B=\widetilde{O}(1)$$ B = O ~ ( 1 ) and $$ST^2>2^n$$ S T 2 > 2 n . We obtain our results by adopting and refining the technique of Chung, Guo, Liu and Qian (FOCS 2020). Our approach yields more modular proofs and sheds light on how to bypass the limitations of prior techniques. Along the way, we obtain a considerably simpler and illuminating proof for $$B=2$$ B = 2 , recovering the main result of Akshima, Cash, Drucker and Wee.
2024
JOFC
Hashing to Elliptic Curves Through Cipolla–Lehmer–Müller’s Square Root Algorithm
The present article provides a novel hash function $${\mathcal {H}}$$ H to any elliptic curve of j -invariant $$\ne 0, 1728$$ ≠ 0 , 1728 over a finite field $${\mathbb {F}}_{\!q}$$ F q of large characteristic. The unique bottleneck of $${\mathcal {H}}$$ H consists of extracting a square root in $${\mathbb {F}}_{\!q}$$ F q as well as for most hash functions. However, $${\mathcal {H}}$$ H is designed in such a way that the root can be found by (Cipolla–Lehmer–)Müller’s algorithm in constant time. Violation of this security condition is known to be the only obstacle to applying the given algorithm in the cryptographic context. When the field $${\mathbb {F}}_{\!q}$$ F q is highly 2-adic and $$q \equiv 1 \ (\textrm{mod} \ 3)$$ q ≡ 1 ( mod 3 ) , the new batching technique is the state-of-the-art hashing solution except for some sporadic curves. Indeed, Müller’s algorithm costs $$\approx 2\log _2(q)$$ ≈ 2 log 2 ( q ) multiplications in $${\mathbb {F}}_{\!q}$$ F q . In turn, original Tonelli–Shanks’s square root algorithm and all of its subsequent modifications have the algebraic complexity $$\varTheta (\log (q) + g(\nu ))$$ Θ ( log ( q ) + g ( ν ) ) , where $$\nu $$ ν is the 2-adicity of $${\mathbb {F}}_{\!q}$$ F q and a function $$g(\nu ) \ne O(\nu )$$ g ( ν ) ≠ O ( ν ) . As an example, it is shown that Müller’s algorithm actually needs several times fewer multiplications in the field $${\mathbb {F}}_{\!q}$$ F q (whose $$\nu = 96$$ ν = 96 ) of the standardized curve NIST P-224.
2024
EUROCRYPT
Succinct Homomorphic Secret Sharing
This work introduces homomorphic secret sharing (HSS) with succinct share size. In HSS, private inputs are shared between parties, who can then homomorphically evaluate a function on their shares, obtaining a share of the function output. In succinct HSS, a portion of the inputs can be distributed using shares whose size is sublinear in the number of such inputs. The parties can then locally evaluate a function f on the shares, with the restriction that f must be linear in the succinctly shared inputs. We construct succinct, two-party HSS for branching programs, based on either the decisional composite residuosity assumption, a DDH-like assumption in class groups, or learning with errors with a superpolynomial modulus-to-noise ratio. We then give several applications of succinct HSS, which were only previously known using fully homomorphic encryption, or stronger tools: 1. Succinct vector oblivious linear evaluation (VOLE): Two parties can obtain secret shares of a long, arbitrary vector x, multiplied by a scalar ∆, with communication sublinear in the size of the vector. 2. Batch, multi-party distributed point functions: A protocol for distributing a batch of secret, random point functions among N parties, for any polynomial N, with communication sublinear in the number of DPFs. 3. Sublinear MPC for any number of parties: Two new constructions of MPC with sublinear communication complexity, with N parties for any polynomial N: (1) For general layered Boolean circuits of size s, with communication O(N s/log log s), and (2) For layered, sufficiently wide Boolean circuits, with communication O(N s/log s).
2024
EUROCRYPT
From Random Probing to Noisy Leakages Without Field-Size Dependence
Side channel attacks are devastating attacks targeting cryptographic implementations. To protect against these attacks, various countermeasures have been proposed -- in particular, the so-called masking scheme. Masking schemes work by hiding sensitive information via secret sharing all intermediate values that occur during the evaluation of a cryptographic implementation. Over the last decade, there has been broad interest in designing and formally analyzing such schemes. The random probing model considers leakage where the value on each wire leaks with some probability $\varepsilon$. This model is important as it implies security in the noisy leakage model via a reduction by Duc et al. (Eurocrypt 2014). Noisy leakages are considered the ``gold-standard'' for analyzing masking schemes as they accurately model many real-world physical leakages. Unfortunately, the reduction of Duc et al. is non-tight, and in particular requires that the amount of noise increases by a factor of $|\mathbb{F}|$ for circuits that operate over $\mathbb{F}$ (where $\mathbb{F}$ is a finite field). In this work, we give a generic transformation from $\varepsilon$-random probing to $\delta$-average probing, with $\delta \approx \varepsilon^2$, which avoids this loss of $|\mathbb{F}|$. Since the average probing is identical to the noisy leakage model (Eurocrypt 2014), this yields for the first time a security analysis of masked circuits where the noise parameter in the noisy leakage model is independent of $|\mathbb{F}|$. The latter is particularly important for cryptographic schemes operating over large fields, e.g., the AES or the recently standardized post-quantum schemes.
2024
EUROCRYPT
Unbiasable Verifiable Random Functions
Verifiable Random Functions (VRFs) play a pivotal role in Proof of Stake (PoS) blockchain due to their applications in secret leader election protocols. However, the original definition by Goldreich, Goldwasser and Micali is by itself insufficient for such applications. The primary concern is that adversaries may craft VRF key pairs with skewed output distribution, allowing them to unfairly increase their winning chances. To address this issue David, Gazi, Kiayias and Russel (2017/573) proposed a stronger definition in the universal composability framework, while Esgin et al. (FC '21) put forward a weaker game-based one. Their proposed notions come with some limitations though. The former appears to be too strong, being seemingly impossible to instantiate without a programmable random oracle. The latter instead is not sufficient to prove security for VRF-based secret leader election schemes. In this work we close the above gap by proposing a new security property for VRF we call unbiasability. On the one hand, our notion suffices to imply fairness in VRF-based leader elections protocols. On the other hand, we provide an efficient compiler in the plain model (with no CRS) transforming any VRF into an unbiasable one under standard assumptions. Moreover, we show folklore VRF constructions in the ROM to achieve our notion without the need to program the random oracle. As a minor contribution, we also provide a generic and efficient construction of certified 1 to 1 VRFs from any VRF.
2024
EUROCRYPT
Public-Coin, Complexity-Preserving, Succinct Arguments of Knowledge for NP from Collision-Resistance
Succinct arguments allow a powerful (yet polynomial-time) prover to convince a weak verifier of the validity of some NP statement using very little communication. A major barrier to the deployment of such proofs is the unwieldy overhead of the prover relative to the complexity of the statement to be proved. In this work, we focus on complexity-preserving arguments where proving a non-deterministic time $t$ and space $s$ RAM computation takes time $\tilde O(t)$ and space $\tilde O(s)$. Currently, all known complexity-preserving arguments either are private-coin, rely on non-standard assumptions, or provide only weak succinctness. In this work, we construct complexity-preserving succinct argument based solely on collision-resistant hash functions, thereby matching the classic succinct argument of Kilian (STOC '92).
2024
EUROCRYPT
Laconic Function Evaluation, Functional Encryption and Obfuscation for RAMs with Sublinear Computation
Laconic function evaluation (LFE) is a ``flipped'' version of fully homomorphic encryption, where the server performing the computation gets the output. The server commits itself to a function $f$ by outputting a small digest. Clients can later efficiently encrypt inputs $x$ with respect to the digest in much less time than computing $f$, and ensure that the server only decrypts $f(x)$, but does not learn anything else about $x$. Prior works constructed LFE for \emph{circuits} under LWE, and for \emph{Turing Machines (TMs)} from indistinguishability obfuscation (iO). In this work we introduce LFE for \emph{Random-Access Machines} (RAM-LFE). The server commits itself to a potentially huge database $y$ via a short digest. Clients can later efficiently encrypt inputs $x$ with respect to the digest and the server decrypts $f(x,y)$ for some specified RAM program $f$ (e.g., a universal RAM), without learning anything else about $x$. The main advantage of RAM-LFE is that the server's decryption run-time only scales with the RAM run-time $T$ of the computation $f(x,y)$, which can be sublinear in both $|x|$ and $|y|$. We consider a \emph{weakly efficient} variant, where the client's run-time is also allowed to scale linearly with $T$, but not $|y|$, and a \emph{strongly efficient} variant, where the client's run-time must be sublinear in both $T$ and $|y|$. We construct the the former from doubly efficient private information retrieval (DEPIR) and laconic OT (LOT), both of which are known from RingLWE, and the latter from an additional use of iO. We then show how to leverage strongly efficient RAM-LFE to also get (many-key) \emph{functional encryption for RAMs (RAM-FE)} where secret keys are associate with big databases $y$ and the decryption time is sublinear in $|y|$, as well as \emph{iO for RAMs} where the obfuscated program contains a big database $y$ and the evaluation time is sublinear in $|y|$.
2024
EUROCRYPT
Constructing Leakage-resilient Shamir's Secret Sharing: Over Composite Order Fields
Probing physical bits in hardware has compromised cryptographic systems. This work investigates how to instantiate Shamir's secret sharing so that the physical probes into its shares reveal statistically insignificant information about the secret. Over prime fields, Maji, Nguyen, Paskin-Cherniavsky, Suad, and Wang (EUROCRYPT 2021) proved that choosing random evaluation places achieves this objective with high probability. Our work extends their randomized construction to composite order fields -- particularly for fields with characteristic 2. Next, this work presents an algorithm to classify evaluation places as secure or vulnerable against physical-bit probes for some specific cases. Our security analysis of the randomized construction is Fourier-analytic, and the classification techniques are combinatorial. Our analysis relies on (1) contemporary Bezout-theorem-type algebraic complexity results that bound the number of simultaneous zeroes of a system of polynomial equations over composite order fields and (2) characterization of the zeroes of an appropriate generalized Vandermonde determinant.
2024
EUROCRYPT
Publicly Verifiable Secret Sharing over Class Groups and Applications to DKG and YOSO
Publicly Verifiable Secret Sharing (PVSS) allows a dealer to publish encrypted shares of a secret so that parties holding the corresponding decryption keys may later reconstruct it. Both dealing and reconstruction are non-interactive and any verifier can check their validity. PVSS finds applications in randomness beacons, distributed key generation (DKG) and in YOSO MPC (Gentry et al. CRYPTO'21), when endowed with suitable publicly verifiable re-sharing as in YOLO YOSO (Cascudo et al. ASIACRYPT'22). We introduce a PVSS scheme over class groups that achieves similar efficiency to state-of-the art schemes that only allow for reconstructing a function of the secret, while our scheme allows the reconstruction of the original secret. Our construction generalizes the DDH-based scheme of YOLO YOSO to operate over class groups, which poses technical challenges in adapting the necessary NIZKs in face of the unknown group order and the fact that efficient NIZKs of knowledge are not as simple to construct in this setting. Building on our PVSS scheme's ability to recover the original secret, we propose two DKG protocols for discrete logarithm key pairs: a biasable 1-round protocol, which improves on the concrete communication/computational complexities of previous works; and a 2-round unbiasable protocol, which improves on the round complexity of previous works. We also add publicly verifiable resharing towards anonymous committees to our PVSS, so that it can be used to efficiently transfer state among committees in the YOSO setting. Together with a recent construction of MPC in the YOSO model based on class groups (Braun et al. CRYPTO'23), this results in the most efficient full realization (i.e. without assuming receiver anonymous channels) of YOSO MPC based on the CDN framework with transparent setup.
2024
EUROCRYPT
Asymptotics and Improvements of Sieving for Codes
A recent work of Guo, Johansson and Nguyen (Eprint'23) proposes a promising adaptation of Sieving techniques from lattices to codes, in particular claiming concrete cryptanalytic improvements on various schemes. The core of their algorithm reduces to a Near-Neighbor Search (NNS) problem, for which they devise an ad-hoc approach. In this work we aim for a better theoretical understanding of this approach. First by providing an asymptotic analysis which is not present in the original paper. Second, we propose a more systematic use of known NNS machinery, namely Locality Sensitive Hashing and Filtering (LSH/F), an approach that has been applied very successfully in the case of sieving over lattices. We establish the first baseline for the sieving approach with a decoding complexity of $2^{0.117n}$ for the conventional worst parameters (full distance decoding, complexity maximized over all code rates). Our cumulative improvements, eventually enable us to lower the hardest parameter decoding complexity for SievingISD algorithms to $2^{0.101n}$. While this outperforms the BJMM algorithm (Eurocrypt'12) it falls yet behind the most advanced conventional ISD approach by Both and May (PQCrypto'18). As for lattices, we found the Random-Spherical-Code-Product (RPC) to give the best asymptotic complexity. Moreover, we also consider an alternative that seems specific to the Hamming Sphere, which we believe could be of practical interest, as they plausibly hide less sub-exponential overheads than RPC.
2024
EUROCRYPT
Ordering Transactions with Bounded Unfairness: Definitions, Complexity and Constructions
An important consideration in the context of distributed ledger protocols is fairness in terms of transaction ordering. Recent work [Crypto 2020] revealed a connection of (receiver) order fairness to social choice theory and related impossibility results arising from the Condorcet paradox. As a result of the impossibility, various relaxations of order fairness were proposed in prior works. Given that distributed ledger protocols, especially those processing smart contracts, must serialize the input transactions, a natural objective is to minimize the distance (in terms of number of transactions) between any pair of unfairly ordered transactions in the output ledger — a concept we call bounded unfairness. In state machine replication (SMR) parlance this asks for minimizing the number of unfair state updates occurring before the processing of any request. This unfairness minimization objective gives rise to a natural class of parametric order fairness definitions that has not been studied before. As we observe, previous realizable relaxations of order fairness do not yield good unfairness bounds. Achieving optimal order fairness in the sense of bounded unfairness turns out to be connected to the graph theoretic properties of the underlying transaction dependency graph and specifically the bandwidth metric of strongly connected components in this graph. This gives rise to a specific instance of the definition that we call ``directed bandwidth order-fairness'' which we show that it captures the best possible that any ledger protocol can achieve in terms of bounding unfairness. We prove ordering transactions in this fashion is NP-hard and non-approximable for any constant ratio. Towards realizing the property, we put forth a new distributed ledger protocol called Taxis that achieves directed bandwidth order-fairness. We present two variations, one that matches the property perfectly but (necessarily) lacks in performance and liveness, and another that achieves liveness and better complexity while offering a slightly relaxed version of the property. Finally, we comment on applications of our work to social choice theory, a direction which we believe to be of independent interest.
2024
EUROCRYPT
Efficient Pre-processing PIR Without Public-Key Cryptography
Classically, Private Information Retrieval (PIR) was studied in a setting without any pre-processing. In this setting, it is well-known that 1) public-key cryptography is necessary to achieve non-trivial (i.e., sublinear) communication efficiency in the single-server setting, and 2) the total server computation per query must be linear in the size of the database, no matter in the single-server or multi-server setting. Recent works have shown that both of these barriers can be overcome if we are willing to introduce a pre-processing phase. In particular, a recent work called \textsc{Piano} showed that using only one-way functions, one can construct a single-server preprocessing PIR with $\widetilde{O}(\sqrt{n})$ bandwidth and computation per query, assuming $\widetilde{O}(\sqrt{n})$ client storage. For the two-server setting, the state-of-the-art is defined by two incomparable results. First, \textsc{Piano} immediately implies a scheme in the two-server setting with the same performance bounds as stated above. Moreover, Beimel et al. showed a two-server scheme with $O(n^{1/3})$ bandwidth and $O(n/\log^2 n)$ computation per query, and one with $O(n^{1/2 + \epsilon})$ cost both in bandwidth and computation --- both schemes provide information theoretic security. In this paper, we show that assuming the existence of one-way functions, we can construct a two-server preprocessing PIR scheme with $\widetilde{O}(n^{1/4})$ bandwidth and $\widetilde{O}(n^{1/2})$ computation per query, while requiring only $\widetilde{O}(n^{1/2})$ client storage. We also construct a new single-server preprocessing PIR scheme with $\widetilde{O}(n^{1/4})$ {\it online} bandwidth and $\widetilde{O}(n^{1/2})$ {\it offline} bandwidth and {\it computation} per query, also requiring $\widetilde{O}(n^{1/2})$ client storage. Specifically, the online bandwidth is the bandwidth required for the client to obtain an answer, and the offline bandwidth can be viewed as %query-independent background maintenance work amortized to each query. Our new constructions not only advance the theoretical understanding of preprocessing PIR, but are also %conceptually simple and concretely efficient because the only cryptography needed is pseudorandom functions.
2024
EUROCRYPT
Foundations of Adaptor Signatures
Adaptor signatures extend the functionality of regular signatures through the computation of pre-signatures on messages for statements of NP relations. Pre-signatures are publicly verifiable; they simultaneously hide and commit to a signature of an underlying signature scheme on that message. Anybody possessing a corresponding witness for the statement can adapt the pre-signature to obtain the ``regular'' signature. Adaptor signatures have found numerous applications for conditional payments in blockchain systems, like payment channels~(CCS'20, CCS'21), private coin mixing (CCS'22, SP'23), and oracle-based payments (NDSS'23). In our work, we revisit the state of the security of adaptor signatures and their constructions. In particular, our two main contributions are: - Security Gaps and Definitions: We review the widely-used security model of adaptor signatures due to Aumayr et al. (ASIACRYPT'21), and identify gaps in their definitions that render known protocols for private coin-mixing and oracle-based payments insecure. We give simple counterexamples of adaptor signatures that are secure w.r.t. their definitions, but result in insecure instantiations of these protocols. To fill these gaps, we identify a minimal set of modular definitions that align with these practical applications. - Secure Constructions: Despite their popularity, all known constructions are (1) derived from identification schemes via the Fiat-Shamir transform in the random oracle model or (2) require modifications to the underlying signature verification algorithm, thus making the construction useless in the setting of cryptocurrencies. More concerningly, all known constructions were proven secure w.r.t. the insufficient definitions of Aumayr et al., leaving us with no provably secure adaptor signature scheme to use in applications. Firstly, in this work, we salvage all current applications by proving security of the widely-used Schnorr adaptor signatures under our proposed definitions. We then provide several new constructions, including presenting the first adaptor signature schemes for Camenisch-Lysyanskaya (CL), Boneh-Boyen-Shacham (BBS+), and Waters signatures; all of which are proven secure in the standard model. Our new constructions rely on a new abstraction of digital signatures, called dichotomic signatures, which covers the essential properties we need to build adaptor signatures. Proving the security of all constructions (including identification-based schemes) relies on a novel non-black-box proof technique. Both our digital signature abstraction and the proof technique could be of independent interest to the community.
2024
EUROCRYPT
Crypto Dark Matter on the Torus: Oblivious PRFs from shallow PRFs and TFHE
Partially Oblivious Pseudorandom Functions (POPRFs) are 2-party protocols that allow a client to learn pseudorandom function (PRF) evaluations on inputs of its choice from a server. The client submits two inputs, one public and one private. The security properties ensure that the server cannot learn the private input and the client cannot learn more than one evaluation per POPRF query. POPRFs have many applications including password-based key exchange and privacy-preserving authentication mechanisms. However, most constructions are based on classical assumptions, and those with post-quantum security suffer from large efficiency drawbacks. In this work, we construct a novel POPRF from lattice assumptions and the “Crypto Dark Matter” PRF candidate (TCC’18) in the random oracle model. At a conceptual level, our scheme exploits the alignment of this family of PRF candidates, relying on mixed modulus computations, and programmable bootstrapping in the torus fully homomorphic encryption scheme (TFHE). We show that our construction achieves malicious client security based on circuit-private FHE, and client privacy from the semantic security of the FHE scheme. We further explore a heuristic approach to extend our scheme to support verifiability based on the difficulty of computing cheating circuits in low depth. This would yield a verifiable (P)OPRF. We provide a proof-of-concept implementation and preliminary benchmarks of our construction. For the core online OPRF functionality, we require amortised 10.0KB communication per evaluation and a one-time per-client setup communication of 2.5MB.
2024
EUROCRYPT
Monotone-Policy Aggregate Signatures
The notion of aggregate signatures allows for combining signatures from different parties into a short certificate that attests that *all* parties signed a message. In this work, we lift this notion to capture different, more expressive signing policies. For example, we can certify that a message was signed by a (weighted) threshold of signers. We present the first constructions of aggregate signatures for monotone policies based on standard polynomial-time cryptographic assumptions. The aggregate signatures in our schemes are succinct, i.e., their size is *independent* of the number of signers. Moreover, verification is also succinct if all parties sign the same message (or if the messages have a succinct representation). All prior work requires either interaction between the parties or non-standard assumptions (that imply SNARKs for NP). Our signature schemes are based on non-interactive batch arguments (BARGs) for monotone policies [Brakerski-Brodsky-Kalai-Lombardi-Paneth, Crypto'23]. In contrast to previous constructions, our BARGs satisfy a new notion of *adaptive* security which is instrumental to our application. Our new BARGs for monotone policies can be constructed from standard BARGs and other standard assumptions.
2024
EUROCRYPT
SQIsignHD: New Dimensions in Cryptography
We introduce SQIsignHD, a new post-quantum digital signature scheme inspired by SQIsign. SQIsignHD exploits the recent algorithmic breakthrough underlying the attack on SIDH, which allows to efficiently represent isogenies of arbitrary degrees as components of a higher dimensional isogeny. SQIsignHD overcomes the main drawbacks of SQIsign. First, it scales well to high security levels, since the public parameters for SQIsignHD are easy to generate: the characteristic of the underlying field needs only be of the form $2^{f}3^{f'}-1$. Second, the signing procedure is simpler and more efficient. Our signing procedure implemented in C runs in 28 ms, which is a significant improvement compared to SQISign. Third, the scheme is easier to analyse, allowing for a much more compelling security reduction. Finally, the signature sizes are even more compact than (the already record-breaking) SQIsign, with compressed signatures as small as 109 bytes for the post-quantum NIST-1 level of security. These advantages may come at the expense of the verification, which now requires the computation of an isogeny in dimension $4$, a task whose optimised cost is still uncertain, as it has been the focus of very little attention. Our experimental \verb+sagemath+ implementation of the verification runs in 850 ms, indicating the potential cryptographic interest of dimension $4$ isogenies after optimisations and low level implementation.
2024
EUROCRYPT
Fast batched asynchronous distributed key generation
We present new protocols for threshold Schnorr signatures that work in an asynchronous communication setting, providing robustness and optimal resilience. These protocols provide unprecedented performance in terms of communication and computational complexity. In terms of communication complexity, for each signature, a single party must transmit a few dozen group elements and scalars across the network (independent of the size of the signing committee). In terms of computational complexity, the amortized cost for one party to generate a signature is actually less than that of just running the standard Schnorr signing or verification algorithm (at least for moderately sized signing committees, say, up to 100). For example, we estimate that with a signing committee of 49 parties, at most 16 of which are corrupt, we can generate 50,000 Schnorr signatures per second (assuming each party can dedicate one standard CPU core and 500Mbs of network bandwidth to signing). Importantly, this estimate includes both the cost of an offline precomputation phase (which just churns out message independent "presignatures") and an online signature generation phase. Also, the online signing phase can generate a signature with very little network latency (just one to three rounds, depending on how throughput and latency are balanced). To achieve this result, we provide two new innovations. One is a new secret sharing protocol (again, asynchronous, robust, optimally resilient) that allows the dealer to securely distribute shares of a large batch of ephemeral secret keys, and to publish the corresponding ephemeral public keys. To achieve better performance, our protocol minimizes public-key operations, and in particular, is based on a novel technique that does not use the traditional technique based on "polynomial commitments". The second innovation is a new algorithm to efficiently combine ephemeral public keys contributed by different parties (some possibly corrupt) into a smaller number of secure ephemeral public keys. This new algorithm is based on a novel construction of a so-called "super-invertible matrix" along with a corresponding highly-efficient algorithm for multiplying this matrix by a vector of group elements. As protocols for verifiably sharing a secret key with an associated public key and the technology of super-invertible matrices both play a major role in threshold cryptography and multi-party computation, our two new innovations should have applicability well beyond that of threshold Schnorr signatures.
2024
EUROCRYPT
Fast Public-Key Silent OT and More from Constrained Naor-Reingold
Pseudorandom Correlation Functions (PCFs) allow two parties, given correlated evaluation keys, to locally generate arbitrarily many pseudorandom correlated strings, e.g. Oblivious Transfer (OT) correlations, which can then be used by the two parties to jointly run secure computation protocols. In this work, we provide a novel and simple approach for constructing PCFs for OT correlation, by relying on constrained pseudorandom functions for a class of constraints containing a weak pseudorandom function (wPRF). We then show that tweaking the Naor-Reingold pseudorandom function and relying on low-complexity pseudorandom functions allow us to instantiate our paradigm. We further extend our ideas to obtain efficient public-key PCFs, which allow the distribution of correlated keys between parties to be non-interactive: each party can generate a pair of public/secret keys, and any pair of parties can locally derive their correlated evaluation key by combining their secret key with the other party’s public key. In addition to these theoretical contributions, we detail various optimizations and provide concrete instantiations of our paradigm relying on the Boneh-Ishai-Passelègue-Sahai-Wu wPRF and the Goldreich-Applebaum-Raykov wPRF. Putting everything together, we obtain public-key PCFs with a throughput of 15k-40k OT/s, which is of a similar order of magnitude to the state-of-the-art interactive PCFs and about 4 orders of magnitude faster than state-of-the-art public-key PCFs. As a side result, we also show that public-key PCFs can serve as a building block to construct reusable designated-verifier non-interactive zero-knowledge proofs (DV-NIZK) for NP. Combined with our instantiations, this yields simple and efficient reusable DV-NIZKs for NP in pairing-free groups.
2024
EUROCRYPT
Asymptotically Optimal Message Dissemination with Applications to Blockchains
Messages in large-scale networks such as blockchain systems are typically disseminated using flooding protocols, in which parties send the message to a random set of peers until it reaches all parties. Optimizing the communication complexity of such protocols and, in particular, the per-party communication complexity is of primary interest since nodes in a network are often subject to bandwidth constraints. Previous flooding protocols incur a per-party communication complexity of $\Omega(l\cdot \gamma^{-1} \cdot (\log(n) + \kappa))$ bits to disseminate an $l$-bit message among $n$ parties with security parameter~$\kappa$ when it is guaranteed that a $\gamma$ fraction of the parties remain honest. In this work, we present the first flooding protocols with a per-party communication complexity of $O(l\cdot \gamma^{-1})$ bits. We further show that this is asymptotically optimal and that our protocols can be instantiated provably securely in the usual setting for proof-of-stake blockchains. To demonstrate that one of our new protocols is not only asymptotically optimal but also practical, we perform several probabilistic simulations to estimate the concrete complexity for given parameters. Our simulations show that our protocol significantly improves the per-party communication complexity over the state-of-the-art for practical parameters. Hence, for given bandwidth constraints, our results allow to, e.g., increase the block size, improving the overall throughput of a blockchain.
2024
EUROCRYPT
Non-malleable codes with optimal rate for poly-size circuits
We give an explicit construction of non-malleable codes with rate $1-o(1)$ for the tampering class of poly-size circuits. This rate is optimal, and improves upon the previous explicit construction of Ball, Dachman-Soled and Loss \cite{BDL22} which achieves a rate smaller than $\frac{1}{n}$. Our codes are based on the same hardness assumption used by Ball, Dachman-Soled and Loss, namely, that there exists a problem in $\text{E}=\text{DTIME}(2^{O(n)})$ that requires nondeterministic circuits of size $2^{\Omega(n)}$. This is a standard complexity theoretic assumption that was used in many papers in complexity theory and cryptography, and can be viewed as a scaled, nonuniform version of the widely believed assumption that $\text{EXP} \not \subseteq \text{NP}$. Our result is incomparable to that of Ball, Dachman-Soled and Loss, as we only achieve computational (rather than statistical) security. Non-malleable codes with Computational security (with lower error than what we get) were obtained by \cite{BDKLM19,DKP21} under strong cryptographic assumptions. We show that our approach can potentially yield statistical security if certain explicit constructions of pseudorandom objects can be improved. By composing our new non-malleable codes with standard (information theoretic) error-correcting codes (that recover from a $p$ fraction of errors) we achieve the \emph{best of both worlds}. Namely, we achieve explicit codes that recover from a $p$-fraction of errors and have the same rate as the best known explicit information theoretic codes, while \emph{also} being non-malleable for poly-size circuits. Moreover, if we restrict our attention to errors that are introduced by poly-size circuits, we can achieve best of both worlds codes with rate $1-H(p)$. This is superior to the rate achieved by standard (information theoretic) error-correcting codes, and this result is obtained by composing our new non-malleable codes with the recent codes of Shaltiel and Silbak \cite{SS23}. Our technique combines ideas from non-malleable codes and pseudorandomness. We show how to take a low rate ``small set non-malleable code (this is a variant of non-malleable codes with a different notion of security that was introduced by Shaltiel and Silbak \cite{SS22}) and compile it into a (standard) high-rate non-malleable code. Using small set non-malleable codes (as well as seed-extending PRGs) bypasses difficulties that arise when analysing standard non-malleable codes, and allows us to use a simple construction.
2024
EUROCRYPT
Connecting Leakage-Resilient Secret Sharing to Practice: Scaling Trends and Physical Dependencies of Prime Field Masking
Symmetric ciphers operating in (small or mid-size) prime fields have been shown to be promising candidates to maintain security against low-noise (or even noise-free) side-channel leakage. In order to design prime ciphers that best trade physical security and implementation efficiency, it is essential to understand how side-channel security evolves with the field size (i.e., scaling trends). Unfortunately, it has also been shown that such a scaling trend depends on the leakage functions and cannot be explained by the standard metrics used to analyze Boolean masking with noise. In this work, we therefore initiate a formal study of prime field masking for two canonical leakage functions: bit leakages and Hamming weight leakages. By leveraging theoretical results from the leakage-resilient secret sharing literature, we explain formally why (1) bit leakages correspond to a worst-case and do not encourage operating in larger fields, and (2) an opposite conclusion holds for Hamming weight leakages, where increasing the prime field modulus p can contribute to a security amplification that is exponential in the number of shares,with log(p) seen as security parameter like the noise variance in Boolean masking. We combine these theoretical results with experimental ones and show that the interest masking in larger prime fields can degrade gracefully when leakage functions slightly deviate from the Hamming weight abstraction, motivating further research towards characterizing (ideally wide) classes of leakage functions offering such guarantees.
2024
EUROCRYPT
Cryptanalysis of rank-2 module-LIP in totally real number fields
We formally define the Lattice Isomorphism Problem for module lattices (module-LIP) in a number field K. This is a generalization of the problem defined by Ducas, Postlethwaite, Pulles, and van Woerden (Asiacrypt 2022), taking into account the arithmetic and algebraic specificity of module lattices from their representation using pseudo-bases. We also provide the corresponding set of algorithmic and theoretical tools for the future study of this problem in a module setting. Our main contribution is an algorithm solving module-LIP for modules of rank 2 in K^2, when K is a totally real number field. Our algorithm exploits the connection between this problem, relative norm equations and the decomposition of algebraic integers as sums of two squares. For a large class of modules, including O_K^2, it runs in classical polynomial time (under reasonable number theoretic assumptions). We provide a proof-of-concept code running over the maximal real subfield of cyclotomic fields.
2024
EUROCRYPT
Certified Everlasting Secure Collusion-Resistant Functional Encryption, and More
We study certified everlasting secure functional encryption (FE) and many other cryptographic primitives in this work. Certified everlasting security roughly means the following. A receiver possessing a quantum cryptographic object (such as ciphertext) can issue a certificate showing that the receiver has deleted the cryptographic object and information included in the object (such as plaintext) was lost. If the certificate is valid, the security is guaranteed even if the receiver becomes computationally unbounded after the deletion. Many cryptographic primitives are known to be impossible (or unlikely) to have information-theoretical security even in the quantum world. Hence, certified everlasting security is a nice compromise (intrinsic to quantum). In this work, we define certified everlasting secure versions of FE, compute-and-compare obfuscation, predicate encryption (PE), secret-key encryption (SKE), public-key encryption (PKE), receiver non-committing encryption (RNCE), and garbled circuits. We also present the following constructions: - Adaptively certified everlasting secure collusion-resistant public-key FE for all polynomial-size circuits from indistinguishability obfuscation and one-way functions. - Adaptively certified everlasting secure bounded collusion-resistant public-key FE for $\mathsf{NC}^1$ circuits from standard PKE. - Certified everlasting secure compute-and-compare obfuscation from standard fully homomorphic encryption and standard compute-and-compare obfuscation - Adaptively (resp., selectively) certified everlasting secure PE from standard adaptively (resp., selectively) secure attribute-based encryption and certified everlasting secure compute-and-compare obfuscation. - Certified everlasting secure SKE and PKE from standard SKE and PKE, respectively. - Cetified everlasting secure RNCE from standard PKE. - Cetified everlasting secure garbled circuits from standard SKE.
2024
EUROCRYPT
Perfect Asynchronous MPC with Linear Communication Overhead
We study secure multiparty computation in the asynchronous setting with perfect security and optimal resilience (less than one-fourth of the participants are malicious). It has been shown that every function can be computed in this model [Ben-OR, Canetti, and Goldreich, STOC'1993]. Despite 30 years of research, all protocols in the asynchronous setting require $\Omega(n^2C)$ communication complexity for computing a circuit with $C$ multiplication gates. In contrast, for nearly 15 years, in the synchronous setting, it has been known how to achieve $\mathcal{O}(nC)$ communication complexity (Beerliova and Hirt; TCC 2008). The techniques for achieving this result in the synchronous setting are not known to be sufficient for obtaining an analogous result in the asynchronous setting. We close this gap between synchronous and asynchronous secure computation and show the first asynchronous protocol with $\mathcal{O}(nC)$ communication complexity for a circuit with $C$ multiplication gates. Linear overhead forms a natural barrier for general secret-sharing-based MPC protocols. Our main technical contribution is an asynchronous weak binding secret sharing that achieves rate-1 communication (i.e., $\mathcal{O}(1)$-overhead per secret). To achieve this goal, we develop new techniques for the asynchronous setting, including the use of \emph{trivariate polynomials} (as opposed to bivariate polynomials).