IACR News
Here you can see all recent updates to the IACR webpage. These updates are also available:
25 November 2024
Ruiyang Li, Chun Guo, Xiao Wang
ePrint Report
State-of-the-art garbling schemes for boolean circuits roughly consist of two families, i.e., ideal model garbling that combines linear operations and ideal blockciphers (aiming at maximizing performance), and PRF-based garbling that insists on using theoretically sound assumptions. In the linear garbling framework introduced by Zahur, Rosulek, and Evans (Eurocrypt 2015), it was established that garbling an AND gate requires at least $2(\kappa +1)$ bits of ciphertext, with $\kappa$ as the security parameter. Recent contributions from Lei Fan et al. and Chunghun Baek et al. have provided a detailed model showing that, under the free-XOR setting, which relies on a non-standard assumption, garbling an AND gate requires at least $1.5\kappa + O(1)$ bits. In contrast, regarding PRF-based garbling, the general model and efficiency bounds remain open questions.
In this paper, we present a comprehensive model for PRF-based garbled circuits and establish both the communication and computation lower bound. Specifically, we demonstrate that garbling an AND gate requires at least $2\kappa + 2$ bits communication and 6 PRF calls, while an XOR gate requires a minimum of $\kappa$ bits communication and 4 PRF calls. Notably, the state-of-the-art garbling scheme (GLNP scheme) under the PRF assumption, introduced by Shay, Yehuda, Ariel, and Benny (JOC 2018), uses $2\kappa + 4$ bits and 8 PRF calls for an AND gate, which exceeds our established lower bound. We finally introduce an optimal garbling scheme showing that our communication and computation lower bounds are tight.
In this paper, we present a comprehensive model for PRF-based garbled circuits and establish both the communication and computation lower bound. Specifically, we demonstrate that garbling an AND gate requires at least $2\kappa + 2$ bits communication and 6 PRF calls, while an XOR gate requires a minimum of $\kappa$ bits communication and 4 PRF calls. Notably, the state-of-the-art garbling scheme (GLNP scheme) under the PRF assumption, introduced by Shay, Yehuda, Ariel, and Benny (JOC 2018), uses $2\kappa + 4$ bits and 8 PRF calls for an AND gate, which exceeds our established lower bound. We finally introduce an optimal garbling scheme showing that our communication and computation lower bounds are tight.
Guangwu Xu, Ke Han, Yunxiao Tian
ePrint Report
The family of Koblitz curves $E_b: y^2=x^3+b/\mathbb{F}_p$ over primes fields has close connections to the ring $\mathbb{Z}[\omega]$ of Eisenstein integers. Utilizing nice facts from the theory of cubic residues, this paper derives an efficient formula for a (complex) scalar multiplication by $\tau=1-\omega$. This enables us to develop a window $\tau$-NAF method for Koblitz curves over prime fields. This probably is the first window $\tau$-NAF method to be designed for curves over fields with large characteristic. Besides its theoretical interest, a higher performance is also achieved due to the facts that (1) the operation $\tau^2$ can be done more efficiently that makes the average cost of $\tau$ to be close to $2.5\mathbf{S}+3\mathbf{M}$ ( $\mathbf{S}$ and $\mathbf{M}$ denote the costs for field squaring and multiplication, respectively); (2) the pre-computation for the window $\tau$-NAF method is surprisingly simple in that only one-third of the coefficients need to be processed. The overall improvement over the best current method is more than $11\%$. The paper also suggests a simplified modular reduction for Eisenstein integers where the division operations are eliminated. The efficient formula of $\tau P$ can be further used to speed up the computation of $3P$, compared to $10\mathbf{S}+5\mathbf{M}$ , our new formula just costs $4\mathbf{S}+6\mathbf{M}$. As a main ingredient for double base chain method for scalar multiplication, the $3P$ formula will contribute to a greater efficiency.
OPL4GPT: An Application Space Exploration of Optimal Programming Language for Hardware Design by LLM
Kimia Tasnia, Sazadur Rahman
ePrint Report
Despite the emergence of Large Language Models (LLMs) as potential tools for automating hardware design, the optimal programming language to describe hardware functions remains unknown. Prior works extensively explored optimizing Verilog-based HDL design, which often overlooked the potential capabilities of alternative programming languages for hardware designs. This paper investigates the efficacy of C++ and Verilog as input languages in extensive application space exploration, tasking an LLM to generate implementations for various System-on-chip functional blocks. We proposed an automated Optimal Programming Language (OPL) framework that leverages OpenAI's GPT-4o LLM to translate natural language specifications into hardware descriptions using both high-level and low-level programming paradigms. The OPL4GPT demonstration initially employs a novel prompt engineering approach that decomposes design specifications into manageable submodules, presented to the LLM to generate code in both C++ and Verilog. A closed-loop feedback mechanism automatically incorporates error logs from the LLM's outputs, encompassing both syntax and functionality. Finally, functionally correct outputs are synthesized using either RTL (Register-Transfer Level) for Verilog or High-Level Synthesis for C++ to assess area, power, and performance. Our findings illuminate the strengths and weaknesses of each language across various application domains, empowering hardware designers to select the most effective approach.
Aydin Aysu, Fatemeh Ganji, Trey Marcantonio, Patrick Schaumont
ePrint Report
Implementation-security vulnerabilities such as the
power-based side-channel leakage and fault-injection sensitivity
of a secure chip are hard to verify because of the sophistication
of the measurement setup, as well as the need to generalize the
adversary into a test procedure. While the literature has proposed
a wide range of vulnerability metrics to test the correctness of a
secure implementation, it is still up to the subject-matter expert to
map these concepts into a working and reliable test procedure.
Recently, we investigated the benefits of using an open-source
implementation security testing environment called Chipwhisperer.
The open-source and low-cost nature of the Chipwhisperer
hardware and software has resulted in the adoption of thousands
of testing kits throughout academia and industry, turning the
testkit into a baseline for implementation security testing. We
investigate the use cases for the Chipwhisperer hardware and
software, and we evaluate the feasibility of an open-source
ecosystem for implementation security testing. In addition to the
open-source hardware and firmware, an ecosystem also considers
broader community benefits such as re-usability, sustainability,
and governance.
Mahdi Mahdavi, Navid Abapour, Zahra Ahmadian
ePrint Report
With the increasing integration of crowd computing, new vulnerabilities emerge in widely used cryptographic systems like the RSA cryptosystem, whose security is based on the factoring problem. It is strongly advised to avoid using the same modulus to produce two pairs of public-private keys, as the cryptosystem would be rendered vulnerable to common modulus attacks. Such attacks can take two forms: one that aims to factorize the common modulus based on one key pair and the other that aims to decrypt certain ciphertexts generated by two public keys if the keys are co-prime. This paper introduces a new type of common modulus attack on the RSA cryptosystem. In our proposed attack, given one public-private key pair, an attacker can obtain the private key corresponding to a given public key in RSA decryption. This allows the adversary to decrypt any ciphertext generated using this public key. It is worth noting that the proposed attack can be used in the CRT model of RSA. In addition, we propose a parallelizable factoring algorithm with an order equivalent to a cyclic attack in the worst-case scenario.
Nicolas Huber, Ralf Kuesters, Julian Liedtke, Daniel Rausch
ePrint Report
Electronic voting (e-voting) systems have become more prevalent in recent years, but security concerns have also increased, especially regarding the privacy and verifiability of votes. As an essential ingredient for constructing secure e-voting systems, designers often employ zero-knowledge proofs (ZKPs), allowing voters to prove their votes are valid without revealing them. Invalid votes can then be discarded to protect verifiability without compromising the privacy of valid votes.
General purpose zero-knowledge proofs (GPZKPs) such as ZK-SNARKs can be used to prove arbitrary statements, including ballot validity. While a specialized ZKP that is constructed only for a specific election type/voting method, ballot format, and encryption/commitment scheme can be more efficient than a GPZKP, the flexibility offered by GPZKPs would allow for quickly constructing e-voting systems for new voting methods and new ballot formats. So far, however, the viability of GPZKPs for showing ballot validity for various ballot formats, in particular, whether and in how far they are practical for voters to compute, has only recently been investigated for ballots that are computed as Pedersen vector commitments in an ACM CCS 2022 paper by Huber et al.
Here, we continue this line of research by performing a feasibility study of GPZKPs for the more common case of ballots encrypted via Exponential ElGamal encryption. Specifically, building on the work by Huber et al., we describe how the Groth16 ZK-SNARK can be instantiated to show ballot validity for arbitrary election types and ballot formats encrypted via Exponential ElGamal. As our main contribution, we implement, benchmark, and compare several such instances for a wide range of voting methods and ballot formats. Our benchmarks not only establish a basis for protocol designers to make an educated choice for or against such a GPZKP, but also show that GPZKPs are actually viable for showing ballot validity in voting systems using Exponential ElGamal.
General purpose zero-knowledge proofs (GPZKPs) such as ZK-SNARKs can be used to prove arbitrary statements, including ballot validity. While a specialized ZKP that is constructed only for a specific election type/voting method, ballot format, and encryption/commitment scheme can be more efficient than a GPZKP, the flexibility offered by GPZKPs would allow for quickly constructing e-voting systems for new voting methods and new ballot formats. So far, however, the viability of GPZKPs for showing ballot validity for various ballot formats, in particular, whether and in how far they are practical for voters to compute, has only recently been investigated for ballots that are computed as Pedersen vector commitments in an ACM CCS 2022 paper by Huber et al.
Here, we continue this line of research by performing a feasibility study of GPZKPs for the more common case of ballots encrypted via Exponential ElGamal encryption. Specifically, building on the work by Huber et al., we describe how the Groth16 ZK-SNARK can be instantiated to show ballot validity for arbitrary election types and ballot formats encrypted via Exponential ElGamal. As our main contribution, we implement, benchmark, and compare several such instances for a wide range of voting methods and ballot formats. Our benchmarks not only establish a basis for protocol designers to make an educated choice for or against such a GPZKP, but also show that GPZKPs are actually viable for showing ballot validity in voting systems using Exponential ElGamal.
Jelle Vos, Jorrit van Assen, Tjitske Koster, Evangelia Anna Markatou, Zekeriya Erkin
ePrint Report
Private set intersections are cryptographic protocols that compute the intersection of multiple parties' private sets without revealing elements that are not in the intersection. These protocols become less efficient when the number of parties grows, or the size of the sets increases. For this reason, many protocols are based on Bloom filters, which speed up the protocol by approximating the intersections, introducing false positives with a small but non-negligible probability. These false positives are caused by hash collisions in the hash functions that parties use to encode their sets as Bloom filters. In this work, we show that these false positives are more than an inaccuracy: an adversary in the augmented semi-honest model can use them to learn information about elements that are not in the intersection. First, we show that existing security proofs for Bloom filter-based private set intersections are flawed. Second, we show that even in the most optimistic setting, Bloom filter-based private set intersections cannot securely realize an approximate private set intersection unless the parameters are so large that false positives only occur with negligible probability. Third, we propose a practical attack that allows a party to learn if an element is contained in a victim's private set, showing that the problem with Bloom filters is not just theoretical. We conclude that the efficiency gain of using Bloom filters as an approximation in existing protocols vanishes when accounting for this security problem. We propose three mitigations besides choosing larger parameters: One can use oblivious pseudo-random functions instead of hash functions to reduce the success rate of our attack significantly, or replace them with password-based key derivation functions to significantly slow down attackers. A third option is to let a third party authorize the input sets before proceeding with the protocol.
Fukang Liu, Katharina Koschatko, Lorenzo Grassi, Hailun Yan, Shiyao Chen, Subhadeep Banik, Willi Meier
ePrint Report
A new design strategy for ZK-friendly hash functions has emerged since the proposal of $\mathsf{Reinforced Concrete}$ at CCS 2022, which is based on the hybrid use of two types of nonlinear transforms: the composition of some small-scale lookup tables (e.g., 7-bit or 8-bit permutations) and simple power maps over $\mathbb{F}_p$. Following such a design strategy, some new ZK-friendly hash functions have been recently proposed, e.g., $\mathsf{Tip5}$, $\mathsf{Tip4}$, $\mathsf{Tip4}'$ and the $\mathsf{Monolith}$ family. All these hash functions have a small number of rounds, i.e., $5$ rounds for $\mathsf{Tip5}$, $\mathsf{Tip4}$, and $\mathsf{Tip4}'$, and $6$ rounds for $\mathsf{Monolith}$ (recently published at ToSC 2024/3). Using the composition of some small-scale lookup tables to build a large-scale permutation over $\mathbb{F}_p$ - which we call S-box - is a main feature in such designs, which can somehow enhance the resistance against the Gröbner basis attack because this large-scale permutation will correspond to a complex and high-degree polynomial representation over $\mathbb{F}_p$.
As the first technical contribution, we propose a novel and efficient algorithm to study the differential property of this S-box and to find a conforming input pair for a randomly given input and output difference. For comparison, a trivial method based on the use of the differential distribution table (DDT) for solving this problem will require time complexity $\mathcal{O}(p^2)$.
For the second contribution, we also propose new frameworks to devise efficient collision attacks on such hash functions. Based on the differential properties of these S-boxes and the new attack frameworks, we propose the first collision attacks on $3$-round $\mathsf{Tip5}$, $\mathsf{Tip4}$, and $\mathsf{Tip4}'$, as well as $2$-round $\mathsf{Monolith}$-$31$ and $\mathsf{Monolith}$-$64$, where the $2$-round attacks on $\mathsf{Monolith}$ are practical. In the semi-free-start (SFS) collision attack setting, we achieve practical SFS collision attacks on $3$-round $\mathsf{Tip5}$, $\mathsf{Tip4}$, and $\mathsf{Tip4}'$. Moreover, the SFS collision attacks can reach up to $4$-round $\mathsf{Tip4}$ and $3$-round $\mathsf{Monolith}$-$64$. As far as we know, this is the first third-party cryptanalysis of these hash functions, which improves the initial analysis given by the designers.
Fast Multiplication and the PLWE-RLWE Equivalence for an Infinite Family of Cyclotomic Subextensions
Joonas Ahola, Iván Blanco-Chacón, Wilmar Bolaños, Antti Haavikko, Camilla Hollanti, Rodrigo M. Sánchez-Ledesma
ePrint Report
We prove the equivalence between the Ring Learning With Errors (RLWE) and the Polynomial Learning With Errors (PLWE) problems for the maximal totally real subfield of the $2^r 3^s$-th cyclotomic field for $r \geq 3$ and $s \geq 1$. Moreover, we describe a fast algorithm for computing the product of two elements in the ring of integers of these subfields. This multiplication algorithm has quasilinear complexity in the dimension of the field, as it makes use of the fast Discrete Cosine Transform (DCT). Our approach assumes that the two input polynomials are given in a basis of Chebyshev-like polynomials, in contrast to the customary power basis. To validate this assumption, we prove that the change of basis from the power basis to the Chebyshev-like basis can be computed with $\mathcal{O}(n \log n)$ arithmetic operations, where $n$ is the problem dimension. Finally, we provide a heuristic and theoretical comparison of the vulnerability to some attacks for the $p$-th cyclotomic field versus the maximal totally real subextension of the $4p$-th cyclotomic field for a reasonable set of parameters of cryptographic size.
Binwu Xiang, Jiang Zhang, Kaixing Wang, Yi Deng, Dengguo Feng
ePrint Report
Recent attacks on NTRU lattices given by Ducas and van Woerden (ASIACRYPT 2021) showed that for moduli $q$ larger than the so-called fatigue point $n^{2.484+o(1)}$, the security of NTRU is noticeably less than that of (ring)-LWE. Unlike
NTRU-based PKE with $q$ typically lying in the secure regime of NTRU lattices (i.e., $q
In this paper, we first propose a (matrix) NTRU-based MK-FHE for super-constant number $k$ of keys without using overstretched NTRU parameters. Our scheme is essentially a combination of two components following the two-layer framework of TFHE/FHEW: - a simple first-layer matrix NTRU-based encryption that naturally supports multi-key NAND operations with moduli $q=O(k\cdot n^{1.5})$ only linear in the number $k$ of keys; -and a crucial second-layer NTRU-based encryption that supports an efficient hybrid product between a single-key ciphertext and a multi-key ciphertext for gate bootstrapping.
Then, by replacing the first-layer with a more efficient LWE-based multi-key encryption, we obtain an improved MK-FHE scheme with better performance. We also employ a light key-switching technique to reduce the key-switching key size from the previous $O(n^2)$ bits to $O(n)$ bits. A proof-of-concept implementation shows that our two MK-FHE schemes outperform the state-of-the-art TFHE-like MK-FHE schemes in both computation efficiency and bootstrapping key size. Concretely, for $k=8$ at the same 100-bit security level, our improved MK-FHE scheme can bootstrap a ciphertext in {0.54s} on a laptop and only has a bootstrapping key of size {13.89}MB,which are respectively 2.2 times faster and 7.4 times smaller than the MK-FHE scheme (which relies on a second-layer encryption from the ring-LWE assumption) due to Chen, Chillotti and Song (ASIACRYPT 2019).
In this paper, we first propose a (matrix) NTRU-based MK-FHE for super-constant number $k$ of keys without using overstretched NTRU parameters. Our scheme is essentially a combination of two components following the two-layer framework of TFHE/FHEW: - a simple first-layer matrix NTRU-based encryption that naturally supports multi-key NAND operations with moduli $q=O(k\cdot n^{1.5})$ only linear in the number $k$ of keys; -and a crucial second-layer NTRU-based encryption that supports an efficient hybrid product between a single-key ciphertext and a multi-key ciphertext for gate bootstrapping.
Then, by replacing the first-layer with a more efficient LWE-based multi-key encryption, we obtain an improved MK-FHE scheme with better performance. We also employ a light key-switching technique to reduce the key-switching key size from the previous $O(n^2)$ bits to $O(n)$ bits. A proof-of-concept implementation shows that our two MK-FHE schemes outperform the state-of-the-art TFHE-like MK-FHE schemes in both computation efficiency and bootstrapping key size. Concretely, for $k=8$ at the same 100-bit security level, our improved MK-FHE scheme can bootstrap a ciphertext in {0.54s} on a laptop and only has a bootstrapping key of size {13.89}MB,which are respectively 2.2 times faster and 7.4 times smaller than the MK-FHE scheme (which relies on a second-layer encryption from the ring-LWE assumption) due to Chen, Chillotti and Song (ASIACRYPT 2019).
Eliana Carozza, Geoffroy Couteau
ePrint Report
We investigate the feasibility of constructing threshold signature schemes from the MPC-in-the-head paradigm. Our work addresses the significant challenge posed by recent impossibility results (Doerner et al., Crypto’24), which establish inherent barriers to efficient thresholdization of such schemes without compromising their security or significantly increasing the signature size.
- We introduce a general methodology to adapt any MPC-in-the-head signature into a threshold-friendly scheme, ensuring that the dependency on the number of users $n$ grows as $\lambda^2n + O(1)$. This represents a substantial improvement over the naive concatenation of independent signatures.
- We present a threshold signature scheme on top of the scheme of (Carozza, Couteau and Joux, EUROCRYPT’23). Our security analysis introduces the notion of Corruptible Existential Unforgeability under Chosen Message Attacks (CEUF-CMA), which formalizes resilience against adversarial control over parts of the randomness.
Our results provide a new perspective on the trade-offs between efficiency and security in threshold settings, opening pathways for future improvements in post-quantum threshold cryptography.
22 November 2024
Yu Tao, Lu Zhou, Lei Xie, Dongming Zhang, Xinyu Lei, Fei Xu, Zhe Liu
ePrint Report
Sharding emerges as a promising solution to enhance blockchain scalability. However, it faces two critical limitations during shard reconfiguration: (1) the TPS-Degradation issue, arising from ledger synchronization conflicts during transaction processing, and (2) the Zero-TPS issue, caused by disruptions in transaction processing due to key negotiation. To this end, we propose Shardora, a blockchain sharding system for scaling blockchain by unleashing parallelism. In Shardora, we implement two essential mechanisms: (1) A parallelized dual committee framework with a reputation mechanism to mitigate the TPS-Degradation issue while ensuring system security. (2) A parallelized key pre-negotiation mechanism with a secret-reuse strategy to avoid the Zero-TPS issue while maintaining a continuously high TPS. We prove that Shardora offers theory-guaranteed security. We implement a prototype of Shardora and deploy it on Alibaba Cloud. Experimental results demonstrate that Shardora addresses the limitations by significantly reducing the overhead of both ledger synchronization and key negotiation, which outperforms state-of-the-art sharding schemes by at least 90%. In addition, Shardora shows its superior performance in terms of throughput and latency, achieving a peak throughput of 8300 TPS on a single shard with 600 nodes under LAN conditions. The code of Shardora is publicly available on GitHub.
Beatrice Biasioli, Elena Kirshanova, Chiara Marcolla, Sergi Rovira
ePrint Report
The field of fully homomorphic encryption (FHE) has seen many theoretical and computational advances in recent years, bringing the technology closer to practicality than ever before. For this reason, practitioners in related fields, such as machine learning, are increasingly interested in using FHE to provide privacy to their applications.
Despite this progress, selecting secure and efficient parameters for FHE remains a complex and challenging task due to the intricate interdependencies between parameters. In this work, we address this issue by providing a rigorous theoretical foundation for parameter selection for any LWE-based schemes, with a specific focus on FHE. Our approach starts with an in-depth analysis of lattice attacks on the LWE problem, deriving precise expressions for the most effective ones. Building on this, we introduce closed-form formulas that establish the relationships among the LWE parameters.
In addition, we introduce a numerical method to enable the accurate selection of any configurable parameter to meet a desired security level. Finally, we use our results to build a practical and efficient tool for researchers and practitioners deploying FHE in real-world applications, ensuring that our approach is both rigorous and accessible.
Despite this progress, selecting secure and efficient parameters for FHE remains a complex and challenging task due to the intricate interdependencies between parameters. In this work, we address this issue by providing a rigorous theoretical foundation for parameter selection for any LWE-based schemes, with a specific focus on FHE. Our approach starts with an in-depth analysis of lattice attacks on the LWE problem, deriving precise expressions for the most effective ones. Building on this, we introduce closed-form formulas that establish the relationships among the LWE parameters.
In addition, we introduce a numerical method to enable the accurate selection of any configurable parameter to meet a desired security level. Finally, we use our results to build a practical and efficient tool for researchers and practitioners deploying FHE in real-world applications, ensuring that our approach is both rigorous and accessible.
Sofiane Azogagh, Marc-Olivier Killijian, Félix Larose-Gervais
ePrint Report
This paper introduces a novel adaptation of counting sort that enables sorting of encrypted data using Fully Homomorphic Encryption (FHE). Our approach represents the first known sorting algorithm for encrypted data that does not rely on comparisons. The implementation leverages some basic operations on TFHE's Look-Up-Tables (LUT). We have integrated these operations into RevoLUT, a comprehensive open-source library built upon tfhe-rs, which can be of independent interest for oblivious algorithms. We demonstrate the effectiveness of our Blind Counting Sort algorithm by developing a top-$k$ selection algorithm and applying it to privacy-preserving $k$-Nearest Neighbors classification. This proves to be approximately 5x faster than current state-of-the-art methods.
Samyuktha M, Pallavi Borkar, Chester Rebeiro
ePrint Report
In this poster, we present a Jasmin implementation of Mayo2, a multivariate quadratic(MQ) based signature scheme. Mayo overcomes the disadvantage of the Unbalanced oil and vinegar(UOV) scheme by whipping the UOV map to produce public keys of sizes comparable to ML-DSA. Our Jasmin implementation of Mayo2 takes 930 μs for key-gen, 3206 μs for sign, 480 μs for verify based on the average of 1,00,000 runs of the implementation on a 2.25GHz x86 64 processor with 256 GB RAM. To this end, we have a multivariate quadratic based signature implementation that is amenable for verification of constant-time, correctness, proof of equivalence properties using Easycrypt. Subsequently, the results of this endeavor can be extended for other MQ based schemes including UOV.
Md Habibur Rahman
ePrint Report
In the face of escalating security threats in modern
computing systems, there is an urgent need for comprehensive
defense mechanisms that can effectively mitigate invasive, noninvasive and interactive security vulnerabilities in hardware
and software domains. Individually, hardware and software
weaknesses and probable remedies have been practiced but
protecting a combined system has not yet been discussed in
detail. This survey paper provides a comprehensive overview of
the emerging field of Hardware-Software co-Protection against
Invasive and Non-Invasive Security Threats. We systematically
review state-of-the-art research and developments in hardware
and software security techniques, focusing on their integration to
create synergistic defense mechanisms. The survey covers a wide
range of security threats, including physical attacks, side-channel
attacks, and malware exploits, and explores the diverse strategies
employed to counter them. Our survey meticulously examines the
landscape of security vulnerabilities, encompassing both physical
and software-based attack vectors, and explores the intricate
interplay between hardware and software defenses in mitigating
these threats. Furthermore, we discuss the challenges and opportunities associated with Hardware-Software co-Protection and
identify future research directions to advance the field. Through
this survey, we aim to provide researchers, practitioners, and
policymakers with valuable insights into the latest advancements
and best practices for defending against complex security threats
in modern computing environments.
Lars Ran, Monika Trimoska
ePrint Report
Unbalanced Oil and Vinegar (UOV) is one of the oldest, simplest, and most studied ad-hoc multivariate signature schemes. UOV signature schemes are attractive because they have very small signatures and fast verification. On the downside, they have large public and secret keys. As a result, variations of the traditional UOV scheme are usually developed with the goal to reduce the key sizes. Seven variants of UOV were submitted to the additional call for digital signatures by NIST, prior to which, a variant named MQ-Sign was submitted to the (South) Korean post-quantum cryptography competition (KpqC). MQ-Sign is currently competing in the second round of KpqC with two variants. One of the variants corresponds to the classic description of UOV with certain implementation and parameter choices. In the other variant, called MQ-Sign-LR, a part of the central map is constructed from row shifts of a single matrix. This design makes for smaller secret keys, and in the case where the equivalent keys optimization is used, it also leads to smaller public keys. However, we show in this work that the polynomial systems arising from an algebraic attack have a specific structure that can be exploited. Specifically, we are able to find preimages for $d$-periodic targets under the public map with a probability of $63\%$ for all security levels. The complexity of finding these preimages, as well as the fraction of $d$-periodic target increases with $d$ and hence provides a trade-off. We show that for all security levels one can choose $d=\frac{v}{2}$, for $v$ the number of vinegar variables, and reduce the security claim. Our experiments show practical running times for lower $d$ ranging from 0.06 seconds to 32 hours.
Tolun Tosun, Selim Kırbıyık, Emre Koçer, Erkay Savaş, Ersin Alaybeyoğlu
ePrint Report
In this paper, we present a comprehensive analysis of various modular multiplication methods for Number Theoretic Transform (NTT) on FPGA. NTT is a critical and time-intensive component of Fully Homomorphic Encryption (FHE) applications while modular multiplication consumes a significant portion of the design resources in an NTT implementation. We study the existing modular reduction approaches from the literature, and implement particular methods on FPGA. Specifically Word-Level Montgomery (WLM)) for NTT friendly primes [1] and K2RED [2]. For improvements, we explore the trade-offs between the number of available primes in special forms and hardware cost of the reduction methods. We develop a DSP multiplication-optimized version of WLM, which we call WLM-Mixed. We also introduce a subclass of Proth primes, referred to as Proth-l primes, characterized by a low and fixed signed Hamming Weight. This special class of primes allows us to design multiplication-free shift-add versions of K2RED and naive Montgomery reduction [3], referred to as K2RED-Shift and Montgomery-Shift. We provide in-depth evaluations of these five reduction methods in an NTT architecture on FPGA. Our results indicate that WLM-Mixed is highly resource-efficient, utilizing only 3 DSP multiplications for 64-bit coefficient moduli. On the other hand, K2RED-Shift and Montgomery-Shift offer DSP-free alternatives, which can be beneficial in specific scenarios
Emre Koçer, Selim KIrbıyık, Tolun Tosun, Ersin Alaybeyoğlu, Erkay Savaş
ePrint Report
FHE enables computations on encrypted data, making it essential for privacy-preserving applications. However, it involves computationally demanding tasks, such as polynomial multiplication, while NTT is the state-of-the-art solution to perform this task. Most FHE schemes operate over the negacyclic ring of polynomials. We introduce a novel formulation of the hierarchical Four-Step NTT approach for the negacyclic ring, eliminating the need for pre- and post-processing steps found in the existing methods. To accelerate NTT operations, the FPGAs offer flexible and powerful computing platforms. We propose an FPGA-based, parametric and fully pipelined architecture that implements the improved Seven-Step NTT algorithm (which builds upon the four-step). Our design supports a wide range of parameters, including ring sizes up to $2^{16}$ and modulus sizes up to $64$-bit. We focus on achieving configurable throughput, as constrained by the bandwidth of HBM bandwidth, and aim to maximize throughput through an IO parametric design on the Alveo U280 FPGA. The implementation results demonstrate a reduction in the area-time-product by $2.08\times$ and a speed-up of $10.32\times$ for a ring size of $2^{16}$ and a 32-bit width compared to the current state-of-the-art designs.
Shiyao Chen, Xiaoyang Dong, Jian Guo, Tianyu Zhang
ePrint Report
Chosen-prefix collision (CPC) attack was first presented by Stevens, Lenstra and de Weger on MD5 at Eurocrypt 2007. A CPC attack finds a collision for any two chosen prefixes, which is a stronger variant of collision attack. CPCs are naturally harder to construct but have larger practical impact than (identical-prefix) collisions, as seen from the series of previous works on MD5 by Stevens et al. and SHA-1 by Leurent and Peyrin. Despite its significance, the resistance of CPC attacks has not been studied on AES-like hashing.
In this work, we explore CPC attacks on AES-like hashing following the framework practiced on MD5 and SHA-1. Instead of the message modification technique developed for MD-SHA family, we opt for related-key rebound attack to construct collisions for AES-like hashing in view of its effectiveness. We also note that the CPC attack framework can be exploited to convert a specific class of one-block free-start collisions into two-block collisions, which sheds light on the importance of free-start collisions. As a result, we present the first CPC attacks on reduced Whirlpool, Saturnin-hash and AES-MMO/MP in classic and quantum settings, and extend the collision attack on Saturnin-hash from 5 to 6 rounds in the classic setting. As an independent contribution, we improve the memoryless algorithm of solving 3-round inbound phase by Hosoyamada and Sasaki at Eurocrpyt 2020, which leads to improved quantum attacks on Whirlpool. Notably, we find the first 6-round memoryless quantum collision attack on Whirlpool better than generic CNS collision finding algorithm when exponential-size qRAM is not available but exponential-size classic memory is available.