Yu Yu

CryptoDB

Yu Yu

ORCID: 0000-0002-9278-4521

Publications and invited talks

Year

Venue

Title

2025

EUROCRYPT

BitGC: Garbled Circuits with 1 Bit per Gate Abstract

Hanlin Liu Xiao Wang Kang Yang Yu Yu

We present BitGC, a garbling scheme for Boolean circuits with 1 bit per gate communication based on either ring learning with errors (RLWE) or NTRU assumption, with key-dependent message security. The garbling consists of 1) a homomorphically encrypted seed that can be expanded to encryption of many pseudo-random bits and 2) one-bit stitching information per gate to reconstruct garbled tables from the expanded ciphertexts. By using low-complexity PRGs, both the garbling and evaluation of each gate require only O(1) homomorphic addition/multiplication operations without bootstrapping.

2025

EUROCRYPT

Tighter Security Notions for a Modular Approach to Private Circuits Abstract

Bohan Wang Juelin Zhang Yu Yu Weijia Wang

To counteract side-channel attacks, a masking scheme splits each intermediate variable into $n$ shares and transforms each elementary operation (e.g., field addition and multiplication) to the masked correspondence called gadget, such that intrinsic noise in the leakages renders secret recovery infeasible in practice. A simple and efficient security notion is the probing model ensuring that any $n-1$ shares are independently distributed from the secret input. One requirement of the probing model is that the noise in the leakages should increase with the number of shares, largely restricting the side-channel security in the low-noise scenario. Another security notion for masking, called the random probing model, allows each variable to leak with a probability $p$. While this model reflects the physical reality of side channels much better, it brings significant overhead. At Crypto 2018, Ananth et al. proposed a modular approach that can provide random probing security for any security level by expanding small base gadgets with $n$ share recursively, such that the tolerable leakage probability $p$ decreases with $n$ while the security increases exponentially with the recursion depth of expansion. Then, Bela{\"{i}}d et al. provided a formal security definition called Random Probing Expandability~(RPE) and an explicit framework using the modular approach to construct masking schemes at Crypto 2020. In this paper, we investigate how to tighten the RPE definition via allowing the dependent failure probabilities of multiple inputs, which results in a new definition called related RPE. It can be directly used for the expansion of multiplication gates and reduce the complexity of the base multiplication gadget from $\mathcal{O}(n^2\log n)$ proposed at Asiacrypt 2021 to $\mathcal{O}(n^2)$ and maintain the same security level. Furthermore, we describe a method to expand any gates (rather than only multiplication) with the related RPE gadgets. Besides, we denote another new RPE definition called Multiple inputs RPE used for the expansion of multiple-input gates composed with any gates. Utilizing these methods, we reduce the complexity of the 3-share circuit compiler to $\mathcal{O}(|C|\cdot\kappa^{3.2})$, where $|C|$ is the size of the unprotected circuit and the protection failure probability of the global circuit is $2^{-\kappa}$. In comparison, the complexity of the state-of-the-art work, proposed at Eurocrypt 2021, is $\mathcal{O}(|C|\cdot\kappa^{3.9})$ for the same value of $n$. Additionally, we provide the construction of a 5-share circuit compiler with a complexity $\mathcal{O}(|C|\cdot\kappa^{2.8})$.

2025

PKC

Stateless Deterministic Multi-Party EdDSA Signatures with Low Communication Abstract

Qi Feng Kang Yang Kaiyi Zhang Xiao Wang Yu Yu Xiang Xie

EdDSA is a standardized signing algorithm, by both the IRTF and NIST, that is widely used in blockchain, e.g., Hyperledger, Cardano, Zcash, etc. It is a variant of the well-known Schnorr signature scheme that leverages Edwards curves. It features stateless and deterministic nonce generation, meaning it does not rely on a reliable source of randomness or state continuity. Recently, NIST issued a call for multi-party threshold EdDSA signatures, with one approach verifying nonce generation through zero-knowledge (ZK) proofs. In this paper, we propose a new stateless and deterministic multi-party EdDSA protocol in the full-threshold setting, capable of tolerating all-but-one malicious corruption. Compared to the state-of-the-art multi-party EdDSA protocol by Garillot et al. (Crypto’21), our protocol reduces communication cost by a factor of 56x while maintaining the same three-round structure, albeit with a roughly 2.25x increase in computational cost. We utilize information-theoretic message authentication codes (IT-MACs) in a multi-verifier setting to authenticate values and transform them from the Boolean domain to the arithmetic domain by refining multi-verifier extended doubly-authenticated bits (mv-edabits). Additionally, we employ pseudorandom correlation functions (PCF) to generate IT-MACs in a stateless and deterministic manner. Combining these elements, we design a multi-verifier zero-knowledge (MVZK) protocol for stateless and deterministic nonce generation. Our protocol can be used to build secure blockchain wallets and custody solutions, enhancing key protection.

2025

CRYPTO

Authenticated BitGC for Actively Secure Rate-One 2PC Abstract

Hanlin Liu Xiao Wang Kang Yang Yu Yu

In this paper, we present a constant-round actively secure two-party computation protocol with small communication based on the ring learning with errors (RLWE) assumption with key-dependent message security. Our result builds on the recent BitGC protocol by Liu, Wang, Yang, and Yu (Eurocrypt 2025) with communication of one bit per gate for semi-honest security. First, we achieve a different manner of distributed garbling, where the global correlation is secret-shared among the two parties. The garbler always and only holds the garbled labels corresponding to the wire values when all inputs are zero, while the evaluator holds the labels corresponding to the real evaluation. In the second phase, we run an authentication protocol that requires some extra communication, which allows two parties to check the correct computation of each gate by treating the ciphertext as commitments, now that the global key is distributed. For layered circuits, the extra communication for authentication is $o(1)$ bits per gate, resulting in total communication of $1+o(1)$ bits per gate. For generic circuits, the extra communication cost can be $1$ bit per gate, and thus, the total communication cost would be 2 bits per gate.

2025

ASIACRYPT

GPV Preimage Sampling with Weak Smoothness and Its Applications to Lattice Signatures Abstract

Shiduo Zhang Huiwen Jia Delong Ran Yang Yu Yu Yu Xiaoyun Wang

The lattice trapdoor associated with Ajtai's function is the cornerstone of many lattice-based cryptosystems. The current provably secure trapdoor framework, known as the GPV framework, uses a \emph{strong smoothness} condition, i.e. $\epsilon\ll \frac{1}{n^2}$ for smoothing parameter $\eta_{\epsilon}(\Z^{n})$, to ensure the correctness of the security reduction. In this work, we investigate the feasibility of \emph{weak smoothness}, e.g., $\epsilon = O(\frac{1}{n})$ or even $O(1)$ in the GPV framework and present several positive results. First, we provide a theoretical security proof for GPV with weak smoothness under a new assumption. %Interestingly, the additional assumption has \emph{no impact on the concrete security} of practical GPV signatures. Then, we present Gaussian samplers that are compatible with the weak smoothness condition. As direct applications, we present two practical GPV signature instantiations based on a weak smoothness condition. Our first instantiation is a variant of Falcon achieving \emph{smaller size} and \emph{higher security}. The public key sizes are $21\%$ to $28\%$ smaller, and the signature sizes are $23.5\%$ to $29\%$ smaller than Falcon. We also showcase an NTRU-based GPV signature scheme that employs the Peikert sampler with weak smoothness. This offers a simple implementation while the security level is greatly lower. Nevertheless, at the NIST-3 security level, our scheme achieves a $49\%$ reduction in size compared to Dilithium-3.

2025

ASIACRYPT

A Hybrid Algorithm for the Regular Syndrome Decoding Problem Abstract

Tianrui Wang Anyu Wang Kang Yang Hanlin Liu Yu Yu Jun Zhang Xiaoyun Wang

Regular Syndrome Decoding (RSD) is a variant of the traditional Syndrome Decoding (SD) problem, where the error vector is divided into consecutive, equal-length blocks, each containing exactly one nonzero element. Recently, RSD has gained significant attention due to its extensive applications in cryptographic constructions, including MPC, ZK protocols, and more. The computational complexity of RSD has primarily been analyzed using two methods: Information Set Decoding (ISD) approach and algebraic approach. In this paper, we introduce a new hybrid algorithm for solving the RSD problem. This algorithm can be viewed as replacing the meet-in-the-middle enumeration in ISD with a process that solves quadratic equations. Our new algorithm demonstrates superior performance across a wide range of concrete parameters compared to previous methods, including both ISD and algebraic approaches, for parameter sets over both large fields (q = 2^128) and binary fields (q = 2). For parameter sets used in prior works, our algorithm reduces the concrete security of RSD by up to 20 bits compared to the state-of-the-art algorithms. We also provide an asymptotic analysis, identifying a broader parameter region where RSD is solvable in polynomial time compared to ISD and algebraic methods over binary fields. Additionally, we apply our algorithm to evaluate the security of the ZK protocol Wolverine (IEEE S&P 2021) and the OT protocol Ferret (ACM CCS 2020). Our results reduce the security level of Wolverine, which targets a 128-bit security level, to about 111 bits, and also marginally lowers the security of Ferret below the targeted 128-bit level for the first time.

2025

TCHES

Rejected Signatures’ Challenges Pose New Challenges: Key Recovery of CRYSTALS-Dilithium via Side-Channel Attacks Abstract

Yuanyuan Zhou Weijia Wang Yiteng Sun Yu Yu

Rejection sampling is a crucial security mechanism in lattice-based signature schemes that follow the Fiat-Shamir with aborts paradigm, such as MLDSA/ CRYSTALS-Dilithium. This technique transforms secret-dependent signature samples into ones that are statistically close to a secret-independent distribution (in the random oracle model). While many side-channel attacks have directly targeted sensitive data such as nonces, secret keys, and decomposed commitments, fewer studies have explored the potential leakage associated with rejection sampling. Notably, at HOST 2021, Karabulut et al. showed that leakage from rejected signatures’ challenges can undermine, but not entirely break, the security of the Dilithium scheme.Motivated by the above, we convert the problem of key recovery (from the leakage of rejection sampling) to an integer linear programming problem (ILP), where rejected responses of unique Hamming weights set upper/lower constraints of the product between the challenge and the private key. We formally study the worst-case complexity of the problem as well as empirically confirm the practicality of the rejected signature’s challenge attack. For all three security levels of Dilithium-2/3/5, our attack recovers the private key in seconds or minutes with a 100% Success Rate (SR).Our attack leverages knowledge of the rejected signature’s challenge and response, and thus we propose methods to extract this information by exploiting single-trace sidechannel leakage from Number Theoretic Transform (NTT) operations and functions associated with the response generation procedure. We demonstrate the practicality of this rejected signature’s challenge attack by using real power consumption on an ARM Cortex-M4 microcontroller. To the best of our knowledge, it is the first practical and efficient side-channel key recovery attack on ML-DSA/Dilithium that targets the rejection sampling procedure. Furthermore, we discuss some countermeasures to mitigate this security issue.

2024

JOFC

Actively Secure Half-Gates with Minimum Overhead under Duplex Networks Abstract

Hongrui Cui Xiao Wang Kang Yang Yu Yu

Abstract Actively secure two-party computation (2PC) is one of the canonical building blocks in modern cryptography. One main goal for designing actively secure 2PC protocols is to reduce the communication overhead, compared to semi-honest 2PC protocols. In this paper, we make significant progress in closing this gap by proposing two new actively secure constant-round 2PC protocols, one with one-way communication of $$2\kappa +5$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>5</mml:mn> </mml:mrow> </mml:math> bits per AND gate (for $$\kappa $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>κ</mml:mi> </mml:math> -bit computational security and any statistical security) and one with total communication of $$2\kappa +\rho +5$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mi>ρ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>5</mml:mn> </mml:mrow> </mml:math> bits per AND gate (for $$\rho $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>ρ</mml:mi> </mml:math> -bit statistical security). In particular, our first protocol essentially matches the one-way communication of semi-honest half-gates protocol. Our optimization is achieved by three new techniques: The recent compression technique by Dittmer et al. (Crypto 13510:57–87, 2022) shows that a relaxed preprocessing is sufficient for authenticated garbling that does not reveal masked wire values to the garbler. We introduce a new form of authenticated bits and propose a new technique of generating authenticated AND triples to reduce the one-way communication of preprocessing from $$5\rho +1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>5</mml:mn> <mml:mi>ρ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>1</mml:mn> </mml:mrow> </mml:math> bits to 2 bits per AND gate for $$\rho $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>ρ</mml:mi> </mml:math> -bit statistical security. Unfortunately, the above compressing technique is only compatible with a less compact authenticated garbled circuit of size $$2\kappa +3\rho $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>3</mml:mn> <mml:mi>ρ</mml:mi> </mml:mrow> </mml:math> bits per AND gate. We designed a new authenticated garbling that does not use information-theoretic MACs but rather dual execution without leakage to authenticate wire values in the circuit. This allows us to use a more compact half-gates based authenticated garbled circuit of size $$2\kappa +1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>1</mml:mn> </mml:mrow> </mml:math> bits per AND gate, and meanwhile keep compatible with the compression technique. Our new technique can achieve one-way communication of $$2\kappa +5$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>5</mml:mn> </mml:mrow> </mml:math> bits per AND gate. In terms of total communication, we notice that the communication overhead of the consistency checking method by Dittmer et al. (Crypto 13510:57–87, 2022) can be optimized by adding one-round of interaction and utilizing the Free-XOR property. This reduces the online communication from $$2\kappa +3\rho $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>3</mml:mn> <mml:mi>ρ</mml:mi> </mml:mrow> </mml:math> bits down to $$2\kappa +\rho +1$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mi>ρ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>1</mml:mn> </mml:mrow> </mml:math> bits per AND gate. Combined with our first contribution, this yields total amortized communication of $$2\kappa +\rho +5$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>2</mml:mn> <mml:mi>κ</mml:mi> <mml:mo>+</mml:mo> <mml:mi>ρ</mml:mi> <mml:mo>+</mml:mo> <mml:mn>5</mml:mn> </mml:mrow> </mml:math> bits.

2024

JOFC

An Efficient ZK Compiler from SIMD Circuits to General Circuits Abstract

Dung Bui Haotian Chu Geoffroy Couteau Xiao Wang Chenkai Weng Kang Yang Yu Yu

Abstract We propose a generic compiler that can convert any zero-knowledge (ZK) proof for SIMD circuits to general circuits efficiently, and an extension that can preserve the space complexity of the proof systems. Our compiler can immediately produce new results improving upon state of the art. By plugging in our compiler to Antman, an interactive sublinear-communication protocol, we improve the overall communication complexity for general circuits from $$\mathcal {O}(C^{3/4})$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>C</mml:mi> <mml:mrow> <mml:mn>3</mml:mn> <mml:mo>/</mml:mo> <mml:mn>4</mml:mn> </mml:mrow> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> to $$\mathcal {O}(C^{1/2})$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>C</mml:mi> <mml:mrow> <mml:mn>1</mml:mn> <mml:mo>/</mml:mo> <mml:mn>2</mml:mn> </mml:mrow> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> . Our implementation shows that for a circuit of size $$2^{27}$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:msup> <mml:mn>2</mml:mn> <mml:mn>27</mml:mn> </mml:msup> </mml:math> , it achieves up to $$83.6\times $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>83.6</mml:mn> <mml:mo>×</mml:mo> </mml:mrow> </mml:math> improvement on communication compared to the state-of-the-art implementation. Its end-to-end running time is at least $$70\%$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mn>70</mml:mn> <mml:mo>%</mml:mo> </mml:mrow> </mml:math> faster in a 10Mbps network. Using the recent results on compressed $$\varSigma $$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mi>Σ</mml:mi> </mml:math> -protocol theory, we obtain a discrete-log-based constant-round zero-knowledge argument with $$\mathcal {O}(C^{1/2})$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:msup> <mml:mi>C</mml:mi> <mml:mrow> <mml:mn>1</mml:mn> <mml:mo>/</mml:mo> <mml:mn>2</mml:mn> </mml:mrow> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> communication and common random string length, improving over the state of the art that has linear-size common random string and requires heavier computation. We improve the communication of a designated n-verifier zero-knowledge proof from $$\mathcal {O}(nC/B+n^2B^2)$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:mi>n</mml:mi> <mml:mi>C</mml:mi> <mml:mo>/</mml:mo> <mml:mi>B</mml:mi> <mml:mo>+</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>2</mml:mn> </mml:msup> <mml:msup> <mml:mi>B</mml:mi> <mml:mn>2</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> to $$\mathcal {O}(nC/B+n^2)$$ <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML"> <mml:mrow> <mml:mi>O</mml:mi> <mml:mo>(</mml:mo> <mml:mi>n</mml:mi> <mml:mi>C</mml:mi> <mml:mo>/</mml:mo> <mml:mi>B</mml:mi> <mml:mo>+</mml:mo> <mml:msup> <mml:mi>n</mml:mi> <mml:mn>2</mml:mn> </mml:msup> <mml:mo>)</mml:mo> </mml:mrow> </mml:math> . To demonstrate the scalability of our compilers, we were able to extract a commit-and-prove SIMD ZK from Ligero and cast it in our framework. We also give one instantiation derived from LegoSNARK, demonstrating that the idea of CP-SNARK also fits in our methodology.

2024

PKC

ReSolveD: Shorter Signatures from Regular Syndrome Decoding and VOLE-in-the-Head Abstract

Hongrui Cui Hanlin Liu Di Yan Kang Yang Yu Yu Kaiyi Zhang

We present ReSolveD, a new candidate post-quantum signature scheme under the regular syndrome decoding (RSD) assumption for random linear codes, which is a well-established variant of the well-known syndrome decoding (SD) assumption. Our signature scheme is obtained by designing a new zero-knowledge proof for proving knowledge of a solution to the RSD problem in the recent VOLE-in-the-head framework using a sketching scheme to verify that a vector has weight exactly one. We achieve a signature size of 3.99 KB with a signing time of 27.3 ms and a verification time of 23.1 ms on a single core of a standard desktop for a 128-bit security level. Compared to the state-of-the-art code-based signature schemes, our signature scheme achieves 1.5X ~ 2X improvement in terms of the common “signature size + public-key size” metric, while keeping the computational efficiency competitive.

2024

EUROCRYPT

The Hardness of LPN over Any Integer Ring and Field for PCG Applications Abstract

Hanlin Liu Xiao Wang Kang Yang Yu Yu

Learning parity with noise (LPN) has been widely studied and used in cryptography. It was recently brought to new prosperity since Boyle et al. (CCS'18), putting LPN to a central role in designing secure multi-party computation, zero-knowledge proofs, private set intersection, and many other protocols. In this paper, we thoroughly studied security of LPN problems in this particular context. We found that some important aspects are long ignored and many conclusions from classical LPN cryptanalysis do not apply to this new setting, due to the low noise rates, extremely high dimensions, various types (in addition to $\FF_2$) and noise distributions. For LPN over a field, we give a parameterized reduction from exact-noise LPN to regular-noise LPN. Compared to the recent result by Feneuil, Joux and Rivain (Crypto'22), we significantly reduce the security loss by paying only a small additive price in dimension and number of samples. We analyze the security of LPN over a ring $\ZZ_{2^\lambda}$. Existing protocols based on LPN over integer rings use parameters as if they are over fields, but we found an attack that effectively reduces the weight of a noise by half compared to LPN over fields. Consequently, prior works that use LPN over $\ZZ_{2^\lambda}$ overestimate up to 40 bits of security. We provide a complete picture of the hardness of LPN over integer rings by showing: 1) the equivalence between its search and decisional versions; 2) an efficient reduction from LPN over $\FF_{2}$ to LPN over $\ZZ_{2^\lambda}$; and 3) generalization of our results to any integer ring. Finally, we provide an all-in-one estimator tool for the bit security of LPN parameters in the context of PCG, incorporating the recent advanced attacks.

2024

TCHES

Efficient Table-Based Masking with Pre-processing Abstract

Juelin Zhang Taoyun Wang Yiteng Sun Fanjie Ji Bohan Wang Lu Li Yu Yu Weijia Wang

Masking is one of the most investigated countermeasures against sidechannel attacks. In a nutshell, it randomly encodes each sensitive variable into a number of shares, and compiles the cryptographic implementation into a masked one that operates over the shares instead of the original sensitive variables. Despite its provable security benefits, masking inevitably introduces additional overhead. Particularly, the software implementation of masking largely slows down the cryptographic implementations and requires a large number of random bits that need to be produced by a true random number generator. In this respect, reducing the< overhead of masking is still an essential and challenging task. Among various known schemes, Table-Based Masking (TBM) stands out as a promising line of work enjoying the advantages of generality to any lookup tables. It also allows the pre-processing paradigm, wherein a pre-processing phase is executed independently of the inputs, and a much more efficient online (using the precomputed tables) phase takes place to calculate the result. Obviously, practicality of pre-processing paradigm relies heavily on the efficiency of online phase and the size of precomputed tables.In this paper, we investigate the TBM scheme that offers a combination of linear complexity (in terms of the security order, denoted as d) during the online phase and small precomputed tables. We then apply our new scheme to the AES-128, and provide an implementation on the ARM Cortex architecture. Particularly, for a security order d = 8, the online phase outperforms the current state-of-the-art AES implementations on embedded processors that are vulnerable to the side-channel attacks. The security order of our scheme is proven in theory and verified by the T-test in practice. Moreover, we investigate the speed overhead associated with the random bit generation in our masking technique. Our findings indicate that the speed overhead can be effectively balanced. This is mainly because that the true random number generator operates in parallel with the processor’s execution, ensuring a constant supply of fresh random bits for the masked computation at regular intervals.

2024

CIC

Efficient Maliciously Secure Oblivious Exponentiations Abstract

Carsten Baum Jens Berlips Walther Chen Ivan B. Damgård Kevin M. Esvelt Leonard Foner Dana Gretton Martin Kysel Ronald L. Rivest Lawrence Roy Francesca Sage-Ling Adi Shamir Vinod Vaikuntanathan Lynn Van Hauwe Theia Vogel Benjamin Weinstein-Raun Daniel Wichs Stephen Wooster Andrew C. Yao Yu Yu

<p> Oblivious Pseudorandom Functions (OPRFs) allow a client to evaluate a pseudorandom function (PRF) on her secret input based on a key that is held by a server. In the process, the client only learns the PRF output but not the key, while the server neither learns the input nor the output of the client. The arguably most popular OPRF is due to Naor, Pinkas and Reingold (Eurocrypt 2009). It is based on an Oblivious Exponentiation by the server, with passive security under the Decisional Diffie-Hellman assumption. In this work, we strengthen the security guarantees of the NPR OPRF by protecting it against active attacks of the server. We have implemented our solution and report on the performance. Our main result is a new batch OPRF protocol which is secure against maliciously corrupted servers, but is essentially as efficient as the semi-honest solution. More precisely, the computation (and communication) overhead is a multiplicative factor $o(1)$ as the batch size increases. The obvious solution using zero-knowledge proofs would have a constant factor overhead at best, which can be too expensive for certain deployments. Our protocol relies on a novel version of the DDH problem, which we call the Oblivious Exponentiation Problem (OEP), and we give evidence for its hardness in the Generic Group model. We also present a variant of our maliciously secure protocol that does not rely on the OEP but nevertheless only has overhead $o(1)$ over the known semi-honest protocol. Moreover, we show that our techniques can also be used to efficiently protect threshold blind BLS signing and threshold ElGamal decryption against malicious attackers. </p>

2023

EUROCRYPT

Actively Secure Half-Gates with Minimum Overhead under Duplex Networks Abstract

Hongrui Cui Xiao Wang Kang Yang Yu Yu

Actively secure two-party computation (2PC) is one of the canonical building blocks in modern cryptography. One main goal for designing actively secure 2PC protocols is to reduce the communication overhead, compared to semi-honest 2PC protocols. In this paper, we propose a new actively secure constant-round 2PC protocol with one-way communication of $2\kappa+5$ bits per AND gate (for $\kappa$-bit computational security and any statistical security), essentially matching the one-way communication of semi-honest half-gates protocol. This is achieved by two new techniques: - The recent compression technique by Dittmer et al. (Crypto 2022) shows that a relaxed preprocessing is sufficient for authenticated garbling that does not reveal masked wire values to the garbler. We introduce a new form of authenticated bits and propose a new technique of generating authenticated AND triples to reduce the one-way communication of preprocessing from $5\rho+1$ bits to $2$ bits per AND gate for $\rho$-bit statistical security. - Unfortunately, the above compressing technique is only compatible with a less compact authenticated garbled circuit of size $2\kappa+3\rho$ bits per AND gate. We designed a new authenticated garbling that does not use information theoretic MACs but rather dual execution without leakage to authenticate wire values in the circuit. This allows us to use a more compact half-gates based authenticated garbled circuit of size $2\kappa+1$ bits per AND gate, and meanwhile keep compatible with the compression technique. Our new technique can achieve one-way communication of $2\kappa+5$ bits per AND gate. Our technique of yielding authenticated AND triples can also be used to optimize the two-way communication (i.e., the total communication) by combining it with the authenticated garbled circuits by Dittmer et al., which results in an actively secure 2PC protocol with two-way communication of $2\kappa+3\rho+4$ bits per AND gate.

2023

TCHES

Efficient Private Circuits with Precomputation Abstract

Weijia Wang Fanjie Ji Juelin Zhang Yu Yu

At CHES 2022, Wang et al. described a new paradigm for masked implementations using private circuits, where most intermediates can be precomputed before the input shares are accessed, significantly accelerating the online execution of masked functions. However, the masking scheme they proposed mainly featured (and was designed for) the cost amortization, leaving its (limited) suitability in the above precomputation-based paradigm just as a bonus. This paper aims to provide an efficient, reliable, easy-to-use, and precomputation-compatible masking scheme. We propose a new masked multiplication over the finite field Fq suitable for the precomputation, and prove its security in the composable notion called Probing-Isolating Non-Inference (PINI). Particularly, the operations (e.g., AND and XOR) in the binary field can be achieved by assigning q = 2, allowing the bitsliced implementation that has been shown to be quite efficient for the software implementations. The new masking scheme is applied to leverage the masking of AES and SKINNY block ciphers on ARM Cortex M architecture. The performance results show that the new scheme contributes to a significant speed-up compared with the state-of-the-art implementations. For SKINNY with block size 64, the speed and RAM requirement can be significantly improved (saving around 45% cycles in the online-computation and 60% RAM space for precomputed values) from AES-128, thanks to its smaller number of AND gates. Besides the security proof by hand, we provide formal verifications for the multiplication and T-test evaluations for the masked implementations of AES and SKINNY. Because of the structure of the new masked multiplication, our formal verification can be performed for security orders up to 16.

2023

CRYPTO

Revisiting the Constant-sum Winternitz One-time Signature with Applications to SPHINCS+ and XMSS Abstract

Kaiyi Zhang Hongrui Cui Yu Yu

Hash-based signatures offer a conservative alternative to post-quantum signatures with arguably better-understood security than other post-quantum candidates. As a core building block of hash-based signatures, the efficiency of one-time signature (OTS) largely dominates that of hash-based signatures. The WOTS$^{+}$ signature scheme (Africacrypt 2013) is the current state-of-the-art OTS adopted by the signature schemes standardized by NIST---XMSS, LMS, and SPHINCS$^+$. A natural question is whether there is (and how much) room left for improving one-time signatures (and thus standard hash-based signatures). In this paper, we show that WOTS$^{+}$ one-time signature, when adopting the constant-sum encoding scheme (Bos and Chaum, Crypto 1992), is size-optimal not only under Winternitz's OTS framework, but also among all tree-based OTS designs. Moreover, we point out a flaw in the DAG-based OTS design previously shown to be size-optimal at Asiacrypt 1996, which makes the constant-sum WOTS$^{+}$ the most size-efficient OTS to the best of our knowledge. Finally, we evaluate the performance of constant-sum WOTS$^{+}$ integrated into the SPHINCS$^+$ (CCS 2019) and XMSS (PQC 2011) signature schemes which exhibit certain degrees of improvement in both signing time and signature size.

2023

ASIACRYPT

Algebraic Attacks on Round-Reduced Rain and Full AIM-III Abstract

Kaiyi Zhang Qingju Wang Yu Yu Chun Guo Hongrui Cui

Picnic is a NIST PQC Round 3 Alternate signature candidate that builds upon symmetric primitives following the MPC-in-the-head paradigm. Recently, researchers have been exploring more secure/efficient signature schemes from conservative one-way functions based on AES, or new low complexity one-way functions like Rain (CCS 2022) and AIM (CCS 2023). The signature schemes based on Rain and AIM are currently the most efficient among MPC-in-the-head-based schemes, making them promising post-quantum digital signature candidates. However, the exact hardness of these new one-way functions deserves further study and scrutiny. This work presents algebraic attacks on Rain and AIM for certain instances, where one-round Rain can be compromised in $2^{n/2}$ for security parameter $n\in \{128,192,256\}$, and two-round Rain can be broken in $2^{120.3}$, $2^{180.4}$, and $2^{243.1}$ encryptions, respectively. Additionally, we demonstrate an attack on AIM-III (which aims at 192-bit security) with a complexity of $2^{186.5}$ encryptions. These attacks exploit the algebraic structure of the power function over fields with characteristic 2, which provides potential insights into the algebraic structures of some symmetric primitives and thus might be of independent interest.

2023

TCC

Security Proofs for Key-Alternating Ciphers with Non-Independent Round Permutations Abstract

Liqing Yu Yusai Wu Yu Yu Zhenfu Cao Xiaolei Dong

This work studies the key-alternating ciphers (KACs) whose round permutations are not necessarily independent. We revisit existing security proofs for key-alternating ciphers with a single permutation (KACSPs), and extend their method to an arbitrary number of rounds. In particular, we propose new techniques that can significantly simplify the proofs, and also remove two unnatural restrictions in the known security bound of 3-round KACSP (Wu et al., Asiacrypt 2020). With these techniques, we prove the first tight security bound for t-round KACSP, which was an open problem. We stress that our techniques apply to all variants of KACs with non-independent round permutations, as well as to the standard KACs.

2022

EUROCRYPT

Cryptanalysis of Candidate Obfuscators for Affine Determinant Programs 📺 Abstract

Li Yao Yilei Chen Yu Yu

At ITCS 2020, Bartusek et al. proposed a candidate indistinguishability obfuscator (iO) for affine determinant programs (ADPs). The candidate is special since it is the only unbroken candidate iO to date that does not rely on the hardness of traditional cryptographic assumptions like discrete-log or learning with errors. Instead, it directly applies specific randomization techniques to the underlying ADP. It is relatively efficient compared to the rest of the iO candidates. However, the obfuscation scheme requires further cryptanalysis since it was not known to be based on any well-formed mathematical assumptions. In this paper, we show cryptanalytic attacks on the iO candidate provided by Bartusek et al. Our attack exploits the weakness of one of the randomization steps in the candidate. The attack applies to a fairly general class of programs. At the end of the paper we discuss plausible countermeasures to defend against our attacks.

2022

TCHES

Side-Channel Masking with Common Shares Abstract

Weijia Wang Chun Guo Yu Yu Fanjie Ji Yang Su

To counter side-channel attacks, a masking scheme randomly encodes keydependent variables into several shares, and transforms operations into the masked correspondence (called gadget) operating on shares. This provably achieves the de facto standard notion of probing security.We continue the long line of works seeking to reduce the overhead of masking. Our main contribution is a new masking scheme over finite fields in which shares of different variables have a part in common. This enables the reuse of randomness / variables across different gadgets, and reduces the total cost of masked implementation. For security order d and circuit size l, the randomness requirement and computational complexity of our scheme are Õ(d2) and Õ(ld2) respectively, strictly improving upon the state-of-the-art Õ(d2) and Õ(ld3) of Coron et al. at Eurocrypt 2020.A notable feature of our scheme is that it enables a new paradigm in which many intermediates can be precomputed before executing the masked function. The precomputation consumes Õ(ld2) and produces Õ(ld) variables to be stored in RAM. The cost of subsequent (online) computation is reduced to Õ(ld), effectively speeding up e.g., challenge-response authentication protocols. We showcase our method on the AES on ARM Cortex M architecture and perform a T-test evaluation. Our results show a speed-up during the online phase compared with state-of-the-art implementations, at the cost of acceptable RAM consumption and precomputation time.To prove security for our scheme, we propose a new security notion intrinsically supporting randomness / variables reusing across gadgets, and bridging the security of parallel compositions of gadgets to general compositions, which may be of independent interest.

2022

ASIACRYPT

A Third is All You Need: Extended Partial Key Exposure Attack on CRT-RSA with Additive Exponent Blinding 📺 Abstract

Yuanyuan Zhou Joop van de Pol Yu Yu François-Xavier Standaert

At Eurocrypt 2022, May et al. proposed a partial key exposure (PKE) attack on CRT-RSA that efficiently factors $N$ knowing only a $\frac{1}{3}$-fraction of either most significant bits (MSBs) or least significant bits (LSBs) of private exponents $d_p$ and $d_q$ for public exponent $e \approx N^{\frac{1}{12}}$. In practice, PKE attacks typically rely on the side-channel leakage of these exponents, while a side-channel resistant implementation of CRT-RSA often uses additively blinded exponents $d^{\prime}_p = d_p + r_p(p-1)$ and $d^{\prime}_q = d_q + r_q(q-1)$ with unknown random blinding factors $r_p$ and $r_q$, which makes PKE attacks more challenging. Motivated by the above, we extend the PKE attack of May et al. to CRT-RSA with additive exponent blinding. While admitting $r_pe\in(0,N^{\frac{1}{4}})$, our extended PKE works ideally when $r_pe \approx N^{\frac{1}{12}}$, in which case the entire private key can be recovered using only $\frac{1}{3}$ known MSBs or LSBs of the blinded CRT exponents $d^{\prime}_p$ and $d^{\prime}_q$. Our extended PKE follows their novel two-step approach to first compute the key-dependent constant $k^{\prime}$ ($ed^{\prime}_p = 1 + k^{\prime}(p-1)$, $ed^{\prime}_q = 1 + l^{\prime}(q-1)$), and then to factor $N$ by computing the root of a univariate polynomial modulo $k^{\prime}p$. We extend their approach as follows. For the MSB case, we propose two options for the first step of the attack, either by obtaining a single estimate $k^{\prime}l^{\prime}$ and calculating $k^{\prime}$ via factoring, or by obtaining multiple estimates $k^{\prime}l^{\prime}_1,\ldots,k^{\prime}l^{\prime}_z$ and calculating $k^{\prime}$ probabilistically via GCD. For the LSB case, we extend their approach by constructing a different univariate polynomial in the second step of the LSB attack. A formal analysis shows that our LSB attack runs in polynomial time under the standard Coppersmith-type assumption, while our MSB attack either runs in sub-exponential time with a reduced input size (the problem is reduced to factor a number of size $e^2r_pr_q\approx N^{\frac{1}{6}}$) or in probabilistic polynomial time under a novel heuristic assumption. Under the settings of the most common key sizes (1024-bit, 2048-bit, and 3072-bit) and blinding factor lengths (32-bit, 64-bit, and 128-bit), our experiments verify the validity of the Coppersmith-type assumption and our own assumption, as well as the feasibility of the factoring step. To the best of our knowledge, this is the first PKE on CRT-RSA with experimentally verified effectiveness against 128-bit unknown exponent blinding factors. We also demonstrate an application of the proposed PKE attack using real partial side-channel key leakage targeting a Montgomery Ladder exponentiation CRT implementation.

2022

ASIACRYPT

A Non-heuristic Approach to Time-space Tradeoffs and Optimizations for BKW 📺 Abstract

Hanlin Liu Yu Yu

Blum, Kalai and Wasserman (JACM 2003) gave the first sub-exponential algorithm to solve the Learning Parity with Noise (LPN) problem. In particular, consider the LPN problem with constant noise and dimension $n$. The BKW solves it with space complexity $2^{\frac{(1+\epsilon)n}{\log n}}$ and time/sample complexity $2^{\frac{(1+\epsilon)n}{\log n}}\cdot 2^{\Omega(n^{\frac{1}{1+\epsilon}})}$ for small constant $\epsilon\to 0^+$. We propose a variant of the BKW by tweaking Wagner's generalized birthday problem (Crypto 2002) and adapting the technique to a $c$-ary tree structure. In summary, our algorithm achieves the following: \begin{enumerate} \item {\bf (Time-space tradeoff).} We obtain the same time-space tradeoffs for LPN and LWE as those given by Esser et al. (Crypto 2018), but without resorting to any heuristics. For any $2\leq c\in\mathbb{N}$, our algorithm solves the LPN problem with time complexity $2^{\frac{\log c(1+\epsilon)n}{\log n}}\cdot 2^{\Omega(n^{\frac{1}{1+\epsilon}})}$ and space complexity $2^{\frac{\log c(1+\epsilon)n}{(c-1)\log n}}$, where one can use Grover's quantum algorithm or Dinur et al.'s dissection technique (Crypto 2012) to further accelerate/optimize the time complexity. \item {\bf (Time/sample optimization).} A further adjusted variant of our algorithm solves the LPN problem with sample, time and space complexities all kept at $2^{\frac{(1+\epsilon)n}{\log n}}$ for $\epsilon\to 0^+$, saving factor $2^{\Omega(n^{\frac{1}{1+\epsilon}})}$ in time/sample compared to the original BKW, and the variant of Devadas et al. (TCC 2017). \item {\bf (Sample reduction).} Our algorithm provides an alternative to Lyubashevsky's BKW variant (RANDOM 2005) for LPN with a restricted amount of samples. In particular, given $Q=n^{1+\epsilon}$ (resp., $Q=2^{n^{\epsilon}}$) samples, our algorithm saves a factor of $2^{\Omega(n)/(\log n)^{1-\kappa}}$ (resp., $2^{\Omega(n^{\kappa})}$) for constant $\kappa \to 1^-$ in running time while consuming roughly the same space, compared with Lyubashevsky's algorithm. \end{enumerate} In particular, the time/sample optimization benefits from a careful analysis of the error distribution among the correlated candidates, which was not studied by previous rigorous approaches such as the analysis of Minder and Sinclair (J.Cryptology 2012) or Devadas et al. (TCC 2017).

2021

ASIACRYPT

Learning Parity with Noise: Constructions, Reductions, and Analyses 📺 ★

Yu Yu

2021

TOSC

Provable Security of SP Networks with Partial Non-Linear Layers 📺 Abstract

Chun Guo François-Xavier Standaert Weijia Wang Xiao Wang Yu Yu

Motivated by the recent trend towards low multiplicative complexity blockciphers (e.g., Zorro, CHES 2013; LowMC, EUROCRYPT 2015; HADES, EUROCRYPT 2020; MALICIOUS, CRYPTO 2020), we study their underlying structure partial SPNs, i.e., Substitution-Permutation Networks (SPNs) with parts of the substitution layer replaced by an identity mapping, and put forward the first provable security analysis for such partial SPNs built upon dedicated linear layers. For different instances of partial SPNs using MDS linear layers, we establish strong pseudorandom security as well as practical provable security against impossible differential attacks. By extending the well-established MDS code-based idea, we also propose the first principled design of linear layers that ensures optimal differential propagation. Our results formally confirm the conjecture that partial SPNs achieve the same security as normal SPNs while consuming less non-linearity, in a well-established framework.

2021

CRYPTO

Pushing the Limits of Valiant's Universal Circuits: Simpler, Tighter and More Compact 📺 Abstract

Hanlin Liu Yu Yu Shuoyao Zhao Jiang Zhang Wenling Liu Zhenkai Hu

A universal circuit (UC) is a general-purpose circuit that can simulate arbitrary circuits (up to a certain size $n$). Valiant provides a $k$-way recursive construction of UCs (STOC 1976), where $k$ tunes the complexity of the recursion. More concretely, Valiant gives theoretical constructions of 2-way and 4-way UCs of asymptotic (multiplicative) sizes $5n\log n$ and $4.75 n\log n$ respectively, which matches the asymptotic lower bound $\Omega(n\log n)$ up to some constant factor. Motivated by various privacy-preserving cryptographic applications, Kiss et al. (Eurocrypt 2016) validated the practicality of $2$-way universal circuits by giving example implementations for private function evaluation. G{\"{u}}nther et al. (Asiacrypt 2017) and Alhassan et al. (J. Cryptology 2020) implemented the 2-way/4-way hybrid UCs with various optimizations in place towards making universal circuits more practical. Zhao et al. (Asiacrypt 2019) optimized Valiant's 4-way UC to asymptotic size $4.5 n\log n$ and proved a lower bound $3.64 n\log n$ for UCs under the Valiant framework. As the scale of computation goes beyond 10-million-gate ($n=10^7$) or even billion-gate level ($n=10^9$), the constant factor in UCs size plays an increasingly important role in application performance. In this work, we investigate Valiant's universal circuits and present an improved framework for constructing universal circuits with the following advantages. [Simplicity.] Parameterization is no longer needed. In contrast to that previous implementations resorted to a hybrid construction combining $k=2$ and $k=4$ for a tradeoff between fine granularity and asymptotic size-efficiency, our construction gets the best of both worlds when configured at the lowest complexity (i.e., $k=2$). [Compactness.] Our universal circuits have asymptotic size $3n\log n$, improving upon the best previously known $4.5n\log n$ by 33\% and beating the $3.64n\log n$ lower bound for UCs constructed under Valiant's framework (Zhao et al., Asiacrypt 2019). [Tightness.] We show that under our new framework the UCs size is lower bounded by $2.95 n\log n$, which almost matches the $3n\log n$ circuit size of our $2$-way construction. We implement the 2-way universal circuits and evaluate its performance with other implementations, which confirms our theoretical analysis.

2021

CRYPTO

Smoothing Out Binary Linear Codes and Worst-case Sub-exponential Hardness for LPN 📺 Abstract

Yu Yu Jiang Zhang

Learning parity with noise (LPN) is a notorious (average-case) hard problem that has been well studied in learning theory, coding theory and cryptography since the early 90's. It further inspires the Learning with Errors (LWE) problem [Regev, STOC 2005], which has become one of the central building blocks for post-quantum cryptography and advanced cryptographic. Unlike LWE whose hardness can be reducible from worst-case lattice problems, no corresponding worst-case hardness results were known for LPN until very recently. At Eurocrypt 2019, Brakerski et al. [BLVW19] established the first feasibility result that the worst-case hardness of nearest codeword problem (NCP) (on balanced linear code) at the extremely low noise rate $\frac{\log^2 n}{n}$ implies the quasi-polynomial hardness of LPN at the extremely high noise rate $1/2-1/\poly(n)$. It remained open whether a worst-case to average-case reduction can be established for standard (constant-noise) LPN, ideally with sub-exponential hardness. We start with a simple observation that the hardness of high-noise LPN over large fields is implied by that of the LWE of the same modulus, and is thus reducible from worst-case hardness of lattice problems. We then revisit [BLVW19], which is the main focus of this work. We first expand the underlying binary linear codes (of the NCP) to not only the balanced code considered in [BLVW19] but also to another code (in some sense dual to balanced code). At the core of our reduction is a new variant of smoothing lemma (for both binary codes) that circumvents the barriers (inherent in the underlying worst-case randomness extraction) and admits tradeoffs for a wider spectrum of parameter choices. In addition to the worst-case hardness result obtained in [BLVW19], we show that for any constant $0<c<1$ the constant-noise LPN problem is ($T=2^{\Omega(n^{1-c})},\epsilon=2^{-\Omega(n^{\min(c,1-c)})},q=2^{\Omega(n^{\min(c,1-c)})}$)-hard assuming that the NCP at the low-noise rate $\tau=n^{-c}$ is ($T'={2^{\Omega(\tau n)}}$, $\epsilon'={2^{-\Omega(\tau n)}}$,$m={2^{\Omega(\tau n)}}$)-hard in the worst case, where $T$, $\epsilon$, $q$ and $m$ are time complexity, success rate, sample complexity, and codeword length respectively. Moreover, refuting the worst-case hardness assumption would imply arbitrary polynomial speedups over the current state-of-the-art algorithms for solving the NCP (and LPN), which is a win-win result. Unfortunately, public-key encryptions and collision resistant hash functions need constant-noise LPN with ($T={2^{\omega(\sqrt{n})}}$, $\epsilon'={2^{-\omega(\sqrt{n})}}$,$q={2^{\sqrt{n}}}$)-hardness (Yu et al., CRYPTO 2016 \& ASIACRYPT 2019), which is almost (up to an arbitrary $\omega(1)$ factor in the exponent) what is reducible from the worst-case NCP when $c= 0.5$. We leave it as an open problem whether the gap can be closed or there is a separation in place.

2021

TCHES

Learning Parity with Physical Noise: Imperfections, Reductions and FPGA Prototype 📺 Abstract

Davide Bellizia Clément Hoffmann Dina Kamel Hanlin Liu Pierrick Méaux François-Xavier Standaert Yu Yu

Hard learning problems are important building blocks for the design of various cryptographic functionalities such as authentication protocols and post-quantum public key encryption. The standard implementations of such schemes add some controlled errors to simple (e.g., inner product) computations involving a public challenge and a secret key. Hard physical learning problems formalize the potential gains that could be obtained by leveraging inexact computing to directly generate erroneous samples. While they have good potential for improving the performances and physical security of more conventional samplers when implemented in specialized integrated circuits, it remains unknown whether physical defaults that inevitably occur in their instantiation can lead to security losses, nor whether their implementation can be viable on standard platforms such as FPGAs. We contribute to these questions in the context of the Learning Parity with Physical Noise (LPPN) problem by: (1) exhibiting new (output) data dependencies of the error probabilities that LPPN samples may suffer from; (2) formally showing that LPPN instances with such dependencies are as hard as the standard LPN problem; (3) analyzing an FPGA prototype of LPPN processor that satisfies basic security and performance requirements.

2020

TOSC

Efficient Side-Channel Secure Message Authentication with Better Bounds 📺 Abstract

Chun Guo François-Xavier Standaert Weijia Wang Yu Yu

We investigate constructing message authentication schemes from symmetric cryptographic primitives, with the goal of achieving security when most intermediate values during tag computation and verification are leaked (i.e., mode-level leakage-resilience). Existing efficient proposals typically follow the plain Hash-then-MAC paradigm T = TGenK(H(M)). When the domain of the MAC function TGenK is {0, 1}128, e.g., when instantiated with the AES, forgery is possible within time 264 and data complexity 1. To dismiss such cheap attacks, we propose two modes: LRW1-based Hash-then-MAC (LRWHM) that is built upon the LRW1 tweakable blockcipher of Liskov, Rivest, and Wagner, and Rekeying Hash-then-MAC (RHM) that employs internal rekeying. Built upon secure AES implementations, LRWHM is provably secure up to (beyond-birthday) 278.3 time complexity, while RHM is provably secure up to 2121 time. Thus in practice, their main security threat is expected to be side-channel key recovery attacks against the AES implementations. Finally, we benchmark the performance of instances of our modes based on the AES and SHA3 and confirm their efficiency.

2020

PKC

Tweaking the Asymmetry of Asymmetric-Key Cryptography on Lattices: KEMs and Signatures of Smaller Sizes 📺 Abstract

Jiang Zhang Yu Yu Shuqin Fan Zhenfeng Zhang Kang Yang

Currently, lattice-based cryptosystems are less efficient than their number-theoretic counterparts (based on RSA, discrete logarithm, etc.) in terms of key and ciphertext (signature) sizes. For adequate security the former typically needs thousands of bytes while in contrast the latter only requires at most hundreds of bytes. This significant difference has become one of the main concerns in replacing currently deployed public-key cryptosystems with lattice-based ones. Observing the inherent asymmetries in existing lattice-based cryptosystems, we propose asymmetric variants of the (module-)LWE and (module-)SIS assumptions, which yield further size-optimized KEM and signature schemes than those from standard counterparts. Following the framework of Lindner and Peikert (CT-RSA 2011) and the Crystals-Kyber proposal (EuroS&P 2018), we propose an IND-CCA secure KEM scheme from the hardness of the asymmetric module-LWE (AMLWE), whose asymmetry is fully exploited to obtain shorter public keys and ciphertexts. To target at a 128-bit quantum security, the public key (resp., ciphertext) of our KEM only has 896 bytes (resp., 992 bytes). Our signature scheme bears most resemblance to and improves upon the Crystals-Dilithium scheme (ToCHES 2018). By making full use of the underlying asymmetric module-LWE and module-SIS assumptions and carefully selecting the parameters, we construct an SUF-CMA secure signature scheme with shorter public keys and signatures. For a 128-bit quantum security, the public key (resp., signature) of our signature scheme only has 1312 bytes (resp., 2445 bytes). We adapt the best known attacks and their variants to our AMLWE and AMSIS problems and conduct a comprehensive and thorough analysis of several parameter choices (aiming at different security strengths) and their impacts on the sizes, security and error probability of lattice-based cryptosystems. Our analysis demonstrates that AMLWE and AMSIS problems admit more flexible and size-efficient choices of parameters than the respective standard versions.

2020

CRYPTO

Better Concrete Security for Half-Gates Garbling (in the Multi-Instance Setting) 📺 Abstract

Chun Guo Jonathan Katz Xiao Wang Chenkai Weng Yu Yu

We study the concrete security of high-performance implementations of half-gates garbling, which all rely on (hardware-accelerated) AES. We find that current instantiations using k-bit wire labels can be completely broken—in the sense that the circuit evaluator learns all the inputs of the circuit garbler—in time O(2k/C), where C is the total number of (non-free) gates that are garbled, possibly across multiple independent executions. The attack can be applied to existing circuit-garbling libraries using k = 80 when C ≈ $10^9$, and would require 267 machine-months and cost about $3500 to implement on the Google Cloud Platform. Since the attack can be entirely parallelized, the attack could be carried out in about a month using ≈ 250 machines. With this as our motivation, we seek a way to instantiate the hash function in the half-gates scheme so as to achieve better concrete security. We present a construction based on AES that achieves optimal security in the single-instance setting (when only a single circuit is garbled). We also show how to modify the half-gates scheme so that its concrete security does not degrade in the multi-instance setting. Our modified scheme is as efficient as prior work in networks with up to 2 Gbps bandwidth.

2020

ASIACRYPT

Packed Multiplication: How to Amortize the Cost of Side-channel Masking? 📺 Abstract

Weijia Wang Chun Guo François-Xavier Standaert Yu Yu Gaëtan Cassiers

Higher-order masking countermeasures provide strong provable security against side-channel attacks at the cost of incurring significant overheads, which largely hinders its applicability. Previous works towards remedying cost mostly concentrated on ``local'' calculations, i.e., optimizing the cost of computation units such as a single AND gate or a field multiplication. This paper explores a complementary ``global'' approach, i.e., considering multiple operations in the masked domain as a batch and reducing randomness and computational cost via amortization. In particular, we focus on the amortization of $\ell$ parallel field multiplications for appropriate integer $\ell > 1$, and design a kit named {\it packed multiplication} for implementing such a batch. Higher-order masking countermeasures provide strong provable security against side-channel attacks at the cost of incurring significant overheads, which largely hinders its applicability. Previous works towards remedying cost mostly concentrated on ``local'' calculations, i.e., optimizing the cost of computation units such as a single AND gate or a field multiplication. This paper explores a complementary ``global'' approach, i.e., considering multiple operations in the masked domain as a batch and reducing randomness and computational cost via amortization. In particular, we focus on the amortization of $\ell$ parallel field multiplications for appropriate integer $\ell > 1$, and design a kit named {\it packed multiplication} for implementing such a batch. For $\ell+d\leq2^m$, when $\ell$ parallel multiplications over $\mathbb{F}_{2^{m}}$ with $d$-th order probing security are implemented, packed multiplication consumes $d^2+2\ell d + \ell$ bilinear multiplications and $2d^2 + d(d+1)/2$ random field variables, outperforming the state-of-the-art results with $O(\ell d^2)$ multiplications and $\ell \left \lfloor d^2/4\right \rfloor + \ell d$ randomness. To prove $d$-probing security for packed multiplications, we introduce some weaker security notions for multiple-inputs-multiple-outputs gadgets and use them as intermediate steps, which may be of independent interest. As parallel field multiplications exist almost everywhere in symmetric cryptography, lifting optimizations from ``local'' to ``global'' substantially enlarges the space of improvements. To demonstrate, we showcase the method on the AES Subbytes step, GCM and TET (a popular disk encryption). Notably, when $d=8$, our implementation of AES Subbytes in ARM Cortex M architecture achieves a gain of up to $33\%$ in total speeds and saves up to $68\%$ random bits than the state-of-the-art bitsliced implementation reported at ASIACRYPT~2018.

2019

ASIACRYPT

Valiant’s Universal Circuits Revisited: An Overall Improvement and a Lower Bound Abstract

Shuoyao Zhao Yu Yu Jiang Zhang Hanlin Liu

A universal circuit (UC) is a general-purpose circuit that can simulate arbitrary circuits (up to a certain size n). At STOC 1976 Valiant presented a graph theoretic approach to the construction of UCs, where a UC is represented by an edge universal graph (EUG) and is recursively constructed using a dedicated graph object (referred to as supernode). As a main end result, Valiant constructed a 4-way supernode of size 19 and an EUG of size $$4.75n\log n$$ (omitting smaller terms), which remained the most size-efficient even to this day (after more than 4 decades).Motivated by the emerging applications of UCs in various privacy preserving computation scenarios, we revisit Valiant’s universal circuits, and propose a 4-way supernode of size 18, and an EUG of size $$4.5n\log n$$. As confirmed by our implementations, we reduce the size of universal circuits (and the number of AND gates) by more than 5% in general, and thus improve upon the efficiency of UC-based cryptographic applications accordingly. Our approach to the design of optimal supernodes is computer aided (rather than by hand as in previous works), which might be of independent interest. As a complement, we give lower bounds on the size of EUGs and UCs in Valiant’s framework, which significantly improves upon the generic lower bound on UC size and therefore reduces the gap between theory and practice of universal circuits.

2019

ASIACRYPT

Collision Resistant Hashing from Sub-exponential Learning Parity with Noise Abstract

Yu Yu Jiang Zhang Jian Weng Chun Guo Xiangxue Li

The Learning Parity with Noise (LPN) problem has recently found many cryptographic applications such as authentication protocols, pseudorandom generators/functions and even asymmetric tasks including public-key encryption (PKE) schemes and oblivious transfer (OT) protocols. It however remains a long-standing open problem whether LPN implies collision resistant hash (CRH) functions. Inspired by the recent work of Applebaum et al. (ITCS 2017), we introduce a general construction of CRH from LPN for various parameter choices. We show that, just to mention a few notable ones, under any of the following hardness assumptions (for the two most common variants of LPN) 1.constant-noise LPN is $$2^{n^{0.5+\varepsilon }}$$-hard for any constant $$\varepsilon >0$$;2.constant-noise LPN is $$2^{\varOmega (n/\log n)}$$-hard given $$q=\mathsf {poly}(n)$$ samples;3.low-noise LPN (of noise rate $$1/\sqrt{n}$$) is $$2^{\varOmega (\sqrt{n}/\log n)}$$-hard given $$q=\mathsf {poly}(n)$$ samples. there exists CRH functions with constant (or even poly-logarithmic) shrinkage, which can be implemented using polynomial-size depth-3 circuits with NOT, (unbounded fan-in) AND and XOR gates. Our technical route LPN $$\rightarrow $$ bSVP $$\rightarrow $$ CRH is reminiscent of the known reductions for the large-modulus analogue, i.e., LWE $$\rightarrow $$ SIS $$\rightarrow $$ CRH, where the binary Shortest Vector Problem (bSVP) was recently introduced by Applebaum et al. (ITCS 2017) that enables CRH in a similar manner to Ajtai’s CRH functions based on the Short Integer Solution (SIS) problem.Furthermore, under additional (arguably minimal) idealized assumptions such as small-domain random functions or random permutations (that trivially imply collision resistance), we still salvage a simple and elegant collision-resistance-preserving domain extender combining the best of the two worlds, namely, maximized (depth one) parallelizability and polynomial shrinkage. In particular, assume $$2^{n^{0.5+\varepsilon }}$$-hard constant-noise LPN or $$2^{n^{0.25+\varepsilon }}$$-hard low-noise LPN, we obtain a collision resistant hash function that evaluates in parallel only a single layer of small-domain random functions (or random permutations) and shrinks polynomially.

2017

ASIACRYPT

Two-Round PAKE from Approximate SPH and Instantiations from Lattices

Jiang Zhang Yu Yu

2016

EUROCRYPT

Pseudorandom Functions in Almost Constant Depth from Low-Noise LPN 📺

Yu Yu John P. Steinberger

2016

CRYPTO

Cryptography with Auxiliary Input and Trapdoor from Constant-Noise LPN 📺

Yu Yu Jiang Zhang