International Association for Cryptologic Research

International Association
for Cryptologic Research

IACR News

Updates on the COVID-19 situation are on the Announcement channel.

Here you can see all recent updates to the IACR webpage. These updates are also available:

RSS symbol icon
via RSS feed
Twitter bird icon
via Twitter
Weibo icon
via Weibo
Facebook icon
via Facebook

23 April 2021

Sijun Tan, Brian Knott, Yuan Tian, David J. Wu
ePrint Report ePrint Report
We introduce CryptGPU, a system for privacy-preserving machine learning that implements all operations on the GPU (graphics processing unit). Just as GPUs played a pivotal role in the success of modern deep learning, they are also essential for realizing scalable privacy-preserving deep learning. In this work, we start by introducing a new interface to losslessly embed cryptographic operations over secret-shared values (in a discrete domain) into floating-point operations that can be processed by highly-optimized CUDA kernels for linear algebra. We then identify a sequence of "GPU-friendly" cryptographic protocols to enable privacy-preserving evaluation of both linear and non-linear operations on the GPU. Our microbenchmarks indicate that our private GPU-based convolution protocol is over 150x faster than the analogous CPU-based protocol; for non-linear operations like the ReLU activation function, our GPU-based protocol is around 10x faster than its CPU analog.

With CryptGPU, we support private inference and private training on convolutional neural networks with over 60 million parameters as well as handle large datasets like ImageNet. Compared to the previous state-of-the-art, when considering large models and datasets, our protocols achieve a 2x to 8x improvement in private inference and a 6x to 36x improvement for private training. Our work not only showcases the viability of performing secure multiparty computation (MPC) entirely on the GPU to enable fast privacy-preserving machine learning, but also highlights the importance of designing new MPC primitives that can take full advantage of the GPU's computing capabilities.
Expand
Tung Chou, Matthias J. Kannwischer, Bo-Yin Yang
ePrint Report ePrint Report
We present the first Cortex-M4 implementation of the NISTPQC signature finalist Rainbow. We target the Giant Gecko EFM32GG11B which comes with 512 kB of RAM which can easily accommodate the keys of RainbowI. We present fast constant-time bitsliced F_16 multiplication allowing multiplication of 32 field elements in 32 clock cycles. Additionally, we introduce a new way of computing the public map P in the verification procedure allowing vastly faster signature verification. Both the signing and verification procedures of our implementation are by far the fastest among the NISTPQC signature finalists. Signing of rainbowIclassic requires roughly 957 000 clock cycles which 4× faster than the state of the art Dilithium2 implementation and 45× faster than Falcon-512. Verification needs about 239 000 cycles which is 5× and 2× faster respectively. The cost of signing can be further decreased by 20% when storing the secret key in a bitsliced representation.
Expand
David Heath, Vladimir Kolesnikov
ePrint Report ePrint Report
Secure two party computation (2PC) of arbitrary programs can be efficiently achieved using garbled circuits (GC). Until recently, it was widely believed that a GC proportional to the entire program, including parts of the program that are entirely discarded due to conditional branching, must be transmitted over a network. Recent work shows that this belief is false, and that communication proportional only to the longest program execution path suffices (Heath and Kolesnikov, CRYPTO 20, [HK20a]). Although this recent work reduces needed communication, it increases computation. For a conditional with $b$ branches, the players use $O(b^2)$ computation (traditional GC uses only $O(b)$).

Our scheme LogStack reduces stacked garbling computation from $O(b^2)$ to $O(b \log b)$ with no increase in communication over [HK20a]. The cause of [HK20a]'s increased computation is the oblivious collection of garbage labels that emerge during the evaluation of inactive branches. Garbage is collected by a multiplexer that is costly to generate. At a high level, we redesign stacking and garbage collection to avoid quadratic scaling.

Our construction is also more space efficient: [HK20a] algorithms require $O(b)$ space, while ours use only $O(\log b)$ space. This space efficiency allows even modest setups to handle large numbers of branches.

[HK20a] assumes a random oracle (RO). We track the source of this need, formalize a simple and natural added assumption on the base garbling scheme, and remove reliance on RO: LogStack is secure in the standard model. Nevertheless, LogStack can be instantiated with typical GC tricks based on non-standard assumptions, such as free XOR and half-gates, and hence can be implemented with high efficiency.

We implemented LogStack (in the RO model, based on half-gates garbling) and report performance. In terms of wall-clock time and for fewer than $16$ branches, our performance is comparable to [HK20a]'s; for larger branching factors, our approach clearly outperforms [HK20a]. For example, given $1024$ branches, our approach is $31\times$ faster.
Expand
Yuan Yao, Tuna Tufan, Tarun Kathuria, Baris Ege, Ulkuhan Guler, Patrick Schaumont
ePrint Report ePrint Report
While side-channel leakage is traditionally evaluated from a fabricated chip, it is more time-efficient and cost-effective to do so during the design phase of the chip. We present Pre-silicon Architecture Correlation Analysis (PACA), a hardware design analysis methodology to help designer locate and mitigate the vulnerabilities in the design at an early design stage. PACA first ranks the individual cells in a design netlist according to their contribution to the estimated side-channel leakage and points out the leaky cells. Next, we further reduce the side-channel leakage by selective replacement of the highest-leaking cells in the design with a side-channel protection version. We demonstrate that PACA’s selective replacement can significantly reduce the overhead of the countermeasure, since traditionally countermeasures are applied to the whole design. We first use a simple circuit to introduce and demonstrate the effectiveness of PACA. Then we further demonstrate that PACA can also handle complex designs by applying the overall methodology of PACA on an AES coprocessor, a PRESENT hardware cipher, and on a complex SoC. We demonstrate it is an achievable goal in the modern IC design flow to locate and mitigate the leakage source with low cost.
Expand
Nicolas Gailly, Mary Maller, Anca Nitulescu
ePrint Report ePrint Report
We present and implement SnarkPack, an argument for aggregating $n$ Groth16 zkSNARKs with a $O(\log n)$ proof size and verifier time. Our techniques are inspired from the inner pairing product argument introduced by Bünz et al. with the difference that our final scheme does not require a different trusted setup, but it reuses the one from the pairing-based SNARK that it aggregates.

The key tool for our SnarkPack construction is a new commitment scheme that allows us to instantiate the inner product pairing argument of Bünz et al. by using existing powers of tau ceremony transcripts. We also describe a scheme that merge together a multi-exponentiation argument and an inner pairing product argument for some common randomness vector with minimal overhead. We further apply some optimisations to our protocol and illustrate it's efficiency by implementing it. SnarkPack can aggregate 1024 proofs in 2s and verify them in 33ms, including un-serialization time, yielding a verification mechanism that is exponentially faster than batching.
Expand
Denis Firsov, Henri Lakk, Ahto Truu
ePrint Report ePrint Report
Buldas, Laanoja, and Truu designed a family of server-assisted digital signature schemes (BLT signatures) built around cryptographic timestamping and forward-resistant tag systems. The original constructions had either expensive key generation phase or stateful client-side computations.

In this paper, we construct a stateless tag system with efficient key generation from one-time signature schemes. We prove that the proposed tag system is forward-resistant and when combined with cryptographic timestamping, it induces a secure (existentially unforgeable) multiple-time signature scheme. Our constructions are developed and verified using the EasyCrypt framework.
Expand
Michał Wroński
ePrint Report ePrint Report
Shor's quantum algorithm for integer factorization and discrete logarithm is one of the fundamental approaches in modern cryptology. The application of Shor's algorithm requires a general-purpose quantum computer. On the other hand, there are known methods of transformation of factorization problem to the QUBO problem and then solving it using quantum annealing computing with approximately $\frac{n^2}{4}$ logical qubits, for example, using D-Wave computer. It is believed that this approach also may be helpful, primarily until large general-purpose quantum computers will exist. Until now algorithm of similar efficiency for computing discrete logarithm over prime fields was unknown. In this paper, we present a method of reducing discrete logarithm problem to the QUBO problem, which requires approximately $\frac{n^3}{2}$ logical qubits. We also show how to apply quantum annealing to compute discrete logarithm modulo composite numbers, where a quantum annealing factorization algorithm may be used to reduce discrete logarithm modulo composite to several discrete logarithm problems modulo prime number.
Expand
Jorai Rijsdijk, Lichao Wu, Guilherme Perin, Stjepan Picek
ePrint Report ePrint Report
Deep learning represents a powerful set of techniques for profiling side-channel analysis. The results in the last few years show that neural network architectures like multilayer perceptron and convolutional neural networks give strong attack performance where it is possible to break targets protected with various countermeasures. Considering that deep learning techniques commonly have a plethora of hyperparameters to tune, it is clear that such top attack results can come with a high price in preparing the attack. This is especially problematic as the side-channel community commonly uses random search or grid search techniques to look for the best hyperparameters.

In this paper, we propose to use reinforcement learning to tune the convolutional neural network hyperparameters. In our framework, we investigate the Q-Learning paradigm and develop two reward functions that use side-channel metrics. We mount an investigation on three commonly used datasets and two leakage models where the results show that reinforcement learning can find convolutional neural networks exhibiting top performance while having small numbers of trainable parameters. We note that our approach is automated and can be easily adapted to different datasets. Several of our newly developed architectures outperform the current state-of-the-art results. Finally, we make our source code publicly available.
Expand
Lichao Wu, Guilherme Perin
ePrint Report ePrint Report
In recent years, the advent of deep neural networks opened new perspectives for security evaluations with side-channel analysis. Specifically, profiling attacks now benefit from capabilities offered by convolutional neural networks, such as dimensionality reduction, the absence of manual feature selection, and the inherent ability to reduce trace desynchronization effects. These neural networks contain at least three types of layers: convolutional, pooling, and dense layers. Although the definition of pooling layers causes a large impact on neural network performance, a study on pooling hyperparameters effect on side-channel analysis is still not provided in the academic community.

This paper provides extensive experimental results to demonstrate how pooling layer types and pooling stride and size affect the profiling attack performance with convolutional neural networks. Additionally, we demonstrate that pooling hyperparameters can be larger than usually used in related works and still keep good performance for profiling attacks on specific datasets. Finally, with a larger pooling stride and size, a neural network can be reduced in size, favoring training performance.
Expand
Kwangsu Lee
ePrint Report ePrint Report
Functional encryption (FE) is a new paradigm of public key encryption that can control the exposed information of plaintexts by supporting computation on encrypted data. In this paper, we propose efficient multi-client FE (MCFE) schemes that compute the set intersection of ciphertexts generated by two clients. First, we propose an MCFE scheme that calculates the set intersection cardinality (MCFE-SIC) and prove its static security under dynamic assumptions. Next, we extend our MCFE-SIC scheme to an MCFE scheme for set intersection (MCFE-SI) and prove its static security under dynamic assumptions. The decryption algorithm of our MCFE-SI scheme is more efficient than the existing MCFE-SI scheme because it requires fewer pairing operations to calculate the intersection of two clients. Finally, we propose a decentralized MCFE scheme for set intersection (DMCFE-SI) that decentralizes the generation of function keys. Our MCFE schemes can be effectively applied to a privacy-preserving contact tracing system to prevent the spread of recent infectious diseases.
Expand
Yevgeniy Dodis, Siyao Guo, Noah Stephens-Davidowitz, Zhiye Xie
ePrint Report ePrint Report
Real-world random number generators (RNGs) cannot afford to use (slow) cryptographic hashing every time they refresh their state $R$ with a new entropic input $X$. Instead, they use ``superefficient'' simple entropy-accumulation procedures, such as $$R \leftarrow \mathsf{rot}_{\alpha, n}(R) \oplus X,$$ where $\mathsf{rot}_{\alpha,n}$ rotates an $n$-bit state $R$ by some fixed number $\alpha$. For example, Microsoft's RNG uses $\alpha=5$ for $n=32$ and $\alpha=19$ for $n=64$. Where do these numbers come from? Are they good choices? Should rotation be replaced by a better permutation $\pi$ of the input bits?

In this work we initiate a rigorous study of these pragmatic questions, by modeling the sequence of successive entropic inputs $X_1,X_2,\ldots$ as independent (but otherwise adversarial) samples from some natural distribution family ${\mathcal D}$. Our contribution is as follows.

* We define $2$-monotone distributions as a rich family ${\mathcal D}$ that includes relevant real-world distributions (Gaussian, exponential, etc.), but avoids trivial impossibility results.

* For any $\alpha$ with $\gcd(\alpha,n)=1$, we show that rotation accumulates $\Omega(n)$ bits of entropy from $n$ independent samples $X_1,\ldots,X_n$ from any (unknown) $2$-monotone distribution with entropy $k > 1$.

* However, we also show that some choices of $\alpha$ perform much better than others for a given $n$. E.g., we show $\alpha=19$ is one of the best choices for $n=64$; in contrast, $\alpha=5$ is good, but generally worse than $\alpha=7$, for $n=32$.

* More generally, given a permutation $\pi$ and $k\ge 1$, we define a simple parameter, the covering number $C_{\pi,k}$, and show that it characterizes the number of steps before the rule $$(R_1,\ldots,R_n)\leftarrow (R_{\pi(1)},\ldots, R_{\pi(n)})\oplus X$$ accumulates nearly $n$ bits of entropy from independent, $2$-monotone samples of min-entropy $k$ each.

* We build a simple permutation $\pi^*$, which achieves nearly optimal $C_{\pi^*,k}\approx n/k$ for all values of $k$ simultaneously, and experimentally validate that it compares favorably with all rotations $\mathsf{rot}_{\alpha,n}$.
Expand
Yingpu Deng, Lixia Luo, Yanbin Pan, Zhaonan Wang, Guanju Xiao
ePrint Report ePrint Report
In 2018, the longest vector problem and closest vector problem in local fields were introduced, as the p-adic analogues of the shortest vector problem and closest vector problem in lattices of Euclidean spaces. They are considered to be hard and useful in constructing cryptographic primitives, but no applications in cryptography were given. In this paper, we construct the first signature scheme and public-key encryption cryptosystem based on p-adic lattice by proposing a trapdoor function with the orthogonal basis of p-adic lattice. These cryptographic schemes have reasonable key size and efficiency, which shows that p-adic lattice can be a new alternative to construct cryptographic primitives and well worth studying.
Expand
Daniel Demmler, Stefan Katzenbeisser, Thomas Schneider, Tom Schuster, Christian Weinert
ePrint Report ePrint Report
Secure multi-party computation (MPC) allows multiple parties to securely evaluate a public function on their private inputs. The field has steadily moved forward and real-world applications have become practical. However, MPC implementations are often hand-built and require cryptographic knowledge. Thus, special compilers like HyCC (Büscher et al., CCS'18) have been developed, which automatically compile high-level programs to combinations of Boolean and arithmetic circuits required for mixed-protocol (hybrid) MPC. In this work, we explore the advantages of extending MPC compilers with an intermediate representation (IR) as commonly used in modern compiler infrastructures. For this, we extend HyCC with a graph-based IR that facilitates the implementation of well-known algorithms from compiler design as well as further MPC-specific optimizations. We demonstrate the benefits by implementing arithmetic decomposition based on our new IR that automatically extracts arithmetic expressions and then compiles them into separate circuits. For a line intersection algorithm, we require 40% less run-time and improve total communication by a factor of 3x compared to regular HyCC when securely evaluating the corresponding circuit with the hybrid MPC framework ABY (Demmler et al., NDSS'15).
Expand
Thomas Haines, Johannes Mueller
ePrint Report ePrint Report
One of the most important verifiability techniques for mix nets is randomized partial checking (RPC). This method is employed in a number of prominent secure e-voting systems, including Pret a Voter, Civitas, and Scantegrity II, some of which have also been used for real political elections including in Australia.

Unfortunately, it turned out that there exists a significant gap between the intended and the actual verifiability tolerance of the original RPC protocol. This mismatch affects exactly the "Achilles heel" of RPC, namely those application scenarios where manipulating a few messages can swap the final result (e.g., in close runoff elections).

In this work, we propose the first RPC protocol which closes the aforementioned gap for decryption mix nets. We prove that our new RPC protocol achieves an optimal verifiability level, without introducing any disadvantages. Current implementations of RPC for decryption mix nets, in particular for real-world secure e-voting, should adopt our changes to improve their security.
Expand
Atakan Arslan, Muhammed Ali Bingöl
ePrint Report ePrint Report
Most recently, Izza et al. propose a new ECC-based RFID authentication protocol by showing the vulnerabilities of Naeem's protocol. They claim that their scheme provides security and privacy. However, we assert that their protocol does not satisfy privacy including anonymity, untraceability, forward and backward secrecy on the contrary of their claim. We also argue that the scheme suffers from availability problems.
Expand
Victor Ermolaev, Gamze Tillem
ePrint Report ePrint Report
Custodian service is a service safeguarding a firm's or individual's financial assets or secret information. Such services often present a user with security versus ownership dilemma. The user does not wish to pass full control over their asset to the custodian to facilitate safeguarding. A control sharing mechanism allowing the custodian to hold enough information and keeping the user as the owner of the asset is required. For the assets being secret information, cryptographic protocols addressing this dilemma are known as prepositioned secret sharing~(PSS) protocols. PSS schemes distinguish redundant ``common'' shares and specific ``activating'' shares controlling the very possibility of the secret information reconstruction. Usually, PSS schemes: 1) lack robustness with respect to the amount of ``common'' shares, i.e., a high redundancy degree in ``common'' enables them to reconstruct the secret without ``activation'', and 2) are inflexible in configuring the robustness of the ``activating'' shares, i.e., how many ``activating'' shares can be lost or stolen before the secret can be reconstructed. In this paper, we present a PSS addressing these shortcomings.
Expand
Yanyi Liu, Rafael Pass
ePrint Report ePrint Report
Let $\mktp[s]$ be the set of strings $x$ such that $K^t(x) \leq s(|x|)$, where $K^t(x)$ denotes the $t$-bounded Kolmogorov complexity of the truthtable described by $x$. Our main theorem shows that for an appropriate notion of mild average-case hardness, for every $\varepsilon>0$, polynomial $t(n) \geq (1+\varepsilon)n$, and every ``nice'' class $\F$ of super-polynomial functions, the following are equivalent: - the existence of some function $T \in \F$ such that $T$-hard one-way functions (OWF) exists (with non-uniform security); - the existence of some function $T \in \F$ such that $\mktp[T^{-1}]$ is mildly average-case hard with respect to sublinear-time non-uniform algorithms (with running-time $n^{\delta}$ for some $0<\delta<1$). For instance, existence of subexponentially-hard (resp. quasi-polynomially-hard) OWFs is equivalent to mild average-case hardness of $\mktp[\poly\log n]$ (resp. $\mktp[2^{O(\sqrt{\log n})})]$) w.r.t. sublinear-time non-uniform algorithms.

We additionally note that if we want to deduce $T$-hard OWFs where security holds w.r.t. uniform $T$-time probabilistic attackers (i.e., uniformly-secure OWFs), it suffices to assume sublinear time hardness of $\mktp$ w.r.t. uniform probabilistic sublinear-time attackers. We complement this result by proving lower bounds that come surprisingly close to what is required to unconditionally deduce the existence of (uniformly-secure) OWFs: $\mktp[\poly\log n]$ is worst-case hard w.r.t. uniform probabilistic sublinear-time algorithms, and $\mktp[n-\log n]$ is mildly average-case hard for all $O(t(n)/n^3)$-time deterministic algorithms.
Expand
Weiqiong Cao, Hongsong Shi, Hua Chen, Wei Xi, Yuhang Wang
ePrint Report ePrint Report
ECIES has been widely used in many cryptographic devices and systems to ensure the confidentiality of communication data. Hence, researching its security of implementation is essential. It is generally considered that the embedded point validation towards the input point $Q$ during decryption is enough to resist most of the existing fault attacks and small subgroup attacks. Even many open source algorithm libraries (e.g., OpenSSL and BouncyCastle) only employ the embedded point validation to resist fault attack. However, the proposed weak curve fault attack in this paper can break this situation because it can successfully pass the embedded point validation and the validation of the scalar multiplication about the input point $Q$ and cofactor $h$(i.e., $hQ \ne \mathcal{O}$). Moreover, the proposed attack does not require that the instances of ECDLP on the weak curve derived by fault injection is computationally practical which could increase the availability of fault injection. The simulations demonstrate the feasibility of our attack. Finally, we also investigate the implementations of $14$ open source algorithm libraries, and there are $10$ algorithm libraries which can not block our attack. Hence, we also give some suggestions about countermeasures.
Expand
Keita Emura, Atsushi Takayasu, Yohei Watanabe
ePrint Report ePrint Report
Revocable hierarchical identity-based encryption (RHIBE) is an extension of hierarchical identity-based encryption (HIBE) supporting the key revocation mechanism. In this paper, we propose a generic construction of RHIBE from HIBE with the complete subtree method. Then, we obtain the first RHIBE schemes under the quadratic residuosity assumption, CDH assumption without pairing, factoring Blum integers, LPN assumption, and code-based assumption, and the first almost tightly secure RHIBE schemes under the k-linear assumption. Furthermore, by using pairing-based (dual) identity-based broadcast encryption, we obtain the variants of the scheme with shorter ciphertexts or shorter key updates.
Expand
Abhishek Jain, Zhengzhong Jin
ePrint Report ePrint Report
We provide the first constructions of non-interactive zero-knowledge and Zap arguments for NP based on the sub-exponential hardness of Decisional Diffie-Hellman against polynomial time adversaries (without use of groups with pairings).

Central to our results, and of independent interest, is a new notion of interactive trapdoor hashing protocols.
Expand
◄ Previous Next ►