International Association for Cryptologic Research

International Association
for Cryptologic Research

CryptoDB

Bo-Yin Yang

Publications

Year
Venue
Title
2021
TCHES
NTT Multiplication for NTT-unfriendly Rings: New Speed Records for Saber and NTRU on Cortex-M4 and AVX2 📺
In this paper, we show how multiplication for polynomial rings used in the NIST PQC finalists Saber and NTRU can be efficiently implemented using the Number-theoretic transform (NTT). We obtain superior performance compared to the previous state of the art implementations using Toom–Cook multiplication on both NIST’s primary software optimization targets AVX2 and Cortex-M4. Interestingly, these two platforms require different approaches: On the Cortex-M4, we use 32-bit NTT-based polynomial multiplication, while on Intel we use two 16-bit NTT-based polynomial multiplications and combine the products using the Chinese Remainder Theorem (CRT).For Saber, the performance gain is particularly pronounced. On Cortex-M4, the Saber NTT-based matrix-vector multiplication is 61% faster than the Toom–Cook multiplication resulting in 22% fewer cycles for Saber encapsulation. For NTRU, the speed-up is less impressive, but still NTT-based multiplication performs better than Toom–Cook for all parameter sets on Cortex-M4. The NTT-based polynomial multiplication for NTRU-HRSS is 10% faster than Toom–Cook which results in a 6% cost reduction for encapsulation. On AVX2, we obtain speed-ups for three out of four NTRU parameter sets.As a further illustration, we also include code for AVX2 and Cortex-M4 for the Chinese Association for Cryptologic Research competition award winner LAC (also a NIST round 2 candidate) which outperforms existing code.
2021
EUROCRYPT
The Nested Subset Differential Attack: A Practical Direct Attack Against LUOV which Forges a Signature within 210 Minutes
In 2017, Ward Beullenset al.submitted Lifted Unbalanced Oil and Vinegar [4], which is a modification to the Unbalanced Oil and Vinegar Schemeby Patarin. Previously, Ding et al.proposed the Subfield Differential Attack [20]which prompted a change of parameters by the authors of LUOV for the second round of the NIST post quantum standardization competition [3].In this paper we propose a modification to the Subfield Differential Attackcalled the Nested Subset Differential Attack which fully breaks half of the parameter sets put forward. We also show by experimentation that this attack is practically possible to do in under 210 minutes for the level I security parameters and not just a theoretical attack. The Nested Subset Differential attack is a large improvement of the Subfield differential attack which can be used in real world circumstances. Moreover, we will only use what is called the "lifted" structure of LUOV, and our attack can be thought as a development of solving"lifted" quadratic systems.
2021
TCHES
Rainbow on Cortex-M4 📺
We present the first Cortex-M4 implementation of the NISTPQC signature finalist Rainbow. We target the Giant Gecko EFM32GG11B which comes with 512 kB of RAM which can easily accommodate the keys of RainbowI.We present fast constant-time bitsliced F16 multiplication allowing multiplication of 32 field elements in 32 clock cycles. Additionally, we introduce a new way of computing the public map P in the verification procedure allowing vastly faster signature verification.Both the signing and verification procedures of our implementation are by far the fastest among the NISTPQC signature finalists. Signing of rainbowIclassic requires roughly 957 000 clock cycles which is 4× faster than the state of the art Dilithium2 implementation and 45× faster than Falcon-512. Verification needs about 239 000 cycles which is 5× and 2× faster respectively. The cost of signing can be further decreased by 20% when storing the secret key in a bitsliced representation.
2020
TCHES
Polynomial Multiplication in NTRU Prime: Comparison of Optimization Strategies on Cortex-M4 📺
This paper proposes two different methods to perform NTT-based polynomial multiplication in polynomial rings that do not naturally support such a multiplication. We demonstrate these methods on the NTRU Prime key-encapsulation mechanism (KEM) proposed by Bernstein, Chuengsatiansup, Lange, and Vredendaal, which uses a polynomial ring that is, by design, not amenable to use with NTT. One of our approaches is using Good’s trick and focuses on speed and supporting more than one parameter set with a single implementation. The other approach is using a mixed radix NTT and focuses on the use of smaller multipliers and less memory. On a ARM Cortex-M4 microcontroller, we show that our three NTT-based implementations, one based on Good’s trick and two mixed radix NTTs, provide between 32% and 17% faster polynomial multiplication. For the parameter-set ntrulpr761, this results in between 16% and 9% faster total operations (sum of key generation, encapsulation, and decapsulation) and requires between 15% and 39% less memory than the current state-of-the-art NTRU Prime implementation on this platform, which is using Toom-Cook-based polynomial multiplication.
2019
TCHES
Fast constant-time gcd computation and modular inversion 📺
Daniel J. Bernstein Bo-Yin Yang
This paper introduces streamlined constant-time variants of Euclid’s algorithm, both for polynomial inputs and for integer inputs. As concrete applications, this paper saves time in (1) modular inversion for Curve25519, which was previously believed to be handled much more efficiently by Fermat’s method, and (2) key generation for the ntruhrss701 and sntrup4591761 lattice-based cryptosystems.
2019
TCHES
Power Analysis on NTRU Prime 📺
This paper applies a variety of power analysis techniques to several implementations of NTRU Prime, a Round 2 submission to the NIST PQC Standardization Project. The techniques include vertical correlation power analysis, horizontal indepth correlation power analysis, online template attacks, and chosen-input simple power analysis. The implementations include the reference one, the one optimized using smladx, and three protected ones. Adversaries in this study can fully recover private keys with one single trace of short observation span, with few template traces from a fully controlled device similar to the target and no a priori power model, or sometimes even with the naked eye. The techniques target the constant-time generic polynomial multiplications in the product scanning method. Though in this work they focus on the decapsulation, they also work on the key generation and encapsulation of NTRU Prime. Moreover, they apply to the ideal-lattice-based cryptosystems where each private-key coefficient comes from a small set of possibilities.
2015
EPRINT
2015
ASIACRYPT
2012
PKC
2012
CHES
2011
CHES
2011
CHES
2010
ASIACRYPT
2010
CHES
2010
EPRINT
Fast Exhaustive Search for Polynomial Systems in $F_2$
We analyze how fast we can solve general systems of multivariate equations of various low degrees over \GF{2}; this is a well known hard problem which is important both in itself and as part of many types of algebraic cryptanalysis. Compared to the standard exhaustive-search technique, our improved approach is more efficient both asymptotically and practically. We implemented several optimized versions of our techniques on CPUs and GPUs. Modern graphic cards allows our technique to run more than 10 times faster than the most powerful CPU available. Today, we can solve 48+ quadratic equations in 48 binary variables on a NVIDIA GTX 295 video card (USD 500) in 21 minutes. With this level of performance, solving systems of equations supposed to ensure a security level of 64 bits turns out to be feasible in practice with a modest budget. This is a clear demonstration of the power of GPUs in solving many types of combinatorial and cryptanalytic problems.
2009
CHES
2009
EUROCRYPT
2008
EPRINT
New Differential-Algebraic Attacks and Reparametrization of Rainbow
A recently proposed class of multivariate quadratic schemes, the Rainbow-Like signature Schemes, in which successive sets of central variables are obtained from previous ones by solving linear equations, seem to lead to efficient schemes (TTS, TRMS, and Rainbow) that perform well on systems of low computational resources. Recently SFLASH ($C^{\ast-}$) was broken by Dubois, Fouque, Shamir, and Stern via a differential attack. In this paper, we exhibit similar attacks based on differentials, that will reduce published Rainbow-like schemes below their security levels. We will present a new type of construction of Rainbow-Like schemes and design signature schemes with new parameters for practical applications.
2008
EPRINT
Small Odd Prime Field Multivariate PKCs
We show that Multivariate Public Key Cryptosystems (MPKCs) over fields of small odd prime characteristic, say 31, can be highly efficient. Indeed, at the same design security of $2^{80}$ under the best known attacks, odd-char MPKC is generally faster than prior MPKCs over \GF{2^k}, which are in turn faster than ``traditional'' alternatives. This seemingly counter-intuitive feat is accomplished by exploiting the comparative over-abundance of small integer arithmetic resources in commodity hardware, here embodied by SSE2 or more advanced special multimedia instructions on modern x86-compatible CPUs. We explain our implementation techniques and design choices in implementing our chosen MPKC instances modulo small a odd prime. The same techniques are also applicable in modern FPGAs which often contains a large number of multipliers.
2008
EPRINT
ECM on Graphics Cards
This paper reports record-setting performance for the elliptic-curve method of integer factorization: for example, 926.11 curves/second for ECM stage 1 with B1=8192 for 280-bit integers on a single PC.The state-of-the-art GMP-ECM software handles 124.71 curves/second for ECM stage 1 with B1=8192 for 280-bit integers using all four cores of a 2.4 GHz Core 2 Quad Q6600. The extra speed takes advantage of extra hardware,specifically two NVIDIA GTX 295 graphics cards,using a new ECM implementation introduced in this paper.Our implementation uses Edwards curves, relies on new parallel addition formulas, and is carefully tuned for the highly parallel GPU architecture.On a single GTX 295 the implementation performs 41.88 million modular multiplications per second for a general 280-bit modulus.GMP-ECM, using all four cores of a Q6600, performs 13.03 million modular multiplications per second. This paper also reports speeds on other graphics processors: for example, 2414 280-bit elliptic-curve scalar multiplications per second on an older NVIDIA 8800 GTS (G80), again for a general 280-bit modulus.For comparison, the CHES 2008 paper ``Exploiting the Power of GPUs for Asymmetric Cryptography'' reported 1412 elliptic-curve scalar multiplications per second on the same graphics processor despite having fewer bits in the scalar (224 instead of 280), fewer bits in the modulus (224 instead of 280), and a special modulus (2^{224}-2^{96}+1).
2008
EPRINT
Odd-Char Multivariate Hidden Field Equations
We present a multivariate version of Hidden Field Equations (HFE) over a finite field of odd characteristic, with an extra ``embedding'' modifier. Combining these known ideas makes our new MPKC (multivariate public key cryptosystem) more efficient and scalable than any other extant multivariate encryption scheme. Switching to odd characteristics in HFE-like schemes affects how an attacker can make use of field equations. Extensive empirical tests (using MAGMA-2.14, the best commercially available \mathbold{F_4} implementation) suggests that our new construction is indeed secure against algebraic attacks using Gr\"obner Basis algorithms. The ``embedding'' serves both to narrow down choices of pre-images and to guard against a possible Kipnis-Shamir type (rank-based) attack. We may hence reasonably argue that for practical sizes, prior attacks take exponential time. We demonstrate that our construction is in fact efficient by implementing practical-sized examples of our ``odd-char HFE'' with 3 variables (``THFE'') over $\mathrm{GF}(31)$. To be precise, our preliminary THFE implementation is $15\times$--$20\times$ the speed of RSA-1024.
2007
FSE
2007
PKC
2007
EPRINT
Multivariates Polynomials for Hashing
Jintai Ding Bo-Yin Yang
We propose the idea of building a secure hash using quadratic or higher degree multivariate polynomials over a finite field as the compression function, whose security relies on simple hard questions. We analyze some security properties and potential feasibility, where the compression functions are randomly chosen high-degree polynomials. Next, we propose to improve on the efficiency of the system by using some specially designed polynomials using composition of maps and certain sparsity property, where the security of the system would then relies on stronger assumptions.
2007
EPRINT
Breaking the Symmetry: a Way to Resist the New Differential Attack
Sflash had recently been broken by Dubois, Stern, Shamir, etc., using a differential attack on the public key. The $C^{\ast-}$ signature schemes are hence no longer practical. In this paper, we will study the new attack from the point view of symmetry, then (1) present a simple concept (projection) to modify several multivariate schemes to resist the new attacks; (2) demonstrate with practical examples that this simple method could work well; and (3) show that the same discussion of attack-and-defence applies to other big-field multivariates. The speed of encryption schemes is not affected, and we can still have a big-field multivariate signatures resisting the new differential attacks with speeds comparable to Sflash.
2007
EPRINT
Secure PRNGs from Specialized Polynomial Maps over Any $F_q$
We prove that a random map drawn from any class ${\frak C}$ of polynomial maps from $F_q^n$ to $F_q^{n+r}$ that is (i) totally random in the affine terms, and (ii) has a negligible chance of being not strongly one-way, provides a secure PRNG (hence a secure stream cipher) for any q. Plausible choices for ${\frak C}$ are semi-sparse (i.e., the affine terms are truly random) systems and other systems that are easy to evaluate from a small (compared to a generic map) number of parameters. To our knowledge, there are no other positive results for provable security of specialized polynomial systems, in particular sparse ones (which are natural candidates to investigate for speed). We can build a family of provably secure stream ciphers a rough implementation of which at the same security level can be more than twice faster than an optimized QUAD (and any other provably secure stream ciphers proposed so far), and uses much less storage. This may also help build faster provably secure hashes. We also examine the effects of recent results on specialization on security, e.g., Aumasson-Meier (ICISC 2007), which precludes Merkle-Damgaard compression using polynomials systems {uniformly very sparse in every degree} from being universally collision-free. We conclude that our ideas are consistent with and complements these new results. We think that we can build secure primitives based on specialized (versus generic) polynomial maps which are more efficient.
2006
EPRINT
Note on Design Criteria for Rainbow-Type Multivariates
This was a short note that deals with the design of Rainbow or ``stagewise unbalanced oil-and-vinegar'' multivariate signature schemes. We exhibit new cryptanalysis for current schemes that relates to flawed choices of system parameters in current schemes. These can be ameliorated according to an updated list of security design criteria.
2005
PKC
2004
CHES
2004
EPRINT
TTS: Rank Attacks in Tame-Like Multivariate PKCs
Bo-Yin Yang Jiun-Ming Chen
We herein discuss two modes of attack on multivariate public-key cryptosystems. A 2000 Goubin-Courtois article applied these techniques against a special class of multivariate PKC's called ``Triangular-Plus-Minus'' (TPM), and may explain in part the present dearth of research on ``true'' multivariates -- multivariate PKC's in which the middle map is not really taken in a much larger field. These attacks operate by finding linear combinations of matrices with a given rank. Indeed, we can describe the two attacks very aptly as ``high-rank'' and ``low-rank''. However, TPM was not general enough to cover all pertinent true multivariate PKC's. \emph{Tame-like} PKC's, multivariates with relatively few terms per equation in the central map and an easy inverse, is a superset of TPM that can enjoy both fast private maps and short set-up times. However, inattention can still let rank attacks succeed in tame-like PKCs. The TTS (Tame Transformation Signatures) family of digital signature schemes lies at this cusp of contention. Previous TTS instances (proposed at ICISC '03) claim good resistance to other known attacks. But we show how careless construction in current TTS instances (TTS/4 and TTS/$2'$) exacerbates the security concern of rank, and show two different cryptanalysis in under $2^{57}$ AES units. TTS is not the only tame-like PKC with these liabilities -- they are shared by a few other misconstructed schemes. A suitable equilibrium between speed and security must be struck. We suggest a generic way to craft tame-like PKC's more resistant to rank attacks. A demonstrative TTS variant with similar dimensions is built for which rank attack takes $>2^{80}$ AES units, while remaining very fast and as resistant to other attacks. The proposed TTS variants can scale up. In short: We show that rank attacks apply to the wider class of tame-like PKC's, sometimes even better than previously described. However, this is relativized by the realization that we can build adequately resistant tame-like multivariate PKC's, so the general theme still seem viable compared to more traditional or large-field multivariate alternatives.
2003
EPRINT
A More Secure and Efficacious TTS Signature Scheme
Jiun-Ming Chen Bo-Yin Yang
In 2002 the new genre of digital signature scheme TTS (Tame Transformation Signatures) is introduced along with a sample scheme TTS/2. TTS is from the family of multivariate cryptographic schemes to which the NESSIE primitive {SFLASH} also belongs. It is a realization of Moh's theory for digital signatures, based on Tame Transformations or Tame Maps. Properties of multivariate cryptosystems are determined mainly by their central maps. TTS uses Tame Maps as their central portion for even greater speed than $C^\ast$-related schemes (using monomials in a large field for the central portion), previously usually acknowledged as fastest. We show a small flaw in TTS/2 and present an improved TTS implementation which we call TTS/4. We will examine in some detail how well TTS/4 performs, how it stands up to previously known attacks, and why it represents an advance over TTS/2. Based on this topical assessment, we consider TTS in general and TTS/4 in particular to be competitive or superior in several aspects to other schemes, partly because the theoretical roots of TTS induce many good traits. One specific area in which TTS/4 should excel is in low-cost smartcards. It seems that the genre has great potential for practical deployment and deserves further attention by the cryptological community.

Program Committees

CHES 2020
CHES 2018
Asiacrypt 2018
PKC 2016
CHES 2014
CHES 2013