International Association for Cryptologic Research

International Association
for Cryptologic Research


Wessel P. J. van Woerden


Advanced Lattice Sieving on GPUs, with Tensor Cores 📺
Léo Ducas Marc Stevens Wessel van Woerden
In this work, we study GPU implementations of various state-of-the-art sieving algorithms for lattices (Becker-Gama-Joux 2015, Becker-Ducas-Gama-Laarhoven 2016, Herold-Kirshanova 2017) inside the General Sieve Kernel (G6K, Albrecht et al. 2019). In particular, we extensively exploit the recently introduced Tensor Cores -- originally designed for raytracing and machine learning -- and demonstrate their fitness for the cryptanalytic task at hand. We also propose a new dual-hash technique for efficient detection of `lift-worthy' pairs to accelerate a key ingredient of G6K: finding short lifted vectors. We obtain new computational records, reaching dimension 180 for the SVP Darmstadt Challenge improving upon the previous record for dimension 155. This computation ran for 51.6 days on a server with 4 NVIDIA Turing GPUs and 1.5TB of RAM. This corresponds to a gain of about two orders of magnitude over previous records both in terms of wall-clock time and of energy efficiency.
NTRU Fatigue: How Stretched is Overstretched?
Wessel van Woerden Léo Ducas
Until recently lattice reduction attacks on NTRU lattices were thought to behave similar as on (ring)-LWE lattices with the same parameters. However several works (Albrecht-Bai-Ducas 2016, Kirchner-Fouque 2017) showed a significant gap for large moduli $q$, the so-called overstretched regime of NTRU. With the NTRU scheme being a finalist to the NIST PQC competition it is important to understand ---both asymptotically and concretely--- where the fatigue point lies exactly, i.e. at which $q$ the overstretched regime begins. Unfortunately the analysis by Kirchner and Fouque is based on an impossibility argument, which only results in an asymptotic upper bound on the fatigue point. It also does not really {\em explain} how lattice reduction actually recovers secret-key information. We propose a new analysis that asymptotically improves on that of Kirchner and Fouque, narrowing down the fatigue point for ternary NTRU from $q \leq n^{2.783+o(1)}$ to $q=n^{2.484+o(1)}$, and finally explaining the mechanism behind this phenomenon. We push this analysis further to a concrete one, settling the fatigue point at $q \approx 0.004 \cdot n^{2.484}$, and allowing precise hardness predictions in the overstretched regime. These predictions are backed by extensive experiments.
The Randomized Slicer for CVPP: Sharper, Faster, Smaller, Batchier 📺
Léo Ducas Thijs Laarhoven Wessel P. J. van Woerden
Following the recent line of work on solving the closest vector problem with preprocessing (CVPP) using approximate Voronoi cells, we improve upon previous results in the following ways: We derive sharp asymptotic bounds on the success probability of the randomized slicer, by modelling the behaviour of the algorithm as a random walk on the coset of the lattice of the target vector. We thereby solve the open question left by Doulgerakis–Laarhoven–De Weger [PQCrypto 2019] and Laarhoven [MathCrypt 2019]. We obtain better trade-offs for CVPP and its generalisations (strictly, in certain regimes), both with and without nearest neighbour searching, as a direct result of the above sharp bounds on the success probabilities. We show how to reduce the memory requirement of the slicer, and in particular the corresponding nearest neighbour data structures, using ideas similar to those proposed by Becker–Gama–Joux [Cryptology ePrint Archive, 2015]. Using $$2^{0.185d + o(d)}$$ memory, we can solve a single CVPP instance in $$2^{0.264d + o(d)}$$ time. We further improve on the per-instance time complexities in certain memory regimes, when we are given a sufficiently large batch of CVPP problem instances for the same lattice. Using $$2^{0.208d + o(d)}$$ memory, we can heuristically solve CVPP instances in $$2^{0.234d + o(d)}$$ amortized time, for batches of size at least $$2^{0.058d + o(d)}$$ . Our random walk model for analysing arbitrary-step transition probabilities in complex step-wise algorithms may be of independent interest, both for deriving analytic bounds through convexity arguments, and for computing optimal paths numerically with a shortest path algorithm. As a side result we apply the same random walk model to graph-based nearest neighbour searching, where we improve upon results of Laarhoven [SOCG 2018] by deriving sharp bounds on the success probability of the corresponding greedy search procedure.


Léo Ducas (3)
Thijs Laarhoven (1)
Marc Stevens (1)