Complete and Improved FPGA Implementation of Classic McEliece
We present the first specification-compliant constant-time FPGA implementation of the Classic McEliece cryptosystem from the third-round of NIST’s Post-Quantum Cryptography standardization process. In particular, we present the first complete implementation including encapsulation and decapsulation modules as well as key generation with seed expansion. All the hardware modules are parametrizable, at compile time, with security level and performance parameters. As the most time consuming operation of Classic McEliece is the systemization of the public key matrix during key generation, we present and evaluate three new algorithms that can be used for systemization while complying with the specification: hybrid early-abort systemizer (HEA), single-pass early-abort systemizer (SPEA), and dual-pass earlyabort systemizer (DPEA). All of the designs outperform the prior systemizer designs for Classic McEliece by 2.2x to 2.6x in average runtime and by 1.7x to 2.4x in time-area efficiency. We show that our complete Classic McEliece design for example can perform key generation in 5.2 ms to 20 ms, encapsulation in 0.1 ms to 0.5 ms, and decapsulation in 0.7 ms to 1.5 ms for all security levels on an Xlilinx Artix 7 FPGA. The performance can be increased even further at the cost of resources by increasing the level of parallelization using the performance parameters of our design.
Don’t Reject This: Key-Recovery Timing Attacks Due to Rejection-Sampling in HQC and BIKE
Well before large-scale quantum computers will be available, traditional cryptosystems must be transitioned to post-quantum (PQ) secure schemes. The NIST PQC competition aims to standardize suitable cryptographic schemes. Candidates are evaluated not only on their formal security strengths, but are also judged based on the security with regard to resistance against side-channel attacks. Although round 3 candidates have already been intensively vetted with regard to such attacks, one important attack vector has hitherto been missed: PQ schemes often rely on rejection sampling techniques to obtain pseudorandomness from a specific distribution. In this paper, we reveal that rejection sampling routines that are seeded with secretdependent information and leak timing information result in practical key recovery attacks in the code-based key encapsulation mechanisms HQC and BIKE.Both HQC and BIKE have been selected as alternate candidates in the third round of the NIST competition, which puts them on track for getting standardized separately o the finalists. They have already been specifically hardened with constant-time decoders to avoid side-channel attacks. However, in this paper, we show novel timing vulnerabilities in both schemes: (1) Our secret key recovery attack on HQC requiresonly approx. 866,000 idealized decapsulation timing oracle queries in the 128-bit security setting. It is structurally different from previously identified attacks on the scheme: Previously, exploitable side-channel leakages have been identified in the BCH decoder of a previously submitted HQC version, in the ciphertext check as well as in the pseudorandom function of the Fujisaki-Okamoto transformation. In contrast, our attack uses the fact that the rejection sampling routine invoked during the deterministic re-encryption of the decapsulation leaks secret-dependent timing information, which can be efficiently exploited to recover the secret key when HQC is instantiated with the (now constant-time) BCH decoder, as well as with the RMRS decoder of the current submission. (2) From the timing information of the constant weight word sampler in the BIKE decapsulation, we demonstrate how to distinguish whether the decoding step is successful or not, and how this distinguisher is then used in the framework of the GJS attack to derive the distance spectrum of the secret key, using 5.8 x 107 idealized timing oracle queries. We provide details and analyses of the fully implemented attacks, as well as a discussion on possible countermeasures and their limits.
ISA Extensions for Finite Field Arithmetic: Accelerating Kyber and NewHope on RISC-V 📺
We present and evaluate a custom extension to the RISC-V instruction set for finite field arithmetic. The result serves as a very compact approach to software-hardware co-design of PQC implementations in the context of small embedded processors such as smartcards. The extension provides instructions that implement finite field operations with subsequent reduction of the result. As small finite fields are used in various PQC schemes, such instructions can provide a considerable speedup for an otherwise software-based implementation. Furthermore, we create a prototype implementation of the presented instructions for the extendable VexRiscv core, integrate the result into a chip design, and evaluate the design on two different FPGA platforms. The effectiveness of the extension is evaluated by using the instructions to optimize the Kyber and NewHope key-encapsulation schemes. To that end, we also present an optimized software implementation for the standard RISC-V instruction set for the polynomial arithmetic underlying those schemes, which serves as basis for comparison. Both variants are tuned on an assembler level to optimally use the processor pipelines of contemporary RISC-V CPUs. The result shows a speedup for the polynomial arithmetic of up to 85% over the basic software implementation. Using the custom instructions drastically reduces the code and data size of the implementation without introducing runtime-performance penalties at a small cost in circuit size. When used in the selected schemes, the custom instructions can be used to replace a full general purpose multiplier to achieve very compact implementations.
Side Channel Information Set Decoding using Iterative Chunking 📺
This paper presents an attack based on side-channel information and information set decoding (ISD) on the code-based Niederreiter cryptosystem and an evaluation of the practicality of the attack using an electromagnetic side channel. We start by directly adapting the timing side-channel plaintext-recovery attack by Shoufan et al. from 2010 to the constant-time implementation of the Niederreiter cryptosystem as used in the official FPGA-implementation of the NIST finalist “Classic McEliece”. We then enhance our attack using ISD and a new technique that we call iterative chunking to further significantly reduce the number of required side-channel measurements. We theoretically show that our attack improvements have a significant impact on reducing the number of required side-channel measurements. For example, for the 256-bit security parameter set kem/mceliece6960119 of “Classic McEliece”, we improve the basic attack that requires 5415 measurements to less than 562 measurements on average to mount a successful plaintext-recovery attack. Further reductions can be achieved at the price of increasing the cost of the ISD computations. We confirm our findings by practically mounting the attack on the official FPGA-implementation of “Classic McEliece” for all proposed parameter sets.