Tung Chou

CryptoDB

Tung Chou

Publications and invited talks

Year

Venue

Title

2024

CRYPTO

CryptAttackTester: high-assurance attack analysis Abstract

Daniel J. Bernstein Tung Chou

Quantitative analyses of the costs of cryptographic attack algorithms play a central role in comparing cryptosystems, guiding the search for improved attacks, and deciding which cryptosystems to standardize. Unfortunately, these analyses often turn out to be wrong. Sometimes errors are not caught until years later. This paper introduces CryptAttackTester (CAT), a software framework for high-assurance quantification of attack effectiveness. CAT enforces complete definitions of attack algorithms all the way down through the model of computation, enforces complete definitions of probability predictions and cost predictions all the way down through the cost metric, and systematically tests the predictions on small-scale inputs. For example, CAT gives a fully defined meaning to the statement "the median cost of brute-force search for an AES-128 key is under 2^141.89 bit operations", and provides clear, auditable reasons to believe that the statement is correct. This does not rule out all possible analysis errors, but with CAT it is no longer possible for bugs to hide inside ambiguous or untested security-level claims. The paper gives various examples of errors in the literature that survived typical informal testing practices and that would have been caught if CAT-enforced links had been in place. As an important case study, the bulk of the current CAT release consists of full definitions of a broad spectrum of algorithms for information-set decoding (ISD), along with cost/probability predictions for each algorithm. ISD is the top attack strategy against the McEliece cryptosystem. The predictions cover interactions between (1) high-level search strategies from Prange, Lee–Brickell, Leon, Stern, Dumer, May–Meurer–Thomae, and Becker–Joux–May–Meurer; (2) random walks from Omura, Canteaut–Chabaud, Canteaut–Sendrier, and Bernstein–Lange–Peters; and (3) speedups in core subroutines such as linear algebra and sorting. The predictions also account for various attack overheads that were omitted from previous analyses. These gaps add up to roughly 10 bits, depending on parameters. CAT's tests catch much smaller errors than this. The cost metric selected in CAT has a very simple definition, is a lower bound for the price-performance ratio of non-quantum special-purpose hardware (although the bound is loose for attacks bottlenecked by long-distance communication), and allows many optimization efforts to be shared with the design of cryptographic circuits.

2022

TCHES

A Constant-time AVX2 Implementation of a Variant of ROLLO Abstract

Tung Chou Jin-Han Liou

This paper introduces a key encapsulation mechanism ROLLO+ and presents a constant-time AVX2 implementation of it. ROLLO+ is a variant of ROLLO-I targeting IND-CPA security. The main difference between ROLLO+ and ROLLO-I is that the decoding algorithm of ROLLO+ is adapted from the decoding algorithm of ROLLO-I. Our implementation of ROLLO+-I-128, one of the level-1 parameter sets of ROLLO+, takes 851823 Skylake cycles for key generation, 30361 Skylake cycles for encapsulation, and 673666 Skylake cycles for decapsulation. Compared to the state-of-the-art implementation of ROLLO-I-128 by Aguilar-Melchor et al., which is claimed to be constant-time but actually is not, our implementation achieves a 12.9x speedup for key generation, a 10.6x speedup for encapsulation, and a 14.5x speedup for decapsulation. Compared to the state-of-the-art implementation of the level-1 parameter set of BIKE by Chen, Chou, and Krausz, our key generation time is 1.4x as slow, but our encapsulation time is 3.8x as fast, and our decapsulation time is 2.4x as fast.

2022

TCHES

Complete and Improved FPGA Implementation of Classic McEliece Abstract

Po-Jen Chen Tung Chou Sanjay Deshpande Norman Lahr Ruben Niederhagen Jakub Szefer Wen Wang

We present the first specification-compliant constant-time FPGA implementation of the Classic McEliece cryptosystem from the third-round of NIST’s Post-Quantum Cryptography standardization process. In particular, we present the first complete implementation including encapsulation and decapsulation modules as well as key generation with seed expansion. All the hardware modules are parametrizable, at compile time, with security level and performance parameters. As the most time consuming operation of Classic McEliece is the systemization of the public key matrix during key generation, we present and evaluate three new algorithms that can be used for systemization while complying with the specification: hybrid early-abort systemizer (HEA), single-pass early-abort systemizer (SPEA), and dual-pass earlyabort systemizer (DPEA). All of the designs outperform the prior systemizer designs for Classic McEliece by 2.2x to 2.6x in average runtime and by 1.7x to 2.4x in time-area efficiency. We show that our complete Classic McEliece design for example can perform key generation in 5.2 ms to 20 ms, encapsulation in 0.1 ms to 0.5 ms, and decapsulation in 0.7 ms to 1.5 ms for all security levels on an Xlilinx Artix 7 FPGA. The performance can be increased even further at the cost of resources by increasing the level of parallelization using the performance parameters of our design.

2021

TCHES

Optimizing BIKE for the Intel Haswell and ARM Cortex-M4 📺 Abstract

Ming-Shing Chen Tung Chou Markus Krausz

BIKE is a key encapsulation mechanism that entered the third round of the NIST post-quantum cryptography standardization process. This paper presents two constant-time implementations for BIKE, one tailored for the Intel Haswell and one tailored for the ARM Cortex-M4. Our Haswell implementation is much faster than the avx2 implementation written by the BIKE team: for bikel1, the level-1 parameter set, we achieve a 1.39x speedup for decapsulation (which is the slowest operation) and a 1.33x speedup for the sum of all operations. For bikel3, the level-3 parameter set, we achieve a 1.5x speedup for decapsulation and a 1.46x speedup for the sum of all operations. Our M4 implementation is more than two times faster than the non-constant-time implementation portable written by the BIKE team. The speedups are achieved by both algorithm-level and instruction-level optimizations.

2021

TCHES

Classic McEliece on the ARM Cortex-M4 📺 Abstract

Ming-Shing Chen Tung Chou

This paper presents a constant-time implementation of Classic McEliece for ARM Cortex-M4. Specifically, our target platform is stm32f4-Discovery, a development board on which the amount of SRAM is not even large enough to hold the public key of the smallest parameter sets of Classic McEliece. Fortunately, the flash memory is large enough, so we use it to store the public key. For the level-1 parameter sets mceliece348864 and mceliece348864f, our implementation takes 582 199 cycles for encapsulation and 2 706 681 cycles for decapsulation. Compared to the level-1 parameter set of FrodoKEM, our encapsulation time is more than 80 times faster, and our decapsulation time is more than 17 times faster. For the level-3 parameter sets mceliece460896 and mceliece460896f, our implementation takes 1 081 335 cycles for encapsulation and 6 535 186 cycles for decapsulation. In addition, our implementation is also able to carry out key generation for the level-1 parameter sets and decapsulation for level-5 parameter sets on the board.

2021

TCHES

CTIDH: faster constant-time CSIDH 📺 Abstract

Gustavo Banegas Daniel J. Bernstein Fabio Campos Tung Chou Tanja Lange Michael Meyer Benjamin Smith Jana Sotáková

This paper introduces a new key space for CSIDH and a new algorithm for constant-time evaluation of the CSIDH group action. The key space is not useful with previous algorithms, and the algorithm is not useful with previous key spaces, but combining the new key space with the new algorithm produces speed records for constant-time CSIDH. For example, for CSIDH-512 with a 256-bit key space, the best previous constant-time results used 789000 multiplications and more than 200 million Skylake cycles; this paper uses 438006 multiplications and 125.53 million cycles.

2021

TCHES

Rainbow on Cortex-M4 📺 Abstract

Tung Chou Matthias J. Kannwischer Bo-Yin Yang

We present the first Cortex-M4 implementation of the NISTPQC signature finalist Rainbow. We target the Giant Gecko EFM32GG11B which comes with 512 kB of RAM which can easily accommodate the keys of RainbowI.We present fast constant-time bitsliced F16 multiplication allowing multiplication of 32 field elements in 32 clock cycles. Additionally, we introduce a new way of computing the public map P in the verification procedure allowing vastly faster signature verification.Both the signing and verification procedures of our implementation are by far the fastest among the NISTPQC signature finalists. Signing of rainbowIclassic requires roughly 957 000 clock cycles which is 4× faster than the state of the art Dilithium2 implementation and 45× faster than Falcon-512. Verification needs about 239 000 cycles which is 5× and 2× faster respectively. The cost of signing can be further decreased by 20% when storing the secret key in a bitsliced representation.

2017

CHES

McBits Revisited Abstract

Tung Chou

This paper presents a constant-time fast implementation for a high-security code-based encryption system. The implementation is based on the “McBits” paper by Bernstein, Chou, and Schwabe in 2013: we use the same FFT algorithms for root finding and syndrome computation, similar algorithms for secret permutation, and bitslicing for low-level operations. As opposed to McBits, where a high decryption throughput is achieved by running many decryption operations in parallel, we take a different approach to exploit the internal parallelism in one decryption operation for the use of more applications. As the result, we manage to achieve a slightly better decryption throughput at a much higher security level than McBits. As a minor contribution, we also present a constant-time implementation for encryption and key-pair generation, with similar techniques used for decryption.

2016

CHES

QcBits: Constant-Time Small-Key Code-Based Cryptography 📺

Tung Chou

2013

CHES

McBits: Fast Constant-Time Code-Based Cryptography

Daniel J. Bernstein Tung Chou Peter Schwabe

2012

CHES

Solving Quadratic Equations with XL on Parallel Architectures

Chen-Mou Cheng Tung Chou Ruben Niederhagen Bo-Yin Yang