CryptoDB
Sujoy Sinha Roy
Publications
Year
Venue
Title
2022
TCHES
Will You Cross the Threshold for Me? Generic Side-Channel Assisted Chosen-Ciphertext Attacks on NTRU-based KEMs
Abstract
In this work, we propose generic and novel side-channel assisted chosenciphertext attacks on NTRU-based key encapsulation mechanisms (KEMs). These KEMs are IND-CCA secure, that is, they are secure in the chosen-ciphertext model. Our attacks involve the construction of malformed ciphertexts. When decapsulated by the target device, these ciphertexts ensure that a targeted intermediate variable becomes very closely related to the secret key. An attacker, who can obtain information about the secret-dependent variable through side-channels, can subsequently recover the full secret key. We propose several novel CCAs which can be carried through by using side-channel leakage from the decapsulation procedure. The attacks instantiate three different types of oracles, namely a plaintext-checking oracle, a decryptionfailure oracle, and a full-decryption oracle, and are applicable to two NTRU-based schemes, which are NTRU and NTRU Prime. The two schemes are candidates in the ongoing NIST standardization process for post-quantum cryptography. We perform experimental validation of the attacks on optimized and unprotected implementations of NTRU-based schemes, taken from the open-source pqm4 library, using the EM-based side-channel on the 32-bit ARM Cortex-M4 microcontroller. All of our proposed attacks are capable of recovering the full secret key in only a few thousand chosen ciphertext queries on all parameter sets of NTRU and NTRU Prime. Our attacks, therefore, stress on the need for concrete side-channel protection strategies for NTRUbased KEMs.
2022
TCHES
Medha: Microcoded Hardware Accelerator for computing on Encrypted Data
Abstract
Homomorphic encryption enables computation on encrypted data, and hence it has a great potential in privacy-preserving outsourcing of computations to the cloud. Hardware acceleration of homomorphic encryption is crucial as software implementations are very slow. In this paper, we present design methodologies for building a programmable hardware accelerator for speeding up the cloud-side homomorphic evaluations on encrypted data.First, we propose a divide-and-conquer technique that enables homomorphic evaluations in the polynomial ring RQ,2N = ZQ[x]/(x2N + 1) to use a hardware accelerator that has been built for the smaller ring RQ,N = ZQ[x]/(xN + 1). The technique makes it possible to use a single hardware accelerator flexibly for supporting several homomorphic encryption parameter sets.Next, we present several architectural design methods that we use to realize the flexible and instruction-set accelerator architecture, which we call ‘Medha’. At every level of the implementation hierarchy, we explore possibilities for parallel processing. Starting from hardware-friendly parallel algorithms for the basic building blocks, we gradually build heavily parallel RNS polynomial arithmetic units. Next, many of these parallel units are interconnected elegantly so that their interconnections require the minimum number of nets, therefore making the overall architecture placement-friendly on the platform. As homomorphic encryption is computation- as well as data-centric, the speed of homomorphic evaluations depends greatly on the way the data variables are handled. For Medha, we take a memory-conservative design approach and get rid of any off-chip memory access during homomorphic evaluations.Finally, we implement Medha in a Xilinx Alveo U250 FPGA and measure timing performances of the microcoded homomorphic addition, multiplication, key-switching, and rescaling routines for the leveled fully homomorphic encryption scheme RNSHEAAN at 200 MHz clock frequency. For the large parameter sets (log Q,N) = (438, 214) and (546, 215), Medha achieves accelerations by up to 68× and 78× times respectively compared to a highly optimized software implementation Microsoft SEAL running at 2.3 GHz.
2020
TCHES
Generic Side-channel attacks on CCA-secure lattice-based PKE and KEMs
📺
Abstract
In this work, we demonstrate generic and practical EM side-channel assisted chosen ciphertext attacks over multiple LWE/LWR-based Public Key Encryption (PKE) and Key Encapsulation Mechanisms (KEM) secure in the chosen ciphertext model (IND-CCA security). We show that the EM side-channel information can be efficiently utilized to instantiate a plaintext checking oracle, which provides binary information about the output of decryption, typically concealed within IND-CCA secure PKE/KEMs, thereby enabling our attacks. Firstly, we identified EM-based side-channel vulnerabilities in the error correcting codes (ECC) enabling us to distinguish based on the value/validity of decrypted codewords. We also identified similar vulnerabilities in the Fujisaki-Okamoto transform which leaks information about decrypted messages applicable to schemes that do not use ECC. We subsequently exploit these vulnerabilities to demonstrate practical attacks applicable to six CCA-secure lattice-based PKE/KEMs competing in the second round of the NIST standardization process. We perform experimental validation of our attacks on implementations taken from the open-source pqm4 library, running on the ARM Cortex-M4 microcontroller. Our attacks lead to complete key-recovery in a matter of minutes on all the targeted schemes, thus showing the effectiveness of our attack.
2020
TCHES
High-speed Instruction-set Coprocessor for Lattice-based Key Encapsulation Mechanism: Saber in Hardware
📺
Abstract
In this paper, we present an instruction set coprocessor architecture for lattice-based cryptography and implement the module lattice-based post-quantum key encapsulation mechanism (KEM) Saber as a case study. To achieve fast computation time, the architecture is fully implemented in hardware, including CCA transformations. Since polynomial multiplication plays a performance-critical role in the module and ideal lattice-based public-key cryptography, a parallel polynomial multiplier architecture is proposed that overcomes memory access bottlenecks and results in a highly parallel yet simple and easy-to-scale design. Such multipliers can compute a full multiplication in 256 cycles, but are designed to target any area/performance trade-offs. Besides optimizing polynomial multiplication, we make important design decisions and perform architectural optimizations to reduce the overall cycle counts as well as improve resource utilization. For the module dimension 3 (security comparable to AES-192), the coprocessor computes CCA key generation, encapsulation, and decapsulation in only 5,453, 6,618 and 8,034 cycles respectively, making it the fastest hardware implementation of Saber to our knowledge. On a Xilinx UltraScale+ XCZU9EG-2FFVB1156 FPGA, the entire instruction set coprocessor architecture runs at 250 MHz clock frequency and consumes 23,686 LUTs, 9,805 FFs, and 2 BRAM tiles (including 5,113 LUTs and 3,068 FFs for the Keccak core).
2018
TCHES
Saber on ARM CCA-secure module lattice-based key encapsulation on ARM
Abstract
The CCA-secure lattice-based post-quantum key encapsulation scheme Saber is a candidate in the NIST’s post-quantum cryptography standardization process. In this paper, we study the implementation aspects of Saber in resourceconstrained microcontrollers from the ARM Cortex-M series which are very popular for realizing IoT applications. In this work, we carefully optimize various parts of Saber for speed and memory. We exploit digital signal processing instructions and efficient memory access for a fast implementation of polynomial multiplication. We also use memory efficient Karatsuba and just-in-time strategy for generating the public matrix of the module lattice to reduce the memory footprint. We also show that our optimizations can be combined with each other seamlessly to provide various speed-memory trade-offs. Our speed optimized software takes just 1,147K, 1,444K, and 1,543K clock cycles on a Cortex-M4 platform for key generation, encapsulation and decapsulation respectively. Our memory efficient software takes 4,786K, 6,328K, and 7,509K clock cycles on an ultra resource-constrained Cortex-M0 platform for key generation, encapsulation, and decapsulation respectively while consuming only 6.2 KB of memory at most. These results show that lattice-based key encapsulation schemes are perfectly practical for securing IoT devices from quantum computing attacks.
Program Committees
- CHES 2019
Coauthors
- Aikata (1)
- Andrea Basso (1)
- Shivam Bhasin (2)
- Anupam Chattopadhyay (2)
- Donald Donglong Chen (1)
- Vassil S. Dimitrov (1)
- Martianus Frederic Ezerman (1)
- Johann Großschädl (1)
- Kimmo U. Järvinen (2)
- Angshuman Karmakar (1)
- Howon Kim (1)
- Sunmin Kwon (1)
- Yongwoo Lee (1)
- Zhe Liu (1)
- Nele Mentens (1)
- Jose M. Bermudo Mera (1)
- Ahmet Can Mert (1)
- Debdeep Mukhopadhyay (1)
- Prasanna Ravi (2)
- Chester Rebeiro (1)
- Oscar Reparaz (1)
- Hwajeong Seo (1)
- Youngsam Shin (1)
- Ingrid Verbauwhede (6)
- Frederik Vercauteren (3)
- Donghoon Yoo (1)