Aikata Aikata

CryptoDB

Aikata Aikata

Publications and invited talks

Year

Venue

Title

2025

TCHES

REED: Chiplet-based Accelerator for Fully Homomorphic Encryption Abstract

Aikata Aikata Ahmet Can Mert Sunmin Kwon Maxim Deryabin Sujoy Sinha Roy

Fully Homomorphic Encryption (FHE) enables privacy-preserving computation and has many applications. However, its practical implementation faces massive computation and memory overheads. To address this bottleneck, several Application-Specific Integrated Circuit (ASIC) FHE accelerators have been proposed. All these prior works put every component needed for FHE onto one chip (monolithic), hence offering high performance. However, they encounter common challenges associated with large-scale chip design, such as inflexibility, low yield, and high manufacturing costs. In this paper, we present the first-of-its-kind multi-chiplet-based FHE accelerator ‘REED’ for overcoming the limitations of prior monolithic designs. To utilize the advantages of multi-chiplet structures while matching the performance of larger monolithic systems, we propose and implement several novel strategies in the context of FHE. These include a scalable chiplet design approach, an effective framework for workload distribution, a custom inter-chiplet communication strategy, and advanced pipelined Number Theoretic Transform and automorphism design to enhance performance.Our instruction-set and power simulations experiments with a prelayout netlist indicate that REED 2.5D microprocessor consumes 96.7mm2 chip area, 49.4Waverage power in 7nm technology. It could achieve a remarkable speedup of up to 2,991x compared to a CPU (24-core 2xIntel X5690) and offer 1.9x better performance, along with a 50% reduction in development costs when compared to state-of-the-art ASIC FHE accelerators. Furthermore, our work presents the first instance of benchmarking an encrypted deep neural network (DNN) training. Overall, the REED architecture design offers a highly effective solution for accelerating FHE, thereby significantly advancing the practicality and deployability of FHE in real-world applications.

2025

TCHES

ToFA: Towards Fault Analysis of GIFT and GIFT-like Ciphers Leveraging Truncated Impossible Differentials Abstract

Anup Kumar Kundu Shibam Ghosh Aikata Aikata Dhiman Saha

In this work, we introduce ToFA, the first fault attack (FA) strategy that attempts to leverage the classically well-known idea of impossible differential cryptanalysis to mount practically verifiable attacks on bit-oriented ciphers like GIFT and BAKSHEESH. The idea stems from the fact that truncated differential paths induced due to fault injection in certain intermediate rounds of the ciphers lead to active SBox-es in subsequent rounds whose inputs admit specific truncated differences. This leads to a (multi-round) impossible differential distinguisher, which can be incrementally leveraged for key-guess elimination via partial decryption. The key-space reduction further exploits the multi-round impossibility, capitalizing on the relations due to the quotient-remainder (QR) groups of the GIFT and BAKSHEESH linear layer, which increases the filtering capability of the distinguisher. Moreover, the primary observations made in this work are independent of the actual SBox. Clock glitch based fault attacks were mounted on 8-bit implementations of GIFT- 64/GIFT-128 using a ChipWhisperer Lite board on an 8-bit ATXmega128D4-AU micro-controller. Unique key recovery was achieved for GIFT-128 with 3 random byte faults, while for GIFT-64, key space was reduced to 232, the highest achievable for GIFT-64, with a single level fault due to its key-schedule. To the best of our knowledge, this work also reports the highest fault injection penetration for any variant of GIFT and BAKSHEESH. Finally, this work reiterates the role of classical cryptanalysis strategies in fault vulnerability assessment by showcasing the most efficient fault attacks on GIFT.

2024

CRYPTO

Exploring the Advantages and Challenges of Fermat NTT in FHE Acceleration Abstract

Andrey Kim Ahmet Can Mert Anisha Mukherjee Aikata Aikata Maxim Deryabin Sunmin Kwon HyungChul Kang Sujoy Sinha Roy

Recognizing the importance of a fast and resource-efficient polynomial multiplication in homomorphic encryption, in this paper, we design a \emph{multiplier-less} number theoretic transform using a Fermat number as an auxiliary modulus. To make this algorithm scalable with the degree of polynomial, we apply a univariate to multivariate polynomial ring transformation. We develop an accelerator architecture for fully homomorphic encryption using these algorithmic techniques for efficient multivariate polynomial multiplication. For practical homomorphic encryption application benchmarks, the hardware accelerator achieves a 1,200$\times$ speed-up compared to software implementations. Finally, we conclude the paper by discussing the advantages and limitations of the proposed polynomial multiplication method.

2023

TCHES

Kavach: Lightweight masking techniques for polynomial arithmetic in lattice-based cryptography Abstract

Aikata Aikata Andrea Basso Gaetan Cassiers Ahmet Can Mert Sujoy Sinha Roy

Lattice-based cryptography has laid the foundation of various modern-day cryptosystems that cater to several applications, including post-quantum cryptography. For structured lattice-based schemes, polynomial arithmetic is a fundamental part. In several instances, the performance optimizations come from implementing compact multipliers due to the small range of the secret polynomial coefficients. However, this optimization does not easily translate to side-channel protected implementations since masking requires secret polynomial coefficients to be distributed over a large range. In this work, we address this problem and propose two novel generalized techniques, one for the number theoretic transform (NTT) based and another for the non-NTT-based polynomial arithmetic. Both these proposals enable masked polynomial multiplication while utilizing and retaining the small secret property.For demonstration, we used the proposed technique and instantiated masked multipliers for schoolbook as well as NTT-based polynomial multiplication. Both of these can utilize the compact multipliers used in the unmasked implementations. The schoolbook multiplication requires an extra polynomial accumulation along with the two polynomial multiplications for a first-order protected implementation. However, this cost is nothing compared to the area saved by utilizing the existing cheap multiplication units. We also extensively test the side-channel resistance of the proposed design through TVLA to guarantee its first-order security.

2023

TCHES

ModHE: Modular Homomorphic Encryption Using Module Lattices: Potentials and Limitations Abstract

Anisha Mukherjee Aikata Aikata Ahmet Can Mert Yongwoo Lee Sunmin Kwon Maxim Deryabin Sujoy Sinha Roy

The promising field of homomorphic encryption enables functions to be evaluated on encrypted data and produce results for the same computations done on plaintexts. It, therefore, comes as no surprise that many ventures at constructing homomorphic encryption schemes have come into the limelight in recent years. Most popular are those that rely on the hard lattice problem, called the Ring Learning with Errors problem (RLWE). One major limitation of these homomorphic encryption schemes is that in order to securely increase the maximum multiplicative depth, they need to increase the polynomial-size (degree of the polynomial ring) thereby also ncreasing the complexity of the design. We aim to bridge this gap by proposing a homomorphic encryption (HE) scheme based on the Module Learning with Errors problem (MLWE), ModHE that allows us to break the big computations into smaller ones. Given the popularity of module lattice-based post-quantum schemes, it is an evidently interesting research endeavor to also formulate module lattice-based homomorphic encryption schemes. While our proposed scheme is general, as a case study, we port the well-known RLWE-based CKKS scheme to the MLWE setting. The module version of the scheme completely stops the polynomial-size blowups when aiming for a greater circuit depth. Additionally, it presents greater opportunities for designing flexible, reusable, and parallelizable hardware architecture. A hardware implementation is provided to support our claims. We also acknowledge that as we try to decrease the complexity of computations, the amount of computations (such as relinearizations) increases. We hope that the potential and limitations of using such a hardware-friendly scheme will spark further research.