International Association for Cryptologic Research

International Association
for Cryptologic Research


Lilian Bossuet


Conditional Variational AutoEncoder based on Stochastic Attacks
Over the recent years, the cryptanalysis community leveraged the potential of research on Deep Learning to enhance attacks. In particular, several studies have recently highlighted the benefits of Deep Learning based Side-Channel Attacks (DLSCA) to target real-world cryptographic implementations. While this new research area on applied cryptography provides impressive result to recover a secret key even when countermeasures are implemented (e.g. desynchronization, masking schemes), the lack of theoretical results make the construction of appropriate and powerful models a notoriously hard problem. This can be problematic during an evaluation process where a security bound is required. In this work, we propose the first solution that bridges DL and SCA in order to get this security bound. Based on theoretical results, we develop the first Machine Learning generative model, called Conditional Variational AutoEncoder based on Stochastic Attacks (cVAE-SA), designed from the well-known Stochastic Attacks, that have been introduced by Schindler et al. in 2005. This model reduces the black-box property of DL and eases the architecture design for every real-world crypto-system as we define theoretical complexity bounds which only depend on the dimension of the (reduced) trace and the targeting variable over F2n . We validate our theoretical proposition through simulations and public datasets on a wide range of use cases, including multi-task learning, curse of dimensionality and masking scheme.
Message-recovery Laser Fault Injection Attack on the Classic McEliece Cryptosystem 📺
Code-based public-key cryptosystems are promising candidates for standardization as quantum-resistant public-key cryptographic algorithms. Their security is based on the hardness of the syndrome decoding problem. Computing the syndrome in a finite field, usually $\F_{2}$, guarantees the security of the constructions. We show in this article that the problem becomes considerably easier to solve if the syndrome is computed in $\mathbb{N}$ instead. By means of laser fault injection, we illustrate how to force the matrix-vector product in $\mathbb{N}$ by corrupting specific instructions, and validate it experimentally. To solve the syndrome decoding problem in $\mathbb{N}$, we propose a reduction to an integer linear programming problem. We leverage the computational efficiency of linear programming solvers to obtain real-time message recovery attacks against all the code-based proposals to the NIST Post-Quantum Cryptography standardization challenge. We perform our attacks on worst-case scenarios, i.e. random binary codes, and retrieve the initial message within minutes on a desktop computer. Our practical evaluation of the attack targets the reference implementation of the Niederreiter cryptosystem in the NIST finalist \textit{Classic McEliece} and is feasible for all proposed parameters sets of this submission. For example, for the 256-bit security parameters sets, we successfully recover the plaintext in a couple of seconds on a desktop computer Finally, we highlight the fact that the attack is still possible if only a fraction of the syndrome entries are faulty. This makes the attack feasible even though the fault injection does not have perfect repeatability and reduces the computational complexity of the attack, making it even more practical overall.
Efficiency through Diversity in Ensemble Models applied to Side-Channel Attacks: – A Case Study on Public-Key Algorithms – 📺
Deep Learning based Side-Channel Attacks (DL-SCA) are considered as fundamental threats against secure cryptographic implementations. Side-channel attacks aim to recover a secret key using the least number of leakage traces. In DL-SCA, this often translates in having a model with the highest possible accuracy. Increasing an attack’s accuracy is particularly important when an attacker targets public-key cryptographic implementations where the recovery of each secret key bits is directly related to the model’s accuracy. Commonly used in the deep learning field, ensemble models are a well suited method that combine the predictions of multiple models to increase the ensemble accuracy by reducing the correlation between their errors. Linked to this correlation, the diversity is considered as an indicator of the ensemble model performance. In this paper, we propose a new loss, namely Ensembling Loss (EL), that generates an ensemble model which increases the diversity between the members. Based on the mutual information between the ensemble model and its related label, we theoretically demonstrate how the ensemble members interact during the training process. We also study how an attack’s accuracy gain translates to a drastic reduction of the remaining time complexity of a side-channel attacks through multiple scenarios on public-key implementations. Finally, we experimentally evaluate the benefits of our new learning metric on RSA and ECC secure implementations. The Ensembling Loss increases by up to 6.8% the performance of the ensemble model while the remaining brute-force is reduced by up to 222 operations depending on the attack scenario.
Ranking Loss: Maximizing the Success Rate in Deep Learning Side-Channel Analysis 📺
The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Compared to template attacks, deep learning techniques can deal with protected implementations, such as masking or desynchronization, without substantial preprocessing. However, important issues are still open. One challenging problem is to adapt the methods classically used in the machine learning field (e.g. loss function, performance metrics) to the specific side-channel context in order to obtain optimal results. We propose a new loss function derived from the learning to rank approach that helps preventing approximation and estimation errors, induced by the classical cross-entropy loss. We theoretically demonstrate that this new function, called Ranking Loss (RkL), maximizes the success rate by minimizing the ranking error of the secret key in comparison with all other hypotheses. The resulting model converges towards the optimal distinguisher when considering the mutual information between the secret and the leakage. Consequently, the approximation error is prevented. Furthermore, the estimation error, induced by the cross-entropy, is reduced by up to 23%. When the ranking loss is used, the convergence towards the best solution is up to 23% faster than a model using the cross-entropy loss function. We validate our theoretical propositions on public datasets.
Methodology for Efficient CNN Architectures in Profiling Attacks 📺
The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Previous works have shown the benefit of using convolutional neural networks (CNN) to limit the effect of some countermeasures such as desynchronization. Compared with template attacks, deep learning techniques can deal with trace misalignment and the high dimensionality of the data. Pre-processing is no longer mandatory. However, the performance of attacks depends to a great extent on the choice of each hyperparameter used to configure a CNN architecture. Hence, we cannot perfectly harness the potential of deep neural networks without a clear understanding of the network’s inner-workings. To reduce this gap, we propose to clearly explain the role of each hyperparameters during the feature selection phase using some specific visualization techniques including Weight Visualization, Gradient Visualization and Heatmaps. By highlighting which features are retained by filters, heatmaps come in handy when a security evaluator tries to interpret and understand the efficiency of CNN. We propose a methodology for building efficient CNN architectures in terms of attack efficiency and network complexity, even in the presence of desynchronization. We evaluate our methodology using public datasets with and without desynchronization. In each case, our methodology outperforms the previous state-of-the-art CNN models while significantly reducing network complexity. Our networks are up to 25 times more efficient than previous state-of-the-art while their complexity is up to 31810 times smaller. Our results show that CNN networks do not need to be very complex to perform well in the side-channel context.