Conditional Variational AutoEncoder based on Stochastic Attacks
In recent years, the cryptanalysis community has leveraged research on Deep Learning to enhance attacks. In particular, several studies have highlighted the benefits of Deep Learning based Side-Channel Attacks (DLSCA) for targeting real-world cryptographic implementations. While this new research area in applied cryptography provides impressive results in recovering a secret key even when countermeasures are implemented (e.g. desynchronization, masking schemes), the lack of theoretical results makes the construction of appropriate and powerful models a notoriously hard problem. This can be problematic during an evaluation process where a security bound is required. In this work, we propose the first solution that bridges DL and SCA in order to obtain this security bound. Based on theoretical results, we develop the first Machine Learning generative model, called Conditional Variational AutoEncoder based on Stochastic Attacks (cVAE-SA), designed from the well-known Stochastic Attacks introduced by Schindler et al. in 2005. This model reduces the black-box property of DL and eases the architecture design for every real-world crypto-system, as we define theoretical complexity bounds which depend only on the dimension of the (reduced) trace and the targeted variable over F_2^n. We validate our theoretical proposition through simulations and public datasets on a wide range of use cases, including multi-task learning, the curse of dimensionality and masking schemes.
Efficiency through Diversity in Ensemble Models applied to Side-Channel Attacks: – A Case Study on Public-Key Algorithms –
Deep Learning based Side-Channel Attacks (DL-SCA) are considered fundamental threats against secure cryptographic implementations. Side-channel attacks aim to recover a secret key using the smallest possible number of leakage traces. In DL-SCA, this often translates into building a model with the highest possible accuracy. Increasing an attack's accuracy is particularly important when an attacker targets public-key cryptographic implementations, where the recovery of each secret key bit is directly related to the model's accuracy. Commonly used in the deep learning field, ensemble models are a well-suited method that combines the predictions of multiple models to increase the ensemble accuracy by reducing the correlation between their errors. Linked to this correlation, diversity is considered an indicator of the ensemble model's performance. In this paper, we propose a new loss, namely the Ensembling Loss (EL), that generates an ensemble model with increased diversity between its members. Based on the mutual information between the ensemble model and its related label, we theoretically demonstrate how the ensemble members interact during the training process. We also study, through multiple scenarios on public-key implementations, how a gain in attack accuracy translates into a drastic reduction of the remaining time complexity of a side-channel attack. Finally, we experimentally evaluate the benefits of our new learning metric on secure RSA and ECC implementations. The Ensembling Loss increases the performance of the ensemble model by up to 6.8%, while the remaining brute-force effort is reduced by up to 2^22 operations depending on the attack scenario.
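As a rough illustration of the ensembling idea mentioned in this abstract, the sketch below averages the class probabilities of several models before deciding on one secret key bit. It is not the Ensembling Loss from the paper; the function names `ensemble_average` and `predict_bit` and the probability values are hypothetical and chosen only for the example.

```python
def ensemble_average(member_probs):
    """Average the class-probability vectors produced by several models."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n_members
            for c in range(n_classes)]


def predict_bit(probs):
    """Return the index of the most probable class (here: the key bit value)."""
    return max(range(len(probs)), key=lambda c: probs[c])


# Hypothetical outputs of three ensemble members for one key bit
# (class 0 = bit 0, class 1 = bit 1). The first member is wrong on its own.
members = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]]

avg = ensemble_average(members)
bit = predict_bit(avg)
print(avg, bit)
```

In this toy case the error of the first member is outvoted by the other two, which is the effect that lower error correlation between members is meant to encourage.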
Ranking Loss: Maximizing the Success Rate in Deep Learning Side-Channel Analysis
The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Compared to template attacks, deep learning techniques can deal with protected implementations, such as those using masking or desynchronization, without substantial preprocessing. However, important issues remain open. One challenging problem is to adapt the methods classically used in the machine learning field (e.g. loss functions, performance metrics) to the specific side-channel context in order to obtain optimal results. We propose a new loss function, derived from the learning-to-rank approach, that helps prevent the approximation and estimation errors induced by the classical cross-entropy loss. We theoretically demonstrate that this new function, called Ranking Loss (RkL), maximizes the success rate by minimizing the ranking error of the secret key in comparison with all other hypotheses. The resulting model converges towards the optimal distinguisher when considering the mutual information between the secret and the leakage. Consequently, the approximation error is prevented. Furthermore, the estimation error induced by the cross-entropy is reduced by up to 23%. When the ranking loss is used, the convergence towards the best solution is up to 23% faster than with a model using the cross-entropy loss function. We validate our theoretical propositions on public datasets.
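The idea of penalizing the ranking error of the secret key against all other hypotheses can be sketched with a generic pairwise logistic loss from the learning-to-rank literature. This is an assumption made for illustration: the exact definition of RkL in the paper may differ, and the names `pairwise_ranking_loss`, `key_rank`, the parameter `alpha` and the score values are hypothetical.

```python
import math


def pairwise_ranking_loss(scores, true_key, alpha=1.0):
    """Sum of logistic penalties, one per wrong hypothesis.

    Each term is small when the true key's score is well above the
    score of the wrong hypothesis, and large otherwise.
    """
    s_true = scores[true_key]
    return sum(math.log2(1.0 + math.exp(-alpha * (s_true - s)))
               for k, s in enumerate(scores) if k != true_key)


def key_rank(scores, true_key):
    """Number of hypotheses scored strictly higher than the true key."""
    return sum(1 for s in scores if s > scores[true_key])


# Hypothetical scores for four key hypotheses.
scores = [0.1, 2.0, 0.5, -1.0]

loss_if_best = pairwise_ranking_loss(scores, true_key=1)   # true key ranked first
loss_if_worst = pairwise_ranking_loss(scores, true_key=3)  # true key ranked last
print(key_rank(scores, 1), loss_if_best)
print(key_rank(scores, 3), loss_if_worst)
```

The loss is lower when the true key is ranked first than when it is ranked last, so minimizing it pushes the rank of the secret key towards zero.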
Methodology for Efficient CNN Architectures in Profiling Attacks
The side-channel community recently investigated a new approach, based on deep learning, to significantly improve profiled attacks against embedded systems. Previous works have shown the benefit of using convolutional neural networks (CNNs) to limit the effect of some countermeasures such as desynchronization. Compared with template attacks, deep learning techniques can deal with trace misalignment and the high dimensionality of the data, so pre-processing is no longer mandatory. However, the performance of attacks depends to a great extent on the choice of each hyperparameter used to configure a CNN architecture. Hence, we cannot fully harness the potential of deep neural networks without a clear understanding of the network's inner workings. To reduce this gap, we propose to clearly explain the role of each hyperparameter during the feature selection phase using specific visualization techniques, including Weight Visualization, Gradient Visualization and Heatmaps. By highlighting which features are retained by filters, heatmaps come in handy when a security evaluator tries to interpret and understand the efficiency of a CNN. We propose a methodology for building CNN architectures that are efficient in terms of both attack performance and network complexity, even in the presence of desynchronization. We evaluate our methodology using public datasets with and without desynchronization. In each case, our methodology outperforms the previous state-of-the-art CNN models while significantly reducing network complexity. Our networks are up to 25 times more efficient than the previous state of the art, while their complexity is up to 31810 times smaller. Our results show that CNNs do not need to be very complex to perform well in the side-channel context.