International Association for Cryptologic Research

International Association
for Cryptologic Research


Barbara Gigerl


Smooth Passage with the Guards: Second-Order Hardware Masking of the AES with Low Randomness and Low Latency
Cryptographic devices in hostile environments can be vulnerable to physical attacks such as power analysis. Masking is a popular countermeasure against such attacks, which works by splitting every sensitive variable into d+1 randomized shares. The implementation cost of the masking countermeasure in hardware increases significantly with the masking order d, and protecting designs often results in a large overhead. One of the main drivers of the cost is the required amount of fresh randomness for masking the non-linear parts of a cipher. In the case of AES, first-order designs have been built without the need for any fresh randomness, but state-of-the-art higher-order designs still require a significant number of random bits per encryption. Attempts to reduce the randomness however often result in a considerable latency overhead, which is not favorable in practice. This raises the need for AES designs offering a decent performance tradeoff, which are efficient both in terms of required randomness and latency.In this work, we present a second-order AES design with the minimal number of three shares, requiring only 3 200 random bits per encryption at a latency of 5 cycles per round. Our design represents a significant improvement compared to state-of-the-art designs that require more randomness and/or have a higher latency. The core of the design is an optimized 5-cycle AES S-box which needs 78 bits of fresh randomness. We use this S-box to construct a round-based AES design, for which we present a concept for sharing randomness across the S-boxes based on the changing of the guards (COTG) technique. We assess the security of our design in the probing model using a formal verification tool. Furthermore, we evaluate the practical side-channel resistance on an FPGA.
Riding the Waves Towards Generic Single-Cycle Masking in Hardware
Research on the design of masked cryptographic hardware circuits in the past has mostly focused on reducing area and randomness requirements. However, many embedded devices like smart cards and IoT nodes also need to meet certain performance criteria, which is why the latency of masked hardware circuits also represents an important metric for many practical applications.The root cause of latency in masked hardware circuits is the need for additional register stages that synchronize the propagation of shares. Otherwise, glitches would violate the basic assumptions of the used masking scheme. This issue can be addressed to some extent, e.g., by using lightweight cryptographic algorithms with low-degree Sboxes, however, many applications still require the usage of schemes with higher-degree S-boxes like AES. Several recent works have already proposed solutions that help reduce this latency yet they either come with noticeably increased area/randomness requirements, limitations on masking orders, or specific assumptions on the general architecture of the crypto core.In this work, we introduce a generic and efficient method for designing single-cycle glitch-resistant (higher-order) masked hardware of cryptographic S-boxes. We refer to this technique as (generic) Self-Synchronized Masking (“SESYM”). The main idea of our approach is to replace register stages with a partial dual-rail encoding of masked signals that ensures synchronization within the circuit. More concretely, we show that WDDL gates and Muller C-elements can be used in combination with standard masking schemes to design single-cycle S-box circuits that, especially in case of higher-degree S-boxes, have noticeably lower requirements in terms of area and online randomness. We apply our method to DOM-based S-boxes of Ascon and AES and compare the resulting circuits to existing latency optimized circuits based on TI, GLM, and LMDPL. The latency of all three designs is reduced to single-cycle operation and are dth-order secure. Compared to GLM-masked Ascon, our approach comes with a 6.4 times reduction in online randomness for all protection orders. Compared to 1st-order LMDPL-masked AES, our approach achieves comparable results, while it is more generic, amongst others, by also supporting higher-order designs. We also underline the practical protection of our constructions against power analysis attacks via empirical and formal verification approaches.
Secure and Efficient Software Masking on Superscalar Pipelined Processors 📺
Barbara Gigerl Robert Primas Stefan Mangard
Physical side-channel attacks like power analysis pose a serious threat to cryptographic devices in real-world applications. Consequently, devices implement algorithmic countermeasures like masking. In the past, works on the design and verification of masked software implementations have mostly focused on simple microprocessors that findusage on smart cards. However, many other applications such as in the automotive industry require side-channel protected cryptographic computations on much more powerful CPUs. In such situations, the security loss due to complex architectural side-effects, the corresponding performance degradation, as well as discussions of suitable probing models and verification techniques are still vastly unexplored research questions. We answer these questions and perform a comprehensive analysis of more complex processor architectures in the context of masking-related side effects. First, we analyze the RISC-V SweRV core — featuring a 9-stage pipeline, two execution units, and load/store buffers — and point out a significant gap between security in a simple software probing model and practical security on such CPUs. More concretely, we show that architectural side effects of complex CPU architectures can significantly reduce the protection order of masked software, both via formal analysis in the hardware probing model, as well as empirically via gate-level timing simulations. We then discuss the options of fixing these problems in hardware or leaving them as constraints to software. Based on these software constraints, we formulate general rules for the design of masked software on more complex CPUs. Finally, we compare several implementation strategies for masking schemes and present in a case study that designing secure masked software for complex CPUs is still possible with overhead as low as 13%.