





# Key Recovery from Side-Channel Power Analysis Attacks on Non-SIMD HQC Decryption

Nathan Maillet<sup>1,3</sup>, Cyrius Nugier<sup>2</sup>, Vincent Migliore<sup>3</sup>, Jean-Christophe Deneuville<sup>2</sup>

<sup>1</sup> EDF R&D, France
<sup>2</sup> Fédération ENAC ISAE-SUPAERO ONERA, France
<sup>3</sup> LAAS-CNRS / INSA-Toulouse, France






#### Why would we want to remove the SIMD?

#### Usage

The SIMD-less implementation is a *portable*, *IoT-friendly* version, close to what would be used in production.

| Version of HQC        | Cycle count (mean) | Cycle count (std) |
|-----------------------|--------------------|-------------------|
| Optimized (SIMD, -O3) | 471                | 72                |
| SIMD, -O2             | 550                | 239               |
| No SIMD               | 312                | 12                |
| Reference             | 669                | 58                |

**Table:** Performance of the attacked function (expand\_and\_sum) across several implementations of HQC

## Quick reminder of HQC



#### Remarks

- HQC's code uses Reed-Muller (RM) and Reed-Solomon (RS) codes and works with vectors of Hamming weight $\approx \sqrt{n}$, $n$ being the length of the code.
- Any vector can be viewed as a polynomial of $\mathcal{R} = \mathbb{F}_2[X]/(X^n - 1)$.
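As a concrete reminder of how arithmetic in $\mathcal{R}$ works, here is a minimal C sketch of multiplication for a toy length $n$. Storing one coefficient per byte is a simplification of ours for readability (optimized implementations pack coefficients into machine words), and `ring_mul` and `hamming_weight` are hypothetical names, not functions of the HQC code base.

```c
#include <stddef.h>
#include <stdint.h>

/* r = a * b in F2[X]/(X^n - 1), one coefficient (0 or 1) per byte.
   Exponents wrap around modulo n because X^n = 1 in this ring, and
   adding coefficients in F2 is XOR. r must not alias a or b. */
void ring_mul(uint8_t *r, const uint8_t *a, const uint8_t *b, size_t n) {
    for (size_t k = 0; k < n; k++)
        r[k] = 0;
    for (size_t i = 0; i < n; i++) {
        if (!a[i])
            continue;                 /* X^i does not appear in a */
        for (size_t j = 0; j < n; j++)
            r[(i + j) % n] ^= b[j];   /* add X^i * b_j X^j */
    }
}

/* Number of non-zero coefficients of a. */
size_t hamming_weight(const uint8_t *a, size_t n) {
    size_t w = 0;
    for (size_t k = 0; k < n; k++)
        w += a[k] != 0;
    return w;
}
```

With $n = 5$, for example, $X^4 \cdot X = X^5 = 1$, and $(1 + X)^2 = 1 + X^2$ because the cross term $2X$ vanishes over $\mathbb{F}_2$.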

## Comparison of this work with known attacks on HQC

| Attack                      | Oracle calls | HQC implem.        | Type of SCA  |
|-----------------------------|--------------|--------------------|--------------|
| [this work]                 | 1            | Optimized, no SIMD | Power        |
| [TCHES:GMGL24]              | 1            | Reference          | SASCA        |
| [USENIX:SchGasGuo24]        | 1142         | Optimized          | Timing       |
| [TCHES:HSCGJ23; AC:GNNJ23]  | 9000         | Reference          | Cache timing |
| [TCHES:GHJLNS22; AC:GNNJ23] | 10000        | Optimized          | Timing       |
| [PQCRYPTO:SHRWS22]          | 52992        | Reference          | Power        |
| [TCHES:HSCGJ23]             | 53857        | Reference          | Cache timing |
| [TCHES:GHJLNS22]            | 866000       | Optimized          | Timing       |

Table: Oracle calls needed for each ISA-level attack on HQC-128

#### Two Side-Channel Attacks targeting the duplicated-RM decoding

#### An ISA-level attack needing only 1 oracle call

| ISA \ Optimization flag | -Og   | -O0   | -O1   | -O2   | -O3   | -Oz   |
|-------------------------|-------|-------|-------|-------|-------|-------|
| RISC-V                  | V0/V1 | V0/V1 | V0/V1 | V0/V1 | V0/V1 | -     |
| AArch32                 | V0/V1 | V0/V1 | V0/V1 | V0/V1 | V0/V1 | V0/V1 |
| x86-64                  | V0/V1 | V0/V1 | V1    | V1    | V1    | V1    |

**Table:** Applicability of the ISA-level attack for both of our variants

#### A replay attack on a Cortex-M4 microarchitecture

83 (resp. 56) traces are enough to recover the key with 99% (resp. 50%) probability, for any HQC version.

#### Countermeasures to those attacks

| Attack \ Countermeasure | Min. Dist. Decoding | Adding Noise | Codeword masking* |
|-------------------------|---------------------|--------------|-------------------|
| [this work]             | ✓                   | ✓            | ✓                 |
| [TCHES:GMGL24]          | ×                   | ×            | ✓                 |
| [USENIX:SchGasGuo24]    | ×                   | ✓            | ✓                 |
| [PQCRYPTO:SHRWS22]      | ×                   | ✓            | ×                 |
| [TCHES:HSCGJ23]         | ×                   | ×            | ×                 |
| [TCHES:GHJLNS22]        | ×                   | ×            | ×                 |

Table: Effectiveness of our countermeasures for each known attack

<sup>\*</sup>Under the assumption of a leakage-free codeword masking.

## Code used in HQC

#### HQC's Encoding



#### HQC's Decoding



## From leakage to the secret key

Assuming $\mathbf{v} - \mathbf{u} \cdot \mathbf{y}$ is recovered through SCA and $\mathbf{u}$ is invertible, we have:

$$\mathbf{y} = \mathbf{u}^{-1} \cdot \big(\mathbf{v} - (\mathbf{v} - \mathbf{u} \cdot \mathbf{y})\big),$$

and thus, since the public key is $\mathbf{s} = \mathbf{x} + \mathbf{h} \cdot \mathbf{y}$, we get $\mathbf{x} = \mathbf{s} - \mathbf{h} \cdot \mathbf{y}$.
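The derivation can be checked on toy parameters. The C sketch below is our own illustration, not the authors' code: it packs an $n$-bit vector ($n < 64$) into a `uint64_t`, multiplies in $\mathcal{R}$ by XOR-ing rotations, and recovers $\mathbf{y}$ from $\mathbf{c} = \mathbf{u} \cdot \mathbf{y} = \mathbf{v} - (\mathbf{v} - \mathbf{u} \cdot \mathbf{y})$ by Gaussian elimination over $\mathbb{F}_2$ instead of computing $\mathbf{u}^{-1}$ explicitly (over $\mathbb{F}_2$, subtraction is XOR). When $\mathbf{rot}(\mathbf{u})$ is singular it returns one of the $2^{n - \mathbf{rk}}$ solutions, which is the brute-force space of the security-reduction theorem that follows. `mul_mod`, `solve_uy` and the toy sizes are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_N 63   /* toy bound: an n-bit vector fits in one uint64_t */

static uint64_t mask_n(int n) { return (1ULL << n) - 1; }

/* a * X^i in F2[X]/(X^n - 1): cyclic left rotation of an n-bit vector. */
static uint64_t rot_n(uint64_t a, int i, int n) {
    a &= mask_n(n);
    if (i == 0)
        return a;
    return ((a << i) | (a >> (n - i))) & mask_n(n);
}

/* a * b in F2[X]/(X^n - 1): XOR of a * X^i over the set bits i of b. */
uint64_t mul_mod(uint64_t a, uint64_t b, int n) {
    uint64_t r = 0;
    for (int i = 0; i < n; i++)
        if ((b >> i) & 1)
            r ^= rot_n(a, i, n);
    return r;
}

/* Solve u * y = c by Gaussian elimination on the vectors u * X^i.
   Returns false if no solution exists; if rot(u) is singular, *y is
   one of the 2^(n - rank) solutions. */
bool solve_uy(uint64_t u, uint64_t c, int n, uint64_t *y) {
    uint64_t basis[MAX_N] = {0}, combo[MAX_N] = {0}; /* indexed by leading bit */
    for (int i = 0; i < n; i++) {
        uint64_t v = rot_n(u, i, n), k = 1ULL << i;  /* k records which X^i */
        for (int b = n - 1; b >= 0; b--) {
            if (!((v >> b) & 1))
                continue;
            if (!basis[b]) { basis[b] = v; combo[b] = k; break; }
            v ^= basis[b];
            k ^= combo[b];
        }
    }
    uint64_t acc = 0;
    c &= mask_n(n);
    for (int b = n - 1; b >= 0; b--) {
        if (!((c >> b) & 1))
            continue;
        if (!basis[b])
            return false;          /* c is not a multiple of u */
        c ^= basis[b];
        acc ^= combo[b];
    }
    *y = acc;
    return true;
}
```

In the attack, `c` would be `v ^ leak`, with `leak` the SCA estimate of $\mathbf{v} - \mathbf{u} \cdot \mathbf{y}$, and the key would follow as `x = s ^ mul_mod(h, y, n)`.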


## Theorem (Security reduction)

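For every $\mathbf{u}$, brute-forcing $n - \mathbf{rk}(\mathbf{rot}(\mathbf{u}))$ bits is enough to recover $\mathbf{y}$, where $\mathbf{rot}(\mathbf{u})$ is the $n \times n$ circulant matrix of multiplication by $\mathbf{u}$ in $\mathcal{R}$.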

#### Theorem (Specificity of $\mathbf{u}$)

Let $\mathbf{u} \in \mathcal{R}$ and $\mathbb{1} = 1 + X + \cdots + X^{n-1}$. The following holds:

$$\mathbf{rk}(\mathbf{rot}(\mathbf{u})) = \begin{cases} 0 & \textit{if } \mathbf{u} = 0 \\ 1 & \textit{if } \mathbf{u} = \mathbb{1} \\ n - 1 & \textit{if } \mathbf{u} \neq 0 \textit{ and } HW(\mathbf{u}) \textit{ is even} \\ n & \textit{otherwise, i.e., if } \mathbf{u} \neq \mathbb{1} \textit{ and } HW(\mathbf{u}) \textit{ is odd} \end{cases}$$
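The case analysis can be checked numerically on a toy length. The sketch below (our illustration; `rank_rot` is a hypothetical name) builds the rows $\mathbf{u} \cdot X^i$ of $\mathbf{rot}(\mathbf{u})$ and computes the rank over $\mathbb{F}_2$ by Gaussian elimination. We pick $n = 11$ because 2 is a primitive root modulo 11, so $(X^{11} - 1)/(X - 1)$ is irreducible over $\mathbb{F}_2$; to our understanding this is the property of HQC's $n$ that the four cases rely on, and for other lengths (e.g. $n = 7$) the rank can differ.

```c
#include <stdint.h>

/* Rank over F2 of rot(u): row i is u * X^i mod (X^n - 1).
   Vectors are packed into a uint64_t, so this toy version needs n <= 63. */
int rank_rot(uint64_t u, int n) {
    const uint64_t mask = (1ULL << n) - 1;
    uint64_t rows[63];
    u &= mask;
    for (int i = 0; i < n; i++)   /* row i = cyclic left rotation of u by i */
        rows[i] = i == 0 ? u : ((u << i) | (u >> (n - i))) & mask;

    int rank = 0;
    for (int col = 0; col < n && rank < n; col++) {
        int pivot = -1;
        for (int r = rank; r < n; r++)
            if ((rows[r] >> col) & 1) { pivot = r; break; }
        if (pivot < 0)
            continue;             /* no pivot in this column */
        uint64_t tmp = rows[rank];
        rows[rank] = rows[pivot];
        rows[pivot] = tmp;
        for (int r = 0; r < n; r++)   /* clear this column in all other rows */
            if (r != rank && ((rows[r] >> col) & 1))
                rows[r] ^= rows[rank];
        rank++;
    }
    return rank;
}
```

Combined with the security reduction, an even-weight $\mathbf{u}$ (rank $n - 1$) leaves a single bit to brute-force, and an odd-weight $\mathbf{u} \neq \mathbb{1}$ leaves none.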

### Building an oracle out of HQC's C implementation

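```c
#include <stddef.h>
#include <stdint.h>

typedef union {
    uint16_t u16[8];     // one 128-bit RM codeword as 8 x 16-bit words
} codeword;

typedef union {
    int16_t i16[128];    // one 16-bit counter per codeword bit
} expandedCodeword;

void expand_and_sum(expandedCodeword *dst, codeword src[]) {
    // first copy: write each bit of src[0] into its own 16-bit slot
    for (size_t part = 0; part < 8; part++) {
        for (size_t i = 0; i < 16; ++i) {
            dst->i16[(part << 4) + i] = src->u16[part] >> i & 1;
        }
    }
    // sum the rest of the copies
    // (loop over the remaining copies not shown on the slide)
}
```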


Figure: C code of the Reed-Muller expand\_and\_sum function's optimized version with no SIMD
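Why this loop yields an oracle can be illustrated in a few lines. Each store in `expand_and_sum` writes a 16-bit value that is exactly one bit of the input codeword, so under a Hamming-weight leakage model (an assumption of this sketch, not a claim about the authors' setup) the leakage of each store is 0 or 1 and directly reveals that bit. The sketch simulates noise-free leakages with hypothetical helpers `simulate_leakage` and `rebuild_word`, and rebuilds a 16-bit word from them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hamming weight of a 16-bit value: the assumed leakage of one store. */
static int hw16(uint16_t x) {
    int w = 0;
    for (; x; x >>= 1)
        w += x & 1;
    return w;
}

/* Simulate the 16 stores of one `part` of expand_and_sum: leak[i] is the
   noise-free Hamming-weight leakage of the value written into slot i. */
void simulate_leakage(uint16_t word, int leak[16]) {
    for (size_t i = 0; i < 16; ++i)
        leak[i] = hw16((uint16_t)(word >> i & 1));
}

/* Attacker side: each leakage is 0 or 1, i.e. the extracted bit itself. */
uint16_t rebuild_word(const int leak[16]) {
    uint16_t word = 0;
    for (size_t i = 0; i < 16; ++i)
        if (leak[i])
            word |= (uint16_t)(1u << i);
    return word;
}
```

On a real device each leakage is noisy, so every bit is instead observed as a sample from one of two distributions, as on the Cortex-M4 figures.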

## From the C to the ASM: validation of the leakage

```
# Load from memory
lw   t0, 0(a0)
# Right shift by i (register r holds i)
srl  t1, t0, r
# Bit extraction
andi t1, t1, 1
# Store the extracted bit into memory
sh   t1, 0(a1)
```

**Figure:** Instruction flow of the bit extraction in expand\_and\_sum


| Extraction at i = 14 | $t_1$                   |
|----------------------|-------------------------|
| srl-                 | 0 … 0 0 $b_{14}$        |
| srl+                 | 0 … 0 $b_{15}$ $b_{14}$ |


| Extraction at i < 14 | $t_1$              |
|----------------------|--------------------|
| srl-                 | 0 … 0 0 $b_{i-1}$  |
| srl+                 | 0 … 0 0 $b_i$      |


#### Ease of attacking a real-world microarchitecture



**Figure:** Power distributions for 0 and 1 at the point of interest for bit<sub>0</sub>
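Separating the two distributions at the point of interest is a binary classification. A minimal sketch, assuming (our assumption, not necessarily the authors' classifier) one Gaussian template per bit value with a shared standard deviation: the maximum-likelihood decision then reduces to the nearest mean, and with several independent traces the per-trace differences of squared distances are summed, which is proportional to the log-likelihood ratio. `poi_template`, `classify_bit` and the template values in the usage are hypothetical, not measured data.

```c
#include <stddef.h>

/* Gaussian templates at one point of interest, with a shared standard deviation. */
typedef struct {
    double mu0;   /* mean power when the bit is 0 */
    double mu1;   /* mean power when the bit is 1 */
} poi_template;

/* Maximum-likelihood guess of the bit from `count` independent samples.
   Sums (x - mu0)^2 - (x - mu1)^2, proportional to log(p1 / p0). */
int classify_bit(const poi_template *t, const double *samples, size_t count) {
    double score = 0.0;
    for (size_t k = 0; k < count; k++) {
        double d0 = samples[k] - t->mu0, d1 = samples[k] - t->mu1;
        score += d0 * d0 - d1 * d1;   /* positive: closer to mu1 */
    }
    return score > 0.0;
}
```

Accumulating evidence over several replayed traces is what lets a classifier separate overlapping distributions that a single trace cannot.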

#### Attack on Cortex-M4



**Figure:** Probability of successfully retrieving $(\mathbf{v} - \mathbf{u} \cdot \mathbf{y})$
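The attacker needs every bit of $\mathbf{v} - \mathbf{u} \cdot \mathbf{y}$ (up to the brute-forced ones), which explains why per-bit accuracy matters so much. As a back-of-the-envelope sketch, assuming independent bit errors with a common rate $\varepsilon$ (a simplification of ours), retrieving $N$ bits succeeds with probability $(1 - \varepsilon)^N$; `success_probability` is a hypothetical helper.

```c
#include <stddef.h>

/* Probability that all n_bits independent guesses are correct when each is
   wrong with probability eps: (1 - eps)^n_bits, by square-and-multiply. */
double success_probability(double eps, size_t n_bits) {
    double base = 1.0 - eps, result = 1.0;
    while (n_bits) {
        if (n_bits & 1)
            result *= base;
        base *= base;
        n_bits >>= 1;
    }
    return result;
}
```

Since $(1 - \varepsilon)^N \approx 1 - N\varepsilon$ for small $\varepsilon$, a 99% overall success requires a per-bit error rate of roughly $0.01/N$ or less under this model.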

#### Idea behind each countermeasure

#### Min. dist. decoder

*Idea*: Compare the input with all 256 codewords and keep the nearest one in Hamming distance.

As a result, an attacker can no longer peek at $\mathbf{v} - \mathbf{u} \cdot \mathbf{y}$, since the expand\_and\_sum function no longer exists.
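The idea can be sketched as follows for one duplicated RM(1,7) block (128-bit codewords, $2^8 = 256$ messages). This is our illustration, not the authors' countermeasure code: `rm_bit` uses a generic RM(1,7) bit ordering that may differ from HQC's layout, and the data-dependent comparison on `best_dist` would itself have to be made constant-time in a hardened implementation.

```c
#include <stdint.h>

#define RM_LEN 128   /* length of one RM(1,7) codeword */

/* Bit j of the RM(1,7) codeword of the 8-bit message m: the constant term m_0
   plus the linear terms m_k * (bit k-1 of j). Bit ordering is an assumption. */
int rm_bit(uint8_t m, int j) {
    int b = m & 1;
    for (int k = 1; k < 8; k++)
        b ^= (m >> k) & (j >> (k - 1)) & 1;
    return b;
}

/* Minimum-distance decoding of `copies` duplicated RM(1,7) codewords.
   recv holds copies * RM_LEN bits (one per byte), copy after copy. */
uint8_t min_dist_decode(const uint8_t *recv, int copies) {
    uint8_t best = 0;
    int best_dist = copies * RM_LEN + 1;
    for (int m = 0; m < 256; m++) {          /* score all 256 candidates */
        int dist = 0;
        for (int c = 0; c < copies; c++)
            for (int j = 0; j < RM_LEN; j++)
                dist += recv[c * RM_LEN + j] != rm_bit((uint8_t)m, j);
        if (dist < best_dist) {
            best_dist = dist;
            best = (uint8_t)m;
        }
    }
    return best;
}
```

The decoder works directly on the received bits, so no intermediate expanded codeword is ever written to memory.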

#### Adding noise

*Idea*: Add to $\mathbf{v}$, before decoding, a random error $\mathbf{e}''$ of weight $w_{\mathbf{e}''}$ such that $\binom{n}{w_{\mathbf{e}''}} \gg 2^{128}$.

As a result, peeking at the expand\_and\_sum function gives $(\mathbf{v} - \mathbf{u} \cdot \mathbf{y}) + \mathbf{e}'' = (\mathbf{m}\mathbf{G} + \mathbf{e}') + \mathbf{e}''$, which hides $\mathbf{v} - \mathbf{u} \cdot \mathbf{y}$ behind $\binom{n}{w_{\mathbf{e}''}}$ possible values of $\mathbf{e}''$.

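| Cat. | $\overline{HW(\mathbf{e}')}$ | $w_{\mathbf{e}''}$ | $\overline{HW(\mathbf{e}'+\mathbf{e}'')}$ | $\log_2(\mathrm{DFR})$ | $\log_2(\mathrm{DFR}')$ |
|------|------------------------------|--------------------|-------------------------------------------|------------------------|-------------------------|
| I    | 6003.93                      | 11                 | 6007.45                                   | -132.86                | -132.24                 |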
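Drawing $\mathbf{e}''$ is a fixed-weight sampling step. A minimal sketch, assuming a caller-supplied uniform sampler behind the hypothetical `rand_below_fn` interface: a partial Fisher-Yates shuffle picks $w$ distinct positions, which are flipped in $\mathbf{v}$. The toy LCG only makes the example deterministic and is not cryptographically secure, and this sampler is not constant-time, which a real countermeasure would require.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampler interface: returns an integer in [0, bound). */
typedef size_t (*rand_below_fn)(size_t bound, void *ctx);

/* Toy 64-bit LCG, NOT cryptographically secure (modulo bias ignored). */
size_t toy_rand_below(size_t bound, void *ctx) {
    uint64_t *state = ctx;
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (size_t)((*state >> 33) % bound);
}

/* XOR into v (n coefficients, one per byte) a random error e'' of weight w <= n.
   A partial Fisher-Yates shuffle of pos[0..n) yields w distinct positions.
   pos is caller-provided scratch space of n entries. */
void add_noise(uint8_t *v, size_t n, size_t w, size_t *pos,
               rand_below_fn rand_below, void *ctx) {
    for (size_t i = 0; i < n; i++)
        pos[i] = i;
    for (size_t i = 0; i < w; i++) {
        size_t j = i + rand_below(n - i, ctx);   /* j in [i, n) */
        size_t tmp = pos[i];
        pos[i] = pos[j];
        pos[j] = tmp;
        v[pos[i]] ^= 1;                          /* flip a fresh position */
    }
}
```

Because the positions are distinct, the added error has exactly weight $w_{\mathbf{e}''}$, which is what the condition on $\binom{n}{w_{\mathbf{e}''}}$ counts.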


#### Codeword masking

*Idea*: Encode a new (random) message  $\mathbf{m}'$  and compute  $\mathbf{v} + \mathbf{m}'\mathbf{G}$ before decoding.

As a result, reproducing the attacks "uselessly leaks" $\mathbf{v} - \mathbf{u} \cdot \mathbf{y} + \mathbf{m}' \mathbf{G}$.
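The masking only relies on the linearity of the code: $(\mathbf{v} - \mathbf{u} \cdot \mathbf{y}) + \mathbf{m}'\mathbf{G} = (\mathbf{m} + \mathbf{m}')\mathbf{G} + \mathbf{e}'$, which decodes to $\mathbf{m} + \mathbf{m}'$, after which $\mathbf{m}'$ is removed. The sketch below shows this with a toy 3-fold repetition code standing in for HQC's concatenated RS/duplicated-RM code (a deliberate simplification of ours); `encode`, `decode` and `masked_decode` are hypothetical names.

```c
#include <stdint.h>

/* Toy linear code: each of the 8 message bits is repeated 3 times (24 bits). */
uint32_t encode(uint8_t m) {
    uint32_t c = 0;
    for (int i = 0; i < 8; i++)
        if ((m >> i) & 1)
            c |= 7u << (3 * i);
    return c;
}

/* Majority-vote decoder: corrects one flipped bit per 3-bit group. */
uint8_t decode(uint32_t c) {
    uint8_t m = 0;
    for (int i = 0; i < 8; i++) {
        uint32_t g = (c >> (3 * i)) & 7u;
        int ones = (int)((g & 1) + ((g >> 1) & 1) + ((g >> 2) & 1));
        if (ones >= 2)
            m |= (uint8_t)(1u << i);
    }
    return m;
}

/* Codeword masking: decode (mG + e') + m'G instead of mG + e', then remove m'.
   m_prime must be fresh randomness for every decryption. */
uint8_t masked_decode(uint32_t noisy, uint8_t m_prime) {
    return decode(noisy ^ encode(m_prime)) ^ m_prime;
}
```

The decoder therefore only ever processes $(\mathbf{m} + \mathbf{m}')\mathbf{G} + \mathbf{e}'$, never $\mathbf{v} - \mathbf{u} \cdot \mathbf{y}$ itself.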


#### Summary of the countermeasures

| Proposed countermeasures to prevent our attacks |       |        |        |        |        |        |        |        |
|-------------------------------------------------|-------|--------|--------|--------|--------|--------|--------|--------|
| Min. dist. decoder                              | ×     | ✓      | ×      | ×      | ✓      | ×      | ✓      | ✓      |
| Adding noise                                    | ×     | ×      | ✓      | ×      | ✓      | ✓      | ×      | ✓      |
| Codeword masking                                | ×     | ×      | ×      | ✓      | ×      | ✓      | ✓      | ✓      |
| Cycles ($\times 10^3$)                          | 62875 | 66232  | 62963  | 63589  | 66320  | 63660  | 66929  | 67017  |
| Overhead                                        | 0%    | +5.34% | +0.14% | +1.14% | +5.48% | +1.25% | +6.45% | +6.59% |

**Table:** Overhead of each combination of countermeasures compared with the reference implementation


## Thank you for your attention

#### **Acknowledgments**

This work was partially supported by the Institut Cybersecurité Occitanie (ICO) and the joint-lab SEIDO with the active participation of Arthur Villard, EDF R&D, France.

## Bibliography

- [TCHES:GMGL24] Goy, G., Maillard, J., Gaborit, P., Loiseau, A. Single trace HQC shared key recovery with SASCA. IACR Transactions on Cryptographic Hardware and Embedded Systems 2024(2), 64–87 (2024).
- [USENIX:SchGasGuo24] Schröder, R.L., Gast, S., Guo, Q. Divide and surrender: Exploiting variable division instruction timing in HQC key recovery attacks. USENIX Security 2024: 33rd USENIX Security Symposium (2024).
- [TCHES:HSCGJ23] Huang, S., Sim, R.Q., Chuengsatiansup, C., Guo, Q., Johansson, T. Cache-timing attack against HQC. IACR Transactions on Cryptographic Hardware and Embedded Systems 2023(3), 136–163 (2023).
- [TCHES:GHJLNS22] Guo, Q., Hlauschek, C., Johansson, T., Lahr, N., Nilsson, A., Schröder, R.L. Don't reject this: Key-recovery timing attacks due to rejection-sampling in HQC and BIKE. IACR Transactions on Cryptographic Hardware and Embedded Systems 2022(3), 223–263 (2022).
- [PQCRYPTO:SHRWS22] Schamberger, T., Holzbaur, L., Renner, J., Wachter-Zeh, A., Sigl, G. A power side-channel attack on the Reed-Muller Reed-Solomon version of the HQC cryptosystem. Post-Quantum Cryptography: 13th International Workshop, PQCrypto 2022, pp. 327–352 (2022).
- [AC:GNNJ23] Guo, Q., Nabokov, D., Nilsson, A., Johansson, T. SCA-LDPC: A code-based framework for key-recovery side-channel attacks on post-quantum encryption schemes. Advances in Cryptology, ASIACRYPT 2023, Part IV, Lecture Notes in Computer Science, vol. 14441, pp. 203–236 (2023).
- [TIT:ABDGZ18] Aguilar-Melchor, C., Blazy, O., Deneuville, J.-C., Gaborit, P., Zémor, G. Efficient encryption from random quasi-cyclic codes. IEEE Transactions on Information Theory 64(5), 3927–3943 (2018).
- [DCC:AADGLZ24] Aguilar-Melchor, C., Aragon, N., Deneuville, J.-C., Gaborit, P., Lacan, J., Zémor, G. Efficient error-correcting codes for the HQC post-quantum cryptosystem. Designs, Codes and Cryptography 92(12), 4511–4530 (2024).
- [NIST:HQCround4] NIST. Post-Quantum Cryptography PQC | CSRC. https://csrc.nist.gov/projects/post-quantum-cryptography/selected-algorithms