### ALE: AES-Based Lightweight Authenticated Encryption

Andrey Bogdanov<sup>1</sup>, Florian Mendel<sup>2</sup>, Francesco Regazzoni<sup>3,4</sup>, Vincent Rijmen<sup>5</sup>, Elmar Tischhauser<sup>5</sup>

<sup>1</sup>Technical University of Denmark <sup>2</sup>IAIK, Graz University of Technology, Austria <sup>3</sup>ALaRI - USI, Switzerland <sup>4</sup>Delft University of Technology, Netherlands <sup>5</sup>Dept. ESAT/COSIC, KU Leuven and iMinds, Belgium

# Authenticated Encryption (AE)

- Is cryptography about encryption?
  - Yes, but not only!
  - Encryption alone is not enough in numerous applications
  - One might even argue that authentication is really what is needed in most cases
- Authenticated encryption

AE: (P,K) -> (C,T) with T authentication tag

Authenticated encryption with associated data

AEAD: (A,P,K) -> (A,C,T) with A associated data transmitted in plaintext
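The AEAD mapping above can be sketched as a function signature. This is a toy illustration only, not ALE and not any standard scheme: the keystream and tag are built from HMAC-SHA-256 in the Python standard library, and the names `aead_encrypt`, `_subkey` and `_keystream` are hypothetical. The nonce N of the following slide is passed explicitly.

```python
import hashlib
import hmac

TAG_BYTES = 16

def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive independent subkeys for encryption and authentication (toy KDF).
    return hmac.new(key, label, hashlib.sha256).digest()

def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA-256 over nonce || counter.
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(8, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return bytes(out[:length])

def aead_encrypt(key: bytes, nonce: bytes, ad: bytes, plaintext: bytes):
    """(A, P, K) -> (A, C, T): A is sent in the clear but is covered by T."""
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    # Length-prefix A so the boundary between A and C is unambiguous.
    mac_input = nonce + len(ad).to_bytes(8, "big") + ad + ciphertext
    tag = hmac.new(_subkey(key, b"mac"), mac_input, hashlib.sha256).digest()
    return ad, ciphertext, tag[:TAG_BYTES]
```

Changing only A leaves C unchanged but produces a different T, which is the point of the "associated data" input.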

### The assumption of nonce

- Nonce N = number used once, freshness
- Nice, but sometimes difficult to enforce



David McGrew, DIAC'12 slides

Good news: Nonce can be "just" a counter!
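A counter-based nonce might look like the following minimal sketch. The class name `CounterNonce` and the default 16-byte width are assumptions for illustration; the slides do not fix a nonce format here. The counter must also survive restarts for as long as the key is in use, which this sketch does not handle.

```python
class CounterNonce:
    """Hands out each counter value at most once (hypothetical helper)."""

    def __init__(self, size_bytes: int = 16):
        self.size_bytes = size_bytes
        self.counter = 0

    def next(self) -> bytes:
        # Refuse to wrap around: a repeated nonce breaks the "used once" rule.
        if self.counter >= 1 << (8 * self.size_bytes):
            raise OverflowError("nonce space exhausted; change the key")
        nonce = self.counter.to_bytes(self.size_bytes, "big")
        self.counter += 1
        return nonce
```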

#### Nonce-based: AES-OCB

[RBBK01] [BR02] [R02] [R04] [KR11]







- Init(N): initialization function
- Inc: increment function
- Checksum = M1 xor M2 xor... Mn
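The checksum line can be written out directly. A minimal sketch, assuming all message blocks are full 16-byte blocks (how a partial final block is handled is not shown on the slide); the function names are illustrative.

```python
from functools import reduce

BLOCK_BYTES = 16

def xor_blocks(a: bytes, b: bytes) -> bytes:
    # Bytewise XOR of two equal-length blocks.
    return bytes(x ^ y for x, y in zip(a, b))

def checksum(blocks) -> bytes:
    """Checksum = M1 xor M2 xor ... xor Mn, starting from the all-zero block."""
    return reduce(xor_blocks, blocks, bytes(BLOCK_BYTES))
```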

#### Nonce-based: AES-OCB

[RBBK01] [BR02] [R02] [R04] [KR11]

+

- 1 AES-128 call per block
- perfectly parallelizable
- only forgery with nonce reuse
- associated data

−

- enc/dec different
- state 4x128 bits
- (patents pending)

#### ASC-1



+

- only 4 AES-128 rounds per block
- enc/dec similar

−

- state 4x128 bits
- serial
- state recovery with nonce reuse
- slow in compact ASIC implementation
- no associated data

 $X_0 = E_K(0^{70}||00||Cntr)$ 

 $K_{1,0} = E_K(0^{70}||01||Cntr), K_{2,0} = E_K(0^{70}||10||Cntr), K_{3,0} = E_K(l(M)||0^6||11||Cntr)$ 
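The four 128-bit inputs to $E_K$ above can be assembled as follows. This sketch only builds the input blocks; it does not perform the AES encryption. The field widths are inferred from the formulas so that each block totals 128 bits (a 56-bit Cntr and a 64-bit l(M)), and big-endian bit order is an assumption. The function name `asc1_inputs` is hypothetical.

```python
CNTR_BITS = 56  # inferred: 128 - 70 - 2
LEN_BITS = 64   # inferred: 128 - 6 - 2 - 56

def asc1_inputs(cntr: int, msg_len: int) -> dict:
    """Build the inputs for X_0, K_{1,0}, K_{2,0}, K_{3,0} as 16-byte blocks."""
    if not 0 <= cntr < (1 << CNTR_BITS):
        raise ValueError("counter does not fit in 56 bits")
    if not 0 <= msg_len < (1 << LEN_BITS):
        raise ValueError("length does not fit in 64 bits")

    def block(high70: int, selector: int) -> bytes:
        # Layout: 70 high bits || 2-bit selector || 56-bit counter.
        value = (high70 << (2 + CNTR_BITS)) | (selector << CNTR_BITS) | cntr
        return value.to_bytes(16, "big")

    return {
        "X0": block(0, 0b00),
        "K10": block(0, 0b01),
        "K20": block(0, 0b10),
        "K30": block(msg_len << 6, 0b11),  # l(M) || 0^6 fills the top 70 bits
    }
```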

#### Our Goal

- Design of a dedicated AE scheme which would:
  - require fewer operations on average
  - be compact in hardware (for both encryption and decryption)
  - have low power and low energy figures
  - be good in software
    - PC (AES-NI)
    - Embedded (usually not parallelizable)
  - rely on some previous cryptanalysis



- Initialization: nonce, AES with master k; 0, AES with master k, AES with ks
- Processing associated data: xor with state, 4R AES
- Processing message: xor with message, 4R AES, LEX leak

### LEX leak for ALE encryption

| $b_{0,0}$ | $b_{0,1}$ | $b_{0,2}$ | $b_{0,3}$ |
|-----------|-----------|-----------|-----------|
| $b_{1,0}$ | $b_{1,1}$ | $b_{1,2}$ | $b_{1,3}$ |
| $b_{2,0}$ | $b_{2,1}$ | $b_{2,2}$ | $b_{2,3}$ |
| $b_{3,0}$ | $b_{3,1}$ | $b_{3,2}$ | $b_{3,3}$ |

odd rounds

| $b_{0,0}$ | $b_{0,1}$ | $b_{0,2}$ | $b_{0,3}$ |
|-----------|-----------|-----------|-----------|
| $b_{1,0}$ | $b_{1,1}$ | $b_{1,2}$ | $b_{1,3}$ |
| $b_{2,0}$ | $b_{2,1}$ | $b_{2,2}$ | $b_{2,3}$ |
| $b_{3,0}$ | $b_{3,1}$ | $b_{3,2}$ | $b_{3,3}$ |

even rounds
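The tables above do not mark which bytes are leaked in each round. A minimal sketch of the extraction step, assuming the positions of the published LEX description (an assumption, not something these slides confirm): bytes $b_{0,0}, b_{2,0}, b_{0,2}, b_{2,2}$ in odd rounds and $b_{0,1}, b_{2,1}, b_{0,3}, b_{2,3}$ in even rounds. The byte order within the leak and the name `lex_leak` are also illustrative.

```python
# Assumed leak positions as (row, column) pairs; see the lead-in.
LEAK_POSITIONS = {
    "odd": [(0, 0), (2, 0), (0, 2), (2, 2)],
    "even": [(0, 1), (2, 1), (0, 3), (2, 3)],
}

def lex_leak(state, round_number: int) -> bytes:
    """Return the 4 leaked bytes of a 4x4 byte state for the given round."""
    parity = "odd" if round_number % 2 == 1 else "even"
    return bytes(state[row][col] for row, col in LEAK_POSITIONS[parity])
```

With four bytes per round, four rounds would yield 16 leaked bytes, the size of one 128-bit block.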



- Initialization: nonce, AES with master k; 0, AES with master k, AES with ks
- Processing associated data: xor with state, 4R AES
- Processing message: xor with message, 4R AES, LEX leak
- Finalization: encrypt with AES





 $a_i$  = associated data

 $m_i$  = message

 $C_i$  = ciphertext

+

- only 4 AES-128 rounds per block
- enc/dec similar
- state 2x128 bits
- faster in compact ASIC implementation
- associated data

AES = AES-128

 $\mathcal{K}$  = 128-bit key

 $\tau$  = tag

−

- serial
- state recovery with nonce reuse

### Assumptions for ALE

 Assumption 1. Nonce-respecting adversary: A nonce is only used once with the same master key for encryption

 Assumption 2. Abort on verification failure: No additional information returned if tampering is detected (in particular, no plaintext blocks)
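Assumption 2 corresponds to a decryption routine that checks the tag before anything else leaves it. A minimal sketch of that gate, assuming the tag has already been recomputed by the receiver; the function name `release_if_valid` is hypothetical, while `hmac.compare_digest` is the standard-library constant-time comparison.

```python
import hmac

def release_if_valid(expected_tag: bytes, received_tag: bytes, plaintext: bytes):
    """Return the plaintext only if the tags match; otherwise return None."""
    if hmac.compare_digest(expected_tag, received_tag):
        return plaintext
    # On failure, no plaintext blocks and no further detail are returned.
    return None
```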

#### Claims for ALE

- Claim 1. State recovery: state recovery with complexity of t data blocks succeeds with probability at most $t \cdot 2^{-128}$
- Claim 2. Key recovery: key recovery with complexity of t data blocks succeeds with probability at most $t \cdot 2^{-128}$, even if the state has been recovered
- Claim 3. Forgery w/o state recovery: a forgery not involving key/state recovery succeeds with probability at most $2^{-128}$
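To get a feel for the bound in Claims 1 and 2, it helps to work in base-2 logarithms. A small sketch (the helper name is illustrative):

```python
def log2_success_bound(log2_t: float) -> float:
    """log2 of the bound t * 2^-128 for t = 2^log2_t, capped at probability 1."""
    return min(0.0, log2_t - 128)
```

For example, an adversary processing $t = 2^{40}$ blocks (16 TiB of data at 16 bytes per block) is bounded by a success probability of $2^{40} \cdot 2^{-128} = 2^{-88}$.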

# Lightweight ASIC implementation for ALE

- ALE is implemented on top of the smallest available AES architecture [Moradi et al., Eurocrypt 2011]
- The reference algorithms were implemented on the same base AES core
- STMicroelectronics 65 nm CMOS LP-HVT, Synopsys 2009.06, 20 MHz

# Lightweight ASIC implementation for ALE

| Design       | Area (GE) | Net per 128-bit block (clock cycles) | Overhead per message (clock cycles) | Power (µW) |
|--------------|-----------|--------------------------------------|-------------------------------------|------------|
| AES-ECB      | 2,435     | 226                                  | -                                   | 87.84      |
| AES-OCB2     | 4,612     | 226                                  | 452                                 | 171.23     |
| AES-OCB2 e/d | 5,916     | 226                                  | 452                                 | 211.01     |
| ASC-1 A      | 4,793     | 370                                  | 904                                 | 169.11     |
| ASC-1 A e/d  | 4,964     | 370                                  | 904                                 | 193.71     |
| ASC-1 B      | 5,517     | 235                                  | 904                                 | 199.02     |
| ASC-1 B e/d  | 5,632     | 235                                  | 904                                 | 207.13     |
| AES-CCM      | 3,472     | 452                                  | -                                   | 128.31     |
| AES-CCM e/d  | 3,765     | 452                                  | -                                   | 162.15     |
| ALE          | 2,579     | 105                                  | 678                                 | 94.87      |
| ALE e/d      | 2,700     | 105                                  | 678                                 | 102.32     |
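The two cycle columns can be combined into a per-message cost. The sketch below assumes the natural reading of the table, total = overhead + blocks × net, which the slides do not state explicitly; designs without an overhead figure are left out.

```python
# (net cycles per 128-bit block, overhead cycles per message), from the table
DESIGNS = {
    "AES-OCB2": (226, 452),
    "ASC-1 A": (370, 904),
    "ASC-1 B": (235, 904),
    "ALE": (105, 678),
}

def total_cycles(design: str, blocks: int) -> int:
    """Estimated clock cycles to process one message of `blocks` 128-bit blocks."""
    net, overhead = DESIGNS[design]
    return overhead + net * blocks
```

Under this reading, a one-block message costs 783 cycles with ALE against 678 with AES-OCB2, and ALE is ahead from two blocks onward (888 against 904).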


### Software implementation of ALE

- Target platforms:
  - Sandy Bridge 3.1 GHz (using AES-NI)
  - Embedded (estimated)
- Parallel or multiple messages at a time
- Standard Sandy Bridge desktop @ 3.1 GHz

Measurements repeated 100,000 times and averaged

# Software implementation of ALE (Sandy Bridge)

cycles per byte (AES-NI)

| Algorithm \ message length (bytes) | 128  | 256  | 512  | 1024 | 2048 | 4096 | 8192 |
|------------------------------------|------|------|------|------|------|------|------|
| ECB                                | 1.53 | 1.16 | 0.93 | 0.81 | 0.75 | 0.72 | 0.71 |
| CTR                                | 1.61 | 1.22 | 0.99 | 0.87 | 0.80 | 0.77 | 0.76 |
| CCM\*                              | 3.97 | 3.49 | 3.31 | 3.22 | 3.18 | 3.15 | 3.15 |
| GCM                                | 4.95 | 3.88 | 3.33 | 3.05 | 2.93 | 2.90 | 2.89 |
| OCB3                               | 2.69 | 1.79 | 1.34 | 1.12 | 1.00 | 0.88 | 0.86 |
| **ASC-1**                          | 7.74 | 4.80 | 3.69 | 2.88 | 2.78 | 2.64 | 2.61 |
| **ALE**\*                          | 3.55 | 2.34 | 1.74 | 1.44 | 1.31 | 1.23 | 1.19 |
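Cycles per byte convert to throughput once a clock rate is fixed. A small sketch using the 3.1 GHz clock stated for the test machine and the 8192-byte column of the table; the helper name is illustrative, and the result ignores anything outside the measured routine.

```python
CLOCK_HZ = 3.1e9  # Sandy Bridge desktop clock from the slides

# Cycles per byte for 8192-byte messages, copied from the table
CPB_8192 = {
    "ECB": 0.71, "CTR": 0.76, "CCM": 3.15, "GCM": 2.89,
    "OCB3": 0.86, "ASC-1": 2.61, "ALE": 1.19,
}

def throughput_gb_per_s(cycles_per_byte: float, clock_hz: float = CLOCK_HZ) -> float:
    """Bytes processed per second, in units of 10^9 bytes."""
    return clock_hz / cycles_per_byte / 1e9
```

For ALE at 1.19 cycles per byte this gives roughly 2.6 GB/s, and the ratio of GCM to ALE at this message length is about 2.4.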



# Software implementation of ALE (embedded)

- Serial constructions usually do not cause large overhead
- Estimated 2 to 2.5 times faster than AES-OCB

#### Conclusions

- Dedicated nonce-based AES-based AEAD design
- Reuses some cryptanalysis of Pelican-MAC and LEX
- Small hardware footprint
- Fast software (measured with AES-NI, estimated embedded)

### Thank you!