

### An Efficient Countermeasure against Correlation Power-Analysis Attacks with Randomized Montgomery Operations for DF-ECC Processor



Jen-Wei Lee, Szu-Chi Chung, Hsie-Chia Chang, and Chen-Yi Lee

Department of Electronics Engineering and Institute of Electronics, National Chiao-Tung University (NCTU), Hsinchu, Taiwan Email: jenweilee@gmail.com

# **Power-Analysis Attacks**



Execution time depends on key value by direct implementation

 $\rightarrow$  secrete information leakage through simple power-analysis (SPA) attack

# **Power-Analysis Attacks**

SPA attack can be counteracted by unified operations

#### Correlation power-analysis (CPA) attack

- utilize statistical analysis to disclose private information of cryptographic devices
- work on EC integrated encryption, single pass EC Diffie-Hellman or Menezes-Qu-Vanstone key agreement P = 7P

#### > CPA attack on SPA-resistant ECC device

- key-dependent EC scalar multiplication (ECSM

Algorithm: Montgomery Ladder Input: an integer *K* and a point *P* 

Output: KP

1. 
$$P_1 \leftarrow P, P_2 \leftarrow 2P;$$

2. For i from m - 2 down to 0 do

If  $K_i = 1$  then  $P_1 \leftarrow ECPA(P_1, P_2), P_2 \leftarrow ECPD(P_2)$ else

 $P_2 \leftarrow ECPA(P_2, P_1), P_1 \leftarrow ECPD(P_1)$ 

End

3. Return  $P_1$ 

C device 
$$K_{m-3}=1$$
  $P_1=71$  ...  
(ECSM)  
 $K_{m-2}=1$   $P_1=3P$   $P_2=4P$   $P_1=6P$  ...  
 $P_1=P$   $P_2=2P$   $K_{m-3}=0$   $P_2=7P$  ...  
 $K_{m-2}=0$   $K_{m-3}=1$   $P_1=5P$  ...  
 $P_1=2P$   $P_2=3P$   $P_2=6P$  ...  
 $P_2=3P$   $K_{m-3}=0$   $P_1=4P$  ...  
 $P_2=5P$  ...

Time complexity is O(2m)

# **Previous Works**

### Circuit level

- wave dynamic differential logic [HWANG'06]
- random switching logic [SAEKI'09]

### Register addressing

- random register renaming [ITOH'03]

### Algorithm level

- randomized EC point [CORON'99]
- randomized scalar key [CORON'99]
- randomized projective coordinates [CORON'99]
- elliptic curve isomorphisms over GF(p) [JOYE'01]

#### Software implementation

- random delay generation [CORON'09]

# Motivation

Provide a solution that is suitable and efficient for ECC hardware implementation

- support dual-field operations for high security level
  - dual-field ECC (DF-ECC) function is approved in IEEE P1363
- compatible to current public-key cryptography
  - use initial EC parameters
- hardware speed
  - field inversion/division and multiplication dominate execution time
- hardware complexity
  - arithmetic unit integration

# **Our Solution**

Mask intermediate values by computing field arithmetic in a randomized domain

- Montgomery domain
  - $A \equiv a \cdot 2^m \pmod{p}$ , *a* is in integer domain and *m* is field length
- random domain (or random field automorphism)
  - $A \equiv a \cdot 2^{\lambda} (mod \ p)$ , domain value  $\lambda$  equals to hamming weight of an *m*-bit non-zero random value  $\alpha$

Table 1. Operations in Randomized Domain

| Operation                                  | Arithmetic                                                   |
|--------------------------------------------|--------------------------------------------------------------|
| randomized Montgomery multiplication (RMM) | $RMM(X,Y) \equiv x \cdot y \cdot 2^{\lambda} (mod \ p)$      |
| randomized Montgomery division (RMD)       | $RMD(X,Y) \equiv x \cdot y^{-1} \cdot 2^{\lambda} (mod \ p)$ |
| randomized addition (RA)                   | $RA(X,Y) \equiv (x+y) \cdot 2^{\lambda} (mod \ p)$           |
| randomized subtraction (RS)                | $RS(X,Y) \equiv (x-y) \cdot 2^{\lambda} (mod \ p)$           |

# **Our Solution**

#### Random field automorphism for ECSM calculation



### **Proposed Randomized Montgomery Algorithm**

#### > Radix-2 RMM

- if  $\alpha_i = 1$ • decrease domain value by 1 in step 4 -R = R/2- if  $\alpha_i = 0$ • remain domain value in step 5

-R=R

- after *m* iterations
  - domain value is −<sup>λ</sup>

Algorithm 2 Radix-2 randomized Montgomery multiplicationInput: X, Y, p, and  $\alpha$ Output: R = RMM(X, Y)1. Let V = X, R = 0, S = Y2. For i from 0 to m - 1 do3.  $R \equiv R + V_0 \cdot S \pmod{p}, V = V/2$ 4. If  $\alpha_i = 1$  then  $R \equiv R/2 \pmod{p}$ 5. else  $S \equiv 2S \pmod{p}$ 6. Return R

### **Proposed Randomized Montgomery Algorithm**

#### Radix-2 RMD

- if  $\alpha_i = 1$  increase domain value by 1 in steps 4, 7, 10, 13 - U = U/2-R = 2R- if  $\alpha_i = 0$  remain domain value in steps 5, 8, 11, 14

- after *m* iterations
  - domain value is *λ*

Algorithm 4 Radix-2 randomized Montgomery division **Input:** X, Y, p, and  $\alpha$ **Output:** R = RMD(X, Y)1. Let U = p, V = Y, R = 0, S = X2. While (V > 0) do If U is even then U = U/23. 4. If  $\alpha_i = 1$  then  $S \equiv 2S \pmod{p}$ else  $R \equiv R/2 \pmod{p}$ 5.else if V is even then V = V/26. 7. If  $\alpha_i \equiv 1$  then  $R \equiv 2R \pmod{p}$ else  $S \equiv S/2 \pmod{p}$ 8. 9. else if U > V then U = (U - V)/2If  $\alpha_i = 1$  then  $R \equiv R - S \pmod{p}$ ,  $S \equiv 2S \pmod{p}$ 10. else  $R \equiv (R - S)/2 \pmod{p}$ 11. 12. else V = (V - U)/2If  $\alpha_i \equiv 1$  then  $S \equiv S - R \pmod{p}$ ,  $R \equiv 2R \pmod{p}$ 13. else  $S \equiv (S - R)/2 \pmod{p}$ 14.15. If i < m then i = i + 116. Return R

# **Extend Radix-2 to Radix-4 Approach**

#### Based on extended Euclidean algorithm

 $\begin{aligned} X^{-1} \cdot Y \cdot R &\equiv U \cdot 2^{i} \pmod{p} & \text{initial values:} \quad (U, V, R, S) \Rightarrow (p, Y, 0, X) \\ X^{-1} \cdot Y \cdot S &\equiv V \cdot 2^{i} \pmod{p} & \text{final iteration:} \quad (U, V, R, S) \Rightarrow (1, 0, XY^{-1}2^{m} \pmod{p}, 0) \end{aligned}$ 

|                                                                                  | С          | d             | Properties                                                                                   |  |  |
|----------------------------------------------------------------------------------|------------|---------------|----------------------------------------------------------------------------------------------|--|--|
| 1. $U$ or $V \pmod{4} = 0$                                                       | 0          | 0, 1, 2, or 3 | $gcd(U,V) = gcd(\frac{U}{4},V)$                                                              |  |  |
| 2. $U \pmod{4} = V \pmod{4}$                                                     | 1, 2, or 3 | 0             | $gcd(U, V) = gcd(U, \frac{V}{4})$                                                            |  |  |
| 2  U/U/(a = 1.4) := a = a = 1.4                                                  | c          | = d           | $gcd(U,V) = gcd(\frac{U-V}{4},V) = gcd(U,\frac{V-U}{4})$                                     |  |  |
| 3. $U/V \pmod{4}$ is even and $V/U \pmod{4}$ is odd<br>4. U and V (mod 4) is odd | 2          | 1 or 3        | $gcd(U,V) = gcd(\frac{\frac{U}{2} - V}{2}, V) = gcd(\frac{U}{2}, \frac{V - \frac{U}{2}}{2})$ |  |  |
|                                                                                  | 1 or 3     | 2             | $gcd(U,V) = gcd(\frac{U-\frac{V}{2}}{2},\frac{V}{2}) = gcd(U,\frac{\frac{V}{2}-U}{2})$       |  |  |
|                                                                                  | other      |               | $gcd(U,V) = gcd(\frac{U-V}{2},V) = gcd(U,\frac{V-U}{2})$                                     |  |  |

 $c = U \pmod{4}, d = V \pmod{4}$ 

# **Extend Radix-2 to Radix-4 Approach**

- Modify iterative calculation in radix-4 RMM/RMD to ensure domain value decreases/increases by 2 to 0
  - if two-bit random value is (11)
    - decrease/increase domain value by 2
  - if two-bit random value is (10) or (01)
    - decrease/increase domain value by 1
  - if two-bit random value is (00)
    - remain domain value

## **Proposed Randomized Montgomery Algorithm**

#### > Radix-4 RMM

- if  $(\alpha_{2i+1}, \alpha_{2i}) = (11)$ 
  - decrease domain value by 2 in step 5

-R = R/4

- if 
$$(\alpha_{2i+1}, \alpha_{2i}) = (10)$$
 or  $(01)$ 

decrease domain value by 1

in step 6

$$-R = R/2$$

- if  $(\alpha_{2i+1}, \alpha_{2i}) = (00)$ 

• remain domain value in step 7

-R = R

- after m/2 iterations
  - Domain value is −<sup>λ</sup>

Algorithm 3. Radix-4 randomized Montgomery multiplication **Input:** X, Y, p, and  $\alpha$ **Output:** R = RMM(X, Y)1. Let V = X, R = 0, S = Y2. For *i* from 0 to  $\left\lceil \frac{m}{2} \right\rceil - 1$  do If  $m \pmod{2} \equiv 1$  and  $i = \left\lceil \frac{m}{2} \right\rceil - 1$  then 3.  $R \equiv R + V_0 \cdot S \pmod{p}, V = \frac{V}{2}$ else 4.  $R \equiv R + V_0 \cdot S + V_1 \cdot 2S \pmod{p}, \ V = \frac{V}{4}$ If  $(\alpha_{2i+1}, \alpha_{2i}) = (1, 1)$  then 5. $R \equiv \frac{R}{4} \pmod{p}$ 6. else if  $(\alpha_{2i+1}, \alpha_{2i}) = (1, 0)$  or (0, 1) then  $R \equiv \frac{R}{2} \pmod{p}, S \equiv 2S \pmod{p}$ else 7.  $S \equiv 4S \pmod{p}$ 8. Return R

### **Proposed Randomized Montgomery Algorithm**

fixed

28.

29. Return R

randomized

### Radix-4 RMD

- if  $(\alpha_{i+1}, \alpha_i) = (11)$ 
  - increase domain value by 2 in step 24

-R = 4R

- if 
$$(\alpha_{i+1}, \alpha_i) = (10)$$
 or  $(01)$ 

increase domain value by 1 in step 25

-R = 4R/2

$$- \text{ if } (\alpha_{i+1}, \alpha_i) = (00)$$

remain domain value in step 26

-R = 4R/4

- after m/2 iterations
  - domain value is  $\lambda$

Algorithm 5. Radix-4 randomized Montgomery division **Input:** X, Y, p, and  $\alpha$ Output: R = RMD(X, Y)1. Let U = p, V = Y, R = 0, S = X, i = 02. While (V > 0) do  $c \equiv U \pmod{4}, d \equiv V \pmod{4}, t = 2$ 3. 4. If i = m - 1 then  $R \equiv 2R \pmod{p}, S \equiv 2S \pmod{p}, t = 1$ else if c = 0 then  $U = \frac{U}{4}, S \equiv 4S \pmod{p}$ 5.else if d = 0 then  $V = \frac{V}{4}, R \equiv 4R \pmod{p}$ 6. else if c = d then 7. If U > V then  $U = \frac{U-V}{4}$ 8.  $R \equiv R - S \pmod{p}, S \equiv 4S \pmod{p}$ else  $V = \frac{V - U}{4},$ 9.  $S \equiv S - R \pmod{p}, R \equiv 4R \pmod{p}$ else if c = 2 then 10.If  $\frac{U}{2} > V$  then  $U = \frac{\frac{U}{2} - V}{2}$ ,  $R \equiv R - 2S \pmod{p}, S \equiv 4S \pmod{p}$ 11. else  $V = \frac{V - \frac{U}{2}}{2}, U = \frac{U}{2}$ , 12. $S \equiv 2S - \tilde{R} \pmod{\tilde{p}}, R \equiv 2R \pmod{p}$ else if d = 2 then 13.If  $U > \frac{V}{2}$  then  $U = \frac{U - \frac{V}{2}}{2}, V = \frac{V}{2}$ ,  $R \equiv 2R - S \pmod{p}, S \equiv 2S \pmod{p}$ 14. else  $V = \frac{V}{2} - U$ 15. $S \equiv S - 2\tilde{R} \pmod{p}, R \equiv 4R \pmod{p}$ 16.elseIf U > V then  $U = \frac{U-V}{2}$ , 17.  $R \equiv R - S \pmod{p}, S \equiv 2S \pmod{p}, t = 1$ else  $V = \frac{V-U}{2}$ 18.  $S \equiv S - R$  (mod p),  $R \equiv 2R$  (mod p), t = 119.If i < m then 20.If i = m - 1 or t = 1 then If  $\alpha_i \equiv 1$  then  $R \equiv R \pmod{p}$ ,  $S \equiv S \pmod{p}$ 21.22.else  $R \equiv \frac{R}{2} \pmod{p}, S \equiv \frac{S}{2} \pmod{p}$ 23.else 24.If  $(\alpha_{i+1}, \alpha_i) = (1, 1)$  then  $R \equiv R \pmod{p}, S \equiv S \pmod{p}$ 25.else if  $(\alpha_{i+1}, \alpha_i) = (1, 0)$  or (0, 1) then  $R \equiv \frac{R}{2} \pmod{p}, S \equiv \frac{S}{2} \pmod{p}$ 26.

else  $R \equiv \frac{R}{4} \pmod{p}, S \equiv \frac{S}{4} \pmod{p} \stackrel{?}{1} \stackrel{?}{2}$ 27.i = i + telse  $R \equiv \frac{R}{2^t} \pmod{p}, S \equiv \frac{S}{2^t} \pmod{p}$ 

### **Hardware Architecture of DF-ECC Processor**



Fig. 2. Overall diagram for the DF-ECC processor.



Ring-oscillator based RNG

- 1. portable applications
- 2. resolve reset problem

Fig. 3. The domain flag is to randomly assign operating domain for GFAU.

DF-ECC processor

- 1. Galois field arithmetic unit (GFAU)
- 2. instant domain conversion
  - (RMD(a,1) = A, RMM(A, 1) = a)
- 3. CPA countermeasure circuit

### **Hardware Architecture of DF-ECC Processor**

Radix-2 GFAU



fully-pipelining to remove path (1)
 multiplier is shared in gray color

# **Verification and Measurement**

#### FPGA device



Fig. 7. (a) Environment of power measurement. (b) Current running through the DF-ECC processor recorded by measuring the voltage drop via a resistor in series with the board power pin and FPGA power pin.

| Design | Area (Slices) | $f_{\max}$ (MHz) | Field Arithmetic              |
|--------|---------------|------------------|-------------------------------|
| Ι      | 7,573 (32%)   | 27.7             | Radix-2 Montgomery            |
| II     | 8,158 (34%)   | 27.7             | Radix-2 Randomize Montgomery  |
| II     | 9,828 (41%)   | 20.2             | Radix-4 Montgomery            |
| IV     | 10,460 (43%)  | 20.2             | Radix-2 Randomized Montgomery |

#### Table 3. FPGA Implementation Results

### **Power Analysis**



Fig. 9. Correlation coefficients of the target traces and power model over power traces obtained from the (L) Design-II (R) Design-IV performing arithmetic in a randomized domain.

# **Performance and Comparison**

| Table 4. Implementation Results Compared with Related Works |                 |          |                       |                 |                           |                   |                     |               |
|-------------------------------------------------------------|-----------------|----------|-----------------------|-----------------|---------------------------|-------------------|---------------------|---------------|
|                                                             | CMOS<br>Process | Length   | Area (mm²)/<br>KGates | Finite<br>Field | f <sub>max</sub><br>(MHz) | Time(ms/<br>ECSM) | Energy<br>(µJ/ECSM) | AT<br>Product |
| Ours (Radix-2)                                              | 90-nm           | 160      | 0.21/61.3             | $GF(p_{160})$   | 277                       | 0.71              | 11.9                | 1             |
|                                                             |                 |          |                       | $GF(2^{160})$   | 277                       | 0.61              | 9.6                 | 1             |
| Ours (Radix-4)                                              | 90-nm           | 1.00     | 0.29/83.2             | $GF(p_{160})$   | 238                       | 0.43              | 11.2                | 0.82          |
|                                                             |                 | 160      |                       | $GF(2^{160})$   | 238                       | 0.39              | 8.97                | 0.87          |
| TCAS-II'09 [5]                                              | 130-nm          | 160      | 1.44/169              | $GF(p_{160})$   | 121                       | 0.61              | 42.6                | 1.63*         |
|                                                             |                 |          |                       | $GF(2^{160})$   | 146                       | 0.37              | 30.5                | 1.16*         |
| Ours (Radix-2)                                              | 90-nm 521       | 501      | 0.50/1.60             | $GF(p_{521})$   | 250                       | 8.08              | 452                 | 1             |
|                                                             |                 | 0.58/168 | $GF(2^{409})$         | 263             | 4.65                      | 246               | 1                   |               |
| Ours (Radix-4)                                              | 90-nm 521       | 501      | 0.93/265              | $GF(p_{521})$   | 232                       | 4.57              | 435                 | 0.89          |
|                                                             |                 | 521      |                       | $GF(2^{409})$   | 238                       | 2.77              | 238                 | 0.94          |
| ESSCIRC'10 [9]                                              | 90-nm 521       | 501      | 0.55/170              | $GF(p_{521})$   | 132                       | 19.2              | 1,123               | 2.40          |
|                                                             |                 | 521      |                       | $GF(2^{409})$   | 166                       | 8.2               | 480                 | 1.78          |

\* Technology scaled area-time product = gates  $\times$  (time  $\times f$ ), where f = 90-nm/130-nm.

# **Performance and Comparison**

|                                                                                                                                             | Ours (Radix-2) | Ours (Radix-4) | ESSCIRC'10 [9]     | JSSC'06 [12] | JSSC'10 [13] |  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------|----------------|----------------|--------------------|--------------|--------------|--|--|
| Design                                                                                                                                      | 521 DF-ECC     | 521 DF-ECC     | 521 DF-ECC         | 128 AES      | 128AES       |  |  |
| Area                                                                                                                                        | 4.3%           | 3.6%           | 10%                | 210%         | 7.2%         |  |  |
| Time                                                                                                                                        | 0              | 0              | 14.0% a            | 288%         | 100%         |  |  |
| Energy                                                                                                                                      | 5.2%           | 3.8%           | 20.8% <sup>b</sup> | 270%         | 33%          |  |  |
| $Overhead = \frac{\text{Result differences between protected and unprotected circuit}}{\text{Results of unprotected circuit}} \times 100\%$ |                |                |                    |              |              |  |  |

#### Table 5. Overhead for CPA Resistance

a. Estimated by cycle count  $\times$  clock period.

b. Estimated by operation time  $\times$  average power.

## Conclusion

- An efficient CPA-resistant DF-ECC processor supporting arbitrary modulus is presented
  - no need to modify ASIC or FPGA design flow
  - applicable to IEEE P1363
  - low overhead (< 5%) for hardware speed, area, power</li>



# **Thanks for Your Attention!**

## References

- [1] Koblitz, N.: Elliptic Curve Cryptosystems. Math. Comp., 2001
- [2] Miller, V.: Uses of Elliptic Curves in Cryptography. CRYPTO'85, 1986
- [3] McIvor, C. J. et al: Hardware Elliptic Curve Cryptographic Processor over GF(p). IEEE Trans. Circuits Syst. I, 2006
- [4] Sakiyama, K. et al: Multicore Curve-Based Cryptoprocessor With Recon-figurable Modular Arithmetic Logic Units over GF(2<sup>n</sup>). IEEE Trans. Comput., 2007
- [5] Lai, J.-Y., Huang, C.-T.: A Highly Efficient Cipher Processor for Dual-Field Elliptic Curve Cryptography. IEEE Trans. Circuits Syst. II, 2009
- [6] Chen, J.-H. et al : A High-Performance Unified-Field Reconfigurable Cryptographic Processor. IEEE Trans. VLSI Syst., 2010
- [7] Kocher, P., Jaffe, J., Jun, B.: Differential Power Analysis. CRYPTO'99, 1999
- [8] Montgomery, P.: Speeding the Pollard and Elliptic Curve Methods of Factorization. Math. Comp., 1987
- [9] Lee, J.-W. et al : A 521-bit Dual-Field Elliptic Curve Cryptographic Processor With Power Analysis Resistance. ESSCIRC'10, 2010
- [10] Brier, E., Clavier, C., Olivier, F.: Correlation Power Analysis With a Leakage Model. CHES'04, 2004
- [11] IEEE: Standard Specifications or Public-Key Cryptography. IEEE Std. 1363, 2000
- [12] Hwang, D. et al: AES-Based Security Coprocessor IC in 0.18-µm CMOS With Resistance to Differential Power Analysis Side-Channel Attacks. IEEE J. Solid-State Circuits, 2006
- [13] Tokunaga, C., Blaauw, D.: Securing Encryption Systems With a Switched Capacitor Current Equalizer. IEEE J. Solid-State Circuits, 2010[14] Liu, P.-C. et al: A True Random-Based Differential Power Analysis Countermeasure Circuit for an AES Engine. IEEE Trans. Circuits Syst. II, 2012
- [15] Coron, J.: Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems. CHES'99, 1999
- [16] Joye, M., Tymen, C.: Protections against Differential Analysis for Elliptic Curve Cryptography An Algebraic Approach. CHES'01, 2001
- [17] Montgomery, P.: Modular Multiplication Without Trial Division. Math. Comp., 1985
- [18] Kaliski, B.: The Montgomery Inverse and Its Applications. IEEE Trans. Comput., 1995
- [19] Cohen, H., Miyaji, A., Ono, T.: Efficient Elliptic Curve Exponentiation Using Mixed Coordinates. ASIACRYPT'98, 1998
- [20] Golic, J.D.: New Methods for Digital Generation and Postprocessing of Random Data. IEEE Trans. Comp., 2006
- [21] Chen, Y.-L. et al: A Dual-Field Elliptic Curve Cryptographic Processor With a Radix-4 Unified Division Unit. ISCAS'11, 2011

## References

[HWANG'06] D. Hwang, et al., "AES-Based Security Coprocessor IC in 0.18-µm CMOS With Resistance to Differential Power Analysis Side-Channel Attacks," IEEE J. Solid-State Circuits, 2006

[SAEKI'09] M. Saeki, D. Suzuki, K. Shimizu, and A. Satoh, "A design methodology for a DPA-resistant cryptographic LSI with RSL techniques," in Cryptographic Hardware and Embedded Systems (CHES'09), vol. 5747, 2009, pp. 189–204.

[CORON'99] J. Coron, "Resistance against Differential Power Analysis for Elliptic Curve Cryptosystems," in Cryptographic Hardware and Embedded Systems (CHES'99), 1999

[ITOH'03] K. Itoh, T. Izu, and M. Takenaka, "A practical countermeasure against address-Bit differential power analysis," in Cryptographic Hardware and Embedded Systems (CHES'03), vol. 2779, 2003, pp. 382–396.

[JOYE'01] M. Joye and C. Tymen, "Protections against differential analysis for elliptic curve cryptography – an algebraic approach," in Cryptographic Hardware and Embedded Systems (CHES'01), vol. 2162, 2001, pp. 377–390.

[CORON'09] J.-S. Coron and I. Kizhvatov, "An efficient method for random delay generation in embedded software," in Cryptographic Hardware and Embedded Systems (CHES'09), vol. 5747, 2009, pp. 156–170.