Compact Implementations of BLAKE-32 and BLAKE-64 on FPGA
We propose compact architectures of the SHA-$3$ candidates BLAKE-32 and BLAKE-64 for several FPGA families. We harness the intrinsic parallelism of the algorithm to interleave the computation of four instances of the $G_i$ function. This approach allows us to design an Arithmetic and Logic Unit with four pipeline stages and to achieve high clock frequencies. With careful scheduling, we completely avoid pipeline bubbles. For the time being, the designs presented in this work are the most compact ones for any of the SHA-3 candidates. We show for instance that a fully autonomous implementation of BLAKE-32 on a Xilinx Virtex-5 device requires 56 slices and two memory blocks.
A Compact FPGA Implementation of the SHA-3 Candidate ECHO
We propose a compact architecture of the SHA-3 candidate ECHO for the Virtex-5 FPGA family. Our architecture is built around a 8-bit datapath. We show that a careful organization of the chaining variable and the message block in the register file allows one to design a compact control unit based on a 4-bit counter, an 8-bit counter, and a simple Finite State Machine. A fully autonomous implementation of ECHO on a Xilinx Virtex-5 FPGA requires $127$ slices and a single memory block to store the internal state, and achieves a throughput of $72$Mbps.