Tri-State Circuits: A Better Model of Computation for Garbling
We introduce tri-state circuits (TSCs). TSCs form a natural model of computation that, to our knowledge, has not been considered by theorists. The model captures a surprising combination of simplicity and power. TSCs are simple in that they allow only three wire values (0,1, and undefined -- Z) and three types of fan-in two gates; they are powerful in that their statically placed gates fire (execute) eagerly as their inputs become defined, implying orders of execution that depend on input. This behavior is sufficient to efficiently evaluate RAM programs. We construct a TSC that emulates T steps of any RAM program and that has only $O(T log^3 T log log T)$ gates. Contrast this with the reduction from RAM to Boolean circuits, where the best approach scans all of memory on each access, incurring quadratic cost. We connect TSCs with Garbled Circuits (GC). TSCs capture the power of garbling far better than Boolean Circuits, offering a more expressive model of computation that leaves per-gate cost essentially unchanged. As an important application, we construct authenticated Garbled RAM (GRAM), enabling constant-round maliciously-secure 2PC of RAM programs. Let $\lambda$ denote the security parameter. We extend authenticated garbling to TSCs; by simply plugging in our TSC-based RAM, we obtain authenticated GRAM running at cost $O(T log^3 T log log T \lambda)$, outperforming all prior work, including prior semi-honest GRAM. We also give semi-honest garbling of TSCs from a one-way function (OWF). This yields OWF-based GRAM at cost $O(T log^3 T log log T \lambda)$, outperforming the best prior OWF-based GRAM by more than factor $\lambda$.
Garbled Circuits With Sublinear Evaluator 📺
A recent line of work, Stacked Garbled Circuit (SGC), showed that Garbled Circuit (GC) can be improved for functions that include conditional behavior. SGC relieves the communication bottleneck of 2PC by only sending enough garbled material for a single branch out of the $b$ total branches. Hence, communication is sublinear in the circuit size. However, both the evaluator and the generator pay in computation and perform at least factor $\log b$ extra work as compared to standard GC evaluation. We extend the sublinearity of SGC to also include the work performed by the GC Evaluator E; thus we achieve a fully sublinear E, which is essential when optimizing for the online phase. We formalize our approach as a garbling scheme called GCWise: GC WIth Sublinear Evaluator. We show one attractive and immediate application, Garbled PIR, a primitive that marries GC with Private Information Retrieval. Garbled PIR allows the GC to non-interactively and sublinearly access a privately indexed element from a publicly known database, and then use this element in continued GC evaluation.
EpiGRAM: Practical Garbled RAM 📺 ★
Garbled RAM (GRAM) is a powerful technique introduced by Lu and Ostrovsky that equips Garbled Circuit (GC) with a sublinear cost RAM without adding rounds of interaction. While multiple GRAM constructions are known, none are suitable for practice, due to costs that have high constants and poor scaling. We present the first GRAM suitable for practice. For computational security parameter $\kappa$ and for a size-$n$ RAM that stores blocks of size $w = \Omega(\log^2 n)$ bits, our GRAM incurs only amortized $O(w \cdot \log^2 n \cdot \kappa)$ communication and computation per access. We evaluate the concrete cost of our GRAM; our approach outperforms trivial linear-scan-based RAM for as few as $512$ $128$-bit elements.
LogStack: Stacked Garbling with O(b log b) Computation 📺
Secure two party computation (2PC) of arbitrary programs can be efficiently achieved using garbled circuits (GC). Until recently, it was widely believed that a GC proportional to the entire program, including parts of the program that are entirely discarded due to conditional branching, must be transmitted over a network. Recent work shows that this belief is false, and that communication proportional only to the longest program execution path suffices (Heath and Kolesnikov, CRYPTO 20, [HK20a]). Although this recent work reduces needed communication, it increases computation. For a conditional with b branches, the players use O(b^2) computation (traditional GC uses only O(b)). Our scheme LogStack reduces stacked garbling computation from O(b^2) to O(b log b) with no increase in communication over [HK20a]. The cause of [HK20a]'s increased computation is the oblivious collection of garbage labels that emerge during the evaluation of inactive branches. Garbage is collected by a multiplexer that is costly to generate. At a high level, we redesign stacking and garbage collection to avoid quadratic scaling. Our construction is also more space efficient: [HK20a] algorithms require O(b) space, while ours use only O(log b) space. This space efficiency allows even modest setups to handle large numbers of branches. [HK20a] assumes a random oracle (RO). We track the source of this need, formalize a simple and natural added assumption on the base garbling scheme, and remove reliance on RO: LogStack is secure in the standard model. Nevertheless, LogStack can be instantiated with typical GC tricks based on non-standard assumptions, such as free XOR and half-gates, and hence can be implemented with high efficiency. We implemented LogStack (in the RO model, based on half-gates garbling) and report performance. In terms of wall-clock time and for fewer than 16 branches, our performance is comparable to [HK20a]'s; for larger branching factors, our approach clearly outperforms [HK20a]. For example, given 1024 branches, our approach is 31x faster.
Masked Triples: Amortizing Multiplication Triples across Conditionals 📺
A classic approach to MPC uses preprocessed multiplication triples to evaluate arbitrary Boolean circuits. If the target circuit features conditional branching, e.g. as the result of a IF program statement, then triples are wasted: one triple is consumed per AND gate, even if the output of the gate is entirely discarded by the circuit’s conditional behavior. In this work, we show that multiplication triples can be re-used across conditional branches. For a circuit with b branches, each having n AND gates, we need only a total of n triples, rather than the typically required bn. Because preprocessing triples is often the most expensive step in protocols that use them, this significantly improves performance. Prior work similarly amortized oblivious transfers across branches in the classic GMW protocol (Heath et al., Asiacrypt 2020, [HKP20]). In addition to demonstrating conditional improvements are possible for a different class of protocols, we also concretely improve over [HKP20]: their maximum improvement is bounded by the topology of the circuit. Our protocol yields improvement independent of topology: we need triples proportional to the size of the program’s longest execution path, regardless of the structure of the program branches. We implemented our approach in C++. Our experiments show that we significantly improve over a "naive" protocol and over prior work: for a circuit with 16 branches and in terms of total communication, we improved over naive by 12x and over [HKP20] by an average of 2.6x. Our protocol is secure against the semi-honest corruption of p-1 parties.
Garbling, Stacked and Staggered: Faster k-out-of-n Garbled Function Evaluation 📺
Stacked Garbling (SGC) is a Garbled Circuit (GC) improvement that efficiently and securely evaluates programs with conditional branching. SGC reduces bandwidth consumption such that communication is proportional to the size of the single longest program execution path, rather than to the size of the entire program. Crucially, the parties expend increased computational effort compared to classic GC. Motivated by procuring a subset in a menu of computational services or tasks, we consider GC evaluation of k-out-of-n branches, whose indices are known (or eventually revealed) to the GC evaluator E. Our stack-and-stagger technique amortizes GC computation in this setting. We retain the communication advantage of SGC, while significantly improving computation and wall-clock time. Namely, each GC party garbles (or evaluates) the total of n branches, a significant improvement over the O(nk) garblings/evaluations needed by standard SGC. We present our construction as a garbling scheme. Our technique brings significant overall performance improvement in various settings, including those typically considered in the literature: e.g. on a 1Gbps LAN we evaluate 16-out-of-128 functions ~7.68x faster than standard stacked garbling.
PrORAM: Fast O(log n) Authenticated Shares ZK ORAM 📺
We construct a concretely efficient Zero Knowledge (ZK) Oblivious RAM (ORAM) for ZK Proof (ZKP) systems based on authenticated sharings of arithmetic values. It consumes 2logn oblivious transfers (OTs) of length-2sigma secrets per access of an arithmetic value, for statistical security parameter sigma and array size n. This is an asymptotic and concrete improvement over previous best (concretely efficient) ZK ORAM BubbleRAM of Heath and Kolesnikov ([HK20a], CCS 2020), whose access cost is 1/2 log^2 n OTs of length-2sigma secrets. ZK ORAM is essential for proving statements that are best expressed as RAM programs, rather than Boolean or arithmetic circuits. Our construction is private-coin ZK. We integrate it with [HK20a]’s ZKP protocol and prove the resulting ZKP system secure. We implemented PrORAM in C++. Compared to the state-of-the-art BubbleRAM, our PrORAM is ~10x faster for arrays of size 2^20 of 40-bit values.
Stacked Garbling for Disjunctive Zero-Knowledge Proofs 📺
Zero-knowledge (ZK) proofs (ZKP) have received wide attention, focusing on non-interactivity, short proof size, and fast verification time. We focus on the fastest total proof time, in particular for large Boolean circuits. Under this metric, Garbled Circuit (GC)-based ZKP (Jawurek et al., [JKO], CCS 2013) remained the state-of-the-art technique due to the low-constant linear scaling of computing the garbling. We improve GC-ZKP for proof statements with conditional clauses. Our communication is proportional to the longest branch rather than to the entire proof statement. This is most useful when the number m of branches is large, resulting in up to factor m× improvement over JKO. In our proof-of-concept illustrative application, prover P demonstrates knowledge of a bug in a codebase consisting of any number of snippets of actual C code. Our computation cost is linear in the size of the code- base and communication is constant in the number of snippets. That is, we require only enough communication for a single largest snippet! Our conceptual contribution is stacked garbling for ZK, a privacy-free circuit garbling scheme that can be used with the JKO GC-ZKP protocol to construct more efficient ZKP. Given a Boolean circuit C and computational security parameter k, our garbling is L · k bits long, where L is the length of the longest execution path in C. All prior concretely efficient garbling schemes produce garblings of size |C| · k. The computational cost of our scheme is not increased over prior state-of-the-art. We implement our GC-ZKP and demonstrate significantly improved (m× over JKO) ZK performance for functions with branching factor m. Compared with recent ZKP (STARK, Libra, KKW, Ligero, Aurora, Bulletproofs), our scheme offers much better proof times for larger circuits (35-1000× or more, depending on circuit size and compared scheme). For our illustrative application, we consider four C code snippets, each of about 30-50 LOC; one snippet allows an invalid memory dereference. The entire proof takes 0.15 seconds and communication is 1.5 MB.
Stacked Garbling: Garbled Circuit Proportional to Longest Execution Path 📺
Secure two party computation (2PC) of arbitrary programs can be efficiently achieved using garbled circuits (GC). The bottleneck of GC efficiency is communication. It is widely believed that it is necessary to transmit the entire GC during 2PC, even for conditional branches that are not taken. This folklore belief is false. We propose a novel GC technique, stacked garbling, that eliminates the communication cost of inactive conditional branches. We extend the ideas of conditional GC evaluation explored in (Kolesnikov, Asiacrypt 18) and (Heath and Kolesnikov, Eurocrypt 20). Unlike these works, ours is for general 2PC where no player knows which conditional branch is taken. Our garbling scheme, Stack, requires communication proportional to the longest execution path rather than to the entire circuit. Stack is compatible with state-of-the-art techniques, such as free XOR and half-gates. Stack is a garbling scheme. As such, it can be plugged into a variety of existing protocols, and the resulting round complexity is the same as that of standard GC. The approach does incur computation cost quadratic in the conditional branching factor vs linear in standard schemes, but the tradeoff is beneficial for most programs: GC computation even on weak hardware is faster than GC transmission on fast channels. We implemented Stack in C++. Stack reduces communication cost by approximately the branching factor: for 16 branches, communication is reduced by 10.5x. In terms of wall-clock time for circuits with branching factor 16 over a 50 Mbps WAN on a laptop, Stack outperforms state-of- the-art half-gates-based 2PC by more than 4x.
MOTIF: (Almost) Free Branching in GMW via Vector-Scalar Multiplication 📺
MPC functionalities are increasingly specified in high-level languages, where control-flow constructions such as conditional statements are extensively used. Today, concretely efficient MPC protocols are circuit-based and must evaluate all conditional branches at high cost to hide the taken branch. The Goldreich-Micali-Wigderson, or GMW, protocol is a foundational circuit-based technique that realizes MPC for p players and is secure against up to p-1 semi-honest corruptions. While GMW requires communication rounds proportional to the computed circuit’s depth, it is effective in many natural settings. Our main contribution is MOTIF (Minimizing OTs for IFs), a novel GMW extension that evaluates conditional branches almost for free by amortizing Oblivious Transfers (OTs) across branches. That is, we simultaneously evaluate multiple independent AND gates, one gate from each mutually exclusive branch, by representing them as a single cheap vector-scalar multiplication (VS) gate. For 2PC with b branches, we simultaneously evaluate up to b AND gates using only two 1-out-of-2 OTs of b-bit secrets. This is a factor ~b improvement over the state-of-the-art 2b 1-out-of-2 OTs of 1-bit secrets. Our factor b improvement generalizes to the multiparty setting as well: b AND gates consume only p(p ? 1) 1-out-of-2 OTs of b-bit secrets. We implemented our approach and report its performance. For 2PC and a circuit with 16 branches, each comparing two length-65000 bitstrings, MOTIF outperforms standard GMW in terms of communication by ~9.4x. Total wall-clock time is improved by 4.1 - 9.2x depending on network settings. Our work is in the semi-honest model, tolerating all-but-one corruptions.