International Association for Cryptologic Research

# IACR News Central

Here you can see all recent updates to the IACR webpage.


20 May 2019
Key encapsulation mechanism (KEM) variants of the Fujisaki-Okamoto (FO) transformation (CRYPTO 1999 and Journal of Cryptology 2013) that turn a weakly-secure public-key encryption (PKE) into an IND-CCA-secure KEM, were proposed by Hofheinz, Hoevelmanns and Kiltz (TCC 2017) and widely used among the KEM submissions to the NIST Post-Quantum Cryptography Standardization Project. The security reductions for these variants in the quantum random oracle model (QROM) were given by Hofheinz, Hoevelmanns and Kiltz (TCC 2017) and Jiang et al. (Crypto 2018). However, under standard CPA security assumptions, i.e., OW-CPA and IND-CPA, all these security reductions are far from desirable due to the quadratic security loss.

In this paper, for KEM variants of the FO transformation, we show that a typical measurement-based reduction in the QROM from breaking standard OW-CPA (or IND-CPA) security of the underlying PKE to breaking the IND-CCA security of the resulting KEM will inevitably incur a quadratic loss of security, where "measurement-based" means the reduction measures a hash query from the adversary and uses the measurement outcome to break the underlying security of the PKE. In particular, all currently known security reductions (TCC 2017 and Crypto 2018) are of this type, and our results suggest an explanation for the lack of progress in improving the reduction tightness in terms of the degree of security loss. We emphasize that our results do not expose any post-quantum security weakness of KEM variants of the FO transformation.
The purpose of this paper is to provide a comprehensive analysis and side-by-side comparison of the noise growth behaviour in the BGV and FV somewhat homomorphic encryption schemes, both heuristically and in their implementations in the libraries HElib and SEAL, respectively. We run extensive experiments in HElib and SEAL to compare the heuristic noise growth to the noise growth in practice. From the experiments, we observe that for both schemes, the heuristic bounds are not tight. We attempt to improve the tightness of the bounds in a number of ways, including the definition of new notions of noise, such as the invariant noise for BGV and the scaled inherent noise for FV. This does not significantly tighten the bounds, thus we conclude that the current heuristic bounds are the best possible in terms of a theoretical analysis. As an additional contribution, we update the comparison between the two schemes presented by Costache and Smart [22], and find that BGV has a slight advantage over FV. Thus, the conclusions of [22] still hold, although the differences between BGV and FV are less dramatic.
There is a well-known gap between second-preimage resistance and preimage resistance for length-preserving hash functions. This paper introduces a simple concept that fills this gap. One consequence of this concept is that tight reductions can remove interactivity for multi-target length-preserving preimage problems, such as the problems that appear in analyzing hash-based signature systems. Previous reduction techniques applied to only a negligible fraction of all length-preserving hash functions, presumably excluding all off-the-shelf hash functions.
ePrint Report Best Information is Most Successful Eloi de Cherisey, Sylvain Guilley, Olivier Rioul, Pablo Piantanida
Using information-theoretic tools, this paper establishes a mathematical link between the probability of success of a side-channel attack and the minimum number of queries to reach a given success rate, valid for any possible distinguishing rule and with the best possible knowledge on the attacker's side. This link is a lower bound on the number of queries, which depends strongly on Shannon's mutual information between the traces and the secret key. This leads us to derive upper bounds on the mutual information that are as tight as possible and can be easily calculated. It turns out that, in the case of additive white Gaussian noise, the bound on the probability of success of any attack is directly related to the signal-to-noise ratio. This leads to very easy computations and predictions of the success rate in any leakage model.
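As a rough illustration of how such a link can arise, the following is a generic Fano-type sketch and not necessarily the exact statement of the paper. All symbols are assumptions introduced here: the key is uniform over $M$ candidates, $P_s$ is the success probability after $q$ queries, $h$ is the binary entropy function, and each trace is assumed to carry at most $I(X;Y)$ bits about the key through a memoryless channel from the leakage $X$ to the noisy trace $Y$.

$$
q \;\ge\; \frac{\log_2 M - h(P_s) - (1 - P_s)\log_2(M - 1)}{I(X;Y)},
\qquad
I(X;Y) \;\le\; \tfrac{1}{2}\log_2\bigl(1 + \mathrm{SNR}\bigr).
$$

The first inequality combines Fano's inequality with the assumption that information accumulates at most additively over the $q$ traces. The second is the Shannon capacity of a real-valued additive white Gaussian noise channel, which is what ties the bound to the signal-to-noise ratio in that setting.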
This work presents 2 sigma protocols with helper to prove knowledge of:

-A solution to a system of quadratic polynomials

-A solution to an instance of the Permuted Kernel Problem

We then remove the helper from the protocol with a "cut-and-choose" protocol and we apply the Fiat-Shamir transform to obtain signature schemes with security proofs in the QROM. We show that the resulting signature schemes, which we call the "MUltivariate quaDratic FIat-SHamir" scheme (MUDFISH) and the "ShUffled Solution to Homogeneous linear SYstem FIat-SHamir" scheme (SUSHSYFISH), are more efficient than existing signatures based on the MQ problem and the Permuted Kernel Problem. We also leverage the ZK-proof for PKP to improve the efficiency of Stern-like zero-knowledge proofs for lattice statements.
ePrint Report Memory-Efficient High-Speed Implementation of Kyber on Cortex-M4 Leon Botros, Matthias J. Kannwischer, Peter Schwabe
This paper presents an optimized software implementation of the module-lattice-based key-encapsulation mechanism Kyber for the ARM Cortex-M4 microcontroller. Kyber is one of the round-2 candidates in the NIST post-quantum project. At the center of our work are novel optimization techniques for the number-theoretic transform (NTT) inside Kyber, which make very efficient use of the computational power offered by the “vector” DSP instructions of the target architecture. We also present results for the recently updated parameter sets of Kyber, which equally benefit from our optimizations. As a result of our efforts we present software that is 18% faster than an earlier implementation of Kyber optimized for the Cortex-M4 by the Kyber submitters. Our NTT is more than twice as fast as the NTT in that software. Our software runs at about the same speed as the latest speed-optimized implementation of the other module-lattice-based round-2 NIST PQC candidate Saber. However, for our Kyber software, this performance is achieved with a much smaller RAM footprint. Kyber needs less than half of the RAM that the considerably slower RAM-optimized version of Saber uses. Our software does not make use of any secret-dependent branches or memory access and thus offers state-of-the-art protection against timing attacks.
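For readers unfamiliar with the NTT mentioned above, here is a minimal sketch of the idea in Python. It is a toy: the modulus 17, the length 4 and the root of unity 4 are illustrative assumptions, the transform is the direct quadratic-time formula rather than a fast butterfly network, and it computes a plain cyclic product. It does not reproduce Kyber's parameters, its ring, or the Cortex-M4 optimizations of the paper.

```python
# Toy number-theoretic transform (NTT): a discrete Fourier transform over Z_q.
# Q and OMEGA are illustrative assumptions, not Kyber parameters.
Q = 17      # small prime modulus
OMEGA = 4   # primitive 4th root of unity mod 17 (4^2 = 16 = -1, 4^4 = 1)


def ntt(a, omega, q):
    """Direct O(n^2) transform: a_hat[i] = sum_j a[j] * omega^(i*j) mod q."""
    n = len(a)
    return [sum(a[j] * pow(omega, i * j, q) for j in range(n)) % q
            for i in range(n)]


def intt(a_hat, omega, q):
    """Inverse transform: use omega^-1 and scale by n^-1 mod q."""
    n = len(a_hat)
    n_inv = pow(n, -1, q)          # modular inverse (Python 3.8+)
    omega_inv = pow(omega, -1, q)
    return [(n_inv * x) % q for x in ntt(a_hat, omega_inv, q)]


def cyclic_mul_ntt(a, b, omega, q):
    """Multiply polynomials modulo (x^n - 1, q) by pointwise products."""
    a_hat = ntt(a, omega, q)
    b_hat = ntt(b, omega, q)
    c_hat = [(x * y) % q for x, y in zip(a_hat, b_hat)]
    return intt(c_hat, omega, q)


def cyclic_mul_schoolbook(a, b, q):
    """Reference O(n^2) cyclic convolution used to check the NTT result."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c


a = [1, 2, 3, 4]
b = [5, 6, 7, 8]
print(cyclic_mul_ntt(a, b, OMEGA, Q) == cyclic_mul_schoolbook(a, b, Q))
```

The point of the transform is that polynomial multiplication becomes a coefficient-wise product in the transformed domain, which is why a faster NTT speeds up a lattice-based scheme as a whole.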
Enigma 2000 (E2K) is a cipher that updates the World War II-era Enigma Machine for the twenty-first century. Like the original Enigma, E2K is intended to be computed by an offline device; this prevents side channel attacks and eavesdropping by malware. Unlike the original Enigma, E2K uses modern cryptographic algorithms; this provides secure encryption. E2K is intended for encrypted communication between humans only, and therefore it encrypts and decrypts plaintexts and ciphertexts consisting only of the English letters A through Z plus a few other characters. E2K uses a nonce in addition to the secret key, and requires that different messages use unique nonces. E2K performs authenticated encryption, and optional header data can be included in the authentication. This paper defines the E2K encryption and decryption algorithms, analyzes E2K’s security, and describes an encryption appliance based on the Raspberry Pi computer for doing E2K encryptions and decryptions offline.
19 May 2019
We present a new generic construction of multi-client functional encryption (MCFE) for inner products from single-input functional inner-product encryption and standard pseudorandom functions. In spite of its simplicity, the new construction supports labels, achieves security in the standard model under adaptive corruptions, and can be instantiated from the plain DDH, LWE, and Paillier assumptions. Prior to our work, the only known constructions required discrete-log-based assumptions and the random-oracle model. Since our new scheme is not compatible with the compiler from Abdalla et al. (PKC 2019) that decentralizes the generation of the functional decryption keys, we also show how to modify the latter transformation to obtain a decentralized version of our scheme with similar features.
One of Bitcoin’s core security guarantees is that, for an attacker to be able to successfully interfere with the Bitcoin network and reverse transactions, they need to control 51% of total hash power. Eyal et al., however, significantly weakened this security guarantee by introducing another type of attack, called "Selfish Mining". The key idea behind selfish mining is for a miner to keep its discovered blocks private, thereby intentionally forking the chain. As a result of a selfish mining attack, even a miner with 25% of the computation power can bias the agreed chain with its blocks. Since the original paper by Eyal et al., the concept of selfish mining has been actively studied within the Bitcoin community for several years. This paper studies a fundamental problem regarding the selfish mining strategy in the presence of mining pools. For this, we propose a new attack strategy, called "Detective Mining", and show that a selfish mining pool is no longer profitable when other miners use our strategy.
13 May 2019
ePrint Report A taxonomy of pairings, their security, their complexity Razvan Barbulescu, Nadia El Mrabet, Loubna Ghammam
A recent NFS attack against pairings made it necessary to increase the key sizes of the most popular families of pairings: BN, BLS12, KSS16, KSS18 and BLS24. The attack applies to other families of pairings but not to all. In this paper we compute the key sizes required for more than 150 families of pairings to verify if there are any other families which are better than BN. The security estimation is not straightforward because it is not a mathematical formula, but rather one has to instantiate the Kim-Barbulescu attack by proposing polynomials and parameters.

After estimating the practical security of an extensive list of families, we compute the complexity of the optimal Ate pairing at 128 and 192 bits of security. For some of the families the optimal Ate has never been studied before. We show that a number of families of embedding degree 9, 14 and 15 are very competitive with $BN$, $BLS12$ and $KSS16$ at 128 bits of security. We identify a set of candidates for 192 bits and 256 bits of security.
ePrint Report New Number-Theoretic Cryptographic Primitives Eric Brier, Houda Ferradi, Marc Joye, David Naccache
This paper introduces new p^rq-based one-way functions and companion signature schemes. The new signature schemes are interesting because they do not belong to the two common design blueprints, which are the inversion of a trapdoor permutation and the Fiat-Shamir transform. In the basic signature scheme, the signer generates multiple RSA-like moduli n_i = p_i^2 q_i and keeps their factors secret. The signature is a bounded-size prime whose Jacobi symbols with respect to the n_i's match the message digest. The generalized signature schemes replace the Jacobi symbol with higher-power residue symbols. The case of 8th-power residue symbols is fully detailed along with an efficient implementation thereof. Given their very unique design, the proposed signature schemes seem to be overlooked missing species in the corpus of known signature algorithms.
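To make the verification idea concrete, here is a minimal Python sketch of computing Jacobi symbols and checking them against a target pattern. The Jacobi routine is the standard binary algorithm. Everything else is a hypothetical toy introduced for illustration: the tiny moduli, the encoding of a digest as a list of +1/-1 values, and the brute-force search over small primes. It is not the paper's signing algorithm and offers no security.

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd positive n, via the standard binary algorithm."""
    if n <= 0 or n % 2 == 0:
        raise ValueError("n must be a positive odd integer")
    a %= n
    result = 1
    while a != 0:
        while a % 2 == 0:
            a //= 2
            if n % 8 in (3, 5):      # (2/n) = -1 exactly when n = 3 or 5 mod 8
                result = -result
        a, n = n, a                  # quadratic reciprocity
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0


def is_prime(m):
    """Trial division; adequate for the tiny numbers in this toy."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def symbol_pattern(s, moduli):
    """Jacobi symbols of s with respect to each modulus."""
    return [jacobi(s, n) for n in moduli]


def find_prime_with_pattern(moduli, target, limit):
    """Hypothetical brute-force search for a prime whose symbols match target."""
    for s in range(2, limit):
        if is_prime(s) and symbol_pattern(s, moduli) == target:
            return s
    return None


# Toy moduli of the form p^2 * q (assumed values, far too small to be secure).
MODULI = [3**2 * 5, 5**2 * 7, 7**2 * 11]
TARGET = [1, -1, 1]   # stands in for bits derived from a message digest

s = find_prime_with_pattern(MODULI, TARGET, 1000)
print(s, symbol_pattern(s, MODULI) == TARGET)
```

Note that checking a candidate only needs the public moduli, since Jacobi symbols can be computed without knowing the factorization.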
Motivated by the application of delegating computation, we revisit the design of filter permutators as a general approach to build stream ciphers that can be efficiently evaluated in a fully homomorphic manner. We first introduce improved filter permutators that allow better security analyses, instances and implementations than the previously proposed FLIP family of stream ciphers. We also put forward the similarities between these improved constructions and a popular PRG design by Goldreich. Then, we exhibit the relevant cryptographic parameters of two families of Boolean functions, direct sums of monomials and XOR-MAJ functions, which give candidates to instantiate the improved filter permutator paradigm. We develop new Boolean function techniques to study them, and refine Goldreich's PRG locality bound for this purpose. We give an asymptotic analysis of the noise level of improved filter permutator instances using both kinds of functions, and recommend them as good candidates for evaluation with a third-generation FHE scheme. Finally, we propose a methodology to evaluate the performance of such symmetric cipher designs in an FHE setting, which primarily focuses on the noise level of the symmetric ciphertexts (hence on the number of operations on these ciphertexts that can be homomorphically evaluated). Evaluations performed with HElib show that instances of improved filter permutators using direct sums of monomials as filter outperform all existing ciphers in the literature based on this criterion. We also discuss the (limited) overheads of these instances in terms of latency and throughput.
ePrint Report Tiny WireGuard Tweak Jacob Appelbaum, Chloe Martindale, Peter Wu
We show that a future adversary with access to a quantum computer, historic network traffic protected by WireGuard, and knowledge of a WireGuard user's long-term static public key can likely decrypt many of the WireGuard user's historic messages. We propose a simple, efficient alteration to the WireGuard protocol that mitigates this vulnerability, with negligible additional computational and memory costs. Our changes add zero additional bytes of data to the wire format of the WireGuard protocol. Our alteration provides transitional post-quantum security for any WireGuard user who does not publish their long-term static public key -- it should be exchanged out-of-band.
ePrint Report An Efficient and Compact Reformulation of NIST Collision Estimate Test Prasanna Raghaw Mishra, Bhartendu Nandan, Navneet Gaba
In this paper we give an efficient and compact reformulation of the NIST collision estimate test given in SP 800-90B. We correct an error in the formulation of the test and show that the test statistic can be computed in a much easier way. We also propose a revised algorithm for the test based on our findings.
Along with blockchain technology, smart contracts have attracted intense interest in many practical applications. A smart contract is a mechanism involving digital assets and some parties, where the parties deposit assets into the contract and the contract redistributes the assets among the parties based on provisions of the smart contract and inputs of the parties. Recently, several smart contract systems have been constructed that use zk-SNARKs to provide privacy-preserving payments and interconnections in the contracts (e.g. Hawk [IEEE S&P, 2016] and Gyges [ACM CCS, 2016]). The efficiency of such systems is severely dominated by the efficiency of the underlying UC-secure zk-SNARK, which is achieved using the COCO framework [Kosba et al., 2015] applied to a non-UC-secure zk-SNARK. In this paper, we show that recent progress on zk-SNARKs allows one to simplify the structure and also improve the efficiency of both systems with a UC-secure zk-SNARK that has a simpler construction and better efficiency in comparison with the currently used ones. More precisely, with minimal changes, we present a variation of Groth and Maller's zk-SNARK from Crypto 2017, and show that it achieves UC-security and has better efficiency than the ones that are currently used in Hawk and Gyges. We believe the new variation can be of independent interest.
LoRaWAN is an IoT protocol deployed worldwide. Whereas the first version 1.0 has been shown to be weak against several types of attacks, the new version 1.1 has been recently released, and aims, in particular, at providing corrections to the previous release. It also introduces a third entity, turning the original 2-party protocol into a 3-party protocol. In this paper, we provide the first security analysis of LoRaWAN 1.1 in its 3-party setting using a provable approach, and show that it suffers from several flaws. Based on the 3(S)ACCE model of Bhargavan et al., we then propose an extended framework that we use to analyse the security of LoRaWAN-like 3-party protocols, and describe a generic 3-party protocol provably secure in this extended model. We use this provable security approach to propose a slightly modified version of LoRaWAN 1.1. We show how to concretely instantiate this alternative, and formally prove its security in our extended model.
ePrint Report BEARZ Attack FALCON: Implementation Attacks with Countermeasures on the FALCON signature scheme Sarah McCarthy, James Howe, Neil Smyth, Seamus Brannigan, Máire O'Neill
Post-quantum cryptography is an important and growing area of research due to the threat of quantum computers, as recognised by the recent call for standardisation from the National Institute of Standards and Technology (NIST). Lattice-based signatures have been shown in the past to be susceptible to side-channel attacks. Falcon is a lattice-based signature candidate submitted to NIST, which has good performance but lacks research with respect to implementation attacks and resistance. This research proposes the first fault attack analysis on Falcon and finds its lattice trapdoor sampler is as vulnerable to fault attacks as the GPV sampler used in alternative signature schemes. We simulate the post-processing component of this fault attack and achieve a 100% success rate at retrieving the private key. This research then proposes an evaluation of countermeasures to prevent this fault attack and timing attacks on Falcon. We provide cost evaluations of the overheads of the proposed countermeasures, which show that Falcon has only up to 30% deterioration in performance of its key generation, and only 5% in its signing, compared to without countermeasures.
10 May 2019
Modern secure messaging protocols such as Signal can offer strong security guarantees, in particular Post-Compromise Security (PCS). The core PCS mechanism in these protocols is inherently pairwise, which causes bad scaling behaviour and makes PCS inefficient for large groups. To address this, two recently proposed designs for secure group messaging, ART and MLS Draft-04, use group keys derived from tree structures to efficiently enable PCS mechanisms in large groups.

In this work we highlight a previously unexplored difference between the pairwise and group-key based approaches. We show that without additional mechanisms, both ART and MLS Draft-04 offer significantly lower PCS guarantees than those offered by groups based on pairwise PCS channels. In particular, for MLS Draft-04, it seems that the protocol does not yet meet the informal PCS security guarantees described in the draft.

We explore the causes of this problem and lay out the design space to identify solutions. Optimizing security and minimizing overhead leads us to a promising solution based on (i) global updates and (ii) post-compromise secure signatures. While rotating signatures had been discussed before as options for both MLS and ART, our work indicates that combining specific update patterns for all groups with a post-compromise secure signature scheme, may be strictly necessary to achieve any reasonable PCS guarantee.
Modular addition is frequently used as a source of nonlinearity in many symmetric-key structures such as ARX and Lai--Massey schemes. At FSE'16, Fu et al. proposed a Mixed Integer Linear Programming (MILP)-based method to handle the propagation of differential trails through modular additions assuming that the two inputs to the modular addition and the consecutive rounds are independent. However, this assumption does not necessarily hold. In this paper, we study the propagation of the XOR difference through the modular addition at the bit level and show the effect of the carry bit. Then, we propose a more accurate MILP model to describe the differential propagation through the modular addition taking into account the dependency between the consecutive modular additions. The proposed MILP model is utilized to launch a differential attack against Bel-T-256, which is a member of the Bel-T block cipher family that has been adopted recently as a national standard of the Republic of Belarus. In particular, we employ the concept of partial Differential Distribution Table to model the 8-bit S-Box of Bel-T using a MILP approach in order to automate finding a differential characteristic of the cipher. Then, we present a $4\frac{1}{7}$-round (out of 8) differential attack which utilizes a $3$-round differential characteristic that holds with probability $2^{-111}$. The data, time and memory complexities of the attack are $2^{114}$ chosen plaintexts, $2^{237.14}$ $4\frac{1}{7}$-round encryptions, and $2^{224}$ 128-bit blocks, respectively.
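The effect of the carry bit on XOR differences can be seen with a small exhaustive experiment. The Python sketch below is an illustration only: the function name `xdp_add` and the 4-bit word size are assumptions, and it brute-forces the differential probability of a single modular addition with independent uniform inputs. It does not implement the MILP model of the paper or the dependency between consecutive additions.

```python
def xdp_add(alpha, beta, gamma, n):
    """Probability that input XOR differences (alpha, beta) to addition
    modulo 2^n produce output XOR difference gamma, over uniform x, y."""
    mask = (1 << n) - 1
    hits = 0
    for x in range(1 << n):
        for y in range(1 << n):
            z = (x + y) & mask
            z_shifted = ((x ^ alpha) + (y ^ beta)) & mask
            if z ^ z_shifted == gamma:
                hits += 1
    return hits / (1 << (2 * n))


N = 4  # toy word size so the exhaustive loop stays small

# A difference in the most significant bit never meets a carry that stays in the word.
print(xdp_add(0b1000, 0, 0b1000, N))
# A difference in the least significant bit propagates unchanged only when no carry is disturbed.
print(xdp_add(0b0001, 0, 0b0001, N))
```

Differences in low-order bits can spread upward through the carry chain, so the same input difference leads to several output differences with different probabilities.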
ePrint Report Dual-Mode NIZKs from Obfuscation Dennis Hofheinz, Bogdan Ursu
Two standard security properties of a non-interactive zero-knowledge (NIZK) scheme are soundness and zero-knowledge. But while standard NIZK systems can only provide one of those properties against unbounded adversaries, dual-mode NIZK systems allow to choose dynamically and adaptively which of these properties holds unconditionally. The only known dual-mode NIZK systems are Groth-Sahai proofs (which have proved extremely useful in a variety of applications), and the concurrent and independent FHE-based NIZK constructions of Canetti et al. and Peikert et al. However, all these constructions rely on specific algebraic settings.

Here, we provide a generic construction of dual-mode NIZK systems for all of NP. The public parameters of our scheme can be set up in one of two indistinguishable ways. One way provides unconditional soundness, while the other provides unconditional zero-knowledge. Our scheme relies on subexponentially secure indistinguishability obfuscation and subexponentially secure one-way functions, but otherwise only on comparatively mild and generic computational assumptions. These generic assumptions can be instantiated under any one of the DDH, k-LIN, DCR, or QR assumptions.

As an application, we reduce the required assumptions necessary for several recent obfuscation-based constructions of multilinear maps. Combined with previous work, our scheme can be used to construct multilinear maps from obfuscation and a group in which the strong Diffie-Hellman assumption holds. We also believe that our work adds to the understanding of the construction of NIZK systems, as it provides a conceptually new way to achieve dual-mode properties.
ePrint Report A Note on SIMON-$32/64$ Security John Matthew Macnaghten, James Luke Menzies, Mark Munro
This paper presents the results of a new approach to the cryptanalysis of SIMON-$32/64$, a cipher published by NSA in 2013. Our cryptanalysis essentially considers combinatorial properties. These properties allow us to recover a secret key from two plaintext/ciphertext pairs, in a time ranging from a few hours to a few days, with rather limited computing resources. The efficiency of our cryptanalysis technique compared to all known cryptanalyses (including key exhaustive search) is a justification for not revealing the cryptanalysis techniques used. We have adopted a zero-knowledge-inspired method of proof which was initiated in \cite{filiol_e0}.
Multivariate public key signature schemes perform well in terms of speed and signature size, but most of them have a huge public key size. In this paper, we propose a new method to reduce the public key size of the unbalanced oil and vinegar (UOV) signature scheme. We can reduce the public key size of the UOV scheme to about 4KB for the 128-bit security level. This method can be used to reduce the public key sizes of other multivariate public key cryptosystems.
The Walnut Digital Signature Algorithm (WalnutDSA) brings together methods in group theory, representation theory, and number theory, to yield a public-key method that provides a means for messages to be signed and signatures to be verified, on platforms where traditional approaches cannot be executed. After briefly reviewing the various heuristic/practical attacks that have been posited by Hart et al., Beullens-Blackburn, Kotov-Menshov-Ushakov, and Merz-Petit, we detail the parameter choices that defeat each attack, ensure the security of the method, and demonstrate its continued utility.
ePrint Report UC-Secure CRS Generation for SNARKs Behzad Abdolmaleki, Karim Baghery, Helger Lipmaa, Janno Siim, Michal Zajac
Zero-knowledge SNARKs (zk-SNARKs) have recently found various applications in verifiable computation and blockchain applications (Zerocash), but unfortunately they rely on a common reference string (CRS) that has to be generated by a trusted party. A standard suggestion, pursued by Ben Sasson et al. [IEEE S&P, 2015], is to generate the CRS via a multi-party protocol. We enhance their CRS-generation protocol to achieve UC-security. This allows one to safely compose the CRS-generation protocol with the zk-SNARK in a black-box manner with the assurance that the security of the zk-SNARK is not affected. Unlike the previous work, the new CRS-generation protocol also avoids the random oracle model, which is typically not required by zk-SNARKs themselves. As a case study, we apply the protocol to the state-of-the-art zk-SNARK by Groth [EUROCRYPT, 2016].
We devise an efficient and data-oblivious algorithm for solving a bounded integral linear system of arbitrary rank over the rational numbers via the Moore--Penrose pseudoinverse, using finite-field arithmetic. This particular problem setting stems from our goal to run the algorithm as a secure multiparty computation (MPC). Beyond MPC, our algorithm could be valuable in other scenarios, like secure enclaves in CPUs, where data-obliviousness is crucial for protecting secrets. We compute the Moore--Penrose inverse over a finite field of sufficiently large order, so that we can recover the rational solution from the solution over the finite field.

Previous work by Cramer, Kiltz and Padró (CRYPTO 2007) proposes a constant-rounds protocol for computing the Moore--Penrose pseudoinverse over a finite field. The asymptotic complexity (counted as the number of secure multiplications) of their solution is $O(m^4 + n^2 m)$, where $m$ and $n$, $m\leq n$, are the dimensions of the linear system.

To reduce the number of secure multiplications, we sacrifice the constant-rounds property and propose a protocol for computing the Moore--Penrose pseudoinverse over the rational numbers in a linear number of rounds, requiring only $O(m^2n)$ secure multiplications.

To obtain the common denominator of the pseudoinverse, required for constructing an integer-representation of the pseudoinverse, we generalize a result by Ben-Israel for computing the squared volume of a matrix. Also, we show how to precondition a symmetric matrix to achieve generic rank profile while preserving symmetry and being able to remove the preconditioner after it has served its purpose. These results may be of independent interest.
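The idea of recovering a rational value from its image in a sufficiently large finite field can be illustrated with textbook rational reconstruction. The Python sketch below is a swapped-in generic technique, not the method of the paper (which, per the abstract, obtains a common denominator explicitly). The function name, the prime 10007 and the size bound are assumptions chosen for illustration.

```python
from fractions import Fraction
from math import gcd, isqrt


def rational_reconstruct(u, p):
    """Find num/den with num = u * den (mod p) and |num|, den <= sqrt(p/2),
    using the extended Euclidean algorithm. Raises ValueError if none exists."""
    bound = isqrt((p - 1) // 2)
    r0, r1 = p, u % p
    t0, t1 = 0, 1                      # invariant: r_i = t_i * u (mod p)
    while r1 > bound:
        quotient = r0 // r1
        r0, r1 = r1, r0 - quotient * r1
        t0, t1 = t1, t0 - quotient * t1
    if t1 == 0 or abs(t1) > bound or gcd(r1, abs(t1)) != 1:
        raise ValueError("no rational with small numerator and denominator")
    if t1 < 0:                         # normalise to a positive denominator
        r1, t1 = -r1, -t1
    return Fraction(r1, t1)


P = 10007                              # toy prime; a real run needs a far larger field
image = (3 * pow(7, -1, P)) % P        # the field element representing 3/7
print(rational_reconstruct(image, P))
```

The field must be large enough that numerator and denominator stay below the bound; otherwise different rationals collide on the same field element, which is why the abstract asks for a field of sufficiently large order.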
Protecting a driver’s privacy is one of the major concerns in vehicular ad hoc networks (VANETs). Recently, Azees et al. proposed an efficient anonymous authentication protocol (EAAP) for VANETs. The authors claim that their scheme can implement conditional privacy, and that it can provide resistance against impersonation attacks and bogus message attacks from an external attacker. In this paper, we show that their scheme fails to resist these two types of attack as well as forgery attacks. Using these attacks, an attacker can broadcast arbitrary messages successfully. Further, the attacker cannot be traced by a trusted authority, which means their scheme does not satisfy the requirement of conditional privacy. The results of this article clearly show that the scheme of Azees et al. is insecure.
In 2017, Aggarwal, Joux, Prakash, and Santha proposed an innovative NTRU-like public-key cryptosystem that was believed to be quantum resistant, based on Mersenne prime numbers $q = 2^N-1$. After a successful attack designed by Beunardeau, Connolly, Géraud, and Naccache, the authors revised the protocol, which was accepted for Round 1 of the Post-Quantum Cryptography Standardization Process organized by NIST. The security of this protocol is based on the assumption that a so-called Mersenne Low Hamming Combination Search Problem (MLHCombSP) is hard to solve. In this work, we present a reduction of MLHCombSP to an instance of Integer Linear Programming (ILP). This opens new research directions that need to be investigated in order to assess the concrete robustness of such a cryptosystem. We propose different approaches to perform such a reduction. Moreover, we uncover a new family of weak keys, for which our reduction leads to an attack consisting of solving $<N^3$ ILP problems of dimension 3.
Inspired by the literature on side-channel attacks against cryptographic implementations, we describe a framework for the analysis of location privacy. It allows us to revisit (continuous) re-identification attacks with a combination of information theoretic and security metrics. Our results highlight conceptual differences between re-identification attacks exploiting leakages that are internal or external to a pseudonymised database. They put forward the amount of data to collect in order to estimate a predictive model as an important -- yet less discussed -- dimension of privacy assessments. They finally leverage recent results on the security evaluations/certification of cryptographic implementations to connect information theoretic and security metrics, and to formally bound the risk of re-identification with external leakages.
ePrint Report Privacy-Preserving K-means Clustering with Multiple Data Owners Jung Hee Cheon, Jinhyuck Jeong, Dohyeong Ki, Jiseung Kim, Joohee Lee, Seok Won Lee
Recently with the advent of technology, a lot of data are stored and mined in cloud servers. Since most of the data contain potential private information, it has become necessary to preserve the privacy in data mining. In this paper, we propose a protocol for collaboratively performing the K-means clustering algorithm on the data distributed among multiple data owners, while protecting the sensitive private data. We employ two service providers in our scenario, namely a main service provider and a key manager. Under the assumption that the cryptosystems used in our protocol are secure and that the two service providers do not collude, we provide a perfect secrecy in the sense that the cluster centroids and data are not leaked to any party including the two service providers. Also, we implement the scenario using recently proposed leveled homomorphic encryption called HEAAN. With our construction, the privacy-preserving K-means clustering can be done in less than one minute while maintaining 80-bit security in a situation with 10,000 data, 8 features and 4 clusters.
Clustering analysis is one of the most significant unsupervised machine learning tasks, and it is utilized in various fields associated with privacy issues, including bioinformatics, finance and image processing. In this paper, we propose a practical solution for privacy-preserving clustering analysis based on homomorphic encryption (HE). Our work is the first HE solution for the mean-shift clustering algorithm. To reduce the super-linear complexity of the original mean-shift algorithm, we adopt a novel random sampling method called dust sampling, which fits HE well and achieves linear complexity. We also substitute non-polynomial kernels by a new polynomial kernel so that it can be efficiently computed in HE. The quality of clustering analysis with the new HE-friendly kernel is fairly good in practice.

The performance of our modified mean-shift clustering algorithm based on the approximate HE scheme HEAAN is quite remarkable in terms of speed and accuracy. It takes about $30$ minutes with $99\%$ accuracy over several public datasets with hundreds of data points, and even for two hundred thousand data points it takes only $82$ minutes with SIMD operations in HEAAN. Our results outperform the previously best known result by a factor of over $400$.