## CryptoDB

### Paper: Oblivious Sampling with Applications to Two-Party k-Means Clustering

Authors: Paul Bunn, Rafail Ostrovsky

DOI: 10.1007/s00145-020-09349-w

The k-means clustering problem is one of the most explored problems in data mining. With the advent of protocols that have proven successful in performing single-database clustering, the focus has shifted in recent years to the question of how to extend the single-database protocols to a multiple-database setting. To date, there have been numerous attempts to create specific multiparty k-means clustering protocols that protect the privacy of each database, but according to the standard cryptographic definitions of “privacy-protection”, all such attempts so far have fallen short of providing adequate privacy. In this paper, we describe a Two-Party k-Means Clustering Protocol that guarantees privacy against an honest-but-curious adversary and is more efficient than utilizing a general multiparty “compiler” to achieve the same task. In particular, a main contribution of our result is a way to efficiently compute multiple iterations of k-means clustering without revealing the intermediate values. To achieve this, we describe a technique for performing two-party division securely and also introduce a novel technique allowing two parties to securely sample uniformly at random from an unknown domain size. The resulting Division Protocol and Random Value Protocol are of use to any protocol that requires the secure computation of a quotient or random sampling. Our techniques can be realized based on the existence of any semantically secure homomorphic encryption scheme. For concreteness, we describe our protocol based on the Paillier homomorphic encryption scheme (see Paillier, in Advances in Cryptology: EUROCRYPT '99 Proceedings, LNCS 1592, pp. 223–238, 1999). We will also demonstrate that our protocol is efficient in terms of communication, remaining competitive with existing protocols (such as Jagannathan and Wright, in KDD '05, pp. 593–599, 2005) that fail to protect privacy.
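
The abstract builds on the additive homomorphism of Paillier encryption. As a rough illustration of that property only, and not of the paper's Division Protocol or Random Value Protocol, the following is a minimal textbook sketch. The function names (`keygen`, `encrypt`, `decrypt`, `add`, `scale`), the tiny primes, and the example values are all assumptions chosen for readability; the parameters are far too small to be secure.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (assumed prime)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse of lam mod n (Python 3.8+)
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt m in [0, n) as (1+n)^m * r^n mod n^2 with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts modulo n."""
    return c1 * c2 % (n * n)


def scale(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k."""
    return pow(c, k, n * n)


# Hypothetical example: two parties each hold a local sum for one cluster.
n, priv = keygen(1009, 1013)  # toy primes, insecure by design
sum_a, sum_b = 120, 45        # made-up local values
combined = add(n, encrypt(n, sum_a), encrypt(n, sum_b))
assert decrypt(n, priv, combined) == sum_a + sum_b
```

The sketch decrypts the combined sum only to check correctness. According to the abstract, the paper's protocol avoids revealing such intermediate values, and a k-means centroid update also needs a quotient, which homomorphic addition alone does not provide; that gap is what the abstract's secure division technique is described as addressing.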
##### BibTeX
@article{jofc-2020-30752,
title={Oblivious Sampling with Applications to Two-Party k-Means Clustering},
journal={Journal of Cryptology},
publisher={Springer},
volume={33},
pages={1362-1403},
doi={10.1007/s00145-020-09349-w},
author={Paul Bunn and Rafail Ostrovsky},
year=2020
}