Many techniques for privacy-preserving data mining (PPDM) have been investigated over the past decade. Often, the entities involved in the data mining process are end-users or organizations with limited computing and storage resources. As a result, such entities may want to refrain from participating in the PPDM process. To overcome this issue and to take many other benefits of cloud computing, outsourcing PPDM tasks to the cloud environment has recently gained special attention. We consider the scenario where n entities outsource their databases (in encrypted format) to the cloud and ask the cloud to perform the clustering task on their combined data in a privacy-preserving manner. We term such a process as privacy-preserving and outsourced distributed clustering (PPODC). In this paper, we propose a novel and efficient solution to the PPODC problem based on k-means clustering algorithm. The main novelty of our solution lies in avoiding the secure division operations required in computing cluster centers altogether through an efficient transformation technique. Our solution builds the clusters securely in an iterative fashion and returns the final cluster centers to all entities when a pre-determined termination condition holds. The proposed solution protects data confidentiality of all the participating entities under the standard semi-honest model. To the best of our knowledge, ours is the first work to discuss and propose a comprehensive solution to the PPODC problem that incurs negligible cost on the participating entities. We theoretically estimate both the computation and communication costs of the proposed protocol and also demonstrate its practical value through experiments on a real dataset.
Date of this Version