A distributed approach to enabling privacy-preserving model-based classifier training


This paper proposes a novel approach for privacy-preserving distributed model-based classifier training. Our approach is an important step towards supporting customizable privacy modeling and protection. It consists of three major steps. First, each data site independently learns a weak concept model (i.e., local classifier) for a given data pattern or concept by using its own training samples. An adaptive EM algorithm is proposed to select the model structure and estimate the model parameters simultaneously. The second step deals with combined classifier training by integrating the weak concept models that are shared from multiple data sites. To reduce the data transmission costs and the potential privacy breaches, only the weak concept models are sent to the central site and synthetic samples are directly generated from these shared weak concept models at the central site. Both the shared weak concept models and the synthetic samples are then incorporated to learn a reliable and complete global concept model. A computational approach is developed to automatically achieve a good trade off between the privacy disclosure risk, the sharing benefit and the data utility. The third step deals with validating the combined classifier by distributing the global concept model to all these data sites in the collaboration network while at the same time limiting the potential privacy breaches. Our approach has been validated through extensive experiments carried out on four UCI machine learning data sets and two image data sets.


Privacy-preserving classifier training, Synthetic samples, Adaptive EM algorithm

Date of this Version



Knowledge and Information Systems Volume 20, Number 2, 157-185