Keywords
Machine Learning, Gaussian mixture models, Model selection, EM algorithm, Parallel Computing
Presentation Type
Poster
Research Abstract
In recent years, model selection methods have advanced significantly, but improvements have tended to be benchmarked mainly on efficiency. An effective model selection system also requires a robust feature extraction module. Here, a model selection system is developed using a finite multivariate generalized Gaussian mixture model, which organizes data points into clusters. Clustering assigns the points of a data set to groups based on their similarity. In this model, the expectation-maximization (EM) method is used to compute the distance from each point to a provisional cluster center, and the centers are updated as the simulation proceeds until the best fit is obtained. Parallel computing is used to accelerate the simulation. The performance of the developed model is studied through experiments on ten thousand data points, with identification accuracy as the measure. The system could be improved further with a new algorithm for separating the clusters. Performance evaluations will be investigated and compared.
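As an illustration of the EM procedure described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes ordinary Gaussian components rather than the generalized Gaussian family, a farthest-point initialization of the provisional centers, and the Bayesian information criterion (BIC) as the model selection score; none of these choices is stated in the abstract. The names `log_joint`, `fit_gmm_em`, and `bic` are hypothetical.

```python
import numpy as np


def log_joint(X, weights, means, covs):
    """Return log(weight_j * N(x_i | mean_j, cov_j)) as an (n, k) array."""
    n, d = X.shape
    out = np.empty((n, len(weights)))
    for j in range(len(weights)):
        chol = np.linalg.cholesky(covs[j])
        # Whitened residuals, shape (d, n); their squared norm is the
        # squared Mahalanobis distance from each point to center j.
        z = np.linalg.solve(chol, (X - means[j]).T)
        log_det = 2.0 * np.log(np.diag(chol)).sum()
        out[:, j] = np.log(weights[j]) - 0.5 * (
            d * np.log(2.0 * np.pi) + log_det + (z ** 2).sum(axis=0))
    return out


def fit_gmm_em(X, k, n_iter=100, reg=1e-6, seed=0):
    """Fit a k-component Gaussian mixture to X (n, d) with plain EM.

    Assumption: ordinary Gaussian components, fixed iteration count.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Assumed initialization: one random point, then repeatedly the point
    # farthest from the centers chosen so far (sensitive to outliers).
    idx = [int(rng.integers(n))]
    for _ in range(k - 1):
        dist2 = ((X[:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=2).min(axis=1)
        idx.append(int(np.argmax(dist2)))
    means = X[idx].copy()
    covs = np.stack([np.eye(d) * X.var(axis=0).mean()] * k)
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        lj = log_joint(X, weights, means, covs)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1)[:, None])
        # M-step: move the centers and update weights and covariances.
        nk = resp.sum(axis=0) + 1e-12  # guard against empty components
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        for j in range(k):
            diff = X - means[j]
            covs[j] = (resp[:, j, None] * diff).T @ diff / nk[j] + reg * np.eye(d)
    lj = log_joint(X, weights, means, covs)
    log_lik = float(np.logaddexp.reduce(lj, axis=1).sum())
    return weights, means, covs, log_lik


def bic(log_lik, n, d, k):
    """BIC for a full-covariance mixture; lower is better (assumed criterion)."""
    n_params = (k - 1) + k * d + k * d * (d + 1) // 2
    return -2.0 * log_lik + n_params * np.log(n)
```

Under these assumptions, model selection would amount to calling `fit_gmm_em` for several candidate values of `k` and keeping the one with the lowest `bic`. Because the fits for different `k` do not depend on one another, they are a natural unit of work to distribute across processes, which is one plausible way parallel computing could speed up such a search.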
Session Track
Modeling and Simulation
Recommended Citation
Tian Qiu, Georgios Karagiannis, and Guang Lin,
"Model Selection Using Gaussian Mixture Models and Parallel Computing"
(August 4, 2016).
The Summer Undergraduate Research Fellowship (SURF) Symposium.
Paper 142.
https://docs.lib.purdue.edu/surf/2016/presentations/142