Date of Award
5-2018
Degree Type
Thesis
Degree Name
Master of Science (MS)
Department
Computer and Information Technology
Committee Chair
William G. McCartney
Committee Co-Chair
Joseph P. Robinson
Committee Member 1
Bartek Rajwa
Committee Member 2
John A. Springer
Abstract
This thesis introduces a novel approach for assessing multiple patterns in biological imaging datasets. The developed tool is intended to estimate the most probable structure of an image dataset that contains biological patterns not encountered during model training. The tool has two major parts: (1) a feature learning and extraction pipeline and (2) subsequent clustering with estimation of the number of classes. The feature-learning part includes two deep-learning techniques, with a feature quantitation pipeline serving as a benchmark method. The clustering part includes three non-parametric methods. K-means clustering is employed for validation and hypothesis testing by comparing its results with the provided ground truth. The most appropriate methods and hyper-parameters are suggested to maximize clustering quality. A convolutional autoencoder produced the most stable and robust results: an entropy-based V-measure of 0.9759 on a dataset of the classes used for training and 0.9553 on a dataset of entirely novel classes.
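As a rough illustration of the entropy-based V-measure reported above, the following minimal Python sketch computes it from ground-truth class labels and predicted cluster labels. It follows the standard definition (the harmonic mean of homogeneity and completeness) and is not the thesis's own code; the function names `entropy`, `conditional_entropy`, and `v_measure` are hypothetical and chosen only for this example.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy H(labels), using natural logarithms."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(a, b):
    """Conditional entropy H(a | b) of labeling a given labeling b."""
    n = len(a)
    joint = Counter(zip(a, b))   # co-occurrence counts of (a_i, b_i)
    b_counts = Counter(b)        # marginal counts of b
    return -sum(
        (n_ab / n) * log(n_ab / b_counts[b_val])
        for (_, b_val), n_ab in joint.items()
    )


def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness (standard V-measure)."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    h_truth = entropy(truth)
    h_pred = entropy(pred)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = 1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth
    # Completeness: all members of a class land in the same cluster.
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred, truth) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


if __name__ == "__main__":
    # Cluster IDs are arbitrary, so a relabeled perfect clustering still scores 1.
    print(v_measure([0, 0, 1, 1], [1, 1, 0, 0]))
    # A clustering that is independent of the classes scores 0.
    print(v_measure([0, 0, 1, 1], [0, 1, 0, 1]))
```

Because the score depends only on how the two labelings co-occur, it is invariant to the arbitrary cluster numbering that K-means or a non-parametric method assigns, which makes it suitable for comparing clusterings against ground truth.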
Recommended Citation
Patsekin, Aleksandr, "Feature Learning as a Tool to Identify Existence of Multiple Biological Patterns" (2018). Open Access Theses. 1436.
https://docs.lib.purdue.edu/open_access_theses/1436