Feature Learning as a Tool to Identify Existence of Multiple Biological Patterns

Aleksandr Patsekin, Purdue University

Abstract

This paper introduces a novel approach for assessing multiple patterns in biological imaging datasets. The tool is intended to provide the most probable structure of an image dataset consisting of biological patterns not encountered during model training. It has two major parts: (1) a feature learning and extraction pipeline and (2) subsequent clustering with estimation of the number of classes. The feature-learning part includes two deep-learning techniques and a feature quantitation pipeline that serves as a benchmark method. The clustering part includes three non-parametric methods. K-means clustering is employed for validation and hypothesis testing by comparing its results with the provided ground truth. The methods and hyper-parameters most appropriate for maximizing clustering quality are suggested. A convolutional autoencoder demonstrated the most stable and robust results: an entropy-based V-measure of 0.9759 on a dataset containing the classes used for training and 0.9553 on a dataset of completely novel classes.
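To illustrate the evaluation metric quoted above, the following is a minimal sketch of V-measure as it is commonly defined: the harmonic mean of homogeneity and completeness, both derived from conditional entropies between ground-truth classes and cluster assignments. The function names and the toy label lists are assumptions made for illustration; the abstract does not specify how the thesis computed the metric.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two equally long label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(truth, pred):
    """Harmonic mean of homogeneity and completeness (equal weighting)."""
    h_truth = entropy(truth)
    h_pred = entropy(pred)
    # Homogeneity: each cluster contains members of a single class.
    homogeneity = (
        1.0 if h_truth == 0 else 1.0 - conditional_entropy(truth, pred) / h_truth
    )
    # Completeness: all members of a class fall in the same cluster.
    completeness = (
        1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred, truth) / h_pred
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Hypothetical toy labels: the score ignores how cluster IDs are named.
perfect = v_measure([0, 0, 1, 1], [1, 1, 0, 0])
split = v_measure([0, 0, 1, 1], [0, 0, 1, 2])
merged = v_measure([0, 0, 1, 1], [0, 0, 0, 0])
```

Because the score depends only on how samples are grouped, not on the cluster IDs themselves, it can compare K-means output against ground truth without first matching cluster indices to class labels.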

Degree

M.S.

Advisors

Robinson, Purdue University.

Subject Area

Bioinformatics|Artificial intelligence|Computer science
