Using objective ground-truth labels created by multiple annotators for improved video classification: A comparative study
Abstract
We address the problem of predicting category labels for unlabeled videos in a large video dataset by using a ground-truth set of objectively labeled videos that we have created. In large video databases like YouTube, videos are labeled with a prescribed set of category labels by the uploaders themselves, so the labels are likely to be corrupted by the uploaders' subjective biases. Despite their noisy nature, these labels are frequently used as the gold standard in multimedia classification and retrieval. Our goal in this thesis is to present a strategy for first creating an objectively labeled ground-truth set of videos and then, based on that ground truth, predicting objective labels for a set of unlabeled videos. The objectively labeled ground-truth dataset is created from the majority opinion of multiple human annotators. It consists of 1,000 videos randomly selected from the TinyVideos database, which contains roughly 52,000 videos (courtesy of Karpenko and Aarabi [1]). Through a four-fold cross-validation experiment on the ground-truth set, we demonstrate that objective labels yield more consistent video classification than subjective labels. We show that this holds for several kinds of feature sets used to compare videos and for two types of classifiers used for label prediction. We then use the ground-truth set to predict the objective category labels of the remaining 51,000 videos, compare these predicted labels with the subjective labels provided by the uploaders, and argue qualitatively that the objective labels are more informative. Finally, since manually creating such labels for a large dataset is difficult, we investigate whether a few labeled instances, together with abundant unlabeled instances, can be used to learn a label-prediction function.
This is accomplished with a semi-supervised learning algorithm called self-training, and we investigate how the amount of unlabeled data affects its performance. We show that the improvement in label-prediction performance depends on how well the data satisfy the model assumptions made by the learning algorithm.
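The two steps described above, deriving an objective label from several annotators by majority vote and then using self-training to exploit unlabeled videos, can be illustrated with a minimal sketch. It is not the dissertation's implementation: the abstract does not name its feature sets or classifiers, so this sketch assumes each video is already a numeric feature vector and swaps in a simple nearest-centroid classifier. The tie rule in `majority_label`, the softmax-style confidence in `predict`, and the parameters `threshold` and `max_iter` are all hypothetical choices made for illustration.

```python
from collections import Counter
import math


def majority_label(votes):
    """Objective label = the label most annotators chose.

    Hypothetical tie rule: return None when the top two labels tie,
    so the video can be excluded or re-annotated.
    """
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None
    return counts[0][0]


def centroids(X, y):
    """Fit a nearest-centroid classifier: mean feature vector per label."""
    counts = Counter(y)
    sums = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(cents, x):
    """Return (label, confidence) for feature vector x.

    Assumed confidence: softmax over negative Euclidean distances,
    shifted by the smallest distance so exp() cannot underflow to 0.
    """
    dists = {label: math.dist(x, c) for label, c in cents.items()}
    d_min = min(dists.values())
    weights = {label: math.exp(-(d - d_min)) for label, d in dists.items()}
    best = min(dists, key=dists.get)
    return best, weights[best] / sum(weights.values())


def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_iter=10):
    """Self-training: repeatedly retrain on the labeled set, then move
    unlabeled points predicted with confidence >= threshold into it,
    using their predicted labels as if they were ground truth."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_iter):
        cents = centroids(X_lab, y_lab)
        confident, rest = [], []
        for x in pool:
            label, conf = predict(cents, x)
            if conf >= threshold:
                confident.append((x, label))
            else:
                rest.append(x)
        if not confident:  # nothing left that the model is sure about
            break
        for x, label in confident:
            X_lab.append(x)
            y_lab.append(label)
        pool = rest
    return centroids(X_lab, y_lab)


if __name__ == "__main__":
    print(majority_label(["sports", "sports", "music"]))  # sports
    model = self_train([[0, 0], [10, 10]], ["a", "b"], [[1, 0], [9, 10], [5, 5]])
    print(predict(model, [2, 1])[0])  # a
```

In the usage example, the two unlabeled points near the labeled ones are absorbed with high confidence, while the point equidistant from both centroids stays unlabeled. This reflects the abstract's observation that the benefit of unlabeled data depends on how well the data fit the model's assumptions, here that each class forms a compact cluster around its centroid.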
Degree
Ph.D.
Advisors
Park, Purdue University.
Subject Area
Computer Engineering; Artificial Intelligence