Content analysis of user generated video

Golnaz Abdollahian, Purdue University

Abstract

Due to the availability of online repositories such as YouTube and of social networking sites, there has been a tremendous increase in the amount of personal video generated and consumed by average users. Such video is often referred to as user-generated video (UGV) or consumer video, as opposed to produced video, which is shot and edited by professionals; examples of produced video include television programs, movies, and commercials. The large amount of user-generated content available has increased the need for compact representations of UGV sequences that are intuitive for users and let them quickly and easily browse large collections of video data, especially on portable devices such as mobile phones. UGV has special properties that distinguish it from produced video and make its content analysis more challenging; many analysis techniques designed for produced video cannot be applied directly to UGV.

In this thesis we developed a system for the analysis of UGV. The system consists of two major components: camera motion-based video summarization and video annotation based on location information. UGV often has a rich camera motion structure that is created at the time the video is recorded by the person operating the camera, i.e., the “camera person.” We exploit this structure by defining a new concept, the camera view, for temporal segmentation of UGV. The segmentation provides a video summary with unique properties that is useful in applications such as video annotation. Camera motion is also a powerful feature for identifying keyframes and regions of interest (ROIs), since it indicates the camera person’s interests in the scene and can also attract the viewers’ attention. We examined the effect of camera motion on human visual attention through an eye tracking experiment. The results showed a strong dependency between the distribution of the viewers’ fixation points and the direction of camera movement. Based on these results, we introduced a new location-based saliency map generated from camera motion parameters. This map is combined with saliency maps derived from other features, such as color contrast, object motion, and face detection, to determine the ROIs. The output of the summarization step is a set of keyframes with highlighted ROIs. Several user studies were designed and conducted to assess the performance of our system; the subjective evaluations indicated that our system produces video summaries that are consistent with viewers’ preferences.

The video annotation component of the proposed system generates a set of tags, or keywords, for the video, which is provided to the user along with the video summary. Our annotation approach uses the location metadata of the video together with visual features to find the tags in a user-tagged image database that are most likely relevant to the video. We examined different image matching methods for measuring the visual similarity between the video keyframes and the images in the database. These methods were tested and compared against each other over a variety of cases.
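For illustration only, the following is a minimal sketch (not taken from the thesis) of the kind of saliency fusion described above: per-feature saliency maps, plus a camera motion-based location prior, are combined by a weighted sum and thresholded to obtain an ROI. The weights, the Gaussian location prior, and the function names are assumptions made for this sketch.

```python
import numpy as np

def normalize(saliency):
    """Scale a saliency map to [0, 1]; flat maps are mapped to all zeros."""
    s_min, s_max = saliency.min(), saliency.max()
    if s_max - s_min < 1e-8:
        return np.zeros_like(saliency, dtype=float)
    return (saliency - s_min) / (s_max - s_min)

def camera_motion_prior(shape, pan_x, pan_y):
    """Hypothetical location prior: bias saliency toward the direction the
    camera is panning, since viewers tend to look where the camera moves.
    pan_x and pan_y are assumed camera motion parameters in [-1, 1]."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Shift the center of a Gaussian in the panning direction.
    cx = w / 2 + pan_x * w / 4
    cy = h / 2 + pan_y * h / 4
    sigma = 0.25 * min(h, w)
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def fuse_saliency(maps, weights):
    """Weighted linear combination of normalized saliency maps."""
    combined = np.zeros(next(iter(maps.values())).shape, dtype=float)
    for name, m in maps.items():
        combined += weights.get(name, 0.0) * normalize(m.astype(float))
    return normalize(combined)

def roi_bounding_box(saliency, threshold=0.6):
    """Return (top, left, bottom, right) of pixels above the threshold."""
    mask = saliency >= threshold
    if not mask.any():
        return None
    ys, xs = np.where(mask)
    return ys.min(), xs.min(), ys.max(), xs.max()

if __name__ == "__main__":
    h, w = 120, 160
    rng = np.random.default_rng(0)
    maps = {
        "color":    rng.random((h, w)),   # stand-in for a color-contrast map
        "motion":   rng.random((h, w)),   # stand-in for an object-motion map
        "face":     np.zeros((h, w)),     # stand-in for a face-detection map
        "location": camera_motion_prior((h, w), pan_x=0.5, pan_y=0.0),
    }
    weights = {"color": 0.2, "motion": 0.3, "face": 0.2, "location": 0.3}  # illustrative values
    combined = fuse_saliency(maps, weights)
    print("ROI box:", roi_bounding_box(combined))
```

In practice each map would come from an actual detector (color contrast, object motion, face detection) and the camera motion parameters from a global motion estimator; this sketch only shows how such maps could be normalized, weighted, and reduced to a highlighted ROI on a keyframe.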

Degree

Ph.D.

Advisors

Delp, Purdue University.

Subject Area

Electrical engineering
