On human emotion and activity analysis

Rami Alazrai, Purdue University

Abstract

This thesis investigates methodologies for recognizing human emotions, single-person daily-life activities, and human-human interactions from video, using non-verbal behavioral signals such as facial expressions, body postures, actions, and interactions as input. Two recognition schemes have been investigated and developed for recognizing human activity and emotion from an input video. In the first scheme, we propose to decouple the spatial context of the human body and facial expressions from their temporal dynamics, rather than treating both modalities simultaneously. To achieve this decoupling, we have developed two techniques for automatically identifying the temporal phases of human actions and emotions in input videos, such that each temporal phase is completely characterized by its spatial context. Building on this, we have developed a two-layered framework for recognizing emotional and action states by analyzing their temporal dynamics through the decoupled spatial context. In the first layer, the decoupled spatial context is used to convert the input video into a string of labels, where each label is the class of the temporal phase to which the corresponding frame belongs. In the second layer, a temporal analysis classifies the label sequence generated by the first layer into one of the emotional/action states. In the second scheme, we propose a video classification platform based on a Nonlinear AutoRegressive with eXogenous input (NARX) model realized with a recurrent neural network, which we call the recurrent NARX network model. This model is used to recognize human emotions and actions from input videos by formulating video classification as a parametric temporal-sequence regression problem and solving it in a temporal-spatial fashion.

Computer simulations and experiments on publicly available databases were conducted to evaluate the performance of both recognition schemes. With the decoupling scheme, the average recognition rates for human activities and emotions were 98.89% and 93.53%, respectively, outperforming the average recognition rates obtained with the recurrent NARX network model by approximately 4%.

Unlike single-person activities, recognizing human-human interactions is more challenging: it requires taking into account the semantic meanings and inter-relations of each person's moving body parts. For this purpose, we have developed a view-invariant geometric representation that uses the 3D poses of body joints, obtained as RGB-D data from a Kinect sensor, to capture the semantic meaning of the different spatiotemporal configurations of two interacting persons. The representation uses the concept of anatomical planes to construct motion and pose profiles for each interacting person; the two profiles are then concatenated to form a geometric descriptor of the interacting pair. Using this representation, we have developed frameworks for human-human interaction analysis from two perspectives: classification and prediction from an input video. Both frameworks were evaluated on a human-human interaction dataset collected in our lab, consisting of 27,500 frames of 12 individuals performing 12 different interactions. The classification framework achieved an average recognition rate of 94.86%, while the prediction framework achieved an average prediction accuracy of 82.46% at a progress level of 50%.
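The two-layered decoupling scheme can be illustrated with a minimal Python sketch (not the thesis implementation): layer one maps each frame's feature vector to a temporal-phase label, and layer two classifies the resulting label string. The nearest-centroid phase labeler, the bigram-histogram sequence matcher, and all class names here are hypothetical placeholders for whatever trained models a real system would use.

```python
from collections import Counter

# Layer 1 (placeholder): assign each frame a temporal-phase label via
# nearest centroid in feature space. A real system would use a trained
# per-frame classifier instead.
def label_frames(frames, phase_centroids):
    labels = []
    for f in frames:
        best = min(phase_centroids,
                   key=lambda p: sum((a - b) ** 2
                                     for a, b in zip(f, phase_centroids[p])))
        labels.append(best)
    return "".join(labels)

# Layer 2 (placeholder): classify the label string by matching its
# histogram of phase transitions (bigrams) against per-class templates.
def classify_sequence(label_string, class_templates):
    bigrams = Counter(label_string[i:i + 2]
                      for i in range(len(label_string) - 1))
    def score(template):
        return sum(min(bigrams[b], template.get(b, 0)) for b in bigrams)
    return max(class_templates, key=lambda c: score(class_templates[c]))

# Tiny demo with hypothetical 1-D "frames" and two made-up action classes.
centroids = {"n": (0.0,), "p": (1.0,)}          # neutral / peak phases
templates = {"wave": {"np": 2, "pn": 2}, "still": {"nn": 4}}
frames = [(0.1,), (0.9,), (0.2,), (1.1,), (0.0,)]
labels = label_frames(frames, centroids)        # -> "npnpn"
action = classify_sequence(labels, templates)   # -> "wave"
print(labels, action)
```

The point of the sketch is the decoupling itself: once each frame is reduced to a phase label, the temporal analysis operates on a short symbolic string rather than on raw spatiotemporal features.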

Degree

Ph.D.

Advisors

Lee, Purdue University.

Subject Area

Electrical engineering
