Date of Award
Fall 2013
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Electrical and Computer Engineering
First Advisor
Jeffrey M. Siskind
Committee Chair
Jeffrey M. Siskind
Committee Member 1
Robert L. Givan
Committee Member 2
Thomas M. Talavage
Committee Member 3
Anthony G. Cohn
Abstract
Humans not only outperform AI and computer-vision systems, but also use an unknown computational mechanism to perform tasks for which no suitable approaches exist. I present work investigating both novel tasks and how humans approach them in the context of computer vision and linguistics. I demonstrate a system which, like children, acquires high-level linguistic knowledge about the world. Robots learn to play physically instantiated board games and use that knowledge to engage in physical play. To further integrate language and vision, I develop an approach which produces rich sentential descriptions of events depicted in videos. I then show how to simultaneously detect and track objects, recognize events, and produce sentences. This tighter integration of language and vision enables a novel task: sentential video retrieval. A video corpus can be searched for clips which depict a target sentence rather than just a collection of individual query words. This work assumes a compositional representation of events, composing sentence models from word models. Perhaps humans perform tasks such as the above with ease because of a tight integration of language and vision that exploits the compositionality inherent in both modalities. I present work indicating that this may be the case. Humans are shown videos while fMRI data is acquired, and sentences which describe those videos are recovered compositionally from that data.
Recommended Citation
Barbu, Andrei, "Reasoning Across Language and Vision in Machines and Humans" (2013). Open Access Dissertations. 181.
https://docs.lib.purdue.edu/open_access_dissertations/181