Date of Award

Fall 2013

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Electrical and Computer Engineering

First Advisor

Jeffrey M. Siskind

Committee Chair

Jeffrey M. Siskind

Committee Member 1

Robert L. Givan

Committee Member 2

Thomas M. Talavage

Committee Member 3

Anthony G. Cohn

Abstract

Humans not only outperform AI and computer-vision systems but also use an unknown computational mechanism to perform tasks for which no suitable approaches exist. I present work investigating both novel tasks and how humans approach them, in the context of computer vision and linguistics. I demonstrate a system that, like children, acquires high-level linguistic knowledge about the world. Robots learn to play physically instantiated board games and use that knowledge to engage in physical play. To further integrate language and vision, I develop an approach that produces rich sentential descriptions of events depicted in videos. I then show how to simultaneously detect and track objects, recognize events, and produce sentences. This tighter integration of language and vision enables a novel task: sentential video retrieval. A video corpus can be searched for clips that depict a target sentence rather than just a collection of individual query words. This work assumes a compositional representation of events, composing sentence models from word models. Perhaps humans perform such tasks with ease because of a tight integration of language and vision that exploits the compositionality inherent in both modalities. I present work indicating that this may be the case. Humans are shown videos while fMRI data are acquired, and sentences that describe those videos are recovered compositionally.
