Keywords

V4, Curvature, Neural Net, Invariance

Abstract

Convolutional neural nets (CNNs) are currently the highest-performing image recognition algorithms. Of interest is whether these CNNs, following extensive supervised training, perform computations similar to those in the ventral visual stream. We investigated whether CNN units' tuning for shape boundaries was similar to that of V4 neurons, as described by the angular position and curvature (APC) model of Pasupathy and Connor (2001). From units in all layers of AlexNet (see Figure A), an object recognition CNN, we recorded responses to the original study's set of shape stimuli (51 simple closed shapes at up to 8 rotations) presented at 51 spatial translations (2-pixel increments). We found many units in all layers with V4-like APC shape tuning, but only the later layers had the translation invariance to deem them truly V4-like (Figure B). We then asked whether the CNN could directly predict responses of V4 neurons better than the simpler APC model (Figure D). We found that even model units in the second layer could serve as good V4 models, so we began to probe quantitatively how the early layers represent form and color. We found that the first layer (Figure C) can be described with a handful of parameters (orientation, peak frequency, bandwidth), and that the pattern of weights in the second layer approximates classical V1 properties, including cross-orientation suppression. We will discuss the implications of these results for mid-level visual encoding and the development of state-of-the-art image-computable models for mid-level visual representation.
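The abstract does not specify the exact translation-invariance metric used to compare layers. A minimal sketch of one plausible choice, assuming a unit's responses are stored as a (shapes × positions) array: score each unit by the mean pairwise correlation of its shape-tuning curve across spatial positions (1.0 means the tuning curve is identical at every translation). The function name and data layout here are illustrative, not taken from the study.

```python
import numpy as np

def translation_invariance(resp):
    """Mean pairwise Pearson correlation of a unit's shape-tuning
    curve across spatial positions.

    resp : (n_shapes, n_positions) array of one unit's responses.
    Returns a scalar in [-1, 1]; 1.0 = fully translation invariant.
    """
    # Correlate the position-specific tuning curves with each other.
    c = np.corrcoef(resp.T)                    # (n_positions, n_positions)
    iu = np.triu_indices(resp.shape[1], k=1)   # off-diagonal pairs only
    return c[iu].mean()

# Toy check: a unit whose 51-shape tuning curve repeats exactly at
# 5 positions should score 1.0.
rng = np.random.default_rng(0)
tuning = rng.random(51)
invariant_unit = np.tile(tuning[:, None], (1, 5))
print(translation_invariance(invariant_unit))
```

A unit with strong APC-like tuning at one position but uncorrelated tuning elsewhere would score near zero under this measure, which is the distinction the abstract draws between early and late layers.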

Start Date

17-5-2017 9:44 AM

End Date

17-5-2017 10:06 AM

Location

University of Washington, Seattle


Evaluating and Interpreting a Convolutional Neural Net as a Model of V4