Capturing the Representational Dynamics of Face Perception in Deep Recurrent Neural Networks

Abstract

We investigate the representational dynamics of recurrent convolutional neural networks (RCNNs) in order to understand the time course of visual recognition. We explore a family of models with bottom-up and lateral connections that were optimized for face-identification and object-recognition tasks. Using representational similarity analysis (RSA), we observed that only models trained for face identification showed a late-emerging, prominent distinction among identities, as seen in the monkey face patch AM. Early model responses strongly separated faces from objects. These findings suggest that the face-recognition dynamics emerging in a hierarchical recurrent neural network prioritize category-level recognition (face detection) at early stages, triggering later category-specific computations that enable individual-level recognition (face identification), consistent with neurophysiological findings. Our results also show that models trained simultaneously on both face identification and object recognition were more likely to show the signature of mirror-symmetric viewpoint tuning in their intermediate representations, as has been reported for monkey face patch AL. We also examined the tuning properties of individual units in the last layer of our network across time steps. After embedding the face and non-face images in a representational space, each unit's tuning was determined as the direction along which its responses increased. With increasing time steps, the units showed an emerging identity-discrimination tuning that has recently been observed in primate face patches. Taken together, these results provide a candidate mechanistic account of primate face perception, consistent with evidence on individual-unit tuning and population geometry.
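
For illustration, below is a minimal sketch of the kind of RSA comparison described above: model representational dissimilarity matrices (RDMs) are computed from unit activations at each recurrent time step and correlated with a neural RDM. All arrays, sizes, and names here are hypothetical placeholders, not the models, stimuli, or recordings used in the study.

```python
# Minimal RSA sketch with placeholder data (not the study's models or recordings).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(responses):
    """Condensed representational dissimilarity matrix.
    `responses` has shape (n_stimuli, n_units); dissimilarity is 1 - Pearson r."""
    return pdist(responses, metric="correlation")

# Hypothetical model activations per recurrent time step, plus a hypothetical
# neural RDM (e.g., one estimated from face patch AM responses to the same stimuli).
n_stimuli, n_units, n_timesteps = 100, 512, 8
rng = np.random.default_rng(0)
model_acts = rng.standard_normal((n_timesteps, n_stimuli, n_units))  # placeholder
neural_rdm = rdm(rng.standard_normal((n_stimuli, 300)))              # placeholder

# Correlating the model RDM with the neural RDM at each time step traces how
# identity structure could emerge over recurrent computation.
for t in range(n_timesteps):
    rho, _ = spearmanr(rdm(model_acts[t]), neural_rdm)
    print(f"time step {t}: Spearman rho = {rho:.3f}")
```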

Keywords

object recognition, recurrent models, face perception

Start Date

15-5-2025 11:00 AM

End Date

15-5-2025 11:30 AM
