Abstract
Deep neural networks (DNNs) have become influential computational models of human vision, particularly in explaining neural responses in the ventral stream. However, they frequently diverge from well-established findings in psychophysics, especially with regard to human perceptual biases, robustness, and generalization behavior. Much of the progress in aligning DNNs with human perception has focused on the task of core object recognition—the rapid identification of objects in static images (DiCarlo et al., 2012). In contrast, other critical dimensions of visual perception, such as motion processing and multi-object scene understanding, remain comparatively underexplored. In this talk, I present our recent work on modeling how motion supports perceptual organization and discuss both the prospects and limitations of using DNNs as scientific models of dynamic scene perception. I argue that geometric aspects of visual experience—particularly motion and depth—offer a promising path forward for bridging human and machine vision.
Keywords
DNNs, Motion, Perceptual Organization, Scene Perception
Start Date
May 15, 2025, 11:30 AM
End Date
May 15, 2025, 12:00 PM
Recommended Citation
Tangemann, Matthias; Kümmerer, Matthias; and Bethge, Matthias, "Beyond Core Object Recognition: DNNs as Models of Dynamic Scene Perception" (2025). MODVIS Workshop. 2.
https://docs.lib.purdue.edu/modvis/2025/Program/2