A Generative Approach to Understanding Categorical Visual Search

Keywords

generative vision, visual search, template

Abstract

Search theory relies heavily on the concept of a template, an internal representation of the target that, via a matching process with the visual input, creates a top-down signal biasing attention toward the target. The search template was originally conceptualized as being specific to a given object, but over the years this definition broadened to include the features of a target category. Exploiting recent generative methods, we suggest re-conceptualizing the search template yet again, thinking of it now as a fully generated target object residing in peripheral vision rather than just a collection of features. Our approach is to generate potential target-object appearances from degraded peripheral pixels. For example, when searching for a mosquito, our attention may be drawn to any small, roundish objects because they provide an ideal canvas for generating or attaching limbs. We used an adversarial training method to reconstruct peripheral objects so that they more closely resemble the typical appearance of the target category. We quantified the extent of the pixel changes required by this reconstruction and tested whether this "reconstruction cost" accounts for target guidance in both digit and natural object-array search tasks. Even though it was not explicitly trained for target-object detection or search tasks, our model located target objects with remarkable accuracy (~90%), particularly in blurred peripheral input, outperforming a DNN-based detector baseline. Moreover, the model exhibited strong behavioral alignment with human eye-movement data collected during the same task. For example, it explained attention guidance as well as or better than the object-detector baseline in both target-present and target-absent conditions (target-present: ours, Pearson's r = 0.891, p = 0.013, vs. detector, r = 0.911, p = 0.012; target-absent: ours, r = 0.332, p = 0.052, vs. detector, r = 0.134, p = 0.056). Our work suggests that the target template may be an internal generation of a potential search target in peripheral vision.
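
To make the "reconstruction cost" idea concrete, here is a minimal sketch of how such a cost could drive a priority map, assuming a category-conditioned generator. The function names, the generator interface, and the L2 cost are our illustrative assumptions, not the authors' implementation.

    import numpy as np

    def reconstruction_cost(patch, generator, category):
        """Pixel change needed to make `patch` resemble `category`.

        `generator` is assumed to be a category-conditioned model
        (e.g., adversarially trained) that maps a degraded peripheral
        patch to a typical appearance of `category`. This interface is
        a hypothetical stand-in, not the paper's actual API.
        """
        reconstructed = generator(patch, category)
        # L2 magnitude of the pixel changes the generator had to make
        return float(np.mean((reconstructed - patch) ** 2))

    def priority_map(patches, generator, category):
        """Attention priority per candidate location: peripheral patches
        that need the least alteration to become the target are the best
        'canvas' and receive the highest priority."""
        costs = np.array([reconstruction_cost(p, generator, category)
                          for p in patches])
        return -costs  # lower reconstruction cost -> higher priority

Under this reading, a search model could fixate the location with the highest priority; on target-absent trials, uniformly high costs would predict weaker guidance, consistent with the weaker target-absent correlations reported above.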

Start Date

15-5-2024 4:00 PM

End Date

15-5-2024 5:00 PM
