Keywords

object recognition, visual feature importance, human in the loop, convolutional neural networks

Abstract

A central goal in vision science is to identify features that are important for object and scene recognition. Reverse correlation methods have been used to uncover features important for recognizing faces and other stimuli with low intra-class variability. However, these methods are less successful when applied to natural scenes, whose appearance varies considerably.

To address this, we developed Clicktionary, a web-based game for identifying the features that are important for recognizing real-world objects. Pairs of participants play together in different roles to identify objects: a “teacher” reveals image regions diagnostic of the object’s category while a “student” tries to recognize the object. Aggregating game data across players yields importance maps for objects, where each pixel is scored by its contribution to recognition. We found that these importance maps are consistent across participants and identify object features that are distinct both from those used by deep convolutional networks (DCNs) for object recognition and from those predicted by salience maps derived from human participants and from models.
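As an illustration of the aggregation step, the following is a minimal sketch and not the authors' scoring procedure. It assumes, hypothetically, that each game is logged as a list of (row, column) teacher clicks, that each click reveals a circular region of fixed radius, and that a pixel's importance is the fraction of games in which it was revealed. The function names `click_mask` and `importance_map` and the `radius` parameter are invented for this example.

```python
import numpy as np

def click_mask(shape, clicks, radius):
    """Boolean mask of pixels revealed by circular regions around each click."""
    rows, cols = np.ogrid[:shape[0], :shape[1]]
    mask = np.zeros(shape, dtype=bool)
    for r, c in clicks:
        mask |= (rows - r) ** 2 + (cols - c) ** 2 <= radius ** 2
    return mask

def importance_map(shape, games, radius=2):
    """Fraction of games in which each pixel was revealed (assumed scoring rule)."""
    total = np.zeros(shape, dtype=float)
    for clicks in games:
        total += click_mask(shape, clicks, radius)
    return total / max(len(games), 1)

# Two hypothetical games on a 10x10 image: both teachers click near (2, 2),
# and only the second also clicks near (7, 7).
games = [[(2, 2)], [(2, 2), (7, 7)]]
imap = importance_map((10, 10), games, radius=1)
```

Under these assumptions, pixels that every teacher reveals score 1.0 and pixels no teacher reveals score 0.0, so agreement across players shows up directly in the map.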

We also extended Clicktionary to support large-scale feature map discovery (http://clickme.ai), whereby human teachers play with DCN students. This has generated a dataset of tens of thousands of unique images, which we incorporate into DCN training routines so that the networks emphasize these features. This procedure changes DCN object representations, reducing their reliance on background information and highlighting features similar to those used by humans. The human feature importance maps identified by Clicktionary, together with our DCN models trained on this information, will enable a richer understanding of the foundations of object recognition.
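The abstract does not specify how the importance maps enter DCN training. One common way to encourage a model to emphasize given image regions is to add a penalty term to the classification loss; the sketch below shows that generic idea only, under the assumption that the model exposes some per-pixel map (`model_map`) comparable to a human importance map (`human_map`). The function names and the `weight` parameter are hypothetical.

```python
import numpy as np

def normalize(m, eps=1e-8):
    """Shift a map to be non-negative and scale it to sum to (about) one."""
    m = m - m.min()
    return m / (m.sum() + eps)

def combined_loss(probs, label, model_map, human_map, weight=1.0):
    """Cross-entropy plus a squared-difference penalty between the two maps."""
    cross_entropy = -np.log(probs[label] + 1e-12)
    penalty = np.sum((normalize(model_map) - normalize(human_map)) ** 2)
    return cross_entropy + weight * penalty

# Hypothetical 2x2 maps: the human map marks the top-right pixel,
# the misaligned model map marks the bottom-left pixel.
probs = np.array([0.5, 0.5])
human = np.array([[0.0, 1.0], [0.0, 0.0]])
misaligned = np.array([[0.0, 0.0], [1.0, 0.0]])

loss_aligned = combined_loss(probs, 0, human, human)
loss_misaligned = combined_loss(probs, 0, misaligned, human)
```

In this sketch, a model whose map matches the human map pays only the classification cost, while a model attending elsewhere (for example, to the background) pays an additional penalty scaled by `weight`.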

Start Date

17-5-2017 3:54 PM

End Date

17-5-2017 4:16 PM

Large-scale discovery of visual features for object recognition
