Title

Derivatives and Inverse of a Linear-Nonlinear Multi-Layer Spatial Vision Model

Keywords

Multi-Layer Model, Linear-Nonlinear Model, Deep Network, Jacobian, Inverse

Abstract

Analyzing the mathematical properties of perceptually meaningful linear-nonlinear transforms is interesting because this computation is at the core of many vision models. Here we carry out such an analysis in detail for a specific model [Malo & Simoncelli, SPIE Human Vision Electr. Imag. 2015], which is illustrative because it consists of a cascade of standard linear-nonlinear modules. The interest of the analytic results and the numerical methods involved transcends the particular model because of the ubiquity of the linear-nonlinear structure.

Here we extend [Malo & Simoncelli 15] by considering four layers: (1) linear spectral integration and nonlinear brightness response, (2) definition of local contrast using linear filters and divisive normalization, (3) linear CSF filtering and nonlinear local contrast masking, and (4) linear wavelet-like decomposition and nonlinear divisive normalization to account for orientation- and scale-dependent masking. The extra layers were measured using Maximum Differentiation [Malo et al. VSS 2016].
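
Each of these layers follows the same generic pattern: a linear transform followed by a divisive-normalization nonlinearity. As a sketch in the standard divisive-normalization form (the symbols L^{(k)}, b_i, H_{ij} and the exponent gamma are illustrative placeholders, not the exact notation of the model):

\[
  y^{(k)} = L^{(k)}\, x^{(k)}, \qquad
  x^{(k+1)} = \mathcal{N}^{(k)}\!\big(y^{(k)}\big), \qquad
  \mathcal{N}(y)_i = \frac{\operatorname{sign}(y_i)\,|y_i|^{\gamma}}{b_i + \sum_j H_{ij}\,|y_j|^{\gamma}}
\]

where L^{(k)} is the linear stage of layer k (spectral integration, local-contrast filters, CSF, or wavelet-like decomposition), b_i is a semisaturation constant, and H_{ij} weights the masking interaction between coefficients.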

First, we describe the general architecture using a unified notation in which every module is composed of isomorphic linear and nonlinear transforms. The chain rule simplifies the analysis of systems with this modular architecture, and invertibility is related to the non-singularity of the Jacobian matrices. Second, we consider the details of the four layers in our particular model and how they improve the original version of the model. Third, we explicitly list the derivatives of every module, which are relevant for the definition of perceptual distances, for perceptual gradient descent, and for characterizing the deformation of the space. Fourth, we address the inverse and find different analytical and numerical problems in each specific module; solutions are proposed for all of them. Finally, we describe through examples how to use the toolbox to apply and check the above theory.
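
The role of the chain rule can be made explicit. Writing the full response as the composition S = S^{(4)} \circ S^{(3)} \circ S^{(2)} \circ S^{(1)}, the Jacobian with respect to the input factorizes into per-layer Jacobians, and the second-order perceptual distance follows from it (schematic notation, as above):

\[
  \nabla_x S(x) = \nabla S^{(4)} \cdot \nabla S^{(3)} \cdot \nabla S^{(2)} \cdot \nabla S^{(1)},
  \qquad
  d(x, x + \Delta x)^2 \approx \Delta x^{\top}\, \nabla_x S(x)^{\top}\, \nabla_x S(x)\, \Delta x
\]

The composition is invertible wherever every factor \nabla S^{(k)} is non-singular, which is the condition examined module by module.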

In summary, the formulation and toolbox are ready to explore the geometric and perceptual issues addressed in the introductory section, giving all the technical information that was missing in [Malo & Simoncelli 15].
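
To make the Jacobian machinery concrete, here is a minimal numerical sketch in plain NumPy (this is not the authors' MATLAB toolbox, and every name and parameter value is hypothetical): it implements one divisive-normalization module for positive signals, its analytic Jacobian, and the finite-difference and non-singularity checks discussed above.

import numpy as np

def dn_forward(y, b, H, gamma=1.0):
    # One divisive-normalization module for positive inputs:
    # x_i = y_i^gamma / (b_i + sum_j H_ij * y_j^gamma)
    yg = y ** gamma
    return yg / (b + H @ yg)

def dn_jacobian(y, b, H, gamma=1.0):
    # Quotient rule: dx_i/dy_j = delta_ij*g_j/d_i - yg_i*H_ij*g_j/d_i^2,
    # where d_i is the denominator and g_j = d(y_j^gamma)/dy_j.
    yg = y ** gamma
    d = b + H @ yg
    g = gamma * y ** (gamma - 1.0)
    return np.diag(g / d) - (yg / d**2)[:, None] * H * g[None, :]

rng = np.random.default_rng(0)
n = 6
y = rng.uniform(0.5, 2.0, n)        # positive input coefficients
b = rng.uniform(0.1, 1.0, n)        # semisaturation constants
H = rng.uniform(0.0, 0.5, (n, n))   # masking interaction kernel

J = dn_jacobian(y, b, H)

# Central finite differences, one column per input coefficient
eps = 1e-6
J_fd = np.zeros((n, n))
for j in range(n):
    e = np.zeros(n)
    e[j] = eps
    J_fd[:, j] = (dn_forward(y + e, b, H) - dn_forward(y - e, b, H)) / (2 * eps)

print("max |J - J_fd|:", np.max(np.abs(J - J_fd)))        # should be ~1e-9 or smaller
print("non-singular (locally invertible):", np.linalg.matrix_rank(J) == n)

For gamma = 1 this module can even be inverted in closed form, since x = dn_forward(y) rearranges to the linear system (I - diag(x) H) y = diag(x) b; for other exponents, and for the other layers, each module requires its own analytical or numerical treatment, as discussed above.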

Start Date

May 12, 2016, 9:50 AM

End Date

May 12, 2016, 10:15 AM
