Date of Award


Degree Type


Degree Name

Doctor of Philosophy (PhD)



Committee Chair

Ralph Siebert

Committee Member 1

Mohitosh Kejriwal

Committee Member 2

Justin Tobias

Committee Member 3

Stephen Martin

Committee Member 4

Joe Mazur


This dissertation is composed of three independent chapters relating the theory and empirical methodology in economics to machine learning and important topics in information age . The first chapter raises an important problem in structural estimation and provide a solution to it by incorporating a culture in machine learning. The second chapter investigates a problem of statistical discrimination in big data era. The third chapter studies the implication of information uncertainty in the security software market.

Structural estimation is a widely used methodology in empirical economics, and a large class of structural econometric models are estimated through the generalized method of moments (GMM). Traditionally, a model to be estimated is chosen by researchers based on their intuition on the model, and the structural estimation itself does not directly test it from the data. In other words, not sufficient amount of attention is paid to devise a principled method to verify such an intuition. In the first chapter, we propose a model selection for GMM by using cross-validation, which is widely used in machine learning and statistics communities. We prove the consistency of the cross-validation. The empirical property of the proposed model selection is compared with existing model selection methods by Monte Carlo simulations of a linear instrumental variable regression and oligopoly pricing model. In addition, we propose the way to apply our method to Mathematical Programming of Equilibrium Constraint (MPEC) approach. Finally, we perform our method to online-retail sales data to compare dynamic model to static model.

In the second chapter, we study a fair machine learning algorithm that avoids a statistical discrimination when making a decision. Algorithmic decision making process now affects many aspects of our lives. Standard tools for machine learning, such as classification and regression, are subject to the bias in data, and thus direct application of such off-the-shelf tools could lead to a specific group being statistically discriminated. Removing sensitive variables such as race or gender from data does not solve this problem because a disparate impact can arise when non-sensitive variables and sensitive variables are correlated. This problem arises severely nowadays as bigger data is utilized, it is of particular importance to invent an algorithmic solution. Inspired by the two-stage least squares method that is widely used in the field of economics, we propose a two-stage algorithm that removes bias in the training data. The proposed algorithm is conceptually simple. Unlike most of existing fair algorithms that are designed for classification tasks, the proposed method is able to (i) deal with regression tasks, (ii) combine explanatory variables to remove reverse discrimination, and (iii) deal with numerical sensitive variables. The performance and fairness of the proposed algorithm are evaluated in simulations with synthetic and real-world datasets.

The third chapter examines the issue of information uncertainty in the context of information security. Many users lack the ability to correctly estimate the true quality of the security software they purchase, as evidenced by some anecdotes and even some academic research. Yet, most of the analytical research assumes otherwise. Hence, we were motivated to incorporate this “false sense of security” behavior into a game-theoretic model and study the implications on welfare parameters. Our model features two segments of consumers, well-and ill-informed, and the monopolistic software vendor. Well-informed consumers observe the true quality of the security software, while the ill-informed ones overestimate. While the proportion of both segments are known to the software vendor, consumers are uncertain about the segment they belong to. We find that, in fact, the level of the uncertainty is not necessarily harmful to society. Furthermore, there exist some extreme circumstances where society and consumers could be better off if the security software did not exist. Interestingly, we also find that the case where consumers know the information structure and weight their expectation accordingly does not always lead to optimal social welfare. These results contrast with the conventional wisdom and are crucially important in developing appropriate policies in this context.