#### Research Title

Model-free Method of Reinforcement Learning for Visual Tasks

#### Research Website

https://engineering.purdue.edu/elab/index.html

#### Keywords

Reinforcement Learning, Q-learning, Computer Vision, Machine Learning

#### Presentation Type

Event

#### Research Abstract

Neural networks have seen recent success in applications requiring high-level intelligence, such as categorization and assessment. In this work, we present a neural network model that learns control policies using reinforcement learning. The network takes a raw pixel representation of the current state and outputs an approximation of the Q-value function, which represents the expected reward for each possible state-action pair. Actions are chosen using an ε-greedy policy: the action with the highest expected reward is selected, with a small probability of random action. We used gradient descent to update the weights and biases of the network, as it is efficient in both computation and convergence rate even for large-scale models. To test the network, we designed a simple search task over a 4x4 grid. No assumptions were made about the control task: given only the raw state input and the reward received for actions paired with that state, the agent was able to learn the task. Performance was evaluated as the number of rewards received out of 10,000 opportunities. Over the course of 5 epochs, the network achieved significantly higher accuracy than random action alone on low-dimensional state spaces. On higher-dimensional inputs, oscillation was observed, leading to significantly lower accuracy and much higher variability. Principal component analysis (PCA) proved to be an effective means of feature extraction, reducing input dimensionality and increasing precision; however, it required a dataset generated from initial random action.
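As an illustrative sketch only (not the authors' code), the approach described above could look like the following: a one-hot "pixel" encoding of a 4x4 grid state, four movement actions, a single linear layer (weights and biases) approximating Q-values, ε-greedy action selection, and gradient descent on the temporal-difference error. The goal cell, reward of 1, and all hyperparameters are assumptions for the sake of the example.

```python
import numpy as np

rng = np.random.default_rng(0)

GRID = 4
N_STATES = GRID * GRID          # 4x4 grid, one-hot "raw pixel" input
N_ACTIONS = 4                   # up, down, left, right
GOAL = N_STATES - 1             # hypothetical goal cell (bottom-right)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1

# Linear Q-network: Q(s, .) = W @ one_hot(s) + b
W = np.zeros((N_ACTIONS, N_STATES))
b = np.zeros(N_ACTIONS)

def one_hot(s):
    v = np.zeros(N_STATES)
    v[s] = 1.0
    return v

def q_values(s):
    return W @ one_hot(s) + b

def step(s, a):
    # Move within the grid; reward 1 only on reaching the goal cell.
    r, c = divmod(s, GRID)
    if a == 0:
        r = max(r - 1, 0)
    elif a == 1:
        r = min(r + 1, GRID - 1)
    elif a == 2:
        c = max(c - 1, 0)
    else:
        c = min(c + 1, GRID - 1)
    s2 = r * GRID + c
    return s2, (1.0 if s2 == GOAL else 0.0)

for episode in range(1000):
    s = int(rng.integers(N_STATES))
    for _ in range(50):
        # epsilon-greedy: mostly exploit, occasionally act at random
        if rng.random() < EPS:
            a = int(rng.integers(N_ACTIONS))
        else:
            a = int(np.argmax(q_values(s)))
        s2, reward = step(s, a)
        # TD target; no bootstrapping past the terminal goal state
        target = reward + GAMMA * np.max(q_values(s2)) * (s2 != GOAL)
        td_err = target - q_values(s)[a]
        # Gradient-descent update of weights and bias for the taken action
        W[a] += ALPHA * td_err * one_hot(s)
        b[a] += ALPHA * td_err
        if s2 == GOAL:
            break
        s = s2
```

After training, acting greedily with respect to the learned Q-values drives the agent from any cell toward the goal; the same loop generalizes to richer pixel inputs, which is where the dimensionality issues the abstract reports begin to appear.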

#### Session Track

Sensing

#### Recommended Citation

Jeff S. Soldate, Jonghoon Jin, and Eugenio Culurciello,
"Model-free Method of Reinforcement Learning for Visual Tasks"
(August 7, 2014).
*The Summer Undergraduate Research Fellowship (SURF) Symposium.*
Paper 52.

http://docs.lib.purdue.edu/surf/2014/presentations/52
