Research Website

https://engineering.purdue.edu/elab/index.html

Keywords

Reinforcement Learning, Q-learning, Computer Vision, Machine Learning

Presentation Type

Event

Research Abstract

Neural networks have seen success in recent years in applications requiring high-level intelligence, such as categorization and assessment. In this work, we present a neural network model that learns control policies using reinforcement learning. It takes a raw pixel representation of the current state and outputs a neural-network approximation of a Q-value function, which represents the expected reward for each possible state-action pair. The action is chosen with an ε-greedy policy: the action with the highest expected reward is selected, with a small chance of a random action instead. We used gradient descent to update the weights and biases of this network, as it is efficient in both computation and convergence rate even for large-scale models. To test this network, we designed a simple search task over a 4x4 grid. No assumptions were made about the control task: given only the raw state inputs and the rewards received for actions taken in those states, the agent was able to learn the task. Performance was evaluated as the number of rewards received out of 10,000 opportunities. Over the course of 5 epochs, the network achieved significantly higher accuracy than random action alone on low-dimensional state spaces. On higher-dimensional inputs, oscillation was observed, leading to significantly lower accuracy and much higher variability. PCA proved an effective means of feature extraction, reducing the input dimensionality and increasing precision; however, it required a dataset generated from initial random actions.

Session Track

Sensing

Aug 7th, 12:00 AM

Model-free Method of Reinforcement Learning for Visual Tasks


http://docs.lib.purdue.edu/surf/2014/presentations/52