Using reinforcement learning to learn relevance ranking of search queries

Hareesh Sandupatla, Purdue University

Abstract

Web search has become a part of everyday life for hundreds of millions of users around the world. The effectiveness of a user's search, however, depends critically on the quality of the search result ranking. Although enormous effort has gone into improving ranking quality, there is still significant misalignment between search engine rankings and an end user's preference order. This is evident from the fact that, on major search and e-commerce platforms, many users ignore the top-ranked results and click on lower-ranked ones. Finding a ranking that suits every user is difficult because every user's need is different, so an ideal ranking is one that is preferred by the majority of users. This motivates an automated approach that improves the search engine ranking dynamically by incorporating user clicks into the ranking algorithm, a direction that existing search result ranking methodologies have not explored in depth. A key challenge in using user clicks is that the relevance feedback learned from click data is imperfect: a user is more likely to click a top-ranked result than a lower-ranked one, irrespective of the actual relevance of those results. This phenomenon, known as position bias, poses a major difficulty for any automated method that dynamically updates search rank orders. In my thesis, I propose a set of methodologies that incorporate user clicks for the dynamic update of search rank orders. The updates are based on adaptive randomization of results using a reinforcement learning strategy that treats user click activity as the reinforcement signal. Starting from any rank order of the search results, the proposed methodologies are guaranteed to converge to a ranking that is close to the ideal rank order. In addition, the use of a reinforcement learning strategy enables the proposed methods to overcome position bias. To measure the effectiveness of the proposed methods, I perform experiments under a simplified user behavior model that I call the color ball abstraction model. I evaluate the quality of the proposed methodologies using standard information retrieval metrics: Precision at n (P@n), Kendall tau rank correlation, Discounted Cumulative Gain (DCG), and Normalized Discounted Cumulative Gain (NDCG). The experimental results demonstrate the success of the proposed methodologies.
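
For readers unfamiliar with the evaluation metrics named above, the following is a minimal Python sketch of how they are commonly computed. It is an illustration only, not the thesis's implementation: the abstract does not state which variant of each metric is used, so the linear-gain DCG with a log2(rank + 1) discount, the tie-free Kendall tau, and all function names and example data here are assumptions.

```python
import math
from itertools import combinations


def precision_at_n(relevance, n):
    """Fraction of the top-n results that are relevant (grade > 0)."""
    return sum(1 for r in relevance[:n] if r > 0) / n


def dcg(relevance, n=None):
    """DCG with linear gain and a log2(rank + 1) discount (assumed variant)."""
    top = relevance if n is None else relevance[:n]
    # enumerate is 0-based, so rank = i + 1 and the discount is log2(i + 2).
    return sum(r / math.log2(i + 2) for i, r in enumerate(top))


def ndcg(relevance, n=None):
    """DCG divided by the DCG of the same grades sorted best-first."""
    ideal = dcg(sorted(relevance, reverse=True), n)
    return dcg(relevance, n) / ideal if ideal > 0 else 0.0


def kendall_tau(order_a, order_b):
    """Kendall tau between two tie-free rankings of the same items."""
    pos_b = {item: i for i, item in enumerate(order_b)}
    concordant = discordant = 0
    for x, y in combinations(order_a, 2):  # x precedes y in order_a
        if pos_b[x] < pos_b[y]:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / pairs if pairs else 1.0


# Hypothetical example: four documents with made-up relevance grades.
grades = {"d1": 3, "d2": 2, "d3": 0, "d4": 1}
ideal_order = ["d1", "d2", "d4", "d3"]    # sorted by grade, best first
current_order = ["d2", "d1", "d3", "d4"]  # ranking being evaluated
current_rel = [grades[d] for d in current_order]

print("P@3:        ", precision_at_n(current_rel, 3))
print("DCG:        ", dcg(current_rel))
print("NDCG:       ", ndcg(current_rel))
print("Kendall tau:", kendall_tau(current_order, ideal_order))
```

In this sketch, NDCG and Kendall tau both equal 1 when the evaluated ranking matches the ideal order, which is the sense in which a ranking can be said to be "close to the ideal rank order".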

Degree

M.S.

Advisors

Mohammad, Purdue University.

Subject Area

Web Studies|Computer science
