Privacy through deniable search
Abstract
Query-based web search is becoming an integral part of many people’s daily activities. Most users do not realize that their search history can be used to identify them and their interests. In July 2006, AOL released an anonymized search query log of some 600K randomly selected users. While valuable as a research tool, the anonymization was insufficient: individuals could be identified from the contents of the queries alone. We propose a client-centered approach based on deniable search: each actual user query is submitted as part of a set of k queries, so that the actual query is hidden among k-1 cover queries. We formalize the definition of Deniable Search and develop two complementary schemes that achieve deniable privacy for web search. In the first approach, Plausibly Deniable Search, cover queries have characteristics similar to the user query but are on unrelated topics. The system ensures that any one of these k queries would produce the same set of k queries, giving k possible topics the user could have been searching for. We also investigate a system that generates cover queries automatically by clustering a query log in semantic space. In the second approach, we ensure that sequences of queries (and their cover queries) retain deniability. Within a sequence, real user queries share terms and topics as users refine their queries. We extract features from user query logs and use them to generate cover query sequences with similar properties. We evaluate the methods using a large query log and DMOZ categorized webpages.
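The set-consistency property described above (any of the k queries yields the same set of k queries) can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the hand-written query pool, and the choice to group one query per topic are all assumptions made for illustration, and the abstract does not say how the actual system selects or matches queries. The sketch also assumes that every query appears only once in the pool and that the user's query is already in the pool.

```python
from typing import Dict, List, Tuple


def build_cover_sets(pool_by_topic: Dict[str, List[str]]) -> Dict[str, Tuple[str, ...]]:
    """Map every pooled query to a fixed group holding one query per topic.

    Hypothetical helper: k equals the number of topics, and the j-th query
    of each topic is placed in the j-th group.
    """
    topics = sorted(pool_by_topic)
    sizes = {len(pool_by_topic[t]) for t in topics}
    if len(sizes) != 1:
        raise ValueError("every topic needs the same number of queries")

    cover_sets: Dict[str, Tuple[str, ...]] = {}
    for group in zip(*(pool_by_topic[t] for t in topics)):
        # Every member of the group points at the same tuple, so the set
        # that is sent does not reveal which member was the real query.
        for query in group:
            cover_sets[query] = group
    return cover_sets


def deniable_queries(query: str, cover_sets: Dict[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    """Return the k queries to submit in place of `query` (exact lookup only)."""
    return cover_sets[query]


# Illustrative pool (invented data): three unrelated topics, so k = 3.
POOL = {
    "health": ["flu symptoms", "knee pain relief"],
    "travel": ["cheap flights paris", "hotel deals rome"],
    "cooking": ["lasagna recipe", "bread baking tips"],
}
COVER_SETS = build_cover_sets(POOL)
```

With this pool, `deniable_queries("flu symptoms", COVER_SETS)` and `deniable_queries("lasagna recipe", COVER_SETS)` return the same three-element tuple, so an observer who sees the submitted set cannot tell from it alone which of the three topics prompted the search.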
Degree
Ph.D.
Advisors
Clifton, Purdue University.
Subject Area
Computer science