Towards Ontology-Based Phishing Detection

Gilchan Park, Purdue University


Detection of phishing emails has received considerable attention from both academia and industry because of the devastating effects that phishing-enabled data breaches have had on private individuals and companies. Despite enormous efforts to detect phishing attacks, phishing remains a major threat to information security, and the damage it causes is not forecast to disappear in the near future. One reason is the diversity of attacks, especially within spear phishing and whaling. Another is that the natural language component of detectors is usually devoid of semantics. Many existing phishing detection techniques rely on keyword matching. However, phishers exploit genuine messages and users’ background information to forge baits that increase the success rate of deception. Because phishers craft legitimate-looking emails, legitimate and phishing emails share many common words in the email body. In addition, phishers often obtain the keyword lists used by matching systems and can then easily circumvent defense mechanisms that analyze the words of an email. The purpose of this dissertation is to investigate the effectiveness of conceptualization of lexical features, which is hypothesized to reduce vulnerability to variance in superficial characteristics. The proposed approach adds semantics to highly accurate bag-of-words and part-of-speech approaches. The study shows that although the proposed approach is less effective than the original approach as a starting point, it retains its performance as the testing corpus deviates from the training corpus, whereas the performance of the original approach decreases as the deviation grows.
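As a rough illustration of why conceptualized lexical features might be less sensitive to surface variation than plain keywords, the following minimal sketch compares word-level and concept-level overlap between two differently worded messages. The CONCEPTS lexicon, the function names, and the example sentences are all hypothetical and chosen for illustration; the abstract does not specify the ontology, features, or classifier used in the dissertation.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping surface words to concept labels.
# A real system would derive these from an ontology; this is an assumption.
CONCEPTS = {
    "verify": "CONFIRM_ACTION",
    "confirm": "CONFIRM_ACTION",
    "validate": "CONFIRM_ACTION",
    "account": "USER_ACCOUNT",
    "profile": "USER_ACCOUNT",
    "password": "CREDENTIAL",
    "passcode": "CREDENTIAL",
    "pin": "CREDENTIAL",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Count surface tokens (the keyword-style representation)."""
    return Counter(tokenize(text))


def bag_of_concepts(text):
    """Count concept labels, falling back to the token when no concept is known."""
    return Counter(CONCEPTS.get(tok, tok) for tok in tokenize(text))


def jaccard(a, b):
    """Jaccard similarity between the feature sets of two bags."""
    keys_a, keys_b = set(a), set(b)
    union = keys_a | keys_b
    return len(keys_a & keys_b) / len(union) if union else 1.0


# Two messages with the same intent but different wording.
seen = "Please verify your account password"
variant = "Please confirm your profile passcode"

word_overlap = jaccard(bag_of_words(seen), bag_of_words(variant))
concept_overlap = jaccard(bag_of_concepts(seen), bag_of_concepts(variant))

print(f"word-level overlap:    {word_overlap:.2f}")
print(f"concept-level overlap: {concept_overlap:.2f}")
```

In this toy case the two messages share only two of eight distinct words, yet they map to identical concept features, which is the kind of robustness to rewording that the abstract hypothesizes.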




Rayz, Purdue University.

Subject Area

Computer science
