Date of Award

8-2018

Degree Type

Dissertation

Degree Name

Doctor of Philosophy (PhD)

Department

Technology

Committee Chair

Julia Rayz

Committee Member 1

Victor Raskin

Committee Member 2

J. Eric Dietz

Committee Member 3

Eric Matson

Committee Member 4

William G. McCartney

Abstract

Detection of phishing emails is a topic that has received a lot of attention from both academia and industry due to the devastating effects that phishing-enabled data breaches have had on private individuals and companies. Notwithstanding enormous efforts to detect phishing attacks, phishing remains a major threat in information security, and the damage it causes is not forecast to disappear in the near future. One reason is the diversity of attacks, especially within spear phishing and whaling. Another reason is that the natural language component of detectors is usually devoid of semantics. Many existing phishing detection techniques rely on keyword matching. However, phishers exploit genuine messages and users’ background information to forge counterfeit baits that increase the success rate of deception. Because phishers craft legitimate-looking emails, many words common to legitimate emails also appear in the bodies of phishing emails. In addition, phishers often obtain the keyword lists used in matching systems, which lets them easily circumvent defense mechanisms that analyze the words of an email. The purpose of this dissertation is to investigate the effectiveness of conceptualizing lexical features, which is hypothesized to reduce vulnerability to variation in superficial characteristics. The proposed approach adds semantics to highly accurate bag-of-words and part-of-speech approaches. This study shows that, although the proposed approach is initially less effective than the original approach, it retains its performance as the testing corpus deviates from the training corpus, whereas the performance of the original approach decreases as the amount of deviation grows.
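To make the keyword-evasion argument concrete, the toy sketch below contrasts surface-level keyword matching with concept-level matching on a phishing message and a reworded variant. It is only an illustration of the general intuition, not the dissertation's method: the word-to-concept lexicon (`CONCEPTS`), the keyword list (`KEYWORDS`), the concept set (`SUSPICIOUS_CONCEPTS`), and all function names are hypothetical assumptions, and the sketch omits the part-of-speech features and any trained classifier.

```python
import re
from collections import Counter

# Hypothetical word-to-concept lexicon (an assumption for illustration only;
# a real system would draw on a proper lexical or ontological resource).
CONCEPTS = {
    "verify": "CONFIRM", "confirm": "CONFIRM", "validate": "CONFIRM",
    "account": "ACCOUNT", "profile": "ACCOUNT", "mailbox": "ACCOUNT",
    "suspended": "RESTRICT", "locked": "RESTRICT", "disabled": "RESTRICT",
}

# Hypothetical keyword list of the kind a matching filter might use.
KEYWORDS = {"verify", "account", "suspended"}
# Hypothetical concept-level counterpart of the keyword list.
SUSPICIOUS_CONCEPTS = {"CONFIRM", "ACCOUNT", "RESTRICT"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def bag_of_words(text):
    """Surface-level features: raw token counts."""
    return Counter(tokenize(text))


def bag_of_concepts(text):
    """Concept-level features: tokens mapped to concepts; unknown words are dropped."""
    return Counter(CONCEPTS[t] for t in tokenize(text) if t in CONCEPTS)


def keyword_hits(text):
    """Number of distinct listed keywords that appear in the text."""
    return len(KEYWORDS & set(bag_of_words(text)))


def concept_hits(text):
    """Number of distinct suspicious concepts that appear in the text."""
    return len(SUSPICIOUS_CONCEPTS & set(bag_of_concepts(text)))


original = "Your account is suspended. Verify your details now."
reworded = "Your mailbox is locked. Validate your details now."

for email in (original, reworded):
    print(keyword_hits(email), concept_hits(email))
```

Running it prints `3 3` for the original message and `0 3` for the reworded one: swapping in synonyms defeats the keyword list entirely, while the concept-level features are unchanged. This mirrors, in miniature, the hypothesis that conceptualized lexical features are less sensitive to superficial variation.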
