Abstract

Phishing is one of the most disruptive attacks that can be carried out on the Internet. If a user falls for a phishing attack, intellectual property and other sensitive business information may be put at risk. Phishing attacks are most commonly carried out through email: the adversary sends a message containing a link to a fraudulent site in order to lure recipients into divulging confidential information. While such attacks may be easy to identify for people well versed in technology, the typical Internet user may find it difficult to spot a fraudulent email.

This research focuses on detecting phishing attempts within emails. To date, various phishing detection algorithms, mostly based on blacklists, have been reported to produce promising results. Yet phishing crime rates are unlikely to decline, because cybercriminals keep devising new tricks to evade those filters. Since early non-text-based approaches do not address the email text that actually deceives users, this paper proposes a text-based phishing detection algorithm. In particular, this research improves upon a previously published text-based approach. The algorithm in that work analyzes the body text of an email to detect whether the message asks the user to perform some action, such as clicking a link that directs the user to a fraudulent website. Because the text analysis portion of that algorithm performed poorly at catching phishing emails, this work expanded it. The modified algorithm detected considerably more malicious emails than the original algorithm did, but its false positive rate (FPR), the rate at which text is incorrectly identified as phishing, was slightly worse. To address the false positives, a statistical approach was adopted; it improved the FPR while keeping the decrease in phishing detection accuracy to a minimum.
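The abstract does not give the rules of the original or the modified algorithm. The snippet below is only a minimal sketch of the general idea of flagging a body text that asks the user to act on a link. It also shows how the detection rate and FPR mentioned above are computed from labeled results. The cue-verb list `ACTION_VERBS`, the URL regex, and the function names are hypothetical and chosen for illustration. The sketch is not the thesis's method, and it does not reproduce the statistical FPR correction.

```python
import re

# Hypothetical imperative cue verbs; illustrative only, not taken from the thesis.
ACTION_VERBS = {"click", "verify", "confirm", "update", "login", "log",
                "sign", "download", "open", "reply", "follow"}
# Simple URL pattern (assumption): http/https links only.
LINK_PATTERN = re.compile(r"https?://\S+", re.IGNORECASE)


def asks_for_action(body: str) -> bool:
    """Flag a body that contains a link and has a sentence opening with an
    action verb (optionally preceded by 'please')."""
    if not LINK_PATTERN.search(body):
        return False
    # Remove the links so their dots do not split sentences.
    text = LINK_PATTERN.sub(" ", body)
    # Rough sentence split on terminal punctuation or newlines.
    for sentence in re.split(r"[.!?\n]+", text):
        words = re.findall(r"[a-z]+", sentence.lower())
        if words and words[0] == "please":
            words = words[1:]
        if words and words[0] in ACTION_VERBS:
            return True
    return False


def rates(predictions, labels):
    """Return (detection_rate, fpr), where detection_rate = TP / (TP + FN)
    and FPR = FP / (FP + TN); label True means phishing."""
    pairs = list(zip(predictions, labels))
    tp = sum(1 for p, l in pairs if p and l)
    fn = sum(1 for p, l in pairs if not p and l)
    fp = sum(1 for p, l in pairs if p and not l)
    tn = sum(1 for p, l in pairs if not p and not l)
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, fpr


if __name__ == "__main__":
    # Toy labeled sample (made up for illustration): (body, is_phishing).
    emails = [
        ("Please verify your account at http://bank.example/login now.", True),
        ("Click here to claim your prize: http://prize.example", True),
        ("Meeting moved to 3pm. Agenda: http://intranet.example/agenda", False),
        ("Please open the attached agenda: http://intranet.example/agenda", False),
        ("Lunch on Friday?", False),
    ]
    preds = [asks_for_action(body) for body, _ in emails]
    dr, fpr = rates(preds, [label for _, label in emails])
    print(f"detection rate = {dr:.2f}, FPR = {fpr:.2f}")
    # prints "detection rate = 1.00, FPR = 0.33"
```

In the toy sample, the legitimate "Please open the attached agenda" message is flagged. This shows how broadening the action-detection rules can catch more phishing while also raising the FPR, which is the trade-off the abstract describes.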

The studies in this research use a simulation model technique to illustrate the algorithms. The simulation model visualizes the overall analysis process and yields the graphical and statistical results used in the experiments. In addition, because the simulation model runs in an environment controlled by the user, it makes it easy to apply modified concepts in experiments. This feature was used to find and eliminate unnecessary factors in the algorithm, which in turn allowed the optimal performance time to be measured.

Keywords

AnyLogic, natural language processing, phishing, simulation model technique

Disciplines

Computer Sciences

Degree Type

Thesis

Degree Name

Master of Science (MS)

Department

Computer and Information Technology

First Advisor

Julia Taylor

Committee Chair

Julia Taylor

Committee Member 1

Eric Dietz

Committee Member 2

Eric Matson

Date of Award

2013

Recommended Citation

Park, Gilchan, "Text-Based Phishing Detection Using A Simulation Model" (2013). Open Access Theses. 137.
https://docs.lib.purdue.edu/open_access_theses/137

Included in

Computer Sciences Commons

Open Access Theses

Text-Based Phishing Detection Using A Simulation Model