IRC channel data analysis using Apache Solr

Nikhil Reddy Boreddy, Purdue University

Abstract

Internet Relay Chat (IRC) was one of the first real-time communication protocols on the internet. It was not designed with any authentication, authorization, or accounting features, which made IRC channels a place to conduct transactions in complete anonymity. Meanwhile, with the advent of Big Data technologies, large quantities of data can now be processed in a very short time. This research presents a method that uses Apache Solr, a text indexing server built on top of Lucene, to index and search large quantities of IRC data collected over a period of months from public IRC channels. It also presents a highly scalable approach to monitoring public IRC channels by creating IRC client bots, which are in turn controlled by a robust IRC parent bot. The collected data is analyzed with both Apache Solr and MS SQL servers, and the response times are compared. The research concludes that Apache Solr outperforms MS SQL by a wide margin and that such an implementation can be used by digital forensic investigators to monitor and search public IRC channels.
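
As a rough illustration of the indexing and search workflow summarized above, the following minimal Python sketch parses raw IRC channel messages into flat documents and prepares the request data for a Solr core. It is not the thesis implementation. The field names (nick_s, channel_s, timestamp_dt, message_t), the core name "irc", the host, and all function names are assumptions made for this example; the field suffixes rely on the dynamic fields found in Solr's default configset. No network calls are made.

```python
import json
from urllib.parse import urlencode


def parse_privmsg(raw_line, timestamp):
    """Turn one raw IRC line into a flat dict, or None if it is not a channel message.

    Expected raw form: ":nick!user@host PRIVMSG #channel :message text"
    """
    if not raw_line.startswith(":"):
        return None  # no sender prefix, e.g. "PING :server"
    prefix, _, rest = raw_line[1:].partition(" ")
    command, _, params = rest.partition(" ")
    if command != "PRIVMSG":
        return None
    target, _, text = params.partition(" :")
    if not target.startswith("#"):
        return None  # private message to a user, not a channel
    return {
        # Hypothetical field names; suffixes assume Solr's default dynamic fields.
        "nick_s": prefix.split("!", 1)[0],
        "channel_s": target,
        "timestamp_dt": timestamp,
        "message_t": text.rstrip("\r\n"),
    }


def build_update_payload(docs):
    """Serialize documents as a JSON array, the body format accepted by
    a POST to /solr/<core>/update (for example with ?commit=true)."""
    return json.dumps(docs)


def build_search_url(base_url, core, keyword, rows=10):
    """Build a /select URL that searches message text for one keyword.

    The keyword is URL-encoded but not escaped for Solr query syntax,
    so this sketch assumes a plain word without special characters.
    """
    query = urlencode({"q": f"message_t:{keyword}", "rows": rows, "wt": "json"})
    return f"{base_url}/solr/{core}/select?{query}"


if __name__ == "__main__":
    line = ":alice!a@host.example PRIVMSG #test :hello world\r\n"
    doc = parse_privmsg(line, "2015-01-01T00:00:00Z")
    print(build_update_payload([doc]))
    print(build_search_url("http://localhost:8983", "irc", "hello"))
```

In a setup like the one the abstract describes, each client bot would produce such documents for the channels it monitors, and a search would be a single HTTP GET against the resulting URL.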

Degree

M.S.

Advisors

Rogers, Purdue University.

Subject Area

Information Technology|Multimedia Communications
