Graph based mining on weighted directed graphs for subnetworks and path discovery

Sijin Cherupilly Abdulkarim, Purdue University

Abstract

Subnetwork and path mining is an emerging data mining problem in many scientific and commercial applications. Many natural and man-made systems are structured as networks, and graph modeling is an effective way to represent them. Traditional machine learning and data mining approaches assume that data are a collection of homogeneous objects that are independent of each other, whereas network data are potentially heterogeneous and interlinked. In this paper we propose a novel algorithm to find subnetworks and maximal paths in a weighted, directed network represented as a graph. The main objective of this study is to find meaningful maximal paths in a given network based on three key parameters: node weight, edge weight, and direction. The algorithm extracts maximal paths from a network modeled according to a user's interest, and it allows the user to assign weights to the nodes and edges of a biological network. The performance of the proposed technique was tested on a colorectal cancer biological network. The subnetworks and paths mined from this network were scored based on their biological significance and verified using Metacore™ as well as the literature. The algorithm has been developed into a tool that takes a node list and an edge list as input. The tool can also report the entities upstream and downstream of a given entity (gene, protein, etc.) in the derived maximal paths. The complexity of the algorithm is O(n log n) in the best case and O(n² log n) in the worst case.
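
The abstract does not give the algorithm itself, so the following is only an illustrative sketch of the kind of task it describes, not the thesis's method. It assumes one possible reading of "maximal path": a simple directed path that starts at a node with no incoming edge and cannot be extended further. The function names, the weight thresholds `min_node_w` and `min_edge_w`, the additive score, and the toy nodes A to D are all hypothetical. The brute-force enumeration can be exponential in the worst case and does not reproduce the complexity bounds stated above.

```python
from collections import defaultdict


def maximal_paths(node_weights, edges, min_node_w=0.0, min_edge_w=0.0):
    """Enumerate simple directed paths that cannot be extended at either end.

    node_weights: dict mapping node -> weight
    edges: iterable of (source, target, weight) tuples
    Returns a list of (path, score) pairs, highest score first.
    """
    # Keep only nodes and edges that meet the (hypothetical) thresholds.
    keep = {n for n, w in node_weights.items() if w >= min_node_w}
    succ = defaultdict(list)
    has_pred = set()
    for u, v, w in edges:
        if u in keep and v in keep and u != v and w >= min_edge_w:
            succ[u].append((v, w))
            has_pred.add(v)

    # Start only at nodes with no incoming edge, so a path cannot be
    # extended backwards. A graph that is one big cycle yields no paths.
    starts = sorted(n for n in keep if n not in has_pred and n in succ)
    results = []

    def extend(path, score):
        # Candidate next hops that keep the path simple (no repeated node).
        nexts = [(v, w) for v, w in succ.get(path[-1], []) if v not in path]
        if not nexts:
            results.append((path, score))  # cannot be extended forwards
            return
        for v, w in nexts:
            # Assumed score: sum of node and edge weights along the path.
            extend(path + [v], score + w + node_weights[v])

    for s in starts:
        extend([s], node_weights[s])
    return sorted(results, key=lambda r: -r[1])


def upstream_downstream(path, entity):
    """Split a path into the nodes before and after the given entity."""
    i = path.index(entity)
    return path[:i], path[i + 1:]


if __name__ == "__main__":
    # Toy input in node-list / edge-list form; names and weights are made up.
    nodes = {"A": 1.0, "B": 2.0, "C": 0.5, "D": 1.5}
    edge_list = [("A", "B", 0.9), ("B", "C", 0.4), ("B", "D", 0.8), ("A", "C", 0.2)]
    for path, score in maximal_paths(nodes, edge_list, min_edge_w=0.3):
        print(" -> ".join(path), round(score, 2))
    print(upstream_downstream(["A", "B", "D"], "B"))
```

In this sketch, raising `min_edge_w` or `min_node_w` prunes the graph before any path is built, which is one simple way a user's interest could shape the result.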

Degree

M.S.

Advisors

Palakal, Purdue University.

Subject Area

Bioinformatics|Computer science
