Ensemble classification techniques for relational domains

Hoda M Eldardiry, Purdue University

Abstract

Ensemble learning techniques combine predictions of multiple models to improve classification, while relational learning methods focus on utilizing link information to improve classification for network data. Our goal is to combine these two machine learning directions by applying ensemble classification to improve relational learning. There are many domains in which data exhibits complex and heterogeneous relational structures. However, applying traditional ensemble methods in relational domains has a number of limitations that have neither been studied nor addressed before. This dissertation (1) explores these limitations, (2) gives explanations for why they exist, (3) provides solutions for them by proposing a relational ensemble framework, (4) applies the proposed relational ensemble framework to combine link information for collective classification in multi-network domains, (5) develops a more general framework that works for single-network settings, and (6) presents a theoretical analysis framework to support the empirical findings. Traditional ensemble methods assume independent and identically distributed (i.i.d.) data and exact inference models. Both assumptions are violated in relational domains, where data has heterogeneous link structures, and models use collective inference techniques. Ensemble methods that assume i.i.d. data use independent sampling approaches during ensemble learning. This underestimates the increased variance exhibited by network data, so the ensemble is unable to reduce the full amount of variance in learning. We propose a novel method for learning ensembles from relational data, which can capture and reduce more learning variance. The exact inference assumption overlooks inference variance, introduced by collective classification techniques. We propose a novel ensemble classification method for relational data, specifically when the ensemble uses collective inference base models. This is the first ensemble method that accounts for and reduces inference variance.
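
For readers unfamiliar with the i.i.d. baseline that the abstract contrasts against, the following is a minimal sketch of a conventional bagging ensemble that uses independent bootstrap sampling and a majority vote. It is not the relational method proposed in the dissertation. The nearest-mean base model, the function names, and the toy data are all assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) instances independently with replacement (the i.i.d. assumption)."""
    return [rng.choice(data) for _ in data]


def train_nearest_mean(sample):
    """Hypothetical base model: label a 1-D feature by the nearest class mean."""
    sums, counts = {}, {}
    for x, y in sample:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return lambda x: min(means, key=lambda y: abs(x - means[y]))


def bagging_predict(data, x, n_models=25, seed=0):
    """Train n_models base models on bootstrap samples and return the majority vote."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_models):
        model = train_nearest_mean(bootstrap_sample(data, rng))
        votes[model(x)] += 1
    return votes.most_common(1)[0][0]


# Toy (feature, label) pairs, invented for illustration only.
toy = [(0.1, 0), (0.3, 0), (0.2, 0), (0.9, 1), (1.1, 1), (1.0, 1)]
```

Calling `bagging_predict(toy, 0.0)` returns the majority label among the 25 bootstrap-trained models. Because each instance is drawn independently, this resampling scheme ignores any links between instances, which is the limitation the abstract describes for network data.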

Degree

Ph.D.

Advisors

Neville, Purdue University.

Subject Area

Computer science
