Ontology based text sanitization

Pedro J Pastrana-Camacho, Purdue University

Abstract

Documents containing personally identifiable information cannot be shared freely, so text sanitization plays an important role in information dissemination. Current sanitization methods focus on two tasks, finding and hiding personally identifiable information, but weight them unequally: the process of finding personally identifiable information has received more attention from the scientific community than the process of hiding the discovered identifiers. This work presents ontologies as an alternative means of efficiently handling the main issues of text sanitization. First, we present the ontological framework that supports an information-theoretic approach to reducing sensitivity and identifiability. We then show how properties and axioms from the ontology are used to optimize the semantics and utility of text without compromising privacy. Finally, we explain a set of algorithms that use ontologies to optimize semantics when words have multiple senses, and we show how the result can serve as an alternative to word sense disambiguation, as supported by experimental results.

Degree

M.S.

Advisors

Clifton, Purdue University.

Subject Area

Computer science
