Date of Award
8-2016
Degree Type
Dissertation
Degree Name
Doctor of Philosophy (PhD)
Department
Consumer Science
First Advisor
Richard A. Feinberg
Second Advisor
Christopher J. Kowal
Committee Chair
Richard A. Feinberg
Committee Co-Chair
Christopher J. Kowal
Committee Member 1
Corinne A. Novell
Committee Member 2
Tongxiao Catherine Zhang
Abstract
Twitter may be a data resource to support healthcare research, but the literature on the potential of Twitter data for healthcare is still limited. The purpose of this study was to contrast the processes by which a large collection of unstructured disease-related tweets could be converted into structured data for further analysis. The objective was to gain insights into the content and behavioral patterns associated with disease-specific communications on Twitter. Twelve months of Twitter data related to cancer, diabetes, and asthma were collected to form a baseline dataset containing over 34 million tweets. Because Twitter data in its raw form would have been difficult to manage, three separate data reduction methods were contrasted to identify one that generates analysis files while maximizing classification precision and data retention. Each disease file was then run through a chi-square automatic interaction detector (CHAID) analysis to demonstrate how user behavior insights vary by disease. CHAID, a technique created by Gordon V. Kass in 1980, is a tool used to discover relationships between variables. This study followed the standard CRISP-DM data mining approach and demonstrated how the practice of mining Twitter data fits into this six-stage iterative framework. The study produced insights that provide a new lens on the potential of Twitter data as a valuable healthcare data source, as well as on the nuances involved in working with the data.
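As a rough illustration of the idea behind CHAID mentioned in the abstract, the sketch below computes a Pearson chi-square statistic for each candidate predictor against an outcome and picks the predictor with the strongest association as the split variable. This is a minimal sketch, not the dissertation's procedure: the field names (`has_link`, `has_hashtag`, `retweeted`) and the toy records are hypothetical, and full CHAID additionally merges similar categories and compares Bonferroni-adjusted p-values rather than raw statistics. Comparing raw statistics is only reasonable here because both predictors have the same number of categories.

```python
from collections import Counter


def chi_square(rows):
    """Pearson chi-square statistic for (predictor_value, outcome_value) pairs."""
    rows = list(rows)
    n = len(rows)
    observed = Counter(rows)                       # cell counts
    pred_totals = Counter(p for p, _ in rows)      # row marginals
    out_totals = Counter(o for _, o in rows)       # column marginals
    stat = 0.0
    for p, p_n in pred_totals.items():
        for o, o_n in out_totals.items():
            expected = p_n * o_n / n               # count expected under independence
            stat += (observed[(p, o)] - expected) ** 2 / expected
    return stat


def best_predictor(records, predictors, outcome):
    """Return the predictor most strongly associated with the outcome, plus all scores."""
    scores = {
        name: chi_square((r[name], r[outcome]) for r in records)
        for name in predictors
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data: has_link tracks the outcome exactly, has_hashtag does not.
records = [
    {"has_link": True, "has_hashtag": True, "retweeted": True},
    {"has_link": True, "has_hashtag": False, "retweeted": True},
    {"has_link": True, "has_hashtag": True, "retweeted": True},
    {"has_link": True, "has_hashtag": False, "retweeted": True},
    {"has_link": False, "has_hashtag": True, "retweeted": False},
    {"has_link": False, "has_hashtag": False, "retweeted": False},
    {"has_link": False, "has_hashtag": True, "retweeted": False},
    {"has_link": False, "has_hashtag": False, "retweeted": False},
]

best, scores = best_predictor(records, ["has_link", "has_hashtag"], "retweeted")
print(best, scores)
```

In a tree-building setting, the same selection step would be repeated within each resulting subgroup until no predictor shows a significant association.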
Recommended Citation
Chulis, Kimberly, "Data mining Twitter for cancer, diabetes, and asthma insights" (2016). Open Access Dissertations. 745.
https://docs.lib.purdue.edu/open_access_dissertations/745