Feature extraction to improve nowcasting using social media event detection on cloud computing and sentiment analysis

David L Kimmey, Purdue University

Abstract

Nowcasting is defined as the prediction of the present, the very near future, and the very recent past using real-time data. Nowcasting with social media poses challenges because of the HACE characteristics of big data (i.e., heterogeneous, autonomous, complex, and evolving associations). This thesis therefore proposes a feature extraction method to improve nowcasting with social media. The proposed social media event detection algorithm uses the K-SPRE methodology, and its results are then processed with sentiment analysis. In addition, we develop a parallel version of the algorithm for a cloud computing environment, and we adapt an artificial neural network to build a predictive model for nowcasting. Furthermore, we conduct a case study with real data: Twitter posts and the influenza-like illness (ILI) reports of the Centers for Disease Control and Prevention (CDC). Experiments predicting the CDC's ILI reports show that nowcasting with social media outperforms the traditional AR(1) time series model by 16% to 20% in terms of statistical error. In addition, implementing the social media event detection algorithm with cloud computing improved the algorithm's running time by 65%.
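
The comparison against the AR(1) baseline can be made concrete with a small sketch. It fits y_t = c + phi * y_{t-1} by least squares and produces one-step-ahead predictions. It then computes the percentage error reduction of a competing model relative to that baseline. This is an illustration under stated assumptions, not the thesis's own code. The abstract does not name its error metric, so RMSE is assumed here. The helper names (fit_ar1, ar1_one_step, rmse, relative_improvement) are hypothetical, and the example series is toy data rather than CDC ILI figures.

```python
import math
from typing import List, Sequence, Tuple


def fit_ar1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of the AR(1) model y_t = c + phi * y_{t-1} + e_t."""
    x = list(series[:-1])  # lagged values y_{t-1}
    y = list(series[1:])   # targets y_t
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def ar1_one_step(train: Sequence[float], test: Sequence[float]) -> List[float]:
    """Fit on `train`, then predict each test point from the previous observed value."""
    c, phi = fit_ar1(train)
    previous = [train[-1]] + list(test[:-1])  # observed value preceding each test point
    return [c + phi * prev for prev in previous]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root-mean-square error (assumed metric; the abstract only says 'statistical error')."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def relative_improvement(baseline_err: float, model_err: float) -> float:
    """Percentage reduction in error of a model relative to a baseline."""
    return 100.0 * (baseline_err - model_err) / baseline_err
```

For example, if a social-media nowcast had an RMSE of 1.5 where the AR(1) baseline had 2.0, `relative_improvement(2.0, 1.5)` would report a 25% reduction. Read this way, the reported 16% to 20% figures would be such relative reductions. That interpretation is itself an assumption.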

Degree

M.S.

Advisors

Yoo, Purdue University.

Subject Area

Computer science
