Research Website

http://www.purdue.edu/discoverypark/vaccine/

Keywords

Social Media Data Analyzing, Topic Extraction using NMF, Information Search and Retrieval

Presentation Type

Event

Research Abstract

With the rapid growth of social media services, vast amounts of user-generated content with time and location stamps are produced every day. Much of this content is publicly available online, and some of it collectively conveys information of interest to data analysts. Social media data are dynamic and unstructured by nature, which makes it hard for analysts to retrieve useful information efficiently and effectively. The Social Media Analytics Reporting Toolkit (SMART), a system developed at the Purdue VACCINE lab, aims to support such analysis. The current framework collects real-time Twitter messages and visualizes volume densities on a map. It uses Latent Dirichlet Allocation (LDA) to extract regional topics and can optionally apply Seasonal-Trend decomposition using Loess (STL) to detect abnormal events. While Twitter has a fair number of active users, they account for a small portion of all active social media users. Data generated by many other social media services are not currently used by SMART. Therefore, my work focused on expanding the data sources of the SMART system by creating means to collect data from other services such as Facebook and Instagram. During a test run that searched on a collection of 88 specified keywords, over two million Facebook posts were collected in one week. In addition, the current SMART framework uses only one topic model, LDA, which is considered to be slower than the Non-negative Matrix Factorization (NMF) model, so I also worked on integrating the NMF algorithm into the system. The improved SMART system can be used for a variety of analysis tasks, such as monitoring regional social media responses from different sources during disasters and detecting user-reported crimes. SMART is an ongoing and promising project that can be further improved by integrating new features.
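
The abstract does not describe how NMF was integrated into SMART, so the following is only a minimal sketch of NMF-based topic extraction in general, not the project's code. It assumes the classic multiplicative-update rule for the squared-error objective, a toy document-term count matrix, and hypothetical helper names (`matmul`, `transpose`, `nmf`, `top_terms`); the vocabulary and counts are invented for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def transpose(a):
    return [list(row) for row in zip(*a)]

def nmf(v, k, iters=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix v (docs x terms) into w (docs x k) and
    h (k x terms) using multiplicative updates; all entries stay non-negative."""
    rng = random.Random(seed)
    n_docs, n_terms = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n_docs)]
    h = [[rng.random() + 0.1 for _ in range(n_terms)] for _ in range(k)]
    for _ in range(iters):
        # h <- h * (w^T v) / (w^T w h)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_terms)]
             for i in range(k)]
        # w <- w * (v h^T) / (w h h^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n_docs)]
    return w, h

def top_terms(h, vocab, n=3):
    """Return the n highest-weighted vocabulary terms for each topic row of h."""
    return [[vocab[j] for j in sorted(range(len(vocab)), key=lambda j: -row[j])[:n]]
            for row in h]

# Toy data (hypothetical): four short posts over a six-word vocabulary.
vocab = ["flood", "rain", "storm", "game", "team", "score"]
counts = [
    [2, 1, 1, 0, 0, 0],
    [1, 2, 1, 0, 0, 0],
    [0, 0, 0, 2, 1, 1],
    [0, 0, 0, 1, 2, 1],
]
w, h = nmf(counts, k=2)
topics = top_terms(h, vocab)
print(topics)
```

Each row of `h` is read as a topic (a weighting over terms) and each row of `w` gives a post's mixture over topics. The order in which the two topics come out depends on the random initialization.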

Session Track

Data Analytics

Aug 7th, 12:00 AM

Social Media Analytics Reporting Toolkit

http://docs.lib.purdue.edu/surf/2014/presentations/100