Identifying Undesirable Behavior in Social Media: Towards Automated Fact-Checking and YouTube Meta-Data Spam Detection

Ayush Patwari, Purdue University

Abstract

Automated scrutiny and filtering of undesirable data is of paramount importance in the modern digital world, which is driven by plentiful yet unrefined data. We study two related problems in the social media domain: automated fact-checking for computational journalism, and fine-grained meta-data spam detection for YouTube, the world's largest video-sharing platform.

Fact-checking political discussions has become an essential cog in computational journalism. This task encompasses an important sub-task: identifying the set of statements with 'check-worthy' claims. Previous work has treated this as a simple text classification problem, discounting the nuances involved in determining what makes a statement check-worthy. We introduce a dataset of political debates from the 2016 US presidential election campaign, annotated using all major fact-checking media outlets. We study the characteristics of check-worthy statements and show that there is a need to model conversation context, debate dynamics, and implicit world knowledge. We design TATHYA, a multi-classifier system that models latent groupings in the data and improves on state-of-the-art systems by 19.5% in F1-score on a held-out test set, gaining primarily in recall.

YouTube is plagued with spam campaigns that spread malicious links through video descriptions or comments, disseminate adult or illegal content, and generate artificial traffic through clickbait. We tackle the problem of detecting misleading videos: those whose description and title are unrelated to the posted content. We show several characteristics of misleading (spam) behavior, modeled through textual and temporal analysis of the comments and the uploader. We develop NIRMALYA, a supervised learning framework to detect spam videos that can help prune search recommendations so that they contain only legitimate videos. We evaluate our system on a novel, manually annotated dataset curated from a large corpus of 500K videos. It achieves a mean F1-score of 0.82 in detecting spam videos, with a recall of 0.83.
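The abstract does not say which classifiers TATHYA combines or how it discovers latent groupings. As a minimal, hypothetical sketch of the general cluster-then-classify idea, the Python below assumes each statement is already a numeric feature vector, finds groups with plain k-means, and fits a separate nearest-centroid classifier inside each group. The names `GroupedClassifier`, `kmeans`, `nearest`, and the toy data are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter, defaultdict


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def mean(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means, initialised with the first k points (assumes len(points) >= k)."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        groups = defaultdict(list)
        for p in points:
            groups[nearest(p, centroids)].append(p)
        # Clusters that receive no points keep their previous centroid.
        for j, members in groups.items():
            centroids[j] = mean(members)
    return centroids


class GroupedClassifier:
    """Hypothetical cluster-then-classify ensemble: one small model per latent group."""

    def __init__(self, k):
        self.k = k

    def fit(self, X, y):
        self.centroids = kmeans(X, self.k)
        by_group = defaultdict(lambda: defaultdict(list))
        for x, label in zip(X, y):
            by_group[nearest(x, self.centroids)][label].append(x)
        # Per group: the mean feature vector of each label observed in that group.
        self.label_means = {
            g: {label: mean(xs) for label, xs in labels.items()}
            for g, labels in by_group.items()
        }
        # Global majority label, used only if a group saw no training data.
        self.fallback = Counter(y).most_common(1)[0][0]
        return self

    def predict_one(self, x):
        means = self.label_means.get(nearest(x, self.centroids))
        if not means:
            return self.fallback
        # Route to the point's group, then pick the closest label mean within it.
        return min(means, key=lambda label: math.dist(x, means[label]))

    def predict(self, X):
        return [self.predict_one(x) for x in X]


# Toy data: two latent groups whose label patterns are mirrored.
X = [[0, 0], [0, 1], [2, 0], [2, 1], [10, 10], [10, 11], [12, 10], [12, 11]]
y = [0, 0, 1, 1, 1, 1, 0, 0]
model = GroupedClassifier(k=2).fit(X, y)
print(model.predict([[0.2, 0.4], [11.8, 10.2]]))  # [0, 0]
```

In this toy data, labels 0 and 1 have the same overall mean feature vector (6, 5.5), so a single global nearest-centroid model cannot separate them, while the per-group models can. Any stronger classifier could be swapped in per group, with `k` tuned on validation data.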

Degree

M.S.

Advisors

Bagchi, Purdue University.

Subject Area

Computer science
