Generalized Plot Matrices, Automatic Cognostics, and Efficient Data Exploration
Abstract
Statistical visualization of large-scale data has become an increasingly essential task in the era of big data. In particular, exploratory data analysis and visualization are the first step towards any in-depth statistical modeling and analysis. Being able to rapidly specify and generate visualizations regardless of data scale is crucial. Trelliscope handles data visualization at scale by attaching cognostics (univariate metrics) to each panel, which aids in organizing panels of interest. While Trelliscope provides a general framework for visualizing data at scale, several aspects, such as cognostics and axis scales, can be improved to help users generate displays more rapidly. When visually modeling complex data with Trelliscope, traditional two-grouped plot matrices do not allow a mixed-scale axis to display both continuous and discrete data natively. Web-based visualization systems such as Trelliscope that retrieve information from a back-end service such as R must maximize performance to provide an engaging user experience. To address the mixed-scale plot matrix axis, a generalized plot matrix is developed for two-grouped data that displays both continuous and discrete data using appropriate visualization methods for each panel. To complement Trelliscope’s panel organization, automatic cognostic summaries are established by mapping the context of what is visualized to classes of metrics that are meaningful for each type of visualization layer, at no additional user effort. Finally, communication from web-based visualization systems to back-end R services is greatly improved by leveraging the GraphQL query language, which minimizes the number of data queries needed to perform data extraction. Together, these three contributions curtail the increasing complexity and scale of data visualization.
Degree
Ph.D.
Advisors
Hafen, Purdue University.
Subject Area
Statistics, Information science, Computer science