Samuel Gerber
Visiting Assistant Professor
Mathematics, Duke University
Visualizing Correlation in Large Data Sets
The degree of correlation between random variables is a key quantity in many
scientific inquiries and engineering applications. Traditionally, the degree of
correlation between two random variables is used to support or refute a specific
hypothesis. In today's scientific process, correlation is often used as an
exploratory tool to help form new hypotheses and sift through vast amounts of
data. However, low-level visualization tools for exploratory correlation
analysis lack the capacity to deal with these increasingly large data sets.
Correlation can be measured in different ways. One of the most basic and
ubiquitous is Pearson's correlation, which measures the amount of linear
dependence between two random variables. While a few methods exist to
visualize Pearson's correlation, they do not scale well beyond a few
tens to hundreds of variables, and they often require additional computational steps to
present visually meaningful results. The goal of this work is to take advantage of
human pattern recognition capabilities to explore
correlation structures while scaling to data sets with tens to hundreds of
thousands of variables. To achieve this goal, our work exploits the
geometrical encoding of Pearson's correlation, in combination with user interaction,
to form a simple but rich approach to visually explore correlation structures in large data sets.
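
The text above does not spell out the geometrical encoding. One standard reading, assumed here, is the identity that Pearson's correlation equals the cosine of the angle between two variables once each has been mean-centered and scaled to unit length. The following is a minimal sketch of that identity only, not the implementation behind this work; the function names and sample data are hypothetical.

```python
import math


def center_and_normalize(x):
    """Subtract the mean, then scale the result to unit Euclidean length."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0.0:
        raise ValueError("correlation is undefined for a constant variable")
    return [v / norm for v in centered]


def pearson(x, y):
    """Pearson's correlation as the dot product of two unit vectors."""
    if len(x) != len(y):
        raise ValueError("both variables need the same number of samples")
    u = center_and_normalize(x)
    w = center_and_normalize(y)
    return sum(a * b for a, b in zip(u, w))


def angle_degrees(x, y):
    """Angle between the two unit vectors; its cosine is the correlation."""
    r = max(-1.0, min(1.0, pearson(x, y)))  # guard against rounding error
    return math.degrees(math.acos(r))


# Hypothetical sample data: three variables observed four times each.
x = [1.0, 2.0, 3.0, 4.0]
z = [4.0, 3.0, 2.0, 1.0]   # perfectly anti-correlated with x
w = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with x

print(pearson(x, z), angle_degrees(x, z))
print(pearson(x, w), angle_degrees(x, w))
```

Under this reading every variable becomes a point on a unit sphere. Strongly correlated variables lie close together, anti-correlated variables lie on opposite sides, and uncorrelated variables sit at right angles.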