Samuel Gerber
Visiting Assistant Professor
Mathematics, Duke University


Projection of Correlation

Visualizing Correlation in Large Data Sets

The degree of correlation between random variables is a key quantity in many scientific inquiries and engineering applications. Traditionally, the degree of correlation between two random variables is used to support or refute a specific hypothesis. In today's scientific process, correlation is often used as an exploratory tool to help form new hypotheses and sift through vast amounts of data. However, low-level visualization tools for exploratory correlation analysis lack the capacity to deal with these increasingly large data sets.

Correlation can be measured in different ways. One of the most basic and ubiquitous measures is Pearson's correlation, which quantifies the degree of linear dependence between two random variables. While a few methods exist to visualize Pearson's correlation, they do not scale well beyond a few tens to hundreds of variables, and they often require additional computational steps to present visually meaningful results. The goal of this work is to take advantage of human pattern-recognition capabilities to explore correlation structures while scaling to data sets with tens to hundreds of thousands of variables. To achieve this goal, our work exploits the geometric encoding of Pearson's correlation, in combination with user interaction, to form a simple but rich approach to visually exploring correlation structures in large data sets.
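One standard way to see the geometric encoding of Pearson's correlation: center each variable and scale it to unit Euclidean length, so that every variable becomes a point on a unit sphere. Pearson's correlation rho between two variables is then the dot product (the cosine of the angle) between their points, and the squared Euclidean distance between the points equals 2(1 - rho). The sketch below illustrates this textbook identity in plain Python; the function names are hypothetical, and it is not presented as the projection method of this work.

```python
import math


def normalize(x):
    """Center x and scale it to unit Euclidean length (a point on the unit sphere)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    norm = math.sqrt(sum(v * v for v in centered))
    if norm == 0:
        # A constant variable has zero variance, so its correlation is undefined.
        raise ValueError("constant variable: correlation undefined")
    return [v / norm for v in centered]


def pearson(x, y):
    """Pearson correlation as the cosine of the angle between normalized vectors."""
    xs, ys = normalize(x), normalize(y)
    return sum(a * b for a, b in zip(xs, ys))


def sphere_distance_sq(x, y):
    """Squared Euclidean distance between normalized vectors; equals 2 * (1 - rho)."""
    xs, ys = normalize(x), normalize(y)
    return sum((a - b) ** 2 for a, b in zip(xs, ys))
```

For example, `pearson([1, 2, 3, 4], [2, 4, 6, 8])` is (up to rounding) 1 and the corresponding sphere distance is 0, while reversing the second variable gives a correlation of -1 and a squared distance of 4. Because strongly correlated variables map to nearby points and anti-correlated ones to antipodal points, distance-based tools can in principle be applied to correlation structure directly.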

November 2012