Suresh Venkatasubramanian :: talks :: dagstuhl
Home Research Papers Talks CV Links About Me The Geomblog

Algorithms for Information Distances

Abstract:

In many data analysis settings, whether they be in data cleaning, network analysis, machine learning, computer vision, or even neuroscience, the basic primitive is a distribution, or a histogram. The analysis task involves estimating distances between such primitives, exactly or approximately.

Distributions can be viewed as points in a d-dimensional space, and "standard" distances (like those derived from l_p norms) can be employed in tasks like classification and clustering. It turns out though that other distances (especially measures like relative entropy that are derived from information-theoretic considerations) are more meaningful in many application settings, both formally and in practice.

This presents an interesting algorithmic challenge: Are there good algorithms for estimating such distances, preferably in sublinear space/time ? Can these be used effectively in various application domains, and at scale ?

In this talk, I will describe current research directions in the "algorithmics" of distributions, and sketch out recent work that I've been doing on both the applied and theoretical ends of this area.