Suresh Venkatasubramanian :: talks :: epfl
Home Research Papers Talks CV Links About Me The Geomblog

Algorithms for Information Distances

Abstract:

In many data analysis settings, whether they be in data cleaning, network analysis, machine learning, computer vision, or even neuroscience, the basic primitive is a distribution, or a histogram. The analysis task involves estimating distances between such primitives, exactly or approximately.

Distributions can be viewed as points in a d-dimensional space, and "standard" distances (like those derived from l_p norms) can be employed in tasks like classification and clustering. It turns out though that other distances (especially measures like relative entropy that are derived from information-theoretic considerations) are more meaningful in many application settings, both formally and in practice.

This presents an interesting algorithmic challenge: Are there good algorithms for estimating such distances, preferably in sublinear space/time ? Can these be used effectively in various application domains, and at scale ?

In this talk, I will describe current research directions in the "algorithmics" of distributions, and sketch out recent work that I've been doing on both the applied and theoretical ends of this area.


Related Papers: