http://www.cs.utah.edu/~suresh
suresh at cs utah edu
Ph: 801 581 8233
Room 3404, School of Computing
50 S. Central Campus Drive,
Salt Lake City, UT 84112.

NSF CCF-0953066: CAREER: Geometric Algorithms For Data Analysis In Spaces Of Distributions

Collections of distributions arise naturally when analyzing large data sets. Since it is impractical to store more than a small fraction of such data, distributional representations are typically used to summarize the data in compact form. For example, a document in a corpus is typically represented by a normalized vector of keyword frequencies, an image is represented by a histogram over gradient features, and speech signals are represented by spectral densities over a frequency domain.

Representing data sets as collections of distributions enables analysis via powerful concepts from statistics, learning theory and information theory. Concepts like strength of belief, information content, and pattern likelihood are used to extract meaning and structure from the data and are quantified using information measures like the Kullback-Leibler distance and its parent class, the Bregman divergences.

These measures capture meaning in data in a manner that traditional metrics cannot, by connecting abstract notions of information loss and transfer with concrete geometric notions like distances. However, they lack properties like symmetry and the triangle inequality that are essential requirements for the application of traditional geometric algorithms for data analysis.
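As a small illustration of the missing properties, the sketch below evaluates the Kullback-Leibler distance on three two-outcome distributions. The distributions p, q and r and the helper kl_divergence are made up for this example and are not taken from the project's papers; the sketch assumes discrete distributions with strictly positive entries.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions on the same support.

    Assumes every entry of q is positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical example distributions over two outcomes.
p = [0.9, 0.1]
q = [0.5, 0.5]
r = [0.1, 0.9]

# Symmetry fails: KL(p || q) differs from KL(q || p).
forward = kl_divergence(p, q)
backward = kl_divergence(q, p)
is_asymmetric = not math.isclose(forward, backward)

# Triangle inequality fails: going directly from p to r costs more
# than going through the intermediate distribution q.
direct = kl_divergence(p, r)
via_q = kl_divergence(p, q) + kl_divergence(q, r)
violates_triangle = direct > via_q

print(f"KL(p||q) = {forward:.4f}, KL(q||p) = {backward:.4f}")
print(f"KL(p||r) = {direct:.4f}, KL(p||q) + KL(q||r) = {via_q:.4f}")
```

For these particular inputs both flags come out true, which is why algorithms that rely on metric properties cannot be applied to such distances without modification.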

In this project, the PI will develop a systematic, rigorous and global algorithmic framework for manipulating these distances. This framework will provide the foundation for efficient and accurate data analysis of spaces of distributions, and will lead to deeper insights into analysis problems across a wide range of applications.

All the research described below has been funded by the NSF under grant CCF-0953066. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Efficient Protocols for Distributed Classification and Optimization

Hal Daumé III, Jeff M. Phillips, Avishek Saha and Suresh Venkatasubramanian
arXiv:1204.3523v1 [cs.LG]

Protocols for Learning Classifiers on Distributed Data

Hal Daumé III, Jeff M. Phillips, Avishek Saha and Suresh Venkatasubramanian
In the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

Adaptive Sampling for Large-Data MDS

Arvind Agarwal, Chad Brubaker, Hal Daumé III, Jeff M. Phillips and Suresh Venkatasubramanian
Submitted.

Approximate Bregman near neighbors in sublinear time: Beyond the triangle inequality

Amirali Abdullah, John Moeller and Suresh Venkatasubramanian
Proc. Symposium on Computational Geometry, 2012
http://arxiv.org/abs/1108.0835

Generating a Diverse Set of High-Quality Clusterings

Jeff Phillips, Parasaran Raman and Suresh Venkatasubramanian
arXiv:1108.0017
In the 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings (held in conjunction with ECML/PKDD 2011)
Best Paper Award.

Active Supervised Domain Adaptation

Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian, and Scott L. DuVall
In the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011)

Johnson-Lindenstrauss Dimensionality Reduction on the Simplex

Rasmus J. Kyng, Jeff M. Phillips and Suresh Venkatasubramanian
In the 20th Fall Workshop on Computational Geometry, 2010.

The Johnson-Lindenstrauss Transform: An Empirical Study

Suresh Venkatasubramanian and Qiushi Wang
ALENEX11: Workshop on Algorithm Engineering and Experiments (in conjunction with SODA 2011)

Spatially-Aware Comparison and Consensus for Clusterings

Jeff M. Phillips, Parasaran Raman and Suresh Venkatasubramanian
Proc. 2011 SIAM Conference on Data Mining, Apr 2011.

New Developments in the Theory of Clustering (Tutorial)

Sergei Vassilvitskii and Suresh Venkatasubramanian
In 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010

Universal Multi-Dimensional Scaling

Arvind Agarwal, Jeff Phillips and Suresh Venkatasubramanian
In 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010