http://www.cs.utah.edu/~suresh
suresh at cs utah edu
Ph: 801 581 8233
Room 3404, School of Computing
50 S. Central Campus Drive,
Salt Lake City, UT 84112.

NSF CCF-0953066: CAREER: Geometric Algorithms For Data Analysis In Spaces Of Distributions

Collections of distributions arise naturally when analyzing large data sets. Since it is impractical to store all but a small fraction of such data, distributional representations are typically used to summarize the data in compact form. For example, a document in a corpus is typically represented by a normalized vector of frequencies of occurrence of keywords, an image is represented by a histogram over gradient features, and speech signals are represented by spectral densities over a frequency domain.
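As a rough illustration of the document example above, the following sketch turns a text into a normalized vector of keyword frequencies. The function name `keyword_distribution`, the keyword list and the sample text are all hypothetical and chosen only for illustration; real pipelines would use proper tokenization and a much larger vocabulary.

```python
from collections import Counter


def keyword_distribution(text, keywords):
    """Return the normalized frequency of each keyword in `text`.

    Assumption: whitespace tokenization and lowercasing are good enough
    for this toy example.
    """
    counts = Counter(text.lower().split())
    raw = [counts[k] for k in keywords]
    total = sum(raw)
    if total == 0:
        raise ValueError("none of the keywords occur in the text")
    # Dividing by the total makes the entries sum to 1, i.e. a distribution.
    return [c / total for c in raw]


# Hypothetical keyword list and document.
keywords = ["data", "distance", "cluster"]
doc = "data data distance cluster data distance"
dist = keyword_distribution(doc, keywords)
print(dist)
```

The resulting vector is a point on the probability simplex, which is the kind of object the distances discussed on this page operate on.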

Representing data sets as collections of distributions enables analysis via powerful concepts from statistics, learning theory and information theory. Concepts like strength of belief, information content, and pattern likelihood are used to extract meaning and structure from the data and are quantified using information measures like the Kullback-Leibler distance and its parent class, the Bregman divergences.
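To make the relationship between the Kullback-Leibler distance and the Bregman divergences concrete, here is a minimal sketch using the standard definition D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q> for a convex generator phi. The helper names and the two example distributions are assumptions made for illustration; they are not taken from the project's code.

```python
import math


def bregman(phi, grad_phi, p, q):
    """Bregman divergence D_phi(p, q) = phi(p) - phi(q) - <grad phi(q), p - q>."""
    inner = sum(g * (pi - qi) for g, pi, qi in zip(grad_phi(q), p, q))
    return phi(p) - phi(q) - inner


def neg_entropy(x):
    # Generator whose Bregman divergence is the Kullback-Leibler distance
    # when both arguments are probability distributions.
    return sum(xi * math.log(xi) for xi in x)


def grad_neg_entropy(x):
    return [math.log(xi) + 1.0 for xi in x]


def sq_norm(x):
    # Generator whose Bregman divergence is the squared Euclidean distance.
    return sum(xi * xi for xi in x)


def grad_sq_norm(x):
    return [2.0 * xi for xi in x]


def kl(p, q):
    """Kullback-Leibler distance, assuming strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical distributions with strictly positive entries.
p = [0.5, 0.3, 0.2]
q = [0.2, 0.2, 0.6]

print(bregman(neg_entropy, grad_neg_entropy, p, q), kl(p, q))
print(bregman(sq_norm, grad_sq_norm, p, q),
      sum((pi - qi) ** 2 for pi, qi in zip(p, q)))
```

Each printed pair should agree up to floating-point error: the same formula yields the Kullback-Leibler distance or the squared Euclidean distance depending only on the choice of generator.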

These measures capture meaning in data in a manner that traditional metrics cannot, by connecting abstract notions of information loss and transfer with concrete geometric notions like distances. However, they lack properties like symmetry and the triangle inequality that are essential requirements for the application of traditional geometric algorithms for data analysis.
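The missing properties are easy to observe numerically. The sketch below, using three hypothetical two-outcome distributions, checks that the Kullback-Leibler distance is not symmetric and that it can violate the triangle inequality.

```python
import math


def kl(p, q):
    """Kullback-Leibler distance, assuming strictly positive entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical distributions over two outcomes.
p = [0.9, 0.1]
q = [0.5, 0.5]
r = [0.1, 0.9]

# Symmetry fails: swapping the arguments changes the value.
print("KL(p, q) =", kl(p, q))
print("KL(q, p) =", kl(q, p))

# Triangle inequality fails: going directly from p to r costs more
# than going through the intermediate point q.
direct = kl(p, r)
via_q = kl(p, q) + kl(q, r)
print("KL(p, r) =", direct)
print("KL(p, q) + KL(q, r) =", via_q)
```

Algorithms that prune candidates using the triangle inequality, as many metric-space search structures do, cannot be applied unchanged to such a measure.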

In this project, the PI will develop a systematic, rigorous and global algorithmic framework for manipulating these distances. This framework will provide the foundation for efficient and accurate data analysis of spaces of distributions, and will lead to deeper insights into analysis problems across a wide range of applications.

All the research described below has been funded by the NSF under grant CCF-0953066. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

A directed isoperimetric inequality with application to Bregman near neighbor lower bounds

Amirali Abdullah and Suresh Venkatasubramanian
arXiv:1404.1191

Power to the points: validating data memberships in clusterings [author]Parasaran Raman and Suresh Venkatasubramanian[/author]

Proc. IEEE International Conference on Data Mining, 2013 (ICDM)

Sensor Network Localization for Moving Sensors [author]Arvind Agarwal, Hal Daumé III, Jeff M. Phillips, Suresh Venkatasubramanian[/author] The Second IEEE ICDM Workshop on Data Mining in Networks

Efficient Protocols for Distributed Classification and Optimization [author]Hal Daumé III, Jeff M. Phillips, Avishek Saha, Suresh Venkatasubramanian[/author] Proc. 23rd International Conference on Algorithmic Learning Theory (ALT), 2012.
arXiv:1204.3523v1 [cs.LG]

Protocols for Learning Classifiers on Distributed Data [author]Hal Daumé III, Jeff M. Phillips, Avishek Saha and Suresh Venkatasubramanian[/author] In the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

Adaptive Sampling for Large-Data MDS [author]Arvind Agarwal, Chad Brubaker, Hal Daumé III, Jeff M. Phillips and Suresh Venkatasubramanian[/author] Submitted.

Approximate Bregman near neighbors in sublinear time: Beyond the triangle inequality [author]Amirali Abdullah, John Moeller and Suresh Venkatasubramanian[/author] Proc. Symposium on Computational Geometry, 2012
http://arxiv.org/abs/1108.0835

Generating a Diverse Set of High-Quality Clusterings [author]Jeff Phillips, Parasaran Raman and Suresh Venkatasubramanian[/author] arXiv:1108.0017
In the 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings (held in conjunction with ECML/PKDD 2011)
Best Paper Award.

Active Supervised Domain Adaptation [author]Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian, and Scott L. DuVall[/author] In the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011)

Johnson-Lindenstrauss Dimensionality Reduction on the Simplex [author]Rasmus J. Kyng, Jeff M. Phillips and Suresh Venkatasubramanian[/author] In the 20th Fall Workshop on Computational Geometry, 2010.

The Johnson-Lindenstrauss Transform: An Empirical Study [author]Suresh Venkatasubramanian and Qiushi Wang[/author] ALENEX11: Workshop on Algorithms Engineering and Experimentation (in conjunction with SODA 2011)

Spatially-Aware Comparison and Consensus for Clusterings [author]Jeff M. Phillips, Parasaran Raman and Suresh Venkatasubramanian[/author] Proc. 2011 SIAM Conference on Data Mining, Apr 2011.

New Developments in the theory of clustering (Tutorial) [author]Sergei Vassilvitskii and Suresh Venkatasubramanian[/author] In 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010

Universal Multi-Dimensional Scaling [author]Arvind Agarwal, Jeff Phillips and Suresh Venkatasubramanian[/author] In 16th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2010