<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Suresh Venkatasubramanian &#187; CCF 0953066</title>
	<atom:link href="http://www.cs.utah.edu/~suresh/web/tag/career/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.cs.utah.edu/~suresh/web</link>
	<description></description>
	<lastBuildDate>Tue, 26 Feb 2013 16:47:11 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.5</generator>
		<item>
		<title>Sensor Network Localization for Moving Sensors</title>
		<link>http://www.cs.utah.edu/~suresh/web/2012/10/15/sensor-network-localization-for-moving-sensors/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2012/10/15/sensor-network-localization-for-moving-sensors/#comments</comments>
		<pubDate>Mon, 15 Oct 2012 17:44:24 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>
		<category><![CDATA[CCF 1115677]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=287</guid>
		<description><![CDATA[[author]Arvind Agarwal, Hal Daume III, Jeff M. Phillips, Suresh Venkatasubramanian[/author] The Second IEEE ICDM Workshop on Data Mining in Networks Abstract: Sensor network localization (SNL) is the problem of determining the locations of the sensors given sparse and usually noisy inter-communication distances among them. In this work we propose an iterative algorithm named PLACEMENT to [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Arvind Agarwal, Hal Daume III, Jeff M. Phillips, Suresh Venkatasubramanian[/author]<br />
<em><a href="http://damnet.reading.ac.uk/">The Second IEEE ICDM Workshop on Data Mining in Networks</a></em></p>
<p><span id="more-287"></span></p>
<p>Abstract:</p>
<blockquote><p>Sensor network localization (SNL) is the problem of determining the locations of the sensors given sparse and usually noisy inter-communication distances among them. In this work we propose an iterative algorithm named PLACEMENT to solve the SNL problem. This algorithm requires an initial estimate of the locations and is guaranteed to reduce the cost function in each iteration. The proposed algorithm can take advantage of a good initial estimate of sensor locations, making it suitable for localizing moving sensors and for refining the results produced by other algorithms. Our algorithm is also very scalable: we have experimented with a variety of sensor networks and have shown that the proposed algorithm outperforms existing algorithms in both speed and accuracy in almost all experiments. It can embed 120,000 sensors in less than 20 minutes.</p>
</blockquote>
<p>Links: <a href="http://www.cs.utah.edu/~suresh/papers/damnet/damnet.pdf">PDF</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2012/10/15/sensor-network-localization-for-moving-sensors/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Efficient Protocols for Distributed Classification and Optimization</title>
		<link>http://www.cs.utah.edu/~suresh/web/2012/04/16/efficient-protocols-for-distributed-classification-and-optimization/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2012/04/16/efficient-protocols-for-distributed-classification-and-optimization/#comments</comments>
		<pubDate>Tue, 17 Apr 2012 02:25:28 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=275</guid>
		<description><![CDATA[[author]Hal Daume III, Jeff M. Phillips, Avishek Saha, Suresh Venkatasubramanian[/author] Proc. 23rd International Conference on Algorithmic Learning Theory (ALT), 2012. arXiv:1204.3523v1 [cs.LG] Abstract: In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daume III et al., 2012) proposes a general model [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Hal Daume III, Jeff M. Phillips, Avishek Saha, Suresh Venkatasubramanian[/author]<br />
<a href="http://www-alg.ist.hokudai.ac.jp/~thomas/ALT12/index.html">Proc. 23rd International Conference on Algorithmic Learning Theory (ALT), 2012.</a><br />
<a href="http://arxiv.org/abs/1204.3523">arXiv:1204.3523v1 [cs.LG]</a></p>
<p><span id="more-275"></span><br />
<strong>Abstract:</strong></p>
<blockquote><p>In distributed learning, the goal is to perform a learning task over data distributed across multiple nodes with minimal (expensive) communication. Prior work (Daume III et al., 2012) proposes a general model that bounds the communication required for learning classifiers while allowing for $\eps$ training error on linearly separable data adversarially distributed across nodes.</p>
<p>In this work, we develop key improvements and extensions to this basic model. Our first result is a two-party multiplicative-weight-update based protocol that uses $O(d^2 \log{1/\eps})$ words of communication to classify distributed data in arbitrary dimension $d$, $\eps$-optimally. This readily extends to classification over $k$ nodes with $O(kd^2 \log{1/\eps})$ words of communication. Our proposed protocol is simple to implement and is considerably more efficient than the baselines we compare against, as demonstrated by our empirical results.<br />
In addition, we illustrate general algorithm design paradigms for efficient learning over distributed data. We show how to solve fixed-dimensional and high-dimensional linear programming efficiently in a distributed setting where constraints may be distributed across nodes. Since many learning problems can be viewed as convex optimization problems whose constraints are generated by individual points, this models many typical distributed learning scenarios. Our techniques make use of a novel connection to multipass streaming, as well as a more general adaptation of the multiplicative-weight-update framework to the distributed setting. As a consequence, our methods extend to the wide range of problems solvable using these techniques.</p>
</blockquote>
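As an aside for readers unfamiliar with the core primitive: the result builds on the classic multiplicative-weight-update rule for learning a linear separator. Below is a minimal single-machine sketch (a Winnow-style learner, not the paper's communication-efficient two-party protocol; the function name and parameters are our own, for illustration only):

```python
def mwu_separator(points, labels, rounds=100, eta=0.5):
    """Winnow-style multiplicative-weight-update learner for a linear
    separator with nonnegative weights and threshold sum(w)/2.
    Illustrative sketch only; not the paper's distributed protocol."""
    d = len(points[0])
    w = [1.0] * d
    for _ in range(rounds):
        mistakes = 0
        for x, y in zip(points, labels):  # labels y are +1 or -1
            margin = sum(wi * xi for wi, xi in zip(w, x)) - sum(w) / 2.0
            if margin * y <= 0:  # misclassified: reweight each coordinate
                for i in range(d):
                    w[i] *= (1.0 + eta) ** (y * x[i])
                mistakes += 1
        if mistakes == 0:  # every point classified with positive margin
            break
    return w
```

On linearly separable data with a margin, the number of multiplicative updates is bounded, which is the kind of mistake bound that lets distributed variants bound the number of (expensive) communication rounds.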
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2012/04/16/efficient-protocols-for-distributed-classification-and-optimization/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Protocols for Learning Classifiers on Distributed Data</title>
		<link>http://www.cs.utah.edu/~suresh/web/2011/12/12/protocols-for-learning-classifiers-on-distributed-data/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2011/12/12/protocols-for-learning-classifiers-on-distributed-data/#comments</comments>
		<pubDate>Mon, 12 Dec 2011 21:53:20 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=260</guid>
		<description><![CDATA[[author]Hal Daumé, Jeff M. Phillips, Avishek Saha and Suresh Venkatasubramanian[/author] In the 15th International Conference on Artificial Intelligence and Statistics (AISTATS), 2012. Abstract: We consider the problem of learning classifiers for labeled data that has been distributed across several nodes. Our goal is to find a single classifier, with small approximation error, across all datasets [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Hal Daumé, Jeff M. Phillips, Avishek Saha and Suresh Venkatasubramanian[/author]<br />
In the <a href="http://www.aistats.org/">15th International Conference on Artificial Intelligence and Statistics</a> (AISTATS), 2012.</p>
<p><span id="more-260"></span><br />
<strong>Abstract:</strong></p>
<p>We consider the problem of learning classifiers for labeled data that has been distributed across several nodes. Our goal is to find a single classifier, with small approximation error, across all datasets while minimizing the communication between nodes. This setting models real-world communication bottlenecks in the processing of massive distributed datasets.  We present several very general sampling-based solutions as well as some two-way protocols which have a provable exponential speed-up over any one-way protocol. We focus on core problems for <em>noiseless</em> data distributed across two or more nodes. The techniques we introduce are reminiscent of active learning, but rather than actively probing labels, nodes actively communicate with each other, each node simultaneously learning the important data from another node. </p>
<p>Links: <a href="http://www.cs.utah.edu/~suresh/papers/active/active.pdf">PDF</a> (this is the submitted version, not the final accepted version)</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2011/12/12/protocols-for-learning-classifiers-on-distributed-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Adaptive Sampling for Large-Data MDS</title>
		<link>http://www.cs.utah.edu/~suresh/web/2011/10/17/adaptive-sampling-for-large-data-mds/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2011/10/17/adaptive-sampling-for-large-data-mds/#comments</comments>
		<pubDate>Mon, 17 Oct 2011 17:19:56 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=258</guid>
		<description><![CDATA[[author]Arvind Agarwal, Chad Brubaker, Hal Daumé III, Jeff M. Phillips and Suresh Venkatasubramanian [/author] Submitted. Abstract: Multidimensional scaling (MDS) is one of the most popular methods for reducing the dimensionality of data. As data sizes have grown, the space and time limitations of traditional MDS algorithms have become more pronounced, and extensive research has gone [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Arvind Agarwal, Chad Brubaker, Hal Daumé  III, Jeff M. Phillips and Suresh Venkatasubramanian [/author]<br />
<em>Submitted</em>.</p>
<p><span id="more-258"></span><br />
<strong>Abstract</strong>:<br />
Multidimensional scaling (MDS) is one of the most popular methods for reducing the dimensionality of data. As data sizes have grown, the space and time limitations of traditional MDS algorithms have become more pronounced, and extensive research has gone into designing methods for performing MDS that scale to larger data sets. However, these approaches generally start from a matrix decomposition formulation of MDS. Since this decomposition is expensive in time and space, they focus on approximating it, using Nystr&ouml;m methods to solve a <em>smaller</em> matrix decomposition problem.</p>
<p>In this paper, we present a new approach to scalable MDS that combines adaptive sampling methods, multi-pass streaming algorithms, and multi-core extensions, and gives a much better error-time tradeoff than prior approaches. Our approach uses a <em>nonlinear</em> projection technique that was recently developed for MDS and avoids expensive matrix decompositions, from which it derives much of its space and time efficiency. </p>
<p>This method allows us to perform MDS feasibly and accurately on data sets of the order of hundreds of thousands of points. While this is still not &#8220;enormous&#8221;, it is orders of magnitude larger (for similar error rates) than what previously known methods could handle. In addition, because of the underlying approach we use, this method generalizes to many variants of MDS (using robust error metrics, in <em>non-Euclidean</em> spaces) that have never been studied at scale.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2011/10/17/adaptive-sampling-for-large-data-mds/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Approximate Bregman near neighbors in sublinear time: Beyond the triangle inequality</title>
		<link>http://www.cs.utah.edu/~suresh/web/2011/07/29/approximate-bregman-near-neighbors-in-sublinear-time-beyond-the-triangle-inequality/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2011/07/29/approximate-bregman-near-neighbors-in-sublinear-time-beyond-the-triangle-inequality/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 22:51:08 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=245</guid>
		<description><![CDATA[[author]Amirali Abdullah, John Moeller and Suresh Venkatasubramanian[/author] Proc. Symposium on Computational Geometry, 2012 http://arxiv.org/abs/1108.0835 Abstract: Bregman divergences are important distance measures that are used extensively in data-driven applications such as computer vision, text mining, and speech processing, and are a key focus of interest in machine learning. Answering nearest neighbor (NN) queries under these measures [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Amirali Abdullah, John Moeller and Suresh Venkatasubramanian[/author]<br />
<em><a href="http://socg2012.web.unc.edu/">Proc. Symposium on Computational Geometry, 2012</a></em><br />
<a href="http://arxiv.org/abs/1108.0835"><em>http://arxiv.org/abs/1108.0835</em></a></p>
<p><span id="more-245"></span><br />
<strong>Abstract</strong>:</p>
<blockquote><p>
Bregman divergences are  important distance measures that are used extensively in data-driven applications such as computer vision, text mining, and speech processing, and are a key focus of interest in machine learning. Answering nearest neighbor  (NN) queries under these measures is very important in these applications and has been the subject of extensive study, but is problematic because these distance measures  lack metric properties like symmetry and the triangle inequality.</p>
<p>In this paper, we present the first provably approximate nearest-neighbor (ANN) algorithms for these distance measures. These process queries in $O(\log n)$ time for Bregman divergences in fixed-dimensional spaces. We also obtain $\text{poly}\log n$ bounds for a more abstract class of distance measures (containing Bregman divergences) that satisfy certain structural properties. Both of these bounds apply to the regular asymmetric Bregman divergences as well as to their symmetrized versions.</p>
<p>To do so, we develop two geometric properties vital to our analysis: a reverse triangle inequality (RTI) and a relaxed triangle inequality called $\mu$-defectiveness where $\mu$ is a domain-dependent parameter. Bregman divergences  satisfy the RTI but not $\mu$-defectiveness. However, we show that the square root of a Bregman divergence does satisfy $\mu$-defectiveness. This allows us to then utilize both properties in an efficient search data structure that follows the general two-stage paradigm of a ring-tree decomposition followed by a quad tree search used in previous near-neighbor algorithms for Euclidean space and spaces of bounded doubling dimension. </p>
<p>Our first algorithm resolves a query for a $d$-dimensional $(1+\eps)$-ANN in $O\left((\frac{\log n}{\eps})^{O(d)}\right)$ time and $O\left(n \log^{d-1} n\right)$ space and holds for generic $\mu$-defective distance measures satisfying a RTI. Our second algorithm is more specific in analysis to the Bregman divergences and uses a further structural constant, the maximum ratio of second derivatives over each dimension of our domain ($c_0$). This allows us to locate a $(1+\eps)$-ANN in $O(\log n)$ time and $O(n)$ space, where there is a further $(c_0)^d$ factor in the big-Oh for the query time.</p>
</blockquote>
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2011/07/29/approximate-bregman-near-neighbors-in-sublinear-time-beyond-the-triangle-inequality/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Generating a Diverse Set of High-Quality Clusterings</title>
		<link>http://www.cs.utah.edu/~suresh/web/2011/07/29/generating-a-diverse-set-of-high-quality-clusterings/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2011/07/29/generating-a-diverse-set-of-high-quality-clusterings/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 22:19:19 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=243</guid>
		<description><![CDATA[[author]Jeff Phillips, Parasaran Raman and Suresh Venkatasubramanian[/author] arXiv:1108.0017 In the 2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings (held in conjunction with ECML/PKDD 2011) Best Paper Award. Abstract: We provide a new framework for generating multiple good quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components, generating [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Jeff Phillips, Parasaran Raman and Suresh Venkatasubramanian[/author]<br />
<a href="http://arxiv.org/abs/1108.0017">arXiv:1108.0017</a><br />
<em>In the <a href="http://dme.rwth-aachen.de/en/MultiClust2011">2nd MultiClust Workshop: Discovering, Summarizing and Using Multiple Clusterings</a> (held in conjunction with <a href="http://www.ecmlpkdd2011.org/">ECML/PKDD 2011</a>)</em><br />
<strong>Best Paper Award.</strong><br />
<span id="more-243"></span><br />
<strong>Abstract:</strong></p>
<blockquote><p>We provide a new framework for generating multiple good quality partitions (clusterings) of a single data set. Our approach decomposes this problem into two components, generating many high-quality partitions, and then grouping these partitions to obtain k representatives. The decomposition makes the approach extremely modular and allows us to optimize various criteria that control the choice of representative partitions.</p></blockquote>
<p>Links: <a href="http://www.cs.utah.edu/~suresh/papers/multiclust11/alternative.pdf">PDF</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2011/07/29/generating-a-diverse-set-of-high-quality-clusterings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Active Supervised Domain Adaptation</title>
		<link>http://www.cs.utah.edu/~suresh/web/2011/07/29/active-supervised-domain-adaptation/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2011/07/29/active-supervised-domain-adaptation/#comments</comments>
		<pubDate>Fri, 29 Jul 2011 22:11:09 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0841185]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=239</guid>
		<description><![CDATA[[author]Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian, and Scott L. DuVall[/author] In the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2011) Abstract: In this paper, we harness the synergy between two important learning paradigms, namely, active learning and domain adaptation. We show how active learning [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Avishek Saha, Piyush Rai, Hal Daumé III, Suresh Venkatasubramanian, and Scott L. DuVall[/author]<br />
<em>In the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (<a href="http://www.ecmlpkdd2011.org/index.php">ECML-PKDD 2011</a>)</em><br />
<span id="more-239"></span><br />
<strong>Abstract:</strong><br />
In this paper, we harness the synergy between two important learning paradigms, namely, active learning and domain adaptation. We show how active learning in a target domain can leverage information from a different but related source domain. Our proposed framework, Active Learning Domain Adapted (Alda), uses source domain knowledge to transfer information that facilitates active learning in the target domain. We propose two variants of Alda: a batch B-Alda and an online O-Alda. Empirical comparisons with numerous baselines on real-world datasets establish the efficacy of the proposed methods.</p>
<p>Links: <a href="http://www.cs.utah.edu/~suresh/papers/ecml2011/alda.pdf">PDF</a></p>
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2011/07/29/active-supervised-domain-adaptation/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Johnson-Lindenstrauss Dimensionality Reduction on the Simplex</title>
		<link>http://www.cs.utah.edu/~suresh/web/2010/10/15/johnson-lindenstrauss-dimensionality-reduction-on-the-simplex/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2010/10/15/johnson-lindenstrauss-dimensionality-reduction-on-the-simplex/#comments</comments>
		<pubDate>Fri, 15 Oct 2010 07:33:42 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0841185]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=217</guid>
		<description><![CDATA[[author]Rasmus J. Kyng, Jeff M. Phillips and Suresh Venkatasubramanian[/author] In the 20th Fall Workshop on Computational Geometry, 2010. We propose an algorithm for dimensionality reduction on the simplex, mapping a set of high-dimensional distributions to a space of lower-dimensional distributions, whilst approximately preserving pairwise Hellinger distance between distributions. By introducing a restriction on the input [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Rasmus J. Kyng, Jeff M. Phillips and Suresh Venkatasubramanian[/author]<br />
In the <a href="http://www.ams.sunysb.edu/~jsbm/fwcg-2010.html">20th Fall Workshop on Computational Geometry</a>, 2010.</p>
<p><span id="more-217"></span><br />
We propose an algorithm for dimensionality reduction on the simplex, mapping a set of high-dimensional distributions to a space of lower-dimensional distributions, whilst approximately preserving pairwise Hellinger distance between distributions. By introducing a restriction on the input data to distributions that are in some sense quite smooth, we can map $n$ points on the $d$-simplex to the simplex of $O(\eps^{-2}\log n)$ dimensions with $\eps$-distortion with high probability. The techniques used rely on a classical result by Johnson and Lindenstrauss on dimensionality reduction for Euclidean point sets and require the same number of random bits as non-sparse methods proposed by Achlioptas for database-friendly dimensionality reduction.</p>
<p>Links: <a href="http://www.cs.utah.edu/~suresh/papers/jlsimplex/fwcg10.pdf">PDF</a></p>
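The reduction to the Euclidean Johnson-Lindenstrauss setting hinges on a standard fact: the Hellinger distance between two distributions is, up to a factor of $1/\sqrt{2}$, the Euclidean distance between their coordinate-wise square roots, which map the simplex onto a portion of the unit sphere. A small sketch of that fact (function names are ours, not from the paper):

```python
import math

def sqrt_embed(p):
    """Map a distribution on the simplex to a point on the unit sphere."""
    return [math.sqrt(pi) for pi in p]

def hellinger(p, q):
    """Hellinger distance: Euclidean distance between the square-root
    embeddings, scaled by 1/sqrt(2)."""
    u, v = sqrt_embed(p), sqrt_embed(q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v))) / math.sqrt(2)
```

Because the embedded points live in Euclidean space, any JL-style projection that approximately preserves their pairwise distances also approximately preserves Hellinger distances; roughly speaking, the smoothness restriction in the result above is what allows the projected points to be mapped back onto a lower-dimensional simplex.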
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2010/10/15/johnson-lindenstrauss-dimensionality-reduction-on-the-simplex/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The Johnson-Lindenstrauss Transform: An Empirical Study</title>
		<link>http://www.cs.utah.edu/~suresh/web/2010/10/05/the-johnson-lindenstrauss-transform-an-empirical-study/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2010/10/05/the-johnson-lindenstrauss-transform-an-empirical-study/#comments</comments>
		<pubDate>Wed, 06 Oct 2010 04:51:58 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=212</guid>
		<description><![CDATA[[author]Suresh Venkatasubramanian and Qiushi Wang[/author] ALENEX11: Workshop on Algorithms Engineering and Experimentation (in conjunction with SODA 2011) Abstract: The Johnson-Lindenstrauss Lemma states that a set of $n$ points may be embedded in a space of dimension $O(\log n/\eps^2)$ while preserving all pairwise distances within a factor of $(1+\epsilon)$ with high probability. It has inspired a [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Suresh Venkatasubramanian and Qiushi Wang[/author]<br />
<em><a href="http://www.siam.org/meetings/alenex11/">ALENEX11: Workshop on Algorithms Engineering and Experimentation</a> (in conjunction with SODA 2011)</em></p>
<p><span id="more-212"></span><br />
<strong>Abstract:</strong><br />
The Johnson-Lindenstrauss Lemma states that a set of $n$ points may be embedded in a space of dimension $O(\log n/\eps^2)$ while preserving all pairwise distances within a factor of $(1+\eps)$ with high probability. It has inspired a number of proofs that extend the result, simplify it, and improve the efficiency of computing the resulting embedding. The lemma is a critical tool in the realm of dimensionality reduction and high-dimensional approximate computational geometry. It is also employed for data mining in domains that analyze intrinsically high-dimensional objects such as images and text. However, while algorithms for performing the dimensionality reduction have become increasingly sophisticated, there is little understanding of the behavior of these embeddings in practice. In this paper, we present the first comprehensive study of the empirical behavior of algorithms for dimensionality reduction based on the JL Lemma.</p>
<p>Our study answers a number of important questions about the quality of the embeddings and the performance of algorithms used to compute them. Among our key results:</p>
<ul>
<li>Determining a likely range for the big-Oh constant in practice for the dimension of the target space, and demonstrating the accuracy of the predicted bounds.</li>
<li>Finding &#8216;best in class&#8217; algorithms over wide ranges of data size and source dimensionality, and showing that these depend heavily on parameters of the data as well as its sparsity.</li>
<li>Developing the best implementation for each method, making use of non-standard optimized codes for key subroutines.</li>
<li>Identifying critical computational bottlenecks that can spur further theoretical study of efficient algorithms.</li>
</ul>
<p>Links: <a href="http://www.cs.utah.edu/~suresh/papers/jlx/jlx.pdf">PDF</a></p>
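For concreteness, the basic transform under study can be sketched in a few lines: project through a dense i.i.d. Gaussian matrix scaled by $1/\sqrt{k}$. This naive construction is one of several the study benchmarks and is shown here purely for illustration; it makes no attempt at the optimized implementations discussed above.

```python
import math
import random

def jl_transform(points, k, seed=0):
    """Embed points (length-d lists) into k dimensions using a dense
    random Gaussian matrix scaled by 1/sqrt(k). Naive illustrative
    version of the classic JL construction."""
    rng = random.Random(seed)
    d = len(points[0])
    rows = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(r[i] * p[i] for i in range(d)) for r in rows]
            for p in points]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
```

By the lemma, a target dimension $k = O(\log n/\eps^2)$ suffices for $n$ points; note that $k$ is independent of the source dimension $d$.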
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2010/10/05/the-johnson-lindenstrauss-transform-an-empirical-study/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Spatially-Aware Comparison and Consensus for Clusterings</title>
		<link>http://www.cs.utah.edu/~suresh/web/2010/07/06/spatially-aware-comparison-and-consensus-for-clusterings/</link>
		<comments>http://www.cs.utah.edu/~suresh/web/2010/07/06/spatially-aware-comparison-and-consensus-for-clusterings/#comments</comments>
		<pubDate>Tue, 06 Jul 2010 08:27:49 +0000</pubDate>
		<dc:creator>suresh</dc:creator>
				<category><![CDATA[Papers]]></category>
		<category><![CDATA[CCF 0953066]]></category>

		<guid isPermaLink="false">http://www.cs.utah.edu/~suresh/web/?p=140</guid>
		<description><![CDATA[[author]Jeff M. Phillips, Parasaran Raman and Suresh Venkatasubramanian[/author] Proc. 2011 SIAM Conference on Data Mining, Apr 2011. Abstract: This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach yields not only a distance function, but a Hilbert space-based representation of clusters as a [...]]]></description>
				<content:encoded><![CDATA[<p>[author]Jeff M. Phillips, Parasaran Raman and Suresh Venkatasubramanian[/author]<br />
<em><a href="http://www.siam.org/meetings/sdm11/">Proc. 2011 SIAM Conference on Data Mining</a>, Apr 2011. </em><br />
<span id="more-140"></span><br />
<strong>Abstract:</strong><br />
This paper proposes a new distance metric between clusterings that incorporates information about the spatial distribution of points and clusters. Our approach yields not only a distance function, but a Hilbert space-based representation of clusters as a combination of the representations of their constituent points. We use this representation and the underlying metric to design a spatially-aware consensus clustering procedure, the first of its kind. This consensus procedure also introduces a novel reduction to Euclidean clustering, and is very simple to implement. All of our results apply to comparing both soft and hard clusterings. We accompany these algorithms with a detailed experimental evaluation that demonstrates the efficiency and quality of our techniques.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.cs.utah.edu/~suresh/web/2010/07/06/spatially-aware-comparison-and-consensus-for-clusterings/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>