Computational Complexity Conference, 2015. http://eccc.hpi-web.de/report/2014/086/

Abstract:

In the setting of streaming interactive proofs (SIPs), a client (verifier) needs to compute a given function on a massive stream of data, arriving online, but is unable to store even a small fraction of the data. It outsources the processing to a third party service (prover), but is unwilling to blindly trust answers returned by this service. Thus, the service cannot simply supply the desired answer; it must convince the verifier of its correctness via a short interaction after the stream has been seen.

In this work we study “barely interactive” SIPs. Specifically, we show that two or three rounds of interaction suffice to solve several query problems — including Index, Median, Nearest Neighbor Search, Pattern Matching, and Range Counting — with polylogarithmic space and communication costs. Such efficiency with O(1) rounds of interaction was thought to be impossible based on previous work.

On the other hand, we initiate a formal study of the limitations of constant-round SIPs by introducing a new hierarchy of communication models called Online Interactive Proofs (OIPs). The online nature of these models is analogous to the streaming restriction placed upon the verifier in an SIP. We give upper and lower bounds that (1) characterize, up to quadratic blowups, every finite level of the OIP hierarchy in terms of other well-known communication complexity classes, (2) separate the first four levels of the hierarchy, and (3) reveal that the hierarchy collapses to the fourth level. Our study of OIPs reveals marked contrasts and some parallels with the classic Turing Machine theory of interactive proofs, establishes limits on the power of existing techniques for developing constant-round SIPs, and provides a new characterization of (non-online) Arthur–Merlin communication in terms of an online model.


**Abstract**:

Why does Deep Learning work? What representations does it capture? How do higher-order representations emerge? We study these questions from the perspective of group theory, thereby opening a new approach towards a theory of deep learning.

One factor behind the recent resurgence of the subject is a key algorithmic step called pretraining: first search for a good generative model for the input samples, and repeat the process one layer at a time. We show deeper implications of this simple principle, by establishing a connection with the interplay of orbits and stabilizers of group actions. Although the neural networks themselves may not form groups, we show the existence of shadow groups whose elements serve as close approximations.

Over the shadow groups, the pretraining step, originally introduced as a mechanism to better initialize a network, becomes equivalent to a search for features with minimal orbits. Intuitively, these features are in a way the simplest, which explains why a deep learning network learns simple features first. Next, we show how the same principle, when repeated in the deeper layers, can capture higher-order representations, and why representation complexity increases as the layers get deeper.

**Links:** PDF

Arxiv: http://arxiv.org/abs/1412.3756

What does it mean for an algorithm to be biased?

In U.S. law, the notion of bias is typically encoded through the idea of disparate impact: namely, that a process (hiring, selection, etc.) that on the surface seems completely neutral might still have widely different impacts on different groups. This legal determination expects an explicit understanding of the selection process.

If the process is an algorithm though (as is common these days), the process of determining disparate impact (and hence bias) becomes trickier. First, it might not be possible to disclose the process. Second, even if the process is open, it might be too complex to ascertain how the algorithm is making its decisions. In effect, since we don’t have access to the algorithm, we must make inferences based on the data it uses.

We make three contributions to this problem. First, we link the legal notion of disparate impact to a measure of classification accuracy that, while known, has not received as much attention as more traditional notions of accuracy. Second, we propose a test for the possibility of disparate impact based on analyzing the information leakage of protected information from the data. Finally, we describe methods by which data might be made “unbiased” in order to test an algorithm. Interestingly, our approach bears some resemblance to actual practices that have recently received legal scrutiny.
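The paper's own accuracy-based measure is not reproduced here, but as a concrete illustration of the legal notion above, here is a sketch of the "four-fifths rule" commonly used to flag potential disparate impact: compare selection rates across groups and flag a ratio below 0.8. The function name, group labels, and data are all hypothetical.

```python
# Illustrative check of the "four-fifths rule" for disparate impact:
# compare selection rates between two groups. The 0.8 threshold comes
# from EEOC guidelines; the hiring data below is entirely made up.

def disparate_impact_ratio(outcomes):
    """outcomes: list of (group, selected) pairs, group in {'A', 'B'}."""
    rates = {}
    for g in ('A', 'B'):
        sel = [s for grp, s in outcomes if grp == g]
        rates[g] = sum(sel) / len(sel)
    # ratio of the lower selection rate to the higher one
    return min(rates.values()) / max(rates.values())

# Hypothetical data: group A selected 4/10 times, group B 7/10 times.
data = [('A', 1)] * 4 + [('A', 0)] * 6 + [('B', 1)] * 7 + [('B', 0)] * 3
ratio = disparate_impact_ratio(data)
print(ratio)        # 0.4 / 0.7, about 0.571
print(ratio < 0.8)  # below the four-fifths threshold
```

Note that this test needs only the outcome data, not the algorithm itself, which matches the abstract's point that inferences must be made from the data the algorithm uses.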

Here are the slides (4.5 MB PDF).

ArXiv: arXiv:1404.1191

Bregman divergences $D_\phi$ are a class of divergences parametrized by a convex function $\phi$ and include well known distance functions like $\ell_2^2$ and the Kullback-Leibler divergence. There has been extensive research on algorithms for problems like clustering and near neighbor search with respect to Bregman divergences; in all cases, the algorithms depend not just on the data size $n$ and dimensionality $d$, but also on a structure constant $\mu \ge 1$ that depends solely on $\phi$ and can grow without bound independently of $n$ and $d$.
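The standard definition is $D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla\phi(y),\, x - y\rangle$. As a minimal sketch, the snippet below instantiates this for the two generating functions mentioned above: $\phi(x) = \|x\|_2^2$, which yields squared Euclidean distance, and the negative entropy $\phi(x) = \sum_i x_i \log x_i$, which yields the KL divergence on probability vectors.

```python
# Minimal sketch: the Bregman divergence
#   D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
# for two choices of the convex generator phi.

import math

def bregman(phi, grad_phi, x, y):
    return phi(x) - phi(y) - sum(g * (xi - yi)
                                 for g, xi, yi in zip(grad_phi(y), x, y))

# phi(x) = ||x||_2^2  ->  D_phi(x, y) = ||x - y||_2^2
sq = lambda v: sum(vi * vi for vi in v)
grad_sq = lambda v: [2 * vi for vi in v]

# phi(x) = sum_i x_i log x_i  ->  KL divergence on probability vectors
negent = lambda v: sum(vi * math.log(vi) for vi in v)
grad_negent = lambda v: [math.log(vi) + 1 for vi in v]

x, y = [0.5, 0.5], [0.9, 0.1]
print(bregman(sq, grad_sq, x, y))          # ||x - y||^2 = 0.32
print(bregman(negent, grad_negent, x, y))  # KL(x || y)
```

Note the asymmetry: unlike a metric, $D_\phi(x, y) \ne D_\phi(y, x)$ in general, which is one source of the difficulties the structure constant $\mu$ measures.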

In this paper, we provide the first evidence that this dependence on $\mu$ might be intrinsic. We focus on the problem of approximate near neighbor search for Bregman divergences. We show that under the cell probe model, any non-adaptive data structure (like locality-sensitive hashing) for $c$-approximate near-neighbor search that admits $r$ probes must use space $\Omega(n^{1 + \frac{\mu}{c r}})$. In contrast, for LSH under $\ell_1$ the best bound is $\Omega(n^{1+\frac{1}{cr}})$.

Our new tool is a directed variant of the standard boolean noise operator. We show that a generalization of the Bonami-Beckner hypercontractivity inequality exists “in expectation” or upon restriction to certain subsets of the Hamming cube, and that this is sufficient to prove the desired isoperimetric inequality that we use in our data structure lower bound.

We also present a structural result reducing the Hamming cube to a Bregman cube. This structure allows us to obtain lower bounds for problems under Bregman divergences from their $\ell_1$ analog. In particular, we get a (weaker) lower bound for approximate near neighbor search of the form $\Omega(n^{1 + \frac{1}{cr}})$ for an $r$-query non-adaptive data structure, and new cell probe lower bounds for a number of other near neighbor questions in Bregman space.

In IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), 2013

**Abstract**:

In the classical maximum independent set problem, we are given a graph G of “conflicts” and are asked to find a maximum conflict-free subset. If we think of the remaining nodes as being “assigned” (at unit cost each) to one of these independent vertices and ask for an assignment of minimum cost, this yields the vertex cover problem. In this paper, we consider a more general scenario where the assignment costs might be given by a distance metric d (which can be unrelated to G) on the underlying set of vertices. We call this problem minimum edge-weighted independent set. This problem, in addition to being a natural generalization of vertex cover and an interesting variant of the matroid median problem, also has connections to constrained clustering and database repair.

Understanding the relation between the conflict structure (the graph) and the distance structure (the metric) for this problem turns out to be the key to isolating its complexity. We show that when the two structures are unrelated, the problem inherits a trivial upper bound from vertex cover and provide an almost matching lower bound on hardness of approximation. We then prove a number of lower and upper bounds that depend on the relationship between the two structures, including polynomial time algorithms for special graphs.
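To make the problem statement above concrete, here is an illustrative brute-force solver (not an algorithm from the paper): choose an independent set S in G, assign every remaining vertex to its nearest member of S under d, and minimize total assignment cost. With d equal to 1 on all pairs this recovers minimum vertex cover. The instance is hypothetical and the enumeration is exponential, for tiny examples only.

```python
# Brute-force illustration of minimum edge-weighted independent set:
# minimize, over independent sets S, the sum over v not in S of the
# distance from v to its nearest vertex in S.

from itertools import combinations

def mewis_brute_force(n, edges, d):
    """n vertices 0..n-1, edges as pairs, d a symmetric cost function."""
    best = (float('inf'), None)
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            if any(frozenset(e) <= set(S) for e in edges):
                continue  # S contains both endpoints of an edge
            cost = sum(min(d(v, u) for u in S)
                       for v in range(n) if v not in S)
            best = min(best, (cost, S))
    return best

# A 4-cycle as the conflict graph; the metric comes from positions on a
# line, deliberately unrelated to the graph structure.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
pos = [0.0, 1.0, 5.0, 6.0]
cost, S = mewis_brute_force(4, edges, lambda u, v: abs(pos[u] - pos[v]))
print(cost, S)  # cost 2.0 with S = (0, 2): assign 1 -> 0 and 3 -> 2
```

The example also shows the interplay the abstract refers to: the graph forces S to be one of the two diagonals of the cycle, while the metric determines which assignment is cheap.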

Links: PDF

**Abstract**:

In this paper, we present a method to attach affinity scores to the implicit labels of individual points in a clustering. The affinity scores capture the confidence level of the cluster that claims to “own” the point. We demonstrate that these scores accurately capture the quality of the label assigned to the point. We also show further applications of these scores to estimate *global* measures of clustering quality, as well as accelerate clustering algorithms by orders of magnitude using active selection based on affinity.

This method is very general and applies to clusterings derived from any geometric source. It lends itself to easy visualization and can prove useful as part of an interactive visual analytics framework. It is also efficient: assigning an affinity score to a point depends only polynomially on the number of clusters *and is independent both of the size and dimensionality of the data*. It is based on techniques from the theory of interpolation, coupled with sampling and estimation algorithms from high dimensional computational geometry.
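The paper's interpolation-based construction is not reproduced here. Purely as a loose illustration of the idea of a per-point affinity, the sketch below uses a simple distance-ratio proxy (my choice, not the paper's method): points much closer to their own cluster center than to any other get affinity near 1, while boundary points get affinity near 1/2. All names and data are hypothetical.

```python
# Toy proxy for a per-point affinity score: how confidently does the
# nearest cluster "own" this point? NOT the paper's interpolation-based
# score, just an illustration of the concept.

def affinity_proxy(point, centers):
    dists = sorted(sum((p - c) ** 2 for p, c in zip(point, center)) ** 0.5
                   for center in centers)
    d1, d2 = dists[0], dists[1]
    return d2 / (d1 + d2)  # in (1/2, 1]: higher = more confident ownership

centers = [(0.0, 0.0), (10.0, 0.0)]
print(affinity_proxy((1.0, 0.0), centers))  # deep inside cluster 0: 0.9
print(affinity_proxy((5.0, 0.0), centers))  # equidistant boundary point: 0.5
```

Even this toy version shares one property the abstract highlights: the cost of scoring a point depends only on the number of clusters, not on the size of the data set.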

Links: (older arXiv version: submitted version)

Abstract:

This column comes in two parts. In the first, I discuss various ways of defining distances between distributions. In the second, Jeff Erickson (chair of the SoCG steering committee) discusses some matters related to the relationship between ACM and the Symposium on Computational Geometry.
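As a small companion to the first part, here are two of the standard ways of defining distances between discrete distributions: total variation distance and the Kullback-Leibler divergence. The choice of examples is mine and need not match the column's.

```python
# Two common "distances" between discrete distributions p and q.

import math

def total_variation(p, q):
    """A true metric, bounded in [0, 1]."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))

def kl_divergence(p, q):
    """Not a metric: asymmetric and unbounded. Assumes q_i > 0 where p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p, q = [0.5, 0.5], [0.75, 0.25]
print(total_variation(p, q))  # 0.25
print(kl_divergence(p, q))    # differs from kl_divergence(q, p)
```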

Links: PDF

Lecture notes: PDF

I strongly recommend that this lecture be conducted with a demonstration using real cake. But don’t use a crumbly cake, or one with too much icing!
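If you want a code demo alongside the cake, the classic two-player “I cut, you choose” protocol is the natural starting point (this sketch is mine, not taken from the lecture notes): player 1 cuts at a point she values as exactly half, player 2 takes whichever piece he values more, and both end up with at least half by their own measure. Valuations below are piecewise-uniform over equal slices and entirely hypothetical.

```python
# "I cut, you choose": a proportional, envy-free protocol for two players.
# Each player's valuation is a list of values for n equal slices of the
# cake, with value spread uniformly within each slice.

def half_cut(v):
    """Position (in slice units) where the prefix is worth half the total."""
    half, acc = sum(v) / 2, 0.0
    for i, val in enumerate(v):
        if acc + val >= half:
            return i + (half - acc) / val  # split slice i fractionally
        acc += val

def value(v, a, b):
    """A player's value for the interval [a, b] in slice units."""
    return sum(val * max(0.0, min(b, i + 1) - max(a, i))
               for i, val in enumerate(v))

def cut_and_choose(v1, v2):
    n, x = len(v1), half_cut(v1)            # player 1 cuts at her half-point
    if value(v2, 0, x) >= value(v2, x, n):  # player 2 takes the better piece
        return value(v1, x, n) / sum(v1), value(v2, 0, x) / sum(v2)
    return value(v1, 0, x) / sum(v1), value(v2, x, n) / sum(v2)

# Player 1 values the cake uniformly; player 2 loves the icing on the right.
s1, s2 = cut_and_choose([1, 1, 1, 1], [1, 1, 2, 4])
print(s1, s2)  # each share is at least 0.5 by that player's own valuation
```

The guarantee is easy to see from the code: the cutter is indifferent between the pieces, so she gets exactly half by her measure, and the chooser takes the max of two pieces summing to his whole value.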

And finally, if you use this lecture and like it, please drop me a note or post a comment. If you’d like help with adapting it to your audience, I’d be happy to help.

Cake Cutting Algorithms by Suresh Venkatasubramanian is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License. Based on a work at http://www.cs.utah.edu/~suresh/web/2013/06/25/cake-cutting-algorithms/.
