Talk:CS 6963 Summaries

28 Aug

Nathan

Is it possible for objects in your chosen probability space S to have zero probability, or do all objects in S have the properties desired for your application?
(Piyush) For the former question, the answer, in principle, is yes. But usually the probability space S is constructed appropriately by imposing certain restrictions on it so that we end up with a non-zero probability of existence. These restrictions, however, would usually be reasonable ones. For example, in the monochromatic subgraph problem, we assume that {n\choose k}2^{-{k \choose 2} + 1} < 1, which would indeed be a reasonable assumption for many problem instances. On the other hand, for the latter question, it would be highly unlikely (and probably uninteresting) to have a case where all objects in S have the desired properties.

Scott

In section 6.4.2, when it says " G \in G_{n, p}", what does the notation "G_{n,p}" mean?
(Piyush) G_{n,p} is a random graph on n nodes in which each pair of nodes is connected by an edge with probability p, independently of all other pairs.
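To make the definition concrete, here is a minimal sketch (not from the text) of how a G_{n,p} sample can be drawn: flip an independent p-biased coin for each of the \binom{n}{2} possible edges.

```python
import itertools
import random

def sample_gnp(n, p):
    """Sample a graph from G_{n,p}: each of the C(n,2) possible edges
    is included independently with probability p."""
    return [(u, v) for u, v in itertools.combinations(range(n), 2)
            if random.random() < p]
```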

Amit

I am not able to convince myself that the max cut size must be at least m/2. I buy the later argument that we need at most m/2 + 1 samples to find a cut with at least m/2 edges.
(Piyush) Note that the probability that an edge ei connects a vertex in A to a vertex in B is 1/2. To see this, consider both vertices (say v1 and v2) associated with ei. Since we are assigning v1 and v2 to sets A or B randomly and independently, the probability that they go to different sets (i.e. one to A and the other to B) is 1/2. If it is still unclear, think of it this way: you can assign the first vertex to either of the sets (say A) with probability 1, but then the second vertex must land in the other set (B), which happens with probability 1/2. Invoking linearity of expectation, the expected number of edges crossing the cut is m/2; since some outcome must achieve at least the expected value, the max cut size is at least m/2.
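A minimal sketch (not from the handout) of this expectation argument in code: assign each vertex to a side by a fair coin and count crossing edges. Averaged over many trials, the count is about m/2, and some single trial must do at least that well.

```python
import random

def random_cut_size(edges, n):
    """Assign each of the n vertices to side A or B by a fair coin
    and count the edges that cross the cut."""
    side = [random.random() < 0.5 for _ in range(n)]
    return sum(1 for u, v in edges if side[u] != side[v])

# Triangle on 3 vertices: m = 3, so the average cut size is about 1.5.
edges = [(0, 1), (1, 2), (0, 2)]
avg = sum(random_cut_size(edges, 3) for _ in range(10000)) / 10000
```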

Rob

In the analysis of the Las Vegas algorithm for the coloring problem (or, in general, any other Las Vegas algorithm), is it possible to prove the algorithm always completes in polynomial time, or can we simply prove that it completes in polynomial time with high probability? (For example, we can imagine that there is only one value for the random variable(s) that leads the algorithm to find a correct solution, but a truly random distribution is not guaranteed to return that value within a particular (finite) number of trials - it's just probable that it will.)
(Piyush) The Las Vegas algorithm assumes conditions (i.e. the sample size and the running time of the verification algorithm are both polynomial in the problem size) that, in polynomial time, guarantee a correct solution or a proof of the non-existence of such a solution. If the assumed conditions are not met, you cannot have a Las Vegas algorithm for the problem. In your case, the sample size clearly is not polynomial in the problem size, so by definition it cannot be called a Las Vegas algorithm.

Tilottama

I have a general question: while choosing to apply either the counting argument or the expectation argument, is there any intuition as to which method is better suited to a particular problem?
(Piyush) In general, it depends on the problem. Note that the object under consideration is a random variable. If it is possible to express the desired property as some expected value of this random variable, the expectation argument might be preferred. That was the case, for example, with the max-cut and MAXSAT problems.

Parasaran

Given an NP-hard problem, what tells an algorithm analyst to choose randomization (the probabilistic method) or derandomization (a deterministic method)?

(Piyush) It depends on the problem structure. If the correctness or feasibility of a randomized algorithm is not satisfactory, one may try derandomization. But derandomizing a randomized algorithm may or may not be possible, again depending on the problem.

In the 'Sample and Modify' Example-1, how do we get that the average degree of a vertex (d) is 2m/n?

(Piyush) Each edge is shared between two vertices, so the sum of the degrees is 2m. Since there are n vertices, we get an average of 2m/n.

Avishek

As we have seen, existential proofs using probabilistic methods can be converted into efficient randomized or deterministic construction algorithms. Although I am not sure, I guess there might be situations/cases where this conversion is not possible.

(Piyush) Yes, there can be situations where a randomized algorithm may not be able to give a satisfactory solution (in terms of correctness or feasibility) or a derandomized construction is not possible.
If this is true, then can we in some manner infer from the probabilistic method whether it would give rise to some (randomized or deterministic) construction algorithm or nothing at all?
(Piyush) If you construct a Monte Carlo algorithm, the success will depend on the probability of getting a correct sample. If you construct a Las Vegas algorithm, it will depend on being able to efficiently sample a candidate solution, the sample size, and the time to check whether the solution is correct or not (these quantities should be polynomial in the problem size in order to give a satisfactory solution). Note, however, that some probabilistic methods (such as sample and modify) give us a randomized algorithm as a byproduct of the existence proof itself.

Hal

I don't follow the "sample and modify" algorithm for independent sets. In particular, I don't see why it holds that the vertices obtained by the algorithm form an independent set.
(Piyush) In both steps, by construction (or "destruction"), we are removing one of the two vertices connected by an edge. So at the end, we will never have a vertex pair that was connected originally. And since we end up removing all the edges by the end of step 2, the set of vertices we are left with forms an independent set.

Arvind

I don't understand step 2 of the "sample and modify" algorithm for independent sets.
(Piyush) By deleting an edge (and one of its associated vertices), step 2 just removes one vertex from each pair of vertices that are connected (and hence could never be in an independent set together). This (combined with step 1) leaves us only with a set of vertices that were not connected in the original graph.
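A minimal code sketch of the two steps under discussion (the keep-probability 1/d below follows the average-degree argument mentioned above for d = 2m/n; the exact constant is illustrative and not essential to the question):

```python
import random

def sample_and_modify_independent_set(n, edges, d):
    """Step 1: keep each vertex independently with probability 1/d.
    Step 2: for every edge whose endpoints both survived, delete one endpoint.
    Whatever remains has no edges between its vertices, i.e. it is an
    independent set."""
    kept = {v for v in range(n) if random.random() < 1.0 / d}
    for u, v in edges:
        if u in kept and v in kept:
            kept.discard(v)   # removing one endpoint destroys this edge
    return kept
```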

Sep 2

Nathan

The first handout shows that the LLL generally produces very low probabilities of existence (but still positive) for your desired object. Are there any tricks to improve this in order to facilitate the creation of efficient algorithms?
(Suresh) there's a considerable amount of discussion on this issue: the most recent paper on the k-SAT problem discusses this in the related work. A celebrated paper by Molloy and Reed from STOC 1998 explored conditions under which the Lovasz local lemma can be converted into an effective algorithm, and citations to this paper reveal much more. Maybe this would also have been a nice project topic !

Scott

Given some problem, I don't see it as always trivial to construct the necessary dependency graph. Since dependency can be hard to detect from sampling, are there ways to relax condition 2 of Theorem 6.11?
(Piyush) We don't have to "construct" the dependency graph. All we need to come up with is an upper bound on the degree of the dependency graph, which I think should always be possible to obtain from the problem structure. That said, I think it's probably also crucial how tight the upper bound is.
(Amit) I think it's not bad to have a loose upper bound (0.1 < 1 is good, even if it may not be interesting). The condition 4dp \leq 1 constrains the degree of dependence and the probability of a bad event at the same time.

Parasaran

I don't quite understand how we are arriving at the constant queue size at each edge (Pg 4. in Handout 1). And how does delaying each packet that needs to be routed help?
(Piyush) The handout only provides a portion of the result. In particular, it shows that using LLL, we can prove that every edge has a log(c) congestion in a "frame" (the structure constructed after introducing the random delays). This fact is used later for the rest of the proof which is actually part of Leighton et al. paper (ref [5]). Also note that the random delay introduced here doesn't necessarily "help" us get a good schedule. All it does is provide us with the necessary conditions under which LLL can be applied, and it can be shown that there does exist an O(c+d) length schedule with a small positive probability.

Amit

It seems like this method could be useful at least for proving the existence of good events if there is little dependence between variables. Could this fact somehow be used to make your algorithm more efficient, with better provable bounds?
(Piyush) Note that the less likely the bad events are (i.e. small p), the more dependence we are willing to tolerate (because of the condition 4pD \leq 1). That is, we can still afford to have a reasonable amount of dependency. For your question on efficient algorithms, refer to Suresh's answer to Nathan's question above.
Also, if we try changing the 4dp \leq 1 condition, how will it affect the overall analysis?
(Piyush) A weaker condition such as ep(D+1) \leq 1 may allow you to tolerate more dependence.


Tilottama

The LLL seems to guarantee only a very small probability (even in the best case) for the event being shown to exist. Doesn't this limit its scope of application? Also, it seems to rely on the amount of dependency between variables, which I guess is inherent to the sampling method or the problem. But is there any other method/factor which can be adjusted to make the algorithm more efficient and widely applicable?
(Piyush) The LLL is especially applicable for proving the existence of events that are rare. Finding the bound on the dependency depends on the problem structure, and I think it is possible to come up with such a bound for any problem (ref: Scott's question). For your question on efficient algorithms, refer to Suresh's answer to Nathan's question above.


Arvind

I do not quite follow the analysis of packet routing after we have the expectation of the congestion (c/ac = 1/a).
(Piyush) Since the frame length is log(c), the expected congestion of an edge f in frame F will be (1/a)*log(c). Now we need to find the probability of the event that the congestion on an edge is more than log(c). To find this, they apply a Chernoff bound with suitably chosen mean μ = log(c)/a and deviation parameter δ = (1 − a) (note that the Chernoff bound has the form Pr(X > (1 + δ)μ) \leq F^{+}(μ,δ)). A union bound is then applied, which gives an upper bound on the probabilities of the bad events (i.e. an edge having congestion more than log(c)). The degree of the dependency graph is also shown to be bounded by some number. Combining these two facts, we get the desired result from LLL.

Rob

I don't understand why the proof of the LLL in the first handout introduces the variable T - I don't follow why this "conditioning technique" helps the proof, and when you might want to use it.

Sep 4

Nathan

In the attached article, why did the authors choose to set k = \frac{3\log n}{\log\log n}? My guess is just so the math would work out, but are there any other reasons?
(Piyush): I think that's the reason.
(Suresh) this is a useful fact worth remembering: if you need a number x such that x^x = n, then x is roughly \log n / \log\log n
(Amit) I think in such analyses we mostly need to show the probability in \frac{1}{n^c} form, and setting k = \frac{3\log n}{\log\log n} does the trick for us. The reason is the fact stated by Suresh above.
Is the line directly above theorem 6.1.1 a typo? Shouldn't the probability be greater than 1 - \frac{2}{n^2}?
(Piyush): Note that we need to sum over all the Pr(A_{i,k}), so we need to multiply \frac{2}{n^2} by n, yielding \frac{2}{n}.

Scott

From an applied view, I'm not following exactly where the randomization comes into play. If I'm making a hash table and have a function that goes from a key to an index, shouldn't that function be deterministic so that I can both add to the hash table and find elements in it?
(Piyush): I think the reason is the following: We choose a function h from a family H (the 2-universal family) of hash functions. By construction, a large fraction of the functions from this family are such that, for a small-cardinality set of keys, the number of collisions would be small. Thus we expect that, for a particular S, even a random h from H would give us the desired performance (i.e. a small number of collisions and an expected time of O(1) for searches and updates in the table).
(Suresh): Scott, the function is deterministic!! It's merely constructed in a randomized manner. You fix a and b once the hashing starts.

Rob

I'm having a little trouble with the construction of universal hash functions presented here - it's stated that a and b are selected from \mathbb{Z}_p, and that \mathbb{Z}_p = \{0, 1, \ldots, p-1\} - but then as part of f_{a,b}, we take b \mod p. Isn't b \leq p - 1 by definition?
(Piyush): I think it's meant to be read as (ax+b) \mod p, i.e. the mod applies to the whole expression (ax+b), and not just to b as in ax + (b \mod p).
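To make the "(ax+b) mod p" reading concrete, here is a minimal sketch of the family f_{a,b}(x) = ((ax+b) \mod p) \mod n; the prime p and the table size n below are illustrative assumptions, and, per Suresh's comment above, the function is fixed once a and b are drawn.

```python
import random

p = 2147483647   # a prime larger than any key we will hash (illustrative)
n = 1024         # number of buckets in the table (illustrative)

def make_hash(p, n):
    """Draw one member f_{a,b} of the 2-universal family at random."""
    a = random.randrange(1, p)   # a in {1, ..., p-1}
    b = random.randrange(0, p)   # b in {0, ..., p-1}
    return lambda x: ((a * x + b) % p) % n

h = make_hash(p, n)   # fixed once; deterministic from here on
bucket = h(123456)
```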

Piyush

The universal hash family assumes a small cardinality for the set S of keys. Is it small in the absolute sense, or relative to the size of the universe M?
(Amit): I would say it's relative. Small and big can't be defined in absolute terms.
I don't completely understand why the 2-universal families can't fully capture the pairwise independence property, and how the strongly 2-universal families can.
(Amit): I think reading Section 3.4 of the book will address this question (the definition of pairwise independence says something like h(x_1) = y_1 and h(x_2) = y_2, where the values y_1, y_2 need not be equal). That said, I think Suresh can give better intuition, since this section is not covered in that much detail.
In the construction, is there any reason behind choosing a linear function for f_{a,b}? Does it matter/help if we choose some other type of function?
(Amit): I think we need to choose a simple function (like a linear one), since if we chose something complex, the evaluation might unnecessarily take longer.

Parasaran

How exactly do you differentiate a 2-universal family from a strongly 2-universal one?
(Amit) The crude answer is that the strongly 2-universal family subsumes the 2-universal one: the case y_1 = y_2 in the strongly 2-universal definition is exactly what the 2-universal definition is about.
I cannot follow how the 2-universal family helps with the dynamic dictionary problem.
(Amit) In the dynamic case, we can have updates which cannot be handled by any fixed deterministic strategy. Hence we need randomization to beat the adversary, and 2-universal hash families are exactly what does that for us.

Avishek

The last paragraph in 8.4.4 [MR] says that k-wise independence reduces the chance of collision to 1/n^k and thus makes things better. This, as we saw in the last seminar, seems to agree with the Lovász Local Lemma, which says that a higher degree of mutual independence increases the chances of existence of a good event.
(Amit) I am not sure whether having more independence always increases the chances of existence of a good event. Also, independence and dependence are inherent to the problem.
1. Is there any connection between the two?
(Amit) I can't see an obvious connection, though more independence reduces the probability of bad events, i.e. collisions in our case.
2. What is the intuition for the "importance of (mutual) independence" in the formulation/design of randomized techniques? Or in other words, why is "(mutual) independence" so important for randomization?
(Amit) Independence assumptions make things easier to analyze. Also, more dependence among variables makes problems combinatorial. Leaving aside randomization, even in graphical models independence among random variables is helpful.

Jagadeesh J

1. When |N| = |S|, isn't it possible to sample a random perfect hash function from a perfect hash family? I suspect the answer is no, otherwise we wouldn't need to go for 2-level hashing. But I am still not able to digest the need for a 2-level hashing function when we are already using memory of size equal to the size of the key set.
(Jagadeesh) I think it's the trade-off between memory size and search time.
(Amit) Having |S| = |N| doesn't mean you know your set S (chosen from some universe M) a priori. If you know your S in advance, as in the static case, then you are good. Otherwise, the adversary could change S according to your chosen hash function. You could do separate chaining or 2-level hashing to avoid conflicts.
(Amit) http://en.wikipedia.org/wiki/Q.E.D.

Sep 9

Nathan

This is a pretty simple question, but I'm missing something. At the tail end of Lemma 8.17, how do they make the jump from k(x-y) \bmod p = jr to the conclusion that the number of values causing a collision is at most  \frac{2(p-1)}{r}?
(Avishek) In the equation immediately above k(x-y) \bmod p = jr, it says that k(x-y) \bmod p takes values in the set \{\pm r,\pm 2r,\ldots,\pm \lfloor (p-1)/r \rfloor r\} (say, \mathcal{R}). Since x, y and p are fixed in this equation, only by varying k can k(x-y) \bmod p = jr attain all values in \mathcal{R}. And the number of values in \mathcal{R} is at most \frac{(p-1)}{r} positive values plus at most \frac{(p-1)}{r} negative values, which gives a total of at most  \frac{2(p-1)}{r}.

Rob

The scheme described in MoRa requires 6s + 1 cells; the constant 6 feels pretty large. Are there similar schemes that use provably fewer cells? Clearly, the limit would be s + 1 (with the extra cell being used to store k) if one could find a perfect hash function. Since the "incremental" cost of a cell in the primary table that collides is 5 (3 for the secondary table, 1 to record its length, and 1 to record which secondary hash function is in use), and the expected number of collisions in the primary table is low, we might expect to be able to get closer to s + 1 + 5b, where b is the number of collisions, and b \ll s.
(Avishek) The way I understood it is that the primary hash function is good not because it reduces the overall number of collisions (b) to b \ll s, but rather because it spreads the collisions evenly across the entire primary hash table, so that no single cell in the primary hash table has a large number of collisions. A good choice of primary hash function doesn't necessarily reduce the total number of collisions, but instead tries to ensure a linear bound on the overall size of the hash table, despite the fact that secondary hash tables require size quadratic in the primary-table bin sizes. The total number of collisions (as given by the second-to-last line on Pg 226) still remains \sum_{i=0}^{s-1} b_i(k,s,S) = s, which when plugged into your expression s + 1 + 5b yields again 6s + 1. This might seem circular. Please feel free to probe further.
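Not from the text, but a minimal sketch of the two-level idea under discussion (quadratic-size secondary tables, re-drawn until a bucket is collision-free); the constants and retry strategy here are illustrative and not matched to the 6s + 1 cell accounting above.

```python
import random

P = 2147483647   # illustrative prime larger than all keys

def draw(m):
    """Draw one hash x -> ((a*x + b) mod P) mod m from a 2-universal family."""
    a, b = random.randrange(1, P), random.randrange(0, P)
    return lambda x: ((a * x + b) % P) % m

def build_two_level(keys):
    """Two-level (FKS-style) perfect hashing sketch for distinct integer keys."""
    s = len(keys)
    primary = draw(s)
    buckets = [[] for _ in range(s)]
    for x in keys:
        buckets[primary(x)].append(x)
    secondary = []
    for bucket in buckets:
        size = max(len(bucket) ** 2, 1)      # quadratic space kills collisions
        while True:                          # re-draw until this bucket is collision-free
            h = draw(size)
            slots = [None] * size
            for x in bucket:
                if slots[h(x)] is not None:  # collision: try another secondary hash
                    break
                slots[h(x)] = x
            else:
                secondary.append((h, slots))
                break
    return primary, secondary

def lookup(table, x):
    primary, secondary = table
    h, slots = secondary[primary(x)]
    return slots[h(x)] == x
```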

Parasaran

How does the nearly-2-universal hash function work for both primary and secondary hashing to arrive at near perfect hashing?

Arvind

I do not understand why m has to be extremely large relative to n. What I think is that perfect hashing is possible when h is a one-to-one function and this would be true only when n>m. Am I missing something obvious?
(Avishek) Perfect hashing is indeed possible when n > m, but that's the naive implementation of hash tables, which we discussed in the last lecture. The whole point is to find good hash families such that the hash tables are much smaller than in the naive case (i.e. than the universe size m), but still give near-perfect hashing. As we will see in the discussion of Ex 8.13, making m much larger than n guarantees the existence of good hash families which require a small hash table but incur very few collisions.

Piyush

Is the size quadratic in bin sizes indeed the best we can do for the secondary hash tables?

Jagadeesh J

I can't understand how the size of the perfect hash family is of the order of 2^{\Omega(s)} when s = n. Isn't it the case that the first key can be mapped in n ways, while the second key has n − 1 ways, leading to a hash family of size n!?

Sept 11

Rob

In the backwards analysis of the Delaunay Triangulation algorithm, it's not clear to me why it's important that you pick the last step to start the analysis with. The analysis shows that the runtime of the subroutine is constant, so it has found the runtime of all iterations, not the last in particular. It is in effect simply multiplying the runtime of the inner loop (constant) by the runtime of the outer loop (linear). In what sense is this "backwards"?
(Suresh) The "backwards" part of the analysis comes in the fact that we can show that the expected degree of the vertex removed is a constant. You are right that this is true at any step. However, it's true because we're running backwards and can condition on the structure that was in place BEFORE deletion (which is backwards, since actually this structure exists AFTER insertion)

Parasaran

In the construction of planar convex hulls, I don't follow how the backward analysis helps us arrive at the expected number of conflict-edge changes being 2/r for the point p.

Avishek

Section 2 claims that Shamos and Hoey's O(n log n) algorithm for construction of Delaunay triangulations is worst case optimal for reasonable models of computation. What are these reasonable models of computation ? I am also curious about how things change for other (unreasonable) models of computation.

Piyush

Based on the problem structure, can we say something about whether the problem is amenable to backward analysis (for example, as shown in section 6, something like triangulation in higher dimensions is not)?

Ruihong

You said we average out the worst case through randomization. For Quicksort, which I am more familiar with, I know what the worst case can be, but for the other geometric algorithms - Delaunay triangulations of convex polygons and construction of planar convex hulls - what kinds of worst cases have we averaged out by randomizing the order of the unit operations?

Nathan

What are the general requirements that need to be met in order to use backward analysis? The authors give some examples of analyses that work well and others that do not (QUICKSORT); how is this possible?

Sept 16

Rob

The book claims that selecting a proper value for r is clear from the analysis, but I don't see it. How do you choose r?
(Parasaran) The value of r has to be chosen so that it is much smaller than n. We shall see that if r gets close to n, the cost of the search becomes O(n) (which is linear, not O(log n)).

Nathan

Can you give an example of the point location problem in T(R) in class?
(Parasaran) The point location problem is a fundamental topic of computational geometry. Applications include areas processing geometrical data: computer graphics, geographic information systems (GIS), motion planning, and computer aided design (CAD). A simpler example would be : "Each time you click a mouse to follow a link in a web browser, the point location problem must be solved in order to determine which area of the computer screen is under the mouse pointer". We shall discuss an example of the point location problem in T(R) in class.

Scott

It says we sort the elements in R in constant time. How is this not cheating?
(Parasaran) We have the size of R as a constant (R has r numbers). And note that r is small compared to n. So we can sort this set R in constant time.

Avishek

Why do we need the log term in "(an log r)/r"? We can choose a and r such that a is much smaller than r. Does the log r have anything to do with searching for q in the r+1 bins?
(Parasaran) We call a subset of S\R good if its size is bounded by (an log r)/r. This is because when the search is continued by picking a subset of S\R and recursing in it (if b > r), the depth of the search tree is log r (where b is the size of the subset).

Piyush

The random sampling scheme seems to be applicable only for cases when we are given a discrete set of objects (i.e. from a collection of points, or lines -- sets that are discrete). Are there schemes that work for cases where we are required to sample from continuous sets (for example, sampling a random point from a continuous region)? More importantly, do such continuous settings exist (or are they commonly encountered) in analyzing or constructing randomized algorithms?
(Jagadeesh) One possible situation I can think of is sampling points from a continuous probability distribution where the cumulative distribution function can be calculated but may not be invertible. In such cases it may be possible to search within small sub-regions and stop when the size of the sub-region satisfies |x_1 - x_2| < \epsilon, thus converging to \frac{x_1+x_2}{2}.

Arvind

I have the same question as Rob: how do we choose a good r? I see that r has to be smaller than n, but then how small could it be? It seems to me that r gives you a way to put the cost on the preprocessing part rather than the actual search part. Can we ask something like -- how far can we go to make search really, really fast?

Jagadeesh

I have a question regarding the choice of r. As the depth of the search tree depends on r, it appears to me that the cost of search can also be written in terms of r, which in turn may help us in deciding the best possible choice.

Sept 18

Nathan

Is there any way of expanding these methods to work for IPs that do not confine their variables to be 0 or 1?
(Arvind) I think the answer is yes. You can solve the corresponding linear program and then round each variable to the nearest integer with an appropriate probability. Note that the difficult part is to come up with the appropriate probability distribution.
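A minimal sketch (assuming a fractional LP solution is already available from some solver; the values below are illustrative) of the rounding step being described: set each variable to 1 independently with probability equal to its fractional value, so that by linearity of expectation each rounded term matches its fractional counterpart in expectation.

```python
import random

def randomized_rounding(fractional_x):
    """Given fractional values x_i in [0,1] from an LP relaxation,
    set each variable to 1 independently with probability x_i."""
    return [1 if random.random() < x else 0 for x in fractional_x]

x_frac = [0.7, 0.2, 1.0, 0.45]        # illustrative LP solution
x_int = randomized_rounding(x_frac)   # e.g. [1, 0, 1, 0]
```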


Avishek

I didn't understand the statement that if the probabilities of any problem instance span a wide range then Hoeffding's bound is weak (ref. Page 13, line 2).

Piyush

Same as Nathan's question above: can we generalize this approach to general discrete optimization problems that are NP hard?
(Arvind) See the answer above. I am not sure what you mean by general discrete optimization problems. I think this technique is appropriate where you can relax the constraints, solve the problem, and then force the constraints back by rounding. If the problem you are talking about falls in this category, then it may be possible to get a solution. The analysis may be a bit involved, though.
(Piyush): I just meant the problem where the variable can take values from some discrete set such as {1,...,m}.
Similar to the LP relaxation, can we use other relaxation techniques (such as semidefinite relaxation) and expect them to help?
(Arvind) I think it's hard to say for general relaxation methods. If you use any relaxation method, get a solution, and are then able to force the relaxed constraint back (note that this may not always be possible), then I think it should help. But it is hard to say anything about the quality of the result.

Parasaran

In the Multi-Commodity flow problem, why do we do the path stripping before randomizing the path selection? I don't understand how we do the path stripping if all we know is the set of sources and sinks.


Jagadeesh (not a question but a remark to Nathan's and Piyush's question)

If a variable takes values from {1 ... m}, then can't we decompose this variable into m independent binary variables and proceed with the same technique? This brings in an additional constraint: the new subset of variables should sum to 1.

Rob

What is the advantage of using randomization here rather than some deterministic algorithm? Does randomized rounding give a (probabilistically) provably better bound than setting, say, the highest n variables to 1, and the rest to 0?

Suresh

To answer the questions posed by Nathan, Piyush, and Jagadeesh...
  • restricting the integer range to 0,1 is not a problem. You can either do as Jagadeesh suggested and represent larger numbers in unary, or even write them in binary: the basic principles work the same. If the problem admits it, a constraint of the form x \in \{0\ldots m\} can be replaced by the m+1 constraints x_i \in \{0,1\}, 0 \le \sum x_i \le m (substituting x = \sum x_i elsewhere). I'm not saying this is the only way of doing it, but this shows that the general integer programming problem can be reduced to the 0-1 variant.
  • Semidefinite rounding is indeed a way of going about things: In fact there, things get more interesting: the semidefinite constraint will often place points on a sphere, and then the goal is to "round" to a hyperplane or something like that. Two well known problems (MAX CUT and 3-COLORING) have their best approximations via semidefinite rounding.

Sept 23

Parasaran

Since the task is to perform program verification, are we allowed any probability of error? (Even though I can see that the error is reduced if independent iterations of the entire algorithm are performed.) In other words, aren't we required to be 100% correct when we do program verification?
{Scott}: The task here is simply to verify that f_1 · f_2 = f_3, or more generally, Q = 0. We're okay with this Monte Carlo approach. If we want 100% accuracy, we can simply sample r enough times (2n + 1 times) without replacement (but this takes too long).
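A minimal sketch (not the book's code) of this Monte Carlo verification: evaluate both sides at a random point and compare; if the identity is false, a single random evaluation catches it with probability at least 1 - d/|S|, where d is the degree and S the sample set.

```python
import random

def verify_product(f1, f2, f3, sample_size=10**9):
    """Check f1 * f2 == f3 as polynomials by one random evaluation.
    f1, f2, f3 are callables; sample_size is an illustrative |S|."""
    r = random.randrange(sample_size)
    return f1(r) * f2(r) == f3(r)

# (x+1)(x-1) vs x^2 - 1: a true identity always verifies.
ok = verify_product(lambda x: x + 1, lambda x: x - 1, lambda x: x * x - 1)
```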

Amit

On page 144, second-to-last paragraph, they talk about some deterministic algorithm that does 2n+1 evaluations of O(n)-size polynomials and takes Θ(n log^2 n). Is it trivial to show?
{Scott}: I haven't found anything explaining why it takes n log^2 n time. To me this indicates that it probably isn't trivial to show.


Arvind

In the matrix multiplication example, do we have to restrict ourselves to choosing r from a set of size 2? I think choosing from a bigger set would reduce the probability of error even more. It would be 1/|S|, more like the polynomial verification example.
{Amit}: I think exercise 7.2 asks us to compare the two methods. I think \frac{1}{2^k} is a better bound than 1/k (we fix |S| to be k). If we run the algorithm with a set of size 2 for k repetitions, it will take O(kn) work with \frac{1}{2^k} error probability. However, if we run it once with a set of size k, then picking a random r will require O(kr) work (the former needed only O(2n) work): about the same work, but the error is 1/k, which is much larger than the \frac{1}{2^k} error probability.
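To make the choice concrete, here is a minimal sketch (an assumption-laden illustration, not the book's exercise) of Freivalds-style verification of A·B = C with random 0/1 vectors; each independent trial errs with probability at most 1/2 when AB ≠ C, so k trials give error at most 2^{-k}.

```python
import random

def freivalds_check(A, B, C, trials=10):
    """Probabilistic check that A @ B == C for n x n integer matrices,
    using three O(n^2) matrix-vector products per trial instead of one
    O(n^3) matrix product."""
    n = len(A)
    def matvec(M, v):
        return [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    for _ in range(trials):
        r = [random.randint(0, 1) for _ in range(n)]
        if matvec(A, matvec(B, r)) != matvec(C, r):
            return False      # definitely AB != C
    return True               # probably AB == C (error <= 2^-trials)
```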

Piyush

The book suggests that when working over the infinite domain of integers, there is a way to deal with the enormous values arising in the polynomial evaluations. Is there a way when working over the reals as well (if reals are used)?
Same question as Amit's.
{Scott}: I'm not sure, but I don't believe so. If we have reals, then it's possible to need infinite precision (which we, of course, don't have). This is one reason we're confining things to finite fields in general.

Nathan

How large do the fingerprints need to be compared to the superset in question? I assume this can be controlled by how much error we want to allow in the procedure, and is somewhat dependent on the problem at hand?
{Scott}: If we ignore that pesky bit about it requiring an infinite number of random bits, we can even sample r from the set of all reals, to get probability of 0 for a bad event. In general the probability of error is \frac{d}{|S|}.

Jagadeesh J

On page 165, the book says the polynomial is easy to evaluate, as the determinant of M at a specific point can be computed in polynomial time. It appears to me that this determinant itself takes O(n!) time, as it requires summing over n! terms (each a product of n entries). Am I missing something?
{Scott}: To find the determinant of a matrix in O(n^3) time, we just do Gaussian elimination to obtain an upper triangular matrix, and then multiply the diagonal entries. Gaussian elimination takes O(n^3) time.
(page 164, same as Amit's question) Can't we evaluate the polynomial at 2n+1 points in O(n log n) by selecting the points to be the 2n + 1 roots of unity?
{Scott}: That's a fantastic point. And to be honest I don't know the answer.
(Suresh). Yay! Someone paid attention in Algorithms class. The answer is that the evaluation points MUST be random. If not, the adversary will supply you with a bad polynomial. As to why the other strategy takes n log^2 n time, I have no clue. Time for some googling!
{Scott}: But if we have 2n + 1 points, then they don't need to be random (because 2n + 1 points uniquely determine a polynomial of degree 2n).

Sept 25

Rob

Why does the book's Las Vegas version of the pattern matching algorithm fall all the way back to the "brute force" approach if it gets a false match? Can't you just detect the false match and continue using the fingerprints matching technique?
{Amit}: The probability of failure is at most 1/n. So for a bad prime we may have to check (that a match is correct before outputting it) at most n times, and each check requires O(m) time. Hence in the worst case it's O(mn). You can't stop after finding one false match, since that doesn't mean the other matches are also false. In the worst case you may have to check all the matches, and that's when it takes O(mn) time.
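For reference, a minimal sketch (not the book's pseudocode) of the fingerprint matcher being discussed: compare the pattern's fingerprint against a rolling fingerprint of each text window. The prime p is fixed here for illustration, whereas the analysis draws it at random from a large range; a Las Vegas variant would verify each candidate position character by character before reporting it.

```python
def fingerprint_match(text, pattern, p=1000003):
    """Return candidate match positions by comparing fingerprints mod p.
    Candidates can be false matches with small probability."""
    n, m = len(text), len(pattern)
    if m > n:
        return []
    base = 256
    high = pow(base, m - 1, p)                 # base^(m-1) mod p
    fp_pat = fp_win = 0
    for i in range(m):
        fp_pat = (fp_pat * base + ord(pattern[i])) % p
        fp_win = (fp_win * base + ord(text[i])) % p
    matches = []
    for j in range(n - m + 1):
        if fp_win == fp_pat:
            matches.append(j)                  # candidate (possibly false) match
        if j + m < n:                          # slide the window by one character
            fp_win = ((fp_win - ord(text[j]) * high) * base + ord(text[j + m])) % p
    return matches
```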

Nathan

Directly under Theorem 7.5, the book states that verifying that a number is prime is hard. I thought PRIMES is in P? :) Sure, we could have a high coefficient or exponent in the runtime for primality testing, but isn't the potential prime supposed to be small?
{Amit}: Yeah, primality testing is in P; the book was written before this result came out. Since we are picking p from a set whose size is polynomial in n, and primality checking takes polynomial time, we will only need O(log n) bits.

Scott

Just as a sanity check, in the Alice/Bob example of 7.4, Alice sends Bob F_p(a), and p, correct?
{Amit}: Yup! Bob needs the same p to verify.

Parasaran

In the string pattern matching problem, I can see that the Monte Carlo and then the Las Vegas algorithm work efficiently, bringing down the time from O(mn) to O(m + n). But I have a strong feeling that this definitely depends on a good fingerprinting function F. How do you choose F so that it has a small probability of error?
{Amit} You need to pick p from a big set to reduce this probability. Generally the error probability is 1/c for the second fingerprinting technique. With repeated trials you can reduce it further.

Arvind

I do not quite follow the section 7.5 where they compare previous methods. Particularly, what do they mean when they say "The fingerprint technique from 7.4 is in some sense a dual of the first technique"
{Amit} In the first technique we evaluate the polynomial at some random point r drawn from a fixed field Z_p, whereas in the other method we fix r = 2 and the field Z_p is not fixed anymore (the prime p is chosen at random). In that sense they are duals of each other: both try to solve the polynomial verification problem, but in complementary ways. We will discuss it in class.

Jagadeesh

In describing the running time of the pattern matching algorithm, p = O(log(mn)) = O(log n); if that is doable in the case of multiplication, then the final running time should also be O(log(m + n)) = O(log n). Can we conclude this?


In the pseudocode of the algorithm, computing the equality of the two numbers F_p(Y) == F_p(X(j)) requires a comparison of O(log n) bits. If this is also included, isn't the final running time of the algorithm O(n log n + m)?

Avishek

The book says that the second technique is better than the first since it is efficient - it only has to do some incremental computation at each step. Why not use the third fingerprinting technique for pattern matching purposes ?
{Amit} The book makes that statement with regard to the pattern matching problem. Today's technique is more efficient than the evaluate-at-a-random-point technique for polynomial verification; why that is true, we will discuss in the seminar.

Sept 30

Parasaran

How do we arrive at Corollary 6.7 - C_{u,v} < n^3?
(Nathan) C_{u,v} = 2mR from a few theorems before this corollary, where R is the length of the shortest path between u and v in G. Since 2m \leq n(n-1) and R \leq n-1, we get 2mR \leq n(n-1)^2 < n^3.

Rob

I feel like I'm missing something important: the variable m appears in several equations (e.g. Theorem 6.9), but I can't figure out where it's coming from.
(Nathan) m = \vert E \vert, the cardinality of the edge set.

Oct 02

Scott

On page 310, it says "Clearly, 0 < \#F \leq 2^n." I get the 2^n bit, but how is it clear that it's not 0 \leq \#F?
(Nathan) If any clause of a DNF formula is satisfiable, then the entire formula is. The only way to make a clause unsatisfiable under every assignment is to allow a variable and its negation to appear in the same clause, which we assumed never happens in this formulation of the problem.

Piyush

In section 11.2.1 they say that, for their abstraction, computing f(u) and sampling from U both are assumed to take *unit* time. However, for the special case of DNF, they say that these can be done in *polynomial* time. Why is there a difference?
(Nathan) For the abstraction I think the authors just didn't want to get into a generalized runtime for these operations because it would depend on the problem.
In section 11.2.2, they say that for all i, |H_i| is computable in polynomial time. Just to make sure I understood correctly: does "|H_i| is computable" mean evaluating a given clause for all possible truth assignments (or maybe for some truth assignment) of the set H_i?
(Nathan) It's computable for a given truth assignment.

Parasaran

I don't understand how the sampling technique becomes efficient when the requirement of obtaining an ε approximation relative to the size of G is relaxed (to an approximation with a small error w.r.t. |U|).
(Nathan) This is where the importance sampling comes into play: the samples we draw from U are generated in such a way as to increase the chance that they come from G.
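A minimal sketch (illustrative data structures, not the book's notation) of the DNF counting estimator being discussed: sample a (clause, assignment) pair from the weighted union U, and count the sample only when its clause is the first one the assignment satisfies, so each satisfying assignment is counted exactly once.

```python
import random

def dnf_count_estimate(clauses, n, samples=10000):
    """Estimate the number of satisfying assignments of a DNF formula over
    n boolean variables. Each clause is a dict {var_index: required_bool}."""
    sizes = [2 ** (n - len(c)) for c in clauses]          # |U_i| per clause
    total = sum(sizes)                                    # |U|
    hits = 0
    for _ in range(samples):
        i = random.choices(range(len(clauses)), weights=sizes)[0]
        a = {v: random.random() < 0.5 for v in range(n)}  # uniform over U_i:
        a.update(clauses[i])                              # force clause i to be satisfied
        first = next(j for j, c in enumerate(clauses)
                     if all(a[v] == b for v, b in c.items()))
        hits += (first == i)                              # canonical representative only
    return total * hits / samples

# Tiny example: (x0 AND x1) OR (NOT x0) has 3 of the 4 assignments satisfying it.
est = dnf_count_estimate([{0: True, 1: True}, {0: False}], n=2)
```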

Oct 07

Parasaran

In the proof of the proposition RP1 = RP2, I can see from the definitions of RP1 & RP2 that RP1 \supseteq RP2. But somehow I cannot follow the amplification proof to show that RP1 \subseteq RP2. What role does t(|x|) play here?
Avishek: Basically, what the second part is trying to say is that if the probability of something bad happening (here, the machine returning 0 although the string is in the language, i.e., if  x \in L, Prob[M(x)=0] < 1/2) is upper bounded by half, then we can repeatedly invoke the machine to make this probability of the bad event arbitrarily small - this is the amplification technique. I just said that we have to repeatedly invoke the machine - but how many repetitions should we carry out? t(|x|) is nothing but the number of repetitions.
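A minimal sketch (hypothetical helper names) of this one-sided amplification: run the machine t(|x|) times and accept if any run accepts. Since the machine never accepts when x is not in L, repetition only shrinks the false-negative probability.

```python
def amplify(machine, x, t):
    """machine(x) is a one-sided randomized test: it may wrongly return
    False when x is in the language, but never wrongly returns True.
    Running it t times and OR-ing the outcomes drives the failure
    probability from (1 - q) per run down to (1 - q)**t."""
    return any(machine(x) for _ in range(t))
```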

Rob

It seems like we've mostly seen algorithms this semester that have one-sided error (the hashing and pattern matching algorithms come to mind), but nothing with two-sided error. What's an example of a real algorithm in the latter category?
Avishek: Random sampling can be an example of the latter category. If x belongs to L, and the sample is not good then we can make an erroneous conclusion. Again, if x does not belong to L, and our sample says otherwise, we again make an error. Thus, we are allowed to make errors on both sides. Actually, problems that can be solved by Monte Carlo algorithms lie in BPP. (An FYI in this context, problems which have Las Vegas algorithms lie in ZPP). However, there can be Monte Carlo algorithms which error only on one side and the corresponding problems lie in RP.


Scott

In section 1.2 (Offline Approach) of the l7 notes, it says "Imagine that the machine receives this second input from an external 'coin tossing device' rather than toss coins internally." I'm not catching why we want to do this.
(Suresh) It is often convenient to think of the random coin tosses as a separate "advice" string provided to the algorithm. The "external coin tossing device" emphasizes this formalism. Also, forcing the use of an external device ensures that the coin tosses are not adaptive: i.e. they are computed prior to running the algorithm.
Avishek: Let's try to understand it in the context of NP. NP can be defined in two ways: 1. The classical definition says that we have a non-deterministic Turing machine (NTM) which makes multiple copies of itself at each iteration. All of its copies which don't reach some "good state" die, and at least one copy which traverses the "good path" reaches the "good state". This "good state" is the witness or certificate. But we don't supply it to the NTM; the NTM finds it for itself. 2. But NTMs are fictitious machines, so how do we simulate this with a DTM? In this case, we compute the "good state" beforehand (which we again call a witness or certificate) and check whether, by following the witness, the DTM reaches an accept or reject state. So in this case the NTM is the online approach, where decisions are taken dynamically, and the DTM is the offline approach. Now we extend this to the probabilistic case and say that if the machine (PTM) tosses coins internally, then it takes decisions dynamically and so follows an online approach. However, if we need to simulate the whole process using a DTM, then we need to precompute and supply the witness beforehand. The witness in this case is precomputed using an external coin tossing device and supplied to the DTM. The DTM need not take any decision on the fly, and hence this is an offline approach.


Piyush

In the proof of RP1 \supseteq RP2, the authors say that the finitely many inputs for which being in RP2 doesn't imply being in RP1 can be incorporated into the machine of definition 3 (i.e. RP1). How exactly is this done?
In showing the equivalence of RP1 and RP2, the authors say that a noticeable probability of success (\geq 1/p(|x|)) implies the existence of an efficient algorithm with a negligible probability of error (i.e. success probability \geq 1 - 2^{-p(|x|)}). Doesn't it seem counter-intuitive? It might also mean that as the probability of success in RP1 goes further down (with increasing p(|x|)), the probability of error in RP2 gets even smaller. Am I interpreting something wrong here?

Arvind

I do not think I understand the difference between non-deterministic Turing machines and probabilistic Turing machines. Are probabilistic Turing machines more general than non-deterministic ones?

Ruihong

It seems that in every proof, if you can define an appropriate Turing machine which captures the key property of the problem, you can complete the analysis smoothly. But how can we identify the key properties, and how can we represent them in Turing machines? Can you give some tricks?
(Suresh) It's not easy. For things like time, space and randomness, it's possible to do this. Parallelism can sort of be captured as well. We'll see "interaction" as a resource to be quantified, as well as quantum bits (later on). But some notions are more elusive.

Oct 09

Parasaran

I cannot see what kinds of problems fall into the classes \Sigma^p_i and \Pi^p_i for i > 2. Can you give some examples so that I can understand these classes of problems better?
(Suresh) For any particular i, there might not be natural problems, but if you look at arbitrarily long alternation, then you get to model games. Think of the model this way: We want some move (there exists) such that for all moves of the adversary (for all) there exists a move (there exists) such that for all moves of the adversary (for all) ..... there exists a winning position (a poly time evaluation). The polynomial time hierarchy defined in this manner is contained in PSPACE, which captures most game playing problems. Another place where this shows up is in logic (and database theory): some complex predicates over a database might have a few nested levels of "there exists" and "for all".

Oct 21

Rob

I'm not following the protocol for TQBF, and the main reason is that I don't fully understand the role of the polynomial s(X1). It would seem that it must be easier to evaluate than the full sum of Equation 7, since there's no nested summation involved. But, it's not clear to me how this helps prove the value of K claimed by the prover.
(Parasaran) We shall describe an interactive protocol for #SAT_D, a decision version of the #SAT problem. In the sumcheck protocol (which verifies the number of satisfying assignments), the Prover sends back some polynomial s(X_1). We still do not know if s(X_1) is in fact the correct polynomial h(X_1). Through induction we can prove that, whatever s(X_1) is, the Verifier will reject incorrect polynomials with high probability. We can extend this to the TQBF problem and show that TQBF  \in  IP[poly(n)].

Piyush

Are there specific scenarios where it would be easier/more useful to go for a public coin IP and not private coin IPs?
Maybe it's just a minor pedantic notational issue: definition 8.1 defines the output of f after a k-round interaction as f(x, a_1, ..., a_k) and for g as g(x, a_1, ..., a_k). For f (assuming the interaction begins with f), shouldn't the output be something like f(x, a_1, ..., a_{k-1})?
(Parasaran) Section 8.4 on public coins and the AM, MA classes talks about the use of public coins. The interactive proof for GNI seems to depend on the fact that P cannot see the random bits of V; if P knew those bits, P could trivially always guess correctly. So, though it may seem that allowing the verifier to keep its coins private adds significant power to interactive proofs, in a lot of cases it doesn't make much difference even if the coin tosses are public. We shall not be looking into this section in the lecture, though.

Avishek

This is again a minor issue and I might be wrong. But I wonder whether the 'output=1' in 'Soundness' of Defn. 8.2 should be 'output=0' ?
(Parasaran) Yes this is a typo. It should be zero.

Jagadeesh

I didn't quite understand the role of the function k(n) in defining IP[k]. Does it mean that the interaction completes in 2 * k(n) steps ?
The first remark on page 149 talks about the deterministic prover which is an average of the probabilistic prover P. In example 8.4, does the deterministic prover correspond to always outputting either Coke or Pepsi?
(Parasaran) The prover indeed is Marla, who tells the verifier (Arthur) whether the drink she tasted was Coke or Pepsi.

Oct 23

Parasaran

How are the various classes - P, NP, coNP, RP, IP etc... related to PCP?
{Scott}
That will be one of the major punchlines of the talk: NP = PCP.
<Parasaran>
I just saw this result in Wikipedia. It basically says NP = PCP(O(log n), O(1)), i.e. O(log n) coin tosses and O(1) oracle queries on an input of size n. Does something interesting happen if the O(log n) and O(1) are changed to higher bounds?
(Suresh) if you allow more randomness, then you can capture NEXP (!)

Oct 30

Piyush

Is it true that a random walk results in a distribution close to uniform only on those expander graphs for which the second largest eigenvalue is less than 1? From Theorem 4, it appears so.
(Suresh). I think the point is that the graph isn't an expander unless the second largest eigenvalue is less than 1. The rate of convergence of the distribution is what's important here: it's controlled by the second eigenvalue.
For the two point sampling method, why do they assume that for at least n/2 values of r, A(x,r) = 1? Does it matter?
(Suresh) Well because it's an RP algorithm.
(Jagadeesh) If you assume that this probability is some \alpha with 0\leq \alpha \leq 1 and work through the derivation as we did, I guess we will get something like \frac{t \alpha (1-\alpha)}{t^2 \alpha^2} = \frac{1}{t}\cdot (\frac{1}{\alpha}-1). So if α is small, then the error probability becomes high again.

Scott

Can we, instead of taking a random walk, take an arbitrary walk (for example, X_{i+1} is the (i mod d)-th neighbor of X_i) and still have similar (or useful) behavior?
(Suresh) Hmm. not clear. Remember that we're using the random walk structure in order to guarantee that we are getting closer to the stationary distribution (and so the probability of deviation goes down exponentially).

Parasaran

When amplifying the success probability by the random walk, I can't see why we are using only m + t log d bits.
(Jagadeesh) Remember that this graph has 2^m vertices. Picking a random vertex from this graph costs m bits, and then at each step you need log d bits to select one of the vertex's d neighbors; the walk has length t. So the total number of random bits used is m + t log d.
In BPP, does lowering the bound for \Pr[A' \textrm{fails}] when x \in L, take care of the x \notin L case as well?
(Jagadeesh) β is a two-sided error probability.

Nov 6

Rob

How can it be that we can prove the existence of hard-core predicates (as done at the end of the notes for Lecture 8), given that we know that if they exist, PRNGs exist, and therefore P \neq NP?
(Suresh) Aha, but notice the exact claim. IF p is a one-way permutation, then we can manufacture g, B such that B is hard-core for g. Someone still has to give us p ! (it's turtles all the way down....)

Parasaran

Why is the pairwise independence condition on X(1),...X(k) sufficient rather than being uniformly random?
(Suresh) I think you mean "independent" rather than "uniformly random". They are uniformly random. The pairwise condition is sufficient because that's all we need to invoke Chebyshev, to get a decent bound.
When we boil down to log k guesses, how does the bitwise addition modulo 2 take care of generating the pairwise independent variables?
(Suresh) Roughly speaking, here's the idea. Take two sets of random variables formed via summation. If they are different, then there is at least one variable that is in one and not the other. Call this r. Now since r is chosen randomly, the probability of them both being (say 1) is the probability of the set not containing r being 1, and the set containing r being 1. But the set containing r will flip its state depending on the value of r, and so it is 1 with probability 1/2, just like the other set. That's where the independence comes in: in short, the fact that r is not in both sets "decouples" them. Note that with three sets we may not find such an r, so there's no "three-way" independence result.
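A minimal code sketch (not from the notes) of the subset-XOR construction just described: b truly random seed bits yield 2^b − 1 pairwise independent unbiased bits, one for each non-empty subset of seed positions.

```python
import random
from itertools import chain, combinations

def pairwise_independent_bits(b):
    """XOR the b seed bits over every non-empty subset of positions.
    Any two distinct subsets differ in some position r, and that r
    'decouples' them, giving pairwise (but not full) independence."""
    seed = [random.randint(0, 1) for _ in range(b)]
    nonempty_subsets = chain.from_iterable(
        combinations(range(b), k) for k in range(1, b + 1))
    out = []
    for subset in nonempty_subsets:
        bit = 0
        for i in subset:
            bit ^= seed[i]
        out.append(bit)
    return out

bits = pairwise_independent_bits(4)   # 4 seed bits -> 15 pairwise independent bits
```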

Scott

It seems to me that the recurring theme over the last many lectures has been that we put in a few random bits, and we get out more (pseudo) random bits. In many of the cases we've seen we're reducing the number of random bits needed by some constant. Why are we all of a sudden caring about reducing things by constants? And, from an applications standpoint, since random numbers on a computer are all pseudo random anyway, if we want more random bits, why not just call rand() again?

Nov 11

Rob

In regards to the text at the beginning Section 4.4 of the reading, about derandomization: Given that we've shown this semester how much additional power we can get from randomness, why on earth would we want to take a randomized algorithm and completely derandomize it, making it deterministic?
(Suresh) "how much additional power": What does that mean ? from a practical matter, it's true that randomization actually helps. But from a formal perspective, does it really give us more power ? Also, there are many practical reasons why a randomized algorithm might not be suitable: you might actually NEED a deterministic algorithm for some problem.

Parasaran

How does reducing the amount of randomness affect the probability bounds of the probabilistic algorithms?
(Suresh) this is a vague question. Can you elaborate ?
Suppose we have a BPP algorithm that has a certain probability bound k when x \in L (achieved through repetition). How does this probability bound change when we do partial derandomization?


Nov 13

Parasaran

In the Hypergraph Matching problem, aren't we trying to bound the probability that the edge cardinality (after a 'bite') differs from the expected number of edges?
I am a little lost on why we need t = O(d^{\frac{1}{2}}).

Nov 18

Rob

I don't know the notation ||Ad||_\infty (I do know that Ad is a matrix-vector product) - what is it that this set balancing problem is minimizing?
(Parasaran) It is trying to minimize \max_{i = 1,\ldots,n} |c_i| over the resulting column vector c = Ad. The Michael Mitzenmacher book [1] has a better graphical explanation.
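A minimal sketch (illustrative matrix, not from the notes) of the randomized argument for this quantity: choose each entry of d uniformly from {-1, +1}; a Chernoff bound plus a union bound then show that with high probability every entry of c = Ad has magnitude O(\sqrt{m \ln n}).

```python
import random

def random_set_balance(A):
    """A is an n x m 0/1 matrix (list of rows). Pick random signs d and
    return the discrepancy max_i |(A d)_i| together with d."""
    m = len(A[0])
    d = [random.choice((-1, 1)) for _ in range(m)]
    c = [sum(row[j] * d[j] for j in range(m)) for row in A]   # c = A d
    return max(abs(ci) for ci in c), d

disc, signs = random_set_balance([[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1]])
```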

Nov 25

Scott

In Section 10.3, they refer to the "Extended Janson Inequality". What is this?
(Piyush) It has an extra condition, apart from the standard conditions required by the Janson inequality, which requires that \Delta \ge E[X], and it is known to give stronger results in certain cases.

Rob

The summary refers to "shows up at many places such as while analyzing web-graphs." Are "web graphs" something from graph theory, or just graphs made up of linked web pages?
(Piyush) It's the latter, or graphs associated with social network structures.

Parasaran

Do we calculate the threshold function very precisely in all cases? How exactly is the threshold function useful? It appears to me that c·r(n) for some constant c would also work for all graph-theoretic properties (as was the case with the triangle example).
(Piyush) I think it is usually done via some kind of asymptotic analysis over the property we care about. As in the triangle example, just writing out the expected number of triangles and considering the asymptotic case (i.e. n \rightarrow \infty) suggested an appropriate parametrization for the probability (and the threshold). Knowing the threshold function helps you decide what range the probability can take and you can tune the evolution of a given random graph such that it has the desired graph theoretic property.