Talk:CS 6963 Summaries
28 Aug
Nathan
- Is it possible for objects in your chosen probability space S to have a zero probability, or do all objects in S have the properties desired for your application?
- (Piyush) For the former question, the answer, in principle, is yes. But usually the probability space S is constructed appropriately, by posing certain restrictions on it, so that we end up with a non-zero probability of existence. These restrictions, however, would usually be reasonable ones. For example, in the monochromatic subgraph problem we assume that the expected number of monochromatic subgraphs is less than 1, which would indeed be a reasonable assumption for many problem instances. For the latter question, on the other hand, it would be highly unlikely (and probably uninteresting) to have a case where all objects in S have the desirable properties.
Scott
- In section 6.4.2, what does the notation "Gn,p" mean?
- (Piyush) Gn,p is a random graph on n nodes in which each pair of nodes is joined by an edge with probability p, independently of all other pairs.
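A minimal sketch (in Python, not from the text) of what drawing a graph from Gn,p means operationally: each of the C(n,2) possible edges is included independently with probability p.
 import random
 from itertools import combinations
 def sample_gnp(n, p, seed=None):
     rng = random.Random(seed)
     # each potential edge {u, v} is kept independently with probability p
     return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]
 edges = sample_gnp(10, 0.3, seed=1)
 print(len(edges), "edges sampled out of", 10 * 9 // 2, "possible")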
Amit
- I am not able to convince myself that the max cut size is at least m/2. I buy the later argument that the expected number of samples needed to find a cut with at least m/2 edges is at most m/2 + 1.
- (Piyush) Note that the probability that an edge ei connects a vertex in A to a vertex in B is 1/2. To see this, consider both vertices (say v1 and v2) associated with ei. Since we are assigning v1 and v2 to sets A or B randomly and independently, the probability that they go to different sets (i.e. one to A and the other to B) is 1/2. If it is still unclear, think of this: you can assign the first vertex to any of the sets (say A) with probability 1. But then you must assign the second vertex to the other set (B), with probability 1/2. Since these two events are independent, we get an overall probability of 1/2. Invoking the linearity of expectation argument, we get the m/2 lower bound on the max cut size.
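To make the expectation argument concrete, here is a small illustrative Python sketch (mine, not from the text): assign each vertex to A or B by a fair coin flip; each edge then crosses the cut with probability 1/2, and by linearity of expectation the expected cut size is m/2, so a cut of size at least m/2 must exist.
 import random
 def random_cut_size(n, edges, rng):
     side = [rng.random() < 0.5 for _ in range(n)]   # True -> A, False -> B
     return sum(1 for u, v in edges if side[u] != side[v])
 rng = random.Random(0)
 n = 8
 edges = [(u, v) for u in range(n) for v in range(u + 1, n)]  # complete graph, m = 28
 m = len(edges)
 trials = 10000
 avg = sum(random_cut_size(n, edges, rng) for _ in range(trials)) / trials
 print("m/2 =", m / 2, " empirical average cut size =", avg)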
Rob
- In the analysis of the Las Vegas algorithm for the coloring problem (or, in general, any other Las Vegas algorithm), is it possible to prove the algorithm always completes in polynomial time, or can we simply prove that it completes in polynomial time with high probability? (For example, we can imagine that there is only one value for the random variable(s) that leads the algorithm to find a correct solution, but a truly random distribution is not guaranteed to return that value within a particular (finite) number of trials - it's just probable that it will.)
- (Piyush) The Las Vegas algorithm assumes conditions (i.e. sample size and the verification algorithm both should be polynomial in the problem size) that, in polynomial time, guarantee a correct solution or the non-existence of such a solution. If the assumed conditions are not met, you cannot have a Las Vegas algorithm for the problem. In your case, the sample size clearly is not polynomial in the problem size. So, by definition, it cannot be called a Las Vegas algorithm.
Tilottama
- I have a general question - while choosing to apply either the counting argument or the expectation argument, is there any intuition as to which method is better suited to a particular problem?
- (Piyush) In general, it depends on the problem. Note that the object in consideration is a random variable. If it is possible to express the desired properties as some expected value depending on this random variable, the expectation argument might be preferred. Such were the cases, for example, with the max-cut and maxsat problems.
Parasaran
Given an NP-hard problem, what tells an algorithm analyst to choose randomization (the probabilistic method) or derandomization (a deterministic method)?
- (Piyush) It depends on the problem structure. If the correctness or feasibility of a randomized algorithm is not satisfactory, one may try derandomization. But derandomizing a randomized algorithm may or may not be possible, again depending on the problem.
In the 'Sample and Modify' Example-1, how do we choose the average degree of the vertex (d) to be 2m/n ?
- (Piyush) Each edge is shared between two vertices. Thus the sum of the degrees would be 2m. Since there are n vertices, we get an average of 2m/n.
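A quick illustrative check (in Python) of this handshaking argument on a small hypothetical graph:
 # hypothetical small graph with n = 4 vertices and m = 4 edges
 edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
 n, m = 4, len(edges)
 deg = [0] * n
 for u, v in edges:
     deg[u] += 1   # every edge contributes to the degree of both endpoints,
     deg[v] += 1   # so the degrees sum to 2m
 assert sum(deg) == 2 * m
 print("average degree =", sum(deg) / n, "= 2m/n =", 2 * m / n)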
Avishek
As we have seen, existential proofs using probabilistic methods can be converted into efficient randomized or deterministic construction algorithms. Although I am not sure, I guess there might be situations/cases where this conversion is not possible.
- (Piyush) Yes, there can be situations where a randomized algorithm may not be able to give a satisfactory solution (in terms of correctness or feasibility) or where a derandomized construction is not possible.
- If this is true, then can we in some manner infer from the probabilistic method whether it would be able to give rise to some (randomized or deterministic) construction algorithm or nothing at all ?
- (Piyush) If you construct a Monte Carlo algorithm, the success will depend on the probability of getting the correct sample. If you construct a Las Vegas algorithm, it would depend on being able to efficiently sample a candidate solution, on the sample size, and on the time to check whether the solution is correct or not (these should be polynomial in the problem size in order to give a satisfactory solution). Note, however, that some probabilistic methods (such as sample and modify) give us a randomized algorithm as a byproduct of the existence proof itself.
Hal
- I don't follow the "sample and modify" algorithm for independent sets. In particular, I don't see why it holds that the vertices obtained by the algorithm form an independent set.
- (Piyush) In both steps, by construction (or "destruction"), we are removing one of the two vertices connected by an edge. So at the end we will never have a vertex pair that was connected originally. And since we end up removing all the edges by the end of step 2, the set of vertices we are left with forms an independent set.
Arvind
- I don't understand step 2 of the "sample and modify" algorithm for independent sets.
- (Piyush) By deleting an edge (and one of the associated vertices), step-2 just removes one vertex from each pair of vertices that are connected (and hence could never be in an independent set together). This (combined with step-1) leaves us only with a set of vertices that were not connected in the beginning.
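Here is a short Python sketch of my reading of the sample-and-modify argument for independent sets (a sketch assuming the average degree d = 2m/n is at least 1, not a verbatim transcription of the text): keep each vertex independently with probability 1/d, then delete one endpoint of every surviving edge; what remains is an independent set, and its expected size works out to n/(2d) = n^2/(4m).
 import random
 def sample_and_modify_independent_set(n, edges, rng):
     d = 2 * len(edges) / n                 # average degree; assumed >= 1
     keep = {v for v in range(n) if rng.random() < 1.0 / d}   # step 1: sample
     for u, v in edges:                                       # step 2: modify
         if u in keep and v in keep:
             keep.discard(v)                # drop one endpoint of each surviving edge
     return keep
 rng = random.Random(0)
 n = 20
 edges = [(u, (u + 1) % n) for u in range(n)] + [(u, (u + 2) % n) for u in range(n)]
 ind = sample_and_modify_independent_set(n, edges, rng)
 assert all(not (u in ind and v in ind) for u, v in edges)    # no edge survives
 print("independent set of size", len(ind))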
Sep 2
Nathan
- The first handout shows that the LLL generally produces very low probabilities of existence (but still positive) for your desired object. Are there any tricks to improve this in order to facilitate the creation of efficient algorithms?
- (Suresh) there's a considerable amount of discussion on this issue: the most recent paper on the k-SAT problem discusses this in the related work. A celebrated paper by Molloy and Reed from STOC 1998 explored conditions under which the Lovasz local lemma can be converted into an effective algorithm, and citations to this paper reveal much more. Maybe this would also have been a nice project topic !
Scott
- Given some problem, I don't see it as always trivial to construct the necessary dependency graph. Since dependency can be hard to detect from sampling, are there ways to relax condition 2 of Theorem 6.11?
- (Piyush) We don't have to "construct" the dependency graph. All we need to come up with is an upper bound on the degree of the dependency graph, which I think should always be possible to obtain from the problem structure. That said, I think it's probably also crucial how tight the upper bound is.
- (Amit) I think it's not bad to have a loose upper bound (0.1 < 1 is still good, though it may not be interesting). The bound 4dp ≤ 1 constrains the amount of dependence and the probability of a bad event at the same time.
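For reference, my recollection of the symmetric form of the LLL that Theorem 6.11 states (condition 2 being the last inequality):
 \text{If } \Pr(E_i) \le p \text{ for every } i,\ \text{each } E_i \text{ is mutually independent of all but at most } d \text{ of the other events, and } 4dp \le 1,
 \text{then } \Pr\Big(\bigcap_i \overline{E_i}\Big) > 0 .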
Parasaran
- I don't quite understand how we are arriving at the constant queue size at each edge (Pg 4. in Handout 1). And how does delaying each packet that needs to be routed help?
- (Piyush) The handout only provides a portion of the result. In particular, it shows that using the LLL we can prove that every edge has log(c) congestion in a "frame" (the structure constructed after introducing the random delays). This fact is used later for the rest of the proof, which is actually part of the Leighton et al. paper (ref [5]). Also note that the random delay introduced here doesn't necessarily "help" us get a good schedule. All it does is provide us with the necessary conditions under which the LLL can be applied, and it can be shown that there does exist an O(c+d) length schedule with a small positive probability.
Amit
- It seems like this method could be useful at least for proving the existence of good events when there is little dependence between variables. Could this fact somehow be used to make your algorithm more efficient, with better provable bounds?
- (Piyush) Note that the less likely the bad events are (i.e. the smaller p is), the more willing we are to tolerate dependence (because of the condition 4dp ≤ 1). That is, we can still afford a reasonable amount of dependency. For your question on efficient algorithms, refer to Suresh's answer to Nathan's question above.
- Also, if we try changing the 4dp ≤ 1 condition, how will it affect the overall analysis?
- (Piyush) A weaker condition may allow you to tolerate more dependence.
Tilottama
- The LLL seems to guarantee only a very small probability (even in the best case) for the event being shown to exist. Doesn't this limit its scope of application? Also, it seems to rely on the degree of dependency between variables, which I guess is inherent to the sampling method or problem. But is there any other method/factor which can be adjusted to make the algorithm more efficient and widely applicable?
- (Piyush) The LLL is especially applicable for proving the existence of events that are rare. Finding the bound on dependency depends on the problem structure, and I think it is possible to come up with such a bound for any problem (ref: Scott's question). For your question on efficient algorithms, refer to Suresh's answer to Nathan's question above.
Arvind
- I do not quite follow the analysis of packet routing after we have the expectation of the congestion (c/(ac) = 1/a).
- (Piyush) Since the frame length is log(c), the expected congestion of an edge f in frame F will be (1/a)*log(c). Now we need to find the probability of the event that the congestion of an edge is more than log(c). To find this, they apply the Chernoff bound with suitably chosen mean μ = log(c)/a and deviation parameter δ = a − 1 (note that the Chernoff bound has the form Pr(X > (1 + δ)μ) ≤ F+(μ,δ)). A union bound is then applied, which gives an upper bound on the probabilities of the bad events (i.e. an edge having congestion more than log(c)). The degree of the dependency graph is also shown to be bounded by some number. Combining these two facts, we get the desired result from the LLL.
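A small Python sketch (my own, using what I believe is the Motwani-Raghavan notation F+(μ,δ) = (e^δ/(1+δ)^(1+δ))^μ) of the tail bound being applied; the numbers below are purely hypothetical and only show the shape of the bound.
 import math
 def chernoff_upper(mu, delta):
     # F+(mu, delta) = (e^delta / (1+delta)^(1+delta))^mu, the upper-tail bound
     return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
 # purely hypothetical values of mu, just to show how fast the bound decays
 for mu in (2.0, 5.0, 10.0):
     print("mu =", mu, " bound =", chernoff_upper(mu, delta=1.0))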
Rob
- I don't understand why the proof of the LLL in the first handout introduces the variable T - I don't follow why this "conditioning technique" helps the proof, and when you might want to use it.
Sep 4
Nathan
- In the attached article, why did the authors choose to set the parameter to log n / log log n? My guess is just so the math would work out, but are there any other reasons?
- (Piyush): I think that's the reason.
- (Suresh) this is a useful fact worth remembering: if you need a number x such that x^x = n, then x ≈ log n / log log n (a short derivation is sketched below).
- (Amit) I think in most such analyses we need to show that the probability is suitably small (e.g. inverse-polynomial in n), and setting the parameter as above does the trick for us. The reason is the fact stated by Suresh above.
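A one-line justification of Suresh's fact (my own derivation, ignoring lower-order terms):
 x^x = n \;\Longrightarrow\; x \ln x = \ln n \;\Longrightarrow\; x = \frac{\ln n}{\ln x} = \Theta\!\left(\frac{\ln n}{\ln\ln n}\right),
 \text{since } \ln x = \Theta(\ln\ln n) \text{ for any such } x .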
- Is the line directly above theorem 6.1.1 a typo? Shouldn't the probability be greater than what is stated?
- (Piyush): Note that we need to sum over all the Pr(Ai,k), so we need to multiply the single-event bound by n, which yields the stated probability.
Scott
- From an applied view, I'm not following exactly where the randomization comes into play. If I'm making a hash table and have a function that goes from a key to an index, shouldn't that function be deterministic so that I can both add to the hash table and later find elements in it?
- (Piyush): I think the reason is the following: we choose a function h from a family H (the 2-universal family) of hash functions. By construction, a large fraction of the functions from this family are such that, for a small-cardinality set of keys, the number of collisions will be small. Thus we expect that, for a particular S, even a random h from H would give us the desired performance (i.e. a small number of collisions and O(1) expected time for searches and updates in the table).
- (Suresh): Scott, the function is deterministic!! It's merely constructed in a randomized manner: you fix a and b once the hashing starts.
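A minimal Python sketch (mine, not the book's exact notation) that may make Suresh's point concrete: the randomness is only in the one-time choice of a and b; after that, h is an ordinary deterministic function that you use for both inserts and lookups. The prime p and table size n below are hypothetical.
 import random
 def make_hash(p, n, rng):
     a = rng.randrange(1, p)            # the randomness is only in this one-time draw
     b = rng.randrange(0, p)
     def h(x):
         return ((a * x + b) % p) % n   # deterministic once a and b are fixed
     return h
 rng = random.Random(0)
 p, n = 2 ** 31 - 1, 100                # hypothetical prime and table size
 h = make_hash(p, n, rng)
 print(h(42), h(42))                    # the same key always lands in the same bucket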
Rob
- I'm having a little trouble with the construction of universal hash functions presented here - it's stated that a and b are selected from Zp (so in particular b < p), but then as part of fa,b we take ax + b mod p. Isn't b mod p = b by definition?
- (Piyush): I think it's meant to be read as (ax + b) mod p and not as ax + (b mod p).
Piyush
- The universal hash family assumes a small cardinality for the set S of keys. Is it small in the absolute sense, or relative to the size of the universe M?
- (Amit): I would say it's relative. Small and big can't be defined in absolute terms.
- I don't completely understand why the 2-universal families can't fully capture the pairwise independence property while the strongly 2-universal families can.
- (Amit): I think reading Section 3.4 of the book will address this question (the definition of pairwise independence constrains events of the form h(x1) = y1 and h(x2) = y2, where the values y1 and y2 need not be equal). That said, I think Suresh can give a better intuition, since this section is not covered in much detail.
- In the construction, is there any reason behind choosing a linear function for fa,b? Does it matter/help if we choose some other type of function?
- (Amit): I think we need to choose a simple function (like a linear one), since if we chose something complex the evaluation might unnecessarily take longer.
Parasaran
- How exactly do you differentiate a 2-universal family from the strong 2-universal one?
- (Amit) A crude form of the answer is that the strongly 2-universal property subsumes the 2-universal one. The reason is that if you take y1 = y2 in the strongly 2-universal definition, you are back to asking about a collision, which is exactly what the 2-universal definition bounds.
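For reference, the two definitions as I remember them, for a family H mapping a universe into n buckets; setting y1 = y2 in the second and summing over the n possible common values recovers the first, which is the containment Amit mentions.
 \text{2-universal: } \forall x_1 \ne x_2, \quad \Pr_{h \in H}\big[h(x_1) = h(x_2)\big] \le \tfrac{1}{n}
 \text{strongly 2-universal: } \forall x_1 \ne x_2,\ \forall y_1, y_2, \quad \Pr_{h \in H}\big[h(x_1) = y_1 \wedge h(x_2) = y_2\big] = \tfrac{1}{n^2}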
- I cannot follow how the 2-universal family helps with the dynamic dictionary problem.
- (Amit) In the dynamic case we can have updates that cannot be handled well by any fixed deterministic strategy. Hence we need randomization to beat the adversary, and that is exactly what 2-universal hash families are trying to do for us.
Avishek
- The last paragraph in 8.4.4 [MR] says that k-wise independence reduces the chance of collision to 1/n^k and thus makes things better. This, as we saw in the last seminar, seems consistent with the Lovász Local Lemma (LLL), which says that a higher degree of mutual independence increases the chances of existence of a good event.
- (Amit) I am not sure whether having more independence always increases the chances of existence of a good event. Also independence and dependence are inherent to the problem.
- 1. is there any connection between the two ?
- (Amit) I can't see an obvious connection, though more independence reduces the probability of bad events, i.e. collisions in our case.
- 2. what is the intuition for the "importance of (mutual) independence" in formulation/design of randomized techniques ? or in other words, why is "(mutual) independence" so important for randomization ?
- (Amit) Independence assumptions make things easy to analyze. Also, more dependence among variables makes problems more combinatorial. Leaving aside randomization, even in graphical models independence among random variables is good.
Jagadeesh J
- 1. When |N| = |S|, isn't it possible to sample a random perfect hash function from a perfect hash family? I suspect the answer is no, otherwise we wouldn't need to go for 2-level hashing. But I am still not able to digest the need for a 2-level hashing scheme when we are already using memory of size equal to the number of keys.
- (Jagadeesh) I think it's the trade-off between memory size and search time.
- (Amit) |S| = |N| doesn't mean you know your set S (chosen from some set M) a priori. If you know S in advance, as in the static case, then you are good. Otherwise, an adversary could change S according to your chosen hash function. You could do separate chaining or 2-level hashing to avoid conflicts.
- (Amit) http://en.wikipedia.org/wiki/Q.E.D.
Sep 9
Nathan
- This is a pretty simple question, but I'm missing something. At the tail end of Lemma 8.17, how do they make the jump from k(x − y) mod p = jr to the stated bound on the number of values of k causing a collision?
- (Avishek) In the equation immediately above k(x − y) mod p = jr, it says that k(x − y) mod p takes values in a particular set. Since x, y and p are fixed in this equation, only by varying k can k(x − y) mod p attain all the values in that set. The number of values in the set is the number of positive values plus the number of negative values, which adds up to the stated total.
Rob
- The scheme described in MoRa requires 6s + 1 cells; the constant 6 feels pretty large. Are there similar schemes that use provably fewer cells? Clearly, the limit would be s + 1 (with the extra cell being used to store k) if one could find a perfect hash function. Since the "incremental" cost of a cell in the primary table that collides is 5 (3 for the secondary table, 1 to record its length, and 1 to record which secondary hash function is in use), and the expected number of collisions in the primary table is low, we might expect to be able to get closer to s + 1 + 5b, where b is the number of collisions.
- (Avishek) The way I understood it is that the primary hash function is good not because it reduces the overall total number of collisions b to something small, but rather because it spreads the collisions evenly across the primary hash table, so that no single cell of the primary table has a large number of collisions. A good choice of primary hash function doesn't necessarily reduce the total number of collisions; instead it tries to ensure a linear bound on the overall size of the entire structure, despite the fact that the secondary hash tables require size quadratic in the primary bin sizes. The total number of collisions (as given by the second-to-last line on pg. 226) still comes to about s, which, when plugged into your expression s + 1 + 5b, yields 6s + 1 again. This might seem circular. Please feel free to probe further.
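To make the discussion above concrete, here is a simplified FKS-style two-level construction in Python (a sketch of the general idea, not the exact MoRa scheme with its 6s + 1 cell accounting): hash the s keys into s primary bins, then give each bin with b_i keys its own secondary table of size b_i^2, redrawing the secondary function until it is collision-free. The point is that a good primary function keeps the sum of the b_i^2 linear in s, not that it avoids collisions altogether.
 import random
 def build_two_level(keys, p, rng):
     # p is a prime larger than every key; draw() gives one ((a*x+b) mod p) mod m function
     s = len(keys)
     def draw(m):
         a, b = rng.randrange(1, p), rng.randrange(0, p)
         return lambda x: ((a * x + b) % p) % m
     h1 = draw(s)                                   # primary hash into s bins
     bins = [[] for _ in range(s)]
     for k in keys:
         bins[h1(k)].append(k)
     secondary = []
     for b in bins:                                 # one collision-free table per bin
         m = len(b) ** 2
         while True:
             h2 = draw(m) if m else (lambda x: 0)
             table = [None] * m
             ok = True
             for k in b:
                 i = h2(k)
                 if table[i] is not None:           # secondary collision: redraw h2
                     ok = False
                     break
                 table[i] = k
             if ok:
                 secondary.append((h2, table))
                 break
     return h1, secondary
 def lookup(key, h1, secondary):
     h2, table = secondary[h1(key)]
     return bool(table) and table[h2(key)] == key
 rng = random.Random(1)
 keys = rng.sample(range(10**6), 50)
 h1, sec = build_two_level(keys, p=2**31 - 1, rng=rng)
 assert all(lookup(k, h1, sec) for k in keys)
 print("keys:", len(keys), "  total secondary cells:", sum(len(t) for _, t in sec))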
Parasaran
- How does the nearly-2-universal hash function work for both primary and secondary hashing to arrive at near perfect hashing?
Arvind
- I do not understand why m has to be extremely large relative to n. What I think is that perfect hashing is possible when h is a one-to-one function and this would be true only when n>m. Am I missing something obvious?
- (Avishek) Perfect hashing is indeed possible when n > m, but that's the naive implementation of hash tables which we discussed in the last lecture. The whole point is to find good hash families such that the hash tables are much smaller than in the naive case (i.e. than the universe size m), but still give near-perfect hashing. As we will see in the discussion of Ex 8.13, making m much larger than n guarantees the existence of good hash families which require only a small hash table but produce very few collisions.
Piyush
- Is the size quadratic in bin sizes indeed the best we can do for the secondary hash tables?
Jagadeesh J
- I can't understand how the size of a perfect hash family is of the order of 2^Ω(s) when s = n. Isn't it the case that the first key can be mapped to any of n locations, while the second key has n − 1 choices, leading to a hash family of size n!?
Sept 11
Rob
- In the backwards analysis of the Delaunay triangulation algorithm, it's not clear to me why it's important that you pick the last step to start the analysis with. The analysis shows that the runtime of the subroutine is constant, so it has found the runtime of all iterations, not the last in particular. It is in effect simply multiplying the runtime of the inner loop (constant) by the runtime of the outer loop (linear). In what sense is this "backwards"?
- (Suresh) The "backwards" part of the analysis comes in the fact that we can show that the expected degree of the vertex removed is a constant. You are right that this is true at any step. However, it's true because we're running backwards and can condition on the structure that was in place BEFORE deletion (which is backwards, since actually this structure exists AFTER insertion)
Parasaran
- In the construction of planar convex hulls, I don't follow how the backwards analysis helps us arrive at the expected number of conflict-edge changes being 2/r for the point p.
Avishek
- Section 2 claims that Shamos and Hoey's O(n log n) algorithm for construction of Delaunay triangulations is worst case optimal for reasonable models of computation. What are these reasonable models of computation ? I am also curious about how things change for other (unreasonable) models of computation.
Piyush
- Based on the problem structure, can we say something about whether the problem is amenable to backwards analysis (for example, as shown in section 6, something like triangulation in higher dimensions is not)?
Ruihong
- You said we averaged out the worst case through randomization. For Quicksort, which I am more familiar with, I know what the worst case can be; but for the other geometric algorithms (Delaunay triangulation of convex polygons and construction of planar convex hulls), what kind of worst cases have we averaged out by randomizing the order of the unit operations?
Nathan
- What are the general requirements that need to be met in order to use backwards analysis? The authors give some examples of analyses that work well and others that do not (QUICKSORT); how is this possible?
Sept 16
Rob
- The book claims that selecting a proper value for r is clear from the analysis, but I don't see it. How do you choose r?
- (Parasaran) The value of r has to be chosen so that it is much smaller than n. We shall see that if r gets close to n, the cost of the search becomes O(n), which is linear and not O(log n).
Nathan
- Can you give an example of the point location problem in T(R) in class?
- (Parasaran) The point location problem is a fundamental topic of computational geometry. Applications include areas processing geometrical data: computer graphics, geographic information systems (GIS), motion planning, and computer aided design (CAD). A simpler example would be : "Each time you click a mouse to follow a link in a web browser, the point location problem must be solved in order to determine which area of the computer screen is under the mouse pointer". We shall discuss an example of the point location problem in T(R) in class.
Scott
- It says we sort the elements in R in constant time. How is this not cheating?
- (Parasaran) We have the size of R as a constant (R has r numbers). And note that r is small compared to n. So we can sort this set R in constant time.
Avishek
- Why do we need the log term in "(an log r)/r"? Can't we choose a and r such that a is much smaller than r? Does the log r have anything to do with searching for q in the r+1 bins?
- (Parasaran) We call a subset of S\R good if its size is bounded by (an log r)/r. This is because when the search continues by picking a subset of S\R and recursing in it (if b > r, where b is the size of the subset), the depth of the search tree is log r.
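A one-dimensional toy version (my own, in Python, not the geometric structure from the notes) of the sample-and-recurse idea being discussed: pick a random sample R of size r, sort it (cheap relative to n), locate q among the r+1 bins that R defines, and recurse into that bin, whose size is with good probability only O((n log r)/r).
 import random
 import bisect
 def sample_search(S, q, r, rng):
     # returns True iff q is in the list S, by recursive random sampling (illustrative)
     if len(S) <= r:
         return q in S
     R = sorted(rng.sample(S, r))                # the random sample, sorted
     i = bisect.bisect_left(R, q)
     if i < len(R) and R[i] == q:
         return True
     lo = R[i - 1] if i > 0 else float("-inf")   # the bin of R containing q
     hi = R[i] if i < len(R) else float("inf")
     bucket = [x for x in S if lo < x < hi]
     return sample_search(bucket, q, r, rng)     # recurse into that bin
 rng = random.Random(0)
 S = sorted(rng.sample(range(10**6), 5000))
 print(sample_search(S, S[123], r=16, rng=rng), sample_search(S, -1, r=16, rng=rng))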
Piyush
- The random sampling scheme seems to be applicable only for cases when we are given a discrete set of objects (i.e. from a collection of points, or lines -- sets that are discrete). Are there schemes that work for cases where we are required to sample from continuous sets (for example, sampling a random point from a continuous region)? More importantly, do such continuous settings exist (or are they commonly encountered) in analyzing or constructing randomized algorithms?