CS 6963 Summaries


Aug 26, 2008

Problem: routing in a network.

We're given a directed graph on N nodes. Each edge is a link between communicating nodes. A link can carry one packet at a time. We assume that nodes have queues for each outgoing edge. All nodes have IDs, so we can refer to node 1, node 10, and so on.

Consider permutation routing: each node i wants to send a packet to exactly one other node d(i), and each node receives exactly one packet. A routing algorithm has to decide, for each start-end pair (i, j=d(i)), how to route the packet that goes from i to j. Since each edge can carry only one packet per time step, the running time of the routing algorithm is at least the maximum congestion on any edge.

Let's consider an "easy" class of routing algorithms: namely, algorithms that are oblivious: the route used depends solely on the pair (i,j). More precisely, the path taken by a packet from i depends only on j, and on nothing else. Oblivious routing needs little state: each (src,dest) pair uniquely defines a route, and so routing tables are easy to maintain.

Theorem: Let the graph have N nodes and out degree d. Then there exists a permutation that forces \Omega(\sqrt{N/d}) congestion for any deterministic oblivious routing scheme.

Consider a special network: the Boolean hypercube on n-bit strings (two nodes are adjacent if they differ in exactly one bit). The degree is d = n and the size of the network is N = 2^n. The above result says that the number of steps needed is at least \sqrt{2^n/n}.

We will prove the following result:

Theorem: There exists a randomized routing scheme for the hypercube that requires only O(n) steps with high probability.

The deterministic lower bound works by a counting argument: show that each "destination graph" linking a fixed destination to its sources must have a large number of congested edges, and use averaging to show that there must be some large set of routes that all hit the same bad edge. Since the routes are oblivious, this defines a permutation that forces congestion.

But let's get some intuition on the hypercube using a simple algorithm called bit fixing: find the leftmost bit in which the current location differs from the target, cross the edge that fixes that bit, and repeat until the packet arrives.
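To make this concrete, here is a minimal Python sketch of bit fixing on the n-cube (the function name and the integer node labels are our choices, not from the lecture):

  def bit_fixing_route(src, dst, n):
      """Return the sequence of nodes visited when routing from src to dst on the
      n-dimensional hypercube by fixing differing bits from left to right.
      Nodes are integers in [0, 2**n); bit n-1 is treated as the leftmost bit."""
      path, cur = [src], src
      for i in range(n - 1, -1, -1):          # scan bit positions left to right
          if (cur >> i) & 1 != (dst >> i) & 1:
              cur ^= 1 << i                   # cross the edge that fixes bit i
              path.append(cur)
      return path

  # Example: route 0b0110 -> 0b1010 on the 4-cube
  print(bit_fixing_route(0b0110, 0b1010, 4))  # [6, 14, 10]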

Lemma: there is a \sqrt{N/n} lower bound for bit-fixing methods. Proof sketch: consider the transpose permutation, in which the node whose address is the concatenation ab (two halves of n/2 bits each) is routed to the node ba. Under bit fixing, every such route must pass through a node of the form xx, and there are only \sqrt{N} of these. Thus there exists at least one node that takes in N/\sqrt{N} packets, while it has degree only n.

Here's a clue, though: the adversary beat the algorithm by exploiting the deterministic bit-fixing protocol. What if we could confuse it by scattering the packets so as not to create a bottleneck?

Randomized algorithm

  1. Route every packet from i to a random location σ(i)
  2. Route every packet from σ(i) to d(i)

Intuition: the randomization spreads the packets out, so the routes don't get bottlenecked.
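A minimal sketch of how the two phases generate routes (the bit-fixing helper is repeated here so the snippet stands alone; the function names and the congestion counter are ours):

  import random

  def bit_fixing_route(src, dst, n):
      path, cur = [src], src
      for i in range(n - 1, -1, -1):
          if (cur ^ dst) >> i & 1:
              cur ^= 1 << i
              path.append(cur)
      return path

  def two_phase_routes(dest, n):
      """dest[i] = d(i) is the permutation to route. Phase 1 sends packet i to a
      uniformly random intermediate node sigma[i]; phase 2 sends it on to dest[i]."""
      N = 2 ** n
      sigma = [random.randrange(N) for _ in range(N)]
      phase1 = [bit_fixing_route(i, sigma[i], n) for i in range(N)]
      phase2 = [bit_fixing_route(sigma[i], dest[i], n) for i in range(N)]
      return phase1, phase2

  def max_congestion(routes):
      """Maximum number of routes sharing a single directed edge."""
      count = {}
      for path in routes:
          for u, v in zip(path, path[1:]):
              count[(u, v)] = count.get((u, v), 0) + 1
      return max(count.values(), default=0)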

Proof. Suppose we show that Phase 1 has congestion c with high probability. By reversing time, this implies that Phase 2 also has congestion c with high probability. Does this mean that the total congestion is at most 2c? Yes, provided no packets from Phase 1 interfere with packets from Phase 2; so we make all packets wait until c steps are done, which ensures there is no interference.

What remains is to prove that Phase 1 causes only congestion c. Fix a packet v_i originating at i and look at how long it takes: at most n steps of actual travel plus its delay. How large is the delay?

Part I: identify the structural cause of delay (independent of the randomization).
Part II: analyze that structure using the randomization.

What is the cause of delay?

Fix the route of a particular packet v. Let S be the set of all other packets that traverse at least one edge of this route. Claim: the delay of v is at most |S|. Intuition: any packet that shares an edge with v's route can delay it at most once.

Observation: once two routes separate, they cannot merge again. Proof: in the bit-fixing scheme, two routes diverging means that at some step the two packets are at locations with different prefixes (the bits fixed so far). Each packet's prefix already agrees with its own destination and is never altered again, so once this happens the routes cannot merge.

[Back to the claim] We will charge every unit of delay for v to some packet in S. Define the lag of v when it is about to cross edge e_j at time t to be t − j. Suppose v's lag increases from l to l + 1 while v is waiting to cross e_j; this means that some other packet crosses e_j at that time step, and that packet also has lag l.

Now fix the latest time at which any packet on v's route has lag l. The packet that crosses an edge of the route at that time with lag l must leave the route at that step, for otherwise some packet would continue on with lag l. We charge v's increase in lag to this packet, and since it has left the route, it will never be charged again.


Analyze this structure.

We now need to understand how many packets overlap v's route. Let H_{ij} = 1 if the routes for i and j share an edge, and 0 otherwise. The delay of v_i is then at most \displaystyle{\sum_j} H_{ij}. Since the intermediate destinations are chosen independently at random, the H_{ij} (for fixed i) are independent Poisson trials, so we need to estimate E[\sum_j H_{ij}].

Fix an edge e, and let T(e) = the number of routes through it. Then \sum_j H_{ij} \le \sum_{l=1}^{k} T(e_l), for e_1, \ldots, e_k the route of i.

By symmetry, every edge looks the same in Phase 1, so E[T(e)] is the same for all edges e. The expected route length is n/2 and there are N routes, so the expected total number of (route, edge) pairs is Nn/2; since there are Nn edges in the hypercube, E[T(e)] = 1/2. Therefore E[\sum_j H_{ij}] \le k/2 \le n/2.

Applying a standard Chernoff bound, the probability that this sum exceeds 6n is at most 2^{-6n}. Taking a union bound over all 2^n packets multiplies this by 2^n, giving a failure probability of at most 2^{-5n}.

NOTE: the T(e_l) are not independent, but the H_{ij} are; so we must apply the Chernoff bound to the H_{ij} rather than to the T(e_l).

Aug 28, 2008

The Probabilistic Method I

In today's lecture, we shall discuss some simple yet very effective probabilistic techniques that can be used for proving existence of objects with certain properties. These techniques fall under the broad umbrella of "Probabilistic Methods" and, using the power of randomization, they give us results that could be hard to prove otherwise. The basic recipe is simple:

Construct a sample space of objects, then show that, with positive probability, a randomly selected object from this space has the required property. That alone proves that such an object exists.

Although the basic premise is simple, even in the randomized setting, certain combinatorial arguments are needed for proving the existence of such objects. The probabilistic method thus helps us by providing a bunch of techniques to deal with this. Apart from giving the existence results, the power of randomization is further evidenced by its ability to help us develop randomized algorithms that can actually find such objects. We shall discuss three important techniques and example situations where they yield useful results.

The Basic Counting Argument

The basic counting argument works by constructing an appropriate probability space S and then, by naively enumerating over all possible randomly chosen objects of interest, shows that the probability of existence of an object having required properties is non-zero.

Example: Consider the problem of coloring the edges of an n-vertex graph with only two colors such that there are no large cliques (say having k vertices) with all edges having the same color (i.e. no monochromatic subgraph). The following theorem gives the condition that guarantees the existence of such a coloring.

Theorem: If {n\choose k}2^{-{k \choose 2} + 1} < 1, then it is possible to color the edges of K_n with two colors so that it has no monochromatic K_k subgraph.

Proof: The proof considers a uniformly random 2-coloring and, for each fixed k-vertex clique, computes the probability that all of its edges receive the same color; this probability is 2^{-{k \choose 2} + 1}. Using the union bound (and the condition given in the theorem), the probability that at least one k-clique is monochromatic is less than 1. Thus the probability that no k-clique is monochromatic is greater than zero, and such a coloring must exist.

Now that we know such a coloring exists, we can go further and actually find one using a randomized construction. We shall see examples of both a Monte Carlo and a Las Vegas algorithm for finding such a coloring, and we will investigate when these constructions are practically useful (i.e. whether the Monte Carlo algorithm indeed has low error probability, and whether the Las Vegas algorithm runs in expected polynomial time).
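As a rough illustration of the construction just mentioned, here is a small Python sketch (the function names and retry limit are ours); note that the verification step enumerates all {n \choose k} cliques, so it is only practical for small n and k:

  import random
  from itertools import combinations

  def random_two_coloring(n):
      """Color each edge of K_n red (0) or blue (1) independently at random."""
      return {(u, v): random.randint(0, 1) for u, v in combinations(range(n), 2)}

  def has_monochromatic_clique(coloring, n, k):
      """Check every k-vertex subset for a clique whose edges all share one color."""
      for verts in combinations(range(n), k):
          colors = {coloring[(u, v)] for u, v in combinations(verts, 2)}
          if len(colors) == 1:
              return True
      return False

  def find_coloring(n, k, tries=100):
      """Las Vegas-style loop: resample until no monochromatic K_k appears."""
      for _ in range(tries):
          c = random_two_coloring(n)
          if not has_monochromatic_clique(c, n, k):
              return c
      return None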

The Expectation Argument

The basic counting argument works by constructing a probability space from which a randomly chosen object is shown to have the desired properties with positive probability. Sometimes, however, it is easier to prove this via an expectation (averaging) argument. The expectation argument rests on a simple and intuitive fact: with positive probability, a random variable takes a value no greater than its expectation, and with positive probability it takes a value no smaller than its expectation.

Example: A satisfiability (SAT) formula is a logical expression that is the conjunction (AND) of a set of clauses, where each clause is a disjunction (OR) of literals. A solution to a SAT formula is a True/False assignment to the variables such that all the clauses are satisfied. In general, determining whether a SAT formula has a solution is NP-hard. A related problem is to satisfy as many clauses as possible; this is called the MAXSAT problem. We shall see that the expectation argument gives a lower bound on the number of clauses that can be satisfied.

Theorem: Given a set of m clauses, let k_i be the number of literals in the i-th clause, for i = 1, \ldots, m, and let k = \min_i k_i. Then there is a truth assignment that satisfies at least \sum_{i=1}^m (1 - 2^{-k_i}) \geq m(1 - 2^{-k}) clauses.

Proof: The proof first computes, for a uniformly random truth assignment, the probability that each clause is satisfied. From these probabilities we easily obtain the expected number of satisfied clauses, and invoking the expectation argument, there must be some assignment that satisfies at least that many clauses.
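Here is a minimal sketch of the random-assignment step in code (the clause encoding as signed integers is our choice): each clause with k_i distinct literals fails with probability 2^{-k_i}, so the expected number of satisfied clauses is \sum_i (1 - 2^{-k_i}).

  import random

  def random_assignment(num_vars):
      """Truth value for variables 1..num_vars, chosen by fair coin flips."""
      return {v: random.choice([True, False]) for v in range(1, num_vars + 1)}

  def satisfied(clauses, assignment):
      """A clause is a list of nonzero ints: +v means x_v, -v means NOT x_v."""
      def clause_ok(clause):
          return any(assignment[abs(l)] == (l > 0) for l in clause)
      return sum(clause_ok(c) for c in clauses)

  # Example: (x1 or x2) and (not x1 or x3) and (x2 or not x3)
  clauses = [[1, 2], [-1, 3], [2, -3]]
  a = random_assignment(3)
  print(satisfied(clauses, a), "of", len(clauses), "clauses satisfied")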

We shall see another example involving finding a large cut in a graph. In general, finding a maximum cut is NP-hard. However, using the expectation argument, we can show that the value of the maximum cut is at least half the number of edges in the graph. Turning this existence argument into an algorithm, we will also see a Las Vegas algorithm that finds a cut meeting this lower bound.
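A similarly small sketch for the cut example, assuming the m/2 guarantee discussed above (the function names are ours): each edge crosses a uniformly random bipartition with probability 1/2, so the expected cut size is m/2, and resampling until that value is reached gives the Las Vegas version.

  import random

  def random_cut(n, edges):
      """Put each vertex on side 0 or 1 by a coin flip; return the cut size."""
      side = [random.randint(0, 1) for _ in range(n)]
      return sum(side[u] != side[v] for u, v in edges)

  def large_cut(n, edges):
      """Las Vegas loop: resample until the cut reaches the m/2 guarantee."""
      target = len(edges) / 2
      while True:
          cut = random_cut(n, edges)
          if cut >= target:
              return cut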

Sample and Modify

In some cases, it is easier to indirectly construct the object with required properties. The sample and modify method gives a 2-stage indirect construction. In the first stage, a random structure is constructed which does not have the required properties. In the second stage, this structure is incrementally modified until it becomes one with the required properties.

Example 1: An independent set of a graph is a set of vertices with no edges between them. Finding the largest independent set in a graph is NP-hard. Using sample and modify, however, we can give a lower bound on the size of the largest independent set of a graph.

Theorem: Let G = (V,E) be a graph on n vertices with m edges. Then G has an independent set with at least n^2/(4m) vertices.

Proof: The proof (which can also be read as an algorithm) works by starting with the original graph and removing vertices and edges from it until no edges remain; the remaining vertices then form an independent set. In this case we get both a bound on the size of the independent set (which, as we will see, is n^2/(4m)) and an algorithm for finding such a set.
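A hedged sketch of the standard two-stage argument in code (keeping each vertex with probability 1/d for d = 2m/n is the usual choice; the handling of d < 1 is ours):

  import random

  def sample_and_modify_independent_set(n, edges):
      """Sample-and-modify sketch for an independent set of expected size n^2/(4m).
      Stage 1: keep each vertex independently with probability 1/d, d = 2m/n.
      Stage 2: for every edge with both endpoints surviving, delete one endpoint."""
      m = len(edges)
      d = 2.0 * m / n
      if d < 1:                      # average degree below 1: bound is weak anyway
          d = 1.0
      kept = {v for v in range(n) if random.random() < 1.0 / d}
      for u, v in edges:
          if u in kept and v in kept:
              kept.discard(v)        # remove one endpoint of the surviving edge
      return kept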

Example 2: Another example of the sample-and-modify technique concerns the existence of graphs with large girth. The girth of a graph is the length of its shortest cycle.

Theorem: For any integer k \geq 3 there is a graph with n nodes, at least \frac{1}{4}n^{1+1/k} edges, and girth at least k.

Usually we expect dense graphs (i.e. ones with a large number of edges) to have small girth. The above theorem says that even fairly dense graphs can have large girth.

Proof: The proof first samples a random graph: start with n isolated vertices and add an edge between each pair of vertices independently with some probability p. In the second step, we count the cycles of length at most k − 1 and destroy every such cycle by simply deleting one of its edges. In the graph that remains, every cycle has length at least k. To bound the number of edges in the resulting graph, subtract the number of edges deleted in the second step from the number of edges we started with; this count can be shown to be at least \frac{1}{4}n^{1+1/k}.
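For concreteness, here is the expected-value bookkeeping behind this proof, sketched with the standard choice p = n^{1/k - 1} (the constants are the usual ones, not quoted from the lecture):

  E[\#\mathrm{edges}] = p{n \choose 2} \approx \tfrac{1}{2}n^{1+1/k}, \qquad E[\#\mathrm{cycles\ of\ length} \le k-1] = \sum_{i=3}^{k-1}\frac{n(n-1)\cdots(n-i+1)}{2i}\,p^i \le \sum_{i=3}^{k-1}n^i p^i = \sum_{i=3}^{k-1}n^{i/k} \le k\,n^{(k-1)/k}.

So the random variable (#edges − #short cycles) has expectation at least \tfrac{1}{2}n^{1+1/k} - k\,n^{(k-1)/k} \geq \tfrac{1}{4}n^{1+1/k} for n large enough; some graph attains this, and deleting one edge per short cycle in that graph leaves at least this many edges and girth at least k.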

Sept 02, 2008

The Probabilistic Method II

Continuing in the same vein as the previous lecture, we shall see another powerful technique, this one for proving the existence of objects corresponding to rare events (i.e. events with very small probability). This is in contrast with the earlier probabilistic methods, which show the existence of events that have reasonably high probability.

Consider a set of "bad" events (say E1,...,En). We often want to obtain a bound for the case when no bad event happens (which in general would be a rare event, possibly with a very small probability). That is, we want a bound on Pr(\bigcap_{i=1}^n \bar{E_i}). If the events are mutually exclusive or mutually independent, it is easy to obtain such a bound. However, often we are faced with situations that only allow limited dependence (or, seen another way, limited independence) among the events. The Lovasz Local Lemma shows that even under such limited dependence, we can prove such bounds, given certain conditions.

Mutual Exclusion

If all the bad events Ei are mutually exclusive, then the probability of at least one of them happening (i.e. their union) is merely the sum of their probabilities. This gives us the probability of a good event (i.e. none of the bad events happening) immediately.

Mutual Independence

For a set of mutually independent events, each with a bounded nonzero probability, it can be easily shown that the probability of none of the events occurring is positive. To see this, think of the events E1,..,En as a set of mutually independent events in some probability space. Then Pr(\bigcap_{i=1}^n E_i) = \prod_{i=1}^nPr(E_i). Also, if E1,...,En are mutually independent then the complements \bar{E_1},...,\bar{E_n} can be shown to be independent as well. If each probability Pr(Ei) is bounded, i.e. Pr(Ei) < 1, then: Pr(\bigcap_{i=1}^n \bar{E_i}) = \prod_{i=1}^nPr(\bar{E_i}) > 0, giving the desired results.

Limited Dependence

In many scenarios, mutual independence is too strong an assumption (and mutual exclusion too is very rare). However, often there are cases when the problem structure requires (or allows) us to slightly relax the independence conditions. In such cases, the events are no longer mutually independent but only have limited dependence among them. Such dependencies can be captured by a dependency graph which is a directed graph such that there exists an edge between any two nodes only if the associated events depend on each other.

Based on the notion of limited dependence, the Lovasz Local Lemma (LLL) gives us conditions and guarantee that all of a set of "bad" events can be avoided under certain conditions. The LLL comes in two versions - symmetric and asymmetric. However, for most applications, using the symmetric version suffices and we shall consider the symmetric version only.

Lovasz Local Lemma

Let E1,...,En be a set of events, with the following assumptions:

  • for all i, Pr(E_i) \leq p.
  • the dependency graph of E1,...,En has degree at most D, for some D \geq 1 (i.e. each Ei is mutually independent of all but at most D other events Ej).
  • 4pD \leq 1.

Then Pr(\bigcap_{i=1}^n \bar{E_i}) \geq (1 - 2p)^n> 0

Note: The condition 4pD \leq 1 says that, given p, the degree of dependency D cannot be too large; equivalently, the less likely the bad events are, the more dependence among them we are willing to tolerate.

Note: For D = 0, the mutually independent case follows.

Applications

Listed below are some problems, and the related results which we shall see how to prove using the Lovasz Local Lemma.

  • Edge-disjoint paths: The problem is to come up with a communication strategy for n pairs of users such that no edge is shared between any two paths. Further, each pair i = 1,..,n can choose its path from a set Fi of m paths. Using LLL, we can show that if any path in Fi does not share edges with too many paths in Fj, then there exists a way to choose n edge-disjoint paths connecting the n pairs.
  • k-satisfiability problem: The problem is to decide whether there exists an assignment satisfying a set of clauses, each containing k literals. Using LLL, we can show that such an assignment exists if no variable appears in too many clauses (see the sketch after this list).
  • Routing using prespecified paths: Consider a routing problem on a network where the path for each packet has already been specified (i.e., for packet i, a path Pi that connects vertices si and ti). The restriction is that each edge in the network can carry only one packet per time step, which leads to congestion on an edge when several packets are waiting to use it. We further assume that the paths are edge-simple, that is, no path repeats an edge. We want to minimize the maximum time taken by any packet to reach its destination; equivalently, we want a schedule that keeps the queue at each edge as small as possible. It turns out that using LLL, such a schedule can be shown to exist.
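For the k-satisfiability item, here is a minimal sketch of the bookkeeping the symmetric LLL asks for (the clause encoding and function name are ours): under a uniformly random assignment a clause of k distinct literals is violated with probability p = 2^{-k}, and a clause depends only on the clauses sharing a variable with it, so it suffices to check 4pD \leq 1.

  def lll_guarantees_satisfiable(clauses, k):
      """Symmetric Lovasz Local Lemma check for a k-CNF formula.
      clauses: list of length-k tuples of nonzero ints (+v / -v literals).
      Returns True if 4 * p * D <= 1, where p = 2**-k and D is the maximum number
      of other clauses sharing a variable with any single clause."""
      var_sets = [frozenset(abs(l) for l in c) for c in clauses]
      D = 0
      for i, vi in enumerate(var_sets):
          deps = sum(1 for j, vj in enumerate(var_sets) if j != i and vi & vj)
          D = max(D, deps)
      p = 2.0 ** (-k)
      return 4 * p * max(D, 1) <= 1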

Sept 04, 2008

Hashing I

Hash tables are among the most widely used data structures. In this lecture, we will see how a hash table is built internally using randomization.

Why hashing?

We already have data structures such as balanced search trees, random treaps, and random skip lists, so why do we need hashing? The problem with all of the above is that they need Ω(log s) time per search or update, where s = |S|. With hash tables, we can do lookups as well as updates in O(1) expected time.

Naive way of doing hashing: Suppose we have a set S whose keys are chosen from a totally ordered universe M of size m. Create a table T of size m; a table is simply an array supporting random access. This strategy gives O(1) lookups and updates. The catch is that m is usually many orders of magnitude larger than |S|. Even though the approach is impractical, it introduces a model that lets us get around the comparison-based lower bounds for searching a totally ordered set. Our goal now is to reduce the size of the table until it is close to the size of S.

Difference between static and dynamic hashing

  • In the static dictionary problems, we are given a fixed set of keys S and the job is to organize them into a data structure that supports efficient processing of FIND queries
  • In dynamic case, the Set S is not known in advance. Instead it is constructed using insert and delete operations that are intermingled with FIND queries.

Definition of a hash function: A hash function h is a mapping from M \rightarrow N, where m >> n. Here N is a table consisting of entries indexed by N={0,1,\cdots,n-1}, and each word in the table has size O(\log m) bits.

Defining collision: A collision occurs when two different elements of M map to the same cell of the table N.

Definition of Perfect hash function

A hash function h: M \rightarrow N is said to be perfect for a set S \subseteq M if h does not cause collisions among the keys of the set S.

  • For the static case, perfect hash functions can be used to obtain good solutions, and we will discuss this in detail next Tuesday.
  • In the dynamic case, it is not possible to come up with a perfect hash function: no single hash function can be perfect for all possible sets S \subseteq M, since n<<m. Deterministic hash functions are also not a good idea, since an adversary could then choose an input that beats the algorithm. Hence, we need randomized, nearly perfect hash functions.
  • For more, refer to Exercise 8.11 and the discussion below it.

Universal Hash Families

Definition: Let M={0,1,\cdots,m-1} and N={0,1,\cdots,n-1}, with m \geq n. A family H of functions from M into N is said to be 2-universal if, for all x, y \in M such that x \neq y, and for h chosen uniformly at random from H,

Pr[h(x) = h(y)] \leq \frac{1}{n}

Can we do better than this bound? The answer is no; refer to Theorem 8.12 (and read the notation above it).

Good properties of a random function: refer to Theorem 8.14.

Why total randomness is bad: A totally random mapping from M to N has collision probability exactly 1/n; thus a random choice from a 2-universal family of hash functions gives a seemingly random function. The collection of all possible functions from M to N is itself a 2-universal family, but it has serious disadvantages: picking a random function requires Ω(m log n) random bits, and this is also the number of bits of storage required to represent the chosen function.

Solution to the above problem: We need smaller 2-universal families of functions that require little space and are easy to evaluate. In particular, the goal is to construct a 2-universal family consisting of only a small subset of all possible functions.

Intuition: What is happening above is that we have too much randomness, and it costs us a lot of work (and space) to pick and represent a hash function. Hence we derandomize, shrinking the space of possible hash functions; a function from this smaller space is cheap to pick and to represent.

Derandomization to reduce randomness

Constructing Universal Hash Families: Fix m and n, and choose a prime p \geq m. We will work over the field Z_p={0,1,\cdots,p-1}. Let g: Z_p \rightarrow N be the function g(x) = x mod n. For all a, b \in Z_p, define the linear function f_{a,b}: Z_p \rightarrow Z_p and the hash function h_{a,b}: Z_p \rightarrow N as follows.

  • f_{a,b}(x) = (ax + b) mod p
  • h_{a,b}(x) = g(f_{a,b}(x))

The family of hash functions H = \{h_{a,b} \mid a, b \in Z_p, a \neq 0\} is 2-universal. We can prove this fact using the definition of universal hash families.

  • Refer Theorem 8.15 and 8.16.
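A minimal sketch of this construction in code (the parameter names follow the text; the example prime and sizes are ours):

  import random

  def make_universal_hash(m, n, p):
      """Draw h_{a,b}(x) = ((a*x + b) mod p) mod n from the 2-universal family,
      where p is a prime with p >= m, a is uniform in {1,...,p-1}, b in {0,...,p-1}."""
      a = random.randrange(1, p)
      b = random.randrange(0, p)
      def h(x):
          return ((a * x + b) % p) % n
      return h

  # Example: universe size m = 100, table size n = 10, prime p = 101
  h = make_universal_hash(100, 10, 101)
  print(h(42), h(57))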

To summarize: this family of hash functions is 2-universal, and a member can be specified with only O(log m) bits, compared with Ω(m log n) bits in the totally random case. The big message here is that randomness is good, but, as with everything, too much of it is costly; in that case we can derandomize and use pseudorandomness to obtain efficient algorithms with provable bounds.

Definition of strongly universal hash families: refer to Definition 8.6.


Sept 09, 2008

Hashing II

Motivation

In the last lecture, we encountered hash functions for dynamic dictionaries, with which we could perform searches in expected constant time. That constant bound, however, comes from an average-case analysis; in the worst case the search time for a dynamic dictionary can be much larger. In this lecture, we design hash functions for static dictionaries that have constant search time in the worst case. The trade-off is that the set of keys must be fixed and known in advance; dynamic dictionaries, by contrast, change under insertions and deletions, so their size is not fixed.

Why do we need perfect hash families ?

In order to ensure constant worst-case search time for static dictionaries, we need perfect hash functions, i.e. ones that cause no collisions on S. We start by showing that families of perfect hash functions do exist for static dictionaries, for example the trivial family of all functions from M into the table. However, this family has a very large size, and we hope to obtain (almost) as good results with hash families of much smaller size. In addition, we require that the size of the family of good hash functions be bounded by a polynomial in m, where m is the size of the universe M of keys.

Why aren't perfect hash families sufficient ?

As already mentioned, the family of all perfect hash functions is very large. The hope is that by relaxing the "perfect" requirement on the hash functions, we can bring down the size of the (now near-perfect) hash family. However, near-perfect hash functions may cause collisions, and we cannot afford many collisions if we intend to achieve O(1) worst-case search time. To this end, we introduce a secondary level of hashing which uses perfect hash functions to rehash all keys that collide within a particular bin of the primary hash table.

Overall hashing scheme

So, now we have:

  • a near perfect primary hash function which maps all keys from S \subseteq M to the primary hash table, and
  • an exactly perfect secondary hash function which maps colliding keys in primary cells to a secondary hash table.

We still need to prove:

  • the existence of such primary and secondary hash functions
Ex. 8.15 and 8.16 suggest using probabilistic methods to prove their existence.
  • guarantees on the overall size of the hash table and the worst case search time
Th 8.19 proves that, even though the secondary hash tables have size quadratic in the bin sizes of the primary hash table, we can still obtain a linear bound on the overall size of the entire hash table by choosing the primary hash function intelligently. Th 8.19 also proves that the FIND operation requires only constant time in the worst case.
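A compact sketch of this two-level scheme, reusing the h_{a,b} family from the previous lecture; the constant 4 in the primary size check, the retry loops, and all names are illustrative rather than taken from the text:

  import random

  def make_hash(p, n):
      """h_{a,b}(x) = ((a*x + b) mod p) mod n with a != 0, as in Hashing I."""
      a, b = random.randrange(1, p), random.randrange(0, p)
      return lambda x: ((a * x + b) % p) % n

  def build_perfect_table(keys, p):
      """Static two-level table: retry the primary function until the secondary
      tables have total size O(n), then give each bin a collision-free table."""
      n = len(keys)
      while True:                                   # primary level
          h = make_hash(p, n)
          bins = [[] for _ in range(n)]
          for x in keys:
              bins[h(x)].append(x)
          if sum(len(b) ** 2 for b in bins) <= 4 * n:
              break
      secondary = []
      for b in bins:                                # secondary level
          size = len(b) ** 2
          while True:
              g = make_hash(p, size) if size else None
              table = [None] * size
              ok = True
              for x in b:
                  slot = g(x)
                  if table[slot] is not None:
                      ok = False                    # collision: redraw g
                      break
                  table[slot] = x
              if ok:
                  secondary.append((g, table))
                  break
      return h, secondary

  def find(x, h, secondary):
      """Worst-case O(1): one primary hash, one secondary hash, one comparison."""
      g, table = secondary[h(x)]
      return bool(table) and table[g(x)] == x

  keys = [3, 17, 40, 58, 96]
  p = 101                                           # prime >= universe size
  h, sec = build_perfect_table(keys, p)
  print(find(17, h, sec), find(18, h, sec))         # True False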


Food for thought

1. Given that we can construct chaining schemes (double hashing) for static dictionaries, why not do the same for dynamic dictionaries?

2. Defn 8.10 is analogous to the construction of universal hash functions in the initial portion of section 8.4.3 (page 219). But now we only have a multiplicative factor in x (we multiply it by k) and no additive term (as we had with b in the earlier case). Why?

Sep 11, 2008

Randomized incremental constructions

The idea of a randomized incremental construction (RIC) is to "beat" an adversary by randomizing the input, so that the adversary cannot set up a worst-case sequence that makes life difficult for the algorithm. For example, insertion sort has worst-case performance Ω(n^2) because the adversary can create a bad input. But suppose we were allowed to permute the input after the adversary chose it? We then get a randomized version of the sorting procedure, one in which we can argue that the bad cases get "averaged out".

A generic randomized incremental procedure looks like this:

  1. Randomly order inputs s_1, \ldots s_n
  2. insert them into a data structure one by one

This is incredibly easy from a programming point of view, and makes these algorithms very effective in practice. However, from an analysis point of view, it turns out that a better description is recursive.

Recursive-Solve(S = s_1, \ldots s_n)

  1. Pick s_i at random
  2. Recursive-Solve(S \setminus \{s_i\})
  3. Insert s_i into the resulting structure.

Unrolling the recursion makes the algorithm look "backward": It seems as if we're picking the s_i randomly in reverse order, running the algorithm from back to front. This allows us to employ a form of analysis called 'backward' analysis, where we analyze the algorithm as if it were running backward.

The advantage of this approach is that it changes the nature of the events we condition on when figuring out the cost of inserting (or deleting) a new item. Roughly speaking, the cost of insertion (a random variable), instead of being analyzed in terms of the existing structure, can be analyzed in terms of the resulting structure. This is because if we take the 'backwards' view of the algorithm, we can relate the cost of insertion in step 3 to the cost of deletion in Step 2. This changes the conditioning, and as we shall see in a number of examples, makes the analysis tremendously easier.
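A tiny sketch of the generic pattern (this is essentially the RIC view of quicksort, realized as insertion into an unbalanced binary search tree; the choice of structure here is ours, not the lecture's):

  import random

  class Node:
      def __init__(self, key):
          self.key, self.left, self.right = key, None, None

  def insert(root, key):
      """Plain BST insertion; no rebalancing anywhere."""
      if root is None:
          return Node(key)
      if key < root.key:
          root.left = insert(root.left, key)
      else:
          root.right = insert(root.right, key)
      return root

  def ric_build(items):
      """Step 1: randomly order the inputs.  Step 2: insert them one by one."""
      items = list(items)
      random.shuffle(items)
      root = None
      for x in items:
          root = insert(root, x)
      return root

  # The random insertion order, not any balancing, gives expected depth O(log n).
  tree = ric_build(range(1000))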

We'll look at 4 examples, in progressively increasing complexity:

  1. A re-analysis of QUICKSORT in this RIC setting
  2. Planar convex hulls
  3. Delaunay triangulations of convex polygons
  4. Fixed dimensional linear programming.

Sept 16, 2008

Random Sampling

There are situations for which randomized incremental construction, discussed in the last class, might not be appropriate. For instance, randomized incremental construction is inherently sequential, and may thus be unsuitable for designing parallel randomized geometric algorithms. Also, building a geometric structure is often not an end in itself but a means for solving search problems; for example, the Voronoi diagram serves as the basis for nearest-neighbor queries. In this lecture we focus on a different paradigm for designing randomized geometric algorithms, called random sampling or randomized divide-and-conquer.

Point Location Problem

We are given a set S of n numbers, and wish to answer membership queries: a query is a number, and we are to report whether or not it is a member of S. We pick a random sample R of r numbers from S, where r is a constant that is small compared to n. We sort the elements of R (in constant time) and then partition S\R into r + 1 subsets; the i-th subset contains those elements of S\R that are larger than exactly i elements of R. The sample R is good if every one of the r + 1 resulting subsets of S\R has size at most (an log r)/r, for a suitable fixed constant a.

Search Structure

For each subset containing more than b elements, for a suitable constant b > r, we recurse: choose a random sample of r elements from it, and so on. This process induces a search tree. Given a query q, we identify (in constant time) the one of the r + 1 subsets of S in which to continue the search for q, and we search recursively in the subtree associated with that subset.

Cost of Search

Let Q(n) be the cost of a search on a set of n elements. We have the recurrence  Q(n) \leq c + Q(\frac{an\log r}{r}), where c is a constant representing the cost of descending one level of the tree and a is small compared to r/log r. This solves to Q(n) = O(log n), because in constructing the search tree we ensured that the random sample at every level was good.
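A sketch of this search structure for the membership problem above; the concrete values of r, a, and b below are illustrative, since the lecture only requires r constant, a suitable, and b > r:

  import bisect, math, random

  R, A, B = 8, 2.0, 64    # sample size r, goodness constant a, recursion cutoff b > r

  def build(S):
      """Recursive random-sampling search structure for membership queries."""
      S = list(S)
      if len(S) <= B:
          return ("leaf", sorted(S))
      threshold = A * len(S) * math.log(R) / R     # good sample: all parts <= a n log r / r
      while True:
          sample = sorted(random.sample(S, R))
          parts = [[] for _ in range(R + 1)]
          for x in S:
              if x not in sample:
                  parts[bisect.bisect_left(sample, x)].append(x)
          if max(len(p) for p in parts) <= threshold:
              break                                # resample until the sample is good
      return ("node", sample, [build(p) for p in parts])

  def member(node, q):
      """Descend one level per step; O(log n) levels because every sample is good."""
      if node[0] == "leaf":
          return q in node[1]
      _, sample, children = node
      if q in sample:
          return True
      return member(children[bisect.bisect_left(sample, q)], q)

  S = random.sample(range(10 ** 6), 5000)
  tree = build(S)
  print(member(tree, S[0]), member(tree, -1))      # True False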

Point Location in Arrangements

Let L be a set of n lines in the plane. The lines in L partition the plane into O(n^2) convex polygonal regions; the resulting structure is known as an arrangement of lines. We are only interested in the portion of this arrangement that lies within a fixed triangle τ that contains in its interior all points of intersection between lines in L. This can be viewed as a planar graph as follows: there is a vertex of the graph for each point at which two lines meet, and also a vertex for each point at which a line of L intersects the boundary of τ. An edge between two vertices corresponds, in the natural sense, to the line segment between two vertices that are adjacent in the arrangement. Each face of this planar graph is one of the polygonal regions into which τ is partitioned by the lines in L. We study the following query problem: given a query point q in the plane, which facet of this graph contains the query point?

We will use the following fact: given L, a triangulated arrangement T(L) of the lines in L can be computed in time O(n^2).

Point location problem in T(L)

1. Pick a random sample R of r lines from L, where r is a suitably large constant, and construct the arrangement T(R). The number of facets in T(R) is O(r^2), and is thus a constant.

2. For each (triangular) facet f in T(R), determine the set of lines of L\R intersecting f; denote this set of lines by Lf. This can be done in time O(nr^2). We say a facet f is good if it is intersected by no more than (an log r)/r lines of L, for a suitable constant a, and we say the random sample R is good if every facet of T(R) is good. If the chosen sample R is not good, we repeatedly pick samples R until we get a good sample.

3. For each facet f of T(R) with |Lf| > b, for a constant b, we recurse on this process; in the recursive step, the enclosing triangle is the triangle bounding the facet f. We maintain a pointer from each facet f to the triangulated arrangement of the recursive random sample of lines intersecting f. These pointers will facilitate the search process.

Note: Step 2 can in fact be implemented in time O(nr). (Hint: Zone Theorem.)

We shall prove the following lemma.

Lemma

The probability that any facet of T(R) is intersected by more than (an log r)/r lines of L is less than 1/2, for a suitably large constant a.

Corollary

The expected number of trials before we obtain a good sample R is at most 2.

The construction time satisfies the recurrence  T(n) \leq n^2 + cr^2T(\frac{an\log r}{r}), where c is a constant and T(k) denotes an upper bound on the expected cost of constructing the data structure for an arrangement of k lines. This solves to T(n) = O(n^{2 + \epsilon(r)}), where \epsilon(r) is a positive constant that becomes smaller as r gets larger.

Theorem

The data structure is constructed in expected time O(n^{2 + \epsilon}) for a set of n lines in the plane, for any fixed \epsilon > 0, and it can support point-location queries in time O(log n).

Point to ponder

What are the effects of increasing r on the construction time for the search structure, and on the query time?

Sept 18, 2008

Randomized Rounding

In this lecture, we present a randomized approach to a special class of problems, so-called 0-1 integer programs. We show how to relax the constraints, solve the relaxed problem, and then (randomly) re-impose the constraints while still obtaining a good solution to a problem that would otherwise be hard to solve. Finally, we analyze how the rounding affects the quality of the solution.

Integer Programs: Integer programs (IPs) are a special case of linear programs (LPs) with the additional constraint that the variables we optimize over must be integers, i.e. x \in \{0,1\} in the 0-1 case.

We know that IPs are NP-hard, so we solve them by relaxing their constraints and converting them into LPs: we no longer force the variables to be integers but instead allow real values x \in [0,1]. Here is the basic algorithm for solving the original 0-1 integer program using randomized rounding.

  1. Solve the relaxed version of IP i.e. LP.
  2. Randomly set each variable xi to one or zero according to the rule P[x_i=1]=\hat{x_i}, where \hat{x_i} is the value of xi obtained by solving the LP.
  3. (Not really part of the algorithm.) Analyze how the rounding affects the solution: prove that, after performing step 2, there is a non-zero probability that a good solution exists, and analyze its quality.
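A minimal sketch of the rounding step (step 2 above); the LP solver itself is outside the scope of this sketch, so the fractional solution \hat{x} is taken as given:

  import random

  def randomized_round(x_hat):
      """Given the fractional LP optimum x_hat (values in [0,1]), set each
      variable to 1 independently with probability x_hat[i], else 0."""
      return [1 if random.random() < xi else 0 for xi in x_hat]

  # Example: a fractional solution from some relaxed 0-1 program (values are made up)
  x_hat = [0.9, 0.1, 0.5, 1.0, 0.0]
  print(randomized_round(x_hat))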

We will look at three examples (routing problem for VLSI, multi-commodity flow and k-matching for hypergraphs) to see how this technique works.


Sept 23, 2008

Algebraic Fingerprinting

Multiplying two polynomials of degree n can be done in O(n log n) time using the FFT. We show that verifying f1f2 = f3 can be done in O(n) time; verifying that f1 = f2 for two polynomials f1 and f2 can be done with the same algorithm. Extending this beyond polynomials, we can verify the equality of two strings a and b by viewing them as the coefficient vectors of integer polynomials of degree less than n: for a = (a_1, \ldots, a_n) and b = (b_1, \ldots, b_n), set A(z) = \sum_{i=1}^{n}a_iz^{i-1} and B(z) = \sum_{i=1}^{n}b_iz^{i-1}.

To do this verification we compute fingerprints of the polynomials by selecting a random number r and evaluating the polynomials at r. If the two fingerprints differ, then obviously the polynomials differ too. We shall show that if the fingerprints are equal, then the polynomials are equal with high probability.

The notes used for this lecture follow:


Randomized Algorithms, Motwani and Raghavan, sections 7.1-7.2

Notes by Scoπ Alfeld

23.9.08

Section 7.1 - Fingerprinting and Freivalds' Technique

Multiplying two n \times n matrices A and B can be done in O(n^3) time via the standard matrix multiplication algorithm. The fastest alternative algorithm currently known runs in O(n^{2.376}) time. Verifying, however, that

AB = C

can be done in O(n^2) time with a randomized algorithm by exploiting the fact that we can multiply a matrix by a vector in O(n^2) time.

Let r \in \{0,1\}^n be a random vector. Let

x = Br
y = Ax = ABr
z = Cr

Clearly

(AB = C) \Rightarrow (y = ABr = Cr = z)

Obviously, the converse is not true. We will show, however, that

P(y = z ~|~ AB \neq C) \leq \frac{1}{2}

Consider the n \times n matrix D = AB - C. We note that P(y = z) = P(Dr = 0). Because AB \neq C, we know that D is not the zero matrix. Let the k-th row of D, d^T, be some row with a non-zero entry, and let j be the index of some non-zero entry of d^T. The k-th element of Dr is the inner product d^Tr. Clearly, P(Dr = 0) \leq P(d^Tr = 0). We note that d^Tr = 0 iff

r_jd_j = -\sum_{i \neq j}d_ir_i

This is to say

r_j = \frac{-\sum_{i \neq j}d_ir_i}{d_j}

The right hand side of the above equation can be 1, 0, or some other number, and it is determined by the entries r_i with i \neq j. Because r_j is 0 or 1 with equal likelihood, it matches that value with probability at most 1/2, and we find the (tight) upper bound

 P(y = z) = P(Dr = 0) \leq P(d^Tr = 0) \leq \frac{1}{2}

By this analysis, we see that the decision to draw elements of r from \{0, 1\} was arbitrary. Any two distinct elements would lead to the same bound. Further, if instead we draw elements of r from some set \mathcal{S}, then

P(y = z) \leq \frac{1}{|\mathcal{S}|}
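A minimal sketch of this check in code (plain Python lists; repeating the test drives the error probability down to 2^{-trials}):

  import random

  def matvec(M, v):
      return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

  def freivalds(A, B, C, trials=20):
      """Return False as soon as some random r in {0,1}^n witnesses ABr != Cr.
      If AB != C, each trial misses with probability at most 1/2."""
      n = len(A)
      for _ in range(trials):
          r = [random.randint(0, 1) for _ in range(n)]
          if matvec(A, matvec(B, r)) != matvec(C, r):
              return False
      return True          # equal, or wrong with probability at most 2**-trials

  A = [[1, 2], [3, 4]]
  B = [[5, 6], [7, 8]]
  C = [[19, 22], [43, 50]]                          # C = AB
  print(freivalds(A, B, C), freivalds(A, B, [[19, 22], [43, 51]]))  # True, (almost surely) False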

Section 7.2 - Verifying Polynomial Identities

We now extend this technique to verifying polynomial multiplication. Given polynomial functions f1(x), f2(x) of degree n, and f3(x) of degree 2n, we want to verify

f1(x)f2(x) = f3(x)

Multiplying f1(x) and f2(x) takes O(n log n) time using the Fast Fourier Transform. We will show that, using a randomized algorithm, we can verify the multiplication in O(n) time.

Let r \in \mathcal{S} be a random number. We claim the polynomial multiplication to be valid if

f1(r)f2(r) = f3(r)

Clearly,

{(}f_1(r)f_2(r) \neq f_3(r){)} \Rightarrow {(}f_1(x)f_2(x) \neq f_3(x){)}

but the converse is not true. In parallel with the previous section, we will show that

P{(}{(}f_1(r)f_2(r) = f_3(r){)} ~|~ {(}f_1(x)f_2(x) \neq f_3(x){)}{)} \leq \frac{2n}{|\mathcal{S}|}

We let

Q(x) = f1(x)f2(x) − f3(x)

Because f1 and f2 each have degree n, Q has degree at most 2n, and hence (when it is not identically zero) at most 2n distinct roots. Therefore,

P{(}Q(r) = 0 ~|~ Q(x) \not\equiv 0{)} \leq \frac{2n}{|\mathcal{S}|}

We can illustrate this pictorially: the graph of Q(x) crosses the x-axis at most 2n times, so when we select r the probability that it lands on one of these zeros is at most \frac{2n}{|\mathcal{S}|}. We could also use a deterministic approach, simply picking 2n + 1 points and evaluating Q(x) at each; however, the naive version of this takes O(n^2) time, and even a more sophisticated evaluation scheme only brings it down to O(n\log^2 n) time (longer than multiplying the original polynomials directly).

In general, this procedure can be used to determine a polynomial identity f1(x) = f2(x) by simply letting Q(x) = f1(x) − f2(x). Given f1(x) and f2(x) explicitly, we can also compare coefficients directly - a deterministic O(n) procedure. This cannot be done, however, when the polynomials are instead given implicitly. One key advantage of this randomized algorithm is that it allows a black-box view of the polynomials.
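A sketch of the univariate test with the polynomials given as black boxes (callables); the size of the sample set controls the 2n/|S| error bound:

  import random

  def verify_product(f1, f2, f3, trials=10, S=10 ** 9):
      """Test f1(x)*f2(x) == f3(x) by evaluating at random points of {0,...,S-1};
      if the identity fails, a single trial errs with probability at most 2n/S,
      where n bounds the degrees of f1 and f2."""
      for _ in range(trials):
          r = random.randrange(S)
          if f1(r) * f2(r) != f3(r):
              return False
      return True

  # Example: (x+1)(x-1) vs x^2 - 1 and vs x^2
  f1 = lambda x: x + 1
  f2 = lambda x: x - 1
  print(verify_product(f1, f2, lambda x: x * x - 1),
        verify_product(f1, f2, lambda x: x * x))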

We now extend this technique to handle multivariate polynomials. First we adopt the book's change of notation for Q: rather than the degree of the polynomials, n now represents the number of variables. The degree of a term of Q is the sum of its exponents, and the total degree d of Q is the maximum of the degrees of its terms. We now choose a random vector r \in \mathcal{S}^n, and want to prove

P{(}Q(r_1, \ldots, r_n) = 0 ~|~ Q(x_1, \ldots, x_n) \not\equiv 0{)} \leq \frac{d}{|\mathcal{S}|}

We prove this by induction on n. Our base case

P{(}Q(r_1) = 0 ~|~ Q(x_1) \not\equiv 0{)} \leq \frac{d}{|\mathcal{S}|}

follows from our preceding analysis. We then factor out x1 (or any other variable of our choosing), defining k, with 0 < k \leq d, to be the largest exponent of x1 appearing in Q. This leads to

Q(x_1, \ldots, x_n) = \sum_{i=0}^kx_1^iQ_i(x_2,\ldots,x_n)

where Q_i(x_2, \ldots, x_n) is the coefficient of x_1^i. We note that Q_i(x_2,\ldots,x_n) is a polynomial in n − 1 variables, and has total degree at most d − i.


We now consider the polynomial Q_k(x_2, \ldots, x_n). The total degree of Q_k is at most d − k, and by our choice of k, we know that

Q_k \not\equiv 0

We now consider the two cases.


Case I: Q_k(r_2, \ldots, r_n) = 0

Since Q_k \not\equiv 0 is a polynomial in n − 1 variables of total degree at most d − k, our induction hypothesis gives P{(}Q_k(r_2, \ldots, r_n) = 0{)} \leq \frac{(d-k)}{|\mathcal{S}|}.



Case II: Q_k(r_2, \ldots, r_n) \neq 0

We define a new, univariate polynomial q(x_1) = Q(x_1, r_2, r_3, \ldots, r_n). Because the coefficient of x_1^k in q is Q_k(r_2, \ldots, r_n) \neq 0, we know that q has degree k and q \not\equiv 0. Because q is a non-zero univariate polynomial, we know that P{(}q(r_1) = 0{)} \leq \frac{k}{|\mathcal{S}|}



We now know that

P{(}Q_k(r_2, \ldots, r_n) = 0{)} \leq \frac{d-k}{|\mathcal{S}|}

And that

P{(}Q(r_1, \ldots, r_n) =0 ~|~ Q_k(r_2, \ldots, r_n) \neq 0{)} \leq \frac{k}{|\mathcal{S}|}

Exercise 7.3 in the book shows that for any two events A and B

P(A) \leq P(A ~|~ \bar{B}) + P(B)

Therefore

P{(}Q(r_1, \ldots, r_n) = 0 ~|~ Q(x_1, \ldots, x_n) \not\equiv 0{)} \leq \frac{d-k}{|\mathcal{S}|} + \frac{k}{|\mathcal{S}|} = \frac{d}{|\mathcal{S}|}

Exercise 7.3

For events A and B we want to show

P(A) \leq P(A ~|~ \bar{B}) + P(B)

We know that

P(A) = P(A \wedge B) + P(A \wedge \bar{B})

and, by the definition of conditional probability,

P(A~|~\bar{B}) = \frac{P(A \wedge \bar{B})}{P(\bar{B})}

and clearly,

\frac{P(A \wedge \bar{B})}{P(\bar{B})} \geq P(A \wedge \bar{B})

Therefore

P(A) \leq P(A \wedge B) + \frac{P(A \wedge \bar{B})}{P(\bar{B})} = P(A \wedge B) + P(A ~|~ \bar{B}) \leq P(B) + P(A ~|~ \bar{B})

Aug 26, 2008

Problem: Fingerprinting and Applications

In this lecture, we look at another fingerprinting technique. Suppose two parties, Alice and Bob, want to verify that they have the same information stored in their respective databases. The trivial way is for one of them to send all of its information to the other; but if the information is n bits long, all n bits must be transferred. Our goal is to verify that both hold the same data without sending all the bits.

To solve this we use a fingerprinting technique based on reducing the information modulo a random prime p: Alice sends the fingerprint Fp(a) to Bob, who compares it with his own fingerprint Fp(b). The check fails only when Fp(a) = Fp(b) although a \ne b. We will see that if p is picked from a big enough set of primes, then P[Fp(a) = Fp(b) \mid a \ne b] is small. Moreover, with this technique only O(log n) bits need to be exchanged, an exponential improvement.
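A small sketch of the fingerprint exchange (the prime range and the trial-division primality test are illustrative; the analysis draws p at random from the primes below a suitably chosen threshold):

  import random

  def is_prime(p):
      if p < 2:
          return False
      i = 2
      while i * i <= p:
          if p % i == 0:
              return False
          i += 1
      return True

  def random_prime(limit):
      """Pick a random prime below limit (trial division; sketch only)."""
      while True:
          p = random.randrange(2, limit)
          if is_prime(p):
              return p

  def fingerprint(bits, p):
      """F_p(a): interpret the n-bit string as an integer and reduce mod p."""
      return int(bits, 2) % p

  # Alice and Bob compare O(log n)-bit fingerprints instead of n-bit strings.
  a = "1011001110001111" * 8
  b = a[:-1] + ("0" if a[-1] == "1" else "1")     # differs in the last bit
  p = random_prime(10 ** 6)
  print(fingerprint(a, p) == fingerprint(a, p), fingerprint(a, p) == fingerprint(b, p))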

We will also look at two applications:

1. Arithmetic modulo a random prime

If we have very large integers, it is computationally expensive to operate on them directly; such integers can arise, for example, as the results of evaluating polynomials, and usually our goal is just to decide whether two of them are equal. If we reduce modulo a random prime while evaluating, we obtain the correct answer with high probability while working with numbers of only O(log n) bits.

2. Pattern matching

We are given a string X=x_1x_2 \dots x_n and a shorter pattern Y=y_1y_2 \dots y_m with m<n. The goal is to determine whether Y occurs as a contiguous substring of X. The brute-force method takes O(mn) time, and there exist complicated deterministic algorithms that run in O(m+n) time, but they are difficult to implement and have large overheads. Using the fingerprinting technique we get a simple randomized algorithm that also runs in O(m+n) time, again with small failure probability; the error can be eliminated by converting the Monte Carlo algorithm into a Las Vegas one.
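A sketch of the fingerprint-based matcher in the spirit described above (a Karp-Rabin style rolling fingerprint; here p is a fixed Mersenne prime for simplicity, whereas the analysis picks p at random):

  def rabin_karp(X, Y, p=(1 << 61) - 1):
      """Report all positions where pattern Y occurs in text X, using rolling
      fingerprints mod p; fingerprint matches are verified, making this Las Vegas."""
      n, m = len(X), len(Y)
      if m > n:
          return []
      base = 256
      fy = fx = 0
      for i in range(m):
          fy = (fy * base + ord(Y[i])) % p
          fx = (fx * base + ord(X[i])) % p
      high = pow(base, m - 1, p)
      hits = []
      for i in range(n - m + 1):
          if fx == fy and X[i:i + m] == Y:
              hits.append(i)
          if i + m < n:                             # slide the window by one in O(1)
              fx = ((fx - ord(X[i]) * high) * base + ord(X[i + m])) % p
      return hits

  print(rabin_karp("abracadabra", "abra"))          # [0, 7]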
