CS 6963 Summaries

From ResearchWiki

(Difference between revisions)
Jump to: navigation, search
m (The Probabilistic Method I)
(Sample and Modify)
Line 101: Line 101:
In some cases, it is easier to indirectly construct the object with required properties. The sample and modify method gives a 2-stage indirect construction. In the first stage, a random structure is constructed which does not have the required properties. In the second stage, this structure is incrementally modified until it becomes one with the required properties.
In some cases, it is easier to indirectly construct the object with required properties. The sample and modify method gives a 2-stage indirect construction. In the first stage, a random structure is constructed which does not have the required properties. In the second stage, this structure is incrementally modified until it becomes one with the required properties.
-
'''Example:''' An independent set of a graph is a set of vertices with no edges between them. Finding the largest independent set of NP-hard. Using sample and modify, however, we can give a lower bound on the size of the largest independent set of a graph.
+
'''Example 1:''' An independent set of a graph is a set of vertices with no edges between them. Finding the largest independent set of NP-hard. Using sample and modify, however, we can give a lower bound on the size of the largest independent set of a graph.
'''Theorem:''' Let G = (V,E) be a graph on n vertices with m edges. Then G has an independent set with at least <math>n^2/4m</math> vertices.
'''Theorem:''' Let G = (V,E) be a graph on n vertices with m edges. Then G has an independent set with at least <math>n^2/4m</math> vertices.
'''Proof:''' The proof (which can also be thought of as an algorithm) works by starting with the original graph and incrementally removing vertices and edges from it until all the edges are removed from the graph. The remaining vertices thus form an independent set. Thus in this case, we have a bound on the size of the independent set (which as we will see is <math>n^2/4m</math>) as well as an algorithm for finding such a set.
'''Proof:''' The proof (which can also be thought of as an algorithm) works by starting with the original graph and incrementally removing vertices and edges from it until all the edges are removed from the graph. The remaining vertices thus form an independent set. Thus in this case, we have a bound on the size of the independent set (which as we will see is <math>n^2/4m</math>) as well as an algorithm for finding such a set.
 +
 +
'''Example 2:''' Another example is sample and modify technique is about proving the existence of graphs having a large girth. Girth of a graph is the length of its smallest cycle.
 +
 +
'''Theorem:''' For any integer <math>k \geq 3</math> there is a graph with n nodes, at least <math>\frac{1}{4}n^{1+1/k}</math> edges, and girth at least k.
 +
 +
Usually we expect dense graphs (i.e. ones having a large number of vertices) to have a small girth. But the above theorem says that it is possible for even dense graphs to have large girths too.
 +
 +
'''Proof:''' The proof works by first sampling a random graph (i.e. one starting with n isolated vertices and randomly, with some probability p, adding edges between all pairs of vertices). Then in the second step, we count all possible cycles of length up to k-1 and ''destruct'' all such. The destruction is simple: simply deleter one edge from each of these cycles. In the graph we are left with, there would be cycles only having length more than or equal to k. To get the number of edges in the resulting graph, we can subtract the number of edges deleted in the second step from the number of edges we had in the graph to begin with. It can be shown that this number is at least <math>\frac{1}{4}n^{1+1/k}</math>.

Revision as of 00:29, 28 August 2008

Contents

Aug 26, 2008

Problem: routing in a network.

We're given a directed graph on N nodes. Each edge is a link between communicating nodes. A link can carry one packet at a time. We assume that nodes have queues for each outgoing edge. All nodes have IDs, so we can refer to node 1, node 10, and so on.

Consider permutation routing: here, each node i wants to send a packet to exactly one other node d(i), and each node receives exactly one packet. A routing algorithm has to decide, for each start-end pair (i, j = d(i)), how to route the packet that goes from i to j. Since an edge can carry only one packet per time step, the running time of the routing algorithm is lower bounded by the max congestion on an edge (the number of routes that cross it).

Let's consider an "easy" class of routing algorithms: namely, algorithms that are oblivious, meaning the route used depends solely on the pair (i,j). More precisely, the path taken by a packet from i depends only on j, and on nothing else (in particular, not on the other packets). Oblivious routing needs little state: each (src,dest) pair uniquely defines a route, and so routing tables are easy to maintain.

Theorem: Let the graph have N nodes and out degree d. Then there exists a permutation that forces \Omega(\sqrt{N/d}) congestion for any deterministic oblivious routing scheme.

Consider a special network: the Boolean hypercube on n-bit strings (two nodes are adjacent if they differ in exactly one bit). The degree is d = n, and the size of the network is N = 2^n. The above result says that the number of steps needed is at least \sqrt{2^n/n}.

We will prove the following result:

Theorem: There exists a randomized routing scheme for the hypercube that requires only O(n) steps with high probability.

The deterministic lower bound works by a counting argument: show that each "destination graph" linking a fixed destination to its sources must have a large number of congested edges, and use averaging to show that some large set of routes all hit the same bad edge. Since routes are oblivious, this set defines a permutation that forces congestion.

But let's get some intuition on the hypercube using a simple algorithm called 'bit fixing': find the leftmost bit where the current location differs from the target, flip it (i.e., route along the edge that fixes that bit), and repeat.
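
To make this concrete, here is a minimal Python sketch of bit fixing (our own illustration; it assumes nodes are labeled by n-bit integers, with the leftmost, i.e. most significant, bit fixed first):

  def bit_fixing_route(src, dst, n):
      """Bit-fixing route from src to dst on the n-dimensional hypercube,
      returned as the list of nodes visited (nodes are n-bit integers)."""
      route = [src]
      cur = src
      # Scan bit positions left to right (most significant first) and flip
      # each bit where the current location differs from the target.
      for b in range(n - 1, -1, -1):
          if ((cur >> b) & 1) != ((dst >> b) & 1):
              cur ^= 1 << b          # cross the edge that fixes bit b
              route.append(cur)
      return route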

Lemma: there's a \sqrt{N/n} lower bound for bit-fixing methods. Proof: consider the following permutation: the node i = a_i b_i (where a_i and b_i are the first and second halves of i's bit string) is routed to the node j = b_i a_i. Idea: all such routes must pass through a node of the form x x, and there are only \sqrt{N} of these. Thus there exists at least one node that takes in N/\sqrt{N} = \sqrt{N} packets, and it has degree only n.
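
To see the bottleneck concretely, one can (hypothetically, reusing bit_fixing_route from the sketch above) route this transpose permutation and count how often each node is visited; the diagonal nodes of the form x x soak up the traffic:

  from collections import Counter

  n = 8                                  # must be even; N = 2**n nodes
  half = n // 2
  visits = Counter()
  for i in range(2 ** n):
      a, b = i >> half, i & ((1 << half) - 1)
      j = (b << half) | a                # transpose: a b  ->  b a
      for node in bit_fixing_route(i, j, n):
          visits[node] += 1
  print(visits.most_common(4))           # dominated by nodes of the form x x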

Here's a clue though: the adversary beat the algorithm by taking advantage of the deterministic bit fixing protocol. Suppose we could confuse it by scattering the packets so as not to create a bottleneck?

Randomized algorithm

  1. Route every packet from i to a random location σ(i)
  2. Route every packet from σ(i) to d(i)

Intuition: the randomization spreads the packets out, so the routes don't get bottlenecked.
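
Here is a minimal sketch of the two-phase scheme (reusing bit_fixing_route from above; note that the intermediate destinations σ(i) are chosen independently and uniformly, and need not form a permutation). It only computes the routes, not the step-by-step queueing:

  import random

  def valiant_routes(dest, n):
      """dest[i] = d(i) is a permutation of 0..2**n - 1.  Phase 1 routes i
      to a uniformly random sigma(i) by bit fixing; Phase 2 routes
      sigma(i) on to dest[i] by bit fixing."""
      N = 2 ** n
      sigma = [random.randrange(N) for _ in range(N)]   # independent, uniform
      routes = []
      for i in range(N):
          phase1 = bit_fixing_route(i, sigma[i], n)
          phase2 = bit_fixing_route(sigma[i], dest[i], n)
          routes.append(phase1 + phase2[1:])   # splice the phases at sigma(i)
      return routes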

Proof. Suppose we show that Phase 1 has congestion c with high probability. Then Phase 2 also has congestion c with high probability, by reversing time. Does this mean that the total congestion is at most 2c? Yes, if we make sure that no packets from Phase 1 interfere with packets from Phase 2. So we make all packets wait until c steps are done before starting Phase 2, so that we can be sure there's no interference.

What remains is to prove that Phase 1 causes only congestion c. Fix a packet v_i from node i, and look at how long it takes to arrive: at most n steps for the route itself, plus the delay. How large is the delay?

Part I: identify the structural cause of delay (independent of the randomization).
Part II: analyze that structure using the randomization.

What is the cause of delay?

Fix a route for a particular packet v. Let S be the set of all other packets that traverse at least one edge of this route. Claim: the delay of v is at most |S|. Intuition: any packet that shares an edge with v's route can delay it at most once.

Observation: once two routes separate, they cannot merge again. Proof: in the bit fixing model, two routes diverging means that at some step the two packets sit at nodes whose already-fixed prefixes differ. Each such prefix agrees with that packet's final destination and is never changed again, so once the prefixes differ, the routes cannot merge again.

[Back to claim] We will charge every unit of delay for v to some packet in S. Say that if a packet traverses edge e_j (the j-th edge of v's route) at time t, its lag is t - j. Suppose v's lag goes from l to l+1 while v is waiting to cross e_j. This means that some other packet crosses e_j at that time step, and that packet therefore also has lag l.

Fix the latest time at which any packet on v's route has lag l, and let w be a packet that crosses an edge of the route at that time with lag l. This packet must then leave the route, else it would continue on with lag l, contradicting maximality. We charge v's increase in lag to w, and since w has left the route, we will never charge it again.


Part II: analyze this structure using the randomization.

We now need to understand how many packets overlap v's route. Let H_{ij} = 1 if the routes for i and j share an edge, and 0 otherwise. Then the delay of v_i is at most \sum_j H_{ij}. The destinations are chosen independently at random, and so, once the route of i is fixed, the H_{ij} are independent Poisson trials. So we need to estimate E[\sum_j H_{ij}].

Fix an edge e, and let T(e) be the number of routes through it. Then \sum_j H_{ij} \le \sum_{l=1}^{k} T(e_l), where e_1, \ldots, e_k is the route of i (the right-hand side may count a packet once per shared edge, the left-hand side only once).

By symmetry, every edge looks the same in Phase 1, and so E[T(e)] is the same for every edge e. Since the expected route length is n/2 and there are N routes, the expected total number of route-edges is Nn/2; the hypercube has Nn (directed) edges, and so E[T(e)] = 1/2. Therefore E[\sum_j H_{ij}] \le k/2 \le n/2.
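
In symbols, with e_1, \ldots, e_k the route of i:

  E[T(e)] = \frac{N \cdot n/2}{Nn} = \frac{1}{2}, \qquad E\Big[\sum_j H_{ij}\Big] \le \sum_{l=1}^{k} E[T(e_l)] = \frac{k}{2} \le \frac{n}{2}.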

Applying a standard Chernoff bound for sums of independent Poisson trials (e.g., Pr[X \ge R] \le 2^{-R} whenever R \ge 6 E[X]), the probability that the delay exceeds 6n is at most 2^{-6n}. Moreover, we have to take a union bound over all N = 2^n packets, so we multiply this by 2^n, leaving a failure probability of at most 2^{-5n}.

NOTE: the T(e_l) are not independent, but the H_{ij} are. So we must apply the Chernoff bound to the H_{ij} rather than to the T(e_l).

Aug 28, 2008

The Probabilistic Method I

In today's lecture, we shall discuss some simple yet very effective probabilistic techniques that can be used to prove the existence of objects with certain properties. These techniques fall under the broad umbrella of the "probabilistic method" and, using the power of randomization, they give us results that could be hard to prove otherwise. The basic recipe is simple:

Construct a sample space of objects, and check whether a randomly selected object from this space has the required property with positive probability. If so, an object with the property must exist.

Although the basic premise is simple, certain combinatorial arguments are still needed, even in the randomized setting, to prove the existence of such objects; the probabilistic method provides a collection of techniques for making these arguments. Beyond existence results, the power of randomization is further evidenced by its ability to help us develop randomized algorithms that actually find such objects. We shall discuss three important techniques and example situations where they yield useful results.

The Basic Counting Argument

The basic counting argument works by constructing an appropriate probability space S and then, by enumerating over the randomly chosen objects of interest, showing that the probability that an object has the required properties is non-zero.

Example: Consider the problem of coloring the edges of the complete graph K_n with two colors so that no clique on k vertices has all its edges the same color (i.e., there is no monochromatic K_k subgraph). The following theorem gives a condition that guarantees the existence of such a coloring.

Theorem: If {n\choose k}2^{-{k \choose 2} + 1} < 1, then it is possible to color the edges of K_n with two colors so that it has no monochromatic K_k subgraph.

Proof: Color each edge independently and uniformly at random, and consider any fixed k-vertex clique: it has {k \choose 2} edges, so the probability that all of them receive the same color is 2 \cdot 2^{-{k \choose 2}} = 2^{-{k \choose 2} + 1}. Taking a union bound over all {n \choose k} cliques (and applying the condition given in the theorem above), the probability that at least one clique is monochromatic is less than 1. Thus the probability that no k-clique is monochromatic is greater than zero, and such a coloring must exist.

Now that we know that such a coloring exists, we can actually go further and find one using a randomized construction algorithm. We shall see examples of constructing both a Monte Carlo and a Las Vegas algorithm for finding such a coloring. We will also investigate the conditions under which such constructions are practically useful (i.e., whether the Monte Carlo algorithm indeed yields a low error probability, and whether the Las Vegas algorithm runs in expected polynomial time).
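
As a sketch of the Las Vegas version (our own illustration in Python, not the exact algorithm from the lecture): resample uniformly random colorings until no k-clique is monochromatic. Under the theorem's condition each sample fails with probability less than 1, though the check enumerates all n-choose-k cliques, so this is only feasible for small n and k:

  import random
  from itertools import combinations
  from math import comb

  def coloring_without_mono_clique(n, k):
      """Resample random 2-colorings of K_n's edges until no k-vertex
      clique is monochromatic.  Sensible only when the theorem's
      condition C(n,k) * 2**(1 - C(k,2)) < 1 holds."""
      assert comb(n, k) * 2 ** (1 - comb(k, 2)) < 1
      while True:
          color = {e: random.randrange(2) for e in combinations(range(n), 2)}
          if not any(
              len({color[e] for e in combinations(q, 2)}) == 1
              for q in combinations(range(n), k)
          ):
              return color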

The Expectation Argument

The basic counting argument works by constructing a probability space from which a randomly chosen object is shown to have the desired properties with positive probability. However, sometimes it is easier to prove such a thing by instead working with an expectation (averaging) argument. The expectation argument is based on a fairly simple (and intuitive) fact: a random variable must, with positive probability, take at least one value that is no greater than its expectation, and at least one value that is no smaller than its expectation.

Example: A satisfiability (SAT) formula is a logical expression that is the conjunction (AND) of a set of clauses, where each clause is a disjunction (OR) of literals. A solution to a SAT formula is a True/False assignment to the variables under which all the clauses are satisfied. In general, determining whether a SAT formula has a solution is NP-hard. A related problem is to satisfy as many clauses as possible; this is called the MAXSAT problem. We shall see that, using the expectation argument, we can give a lower bound on the number of clauses that can be satisfied.

Theorem: Given a set of m clauses, let k_i be the number of literals in the i-th clause, for i = 1, \ldots, m, and let k = \min_i k_i. Then there is a truth assignment that satisfies at least \sum_{i=1}^{m}(1 - 2^{-k_i}) \geq m(1 - 2^{-k}) clauses.

Proof: Assign each variable True or False independently and uniformly at random. The i-th clause fails only if all of its k_i literals are falsified, which happens with probability 2^{-k_i}, so it is satisfied with probability 1 - 2^{-k_i}. By linearity of expectation, the expected number of satisfied clauses is \sum_i (1 - 2^{-k_i}). Invoking the expectation argument, there must be an assignment that satisfies at least that many clauses.
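
A sketch of the sampling step as code (illustration only; we assume a DIMACS-style clause encoding, with a clause as a list of nonzero integers, +v for variable v and -v for its negation). Repeating the sampling until the count reaches the expectation \sum_i (1 - 2^{-k_i}) turns the existence proof into a Las Vegas algorithm:

  import random

  def random_assignment_count(clauses, num_vars):
      """Assign each variable True/False uniformly at random and count the
      satisfied clauses; a clause with k_i literals (on distinct
      variables) fails with probability 2**(-k_i)."""
      assign = {v: random.random() < 0.5 for v in range(1, num_vars + 1)}
      return sum(
          any(assign[abs(l)] == (l > 0) for l in clause)
          for clause in clauses
      )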

We shall see another example: finding a large cut in a graph. In general, finding the maximum cut in a graph is NP-hard. However, using the expectation argument, we can show that the maximum cut has size at least half the number of edges in the graph. Turning this existence argument into an algorithm, we will also see a Las Vegas algorithm that finds a cut achieving this lower bound.
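
A minimal Las Vegas sketch along these lines (our own illustration): place each vertex on a uniformly random side, so each edge is cut with probability 1/2 and the expected cut size is m/2; resample until the cut actually reaches m/2, which must eventually happen since such a cut exists:

  import random

  def large_cut(n, edges):
      """Resample uniformly random sides until the cut has >= m/2 edges
      (vertices are 0..n-1; edges is a list of pairs)."""
      m = len(edges)
      while True:
          side = [random.randrange(2) for _ in range(n)]
          cut = sum(1 for (u, v) in edges if side[u] != side[v])
          if 2 * cut >= m:
              return side, cut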

Sample and Modify

In some cases, it is easier to construct the object with the required properties indirectly. The sample-and-modify method gives a two-stage indirect construction: in the first stage, a random structure is constructed that need not yet have the required properties; in the second stage, this structure is incrementally modified until it does.

Example 1: An independent set of a graph is a set of vertices with no edges between them. Finding the largest independent set is NP-hard. Using sample and modify, however, we can give a lower bound on the size of the largest independent set of a graph.

Theorem: Let G = (V,E) be a graph on n vertices with m edges. Then G has an independent set with at least n^2/4m vertices.

Proof: The proof (which can also be read as an algorithm) starts with the original graph and removes vertices and edges from it until no edges remain, at which point the surviving vertices form an independent set. Concretely: first, delete each vertex (together with its incident edges) independently with probability 1 - 1/d, where d = 2m/n is the average degree; then, for each surviving edge, remove one of its endpoints. The expected number of surviving vertices is n/d and the expected number of surviving edges is m/d^2 = n/(2d), so the resulting independent set has expected size at least n/d - n/(2d) = n/(2d) = n^2/4m. Thus we have both the claimed bound and an algorithm for finding such a set.
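
A sketch of that algorithm in Python (assuming the average degree d = 2m/n is at least 1; a single run achieves the bound only in expectation, so in practice one repeats until the returned set is large enough):

  import random

  def sample_and_modify_independent_set(n, edges):
      """Sample: keep each vertex with probability 1/d, where d = 2m/n is
      the average degree.  Modify: delete one endpoint of every edge that
      survives.  The result is independent, with expected size >= n^2/(4m)."""
      m = len(edges)
      if m == 0:
          return set(range(n))
      d = 2 * m / n
      kept = {v for v in range(n) if random.random() < 1 / d}
      for (u, v) in edges:
          if u in kept and v in kept:
              kept.discard(u)        # either endpoint works
      return kept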

Example 2: Another example of the sample-and-modify technique is proving the existence of graphs with large girth. The girth of a graph is the length of its shortest cycle.

Theorem: For any integer k \geq 3 there is a graph with n nodes, at least \frac{1}{4}n^{1+1/k} edges, and girth at least k.

Usually we expect dense graphs (i.e., ones having a large number of edges) to have small girth. But the above theorem says that it is possible even for fairly dense graphs to have large girth.

Proof: The proof works by first sampling a random graph: start with n isolated vertices and, independently with some probability p, add an edge between each pair of vertices. Then, in the second step, find every cycle of length up to k-1 and destroy it, which is simple: just delete one edge from each such cycle. The graph we are left with contains only cycles of length at least k, so its girth is at least k. The number of edges in the resulting graph is the number we started with minus the number deleted in the second step, and it can be shown (for a suitable choice of p) that with positive probability this is at least \frac{1}{4}n^{1+1/k}.
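
A sketch of the construction (illustration only: it uses p = n^{1/k - 1} as in the usual analysis, and it may delete more edges than the proof's one-per-cycle accounting, so it demonstrates the construction rather than certifying the edge bound). The key observation used below: an edge (u,v) lies on a cycle of length less than k exactly when u and v are still within distance k-2 once that edge is removed:

  import random
  from collections import deque

  def dist_without_edge(adj, src, dst, limit, banned):
      """BFS distance from src to dst ignoring the edge `banned`;
      returns limit + 1 if dst is farther away than limit."""
      dist = {src: 0}
      q = deque([src])
      while q:
          u = q.popleft()
          if dist[u] == limit:
              continue
          for v in adj[u]:
              if {u, v} == banned or v in dist:
                  continue
              dist[v] = dist[u] + 1
              if v == dst:
                  return dist[v]
              q.append(v)
      return limit + 1

  def large_girth_graph(n, k):
      """Sample G(n, p) with p = n**(1/k - 1), then delete an edge from
      every cycle of length < k; the remaining graph has girth >= k."""
      p = n ** (1 / k - 1)
      adj = {v: set() for v in range(n)}
      for u in range(n):
          for v in range(u + 1, n):
              if random.random() < p:
                  adj[u].add(v); adj[v].add(u)
      for u in range(n):
          for v in sorted(adj[u]):
              if u < v and dist_without_edge(adj, u, v, k - 2, {u, v}) <= k - 2:
                  adj[u].discard(v); adj[v].discard(u)   # break the short cycle
      return adj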
