CS 4960/6960: Homework

Note: On this page, I will post a stream of HW problems. If you spot typos/clarifications, please ask! The same holds if you do not understand some notation.

  1. Let $c \in \mathbb{R}^n$ be some non-zero vector (i.e., at least one of its entries is non-zero). Suppose $x = (x_1, x_2, \ldots, x_n)$ is a random vector obtained by setting each $x_i$ to be 0 or 1 with probability 1/2, independently of the other coordinates.

    (i) Prove that $\Pr[\langle c, x \rangle = 0] \leq \frac{1}{2}$. (Hint: focus on some index $i$ where $c_i \neq 0$, now argue about the probability after conditioning on any setting of values for the $x_j$, $j \neq i$.)

    (ii) Give an example where equality occurs.
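    For small $n$, the claim in (i) can be sanity-checked by exact enumeration over all $2^n$ choices of $x$. A short sketch (the example vectors below are arbitrary illustrations, not part of the problem):

    ```python
    from itertools import product

    def prob_inner_product_zero(c):
        """Exactly compute Pr[<c, x> = 0] over uniform x in {0,1}^n."""
        n = len(c)
        hits = sum(1 for x in product([0, 1], repeat=n)
                   if sum(ci * xi for ci, xi in zip(c, x)) == 0)
        return hits / 2 ** n

    # Every non-zero c should give probability at most 1/2.
    for c in [(1, 2, 3), (5, -5, 0), (1, 0, 0, 0)]:
        assert prob_inner_product_zero(c) <= 0.5
    ```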

  2. As we proved in class, the expected runtime of QuickSort satisfies the recurrence below (with base case $T(1) = 0$). Prove that $T(n) = O(n \log n)$.

    $$T(n) = (n+1) + \frac{2}{n}\bigl(T(0) + T(1) + \cdots + T(n-1)\bigr).$$

    [Hint: Prove the intermediate step $\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2}{n+1}$.]
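    The recurrence can also be iterated numerically to see the $O(n \log n)$ behavior before proving it. A quick sketch (taking $T(0) = 0$ as well, which the sum requires; the constant 3 in the check is an arbitrary illustrative bound):

    ```python
    import math

    def quicksort_expected(n_max):
        """Iterate T(n) = (n+1) + (2/n) * (T(0)+...+T(n-1)), with T(0)=T(1)=0."""
        T = [0.0, 0.0]
        for n in range(2, n_max + 1):
            T.append((n + 1) + (2.0 / n) * sum(T))  # sum(T) = T(0)+...+T(n-1)
        return T

    T = quicksort_expected(1000)
    # T(n) should stay within a constant factor of n*log(n):
    for n in (10, 100, 1000):
        assert T[n] <= 3 * n * math.log(n)
    ```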

  3. The goal of this exercise is to show that non-negativity is required for Markov's inequality. Concretely, give an example of a random variable $X$ such that $\mathbb{E}[X] = 1$ and $\Pr[X > 4] = 0.9$.

  4. Suppose we take the numbers $1, 2, \ldots, n$ and randomly permute them to obtain the sequence $\pi_1, \pi_2, \ldots, \pi_n$. Let $X$ be the random variable that counts the number of "fixed points", i.e., indices $j$ such that $\pi_j = j$. What is $\mathbb{E}[X]$?
    [Hint: Define random variables $Y_j = 1$ when $\pi_j = j$ and $Y_j = 0$ when $\pi_j \neq j$, and use linearity of expectation.]
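    Your answer can be checked exactly for small $n$ by averaging over all $n!$ permutations:

    ```python
    from itertools import permutations

    def expected_fixed_points(n):
        """Average number of fixed points over all n! permutations of 1..n."""
        perms = list(permutations(range(1, n + 1)))
        total = sum(sum(1 for j, pj in enumerate(perm, start=1) if pj == j)
                    for perm in perms)
        return total / len(perms)

    # The average should not depend on n:
    assert expected_fixed_points(3) == expected_fixed_points(6)
    ```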

  5. (a) Prove the following probability statement: let $X_1, X_2, \ldots, X_n$ be independent random variables. Then $\mathrm{var}(X_1 + X_2 + \cdots + X_n) = \sum_{i=1}^{n} \mathrm{var}(X_i)$. [Hint: recall that the variance of a random variable $X$ with mean $\mu$ is defined as $\mathbb{E}[(X - \mu)^2]$.]

    (b) Show that for the above to hold, you don't need all the $X_i$ to be independent: it suffices that $\mathbb{E}[X_i X_j] = \mathbb{E}[X_i]\,\mathbb{E}[X_j]$ for all pairs $i \neq j$. [If you're interested, you should ask yourself: doesn't this mean that $X_i, X_j$ are independent? We will discuss this point in class on Monday, Jan 29.]
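    Part (a) can be sanity-checked on a concrete family: a sum of $n$ independent uniform $\{0,1\}$ bits, each with variance $1/4$, should have variance exactly $n/4$. A small exact computation (the choice of coin flips is just an illustrative example):

    ```python
    from itertools import product

    def var_of_sum(n):
        """Exact variance of the sum of n independent uniform {0,1} bits."""
        outcomes = [sum(bits) for bits in product([0, 1], repeat=n)]
        mean = sum(outcomes) / len(outcomes)
        return sum((s - mean) ** 2 for s in outcomes) / len(outcomes)

    for n in (1, 2, 5, 8):
        assert abs(var_of_sum(n) - n / 4) < 1e-12
    ```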

  6. (Optional) A set of $n$-bit binary strings $C = \{s_1, s_2, \ldots, s_N\}$ is said to be an error-correcting code (ECC) if for all $i, j \in [N]$ with $i \neq j$, we have $d(s_i, s_j) \geq n/4$, where $d(s_i, s_j)$ denotes the Hamming distance, defined as the number of indices in which $s_i$ and $s_j$ differ.

    Background: In coding theory, one wishes to construct ECCs with $N$ as large as possible (for a given $n$). This is because we should think of $s_1, \ldots, s_N$ as encodings of "messages" $\{1, 2, \ldots, N\}$, and we would like to encode as many messages as possible.

    Problem: Prove that there exists an ECC with $N > 2^{0.1n}$. Follow the random greedy addition algorithm, and argue that it produces a set $C$ whose expected size is $N$.

    [Hint: whenever $|C| < 2N$, show that the probability that an iteration of the for loop "succeeds" is at least 1/2. You may find it useful to assume that $\binom{n}{0} + \binom{n}{1} + \cdots + \binom{n}{n/4} < \frac{2^{0.9n}}{2}$.]
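    The random greedy addition algorithm referred to above can be sketched as follows (the parameters $n$, the iteration count, and the seed here are arbitrary illustrative choices, not part of the problem):

    ```python
    import random

    def hamming(a, b):
        """Number of positions where the two strings differ."""
        return sum(x != y for x, y in zip(a, b))

    def random_greedy_ecc(n, iterations, seed=0):
        """Repeatedly draw a uniform n-bit string; keep it if it is at
        Hamming distance >= n/4 from everything kept so far."""
        rng = random.Random(seed)
        C = []
        for _ in range(iterations):
            s = tuple(rng.randrange(2) for _ in range(n))
            if all(hamming(s, t) >= n / 4 for t in C):
                C.append(s)
        return C

    C = random_greedy_ecc(n=32, iterations=200)
    # By construction, every kept pair is at distance >= 32/4 = 8:
    assert all(hamming(a, b) >= 8 for i, a in enumerate(C) for b in C[i + 1:])
    ```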


This completes HW 1. It is due on Wednesday, February 7 (11:59). If you need more time, please ask by email at least 1 day before the deadline.


  1. The key property of convex functions over a convex domain (used in most optimization algorithms) is that a local minimum is also a global minimum. This condition can be stated in many ways, and here's one. Let $f$ be a convex function, and let $D \subseteq \mathbb{R}^n$ be a convex set (which is the domain). Suppose $x \in D$ satisfies the property that for some $\rho > 0$, $f(y) \geq f(x)$ for all $y \in \mathrm{Ball}(x, \rho)$. (In other words, $x$ is a local minimum.) Now prove that for all $z \in D$, we must have $f(z) \geq f(x)$.

    (Hint. Recall that one definition of convexity is that for all $t \in [0, 1]$ and points $x, y$, $f(tx + (1-t)y) \leq t f(x) + (1-t) f(y)$. Use this to prove the claim by contradiction: suppose there exists $z$ such that $f(z) < f(x)$, and now use the definition of convexity to contradict the local minimum property. Also, in your proof, where are you using the convexity of $D$?)

  2. Another nice property of a convex function $f$ over any bounded (and closed, to be precise) domain $D \subseteq \mathbb{R}^n$ is that the maximum value of $f$ over $D$ is always attained at a "boundary" point; more formally, there is always a point on the boundary of $D$ that attains the maximum value over $D$. Prove this fact.

    (Hint. Geometrically, if you draw a line through any point x that is not on the boundary, it will intersect the boundary in (at least) two points, one on either side.)

  3. Linear classifiers (passing through the origin) are probably the most classic models in machine learning. Given a set of points $x_1, x_2, \ldots, x_m \in \mathbb{R}^n$ and their labels $y_1, y_2, \ldots, y_m \in \{-1, +1\}$, the goal is to find a "hypothesis", which is a vector $w \in \mathbb{R}^n$ such that $\mathrm{sign}(\langle w, x_i \rangle) = y_i$ for all $i \in [m]$.

    Solve the problem of finding such a $w$ using a linear program. (You may assume that such a $w$ exists.) (Hint. You will be able to express the problem just as a question of finding any feasible solution. There is a subtlety here about avoiding $\langle w, x_i \rangle = 0$. Think about how you can achieve this...)
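    One common way to resolve the subtlety (a hint toward the structure, not the full write-up) is to require $y_i \langle w, x_i \rangle \geq 1$ instead of $> 0$, since any strictly separating $w$ can be rescaled to satisfy this. A minimal sketch using `scipy.optimize.linprog`, assuming SciPy is available; the data below is a made-up separable example:

    ```python
    import numpy as np
    from scipy.optimize import linprog

    def find_separator(X, y):
        """Feasibility LP: find w with y_i * <w, x_i> >= 1 for all i.
        (The '>= 1' replaces '> 0'; a separating w can always be rescaled.)"""
        X, y = np.asarray(X, float), np.asarray(y, float)
        m, n = X.shape
        A_ub = -y[:, None] * X          # encodes -y_i <w, x_i> <= -1
        b_ub = -np.ones(m)
        res = linprog(c=np.zeros(n), A_ub=A_ub, b_ub=b_ub,
                      bounds=[(None, None)] * n, method="highs")
        return res.x if res.success else None

    X = [[2, 1], [1, 2], [-1, -2], [-2, -1]]
    y = [1, 1, -1, -1]
    w = find_separator(X, y)
    assert w is not None
    assert all(np.sign(w @ np.array(xi)) == yi for xi, yi in zip(X, y))
    ```

    Since the objective is identically zero, the LP is a pure feasibility question, matching the hint.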

  4. Recall the LP relaxation for the set-cover problem, where we had $m$ sets and $n$ elements. As we did in class, suppose that the sets are called $S_1, S_2, \ldots, S_m$ and the elements are $1, 2, \ldots, n$. For every $j \in [n]$, the LP has the constraint $\sum_{i \,:\, j \in S_i} x_i \geq 1$. Now suppose we know that for some parameter $D$, every element $j$ is contained in at most $D$ sets. Show how to obtain a factor-$D$ approximation algorithm for set cover on such instances. (In other words, give an algorithm that is guaranteed to output a solution of cost at most $D$ times the optimal.) (Hint. The solution is discussed in one of the lecture notes linked on the course page.)
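    One natural rounding to consider (this may or may not match the lecture notes; it is a standard approach, stated here as a sketch) is to solve the LP and pick every set whose fractional value is at least $1/D$. The rounding step, applied to a hypothetical fractional solution on a toy instance:

    ```python
    def round_set_cover(x_frac, sets, n_elements, D):
        """Pick every set with LP value >= 1/D. If each element lies in at
        most D sets and its sets' x-values sum to >= 1, then at least one
        of those values is >= 1/D, so the rounded solution still covers
        everything, and the cost grows by a factor of at most D."""
        chosen = [i for i, xi in enumerate(x_frac) if xi >= 1.0 / D]
        covered = set().union(*(sets[i] for i in chosen)) if chosen else set()
        assert covered == set(range(1, n_elements + 1)), "rounding failed to cover"
        return chosen

    # Hypothetical toy instance: 3 sets over {1,...,4}; each element is in <= 2 sets.
    sets = [{1, 2}, {2, 3}, {3, 4}]
    x_frac = [1.0, 0.5, 1.0]    # a feasible fractional solution for this instance
    chosen = round_set_cover(x_frac, sets, n_elements=4, D=2)
    ```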

  5. Recall the linear program we saw in class for the facility location problem. Suppose the clients are numbered $1, 2, \ldots, n$, and the facilities are numbered $1, 2, \ldots, m$. The variables are (a) $y_1, y_2, \ldots, y_m$, where $y_i$ represents whether facility $i$ is picked or not, and (b) $x_{ji}$, which denotes whether client $j$ is assigned to facility $i$. In the LP relaxation, we imposed the constraints $0 \leq y_i \leq 1$ and $0 \leq x_{ji} \leq 1$ for all $i, j$. The objective had an "opening cost", which was $\sum_{i \in [m]} f_i y_i$, and a "connection cost", which was $\sum_{j \in [n]} \sum_{i \in [m]} d(j, i)\, x_{ji}$, where $d(j, i)$ is the distance from client $j$ to facility $i$.

    We also imposed the constraint $\sum_i x_{ji} = 1$ for all $j$ (i.e., every client is assigned to some facility). Recall the definition $R_j = \sum_{i \in [m]} d(j, i)\, x_{ji}$, which is basically the connection cost of the LP solution for client $j$. In class, we showed that for every client $j$, if $B_j$ denotes the ball of radius $2R_j$ around client $j$, then $\sum_{i \in B_j} y_i \geq \frac{1}{2}$.

    Use this observation to analyze the following randomized rounding procedure: for each $i$, open facility $i$ with probability $\min(1, \alpha y_i)$, for some parameter $\alpha > 1$.

    (a) Prove that for every client $j$, the probability that we don't open any facility within distance $2R_j$ of $j$ is at most $\exp(-\alpha/2)$. (b) [Extra credit, optional] Show how you would obtain an approximation algorithm for facility location using this idea, and write out the approximation factors for the opening cost and the connection cost separately.
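    For intuition on part (a): since the facilities open independently, the probability that the ball stays empty is $\prod_{i \in B_j} \bigl(1 - \min(1, \alpha y_i)\bigr)$, and the claim is that this is at most $e^{-\alpha/2}$ whenever $\sum_{i \in B_j} y_i \geq \frac{1}{2}$. This can be checked numerically on made-up fractional values:

    ```python
    import math

    def prob_ball_empty(ys, alpha):
        """Exact probability that no facility in the ball opens, when facility i
        opens independently with probability min(1, alpha * y_i)."""
        p = 1.0
        for y in ys:
            p *= 1.0 - min(1.0, alpha * y)
        return p

    # Hypothetical y-values inside some ball B_j, with sum exactly 1/2:
    ys = [0.2, 0.15, 0.15]
    for alpha in (1.5, 2.0, 3.0):
        assert prob_ball_empty(ys, alpha) <= math.exp(-alpha / 2)
    ```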


This completes HW 2. It is due on Wednesday, March 13 (11:59). If you need more time, please ask by email at least 1 day before the deadline.