Tue-Thu 3:40-5pm (Spring 2013)
The Geometry of Data
Computational geometry is the study of algorithms for geometric objects: points, curves, lines, surfaces, and so on.
But what makes computational geometry such a powerful discipline is where these points, lines, and surfaces come from. It’s not just proteins, or 3D models, or maps (although those are important). It’s virtually any problem involving the analysis of data, whether it be text, video, images, speech, or even movie recommendations.
Computational geometry is the structural underpinning of data mining: it gives data its “shape” and tells us how to manipulate it efficiently.
For more on this, read my short note: “Data, Dimensions and Geometry”.
The course will cover a variety of topics in computational geometry, with an emphasis on constructs that relate to data analysis. These include:
- Foundations: Convex hulls, Voronoi diagrams, Delaunay triangulations, arrangements, duality, (orthogonal) range searching
- Randomization: $\varepsilon$-nets/samples, VC dimension, cuttings
- Optimization: linear programming and beyond
- Approximations: grids, near neighbor searching, core sets, projections, reweighting techniques (MWU)
- The Geometry of Learning: manifolds, kernels, Bregman divergences
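No programming is required for this course, but to give a flavor of the foundational material, here is a short Python sketch of one of the first constructs we will study: the convex hull, computed via Andrew’s monotone-chain algorithm (an illustrative example, not course material):

```python
# Illustrative sketch: Andrew's monotone-chain convex hull, O(n log n).
def cross(o, a, b):
    # Cross product of vectors OA and OB; positive iff O->A->B turns counter-clockwise.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    # Returns the hull vertices in counter-clockwise order, starting
    # from the lexicographically smallest point.
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower hull left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Drop the last point of each chain (it repeats the other chain's start).
    return lower[:-1] + upper[:-1]
```

The sorting step dominates the running time; the two chain-building passes are linear, since each point is pushed and popped at most once.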
This course will assume competence in the material covered in CS 6150 (Advanced Algorithms), which means facility with algorithm analysis techniques, randomness and probability, approximation algorithms and basic complexity theory. No programming skills will be required.
The course will be drawn from material in:
- Computational Geometry: Algorithms and Applications
- Geometric Approximation Algorithms (and an online draft)
I will provide additional material as needed. The lecture topics, in order:
- Convexity and convex hulls.
- Voronoi Diagrams
- Delaunay Triangulations (some notes from David Mount)
- $\varepsilon$-nets, $\varepsilon$-samples, VC dimension (Sariel’s notes, Chapter 5)
- VC dimension continued: How do we compute the VC dimension of a space?
- More VC dimensions (shatter dimension, dual range spaces) and some PAC-learning.
- Three lectures on PAC learning (the first defines PAC learning, the second is really a recap of VC dimension, and the third outlines an argument for the $\varepsilon$-net construction).
- Reweighting: the multiplicative-weight-update method (Sariel’s book, Sections 6.3 and 6.4), and my blog post.
- Low-dimensional linear programming (Seidel’s algorithm). (Section 9.2)
- High-dimensional linear programming: the simplex algorithm, and the ellipsoid method.
- Grid-based approximations (Sariel’s notes, Chapter 1)
- Continuation of grid-based algorithms (minimum enclosing ball), and quad trees (Chapter 2).
- Compressed quad trees and well-separated pair decompositions (Chapters 2 and 3)
- WSPDs continued; range trees and kd-trees.
- Near-neighbor searching (Chapter 11 from Sariel’s online notes, Chapter 17 in his book)
- Near-neighbor searching in high dimensions (Chapter 20 from the book)
- (Guest lecture by Jeff Phillips) Projections: the JL lemma and friends
- Core sets and extents.
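To illustrate the flavor of the projections lecture: the Johnson–Lindenstrauss lemma says that a random linear map into $k = O(\varepsilon^{-2} \log n)$ dimensions preserves all pairwise distances up to a $(1 \pm \varepsilon)$ factor with high probability. A minimal NumPy sketch (the dimensions and seed here are arbitrary choices for illustration):

```python
import numpy as np

# Sketch of a JL-style random projection: a scaled Gaussian matrix maps
# n points from d dimensions down to k << d while roughly preserving
# pairwise Euclidean distances.
rng = np.random.default_rng(0)
n, d, k = 50, 1000, 400
X = rng.standard_normal((n, d))          # n points in R^d
R = rng.standard_normal((d, k)) / np.sqrt(k)  # random projection, scaled so norms are preserved in expectation
Y = X @ R                                 # projected points in R^k

# Compare one pairwise distance before and after projection.
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Y[0] - Y[1])
ratio = proj / orig  # concentrates near 1 as k grows
```

The key point, which we will prove in lecture, is that $k$ depends only on $n$ and $\varepsilon$, not on the ambient dimension $d$.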
The Geometry of Learning
- Kernels (Hal Daumé’s excellent review)
- Bregman divergences
- The geometry of graphs (spectral methods)
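As a taste of the kernels lecture: a kernel method never touches the (possibly infinite-dimensional) feature space directly; all the geometry it needs is in the matrix of pairwise similarities. A small sketch computing the Gaussian (RBF) kernel matrix (the data and bandwidth here are made up for illustration):

```python
import numpy as np

def rbf_kernel_matrix(X, sigma=1.0):
    # K[i, j] = exp(-||x_i - x_j||^2 / (2 sigma^2)), computed without an
    # explicit double loop via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b.
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    # Clamp tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 3.0]])
K = rbf_kernel_matrix(X)
```

The resulting matrix is symmetric and positive semidefinite, with ones on the diagonal; that positive semidefiniteness is exactly what certifies that some feature map realizes these similarities as inner products.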
Note: While this is the order in which I will try to cover the topics, they may not map one-to-one onto lectures. Some topics might span multiple lectures, and others might be compressed into one.
Grading will be based solely on homeworks (5-6). There will be no exams or projects.