Data Mining: Algorithms, Geometry, and Probability
Jeff M. Phillips
This "book" consists of a series of lecture notes I have built over many years teaching a Data Mining course at the University of Utah. It is designed for senior undergraduates or first-year graduate students in a computing program. It assumes basic programming skills and basic knowledge of probability, linear algebra, and algorithms.

When writing these notes, I was heavily influenced by the following two books. They were developed partially in parallel and cover similar material, but, in my opinion, from a different perspective.
MMDS: Mining Massive Data Sets by Anand Rajaraman, Jure Leskovec, and Jeff Ullman.
FoDS: Foundations of Data Science by Avrim Blum, John Hopcroft, and Ravindran Kannan.

These notes also overlap somewhat with another "book" I have created, The Foundations of Data Analysis, which covers similar topics and is aimed at less advanced students.

Many of my lectures on this material appear on the University of Utah School of Computing's YouTube channel.


1. Introduction
2. Statistical Principles, Hashing, and Concentration of Measure
    More on Chernoff-Hoeffding Bounds

Similarities and Distances
3. Jaccard Distance and nGrams
4. MinHashing
5. Locality Sensitive Hashing (LSH)
6. Distances
7. Approximate Nearest Neighbors

Clustering
8. Hierarchical Agglomerative Clustering
9. Assignment-based Clustering (k-means, etc.)
10. Spectral Clustering

Streaming and High Frequency Items
11. Deterministic Heavy-Hitters and Quantiles
12. Count-Min Sketch and Frequent Itemsets

Regression and Dimensionality Reduction
13. Types of Regression in 2 Dimensions
14. Singular Value Decomposition (SVD)
15. Metric Learning
16. Matrix Sketching
17. Random Projections
18. Orthogonal Matching Pursuit and Compressed Sensing
19. Ridge Regression and Lasso

Managing and Using Noise
20. Outliers and Cross-Validation
21. Privacy

Graph Analysis
22. Markov Chains
23. PageRank and Search Engines
24. MapReduce and the Big Data Revolution
25. Detecting Communities
26. Graph Sparsification