Jeff M. Phillips

This ``book'' consists of a series of lecture notes I have built over many years teaching a Data Mining course at the University of Utah. It is designed for senior undergraduates or first-year graduate students in a computing program. It assumes basic programming skills and basic knowledge of probability, linear algebra, and algorithms.

When writing these notes, I was heavily influenced by the following two books, which were developed partially in parallel and cover similar material, but from a different perspective, IMO.

These notes also overlap somewhat with another ``book'' I have created on the Foundations of Data Analysis, which covers similar topics and is aimed at less advanced students.

Many of my lectures on this material appear on the University of Utah School of Computing's YouTube channel.

1. Introduction

2. Statistical Principles, Hashing, and Concentration of Measure

More on Chernoff-Hoeffding Bounds

3. Jaccard Distance and n-Grams

4. MinHashing

5. Locality Sensitive Hashing (LSH)

6. Distances

7. Approximate Nearest Neighbors

8. Hierarchical Agglomerative Clustering

9. Assignment-based Clustering (k-Means, etc.)

10. Spectral Clustering

11. Deterministic Heavy-Hitters and Quantiles

12. Count-Min Sketch and Frequent Itemsets

13. Types of Regression in 2 Dimensions

14. Singular Value Decomposition (SVD)

15. Metric Learning

16. Matrix Sketching

17. Random Projections

18. Orthogonal Matching Pursuit and Compressed Sensing

19. Ridge Regression and Lasso

20. Outliers and Cross-Validation

21. Privacy

22. Markov Chains

23. PageRank and Search Engines

24. MapReduce and the Big Data Revolution

25. Detecting Communities

26. Graph Sparsification