Foundations for Geometric Analysis of Noisy Data
PI : Jeff M. Phillips
NSF CCF-1350888 (May 2014 - April 2019)
An important role of computational geometry is to understand and formalize the structure of data. And as data is becoming a central currency of modern science, this role is growing in importance. However, much of classical computational geometry inherently assumes that all aspects of data are known and precise. This is rarely the case in practice. This project focuses on building the foundations for two extensions to classic geometric settings pertinent to noisy data.

1. We study locational uncertainty in point sets, where the location of each data point is described by a probability distribution. Given such an input, the goal is to formalize how to construct, approximate, and concisely represent the distribution of geometric queries on this uncertain data.

2. We study the geometric consequences of applying a statistical kernel (e.g. a Gaussian kernel) to a data set. He will investigate how this process can smooth data, remove degeneracies, and implicitly simplify and regularize algorithms. Moreover, he will explore the geometric structure of the resulting kernel density estimate, and how it relates to algorithms for the data and approximate representations of the data.

The PI will lead the development of a data-focused educational program around the themes of data analysis, algorithmics, and visualization. The PI is developing a model course for this program on data mining; it focuses on the geometric, statistical, and algorithmic properties of data. An extensive set of course notes is being compiled, accompanied with videotaped lectures freely available online. This class and program attract many interdisciplinary and diverse students and observers. This program is part of a larger effort to make relevant data analysis techniques from computational geometry available to a broader data-rich audience.


Publications:
  • Approximating the Distribution of the Median and other Robust Estimators on Uncertain Data.
         Kevin Buchin, Jeff M. Phillips and Pingfan Tang. International Symposium on Computational Geometry (SOCG). June 2018.
        arxiv:1601.00630. March 2018.

  • Near-Optimal Coresets of Kernel Density Estimates. (Invited to Special Issue)
         Jeff M. Phillips and Wai Ming Tai. International Symposium on Computational Geometry (SOCG). June 2018.
        arxiv:1802.01751. February 2018.

  • Improved Coresets for Kernel Density Estimates.
         WaiMing Tai and Jeff M. Phillips. 29th Annual ACM-SIAM Symposium on Discrete Algorithms (SoDA). January 2018.
        arxiv:1710.04325. October 2017.

  • Visualization of Big Spatial Data using Coresets for Kernel Density Estimates.
         Yan Zheng, Yi Ou, Alexander Lex, and Jeff M. Phillips. Visual Data Science (VDS). October 2017.
        arxiv:1709.04453. September 2017.
        Project Page (and code).

  • Visualizing Sensor Network Coverage with Location Uncertainty.
         Tim Sodergren, Jessica Hair, Jeff M. Phillips, and Bei Wang. Visual Data Science (VDS). October 2017.
        arxiv:1710.06925. September 2017.

  • Relative Error Embeddings for the Gaussian Kernel Distance.
         Di Chen and Jeff M. Phillips. Algorithmic Learning Theory (ALT). October 2017.
        arxiv:1602.05350. February 2016.

  • Coresets for Kernel Regression.
         Yan Zheng and Jeff M. Phillips. ACM Conference on Knowledge Discovery and Data Mining (KDD). August 2017.
         arxiv:1702.03644. February 2017.
         Project Page

  • The Robustness of Estimator Composition.
         Pingfan Tang and Jeff M. Phillips. Conference on Neural Information Processing (NIPS). December 2016.

  • epsilon-Kernel Coresets for Stochastic Points.
         Lingxiao Huang, Jian Li, Jeff M. Phillips, and Haitao Wang. European Symposium on Algorithms (ESA). August 2016.
         arxiv.org:1411.0194. November 2014.

  • An Integrated Classification Scheme for Mapping Estimates and Errors of Estimation from the American Community Survey.
         Ran Wei, Daoqin Tong, and Jeff M. Phillips. Computers, Environment and Urban Systems (CEUS). April 2016.

  • Coresets and Sketches.
         Jeff M. Phillips. Handbook of Discrete and Computational Geometry. 3rd edition, CRC Press, Chapter 48. 2016.
         arxiv.org:1601.00617. January 2016.

  • Streaming Kernel Principal Component Analysis.
         Mina Ghashami, Daniel Perry, and Jeff M. Phillips. International Conference on Artificial Intelligence and Statistics (AISTATS). May 2016.
         arxiv.org:1512.05059. December 2015.

  • Subsampling in Smooth Range Spaces.
         Jeff M. Phillips and Yan Zheng. Algorithmic Learning Theory (ALT). October 2015.
         short version appeared in Computational Geometry : Young Researchers Forum. June 2015.

  • L_infity Error and Bandwidth Selection for Kernel Density Estimates of Large Data.
         Yan Zheng and Jeff M. Phillips. ACM Conference on Knowledge Discovery and Data Mining (KDD). August 2015.

  • Geometric Inference on Kernel Density Estimates.
         Jeff M. Phillips, Bei Wang, and Yan Zheng. International Symposium on Computational Geometry (SOCG). June 2015.
         full version: arXiv:1307.7760. March 2015.
         early version appeared as Kernel Distance for Geometric Inference. Jeff M. Phillips and Bei Wang. 22nd Fall Workshop on Computational Geometry. October 2012.


    Educational Development:
  • Data Mining through the lens of Geometry and Algorithms.
    Page above includes links to extensive notes (and here) and videos of lectures on YouTube.

  • Foundations of Data Analysis undergraduate introduction to math required for advanced machine learning and data mining.
    I have developed most of a book: Mathematical Foundations of Data Analysis to support this course, free online.

  • Big Data Program with emphasis on algorithm principles and geometric interpretation of data.
    Program Youtube Channel with lectures from many classes.
    Publicity from the U, the Salt Lake Tribune, the Daily Utah Chronicle, KPCW: Cool Science Radio, and the Utah Business Magazine.

  • Data Science Option of Computing MS Degree new curriculum requirements approved for MS degree in Computing. A new Data Science specification, jointly organized with Utah Math depamrtent (which houses Statistics).

  • Utah Data Science Day co-organized. Over 160 registered participants. Job fair with 15 local companies. An industry panel, poster session, and 5 invited talks.


    Personnel:
  • PI: Jeff Phillips
  • GRA: Yan Zheng (PhD 2017, first job @ Visa Research)
  • GRA: Pingfan Tang
  • GRA: Sunipa Dev
  • Ug: Benwei Shi
  • Ug: Jamie Iong (BS Thesis 2015, first job @ EMC)