# Algorithms Seminar/Fall10

### From ResearchWiki

**Modelling Data With Uncertainty**

**Wed 1:25-2:45pm**

**MEB 3147 (LCR)**

## Contents |

## Synopsis

We will cover many recent developments in the modeling and processing of uncertain data in computer science. The seminar will focus and the modeling, algorithmic, and data-structural foundations needed for understanding, processing, and visualizing uncertain data. First we will overview different models of uncertainty, tracing their origins and motivation (Many-world models in databases, imprecision models in geometry, probabilistic models in databases). Then we will proceed to survey recent developments in computational geometry, databases, visualization, and machine learning and statistics.

- The computational geometry section will describe basic algorithmic tools and analysis several models of uncertain data.
- The databases section will focus on data structures to manage and process large amounts of uncertain data concisely.
- The visualization section will look at models for data uncertainty for surfaces and volumes, and how the resulting data is visualized.
- The machine learning and statistics section will cover classic techniques for learning and detecting uncertainty.

Through this seminar, participants will gain an understanding of the state-of-the-art in modeling and processing uncertain data and will be exposed to several important open problems and exciting research directions.

## Participants

- Jeff Phillips, CI Postdoctoral Fellow, School of Computing
- Suresh Venkatasubramanian, Assistant Professor, School of Computing
- Parasaran Raman, PhD Student, School of Computing
- John Moeller, PhD Student, School of Computing
- Avishek Saha, PhD Student, School of Computing
- Seth Juarez, PhD Student, School of Computing
- Kristi Potter, Post-Doc, SCI
- Josh Levine, Post-Doc, SCI
- Piyush Rai, PhD Student, School of Computing
- Harsh Bhatia, PhD Student, School of Computing
- Bo Wang, PhD Student, School of Computing

## Schedule

(subject to change)

Date | Topic | Outline and Paper(s) | Presenter |
---|---|---|---|

Aug 25 | Models of Data Uncertainty | Motivation behind modeling Uncertainty. Survey different models and where they arise. | Jeff Phillips |

Geometry
| |||

Sep 1 | Range Spaces and epsilon-Samples | How well does random sampling work? Define important geometric-combinatorial notions: Range spaces, eps-nets, eps-samples. Prove random sampling bound for eps-samples (Sariel's Notes read 5.1 and 5.3.1 and 5.3.2 and the "naive proof"). Better deterministic and from continuous to discrete (also a different way of defining concepts). | John Moeller |

Sep 8 | Epsilon-Quantizations and Epsilon-SIPs | Modeling questions on uncertain data with probability distributions. Review specific applications to smallest enclosing ball. (Shape Fitting on Point Sets with Probability Distributions, Jeff's Slides) | Seth Juarez |

Sep 15 | Geometry on Imprecise Points | Imprecision in data. eps-geometry for rounding errors (eps-Geometry). More general geometry problems: Basic Geometry Measures on Imprecise Points and maybe Largest and Smallest Convex Hulls on Imprecise Points | Parasaran |

Databases
| |||

Sep 22 | Sketching and Streaming | How to maintain a concise summary of uncertain data on-the-fly. Sketching probabilistic data streams | John Moeller |

Sep 29 | Beyond Random Sampling | Can we do better than random sampling? Theoretically (high level overview): deterministic eps-samples and for continuous domains. Practically (using insights from theory): low discrepancy sets i.e. eps-Sentinels and EASE. If time histogram results on uncertain probability distributions. Histograms and wavelets on probabilistic data Probabilistic Histograms for Probabilistic Data | Avishek |

Oct 6 | Ranking | Retrieving the top (most important) data points, when all values are uncertain. Focus on Expected Ranks. Note there are more general frameworks A Unified Approach to Ranking in Probabilistic Databases with slides. | Shantharaju |

Clustering
| |||

Oct 20 | Extracting Clusters | Extracting clusters when some data points are the noise. Clustering Noisy Data with Persistence and current work. | Josh Levine |

Nov 22 (10:45 am) | Finding Clusters in Uncertain Data | Building clusters on full (but uncertain) data sets. Approximation Algorithms for Clustering Uncertain Data (postponed from Oct 27) | Parasaran Raman |

Visualization (see page on Uncertainty Visualization)
| |||

Nov 3 | Uncertainty Visualization | State of the Art | Kristi Potter |

Nov 10 | Uncertain Vector Fields | Visualizing Uncertainty in Vector Fields. I will be discussing (a) Uncertain 2D Vector Field Topology, (b) my current project. | Harsh Bhatia |

Machine Learning And Statistics
| |||

Nov 17 | Support Vector Machines | Classifying uncertain data. Support Vector Classification with Input Data Uncertainty, Robustness and Regularization of Support Vector Machines | Piyush Rai |

Nov 24 | Multi-Armed Bandits | Trading off investigation of uncertainty and rewards. Multi-armed Bandits via the Gittens Index Strategy. The BalancedIndex for Restless Multi-armed Bandits. (Another older reference survey MAB-survey.) | Avishek |

Dec 1 | Particle Filters | Maintaining uncertain estimates. Particle Filtering Introduction, Particle Filters (another short note) Particle Filter Animation and helpful slides | Piyush Rai |

Dec 8 | Markov Random Fields | Harnessing Locality. MRF tutorial. An application to image tracking Optical Flow Estimation with Uncertainties through Dynamic MRFs | Bo Wang |