Suresh Venkatasubramanian :: papers :: datadiff
Home Research Papers Talks CV Links About Me The Geomblog

An Information-Theoretic Approach to Detecting Changes in Multi-dimensional Data Streams

Tamraparni Dasu, Shankar Krishnan, Suresh Venkatasubramanian, Kevin Yi

Abstract:

An important problem in processing large data streams is detecting changes in the underlying distribution that generates the data. The challenge in designing change detection schemes is making them general, scalable, and statistically sound. In this paper, we take a general, information-theoretic approach to the change detection problem, which works for multidimensional as well as categorical data. We use relative entropy, also called the Kullback-Leibler distance, to measure the difference between two given distributions. This is then tied to a statistical inference procedure based on the theory of bootstrapping, which allows us to determine whether our measurements are statistically significant.

In addition to providing change detections, our method generalizes Kulldorff's spatial scan statistic, allowing us to quantitatively identify specific regions in space where large changes have occurred. Our scheme is nonparametric and requires no assumptions on the underlying distributions. It can also be implemented efficiently using a quadtree-like structure that scales well with the size of data. In an experimental study we demonstrate the generality of our approach with different kinds of data.