Anomaly detection is important in a variety of data monitoring settings. Determining ``bumps'', or clusters with unusually high/low data density, is a specific case of anomaly detection that has applications in biosurveillance and environmental monitoring. One of the most popular measures for detecting clusters is the scan statistic, a likelihood ratio test that determines how likely it is that measurements in a specific region are different from the surrounding areas. One such statistic is the Kulldorff scan statistic, a Poisson-based measure that is used widely in biosurveilance settings. In general, the statistic is a discrepancy function} that measures the difference between measurements in a region and baseline data for that region. The goal, given such a function and a class of regions, is to find the region maximizing this discrepancy function.
In this talk, we present algorithms for maximizing statistical discrepancy over the space of axis-parallel rectangles. We give provable approximation guarantees, both additive and relative, and our methods apply to any convex discrepancy function. Our algorithms work by connecting statistical discrepancy to combinatorial discrepancy; roughly speaking, we show that in order to maximize a convex discrepancy function over a class of shapes, one need only maximize a discrepancy function over the same set of shapes. Our results also extend to prospective discrepancy, where the region under consideration extends from some time $t$ to the current time, in addition to its spatial extents. They also carry over to hyper-rectangles in $d$ dimensions. We also present experimental validation of our method, demonstrating its efficacy and accuracy on various data sets and comparint it to known methods in the literature.