L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data

Overview

Kernel density estimates are a robust way to reconstruct a continuous distribution from a discrete point set. Their effectiveness is typically measured in either L1 or L2 error. In this paper we investigate the challenges of using L∞ (or worst-case) error, a stronger measure than L1 or L2. We present efficient solutions to two linked challenges: how to evaluate the L∞ error between two kernel density estimates, and how to choose the bandwidth parameter for a kernel density estimate built on a subsample of a large data set.
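To make the two challenges concrete, here is a minimal Python sketch. It is not the algorithm from the paper or the code in the package: it assumes 1-dimensional data and a Gaussian kernel, approximates the L∞ error by taking the largest gap over a fixed evaluation grid, and picks the subsample bandwidth by brute-force search over a small candidate list. The function names (`kde`, `linf_error`, `pick_bandwidth`), the grid, and the candidate bandwidths are all hypothetical choices made for illustration.

```python
import math
import random


def kde(points, bandwidth, x):
    """Gaussian kernel density estimate of `points`, evaluated at x (1-D)."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)


def linf_error(points_a, bw_a, points_b, bw_b, grid):
    """Approximate L-infinity error: the largest absolute gap over `grid`.

    Assumption: a finite grid only lower-bounds the true supremum.
    """
    return max(abs(kde(points_a, bw_a, x) - kde(points_b, bw_b, x)) for x in grid)


def pick_bandwidth(full, bw_full, sample, candidates, grid):
    """Brute-force search: the candidate bandwidth for `sample` whose KDE is
    closest (in grid L-infinity error) to the KDE of `full` with `bw_full`."""
    return min(candidates, key=lambda bw: linf_error(full, bw_full, sample, bw, grid))


# Synthetic data, purely illustrative.
rng = random.Random(0)
full = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sample = rng.sample(full, 100)
grid = [-4.0 + 8.0 * i / 100 for i in range(101)]
candidates = [0.1, 0.2, 0.3, 0.5, 0.8]

best = pick_bandwidth(full, 0.3, sample, candidates, grid)
print("chosen bandwidth:", best)
print("grid L-inf error:", linf_error(full, 0.3, sample, best, grid))
```

The grid evaluation costs time proportional to the number of points times the number of grid locations, and it can miss the true maximum between grid locations. That cost and that gap are what make the exact problem challenging on large data.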

Papers and Talks

1. L∞ Error and Bandwidth Selection for Kernel Density Estimates of Large Data

    Full version: [PDF]

Source Code

Important Notice

If you use this code in your work, please cite our paper. Thanks!

If you find any bugs or have any suggestions/comments, we would be very happy to hear from you!

Code Description

The code package includes the methods for generating the coresets (written in GNU C++), the scripts to run the experiments in the paper, and the real 2-dimensional datasets.

Download

L∞ Error and Bandwidth Selection code [tgz]

Quick Install

The folder names are self-explanatory, and each folder contains a Makefile for easy compilation. All programs have a readme and verbose help output that explain which parameters are expected.

Dataset

We have generated and experimented with the datasets described in the paper. A sample dataset is provided; please refer to the readme for an example of the sample data. For now, our code can handle time series data and spatial data.

Acknowledgement

The research described here has been funded by the NSF under grants CCF-1350888, IIS-1251019, and ACI-1443046. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Contacts

Yan Zheng