Finding Frequent Items in Probabilistic Data

Supported in part by the NSF IIS program, awards #0916448 (while at FSU) and #1212310 (after transferring to Utah).


Overview

Papers and Talks

1. Finding Frequent Items in Probabilistic Data,

    Full version:   Talk:  

Source Code

Important Notice

If you use this library in your work, please cite our paper. Thanks!

If you find any bugs or have suggestions or comments, we would be very happy to hear from you!

Library Description

The library is written in GNU C++. It also includes a data generator written in Matlab. To compile, go to each folder and type make. We also provide an efficient C++ implementation of the space-saving algorithm (An Integrated Efficient Solution for Computing Frequent and Top-k Elements in Data Streams, by Metwally et al., ACM TODS, 2006).

Download

PHitter Library [tar.gz]

Quick Install

The subfolder names are self-explanatory. Each subfolder contains a Makefile for easy compilation. Each main test program prints verbose help output explaining the parameters it expects.

Dataset

We generated and experimented with the datasets described in the paper. The source code released above also contains the generator for the synthetic datasets. For the real datasets, please follow the description in our paper.

Contacts

Feifei Li