Finding Frequent Items in Probabilistic Data

Library Description

The library is written in GNU C++ and comes with a data generator written in Matlab. To compile, go to each folder and type "make". We also provide an efficient C++ implementation of the Space-Saving algorithm (Metwally et al., "An Integrated Efficient Solution for Computing Frequent and Top-k Elements in Data Streams", ACM TODS, 2006).


Quick Install

The subfolder names are self-explanatory. Each subfolder contains a Makefile for easy compilation. Every main test program prints verbose help output explaining the parameters it expects.


We generated, and ran experiments on, the data sets described in the paper. The released source code also includes the generator for the synthetic data sets. For the real data sets, please follow the description in our paper.


Feifei Li