Characterizing and Exploiting Reference Locality in Data Stream Applications

Overview

In this paper we discuss a new approach to processing queries in data stream applications that takes into account the reference locality observed in many real-life data streams. We identify two distinct causes of reference locality: popularity over long time scales and temporal correlations over shorter time scales. We present a mathematical model that precisely quantifies the degree to which each of these sources contributes to locality. We then analyze the performance gains that locality awareness can achieve over traditional algorithms in applications such as MAX-subset join and approximate count estimation. Finally, we experimentally compare several existing algorithms against our locality-aware algorithms on a number of real datasets. The results validate the usefulness of our approach.
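The two sources of locality described above can be illustrated with a small simulation. The sketch below is a hypothetical generator, not the model from the paper. With probability `repeat_prob` it re-emits an item drawn uniformly from the last `window` items (short-term temporal correlation), and otherwise it draws a fresh item from a Zipf-like popularity distribution (long-term popularity). It then applies one common heuristic for separating the two effects: randomly shuffling a stream keeps every item's frequency (popularity) but destroys its ordering (temporal correlation). The gap between the original and shuffled window hit rates therefore gives a rough indication of how much locality comes from temporal correlation. All function names and parameters here are assumptions made for illustration.

```python
from __future__ import annotations

import random
from collections import Counter, deque


def generate_stream(length: int, n_items: int, skew: float,
                    repeat_prob: float, window: int, seed: int = 0) -> list[int]:
    """Hypothetical two-source generator (an illustrative assumption).

    With probability `repeat_prob`, re-emit an item chosen uniformly from the
    last `window` emitted items (short-term temporal correlation); otherwise
    draw a fresh item i with weight 1 / (i + 1) ** skew (long-term popularity).
    """
    rng = random.Random(seed)
    items = list(range(n_items))
    weights = [1.0 / (i + 1) ** skew for i in items]
    recent: deque[int] = deque(maxlen=window)
    stream: list[int] = []
    for _ in range(length):
        if recent and rng.random() < repeat_prob:
            x = recent[rng.randrange(len(recent))]
        else:
            x = rng.choices(items, weights=weights, k=1)[0]
        stream.append(x)
        recent.append(x)
    return stream


def window_hit_rate(stream: list[int], window: int) -> float:
    """Fraction of references whose item also occurs among the previous `window` references."""
    recent: deque[int] = deque()
    counts: Counter[int] = Counter()  # multiplicity of each item inside `recent`
    hits = 0
    for x in stream:
        if counts[x] > 0:
            hits += 1
        recent.append(x)
        counts[x] += 1
        if len(recent) > window:
            counts[recent.popleft()] -= 1
    return hits / len(stream) if stream else 0.0


def locality_split(stream: list[int], window: int, seed: int = 0) -> tuple[float, float]:
    """Return (hit rate of the stream, hit rate of a random shuffle of it).

    Shuffling preserves each item's frequency but removes temporal order, so the
    second value reflects popularity alone (a heuristic, not the paper's model).
    """
    shuffled = list(stream)
    random.Random(seed).shuffle(shuffled)
    return window_hit_rate(stream, window), window_hit_rate(shuffled, window)


if __name__ == "__main__":
    s = generate_stream(length=20_000, n_items=1_000, skew=1.0,
                        repeat_prob=0.3, window=50, seed=1)
    observed, popularity_only = locality_split(s, window=50)
    print(f"observed: {observed:.3f}  shuffled: {popularity_only:.3f}")
```

Raising `repeat_prob` should widen the gap between the two hit rates, while raising `skew` should raise both, since a more skewed popularity distribution produces locality even without temporal correlation.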

Publications

Feifei Li, Ching Chang, George Kollios, and Azer Bestavros. Characterizing and Exploiting Reference Locality in Data Stream Applications.
Available formats: PDF, PS, PPT.
 
Download
 
Stock Trading traces
U.S. Time and Sales Stock Data used in the experiments.
Data Stream Generator
Synthetic Stream Generator used in the experiments.
Weather traces
Pacific Northwest weather data used in the experiments.
Network data
A fraction of the Origin-Destination Flow data used in the experiments.
Stock Data Extractor
A program that collects stock trading traces.
Model implementation in Matlab
A program that computes how much of the reference locality in a dataset is due to each source: long-term popularity and short-term temporal correlation.
Stream Simulator
A simulation program that processes join operations over streams.
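A toy version of the kind of experiment a stream join simulator runs might look like the following. It is a minimal sketch under stated assumptions, not the paper's simulator or its MAX-subset join algorithm: two streams of join keys are interleaved, each side may keep at most `capacity` tuples, and when memory is full a hypothetical eviction policy picks a victim. The `"fifo"` policy keeps only the most recent tuples (betting on recency), while `"popularity"` evicts the tuple whose key has been seen least often in the partner stream (betting on long-term popularity). The function name, its parameters, and both policies are illustrative.

```python
from __future__ import annotations

import random
from collections import Counter


def simulate_join(r: list[int], s: list[int], capacity: int, policy: str) -> int:
    """Count equi-join results between two interleaved key streams when each
    side may keep at most `capacity` tuples in memory.

    Hypothetical eviction policies (illustrative, not the paper's algorithms):
      "fifo":       evict the oldest stored tuple.
      "popularity": evict the tuple whose key has appeared least often in the
                    partner stream so far, breaking ties by age.
    """
    if policy not in ("fifo", "popularity"):
        raise ValueError(f"unknown policy: {policy}")
    buffers: dict[str, list[tuple[int, int]]] = {"r": [], "s": []}  # (key, arrival time)
    seen = {"r": Counter(), "s": Counter()}  # keys observed so far, per stream
    results = 0
    # Interleave the streams as r[0], s[0], r[1], s[1], ...
    for t in range(max(len(r), len(s))):
        for side, other, stream in (("r", "s", r), ("s", "r", s)):
            if t >= len(stream):
                continue
            key = stream[t]
            seen[side][key] += 1
            # Probe: each stored partner tuple with the same key yields one result.
            results += sum(1 for k, _ in buffers[other] if k == key)
            buf = buffers[side]
            buf.append((key, t))
            if len(buf) > capacity:
                if policy == "fifo":
                    victim = min(range(len(buf)), key=lambda i: buf[i][1])
                else:
                    victim = min(range(len(buf)),
                                 key=lambda i: (seen[other][buf[i][0]], buf[i][1]))
                buf.pop(victim)
    return results


if __name__ == "__main__":
    rng = random.Random(7)
    keys = list(range(200))
    weights = [1.0 / (k + 1) for k in keys]  # assumed Zipf-like key popularity
    r = rng.choices(keys, weights=weights, k=5_000)
    s = rng.choices(keys, weights=weights, k=5_000)
    # Every pair of equal keys joins exactly once, so the exact size is a sum of products.
    cr, cs = Counter(r), Counter(s)
    exact = sum(cr[k] * cs[k] for k in cr)
    for policy in ("fifo", "popularity"):
        got = simulate_join(r, s, capacity=100, policy=policy)
        print(f"{policy}: {got} of {exact} results ({got / exact:.1%})")
```

With unlimited capacity the count equals the exact join size. With limited memory it can only be smaller, and the share of results retained depends on how well the eviction policy matches the locality present in the streams.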

Contact
 
Feifei Li