Approximate Aggregation Techniques for Sensor Databases

Jeffrey Considine, Feifei Li, George Kollios and John Byers

[Abstract] [Papers and Talks] [Source Code]

Abstract

In the emerging area of sensor-based systems, a significant challenge is to develop scalable, fault-tolerant methods to extract useful information from the data the sensors collect. An approach to this data management problem is the use of sensor database systems, exemplified by TinyDB and Cougar, which allow users to perform aggregation queries such as MIN, COUNT and AVG on a sensor network. Due to power and range constraints, centralized approaches are generally impractical, so most systems use in-network aggregation to reduce network traffic. Also, aggregation strategies must provide fault-tolerance to address the issues of packet loss and node failures inherent in such a system. An unfortunate consequence of standard methods is that they typically introduce duplicate values, which must be accounted for to compute aggregates correctly. Another consequence of loss in the network is that exact aggregation is not possible in general. With this in mind, we investigate the use of approximate in-network aggregation using small sketches. Our contributions are as follows: 1) we generalize well known duplicate-insensitive sketches for approximating COUNT to handle SUM (and by extension, AVG and other aggregates), 2) we present and analyze methods for using sketches to produce accurate results with low communication and computation overhead (even on low-powered CPUs with little storage and no floating point operations), and 3) we present an extensive experimental validation of our methods. 

Papers and Talks

1. Approximate Aggregation Techniques for Sensor Databases In Proceedings of 20th IEEE International Conference on Data Engineering (ICDE2004), Boston, MA, March 30 - April 2, 2004. (Best Paper Award)      

Conference version:   Talk:  (prepared by John Byers)

2. Journal Version, ACM TODS Vol. 34, No. 1, pages 1-35.

Source Code 

This code is written in Java and it is based on the original TAG simulator contributed by Samuel Madden. In addition to the FM sketch used in the conference version of the paper, it also includes an implementation of loglog sketch. The code is developed as a project in JBuilder. So it is very easy to open the project under JBuilder environment (the project file is included). However, You could choose any IDE you like to make further modifications. For any question regarding to the implementation, please contact me.

Simulator of sketch aggregation techniques for sensor databases [zip], by Feifei Li.

Last modified 03/31/06