Efficient Threshold Monitoring for Distributed Probabilistic Data


Overview

In distributed data management, a primary concern is monitoring the distributed data and generating an alarm when a user-specified constraint is violated. A particularly useful instance is the threshold-based constraint, which gives rise to what is commonly known as the distributed threshold monitoring problem. This work extends this fundamental problem to distributed probabilistic data, which emerge in many applications where uncertainty naturally exists because massive amounts of data are produced at multiple sources in distributed, networked locations. Examples include distributed observing stations, large sensor fields, geographically separate scientific institutes or units, and many more. With probabilistic data, two thresholds are involved: the score threshold and the probability threshold. Both must be monitored simultaneously, so techniques developed for deterministic data are no longer directly applicable. This work presents a comprehensive study of this problem. In an extensive experimental evaluation on several large real datasets, our algorithms significantly outperform the baseline method in both communication cost (number of messages and bytes) and running time.
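To make the two-threshold condition concrete, here is a minimal sketch of a brute-force, centralized check. It is not the library's API and not one of the algorithms from the paper. It assumes that the monitored quantity is the sum of the sites' scores, that the sites are independent, and that each site reports a small discrete distribution over integer scores. The names Pdf, sumDistribution, exceedProbability, shouldAlarm, gamma (score threshold) and delta (probability threshold) are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical representation of one site's uncertain score:
// a discrete distribution given as (value, probability) pairs.
using Pdf = std::vector<std::pair<int, double>>;

// Distribution of the sum of the per-site scores, assuming the sites are
// independent. Built by convolving the site distributions one at a time.
std::map<int, double> sumDistribution(const std::vector<Pdf>& sites) {
    std::map<int, double> dist;
    dist[0] = 1.0;  // the empty sum is 0 with probability 1
    for (const Pdf& site : sites) {
        std::map<int, double> next;
        for (const auto& partial : dist)
            for (const auto& outcome : site)
                next[partial.first + outcome.first] += partial.second * outcome.second;
        dist = std::move(next);
    }
    return dist;
}

// Pr[sum of scores > gamma], where gamma is the score threshold.
double exceedProbability(const std::vector<Pdf>& sites, int gamma) {
    const std::map<int, double> dist = sumDistribution(sites);
    double prob = 0.0;
    for (const auto& entry : dist)
        if (entry.first > gamma) prob += entry.second;
    return prob;
}

// Raise an alarm only when both thresholds are crossed:
// Pr[sum > gamma] > delta, where delta is the probability threshold.
bool shouldAlarm(const std::vector<Pdf>& sites, int gamma, double delta) {
    assert(delta >= 0.0 && delta <= 1.0);
    return exceedProbability(sites, gamma) > delta;
}
```

For example, with two sites that each report a score of 0 or 10 with probability 0.5, the sum exceeds gamma = 15 with probability 0.25, so shouldAlarm returns true for delta = 0.2 and false for delta = 0.3. A check like this needs every site's full distribution in one place, which is exactly the communication cost a distributed monitoring scheme tries to avoid.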

Papers and Talks

1. Efficient Threshold Monitoring for Distributed Probabilistic Data,

    Full version: Talk: Poster:

Source Code

Important Notice

If you use this library for your work, please kindly cite our paper. Thanks!

If you find any bugs or have any suggestions/comments, we would be very happy to hear from you!

Library Description

The library was developed in C++ on Fedora 12. For installation and usage instructions, please refer to the README file in the tarball.

Download

Efficient Threshold Monitoring for Distributed Probabilistic Data Library. [tar.gz]

Dataset

We generated and experimented with the datasets described in the paper. To generate the experimental datasets yourself, please follow the directions outlined in our paper.

Contacts

Mingwang Tang