In distributed data management, a primary concern is monitoring the distributed data and generating an alarm when a user-specified constraint is violated. A particularly useful instance is the threshold-based constraint, commonly known as the distributed threshold monitoring problem. This work extends this fundamental study to distributed probabilistic data, which emerge in many applications where uncertainty naturally exists when massive amounts of data are produced at multiple sources in distributed, networked locations. Examples include distributed observing stations, large sensor fields, geographically separate scientific institutes or units, and many more. When dealing with probabilistic data, two thresholds are involved: the score threshold and the probability threshold. Both must be monitored simultaneously, so techniques developed for deterministic data are no longer directly applicable. This work presents a comprehensive study of this problem. In an extensive experimental evaluation on several large real datasets, our algorithms significantly outperform the baseline method in both communication cost (number of messages and bytes) and running time.
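To make the two thresholds concrete, here is a minimal C++ sketch of the condition being monitored. It is not the library's API or any of the paper's algorithms. It assumes, purely for illustration, that each site holds an independent discrete random score, that the monitored aggregate is the sum of those scores, and that the alarm fires when the probability of the sum exceeding the score threshold is itself above the probability threshold. The names `Distribution`, `exceedProbability`, `shouldRaiseAlarm`, `gamma`, and `delta` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <utility>
#include <vector>

// Hypothetical model: one site's uncertain score as (value, probability) pairs.
using Distribution = std::vector<std::pair<int, double>>;

// Distribution of the sum of an accumulated variable and one more
// independent discrete variable (a plain convolution).
std::map<int, double> convolve(const std::map<int, double>& acc,
                               const Distribution& site) {
    std::map<int, double> out;
    for (const auto& a : acc) {
        for (const auto& s : site) {
            out[a.first + s.first] += a.second * s.second;
        }
    }
    return out;
}

// Pr[X_1 + ... + X_n > gamma], assuming the sites are independent.
double exceedProbability(const std::vector<Distribution>& sites, int gamma) {
    std::map<int, double> acc;
    acc[0] = 1.0;  // the empty sum is 0 with probability 1
    for (const auto& site : sites) {
        acc = convolve(acc, site);
    }
    double prob = 0.0;
    for (const auto& entry : acc) {
        if (entry.first > gamma) {
            prob += entry.second;
        }
    }
    return prob;
}

// Alarm condition with both thresholds: score threshold gamma and
// probability threshold delta (assumed to lie in [0, 1]).
bool shouldRaiseAlarm(const std::vector<Distribution>& sites,
                      int gamma, double delta) {
    assert(delta >= 0.0 && delta <= 1.0);
    return exceedProbability(sites, gamma) > delta;
}
```

This brute-force check needs every site's full distribution in one place, so it says nothing about communication cost. It only illustrates why a deterministic threshold test does not carry over directly: the alarm depends on a probability over all sites' distributions rather than on a single observed sum.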
1. Efficient Threshold Monitoring for Distributed Probabilistic Data,
Full version: Talk: Poster:
If you use this library for your work, please kindly cite our paper. Thanks!
If you find any bugs or have any suggestions/comments, we would be very happy to hear from you!
The library was developed in C++ on Fedora 12. For installation and usage, please refer to the README file in the tarball.
Download the Efficient Threshold Monitoring for Distributed Probabilistic Data library. [tar.gz]
We have generated and experimented with the datasets described in the paper. To generate the experimental datasets, please follow the directions outlined in our paper.