Semantics of Ranking Queries for Probabilistic Data

Supported in part by the IIS program from NSF, award #0916448 (while at FSU), #1212310 (after transering to Utah), NSF link 1, NSF link 2.

[Overview] [Papers and Talks] [Source Code] [Dataset] [Contacts] 

Overview

Recently, there have been several attempts to propose definitions and algorithms for ranking queries on probabilistic data. However, these lack many intuitive properties of a top-k over deterministic data. We define several fundamental properties, including exact-k, containment, unique-rank, value-invariance, and stability, which are satisfied by ranking queries on certain data. We argue these properties should also be carefully studied in defining ranking queries in probabilistic data, and fulfilled by definition for ranking uncertain data for most applications. We propose an intuitive new ranking definition based on the observation that the ranks of a tuple across all possible worlds represent a well-founded rank distribution. We studied the ranking definitions based on the expectation, the median and other statistics of this rank distribution for a tuple and derived the expected rank, median rank and quantile rank correspondingly. We are able to prove that the expected rank, median rank and quantile rank satisfy all these properties for a ranking query. We provide efficient solutions to compute such rankings across the major models of uncertain data, such as attribute-level and tuple-level uncertainty. Finally, a comprehensive experimental study confirms the effectiveness of our approach.

Papers and Talks

1. Semantics of Ranking Queries for Probabilistic Data

    Full version:

Source Code

Important Notice

If you use this library for your work, please kindly cite our paper. Thanks!

If you find any bugs or have any suggestions/comments, we would be very happy to hear from you!

Library Description

The library is developed in GNU C++. To compile the source code, simply go to each folder and type Make. The Matlab data generators are also included.

Download

ExpRank and QuantRank Library [tgz]

Quick Install

The folder names are self-explanatory and contain a Makefile for easy-compilation. All programs have verbose help output to explain what parameters are expected.

Dataset

We have generated and experimented with the datasets described in the paper. To generate experimental data sets please follow the directions outlined in our paper.

Contacts

Jeffrey Jestes