Ranking Large Temporal Data

Overview

Ranking is an important operator in database systems (it has been promoted as a first-class citizen), yet ranking temporal data was not studied until recently. Prior work considered only instant top-k queries on temporal data, which retrieve the objects with the k highest scores at a query time instance t. The instant top-k definition has clear limitations: it is sensitive to outliers, and a meaningful query time t is difficult to choose. A more flexible and general operation is to rank objects by the aggregation of their scores over a query interval, which we dub the aggregate top-k query on temporal data. For example: return the top-10 weather stations with the highest average temperature from 10/01/2010 to 10/07/2010, or find the top-20 stocks with the largest total transaction volumes from 02/05/2011 to 02/07/2011. This work presents a comprehensive study of this problem, designing both exact and approximate methods (with approximation quality guarantees). We also provide a theoretical analysis of the construction cost, index size, update cost, and query cost of each approach. Extensive experiments on large real datasets demonstrate the efficiency, effectiveness, and scalability of our methods compared to the baseline methods.
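To make the query definition concrete, the following is a minimal brute-force sketch of an aggregate top-k query with sum aggregation. It is not the indexed exact or approximate methods from the paper, and it does not use the library's API. It assumes, for illustration only, that each object is a piecewise-linear score curve given as time-sorted (time, score) samples; the names Curve, aggregate_sum, and aggregate_topk are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical representation (an assumption, not the library's format):
// one object is a score curve given as (time, score) samples sorted by time,
// with the score linearly interpolated between consecutive samples.
struct Curve {
    int id;
    std::vector<std::pair<double, double> > pts;
};

typedef std::pair<double, int> Scored;  // (aggregate score, object id)

static bool higher_score(const Scored& a, const Scored& b) {
    return a.first > b.first;
}

// Sum aggregation: integral of the curve's score over [t1, t2].
// Parts of the interval outside the curve's lifetime contribute nothing.
double aggregate_sum(const Curve& c, double t1, double t2) {
    double total = 0.0;
    for (std::size_t i = 0; i + 1 < c.pts.size(); ++i) {
        const double xa = c.pts[i].first,     ya = c.pts[i].second;
        const double xb = c.pts[i + 1].first, yb = c.pts[i + 1].second;
        const double lo = std::max(xa, t1);
        const double hi = std::min(xb, t2);
        if (hi <= lo) continue;  // no overlap (also skips zero-length segments)
        const double slope = (yb - ya) / (xb - xa);
        const double ylo = ya + slope * (lo - xa);
        const double yhi = ya + slope * (hi - xa);
        total += 0.5 * (ylo + yhi) * (hi - lo);  // trapezoid area
    }
    return total;
}

// Brute-force aggregate top-k: score every object, then keep the k largest.
// Cost is linear in the total number of segments, which is the kind of
// full scan an index is meant to avoid.
std::vector<Scored> aggregate_topk(const std::vector<Curve>& objs,
                                   double t1, double t2, std::size_t k) {
    std::vector<Scored> scored;
    scored.reserve(objs.size());
    for (std::size_t i = 0; i < objs.size(); ++i) {
        scored.push_back(Scored(aggregate_sum(objs[i], t1, t2), objs[i].id));
    }
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + k, scored.end(),
                      higher_score);
    scored.resize(k);
    return scored;
}
```

Calling aggregate_topk(objs, t1, t2, 10) returns up to ten (score, id) pairs in descending score order. An average-based query such as the weather-station example yields the same ranking when every object covers the whole query interval, since the average is then the sum divided by the same interval length for all objects.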

Papers and Talks

1. Ranking Large Temporal Data,

    Full version: Talk:

Source Code

Important Notice

If you use this library for your work, please kindly cite our paper. Thanks!

If you find any bugs or have any suggestions/comments, we would be very happy to hear from you!

Library Description

The library was developed in C++ on Ubuntu 12.04. For installation and usage instructions, please refer to the README file in the tarball.

Download

Ranking Large Temporal Data. [tar.tgz]

Dataset

We generated and experimented with the datasets described in the paper. Due to copyright issues, we cannot release the original data set. Instead, we provide a sample temporal data set (tYWm10k.txt). The sample data set is in binary format and contains 10000 curve objects, all of which have the same dead time point value, 853916400. It is available [here].

Contacts

Jeffrey Jestes     

Mingwang Tang