Top-k Queries on Temporal Data

Supported in part by the IIS program from NSF, award #0916448 (while at FSU), #1212310 (after transering to Utah), NSF link 1, NSF link 2.

Feifei Li, Ke Yi and Wangchao Le

[Overview] [Papers and Talks] [Source Code] [Romanian Translation Site] [Contacts] 

Overview

Temporal data arises in a large number of domains, such as time series data, transactional databases, spatio-temporal trajectories and many others. The database community have devoted extensive amount of efforts to indexing and querying temporal data in the past decades. However, insufficient amount of attention has been paid to temporal ranking queries, arguably one of the most important query types. More precisely, given any time instance t, the query asks for the top-k objects at time t with respect to some score attribute. Some generic indexing structures based on R-trees do support ranking queries on temporal data, but as they are not tailored for such queries, the performance is far from satisfactory. We feel that the importance of ranking queries justifies the need of a specifically designed index. To this end, we present the Seb-tree, a simple indexing scheme that supports temporal ranking queries much more efficiently than the R-tree based generic methods. In fact, the Seb-tree answers a top-k query for any time instance t in the optimal number of I/Os in expectation, namely, O(log_B (N/B + k/B) ) I/Os, where N is the size of the data set and B is the disk block size. The index has near-linear size (for constant and reasonable kmax values, where kmax is the maximum value for the possible values of the query parameter k), can be constructed in near-linear time (for constant and reasonable kmax values), and also supports insertions and deletions without affecting its optimal (in expectation) query performance guarantee. Most of all, the Seb-tree is especially appealing in practice due to its simplicity as it uses the B-tree as the only building block. Extensive experiments on many large data sets, show that the Seb-tree is more than an order of magnitude faster than the R-tree based indexes for temporal ranking queries.

Papers and Talks

1. Top-k Queries on Temporal Data,

    Full version:   Talk:  

Source Code

Important Notice

If you use this library for your work, please kindly cite our paper. Thanks!

If you find any bugs or any suggestions/comments, we are very happy to hear from you!

Library Description

The library is developed in GNU C++.  Names of the source files are self-explained. The implementation of Seb-tree is in the folder "rtcodepub". The code for Seb-tree has two major components: one ("buildSEB") for building the Seb-tree and the other ("query") for querying the Seb-tree. Besides, we also provide the code for data and query generators in the same folder.

Our Seb-tree library is built on two other libraries, namely, the CGAL library and the customized TPIE library. We provide below these two libraries used in our implementation. The customized TPIE library is inside the tarball of the Seb-tree library.

Download

Seb-tree Library [tar.gz]

CGAL Library [tar.gz]

[please send email to Feifei for this library.]

Quick Install

1. Install CGAL library. Extract the CGAL-3.3.1 folder from the CGAL tarball and use the the command "install_cgal" inside the CGAL-3.3.1 folder. Follow the installation guide and add CGAL_MAKEFILE to your environment variable.

2. Install TPIE library. The TPIE library is inside rtcodepub folder. TPIE needs a scratch folder for computation. Create a folder, e.g. $HOME/tmp and add "AMI_SINGLE_DEVICE=$HOME/tmp" to your environment variable. Change directory to the TPIE folder. Type "./configure" and "make lib" to install TPIE. Do NOT change the Makefile inside the test folder.

3. Compile and use the Seb-tree library. In the rtcodepub folder, use make to generate all the executables. You may need to generate a synthetic dataset by calling the "datagen.exe" and some synthetic queries by calling the "qgen.exe". To build a Seb-tree, use the executable "buildSEB.exe". To query a Seb-tree, use the executable "query.exe". If you are using the synthetic data or are formating your own data in the same way as our synthetic data, put "raw format" in the "buildSEB.exe" argument list. To indicate the page size, put "4" for 4096 bytes. For all the other arguments, please read our paper for details.

Romanian Version of the Project

Thanks for the wonderful effort of Mr. Maxim Petrenko. This project is also available in Romania, check out his website here!

Contacts

Feifei Li     

Ke Yi     

Wangchao Le