Optimal Splitters for Temporal and Multi-version Databases

Wangchao Le, Feifei Li, Yufei Tao and Robert Christensen

[Overview] [Papers and Talks] [Source Code] [Contacts]

Overview

Temporal and multi-version databases are ideal candidates for a distributed store, which offers large storage space, and parallel and distributed processing power from a cluster of (commodity) machines. A key challenge is to achieve a good load balancing algorithm for storage and processing of these data, which is done by partitioning the database. We introduce the concept of optimal splitters for temporal and multi-version databases, which induce a partition of the input data set, and guarantee that the size of the maximum bucket be minimized among all possible configurations, given a budget for the desired number of buckets. We design efficient methods for memory- and disk-resident data respectively, and show that they significantly outperform competing baseline methods both theoretically and empirically on large real data sets.

Paper and Talk

Optimal Splitters for Temporal and Multi-version Databases,

Final version: Talk:

Source Code & Data

Optimal Splitter [.tgz]

Readme

For quick install, please refer to the instructions inside the code folder. The tarball also comes with two sample data sets, one of which contains 500 million intervals we excerpted from the MesoWest Project (11 GB).

If you find our work useful, please kindly cite our paper!

Contacts

If you have any question regarding the code, please send an email to Wangchao Le or Robert Christensen

If you are interested in obtaining larger interval datasets excerpted from the MesoWest Project and the MEME Tracker Project, please contact Professor Feifei Li.

Last Updated: Jul 2013