StatiX++: Flexible XML Statistics Gathering and XML Query Cardinality Estimation
Introduction
A key component of XML data management systems is the result size estimator,
which estimates the cardinalities of user queries. Estimated cardinalities are
needed in a variety of tasks, including query optimization and cost-based
storage design, and they can also be used to give users early feedback about
the expected outcome of their queries.
StatiX++ is a flexible framework for gathering XML statistics and estimating
XML query cardinalities. StatiX++ uses schema information together with
histograms to uniformly capture both the structural and value skew present in
documents. In particular, it applies schema transformations to produce concise,
high-quality statistical summaries and accurate query cardinality estimates.
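To illustrate the general idea of estimating a cardinality from a histogram, here is a minimal sketch in Python. It is not the StatiX++ implementation: the class name EquiWidthHistogram, its methods, and the sample price values are all hypothetical, and the sketch assumes a simple equi-width histogram over numeric values with a uniform distribution inside each bucket.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class EquiWidthHistogram:
    """Hypothetical equi-width histogram over numeric element values."""
    lo: float          # lower bound of the value domain (inclusive)
    hi: float          # upper bound of the value domain
    counts: List[int]  # number of values that fell into each bucket

    @classmethod
    def build(cls, values: Iterable[float], lo: float, hi: float,
              buckets: int) -> "EquiWidthHistogram":
        # Assumes every value lies in [lo, hi]; a value equal to hi is
        # placed in the last bucket.
        counts = [0] * buckets
        width = (hi - lo) / buckets
        for v in values:
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
        return cls(lo, hi, counts)

    def estimate_range(self, a: float, b: float) -> float:
        """Estimate how many values fall in [a, b).

        Each bucket contributes its count scaled by the fraction of the
        bucket that overlaps [a, b) (uniformity assumption).
        """
        width = (self.hi - self.lo) / len(self.counts)
        total = 0.0
        for i, count in enumerate(self.counts):
            bucket_lo = self.lo + i * width
            bucket_hi = bucket_lo + width
            overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
            total += count * overlap / width
        return total


# Made-up values, e.g. the prices of all elements of one schema type.
prices = [1, 2, 3, 4, 50, 60, 70, 80, 90, 95]
hist = EquiWidthHistogram.build(prices, lo=0, hi=100, buckets=4)
estimate = hist.estimate_range(50, 100)  # estimated result size of 50 <= price < 100
```

In this sketch, skew in the data shows up as uneven bucket counts, and a range predicate is answered from the summary alone without touching the document. Keeping one such histogram per schema type is one way a summary could reflect both structure and values, but that pairing is an assumption of this example.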
Important Features
- Support for recursive schemata
- Support for ambiguous schemata that result from schema transformations
- Techniques to further compress the statistical summaries
- Efficient statistics gathering: ability to handle very large XML documents
Publications
People
This project is currently funded by the National Science Foundation grant
number IIS-0534628.
Juliana Freire
Last modified: Tue Feb 7 13:41:22 PST 2006