StatiX++: Flexible XML Statistics Gathering and XML Query Cardinality Estimation


A key component of XML data management systems is the result size estimator, which estimates the cardinalities of user queries. Estimated cardinalities are needed in a variety of tasks, including query optimization and cost-based storage design, and they can also give users early feedback about the expected outcome of their queries.

StatiX++ is a framework for flexibly gathering XML statistics and estimating XML query cardinalities. StatiX++ uses histograms to uniformly capture both the structural and value skew present in documents. In particular, StatiX++ exploits schema information and schema transformation technology to produce concise, high-quality statistical summaries and accurate query cardinality estimates.
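To make the role of histograms in cardinality estimation concrete, the following is a minimal sketch of a generic equi-width histogram used to estimate how many element values satisfy a range predicate. It is not the StatiX++ implementation: the class name EquiWidthHistogram, its methods, and the sample price values are all hypothetical, and the estimate assumes values are spread uniformly within each bucket.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class EquiWidthHistogram:
    """Hypothetical value histogram for one element type (illustration only)."""
    lo: float
    hi: float
    counts: List[int]

    @classmethod
    def build(cls, values: Sequence[float], num_buckets: int) -> "EquiWidthHistogram":
        lo, hi = min(values), max(values)
        # Fall back to a width of 1.0 when all values are identical.
        width = (hi - lo) / num_buckets or 1.0
        counts = [0] * num_buckets
        for v in values:
            # The maximum value is clamped into the last bucket.
            idx = min(int((v - lo) / width), num_buckets - 1)
            counts[idx] += 1
        return cls(lo, hi, counts)

    def estimate_range(self, a: float, b: float) -> float:
        """Estimate the number of values v with a <= v < b."""
        n = len(self.counts)
        width = (self.hi - self.lo) / n
        if width == 0:
            # Degenerate case: every value equals self.lo.
            return float(sum(self.counts)) if a <= self.lo < b else 0.0
        total = 0.0
        for i, count in enumerate(self.counts):
            bucket_lo = self.lo + i * width
            bucket_hi = bucket_lo + width
            # Fraction of the bucket covered by [a, b), under the
            # uniformity assumption within the bucket.
            overlap = max(0.0, min(b, bucket_hi) - max(a, bucket_lo))
            total += count * overlap / width
        return total


# Hypothetical values of a numeric element, e.g. <price>, in a document.
prices = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
hist = EquiWidthHistogram.build(prices, num_buckets=3)
estimate = hist.estimate_range(25, 55)
```

The estimate is only as good as the uniformity assumption inside each bucket, which is why skewed data generally calls for more buckets or for bucket boundaries placed where the skew is.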

