http://www.cs.utah.edu/~suresh
suresh at cs utah edu
Ph: 801 581 8233
Room 3404, School of Computing
50 S. Central Campus Drive,
Salt Lake City, UT 84112.
Information Theory For Data Management (Tutorial)
Monday, September 7th, 2009, 10:59 pm
Filed under: Papers

Divesh Srivastava and Suresh Venkatasubramanian
35th International Conference on Very Large Data Bases (VLDB)

We are awash in data. The explosion in computing power and computing infrastructure allows us to generate vast amounts of data, in differing formats, at different scales, and in interrelated domains. Data management is fundamentally about harnessing this data to extract information, discovering good representations of that information, and analyzing information sources to glean structure. It generally presents us with cost-benefit tradeoffs. If we store more information, we get better answers to queries, but we pay the price in increased storage. Conversely, storing less information improves performance at the cost of less accurate query results. The ability to quantify information gain or loss can only improve our ability to design good representations, storage mechanisms, and analysis tools for data.

Information theory provides us with the tools to quantify information in this manner. It was originally developed as a theory of data communication over noisy channels, but it has more recently been used as an abstract, domain-independent technique for representing and analyzing data. For example, entropy measures the uncertainty (or degree of disorder) in data, and mutual information quantifies how much knowing one variable tells us about another, capturing noisy relationships among data. In general, viewing information theory as a tool to express and quantify information content and information transfer has proven very successful as a way of extracting structure from data.
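As a rough illustration of the two quantities named above (our own sketch, not code from the tutorial), the following Python computes plug-in (empirical) estimates from observed values. It assumes discrete, hashable values, and the function names `entropy` and `mutual_information` are hypothetical. Mutual information is computed through the identity I(X;Y) = H(X) + H(Y) - H(X,Y).

```python
from collections import Counter
from math import log2


def entropy(values):
    """Empirical (plug-in) Shannon entropy H(X), in bits, of a sequence of
    discrete, hashable values. Hypothetical helper for illustration."""
    n = len(values)
    counts = Counter(values)
    # H(X) = sum_x p(x) * log2(1 / p(x)), with p(x) = count(x) / n
    return sum((c / n) * log2(n / c) for c in counts.values())


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, between two aligned
    columns, via I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    joint = list(zip(xs, ys))  # each row's (x, y) pair
    return entropy(xs) + entropy(ys) - entropy(joint)


if __name__ == "__main__":
    print(mutual_information(["a", "a", "b", "b"], [0, 0, 1, 1]))
    print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

Here `mutual_information(["a", "a", "b", "b"], [0, 0, 1, 1])` returns 1.0, because each column fully determines the other, while `mutual_information([0, 0, 1, 1], [0, 1, 0, 1])` returns 0.0, because the columns are independent in this sample. Plug-in estimates like these are biased on small samples, so they are best read as descriptive statistics of the data at hand.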

In this tutorial, we will explore the use of information theory as part of a toolkit for data representation and analysis, through illustrative examples that span a wide range of topics of interest to data management researchers and practitioners. We will also examine the computational challenges associated with information-theoretic primitives and indicate how they might be computed efficiently.
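The abstract does not say which efficient-computation techniques the tutorial covers. As one simple example of our own choosing, empirical entropy can be maintained incrementally over a stream using the identity H = log2(m) - (1/m) * sum_i c_i * log2(c_i), where m is the number of items seen and c_i is the count of the i-th distinct value. Each arrival changes only one c_i, so the running sum can be updated in constant time. The class name `StreamingEntropy` is hypothetical.

```python
from collections import defaultdict
from math import log2


def _clog(c):
    """c * log2(c), using the convention 0 * log2(0) = 0."""
    return c * log2(c) if c > 0 else 0.0


class StreamingEntropy:
    """Hypothetical sketch: exact empirical entropy (in bits) of a stream,
    updated in O(1) arithmetic per item plus one dictionary access."""

    def __init__(self):
        self.counts = defaultdict(int)  # c_i for each distinct value seen
        self.m = 0                      # total number of items seen
        self.s = 0.0                    # running sum of c_i * log2(c_i)

    def add(self, item):
        c = self.counts[item]
        # Only this item's term in the sum changes: c*log2(c) -> (c+1)*log2(c+1)
        self.s += _clog(c + 1) - _clog(c)
        self.counts[item] = c + 1
        self.m += 1

    def entropy(self):
        if self.m == 0:
            return 0.0
        # H = log2(m) - (1/m) * sum_i c_i * log2(c_i)
        return log2(self.m) - self.s / self.m


if __name__ == "__main__":
    se = StreamingEntropy()
    for x in "aabb":
        se.add(x)
    print(se.entropy())
```

Feeding the stream `"aabb"` gives an entropy of 1.0 bit. This version is exact but keeps one counter per distinct value, so its memory grows with the number of distinct values. Small-space streaming estimators give up exactness to reduce that memory.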

Links: PPT (Warning: 9 MB file!)


