Data quality is a serious concern in every data management application, and a variety of quality measures have been proposed to capture common sources of data quality degradation. We introduce a novel measure, "column heterogeneity", that seeks to quantify the data quality problems that can arise when merging data from different sources. We identify desiderata that a column heterogeneity measure should intuitively satisfy, and describe our technique for quantifying database column heterogeneity using information-theoretic quantities such as entropy and mutual information.
Finally, we present detailed experimental results, using data sets of diverse types, to demonstrate that our approach provides a robust mechanism for identifying and quantifying database column heterogeneity.
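
For readers unfamiliar with the two information-theoretic quantities named above, the following is a minimal illustrative sketch of entropy and mutual information on empirical distributions. It is not the technique developed in this paper: the `coarse_type` labeling rule, the function names, and the sample column values are all hypothetical, chosen only to show that a column mixing two kinds of values has higher entropy over its type labels than a column holding one kind.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of the empirical distribution of `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def mutual_information(xs, ys):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), estimated from paired samples."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def coarse_type(value):
    """Hypothetical labeling rule (an assumption, not from the paper)."""
    if "@" in value:
        return "email"
    if value.replace("-", "").isdigit():
        return "number"
    return "text"


# Hypothetical column contents, for illustration only.
homogeneous = ["a@x.com", "b@y.org", "c@z.net", "d@w.io"]
mixed = ["a@x.com", "555-1234", "b@y.org", "867-5309"]

h_homogeneous = entropy([coarse_type(v) for v in homogeneous])
h_mixed = entropy([coarse_type(v) for v in mixed])

# Mutual information is maximal for identical variables and zero
# when the paired samples are empirically independent.
mi_identical = mutual_information([0, 0, 1, 1], [0, 0, 1, 1])
mi_independent = mutual_information([0, 0, 1, 1], [0, 1, 0, 1])
```

In this toy setting, a column whose values all share one type label has zero entropy, while an even mix of two types has one bit. A realistic heterogeneity measure would need to infer the groupings from the data rather than rely on a hand-written rule such as `coarse_type`.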