Refreshments 3:20 p.m.
Abstract
Record matching is a key element of data cleaning technology.
Error-tolerant record matching reconciles multiple representations of
the same entity in the presence of errors such as spelling mistakes
and abbreviations. In this talk, we describe some of the key
scenarios and the underlying technology for error-tolerant record
matching that we have developed as part of our Data Cleaning project
at Microsoft Research.
Bio
Surajit Chaudhuri is a Distinguished Scientist at Microsoft Research.
He started the AutoAdmin project on self-tuning database systems.
Surajit has also worked in the area of data cleaning. His research
on both physical database design and data cleaning has been
incorporated in Microsoft products and services such as SQL Server and
Bing. Surajit received his Ph.D. from Stanford University and is an ACM
Fellow. He was awarded the ACM SIGMOD Contributions Award in 2004,
the VLDB 10-Year Best Paper Award in 2007, and the ACM SIGMOD Edgar F.
Codd Innovations Award in 2011.