Data Science Lecture Series. Speaker: El Kindi Rezig

September 20 @ 10:30 am - 11:45 am

Title: Data Preparation: The Biggest Roadblock in Data Science
Abstract: When building machine learning (ML) models, data scientists face a significant hurdle: data preparation. ML models are only as good as the data they are trained on. Unfortunately, data preparation is tedious and laborious because it often requires human judgment on how to proceed. In fact, data scientists spend at least 80% of their time locating the datasets they want to analyze, integrating them, and cleaning the result.

In this talk, I will present my key contributions in data preparation for data science, which address the following problems: (1) data discovery: how to discover data of interest in a large collection of heterogeneous tables (e.g., data lakes); (2) error detection: how to find errors in the input and intermediate data of complex data workflows; and (3) data repairing: how to repair data errors with minimal human intervention. The developed systems are specifically designed to support data science development, which imposes particular requirements such as interactivity and modularity.




FASB 295