
Lunchtalk: Towards Data Integration On-Demand


April 24, 2025, 12:00 - 13:00


Technische Universität Berlin, Room EN148, 1st floor (Einsteinufer 17, 10587 Berlin)


Luca Zecchini

During this Lunch Talk, Luca Zecchini from the Data Integration and Data Preparation group at BIFOLD will speak on “Towards Data Integration On-Demand.” He will present his research on deduplication and dataset discovery, which fosters on-demand use of available resources and shifts data integration towards a task-driven paradigm.

Abstract: Companies and organizations depend heavily on their data to make informed business decisions. They therefore often collect huge amounts of data and store them in raw form, e.g., in data lakes. To extract valuable information from these large corpora of dirty data, practitioners must produce high-quality subsets for use in their downstream tasks, e.g., to perform data analysis or to train machine learning models. Integrating and cleaning all dirty data upfront would be prohibitively expensive. Practitioners therefore need novel solutions that operate in an on-demand fashion, i.e., that focus the cleaning effort only on the portion of the data that is useful to their task and return clean results as soon as they are available. In this talk, Luca Zecchini will present the outcomes of his research on deduplication and dataset discovery, which fosters on-demand use of available resources and shifts data integration towards a task-driven paradigm.

©Luca Zecchini

Bio: Luca Zecchini is a research associate at BIFOLD and TU Berlin, working in the Data Integration and Data Preparation Lab led by Prof. Ziawasch Abedjan. He completed his PhD in 2024 at the University of Modena and Reggio Emilia (Italy), advised by Prof. Sonia Bergamaschi and Prof. Giovanni Simonini. During his PhD, he spent two visiting periods at the Hasso Plattner Institute in Potsdam, under the supervision of Prof. Felix Naumann. His research focuses mainly on data integration, with a special interest in entity resolution, data preparation, and dataset discovery.