
MaTElDa: Multi-Table Error Detection

Fatemeh Ahmadi
Marc Speckmann
Malte Fabian Kuhlmann
Ziawasch Abedjan

March 25, 2025

As data-driven applications gain popularity, ensuring high data quality is a growing concern. Yet, data cleaning techniques are limited to treating one table at a time. A table-by-table application of such methods is cumbersome, because these methods either require prior knowledge about constraints or require labor-intensive configuration and manual labeling for each individual table. As a result, they hardly scale beyond a few tables and miss the chance to optimize the cleaning process across tables. To tackle these issues, we introduce a novel semi-supervised error detection approach, Matelda, that organizes a given set of tables by folding their cells with regard to domain and quality similarity to facilitate user supervision. We propose a unified feature embedding that makes cell values comparable across tables. Experimental evaluations demonstrate that Matelda outperforms various configurations of existing single-table cleaning methodologies in the multi-table scenario.
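To illustrate what it can mean for cell values to be comparable across tables, the following is a minimal sketch and not Matelda's actual feature set. It assumes a purely syntactic embedding (character-class fractions plus an emptiness flag); the names `cell_features` and `embed_tables` are hypothetical. The point is only that every cell, whatever its table or schema, maps to a vector of the same length.

```python
import string


def cell_features(value: str) -> list[float]:
    """Map one cell value to a fixed-length, schema-independent vector.

    Illustrative assumption: features are character-class fractions
    (digits, letters, whitespace, punctuation) plus an is-empty flag.
    """
    n = len(value)
    if n == 0:
        return [0.0, 0.0, 0.0, 0.0, 1.0]
    digits = sum(ch.isdigit() for ch in value) / n
    letters = sum(ch.isalpha() for ch in value) / n
    spaces = sum(ch.isspace() for ch in value) / n
    punct = sum(ch in string.punctuation for ch in value) / n
    return [digits, letters, spaces, punct, 0.0]


def embed_tables(tables: dict[str, list[list[str]]]):
    """Embed every cell of every table into the same feature space.

    `tables` maps a table name to its rows (lists of string cells).
    Returns a list of ((table, row, column), vector) pairs.
    """
    embedded = []
    for name, rows in tables.items():
        for r, row in enumerate(rows):
            for c, value in enumerate(row):
                embedded.append(((name, r, c), cell_features(value)))
    return embedded


# Two tables with different schemas end up in one shared vector space.
tables = {
    "customers": [["Alice", "42"]],
    "orders": [["2025-03-25", "", "19.99"]],
}
cells = embed_tables(tables)
```

Because all vectors share one layout, cells from different tables could be grouped or compared with any standard distance measure, which is the property that grouping by similarity relies on.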