ICDE 2022 Best Demo Award
A framework to efficiently create training data for optimizers
A demo paper co-authored by a group of BIFOLD researchers on “Farming Your ML-based Query Optimizer’s Food” presented at the virtual conference ICDE 2022 has won the best demo award. The award committee members have unanimously chosen this demonstration based on the relevance of the problem, the high potential of the proposed approach and the excellent presentation.
As machine learning is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities, it is critical to collect a large amount of labeled training data to build this machine learning models. The training data should consist of diverse query plans with their label (execution time or cardinality). However, collecting such a training dataset is a very tedious and time-consuming task: It requires both developing numerous plans and executing them to acquire ground-truth labels. The latter can take days if not months, depending on the size of the data.
In a research paper presented last year at SIGMOD 2021 the authors presented DataFarm, a framework for efficiently creating training data for optimizers with learning-based components. This demo paper extends DataFarm with an intuitive graphical user interface which allows users to get informative details of the generated plans and guides them through the generation process step-by-step. As an output of DataFarm, users can download both the generated plans to use as a benchmark and the training data (jobs with their labels).
The publication in detail:
Robin van de Water, Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiane-Ruiz, Volker Markl: Farming Your ML-based Query Optimizer’s Food (to appear)