ICDE 2022 Best Demo Award

A framework to efficiently create training data for optimizers

A demo paper co-authored by a group of BIFOLD researchers on “Farming Your ML-based Query Optimizer’s Food” presented at the virtual conference ICDE 2022 has won the best demo award. The award committee members have unanimously chosen this demonstration based on the relevance of the problem, the high potential of the proposed approach and the excellent presentation.

As machine learning is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities, it is critical to collect a large amount of labeled training data to build these machine learning models. The training data should consist of diverse query plans with their labels (execution time or cardinality). However, collecting such a training dataset is a very tedious and time-consuming task: it requires both developing numerous plans and executing them to acquire ground-truth labels. The latter can take days if not months, depending on the size of the data.

In a research paper at SIGMOD 2021, the authors presented DataFarm, a framework for efficiently creating training data for optimizers with learning-based components. This demo paper extends DataFarm with an intuitive graphical user interface that gives users informative details about the generated plans and guides them through the generation process step by step. As an output of DataFarm, users can download both the generated plans to use as a benchmark and the training data (jobs with their labels).

The publication in detail:

Robin van de Water, Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl: Farming Your ML-based Query Optimizer’s Food (to appear)

Abstract

Machine learning (ML) is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities. This means large heterogeneous sets of labeled query plans or jobs (i.e., plans with their runtime or cardinality output) are needed. However, collecting such a training dataset is a very tedious and time-consuming task: It requires both developing numerous jobs and executing them to acquire ground-truth labels. We demonstrate DATAFARM, a novel framework for efficiently generating and labeling training data for ML-based query optimizers to overcome these issues. DATAFARM enables generating training data tailored to users’ needs by learning from their existing workload patterns, input data, and computational resources. It uses an active learning approach to determine a subset of jobs to be executed and encloses the human into the loop, resulting in higher quality data. The graphical user interface of DATAFARM allows users to get informative details of the generated jobs and guides them through the generation process step-by-step. We show how users can intervene and provide feedback to the system in an iterative fashion. As an output, users can download both the generated jobs to use as a benchmark and the training data (jobs with their labels).
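
The idea of executing only a subset of the generated jobs can be illustrated with a small selection loop. The following is a minimal sketch and not DataFarm's actual algorithm: the job features (operator count, input size in GB), the function execute_and_measure, and the use of distance to the nearest labeled job as a stand-in for model uncertainty are all assumptions made for illustration.

```python
import math

def execute_and_measure(job):
    # Hypothetical stand-in for running a job and timing it (the expensive step).
    operators, input_gb = job
    return 1.5 * operators + 0.8 * input_gb

def uncertainty(job, labeled_jobs):
    # Assumed proxy for model uncertainty: distance to the nearest labeled job.
    if not labeled_jobs:
        return math.inf
    return min(math.dist(job, other) for other in labeled_jobs)

def select_and_label(candidate_jobs, budget):
    labeled = {}  # job -> measured runtime (the label)
    pool = list(candidate_jobs)
    for _ in range(min(budget, len(pool))):
        # Pick the job the proxy is least certain about; ties go to the first job.
        pick = max(pool, key=lambda job: uncertainty(job, list(labeled)))
        pool.remove(pick)
        labeled[pick] = execute_and_measure(pick)
    return labeled, pool

# Each job is a (number of operators, input size in GB) tuple; values are made up.
jobs = [(1, 1.0), (2, 1.0), (10, 50.0), (11, 50.0), (5, 20.0)]
labeled, unlabeled = select_and_label(jobs, budget=3)
print(sorted(labeled))  # the three jobs that were executed
print(unlabeled)        # jobs left unexecuted because they resemble labeled ones
```

With a budget of three executions, the loop skips jobs that closely resemble ones already labeled, which is the cost saving the abstract describes. A real system would replace the distance proxy with the uncertainty of the learned model and let the user adjust the selection.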

ACM SIGMOD Research Highlight

The Best of Both Worlds

BIFOLD researchers combined imperative and functional control flow in dataflow systems.

Mitos combines the advantages of Apache Flink and Apache Spark.
Copyright: BIFOLD

The paper “Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance” by six BIFOLD researchers was honored with a 2021 ACM SIGMOD Research Highlights Award. This prestigious award recognizes the work of Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Lorand Madai-Tahy, Jorge-Arnulfo Quiané-Ruiz and Volker Markl as a definitive milestone and emphasizes its potentially significant impact. A short version of their paper (“Imperative or Functional Control Flow Handling: Why not the Best of Both Worlds?”), which emphasizes the advantages of combining imperative and functional control flow in dataflow systems, is published in the 2021 ACM SIGMOD Research Highlights special issue. The paper had already received a Best Paper Award at the 37th IEEE International Conference on Data Engineering (ICDE) 2021.

More information:

2022 ACM SIGMOD Record

ICDE 2021 honors BIFOLD Researchers with Best Paper Award

Best Poster Award for Kim Nicoli

During the Summer School on Machine Learning for Quantum Physics and Chemistry, held in September 2021 in Warsaw, BIFOLD PhD candidate Kim A. Nicoli received the Best Poster Award. The participants and the scientific committee voted his poster the best among those of more than 80 participants. The corresponding paper, “Estimation of Thermodynamic Observables in Lattice Field Theories with Deep Generative Models”, is a joint international effort of several BIFOLD researchers (Kim Nicoli, Christopher Anders, Pan Kessel, Shinichi Nakajima) and a group of researchers affiliated with DESY (Zeuthen) and other institutions. The work is published in Physical Review Letters.

Kim A. Nicoli
(Copyright: Kim Nicoli)

“Modeling and understanding the interactions of quarks, fundamental subatomic, yet indivisible particles, which represent the smallest known units of matter, is the main goal of current ongoing research in the field of High Energy Physics. Deepening our understanding of such phenomena, leveraging on modern machine learning techniques, would have some important implications in many related fields of applied science and research, such as quantum computer devices, drug discoveries and many more.”

The Summer School on Machine Learning for Quantum Physics and Chemistry was co-organized by the University of Warsaw and the Institute of Photonic Sciences, Barcelona.

More information: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.126.032001

Making the use of AI systems safe

BIFOLD Fellow Dr. Wojciech Samek and Luis Oala (Fraunhofer Heinrich Hertz Institute) together with Jan Macdonald and Maximilian März (TU Berlin) were honored with the award for “best scientific contribution” at this year’s medical imaging conference BVM. Their paper “Interval Neural Networks as Instability Detectors for Image Reconstructions” demonstrates how uncertainty quantification can be used to detect errors in deep learning models.

The award winners were announced during the virtual BVM (Bildverarbeitung für die Medizin) conference on March 9, 2021. The award for “best scientific contribution” is granted each year by the BVM Award Committee. It honors innovative research with a methodological focus on medical image processing in a medically relevant application context.

The interdisciplinary group of researchers investigated the detection of instabilities that may occur when deep learning models are used for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their use in sensitive medical applications remains controversial. Limits in the understanding of an AI system’s behavior create risks of system failure. Hence, identifying failure modes in AI systems is an important prerequisite for their reliable deployment in medicine.

In a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates on two use cases how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. This is an important contribution to making the use of AI systems safer and more reliable.
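
How an uncertainty estimate can act as an instability detector may be easier to see in a toy example. The sketch below does not reproduce the Interval Neural Network construction from the paper. It only illustrates the general idea with plain interval arithmetic, under the assumption that lower and upper bounds are propagated through one linear layer with a ReLU and that outputs with a wide interval are flagged. All function names, weights, and the threshold are hypothetical.

```python
def interval_linear(lower, upper, weights, bias):
    """Propagate elementwise input bounds through y = W x + b."""
    out_lo, out_hi = [], []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, l, u in zip(row, lower, upper):
            # A non-negative weight maps the lower input bound to the lower
            # output bound; a negative weight swaps the roles.
            lo += w * l if w >= 0 else w * u
            hi += w * u if w >= 0 else w * l
        out_lo.append(lo)
        out_hi.append(hi)
    return out_lo, out_hi

def relu_interval(lower, upper):
    # ReLU is monotone, so it can be applied to both bounds separately.
    return [max(0.0, l) for l in lower], [max(0.0, u) for u in upper]

def flag_unstable(lower, upper, threshold):
    # Assumed detector rule: an output is suspicious if its interval is wide.
    return [(u - l) > threshold for l, u in zip(lower, upper)]

# Made-up example: the first input is uncertain, the second is known exactly.
in_lo, in_hi = [1.0, 2.0], [1.5, 2.0]
weights = [[2.0, -1.0], [0.0, 1.0]]
bias = [0.0, 0.0]

lo, hi = relu_interval(*interval_linear(in_lo, in_hi, weights, bias))
flags = flag_unstable(lo, hi, threshold=0.5)
print(lo, hi, flags)
```

Only the first output depends on the uncertain input, so only that output receives a wide interval and is flagged. In an image reconstruction setting the same rule would be applied per pixel.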

The paper in detail:
“Interval Neural Networks as Instability Detectors for Image Reconstructions”

Authors:
Jan Macdonald, Maximilian März, Luis Oala, Wojciech Samek

Abstract:
This work investigates the detection of instabilities that may occur when utilizing deep learning models for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their usage for sensitive medical applications remains controversial. Indeed, in a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. Such an ability is crucial to ensure a safe use of deep learning-based methods for medical image reconstruction.

Publication:
In: Bildverarbeitung für die Medizin 2021. Informatik aktuell. Springer Vieweg, Wiesbaden.
https://doi.org/10.1007/978-3-658-33198-6_79