Beyond Explainable AI


Wojciech Samek and Klaus-Robert Müller publish new book on Explainable AI

To tap the full potential of artificial intelligence, we not only need to understand the decisions it makes; these insights must also be made applicable. This is the aim of the new book “xxAI – Beyond Explainable AI”, edited by Wojciech Samek, head of the Artificial Intelligence department at the Fraunhofer Heinrich Hertz Institute (HHI) and BIFOLD researcher, and Klaus-Robert Müller, professor of machine learning at the Technical University of Berlin (TUB) and co-director of BIFOLD. The publication is based on a workshop held during the International Conference on Machine Learning in 2020. Co-editors include the AI experts Andreas Holzinger, Randy Goebel, Ruth Fong and Taesup Moon. It is already the second joint publication by Samek and Müller.

Following the strong reception of the editors’ first book, “Explainable AI: Interpreting, Explaining and Visualizing Deep Learning” (2019), which presented an overview of methods and applications of Explainable AI (XAI) and racked up over 300,000 downloads worldwide, the new publication goes a step further. It provides an overview of current trends and developments in the field of XAI. In one chapter, for example, Samek and Müller’s team shows that XAI concepts and methods developed for explaining classification problems can also be applied to other types of problems. When solving classification problems, the target variables sought are categorical, such as “What color is the traffic light right now: red, yellow, or green?”. XAI techniques developed for such problems can also help explain models in unsupervised learning, reinforcement learning, or generative modeling. Thus, the authors expand the horizons of previous XAI research and provide researchers and developers with a set of new tools that can be used to explain a whole new range of problem types and models.
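To make concrete what “explaining a classification” means here, the following toy sketch (our own illustration, not code from the book) computes a gradient-times-input relevance map for a tiny linear softmax classifier; all weights and inputs are made up:

```python
import numpy as np

# Illustrative sketch: gradient-x-input relevance for a linear softmax
# classifier over 3 classes (e.g., "red", "yellow", "green") and 4 features.
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # class weight matrix (made-up values)
x = rng.normal(size=4)        # one input sample

logits = W @ x
probs = np.exp(logits - logits.max())
probs /= probs.sum()
k = int(np.argmax(probs))     # predicted class

# For a linear model, the gradient of the class-k score w.r.t. the input
# is simply W[k], so gradient x input assigns feature i the relevance
# W[k, i] * x[i].
relevance = W[k] * x

# The relevances sum exactly to the class score -- a conservation property
# that methods such as layer-wise relevance propagation generalize to deep
# networks.
assert np.isclose(relevance.sum(), logits[k])
```

The same per-feature attribution idea is what the chapter carries over from classification to other learning settings.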


As the title “Beyond Explainable AI” suggests, the book also shows how methodological insights can be applied in practice to make models more robust and efficient. While previous research has focused on the path from AI as a “black box” to explaining its decisions, several chapters in the new book address the next step: toward an improved AI model. Furthermore, other authors reflect on their research not only within their own field of work, but also in the context of society as a whole. They cover a variety of areas that go far beyond classical XAI research, addressing, for example, the relationships between explainability and fairness, explainability and causality, and legal aspects of explainability.

The book is available free of charge here.


BIFOLD Colloquium 2022/05/20


Machine Learning for Remote Sensing Applications powered by Modular Supercomputing Architectures

Speaker: Dr. Gabriele Cavallaro, Forschungszentrum Jülich

Venue: TU Berlin, Architekturgebäude, Straße des 17. Juni 152, 10623 Berlin, Room: A151

Date & time: May 20, 2022, 2 pm

Title: Machine Learning for Remote Sensing Applications powered by Modular Supercomputing Architectures

Supercomputers are unique computing environments with extremely high computational capabilities. They are able to solve problems and perform calculations which require more speed and power than traditional computers are capable of. In particular, they represent a concrete solution for data-intensive applications as they can boost the performance of processing workflows with more efficient access to and scalable processing of extremely large data sets.
This talk will first give an overview of the work and research activities of the ‘‘AI and ML for Remote Sensing’’ Simulation and Data Lab hosted at the Jülich Supercomputing Centre (JSC). Then, it will introduce the Modular Supercomputing Architecture (MSA) systems that are operated by the JSC. An MSA is a computing environment that integrates heterogeneous High Performance Computing (HPC) systems, which can include different types of accelerators (e.g., GPUs, FPGAs) and cutting-edge computing technologies (e.g., quantum and neuromorphic computing), and that is “modularized” by its software stack. Finally, the presentation will include examples of Remote Sensing applications that can exploit MSA to drastically reduce the time to solution and provide users with timely and valuable information.

Dr. Gabriele Cavallaro is the Head of the ‘‘AI and ML for Remote Sensing’’ Simulation and Data Lab at the Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany. He is currently the Chair of the ‘‘High-Performance and Disruptive Computing in Remote Sensing’’ (HDCRS) Working Group of the IEEE GRSS ESI Technical Committee and a Visiting Scientist at the Φ-Lab of the European Space Agency. His research interests cover remote sensing data processing with parallel machine learning methods that scale on cutting-edge distributed computing technologies.

ICDE 2022 Best Demo Award


A framework to efficiently create training data for optimizers

A demo paper co-authored by a group of BIFOLD researchers, “Farming Your ML-based Query Optimizer’s Food”, presented at the virtual conference ICDE 2022, has won the best demo award. The award committee unanimously chose this demonstration based on the relevance of the problem, the high potential of the proposed approach, and the excellent presentation.

As machine learning is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities, it is critical to collect large amounts of labeled training data to build these machine learning models. The training data should consist of diverse query plans with their labels (execution time or cardinality). However, collecting such a training dataset is a very tedious and time-consuming task: it requires both developing numerous plans and executing them to acquire ground-truth labels. The latter can take days, if not months, depending on the size of the data.

In a research paper presented last year at SIGMOD 2021, the authors introduced DataFarm, a framework for efficiently creating training data for optimizers with learning-based components. This demo paper extends DataFarm with an intuitive graphical user interface, which gives users informative details of the generated plans and guides them through the generation process step by step. As an output of DataFarm, users can download both the generated plans to use as a benchmark and the training data (jobs with their labels).
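The core cost-saving idea is active learning: execute only the generated jobs whose labels the model is most uncertain about, and predict labels for the rest. The sketch below is our own hypothetical illustration of that selection step, not DataFarm's actual API; all names and the toy "ensemble" are made up:

```python
import random
import statistics

random.seed(0)

# Hypothetical pool of 20 generated query plans with a nominal cost each.
jobs = {f"plan_{i}": 10 + 7 * i for i in range(20)}

def ensemble_predict(cost):
    # Stand-in for an ensemble of runtime predictors: each member returns
    # a noisy runtime estimate around the nominal cost.
    return [cost + random.gauss(0, cost * 0.1) for _ in range(5)]

# Disagreement between ensemble members serves as the uncertainty signal.
uncertainty = {name: statistics.stdev(ensemble_predict(cost))
               for name, cost in jobs.items()}

# Execute (i.e., obtain ground-truth labels for) only the 5 most
# uncertain jobs; the remaining 15 receive predicted labels.
to_execute = sorted(jobs, key=lambda n: uncertainty[n], reverse=True)[:5]
print(to_execute)
```

Running only the selected subset instead of all generated plans is what lets such a framework cut labeling time from days to hours while keeping label quality high for the hard cases.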



The publication in detail:

Robin van de Water, Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl: Farming Your ML-based Query Optimizer’s Food (to appear)


Machine learning (ML) is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities. This means large heterogeneous sets of labeled query plans or jobs (i.e., plans with their runtime or cardinality output) are needed. However, collecting such a training dataset is a very tedious and time-consuming task: It requires both developing numerous jobs and executing them to acquire ground-truth labels. We demonstrate DATAFARM, a novel framework for efficiently generating and labeling training data for ML-based query optimizers to overcome these issues. DATAFARM enables generating training data tailored to users’ needs by learning from their existing workload patterns, input data, and computational resources. It uses an active learning approach to determine a subset of jobs to be executed and encloses the human into the loop, resulting in higher quality data. The graphical user interface of DATAFARM allows users to get informative details of the generated jobs and guides them through the generation process step-by-step. We show how users can intervene and provide feedback to the system in an iterative fashion. As an output, users can download both the generated jobs to use as a benchmark and the training data (jobs with their labels).

BIFOLD Summer School


Ethics in Machine Learning & Data Management

The BIFOLD summer school will take place from 20-24 June 2022 at a hotel near Berlin. It will focus on the latest ethical considerations in machine learning and data management, offering lectures and workshops in two main tracks. The school is designed for doctoral students of the BMBF‘s network of AI competence centres and organized by the BIFOLD Graduate School in collaboration with the Ethics in Residence Program, with researchers of the Weizenbaum Institute for the Networked Society – the German Internet Institute.

Sufficient time for hands-on workshops and individual feedback is included.

The summer school complements technological research on artificial intelligence (AI) within the AI competence centres with ethical aspects of explainability and sustainability. It is part of the Ethics in Residence program. The program includes multiple hands-on workshops to advance individual research projects, several guest lectures including Q&A, a panel discussion, and Ph.D. student presentation sessions with expert jury feedback. The summer school offers two tracks, on explainable deep neural networks (XNN) and sustainable AI (SAI), for more specialized training of the doctoral students. All of BIFOLD’s PhD students are invited to participate. In addition, BIFOLD offers places for the PhD students of the German AI competence centre network (ML2R, MCML, TUE-AI, ScaDS, DFKI).

Invited international experts

International expert researchers with backgrounds in computing within limits, disaster research, and COVID-19 data research are joining the summer school as speakers and will be available for individual feedback.

Daniel Pargman, Ph.D.
KTH Royal Institute of Technology, Stockholm, Sweden

Teresa Cerratto Pargman, Ph.D.
Stockholm University, Sweden

Yuya Shibuya, Ph.D.
The University of Tokyo, Japan

Raphael Sonabend, Ph.D.
Imperial College London, UK & University of Kaiserslautern, Germany

Alan Borning, Ph.D. & Lance Bennett, Ph.D.
University of Washington, Seattle, USA

Enrico Costanza, Ph.D.
University College London, UK

Focus tracks

Track XNN focuses on evaluating interpretable machine learning to provide students with the ability to empirically validate claims about interpretability:

  • Critical review of XAI methods: taxonomies of XAI approaches, review of explanation goals, user benefits and current results from user studies
  • Rigorous methods for validating explanation methods with users: interdisciplinary methodological training, suitable evaluation datasets, user tasks and study designs, participant recruitment, validity, and reproducibility considerations

Track SAI emphasizes ecological and socio-political aspects of AI, to understand how AI and data can contribute to action in the name of a sustainability transition:

  • Sustainability in research and policies: What is sustainability? United Nations SDGs, COVID-19, critical thinking on AI, environmental monitoring, sustainable smart cities and communities
  • SAI approaches and methods: data feminism, digital civics, computing within limits, citizen science, social media data, and mixed methods.

Organisational details

Pandemic conditions permitting, the school will be held as an on-site event at a lakeside hotel near Berlin, allowing time and space for outdoor sports and relaxing. Participants are expected to attend the entire program (arrival on 19 June, departure on 25 June 2022). There is no tuition fee.
Please get in touch with us in case you need child care.

Program venue: Weizenbaum Institute for the Networked Society
Accommodation: Hampton by Hilton Berlin City West, within walking distance of Weizenbaum & TU Berlin, in the heart of Berlin, with the nearby Tiergarten park providing plenty of greenery


Please send one PDF file, including your CV, an abstract of your (preliminary) Ph.D. project, and a short motivation message describing why you would like to participate and what you would like to learn during the summer school, to
The application deadline is 30 April 2022.


Andrea Hamm
Doctoral Researcher, Research Group “Responsibility and the Internet of Things”, Weizenbaum Institute for the Networked Society & TU Berlin, Germany

Martin Schuessler
Doctoral Researcher, Research Group “Criticality of AI-based Systems”, Weizenbaum Institute for the Networked Society & TU Berlin, Germany

Dr. Stefan Ullrich
Weizenbaum Institute for the Networked Society, Berlin, Germany

Prof. Dr. Volker Markl
Co-Director BIFOLD

Prof. Dr. Klaus-Robert Müller
Co-Director BIFOLD

Dr. Tina Schwabe, Dr. Manon Grube
Coordinators of the BIFOLD Graduate School

Available PhD research topics


Based on the overarching research foci of BIFOLD, the BIFOLD Graduate School is offering new PhD projects in the areas of current challenges in artificial intelligence (AI) and data science (DS), with focus on data management, machine learning, and their intersection.
Below is a brief description of the current research pursued by the BIFOLD research groups, including short lists of their main topics and foci. For more details, we recommend the respective webpages of the group leads.
Please feel free to reach out to the group leads directly or to: – depending on the nature of your query.

Full job posting: EN / DE

BIFOLD Research Groups and their topics

The Distinguished Research Group of Volker Markl works on a wide range of topics and challenges in Database Systems and Information Management, with the overarching goal to address both the human and technical latencies prevalent in the data analysis process. The group investigates:

  • Automatic Optimization of Data Processing on Modern Hardware.
  • Automatic Optimization of Distributed ML Programs.
  • Optimization of the Data Science and ML Process.
  • Hardware-tailored Code Generation.
  • Compliant Geo-distributed Data Analytics.
  • Efficient Visualization of Big Data.
  • Scalable Gathering and Processing of Distributed Streaming Data.
  • Data Processing on Modern Hardware.
  • Scalable State Management.

The Distinguished Research Group led by Klaus-Robert Müller tackles problems within the bigger fields of Machine Learning and Intelligent Data Analysis, with the overarching goals to develop robust and interpretable ML methods for learning from complex structured and non-stationary data, and the fusion of heterogeneous multi-modal data sources. The group works on:

  • Learning from Structured, Non-stationary and Multi-modal Data.
  • Incorporating Domain Knowledge and Symmetries in ML Models.
  • Robust Explainable AI for Structured, Heterogeneous Data.
  • Structured Anomaly Detection.
  • Robust Reinforcement Learning in Complex, Partially Observed State Spaces.
  • ML Applications in the Sciences.
  • Deep Learning and GANs.

The Senior Research Group led by Begüm Demir works on Big Data Analytics for Earth Observation (EO) at the intersection of remote sensing, DM and ML. The group investigates and creates theoretical and methodological foundations of DM and ML for EO, with the goal to process and analyze a large amount of decentralized EO data in a scalable and privacy-aware manner and focuses on the following topics:

  • Privacy-preserving Analysis of EO Data.
  • Continual Learning for Large-Scale EO Data Analysis.
  • Heterogeneous Multi-Source EO Data Analysis.
  • Uncertainty-Aware Analysis of Large-Scale EO Data.

The Senior Research Group led by Frank Noé concentrates on the development of ML methods for solving fundamental problems in chemistry and physics. Currently, the group focuses on:

  • New ML Methods for Solving Fundamental Physics Problems.
  • Quantum Mechanics – Electronic Structure Problem.
  • Statistical Mechanics – Sampling Problem.
  • New ML methods Inspired by Physics.
  • Neural Network Optimization, Sampling, and Statistical Mechanics.
  • Graph Neural Networks.

The Junior Research Group led by Grégoire Montavon works on advancing Explainable AI (XAI) in the context of deep neural networks (DNNs). Its research focuses on solidifying the theoretical and algorithmic foundations of XAI for DNNs and closing the gap between existing XAI methods and practical desiderata:

  • From Explainable AI to trustworthy models.
  • From Explainable AI to actionable systems.
  • Application to historical networks and biological interaction networks.

The Independent Research Group led by Jorge-A. Quiané-Ruiz looks into Big Data Systems with the goal to develop a scalable and efficient big data infrastructure that supports next-generation distributed information systems and creates an open data-related ecosystem.

  • Worldwide-scalable data processing.
  • Efficient secure data processing.
  • Reliable pricing, usage-tracing, and payment models.

The Independent Research Group of Shinichi Nakajima focuses on probabilistic modelling and inference methods for multimodal, heterogeneous, and complex structured data analysis, providing ML tools that can incorporate multiple aspects of data samples observed under different circumstances, in efficient and theoretically grounded ways.

  • Generative Models and Inference Methods.
  • Applications of Generative Models and Bayesian Inference Methods.
  • Practical Uncertainty Estimation Methods.

The Research Training Group led by Steffen Zeuch works on developing a data management system for the processing of heterogeneous data streams in distributed fog and edge environments. The aim is to design a data management system that unifies cloud, fog, and sensor environments at an unprecedented scale. In particular, a system that can host these environments on a unified platform, and leverages the opportunities of the unified architecture for cross-paradigm data processing optimizations, to support emerging IoT applications.

  • Data Processing on Modern Hardware.
  • Data Processing in a Fog/Cloud Environment.

The Research Training Group led by Stefan Chmiela focuses on Machine Learning for many-body problems, with particular focus on quantum chemistry. The group develops methods that combine fundamental physical principles with statistical modeling approaches to overcome the combinatorial challenges that manifest themselves when large numbers of particles interact. Research is centered around topics such as

  • graph neural networks,
  • large-scale kernel methods and
  • the challenge of invariant/equivariant modelling.

Science and Startups launches AI Initiative


Science & Startups is the association of the four startup services of Freie Universität Berlin, Humboldt-Universität zu Berlin, Technische Universität Berlin and Charité – Universitätsmedizin Berlin. It provides a gateway to the joint programmes and resources of these universities for successfully starting and developing a company. Since 2021, Science & Startups has been specifically strengthening research transfer in the field of Artificial Intelligence (AI). Now it has officially launched its new focus programme: K.I.E.Z. (Künstliche Intelligenz Entrepreneurship Zentrum). With an initial 6.85 million euros provided by the BMWi and co-financed by the state of Berlin, a unique ecosystem is being created. K.I.E.Z. is an initiative dedicated to supporting AI entrepreneurs with scientific expertise as well as access to capital, industry partners and hiring talent. K.I.E.Z. will be carried out in close cooperation with the Berlin Institute for the Foundations of Learning and Data (BIFOLD).

Representing the four startup services that launched K.I.E.Z.: Volker Hofmann (Humboldt-Innovation GmbH), Karin Kricheldorff (Centre for Entrepreneurship at TU Berlin), Marcus Luther (BIH Innovation), Dr. Tina Klüwer (Director AI, K.I.E.Z.), Steffen Terberl (Profound Innovation).
(Copyright: K.I.E.Z./Tanja Schnitzler)

In his statement BIFOLD Co-Director Prof. Dr. Volker Markl emphasized that this initiative is “yet another important building block on the way to establish Berlin in the Champions League of AI locations”. The programme will focus on an AI-oriented expansion of the entire innovation chain: from the identification of startup potential in research to the targeted acceleration of the feasibility phase to an accelerator programme at the new AI Campus Berlin. The integration of a strong industry network in all innovation phases will be a core element as well as the establishment of an AI Academy for startups and stakeholders. Thus, AI-based startups will be identified, optimally supported and established on the market.

On December 2nd, Dr. Tina Klüwer, director of the AI program K.I.E.Z., former founder and CEO of parlamind GmbH and board member of the German Federal Association for AI, together with Dr. Susanne Perner, technology scout (with AI focus) at TU Berlin’s internal startup service, the Centre for Entrepreneurship, will give an introduction to the new initiative.

Date and time: December 2nd, 2021, 12:30 pm – 1:30 pm

Location: virtual

Register: here

More information:

Artificial Intelligence Entrepreneurship Center, K.I.E.Z.

In Search for Algorithmic Fairness


Artificial intelligence (AI) has found its way into many work routines – be it the development of hiring procedures, the granting of loans, or even law enforcement. However, the machine learning (ML) systems behind these procedures repeatedly attract attention by distorting results or even discriminating against people on the basis of gender or race. “Accuracy is one essential factor of machine learning models, but fairness and robustness are at least as important,” says Felix Neutatz, a BIFOLD doctoral student in the group of Prof. Dr. Ziawasch Abedjan, a BIFOLD researcher and former professor at TU Berlin who recently moved to Leibniz Universität Hannover. Together with Ricardo Salazar Diaz, they published “Automated Feature Engineering for Algorithmic Fairness”, a paper on the fairness of machine learning models, in Proceedings of the VLDB Endowment.

BIFOLD researchers suggest a new machine learning model that achieves both high accuracy and fairness.

Algorithms might reinforce biases against groups of people that have been historically discriminated against. Examples include gender bias in machine learning applications on online advertising or recruitment procedures.

The paper presented at VLDB 2021 specifically considers algorithmic fairness. “Previous machine learning models for hiring procedures usually discriminate systematically against women”, says Felix Neutatz. “Why? Because they learn on old datasets derived from times when fewer women were employed.” Currently, there are several ways to improve the fairness of such algorithmic decisions. One is to specify that attributes such as gender, race or age are not to be considered in the decision. However, it turns out that other attributes also allow conclusions to be drawn about these sensitive characteristics.

The state-of-the-art bias reduction algorithms simply drop sensitive features and create new artificial non-sensitive instances to counterbalance the loss in the dataset. In the case of recruiting procedures, this would mean simply adding lots of artificially generated data from hypothetical female employees to the training dataset. While this approach successfully removes bias, it might lead to fairness overfitting and is likely to hurt classification accuracy because of the potential information loss.


“There are several important metrics that determine the quality of machine learning models,” explains Felix Neutatz. “These include, for example, privacy, robustness to external attacks, interpretability, and also fairness. The goal of our research is to automatically influence and balance these metrics.”

The researchers developed a new approach that addresses the problem with a feature-wise strategy. “To achieve both high accuracy and fairness, we propose to extract as much unbiased information as possible from all features using feature construction (FC) methods that apply non-linear transformations. We use FC first to generate more possible candidate features, and then drop sensitive features and optimize for fairness and accuracy”, explains Felix Neutatz. “If we stick to the example of the hiring process, each employee has different attributes depending on the dataset, such as gender, age, experience, education level, hobbies, etc. We generate many new attributes from these real attributes by a large number of transformations. For example, such a new attribute is generated by dividing age by gender or multiplying experience by education level. We show that we can extract unbiased information from biased features by applying human-understandable transformations.”
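The feature-construction step described above can be sketched in a few lines. The following toy example is our own illustration (column names, values, and the choice of pairwise products as transformations are all made up, not the paper's actual pipeline): it generates candidate features from the non-sensitive attributes and drops the sensitive column.

```python
import itertools

# Toy hiring dataset: one tuple per candidate.
rows = [
    # age, experience, education_level, gender (sensitive attribute)
    (34, 10, 3, 0),
    (29,  5, 4, 1),
    (45, 20, 2, 0),
]
names = ["age", "experience", "education_level", "gender"]
sensitive = {"gender"}

def construct(row):
    feats = dict(zip(names, row))
    keep = [n for n in names if n not in sensitive]
    # Generate candidate features as pairwise products of the
    # non-sensitive attributes (one simple non-linear transformation).
    candidates = {f"{a}*{b}": feats[a] * feats[b]
                  for a, b in itertools.combinations(keep, 2)}
    # The original non-sensitive features survive; the sensitive one
    # is dropped entirely.
    candidates.update({n: feats[n] for n in keep})
    return candidates

transformed = [construct(r) for r in rows]
print(sorted(transformed[0]))
```

In the actual system, a multi-objective search would then select, from this enlarged candidate pool, the feature subset that best trades off accuracy against a fairness metric.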

Finding a unique feature set that optimizes the trade-off between fairness and accuracy is challenging. In their paper, the researchers not only demonstrated a way to extract unbiased information from biased features. They also propose an approach where the ML system and the user collaborate to balance the trade-off between accuracy and fairness and validate this approach by a series of experiments on known datasets.

The publication in detail:

Ricardo Salazar, Felix Neutatz, Ziawasch Abedjan: Automated Feature Engineering for Algorithmic Fairness. PVLDB 14(9): 1694–1702 (2021).


One of the fundamental problems of machine ethics is to avoid the perpetuation and amplification of discrimination through machine learning applications. In particular, it is desired to exclude the influence of attributes with sensitive information, such as gender or race, and other causally related attributes on the machine learning task. The state-of-the-art bias reduction algorithm Capuchin breaks the causality chain of such attributes by adding and removing tuples. However, this horizontal approach can be considered invasive because it changes the data distribution. A vertical approach would be to prune sensitive features entirely. While this would ensure fairness without tampering with the data, it could also hurt the machine learning accuracy. Therefore, we propose a novel multi-objective feature selection strategy that leverages feature construction to generate more features that lead to both high accuracy and fairness. On three well-known datasets, our system achieves higher accuracy than other fairness-aware approaches while maintaining similar or higher fairness.