Our overall mission is to create novel theories, algorithms, and technologies and to develop prototypical systems and tools that advance the state of the art in the sciences and humanities.
BIFOLD continues the successful strategic research agendas of its predecessors, BBDC and BZML. It addresses challenges in artificial intelligence (AI) and data science (DS), with a particular focus on foundational research in data management (DM), scalable data processing, and machine learning (ML), as well as their intersection. An important area at the intersection of data management and machine learning is the management of data science processes, considering both the individual steps and the holistic view of data modeling. Another focus of BIFOLD research is responsible AI, which aims to provide methods and technologies that make AI applications understandable, reproducible, and compliant with ethical and legal frameworks.
Overview of Research Groups and Labs
Database Systems and Information Management
The Distinguished Research Group led by Volker Markl addresses the human and technical latencies prevalent in the data analysis process. Reducing human latency entails simplifying the specification of data analysis programs through the automatic distribution, parallelization, and hardware adaptation of data processing operations, thereby increasing programmer productivity. Reducing technical latency entails: (i) devising intelligent data processing algorithms, (ii) exploiting novel advances in computer architecture (processing, network, storage), and (iii) building efficient data management, data science, and machine learning technologies and systems, thereby increasing execution efficiency and throughput.
Machine Learning and Intelligent Data Analysis
The Distinguished Research Group of Klaus-Robert Müller concentrates on the development of robust and interpretable machine learning methods for learning from complex structured and non-stationary data, and on the fusion of heterogeneous multi-modal data sources. A special focus lies on the efficient modeling of non-stationary, heterogeneous, and structured data sources with deep learning and kernel methods. He and his team also work on the theoretically sound incorporation of a priori knowledge from the application domain, as well as on the detection of anomalies in structured data. The resulting models are expected not only to be accurate, but also to explain their nonlinear decisions, quantify decision uncertainties, and create new knowledge about the studied data. In addition, Klaus-Robert Müller has a long history of bringing machine learning into the sciences, which has helped to arrive at genuinely novel insights. In the last decade, his attention has focused primarily on quantum chemistry, cancer research, and computational neuroscience.
Big Data Analytics for Earth Observation
The mission of the Senior Research Group led by Begüm Demir is to develop innovative AI-driven solutions for diverse earth observation (EO) problems. With advances in satellite technology, EO data archives are growing rapidly, delivering an unprecedented amount of data on the state of our planet over time. For example, via the Copernicus Programme – the European flagship satellite initiative in EO – the Sentinel satellites acquire roughly 12 terabytes (TB) of satellite images per day, and the total size of the Copernicus data archives is almost 20 petabytes (PB). This “big EO data” is a rich source of information relevant to a wide range of EO applications, such as climate change analysis, urban area studies, forestry applications, risk and damage assessment, water quality assessment, and crop monitoring. To address challenging problems in this field, the research activities of the group lie at the intersection of remote sensing, data management, and machine learning.
AI for the Sciences
Frank Noé’s Senior Research Group focuses on the development of machine learning methods for solving fundamental problems in physics and chemistry. The group is internationally known for co-pioneering the Markov modeling approach, i.e., a suite of Bayesian and shallow machine learning methods for extracting molecular kinetics and thermodynamics from vast simulation data, and for using these techniques to solve fundamental problems in biophysics, such as protein folding and protein-protein association. In past research, the group has focused on the development of deep learning methods for molecular physics and chemistry; more recently, it has also been developing generic machine learning methods, mainly inspired by physical principles. Frank Noé and his team will continue and deepen both research directions, in close collaboration with other members of BIFOLD’s Inference Systems for the Sciences and Humanities Lab (SCI-Lab).
Explaining Deep Neural Networks
The Junior Research Group of Grégoire Montavon advances the foundations and algorithms of explainable AI (XAI) in the context of deep neural networks. One particular focus is on closing the gap between existing XAI methods and practical desiderata. Examples include using XAI to build more trustworthy and autonomous machine learning models, and using XAI to model the behavior of complex real-world systems so that the latter become meaningfully actionable. In future research, the team will explore: (1) how to use XAI to assess on which data a deep neural network can be trusted to perform autonomously and where it requires human intervention, and (2) how to use XAI in combination with a deep neural network to model complex real-world systems and identify actionable components. Grégoire Montavon and his team will collaborate with the members of BIFOLD’s Explainable AI Lab (XAI-Lab).
Big Data Systems
The goal of the Independent Research Group of Jorge-Arnulfo Quiané-Ruiz is to develop a scalable and efficient big data infrastructure that supports next-generation distributed information systems, such as open information ecosystems. The group also aims to develop big data management techniques that ease the use of big data infrastructures and widen access to them for non-experts. In particular, the group will conduct research in two core areas: (i) technologies for distributed information ecosystems that allow for publishing, sharing, and using data, algorithms, and computing infrastructure, and (ii) cross-platform data processing. In future research, the group intends to create an open data-related ecosystem. Specifically, it will investigate world-wide-scalable data processing techniques, efficient secure data processing techniques, and reliable pricing, usage-tracing, and payment models for data ecosystems. The group collaborates with other members of BIFOLD through the Data Infrastructures for the Sciences and Humanities Lab (DISH-Lab).
Probabilistic Modeling and Inference
Shinichi Nakajima leads the Independent Research Group Probabilistic Modeling and Inference. His aim is to develop novel probabilistic models and inference methods for the analysis of multimodal, heterogeneous, and complex structured data. In particular, he wants to provide machine learning tools that can incorporate multiple aspects of data samples observed under different circumstances, in efficient and theoretically grounded ways. This includes: (i) developing novel probabilistic models with efficient inference methods, (ii) exploring novel applications of probabilistic models, and (iii) establishing uncertainty estimation methods for deep probabilistic models.
Machine Learning for Molecular Simulation in Quantum Chemistry
The Research Training Group of Stefan Chmiela focuses on developing machine learning methods for molecular simulations, with a special emphasis on many-body problems in quantum chemistry. Modeling many-body problems is computationally intensive because the number of non-local interactions grows rapidly with system size. In quantum chemistry, even the smallest practical problems involve enough interacting electrons to render analytical solutions impossible. To address this challenge, the group develops methods that combine fundamental principles from computational physics with statistical modeling approaches. A data-driven angle allows questions to be asked in new ways and can give rise to new perspectives on established problems. In this context, the group will collaborate with members of BIFOLD’s Inference Systems for the Sciences and Humanities Lab (SCI-Lab).
Distributed Data Stream Processing in Heterogeneous Environments
Steffen Zeuch and his Research Training Group concentrate on developing a data management system for the processing of heterogeneous data streams in distributed fog and edge environments. An explosion in both the number and types of connected devices will create novel data-driven applications in the near future. These applications require low latency, location awareness, widespread geographical distribution, and real-time processing of potentially millions of distributed data sources with potentially millions of simultaneous data processing operations. In some cases, these applications have to operate under tight resource constraints with respect to bandwidth, processing power, and energy consumption. The aim is to design a data management system that unifies cloud, fog, and sensor environments at an unprecedented scale. This system should host these environments on a unified platform and leverage the opportunities of the unified architecture for cross-paradigm data processing optimizations, to support emerging IoT applications. To achieve this, the group will collaborate with the members of BIFOLD’s Data and Application Management for the Internet of Things Lab (IoT-Lab).
Data and Application Management for the Internet of Things (IoT-Lab)
The mission of this lab is to enable users to execute data-driven analytics in a heavily distributed environment of sensors and processing devices. A concrete goal is to build and advance NebulaStream, a novel open-source data stream management system, which combines the cloud, the fog, and sensors into a single unified platform, and provides a holistic view for the processing of distributed fast data.
Data Infrastructures for the Sciences and Humanities (DISH-Lab)
The mission of this lab is to enable researchers in the sciences and humanities to publish, share, and access data and algorithms, and to compose and execute data science workflows. A concrete goal is to research and build Agora, a collaborative platform for AI applications that manages all assets (data, algorithms, compute resources) needed to define and execute DM and ML workflows in both open and protected settings. We strive to enable collaborative and open innovation in earth observation, healthcare, and the digital humanities, and to contribute to citizen science.
Researchers in this lab include:
Jorge-Arnulfo Quiané-Ruiz, Volker Markl, Begüm Demir, Matteo Valleriani, Tim Conrad, Kaustubh Beedkar, Florian Schintke, Konrad Rieck, Ziawasch Abedjan, Sebastian Möller, Giuseppe Caire, Eleni Tzirita Zacharatou, Danh Le Phuoc
Inference Systems for the Sciences and Humanities Lab (SCI-Lab)
The mission of this lab is to develop ML methods to solve fundamental problems in the sciences and humanities. One exemplary aim is the efficient and accurate inference and simulation of molecular properties, which involves ML methods and ML-driven simulators for quantum mechanics, molecular mechanics, and statistical mechanics. A concrete goal is to be able to simulate protein-inhibitor systems in a computationally efficient way with an ML-driven accurate description of quantum effects and long-range interactions, within a rigorous statistical mechanics framework.
Researchers in this lab include:
Frank Noé, Klaus-Robert Müller, Shinichi Nakajima, Jürgen Renn, Stefan Chmiela, Christof Schütte, Gerd Graßhoff, Titus Kühne, Uwe Ohler, Roland Schwarz, Vladimir Spokoiny, Slawomir Stanczak, Martin Vingron, Jan Hermann, Kristof T. Schütt
Explainable AI Lab (XAI-Lab)
The mission of this lab is to strengthen the foundations of explainable AI. This includes developing the theoretical and algorithmic basis to systematically identify features that contribute to the output of an ML model, adapting these techniques to new practically relevant scenarios (e.g., deep learning, causal systems, multi-agent models) and building new foundations for problems, such as XAI-based uncertainty estimation and learning, or XAI-based data mining. Practical motivations will be provided by use-cases in biomedicine and the digital humanities.
Researchers in this lab include:
Klaus-Robert Müller, Grégoire Montavon, Wojciech Samek, Frederick Klauschen, Alexander Meyer, Matteo Valleriani, Shinichi Nakajima, Manfred Opper, Marina Marie-Claire Höhne
Watch this year’s conference presentations on BIFOLD’s YouTube channel!