AI research “made in Berlin”


Multi-million funding for AI research “made in Berlin”

Joint press release of the Senate Department for Higher Education and Research, Health, Long-Term Care and Gender Equality, BIFOLD and Technische Universität Berlin

Berlin’s AI competence center, the Berlin Institute for the Foundations of Learning and Data (BIFOLD) at Technische Universität Berlin (TUB), has now made the transition from project funding to permanent joint funding provided by the federal government and the State of Berlin. This establishes a national AI competence center in Berlin that will make an important contribution to the development and application of artificial intelligence. Through a partnership with Charité – Universitätsmedizin Berlin, BIFOLD is set to become a cross-university central institute in the near future.
BIFOLD will receive institutional funding of 22 million euros annually from 1 July 2022 as part of the federal-state funding program implementing the AI strategy. Half of the funding will be provided by the federal government, with the remainder coming from the State of Berlin.

BIFOLD focuses on researching the theoretical and algorithmic foundations of data management and machine learning as well as related technologies and systems.
Copyright: iStock

Senator for Higher Education and Research Ulrike Gote: “Permanent federal-state funding for BIFOLD as a national AI competence center represents a milestone in Berlin’s journey towards becoming a leading international center for AI research. It is also a recognition of the city’s innovative power as a center of science and research as well as of the importance of examining how artificial intelligence can improve our lives. I would like to thank TU Berlin for its contribution to this process and wish everyone involved every success on this exciting journey!”
Prof. Dr. Geraldine Rauch, president of TU Berlin: “By funding BIFOLD on the campus of Technische Universität Berlin, the federal government and the State of Berlin are supporting a unique nucleus in the area of artificial intelligence. BIFOLD brings together research, teaching and innovation at the interface between Big Data and machine learning and has already achieved a high international reputation. Funding research into the technological foundations of AI at public universities rather than just leaving it to private enterprises is the right approach for the future. As an integral part of our university, BIFOLD is also very well equipped for the essential task of linking the topic to issues affecting society. A further goal of the research center is to keep pace in the international competition for leading minds in the area of AI, and parallel to this to train the AI experts so urgently needed for the future. BIFOLD contributes significantly to Berlin as a center of science. I would like to congratulate everyone who made this major success possible. This is a great achievement for Berlin!”
Prof. Dr. Volker Markl, co-director at BIFOLD and professor at TU Berlin: “In BIFOLD, we focus on researching the theoretical and algorithmic foundations of data management and machine learning as well as related technologies and systems. It is precisely this research at the interface between machine learning and data management that will competitively drive the applications of artificial intelligence at international level. Our goal is to develop entirely novel economic as well as scientific and technical applications.”

Prof. Dr. Klaus-Robert Müller, co-director of BIFOLD and professor at TU Berlin: “What is important for us is to create new knowledge using AI, in other words to gain genuinely new insights in areas such as medicine and chemistry. Parallel to this, we are developing structures and open platforms for knowledge and technology transfer and thus helping to foster an effective environment for innovation in the Berlin metropolitan region. The Berlin university landscape offers both internationally recognized research groups, with whom we are already working closely, and many other relevant research institutions to examine and discuss the implications of AI for society.”

BIFOLD emerged in 2020 from two Federal Ministry of Education and Research competence centers at Technische Universität Berlin, the Berlin Big Data Center (BBDC) and the Berlin Center for Machine Learning (BCML), which were financed by federal government project funding from 2014 and 2018 respectively. BIFOLD is to be expanded with additional Berlin partner institutions. A cooperation agreement with the Charité – Universitätsmedizin Berlin to develop BIFOLD as a cross-university central institute is to be signed in the coming days.
At a meeting of the Joint Science Conference on 13 November 2020, the federal government and the states of Baden-Württemberg, Bavaria, Berlin, North Rhine-Westphalia and Saxony adopted a federal-state agreement to establish five national AI competence centers. Up until now, these centers, including BIFOLD, have been financed by project funding from the Federal Ministry of Education and Research. Starting 1 July 2022, they will receive joint institutional funding from the federal government and their own state (a total of 100 million euros per year for all five centers on a 50/50 basis).
A federal-state committee was set up to oversee the establishment of the centers and their further development as well as define the relevant framework conditions. The decision regarding the permanent funding level for all five centers was taken by the committee on 17 December 2021 following a science-led review process. The centers will receive total funding of between 19.2 and 22 million euros per year starting 2023.

Homogenization of knowledge


The Shared Scientific Identity of Europe

The project Sphere: Knowledge System Evolution and the Shared Scientific Identity of Europe is one of the leading Digital Humanities projects. It explores a large corpus of more than 350 book editions about geocentric cosmology and astronomy from the early days of printing between the 15th and the 17th centuries (the Sphaera Corpus), amounting to about 76,000 pages of material. The relatively large size of this humanities dataset presents a challenge to traditional historical approaches, but provides a great opportunity to computationally explore such a large collection of books. In this regard, the Sphere project is an incubator of multiple Digital Humanities (DH) approaches aimed at answering various questions about the corpus, with the ultimate objective of understanding the evolution and transmission of knowledge in the early modern period.

At the base of all the computational approaches within the Sphere project lies its large knowledge graph, modelled according to the CIDOC-CRM ontology. This ever-expanding knowledge graph contains detailed metadata on all the editions in the Sphaera corpus, as well as all the people involved in their composition and production. “Relying on this database, we were able to construct a multiplex network whose nodes represent the editions in our corpus, and whose edges represent various semantic relations,” says Prof. Dr. Matteo Valleriani, research group leader at the Max Planck Institute for the History of Science, PI at BIFOLD, Honorary Professor at TU Berlin and Professor by Special Appointment at Tel Aviv University. The detailed analysis of this multiplex network shed light on numerous influential and important editions within the Sphaera corpus.
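The multiplex-network idea can be pictured with a small, purely hypothetical sketch (the edition identifiers and relation names below are invented and this is not the project’s actual data model): editions become nodes, and every semantic relation between two editions becomes an edge in its own layer.

```python
# Hypothetical sketch of a multiplex network of editions, assuming networkx is available.
import networkx as nx

G = nx.MultiGraph()
G.add_nodes_from(["edition_1538_wittenberg", "edition_1535_seville", "edition_1550_paris"])

# One edge per semantic relation; the same pair of editions may be linked in several layers.
G.add_edge("edition_1538_wittenberg", "edition_1550_paris", layer="shared_text_part")
G.add_edge("edition_1538_wittenberg", "edition_1550_paris", layer="shared_table")
G.add_edge("edition_1535_seville", "edition_1550_paris", layer="shared_image")

# A crude proxy for influential editions: how many connections they have across all layers.
influence = sorted(G.degree, key=lambda node_degree: node_degree[1], reverse=True)
print(influence)
```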

Figure 1 – t-SNE visualization of the Sphaera table pages, represented as histograms by the bigram network. The two highlighted pages are numerically identical.
Copyright: project sphere

Some of these editions show disruptive behavior that exerted a significant long-term impact on the corpus; they are called enduring innovations. One example is a Spanish edition published in 1535 by Francisco Faleiro which, besides the basic cosmological doctrine of the time, also contains a compact report on pressing contemporary issues, such as the art of navigation and in particular the magnetic variations that made compass use on ships challenging during transoceanic voyages in the sixteenth century. Other editions instead played a major role in collecting information from past editions and passing it on to future ones; they are called great transmitters.
By looking at the content of the Sphaera editions, the scientists investigated two elements that often repeat across the corpus, astronomical tables and visual elements, and used them as proxies to better understand the evolution of knowledge. They first extracted all the pages containing a table from every edition of the corpus using a neural network, which amounted to roughly 10,000 pages. While this might sound trivial, the most difficult task was identifying similar tables: although the content of some tables might have remained the same, their designs potentially changed considerably. To solve this problem, the team used the histogram output of a bigram network to calculate similarity scores between the roughly 10,000 table pages (Fig. 1).
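As a rough illustration of the similarity step only (not the project’s code; the histograms below are random stand-ins for the bigram-network output), pairwise cosine similarity over the histogram vectors is one simple way to surface near-identical table pages such as the pair highlighted in Figure 1.

```python
# Minimal sketch: cosine similarity between histogram representations of table pages.
import numpy as np

rng = np.random.default_rng(0)
histograms = rng.random((5, 64))           # 5 hypothetical table pages, 64-bin histograms each

unit = histograms / np.linalg.norm(histograms, axis=1, keepdims=True)  # L2-normalize rows
similarity = unit @ unit.T                 # cosine similarity matrix, shape (5, 5)

# Near-identical pages show up as off-diagonal scores close to 1.
i, j = np.unravel_index(np.argmax(similarity - np.eye(5)), similarity.shape)
print(f"most similar pair: pages {i} and {j}, score {similarity[i, j]:.3f}")
```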

Figure 2 – Similar visual elements from various Sphaera editions. Visual elements with the same color coding are likely printed by the same woodblock.
Copyright: project sphere

This was further validated using BiLRP, an Explainable AI method, to ensure that the results are justifiable. “When it comes to visual elements, we developed a workflow to extract, compare, and analyze the over 30,000 visual elements. This involved a neural network architecture to extract them from the corpus’ pages, followed by a rigorous similarity analysis combining neural networks and standard computer vision approaches, the results of which were confirmed by domain experts. Using this workflow, we were able to identify numerous visual elements that were printed using the same woodblocks,” explains Matteo Valleriani (Fig. 2).

All this generated information eventually makes its way back into the Sphaera knowledge graph. With the increased amount of information representing each Sphaera edition (e.g. tables, images, text parts), the data scientist Hassan El-Hajj developed the cidoc2vec approach to leverage the knowledge graph structure and generate a vector representation of these editions. In this way, similarities between editions based on their stored metadata were highlighted, and several re-print clusters, in which a producer reprinted the same content with minimal changes over a long period of time, could also be easily identified.
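A toy sketch of just the downstream clustering step might look as follows (the random vectors stand in for cidoc2vec embeddings; this is not the cidoc2vec implementation itself): editions whose representations are nearly identical end up in the same cluster, which is how candidate re-print clusters surface.

```python
# Hedged sketch: cluster near-identical edition embeddings to find re-print candidates.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
base = rng.random((3, 32))                 # 3 hypothetical "original" editions
reprints = base.repeat(4, axis=0) + rng.normal(0, 0.01, (12, 32))  # 4 near-copies of each

labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(reprints)
print(labels)                              # editions sharing a label form one re-print cluster
```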

All these studies help to better understand the dynamics of the homogenization of knowledge, which can be described as a mechanism of imitation, centered on the reformed Wittenberg. It ensured that at the end of the 16th century, students of astronomy across Europe were learning the same concepts. “As a digital humanities incubator, the Sphere project plays a major role in re-defining how we think about studying history, how we deal with big humanities data, and how we can convert computationally obtained results into sound historical hypotheses” summarizes Matteo Valleriani.

The publication in detail:

Hassan El-Hajj, Maryam Zamani, Jochen Büttner, Julius Martinetz, Oliver Eberle, Noga Shlomi, Anna Siebold, Grégoire Montavon, Klaus-Robert Müller, Holger Kantz & Matteo Valleriani: An Ever-Expanding Humanities Knowledge Graph: The Sphaera Corpus at the Intersection of Humanities, Data Management, and Machine Learning, Datenbank Spektrum (2022).


BIFOLD Colloquium 04/07/2022


Algorithms for inferring cancer evolution from haplotype-specific somatic copy-number alterations

Speaker: Prof. Dr. Roland F. Schwarz, Center for Integrative Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), University Hospital and University of Cologne
Date & Time: July 4, 2022; 4:00 pm
Venue: TU Berlin, Straße des 17. Juni 135, 10623 Berlin, Main building, Room: H 1028

Abstract: Traditionally, phylogenetic inference methods have mostly focused on inferring evolutionary trees from single nucleotide variants (SNVs). In cancer, in addition to SNVs, genomic rearrangements and somatic copy number alterations (SCNAs) play an important role in the development and progression of the disease, and SCNAs can provide a rich source of genetic variation suitable for reconstructing cancer evolution. SCNAs thereby pose specific algorithmic challenges owing to the non-independence of adjacent genomic loci and their overlapping and cascading nature. Additionally, accurate phylogenetic inference from SCNAs requires identifying the parental chromosome of origin of each evolutionary event, a non-trivial task in many short-read sequencing datasets.
Roland Schwarz will introduce two key algorithms for inferring cancer evolution from SCNA profiles derived from multi-region sequencing data: refphase, which identifies the parental chromosome of origin of SCNAs by leveraging heterozygous germline variants shared between multiple samples from the same patient, and MEDICC2, a complete phylogenetic inference algorithm for SCNA profiles. He will describe how his group overcomes the problem of phasing SCNAs, leverages finite-state transducers to compute exact minimum event distances between pairs of SCNA profiles, and detects, orders and places individual evolutionary SCNA events, including whole-genome doublings, on the phylogenetic tree. Finally, he will demonstrate how, in the future, the event histories inferred by MEDICC2 might be leveraged by machine learning algorithms to derive copy-number signatures that can identify the mutational processes underlying individual cancer types.
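To give a flavor of the minimum-event-distance idea ahead of the talk, the toy sketch below counts how many contiguous single-copy gains or losses are needed to turn one copy-number profile into another. It deliberately ignores everything that makes the real problem hard and that MEDICC2’s finite-state transducers actually handle, such as whole-genome doublings, zero-copy constraints and haplotype phasing.

```python
# Toy illustration of a minimum event distance between two copy-number profiles.
# Each event adds or removes one copy over a contiguous run of loci.
def min_event_distance(source, target):
    diff = [t - s for s, t in zip(source, target)]
    gains = [max(d, 0) for d in diff]      # copies still to be gained per locus
    losses = [max(-d, 0) for d in diff]    # copies still to be lost per locus

    def events(profile):
        # Every maximal "step up" in the profile requires a new contiguous event.
        prev, total = 0, 0
        for value in profile:
            total += max(0, value - prev)
            prev = value
        return total

    return events(gains) + events(losses)

# One haplotype-specific profile over 8 loci: an amplification and a deletion -> 2 events.
print(min_event_distance([2, 2, 2, 2, 2, 2, 2, 2],
                         [2, 3, 3, 3, 2, 1, 1, 2]))
```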

Speaker: Dr. Roland F. Schwarz is Professor for Computational Cancer Biology at the Center for Integrative Oncology (CIO), Cancer Research Center Cologne Essen (CCCE), University of Cologne, Germany. He is a renowned expert in machine learning, theoretical computer science and clinical oncology. He uses machine learning and statistical algorithms to explore the causes and functional consequences of differences in tumors and cancer evolution. In 2016, Schwarz was awarded the Prize of the Berlin-Brandenburg Academy of Sciences for excellence in cancer research for his work on cancer genome evolution.

Prof. Dr. Roland Schwarz
Copyright: private

Measuring the active brain


Lecture: A new set of paradigms and machine learning algorithms to understand single-pulse electrical stimulation in the human brain

Speaker: Dr. Dora Hermes Miller and Dr. Kai Miller, Mayo Clinic, Rochester, USA

Venue: TU Berlin, Main building, Straße des 17. Juni 135, 10623 Berlin, Room: H 1028

Date & time: June 15, 2022, 3 pm

Title: A new set of paradigms and machine learning algorithms to understand single-pulse electrical stimulation in the human brain

Abstract: Brain networks can be explored by delivering brief pulses of electrical current in one area while measuring voltage responses in other areas. We outline a set of paradigms to study these data, beginning with a “convergent” paradigm to study brain dynamics, focusing on a single brain site to observe the average effect of stimulating each of many other brain sites. Viewed in this manner, visually-apparent motifs in the temporal response shape emerge from adjacent stimulation sites. This work constructs and illustrates a data-driven machine learning approach to determine characteristic spatiotemporal structure in these response shapes, summarized by a set of unique “basis profile curves” (BPCs). Each BPC may be mapped back to underlying anatomy in a natural way, quantifying projection strength from each stimulation site using simple metrics. We then illustrate how this BPC formulation is useful for understanding projections to the collateral sulcus from limbic and temporal lobe structures. Our work then transitions to a “divergent” paradigm, describing a new type of parameterization of single pulse responses that we call “canonical response parameterization” (CRP). The CRP approach allows for robust statistical characterization of stimulation responses, regardless of what the shape of the response is. Our techniques are demonstrated for an array of implanted brain surface electrodes in human patients. These frameworks enable straightforward interpretation of single-pulse brain stimulation data, and can be applied generically to explore the diverse milieu of interactions that comprise the connectome.
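A generic sketch of the kind of unsupervised decomposition involved, not the published BPC or CRP algorithms and using purely synthetic data, is a low-rank factorization of the stimulation-site-by-time response matrix: it recovers a few characteristic temporal profiles together with a projection strength per stimulation site.

```python
# Generic sketch: recover characteristic temporal profiles from synthetic stimulation responses.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)                                 # 1 s of response sampled at 500 Hz
motif_a = np.exp(-t / 0.05) * np.sin(2 * np.pi * 10 * t)   # fast, oscillatory motif
motif_b = np.exp(-t / 0.3)                                 # slow, monophasic motif

# 40 stimulation sites, each responding as a noisy mix of the two motifs.
weights = rng.random((40, 2))
responses = weights @ np.vstack([motif_a, motif_b]) + rng.normal(0, 0.05, (40, t.size))

pca = PCA(n_components=2)
site_weights = pca.fit_transform(responses)   # per-site projection strengths
profiles = pca.components_                    # recovered temporal profiles
print(site_weights.shape, profiles.shape)     # (40, 2) (2, 500)
```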

Speaker:

Dora Hermes Miller, Ph.D., studies the signals measured in the living human brain in order to identify biomarkers of neurological and neuropsychiatric diseases and develop neuroprosthetics to interface with the brain. There are many different ways to measure the function of the living human brain, such as magnetic resonance imaging (MRI) and field potential recordings. Dr. Hermes uses a human systems neuroscience approach, including multimodal imaging and computational modeling, to advance fundamental understanding of the signals that can be measured in the human brain. Electrical stimulation and brain-machine interfaces are used to influence neuronal population activity and understand whether it is possible to restore typical brain function in neurological and neuropsychiatric diseases.

Dr. Hermes collaborates with neurologists and neurosurgeons to understand the extent to which smart sensing (predicting and stimulating brain activity) can result in new therapeutic devices that control epileptic brain activity.

Kai J. Miller, M.D., Ph.D., is a pediatric and adult neurosurgeon who specializes in brain tumors, epilepsy, and deep-brain stimulation. He manages patients both in the operating room and in the outpatient setting. His research involves measuring electrical brain activity in patients undergoing therapy with implanted electrodes, in order to understand brain circuit dynamics and develop new therapies. In addition to brain tumors, epilepsy, and deep brain stimulation, Dr. Miller manages and treats general pediatric neurosurgical conditions.
On top of his active clinical and research programs, Dr. Miller teaches graduate students and mentors neurosurgical residents.

Availability of green energy controls IT


Flexible adjustment of computing workloads improves the carbon footprint of data centers

More and more data centers are connected to microgrids that provide renewable energy.
Copyright: Pixabay

The growing electricity demand of IT infrastructure will soon have a considerable impact on the environment. To reduce their carbon footprint, more and more computing systems are therefore connected to microgrids to gain direct access to renewable energy sources. However, the local availability of solar and wind energy is highly variable and requires consumers to adapt their consumption to the current supply in a timely manner. Researchers from the Berlin Institute for the Foundations of Learning and Data (BIFOLD) have developed a new admission control approach that accepts flexible workloads, such as machine learning training jobs, only if they can be computed relying solely on renewable excess energy. Their publication “Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads” will be presented at Euro-Par 2022.

As the demand for computing continues to grow year by year, so do the operating expenses and the associated carbon emissions caused by consuming energy from the public power grid. “Data centers already account for more than 1% of global energy consumption and this number is expected to rise further – especially when considering the additional demand of emerging domains like the internet of things, edge and fog computing,” says Prof. Dr. Odej Kao. One approach towards more sustainable and cost-effective cloud and edge computing systems is directly equipping IT infrastructure with on-site renewable energy sources like solar or wind. However, smaller compute nodes in particular, such as on-premise installations or edge data centers, are not always able to consume all generated power at all times, resulting in so-called excess energy. This problem can only partially be mitigated by energy storage or by leveling consumption within microgrids.

The researchers propose to schedule flexible workloads.
Copyright: P. Wiesner

To make better use of renewable excess energy at compute nodes, and hence reduce the associated carbon emissions and electricity costs, researchers at TU Berlin and the University of Glasgow have proposed a new approach to scheduling flexible workloads, meaning workloads that tolerate some delay in execution. Flexible workloads are common in cloud environments but can also occur in otherwise time-critical edge computing environments. For example, in many edge intelligence scenarios like traffic management, devices adapt to continuously evolving environments by iteratively re-training local machine learning models on new data. Yet, the exact time and extent of such training jobs are subject to some flexibility.

In their publication, Philipp Wiesner, Thorsten Wittkopp and the BIFOLD researchers Dominik Scheinert, Dr. Lauritz Thamsen and Prof. Dr. Odej Kao suggest that, under certain conditions, flexible workloads should be rejected upfront by a computing system if they cannot be computed using renewable excess energy only. Their proposed system forecasts the free computational capacity of a node as well as its expected energy consumption and production in order to determine whether additional jobs can be accepted into the node’s workload queue without violating any deadlines and without consuming energy from the public grid. Using probabilistic forecasting, the admission control can be configured to be strict, accepting only workloads that are almost guaranteed to run on green energy, or more relaxed, not completely ruling out potential grid power usage if this allows more workloads to be processed.
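The admission decision itself can be sketched in a few lines. This is a hedged toy, not the Cucumber implementation: all numbers are invented, and the safety_factor knob merely stands in for the probabilistic forecasts that let the real system trade off strict and relaxed admission.

```python
# Toy admission check: accept a flexible job only if forecast excess energy covers it.
def admit(job_energy_wh, deadline_hours, excess_forecast_wh, safety_factor=1.0):
    """excess_forecast_wh[i] is the expected renewable excess energy (Wh) in hour i."""
    available = sum(excess_forecast_wh[:deadline_hours])
    return job_energy_wh * safety_factor <= available

excess_forecast = [0, 0, 100, 250, 200, 80, 0, 0]   # e.g. a sunny midday window
print(admit(job_energy_wh=500, deadline_hours=6,
            excess_forecast_wh=excess_forecast))                     # relaxed setting: accepted
print(admit(job_energy_wh=500, deadline_hours=6,
            excess_forecast_wh=excess_forecast, safety_factor=1.5))  # stricter setting: rejected
```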

“Increased deployment of renewables requires flexibility of electricity consumers, which is not always easy to coordinate in highly distributed and heterogeneous systems,” explains Philipp Wiesner. “We expect our approach to be an integral building block in exploiting the varying availability of renewable energy in computing through local decision-making.”

The publication in detail:

Philipp Wiesner, Dominik Scheinert, Thorsten Wittkopp, Lauritz Thamsen, Odej Kao: “Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads”, Proceedings of the 28th International European Conference on Parallel and Distributed Computing (Euro-Par), 2022.

Abstract

The growing electricity demand of cloud and edge computing increases operational costs and will soon have a considerable impact on the environment. A possible countermeasure is equipping IT infrastructure directly with on-site renewable energy sources. Yet, particularly smaller data centers may not be able to use all generated power directly at all times, while feeding it into the public grid or energy storage is often not an option. To maximize the usage of renewable excess energy, we propose Cucumber, an admission control policy that accepts delay tolerant workloads only if they can be computed within their deadlines without the use of grid energy. Using probabilistic forecasting of computational load, energy consumption, and energy production, Cucumber can be configured towards more optimistic or conservative admission.
We evaluate our approach on two scenarios using real solar production forecasts for Berlin, Mexico City, and Cape Town in a simulation environment. For scenarios where excess energy was actually available, our results show that Cucumber’s default configuration achieves acceptance rates close to the optimal case and causes 97.0 % of accepted workloads to be powered using excess energy, while more conservative admission results in 18.5 % reduced acceptance at almost zero grid power usage.

Two Demo Papers accepted at VLDB


Two demonstration papers by BIFOLD researchers have been accepted at the 48th International Conference on Very Large Data Bases (VLDB). VLDB 2022 will take place in Sydney, Australia (and hybrid) from September 5 to 9, 2022.

DORIAN in action: Assisted Design of Data Science Pipelines

Authors: Sergey Redyuk, Zoi Kaoudi, Sebastian Schelter, Volker Markl

Abstract: Existing automated machine learning solutions and intelligent discovery assistants are popular tools that facilitate the end-user with the design of data science (DS) pipelines. However, they yield limited applicability for a wide range of real-world use cases and application domains due to (a) the limited support of DS tasks; (b) a small, static set of available operators; and (c) restriction to evaluation processes with objective loss functions. We demonstrate DORIAN, a human-in-the-loop approach for the assisted design of data science pipelines that supports a large and growing set of DS tasks, operators, and arbitrary user-defined evaluation procedures. Based on the user query, i.e., a dataset and a DS task, DORIAN computes a ranked list of candidate pipelines that the end-user can choose from, alter, execute and evaluate. It stores executed pipelines in an experiment database and utilizes similarity-based search to identify relevant previously-run pipelines from the experiment database. DORIAN also takes user interaction into account to improve suggestions over time. We show how users can interact with DORIAN to create and compare DS pipelines on various real-world DS tasks we extracted from OpenML.
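The similarity-search idea behind the experiment database can be pictured with a toy that is not DORIAN’s code: pipeline names and operator sets below are invented, and previously executed pipelines are simply ranked by how much their operators overlap with a new candidate.

```python
# Toy sketch: rank previously executed pipelines by Jaccard similarity of their operator sets.
def jaccard(a, b):
    return len(a & b) / len(a | b)

experiment_db = {
    "run_017": {"impute_mean", "standard_scaler", "random_forest"},
    "run_042": {"one_hot_encoder", "standard_scaler", "logistic_regression"},
    "run_103": {"impute_mean", "pca", "svm"},
}
candidate = {"impute_mean", "standard_scaler", "gradient_boosting"}

ranked = sorted(experiment_db, key=lambda run: jaccard(candidate, experiment_db[run]), reverse=True)
print(ranked)   # most relevant previously-run pipelines first
```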

Satellite Image Search in AgoraEO

Authors: Ahmet Kerem Aksoy, Pavel Dushev, Eleni Tzirita Zacharatou, Holmer Hemsen, Marcela Charfuelan, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, Volker Markl

Abstract: The growing operational capability of global Earth Observation (EO) creates new opportunities for data-driven approaches to understand and protect our planet. However, the current use of EO archives is very restricted due to the huge archive sizes and the limited exploration capabilities provided by EO platforms. To address this limitation, we have recently proposed MiLaN, a content-based image retrieval approach for fast similarity search in satellite image archives. MiLaN is a metric-learning-based deep hashing network that encodes high-dimensional image features into compact binary hash codes. We then use these codes as keys in a hash table to enable real-time nearest neighbor search and highly accurate retrieval. In this demonstration, we showcase the efficiency of MiLaN by integrating it with EarthQube, a browser and search engine within AgoraEO. EarthQube supports interactive visual exploration and Query-by-Example over satellite image repositories. Demo visitors will interact with EarthQube playing the role of different users that search images in a large-scale remote sensing archive by their semantic content and apply other filters.
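The retrieval step described in the abstract can be illustrated with a small, hedged sketch (not MiLaN itself; the binary codes below are random stand-ins for the learned hashes): nearest neighbors are simply the archive images whose codes lie at the smallest Hamming distance from the query’s code.

```python
# Sketch of Hamming-distance retrieval over binary hash codes.
import numpy as np

rng = np.random.default_rng(0)
archive_codes = rng.integers(0, 2, size=(100_000, 64), dtype=np.uint8)        # 64-bit codes
query_code = archive_codes[42] ^ (rng.random(64) < 0.05).astype(np.uint8)     # near-duplicate query

hamming = (archive_codes != query_code).sum(axis=1)   # distance from the query to every image
top10 = np.argsort(hamming)[:10]                      # indices of the ten closest archive images
print(top10, hamming[top10])
```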

Advanced urban remote sensing


BIFOLD Lecture: Advanced urban remote sensing

Prof. Dr. Paolo Gamba from the University of Pavia, Italy, will visit BIFOLD to give an invited 8-hour lecture during the week of June 13-17. The former President of the IEEE Geoscience and Remote Sensing Society is a guest of Prof. Dr. Begüm Demir, BIFOLD group lead and Professor for Remote Sensing Image Analysis at TU Berlin. The lecture will introduce Earth observation (EO) data sources and their characteristics for urban monitoring.
The courses are open to everybody and take place in different locations on TU Berlin’s main campus.

Date, Time & Location: 

June 13, 10-11.30 am, H 2032
June 16, 10-11.30 am, HE 101
June 17, 10-11.30 am, A 151 & 11.45 am – 1.15 pm, A 151

Content:

  • Introduction to urban remote sensing: principles of urban remote sensing, analysis of urban areas at different scales, using different sensors and aiming at different applications.
  • Urban area extraction using radar data: methods and techniques for urban area extraction using radar data, starting with coarse resolution data and moving to high resolution and very high resolution data. Applications at global, regional and local scales. Fusion of multiple SAR data sets on the same area.
  • Urban area extraction using multispectral data: techniques for urban area extraction from Landsat and Sentinel-2 data. Challenges related to global data set analysis and processing. The possibilities offered by cloud computing (with examples from Google Earth Engine).
  • Urban data fusion: how to use multiple sensors, how to select and integrate multiple data sources, how to exploit social media, images from the ground, sensor networks and geospatial data.
  • Urban area risk applications: examples of risk applications in urban areas, considering remote sensing data and extracted information as an input. Focus on disaster exposure, vulnerability and damage assessment.

Speaker:

Paolo Gamba is Professor at the University of Pavia, Italy, where he leads the Telecommunications and Remote Sensing Laboratory. He received the Laurea degree in Electronic Engineering “cum laude” from the University of Pavia, Italy, in 1989, and the Ph.D. in Electronic Engineering from the same University in 1993.
He served as Editor-in-Chief of the IEEE Geoscience and Remote Sensing Letters from 2009 to 2013, and as Chair of the Data Fusion Committee of the IEEE Geoscience and Remote Sensing Society (GRSS) from October 2005 to May 2009. He has been an elected member of the GRSS AdCom since 2014, served as GRSS President from 2019 to 2020, and is currently GRSS Junior Past President.
He has been the organizer and Technical Chair of the biennial GRSS/ISPRS Joint Workshops on “Remote Sensing and Data Fusion over Urban Areas” from 2001 to 2015. He also served as Technical Co-Chair of the 2010, 2015 and 2020 IGARSS conferences, in Honolulu (Hawaii), Milan (Italy), and online, respectively.
He has been invited to give keynote lectures and tutorials on several occasions about urban remote sensing, data fusion, and EO data for physical exposure and risk management. He has published more than 170 papers in international peer-reviewed journals and presented 310 research works at workshops and conferences.

Prof. Paolo Gamba
Copyright: private

Beyond Explainable AI


Wojciech Samek and Klaus-Robert Müller publish new book on Explainable AI

To tap the full potential of artificial intelligence, we not only need to understand the decisions it makes; these insights must also be made applicable. This is the aim of the new book “xxAI – Beyond Explainable AI”, edited by Wojciech Samek, head of the Artificial Intelligence department at the Fraunhofer Heinrich Hertz Institute (HHI) and BIFOLD researcher, and Klaus-Robert Müller, professor of machine learning at Technische Universität Berlin (TUB) and co-director of BIFOLD. The publication is based on a workshop held during the International Conference on Machine Learning in 2020. Co-editors also include the AI experts Andreas Holzinger, Randy Goebel, Ruth Fong and Taesup Moon. It is already the second publication on this topic by Samek and Müller.

Following the great resonance of the editors’ first book, “Explainable AI: Interpreting, Explaining and Visualizing Deep Learning” (2019), which presented an overview of methods and applications of Explainable AI (XAI) and racked up over 300,000 downloads worldwide, the new publication goes a step further: it provides an overview of current trends and developments in the field of XAI. In one chapter, for example, Samek and Müller’s team shows that XAI concepts and methods developed for explaining classification problems can also be applied to other types of problems. In classification problems, the target variables are categorical, such as “What color is the traffic light right now: red, yellow, or green?”. XAI techniques developed for such problems can also help explain models in unsupervised learning, reinforcement learning, or generative modeling. Thus, the authors expand the horizons of previous XAI research and provide researchers and developers with a set of new tools that can be used to explain a whole new range of problem types and models.
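Purely as an illustration (this example is not taken from the book), the same simple gradient-times-input attribution often used for classifiers can just as well be applied to an unsupervised objective, here the reconstruction loss of a tiny tied-weight autoencoder, to see which input features drive the result.

```python
# Illustrative sketch: gradient x input attribution for an unsupervised reconstruction loss.
import torch

torch.manual_seed(0)
x = torch.randn(8, requires_grad=True)
W = torch.randn(3, 8)                      # encoder weights of a tiny tied-weight autoencoder
reconstruction = W.T @ (W @ x)             # decode(encode(x)) with tied weights
loss = ((reconstruction - x) ** 2).sum()   # unsupervised objective, no class labels involved

loss.backward()
relevance = (x.grad * x).detach()          # per-feature attribution of the loss
print(relevance)
```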

The book is available free of charge.
Copyright: Fraunhofer HHI

As the title “Beyond Explainable AI” suggests, the book also highlights how methodological insights can be applied in practice to make models more robust and efficient. While previous research has focused on the path from AI as a “black box” to explaining its decisions, several chapters in the new book address the next step: toward an improved AI model. Furthermore, other authors reflect on their research not only within their own field of work, but also in the context of society as a whole. They cover a variety of areas that go far beyond classical XAI research, for example the relationships between explainability and fairness, explainability and causality, and legal aspects of explainability.

The book is available free of charge here.

BIFOLD Colloquium 2022/05/20


Machine Learning for Remote Sensing Applications powered by Modular Supercomputing Architectures

Speaker: Dr. Gabriele Cavallaro, Forschungszentrum Jülich

Venue: TU Berlin, Architekturgebäude, Straße des 17. Juni 152, 10623 Berlin, Room: A151

Date & time: May 20, 2022, 2 pm

Title: Machine Learning for Remote Sensing Applications powered by Modular Supercomputing Architectures

Abstract:
Supercomputers are unique computing environments with extremely high computational capabilities. They are able to solve problems and perform calculations which require more speed and power than traditional computers are capable of. In particular, they represent a concrete solution for data-intensive applications as they can boost the performance of processing workflows with more efficient access to and scalable processing of extremely large data sets.
This talk will first give an overview of the work and research activities of the “AI and ML for Remote Sensing” Simulation and Data Lab hosted at the Jülich Supercomputing Centre (JSC). Then, it will introduce the Modular Supercomputing Architecture (MSA) systems that are operated by the JSC. An MSA is a computing environment that integrates heterogeneous High Performance Computing (HPC) systems, which can include different types of accelerators (e.g., GPUs, FPGAs) and cutting-edge computing technologies (e.g., quantum and neuromorphic computing), and that is “modularized” by its software stack. The presentation will finally include different examples from Remote Sensing applications that can exploit MSA to drastically reduce the time to solution and provide users with timely and valuable information.

Speaker:
Dr. Gabriele Cavallaro is the Head of the “AI and ML for Remote Sensing” Simulation and Data Lab at the Jülich Supercomputing Centre, Forschungszentrum Jülich, Germany. He is currently the Chair of the “High-Performance and Disruptive Computing in Remote Sensing” (HDCRS) Working Group of the IEEE GRSS ESI Technical Committee and Visiting Scientist at the Φ-Lab of the European Space Agency. His research interests cover remote sensing data processing with parallel machine learning methods that scale on cutting-edge distributed computing technologies.


ICDE 2022 Best Demo Award


A framework to efficiently create training data for optimizers

The demo paper “Farming Your ML-based Query Optimizer’s Food”, co-authored by a group of BIFOLD researchers and presented at the virtual ICDE 2022 conference, has won the Best Demo Award. The award committee members unanimously chose this demonstration based on the relevance of the problem, the high potential of the proposed approach and the excellent presentation.

As machine learning is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities, it is critical to collect large amounts of labeled training data to build these machine learning models. The training data should consist of diverse query plans with their labels (execution time or cardinality). However, collecting such a training dataset is a very tedious and time-consuming task: it requires both developing numerous plans and executing them to acquire ground-truth labels. The latter can take days if not months, depending on the size of the data.

In a research paper presented last year at SIGMOD 2021, the authors introduced DataFarm, a framework for efficiently creating training data for optimizers with learning-based components. The demo paper extends DataFarm with an intuitive graphical user interface that allows users to get informative details of the generated plans and guides them through the generation process step by step. As an output of DataFarm, users can download both the generated plans, to use as a benchmark, and the training data (jobs with their labels).
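One way to picture the active-learning loop is the generic, hedged sketch below, which is not DataFarm’s code and uses synthetic plan features and runtimes: in each round, only the generated jobs whose runtime the current model is most unsure about are actually executed.

```python
# Generic active-learning sketch: execute only the jobs with the most uncertain predicted labels.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
plan_features = rng.random((500, 16))                      # synthetic plan feature vectors
true_runtime = plan_features @ rng.random(16) + rng.normal(0, 0.05, 500)

labeled = list(range(20))                                  # small seed set of executed jobs
for _ in range(5):                                         # a few active-learning rounds
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(plan_features[labeled], true_runtime[labeled])

    # Uncertainty proxy: spread of the individual trees' predictions.
    per_tree = np.stack([tree.predict(plan_features) for tree in model.estimators_])
    uncertainty = per_tree.std(axis=0)
    uncertainty[labeled] = -1.0                            # never re-select executed jobs
    labeled.extend(np.argsort(uncertainty)[-10:].tolist()) # "execute" the 10 most uncertain jobs

print(len(labeled), "jobs executed instead of 500")
```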


The publication in detail:

Robin van de Water, Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl: “Farming Your ML-based Query Optimizer’s Food”, ICDE 2022 (to appear).

Abstract

Machine learning (ML) is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities. This means large heterogeneous sets of labeled query plans or jobs (i.e., plans with their runtime or cardinality output) are needed. However, collecting such a training dataset is a very tedious and time-consuming task: It requires both developing numerous jobs and executing them to acquire ground-truth labels. We demonstrate DATAFARM, a novel framework for efficiently generating and labeling training data for ML-based query optimizers to overcome these issues. DATAFARM enables generating training data tailored to users’ needs by learning from their existing workload patterns, input data, and computational resources. It uses an active learning approach to determine a subset of jobs to be executed and encloses the human into the loop, resulting in higher quality data. The graphical user interface of DATAFARM allows users to get informative details of the generated jobs and guides them through the generation process step-by-step. We show how users can intervene and provide feedback to the system in an iterative fashion. As an output, users can download both the generated jobs to use as a benchmark and the training data (jobs with their labels).