Using Math to Reduce Energy Consumption

Prof. Dr. Klaus-Robert Müller
(© Christian Kielmann)

Klaus-Robert Müller, professor of Machine Learning at TU Berlin and Co-Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD), discusses computation time as a climate killer and his predictions for science in 80 years.

Professor Müller, in our conversation prior to this interview about your vision for the future of the computer on the 80th anniversary of the invention of the Z3, you mentioned energy conservation as one of the major challenges we face. Why is this?

The world’s computer centers are major emitters of CO2. Huge amounts of fossil energy are still being used to power them. More and more calculations are performed, and the computation time required for these is increasing. It is not enough for us to go on Fridays for Future marches. We all have to try to do something in the areas where we have direct influence.

So, the work of your research group focuses directly on this topic?

Yes, but even more so our research at the Berlin Institute for the Foundations of Learning and Data, or BIFOLD for short, which was set up in 2020 as a part of the federal government’s AI strategy.

Where do you see possible solutions to significantly reduce the energy consumption of computer centers?

Solving a known image recognition problem uses about as much energy as a four-person household over a period of three months. One approach is to save computation time by using a different mathematical method. This could reduce energy consumption to the level of a four-person household for two months while achieving the same result. A greater saving would of course be better. We need to develop energy-saving methods of computing for AI. Data traffic for distributed learning requires a great deal of energy, so we are also looking to minimize this. My team has been able to demonstrate how smart mathematical solutions can reduce the requirement for data transfer from 40 terabytes to 5 or 6 gigabytes. Getting a good result is no longer the only issue; how you achieve that result is becoming increasingly important.
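
To make the data-traffic reduction concrete: one common mathematical trick in distributed learning is to transmit only the largest gradient entries instead of full gradients. The sketch below shows generic top-k gradient sparsification in Python; it is an illustrative example with assumed tensor shapes and sparsity ratio, not the specific method Müller's team used.

```python
import numpy as np

def topk_sparsify(gradient, ratio=0.01):
    """Keep only the largest-magnitude entries of a gradient tensor.

    Generic illustration of communication reduction in distributed
    learning: only the surviving (index, value) pairs are transmitted.
    """
    flat = gradient.ravel()
    k = max(1, int(flat.size * ratio))
    idx = np.argpartition(np.abs(flat), -k)[-k:]  # indices of the k largest magnitudes
    return idx, flat[idx]

# Hypothetical layer gradient: 4096 x 1024 float32 weights (~16 MB dense)
grad = np.random.randn(4096, 1024).astype(np.float32)
indices, values = topk_sparsify(grad, ratio=0.01)
print(f"transmit {indices.nbytes + values.nbytes} bytes instead of {grad.nbytes}")
```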

What for you were the most important milestones in the development of the computer over the past 80 years?

For me, it all began with Konrad Zuse and the Z3. I am fascinated by how this computer with its three arithmetic calculations and a memory of just 64 words was able to give rise to the supercomputer. In the 1950s and 60s, some people were still able to perform calculations faster than computers. At the beginning of the 90s, around the time I received my doctorate, the first workstations became available. These marked the end of the time when you had to log on to a mainframe computer. In 1994, while working as a postdoc in the USA, I had the opportunity to perform calculations using a supercomputer, the Connection Machine CM-5. The most recent major step is the graphics processing unit, or GPU for short. These graphics processors not only give you a mini supercomputer at your daily disposal for a small cost; their architecture also makes them ideal for machine learning and for training large neural network models. This has led to many new scientific developments, which today form part of our lives. It really fascinates me how we have progressed in such a short time from a situation where people could perform calculations faster than a computer to one where I have a supercomputer under my desk. Although supercomputers aren’t everything.

How do you mean?

Three decades ago, I published a paper with another student on the properties of a neural network. There was another researcher working on the same topic who, unlike us, had access to a Cray supercomputer. We had to perform our calculations on a workstation. Well, we adapted our algorithm to this hardware and were able to achieve the same results using a simple computer as our colleague with access to the Cray X-MP. This was greeted with amazement in our field. What I am getting at is that you can sometimes achieve good results with simpler equipment if you use a little more creativity.

This year marks the 80th anniversary of the invention of the computer. Are you able to predict what may be possible in the area of machine learning, in other words your area of research, within the next 80 years?

What I would say is that machine learning will become a standard tool in industry and science, for the humanities as well as the natural sciences and medicine. To make this happen, we now have to train a generation of researchers who not only use these tools but also understand their underlying principles, so as to prevent improper use of machine learning and thus false scientific findings. This includes an understanding of big data, as the data volumes required in science are becoming ever larger. These two areas – machine learning and big data – will become more and more closely connected with each other as well as with their areas of application. And this brings me back to BIFOLD: We see both areas as a single entity linked to its applications, and it is precisely on this basis that we have now started to train a new generation of researchers.

Interview: Sybille Nitsche

ICDE 2021 honors BIFOLD researchers with Best Paper Award

The 37th IEEE International Conference on Data Engineering (ICDE) 2021 honored the paper “Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance” by six BIFOLD researchers with the Best Paper Award. Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Lorand Madai-Tahy, Jorge-Arnulfo Quiané-Ruiz and Volker Markl were honored during the award session of the conference on April 21, 2021.

In modern data analytics, companies often want to analyze large datasets. For example, a company might want to analyze its entire network of user interactions in order to better understand how its products are used. Scaling data analysis to large datasets is a widespread need in many different contexts, and modern dataflow systems, such as Apache Flink and Apache Spark, are widely used to meet it. However, the algorithms used for data analysis are becoming more and more complex. Complex algorithms are often iterative in nature, meaning that they gradually refine their results by repeatedly executing a computation. A well-known example is the PageRank algorithm, which ranks the importance of nodes in a network, for example ranking websites in Google search results. Both Apache Flink and Apache Spark have weaknesses when implementing iterative algorithms: they are either hard to use or deliver suboptimal performance.
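
The iterative pattern described above can be made concrete with a minimal PageRank power iteration in plain Python/NumPy; this is a generic illustration of the kind of loop that dataflow systems must execute repeatedly, not code from the awarded paper, and the damping factor and tolerance are illustrative choices.

```python
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-8, max_iter=100):
    """Minimal PageRank via power iteration on a dense adjacency matrix.

    adj[i, j] = 1 if node i links to node j. In a dataflow system this
    loop would be distributed across a cluster.
    """
    n = adj.shape[0]
    out_degree = adj.sum(axis=1, keepdims=True)
    out_degree[out_degree == 0] = 1           # avoid division by zero for sink nodes
    transition = adj / out_degree             # row-stochastic transition matrix
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):                 # the control-flow loop at issue
        new_rank = (1 - damping) / n + damping * (transition.T @ rank)
        if np.abs(new_rank - rank).sum() < tol:
            break                             # stop once the ranks have converged
        rank = new_rank
    return new_rank

# Tiny three-node example: 0 -> 1, 1 -> 2, 2 -> 0
print(pagerank(np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)))
```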

The paper introduces a new system, Mitos, which combines an easy-to-use language with efficient execution. It keeps the language simple by relying on techniques from the programming-language research literature in addition to the database and distributed-systems literature that earlier systems relied on. The simpler language makes it easy for users to run advanced analytics on large datasets. This is important for data scientists, who can then concentrate on the analytics instead of needing to become experts in the internal workings of these systems.

The annual IEEE International Conference on Data Engineering (ICDE) is the flagship IEEE conference addressing research issues in designing, building, managing, and evaluating advanced data-intensive systems and applications. For over three decades, IEEE ICDE has been a leading forum for researchers, practitioners, developers, and users to explore cutting-edge ideas and to exchange techniques, tools, and experiences.

The paper in detail:
“Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance”

Authors:
Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Lorand Madai-Tahy, Jorge-Arnulfo Quiané-Ruiz, Volker Markl

Abstract:
Modern data analysis tasks often involve control flow statements, such as iterations. Common examples are PageRank and K-means. To achieve scalability, developers usually implement data analysis tasks in distributed dataflow systems, such as Spark and Flink. However, for tasks with control flow statements, these systems still either suffer from poor performance or are hard to use. For example, while Flink supports iterations and Spark provides ease-of-use, Flink is hard to use and Spark has poor performance for iterative tasks. As a result, developers typically have to implement different workarounds to run their jobs with control flow statements in an easy and efficient way. We propose Mitos, a system that achieves the best of both worlds: it achieves both high performance and ease-of-use. Mitos uses an intermediate representation that abstracts away specific control flow statements and is able to represent any imperative control flow. This facilitates building the dataflow graph and coordinating the distributed execution of control flow in a way that is not tied to specific control flow constructs. Our experimental evaluation shows that the performance of Mitos is more than one order of magnitude better than systems that launch new dataflow jobs for every iteration step. Remarkably, it is also up to 10.5 times faster than Flink, which has native iteration support, while matching the ease-of-use of Spark.

Publication:
To be published in the Proceedings of the 37th IEEE International Conference on Data Engineering, ICDE 2021, April 19 – 22
Preprint

BTW 2021 Best Paper Award and Reproducibility Badge for TU Berlin Data Science Publication

The research paper “Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing” by Alexander Kumaigorodski, Clemens Lutz, and Volker Markl received the Best Paper Award of the 19th Symposium on Database Systems for Business, Technology and Web (BTW 2021). In addition, the paper received the Reproducibility Badge, awarded for the first time at BTW 2021, for the high reproducibility of its results.

TU Berlin Master’s graduate Alexander Kumaigorodski and his co-authors from Prof. Dr. Volker Markl‘s Department of Database Systems and Information Management (DIMA) at TU Berlin and from the Intelligent Analytics for Massive Data (IAM) research area at the German Research Centre for Artificial Intelligence (DFKI) present a new approach to speed up loading and processing of tabular CSV data by orders of magnitude.

CSV is a very frequently used format for the exchange of structured data. For example, the City of Berlin publishes its structured datasets in the CSV format in the Berlin Open Data Portal. Such datasets can be imported into databases for data analysis. Accelerating this process allows users to handle the increasing amount of data and to decrease the time required for data analysis. Each new generation of computer networks and storage media provides higher bandwidths and allows for faster reading times. However, current loading and processing approaches using main processors (CPUs) cannot keep up with these hardware technologies and unnecessarily throttle loading times.

The procedure described in this paper uses a new approach in which CSV data is read and processed by graphics processors (GPUs) instead. The advantage of these graphics processors lies primarily in their strong parallel computing power and fast memory access. With this approach, new hardware technologies such as NVLink 2.0 or InfiniBand with Remote Direct Memory Access (RDMA) can be fully exploited. As a result, CSV data can be read directly from main memory or the network and processed at multiple gigabytes per second.
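
As a rough illustration of GPU-side CSV loading (using the off-the-shelf RAPIDS cuDF library rather than the system described in the paper, and a hypothetical file name), parsing can be offloaded to the GPU like this:

```python
import time
import cudf  # RAPIDS cuDF: GPU DataFrame library (assumes a CUDA-capable GPU)

start = time.perf_counter()
# read_csv parses the file on the GPU; "trips.csv" is a hypothetical dataset
df = cudf.read_csv("trips.csv")
elapsed = time.perf_counter() - start

print(f"rows: {len(df)}, columns: {len(df.columns)}, load time: {elapsed:.2f}s")
```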

The transparency of the tests performed and the independent confirmation of the results also led to the award of the first-ever BTW 2021 Reproducibility Badge. In the data science community, the reproducibility of research results is becoming increasingly important. It serves to verify results as well as to compare them with existing work and is thus an important aspect of scientific quality assurance. Leading international conferences have therefore already devoted special attention to this topic.

To ensure high reproducibility, the authors provided the reproducibility committee with source code, additional test data, and instructions for running the benchmarks. The execution of the tests was demonstrated in a live session and could then also be successfully replicated by a member of the committee. The Reproducibility Badge recognizes above all the good scientific practice of the authors.

The paper in detail:
“Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing”

Authors:
Alexander Kumaigorodski, Clemens Lutz, Volker Markl

Abstract:
Comma-separated values (CSV) is a widely-used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining high transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flows that its semi-structured and text-based layout incurs. In this paper, we propose to speed-up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.

Publication:
K.-U. Sattler et al. (Eds.): Datenbanksysteme für Business, Technologie und Web (BTW 2021), Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2021
https://doi.org/10.18420/btw2021-01

Tapping into Nature’s Wisdom

Cellulose biosensors are robust in practice

Electroencephalography (EEG), electrocardiography (ECG), electromyography (EMG) – all of these non-invasive medical diagnostic methods rely on electrodes to measure and record electrical signals or voltage fluctuations of muscle or nerve cells underneath the skin. Depending on the type of diagnostics, this can be used to measure electrical brain waves or the currents in the heart or muscles. Present methods use metal sensors which are attached to the skin using a special gel to ensure continuous contact. Researchers at Korea University and Technische Universität Berlin have now developed so-called biosensors made of the plant material cellulose. They not only offer better and more durable conductivity than conventional electrodes; they are also 100 percent natural, reusable, biodegradable, and do not cause the skin irritation associated with conventional gels. The paper “Leaf inspired homeostatic cellulose biosensors” has now been published in the renowned journal Science Advances.

The crucial keyword for the new sensors is homeostasis. In biology, this refers to the maintenance of a state of equilibrium. This is how leaves, for example, regulate the osmotic pressure in their cells, i.e. how much water they store. This internal cell pressure depends on the water content of the neighboring cells, but also on the environment (dry or humid), and is constantly readjusted.

“Most people know the feeling of walking through a damp garden with bare feet. Leaves stick to the soles of our feet and simply don’t fall off, even when we move,” explains Professor Klaus-Robert Müller, head of the Machine Learning group at TU Berlin and director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD). “The reason that leaves cling so effectively to our skin is due to the swelling properties of cellulose, the material that the cell walls of plants are made of, and is based on the principle of homeostasis.”

Sensors modeled on the structure of leaves

The image shows the structure of the biosensors, which is based on the leaf structure.
(© Korea University)

Until now, cellulose has mainly been used as a material for synthesis or filtration. Because cellulose itself is not conductive, it seemed unsuitable as a potential electrode material. When cellulose fibers are placed in salty water, however, they swell and exhibit excellent electrical conductivity.

Inspired by the structure of leaves, the researchers have developed, analyzed and tested biosensors consisting of two layers of cellulose fibers that resemble the leaf structure and can be saturated with salt water. On top of the cellulose material lies a carrier membrane which in turn docks to a metal electrode with a cable.

Recyclable, skin-friendly and biodegradable

During an electroencephalography (EEG), electrodes measure and record electrical signals or voltage fluctuations of nerve cells underneath the skin. (© Pixabay)

“These sensors showed continuously high-quality electrophysiological signals in various applications such as EEG, EMG, and ECG. They adhere excellently to different skin types – without the need for a synthetic gel. They also demonstrate good adhesion properties under stress, for example with sweating or moving test subjects,” explains Müller. Furthermore, these sensors feature a high transmission quality, low electrical resistance (impedance) and little resistance variance during long-term measurements.

The researchers have already tested the sensors in various application scenarios and on different skin types. “We were also able to demonstrate the versatility and robustness of the biosensors in combination with machine learning algorithms that were tested in challenging real-world situations. Tests have been conducted with test persons riding a bicycle or playing a computer game with a brain-computer interface, meaning the subjects were moving during the measurement, which can potentially generate artifacts,” says Müller.
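
For readers curious what such a brain-computer-interface pipeline can look like, here is a minimal, generic sketch (band-pass filtering followed by log band-power features and a linear classifier); the sampling rate, frequency band, synthetic data, and classifier choice are assumptions for illustration and are not taken from the study.

```python
import numpy as np
from scipy.signal import butter, filtfilt
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

FS = 250  # assumed sampling rate in Hz

def bandpower_features(windows, low=8.0, high=30.0):
    """Band-pass filter EEG windows and return log band power per channel."""
    b, a = butter(4, [low, high], btype="bandpass", fs=FS)
    filtered = filtfilt(b, a, windows, axis=-1)
    return np.log(np.var(filtered, axis=-1))   # shape: (n_windows, n_channels)

# Synthetic stand-in data: 40 windows, 8 channels, 2 seconds each
rng = np.random.default_rng(0)
X = bandpower_features(rng.standard_normal((40, 8, 2 * FS)))
y = rng.integers(0, 2, size=40)                # two hypothetical mental states

clf = LinearDiscriminantAnalysis().fit(X[:30], y[:30])
print("held-out accuracy:", clf.score(X[30:], y[30:]))
```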

Other advantages of the biosensors: They can be mass-produced in a simple and cost-effective process and are recyclable, skin-friendly, and biodegradable. Klaus-Robert Müller is convinced: “These homeostatic cellulose biosensors are suitable for a broad range of clinical and non-clinical applications.”

The publication in detail:
“Leaf inspired homeostatic cellulose biosensors”

Authors:
Ji-Yong Kim, Yong Ju Yun, Joshua Jeong, C.-Yoon Kim, Klaus-Robert Müller and Seong-Whan Lee

Abstract:
An incompatibility between skin homeostasis and existing biosensor interfaces inhibits long-term electrophysiological signal measurement. Inspired by the leaf homeostasis system, we developed the first homeostatic cellulose biosensor with functions of protection, sensation, self-regulation, and biosafety. Moreover, we find that a mesoporous cellulose membrane transforms into homeostatic material with properties that include high ion conductivity, excellent flexibility and stability, appropriate adhesion force, and self-healing effects when swollen in a saline solution. The proposed biosensor is found to maintain a stable skin-sensor interface through homeostasis even when challenged by various stresses, such as a dynamic environment, severe detachment, dense hair, sweat, and long-term measurement. Last, we demonstrate the high usability of our homeostatic biosensor for continuous and stable measurement of electrophysiological signals and give a showcase application in the field of brain-computer interfacing where the biosensors and machine learning together help to control real-time applications beyond the laboratory at unprecedented versatility.

Publication:
Science Advances 7(16), eabe7432
https://doi.org/10.1126/sciadv.abe7432

Further information is available from:

Prof. Dr. Klaus-Robert Müller
TU Berlin
Machine Learning
Tel.: 030 314-78621
Email: klaus-robert.mueller@tu-berlin.de

New workshop series “Trustworthy AI”

The AI for Good Global Summit is an all-year digital event featuring a weekly program of keynotes, workshops, interviews, and Q&As. BIFOLD Fellow Dr. Wojciech Samek, head of the Artificial Intelligence department at Fraunhofer Heinrich Hertz Institute (HHI), is organizing a new online workshop series, “Trustworthy AI”, for this platform.

The AI for Good series is the leading action-oriented, global and inclusive United Nations platform on Artificial Intelligence (AI). The Summit is organized all year, always online, in Geneva by the International Telecommunication Union (ITU) – the United Nations specialized agency for information and communication technologies. The goal of the AI for Good series is to identify practical applications of AI and scale those solutions for global impact.

“AI systems have steadily grown in complexity, gaining predictivity often at the expense of interpretability, robustness and trustworthiness. Deep neural networks are a prime example of this development. While reaching ‘superhuman’ performances in various complex tasks, these models are susceptible to errors when confronted with tiny, adversarial variations of the input – variations which are either not noticeable or can be handled reliably by humans”

Dr. Wojciech Samek
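
To make the quoted fragility concrete, the sketch below implements the fast gradient sign method (FGSM), a standard textbook construction for such tiny adversarial perturbations; the model, input, and epsilon are placeholders, and the example is not tied to the workshop series itself.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, label, epsilon=0.03):
    """Fast gradient sign method: perturb x by epsilon in the direction
    that increases the classification loss."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    loss.backward()
    x_adv = x + epsilon * x.grad.sign()        # tiny, worst-case perturbation
    return x_adv.clamp(0.0, 1.0).detach()      # keep pixels in a valid range

# Usage with a placeholder model and a random "image"
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
x = torch.rand(1, 1, 28, 28)
label = torch.tensor([3])
x_adv = fgsm_attack(model, x, label)
print((x_adv - x).abs().max())  # the perturbation stays within epsilon
```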

The workshop series will discuss these challenges of current AI technology and will present new research aiming at overcoming these limitations and developing AI systems which can be certified to be trustworthy and robust.

The workshop series will cover the following topics:

  • Measuring Neural Network Robustness
  • Auditing AI Systems
  • Adversarial Attacks and Defences
  • Explainability & Trustworthiness
  • Poisoning Attacks on AI
  • Certified Robustness
  • Model and Data Uncertainty
  • AI Safety and Fairness

The first workshop will be held by Nicholas Carlini, Research Scientist at Google AI, on March 25, 2021, at 5:00 pm CET: “Trustworthy AI: Adversarially (non-)Robust Machine Learning”.

Register here: https://itu.zoom.us/webinar/register/WN_md37GUoSTdiQTq92ZNVvDw

“European Data Sovereignty is a critical success factor”

Prof. Markl invited to speak on “Artificial Intelligence and Competitiveness” at EU Committee “AIDA”

On Tuesday, March 23, 2021, 09:00-12:00 CET, the European Committee Artificial Intelligence in a Digital Age (AIDA) is organizing a hearing on “AI and Competitiveness”. AIDA is a standing committee, established by the European Parliament to analyze the future impact of artificial intelligence in the digital age on the EU economy.

The event will address the following questions: What regulatory frameworks can enable the potential of AI solutions to increase the competitiveness of EU enterprises? How can a competitive and innovative AI sector be built? What challenges do EU enterprises face in entering AI markets by developing and adopting competitive AI solutions?

BIFOLD Co-Director Prof. Dr. Volker Markl

BIFOLD Co-Director Prof. Dr. Volker Markl has been invited to give an initial intervention for the second panel on “How to build a competitive and innovative AI sector? What are EU enterprises challenges in entering AI markets, by developing and adopting competitive AI solutions?”

Prof. Markl has been actively promoting European Data Sovereignty and a data analysis infrastructure for years. For the AIDA hearing, he emphasizes three key aspects:

1. “Most of the novel AI applications today are due to advances in machine learning (ML) and big data (BD) systems and technologies. Due to international competition, the EU needs to make massive investments in research to develop next generation ML methods and BD systems as well as in education to train our workforce in their use. In particular, we need to provide basic training in data literacy and computer science competencies, such as data programming, data management, and data intensive algorithms. These subjects need to be taught throughout our educational studies (from elementary education, through middle and high-school and beyond at universities, across all academic study programs).”

2. “Data is the new production factor for our economy. Europe needs to be competitive. We need an independent, technical infrastructure and ecosystem that will enable us to create, share and use both data and algorithms as well as storage and compute resources in an open and inclusive way – moving beyond North American and Chinese solutions. If Europe intends to shape the future of the digital world through its industry, we have to: (i) maintain digital sovereignty in AI, (ii) retain technical talent, (iii) facilitate data-driven business opportunities and citizen science and (iv) compete globally.”

3. “Member states need to bootstrap and create demand for such an ecosystem by enabling a holistic European solution. We must go beyond data exchange and multi-cloud considerations, like GAIA-X, but rather be centered around the development of an easy-to-use, integrated single platform to store data, host algorithms, and perform processing. The creation of such an ecosystem should avoid the complexity and cacophony of too many stakeholders. Instead, it should be developed by a single institution with a clear vision, mission and objectives. It should leverage economies of scale, follow software-hardware co-design principles, and take recent technological advances in networking, distributed systems, data management, and machine learning into consideration. Moreover, it should enable EU startups, companies, and EU citizens to share data and algorithms as well as compose, process, and offer AI applications in open and protected spaces.”

The hearing will be publicly available via webstream. More information is available at https://www.europarl.europa.eu/committees/en/aida-hearing-on-ai-and-competitiveness/product-details/20210210CAN59709.

Making the use of AI systems safe

BIFOLD Fellow Dr. Wojciech Samek and Luis Oala (Fraunhofer Heinrich Hertz Institute) together with Jan Macdonald and Maximilian März (TU Berlin) were honored with the award for “best scientific contribution” at this year’s medical imaging conference BVM. Their paper “Interval Neural Networks as Instability Detectors for Image Reconstructions” demonstrates how uncertainty quantification can be used to detect errors in deep learning models.

The award winners were announced during the virtual BVM (Bildverarbeitung für die Medizin) conference on March 9, 2021. The award for “best scientific contribution” is granted each year by the BVM Award Committee. It honors innovative research with a methodological focus on medical image processing in a medically relevant application context.

The interdisciplinary group of researchers investigated the detection of instabilities that may occur when utilizing deep learning models for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their usage for sensitive medical applications remains controversial. Limits in the understanding of an AI system’s behavior create risks of system failure. Hence, the identification of failure modes in AI systems is an important prerequisite for their reliable deployment in medicine.

In a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates on two use cases how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. This is an important contribution to making the use of AI systems safer and more reliable.
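
As a toy illustration of the underlying idea, the sketch below propagates an input interval through a single fully connected ReLU layer using standard interval arithmetic; the weights and bounds are made up, and unlike the Interval Neural Networks in the paper, nothing here is learned.

```python
import numpy as np

def interval_linear_relu(x_low, x_high, W, b):
    """Propagate an input interval [x_low, x_high] through y = relu(W x + b).

    Standard interval arithmetic: positive weights map lower bounds to
    lower bounds, negative weights swap them.
    """
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    y_low = W_pos @ x_low + W_neg @ x_high + b
    y_high = W_pos @ x_high + W_neg @ x_low + b
    return np.maximum(y_low, 0.0), np.maximum(y_high, 0.0)

# Toy example: wide output intervals would flag an unstable reconstruction
W = np.array([[1.0, -2.0], [0.5, 0.5]])
b = np.zeros(2)
low, high = interval_linear_relu(np.array([0.9, 1.0]), np.array([1.1, 1.2]), W, b)
print("per-output uncertainty width:", high - low)
```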

The paper in detail:
“Interval Neural Networks as Instability Detectors for Image Reconstructions”

Authors:
Jan Macdonald, Maximilian März, Luis Oala, Wojciech Samek

Abstract:
This work investigates the detection of instabilities that may occur when utilizing deep learning models for image reconstruction tasks. Although neural networks often empirically outperform traditional reconstruction methods, their usage for sensitive medical applications remains controversial. Indeed, in a recent series of works, it has been demonstrated that deep learning approaches are susceptible to various types of instabilities, caused for instance by adversarial noise or out-of-distribution features. It is argued that this phenomenon can be observed regardless of the underlying architecture and that there is no easy remedy. Based on this insight, the present work demonstrates how uncertainty quantification methods can be employed as instability detectors. In particular, it is shown that the recently proposed Interval Neural Networks are highly effective in revealing instabilities of reconstructions. Such an ability is crucial to ensure a safe use of deep learning-based methods for medical image reconstruction.

Publication:
In: Bildverarbeitung für die Medizin 2021. Informatik aktuell. Springer Vieweg, Wiesbaden.
https://doi.org/10.1007/978-3-658-33198-6_79

Making the role of AI in Medicine explainable

Analysis system for the diagnosis of breast cancer

Researchers at TU Berlin and Charité – Universitätsmedizin Berlin as well as the University of Oslo have developed a new tissue-section analysis system for diagnosing breast cancer based on artificial intelligence (AI). Two further developments make this system unique: for the first time, morphological, molecular and histological data are integrated in a single analysis; in addition, the system provides a clarification of the AI decision process in the form of heatmaps. Pixel by pixel, these heatmaps show which visual information influenced the AI decision process and to what extent, thus enabling doctors to understand and assess the plausibility of the results of the AI analysis. This represents a decisive and essential step forward for the future regular use of AI systems in hospitals. The results of this research have now been published in Nature Machine Intelligence.

Cancer treatment is increasingly concerned with the molecular characterization of tumor tissue samples. Studies are conducted to determine whether and/or how the DNA has changed in the tumor tissue as well as the gene and protein expression in the tissue sample. At the same time, researchers are becoming increasingly aware that cancer progression is closely related to intercellular cross-talk and the interaction of neoplastic cells with the surrounding tissue – including the immune system.

Image data provide high spatial detail

Although microscopic techniques enable biological processes to be studied with high spatial detail, they only permit a limited measurement of molecular markers. Instead, these are usually determined using proteins or DNA extracted from tissue, so spatial detail is lost and the relationship between these markers and the microscopic structures is typically unclear. “We know that in the case of breast cancer, the number of immigrated immune cells, known as lymphocytes, in tumor tissue has an influence on the patient’s prognosis. There are also discussions as to whether this number has a predictive value – in other words if it enables us to say how effective a particular therapy is,” says Professor Dr. Frederick Klauschen from the Institute of Pathology at the Charité.

“The problem we have is the following: We have good and reliable molecular data and we have good histological data with high spatial detail. What we don’t have as yet is the decisive link between imaging data and high-dimensional molecular data,” adds Professor Dr. Klaus-Robert Müller, professor of machine learning at TU Berlin. Both researchers have been working together for a number of years now at the national AI center of excellence the Berlin Institute for the Foundations of Learning and Data (BIFOLD) located at TU Berlin.

Missing link between molecular and histological data

Determination of tumor-infiltrating lymphocytes (TiLs) using Explainable AI technology. Histological preparation of a breast carcinoma.
(© Frederick Klauschen)

The result of the AI process is a so-called heatmap, which highlights the TiLs in red. Other tissue and cells appear in blue and green.
(© Frederick Klauschen)

It is precisely this symbiosis which the newly published approach makes possible. “Our system facilitates the detection of pathological alterations in microscopic images. Parallel to this, we are able to provide precise heatmap visualizations showing which pixel in the microscopic image contributed to the diagnostic algorithm and to what extent,” explains Müller. The research team has also succeeded in significantly further developing this process: “Our analysis system has been trained using machine learning processes so that it can also predict various molecular characteristics, including the condition of the DNA, the gene expression as well as the protein expression in specific areas of the tissue, on the basis of the histological images.”
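
The heatmap idea can be illustrated with a much simpler attribution technique than the one used in the study: the sketch below computes a gradient-times-input saliency map for a placeholder PyTorch classifier, assigning each input pixel a signed contribution to the predicted class score. It is a generic stand-in, not the authors' method.

```python
import torch

def gradient_x_input_heatmap(model, image, target_class):
    """Pixel-wise relevance as gradient of the class score times the input.

    A simple stand-in for more elaborate explanation methods.
    """
    image = image.clone().detach().requires_grad_(True)
    score = model(image)[0, target_class]      # scalar class score
    score.backward()
    return (image.grad * image).sum(dim=1)[0]  # sum over color channels -> H x W map

# Placeholder model and input
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 5))
heatmap = gradient_x_input_heatmap(model, torch.rand(1, 3, 64, 64), target_class=2)
print(heatmap.shape)  # torch.Size([64, 64])
```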

Next on the agenda are certification and further clinical validations – including tests in tumor routine diagnostics. However, Frederick Klauschen is already convinced of the value of the research: “The methods we have developed will make it possible in the future to make histopathological tumor diagnostics more precise, more standardized and qualitatively better.”

Publication:

Morphological and molecular breast cancer profiling through explainable machine learning, Nature Machine Intelligence

Further information can be obtained from:

Prof. Dr. Klaus-Robert Müller
TU Berlin
Machine Learning
Tel.: 030 314 78621
E-Mail: klaus-robert.mueller@tu-berlin.de

Prof. Dr. Frederick Klauschen
Charité – Universitätsmedizin Berlin
Institute of Pathology
Tel.: 030 450 536 053
E-Mail: frederick.klauschen@charite.de

2020 Pattern Recognition Best Paper Award

A team of scientists from TU Berlin, Fraunhofer Heinrich Hertz Institute (HHI) and University of Oslo has jointly received the 2020 “Pattern Recognition Best Paper Award” and “Pattern Recognition Medal” of the international scientific journal Pattern Recognition. The award committee honored the publication “Explaining Nonlinear Classification Decisions with Deep Taylor Decomposition” by Dr. Grégoire Montavon and Prof. Dr. Klaus-Robert Müller from TU Berlin, Prof. Dr. Alexander Binder from University of Oslo, as well as Dr. Wojciech Samek and Dr. Sebastian Lapuschkin from HHI.

Dr. Grégoire Montavon with the 2020 Pattern Recognition Best Paper Award in hand.

The publication addresses the so-called black box problem. Machine Learning methods, in particular Deep Learning, successfully solve a variety of tasks. However, in most cases they fail to provide the information that has led to a particular decision. The paper tackles this problem by using a pixel-by-pixel decomposition of nonlinear classifications and evaluates the procedure in different scenarios. This method provides a theoretical framework for Explainable Artificial Intelligence (XAI) that is generally applicable. XAI is a major research field of the Berlin Institute for the Foundations of Learning and Data (BIFOLD), of which the authors from TU Berlin and HHI are members.
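
For intuition, the sketch below performs one layer-wise relevance propagation step in the spirit of deep Taylor decomposition (the z+ rule for a ReLU layer) on made-up activations and weights; the published method covers entire networks and several propagation rules, which this toy example does not.

```python
import numpy as np

def lrp_zplus(activations, weights, relevance_out, eps=1e-9):
    """Redistribute output relevance to the inputs of one ReLU layer.

    z+ rule: each input neuron receives relevance in proportion to its
    positive contribution a_j * w_jk^+ to each output neuron k.
    """
    w_pos = np.maximum(weights, 0.0)                 # only positive contributions
    z = activations @ w_pos + eps                    # denominators, one per output
    s = relevance_out / z                            # normalized output relevance
    return activations * (w_pos @ s)                 # relevance per input neuron

# Toy layer: 3 inputs, 2 outputs
a = np.array([1.0, 0.5, 2.0])
W = np.array([[0.2, -0.4], [0.7, 0.1], [-0.3, 0.9]])
R_out = np.array([1.0, 1.0])
R_in = lrp_zplus(a, W, R_out)
print(R_in, R_in.sum())   # input relevances approximately conserve the total
```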

The award was presented to Grégoire Montavon in January 2021, during the virtual International Conference on Pattern Recognition (ICPR). The “Pattern Recognition Best Paper Award” is granted every two years. It recognizes a highly cited paper in the area of pattern recognition and its application areas such as image processing, computer vision and biometrics.

“We are very proud to receive this award and for our work to be highlighted within the global scientific community.”

Dr. Grégoire Montavon.

BIFOLD Fellow Dr. Wojciech Samek heads newly established AI research department at Fraunhofer HHI

Dr. Samek (l.) and Prof. Müller in front of an XAI demonstrator at Fraunhofer HHI. (Copyright: TU Berlin/Christian Kielmann)

The Fraunhofer Heinrich Hertz Institute (HHI) has established a new research department dedicated to “Artificial Intelligence”. The AI expert and BIFOLD Fellow Dr. Wojciech Samek, who previously led the research group “Machine Learning” at Fraunhofer HHI, will head the new department. With this move, Fraunhofer HHI aims to expand the transfer of its AI research on topics such as Explainable AI and neural network compression to industry.

Dr. Wojciech Samek: “The mission of our newly founded department is to make today’s AI truly trustable and in all aspects practicable. To achieve this, we will very closely collaborate with BIFOLD in order to overcome the limitations of current deep learning models regarding explainability, reliability and efficiency.”

“Congratulations, I look forward to a continued successful teamwork with BIFOLD fellow Wojciech Samek, who is a true AI hot shot.”

BIFOLD Director Prof. Dr. Klaus-Robert Müller

The new department further strengthens the already existing close connection between basic AI research at BIFOLD and applied research at Fraunhofer HHI and is a valuable addition to the dynamic AI ecosystem in Berlin.

“The large Berlin innovation network centered around BIFOLD is unique in Germany. This ensures that the latest research results will find their way into business, science and society.”

BIFOLD Director Prof. Dr. Volker Markl