BIFOLD Co-Director Prof. Volker Markl named 2020 ACM Fellow


The Association for Computing Machinery (ACM), the largest and oldest international association of computer scientists, has named Prof. Dr. Volker Markl a 2020 ACM Fellow. Prof. Markl is Co-Director of the Berlin Institute for the Foundations of Learning and Data (BIFOLD), Head of the Department of Database Systems and Information Management (DIMA) at TU Berlin, and Head of the Intelligent Analytics for Massive Data (IAM) group at the German Research Center for Artificial Intelligence (DFKI).

Since 1993, the ACM has recognized less than one percent of its worldwide members as ACM Fellows each year for their outstanding achievements. For 2020, the ACM awarded 95 fellowships, only three of which went to researchers in Germany.
Volker Markl received this distinction for his contributions to query optimization, scalable data processing, and data programmability. He is one of 22 scientists in Germany who have been honored by the ACM to date.

Prof. Dr. Volker Markl
(Copyright: TU Berlin/PR/Simon)

“I am very happy about this rare honor and international recognition of my work.”

Prof. Dr. Volker Markl

Approximately 100,000 members belong to the ACM worldwide. The prestigious ACM Fellowship recognizes those members who have made outstanding contributions to computing and information technology and/or outstanding service to ACM and the computational science community.

To learn more about the 2020 ACM Fellows, please visit https://www.acm.org/media-center/2021/january/fellows-2020.

For information in German, please read the official press release of TU Berlin.

BIFOLD Research into ML for Molecular Simulation is among the 2020 Most Downloaded Annual Reviews Articles


The paper “Machine Learning for Molecular Simulation” by BIFOLD Co-Director Prof. Dr. Klaus-Robert Müller, Principal Investigator Prof. Dr. Frank Noé and colleagues was among the top 10 most downloaded physical science articles of Annual Reviews in 2020.

Machine learning has a growing influence in the physical sciences. In 2020, BIFOLD researchers contributed to major scientific advances, especially in the field of machine learning for quantum chemistry. Through collaborations on a national and international level, BIFOLD researchers achieved, among other things, a scientific breakthrough by proposing a reinforcement learning method to separate and move single molecules out of a structure, developed a deep learning method to solve Schrödinger's equation more accurately, and leveraged machine learning to achieve high quantum chemical accuracy from density functional approximations.

The paper “Machine Learning for Molecular Simulation” by Frank Noé, Alexandre Tkatchenko, Klaus-Robert Müller, and Cecilia Clementi is another example of a high-impact publication in quantum mechanics. The authors review machine learning methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics.
The paper is among the most downloaded Annual Reviews articles of 2020, specifically one of the ten most downloaded physical science articles. Both Prof. Dr. Klaus-Robert Müller and Prof. Dr. Frank Noé were recently also featured in the Clarivate™ 2020 Highly Cited Researchers™ list, emphasizing their leading role in the international research community in the interdisciplinary area of computer science and chemistry.

The paper in detail:

Machine Learning for Molecular Simulation

Authors:
Frank Noé, Alexandre Tkatchenko, Klaus-Robert Müller, Cecilia Clementi

Abstract:
Machine learning (ML) is transforming all areas of science. The complex and time-consuming calculations in molecular simulations are particularly suitable for an ML revolution and have already been profoundly affected by the application of existing ML methods. Here we review recent ML methods for molecular simulation, with particular focus on (deep) neural networks for the prediction of quantum-mechanical energies and forces, on coarse-grained molecular dynamics, on the extraction of free energy surfaces and kinetics, and on generative network approaches to sample molecular equilibrium structures and compute thermodynamics. To explain these methods and illustrate open methodological problems, we review some important principles of molecular physics and describe how they can be incorporated into ML structures. Finally, we identify and describe a list of open challenges for the interface between ML and molecular simulation.

Publication:
Frank Noé, Alexandre Tkatchenko, Klaus-Robert Müller, Cecilia Clementi: Machine Learning for Molecular Simulation. Annual Review of Physical Chemistry, Vol. 71: 361–390
https://doi.org/10.1146/annurev-physchem-042018-052331

Resilient Data Management for the Internet of Moving Things: TU Berlin and DFKI Paper was Accepted at BTW 2021


The paper “Towards Resilient Data Management for the Internet of Moving Things” by Elena Beatriz Ouro Paz, Eleni Tzirita Zacharatou, and Volker Markl was accepted for presentation at the 19. Fachtagung für Datenbanksysteme für Business, Technologie und Web (BTW 2021), September 20 – 24, 2021. Following the acceptance of a paper on fast CSV loading using GPUs, this is the second paper by researchers from the Database Systems and Information Management (DIMA) group at TU Berlin and the Intelligent Analytics for Massive Data (IAM) group at DFKI to be presented at BTW 2021.
BTW is the leading database conference in the German-speaking area. For more information on the conference, please visit https://sites.google.com/view/btw-2021-tud/.

Abstract:
Mobile devices have become ubiquitous; smartphones, tablets, and wearables are essential commodities for many people. The ubiquity of mobile devices, combined with their ever-increasing capabilities, opens new possibilities for Internet-of-Things (IoT) applications in which mobile devices act both as data generators and as processing nodes. However, deploying a stream processing system (SPS) over mobile devices is particularly challenging, as mobile devices change their position within the network very frequently and are notoriously prone to transient disconnections. To deal with faults arising from disconnections and mobility, existing fault tolerance strategies in SPSs are either checkpointing-based or replication-based. Checkpointing-based strategies are too heavyweight for mobile devices, as they save and broadcast state periodically, even when there are no failures. On the other hand, replication-based strategies cannot provide fault tolerance at the level of the data source, as the data source itself cannot always be replicated. Finally, existing systems exclude mobile devices from data processing upon a disconnection even when the duration of the disconnection is very short, thus failing to exploit the computing capabilities of the offline devices. This paper proposes a buffering-based reactive fault tolerance strategy to handle transient disconnections of mobile devices that both generate and process data, even in cases where the devices move through the network during the disconnection. The main components of our strategy are: (a) a circular buffer that stores the data which are generated and processed locally during a device disconnection, (b) a query-aware buffer replacement policy, and (c) a query restart process that ensures the correct forwarding of the buffered data upon re-connection, taking into account the new network topology.
We integrate our fault tolerance strategy with NebulaStream, a novel stream processing system specifically designed for the IoT. We evaluate our strategy using a custom benchmark based on real data, demonstrating a reduction in data loss and query latency compared to the NebulaStream baseline.
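The buffering idea from the abstract can be sketched in a few lines: a bounded buffer stores tuples produced during a disconnection, and when it fills up, a query-aware policy chooses which tuple to evict. This is a minimal illustration, not the authors' implementation; the class, the priority function, and the tuple layout are hypothetical.

```python
from collections import deque

class DisconnectBuffer:
    """Sketch of a bounded buffer for tuples produced while a device is
    offline. When full, a query-aware policy picks a victim to evict
    instead of blindly dropping the oldest tuple."""

    def __init__(self, capacity, priority):
        self.capacity = capacity
        self.priority = priority  # query-aware scoring function (hypothetical)
        self.buffer = deque()

    def append(self, tup):
        if len(self.buffer) >= self.capacity:
            # Evict the tuple the current query values least.
            victim = min(self.buffer, key=self.priority)
            self.buffer.remove(victim)
        self.buffer.append(tup)

    def drain(self):
        """On re-connection, forward buffered tuples in arrival order."""
        while self.buffer:
            yield self.buffer.popleft()

# Usage: prioritize recent timestamps (tuples are (timestamp, value)).
buf = DisconnectBuffer(capacity=3, priority=lambda t: t[0])
for tup in [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]:
    buf.append(tup)
print(list(buf.drain()))  # → [(2, 'b'), (3, 'c'), (4, 'd')]
```

In the paper's setting, the replacement policy would depend on the running query (e.g., window bounds), and the drain step would additionally re-route tuples to reflect the new network topology.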

A preprint version of the paper (PDF) is available for download.

TU Berlin, DFKI and NUS Paper on Parallelizing Intra-Window Join on Multicores was Accepted at SIGMOD 2021


The paper “Parallelizing Intra-Window Join on Multicores: An Experimental Study” by Shuhao Zhang, Yancan Mao, Jiong He, Philipp Grulich, Steffen Zeuch, Bingsheng He, Richard Ma and Volker Markl was accepted for presentation at the ACM SIGMOD/PODS International Conference on Management of Data (SIGMOD/PODS 2021), which will take place from June 20 – 25, 2021 in Xi’an, China. This work is the result of a collaboration between researchers from the Database Systems and Information Management (DIMA) group at TU Berlin, the Intelligent Analytics for Massive Data (IAM) group at DFKI, the Department of Computer Science at the National University of Singapore and ByteDance.
The annual ACM SIGMOD/PODS Conference is a leading international forum for database researchers, practitioners, developers, and users to explore cutting-edge ideas and results, and to exchange techniques, tools, and experiences in all aspects of data management. To learn more about SIGMOD/PODS, please visit https://2021.sigmod.org/.

Abstract:
The intra-window join (IaWJ), i.e., joining two input streams over a single window, is a core operation in modern stream processing applications. This paper presents the first comprehensive study on parallelizing the IaWJ on modern multicore architectures. In particular, we classify IaWJ algorithms into lazy and eager execution approaches. For each approach, there are further design aspects to consider, including different join methods and partitioning schemes, leading to a large design space. Our results show that none of the algorithms always performs the best, and the choice of the most performant algorithm depends on: (i) workload characteristics, (ii) application requirements, and (iii) hardware architectures. Based on the evaluation results, we propose a decision tree that can guide the selection of an appropriate algorithm.
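The lazy/eager distinction in the abstract can be illustrated with a toy hash join over one window (a hypothetical simplification, not the paper's multicore implementation): the eager variant probes and inserts as each tuple arrives, while the lazy variant buffers the whole window and joins on window close. Both produce the same result set.

```python
def eager_iawj(stream, key):
    """Eager: probe-and-insert per arriving tuple (interleaved build/probe)."""
    left_ht, right_ht, results = {}, {}, []
    for side, tup in stream:  # tuples arrive interleaved within one window
        k = key(tup)
        probe, build = (right_ht, left_ht) if side == 'L' else (left_ht, right_ht)
        for match in probe.get(k, []):       # probe the opposite side
            results.append((tup, match) if side == 'L' else (match, tup))
        build.setdefault(k, []).append(tup)  # insert into own side
    return results

def lazy_iawj(stream, key):
    """Lazy: buffer the whole window, then run a hash join on window close."""
    left = [t for s, t in stream if s == 'L']
    right_ht = {}
    for s, t in stream:
        if s == 'R':
            right_ht.setdefault(key(t), []).append(t)
    return [(l, r) for l in left for r in right_ht.get(key(l), [])]

stream = [('L', (1, 'x')), ('R', (1, 'y')), ('L', (2, 'z')), ('R', (1, 'w'))]
key = lambda t: t[0]
assert sorted(eager_iawj(stream, key)) == sorted(lazy_iawj(stream, key))
```

The trade-off the paper studies kicks in under parallelism: eager execution spreads work over the window's lifetime but interleaves builds and probes, while lazy execution batches the join at window close, which favors different partitioning schemes depending on workload and hardware.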

A preprint version of the paper is available here.

Researchers at FU Berlin Solve the Schrödinger Equation with a New Deep Learning Method


BIFOLD Principal Investigator Prof. Dr. Frank Noé and Senior Researcher Dr. Jan Hermann of the Artificial Intelligence for the Sciences group at Freie Universität Berlin developed a new, exceptionally accurate and efficient method to solve the electronic Schrödinger equation. Their approach could have a significant impact on the future of quantum chemistry.

Prof. Noé’s and Dr. Hermann’s method, PauliNet, avoids the limitations of previous approaches: it not only represents the electronic wave function more accurately, but also builds physical properties directly into the deep neural network.

“Escaping the usual trade-off between accuracy and computational cost is the highest achievement in quantum chemistry. As yet, the most popular such outlier is the extremely cost-effective density functional theory. We believe that deep “Quantum Monte Carlo,” the approach we are proposing, could be equally, if not more successful. It offers unprecedented accuracy at a still acceptable computational cost.”

Dr. Jan Hermann

“Building the fundamental physics into the AI is essential for its ability to make meaningful predictions in the field. This is really where scientists can make a substantial contribution to AI, and exactly what my group is focused on.”

Prof. Dr. Frank Noé

For more information, please visit the official press release of FU Berlin (available also at phys.org).

The paper in detail:
Deep-neural-network solution of the electronic Schrödinger equation

Authors:
Jan Hermann, Zeno Schätzle, Frank Noé

Abstract:
The electronic Schrödinger equation can only be solved analytically for the hydrogen atom, and the numerically exact full configuration-interaction method is exponentially expensive in the number of electrons. Quantum Monte Carlo methods are a possible way out: they scale well for large molecules, they can be parallelized and their accuracy has, as yet, been only limited by the flexibility of the wavefunction ansatz used. Here we propose PauliNet, a deep-learning wavefunction ansatz that achieves nearly exact solutions of the electronic Schrödinger equation for molecules with up to 30 electrons. PauliNet has a multireference Hartree–Fock solution built in as a baseline, incorporates the physics of valid wavefunctions and is trained using variational quantum Monte Carlo. PauliNet outperforms previous state-of-the-art variational ansatzes for atoms, diatomic molecules and a strongly correlated linear H10, and matches the accuracy of highly specialized quantum chemistry methods on the transition-state energy of cyclobutadiene, while being computationally efficient.
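The variational quantum Monte Carlo training mentioned in the abstract rests on the Rayleigh–Ritz principle: the energy expectation of any trial wavefunction $\psi_\theta$ bounds the ground-state energy $E_0$ from above, so minimizing it over the network parameters $\theta$ drives the ansatz toward the exact solution. This is the standard VMC formulation, not a description of PauliNet's specific architecture:

```latex
E[\psi_\theta]
  = \frac{\langle \psi_\theta \,|\, \hat{H} \,|\, \psi_\theta \rangle}
         {\langle \psi_\theta \,|\, \psi_\theta \rangle}
  \;\ge\; E_0,
\qquad
E[\psi_\theta] \approx \frac{1}{N} \sum_{i=1}^{N} E_{\mathrm{loc}}(\mathbf{r}_i),
\quad
E_{\mathrm{loc}}(\mathbf{r}) = \frac{\hat{H}\,\psi_\theta(\mathbf{r})}{\psi_\theta(\mathbf{r})},
\quad
\mathbf{r}_i \sim |\psi_\theta|^2 .
```

The Monte Carlo estimator samples electron configurations $\mathbf{r}_i$ from $|\psi_\theta|^2$ and averages the local energy, which is what makes the approach scale to larger molecules than numerically exact methods.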

Publication:
Hermann, J., Schätzle, Z. & Noé, F. Deep-neural-network solution of the electronic Schrödinger equation. Nat. Chem. 12, 891–897 (2020). https://doi.org/10.1038/s41557-020-0544-y

The BigEarthNet Archive now Contains Sentinel-1 Satellite Images


The satellite image benchmark archive BigEarthNet, developed by the Remote Sensing Image Analysis (RSIM) and Database Systems and Information Management (DIMA) groups at TU Berlin, has been enriched by Sentinel-1 image patches. This enhances its potential for deep learning with geo data.

The goal of BigEarthNet is to create a benchmark archive of Sentinel satellite image patches annotated with land-cover classes. The project receives funding from the European Research Council under the ERC Starting Grant BigEarth and from BIFOLD.

© G. Sumbul, M. Charfuelan, B. Demir, V. Markl

The project’s main publication, “BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding”, co-authored by BIFOLD Co-Director Prof. Dr. Volker Markl and Principal Investigator Prof. Dr. Begüm Demir, focused on Sentinel-2 data. In December 2020, the archive was enriched with matching Sentinel-1 images. The new version contains 590,326 pairs of Sentinel-1 and Sentinel-2 image patches to support research on multi-modal/cross-modal image classification, retrieval, and search.

Several tools and datasets related to BigEarthNet have since been developed by external researchers.

The paper:

BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding

Authors:
G. Sumbul, M. Charfuelan, B. Demir, V. Markl

Abstract:
This paper presents the BigEarthNet that is a new large-scale multi-label Sentinel-2 benchmark archive. The BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120×120 pixels for 10 m bands; ii) 60×60 pixels for 20 m bands; and iii) 20×20 pixels for 60 m bands. Unlike most of the existing archives, each image patch is annotated by multiple land-cover classes (i.e., multi-labels) that are provided from the CORINE Land Cover database of the year 2018 (CLC 2018). The BigEarthNet is significantly larger than the existing archives in remote sensing (RS) and thus is much more convenient to be used as a training source in the context of deep learning. This paper first addresses the limitations of the existing archives and then describes the properties of the BigEarthNet. Experimental results obtained in the framework of RS image scene classification problems show that a shallow Convolutional Neural Network (CNN) architecture trained on the BigEarthNet provides much higher accuracy compared to a state-of-the-art CNN model pre-trained on the ImageNet (which is a very popular large-scale benchmark archive in computer vision). The BigEarthNet opens up promising directions to advance operational RS applications and research in massive Sentinel-2 image archives.
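Two details of the abstract can be made concrete: the three patch sizes all cover the same ground extent (pixels × metres-per-pixel is constant), and multi-label annotation maps each patch to a multi-hot vector over CLC 2018 classes. The snippet below is an illustrative sketch; the class names are a hypothetical excerpt, not the full CLC nomenclature.

```python
# Each Sentinel-2 patch covers the same ground extent regardless of band
# resolution: 120 px × 10 m = 60 px × 20 m = 20 px × 60 m = 1200 m.
bands = {'10m': 120, '20m': 60, '60m': 20}
extents = {res: n_px * int(res[:-1]) for res, n_px in bands.items()}
assert set(extents.values()) == {1200}  # every patch spans 1200 m × 1200 m

# Multi-label annotation: a patch maps to a multi-hot vector over the
# land-cover classes (hypothetical three-class excerpt of CLC 2018).
classes = ['Coniferous forest', 'Pastures', 'Water bodies']

def multi_hot(labels):
    return [1 if c in labels else 0 for c in classes]

print(multi_hot({'Pastures', 'Water bodies'}))  # → [0, 1, 1]
```

This multi-hot encoding is what distinguishes BigEarthNet from single-label archives: a classifier trained on it predicts a set of land-cover classes per patch rather than one class.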

Publication:
G. Sumbul, M. Charfuelan, B. Demir, V. Markl, “BigEarthNet: A Large-Scale Benchmark Archive for Remote Sensing Image Understanding”, IEEE International Geoscience and Remote Sensing Symposium, pp. 5901–5904, Yokohama, Japan, 2019

TU Berlin and DFKI Vision Paper on Data Science Ecosystem “Agora” was Accepted for Publication in SIGMOD Record


A vision paper by researchers of the Database Systems and Information Management (DIMA) group at TU Berlin and the Intelligent Analytics for Massive Data (IAM) group at DFKI was accepted for publication in SIGMOD Record. In their paper, the authors describe their vision of a unified ecosystem that brings together data, algorithms, models, and computational resources and provides them to a broad audience.

The paper “Agora: Bringing Together Datasets, Algorithms, Models and More in a Unified Ecosystem [Vision]” by Jonas Traub, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, and Volker Markl presents the vision of Agora, a data management system that operates in a heavily decentralized and dynamic environment where data, algorithms, and even compute resources are dynamically created, modified, and removed by different stakeholders. Agora aims to enable a flexible exchange of assets among users, thereby addressing the lock-in effect of the few data management system providers that can currently afford the large investments in the multitude of assets such a system requires.

More information on SIGMOD Record is available at https://sigmodrecord.org/.

The paper in detail:

Agora: Bringing Together Datasets, Algorithms, Models and More in a Unified Ecosystem [Vision]

Authors: Jonas Traub, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz and Volker Markl

Abstract:
Data science and artificial intelligence are driven by a plethora of diverse data-related assets, including datasets, data streams, algorithms, processing software, compute resources, and domain knowledge. As providing all these assets requires a huge investment, data science and artificial intelligence technologies are currently dominated by a small number of providers who can afford these investments. This leads to lock-in effects and hinders features that require a flexible exchange of assets among users. In this paper, we introduce Agora, our vision towards a unified ecosystem that brings together data, algorithms, models, and computational resources and provides them to a broad audience. Agora (i) treats assets as first-class citizens and leverages a fine-grained exchange of assets, (ii) allows for combining assets to novel applications, and (iii) flexibly executes such applications on available resources. As a result, it enables easy creation and composition of data science pipelines as well as their scalable execution. In contrast to existing data management systems, Agora operates in a heavily decentralized and dynamic environment: Data, algorithms, and even compute resources are dynamically created, modified, and removed by different stakeholders. Agora presents novel research directions for the data management community as a whole: It requires to combine our traditional expertise in scalable data processing and management with infrastructure provisioning as well as economic and application aspects of data, algorithms, and infrastructure.

Publication:
Accepted at SIGMOD Record
Preprint

PD Dr. Alexander Meyer Appointed as a Professor for “Clinical Applications of AI and Data Science” at Charité


PD Dr. med. Alexander Meyer, BIFOLD Associated Investigator and Chief Medical Information Officer at the German Heart Center Berlin, was appointed as a W2 professor for “Clinical Applications of AI and Data Science” at Charité Berlin.
His professorship will focus on the application of AI and Data Science in cardiovascular medicine. The Berlin Institute for the Foundations of Learning and Data supports the professorship with two full-time positions for research assistants.

PD Dr. Alexander Meyer

In the past, PD Dr. Alexander Meyer developed a big data system for the real-time quality control of surgical procedures. More recently, he developed a recurrent neural network that was able to predict kidney failure as a complication of heart surgery more accurately than human professionals.

The professorship at Charité will enable PD Dr. Meyer to intensify his research with a focus on systems to support medical decision-making and data-based patient models. The aim is to create an individual “digital twin” of each patient in order to plan and simulate the best possible therapy. Alexander Meyer’s teaching concept includes new courses such as “Digital Data in Medicine” or “Machine Learning and Big Data in Medicine”.

More information is available in the official press release by DHZB (in German).

BIFOLD PI Prof. Dr. Giuseppe Caire receives 2021 Leibniz Prize


BIFOLD Principal Investigator Prof. Dr. Giuseppe Caire, head of the Chair of Communications and Information Theory (CommIT) at TU Berlin, has been awarded the 2021 Gottfried Wilhelm Leibniz Prize. On 10 December 2020, the German Research Foundation (DFG) announced the recipients of Germany’s most important research award. Each individual award is endowed with up to 2.5 million euros.

Prof. Dr. Giuseppe Caire

Prof. Dr. Giuseppe Caire is one of the most influential minds in the theoretical foundations of information and communication technology. His work on wireless and mobile communication has had a great impact on modern life. Recently, he was listed as one of the 2020 Highly Cited Researchers, alongside BIFOLD Co-Director Prof. Klaus-Robert Müller and Principal Investigator Prof. Dr. Frank Noé.

In its announcement, the DFG acknowledged Caire’s many significant contributions, stating, “Giuseppe Caire has been awarded the Leibniz Prize for laying the foundation for key principles in information theory within the field of wireless modern communication and information technology. Among his many achievements, he developed the theoretical basis for the optimization of special modulation processes (Bi-Interleaved Coded Modulation, BICM), in which messages from a sender running over a noisy channel can be decoded at the receiver end nearly without error. Such processes are now standard in wireless communication. His most recent work on distributed caching systems, where information is saved in several places physically separate from one another, has led to entirely new findings in the field of information theory. Caire’s accomplishments also extend to technology transfer: Among other business ventures, he is co-founder of SpaceMUX, a Silicon Valley startup which developed technologies for wireless networks in companies.” 

More information is available in the official press release of TU Berlin.

New study by BIFOLD Researchers: How did COVID-19 impact internet traffic?


BIFOLD PIs Prof. Dr. Anja Feldmann and Prof. Dr. Georgios Smaragdakis (INET group at TU Berlin) published a research study on the impact of the COVID-19 pandemic on Internet traffic in the Proceedings of the ACM Internet Measurement Conference (IMC ’20).

Figure 1: During the first phase of the pandemic, web conferencing and gaming-related traffic increased.
(© Anja Feldmann et al.)

For their study, Professors Feldmann and Smaragdakis and their colleagues (BENOCS, Brandenburg University of Technology, DE-CIX, ICSI, IMDEA Networks, Max Planck Institute for Informatics, Universidad Carlos III Madrid) analyzed massive network data from multiple Internet vantage points (residential and mobile networks, Internet exchange points, educational networks) in Europe and the US before, during, and after the lockdown. Their analysis sheds light on the surge in Internet traffic and in the traffic of applications related to teleworking and teleconferencing, characterizes the rapid changes in the Internet traffic profile, and confirms that the Internet was able to cope well with the demand.

Figure 2: During the lockdown phase, workday traffic shows weekend-like patterns.
(© Anja Feldmann et al.)

The paper in detail:

The Lockdown Effect: Implications of the COVID-19 Pandemic on Internet Traffic

Authors:
Anja Feldmann, Oliver Gasser, Franziska Lichtblau, Enric Pujol, Ingmar Poese, Christoph Dietzel, Daniel Wagner, Matthias Wichtlhuber, Juan Tapiador, Narseo Vallina-Rodriguez, Oliver Hohlfeld, Georgios Smaragdakis

Abstract:
Due to the COVID-19 pandemic, many governments imposed lockdowns that forced hundreds of millions of citizens to stay at home. The implementation of confinement measures increased Internet traffic demands of residential users, in particular, for remote working, entertainment, commerce, and education, which, as a result, caused traffic shifts in the Internet core. In this paper, using data from a diverse set of vantage points (one ISP, three IXPs, and one metropolitan educational network), we examine the effect of these lockdowns on traffic shifts. We find that the traffic volume increased by 15-20% almost within a week – while overall still modest, this constitutes a large increase within this short time period. However, despite this surge, we observe that the Internet infrastructure is able to handle the new volume, as most traffic shifts occur outside of traditional peak hours. When looking directly at the traffic sources, it turns out that, while hypergiants still contribute a significant fraction of traffic, we see (1) a higher increase in traffic of non-hypergiants, and (2) traffic increases in applications that people use when at home, such as Web conferencing, VPN, and gaming. While many networks see increased traffic demands, in particular, those providing services to residential users, academic networks experience major overall decreases. Yet, in these networks, we can observe substantial increases when considering applications associated with remote working and lecturing.

Publication:
IMC ’20: Proceedings of the ACM Internet Measurement Conference October 2020 Pages 1–18 https://doi.org/10.1145/3419394.3423658

Preprint

Media coverage:

This work is featured at