Paper on Novel Sketch Maintenance System Accepted for Publication in PVLDB Vol. 14

The paper “Scotch: Generating FPGA-Accelerators for Sketching at Line Rate” by TU Berlin (DIMA) and DFKI (IAM) researchers Martin Kiefer, Ilias Poulakis, Sebastian Breß and Volker Markl will be featured in the Proceedings of the VLDB Endowment (PVLDB), Volume 14.

In their paper, the authors propose Scotch, a novel system for accelerating sketch maintenance using custom FPGA hardware. Scotch provides a domain-specific language for the user-friendly, high-level definition of a broad class of sketching algorithms. A code generator performs the heavy lifting of producing the hardware description, while an auto-tuning algorithm tunes the summary size. Their evaluation shows that FPGA accelerators generated by Scotch outperform CPU- and GPU-based sketching by up to two orders of magnitude in terms of throughput and up to one order of magnitude in terms of energy efficiency.
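Scotch’s DSL itself is not reproduced in this announcement. As background, the following minimal Python sketch (illustrative only; all names are ours) shows one member of the algorithm class Scotch targets, the count-min sketch, whose per-item counter updates are exactly the “sketch maintenance” step that the generated FPGA accelerators speed up:

```python
import hashlib

class CountMinSketch:
    """Count-min sketch: a fixed-size summary answering approximate
    frequency queries over a data stream."""

    def __init__(self, width=1024, depth=4):
        self.width = width    # counters per row (the tunable summary size)
        self.depth = depth    # number of hash rows
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, row, item):
        # Derive one hash index per row from a keyed digest.
        digest = hashlib.sha256(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def update(self, item, count=1):
        # Sketch maintenance: one counter increment per row per item.
        for row in range(self.depth):
            self.table[row][self._index(row, item)] += count

    def estimate(self, item):
        # Frequency estimate: minimum over rows (never underestimates).
        return min(self.table[row][self._index(row, item)]
                   for row in range(self.depth))

sketch = CountMinSketch()
for packet_src in ["10.0.0.1", "10.0.0.2", "10.0.0.1"]:
    sketch.update(packet_src)
print(sketch.estimate("10.0.0.1"))  # >= 2
```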

To learn more about PVLDB, please visit http://vldb.org/pvldb/

BIFOLD Database Systems Research Papers Accepted at CIDR 2021

Researchers at the Database Systems and Information Management (DIMA) group at TU Berlin and the Intelligent Analytics for Massive Data (IAM) group at DFKI have been informed that their papers were accepted for presentation at the 11th Annual Conference on Innovative Data Systems Research (CIDR ’21), which will be held as a virtual event on January 11-15, 2021.

The vision paper “The Case for Distance-Bounded Spatial Approximations” by Eleni Zacharatou, Andreas Kipf, Ibrahim Sabek, Varun Pandey, Harish Doraiswamy and Volker Markl advocates approximate spatial data processing techniques that omit exact geometric tests and produce final answers solely on the basis of (fine-grained) approximations. Thanks to recent hardware advances, this vision can be realized today. The proposed techniques employ a distance-based error bound, i.e., a bound on the maximum spatial distance between false (or missing) and exact results, which is crucial for meaningful analyses. This bound makes it possible to control the precision of the approximation and to trade accuracy for performance.
A preprint version is available here.
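To illustrate the idea of a distance-based error bound (a toy example of ours, not the authors’ method): if a range query is answered from a regular grid instead of exact geometry, any wrongly classified point lies within half a cell diagonal of the true query boundary, so the grid resolution directly bounds the spatial error.

```python
import math

CELL = 0.5                        # grid resolution
BOUND = CELL * math.sqrt(2) / 2   # max distance of a wrongly classified
                                  # point from the true query boundary

def cell_center(x, y):
    # Snap a point to the center of its grid cell.
    cx, cy = math.floor(x / CELL), math.floor(y / CELL)
    return ((cx + 0.5) * CELL, (cy + 0.5) * CELL)

def approx_in_circle(x, y, qx, qy, r):
    # Approximate containment test: no exact geometry on the point
    # itself, only on its (coarse) cell center.
    mx, my = cell_center(x, y)
    return math.hypot(mx - qx, my - qy) <= r

points = [(1.1, 1.2), (2.9, 0.4), (4.0, 4.0)]
hits = [p for p in points if approx_in_circle(*p, qx=2.0, qy=1.0, r=2.0)]
print(hits, f"distance error bound: {BOUND:.3f}")
```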

The demo paper “Semi-Supervised Data Cleaning with Raha and Baran” by Mohammad Mahdavi and Ziawasch Abedjan demonstrates how two previously developed systems, Raha and Baran, can be used within an end-to-end data cleaning pipeline. In practice, with as few as 20 user-annotated tuples, it is possible to effectively identify and fix data quality problems inside a dataset. Furthermore, both systems benefit from knowledge of prior cleaning tasks: using transfer learning, they can optimize the data cleaning task at hand in terms of error detection runtime and error correction effectiveness.
A preprint version is available here.

To learn more about CIDR 2021, please visit http://cidrdb.org/cidr2021/index.html

BIFOLD Research Paper on Machine Learning for Quantum Chemistry published in Nature Communications

The paper “Quantum chemical accuracy from density functional approximations via machine learning” by Mihail Bogojeski, Leslie Vogt-Maranto, Mark E. Tuckerman, Klaus-Robert Müller and Kieron Burke was published in Nature Communications (2020)11:5223. In this paper, the authors from the Machine Learning group at TU Berlin, New York University and the University of California leveraged machine learning to calculate coupled-cluster energies from DFT densities, reaching a quantum chemical accuracy on test data that previously available methods could not achieve. Moreover, their approach significantly reduces the amount of training data required.

The Paper in Detail:

Quantum chemical accuracy from density functional approximations via machine learning

Authors:
Mihail Bogojeski (TU Berlin), Leslie Vogt-Maranto (New York University), Mark E. Tuckerman (New York University), Klaus-Robert Müller (TU Berlin, Korea University, MPI), Kieron Burke (University of California)

Abstract:
Kohn-Sham density functional theory (DFT) is a standard tool in most branches of chemistry, but accuracies for many molecules are limited to 2-3 kcal⋅mol⁻¹ with presently available functionals. Ab initio methods, such as coupled-cluster, routinely produce much higher accuracy, but computational costs limit their application to small molecules. In this paper, we leverage machine learning to calculate coupled-cluster energies from DFT densities, reaching quantum chemical accuracy (errors below 1 kcal⋅mol⁻¹) on test data. Moreover, density-based Δ-learning (learning only the correction to a standard DFT calculation, termed Δ-DFT) significantly reduces the amount of training data required, particularly when molecular symmetries are included. The robustness of Δ-DFT is highlighted by correcting “on the fly” DFT-based molecular dynamics (MD) simulations of resorcinol (C6H4(OH)2) to obtain MD trajectories with coupled-cluster accuracy. We conclude, therefore, that Δ-DFT facilitates running gas-phase MD simulations with quantum chemical accuracy, even for strained geometries and conformer changes where standard DFT fails.
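In symbols, the Δ-DFT scheme described in the abstract amounts to the following decomposition (notation ours, for illustration; see the paper for the precise formulation):

```latex
% Delta-DFT: learn only the correction from DFT to coupled-cluster level.
% E_CC: coupled-cluster energy, E_DFT: standard DFT energy for density rho,
% Delta_ML: machine-learned correction functional of the DFT density.
E_{\mathrm{CC}} \;\approx\; E_{\mathrm{DFT}}[\rho] \;+\; \Delta_{\mathrm{ML}}[\rho]
```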

Publication:
NATURE COMMUNICATIONS | (2020)11:5223 | https://doi.org/10.1038/s41467-020-19093-1

Deep Reinforcement Learning enables Robot to Beat Humans in Olympic Sport

A Deep Reinforcement Learning framework, developed by one of BIFOLD’s directors, Prof. Dr. Klaus-Robert Müller, and his colleagues at the Department of Brain and Cognitive Engineering of Korea University in Seoul, enabled the robot “Curly” to beat top-level athletes in the Olympic sport of curling. The work was recently featured in Nature Research Highlights.

“Curly” on the playing field (© Korea University)

The sport of curling is a good testbed for the performance of AI in the real world. On the one hand, the ice sheet on which curling is played is not overly complex; on the other hand, the conditions change constantly during the game. Also, the game’s timing rules do not allow for relearning while playing. Prof. Dr. Klaus-Robert Müller (Full Professor and Chair of the Machine Learning group at TU Berlin and Distinguished Professor at Korea University, Seoul), Prof. Dr. Dong-Ok Won and Prof. Dr. Seong-Whan Lee (both Korea University, Seoul) met the challenge by designing an adaptive Reinforcement Learning framework which uses temporal features to deal with the uncertainties of the game.

All strategic decisions, planning, and estimations in the synchronization between AI agents and robot control must be carried out not only in real time, but also under high uncertainty. At the same time, the data available to train the deep learning network is very limited. All in all, a huge challenge for modern AI.

Prof. Dr. Klaus-Robert Müller

Based on this framework, the robot “Curly” was able to beat top-level human curling players in three out of four games, after a short calibration phase. This human-like sports performance is an early but important step in physics-based real-world applications of AI robots. The work was published in Science Robotics, Vol. 5, Issue 46, and recently featured in Nature Research Highlights.

More information is available (in German) in the official press release of TU Berlin.

The Paper in Detail:

An adaptive deep reinforcement learning framework enables curling robots with human-like performance in real-world conditions

Authors: Dong-Ok Won, Klaus-Robert Müller, Seong-Whan Lee

Abstract:
The game of curling can be considered a good test bed for studying the interaction between artificial intelligence systems and the real world. In curling, the environmental characteristics change at every moment, and every throw has an impact on the outcome of the match. Furthermore, there is no time for relearning during a curling match due to the timing rules of the game. Here, we report a curling robot that can achieve human-level performance in the game of curling using an adaptive deep reinforcement learning framework. Our proposed adaptation framework extends standard deep reinforcement learning using temporal features, which learn to compensate for the uncertainties and nonstationarities that are an unavoidable part of curling. Our curling robot, Curly, was able to win three of four official matches against expert human teams [top-ranked women’s curling teams and Korea national wheelchair curling team (reserve team)]. These results indicate that the gap between physics-based simulators and the real world can be narrowed.

Published in: Science Robotics, 23 Sep 2020: Vol. 5, Issue 46, eabb9764


Using Machine Learning to Combat the Coronavirus

A joint team of researchers from TU Berlin and the University of Luxembourg is exploring why the spike protein of the SARS-CoV-2 virus is able to bind much more effectively to human cells than those of other coronaviruses. Google.org is funding the research with 125,000 US dollars.

BIFOLD Principal Investigator Dr. Grégoire Montavon of TU Berlin’s Machine Learning group leads the project together with Professor Alexandre Tkatchenko of the University of Luxembourg. Using an approach that combines quantum mechanics and Machine Learning, their project seeks to gain a deeper understanding of the binding behavior of the novel coronavirus (SARS-CoV-2).

We are delighted to be receiving this funding and support for our fundamental research. Upon completion of the project, we will be publishing our findings for the entire scientific community.

Dr. Grégoire Montavon

The researchers are analyzing the mechanism behind the unusually high binding affinity that the SARS-CoV-2 spike protein displays for human ACE2 host cell receptors: deciphering this mechanism would be an important first step towards the future development of treatments for the coronavirus. Using precise, long-term simulations of the molecular dynamics of the spike protein and the human receptor, the researchers aim to gain a better understanding of how the two interact.

More information is available in the press release of TU Berlin.

Prof. Dr. Müller presents Berlin’s AI research network at ELLIS Berlin inauguration

In a virtual inauguration event, the European Laboratory for Learning and Intelligent Systems (ELLIS) network welcomed new regional network units. One of BIFOLD’s Directors, Prof. Dr. Klaus-Robert Müller, presented the AI research network in Berlin as well as BIFOLD’s approach of combining Machine Learning and Big Data research.

The European Laboratory for Learning and Intelligent Systems (ELLIS) aims to strengthen Machine Learning research across Europe by creating European research hotspots and higher education programs, connecting researchers, and fostering knowledge transfer to industry.
In April this year, a new ELLIS unit was established at Technische Universität Berlin, following a request by one of BIFOLD’s directors and head of the Machine Learning group at TU Berlin, Prof. Dr. Klaus-Robert Müller, and other researchers. ELLIS now comprises 30 research units in 14 countries. In a virtual event on September 15, 2020, all units were inaugurated and given the opportunity to present their regional research focus.

In his function as director of ELLIS Berlin, Prof. Dr. Klaus-Robert Müller presented the strengths of Machine Learning and AI research in Berlin. As a unique feature, he stressed the importance of research at the intersection of Machine Learning and Big Data at Berlin’s AI Competence Center.

“It is time to rethink Machine Learning and database management systems and to really connect them if we want to go to the next level of processing large amounts of data. And this is what we do here – from the theoretical and also from the practical side.”

Prof. Dr. Klaus-Robert Müller

As part of the theoretical focus, Prof. Müller highlighted core research activities in Explainable AI, Deep Learning and Scalable Machine Learning. Application-oriented research is also of great importance in Berlin’s competence centers: for example, research concentrates on Machine Learning in quantum chemistry, medicine, and the digital humanities. In addition to close networking with regional and national research competencies, the ELLIS unit Berlin and the Berlin competence centers maintain close ties to Berlin’s flourishing start-up scene.

A recording of Prof. Müller’s presentation is available here: https://ellis.eu/units/berlin

TU Berlin and DFKI Database Systems Researchers Offer Multiple Presentations at VLDB 2020

Researchers at TU Berlin’s Database Systems and Information Management (DIMA) group and the Intelligent Analytics for Massive Data (IAM) group at DFKI presented one full paper, one demo paper, two workshop papers and three PhD workshop papers at the 46th International Conference on Very Large Data Bases (VLDB 2020).

Alexander Renz-Wieland presenting his paper “Dynamic Parameter Allocation in Parameter Servers” at VLDB 2020.

The paper “Dynamic Parameter Allocation in Parameter Servers” by Alexander Renz-Wieland et al. proposes to integrate dynamic parameter allocation into parameter servers, describes an efficient implementation of such a parameter server called Lapse, and experimentally compares its performance to existing parameter servers across a number of machine learning tasks.
To watch the recording of Alexander Renz-Wieland’s talk please visit: https://www.youtube.com/watch?v=aMSjPW8Dmc0
The paper is available here: https://www.vldb.org/pvldb/vol13/p1877-renz-wieland.pdf
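The core idea, dynamic parameter allocation, can be sketched in a few lines: parameters are not pinned to a fixed server node but can relocate to the worker that accesses them, turning subsequent accesses into local ones. The following toy single-process Python model is our illustration of that idea, not Lapse’s actual protocol or API:

```python
# Toy model of dynamic parameter allocation in a parameter server:
# parameters start on a "home" node chosen by hashing, but ownership
# moves to the worker that pulls them, so later accesses become local.

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.params = {}

class ParameterServer:
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.owner = {}                        # key -> owning node id

    def put(self, key, value):
        home = hash(key) % len(self.nodes)     # static initial placement
        self.owner[key] = home
        self.nodes[home].params[key] = value

    def pull(self, key, worker):
        if self.owner[key] != worker:          # remote access: relocate
            old = self.owner[key]
            value = self.nodes[old].params.pop(key)
            self.nodes[worker].params[key] = value
            self.owner[key] = worker
        return self.nodes[worker].params[key]  # now a local read

ps = ParameterServer(num_nodes=4)
ps.put("w_42", 0.1)
ps.pull("w_42", worker=2)      # first access moves the parameter
assert ps.owner["w_42"] == 2   # worker 2's later accesses are local
```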

In the demo paper “Demand-based Sensor Data Gathering with Multi-Query Optimization”, Julius Hülsmann et al. demonstrate a technique for minimizing the number of network transmissions while maintaining the desired accuracy. The presented algorithm for read and transmission sharing among queries goes hand in hand with state-of-the-art machine learning techniques for adaptive sampling. The authors (1) implement the technique and deploy it on a sensor node, (2) replay sensor data from two real-world scenarios, (3) provide an interface for submitting custom queries, and (4) present an interactive dashboard on which visitors can observe live statistics on the read and transmission savings achieved in real-world use cases. The dashboard also visualizes the optimizations currently performed by the read-scheduling procedure and thus conveys real-time insights and a deep understanding of the presented algorithm.
To watch the recording of Julius Hülsmann’s talk please visit: https://www.youtube.com/watch?v=ctpj-o3b4B4
The paper is available here: http://www.vldb.org/pvldb/vol13/p2801-hulsmann.pdf
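Read sharing across queries boils down to serving all queries that are due at a given instant from one physical sensor read. Below is a minimal Python sketch of this scheduling idea, a simplification of ours with fixed per-query intervals instead of the paper’s ML-based adaptive sampling:

```python
# Each query wants the sensor sampled every `interval` ticks. Instead of
# one read per query, the node performs one physical read per tick where
# at least one query is due and fans the value out to all due queries.

queries = {"q1": 2, "q2": 3, "q3": 6}   # query -> sampling interval

def read_sensor(t):                      # stand-in for the real driver call
    return 20.0 + 0.1 * t

def deliver(query, t, value):
    print(f"t={t}: {query} <- {value:.1f}")

def schedule(horizon):
    reads = 0
    for t in range(1, horizon + 1):
        due = [q for q, iv in queries.items() if t % iv == 0]
        if due:
            reads += 1                   # one shared physical read
            value = read_sensor(t)
            for q in due:
                deliver(q, t, value)
    return reads

shared = schedule(12)
naive = sum(12 // iv for iv in queries.values())   # one read per query
print(f"shared reads: {shared} vs naive: {naive}") # 8 vs 12
```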

At this year’s International Workshop on Very Large Internet of Things (VLIOT 2020), held in conjunction with VLDB 2020, Dr. Steffen Zeuch et al. presented their IoT analytics paper “NebulaStream: Complex analytics beyond the Cloud”. The goal of this paper is to bridge the gap between the requirements of upcoming IoT applications and the features supported by current IoT data management systems. To this end, the authors outline how state-of-the-art SPEs have to change to exploit the new capabilities of the IoT and showcase how they tackle IoT challenges in their own system, NebulaStream. The paper lays the foundation for a new type of system that leverages the IoT to enable large-scale applications over millions of IoT devices in highly dynamic and geo-distributed environments.
To watch the recording of Steffen Zeuch’s talk please visit: https://www.youtube.com/watch?v=PCvihOXjhI8
The paper is available here: https://www.ronpub.com/ojiot/OJIOT_2020v6i1n07_Zeuch.html

Also in conjunction with VLDB 2020, the 2nd International Workshop on Large Scale Graph Data Analytics (LSGDA 2020) took place this year on September 4. Based on his paper “Distributed Graph Analytics with Datalog Queries in Flink”, co-authored by Gábor Gévay and Prof. Dr. Volker Markl, Muhammad Imran presented the Cog system, which runs Datalog programs on Apache Flink. The authors implemented a parallel semi-naive evaluation algorithm that takes advantage of Flink’s delta iteration to propagate only the tuples that need to be processed in subsequent iterations. Flink’s delta iteration reduces the overhead that acyclic dataflow systems like Spark incur when evaluating recursive queries, making them more efficient. The experiments show that Cog outperformed BigDatalog, the state-of-the-art distributed Datalog evaluation system, in most tests.
To watch a recording of Muhammad Imran’s talk please visit: https://www.youtube.com/watch?v=Ozvr1wrQcy4
A preprint of the paper is available here: https://bit.ly/2HhzIsh
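For readers unfamiliar with semi-naive evaluation: instead of re-joining the entire result set in every iteration, only the delta (the tuples newly derived in the previous round) is joined with the base relation. Here is a self-contained Python sketch on the classic transitive-closure program; this is our illustration of the general technique, not Cog’s Flink implementation:

```python
# Semi-naive evaluation of the transitive-closure Datalog program
#   tc(X,Y) :- edge(X,Y).
#   tc(X,Z) :- tc(X,Y), edge(Y,Z).
# Only the delta is joined with edge, mirroring delta iteration.

edges = {(1, 2), (2, 3), (3, 4)}

tc = set(edges)         # solution set
delta = set(edges)      # workset: tuples derived in the last round
while delta:
    # Join only the new tuples with edge, then keep the truly new ones.
    derived = {(x, z) for (x, y) in delta
                      for (y2, z) in edges if y == y2}
    delta = derived - tc
    tc |= delta

print(sorted(tc))  # [(1,2),(1,3),(1,4),(2,3),(2,4),(3,4)]
```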

Additionally, three PhD students presented their work during the VLDB 2020 PhD workshop:

Serafeim Papadias: “Tunable Streaming Graph Embeddings at Scale”
Presentation | Paper
Kajetan Maliszewski: “Secure Data Processing at Scale”
Presentation | Paper
Ariane Ziehn: “Complex Event Processing for the Internet of Things”
Paper

To learn more about VLDB 2020 please visit: https://vldb2020.org/

TUB and FZ Jülich researchers apply reinforcement learning to robotic manipulation of molecules

In a cooperation between the Machine Learning group at TU Berlin, led by Prof. Dr. Klaus-Robert Müller (BIFOLD Director), and Jülich’s Quantum Nanoscience institute, led by Prof. Dr. Stefan Tautz, researchers enabled a robot to selectively grip and move single molecules out of a layer by applying reinforcement learning. To achieve this, they let the machine learn in its real environment and in a model in parallel. The corresponding paper was published in “Science Advances”.

“This is the first time ever that we have succeeded in bringing together artificial intelligence and nanotechnology,” emphasizes Klaus-Robert Müller. “Up until now, this has only been a ‘proof of principle’,” Stefan Tautz adds. “However, we are confident that our work will pave the way for the robot-assisted automated construction of functional supramolecular structures, such as molecular transistors, memory cells, or qubits – with a speed, precision, and reliability far in excess of what is currently possible.”

To learn more please visit the official press release of TU Berlin.

The Paper in Detail:

Autonomous robotic nanofabrication with reinforcement learning

Authors: Philipp Leinen, Malte Esders, Kristof T. Schütt, Christian Wagner, Klaus-Robert Müller, F. Stefan Tautz

Abstract:
The ability to handle single molecules as effectively as macroscopic building blocks would enable the construction of complex supramolecular structures inaccessible to self-assembly. The fundamental challenges obstructing this goal are the uncontrolled variability and poor observability of atomic-scale conformations. Here, we present a strategy to work around both obstacles and demonstrate autonomous robotic nanofabrication by manipulating single molecules. Our approach uses reinforcement learning (RL), which finds solution strategies even in the face of large uncertainty and sparse feedback. We demonstrate the potential of our RL approach by removing molecules autonomously with a scanning probe microscope from a supramolecular structure. Our RL agent reaches an excellent performance, enabling us to automate a task that previously had to be performed by a human. We anticipate that our work opens the way toward autonomous agents for the robotic construction of functional supramolecular structures with speed, precision, and perseverance beyond our current capabilities.

Published in: Science Advances  02 Sep 2020: Vol. 6, no. 36, eabb6987

An Overview of the Current State of Research in BIFOLD

Since the official announcement of the Berlin Institute for the Foundations of Learning and Data in January 2020, BIFOLD researchers have achieved a wide array of advancements in the domains of Machine Learning and Big Data Management, as well as in a variety of application areas, by developing new systems and creating impactful publications. The following summary provides an overview of recent research activities and successes.

BIFOLD seeks to lay the foundations for Big Data and Machine Learning and provides internationally visible impulses from the interlocking research of these two fields – an emerging scientific discipline. In this way, BIFOLD advances the scientific and industrial use of both technology domains. For the broad, productive application of AI, it is crucial to develop new, automatically scalable technologies that organize the constantly growing flood of data and use intelligent procedures to derive well-founded information for data-driven decisions. Only by combining Big Data and Machine Learning can new value be derived from large and heterogeneous amounts of data, which is of fundamental importance for more and more applications, from healthcare to the digital humanities. The close interaction between Big Data and Machine Learning is the unique selling point of BIFOLD within the network of German AI centers of excellence. Training, start-ups, transfer, cooperation and networking activities make BIFOLD a nucleus for technology transfer and innovation in the interdisciplinary ecosystem of Big Data and Machine Learning in Germany and Europe. Several BIFOLD employees are active in the consortia ELLIS (European Lab for Learning & Intelligent Systems, specifically the Berlin ELLIS unit) and CLAIRE (Confederation of Laboratories for Artificial Intelligence Research in Europe) and contribute, for example, new AI methods for statistical mechanics and quantum chemistry.

State of Research

During the first half of 2020, BIFOLD Research Groups have produced over 55 new scientific publications and won several prestigious awards.

Machine Learning

In the field of Machine Learning, BIFOLD has made, and continues to make, significant progress in both basic and applied research across a wide range of applications.

Under the direction of FU Berlin’s Prof. Dr. Tim Conrad, researchers at the Zuse Institute Berlin are working on the analysis of dynamic networks. The focus is on learning the laws of varying change processes in the context of medical and social phenomena. In collaboration with Université Paris-Saclay’s Prof. Dr. Gilles Blanchard, an improved estimation method for kernel embeddings of distributions was developed.

Under the direction of TU Berlin’s Prof. Dr. Manfred Opper, dynamic language models were developed that combine approximate Bayesian inference with deep neural networks and are suitable for conducting sensitivity analyses in systems. In addition, research on neural variational inference for the circuits of dynamic systems and on stochastic process approximations for the training of Bayesian neural networks is currently underway.

Under the direction of TU Berlin’s Prof. Dr. Klaus-Robert Müller and Dr. Grégoire Montavon, two novel explanation methods were developed in the field of Explainable AI (XAI): BiLRP for deep similarity models and GNN-LRP for deep graph-based neural networks. Additionally, insights were gained into protection mechanisms that guard interpretation methods against manipulation attacks. Furthermore, a BIFOLD publication on quantum chemistry was among the 50 most-read Nature Communications articles in chemistry and materials sciences published in 2019.

TU Berlin’s Prof. Dr. Gitta Kutyniok, Head of the Applied Functional Analysis Group, conducts research on the theoretical foundations of artificial neural networks and is the coordinator of a new DFG Priority Program on the Theoretical Foundations of Deep Learning.

Under the direction of TU Berlin’s Prof. Dr. Giuseppe Caire, Head of the Communication and Information Theory Group, researchers are employing ML, jointly with TU Berlin mathematicians, to solve various communications problems (e.g., real-time localization, cellular optimization, D2D connection planning) and to improve existing solutions.

At the Fraunhofer Heinrich Hertz Institute, Dr. Wojciech Samek and his colleagues have gained important insights into federated learning. To avoid suboptimal results (e.g., when data distributions deviate on the client side), a novel framework for federated multi-task learning was developed. Clustered federated learning facilitates managing dynamic client populations that evolve over time under privacy constraints.

Big Data

Researchers in TU Berlin’s Database Systems and Information Management (DIMA) group, led by Prof. Dr. Volker Markl, have been conducting research on the scalable real-time processing of very large, heterogeneous and geographically distributed data streams. Jointly with DIMA researchers, Project Lead Dr. Steffen Zeuch is currently developing an end-to-end data processing system for the Internet of Things (IoT) called NebulaStream. NebulaStream is being designed to cope with the heterogeneity and distribution of data and systems. It supports various data and programming models that go beyond relational algebra and addresses potentially unreliable communication. The NebulaStream platform enables the development of novel IoT applications, and first test results confirm that it offers fast and efficient data delivery. In cooperation with Prof. Dr. Matthias Böhm at TU Graz, investigations into the declarative specification and automatic optimization of data science pipelines are underway.

Among the more recent accomplishments is a BIFOLD publication from the DIMA group, produced in collaboration with HPI’s Prof. Dr. Tilmann Rabl, Head of the Data Engineering Systems group and BIFOLD Principal Investigator, on the use of modern GPUs to accelerate the processing of database queries; it earned the 2020 ACM SIGMOD Best Paper Award. Furthermore, the Data Engineering Systems group and DIMA developed a new message broker system that addresses weaknesses of older systems with respect to new storage and access technologies. This research work won second place in the highly regarded ACM SIGMOD 2020 Student Research Competition.

Today, performing AI and Data Science (DS) on a multitude of assets requires an enormous amount of resources. However, only a few players have this capability, which results in a lock-in effect. Project Leads Dr. Jorge-Arnulfo Quiané-Ruiz and Dr. Jonas Traub, jointly with DIMA researchers, are currently designing and building an ecosystem called Agora that unifies data, algorithms, models and computing resources, in order to enable exchange across a broad audience. Agora treats data-related assets as first-class citizens, uses fine-grained asset exchange, enables the combination of assets to create novel applications, and offers the flexibility required to run applications on the available resources.

Dr. Jorge-Arnulfo Quiané-Ruiz and Dr. Kaustubh Beedkar, jointly with Prof. Dr. Volker Markl are currently investigating how the processing of geo-distributed queries can be reconciled with data movement restrictions imposed by guidelines. To address these challenges, they have created a declarative specification language for guidelines, a conformance-based optimizer for distributed queries, and efficient mechanisms for policy evaluation. Moreover, Dr. Jorge-Arnulfo Quiané-Ruiz and Dr. Zoi Kaoudi, jointly with researchers at QCRI, have developed a novel approach for the debugging of large-scale dataflow jobs.

TU Berlin’s Prof. Dr. Begüm Demir, Head of the Remote Sensing Image Analysis Group, conducts research in remote sensing, signal processing, image processing, machine learning, and big data in earth observation. In particular, she currently holds an ERC Starting Grant called BigEarth, which aims to develop a scalable earth observation (EO) image search and retrieval system for the fast discovery of critical information contained in massive EO archives. Additionally, she is the recipient of the prestigious 2018 Early Career Award presented by the IEEE Geoscience and Remote Sensing Society.

TU Berlin’s Prof. Dr. Ziawasch Abedjan, Head of the Big Data Management (BigDaMa) Group conducts research in data cleaning, data integration, machine learning, and data science. BigDaMa created a prototype for the integration of web data with data sets for prediction tasks, which was awarded first place in the 2019 GI Data Science Challenge. In collaboration with DIMA researchers, they are investigating how to optimize machine learning processes via the integration of artifacts. A recent publication on configuration-free error detection in datasets received the ACM SIGMOD Most Reproducible Paper Award.

TU Berlin’s Prof. Dr. Odej Kao, Head of the Distributed and Operating Systems Group, conducts research on the collaborative processing of sensor data using heterogeneous distributed computing resources and on achieving automatic compliance with prescribed properties. In collaboration with Dr. Fuyuki Ishikawa of the National Institute of Informatics in Tokyo, Japan, there is ongoing research in the area of reliability testing. Additionally, there is an ongoing collaboration with Prof. Dr. Gjorgji Madjarov of the University of Skopje in North Macedonia on self-monitored learning for medical decision support.

TU Berlin’s Prof. Dr. Georgios Smaragdakis is currently Head of the Internet Network Architectures and the Internet Measurement and Analysis Groups. His focus is on improving the performance of geo-distributed analyses via software-defined networks and an intelligent scheduler that works across varying analysis platforms. More recently, his group analyzed data to assess the impact of the COVID-19 pandemic on network performance. Collaboration partners include researchers from a number of international and German institutions, including Imperial College London, Northeastern University, IMDEA Networks, FORTH, Akamai Technologies, the Max Planck Institute for Informatics, and DE-CIX. His work was recognized with best paper awards at ACM CoNEXT 2019 and Best of ACM SIGCOMM Computer Communication Review 2019, as well as two IETF/IRTF Applied Networking Research Prizes in 2019 and 2020.

DFKI’s Prof. Dr. Sebastian Möller, Head of the Speech and Language Technology Group is conducting research on scalable cross-lingual information extraction methods for the identification and prevention of interactions from social media and forums. The project is conducted in close cooperation with LIMSI, CNRS in France.

TU Berlin’s Prof. Dr. Manfred Hauswirth, Head of the Open Distributed Systems (ODS) group and managing director of the Fraunhofer Institute for Open Communication Systems (FOKUS), conducts research on open distributed systems, including sensor/stream middleware, cyber-physical systems, P2P, and Semantic Web/Linked Data. Together with Dr. Danh Le Phuoc, the ODS team has been building the CQELS Framework, which supports neural-symbolic reasoning operations on multimodal stream data such as video streams, cameras/LIDARs and semantic streaming graphs. The CQELS Framework helped the ODS team win the Best Paper Award at the 8th International Conference on the Internet of Things in 2018. A recent ODS publication on “autonomous semantic stream processing” received the Best Paper Runner-up award at the 9th Joint International Semantic Technology Conference in 2019.

This year, BIFOLD research on Big Data Science received very high international visibility at prestigious conferences such as SIGMOD, VLDB, ICDE and CIDR.

Application-oriented Research

Medicine

Various biomedical applications are tackled in BIFOLD. Prof. Dr. Frank Noé, Head of the Computational Molecular Biology Group at FU Berlin, has been working on the development of SARS-CoV-2 drugs using simulation and ML for the JEDI COVID-19 Grand Challenge, jointly with virologists and physicians in Germany and the USA. Prof. Dr. Anja Hennemuth of Charité is conducting research on the analysis of the heart muscle and blood vessel walls using ML. Further focal points include the development of novel approaches for the interactive exploration of 4D radiomics features of the heart as well as the clinical evaluation of research software. Under the direction of Prof. Dr. Hennemuth, the CADA Challenge for the detection and analysis of cranial aneurysms is currently underway. The results will be evaluated and published at the upcoming MICCAI conference, to be held in September 2020.

Prof. Dr. Martin Vingron, Director of the Computational Molecular Biology Department at the Max Planck Institute for Molecular Genetics, is investigating the mechanisms of cell-type-specific gene regulation. Machine learning and statistical methods are employed to identify regulatory DNA elements in the genome and link them to their target genes. In close cooperation with Charité, under the direction of Prof. Dr. Frederick Klauschen, a clinically relevant contribution to the computer-assisted diagnosis of lung carcinomas and metastases of head and neck tumors has been made. The developed method exceeds the prediction accuracy of SVMs and randomized tests. The group of BIFOLD Senior Researcher Dr. Roland Schwarz at the Max Delbrück Center for Molecular Medicine conducts research in cancer genomics using ML. Earlier this year, in conjunction with members of the International Cancer Genome Consortium, a paper on the Pan-Cancer Analysis of Whole Genomes (PCAWG) was published in Nature. The widely cited study identifies common mutation patterns in over 2,600 whole cancer genomes.

Prof. Dr. Thomas Wiegand, Executive Director at the Fraunhofer HHI (Heinrich Hertz Institute) conducts research in signal processing, data and video compression, communications, and applied ML. Currently, he chairs the ITU/WHO Focus Group on AI for Health.

Prof. Dr. Uwe Ohler’s group at the Max Delbrück Center is active in applying ML to understand the gene regulatory code of complex organisms. The specific goal is to interpret how genetic sequence variation changes the activity of genes, and initial results show great promise for using deep learning to map and score the target sequences of regulatory proteins.

Security

In the field of computer security, Prof. Dr. Jean-Pierre Seifert, Head of the Security in Telecommunications (SECT) Group at TU Berlin, is conducting research on multiple fronts: on modeling physically unclonable functions (PUFs), in cooperation with researchers at CWI (Netherlands) and the University of Florida (USA), and on separating the PAC (Probably Approximately Correct) learning model for quantum computers from the classical PAC learning model, jointly with Prof. Dr. Jens Eisert, Head of a Research Group at the Dahlem Center for Complex Quantum Systems at FU Berlin.

At TU Braunschweig, Prof. Dr. Konrad Rieck is working on attacks and protective measures for learning-based and data-driven IT systems within the framework of BIFOLD. This research is carried out in cooperation with the Cyber Security in the Age of Large-Scale Adversaries (CASA) Cluster of Excellence at the Ruhr University Bochum and with researchers from King’s College London.

Digital Humanities

In the Digital Humanities, Prof. Dr. Matteo Valleriani at the Max Planck Institute for the History of Science is working on the development of a deep neural network to calculate similarities between astronomical tables as printed in early modern scientific works. In cooperation with experts from the humanities and researchers in the Explainable AI community, this effort would make it possible to automatically determine semantic similarity across a large number of historical numerical tables.