NebulaStream Paper Accepted at VLIoT 2020

Latest Work on the NebulaStream System by Database Systems Researchers of TU Berlin and DFKI to be Presented at VLIoT 2020

The Paper “NebulaStream: Complex Analytics Beyond the Cloud” by Steffen Zeuch et al. was accepted for presentation at the 2020 International Workshop on Very Large Internet of Things (VLIoT 2020). VLIoT 2020 will take place in conjunction with the VLDB 2020 conference.
In this paper, the NebulaStream (NES) project team in DFKI’s IAM group and TU Berlin’s DIMA group shows why there is a need for a new end-to-end data processing system for the Internet of Things (IoT). NES deals with the heterogeneity and distribution of computers and data, supports various data and programming models that go beyond relational algebra, and addresses potentially unreliable communication. The NebulaStream platform thereby enables new IoT applications in various application areas.

The Paper in Detail:

NebulaStream: Complex Analytics Beyond the Cloud

Authors: Steffen Zeuch, Eleni Tzirita Zacharatou, Shuhao Zhang, Xenofon Chatziliadis, Ankit Chaudhary, Bonaventura DelMonte, Philipp M. Grulich, Dimitrios Giouroukis, Ariane Ziehn, Volker Markl

Abstract:
The arising Internet of Things (IoT) will require significant changes to current stream processing engines (SPEs) to enable large-scale IoT applications. In this paper, we present challenges and opportunities for an IoT data management system to enable complex analytics beyond the cloud. As one of the most important upcoming IoT applications, we focus on the vision of a smart city. The goal of this paper is to bridge the gap between the requirements of upcoming IoT applications and the supported features of an IoT data management system. To this end, we outline how state-of-the-art SPEs have to change to exploit the new capabilities of the IoT and showcase how we tackle IoT challenges in our own system, NebulaStream. This paper lays the foundation for a new type of systems that leverages the IoT to enable large-scale applications over millions of IoT devices in highly dynamic and geo-distributed environments.

Preprint

TUB Datalog Paper accepted at LSGDA 2020

TU Berlin Datalog Research Paper was Accepted for Presentation at LSGDA 2020

The Paper “Distributed Graph Analytics with Datalog Queries in Flink” by TU Berlin database systems researchers Muhammad Imran, Gábor Gévay, and Volker Markl will be presented at the 2nd International Workshop on Large Scale Graph Data Analytics (LSGDA 2020), held in conjunction with the 2020 VLDB Conference in Tokyo, Japan, on September 4, 2020.

The Paper in Detail:

Distributed Graph Analytics with Datalog Queries in Flink

Authors:
Muhammad Imran, Gábor Gévay, Volker Markl

Abstract:
Large-scale, parallel graph processing has been in demand over the past decade. Succinct program structure and efficient execution are among the essential requirements of graph processing frameworks. In this paper, we present Cog, which executes Datalog programs on the Apache Flink distributed dataflow system. We chose Datalog for its compact program structure and Flink for its efficiency. We implemented a parallel semi-naive evaluation algorithm exploiting Flink’s delta iteration to propagate only the tuples that need to be further processed to the subsequent iterations. Flink’s delta iteration feature reduces the overhead present in acyclic dataflow systems, such as Spark, when evaluating recursive queries, hence making it more efficient. We demonstrated in our experiments that Cog outperformed BigDatalog, the state-of-the-art distributed Datalog evaluation system, in most of the tests.
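
To make the semi-naive evaluation mentioned in the abstract concrete, the following minimal Python sketch computes the transitive closure of an edge relation, joining only the tuples derived in the previous iteration (the “delta”) with the edge relation in each round. This is a single-machine illustration of the idea that Flink’s delta iteration exploits, not Cog’s actual implementation; all names in the sketch are illustrative.

```python
# Minimal sketch of semi-naive Datalog evaluation for transitive closure:
#   tc(X, Y) :- edge(X, Y).
#   tc(X, Y) :- tc(X, Z), edge(Z, Y).
# Only the "delta" (tuples derived in the previous iteration) is joined
# with edge/2 in each round -- the idea Flink's delta iteration exploits.

def transitive_closure(edges):
    edges = set(edges)
    total = set(edges)          # all tc facts derived so far
    delta = set(edges)          # facts that are new in the last iteration
    # index edge/2 by its first attribute for the join tc(X, Z), edge(Z, Y)
    by_src = {}
    for (src, dst) in edges:
        by_src.setdefault(src, set()).add(dst)
    while delta:
        new_facts = set()
        for (x, z) in delta:
            for y in by_src.get(z, ()):
                fact = (x, y)
                if fact not in total:
                    new_facts.add(fact)
        total |= new_facts
        delta = new_facts       # only newly derived tuples are propagated
    return total

if __name__ == "__main__":
    print(sorted(transitive_closure([(1, 2), (2, 3), (3, 4)])))
    # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```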

Preprint

DFG Program for Artificial Neural Networks

Prof. Dr. Kutyniok coordinates new DFG priority program on artificial neural networks

Prof. Dr. Gitta Kutyniok is the coordinator of the new DFG (German Research Foundation) priority program “Theoretical Foundations of Deep Learning”.

Artificial neural networks are currently the driving force in the field of Artificial Intelligence (AI). Methods based on deep neural networks show impressive success in real applications, ranging from autonomous driving to game intelligence and the health sector. At the same time, such methods have a similarly strong influence on science, where they often supplement or even replace classic model-based methods for solving mathematical problems such as inverse problems or partial differential equations. Despite these outstanding successes, the majority of research on neural networks is empirically driven and a theoretical basis is largely missing. Moreover, there are proven cases where these techniques fail completely under small perturbations, so improvements backed by a theoretical understanding are needed.

“The main goal of this priority program is therefore the development of a comprehensive theoretical basis for artificial neural networks,” says Prof. Dr. Kutyniok.

The DFG is providing approximately 8.5 million euros for the first three years of the Priority Program.

Outstanding Reviewer Dr. Sauceda

Dr. Huziel E. Sauceda nominated Reviewer of the Month by Communications Physics

Dr. Huziel E. Sauceda, postdoctoral researcher in Prof. Dr. Klaus-Robert Müller’s group, was nominated as Reviewer of the Month by the Nature Research journal Communications Physics.

Every month, Communications Physics highlights the outstanding contributions of referees who have gone above and beyond what is expected of a reviewer in terms of the value of their reports, the detail of their analysis, or the degree to which they have helped the authors improve their manuscripts prior to publication.

Paper accepted in PVLDB Vol. 13

Database Systems Research Paper Accepted for Publication in PVLDB Vol. 13

The Paper “Dynamic Parameter Allocation in Parameter Servers” [1], authored by Alexander Renz-Wieland, Rainer Gemulla, Steffen Zeuch, and Volker Markl, was accepted for publication in Proceedings of the VLDB Endowment Vol. 13 [2]. The authors from TU Berlin’s DIMA group [3] and DFKI’s IAM group [4] propose to integrate dynamic parameter allocation into parameter servers, describe an efficient implementation of such a parameter server called Lapse, and experimentally compare its performance to existing parameter servers across a number of machine learning tasks.

References:

[1] “Dynamic Parameter Allocation in Parameter Servers”, Alexander Renz-Wieland, Rainer Gemulla, Steffen Zeuch, Volker Markl, https://bit.ly/2MZuXTU
[2] Proceedings of the VLDB Endowment, Volume 13, 2019-2020, http://www.vldb.org/pvldb/vol13.html
[3] TU Berlin Database Systems & Information Management Group, https://www.dima.tu-berlin.de/.
[4] DFKI Intelligent Analytics for Massive Data Group, https://bit.ly/2LKoY4Y.

The Paper in Detail:

Dynamic Parameter Allocation in Parameter Servers

Authors: Alexander Renz-Wieland, Rainer Gemulla, Steffen Zeuch, Volker Markl

Abstract: To keep up with increasing dataset sizes and model complexity, distributed training has become a necessity for large machine learning tasks. Parameter servers ease the implementation of distributed parameter management, a key concern in distributed training, but can induce severe communication overhead. To reduce communication overhead, distributed machine learning algorithms use techniques to increase parameter access locality (PAL), achieving up to linear speed-ups. We found that existing parameter servers provide only limited support for PAL techniques, however, and therefore prevent efficient training. In this paper, we explore whether and to what extent PAL techniques can be supported, and whether such support is beneficial. We propose to integrate dynamic parameter allocation into parameter servers, describe an efficient implementation of such a parameter server called Lapse, and experimentally compare its performance to existing parameter servers across a number of machine learning tasks. We found that Lapse provides near-linear scaling and can be orders of magnitude faster than existing parameter servers.
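
The idea of dynamic parameter allocation can be illustrated with a small, purely hypothetical Python sketch (it does not reflect Lapse’s actual API or implementation): a key-value parameter store in which each parameter has exactly one owning node, and a worker can request relocation (“localize”) so that its subsequent accesses to that parameter become local instead of remote.

```python
# Toy sketch of a parameter server with dynamic parameter allocation.
# Parameters live on exactly one node; a worker may relocate ("localize")
# a parameter before dense access, turning remote reads/updates into
# local ones -- the parameter-access-locality (PAL) idea from the paper.
# Names and structure are illustrative, not Lapse's real API.

class ToyParameterServer:
    def __init__(self, num_nodes, keys):
        self.num_nodes = num_nodes
        # initial static allocation: key -> owning node (hash partitioning)
        self.owner = {k: hash(k) % num_nodes for k in keys}
        self.values = {k: 0.0 for k in keys}
        self.remote_accesses = 0

    def _access(self, node, key):
        if self.owner[key] != node:
            self.remote_accesses += 1   # would be a network round trip

    def pull(self, node, key):
        self._access(node, key)
        return self.values[key]

    def push(self, node, key, grad, lr=0.1):
        self._access(node, key)
        self.values[key] -= lr * grad

    def localize(self, node, key):
        # dynamic allocation: move ownership to the requesting node
        self.owner[key] = node


if __name__ == "__main__":
    ps = ToyParameterServer(num_nodes=4, keys=range(16))
    # worker 0 repeatedly updates key 5 without localizing it first ...
    for _ in range(100):
        ps.push(node=0, key=5, grad=1.0)
    before = ps.remote_accesses
    # ... then localizes it, so the same access pattern becomes local
    ps.localize(node=0, key=5)
    for _ in range(100):
        ps.push(node=0, key=5, grad=1.0)
    print(before, ps.remote_accesses - before)  # e.g. 100 vs. 0 remote accesses
```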

Preprint

Cloud Computing for Cancer Research

Cloud Computing contributes to Cancer Research

PCAWG, the largest cancer research consortium in the world, has set itself the task of improving our understanding of genetic mutations in tumors. A new study by the international research group, to which researchers in Dr. Roland Schwarz’s group at the Max Delbrück Center for Molecular Medicine (MDC) substantially contributed, has now been published in the journal Nature.

Using high-performance computers and cloud computing, BIFOLD senior researcher Dr. Schwarz and his team at the MDC found that alterations in copy number play a decisive role in the altered gene expression of cancer cells. “By analyzing the DNA sequences of the entire genome, we have been able to accurately determine how copy number variants work together with point mutations – and what influence this in turn has on the expression of a specific parental allele,” Schwarz explains.

For more information on the topic, please visit https://www.mdc-berlin.de/news/press/PCAWG

New Research Fellow Olga Nicolaeva

Prof. Dr. Matteo Valleriani’s research group at the Max Planck Institute for the History of Science (MPIWG) will expand by one research fellow: Olga Nicolaeva. Within BIFOLD, they will create a predictive machine learning model that is able to establish a causal connection between the distribution of “knowledge atoms” (illustrations, tables, text parts) in the corpus of early modern textbooks on geocentric cosmology that they are analyzing and the formation of “dominant thought” in the processes of knowledge evolution. They will look into various networks of knowledge atoms, based on their similarity, and apply methods of graph deep learning to identify historical, socio-economic, and institutional factors that can explain the emergence of dominant clusters of knowledge atoms and their “popularity” in the processes of knowledge transformation.

Paper accepted at DEBS 2020

Paper on Adaptive Sampling and Filtering for IoT accepted at DEBS 2020

A paper by data management systems researchers in the Database Systems and Information Management (DIMA) Group at TU Berlin and the Intelligent Analytics for Massive Data (IAM) Group at DFKI (the German Research Center for Artificial Intelligence) has been accepted for presentation at the 14th ACM International Conference on Distributed and Event-Based Systems (DEBS 2020), July 13–17, 2020, in Montreal, Canada.
The paper “A Survey of Adaptive Sampling and Filtering Algorithms for the Internet of Things”, authored by D. Giouroukis et al., gathers representative, state-of-the-art algorithms that address scalability challenges in real-time and distributed sensor systems. To gather data in a timely and efficient manner, the authors focus on two techniques, namely adaptive sampling and adaptive filtering. The paper outlines current research challenges for the IoT as well as future research directions, and aims to support researchers in their decision-making process when designing distributed sensor systems.

The Paper in Detail:

A Survey of Adaptive Sampling and Filtering Algorithms for the Internet of Things

Authors: Dimitrios Giouroukis, Alexander Dadiani, Jonas Traub, Steffen Zeuch, Volker Markl

Abstract:
The Internet of Things (IoT) represents one of the fastest emerging trends in the area of information and communication technology. The main challenge in the IoT is the timely gathering of data streams from potentially millions of sensors. In particular, those sensors are widely distributed, constantly in transit, highly heterogeneous, and unreliable. To gather data in such a dynamic environment efficiently, two techniques have emerged over the last decade: adaptive sampling and adaptive filtering. These techniques dynamically reconfigure rates and filter thresholds to trade off data quality against resource utilization. In this paper, we survey representative, state-of-the-art algorithms to address scalability challenges in real-time and distributed sensor systems. To this end, we cover publications from top peer-reviewed venues over a period of more than 12 years. For each algorithm, we point out advantages, disadvantages, assumptions, and limitations. Furthermore, we outline current research challenges, future research directions, and aim to support readers in their decision process when designing extremely distributed sensor systems.
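
To make the two techniques concrete, the following Python sketch shows one simple, generic instance of each; these are textbook-style illustrations with made-up thresholds, not specific algorithms from the survey. The dead-band filter forwards a reading only if it deviates from the last forwarded value by more than a threshold, and the adaptive sampler widens its sampling interval while the signal is stable and shortens it when the signal changes quickly.

```python
# Illustrative sketches of adaptive filtering and adaptive sampling.
# Generic examples only; thresholds and bounds are made up and do not
# correspond to any particular algorithm covered in the survey.

class DeadBandFilter:
    """Adaptive filtering: forward a reading only if it differs from the
    last forwarded value by more than `threshold`."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.last_sent = None

    def offer(self, value):
        if self.last_sent is None or abs(value - self.last_sent) > self.threshold:
            self.last_sent = value
            return value        # forward to the sink
        return None             # suppress: saves bandwidth and energy


class AdaptiveSampler:
    """Adaptive sampling: widen the sampling interval while the signal is
    stable, shorten it when consecutive readings change a lot."""
    def __init__(self, min_interval=1.0, max_interval=60.0, change_threshold=0.5):
        self.interval = min_interval
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.change_threshold = change_threshold
        self.last_value = None

    def next_interval(self, value):
        if self.last_value is not None:
            if abs(value - self.last_value) > self.change_threshold:
                self.interval = self.min_interval              # react quickly
            else:
                self.interval = min(self.interval * 2, self.max_interval)
        self.last_value = value
        return self.interval    # seconds until the sensor is read again


if __name__ == "__main__":
    f = DeadBandFilter(threshold=0.2)
    print([f.offer(v) for v in [20.0, 20.1, 20.5, 20.5, 21.0]])
    # [20.0, None, 20.5, None, 21.0]

    s = AdaptiveSampler()
    print([s.next_interval(v) for v in [20.0, 20.0, 20.1, 25.0, 25.0]])
    # [1.0, 2.0, 4.0, 1.0, 2.0]
```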

Preprint


SIGMOD 2020 Best Paper Award

BIFOLD Researchers Receive SIGMOD 2020 Best Paper Award

Database systems researchers at TU Berlin, HPI, and DFKI were highly successful this year: four of their papers were accepted at the 2020 ACM SIGMOD/PODS International Conference on the Management of Data, and one of these papers received the 2020 ACM SIGMOD Best Paper Award. The paper, entitled “Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects” by Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, and Volker Markl, explores the use of GPUs to accelerate database query processing. GPUs are generally ill-suited for large-scale data processing for two reasons: (1) their on-board memory capacity is too small to store large datasets, and (2) the interconnect bandwidth to CPU main memory is insufficient for ad-hoc data transfers. As a result, GPU-based systems face data transfer bottlenecks and do not scale to large datasets. In the paper, the authors demonstrate how a fast interconnect such as NVLink 2.0 (linking dedicated GPUs to a CPU) can overcome both scalability issues for a no-partitioning hash join. In their experiments, they achieved speed-ups of up to 18x over PCI-e 3.0 and up to 7.3x over an optimized CPU implementation. To download a preprint of the paper, visit https://bit.ly/3bHb3XO.
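
The join studied in the paper is a no-partitioning hash join: the input relations are not partitioned across workers first; instead, a single shared hash table is built over one relation (the build phase) and then probed with the other (the probe phase). The sketch below shows this build/probe structure in plain, single-threaded Python for illustration only; the paper’s actual contribution, running both phases on a GPU that reaches CPU memory through NVLink 2.0, is not reproduced here.

```python
# Sketch of a no-partitioning hash join of relations R and S on the join key.
# "No-partitioning" means the input is NOT range/hash-partitioned first:
# a single shared hash table is built over R and then probed with S.
# Single-threaded Python stand-in for illustration only; the paper runs
# both phases on a GPU that accesses CPU memory through NVLink 2.0.

def hash_join(build_side, probe_side):
    # build phase: hash table over the (usually smaller) build relation
    table = {}
    for key, payload in build_side:
        table.setdefault(key, []).append(payload)

    # probe phase: each probe tuple looks up its matches in the shared table
    results = []
    for key, payload in probe_side:
        for match in table.get(key, ()):
            results.append((key, match, payload))
    return results


if __name__ == "__main__":
    R = [(1, "a"), (2, "b"), (2, "c")]
    S = [(2, "x"), (3, "y"), (1, "z")]
    print(hash_join(R, S))
    # [(2, 'b', 'x'), (2, 'c', 'x'), (1, 'a', 'z')]
```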


Machine Learning meets quantum physics

BIFOLD researchers contributed to an in-depth reference work on physics-based machine learning techniques that model electronic and atomistic properties of matter.

“Machine Learning Meets Quantum Physics” [Schütt, K.T., Chmiela, S., von Lilienfeld, O.A., Tkatchenko, A., Tsuda, K., Müller, K.-R. (Eds.)] was published in “Lecture Notes in Physics” (Springer).

“Our book Machine Learning meets Quantum Physics gives an overview of this flourishing interdisciplinary research field between ML, AI, physics and quantum chemistry. It is intended as a versatile blend of both introductory and advanced materials for researchers from the natural sciences (Physics and Chemistry), computer scientists (Machine Learning and AI) and engineers alike. The book provides a broad and hopefully balanced snapshot in time of a highly active field that is experiencing a large growth of interest.”

Prof. Dr. Klaus-Robert Müller / Director BIFOLD

Read Prof. Dr. Müller’s full message in Springer’s “Quantum Science and Technology” blog!