Availability of green Energy controls IT

Flexible adjustment of computing workloads improves the carbon footprint of data centers

More and more data centers are connected to microgrids that provide renewable energy.
C: pixabay

The growing electricity demand of IT infrastructure will soon have a considerable impact on the environment. To reduce their carbon footprint, more and more computing systems are therefore connected to microgrids to gain direct access to renewable energy sources. However, the local availability of solar and wind energy is highly variable and requires consumers to adapt their consumption promptly to the current supply. Researchers from the Berlin Institute for the Foundations of Learning and Data (BIFOLD) have developed a new admission control approach that accepts flexible workloads, such as machine learning training jobs, only if they can be computed relying solely on renewable excess energy. Their publication “Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads” will be presented at Euro-Par 2022.

As the demand for computing continues to grow year by year, so do operating expenses and the associated carbon emissions caused by consuming energy from the public power grid. “Data centers already account for more than 1% of global energy consumption and this number is expected to rise further – especially when considering the additional demand of emerging domains like the internet of things, edge and fog computing”, says Prof. Dr. Odej Kao. One approach towards more sustainable and cost-effective cloud and edge computing systems is to equip IT infrastructure directly with on-site renewable energy sources like solar or wind. However, smaller compute nodes in particular, such as on-premise installations or edge data centers, are not always able to consume all generated power at all times, resulting in so-called excess energy. This problem can only partially be mitigated by energy storage or by leveling consumption within microgrids.

The researchers propose to schedule flexible workloads.
C: P. Wiesner

To make better use of renewable excess energy at compute nodes, and thereby reduce the associated carbon emissions and electricity costs, researchers from TU Berlin and the University of Glasgow have proposed a new approach to scheduling flexible workloads, meaning workloads that tolerate some delay in execution. Flexible workloads are common in cloud environments but can also occur in otherwise time-critical edge computing environments. For example, in many edge intelligence scenarios like traffic management, devices adapt to continuously evolving environments by iteratively re-training local machine learning models on new data. Yet, the exact time and extent of such training jobs are subject to some flexibility.

In their publication, Philipp Wiesner, Thorsten Wittkopp and the BIFOLD researchers Dominik Scheinert, Dr. Lauritz Thamsen and Prof. Dr. Odej Kao suggest that, under certain conditions, a computing system should reject flexible workloads upfront if they cannot be computed using renewable excess energy only. Their proposed system forecasts the free computational capacity of a node as well as the expected energy consumption and production to determine whether additional jobs can be admitted to the node’s workload queue without violating any deadlines and without consuming energy from the public grid. Using probabilistic forecasting, the admission control can be configured to be strict and only accept workloads that are almost guaranteed to run on green energy, or to be more relaxed and not completely rule out potential grid power usage if this allows more workloads to be processed.
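The admission decision described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it ignores the forecast of free computational capacity and assumes that the forecast of renewable excess energy is already available as a list of sampled values (in Wh) per time step. The names `Job`, `quantile`, `admit`, `excess_forecast`, `reserved` and `q` are all hypothetical. A lower `q` corresponds to stricter admission, a higher `q` to more relaxed admission.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Job:
    energy_wh: float    # hypothetical: energy the job is expected to need
    deadline_step: int  # hypothetical: the job must finish before this forecast step


def quantile(samples: Sequence[float], q: float) -> float:
    """Return the q-quantile of the samples (simple nearest-rank variant)."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, max(0, int(q * len(ordered))))
    return ordered[index]


def admit(job: Job,
          excess_forecast: Sequence[Sequence[float]],
          reserved: List[float],
          q: float) -> bool:
    """Accept the job only if forecast excess energy before its deadline covers it.

    excess_forecast[t] holds sampled forecasts (Wh) of excess energy in step t.
    reserved[t] is the energy already promised to previously accepted jobs.
    """
    remaining = job.energy_wh
    plan: List[float] = []
    horizon = min(job.deadline_step, len(excess_forecast))
    for step in range(horizon):
        # Energy assumed to be free in this step at the chosen confidence level.
        available = max(0.0, quantile(excess_forecast[step], q) - reserved[step])
        used = min(available, remaining)
        plan.append(used)
        remaining -= used
    if remaining > 1e-9:
        return False  # would need grid energy or miss the deadline: reject
    for step, used in enumerate(plan):
        reserved[step] += used  # book the energy for the accepted job
    return True
```

Calling `admit` repeatedly with the same `reserved` list models a queue: each accepted job reduces the excess energy that later jobs can claim.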

“Increased deployment of renewables requires flexibility of electricity consumers, which is not always easy to coordinate in highly distributed and heterogeneous systems,” explains Philipp Wiesner. “We expect our approach to be an integral building block in exploiting the varying availability of renewable energy in computing through local decision-making.”

The publication in detail:

Philipp Wiesner, Dominik Scheinert, Thorsten Wittkopp, Lauritz Thamsen, Odej Kao: “Cucumber: Renewable-Aware Admission Control for Delay-Tolerant Cloud and Edge Workloads”. Proceedings of the 28th International European Conference on Parallel and Distributed Computing (Euro-Par).


The growing electricity demand of cloud and edge computing increases operational costs and will soon have a considerable impact on the environment. A possible countermeasure is equipping IT infrastructure directly with on-site renewable energy sources. Yet, particularly smaller data centers may not be able to use all generated power directly at all times, while feeding it into the public grid or energy storage is often not an option. To maximize the usage of renewable excess energy, we propose Cucumber, an admission control policy that accepts delay-tolerant workloads only if they can be computed within their deadlines without the use of grid energy. Using probabilistic forecasting of computational load, energy consumption, and energy production, Cucumber can be configured towards more optimistic or conservative admission.
We evaluate our approach on two scenarios using real solar production forecasts for Berlin, Mexico City, and Cape Town in a simulation environment. For scenarios where excess energy was actually available, our results show that Cucumber’s default configuration achieves acceptance rates close to the optimal case and causes 97.0 % of accepted workloads to be powered using excess energy, while more conservative admission results in 18.5 % reduced acceptance at almost zero grid power usage.

Two Demo Papers accepted at VLDB

Two demonstration papers by BIFOLD researchers have been accepted at the 48th International Conference on Very Large Data Bases (VLDB). VLDB 2022 will take place in Sydney, Australia (and hybrid) from September 5 to 9, 2022.

DORIAN in action: Assisted Design of Data Science Pipelines

Authors: Sergey Redyuk, Zoi Kaoudi, Sebastian Schelter, Volker Markl

Abstract: Existing automated machine learning solutions and intelligent discovery assistants are popular tools that facilitate the end-user with the design of data science (DS) pipelines. However, they yield limited applicability for a wide range of real-world use cases and application domains due to (a) the limited support of DS tasks; (b) a small, static set of available operators; and (c) restriction to evaluation processes with objective loss functions. We demonstrate DORIAN, a human-in-the-loop approach for the assisted design of data science pipelines that supports a large and growing set of DS tasks, operators, and arbitrary user-defined evaluation procedures. Based on the user query, i.e., a dataset and a DS task, DORIAN computes a ranked list of candidate pipelines that the end-user can choose from, alter, execute and evaluate. It stores executed pipelines in an experiment database and utilizes similarity-based search to identify relevant previously-run pipelines from the experiment database. DORIAN also takes user interaction into account to improve suggestions over time. We show how users can interact with DORIAN to create and compare DS pipelines on various real-world DS tasks we extracted from OpenML.

Satellite Image Search in AgoraEO

Authors: Ahmet Kerem Aksoy, Pavel Dushev, Eleni Tzirita Zacharatou, Holmer Hemsen, Marcela Charfuelan, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, Volker Markl

Abstract: The growing operational capability of global Earth Observation (EO) creates new opportunities for data-driven approaches to understand and protect our planet. However, the current use of EO archives is very restricted due to the huge archive sizes and the limited exploration capabilities provided by EO platforms. To address this limitation, we have recently proposed MiLaN, a content-based image retrieval approach for fast similarity search in satellite image archives. MiLaN is a metric-learning-based deep hashing network that encodes high-dimensional image features into compact binary hash codes. We then use these codes as keys in a hash table to enable real-time nearest neighbor search and highly accurate retrieval. In this demonstration, we showcase the efficiency of MiLaN by integrating it with EarthQube, a browser and search engine within AgoraEO. EarthQube supports interactive visual exploration and Query-by-Example over satellite image repositories. Demo visitors will interact with EarthQube playing the role of different users that search images in a large-scale remote sensing archive by their semantic content and apply other filters.
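The idea of using binary hash codes as keys in a hash table can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MiLaN or EarthQube code: the binary codes are assumed to have been produced already by some encoder and are represented as plain integers, and the functions `build_index` and `search` as well as the multi-probe lookup by Hamming distance are hypothetical choices made for this example.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List


def build_index(codes: Dict[str, int]) -> Dict[int, List[str]]:
    """Group image ids by their binary hash code (the code is the table key)."""
    index: Dict[int, List[str]] = defaultdict(list)
    for image_id, code in codes.items():
        index[code].append(image_id)
    return index


def search(index: Dict[int, List[str]],
           query_code: int,
           num_bits: int,
           max_distance: int = 1) -> List[str]:
    """Return ids whose code differs from the query in at most max_distance bits.

    Results are grouped by increasing Hamming distance, so exact matches come first.
    """
    results: List[str] = []
    for distance in range(max_distance + 1):
        # Flip every possible set of `distance` bits and probe that bucket.
        for bits in combinations(range(num_bits), distance):
            candidate = query_code
            for bit in bits:
                candidate ^= 1 << bit
            results.extend(index.get(candidate, []))
    return results
```

Each probe is a constant-time dictionary lookup, which is what makes short binary codes attractive for interactive search. The number of probes grows quickly with `max_distance`, so the sketch is only practical for small radii.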

ICDE 2022 Best Demo Award

A framework to efficiently create training data for optimizers

A demo paper co-authored by a group of BIFOLD researchers on “Farming Your ML-based Query Optimizer’s Food” presented at the virtual conference ICDE 2022 has won the best demo award. The award committee members have unanimously chosen this demonstration based on the relevance of the problem, the high potential of the proposed approach and the excellent presentation.

As machine learning is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities, it is critical to collect a large amount of labeled training data to build these machine learning models. The training data should consist of diverse query plans with their labels (execution time or cardinality). However, collecting such a training dataset is a very tedious and time-consuming task: It requires both developing numerous plans and executing them to acquire ground-truth labels. The latter can take days if not months, depending on the size of the data.

In a research paper presented at SIGMOD 2021, the authors introduced DataFarm, a framework for efficiently creating training data for optimizers with learning-based components. This demo paper extends DataFarm with an intuitive graphical user interface that gives users informative details about the generated plans and guides them through the generation process step by step. As an output of DataFarm, users can download both the generated plans to use as a benchmark and the training data (jobs with their labels).


The publication in detail:

Robin van de Water, Francesco Ventura, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz, Volker Markl: Farming Your ML-based Query Optimizer’s Food (to appear)


Machine learning (ML) is becoming a core component in query optimizers, e.g., to estimate costs or cardinalities. This means large heterogeneous sets of labeled query plans or jobs (i.e., plans with their runtime or cardinality output) are needed. However, collecting such a training dataset is a very tedious and time-consuming task: It requires both developing numerous jobs and executing them to acquire ground-truth labels. We demonstrate DATAFARM, a novel framework for efficiently generating and labeling training data for ML-based query optimizers to overcome these issues. DATAFARM enables generating training data tailored to users’ needs by learning from their existing workload patterns, input data, and computational resources. It uses an active learning approach to determine a subset of jobs to be executed and encloses the human into the loop, resulting in higher quality data. The graphical user interface of DATAFARM allows users to get informative details of the generated jobs and guides them through the generation process step-by-step. We show how users can intervene and provide feedback to the system in an iterative fashion. As an output, users can download both the generated jobs to use as a benchmark and the training data (jobs with their labels).
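The abstract does not specify which active learning strategy is used, so the following Python sketch shows one common option, uncertainty sampling, purely as an illustration. It assumes that several hypothetical models have each predicted a runtime for every generated job, and it selects the jobs on which these predictions disagree most as the ones worth executing for ground-truth labels. The function `select_jobs_to_execute` and its parameters are assumptions and not part of DataFarm.

```python
from statistics import pvariance
from typing import Dict, List, Sequence


def select_jobs_to_execute(predictions: Dict[str, Sequence[float]],
                           budget: int) -> List[str]:
    """Pick the `budget` jobs whose predicted runtimes disagree the most.

    predictions maps a job id to the runtimes predicted by several models
    (hypothetical input). High variance means the models are uncertain, so
    an executed label for that job is expected to be the most informative.
    """
    ranked = sorted(predictions,
                    key=lambda job: pvariance(predictions[job]),
                    reverse=True)
    return ranked[:budget]
```

Only the selected jobs are executed, which is how such a strategy avoids the days or months that labeling every generated job would take.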

Uniting the cloud, the edge and the sensors

NebulaStream aims to unify the cloud, the edge and the sensors

NebulaStream, the novel, general-purpose, end-to-end data management system for the IoT and the Cloud, recently announced the release of NebulaStream 0.2.0, its closed-beta release. The system is developed and explored by a team of BIFOLD researchers led by Prof. Dr. Volker Markl. It addresses the unique challenges of the “Internet of Things” (IoT). “Currently, NebulaStream is undergoing a heavy development process and we would love to cooperate with external users and developers, who may sign up for code access and collaborate with us and/or use NebulaStream for their own research”, explains Dr. Steffen Zeuch, group lead of the NebulaStream team.

More and more data will be generated by an exponentially increasing number of IoT devices.
C: metamorworks/iStock

Recently, the International Data Corporation estimated that by 2025 the global amount of data will reach 175 zettabytes (ZB) and that 30 percent of this data will be gathered in real time. In particular, more and more data will be generated by an exponentially increasing number of IoT devices (up to 20 billion connected devices in 2025) that continuously improve their processing capabilities. At the same time, the data processing landscape has rapidly evolved towards a cloud-centric environment. Service providers leverage virtually unlimited resources to build novel data processing systems which offer high scalability, elasticity, and fault tolerance. As a result, end users can choose from a wide range of services depending on their requirements. However, the strong focus on cloud-based services fundamentally neglects that the majority of interesting data is produced outside the cloud. Thus, the main question for future system designs is how to enable analytics on zettabytes of data produced outside the cloud by millions of geo-distributed, heterogeneous devices in real time.

To this end, NebulaStream aims to unify data management approaches that until now are realized in different systems: cloud-based streaming systems, fog-based data processing systems, and sensor networks. Cloud-based streaming systems support virtually unlimited processing capabilities and resource elasticity in the cloud but require that all data is shipped from sensors to the data center. Fog or edge-based data processing systems move the data processing to the sensor to reduce data transmission costs and can handle unreliable network connections but do not make use of cloud-based resources. Sensor networks optimize for battery lifetime and energy efficiency but do not support the execution of general queries. “As soon as the majority of interesting data will be produced outside the cloud, in the Smart-X universe, e.g., smart city, smart grid, smart home, we need to envision a unified data management system like NebulaStream that holistically manages the cloud, the edge, and the sensors”, says Steffen Zeuch.

Public transportation could benefit enormously by connecting local and cloud data.
C: pixabay

“A good everyday example is a public transportation provider in the city: Buses, trains, taxis, and so on are equipped with sensors that measure location, speed, occupancy, and many more metrics. All of this data is picked up by stationary base stations in the city and by using NebulaStream, either processed locally or forwarded through a network to the cloud. The data can be analyzed in real time and lead to actions that change the physical nature of the network. For example, if there is more demand on a station (higher occupancy) or traffic (longer wait times), then the transport provider can dispatch more vehicles”, explains Viktor Rosenfeld, member of the NebulaStream team. NebulaStream addresses these challenges by enabling heterogeneity and distribution of compute and data, supporting diverse data and programming models that go beyond relational algebra, dealing with potentially unreliable communication, and enabling constant evolution under continuous operation.
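The kind of analysis in the public transportation example can be illustrated with a small sketch in plain Python. It does not use the NebulaStream API: the reading format, the functions `tumbling_average` and `stations_needing_vehicles`, the window size and the threshold are all assumptions made for illustration. The sketch averages occupancy per station over fixed (tumbling) time windows and reports the stations whose average exceeds a threshold.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical reading: (timestamp in seconds, station id, occupancy between 0 and 1)
Reading = Tuple[int, str, float]


def tumbling_average(readings: Iterable[Reading],
                     window_s: int) -> Dict[Tuple[int, str], float]:
    """Average occupancy per (window start, station) over fixed-size windows."""
    sums: Dict[Tuple[int, str], List[float]] = defaultdict(lambda: [0.0, 0.0])
    for timestamp, station, occupancy in readings:
        key = (timestamp - timestamp % window_s, station)
        sums[key][0] += occupancy  # running total
        sums[key][1] += 1          # number of readings in the window
    return {key: total / count for key, (total, count) in sums.items()}


def stations_needing_vehicles(averages: Dict[Tuple[int, str], float],
                              threshold: float) -> List[str]:
    """Stations with at least one window whose average occupancy exceeds the threshold."""
    return sorted({station for (_, station), avg in averages.items() if avg > threshold})
```

In a deployed system, this aggregation could run on a base station close to the sensors so that only the much smaller per-window averages, rather than every raw reading, are forwarded to the cloud.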

“NebulaStream is part of two joint international projects with a number of industrial and academic members”, adds Steffen Zeuch. ExDRa is a collaboration between Siemens, TU Graz, DFKI, and TU Berlin. The use case is the production process in paper mills. The project aims to predict the final paper quality during the production process. Various data are collected during production from multiple production sites and fed into a federated learning process. NebulaStream is used to acquire sensor data and feed it into SystemDS, which implements the federated learning pipeline.

ELEGANT is a Horizon 2020 project of the European Commission. It consists of six industry partners, which provide use cases and expertise, and four academic partners, including TU Berlin. The goals of ELEGANT are similar to the goals of NebulaStream, i.e., to unify IoT processing and cloud processing, with a special focus on supporting hardware acceleration. The industry partners provide different use cases, such as video surveillance to extract a summary of interesting events from audio/video streams, smart metering to monitor water usage and profile device usage, or smart riding to classify the skill of motorbike riders.

The publication in detail:

Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp M. Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch: ExDRa: Exploratory Data Science on Federated Raw Data, SIGMOD/PODS ’21, June 2021, pp. 2450–2463


Data science workflows are largely exploratory, dealing with under-specified objectives, open-ended problems, and unknown business value. Therefore, little investment is made in systematic acquisition, integration, and pre-processing of data. This lack of infrastructure results in redundant manual effort and computation. Furthermore, central data consolidation is not always technically or economically desirable or even feasible (e.g., due to privacy, and/or data ownership). The ExDRa system aims to provide system infrastructure for this exploratory data science process on federated and heterogeneous, raw data sources. Technical focus areas include (1) ad-hoc and federated data integration on raw data, (2) data organization and reuse of intermediates, and (3) optimization of the data science lifecycle, under awareness of partially accessible data. In this paper, we describe use cases, the overall system architecture, selected features of SystemDS’ new federated backend (for federated linear algebra programs, federated parameter servers, and federated data preparation), as well as promising initial results. Beyond existing work on federated learning, ExDRa focuses on enterprise federated ML and related data pre-processing challenges. In this context, federated ML has the potential to create a more fine-grained spectrum of data ownership and thus, even new markets.

ACM SIGMOD Research Highlight

The Best of Both Worlds

BIFOLD researchers combined imperative and functional control flow in dataflow systems.

Mitos combines the advantages of Apache Flink and Apache Spark.
Copyright: BIFOLD

The paper “Efficient Control Flow in Dataflow Systems: When Ease-of-Use Meets High Performance” by six BIFOLD researchers was honored with a 2021 ACM SIGMOD Research Highlights Award. This prestigious award recognizes the work of Gábor E. Gévay, Tilmann Rabl, Sebastian Breß, Lorand Madai-Tahy, Jorge-Arnulfo Quiané-Ruiz and Volker Markl as a definitive milestone and emphasizes its potentially significant impact. A short version of their paper (“Imperative or Functional Control Flow Handling: Why not the Best of Both Worlds?”), which highlights the advantages of combining imperative and functional control flow in dataflow systems, is published in the 2021 ACM SIGMOD Research Highlights special issue. The paper had already been distinguished with a Best Paper Award at the 37th IEEE International Conference on Data Engineering (ICDE) 2021.

More information:

2022 ACM SIGMOD Record

ICDE 2021 honors BIFOLD Researchers with Best Paper Award

Summer School Information Event

Date and Time: Monday, 25 April 2022, 4:00 pm

Speakers: Andrea Hamm, Martin Schuessler, Dr Stefan Ullrich

Venue: virtual event

Participation: If you are interested in participating please contact: gs@bifold.berlin

Andrea Hamm and Martin Schuessler, supported by Dr Stefan Ullrich, will present the program of the BIFOLD Ethics in Residence Summer School, which will take place from 20 to 24 June at a hotel in Berlin and at the Weizenbaum Institute. The Summer School complements and serves the technological research on artificial intelligence (AI) within the AI Competence Centres with aspects of ethics, explainability, and sustainability. It is organized within the Ethics in Residence program, a collaboration between the Weizenbaum Institute for the Networked Society – the German Internet Institute – and the Network of the German AI Competence Centres.

The Summer School is fully funded by BIFOLD and is open to all BIFOLD PhD students as well as to those from the other centres of the Network of the German AI Competence Centres (ML2R, MCML, TUE-AI, ScaDS, DFKI).

The program includes multiple hands-on workshops to advance the participants’ individual research projects, several high-profile international guest lectures and Q&A sessions with the guest speakers, a panel discussion, and participants’ presentation sessions for expert jury feedback. The international expert researchers and guest speakers joining have backgrounds in computing within limits, disaster research, and COVID-19 data research. In addition, the summer school offers two main tracks, one on explainable deep neural networks (XNN) and one on sustainable AI (SAI), for a more specialized training of the PhD students.

About the presenters & track leaders
Andrea Hamm
Copyright: private

Andrea Hamm is a doctoral researcher at the Weizenbaum Institute for the Networked Society and TU Berlin. In her dissertation, she investigates civic technologies for environmental monitoring in the context of making cities and communities more sustainable. She is particularly interested in the real-world effects of AI technologies, for instance in understanding how AI-supported simulations contribute to reducing CO2 footprints and material consumption. She has published at international venues such as the ACM CHI Conference on Human Factors in Computing Systems, the ACM ICT for Sustainability Conference, and the International Conference on Wirtschaftsinformatik (WI). Her work focuses on the interdisciplinary transition from human-computer interaction (HCI) and design studies to communication studies. She is a member of the German-Japanese Society for Social Sciences and the AI Climate Change Center Berlin-Brandenburg. In 2019, she was a guest researcher at Aarhus University, Denmark, after previously studying at Freie Universität Berlin (Germany), Vrije Universiteit Brussel (Belgium), and Université Catholique de Lille (France).

Martin Schuessler is a tech-savvy, interdisciplinary human-computer interaction researcher with the belief that it is the purpose of technology to enhance people’s lives. As an undergrad, he followed this belief from a technical perspective by investigating the usability of new interaction paradigms such as tangible and spatial interfaces at the OvGU Magdeburg and Telekom Innovation Labs Berlin. As a PhD student at the interdisciplinary Weizenbaum Institute, he adopted a broader, often less technical perspective on the same belief (still involving a lot of programming). His dissertation work looks at ways to make computer vision systems more intelligible for users. Martin has been a visiting researcher at the University College London Interaction Centre and the Heidelberg Collaboratory for Image Processing. He has published articles at top international conferences, including the International Conference on Learning Representations (ICLR), Human Factors in Computing Systems (CHI), Computer-Supported Cooperative Work and Social Computing (CSCW) and Intelligent User Interfaces (IUI).

Martin Schüssler
Copyright: private

The Summer School as a whole and as a part of the BIFOLD Ethics in Residence Program is fostered by Dr. Stefan Ullrich.

BIFOLD Summer School

Ethics in Machine Learning & Data Management

The BIFOLD summer school will take place from 20 to 24 June 2022 at the Weizenbaum Institute for the Networked Society (near Zoo station). It will focus on the latest ethical considerations in machine learning and data management by offering lectures and workshops on two main tracks. The school is designed for doctoral students of the BMBF’s network of AI competence centres and is organized by the BIFOLD Graduate School in collaboration with the Ethics in Residence Program and researchers of the Weizenbaum Institute for the Networked Society – the German Internet Institute.

Sufficient time for hands-on workshops and individual feedback is included.
C: StockSnap/pixabay

The summer school complements technological research on artificial intelligence (AI) within the AI competence centres with ethical aspects of explainability and sustainability. It is part of the Ethics in Residence program. The program includes multiple hands-on workshops to advance individual research projects, several guest lectures including Q&A, a panel discussion, and Ph.D. student presentation sessions with expert jury feedback. The summer school offers two tracks, one on explainable deep neural networks (XNN) and one on sustainable AI (SAI), for more specialized training of the doctoral students. All of BIFOLD’s PhD students are invited to participate. In addition, BIFOLD offers places for the PhD students of the German AI competence centre network (ML2R, MCML, TUE-AI, ScaDS, DFKI).
Summer School Website

Invited international experts

International expert researchers with backgrounds in computing within limits, disaster research, and COVID-19 data research are joining the summer school as speakers and will be available for individual feedback.

Daniel Pargman, Ph.D.
KTH Royal Institute of Technology, Stockholm, Sweden

Teresa Cerratto Pargman, Ph.D.
Stockholm University, Sweden

Yuya Shibuya, Ph.D.
The University of Tokyo, Japan

Raphael Sonabend, Ph.D.
Imperial College London, UK & University of Kaiserslautern, Germany

Rainer Mühlhoff, Ph.D.
Osnabrück University, Germany

Enrico Costanza, Ph.D.
University College London, UK

Focus tracks

Track XNN focuses on evaluating interpretable machine learning to provide students with the ability to empirically validate claims about interpretability:

  • Critical review of XAI methods: taxonomies of XAI approaches, review of explanation goals, user benefits and current results from user studies
  • Rigorous methods for validating explanation methods with users: interdisciplinary methodological training, suitable evaluation datasets, user tasks and study designs, participant recruitment, validity, and reproducibility considerations

Track SAI emphasizes ecological and socio-political aspects of AI to understand how AI and data can contribute to action in the name of the sustainability transition:

  • Sustainability in research and policies: What is sustainability? United Nations SDGs, COVID-19, critical thinking on AI, environmental monitoring, sustainable smart cities and communities
  • SAI approaches and methods: data feminism, digital civics, computing within limits, citizen science, social media data, and mixed methods.

The complete program can be found here.

Organisational details:

Participants are expected to attend the entire program, with arrival from 19 June and departure on 25 June 2022. There is no tuition fee.
Please get in touch with us in case you need child care.

Program venue: Weizenbaum Institute for the Networked Society

Accommodation: Hampton by Hilton Berlin City West, within walking distance of the Weizenbaum Institute and TU Berlin, in the heart of Berlin, with the nearby Tiergarten park providing plenty of greenery.


Please send a single PDF file, including your CV, an abstract of your (preliminary) Ph.D. project, and a short motivation statement describing why you would like to participate and what you would like to learn during the summer school, to gsapplication@bifold.tu-berlin.de.
The application deadline is 30 April 2022.


Andrea Hamm
Doctoral Researcher, Research Group “Responsibility and the Internet of Things”, Weizenbaum Institute for the Networked Society & TU Berlin, Germany

Martin Schuessler
Doctoral Researcher, Research Group “Criticality of AI-based Systems”, Weizenbaum Institute for the Networked Society & TU Berlin, Germany

Dr. Stefan Ullrich
Weizenbaum Institute for the Networked Society, Berlin, Germany

Prof. Dr. Volker Markl
Co-Director BIFOLD

Prof. Dr. Klaus-Robert Müller
Co-Director BIFOLD

Dr. Tina Schwabe, Dr. Manon Grube
Coordinators of the BIFOLD Graduate School

BIFOLD Colloquium 2022/02/14

Apache Flink – From Academia into the Apache Software Foundation and Beyond

Speaker: Dr. Fabian Hüske

Venue: Virtual event

Time and Date: 4:00 pm, 14 February

Registration: If you are interested in participating, please contact: coordination@bifold.berlin


Apache Flink is a project with a very active, supportive, and continuously growing community. For several years in a row, Flink has been among the top ten projects of the Apache Software Foundation with the most traffic on user and development activity. Looking back, Flink started as a research prototype developed by three PhD students at TU Berlin in 2009. In 2014, the developers donated the code base to the ASF and joined the newly founded Apache Flink incubator project. Within three years, Flink grew into a healthy project and gained a lot of momentum. Now, almost 8 years later, the community is still growing and actively developing Flink. Moreover, it has established itself in the industry as a standard tool for scalable stream and batch processing.

In this presentation, Fabian Hüske will discuss Flink’s journey from an academic research project to one of the most active projects of the Apache Software Foundation. He will talk about the academic roots of the project, how the original developers got introduced to the ASF, Flink’s incubation phase, and how its user community and developer base evolved after it graduated and became an ASF top-level project. The talk will focus on the decisions, efforts, and circumstances that helped to grow a vital and welcoming open source community.

(Copyright: private)

Fabian Hüske is a software engineer working on streaming things at Snowflake. He is a PMC member of Apache Flink and one of the three original authors of the Stratosphere research system, from which Apache Flink was forked in 2014. Fabian is a co-founder of data Artisans (now Ververica), a Berlin-based startup devoted to fostering Flink. He holds a PhD in computer science from TU Berlin and is the author of “Stream Processing with Apache Flink”.

BIFOLD Colloquium “Scalable and Fast Cloud Data Management”

“Scalable and Fast Cloud Data Management”

Speakers: Norbert Ritter (University of Hamburg), Felix Gessert (Baqend), and Wolfram Wingerath (Baqend)

Venue: Virtual event

Time and date: December 06, 2021: 4 pm – 6 pm

Registration: If you are interested in participating, please contact: coordination@bifold.berlin

Database research at the University of Hamburg is centered around scalable technologies for cloud data management and connects the dots between traditional database systems, web caching, and continuous data analytics. In this presentation, we provide a rundown of our research topics throughout the years and explain how we turned them into practice at the Software-as-a-Service company Baqend.
We first present an overview of the system space that we are concerned with and the high-level goals we pursue in our work. We then go into detail on how the Orestes architecture combines web caching with traditional data management techniques to accelerate primary key access in globally distributed setups. Next, we cover the InvaliDB architecture that employs continuous stream processing to extend the Orestes approach to complex database queries. Finally, we explain how the cloud service Speed Kit turns our research into practice by accelerating more than 100 million users per month. We close with ongoing and future work, including the Beaconnect project that revolves around continuous analytics over real-user tracking data with Apache Flink.


Norbert Ritter is a full professor of computer science at the University of Hamburg, where he heads the databases and information systems group (DBIS). He received his PhD from the University of Kaiserslautern in 1997. His research interests include distributed and federated database systems, transaction processing, caching, cloud data management, information integration, and autonomous database systems. He has been teaching NoSQL topics in various database courses for several years. Seeing the many open challenges for NoSQL systems, he, Wolfram Wingerath and Felix Gessert have been organizing the annual Scalable Cloud Data Management Workshop to promote research in this area.

Felix Gessert is CEO and co-founder of the Software-as-a-Service company Baqend. During his PhD studies at the University of Hamburg, he developed the core technology behind Baqend’s web performance service. Felix is passionate about making the web faster by turning research results into real-world applications. He frequently talks at conferences about exciting technology trends in data management and web performance. As a Junior Fellow of the German Informatics Society (GI), he is working on new ideas to facilitate the research transfer of academic computer science innovation into practice.

Wolfram Wingerath is the lead data engineer at Baqend, where he is responsible for data analytics and all things related to real-time query processing. Starting in 2022, he will take over the Data Science professorship at the University of Oldenburg and will therefore transition into the Head of Research position at Baqend. During his PhD studies at the University of Hamburg, he conceived the scalable design behind Baqend’s real-time query engine and thereby also developed a strong background in real-time databases and related technology such as scalable stream processing, NoSQL database systems, cloud computing, and Big Data analytics.

New BIFOLD Research Groups established

The Berlin Institute for the Foundations of Learning and Data (BIFOLD) has set up two new Research Training Groups, led by Dr. Stefan Chmiela and Dr. Steffen Zeuch. The goal of these new research units at BIFOLD is to enable junior researchers to conduct independent research and to prepare them for leadership positions. Initial funding covers each group leader's own position as well as two PhD students and/or research associates for three years.

One of the new Research Training Groups at BIFOLD, led by Dr. Steffen Zeuch, focuses on a general-purpose, end-to-end data management system for the IoT.
(© Pixabay)

Steffen Zeuch is interested in how to overcome the data management challenges that the growing number of Internet of Things (IoT) devices brings: “Over the last decade, the amount of produced data has reached unseen magnitudes. Recently, the International Data Corporation estimated that by 2025, the global amount of data will reach 175 zettabytes (ZB) and that 30 percent of this data will be gathered in real time. In particular, the number of IoT devices increases exponentially such that the IoT is expected to grow as large as 20 billion connected devices in 2025.” The explosion in the number of devices will create novel data-driven applications in the near future. These applications require low latency, location awareness, widespread geographical distribution, and real-time data processing on potentially millions of distributed data sources.

Dr. Steffen Zeuch
(© Steffen Zeuch)

“To enable these applications, a data management system needs to leverage the capabilities of IoT devices outside the cloud. However, today’s classical data management systems are not yet ready for these applications as they are designed for the cloud,” explains Steffen Zeuch. “The focus of my research lies in introducing the NebulaStream Platform – a general-purpose, end-to-end data management system for the IoT.”

Stefan Chmiela concentrates on so-called many-body problems. This broad category of physical problems deals with systems of interacting particles, with the goal of accurately characterizing their dynamic behavior. These types of problems arise in many disciplines, including quantum mechanics, structural analysis, and fluid dynamics, and generally require solving high-dimensional partial differential equations. “In my research group we will particularly focus on problems from quantum chemistry and condensed matter physics, as these fields of science rank among the most computationally intensive”, explains Stefan Chmiela. In these systems, highly complex collective behavior emerges from relatively simple physical laws for the motion of each individual particle. Because of this, the simulation of high-dimensional many-body problems demands extremely large computational capacities. There is a limit to how much computational efficiency can be gained through rigorous mathematical and physical approximations, yet fully empirical solutions are often too simplistic to be sufficiently predictive.

Dr. Stefan Chmiela
(© Stefan Chmiela)

The lack of simultaneously accurate and efficient approaches makes many phenomena hard to model reliably. “Reconciling these two contradicting aspects of accuracy and computational speed is our goal,” states Stefan Chmiela. “Our idea is to incorporate readily available fundamental prior knowledge into modern learning machines. We leverage conservation laws, which are derivable for many symmetries of physical systems, in order to increase the model's ability to be accurate with less data.”