BIFOLD Colloquium 2022/02/14

Apache Flink – From Academia into the Apache Software Foundation and Beyond

Speaker: Dr. Fabian Hüske

Venue: Virtual event

Time and Date: 4:00 pm, 14 February 2022

Registration: If you are interested in participating, please contact: coordination@bifold.berlin

Abstract:

Apache Flink is a project with a very active, supportive, and continuously growing community. For several years in a row, Flink has ranked among the ten Apache Software Foundation projects with the most user and development activity. Looking back, Flink started as a research prototype developed by three PhD students at TU Berlin in 2009. In 2014, the developers donated the code base to the ASF and joined the newly founded Apache Flink incubator project. Within three years, Flink grew into a healthy project and gained a lot of momentum. Now, almost eight years later, the community is still growing and actively developing Flink. Moreover, Flink has established itself in the industry as a standard tool for scalable stream and batch processing.

In this presentation, Fabian Hüske will discuss Flink’s journey from an academic research project to one of the most active projects of the Apache Software Foundation. He will talk about the academic roots of the project, how the original developers got introduced to the ASF, Flink’s incubation phase, and how its user community and developer base evolved after it graduated and became an ASF top-level project. The talk will focus on the decisions, efforts, and circumstances that helped to grow a vital and welcoming open source community.

Speaker:

Fabian Hüske is a software engineer working on streaming things at Snowflake. He is a PMC member of Apache Flink and one of the three original authors of the Stratosphere research system, from which Apache Flink was forked in 2014. Fabian is a co-founder of data Artisans (now Ververica), a Berlin-based startup devoted to fostering Flink. He holds a PhD in computer science from TU Berlin and is the author of “Stream Processing with Apache Flink”.

BIFOLD Colloquium “Scalable and Fast Cloud Data Management”

Speakers: Norbert Ritter (University of Hamburg), Felix Gessert (Baqend), and Wolfram Wingerath (Baqend)

Venue: Virtual event

Time and date: December 6, 2021, 4 pm – 6 pm

Registration: If you are interested in participating, please contact: coordination@bifold.berlin


Abstract:
Database research at the University of Hamburg is centered around scalable technologies for cloud data management and connects the dots between traditional database systems, web caching, and continuous data analytics. In this presentation, we provide a rundown of our research topics throughout the years and explain how we turned them into practice at the Software-as-a-Service company Baqend.
We first present an overview of the system space that we are concerned with and the high-level goals we pursue in our work. We then go into detail on how the Orestes architecture combines web caching with traditional data management techniques to accelerate primary key access in globally distributed setups. Next, we cover the InvaliDB architecture, which employs continuous stream processing to extend the Orestes approach to complex database queries. Finally, we explain how the cloud service Speed Kit turns our research into practice by accelerating websites for more than 100 million users per month. We close with ongoing and future work, including the Beaconnect project, which revolves around continuous analytics over real-user tracking data with Apache Flink.

Speakers:

Norbert Ritter is a full professor of computer science at the University of Hamburg, where he heads the databases and information systems group (DBIS). He received his PhD from the University of Kaiserslautern in 1997. His research interests include distributed and federated database systems, transaction processing, caching, cloud data management, information integration, and autonomous database systems. He has been teaching NoSQL topics in various database courses for several years. Seeing the many open challenges for NoSQL systems, he, Wolfram Wingerath and Felix Gessert have been organizing the annual Scalable Cloud Data Management Workshop to promote research in this area.

Felix Gessert is CEO and co-founder of the Software-as-a-Service company Baqend. During his PhD studies at the University of Hamburg, he developed the core technology behind Baqend’s web performance service. Felix is passionate about making the web faster by turning research results into real-world applications. He frequently talks at conferences about exciting technology trends in data management and web performance. As a Junior Fellow of the German Informatics Society (GI), he is working on new ideas to facilitate the research transfer of academic computer science innovation into practice.

Wolfram Wingerath is the lead data engineer at Baqend, where he is responsible for data analytics and all things related to real-time query processing. Starting in 2022, he will take over the Data Science professorship at the University of Oldenburg and will therefore transition into the Head of Research position at Baqend. During his PhD studies at the University of Hamburg, he conceived the scalable design behind Baqend’s real-time query engine and thereby also developed a strong background in real-time databases and related technology such as scalable stream processing, NoSQL database systems, cloud computing, and Big Data analytics.

New BIFOLD Research Groups established

The Berlin Institute for the Foundations of Learning and Data (BIFOLD) has set up two new Research Training Groups, led by Dr. Stefan Chmiela and Dr. Steffen Zeuch. The goal of these new research units at BIFOLD is to enable junior researchers to conduct independent research and to prepare them for leadership positions. Initial funding covers the group leader’s own position as well as two PhD students and/or research associates for three years.

One of the new Research Training Groups at BIFOLD, led by Dr. Steffen Zeuch, focuses on a general purpose, end-to-end data management system for the IoT.

Steffen Zeuch is interested in how to overcome the data management challenges that the growing number of Internet of Things (IoT) devices brings: “Over the last decade, the amount of produced data has reached unseen magnitudes. Recently, the International Data Corporation estimated that by 2025, the global amount of data will reach 175 Zettabyte (ZB) and that 30 percent of this data will be gathered in real-time. In particular, the number of IoT devices increases exponentially such that the IoT is expected to grow as large as 20 billion connected devices in 2025.” The explosion in the number of devices will create novel data-driven applications in the near future. These applications require low latency, location awareness, widespread geographical distribution, and real-time data processing on potentially millions of distributed data sources.

“To enable these applications, a data management system needs to leverage the capabilities of IoT devices outside the cloud. However, today’s classical data management systems are not ready yet for these applications as they are designed for the cloud,” explains Steffen Zeuch. “The focus of my research lies in introducing the NebulaStream Platform – a general purpose, end-to-end data management system for the IoT.”

Stefan Chmiela concentrates on so-called many-body problems. This broad category of physical problems deals with systems of interacting particles, with the goal of accurately characterizing their dynamic behavior. These types of problems arise in many disciplines, including quantum mechanics, structural analysis, and fluid dynamics, and generally require solving high-dimensional partial differential equations. “In my research group we will particularly focus on problems from quantum chemistry and condensed matter physics, as these fields of science rank among the most computationally intensive,” explains Stefan Chmiela. In these systems, highly complex collective behavior emerges from relatively simple physical laws for the motion of each individual particle. Because of this, simulating high-dimensional many-body problems demands enormous computational capacity. There is a limit to how much computational efficiency can be gained through rigorous mathematical and physical approximations, yet fully empirical solutions are often too simplistic to be sufficiently predictive.

The lack of simultaneously accurate and efficient approaches makes many phenomena hard to model reliably. “Reconciling these two contradicting aspects of accuracy and computational speed is our goal,” states Stefan Chmiela. “Our idea is to incorporate readily available fundamental prior knowledge into modern learning machines. We leverage conservation laws, which are derivable for many symmetries of physical systems, in order to increase the model’s ability to be accurate with less data.”

http://www.sgdml.org