Data management systems researchers in the Database Systems and Information Management (DIMA) Group at TU Berlin and the Intelligent Analytics for Massive Data (IAM) Group at DFKI (the German Research Center for Artificial Intelligence) have learned that four of their papers were accepted at the 2020 ACM SIGMOD/PODS International Conference on Management of Data.
The paper “Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines,” authored by Del Monte et al., addresses the problem of migrating large state and reconfiguring running queries on the fly, in order to support resource elasticity, fault tolerance, and runtime optimization (e.g., for load balancing). A stream processing engine equipped with Rhino can attain lower-latency processing and operate continuously, even in the presence of failures.
The paper “Optimizing Machine Learning Workloads in Collaborative Environments,” authored by Derakhshan et al., presents a system that optimizes the execution of machine learning workloads in collaborative environments. It achieves this by exploiting an experiment graph of stored artifacts, i.e., the data and models produced by previously executed operations, and reusing them in later workloads.
The paper “Grizzly: Efficient Stream Processing Through Adaptive Query Compilation,” authored by Grulich et al., presents a novel stream processing engine based on adaptive query compilation that enables highly efficient query execution on modern hardware and dynamically adjusts to changing data characteristics at runtime.
The paper “Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects,” authored by Lutz et al., provides an in-depth analysis of the new NVLink 2.0 interconnect technology, which enables users to overcome data transfer bottlenecks and efficiently process large datasets stored in main memory on GPUs.
The parallel acceptance of these four publications at one of the top data management conferences is not only a great success for TU Berlin’s DIMA Group and DFKI’s IAM Group; it also shows that BIFOLD, the Berlin Institute for the Foundations of Learning and Data, continues to positively impact international artificial intelligence and data management research.
THE PAPERS IN DETAIL:
“Pump Up the Volume: Processing Large Data on GPUs with Fast Interconnects”
Authors: Clemens Lutz, Sebastian Breß, Steffen Zeuch, Tilmann Rabl, Volker Markl
Abstract: GPUs have long been discussed as accelerators for database query processing because of their high processing power and memory bandwidth. However, two main challenges limit the utility of GPUs for large-scale data processing: (1) the onboard memory capacity is too small to store large data sets, yet (2) the interconnect bandwidth to CPU main-memory is insufficient for ad-hoc data transfers. As a result, GPU-based systems and algorithms run into a transfer bottleneck and do not scale to large data sets. In practice, CPUs process large-scale data faster than GPUs with current technology. In this paper, we investigate how a fast interconnect can resolve these scalability limitations using the example of NVLink 2.0. NVLink 2.0 is a new interconnect technology that links dedicated GPUs to a CPU. The high bandwidth of NVLink 2.0 enables us to overcome the transfer bottleneck and to efficiently process large data sets stored in main-memory on GPUs. We perform an in-depth analysis of NVLink 2.0 and show how we can scale a no-partitioning hash join beyond the limits of GPU memory. Our evaluation shows speedups of up to 18× over PCI-e 3.0 and up to 7.3× over an optimized CPU implementation. Fast GPU interconnects thus enable GPUs to efficiently accelerate query processing.
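The out-of-core pattern the abstract describes can be illustrated with a small sketch. The Python below is purely illustrative and is not the authors' GPU implementation (it contains no CUDA): a no-partitioning hash join builds one shared hash table from the smaller relation, while the larger probe relation is shipped chunk by chunk, standing in for data transfers over a fast interconnect such as NVLink 2.0; all names and sizes are invented for illustration.

```python
# Illustrative sketch (not the paper's implementation): a no-partitioning
# hash join where the build side fits in device memory and the probe side
# is streamed from host memory in chunks, standing in for transfers over
# a fast interconnect such as NVLink 2.0.

from collections import defaultdict

def build_hash_table(build_keys, build_vals):
    """Build phase: one shared (non-partitioned) hash table."""
    table = defaultdict(list)
    for k, v in zip(build_keys, build_vals):
        table[k].append(v)
    return table

def probe_in_chunks(table, probe_keys, probe_vals, chunk_size):
    """Probe phase: the large probe relation never resides on the
    device at once; it is shipped chunk by chunk."""
    results = []
    for start in range(0, len(probe_keys), chunk_size):
        keys = probe_keys[start:start + chunk_size]   # "transfer" one chunk
        vals = probe_vals[start:start + chunk_size]
        for k, v in zip(keys, vals):                  # probe on the device
            for match in table.get(k, ()):
                results.append((k, match, v))
    return results

if __name__ == "__main__":
    table = build_hash_table(range(1000), [f"b{i}" for i in range(1000)])
    out = probe_in_chunks(table, list(range(10_000)),
                          [f"p{i}" for i in range(10_000)], chunk_size=1024)
    print(len(out))  # 1000 matching probe tuples
```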
“Optimizing Machine Learning Workloads in Collaborative Environments”
Authors: Behrouz Derakhshan, Alireza Rezaei Mahdiraji, Ziawasch Abedjan, Tilmann Rabl, and Volker Markl
Abstract: Effective collaboration among data scientists results in high-quality and efficient machine learning (ML) workloads. In a collaborative environment, such as Kaggle or Google Colaboratory, users typically re-execute or modify published scripts to recreate or improve the result. This introduces many redundant data processing and model training operations. Reusing the data generated by the redundant operations leads to the more efficient execution of future workloads. However, existing collaborative environments lack a data management component for storing and reusing the result of previously executed operations.
In this paper, we present a system to optimize the execution of ML workloads in collaborative environments by reusing previously performed operations and their results. We utilize a so-called Experiment Graph (EG) to store the artifacts, i.e., raw and intermediate data or ML models, as vertices and operations of ML workloads as edges. In theory, the size of EG can become unnecessarily large, while the storage budget might be limited. At the same time, for some artifacts, the overall storage and retrieval cost might outweigh the recomputation cost. To address this issue, we propose two algorithms for materializing artifacts based on their likelihood of future reuse. Given the materialized artifacts inside EG, we devise a linear-time reuse algorithm to find the optimal execution plan for incoming ML workloads. Our reuse algorithm only incurs a negligible overhead and scales for the high number of incoming ML workloads in collaborative environments. Our experiments show that we improve the run-time by one order of magnitude for repeated execution of the workloads and 50% for the execution of modified workloads in collaborative environments.
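To make the Experiment Graph idea concrete, here is a deliberately simplified Python sketch. It is not the authors' system and does not implement their materialization or reuse algorithms: it merely caches an operation's output keyed by its input artifact and the operation applied to it, so that a repeated workload step is reused instead of recomputed; all class and function names are invented.

```python
# Illustrative sketch (not the authors' system): a toy "experiment graph"
# that caches operation outputs keyed by (input artifact, operation name)
# and reuses them when a later workload repeats the same step.

import hashlib

class ExperimentGraph:
    def __init__(self):
        self.materialized = {}   # artifact id -> stored result

    @staticmethod
    def artifact_id(parent_id, op_name):
        """Vertices are artifacts; an edge is the operation that produced
        the child artifact from its parent."""
        return hashlib.sha1(f"{parent_id}->{op_name}".encode()).hexdigest()

    def run(self, parent_id, op_name, op_fn, parent_data, materialize=True):
        child_id = self.artifact_id(parent_id, op_name)
        if child_id in self.materialized:            # reuse instead of recompute
            return child_id, self.materialized[child_id]
        result = op_fn(parent_data)
        if materialize:                              # the paper decides this based
            self.materialized[child_id] = result     # on estimated reuse likelihood
        return child_id, result

if __name__ == "__main__":
    eg = ExperimentGraph()
    raw = list(range(10))
    fid, features = eg.run("raw-data", "square", lambda xs: [x * x for x in xs], raw)
    # A second, collaborating workload repeats the same step and hits the cache.
    fid2, features2 = eg.run("raw-data", "square", lambda xs: [x * x for x in xs], raw)
    print(fid == fid2, features2[:3])
```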
“Rhino: Efficient Management of Very Large Distributed State for Stream Processing Engines”
Authors: Bonaventura Del Monte, Steffen Zeuch, Tilmann Rabl, Volker Markl
Abstract: Scale-out stream processing engines (SPEs) are powering large big data applications on high velocity data streams. Industrial setups require SPEs to sustain outages, varying data rates, and low-latency processing. SPEs need to transparently reconfigure stateful queries during runtime. However, state-of-the-art SPEs are not ready yet to handle on-the-fly reconfigurations of queries with terabytes of state due to three problems. These are network overhead for state migration, consistency, and overhead on data processing. In this paper, we propose Rhino, a library for efficient reconfigurations of running queries in the presence of very large distributed state. Rhino provides a handover protocol and a state migration protocol to consistently and efficiently migrate stream processing among servers. Overall, our evaluation shows that Rhino scales with state sizes of up to TBs, reconfigures a running query 15 times faster than the state-of-the-art, and reduces latency by three orders of magnitude upon a reconfiguration.
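The general idea of migrating operator state while processing continues can be sketched in a few lines. The following single-process Python toy is not Rhino's handover or state-migration protocol: it copies a snapshot of keyed state while the old worker keeps processing, then ships only the remaining delta at handover; all names are invented for illustration.

```python
# Highly simplified, single-process sketch of the general idea behind
# migrating operator state while processing continues; it is not
# Rhino's actual handover or state-migration protocol.

class Worker:
    def __init__(self, name):
        self.name = name
        self.state = {}                 # keyed operator state (e.g., counts)

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1

def migrate(old, new, events_during_migration):
    # Phase 1: copy a snapshot of the state while the old worker keeps running.
    snapshot = dict(old.state)
    for key in events_during_migration:  # records that arrive mid-migration
        old.process(key)
    # Phase 2 (handover): ship only the delta accumulated since the snapshot,
    # then route new records to the new worker.
    delta = {k: v - snapshot.get(k, 0)
             for k, v in old.state.items() if v != snapshot.get(k, 0)}
    new.state = snapshot
    for k, v in delta.items():
        new.state[k] = new.state.get(k, 0) + v

if __name__ == "__main__":
    a, b = Worker("a"), Worker("b")
    for key in ["x", "y", "x"]:
        a.process(key)
    migrate(a, b, events_during_migration=["x", "z"])
    b.process("y")                       # traffic now goes to the new worker
    print(b.state)                       # {'x': 3, 'y': 2, 'z': 1}
```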
“Grizzly: Efficient Stream Processing Through Adaptive Query Compilation”
Authors: Philipp M. Grulich, Sebastian Breß, Steffen Zeuch, Jonas Traub, Janis von Bleichert, Zongxiong Chen, Tilmann Rabl, Volker Markl
Abstract: Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They rely on managed runtimes, an interpretation-based processing model, and do not perform runtime optimizations. Recent research states that this limits the utilization of modern hardware and neglects changing data characteristics at runtime. In this paper, we present Grizzly, a novel adaptive query compilation-based SPE to enable highly efficient query execution on modern hardware. We extend query-compilation and task-based parallelization for the unique requirements of stream processing and apply adaptive compilation to enable runtime re-optimizations. The combination of light-weight statistic gathering with just-in-time compilation enables Grizzly to dynamically adjust to changing data-characteristics at runtime. Our experiments show that Grizzly achieves up to an order of magnitude higher throughput and lower latency compared to state-of-the-art interpretation-based SPEs.
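The interplay of lightweight statistics gathering and runtime re-optimization can be sketched as follows. The Python below is illustrative only: Grizzly generates specialized machine code, whereas this toy regenerates a Python predicate with exec() whenever a fixed number of records has been processed; all names and thresholds are invented.

```python
# Illustrative sketch (not Grizzly's engine): gather a lightweight statistic
# while processing a stream and "recompile" the hot path when the observed
# data characteristics change. Python source generated with exec() stands in
# for just-in-time machine-code generation.

def generate_filter(min_seen, max_seen):
    """Emit a predicate specialized to the value range observed so far."""
    src = f"def specialized(x):\n    return {min_seen} <= x <= {max_seen} and x % 2 == 0\n"
    namespace = {}
    exec(src, namespace)
    return namespace["specialized"]

def process(stream, recompile_every=1000):
    count, lo, hi = 0, float("inf"), float("-inf")
    predicate = lambda x: x % 2 == 0           # generic, interpretation-style path
    accepted = 0
    for x in stream:
        lo, hi = min(lo, x), max(hi, x)        # lightweight statistics gathering
        count += 1
        if count % recompile_every == 0:       # re-optimize at runtime
            predicate = generate_filter(lo, hi)
        if predicate(x):
            accepted += 1
    return accepted

if __name__ == "__main__":
    print(process(range(10_000)))              # 5000 even values accepted
```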