December 22, 2020

Prof. Dr. Volker Markl

TU Berlin and DFKI vision paper on the data science ecosystem “Agora” accepted for publication in SIGMOD Record

A vision paper by researchers of the Database Systems and Information Management (DIMA) group at TU Berlin and the Intelligent Analytics for Massive Data (IAM) group at DFKI has been accepted for publication in SIGMOD Record. In the paper, the authors describe their vision of a unified ecosystem that brings together data, algorithms, models, and computational resources and provides them to a broad audience.

The paper “Agora: Bringing Together Datasets, Algorithms, Models and More in a Unified Ecosystem [Vision]” by Jonas Traub, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz and Volker Markl presents the vision of Agora, a data management system that operates in a heavily decentralized and dynamic environment in which data, algorithms, and even compute resources are dynamically created, modified, and removed by different stakeholders. Agora aims to offer a flexible exchange of assets among users. It thereby addresses the lock-in to the few providers that can currently afford the large investments in the multitude of assets such a system requires.

More information on SIGMOD Record is available at https://sigmodrecord.org/.

Authors: Jonas Traub, Zoi Kaoudi, Jorge-Arnulfo Quiané-Ruiz and Volker Markl

Abstract:
Data science and artificial intelligence are driven by a plethora of diverse data-related assets, including datasets, data streams, algorithms, processing software, compute resources, and domain knowledge. As providing all these assets requires a huge investment, data science and artificial intelligence technologies are currently dominated by a small number of providers who can afford these investments. This leads to lock-in effects and hinders features that require a flexible exchange of assets among users. In this paper, we introduce Agora, our vision towards a unified ecosystem that brings together data, algorithms, models, and computational resources and provides them to a broad audience. Agora (i) treats assets as first-class citizens and leverages a fine-grained exchange of assets, (ii) allows for combining assets to novel applications, and (iii) flexibly executes such applications on available resources. As a result, it enables easy creation and composition of data science pipelines as well as their scalable execution. In contrast to existing data management systems, Agora operates in a heavily decentralized and dynamic environment: Data, algorithms, and even compute resources are dynamically created, modified, and removed by different stakeholders. Agora presents novel research directions for the data management community as a whole: It requires to combine our traditional expertise in scalable data processing and management with infrastructure provisioning as well as economic and application aspects of data, algorithms, and infrastructure.

Publication:
Accepted at SIGMOD Record
Preprint