Two demonstration papers by BIFOLD researchers have been accepted at the 48th International Conference on Very Large Data Bases (VLDB). VLDB 2022 will take place in Sydney, Australia (and in hybrid format) from September 5 to 9, 2022.
Dorian in action: Assisted design of data science pipelines
Authors: Sergey Redyuk, Zoi Kaoudi, Sebastian Schelter, Volker Markl
Abstract: Existing automated machine learning solutions and intelligent discovery assistants are popular tools that facilitate the end-user with the design of data science (DS) pipelines. However, they yield limited applicability for a wide range of real-world use cases and application domains due to (a) the limited support of DS tasks; (b) a small, static set of available operators; and (c) restriction to evaluation processes with objective loss functions. We demonstrate DORIAN, a human-in-the-loop approach for the assisted design of data science pipelines that supports a large and growing set of DS tasks, operators, and arbitrary user-defined evaluation procedures. Based on the user query, i.e., a dataset and a DS task, DORIAN computes a ranked list of candidate pipelines that the end-user can choose from, alter, execute and evaluate. It stores executed pipelines in an experiment database and utilizes similarity-based search to identify relevant previously-run pipelines from the experiment database. DORIAN also takes user interaction into account to improve suggestions over time. We show how users can interact with DORIAN to create and compare DS pipelines on various real-world DS tasks we extracted from OpenML.
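The similarity-based search over the experiment database can be pictured with a minimal sketch. It is an illustration under stated assumptions, not DORIAN's implementation: the record layout, the meta-feature vectors, the use of cosine similarity, and the names `EXPERIMENTS` and `rank_candidates` are all hypothetical.

```python
import math

# Hypothetical experiment database. Each record stores the DS task, a small
# vector of dataset meta-features (assumed here: log10 of rows, features,
# and classes), and the pipeline that was executed.
EXPERIMENTS = [
    {"task": "classification", "meta": [3.0, 1.3, 0.3],
     "pipeline": ["impute", "scale", "logistic_regression"]},
    {"task": "classification", "meta": [1.0, 3.0, 1.0],
     "pipeline": ["select_features", "random_forest"]},
    {"task": "regression", "meta": [3.1, 1.2, 0.3],
     "pipeline": ["scale", "ridge"]},
]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(query_meta, task, experiments, k=3):
    """Return up to k stored runs for the same task, most similar dataset first."""
    same_task = [e for e in experiments if e["task"] == task]
    ranked = sorted(
        same_task,
        key=lambda e: cosine_similarity(query_meta, e["meta"]),
        reverse=True,
    )
    return ranked[:k]


# A user query consists of a dataset (represented by its meta-features) and a task.
candidates = rank_candidates([3.1, 1.2, 0.3], "classification", EXPERIMENTS)
for run in candidates:
    print(run["pipeline"])
```

In this sketch the ranked list is only a starting point; the abstract describes the end-user choosing from, altering, executing, and evaluating the suggested pipelines, with executed pipelines stored back in the experiment database.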
Satellite image search in Agora-EO
Authors: Ahmet Kerem Aksoy, Pavel Dushev, Eleni Tzirita Zacharatou, Holmer Hemsen, Marcela Charfuelan, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, Volker Markl
Abstract: The growing operational capability of global Earth Observation (EO) creates new opportunities for data-driven approaches to understand and protect our planet. However, the current use of EO archives is very restricted due to the huge archive sizes and the limited exploration capabilities provided by EO platforms. To address this limitation, we have recently proposed MiLaN, a content-based image retrieval approach for fast similarity search in satellite image archives. MiLaN is a metric-learning-based deep hashing network that encodes high-dimensional image features into compact binary hash codes. We then use these codes as keys in a hash table to enable real-time nearest neighbor search and highly accurate retrieval. In this demonstration, we showcase the efficiency of MiLaN by integrating it with EarthQube, a browser and search engine within AgoraEO. EarthQube supports interactive visual exploration and Query-by-Example over satellite image repositories. Demo visitors will interact with EarthQube playing the role of different users that search images in a large-scale remote sensing archive by their semantic content and apply other filters.
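The idea of using binary hash codes as hash-table keys for nearest neighbor search can be sketched as follows. This is a simplified illustration, not MiLaN itself: MiLaN learns its codes with a deep metric-learning network, whereas this sketch substitutes random hyperplane projections to produce the bits. The function names, the 8-bit code length, and the toy feature vectors are assumptions made for the example.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_hasher(dim, n_bits, seed=0):
    """Build a function mapping a feature vector to an n_bits-wide integer code.

    Stand-in for a learned hashing network: each bit is the sign of the
    projection onto a random hyperplane.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def hash_code(vec):
        code = 0
        for plane in planes:
            bit = sum(p * v for p, v in zip(plane, vec)) >= 0
            code = (code << 1) | int(bit)
        return code

    return hash_code


def build_index(features, hash_code):
    """Hash table from binary code to the ids of the images with that code."""
    table = defaultdict(list)
    for image_id, vec in features.items():
        table[hash_code(vec)].append(image_id)
    return table


def query(table, code, n_bits, radius=1):
    """Return ids whose codes lie within the given Hamming radius of `code`."""
    results = list(table.get(code, []))
    for r in range(1, radius + 1):
        for bits in combinations(range(n_bits), r):
            probe = code
            for b in bits:
                probe ^= 1 << b  # flip one bit of the query code
            results.extend(table.get(probe, []))
    return results


# Toy archive: "b" points in the same direction as "a", "c" in the opposite one.
N_BITS = 8
features = {
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [-1.0, -2.0, -3.0, -4.0],
}
hash_code = make_hasher(dim=4, n_bits=N_BITS)
table = build_index(features, hash_code)
print(query(table, hash_code(features["a"]), N_BITS))
```

Each lookup probes a fixed number of buckets that depends only on the code length and the search radius, not on the archive size, which is what makes this kind of retrieval fast on large archives. Retrieval quality depends on how well the codes preserve semantic similarity, which is the part a learned network addresses and random projections only approximate.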