Two Demo Papers accepted at VLDB

Two demonstration papers by BIFOLD researchers have been accepted at the 48th International Conference on Very Large Data Bases (VLDB). VLDB 2022 will take place in Sydney, Australia (in hybrid format) on September 5-9, 2022.

DORIAN in action: Assisted Design of Data Science Pipelines

Authors: Sergey Redyuk, Zoi Kaoudi, Sebastian Schelter, Volker Markl

Abstract: Existing automated machine learning solutions and intelligent discovery assistants are popular tools that help end users design data science (DS) pipelines. However, they have limited applicability for a wide range of real-world use cases and application domains due to (a) limited support for DS tasks; (b) a small, static set of available operators; and (c) restriction to evaluation processes with objective loss functions. We demonstrate DORIAN, a human-in-the-loop approach for the assisted design of data science pipelines that supports a large and growing set of DS tasks, operators, and arbitrary user-defined evaluation procedures. Based on the user query, i.e., a dataset and a DS task, DORIAN computes a ranked list of candidate pipelines that the end user can choose from, alter, execute, and evaluate. It stores executed pipelines in an experiment database and uses similarity-based search to identify relevant previously run pipelines in that database. DORIAN also takes user interaction into account to improve its suggestions over time. We show how users can interact with DORIAN to create and compare DS pipelines on various real-world DS tasks extracted from OpenML.
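The abstract says DORIAN ranks candidates partly by retrieving similar, previously run pipelines from its experiment database and by learning from user interaction. The sketch below illustrates one simple way such a lookup could work. It is an assumption-laden illustration, not DORIAN's published design: the `ExperimentRecord` schema, the use of dataset meta-feature vectors, cosine similarity as the similarity measure, and the `feedback_weight` bonus for pipelines users accepted before are all hypothetical choices.

```python
import math
from dataclasses import dataclass


@dataclass
class ExperimentRecord:
    """One previously executed pipeline (hypothetical schema)."""
    pipeline: str                 # e.g. "impute -> scale -> random_forest"
    task: str                     # DS task, e.g. "classification"
    meta_features: list[float]    # hypothetical descriptors of the run's dataset
    accepted: int = 0             # how often users picked this pipeline before


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equally long vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_candidates(records, task, query_features, k=3, feedback_weight=0.05):
    """Return up to k pipeline descriptions for the query, most relevant first.

    Only runs for the same DS task are considered. Relevance is the similarity
    between a run's dataset and the query dataset, plus a small bonus for every
    time users previously accepted that pipeline (hypothetical feedback rule).
    """
    best = {}
    for rec in records:
        if rec.task != task:
            continue
        relevance = cosine(rec.meta_features, query_features)
        relevance += feedback_weight * rec.accepted
        # A pipeline may have been run on several datasets; keep its best match.
        best[rec.pipeline] = max(relevance, best.get(rec.pipeline, -math.inf))
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [pipeline for pipeline, _ in ranked[:k]]


records = [
    ExperimentRecord("impute -> scale -> logistic_regression", "classification", [1.0, 0.0, 0.5]),
    ExperimentRecord("impute -> random_forest", "classification", [0.9, 0.1, 0.4], accepted=2),
    ExperimentRecord("scale -> linear_regression", "regression", [1.0, 0.0, 0.5]),
]
print(rank_candidates(records, "classification", [1.0, 0.0, 0.5]))
# → ['impute -> random_forest', 'impute -> scale -> logistic_regression']
```

With `feedback_weight=0.0`, the logistic-regression pipeline, whose dataset matches the query exactly, would rank first instead; the bonus term is only a crude stand-in for "improving suggestions over time" from user interaction.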

Satellite Image Search in AgoraEO

Authors: Ahmet Kerem Aksoy, Pavel Dushev, Eleni Tzirita Zacharatou, Holmer Hemsen, Marcela Charfuelan, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, Volker Markl

Abstract: The growing operational capability of global Earth Observation (EO) creates new opportunities for data-driven approaches to understand and protect our planet. However, the current use of EO archives is severely restricted due to the huge archive sizes and the limited exploration capabilities provided by EO platforms. To address this limitation, we have recently proposed MiLaN, a content-based image retrieval approach for fast similarity search in satellite image archives. MiLaN is a metric-learning-based deep hashing network that encodes high-dimensional image features into compact binary hash codes. We then use these codes as keys in a hash table to enable real-time nearest neighbor search and highly accurate retrieval. In this demonstration, we showcase the efficiency of MiLaN by integrating it with EarthQube, a browser and search engine within AgoraEO. EarthQube supports interactive visual exploration and Query-by-Example over satellite image repositories. Demo visitors will interact with EarthQube, playing the role of different users who search for images in a large-scale remote sensing archive by their semantic content and apply other filters.
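Using binary codes as hash-table keys works because similar images are meant to receive codes that differ in only a few bits (a small Hamming distance). One common way to answer a query is therefore to probe every bucket whose key lies within a small Hamming radius of the query code. The sketch below shows only this lookup step and assumes the codes were already produced by the hashing network; the `HashIndex` class, the 8-bit code length, the image IDs, and the `radius` parameter are hypothetical and do not reflect MiLaN's actual configuration.

```python
from collections import defaultdict
from itertools import combinations


class HashIndex:
    """Hash table from binary codes (stored as ints) to image IDs (hypothetical)."""

    def __init__(self, n_bits: int) -> None:
        self.n_bits = n_bits
        self.buckets = defaultdict(list)  # code -> list of image IDs

    def add(self, code: int, image_id: str) -> None:
        self.buckets[code].append(image_id)

    def query(self, code: int, radius: int = 1) -> list[tuple[int, str]]:
        """Return (hamming_distance, image_id) pairs within `radius` bits, closest first.

        Every code that differs from the query in at most `radius` bit positions
        is generated and looked up directly, so no linear scan over the archive
        is needed.
        """
        results = []
        for dist in range(radius + 1):
            for positions in combinations(range(self.n_bits), dist):
                probe = code
                for pos in positions:
                    probe ^= 1 << pos  # flip one bit of the query code
                # .get() avoids inserting empty buckets into the defaultdict
                for image_id in self.buckets.get(probe, []):
                    results.append((dist, image_id))
        return results


index = HashIndex(n_bits=8)
index.add(0b10110010, "forest_tile_01")
index.add(0b10110011, "forest_tile_02")  # differs in 1 bit
index.add(0b01001101, "water_tile_07")   # differs in all 8 bits
print(index.query(0b10110010, radius=1))
# → [(0, 'forest_tile_01'), (1, 'forest_tile_02')]
```

The number of probed buckets grows as the sum of binomial coefficients C(n_bits, r) for r up to the radius, which is why such lookups are usually restricted to small radii.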

Earth Observation data for climate change research

AgoraEO: One platform integrates data from all over the world  
Visualization of sea surface temperature and salinity based on EO data.
(Copyright: European Space Agency)

Environmental reports on the dramatic retreat of the Arctic ice sheet, the ongoing deforestation of rainforests, or the spread of forest fires are mostly based on the analysis of satellite images. The analysis of large amounts of Earth Observation (EO) data plays a crucial role in understanding and quantifying climate change.

“The efficient use of these data makes it possible to monitor and predict the effects of climate change on a global scale with unprecedented reliability,” explains Prof. Dr. Begüm Demir, head of the Big Data Analytics for Earth Observation research group at the Berlin Institute for the Foundations of Learning and Data (BIFOLD) and professor of Remote Sensing Image Analysis at TU Berlin. Advances in satellite systems have massively increased the amount and variety of EO data as well as its spatial and spectral resolution. “Nowadays we possess huge EO data archives. The Sentinel satellites in the Copernicus program alone – Europe’s flagship EO satellite initiative – provide us with about 12 terabytes of satellite images per day,” says Begüm Demir.
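To put the quoted Sentinel data rate into perspective, here is a back-of-the-envelope extrapolation. It assumes a constant rate of about 12 TB per day and decimal units (1 PB = 1000 TB):

```latex
12\,\frac{\text{TB}}{\text{day}} \times 365\,\text{days} = 4380\,\text{TB} \approx 4.4\,\text{PB per year}
```

This ignores growth in the number of satellites and any reprocessing, so it is only an order-of-magnitude figure.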

The European Space Agency uses a multitude of satellites to generate large amounts of EO data.
(Copyright: European Space Agency)

The problem: There is no single platform that intelligently connects the different datasets of interest from all over the world. All existing analysis platforms rely on heterogeneous technologies with different interfaces and data formats, which prevents cross-platform use. For example, it is nearly impossible to apply an analytics procedure developed on one platform to another. “It’s like using Word on a PC without a Windows environment – meaning you have to instruct each computing operation individually. This ‘lock-in effect’ hinders innovation and thus the efficient use of the collected data for climate protection,” explains Dr. Jorge Quiané-Ruiz, head of the Big Data Systems research group at BIFOLD.

Overcoming these limitations in the use of EO datasets is the common goal of Begüm Demir and Jorge Quiané-Ruiz. Their project, AgoraEO, is a universal Earth Observation ecosystem infrastructure for sharing, finding, assembling, and running datasets, algorithms, and other tools. While Begüm Demir brings expertise in remote sensing data processing and analysis, Jorge Quiané-Ruiz is an expert in data processing and data management. He develops the Agora infrastructure, a more general-purpose ecosystem for data science and AI innovation, on which AgoraEO is partially based.

AgoraEO’s innovative infrastructure allows all interested parties to contribute both EO data and technologies without having to upload them to a common server. “Our goal is to create an infrastructure that enables federated analysis across different platforms, making modern Earth observation technology accessible to all scientists and society, thus promoting climate change innovation worldwide,” says Jorge Quiané-Ruiz.
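Federated analysis in this sense means moving the computation to the data instead of moving the data to a central server. A minimal sketch, assuming two hypothetical platforms that each keep their own sea surface temperature readings: each platform returns only a partial aggregate (sum and count), and a coordinator combines these partials into a global mean without ever receiving the raw values. The `Platform` class, its method, and the readings are invented for illustration and do not describe AgoraEO's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    """A data provider that keeps its raw observations local (hypothetical)."""
    name: str
    _readings: list[float]  # raw values stay with the platform; only aggregates are shared

    def partial_mean(self) -> tuple[float, int]:
        """Run the local part of the analysis and return (sum, count) only."""
        return sum(self._readings), len(self._readings)


def federated_mean(platforms: list[Platform]) -> float:
    """Combine per-platform partial aggregates into one global mean."""
    total, count = 0.0, 0
    for platform in platforms:
        partial_sum, partial_count = platform.partial_mean()
        total += partial_sum
        count += partial_count
    if count == 0:
        raise ValueError("no observations available")
    return total / count


platforms = [
    Platform("platform_a", [14.0, 16.0]),
    Platform("platform_b", [18.0, 20.0, 22.0]),
]
print(federated_mean(platforms))  # → 18.0
```

Exchanging sums and counts, rather than per-platform means, matters: averaging the two local means (15.0 and 20.0) would give 17.5, because it weights a platform with two readings the same as one with five.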

This article first appeared on 31.07.2021 in the supplement “Climate Research” of Der Tagesspiegel, Berlin.

More information is available at: https://www.tu.berlin/en/themen/special-supplement-in-der-tagesspiegel-july-2021/

BIFOLD Junior Fellow Dr. Eleni Tzirita Zacharatou presents the vision for AgoraEO in a talk at TU Twente.

For more technical insights into the research in the Agora project, visit their blog on Medium: https://medium.com/the-research-behind-agora

Publication:

Agora-EO: A Unified Ecosystem for Earth Observation – A Vision for Boosting EO Data Literacy

Authors: Arne de Wall, Björn Deiseroth, Eleni Tzirita Zacharatou, Jorge-Arnulfo Quiané-Ruiz, Begüm Demir, Volker Markl

Abstract:
Today, interoperability among EO exploitation platforms is almost nonexistent, as most of them rely on a heterogeneous set of technologies with varying interfaces and data formats. Thus, it is crucial to enable cross-platform (federated) analytics to make EO technology easily accessible to everyone. We envision AgoraEO, an EO ecosystem for sharing, finding, composing, and executing EO assets, such as datasets, algorithms, and tools. Making AgoraEO a reality is challenging for several reasons, the main ones being that the ecosystem must provide interactive response times and operate seamlessly over multiple exploitation platforms. In this paper, we discuss the different challenges that AgoraEO poses as well as our ideas to tackle them. We believe that having an open, unified EO ecosystem would foster innovation and boost EO data literacy for the entire population.
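The four operations the abstract names (sharing, finding, composing, and executing EO assets) can be illustrated with a toy in-memory catalog. This is a hypothetical sketch: the `Catalog` class, its method names, the keyword tags, the no-data marker, and the two example algorithms are assumptions made for illustration. A real ecosystem would additionally have to handle remote platforms, differing data formats, and interactive response times, as the abstract notes.

```python
from typing import Callable

# An "algorithm" asset here is just a function from a list of values to a list of values.
Step = Callable[[list[float]], list[float]]


class Catalog:
    """Toy in-memory registry of shareable EO assets (hypothetical)."""

    def __init__(self) -> None:
        self._assets: dict[str, tuple[set[str], Step]] = {}

    def share(self, name: str, tags: list[str], step: Step) -> None:
        """Register an algorithm under a unique name with descriptive tags."""
        self._assets[name] = (set(tags), step)

    def find(self, tag: str) -> list[str]:
        """Return the sorted names of all assets carrying the given tag."""
        return sorted(name for name, (tags, _) in self._assets.items() if tag in tags)

    def compose(self, names: list[str]) -> Step:
        """Chain the named assets into a single pipeline, applied left to right."""
        steps = [self._assets[name][1] for name in names]

        def pipeline(data: list[float]) -> list[float]:
            for step in steps:
                data = step(data)
            return data

        return pipeline


NODATA = -9999.0  # hypothetical no-data marker in a temperature series

catalog = Catalog()
catalog.share("mask_nodata", ["preprocessing"], lambda xs: [x for x in xs if x != NODATA])
catalog.share("celsius_to_kelvin", ["conversion"], lambda xs: [x + 273.15 for x in xs])

print(catalog.find("preprocessing"))  # → ['mask_nodata']
run = catalog.compose(["mask_nodata", "celsius_to_kelvin"])  # compose
print(run([10.0, -9999.0, 20.0]))  # execute → [283.15, 293.15]
```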

In: Proceedings of BiDS 2021