Banner Banner

Fault Tolerance Placement in the Internet of Things

Anastasiia Kozar
Bonaventura Del Monte
Steffen Zeuch
Volker Markl

May 30, 2024

Today's IoT applications exploit the capabilities of three different computation environments: sensors, edge, and cloud. Ensuring fault tolerance at the edge level presents unique challenges due to complex network hierarchies and the presence of resource-constrained computing devices. In contrast to the Cloud, the Edge lacks high availability standards and a persistent upstream backup. To ensure reliability, fault tolerance mechanisms have to be deployed on the edge devices along with processing operators competing for available resources. However, existing operator placement strategies are not aware of fault tolerance resource requirements, and existing fault tolerance approaches are not aware of available resources. This miscommunication in resource-constrained environments like the Edge leads to underprovisioning and failures. In this paper, we present a resource-aware fault-tolerance approach that takes the unique characteristics of the Edge into account to provide reliable stream processing. To this end, we model fault tolerance as an operator placement problem that uses multi-objective optimization to decide where to backup data. As opposed to existing approaches that treat operator placement and fault tolerance as two separate steps, we combine them and showcase that this is especially important for low-end edge devices. Overall, our approach effectively mitigates potential failures and outperforms state-of-the-art fault tolerance approaches by up to an order of magnitude in throughput.