Nowadays, dynamic graphs that continuously evolve are ubiquitous. To handle graph processing tasks on these dynamic graphs, solutions generally require incremental computation rather than recomputing results from scratch. This incremental approach effectively avoids unnecessary overhead and recalculation of unchanged parts of the results. The objective of this thesis is to identify redundancies in key research problems related to dynamic graphs and develop innovative algorithms that efficiently and effectively provide up-to-date results. In this thesis, we explore three distinct problems in highly dynamic graphs: (1) random walks, (2) butterfly counting, and (3) overlapping community detection. For each of these research problems, our algorithms incrementally compute the results without the need for complete recomputation. The first research problem we address is random walks in highly dynamic environments.
We introduce Wharf, a parallel system designed to generate and update huge random walk corpuses in these dynamic settings while storing them in a compact form. Wharf maintains a compressed, high-throughput, and low-latency data structure for storing random walks. Specifically, Wharf combines compressed purely functional binary trees with pairing functions to achieve an effective representation of the walks. Moreover, Wharf efficiently updates the walks in a batch fashion and reduces the search space during walk updates through the ordering properties of pairing functions. We evaluate Wharf, with real and synthetic graphs, with respect to throughput and latency when updating random walks. The results indicate the superiority of Wharf over inverted index- and tree-based solutions.
The second research problem we tackle is butterfly counting in fully dynamic bipartite graphs that entail both edge insertions and deletions. We introduce Abacus, a novel algorithm for estimating butterfly counts in fully dynamic graphs, which is the first method capable of addressing edge deletions along with edge insertions. Abacus provides highly accurate butterfly count estimates, achieving quite smaller errors compared to baselines. Furthermore, Abacus is efficient, performing butterfly counting with throughput comparable to competitors, and maintains linear scalability. Moreover, Abacus is theoretically robust and consistently delivers unbiased estimates of low variance as the input bipartite graph changes. We also introduce ParAbacus, the parallel version of Abacus, which conducts mini-batch processing and counts butterflies in a load-balanced way using versioned samples. We demonstrate that ParAbacus achieves significant speedup, making it well-suited for highly dynamic scenarios.
The third research problem we address is overlapping community detection on dynamic graphs. We introduce Communis, an innovative approach for efficiently and effectively identifying overlapping communities in dynamic graphs. Specifically, we devise incremental algorithms to update a persona graph, a concept that facilitates overlapping community detection using non-overlapping techniques. Communis is the first method to perform overlapping community detection on dynamic graphs. We demonstrate that Communis is significantly faster than the baseline across all datasets as the graph scales, without compromising the quality of the communities it detects. In addition, we compare Communis to a state-of-the-art method for overlapping community detection in static graphs and show that our method discovers higher-quality communities.
The outcome of the thesis is a set of techniques to efficiently and effectively tackle the problems of random walks, butterfly counting, and overlapping community detection on highly dynamic graphs. Our proposed solutions are not only efficient in providing up-to-date results but also effective in delivering high-quality results.