SIGCOMM 2021 Best Paper: Internet Hypergiants Expand into End-User Networks


BIFOLD Fellow Prof. Dr. Georgios Smaragdakis and his colleagues received the prestigious ACM SIGCOMM 2021 Best Paper Award for their research into the expansion of Hypergiants’ off-nets. They developed a methodology to measure how a few extremely large Internet content providers have deployed more and more servers in end-user networks in recent years. Their findings reveal changes in the structure of the Internet that potentially affect end-user experience and network neutrality regulations.

Percentages of internet users that can be served by Hypergiants’ off-nets in their networks.
(Copyright: Petros Gigis et al.)

An increasing share of the digital content delivered to Internet users originates from a few very large providers such as Google, Facebook, or Netflix, the so-called Hypergiants (HGs). In 2007, thousands of autonomous systems (ASes) – e.g., the networks of Internet service providers or universities – were needed to originate 50% of all content. By 2019, just five Hypergiants originated half of the total Internet traffic. To cope with this unprecedented demand, most Hypergiants not only increased the capacity of their own networks but also installed and operated servers, called off-nets, inside other networks. Such off-nets operate closer to the end user and thus accelerate content delivery while supporting demanding applications such as video streaming and edge computing (machine learning, artificial intelligence, and 5G).

At least three of the current HGs increased their off-net footprint significantly between 2013 and 2021 (note: y-axis scales differ).
(Copyright: Petros Gigis et al.)

In their paper “Seven years in the life of Hypergiants’ off-nets,” Georgios Smaragdakis, Professor of Cybersecurity at TU Delft, and his colleagues from University College London, Microsoft, Columbia University, and FORTH-ICS present a methodology to measure the growth of such off-net footprints by analyzing massive public datasets of active scans and server TLS certificates spanning seven years (2013–2021). By tracking the ownership of certificates over time, they were able to follow the deployment of Hypergiants’ off-nets around the globe. Such Internet analytics are important for understanding how the structure and operation of the Internet and its data flows have changed. For this work the researchers received the prestigious Best Paper Award of the 2021 ACM Special Interest Group on Data Communication (SIGCOMM 2021) conference. SIGCOMM is the flagship conference of the Association for Computing Machinery (ACM) on Internet architecture and networking.

Prof. Dr. Georgios Smaragdakis
(Copyright: Georgios Smaragdakis)

“Internet infrastructures are the backbone of contemporary communication. Understanding developments in this sector is a key prerequisite for improving end-user experience, security, and privacy. We are very pleased that our efforts to monitor and explain changes in the Internet architectures are internationally recognized.”

“This is the first generic and scalable method to survey this development in the wild. We make publicly available the only extensive collection of data and visualizations that describe such Hypergiant off-net developments over seven years, from 2013 to 2021,” explains Georgios Smaragdakis. He and his colleagues found that the largest Hypergiants can serve large fractions of the world’s Internet users directly from within the users’ own networks. “While the deployment of off-nets can improve end-user performance and the introduction of encryption improves user privacy, our study shows that information about these deployments is leaked and can potentially be misused by adversaries or to gain business intelligence. In our work we suggest ways to address such issues,” says Georgios Smaragdakis. He and his colleagues believe that the insights from their data analysis and the release of public data can inform studies in other fields, including economics, political science, and regulation.

The publication in detail:

Petros Gigis, Matt Calder, Lefteris Manassakis, George Nomikos, Vasileios Kotronis, Xenofontas A. Dimitropoulos, Ethan Katz-Bassett, Georgios Smaragdakis: Seven years in the life of Hypergiants’ off-nets. SIGCOMM 2021: 516-533


Content Hypergiants deliver the vast majority of Internet traffic to end users. In recent years, some have invested heavily in deploying services and servers inside end-user networks. With several dozen Hypergiants and thousands of servers deployed inside networks, these off-net (meaning outside the Hypergiant networks) deployments change the structure of the Internet. Previous efforts to study them have relied on proprietary data or specialized per-Hypergiant measurement techniques that neither scale nor generalize, providing a limited view of content delivery on today’s Internet.
In this paper, we develop a generic and easy to implement methodology to measure the expansion of Hypergiants’ off-nets. Our key observation is that Hypergiants increasingly encrypt their traffic to protect their customers’ privacy. Thus, we can analyze publicly available Internet-wide scans of port 443 and retrieve TLS certificates to discover which IP addresses host Hypergiant certificates in order to infer the networks hosting off-nets for the corresponding Hypergiants. Our results show that the number of networks hosting Hypergiant off-nets has tripled from 2013 to 2021, reaching 4.5k networks. The largest Hypergiants dominate these deployments, with almost all of these networks hosting an off-net for at least one — and increasingly two or more — of Google, Netflix, Facebook, or Akamai. These four Hypergiants have off-nets within networks that provide access to a significant fraction of end user population.
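The certificate-based inference at the heart of the methodology can be illustrated with a short sketch. The domain patterns below are illustrative assumptions, not the paper’s actual fingerprints, and a real pipeline would additionally map each matching IP address to the autonomous system hosting it:

```python
import re

# Hypothetical mapping from certificate domains to Hypergiants; the study
# derives such fingerprints from certificates served by each Hypergiant's
# own (on-net) infrastructure.
HG_PATTERNS = {
    "Google":   re.compile(r"(^|\.)(google|googlevideo|gstatic)\.com$"),
    "Facebook": re.compile(r"(^|\.)(facebook|fbcdn)\.(com|net)$"),
    "Netflix":  re.compile(r"(^|\.)nflxvideo\.net$"),
    "Akamai":   re.compile(r"(^|\.)akamai(edge)?\.net$"),
}

def classify_certificate(san_domains):
    """Return the Hypergiants whose domains appear among a certificate's
    Subject Alternative Names (wildcard prefixes like '*.' are stripped)."""
    hits = set()
    for name in san_domains:
        host = name.lstrip("*.").lower()
        for hg, pattern in HG_PATTERNS.items():
            if pattern.search(host):
                hits.add(hg)
    return hits

print(classify_certificate(["*.googlevideo.com", "*.gstatic.com"]))  # → {'Google'}
```

In the measurement study, this kind of matching is applied to Internet-wide scans of port 443: an off-net candidate is inferred when a Hypergiant’s certificate is served from an IP address outside that Hypergiant’s own network.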

More information is available from:

Prof. Dr. Georgios Smaragdakis

TU Delft – Cybersecurity
Van Mourik Broekmanweg 6
2628 XE Delft
The Netherlands


BTW 2021 Best Paper Award and Reproducibility Badge for TU Berlin Data Science Publication


The research paper “Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing” by Alexander Kumaigorodski, Clemens Lutz, and Volker Markl received the Best Paper Award of the 19th Symposium on Database Systems for Business, Technology and Web (BTW 2021). In addition, the paper received the Reproducibility Badge, awarded for the first time at BTW 2021, for the high reproducibility of its results.

TU Berlin Master’s graduate Alexander Kumaigorodski and his co-authors from Prof. Dr. Volker Markl‘s Department of Database Systems and Information Management (DIMA) at TU Berlin and from the Intelligent Analytics for Massive Data (IAM) research area at the German Research Centre for Artificial Intelligence (DFKI) present a new approach to speed up loading and processing of tabular CSV data by orders of magnitude.

CSV is one of the most frequently used formats for exchanging structured data. For example, the City of Berlin publishes its structured datasets in CSV format in the Berlin Open Data Portal. Such datasets can be imported into databases for analysis. Accelerating this import allows users to handle growing data volumes and reduces the time required for analysis. Each new generation of computer networks and storage media provides higher bandwidth and allows for faster reading times. However, current loading and processing approaches that rely on main processors (CPUs) cannot keep up with this hardware and unnecessarily throttle loading times.

© Alexander Kumaigorodski

The paper describes a new approach in which CSV data is read and processed by graphics processors (GPUs) instead. The advantage of GPUs lies primarily in their massive parallel computing power and fast memory access. With this approach, modern hardware technologies such as NVLink 2.0 or InfiniBand with Remote Direct Memory Access (RDMA) can be fully exploited. As a result, CSV data can be read directly from main memory or the network and processed at multiple gigabytes per second.
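The core idea – replacing a sequential character-by-character parser with data-parallel passes – can be sketched in plain Python. This is a simplified CPU illustration, not the authors’ CUDA implementation; it assumes well-formed input without a trailing newline, where quotes only delimit fields:

```python
from itertools import accumulate

def split_fields(csv_text: str):
    """Two-pass, data-parallel-style CSV splitting (simplified):
    pass 1: a prefix scan over quote characters tells, for each position,
            whether it lies inside a quoted field (odd running quote count);
    pass 2: only commas/newlines OUTSIDE quotes become field/record bounds.
    On a GPU, both passes map onto parallel primitives (map + prefix scan)."""
    running_quotes = list(accumulate(1 if c == '"' else 0 for c in csv_text))
    # Quotes strictly before position i decide the inside-quotes flag.
    inside = [(running_quotes[i] - (csv_text[i] == '"')) % 2 == 1
              for i in range(len(csv_text))]
    rows, row, start = [], [], 0
    for i, c in enumerate(csv_text):
        if c in ',\n' and not inside[i]:
            row.append(csv_text[start:i].strip('"'))
            start = i + 1
            if c == '\n':
                rows.append(row)
                row = []
    row.append(csv_text[start:].strip('"'))
    rows.append(row)
    return rows

print(split_fields('a,"b,c"\nd,e'))  # → [['a', 'b,c'], ['d', 'e']]
```

The prefix scan is what makes quote handling parallelizable: instead of carrying parser state from one character to the next, every position can decide independently whether it sits inside a quoted field.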

The transparency of the tests performed and the independent confirmation of the results also led to the award of the first-ever BTW 2021 Reproducibility Badge. In the data science community, the reproducibility of research results is becoming increasingly important. It serves to verify results as well as to compare them with existing work and is thus an important aspect of scientific quality assurance. Leading international conferences have therefore already devoted special attention to this topic.

To ensure high reproducibility, the authors provided the reproducibility committee with source code, additional test data, and instructions for running the benchmarks. The execution of the tests was demonstrated in a live session and could then also be successfully replicated by a member of the committee. The Reproducibility Badge recognizes above all the good scientific practice of the authors.

The paper in detail:
“Fast CSV Loading Using GPUs and RDMA for In-Memory Data Processing”

Alexander Kumaigorodski, Clemens Lutz, Volker Markl

Comma-separated values (CSV) is a widely-used format for data exchange. Due to the format’s prevalence, virtually all industrial-strength database systems and stream processing frameworks support importing CSV input. However, loading CSV input close to the speed of I/O hardware is challenging. Modern I/O devices such as InfiniBand NICs and NVMe SSDs are capable of sustaining high transfer rates of 100 Gbit/s and higher. At the same time, CSV parsing performance is limited by the complex control flows that its semi-structured and text-based layout incurs. In this paper, we propose to speed up loading CSV input using GPUs. We devise a new parsing approach that streamlines the control flow while correctly handling context-sensitive CSV features such as quotes. By offloading I/O and parsing to the GPU, our approach enables databases to load CSVs at high throughput from main memory with NVLink 2.0, as well as directly from the network with RDMA. In our evaluation, we show that GPUs parse real-world datasets at up to 60 GB/s, thereby saturating high-bandwidth I/O devices.

K.-U. Sattler et al. (Eds.): Datenbanksysteme für Business, Technologie und Web (BTW 2021), Lecture Notes in Informatics (LNI), Gesellschaft für Informatik, Bonn 2021