MULTIPLE INTERNET NETWORK ARCHITECTURES PAPERS PRESENTED AT IMC ’20
BIFOLD’s Principal Investigators Prof. Dr. Georgios Smaragdakis and Prof. Dr. Anja Feldmann, together with other researchers from the Internet Network Architectures (INET) group at TU Berlin, presented four papers at the 20th ACM Internet Measurement Conference (IMC ’20), which took place as a virtual event from October 27 to 29, 2020.
Among other topics, the researchers analyzed how Internet traffic changed during the first lockdown of the COVID-19 pandemic. They found that the Internet infrastructure was able to handle the increased volume, as most traffic shifts occurred outside of traditional peak hours.
In another paper, the authors showed that millions of IoT devices are detectable and identifiable within hours. Their methodology detected devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While the methodology is effective for providing network analytics, it also highlights significant privacy consequences.
THE PAPERS IN DETAIL:
> The Lockdown Effect: Implications of the COVID-19 Pandemic on Internet Traffic
Authors:
Anja Feldmann, Oliver Gasser, Franziska Lichtblau, Enric Pujol, Ingmar Poese, Christoph Dietzel, Daniel Wagner, Matthias Wichtlhuber, Juan Tapiador, Narseo Vallina-Rodriguez, Oliver Hohlfeld, Georgios Smaragdakis
Abstract:
Due to the COVID-19 pandemic, many governments imposed lock-downs that forced hundreds of millions of citizens to stay at home. The implementation of confinement measures increased Internet traffic demands of residential users, in particular, for remote working, entertainment, commerce, and education, which, as a result, caused traffic shifts in the Internet core.
In this paper, using data from a diverse set of vantage points (one ISP, three IXPs, and one metropolitan educational network), we examine the effect of these lockdowns on traffic shifts. We find that the traffic volume increased by 15-20% almost within a week—while overall still modest, this constitutes a large increase within this short time period. However, despite this surge, we observe that the Internet infrastructure is able to handle the new volume, as most traffic shifts occur outside of traditional peak hours. When looking directly at the traffic sources, it turns out that, while hypergiants still contribute a significant fraction of traffic, we see (1) a higher increase in traffic of non-hypergiants, and (2) traffic increases in applications that people use when at home, such as Web conferencing, VPN, and gaming. While many networks see increased traffic demands, in particular, those providing services to residential users, academic networks experience major overall decreases. Yet, in these networks, we can observe substantial increases when considering applications associated to remote working and lecturing.
Publication:
Anja Feldmann, Oliver Gasser, Franziska Lichtblau, Enric Pujol, Ingmar Poese, Christoph Dietzel, Daniel Wagner, Matthias Wichtlhuber, Juan Tapiador, Narseo Vallina-Rodriguez, Oliver Hohlfeld, Georgios Smaragdakis: The Lockdown Effect: Implications of the COVID-19 Pandemic on Internet Traffic. IMC ’20: Proceedings of the ACM Internet Measurement Conference, October 2020, Pages 1–18
https://doi.org/10.1145/3419394.3423658
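The measurement idea sketched in the abstract, comparing per-hour traffic volumes before and during the lockdown across vantage points, can be illustrated with a few lines of code. The sketch below is a simplified assumption-laden illustration, not the paper's pipeline: the flow record layout (timestamp, sampled byte count), the 1:1000 sampling rate, and the hypothetical read_flows loader are all made up for the example.

```python
# Hedged sketch: estimate per-hour traffic volumes from sampled flow records and
# compare a pre-lockdown window against a lockdown window. The sampling rate and
# record layout are illustrative assumptions.
from collections import defaultdict

SAMPLING_RATE = 1000  # assume 1-out-of-1000 flow sampling

def hourly_volumes(flow_records):
    """Aggregate sampled flow bytes into estimated total bytes per hour of day."""
    totals = defaultdict(int)
    for ts, sampled_bytes in flow_records:  # ts: datetime, sampled_bytes: int
        totals[ts.hour] += sampled_bytes * SAMPLING_RATE
    return totals

def relative_change(before, during):
    """Per-hour relative change in volume (lockdown vs. pre-lockdown)."""
    return {
        hour: (during.get(hour, 0) - before[hour]) / before[hour]
        for hour in before
        if before[hour] > 0
    }

# Usage (read_flows is a hypothetical loader returning (timestamp, bytes) tuples):
# before = hourly_volumes(read_flows("pre_lockdown_flows.csv"))
# during = hourly_volumes(read_flows("lockdown_flows.csv"))
# print(relative_change(before, during))  # largest increases fall outside peak hours
```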
> A Haystack Full of Needles: Scalable Detection of IoT Devices in the Wild
Authors:
Said Jawad Saidi, Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, Daniel J. Dubois, David Choffnes, Georgios Smaragdakis, Anja Feldmann
Abstract:
Consumer Internet of Things (IoT) devices are extremely popular, providing users with rich and diverse functionalities, from voice assistants to home appliances. These functionalities often come with significant privacy and security risks, with notable recent large-scale coordinated global attacks disrupting large service providers. Thus, an important first step to address these risks is to know what IoT devices are where in a network. While some limited solutions exist, a key question is whether device discovery can be done by Internet service providers that only see sampled flow statistics. In particular, it is challenging for an ISP to efficiently and effectively track and trace activity from IoT devices deployed by its millions of subscribers—all with sampled network data.
In this paper, we develop and evaluate a scalable methodology to accurately detect and monitor IoT devices at subscriber lines with limited, highly sampled data in-the-wild. Our findings indicate that millions of IoT devices are detectable and identifiable within hours, both at a major ISP as well as an IXP, using passive, sparsely sampled network flow headers. Our methodology is able to detect devices from more than 77% of the studied IoT manufacturers, including popular devices such as smart speakers. While our methodology is effective for providing network analytics, it also highlights significant privacy consequences.
Publication:
Said Jawad Saidi, Anna Maria Mandalari, Roman Kolcun, Hamed Haddadi, Daniel J. Dubois, David Choffnes, Georgios Smaragdakis, Anja Feldmann: A Haystack Full of Needles: Scalable Detection of IoT Devices in the Wild. IMC ’20: Proceedings of the ACM Internet Measurement Conference, October 2020, Pages 87–100
https://doi.org/10.1145/3419394.3423650
Data:
Signatures for IoT devices (supported by European Research Council (ERC) Starting Grant ResolutioNet (ERCStG-679158))
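The detection approach described in the abstract can be pictured as matching the destination endpoints observed in sparsely sampled flows against per-manufacturer signatures. The sketch below is a simplified illustration under that assumption; the vendor names, endpoint IPs, and matching threshold are invented for the example and are not the published signatures.

```python
# Hedged sketch: detect IoT devices at a subscriber line by matching sampled flow
# destinations against per-manufacturer signature sets (backend endpoints the
# devices contact). Signatures and threshold below are illustrative only.
IOT_SIGNATURES = {
    "smart_speaker_vendor_X": {"203.0.113.10", "203.0.113.11"},  # example IPs (TEST-NET)
    "smart_plug_vendor_Y": {"198.51.100.7"},
}

def detect_devices(flow_dst_ips, min_matches=2):
    """Return manufacturers whose signature endpoints were contacted from this line."""
    observed = set(flow_dst_ips)
    detected = {}
    for vendor, endpoints in IOT_SIGNATURES.items():
        hits = observed & endpoints
        if hits and len(hits) >= min(min_matches, len(endpoints)):
            detected[vendor] = hits
    return detected

# Usage with a few sampled destination IPs seen at one subscriber line:
print(detect_devices(["203.0.113.10", "203.0.113.11", "192.0.2.1"]))
# -> {'smart_speaker_vendor_X': {'203.0.113.10', '203.0.113.11'}}
```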
> Identifying Sensitive URLs at Web-Scale
Authors:
Srdjan Matic, Costas Iordanou, Georgios Smaragdakis, Nikolaos Laoutaris
Abstract:
Several data protection laws include special provisions for protecting personal data relating to religion, health, sexual orientation, and other sensitive categories. Having a well-defined list of sensitive categories is sufficient for filing complaints manually, conducting investigations, and prosecuting cases in courts of law. Data protection laws, however, do not define explicitly what type of content falls under each sensitive category. Therefore, it is unclear how to implement proactive measures such as informing users, blocking trackers, and filing complaints automatically when users visit sensitive domains. To empower such use cases we turn to the Curlie.org crowdsourced taxonomy project for drawing training data to build a text classifier for sensitive URLs. We demonstrate that our classifier can identify sensitive URLs with accuracy above 88%, and even recognize specific sensitive categories with accuracy above 90%. We then use our classifier to search for sensitive URLs in a corpus of 1 billion URLs collected by the Common Crawl project. We identify more than 155 million sensitive URLs in more than 4 million domains. Despite their sensitive nature, more than 30% of these URLs belong to domains that fail to use HTTPS. Also, in sensitive web pages with third-party cookies, 87% of the third-parties set at least one persistent cookie.
Publication:
Srdjan Matic, Costas Iordanou, Georgios Smaragdakis, Nikolaos Laoutaris: Identifying Sensitive URLs at Web-Scale. IMC ’20: Proceedings of the ACM Internet Measurement Conference, October 2020, Pages 619–633
https://doi.org/10.1145/3419394.3423653
Data:
https://bitbucket.org/srdjanmatic/sensitive_web/src/master/ (supported by European Research Council (ERC) Starting Grant ResolutioNet (ERCStG-679158))
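To make the classification step concrete, the sketch below trains a toy text classifier for sensitive pages, assuming labels derived from a crowdsourced taxonomy such as Curlie.org. It pairs TF-IDF features with logistic regression via scikit-learn; this model choice, the four training snippets, and their labels are illustrative assumptions, not the authors' model or training data.

```python
# Hedged sketch: a minimal sensitive/non-sensitive page classifier. The toy
# training data and the TF-IDF + logistic regression choice are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Page texts with labels derived from a taxonomy: 1 = sensitive, 0 = non-sensitive
train_texts = [
    "support group for patients with a chronic health condition",
    "online store for garden furniture and outdoor tools",
    "community forum about religious practice and faith",
    "latest football scores and league tables",
]
train_labels = [1, 0, 1, 0]

classifier = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=1),  # word and bigram features
    LogisticRegression(max_iter=1000),
)
classifier.fit(train_texts, train_labels)

# Classify text extracted from a crawled page (e.g. from the Common Crawl corpus):
print(classifier.predict(["counselling services for mental health problems"]))  # 1 = sensitive
```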
> Who’s left behind?: Measuring Adoption of Application Updates at Scale
Authors:
John P. Rula, Philipp Richter, Georgios Smaragdakis, Arthur Berger
Abstract:
This work presents a large-scale, longitudinal measurement study on the adoption of application updates, enabling continuous reporting of potentially vulnerable software populations worldwide. Studying the factors impacting software currentness, we investigate and discuss the impact of the platform and its updating strategies on software currentness, device lock-in effects, as well as user behavior. Utilizing HTTP User-Agent strings from end-hosts, we introduce techniques to extract application and operating system information from myriad structures, infer version release dates of applications, and measure population adoption, at a global scale. To deal with loosely structured User-Agent data, we develop a semi-supervised method that can reliably extract application and version information for some 87% of requests served by a major CDN every day. Using this methodology, we track release and adoption dynamics of some 35,000 applications. Analyzing over three years of CDN logs, we show that vendors’ update strategies and platforms have a significant effect on the adoption of application updates. Our results show that, on some platforms, up to 25% of requests originate from hosts running application versions that are out-of-date by more than 100 days, and 16% more than 300 days. We find pronounced differences across geographical regions, and overall, less developed regions are more likely to have out-of-date software versions. Though, for every country, we find that at least 10% of requests reaching the CDN run software that is out-of-date by more than three months.
Publication:
John P. Rula, Philipp Richter, Georgios Smaragdakis, Arthur Berger: Who’s left behind?: Measuring Adoption of Application Updates at Scale. IMC ’20: Proceedings of the ACM Internet Measurement Conference, October 2020, Pages 710–723
https://doi.org/10.1145/3419394.3423656
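The core measurement described in the abstract starts from parsing application names and versions out of HTTP User-Agent strings and comparing them against known release dates. The sketch below illustrates that step for a single, well-structured User-Agent; the regex, the release-date table, the ExampleApp name, and the staleness definition are simplifying assumptions, whereas the paper applies a semi-supervised method to loosely structured strings at CDN scale.

```python
# Hedged sketch: extract app name/version from a User-Agent string and estimate
# how long a newer release has been available. All names, dates, and the regex
# are illustrative assumptions.
import re
from datetime import date

# Hypothetical release dates, e.g. inferred from each version's first appearance in logs
RELEASE_DATES = {
    ("ExampleApp", "5.2.0"): date(2020, 1, 15),
    ("ExampleApp", "5.4.1"): date(2020, 6, 3),  # latest known release
}
LATEST_VERSION = {"ExampleApp": "5.4.1"}

UA_PATTERN = re.compile(r"^(?P<app>[A-Za-z]+)/(?P<version>\d+(?:\.\d+)*)")

def days_out_of_date(user_agent, today=date(2020, 10, 1)):
    """Days since the latest known release, if the client still runs an older version."""
    match = UA_PATTERN.match(user_agent)
    if match is None:
        return None  # loosely structured User-Agent; out of scope for this sketch
    app, version = match.group("app"), match.group("version")
    latest = LATEST_VERSION.get(app)
    if latest is None or version == latest:
        return 0
    return (today - RELEASE_DATES[(app, latest)]).days

print(days_out_of_date("ExampleApp/5.2.0 (Linux; x86_64)"))  # -> 120 days out of date
```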