New Type of Algorithm for Brain Research

Together with an international team of researchers from Mayo Clinic, BIFOLD Co-Director Prof. Dr. Klaus-Robert Müller developed a new type of algorithm to explore which regions of the brain interact with each other. Their results could improve brain stimulation devices used to treat disease. For millions of people with epilepsy and movement disorders such as Parkinson’s disease, electrical stimulation of the brain is already widening treatment possibilities. In the future, electrical stimulation may also help people with psychiatric illness and direct brain injuries, such as stroke.

The new type of algorithm may help to understand which brain regions directly interact with one another.
(Copyright: Unsplash)

Studying how brain networks interact with each other is complicated. Brain networks can be explored by delivering brief pulses of electrical current to one area of a patient’s brain while measuring voltage responses in other areas. In principle, one should be able to infer the structure of brain networks from these data. With real-world data, however, this is difficult because the recorded signals are complex and only a limited number of measurements can be made.

To make the problem manageable, the researchers developed a set of paradigms, or viewpoints, that simplify comparisons between effects of electrical stimulation on the brain. Because a mathematical technique to characterize how assemblies of inputs converge in human brain regions did not exist in the scientific literature, the Mayo team collaborated with Klaus-Robert Müller to develop a new type of algorithm called “basis profile curve identification.”
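
The following sketch illustrates the general recipe behind this idea on synthetic data: group the stimulation sites whose averaged responses at one measurement site share a temporal shape, then summarize each group by a single curve and quantify how strongly every site projects onto it. This is only a simplified illustration with made-up data and a generic clustering step, not the authors' published algorithm, which is available in their downloadable code package (see below).

```python
"""Simplified sketch of the basis-profile-curve idea on synthetic data
(not the published algorithm; the data and clustering step are assumptions)."""
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
n_sites, n_times, n_motifs = 40, 300, 3

# Synthetic stand-in for averaged evoked responses at ONE measurement electrode:
# one row per stimulation site, one column per time sample.
t = np.linspace(0, 1, n_times)
motifs = np.stack([np.sin(2 * np.pi * (k + 1) * t) * np.exp(-4 * t)
                   for k in range(n_motifs)])            # latent temporal shapes
labels_true = rng.integers(0, n_motifs, size=n_sites)    # which motif each site evokes
V = motifs[labels_true] * rng.uniform(0.5, 2.0, (n_sites, 1))  # scaled responses
V += 0.05 * rng.normal(size=V.shape)                     # measurement noise

# Group stimulation sites by the shape of their response.
shapes = V / np.linalg.norm(V, axis=1, keepdims=True)    # unit-norm temporal shapes
groups = KMeans(n_clusters=n_motifs, n_init=10, random_state=0).fit_predict(shapes)

# One characteristic curve per group, plus projection weights per stimulation site.
bpcs = np.stack([shapes[groups == g].mean(axis=0) for g in range(n_motifs)])
bpcs /= np.linalg.norm(bpcs, axis=1, keepdims=True)
weights = V @ bpcs.T   # projection strength of every stimulation site onto every curve

print("recovered", bpcs.shape[0], "curves; weight matrix shape:", weights.shape)
```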

In a study published in PLOS Computational Biology, a patient with a brain tumor underwent placement of an electrocorticographic electrode array to locate seizures and map brain function before a tumor was removed. Every electrode interaction resulted in hundreds to thousands of time points to be studied using the new algorithm.

“Our findings show that this new type of algorithm may help us understand which brain regions directly interact with one another, which in turn may help guide placement of electrodes for stimulating devices to treat network brain diseases,” says Kai Miller, M.D., Ph.D., a Mayo Clinic neurosurgeon and first author of the study. “As new technology emerges, this type of algorithm may help us to better treat patients with epilepsy, movement disorders like Parkinson’s disease, and psychiatric illnesses like obsessive compulsive disorder and depression.”

BIFOLD Co-Director Prof. Dr. Klaus-Robert Müller
(Copyright: TU Berlin/Felix Noak)

“Neurologic data to date is perhaps the most challenging and exciting data to model for AI researchers,” says Klaus-Robert Müller, study co-author, Professor of Machine Learning at Technische Universität Berlin and member of the Google Research Brain Team.

In the study, the authors provide a downloadable code package so others may explore the technique. “Sharing the developed code is a core part of our efforts to help reproducibility of research,” says Dora Hermes, Ph.D., a Mayo Clinic biomedical engineer and senior author.

This research was supported by the National Institutes of Health’s National Center for Advancing Translational Science Clinical and Translational Science Award, the National Institute of Mental Health Collaborative Research in Computational Neuroscience program, and the German Federal Ministry of Education and Research.

The publication in detail:
Basis profile curve identification to understand electrical stimulation effects in human brain networks

Authors:
Kai J. Miller, Klaus-Robert Müller, Dora Hermes

Abstract:
Brain networks can be explored by delivering brief pulses of electrical current in one area while measuring voltage responses in other areas. We propose a convergent paradigm to study brain dynamics, focusing on a single brain site to observe the average effect of stimulating each of many other brain sites. Viewed in this manner, visually-apparent motifs in the temporal response shape emerge from adjacent stimulation sites. This work constructs and illustrates a data-driven approach to determine characteristic spatiotemporal structure in these response shapes, summarized by a set of unique “basis profile curves” (BPCs). Each BPC may be mapped back to underlying anatomy in a natural way, quantifying projection strength from each stimulation site using simple metrics. Our technique is demonstrated for an array of implanted brain surface electrodes in a human patient. This framework enables straightforward interpretation of single-pulse brain stimulation data, and can be applied generically to explore the diverse milieu of interactions that comprise the connectome.

Publication:
 Miller KJ, Müller K-R, Hermes D (2021) Basis profile curve identification to understand electrical stimulation effects in human brain networks. PLoS Comput Biol 17(9): e1008710.
https://doi.org/10.1371/journal.pcbi.1008710

This article was first published on: newsnetwork.mayoclinic.org

COVID-19: A Stress Test for the Internet

25 percent increase in Internet traffic within only a few days

On 11 March 2020, the day the WHO declared the coronavirus outbreak a global pandemic, the impact of SARS-CoV-2 also spread to the World Wide Web. Following this announcement, governments around the world began enacting stay-at-home orders and other regulations for working from home and homeschooling. Within a single week, Internet traffic volume increased by 25 percent – an increase which under normal circumstances is observed over the course of an entire year. Taking into account the increased use during the second lockdown in fall 2020, overall use of Internet services in 2020 increased by between 35 and 50 percent, depending on the network. An international, interdisciplinary group of researchers led by Professor Dr. Georgios Smaragdakis, professor of Internet measurement and analysis at TU Berlin and Fellow of the Berlin Institute for the Foundations of Learning and Data (BIFOLD), has published these figures and other findings in a paper in Communications of the Association for Computing Machinery (ACM). The leading professional association recently named the paper a research highlight.

Despite the worldwide restrictions necessitated by COVID-19, life continued with the Internet playing an important role.

From essentially one day to the next, almost nothing was possible without a stable Internet connection. Since March of last year, team meetings, school lessons, and even private celebrations have primarily been held via digital screens. Those without a broadband connection or sufficient electronic devices have missed out. Despite the worldwide restrictions necessitated by COVID-19, life continued, with the Internet playing an important role for companies, the education sector, entertainment, retail, and social interactions. “In the spring of 2020, no one could say with certainty whether the Internet would be able to withstand this surge in demand,” explains Georgios Smaragdakis. “No one had previously expected a sudden surge in Internet traffic of such proportions.” In their project, financed in part by BIFOLD, the researchers investigated Internet data streams from different Internet providers across Europe. “Together they provide us with a good understanding of the impacts that the COVID-19 waves and lockdown measures had on Internet traffic,” continues Georgios Smaragdakis.

Within a year of the implementation of the first lockdown measures, the aggregate volume of Internet data traffic increased by approximately 40 percent, significantly more than the expected annual growth. At the same time, mobile data traffic first slightly decreased and then only grew moderately, as people were out and about less, thus using less mobile data. “Our calculations show that the use of services such as video conferencing and VPNs increased by up to 300 percent. Gaming applications also significantly increased. After moderate growth during the spring lockdown, use increased by about 300 percent during the fall lockdown. And while these applications were primarily used in the evening or on the weekend pre-pandemic, gaming usage increases were evenly distributed across each day of the week during the second lockdown, mainly in the mornings,” remarks Georgios Smaragdakis.

Overall traffic patterns in Internet usage have clearly changed: While peak times before the pandemic were on the weekend and in the evening, the sudden growth in Internet usage primarily occurred on weekdays during working hours. This asynchronous growth is precisely one reason why researchers believe the Internet was able to handle the increased traffic relatively well. Smaragdakis believes the good structure and overprovisioning of the network operators also helped.

“In terms of digitalization, the last months have been a tremendous success,” says Smaragdakis. “In just a matter of weeks, German universities and government authorities adopted developments that they had previously failed to implement in years. These days, a broadband connection is not just something that is nice to have, but rather an essential requirement to be able to work. This level of digitalization is the new normal. It will not be possible to return to previous practices.”

The researchers’ study also shows that overprovisioning, proactive network management, and automation were key to providing resilient networks that could cope with drastic and unexpected fluctuations in demand like those experienced during the COVID-19 pandemic. “Many, but not all, network providers succeeded in doing this. With the pandemic set to continue for some time, it is important that we continue to examine data traffic to understand how usage changes during these unprecedented times,” he concludes.

The publication in detail:

Authors:
Anja Feldmann, Oliver Gasser, Franziska Lichtblau, Enric Pujol, Ingmar Poese, Christoph Dietzel, Daniel Wagner, Matthias Wichtlhuber, Juan Tapiador, Narseo Vallina-Rodriguez, Oliver Hohlfeld, Georgios Smaragdakis
Abstract:
In March 2020, the World Health Organization declared the Coronavirus Disease 2019 (COVID-19) outbreak a global pandemic. As a result, billions of people were either encouraged or forced by their governments to stay home to reduce the spread of the virus. This caused many to turn to the Internet for work, education, social interaction, and entertainment. With Internet demand rising at an unprecedented rate, the question of whether the Internet could sustain this additional load emerged. To answer this question, this paper reviews the impact of the first year of the COVID-19 pandemic on Internet traffic in order to analyze the network’s performance. In order to keep our study broad, we collect and analyze Internet traffic data from multiple locations at the core and edge of the Internet. From this, we characterize how traffic and application demands change, to describe the “new normal,” and explain how the Internet reacted during these unprecedented times.
Publication:
Communications of the ACM, July 2021, Vol. 64 No. 7, Pages 101-108
https://cacm.acm.org/magazines/2021/7/253468-a-year-in-lockdown/fulltext

For further information please contact:

Prof. Dr. Georgios Smaragdakis
TU Berlin
Tel.: 030 314-75169
Email: georgios@inet.tu-berlin.de

ACM SIGMOD 2021: 7 Data Management Papers Accepted

The upcoming 2021 ACM International Conference on the Management of Data (SIGMOD) – a top-ranked international conference on database systems and information management – accepted seven papers submitted by BIFOLD researchers. Large amounts of high-quality data are the backbone of modern machine learning applications in research, industry, and sectors such as medicine and mobility. To enable the next generation of Artificial Intelligence applications, an increasing number of different data sources needs to be accessed and analyzed in a shorter period of time, while reducing computation costs, maintaining fault tolerance, and achieving high data quality. The group of BIFOLD researchers, led by BIFOLD Co-Director Prof. Dr. Volker Markl, tackled some of these data management challenges and developed innovative solutions.

Prof. Dr. Volker Markl, BIFOLD Co-Director
(© TU Berlin/PR/Simon)

Six full research papers and one industrial paper by BIFOLD Researchers on data management topics were accepted at SIGMOD 2021. “The acceptance of such a high number of papers from one German research group at SIGMOD is exceptional. I am very proud of this BIFOLD success and the international recognition of our research efforts,” states Volker Markl. Two of the publications were the result of international research collaborations. One of these papers is due to joint work with colleagues at East China Normal University in Shanghai.

The authors propose HyMAC, a system that enables iterative Machine Learning algorithms to run more efficiently on distributed dataflow systems. The approach has the potential to speed up the process of Machine Learning with data from billions of datapoints by reducing the communication cost in dataflow systems, such as Apache Flink.

The other international collaboration resulted in a publication based on work conducted in the ExDRa (Exploratory Data Science on Raw Data) project, jointly with BIFOLD researchers, Siemens AG, TU Graz, and Know-Center GmbH. This paper was accepted in the SIGMOD Industrial Track. The ExDRa system is designed to support the exploratory data science process over heterogeneous and distributed data. Typically, industrial data scientists propose and evaluate hypotheses, integrate the necessary data, and build and execute models, in order to identify interesting patterns. To aid in this process, ExDRa investigates how to design and build a system that can offer support and help optimize the analysis of problems arising in several Siemens use-cases (e.g., chemical, pharmaceutical, water, oil, gas). For this, the project leverages the NebulaStream data management system for the IoT, which is currently under development in the BIFOLD IoT Lab.

The publications in detail:

“Enforcing Constraints for Machine Learning Systems via Declarative Feature Selection: An Experimental Study”

Authors:
Felix Neutatz, Felix Biessmann, Ziawasch Abedjan
Abstract:
Responsible usage of Machine Learning (ML) systems in practice does not only require enforcing high prediction quality, but also accounting for other constraints, such as fairness, privacy, or execution time. One way to address multiple user-specified constraints on ML systems is feature selection. Yet, optimizing feature selection strategies for multiple metrics is difficult to implement and has been underrepresented in previous experimental studies. Here, we propose Declarative Feature Selection (DFS) to simplify the design and validation of ML systems satisfying diverse user-specified constraints. We benchmark and evaluate a representative series of feature selection algorithms. From our extensive experimental results, we derive concrete suggestions on when to use which strategy and show that a meta-learning-driven optimizer can accurately predict the right strategy for an ML task at hand. These results demonstrate that feature selection can help to build ML systems that meet combinations of user-specified constraints, independent of the ML methods used.

Preprint [PDF]
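
As a rough illustration of what "feature selection under user-specified constraints" can look like, the sketch below tries a few standard selection strategies and keeps only those whose resulting model satisfies two made-up constraints (a feature budget and a minimum accuracy). The strategies, constraint values, and data are assumptions for illustration; DFS itself benchmarks a much larger space of strategies and adds a meta-learning-driven optimizer.

```python
"""Hypothetical sketch of constraint-aware feature selection (not the DFS system)."""
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif, mutual_info_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=30, n_informative=5, random_state=0)

MAX_FEATURES = 10    # user constraint: the model may use at most 10 features
MIN_ACCURACY = 0.80  # user constraint: cross-validated accuracy must stay above 0.80

# a few candidate selection strategies (made up for this example)
candidates = {
    "anova_k10": SelectKBest(f_classif, k=10),
    "anova_k5": SelectKBest(f_classif, k=5),
    "mutual_info_k10": SelectKBest(mutual_info_classif, k=10),
}

best = None
for name, selector in candidates.items():
    X_sel = selector.fit_transform(X, y)
    acc = cross_val_score(LogisticRegression(max_iter=1000), X_sel, y, cv=3).mean()
    # keep only strategies that satisfy both user constraints, then pick the best one
    if X_sel.shape[1] <= MAX_FEATURES and acc >= MIN_ACCURACY:
        if best is None or acc > best[1]:
            best = (name, acc)

print("strategy satisfying the constraints:", best)
```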

“The Power of Nested Parallelism in Big Data Processing – Hitting Three Flies with One Slap –”

Authors:
Gábor E. Gévay, Jorge-Arnulfo Quiané-Ruiz, Volker Markl
Abstract:
Many common data analysis tasks, such as performing hyperparameter optimization, processing a partitioned graph, and treating a matrix as a vector of vectors, offer natural opportunities for nested-parallel operations, i.e., launching parallel operations from inside other parallel operations. However, state-of-the-art dataflow engines, such as Spark and Flink, do not support nested parallelism. Users must implement workarounds, causing orders of magnitude slowdowns for their tasks, let alone the implementation effort. We present Matryoshka, a system that enables dataflow engines to support nested parallelism, even in the presence of control flow statements at inner nesting levels. Matryoshka achieves this via a novel two-phase flattening process, which translates nested-parallel programs to flat-parallel programs that can efficiently run on existing dataflow engines. The first phase introduces novel nesting primitives into the code, which allows for dynamic optimizations based on intermediate data characteristics in the second phase at run time. We validate our system using several common data analysis tasks, such as PageRank and K-means. The results show the superiority of Matryoshka over the state-of-the-art approaches (the DIQL system as well as the outer- and inner-parallel workarounds) to support nested parallelism in dataflow engines.

Preprint [PDF]
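
The nested-parallel pattern the paper targets can be pictured with a toy example in plain Python: an outer parallel loop over hyperparameter candidates whose body itself launches a parallel pass over data partitions. This is not Matryoshka code; it only shows the program shape that dataflow engines such as Spark or Flink cannot express directly, and which Matryoshka's flattening translates into flat-parallel dataflows.

```python
"""Toy illustration of nested parallelism (plain Python, not a dataflow engine)."""
from concurrent.futures import ThreadPoolExecutor

DATA_PARTITIONS = [list(range(i, i + 5)) for i in range(0, 20, 5)]
CANDIDATES = [0.1, 0.5, 1.0]  # e.g. hyperparameter values to evaluate

def score_partition(partition, lr):
    # inner work: evaluate one candidate on one data partition
    return sum(x * lr for x in partition)

def evaluate_candidate(lr):
    # nested parallelism: the body of an outer parallel task launches
    # another parallel map over the data partitions
    with ThreadPoolExecutor() as inner:
        partials = inner.map(lambda p: score_partition(p, lr), DATA_PARTITIONS)
        return lr, sum(partials)

# outer parallel loop over candidates; existing dataflow engines cannot nest jobs this way
with ThreadPoolExecutor() as outer:
    print(list(outer.map(evaluate_candidate, CANDIDATES)))
```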

“Expand your Training Limits! Generating and Labeling Jobs for ML-based Data Management”

Authors:
Francesco Ventura, Zoi Kaoudi, Jorge Arnulfo Quiane Ruiz, Volker Markl
Abstract:
Machine Learning (ML) is quickly becoming a prominent method in many data management components, including query optimizers which have recently shown very promising results. However, the low availability of training data (i.e., large query workloads with execution time or output cardinality as labels) widely limits further advancement in research and compromises the technology transfer from research to industry. Collecting a labeled query workload has a very high cost in terms of time and money due to the development and execution of thousands of realistic queries/jobs. In this work, we face the problem of generating training data for data management components tailored to users’ needs. We present DataFarm, an innovative framework for efficiently generating and labeling large query workloads. We follow a data-driven white box approach to learn from pre-existing small workload patterns, input data, and computational resources. Our framework allows users to produce a large heterogeneous set of realistic jobs with their labels, which can be used by any ML-based data management component. We show that our framework outperforms the current state-of-the-art both in query generation and label estimation using synthetic and real datasets. It has up to 9× better labeling performance, in terms of R2 score. More importantly, it allows users to reduce the cost of getting labeled query workloads by 54× (and up to an estimated factor of 10⁴×) compared to standard approaches.

Preprint [PDF]
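
A minimal sketch of the label-estimation idea: execute and measure only a small seed workload, fit a model on it, and use that model to label a much larger set of generated jobs. The job features, runtimes, and regressor below are invented for illustration; DataFarm's actual generator, features, and models are described in the paper.

```python
"""Hypothetical sketch of ML-based workload labeling (not the DataFarm system)."""
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def job_features(n):
    # toy job descriptors: (#operators, input size in GB, #joins)
    return np.column_stack([rng.integers(2, 20, n),
                            rng.uniform(0.1, 100.0, n),
                            rng.integers(0, 5, n)])

# small "seed" workload that was actually executed and measured
X_seed = job_features(50)
y_seed = 2.0 * X_seed[:, 1] + 10.0 * X_seed[:, 2] + rng.normal(0, 5, 50)  # fake runtimes

# many generated jobs that would be too expensive to execute and measure
X_generated = job_features(5000)

# learn from the seed workload, then label the generated jobs cheaply
model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_seed, y_seed)
estimated_runtimes = model.predict(X_generated)
print("labeled", len(estimated_runtimes), "generated jobs without executing them")
```
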
“ExDRa: Exploratory Data Science on Federated Raw Data”

Authors:
Sebastian Baunsgaard, Matthias Boehm, Ankit Chaudhary, Behrouz Derakhshan, Stefan Geißelsöder, Philipp Marian Grulich, Michael Hildebrand, Kevin Innerebner, Volker Markl, Claus Neubauer, Sarah Osterburg, Olga Ovcharenko, Sergey Redyuk, Tobias Rieger, Alireza Rezaei Mahdiraji, Sebastian Benjamin Wrede, Steffen Zeuch
Abstract:
Data science workflows are largely exploratory, dealing with under-specified objectives, open-ended problems, and unknown business value. Therefore, little investment is made in systematic acquisition, integration, and pre-processing of data. This lack of infrastructure results in redundant manual effort and computation. Furthermore, central data consolidation is not always technically or economically desirable or even feasible (e.g., due to privacy, and/or data ownership). The ExDRa system aims to provide system infrastructure for this exploratory data science process on federated and heterogeneous, raw data sources. Technical focus areas include (1) ad-hoc and federated data integration on raw data, (2) data organization and reuse of intermediates, and (3) optimization of the data science lifecycle, under awareness of partially accessible data. In this paper, we describe use cases, the system architecture, selected features of SystemDS’ new federated backend, and promising results. Beyond existing work on federated learning, ExDRa focuses on enterprise federated ML and related data pre-processing challenges because, in this context, federated ML has the potential to create a more fine-grained spectrum of data ownership and thus, new markets.

Preprint [PDF]

“Compliant Geo-distributed Query Processing”

Authors:
Kaustubh Beedkar, Jorge Quiane-Ruiz and Volker Markl
Abstract:
In this paper, we address the problem of compliant geo-distributed query processing. In particular, we focus on dataflow policies that impose restrictions on the movement of data across geographical or institutional borders. Traditional approaches to distributed query processing do not consider such restrictions and, in geo-distributed environments, may therefore lead to non-compliant query execution plans. For example, an execution plan for a query over data sources from Europe, North America, and Asia, which may otherwise be optimal, may not comply with dataflow policies as a result of shipping some restricted (intermediate) data. We pose this problem of compliance in the setting of geo-distributed query processing. We propose a compliance-based query optimizer that takes into account dataflow policies, which are declaratively specified using our policy expressions, to generate compliant geo-distributed execution plans. Our experimental study using a geo-distributed adaptation of the TPC-H benchmark data indicates that our optimization techniques are effective in generating efficient compliant plans and incur low overhead on top of traditional query optimizers.

Preprint [PDF]
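
To make the compliance problem concrete, here is a hypothetical sketch: dataflow policies are written as a simple mapping from source regions to permitted destination regions, and a candidate plan's shipments are checked against them. The rule format and plan representation are made up; the paper defines proper declarative policy expressions and integrates compliance into the query optimizer itself.

```python
"""Hypothetical compliance check for a geo-distributed plan (illustration only)."""
# allowed data movements: source region -> set of permitted destination regions
POLICIES = {
    "EU":   {"EU"},                 # EU data may not leave the EU
    "NA":   {"NA", "EU"},           # NA data may ship to the EU
    "ASIA": {"ASIA", "NA", "EU"},   # unrestricted in this toy example
}

# a candidate execution plan as (relation, from_region, to_region) shipments
plan = [
    ("orders",   "NA",   "EU"),
    ("customer", "EU",   "EU"),
    ("lineitem", "ASIA", "NA"),
]

def compliant(plan, policies):
    # a plan is compliant if every shipment's destination is permitted by its source's policy
    violations = [step for step in plan
                  if step[2] not in policies.get(step[1], set())]
    return not violations, violations

ok, bad = compliant(plan, POLICIES)
print("compliant plan" if ok else f"non-compliant shipments: {bad}")
```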

“Hybrid Evaluation for Distributed Iterative Matrix Computation”

Authors:
Zihao Chen, Chen Xu, Juan Soto, Volker Markl, Weining Qian, Aoying Zhou
Abstract:
Distributed matrix computation is common in large-scale data processing and machine learning applications. Many iterative-convergent algorithms involving matrix computation share a common property: parameters converge non-uniformly. This property can be exploited to eliminate computational redundancy via incremental evaluation. Existing systems that support distributed matrix computation already explore incremental evaluation. However, they are oblivious to the fact that non-zero increments are scattered in different blocks in a distributed environment. Additionally, we observe that incremental evaluation does not always outperform full evaluation. To address these issues, we propose matrix reorganization to optimize the physical layout on top of state-of-the-art partitioning schemes, and thereby accelerate the incremental evaluation. More importantly, we propose a hybrid evaluation to efficiently interleave full and incremental evaluation during the iterative process. In particular, it employs a cost model to compare the overhead costs of the two types of evaluation and a selective comparison mechanism to reduce the overhead incurred by comparison itself. To demonstrate the efficiency of our techniques, we implement HyMAC, a hybrid matrix computation system based on SystemML. Our experiments show that HyMAC reduces execution time on large datasets by 23% on average in comparison to the state-of-the-art optimization technique and consequently outperforms SystemML, ScaLAPACK, and SciDB by an order of magnitude.

Preprint [PDF]
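
The hybrid-evaluation decision can be pictured with a toy cost model: per iteration, incremental evaluation pays a per-block overhead plus the cost of detecting which blocks changed, so it only wins once few enough parameter blocks are still changing. The costs and block counts below are made up for illustration and are not HyMAC's actual cost model.

```python
"""Toy per-iteration choice between full and incremental evaluation (illustration only)."""
def choose_evaluation(n_blocks, changed_blocks,
                      full_cost_per_block=1.0,
                      incr_cost_per_block=1.4,   # incremental work carries per-block overhead
                      compare_cost=0.05):        # cost of checking which blocks changed
    full_cost = n_blocks * full_cost_per_block
    incr_cost = changed_blocks * incr_cost_per_block + n_blocks * compare_cost
    return "incremental" if incr_cost < full_cost else "full"

# early iterations: almost every block changes, so full evaluation wins
print(choose_evaluation(n_blocks=1000, changed_blocks=950))   # -> full
# near convergence: only a few blocks still change, so incremental evaluation wins
print(choose_evaluation(n_blocks=1000, changed_blocks=40))    # -> incremental
```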

“Parallelizing Intra-Window Join on Multicores: An Experimental Study”

Authors:
Shuhao Zhang, Yancan Mao, Jiong He, Philipp M. Grulich, Steffen Zeuch, Bingsheng He, Richard T. B. Ma, Volker Markl
Abstract:
The intra-window join (IaWJ), i.e., joining two input streams over a single window, is a core operation in modern stream processing applications. This paper presents the first comprehensive study on parallelizing the IaWJ on modern multicore architectures. In particular, we classify IaWJ algorithms into lazy and eager execution approaches. For each approach, there are further design aspects to consider, including different join methods and partitioning schemes, leading to a large design space. Our results show that none of the algorithms always performs the best, and the choice of the most performant algorithm depends on: (i) workload characteristics, (ii) application requirements, and (iii) hardware architectures. Based on the evaluation results, we propose a decision tree that can guide the selection of an appropriate algorithm.

Preprint [PDF]
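
The lazy/eager distinction can be illustrated with a single-threaded toy version of the intra-window join: the lazy variant buffers both streams and joins once the window closes, while the eager variant probes and inserts on every arrival and emits results immediately. The tuple format and window are made up for illustration; the paper studies parallel multicore variants of both approaches.

```python
"""Toy single-threaded lazy vs. eager intra-window join (illustration only)."""
R = [(1, "r1"), (2, "r2"), (1, "r3")]   # (key, payload) tuples of stream R in one window
S = [(1, "s1"), (3, "s2"), (2, "s3")]   # (key, payload) tuples of stream S in one window

def lazy_iawj(r_window, s_window):
    # lazy: buffer everything, join once when the window closes
    s_index = {}
    for key, payload in s_window:
        s_index.setdefault(key, []).append(payload)
    return [(key, r, s) for key, r in r_window for s in s_index.get(key, [])]

def eager_iawj(arrivals):
    # eager: probe-and-insert on every arrival, results are produced immediately
    r_index, s_index, out = {}, {}, []
    for side, (key, payload) in arrivals:
        build, probe = (r_index, s_index) if side == "R" else (s_index, r_index)
        build.setdefault(key, []).append(payload)
        for other in probe.get(key, []):
            out.append((key, payload, other) if side == "R" else (key, other, payload))
    return out

# interleave the two streams to mimic arrival order within the window
interleaved = [("R", t) for t in R[:2]] + [("S", t) for t in S] + [("R", t) for t in R[2:]]
print(sorted(lazy_iawj(R, S)) == sorted(eager_iawj(interleaved)))  # True: same join result
```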