As you know, data has become an enormously important
production factor. Together with intelligent algorithms, they form the
cornerstone of Artificial Intelligence. It is only through the combination of
Big Data and Machine Learning that the great successes of AI have become
possible, which we have seen in recent years and will continue to see in the
future.
In Berlin, too, with the two competence centers BBDC
and BZML, we have already achieved internationally highly regarded successes,
from basic research and open source software development to very successful
spin-offs.
With BIFOLD, Berlin now has a technological research
beacon around which an entire ecosystem of spin-offs and application-oriented
research labs can develop. This will enable us to attract top international talent
to Berlin and make AI a relevant economic factor for Berlin.
The special thing about BIFOLD is that we are avoiding
the mistake that unfortunately is commonly made in science, namely, to consider
partial aspects of AI in isolation. For example, the best algorithms will not
help us, if we do not simultaneously research and develop the underlying
technologies and systems in which real data is efficiently provided and
processed jointly with analysis algorithms.
With respect to data – i.e., my research area in
BIFOLD – important challenges lie, for example, in the processing of widely geographically
distributed data, i.e., in some cases globally distributed data, which cannot
always be physically combined on an infrastructure for analysis due to data
protection laws as well as for technical reasons.
Think, for example, of the globally distributed
vehicle data of an automobile manufacturer or patient data that is collected across
hospitals. Thus, we need new data processing architectures, that on the one
hand handle the growing data streams efficiently, and on the other hand
reliably protect the privacy and rights of data producers.
An additional challenge is the exponential growth of
sensor data, the complete capture of which would quickly exceed the capacities
of our global cloud infrastructures and is neither necessary nor sensible.
We are therefore developing new approaches to
preprocess data at the source, at the so-called edge, in such a way that
we only transfer and store data that is relevant for a particular analysis.
This is not only economically more efficient, but also ecologically more
sensible and less questionable in terms of data protection.
The systems that we develop should make ideal use of
the growing variety of memory and chip technologies and at the same time be so
easy to operate, i.e., function in a largely automated manner, that would not
require users to hold a five-year computer science degree, in order to work
with them.
Because computer scientists, as you all know, are
currently a painful bottleneck in the job market.
You see, especially for the commercialization and
economic success of basic research, it is extremely important to look at the
entire stack of hardware, software, data, algorithms, and the broad ecosystem
of applications holistically, and preferably together in a research institute
of critical size.
And that’s why I am particularly pleased as a database
researcher and thank the German Federal Ministry of Education and Research (BMBF)
and the State of Berlin, that with BIFOLD we are now creating the conditions to
be able to do exactly this in Berlin.