BIFOLD Colloquium 2022/06/27
The Data Systems Grammar
Speaker: Stratos Idreos, Associate Professor of Computer Science at Harvard University
Date & Time: June, 27, 2022; 4:00 pm
Registration: If you are interested in participating, please contact: email@example.com
Data structures are everywhere. They define the behavior of modern data systems and data-driven algorithms. For example, with data systems that utilize the correct data structure design for the problem at hand, we can reduce the monthly bill of large-scale data systems applications on the cloud by hundreds of thousands of dollars. We can accelerate data science tasks by being able to dramatically speed up the computation of statistics over large amounts of data. We can train drastically more neural networks within a given time budget, improving accuracy.
However, knowing the right data structure and data system design for any given scenario is a notoriously hard problem; there is a massive space of possible designs while there is no single design that is perfect across all data, queries, and hardware scenarios. We will discuss our quest for the first principles of data structures and data system design. We will show signs that it is possible to reason about this massive design space, and we will show early results from a prototype self-designing data system which can take drastically different shapes to optimize for the workload, hardware, and available cloud budget using machine learning and what we call machine knowing. These shapes include data structure and system designs which are discovered automatically and do not exist in the literature or industry.
Stratos Idreos is an associate professor of Computer Science at Harvard University where he leads the Data Systems Laboratory. His research focuses on making it easy and even automatic to design workload and hardware conscious data structures and data systems with applications on relational, NoSQL, and data science problems. For his PhD thesis on adaptive indexing, Stratos was awarded the 2011 ACM SIGMOD Jim Gray Doctoral Dissertation award and the 2011 ERCIM Cor Baayen award from the European Research Council on Informatics and Mathematics. In 2015 he was awarded the IEEE TCDE Rising Star Award from the IEEE Technical Committee on Data Engineering for his work on adaptive data systems and in 2020 he received the ACM SIGMOD Contributions award for his work on reproducible research. He is also a recipient of the National Science Foundation Career award and the Department of Energy Early Career award. Stratos was PC Chair of ACM SIGMOD 2021 and IEEE ICDE 2022, and he is the founding editor of the ACM/IMS Journal of Data Science and the chair of the ACM SoCC Steering Committee.