Stefan Grafberger
I am a Ph.D. student at BIFOLD and TU Berlin in the DEEM Lab, conducting research at the intersection of data management and machine learning. I mainly publish at conferences like SIGMOD and VLDB.
My Ph.D. advisors are Sebastian Schelter and Paul Groth. I work on responsible data management (also in collaboration with Julia Stoyanovich). I spent the first three years of my Ph.D. at the University of Amsterdam in the Intelligent Data Engineering Lab, before Sebastian transitioned to TU Berlin. Before my Ph.D., I completed my master's degree at TU Munich with Thomas Neumann and Alfons Kemper, where I focused on databases.
During my studies, I interned at Microsoft GSL, Amazon Research, and Oracle Labs, and worked as a research assistant at TU Munich. I was also an intern and working student at TNG Technology Consulting in Munich and a teaching assistant at the University of Augsburg.
In the past, I worked on deequ, a library for ‘unit-testing’ large datasets with Apache Spark; PGX, an in-memory graph analytics framework; and Umbra, a disk-based database with in-memory performance. Currently, I work on mlinspect and mlwhatif, which aim to diagnose and mitigate robustness and reliability issues in machine learning pipelines.
Sebastian Schelter, Shubha Guha, Stefan Grafberger
Automated Provenance-Based Screening of ML Data Preparation Pipelines
Sebastian Schelter, Stefan Grafberger
Messy Code Makes Managing ML Pipelines Difficult? Just Let LLMs Rewrite the Code!
Stefan Grafberger
Instrumentation and Analysis of Native ML Pipelines via Logical Query Plans
Sebastian Schelter, Stefan Grafberger, Maarten de Rijke
Snapcase – Regain Control over Your Predictions with Low-Latency Machine Unlearning
Publication Highlight - Snapcase
At the VLDB 2024 conference, the BIFOLD Research Group DEEM Lab introduced "Snapcase," a demo paper that addresses the concept of machine unlearning.