Protecting sensitive data while allowing meaningful analysis

In many cases, in-depth data analysis raises challenging data privacy issues. For example, hospitals and medical institutions often need to share patient data with researchers to advance medical studies while ensuring patient confidentiality. However, current database access control mechanisms only allow data officers to either grant or deny access to sensitive data, without any middle ground. This rigid, all-or-nothing model has hindered data sharing due to a lack of trust, both among external users and even within the same organization.
To address this challenge, BIFOLD researchers from Volker Markl’s research group “Database Systems and Information Management” now introduce “Mascara”, a novel access control middleware designed to manage partial data access while protecting privacy through anonymization. The corresponding paper, “Disclosure-Compliant Query Answering”, was accepted for SIGMOD 2025.
They introduce a new language for specifying how data can be accessed and, if necessary, anonymized using masking functions. These functions protect sensitive information either by adding noise or by replacing values with more general ones, ensuring privacy while still allowing meaningful analysis of the disclosed data. With Mascara, a hospital’s data officer could define disclosure policies that allow researchers to access patient records in anonymized form. For example, personal identifiers (e.g., names, addresses) could be masked entirely, while sensitive attributes (e.g., exact age, income) could be generalized (e.g., replacing "34 years old" with "30-40 years old") or perturbed (e.g., adding minor noise to blood test results). Mascara adds flexibility by letting data officers control the anonymization level based on the requested information, thereby maintaining patients' confidentiality.
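To illustrate the two kinds of masking functions mentioned above, here is a minimal sketch in Python. These helper functions and their names are hypothetical, not Mascara's actual implementation; they merely show what generalization (coarsening a value), perturbation (adding noise), and full masking look like in principle.

```python
import random

def generalize_age(age, width=10):
    """Generalize an exact age into a coarser range, e.g. 34 -> '30-40'."""
    low = (age // width) * width
    return f"{low}-{low + width}"

def perturb(value, scale=1.0, rng=None):
    """Perturb a numeric measurement (e.g. a blood test result) with small uniform noise."""
    rng = rng or random.Random()
    return value + rng.uniform(-scale, scale)

def mask_identifier(_value):
    """Fully mask a personal identifier such as a name or address."""
    return "***"
```

For example, `generalize_age(34)` yields `"30-40"`, while `perturb(5.1, scale=0.2)` returns a value within 0.2 of the original measurement.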
“The key innovation of our approach is the seamless integration of anonymization through masking functions into access control, while allowing users to query the database using its original schema. Mascara then automatically modifies the user’s queries to comply with the defined disclosure policies. Additionally, Mascara employs a novel approach to estimate the quality of anonymized data. Based on these estimates, responses of the highest possible quality are returned without compromising data privacy,” explains first author Rudi Poepsel-Lemaitre. From the perspective of data subjects (the individuals whose data is being used), Mascara offers strong privacy guarantees: access to sensitive data is granted only if the privacy requirements agreed upon by the user are met. Without such consent, data officers can simply restrict access.
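The core idea of policy-driven query modification can be sketched as follows. This is a toy illustration, not Mascara's language or rewriting algorithm: it assumes a hypothetical policy table that maps sensitive columns to masking functions, and rewrites a simple SELECT so that each sensitive column passes through its mask while the user still refers to the original schema.

```python
# Hypothetical disclosure policy: sensitive column -> masking function
# to apply in the rewritten query (names are illustrative only).
POLICY = {
    "name": "mask_full",
    "age": "generalize_10",
    "blood_glucose": "perturb_noise",
}

def rewrite_select(columns, table):
    """Rewrite a SELECT over the original schema so that every
    policy-protected column is wrapped in its masking function."""
    exprs = []
    for col in columns:
        fn = POLICY.get(col)
        # Keep the original column name via an alias, so the user's
        # view of the schema is unchanged.
        exprs.append(f"{fn}({col}) AS {col}" if fn else col)
    return f"SELECT {', '.join(exprs)} FROM {table}"
```

Calling `rewrite_select(["name", "age", "ward"], "patients")` produces `SELECT mask_full(name) AS name, generalize_10(age) AS age, ward FROM patients`: the researcher queries `name` and `age` as usual, but only masked values leave the database.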
With Mascara, BIFOLD researchers take a significant step toward a future where privacy and data accessibility are no longer in conflict but work together to enable secure, high-quality data analysis. Ultimately, this enhances trust between organizations that collect data and the individuals behind it.
Publication:
Disclosure-Compliant Query Answering, Proceedings of the ACM on Management of Data, Volume 2, Issue 6.