BIFOLD Colloquium 13/2024

November 14, 2024, 10:00 - 11:00

TU Berlin, Marchstr. 23, 10587 Berlin, Germany, Room MAR 2.057

Prof. Juliana Freire, Ph.D.

Dataset Search for Data Discovery, Augmentation, and Explanation

Abstract: In recent years, we have witnessed an explosion in our capacity to collect and catalog vast amounts of data about our environment, society, and populace. Moreover, with the push towards transparency and open data, scientists, governments, and organizations are increasingly making structured data available on the Web and in various repositories and data lakes. Combined with advances in analytics and machine learning, the availability of such data should, in theory, allow us to make progress on many of our most important scientific and societal questions.

However, this opportunity is often unrealized due to a central technical barrier: it remains nearly impossible for domain experts to sift through the overwhelming amount of available information to discover the datasets they need for their specific applications. While search engines have addressed the discovery problem for Web documents, supporting the discovery of structured data presents new challenges. These include crawling the Web in search of datasets, indexing datasets and supporting dataset-oriented queries, and creating new techniques to rank and display results.

In this talk, I will discuss these challenges and present our recent work in this area. Specifically, I will describe strategies for finding relevant datasets on the Web and deriving metadata to be indexed. Additionally, I will introduce a new class of data-relationship queries and outline a collection of methods that efficiently support various types of relationships, demonstrating how they can be used for data explanation and augmentation. Finally, I will showcase Auctus, an open-source dataset search engine that we have developed at the NYU Visualization, Imaging, and Data Analysis (VIDA) Center. I will conclude by highlighting open problems and suggesting directions for future research.

 

© Juliana Freire
Prof. Juliana Freire, NYU.

Short-Bio: Juliana Freire is an Institute Professor at the Tandon School of Engineering and Professor of Computer Science and Engineering and Data Science at New York University. She served as the elected chair of ACM SIGMOD and as a council member of the Computing Community Consortium (CCC), and was the NYU lead investigator for the Moore-Sloan Data Science Environment, a grant awarded jointly to UW, NYU, and UC Berkeley. She develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. This spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, as well as different application areas, including urban analytics, misinformation, predictive modeling, and computational reproducibility. She is an active member of the database and Web research communities, with over 250 technical papers (including 12 award-winning papers), several open-source systems, and 12 U.S. patents. According to Google Scholar, her h-index is 68 and her work has received over 19,000 citations. She is an ACM Fellow, a AAAS Fellow, and the recipient of an NSF CAREER award, two IBM Faculty awards, and a Google Faculty Research award. She was awarded the ACM SIGMOD Contributions Award in 2020. Her research has been funded by the National Science Foundation, DARPA, the Department of Energy, the National Institutes of Health, the Sloan Foundation, the Gordon and Betty Moore Foundation, the W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo!, and IBM. She received M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook, and a B.S. degree in computer science from the Federal University of Ceara (Brazil).