The goal of NWDS is to bring together researchers and practitioners in the field of databases and data management systems working in the Pacific North-West.
One of our main activities is a talk series with a variety of distinguished speakers from academia and industry.
We thank our UWDB affiliates for supporting NWDS.
Our past talks can be found on the NWDS YouTube channel.
Speaker: Juliana Freire
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center, CSE 291
When: Monday, March 6th, 2023, 1:30pm-2:30pm
Title: Dataset Search for Data Discovery, Augmentation, and Explanation
Abstract: Recent years have seen an explosion in our ability to collect and catalog immense amounts of data about our environment, society, and populace. Moreover, with the push towards transparency and open data, scientists, governments, and organizations are increasingly making structured data available on the Web and in various repositories and data lakes. Combined with advances in analytics and machine learning, the availability of such data should in theory allow us to make progress on many of our most important scientific and societal questions. However, this opportunity is often missed due to a central technical barrier: it is currently nearly impossible for domain experts to weed through the vast amount of available information to discover datasets that are needed for their specific application. While search engines have addressed the discovery problem for Web documents, there are many new challenges involved in supporting the discovery of structured data---from crawling the Web in search of datasets, to the need for dataset-oriented queries and new strategies to rank and display results. I will discuss these challenges and present our recent work in this area. In particular, I will introduce a new class of data-relationship queries that, given a dataset, identifies related datasets; I will describe a collection of methods that efficiently support different kinds of relationships that can be used for data explanation and augmentation; and I will demonstrate Auctus, an open-source dataset search engine that we have developed at the NYU VIDA Center.
Bio: Juliana Freire is a Professor of Computer Science and Data Science at New York University. She was the elected chair of the ACM Special Interest Group on Management of Data (SIGMOD), served as a council member of the Computing Research Association’s Computing Community Consortium (CCC), and was the NYU lead investigator for the Moore-Sloan Data Science Environment. She develops methods and systems that enable a wide range of users to obtain trustworthy insights from data. This spans topics in large-scale data analysis and integration, visualization, machine learning, provenance management, and web information discovery, and different application areas, including urban analytics, predictive modeling, and computational reproducibility. Freire has co-authored over 200 technical papers (including 11 award-winning publications) and several open-source systems, and is an inventor of 12 U.S. patents. She is an ACM Fellow, a AAAS Fellow, and a recipient of an NSF CAREER Award, two IBM Faculty awards, and a Google Faculty Research award. She received the ACM SIGMOD Contributions Award in 2020. Her research has been funded by the National Science Foundation, DARPA, Department of Energy, National Institutes of Health, Sloan Foundation, Gordon and Betty Moore Foundation, W. M. Keck Foundation, Google, Amazon, AT&T Research, Microsoft Research, Yahoo! and IBM. She received a B.S. degree in computer science from the Federal University of Ceara (Brazil), and M.Sc. and Ph.D. degrees in computer science from the State University of New York at Stony Brook.
Speaker: Sean J. Taylor
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center, CSE 291
When: Monday, February 27th, 2023, 1:30pm-2:30pm
Title: When Do We Need Causal Inference in Data Science?
Abstract: The most common applications of causal inference to business decision-making are in two main areas: product experiments which inform launch decisions and algorithmic policies based on machine learning models. These applications focus on the special case where interventions are relatively cheap. However, practical analytics tasks encountered by many data scientists and analysts (where interventions are usually not possible) are currently underserved by causal inference. I review the tasks we tend to encounter in practice, discuss how the causal inference lens can change the results, and speculate about the barriers to adoption of these ideas in organizations.
Speaker: Alvitta Ottley
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center, CSE 291
When: Monday, February 13th, 2023, 1:30pm-2:30pm
Title: Improving Human-Machine Partnership Through Observational Learning
Abstract: There is a fast-growing interest in analyzing user interaction to create adaptive systems that can assist or collaborate on data analysis. However, the first step for an intelligent visualization response is understanding the user. Dr. Ottley’s work uses an observational learning framework, akin to humans learning concepts like language and behavior naturally through observations, often with no explicit feedback. The goal is to enable computers to infer user attributes and strategies by observing their interactions with a system. In this talk, Dr. Ottley summarizes her lab's work on user modeling for data visualization and gives a snapshot of the current research achievements and what is possible in the near and distant future. Then, she presents techniques for capturing and predicting user behavior, focusing on inferring attention, personality, biases, and knowledge by analyzing log data. Finally, Dr. Ottley highlights the significant roadblocks and future directions for visualization research.
Bio: Dr. Alvitta Ottley is an Assistant Professor in the Computer Science & Engineering Department at Washington University in St. Louis, Missouri, USA. She also holds a courtesy appointment in the Psychological and Brain Sciences Department. Her research uses interdisciplinary approaches to solve problems such as how best to display information for effective decision-making and how to design human-in-the-loop visual analytics interfaces that are more attuned to the way people think. Dr. Ottley received an NSF CRII Award in 2018 for using visualization to support medical decision-making, the NSF CAREER Award for creating context-aware visual analytics systems, and the 2022 EuroVis Early Career Award. In addition, her work has appeared in leading conferences and journals such as CHI, VIS, and TVCG, earning best paper and honorable mention awards.
Speaker: Emre Kiciman
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center, CSE 291
When: Monday, February 6th, 2023, 1:30pm-2:30pm
Title: Modeling the Data-Generating Process is Necessary for Out-of-Distribution Generalization
Abstract: At Microsoft Research, we are working to broaden the usage of causal AI, especially for decision-making applications, through both fundamental research and practical tooling. In this talk, I'll briefly introduce the PyWhy open-source tools and ecosystem and the fundamental research challenges we are prioritizing based on our experiences with causal AI: better elicitation of the domain knowledge and causal assumptions necessary for a valid causal analysis; the need for better validation and trustworthiness of causal analyses; and the extension of causal analysis methods to support analysis over high-dimensional, unstructured data, such as images and text. I'll spend the bulk of the talk deep-diving into our recent research towards the latter topic, connecting causal graphs and the statistical independences they encode with the loss functions and constraints imposed by invariant representation learning approaches for domain generalization. Based on the causal relationships between spurious attributes and the classification label, we obtain realizations of the canonical causal graph that characterize common distribution shifts and show that each shift entails different independence constraints over observed variables. This work explains why no single current method performs consistently across all kinds of distribution shifts, and leads to a new algorithm, Causally Adaptive Constraint Minimization (CACM), that adaptively identifies and applies the correct independence constraint for regularization. Extensive experiments show that adaptive dataset-dependent constraints lead to the highest accuracy on unseen domains, demonstrating the criticality of modeling the causal relationships inherent in the data-generating process.
Bio: See here.
Speaker: Sudeepa Roy
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center
When: Monday, January 30th, 2023, 1:30pm-2:30pm
Title: Toward Interpretable and Actionable Data Analysis with Query Debugging and Causal Inference
Abstract: In today’s data-driven world, users in different fields routinely collect, study, and make decisions supported by data. This motivates development of new techniques to help users from various backgrounds and levels of expertise process data, extract useful information and insights from data, and subsequently make sound decisions. In this talk, I will describe some of our work toward interpretable and actionable data analysis focusing on two steps of the data analysis pipeline. First, I will discuss generating explanations to help new programmers and students debug wrong queries and write correct relational queries. Then, I will talk about our research on connecting data management research with causal inference research to enable causal analysis and hypothetical reasoning for large complex data, and conclude with future research directions.
Bio: Sudeepa Roy is an Associate Professor in Computer Science at Duke University. She works broadly in data management, with a focus on the foundational aspects of big data analysis, which includes causality and explanations for big data, data repair, query optimization, probabilistic databases, and database theory. Before joining Duke in 2015, she did a postdoc at the University of Washington, and obtained her Ph.D. from the University of Pennsylvania. She is a recipient of the VLDB Early Career Research Contributions Award, an NSF CAREER Award, and a Google Ph.D. fellowship in structured data. She is a co-director of the Almost Matching Exactly (AME) lab for interpretable causal inference at Duke.
Speaker: Anna Fariha
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center
When: Monday, November 22nd, 2021, 2:30pm-3:30pm
Title: Blame the data, not the system: how data constraints can help in trustworthy machine learning and explain causes of data-system malfunction.
Abstract: The core of modern data-driven systems comprises models learned from large datasets, and they are usually optimized to target particular data and workloads. While these data-driven systems have seen wide adoption and success, their reliability and proper function hinge on the data's continued conformance to the system's initial settings and assumptions. My research focuses on designing mechanisms to assess the trustworthiness of a system's inferences and explain causes of system malfunction due to data nonconformance. The key idea here is that since data is central to data-driven systems, it can guide us to determine whether predictions made by an ML model can be trusted, and to expose the cause of a system's unexpected behavior. In this talk, I will discuss mechanisms and explanation frameworks to facilitate trusting and understanding outcomes involving data and data systems.
Bio: I am a Researcher at Microsoft. I obtained my Ph.D. from the University of Massachusetts, Amherst under the supervision of Alexandra Meliou. My primary area of research is data management, but the application areas of my research have been interdisciplinary, ranging from program synthesis and software engineering to machine learning, natural language processing, and human-computer interaction. I am interested in designing mechanisms for enhancing system usability, by developing intelligent tools towards boosting end-user productivity, and developing mechanisms for explaining system behavior ranging from traditional systems to opaque, data-driven systems.
Speaker: Tim Kraska
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center
When: Monday, May 24th, 2021, 9am-10am
Title: Towards Instance-Optimized Data Systems
Abstract: Recently, there has been a lot of excitement around ML-enhanced (or learned) algorithms and data structures. For example, there has been work on applying machine learning to improve query optimization, indexing, storage layouts, scheduling, log-structured merge trees, sorting, compression, sketches, among many other things. Arguably, the motivation behind these techniques is similar: machine learning is used to model the data and/or workload in order to derive a more efficient algorithm or data structure. Ultimately, what these techniques will allow us to build are “instance-optimized” systems; systems that self-adjust to a given workload and data distribution to provide unprecedented performance and avoid the need for tuning by an administrator. In this talk, I will provide an overview of the opportunities and limitations of learned index structures, storage layouts, and query optimization techniques we have been developing in my group, and how we are integrating these techniques to build a first instance-optimized database system.
Bio: Tim Kraska is an Associate Professor of Electrical Engineering and Computer Science in MIT's Computer Science and Artificial Intelligence Laboratory, co-director of the Data System and AI Lab at MIT (DSAIL@CSAIL), and co-founder of Einblick Analytics. Currently, his research focuses on building systems for machine learning, and using machine learning for systems. Before joining MIT, Tim was an Assistant Professor at Brown, spent time at Google Brain, and was a PostDoc in the AMPLab at UC Berkeley after he got his PhD from ETH Zurich. Tim is a 2017 Alfred P. Sloan Research Fellow in computer science and received several awards including the VLDB Early Career Research Contribution Award, the VMware Systems Research Award, the university-wide Early Career Research Achievement Award at Brown University, an NSF CAREER Award, as well as several best paper and demo awards at VLDB and ICDE.
Speaker: Aaron Elmore
Where: University of Washington, Seattle.
Allen School of Computer Science and Engineering.
Paul G. Allen Center
When: Monday, April 12th, 2021, 11am-12:15pm
Title: CrocodileDB: Resource Efficient Database Execution
Abstract: The coming end of Moore’s law requires that data systems be more judicious with computation and resources as the growth in data outpaces the availability of computational resources. Current database systems are eager and aggressively consume resources to immediately and quickly complete the task at hand. Intelligently deferring a task to a later point in time can increase result reuse, reduce work that might later be invalidated, or avoid unnecessary work altogether. In this talk I will introduce CrocodileDB, a resource-efficient database system that automatically optimizes deferment based on user-specification and workload prediction. CrocodileDB integrates new ways of specifying timing information, new query execution policies, new task schedulers, and new data loading schemes. In particular, this talk will highlight two new query execution paradigms, Intermittent Query Processing and Incremental-Aware Query Execution.
Bio: Aaron J. Elmore is an Assistant Professor in the Department of Computer Science and the College of the University of Chicago. Aaron was previously a Postdoctoral Associate at MIT. Aaron's thesis on Elasticity Primitives for Database-as-a-Service was completed at the University of California, Santa Barbara. His recent research interests focus on building data systems that address the growing data deluge. He is currently an associate editor for SIGMOD Record, and has served as co-chair for the SIGMOD demonstration track and the inaugural SIGMOD student research competition, and as VLDB proceedings editor.
Listed in reverse chronological order. Click here for abstracts.
Please sign up for the NWDS mailing list here. We use this list primarily to send announcements for upcoming events. After you register, you can send mail to that list at nwds at cs.washington.edu.
To become a member, please contact Magda or Dan.
The North-West Database Society was founded on January 1st, 2006 by Dan Suciu and Magdalena Balazinska. It is inspired by the New-England Database Society.