Annual Sobel Lecture | Statistics and Applied Probability

April 22, 2026

This year's guest speaker for the Annual Sobel Lecture is Prof. Kathryn Roeder, who will speak on "Genomic Inferences in the era of black box predictions" on Wednesday, April 22nd, from 3:30pm to 4:30pm in HSSB 1173.

Annual Sobel Lecture

Established in 2004, this has been the preeminent distinguished lecture in the Department of Statistics and Applied Probability at UCSB. It was established in recognition of the contributions made to the statistical sciences by Prof. Milton Sobel (1919-2002).

Abstract:

Since the advent of high-throughput genomic techniques, myriad statistical challenges have arisen due to high dimensionality and missing data. Intriguingly, however, powerful black-box models have been remarkably successful in filling in the missing data. The question that arises is: how can we adjust inferential techniques to account for the imputation? Here we illustrate two genomic applications in which we overcome these challenges. (1) While quantitative measurements produced by mass spectrometry proteomics experiments offer a direct way to explore the role of thousands of proteins in molecular mechanisms, analysis of such data is challenging due to the large proportion of missing values. To address this issue, a common strategy is to impute the missing data, although this often introduces systematic bias into downstream analyses if the imputation errors are ignored. We develop a statistical framework inspired by doubly robust estimators that offers valid and efficient inferences for proteomic data.
(2) Single-cell RNA sequencing used in conjunction with CRISPR-based perturbation (Perturb-seq) can uncover the function of genes; however, it can be costly to perform as many perturbation experiments as desired. Ideally, it would be possible to use a model to predict the outcome of perturbations related to those already performed. Despite their high dimensionality and sparsity, these data have shown themselves to be amenable to analysis by deep learning methods, which provide us with a framework for this task. Remarkably, we have had success in generating data for perturbation experiments that were never performed, provided we have a rich set of data from related experiments. We use ideas derived from the semiparametric inference literature to obtain inferential techniques that are somewhat successful in this challenging setting.

Bio:

Kathryn Roeder joined the faculty at Carnegie Mellon University in 1994, and she is now the UPMC Professor of Statistics and Life Sciences in the Departments of Statistics & Data Science and Computational Biology. She earned her Ph.D. in statistics at Pennsylvania State University, after which she was on the faculty at Yale University for six years. In 1997 she received the COPSS Presidents’ Award for the outstanding statistician under age 40. In 2019 she was inducted into the National Academy of Sciences. In 2020 she was awarded the COPSS Distinguished Achievement Award and Lectureship. Her research group develops statistical tools applied to genetic and genomic data to understand the workings of the human brain and the interplay with genetic variation. These tools rely on various statistical and machine learning methods, causal inference, latent space embedding, sparse PCA, and high-dimensional nonparametric techniques.