Seminar: Yuling Yan

Event Date: Wednesday, March 9, 2022, 3:30 pm to 4:30 pm
Title: Inference for Heteroskedastic PCA with Missing Data
 
Abstract
 
This talk presents how to construct confidence regions for principal component analysis (PCA) in high dimensions, a problem that remains largely under-explored. Computing measures of uncertainty for nonlinear or nonconvex estimators is difficult in high dimensions in general, and the challenge is further compounded by the prevalence of missing data and heteroskedastic noise. We propose a suite of solutions for valid inference on the principal subspace based on two estimators: a vanilla SVD-based approach, and a more refined iterative scheme called HeteroPCA (Zhang et al., 2018). We develop non-asymptotic distributional guarantees for both estimators, and show how they can be used to construct both confidence regions for the principal subspace and entrywise confidence intervals for the spiked covariance matrix. Particularly worth highlighting is the inference procedure built on top of HeteroPCA, which is not only valid but also statistically efficient in a broader range of scenarios (e.g., it covers a wider range of missing rates and signal-to-noise ratios). Our solutions are fully data-driven and adapt to heteroskedastic random noise, without requiring prior knowledge of the noise levels or noise distributions.
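For readers unfamiliar with the two estimators named above, here is a minimal NumPy sketch of point estimation only. It is not the speaker's procedure and does not construct the confidence regions discussed in the talk. It assumes a d x n data matrix with NaN marking missing entries, entries missing uniformly at random, and a known rank r. The function names `vanilla_svd`, `hetero_pca`, and `sin_theta` are hypothetical. The HeteroPCA step follows the general idea of Zhang et al. (2018): discard the diagonal of the Gram matrix, because noise variances and missingness inflate it, then repeatedly re-estimate that diagonal from a rank-r approximation.

```python
import numpy as np


def _rescaled_data(Y_obs):
    """Zero-fill missing entries (NaN) and divide by the estimated observation rate."""
    mask = ~np.isnan(Y_obs)
    p_hat = mask.mean()  # assumption: entries are missing uniformly at random
    return np.where(mask, Y_obs, 0.0) / p_hat


def vanilla_svd(Y_obs, r):
    """Hypothetical vanilla estimator: top-r left singular vectors of the rescaled d x n data."""
    U, _, _ = np.linalg.svd(_rescaled_data(Y_obs), full_matrices=False)
    return U[:, :r]


def hetero_pca(Y_obs, r, n_iter=50):
    """Hypothetical HeteroPCA-style iteration on the Gram matrix of the rescaled data.

    The diagonal of Y Y^T is inflated by noise variances and by missingness,
    so it is deleted and then re-estimated from a rank-r approximation.
    """
    Y = _rescaled_data(Y_obs)
    G = Y @ Y.T
    off_diag = G - np.diag(np.diag(G))  # diagonal deletion
    N = off_diag.copy()
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(N)
        N_r = (U[:, :r] * s[:r]) @ Vt[:r, :]  # best rank-r approximation of N
        N = off_diag + np.diag(np.diag(N_r))  # keep off-diagonals, refresh the diagonal
    U, _, _ = np.linalg.svd(N)
    return U[:, :r]


def sin_theta(U_hat, U_star):
    """Spectral-norm distance between the projectors onto two r-dimensional subspaces."""
    return np.linalg.norm(U_hat @ U_hat.T - U_star @ U_star.T, 2)


# Toy example (synthetic, for illustration only): d = 50 features, n = 2000 samples,
# rank 2, row-wise heteroskedastic noise, each entry observed with probability 0.6.
rng = np.random.default_rng(0)
d, n, r, p = 50, 2000, 2, 0.6
U_star, _ = np.linalg.qr(rng.standard_normal((d, r)))
X = U_star @ np.diag([10.0, 8.0]) @ rng.standard_normal((r, n))
sigma = rng.uniform(0.5, 3.0, size=(d, 1))  # noise level differs across rows
Y_full = X + sigma * rng.standard_normal((d, n))
Y_obs = np.where(rng.random((d, n)) < p, Y_full, np.nan)

U_hetero = hetero_pca(Y_obs, r)
U_vanilla = vanilla_svd(Y_obs, r)
print("HeteroPCA subspace error:", sin_theta(U_hetero, U_star))
print("vanilla SVD subspace error:", sin_theta(U_vanilla, U_star))
```

Rescaling by the observation rate makes the off-diagonal entries of the Gram matrix approximately unbiased. Each diagonal entry, however, picks up an extra term of roughly (signal^2 + noise variance)/p. That term varies from row to row under heteroskedastic noise, which is why the sketch re-estimates the diagonal instead of trusting it.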
 
This is based on joint work with Yuxin Chen and Jianqing Fan. 
 
Bio:
 
Yuling Yan is a fourth-year Ph.D. student in the Department of Operations Research and Financial Engineering at Princeton University, advised by Professor Yuxin Chen and Professor Jianqing Fan. Before coming to Princeton, he received a bachelor's degree in statistics from Peking University in 2018. His research interests include high-dimensional statistics, nonconvex optimization, reinforcement learning, and optimal transport.