Seminar - Yichen Zhang

Event Date: 

Monday, January 27, 2020 - 3:30pm to 4:30pm

Event Location: 

  • Broida 1640

Title: Statistical Inference for Large-Scale Data with Small Memory


The data revolution has created both opportunities and challenges for modeling and analyzing data generated by modern technologies. For example, when a large-scale dataset cannot fit into memory or is distributed across many machines, how should one construct estimators for existing statistical methods?
We first study statistical inference for quantile regression (QR) under a memory constraint. We propose a linear-type estimator for QR (LEQR) and develop a computationally efficient procedure that successively refines the estimator in a distributed setting. Theoretically, we establish a Bahadur representation of the LEQR with an optimal-rate remainder term, which yields the asymptotic normality of the estimator. We also show that the estimator is asymptotically efficient without any constraint on the number of machines. In addition, we develop a variant of the proposed estimator for online streaming QR. Finally, we extend this approach to a general framework for empirical risk minimization with a convex loss that may be non-differentiable.
This is based on joint work with Xi Chen and Weidong Liu.
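As a rough illustration of the "successively refine in a distributed setting" idea, here is a minimal Python sketch for the simplest case: a single quantile (intercept-only QR), with the data split into in-memory chunks that stand in for machines. It applies a classical one-step Newton-type update to the quantile score, using a Gaussian kernel density estimate for the denominator. The functions `local_stats` and `refine`, the bandwidth `h`, and the chunking scheme are hypothetical choices for illustration only; this is not the LEQR construction from the talk.

```python
import math
import random


def local_stats(chunk, theta, h):
    """Statistics one (hypothetical) machine returns for the current estimate theta."""
    # Number of local observations at or below theta (enters the quantile score).
    below = sum(1 for x in chunk if x <= theta)
    # Sum of Gaussian kernel values at theta with bandwidth h (divided by n later).
    dens = sum(math.exp(-0.5 * ((x - theta) / h) ** 2) for x in chunk)
    dens /= h * math.sqrt(2 * math.pi)
    return below, dens, len(chunk)


def refine(chunks, theta, tau, h):
    """One Newton-type step for the tau-th quantile using pooled chunk statistics."""
    below = dens = n = 0
    for chunk in chunks:
        b, d, m = local_stats(chunk, theta, h)
        below += b
        dens += d
        n += m
    f_hat = dens / n  # pooled kernel density estimate at theta
    # Score: sum_i (tau - 1{x_i <= theta}); its expected derivative is -n * f(theta).
    return theta + (tau * n - below) / (n * f_hat)


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(20000)]
chunks = [data[i::10] for i in range(10)]  # 10 simulated machines
tau = 0.5

# Initial estimate: sample quantile computed on the first chunk only.
first = sorted(chunks[0])
theta = first[int(tau * (len(first) - 1))]
for _ in range(3):
    theta = refine(chunks, theta, tau, h=0.1)
print(f"refined estimate: {theta:.4f}")
```

Each simulated machine returns only three numbers per round, so memory and communication stay small regardless of chunk size. Regression with covariates would require matrix-valued statistics and a principled bandwidth choice, which this sketch does not attempt.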