Event Date Details:
- Psych 1924
Title: Off-policy Learning in Theory and in the Wild
The talk considers the problem of offline policy learning for automated decision systems under the contextual bandits model, where we aim to evaluate the performance of a given policy (a decision algorithm) and to learn a better policy from logged historical data consisting of contexts, actions, rewards, and the probabilities of the actions taken. This is a generalization of the Average Treatment Effect (ATE) estimation problem, and it introduces an interesting new set of desiderata to consider.
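To make the setting concrete, here is a minimal sketch (not from the talk) of off-policy evaluation on logged bandit data via inverse propensity scoring (IPS), the standard baseline estimator; the data-generating process, the uniform logging policy, and all variable names are illustrative assumptions.

```python
import numpy as np

# Hypothetical logged data: each record holds a context x, the action a chosen
# by the logging policy, the observed reward r, and the logging probability mu(a|x).
rng = np.random.default_rng(0)
n, n_actions = 1000, 3
contexts = rng.normal(size=(n, 2))
actions = rng.integers(n_actions, size=n)
logging_probs = np.full(n, 1.0 / n_actions)   # uniform logging policy
rewards = (actions == 0).astype(float)        # in this toy world, action 0 always pays

def ips_value(target_probs, actions, rewards, logging_probs):
    """IPS estimate of the target policy's value: reweight each logged reward
    by pi(a|x) / mu(a|x) so the logged data mimics data drawn from pi."""
    weights = target_probs[np.arange(len(actions)), actions] / logging_probs
    return np.mean(weights * rewards)

# Target policy that deterministically plays action 0; its true value is 1.
target = np.zeros((n, n_actions))
target[:, 0] = 1.0
estimate = ips_value(target, actions, rewards, logging_probs)
```

The estimate is unbiased but can have high variance when the target and logging policies disagree, which is exactly the tension the estimators in the talk address.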
In the first part of the talk, I will compare and contrast off-policy evaluation and ATE estimation, and clarify how different assumptions change the corresponding minimax risk in estimating the "causal effect". In addition, I will discuss how one can achieve significantly better finite-sample performance than asymptotically optimal estimators through the SWITCH estimator.
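A simplified per-sample sketch of the SWITCH idea (my own illustration, not code from the paper): use the low-variance IPS term where the importance weight is small, and fall back to a reward-model (direct method) estimate where the weight exceeds a threshold tau; the function and argument names are assumptions.

```python
import numpy as np

def switch_estimator(weights, rewards, model_values, tau):
    """SWITCH-style estimate: IPS term where the importance weight w <= tau,
    reward-model prediction where w > tau, trading a little bias for
    much lower variance on the heavy-tailed weights."""
    small = weights <= tau
    ips_part = np.where(small, weights * rewards, 0.0)
    dm_part = np.where(small, 0.0, model_values)
    return np.mean(ips_part + dm_part)

# Toy usage: the third sample's weight (10.0) exceeds tau, so its model
# prediction (0.7) is used instead of the high-variance IPS term.
w = np.array([0.5, 2.0, 10.0])
r = np.ones(3)
m = np.full(3, 0.7)
value = switch_estimator(w, r, m, tau=5.0)
```

Choosing tau interpolates between pure IPS (tau = infinity) and the pure direct method (tau = 0), which is how the estimator adapts to the data at hand.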
In the second part of the talk, I will discuss off-policy learning in the real world. I will highlight some real-world challenges, including missing logging probabilities, confounding variables (Simpson's paradox), and model misspecification. We will demonstrate that a commonly used naive approach, direct cross-entropy minimization, implicitly optimizes a causal objective without requiring us to know the probabilities of the actions taken. We then propose policy imitation, which can serve both as a regularizer and as a test for confounders or model misspecification.
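The "direct cross-entropy minimization" idea can be sketched as reward-weighted classification: fit a policy to predict the logged actions, weighting each sample by its observed reward, with no logging propensities anywhere in the objective. This is my own minimal illustration of that flavor of objective, not the paper's implementation.

```python
import numpy as np

def reward_weighted_xent_grad(logits, action, reward):
    """Gradient of reward-weighted cross-entropy for one logged sample:
    the standard softmax cross-entropy gradient toward the logged action,
    scaled by the observed reward. Note that no logging probability
    mu(a|x) appears anywhere in this computation."""
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    probs /= probs.sum()
    grad = probs.copy()
    grad[action] -= 1.0                     # cross-entropy gradient w.r.t. logits
    return reward * grad                    # high-reward samples pull harder
```

Samples with zero reward contribute nothing, so gradient steps push probability mass only toward actions that were followed by good outcomes.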
1. Yu-Xiang Wang, Alekh Agarwal, and Miroslav Dudík. (2017) "Optimal and Adaptive Off-policy Evaluation in Contextual Bandits." In ICML-17. https://arxiv.org/abs/1612.01205
2. Yifei Ma, Yu-Xiang Wang, and Murali Balakrishnan. (2018) "Imitation-Regularized Offline Learning." AISTATS'18, to appear. https://arxiv.org/abs/1901.
Yu-Xiang Wang is an Assistant Professor of Computer Science at UCSB. Prior to joining UCSB, he was a scientist at the AI research lab of Amazon Web Services in Palo Alto, CA from 2017 to 2018. Yu-Xiang received his PhD in Statistics and Machine Learning in 2017 from the world's first Machine Learning Department, in the School of Computer Science at Carnegie Mellon University (CMU). Before that, he received his bachelor's and master's degrees in Electrical Engineering from the National University of Singapore in 2011 and 2013, respectively. Yu-Xiang's research interests revolve around the intersection of machine learning, statistics, and optimization, with a special focus on statistical theory and methodology, differential privacy, large-scale machine learning, reinforcement learning, and deep learning.