Seminars 2016-2017



Friday, January 13, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Murat Erdogdu (Statistics, Stanford University)

Title: Design and Analysis of Scalable Algorithms via Statistical Tools

Abstract: Statistics and optimization have been closely linked from the very outset. This connection has become more essential lately, mainly because of recent advances in computational resources, the availability of large amounts of data, and the consequent growing interest in statistical and machine learning algorithms. In this talk, I will discuss how one can use tools from statistics, such as Stein's lemma and subsampling, to design scalable, efficient, and reliable optimization algorithms. The focus will be on large-scale problems where the iterative minimization of the empirical risk is computationally intractable, i.e., where the number of observations n is much larger than the dimension of the parameter p, n >> p >> 1. The proposed algorithms have wide applicability to many supervised learning problems, such as binary classification with smooth surrogate losses, generalized linear problems in their canonical representation, and M-estimation. The algorithms rely on iterations that are constructed via Stein's lemma, achieve a quadratic convergence rate, and are cheaper than any batch optimization method by at least a factor of O(p). I will discuss theoretical guarantees of the proposed algorithms, along with their convergence behavior in terms of the data dimensions. Finally, I will demonstrate their performance on well-known classification and regression problems through extensive numerical studies on large-scale real datasets, and show that they outperform other widely used and specialized algorithms.
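As a rough sketch of the subsampling idea only (the speaker's actual iterations are built via Stein's lemma, which is not reproduced here), the following toy Newton-type update for logistic regression estimates the Hessian from a random subsample while keeping the full-sample gradient; all names and parameter choices are illustrative:

```python
import numpy as np

def subsampled_newton_logistic(X, y, n_iter=20, sub_size=200, seed=0):
    """Newton-type iterations for logistic regression with a subsampled
    Hessian: the full gradient is used, but curvature is estimated from a
    random subset of rows, making each step much cheaper than batch Newton."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ w)))          # fitted probabilities
        grad = X.T @ (mu - y) / n                    # full-sample gradient
        idx = rng.choice(n, size=min(sub_size, n), replace=False)
        ws = mu[idx] * (1.0 - mu[idx])               # logistic curvature weights
        H = (X[idx] * ws[:, None]).T @ X[idx] / len(idx) + 1e-6 * np.eye(p)
        w -= np.linalg.solve(H, grad)                # Newton-type update
    return w
```

Each iteration forms the Hessian from only sub_size rows, so its cost is dominated by the gradient computation rather than the curvature estimate.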

Monday, January 9, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Zijian Guo (Statistics, University of Pennsylvania)

Title: Inference for High Dimensional Linear Models: Fundamental Limits and Algorithms

Abstract: High dimensional linear regression is one of the most important models in analyzing modern data sets. Although the estimation problem is well understood, there is still a paucity of methods and fundamental theoretical results on confidence intervals for high dimensional linear regression. In this talk, I will present confidence interval results for a general linear functional. I will first construct confidence intervals of optimal expected length in the oracle setting of known sparsity level. Then, I will focus on the problem of adaptation to sparsity for the construction of confidence intervals. I will identify the regimes in which it is possible to construct adaptive confidence intervals. In terms of optimality and adaptivity, there are striking differences between linear functionals with a sparse loading and a dense loading.

In the framework of high dimensional linear models, another interesting quantity is the normalized inner product of the two regression vectors, which can represent an important concept in genetics, the genetic correlation between phenotypes. I will introduce the Functional De-biased Estimator (FDE), which achieves the optimal convergence rate for estimating the genetic correlation. The FDE is applied to estimate the genetic correlations among different phenotypes in a yeast data set. Finally, I will discuss an interesting connection between the aforementioned problems and provide a unified view of the proposed procedures.
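The de-biasing step behind estimators of this general kind can be caricatured as a one-step correction added to a regularized fit. The sketch below is illustrative only: it assumes a toy independent Gaussian design (so the score direction for coordinate j is just the j-th column), and uses a plain ISTA lasso solver rather than the speaker's FDE or the general-functional procedures of the talk:

```python
import numpy as np

def ista_lasso(X, y, lam, n_iter=500):
    """Lasso via iterative soft-thresholding (ISTA)."""
    n, p = X.shape
    L = np.linalg.norm(X, 2) ** 2 / n        # Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        b -= X.T @ (X @ b - y) / n / L       # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - lam / L, 0.0)  # soft-threshold
    return b

def debiased_coordinate(X, y, b, j):
    """One-step bias correction for b[j], using the score z_j = X[:, j].
    This simple choice is only appropriate for (near-)orthogonal designs."""
    r = y - X @ b                            # lasso residuals
    z = X[:, j]
    return b[j] + z @ r / (z @ X[:, j])
```

The correction removes (approximately) the shrinkage bias that the lasso penalty introduces on the target coordinate, which is what makes confidence intervals possible.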

Wednesday, November 30, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: John R. Gilbert (Computer Science, UCSB)

Title: Graphs and Sparse Matrices: There and Back Again

Abstract: The mathematical connections between graph theory and linear algebra are intimate and well known. The computational links between the two fields are also deep, extending from basic data structures to fundamental decompositions to the design of efficient algorithms. During the first 50 years of this computational relationship, graphs served numerical linear algebra by enabling efficient sparse matrix computation. Recently, matrix computation has been returning the favor.

I will talk about the past and present of this relationship in both directions, and speculate a bit on its future. Along the way, I will describe two software systems we have built for computing with large graphs and networks on parallel computers, CombBLAS and the Knowledge Discovery Toolbox. The key to their performance and scaling is sparse matrix computation. Finally, I will advertise the Graph BLAS Forum, an open effort to standardize primitives for graph computation, building on many groups' work on graph algorithms in the language of linear algebra.
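The "graphs as sparse matrices" idea can be made concrete in a few lines: breadth-first search is a sequence of matrix-vector products over a Boolean semiring, which is exactly the primitive that CombBLAS-style libraries optimize. A minimal sketch using a dense Boolean adjacency matrix (illustrative; a real system would use sparse storage and a true semiring product):

```python
import numpy as np

def bfs_levels(A, source):
    """Breadth-first search expressed as repeated matrix-vector products.
    A is a Boolean adjacency matrix with A[i, j] = True for an edge i -> j.
    Returns the BFS level of each vertex, or -1 if unreachable."""
    n = A.shape[0]
    level = np.full(n, -1)
    frontier = np.zeros(n, dtype=bool)
    frontier[source] = True
    depth = 0
    while frontier.any():
        level[frontier] = depth
        # next frontier: vertices adjacent to the current frontier, not yet seen
        reached = A.T.astype(np.int64) @ frontier.astype(np.int64) > 0
        frontier = reached & (level == -1)
        depth += 1
    return level
```

Each while-loop iteration is one matrix-vector product, so the whole traversal inherits the performance and scaling of the underlying (sparse) matrix kernel.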

Wednesday, November 16, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Ali Al-Sharadqah (California State University Northridge)

Title: Geometric Fitting in Error-In-Variables Models

Abstract: We will introduce Errors-in-Variables (EIV) models and their applications in geometric estimation, a widely studied topic in computer vision and pattern recognition. Two types of geometric estimation problems will be discussed: (1) fitting geometric curves, such as circles and ellipses, to a set of experimental observations both of whose coordinates are contaminated by noise; and (2) other applications in computer vision, such as 'Fundamental Matrix' estimation and 'Homography' computation, which are essential in 3D reconstruction.

Some theoretical results in circle and ellipse fitting will be addressed first. These results lead to methodological questions that require further investigation. We therefore developed an unconventional statistical analysis that allows us to assess EIV parameter estimates effectively, and we validated this approach through a series of numerical tests. We theoretically compared the most popular circle and ellipse fits with each other and showed why, and by how much, each fit differs from the others. This theoretical comparison leads to new fits with superior characteristics that surpass existing fits both theoretically and experimentally.
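For concreteness, one of the simplest algebraic circle fits in this literature, the classical Kåsa fit, reduces the problem to linear least squares (a minimal sketch, not one of the speaker's proposed fits, and without the statistical error analysis the talk is about):

```python
import numpy as np

def kasa_circle_fit(x, y):
    """Algebraic (Kasa) circle fit. Rewriting (x-a)^2 + (y-b)^2 = r^2 as
    x^2 + y^2 = 2ax + 2by + c with c = r^2 - a^2 - b^2 gives a linear
    least-squares problem in (a, b, c)."""
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    rhs = x ** 2 + y ** 2
    (a, b, c), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    r = np.sqrt(c + a ** 2 + b ** 2)
    return a, b, r
```

Fits of this algebraic type are fast but biased under coordinate noise, which is precisely the kind of deficiency the EIV analysis above is designed to quantify.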

Wednesday, November 9, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Xiaotong Shen (University of Minnesota)

Title: Personalized Prediction and Recommender Systems

Abstract: Personalized prediction estimates a user's preferences over a large number of items from user-specific as well as content-specific information, based on a very small number of observed preference scores. In a sense, predictive accuracy depends on how information is pooled across similar users and items. The two major approaches are collaborative filtering and content-based filtering. Whereas the former utilizes information from users who think alike about a specific item, the latter acts on characteristics of the items a user prefers; these are the approaches on which the recommender systems Grooveshark and Pandora, respectively, are built. In this talk, I will discuss various aspects of latent factor modeling, in addition to computational strategies for large problems.
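A latent factor model of the kind mentioned here is commonly fit by alternating least squares on the observed scores; the following minimal sketch is illustrative only (function names and parameter choices are my own, not the speaker's method):

```python
import numpy as np

def als_factorize(R, mask, k=2, lam=0.1, n_iter=30, seed=0):
    """Latent factor model R ~ U V^T fit by alternating least squares,
    using only the observed entries (mask[u, i] = True where a score exists)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    U = rng.normal(scale=0.1, size=(n_users, k))
    V = rng.normal(scale=0.1, size=(n_items, k))
    for _ in range(n_iter):
        for u in range(n_users):          # ridge regression for each user factor
            Vu = V[mask[u]]
            U[u] = np.linalg.solve(Vu.T @ Vu + lam * np.eye(k),
                                   Vu.T @ R[u, mask[u]])
        for i in range(n_items):          # ridge regression for each item factor
            Ui = U[mask[:, i]]
            V[i] = np.linalg.solve(Ui.T @ Ui + lam * np.eye(k),
                                   Ui.T @ R[mask[:, i], i])
    return U, V
```

Predicted scores for unobserved (user, item) pairs are simply the entries of U @ V.T, which is how the pooling across similar users and items happens.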

Wednesday, November 2, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Josselin Garnier (Ecole Polytechnique)

Title: Correlation-based imaging in random media

Abstract: Sensor array imaging in a randomly scattering medium is usually limited because coherent signals recorded by the array and coming from a reflector to be imaged are weak and dominated by incoherent signals coming from multiple scattering by the medium. Stochastic and multi-scale analysis has recently allowed for the emergence of original imaging techniques. We will see in this talk how correlation-based imaging techniques can mitigate or even sometimes benefit from the multiple scattering of waves.
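The simplest instance of correlation-based processing is estimating a travel-time lag between two recordings from the peak of their cross-correlation; a toy sketch (the imaging techniques of the talk are far more elaborate):

```python
import numpy as np

def correlation_lag(s1, s2):
    """Estimate the integer sample lag between two recordings of the same
    pulse from the argmax of their cross-correlation."""
    c = np.correlate(s2, s1, mode="full")
    return int(np.argmax(c)) - (len(s1) - 1)   # shift argmax to a signed lag
```

In correlation-based imaging, lags of this kind extracted from many sensor pairs play the role of travel-time measurements, even when the individual signals are dominated by scattering.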

Wednesday, October 26, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: (Tony) Jianguo Sun (University of Missouri)

Title: Regression Analysis of Informatively Interval-censored Failure Time Data

Abstract: Interval-censored failure time data occur in many fields, such as demography, economics, medical research, and reliability, and many inference procedures for them have been developed (Chen et al., 2012; Sun, 2006). However, most existing approaches assume that the mechanism that yields the interval censoring is independent of the failure time of interest, and this may well not hold in practice. In this talk, we will discuss this latter situation and present some inference procedures for the problem.
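Under the conventional independence assumption that the talk questions, the likelihood for interval-censored data has a simple form: each observation known only to lie in (L, R] contributes P(L < T <= R). A toy sketch for an exponential failure-time model (illustrative only; a real analysis would use a proper optimizer and a flexible survival model):

```python
import numpy as np

def interval_censored_loglik(rate, L, R):
    """Log-likelihood for interval-censored exponential failure times:
    each observation contributes P(L < T <= R) = exp(-rate*L) - exp(-rate*R)."""
    return np.sum(np.log(np.exp(-rate * L) - np.exp(-rate * R)))

def fit_rate(L, R, grid=None):
    """Grid-search MLE of the exponential rate (a sketch, not a method from
    the talk; scipy.optimize would replace the grid in practice)."""
    if grid is None:
        grid = np.linspace(0.01, 5.0, 2000)
    ll = [interval_censored_loglik(g, L, R) for g in grid]
    return float(grid[int(np.argmax(ll))])
```

When the inspection times generating (L, R] depend on T itself, this likelihood is no longer valid, which is exactly the informative-censoring situation the talk addresses.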

Wednesday, October 19, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Jason Marden (ECE-UCSB)

Title: Incentivizing Local Behavior in Distributed Systems

Abstract: The central goal in multiagent systems is to design local control laws for the individual agents that ensure the emergent global behavior is desirable with respect to a given system-level objective. Game theory is beginning to emerge as a valuable set of tools for achieving this goal. A central component of this game-theoretic approach is the assignment of utility functions to the individual agents. The goal here is to assign utility functions within an "admissible" design space such that the resulting game possesses desirable properties, e.g., the existence and efficiency of pure Nash equilibria. Our first set of results focuses on ensuring the existence of pure Nash equilibria. We prove that weighted Shapley values completely characterize the space of "local" utility functions that guarantee the existence of a pure Nash equilibrium. That is, if the agents' utility functions cannot be represented as a weighted Shapley value, then there exists a game for which a pure Nash equilibrium does not exist. One interesting consequence of this characterization is that guaranteeing the existence of a pure Nash equilibrium necessitates the use of the game structure termed a "potential game". Building on this characterization, our second set of results characterizes the utility functions that optimize the efficiency of the resulting pure Nash equilibria.
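A concrete miniature of the potential-game structure mentioned above is a congestion game: because a potential function decreases with every improving move, best-response dynamics always terminate at a pure Nash equilibrium. A minimal illustrative sketch (not a construction from the talk):

```python
import numpy as np

def best_response_dynamics(n_players=4, n_resources=3, n_rounds=50):
    """Best-response dynamics in a simple congestion game: each player
    selects one resource and pays its total load. This is a potential game,
    so the dynamics are guaranteed to reach a pure Nash equilibrium."""
    choice = np.zeros(n_players, dtype=int)    # everyone starts on resource 0
    for _ in range(n_rounds):
        moved = False
        for p in range(n_players):
            loads = np.bincount(choice, minlength=n_resources)
            loads[choice[p]] -= 1              # loads as seen excluding player p
            best = int(np.argmin(loads))
            if loads[best] < loads[choice[p]]: # strictly improving deviation
                choice[p] = best
                moved = True
        if not moved:
            break                              # pure Nash equilibrium reached
    return choice
```

At equilibrium no player can reduce their cost by switching resources, so the loads end up as balanced as the integers allow.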

Wednesday, October 12, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Pierre-Olivier Goffard (PSTAT-UCSB)

Title: Boundary Crossing Problems with Applications to Risk Management

Abstract: Many problems in stochastic modeling come down to studying the time at which a stochastic process crosses a given boundary, lower or upper. Typical fields of application are risk theory, epidemic modeling, queueing, reliability, and sequential analysis. The purpose of this talk is to present a method for determining boundary crossing probabilities for stochastic point processes having the order statistic property. A well-known boundary crossing result is revisited and a detailed proof is given; the same arguments may be used to derive results in trickier situations. We will further discuss the practical implications of this classical result and, time permitting, some duality features.
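A boundary crossing probability of this kind can always be approximated by simulation, which is a useful sanity check on analytic formulas. A toy Monte Carlo sketch for a homogeneous Poisson process crossing an upper boundary (illustrative only; the talk's method is analytic):

```python
import numpy as np

def crossing_probability(rate=1.0, boundary=lambda t: 3 + 2 * t,
                         horizon=10.0, n_sim=20000, seed=0):
    """Monte Carlo estimate of P(N(t) exceeds the boundary b(t) before the
    horizon) for a homogeneous Poisson counting process N with given rate.
    Since N only increases at jumps, crossings are checked at jump times."""
    rng = np.random.default_rng(seed)
    crossed = 0
    for _ in range(n_sim):
        t, n = 0.0, 0
        while t < horizon:
            t += rng.exponential(1.0 / rate)   # next jump (exponential gap)
            n += 1
            if t < horizon and n > boundary(t):
                crossed += 1
                break
    return crossed / n_sim
```

For instance, with the trivial boundary b(t) = 0 the process crosses at its first jump, so the probability reduces to P(first jump before the horizon) = 1 - exp(-rate * horizon), which the simulation reproduces.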

Wednesday, September 28, South Hall 5607F, 3:30-5:00 p.m.; refreshments served at 3:15 p.m.

Speaker: Debasis Sengupta (PSTAT-UCSB and Indian Statistical Institute, Kolkata)

Title: Feature-Sensitive and Automated Curve Registration with a Paleoclimatic Application

Abstract: Given two sets of functional data having a common underlying mean function but different degrees of distortion in their time measurements, we provide a method for estimating the time transformation necessary to align (or 'register') them. The novelty of the method lies in the elimination of prior smoothing, which can be an impediment to good performance. We prove that the proposed method is consistent under fairly general conditions. Simulation results show that the proposed method outperforms two existing methods. The proposed method is illustrated through the analysis of three paleoclimatic data sets. (This work was done jointly with Dibyendu Bhowmick and Radhendushka Srivastava.)
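Registration can be illustrated in miniature by restricting the time transformation to a linear rescaling and grid-searching it. This toy sketch is far simpler than the speaker's method (which handles general warps and avoids pre-smoothing); all names and parameters are illustrative:

```python
import numpy as np

def register_linear_warp(t, f, g, scales=np.linspace(0.5, 2.0, 151)):
    """Align g to f by grid-searching a linear time transformation s,
    minimizing the squared discrepancy between f(t) and g(s*t) on the
    portion of the grid where the warped times stay in range."""
    best_scale, best_err = None, np.inf
    for s in scales:
        ok = s * t <= t[-1]                  # stay inside the observed range
        g_warp = np.interp(s * t[ok], t, g)  # g evaluated at warped times
        err = np.mean((f[ok] - g_warp) ** 2)
        if err < best_err:
            best_scale, best_err = s, err
    return best_scale
```

For example, if g runs at 0.8 times the speed of f, the search recovers a scale of 1/0.8 = 1.25, the transformation that brings the two curves into alignment.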