Seminars  
Seminars 2014-2015
Upcoming
Wednesday, November 19, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Joseph Barr (Chief Analytics Officer, HomeUnion, Irvine, CA) Title: Real Estate Analytics Abstract: Real estate plays a significant part in our economy, and it is no wonder that when home prices bottom out, so does the economy. The talk is about the analytics of real estate: how location determines value, demographic dynamics, households, and measuring and analyzing trends.
Wednesday, November 12, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Damla Senturk (UCLA) Title: Generalized Multi-Index Varying Coefficient Models Abstract: Among patients on dialysis, cardiovascular disease and infection are leading causes of hospitalization and death. Although recent studies have found that the risk of cardiovascular events is higher after an infection-related hospitalization, studies have not fully elucidated how the risk of cardiovascular events changes over time for patients on dialysis. In this work, we characterize the dynamics of cardiovascular event risk trajectories for patients on dialysis while conditioning on survival status via multiple time indices: (1) time since the start of dialysis, (2) time since the pivotal initial infection-related hospitalization and (3) the patient's age at the start of dialysis. This is achieved by using a new class of generalized multiple-index varying coefficient (GMIVC) models. The proposed GMIVC models utilize a multiplicative structure and one-dimensional varying coefficient functions along each time and age index to capture the cardiovascular risk dynamics before and after the initial infection-related hospitalization among the dynamic cohort of survivors. We develop a two-step estimation procedure for the GMIVC models based on local maximum likelihood. We report new insights on the dynamics of cardiovascular events risk using the United States Renal Data System database, which collects data on nearly all patients with end-stage renal disease in the U.S. Finally, simulation studies assess the performance of the proposed estimation procedures.
Wednesday, November 5, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Wenguang Sun (USC) Title: False Discovery Control in Large-Scale Spatial Multiple Testing Abstract: This talk considers both pointwise and clusterwise spatial multiple testing problems. We derive oracle procedures which optimally control the false discovery rate, false discovery exceedance and false cluster rate, respectively. A data-driven finite approximation strategy is developed to mimic the oracle procedures on a continuous spatial domain. Our multiple testing procedures are asymptotically valid and can be effectively implemented using Bayesian computational algorithms for analysis of large spatial data sets.
Wednesday, October 22, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Sreenivas Konda (UCSB) Title: Consistency of Large Autocovariance Matrices Abstract: We consider autoregressive (AR) processes of large order p, with p less than n, to approximate a linear time series. Using Bartlett's formula and strong mixing conditions, we show the consistency of the large sample autocovariance matrix under a banding procedure. These large sample autocovariance matrices are consistent in operator norm as long as (log p)/n goes to 0. Parameters of the large AR(p) model are estimated using a regularization procedure and banding of the autocovariance matrix. We also briefly review the application of banding in finding the inverse of the sum of two special matrices. Real examples from physics and business are used to illustrate the proposed methods.
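As a rough illustration of the banding idea mentioned in the abstract above, the sketch below builds a p x p sample autocovariance matrix and sets every entry more than a chosen number of lags off the diagonal to zero. The function names, the 1/n normalization, and the hard-cutoff form of banding are assumptions made for this example; the abstract does not specify the speaker's exact procedure.

```python
import numpy as np

def sample_autocov(x, max_lag):
    """Sample autocovariances gamma(0..max_lag), using the 1/n normalization."""
    x = np.asarray(x, dtype=float)
    n = x.size
    xc = x - x.mean()
    return np.array([np.dot(xc[: n - h], xc[h:]) / n for h in range(max_lag + 1)])

def banded_autocov_matrix(x, p, band):
    """p x p Toeplitz autocovariance matrix with entries beyond `band` lags zeroed."""
    gamma = sample_autocov(x, p - 1)
    idx = np.arange(p)
    lags = np.abs(np.subtract.outer(idx, idx))  # lags[i, j] = |i - j|
    full = gamma[lags]                          # unbanded Toeplitz matrix
    return np.where(lags <= band, full, 0.0)    # hard banding (assumed form)

# Hypothetical usage on a simulated AR(1) series with coefficient 0.5.
rng = np.random.default_rng(0)
noise = rng.standard_normal(500)
series = np.empty(500)
series[0] = noise[0]
for t in range(1, 500):
    series[t] = 0.5 * series[t - 1] + noise[t]

banded = banded_autocov_matrix(series, p=6, band=2)
print(np.round(banded, 2))
```

The bandwidth here is picked by hand; in practice it would be chosen by some data-driven rule, which this sketch does not attempt.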
Wednesday, October 8, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Ania Supady-Chavan (KeyCorp) Title: Time Series Modeling and Forecasting: an application to Banks’ stress-testing process. Abstract: I want to invite you to participate in a small presentation on how time series modeling can be performed to establish position during simulated stress. My goal is to gain your interest in the area of challenging current modeling techniques and looking beyond standard testing of model assumptions to assess the true risk of the formulated model for the intended use. I am interested in exploring the procedures that happen behind the scenes of any code’s syntax to better explore statistics that play a crucial role in assessing models’ performance as well as the forecasting process. The forecasting of the next periods ahead is the process that I would like to emphasize the most.
Wednesday, December 3, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Julie Swenson (UCSB) Title: A Bayesian Approach to Recommendation Systems. Abstract: Recommendation systems have proliferated in the last decade. Currently, most recommendation systems utilize content based algorithms, collaborative filtering based algorithms, or a combination of both. The recent surge in popularity of social networks has led to the creation of trust based algorithms. While such recommendation systems have proven that they can be more successful than their content and collaborative filtering based counterparts, they are often plagued with problems of cold start and data sparseness. We propose a Bayesian trust based algorithm that addresses both of these problems. Our results indicate that our method can be more successful than an existing Bayesian trust based algorithm.
Monday, January 5, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Jason Lee (Stanford University) Title: Selective Inference via the Condition on Selection Framework: Applications to Inference After Variable Selection Abstract: Selective Inference is the problem of testing hypotheses that are chosen or suggested by the data. Inference after variable selection in high-dimensional linear regression is a common example of selective inference; we only estimate and perform inference for the selected variables. We propose the Condition on Selection framework, which is a framework for selective inference that allows selecting and testing hypotheses on the same dataset. In the case of inference after variable selection (variable selection by lasso, marginal screening, or forward stepwise), the Condition on Selection framework allows us to construct confidence intervals for regression coefficients, and perform goodness-of-fit testing for the selected model. This is done by deriving the distribution of the test statistic conditioned on the selection event.
Wednesday, January 7, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Erin Schliep (Duke University) Title: Stochastic Modeling for Environmental Velocities Abstract: The velocity of climate change is defined as an instantaneous rate of change needed to maintain a constant climate. It is computed as the ratio of the temporal gradient of climate change over the spatial gradient of climate change. Ecologically, understanding these rates of climate change is critical since the range limits of plants and animals are changing in response to climate change. A fully stochastic hierarchical model is proposed that incorporates the inherent relationship between climate, time, and space. Space-time processes are employed to capture the spatial correlation in both the climate variable and the rate of change in climate over time. Directional derivative processes yield spatial and temporal gradients and, thus, the resulting velocities for a climate variable.
The gradients and velocities can be obtained at any location in any direction and any time. Maximum gradients and their directions can also be obtained and, as a result, minimum velocities. Explicit parametric forms for the directional derivative processes provide full inference on the gradients and velocities including estimates of uncertainty. The model is applied to average annual temperature across the eastern United States for the years 1963 to 2012. Maps of the spatial and temporal gradients are produced as well as velocities of temperature change. This work provides a framework for future research in stochastic modeling of other environmental velocities, such as the velocity of disease epidemics or species distributions across a region.
Friday, January 9, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Luo Xiao (Johns Hopkins University) Title: Quantifying the lifetime circadian rhythm of physical activity: a covariate-dependent functional data approach Abstract: Objective measurement of physical activity using wearable devices such as accelerometers may provide tantalizing new insights into the association between activity and health outcomes. Accelerometers can record quasi-continuous activity information for many days and for hundreds of individuals. For example, in the Baltimore Longitudinal Study on Aging (BLSA) physical activity was recorded every minute for 773 adults for an average of 4.5 days per adult. An important scientific problem is to separate and quantify the systematic and random circadian patterns of physical activity as functions of time of day, age, and gender.
To capture the systematic circadian pattern we introduce a practical bivariate smoother and two crucial innovations: 1) estimating the smoothing parameter using leave-one-subject-out cross validation to account for within-subject correlation; and 2) introducing fast computational techniques that overcome problems both with the size of the data and with the cross-validation approach to smoothing. The age-dependent random patterns are analyzed by a new functional principal component analysis that incorporates both covariate dependence and multilevel structure. Results reveal several interesting, previously unknown, circadian patterns associated with human aging and gender.
Monday, January 12, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Yuekai Sun (Stanford University) Title: Distributed estimation and inference for sparse regression Abstract: We address two outstanding challenges in sparse regression: (i) computationally efficient estimation in distributed settings and (ii) valid inference for the selected coefficients. The main computational challenge in a distributed setting is harnessing the computational capabilities of all the machines while keeping communication costs low. We devise an approach that requires only a single round of communication among the machines. We show the approach recovers the convergence rate of the (centralized) lasso as long as each machine has access to an adequate number of samples. Turning to the second challenge, we devise an approach to post-selection inference by conditioning on the selected model. In a nutshell, our approach gives inferences with the same frequency interpretation as those given by data/sample splitting, but it is more broadly applicable and more powerful. The validity of our approach also does not depend on the correctness of the selected model; i.e. it gives valid inferences even when the selected model is incorrect.
Wednesday, January 14, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Johann Gagnon-Bartsch (University of California Berkeley) Title: Removing Unwanted Variation with Negative Controls Abstract: High-throughput biological data, such as microarray data and gene sequencing data, are plagued by unwanted variation: systematic errors introduced by variations in experimental conditions such as temperature, the chemical reagents used, etc. This unwanted variation is often stronger than the biological variation of interest, making analysis of the data challenging, and severely impeding the ability of researchers to capitalize on the promise of the technology. One of the biggest challenges to removing unwanted variation is that the factors causing this variation (temperature, atmospheric ozone, etc.) are unmeasured or simply unknown. This makes the unwanted variation difficult to identify; the problem is essentially one of unobserved confounders. In my talk, I will discuss the use of negative controls to help solve this problem. A negative control is a variable known a priori to be unassociated with the biological factor of interest. I will begin with an example that will introduce the notion of negative controls, and demonstrate the effectiveness of negative controls in dealing with unwanted variation. I will then discuss negative controls more generally, including a comparison with instrumental variables.
Friday, January 16, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Heng Yang (City University of New York) Title: Simultaneous Detection And Identification With Post-Change Uncertainty Abstract: We consider the problem of quickest detection of an abrupt change when there is uncertainty about the post-change distribution. Because of the uncertainty, we would like not only to detect the change point but also to identify the post-change distribution simultaneously. In particular we examine this problem in the continuous-time Wiener model where the drift of observations changes from zero to a drift randomly chosen from a collection.
We set up the problem as a stochastic optimization in which the objective is to minimize a measure of detection delay subject to a frequency of false alarm constraint, while also identifying the value of the post-change drift up to pre-specified error bounds. We consider a composite rule involving the CUSUM reaction period, that is coupled with an identification function, and show that by choosing parameters appropriately, such a pair of composite rule and identification function can be asymptotically optimal of first order to detect the change point and simultaneously satisfy the error bounds to identify the post-change drift as the average first false alarm increases without bound. We also discuss the detection problem alone under some situations.
Wednesday, January 21, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Jennifer Bobb (Harvard University) Title: Beyond the one-exposure, one-outcome paradigm for scientific discovery in environmental epidemiology Abstract: The most common approach in environmental epidemiology is to hypothesize a relationship between a particular exposure and a particular outcome and then estimate the health risks. In this talk I will present two case studies from my research that move beyond this standard one-exposure, one-outcome paradigm. The first case study considers the problem of estimating the effects of multiple exposures on a single outcome. We propose a new approach for estimating the health effects of multi-pollutant mixtures, Bayesian kernel machine regression, which simultaneously estimates the (potentially high-dimensional) exposure-response function and incorporates variable selection to identify important mixture components. The second case study considers the effects of a single exposure (heat waves) on multiple outcomes (cause-specific hospitalization rates).
Rather than prespecifying a small number of individual diseases, we jointly consider all 15,000 possible discharge diagnosis codes and identify the full spectrum of diseases associated with exposure to heat waves among 23.7 million older adults. Through these case studies, we find that approaches that consider multiple exposures and/or multiple outcomes have the potential to lead to new scientific insights.
Wednesday, January 28, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Arnon Boneh (Tel Aviv University) Title: Review of the classical group-testing problem and some new results Abstract: The classical (I,N,q) group-testing (GT) problem is: In a (finite) population
of N identical members each one either has a well defined property (="good") or
does not have it (="bad"). There are N corresponding i.i.d. random variables each
having probability q of being "good" (0
Wednesday, February 25, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Alberto G Busetto (UCSB) Title: Near-optimal Bayesian design of experiments for model selection of nonlinear dynamical systems Abstract: The identification of dynamic processes is both statistically and computationally challenging. This talk introduces formal guarantees of near-optimal Bayesian design of experiments aimed at selecting nonlinear dynamical systems from data. We prove by reduction to graphical modeling and maximal coverage approximation that the joint selection of the most informative time points and components of the state space can be performed in polynomial time and near-optimally, and with the best constant approximation factor unless P=NP. We further discuss the case of selecting active interventions, and show that these same bounds apply under certain sufficient conditions, for instance when it is possible to restart the system, or perform experiments in parallel. We conclude by reporting our results in applications of concrete biomedical interest, such as phosphoproteomics and personalized treatment.
Wednesday, March 4, South Hall 5607F, 3:30-4:30PM, Refreshments served at 3:15 PM Dr. Wade Herndon (UCSB) Title: Testing and Adjusting for Informative Selection in Survey Data Abstract: When sampling from a finite population, a combination of cost, efficiency, and logistical concerns often leads to a complex sample selection mechanism. Sampling weights are routinely used when analyzing survey data to account for the selection mechanism, and reflect the fact that some subgroups of the population are over- or under-represented in the sample. Due to the selection mechanism, the distribution of the data in the sample may be different from the distribution of the data in the target population. This is known as informative selection.
When fitting statistical models to survey data, the sampling weights can be used to account for informative selection. One approach is to construct weighted estimates of model parameters that are consistent under proper model specification, whether or not informative selection is present. Another approach is to use the sampling weights to assess whether or not informative selection is present, and to adjust the model if it is present. This talk will focus on the latter approach to using the weights: assessing the presence of informative selection with a new likelihood ratio test, and adjusting for informative selection with a new semiparametric procedure. Asymptotic theory for the likelihood ratio test under the null hypothesis of noninformative selection and under local alternatives is described, along with a bootstrap version and applications to survey data. The semiparametric approach to adjusting for informative selection uses standard, model-based approaches for fitting linear models combined with nonparametric, weighted estimators to account for the informative sampling design.
Wednesday, March 11, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Sang-Yun Oh (Lawrence Berkeley National Lab) Title: Principled and Scalable Methods for High Dimensional Graphical Model Selection Abstract: Learning high dimensional graphical models is a topic of contemporary interest. A popular approach is to use L1 regularization methods to induce sparsity in the inverse covariance estimator, leading to sparse partial covariance/correlation graphs. Such approaches can be grouped into two classes: (1) regularized likelihood methods and (2) regularized regression-based, or pseudo-likelihood, methods. Regression based methods have the distinct advantage that they do not explicitly assume Gaussianity. One major gap in the area is that none of the popular approaches proposed for solving regression based objective functions guarantee the existence of a solution.
Hence it is not clear if resulting estimators actually yield correct partial correlation/partial covariance graphs. To this end, we propose a new regression based graphical model selection method that is both tractable and has provable convergence guarantees, leading to well-defined estimators. In addition, we demonstrate that our approach yields estimators that have good large sample properties and computational complexity. The methodology is illustrated on both real and simulated data. We also present a novel unifying framework that places various pseudo-likelihood graphical model selection methods as special cases of a more general formulation, leading to important insights. (Joint work with Bala Rajaratnam and Kshitij Khare)
Wednesday, April 1, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Vilmos Prokaj (UCSB) Title: Shadow price for power utility Abstract: Consider the Black-Scholes model with one risky asset. The investor's goal is to maximize her expected discounted utility from consumption. This is the classical Merton problem, which was investigated in the frictionless model as well as when the investor faces proportional transaction costs. Recently, Kallsen and Muhle-Karbe proposed a new way to look at the problem using the notion of the shadow price or shadow market. The latter is a frictionless market with a price process (this is the shadow price) evolving in the bid-ask spread of the original market. The shadow market allows more admissible strategies; therefore, when the optimal policy of the shadow market is admissible in the original market, it is obviously optimal in both markets. That way one can find the optimal policy of a market with transaction costs by finding a suitable frictionless model and solving the Merton problem there. Kallsen and Muhle-Karbe demonstrated their method of finding the shadow market in the case of logarithmic utility.
What makes the logarithmic utility easy to handle is the fact that under quite general assumptions the consumption is a fixed proportion of wealth, determined by the impatience factor of the question. This is not true for power utility, and that prevented them from extending the method to this case. In a joint work with Attila Herczegh, we overcame this difficulty by showing that the marginal rate of substitution is a good candidate for the shadow price. As a byproduct we got the asymptotic expansion of the non-trading region and relative consumption rate for small transaction costs.
Wednesday, April 8, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Sheldon Ross (USC) Title: Some Gambler Ruin Type Problems Abstract: We consider two multiplayer variants of the classical gambler's ruin problem. In both models we suppose there are k players, with initial fortunes n_1, ..., n_k. In the first model, at each stage two of the players with positive fortunes are chosen to play a game in which each is equally likely to win, with the loser giving one unit to the winner. In the second model, at each stage each player that has a positive fortune puts one unit into a pot which is equally likely to be won by each of them. In both models we are interested in such quantities as the mean number of games played. The first model is solved by elementary arguments, whereas the analysis in the second model utilizes martingale theory.
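The first model in the abstract above is easy to simulate. The sketch below estimates the mean number of games by Monte Carlo and compares it with the sum of n_i * n_j over all pairs i < j. That closed form is not stated in the abstract; it is included here as a check because the sum of pairwise products of the current fortunes drops by exactly one in expectation per fair game and is zero once a single player holds everything. The uniform choice of the playing pair and all function names are assumptions of this example.

```python
import random
from itertools import combinations

def play_until_one_left(fortunes, rng):
    """Simulate model 1 once; return the number of games until one player holds all units."""
    f = list(fortunes)
    games = 0
    while True:
        alive = [i for i, v in enumerate(f) if v > 0]
        if len(alive) < 2:
            return games
        # Assumption: the pair is drawn uniformly among players with positive fortune.
        winner, loser = rng.sample(alive, 2)
        if rng.random() < 0.5:          # fair game: either player wins with probability 1/2
            winner, loser = loser, winner
        f[winner] += 1
        f[loser] -= 1
        games += 1

def mean_games(fortunes, trials=5000, seed=0):
    """Monte Carlo estimate of the mean number of games played."""
    rng = random.Random(seed)
    total = sum(play_until_one_left(fortunes, rng) for _ in range(trials))
    return total / trials

def pairwise_product_sum(fortunes):
    """Sum of n_i * n_j over pairs i < j, used here as the reference value."""
    return sum(a * b for a, b in combinations(fortunes, 2))

start = (2, 1, 1)
print("simulated mean:", mean_games(start))
print("pairwise sum:  ", pairwise_product_sum(start))
```

With two players this reduces to the familiar n_1 * n_2 expected duration of the fair two-player gambler's ruin.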
Wednesday, April 22, ESB 1001 (Engineering Science Building), 3:30-5PM, Refreshments served at 3:15 PM Dr. Barry Arnold (UCR) Sobel seminar Title: Some alternatives to the classical multivariate normal model Abstract: Although the classical multivariate model is aesthetically enticing, it is not unusual to encounter data sets which do not seem to fit the Gaussian model. A variety of flexible augmentations of the classical model have received attention in recent decades. A full survey will not be attempted. Instead we will concentrate on a few of the directions in which model augmentation has been proposed. Specifically we will discuss: conditional specification, hidden truncation, contour specification and a generalized Rosenblatt construction.
Dr. Theodore P. Hill (Professor Emeritus of Mathematics, Georgia Tech and Research Scholar in Residence, California State Polytechnic Univ.) Title: Recent Advances in the Theory and Applications of Benford’s Law Abstract: Benford’s law, an empirical statistical phenomenon first observed in the nineteenth century, is now being used to detect earthquakes and alterations in digital images, to analyze math models and computational errors in scientific calculations, and to check for fraud in tax and voting returns. Benford’s law predicts that the significant digits of many data sets will be logarithmically distributed, rather than uniformly distributed as might be expected; for example, more than 30% of the leading decimal digits will be 1, and fewer than 5% will be 9. This talk will briefly survey both new empirical evidence of Benford’s law in natural sciences, and very recent mathematical discoveries that help further explain the ubiquity of this distribution. Several of the latest applications will also be described, along with open problems in probability and statistics, dynamical systems, number theory, and differential equations. The talk will be aimed at the nonspecialist.
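The percentages quoted in the Benford's law abstract above follow from the standard first-digit form of the law, P(d) = log10(1 + 1/d) for d = 1, ..., 9. The short sketch below tabulates these probabilities; the function name is made up for this example.

```python
import math

def benford_probabilities():
    """Return {d: log10(1 + 1/d)} for leading decimal digits d = 1..9."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_probabilities()
for digit, p in probs.items():
    print(f"leading digit {digit}: {p:.3f}")
```

The nine probabilities sum to one because the logarithms telescope to log10(10).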
Wednesday, May 6, South Hall 5607F, 3:30-5PM, Refreshments served at 3:15 PM Dr. Tomasz J. Kozubowski (University of Nevada, Reno) Title: Multivariate stochastic models involving sums and maxima Abstract: We present a class of stochastic models connected with the joint distribution of (X,Y,N), where N is a deterministic or random integer while X and Y are, respectively, the sum and the maximum of N random variables, independent of N. Models of this form, particularly with random N, are desirable in many applications, ranging from hydro-climatology, to finance and insurance. Our construction is built upon a basic model involving a deterministic number n of IID exponential variables, where the basic characteristics of the joint distribution of (X,Y) admit explicit forms. We describe this special case in detail, and proceed with generalizations going beyond the exponential distribution and/or the IID assumption. One particular model we shall present involves the sum and the maximum of dependent, heavy-tail Pareto components. Another example with geometrically distributed N, representing the duration of the growth period of the daily log-returns of currency exchange rates, will be used to illustrate modeling potential of this construction. This research was partially carried out jointly with M. Arendarczyk, A.K. Panorska, and F. Queadan.
Dr. Kaushik Ghosh (University of Nevada, Las Vegas) Title: Sampling designs via a multivariate hypergeometric-Dirichlet process model for a multi-species assemblage with unknown heterogeneity Abstract: In a sample of mRNA species counts, sequences without duplicates or with small numbers of copies are likely to carry information related to mutations or diseases and can be of great interest. However, in some situations, sequence abundance is unknown and sequencing the whole sample to find the rare sequences is not practically possible.
To collect mRNA sequences of interest, we propose a two-phase Bayesian sampling method that addresses these concerns. The first phase of the design is used to infer sequence (species) abundance through a cluster analysis applied to a pilot data set. The clustering method is built upon a multivariate hypergeometric model with a Dirichlet process prior for species’ relative frequencies. The second phase, through Monte Carlo simulations, infers the sample size needed to collect a certain number of species of particular interest. Efficient posterior computing schemes are proposed. The developed approach is demonstrated and evaluated via simulations. An mRNA segment data set is used to illustrate and motivate the proposed method.
Wednesday, May 20, South Hall 5607F, 3:30-5 PM, Refreshments served at 3:15 PM Dr. Stéphane Guerrier (University of Illinois at Urbana-Champaign) Title: Robust inference for time series models: a wavelet-based framework Abstract: We present a new framework for the robust estimation of time series models which is fairly general and, for example, covers models ranging from ARMA to state-space models. This approach provides estimators which are (i) consistent and asymptotically normally distributed, (ii) applicable to a broad spectrum of time series models, (iii) straightforward to implement and (iv) computationally efficient. The framework is based on the recently developed Generalized Method of Wavelet Moments and a new robust estimator of the wavelet variance. Compared to existing methods, the latter directly estimates the quantity of interest while performing better in finite samples and using milder conditions for its asymptotic properties to hold. Hence, this paper provides not only an alternative estimator that allows wavelet variance analysis to be performed when data are contaminated, but also a general approach to robustly estimating the parameters of a variety of time series models.
The simulation studies confirm the better performance of the proposed estimators, and the usefulness and breadth of the proposed methodology are shown using practical examples from hydrology and engineering with sample sizes up to 500,000.
Wednesday, May 27, South Hall 5607F, 3:30-5 PM, Refreshments served at 3:15 PM Dr. Andrew Papanicolaou (University of Sydney) Title: Filtering the Maximum Likelihood for Multiscale Problems Abstract: Filtering and parameter estimation under partial information for multiscale problems are studied in this paper. After proving mean square convergence of the nonlinear filter to a filter of reduced dimension, we establish a central limit theorem type correction for the conditional (on the observations) log-likelihood process. To achieve this we assume that the operator of the (hidden) fast process has a discrete spectrum. Based on these results, we then propose to estimate the unknown parameters of the model based on the limiting log-likelihood, which is of reduced dimension and easier to work with. We also establish consistency and asymptotic normality of the maximum likelihood estimator based on the reduced log-likelihood. Simulation results support our theoretical findings.


Statistics & Applied Probability, University of California, Santa Barbara, California 93106-3110, (805) 893-2129, South Hall 5607A