Next: Human Circadian Rhythms Up: Examples Previous: Core Body Temperature Data

fda.html, includes mean monthly temperatures at 35 Canadian weather stations. The left panel in Figure shows the plots of mean temperature from 4 selected weather stations.

Ramsay and Silverman (1997) and Ramsay and Li (1998) used this dataset to illustrate the usefulness of Functional Data Analysis (FDA). Among others, they addressed the following two questions: how to measure the departure from a sinusoidal model and how to register the curves for further analyses. In the following, we approach these questions with SNM models.

We assume that mean temperatures over month at all weather stations share a common shape function. Variation between stations are modeled by vertical shift, vertical scale and horizontal shift parameters. That is, we assume that

 (86)

where is the temperature of the th month at station ; is the time point scaled to ; , where is an unstructured covariance matrix; 's are errors modeled by an AR(1) within-subject correlation structure with lag 1 autocorrelation coefficient and variance ; , fixed, is the annual mean temperature for all stations; is the vertical shift; is the amplitude; and is the horizontal shift of station . Since the mean temperature is a periodic function with a period equal to one year, is periodic with a period equal to 1. Thus we assume . To make the constant and identifiable, we need the side condition . This is equivalent to assuming that , where represents all constant functions. Since the sinusoidal form provides rough approximation to the temperature profile, we use the -spline with . The GML estimate of the smoothing parameter approaches to zero slowly as iteration increases, which leads to wiggly estimates. To get smoother estimates, we set a lower bound for the smoothing parameter. Specifically, . To save time, we also change the convergence criterion from the default to .
> canada.fit <- snm(temp~b1+exp(b2)*f(month-alogit(b3)),
func=f(u)~list(~sin(2*pi*u)+cos(2*pi*u)-1,lspline(u,type=''sine0'')),
fixed=list(b1~1), random=list(b1+b2+b3~1), cor=corAR1(),
control=list(rkpk.control=list(limnla=c(-4,0)),prec.out=0.005))
...
AIC    BIC    logLik
1529.772 1607.5 -745.6426

Random effects:
Formula: list(b1 ~ 1, b2 ~ 1, b3 ~ 1)
Level: station
Structure: General positive-definite, Log-Cholesky parametrization
StdDev     Corr
b1       5.94427125 b1     b2
b2       0.30031948 -0.745
b3       0.06308751 -0.013 -0.462
Residual 1.48044625

Correlation Structure: AR(1)
Formula: ~1 | station
Parameter estimate(s):
Phi
0.7258356
Fixed effects: list(b1 ~ 1)
Value Std.Error  DF  t-value p-value
b1 1.690498 0.6188728 385 2.731576  0.0066

GCV estimate(s) of smoothing parameter(s): 0.0001000000
Equivalent Degrees of Freedom (DF):  10.24364

Converged after 5 iterations

The resulting fits are plotted in the left panel of Figure for the four selected stations. Fits for other stations are similar. The estimated correlation coefficient is with an approximate confidence interval . Thus there is evidence that serial correlation does exist in the data. Figure shows the overall fit of the common curve and its projections on subspaces (sinusoidal part) and (remaining part) together with their Bayesian confidence intervals. The small departure from sinusoidal form is statistically significant, especially in Spring and Fall. Our conclusion here is comparable to that of Ramsay and Silverman (1997), but their approach is on individual stations and does not provide a formal test.

By removing the horizontal shift in each curve, we can easily align all curves. Therefore our method can be used for curve registration.

grid <- NULL
for (i in 1:35) {
grid <- c(grid,seq(0,1,len=40)+
alogit(canada.fit$coef$randomstation[70+i])+.5) } p.canada.2 <- predict(canada.fit, newdata=data.frame(month=grid, station=rep(1:35,rep(40,35))))  The right panel in Figure displays the fitted curves for all 35 stations after aligning them together by removing shifts. Below is new Ramsay and Silverman (1997) also fitted various functional linear models (FLM) to this data set. The functional data can appear in the FLM as (i) the dependent variable, (ii) the independent variable, or (iii) both. We now show that those FLM are special cases of general smoothing spline models. Therefore they can be fitted by the ssr function. To investigate geographical differences in the pattern of the annual tempreture function, Ramsay and Silverman (1997) divided Canada into four meteorological zones: Atlantic, Pacific, Continental and Arctic. Then they considered the following FLM (Model (9.1) in Ramsay and Silverman (1997))  (87) where and correspond to climate zones Atlantic, Pacific, Continental and Arctic respectively, is the th temperature function in the th zone, is the grand mean function, represent zone effect, and are random errors. Model () is an example of case (i). Now consider expected temperature as a function of both zone and time : . Suppose that we want to model zone effect using an one-way ANOVA model and time effect using a periodic spline. Then we have the following SS ANOVA decomposition  (88) where is a constant, and are the main effects of zone and time, and is the interaction between zone and time. Comparing () with (), it is obvious that and . Therefore, model () can be regarded as an SS ANOVA model. Discretize the functional response by 12 months, we can fit the data as follows. Dense discretization such as by days could also be used. > temps <- matrix(scan(monthtemp.dat'',0), 35, 12, byrow=T) > atlindex <- c(1,2,3,4,5,6,7,8,9,10,11,13,14,16) > pacindex <- c(25,26,27,28,29) > conindex <- c(12,15,17,18,19,20,21,22,23,24,30,31,35) > artindex <- c(32,33,34) > t <- seq(0.5, 11.5, 1)/12 > tgrid <- seq(0,1,len=50) > x <- apply(temps,1,sum) > n <- 35*12 > temp <- as.vector(t(temps[c(atlindex,pacindex,conindex,artindex),])) > grp <- as.factor(rep(1:4,12*c(14,5,13,3))) > tempdata <- data.frame(temp=temp,t=rep(t,35),grp=grp) > canada4 <- ssr(temp ~ grp, spar="m",data=tempdata, rk=list(periodic(t),rk.prod(periodic(t),shrink1(grp)))) > summary(canada4) Smoothing spline regression fit by GML method Call: ssr(formula = temp ~ grp, rk = list(periodic(t), rk.prod(periodic(t), shrink1(grp))), data = tempdata, spar = "m") Coefficients (d): (Intercept) grp2 grp3 grp4 4.787500 2.809167 -5.204167 -16.651389 GML estimate(s) of smoothing parameter(s) : 0.2622870 0.9971152 Equivalent Degrees of Freedom (DF): 23.74366 Estimate of sigma: 3.762337 Number of Observations: 420 > grid <- list(grp=as.factor(rep(1:4,rep(50,4))),t=rep(tgrid,4)) > p.canada4 <- predict(canada4,newdata=expand.grid(t=tgrid,grp=as.factor(1:4)), terms=c(1,1,1,1,0,1))  Figures and display the estimated zone effects and the fits for four regions. They are very similar to Figures 9.1 and 9.2 in Ramsay and Silverman (1997). In addition to the estimates, we can provide confidence intervals. To investigate the possible replationship between total annual precipitation and the temperature function, Ramsay and Silverman (1997) considered the following FLM (Model (10.3))  (89) where is the logarithm of total annual precipitation at station , is a constant parameter, is the temperature function at station , is a unknown weight function, and are random errors. The goal is to estimate the weight function . Model () is an example of case (ii). Without loss of the generality, suppose that the time interval has been transformed into . That it, . It is reasonable to assume that is a smooth periodic function. Specifically, we model using cubic periodic spline space . Define linear functionals . are bounded when which we assume to be true. Then model () is a partial spline model. Denote where . Let be the reproducing kernel (RK) matrix for . From the definition in Section , the th elements of the matrix  (90) where is the RK of , and represnet middle point of month , is the temperature of month at station , , and . We used simple summations to approximate the integrals. More accurate approximations can be used. We also ignored a scaling constant () since it can be absorbed by the smoothing parameter. > prec <- matrix(scan(monthprec.dat'',0), 35, 12, byrow=T) > y <- log(apply(prec,1,sum)) > R <- temps %*% periodic(t) %*% t(temps) > canada5 <- ssr(y ~ x, rk=R) > summary(canada5) Smoothing spline regression fit by GML method Call: ssr(formula = y ~ x, rk = R, spar = "m") Coefficients (d): (Intercept) x 6.433069274 0.006122233 GML estimate(s) of smoothing parameter(s) : 0.006772994 Equivalent Degrees of Freedom (DF): 4.062615 Estimate of sigma: 0.3588121 Number of Observations: 35  The estimated weight function is represented by where is an estimate of . To compute at grid points of size , where is a vector of 1's of size , , , is the RK of , is a matrix of evaluated at all combinations of and , is a matrix with each row consists monthly tempretures at each station, and . Bayesian confidence intervals are not available for non-evaluational functionals. Therefore we use the following bootstrap procedure to compute confidence intervals. We used simple percentile intervals. Other approaches may be used (Wang and Wahba, 1995). > S <- periodic(tgrid,t)%*%t(temps) > f <- canada5coef$d[2] + S%*%canada5$coef$c > nboot <- 999 > fb <- NULL > for (i in 1:nboot) { yb <- canada5$fit+sample(canada5$resi,35,replace=T) bfit <- ssr(yb ~ x, rk=R) fb <- cbind(fb,bfit$coef$d[2] + S%*%bfit$coef$c) } > lb <- apply(fb,1,quantile,prob=.025) > ub <- apply(fb,1,quantile,prob=.975)  We used the default GCV method to select the smoothing parameter in the above computation. We repeated the computation with the GML method to select the smoothing parameter. Figure displays the estimated regression weight functions. It is interesting to note that the estimates with GCV and GML choices of the smoothing parameter are similar to the upper left plot and the lower right plot in Figure 10.3 of Ramsay and Silverman (1997). As indicated in Figure 10.6 of Ramsay and Silverman (1997) that it is difficult to get a precise estimate of the smoothing parameter due to the small sample size. Interestingly that the GCV method picks an estimate of the smoothing parameter that is close to the lower bound and the GML method picks an estimate that is close to the upper bound. Wide confidence intervals are the consequence of the small sample size (). To investigate the dependence of the complete log precipitation profile on the complete tempreture profile, Ramsay and Silverman (1997) considered the following FLM  (91) where is the logarithm of annual precipitation profile at station , plays the part of an intercept in the standard regression, is the temperature function at station , is a unknown weight function at time , and are random errors processes. The goal is to estimate and . Model () is an example of case (iii). The expected annual precipitation profile depends on two non-parametric functions: and . Again, we assume that , and and are smooth periodic functions. Specifically, we assume that , and . Equivalently, we use the following SS ANOVA decomposition for and :  (92) Again, we discretize the annual precipitation profile by 12 months. Model () becomes  (93) where is the logarithm of precipitation in month at station , , , and . For simplicity, we assume random errors are independent. Models with correlated random errors can be fitted similarly. Let , , , , , , and . Then model () can be written in a matrix form  (94) We have two parametric terms, and , and four non-parametric terms, , , and , in model (). To fit this model, we need to find RK matrices for four non-parametric terms. It is not difficult to see that the RK matrices for , and are , , , where is the RK of , , represents a matrix of evaluated at all combinations of , and is a matrix defined in (). To compute the RK matrix for , we define the linear functional and assume that it is bounded. Since the RK of a tensor product space is the product of the two RK's, the RK of is . Then elements of the repreducing kernel matrix for is Thus the RK matrix for is , where rk.prod prepresents elementwise product of two matrices. > prec <- matrix(scan(monthprec.dat'',0), 35, 12, byrow=T) > y <- as.vector(t(log(prec))) > xx <- rep(x,rep(12,35)) > R1 <- kronecker(matrix(1,35,35),periodic(t)) > R2 <- kronecker(R,matrix(1,12,12)) > R3 <- kronecker(x%*%t(x),periodic(t)) > R4 <- rk.prod(R1,R2) > canada6 <- ssr(y ~ xx, rk=list(R1,R2,R3,R4)) > summary(canada6) Smoothing spline regression fit by GCV method Call: ssr(formula = y ~ xx, rk = list(R1, R2, R3, R4)) Coefficients (d): (Intercept) xx 4.745601623 0.003035296 GCV estimate(s) of smoothing parameter(s) : 1.727184e+03 3.630885e-02 1.416582e+07 4.597862e-04 Equivalent Degrees of Freedom (DF): 60.52553 Estimate of sigma: 0.3549322 Number of Observations: 420  We now show how to compute estimated functions evaluated at grid points. We stacked all observations as . Denote the corresponding months for as . Denote as the total number of observations. Then the estimated functions are represented by where is the integer part of . To compute , , at grid points and for and respectively, where , , and . The interaction is a bivariate function. Thus we evaluate it at a bivariate grid : where is a matrix with . We also compute bootstrap confidence intervals. > S1 <- kronecker(t(rep(1,35)),periodic(tgrid,t)) > S2 <- kronecker(S,t(rep(1,12))) > S3 <- kronecker(t(x),periodic(tgrid,t)) > S4 <- NULL > for (i in 1:50) {for (j in 1:50) S4 <- rbind(S4,S1[i,]*S2[j,])} > alpha <- canada6$coef$d[1] + n*canada6$lambda[1]*S1%*%canada6$coef$c
> beta0 <- canada6$coef$d[2]
> beta1 <- n*canada6$lambda[2]*S2%*%canada6$coef$c > beta2 <- n*canada6$lambda[3]*S3%*%canada6$coef$c
> beta12 <- n*canada6$lambda[4]*S4%*%canada6$coef$c > beta <- beta0 + rep(beta1,rep(50,50)) + rep(beta2,50) + beta12 > nboot <- 99 > alphab <- betab0 <- betab1 <- betab2 <- betab12 <- betab <- NULL > for (i in 1:nboot) { yb <- canada6$fit+rnorm(n,sd=canada6$sig) bfit <- ssr(yb ~ xx, rk=list(R1,R2,R3,R4)) alphab <- cbind(alphab,bfit$coef$d[1] + n*bfit$lambda[1]*S1%*%bfit$coef$c)
bb0 <- bfit$coef$d[2]
bb1 <- n*bfit$lambda[2]*S2%*%bfit$coef$c bb2 <- n*bfit$lambda[3]*S3%*%bfit$coef$c
bb12 <- n*bfit$lambda[4]*S4%*%bfit$coef\$c
betab0 <- c(betab0,bb0)
betab1 <- cbind(betab1,bb1)
betab2 <- cbind(betab2,bb2)
betab12 <- cbind(betab12,bb12)
betab <- cbind(betab, bb0 + rep(bb1,rep(50,50)) + rep(bb2,50) + bb12)
}
> lba <- apply(alphab,1,quantile,prob=.025)
> uba <- apply(alphab,1,quantile,prob=.975)
> lbb1 <- apply(betab1,1,quantile,prob=.025)
> ubb1 <- apply(betab1,1,quantile,prob=.975)
> lbb2 <- apply(betab2,1,quantile,prob=.025)
> ubb2 <- apply(betab2,1,quantile,prob=.975)
> lbb12 <- apply(betab12,1,quantile,prob=.025)
> ubb12 <- apply(betab12,1,quantile,prob=.975)
> lbb <- apply(betab,1,quantile,prob=.025)
> ubb <- apply(betab,1,quantile,prob=.975)


Figure displays the estimates of and . The scale, as well as the GCV estimates of the smoothing parameters ( ), indicate that the interaction is not small. Figures and display splices of the estimated interaction function and their 95% bootstrap confidence intervals with one variable fixed approximately at the middle points of 12 months. Figures and display splices of the estimated function and their 95% bootstrap confidence intervals with one variable fixed approximately at the middle points of 12 months. Figure displays estimates of , and functions.

We also fit the data using the GML choice of the smoothing parameter.

> canada6.1 <- ssr(y ~ xx, rk=list(R1,R2,R3,R4),spar="m")
Smoothing spline regression fit by GML method
Call: ssr(formula = y ~ xx, rk = list(R1, R2, R3, R4), spar = "m")

Coefficients (d):
(Intercept)          xx
4.7327087   0.0030902

GML estimate(s) of smoothing parameter(s) :
7.195469e-01 4.852421e-02 3.633548e+04 3.613019e+00
Equivalent Degrees of Freedom (DF):  27.24519
Estimate of sigma:  0.3733665

Number of Observations:  420


The GML estimates of smoothing parameters indicates that both the interaction and the main effect are small. Figure displays the estimates of and . Figure displays estimates of , and functions.

Next: Human Circadian Rhythms Up: Examples Previous: Core Body Temperature Data
Yuedong Wang 2004-05-19