next up previous
Next: Global Climate Data Up: Examples Previous: Examples

Arosa Ozone Data

This is a data set in Andrews and Herzberg (1985). Monthly mean ozone thickness (Dobson units) in Arosa, Switzerland from 1926 to 1971 was recorded. The data is available as Arosa. We are interested in investigating how ozone thickness changes over months and years.

We use this data to illustrate how to fit a periodic spline, a partial spline, an $L$-spline, an $L$-spline with unequal variances, an $L$-spline spline with correlated random errors, a partial spline with both variables month and year, an SS ANOVA model, partial splines for the whole time series, a semi-parametric linear mixed effects model, and some varying coefficients models.

Let us ignore the year effect first and concentrate on the month effect on ozone thickness. Let $\mbox{{\tt csmonth}}=(\mbox{{\tt month}}-.5)/12$. We first fit a parametric sine and cosine model

$\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}})
=\mu + \beta_1 \sin...
...
+\beta_2 \cos ( 2 \pi \mbox{{\tt csmonth}})
+ \epsilon (\mbox{{\tt csmonth}}),$     (58)

and a periodic spline
$\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}})
=f({\tt csmonth}) + \epsilon (\mbox{{\tt csmonth}}),$     (59)

where $f \in W_2(per)$.
> data(Arosa)
> Arosa$csmonth <- (Arosa$month-0.5)/12
> attach(Arosa)
> ozone.fit1 <- lm(thick~sin(2*pi*csmonth)+cos(2*pi*csmonth), data=Arosa)
> summary(ozone.fit1)
...
Coefficients:
                      Estimate Std. Error t value Pr(>|t|)    
(Intercept)           337.0986     0.7605 443.263  < 2e-16 ***
sin(2 * pi * csmonth) -47.3881     1.0719 -44.209  < 2e-16 ***
cos(2 * pi * csmonth)   7.7966     1.0790   7.226 1.81e-12 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 17.31 on 515 degrees of freedom
Multiple R-Squared: 0.7958,	Adjusted R-squared: 0.795 
F-statistic:  1003 on 2 and 515 DF,  p-value: < 2.2e-16 

> ozone.fit2 <- ssr(thick~1, rk=periodic(csmonth), spar="m", data=Arosa) 
> summary(ozone.fit2)
...
Coefficients (d):
 (Intercept) 
     337.091

GML estimate(s) of smoothing parameter(s) : 1.717233e-06 
Equivalent Degrees of Freedom (DF):  9.0131 
Estimate of sigma:  16.78767 

> anova(ozone.fit2,simu.size=500)
Testing H_0: f in the NULL space

    test.value simu.size simu.p-value approximate.p-value 
LMP  0.2663855       500            0                    
GML  0.2025431       500            0                   0

Figure: Parametric and periodic spline fits. Points are observations; Solid line is the periodic spline fit; Dotted lines are 95% Bayesian confidence intervals; Dashed line is the parametric fit.
\begin{figure}\centerline{\psfig{figure=ozone1.ps,height=3.5in,width=4.5in}}\end{figure}

Figure [*] shows that parts of the parametric sine and cosine fit are outside the confidence intervals of the periodic spline fit. It suggests that the simple parametric model may not be sufficient. We can test the departure from the sine-cosine model using a partial spline model by adding the sine and cosine functions to the null space of a periodic spline

    $\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}})$  
  $\textstyle =$ $\displaystyle \mu + \beta_1 \sin ( 2 \pi \mbox{{\tt csmonth}})
+ \beta_2 \cos (...
...box{{\tt csmonth}})
+f(\mbox{{\tt csmonth}}) + \epsilon (\mbox{{\tt csmonth}}),$ (60)

where $f \in W_2(per) \ominus \{ 1 \}$.

> ozone.fit3 <- ssr(thick~sin(2*pi*csmonth)+cos(2*pi*csmonth), 
                  rk=periodic(csmonth), spar="m", data=Arosa)
> summary(ozone.fit3)
...
Coefficients (d):
 (Intercept) sin(2 * pi * csmonth) cos(2 * pi * csmonth) 
    337.0933             -47.42298              7.696095

GML estimate(s) of smoothing parameter(s) : 4.49682e-06 
Equivalent Degrees of Freedom (DF):  7.46103 
Estimate of sigma:  16.78036 
> anova(ozone.fit3,simu.size=500)
Testing H_0: f in the NULL space

     test.value simu.size simu.p-value approximate.p-value 
LMP 0.001262064       500            0                    
GML   0.9553638       500            0        3.490208e-12

The departure from the parametric model is significant. Note that with penalty $\int_0^1 (f^{\prime \prime})^2 dt$, sine and cosine functions we added to the null space are also in the space $W_2(per) \ominus \{ 1 \}$. Thus the parametric part of the partial spline model ([*]) is not orthogonal to $W_2(per) \ominus \{ 1 \}$. Another approach to test the parametric model is to use a periodic $L$-spline model with $L=D[D^2+(2\pi)^2]$:

    $\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}})$  
  $\textstyle =$ $\displaystyle \mu + \beta_1 \sin ( 2 \pi \mbox{{\tt csmonth}})
+ \beta_2 \cos (...
...box{{\tt csmonth}})
+f(\mbox{{\tt csmonth}}) + \epsilon (\mbox{{\tt csmonth}}),$ (61)

where $f \in W_2(per) \ominus \{ 1,~
\sin ( 2 \pi \mbox{{\tt csmonth}}),~
\cos ( 2 \pi \mbox{{\tt csmonth}})\}$. Now the sine and cosine functions are orthogonal to $f$. Thus this approach will be more efficient. See Wang and Brown (1996) for detail.

> ozone.fit4 <- update(ozone.fit3, rk=lspline(csmonth,type="sine1"))
> summary(ozone.fit4,simu.size=500)
...
Coefficients (d):
 (Intercept) sin(2 * pi * csmonth) cos(2 * pi * csmonth) 
    337.0937             -47.42247              7.695888

GML estimate(s) of smoothing parameter(s) : 5.404848e-09 
Equivalent Degrees of Freedom (DF):  6.421139 
Estimate of sigma:  16.77356 
> anova(ozone.fit4,simu.size=500)
Testing H_0: f in the NULL space

      test.value simu.size simu.p-value approximate.p-value 
LMP 2.539163e-06       500            0                    
GML    0.9533696       500            0        1.164513e-12

Figure [*] shows plots of the overall fits and their projections for three models: ozone.fit2, ozone.fit3 and ozone.fit4. As we can see, all three models have similar overall fits. But their projections are quite different. As expected, the confidence intervals for the projections of the $L$-spline model ([*]) are narrower than those of the partial spline model ([*]).

Figure: Overall fits and their projections. ``Smooth'' represents the component in the space ${\cal H}_1$. Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone2.ps,height=6.5in,width=6in}}\end{figure}

Figure [*] suggests that the variances may not be a constant. We calculate residual variances for each month and plot them on the log scale (left panel of Figure [*]). It is obvious that they are not equal. It seems that simple sine and cosine functions can be used to model the variance function.

> v <- sapply(split(ozone.fit4$resi,month),var)
> a <- unique(csmonth)
> b <- lm(log(v)~sin(2*pi*a)+cos(2*pi*a))
> summary(b)
...
Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      5.43715    0.05763  94.341 8.57e-15 ***
sin(2 * pi * a) -0.71786    0.08151  -8.807 1.02e-05 ***
cos(2 * pi * a) -0.49854    0.08151  -6.117 0.000176 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 0.1996 on 9 degrees of freedom
Multiple R-Squared: 0.9274,	Adjusted R-squared: 0.9113 
F-statistic: 57.49 on 2 and 9 DF,  p-value: 7.48e-06

Therefore we assume the following variance function for model ([*])

\begin{displaymath}
\sigma^2(\mbox{{\tt csmonth}}) =
\mbox{Var}(\epsilon(\mbox{...
...ox{{\tt csmonth}}) +
b \cos (2 \pi \mbox{{\tt csmonth}})] \}.
\end{displaymath}

> ozone.fit5 <- update(ozone.fit4, weights=varComb(varExp(form=
                ~sin(2*pi*csmonth)), varExp(form=~cos(2*pi*csmonth))))
> summary(ozone.fit5)
...
Coefficients (d):
 (Intercept) sin(2 * pi * csmonth) cos(2 * pi * csmonth) 
    337.0753             -47.42951              7.773344

GML estimate(s) of smoothing parameter(s) : 3.676933e-09 
Equivalent Degrees of Freedom (DF):  6.84808 
Estimate of sigma:  15.22469 

Combination of:
Variance function structure of class varExp representing
      expon 
 -0.3555964
Variance function structure of class varExp representing
      expon 
 -0.2496355
The estimated variance parameters, $-0.3555964$ and $-0.2496355$, are very close to (up to a scale of 2 by definition) those based on residual variances, $-0.7178552$ and $-0.4985431$. The fitted variance function is plotted on the left panel of Figure [*]. As can be seen, it is almost identical to the fit based on residual variances. The right panel of Figure [*] plots the $L$-spline fit to the mean function with 95% Bayesian confidence intervals. Note that these confidence intervals are conditional on the estimated variance parameters. Thus they may have less coverage than the nominal value since the degrees of freedom for estimating the variance parameters are not counted. Nevertheless, we can see that unequal variances are reflected in the widths of these confidence intervals.

Figure: Left: circles are residual variances on the log scale, dotted line is the fit to residual variances, and solid line is the fit from ozone.fit5. Right: points are observations, solid line is the $L$-spline fit, and dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone3.ps,height=3in,width=5.8in}}\end{figure}

Observations close in time may be correlated. In the following we include a first-order autoregressive structure for random errors. Since some observations are missing, we use the continuous AR(1) correlation structure. That is, we assume the covariance between the random error of month $i_1$ in year $j_1$ and the random error of month $i_2$ in year $j_2$ is $\sqrt{\sigma^2((i_1-.5)/12) \sigma^2((i_2-.5)/12)}
\rho^{\vert i_1-i_2\vert+12\vert j_1-j_2\vert}$.

> Arosa$z <- month+12*(year-1)
> ozone.fit6 <- update(ozone.fit5,corr=corCAR1(form=~z)) 
> summary(ozone.fit6)
...
Coefficients (d):
 (Intercept) sin(2 * pi * csmonth) cos(2 * pi * csmonth) 
    336.8969             -47.34584              7.788939

GML estimate(s) of smoothing parameter(s) : 3.573493e-09 
Equivalent Degrees of Freedom (DF):  7.326922 
Estimate of sigma:  15.31061 

Correlation structure of class corCAR1 representing
       Phi 
 0.3410923
Combination of:
Variance function structure of class varExp representing
      expon 
 -0.3602399
Variance function structure of class varExp representing
      expon 
 -0.3009847

Now let us consider the effects of both month and year variables. First, suppose that the simple sine and cosine functions are appropriate for modeling the month effect, and we do not want to assume a parametric model for the year effect. Then we may consider the following partial spline model

    $\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}})$  
  $\textstyle =$ $\displaystyle \beta_1 \sin ( 2 \pi \mbox{{\tt csmonth}})
+\beta_2 \cos ( 2 \pi ...
...+f(\mbox{{\tt csyear}}) + \epsilon (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}}),$ (62)

where $\mbox{{\tt csyear}}=(\mbox{{\tt year}}-1)/45$ and $f\in W_2([0,1])$.

> csyear <- (year-1)/45
> ozone.fit7 <- ssr(thick~sin(2*pi*csmonth)+cos(2*pi*csmonth)+I(csyear-0.5), 
                   rk=cubic(csyear), spar="m", data=Arosa)
> summary(ozone.fit7)
...
Coefficients (d):
 (Intercept) sin(2 * pi * csmonth) cos(2 * pi * csmonth) I(csyear - 0.5) 
    336.8798             -47.37047              7.776179        7.199672

GML estimate(s) of smoothing parameter(s) : 4.741079e-07 
Equivalent Degrees of Freedom (DF):  16.49958 
Estimate of sigma:  16.5235 
> anova(ozone.fit7,simu.size=500)
Testing H_0: f in the NULL space

    test.value simu.size simu.p-value approximate.p-value 
LMP 0.01157664       500        0.026                    
GML  0.9891654       500            0        0.0004092629

If we want to model both month and year non-parametrically, we can fit the following SS ANOVA model with a periodic spline for month and a cubic spline for year:

    $\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}})$  
  $\textstyle =$ $\displaystyle \mu + \beta \mbox{{\tt csyear}} + s_1(\mbox{{\tt csmonth}})
+ s_2(\mbox{{\tt csyear}})
+ sl(\mbox{{\tt csmonth}},\mbox{{\tt csyear}})$  
    $\displaystyle + ss(\mbox{{\tt csmonth}},\mbox{{\tt csyear}}) +
\epsilon (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}}).$ (63)

> ozone.fit8 <- ssr(thick~I(csyear-0.5),  spar="m", data=Arosa,
                    rk=list(periodic(csmonth), cubic(csyear), 
                    rk.prod(periodic(csmonth), kron(csyear-.5)), 
                    rk.prod(periodic(csmonth), cubic(csyear))))
> summary(ozone.fit8)
...
Coefficients (d):
 (Intercept) I(csyear - 0.5) 
    336.8298        6.031531

GML estimate(s) of smoothing parameter(s) : 
4.108999e-01 7.457006e-02 1.656897e+05 5.438436e+03 
Equivalent Degrees of Freedom (DF):  24.59541 
Estimate of sigma:  15.84154 
> grid <- data.frame(csmonth=seq(0,1,len=50), csyear=seq(0,1,len=50))
> interaction.fit8 <- predict(ozone.fit8, grid, terms=c(0,0,0,0,1,1))
> max(abs(interaction.fit8$fit)/interaction.fit8$pstd)
[1] 0.009646192

The smoothing parameters for the interaction terms between month and year are large, which indicates that the interaction effects are small. Indeed the fitted values of interaction terms are very small (around $10^{-4}$) compared to posterior standard deviations (around 0.01). Therefore, we delete these interaction terms in our final model.

> ozone.fit9 <- update(ozone.fit8, rk=list(periodic(csmonth), cubic(csyear)))
> summary(ozone.fit9)
...
Coefficients (d):
 (Intercept) I(csyear - 0.5) 
    336.8287        5.989813

GML estimate(s) of smoothing parameter(s) : 0.41248034 0.07316391 
Equivalent Degrees of Freedom (DF):  24.66142 
Estimate of sigma:  15.8378

Figure [*] shows the estimated main effects of month and year.

Figure: Estimated main effects with 95% Bayesian confidence intervals. A dot in the left panel is the average thickness for a particular month minus the overall mean. A dot in the right panel is the average thickness for a particular year minus the overall mean.
\begin{figure}\centerline{\psfig{figure=ozone4.ps,height=2.5in,width=5.5in}}\end{figure}

We may also consider observations as a long time series. Note that many observations are missing, thus the state space approach in Kitagawa and Gersch (1984) cannot be used directly. Our approach allows observations at unequally spaced points. Thus it can also be used to impute missing observations. Let us define a time variable as $t=\mbox{\tt csmonth}+\mbox{\tt year}-1$. Similar to Kitagawa and Gersch (1984), we may consider the following model

$\displaystyle \mbox{{\tt thickness}}(t) = \mu + \beta_1 \sin (2\pi t) +
\beta_2 \cos (2\pi t) + f(t) + \epsilon(t),$     (64)

where $\sin (2\pi t)$ and $\cos (2\pi t)$ model seasonal trend, $f$ models the long term trend and $f \in W_2([0,T]) \ominus \{ 1 \}$, $T=45.45833$ is the maximum value of $t$, and $\epsilon(t)$'s are random errors and $\epsilon(t) \stackrel{iid}{\sim} \mbox{N}(0,\sigma^2)$. Model ([*]) is a partial spline. Since the domain is $[0,T]$ with $T\ne 1$, we use the cubic2 kernel function.
> Arosa$t <- csmonth+Arosa$year-1
> ozone.fit10 <- ssr(thick~t+sin(2*pi*t)+cos(2*pi*t), rk=cubic2(t), 
                      spar="m", data=Arosa)
> summary(ozone.fit10) 
...
Coefficients (d):
    (Intercept)               t sin(2 * pi * t) cos(2 * pi * t) 
    330.8107521      -0.9901175     -47.3246660       7.7786452 

GML estimate(s) of smoothing parameter(s) : 0.01568941 
Equivalent Degrees of Freedom (DF):  20.38134 
Estimate of sigma:  16.25793 
> anova(ozone.fit10, simu.size=500)
Testing H_0: f in the NULL space

    test.value simu.size simu.p-value approximate.p-value
LMP   1075.457       500         0.01                    
GML    0.98481       500            0         3.64301e-05

Estimate of the overall function and its two components, seasonal trend and long term trend, are shown in Figure [*]. We can see that the long term trend is different from a constant.

Figure: Plots of estimates from ozone.fit10. Above: observations as dots and estimated overall function as the solid line. Middle: estimated seasonal trend. Below: estimated long term trend as the solid line and 95% Bayesian confidence intervals as two dotted lines.
\begin{figure}\centerline{\psfig{figure=ozone5.ps,height=7in,width=6in}}\end{figure}

Observations close in time are likely to be correlated. Suppose that we want to use the exponential correlation structure with nugget effect. Specifically, regard $\epsilon(t)$ as a zero mean stochastic process with

$\displaystyle \mbox{Cov} (\epsilon(s),\epsilon(t))=\left\{
\begin{array}{ll}
\s...
...) \exp (-\vert s-t\vert / r),& s \ne t,\\
\sigma^2, & s=t,
\end{array} \right.$     (65)

where $c$ is the nugget effect, and $r$ is the range parameter.
> ozone.fit11 <- update(ozone.fit10, corr=corExp(form=~t,nugget=T))
> ozone.fit11
...
GML estimate(s) of smoothing parameter(s) : 469618845337 
Equivalent Degrees of Freedom (DF):  4 
Estimate of sigma:  17.36963 
Correlation structure of class corExp representing
    range    nugget 
0.3661093 0.6364710
New estimates are plotted in Figure [*]. We can see that now the long term trend is almost a constant. The equivalent degrees of freedom is decreased to 4.008176.

Figure: Plots of estimates from ozone.fit11. Above: observations as dots and estimated overall function as the solid line. Middle: estimated seasonal trend. Below: estimated long term trend as the solid line and 95% Bayesian confidence intervals as two dotted lines.
\begin{figure}\centerline{\psfig{figure=ozone6.ps,height=7in,width=6in}}\end{figure}

The null space contains four components: the constant, linear, sine and cosine functions. Using the cubic2 kernel penalizes the sine and cosine functions which may lead to biases. If this is to be avoided, we can use the lspline kernel with type=''linSinCos''. The period for the seasonal component is 1, while the period for the function lspline with type=''linSinCos'' is $2\pi$. Therefore we multiply the covariate $t$ by a constant $2\pi$ to match the scale.

> ozone.fat12 <- update(ozone.fit11, rk=lspline(2*pi*t,type="linSinCos"))
> ozone.fit12
...
GML estimate(s) of smoothing parameter(s) : 9.562702e+13 
Equivalent Degrees of Freedom (DF):  4 
Estimate of sigma:  17.37282 
Correlation structure of class corExp representing
    range    nugget 
0.3665007 0.6360280

The estimates are plotted in Figure [*].

Figure: Plots of estimates from ozone.fit12. Above: observations as dots and estimated overall function as the solid line. Middle: estimated seasonal trend. Below: estimated long term trend as the solid line and 95% Bayesian confidence intervals as two dotted lines.
\begin{figure}\centerline{\psfig{figure=ozone7.ps,height=7in,width=6in}}\end{figure}

As in Kitagawa and Gersch (1984), sometimes we want to use a stochastic process to model the autocorrelation and regard this process as part of the signal. Then we need to separate this process with measurement errors and predict it at desired points. Specifically, assume

$\displaystyle \mbox{{\tt thickness}}(t) = \mu + \beta_1 \sin (2\pi t) +
\beta_2\cos (2\pi t) +f(t)+ u(t) + \epsilon(t),$     (66)

where $f$ and $\epsilon$ are the same as in model ([*]). Assume that $u(t)$ is a stochastic process independent of $\epsilon(t)$ with mean zero and $\mbox{Cov} (u(s),u(t))= \sigma_1^2 \exp (-\vert s-t\vert / r)$, where $r$ is the range parameter as in ([*]).

Denote $\mbox{\boldmath$t$}$ as the vector of design points for the variable $t$. Let $u(\mbox{\boldmath$t$})$ be the vector of the $u$ process evaluated at design points. $u(\mbox{\boldmath$t$})$ are random effects and $u(\mbox{\boldmath$t$}) \sim \mbox{N} (0, \sigma_1^2D)$, where $D$ depends the parameter $r$. ([*]) is a SLM model. However, it cannot be fitted directly using slm since $D$ depends on the range parameter $r$ nonlinearly. We fit model ([*]) in two steps. We first regard $u(\mbox{\boldmath$t$})$ as part of random error and estimate the range parameter. This is done in ozone.fit11. Then we calculate the estimate of $D$ (without nugget effect) and regard it as the true covariance matrix. We calculate the Cholesky decomposition of $D$ as $D = ZZ^T$, and transform the random effects $u(\mbox{\boldmath$t$})=Z \mbox{\boldmath$b$}$, where $\mbox{\boldmath$b$}\sim \mbox{N}(0,\sigma_1^2 I)$. Now we are ready to fit the transformed SLM:

> tau <- coef(ozone.fit11$cor.est, F)
> D <- corMatrix(initialize(corExp(tau[1],form=~t), data=Arosa))
> Z <- chol.new(D)
> ozone.fit13 <- slm(thick~t+sin(2*pi*t)+cos(2*pi*t), rk=cubic2(t),
                     random=list(pdIdent(~Z-1)), data=Arosa)	
> summary(ozone.fit13)
...
Coefficients (d):
    (Intercept)               t sin(2 * pi * t) cos(2 * pi * t) 
    332.2848808       0.3174285     -47.2930087       7.8994571 

GML estimate(s) of smoothing parameter(s) : 50.62773 
Equivalent Degrees of Freedom (DF):  4.517319 
Estimate of sigma:  13.86935

Suppose that we want to predict $u$ on grid points $\mbox{\boldmath$s$}$, $u(\mbox{\boldmath$s$})$. Let $R=\mbox{Cov} (u(\mbox{\boldmath$s$}),u(\mbox{\boldmath$t$}))$. Then $\hat{u}(\mbox{\boldmath$s$})=R D^{-1} \hat{u}(\mbox{\boldmath$t$}) = R Z^{-T} \hat{\mbox{\boldmath$b$}}$.

> grid3 <- data.frame(t=seq(0,max(Arosa$t)+0.001,len=500))
> newdata <- data.frame(t=c(Arosa$t,grid3$t))
> RD <- corMatrix(initialize(corExp(tau[1],form=~t), data=newdata))
> R <- RD[(length(Arosa$t)+1):length(newdata$t),1:length(Arosa$t)] 
> u.new <- R%*%t(solve(Z))%*%as.vector(ozone.fit13$lme.obj$coef$random[[2]])

Estimates from ozone.fit13 are shown in Figure [*].

Figure: Plots of estimates from ozone.fit13. First row: observations as dots and estimated overall function as the solid line. Second row: estimated seasonal trend. Third row: estimated long term trend as the solid line and 95% Bayesian confidence intervals as two dotted lines. Fourth row: estimated local stochastic trend.
\begin{figure}\centerline{\psfig{figure=ozone8.ps,height=7in,width=6in}}\end{figure}

We now show how to use nnr and snr to fit varying coefficients models. Suppose that the thickness in each year can be well approximated by a sinusoidal function of month. We want to investigate how two coefficients, the average thickness and amplitude, change over years. Specifically, we consider the following model:

    $\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}})$  
  $\textstyle =$ $\displaystyle f_1(\mbox{{\tt csyear}})+f_2(\mbox{{\tt csyear}})
\cos 2\pi \left...
...alogit}(\alpha) \right) +
\epsilon (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}}),$ (67)

where $f_1(\mbox{{\tt csyear}})$ and $f_2(\mbox{{\tt csyear}})$ are yearly average thickness and amplitude respectively, $\mbox{alogit}(\alpha)=\exp(\alpha)/(1+\exp(\alpha))$ guarantees the horizontal shift to be between 0 and 1. Assume that $f_1,f_2 \in W_2([0,1])$. Note that the constraint $f_2 \ge 0$ is not enforced here. We may replace $f_2$ by $\exp(f_2)$ as the amplitude function. The resulting model will be fitted later. Model ([*]) is a special case of the SNR model, and thus can be fitted using the function snr. We use the coefficients from ozone.fit1 to calculate the initial value for $\alpha$.
> tmp <- atan(-ozone.fit1$coef[3]/ozone.fit1$coef[2])/(2*pi)
> tmp <- log(tmp/(1-tmp))
> ozone.fit14 <- snr(thick~f1(csyear)+f2(csyear)*cos(2*pi*(csmonth+alogit(a))),
                   func=list(f1(x)+f2(x)~list(~x, cubic(x))), params=list(a~1),
                   data=Arosa, start=list(params=c(tmp)), spar="m")
> summary(ozone.fit14)
Semi-parametric Nonlinear Regression Model Fit by Gauss-Newton Method
Model: thick ~ f1(csyear) + f2(csyear) * cos(2 * pi * (csmonth + alogit(a))) 
Data: Arosa 

       AIC      BIC    logLik
  4362.429 4370.929 -2179.214

Coefficients:
      Value  Std.Error   t-value p-value
a -1.243126 0.01933295 -64.30086       0

Standardized residuals:
        Min          Q1         Med          Q3         Max 
-3.37533987 -0.60538971 -0.03263471  0.49885054  3.13869256 

GML estimate(s) of smoothing spline parameter(s): 0.0001225915 0.4999998276 
Equivalent Degrees of Freedom (DF) for spline function:  16.95087 
Residual standard error: 16.81619 

Converged after 3 iterations

> p.ozone.fit14 <- intervals(ozone.fit14,data.frame(x=seq(0,1,len=100)),
            terms=list(f1=matrix(c(1,1,1,1,1,0,0,0,1),ncol=3,byrow=T),
                       f2=matrix(c(1,1,1,1,1,0,0,0,1),ncol=3,byrow=T)))

Figure: Estimated $f_1$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone9.ps,height=2in,width=5in}}\end{figure}

Figure: Estimated $f_2$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone10.ps,height=2in,width=5in}}\end{figure}

Figures [*] and [*] show the estimated functions and their projections. The yearly average thickness has a similar trend as before, and the amplitude has a linear increasing, but non-significant, trend.

The sinusoidal function for monthly changes may be too restrictive. Then we can consider the following model

    $\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}})$  
  $\textstyle =$ $\displaystyle f_1(\mbox{{\tt csyear}})+f_2(\mbox{{\tt csyear}})
f_3(\mbox{{\tt csmonth}}) + \epsilon (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}}),$ (68)

where the $\cos$ function in model ([*]) is replaced by a nonparametric periodic function $f_3$. Assume that $f_3 \in W_2(per) \ominus \{ 1 \}$. The constant functions were removed from the model space for identifiability. Thus we have $\int_0^1 f_3(t) dt = 0$. Also, for identifiability, we assume $\sup \vert f_3(t)\vert=1$. Again, we did not enforce the constraint $f_2 \ge 0$. Model ([*]) is a special case of the NNR model, thus can be fitted using nnr. In the following, instead of using nnr directly, we write a simple program using backfitting methods to estimate $f_1$, $f_2$ and $f_3$. The purpose is to show that writing S codes for more complicated models is fairly easy.
# transform time variable into [0,1]
z <- (Arosa$t-min(Arosa$t))/(max(Arosa$t)-min(Arosa$t))

# create matrices for RK's 
S1 <- S2 <- cubic(z)
S3 <- periodic((max(Arosa$t)-min(Arosa$t))*z) 

# find initial values
f3.tmp <- ssr(thick~1,rk=S3,data=Arosa,spar="m")
f3.est <- as.vector(S3%*%f3.tmp$rkpk.obj$c/max(abs(S3%*%f3.tmp$rkpk.obj$c)))
f2.est <- rep(max(abs(S3%*%f3.tmp$rkpk.obj$c)),length(thick))
f1.est <- rep(f3.tmp$rkpk.obj$d[1],length(thick))

# backfitting 
prec <- 1
while (prec>.0001) {
    # fix f2 and f3, fit f1
    y.tmp <- thick-f2.est*f3.est
    f1.tmp <- ssr(y.tmp~z,rk=S1,spar="m",limnla=c(-3,1))
    f1.est.old <- f1.est
    f1.est <- f1.tmp$fit

    # fix f1 and f3, fit f2. Note RK are multiplied by f3
    y.tmp <- thick-f1.tmp$fit
    f2.tmp <- ssr(y.tmp~f3.est+I(f3.est*z)-1,rk=rk.prod(S2,f3.est),
                  spar="m",limnla=c(-3,1))
    f2.est.old <- f2.est
    f2.est <- f2.tmp$rkpk.obj$d[1]+f2.tmp$rkpk.obj$d[2]*z
             +as.vector(S2%*%f2.tmp$rkpk.obj$c)

    # fix f1 and f2, fit f3, note the empty null space and weights
    f3.tmp <- ssr(I(y.tmp/f2.est)~-1,rk=S3,spar="m",weights=f2.est)
    f3.est.old <- f3.est
    # enforce \sup |f_3(t)|=1
    f3.est <- as.vector(S3%*%f3.tmp$rkpk.obj$c/
              max(abs(S3%*%f3.tmp$rkpk.obj$c)))

    # stopping criteria
    prec <- max(sum((f1.est-f1.est.old)**2),
                sum((f2.est-f2.est.old)**2),
                sum((f3.est-f3.est.old)**2))
}

# for prediction of f3 on new grid, we need to refit which specifies the 
# rk function rk=periodic. rk= a exist matrix will not work
y.tmp <- thick-f1.tmp$fit
f3.tmp <- ssr(I(y.tmp/f2.est)~-1,
              rk=periodic((max(Arosa$t)-min(Arosa$t))*z),
              spar="m",weights=f2.est)

# calculate fits for the plot.bCI function. 
# Note we inflate the posterior variances via degrees of freedom
p.ozone.fit15.f1 <- predict(f1.tmp,terms=matrix(c(1,1,1,1,1,0,0,0,1),ncol=3,
                            byrow=T))
p.ozone.fit15.f1$pstd <- p.ozone.fit15.f1$pstd*sqrt((length(thick)-
                f1.tmp$df)/(length(thick)-f1.tmp$df-f2.tmp$df-f3.tmp$df+1))
p.ozone.fit15.f2 <- predict(f2.tmp,terms=matrix(c(1,1,1,1,1,0,0,0,1),ncol=3,
                            byrow=T))
p.ozone.fit15.f2$fit <- p.ozone.fit15.f2$fit/f3.est
p.ozone.fit15.f2$pstd <- p.ozone.fit15.f2$pstd*sqrt((length(thick)-
     f2.tmp$df)/(length(thick)-f1.tmp$df-f2.tmp$df-f3.tmp$df+1))/abs(f3.est)
grid4 <- data.frame(z=seq(0,1/(max(Arosa$t)-min(Arosa$t)),len=100))
p.ozone.fit15.f3 <- predict(f3.tmp,grid4)
p.ozone.fit15.f3$pstd <- p.ozone.fit14.f3$pstd*sqrt((length(thick)-
                f3.tmp$df)/(length(thick)-f1.tmp$df-f2.tmp$df-f3.tmp$df+1))

Figure: Estimated $f_1$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone11.ps,height=2in,width=5in}}\end{figure}

Figure: Estimated $f_2$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone12.ps,height=2in,width=5in}}\end{figure}

Figure: Solid line is the estimated $f_3$. Dotted lines are 95% Bayesian confidence intervals. Dashed line is the sine function.
\begin{figure}\centerline{\psfig{figure=ozone13.ps,height=3in,width=4in}}\end{figure}

Figures [*], [*] and [*] show the estimated functions and their projections. Yearly average thickness $f_1$ and amplitude $f_2$ have similar trends as before. $f_3$ is different from a sinusoidal function, even though the difference is small.

To enforce the constraint $f_2 \ge 0$ in model ([*]), we consider the following model

    $\displaystyle \mbox{{\tt thickness}} (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}})$  
  $\textstyle =$ $\displaystyle f_1(\mbox{{\tt csyear}})+ \exp (f_2(\mbox{{\tt csyear}}))
f_3(\mbox{{\tt csmonth}}) + \epsilon (\mbox{{\tt csmonth}}, \mbox{{\tt csyear}}),$ (69)

where $f_1 \in W_2([0,1])$ and $f_3 \in W_2(per) \ominus \{ 1 \}$. Since $f_3$ is close to a sinusoidal function, we use the $L$-spline with $L=D^2+(2\pi)^2$. For identifiability, we remove the constant functions from the model space for $f_2$: $f_2 \in W_2([0,1]) \ominus \{ 1 \}$.

> S3 <- cubic(z)
> f3.tmp <- ssr(thick~1,rk=S3,data=Arosa,spar="m")
> f3.ini <- as.vector(S3%*%f3.tmp$rkpk.obj$c)
> ozone.fit16 <- nnr(thick~f1(csyear)+exp(f2(csyear))*f3(csmonth),
                     func=list(f1(x)~list(~I(x-.5),cubic(x)),
                               f2(x)~list(~I(x-.5)-1,cubic(x)),
                               f3(x)~list(~sin(2*pi*x)+cos(2*pi*x)-1,
                               lspline(x,type="sine0"))),
                     data=Arosa, 
                     start=list(f1=mean(thick),f2=0,f3=f3.ini),
                     control=list(converg="coef"))
> ozone.fit16
Nonlinear Nonparametric Regression Model Fit by Gauss-Newton Method
 Model: thick ~ f1(csyear) + exp(f2(csyear)) * f3(csmonth) 
 Data: Arosa 

 GML estimate(s) of smoothing parameter(s): 2.081637e-07 1.930501e-03 1.053201e-06 
 Equivalent Degrees of Freedom (DF):  34.20186 

 Residual standard error: 15.62308 
 Number of Observations: 518 
 Converged after 7 iterations

> x <- seq(0,1,len=50)
> u <- seq(0,1,len=50)
> p.ozone.fit16 <- intervals(ozone.fit16, newdata=list(csyear=x,csmonth=u),
               terms=list(f1=matrix(c(1,1,1,1,1,0,0,0,1),nrow=3,byrow=T),
                          f2=matrix(c(1,1,1,0,0,1),nrow=3,byrow=T),
                          f3=matrix(c(1,1,1,1,1,0,0,0,1),nrow=3,byrow=T)))

Figure: Estimated $f_1$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone14.ps,height=2in,width=5in}}\end{figure}

Figure: Estimated $f_2$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone15.ps,height=2in,width=5in}}\end{figure}

Figure: Estimated $f_3$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=ozone16.ps,height=2in,width=5in}}\end{figure}

Estimated functions and their projections are shown in Figures [*], [*] and [*]. They are similar to previous estimates.


next up previous
Next: Global Climate Data Up: Examples Previous: Examples
Yuedong Wang 2004-05-19