This data set, downloaded from
`http://www-personal.buseco.monash.edu.au/hyndman`
`/TSDL/`,
contains monthly number of reported cases of chickenpox in New
York City from 1931 to the first six months of 1972. It has been
analyzed by several authors to investigate dynamics in an epidemic
(Yorke and London, 1973; Schaffer and Kot, 1985). Figure shows
time series plots of square root of monthly cases. We illustrate
how to use the SS ANOVA and NNR models to investigate long
term trend over years, seasonal trend and their interactions.

Denote as the square root of reported cases in month
of year . Both and are transformed into the
interval . We first consider an additive model

and a full SS ANOVA model

where , , , , and are respectively constant, linear main effect of year, smooth main effect of month, smooth main effect of year, smooth-linear interaction between month and year, and smooth-smooth interaction between month and year. We model month () effect using a periodic spline and year () effect using a cubic spline.

> data(chickenpox) > chickenpox$ct<- sqrt(chickenpox$count) > chickenpox$csmonth<- (chickenpox$month-0.5)/12 > chickenpox$csyear<- ident(chickenpox$year) > chickenpox.fit1 <- ssr(ct~csyear, rk=list(periodic(csmonth),cubic(csyear)), data=chickenpox) > chickenpox.fit1 ... GCV estimate(s) of smoothing parameter(s) : 0.189468039 0.001004518 Equivalent Degrees of Freedom (DF): 46.32076 Estimate of sigma: 3.932079 Number of Observations: 498 > chickenpox.fit2 <- update(chickenpox.fit1, rk=list(periodic(csmonth),cubic(csyear), rk.prod(periodic(csmonth),kron(csyear)), rk.prod(periodic(csmonth),cubic(csyear)))) > chickenpox.fit2 ... GCV estimate(s) of smoothing parameter(s) : 3.175017e-02 1.703363e-04 3.253160e-01 2.387995e-07 Equivalent Degrees of Freedom (DF): 278.5618 Estimate of sigma: 1.975239 Number of Observations: 498

The seasonal variation was mainly caused by two factors: (a) social
behavior of children who made close contacts when school was in session;
and (b) temperature and humidity which may affect the survival and
transmission of dispersal stages (Yorke and London, 1973; Schaffer and Kot, 1985). Thus
the seasonal variations were similar over the years. In the following
NNR model we assume that the seasonal variation has the same shape
after vertical shift and vertical scale transformations. Specifically we
assume that

where , and are three unknown functions which represent respectively yearly mean cases, seasonal trend in a year and the magnitude of the seasonal variation for a particular year. We refer as the amplitude. A bigger amplitude corresponds to a bigger seasonal variation. Thus in addition to being more parsimonious than the SS ANOVA model (), the NNR model () has component functions with nice interpretations. We model and using cubic splines. It has been recognized that a simple sinusoidal model may be inappropriate (Earn et al., 2000). Since is periodic and is close to a sinusoidal function, we use the -spline with . To make model () identifiable, we use the following side conditions: (a) , and (b) . Therefore, the model spaces for , and are , and respectively.

> S3 <- periodic(chickenpox.data$month) > f3.tmp <- ssr(y~1,rk=S3,data=chickenpox.data,spar=''m'') > f3.ini <- as.vector(S3%*%f3.tmp$rkpk.obj$c) > chickenpox.fit3 <- nnr(y~f1(year)+exp(f2(year))*f3(month), func=list(f1(x)~list(~I(x-.5),cubic(x)), f2(x)~list(~I(x-.5)-1,cubic(x)), f3(x)~list(~sin(2*pi*x)+cos(2*pi*x)-1, lspline(x,type=''sine0''))), data=chickenpox.data,start=list(f1=mean(y),f2=0,f3=f3.ini), control=list(converg=''coef''),spar=''m'') > chickenpox.fit3 ... GML estimate(s) of smoothing parameter(s): 7.7261e-07 1.7787e-03 2.0090e-07 Equivalent Degrees of Freedom (DF): 27.85445 Residual standard error: 3.67597 Number of Observations: 498 Converged after 4 iterationsWe can compare three models using a model selection procedure such as GCV, AIC or BIC.

> n <- 498 > rss1 <- sum(chickenpox.fit1$resi**2) > rss2 <- sum(chickenpox.fit2$resi**2) > rss3 <- sum(chickenpox.fit3$resi**2) > gcv1 <- rss1/n/(1-chickenpox.fit1$df/n)**2 > gcv2 <- rss2/n/(1-chickenpox.fit2$df/n)**2 > gcv3 <- rss3/n/(1-chickenpox.fit3$df$f/n)**2 > aic1 <- n*log(rss1/n)+2*chickenpox.fit1$df > aic2 <- n*log(rss2/n)+2*chickenpox.fit2$df > aic3 <- n*log(rss3/n)+2*chickenpox.fit3$df$f > bic1 <- n*log(rss1/n)+log(n)*chickenpox.fit1$df > bic2 <- n*log(rss2/n)+log(n)*chickenpox.fit2$df > bic3 <- n*log(rss3/n)+log(n)*chickenpox.fit3$df$f > print(c(gcv1,gcv2,gcv3)) > [1] 17.04683 8.85435 14.31334 > print(c(aic1,aic2,aic3)) > [1] 1407.7146 826.9648 1323.6549 > print(c(bic1,bic2,bic3)) > [1] 1602.753 1999.877 1440.939Thus the GCV and AIC criteria select the SS ANOVA model, and the BIC criteria selects the more parsimonious NNR model. The estimates of and and their 95% confidence intervals are shown in Figure . We also superimposed yearly averages on the plot of and the logarithm of scaled ranges on the plot of . The scaled range of a specific year was calculated as the differences between the maximum and the minimum monthly number of cases divided by the range of the estimated seasonal trend . It is clear that captures the long term trend in the mean and captures the long term trend in the range of seasonal variation. From Figure , the SS ANOVA model () captures local trend, particularly biennial pattern from 1945 to 1955, better than the NNR model (). From the estimate of , we can see that yearly averages peaked in the 1930s and 1950s, and gradually decreased in the 1960s after the introduction of mass vaccination. The amplitude reflects the seasonal variation in transmission rate. From the estimate of in Figure , we can see that magnitude of the seasonal variation peaked in the 1950s and then declined in the 1960s, possibly as a result of changing public health conditions including mass vaccination. Figure shows the estimate of the seasonal trend and its projections onto the null space (the simple sinusoidal model) and the orthogonal complement of the null space . Since the projection onto the complement space is significantly different from zero, we conclude that a simple sinusoidal model does not provide an accurate approximation.