Next: Lake Acidity Study Up: Examples Previous: United States Historical Climate

## Chickenpox Epidemic

This data set, downloaded from http://www-personal.buseco.monash.edu.au/hyndman /TSDL/, contains monthly number of reported cases of chickenpox in New York City from 1931 to the first six months of 1972. It has been analyzed by several authors to investigate dynamics in an epidemic (Yorke and London, 1973; Schaffer and Kot, 1985). Figure shows time series plots of square root of monthly cases. We illustrate how to use the SS ANOVA and NNR models to investigate long term trend over years, seasonal trend and their interactions.

Denote as the square root of reported cases in month of year . Both and are transformed into the interval . We first consider an additive model

 (73)

and a full SS ANOVA model
 (74)

where , , , , and are respectively constant, linear main effect of year, smooth main effect of month, smooth main effect of year, smooth-linear interaction between month and year, and smooth-smooth interaction between month and year. We model month () effect using a periodic spline and year () effect using a cubic spline.

> data(chickenpox)
> chickenpox$ct<- sqrt(chickenpox$count)
> chickenpox$csmonth<- (chickenpox$month-0.5)/12
> chickenpox$csyear<- ident(chickenpox$year)
> chickenpox.fit1 <- ssr(ct~csyear, rk=list(periodic(csmonth),cubic(csyear)),
data=chickenpox)
> chickenpox.fit1
...
GCV estimate(s) of smoothing parameter(s) : 0.189468039 0.001004518
Equivalent Degrees of Freedom (DF):  46.32076
Estimate of sigma:  3.932079

Number of Observations: 498

> chickenpox.fit2 <- update(chickenpox.fit1,
rk=list(periodic(csmonth),cubic(csyear),
rk.prod(periodic(csmonth),kron(csyear)),
rk.prod(periodic(csmonth),cubic(csyear))))

> chickenpox.fit2
...
GCV estimate(s) of smoothing parameter(s) : 3.175017e-02 1.703363e-04
3.253160e-01 2.387995e-07
Equivalent Degrees of Freedom (DF):  278.5618
Estimate of sigma:  1.975239

Number of Observations: 498


The seasonal variation was mainly caused by two factors: (a) social behavior of children who made close contacts when school was in session; and (b) temperature and humidity which may affect the survival and transmission of dispersal stages (Yorke and London, 1973; Schaffer and Kot, 1985). Thus the seasonal variations were similar over the years. In the following NNR model we assume that the seasonal variation has the same shape after vertical shift and vertical scale transformations. Specifically we assume that

 (75)

where , and are three unknown functions which represent respectively yearly mean cases, seasonal trend in a year and the magnitude of the seasonal variation for a particular year. We refer as the amplitude. A bigger amplitude corresponds to a bigger seasonal variation. Thus in addition to being more parsimonious than the SS ANOVA model (), the NNR model () has component functions with nice interpretations. We model and using cubic splines. It has been recognized that a simple sinusoidal model may be inappropriate (Earn et al., 2000). Since is periodic and is close to a sinusoidal function, we use the -spline with . To make model () identifiable, we use the following side conditions: (a) , and (b) . Therefore, the model spaces for , and are , and respectively.

> S3 <- periodic(chickenpox.data$month) > f3.tmp <- ssr(y~1,rk=S3,data=chickenpox.data,spar=''m'') > f3.ini <- as.vector(S3%*%f3.tmp$rkpk.obj$c) > chickenpox.fit3 <- nnr(y~f1(year)+exp(f2(year))*f3(month), func=list(f1(x)~list(~I(x-.5),cubic(x)), f2(x)~list(~I(x-.5)-1,cubic(x)), f3(x)~list(~sin(2*pi*x)+cos(2*pi*x)-1, lspline(x,type=''sine0''))), data=chickenpox.data,start=list(f1=mean(y),f2=0,f3=f3.ini), control=list(converg=''coef''),spar=''m'') > chickenpox.fit3 ... GML estimate(s) of smoothing parameter(s): 7.7261e-07 1.7787e-03 2.0090e-07 Equivalent Degrees of Freedom (DF): 27.85445 Residual standard error: 3.67597 Number of Observations: 498 Converged after 4 iterations  We can compare three models using a model selection procedure such as GCV, AIC or BIC. > n <- 498 > rss1 <- sum(chickenpox.fit1$resi**2)
> rss2 <- sum(chickenpox.fit2$resi**2) > rss3 <- sum(chickenpox.fit3$resi**2)
> gcv1 <- rss1/n/(1-chickenpox.fit1$df/n)**2 > gcv2 <- rss2/n/(1-chickenpox.fit2$df/n)**2
> gcv3 <- rss3/n/(1-chickenpox.fit3$df$f/n)**2
> aic1 <- n*log(rss1/n)+2*chickenpox.fit1$df > aic2 <- n*log(rss2/n)+2*chickenpox.fit2$df
> aic3 <- n*log(rss3/n)+2*chickenpox.fit3$df$f
> bic1 <- n*log(rss1/n)+log(n)*chickenpox.fit1$df > bic2 <- n*log(rss2/n)+log(n)*chickenpox.fit2$df
> bic3 <- n*log(rss3/n)+log(n)*chickenpox.fit3$df$f
> print(c(gcv1,gcv2,gcv3))
> [1] 17.04683  8.85435 14.31334
> print(c(aic1,aic2,aic3))
> [1] 1407.7146  826.9648 1323.6549
> print(c(bic1,bic2,bic3))
> [1] 1602.753 1999.877 1440.939

Thus the GCV and AIC criteria select the SS ANOVA model, and the BIC criteria selects the more parsimonious NNR model. The estimates of and and their 95% confidence intervals are shown in Figure . We also superimposed yearly averages on the plot of and the logarithm of scaled ranges on the plot of . The scaled range of a specific year was calculated as the differences between the maximum and the minimum monthly number of cases divided by the range of the estimated seasonal trend . It is clear that captures the long term trend in the mean and captures the long term trend in the range of seasonal variation. From Figure , the SS ANOVA model () captures local trend, particularly biennial pattern from 1945 to 1955, better than the NNR model (). From the estimate of , we can see that yearly averages peaked in the 1930s and 1950s, and gradually decreased in the 1960s after the introduction of mass vaccination. The amplitude reflects the seasonal variation in transmission rate. From the estimate of in Figure , we can see that magnitude of the seasonal variation peaked in the 1950s and then declined in the 1960s, possibly as a result of changing public health conditions including mass vaccination. Figure shows the estimate of the seasonal trend and its projections onto the null space (the simple sinusoidal model) and the orthogonal complement of the null space . Since the projection onto the complement space is significantly different from zero, we conclude that a simple sinusoidal model does not provide an accurate approximation.

Next: Lake Acidity Study Up: Examples Previous: United States Historical Climate
Yuedong Wang 2004-05-19