next up previous
Next: Lake Acidity Study Up: Examples Previous: United States Historical Climate

Chickenpox Epidemic

This data set, downloaded from$\tilde{~}$hyndman /TSDL/, contains monthly number of reported cases of chickenpox in New York City from 1931 to the first six months of 1972. It has been analyzed by several authors to investigate dynamics in an epidemic (Yorke and London, 1973; Schaffer and Kot, 1985). Figure [*] shows time series plots of square root of monthly cases. We illustrate how to use the SS ANOVA and NNR models to investigate long term trend over years, seasonal trend and their interactions.

Figure: Plots of the square root of monthly cases (dotted lines) and fits (dotted lines) from models ([*]) (top), ([*]) (middle) and ([*]) (bottom).

Denote $y$ as the square root of reported cases in month $t_1$ of year $t_2$. Both $t_1$ and $t_2$ are transformed into the interval $[0,1]$. We first consider an additive model

$\displaystyle y(t_1,t_2)= \mu + \beta t_2 + s_1(t_1) + s_2(t_2) + \epsilon(t_1,t_2),$     (73)

and a full SS ANOVA model
$\displaystyle y(t_1,t_2)= \mu + \beta t_2 + s_1(t_1) + s_2(t_2)
+ sl_{12}^1(t_1,t_2) + ss_{12}(t_1,t_2),$     (74)

where $\mu$, $\beta t_2$, $s_1(t_1)$, $s_2(t_2)$, $sl_{12}^1(t_1,t_2)$ and $ss_{12}(t_1,t_2)$ are respectively constant, linear main effect of year, smooth main effect of month, smooth main effect of year, smooth-linear interaction between month and year, and smooth-smooth interaction between month and year. We model month ($t_1$) effect using a periodic spline and year ($t_2$) effect using a cubic spline.

> data(chickenpox)
> chickenpox$ct<- sqrt(chickenpox$count) 
> chickenpox$csmonth<- (chickenpox$month-0.5)/12 
> chickenpox$csyear<-  ident(chickenpox$year) 
> chickenpox.fit1 <- ssr(ct~csyear, rk=list(periodic(csmonth),cubic(csyear)), 
> chickenpox.fit1
GCV estimate(s) of smoothing parameter(s) : 0.189468039 0.001004518 
Equivalent Degrees of Freedom (DF):  46.32076 
Estimate of sigma:  3.932079 

Number of Observations: 498 

> chickenpox.fit2 <- update(chickenpox.fit1, 

> chickenpox.fit2
GCV estimate(s) of smoothing parameter(s) : 3.175017e-02 1.703363e-04 
                                            3.253160e-01 2.387995e-07 
Equivalent Degrees of Freedom (DF):  278.5618 
Estimate of sigma:  1.975239 

Number of Observations: 498

The seasonal variation was mainly caused by two factors: (a) social behavior of children who made close contacts when school was in session; and (b) temperature and humidity which may affect the survival and transmission of dispersal stages (Yorke and London, 1973; Schaffer and Kot, 1985). Thus the seasonal variations were similar over the years. In the following NNR model we assume that the seasonal variation has the same shape after vertical shift and vertical scale transformations. Specifically we assume that

$\displaystyle y(t_1,t_2) = g_1(t_2)+ \exp (g_2(t_2)) \times g_3(t_1) +
\epsilon (t_1,t_2),$     (75)

where $g_1$, $g_3$ and $g_2$ are three unknown functions which represent respectively yearly mean cases, seasonal trend in a year and the magnitude of the seasonal variation for a particular year. We refer $\exp (g_2(t_2))$ as the amplitude. A bigger amplitude corresponds to a bigger seasonal variation. Thus in addition to being more parsimonious than the SS ANOVA model ([*]), the NNR model ([*]) has component functions with nice interpretations. We model $g_1$ and $g_2$ using cubic splines. It has been recognized that a simple sinusoidal model may be inappropriate (Earn et al., 2000). Since $g_3$ is periodic and is close to a sinusoidal function, we use the $L$-spline with $L=D^2+(2\pi)^2$. To make model ([*]) identifiable, we use the following side conditions: (a) $\int_0^1 g_2(t) dt=0$, and (b) $\int_0^1 g_3(t) dt=0$. Therefore, the model spaces for $g_1$, $g_2$ and $g_3$ are $W_2[0,1]$, $W_2[0,1] \ominus \{ 1 \}$ and $W_2(per) \ominus \{ 1 \}$ respectively.

> S3 <- periodic($month)
> f3.tmp <- ssr(y~1,rk=S3,,spar=''m'')
> f3.ini <- as.vector(S3%*%f3.tmp$rkpk.obj$c)
> chickenpox.fit3 <- nnr(y~f1(year)+exp(f2(year))*f3(month),
> chickenpox.fit3
 GML estimate(s) of smoothing parameter(s): 7.7261e-07 1.7787e-03 2.0090e-07
 Equivalent Degrees of Freedom (DF):  27.85445

 Residual standard error: 3.67597
 Number of Observations: 498
 Converged after 4 iterations
We can compare three models using a model selection procedure such as GCV, AIC or BIC.
> n <- 498
> rss1 <- sum(chickenpox.fit1$resi**2)
> rss2 <- sum(chickenpox.fit2$resi**2)
> rss3 <- sum(chickenpox.fit3$resi**2)
> gcv1 <- rss1/n/(1-chickenpox.fit1$df/n)**2
> gcv2 <- rss2/n/(1-chickenpox.fit2$df/n)**2
> gcv3 <- rss3/n/(1-chickenpox.fit3$df$f/n)**2
> aic1 <- n*log(rss1/n)+2*chickenpox.fit1$df
> aic2 <- n*log(rss2/n)+2*chickenpox.fit2$df
> aic3 <- n*log(rss3/n)+2*chickenpox.fit3$df$f
> bic1 <- n*log(rss1/n)+log(n)*chickenpox.fit1$df
> bic2 <- n*log(rss2/n)+log(n)*chickenpox.fit2$df
> bic3 <- n*log(rss3/n)+log(n)*chickenpox.fit3$df$f
> print(c(gcv1,gcv2,gcv3))
> [1] 17.04683  8.85435 14.31334
> print(c(aic1,aic2,aic3))
> [1] 1407.7146  826.9648 1323.6549
> print(c(bic1,bic2,bic3))
> [1] 1602.753 1999.877 1440.939
Thus the GCV and AIC criteria select the SS ANOVA model, and the BIC criteria selects the more parsimonious NNR model. The estimates of $g_1$ and $g_2$ and their 95% confidence intervals are shown in Figure [*]. We also superimposed yearly averages on the plot of $g_1$ and the logarithm of scaled ranges on the plot of $g_2$. The scaled range of a specific year was calculated as the differences between the maximum and the minimum monthly number of cases divided by the range of the estimated seasonal trend $g_3$. It is clear that $g_1$ captures the long term trend in the mean and $g_2$ captures the long term trend in the range of seasonal variation. From Figure [*], the SS ANOVA model ([*]) captures local trend, particularly biennial pattern from 1945 to 1955, better than the NNR model ([*]). From the estimate of $g_1$, we can see that yearly averages peaked in the 1930s and 1950s, and gradually decreased in the 1960s after the introduction of mass vaccination. The amplitude reflects the seasonal variation in transmission rate. From the estimate of $g_2$ in Figure [*], we can see that magnitude of the seasonal variation peaked in the 1950s and then declined in the 1960s, possibly as a result of changing public health conditions including mass vaccination. Figure [*] shows the estimate of the seasonal trend $g_3$ and its projections onto the null space ${\cal H}_{30}$ (the simple sinusoidal model) and the orthogonal complement of the null space ${\cal H}_{31}$. Since the projection onto the complement space is significantly different from zero, we conclude that a simple sinusoidal model does not provide an accurate approximation.

Figure: Left: plot of yearly averages (circles), estimate of $g_1$ (solid line) and its 95% confidence intervals (dotted lines). Right: plot of yearly scaled ranges on logarithm scale (circles), estimate of $g_2$ (solid line) and its 95% confidence intervals (dotted lines).

Figure: Estimated $g_3$ (overall), and its projections to ${\cal H}_0$ (parametric) and ${\cal H}_1$ (smooth). Dotted lines are 95% Bayesian confidence intervals.

next up previous
Next: Lake Acidity Study Up: Examples Previous: United States Historical Climate
Yuedong Wang 2004-05-19