
Vector Spline Models

In many applications two or more dependent variables are observed at several values of independent variables, such as at multiple time points. Observations from different variables are often contemporaneously correlated, and observations from the same variable may also be correlated. The statistical problems are (i) to estimate the functions that model the dependence of each variable on the independent variables and (ii) to investigate relationships between these functions. Wang et al. (2000) proved that the joint estimates have smaller posterior variances than the function-by-function estimates and are therefore more efficient. In this section we show how to use ssr and nnr to fit functions jointly.

Consider the following model

$\displaystyle y_{ji} = f_j(t_{ji}) + \epsilon_{ji},~~~~j=1,\cdots,J;~ i=1,\cdots,n_j,$     (48)

where the $i$th response of the $j$th variable $y_{ji}$ is generated as the $j$th function $f_j$ evaluated at the design point $t_{ji}$ plus a random error $\epsilon_{ji}$. We assume that $f_j$ has the model space ([*]).

Denote $\mbox{\boldmath$t$}_j = (t_{j1},\cdots,t_{jn_j})^{T}$, $\mbox{\boldmath$f$}_j = (f_j(t_{j1}),\cdots,f_j(t_{jn_j}))^{T}$, $\mbox{\boldmath$y$}_j = (y_{j1},\cdots,y_{jn_j})^{T}$, $\mbox{\boldmath$\epsilon$}_j = (\epsilon_{j1},\cdots,\epsilon_{jn_j})^{T}$, $\mbox{\boldmath$f$}= (\mbox{\boldmath$f$}_1^{T},\cdots,\mbox{\boldmath$f$}_J^{T})^{T}$, $\mbox{\boldmath$y$}= (\mbox{\boldmath$y$}_1^{T},\cdots,\mbox{\boldmath$y$}_J^{T})^{T}$, and $\mbox{\boldmath$\epsilon$}= (\mbox{\boldmath$\epsilon$}_1^{T},\cdots,\mbox{\boldmath$\epsilon$}_J^{T})^{T}$. Assume that $\mbox{\boldmath$\epsilon$}\sim \mbox{N}(0, \sigma^2 W^{-1})$, where $W$ depends on some parameters $\mbox{\boldmath$\tau$}$.

We estimate all functions $f_j$ jointly as the minimizer of the following penalized weighted least squares criterion

$\displaystyle (\mbox{\boldmath$y$}-\mbox{\boldmath$f$})^T W (\mbox{\boldmath$y$}-\mbox{\boldmath$f$}) + n\lambda \sum_{j=1}^J \sum_{k=1}^{p_j}
\theta_{jk}^{-1} \vert\vert P_{jk}f_j\vert\vert^2 ,$     (49)

where $P_{jk}$ is the orthogonal projection operator of $f_j$ onto ${\cal H}_{jk}$ in ${\cal H}_j$. We use the GML method to estimate the variance-covariance parameters $\mbox{\boldmath$\tau$}$ and the smoothing parameters $\theta_{jk}$'s as the minimizers of ([*]).

We now show how to trick ssr into fitting model ([*]). For simplicity of notation, we consider the case of $J=2$; situations with $J>2$ can be fitted similarly. Note that the domains of $t_{ji}$ may be different for different $j$. In most applications they are the same, which is assumed in the remainder of this section. Denote the common domain as ${\cal T}$. Rewrite $f_j(t)$ as $f(j,t)$, which is now regarded as a function of both $j$ and $t$ on the tensor product domain $\{ 1,2 \} \otimes {\cal T}$. Then we can represent the original functions as

$\displaystyle f(j,t) = f_1(t) \times I_{[j=1]} + f_2(t) \times I_{[j=2]}.$     (50)

Let
$\displaystyle {\cal M}_j = {\cal H}_{j0} \oplus {\cal H}_{j1} \oplus \cdots \oplus {\cal H}_{jp_j}, ~~~~j=1,2$     (51)

be the model space for $f_j$, where ${\cal H}_{j0} = \mbox{span} \{ \phi_{j1}(t),\cdots,\phi_{jm_j}(t) \}$, and the ${\cal H}_{jk}$'s are RKHS's with RKs $R_{jk}(s,t)$ for $k \ge 1$. Then it is easy to check that
$\displaystyle f(j,t) \in {\cal H}_0 \oplus {\cal H}_1 \oplus \cdots \oplus {\cal H}_{p_1} \oplus {\cal H}_{p_1+1}
\oplus \cdots \oplus {\cal H}_{p_1+p_2},$     (52)

where ${\cal H}_0=\mbox{span} \{
\phi_1(j,t),\cdots,\phi_{m_1}(j,t),
\phi_{m_1+1}(j,t),\cdots,\phi_{m_1+m_2}(j,t) \}$, $\phi_k(j,t)=\phi_{1k}(t) \times I_{[j=1]}$ for $1 \le k \le m_1$, and $\phi_{k}(j,t)=\phi_{2,k-m_1}(t) \times I_{[j=2]}$ for $m_1+1 \le k \le m_1+m_2$. The ${\cal H}_k$'s are RKHS's with RKs $R_{k}((l,s),(j,t)) = R_{1k}(s,t) \times I_{[l=1]} \times I_{[j=1]}$ for $1 \le k \le p_1$ and $R_{k}((l,s),(j,t)) = R_{2,k-p_1}(s,t) \times I_{[l=2]} \times I_{[j=2]}$ for $p_1+1 \le k \le p_1+p_2$. The model space ([*]) is similar to that of an SS ANOVA model. Thus we can use the function ssr to fit the functions $f_j$ jointly.
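In the ssr calls later in this section these product RKs are built by multiplying a spline RK with indicator kernels. As a preview (a fragment of the joint fit bisp.fit3 below, assuming the stacked data frame bisp.dat with indicator variable id constructed in the example), the rk argument takes the form

   rk=list(rk.prod(cubic(t),kron(id==0)),
           rk.prod(cubic(t),kron(id==1)))

where, in that example, kron(id==0) and kron(id==1) provide the indicator parts of the two RKs and rk.prod forms their products with the cubic spline RK.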

We may reparametrize $f(j,t)$ as

$\displaystyle f(j,t)$ $\textstyle =$ $\displaystyle f_1(t) + (f_2(t)-f_1(t)) \times I_{[j=2]}$ (53)
  $\textstyle =$ $\displaystyle (f_1(t)+f_2(t))/2 + (f_1(t)-f_2(t)) \times (I_{[j=1]}-I_{[j=2]})/2.$ (54)

([*]) and ([*]) are SS ANOVA decompositions of $f(j,t)$ with the set-to-zero and sum-to-zero side conditions respectively. This kind of ANOVA decomposition can be carried out for general $J$.

For ([*]), let $g_1(t)=f_1(t)$ and $g_2(t)=f_2(t)-f_1(t)$. Let ${\cal M}_j$ in ([*]) be the model space of $g_j$. Then

$\displaystyle f(j,t) \in {\cal H}_0 \oplus {\cal H}_1 \oplus \cdots \oplus {\cal H}_{p_1} \oplus {\cal H}_{p_1+1}
\oplus \cdots \oplus {\cal H}_{p_1+p_2},$     (55)

where ${\cal H}_0=\mbox{span} \{
\phi_1(j,t),\cdots,\phi_{m_1}(j,t),
\phi_{m_1+1}(j,t),\cdots,\phi_{m_1+m_2}(j,t) \}$, $\phi_k(j,t)=\phi_{1k}(t)$ for $1 \le k \le m_1$, and $\phi_k(j,t)=\phi_{2,k-m_1}(t) \times I_{[j=2]}$ for $m_1+1 \le k \le m_1+m_2$. The ${\cal H}_k$'s are RKHS's with RKs $R_{k}((l,s),(j,t)) = R_{1k}(s,t)$ for $1 \le k \le p_1$ and $R_k((l,s),(j,t)) = R_{2,k-p_1}(s,t) \times I_{[l=2]} \times I_{[j=2]}$ for $p_1+1 \le k \le p_1+p_2$.

For ([*]), denote $g_1(t)=(f_1(t)+f_2(t))/2$ and $g_2(t)=(f_1(t)-f_2(t))/2$. Let ${\cal M}_j$ in ([*]) be the model space of $g_j$. Then

$\displaystyle f(j,t) \in {\cal H}_0 \oplus {\cal H}_1 \oplus \cdots \oplus {\cal H}_{p_1} \oplus {\cal H}_{p_1+1}
\oplus \cdots \oplus {\cal H}_{p_1+p_2},$     (56)

where ${\cal H}_0=\mbox{span} \{
\phi_1(j,t),\cdots,\phi_{m_1}(j,t),
\phi_{m_1+1}(j,t),\cdots,\phi_{m_1+m_2}(j,t) \}$, $\phi_k(j,t)=\phi_{1k}(t)$ for $1 \le k \le m_1$, and $\phi_k(j,t)=\phi_{2,k-m_1}(t) \times (I_{[j=1]}-I_{[j=2]})$ for $m_1+1 \le k \le m_1+m_2$. The ${\cal H}_k$'s are RKHS's with RKs $R_{k}((l,s),(j,t)) = R_{1k}(s,t)$ for $1 \le k \le p_1$ and $R_{k}((l,s),(j,t)) = R_{2,k-p_1}(s,t) \times (I_{[l=1]}-I_{[l=2]}) \times (I_{[j=1]}-I_{[j=2]})$ for $p_1+1 \le k \le p_1+p_2$.
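The two reparametrizations translate directly into the rk arguments of the joint fits in the example below (a preview, assuming the stacked data frame with indicator variable id constructed there): under the set-to-zero form the shared component $g_1$ uses the cubic spline RK alone and the difference $g_2$ is multiplied by the indicator of the second variable, while under the sum-to-zero form the contrast $I_{[j=1]}-I_{[j=2]}$ is used instead,

   rk=list(cubic(t),rk.prod(cubic(t),kron(id==1)))
   rk=list(cubic(t),rk.prod(cubic(t),kron((id==0)-(id==1))))

corresponding to bisp.fit4 and bisp.fit5 respectively.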

Often we are interested in possible relationships between $f_1$ and $f_2$; for example, one may want to check whether they are equal or parallel. Let $P$ be a probability measure on ${\cal T}$. Consider the following SS ANOVA decomposition

$\displaystyle f(j,t) = \mu + \alpha_j + g_1(t) + g_{12}(j,t) ,$     (57)

where

\begin{eqnarray*}
\mu &=& \frac{1}{2} \int_{{\cal T}} (f_1(t)+f_2(t)) dP(t) , \\
\alpha_j &=& \int_{{\cal T}} f_j(t) dP(t) - \mu , ~~~~j=1,2, \\
g_1(t) &=& \frac{1}{2} (f_1(t)+f_2(t)) - \mu , \\
g_{12}(j,t) &=& f_j(t) - \mu - \alpha_j - g_1(t),~~~~j=1,2.
\end{eqnarray*}

We have $\sum_{j=1}^2 \alpha_j = \int_0^1 g_1(t) dt =
\sum_{j=1}^2 g_{12}(j,t) = \int_0^1 g_{12}(j,t) dt = 0$. $\mu$ is the overall mean, $\alpha_j$ and $g_1(t)$ are the main effects, and $g_{12}(j,t)$ is the interaction. Decomposition ([*]), which is equivalent to ([*]), makes some hypotheses more transparent. It is easy to check that the following hypotheses are equivalent:
$\displaystyle \mbox{H}_0: ~~f_1(t)=f_2(t) ~~\Longleftrightarrow~~ \mbox{H}_0: ~~\alpha_j+g_{12}(j,t) = 0 ,$
$\displaystyle \mbox{H}_0: ~~f_1(t)-f_2(t)=\mbox{constant} ~~\Longleftrightarrow~~ \mbox{H}_0: ~~g_{12}(j,t) = 0 ,$
$\displaystyle \mbox{H}_0: ~~\int_0^1 f_1(t)dt=\int_0^1 f_2(t)dt ~~\Longleftrightarrow~~ \mbox{H}_0: ~~\alpha_j = 0 ,$
$\displaystyle \mbox{H}_0: ~~f_1(t)+f_2(t)=\mbox{constant} ~~\Longleftrightarrow~~ \mbox{H}_0: ~~g_1(t) = 0 .$

Furthermore, if $\alpha_j \ne 0$ and $g_1(t) \ne 0$,
$\displaystyle \mbox{H}_0: ~~af_1(t)+bf_2(t)=c,~~\vert a\vert+\vert b\vert>0 ~~\Longleftrightarrow~~ \mbox{H}_0: ~~g_{12}(j,t) = \beta \alpha_j g_1(t) .$

Therefore the hypothesis that $f_1$ and $f_2$ are equal is equivalent to the hypothesis that the $j$ effect $\alpha_j+g_{12}(j,t)$ is zero. The hypothesis that $f_1$ and $f_2$ are parallel is equivalent to the hypothesis that the interaction $g_{12}=0$. The hypothesis that the integral of $f_1$ equals that of $f_2$ is equivalent to the hypothesis that the main effect $\alpha_j=0$. The hypothesis that the sum of $f_1$ and $f_2$ is a constant is equivalent to the hypothesis that the main effect $g_1=0$. The hypothesis that there is a linear relationship between $f_1$ and $f_2$ is equivalent to the hypothesis that the interaction is multiplicative. Thus, for these simple hypotheses we can fit the SS ANOVA model ([*]) and perform tests on the corresponding components.
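To see, for example, why parallelism reduces to $g_{12}=0$, note that the side condition $\sum_{j=1}^2 g_{12}(j,t)=0$ gives $g_{12}(2,t)=-g_{12}(1,t)$, so that

$\displaystyle f_1(t)-f_2(t) = (\alpha_1-\alpha_2) + g_{12}(1,t) - g_{12}(2,t) = (\alpha_1-\alpha_2) + 2 g_{12}(1,t) .$

This difference is constant in $t$ if and only if $g_{12}(1,t)$ is constant, and a constant $g_{12}(1,t)$ must equal zero since $\int_0^1 g_{12}(1,t) dt=0$; hence $f_1$ and $f_2$ are parallel exactly when $g_{12}=0$.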

For illustration, we generate a data set from the following model

\begin{eqnarray*}
y_{1i} &=& \sin (2\pi i / 100) + \epsilon_{1i}, \\
y_{2i} &=& \sin (2\pi i / 100) + 2 \times (i/100) + \epsilon_{2i}, ~~~~i=1,\cdots,100 ,
\end{eqnarray*}

where the random errors $(\epsilon_{1i},\epsilon_{2i})$ follow a bivariate normal distribution with $\mbox{Var}(\epsilon_{1i})=.25$, $\mbox{Var}(\epsilon_{2i})=1$ and $\mbox{Cor}(\epsilon_{1i},\epsilon_{2i})=.5$, and errors from different pairs are independent. This variance-covariance structure can be specified with a combination of the weights and correlation options. We fit with a general (unstructured) pairwise variance-covariance structure.

Suppose we want to use cubic splines to model $f_1$ and $f_2$ in ([*]), $f_1$ and $f_2-f_1$ in ([*]), and $(f_1+f_2)/2$ and $(f_1-f_2)/2$ in ([*]). Then $m_1=m_2=2$, $p_1=p_2=1$, $\phi_{11}(t)=\phi_{21}(t)=1$, $\phi_{12}(t)=\phi_{22}(t)=t-.5$, and $R_{11}$ and $R_{21}$ both equal the RK of a cubic spline. In the following we first fit $f_1$ and $f_2$ separately using the marginal data, as bisp.fit1 and bisp.fit2 respectively. Then we fit them jointly using the formulation ([*]) as bisp.fit3, the formulation ([*]) as bisp.fit4, and the formulation ([*]), or equivalently the formulation ([*]), as bisp.fit5.

> options(contrasts=rep("contr.treatment", 2))
> n <- 100
> s1 <- .5
> s2 <- 1
> r <- .5
# A satisfies A%*%t(A) = error covariance matrix:
# variances s1^2=.25 and s2^2=1, correlation r=.5
> A <- diag(c(s1,s2))%*%matrix(c(sqrt(1-r**2),0,r,1),2,2)
> e <- NULL
> for (i in 1:n) e <- c(e,A%*%rnorm(2))
> t <- 1:n/n
> y1 <- sin(2*pi*t) + e[seq(1,2*n,by=2)]
> y2 <- sin(2*pi*t) + 2*t + e[seq(2,2*n,by=2)]
# stack the two variables: id indicates the variable and
# pair links contemporaneous observations
> bisp.dat <- data.frame(y=c(y1,y2),t=rep(t,2),id=rep(c(0,1),rep(n,2)),
                         pair=rep(1:n,2))

# fit separately
> bisp.fit1 <- ssr(y~I(t-.5),rk=cubic(t),spar="m",
                   data=bisp.dat[bisp.dat$id==0,])
> p.bisp.fit1 <- predict(bisp.fit1)
> bisp.fit2 <- ssr(y~I(t-.5),rk=cubic(t),spar="m",
                   data=bisp.dat[bisp.dat$id==1,])
> p.bisp.fit2 <- predict(bisp.fit2)
	
# fit jointly	
> bisp.fit3 <- ssr(y~id*I(t-.5), rk=list(rk.prod(cubic(t),kron(id==0)),
                  rk.prod(cubic(t),kron(id==1))), spar="m",
                  weights=varIdent(form=~1|id),
                  cor=corSymm(form=~1|pair), data=bisp.dat)
> bisp.fit3
...
GML estimate(s) of smoothing parameter(s) : 0.2793703 0.3788567 
Equivalent Degrees of Freedom (DF):  11.84544 
Estimate of sigma:  0.4616973 
Correlation structure of class corSymm representing
 Correlation: 
      1 
2 0.523
Variance function structure of class varIdent representing
 0        1 
 1 2.270982
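The estimated correlation (0.523) and ratio of standard deviations (2.27) are close to the true values .5 and 2, and the estimate of $\sigma$ (0.462) is close to the true value .5. In the predict calls below, the terms option selects which components enter the prediction: its elements correspond, in order, to the fixed effect terms of y~id*I(t-.5) (the intercept, id, I(t-.5) and id:I(t-.5)) followed by the rk terms in the order they are listed. For example, terms=c(1,0,1,0,1,0) extracts the estimate of $f_1$ and terms=c(1,1,1,1,0,1) extracts the estimate of $f_2$ from bisp.fit3.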

> p.bisp.fit3.1 <- predict(bisp.fit3,newdata=bisp.dat[bisp.dat$id==0,],
                           terms=c(1,0,1,0,1,0))
> p.bisp.fit3.2 <- predict(bisp.fit3,newdata=bisp.dat[bisp.dat$id==1,],
                           terms=c(1,1,1,1,0,1))
> p.bisp.fit3.1$pstd <- p.bisp.fit3.1$pstd*sqrt((2*n-bisp.fit3$df)
                        /(2*n-bisp.fit3$df-1))
> p.bisp.fit3.2$pstd <- p.bisp.fit3.2$pstd*sqrt((2*n-bisp.fit3$df)
                        /(2*n-bisp.fit3$df-1))

> bisp.fit4 <- ssr(y~id*I(t-.5), rk=list(cubic(t),
                   rk.prod(cubic(t),kron(id==1))), 
                   spar="m", weights=varIdent(form=~1|id),
                   cor=corSymm(form=~1|pair), data=bisp.dat)
> p.bisp.fit4.1 <- predict(bisp.fit4,newdata=bisp.dat[bisp.dat$id==0,],
                           terms=c(1,0,1,0,1,0))
> p.bisp.fit4.2 <- predict(bisp.fit4,newdata=bisp.dat[bisp.dat$id==1,],
                           terms=c(1,1,1,1,1,1))
> p.bisp.fit4.3 <- predict(bisp.fit4,newdata=bisp.dat[bisp.dat$id==1,],
                           terms=c(0,1,0,1,0,1))
> p.bisp.fit4.4 <- predict(bisp.fit4,newdata=bisp.dat[bisp.dat$id==1,],
                           terms=c(0,0,0,1,0,1))
> p.bisp.fit4.1$pstd <- p.bisp.fit4.1$pstd*sqrt((2*n-bisp.fit4$df)
                           /(2*n-bisp.fit4$df-1))
> p.bisp.fit4.2$pstd <- p.bisp.fit4.2$pstd*sqrt((2*n-bisp.fit4$df)
                           /(2*n-bisp.fit4$df-1))
> p.bisp.fit4.3$pstd <- p.bisp.fit4.3$pstd*sqrt((2*n-bisp.fit4$df)
                           /(2*n-bisp.fit4$df-1))
> p.bisp.fit4.4$pstd <- p.bisp.fit4.4$pstd*sqrt((2*n-bisp.fit4$df)
                           /(2*n-bisp.fit4$df-1))	
> bisp.fit5 <- ssr(y~id*I(t-.5), 
                rk=list(cubic(t),rk.prod(cubic(t),kron((id==0)-(id==1)))), 
                spar="m", weights=varIdent(form=~1|id),
                cor=corSymm(form=~1|pair), data=bisp.dat)
> p.bisp.fit5.1 <- predict(bisp.fit5,newdata=bisp.dat[bisp.dat$id==0,],
                   terms=c(1,0,1,0,1,1))
> p.bisp.fit5.2 <- predict(bisp.fit5,newdata=bisp.dat[bisp.dat$id==1,],
                   terms=c(1,1,1,1,1,1))
> p.bisp.fit5.3 <- predict(bisp.fit5,newdata=bisp.dat[bisp.dat$id==1,],
                   terms=c(0,1,0,1,0,1))
> p.bisp.fit5.4 <- predict(bisp.fit5,newdata=bisp.dat[bisp.dat$id==1,],
                   terms=c(0,0,0,1,0,1))
> p.bisp.fit5.1$pstd <- p.bisp.fit5.1$pstd*sqrt((2*n-bisp.fit5$df)
                   /(2*n-bisp.fit5$df-1))
> p.bisp.fit5.2$pstd <- p.bisp.fit5.2$pstd*sqrt((2*n-bisp.fit5$df)
                   /(2*n-bisp.fit5$df-1))
> p.bisp.fit5.3$pstd <- p.bisp.fit5.3$pstd*sqrt((2*n-bisp.fit5$df)
                   /(2*n-bisp.fit5$df-1))
> p.bisp.fit5.4$pstd <- p.bisp.fit5.4$pstd*sqrt((2*n-bisp.fit5$df)
                   /(2*n-bisp.fit5$df-1))
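
The figures below can be drawn with standard plotting commands. As a minimal sketch (not part of the original code, and assuming, as elsewhere in this manual, that the object returned by predict contains the fitted values in the component fit along with the posterior standard deviations in pstd), one panel showing the estimate of $f_1$ from bisp.fit3 together with its 95% Bayesian confidence intervals and the true function can be produced as follows; the other panels are analogous.

> plot(t, y1, xlab="t", ylab="y1")
> lines(t, sin(2*pi*t), lty=2)                                # true function
> lines(t, p.bisp.fit3.1$fit, lty=1)                          # joint estimate of f1
> lines(t, p.bisp.fit3.1$fit-1.96*p.bisp.fit3.1$pstd, lty=3)  # 95% Bayesian CI
> lines(t, p.bisp.fit3.1$fit+1.96*p.bisp.fit3.1$pstd, lty=3)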

For each fit, we calculate the estimated functions and posterior variances. For the fits bisp.fit4 and bisp.fit5, we also calculate the estimates and posterior variances of $f_1-f_2$ and of $g_{12}$ in ([*]); these are used to check whether $f_1=f_2$ and whether the two functions are parallel, respectively. Note that we inflate the posterior variances to account for the additional degree of freedom used to estimate the correlation parameter. The estimates are shown in Figures [*], [*] and [*]. Even though it is not obvious in Figure [*], the Bayesian confidence intervals of the joint estimates are uniformly narrower than those of the function-by-function estimates. It is clear from Figures [*] and [*] that the two functions are neither equal nor parallel.

Figure: Upper left: estimate of $f_1$ from bisp.fit1. Upper right: estimate of $f_2$ from bisp.fit2. Lower left: estimate of $f_1$ from bisp.fit3. Lower right: estimate of $f_2$ from bisp.fit3. Dashed lines are the true functions. Solid lines are the estimates. Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=bisp1.ps,height=2in,width=6.5in}}\end{figure}

Figure: Upper left: estimate of $f_1$ from bisp.fit4. Upper right: estimate of $f_2$ from bisp.fit4. Lower left: estimate of $f_1-f_2$ from bisp.fit4. Lower right: estimate of $g_{12}$ from bisp.fit4. Dashed lines are the true functions. Solid lines are the estimates. Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=bisp2.ps,height=2in,width=6.5in}}\end{figure}

Figure: Upper left: estimate of $f_1$ from bisp.fit5. Upper right: estimate of $f_2$ from bisp.fit5. Lower left: estimate of $f_1-f_2$ from bisp.fit5. Lower right: estimate of $g_{12}$ from bisp.fit5. Dashed lines are the true functions. Solid lines are the estimates. Dotted lines are 95% Bayesian confidence intervals.
\begin{figure}\centerline{\psfig{figure=bisp3.ps,height=2in,width=6.5in}}\end{figure}


Yuedong Wang 2004-05-19