Model and Estimation


The general smoothing spline regression (SSR) model with one variable assumes that (Wahba, 1990)

$\displaystyle y_i=L_i f + \epsilon_i, \ \ \ \ i=1, \cdots, n,$

(1)

where $y_i$'s are univariate responses; $f$ is an unknown function of an independent variable $t$, with $t$ belonging to an arbitrary domain ${\cal T}$ and $f\in{\cal H}$, a given reproducing kernel Hilbert space (RKHS); $L_i$'s are bounded linear functionals on ${\cal H}$; and $\epsilon_i$'s are random errors with $\epsilon_i\stackrel{iid}{\sim} \mbox{N}(0, \sigma^2)$. Note that $t$ may be a vector. For most applications, the $L_i$'s are evaluation functionals at design points $t_1, \cdots, t_n$: $L_i f = f(t_i)$.
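As an illustration of model (1) in the common case of evaluation functionals, the sketch below draws responses $y_i = f(t_i) + \epsilon_i$. It is a minimal sketch assuming Python with NumPy (ASSIST itself is an S package); the function name `simulate_ssr_data`, the design, and the test function $\sin(2\pi t)$ are hypothetical choices for illustration only.

```python
import numpy as np

def simulate_ssr_data(f, t, sigma, rng):
    """Draw responses from model (1) with evaluation functionals.

    L_i f = f(t_i) at each design point t_i, and eps_i ~ iid N(0, sigma^2).
    """
    t = np.asarray(t, dtype=float)
    signal = f(t)                                   # L_i f = f(t_i), i = 1, ..., n
    noise = rng.normal(0.0, sigma, size=t.shape)    # iid N(0, sigma^2) errors
    return signal + noise

def f_true(s):
    # Hypothetical "true" function used only for this illustration.
    return np.sin(2 * np.pi * s)

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)                       # hypothetical design on [0, 1]
y = simulate_ssr_data(f_true, t, sigma=0.2, rng=rng)
y_noiseless = simulate_ssr_data(f_true, t, sigma=0.0, rng=rng)
```

With `sigma=0.0` the responses reduce to $L_if=f(t_i)$ exactly, which is a quick sanity check on the simulation.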

Suppose that

$\displaystyle {\cal H}={\cal H}_0\oplus{\cal H}_1,$

(2)

where ${\cal H}_0$ is a finite dimensional space with basis functions $\phi_1(t), \cdots, \phi_M(t)$, and ${\cal H}_1$ is an RKHS with reproducing kernel $R_1$. See Aronszajn (1950) and Wahba (1990) for more information about RKHS. The estimate of $f$, $\hat{f}_{\lambda}$, is the minimizer of the following penalized least squares criterion

$\displaystyle \frac{1}{n}\sum_{i=1}^n(y_i-L_if)^2 + \lambda \vert\vert P_1f\vert\vert^2,$

(3)

where $P_1$ is the orthogonal projection operator of $f$ onto ${\cal H}_1$ in ${\cal H}$, and $\lambda$ is a smoothing parameter controlling the balance between goodness-of-fit, measured by the least squares term, and departure from the null space ${\cal H}_0$, measured by $\vert\vert P_1f\vert\vert^2$. Note that functions in ${\cal H}_0$ are not penalized.
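To make the trade-off controlled by $\lambda$ concrete, the sketch below approximates criterion (3) numerically. It assumes the cubic spline setting on ${\cal T}=[0,1]$ with ${\cal H}_0=\mbox{span}\{1, t-0.5\}$, for which $\vert\vert P_1f\vert\vert^2=\int_0^1 (f''(t))^2dt$; this setting, the finite-difference approximation of the integral, and the names `pls_criterion`, `f_lin` and `f_wiggly` are assumptions for illustration, not part of the original text.

```python
import numpy as np

def pls_criterion(f, t, y, lam, grid_size=2001):
    """Approximate criterion (3) for L_i f = f(t_i), assuming the cubic spline
    setting on [0, 1] where ||P_1 f||^2 equals the integral of f''(s)^2."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    fit = np.mean((y - f(t)) ** 2)                    # (1/n) sum (y_i - L_i f)^2
    s = np.linspace(0.0, 1.0, grid_size)
    h = s[1] - s[0]
    fs = f(s)
    f2 = (fs[2:] - 2.0 * fs[1:-1] + fs[:-2]) / h**2   # central differences for f''
    penalty = np.sum(f2 ** 2) * h                     # Riemann sum for the integral
    return fit + lam * penalty

def f_lin(s):
    return 1.0 + 2.0 * s          # lies in H_0 = span{1, t - 0.5}, so no penalty

def f_wiggly(s):
    return np.sin(2 * np.pi * s)  # has a nonzero component in H_1

t = np.linspace(0.0, 1.0, 25)     # hypothetical design points
y = f_wiggly(t)                   # hypothetical noise-free responses
```

Whatever value $\lambda$ takes, `f_lin` contributes only its least squares term, matching the remark that functions in ${\cal H}_0$ are not penalized; for `f_wiggly` the penalty is close to $(2\pi)^4/2$.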

Let $\mbox{\boldmath$y$}=(y_1,\cdots,y_n)^T$. Define $\xi_i(t)= L_{i(\cdot)} R_1 (t,\cdot)$, $T_{n\times M}={\{L_i\phi_v\}_{i=1}^n}_{v=1}^M$ and $\Sigma=\{\langle\xi_i, \xi_j\rangle\}_{i, j=1}^n$. Given $\lambda$, the solution to (3) has the form (Wahba, 1990)

$\displaystyle \hat{f}_{\lambda}(t)=\sum_{i=1}^M d_i \phi_i(t) +\sum_{j=1}^n c_j \xi_j(t),$

(4)

where the coefficients $\mbox{\boldmath$d$}=(d_1, \cdots, d_M)^T$ and $\mbox{\boldmath$c$}=(c_1, \cdots, c_n)^T$ are solutions to

$\displaystyle \begin{array}{rcl} (\Sigma + n\lambda I)\mbox{\boldmath$c$}+ T\mbox{\boldmath$d$} & = & \mbox{\boldmath$y$},\\ T^T\mbox{\boldmath$c$} & = & 0. \end{array}$

(5)
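The sketch below assembles $T$ and $\Sigma$, solves (5) directly as one $(n+M)\times(n+M)$ linear system, and evaluates (4). It assumes evaluation functionals $L_if=f(t_i)$, so that by the reproducing property $\xi_j(t)=R_1(t,t_j)$ and $\Sigma=\{R_1(t_i,t_j)\}_{i,j=1}^n$. It also assumes the cubic spline construction on $[0,1]$ with $\phi_1(t)=1$, $\phi_2(t)=t-0.5$ and $R_1(s,t)=k_2(s)k_2(t)-k_4(\vert s-t\vert)$, where the $k_r$ are scaled Bernoulli polynomials (see Wahba, 1990). Python/NumPy and all function names are illustrative assumptions; this is not the ASSIST or RKPACK code.

```python
import numpy as np

# Assumed example of H = H_0 + H_1: the cubic spline construction on [0, 1].
def k1(x):
    return x - 0.5

def k2(x):
    return (k1(x) ** 2 - 1.0 / 12.0) / 2.0

def k4(x):
    return (k1(x) ** 4 - k1(x) ** 2 / 2.0 + 7.0 / 240.0) / 24.0

def R1(s, t):
    """Reproducing kernel of H_1, evaluated on all pairs (s_a, t_b)."""
    s = np.asarray(s, dtype=float)[:, None]
    t = np.asarray(t, dtype=float)[None, :]
    return k2(s) * k2(t) - k4(np.abs(s - t))

def null_basis(t):
    """phi_1(t) = 1 and phi_2(t) = t - 0.5 span H_0 (M = 2)."""
    t = np.asarray(t, dtype=float)
    return np.column_stack([np.ones_like(t), k1(t)])

def solve_ssr(t, y, lam):
    """Solve system (5) for c and d with evaluation functionals L_i f = f(t_i)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    T = null_basis(t)                  # T[i, v] = L_i phi_v = phi_v(t_i)
    Sigma = R1(t, t)                   # <xi_i, xi_j> = R_1(t_i, t_j)
    M = T.shape[1]
    A = np.block([[Sigma + n * lam * np.eye(n), T],
                  [T.T, np.zeros((M, M))]])
    rhs = np.concatenate([y, np.zeros(M)])
    sol = np.linalg.solve(A, rhs)
    return sol[:n], sol[n:], T, Sigma

def predict(s, t, c, d):
    """Evaluate (4): sum_v d_v phi_v(s) + sum_j c_j xi_j(s), with xi_j(s) = R_1(s, t_j)."""
    return null_basis(s) @ d + R1(s, t) @ c

# Hypothetical data for illustration.
rng = np.random.default_rng(1)
n = 40
t = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2 * np.pi * t) + rng.normal(0.0, 0.1, n)
lam = 1e-4
c, d, T, Sigma = solve_ssr(t, y, lam)
fhat = predict(t, t, c, d)                          # fitted values at the design points
f_new = predict(np.linspace(0.0, 1.0, 101), t, c, d)
```

Solving the full block system costs $O((n+M)^3)$ and is meant only to show the structure of (5). One check: at the design points, (5) implies $\hat{f}_\lambda(t_i) = y_i - n\lambda c_i$, and data lying exactly in ${\cal H}_0$ give $\mbox{\boldmath$c$}=0$.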

The Fortran subroutine dsidr.r in RKPACK was developed to solve equations (5) (Gu, 1989). In our ASSIST package, the S function dsidr serves as an intermediate interface between S and the driver dsidr.r.
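Solving (5) as one indefinite block system is not the only option. A common reduction, sketched below under the assumption that $T$ has full column rank and $\lambda>0$, removes the constraint $T^T\mbox{\boldmath$c$}=0$ with a QR decomposition $T=(Q_1\ Q_2)\binom{R}{0}$. Writing $\mbox{\boldmath$c$}=Q_2\mbox{\boldmath$\gamma$}$ gives $(Q_2^T\Sigma Q_2+n\lambda I)\mbox{\boldmath$\gamma$}=Q_2^T\mbox{\boldmath$y$}$, and since $Q_1^T\mbox{\boldmath$c$}=0$, $R\mbox{\boldmath$d$}=Q_1^T(\mbox{\boldmath$y$}-\Sigma\mbox{\boldmath$c$})$. This illustrates the kind of reduction such solvers use; it is not a transcription of dsidr.r. The function name `solve_ssr_qr` and the random positive semidefinite stand-in for $\Sigma$ are hypothetical.

```python
import numpy as np

def solve_ssr_qr(Sigma, T, y, lam):
    """Solve (5) by eliminating T^T c = 0 through a QR decomposition of T.

    Assumes T (n x M) has full column rank and lam > 0.
    """
    n, M = T.shape
    Q, R = np.linalg.qr(T, mode="complete")   # T = Q1 Rtri with Q = [Q1 Q2]
    Q1, Q2 = Q[:, :M], Q[:, M:]
    Rtri = R[:M, :]                            # M x M upper-triangular factor
    # c must lie in the column space of Q2 so that T^T c = 0; write c = Q2 gamma.
    K = Q2.T @ Sigma @ Q2 + n * lam * np.eye(n - M)
    gamma = np.linalg.solve(K, Q2.T @ y)
    c = Q2 @ gamma
    # Project the first equation of (5) onto Q1 (Q1^T c = 0) and solve for d.
    d = np.linalg.solve(Rtri, Q1.T @ (y - Sigma @ c))
    return c, d

# Hypothetical inputs for illustration.
rng = np.random.default_rng(2)
n, M = 30, 2
t = np.linspace(0.0, 1.0, n)
T = np.column_stack([np.ones(n), t - 0.5])
B = rng.normal(size=(n, n))
Sigma = B @ B.T / n                           # random PSD matrix standing in for Sigma
y = rng.normal(size=n)
lam = 0.01
c, d = solve_ssr_qr(Sigma, T, y, lam)
```

The reduced system has dimension $n-M$ and a symmetric positive definite matrix, which avoids factoring the indefinite block matrix.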


Yuedong Wang 2004-05-19