mgcv overview
mgcv is an R package providing Generalized Additive Modelling routines.
The routines are similar to those described in the White Book (Chambers and Hastie, Statistical Models in S) and implemented in S-PLUS, but there are
some substantial differences in the underlying implementation:
- The mgcv gam() routine defaults to automatic selection of smoothing parameters
by minimizing a GCV score (or related C_p score) for the whole fitted model.
- GAMs are not estimated by backfitting, but by direct penalized regression or
penalized likelihood methods. The penalties control the smoothness of model terms,
and the `smoothing parameters', which control the weight given to each penalty in
fitting, are what get optimized to minimize the GCV score.
- To facilitate the above, smooths are represented using basis functions and associated
penalties. Built in bases are cubic regression splines, cyclic cubic regression splines,
and the very general `thin plate regression splines', which can be used for smooths of any
number of variables.
- User defined smooths can be added: P-splines are provided as an example of how to
do this.
- Smooths are typically of low rank - that is, they have substantially fewer parameters than
there are data to fit, but more parameters than the modeller believes are strictly necessary:
penalization avoids over-fitting.
- loess smooths are not available, as they cannot be represented using a basis and
quadratic penalty, but smooths of more than one variable are available, via (isotropic)
thin plate regression splines or tensor product smooths.
- A very general mechanism for tensor product smoothing is implemented: this enables
smooths of several variables to be constructed in a straightforward manner from
tensor products of smooths of fewer variables (usually one). Tensor product smooths have
a penalty per covariate of the smooth, and are invariant to linear rescaling of the covariates.
The construction avoids the undersmoothing problems seen with single penalty tensor product
smooths.
- Confidence intervals are calculated using a Bayesian model of the smoothing process, related
to the mixed model representation of a GAM. The estimated posterior covariance matrix of the
parameters of the GAM is directly available to the user, making prediction with confidence/credible intervals
particularly straightforward: for example, predict.gam() can produce standard errors for predictions
at any point without difficulty.
- Generalized Additive Mixed Models (GAMMs) are implemented with PQL or REML estimation.
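As a minimal sketch of the points above, the following fits a GAM by penalized regression with GCV-based smoothing parameter selection. The simulated data and term choices here are illustrative, not part of mgcv itself:

```r
library(mgcv)

## Simulated example data (hypothetical): a smooth signal plus noise.
set.seed(1)
n  <- 200
x0 <- runif(n)
x1 <- runif(n)
y  <- sin(2 * pi * x0) + x1^2 + rnorm(n) * 0.3
dat <- data.frame(y = y, x0 = x0, x1 = x1)

## Penalized regression fit; smoothing parameters chosen to minimize
## the GCV/Cp score. bs = "cr" requests a cubic regression spline
## basis; the second term uses the default thin plate regression
## spline basis.
b <- gam(y ~ s(x0, bs = "cr") + s(x1), data = dat, method = "GCV.Cp")

b$sp  # the estimated smoothing parameters, one per penalty
```

Replacing bs = "cr" with bs = "cc" gives a cyclic cubic regression spline, and bs = "ps" a P-spline.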
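A sketch of the tensor product construction, using hypothetical covariates on very different scales, which is where the rescaling invariance of te() matters:

```r
library(mgcv)

## Simulated data (hypothetical) with covariates on unlike scales.
set.seed(2)
n <- 300
x <- runif(n)          # covariate on [0, 1]
z <- runif(n) * 100    # covariate on a much larger scale
y <- sin(2 * pi * x) * cos(z / 30) + rnorm(n) * 0.3
dat <- data.frame(y = y, x = x, z = z)

## Tensor product smooth: one penalty per covariate, invariant to
## linear rescaling of x and z.
b_te <- gam(y ~ te(x, z), data = dat)

## Isotropic alternative: a single-penalty thin plate regression
## spline, appropriate when x and z share a natural common scale.
b_tp <- gam(y ~ s(x, z, bs = "tp"), data = dat)

length(b_te$sp)  # te(x, z) carries two smoothing parameters
```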
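The posterior covariance matrix and predict.gam() standard errors can be used roughly as follows; the fitted model and simulated data are again hypothetical:

```r
library(mgcv)

## Hypothetical fit to simulated data.
set.seed(3)
dat <- data.frame(x = runif(100))
dat$y <- sin(2 * pi * dat$x) + rnorm(100) * 0.2
b <- gam(y ~ s(x), data = dat)

## The Bayesian posterior covariance matrix of the coefficients.
Vp <- vcov(b)   # also stored as b$Vp

## Predictions with standard errors at new covariate values;
## approximate 95% credible intervals follow directly.
newd  <- data.frame(x = seq(0, 1, length.out = 50))
pr    <- predict(b, newd, se.fit = TRUE)
lower <- pr$fit - 2 * pr$se.fit
upper <- pr$fit + 2 * pr$se.fit
```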
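A GAMM sketch: gamm() estimates Gaussian models by REML via lme and non-Gaussian models by PQL. The grouping factor and simulated random effects here are invented for illustration:

```r
library(mgcv)

## Simulated grouped data (hypothetical): 10 groups of 20, each with
## its own random intercept.
set.seed(4)
n <- 200
dat <- data.frame(x = runif(n), fac = factor(rep(1:10, each = 20)))
dat$y <- sin(2 * pi * dat$x) + rnorm(10)[dat$fac] + rnorm(n) * 0.3

## Smooth of x plus a random intercept for each level of fac.
m <- gamm(y ~ s(x), random = list(fac = ~1), data = dat)

## gamm() returns both representations of the fitted model:
class(m$gam)   # the GAM part, for summary/plot/predict
class(m$lme)   # the underlying linear mixed model fit
```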