S361 Linear statistical modelling (S1)

Introduction

The methods of simple linear regression and analysis of variance introduced in the Statistical inference module (867Z) can be generalized in several ways to investigate how the distribution of a response variable is influenced by one or more explanatory variables and by experimental factors. This module is mainly concerned with graphical and numerical methods for describing and quantifying such relationships. Linear regression models under the assumption of Normality are shown to be particular cases of a general Normal Linear Model (NLM), which is expressed in matrix notation. The NLM also includes many other models of importance in statistics, and several of these are investigated. The MINITAB statistical package is used to examine and analyse data sets. Three practical sessions provide experience in using this package and develop skills in preparing reports on statistical analyses.

Aims

  1. To extend the ideas of the simple linear regression model to models with more than one explanatory variable.
  2. To develop the results on matrices, univariate distributions and multivariate distributions required for the NLM.
  3. To define the NLM and describe some important special cases.
  4. To derive and illustrate methods of inference for the NLM.
  5. To provide experience of modelling and data analysis based on the NLM using the MINITAB statistical package.
  6. To give experience of preparing reports on statistical analyses.

Prerequisites

Statistical inference (867Z). Concurrent attendance at Inner product spaces (M302) is recommended.

Syllabus summary

Relationships between variables; transformations to linearity; revision of simple linear regression.

Normal and related distributions: applications to linear regression. Random vectors; linear and quadratic forms in independent Normal variables; partitioning sums of squares.

Regression theory in matrix notation: the Normal Linear Model and some special cases; least-squares estimates, fitted values and residuals; model and residual sums of squares. Student-t and F statistics for linear hypotheses; analyses of variance.

One-way classification and other simple analysis-of-variance models.
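
As an informal illustration of the matrix formulation summarised above, the following minimal sketch computes least-squares estimates, fitted values, residuals, and the model and residual sums of squares. It is written in Python with NumPy purely for illustration (the module itself uses MINITAB); the data values and variable names are hypothetical and are not taken from the module materials. The sums of squares here are uncorrected, so that the total y'y splits into a model part and a residual part.

```python
import numpy as np

# Hypothetical toy data (an assumption for illustration only):
# six observations, a column of ones for the intercept, two explanatory variables.
X = np.array([
    [1.0, 1.0, 2.0],
    [1.0, 2.0, 1.0],
    [1.0, 3.0, 4.0],
    [1.0, 4.0, 3.0],
    [1.0, 5.0, 6.0],
    [1.0, 6.0, 5.0],
])
y = np.array([3.1, 3.9, 7.2, 7.8, 11.1, 11.9])

# Least-squares estimate: the vector b minimising ||y - X b||^2.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Fitted values and residuals.
fitted = X @ beta_hat
residuals = y - fitted

# Uncorrected sums of squares: total = model + residual.
total_ss = float(y @ y)
model_ss = float(fitted @ fitted)
residual_ss = float(residuals @ residuals)

print("estimates:", beta_hat)
print("total SS:", total_ss, "model SS:", model_ss, "residual SS:", residual_ss)
```

The partition of the total sum of squares holds because the residual vector is orthogonal to every column of X, which is the geometric fact underlying the analysis-of-variance tables mentioned above.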

Texts

  1. Christensen, R., Analysis of Variance, Design and Regression, Chapman & Hall, 1996.
    Useful for S361 and the Design and Analysis of Experiments module in Statistics 4.
  2. Freund, J.E., Mathematical Statistics, 5th Edition, Prentice-Hall, 1992.
    Relevant to the distribution theory.
  3. Krzanowski, W., An Introduction to Statistical Modelling, Arnold, 1998.
    Covers much of the material of S361 and S363, but at a slightly lower level.
  4. Montgomery, D.C. & Peck, E.A., Introduction to Linear Regression Analysis, 2nd Edition, Wiley, 1992.
    Recommended for module S361.
  5. Weisberg, S., Applied Linear Regression, 2nd Edition, Wiley, 1985.
    Also useful for module S361.

Syllabus

  1. Relationships between variables; numerical examples and models; transformations to linearity; revision of simple linear regression. (4)
  2. Residual and regression sums of squares; analysis of variance in simple linear regression; inferences about slope, expected response and future response; residual analysis. (2)
  3. Linear regression with two explanatory variables: least-squares estimates; alternative formulation; analysis of variance tables; inferences about regression coefficients; analysis of residuals. (2)
  4. Gamma function; continuous random variables; Normal, Gamma, chi-square distributions; t, F distributions and tests. (2)
  5. Matrix theory: idempotent matrices, projections, partitioned matrices. (1)
  6. Multivariate distributions: p.d.f., Jacobian; marginal and conditional distributions; expectation vector; variance matrix; m.g.f. (2)
  7. Multivariate Normal distributions: definition; m.g.f.; distributions of linear and quadratic forms (with examples from Normal linear models); decompositions of sums of squares. (2)
  8. Normal Linear Model: regression models in matrix notation; least-squares estimates and their properties; fitted values and residuals; distributions of sums of squares; inferences about regression parameters using t-statistics and extra sums of squares. (3)
  9. One-way classification and other analysis-of-variance models. (4)
  10. Report writing and use of Scientific Word.
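
As a rough illustration of the one-way classification listed in item 9, the sketch below splits the corrected total sum of squares into between-group and within-group parts and forms the usual F statistic. It is a Python/NumPy sketch under stated assumptions, not the module's MINITAB procedure; the three groups of observations are hypothetical.

```python
import numpy as np

# Hypothetical observations for three treatment groups (illustration only).
groups = [
    np.array([4.1, 5.0, 4.6]),
    np.array([6.2, 5.8, 6.5, 6.1]),
    np.array([5.1, 4.9, 5.6]),
]

y = np.concatenate(groups)
n = y.size          # total number of observations
k = len(groups)     # number of groups
grand_mean = y.mean()

# Between-group (treatment) sum of squares, on k - 1 degrees of freedom.
between_ss = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)

# Within-group (residual) sum of squares, on n - k degrees of freedom.
within_ss = sum(((g - g.mean()) ** 2).sum() for g in groups)

# Corrected total sum of squares, on n - 1 degrees of freedom.
total_ss = ((y - grand_mean) ** 2).sum()

# F statistic for the hypothesis that all group means are equal.
f_stat = (between_ss / (k - 1)) / (within_ss / (n - k))

print("between SS:", between_ss, "within SS:", within_ss, "F:", f_stat)
```

Under the Normal Linear Model with equal group means, this statistic would be compared with an F distribution on (k - 1, n - k) degrees of freedom.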

Notes and Links

  1. There are three practical classes using MINITAB: this package is used extensively for illustrating regression calculations and plots, and is needed for several tutorial exercises.
  2. Development of the Normal Linear Model depends strongly on matrix results, especially linear equations, linear independence and rank, orthogonal transformations, quadratic forms, idempotent matrices and projections, and partitioned matrices. Some use is also made of vector differentiation and Lagrange multipliers.
  3. Two of the practical assignments will be assessed, and will together contribute 15% to the final mark for the module.
  4. The two-way classification is discussed only briefly. This topic, together with higher-way layouts and analysis of covariance, may be covered in S4: Design and Analysis of Experiments or S4: Data Analysis.
  5. As a small incentive to hand in good work, summer vacation employment in statistics can be found for several students each year; recommendations are made on the basis of marks obtained in S361.
  6. Three or four talks on careers in statistics are organized for one afternoon in Term 1.

Outcomes

  1. Knowledge of the definition and properties of the NLM.
  2. Familiarity with some examples of the NLM and ability to recognise other special cases.
  3. Knowledge of distributions related to the Normal distribution.
  4. Ability to use MINITAB commands for manipulation of data, regression, simple analysis of variance and graph plotting.
  5. Ability to use Scientific Word to write short reports on statistical analyses that incorporate selected MINITAB output.