Machine Learning for Biostatistics (MLB)

Recent years have brought rapid growth in the amount and complexity of health data captured, requiring new statistical techniques for both predictive and descriptive learning. Machine learning algorithms for classification and prediction complement classical statistical tools in the analysis of these data. This unit covers modern machine learning methods that are particularly useful for large and complex health data.


Prof Armando Teixeira-Pinto Sydney School of Public Health, University of Sydney

General outline


Linear Models or Regression methods for epidemiology (or equivalent unit)


Categorical Data and Generalised Linear Models

Time commitment

8-12 hours total study time per week

Semester availability

Semester 2


Two major assignments worth 40% each (equivalent to 2 x 2000 words) and two short assignments worth 10% each.

Prescribed Texts

James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning with Applications in R. Springer, 2013 (freely available online). For details, including ISBN, see the BCA Textbook and Software Guide.

Special Computer Requirements

R and RStudio


The topics covered include: Linear Regression and K-Nearest Neighbors; Classification (logistic regression, linear discriminant analysis); Resampling Methods (cross-validation, bootstrap); Model Selection and Regularization (subset selection, shrinkage methods, dimension reduction methods); Beyond Linearity (fractional polynomials, basis functions, splines, generalized additive models); Tree-Based Methods (decision trees, bagging, random forests, boosting).


Course notes, assignment material and interaction facilities available online

*co-requisite, may be taken before or concurrently