Machine Learning for Biostatistics (MLB)

Recent years have brought rapid growth in the amount and complexity of health data captured, requiring new statistical techniques in both predictive and descriptive learning. Machine learning algorithms for classification and prediction complement classical statistical tools in the analysis of these data. This unit covers modern machine learning methods that are particularly useful for large and complex health data.



COORDINATORS:
Prof Armando Teixeira-Pinto, University of Sydney, Sydney School of Public Health (Semester 2)
General outline

Prerequisites

Epidemiology, Mathematical Foundations for Biostatistics, Principles of Statistical Inference, Regression Modelling for Biostatistics 1

Time commitment

8-12 hours total study time per week

Semester availability

Semester 2

Assessment

Two major assignments worth 40% each (equivalent to 2 x 2000 words) and two short assignments worth 10% each.

Prescribed Texts

James G, Witten D, Hastie T, Tibshirani R. An Introduction to Statistical Learning with Applications in R. Springer, 2013. (freely available online: http://www-bcf.usc.edu/~gareth/ISL/ISLR%20Seventh%20Printing.pdf)

Special Computer Requirements

R and RStudio

Content

The topics covered include: Linear Regression and K-Nearest Neighbors; Classification (logistic regression, linear discriminant analysis); Resampling Methods (cross-validation, bootstrap); Model Selection and Regularization (subset selection, shrinkage methods, dimension reduction methods); Beyond Linearity (fractional polynomials, basis functions, splines, generalized additive models); Tree-Based Methods (decision trees, bagging, random forests, boosting).

Resources

Course notes, online mini-lecture videos, online tutorials, discussion board