Data Management and Statistical Computing (DMC)

The aim of this unit is to provide students with the knowledge and skills required to undertake moderate to high level data manipulation and management in preparation for statistical analysis of data typically arising in health and medical research. Specific objectives are for students to:

  • Gain experience in data manipulation and management using two major statistical software packages (Stata and R)
  • Learn how to display and summarise data using statistical software
  • Become familiar with the checking and cleaning of data
  • Learn how to link files through use of unique and non-unique identifiers
  • Acquire fundamental programming skills for efficient use of software packages
  • Learn key principles regarding confidentiality and privacy in data storage, management and analysis



COORDINATORS:
Dr Shenal Dedduwakumara, University of Adelaide, School of Public Health (Semester 1)
Dr Louise Marquart-Wilson, The University of Queensland, School of Public Health (Semester 2)

General outline

Prerequisites

None

Time commitment

8-12 hours total study time per week

Semester availability

Semester 1 & 2

Assessment

Three written assignments worth 30%, 35% and 35%

Recommended Texts

If you have not used R or Stata previously, it is recommended that you have access to the text for the relevant software.

Hadley Wickham and Garrett Grolemund. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media, 2017.

Svend Juul and Morten Frydenberg. An Introduction to Stata for Health Researchers. Stata Press, 2014.

Special Computer Requirements

R and Stata software; RStudio is also strongly recommended.

Content

The topics covered are:

  • Module 1 – Stata and R: The basics (importing and exporting data, recoding data, formatting data, labelling variable names and data values, using dates, data display and summary presentation, and creating programs)
  • Module 2 – Stata and R: Graphs, data management and statistical quality assurance methods (including advanced graphics to produce publication-quality graphs)
  • Module 3 – Data management using Stata and R (using functions to generate new variables; appending, merging and transposing longitudinal data; programming skills for efficient and reproducible use of these packages, including loops and arguments)

Resources

Course notes, online mini-lecture videos, online tutorials, discussion board

The BCA acknowledges we live and work on the ancestral lands of Aboriginal and Torres Strait Islander peoples, who have for thousands of generations exchanged knowledge for the benefit of all. We pay our respects to those who have cared and continue to care for Country.