

Computational Statistical Methods

STAT5003 (Semester 2, 2017)

This unit of study forms part of the Master of Information Technology degree program. It follows on from STAT5002, and will be held in Semester 2.

The objectives of this unit of study are to develop an understanding of modern computationally intensive methods for statistical learning, inference, exploratory data analysis and data mining. The unit introduces advanced computational methods for statistical learning, including clustering, density estimation, smoothing, predictive models, model selection, combinatorial optimisation, simulation methods, sampling methods, the bootstrap, Monte Carlo methods, cross-validation and the jackknife. It also demonstrates how to apply these techniques effectively to large data sets in practice, and shows how to make inferences about populations of interest in data mining problems.
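As a rough illustration of two of the resampling methods named above (the bootstrap and the jackknife), the sketch below estimates the standard error of a sample mean. It is not taken from the unit materials, which work in R; Python is used here only for illustration, and the function names `bootstrap_se` and `jackknife_se` and the sample values are hypothetical.

```python
import random
import statistics


def bootstrap_se(data, stat=statistics.mean, n_resamples=2000, seed=0):
    """Bootstrap estimate of the standard error of `stat` (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same answer
    n = len(data)
    estimates = []
    for _ in range(n_resamples):
        # Draw n observations with replacement from the original sample.
        resample = [data[rng.randrange(n)] for _ in range(n)]
        estimates.append(stat(resample))
    # The spread of the resampled statistics approximates the standard error.
    return statistics.stdev(estimates)


def jackknife_se(data, stat=statistics.mean):
    """Jackknife (leave-one-out) estimate of the standard error of `stat`."""
    n = len(data)
    # Recompute the statistic n times, leaving out one observation each time.
    leave_one_out = [stat(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return ((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out)) ** 0.5


if __name__ == "__main__":
    # Hypothetical sample values, made up for this example.
    sample = [2.1, 3.4, 1.9, 5.6, 4.2, 3.3, 2.8, 4.9, 3.7, 2.5]
    print("bootstrap SE:", bootstrap_se(sample))
    print("jackknife SE:", jackknife_se(sample))
    print("analytic SE: ", statistics.stdev(sample) / len(sample) ** 0.5)
```

For the sample mean, the jackknife estimate coincides with the familiar s/sqrt(n) formula, while the bootstrap estimate should land close to it. The value of both methods is that they carry over to statistics with no simple closed-form standard error, such as the median.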

  • Classes: Wednesday evenings, 6 pm to 9 pm.
  • Location: PNR Learning Studio 310
  • Lecture information sheet: PDF
  • You will need to bring your own laptop to work on exercises and examples.

Lecturer

Dr. Pengyi Yang
Consultation time: Carslaw 827, Tuesday 10:30 am to 12 pm. Please email me at least a day before you come.
Please contact me if you have any inquiries regarding this course (stat5003usyd@gmail.com). We will also use ED STEM to share and address questions; post and comment on questions regarding this course there.

Textbooks

  • An Introduction to Statistical Learning (with Applications in R), Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, 2014, Springer (PDF)
  • Computational Statistics (Second Edition), Geof Givens, Jennifer Hoeting, 2013, Wiley
  • Applied Predictive Modeling, Max Kuhn, Kjell Johnson, 2013, Springer
  • If you need an introduction to statistics and R, please refer to Introductory Statistics with R, Peter Dalgaard, 2008, Springer (PDF)

Project groups

Find your assigned group here: EXCEL

Survey data

Please download survey data and project description from:

  • Survey data: EXCEL
  • Project description: PDF

Final project

  • Project description: DOC
  • Project material: ZIP

Weekly lectures and tutorials

| Week | Date | Summary | Details | Notes | Tutorial | Lab Rmds |
|------|------|---------|---------|-------|----------|----------|
| 1 | 2 August | Basics of statistical computing | Review of descriptive statistics, R programming, R Markdown (reproducible research), creating survey data | PDF | PDF | ZIP |
| 2 | 9 August | Exploratory data analysis and clustering | Hierarchical clustering, k-means clustering, fuzzy c-means clustering and their variants; evaluation of clustering results | PDF | PDF | ZIP |
| 3 | 16 August | Density estimation | Univariate and multivariate density estimation using kernel-based and non-kernel-based methods | PDF | PDF | ZIP, ZIP |
| 4 | 23 August | Regression review and data smoothing | Review of concepts and models related to linear regression; introduction to kernel, spline and loess-based data smoothing methods | PDF | PDF | ZIP |
| 5 | 30 August | Introduction to classification | Classification settings, models and performance metrics | PDF | PDF | ZIP |
| 6 | 6 September | Quiz 1; Cross-validation and bootstrap | Cross-validation for evaluation, model selection and parameter selection; bootstrap for estimating variability and errors | PDF | PDF | ZIP |
| 7 | 13 September | Group presentation | Group presentation on survey data analysis | | | |
| 8 | 20 September | More on classification | Maximal margin classifier and SVM; imbalanced class distributions and semi-supervised learning | PDF | PDF | ZIP |
| 9 | 4 October | Feature and model selection | Feature selection using filter and wrapper methods; shrinkage-based methods for model selection in regression | PDF | PDF | ZIP |
| 10 | 11 October | Combinatorial optimisation | Monte Carlo methods for integration and optimisation; genetic-algorithm-based combinatorial optimisation | PDF | PDF | ZIP |
| 11 | 18 October | Quiz 2; Tree-based ensemble classifiers | Introduction to tree-based ensemble classifiers, including bagging, random forest and boosting | PDF | PDF | ZIP |

Datasets for labs and tutorials

See also the STAT5003 information in the University's unit of study database.

Timetable

Last revised 11/10/17

STAT5003

| Time | Session | Room | Staff / weeks |
|------|---------|------|---------------|
| 6pm | Lecture | PNR 310 | P.Yang |
| 6pm | Holding | ABSL3100 | Wks 1,3-13 |
| 6pm | Holding | ABSL3300 | Week 2 |
| 7pm | Lecture | PNR 310 | Y.Tam, KY.Wang, P.Yang |
| 7pm | Holding | ABSL3100 | Wks 1,3-13 |
| 7pm | Holding | ABSL3300 | Week 2 |
| 8pm | Lecture | PNR 310 | Y.Tam, KY.Wang, P.Yang |
| 8pm | Holding | ABSL3100 | Wks 1,3-13 |
| 8pm | Holding | ABSL3300 | Week 2 |
