class: split-70 with-border hide-slide-number bg-brand-red
background-image: url("images/USydLogo-black.svg")
background-size: 200px
background-position: 2% 90%

.column.white[.content[

<br><br><br>

# Getting Started and R Markdown

## .black[STAT3022 Applied Linear Models Lecture 1]

<br><br><br>

### .black[2020/02/24]

]]

.column.bg-brand-charcoal[.content.white[

## Today

1. Introduce yourself to someone you don't know.
1. Course structure and expectations.
1. Meet the toolkit - learn to use R Markdown.

]]

---
background-color: black
background-image: url("images/disability.png")
background-size: contain

---
class: split-30

.column.bg-brand-blue.white[
.split-two[
.row[.content.vmiddle.center[
# Lecturers
]]
.row[.content.vmiddle.center[
# STAT3022
]]
]]

.column.black[.split-two[
.row[.content[
* .brand-blue[A/Prof. Jennifer Chan] for .brand-blue[STAT3022]
  * jennifer.chan@sydney.edu.au
  * Carslaw 817
* .brand-blue[Dr. Munir Hiabu] for .brand-blue[STAT3922]
  * munir.hiabu@sydney.edu.au
  * Carslaw 827
]]
.row[.content[
## What is this unit about?

The overall aim of this unit is to develop skills in the **statistical analysis** of data from **designed experiments** and **observational studies**. The unit will be divided into five themes:

<center>
<img src="images/themes.png" width="80%"/>
</center>
]]
]]

---
class: split-30

.column.bg-brand-blue.white[.content.vmiddle.center[
# .brand-yellow[STAT3022] Course Structure
]]

.column.black[.content[
## Structure

* 3 one-hour lectures per week on Mon, Thu & Fri 9am from week 1 to 13
* 1 tutorial per week from week 2 to 13
* 1 computer lab per week from week 1 to 13 (replaced by lecture for STAT3922 students)

## Assessments

* 2 assignments due .brand-red[Mon 6th April] and .brand-red[Mon 25th May]
  * each worth 5% of total mark for STAT3022 students
  * each worth 5% of total mark for STAT3922 students
* 1 quiz held .brand-red[9am Thu 7th May]
  * worth 20% of total mark for STAT3022 students
  * worth 15% of total mark for STAT3922 students
* final exam worth 70% of total mark for STAT3022 students and 60% for STAT3922 students during .brand-red[June exam period]
]]

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# Academic Integrity
]]

.column[.content.black[
## Assignments

* You may make use of any online resources but you must explicitly cite where you obtained any information you directly use (or use as inspiration).
* Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.
* On assignments you may not directly share code with another student in this class.
* You are welcome to discuss the problems together and ask for advice, but you may not send or make use of code from another person.
<center>
<img src="images/dsbox-logo.png" width="35%"/>
</center>

.bottom_abs.width100[
<img src="images/dsbox-logo.png" height="30px">Sourced from Data Science in a Box: https://datasciencebox.org/
]

]]

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# Help!
]]

.column[.content.black[
## Where to get help

* Post your questions on edstem.
* Ask your tutor .brand-blue[during tutorial or lab] or lecturer during <br>.brand-blue[consultation hour: 10-11am Fridays].

## Tips for asking questions

* *First search* existing resources and discussion for answers.
* Use proper formatting in edstem. E.g. if using code, use code formatting and LaTeX for mathematical equations.
* Give context and be precise in your description:
  * Good description: "For Tutorial 3 Q1 (k), the answer is given as `\(50.59165 \pm 1.812461 \times 4.567673\)`. Where does the value 1.812461 come from? Should this be `\(t_{10}^{1}(0.975)=2.228\)`, which is the same as question 1 (j)?"
  * Bad description: "I don't get Tutorial 3 Q1 (k)."

]]

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# Expectations
]]

.column[.content.black[
## Organisation

You are expected to:

* check the STAT3022 canvas website frequently;
* check and _contribute_ to the STAT3022 edstem discussion board;
* complete the tutorial and computer lab questions in Week `\(n\)` by Week `\(n + 1\)`;
* seek help from your tutor during tutorial and computer lab; and
* post your questions in edstem discussion board.

## Technical skills

You are expected to:

* program in the statistical programming language R; if you are not familiar with R, we recommend that you check the [resources page in canvas](https://canvas.sydney.edu.au/courses/14626/pages/resources) and quickly catch up with the basics of R; and
* have completed (STAT2x12 or DATA2x02) and MATH1x02, or equivalent courses.
]]

---
class: center, middle, bg-brand-red, white

# What are your expectations of this course?
---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# Theme 2 Overview
]]

.column[.content.black[
## Linear regression

* Model fit
  * Simple & Multiple linear regression
  * Polynomial regression
  * Robust regression
* Model diagnostics
  * Leverage points and outliers
  * Multi-collinearity
  * Goodness of fit measures: Multiple correlation coefficient, AIC, Cp and BIC
* Model selection
  * Forward, backwards, step-wise selection procedures
* Inference: t-test and general F-test
]]

---
class: center, middle, bg-brand-red, white

# Did someone say "Applied"?

--

<br><br>

# Theme 1

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# Toolkits
# for
# analysis
]]

.column[.content.black[
<br>
<center>
<img src="images/R.jpeg" width="100px"/><br><br>
<span class="font_large">+</span>
<br><br>
<img src="images/RStudio-Logo-Flat.png" width="200px"/>
<br><br>
<span class="font_large">+</span>
<br><br>
<img src="images/hex-tidyverse.png" width="100px"/>
<img src="images/hex-ggplot2.png" width="100px"/>
<img src="images/hex-dplyr.png" width="100px"/>
<img src="images/hex-rmarkdown.jpeg" width="100px"/>...
<br><br>
.blockquote.font_large[What's the difference between these?]
</center>
]]

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# R essentials
]]

.column[.content[
## A short list:

* Functions are (most often) verbs, followed by what they will be applied to in parentheses:

```r
do_this(to_this)
do_that(to_this, to_that, with_those)
```

* Columns (variables) in data frames are accessed with `$`:

```r
dataframe$var_name
```

* Packages are installed with the `install.packages` function and loaded with the `library` function, once per session:

```r
install.packages("package_name")
library(package_name)
```

.bottom_abs.width100[
<img src="images/dsbox-logo.png" height="30px">Sourced from Data Science in a Box: https://datasciencebox.org/
]
]]

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# Literate Programming
]
.bottom_abs[
In .yellow[literate programming], you interweave code, output and narrative in one place.
]
]

.column[.content.black[
## Reproducibility checklist

.blockquote[What does it mean for a data analysis to be "reproducible"?]

### Near-term goals:

* Are the tables and figures reproducible from the code and data?
* Does the code actually do what you think it does?
* In addition to what was done, is it clear why it was done? (e.g., how were parameter settings chosen?)

### Long-term goals:

* Can the code be used for other data?
* Can you extend the code to do other things?

.bottom_abs.width100[
Side note: .brand-blue[Literate Programming] by Donald Knuth is the seminal book on literate programming.<br>
<img src="images/dsbox-logo.png" height="30px"> Sourced from Data Science in a Box: https://datasciencebox.org/
]]]

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# Markdown
# Syntax
]]

.column[.content[
* You will achieve literate programming by using .brand-blue[R Markdown] in this course.
* Let's start with simple markdown syntax for text.
<img src="images/markdown2.png" width="50%"/><img src="images/markdown1.png" width="50%"/>

]]

---
layout: true
class: split-three

.column.bg-brand-blue.white[
.content.vmiddle.center[
# Markdown
# Example
]
.bottom_abs.width100.pad1[
{{content}}

#### See also RStudio >
#### Help >
#### Cheatsheet >
#### R Markdown Reference Guide
]]

.column[.content[
# .orange[Code]

## <u>Headers</u>

```markdown
# First level header
## Second level header
### Third level header
```

## <u>Emphasis</u>

```markdown
*This text will be italic*
**This text will be bold**
*You **can** combine them*
```

## <u>Images</u>

```markdown
![](images/kunoichi.svg)
```
]]

.column.bg-gray[.content[
# .orange[Output]

.bg-code[
# First level header
## Second level header
### Third level header
]

<br><br>

.bg-code[
*This text will be italic*<br>
**This text will be bold**<br>
*You **can** combine them*<br>
]

<br><br>

.center[
<img src="images/kunoichi.svg" width="100px">
]]]

---

{{content}}

---
count: false

## <img src="images/down-arrow.svg" style="height:1em; width:auto; "/> Look here
---
layout: true
class: split-50 with-border border-black border-dotted

.column.bg-brand-yellow[
# .center[An R Markdown document]
### File extension should be .font-mono[.Rmd]

````markdown
---
title: "A Simple Regression"
author: "Yihui Xie"
output: html_document
---

We built a linear regression model.

```{r}
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)
plot(fit, 1)
```

The *slope* of the regression is **`r b[2]`**.
````

{{content}}

]

.column[.content.nopadding[
.img-fill[![](images/knitr_html_output.png)]
]]

---
class: show-10

{{content}}

---
class: show-10
count: false

Now it's time to start knitting <img src="images/knitting.svg" width="40px">

---
count: false

Now it's time to start knitting <img src="images/knitting.svg" width="40px">

---
layout: false
class: split-50 with-border border-black border-dotted

.column.bg-brand-yellow[
# .center[An R Markdown document]
### File extension should be .font-mono[.Rmd]

````markdown
---
title: "A Simple Regression"
author: "Yihui Xie"
output: pdf_document
---

We built a linear regression model.

```{r}
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)
plot(fit, 1)
```

The *slope* of the regression is **`r b[2]`**.
````

* The output of your document is now a **PDF** file.
* You will need to have [LaTeX](https://www.latex-project.org/get/) installed for this to work!
* You can also get LaTeX through the [tinytex](https://yihui.name/tinytex/) R package.

]

.column[.content.nopadding[
.img-fill[![](images/knitr_pdf_output.png)]
]]

---
layout: true
class: split-50 with-border border-black border-dotted

.column.bg-brand-yellow[
# .center[An R Markdown document]
### File extension should be .font-mono[.Rmd]

````markdown
---
title: "A Simple Regression"
author: "Yihui Xie"
output: word_document
---

We built a linear regression model.

```{r}
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)
plot(fit, 1)
```

The *slope* of the regression is **`r b[2]`**.
````

The output of your document is now a **Word** file.
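
As a rough check of what the inline slope in the example document should display, the same model can be fitted directly in the R console. This is a minimal sketch using only base R and the built-in `cars` data; the names `slope` and `slope_text` and the rounding to two decimal places are illustrative assumptions, not part of the example document.

```r
# Fit the same simple linear regression as in the example document.
fit <- lm(dist ~ speed, data = cars)

# coef() returns a named vector: "(Intercept)" first, then "speed".
b <- coef(fit)

# Hypothetical names for illustration: pull out the slope by name
# and format it to two decimal places for use in running text.
slope <- unname(b["speed"])
slope_text <- format(round(slope, 2), nsmall = 2)

print(slope_text)
```

The same rounding could be applied inside an inline expression so that the knitted text shows a tidy value rather than many decimal places.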
{{content}}

] ]

.column[.content.nopadding[
.fig90[![](images/knitr_word_output.png)]
]]

---

---
count: false

Hint: Start an R Markdown file with RStudio > New File > R Markdown...

---
layout: false
class: split-60 with-border

.column.bg-brand-blue.white[.content[

Don't want your .yellow[code chunks] to .orange[appear] in the report?

````markdown
*```{r, echo=FALSE}
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)
plot(fit, 1)
```
````

Don't want your .yellow[code chunks] to be .orange[evaluated] (but still appear) in the report?

````markdown
*```{r, eval=FALSE}
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)
plot(fit, 1)
```
````

Want your .yellow[code chunks] to be evaluated but hide both code and output?

````markdown
*```{r, include=FALSE}
fit <- lm(dist ~ speed, data = cars)
b <- coef(fit)
plot(fit, 1)
```
````

]]

.column[.content.vmiddle.center[
# You can find other code chunk options [here](https://yihui.name/knitr/options/).
]]

---
class: split-30

.column.bg-brand-blue.center.white[.content.vmiddle[
# R Markdown
]]

.column[.content[
* Fully reproducible reports -- each time you knit, the analysis is run from the beginning.

.blockquote[
What is the difference between .green[Markdown] and .green[R Markdown]?
]

```r
summary(cars$dist)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   2.00   26.00   36.00   42.98   56.00  120.00 
```

```r
summary(cars$speed)
```

```
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    4.0    12.0    15.0    15.4    19.0    25.0 
```

<center>
<img src="images/rcode.png" width="50%"/>
</center>
]]

---
class: split-30
background-image: url("images/bg1.jpg")
background-size: cover

.column.bg-pink.center.white[.content.vmiddle[
# Excuse me, do you have time to talk about your .yellow[future]?
]]

.column[.content.black[

]]

???

Is this going to be examinable?

---
layout: true
class: split-60

.column.bg-brand-red.white[.content[
# Summary

* Carefully read the expectations.
* Be sure to schedule and plan well (important graduate attributes!)
* Get your toolkit ready - study the tools in your own time if you are not familiar with them.
* Remember your BIG goal - broad mastery of the theory and its applications is more helpful than knowing how to answer specific questions.
* Use R Markdown for reproducible reports.
* Learn to use R Markdown in this week's computer lab (needed for assignments).
]]

.column.bg-brand-charcoal.white[.content[
# Next lesson

* Data wrangling and visualisation in R

# To do

* Download and install toolkits
]]

---
class: show-10

---
count: false