Type: Seminar

Distribution: World

Expiry: 19 May 2017

CalTitle1: The glue that binds statistical inference, tidy data, grammar of graphics, data visualisation and visual inference

Auth: jormerod@1.152.97.42 (jormerod) in SMS-WASM

Abstract: Buja et al (2009) and Majumder et al (2012) established and validated protocols that place data plots into the statistical inference framework. This combined with the conceptual grammar of graphics initiated by Wilkinson (1999), refined and made popular in the R package ggplot2 (Wickham, 2016) builds plots using a functional language. The tidy data concepts made popular with the R packages tidyr (Wickham, 2017) and dplyr (Wickham & Francois, 2016) completes the mapping from random variables to plot elements. Visualisation plays a large role in data science today. It is important for exploring data and detecting unanticipated structure. Visual inference provides the opportunity to assess discovered structure rigorously, using p-values computed by crowd-sourcing lineups of plots. Visualisation is also important for communicating results, and we often agonise over different choices in plot design to arrive at a final display. Treating plots as statistics, we can make power calculations to objectively determine the best design. This talk will be interactive. Email your favourite plot to dicook@monash.edu ahead of time. We will work in groups to break the plot down in terms of the grammar, relate this to random variables using tidy data concepts, determine the intended null hypothesis underlying the visualisation, and hence structure it as a hypothesis test. Bring your laptop, so we can collaboratively do this exercise. Joint work with Heike Hofmann, Mahbubul Majumder and Hadley Wickham