EDA using Direct Manipulation Graphics

Di Cook
Department of Statistics, Iowa State University

Question: What is the primary goal of exploratory data analysis?

Answer: Insight: the clear (and often sudden) understanding of a complex situation; the power or act of seeing into a situation. Insight implies detecting and uncovering underlying structure in the data.

"The primary goal of EDA is to maximize the analyst's insight into a data set and into the underlying structure of a data set, while providing all of the specific items that an analyst would want to extract from a data set, such as:
1. a good-fitting, parsimonious model
2. a list of outliers
3. a sense of robustness of conclusions
4. estimates for parameters
5. uncertainties for those estimates
6. a ranked list of important factors
7. conclusions as to whether individual factors are statistically significant
8. optimal settings"
http://www.itl.nist.gov/div898/handbook/

With the evolution of fast, graphically enabled desktop computers, exploratory data analysis has evolved into a highly interactive, real-time, dynamic and visual process. To perceive multi-dimensional structure, the user displays multiple views of the data and links graphical objects in one plot with those in the other plots. Thus, with basic building blocks such as histograms, scatterplots, bar charts, mosaic plots and double decker plots, the statistician can pull out a lot of information about high-dimensional relationships in data.

This talk will demonstrate the highly interactive data analysis process as applied to a microarray data set (groan?) collected at ISU. It's interesting data! There are treatments, repeated measures, and model diagnostics. I began the analysis using graphics. A student began the analysis using statistical modeling. The lists of interesting genes that we produced had almost zero overlap! Why is this so? This talk will discuss how this is possible, what we learn from each approach and what we miss.
Along the way, we will see how to use multivariate graphics, some new complex linking between plots, and some surprising data findings.

This is joint work with Heike Hofmann, Eun-Kyung Lee, Hao Yang, Basil Nikolau and Eve Wurtele.