# Seminars in 2011

**March 25, 2011**

Qiying Wang

School of Mathematics and Statistics

University of Sydney

Title:

**Martingale limit theorems revisited and non-linear cointegrating regression**

Abstract: For a certain class of martingales, convergence to a mixed normal distribution is established under convergence in distribution of the conditional variance. This is less restrictive than the classical martingale limit theorem, which generally requires convergence in probability. The extension removes a main barrier to applying the classical martingale limit theorem to non-parametric estimation and inference under non-stationarity, and substantially enhances its effectiveness as one of the main tools for investigating asymptotics in statistics, econometrics and other fields. The main result is applied to the asymptotics of the conventional kernel estimator in a nonlinear cointegrating regression, which substantially improves on existing results in the literature.

**April 1, 2011**

Clare McGrory

MIPS Portfolio, Mathematical Sciences, Queensland University of Technology

Title: **Variational Bayes for Spatial Data Analysis**

Abstract: The Variational Bayes method is emerging as a viable alternative to MCMC-based approaches for performing Bayesian inference. The key strengths of the variational Bayes approach are its computational efficiency and ease of implementation. This makes it particularly useful for practical applications where large datasets are frequently encountered. In this talk we will discuss its use for spatial data or image analysis with a focus on hidden Markov random field modelling. The variational approach that we will outline will be illustrated with applications in medical imaging and environmental modelling.

**April 15, 2011**

Jonathan M. Keith

School of Mathematical Sciences, Clayton campus, Monash University

Title: **Applications of Bayesian methods in Bioinformatics and Genetics**

Abstract: This talk will discuss three applications of Bayesian methods to complex large-scale problems in bioinformatics and genetics. The first application involves identifying new genes and new classes of functional element in genomes, by segmenting whole genome alignments and simultaneously classifying segments into putative functional classes. The second application involves analysis of whole-genome association study data to determine genomic regions responsible for genetic disease, using a logistic regression model with variable selection. The third application involves modelling the spread of invasive pest species, incorporating genetic data. All three applications involve large data sets and complex hierarchical Bayesian models. The models are too large and complex to be implemented in the popular WinBUGS package, and are instead implemented in C++.

**May 6, 2011**

Mohamad Khaled

Operations Management and Econometrics, Business School

University of Sydney

Title:

**Estimation of copula models with discrete margins**

Abstract: Estimation of copula models with discrete margins is known to be difficult beyond the bivariate case. We show how this can be achieved by augmenting the likelihood with uniform latent variables, and computing inference using the resulting augmented posterior. To evaluate this we propose two efficient Markov chain Monte Carlo sampling schemes. One generates the latent variables as a block using a Metropolis-Hastings step with a proposal that is close to its target distribution. Our method applies to all parametric copulas where the conditional copula functions can be evaluated, not just elliptical copulas as in previous Bayesian work. Moreover, the copula parameters can be estimated jointly with any marginal parameters. We establish the effectiveness of the estimation method by modeling consumer behavior in online retail using Archimedean and Gaussian copulas and by estimating 16-dimensional D-vine copulas for a longitudinal model of usage of a bicycle path in the city of Melbourne, Australia. Finally, we extend our results and method to the case where some margins are discrete and others continuous.

This is joint work with Professor Michael S. Smith (Melbourne Business School).

**May 20, 2011**

Andrzej Stefan Kozek

Department of Statistics

Macquarie University

Title:

**Data sharpening by improved quantile estimators**

Abstract: In probability density estimation the kernel method, despite its many drawbacks, remains popular, next to histograms, because of its simplicity. There exist many approaches to improving the original Parzen-Rosenblatt estimator; one of them has been labelled 'data sharpening'. Data sharpening consists of replacing the original data with slightly corrected 'sharpened data'. This correction reduces the bias of kernel estimators of the probability density function. We show that properly chosen nonparametric estimators of (i/(n+1))-quantiles can serve as the sharpened data, and that in simulations they consistently outperform, on average, the original estimator with the Sheather-Jones-Hall-Marron smoothing parameter.

**June 3, 2011**

David Warton

School of Mathematics and Statistics and Evolution & Ecology Research Centre

University of New South Wales

Title:

**Unifying methods for species distribution modelling using presence-only data in ecology**

Abstract: Technology has enabled rapid advances in data analysis across multiple disciplines - with the collection of new types of data posing new challenges, and with the development of new methods for analysing data rapidly increasing our analytical capacity. An important example is species distribution modelling using presence-only data - geographic information systems (GIS) enable the study of environmental variables at a spatial resolution far higher than previously possible, and new methods of data analysis are rapidly being developed for studying how such environmental variables relate to species occurrence (or "presence-only") records.

In this talk, we show that three different methods of analysis, from the ecology, machine learning and statistical literatures, are all equivalent. This advance offers new insights on how to overcome the methodological weaknesses of the two most widely used methods for species distribution modelling using presence-only data - pseudo-absence regression and MAXENT - via the use of a point process model specification. An example issue that can now be addressed more effectively is understanding the role of spatial resolution in species distribution modelling. The increased functionality available via point process models will be discussed, and finally, a new method for accounting for observer bias is proposed.

**June 10, 2011**

James P. Hobert

Department of Statistics, University of Florida

Title:

**Improving the Data Augmentation Algorithm**

Abstract: After a brief review of the data augmentation (DA) algorithm, I will introduce a simple modification that results in a new Markov chain that remains reversible with respect to the target distribution. We call this modification the "sandwich algorithm" because it involves an extra move that is sandwiched between the two conditional draws of the DA algorithm. General results will be presented showing that the sandwich algorithm is at least as good as the underlying DA algorithm in terms of both efficiency and convergence rate. An example will be used to demonstrate that, in practice, massive gains are possible. (This is joint work with K. Khare, D. Marchev and V. Roy.)

**July 29, 2011**

Yoni Nazarathy

Applied Mathematics, Faculty of Engineering & Industrial Sciences, Swinburne University of Technology

Title:

**Scaling limits of cyclically varying birth-death processes**

Abstract: Fluid limits of stochastic queueing systems have received considerable attention in recent years. The general idea is to scale space, time and/or system parameters so as to obtain a simpler, yet accurate description of the system. A basic example is the single server queue with time speeded up and space scaled down at the same rate. A second well known example is the Markovian infinite server queue with the arrival rate speeded up and space scaled down at the same rate. Such scalings and their network generalizations are often useful for obtaining stability conditions and approximating optimal control policies. In this talk we consider birth-death processes with general transition rates and obtain an asymptotic scaling result, generalizing the Markovian single server and infinite server cases. We apply our results to the steady-state analysis of queueing systems with cyclic or time varying behaviour. Examples are systems governed by deterministic cycles, queues with hysteresis control and queues with Markov-modulated arrival or service rates. The unifying property of such systems is that, if they are properly scaled, the resulting trajectories follow a cyclic or piece-wise deterministic behaviour which is determined by the asymptotic scaling. This yields a simple approximation for the stationary distribution which is shown to be asymptotically exact. Joint work with Matthieu Jonckheere.

**August 12, 2011**

Peter C. Thomson

Veterinary Biometry, Faculty of Veterinary Science, University of Sydney

Title:

**Microarray analysis: a systematic approach using mixed-models and finite-mixtures**

Abstract: Microarrays measure the expression levels of many thousands of genes simultaneously, and these measurements are usually performed on different physiological states (e.g. tissue types, times, strains, etc.). A frequent goal is then to determine which genes are differentially expressed (DE) as opposed to those that are not differentially expressed (non-DE). Genes are classified as being DE if their expression levels differ "significantly" between two (or more) states, or show "unusual" behaviour in a particular state. A common approach is to consider one gene at a time (e.g. t-tests or ANOVA). However, this poses a multiple testing problem which may in part be overcome by false discovery rate (FDR) control.

An alternative approach is to develop a model for the evaluation of all the gene features simultaneously, and proceed as a model fitting rather than a hypothesis testing process. For this method, a two-stage analysis is performed. (1) All the normalised expression level data are analysed simultaneously using a large-scale linear mixed model. This model includes fixed effects to describe the physical design of the microarrays, as well as random effects to describe the overall effects of genes (G) and the effects of genes in different states (G.S). (2) A two-component mixture model (DE and non-DE) is then fitted to the BLUPs of the G.S effects. The mixture model is fitted using the E-M algorithm, and the process returns posterior probabilities of individual genes being DE. These posterior probabilities are used to classify which genes are DE, and an FDR-type strategy may be applied to these. However, further refinements to this process can be made by a simultaneous fitting of a mixture of linear mixed models. This method will be illustrated by the analysis of a large-scale microarray study of lactation in the tammar wallaby.
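As a rough illustration of stage (2) only, the sketch below fits a two-component normal mixture by EM and returns the posterior probability that each value belongs to the wider component. It is not the speaker's model: the zero-mean components with different variances, the starting values, the function names and the toy input are all assumptions made for this example, and plain numbers stand in for the BLUPs of the G.S effects.

```python
import math

def normal_pdf(x, var):
    """Density of a zero-mean normal with variance `var` at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fit_two_component_mixture(values, n_iter=200):
    """EM for pi * N(0, var_de) + (1 - pi) * N(0, var_null).

    Assumption for this sketch: both components are centred at zero and
    differ only in variance (narrow = non-DE, wide = DE).
    Returns (pi_de, var_null, var_de, posterior_de).
    """
    n = len(values)
    overall_var = sum(v * v for v in values) / n
    # Arbitrary starting values: a narrow and a wide component.
    pi_de, var_null, var_de = 0.5, 0.25 * overall_var, 4.0 * overall_var
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each value is from the wide component.
        post = []
        for v in values:
            a = pi_de * normal_pdf(v, var_de)
            b = (1.0 - pi_de) * normal_pdf(v, var_null)
            post.append(a / (a + b))
        # M-step: update the mixing proportion and the two variances.
        w_de = sum(post)
        w_null = n - w_de
        pi_de = w_de / n
        var_de = max(sum(p * v * v for p, v in zip(post, values)) / w_de, 1e-12)
        var_null = max(sum((1.0 - p) * v * v for p, v in zip(post, values)) / w_null, 1e-12)
    return pi_de, var_null, var_de, post
```

Values with a posterior probability above a chosen threshold would then be classified as DE; the choice of threshold is left open here.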

**August 26, 2011**

Georgy Sofronov

Department of Statistics, Macquarie University

Title:

**Change-point detection in binary sequences**

Abstract: Change-point problems (or break point problems, disorder problems) can be considered one of the central issues of mathematical statistics, connecting asymptotic statistical theory and Monte Carlo methods, frequentist and Bayesian approaches, fixed and sequential procedures. In many real applications, observations are taken sequentially over time, or can be ordered with respect to some other criterion. The basic question, therefore, is whether the data obtained are generated by one or by many different probabilistic mechanisms. The change-point problem arises in a wide variety of fields, including biomedical signal processing, speech and image processing, seismology, industry (e.g. fault detection) and financial mathematics. In this talk, we consider various approaches to change-point detection in binary sequences, using Monte Carlo simulation to find estimates of change-points as well as parameters of the process on each segment. We also demonstrate the methods for a realistic problem arising in computational biology.
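To make the problem concrete, the sketch below estimates a single change-point in a 0/1 sequence by exhaustive search over the maximised two-segment Bernoulli likelihood. This is a deliberately simple stand-in, not the Monte Carlo approaches discussed in the talk, and it assumes exactly one change; the function names are invented for the example.

```python
import math

def bernoulli_loglik(ones, n):
    """Maximised Bernoulli log-likelihood for a segment with `ones` successes in n trials."""
    if n == 0 or ones == 0 or ones == n:
        return 0.0  # the fitted probability is 0 or 1, so the likelihood is 1
    p = ones / n
    return ones * math.log(p) + (n - ones) * math.log(1.0 - p)

def single_change_point(x):
    """Return the split index c (1 <= c < len(x)) with the highest
    two-segment likelihood; x[:c] and x[c:] are the two segments."""
    n = len(x)
    total = sum(x)
    best_c, best_ll = None, -math.inf
    left = 0
    for c in range(1, n):
        left += x[c - 1]  # number of ones in x[:c]
        ll = bernoulli_loglik(left, c) + bernoulli_loglik(total - left, n - c)
        if ll > best_ll:
            best_c, best_ll = c, ll
    return best_c
```

The per-segment success probabilities then follow as the proportion of ones on each side of the returned index.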

**September 16, 2011**

Robert Lanfear

Centre for Macroevolution and Macroecology, Research School of Biology, Australian National University

Title:

**Measuring mutants: statistical and mathematical challenges in understanding molecular evolution, and how to solve some of them**

Abstract: Understanding molecular evolution is both interesting in its own right, and also central to how we use and interpret the information contained in DNA sequences. It is also fraught with statistical and mathematical challenges. In this talk I will discuss a number of recent attempts to overcome some of these challenges, and describe how each new solution changed how we understand the information in DNA sequences. These challenges include how we account for variation in evolutionary processes at different sites of DNA sequences, how we can statistically compare patterns of molecular evolution in different lineages, and how we can use comparative methods to understand the causes and consequences of molecular evolution.

**September 23, 2011**

Alicia Oshlack

Murdoch Childrens Research Institute, The Royal Children's Hospital

Title:

**Dealing with the data deluge: differential expression analysis for RNA-seq**

Abstract: New high-throughput DNA sequencing technologies have been rapidly developing over the past 5 years and are routinely able to produce millions of short DNA reads. These technologies are being widely used to study the transcriptome in which steady state RNA is sequenced (RNA-seq). These technologies produce vast amounts of data which need to be analysed in order to make biological inference. RNA-seq data are complex and the analysis involves a series of steps for which research is ongoing. In this talk I will describe the data and outline the major steps involved in analysing an RNA-seq dataset, focusing on methods to determine differentially expressed genes between samples and to perform gene set testing.

**October 7, 2011**

Fabio Ramos

School of Information Technologies, University of Sydney

Title:

**Making sense of the World with Machine Learning for Data Fusion**

Abstract: Physically grounded problems are typically characterised by the need to jointly infer multiple quantities from various sensor modalities, at different space and time resolutions. As an example, consider the problem of estimating a real-time spatial-temporal model of pollution dispersion in a river using mobile platforms. Given the technology available, the vehicle can sense biomass, temperature, pH and many other chemical/physical quantities. Understanding the relationships between these quantities can significantly improve the accuracy of the method while reducing the uncertainty about the phenomenon. In this talk I will show a set of techniques for nonparametric Bayesian modelling that address the challenges in spatial-temporal modelling with heterogeneous sensors. In particular, I will show: 1) how to define exact and sparse models that are scalable to large datasets; 2) how to integrate data collected at different supports and resolutions; and 3) how to automatically learn relationships between different quantities in real-time, from mobile platforms. I will show applications of these methods to a number of problems in robotics, mining and environmental monitoring.

Short bio: Fabio Tozeto Ramos received the B.Sc. and the M.Sc. degrees in Mechatronics Engineering at University of Sao Paulo, Brazil, in 2001 and 2003 respectively, and the Ph.D. degree at University of Sydney, Australia, in 2007. From 2007 to 2010 he was an Australian Research Council (ARC) research fellow at the Australian Centre for Field Robotics (ACFR). In 2011, he commenced as a Senior Lecturer in machine learning at the School of Information Technologies, University of Sydney. He has over 70 peer-reviewed publications and received the Best Paper Award at the International Conference on Intelligent Robots and Systems (IROS) and at the Australian Conference on Robotics and Automation (ACRA). He is an associate editor for ICRA and IROS, and a program committee member for RSS, AAAI and IJCAI. His research focuses on statistical learning techniques for large-scale regression and classification problems, stochastic spatial modelling, and multi-sensor data fusion with applications in robotics and mining. He leads the Learning and Reasoning group at ACFR.

**October 28, 2011**

Aurore Delaigle

Department of Mathematics and Statistics, The University of Melbourne

Title:

**Nonparametric Regression from Group Testing Data**

Abstract: To reduce cost and increase speed of large screening studies, data are often pooled in groups. In these cases, instead of carrying out a test (say a blood test) on all individuals in the study to see if they are infected or not, one only tests the pooled blood of all individuals in each group. We consider this problem when a covariate is also observed, and one is interested in estimating the conditional probability of contamination. We show how to estimate this conditional probability using a simple nonparametric estimator. We illustrate the procedure on data from the NHANES study.
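The basic relation behind group testing can be sketched as follows. The example ignores the covariate and is not the nonparametric estimator from the talk. It assumes independent individuals with a common infection probability and an error-free test; the function names are invented for the illustration.

```python
def pool_positive_prob(p, k):
    """Probability that a pool of k individuals tests positive when each is
    infected independently with probability p (error-free test assumed)."""
    return 1.0 - (1.0 - p) ** k

def individual_prob_from_pool(q, k):
    """Invert the relation above: recover the individual infection
    probability p from the pool-positive probability q."""
    return 1.0 - (1.0 - q) ** (1.0 / k)
```

Under these assumptions, an estimate of the pool-level positive rate can be mapped back to an individual-level probability with the second function.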

**November, 2011**

Uwe Hassler

Goethe University Frankfurt

Abstract (pdf file)