# Seminars in 2012

**May 11, 2012**

Ray Chambers

Centre for Statistical and Survey Methodology (CSSM), University of Wollongong

Title:

**M-Quantile Regression for Binary Data with Application to Small Area Estimation**

Abstract: M-quantile regression models were first proposed in Breckling and Chambers (1988), and were first applied to small area estimation by Chambers and Tzavidis (2006). These models represent a robust and flexible alternative to the widespread use of random effects models in small area estimation. However, since quantiles, and more generally M-quantiles, are only uniquely defined for continuous variables, M-quantile models have to date only been applicable when the variable of interest is continuously distributed. In this presentation I will show how the M-quantile regression approach can be extended to binary data, and more generally to categorical data. I will then apply this approach to estimation of the small area average of a binary variable (i.e. a proportion). The current industry standard for estimating such a proportion is to use a plug-in version of the Empirical Best predictor based on a mixed model for the logit of the probability that the target binary variable takes the value one. I will show results from both model-based and design-based simulations that compare the binary M-quantile predictor and the plug-in EB predictor. Some tentative conclusions about the usefulness of the binary M-quantile approach will be made.

Joint work with Nicola Salvati (DSMAE, University of Pisa) and Nikos Tzavidis (S3RI, University of Southampton)

**September 21, 2012**

Alan Huang

School of Mathematical Sciences, University of Technology Sydney

Title:

**Joint estimation of the mean and error distribution in generalised linear models**

Abstract: We show that both the mean-model and the error distribution in a generalised linear model (GLM) can be estimated simultaneously from data, with the proposed estimator being consistent and jointly asymptotically normal in distribution. We also show how to construct asymptotically valid inferences for the mean-model without having to specify any error distribution or variance function for the data. This provides a distinct alternative to quasilikelihood-based methods. Simulations demonstrate that the proposed approach performs remarkably well even for small sample sizes.
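To make the contrast concrete, the quasi-likelihood-style baseline that the talk offers an alternative to can be sketched in a few lines: fit the mean-model by solving estimating equations, then use a "sandwich" variance estimate that remains valid without specifying an error distribution. This is only an illustrative sketch of the baseline, not the talk's proposed joint estimator; the function names and simulated design are invented here.

```python
import numpy as np

def fit_loglink_mean_model(X, y, n_iter=50):
    """Fit E[y|x] = exp(x'beta) by solving the estimating equations
    sum_i (y_i - mu_i) x_i = 0 with Newton (IRLS-style) iterations."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        score = X.T @ (y - mu)
        info = X.T @ (mu[:, None] * X)
        beta = beta + np.linalg.solve(info, score)
    return beta

def sandwich_se(X, y, beta):
    """Sandwich standard errors: asymptotically valid without
    specifying the error distribution or a variance function."""
    mu = np.exp(X @ beta)
    bread = np.linalg.inv(X.T @ (mu[:, None] * X))
    meat = X.T @ (((y - mu) ** 2)[:, None] * X)
    cov = bread @ meat @ bread
    return np.sqrt(np.diag(cov))

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 0.3])
lam = np.exp(X @ beta_true)
# deliberately non-Poisson data: overdispersed counts with the right mean
y = rng.poisson(lam * rng.gamma(2.0, 0.5, size=n))
beta_hat = fit_loglink_mean_model(X, y)
se = sandwich_se(X, y, beta_hat)
```

Even with the misspecified (overdispersed) error distribution, the mean-model estimate stays consistent and the sandwich intervals remain usable, which is the behaviour the talk's method improves upon without the efficiency loss.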

**August 24, 2012**

Rachel Wang

University of California, Berkeley

Title:

**Inferring gene networks using sparse canonical correlation analysis**

Abstract: The study of networks pervades many disciplines of science as a way of analyzing complex systems with interacting components. Some of these networks contain clusters of densely connected nodes, known as communities, and a close examination of them allows for insights into how the individual components behave. In particular, this concept is commonly used in the study of biological pathways in gene networks. We develop a two-stage procedure for estimating gene networks and inferring pathways as tightly connected communities. We propose a novel way of assigning edge weights in a gene network, which relies on sparse canonical correlation analysis with repeated subsampling and random partition. Based on the estimated network, community structures can be identified using the block model or hierarchical clustering. Our approach is conceptually appealing, allowing for the detection of higher-level group interactions among genes. As demonstrated using both simulated and real data, our procedure achieves considerably lower false positive rates than the Pearson correlation approach, without assuming any prior biological knowledge.

**May 18, 2012**

Chris Lloyd

Melbourne Business School, University of Melbourne

Title:

**Computing Highly Accurate Confidence Limits from Discrete Data using Importance Sampling**

Abstract: For discrete data, confidence limits based on a normal approximation to standard likelihood-based pivotal quantities can perform poorly even for quite large sample sizes. Constructing exact limits requires the probability of a suitable tail set as a function of the unknown parameters. In this paper, importance sampling is used to estimate this surface and hence the confidence limits. The technology is simple and straightforward to implement. Unlike the recent methodology of Garthwaite and Jones (2009), the new method allows for nuisance parameters, is an order of magnitude more efficient than the Robbins-Monro bound, does not require any simulation phases or tuning constants, gives a straightforward simulation standard error for the target limit, and includes a simple diagnostic for simulation breakdown.
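The flavour of the approach can be sketched on a toy one-parameter problem: estimate the binomial tail-probability surface from a single importance sample drawn at one proposal value, then invert it by bisection for a lower confidence limit. This is a hypothetical illustration of the general idea only (no nuisance parameters, unlike the talk's method); the variable names are mine.

```python
import numpy as np

rng = np.random.default_rng(1)
n, x_obs, alpha = 50, 30, 0.025

# One Monte Carlo sample from a proposal Binomial(n, p0); the same
# draws are reweighted to estimate the tail probability at every p.
p0 = x_obs / n
draws = rng.binomial(n, p0, size=200_000)
tail = draws >= x_obs

def tail_prob(p):
    """Importance-sampling estimate of P(X >= x_obs) under Binomial(n, p).
    The weight is the likelihood ratio (binomial coefficients cancel)."""
    w = (p / p0) ** draws * ((1 - p) / (1 - p0)) ** (n - draws)
    return np.mean(w * tail)

# Lower confidence limit: the p at which the estimated tail probability
# crosses alpha, located by bisection on the estimated surface.
lo, hi = 1e-6, p0
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if tail_prob(mid) < alpha:
        lo = mid
    else:
        hi = mid
lower_limit = 0.5 * (lo + hi)
```

A single simulation thus serves the entire parameter range, which is what makes a simulation standard error for the target limit straightforward to compute.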

**June 8, 2012**

John Robinson

School of Mathematics and Statistics, University of Sydney

Title:

**Relative Error of the Bootstrap for Serial Correlation**

Abstract: Consider the first serial correlation coefficient under an AR(1) model where the errors are not assumed to be Gaussian. In this case it is in general necessary to consider bootstrap approximations for tests based on the statistic when the distribution of the observations is unknown. We obtain saddle-point approximations for tail probabilities of the statistic and its bootstrap version, and use these to show that the bootstrap tail probabilities approximate the true values with given relative errors, thus extending the classical results of Daniels [Biometrika 43 (1956) 169-185] for the Gaussian case. The methods require conditioning on the set of odd-numbered observations.
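The object whose accuracy the talk analyses can be sketched directly: a residual bootstrap of the lag-one serial correlation in a non-Gaussian AR(1), yielding the bootstrap tail probabilities whose relative error the saddle-point theory controls. This is a generic illustrative sketch, not the talk's construction; names and the simulated design are mine.

```python
import numpy as np

def serial_corr(x):
    """First (lag-one) serial correlation coefficient."""
    xc = x - x.mean()
    return np.sum(xc[1:] * xc[:-1]) / np.sum(xc ** 2)

rng = np.random.default_rng(2)
n, rho = 200, 0.5
# AR(1) with non-Gaussian (centred exponential) innovations
eps = rng.exponential(1.0, size=n) - 1.0
x = np.empty(n)
x[0] = eps[0]
for t in range(1, n):
    x[t] = rho * x[t - 1] + eps[t]
r_hat = serial_corr(x)

# Residual bootstrap: resample estimated innovations, rebuild the
# series, and recompute the statistic to approximate its distribution.
resid = x[1:] - r_hat * x[:-1]
resid = resid - resid.mean()
B = 500
r_boot = np.empty(B)
for b in range(B):
    e = rng.choice(resid, size=n, replace=True)
    xb = np.empty(n)
    xb[0] = e[0]
    for t in range(1, n):
        xb[t] = r_hat * xb[t - 1] + e[t]
    r_boot[b] = serial_corr(xb)

# e.g. a bootstrap tail probability for a one-sided test
p_tail = np.mean(r_boot >= r_hat)
```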

**April 27, 2012**

Simon Ho

School of Biological Sciences, The University of Sydney

Title:

**Estimating evolutionary timescales using DNA**

Abstract: The estimation of evolutionary time-scales is pivotal to a wide range of biological studies. It allows us to place our analyses and interpretations into a temporal context, and provides us with insights into rates of speciation, population divergence, and other evolutionary processes. Despite the apparent simplicity of the “molecular clock”, estimating evolutionary time-scales from DNA is not a trivial exercise. A wide range of problems can affect the estimation process, including calibration errors and rate variation among lineages. Over the past decade, there have been numerous developments in this field, allowing the molecular evolutionary process to be modelled with increasing sophistication. In addition, there are now various ways of using information from the fossil record to calibrate the molecular clock. In this talk, I will provide a brief overview of the current state of the field.

**November 9, 2012**

Tanya Garcia

Texas A&M University

Title:

**Nonparametric estimation for censored mixture data with application to the Cooperative Huntington's Observational Research Trial**

Abstract: This work presents methods for estimating genotype-specific distributions from genetic epidemiology studies where the event times are subject to right censoring, the genotypes are not directly observed, and the data arise from a mixture of scientifically meaningful subpopulations. Examples of such studies include kin-cohort studies and quantitative trait locus (QTL) studies. Current methods for analyzing censored mixture data include two types of nonparametric maximum likelihood estimators (NPMLEs) which do not make parametric assumptions on the genotype-specific density functions. Although both NPMLEs are commonly used, we show that one is inefficient and the other inconsistent. To overcome these deficiencies, we propose three classes of consistent nonparametric estimators which do not assume parametric density models and are easy to implement. They are based on inverse probability weighting (IPW), augmented IPW (AIPW), and nonparametric imputation (IMP). The AIPW achieves the efficiency bound without additional modeling assumptions. Extensive simulation experiments demonstrate satisfactory performance of these estimators even when the data are heavily censored. We apply these estimators to the Cooperative Huntington's Observational Research Trial (COHORT), and provide age-specific estimates of the effect of mutation in the Huntington gene on mortality using a sample of family members. The close approximation of the estimated non-carrier survival rates to that of the U.S. population indicates small ascertainment bias in the COHORT family sample. Our analyses underscore an elevated risk of death in Huntington gene mutation carriers compared to non-carriers for a wide age range, and suggest that the mutation equally affects survival rates in both genders. The estimated survival rates are useful in genetic counseling, providing guidelines for interpreting the risk of death associated with a positive genetic test, and helping at-risk individuals make informed decisions about whether to undergo genetic mutation testing.
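Stripped of the genotype-mixture aspect, the IPW idea underlying the first class of estimators can be sketched for a plain right-censored mean: weight each observed event by the inverse probability of remaining uncensored at its event time. An illustrative simulation (the censoring distribution is taken as known here purely for simplicity; in practice it would be estimated):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
T = rng.exponential(2.0, size=n)        # event times, E[T] = 2
C = rng.exponential(4.0, size=n)        # censoring times
Y = np.minimum(T, C)                    # observed time
delta = (T <= C).astype(float)          # event indicator

# IPW estimator of E[T]: weight each uncensored observation by the
# inverse probability of being uncensored at its event time.
# Here the censoring survival function is known: P(C > t) = exp(-t/4).
G = np.exp(-Y / 4.0)
ipw_mean = np.mean(delta * Y / G)

naive_mean = Y[delta == 1].mean()       # biased: ignores censoring
```

The naive mean of the uncensored times is pulled down because censoring preferentially removes long event times; the IPW weights undo exactly that selection.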

**August 31, 2012**

Mohamad Khaled

Business School, University of Technology Sydney

Title:

**Copula parameterizations for binary tables**

Abstract: There exists a large class of parameterizations for contingency tables, such as marginal log-linear and graphical models. Copula models have only recently started to be widely used for modelling dependence among discrete variables, but with no clear connection to the prevalent existing methodologies for contingency tables. This paper develops a rigorous mathematical framework giving existence and identification criteria that link a sub-class of marginal log-linear models with copula parameterizations of binary contingency tables. Using combinatorial results such as Möbius inversion in lattices, a bijective mapping between the different parameterizations is derived. Several illustrative examples are given as well.
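A minimal concrete instance of the copula side (not the paper's marginal log-linear mapping) is filling a 2x2 table from two Bernoulli margins and a Gaussian copula: the copula pins down P(X=0, Y=0), and the remaining three cells follow from the margins. The function name here is invented for illustration.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def binary_table_from_copula(p1, p2, rho):
    """Joint probabilities of a 2x2 binary table with margins
    P(X=1)=p1, P(Y=1)=p2 and a Gaussian copula with correlation rho:
    P(X=0, Y=0) = Phi_2(Phi^{-1}(1-p1), Phi^{-1}(1-p2); rho)."""
    a, b = norm.ppf(1 - p1), norm.ppf(1 - p2)
    p00 = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]]).cdf([a, b])
    p01 = (1 - p1) - p00          # P(X=0, Y=1)
    p10 = (1 - p2) - p00          # P(X=1, Y=0)
    p11 = 1 - p00 - p01 - p10     # P(X=1, Y=1)
    return np.array([[p00, p01], [p10, p11]])

tab = binary_table_from_copula(0.3, 0.4, 0.5)
```

With rho = 0 the table factorizes into the product of its margins; positive rho inflates the diagonal cells, which is the dependence the paper's parameterizations encode.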

**March 30, 2012**

David Hobson

Department of Statistics, University of Warwick

Title:

**Robust hedging of variance swaps**

Abstract: Under the twin assumptions of continuous monitoring and a continuous price process, Dupire and Neuberger separately showed how the variance swap can be replicated perfectly with an investment in the underlying and the purchase of -2 (minus two) log contracts. The log contracts can be replicated with vanilla options, so that if calls and puts are liquidly traded then the variance swap has a unique model-independent price.

But what if the price process is not continuous, or if, as is always the case in practice, the payoff of the variance swap is based on price changes over a finite number of dates?

We show how to construct best possible sub- and super-hedges for the variance swap, and describe the worst-case models. A perhaps surprising corollary is that the effect of jumps depends crucially on the precise formulation of the kernel of the variance swap.
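On a continuous path the replication argument reduces, per monitoring date, to the algebraic identity (dlogS)^2 = 2*(dS/S - dlogS) + third-order terms, so summing gives realized variance as -2 log(S_T/S_0) plus the gains from holding 2/S shares. A simulated sketch of that identity (my notation; a plain geometric Brownian motion path, i.e. the continuous-model world the talk departs from):

```python
import numpy as np

rng = np.random.default_rng(4)
n_steps, sigma, dt = 252, 0.2, 1 / 252

# simulate one geometric Brownian motion path
z = rng.normal(size=n_steps)
log_ret = (-0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
S = 100 * np.exp(np.concatenate([[0.0], np.cumsum(log_ret)]))

# discretely monitored variance swap kernel: sum of squared log returns
realized_var = np.sum(log_ret ** 2)

# replication: -2 log contracts plus dynamically holding 2/S_t shares
simple_ret = S[1:] / S[:-1] - 1.0
hedge = -2.0 * np.log(S[-1] / S[0]) + 2.0 * np.sum(simple_ret)

# on a continuous path the two agree up to third-order return terms
gap = abs(realized_var - hedge)
```

With jumps, or with a different kernel (squared simple returns, say), the third-order terms no longer vanish uniformly, which is where the talk's sub- and super-hedging bounds take over.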

**May 25, 2012**

Barry Quinn

Department of Statistics, Macquarie University

Title:

**Estimating the Period of a Point Process / Estimating the Parameters of an Exponentially Damped Sinusoid**

Abstract: The motivation for the first problem is from passive radar. Pulses are received from a radar emitter and the times of arrival recorded. However, some pulses may not be recorded and some may be recorded more than once. There is also additive noise in the recorded times. The model for the arrival times is a linear regression with unknown but integer-valued independent variable. The slope represents the period of transmission, and is known to lie in an interval for which the upper limit is twice the lower. Two estimation methods are suggested, and their asymptotic properties discussed. The work has been conducted jointly with Vaughan Clarkson of the University of Queensland, and Robby McKilliam of the University of South Australia.

I started working on the second problem when my son James came home from his Sydney University Physics practical with some data from a damped mass-spring system, for which he had to estimate the period or frequency and the damping coefficient. The first solution to the problem was obtained by Prony (1795). We have adapted a simple technique I developed for frequency estimation, based on Fourier coefficients. I shall discuss the asymptotic theory and the simple ideas behind the development of the algorithm.
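For the first problem, the flavour of a periodogram-style estimator can be sketched: with arrival times t_j roughly equal to theta*k_j plus noise, some integer indices k_j unobserved, maximize the coherence |sum_j exp(2*pi*i*t_j/theta)| over the known interval for the period. This is a generic sketch of the idea, not necessarily either of the talk's two estimators; the numbers are invented.

```python
import numpy as np

rng = np.random.default_rng(5)
theta_true = 1.37                 # period, known to lie in [1, 2)
keep = rng.random(300) < 0.7      # ~30% of pulses go unrecorded
k = np.flatnonzero(keep)          # integer pulse indices actually seen
times = theta_true * k + rng.normal(0.0, 0.02, size=k.size)

# coherence objective: large when all t_j / theta are near integers;
# the search interval [1, 2) has upper limit twice its lower limit,
# which rules out peaks at multiples and submultiples of the period
grid = np.linspace(1.0, 2.0, 10001)
obj = np.abs(np.exp(2j * np.pi * np.outer(1.0 / grid, times)).sum(axis=1))
theta_hat = grid[np.argmax(obj)]
```

At the true period the complex exponentials add coherently (the objective is close to the number of recorded pulses), while elsewhere they behave like a random walk of length sqrt of that number.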

**October 19, 2012**

Jan Luts

School of Mathematical Sciences, University of Technology Sydney

Title:

**Real-time semiparametric regression**

Abstract: Semiparametric regression is an extension of regression that permits the incorporation of flexible functional relationships using basis functions, such as splines and wavelets, together with penalties, and is now well developed for cross-sectional, longitudinal and spatial data. Almost all semiparametric regression analyses process the data in batch: a data set is fed into a semiparametric regression procedure at some point in time after its collection. This talk discusses doing semiparametric regression in real time, with data processed as they are collected and made immediately available via modern telecommunications technologies. Regression summaries may be thought of as dynamic web pages or iDevice apps rather than static tables and figures on a piece of paper. Online processing of data is an old idea with a very large literature. Our work uses Bayesian approaches, which handle the choice of smoothing parameters automatically, and fast variational approximations that are amenable to online updating. This talk represents joint research with Professor Matt P. Wand.
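The online-updating idea is easiest to see in its simplest conjugate form, ordinary Bayesian linear regression, where each new observation updates a Gaussian posterior in O(d^2) so that summaries are available after every datum. This is a toy sketch, not the talk's variational semiparametric machinery; the class and variable names are mine.

```python
import numpy as np

class OnlineBayesLinReg:
    """Conjugate Gaussian updating for y = x'w + noise, processing
    one observation at a time, as in streaming regression."""
    def __init__(self, dim, prior_var=10.0, noise_var=1.0):
        self.P = np.eye(dim) / prior_var   # posterior precision
        self.b = np.zeros(dim)             # precision-weighted mean
        self.noise_var = noise_var

    def update(self, x, y):
        # rank-one update: no need to revisit earlier data
        self.P += np.outer(x, x) / self.noise_var
        self.b += x * y / self.noise_var

    @property
    def mean(self):
        return np.linalg.solve(self.P, self.b)

rng = np.random.default_rng(6)
model = OnlineBayesLinReg(dim=2)
w_true = np.array([1.0, -0.5])
for _ in range(500):
    x = np.array([1.0, rng.normal()])
    y = x @ w_true + 0.3 * rng.normal()
    model.update(x, y)      # a summary could be published after each datum
```

Variational Bayes extends this same "sufficient statistics accumulate, summaries refresh" pattern to penalized-spline models where exact conjugate updates are unavailable.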

**September 7, 2012**

Justin Wishart

School of Mathematics and Statistics, University of New South Wales

Title:

**Wavelet Deconvolution in a Periodic Setting with Long-Range Dependent Errors**

Abstract: In this paper, a hard thresholding wavelet estimator is constructed for a deconvolution model in a periodic setting with long-range dependent noise. The estimation paradigm is based on a maxiset method that attains a near-optimal rate of convergence for a variety of L_p loss functions over a wide variety of Besov spaces under strong dependence. The effect of long-range dependence is detrimental to the rate of convergence. The method is implemented using a modification of the WaveD package in R, and an extensive numerical study is conducted. The study supplements the theoretical results and compares the long-range dependence (LRD) estimator with naive use of the standard WaveD approach.
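As background to the thresholding side only, a hard-thresholding wavelet denoiser can be sketched with an orthonormal Haar transform and the universal threshold sigma*sqrt(2 log n). This toy omits both the deconvolution step and the long-range dependence (under LRD the threshold must grow faster to match the noise); the function names are mine.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar wavelet transform (length a power of two)."""
    details, a = [], x.copy()
    while a.size > 1:
        s = (a[0::2] + a[1::2]) / np.sqrt(2)
        d = (a[0::2] - a[1::2]) / np.sqrt(2)
        details.append(d)
        a = s
    return details, a

def haar_inverse(details, a):
    for d in reversed(details):
        out = np.empty(2 * a.size)
        out[0::2] = (a + d) / np.sqrt(2)
        out[1::2] = (a - d) / np.sqrt(2)
        a = out
    return a

def hard_threshold_denoise(y, sigma):
    """Kill every detail coefficient below the universal threshold."""
    details, a = haar_forward(y)
    thr = sigma * np.sqrt(2 * np.log(y.size))
    details = [np.where(np.abs(d) > thr, d, 0.0) for d in details]
    return haar_inverse(details, a)

rng = np.random.default_rng(8)
n, sigma = 1024, 0.5
signal = np.repeat([0.0, 4.0, -2.0, 3.0], n // 4)   # piecewise constant
noisy = signal + sigma * rng.normal(size=n)
denoised = hard_threshold_denoise(noisy, sigma)
```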

**October 26, 2012**

**Please note unusual venue, Access Grid Room Carslaw Room 829!**

Peter J. Green

University of Technology, Sydney, and University of Bristol

Title:

**Emission tomography and Bayesian inverse problems**

Abstract:
Inverse problems are almost ubiquitous in applied science and
technology, so have long been a focus for intensive study, aimed at both
methodological development and theoretical analysis. Formulating inverse
problems as questions of Bayesian inference has great appeal in terms of
coherence, interpretability and integration of uncertainty: the Bayesian
formulation comes close to the way that most scientists intuitively
regard the inferential task, and in principle allows the free use of
subject knowledge in probabilistic model building. However, it is
important to understand the relationship between the chosen Bayesian
model and the resulting solution.

We discuss the Bayesian approach to reconstruction in single-photon
emission computed tomography, giving several empirical illustrations. We
also present theoretical results from joint work with Natalia Bochkina
(Edinburgh) about consistency of the posterior distribution of the
reconstruction, and a version of the Bernstein--von Mises theorem that
quantifies the efficiency of Bayesian inference for such ill-posed
generalised linear inverse problems with constraints.
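As a concrete entry point, the maximum-likelihood core of the reconstruction problem can be sketched with MLEM, the EM algorithm for the Poisson likelihood y ~ Poisson(A lambda), which is the classical building block to which Bayesian priors and penalties are then added. The toy system matrix below is random rather than a real tomographic geometry; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)
# toy emission problem: counts y ~ Poisson(A @ lam), A known
n_pix, n_det = 20, 40
A = rng.random((n_det, n_pix))
lam_true = rng.uniform(1.0, 5.0, size=n_pix)
y = rng.poisson(A @ lam_true)

# MLEM iterations: multiplicative EM updates for the Poisson
# likelihood; each step increases the likelihood and automatically
# preserves the positivity constraint on the emission intensities
lam = np.ones(n_pix)
sens = A.sum(axis=0)                    # A^T 1, the sensitivity image
for _ in range(500):
    lam *= (A.T @ (y / (A @ lam))) / sens
```

The positivity constraint that MLEM enforces for free is precisely the kind of boundary behaviour that complicates the asymptotics, which is where the constrained Bernstein-von Mises analysis mentioned above comes in.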

**March 23, 2012**

John Kolassa

Department of Statistics, Rutgers University

Title:

**Infinite Parameter Estimates in Polytomous Regression**

Abstract: We discuss a variety of methods for inference in polytomous regression models when model parameters are estimated at infinity. The suggested result is a special case of the method of Kolassa (1997). Competing techniques, including that of Firth (1993), are described. As time permits, connections to proportional hazards regression are discussed. Joint work with Juan Zhang.
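The phenomenon itself is easy to demonstrate in the binary (two-category) special case: under complete separation the logistic log-likelihood increases monotonically along a ray, so the maximum likelihood estimate is "at infinity", and penalized methods such as Firth's are one remedy. A minimal sketch with invented data:

```python
import numpy as np

# completely separated data: every x < 0 has y = 0, every x > 0 has y = 1
x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

def loglik(beta):
    """Logistic log-likelihood for P(y=1|x) = 1/(1+exp(-beta*x)),
    written in an overflow-safe form via logaddexp."""
    eta = beta * x
    return np.sum(y * eta - np.logaddexp(0.0, eta))

# the log-likelihood keeps climbing toward 0 as beta grows,
# so no finite beta maximizes it
lls = [loglik(b) for b in (1.0, 5.0, 25.0, 125.0)]
```

Standard Newton-type fitting routines applied to such data report ever-growing coefficients and unusable standard errors, which is the situation the talk's inference methods address.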