
Research

My research has centered on the development of statistical methodology and its applications in biological and medical research. More specifically, my recent work has focused on developing statistical methods for the design and analysis of gene expression experiments using microarrays. I am also interested in integrating expression studies with other biological metadata, such as gene annotation, sequence information, and clinical data. Topics of ongoing research include:

* Design of spotted microarray experiments.

Sound experimental design facilitates simple yet powerful data analysis and interpretation. These aims must be balanced against the inevitable constraints of microarray cost and availability, and of the amount of RNA available for testing and replication. Unlike in classical experimental design, the spotted microarray system does not directly measure the gene expression levels associated with a variable of interest. Instead, each array generally yields relative expression levels between two samples, labelled with different dyes and hybridized to the same slide. The choice of experimental design should therefore optimize the precision of the estimates for the comparisons of interest, subject to the scientific and physical constraints of the experiment. An initial review of this work is presented in #7, Yang & Speed (NRG 2002). I am interested in further exploring the relationships among the number of replicates, the pooling of mRNA samples, and amplification procedures.
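The precision argument above can be made concrete with a small calculation. The sketch below assumes that each array yields one log-ratio equal to the difference of two log-expression parameters plus independent error with common variance σ², that the parameters are fitted by least squares, and that the quantity of interest is Var(cᵀβ̂)/σ² = cᵀ(XᵀX)⁻¹c for a contrast c. The function names and the three-sample loop and reference designs are illustrative choices, not code from Yang & Speed (2002).

```python
from fractions import Fraction


def solve(a, b):
    """Solve the square system a @ x = b exactly by Gauss-Jordan elimination.

    Assumes a is invertible (i.e. the design connects all samples).
    """
    n = len(a)
    # Augmented matrix [a | b] with exact rational entries.
    m = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def contrast_variance(design, contrast):
    """Return c^T (X^T X)^{-1} c, i.e. Var(c^T beta_hat) in units of sigma^2."""
    p = len(contrast)
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    v = solve(xtx, contrast)  # v = (X^T X)^{-1} c
    return sum(Fraction(ci) * vi for ci, vi in zip(contrast, v))


# Loop design A -> B -> C -> A; parameters are (B - A, C - A).
loop = [
    [1, 0],   # array 1 measures B - A
    [-1, 1],  # array 2 measures C - B
    [0, -1],  # array 3 measures A - C
]
# Reference design; parameters are (A - R, B - R, C - R).
reference = [
    [1, 0, 0],  # A vs reference
    [0, 1, 0],  # B vs reference
    [0, 0, 1],  # C vs reference
]

print("loop, B - A:", contrast_variance(loop, [1, 0]))                # 2/3
print("reference, B - A:", contrast_variance(reference, [-1, 1, 0]))  # 2
```

With three arrays each, this model gives the loop design a variance of 2σ²/3 for B − A, whereas the reference design gives 2σ², because every comparison in the reference design is made indirectly through the reference sample. The sketch ignores dye effects, replication, and RNA constraints, which a real design comparison would need to include.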

* Image analysis and normalization of spotted microarray scanned images.

Pre-processing, such as image analysis and normalization, is an important aspect of microarray experiments and can have a large impact on downstream analyses. In collaboration with Dr. Michael Buckley of CSIRO Mathematical and Information Sciences, Australia, we proposed new addressing, segmentation, and background correction methods for extracting measures of gene expression from scanned microarray images (#4, Yang et al, JCGS 2002). I have implemented these methods in the initial version of the software package Spot. In addition, I performed a comparison study with several existing image analysis packages, which suggested that the new methods implemented in Spot generally provide more accurate identification of differentially expressed genes.

Following careful image analysis, normalization is a key pre-processing step that adjusts for systematic sources of variation in measured gene expression levels. My initial work proposed normalization methods based on robust local regression that account for intensity- and spatially-dependent dye biases in cDNA microarray experiments (#3, Yang et al, NAR 2002). In collaboration with members of the Ngai Lab at the University of California, Berkeley, we have also developed a novel set of control spots that aids intensity-dependent normalization. Recently, I have used a splice array layout as a platform for comparing the various normalization methods, and part of my ongoing work is to develop a more adaptive, data-dependent normalization method.
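The intensity-dependent normalization described above can be sketched as follows. For each spot, background-subtracted red and green intensities are summarized as M = log₂(R/G) and A = ½·log₂(R·G), and a smooth trend in M as a function of A is estimated and subtracted. This is a deliberately simplified stand-in for the robust local regression (loess) of Yang et al. (NAR 2002): it uses a single pass of tricube-weighted local linear regression, with no robustness iterations and no spatial (print-tip) term. The function names, the simple background subtraction, and the default span are assumptions for illustration, not the implementation in Spot or Bioconductor.

```python
import math


def ma_values(r, g, rb, gb):
    """Background-subtract red/green spot intensities and return (M, A).

    Simple subtraction is an assumption; returns None when either corrected
    intensity is not positive, since the log-ratio is then undefined.
    """
    rc, gc = r - rb, g - gb
    if rc <= 0 or gc <= 0:
        return None
    return math.log2(rc / gc), 0.5 * math.log2(rc * gc)


def local_linear_fit(a_values, m_values, x0, span=0.4):
    """Tricube-weighted local linear fit of M on A, evaluated at x0.

    Non-robust, single pass: a simplified stand-in for loess/lowess.
    x0 should be one of a_values so that at least one weight is positive.
    """
    n = len(a_values)
    k = min(n, max(3, math.ceil(span * n)))  # number of nearest neighbours
    h = sorted(abs(a - x0) for a in a_values)[k - 1] or 1e-12
    w = [(1 - min(abs(a - x0) / h, 1.0) ** 3) ** 3 for a in a_values]
    sw = sum(w)
    xbar = sum(wi * a for wi, a in zip(w, a_values)) / sw
    ybar = sum(wi * m for wi, m in zip(w, m_values)) / sw
    sxx = sum(wi * (a - xbar) ** 2 for wi, a in zip(w, a_values))
    sxy = sum(wi * (a - xbar) * (m - ybar) for wi, a, m in zip(w, a_values, m_values))
    slope = sxy / sxx if sxx > 0 else 0.0
    return ybar + slope * (x0 - xbar)


def normalize(a_values, m_values, span=0.4):
    """Return normalized log-ratios M - c(A), where c is the fitted trend."""
    return [m - local_linear_fit(a_values, m_values, a, span)
            for a, m in zip(a_values, m_values)]


# Simulated spots with a purely linear dye bias and no true differential expression.
a_sim = [6 + 0.5 * i for i in range(20)]
m_sim = [0.2 * x - 2.0 for x in a_sim]
print(max(abs(v) for v in normalize(a_sim, m_sim)))  # close to 0
```

Because a local linear fit reproduces a straight line exactly, a purely linear dye bias is removed up to floating-point rounding; the robust, iterated loess fit is what protects real data against the influence of genuinely differentially expressed genes.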

* Software packages for microarray analysis.

Access to an efficient computing environment is an essential aspect of the analysis of genomic data. I am a core member of the Bioconductor Project, an open source software project for the analysis of biomedical and genomic data. More specifically, in collaboration with Dr. Sandrine Dudoit of UC Berkeley, I have implemented functions for processing output from various image analysis packages, producing diagnostic plots, normalizing data, and identifying differentially expressed genes (#11, Dudoit & Yang, 2003). I plan to continue my involvement with the Bioconductor Project, with an emphasis on providing accessible implementations of methodologies to users of microarray technologies.

* Identification of differentially expressed genes and integrating expression studies with clinical or genomics data.

This question arises in a broad range of microarray experiments and involves identifying genes whose expression levels are associated with response variables of interest. My initial work in this area was done with Dr. Matt Callow of the Lawrence Berkeley National Laboratory, where we examined differences in expression levels between knock-out and control mice. More recently, in collaboration with members of the Ngai Lab at UC Berkeley, we explored possible differences in the spatial patterns of gene expression in the mouse olfactory and retinal systems. Using a regression-based method to combine expression data from different spatial regions, we identified candidate genes that were then validated biologically (#10, Diaz et al, PNAS 2003). As the statistician on these studies, I analyzed the data and wrote the statistical methods section of the paper. I am also a member of the Sydney University Biological Informatics Technology Centre (SUBIT). Through my involvement with this facility, I will continue to collaborate with different investigators and to participate in the design and analysis of a broad spectrum of gene expression and other biomedical studies.
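A common first step in identifying differentially expressed genes from replicated log-ratios is to compute a per-gene test statistic and rank genes by it. The sketch below uses an ordinary one-sample t-statistic on each gene's replicate log-ratios M, testing whether the mean log-ratio is zero. It is a generic illustration with hypothetical gene names and data, not the regression-based method of Diaz et al. (PNAS 2003), and it omits the multiple-testing adjustment that a real analysis across thousands of genes would need.

```python
import math
import statistics


def one_sample_t(log_ratios):
    """t = mean(M) / (sd(M) / sqrt(n)) for one gene's replicate log-ratios."""
    n = len(log_ratios)
    if n < 2:
        raise ValueError("need at least two replicates")
    mean = statistics.mean(log_ratios)
    sd = statistics.stdev(log_ratios)
    if sd == 0:
        # No replicate spread: infinite statistic unless the mean is exactly 0.
        return math.copysign(math.inf, mean) if mean != 0 else 0.0
    return mean / (sd / math.sqrt(n))


def rank_genes(data):
    """Return (genes sorted by decreasing |t|, dict of t-statistics)."""
    t_stats = {gene: one_sample_t(ms) for gene, ms in data.items()}
    order = sorted(t_stats, key=lambda g: abs(t_stats[g]), reverse=True)
    return order, t_stats


# Hypothetical replicate log-ratios (e.g. knock-out vs control) for three genes.
example = {
    "geneA": [2.0, 2.2, 1.8],
    "geneB": [0.1, -0.1, 0.0],
    "geneC": [-1.0, -1.4, -1.2],
}
order, t_stats = rank_genes(example)
print(order)  # ['geneA', 'geneC', 'geneB']
```

With only a few replicates, the per-gene standard deviation is noisy, so a gene with a tiny but consistent log-ratio can receive a very large |t|; this is one reason moderated or regression-based statistics are often preferred in practice.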

 
