Research

My research has centred on the development of statistical methodology and its applications in biological and medical research. More specifically, my work has focused on developing statistical methods for integrating -omics data from multiple platforms and devising models for incorporating related biological information with these data. Topics of ongoing research are:

Processing of various high-throughput -omics data

Effective interpretation relies on appropriate preprocessing techniques for each new type of biotechnological data. Despite their advantages, new biotechnologies contain many intrinsic sources of systematic variation that reduce the ability to detect real biological signals. Identifying and addressing preprocessing issues is an important step before embarking on modelling and integration of data. Technologies and data types that we have examined include:

High-throughput sequencing (exome sequencing, RNA-seq, CAGE-seq and miRNA-seq).
Gene expression microarrays from different platforms.
miRNA expression from microarrays and RT-PCR.
Protein expression from iTRAQ and SILAC.
Clinical and pathological data.

My recent interest is in effective normalization strategies for removing intrinsic sources of systematic variation from non-standard array platforms, in particular microRNA and iTRAQ data. In contrast to gene expression and -seq data, for which there is a large literature on normalization, these data are characterized by a large proportion of missing values (up to 60 or 70% in iTRAQ) and a large proportion of noise (up to 95% in microRNA arrays).
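To make the missing-value issue concrete, here is a minimal sketch of one simple normalization, per-sample median centring, that uses only the observed values and leaves missing entries as they are. It is a generic illustration and not the strategy developed in this research; the function name `median_center` and the toy log-intensity values are assumptions made up for the example.

```python
import math
import statistics

def median_center(samples):
    """Subtract each sample's median of observed values; missing values stay NaN."""
    normalized = []
    for values in samples:
        observed = [v for v in values if not math.isnan(v)]
        # A sample with no observed values cannot be centred; shift it by zero.
        center = statistics.median(observed) if observed else 0.0
        # NaN minus a number is still NaN, so missing entries are preserved.
        normalized.append([v - center for v in values])
    return normalized

nan = float("nan")
# Hypothetical log-intensities: two samples, four features, some values missing.
log_intensities = [
    [10.0, nan, 12.0, 11.0],
    [nan, 13.0, 15.0, nan],
]
centered = median_center(log_intensities)
```

With a very high missing rate, the median of a sample rests on few observations, which is one reason simple approaches like this can become unstable for such data.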

Vertical data integration

The recent dramatic expansion in the number and types of biotechnologies has increased the types of information that we are able to measure. These data and databases now promise groundbreaking discoveries in complex diseases. Critical to these discoveries will be the development of new, powerful statistical methodologies that are adaptive, robust and stable, to unravel underlying biological structures and information from large and complex data. In the classical (horizontal) type of data integration, data are typically collected from more samples or patients with the same measurements. In contrast, vertical (omics) data integration collects more genetic information from the same patients. As such, these data are often characterised by a smaller number of samples, a larger number of variables, and sample mismatch. Successful integration faces the challenges of

understanding and developing analysis approaches for many individual platforms
developing robust models that account for correlated information from multiple platforms.
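As a small illustration of the vertical setting and the sample-mismatch problem, the sketch below joins measurements from several platforms on the patients they have in common. It is only a toy example under stated assumptions: the function `integrate_vertically`, the platform names, the patient IDs and all values are hypothetical, and keeping only fully shared samples is one simple choice among several.

```python
def integrate_vertically(platforms):
    """Join per-platform measurements on the sample IDs shared by all platforms."""
    shared = set.intersection(*(set(data) for data in platforms.values()))
    merged = {}
    for sample_id in sorted(shared):
        row = {}
        for name, data in platforms.items():
            for feature, value in data[sample_id].items():
                # Prefix each feature with its platform to avoid name clashes.
                row[f"{name}:{feature}"] = value
        merged[sample_id] = row
    return merged

# Made-up values: patient P2 has no miRNA measurement (sample mismatch).
platforms = {
    "mrna": {"P1": {"TP53": 7.2}, "P2": {"TP53": 6.8}, "P3": {"TP53": 7.9}},
    "mirna": {"P1": {"miR-21": 3.1}, "P3": {"miR-21": 2.4}},
}
merged = integrate_vertically(platforms)
```

Each added platform contributes more variables per patient while the number of usable patients can only shrink, which is the small-sample, many-variable structure described above.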

Two broad areas of questions are:

1) How can we build better predictive models for biomarker discovery?

2) What are the underlying relationships between and within these omics data?