Sydney Precision Bioinformatics

The goal of the group's work is to develop methods to improve the interpretation of omics data, leading to tangible translational benefits to important biological problems. You could become part of the team working on one of the projects detailed below.

Future Projects

Integrative Omics

Cell fate prediction using multimodal single-cell omics data and statistical learning (1 Honours or PhD Student)

While all cells from a given organism have the same DNA sequence that codes for the same genes, cells make "fate decisions" to become different cell types as they differentiate into complex tissues and organs. This project aims to identify the molecular programs that underlie and predict cell fate decisions during cell differentiation and organ development. Leveraging recent developments in multimodal single-cell omics technologies, we aim to design statistical learning models capable of integrating multiple data modalities of single cells to predict cell fate decisions (see an example study link).

Contact: A/Prof. Pengyi Yang to discuss and/or apply.

Is integrative-omics the new currency for solutions to complex diseases? Exploring Correlations in High-dimensional Data (1 Honours Student)

With the surge in high-throughput biological data being generated, more researchers are looking to integrate data of different types to inform hypotheses. For example, in complex metabolic diseases such as type 2 diabetes (T2D) and obesity, it is crucial to interrogate multiple data types to gain a comprehensive picture of the system defects, which may eventually lead to the identification of T2D or obesity markers. In this project, we aim to apply multivariate statistical approaches to integrate the data and build better predictors from multiple data sources. We will explore ways of weighting the relatively sparse proteomics data with information borrowed from the transcriptomics data. This involves exploring or developing methods to correlate multiple high-dimensional datasets to identify common and differentiating patterns.

Contact: Prof. Jean Yang to discuss and/or apply.

Machine Learning for Trans-omics Data (1 Honours or PhD Student)

A PhD position is available in developing and applying computational and machine learning models for multilayered trans-omics data sets generated by state-of-the-art mass spectrometry (MS) and next-generation sequencing (NGS) platforms. Our research project, funded by the Australian Research Council (ARC), aims to develop novel machine learning algorithms to analyse and integrate large-scale MS-based omics data with ultra-fast NGS-based omics data generated from complex biological systems. Characterising the signalling cascades, transcriptional networks, and translational protein networks, and the cross-talk between them, is critical for a comprehensive understanding of complex biological systems. Our large-scale multilayered trans-omics data, generated from state-of-the-art technological platforms, provide a unique opportunity to uncover the novel biology and molecular mechanisms that are critical for treating complex diseases and for personalised medicine. In this project, we aim to develop novel machine learning methods capable of extracting key patterns from each omic layer and integrating such information across multiple omic layers.

Contact: A/Prof. Pengyi Yang to discuss and/or apply.

Variable Selection

Ultra-high Variable Screening (1 Student)

This project will review recent literature on ultra-high variable screening: computationally fast and ingenious methods to sort out the 'good from the ugly'. There are millions of variables, too many for any regression procedure to be run with the full data. The task is to safely eliminate the part of the design matrix that is guaranteed to be uninformative. Such variable screening is an essential preprocessing step for successful model selection methods.
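As a rough illustration of what a screening step can look like, the sketch below ranks variables by their absolute marginal correlation with the response and keeps only the top few. This is a simple heuristic and not a safe rule: unlike the methods the project targets, it gives no guarantee that a discarded variable is uninformative. The function names (pearson, screen, simulate), the toy data, and the choice to keep 10 variables are all assumptions made for the example.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def screen(columns, y, keep):
    """Return the indices of the `keep` columns most correlated (in absolute value) with y."""
    ranked = sorted(range(len(columns)),
                    key=lambda j: abs(pearson(columns[j], y)),
                    reverse=True)
    return sorted(ranked[:keep])


def simulate(n=100, p=200, seed=1):
    """Hypothetical toy data: only columns 0 and 5 influence the response."""
    rng = random.Random(seed)
    columns = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [3 * columns[0][i] - 2 * columns[5][i] + rng.gauss(0, 0.5)
         for i in range(n)]
    return columns, y


if __name__ == "__main__":
    columns, y = simulate()
    # Indices that survive screening; a model selector would then run on these only.
    print(screen(columns, y, keep=10))
```

Each variable is scored independently, so the cost grows linearly in the number of variables. The price is that a variable that matters only jointly with others can be missed, which is one reason more careful screening rules are of interest.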

Contact: A/Prof. Samuel Mueller to discuss and/or apply.

Robust Model Selection Criteria (1 Student)

Mueller and Welsh introduced methods to robustly select variables in a regression-type model using the bootstrap. This project would revisit their methods, and identify and investigate additional special cases. One aim of the project could be to make available an R package or at least an R function. There are also additional algorithms to explore that do not require considering all possible submodels. That is, an important question is how to robustly reduce the powerset of models with fast and robust methods before turning attention to more computationally expensive but more efficient model selectors.
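To make the idea of bootstrap-based selection over all submodels concrete, here is a minimal sketch. It is not the Mueller and Welsh procedure: it fits each submodel by plain least squares on a bootstrap resample and scores it by a capped squared error on the observations left out of that resample. The cap of 2.0, the number of resamples, the toy data with five gross outliers, and all function names are assumptions chosen for illustration. A genuinely robust version would also replace the least-squares fit with a robust estimator.

```python
import itertools
import random


def solve_least_squares(rows, y):
    """Least squares via the normal equations, solved by Gauss-Jordan elimination."""
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(p):
            if r != col:
                f = a[r][col] / a[col][col]
                a[r] = [v - f * w for v, w in zip(a[r], a[col])]
    return [a[i][p] / a[i][i] for i in range(p)]


def bounded_loss(residual, cap=2.0):
    """Squared error, capped so that a single outlier cannot dominate the score."""
    return min(residual * residual, cap * cap)


def bootstrap_score(x, y, subset, n_boot, rng):
    """Mean capped out-of-bag loss for the submodel using the columns in `subset`."""
    n = len(y)
    design = [[1.0] + [row[j] for j in subset] for row in x]  # intercept + chosen columns
    total, count = 0.0, 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        in_bag = set(idx)
        beta = solve_least_squares([design[i] for i in idx], [y[i] for i in idx])
        for i in range(n):
            if i not in in_bag:
                pred = sum(b * v for b, v in zip(beta, design[i]))
                total += bounded_loss(y[i] - pred)
                count += 1
    return total / count


def select_model(x, y, n_boot=30, seed=0):
    """Score every submodel (the full powerset) and return the best one with all scores."""
    rng = random.Random(seed)
    p = len(x[0])
    subsets = [s for k in range(p + 1) for s in itertools.combinations(range(p), k)]
    scores = {s: bootstrap_score(x, y, s, n_boot, rng) for s in subsets}
    return min(scores, key=scores.get), scores


def simulate(n=80, seed=3):
    """Hypothetical toy data: y depends on columns 0 and 1; five responses are outliers."""
    rng = random.Random(seed)
    x = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(n)]
    y = [2 * r[0] - 3 * r[1] + rng.gauss(0, 0.5) for r in x]
    for i in range(5):
        y[i] += 15.0
    return x, y


if __name__ == "__main__":
    x, y = simulate()
    best, _ = select_model(x, y)
    print(best)
```

With 4 candidate variables there are only 16 submodels, so enumerating the powerset is cheap. The count doubles with every added variable, which is why reducing the set of candidate models beforehand matters.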

Contact: A/Prof. Samuel Mueller to discuss and/or apply.

Heterogeneous Data Classification

Bayesian Approaches to Differential Distribution (1 Honours Student)

The distribution of a gene's expression levels is potentially informative when trying to distinguish between healthy samples and diseased samples. Traditionally this has been done via a hypothesis testing approach that tests for differences in the mean gene expression levels between healthy and diseased samples, which is called differential expression. In this project we will perform an analogous Bayesian test for differences across the whole distribution of gene expression levels between two states. A multiple testing approach will be developed to take into account false discoveries. This work will be motivated by real gene expression data from melanoma patients, where it is hoped that this new approach will be able to uncover new biomarkers for the disease.
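The difference between comparing means and comparing whole distributions can be seen in a tiny example. The sketch below uses the two-sample Kolmogorov-Smirnov statistic, a classical frequentist measure, purely as a stand-in for "a difference in distribution". It is not the Bayesian test proposed in the project, and the expression values are made up for illustration.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    def ecdf(sample, x):
        return sum(v <= x for v in sample) / len(sample)
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


# Hypothetical expression values for one gene: same mean, different spread.
healthy = [-1.0, -1.0, 1.0, 1.0]
diseased = [-3.0, -3.0, 3.0, 3.0]

print(mean(healthy) - mean(diseased))   # 0.0: a test on means sees nothing
print(ks_statistic(healthy, diseased))  # 0.5: the distributions clearly differ
```

A gene like this one would be invisible to a differential expression analysis yet could still separate the two groups, which is the kind of signal a whole-distribution test aims to pick up.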

Contact: Dr. John Ormerod to discuss and/or apply.

Dissecting the heterogeneity of organoids through a data-driven approach (1 Honours or PhD Student)

“Organs-in-a-dish”, or so-called organoids, derived from human induced pluripotent stem cells have been shown to have remarkable fidelity to human tissue. These advances, coupled with state-of-the-art biotechnologies that enable the measurement of the expression levels of thousands of genes in single cells, promise exciting discoveries in translational medicine and stem cell therapy. This project will focus on the application and/or development of machine learning methodology to analyse high-dimensional data generated from retinal organoids, specifically for the discovery of new surface markers of photoreceptors, a key step for the purification of organoid-derived cells for retinal transplantation. Our group specialises in bioinformatics and works with diverse high-throughput technologies, including single-cell RNA-seq, multi-omics, and trans-omics.

Contact: A/Prof. Pengyi Yang to discuss and/or apply.

Brain Omics and the Connectome

Molecular-pathology: Exploring the cellular composition of diseased tissue through 'Omics technology (1 Student)

Gene expression profiles of post-mortem tissue from whole brain are composed of multiple cell types, providing a snapshot of the cellular pathology of the tissue. We can explore how these bulk tissue profiles can be integrated with single-cell sequencing data or high-throughput imaging data to describe the presence, behaviour and interaction of various cell types and how this affects disease. Integrating these data types will require the development of novel unsupervised clustering methods and image-based segmentation and analysis approaches to quantify cell subpopulations.

Contact: Dr. Ellis Patrick to discuss and/or apply.

Causal associations in Alzheimer's disease (AD) (1 Student)

Mendelian randomization (MR) exploits the natural genetic variability in a population to make causal inferences about the relationships between certain classes of molecules and disease. With a few assumptions, MR seeks to identify associations between genes and disease that are independent of potential confounders while also establishing the causal direction of these associations. Recent advances in sparse regression approaches could extend the capacity of MR to use the information in thousands of SNPs near a gene to increase the likelihood of identifying strong gene-phenotype relationships. We could explore the use of the MR framework to develop tools for filtering the list of genes whose expression is associated with AD. First, we could integrate sparse regression techniques such as the elastic net to improve the power of MR, making it appropriate for use in a high-throughput setting. This would allow us to apply MR to matched genotyping and gene expression data to begin prioritising causal associations in Alzheimer's disease for further investigation. We could also seek to refine our MR algorithm by including information on cell-type proportions. This would include using the cell-type proportion estimates to reduce noise in our models and also to prioritize the cell types through which the AD genes may be acting. Alternatively, we could extend MR to provide a causal understanding of gene expression changes in AD at a system level. We would use MR to contribute directional predictions in various annotated and protein-protein interaction (PPI) networks to begin to establish a direction of information flow throughout the network.
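The core MR calculation can be sketched with a single genetic variant. Assume the variant affects gene expression, is unrelated to the confounder, and influences the trait only through expression. The ratio of the variant-trait covariance to the variant-expression covariance (the Wald ratio) then estimates the causal effect of expression on the trait. The simulation below is a toy under those assumptions: the effect sizes, allele frequency, sample size, and function names are all hypothetical, and it uses one SNP rather than the sparse multi-SNP regression discussed above.

```python
import random


def covariance(x, y):
    """Sample covariance of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def simulate(n=20000, effect=0.5, seed=7):
    """Toy data: SNP -> expression -> trait, with a confounder of expression and trait."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        snp = (rng.random() < 0.3) + (rng.random() < 0.3)  # allele count: 0, 1 or 2
        u = rng.gauss(0, 1)                                 # unobserved confounder
        expr = 0.8 * snp + u + rng.gauss(0, 1)
        trait = effect * expr + u + rng.gauss(0, 1)
        g.append(float(snp))
        x.append(expr)
        y.append(trait)
    return g, x, y


def naive_slope(x, y):
    """Regression slope of trait on expression; biased when a confounder is present."""
    return covariance(x, y) / covariance(x, x)


def wald_ratio(g, x, y):
    """Single-instrument MR estimate of the effect of expression on the trait."""
    return covariance(g, y) / covariance(g, x)


if __name__ == "__main__":
    g, x, y = simulate()
    print("naive slope:", round(naive_slope(x, y), 2))
    print("Wald ratio :", round(wald_ratio(g, x, y), 2))
```

In this setup the naive slope is pulled well above the simulated effect of 0.5 by the confounder, while the Wald ratio stays close to it. With a weak instrument the denominator approaches zero and the estimate becomes unstable, which is part of the motivation for combining many SNPs.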

Contact: Dr. Ellis Patrick to discuss and/or apply.

Dimension Reduction for Resting State fMRI Data (1 Honours Student)

Anatomical, functional and effective networks within the brain are currently being elucidated at fine temporal and spatial resolution using magnetic resonance imaging, including functional MRI (fMRI). The concepts behind local region clustering, such as superpixels, are becoming increasingly popular in computer vision applications, data visualization and dimension reduction strategies. This project involves exploring ideas and models for segmenting fMRI imaging data by borrowing information across multiple samples. Specific applications of this information sharing may be to improve the identification of biologically interesting features or to improve sample classification in large-p, small-n datasets.

Contact: Prof. Jean Yang or Dr. John Ormerod to discuss and/or apply.

Networks

Methods Towards Personalised Medicine (1 Honours Student)

Over the past decade, new and more powerful genomic tools have been applied to the study of complex diseases such as cancer and have generated a myriad of complex data. However, our general ability to analyse these data lags far behind our ability to produce them. This project is to develop statistical methods that deliver better predictions of response to drug therapy. In particular, this project investigates whether it is possible to establish a patient- or sample-specific network (matrix) by integrating public repository data and gene expression data.

Contact: Prof. Jean Yang to discuss and/or apply.

Classification Using Statistical Networks (1 Honours Student)

Classical approaches in classification are primarily based on single features that exhibit effect size differences between classes. In omics data, this is equivalent to finding differential expression of genes or proteins between different treatment classes. Recently, network-based approaches utilising interaction information between genes have emerged, and our recent work (Barter et al. 2014) further reveals that simple network-based methods are able to classify alternate subsets of patients compared to gene-based approaches. This suggests that next-generation methods of gene expression signature modelling may benefit from harnessing data from external networks. This project will further explore the strengths and weaknesses of utilizing statistical networks as features in classification. The project will also extend Barter et al. (2014) by examining robust networks obtained from external databases or complementary datasets and evaluating their effect in a classification (prognostic) setting.

Contact: Prof. Jean Yang to discuss and/or apply.

How do cells talk? A data-driven approach to understand the cellular signalling network of the brain (1 Honours or PhD Student)

Cell-cell communication is the main form of interaction between cells. Recent state-of-the-art technologies give us unprecedented opportunities to measure the expression levels of thousands of genes and genomic patterns in single cells and simultaneously inspect the cells' spatial context. Our group specialises in bioinformatics and the study of multi-omics and trans-omics. Single-cell data at the level of RNA, DNA, and space give us a unique opportunity to accurately interrogate cell-cell communication. This project will focus on the application and/or development of machine learning methodology to analyse high-dimensional data generated from the mouse brain to better understand the communication between cells.

Contact: A/Prof. Pengyi Yang to discuss and/or apply.