Statistical Bioinformatics Seminar

The aim of the statistical bioinformatics seminar is to provide a forum for people working within the broad area of computation and statistics and their application to various aspects of biology to present their work and showcase their ongoing projects. It is intended to foster the exchange of ideas and build potential collaborations across multiple disciplines.

To be added to the mailing list, fill out this form. For any other information, please contact Dario Strbenac or Ellis Patrick.

Seminars in 2020, Semester 2

The seminars are held at 1:00 pm on Mondays and broadcast using Zoom. Each talk runs for approximately 25 minutes, followed by 5 minutes of questions.

Monday November 23 2020

Speaker: Dr. Yang Liao (Olivia Newton-John Cancer Research Centre)

Title: Rsubread: Ultrafast Read Mapping and Quantification of Next-Generation Sequencing Data

Abstract: Rsubread is a Bioconductor R package that encompasses multiple tools for fast and accurate analysis of Next-Generation Sequencing (NGS) data. Major functionalities of this toolbox include mapping of RNA and DNA sequencing reads to a reference genome, counting reads to genomic features such as genes, exons, junctions and genomic intervals (quantification) and discovery of genomic mutations. Challenges in existing pipelines will be highlighted and the basic ideas behind Rsubread to overcome these challenges will be explained. Functionalities of Rsubread and a demonstration of how to use it in NGS data analysis will be shown. Results comparing Rsubread to other software tools for the mapping and quantification of sequence reads will also be presented. The results show that Rsubread achieves computational efficiency superior to that of the other software tools without compromising specificity and sensitivity.

About the speaker: Yang is a post-doctoral researcher.

Monday November 16 2020

Speaker: Dr. Rebecca Poulos (Children's Medical Research Institute)

Title: Exploring the Landscape of Pan-cancer Proteogenomics with Cancer Cell Lines

Abstract: The combination of genomic and proteomic data can reveal novel insights relating to important biological processes in cancer. The acquisition of proteomic data from 979 cancer cell lines, derived from over 7000 data-independent acquisition mass spectrometry runs generated at ProCan, Sydney, is presented. Proteomic data is integrated with existing multi-omic datasets available via the COSMIC project, and proteogenomic relationships are characterised.

About the speaker: Rebecca is an Early Career Research Fellow in Cancer Data Science.

Monday November 9 2020

Speaker: Ms. Yue Cao (University of Sydney)

Title: Benchmarking of Simulation Methods for Single-cell RNA-seq Data

Abstract: Single-cell RNA-sequencing (scRNA-seq) is a powerful technique for profiling the transcriptome at single-cell resolution and has gained tremendous popularity since its emergence in 2009. In recent years, there has been an increasing number of simulation tools designed specifically for simulating scRNA-seq data. For simulation data to be useful in the development of analytical algorithms, simulation methods must generate a faithful and realistic representation of the scRNA-seq data. Using a systematic framework, the aim of our study is to evaluate how well each method captures the underlying biological structure of scRNA-seq datasets. In our evaluation framework, we evaluate a total of 12 simulation methods across 35 diverse datasets with a variety of tissue types, biological conditions and sequencing platforms. We discover that some measures are harder for current methods to capture than others and identify areas that could benefit from further methodological development.

About the speaker: Yue is a PhD student supervised by Professor Jean Yang and Dr. Pengyi Yang.

Monday November 2 2020

Speaker: Dr. Weichang Yu (University of Melbourne)

Title: Gaussian Process Discriminant Analysis for Classification of Proteomics Mass Spectra

Abstract: Biomarker detection and prognostic classification are common steps in the analysis of proteomics mass spectrometry data. However, many existing classifiers do not incorporate the spectral nature of the data properly which may result in poor classification performance. In this talk, I will describe a newly developed Gaussian process discriminant analysis that is suitable for classifying mass spectrometry data. The proposed model incorporates feature selection and classification within a unified framework. The spectral nature of the data is accounted for with an appropriate covariance function. The computational efficiency of the model is kept within a reasonable range through variational Bayes and computational shortcuts. I will conclude the talk with numerical results based on simulated and real datasets.

About the speaker: Weichang is a Research Fellow in Statistical Data Science.

Monday October 26 2020

Speaker: Dr. Fabio Zanini (University of New South Wales)

Title: The Art of Generating Hypotheses from Single Cell Data

Abstract: Single cell transcriptomic data are being amassed by many laboratories and are revealing an amazing and sometimes overwhelming degree of heterogeneity within organisms, tissues, and even within each individual cell type. The cell similarity graph or network is a mathematical object at the core of such data sets, encoding phenotypic heterogeneity in a simplified yet powerful form. I will give an overview of my lab operations, centered around cell graphs and aimed at generating sound and interesting hypotheses for biomedicine via data exploration and deep experimental collaborations. I will present northstar, a new cell clustering/classification approach that is particularly well suited for cancer and developmental biology. I will then discuss two of our recent adventures in biomedicine: (1) constructing a cell atlas of the neonatal lung and (2) understanding the corruption of hematopoietic gene regulatory networks during acute myeloid leukemia.

About the speaker: Fabio is Group Leader of the Data Driven Biomedicine research group.

Monday October 19 2020

Speaker: Mr. Fang Hu (University of Hong Kong)

Title: Optimising Tumor Mutation Burden Estimation from Targeted Panel Sequencing Data

Abstract: Tumor mutation burden (TMB) has emerged as a predictive marker for responsiveness to immune checkpoint blockade in multiple tumor types. As the gold standard, TMB is quantitated from whole exome data, but in a clinical setting it is generally approximated from targeted panel sequencing data. In this study, we systematically evaluate parameters that could affect the panel-based TMB (pTMB) assessment including panel size, gene content and local mutation determinants. By analysing simulated pTMB across different independent cohorts, we found that panels based on cancer genes usually overestimate TMB, leading to misclassification of patients who then receive improper therapy. This might be caused by positive selection for mutations in cancer genes and is unlikely to be alleviated by removal of hotspots. To overcome this issue, we develop a parsimonious model that is capable of optimising pTMB estimation, with improved performance for patient stratification to clinical management. These findings may be immediately applicable for guiding accurate TMB approximation based on targeted panel sequencing data.

About the speaker: Fang is a PhD student supervised by Dr. Jason Wong.

Monday October 12 2020

Speaker: Dr. Ignatius Pang (University of New South Wales)

Title: Mapping the Transcriptome Architecture and RNA-RNA Interactions of the Multi-drug Resistant Staphylococcus aureus Uncovers a Mechanism of Antibiotic Resistance

Abstract: Treatment of methicillin-resistant Staphylococcus aureus (MRSA) infections is dependent on the efficacy of last-line antibiotics like vancomycin. Changes in expression of regulatory RNA transcripts have been correlated with antibiotic stress responses in vancomycin-intermediate resistance isolates. The 5’ and 3’ untranslated regions (UTRs) of mRNAs are often the site of regulatory RNA interactions but these UTRs are often poorly annotated and uncharacterized. This talk will explore the use of three RNA sequencing techniques (RNA-seq, dRNA-seq and Term-Seq) to identify transcripts and their start and termination sites and the use of the ANNOgesic pipeline to analyse these data and generate a detailed transcriptome architecture of the methicillin-resistant Staphylococcus aureus JKD6009. We also discuss the use of a custom Snakemake pipeline to identify RNA-RNA interactions from sequencing data generated from the endoribonuclease RNase III capture and RNA proximity-dependent ligation technique termed CLASH. We identified over 900 RNA-RNA interactions, which suggests that mRNA-mRNA regulation of co-expression is much more widespread than previously appreciated.

About the speaker: Ignatius is a post-doctoral researcher.

Monday September 28 2020

Speaker: Dr. Juan Molina Ortiz (Charles Perkins Centre, University of Sydney)

Title: Modelling the Gut Microbiome: From One-size-fits-all to Personalised Dietary Interventions to Improve Health Outcomes

Abstract: The prevalence of non-communicable disease is on the rise and has become a public health concern. In recent decades, the gut microbiome has been linked to the onset of myriad non-communicable diseases, making it an appealing target for intervention. One way to intervene in the gut microbiome is through diet, and several dietary interventions are readily available to the public. These are often one-size-fits-all approaches even though evidence attests that we all have a different set of microbes living in our gut. There is an urgent need for interventions that can be tailored to our individual requirements. Here we argue that in order to develop such interventions a better mechanistic understanding of how health outcomes emerge from individual microbes is required. Computational modelling can help us gain this knowledge by allowing us to examine the gut microbiome from novel perspectives.

About the speaker: Juan is a medical doctor who is researching systems biology.

Monday September 21 2020

Speaker: Dr. Emily Wong (Victor Chang Cardiac Research Institute)

Title: Impact of Ageing on Lung Regeneration and Tumorigenesis

Abstract: Ageing is an unstoppable process and the strongest risk factor for common diseases. We combine the power of comparative genomics and single-cell technologies to understand the key intrinsic signals involved in the loss of molecular identity and robustness in lung ageing, and explore the effect of exercise on remodelling lung gene regulatory networks.

About the speaker: Emily is Head of Regulatory Genomics.

Monday September 14 2020

Speaker: Mr. Justin Miller (Western Sydney University)

Title: Predicting Cataract Development via in silico Gene Knockdown Within a Universe of Lens Signalling Pathways and Associated Gene Regulatory Networks

Abstract: SPAGI is designed to use known protein-protein interactions (PPIs) to simulate potential growth signalling pathways and then use RNA-seq expression data to filter these pathways and predict which paths are expressed in a given cell. In this method, potential paths are scored based on combined PPIs and the highest scoring path is retained, discarding the rest of the information. This method is then leveraged to create a virtual gene knockdown experiment. A machine learning model was trained to examine the difference in cell signalling pathways after removing genes that result in cataracts during development versus genes that are not known to cause cataracts. This model was able to successfully predict which cell signalling universe a gene was from in two-thirds of cases, making this a potentially useful method for identifying cataract-causing genes.

About the speaker: Justin Miller is a data analyst for Clayton Utz law firm.

Monday September 7 2020

Speaker: Mr. Frederick Jaya (University of Technology Sydney)

Title: Evaluation of Recombination Detection Methods for Viral Sequence Analysis

Abstract: In order to accurately infer the evolutionary history of viral genomes, the process of recombination needs to be accounted for and addressed appropriately. A vast choice of recombination detection methods have been developed over the past 20 years, but their ability to address the needs presented by high-throughput sequencing of viral data is unclear. Here I present an overview of five published methods for detecting viral recombination (PhiPack (Profile), 3SEQ, GENECONV, UCHIME and gmos), by comparing their statistical approaches and the results from simulated and empirical analyses of +ssRNA viral data. I present the key considerations and guidelines in selecting appropriate methods for viral analyses, with examples of how sequence diversity may mislead some methods. Finally, I present how these methods scale to analyse large datasets.

About the speaker: Frederick Jaya is a student supervised by Professor Aaron Darling and Dr. Barbara Brito-Rodriguez.

Monday August 31 2020

Speaker: Dr. Ralph Patrick (Victor Chang Cardiac Research Institute)

Title: Tracing Cardiac Cell Networks and Dynamics Across Homeostasis, Injury and Augmented Heart Repair at Single-cell Resolution

Abstract: An overview of the scRNA-seq projects in our research group will be given, covering some of the bioinformatics method development we have engaged with, such as intercellular communication analysis and differential transcript use. Biological applications such as understanding the role of cardiac fibroblasts in heart injury and repair, and how these processes are modulated in different contexts such as therapy models or genetic knockouts, will be discussed.

About the speaker: Ralph Patrick is a post-doctoral researcher.

Monday August 24 2020

Speaker: Ms. Yingxin Lin (University of Sydney)

Title: Transfer Learning for Data Integration of Single-cell RNA-seq and ATAC-seq

Abstract: Single-cell transcriptomics profiling with single-cell RNA-seq (scRNA-seq) has provided unprecedented resolution in characterising cell identities and cell functions across diverse tissues and conditions. Recent advances in measuring multiple modalities of single cells, such as single-cell ATAC sequencing (scATAC-seq), further enable characterisation of cells from different aspects. While scATAC-seq data provides the epigenomic profile of cells, its extreme sparsity limits its power for cell type identification. Therefore, integrative analysis of scRNA-seq and scATAC-seq allows not only cell type label transfer but also better understanding of the cellular phenotypes. We develop an end-to-end transfer learning algorithm, scJoint, to integrate scRNA-seq and scATAC-seq data. By building an integrative framework with neural network based dimension reduction and a semi-supervised cell type prediction model, our algorithm is able to transfer labels from scRNA-seq to scATAC-seq data and construct a joint embedding for the two modalities. We illustrate our algorithm with two mouse cell atlas datasets from scRNA-seq and scATAC-seq. We found that our algorithm outperforms the existing methods by a large margin in both joint visualisation of two modalities and cell type prediction.

About the speaker: Yingxin is a PhD student supervised by Professor Jean Yang and Associate Professor John Ormerod.

Monday August 17 2020

Speaker: Ms. Chelsea Mayoh (Children’s Cancer Institute)

Title: Improving the Actionability of RNA-seq in High-Risk Paediatric Cancer

Abstract: The Zero Childhood Cancer (ZERO) program provides a comprehensive precision medicine approach to High-Risk paediatric malignancies (less than 30% survival) to improve treatment outcomes. We developed a pipeline to increase the utility of transcriptome sequencing (RNA-seq) in precision medicine to identify driver fusions, somatic mutations from RNA and over-/under-expressed genes. Through deeper exploration of RNA-seq beyond expression analysis and integration with whole genome sequencing, the RNA-seq pipeline has expanded the targeted therapeutic options to 72% of patients and identified a driver mutation in 94%. Here we will present our bioinformatic approaches to integrating the pipelines, the additional clinical utility a comprehensive RNA-seq pipeline provides, and its impact on patient management and response.

About the speaker: Chelsea is a PhD student supervised by Associate Professor Mark Cowley.

Monday August 3 2020

Speaker: Ingrid Tarr (Victor Chang Cardiac Research Institute)

Title: The Use of Duplicate Samples to Improve Rare Variant Quality Control in Whole Genome Sequencing Studies

Abstract: Whole genome sequencing has transformed our ability to detect associations between phenotypes and genetic variants; however, the number of erroneous variant calls has also drastically increased. Even with low error rates, a significant quantity of called variants will be false positives. These are particularly concerning in unbiased genome-wide rare variant analyses, where a small number of false variants can have a meaningful impact on results while broad confirmation of variants is unfeasible. Currently, evidence informing rare variant filtering is lacking and there is no consensus regarding indicators of poor variants. The ability of common GATK metrics to discriminate true-positive and false-positive rare variant calls from samples sequenced in duplicate will be discussed.

About the speaker: Ingrid Tarr is a post-doctoral researcher.

Seminars in 2020, Semester 1

Monday June 29 2020

Speaker: Dr. Sarah Romanes (BCG GAMMA)

Title: From PhD Student to Consultant: Changing Careers During a Global Pandemic

Abstract: I share my experiences transitioning from a bioinformatics PhD student at the University of Sydney, to a data scientist working in the world of management consulting. I will discuss the biggest differences I have observed between analysis in academia and consulting, how I have managed to adapt to these changes during the COVID-19 crisis, and what you can expect if you decide that you are interested in a career in consulting!

About the speaker: Sarah Romanes is a statistical bioinformatician turned commercial data scientist.

Monday June 22 2020

Speaker: Dr. Seyhan Yazar (Garvan Institute of Medical Research)

Title: Single-cell eQTL Mapping Identifies Cell-type Specific Control of Complex Disease

Abstract: Genome-wide association studies in large populations have enriched our understanding of genetic variants implicated in health and disease while expression quantitative trait loci (eQTL) studies with microarray and bulk-RNA sequencing data showed us how these genetic variants affect the expression of one or more genes in a tissue-specific manner. However, much less is known about how genetic variants influence gene expression in various cell types within a tissue. This study, therefore, set out to identify the cell-specific eQTLs in human immune cells using single-cell sequencing technology. We have performed conditional cis-eQTL analysis on 1,242,226 immune cells from 993 healthy human subjects and identified thousands of independent cis-eQTLs across 14 different immune cell types. We show that the majority of these eQTLs were unique to an individual cell type; however, eQTLs shared across the hematopoietic lineage are also identified. Linking GWAS variants with cis-eQTLs within different cell types, we were able to show that disease variants exert their effects in specific cell types. We have shown cell-specific control of immune system disease and established a healthy immune cell resource at single-cell resolution to prioritise disease-associated variants in functional studies.

About the speaker: Seyhan Yazar is a postdoctoral researcher in Associate Professor Joseph Powell's research group.

Monday June 15 2020

Special Event: Statistical Bioinformatics and AI for Cancer Care Symposium

Speaker: Prof. Pablo Fernández-Peñas (University of Sydney)

Title: Maths is Core to Cancer Care: The Melanoma Case

Abstract: In clinical care, we use numbers to understand most of the biological components of the human being and to decide on diagnoses and treatments. Measures such as blood pressure or glucose levels have been around for many years. But there have been areas that have escaped quantification and analysis, and one of the most critical ones, for skin cancer and melanoma in particular, is imaging. Dermatology is a visual specialty that relies on the skills of humans to make a diagnosis. To add more complexity, if clinicians can’t make a diagnosis, the biopsies they take to help them are read by humans using, again, their visual skills. The time is coming for more objective, quantifiable measurements to help with these visual challenges, and for this information to be combined with other sources of clinical data.

About the speaker: Pablo Fernández-Peñas is Professor in Dermatology at Sydney Medical School.

Speaker: Assoc. Prof. Jinman Kim (University of Sydney)

Title: AI for Biomedical Image Analysis – Experiences with Skin Lesion Images

Abstract: Medical imaging has an indispensable role in patient management in modern healthcare. There are numerous medical imaging modalities available; they vary in complexity and ‘sophistication’ from plain digital chest X-rays to simultaneous functional and anatomical imaging with positron emission tomography (PET) and computed tomography (CT) imaging (PET-CT) using the one device. The challenge now is how to maximize the extraction of meaningful information from the images while not overloading the user. Fortunately, in parallel to the imaging improvements, we are in an era of artificial intelligence (AI) fuelling the growth of smart decision support and analysis tools for medical image interpretation. In a matter of a few years, we have seen a rapid rise in research algorithms being integrated into computer-aided diagnosis (CAD) systems for clinical use. Yet from an engineering view, we are only at the infancy of the AI revolution in healthcare. This talk will present the trend in AI development for medical images, with examples from our research in skin lesion image analysis.

About the speaker: Jinman Kim is director of the Visual TeleHealth Lab, the Biomedical and Multimedia Information Technology Research Group in the School of Computer Science.

Monday June 1 2020

Speaker: Assoc. Prof. Joshua Ho and Mr. Hephaes Chau (University of Hong Kong)

Title: A Statistical Method to Identify Cell Types with Differential Abundance in Single Cell RNA-seq Data

About the speaker: Joshua Ho is an Associate Professor at the School of Biomedical Sciences and he supervises the student Hephaes Chau.

Monday May 25 2020

Speaker: Mr. Taiyun Kim (University of Sydney)

Title: scReClassify: Post-hoc Cell Type Classification of Single Cell RNA-seq Data

Abstract: Single-cell RNA-sequencing (scRNA-seq) is a fast emerging technology allowing global transcriptome profiling at the single cell level. Cell type identification from scRNA-seq data is a critical task in a variety of research areas such as developmental biology, cell reprogramming, and cancers. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Due to the incompleteness of our current knowledge and the subjectivity involved in this process, a small number of cells may be subject to mislabelling. A semi-supervised learning framework, scReClassify, for ‘post hoc’ cell type identification from scRNA-seq datasets is developed. Starting from an initial cell type annotation with potentially mislabelled cells, scReClassify first performs dimension reduction using PCA and next applies a semi-supervised learning method to learn and subsequently reclassify cells that are likely mislabelled initially to the most probable cell types. By using both simulated and real-world experimental datasets that profiled various tissues and biological systems, scReClassify is shown to be able to accurately identify and reclassify misclassified cells to their correct cell types. scReClassify can be used for scRNA-seq data as a post hoc cell type classification tool to fine-tune cell type annotations generated by any cell type classification procedure. It is implemented as an R package and is freely available.

About the speaker: Taiyun is a PhD student supervised by Dr. Pengyi Yang.

Monday May 18 2020

Speaker: Dr. David Humphreys (Victor Chang Cardiac Research Institute)

Title: Annotation: The Backbone and Achilles Heel of Genome Bioinformatics

Abstract: Gene models are an important, sometimes essential, requirement for many genomic bioinformatic pipelines. Interpreting transcript information from gene models can be difficult and something that one bioinformatic tool cannot comprehensively cover. This is especially true for RNA-sequencing (RNA-Seq) where an ever-increasing number of techniques continue to be developed. The VCCRI genomic core is responsible for contributing to the analysis of RNA-Seq and whole genome sequencing (WGS) data sets. In this presentation, I will highlight observations that were made outside the use of standard tools and how this led to the development of two new bioinformatic tools called Ularcirc and Sierra. From this analysis we conclude that the heart transcriptome is remarkably complex and that a complete annotated gene model does not yet exist. We anticipate that improving the annotation and interpretation of gene models will lead to better interpretation of variants that are detected in WGS.

About the speaker: David manages the Genomics and Bioinformatics Core Facility at the Victor Chang Cardiac Research Institute where he carries out all aspects of library preparation, sequencing and downstream analysis.

Monday May 11 2020

Speaker: Dr. Ismael Vergara (Melanoma Institute of Australia and Peter MacCallum Cancer Centre)

Title: Evolution of Late-stage Metastatic Melanoma is Dominated by Tetraploidization and Aneuploidy

Abstract: Australia has one of the highest incidences of melanoma in the world and it has been referred to as our national cancer. Survival rates for melanoma are poor if not caught early. Recently, understanding of the molecular events that dominate the landscape of early disease has benefited from genomic sequencing, but how melanoma evolves into its metastatic and lethal form is poorly understood. To help rectify this, a rapid autopsy program, CASCADE (CAncer tiSsue Collection After DEath) was established at the Peter MacCallum Cancer Centre that provides multi-region sampling of metastases from patients at time of death. We obtained sequencing data from more than 70 samples from 13 patients, including WES and WGS. The matricial nature of this dataset prompted us to apply an analysis approach that builds on existing methods and makes use of the multiple samples from each patient.

Our analysis reveals striking patterns in the evolution of lethal melanoma. While early melanomas have large numbers of single nucleotide variants, we generally observed limited subsequent SNV and indel gain. Rather, evolution was dominated by large-scale copy number change including a remarkable level of loss of heterozygosity in some patients. In one case, multicore sampling revealed spatial heterogeneity in copy number of the primary tumour. Patterns of copy number change hinted that two mutational processes, aneuploidy and genome doubling, were operating universally. To test this we developed a novel method that models these mechanisms using branching processes.

Our findings in lethal melanoma suggest possible biomarkers that might be useful clinically in challenging settings where patients present significant clinical heterogeneity such as stage III disease. We are further developing these using a training set of 55 sequencing datasets from primary disease.

About the speaker: Ismael has eight years of post-doctoral work experience and has worked in Canada and Australia.

Monday May 4 2020

Speaker: Mr. Aedan Roberts (University of Technology Sydney)

Title: A Bayesian Hierarchical Model for Detecting Differential Gene Expression Distributions for RNA-seq Data

Abstract: Gene expression data from diseases such as cancer provides the potential both to gain a deeper understanding of disease processes and to improve diagnoses by allowing the classification of patients with similar clinical features but clinically relevant differences at the genetic level. Identifying genes for biological relevance or as features for classification algorithms has traditionally relied on assessment of differential expression – differences in average levels of gene expression between classes – but there is evidence that differences in variance and other distributional properties can also be informative in identifying genes associated with disease. This research is aimed at developing methods for identifying genes relevant to disease by taking into account all available information on the distribution of gene expression – in particular the mean and the negative binomial dispersion parameter for RNA-seq count data. We have developed a Markov chain Monte Carlo algorithm based on a Bayesian hierarchical model to identify differences in mean, dispersion or overall distribution between groups. Inference for small sample sizes is improved over the few existing methods for detecting differences in variance by information sharing across genes. Testing on simulated data and real RNA-seq datasets with artificially induced differences in expression shows that the proposed method is competitive with existing methods for detecting differential expression and outperforms existing methods for detecting differences in variance, as well as allowing assessment of overall differences in distribution between groups, providing the potential to identify genes that may be involved in disease or have potential for prognostic classification, but which would be missed by traditional methods.

About the speaker: Aedan is a PhD student supervised by Associate Professor Paul Kennedy.

Monday April 27 2020

Speaker: Ms. Yunwei Zhang (University of Sydney)

Title: Combining Machine Learning and Survival Analysis to Identify Recipient Sub-cohorts in a Heterogeneous Kidney Transplantation Population

Abstract: Kidney transplant is the main remedy for end-stage renal disease and the prognosis of allograft survival is what recipients care about the most. A popular method for allograft survival prediction in kidney transplantation is the Cox proportional hazard model. There is a substantial literature and the performance of the published models varies greatly. One possible explanation for this variability of performance is the high heterogeneity that is intrinsic in the transplant population. We propose two complementary approaches (bottom-up and top-down) that aim to identify recipient sub cohorts based on the inherent structure of the data which will improve allograft survival. The innovation of our approaches lies in combining supervised and unsupervised learning, that is, integrating machine learning methods with survival analysis. The bottom-up approach uses Numero, a new self-organising-map method, with the elastic net Cox model to stratify potential recipient sub cohorts. The alternative top-down approach uses the Cox model with a contrast tree method to identify cohort characteristics. Examining the results from both approaches, we find that recipient waiting time is an important predictor of graft survival for the whole population. We also find that there is a large amount of heterogeneity among ‘unfit’ recipients; these recipients have sub cohorts that are particularly hard to predict in terms of their graft survival. In contrast, for younger and ‘fit’ cohorts, we found that immunological factors are important components. The ability to identify sub cohorts based on prediction outcome is useful for enhancing prediction of graft survival and has the potential to guide allocation algorithms.

About the speaker: Yunwei is a PhD student supervised by Professor Jean Yang.

Monday April 20 2020

Speaker: Ms. Tingting Gong (Garvan Institute of Medical Research)

Title: Refining Somatic Structural Variant Detection for Precision Oncology

Abstract: Somatic structural variants (SVs), which are variants that typically impact more than 50 nucleotides, play a significant role in cancer development and evolution, but are notoriously more difficult to detect than small variants from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges attributable to the purity of tumour samples, tumour heterogeneity, limitations of short-read information from NGS, and sequence alignment ambiguities. In spite of active development of SV detection tools over the past few years, each method has inherent advantages and limitations. We aim to evaluate the variables that affect our ability to accurately detect somatic SVs and to facilitate informed decision-making about the most impactful factors. Using simulation studies, we evaluated single and combinatoric effects of SV caller, SV types and sizes, variant allele frequency (tumour purity), sequencing depth of coverage, and variant breakpoint resolution. A generalized additive model allowed predictions of sensitivity and precision to be made for any combination of predictors. The prediction model was implemented in a web-based application, called Shiny-SoSV, which is freely available. Shiny-SoSV provides an interactive and visual platform for users to easily compare the individual and combined impact of different parameters. It predicts, in silico, the performance of a proposed study design for somatic SV detection prior to the commencement of experiments.

About the speaker: Tingting is a PhD student supervised by Professor Vanessa Hayes.

Monday April 6 2020

Speaker: Ms. Hani Jieun Kim (University of Sydney)

Title: CiteFuse Enables Multi-modal Analysis of CITE-seq Data

Abstract: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins (ADT). Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, ADT evaluation, ligand-receptor interaction analysis, and interactive web-based visualisation of CITE-seq data. We show the capacity of CiteFuse to integrate the two data modalities and its relative advantage against data generated from single modality profiling using both simulations and real-world CITE-seq data. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate the use of CiteFuse for predicting ligand-receptor interactions with multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data.

About the speaker: Hani is a PhD student supervised by Dr. Pengyi Yang.

Seminars in 2019, Semester 2

Monday October 28 2019

Speaker: Associate Professor Sarah Kummerfeld (Garvan Institute)

Title: Realising the clinical potential of multi-omics

Abstract: Whole genome sequencing is now well established as a clinical tool. Interpretation of genomic DNA is being used for a broad range of clinical applications, including diagnosis of rare diseases, pharmacogenomics and understanding of complex disease. However, genomic DNA represents only one slice of the biology of disease. Through two very different vignettes, I will present our work using a broad range of -omics technologies to understand disease and derive clinically relevant biomarkers.

About the speaker: Associate Professor Sarah Kummerfeld is the Scientific Head of the Kinghorn Centre for Clinical Genomics at the Garvan Institute of Medical Research. She uses genomics to understand human disease and translate findings into clinical diagnostics and treatments. Sarah completed her PhD in Computational Biology at the University of Cambridge, working on protein structure and function prediction. Her postdoctoral research at Stanford University studied the molecular basis of human ageing. Sarah has worked both in academia and industry, including 10 years as a Scientist at Genentech, based in the San Francisco Bay Area. At Genentech, she used large-scale genomics approaches to understand why only some patients respond to treatment and to identify diagnostic biomarkers that predict response to particular drugs. Sarah is dedicated to applying advances in genomics research to benefit patients.

Monday October 21 2019

Speaker: Professor Jean Yang (University of Sydney)

Title: Single cell data integrative analysis

Abstract: Recent advances in large scale single cell transcriptome profiling have greatly expanded cell-type specific characterisation of complex biological systems. It enables discovery of many heterogeneous cell types, and differences in cell-type proportions often carry biological significance. A critical first step towards understanding such differences is the accurate identification of cell types from complex tissues and organs. A common approach achieves this by unsupervised clustering followed by manual annotation according to marker gene expression. With the increasing availability of large collections of scRNA-seq datasets generated from the same tissues, organs, and biological systems, as well as the comprehensive human and mouse cell atlases, we are now at the transition point where supervised classification may be trained to accurately classify cell types. In this talk, I will discuss a number of approaches developed at Sydney to address methodological challenges associated with single cell data. We will discuss a novel single cell differential composition (scDC) approach that performs differential cell-type composition analysis via bootstrap resampling. We will introduce a multiscale classification framework (scClassify) for single cell classification based on a cell type hierarchy and ensemble learning, where scClassify effectively annotates cells at different levels of the cell type hierarchy. Finally, for a given training dataset, scClassify implements a sample size estimation procedure to determine the number of cells required for accurate cell type classification at a given cell type hierarchy level.

About the speaker: Professor Jean Yang is an applied statistician with expertise in statistical bioinformatics. She was awarded the 2015 Moran Medal in statistics from the Australian Academy of Science in recognition of her work on developing methods for molecular data arising in cutting edge biomedical research. Her research stands at the interface between medicine and methodology development and has centered on the development of methods and the application of statistics to problems in -omics and biomedical research. She has made contributions to the development of novel statistical methodology and software for the design and analysis of high-throughput biotechnological data including that from microarrays, mass spectrometry and next generation sequencing. Recently, much of her focus is on integration of multiple biotechnologies with clinical data to answer a variety of scientific questions. This includes developing various approaches and methodologies in statistical machine learning and network analysis. As a statistician who works in the bioinformatics area, she enjoys research in a collaborative environment, working closely with scientific investigators from diverse backgrounds.

Monday October 14 2019

Speaker: Dr Pengyi Yang (University of Sydney)

Title: Systems stem cell biology

Abstract: The ability of stem and progenitor cells to differentiate into specialised cells is essential for organogenesis and opens the possibility for regenerative medicine where damaged cells in tissues and organs could be replaced using stem cell-derived cells. The understanding of identities and fates of stem/progenitor cells is the foundation for controlling stem cell differentiation and their use for stem cell-based therapies. In this talk, I will present three studies where systems approaches were utilised to understand cell identities and fate decisions in stem cells and during their differentiation. The talk will showcase how computational methods can be applied for making biological discovery and expanding our knowledge in stem cell biology.

About the speaker: Pengyi Yang is the Group Leader of the Computational Systems Biology group at Children's Medical Research Institute, at the Westmead Research Hub. He also heads the Computational Trans-Regulatory Biology group at Charles Perkins Centre, and holds a Senior Lectureship at the School of Mathematics and Statistics, the University of Sydney. As a systems biologist cross-trained in computer science, statistics, and systems biology, Pengyi combines computational and statistical methods to model trans-regulatory networks in stem and progenitor cells using large-scale multi-layered omic data.

Monday September 30 2019

Speaker: Dr Heejung Shim (University of Melbourne)

Title: riboHMM*: Comprehensive annotation of translated regions using ribosome footprint profiling data

Abstract: Understanding the functional effects of gene expression critically depends on the accurate and comprehensive annotation of sequence elements which are translated in each gene. Ribosome profiling provides direct and genome-wide measurements of translation levels in a given cell type. In this talk, I will first introduce a method, riboHMM, that 1) models a codon periodicity structure in ribosome profiling data, and 2) integrates RNA sequence information and transcript expression to identify translated regions in a transcript. Applying riboHMM to ribosome profiling data collected from human lymphoblastoid cell lines, we identified 7273 novel translated regions, including 2442 translated upstream open reading frames (uORFs) and 2551 coding sequences from transcripts that were previously annotated as non-coding. We observed that more than 60% of the novel coding sequences use non-canonical start codons. We also observed that ~40% of the 2442 translated uORFs are likely to regulate the translation of their downstream coding regions.

About the speaker: I am a Group Leader in Melbourne Integrative Genomics (MIG) and Lecturer (equivalent to Assistant Professor in the US) in the School of Mathematics and Statistics at the University of Melbourne. I completed my BS in Mathematics (with a double major in Computer Science and Engineering) at POSTECH, and my PhD in Statistics at the University of Wisconsin at Madison, advised by Dr. Bret Larget. I did a postdoc at the University of Chicago working with Dr. Matthew Stephens. Prior to my position here, I was a tenure-track Assistant Professor in the Department of Statistics at Purdue University for two years. Currently I retain an affiliation with Purdue as an Adjunct Assistant Professor.

Monday September 23 2019

Speaker: Professor Eric Stone (ANU)

Title: Getting serious about graphical structures in genome sciences

Abstract: Systems biology, defined broadly, is the study of how components of a biological system interact. Graphs, meanwhile, are general representations of the pairwise interactions (edges) between arbitrary components (vertices). It is no wonder, then, that graphs pervade systems biology and genome sciences in general. This talk is an attempt to lay some ground rules for making sense of them. I will focus on the ubiquitous issue of “missing vertices” that correspond to unmeasured components of the biological system. To do so, I will introduce a formal definition of graphs with missing vertices, and I will discuss how and why these objects are amenable to theory. Subsequently, I will discuss how theoretical results can be leveraged to make biological inference. I aim to provide a range of biological applications/illustrations spanning phylogenetics, population genetics, systems biology and beyond. While this talk will synthesise concepts from mathematics (i.e. spectral graph theory) and multivariate statistics (i.e. principal components analysis and multidimensional scaling), accessibility is not predicated on previous knowledge.

About the speaker: Eric is a quantitative biologist who combines statistical methods and mathematical theory to investigate how genetic variation has shaped biological diversity. He studied Mathematics at the University of Florida and Princeton University before training in Statistics and Genetics at Stanford University. He joined the Australian National University in mid-2016 after eleven years on the faculty at North Carolina State University. He is founding Director of the ANU Biological Data Science Institute as well as Director of the ANU-CSIRO Centre for Genomics, Metabolomics and Bioinformatics.

Monday September 16 2019

Speaker: Justine Charon (University of Sydney)

Title: Hunting the parasites of our parasites: meta-transcriptomic based identification of viral sequences associated with the human malaria parasite Plasmodium vivax

Abstract: Eukaryotes of the genus Plasmodium cause malaria, a parasitic disease responsible for substantial morbidity and mortality in humans. Yet, the nature and abundance of any viruses carried by these divergent eukaryotic parasites is unknown. We investigated the Plasmodium virome by performing a meta-transcriptomic study of blood samples taken from patients suffering from malaria and infected with P. vivax, P. falciparum or P. knowlesi. This resulted in the identification of a novel RNA virus that we term Matryoshka RNA virus 1 (MaRNAV-1), encoding an RNA polymerase and restricted to P. vivax, as well as an associated hypothetical viral segment of unknown function. Additional screening revealed that MaRNAV-1 was abundant in geographically diverse P. vivax derived from humans and mosquitoes. A related bi-segmented narnavirus-like sequence (MaRNAV-2) was also retrieved from Australian birds infected with Leucocytozoon, a genus of eukaryotic parasites that group with Plasmodium in the Apicomplexa subclass hematozoa. Together, these data support the establishment of two new phylogenetically divergent and genomically distinct viral species of protists, including the first virus infecting Plasmodium parasites. As well as broadening our understanding of the diversity and evolutionary history of the eukaryotic virosphere, the restriction of infection to P. vivax may be of importance in understanding P. vivax-specific biology in humans and mosquitoes, and how viral co-infection might alter host responses at each stage of the P. vivax life-cycle.

About the speaker: I have been a postdoc in Edward Holmes' lab since February 2018. My research interests focus on the evolution and diversity of RNA viruses, from the study of the molecular determinants of their amazing evolutionary capacities to the characterization of the RNA virosphere in various unicellular eukaryotes. More precisely, I am currently using meta-transcriptomic approaches to investigate RNA virus diversity in the human parasite Plasmodium, responsible for malaria, as well as in various taxa of microalgae that have been completely overlooked so far.

  • PhD in Molecular Plant Virology, obtained in December 2015 at the University of Bordeaux, France (Dr. Thierry Candresse group): work on the molecular and structural determinants of RNA virus evolution and adaptation using a plant-virus pathosystem as a model.
  • 2016: Postdoc at INRA, Bordeaux, France (Dr. Thierry Candresse group): in vitro experimental assessment of intrinsic disorder in RNA virus proteins as an enhancer of virus evolutionary potential.
  • 2017: Postdoc at IECB, Bordeaux, France (Dr. Axel Innis group): identification of new anti-microbial peptides using the E. coli ribosome as a high-throughput selection platform.
  • Since 2018: Postdoc at the University of Sydney (Prof. Edward Holmes group): characterization of RNA virus diversity in unicellular eukaryotes through the intensive use of meta-transcriptomic approaches.

Monday September 9 2019

Speaker: Peter Priestley (Hartwig Medical Foundation)


Abstract: Structural variation is one of the key drivers of tumorigenesis, but the genomic rearrangements in many tumor genomes are frequently overwhelmingly complex. We have created a novel interpretation and visualisation tool, LINX, to facilitate integrated analysis, interpretation and visualisation of structural variation and copy number in tumor genomes. In this talk, I will explain how LINX clusters raw structural variants into consistent events and predicts the chaining of derivative chromosomes and the genic impact. I will also demonstrate the novel visualisation capabilities in LINX. Improved interpretation of genomic rearrangements can lead to novel clinically relevant findings and improve insight into tumorigenesis.

About the speaker: Peter is the bioinformatics lead for Hartwig Medical Foundation and director of the Australian subsidiary. Hartwig Medical Foundation is a non-profit institute based in the Netherlands which focuses on whole genome sequencing of cancer patients for research and has created the world’s largest database of metastatic whole cancer genomes. Peter’s team focuses on developing novel bioinformatic tools for whole genome analysis and applying these to patient reporting to support clinical decision making.

Monday September 2 2019

Speaker: Ellis Patrick (Usyd)

Title: Highly multiplexed imaging cytometry to investigate cell-type interactions in situ

Abstract: Understanding the interplay between different types of cells and their immediate environment is critical for understanding the mechanisms of cells themselves and their function in the context of human diseases. Recent advances in high-parameter imaging cytometry technologies have fundamentally revolutionized our ability to observe these complex cellular relationships, providing an unprecedented characterisation of cellular heterogeneity in a tissue environment. In this presentation, I will provide an overview of a selection of these exciting new assays and the scientific hypotheses that they enable. I will also offer my perspective on some of the analytical methods available and introduce a method that I have developed for exploring patterns of spatial organisation of cell-types.

About the speaker: Dr Ellis Patrick is an applied statistician and bioinformatician. He is currently a Lecturer and Early Career Development Fellow in the School of Mathematics and Statistics and a staff member at The Westmead Institute for Medical Research. He obtained his PhD in statistical bioinformatics in the School of Mathematics and Statistics at the University of Sydney. In his postdoctoral studies, he worked as a computational biologist with joint appointments at Brigham and Women's Hospital, Harvard Medical School and The Broad Institute of MIT and Harvard. He spent this time using his statistical background to investigate the molecular drivers of Alzheimer's disease and MS. As he spends most of his time analysing large biomedical datasets, his research relies on the subtlety of translating between biological and statistical concepts to form simple, suitable and targeted statistical questions.

Monday August 26 2019

Speaker: Belinda Phipson (MCRI)

Title: Using transcriptomics to understand the variability between human kidney organoids

Abstract: The ability to make three dimensional organoids from human pluripotent stem cells through directed differentiation opens up the possibilities of personalised drug testing, disease modelling and regeneration, as well as enhancing our knowledge of organ development. However, successfully using organoids for drug screening or disease modelling will rely on the robustness and transferability of the protocol between cell lines. I will talk about how we examined the reproducibility and robustness of a specific kidney organoid protocol using RNA-seq and single cell RNA-seq data across a total of 54 whole organoid samples.

We designed and performed extensive RNA-seq profiling of kidney organoids taken from various time points across the differentiation protocol. In addition, we generated a series of day 18 organoids derived from distinct iPSC clones, differentiations separated in time, as well as organoids grown concurrently from the same starting cells in separate vials. This allowed us to specifically model the gene-wise sources of variability that arise during the step-wise differentiation process. While individual organoids within a differentiation experiment were highly correlated, greater variation was seen between experimental batches. The most highly variable genes between differentiations were found to be associated with organoid maturation as defined by our time course analysis. Single cell profiling of organoids revealed shifts in patterning and cell type proportions in line with this observation. Finally, I will show how we have applied what we have learned to disease modelling studies, thereby increasing the utility of kidney organoids for personalised medicine and functional genomics.

About the speaker: Dr Belinda Phipson completed her undergraduate degree in Applied Mathematics and Statistics and her Masters degree in Biostatistics in South Africa. She worked briefly as a Biostatistician before relocating to Australia in 2007 and transitioning to Bioinformatics. She joined Professor Gordon Smyth’s group at the Walter and Eliza Hall Institute of Medical Research where she first worked as a Statistical Consultant and then enrolled in a PhD. During her PhD she worked on empirical Bayes methods for gene expression data. She completed her PhD in 2013 before joining the Murdoch Children’s Research Institute as a post-doctoral researcher with Associate Professor Alicia Oshlack. Her current research focusses on methods development and analysis of single cell RNA-seq data, as well as methods development for methylation array data.

Monday August 19 2019

Speaker: Thuc Le (UniSA)

Title: Causality Discovery and Applications in Bioinformatics and Cancer Research

Abstract: In many real-world applications, the research questions of interest are about causality rather than association, whether the goal is for better explanation, prediction or decision making. Causal discovery aims to answer the causality related questions by inferring the cause-effect relationships between variables. Traditionally, causal relationships are identified by making use of interventions or randomised controlled experiments. However, conducting such experiments is often expensive or even impossible due to cost or ethical concerns. Therefore, there has been an increasing interest in discovering causal relationships based on observational data, and in the past few decades, significant contributions have been made to this field by computer scientists. In this talk, I will briefly introduce causality discovery approaches and then talk about a few applications in Bioinformatics and cancer research, including inferring miRNA regulatory relationships, predicting cancer treatment responses, and identifying cancer drivers.

About the speaker: Thuc is a Senior Lecturer in the School of Information Technology and Mathematical Sciences, University of South Australia. He is currently an NHMRC Early Career Research Fellow in Bioinformatics (2017-2020). His research focuses on the development of causal inference methods and their applications in Bioinformatics, particularly in gene regulatory networks, cancer drivers, non-coding RNAs, and cancer subtype discovery.

Monday August 12 2019

Speaker: Erdahl Teber (CMRI)

Title: Framework for determining chromosome-arm specific telomere sequence length and content.

Abstract: Current methods to measure telomere length and to acquire the sequence content are hampered by substantial technical limitations and lack the efficiency to resolve haplotype-specific sub-telomere and telomere sequences. Dr Teber will discuss the challenges and limitations in using long read sequencing to assemble contigs spanning the extensively repetitive telomeric (TTAGGG)n regions.

About the speaker: Dr Erdahl Teber is Team Lead at CMRI’s Bioinformatics and Data Science Research Core Facility and Conjoint Lecturer at Sydney Medical School, University of Sydney. His research interests are in developing algorithms and code to help unpack the genes that drive cancer, stem cell biology and regenerative medicine.

Seminars in 2019, Semester 1

Monday June 24 2019

Speaker: Daniel Cameron (WEHI)

Title: GRIDSS2: detecting undetectable structural variants

Abstract: Structural variants play a major role in the development of cancer and in genetic disorders. With the bulk of genetic studies focusing on single nucleotide variants and small insertions and deletions, structural variants are often overlooked. In part this is due to the increased complexity of analysis required, but it is also due to the increased difficulty of detection. In this talk, I will show how the single breakend reporting capability unique to GRIDSS2 enables it to identify structural variants in regions currently considered inaccessible to short read sequencing. Further demonstrating the power of this approach, I will present results from the Hartwig Medical somatic SV pipeline in which we integrate GRIDSS2 with PURPLE, a somatic copy number, purity and ploidy estimation tool, and LINX, an SV/CNV analysis, visualisation and interpretation tool, to uncover the structural mutational landscape of cancer with unprecedented resolution and accuracy.

About the speaker: After completing a double degree in science and engineering, Daniel worked for over a decade as a professional software engineer, leading teams of up to 30 developers building large-scale software systems. In 2012, he commenced a PhD in the Bioinformatics Division of the Walter and Eliza Hall Institute of Medical Research (WEHI), where he focused on improving structural variant detection, developing the Genomic Rearrangement Identification Software Suite (GRIDSS). Since graduating, he has focused on understanding the role of genomic rearrangements in cancer and has been working as part of the Hartwig Medical Foundation Bioinformatics team uncovering the landscape of genomic rearrangement in metastatic cancer.

Monday May 27 2019

Speaker: Charmaine Tam / Madhura Killedar (University of Sydney/SIH)

Title: Harnessing data in electronic medical records to characterise the management of patients with acute chest pain

Abstract: The widespread and rapid adoption of electronic medical records (eMR) has generated much promise for the use of this data to inform scientific discovery as well as applications in clinical decision support. In this talk, we present the methodology and preliminary findings from our proof-of-concept study called SPEED-EXTRACT, which extracts electronic medical record data from patients presenting with suspected acute coronary syndrome to Northern Sydney and Central Coast Local Health Districts. The main goal of SPEED-EXTRACT is to evaluate the accuracy of current administrative ICD10 coding for a diagnosis of “STEMI”, a serious type of heart attack, and subsequently develop algorithms for classifying “gold standard” STEMI diagnoses. Other goals are to examine whether data from the eMR can be used for benchmarking quality and safety as well as return relevant analytical findings back to clinicians.

About the speakers: Charmaine Tam is a Senior Research Fellow and Project Lead for the SPEED-EXTRACT project at Northern Clinical School and the Centre for Translational Data Science. Madhura Killedar is a Data Science Research Engineer Group Lead at the Sydney Informatics Hub. Together they work in an analytics team that shares a common vision that informatics is a compelling approach for scientific discovery, quality improvement in clinical care and enabling effective healthcare services.

Monday May 20 2019

Speaker: Dr. Edward Hancock (University of Sydney)

Title: The synergy between buffering and feedback in metabolic regulation

About the speaker: Edward completed a BSc (Maths) and BE at Sydney University, and a DPhil (PhD) in control and dynamical systems at Oxford University. He was subsequently a post-doc in synthetic biology at Oxford before starting in the Coffey LifeLab in the Charles Perkins Centre at Sydney. He has spent time as a visiting researcher at ANU, Imperial College London and Caltech.

Monday May 13 2019

Speaker: Francisco Avila Cobos (Ghent University)

Title: Impact of data transformation, pre-processing and choice of method in the computational deconvolution of transcriptomics data

Abstract: Gene expression analyses of bulk tissues often ignore cell type composition as an important confounding factor, resulting in a loss of signal from lowly abundant cell types. Many computational methods have been developed to infer proportions of individual cell types from bulk transcriptomics data (computational deconvolution). Attempts to compare these methods revealed that the choice of reference signatures is more important than the method itself. However, an evaluation of the combined impact of data transformation, pre-processing and methodology on the results is still lacking. Using single-cell RNA-sequencing (scRNA-seq) data from human pancreas and PBMCs, we artificially generated hundreds of pseudo-bulk mixtures with varying numbers of cells and cell types in known proportions, allowing the evaluation of the combined impact on the deconvolution results.

About the speaker: Francisco Avila Cobos obtained a BSc in Biotechnology from University of León (Spain, 2013), completed a MSc in Bioinformatics and Computational Biology at University College Cork (Ireland, 2015) and, thanks to a Special Research Fund (BOF) scholarship of Ghent University (Belgium), became a PhD fellow under the supervision of Prof. Katleen De Preter and Prof. Pieter Mestdagh. Funded by a grant for a long stay abroad from Scientific Research Flanders (FWO), he is currently a visiting researcher at the Garvan-Weizmann Centre for Cellular Genomics in Sydney (single cell and computational genomics division) under the supervision of Prof. Joseph Powell.

Monday May 6 2019

Speaker: Dr. Conan Wang (University of Queensland)

Title: Let’s Get Back into Shape! Structuring peptides into drugs

Abstract: Large-scale analyses of disease datasets have uncovered a multitude of pathways and protein-protein interactions that can now be targeted for therapeutic gain. One class of molecules that has attracted interest from industry for the design of next generation drugs to bind to these targets are peptides, nature’s miniature proteins, because of their potential for high specificity and low toxicity. I am particularly interested in peptides with privileged biopharmaceutical properties of exceptional structural stability and favourable thermodynamic binding properties owing to their unique and highly constrained architectures. Many of these peptides are found in nature – in the venom of spiders, in seeds of sunflowers, and in leaves of your everyday garden herbs – and many more continue to be discovered through bioinformatic screens of public databases. I support the view that an understanding of their structures will allow us to design better drugs, ones that have high shape complementarity to their target, resulting in potent binding affinities and specificities, and can be delivered effectively to their targets. In this talk, I will discuss how structures or shapes of peptides and proteins can be used to guide the design of new drug leads by mining structural databases; and to understand how we can better deliver them to increase efficacy and patient compliance by simulating their dynamics in biological environments. I will also discuss recent methods for the determination of peptide structures that have come about through analyses of packing preferences within crystals. I hope to convey by the end of this talk the utility of structural investigations in molecular engineering of new bioactive compounds.

About the speaker: Dr Wang is a Senior Research Officer at the Institute for Molecular Bioscience, the University of Queensland, Brisbane. He began his training as a bioinformatician pressing keys at the University of New South Wales. Since then, he has broadened his research experience as a structural biologist lifting pipettes and test tubes on an NHMRC ECR Fellowship at Hong Kong University of Science and Technology, Griffith University, and now at the University of Queensland. He works at the interface between experimentation and computation in industry collaborations to design new bioactive molecules to treat diseases, such as multiple sclerosis, cardiovascular disease, and cancer, or control environmental pests. He is interested in understanding the structures of these bioactive compounds and using that knowledge to improve design approaches. With the time he has spent in the lab, he may not be in shape but hopefully his molecules are!

Monday April 29 2019

Speaker: Dr. Robert Weatheritt (Garvan Institute of Medical Research)

Title: Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop

Abstract: Alternative splicing (AS) is a widespread process underlying the generation of transcriptomic and proteomic diversity and is frequently misregulated in human disease. Accordingly, an important goal of biomedical research is the development of tools capable of comprehensively, accurately, and efficiently profiling AS. Here, we describe Whippet, an easy-to-use RNA-seq analysis method that rapidly—with hardware requirements compatible with a laptop—models and quantifies AS events of any complexity without loss of accuracy. Using an entropic measure of splicing complexity, Whippet reveals that one-third of human protein coding genes produce transcripts with complex AS events involving co-expression of two or more principal splice isoforms. We observe that high-entropy AS events are more prevalent in tumor relative to matched normal tissues and correlate with increased expression of proto-oncogenic splicing factors. Whippet thus affords the rapid and accurate analysis of AS events of any complexity, and as such will facilitate future biomedical research.
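
For readers unfamiliar with entropy as a measure of splicing complexity, the following is a minimal illustrative sketch and not Whippet's implementation. It assumes that complexity is summarised as the Shannon entropy of the relative usage of the isoforms at a single AS event; the function name `splicing_entropy` and the example proportions are hypothetical.

```python
import math


def splicing_entropy(proportions):
    """Shannon entropy (in bits) of isoform usage at one AS event.

    `proportions` holds non-negative relative abundances of the isoforms
    (hypothetical input; they are normalised to sum to 1 here).
    """
    total = sum(proportions)
    if total <= 0:
        raise ValueError("proportions must sum to a positive value")
    entropy = 0.0
    for p in proportions:
        q = p / total
        if q > 0:  # treat 0 * log2(0) as 0
            entropy -= q * math.log2(q)
    return entropy


# Illustrative values only: a single dominant isoform versus evenly shared usage.
simple_event = splicing_entropy([1.0, 0.0])
binary_event = splicing_entropy([0.5, 0.5])
complex_event = splicing_entropy([0.25, 0.25, 0.25, 0.25])
```

Under this assumption, an event dominated by one isoform scores 0 bits, two equally used isoforms score 1 bit, and events that co-express more isoforms at comparable levels score higher.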

About the speaker: Robert Weatheritt studies the impact of post-transcriptional regulation on proteomic diversity. He did his PhD at EMBL Heidelberg and undertook postdoctoral research in Cambridge and Toronto. He is now an EMBL Australia group leader at the Garvan Institute of Medical Research.

Monday April 15 2019

Speaker: Dr. Shila Ghazanfar (CRUK, Cambridge)

Title: Characterization of differential correlation across single cell differentiation trajectories with scDCARS

Abstract: Single cell RNA-seq data places us in an unprecedented position where we are able to examine patterns of variation and, importantly, co-variation of genes across cells along continuous differentiation trajectories. We recently presented Differential Correlation Across Ranked Samples (DCARS), a statistical method to identify differentially correlated gene pairs across a set of ranked samples, representing either discrete or continuous patterns of group identity. Here, we describe a new approach, scDCARS, a framework in which changes in correlation are examined across a differentiation trajectory. We demonstrate scDCARS with liver developmental data and find key cascading changes in coordination of gene subnetworks including those associated with cell cycle and lipoprotein metabolism. Furthermore, we present scDCARS as part of the DCARS package as well as an interactive Shiny application readily available for scientists’ interrogation with new data. This work provides a unique lens through which higher order interactions among genes can be unpicked to understand the landscape of cell type fate choice.

About the speaker: Dr. Shila Ghazanfar is a Royal Society Newton International Fellow and Research Associate working at the Cancer Research UK Cambridge Institute. She completed her PhD in statistical bioinformatics at The University of Sydney in the School of Mathematics and Statistics. Her current research interests are in the statistical analysis of data arising from high throughput sequencing technologies such as single cell RNA-Seq and spatially resolved single cell transcriptomics in various research contexts.

Monday April 8 2019

Speaker: Dr. Qing Zhong (Group Leader, Cancer Data Science, Children’s Medical Research Institute)

Title: Cancer Data Science for Mass Spectrometry-based Proteomics

Abstract: Dr Qing Zhong will give an overview of ProCan, a flagship program at the Children’s Medical Research Institute. The aim of ProCan is to generate and analyse a pan-cancer proteome database of tens of thousands of human cancers of all tumour types in the next 7 years. He will then introduce mass spectrometry-based proteomics and a typical ProCan workflow that shows how biological samples are turned into permanent digital proteome maps. Next, he will talk about some ongoing projects, such as discovery of prognostic biomarkers for stratifying prostate cancer patients with intermediate Gleason scores, proteomic profiling of 1000+ cell lines from the Sanger Institute, investigation of ovarian tissue heterogeneity, and proteogenomic analysis of lung cancer. In addition to these research topics, he will also present several technical studies regarding feature stability by machine learning and cross-instrument reproducibility. ProCan believes that these projects will add to the landscape of precision cancer medicine and facilitate the delivery of molecular data to cancer clinicians, in a clinically-relevant time frame, to maximise the accuracy of treatment decisions.

About the speaker: Dr Qing Zhong is a data scientist with expertise in analysis of biological and medical data by machine learning techniques. He has a science doctorate from the Swiss Federal Institute of Technology (ETH) Zurich, and a decade of experience working in an interdisciplinary environment that involves collaboration between biologists, clinicians, and industry partners. His postdoctoral training was at the University of Zurich and its affiliated hospital, where he developed expertise in analysis of omics data, and designed and performed a proof-of-concept study, to test a clinical big data system for consolidating genomic, clinical and demographic information into a unified model for precision and data-driven medicine. He joined Children’s Medical Research Institute (CMRI) in 2017 to head the Cancer Data Science group.

Monday April 1 2019

Speaker: Dr. Natalie Twine (CSIRO)

Title: TRIBES: Cryptic relationship and disease variant discovery in Amyotrophic Lateral Sclerosis

Abstract: Amyotrophic lateral sclerosis (ALS) is a devastating and lethal neurodegenerative disorder. The majority of ALS cases (90%) are sporadic (SALS), while the remaining cases are familial (FALS). Because FALS gene mutations sometimes appear in apparently sporadic cases, we hypothesise that some SALS cases are distantly related. We have developed a genetic ancestry tool, TRIBES, which is faster than comparable relatedness tools (ERSA) and has improved accuracy compared with KING. Using TRIBES, we have identified cryptic relatives in a large ALS whole genome sequence (WGS) cohort. We discovered a single haplotype connecting 19 FALS families, highlighting previously unknown relatedness. We also discovered novel 5th and 6th degree relationships connecting SALS cases. Crucially, shared genomic regions between novel relatives highlight mutations in known ALS genes, as well as novel genes. Newly identified relatives significantly increase the power to identify novel ALS genetic mutations.

About the speaker: Natalie is a research scientist and team lead for the genomics insight team within the transformational bioinformatics group at CSIRO. The team focuses on developing technology for population-scale genomics. Natalie’s major research focus is the use of Big Data technologies to understand the genetic basis of ALS. This is a collaborative project with Macquarie University and the international consortium, Project MinE. Natalie has expertise in high throughput genomic and transcriptomic data analysis, clinical genomics, genetics and big data analysis. She obtained her PhD in Bioinformatics from the University of New South Wales and has previously worked at UNSW, King’s College London and University College London.

Monday March 25 2019

Speaker: Dr. Yi Jin Liew (CSIRO)

Title: Epigenetic adaptation of corals to climate change.

Abstract: The role of epigenetics in plants and mammals is fairly well understood; not so for basal metazoans such as corals. One such mechanism is DNA methylation, which regulates gene expression through the reversible methylation of cytosines in the genome. Using the coral Stylophora pistillata, which has a broad geographical range and is able to thrive in diverse habitats, we sought to understand whether DNA methylation plays a role in adapting to more acidic oceans, a consequence of anthropogenic climate change. To do this, we performed whole genome bisulphite sequencing (WGBS) on triplicate samples from four pH conditions for more than two years to mimic long-term pH stress. We observed genes associated with cell cycle and body sizes having increased gene body methylation under this stress, which corresponded to the phenotypes of larger cell and polyp sizes. This allowed stressed corals to maintain the same linear growth rate despite being in conditions that reduced their calcification rate. As corals are long-lived organisms, we were interested in knowing whether these epigenetic modifications were passed on to the next generation. WGBS was carried out on adult, gamete and larval samples from another coral, Platygyra daedalea. Epigenotypes of the samples were more similar to their respective parents’ than to other samples of the same type, providing initial evidence for the intergenerational inheritance of methylation patterns in corals.

About the speaker: Yi Jin is currently a Research Scientist in the Molecular Diagnostics Solutions group in CSIRO, attempting to squeeze public datasets for promising cancer biomarkers. Prior to that, he was a postdoc at the King Abdullah University of Science and Technology in Saudi Arabia, where he studied DNA methylation in corals (and regrets never mastering diving despite working on corals AND living by the Red Sea). He graduated with a PhD in Genetics from the University of Cambridge, but has, over the years, swapped the pipette for a keyboard.

Seminars in 2018, Semester 2

Friday December 14, 2018 (NOTE: Special time and location: 10 - 11AM, Level 4 Large Meeting Room, Charles Perkins Centre)

Speaker: Dr Aaron Lun (Cancer Research UK - Cambridge Institute)

Title: Challenges and future directions in single-cell data analysis

About the speaker: I graduated from the University of Sydney with a Bachelor of Science, majoring in Molecular Biology and Genetics. I did my PhD in Bioinformatics at the Walter and Eliza Hall Institute for Medical Research with Gordon Smyth, working on statistical methods for analyzing ChIP-seq and Hi-C data to study chromatin structure and organization. I am currently working as a research associate with John Marioni at the CRUK Cambridge Institute, developing computational methods for analyzing single-cell RNA sequencing data. I maintain around 14 Bioconductor packages focusing on a range of genomics data analyses, but am probably best known as the cat on the support site.

Monday November 19, 2018

Speaker: A/Prof Jessica Mar (The University of Queensland)

Title: One of these cells is not like the other – how variability of gene expression highlights regulatory control.

Abstract: When studying the transcriptome, our inferences typically revolve around changes in average gene expression. For a population of single cells, modeling gene expression distributions and how their properties differ between phenotypes can be far more informative than following average trends alone. This talk outlines some of the approaches my lab has developed to investigate how variability of gene expression contributes to our understanding of transcriptional regulation.

About the speaker: Associate Professor Jessica Mar is a Group Leader at the Australian Institute for Bioengineering and Nanotechnology at the University of Queensland in Brisbane. The Mar group focuses on understanding variability in the transcriptome and how this informs regulation of cell phenotypes. Jess received her PhD in Biostatistics from Harvard University in 2008. She was a postdoctoral fellow at the Dana-Farber Cancer Institute in Boston (2008-11), and an Assistant Professor at Albert Einstein College of Medicine in New York (2011-2018). She relocated back to Australia in July this year as an ARC Future Fellow, and a major focus of her work is modelling the aging process using single cell bioinformatics. Jess has received several awards, including a Fulbright scholarship (2003), the Metcalf Prize for Stem Cell Research from the National Stem Cell Foundation of Australia (2017), and the LaDonne H. Shulman Award for Teaching Excellence (2017). She is proudest of the last of these because the winner is selected by the graduate students at Albert Einstein College of Medicine.

Monday November 12, 2018

Speaker: A/Prof Ruby Lin (The University of Sydney)

Title: Treatment of severe Staphylococcus aureus infections with bacteriophage therapy - Westmead experience.

About the speaker: Ruby joined the Iredell lab at the end of October 2017 after a short stint in industry. She is the project manager for the Iredell lab and the scientific lead for an investigator-led clinical trial involving treatment of severe Staphylococcal infections using bacteriophage therapy. Her research focus has been microRNA driven dysfunctions in eukaryotic disease model systems including mouse/rat models and humans. She was a named NHMRC Peter Doherty fellow, 2005-8, and UNSW Global postdoctoral fellow, 2009-14. She has acquired >A$5.1m in competitive funding. She has five seminal papers (BMJ, 1999; Nature, 2010; ATVB, 2010; PNAS, 2012; and FASEB J, 2014), all of which received media coverage and are highly cited. She has presented >65 papers at >30 conferences and cross-disciplinary seminars as invited chair and speaker. She is a conjoint Associate Professor at UNSW. During her presidency at the Australasian Genomic Technologies Association (AGTA), a prominent society in genomics in Australia and NZ with members from industry and academia, she implemented gender equality at its annual meetings. She is heavily involved in promoting gender balance and women in STEM through various professional networks. She continues to train honours, PhD and postdoctoral researchers. She is a regular guest lecturer at UNSW, UTS and Macquarie University. In her spare time she volunteers as the primary ethics coordinator at her kids’ school and helps the P & C with fundraising events. She also does pro bono work as a career coach.

Monday November 5, 2018

(No Seminar)

Monday October 29, 2018

Speaker: Helen McGuire (The University of Sydney)

Title: Single cell analysis with Mass Cytometry; technology introduction and opportunities

Primer on Mass Cytometry: The anatomy of single cell mass cytometry data

About the speaker: Dr Helen McGuire is a Research Officer at the Ramaciotti Facility for Human Systems Biology, Charles Perkins Centre, an initiative established in 2013 to support the development of mass cytometry and broader systems biology analysis across the University of Sydney campus and wider collaborative links. Her research focus and interest lie in the clinical application of immunological studies to a range of human diseases.

Monday October 22, 2018

Speaker: Timothy Peters (Epigenetics Laboratory, Garvan Institute)

Title: A general framework for evaluating cross-platform concordance in genomic studies

Abstract: The reproducibility of scientific results from multiple sources is critical to the establishment of scientific doctrine. However, when characterising various genomic features (transcript/gene abundances, methylation levels, allele frequencies and the like), all measurements from any given technology are estimates and thus will retain some degree of error. Hence defining a “gold standard” process is dangerous, since all subsequent measurement comparisons will be biased towards that standard. In the absence of a “gold standard” we instead empirically assess the precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. We analyse a publicly available TCGA dataset containing both sequencing and array technologies, allowing a direct per-technology, per-locus comparison of sensitivity and precision across all common loci. We implement and showcase a number of applications of the row-linear model, including direct comparisons of the sensitivity and precision of these platforms. Our findings demonstrate the utility of the row-linear model in evincing varying levels of concordance between measurements on these platforms, serving as a process for identifying reproducibility caveats in studies where cross-platform validation is performed.

About the speaker: Tim's background is in bioinformatics and applied statistics. He completed a PhD on the principles of statistical learning for transcriptomic data in the Department of Statistics at Macquarie University in 2012. He has worked as a Postdoctoral Fellow at CSIRO on the EpiSCOPE project: mapping the epigenetic terrain of human adipocytes, performing statistical analyses for human EWASs (epigenome wide association studies) and has published a novel method for statistical inference of whole-methylome data. He is currently a bioinformatician/statistician in the Immunogenomics group at Garvan Institute of Medical Research. Current interests include single-cell methylome and transcriptome analysis, and reproducibility of genomic studies.

Monday October 15, 2018

Speaker: Gene Hart-Smith (University of New South Wales)

Title: The promise and pitfalls of big proteomics data: a case study centred on protein methylation

Abstract: The field of proteomics is reliant on computationally intensive analyses of large datasets. A particular focus is on the accurate identification of peptides from large datasets of tandem mass spectrometry (MS/MS) spectra, which are typically collected in high-throughput LC-MS/MS experiments. The data analysis workflow that has been developed to meet this challenge – the ‘sequence database search’ – is considered a cornerstone of contemporary proteomics research.
Despite their near-ubiquity, sequence database searches can consistently go wrong. For example, we recently showed that when sequence database searches are applied to the identification of post-translational protein methylation, false discovery rates are unavoidably high. This particular defect of the sequence database search has resulted in a plethora of false information entering the mainstream scientific literature. The reasons behind this defect will be discussed, together with specific and practical means by which this defect can be overcome.

About the speaker: Dr Gene Hart-Smith is a recent ARC Discovery Early Career Researcher Award holder working within the UNSW School of Biotechnology and Biomolecular Sciences. In 2010 he completed a PhD at the UNSW School of Chemistry, in which he utilised mass spectrometry as a primary tool to investigate synthetic polymer formation processes. He has since been applying his expertise in mass spectrometry to the study of biological systems.
Gene’s current research is centred around the examination of protein-protein interaction networks. He is particularly interested in how post-translational modifications regulate the dynamics of these networks, and is developing and applying mass spectrometric methods towards the investigation of this phenomenon.

Monday October 8, 2018

Speaker: Chelsea Mayoh (Children's Cancer Institute)

Title: The Complexity of Identifying Targetable Genes in the Paediatric Transcriptome

Abstract: Molecular profiling of childhood cancers allows for personalised treatments based on targets found through Whole Genome and RNA sequencing. The accurate identification of germline/somatic mutations, copy number amplifications/deletions and structural variations is possible through the availability of matched tumour-normal pairs. However, identification of up-/down-regulated genes poses a challenge without having a matched normal. In this talk I will speak about the limitations, advantages and complexity of the transcriptome and utilising it to identify potential drug targets for paediatric patients through the Zero Childhood Cancer Program.

About the speaker: Chelsea is the lead bioinformatician at the Children's Cancer Institute. In addition to managing the bioinformatics team, she is one of the key bioinformaticians involved with the Zero Childhood Cancer program. At the Children's Cancer Institute, she works on a wide variety of childhood cancers with a heavy focus on CNS tumours, neuroblastoma and leukaemias, performing various kinds of bioinformatic analysis. She is also the Bioinformatician/Biostatistician on several Study Committees through the Sydney Children's Hospital Clinical Trials Program. Prior to coming to Australia in 2015 she was at the Genome Sciences Centre in affiliation with the BC Cancer Agency in Canada.

Monday October 1, 2018 (No seminar - Labour Day public holiday)

Monday September 24, 2018 (No seminar)

Monday September 17, 2018

Speaker: James Cornwell (The University of Sydney)

Title: Quantifying intrinsic and extrinsic control of single cell fates by time-lapse imaging, single-cell tracking, and competing risks analysis

Abstract: The molecular control of cell fate and behaviour is a central theme in biology. Inherent heterogeneity within cell populations requires that control of cell fate be studied at the single-cell level. Time-lapse imaging and single-cell tracking are powerful technologies for acquiring cell lifetime data, allowing quantification of how cell-intrinsic and extrinsic factors control single-cell fates over time. However, cell lifetime data contain complex features. Competing cell fates, censoring, and the possible inter-dependence of competing fates, currently present challenges to modelling cell lifetime data. Thus far such features are largely ignored, resulting in loss of data and introducing a source of bias. In this seminar I will talk about how competing risks and concordance statistics, previously applied to clinical data and the study of genetic influences on life events in twins, respectively, can be used to quantify intrinsic and extrinsic control of single-cell fates.

About the speaker: James completed a Bachelor of Mechatronic Engineering and a Master of Biomedical Engineering in 2012 from the University of New South Wales (UNSW). During his studies James undertook research internships at the Australian Nuclear Science and Technology Organisation (Sydney), the Jozef Stefan Institute (Slovenia), and at ETH Zurich (Switzerland). In 2012 James started his PhD at the Victor Chang Cardiac Research Institute (VCCRI) under the supervision of Professor Richard Harvey and Dr Robert Nordon. His PhD focused on characterising the growth dynamics of cardiac stem cells by time-lapse imaging and single-cell tracking. James established this technology at VCCRI, constructing a methodological pipeline for analysis of single-cell growth dynamics. In 2016 James completed his PhD and joined the School of Dentistry, Faculty of Medicine and Health, University of Sydney as an Associate Lecturer. James’ research currently focuses on developing tools for recording and analysing single cell dynamics and applying these tools to study stem and cancer cell biology.

Monday September 10, 2018

Speaker: Ignatius Pang (University of New South Wales)

Title: Benchmarking Protein Correlation Profiling datasets against reference protein complexes: case studies in S. cerevisiae.

Abstract: Protein Correlation Profiling (PCP) is a method which enables many protein complexes to be identified in single experiments, unlike other methods such as affinity purification-mass spectrometry, which involves ‘one-at-a-time’ affinity purifications of tagged proteins. A typical PCP experiment involves fractionation of endogenous and untagged protein complexes by size or other physicochemical parameters, followed by LC-MS/MS and label-free quantification of each fraction. Proteins in the same intact complex are co-eluted and often have high correlation in protein abundance across multiple fractions. Although this information can help identify intact complexes, doing so is computationally challenging. For example, machine learning strategies used to identify complexes from PCP datasets can have high false positive rates for novel complexes (Shatsky et al. 2016 MCP 15.6:2186-02). The aim of this study was to develop a framework for benchmarking PCP datasets against high-quality sets of reference protein complexes. This approach, which we predominantly applied using the large-scale reference sets of protein complexes available for Saccharomyces cerevisiae (e.g. Benschop et al. 2010 Mol. Cell. 38: 916-928), enabled us to evaluate the quality of PCP datasets, identify known protein complexes with high confidence, and develop guidelines on the choice of correlation metrics and fractionation approaches used to interpret and collect PCP datasets.

About the speaker: Igy is a postdoctoral research associate at the Systems Biology Initiative at UNSW, led by Prof. Marc Wilkins. Igy’s current role involves collaborating with bioscience researchers who are interested in the analysis of -omics datasets. His expertise involves the co-analysis of multiple -omics datasets in conjunction with multiple types of biological networks, for example, signaling, regulatory, and protein-protein interactions networks. His current projects include identifying the potential link between gene mutations and the side-effects of an antipsychotic medication and analyzing the role of the virome on the onset of type-1 diabetes among infants. Prior to his postdoc he had 2 years of experience in audit data analytics and fraud detection.

Monday September 3, 2018

Speaker: Serigne Lo (The University of Sydney and the Melanoma Institute Australia)

Title: Competing risks analysis with missing event types - penalized likelihood estimation of cause-specific Cox models

Abstract: Competing risks models provide attractive tools to analyze time-to-event data where each subject faces multiple event types. The models are useful when assessing the burden and etiology attributable to a specific disease. However, a complexity may arise when the event types for some subjects are missing but their event times are observed. Assuming the unobservable event types are missing-at-random, we develop novel constrained maximum penalized likelihood estimates for semi-parametric cause-specific Cox regression models. Penalty functions are used to smooth the baseline hazards. The appealing feature of our approach is that all relevant estimates in the competing risks setting are provided, including regression coefficients and cause-specific baseline hazards. Asymptotic results for these estimates are also developed. Through intensive simulations, we demonstrate the superiority of our method compared to some existing methods. We illustrate the new method using data from melanoma patients who faced two competing risk events: melanoma versus non-melanoma causes of death.

About the speaker: Dr Serigne Lo is a Senior Research Fellow in Biostatistics at the University of Sydney. He leads the Research & Biostatistics Group at the Melanoma Institute Australia. He has accumulated more than 15 years of academic teaching and research experience. Dr Lo provides leadership in cutting-edge biostatistical methods and support across the institute. Dr Lo is interested in the development of new statistical methods and his personal research includes clinical trials, adaptive design, multistate modelling, and joint modelling.

Monday August 27, 2018

Speaker: Joshua Ho (Victor Chang Cardiac Research Institute)

Title: Scalable bioinformatics methods for single-cell RNA-seq analysis

Abstract: TBA

About the speaker: Dr Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School (HMS), and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance. Joshua has published over 48 papers, including first-author publications in Nature, Science Signaling, and PLoS Genetics. He is also currently the Secretary of the Australian Bioinformatics and Computational Biology Society (ABACBS).

Monday August 20, 2018 (Cancelled)

Speaker: Chelsea Mayoh (Children's Cancer Institute)

Title: The Complexity of Identifying Targetable Genes in the Paediatric Transcriptome

Abstract: Molecular profiling of childhood cancers allows for personalised treatments based on targets found through Whole Genome and RNA sequencing. The accurate identification of germline/somatic mutations, copy number amplifications/deletions and structural variations is possible through the availability of matched tumour-normal pairs. However, identification of up-/down-regulated genes poses a challenge without having a matched normal. In this talk I will speak about the limitations, advantages and complexity of the transcriptome and utilising it to identify potential drug targets for paediatric patients through the Zero Childhood Cancer Program.

About the speaker: Chelsea is the lead bioinformatician at the Children's Cancer Institute. In addition to managing the bioinformatics team, she is one of the key bioinformaticians involved with the Zero Childhood Cancer program. At the Children's Cancer Institute, she works on a wide variety of childhood cancers with a heavy focus on CNS tumours, neuroblastoma and leukaemias, performing various kinds of bioinformatic analysis. She is also the Bioinformatician/Biostatistician on several Study Committees through the Sydney Children's Hospital Clinical Trials Program. Prior to coming to Australia in 2015 she was at the Genome Sciences Centre in affiliation with the BC Cancer Agency in Canada.

Monday August 13, 2018

Speaker: Dr Boris Guennewig (Brain and Mind Centre, The University of Sydney)

Title: Bioinformatics on multimodal data sets in a longitudinal ageing and neurodegeneration clinic

Bio: Dr Boris Guennewig leads the Forefront Bioinformatics team at the BMC. Boris is a Senior Lecturer at the University of Sydney and Conjoint Lecturer at UNSW. He received a diploma in chemistry from the University of Münster and secured a very competitive position at the Max Planck Institute for Molecular Biomedicine, which shifted his focus to immunology and inflammatory processes in human disease. After this, he transitioned to the Swiss Federal Institute of Technology as a PhD candidate in the laboratory of Prof J Hall, where he worked on microRNA biogenesis and their functions in human disease. He was then recruited by Prof J Mattick to the Garvan Institute to work with him and A Cooper, as well as Glenda Halliday, on deriving diagnostic biomarkers from various neurodegenerative diseases. Boris is additionally the lead bioinformatician/consultant for the International Cerebral Palsy genetics consortium, a member of the Australian Genomics Health Alliance and the founder of the analytics/bioinformatics company Pacific Analytics PTY LTD (Australia).

Research Interests: I am a research scientist/bioinformatician/statistician specializing in the development of infrastructure, software and pipelines to manage, analyze and mine large complex datasets in medical research. Using structured, semi-structured and unstructured data, my research focuses on the identification and characterization of genetic variation and transcriptional changes influencing complex human diseases (such as frontotemporal lobe dementia, bipolar disorder, Parkinson’s and Alzheimer’s disease). I achieve this through the functional integration of high-dimensional biological (omics) data, in combination with my statistical, genetics and data mining skills. I believe that assimilating and modelling multi-modal data (i.e. imaging, clinical and omic data) is key to uncovering the genotype-phenotype interaction and how this relationship affects complex traits.

Seminars in 2018, Semester 1

Friday June 8, 2018 (Special Statistical Bioinformatics Seminar - 11:20AM - Carslaw 829 Access Grid Room)

Speaker: Associate Professor Bradley Broom (MD Anderson Cancer Center, USA)

Title: Bioinformatics analysis tools

Abstract: This Friday we will be hosting Associate Professor Bradley Broom from the Department of Bioinformatics and Computational Biology at MD Anderson Cancer Center for a special Statistical Bioinformatics Seminar. The Seminar will start at 11:20 in the AGR. See below for a description of Bradley’s proposed talk.
In this talk I will describe two bioinformatics tools being developed at MD Anderson Cancer Center.
Clustered heat maps were developed for biomedical data in the early 1990s. They are now the most widely used visualization for molecular profiling data. But they are fundamentally static objects. We have created Next-Generation Clustered Heat Maps that include capabilities for interactively zooming and navigating the clustered heat map, for adjusting its color scale or other display parameters, and for interrogating the data in, or behind, the contents of the heat map. I will describe the key features of Next-Generation Clustered Heat Maps and the tools for generating them.
The bioinformatics analysts at MD Anderson perform many hundreds of detailed bioinformatics analyses per year. We also frequently receive requests to update an analysis or to repeat a substantially similar analysis on new data. These new requests can arrive months, years, or even decades after the original analysis was performed. Access to the original analysis is often crucial to reproducing and updating the earlier analysis, but locating the original analysis can be challenging if a long time has elapsed since the original analysis and/or the analyst who performed the initial analysis has left the institution. In this talk I will describe FjORD, a meta-information system we have developed to record bioinformatics analyses as well as to search for and retrieve previous analyses. I will also describe future goals for enforcing reproducible analyses.

Monday May 28, 2018

Speaker: Mathieu Fourment (ithree Institute, University of Technology, Sydney)

Title: New methods in phylogenetic inference

Abstract: Markov chain Monte Carlo (MCMC) algorithms have been the workhorse of Bayesian inference in phylogenetics for almost two decades. Although these algorithms have been successfully used in a wide range of applications they do not scale well to large numbers of sequences. In this talk I will present some of my work on sequential Monte Carlo algorithms and approximate inference using the variational Bayesian framework.

About the speaker: Mathieu is a data scientist at the University of Technology Sydney. He obtained his PhD at Macquarie University in 2010 and had research positions at the University of California San Diego, Duke-NUS, and the University of Sydney. His research interests include phylogenetics, variational inference, and probabilistic modelling.

Monday May 21, 2018

Speaker: Vinita Deshpande and Tom Geddes (The University of Sydney)

Title: (i) Good and bad fat: discovering key distinguishing features with proteomics; (ii) Ensemble deep learning for integrating heterogeneous omics data

Abstract: (i) Subcutaneous (SC) and visceral (VIS) fat cells store energy as fat and affect whole body metabolism. Excessive VIS fat is associated with insulin resistance, a precursor to Type 2 diabetes. In contrast, SC fat may be protective. Despite these important physiological functions, relatively little is known about the molecular features that distinguish these discrete types of fat cells. We have used mass spectrometry to construct proteomes of SC and VIS fat cells from mouse models representing healthy and diabetic states. These proteomes consist of over 7,500 quantified proteins spanning six orders of magnitude. By coupling with statistical approaches, we stratified the proteome into groups defined by the factor(s) driving the differences in protein expression, which revealed interesting functional differences. This analytic approach also serves as a computational framework to answer biological questions in a comprehensive and systematic manner. Thus we demonstrate the utility of this proteomic resource and analytic approach in uncovering novel insights into fat cell biology.

(ii) Artificial neural network models are capable of learning high-accuracy classifiers over inputs with complex information structure given a large number of training examples, and have become increasingly popular across a wide variety of applications. Ensemble learning methods can be used to increase classification accuracy and robustness by combining outputs from a collection of individual models trained differently on the same data, often proving more reliable than a single model. In the field of biology, multiple datasets are often available pertaining to highly overlapping sets of molecular species (such as proteins or RNA transcripts) but differing in experimental origin, and containing potentially orthogonal/non-redundant information. We explore the use of ensemble deep learning to draw on mass-spectrometry datasets from a variety of sources to increase classification accuracy.
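
As a rough illustration of the ensemble idea described in abstract (ii), the sketch below combines the class labels predicted by several models using a simple majority vote. This is only one common way of combining model outputs and is not necessarily the approach used by the speaker; the function name `majority_vote` and the example predictions are hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one list by majority vote.

    predictions: list of lists, one inner list of labels per model,
    all of the same length (one label per sample).
    """
    n_samples = len(predictions[0])
    combined = []
    for i in range(n_samples):
        # Count the labels that the individual models assign to sample i.
        votes = Counter(model[i] for model in predictions)
        combined.append(votes.most_common(1)[0][0])
    return combined


# Hypothetical binary predictions from three models for four samples.
model_a = [1, 0, 1, 1]
model_b = [1, 1, 0, 1]
model_c = [0, 0, 1, 1]

ensemble = majority_vote([model_a, model_b, model_c])
print(ensemble)
```

Using an odd number of models with binary labels avoids tied votes; with ties, a real pipeline would need an explicit tie-breaking rule or would average predicted probabilities instead.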

About the speaker: (i) Vinita Deshpande is a PhD student at the Charles Perkins Centre at the University of Sydney, where she is using quantitative proteomics to investigate fat cell biology. Her research interests lie in the application of systems biology methods to understand human diseases.

(ii) Tom Geddes is a PhD student at the Charles Perkins Centre at the University of Sydney, where he is using deep learning methods to predict protein-protein interactions. He is interested in better understanding the structure of complex biological systems by leveraging large, information-rich datasets.

Monday May 14, 2018

Speaker: Susan Corley (University of New South Wales)

Title: Working with ‘Salmon’, a new fast tool to quantify RNA-Seq reads

Abstract: Quantification of sequenced reads is the first vital step in undertaking a gene expression experiment using RNA-Seq. Recently, methods including kallisto (Bray et al., 2016) and Salmon (Patro et al., 2017) have been introduced which calculate transcript abundance without full mapping of reads to the genome. In this talk I will give some examples of my experience using Salmon and I will compare these results to read mapping produced using TopHat2.

About the speaker: Susan is a Senior Research Associate in the Systems Biology Initiative (SBI) at UNSW. She commenced with the SBI in 2012 following completion of her PhD in biomedical science and biochemistry at the John Curtin School of Medical Research (ANU). Prior to this she completed her Bachelor of Science with majors in chemistry at the University of Sydney. Susan’s primary research interest is in understanding the genes and pathways involved in human health and disease through employing Next Generation sequencing techniques. Susan’s research projects cover a wide breadth and have included gene expression analysis relating to Crohn’s disease, schizophrenia, Williams-Beuren syndrome, keratoconus, immunity-related conditions and cancer cachexia. Susan worked as a lawyer before commencing her studies and work in science.

Monday May 7, 2018

Speaker: Alex Lancaster (Ronin Institute and the University of Sydney)

Title: Modeling cellular systems: stochastic approaches in evolutionary research and therapeutic applications

Abstract: Stochasticity - random noise at the cellular, developmental and organismal level - plays an under-appreciated role in both evolutionary and biomedical research. Alex will present summaries of his work in the computational modeling of stochasticity in cellular systems, including yeast prions, signalling and gene networks, and sepsis. Throughout, the role of noise in shaping both evolutionary trajectories and therapeutic intervention is highlighted. The benefits of bottom-up, agent and rule-based strategies in both academic and commercial R&D contexts to help illuminate these questions are also discussed.

About the speaker: Alex Lancaster is a Visiting Scholar at the University of Sydney, a Research Scholar at the Ronin Institute, and a Partner at Cambridge, Massachusetts-based consulting company, Amber Biology. Following undergraduate degrees at the University of Sydney, he received his Ph.D. from University of California, Berkeley and has held research positions at the Santa Fe Institute, the Whitehead Institute at MIT, and Harvard Medical School. His research interests are in the intersection of evolutionary theory, systems biology and agent-based modeling.

Monday April 30, 2018

Speaker: Joseph Cursons (The Walter and Eliza Hall Institute)

Title: Post-transcriptional control of epithelial-mesenchymal transition through combinatorial miRNA targeting

Abstract: MicroRNAs (miRNAs) are small, non-coding RNAs with an important role in post-transcriptional regulation, targeting mRNAs for degradation and/or inhibiting their translation. Working with a model of TGF-β induced epithelial-mesenchymal transition in human mammary epithelial cells, we identified a set of miRNAs that appear to be co‑regulated with induction of an invasive mesenchymal phenotype. Computational analyses show that these miRNAs are coregulated across clinical breast cancer samples from the TCGA as well as a wider set of primary cell lines. Furthermore, analysis of high‑confidence predicted targets (based upon miRNA:mRNA sequence complement) suggests that these miRNAs share several targets, and many of their targets also interact at the protein level. Investigating this result, we selected several pro-epithelial miRNAs with evidence of co-targeting and demonstrated that combinatorial treatment could alter cellular phenotype with ectopic miRNA concentrations several orders of magnitude below what is typically used, and much closer to endogenous levels. This work suggests that cooperative targeting by miRNAs may be an important factor for their physiological function, and future work attempting to classify miRNA function should consider such combinatorial effects.

About the speaker: Joe Cursons is a Senior Research Officer in the Davis Laboratory within the Bioinformatics Division of the Walter and Eliza Hall Institute. He obtained his PhD from the Auckland Bioengineering Institute in 2012 and his research interests are centred on the regulatory control systems dysregulated during the progression and metastasis of breast cancer and melanoma. Much of Joe’s work involves the analysis of sequencing data to identify mechanisms of drug sensitivity and drug resistance for cancer treatment.

Monday April 23, 2018

Speaker: Bobbie Cansdale (The University of Sydney)

Title: From CTCF to 3D Modelling: Investigating the mammalian genome in three dimensions

Abstract: The three-dimensional structure of the mammalian genome is non-random and important for several key biological processes including the regulation of gene expression. Determining this structure, as well as the sequence itself, is necessary to further genome biology research. Topologically associating domains (TADs) are a main feature of chromatin organisation. These are clusters of genes that are functionally co-regulated, with boundary regions enriched for features such as CTCF binding sites, transfer RNAs, and SINE retrotransposons. Chromosome conformation capture (3C) based approaches, including Hi-C, can provide valuable insight into the spatial organisation of chromatin fibre. Computational frameworks have recently become available to use this data to create 3D representations of the genome, providing novel insights compared to the standard interaction matrices alone. Knowledge of these structures will allow investigation as to how they relate to the nearby genes and other genomic features. Here I will focus on bioinformatics strategies and possible future applications of this work using examples from the canine genome.

About the speaker: Bobbie Cansdale is a PhD student in computational biology and animal genomics at the University of Sydney. She completed a Bachelor of Animal and Veterinary Bioscience (Honours) at the University of Sydney in 2015. Her current focus is on canine genomic research. She is interested in the modelling of chromatin architecture, genomic data analysis, novel sequencing methods, and the integration of various data types to better answer questions.

Monday April 16, 2018

Speaker: Sarah Beecroft (The University of Western Australia)

Title: Gene Hunting - Why don't we know all disease genes yet?

Abstract: Rare genetic diseases include some of the most debilitating diseases to affect humans, with onset ranging from before birth to old age. Although the human genome has now been mapped, finding mutations that cause rare diseases is still hugely challenging. About 50% of patients are without a diagnosis. Sarah will discuss the bioinformatic strategies used in this field, and the future of disease gene discovery.

About the speaker: Sarah Beecroft is based in the Molecular Neurogenetics lab at the Harry Perkins Institute of Medical Research, Perth. She works to discover new disease genes in patients with nerve and muscle diseases. She is interested in finding mutations in the non-coding regions of the genome, a vast unknown in rare disease genetics.

Monday April 9, 2018

Speaker: Shila Ghazanfar (The University of Sydney)

Title: Investigating combinatorial expression of delta-protocadherins in single olfactory sensory neurons

Abstract: Single cell RNA-Sequencing (scRNA-Seq) has enabled unprecedented insight into the behaviour of individual cells on the scale of the entire transcriptome. Such precision offers an opportunity to explore cell-specific heterogeneity; however, two distinct features arise from such data: (1) hyperinflation of identically zero counts for the majority of genes for any given cell, and (2) an apparent bimodal distribution of non-zero counts. Both features are unique to scRNA-Seq, and warrant further development of statistical tools in order to answer biological questions of interest.

We propose a mixture modelling framework to classify cells into three transcriptional states for each gene: (1) no, (2) low, and (3) high gene expression. This approach has the potential to reveal the cell-specific dynamics of RNA transcription (bursting) and degradation, as well as acting as a cross-dataset standardisation. We utilised a number of publicly available scRNA-Seq datasets, stemming from mouse neuronal cell populations, to perform the mixture model comparison, assess highly and lowly variable genes, and to estimate cell networks via a uniqueness thresholding.

This work is in the context of understanding how olfactory sensory neurons (OSNs) interact with each other during embryonic development of the mouse olfaction system. In particular, we study the role that combinatorial expression of genes in the delta protocadherin gene subfamily plays in mediating cell-cell adhesion. Further, we utilise distinct guiding principles to build a Monte Carlo simulation of this cell-cell adhesion behaviour, and assess its suitability. This addresses the larger question of how combinatorial gene expression specifies specific cell types and tissues.

About the speaker: Shila Ghazanfar has recently completed her PhD in Statistical Bioinformatics at The University of Sydney and is currently a Research Associate in the Judith and David Coffey Lifelab and School of Mathematics and Statistics. Her research interests are in statistical analysis of data arising from high throughput sequencing technologies such as RNA-Seq in various research contexts.

Monday March 26, 2018

Speaker: Kitty Lo (The University of Sydney)

Title: Novel alternative splicing in TDP-43 mutant mouse models of ALS

Abstract: TDP-43 (encoded by TARDBP) is an RNA binding protein central in the pathogenesis of the neurodegenerative disorder amyotrophic lateral sclerosis (ALS). However, how TARDBP mutations trigger pathogenesis remains unknown. Here, we use novel mouse mutants carrying point mutations in Tardbp to dissect TDP-43 function. Interestingly, we find that TDP-43 C-terminal mutations lead to a gain of splicing function. Using two different strains we are able to separate TDP-43 loss and gain of function effects. This new gain-of-function induces a novel category of splicing events, here termed skiptic exons, in which skipping of constitutive exons occurs, causing expression changes. Our findings provide a novel pathogenic mechanism and highlight how gain- and loss-of TDP-43 function affect RNA processing differently, suggesting they may play roles at different disease stages.

About the speaker: Kitty Lo is currently a bioinformatics postdoctoral researcher in the Faculty of Science. Prior to this, she was at University College London and the UCL Institute of Neurology. She has also worked in a Cambridge-based biotechnology startup where she developed cancer diagnostic tools using ctDNA. Kitty has a PhD in astronomy from the University of Sydney.

Monday March 19, 2018

Speaker: Dana Pascovici (Macquarie University)

Title: DIA/SWATH - challenges and opportunities for bioinformatics

Abstract: Protein quantitation using DIA/SWATH mass spectrometry has been growing in popularity over the last few years. From a bioinformatics point of view, the data resulting from such experiments is quite easy to analyse, at least if the experiment is not too large: there is a much lower percentage of missing data, and the look and distribution of the data make existing methodology from other areas quite easily applicable. Put plainly, extracted SWATH data is quite nice to work with. However, that is because much of the difficulty has been pushed underneath, to the level of the SWATH library building and data extraction, where it is somewhat hidden from view.

In this talk we will describe SWATH and its place in the landscape of quantitative proteomics (including broad comparisons with label free and labelled techniques such as iTRAQ and TMT), and the many positive aspects of the resulting SWATH datasets, from the point of view of the data analyst. We will also focus on how SWATH data extraction usually relies on high quality peptide MS/MS spectral libraries; however, building such libraries to ensure good proteome coverage can be time consuming and expensive. To address this issue, various computational approaches for merging archived or external libraries have been created and evaluated, including efforts from our group. We will describe the appeal of such methods, the possible issues that can ensue and some approaches to tackle them in order to ensure that the proteins are reliably detected and their quantitation is consistent and reproducible. We will discuss these aspects in the context of several existing datasets, including a carefully designed spiked-in experiment, and a recently published large plasma proteomics experiment containing samples from neonates, young children and adults.

About the speaker: I am currently a Biostatistician at the Australian Proteome Analysis Facility at Macquarie University, where I help people generate biological insights out of their proteomics data, especially in the context of complex experiments.

Working in a proteomics facility, our focus has been on generating reliable methods of interpreting and analysing data from a variety of platforms, lately emphasizing SWATH and TMT, and wherever possible incorporating them into software workflows. Areas of particular relevance to us have been plasma proteomics, and plant proteomics of agriculturally important species. Our work has benefitted from interactions with researchers, students and the APAF team of mass spectrometry specialists and analytical chemists.

I come from a mathematical and computational background, having completed a bachelor degree in Mathematics and Computer Science at Dartmouth College in the US, followed by a PhD in Mathematics at MIT, and a brief stint of teaching at Purdue. In Sydney I took a more practical turn and worked in the industry in the area of speech recognition, before settling into biostatistics for the past 13 years, both in the industry and research environment.

Seminars in 2017, Semester 2

Monday November 27, 2017

Speaker: Beth Signal (Garvan Institute of Medical Research)

Title: Machine learning annotation of branchpoints and in silico modelling of functional splicing events.

Abstract: RNA splicing is a key component of mature RNA transcript formation, required for the removal of intronic regions and subsequent ligation of exonic regions. This process can also allow for alternative splicing to occur, where different exonic regions are ligated together to produce alternative RNA products. The branchpoint element is one of the splicing sequence elements, required for the first lariat-forming reaction in splicing. However, current catalogues of human branchpoints remain incomplete due to the difficulty in experimentally identifying these elements. To address this limitation, we have developed a machine-learning algorithm - branchpointer - to identify branchpoint elements solely from gene annotations and genomic sequence. Using branchpointer, we annotate branchpoint elements in 85% of human gene introns with a sensitivity of 61.8% and a specificity of 97.8%. In addition to annotation, branchpointer can evaluate the impact of SNPs on branchpoint architecture to inform functional interpretation of genetic variants. Branchpointer identifies all published deleterious branchpoint mutations annotated in clinical variant databases, and finds thousands of additional clinical and common genetic variants with similar predicted effects. While alternative splicing can produce alternative RNA products, a large proportion of these have little functional impact on open reading frames or transcript stability. To address this limitation in the functional interpretation of differential splicing analyses, we have developed software to model events in silico and interpret their functional impact.
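
For readers less familiar with the two performance measures quoted in the abstract above, the sketch below shows how sensitivity and specificity are computed from a confusion matrix. The counts are purely illustrative and are not taken from the branchpointer evaluation; the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Fraction of true branchpoints that are predicted as branchpoints."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-branchpoints that are predicted as non-branchpoints."""
    return tn / (tn + fp)


# Illustrative counts only (not results from the talk).
tp, fn = 30, 20   # true branchpoints: correctly found / missed
tn, fp = 90, 10   # non-branchpoints: correctly rejected / wrongly called

print(sensitivity(tp, fn))
print(specificity(tn, fp))
```

A classifier tuned for high specificity, as in the figures quoted above, makes few false calls at the cost of missing some true sites.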

About the speaker: Beth is a PhD Student in the Clinical Genome Informatics group at the Garvan Institute. Her current research is focused on developing bioinformatics methods to understand how transcript splicing and expression is controlled. She has a particular interest in using machine learning techniques to study transcriptomic behaviour.

Monday November 20, 2017 (PLEASE NOTE: Special second talk, Location = Carslaw Access Grid Room (Level 8), Time: 4pm)

Speaker: Sonja Greven (LMU Munich)

Title: Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains

Abstract: Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, e.g. functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen-Loève Theorem. For the practically relevant case of a finite Karhunen-Loève representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily orthonormal bases. It is also applicable for sparse functional data or data with measurement error. A flexible R implementation is available on CRAN. The new method is shown to be competitive to existing approaches for data observed on a common one-dimensional domain. The motivating application is a neuroimaging study, where the goal is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. Supplementary material, including detailed proofs, additional simulation results and software is available online.

About the speaker:

Monday November 20, 2017 (PLEASE NOTE: Special location - Level 5 Large Meeting Room, Usual time: 1pm - 2pm)

Speaker: Elizabeth Mason (The University of Melbourne)

Title: Modelling transcriptional variability in single cell RNA-seq data during human embryogenesis captures changes in the regulation of critical developmental genes

Abstract: Human development is a temporally and spatially ordered series of events that occur with remarkable precision; the same DNA blueprint gives rise to more than 250 sharply defined cell phenotypes. At the functional phenotype level embryogenesis appears predictable because we observe the average behaviour of many individual cells, even as the number of cells, the range of phenotypes and transcriptional complexity increases during the course of development. It is when we evaluate single molecules and transcripts that the stochastic nature of gene expression is revealed, for example in single cell RNA-seq experiments (scRNA-seq). Current methods reduce scRNA-seq data to a well-defined trajectory based on the abundance of key regulators of phenotype, and differential abundance between cells in a given phenotype is used to identify sub-populations. Here we present an alternative approach: that measuring the transcriptional variability at the gene level informs the level of regulation imposed on it, reflecting an intrinsic property of development that is often overlooked. While linear models have been a successful framework to characterize differences in abundance between phenotypes on average, they do not account for stochastic differences captured by scRNA-seq experiments. Accurately determining abundance and variability is further complicated by the sparseness of non-zero expression values. To address these challenges and evaluate gene expression during human pre-implantation embryogenesis, we applied a statistical mixture model to scRNA-seq data. Fitting the model on a gene-by-gene basis allowed us to evaluate shifts in the proportion of cells expressing a given gene (λ), and also the mean (μ) and standard deviation (σ) of expression. From here, a correlation based analysis evaluated whether abundance (μ) and variability (σ) capture different aspects of transcriptional regulation.
While each metric largely identified the same genes, the number and nature of relationships between them differed. Indeed, genes sharing correlated patterns of variability during development were enriched for motifs associated with developmental transcription factors (e.g. HIC2, PPARG, E2F4 and ZNF692). Variability was more effective than abundance at specifically detecting regulatory relationships during development, and with less redundancy. Our approach provides a gene-centric platform to evaluate population-based parameters of gene expression, while preserving the complexity of scRNA-seq data.
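
To make the three per-gene parameters in the abstract concrete, the sketch below computes λ, μ and σ for a single gene from a vector of expression values across cells. It is a deliberately simplified stand-in, not the speaker's model: it assumes that any zero value means "not expressed" and uses plain moment estimates on the non-zero values rather than a fitted mixture likelihood. The function name `gene_summary` and the example values are hypothetical.

```python
import math


def gene_summary(expression):
    """Return (lam, mu, sigma) for one gene across cells.

    lam   : proportion of cells with non-zero expression
    mu    : mean of the non-zero expression values
    sigma : sample standard deviation of the non-zero values
    """
    expressed = [x for x in expression if x > 0]
    lam = len(expressed) / len(expression)
    if len(expressed) < 2:
        # Too few expressing cells to estimate a mean and spread.
        return lam, float("nan"), float("nan")
    mu = sum(expressed) / len(expressed)
    var = sum((x - mu) ** 2 for x in expressed) / (len(expressed) - 1)
    return lam, mu, math.sqrt(var)


# Hypothetical log-expression values for one gene in five cells.
cells = [0.0, 0.0, 2.0, 4.0, 6.0]
lam, mu, sigma = gene_summary(cells)
print(lam, mu, sigma)
```

Computing these summaries gene by gene at each developmental stage would give the λ, μ and σ profiles whose correlations the abstract describes comparing.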

About the speaker: Lizzi began her career in human genomics as a laboratory manager and laboratory technician with Professor Greg Gibson (Centre for Integrative Genomics, Georgia Tech University). She conducted 2 investigations in Australia which identified maternal influences on development of the neonate immune system, and uncovered population structure of the leukocyte transcriptome. Together with scientists at Emory University, Greg and Lizzi initiated the CIG’s involvement in the WHOLE (Wellness and Health Omics Linked to the Environment) study of Predictive Health Genomics in Atlanta (USA) which is currently in its 6th year. Lizzi has recently completed a PhD in systems biology of human stem cells at the Australian Institute for Bioengineering and Nanotechnology at the University of Queensland. Her PhD project formed an international collaboration with Professor Christine Wells (University of Melbourne AUS), stem cell biologists Professors Martin Pera (Jackson Laboratory USA) and Ernst Wolvetang (University of Queensland AUS), biostatistician Assistant Professor Jessica Mar (Albert Einstein College of Medicine, USA) and computational biologist Professor John Quackenbush (Harvard University, USA). Her primary focus is evaluating whether molecular variability in stem cell populations describes an important, but until now hidden predictor of cellular behaviour and phenotype. Phenotypic heterogeneity in clonally derived cell populations is ubiquitous, and biologically relevant information is often masked by using population-averaging techniques, versus individual cell based measurements. She has developed new network approaches which incorporate gene expression variance, with the goal of identifying genetic elements which stabilize a cell phenotype, and push a cell to transition between phenotypes. 
During her PhD Lizzi was invited to present her work in departmental seminars at the Harvard Stem Cell Institute, the Lieber Brain Institute at Johns Hopkins University, and the Black Family Stem Cell Institute at Mt Sinai Hospital New York. She was also one of 12 international scientists who were invited to participate in the Radcliffe Exploratory Workshop for Variation at Harvard University in 2011. She is currently based with Professor Christine Wells in the Centre for Stem Cell Systems at the University of Melbourne, where she is working on applied statistical methods to evaluate molecular variability in single cell RNA-seq data.

Monday November 13, 2017 (No seminar)

Monday November 6, 2017

Speaker: Shila Ghazanfar (The University of Sydney)

Title: Integrated single cell data analysis for understanding mechanisms of neuronal diversity

Abstract: Technological advances such as large scale single cell transcriptome profiling have exploded in recent years and enabled unprecedented insight into the behaviour of individual cells. Identifying genes with high levels of expression using data from single cell RNA sequencing can be useful to characterise very active genes and cells in which this occurs. In particular single cell RNA-Seq allows for cell-specific characterisation of high gene expression, as well as gene coexpression. In this talk, I will describe a versatile modelling framework to identify transcriptional states motivated by a collaborative project involving neuronal single cell data. Neuronal cell systems exhibit extraordinary levels of complexity. Thus it is of great interest to explore the ways in which this neuronal diversity is generated and manifested to achieve such complexity. One proposed mechanism is patterns of gene transcription across neurons. We will describe an approach using bioinformatics and statistics to evaluate evidence of gene transcriptional mosaics as a mechanism for achieving diversity of neuronal cells.

About the speaker: Shila has recently completed her PhD in Statistical Bioinformatics at The University of Sydney. Her research interests are in statistical analysis of data arising from high throughput sequencing technologies such as RNA-Seq in various research contexts.

Monday October 30, 2017 (No seminar)

Monday October 23, 2017

Speaker: Eva Chan (Garvan Institute)

Title: Detecting Complex Genomic Rearrangements using Optical Mapping

Abstract: Genomic rearrangements are common in cancer, with demonstrated links to disease progression and treatment response. These rearrangements can be complex, resulting in fusions of multiple chromosomal fragments and generation of derivative chromosomes. Comprehensively detecting complex genomic rearrangements (CGR) in cancer remains challenging. No single approach can identify all structural variations, because each has its own strengths and weaknesses. In this seminar, I will demonstrate the utility of whole genome optical mapping in capturing CGR. I will further showcase an example using optical mapping to capture chained fusion events in a well-studied liposarcoma cell line. Using this approach, we identified fusion maps that clearly revealed chained fusion architectures (content, order, orientation, and size), as well as large rearrangement junctions that are undetectable by sequencing alone. I hope to convince you that optical mapping is an important complement to existing technologies for detecting and reconstructing complex genomic rearrangements.

About the speaker: Senior Bioinformatics Research Officer, Human Comparative and Prostate Cancer Genomics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, The Kinghorn Cancer Centre

Monday Oct 16, 2017 (PLEASE NOTE: Special time of 2:00PM)

Speaker: Natalie Thorne (Melbourne Genomics)

Title: Clinical bioinformatics – what does it really take to translate research into practice?

Abstract: Melbourne Genomics Health Alliance has taken a collaborative, patient-centred, clinically-driven, evidence-based and sustainable approach to delivering genomic testing. This year the Alliance has commenced implementing Victoria’s new clinical system for genomics. A platform for bioinformatics analysis and a tool for variant curation will be among the first components to be implemented and used for accredited clinical genomic testing by diagnostic laboratories. Operating within this shared digital system, however, presents a challenge for laboratories to simultaneously coordinate with other diagnostic laboratories and hospitals, whilst also supporting their own business requirements for accreditation and continual innovation.
At the heart of diagnostic innovation in genomics is the emerging field of clinical bioinformatics, which combines clinical, diagnostic, analytical, software and genetic aspects of implementing clinical genomic testing. The field has two key challenges: first, it is in its infancy and laboratories lack the support of a mature discipline; second, it demands skills and expertise predominantly lacking in traditional academia. These include developing enterprise-grade solutions, complex strategies for organisational change, multi-stakeholder collaboration, community engagement and rapidly evolving biotechnology.
Drawing on my experiences working with the Melbourne Genomics and Australian Genomics Health Alliances, I will discuss the challenges and opportunities in clinical bioinformatics, including the use of ‘implementation science’ for translating research bioinformatics into clinical practice.

About the speaker: TBA

Monday Oct 9, 2017

Speaker: Saskia Freytag (WEHI)

Title: Cluster Headache: Comparing Clustering Tools for 10X Single Cell Sequencing Data

Abstract: The commercially available 10x Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or identical cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regard to which method is most accurate. Answering this question is complicated by the fact that 10x Genomics data lack cell labels. Thus in this review, we focused on comparing clustering solutions of a dozen methods for three datasets on human peripheral mononuclear cells generated with 10x Genomics technology. While clustering solutions appeared robust, we found that solutions produced by different methods have little in common with each other. They also failed to replicate cell type assignment generated with supervised labeling approaches. Furthermore, we demonstrated that all clustering methods tested clustered cells to a large degree according to the amount of ribosomal RNA in each cell.
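
As an illustration of how two clustering solutions for the same cells can be compared, here is a minimal sketch of the adjusted Rand index (ARI). The abstract does not say which agreement measure the speakers used, so the choice of ARI, the function name and the toy labels are all assumptions made for this example.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two clusterings of the same cells (1.0 = identical partitions)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same cells")
    n = len(labels_a)
    if n < 2:
        return 1.0  # with fewer than two cells there are no pairs to disagree on

    # Count pairs of cells placed together in both, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / comb(n, 2)  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in one cluster in both solutions
    return (together_both - expected) / (maximum - expected)


# Hypothetical labels from two methods for six cells.
method_1 = ["T", "T", "B", "B", "NK", "NK"]
method_2 = [0, 0, 1, 1, 2, 2]  # same partition, different names
method_3 = [0, 1, 0, 1, 0, 1]  # splits every cluster of method_1

print(adjusted_rand_index(method_1, method_2))
print(adjusted_rand_index(method_1, method_3))
```

The index depends only on which cells are grouped together, not on the cluster names, so it can be applied to the outputs of different tools without matching their labels first.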

About the speaker: Saskia completed her Masters in Statistical Science at University College London. After finishing she moved back to Germany, where she completed a PhD in Biostatistics in 2014. She then got the opportunity to relocate to Melbourne to work as a Post-Doctoral Fellow at the Walter and Eliza Hall Institute in Melanie Bahlo’s group. Her research focus is methodological development for the analysis of high throughput sequencing data. She is co-founder of R-Ladies and an ambassador for CHOOSEMATHS.

Monday October 2, 2017 (No seminar - Labour Day public holiday)

Monday September 25, 2017 (No seminar)

Monday Sept 18, 2017

Speaker: Rebecca Poulos (UNSW)

Title: The use of big data in the search for cis-regulatory driver mutations in cancer genomes

Abstract: Mutations that directly alter protein function have been extensively studied in cancer. However, in recent years, it has become feasible to examine the cancer-causing role of mutations within the remaining 98% of the genome which is non-coding. Here I will present our use of big data in the study of cis-regulatory somatic mutations in cancer genomes. We analysed somatic mutations from over 1,000 cancer genomes across 14 cancer types, specifically focusing on promoter regions. These regulatory regions are often bound by proteins, and we discovered remarkable ‘mutation hotspots’ at sites of protein binding. To understand why these hotspots formed, we used genome-wide maps of nucleotide excision repair (NER) to show that sites of protein binding have reduced levels of NER. Our analyses uncovered the presence of a previously unknown mechanism, by which we associated reduced NER with the formation of mutation hotspots at promoters. To determine how these hotspots might impact cancer development, we investigated whether these mutations can impact the ability of a protein to bind to DNA by analyzing skin cancer mutations at the binding site of the protein CTCF. Performing CTCF ChIP-seq in a melanoma cell-line, we demonstrated the functionality of such mutations through allele-specific reduction of CTCF binding to mutant alleles. Finally, we sought to determine the role of DNA methylation (a common epigenetic modification) on the occurrence of somatic mutations in cancer. We correlated mutation load with methylation across 15 cancer types and subtypes, and we showed that reduced levels of methylation in regulatory regions may be responsible for reduced mutation loads at such loci in colorectal cancer. Taken together, these analyses develop our understanding of the formation and repair of mutagenic lesions in cis-regulatory regions of cancer genomes, providing insight into the search for driver mutations at such loci.

About the speaker: Rebecca Poulos is a researcher in the ‘Bioinformatics and Integrative Genomics’ group at the Lowy Cancer Research Centre at UNSW Sydney. Rebecca’s research field is in the area of cancer genomics, where she uses ‘big data’ to study DNA mutation and repair processes in regulatory regions of cancer genomes. Her research output includes first-author publications in ‘Nature’ and ‘Cell Reports’, together with a review article, editorial and book chapter in the area of non-coding driver mutations in cancer. Rebecca studied science and business at the University of Technology Sydney. She is currently at UNSW Sydney where she completed her Honours year (with University Medal) and is finalising her PhD research (with UNSW Research Excellence Award).

Monday Sept 11, 2017

Speaker: Mark Segal (UCSF)

Title: Statistical and Computational Challenges in Conformational Biology

Abstract: Chromatin architecture is critical to numerous cellular processes including gene regulation, while conformational disruption can be oncogenic. Accordingly, discerning chromatin configuration is of basic importance; however, this task is complicated by a number of factors including scale, compaction, dynamics, and inter-cellular variation. The recent emergence of a suite of proximity ligation based assays, notably Hi-C, has transformed conformational biology with, for example, the elicitation of topological and contact domains providing a high resolution view of genome organization. Such conformation capture assays provide proxies for pairwise distances between genomic loci which can be used to infer 3D coordinates, although much downstream analysis bypasses this reconstruction step. After demonstrating advantages deriving from obtaining 3D genome reconstructions, in particular from superposing genomic attributes on a reconstruction and identifying extrema (‘3D hotspots’) thereof, we showcase methodological challenges surrounding such analyses. Open issues highlighted include (i) performing and synthesizing reconstructions from single-cell assays, (ii) devising rotation invariant methods for 3D hotspot detection, (iii) assessing genome reconstruction accuracy, and (iv) averting reconstruction uncertainty by direct integration of Hi-C data and genomic features. By using p-values from (epi)genome wide association studies as the feature, the latter approach provides a conformational lens for viewing GWAS findings.

About the speaker: TBA

Monday Sept 4, 2017

Speaker: Dr Fabio Luciani (UNSW)

Title: A systems immunology approach to study antigen-specific T cells in viral infection 

Abstract: Immunological memory is a cardinal feature of human adaptive immunity and is critical for prophylactic vaccination. It has also recently been shown to play an important role in determining the outcome of T cell based immunotherapies in cancer. Although cytotoxic T cells can have a significant impact on disease clearance, the essential phenotype of a clinically successful T cell and how this influences therapeutic efficacy remain largely undefined. In this presentation I will present our systems immunology approach to tackle these issues. I will review recent studies on longitudinal samples of primary HCV infection using flow cytometry for phenotyping virus specific T cells, along with single cell transcriptomic and TCR diversity analyses. Future directions involve applying this systems immunology approach to other viral infections, as well as understanding how long term T cell memory protection is achieved.

About the speaker: Assoc. Prof. Fabio Luciani trained as a theoretical physicist (Masters) and as a theoretical biologist (PhD, 2006, Humboldt University of Berlin, Germany). His research interests include adaptive immune responses against pathogen infections, computational models for studying host-pathogen interactions, and bioinformatics analyses of high throughput next generation sequencing data. He has applied mathematical modelling to understand infectious diseases, focussing on transmission dynamics of drug resistant Mycobacterium tuberculosis, and the transmission of hepatitis C virus among injecting drug users. He has made several contributions to understanding how HCV infects a new host and the role of T cell mediated responses, using next generation sequencing technologies, flow cytometry and statistical modelling. More recently, he has moved into single cell genomics and systems immunology approaches to understand T cell dynamics. He currently holds an NHMRC Career Development Fellowship and leads a systems immunology group where he conducts both wet- and dry-lab research in the field of immune responses against pathogens. During his career he has published more than 80 papers in specialized and more general journals.

Monday Aug 28, 2017

Speaker: John-Sebastian Eden (Charles Perkins Centre, USyd)

Title: Using RNA-Seq to reveal the Australian Virome

Abstract: TBA

About the speaker: TBA

Monday Aug 21, 2017

Speaker: Lori Chibnik (Harvard School of Public Health)

Title: Genomic journeys into neuropathology and cognitive reserve in an aging population

Abstract: TBA

About the speaker: Dr. Lori Chibnik, PhD, MPH is a biostatistician and Assistant Professor with a joint appointment in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health and the Department of Medicine at the Harvard Medical School. She received her MPH in International Health and her PhD in biostatistics from Boston University where she worked on predictive modeling methods for disease risk. Over her career she has developed and assessed predictive models for diseases such as HIV, pre-natal screening and autoimmune diseases and continues to apply her methods to various diseases. Dr. Chibnik’s current research focuses primarily on genetics and genomics of Alzheimer’s disease and dementia with an emphasis on longitudinal cohorts. In addition to her research, she is internationally renowned for her training programs and innovative teaching techniques, having developed multiple courses in biostatistics for varied audiences. While at Boston University she managed the Summer Institute for Training in Biostatistics, an NHLBI funded, 6-week summer program designed to bring undergraduate students into the fields of Biostatistics and Public Health. Most recently she developed and implemented a series of biostatistics and programming courses specific to the needs of scientists in sub-Saharan Africa. Currently she directs the Global Initiative for Neuropsychiatric Genetics Education in Research at the Harvard-Chan School and the Stanley Center for Psychiatric Research in the Broad Institute of Harvard and MIT.

Seminars in 2017, Semester 1

Monday June 26, 2017

Speaker: Timothy Peters (Epigenetics Research Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research)

Title: Robust and flexible de novo calling of differentially methylated regions

Abstract: DNA methylation is a dynamic, environmentally sensitive modification implicated in a large array of biological processes, from transcription factor binding to being a reliable predictor of age. Hence accurate and interpretable statistical modelling of the methylome is of great importance when investigating epigenetic cell states. DMRcate is a Bioconductor package that calls differentially methylated regions (DMRs) from replicated Illumina array (including the new EPIC array) and whole genome bisulfite sequencing (WGBS) experiments, under general experimental design. It uses a tunable kernel smoother and whole-methylome significance testing to find and rank the most differentially methylated regions for a given hypothesis. It is fast and delivers DMRs in the order of seconds for arrays and minutes for WGBS. Package features include:
• Adjustable kernel size
• Guidance for users towards appropriate false discovery rate (FDR) thresholds
• Annotation-agnostic calling
• Options for filtering Illumina probes known to be polymorphic and/or cross-hybridise to off-target genomic sites
• Automatic post-calling annotation of DMRs with known Gencode promoter regions
• Output in GenomicRanges and bedGraph format
• Elegant plotting of DMRs using the Gviz package, including proximal Gencode gene loci
• Calling of variably methylated regions (VMRs) from Illumina arrays
DMRcate takes into account a number of biological and statistical considerations when defining DMRs, such as irregular spacing of CpG sites and the distribution of variances across CpGs as a result of variable sequencing depth.
Reference: Peters et al (2015) De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin. 2015 Jan 27;8:6. doi: 10.1186/1756-8935-8-6.

About the speaker: Tim’s background is in bioinformatics and applied statistics. He completed a PhD on the principles of statistical learning for transcriptomic data in the Department of Statistics at Macquarie University in 2012. He has worked as a Postdoctoral Fellow at CSIRO on the EpiSCOPE project: mapping the epigenetic terrain of human adipocytes, performing statistical analyses for human EWASs (epigenome wide association studies) and has published a novel method for statistical inference of whole-methylome data. In addition, he has spoken at a number of national and international conferences, including an oral presentation at the Joint Statistical Meetings (JSM) in Washington, DC.

Monday June 19, 2017

Speaker: Geoff Barton (Professor of Bioinformatics and Head of Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.)

Title: Identification of novel functional sites in protein domains from the analysis of human variation

Abstract: In this talk I will present a new analysis that compares publicly available variation data for human with variation seen across all available protein sequences regardless of species. The analysis confirms patterns of variation in human are consistent with protein structural features, but highlights structurally and functionally important sites in around 15,000 human protein domains that are not found by conventional sequence analysis methods. The identified sites are enriched in disease-associated variants and ligand binding residues. I will explain the method and illustrate the new analysis with a number of examples including the Nuclear Receptor Ligand Binding Domains and G-protein coupled receptors (GPCRs) which are important therapeutic targets. The study makes heavy use of the popular Jalview sequence analysis program developed in my group, so I will also give a brief update on Jalview’s new features for exploring nsSNPs on alignments.

Note: This is a joint event where Prof. Geoff Barton will be giving a talk to all in CPC. Time and location will be announced later.

Monday June 12, 2017, No meeting (Queen's Birthday)

Monday June 5, 2017

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Unsupervised recurrent neural network for cell event detection in videos

Abstract: In this talk, we will present an automatic unsupervised cell event detection and classification method for cell videos based on convolutional and recurrent neural networks. Cell images captured from various biomedical applications often possess different visual characteristics regarding cell appearance, motility, and cell activities. This presents difficulties in finding a generic solution for the automatic detection of cell events (division, death, differentiation, etc.) in videos. Current methods for event detection rely on human observers with specific expertise and long hours of labor; this also renders supervised training a sub-optimal choice. We use a convolutional Long Short-Term Memory (LSTM) neural network structure that simultaneously exploits both spatial visual features and temporal patterns of objects to filter and classify possible cell events in a video sequence. Our model design allows for the detection and classification of cell events without the need for labeled training data; we will demonstrate our model for the detection of mitosis events.

About the speaker: TBA

Wednesday May 31, 2017

Speaker: Stephen Leslie (Centre for Systems Genomics, and Schools of Mathematics and Statistics, and BioSciences, The University of Melbourne)

Title: Genetics and Geography: Using genomic data to understand population history and demography

Abstract: In this talk Stephen will present some of the findings from the People of the British Isles project, which was published in Nature in March 2015 (and featured on the cover), and some more recent work following on from this study. In particular he will show that using newly developed statistical techniques one can uncover subtle genetic differences between people from different regions at a hitherto unprecedented level of detail. For example, in the UK one can separate the neighbouring counties of Devon and Cornwall, or two islands of Orkney, using only genetic information. Stephen will then show how these genetic differences reflect current historical and archaeological knowledge, as well as providing new insights into the historical make up of the British population, and the movement of people from Europe into the British Isles. This is the first detailed analysis of very fine-scale genetic differences and their origin in a population of very similar humans. The key to the findings of this study is the careful sampling strategy and an approach to statistical analysis that accounts for the correlation structure of the genome. The methods developed are readily extended to analyses in other populations.

About the speaker: Associate Professor Stephen Leslie is a statistician working in the field of mathematical genetics. A/Prof. Leslie did his undergraduate degree at ANU, including honours in Mathematics. He obtained his doctorate from the Department of Statistics, University of Oxford in 2008, followed by post-doctoral work at Oxford, before becoming the Head of Statistical Genetics at Murdoch Childrens Research Institute in 2012. Since 2016 Stephen has been at the University of Melbourne as Associate Professor of Statistical Genomics, in the Schools of Mathematics and Statistics, and Biosciences, and the Centre for Systems Genomics. In late 2016 he was awarded the Woodward Medal in Science and Technology, the University of Melbourne’s highest award for staff, which is given for research that has made the most significant contribution to knowledge in the five years prior to the award. A/Prof. Leslie's work covers several aspects of statistical and population genetics. His group's main focus is on methodological developments for the analysis of high throughput genetic data and the application of these methods to studies of disease and natural population variation. These methods typically combine modern computationally-intensive statistical approaches with insights from population genetics models. Specifically the group works on statistical methods for imputing immune system (and other) genes from incomplete genetic data; the application of these methods to studies of autoimmune and other diseases; methods for detecting and controlling for population stratification; and understanding the causes and consequences of genetic variation in populations.

Monday May 29, 2017

Speaker: Tram Doan (Westmead Millennium Institute, Sydney Medical School)

Title: RNA-seq profiling of normal human breast epithelial cells reveals unexpected nuclear receptor segregation

Abstract: The ovarian hormone progesterone is a key regulator of female reproductive function. The established role of progesterone analogues in hormone replacement therapy in increasing breast cancer risk has sharpened focus on the mechanisms of action of this hormone in the normal breast. Progesterone plays an essential role in the development of lobular alveolar structures in the breast, through stimulation of proliferation during the normal menstrual cycle and pregnancy. We previously reported that the progesterone receptor (PR) was present in the progenitor-enriched normal breast cell population and likely mediates proliferative effects in those cells. In the present study, we profiled the transcriptome of normal human breast epithelial cells at single cell resolution. The aims are to 1) identify the number and functional characteristics of different cell populations in the normal breast epithelium, and 2) characterise PR expression and lineage association in different normal breast epithelial cell types. We show that progesterone exerts distinct functional roles in different normal breast epithelial cell types and that PR is expressed more frequently in progenitor cells and has the strongest transcriptional effect in this cell population.

About the speaker: TBA

Monday May 22, 2017

Speaker: Kevin Wang (Statistical Bioinformatics Group, School of Mathematics and Statistics)

Title: A bias correction method to identify over-represented gene-sets for boutique arrays

Abstract: Gene annotation databases such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are important tools in gene set enrichment tests (also known as GST) that describe genes in terms of their associated biological functions and pathways. The purpose of this type of enrichment analysis is to assign biologically meaningful terms to each gene. Associations between a gene set and biological functions of interest can then be established by considering statistically over-represented annotation terms. Traditionally this is done through Fisher’s Exact Test (FET), assuming gene expression arrays capture the complete genome or at least a very large proportion of it. However, this assumption is satisfied neither for the increasingly popular boutique arrays nor for custom designed gene expression profiling platforms. Specifically, the conventional enrichment analysis is no longer appropriate due to the gene set selection bias induced during the construction of the arrays. By introducing bias correction terms in the contingency table, we propose an adjustment to the traditional hypergeometric test statistic in FET. The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. In this paper we demonstrate a method to adjust the over-representation p-value on a grid in $\left[0,1\right]^2$. Using our own Shiny application, we will illustrate the advantages and practicality of the method through multiple differential gene expression analyses in melanoma and other cancers.
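
To make the background-size issue concrete, here is a minimal sketch of the conventional one-sided over-representation test based on the hypergeometric distribution. It is not the bias-corrected method proposed in the talk; the function name and all counts below are hypothetical values chosen for illustration.

```python
from math import comb


def over_representation_p(k, K, n, N):
    """Upper-tail hypergeometric p-value P(X >= k).

    N: genes in the background (universe)
    K: background genes annotated with the term
    n: genes in the list of interest (e.g. differentially expressed)
    k: genes in the list that carry the annotation
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


# Hypothetical example: 8 of 100 selected genes carry the annotation term.
# Whole-genome background: 200 of 20,000 genes are annotated.
p_genome = over_representation_p(k=8, K=200, n=100, N=20000)
# Boutique-array background: 150 of the 2,000 genes on the array are annotated.
p_array = over_representation_p(k=8, K=150, n=100, N=2000)

print(f"whole-genome background: p = {p_genome:.3g}")
print(f"array background:        p = {p_array:.3g}")
```

With these made-up counts, the same eight hits look strongly enriched against the whole-genome background but unremarkable against the array background, which is the kind of selection effect the abstract describes.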

About the speaker: I am currently a PhD candidate at the School of Mathematics and Statistics under the supervision of Prof. Jean Yang, A/Prof. Samuel Mueller and Dr. Garth Tarr. I am working in the area of statistical bioinformatics and I have strong interests in developing novel methods for high dimensional biomedical data. A central focus of my current research is the increasingly popular boutique array platform and its application both as a discovery and validation platform for biomarkers for patients in melanoma studies.

Monday May 15, 2017

Speaker: Fabian Held (The Life Lab, Charles Perkins Centre)

Title: Challenges of modelling (collaborative) networks

Abstract: With a constantly expanding body of scientific knowledge and expertise, collaboration is essential for the vast majority of research projects. However, we know very little about the complex interactions that affect the success and failure of collaborations, which may include collaborators’ personal attributes, their dynamics as a team, as well as the environment they’re working in. This is especially challenging when success and failure are not clearly defined. In this presentation Fabian will give an update about his progress in evaluating the Charles Perkins Centre’s effectiveness at “challenging prevailing dogmas, generating new ideas and translating knowledge into action” through facilitation of diverse collaborations. In particular, he will focus on his attempts at a statistical analysis of the network of collaborations that have emerged in the CPC, through the co-location of research groups and facilitation of project nodes. He will address conceptual, methodological, as well as technical issues of approaching this problem with exponential-family random graph models on the HPC cluster.

About the speaker: TBA

Monday May 8, 2017

Speaker: Dario Strbenac (Statistical Bioinformatics Group, SoMS, USyd)

Title: Design, Experimentation and Analysis of a Spike-in iTRAQ Proteomics Dataset Reveals Unexpected Aspects of Measurement Bias and Variance

Abstract: A replicated Latin squares experimental design was created to explore a variety of factors that influence the accuracy and precision of the measurements made by defining a set of 15 performance metrics. The experiment consisted of 21 non-yeast proteins which were spiked into a background yeast proteome in seven instrument runs of 8 samples labelled using the iTRAQ 8-plex kit. Importantly, the effect of the particular iTRAQ label used was greater than the effect of instrument run. Also, dividing the quantities of different proteins within the same run yielded reasonably accurate fold changes, providing a counter-example to the commonly accepted rule that measurements of different proteins can't be directly compared. Thirdly, the method of summarisation of peptides to a protein-level summary was found to have little effect. Finally, simple point-and-click normalisation using ProteinPilot resulted in better estimation of fold changes at the expense of increased variance and didn't perform substantially worse in any other performance metrics than methods like RUV or linear models, suggesting that commercial software can enable good quality analyses to be done quickly and accurately. The raw dataset is available from ProteomeXchange and allows anyone to apply their own normalisation method to it and upload the protein quantities to the web application and see how their method's performance compares to other methods.

Monday May 1, 2017

Speaker: Tim Burykin (The Life Lab, Charles Perkins Centre)

Title: Call for data: Exploration and visualization of complex datasets with a novel method

Abstract: As a member of professional staff, I'm helping academics at Charles Perkins Centre to visualize their data for presentation, teaching or research purposes. In the first part of the talk I will briefly demonstrate how my images and videos were used to support the narrative of high-impact presentations. I will then focus on the generic method behind these visuals and discuss its usefulness for the exploration and potentially for the in-depth analysis of complex datasets of almost any nature. The talk would be suitable for people who want to look at their data from a different angle or who are searching for a friendly yet comprehensive way to convey their work to the broader audience.

About the speaker: I received my Master of IT degree in Russia and moved to Sydney to complete a PhD course in Agriculture under the supervision of Prof. John Crawford. My project was concerned with three-dimensional modelling, analysis and visualization of soil microenvironment and leaf cellular structures. Accumulated experience in computer graphics and efficient algorithm development enabled me to join Judith & David Coffey LifeLab at Charles Perkins Centre as a data visualization technician.

Monday April 24, 2017

Speaker: Alistair Senior (School of Mathematics and Statistics, Charles Perkins Centre)

Title: Meta-analytic tools to detect overlooked variance effects in biological systems

Abstract: Medically, the effects of a treatment on among-individual variation in health have direct implications for personalized medicine. Ecologically, among-individual variation governs a species’ niche and is the grist of evolution by natural selection. However, experimental designs and analytical paradigms in biology are heavily focused on detecting the effects of treatments on population averages. As a result, we have a comparatively poor understanding of how environments and treatments affect among-individual variation. Over the last few years I have been developing tools for meta-analysis, which allow the user to combine the results of published studies to assess the effects of treatments on variation. These methods require only those summary statistics that are reported as a matter of standard practice, and integrate easily with commonly used meta-analytic software. I will present a summary of the methodology, as well as examples of its application that are pertinent to research goals of the Charles Perkins Centre.
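
As a rough illustration of how variation can be meta-analysed from routinely reported summary statistics, the sketch below computes a log variability ratio (lnVR) per study from group standard deviations and sample sizes, then pools the studies with inverse-variance weights. The abstract does not name the effect sizes covered in the talk, so the choice of lnVR, the fixed-effect pooling, the function names and the numbers are all assumptions for this example.

```python
from math import log


def log_variability_ratio(sd_treat, n_treat, sd_ctrl, n_ctrl):
    """Return (lnVR, sampling variance) for one study.

    lnVR > 0 means the treatment group is more variable than the control group.
    The 1 / (2 * (n - 1)) terms are the usual small-sample adjustments.
    """
    adj_treat = 1 / (2 * (n_treat - 1))
    adj_ctrl = 1 / (2 * (n_ctrl - 1))
    ln_vr = log(sd_treat / sd_ctrl) + adj_treat - adj_ctrl
    variance = adj_treat + adj_ctrl
    return ln_vr, variance


def fixed_effect_pool(effects):
    """Inverse-variance weighted mean of (estimate, variance) pairs."""
    weights = [1 / var for _, var in effects]
    pooled = sum(w * est for w, (est, _) in zip(weights, effects)) / sum(weights)
    return pooled, 1 / sum(weights)


# Hypothetical studies: (SD treatment, n treatment, SD control, n control).
studies = [(4.2, 20, 3.1, 20), (5.0, 35, 4.4, 30), (2.9, 12, 2.0, 15)]
effects = [log_variability_ratio(*s) for s in studies]
pooled, pooled_var = fixed_effect_pool(effects)
print(f"pooled lnVR = {pooled:.3f} (variance {pooled_var:.4f})")
```

In practice a random-effects model would usually be preferred over this fixed-effect pooling; the simpler version is used here only to keep the sketch short.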

About the speaker: I did my undergraduate and master's degrees in the UK, where my research was primarily directed towards questions in ecology and evolution. In 2010 I moved to the University of Otago to do a PhD on gene-environment interactions in determining phenotypic sex, with Shinichi Nakagawa. During this period, I developed an interest in the development and application of hierarchical statistics to questions in biology. After graduating, in early 2014 I moved to Sydney where I began working with Profs Simpson and Raubenheimer to apply my quantitative skills to questions in nutritional ecology.

Monday April 17, 2017 (Easter Monday)

Monday April 10, 2017

Speaker: Pengyi Yang (School of Mathematics and Statistics, Charles Perkins Centre)

Title: A dynamic multi-omic atlas of the transition from naive to primed pluripotency.

Abstract: Embryonic stem cells (ESCs) have the potential to generate virtually any differentiated cell types to establish new models of mammalian development and to create new sources of cells for treating an enormous range of diseases. To elucidate the molecular pathways underpinning the transition from naïve to primed pluripotency cell states, we quantified the dynamic changes in the proteome, phosphoproteome, transcriptome, and epigenome underpinning the transition between these cellular states with high temporal resolution. We observed widespread remodelling of the cell across all regulatory layers, and yet the rate, extent and magnitude of phosphorylation changes exceed those observed on other levels, emphasising a critical role for phosphorylation in this process. Our dynamic phosphoproteomics data reveal that ERK and mTOR signalling branches dominate early and late signalling network activity respectively during the ESC to EpiLC transition. Collectively these data provide insight into the molecular processes underlying naïve and primed states, highlighting numerous potential gatekeeper mechanisms governing ESC pluripotency.

About the speaker: I obtained my PhD in bioinformatics from School of Information Technologies, The University of Sydney, in 2012. I then moved to the United States and completed an interdisciplinary Research Fellowship in Systems Biology Group, ESCBL, at National Institutes of Health on characterising transcriptomic and epigenomic regulations in embryonic stem cells (ESCs) using ultrafast sequencing data. I relocated back to Australia in late 2015 on a University of Sydney Postdoctoral Fellowship (DVCR) to pursue my own research in systems biology. I’m now affiliated with School of Mathematics and Statistics (SoMS); and Charles Perkins Centre, The University of Sydney. I have been offered a Lectureship in USyd (April 2016) and a Discovery Early Career Researcher Award (DECRA).

Monday April 3, 2017 (Hunter Meeting)

Monday March 20, 2017

Speaker: Ellis Patrick (School of Mathematics and Statistics)

Title: Deconstructing the innate immune component of a molecular network of the aging frontal cortex.

Abstract: Alzheimer’s disease is pathologically characterized by the accumulation of neuritic β-amyloid plaques and neurofibrillary tangles in the brain and clinically associated with a loss of cognitive function. The dysfunction of microglia cells has been proposed as one of the many cellular mechanisms that can lead to an increase in Alzheimer’s disease pathology. Investigating the molecular underpinnings of microglia function could help isolate the causes of dysfunction while also providing context for broader gene expression changes already observed in mRNA profiles of the human cortex. In this talk I will lay out the various statistical approaches I have used to tackle this problem.

About the speaker: Dr Ellis Patrick is a computational biologist and applied statistician. He is currently an Early Career Development Fellow in the School of Mathematics and Statistics and a staff member at The Westmead Institute for Medical Research. He obtained his PhD in statistical bioinformatics in the School of Mathematics and Statistics at the University of Sydney. In his postdoctoral studies, he worked as a computational biologist with joint appointments at Brigham and Women's Hospital, Harvard Medical School and The Broad Institute of MIT and Harvard. He spent this time using his statistical background to investigate the molecular drivers of Alzheimer’s disease and MS. As he spends most of his time analysing large biomedical datasets, his research relies on the subtlety of translating between biological and statistical concepts to form simple, suitable and targeted statistical questions.

Seminars in 2016, Semester 2

Monday November 14, 2016

Speaker: Darya Vanichkina (Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Marvellous complexity: Exploring the mammalian transcriptome using RNA sequencing

Abstract: The complexity of the trillions of cells that comprise the mammalian body is underpinned not by their genomes, which are by definition identical, but by the temporally and physically precise expression of particular coding genes, long non-coding RNAs and small regulatory RNAs. In my talk, I will present some of the outcomes of my PhD research, which focussed on using and expanding upon developments in RNA sequencing technology and stem cell differentiation to deeply investigate transcripts in cortical and hindbrain-like neurons, and in oligodendrocyte precursor cells. I will also give an overview of my current work, which involves exploring the roles of alternative splicing in controlling gene expression, and the development of new methods of analysing splicing complexity.

About the speaker: I am a genomics data scientist at the Gene and Stem Cell Therapy Program at the Centenary Institute in Sydney, where I investigate how the mammalian genome works using next-generation sequencing. I use a combination of preexisting bioinformatics software and custom R, Python, and shell scripts to process terabytes of data on a daily basis, taking advantage of the University of Sydney's HPC facilities. I recently completed my PhD in Bioinformatics and Genomics at the Institute for Molecular Bioscience at the University of Queensland, under the supervision of Dr. Ryan Taft and Professor John Mattick. My work focused on using high-throughput sequencing to understand changes in the transcriptome that occur during neuronal functioning in normal cells and in disease, and on induced pluripotent stem cells as models of the human nervous system. For many years, I have been passionate about teaching, especially programming and bioinformatics to biologists, and was able to do a significant amount of this during my PhD studies. I am both a Software and Data Carpentry instructor. I also hold a Specialist Degree in Biochemistry with a Major in Molecular Biology from Lomonosov Moscow State University.

Monday November 7, 2016

Speaker: Weichang Yu (PhD candidate, SoMS, Usyd)

Title: Semisupervised quadratic discriminant analysis using model selection and variational Bayes

Abstract: We develop a mean field collapsed variational Bayes approximation for quadratic discriminant analysis (QDA) with model selection, where we allow missing class information in the training dataset and subsequent model selection. This allows the use of unlabelled data to build the classifier and identification of strong predictors. We demonstrate using simulated and real datasets that this leads to a reduction in prediction error even in cases where the within-class dispersion is large. We make two contributions: we present a computationally cheaper alternative to Markov chain Monte Carlo with comparable results for Bayesian inference for QDA, and a Bayesian framework for performing model selection in QDA.

About the speaker: I am a first-year PhD student at the University of Sydney working with Dr John Ormerod. My research interests include variational approximation, model selection and the use of predictive algorithms in medicine and bioinformatics.

Monday October 24, 2016 (ABACBS, No seminar)

Monday October 17, 2016

Speaker: Jason Wong (Group Leader, Bioinformatics and Integrative Genomics, UNSW)

Title: Gaining fundamental insights into DNA repair-chromatin interactions through cancer genomics

Abstract: Mutations form in the genome through the interplay of DNA lesion formation and incomplete DNA repair. With the advent of cancer genomics and particularly whole cancer genome sequencing, cancer somatic mutations provide us with a window through which we can look into how mutational and DNA repair processes function within human cells. In this talk, I will discuss how we have used whole cancer genome sequencing data to discover a novel biological process. Using publicly available data, we showed that transcription factor binding at active gene promoters can impair nucleotide excision repair (NER), thereby resulting in prevalent mutation hotspots at gene promoters in NER-dependent cancers such as skin and lung cancers. I will further discuss the implications of this biological process on cancer development and the impact of our study on the interpretation of functional mutations in cancer.

About the speaker: Dr Wong is an ARC Future Fellow at the Prince of Wales Clinical School, UNSW and leads the Bioinformatics and Integrative Genomics Team at the Lowy Cancer Research Centre. He received his B.Sc (Hons I) from the University of Sydney and was awarded a D.Phil in Bioanalytical Chemistry at the University of Oxford, UK in 2007. This was followed by a post-doctoral fellowship at University College Dublin, Ireland, before returning to Sydney to join UNSW. To date, he has published 65 peer reviewed journal articles with senior authorship in journals including Nature, Genome Biology, Molecular Cancer Research and Nucleic Acids Research. He has attracted over $2 million in research funding as lead investigator from the ARC, Cancer Australia and Cancer Institute NSW. His current research is focused on the study of mutational processes in cancer and their influence on gene regulation and function.

Monday September 19, 2016

Speaker: Fatemeh Vafaee (CPC, USyd)

Title: Determination of circulating microRNA markers of colorectal cancer prognosis by a novel network-based multi-objective optimisation routine

Abstract: Colorectal cancer presents a significant cause of cancer-related death and effective treatments that maximise quality of life as well as cancer-related outcomes are therefore of major importance. Determining the appropriate treatment pathway through a personalised medicine paradigm is a prime goal, and so biomarkers are sought to aid in the decision-making process. In the age of high-throughput technologies, molecular markers are particularly attractive as a means of achieving true personalisation of cancer treatment. We have recently evaluated the role of circulating microRNA as a means of predicting patients’ prognosis and developed an innovative multi-objective network-based optimisation method to identify robust microRNA signatures which are reliable in terms of predictive power and functional relevance. In this talk, I will go through the details of the proposed method. To identify potential collaboration opportunities with the audience, I will also give a concise and general overview of my research interests and projects.

About the speaker: Dr Fatemeh Vafaee received her PhD in Artificial Intelligence from the University of Illinois at Chicago in 2011. Her doctorate studies involved multiple projects in domains of optimisation, machine learning, data mining, pattern recognition, and probabilistic graphical models with the focus on theoretic and applied genetic algorithms as her PhD thesis. While pursuing her PhD, Fatemeh also collaborated with the University’s Computational Biology Laboratory and extended her research to biological applications such as cellular network alignment and phylogeny reconstruction. After her PhD, Fatemeh started a postdoctoral research position at the University of Toronto, Ontario Cancer Institute, one of the largest cancer research centres in Canada and worldwide. During her postdoc, Fatemeh had the privilege to work in a highly trans-disciplinary environment and collaborate with world-renowned scholars in integrative cancer informatics. In 2013, Fatemeh took a Research Fellow position at Charles Perkins Centre and School of Maths & Stats at the University of Sydney. Her research relies on a wide national and international collaboration network and she has published several papers in competitive peer-reviewed proceedings and top-tier journals such as Nature Methods, Scientific Reports, BMC Systems Biology, PLoS ONE and Alzheimer's & Dementia.

Monday September 12, 2016

Speaker: Chendong Ma (SoMS, USyd)

Title: Honours practice talk

Monday September 5, 2016

Speaker: Shila Ghazanfar (SoMS, USyd)

Title: Integrated Single Cell Data Analysis Reveals Cell-Specific Networks and Novel Coactivation Markers

Abstract: Large scale single cell transcriptome profiling has exploded in recent years and has enabled unprecedented insight into the behavior of individual cells. Identifying genes with high levels of expression using data from single cell RNA sequencing can be useful to characterize very active genes and cells in which this occurs. In particular single cell RNA-Seq allows for cell-specific characterization of high gene expression, as well as gene coexpression. We offer a versatile modeling framework to identify transcriptional states as well as structures of coactivation for different neuronal cell types across multiple datasets. We employed a gamma-normal mixture model to identify active gene expression across cells, and used these to characterize markers for olfactory sensory neuron cell maturity, and to build cell-specific coactivation networks. We found that combined analysis of multiple datasets results in more known maturity markers being identified, as well as pointing towards some novel genes that may be involved in neuronal maturation. We also observed that the cell-specific coactivation networks of mature neurons tended to have a higher centralization network measure than immature neurons. Integration of multiple datasets promises to bring about more statistical power to identify genes and patterns of interest. We found that transforming the data into active and inactive gene states allowed for more direct comparison of datasets, leading to identification of maturity marker genes and cell-specific network observations, taking into account the unique characteristics of single cell transcriptomics data.

About the speaker:

Monday August 29, 2016

Speaker: Kevin Wang (SoMS, USyd)

Title: An adjustment method for gene set over-representation in boutique arrays

Monday August 22, 2016

Speaker: Cali Willet (Sydney Informatics Core Research Facility, USyd)

Title: Bioinformatics services and training

Abstract: An overview of the bioinformatics services and training available through the Sydney Informatics Core Research Facility

About the speaker: Cali Willet is a bioinformatics technician for the Sydney Informatics Core Research Facility at the University of Sydney. She completed her PhD in animal genomics and computational biology in the Faculty of Veterinary Science at the same institution. She is interested in the genetics of disease, particularly in companion and endangered animals, and in the development of bioinformatics methodologies tailored for causal locus identification in non-model organisms. As a bioinformatician for the Core Research Facilities, she is focused on providing support to bioinformatics research groups in the form of consultation, training and advocating for the needs of bioinformatics and computational biology groups at the University of Sydney.

Monday August 15, 2016 (Cancelled)

Monday August 8, 2016

Speaker: Ulf Schmitz (Research Officer, Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Intron retention redefines post-transcriptional gene regulation in mammalian and vertebrate species

Abstract: Intron retention (IR) occurs when the splicing machinery fails to excise introns from primary transcripts. This may give rise to diverse downstream effects; most often, however, it induces nonsense-mediated decay (NMD) of the intron-retaining transcript. We performed a phylogenetic analysis of IR in human, mouse, dog, chicken, and zebrafish granulocytes. We found evidence that IR affects functionally related genes in granulocytes throughout evolution, many of which are orthologs. We also found a strong anti-correlation between the number of intron-retaining genes and the number of protein coding genes in a genome. Retained introns have similar characteristics in all investigated species (human, mouse, dog, chicken, zebrafish). They are shorter and have a higher GC content than their non-retaining counterparts; they often reside near the 3' end of a transcript and are enriched in premature termination codons. Their host genes harbour a larger number of miRNA binding sites in their 3' untranslated region and are often co-regulated in human and mouse. Our results suggest that IR is a global control mechanism affecting similar biological processes independent of specific effector genes. More importantly, we gained new insights that support the notion of IR as an independent mechanism of post-transcriptional gene regulation that supplements and maybe even cooperates with other forms of post-transcriptional gene regulation.

About the speaker: Ulf Schmitz is a post-doctoral researcher at the Centenary Institute in Sydney. His research focuses on the design of integrative workflows combining various computational disciplines with experimentation to approach molecular biological and medical problems. Between 2003 and 2015, Ulf Schmitz worked as a systems engineer and later as a bioinformatician at the Department of Systems Biology & Bioinformatics, University of Rostock, Germany. He was awarded his PhD in Bioinformatics in June 2015. Thereafter, he joined Prof John Rasko’s Gene and Stem Cell Therapy Program as a bioinformatics research officer. In January 2016, he was appointed as Conjoint Senior Lecturer at the Centenary Institute and the Sydney Medical School.

Monday August 1, 2016

Speaker: Dario Strbenac (Senior Research Associate, Statistical Bioinformatics Group, USyd)

Title: Interactive Benchmarking of Quantitative Proteomics Preprocessing Alternatives

Abstract: Mass spectrometry has long been used to analyse biological samples and find associations of altered proteins with experimental conditions. However, the focus of previous method evaluation efforts has been on the peptide amino acid sequence determination problem. Here, using a replicated Latin squares experimental design, the first comprehensive comparison of the effects of alternative preprocessing choices on the bias and variance of protein quantitation is made. Surprisingly, the variability between iTRAQ labels is larger than between different runs of the instrument. This has consequences for researchers who don't adequately incorporate randomisation and blocking in their proteomic experimental designs. Secondly, the default preprocessing done by the vendor software ProteinPilot outperforms more advanced methods, such as linear models and RUV, in terms of recovering the expected fold changes (bias). Thirdly, comparing the measurements of different proteins within a sample is shown to be feasible, which was previously assumed to be inaccurate and always avoided. Finally, a benchmarking Shiny application will be demonstrated, which allows users to upload their own preprocessing of the raw data, and see how their method compares to other methods in an interactive scoreboard.

Monday July 25, 2016

Speaker: Joshua Ho (Head, Bioinformatics and Systems Medicine Laboratory, VCCRI)

Title: A systems approach to study organ development and congenital disease

Abstract: A systems biology approach is now being widely employed to systematically study, through bioinformatics, how molecular and signalling pathways are regulated in organ development in humans and relevant animal models. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the genetic causes of congenital diseases that stem from dysregulation of GRNs during organ development. In this talk I will discuss the latest advances in GRN inference and analysis using large amounts of experimentally determined perturbation data, and how we can use GRNs to study organ development and congenital diseases.

About the speaker: Dr Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School (HMS), and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance. Joshua has published over 48 papers, including first-author publications in Nature, Science Signaling, and PLoS Genetics. He is also currently the Secretary of the Australian Bioinformatics and Computational Biology Society (ABACBS).

Seminars in 2016, Semester 1

Monday June 20, 2016

Speaker: Rima Chaudhuri (Metabolic Cybernetics Lab, CPC, USyd)

Title: Understanding the relationship between AKT recruitment and GLUT4 translocation to the plasma membrane in fat cells through single cell microscopy data analysis.


About the speaker: Dr. Chaudhuri was awarded her PhD in Bioinformatics from the University of Illinois, USA in 2010. Her doctoral thesis was on the discovery and design of drugs for the treatment of SARS coronavirus and Hepatitis C virus through computational modeling. While pursuing her doctorate degree, she worked as a researcher in software and pharmaceutical companies such as Blackbaud Inc. and Pfizer Inc. (USA), developing modules of scientific research software. Dr. Chaudhuri pursued her postdoctoral training in biophysical simulations at the Parc Cientific de Barcelona (PCB) as a joint affiliate between the Institute for Research in Biomedicine and the Barcelona Supercomputing Center in Barcelona. She holds two international patents in the field of drug discovery and design. After her year-long post-doc in Spain, she moved to Sydney in 2011 and joined the Garvan Institute of Medical Research in the laboratory of Prof. David E. James to work on systems biology based approaches to unravel the complexities behind the incidence of metabolic diseases such as diabetes and obesity. Her strength lies in interdisciplinary research and bridging the gap between computational and basic sciences. She is currently a research fellow at the Charles Perkins Centre at the University of Sydney. Her current research interests include isolating candidate bio-markers of T2D and obesity from molecular expression profiles, understanding and targeting protein-protein interactions in disease to facilitate a cure and integrating multi-dimensional data from different platforms (transcriptomics, proteomics, interactomics and metabolomics) to acquire a precise picture of the diseased cell.

Monday June 13, 2016 (Queen's Birthday)

Monday June 6, 2016

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Computing Image Similarity for Image-Derived Disease Models

Abstract: Imaging is a critical and indispensable component of modern healthcare. The automated analysis of medical images has a vast range of applications in evidence-based diagnosis, physician education, and biomedical research. These decision support applications are predicated on the ability to objectively compute the similarity of image content in a manner that matches the subjective similarity judgement of human domain experts. In this talk, I will present an overview of the conceptual challenges in this field before detailing my research on methods for characterising and comparing the visual content of images, including a graph-based method for comparing 3D PET-CT lung cancer images and my more recent work using convolutional neural networks.

About the speaker: Dr. Ashnil Kumar received his Ph.D. degree in information technology from the University of Sydney in 2013; his PhD introduced a new graph-based method for modelling the relationships between tumours and organs in medical images.

Monday May 30, 2016

Speaker: Vinita Deshpande (Metabolic Cybernetics Lab, CPC, USyd)

Title: Removing unwanted variation in large scale ‘omics datasets containing missing values

Abstract: Transcriptomics and proteomics are powerful techniques to obtain a comprehensive snapshot of biological systems ranging from cells to whole organisms. However, a major problem for such big datasets is the presence of missing values, as many statistical tools used to analyse these often require complete data. One such bioinformatic tool is RUV (Removing Unwanted Variation), a widely used R package developed to remove technical variation, such as batch effects, in order to normalise the data and perform downstream analyses such as differential expression analysis.

One of the solutions to overcoming this issue of missing data is to obtain a complete dataset, by either filtering the data to eliminate missing values or performing imputation. These approaches, however, can greatly reduce the sample size or biological variation, leading to loss of statistical power. The first part of this talk will describe an alternative approach in which the RUV algorithm was adapted to handle data with missing values as its input. The performance of this new algorithm was evaluated in terms of its ability to normalise and correctly identify differentially expressed genes/proteins in large ‘omics datasets containing varying amounts of simulated missing values. The second part of this talk will be a discussion on the future directions and challenges of this PhD project in terms of designing and conducting further quantitative analyses on large scale ‘omics data.

About the speaker: Vinita is a PhD student supervised by Prof David James and Prof Jean Yang at The University of Sydney, where she is pursuing her research interests in the application of systems biology and bioinformatic approaches to metabolic diseases. Vinita has previously completed a Bachelor of Science (Bioinformatics) / Bachelor of Information Technology (Computer Science) with Honours from The University of Sydney. Prior to commencing her PhD, she worked as a bioinformatics research assistant with Dr Joshua Ho in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute in Sydney.

Monday May 23, 2016

Speaker: Ashley Waardenberg (Children's Medical Research Institute; Sydney Medical School, USyd)

Title: Discovering Protein-Protein Interactions from DNA sequence - insights into the cardiac gene regulatory network and disease

Abstract: NKX2-5 is a key transcription factor (TF) required for normal heart development and is implicated in a range of cardiac diseases. It binds directly to DNA by recognising a specific sequence called the NKX2-5 binding element (NKE). However, until recently its genomic targets were poorly defined, and the NKX2-5 protein-protein interaction network remains poorly defined. Recently we identified genomic target regions for NKX2-5 and human disease relevant mutations in cultured HL-1 cardiomyocytes using the DamID method and identified new NKX2-5 disease mechanisms (Bouveret R, Waardenberg AJ, et al. eLife, 2015). This talk describes our efforts at the prediction and subsequent validation of novel protein-protein interactions (PPIs) based on recurrent binding sites (or motif grammar) through the application of machine learning algorithms.

About the speaker: Dr Ashley Waardenberg is currently a postdoctoral bioinformatician at the Children's Medical Research Institute, Westmead, where he is developing systems biology approaches for investigating proteomics and high throughput protein modification data related to the brain and associated diseases, in collaboration with Dr Mark Graham and Prof Phil Robinson. He received a PhD in Systems Biology (2012) under the supervision of A/Prof Christine Wells (now at the University of Glasgow, Scotland) and Dr Brian Dalrymple (CSIRO, Australia), where he developed a novel visualisation approach for viewing gene expression data specifically in the context of striated muscle contractile protein location. A key outcome of his PhD was the discovery of a new protein-protein interaction between PI3K and a muscle mechano-sensor in the heart, implicating the muscle contractile apparatus in responding to cardiac stress, which has broader implications in the context of PI3K cancer therapies (Waardenberg, et al. Journal of Biological Chemistry, 2011). During his PhD, he was also involved in the Bovine Genome Consortium, whose work was published in Science in 2009, and was a team recipient of the CSIRO Chairman's Medal in 2010 for contributions to this international effort. He then joined the Cardiac Developmental and Stem Cell Biology Laboratory of Professor Richard Harvey at the Victor Chang Cardiac Research Institute, Darlinghurst, as a Postdoctoral Scientist to gain a deeper insight into developmental biology, furthering an interest in understanding the origins of disease, where he implemented systems biology strategies for understanding genome-wide binding effects of the cardiac transcription factor NKX2-5 and NKX2-5 mutations relevant to congenital heart disease. This has resulted in a number of recent publications (Waardenberg AJ, Ramialison M et al. Cold Spring Harbor Perspectives in Medicine, 2014; Bouveret R, Waardenberg AJ et al. eLife, 2015; Waardenberg AJ et al. BMC Bioinformatics, 2015) and he continues to collaborate with the Victor Chang Cardiac Research Institute.

Ashley is also a founding member and Vice-President of the Australian Bioinformatics and Computational Biology Society (ABACBS). Ashley has been heavily involved in establishing this very young society and is passionate about establishing communities in this domain.

Monday May 16, 2016

Speaker: Timur Burykin (Judith and David Coffey LifeLab, CPC, USyd)

Title: Data visualization and exploration using particle dynamics simulation

Abstract: Exploration of complex multidimensional datasets is an ongoing challenge in many fields of research. In an attempt to simplify this task for people with no expertise in advanced statistics or programming, a novel method of data visualization was developed. The algorithm applies simple particle interaction rules to data points and allows them to self-organize into layouts that approximate the clustering of objects in the multidimensional space. A complementary density map, superimposed network connectivity and configurable node properties linked to extra dimensions make this visualization method suitable for a wide range of applications. A few datasets will be demonstrated in this presentation, including hospital admission records, a TF-TG interaction network and results of diet experiments. The extension of the algorithm to advanced image and network analysis will also be discussed.

About the speaker: Tim Burykin is an experienced C++ programmer who joined the Charles Perkins Centre last year as a data visualization technician and a member of the Judith & David Coffey LifeLab, supervised by Prof. Zdenka Kuncic. He received a master of IT degree in Russia and moved to Sydney to complete a PhD in Agriculture under the supervision of Prof. John W. Crawford.

Monday May 9, 2016

Speaker: Denis Bauer (Team Leader, Transformational Bioinformatics, CSIRO)

Title: VariantSpark: applying Spark-based machine learning methods to genomic information

Abstract: Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a Hadoop/Spark framework that utilises the machine learning library MLlib, thereby providing the means of parallelisation for population-scale bioinformatics tasks. VariantSpark offers an interface to the standard variant format (VCF) and seamless genome-wide sampling of variants, and provides a pipeline for visualising results. To demonstrate the capabilities of VariantSpark, we cluster more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach developed by the Global Alliance for Genomics and Health, ADAM, the comparable implementation using Hadoop/Mahout, as well as Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. These benefits of speed, resource consumption and scalability enable VariantSpark to open up the usage of advanced, efficient machine learning algorithms to genomic data.

About the speaker: Dr. Denis Bauer is the team leader of the transformational bioinformatics team in CSIRO’s ehealth program. Her expertise is in high-throughput genomic data analysis, computational genome engineering, as well as Spark/Hadoop and high-performance compute systems. She has a PhD in Bioinformatics and completed postdoctoral training in machine learning and in human genetics. Her collaborators include Prof Simon Foote on mammalian susceptibility to infectious diseases, Prof Ian Blair on molecular mechanisms of motor neuron disease, and Prof Rodney Scott on obesity-driven cancer. She has 23 peer-reviewed publications (9 first author, 4 senior author), with three in journals of IF>8 (e.g. Nat Genet.), and an H-index of 9. To date she has attracted more than AU$25 million in funding.

Monday May 2, 2016 (Florian and Falk farewell. No seminar.)

Monday April 25, 2016 (ANZAC Day)

Monday April 18, 2016

Speaker: Michael De Ridder (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: CereVA: A Visual Analytics Framework for Neurological Disorder Analysis with Functional Magnetic Resonance Imaging

Abstract: Functional Magnetic Resonance Imaging (fMRI) is an important imaging modality for understanding and diagnosing neurological disorders, such as schizophrenia, bipolar disorder and Alzheimer's disease. The modality temporally scans blood oxygenation as a proxy for neuronal activity. This activity is often processed into three components for analysis: (i) the anatomical context; (ii) individual voxel and region (group of voxels) time-series; and (iii) the correlation of activity between regions. While many statistical and graph theoretical approaches have been applied to the data, issues such as noise and a lack of understanding of the brain lead to a diverse range of challenges. Visualisation-based analytics is often used to overcome some of these challenges; however, current methods often present an oversimplification of the data. With CereVA, we integrate all three of the commonly derived activity components in a visual analytics framework comprising a full-scale pipeline that incorporates automatic image processing and interactive visualisation. Finally, we present a new application for fMRI visual analytics by applying CereVA to the active research area of classifying neurological disorders.

About the speaker: Michael de Ridder is a PhD student with The Institute of Biomedical Engineering and Technology (BMET) in the School of Information Technologies at the University of Sydney. He is supervised by A/Prof Jinman Kim. Michael's work straddles the boundary of scientific and information visualisation with a heavy influence from medical imaging techniques.

Monday April 11, 2016 (Hunter Meeting)

Monday April 4, 2016

Speaker: Taiyun Kim (Victor Chang Cardiac Research Institute, and UNSW)

Title: PAD: An interactive web portal for analysis of transcription factor co-binding at promoters and enhancers

Abstract: It has long been observed that transcription factors (TFs) bind to DNA collaboratively with other TFs as co-binding partners. Recently, through studying the genomic binding sites of the essential embryonic stem cell TF NF-Y, Dr Pengyi Yang has shown that the same TF may bind DNA with different co-binding partners if we consider TF binding sites that are proximal or distal to transcription start sites separately. Based on this observation, we have developed a database of binding sites of >200 TFs in mouse embryonic stem cells, and an interactive web portal that enables any user-submitted TF binding profiles to be clustered and visualised with our database TF profiles, at the proximal and distal regions separately. Our tool contributes to our understanding of how gene regulation occurs via combinatorial binding of TFs in different cell types.

About the speaker: Taiyun Kim is a 5th year student in the Bachelor of Engineering (Bioinformatics)/Masters of Biomedical Engineering program at the University of New South Wales. In 2015, he was awarded a Summer Scholarship to work in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute (VCCRI), under the supervision of Dr Joshua Ho (VCCRI) and Dr Pengyi Yang (University of Sydney).

Monday Feb 1, 2016

Speaker: Lei Sun (School of Information Engineering, Yangzhou University, China)

Title: Study on long noncoding RNAs using computational methods

Abstract: Tens of thousands of newly discovered long noncoding RNAs (lncRNAs) have attracted attention in the life sciences as their important biological functions are increasingly revealed. Due to the intrinsic complexity of lncRNA functions and mechanisms, our group proposes to study lncRNAs using a series of computational methods, which can improve research efficiency. In my talk, I would like to share some ideas on the results of lncRNA prediction using a support vector machine (SVM), and to discuss potential lncRNA-specific transcriptional patterns detected using computational methods.

About the speaker: Dr. Lei Sun received a doctor of engineering degree from China University of Mining and Technology in 2013. He is now a lecturer in the School of Information Engineering at Yangzhou University, P.R. China. His research interests include bioinformatics and signal and information processing. As a visiting PhD student, Dr. Sun previously carried out bioinformatics research at several institutes and universities, including the School of IT at The University of Sydney, the Institute for Molecular Bioscience (IMB) at the University of Queensland, and the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences.

Seminars in 2015, Semester 2

The seminars will be held at 1:00 pm on Mondays in the Access Grid Room, which is on level 8 of the Carslaw Building. The format of the talk is 30 to 45 minutes plus questions.

Monday Nov 30

Speaker: Shila Ghazanfar (SMS, Faculty of Sciences, The University of Sydney)

Title: Gene coexpression identification from single-cell expression experiments

Abstract: Classically, gene expression profiles have represented an aggregate of expression levels of each of the multitude of cells within the sample of interest. More recently, technologies utilising quantitative PCR, such as nanoString, enable measurement of expression in individual cells as opposed to an amalgamation of cells. As such, using this data along with appropriate statistical models, we can ask questions such as in what proportion of cells certain genes are expressed, and we can determine the distribution of coexpression of genes among these cells. In collaboration with Associate Professor David Lin at Cornell University, whose interest lies in investigating the olfactory system in mouse models, a set of neuronal cells was assayed with special interest in the protocadherin family of genes. We describe the statistical methods for processing the single-cell expression data and identifying coexpression of genes in subsets of the cell population, including mixture modelling and visualisation techniques for further insight.

About the speaker: Shila Ghazanfar is a 3rd year PhD student and Postgraduate Teaching Fellow in the School of Mathematics and Statistics at The University of Sydney. She is supervised by A/Prof Jean Yee Hwa Yang (The University of Sydney), Dr John Ormerod (The University of Sydney) and Dr Michael Buckley (CSIRO). Her research interests are in analysing high throughput sequence data such as RNA-Seq and Exome-Seq, and integrating different types of high-throughput data. She has previously completed a Bachelor of Science (Advanced Mathematics) and Honours in Statistics at the University of Sydney.

Monday Nov 23

Speaker: Cristian Leyton (Faculty of Health Sciences, The University of Sydney)

Title: Primary progressive aphasia and its challenges

Abstract: Primary progressive aphasia (PPA) comprises a group of neurodegenerative conditions that predominantly affect language function. As a result of the partial destruction of the language network, several clinical variants of PPA have been described, each of which has its own profile of linguistic deficits, distribution of brain atrophy, and molecular pathology. This unique group of conditions offers a natural paradigm to understand the neural basis of language processing and how neurodegeneration starts and spreads. I will explain the main challenges in the field and explore potential contributions of bioinformatics to it.

About the speaker: Dr Leyton worked as a clinical neurologist for four years before moving to Australia in 2009. He was awarded a PhD on progressive aphasias in 2013 at UNSW. In 2014, he was awarded a DVC University Postdoctoral Fellowship at the Faculty of Health Sciences, University of Sydney. His main interest is the study of aphasic manifestations caused by neurodegenerative diseases.

Monday Nov 16

Speaker: David Humphreys (Victor Chang Cardiac Research Institute)

Title: Obstacles and challenges in specialised RNA sequencing (RNA-seq) analysis

Abstract: In recent times RNA-seq has become an affordable method to profile the transcriptome of a biological sample. One of the strengths of RNA-Seq over other technologies is the ability to capture information at a nucleotide level using relatively unbiased methods without having any prior genomic information. These features have given rise to many RNA-Seq applications, and with these often arise new challenges in the bioinformatics analysis. In this presentation I will highlight the challenges (and some solutions) in small RNA-Seq, RNA editing and circular RNA analysis from high-throughput sequencing data.

About the speaker: Dr David Humphreys is a multidisciplinary wet-lab scientist/bioinformatician who manages the Genomics Core facility at the Victor Chang Cardiac Research Institute. His undergraduate training comprised a joint major in Biology and Computer Science before completing honours followed by a PhD in molecular biology. After joining the Victor Chang Cardiac Research Institute (VCCRI) he developed a research interest in gene regulation and the involvement of small non-coding RNAs. Since 2009 he has been heavily involved in studies utilising high-throughput sequencing technologies, which has allowed him to refocus his computer science skills. David has a number of active research collaborations with VCCRI faculty and St Vincent's Hospital cardiologists utilising RNA-Seq and exome sequencing.

Monday Nov 9

Speaker: James Burchfield (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Snapshots of diabetes

Abstract: The imaging of biological systems has become a fundamental tool in the cell biologist's arsenal. Our lab has utilised a range of imaging techniques to probe the insulin signalling network in single cells and has data that call into question the traditional view of signalling networks. Central to the continued success of this approach is the ability to extract relevant information from this large volume of high-content image data. Whilst the analytical pipelines for large-scale genomic and proteomic data have undergone a revolution in recent years, there has been a lack of development of similar tools to analyse data generated from imaging experiments. I will discuss some of the developments in this arena in the context of diabetes and insulin resistance.

About the speaker: James obtained his PhD from The University of Sydney and pursued a postdoctoral fellowship in David James' lab at the Garvan Institute of Medical Research. In 2014, James relocated with David James' lab to the Charles Perkins Centre at The University of Sydney. James is an expert in using high-performance microscopy for single-cell imaging.

Monday Nov 2

Speaker: Martin Wong (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Kinetic simulation of the Akt pathway in Insulin Signalling

Abstract: Traditional biological research pathways are often focused on discovery of novel protein-protein interactions. The temporal kinetics of signalling events is a feature that is not commonly investigated, but it may encode important information regarding the physical mechanisms underlying these interactions. The talk today will discuss how kinetic simulations can be used to infer these physical mechanisms. The talk will begin by discussing how models are constructed in terms of the rate equation used, the network topology and the parameter fitting procedure. The application of this will then be discussed in more detail in the context of insulin signalling and the Akt signalling pathway, where new insight has been obtained regarding the phosphorylation mechanism of Akt prior to the activation of its downstream substrates.

About the speaker: Martin comes from a very diverse background, having completed a Bachelor of Science majoring in Physics, and a Bachelor of Engineering in the Biomedical stream. He also completed an honours in engineering where he worked on developing a bioactive material for use in implanted devices. He is now a few months from completing his PhD under David James from the Metabolic Cybernetics Lab in the CPC, and Zdenka Kuncic from the Institute of Medical Physics at the School of Physics, where he is using mathematical modelling to interrogate the temporal aspects of insulin signalling.

Monday Oct 26

Speaker: Lake-Ee Quek (Coffey LifeLab, Charles Perkins Centre, The University of Sydney)

Title: Processing of metabolite data generated by mass spectrometry

Abstract: Metabolites are the chemical species transformed during metabolism. They are direct signatures of cellular state, and therefore easier to correlate with phenotype. With mass spectrometry, the appeal is the ability to rapidly measure thousands of metabolites from very small samples. The talk today will briefly introduce metabolomics, although the focus will be on data acquisition and processing, in the context of targeted metabolomics. Global metabolic profiling in an unbiased fashion is the ultimate aim in metabolomics, with the right analytics and bioinformatics.

About the speaker: Lake-Ee obtained his PhD in Cell Metabolism at The University of Queensland (UQ) in 2010. He was a UQ Postdoctoral Fellow from 2011 to 2014 and a Research Postdoc with A/Prof. Nigel Turner, Mitochondria Bioenergetics Lab, School of Medical Sciences, UNSW, from 2014 to 2015. He recently took up a Postdoctoral Fellowship with the Coffey LifeLab in the Charles Perkins Centre and relocated to The University of Sydney.

Seminars in 2015, Semester 1

The seminars will be held at 1:00 pm on Monday in Access Grid Room.

Monday June 22

Mark Greenaway (Stat, Sydney)

New tools in the R ecosystem

In the past few years, there has been an explosion of interest in data science. R has been at the forefront of this field, which has led to a lot of positive contributions to the R ecosystem from the wider tech community. Useful tools from the tech community which are available for R will be outlined, particularly focusing on GitHub, visualisation, the contributions of Hadley Wickham/RStudio, Spark and cloud computing.

Monday May 25

Paul Lin (Stat, UNSW)

Gene expression Changes in Human Rett Syndrome Brain

Rett Syndrome (RTT) is an X-linked neurodevelopmental disorder. It affects girls at a frequency of 1 out of 10000 live births. Our study is the first transcriptome-level analysis of post-mortem RTT brain tissue with age-matched controls. We have used two technologies, RNA-seq and microarray, to replicate our findings. We have taken tissue composition into consideration, which hasn't been done in previous RTT studies; we have found that tissue composition affects the outcomes of differential expression analysis. More than 95% of classic RTT cases are caused by sporadic mutations in the gene encoding methyl-CpG binding protein 2 (MeCP2). Initial studies have pointed out a transcriptional repressor role of MeCP2; our data is consistent with recent data and confirms that MeCP2 is a transcriptional activator. We have also shown that intergenic L1 expression increases in human RTT brain. Lastly, co-expression networks will be demonstrated to identify brain region-specific enhancer RNAs in the human brain. In this study, we have identified a set of Robust Brain-Expressed Enhancers (rBEEs). rBEEs are enriched for genetic variants associated with autism spectrum disorders (ASD).

Monday May 18

John Ormerod (Stat, Sydney)

Statistical Analysis of a Lupus Flow Cytometry Experiment

Results from an analysis of a flow cytometry-based observational study of patients with lupus will be presented. The data was collected and analysed over a five-year period at the Centenary Institute at the University of Sydney. The underlying biotechnology will be described, along with how the statistical complications associated with the data, including measurement error, missing values and outliers, were resolved.

Monday May 11

Rima Chaudhuri (CPC, Sydney)

In-silico differentiation between direct and indirect protein binding partners from a large MS-based protein interactome experiment

Protein-protein interactions (PPIs) are crucial in all cellular processes, primarily in understanding signalling cascades and protein functions. Affinity purification followed by mass spectrometry analysis (AP/MS) offers a powerful approach for the study of complex protein-protein interactions. However, such MS-based high-throughput screens are notorious for high false discovery rates (FDR). In addition, such screens do not allow differentiation between direct and indirect protein binders. In this study, we developed a scoring function that ranks putative binders based on their likelihood of being a direct binder using an array of features, including 3D protein structure information. We use the interactomes of the eIF4E and 4EBP1 proteins, implicated in insulin resistance, as case studies to elucidate the principles behind the development of this scoring function. Lastly, PhosphOrtholog, a web-based tool to map orthologous post-translational modification sites on proteins across species, is demonstrated.

Monday May 4

Euijoon Ahn (IT, Sydney)

Automated Melanoma Segmentation and Classification

The segmentation of skin lesions in dermoscopic images is considered one of the most important steps in computer-aided diagnosis (CAD) for automated melanoma diagnosis. Existing methods, however, have problems with over-segmentation and do not perform well when the contrast between the lesion and its surrounding skin is low. A new automated saliency-based skin lesion segmentation (SSLS) method is proposed, which is designed to exploit the inherent properties of dermoscopic images, which have a focal central region and subtle contrast discrimination with the surrounding regions. The proposed method was evaluated on a public dataset of lesional dermoscopic images and was compared to established methods for lesion segmentation that included adaptive thresholding, Chan-based level set and seeded region growing. Results show that SSLS outperformed the other methods in regard to accuracy and robustness, in particular for difficult cases. Superpixels are also introduced.

Monday April 27

Shila Ghazanfar (Stat, Sydney)

Integrative Analysis of Somatic Mutations with Focus on Biological Pathways

The development and severity of cancers depend on the somatic mutations occurring in the tissue. Technologies like whole exome and whole genome sequencing (WES/WGS) have allowed for interrogation of the somatic mutations taken on in a tumour compared to normal tissue in a patient. However, it is clear that some mutations are worse than others, leading to work on identifying genes harbouring "driver" mutations as opposed to "passenger" mutations. Further to this, there is work on elucidating the role these mutations play in the system as a whole, via integration of mutation, gene expression and network information (e.g. protein-protein interaction networks), as well as other data sources. In this seminar I will discuss my work to date on methods that aim to answer these questions, with a focus on the melanoma dataset.

Monday April 20

Ellis Patrick (Stat, Sydney)

Using Resampling to Fit Better Models

The weighted bootstrap is one of many procedures for evaluating the goodness of fit of a model. I would like to highlight how and why this work changed the way I thought about cross-validation and, most importantly, the practical impacts of using a weighted bootstrap for estimating LASSO penalty parameters. Diane Loo's work is highly relevant for anyone who has ever used cross-validation or ever plans to. I will use the prognostic melanoma data to highlight a few of the limitations of cross-validation. Time permitting, some of the issues we have faced when trying to explore and validate the weighted bootstrap will be explained. This work is currently being drafted for journal submission.

Monday April 13

Kevin Wang (Stat, Sydney) and Sarah Romanes (Stat, Sydney)

Data Exploration and Subtype Discovery and Prognosis Prediction

This talk considers how to find an appropriate measure of association between connected regions in resting-state brain fMRI datasets. Potential challenges of the project are noted and some exploratory analysis on a cleaned fMRI dataset is shown.