
Statistical Bioinformatics Seminar

Please visit the Sydney Precision Data Science Centre events page to sign up for the mailing list and check out upcoming seminars. This list remains only as a historical record.

The Statistical Bioinformatics Seminar is hosted jointly by the Sydney Precision Data Science Centre and by the Integrative Systems and Modelling Theme and the Judith and David Coffey Life Lab of the Charles Perkins Centre. The aim of this series is to provide a forum for people working in the broad area of computation and statistics, and their application to various aspects of biology, to present their work and showcase their ongoing projects. It is intended to foster the exchange of ideas and to build collaborations across multiple disciplines.

Seminars in 2023, Semester 1


Monday June 5 2023

Speaker: Professor Ben Hayes (University of Queensland)

Title: Can recent advances in machine learning help feed the world?

Abstract: Machine learning has recently enabled step changes in some fields, most notably in predicting the 3D structure of proteins and in generating text on very diverse topics. These applications use deep learning models trained on colossal data sets to make their predictions. Crop and livestock breeders are now using phenotypic, genomic and omic data sets with billions of data points to breed higher-yielding, better-adapted varieties and animals. Given this explosion of data, and the ability of machine learning to analyse large data sets, it seems worthwhile to explore how machine learning can assist crop and livestock breeders. In this presentation, the potential of machine learning to contribute to crop and livestock breeding tasks is assessed and compared with existing methods. The conclusion is that for some tasks machine learning could make a major contribution, while for other tasks existing methods outcompete machine learning methods.

About the speaker: Professor Hayes has extensive research experience in the genetic improvement of livestock, crop, pasture and aquaculture species, with a focus on integrating genomic information into breeding programs, including leading many large-scale projects that have successfully implemented genomic technologies in the livestock and cropping industries. He is the author of more than 300 journal papers, including in Nature Genetics, Nature Reviews Genetics and Science, contributing to statistical methodology for genomic, microbiome and metagenomic profile predictions, to quantitative genetics (including the genetic mechanisms underlying complex traits), and to the development of bioinformatics pipelines for sequence analysis. He was a Highly Cited Researcher from 2015 to 2022.

Monday May 29 2023

Speaker: Dr Xiaohang Fu and Dr Yingxin Lin (University of Sydney)

Title: Biologically-informed deep learning for segmentation of subcellular spatial transcriptomics data

Abstract: Recent advances in subcellular imaging transcriptomics platforms have enabled high-resolution spatial mapping of gene expression, while also introducing significant analytical challenges in accurately identifying cells and assigning transcripts to them. Existing methods struggle with cell segmentation, frequently producing fragmented cells or oversized cells that capture contaminated expression. To address this, we present BIDCell, a data-driven strategy that makes maximal use of relevant information, including single-cell transcriptomics data from public repositories. BIDCell leverages a self-supervised deep learning framework that incorporates cell type and morphology information through biologically-informed loss functions. Using a comprehensive evaluation framework of cell segmentation metrics in five complementary categories, we demonstrate that BIDCell outperforms other state-of-the-art methods on many metrics across a variety of tissue types and technology platforms. Our findings underscore the potential of BIDCell to substantially enhance single-cell spatial expression analyses, including analyses of cell-cell interactions, with great potential for biological discovery.

About the speakers:

Xiaohang Fu received her PhD in Computer Science from The University of Sydney in 2023, and her Bachelor of Engineering (Honours) in Biomedical Engineering, with First Class Honours, from The University of Auckland in 2018. She is currently a postdoctoral research fellow at the University of Sydney. Her research interests include deep learning and medical image classification, segmentation and analysis.

Yingxin Lin is a postdoctoral research associate at the University of Sydney. She completed her PhD in Statistics at the University of Sydney in 2022 and her Bachelor of Science (Honours) in Statistics at The University of Sydney in 2017. She is a member of the School of Mathematics and Statistics and the Sydney Precision Data Science Centre. Her research interests lie broadly in statistical modelling and machine learning for various omics, biomedical and clinical data.

Monday May 22 2023

Speaker: Professor Eduardo Eyras (ANU)

Title: Characterising the (epi)transcriptome at single-molecule resolution with long-read sequencing

Abstract: We describe our recent developments in the study of transcriptomes and epitranscriptomes using long-read sequencing. The epitranscriptome embodies many largely unexplored functions of RNA. A major roadblock in epitranscriptomics is the lack of transcriptome-wide methods that detect multiple RNA modifications, identify RNA modifications in individual molecules, and accurately estimate modification stoichiometry. We addressed these issues with CHEUI, a new method that processes signals from nanopore direct RNA sequencing to identify RNA modifications at single-molecule resolution in any sample. CHEUI outperforms other methods in detecting m6A and m5C sites and quantifying their stoichiometry, and further reveals a non-random co-occurrence of m6A and m5C in mRNA transcripts in cell lines and tissues. We also describe RISER, a method for in silico, biochemical-free enrichment or depletion of RNA classes in real time during direct RNA sequencing. RISER identifies RNA classes directly from the first few seconds of the signal, without basecalling or a reference, and communicates with the sequencing hardware in real time to enact biochemical-free targeted RNA sequencing. We illustrate RISER for the enrichment and depletion of coding and non-coding RNA, demonstrating a 3.4-3.6x enrichment and a 6.2-6.7x depletion of non-coding RNA in live sequencing experiments. RISER and CHEUI unlock novel ways to study transcriptomes and epitranscriptomes and enable discoveries and new applications across multiple fields.

About the speaker: Eduardo Eyras is an EMBL Australia Group Leader and Professor at the Australian National University (ANU), where he develops computational methods to study transcriptomes and epitranscriptomes, and their alterations in cancer, using long-read sequencing technologies. Before joining ANU, Eduardo Eyras worked at the Sanger Institute (2001-2004) and was a group leader at Pompeu Fabra University in Barcelona, Spain (2005-2019). During this time, he developed methods to annotate RNA alternative splicing in genomes, contributed to the landmark analyses of the human, mouse, rat, chicken and cow genomes, and led a research program on machine learning applied to RNA biology and cancer.

Monday May 15 2023

Speaker: Dr Siyuan Ma (Vanderbilt University)

Title: Modelling the Joint Distribution of Compositional Microbiome Data

Abstract: Microbiome epidemiology demands generative models of community profiles for study design considerations such as power analysis. We developed SparseDOSSA, a statistical model that parameterizes microbial communities and can be used to simulate new, realistic profiles to inform study designs. Our model connects zero-inflated marginals with a Gaussian copula, and has an additional renormalization component. As such, it uniquely satisfies common compositional, zero-inflation, and interaction properties of microbiome data. We demonstrate that SparseDOSSA accurately models human-associated microbiomes, and can generate realistic synthetic communities with prescribed population and ecological structures. We provide an open-source implementation for SparseDOSSA, which can be used in practice for power analysis and method benchmarking to inform microbiome study designs.
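To make the copula idea in this abstract concrete, here is a minimal Python sketch of one way to simulate a community profile: correlated Gaussian latent variables are mapped to uniforms, each taxon's uniform is passed through the quantile function of a zero-inflated marginal, and the result is renormalized to relative abundances. This is not the SparseDOSSA implementation; the log-normal form of the non-zero abundances, the function names and all parameter values are assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def cholesky(a):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(a)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(a[i][i] - s)
            else:
                lower[i][j] = (a[i][j] - s) / lower[j][j]
    return lower


def simulate_community(corr, zero_prob, log_mean, log_sd, rng):
    """Draw one relative-abundance profile (hypothetical parameterization).

    corr: latent Gaussian correlation matrix between taxa (the copula).
    zero_prob: per-taxon probability of a zero (zero inflation).
    log_mean, log_sd: assumed log-normal parameters for non-zero abundances.
    """
    std = NormalDist()
    lower = cholesky(corr)
    z = [rng.gauss(0.0, 1.0) for _ in corr]
    # Correlated latent normals x = L z.
    x = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(len(corr))]
    abundances = []
    for i, xi in enumerate(x):
        u = std.cdf(xi)  # Uniform(0, 1) marginal; dependence is kept across taxa.
        if u < zero_prob[i]:
            abundances.append(0.0)  # Zero-inflated part of the marginal.
        else:
            # Rescale the remaining probability mass and invert the log-normal CDF.
            v = (u - zero_prob[i]) / (1.0 - zero_prob[i])
            v = min(max(v, 1e-12), 1.0 - 1e-12)  # Keep inv_cdf inside (0, 1).
            abundances.append(math.exp(log_mean[i] + log_sd[i] * std.inv_cdf(v)))
    total = sum(abundances)
    # Renormalize absolute abundances to a composition that sums to 1.
    return [a / total for a in abundances] if total > 0 else abundances


rng = random.Random(0)
corr = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
print(simulate_community(corr, [0.0, 0.5, 0.5], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], rng))
```

Because the zero-inflation step uses the same uniform that carries the copula dependence, positively correlated taxa tend to be absent together as well as abundant together.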

About the speaker: Siyuan’s work focuses on statistical methods for modern molecular epidemiology applications. His methods research includes batch correction and meta-analysis, dimension reduction, high-dimensional conditional testing, and simulation models for power analysis. His application areas include the healthy and dysbiotic microbiome, cancer transcriptomics, and spatially resolved imaging proteomics. He obtained his Ph.D. in biostatistics from Harvard T.H. Chan School of Public Health and had postdoctoral training at the University of Pennsylvania.

Monday May 8 2023

Speaker: Dr Yue Cao (University of Sydney)

Title: Methods towards precision bioinformatics in single cell era

Abstract: Single-cell technology offers unprecedented insight into the molecular landscape of individual cells and is transforming precision medicine. Key to the effective use of single-cell data for understanding disease is the analysis of such information with bioinformatics methods. In her PhD thesis, Yue Cao examines and addresses several challenges in single-cell bioinformatics methods for precision medicine. First, the thesis discusses the challenges of single-cell bioinformatics and the recent success of deep learning and ensemble learning. It then introduces SimBench, a comprehensive framework for evaluating single-cell RNA-sequencing data simulation tools. It also presents scFeatures, an approach for creating interpretable molecular representations of individuals. Finally, the thesis applies scFeatures to multiple COVID-19 scRNA-seq datasets in a case study, demonstrating the impact of deep learning and ensemble learning on disease outcome prediction.

About the speaker: Yue Cao has recently completed her PhD in bioinformatics under the guidance of Prof. Jean Yang, A/Prof. Pengyi Yang and Dr. Shila Ghazanfar at the University of Sydney. She is currently a post-doctoral researcher at the Sydney Precision Data Science Centre at the University of Sydney. Her research interests revolve around the computational analysis of high-dimensional omics data in the era of precision medicine, with a particular focus on single-cell data.

Monday May 1 2023

Speaker: Dr Anna Cuomo (Garvan Institute)

Title: Characterising the effects of genetic variation on gene expression at single-cell resolution

Abstract: Single-cell RNA sequencing (scRNA-seq) is widely applied to assess cellular heterogeneity in human tissues and cell-based models. Technological advances and exponential reduction in cost have enabled the first population-scale scRNA-seq studies, which have assayed single-cell transcriptomes in hundreds of genetically diverse individuals. However, current workflows to analyse these data remain primarily based on principles for analysing conventional bulk expression quantitative trait locus (eQTL) studies, and hence fail to fully exploit complex scRNA-seq readouts. A critical limitation of existing approaches is the need to define discrete cell types for eQTL mapping a priori, which limits novel opportunities to chart continuous and unbiased landscapes of regulatory variants. To address this, we recently proposed CellRegMap, a statistical framework to map regulatory variants across the continuous manifold of cellular environments estimated from scRNA-seq. CellRegMap allows for testing and characterisation of genetic effects on gene expression at the resolution of individual cells while flexibly sharing statistical strength related to cell states. Our framework provides a principled strategy to identify and characterise heterogeneous genetic effects that vary across cell states and cell types. Here, I will describe applications of CellRegMap on both real and simulated datasets, and describe future challenges in the broader field of single-cell population genetics.

About the speaker: Anna completed her undergraduate studies (BSc) in Applied Mathematics at Milano Polytechnic (Italy) and an MSc in bioinformatics at Delft University of Technology (Netherlands). She obtained her PhD at the University of Cambridge and EMBL-EBI (UK), co-supervised by Dr. Oliver Stegle and Dr. John Marioni. After a brief bridging postdoc at the Sanger Institute with Prof. Nicole Soranzo, she started her current role as an EMBO Postdoctoral Fellow at the Garvan Institute in Sydney (Australia), in the labs of Joseph Powell and Daniel MacArthur. Her main interests and expertise lie at the intersection of single-cell genomics and human genetics, and in the development of statistical models that link DNA variants to single-cell expression profiles.

Monday April 17 2023

Speaker: Dr Maggie Stanislawski (University of Colorado)

Title: The gut microbiota and weight loss: Results from a weight loss intervention of daily caloric restriction versus intermittent fasting

Abstract: Altered gut microbiota has been linked to obesity and may influence weight loss. In this talk, I will present findings from a study examining the gut microbiota of participants in a weight loss intervention trial of daily caloric restriction (DCR) versus intermittent fasting (IMF). After three months of the intervention, participants experienced significant improvements in clinical health measures, along with altered composition and diversity of their fecal microbiota. I will discuss associations between gut microbiota characteristics and changes in clinical measures, such as weight and waist circumference, during the intervention. These initial analyses also show differences between DCR and IMF participants in the relative abundance of the genus Akkermansia in response to the interventions. I will also discuss ongoing work to extend these results to data from the full year-long intervention and from 6 months post-intervention, and to understand the roles of metabolomics, host genetics and DNA methylation. Our results provide insight into how omic profiles respond to a weight loss intervention and how they relate to weight loss responsiveness.

About the speaker: Maggie Stanislawski is a molecular epidemiologist at the University of Colorado School of Medicine, and her work aims to understand the role of the gut microbiome and related molecular profiles in health and disease, specifically obesity, cardiometabolic disease and inflammatory conditions. Before completing her PhD in Epidemiology at the Colorado School of Public Health, Dr. Stanislawski completed her bachelor’s degree in Mathematics at Pomona College and a master’s degree in Statistics at Colorado State University.

Monday April 3 2023

Speaker: Professor Melanie Bahlo (The Walter and Eliza Hall Institute of Medical Research)

Title: Multi-Omics and AI approaches for the study of retinal diseases

Abstract: The retina, at the back of the eye, is a highly specialized human tissue with enormous energy demands and a unique metabolism that is still poorly understood. The human eye is also unique in the tree of life, with special adaptations and spatial landmarks observed only in humans and higher primates. Retinal diseases often correspond to such landmarks; for example, age-related macular degeneration and macular telangiectasia type 2 (MacTel) both affect the macula, a particular part of the retina. In this talk I will summarise our journey into the area of retinal diseases, which has helped us to understand MacTel. I will also cover current work on the subset of the UK Biobank cohort with OCT imaging, where we are using our insights to inform retinal biology and disease.

About the speaker: Professor Bahlo is the Theme Leader of the “Healthy Development and Ageing” theme at The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia, overseeing the scientific strategy for three divisions, including the Population Health and Immunity division, which she co-established in 2015. A bioinformatician and statistical geneticist with over 20 years’ experience, Professor Bahlo aims to understand the genetic basis of human diseases, with a focus on neurological and retinal disorders including epilepsy, ataxia, Parkinson’s disease, macular telangiectasia type 2 (MacTel) and age-related macular degeneration (AMD). Her lab has developed novel analysis methods and software, particularly for identity-by-descent analysis and repeat expansions. The lab also enjoys working on large cohorts with multi-omic data and is increasingly using AI-enabled phenotypes to identify biological mechanisms. This work has identified the role of many genes in disease, improved understanding of genetic pathways, and provided genetic diagnoses for many patients.

Monday March 27 2023

Speaker: Dr Zhana Duren (Clemson University)

Title: Continuous lifelong learning for modelling of gene regulation from single cell multiome data by leveraging atlas-scale external data

Abstract: Inferring context-specific gene regulatory networks (GRNs) from genomics data is a crucial task in computational biology. However, the accuracy of inferred GRNs is often low owing to the limitations of current methods. We developed a method called scPECA, which infers gene regulation from single-cell Paired gene Expression and Chromatin Accessibility data measured in the same cell. We also propose a metric called the "pioneer index", which aims to improve the accuracy of GRN inference and the interpretability of the model by providing a quantitative measure of the ability of transcription factors (TFs) to initiate chromatin remodeling. The scPECA method achieved three times higher accuracy than currently available GRN inference methods when ChIP-seq data were used as the ground truth. We found that disease genes from both differential expression analysis and genome-wide association studies (GWAS) were enriched among the target genes of TFs with high pioneer index scores.

About the speaker: Dr Zhana Duren earned his BS in Mathematics and Applied Mathematics from Beihang University (China) in 2012, and his PhD in Operational Research and Cybernetics from the Academy of Mathematics and Systems Science, Chinese Academy of Sciences. From 2015 to 2020, he worked in Professor Wing Hung Wong’s lab at Stanford University as a visiting PhD student (2015-2017) and a postdoctoral research fellow (2017-2020). He is currently an Assistant Professor in the Department of Genetics and Biochemistry at Clemson University.

Monday March 20 2023

Speaker: Dr Martin Jinye Zhang (Harvard School of Public Health)

Title: Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data

Abstract: Single-cell RNA-sequencing (scRNA-seq) provides unique insights into the pathology and cellular origin of disease. We introduce scDRS, an approach that links scRNA-seq with polygenic disease risk at single-cell resolution, independent of annotated cell types. scDRS identifies cells exhibiting excess expression across disease-associated genes implicated by genome-wide association studies (GWAS). We applied scDRS to 74 diseases/traits and 1.3M single-cell gene-expression profiles across 31 tissues/organs. Cell-type-level results broadly recapitulated known cell-type-disease associations. Individual-cell-level results identified subpopulations of disease-associated cells not captured by existing cell-type labels, including T cell subpopulations associated with inflammatory bowel disease, partially characterized by their effector-like states; neuron subpopulations associated with schizophrenia, partially characterized by their spatial locations; and hepatocyte subpopulations associated with triglyceride levels, partially characterized by their higher ploidy levels. Genes whose expression was correlated with the scDRS score across cells (reflecting co-expression with GWAS disease-associated genes) were strongly enriched for gold-standard drug target and Mendelian disease genes.
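The idea of scoring each cell by its excess expression of a GWAS-implicated gene set can be sketched as follows: average the normalized expression of the disease genes in a cell and compare it with averages over randomly drawn control gene sets of the same size. This is a simplified illustration, not the scDRS algorithm, which differs in how control genes are chosen and how genes are weighted; the function name, the input layout and the uniform random choice of control genes are assumptions.

```python
import random
from statistics import mean


def gene_set_scores(expr, disease_genes, n_control=100, seed=0):
    """Per-cell excess expression of a disease gene set over random control sets.

    expr: dict mapping gene name -> list of normalized expression values, one per cell.
    disease_genes: list of GWAS-implicated gene names (all must be keys of expr).
    Returns a list of (excess_score, empirical_p_value) tuples, one per cell.
    """
    rng = random.Random(seed)
    disease_set = set(disease_genes)
    background = [g for g in expr if g not in disease_set]
    n_cells = len(next(iter(expr.values())))
    k = len(disease_genes)
    # Control gene sets of the same size, drawn uniformly from the background genes.
    control_sets = [rng.sample(background, k) for _ in range(n_control)]

    def set_mean(genes, cell):
        return mean(expr[g][cell] for g in genes)

    results = []
    for cell in range(n_cells):
        observed = set_mean(disease_genes, cell)
        controls = [set_mean(s, cell) for s in control_sets]
        # Empirical p-value: share of control sets scoring at least as high (+1 correction).
        p_value = (1 + sum(c >= observed for c in controls)) / (1 + n_control)
        results.append((observed - mean(controls), p_value))
    return results
```

The empirical p-value uses the usual +1 correction, so with 100 control sets the smallest attainable value is 1/101.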

About the speaker: Dr Martin Jinye Zhang is a research associate at Harvard School of Public Health, advised by Prof. Alkes Price. He obtained a PhD in Electrical Engineering from Stanford University, advised by Prof. David Tse and Prof. James Zou. He is a 2021 ASHG Epstein postdoc award semifinalist, and his work was recognised among the 2020 Top 50 Life and Biological Sciences Articles in Nature Communications and with the 2019 RECOMB best paper award. His research focuses on the development of statistical methods that integrate GWAS and functional genomics to uncover the genetic basis of human disease. Areas of interest include functional components of heritability, disease-critical cellular contexts, and causal inference approaches to identify disease genes and proteins.

Monday March 13 2023

Speaker: Dr Xiuwei Zhang (Georgia Institute of Technology)

Title: Single cell multi-omics data integration and evaluation

Abstract: Single-cell data integration methods aim to integrate cells across data batches and modalities, and data integration tasks can be categorized into horizontal, vertical, diagonal and mosaic integration. Although many methods have been developed for data integration, some scenarios require special care, for example when the datasets come from different patients under varying medical conditions. In addition, quantitatively evaluating the performance of computational methods in single-cell genomics has been a challenge due to the lack of ground truth information. I will present scDisInFact, a method that integrates scRNA-seq data from different patients while disentangling batch effects from the biological variation across batches that is associated with patient conditions, and scMultiSim, a simulator for multi-modality single-cell data that can be used to benchmark a range of computational methods, including data integration methods.

About the speaker: Xiuwei Zhang is an Assistant Professor and a J.Z. Liang Early Career Professor at the School of Computational Science & Engineering at Georgia Tech. She obtained her PhD in computer science from EPFL (École Polytechnique Fédérale de Lausanne) in Switzerland, and conducted postdoctoral research at Cambridge, UK and UC Berkeley. Her research focuses on developing methods to analyze single cell genomics data, including methods to study cell temporal dynamics, to perform data integration and to infer molecular interactions. She is a recipient of an NSF CAREER Award and the NIH Maximizing Investigators’ Research Award.

Monday Feb 27 2023

Speaker: Dr. Katarina Stuart (University of Auckland)

Title: A whole genome perspective on genetic variation and rapid adaptation

Abstract: Evolutionary theory tells us that the immense diversity on this planet arises through a complex combination of factors, in which genetic variation plays a central role. It is on this genetic variation that selection may act, enabling adaptation. Ongoing developments in sequencing and analytical tools are enabling more comprehensive characterization of the diverse components of genomic variation present in the natural world. Invasive species are often used as eco-evolutionary models in such studies, as the rapid evolutionary shifts they undergo after introduction allow the study of how molecular mechanisms and genomic variation underlie adaptive processes. Among these invasive species is the globally invasive European (or common) starling. Its global success presents an invaluable opportunity to test predictions about how different aspects of genetic variation may influence a population’s ability to rapidly adapt to a novel environment. My research covers a range of genomic approaches, incorporating single nucleotide polymorphisms, structural variants, transposable elements and museum genomics, as well as environmental and phenotypic data, to better understand the mechanisms and patterns of rapid adaptation within this species. More broadly, this evolutionary research into the starling provides an important perspective on the role of rapid evolution in the persistence of invasive species, and on how different aspects of genetic variation may contribute to population resilience under a shifting climate.

About the speaker: Katarina Stuart’s research interests are in evolutionary genomics and in exploring how local adaptation is facilitated in invasive populations. She completed her PhD in 2022 at the University of New South Wales, studying the genomic mechanisms underpinning rapid adaptation in the globally invasive European starling. Katarina is now a Research Fellow at the University of Auckland, where she is extending her research to focus primarily on the role of transposable elements in rapid adaptation, using both the starling and the common myna as model systems.

Seminars in 2022, Semester 2


Monday Nov 21 2022

Speaker: Dr. Lianrong Pu (Tel Aviv University)

Title: Graph Algorithms and Machine Learning for Deep Sequencing Data Analysis

Abstract: High-throughput sequencing techniques generate large volumes of DNA sequencing data at ultra-fast speed and extremely low cost. Efficient data structures and algorithms are needed to handle the large datasets these techniques produce, and many have been developed in recent decades. This talk will introduce how graph algorithms and machine learning methods can be used to analyse deep sequencing data, focusing on a three-way classifier for metagenome assemblies. Viruses and plasmids are part of microbial communities and play a major role in disease and in antibiotic resistance. In metagenome sequence assembly, identifying virus and plasmid contigs is hard, since they tend to form shorter contigs and are overwhelmed by a much larger mass of bacterial contigs. 3CAC is a new classifier that builds on machine-learning-based classifiers and exploits the structure of the assembly graph to classify contigs as bacterial, viral, plasmidic or unknown. In simulated and real metagenomes of short and long reads, 3CAC outperformed state-of-the-art algorithms.
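One simple way an assembly graph can refine contig labels is to let each contig left as "unknown" by a sequence-based classifier take the majority label of its already-classified neighbours, repeating for a few rounds. The Python sketch below shows that idea; it is an illustrative assumption, not a description of the 3CAC algorithm, and the function name, label strings and tie rule are hypothetical.

```python
from collections import Counter


def refine_with_graph(labels, edges, rounds=3):
    """Propagate labels to 'unknown' contigs from their assembly-graph neighbours.

    labels: dict contig -> 'bacterium' | 'virus' | 'plasmid' | 'unknown'.
    edges: list of (contig_a, contig_b) adjacencies in the assembly graph.
    Returns a new dict; the input labels are not modified.
    """
    neighbours = {contig: set() for contig in labels}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    current = dict(labels)
    for _ in range(rounds):
        updated = dict(current)
        for contig, label in current.items():
            if label != "unknown":
                continue  # Keep labels that were already assigned.
            votes = Counter(
                current[n] for n in neighbours[contig] if current[n] != "unknown"
            )
            top = votes.most_common(2)
            # Assign only when a single class has strictly the most votes.
            if len(top) == 1 or (len(top) == 2 and top[0][1] > top[1][1]):
                updated[contig] = top[0][0]
        if updated == current:
            break  # Nothing changed, so further rounds would not help.
        current = updated
    return current
```

Ties are deliberately left unresolved, so a contig joined to both a plasmid and a bacterial contig stays "unknown" rather than being assigned arbitrarily.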

About the speaker: Lianrong Pu is a Postdoctoral Researcher in Prof. Ron Shamir’s lab at Tel Aviv University. She received her PhD from Shandong University in 2018, and was also a visiting PhD student at UC San Diego. Her PhD thesis received the ACM SIGBIO Doctoral Dissertation Award in 2019. Her research interests are algorithm development in computational genomics and computational theory.

Monday Nov 14 2022

Speaker: Prof. Dr. Constantin Pape (University of Göttingen, Germany)

Title: Spatial Transcriptomics meets Microscopy: Image Analysis and Visualization

Abstract: Morphology, gene expression and cellular dynamics are three fundamental observable modalities used to study cells. Recent advances in spatial transcriptomics and microscopy enable joint recording of all three and promise to further our understanding of cellular processes. However, this requires the analysis of large and diverse datasets - a challenging task, especially due to the different foundational methods across communities with expertise in sequence and image analysis. Challenges and solutions to image analysis and visualization for joint transcriptomics and microscopy will be discussed, based on a collaborative project to map the transcriptome during early mouse embryo development.

About the speaker: Constantin completed a Bachelor of Science in 2013 at Ruprecht-Karls University, followed by a Master of Science in 2016 and a PhD in 2021 at the same university, majoring in physics for all three degrees. Earlier this year, he founded the Computational Cell Analytics research group, which he leads. His research focuses on deep learning methods for computer vision, particularly segmentation, applied primarily to microscopy images. He is passionate about open-source software and open science and actively contributes to several related efforts such as bioimage.io and modelzoo.

Monday Nov 7 2022

Speaker: Associate Professor Andreas Mund (University of Copenhagen, Denmark)

Title: Single Cell and Deep Visual Proteomics for Spatial Molecular Profiling in Tissue

Abstract: Understanding intertumoral heterogeneity in cancer biology is key to improving the diagnosis and treatment of specific cancer subtypes. Single-cell RNA sequencing has revealed novel molecular regulators associated with tumour growth, metastasis and drug resistance. However, at the protein level, arguably the closest proxy for biological function or dysfunction, single-cell variations are particularly challenging to study owing to the limited sensitivity and robustness of current instrumentation. Combining visual information with the molecular phenotype using antibody-based bioimaging, together with unbiased characterization of proteomes that integrates single-cell and spatial data, remains elusive. Here we combine artificial intelligence (AI)-powered image-based analysis of cellular phenotypes with ultra-high-sensitivity (single-cell) mass spectrometry (MS)-based proteomics. This concept, called Deep Visual Proteomics (DVP), ties together the visual information defining cellular identity and heterogeneity with cellular neighbourhoods and the underlying proteomic signatures in an unbiased, system-wide way. Applied to biobank tissue samples of melanoma, DVP captured the spatial proteome during disease progression from normal melanocytes, through pre-cancerous in situ lesions, to fully invasive melanoma. We described key pathways dysregulated during cancer progression, e.g. a metabolic switch and RNA splicing. This highlights the ability of DVP to retain the spatial information of protein expression in the tissue context, which is essential for a true mechanistic understanding of tissue organization, development and pathogenesis.

About the speaker: Andreas Mund is an associate professor in the clinical proteomics group of Professor Matthias Mann at the Novo Nordisk Foundation Center for Protein Research, University of Copenhagen. He has a dual education profile with a degree in biotechnology engineering from the Anhalt University of Applied Sciences and a PhD in protein biochemistry from the University of Hamburg. His research focuses on the characterization of single cell identity and heterogeneity in tissue biobank samples by a combination of high parametric imaging, artificial intelligence, and ultrahigh sensitive proteomics.

Monday Oct 31 2022

Speaker: Professor Melissa Davis (Walter and Eliza Hall Institute)

Title: Analysis of spatial transcriptomics data at both region-based and single-cell resolution

Abstract: Spatial transcriptomics provides us with a strategy to tie transcriptional measurements to specific locations in tissues, at both regional and single-cell resolution. Further, the new single-cell spatial transcriptomics platforms that use image-based detection are producing data unlike anything we have seen from traditional sequencing-based technologies. I will present our analysis of two region-based COVID-19 datasets, as well as our initial analysis of both Nanostring CosMX and 10x Xenium datasets, and discuss the bioinformatics and statistical challenges presented by these different platforms.

About the speaker: Professor Melissa Davis is a computational biologist and group leader at the Walter and Eliza Hall Institute, where she leads a highly multidisciplinary research team focused on computational research in cancer progression and plasticity. She is also Joint Division Head of the WEHI Bioinformatics Division, a research division of more than 60 bioinformaticians, biostatisticians and computational scientists, as well as research higher degree students and visiting scientists, across seven research labs and a core Bioinformatics Facility. Over the last six years she has also collaborated extensively with the Co-operative Research Centre for Cancer Therapeutics, directly supporting commercial collaborations with large international pharmaceutical companies. She has recently been appointed Professor in the Faculty of Medicine at the University of Adelaide, and in 2023 will take up an appointment as Program Leader in Computational Systems Oncology at the newly established South Australian ImmunoGenomics Cancer Institute (SAiGENCI), as part of the senior leadership team recruited to establish Australia’s first new medical research institute in over a decade. Melissa is recognised nationally and internationally for her work in computational biology and network analysis, knowledge engineering, and the analysis of cancer plasticity. Her recent work developing methods for newly emerging spatial molecular measurement platforms has already resulted in substantial international recognition and collaborations with research labs and companies worldwide.

Monday Oct 24 2022

Speaker: Ms Luli Zou (Harvard University)

Title: Detection of allele-specific expression in spatial transcriptomics with spASE

Abstract: Allele-specific expression (ASE), or the preferential expression of one allele, can be observed in transcriptomics data from early development throughout the lifespan. However, the prevalence of spatial and cell type-specific ASE variation remains unclear. Spatial transcriptomics technologies permit the study of spatial ASE patterns genome-wide at near-single-cell resolution. However, the data are highly sparse, and confounding between cell type and spatial location presents further statistical challenges. Here, we introduce spASE, a computational framework for detecting spatial patterns in ASE within and across cell types from spatial transcriptomics data. To tackle the low signal-to-noise ratio caused by the sparsity of the data, we implement a spatial smoothing approach that greatly improves statistical power. We generated Slide-seqV2 data from the mouse hippocampus and detected ASE in X-chromosome genes, both within and across cell types, validating our ability to recover known ASE patterns. We demonstrate that our method can also identify cell type-specific effects, which we find can explain the majority of the spatial signal for autosomal genes. The findings facilitated by our method provide new insight into the uncharacterized landscape of spatial and cell type-specific ASE in the mouse hippocampus.
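To make the idea of spatial smoothing concrete, the Python sketch below pools allele counts from neighbouring spots with a Gaussian kernel before computing a reference-allele fraction, so that sparse spots borrow strength from their neighbours. It is a generic kernel smoother written for illustration, not the model implemented in spASE; the function name `smoothed_allele_ratio` and the `bandwidth` parameter are hypothetical.

```python
import numpy as np

def smoothed_allele_ratio(coords, ref_counts, total_counts, bandwidth):
    """Gaussian-kernel smoothed reference-allele fraction at each spot.

    Illustrative sketch only (not spASE's model).
    coords: (n, 2) spot coordinates
    ref_counts, total_counts: length-n reference-allele and total counts
    bandwidth: hypothetical kernel width, in the same units as coords
    """
    coords = np.asarray(coords, dtype=float)
    ref = np.asarray(ref_counts, dtype=float)
    tot = np.asarray(total_counts, dtype=float)
    # Pairwise squared distances between spots
    diff = coords[:, None, :] - coords[None, :, :]
    d2 = np.sum(diff ** 2, axis=-1)
    # Gaussian weights: nearby spots contribute more
    w = np.exp(-d2 / (2.0 * bandwidth ** 2))
    # Pool counts from neighbours *before* taking the ratio
    smoothed_ref = w @ ref
    smoothed_tot = w @ tot
    # Spots with no pooled coverage get NaN instead of a division error
    return np.divide(smoothed_ref, smoothed_tot,
                     out=np.full_like(smoothed_ref, np.nan),
                     where=smoothed_tot > 0)
```

Larger bandwidths trade spatial resolution for lower variance; a formal test for spatial ASE would still need a statistical model built on top of estimates like these.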

About the speaker: Luli Zou is a fifth-year PhD student in Biostatistics at Harvard. She is co-advised by Martin Aryee and Rafael Irizarry. Her research focuses on the development of statistical and computational methods for spatial genomic data, including spatial transcriptomics and 3D genome data.

Monday Oct 17 2022

Speaker: Associate Professor Rui Wang (Harvard University)

Title: Design and analysis of stepped-wedge cluster randomized trials in the presence of exposure-time specific treatment effect heterogeneity

Abstract: A stepped-wedge cluster randomized trial is a unidirectional crossover study in which the timing of treatment initiation for each cluster is randomized. Because clusters initiate treatment at different times, an emerging question is whether the treatment effect depends on the exposure time, namely, the time elapsed since the initiation of treatment. In this talk, I will discuss several modeling approaches for assessing treatment effect heterogeneity across exposure times, including existing approaches that either assume a parametric functional form of exposure time or model exposure time as a categorical variable, and a new model formulation that includes a random effect to represent varying treatment effects by exposure time. We consider both testing the null hypothesis of a constant treatment effect across exposure times and estimating an average treatment effect as well as exposure-time specific treatment effects. In addition, we derive a variance formula to facilitate the design of stepped-wedge cluster randomized trials with heterogeneous exposure-time specific treatment effects.
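Exposure time is easiest to see on a small design. The sketch below, using the hypothetical helper `exposure_time_matrix`, tabulates exposure time for every cluster-period given each cluster's randomized crossover period; the treatment indicator is simply `exposure > 0`. It illustrates the quantity being modelled, not the random-effect model or the variance formula from the talk.

```python
import numpy as np

def exposure_time_matrix(crossover_period, n_periods):
    """Exposure time for each cluster-period in a stepped-wedge design.

    Hypothetical helper for illustration.
    crossover_period[i]: 0-indexed period in which cluster i starts treatment.
    Returns an (n_clusters, n_periods) integer array: 0 before treatment
    starts, then 1, 2, ... for the first, second, ... treated period.
    """
    start = np.asarray(crossover_period)[:, None]
    t = np.arange(n_periods)[None, :]
    # Count the first treated period as exposure time 1
    return np.where(t >= start, t - start + 1, 0)

design = exposure_time_matrix([1, 2, 3], 4)
treated = design > 0  # ordinary treatment indicator
```

In a categorical formulation, each nonzero exposure level would receive its own fixed treatment-effect parameter; a random-effect formulation instead treats those exposure-specific effects as draws around an average effect.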

About the speaker: Dr. Rui Wang is an Associate Professor of Population Medicine and Director of the Division of Biostatistics in the Department of Population Medicine at Harvard Medical School and the Harvard Pilgrim Health Care Institute. She is also an Associate Professor in the Department of Biostatistics at Harvard T.H. Chan School of Public Health. Her research interests include the design, monitoring, and analysis of parallel and stepped-wedge cluster randomized trials, the analysis of longitudinal and time-to-event data, and addressing missing data issues in correlated or distributed data settings.

Monday Oct 10 2022

Speaker: Ms Ariel Hippen (the University of Pennsylvania)

Title: Deconvolution of cancer data: best practices for bulk and scRNA-seq

Abstract: Bulk RNA-seq is an efficient, scalable method for profiling gene expression, but it masks information about the cell type composition of a tissue sample. Deconvolution allows for the computational estimation of cell type proportions in bulk samples. Many deconvolution software packages have been created in recent years, especially with the advent of single-cell RNA-seq allowing for more precisely defined cell type expression profiles. However, most deconvolution methods were designed for normal tissue and have not been benchmarked on a cancer dataset. Also, experimental design decisions can introduce bias in deconvolution results. We describe lessons learned on a new dataset of high-grade serous ovarian tumors, as well as best-practice recommendations for scientists looking to use deconvolution to study tumor heterogeneity.
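Reference-based deconvolution is commonly framed as a linear mixture: a bulk profile is approximated by a signature matrix of cell-type expression profiles (for example, derived from scRNA-seq) multiplied by a vector of proportions. The Python sketch below is a deliberately naive baseline (ordinary least squares, then clipping negatives and renormalizing) with a hypothetical function name; it is not one of the benchmarked packages, which typically use constrained or weighted regression.

```python
import numpy as np

def naive_deconvolution(signature, bulk):
    """Estimate cell-type proportions for one bulk sample.

    Hypothetical baseline for illustration only.
    signature: (n_genes, n_cell_types) mean expression of each cell type
    bulk: (n_genes,) bulk expression of one sample
    Solves bulk ≈ signature @ p by least squares, clips negative
    estimates to zero and rescales so the proportions sum to 1.
    """
    p, *_ = np.linalg.lstsq(signature, bulk, rcond=None)
    p = np.clip(p, 0.0, None)
    total = p.sum()
    return p / total if total > 0 else p
```

Clipping after an unconstrained fit is crude; a non-negative least-squares solver enforces the constraint during the fit instead, which matters when the signature matrix is noisy or cell types are highly correlated.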

About the speaker: Ariel is a fifth-year PhD candidate at the University of Pennsylvania, studying Genomics and Computational Biology under the supervision of Casey Greene. Her research focuses on developing new methods for processing and leveraging single-cell RNA-seq data, particularly in challenging tumor contexts. Before her graduate work, Ariel graduated with a bachelor's degree in Bioinformatics from Brigham Young University and worked as a Senior Bioinformatics Analyst at AncestryDNA.

Monday Sep 26 2022

Speaker: Mr Taiyun Kim (the University of Sydney)

Title: Development of statistical methods for integrative omics analysis in precision medicine

Abstract: Precision medicine is an integrative approach to the prevention and treatment of complex diseases, such as cardiovascular disease, that considers an individual’s lifestyle, clinical information and omics profile. In the last decade, advances in omics technologies have allowed researchers to gain insight into biological systems and progress towards precision medicine. Many omics technologies now enable us to rapidly generate, store and analyse data at a large scale. While many strategies have been developed to integrate large-scale multi-batch and multi-omics data, challenges remain in developing robust methods capable of pre-processing large-scale datasets, handling mislabelled information and performing integrative analysis. This thesis proposes several methodologies and tools to address these challenges, from data processing to integrative analysis of omics in precision medicine, including (1) novel strategies and robust methods for removal of unwanted variation from large-scale metabolomics data, (2) a visualisation tool for integrative omics analysis, and (3) cell-type identification methods for single-cell transcriptomics data.

About the speaker: Taiyun Kim is in the final stages of completing his PhD at The University of Sydney under the supervision of A/Prof. Pengyi Yang and Prof. Jean Yang, having recently submitted his thesis. He is currently working as a Research Associate at the Digital Science Initiative and is a member of the Judith and David Coffey Life Hub at The University of Sydney. His research interest is in methods development for omics data processing and integrative omics analysis in precision medicine.

Monday Sep 19 2022

Speaker: Professor Saunak Sen (University of Tennessee)

Title: Bilinear models for structured high-throughput data

Abstract: Data from many high-throughput technologies can be presented as a large matrix with covariates annotating both rows and columns of the matrix. For example, metabolomic data may be presented as a matrix with rows as samples/individuals and columns as metabolites. We have covariate information on both individuals/samples and metabolites. Typically, a two-step analysis is done where the metabolites are analyzed individually and then a second analysis is done across metabolites. We present a family of bilinear models for modeling such data in one step, which accounts for correlations between features (e.g., metabolites) with sparsity constraints, if desired. The central idea is that signals are easier to detect when we can aggregate signals over features and individuals. These models can be applied to a wide variety of data including high-throughput genetic screens, metabolomics, gene-environment interaction analyses, and longitudinal traits. We have developed a number of Julia packages for these models. See: https://senresearch.github.io.
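In its simplest form, with independent equal-variance errors and no sparsity penalty, the one-step bilinear model `Y = X B Z' + E` has the closed-form least-squares estimate `B̂ = (X'X)⁻¹ X'Y Z (Z'Z)⁻¹`, where `X` holds sample covariates and `Z` holds feature (e.g., metabolite) covariates. The Python sketch below computes that estimate; it omits the feature correlations and sparsity constraints described in the talk, and its function name is hypothetical rather than the API of the speaker's Julia packages.

```python
import numpy as np

def bilinear_ols(Y, X, Z):
    """Least-squares fit of Y ≈ X @ B @ Z.T in a single step.

    Hypothetical helper; assumes i.i.d. errors and full-rank X and Z.
    Y: (n_samples, m_features) response matrix
    X: (n_samples, p) sample covariates
    Z: (m_features, q) feature covariates
    Returns B: (p, q) coefficient matrix.
    """
    # Row side: (X'X)^{-1} X'Y, shape (p, m)
    left = np.linalg.solve(X.T @ X, X.T @ Y)
    # Column side: right-multiply by Z (Z'Z)^{-1}, using symmetry of Z'Z
    return np.linalg.solve(Z.T @ Z, (left @ Z).T).T
```

Each coefficient in `B` links one sample covariate to one feature covariate, which is how information is aggregated across features and individuals instead of fitting each metabolite separately.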

About the speaker: Professor Saunak Sen is Professor and Chief of Biostatistics, Department of Preventive Medicine, University of Tennessee Health Science Center, in Memphis, TN. He develops statistical methods for understanding biological systems using genetic variation and high-dimensional data. Current interests include developing statistical approaches for matrix-valued high-throughput data, computational methods for large-scale linear mixed models, and statistical computing using the Julia programming language. His group uses several complementary approaches including bilinear models, penalized regression, multivariate kernel regression, matrix factorization, and gradient-based optimization techniques. Dr. Sen obtained his PhD in statistics from the University of Chicago, and did postdoctoral work at Stanford University and the Jackson Laboratory. After several years on the faculty at the University of California San Francisco, he joined UTHSC in 2015.

Monday Sep 12 2022

Speaker: Professor Katalin Susztak (the University of Pennsylvania)

Title: Epigenomic and transcriptomic analyses define core cell types, genes and targetable mechanisms for kidney disease

Abstract: More than 800 million people suffer from kidney disease, yet the mechanism of kidney dysfunction is poorly understood. In the present study, we define the genetic association with kidney function in 1.5 million individuals and identify 878 (126 new) loci. We map the genotype effect on the methylome in 443 kidneys, transcriptome in 686 samples and single-cell open chromatin in 57,229 kidney cells. Heritability analysis reveals that methylation variation explains a larger fraction of heritability than gene expression. We present a multi-stage prioritization strategy and prioritize target genes for 87% of kidney function loci. We highlight key roles of proximal tubules and metabolism in kidney function regulation. Furthermore, the causal role of SLC47A1 in kidney disease is defined in mice with genetic loss of Slc47a1 and in human individuals carrying loss-of-function variants. Our findings emphasize the key role of bulk and single-cell epigenomic information in translating genome-wide association studies into identifying causal genes, cellular origins and mechanisms of complex traits.

About the speaker: Professor Katalin Susztak is a Professor of Medicine and Genetics at the University of Pennsylvania. She is a physician-scientist who aims to understand the genetics and molecular mechanisms of kidney disease development, with the ultimate goal of finding new, more effective therapies. She is a member of the American Society for Clinical Investigation and the American College of Physicians. She has received multiple awards, including the Young Investigator award from the American Society of Nephrology, the Alfred Richards lifetime award from the International Society of Nephrology, and the William Osler research award from the University of Pennsylvania. She has made discoveries fundamental to defining critical genes, cell types, and mechanisms of chronic kidney disease. She was instrumental in defining genetic, epigenetic, and transcriptional changes in diseased human kidneys. She identified multiple novel kidney disease genes and demonstrated the role of Notch signaling and metabolic dysregulation in kidney disease development.

Monday Sep 5 2022

Speaker: Associate Professor Yijuan Hu (Emory University)

Title: It’s All Relative: Testing Differential Abundance in Compositional Microbiome Data

Abstract: Studies on the human microbiome have revealed that differences in microbial communities are associated with many human disorders such as inflammatory bowel disease, type II diabetes, and even Alzheimer’s disease and some cancers. The microbiome is a particularly attractive target for establishing new biomarkers for disease diagnosis and prognosis, and for developing low-cost, low-risk interventions. Microbiome data have two unique features. First, they are compositional, i.e., the total number of sequencing reads per sample is an experimental artifact and only the relative abundance of taxa can be measured. Second, they are subject to a wide variety of experimental biases (e.g., in the process of DNA extraction and PCR amplification) that plague most analyses that directly analyze relative abundance data. These features call for analyses that are based on log-ratio transformation of the relative abundance data. Existing methods often ignore experimental biases, do not handle the extensive (50-90%) zero count data adequately, and do not accommodate other complexities in microbiome data (e.g., high-dimensionality, confounding covariates, and continuous covariates of interest). In this talk, we present a new logistic-regression-based method that takes into account all of these features of microbiome data for robust testing of differential abundance. Our simulation studies indicate that our method is the only one that universally controls the FDR while at the same time maintaining good power. We illustrate our method by the analysis of a throat microbiome dataset.
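Compositionality can be illustrated with the centered log-ratio (CLR) transform, one common log-ratio transformation: each taxon's log count is compared with the mean log count of its own sample, so any per-sample scaling, such as sequencing depth, cancels out. The Python sketch below is an illustration only; it is not the logistic-regression-based method presented in the talk, and the pseudocount is just one ad hoc way of dealing with zeros, which the talk argues need more careful handling.

```python
import numpy as np

def clr(counts, pseudocount=0.0):
    """Centered log-ratio transform of each row (sample) of a count matrix.

    Illustrative sketch. Each taxon's log count is expressed relative to
    the mean log count of its sample, so a per-sample scaling factor
    (e.g. sequencing depth) cancels. With pseudocount > 0 that
    cancellation is only approximate.
    """
    logx = np.log(np.asarray(counts, dtype=float) + pseudocount)
    return logx - logx.mean(axis=1, keepdims=True)
```

Because only ratios survive, the transformed values carry no information about total read counts, which matches the point that only relative abundances are measurable.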

About the speaker: I am currently a tenured Associate Professor in the Department of Biostatistics and Bioinformatics, Rollins School of Public Health (RSPH), Emory University. I joined the department in 2011 after I obtained my PhD in biostatistics at University of North Carolina at Chapel Hill. I was promoted to Associate Professor with tenure in 2017. My research centers on the development of statistical methods and software programs for analyzing high-throughput omics data in human epidemiological and clinical studies. In particular, my research was focused on statistical genetics before my tenure at Emory. After tenure, I shifted to a new field, microbiome statistics, and I have developed a series of statistical methods and popular software packages (e.g., LDM, LOCOM, and BOUTH). With my unique expertise in microbiome data analysis, I have also established extensive collaborations with a wide range of scientists on microbiome research within and outside Emory.

Monday August 29 2022

Speaker: Dr. Alistair Senior (The University of Sydney)

Title: Genetic Variation in the Response of Drosophila to Dietary Macronutrients

Abstract: Studies in organisms from yeast to humans indicate that dietary macronutrient composition affects health, mortality and lifespan. In this talk I will present the results of a genetic screen in Drosophila melanogaster to identify genetic variants that might modify the response of lifespan to the dietary protein-to-carbohydrate ratio.

About the speaker: Alistair is a Senior Lecturer at the University of Sydney, affiliated with the School of Life and Environmental Sciences and the Charles Perkins Centre. Originally from the UK, he completed a Biological Sciences undergraduate degree and a master’s degree in Ecology and Evolution. In 2010 Alistair moved to the University of Otago in New Zealand to undertake a PhD in the Department of Zoology, which he completed in 2013. Since then, Alistair has been working at the University of Sydney.

Monday August 22 2022

Speaker: Ms. Danqing (Angela) Yin (The University of Hong Kong)

Title: Building a large-scale single cell Covid-19 atlas data portal: how to overcome 5 million big data challenges on the cloud

Abstract: Many studies have used high-throughput scRNA-seq to study the immune and non-immune cell types of Covid-19 patients. The vast majority of these data come from patient blood samples, i.e. peripheral blood mononuclear cells (PBMCs). Other researchers have compiled a comprehensive resource of all Covid-19 single-cell sequencing data to date, processed these data (normalization, integration, dimension reduction, annotation) into a UMAP visualization, and developed a series of technologies to perform various analyses on them. We have identified three challenges that existing technologies face in analysing this tremendous dataset: cloud accessibility, data curation and data interpretation. To address these challenges and extract insights from this massive dataset, we are developing a one-stop atlas data portal with scFeature meta-analysis to support the browsing, subsetting, download and visualization of metadata, count matrices and scFeature data across multiple conditions, including patient number, age, health status and cell type. Users will be able to access these well-curated 5 million cells through a publicly available web application. Tutorials for downstream analyses such as annotation, differential gene expression, trajectory inference, functional annotation and cell-cell communication will be provided to support continued discovery. The set of existing PBMC scRNA-seq data will be updated every few months. We aim to provide gold-standard Covid-19 atlas resources for the scientific community, thereby accelerating Covid-19 research worldwide.

About the speaker: Danqing (Angela) Yin is undertaking a PhD in Biomedical Sciences at The University of Hong Kong. She received her Bachelor of Biomedicine from the University of Melbourne Medical School in 2016, and a Master of IT specialising in Software Engineering from the University of Sydney in 2020. During her undergraduate studies she worked in bioinformatics groups at the Walter and Eliza Hall Institute and the Monash University-Hudson Institute. Her professional experience includes several years as a Bioinformatics Software Engineer at the Garvan Institute of Medical Research, where she contributed to the development of scalable computing architecture for genomics data on large clusters and the cloud.

Monday August 15 2022

Speaker: Ms. Xiaohang (Helen) Fu (The University of Sydney)

Title: Machine learning in multimodal and multilabel cancer diagnosis and image analysis

Abstract: Tumour segmentation and classification from medical images are necessary for effective diagnosis, treatment planning, and cancer management. However, these tasks are challenging because many characteristics must be assessed simultaneously. Manual assessment is often subjective and prone to intra- and interobserver variability. These issues may be overcome with automated methods. State-of-the-art medical image analysis methods are now typically based on deep learning. There is an increasing focus on combining different modalities of data, such as different image modalities (e.g., PET-CT), or images with clinical variables (e.g., age). Multimodal approaches are designed to leverage the benefits of each modality in a complementary way. Multi-task methods are also useful, as they predict multiple outcomes simultaneously and can exploit shared features between different labels. I will discuss our recent work where we developed deep learning methods for the segmentation, classification, characterisation, and survival prediction of cancers using PET-CT, CT, dermoscopy/clinical, and imaging mass cytometry (IMC) images.

About the speaker: Xiaohang (Helen) Fu is undertaking a PhD in Computer Science at The University of Sydney. She received her Bachelor of Engineering (Honours) specialising in Biomedical Engineering with First Class Honours in 2018, from The University of Auckland. She was a research assistant at the Auckland Bioengineering Institute. Her research interests include machine learning, and medical image classification, segmentation, and analysis.

Monday August 8 2022

Speaker: Dr. Lukas M. Weber (Johns Hopkins University)

Title: nnSVG: scalable identification of spatially variable genes in spatially-resolved transcriptomics data

Abstract: Spatially-resolved transcriptomics enables the investigation of spatially-resolved biological processes by measuring transcriptome-scale gene expression along with the spatial coordinates of the measurements, for example on tissue slides. Feature selection to identify spatially variable genes (SVGs) is a key step during analyses. We have developed nnSVG, a new scalable approach to identify SVGs based on nearest-neighbor Gaussian processes. Our method (i) identifies genes that vary in expression continuously across the entire tissue or within a priori defined spatial domains, (ii) uses gene-specific estimates of length scale parameters within the Gaussian process models, and (iii) scales linearly with the number of spatial locations. We have demonstrated the performance of our method using experimental data from several technological platforms and simulations, and a software implementation is available from Bioconductor (https://bioconductor.org/packages/nnSVG). In addition, I will discuss Bioconductor-based infrastructure for spatially-resolved transcriptomics analyses, including the SpatialExperiment data structure.
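The nearest-neighbour Gaussian process idea behind the linear scaling can be sketched as a Vecchia-type approximation: order the spatial locations, and let each location condition only on its `m` nearest previously ordered neighbours, so the likelihood becomes a product of small conditional Gaussians. The Python sketch below assumes a zero-mean GP with an exponential covariance plus a nugget, and all function names are hypothetical; it is not nnSVG's code, which relies on the BRISC R package for fitting and also models a mean term and compares each gene against a non-spatial model. An exact likelihood is included only for checking.

```python
import numpy as np

def exp_cov(A, B, sigma2, length_scale):
    """Exponential covariance sigma2 * exp(-d / length_scale) between point sets."""
    d = np.sqrt(((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1))
    return sigma2 * np.exp(-d / length_scale)

def vecchia_loglik(y, coords, sigma2, length_scale, nugget, m):
    """Nearest-neighbour (Vecchia) approximation to a zero-mean GP log-likelihood.

    Location i conditions only on its m nearest neighbours among locations
    0..i-1. Each term costs O(m^3), so likelihood work grows linearly in n;
    the brute-force neighbour search here does not (real implementations
    use a spatial index).
    """
    y = np.asarray(y, dtype=float)
    coords = np.asarray(coords, dtype=float)
    ll = 0.0
    for i in range(len(y)):
        mean, var = 0.0, sigma2 + nugget
        if i > 0 and m > 0:
            d_prev = np.sqrt(((coords[:i] - coords[i]) ** 2).sum(axis=-1))
            nb = np.argsort(d_prev)[:m]
            C_nn = exp_cov(coords[nb], coords[nb], sigma2, length_scale) + nugget * np.eye(len(nb))
            c_in = exp_cov(coords[[i]], coords[nb], sigma2, length_scale)[0]
            w = np.linalg.solve(C_nn, c_in)  # kriging weights
            mean = w @ y[nb]
            var = var - w @ c_in
        # Conditional Gaussian log-density of y_i given its neighbours
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll

def exact_gp_loglik(y, coords, sigma2, length_scale, nugget):
    """Exact zero-mean GP log-likelihood, O(n^3); used only as a reference."""
    y = np.asarray(y, dtype=float)
    coords = np.asarray(coords, dtype=float)
    C = exp_cov(coords, coords, sigma2, length_scale) + nugget * np.eye(len(y))
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (len(y) * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))
```

When `m` is at least `n - 1`, every location conditions on all earlier ones and the approximation equals the exact likelihood by the chain rule; a small `m` keeps each conditional cheap at the cost of ignoring distant dependence. The `length_scale` here plays the role of the gene-specific length-scale parameter mentioned in the abstract.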

About the speaker: Dr. Lukas Weber is a postdoctoral fellow at Johns Hopkins Bloomberg School of Public Health, Department of Biostatistics, supported by a K99/R00 Pathway to Independence Award from the National Institutes of Health (NIH) NHGRI. His work is on the development of unsupervised statistical methodology and software for analyzing data from spatially-resolved transcriptomics, single-cell RNA sequencing, and other high-throughput genomics technologies. His methodological work is motivated by collaborative projects with experimental researchers in fields including neuroscience and cancer, and he implements methods as R packages within the open-source Bioconductor project. He is interested in rigorous benchmarking of new methods against existing and baseline methods, and he supports open science principles, including the release of open-source software, reproducible analyses, free availability of code and data resources, and publication of preprints. His training includes a PhD in Biostatistics at the University of Zurich, Switzerland, and an MSc in Statistics at ETH Zurich, Switzerland.

Seminars in 2022, Semester 1

Show talks from Semester 1 / Hide talks from Semester 1

Monday June 27 2022

Speaker: Dr. Alma Andersson (KTH Royal Institute of Technology)

Title: Building common coordinate frameworks for spatial transcriptomics data to promote modeling of large and diverse datasets

Abstract: In recent years, spatial transcriptomics has become a field of vigorous activity, producing both more and larger data sets. This trend is well exemplified by large research initiatives such as the Human Developmental Cell Atlas (HDCA), where multiple tissue samples are collected from several individuals across different developmental time points. However, due to variation in tissue morphology and size, comparison of spatial gene expression between samples is often complicated. Therefore, we present a new method to construct so-called common coordinate frameworks (CCFs) for spatial transcriptomics data, using a statistical approach relying on landmark annotation and Gaussian Process Regression (GPR). More specifically, a function relating gene expression to landmark distances is learnt and then used to transfer the observed data to any reference of choice. This transfer makes it possible to compare local changes in gene expression between conditions or time points, and to perform more sophisticated forms of spatiotemporal modelling. A CCF also allows spatial gene expression from multiple samples to be represented jointly in a single reference, facilitating the identification of canonical structures or patterns. Our method also accounts for differences among samples caused by nonlinear distortions, in contrast to traditional alignment methods based on rotation and rescaling. We partially illustrate the use of our method with synthetic time-series data, where examples of downstream spatiotemporal analyses are given. Furthermore, we successfully apply our transfer strategy to real data sets containing multiple samples of human developmental heart and mouse brain, where we also integrate data from different spatial technologies.
While mainly designed for analysis of spatial gene expression data, our method extends to other features of interest, which we demonstrate by a transfer of deconvolved cell type proportion estimates between two spatial transcriptomics samples. The method is implemented in Python and available on GitHub, with an API designed for seamless interactions with the Scanpy suite.
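A heavily simplified version of the landmark-based transfer can be written as GP regression: describe every spot by its distances to a shared set of annotated landmarks, fit a GP from those distance features to expression in the observed sample, and predict expression at reference spots from their own landmark distances. The Python sketch below assumes an RBF kernel, fixed hyperparameters and a zero prior mean; all function names are hypothetical, and this is not the authors' Python implementation.

```python
import numpy as np

def landmark_distances(coords, landmarks):
    """Describe each spot by its Euclidean distance to every landmark."""
    coords = np.asarray(coords, dtype=float)
    landmarks = np.asarray(landmarks, dtype=float)
    return np.sqrt(((coords[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=-1))

def rbf(A, B, length_scale):
    """Squared-exponential (RBF) kernel between two sets of feature vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2 * length_scale ** 2))

def gp_transfer(train_feats, train_expr, ref_feats, length_scale=1.0, noise=1e-2):
    """GP-regression posterior mean of expression at reference spots.

    Assumes a zero prior mean and fixed (not fitted) hyperparameters.
    """
    K = rbf(train_feats, train_feats, length_scale) + noise * np.eye(len(train_feats))
    alpha = np.linalg.solve(K, train_expr)
    return rbf(ref_feats, train_feats, length_scale) @ alpha
```

Because both samples are expressed in landmark-distance coordinates rather than raw pixel coordinates, a nonlinear deformation of the tissue changes the coordinates but not, ideally, the distance features, which is the intuition behind handling distortions that rotation and rescaling cannot.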

About the speaker: Dr. Alma Andersson is a computational biologist currently working as a researcher in the Lundeberg Lab. She focuses on the development of methods to analyze single cell and spatial transcriptomics data, usually relying on concepts from machine learning and probabilistic modeling. In June 2019, she joined the Lundeberg Lab as a PhD student and in March 2022 she successfully defended her thesis: "Computational methods for analysis of spatial transcriptomics data." At the moment, she's interested in context-dependent models, i.e. how a cell's environment influences its state as well as the idea of creating common coordinate frameworks for spatial data.

Monday June 20 2022

Speaker: Mr. Philipp Weiler (Technical University of Munich)

Title: CellRank 2 - A unified framework to study single-cell fate decision

Abstract: Single-cell RNA sequencing allows reconstructing trajectories of cellular state dynamics and investigating cell fate decisions. While trajectory inference is well studied and robust methods exist, they traditionally require the developmental direction of the biological system to be known. For cases where this information is not accessible, CellRank has been proposed to extend trajectory inference with directional information from RNA velocity. Although RNA velocity has been successfully applied to many datasets, its model assumptions are often violated, for example in hematopoiesis, making it inapplicable in such scenarios. Here, we extend CellRank to a unified framework for studying cellular fate decisions, which overcomes RNA velocity limitations by including alternative means to quantify probabilities of state changes. Our extension allows automatically identifying terminal states, cellular fate, and lineage drivers based on a pseudotime, the CytoTRACE score, real-time, or multiome information. Applied to datasets of adult human hematopoiesis and human embryoid stem cell differentiation, we automatically recover known terminal states as well as driver genes of the respective lineages. Additionally, in the case of pharyngeal endoderm development, this provides novel insight into the fate decision towards medullary and cortical thymic epithelial cells.
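The fate probabilities discussed here can be understood as absorption probabilities of a Markov chain over cells: terminal states are made absorbing, and for every other cell one asks where a random walk started there ends up. The Python sketch below solves `(I - Q) F = R` on a toy transition matrix, with a hypothetical function name. In CellRank itself the transition matrix would come from one of the directional sources listed above (RNA velocity, pseudotime, CytoTRACE, real-time or multiome information), and terminal states are groups of cells rather than single cells.

```python
import numpy as np

def fate_probabilities(T, terminal):
    """Absorption probabilities for a cell-cell transition matrix.

    Hypothetical helper for illustration.
    T: (n, n) row-stochastic transition matrix between cells
    terminal: indices of cells treated as absorbing terminal states
    Returns an (n_transient, n_terminal) matrix; row i gives the probability
    that a random walk from transient cell i is absorbed in each terminal cell.
    """
    T = np.asarray(T, dtype=float)
    terminal = np.asarray(terminal)
    transient = np.setdiff1d(np.arange(T.shape[0]), terminal)
    Q = T[np.ix_(transient, transient)]  # transient -> transient moves
    R = T[np.ix_(transient, terminal)]   # transient -> terminal moves
    # Solve (I - Q) F = R rather than inverting the fundamental matrix
    return np.linalg.solve(np.eye(len(transient)) - Q, R)
```

Each row sums to one whenever every transient cell can eventually reach some terminal state, so the rows can be read as soft lineage assignments.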

About the speaker: Mr. Philipp Weiler obtained M. Sc. degrees in Mathematics and Applied Mathematics from the Technical University of Munich (TUM) and École polytechnique fédérale de Lausanne (EPFL). He is now a Ph.D. student at the chair of Mathematical Modeling of Biological Systems at TUM and the Institute of Computational Biology (ICB) at Helmholtz Munich, supervised by Prof. Fabian Theis.

Monday June 6 2022

Speaker: Dr. Wenjun Kong (Washington University)

Title: Capybara: a computational tool to measure cell identity and fate transitions

Abstract: Cells in transition are important in continual biological processes, such as development, reprogramming, and disease. Yet, current methodologies mainly focus on the classification of cell types in a discrete, categorical manner. Here, we present a tool, Capybara, to measure cell identity as a continuum at single-cell resolution. This approach enables the capture of cells with both discrete and transitional identities, supporting a metric that measures cell fate transition. The application of Capybara to hematopoiesis validates classification and supports the concept of 'hybrid' cells. Further applications to different reprogramming strategies reveal previously uncharacterized regional patterning and identify a putative in vivo correlate to a poorly-characterized engineered cell type.

About the speaker: Dr. Wenjun Kong graduated from Rose-Hulman Institute of Technology in 2016, majoring in Computer Science. She joined the computational and systems biology graduate program at Washington University in St. Louis and did her Ph.D. research in the laboratory of Dr. Samantha Morris. This past April, she joined Calico Life Sciences as a scientist to further work on diverse genomic technologies.

Monday May 30 2022

Speaker: Ms. Siobhon Egan (Murdoch University)

Title: Profiling substrate utilization of the gut microbiome

Abstract: The gut microbiome is a diverse system of microorganisms that play an important role in maintaining health. Changes in microbial composition can lead to multiple inflammatory conditions and result in an increased risk of metabolic syndromes and susceptibility to certain cancers. In addition, it is recognised that initial seeding of the infant gut microbiome is influenced by environmental factors and can have life-long implications. Studies on the gut microbiome are further complicated by unique differences between individuals in substrate utilization and subsequent metabolism. Here we present a novel methodology for characterising the substrate utilization profile of gut bacteria using a multi-omic approach. Stool samples from infants (n = 6) and adults (n = 12) were used to inoculate Biolog® GENIII microplates (Biolog Inc., USA); each well of these 96-well culture plates is pre-coated with a single substrate, covering 71 carbon and 23 chemical sources in total. After 24-hour anaerobic cultivation, samples were subjected to metabolite and bacterial profiling. Proton nuclear magnetic resonance (1H NMR) spectroscopy of supernatants was performed using 1D and 2D experiments. The remaining culture pellet was used for DNA extraction and full-length 16S rRNA sequencing on the PacBio Sequel II platform. Data acquisition for this study is still underway; however, to date we have identified distinct differences between infants and adults in both the metabolite and taxonomic profiles of cultured bacteria. This presentation will discuss our novel experimental and data analysis approaches to untangle the complex relationship between bacteria, substrate and metabolism.

About the speaker: Ms. Siobhon Egan is an early career researcher at Murdoch University, based at the Centre for Computational and Systems Medicine. She began working under the direction of Prof. Elaine Holmes in March 2021, as part of a project to understand host-microbiome signalling axes in ageing. This research is focused on understanding the interactions at the host-microbe axis using a range of -omic technologies. Within this project she led the development of genomics and culture experiments and the subsequent data analysis to characterise the effect of gut bacteria on circulating metabolites and, ultimately, health outcomes. Her PhD thesis, titled Ecology of Ticks and Microbes in Australian Wildlife, is currently under examination. This research centred on the identification of microbes in ticks and wildlife reservoir hosts, with the aim of providing insight into potential causative agent(s) of zoonotic tick-borne pathogens. In 2021 she was awarded the Sinnecker-Kunz Award for early career researchers at the 14th International Symposium on Ticks and Tick-borne Diseases. To date she has published 15 peer-reviewed articles from research conducted during her PhD candidature.

Monday May 23 2022

Speaker: Dr. Ru-Fang Yeh (Genentech)

Title: Data sciences in drug development

Abstract: Data sciences play increasingly important roles in all aspects of work in the biotechnology/pharmaceutical industry. In this talk, I will discuss high-level examples to illustrate a wide spectrum of data sciences contributions in drug development – statistics, bioinformatics, machine learning/artificial intelligence, data engineering – in settings ranging from research, drug discovery, pre-clinical and clinical development and beyond.

About the speaker: Ru-Fang Yeh is currently a Global Data Sciences Leader at Genentech, leading cross-functional data sciences teams on complex clinical development projects. She is an experienced drug developer with statistics and bioinformatics expertise. Ru-Fang received a PhD in Statistics from UC Berkeley with Prof. Terry Speed (Human Genome Project), completed a postdoc at MIT with Prof. Chris Burge (computational gene finding), and was an Assistant Professor at UCSF (genomics) prior to joining Genentech in 2008.

Monday May 16 2022

Speaker: Ms. Tiantian Liu (Shanghai Jiao Tong University)

Title: MZINBVA: Variational approximation for multilevel zero-inflated negative-binomial models for association analysis in microbiome surveys

Abstract: As our understanding of the microbiome has expanded, so has the recognition of its critical role in human health and disease, thereby emphasizing the importance of testing whether microbes are associated with environmental factors or clinical outcomes. However, many of the fundamental challenges that concern microbiome surveys arise from statistical and experimental design issues, such as the sparse and overdispersed nature of microbiome count data and the complex correlation structure among samples. For example, in the Human Microbiome Project (HMP) dataset, the repeated observations across time points (level 1) are nested within body sites (level 2) which are further nested within subjects (level 3). Therefore, there is a great need for the development of specialized and sophisticated statistical tests. We propose multilevel zero-inflated negative-binomial models for association analysis in microbiome surveys and develop a variational approximation method for maximum likelihood estimation and inference. It uses optimization, rather than sampling, to approximate the log-likelihood and compute parameter estimates, provides a robust estimate of the covariance of parameter estimates, and constructs a Wald-type test statistic for association testing. We evaluate and demonstrate the performance of our method using extensive simulation studies and an application to the HMP data set. We have developed an R package MZINBVA to implement the proposed method, which is available from the GitHub repository: https://github.com/liudoubletian/MZINBVA.
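
The building block of the model described above is the zero-inflated negative-binomial (ZINB) distribution for sparse, overdispersed counts. The sketch below computes a single-level ZINB log-likelihood for one taxon. The parameterisation (mean mu, size theta, zero-inflation probability pi) is a common convention and an assumption here; the sketch deliberately omits covariates, the multilevel random effects and the variational approximation that MZINBVA itself uses.

```python
import math


def nb_logpmf(y: int, mu: float, theta: float) -> float:
    """Log-pmf of a negative binomial with mean mu and size (dispersion) theta.

    Under this parameterisation Var(Y) = mu + mu**2 / theta, so a small theta
    captures the overdispersion typical of microbiome counts.
    """
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_pmf(y: int, mu: float, theta: float, pi: float) -> float:
    """Zero-inflated NB: a structural zero with probability pi, else an NB draw."""
    nb = math.exp(nb_logpmf(y, mu, theta))
    return pi + (1 - pi) * nb if y == 0 else (1 - pi) * nb


def zinb_loglik(counts, mu: float, theta: float, pi: float) -> float:
    """Log-likelihood of i.i.d. counts for one taxon (no covariates, no random effects)."""
    return sum(math.log(zinb_pmf(y, mu, theta, pi)) for y in counts)


# Example: a sparse taxon observed in eight samples.
counts = [0, 0, 0, 3, 7, 0, 1, 12]
print(zinb_loglik(counts, mu=3.0, theta=0.8, pi=0.3))
```

In a full association model, mu would be linked to covariates (and, in the multilevel case, to subject- and site-level random effects) through a log link. Maximising this likelihood is what a Wald-type test on the covariate coefficients builds on.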

About the speaker: Tiantian Liu is a PhD candidate in the SJTU-Yale Joint Center for Biostatistics and Data Science at Shanghai Jiao Tong University, under the supervision of A/Prof. Tao Wang. She received her bachelor's degree in bioinformatics from Tianjin Medical University in 2016. Her research interest is in developing statistical methodology for differential abundance analysis and mediation analysis.

Monday May 9 2022

Speaker: Assistant Professor Changhee Lee (Chung-Ang University)

Title: Deep Learning Approaches to Time-to-Event Analysis and Beyond

Abstract: In this talk, I will present my work on deep learning approaches to time-to-event analysis, and on methodologies that extend conventional time-to-event analysis from the perspectives of time-series analysis, automated machine learning, and treatment effect estimation.

About the speaker: Changhee Lee is currently an assistant professor at the Department of Artificial Intelligence, Chung-Ang University, Seoul, Korea, where he leads the Decision Intelligence Lab. Changhee completed his Ph.D. at the University of California, Los Angeles, Department of Electrical and Computer Engineering, as a member of the van der Schaar Lab led by his advisor Prof. Mihaela van der Schaar. His research focuses on machine learning in healthcare, particularly on deep learning approaches that address challenges in modeling, predicting, and interpreting time-to-event and time-series data, and on deep learning approaches for multiple omics data including genomics and transcriptomics.

Monday May 2 2022

Speaker: Dr. David Zeevi (Weizmann Institute of Science)

Title: Mining microbial communities for information on their environment

Abstract: Microbial communities can have an immense effect on their environment and are strongly affected by it. Using new methods for metagenomic sequencing analysis, we systematically identified microbial genomic structural variants and found them to be highly prevalent in the gut microbiome and to correlate with disease risk factors (Zeevi et al., Nature 2019). Our results suggest that these variants facilitate adaptation to environmental stress. Exploring genes that are clustered in the same variant, we uncovered potential mechanistic links between the microbiome and its host. Inspired by our discovery of potential microbial adaptation to host pressures, we developed a strategy for mining marine microbiome samples for novel bioremediation genes. To this end, we devised a high-throughput evolutionary analysis and revealed an unexpected insight into microbial adaptation (Shenhav and Zeevi, Science 2020). Our primary analyses uncovered overwhelmingly strong purifying selective pressure across marine microbial life. This selection was highly correlated with nutrient concentrations and has led us to explore robustness in the genetic code, common to nearly all life forms.

About the speaker: David Zeevi is a principal investigator at the Department of Plant and Environmental Sciences at the Weizmann Institute of Science. David did his Ph.D. with Prof. Eran Segal at the Weizmann Institute of Science and his postdoc as an Independent Fellow at the Rockefeller University’s Center for Studies in Physics and Biology. He has coauthored several publications in the microbiome field, linking the microbiome to the effects of artificial sweeteners (Suez et al., Nature 2014) and host circadian rhythm (Thaiss et al., Cell 2015), inferring bacterial growth dynamics (Korem et al., Science 2015), predicting the glycemic responses of individuals to complex meals (Zeevi et al., Cell 2015; Korem et al., Cell Metab 2017), characterizing microbial genomic variability across individuals (Zeevi et al., Nature 2019), and understanding resource conservation in marine microbes (Shenhav and Zeevi, Science 2020). David’s lab studies how microbes are affected by human-made pollution using AI and multi-omics.

Monday April 11 2022

Speaker: Assistant Professor Jingshu Wang (the University of Chicago)

Title: Causal inference for heritable phenotypic risk factors using heterogeneous genetic instruments

Abstract: Mendelian randomization (MR) is a method of exploiting genetic variation to estimate a causal effect without bias in the presence of unmeasured confounding. In MR, natural genetic variations are used as instrumental variables to perform causal inference on the effect of heritable risk factors. Because of its convenience, MR has been widely used in epidemiology and other related areas of population science. However, the phenomenon that “all genes affect every complex trait” complicates MR studies, as most genetic variants will then be invalid instruments. In the talk, I will discuss the assumptions of existing MR methods and show how they need to be clarified to allow for pervasive horizontal pleiotropy and heterogeneous effect sizes. I will present a comprehensive framework that we have developed for MR. Using GWAS summary statistics, we can efficiently use both strong and weak genetic instruments, detect the existence of multiple pleiotropic pathways, determine the causal direction and perform multivariable MR to adjust for confounding risk factors. I’ll also illustrate a few case studies at the end of the talk.
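
To make the "existing MR methods" concrete, the sketch below implements the classical fixed-effect inverse-variance weighted (IVW) estimator from GWAS summary statistics. This is a standard baseline, not the speaker's framework; it assumes every SNP is a valid instrument (no horizontal pleiotropy), which is exactly the assumption the talk argues breaks down. The function name ivw_mr and the example numbers are illustrative only.

```python
import math


def ivw_mr(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted (IVW) Mendelian randomization.

    beta_x: per-SNP effects on the exposure (from one GWAS)
    beta_y: per-SNP effects on the outcome (from another GWAS)
    se_y:   standard errors of beta_y
    Returns (causal effect estimate, standard error).
    """
    if not (len(beta_x) == len(beta_y) == len(se_y)):
        raise ValueError("summary-statistic vectors must have equal length")
    # Each SNP gives a Wald ratio beta_y / beta_x; IVW averages these ratios
    # with first-order weights beta_x**2 / se_y**2.
    num = sum(bx * by / s**2 for bx, by, s in zip(beta_x, beta_y, se_y))
    den = sum(bx**2 / s**2 for bx, s in zip(beta_x, se_y))
    return num / den, 1.0 / math.sqrt(den)


est, se = ivw_mr([0.10, 0.20, 0.30], [0.05, 0.10, 0.15], [0.01, 0.02, 0.01])
print(f"causal effect ~ {est:.3f} (SE {se:.4f})")
```

If a constant pleiotropic shift is added to every beta_y, the IVW estimate is pulled away from the true ratio of 0.5, which illustrates why methods that model pleiotropic pathways are needed.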

About the speaker: Dr. Jingshu Wang is an Assistant Professor in statistics at the University of Chicago. Before joining the University of Chicago in 2019, she was a postdoc at the University of Pennsylvania working with Prof. Nancy R. Zhang. She received her Ph.D. in statistics from Stanford in 2016 under the supervision of Prof. Art Owen. Her main research interest is in developing statistical methods for cutting-edge biotechnologies and genetic problems. She currently works on problems in single-cell RNA sequencing, Mendelian randomization and structural variation in the 3D genome. She also develops new statistical methods and theory for factor models and multiple hypothesis testing, with applications in statistical genetics.

Monday April 4 2022

Speaker: Dr. Yixuan Ye (Yale University)

Title: Leveraging functional annotations and multi-populations data in genetic risk prediction

Abstract: An essential component of human genetics research and precision medicine is to develop robust and accurate disease risk prediction models from genetic data together with other risk factors. Accurate prediction models will have great impacts on disease prevention and early and effective treatment. With the remarkable success achieved by genome-wide association studies (GWAS) in the past 17 years, numerous single-nucleotide polymorphisms (SNPs) associated with complex human traits and diseases have been identified, and various methods, broadly called polygenic risk scores (PRS), have been proposed to utilize GWAS data to predict genetic risk. Here we introduce two novel PRS methods, AnnoPred and xPred. AnnoPred is a principled framework that leverages diverse types of genomic and epigenomic functional annotations in genetic risk prediction. xPred further integrates data from diverse populations in developing PRS. Through comprehensive simulations and real-data analyses, we show that our novel PRS methods can significantly increase the accuracy of polygenic risk prediction and risk population stratification compared to existing methods for diverse populations.
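
Whatever method produces the per-SNP weights, a polygenic risk score for individual i is the weighted sum PRS_i = Σ_j w_j g_ij, where g_ij is the number of effect-allele copies (0, 1 or 2) at SNP j. The minimal sketch below scores and ranks a toy cohort. The weights and genotypes are placeholders; in practice the weights are the effect sizes a PRS method (for example AnnoPred or xPred) estimates from GWAS data, and the helper names are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """PRS_i = sum_j w_j * g_ij, with g_ij in {0, 1, 2} effect-allele copies."""
    if len(dosages) != len(weights):
        raise ValueError("need one weight per SNP")
    return sum(w * g for w, g in zip(weights, dosages))


def score_cohort(genotypes, weights):
    """Score every individual; genotypes maps sample ID -> dosage vector."""
    return {sid: polygenic_risk_score(g, weights) for sid, g in genotypes.items()}


# Placeholder per-SNP weights (hypothetical, not from any real GWAS).
weights = [0.20, -0.10, 0.05]
cohort = {"A": [0, 1, 2], "B": [2, 0, 1], "C": [1, 1, 0]}
scores = score_cohort(cohort, weights)
print(sorted(scores, key=scores.get, reverse=True))  # highest score first
```

Because the scoring step is this simple, the methods in the talk differ mainly in how the weights are estimated, for instance by using functional annotations or data from multiple populations.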

About the speaker: Yixuan Ye is a fifth-year Ph.D. student working with Dr. Hongyu Zhao in the program of Computational Biology and Bioinformatics at Yale University. She received her B.M. in preventive medicine and B.S. in physics (double majored) from Peking University in 2017. Her research interests are the development and application of novel statistical methods for genetic risk prediction of complex human diseases and traits. She is also working on causal inference and exploring the interactions between genes, lifestyle, and diseases. She recently successfully defended her thesis, titled: Making the most of polygenic risk scores: risk decomposition, cross-population prediction, and clinical application.

Monday March 28 2022

Speaker: Dr. Dhirendra Kumar (National Institute of Environmental Health Sciences (NIEHS))

Title: Decoding the function of bivalent chromatin in development and cancer

Abstract: Bivalent chromatin is characterized by the simultaneous presence of H3K4me3 and H3K27me3, histone modifications generally associated with transcriptionally active and repressed chromatin, respectively. Prevalent in embryonic stem cells (ESCs), bivalency is postulated to poise/prime lineage-controlling developmental genes for rapid activation during embryogenesis while maintaining a transcriptionally repressed state in the absence of activation cues; however, this hypothesis remains to be directly tested. Most gene promoters that are DNA-hypermethylated in adult human cancers are bivalently marked in ESCs, and it has been speculated that bivalency predisposes them to aberrant de novo DNA methylation and irreversible silencing in cancer, but evidence supporting this model is largely lacking. Here, we show that bivalent chromatin does not poise genes for rapid activation but instead protects promoters from de novo DNA methylation. Genome-wide studies in differentiating ESCs reveal that activation of bivalent genes is no more rapid than that of other transcriptionally silent genes, challenging the premise that H3K4me3 is instructive for transcription. H3K4me3 at bivalent promoters, a product of the underlying DNA sequence, persists in nearly all cell types irrespective of gene expression and confers protection from de novo DNA methylation. Bivalent genes in ESCs that are frequent targets of aberrant hypermethylation in cancer are particularly strongly associated with loss of H3K4me3/bivalency in cancer. Altogether, our findings suggest that bivalency protects reversibly repressed genes from irreversible silencing and that loss of H3K4me3 may make them more susceptible to aberrant DNA methylation in diseases such as cancer. Bivalency may thus represent a distinct regulatory mechanism for maintaining epigenetic plasticity.

About the speaker: Dhirendra Kumar is a staff scientist within Jothi Lab at the NIEHS in North Carolina. His research at NIEHS focusses on the use of integrative multi-omics approaches to understand cell fate decisions during early embryonic development.

Monday March 21 2022

Speaker: Assistant Professor Zhixiang Lin (the Chinese University of Hong Kong)

Title: RA3 is a reference-guided approach for epigenetic characterization of single cells

Abstract: Recent advancements in single-cell technologies, including single-cell chromatin accessibility sequencing (scCAS), have enabled profiling of the epigenetic landscapes of thousands of individual cells. However, the characteristics of scCAS data, including high dimensionality, a high degree of sparsity and high technical variation, make the computational analysis challenging. Reference-guided approaches, which utilize the information in existing datasets, may facilitate the analysis of scCAS data. Here, we present RA3 (Reference-guided Approach for the Analysis of single-cell chromatin Accessibility data), which utilizes the information in massive existing bulk chromatin accessibility and annotated scCAS data. RA3 simultaneously models (1) the shared biological variation between scCAS data and the reference data, and (2) the unique biological variation in scCAS data that identifies distinct subpopulations. We show that RA3 achieves superior performance when used on several scCAS datasets, and on references constructed using various approaches. Altogether, these analyses demonstrate the wide applicability of RA3 in analyzing scCAS data. Methods for profiling differences between individual cells are constantly expanding. Here, the authors present a computational framework for the analysis of chromatin accessibility data at the single-cell level that takes into account previous knowledge and data-specific characteristics.

About the speaker: Zhixiang Lin received his B.S. from Tsinghua University in 2010 and Ph.D. from Yale University in 2015. During his graduate study at Yale, he was co-advised by Prof. Hongyu Zhao in the Department of Biostatistics and Prof. Matthew State in the Department of Psychiatry. Dr. Lin worked with Prof. Wing Hung Wong at Stanford University as a postdoctoral researcher from 2015 to 2018. In 2018, he was appointed as Assistant Professor in the Department of Statistics at the Chinese University of Hong Kong (CUHK). His lab develops statistical methods and computational tools for addressing significant scientific questions, especially those related to biomedical research, large-scale genomic data, and health and clinical data. The lab's most recent application areas include single-cell genomics and neuroscience.

Monday March 14 2022

Speaker: A/Prof. Simon van Heeringen (Radboud University)

Title: Computational modeling of enhancer dynamics and gene regulatory networks during development and differentiation

Abstract: Our genome encodes the programs to develop a fertilized cell into a complex organism with hundreds of different cell types. A multitude of regulatory sequences integrate complex, interconnected signals to drive gene expression programs. This produces robust, yet remarkably flexible cellular outcomes. A useful computational framework to model cell identity and behavior is a gene regulatory network (GRN) that describes how transcription factors in unison regulate their target genes. Large-scale GRNs can be inferred from genome-wide functional data, for instance, based on gene expression profiles or on transcription factor (TF) motifs. However, due to the inherent complexity of GRNs, our understanding of how they shape complex phenotypes remains incomplete. To enable the characterization of gene regulatory principles from genome-wide (epi)genomic data, we recently developed ANANSE (ANalysis Algorithm for Networks Specified by Enhancers), a network-based method that exploits enhancer-encoded regulatory information to identify the key transcription factors in cell fate determination. ANANSE models genome-wide binding profiles of transcription factors in various cell types using enhancer activity and transcription factor binding motifs. Subsequently, applying these inferred binding profiles, we can construct cell type-specific gene regulatory networks. Finally, ANANSE predicts key transcription factors controlling cell fate transitions using differential networks between cell types. Extensive benchmarks demonstrate that ANANSE outperforms existing approaches in regulatory network inference and prioritization of transcription factors driving cellular transitions. I will highlight some examples of our recent work, which show how ANANSE can be used to model gene regulation in a variety of dynamic systems, in humans as well as (non)model organisms.

About the speaker: Simon van Heeringen is an associate professor of molecular developmental biology at the Radboud University. His work includes development of methods that enable characterisation of gene regulatory principles from genome-wide (epi)genomic data. Recently, Simon's group developed ANANSE (ANalysis Algorithm for Networks Specified by Enhancers), a network-based method that exploits enhancer-encoded regulatory information to identify the key transcription factors in cell fate determination.

Monday March 7 2022

Speaker: Dr. Ye Zheng (Fred Hutchinson Cancer Research Center)

Title: Bridging the gap: joint modeling of single-cell 1D and 3D genomics

Abstract: Recent advancements in single-cell technologies have enabled the profiling of 3D genome structures at the single-cell level. Quantitative tools are needed to fully leverage the unprecedented resolution of single-cell high-throughput chromatin conformation capture (scHi-C) data and integrate it with other single-cell data modalities. To address this need, we developed single-cell gene-body associating domain (scGAD) scores. scGAD explores scHi-C data in units of genes, with proper normalization to eliminate technical biases. Application of scGAD analysis to scHi-C data of the developing mouse cortex and hippocampus (Tan et al. 2021) revealed that scGAD extracts data summaries that agree well with the scRNA-seq data from the same system. Specifically, genes with high scGAD scores exhibit similarly high expression across individual cells, and genes with differential scGAD scores across cell types largely capture cell-type-specific marker genes from scRNA-seq. As a result, scGAD bridges the analysis of scHi-C and scRNA-seq data modalities and facilitates the projection of scHi-C cells onto scRNA-seq reference panels with known cell types/status. This provides a convenient and more accurate means of annotating cells based on their 3D genomes.

About the speaker: Ye Zheng is currently a Postdoctoral Research Fellow at Fred Hutchinson Cancer Research Center. She is a statistician specializing in problems at the interface of statistics, biology, and biomedical sciences. Her research includes the development of a simulation system for 3D chromatin conformation capture data and of statistical generative models for recovering epigenomic signals involving repetitive regions of the genome. Her recent research involves the normalization and denoising of single-cell 3D genomics data and investigates DNA interactions at the unit of genes, enabling integrative analysis across 3D genomics, epigenomics and transcriptomics. Ye Zheng has also built a repertoire of collaborations to pioneer the investigation of single-cell multi-omics in immunotherapy.

Monday February 28 2022

Speaker: Dr. Beth Cimini (Broad Institute)

Title: Making the Most of your Microscopy with High-Content Image Analysis

Abstract: Although microscopy has been used to study biological systems for hundreds of years, it took the invention of the digital camera to unlock its quantitative potential for the average user. The twenty-first century has seen an explosion of open-source microscopy analysis tools, but the inherent richness of imaging data is often under-utilized. In many cases, users focus on one or two known phenotypes of interest, despite the fact that a standard microscopy image provides more than one million quantitative data points. To maximize the information that can be extracted in biological systems, two tools have proven particularly helpful: high-content assays such as Cell Painting that maximize the information that can be extracted from an individual cell, and open-source tools that make it easy for users to find their objects of interest and extract large numbers of measurements. Advancements in these fields, as well as new techniques being developed to use machine learning to predict other biological measurements such as RNA expression, will be highlighted.

About the speaker: Beth Cimini leads the Cimini Lab within the Imaging Platform of the Broad Institute of MIT and Harvard. Her team works with biologists to help them create image analysis workflows and makes the open-source image analysis software CellProfiler.

Seminars in 2021, Semester 2

Monday November 1 2021

Speaker: Associate Professor Suoqin Jin (Wuhan University, China)

Title: Dissecting Cellular Heterogeneity and Communication via Integration of Single-cell Genomics Data

Abstract: Recent advances in single-cell technologies, in particular single-cell RNA and ATAC sequencing, provide an unprecedented opportunity to dissect cellular heterogeneity and communication more comprehensively. To deconvolute heterogeneous single cells from both transcriptomic and epigenomic profiles, we developed a matrix factorization-based method, scAI, for integrating single-cell RNA-seq data and ATAC-seq or DNA methylation data obtained from the same individual cells. To address the extremely sparse and binary nature of the epigenomic data, scAI aggregates sparse epigenomic signals in similar cells learned in an unsupervised manner, allowing coherent fusion with transcriptomic measurements. Simulation studies and applications to real datasets demonstrate its capability of dissecting cellular heterogeneity within both transcriptomic and epigenomic layers and understanding transcriptional regulatory mechanisms. In addition, single-cell RNA-seq data also offer a great opportunity for probing the underlying intercellular communications that often drive heterogeneity and cell state transitions in tissues. We developed an integrated method, CellChat, for systematic inference and quantitative analysis of cell-cell communication by integrating scRNA-seq data and prior knowledge of the interactions between signaling molecules. I will show how we can quantitatively build and analyze cell-cell communication networks in an easily interpretable way by applying systems biology and machine learning approaches. Applying CellChat to real datasets shows its ability to extract complex signaling patterns. Our versatile and easy-to-use toolkit CellChat will help discover novel intercellular communications and build cell-cell communication atlases in diverse tissues.

Monday October 25 2021

Speaker: Ms. April Kriebel (University of Michigan, USA)

Title: Mosaic Integration of Single-cell Multi-omic Datasets Using Non-negative Matrix Factorization

Abstract: Single-cell genomic technologies provide an unprecedented opportunity to define molecular cell types in a data-driven fashion, but present unique data integration challenges. Integration analyses often involve datasets with partially overlapping features, including both shared features that occur in all datasets and features exclusive to a single experiment. Previous computational integration approaches require that the input matrices share the same number of either genes or cells, and thus can use only shared features. To address this limitation, we derive a novel non-negative matrix factorization algorithm (UINMF) for integrating single-cell datasets containing both shared and unshared features. The key advance is incorporating an additional metagene matrix that allows unshared features to inform the factorization. We demonstrate that incorporating unshared features significantly improves integration of single-cell RNA-seq, spatial transcriptomic, SNARE-seq, and cross-species datasets. We have incorporated the UINMF algorithm into the open-source R package LIGER.

Monday October 18 2021

Speaker: Ms. Taylor Cavazos (University of California, San Francisco)

Title: The Risk of Polygenic Risk Scores in Diverse Populations

Abstract: The majority of polygenic risk scores (PRS) have been developed and optimized in individuals of European ancestry and have limited generalisability across other ancestral populations. Understanding aspects of PRS that contribute to this issue and determining solutions is complicated by disease-specific genetic architecture and limited knowledge of sharing of causal variants and effect sizes across populations. Motivated by these challenges, we undertook a simulation study to assess the relationship between ancestry and the potential bias in PRS developed in European ancestry populations. Our simulations show that the magnitude of this bias increases with increasing divergence from European ancestry, and this is attributed to population differences in linkage disequilibrium and allele frequencies of European-discovered variants, likely as a result of genetic drift. Importantly, we find that including variants discovered in African ancestry individuals in the PRS has the potential to achieve unbiased estimates of genetic risk across global populations and admixed individuals. Given the demonstrated improvement in PRS prediction accuracy, recruiting larger diverse cohorts will be crucial, and potentially even necessary, for enabling accurate and equitable genetic risk prediction across populations.

Monday October 11 2021

Speaker: Dr. Ruey-Leng Loo (Murdoch University)

Title: Characterising and Visualising Human Metabolic Phenotyping Datasets in Diverse Populations

Abstract: Metabolic phenotyping (metabolomics/metabonomics) by mass spectrometry and nuclear magnetic resonance spectroscopy has been increasingly applied to human population studies. Different data analysis tools have been applied to interpret the complex metabolic phenotyping datasets generated by these analytical platforms. However, these techniques are often inefficient for extracting detailed information from these complex datasets. In this talk, I will describe an interactive software pipeline for exploratory analyses of real population-based nuclear magnetic resonance spectral data using the Combined Multi-block Principal component Analysis with Statistical Spectroscopy (COMPASS). I will explain the key principles behind the COMPASS approach and will conclude the talk by demonstrating some of its key advantages.

Monday September 27 2021

Speaker: Dr. Kenji Kamimoto (Washington University in Saint Louis)

Title: CellOracle: Dissecting cell identity via network inference and in silico gene perturbation

Abstract: Recent technological advances in single-cell sequencing enable the acquisition of multi-dimensional data in a high-throughput manner. These technologies reveal the existence of heterogeneity and the diversity of cell states and identities. To reveal the regulatory mechanism underlying such phenomena, many computational Gene Regulatory Network (GRN) inference methods have been proposed. However, understanding biological events from a GRN perspective remains difficult. Even if a computational algorithm can infer a GRN, the biological network is so complex that it is challenging to understand how it systematically dictates cell identities. There is significant demand for new methodologies that bridge the gap between cellular phenotypes and the underlying GRN. We have therefore developed CellOracle, a new computational approach for the inference and analysis of GRNs. By utilizing machine learning algorithms and genetic information, CellOracle infers sample-specific GRN configurations from single-cell RNA-seq and ATAC-seq data. Our GRN models are designed to be used for the simulation of cell identity changes in response to gene perturbation. This simulation enables network configurations to be interrogated in relation to cell-fate regulation, facilitating their interpretation. To validate CellOracle’s GRN inference method, we present benchmarking on various tissues and cell types. We also validate the efficacy of CellOracle to recapitulate known outcomes of well-characterized gene perturbations in developmental processes, including mouse hematopoiesis and zebrafish embryogenesis. Our benchmarking and validation results demonstrate the efficacy of CellOracle to infer and interpret the dynamics of GRN configurations, promoting new mechanistic insights into the regulation of cell identity.
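
The idea of simulating cell identity changes in response to gene perturbation can be illustrated with a toy linear GRN: a knockdown shifts one gene's expression, and that shift flows to downstream targets for a few steps. The sketch below is a hypothetical illustration, not CellOracle's API or its exact algorithm. The coefficient matrix, the fixed number of propagation steps, and the rule of holding the perturbed gene at its input value are all assumptions chosen for clarity.

```python
def propagate_perturbation(coef, delta_input, n_steps=3):
    """Propagate an expression shift through a linear GRN (hypothetical helper).

    coef[i][j]:  assumed regression coefficient of regulator i on target j
    delta_input: initial expression shift per gene (non-zero = perturbed gene)
    Returns the simulated expression shift of every gene after n_steps.
    """
    n = len(delta_input)
    perturbed = [i for i, d in enumerate(delta_input) if d != 0]
    delta = list(delta_input)
    for _ in range(n_steps):
        # One step of signal flow: each target sums its regulators' shifts.
        delta = [sum(delta[i] * coef[i][j] for i in range(n)) for j in range(n)]
        for i in perturbed:
            delta[i] = delta_input[i]  # keep the perturbed gene pinned
    return delta


# Toy cascade: gene 0 activates gene 1 (0.5), gene 1 activates gene 2 (2.0).
coef = [[0.0, 0.5, 0.0],
        [0.0, 0.0, 2.0],
        [0.0, 0.0, 0.0]]
print(propagate_perturbation(coef, [-1.0, 0.0, 0.0]))  # knock down gene 0
```

Knocking down gene 0 lowers gene 1 after one step and gene 2 after two, so the number of steps controls how far downstream the perturbation is allowed to reach.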

Monday September 20 2021

Speaker: Dr. Melissa Ko (Stanford University)

Title: Visualizing Trajectories from Single-Cell Time Course Data Using FLOW-MAP

Abstract: Multi-parameter single-cell measurement technologies give us an unprecedented view into complex biological systems, but diving into this data can be a monumental task. To study a highly dynamic process like drug resistance in cancer, researchers may collect this dense high-dimensional data at several timepoints that then need to be brought together in the analysis step. How can we extract important patterns across this time course data? How can we spot interesting rare phenomena? To aid analysis of single-cell time course datasets, we developed a graph-based analysis tool called FLOW-MAP. FLOW-MAP enables researchers to visualize and then infer trajectories from data produced in flow cytometry, mass cytometry or single-cell RNA sequencing experiments. This approach has been applied to investigate drug-induced apoptosis in multiple myeloma and identify what factors may lead to a subset of cancer cells surviving our attempts to treat this disease. Through this example, we will explore how FLOW-MAP can be used to gain an intuition for complex datasets, reveal patterns over time, and then communicate these findings to our research audience.

Monday September 13 2021

Speaker: Dr. Brendan Miller (Johns Hopkins University)

Title: Reference-free Cell-type Deconvolution of Multi-Cellular Pixel-Resolution Spatially-Resolved Transcriptomics Data

Abstract: Recent technological advancements have enabled spatially resolved transcriptomic (ST) profiling, but at multi-cellular pixel resolution, thereby hindering the identification of cell-type spatial co-localization patterns. Supervised deconvolution approaches have recently been developed to predict the proportion of cell types within ST multi-cellular pixels, but these approaches rely on the availability of a suitable single-cell reference, which may present limitations if such a reference does not exist. To address this challenge, we developed STdeconvolve, an unsupervised approach that builds upon latent Dirichlet allocation to deconvolve the underlying cell types comprising such ST datasets. We show that STdeconvolve effectively recovers the putative transcriptomic profiles of cell types and their proportional representation within ST multi-cellular pixels without reliance on external single-cell transcriptomics references. We find that STdeconvolve provides performance competitive with existing reference-based methods when suitable single-cell references are available, as well as potentially superior performance when suitable single-cell references are not available.

Monday September 6 2021

Speaker: Dr. Neeraj Kumar (University of Alberta)

Title: Learning Individual Survival Models from PanCancer Whole Transcriptome Data – A Step Towards Personalized Medicine

Abstract: Personalized medical oncology aims to provide individualized cancer treatments by acknowledging that every cancer patient is unique in terms of prognosis, treatment tolerance and survival outcome, due in part to each individual tumor’s distinctive molecular profile. It is clearly useful to accurately estimate a patient’s survival time, as that could help in making end-of-life decisions and in assessing patient-specific benefits of personalized medicine. A novel type of survival prediction model that estimates individual survival distributions (ISDs) – survival probabilities at several time points for an individual – can play a significant role in the future of personalized oncology. Specifically, this talk will show how to fit accurate ISDs from pan-cancer whole transcriptome data and how these ISDs could be used to assess a cancer patient's survival likelihood, compared with other models including the ubiquitous Kaplan-Meier and Cox proportional hazards estimates.
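For context on one of the baselines named above: a Kaplan-Meier curve is a single population-level survival function, so every patient in a cohort receives the same estimate, which is what an ISD model aims to individualise. A minimal, dependency-free sketch of the standard product-limit estimator follows; the function name and the toy cohort are hypothetical and are not taken from the speaker's work.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Kaplan-Meier (product-limit) survival estimate.

    times  : follow-up time for each patient
    events : 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (t, S(t)) pairs, one per distinct observed death time.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths + censorings at each time point
    n_at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            surv *= 1.0 - d / n_at_risk
            curve.append((t, surv))
        # Everyone observed at t (died or censored) leaves the risk set afterwards.
        n_at_risk -= leaving[t]
    return curve


# Hypothetical toy cohort: five patients, two of them censored.
print(kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]))
```

Patients censored at a death time are kept in the risk set for that time, which is the usual convention.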

Monday August 30 2021

Speaker: Mr. Chao Gao (University of Michigan, USA)

Title: Iterative Single-cell Multi-omic Integration Using Online Learning

Abstract: Integrating large single-cell gene expression, chromatin accessibility and DNA methylation datasets requires general and scalable computational approaches. Here we describe online integrative non-negative matrix factorization (iNMF), an algorithm for integrating large, diverse and continually arriving single-cell datasets. Our approach scales to arbitrarily large numbers of cells using fixed memory, iteratively incorporates new datasets as they are generated and allows many users to simultaneously analyze a single copy of a large dataset by streaming it over the internet. Iterative data addition can also be used to map new data to a reference dataset. Comparisons with previous methods indicate that the improvements in efficiency do not sacrifice dataset alignment and cluster preservation performance. We demonstrate the effectiveness of online iNMF by integrating more than 1 million cells on a standard laptop, integrating large single-cell RNA sequencing and spatial transcriptomic datasets, and iteratively constructing a single-cell multi-omic atlas of the mouse motor cortex.

Monday August 23 2021

Speaker: Dr. Sebastian Uhrig (Deutsches Krebsforschungszentrum, Germany)

Title: Gene Fusions in Solid Tumours: From Novel Discoveries to Diagnosis to Treatment

Abstract: The MASTER (Molecularly Aided Stratification for Tumour ERadication) program is a precision oncology trial for patients with advanced-stage disease. The trial recruits young patients and patients with rare cancer types. Multi-omics assays are used to create a molecular profile of each tumor, with the goal of finding genetic aberrations that can be exploited therapeutically. Gene fusions play an important role in this setting, since many of the soft-tissue sarcomas which are part of the MASTER cohort are associated with oncogenic driver fusions. Often, the detection of a pathognomonic fusion serves to confirm or refine the diagnosis of a patient. More importantly, many fusions are recognized as potent drug targets and are therefore of central interest to the program. Furthermore, the multi-layered, in-depth analysis of each patient recruited in the MASTER trial has repeatedly revealed novel insights into the biology of cancer. After careful study and experimental validation in the context of basic cancer research projects, such findings can ultimately be translated into clinical practice.

Monday August 16 2021

Speaker: Assoc. Prof. Jessica Jingyi Li (University of California, Los Angeles)

Title: Applications of Generalized Additive Models (GAMs) and Copulas to Single-cell RNA-seq: PseudotimeDE and scDesign2

Abstract:

Part 1: PseudotimeDE: inference of differential gene expression along cell pseudotime with well-calibrated p-values from single-cell RNA sequencing data. To investigate molecular mechanisms underlying cell state changes, a crucial analysis is to identify differentially expressed (DE) genes along the pseudotime inferred from single-cell RNA-sequencing data. However, existing methods do not account for pseudotime inference uncertainty, and they have either ill-posed p-values or restrictive models. Here we propose PseudotimeDE, a DE gene identification method that adapts to various pseudotime inference methods, accounts for pseudotime inference uncertainty, and outputs well-calibrated p-values. Comprehensive simulations and real-data applications verify that PseudotimeDE outperforms existing methods in false discovery rate control and power.

Part 2: scDesign2: a transparent simulator that generates high-fidelity single-cell gene expression count data with gene correlations captured. A pressing challenge in single-cell transcriptomics is to benchmark experimental protocols and computational methods. A solution is to use computational simulators, but existing simulators cannot simultaneously achieve three goals: preserving genes, capturing gene correlations, and generating any number of cells with varying sequencing depths. To fill this gap, we propose scDesign2, a transparent simulator that achieves all three goals and generates high-fidelity synthetic data for multiple single-cell gene expression count-based technologies. In particular, scDesign2 is advantageous in its transparent use of probabilistic models and its ability to capture gene correlations via copulas.

Monday August 9 2021

Speaker: Dr. George Guo (University of Auckland)

Title: HIT-MAP: A High-resolution Mass-spectrometry Imaging Informatics Toolbox for Automated Proteomics Annotation and Visualisation

Abstract: The unique set of expressed proteins, specific to a particular cell type, location, or place in time or space, critically underpins organ function and disease states. The liquid-chromatography mass spectrometry (LC-MS/MS) proteomic method allows for global characterisation of these proteomes at the expense of spatial information, which limits our understanding of disease mechanisms. Matrix-assisted laser desorption/ionisation mass spectrometry imaging (MALDI-MSI) can survey this spatial proteomic complexity. But identification and quantification of peptides are mutually exclusive, primarily due to the typically lower amount of evidence provided within a given MS imaging coordinate. To address this, we developed HIT-MAP (High-resolution Informatics Toolbox in MALDI mass-spectrometry imaging Proteomics), an R-based pipeline for the automated annotation and visualization of proteomic MALDI-MSI datasets. HIT-MAP uses statistical methods for spatially aware pixel clustering and m/z feature summarization to perform proteomics annotation via a false discovery rate-controlled peptide mass fingerprinting and protein coverage analysis pipeline.

Monday August 2 2021

Speaker: Dr. Jimmy Breen (University of Adelaide)

Title: Temporal Placental Gene Expression Profiles Reflect Three Phases of Blood Flow During Human Gestation

Abstract: The human placenta is the largest fetal organ and is important during pregnancy. However, very little is known about its development and function during gestation. Using RNA sequencing, placental chorionic villous tissues from 96 pregnancies at six to 23 weeks of gestation were profiled. Substantial changes in gene expression between early and late gestation were identified, the majority of which were enriched in functions relating to transcription factor signalling, inflammatory response and cell adhesion. Using co-expression network and gene set enrichment analyses, three distinct phases of gene expression coincident with phases of maternal blood flow to the placenta were discovered. They impact immune function and are likely driven by oxygen tension, potentially in a sex-specific manner. Future efforts to improve how pregnancy complications are treated will be discussed.

Seminars in 2021, Semester 1

Show talks from Semester 1 / Hide talks from Semester 1

Monday June 21 2021

Speaker: Mr. Erick Armingol (University of California, San Diego)

Title: Cell-cell Interactions and Spatial Patterns of Communication in Multicellular Organisms

Abstract: Cell-cell interactions (CCIs) are crucial for multicellular life. They shape cellular functions which can ultimately influence organismal phenotype. Since intercellular interactions can be inferred from RNA-sequencing data by integrating prior knowledge about ligand-receptor interactions, one can employ these strategies to unveil how CCIs are associated with spatial organization across the whole body of multicellular organisms. In particular, we are interested in understanding the spatial code embedded in the molecular interactions that drive and sustain spatial organization, and the organization that in turn drives intercellular interactions across a living animal. Strategies for studying CCIs from gene expression will be overviewed and exemplified by a computational strategy to inspect CCIs across the whole body of Caenorhabditis elegans larvae. Briefly, this strategy encompasses the inference of an overall potential of intercellular interactions through a new Bray-Curtis-like metric, a genetic algorithm to select the ligand-receptor pairs most informative of the spatial organization of cells, and the functional association of these molecular bases encoding spatial information.
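The abstract refers to a new "Bray-Curtis-like" metric without defining it, so the talk's metric is not reproduced here. For orientation only, the standard Bray-Curtis dissimilarity it presumably builds on can be computed as below; the function name and the two hypothetical expression vectors are illustrative assumptions.

```python
def bray_curtis(x, y):
    """Standard Bray-Curtis dissimilarity between two non-negative vectors:
    sum|x_i - y_i| / sum(x_i + y_i). 0 = identical profiles, 1 = no overlap."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    if any(v < 0 for v in (*x, *y)):
        raise ValueError("Bray-Curtis expects non-negative values")
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    # Convention (an assumption): two all-zero vectors are treated as identical.
    return num / den if den else 0.0


# Hypothetical expression of the same four genes in two cells.
cell_a = [2.0, 3.0, 0.0, 1.0]
cell_b = [1.0, 3.0, 4.0, 1.0]
print(bray_curtis(cell_a, cell_b))
```

Because the measure is bounded between 0 and 1 for non-negative inputs, it is convenient for comparing profiles whose overall magnitudes differ.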

Monday June 7 2021

Speaker: Dr. Nils Eling (University of Zurich)

Title: Informed Region Selection and Analysis of Imaging Mass Cytometry Data

Abstract: The development of highly multiplexed imaging technologies has led to deeper insights into the spatial organization of healthy and diseased tissues. In particular, the structure of the immune compartment within the tumour microenvironment is a predictor for immuno-oncology treatment success. The IMMUcan (Integrated immunoprofiling of large adaptive cancer patients cohorts https://immucan.eu) project studies immune-tumor interactions within the tumour microenvironment and its impact on therapeutic interventions. As part of this initiative, the team acquires multiplexed immunofluorescence and imaging mass cytometry data of thousands of patient samples from five cancer types. While multiplexed immunofluorescence captures the expression of about seven proteins across all cells of the tumor section, imaging mass cytometry focuses on measuring smaller regions (about 1mm²) with higher content (about 40 proteins). To study detailed immune-tumor interactions using imaging mass cytometry, it is therefore crucial to perform informed selection of regions of interest that capture the cell types of interest. We have now developed a set of computational tools that guide and facilitate the selection of regions for imaging mass cytometry based on multiplexed immunofluorescence measurements. First, a custom-made, user-guided workflow robustly identifies the same cell-types across different patients and cancer indications. Via an online tool, we are able to select 3-4 regions containing about 50% of tumor cells and a diversity of immune cells of interest. The Python package napping was developed to transfer the selected regions' coordinates onto brightfield images of consecutive sections to be measured by imaging mass cytometry. Upon data acquisition and quality control, we use the cytomapper Bioconductor package to manually label cell types on imaging mass cytometry images and train a random forest classifier for cell type classification. 
Based on this workflow, we were able to identify broad (e.g. T cells) and rare (exhausted CD8+ T cells) cell types across all acquired samples with good agreement to matched multiplexed immunofluorescence data.

Monday May 31 2021

Speaker: Mr. Shian Su (Walter and Eliza Hall Institute, Melbourne)

Title: Visualising Nanopore Methylation Data with NanoMethViz

Abstract: Signals produced by Oxford Nanopore direct DNA sequencing contain information that can be used to infer DNA modifications, providing an effective new tool for high-throughput and high-resolution analysis of genome-wide DNA methylation patterns. Our focus is on 5-methylcytosine CpG DNA methylation, which plays an essential role in the epigenetic regulation of mammalian gene expression. We produced a dataset of long-read DNA sequencing on F1 cross female placenta data from known parental strains to demonstrate the utility of nanopore sequencing for investigating methylation. The well-characterised parental strains and long read sequences allow us to haplotype a majority of reads, and placental tissue guarantees paternal X-inactivation. We developed a package NanoMethViz to process and visualise the results from this data. NanoMethViz standardises data from multiple popular methylation callers and produces plots of methylation patterns at varying resolutions from sample overview to individual reads. We explore the patterns of methylation within imprinted and X-inactivated genes, as well as repeat elements.

Monday May 24 2021

Speaker: Mr. Andrew Sharo (University of California, Berkeley)

Title: Computational Methods Improve and Refine Clinical Structural Variant Interpretation

Abstract: Computational methods are rapidly improving our ability to predict which germline variants cause rare Mendelian disease. The applications are startling. Consider Kathleen Folbigg, who is serving a 30-year prison sentence for the alleged murder of her four children. Years later, scientists have found that her children inherited rare variants that may explain their sudden death. Will variant interpretation eventually exonerate Kathleen? More commonly, clinical geneticists must identify one or two disease-causing variants among millions of neutral variants in the genome of an individual with a rare disease. However, at least half of these cases remain unresolved, even after whole genome sequencing. Structural variants may be the cause of a portion of these unresolved cases. We have developed StrVCTVRE, a random forest method, to prioritize structural variants that overlap exons. StrVCTVRE will allow clinicians to eliminate half of structural variants from consideration with 90% sensitivity. I will also discuss our analysis of cataloged pathogenic variants, those variants that have been identified by clinical laboratories or researchers to cause disease. We consider two popular databases, ClinVar and HGMD. Using population sequencing datasets, we find that pathogenic HGMD variants imply two orders of magnitude more affected individuals than ClinVar. We also find that individuals of African ancestry are five times more likely to be predicted to be affected when HGMD variants are used. Encouragingly, more recent clinical variant interpretation recommendations removed much of the ancestry skew.

Monday May 17 2021

Speaker: Ms. Andrea Castro (University of California, San Diego)

Title: Cancer Mutation Differences Between Men and Women

Abstract: Cancer is a heterogeneous disease, largely initiated and driven by genomic alterations. Some of these alterations can be detected by immune surveillance, leading the adaptive immune system to attack tumor cells. Indeed, immune evasion is one of the hallmark behaviors shared by all cancers. The major histocompatibility complexes (MHC) are central to immune surveillance, determining which of the genomic alterations in tumor cells can be presented to T cells. The peptide binding region of MHC-I is encoded by the highly polymorphic HLA genes (HLA-A/B/C) which ultimately determine peptide-MHC specificities. Tumor cells with "invisible" mutations escape immune surveillance whereas cells with "visible" mutations are more likely to be recognized and eliminated by T cells. Several studies have found that tumor genome landscapes show a bias for mutations that are "invisible" to the MHC. To better understand the factors that influence host anti-tumor immunity, we studied tumor mutation landscapes in the TCGA and found that mutation landscapes from tumors occurring in younger and female individuals are more depleted of "visible" mutations, suggesting that a stronger immune system may produce stronger constraints on tumor evolution. Furthermore, loss of integrity of the MHC is reflected by an excess of well-presented mutations. These findings have implications for developing effective immunotherapies.

Monday May 10 2021

Speaker: Dr. Jayne Barbour (University of Hong Kong)

Title: Mutations in CTCF-cohesin Binding Sites in Cancer

Abstract: Mutations acquired in cancer genomes form distinct patterns. CTCF-cohesin binding sites (CBS), or DNA loop anchors, are vulnerable to accumulating mutations. To explore why CBS are vulnerable to mutagenesis, we analysed somatic mutation densities in CBS across 1980 whole-genome sequenced cancer samples. We found three groups of samples with enriched CBS mutation densities: skin, gastrointestinal and XPD mutant bladder cancers. XPD is a DNA helicase that is part of TFIIH and is mutated in ~10% of bladder cancers. XPD mutant samples displayed elevated mutation densities in accessible chromatin and at transcriptionally active CBS. In a separate project, we tested whether a germline mutation in a specific CBS in the CDKN2B locus is important in melanoma. Dermal fibroblasts from a family with sporadic melanoma harbouring this variant were cultured, and we performed CTCF ChIP-seq, Hi-C and RNA-seq. We observed allele-specific loss of CTCF binding, suggesting the variant is functional.

Monday May 3 2021

Speaker: Dr. Jiawei Wang (Monash University)

Title: Machine Learning Based Prediction and Analysis of Anti-CRISPR Proteins

Abstract: Anti-CRISPR (Acr) proteins are widespread amongst phage and promote phage infection by inactivating the bacterial host’s CRISPR-Cas defence system. Except for their universally short sequences, Acrs have little in common with each other. With very low sequence and structural similarity, at least 50 distinct Acr families have been identified across both bacterial and archaeal domains of life, where they each use different molecular mechanisms to inhibit CRISPR-Cas systems. Outside the confined environment of a microbial cell, Acrs have inspired a number of downstream applications, from gene editing technologies and protein engineering to phage therapy, applications that are limited only by the relatively small number of known anti-CRISPR systems compared to the thousands hidden in sequenced genomes. In this talk, I will introduce our work in the design and implementation of an all-in-one solution to better assist biologists to predict and analyze Acrs. This includes development of a novel machine learning based anti-CRISPR predictor (PaCRISPR) and a subsequent platform (AcrHub) to annotate known Acrs, predict novel Acrs and visualize the relationship between known and potential Acrs. These tools can either work independently or within the platform pipeline to facilitate prediction and downstream analysis of Acrs and thereby shorten the gap between prediction, functional characterisation, and eventual experimental validation.

Monday April 19 2021

Speaker: Dr. Jan Marzinek (A*STAR, Singapore)

Title: Multiscale Simulations of Dengue Virus Morphological Changes and Antibody Interactions

Abstract: A primary causative agent of infectious disease is the positive single-stranded RNA family of flaviviruses, which includes dengue (DENV), tick-borne encephalitis, West Nile virus, Japanese encephalitis, yellow fever, and Zika virus. Flavivirus particles undergo many structural rearrangements throughout their life cycle, such as during maturation or endocytosis. Another example involves "breathing", in which the virus can change its shape in response to a change in temperature, a phenomenon which can be modulated through mutations in the surface envelope proteins in order to evade vaccines and therapeutics. In this work, a multiscale molecular dynamics simulation approach was employed to investigate such conformational changes associated with the envelope protein during the DENV life cycle. Based on cryo-electron microscopy maps, we constructed a near-atomic resolution model of the complete viral envelope, containing envelope and membrane proteins embedded within a lipid bilayer vesicle. We leveraged this model to probe the dynamics of the viral outer shell associated with different stages of the virus life cycle, triggered by changes in the host microenvironment, such as temperature, pH and salt. We also used all-atom simulations to probe molecular details of the breathing process and its dependence upon mutations. We subsequently investigated interactions between antibodies and different morphological states of the virus particle and, supported by diverse biophysical data, rationalised the occurrence of antibody-dependent enhancement, which can lead to the most serious forms of DENV infection including dengue hemorrhagic fever and shock syndrome. The combination of multiscale simulation and experiment reported here provides novel insights into appropriate therapeutic strategies for different stages of DENV infection and could give rise to new approaches for vaccine development and antibody engineering.

Monday April 12 2021

Speaker: Dr. Ting Qi (Westlake University, China)

Title: Genetic Control of pre-mRNA Splicing in Brain and its Role in Complex Trait Variation

Abstract: Most variants identified from genome-wide association studies (GWAS) in humans are non-coding, suggesting a role in gene regulation. Prior studies have shown considerable links between GWAS signals and expression quantitative trait loci (eQTLs), but links to other genetic regulatory mechanisms such as splicing QTLs (sQTLs) are under-explored. Here, we introduced a transcript-based sQTL method (named DISMISS) with improved power for sQTL detection. Applying DISMISS along with LeafCutter, an event-based sQTL method, to brain transcriptomic data, we identified 7491 genes with sQTLs at p < 5 × 10⁻⁸, 2598 of which did not have eQTLs. Integrating the eQTL and sQTL data into GWAS for nine neuropsychological traits, we identified 271 genes associated with the traits through sQTLs, 153 of which did not overlap the trait-associated genes identified through eQTLs. Our study demonstrates the use of brain sQTLs as an invaluable means to understand the role of genetic regulation of transcription in neuropsychological traits.

Seminars in 2020, Semester 2

Show talks from Semester 2 / Hide talks from Semester 2

Monday November 23 2020

Speaker: Dr. Yang Liao (Olivia Newton-John Cancer Research Centre)

Title: Rsubread: Ultrafast Read Mapping and Quantification of Next-Generation Sequencing Data

Abstract: Rsubread is a Bioconductor R package that encompasses multiple tools for fast and accurate analysis of Next-Generation Sequencing (NGS) data. Major functionalities of this toolbox include mapping of RNA and DNA sequencing reads to a reference genome, counting reads to genomic features such as genes, exons, junctions and genomic intervals (quantification) and discovery of genomic mutations. Challenges in existing pipelines will be highlighted and the basic ideas behind Rsubread to overcome these challenges will be explained. Functionalities of Rsubread and a demonstration of how to use it in NGS data analysis will be shown. Results comparing Rsubread to other software tools for the mapping and quantification of sequence reads will be shown. The results show that Rsubread achieves superior computational efficiency compared with the other software tools without compromising specificity and sensitivity.

Monday November 16 2020

Speaker: Dr. Rebecca Poulos (Children's Medical Research Institute)

Title: Exploring the Landscape of Pan-cancer Proteogenomics with Cancer Cell Lines

Abstract: The combination of genomic and proteomic data can reveal novel insights into important biological processes in cancer. Proteomic data from 979 cancer cell lines, derived from over 7000 data-independent acquisition mass spectrometry runs generated at ProCan, Sydney, are presented. The proteomic data are integrated with existing multi-omic datasets available via the COSMIC project, and proteogenomic relationships are characterised.

Monday November 9 2020

Speaker: Ms. Yue Cao (University of Sydney)

Title: Benchmarking of Simulation Methods for Single-cell RNA-seq Data

Abstract: Single-cell RNA-sequencing (scRNA-seq) is a powerful technique for profiling the transcriptome at single-cell resolution and has gained tremendous popularity since its emergence in 2009. In recent years, there has been an increasing number of simulation tools designed specifically for simulating scRNA-seq data. For simulation data to be useful in the development of analytical algorithms, simulation methods must generate a faithful and realistic representation of scRNA-seq data. Using a systematic framework, our study aims to evaluate how well each method captures the underlying biological structure of scRNA-seq datasets. In our evaluation framework, we evaluate a total of 12 simulation methods across 35 diverse datasets with a variety of tissue types, biological conditions and sequencing platforms. We find that some measures are harder for current methods to capture than others, and identify areas that could benefit from further methodological development.

Monday November 2 2020

Speaker: Dr. Weichang Yu (University of Melbourne)

Title: Gaussian Process Discriminant Analysis for Classification of Proteomics Mass Spectra

Abstract: Biomarker detection and prognostic classification are common steps in the analysis of proteomics mass spectrometry data. However, many existing classifiers do not incorporate the spectral nature of the data properly which may result in poor classification performance. In this talk, I will describe a newly developed Gaussian process discriminant analysis that is suitable for classifying mass spectrometry data. The proposed model incorporates feature selection and classification within a unified framework. The spectral nature of the data is accounted for with an appropriate covariance function. The computational efficiency of the model is kept within a reasonable range through variational Bayes and computational shortcuts. I will conclude the talk with numerical results based on simulated and real datasets.

Monday October 26 2020

Speaker: Dr. Fabio Zanini (University of New South Wales)

Title: The Art of Generating Hypotheses from Single Cell Data

Abstract: Single cell transcriptomic data are being amassed by many laboratories and are revealing an amazing and sometimes overwhelming degree of heterogeneity within organisms, tissues, and even within each individual cell type. The cell similarity graph or network is a mathematical object at the core of such data sets, encoding phenotypic heterogeneity in a simplified yet powerful form. I will give an overview of my lab operations, centered around cell graphs and aimed at generating sound and interesting hypotheses for biomedicine via data exploration and deep experimental collaborations. I will present northstar, a new cell clustering/classification approach that is particularly well suited for cancer and developmental biology. I will then discuss two of our recent adventures in biomedicine: (1) constructing a cell atlas of the neonatal lung and (2) understanding the corruption of hematopoietic gene regulatory networks during acute myeloid leukemia.

Monday October 19 2020

Speaker: Mr. Fang Hu (University of Hong Kong)

Title: Optimising Tumor Mutation Burden Estimation from Targeted Panel Sequencing Data

Abstract: Tumor mutation burden (TMB) has emerged as a predictive marker for responsiveness to immune checkpoint blockade in multiple tumor types. The gold standard is to quantify TMB from whole exome data, but in a clinical setting it is generally approximated from targeted panel sequencing data. In this study, we systematically evaluate parameters that could affect panel-based TMB (pTMB) assessment, including panel size, gene content and local mutation determinants. By analysing simulated pTMB across different independent cohorts, we found that panels based on cancer genes usually overestimate TMB, leading to misclassification of patients, who may then receive improper therapy. This may be caused by positive selection for mutations in cancer genes and is unlikely to be alleviated by removing hotspots. To overcome this issue, we develop a parsimonious model that optimises pTMB estimation, with improved performance for patient stratification in clinical management. These findings may be immediately applicable for guiding accurate TMB approximation based on targeted panel sequencing data.
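TMB is conventionally reported as mutations per megabase of sequence covered by the assay, which is why panel content matters: the same tumour can land on either side of a stratification cut-off depending on which regions are counted. The sketch below shows only that arithmetic, with hypothetical mutation counts, footprints and cut-off; the study's parsimonious correction model is not reproduced here.

```python
def tmb_per_mb(n_mutations, covered_bases):
    """Tumour mutation burden: eligible somatic mutations per megabase of
    sequence actually covered by the assay (exome or panel footprint)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    return n_mutations / (covered_bases / 1_000_000)


def is_tmb_high(tmb, cutoff=10.0):
    """Binary stratification; the cutoff is an illustrative parameter."""
    return tmb >= cutoff


# Hypothetical numbers: the same tumour assessed by a whole exome
# (about 30 Mb footprint) and by a much smaller targeted panel.
wes_tmb = tmb_per_mb(240, 30_000_000)
panel_tmb = tmb_per_mb(14, 1_000_000)
print(wes_tmb, panel_tmb, is_tmb_high(wes_tmb), is_tmb_high(panel_tmb))
```

In this toy case the panel estimate exceeds the cut-off while the exome estimate does not, which is the kind of misclassification the abstract attributes to panels enriched for cancer genes.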

Monday October 12 2020

Speaker: Dr. Ignatius Pang (University of New South Wales)

Title: Mapping the Transcriptome Architecture and RNA-RNA Interactions of the Multi-drug Resistant Staphylococcus aureus Uncovers a Mechanism of Antibiotic Resistance

Abstract: Treatment of methicillin-resistant Staphylococcus aureus (MRSA) infections is dependent on the efficacy of last-line antibiotics like vancomycin. Changes in expression of regulatory RNA transcripts have been correlated with antibiotic stress responses in vancomycin-intermediate resistance isolates. The 5’ and 3’ untranslated regions (UTRs) of mRNAs are often the site of regulatory RNA interactions, but these regions are often poorly annotated and uncharacterized. This talk will explore the use of three RNA sequencing techniques (RNA-seq, dRNA-seq and Term-Seq) to identify transcripts and their start and termination sites, and the use of the ANNOgesic pipeline to analyse these data and generate a detailed transcriptome architecture of the methicillin-resistant Staphylococcus aureus JKD6009. We also discuss the use of a custom Snakemake pipeline to identify RNA-RNA interactions from sequencing data generated by CLASH, a technique combining capture of the endoribonuclease RNase III with RNA proximity-dependent ligation. We identified over 900 RNA-RNA interactions, suggesting that mRNA-mRNA regulation of co-expression is much more widespread than previously appreciated.

Monday September 28 2020

Speaker: Dr. Juan Molina Ortiz (Charles Perkins Centre, University of Sydney)

Title: Modelling the Gut Microbiome: From One-size-fits-all to Personalised Dietary Interventions to Improve Health Outcomes

Abstract: The prevalence of non-communicable disease is rising and has become a public health concern. In recent decades, the gut microbiome has been linked to the onset of myriad non-communicable diseases, making it an appealing target for intervention. One way to intervene in the gut microbiome is through diet, and several dietary interventions are readily available to the public. These are often one-size-fits-all approaches, even though evidence shows that each of us has a different set of microbes living in the gut. There is an urgent need for interventions that can be tailored to our individual requirements. Here we argue that developing such interventions requires a better mechanistic understanding of how health outcomes emerge from individual microbes. Computational modelling can help us gain this knowledge by allowing us to examine the gut microbiome from novel perspectives.

Monday September 21 2020

Speaker: Dr. Emily Wong (Victor Chang Cardiac Research Institute)

Title: Impact of Ageing on Lung Regeneration and Tumorigenesis

Abstract: Ageing is an unstoppable process and the strongest risk factor for common diseases. We combine the power of comparative genomics and single-cell technologies to understand the key intrinsic signals involved in the loss of molecular identity and robustness in lung aging, and explore the effect of exercise on remodelling lung gene regulatory networks.

Monday September 14 2020

Speaker: Mr. Justin Miller (Western Sydney University)

Title: Predicting Cataract Development via in silico Gene Knockdown Within a Universe of Lens Signalling Pathways and Associated Gene Regulatory Networks

Abstract: SPAGI is designed to use known protein-protein interactions (PPIs) to simulate potential growth signalling pathways and then use RNA-seq expression data to filter these pathways and predict which paths are expressed in a given cell. In this method, potential paths are scored based on combined PPIs, and the highest-scoring path is retained while the rest of the information is discarded. This method is then leveraged to create a virtual gene knockdown experiment. A machine learning model was trained to distinguish the cell signalling pathways that remain after removing genes known to cause cataracts during development from those that remain after removing genes not known to cause cataracts. This model successfully predicted which cell signalling universe a gene came from in two-thirds of cases, making this a potentially useful method for identifying cataract-causing genes.

Monday September 7 2020

Speaker: Mr. Frederick Jaya (University of Technology Sydney)

Title: Evaluation of Recombination Detection Methods for Viral Sequence Analysis

Abstract: To accurately infer the evolutionary history of viral genomes, the process of recombination needs to be accounted for and addressed appropriately. A wide range of recombination detection methods has been developed over the past 20 years, but their ability to meet the needs presented by high-throughput sequencing of viral data is unclear. Here I present an overview of five published methods for detecting viral recombination (PhiPack (Profile), 3SEQ, GENECONV, UCHIME and gmos), comparing their statistical approaches and the results from simulated and empirical analyses of +ssRNA viral data. I present the key considerations and guidelines for selecting appropriate methods for viral analyses, with examples of how sequence diversity may mislead some methods. Finally, I present how these methods scale to large datasets.

Monday August 31 2020

Speaker: Dr. Ralph Patrick (Victor Chang Cardiac Research Institute)

Title: Tracing Cardiac Cell Networks and Dynamics Across Homeostasis, Injury and Augmented Heart Repair at Single-cell Resolution

Abstract: This talk will give an overview of the scRNA-seq projects in our research group, including some of our bioinformatics method development, such as intercellular communication analysis and differential transcript usage. We will also discuss biological applications, such as understanding the role of cardiac fibroblasts in heart injury and repair, and how these processes are modulated in different contexts such as therapy models or genetic knockouts.

Monday August 24 2020

Speaker: Ms. Yingxin Lin (University of Sydney)

Title: Transfer Learning for Data Integration of Single-cell RNA-seq and ATAC-seq

Abstract: Single-cell transcriptomic profiling with single-cell RNA-seq (scRNA-seq) has provided unprecedented resolution in characterising cell identities and cell functions across diverse tissues and conditions. Recent advances in measuring multiple modalities of single cells, such as single-cell ATAC sequencing (scATAC-seq), further enable characterisation of cells from different aspects. While scATAC-seq data provide the epigenomic profile of cells, their extreme sparsity limits their power for cell type identification. Integrative analysis of scRNA-seq and scATAC-seq therefore allows not only transfer of cell type labels but also a better understanding of cellular phenotypes. We develop an end-to-end transfer learning algorithm, scJoint, to integrate scRNA-seq and scATAC-seq data. By building an integrative framework with neural-network-based dimension reduction and a semi-supervised cell type prediction model, our algorithm is able to transfer labels from scRNA-seq to scATAC-seq data and construct a joint embedding for the two modalities. We illustrate our algorithm using two mouse cell atlas datasets profiled with scRNA-seq and scATAC-seq. We found that our algorithm outperforms existing methods by a large margin in both joint visualisation of the two modalities and cell type prediction.

Monday August 17 2020

Speaker: Ms. Chelsea Mayoh (Children’s Cancer Institute)

Title: Improving the Actionability of RNA-seq in High-Risk Paediatric Cancer

Abstract: The Zero Childhood Cancer (ZERO) program provides a comprehensive precision medicine approach to high-risk paediatric malignancies (less than 30% survival) to improve treatment outcomes. We developed a pipeline to increase the utility of transcriptome sequencing (RNA-seq) in precision medicine by identifying driver fusions, somatic mutations from RNA, and over- and under-expressed genes. Through deeper exploration of RNA-seq beyond expression analysis and integration with whole genome sequencing, the RNA-seq pipeline has expanded targeted therapeutic options to 72% of patients and identified a driver mutation in 94%. Here we present our bioinformatic approaches to integrating the pipelines, the additional clinical utility that a comprehensive RNA-seq pipeline provides, and its impact on patient management and response.

Monday August 3 2020

Speaker: Ingrid Tarr (Victor Chang Cardiac Research Institute)

Title: The Use of Duplicate Samples to Improve Rare Variant Quality Control in Whole Genome Sequencing Studies

Abstract: Whole genome sequencing has transformed our ability to detect associations between phenotypes and genetic variants, but the number of erroneous variant calls has also increased drastically. Even with low error rates, a substantial number of called variants will be false positives. These are particularly concerning in unbiased genome-wide rare variant analyses, where even a small number of false variants can meaningfully affect results, while broad confirmation of variants is infeasible. Currently, evidence to inform rare variant filtering is lacking, and there is no consensus on indicators of poor-quality variants. The talk will discuss the ability of common GATK metrics to discriminate true-positive from false-positive rare variant calls in samples sequenced in duplicate.

Seminars in 2020, Semester 1

Monday June 29 2020

Speaker: Dr. Sarah Romanes (BCG GAMMA)

Title: From PhD Student to Consultant: Changing Careers During a Global Pandemic

Abstract: I share my experiences transitioning from a bioinformatics PhD student at the University of Sydney, to a data scientist working in the world of management consulting. I will discuss the biggest differences I have observed between analysis in academia and consulting, how I have managed to adapt to these changes during the COVID-19 crisis, and what you can expect if you decide that you are interested in a career in consulting!

Monday June 22 2020

Speaker: Dr. Seyhan Yazar (Garvan Institute of Medical Research)

Title: Single-cell eQTL Mapping Identifies Cell-type Specific Control of Complex Disease

Abstract: Genome-wide association studies in large populations have enriched our understanding of genetic variants implicated in health and disease, while expression quantitative trait loci (eQTL) studies with microarray and bulk RNA sequencing data have shown how these variants affect the expression of one or more genes in a tissue-specific manner. However, much less is known about how genetic variants influence gene expression in the different cell types within a tissue. This study therefore set out to identify cell-type-specific eQTLs in human immune cells using single-cell sequencing technology. We performed conditional cis-eQTL analysis on 1,242,226 immune cells from 993 healthy human subjects and identified thousands of independent cis-eQTLs across 14 immune cell types. We show that the majority of these eQTLs were unique to an individual cell type; however, we also identified eQTLs shared across the hematopoietic lineage. By linking GWAS variants with cis-eQTLs in different cell types, we show that disease variants exert their effects in specific cell types. We have shown cell-type-specific control of immune system disease and established a healthy immune cell resource at single-cell resolution to prioritise disease-associated variants in functional studies.

Monday June 15 2020

Special Event: Statistical Bioinformatics and AI for Cancer Care Symposium

Speaker: Prof. Pablo Fernandez-Peñas (University of Sydney)

Title: Maths is Core to Cancer Care: The Melanoma Case

Abstract: In clinical care, we use numbers to understand most of the biological components of the human being and to decide on diagnoses and treatments. Measures such as blood pressure or glucose levels have been around for many years. But some areas have escaped quantification and analysis, and one of the most critical for skin cancer, and melanoma in particular, is imaging. Dermatology is a visual specialty that relies on human skill to make diagnoses. To add more complexity, if clinicians can’t make a diagnosis, the biopsies they take to help them are read by humans using, again, their visual skills. The time is coming for more objective, quantifiable measurements to help with these visual challenges, and for this information to be combined with other sources of clinical data.

Speaker: Assoc. Prof. Jinman Kim (University of Sydney)

Title: AI for Biomedical Image Analysis – Experiences with Skin Lesion Images

Abstract: Medical imaging has an indispensable role in patient management in modern healthcare. Numerous medical imaging modalities are available; they vary in complexity and ‘sophistication’ from plain digital chest X-rays to simultaneous functional and anatomical imaging with positron emission tomography (PET) and computed tomography (CT) in a single device (PET-CT). The challenge now is how to maximise the extraction of meaningful information from the images without overloading the user. Fortunately, in parallel with these imaging improvements, we are in an era of artificial intelligence (AI) fuelling the growth of smart decision support and analysis tools for medical image interpretation. In a matter of a few years, we have seen a rapid rise in research algorithms being integrated into computer-aided diagnosis (CAD) systems for clinical use. Yet from an engineering view, we are only in the infancy of the AI revolution in healthcare. This talk will present trends in AI development for medical images, with examples from our research in skin lesion image analysis.

Monday June 1 2020

Speaker: Assoc. Prof. Joshua Ho and Mr. Hephaes Chau (University of Hong Kong)

Title: A Statistical Method to Identify Cell Types with Differential Abundance in Single Cell RNA-seq Data

Monday May 25 2020

Speaker: Mr. Taiyun Kim (University of Sydney)

Title: scReClassify: Post-hoc Cell Type Classification of Single Cell RNA-seq Data

Abstract: Single-cell RNA-sequencing (scRNA-seq) is a fast-emerging technology allowing global transcriptome profiling at the single-cell level. Cell type identification from scRNA-seq data is a critical task in many research areas, such as developmental biology, cell reprogramming and cancer. Typically, cell type identification relies on human inspection using a combination of prior biological knowledge (e.g. marker genes and morphology) and computational techniques (e.g. PCA and clustering). Because of the incompleteness of our current knowledge and the subjectivity involved in this process, a small number of cells may be mislabelled. We developed scReClassify, a semi-supervised learning framework for ‘post hoc’ cell type identification from scRNA-seq datasets. Starting from an initial cell type annotation with potentially mislabelled cells, scReClassify first performs dimension reduction using PCA and then applies a semi-supervised learning method to learn and subsequently reclassify cells that were likely mislabelled initially to their most probable cell types. Using both simulated and real-world experimental datasets that profiled various tissues and biological systems, scReClassify is shown to accurately identify and reclassify misclassified cells to their correct cell types. scReClassify can be used as a post hoc cell type classification tool for scRNA-seq data to fine-tune cell type annotations generated by any cell type classification procedure. It is implemented as an R package and is freely available from https://github.com/SydneyBioX/scReClassify

Monday May 18 2020

Speaker: Dr. David Humphreys (Victor Chang Cardiac Research Institute)

Title: Annotation: The Backbone and Achilles Heel of Genome Bioinformatics

Abstract: Gene models are an important, sometimes essential, requirement for many genomic bioinformatic pipelines. Interpreting transcript information from gene models can be difficult, and no single bioinformatic tool can cover it comprehensively. This is especially true for RNA-sequencing (RNA-Seq), where an ever-increasing number of techniques continues to be developed. The VCCRI genomic core contributes to the analysis of RNA-Seq and whole genome sequencing (WGS) data sets. In this presentation, I will highlight observations that were made beyond the use of standard tools and how this led to the development of two new bioinformatic tools called Ularcirc and Sierra. From this analysis we conclude that the heart transcriptome is remarkably complex and that a completely annotated gene model does not yet exist. We anticipate that improving the annotation and interpretation of gene models will lead to better interpretation of variants detected in WGS.

Monday May 11 2020

Speaker: Dr. Ismael Vergara (Melanoma Institute of Australia and Peter MacCallum Cancer Centre)

Title: Evolution of Late-stage Metastatic Melanoma is Dominated by Tetraploidization and Aneuploidy

Abstract: Australia has one of the highest incidences of melanoma in the world, and it has been referred to as our national cancer. Survival rates for melanoma are poor if it is not caught early. Recently, understanding of the molecular events that dominate the landscape of early disease has benefited from genomic sequencing, but how melanoma evolves into its metastatic and lethal form is poorly understood. To help rectify this, a rapid autopsy program, CASCADE (CAncer tiSsue Collection After DEath), was established at the Peter MacCallum Cancer Centre to provide multi-region sampling of metastases from patients at the time of death. We obtained sequencing data, including whole-exome (WES) and whole-genome (WGS) sequencing, from more than 70 samples from 13 patients. The multi-sample structure of this dataset prompted us to apply an analysis approach that builds on existing methods and makes use of the multiple samples from each patient.

Our analysis reveals striking patterns in the evolution of lethal melanoma. While early melanomas have large numbers of single nucleotide variants (SNVs), we generally observed limited subsequent SNV and indel gain. Rather, evolution was dominated by large-scale copy number change, including a remarkable level of loss of heterozygosity in some patients. In one case, multicore sampling revealed spatial heterogeneity in the copy number of the primary tumour. Patterns of copy number change hinted that two mutational processes, aneuploidy and genome doubling, were operating universally. To test this, we developed a novel method that models these mechanisms using branching processes.

Our findings in lethal melanoma suggest possible biomarkers that might be clinically useful in challenging settings, such as stage III disease, where patients show significant clinical heterogeneity. We are developing these further using a training set of 55 sequencing datasets from primary disease.

Monday May 4 2020

Speaker: Mr. Aedan Roberts (University of Technology Sydney)

Title: A Bayesian Hierarchical Model for Detecting Differential Gene Expression Distributions for RNA-seq Data

Abstract: Gene expression data from diseases such as cancer offer the potential both to gain a deeper understanding of disease processes and to improve diagnosis by classifying patients who have similar clinical features but clinically relevant differences at the genetic level. Identifying genes of biological relevance, or as features for classification algorithms, has traditionally relied on assessing differential expression (differences in average gene expression levels between classes), but there is evidence that differences in variance and other distributional properties can also be informative in identifying genes associated with disease. This research aims to develop methods for identifying disease-relevant genes that take into account all available information on the distribution of gene expression, in particular the mean and the negative binomial dispersion parameter for RNA-seq count data. We have developed a Markov chain Monte Carlo algorithm based on a Bayesian hierarchical model to identify differences in mean, dispersion or overall distribution between groups. By sharing information across genes, the method improves inference for small sample sizes compared with the few existing methods for detecting differences in variance. Testing on simulated data and on real RNA-seq datasets with artificially induced differences in expression shows that the proposed method is competitive with existing methods for detecting differential expression and outperforms existing methods for detecting differences in variance. It also allows assessment of overall differences in distribution between groups, so it has the potential to identify genes that may be involved in disease or have potential for prognostic classification but would be missed by traditional methods.

Monday April 27 2020

Speaker: Ms. Yunwei Zhang (University of Sydney)

Title: Combining Machine Learning and Survival Analysis to Identify Recipient Sub-cohorts in a Heterogeneous Kidney Transplantation Population

Abstract: Kidney transplantation is the main treatment for end-stage renal disease, and the prognosis of allograft survival is what recipients care about most. A popular method for predicting allograft survival in kidney transplantation is the Cox proportional hazards model. There is a substantial literature, and the performance of published models varies greatly. One possible explanation for this variability is the high heterogeneity intrinsic to the transplant population. We propose two complementary approaches (bottom-up and top-down) that aim to identify recipient sub-cohorts based on the inherent structure of the data, to improve prediction of allograft survival. The innovation of our approaches lies in combining supervised and unsupervised learning, integrating machine learning methods with survival analysis. The bottom-up approach uses Numero, a new self-organising-map method, with the elastic net Cox model to stratify potential recipient sub-cohorts. The alternative top-down approach uses the Cox model with a contrast tree method to identify cohort characteristics. Examining the results from both approaches, we find that recipient waiting time is an important predictor of graft survival for the whole population. We also find substantial heterogeneity among ‘unfit’ recipients, who include sub-cohorts whose graft survival is particularly hard to predict. In contrast, for younger and ‘fit’ cohorts, we found that immunological factors are important components. The ability to identify sub-cohorts based on prediction outcome is useful for enhancing prediction of graft survival and has the potential to guide allocation algorithms.

Monday April 20 2020

Speaker: Ms. Tingting Gong (Garvan Institute of Medical Research)

Title: Refining Somatic Structural Variant Detection for Precision Oncology

Abstract: Somatic structural variants (SVs), which typically affect more than 50 nucleotides, play a significant role in cancer development and evolution, but are notoriously more difficult than small variants to detect from short-read next-generation sequencing (NGS) data. This is due to a combination of challenges arising from tumour purity, tumour heterogeneity, the limitations of short-read information from NGS, and sequence alignment ambiguities. Despite active development of SV detection tools over the past few years, each method has inherent advantages and limitations. We aim to evaluate the variables affecting our ability to detect somatic SVs accurately and to support informed decision-making about the most influential factors. Using simulation studies, we evaluated the single and combined effects of SV caller, SV type and size, variant allele frequency (tumour purity), sequencing depth of coverage, and variant breakpoint resolution. Fitting a generalised additive model allowed sensitivity and precision to be predicted for any combination of predictors. The prediction model was implemented in a web-based application, called Shiny-SoSV, which is freely available at https://hcpcg.shinyapps.io/shiny-sosv. Shiny-SoSV provides an interactive and visual platform for users to easily compare the individual and combined impact of different parameters. It predicts, in silico, the somatic SV detection performance of a proposed study design before experiments begin.

Monday April 6 2020

Speaker: Ms. Hani Jieun Kim (University of Sydney)

Title: CiteFuse Enables Multi-modal Analysis of CITE-seq Data

Abstract: Multi-modal profiling of single cells represents one of the latest technological advancements in molecular biology. Among various single-cell multi-modal strategies, cellular indexing of transcriptomes and epitopes by sequencing (CITE-seq) allows simultaneous quantification of two distinct species: RNA and cell-surface proteins (measured as antibody-derived tags, ADTs). Here, we introduce CiteFuse, a streamlined package consisting of a suite of tools for doublet detection, modality integration, clustering, differential RNA and protein expression analysis, ADT evaluation, ligand-receptor interaction analysis, and interactive web-based visualisation of CITE-seq data. Using both simulations and real-world CITE-seq data, we show the capacity of CiteFuse to integrate the two data modalities and its relative advantage over single-modality profiling. Furthermore, we illustrate a novel doublet detection method based on a combined index of cell hashing and transcriptome data. Finally, we demonstrate the use of CiteFuse for predicting ligand-receptor interactions with multi-modal CITE-seq data. Collectively, we demonstrate the utility and effectiveness of CiteFuse for the integrative analysis of transcriptome and epitope profiles from CITE-seq data.

Seminars in 2019, Semester 2

Monday October 28 2019

Speaker: Associate Professor Sarah Kummerfeld (Garvan Institute)

Title: Realising the clinical potential of multi-omics

Abstract: Whole genome sequencing is now well established as a clinical tool. Interpretation of genomic DNA is being used for a broad range of clinical applications, including diagnosis of rare diseases, pharmacogenomics and understanding of complex disease. However, genomic DNA represents only one slice of the biology of disease. Through two very different vignettes, I will present our work using a broad range of -omics technologies to understand disease and derive clinically relevant biomarkers.

About the speaker: Associate Professor Sarah Kummerfeld is the Scientific Head of the Kinghorn Centre for Clinical Genomics at the Garvan Institute of Medical Research. She uses genomics to understand human disease and translate findings into clinical diagnostics and treatments. Sarah completed her PhD in Computational Biology at the University of Cambridge, working on protein structure and function prediction. Her postdoctoral research at Stanford University studied the molecular basis of human ageing. Sarah has worked in both academia and industry, including 10 years as a Scientist at Genentech, based in the San Francisco Bay Area. At Genentech, she used large-scale genomics approaches to understand why only some patients respond to treatment and to identify diagnostic biomarkers that predict response to particular drugs. Sarah is dedicated to applying advances in genomics research to benefit patients.

Monday October 21 2019

Speaker: Professor Jean Yang (University of Sydney)

Title: Single cell data integrative analysis

Abstract: Recent advances in large-scale single-cell transcriptome profiling have greatly expanded cell-type-specific characterisation of complex biological systems. This enables discovery of many heterogeneous cell types, and differences in cell-type proportions often carry biological significance. A critical first step towards understanding such differences is the accurate identification of cell types from complex tissues and organs. A common approach achieves this by unsupervised clustering followed by manual annotation according to marker gene expression. With the increasing availability of large collections of scRNA-seq datasets generated from the same tissues, organs and biological systems, as well as comprehensive human and mouse cell atlases, we are now at the transition point where supervised classification may be trained to classify cell types accurately. In this talk, I will discuss a number of approaches developed at Sydney to address methodological challenges associated with single-cell data. We will discuss a novel single-cell differential composition (scDC) approach that performs differential cell-type composition analysis via bootstrap resampling. We will introduce a multiscale classification framework (scClassify) for single-cell classification based on a cell type hierarchy and ensemble learning, where scClassify effectively annotates cells at different levels of the cell type hierarchy. Finally, for a given training dataset, scClassify implements a sample size estimation procedure to determine the number of cells required for accurate cell type classification at a given level of the cell type hierarchy.

About the speaker: Professor Jean Yang is an applied statistician with expertise in statistical bioinformatics. She was awarded the 2015 Moran Medal in statistics from the Australian Academy of Science in recognition of her work on developing methods for molecular data arising in cutting-edge biomedical research. Her research stands at the interface between medicine and methodology development and has centred on the development of methods and the application of statistics to problems in -omics and biomedical research. She has contributed to the development of novel statistical methodology and software for the design and analysis of high-throughput biotechnological data, including data from microarrays, mass spectrometry and next-generation sequencing. Recently, much of her focus has been on integrating multiple biotechnologies with clinical data to answer a variety of scientific questions. This includes developing various approaches and methodologies in statistical machine learning and network analysis. As a statistician who works in the bioinformatics area, she enjoys research in a collaborative environment, working closely with scientific investigators from diverse backgrounds.

Monday October 14 2019

Speaker: Dr Pengyi Yang (University of Sydney)

Title: Systems stem cell biology

Abstract: The ability of stem and progenitor cells to differentiate into specialised cells is essential for organogenesis and opens the possibility of regenerative medicine, where damaged cells in tissues and organs could be replaced using stem cell-derived cells. Understanding the identities and fates of stem/progenitor cells is the foundation for controlling stem cell differentiation and for using these cells in stem cell-based therapies. In this talk, I will present three studies where systems approaches were used to understand cell identities and fate decisions in stem cells and during their differentiation. The talk will showcase how computational methods can be applied to make biological discoveries and expand our knowledge of stem cell biology.

About the speaker: Pengyi Yang is the Group Leader of the Computational Systems Biology group at Children's Medical Research Institute, at the Westmead Research Hub. He also heads the Computational Trans-Regulatory Biology group at Charles Perkins Centre, and holds a Senior Lectureship at the School of Mathematics and Statistics, the University of Sydney. As a systems biologist cross-trained in computer science, statistics, and systems biology, Pengyi combines computational and statistical methods to model trans-regulatory networks in stem and progenitor cells using large-scale multi-layered omic data.

Monday September 30 2019

Speaker: Dr Heejung Shim (University of Melbourne)

Title: riboHMM*: Comprehensive annotation of translated regions using ribosome footprint profiling data

Abstract: Understanding the functional effects of gene expression critically depends on accurate and comprehensive annotation of the sequence elements that are translated in each gene. Ribosome profiling provides direct and genome-wide measurements of translation levels in a given cell type. In this talk, I will first introduce a method, riboHMM, that 1) models the codon periodicity structure in ribosome profiling data, and 2) integrates RNA sequence information and transcript expression levels to identify translated regions in a transcript. Applying riboHMM to ribosome profiling data collected from human lymphoblastoid cell lines, we identified 7273 novel translated regions, including 2442 translated upstream open reading frames (uORFs) and 2551 coding sequences from transcripts that were previously annotated as non-coding. We observed that more than 60% of the novel coding sequences use non-canonical start codons. We also observed that ~40% of the 2442 translated uORFs are likely to regulate the translation of their downstream coding regions.

About the speaker: I am a Group Leader in Melbourne Integrative Genomics (MIG) and a Lecturer (equivalent to Assistant Professor in the US) in the School of Mathematics and Statistics at the University of Melbourne. I completed my BS in Mathematics (with a double major in Computer Science and Engineering) at POSTECH, and my PhD in Statistics at the University of Wisconsin-Madison, advised by Dr. Bret Larget. I did a postdoc at the University of Chicago working with Dr. Matthew Stephens. Before my current position, I was a tenure-track Assistant Professor in the Department of Statistics at Purdue University for two years. I currently retain an affiliation with Purdue as an Adjunct Assistant Professor.

Monday September 23 2019

Speaker: Professor Eric Stone (ANU)

Title: Getting serious about graphical structures in genome sciences

Abstract: Systems biology, defined broadly, is the study of how components of a biological system interact. Graphs, meanwhile, are general representations of the pairwise interactions (edges) between arbitrary components (vertices). It is no wonder, then, that graphs pervade systems biology and genome sciences in general. This talk is an attempt to lay some ground rules for making sense of them. I will focus on the ubiquitous issue of “missing vertices” that correspond to unmeasured components of the biological system. To do so, I will introduce a formal definition of graphs with missing vertices, and I will discuss how and why these objects are amenable to theory. Subsequently, I will discuss how theoretical results can be leveraged to make biological inferences. I aim to provide a range of biological applications and illustrations spanning phylogenetics, population genetics, systems biology and beyond. While this talk will synthesise concepts from mathematics (e.g. spectral graph theory) and multivariate statistics (e.g. principal components analysis and multidimensional scaling), it does not assume prior knowledge of these areas.

About the speaker: Eric is a quantitative biologist who combines statistical methods and mathematical theory to investigate how genetic variation has shaped biological diversity. He studied Mathematics at the University of Florida and Princeton University before training in Statistics and Genetics at Stanford University. He joined the Australian National University in mid-2016 after eleven years on the faculty at North Carolina State University. He is founding Director of the ANU Biological Data Science Institute as well as Director of the ANU-CSIRO Centre for Genomics, Metabolomics and Bioinformatics.

Monday September 16 2019

Speaker: Justine Charon (University of Sydney)

Title: Hunting the parasites of our parasites: meta-transcriptomics-based identification of viral sequences associated with the human malaria parasite Plasmodium vivax.

Abstract: Eukaryotes of the genus Plasmodium cause malaria, a parasitic disease responsible for substantial morbidity and mortality in humans. Yet the nature and abundance of any viruses carried by these divergent eukaryotic parasites are unknown. We investigated the Plasmodium virome by performing a meta-transcriptomic study of blood samples taken from patients suffering from malaria and infected with P. vivax, P. falciparum or P. knowlesi. This resulted in the identification of a novel RNA virus that we term Matryoshka RNA virus 1 (MaRNAV-1), which encodes an RNA polymerase and is restricted to P. vivax, as well as an associated hypothetical viral segment of unknown function. Additional screening revealed that MaRNAV-1 was abundant in geographically diverse P. vivax derived from humans and mosquitoes. Related bi-segmented narnavirus-like sequences (MaRNAV-2) were also retrieved from Australian birds infected with Leucocytozoon, a genus of eukaryotic parasites that groups with Plasmodium in the Apicomplexa subclass Hematozoa. Together, these data support the establishment of two new phylogenetically divergent and genomically distinct viral species of protists, including the first virus infecting Plasmodium parasites. As well as broadening our understanding of the diversity and evolutionary history of the eukaryotic virosphere, the restriction of infection to P. vivax may be important for understanding P. vivax-specific biology in humans and mosquitoes, and how viral co-infection might alter host responses at each stage of the P. vivax life cycle.

About the speaker: I have been a postdoc in Edward Holmes's lab since February 2018. My research interests focus on the evolution and diversity of RNA viruses, from the molecular determinants of their remarkable evolutionary capacities to the characterisation of the RNA virosphere in various unicellular eukaryotes. More precisely, I am currently using meta-transcriptomic approaches to investigate RNA virus diversity in Plasmodium, the human parasites responsible for malaria, as well as in various taxa of microalgae that have so far been completely overlooked.

ORCID iD: https://orcid.org/0000-0002-5602-6600
PhD in Molecular Plant Virology, obtained in December 2015 at the University of Bordeaux, France – Dr. Thierry Candresse group. Worked on the molecular and structural determinants of RNA virus evolution and adaptation, using a plant-virus pathosystem as a model.
2016: Postdoc at INRA, Bordeaux (France) – Dr. Thierry Candresse group: in vitro experimental assessment of intrinsic disorder in RNA virus proteins as an enhancer of viral evolutionary potential.
2017: Postdoc at IECB, Bordeaux (France) – Dr. Axel Innis group: identification of new antimicrobial peptides using the E. coli ribosome as a high-throughput selection platform.
Since 2018: Postdoc at the University of Sydney – Prof. Edward Holmes group: characterisation of RNA virus diversity in unicellular eukaryotes through the intensive use of meta-transcriptomic approaches.

Monday September 9 2019

Speaker: Peter Priestley (Hartwig Medical Foundation)

Title:

Abstract: Structural variation is one of the key drivers of tumorigenesis, but the genomic rearrangements in many tumor genomes are frequently overwhelmingly complex. We have created LINX, a novel tool that facilitates integrated analysis, interpretation and visualisation of structural variation and copy number in tumor genomes. In this talk, I will explain how LINX clusters raw structural variants into consistent events and predicts the chaining of derivative chromosomes and their genic impact. I will also demonstrate the novel visualisation capabilities of LINX. Improved interpretation of genomic rearrangements can lead to novel clinically relevant findings and improve insight into tumorigenesis.

About the speaker: Peter is the bioinformatics lead for Hartwig Medical Foundation and director of its Australian subsidiary. Hartwig Medical Foundation is a non-profit institute based in the Netherlands that focuses on whole genome sequencing of cancer patients for research, and it has created the world’s largest database of whole genomes from metastatic cancers. Peter’s team focuses on developing novel bioinformatic tools for whole genome analysis and applying them to patient reporting to support clinical decision making.

Monday September 2 2019

Speaker: Ellis Patrick (University of Sydney)

Title: Highly multiplexed imaging cytometry to investigate cell-type interactions in situ

Abstract: Understanding the interplay between different types of cells and their immediate environment is critical for understanding the mechanisms of cells themselves and their function in the context of human diseases. Recent advances in high-parameter imaging cytometry technologies have fundamentally revolutionized our ability to observe these complex cellular relationships, providing an unprecedented characterisation of cellular heterogeneity in a tissue environment. In this presentation, I will provide an overview of a selection of these exciting new assays and the scientific hypotheses that they enable. I will also offer my perspective on some of the analytical methods available and introduce a method that I have developed for exploring patterns of spatial organisation of cell-types.

About the speaker: Dr Ellis Patrick is an applied statistician and bioinformatician. He is currently a Lecturer and Early Career Development Fellow in the School of Mathematics and Statistics and a staff member at The Westmead Institute for Medical Research. He obtained his PhD in statistical bioinformatics in the School of Mathematics and Statistics at the University of Sydney. In his postdoctoral studies, he worked as a computational biologist with joint appointments at Brigham and Women's Hospital, Harvard Medical School and The Broad Institute of MIT and Harvard. He spent this time using his statistical background to investigate the molecular drivers of Alzheimer's disease and multiple sclerosis (MS). As he spends most of his time analysing large biomedical datasets, his research relies on the subtle task of translating between biological and statistical concepts to form simple, suitable and targeted statistical questions.

Monday August 26 2019

Speaker: Belinda Phipson (MCRI)

Title: Using transcriptomics to understand the variability between human kidney organoids

Abstract: The ability to make three-dimensional organoids from human pluripotent stem cells through directed differentiation opens up possibilities for personalised drug testing, disease modelling and regeneration, as well as enhancing our knowledge of organ development. However, successfully using organoids for drug screening or disease modelling will rely on the robustness and transferability of the protocol between cell lines. I will talk about how we examined the reproducibility and robustness of a specific kidney organoid protocol using RNA-seq and single cell RNA-seq data across a total of 54 whole organoid samples.

We designed and performed extensive RNA-seq profiling of kidney organoids taken from various time points across the differentiation protocol. In addition, we generated a series of day 18 organoids derived from distinct iPSC clones, differentiations separated in time, as well as organoids grown concurrently from the same starting cells in separate vials. This allowed us to specifically model the gene-wise sources of variability that arise during the step-wise differentiation process. While individual organoids within a differentiation experiment were highly correlated, greater variation was seen between experimental batches. The most highly variable genes between differentiations were found to be associated with organoid maturation as defined by our time course analysis. Single cell profiling of organoids revealed shifts in patterning and cell type proportions in line with this observation. Finally, I will show how we have applied what we have learned to disease modelling studies, thereby increasing the utility of kidney organoids for personalised medicine and functional genomics.

About the speaker: Dr Belinda Phipson completed her undergraduate degree in Applied Mathematics and Statistics and her Masters degree in Biostatistics in South Africa. She worked briefly as a Biostatistician before relocating to Australia in 2007 and transitioning to Bioinformatics. She joined Professor Gordon Smyth’s group at the Walter and Eliza Hall Institute of Medical Research where she first worked as a Statistical Consultant and then enrolled in a PhD. During her PhD she worked on empirical Bayes methods for gene expression data. She completed her PhD in 2013 before joining the Murdoch Children’s Research Institute as a post-doctoral researcher with Associate Professor Alicia Oshlack. Her current research focusses on methods development and analysis of single cell RNA-seq data, as well as methods development for methylation array data.

Monday August 19 2019

Speaker: Thuc Le (UniSA)

Title: Causality Discovery and Applications in Bioinformatics and Cancer Research

Abstract: In many real-world applications, the research questions of interest concern causality rather than association, whether the goal is better explanation, prediction or decision making. Causal discovery aims to answer causality-related questions by inferring the cause-effect relationships between variables. Traditionally, causal relationships are identified through interventions or randomised controlled experiments. However, such experiments are often expensive, and sometimes impossible for ethical reasons. There has therefore been increasing interest in discovering causal relationships from observational data, and over the past few decades computer scientists have made significant contributions to this field. In this talk, I will briefly introduce causal discovery approaches and then discuss a few applications in bioinformatics and cancer research, including inferring miRNA regulatory relationships, predicting cancer treatment responses, and identifying cancer drivers.

About the speaker: Thuc is a Senior Lecturer in the School of Information Technology and Mathematical Sciences, University of South Australia. He is currently an NHMRC Early Career Research Fellow in Bioinformatics (2017-2020). His research focuses on the development of causal inference methods and their applications in Bioinformatics, particularly in gene regulatory networks, cancer drivers, non-coding RNAs, and cancer subtype discovery.

Monday August 12 2019

Speaker: Erdahl Teber (CMRI)

Title: Framework for determining chromosome-arm specific telomere sequence length and content.

Abstract: Current methods to measure telomere length and acquire telomere sequence content have substantial technical limitations and cannot efficiently resolve haplotype-specific subtelomere and telomere sequences. Dr Teber will discuss the challenges and limitations of using long-read sequencing to assemble contigs spanning the extensively repetitive telomeric (TTAGGG)n regions.

About the speaker: Dr Erdahl Teber is Team Lead at CMRI’s Bioinformatics and Data Science Research Core Facility and a Conjoint Lecturer at Sydney Medical School, University of Sydney. His research interests are in developing algorithms and code to help unpack the genes that drive cancer, stem cell biology and regenerative medicine.

Seminars in 2019, Semester 1

Show talks from Semester 1 / Hide talks from Semester 1

Monday June 24 2019

Speaker: Daniel Cameron (WEHI)

Title: GRIDSS2: detecting undetectable structural variants

Abstract: Structural variants play a major role in the development of cancer and in genetic disorders. With the bulk of genetic studies focusing on single nucleotide variants and small insertions and deletions, structural variants are often overlooked. This is partly due to the greater complexity of the analysis required, and partly due to the greater difficulty of detection. In this talk, I will show how the single breakend reporting capability unique to GRIDSS2 enables it to identify structural variants in regions currently considered inaccessible to short-read sequencing. Further demonstrating the power of this approach, I will present results from the Hartwig Medical somatic SV pipeline. This pipeline integrates GRIDSS2 with PURPLE, a somatic copy number, purity and ploidy estimation tool, and with LINX, an SV/CNV analysis, visualisation and interpretation tool. Together they uncover the structural mutational landscape of cancer at unprecedented resolution and accuracy.

About the speaker: After completing a double degree in science and engineering, Daniel worked for over a decade as a professional software engineer, leading teams of up to 30 developers building large-scale software systems. In 2012, he commenced a PhD in the Bioinformatics Division of the Walter and Eliza Hall Institute of Medical Research (WEHI), where he focused on improving structural variant detection, developing the Genomic Rearrangement Identification Software Suite (GRIDSS). Since graduating, he has focused on understanding the role of genomic rearrangements in cancer and has been working as part of the Hartwig Medical Foundation Bioinformatics team uncovering the landscape of genomic rearrangement in metastatic cancer.

Monday May 27 2019

Speaker: Charmaine Tam / Madhura Killedar (University of Sydney/SIH)

Title: Harnessing data in electronic medical records to characterise the management of patients with acute chest pain

Abstract: The widespread and rapid adoption of electronic medical records (eMR) has generated much promise for using these data to inform scientific discovery and clinical decision support. In this talk, we present the methodology and preliminary findings from our proof-of-concept study, SPEED-EXTRACT. The study extracts electronic medical record data from patients presenting with suspected acute coronary syndrome to the Northern Sydney and Central Coast Local Health Districts. The main goal of SPEED-EXTRACT is to evaluate the accuracy of current administrative ICD10 coding for a diagnosis of “STEMI”, a serious type of heart attack, and subsequently to develop algorithms for classifying “gold standard” STEMI diagnoses. Other goals are to examine whether data from the eMR can be used to benchmark quality and safety, and to return relevant analytical findings to clinicians.

About the speaker: Charmaine Tam is a Senior Research Fellow and Project Lead for the SPEED-EXTRACT project at Northern Clinical School and the Centre for Translational Data Science. Madhura Killedar is a Data Science Research Engineer Group Lead at the Sydney Informatics Hub. Together they work in an analytics team that shares a common vision: informatics is a compelling approach for scientific discovery, quality improvement in clinical care and effective healthcare services.

Monday May 20 2019

Speaker: Dr. Edward Hancock (University of Sydney)

Title: The synergy between buffering and feedback in metabolic regulation

About the speaker: Edward completed a BSc (Maths) and BE at Sydney University, and a DPhil (PhD) in control and dynamical systems at Oxford University. He was subsequently a post-doc in synthetic biology at Oxford before starting in the Coffey LifeLab in the Charles Perkins Centre at Sydney. He has spent time as a visiting researcher at ANU, Imperial College London and Caltech.

Monday May 13 2019

Speaker: Francisco Avila Cobos (Ghent University)

Title: Impact of data transformation, pre-processing and choice of method in the computational deconvolution of transcriptomics data

Abstract: Gene expression analyses of bulk tissues often ignore cell type composition as an important confounding factor, resulting in a loss of signal from lowly abundant cell types. Many computational methods have been developed to infer the proportions of individual cell types from bulk transcriptomics data, a task known as computational deconvolution. Previous comparisons of these methods revealed that the choice of reference signatures is more important than the method itself. However, an evaluation of the combined impact of data transformation, pre-processing and methodology on the results is still lacking. Using single-cell RNA-sequencing (scRNA-seq) data from human pancreas and PBMCs, we artificially generated hundreds of pseudo-bulk mixtures with varying numbers of cells and cell types in known proportions, allowing us to evaluate the combined impact of these choices on the deconvolution results.

About the speaker: Francisco Avila Cobos obtained a BSc in Biotechnology from the University of León (Spain, 2013) and completed an MSc in Bioinformatics and Computational Biology at University College Cork (Ireland, 2015). Thanks to a Special Research Fund (BOF) scholarship from Ghent University (Belgium), he then became a PhD fellow under the supervision of Prof. Katleen De Preter and Prof. Pieter Mestdagh. Funded by a grant for a long stay abroad from the Research Foundation Flanders (FWO), he is currently a visiting researcher at the Garvan-Weizmann Centre for Cellular Genomics in Sydney (single cell and computational genomics division) under the supervision of Prof. Joseph Powell.

Monday May 6 2019

Speaker: Dr. Conan Wang (University of Queensland)

Title: Let’s Get Back into Shape! Structuring peptides into drugs

Abstract: Large-scale analyses of disease datasets have uncovered a multitude of pathways and protein-protein interactions that can now be targeted for therapeutic gain. Peptides, nature’s miniature proteins, are one class of molecules that has attracted industry interest for designing next-generation drugs to bind these targets, because of their potential for high specificity and low toxicity. I am particularly interested in peptides with privileged biopharmaceutical properties, namely exceptional structural stability and favourable thermodynamic binding properties, owing to their unique and highly constrained architectures. Many of these peptides are found in nature: in the venom of spiders, in the seeds of sunflowers, and in the leaves of your everyday garden herbs. Many more continue to be discovered through bioinformatic screens of public databases. I support the view that understanding their structures will allow us to design better drugs. Such drugs would have high shape complementarity to their targets, resulting in potent binding affinities and specificities, and could be delivered effectively to those targets. In this talk, I will discuss how the structures, or shapes, of peptides and proteins can be used to guide the design of new drug leads by mining structural databases. I will also discuss how simulating their dynamics in biological environments can help us deliver them better, increasing efficacy and patient compliance. Finally, I will cover recent methods for determining peptide structures that have emerged from analyses of packing preferences within crystals. By the end of this talk, I hope to convey the utility of structural investigations in the molecular engineering of new bioactive compounds.

About the speaker: Dr Wang is a Senior Research Officer at the Institute for Molecular Bioscience, the University of Queensland, Brisbane. He began his training as a bioinformatician pressing keys at the University of New South Wales. Since then, he has broadened his research experience as a structural biologist lifting pipettes and test tubes on an NHMRC ECR Fellowship at Hong Kong University of Science and Technology, Griffith University, and now at the University of Queensland. He works at the interface between experimentation and computation in industry collaborations to design new bioactive molecules to treat diseases, such as multiple sclerosis, cardiovascular disease, and cancer, or control environmental pests. He is interested in understanding the structures of these bioactive compounds and using that knowledge to improve design approaches. With the time he has spent in the lab, he may not be in shape but hopefully his molecules are!

Monday April 29 2019

Speaker: Dr. Robert Weatheritt (Garvan Institute of Medical Research)

Title: Efficient and Accurate Quantitative Profiling of Alternative Splicing Patterns of Any Complexity on a Laptop

Abstract: Alternative splicing (AS) is a widespread process underlying the generation of transcriptomic and proteomic diversity and is frequently misregulated in human disease. Accordingly, an important goal of biomedical research is the development of tools capable of profiling AS comprehensively, accurately and efficiently. Here, we describe Whippet, an easy-to-use RNA-seq analysis method that rapidly models and quantifies AS events of any complexity without loss of accuracy, and with hardware requirements compatible with a laptop. Using an entropic measure of splicing complexity, Whippet reveals that one-third of human protein-coding genes produce transcripts with complex AS events involving co-expression of two or more principal splice isoforms. We observe that high-entropy AS events are more prevalent in tumors relative to matched normal tissues and correlate with increased expression of proto-oncogenic splicing factors. Whippet thus affords rapid and accurate analysis of AS events of any complexity, and as such will facilitate future biomedical research.

About the speaker: Robert Weatheritt studies the impact of post-transcriptional regulation on proteomic diversity. He did his PhD at EMBL Heidelberg and undertook postdoctoral research in Cambridge and Toronto. He is now an EMBL Australia group leader at the Garvan Institute of Medical Research.

Monday April 15 2019

Speaker: Dr. Shila Ghazanfar (CRUK, Cambridge)

Title: Characterization of differential correlation across single cell differentiation trajectories with scDCARS

Abstract: Single cell RNA-seq data places us in an unprecedented position to examine patterns of variation and, importantly, co-variation of genes across cells along continuous differentiation trajectories. We recently presented Differential Correlation Across Ranked Samples (DCARS), a statistical method to identify differentially correlated gene pairs across a set of ranked samples, representing either discrete or continuous patterns of group identity. Here, we describe a new approach, scDCARS, a framework in which changes in correlation are examined across a differentiation trajectory. We demonstrate scDCARS with liver developmental data and find key cascading changes in the coordination of gene subnetworks, including those associated with the cell cycle and lipoprotein metabolism. Furthermore, we present scDCARS as part of the DCARS package, as well as an interactive Shiny application that scientists can readily use to interrogate new data. This work provides a unique lens through which higher-order interactions among genes can be unpicked, helping us understand the landscape of cell fate choice.

About the speaker: Dr. Shila Ghazanfar is a Royal Society Newton International Fellow and Research Associate working at the Cancer Research UK Cambridge Institute. She completed her PhD in statistical bioinformatics at The University of Sydney in the School of Mathematics and Statistics. Her current research interests are in the statistical analysis of data arising from high throughput sequencing technologies such as single cell RNA-Seq and spatially resolved single cell transcriptomics in various research contexts.

Monday April 8 2019

Speaker: Dr. Qing Zhong (Group Leader, Cancer Data Science, Children’s Medical Research Institute)

Title: Cancer Data Science for Mass Spectrometry-based Proteomics

Abstract: Dr Qing Zhong will give an overview of ProCan, a flagship program at the Children’s Medical Research Institute. The aim of ProCan is to generate and analyse, over the next 7 years, a pan-cancer proteome database of tens of thousands of human cancers of all tumour types. He will then introduce mass spectrometry-based proteomics and a typical ProCan workflow, showing how biological samples are turned into permanent digital proteome maps. Next, he will talk about some ongoing projects. These include the discovery of prognostic biomarkers for stratifying prostate cancer patients with intermediate Gleason scores, proteomic profiling of 1000+ cell lines from the Sanger Institute, investigation of ovarian tissue heterogeneity, and proteogenomic analysis of lung cancer. In addition to these research topics, he will present several technical studies on feature stability in machine learning and on cross-instrument reproducibility. ProCan believes that these projects will add to the landscape of precision cancer medicine and facilitate the delivery of molecular data to cancer clinicians in a clinically relevant time frame, maximising the accuracy of treatment decisions.

About the speaker: Dr Qing Zhong is a data scientist with expertise in analysing biological and medical data using machine learning techniques. He has a science doctorate from the Swiss Federal Institute of Technology (ETH) Zurich, and a decade of experience working in an interdisciplinary environment involving collaboration between biologists, clinicians and industry partners. His postdoctoral training was at the University of Zurich and its affiliated hospital. There he developed expertise in omics data analysis, and designed and performed a proof-of-concept study to test a clinical big data system for consolidating genomic, clinical and demographic information into a unified model for precision and data-driven medicine. He joined Children’s Medical Research Institute (CMRI) in 2017 to head the Cancer Data Science group.

Monday April 1 2019

Speaker: Dr. Natalie Twine (CSIRO)

Title: TRIBES: Cryptic relationship and disease variant discovery in Amyotrophic Lateral Sclerosis

Abstract: Amyotrophic lateral sclerosis (ALS) is a devastating and lethal neurodegenerative disorder. The majority of ALS cases (90%) are sporadic (SALS), while the remaining cases are familial (FALS). Because FALS gene mutations sometimes appear in apparently sporadic cases, we hypothesise that some SALS cases are distantly related. We have developed a genetic ancestry tool, TRIBES, which is faster than comparable relatedness tools such as ERSA and more accurate than KING. Using TRIBES, we have identified cryptic relatives in a large ALS whole genome sequencing (WGS) cohort. We discovered a single haplotype connecting 19 FALS families, highlighting previously unknown relatedness. We also discovered novel 5th- and 6th-degree relationships connecting SALS cases. Crucially, genomic regions shared between novel relatives highlight mutations in known ALS genes, as well as in novel genes. Newly identified relatives significantly increase the power to identify novel ALS genetic mutations.

About the speaker: Natalie is a research scientist and team lead for the genomics insight team within the transformational bioinformatics group at CSIRO. The team focuses on developing technology for population-scale genomics. Natalie’s major research focus is the use of Big Data technologies to understand the genetic basis of ALS. This is a collaborative project with Macquarie University and the international consortium Project MinE. Natalie has expertise in high-throughput genomic and transcriptomic data analysis, clinical genomics, genetics and big data analysis. She obtained her PhD in Bioinformatics from the University of New South Wales and has previously worked at UNSW, King's College London and University College London.

Monday March 25 2019

Speaker: Dr. Yi Jin Liew (CSIRO)

Title: Epigenetic adaptation of corals to climate change.

Abstract: The role of epigenetics in plants and mammals is fairly well understood; not so for basal metazoans such as corals. One epigenetic mechanism is DNA methylation, which regulates gene expression through the reversible methylation of cytosines in the genome. We studied the coral Stylophora pistillata, which has a broad geographical range and is able to thrive in diverse habitats. Our aim was to understand whether DNA methylation plays a role in adapting to more acidic oceans, a consequence of anthropogenic climate change. To do this, we performed whole genome bisulphite sequencing (WGBS) on triplicate samples kept under four pH conditions for more than two years to mimic long-term pH stress. Under this stress, genes associated with the cell cycle and body size showed increased gene body methylation, which corresponded to the phenotypes of larger cell and polyp sizes. This allowed stressed corals to maintain the same linear growth rate despite being in conditions that reduced their calcification rate. As corals are long-lived organisms, we were interested in knowing whether these epigenetic modifications were passed on to the next generation. WGBS was carried out on adult, gamete and larval samples from another coral, Platygyra daedalea. The epigenotypes of the samples were more similar to those of their respective parents than to those of other samples of the same type, providing initial evidence for the intergenerational inheritance of methylation patterns in corals.

About the speaker: Yi Jin is currently a Research Scientist in the Molecular Diagnostics Solutions group in CSIRO, attempting to squeeze public datasets for promising cancer biomarkers. Prior to that, he was a postdoc at the King Abdullah University of Science and Technology in Saudi Arabia, where he studied DNA methylation in corals (and regrets never mastering diving despite working on corals AND living by the Red Sea). He graduated with a PhD in Genetics from the University of Cambridge, but has, over the years, swapped the pipette for a keyboard.

Seminars in 2018, Semester 2

Show talks from Semester 2 / Hide talks from Semester 2

Friday December 14, 2018 (NOTE: Special time and location: 10 - 11AM, Level 4 Large Meeting Room, Charles Perkins Centre)

Speaker: Dr Aaron Lun (Cancer Research UK - Cambridge Institute)

Title: Challenges and future directions in single-cell data analysis

About the speaker: I graduated from the University of Sydney with a Bachelor of Science, majoring in Molecular Biology and Genetics. I did my PhD in Bioinformatics at the Walter and Eliza Hall Institute of Medical Research with Gordon Smyth, working on statistical methods for analyzing ChIP-seq and Hi-C data to study chromatin structure and organization. I am currently working as a research associate with John Marioni at the CRUK Cambridge Institute, developing computational methods for analyzing single-cell RNA sequencing data. I maintain around 14 Bioconductor packages focusing on a range of genomics data analyses, but am probably best known as the cat on the support site.

Monday November 19, 2018

Speaker: A/Prof Jessica Mar (The University of Queensland)

Title: One of these cells is not like the other – how variability of gene expression highlights regulatory control.

Abstract: When studying the transcriptome, our inferences typically revolve around changes in average gene expression. For a population of single cells, modeling gene expression distributions, and how their properties differ between phenotypes, can be far more informative than following average trends alone. This talk outlines some of the approaches my lab has developed to investigate how variability of gene expression contributes to our understanding of transcriptional regulation.

About the speaker: Associate Professor Jessica Mar is a Group Leader at the Australian Institute for Bioengineering and Nanotechnology at the University of Queensland in Brisbane. The Mar group focuses on understanding variability in the transcriptome and how this informs regulation of cell phenotypes. Jess received her PhD in Biostatistics from Harvard University in 2008. She was a postdoctoral fellow at the Dana-Farber Cancer Institute in Boston (2008-11), and an Assistant Professor at Albert Einstein College of Medicine in New York (2011-2018). Having relocated back to Australia in July 2018 as an ARC Future Fellow, she is focusing much of her work on modelling the aging process using single-cell bioinformatics. Jess has received several awards, including a Fulbright scholarship (2003), the Metcalf Prize for Stem Cell Research from the National Stem Cell Foundation of Australia (2017), and the LaDonne H. Shulman Award for Teaching Excellence (2017), the award she is proudest of because the winner is selected by the graduate students at Albert Einstein College of Medicine.

Monday November 12, 2018

Speaker: A/Prof Ruby Lin (The University of Sydney)

Title: Treatment of severe Staphylococcus aureus infections with bacteriophage therapy - Westmead experience.

About the speaker: Ruby joined the Iredell lab at the end of October 2017 after a short stint in industry. She is the project manager for the Iredell lab and the scientific lead for an investigator-led clinical trial involving treatment of severe staphylococcal infections using bacteriophage therapy. Her research has focused on microRNA-driven dysfunctions in eukaryotic disease model systems, including mouse and rat models and humans. She was a named NHMRC Peter Doherty Fellow (2005-8) and a UNSW Global Postdoctoral Fellow (2009-14). She has acquired more than A$5.1m in competitive funding. She has five seminal papers (BMJ 1999, Nature 2010, ATVB 2010, PNAS 2012 and FASEB J 2014), all of which received media coverage, are high impact and are highly cited. She has presented more than 65 papers at more than 30 conferences and cross-disciplinary seminars as invited chair and speaker. She is a conjoint Associate Professor at UNSW. During her presidency of the Australasian Genomic Technologies Association (AGTA), a prominent genomics society in Australia and NZ with members from industry and academia, she implemented gender equality at its annual meetings. She is heavily involved in promoting gender balance and women in STEM through various professional networks. She continues to train honours, PhD and postdoctoral researchers, and is a regular guest lecturer at UNSW, UTS and Macquarie University. In her spare time she volunteers as the primary ethics coordinator at her kids’ school and helps the P&C with fundraising events. She also does pro bono work as a career coach.

Monday November 5, 2018

(No Seminar)

Monday October 29, 2018

Speaker: Helen McGuire (The University of Sydney)

Title: Single cell analysis with Mass Cytometry; technology introduction and opportunities

Primer on Mass Cytometry: The anatomy of single cell mass cytometry data

About the speaker: Dr Helen McGuire is a Research Officer at the Ramaciotti Facility for Human Systems Biology, Charles Perkins Centre, an initiative established in 2013 to support the development of mass cytometry and broader systems biology analysis across the University of Sydney campus and wider collaborative links. Her research focuses on the clinical application of immunological studies to a range of human diseases.

Monday October 22, 2018

Speaker: Timothy Peters (Epigenetics Laboratory, Garvan Institute)

Title: A general framework for evaluating cross-platform concordance in genomic studies

Abstract: The reproducibility of scientific results from multiple sources is critical to the establishment of scientific doctrine. However, when characterising various genomic features (transcript/gene abundances, methylation levels, allele frequencies and the like), all measurements from any given technology are estimates and thus will retain some degree of error. Hence defining a “gold standard” process is dangerous, since all subsequent measurement comparisons will be biased towards that standard. In the absence of a “gold standard” we instead empirically assess the precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. We analyse a publicly available TCGA dataset containing both sequencing and array technologies, allowing a direct per-technology, per-locus comparison of sensitivity and precision across all common loci. We implement and showcase a number of applications of the row-linear model, including direct comparisons of the sensitivity and precision of these platforms. Our findings demonstrate the utility of the row-linear model in evincing varying levels of concordance between measurements on these platforms, serving as a process for identifying reproducibility caveats in studies where cross-platform validation is performed.

About the speaker: Tim's background is in bioinformatics and applied statistics. He completed a PhD on the principles of statistical learning for transcriptomic data in the Department of Statistics at Macquarie University in 2012. He has worked as a Postdoctoral Fellow at CSIRO on the EpiSCOPE project: mapping the epigenetic terrain of human adipocytes, performing statistical analyses for human EWASs (epigenome wide association studies) and has published a novel method for statistical inference of whole-methylome data. He is currently a bioinformatician/statistician in the Immunogenomics group at Garvan Institute of Medical Research. Current interests include single-cell methylome and transcriptome analysis, and reproducibility of genomic studies.

Monday October 15, 2018

Speaker: Gene Hart-Smith (University of New South Wales)

Title: The promise and pitfalls of big proteomics data: a case study centred on protein methylation

Abstract: The field of proteomics is reliant on computationally intensive analyses of large datasets. A particular focus is on the accurate identification of peptides from large datasets of tandem mass spectrometry (MS/MS) spectra, which are typically collected in high-throughput LC-MS/MS experiments. The data analysis workflow that has been developed to meet this challenge – the ‘sequence database search’ – is considered a cornerstone of contemporary proteomics research.
Despite their near-ubiquity, sequence database searches can consistently go wrong. For example, we recently showed that when sequence database searches are applied to the identification of post-translational protein methylation, false discovery rates are unavoidably high. This particular defect of the sequence database search has resulted in a plethora of false information entering the mainstream scientific literature. The reasons behind this defect will be discussed, together with specific and practical means by which this defect can be overcome.

About the speaker: Dr Gene Hart-Smith is a recent ARC Discovery Early Career Researcher Award holder working within the UNSW School of Biotechnology and Biomolecular Sciences. In 2010 he completed a PhD at the UNSW School of Chemistry, in which he utilised mass spectrometry as a primary tool to investigate synthetic polymer formation processes. He has since been applying his expertise in mass spectrometry to the study of biological systems.
Gene’s current research is centred around the examination of protein-protein interaction networks. He is particularly interested in how post-translational modifications regulate the dynamics of these networks, and is developing and applying mass spectrometric methods towards the investigation of this phenomenon.

Monday October 8, 2018

Speaker: Chelsea Mayoh (Children's Cancer Institute)

Title: The Complexity of Identifying Targetable Genes in the Paediatric Transcriptome

Abstract: Molecular profiling of childhood cancers allows for personalised treatments based on targets found through Whole Genome and RNA sequencing. The accurate identification of germline/somatic mutations, copy number amplifications/deletions and structural variations is possible through the availability of matched tumour-normal pairs. However, identification of up-/down-regulated genes poses a challenge without having a matched normal. In this talk I will speak about the limitations, advantages and complexity of the transcriptome and utilising it to identify potential drug targets for paediatric patients through the Zero Childhood Cancer Program.

About the speaker: Chelsea is the lead bioinformatician at the Children's Cancer Institute. In addition to managing the bioinformatics team, she is one of the key bioinformaticians involved in the Zero Childhood Cancer Program. At the Children's Cancer Institute, she performs various kinds of bioinformatic analysis across a wide variety of childhood cancers, with a heavy focus on CNS tumours, neuroblastoma and leukaemia. She is also the bioinformatician/biostatistician on several study committees through the Sydney Children's Hospital Clinical Trials Program. Prior to coming to Australia in 2015 she was at the Genome Sciences Centre, affiliated with the BC Cancer Agency in Canada.

Monday October 1, 2018 (No seminar - Labour Day public holiday)

Monday September 24, 2018 (No seminar)

Monday September 17, 2018

Speaker: James Cornwell (The University of Sydney)

Title: Quantifying intrinsic and extrinsic control of single cell fates by time-lapse imaging, single-cell tracking, and competing risks analysis

Abstract: The molecular control of cell fate and behaviour is a central theme in biology. Inherent heterogeneity within cell populations requires that control of cell fate be studied at the single-cell level. Time-lapse imaging and single-cell tracking are powerful technologies for acquiring cell lifetime data, allowing quantification of how cell-intrinsic and extrinsic factors control single-cell fates over time. However, cell lifetime data contain complex features. Competing cell fates, censoring, and the possible inter-dependence of competing fates, currently present challenges to modelling cell lifetime data. Thus far such features are largely ignored, resulting in loss of data and introducing a source of bias. In this seminar I will talk about how competing risks and concordance statistics, previously applied to clinical data and the study of genetic influences on life events in twins, respectively, can be used to quantify intrinsic and extrinsic control of single-cell fates.

About the speaker: James completed a Bachelor of Mechatronic Engineering and a Master of Biomedical Engineering in 2012 at the University of New South Wales (UNSW). During his studies James undertook research internships at the Australian Nuclear Science and Technology Organisation (Sydney), the Jozef Stefan Institute (Slovenia), and ETH Zurich (Switzerland). In 2012 James started his PhD at the Victor Chang Cardiac Research Institute (VCCRI) under the supervision of Professor Richard Harvey and Dr Robert Nordon. His PhD focused on characterising the growth dynamics of cardiac stem cells by time-lapse imaging and single-cell tracking. James established this technology at VCCRI, constructing a methodological pipeline for analysis of single-cell growth dynamics. In 2016 James completed his PhD and joined the School of Dentistry, Faculty of Medicine and Health, University of Sydney as an Associate Lecturer. James’ research currently focuses on developing tools for recording and analysing single cell dynamics and applying these tools to study stem and cancer cell biology.

Monday September 10, 2018

Speaker: Ignatius Pang (University of New South Wales)

Title: Benchmarking Protein Correlation Profiling datasets against reference protein complexes: case studies in S. cerevisiae.

Abstract: Protein Correlation Profiling (PCP) is a method which enables many protein complexes to be identified in single experiments, unlike other methods such as affinity purification-mass spectrometry, which involves ‘one-at-a-time’ affinity purifications of tagged proteins. A typical PCP experiment involves fractionation of endogenous and untagged protein complexes by size or other physicochemical parameters, followed by LC-MS/MS and label-free quantification of each fraction. Proteins in the same intact complex are co-eluted and often have high correlation in protein abundance across multiple fractions. Although this information can help identify intact complexes, doing so is computationally challenging. For example, machine learning strategies used to identify complexes from PCP datasets can have high false positive rates for novel complexes (Shatsky et al. 2016 MCP 15.6:2186-02). The aim of this study was to develop a framework for benchmarking PCP datasets against high-quality sets of reference protein complexes. This approach, which we predominantly applied using the large-scale reference sets of protein complexes available for Saccharomyces cerevisiae (e.g. Benschop et al. 2010 Mol. Cell. 38: 916-928), enabled us to evaluate the quality of PCP datasets, identify known protein complexes with high confidence, and develop guidelines on the choice of correlation metrics and fractionation approaches used to interpret and collect PCP datasets.

About the speaker: Igy is a postdoctoral research associate at the Systems Biology Initiative at UNSW, led by Prof. Marc Wilkins. Igy’s current role involves collaborating with bioscience researchers who are interested in the analysis of -omics datasets. His expertise involves the co-analysis of multiple -omics datasets in conjunction with multiple types of biological networks, for example, signaling, regulatory, and protein-protein interaction networks. His current projects include identifying the potential link between gene mutations and the side-effects of an antipsychotic medication and analyzing the role of the virome in the onset of type-1 diabetes among infants. Prior to his postdoc he had 2 years of experience in audit data analytics and fraud detection.

Monday September 3, 2018

Speaker: Serigne Lo (The University of Sydney and Melanoma Institute Australia)

Title: Competing risks analysis with missing event types - penalized likelihood estimation of cause-specific Cox models

Abstract: Competing risks models provide attractive tools to analyze time-to-event data where each subject faces multiple event types. The models are useful when assessing the burden and etiology attributable to a specific disease. However, a complexity may arise when the event types for some subjects are missing but their event times are observed. Assuming the unobservable event types are missing at random, we develop novel constrained maximum penalized likelihood estimates for semi-parametric cause-specific Cox regression models. Penalty functions are used to smooth the baseline hazards. The appealing feature of our approach is that all relevant estimates in the competing risks setting are provided, including regression coefficients and cause-specific baseline hazards. Asymptotic results for these estimates are also developed. Through intensive simulations, we demonstrate the superiority of our method compared to some existing methods. We illustrate the new method using data from melanoma patients who faced two competing risk events: melanoma versus non-melanoma causes of death.

About the speaker: Dr Serigne Lo is a Senior Research Fellow in Biostatistics at the University of Sydney. He leads the Research & Biostatistics Group at Melanoma Institute Australia. He has accumulated more than 15 years of academic teaching and research experience. Dr Lo provides leadership in applying cutting-edge biostatistical methods and in biostatistical support across the institute. Dr Lo is interested in the development of new statistical methods, and his personal research includes clinical trials, adaptive design, multistate modelling, and joint modelling.

Monday August 27, 2018

Speaker: Joshua Ho (Victor Chang Cardiac Research Institute)

Title: Scalable bioinformatics methods for single-cell RNA-seq analysis

Abstract: TBA

About the speaker: Dr Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School (HMS), and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance. Joshua has published over 48 papers, including first-author publications in Nature, Science Signaling, and PLoS Genetics. He is also currently the Secretary of the Australian Bioinformatics and Computational Biology Society (ABACBS).

Monday August 20, 2018 (Cancelled)

Speaker: Chelsea Mayoh (Children's Cancer Institute)

Title: The Complexity of Identifying Targetable Genes in the Paediatric Transcriptome

Monday August 13, 2018

Speaker: Dr Boris Guennewig (Brain and Mind Centre, The University of Sydney)

Title: Bioinformatics on multimodal data sets in a longitudinal ageing and neurodegeneration clinic

Bio: Dr Boris Guennewig leads the Forefront Bioinformatics team at the BMC. Boris is a Senior Lecturer at the University of Sydney and Conjoint Lecturer at UNSW. He received a diploma in chemistry from the University of Münster and secured a very competitive position at the Max Planck Institute for Molecular Biomedicine, which shifted his focus to immunology and inflammatory processes in human disease. After this, he transitioned to the Swiss Federal Institute of Technology as a PhD candidate in the laboratory of Prof J Hall, where he worked on microRNA biogenesis and their functions in human disease. He was then recruited by Prof J Mattick to the Garvan Institute, where he worked with Profs J Mattick and A Cooper as well as Glenda Halliday on deriving diagnostic biomarkers for various neurodegenerative diseases. Boris is additionally the lead bioinformatician/consultant for the International Cerebral Palsy Genetics Consortium, a member of the Australian Genomics Health Alliance and the founder of the analytics/bioinformatics company Pacific Analytics PTY LTD (Australia).

Research Interests: I am a research scientist/bioinformatician/statistician specializing in the development of infrastructure, software and pipelines to manage, analyze and mine large complex datasets in medical research. Using structured, semi-structured and unstructured data, my research focuses on the identification and characterization of genetic variation and transcriptional changes influencing complex human diseases (such as frontotemporal lobe dementia, bipolar disorder, Parkinson’s and Alzheimer’s disease). I achieve this through the functional integration of high-dimensional biological (omics) data, in combination with my statistical, genetics and data mining skills. I believe that assimilating and modelling multi-modal data (e.g. imaging, clinical and omic data) is key to uncovering the genotype-phenotype interaction and how this relationship affects complex traits.

Seminars in 2018, Semester 1

Show talks from Semester 1 / Hide talks from Semester 1

Friday June 8, 2018 (Special Statistical Bioinformatics Seminar - 11:20AM - Carslaw 829 Access Grid Room)

Speaker: Associate Professor Bradley Broom (MD Anderson Cancer Center, USA)

Title: Bioinformatics analysis tools

Abstract: This Friday we will be hosting Associate Professor Bradley Broom from the Department of Bioinformatics and Computational Biology at MD Anderson Cancer Center for a special Statistical Bioinformatics Seminar. The Seminar will start at 11:20 in the AGR. See below for a description of Bradley’s proposed talk.
In this talk I will describe two bioinformatics tools being developed at MD Anderson Cancer Center.
Clustered heat maps were developed for biomedical data in the early 1990s. They are now the most widely used visualization for molecular profiling data. But they are fundamentally static objects. We have created Next-Generation Clustered Heat Maps that include capabilities for interactively zooming and navigating the clustered heat map, for adjusting its color scale or other display parameters, and for interrogating the data in, or behind, the contents of the heat map. I will describe the key features of Next-Generation Clustered Heat Maps and the tools for generating them.
The bioinformatics analysts at MD Anderson perform many hundreds of detailed bioinformatics analyses per year. We also frequently receive requests to update an analysis or to repeat a substantially similar analysis on new data. These new requests can arrive months, years, or even decades after the original analysis was performed. Access to the original analysis is often crucial to reproducing and updating the earlier analysis, but locating the original analysis can be challenging if a long time has elapsed since the original analysis and/or the analyst who performed the initial analysis has left the institution. In this talk I will describe FjORD, a meta-information system we have developed to record bioinformatics analyses as well as to search for and retrieve previous analyses. I will also describe future goals for enforcing reproducible analyses.

Monday May 28, 2018

Speaker: Mathieu Fourment (ithree Institute, University of Technology, Sydney)

Title: New methods in phylogenetic inference

Abstract: Markov chain Monte Carlo (MCMC) algorithms have been the workhorse of Bayesian inference in phylogenetics for almost two decades. Although these algorithms have been successfully used in a wide range of applications they do not scale well to large numbers of sequences. In this talk I will present some of my work on sequential Monte Carlo algorithms and approximate inference using the variational Bayesian framework.

About the speaker: Mathieu is a data scientist at the University of Technology Sydney. He obtained his PhD at Macquarie University in 2010 and had research positions at the University of California San Diego, Duke-NUS, and the University of Sydney. His research interests include phylogenetics, variational inference, and probabilistic modelling.

Monday May 21, 2018

Speakers: Vinita Deshpande and Tom Geddes (The University of Sydney)

Title: (i) Good and bad fat: discovering key distinguishing features with proteomics; (ii) Ensemble deep learning for integrating heterogeneous omics data

Abstract: (i) Subcutaneous (SC) and visceral (VIS) fat cells store energy as fat and affect whole body metabolism. Excessive VIS fat is associated with insulin resistance, a precursor to Type 2 diabetes. In contrast, SC fat may be protective. Despite these important physiological functions, relatively little is known about the molecular features that distinguish these discrete types of fat cells. We have used mass spectrometry to construct proteomes of SC and VIS fat cells from mouse models representing healthy and diabetic states. These proteomes consist of over 7,500 quantified proteins spanning six orders of magnitude. By coupling with statistical approaches, we stratified the proteome into groups defined by the factor(s) driving the differences in protein expression, which revealed interesting functional differences. This analytic approach also serves as a computational framework to answer biological questions in a comprehensive and systematic manner. Thus we demonstrate the utility of this proteomic resource and analytic approach in uncovering novel insights into fat cell biology.

(ii) Artificial neural network models are capable of learning high-accuracy classifiers over inputs with complex information structure given a large number of training examples, and have become increasingly popular across a wide variety of applications. Ensemble learning methods can be used to increase classification accuracy and robustness by combining outputs from a collection of individual models trained differently on the same data, often proving more reliable than a single model. In the field of biology, multiple datasets are often available pertaining to highly overlapping sets of molecular species (such as proteins or RNA transcripts) but differing in experimental origin, and containing potentially orthogonal/non-redundant information. We explore the use of ensemble deep learning to draw on mass-spectrometry datasets from a variety of sources to increase classification accuracy.

About the speakers: (i) Vinita Deshpande is a PhD student at the Charles Perkins Centre at the University of Sydney, where she is using quantitative proteomics to investigate fat cell biology. Her research interests lie in the application of systems biology methods to understand human diseases.

(ii) Tom Geddes is a PhD student at the Charles Perkins Centre at the University of Sydney, where he is using deep learning methods to predict protein-protein interactions. He is interested in better understanding the structure of complex biological systems by leveraging large, information-rich datasets.

Monday May 14, 2018

Speaker: Susan Corley (University of New South Wales)

Title: Working with ‘Salmon’, a new fast tool to quantify RNA-Seq reads

Abstract: Quantification of sequenced reads is the first vital step in undertaking a gene expression experiment using RNA-Seq. Recently, methods including kallisto (Bray et al., 2016) and Salmon (Patro et al., 2017) have been introduced which calculate transcript abundance without full mapping of reads to the genome. In this talk I will give some examples of my experience using Salmon and I will compare these results to read mapping produced using TopHat2.

About the speaker: Susan is a Senior Research Associate in the Systems Biology Initiative (SBI) at UNSW. She commenced with the SBI in 2012 following completion of her PhD in biomedical science and biochemistry at the John Curtin School of Medical Research (ANU). Prior to this she completed her Bachelor of Science with majors in chemistry at the University of Sydney. Susan’s primary research interest is in understanding the genes and pathways involved in human health and disease through employing next-generation sequencing techniques. Susan’s research projects cover a wide breadth and have included gene expression analysis relating to Crohn’s disease, schizophrenia, Williams-Beuren syndrome, keratoconus, immunity-related conditions and cancer cachexia. Susan worked as a lawyer before commencing her studies and work in science.

Monday May 7, 2018

Speaker: Alex Lancaster (Ronin Institute and the University of Sydney)

Title: Modeling cellular systems: stochastic approaches in evolutionary research and therapeutic applications

Abstract: Stochasticity (random noise at the cellular, developmental and organismal level) plays an under-appreciated role in both evolutionary and biomedical research. Alex will present summaries of his work in the computational modeling of stochasticity in cellular systems, including yeast prions, signalling and gene networks, and sepsis. Throughout, the role of noise in shaping both evolutionary trajectories and therapeutic interventions is highlighted. The benefits of bottom-up, agent- and rule-based strategies in both academic and commercial R&D contexts to help illuminate these questions are also discussed.

About the speaker: Alex Lancaster is a Visiting Scholar at the University of Sydney, a Research Scholar at the Ronin Institute, and a Partner at Cambridge, Massachusetts-based consulting company, Amber Biology. Following undergraduate degrees at the University of Sydney, he received his Ph.D. from University of California, Berkeley and has held research positions at the Santa Fe Institute, the Whitehead Institute at MIT, and Harvard Medical School. His research interests are in the intersection of evolutionary theory, systems biology and agent-based modeling.

Monday April 30, 2018

Speaker: Joseph Cursons (The Walter and Eliza Hall Institute)

Title: Post-transcriptional control of epithelial-mesenchymal transition through combinatorial miRNA targeting

Abstract: MicroRNAs (miRNAs) are small, non-coding RNAs with an important role in post-transcriptional regulation, targeting mRNAs for degradation and/or inhibiting their translation. Working with a model of TGF-β induced epithelial-mesenchymal transition in human mammary epithelial cells, we identified a set of miRNAs that appear to be co‑regulated with induction of an invasive mesenchymal phenotype. Computational analyses show that these miRNAs are coregulated across clinical breast cancer samples from the TCGA as well as a wider set of primary cell lines. Furthermore, analysis of high‑confidence predicted targets (based upon miRNA:mRNA sequence complement) suggests that these miRNAs share several targets, and many of their targets also interact at the protein level. Investigating this result, we selected several pro-epithelial miRNAs with evidence of co-targeting and demonstrated that combinatorial treatment could alter cellular phenotype with ectopic miRNA concentrations several orders of magnitude below what is typically used, and much closer to endogenous levels. This work suggests that cooperative targeting by miRNAs may be an important factor for their physiological function, and future work attempting to classify miRNA function should consider such combinatorial effects.

About the speaker: Joe Cursons is a Senior Research Officer in the Davis Laboratory within the Bioinformatics Division of the Walter and Eliza Hall Institute. He obtained his PhD from the Auckland Bioengineering Institute in 2012 and his research interests are centred on the regulatory control systems dysregulated during the progression and metastasis of breast cancer and melanoma. Much of Joe’s work involves the analysis of sequencing data to identify mechanisms of drug sensitivity and drug resistance for cancer treatment.

Monday April 23, 2018

Speaker: Bobbie Cansdale (The University of Sydney)

Title: From CTCF to 3D Modelling: Investigating the mammalian genome in three dimensions

Abstract: The three-dimensional structure of the mammalian genome is non-random and important for several key biological processes, including the regulation of gene expression. Determining this structure, as well as the sequence itself, is necessary to further genome biology research. Topologically associating domains (TADs) are a main feature of chromatin organisation. These are clusters of genes that are functionally co-regulated, with boundary regions enriched for features such as CTCF binding sites, transfer RNAs, and SINE retrotransposons. Chromosome conformation capture (3C)-based approaches, including Hi-C, can provide valuable insight into the spatial organisation of chromatin fibre. Computational frameworks have recently become available that use these data to create 3D representations of the genome, providing novel insights compared to the standard interaction matrices alone. Knowledge of these structures will allow investigation into how they relate to nearby genes and other genomic features. Here I will focus on bioinformatics strategies and possible future applications of this work using examples from the canine genome.

About the speaker: Bobbie Cansdale is a PhD student in computational biology and animal genomics at the University of Sydney. She completed a Bachelor of Animal and Veterinary Bioscience (Honours) at the University of Sydney in 2015. Her current focus is on canine genomic research. She is interested in the modelling of chromatin architecture, genomic data analysis, novel sequencing methods, and the integration of various data types to better answer biological questions.

Monday April 16, 2018

Speaker: Sarah Beecroft (The University of Western Australia)

Title: Gene Hunting: Why don't we know all disease genes yet?

Abstract: Rare genetic diseases include some of the most debilitating diseases to affect humans, with onset ranging from before birth to old age. Although the human genome has now been mapped, finding the mutations that cause rare diseases is still hugely challenging, and about 50% of patients remain without a diagnosis. Sarah will discuss the bioinformatic strategies used in this field and the future of disease gene discovery.

About the speaker: Sarah Beecroft is based in the Molecular Neurogenetics lab at the Harry Perkins Institute of Medical Research, Perth. She works to discover new disease genes in patients with nerve and muscle diseases. She is interested in finding mutations in the non-coding regions of the genome, a vast unknown in rare disease genetics.

Monday April 9, 2018

Speaker: Shila Ghazanfar (The University of Sydney)

Title: Investigating combinatorial expression of delta-protocadherins in single olfactory sensory neurons

Abstract: Single cell RNA-Sequencing (scRNA-Seq) has enabled unprecedented insight into the behaviour of individual cells on the scale of the entire transcriptome. Such precision offers an opportunity to explore cell-specific heterogeneity; however, two distinct features arise from such data: (1) a hyperinflation of identically zero counts for the majority of genes in any given cell, and (2) an apparently bimodal distribution of non-zero counts. Both features are unique to scRNA-Seq and warrant the further development of statistical tools to answer biological questions of interest.

We propose a mixture modelling framework that classifies cells into three transcriptional states for each gene: (1) no, (2) low and (3) high gene expression. This approach has the potential to reveal the cell-specific dynamics of RNA transcription (bursting) and degradation, as well as to act as a standardisation across datasets. We used several publicly available scRNA-Seq datasets derived from mouse neuronal cell populations to compare mixture models, assess highly and lowly variable genes, and estimate cell networks via uniqueness thresholding.
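To make the three-state idea concrete, here is a minimal sketch of one way a per-gene no/low/high classification could work. It is an illustration under assumptions, not the model presented in the talk: zero counts are assigned to the "no" state, non-zero counts are log1p-transformed, and a two-component Gaussian mixture fitted by expectation-maximisation (EM) separates "low" from "high". The function names, the log1p transform and the variance floor `min_var` are all hypothetical choices.

```python
import math


def _normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def _responsibilities(x, w, mu, var):
    """E-step: posterior probability of each of the two components for every value."""
    resp = []
    for v in x:
        p = [w[k] * _normal_pdf(v, mu[k], var[k]) for k in (0, 1)]
        s = p[0] + p[1]
        resp.append([p[0] / s, p[1] / s] if s > 0 else [0.5, 0.5])
    return resp


def classify_three_states(counts, n_iter=100, min_var=1e-3):
    """Label one gene's count in each cell as 'no', 'low' or 'high' expression."""
    labels = ["no"] * len(counts)  # zero counts stay in the 'no' state
    idx = [i for i, c in enumerate(counts) if c > 0]
    x = [math.log1p(counts[i]) for i in idx]
    if len(set(x)) < 2:
        # Too few distinct non-zero values to fit two components.
        for i in idx:
            labels[i] = "low"
        return labels

    # Start the two components at the extremes, with a shared overall variance.
    mean_x = sum(x) / len(x)
    overall_var = max(min_var, sum((v - mean_x) ** 2 for v in x) / len(x))
    w, mu, var = [0.5, 0.5], [min(x), max(x)], [overall_var, overall_var]

    for _ in range(n_iter):
        resp = _responsibilities(x, w, mu, var)
        # M-step: re-estimate weights, means and (floored) variances.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            if nk == 0:
                continue
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * v for r, v in zip(resp, x)) / nk
            var[k] = max(min_var, sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, x)) / nk)

    # Assign each non-zero cell to whichever component has the higher posterior.
    high = 0 if mu[0] > mu[1] else 1
    for i, r in zip(idx, _responsibilities(x, w, mu, var)):
        labels[i] = "high" if r[high] > 0.5 else "low"
    return labels
```

For example, `classify_three_states([0, 0, 0, 1, 1, 2, 1, 50, 60, 55, 0])` labels the zeros "no", the small counts "low" and the three large counts "high". Treating zeros as their own state, rather than as part of the mixture, reflects the abstract's point that excess zeros and bimodal non-zero counts are separate features of the data.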

This work is in the context of understanding how olfactory sensory neurons (OSNs) interact with each other during embryonic development of the mouse olfactory system. In particular, we study the role that combinatorial expression of genes in the delta-protocadherin subfamily plays in mediating cell-cell adhesion. Further, we use distinct guiding principles to build a Monte Carlo simulation of this cell-cell adhesion behaviour and assess its suitability. This addresses the larger question of how combinatorial gene expression specifies distinct cell types and tissues.

About the speaker: Shila Ghazanfar has recently completed her PhD in Statistical Bioinformatics at The University of Sydney and is currently a Research Associate in the Judith and David Coffey Life Lab and the School of Mathematics and Statistics. Her research interests are in the statistical analysis of data arising from high-throughput sequencing technologies, such as RNA-Seq, in various research contexts.

Monday March 26, 2018

Speaker: Kitty Lo (The University of Sydney)

Title: Novel alternative splicing in TDP-43 mutant mouse models of ALS

Abstract: TDP-43 (encoded by TARDBP) is an RNA-binding protein central to the pathogenesis of the neurodegenerative disorder amyotrophic lateral sclerosis (ALS). However, how TARDBP mutations trigger pathogenesis remains unknown. Here, we use novel mouse mutants carrying point mutations in Tardbp to dissect TDP-43 function. Interestingly, we find that TDP-43 C-terminal mutations lead to a gain of splicing function. Using two different strains, we are able to separate TDP-43 loss- and gain-of-function effects. This gain of function induces a novel category of splicing events, here termed skiptic exons, in which constitutive exons are skipped, causing expression changes. Our findings provide a novel pathogenic mechanism and highlight how gain and loss of TDP-43 function affect RNA processing differently, suggesting that they may play roles at different disease stages.

About the speaker: Kitty Lo is currently a bioinformatics postdoctoral researcher in the Faculty of Science. Prior to this, she was at University College London and the UCL Institute of Neurology. She has also worked at a Cambridge-based biotechnology startup, where she developed cancer diagnostic tools using ctDNA. Kitty has a PhD in astronomy from the University of Sydney.

Monday March 19, 2018

Speaker: Dana Pascovici (Macquarie University)

Title: DIA/SWATH - challenges and opportunities for bioinformatics

Abstract: Protein quantitation using DIA/SWATH mass spectrometry has been growing in popularity over the last few years. From a bioinformatics point of view, the data resulting from such experiments are quite easy to analyse, at least when the experiment is not too large: the percentage of missing data is much lower, and the appearance and distribution of the data make existing methodology from other areas readily applicable. Put plainly, extracted SWATH data is quite nice to work with. However, that is because much of the difficulty has been pushed further down, into SWATH library building and data extraction, where it is somewhat hidden from view.

In this talk we will describe SWATH and its place in the landscape of quantitative proteomics (including broad comparisons with label-free and labelled techniques such as iTRAQ and TMT), and the many positive aspects of the resulting SWATH datasets from the point of view of the data analyst. We will also focus on how SWATH data extraction usually relies on high-quality peptide MS/MS spectral libraries; however, building such libraries to ensure good proteome coverage can be time-consuming and expensive. To address this issue, various computational approaches for merging archived or external libraries have been created and evaluated, including efforts from our group. We will describe the appeal of such methods, the issues that can ensue and some approaches to tackling them, so that proteins are reliably detected and their quantitation is consistent and reproducible. We will discuss these aspects in the context of several existing datasets, including a carefully designed spike-in experiment and a recently published large plasma proteomics experiment containing samples from neonates, young children and adults.

About the speaker: I am currently a Biostatistician at the Australian Proteome Analysis Facility at Macquarie University, where I help people generate biological insights out of their proteomics data, especially in the context of complex experiments.

Working in a proteomics facility, we have focused on generating reliable methods for interpreting and analysing data from a variety of platforms, lately emphasizing SWATH and TMT, and wherever possible incorporating them into software workflows. Areas of particular relevance to us have been plasma proteomics and the plant proteomics of agriculturally important species. Our work has benefitted from interactions with researchers, students and the APAF team of mass spectrometry specialists and analytical chemists.

I come from a mathematical and computational background, having completed a bachelor's degree in Mathematics and Computer Science at Dartmouth College in the US, followed by a PhD in Mathematics at MIT and a brief stint of teaching at Purdue. In Sydney I took a more practical turn and worked in industry in the area of speech recognition, before settling into biostatistics for the past 13 years, in both industry and research environments.

Seminars in 2017, Semester 2

Monday November 27, 2017

Speaker: Beth Signal (Garvan Institute of Medical Research)

Title: Machine learning annotation of branchpoints and in silico modelling of functional splicing events

Abstract: RNA splicing is a key component of mature RNA transcript formation, required for the removal of intronic regions and the subsequent ligation of exonic regions. This process also allows alternative splicing to occur, in which different exonic regions are ligated together to produce alternative RNA products. The branchpoint element is one of the splicing sequence elements and is required for the first, lariat-forming reaction in splicing. However, current catalogues of human branchpoints remain incomplete owing to the difficulty of experimentally identifying these elements. To address this limitation, we have developed a machine learning algorithm, branchpointer, to identify branchpoint elements solely from gene annotations and genomic sequence. Using branchpointer, we annotate branchpoint elements in 85% of human gene introns, with a sensitivity of 61.8% and a specificity of 97.8%. In addition to annotation, branchpointer can evaluate the impact of SNPs on branchpoint architecture to inform the functional interpretation of genetic variants. Branchpointer identifies all published deleterious branchpoint mutations annotated in clinical variant databases, and finds thousands of additional clinical and common genetic variants with similar predicted effects. While alternative splicing can produce alternative RNA products, a large proportion of these have little functional impact on open reading frames or transcript stability. To address this limitation in the functional interpretation of differential splicing analyses, we have developed software to model events in silico and interpret their functional impact.
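For readers less familiar with the two accuracy figures quoted for branchpointer, they follow the standard definitions below, where TP, FN, TN and FP count true positives, false negatives, true negatives and false positives. The abstract does not describe the evaluation set, so which positions count as positives and negatives is left unspecified here.

```latex
\text{sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{specificity} = \frac{TN}{TN + FP}
```

Read this way, a sensitivity of 61.8% means that roughly three in five true branchpoints are recovered, while a specificity of 97.8% means that about 2.2% of non-branchpoint positions are wrongly called.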

About the speaker: Beth is a PhD student in the Clinical Genome Informatics group at the Garvan Institute. Her current research focuses on developing bioinformatics methods to understand how transcript splicing and expression are controlled. She has a particular interest in using machine learning techniques to study transcriptomic behaviour.

Monday November 20, 2017 (PLEASE NOTE: special second talk; location: Carslaw Access Grid Room (Level 8); time: 4pm)

Speaker: Sonja Greven (LMU Munich)

Title: Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains

Abstract: Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, e.g. functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen-Loève theorem. For the practically relevant case of a finite Karhunen-Loève representation, a relationship between univariate and multivariate functional principal component analysis is established. This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily in orthonormal bases. It is also applicable to sparse functional data or data with measurement error. A flexible R implementation is available on CRAN. The new method is shown to be competitive with existing approaches for data observed on a common one-dimensional domain. The motivating application is a neuroimaging study, where the goal is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. Supplementary material, including detailed proofs, additional simulation results and software, is available online.
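The estimation strategy described in this abstract, computing univariate principal components first and deriving multivariate ones from their scores, can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the CRAN implementation: each element is observed on a regular grid (a curve, and an image flattened to a vector), univariate FPCA is approximated by an SVD of the centred data matrix without quadrature weights, and the number of components kept per element (`m_per_element`) is a hypothetical choice. The multivariate components come from an eigendecomposition of the covariance of the concatenated univariate scores.

```python
import numpy as np


def univariate_scores(X, m):
    """Approximate univariate FPCA scores for X (n samples x grid points), keeping m components."""
    Xc = X - X.mean(axis=0)
    # Right singular vectors approximate the eigenfunctions evaluated on the grid.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:m].T  # n x m matrix of (centred) scores


def mfpca(elements, m_per_element):
    """Combine univariate scores into multivariate principal components.

    elements: list of arrays, each n x (grid size); grids may differ in size
    and in dimension (images are flattened to vectors beforehand).
    """
    Z = np.hstack([univariate_scores(X, m) for X, m in zip(elements, m_per_element)])
    # Eigen-decompose the covariance of the concatenated univariate scores.
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Multivariate scores are projections of the combined (already centred) scores.
    scores = Z @ eigvecs
    return eigvals, eigvecs, scores


rng = np.random.default_rng(0)
curves = rng.normal(size=(40, 30))                      # 40 curves on a 30-point grid
images = rng.normal(size=(40, 8, 8)).reshape(40, -1)    # 40 flattened 8x8 images
vals, vecs, sc = mfpca([curves, images], [3, 2])
```

Because the curves and images enter only through their univariate scores, the two elements can live on grids of different size and dimension, which is the point the abstract makes about different (dimensional) domains.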

About the speaker: http://greven.userweb.mwn.de/research.html

Monday November 20, 2017 (PLEASE NOTE: special location, Level 5 Large Meeting Room; usual time, 1pm to 2pm)

Speaker: Elizabeth Mason (The University of Melbourne)

Title: Modelling transcriptional variability in single cell RNA-seq data during human embryogenesis captures changes in the regulation of critical developmental genes

Abstract: Human development is a temporally and spatially ordered series of events that occur with remarkable precision; the same DNA blueprint gives rise to more than 250 sharply defined cell phenotypes. At the level of functional phenotype, embryogenesis appears predictable because we observe the average behaviour of many individual cells, even as the number of cells, the range of phenotypes and transcriptional complexity increase over the course of development. It is when we evaluate single molecules and transcripts, for example in single cell RNA-seq (scRNA-seq) experiments, that the stochastic nature of gene expression is revealed. Current methods reduce scRNA-seq data to a well-defined trajectory based on the abundance of key regulators of phenotype, and differential abundance between cells within a given phenotype is used to identify sub-populations. Here we present an alternative approach, based on the premise that measuring transcriptional variability at the gene level reveals the level of regulation imposed on that gene, reflecting an intrinsic property of development that is often overlooked. While linear models have been a successful framework for characterising differences in average abundance between phenotypes, they do not account for the stochastic differences captured by scRNA-seq experiments. Accurately determining abundance and variability is further complicated by the sparseness of non-zero expression values. To address these challenges and evaluate gene expression during human pre-implantation embryogenesis, we applied a statistical mixture model to scRNA-seq data. Fitting the model on a gene-by-gene basis allowed us to evaluate shifts in the proportion of cells expressing a given gene (λ), as well as the mean (μ) and standard deviation (σ) of expression. From there, a correlation-based analysis evaluated whether abundance (μ) and variability (σ) capture different aspects of transcriptional regulation.
While each metric largely identified the same genes, the number and nature of relationships between them differed. Indeed, genes sharing correlated patterns of variability during development were enriched for motifs associated with developmental transcription factors (e.g. HIC2, PPARG, E2F4 and ZNF692). Variability was more effective than abundance at specifically detecting regulatory relationships during development, and with less redundancy. Our approach provides a gene-centric platform to evaluate population-based parameters of gene expression, while preserving the complexity of scRNA-seq data.
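As a rough illustration of the three per-gene parameters λ, μ and σ, the sketch below computes simple moment-based estimates: λ as the fraction of cells with non-zero expression, and μ and σ as the mean and standard deviation of the non-zero, log1p-transformed values. This is a hypothetical simplification (a hurdle-style split into zero and non-zero parts), not the mixture model fitted in the study; the transform and the function name are assumptions.

```python
import math
import statistics


def gene_parameters(counts):
    """Return (lam, mu, sigma) for one gene's counts across cells.

    lam:   proportion of cells with non-zero expression
    mu:    mean of log1p(non-zero counts)
    sigma: sample standard deviation of log1p(non-zero counts)
    """
    nonzero = [math.log1p(c) for c in counts if c > 0]
    lam = len(nonzero) / len(counts)
    if len(nonzero) < 2:
        # The mean and/or standard deviation are undefined with fewer than two values.
        mu = nonzero[0] if nonzero else float("nan")
        return lam, mu, float("nan")
    return lam, statistics.mean(nonzero), statistics.stdev(nonzero)
```

Computing these three numbers for each gene at each developmental stage gives the abundance (μ) and variability (σ) trajectories whose correlation patterns the abstract compares.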

About the speaker: Lizzi began her career in human genomics as a laboratory manager and laboratory technician with Professor Greg Gibson (Centre for Integrative Genomics, Georgia Institute of Technology). She conducted two investigations in Australia that identified maternal influences on the development of the neonatal immune system and uncovered population structure in the leukocyte transcriptome. Together with scientists at Emory University, Greg and Lizzi initiated the CIG’s involvement in the WHOLE (Wellness and Health Omics Linked to the Environment) study of Predictive Health Genomics in Atlanta (USA), which is currently in its 6th year. Lizzi has recently completed a PhD in the systems biology of human stem cells at the Australian Institute for Bioengineering and Nanotechnology at the University of Queensland. Her PhD project formed an international collaboration with Professor Christine Wells (University of Melbourne, AUS), stem cell biologists Professors Martin Pera (Jackson Laboratory, USA) and Ernst Wolvetang (University of Queensland, AUS), biostatistician Assistant Professor Jessica Mar (Albert Einstein College of Medicine, USA) and computational biologist Professor John Quackenbush (Harvard University, USA). Her primary focus is evaluating whether molecular variability in stem cell populations is an important, but until now hidden, predictor of cellular behaviour and phenotype. Phenotypic heterogeneity in clonally derived cell populations is ubiquitous, and biologically relevant information is often masked by population-averaging techniques, as opposed to individual cell-based measurements. She has developed new network approaches that incorporate gene expression variance, with the goal of identifying genetic elements that stabilize a cell phenotype or push a cell to transition between phenotypes.
During her PhD, Lizzi was invited to present her work in departmental seminars at the Harvard Stem Cell Institute, the Lieber Brain Institute at Johns Hopkins University, and the Black Family Stem Cell Institute at Mount Sinai Hospital, New York. She was also one of 12 international scientists invited to participate in the Radcliffe Exploratory Workshop for Variation at Harvard University in 2011. She is currently based with Professor Christine Wells in the Centre for Stem Cell Systems at the University of Melbourne, where she is working on applied statistical methods to evaluate molecular variability in single cell RNA-seq data.

Monday November 13, 2017 (No seminar)

Monday November 6, 2017

Speaker: Shila Ghazanfar (The University of Sydney)

Title: Integrated single cell data analysis for understanding mechanisms of neuronal diversity

Abstract: Technological advances such as large-scale single cell transcriptome profiling have exploded in recent years and enabled unprecedented insight into the behaviour of individual cells. Using single cell RNA sequencing data to identify genes with high levels of expression can help characterise very active genes and the cells in which this activity occurs. In particular, single cell RNA-Seq allows for cell-specific characterisation of high gene expression, as well as of gene coexpression. In this talk, I will describe a versatile modelling framework for identifying transcriptional states, motivated by a collaborative project involving neuronal single cell data. Neuronal cell systems exhibit extraordinary levels of complexity, so it is of great interest to explore the ways in which neuronal diversity is generated and manifested to achieve such complexity. One proposed mechanism is the patterning of gene transcription across neurons. We will describe an approach using bioinformatics and statistics to evaluate evidence for gene transcriptional mosaics as a mechanism for achieving diversity of neuronal cells.

About the speaker: Shila has recently completed her PhD in Statistical Bioinformatics at The University of Sydney. Her research interests are in statistical analysis of data arising from high throughput sequencing technologies such as RNA-Seq in various research contexts.

Monday October 30, 2017 (No seminar)

Monday October 23, 2017

Speaker: Eva Chan (Garvan Institute)

Title: Detecting Complex Genomic Rearrangements using Optical Mapping

Abstract: Genomic rearrangements are common in cancer, with demonstrated links to disease progression and treatment response. These rearrangements can be complex, resulting in fusions of multiple chromosomal fragments and the generation of derivative chromosomes. Comprehensively detecting complex genomic rearrangements (CGRs) in cancer remains challenging: no single approach can identify all structural variations, as each has its own strengths and weaknesses. In this seminar, I will demonstrate the utility of whole genome optical mapping in capturing CGRs. I will further showcase an example of using optical mapping to capture chained fusion events in a well-studied liposarcoma cell line. Using this approach, we identified fusion maps that clearly revealed chained fusion architectures (content, order, orientation and size), as well as large rearrangement junctions that are undetectable by sequencing alone. I hope to convince you that optical mapping is an important complement to existing technologies for detecting and reconstructing complex genomic rearrangements.

About the speaker: Senior Bioinformatics Research Officer, Human Comparative and Prostate Cancer Genomics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, The Kinghorn Cancer Centre

Monday October 16, 2017 (PLEASE NOTE: special time of 2:00PM)

Speaker: Natalie Thorne (Melbourne Genomics)

Title: Clinical bioinformatics – what does it really take to translate research into practice?

Abstract: The Melbourne Genomics Health Alliance has taken a collaborative, patient-centred, clinically driven, evidence-based and sustainable approach to delivering genomic testing. This year the Alliance has begun implementing Victoria’s new clinical system for genomics. A platform for bioinformatics analysis and a tool for variant curation will be among the first components to be implemented and used for accredited clinical genomic testing by diagnostic laboratories. Operating within this shared digital system, however, challenges laboratories to coordinate with other diagnostic laboratories and hospitals while also supporting their own business requirements for accreditation and continual innovation.
At the heart of diagnostic innovation in genomics is the emerging field of clinical bioinformatics, which combines the clinical, diagnostic, analytical, software and genetic aspects of implementing clinical genomic testing. The field faces two key challenges: first, it is in its infancy, and laboratories lack the support of a mature discipline; second, it demands skills and expertise that are largely lacking in traditional academia. These include developing enterprise-grade solutions, devising complex strategies for organisational change, multi-stakeholder collaboration, community engagement and keeping pace with rapidly evolving biotechnology.
Drawing on my experiences working with the Melbourne Genomics and Australian Genomics Health Alliances, I will discuss the challenges and opportunities in clinical bioinformatics, including the use of ‘implementation science’ for translating research bioinformatics into clinical practice.

About the speaker: TBA

Monday October 9, 2017

Speaker: Saskia Freytag (WEHI)

Title: Cluster Headache: Comparing Clustering Tools for 10X Single Cell Sequencing Data

Abstract: The commercially available 10x Genomics protocol for generating droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance as to which method is most accurate. Answering this question is complicated by the fact that 10x Genomics data lack cell labels. Thus, in this review we compared the clustering solutions of a dozen methods on three datasets of human peripheral blood mononuclear cells generated with 10x Genomics technology. While clustering solutions appeared robust, we found that the solutions produced by different methods had little in common with each other. They also failed to replicate cell type assignments generated with supervised labelling approaches. Furthermore, we demonstrated that all clustering methods tested clustered cells to a large degree according to the amount of ribosomal RNA in each cell.
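The abstract reports that different methods' solutions have little in common without saying how agreement was measured. One common choice, assumed here purely for illustration, is the adjusted Rand index (ARI): it equals 1 for identical partitions (however the clusters are labelled) and is close to 0 for chance-level agreement. A minimal pure-Python sketch:

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two flat clusterings of the same cells."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    total = comb(len(labels_a), 2)  # number of cell pairs
    if total == 0:
        return 1.0
    # Contingency counts n_ij: cells placed in cluster i by A and cluster j by B.
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case, e.g. both clusterings put every cell in one cluster.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)
```

Because the ARI compares two partitions directly, it needs no ground-truth cell labels, which suits 10x data where such labels are unavailable; the same function can also compare a clustering against supervised cell type assignments.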

About the speaker: Saskia completed her Masters in Statistical Science at University College London. After finishing she moved back to Germany, where she completed a PhD in Biostatistics in 2014. She then got the opportunity to relocate to Melbourne to work as a Post-Doctoral Fellow at the Walter and Eliza Hall Institute in Melanie Bahlo’s group. Her research focus is methodological development for the analysis of high throughput sequencing data. She is co-founder of R-Ladies and an ambassador for CHOOSEMATHS.

Monday October 2, 2017 (No seminar - Labour Day public holiday)

Monday September 25, 2017 (No seminar)

Monday September 18, 2017

Speaker: Rebecca Poulos (UNSW)

Title: The use of big data in the search for cis-regulatory driver mutations in cancer genomes

Abstract: Mutations that directly alter protein function have been extensively studied in cancer. However, in recent years it has become feasible to examine the cancer-causing role of mutations within the remaining 98% of the genome, which is non-coding. Here I will present our use of big data in the study of cis-regulatory somatic mutations in cancer genomes. We analysed somatic mutations from over 1,000 cancer genomes across 14 cancer types, focusing specifically on promoter regions. These regulatory regions are often bound by proteins, and we discovered remarkable ‘mutation hotspots’ at sites of protein binding. To understand why these hotspots formed, we used genome-wide maps of nucleotide excision repair (NER) to show that sites of protein binding have reduced levels of NER. Our analyses thereby uncovered a previously unknown mechanism linking reduced NER to the formation of mutation hotspots at promoters. To determine how these hotspots might affect cancer development, we investigated whether these mutations can affect the ability of a protein to bind DNA, by analysing skin cancer mutations at the binding sites of the protein CTCF. Performing CTCF ChIP-seq in a melanoma cell line, we demonstrated the functionality of such mutations through allele-specific reduction of CTCF binding to mutant alleles. Finally, we sought to determine the role of DNA methylation (a common epigenetic modification) in the occurrence of somatic mutations in cancer. We correlated mutation load with methylation across 15 cancer types and subtypes, and showed that reduced levels of methylation in regulatory regions may be responsible for reduced mutation loads at such loci in colorectal cancer. Taken together, these analyses develop our understanding of the formation and repair of mutagenic lesions in cis-regulatory regions of cancer genomes, providing insight into the search for driver mutations at such loci.

About the speaker: Rebecca Poulos is a researcher in the ‘Bioinformatics and Integrative Genomics’ group at the Lowy Cancer Research Centre at UNSW Sydney. Rebecca’s research is in the field of cancer genomics, where she uses ‘big data’ to study DNA mutation and repair processes in regulatory regions of cancer genomes. Her research output includes first-author publications in ‘Nature’ and ‘Cell Reports’, together with a review article, editorial and book chapter on non-coding driver mutations in cancer. Rebecca studied science and business at the University of Technology Sydney. She is currently at UNSW Sydney, where she completed her Honours year (with University Medal) and is finalising her PhD research (with UNSW Research Excellence Award).

Monday September 11, 2017

Speaker: Mark Segal (UCSF)

Title: Statistical and Computational Challenges in Conformational Biology

Abstract: Chromatin architecture is critical to numerous cellular processes, including gene regulation, while conformational disruption can be oncogenic. Accordingly, discerning chromatin configuration is of basic importance; however, this task is complicated by a number of factors including scale, compaction, dynamics and inter-cellular variation. The recent emergence of a suite of proximity ligation-based assays, notably Hi-C, has transformed conformational biology, with, for example, the elicitation of topological and contact domains providing a high-resolution view of genome organization. Such conformation capture assays provide proxies for pairwise distances between genomic loci, which can be used to infer 3D coordinates, although much downstream analysis bypasses this reconstruction step. After demonstrating the advantages of obtaining 3D genome reconstructions, in particular from superposing genomic attributes on a reconstruction and identifying extrema (‘3D hotspots’) thereof, we showcase methodological challenges surrounding such analyses. Open issues highlighted include (i) performing and synthesizing reconstructions from single-cell assays, (ii) devising rotation-invariant methods for 3D hotspot detection, (iii) assessing genome reconstruction accuracy, and (iv) averting reconstruction uncertainty by directly integrating Hi-C data and genomic features. By using p-values from (epi)genome-wide association studies as the feature, the latter approach provides a conformational lens for viewing GWAS findings.
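Inferring 3D coordinates from pairwise distance proxies can be illustrated with classical multidimensional scaling (MDS), one standard way to embed points given their distances. The talk does not say which reconstruction method is used, so this is an assumed stand-in. It also presumes that contact frequencies have already been converted into a complete distance matrix (for example via a hypothetical inverse power law), and published Hi-C reconstruction methods are generally more elaborate.

```python
import numpy as np


def classical_mds(D, dim=3):
    """Embed loci in `dim` dimensions from a symmetric pairwise distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    # Double-centre the squared distances: B = -1/2 * J D^2 J, with J the centring matrix.
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    eigvals, eigvecs = np.linalg.eigh(B)
    # Keep the `dim` largest eigenvalues; clip negatives that noisy distances can produce.
    order = np.argsort(eigvals)[::-1][:dim]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Round trip on synthetic loci: distances computed from known 3D points.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
Y = classical_mds(D)
```

The coordinates returned are determined only up to rotation, reflection and translation, so comparing reconstructions requires an alignment step; this is one reason rotation-invariant hotspot detection appears among the open issues listed in the abstract.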

About the speaker: TBA

Monday September 4, 2017

Speaker: Dr Fabio Luciani (UNSW)

Title: A systems immunology approach to study antigen-specific T cells in viral infection

Abstract: Immunological memory is a cardinal feature of human adaptive immunity; it is critical for prophylactic vaccination and has recently been shown to play an important role in determining the outcome of T cell-based immunotherapies in cancer. Although cytotoxic T cells can have a significant impact on disease clearance, the essential phenotype of a clinically successful T cell and how this influences therapeutic efficacy remain largely undefined. In this presentation I will describe our systems immunology approach to tackling these issues. I will review recent studies on longitudinal samples of primary HCV infection, using flow cytometry for phenotyping virus-specific T cells along with single cell transcriptomic and TCR diversity analyses. Future directions involve applying this systems immunology approach to other viral infections, as well as to understanding how long-term T cell memory protection is achieved.

About the speaker: A/Prof. Fabio Luciani trained as a theoretical physicist (Masters) and a theoretical biologist (PhD 2006, Humboldt University of Berlin, Germany). His research interests include adaptive immune responses against pathogen infections, computational models for studying host-pathogen interactions, and bioinformatics analyses of high-throughput next generation sequencing data. He has applied mathematical modelling to understand infectious diseases, focussing on the transmission dynamics of drug-resistant Mycobacterium tuberculosis and the transmission of hepatitis C virus (HCV) among injecting drug users. He has made several contributions to understanding how HCV infects a new host and the role of T cell-mediated responses, using next generation sequencing technologies, flow cytometry and statistical modelling. More recently, he has moved into single cell genomics and systems immunology approaches to understand T cell dynamics. He currently holds an NHMRC Career Development Fellowship and leads a systems immunology group, where he conducts both wet- and dry-lab research in the field of immune responses against pathogens. During his career he has published more than 80 papers in both specialised and general journals.

Monday August 28, 2017

Speaker: John-Sebastian Eden (Charles Perkins Centre, USyd)

Title: Using RNA-Seq to reveal the Australian Virome

Abstract: TBA

About the speaker: TBA

Monday August 21, 2017

Speaker: Lori Chibnik (Harvard School of Public Health)

Title: Genomic journeys into neuropathology and cognitive reserve in an aging population

Abstract: TBA

About the speaker: Dr. Lori Chibnik, PhD, MPH, is a biostatistician and Assistant Professor with a joint appointment in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health and the Department of Medicine at Harvard Medical School. She received her MPH in International Health and her PhD in biostatistics from Boston University, where she worked on predictive modeling methods for disease risk. Over her career she has developed and assessed predictive models in areas such as HIV, prenatal screening and autoimmune diseases, and continues to apply her methods to various diseases. Dr. Chibnik’s current research focuses primarily on the genetics and genomics of Alzheimer’s disease and dementia, with an emphasis on longitudinal cohorts. In addition to her research, she is internationally renowned for her training programs and innovative teaching techniques, having developed multiple courses in biostatistics for varied audiences. While at Boston University she managed the Summer Institute for Training in Biostatistics, an NHLBI-funded, six-week summer program designed to bring undergraduate students into the fields of biostatistics and public health. Most recently she developed and implemented a series of biostatistics and programming courses tailored to the needs of scientists in sub-Saharan Africa. Currently she directs the Global Initiative for Neuropsychiatric Genetics Education in Research at the Harvard Chan School and the Stanley Center for Psychiatric Research in the Broad Institute of Harvard and MIT.

Seminars in 2017, Semester 1

Show talks from Semester 1 / Hide talks from Semester 1

Monday June 26, 2017

Speaker: Timothy Peters (Epigenetics Research Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research)

Title: Robust and flexible de novo calling of differentially methylated regions

Abstract: DNA methylation is a dynamic, environmentally sensitive modification implicated in a large array of biological processes, from transcription factor binding to being a reliable predictor of age. Hence accurate and interpretable statistical modelling of the methylome is of great importance when investigating epigenetic cell states. DMRcate is a Bioconductor package that calls differentially methylated regions (DMRs) from replicated Illumina array (including the new EPIC array) and whole genome bisulfite sequencing (WGBS) experiments, under general experimental design. It uses a tunable kernel smoother and whole-methylome significance testing to find and rank the most differentially methylated regions for a given hypothesis. It is fast, delivering DMRs on the order of seconds for arrays and minutes for WGBS. Package features include:
• Adjustable kernel size
• Guidance for users towards appropriate false discovery rate (FDR) thresholds
• Annotation-agnostic calling
• Options for filtering Illumina probes known to be polymorphic and/or to cross-hybridise to off-target genomic sites
• Automatic post-calling annotation of DMRs with known Gencode promoter regions
• Output in GenomicRanges and bedGraph format
• Elegant plotting of DMRs using the Gviz package, including proximal Gencode gene loci
• Calling of variably methylated regions (VMRs) from Illumina arrays

DMRcate takes into account a number of biological and statistical considerations when defining DMRs, such as irregular spacing of CpG sites and the distribution of variances across CpGs as a result of variable sequencing depth.

Reference: Peters et al (2015) De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin. 2015 Jan 27;8:6. doi: 10.1186/1756-8935-8-6.
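The idea of smoothing per-CpG evidence along the genome, so that irregularly spaced CpGs are weighted by genomic distance rather than by index, can be sketched as follows. This is a minimal illustration under stated assumptions, not DMRcate's implementation: it assumes a Gaussian kernel truncated at a hypothetical cut-off `lam` (in base pairs) with a hypothetical scale `sigma`, applied to hypothetical per-CpG squared t-statistics, and it omits the significance-testing step entirely.

```python
import math


def kernel_smooth(positions, stats, lam=1000, sigma=500.0):
    """Gaussian-kernel smoothing of per-CpG statistics along one chromosome.

    positions: CpG coordinates in base pairs (spacing may be irregular).
    stats: per-CpG statistics, e.g. squared t-statistics (assumed input).
    lam: hypothetical cut-off; CpGs more than lam bp apart get zero weight.
    sigma: hypothetical scale of the Gaussian kernel, in bp.
    """
    smoothed = []
    for x0 in positions:
        num = 0.0
        den = 0.0
        for x, s in zip(positions, stats):
            d = abs(x - x0)  # genomic distance, so spacing is respected
            if d <= lam:
                w = math.exp(-0.5 * (d / sigma) ** 2)
                num += w * s
                den += w
        # den >= 1 because each CpG contributes weight exp(0) = 1 to itself
        smoothed.append(num / den)
    return smoothed


# Hypothetical example: three clustered CpGs and one isolated CpG
positions = [100, 150, 180, 5000]
t_squared = [9.0, 16.0, 4.0, 25.0]
result = kernel_smooth(positions, t_squared)
print(result)
```

The clustered CpGs borrow strength from each other, while the isolated CpG at position 5000 has no neighbours within the cut-off and keeps its own value; changing `lam` is the analogue of adjusting the kernel size.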

About the speaker: Tim’s background is in bioinformatics and applied statistics. He completed a PhD on the principles of statistical learning for transcriptomic data in the Department of Statistics at Macquarie University in 2012. He has worked as a Postdoctoral Fellow at CSIRO on the EpiSCOPE project: mapping the epigenetic terrain of human adipocytes, performing statistical analyses for human EWASs (epigenome wide association studies) and has published a novel method for statistical inference of whole-methylome data. In addition, he has spoken at a number of national and international conferences, including an oral presentation at the Joint Statistical Meetings (JSM) in Washington, DC.

Monday June 19, 2017

Speaker: Geoff Barton (Professor of Bioinformatics and Head of Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK)

Title: Identification of novel functional sites in protein domains from the analysis of human variation

Abstract: In this talk I will present a new analysis that compares publicly available variation data for humans with variation seen across all available protein sequences regardless of species. The analysis confirms that patterns of variation in humans are consistent with protein structural features, but highlights structurally and functionally important sites in around 15,000 human protein domains that are not found by conventional sequence analysis methods. The identified sites are enriched in disease-associated variants and ligand binding residues. I will explain the method and illustrate the new analysis with a number of examples, including the Nuclear Receptor Ligand Binding Domains and G-protein coupled receptors (GPCRs), which are important therapeutic targets. The study makes heavy use of the popular Jalview (www.jalview.org) sequence analysis program developed in my group, so I will also give a brief update on Jalview’s new features for exploring nsSNPs on alignments.

Note: This is a joint event where Prof. Geoff Barton will be giving a talk to all in CPC. Time and location will be announced later.

Monday June 12, 2017, No meeting (Queen's Birthday)

Monday June 5, 2017

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Unsupervised recurrent neural network for cell event detection in videos

Abstract: In this talk, we will present an automatic unsupervised cell event detection and classification method for cell videos based on convolutional and recurrent neural networks. Cell images captured from various biomedical applications often possess different visual characteristics regarding cell appearance, motility, and cell activities. This makes it difficult to find a generic solution for the automatic detection of cell events (division, death, differentiation, etc.) in videos. Current methods for event detection rely on human observers with specific expertise and long hours of labor; this also makes supervised training a sub-optimal choice. We use a convolutional Long Short-Term Memory (LSTM) neural network structure that simultaneously exploits both spatial visual features and temporal patterns of objects to filter and classify possible cell events in a video sequence. Our model design allows for the detection and classification of cell events without the need for labeled training data; we will demonstrate our model for the detection of mitosis events.

About the speaker: TBA

Wednesday May 31, 2017

Speaker: Stephen Leslie (Centre for Systems Genomics, Schools of Mathematics and Statistics, and BioSciences, The University of Melbourne)

Title: Genetics and Geography: Using genomic data to understand population history and demography

Abstract: In this talk Stephen will present some of the findings from the People of the British Isles project, which was published in Nature in March 2015 (and featured on the cover), and some more recent work following on from this study. In particular he will show that using newly developed statistical techniques one can uncover subtle genetic differences between people from different regions at a hitherto unprecedented level of detail. For example, in the UK one can separate the neighbouring counties of Devon and Cornwall, or two islands of Orkney, using only genetic information. Stephen will then show how these genetic differences reflect current historical and archaeological knowledge, as well as providing new insights into the historical make up of the British population, and the movement of people from Europe into the British Isles. This is the first detailed analysis of very fine-scale genetic differences and their origin in a population of very similar humans. The key to the findings of this study is the careful sampling strategy and an approach to statistical analysis that accounts for the correlation structure of the genome. The methods developed are readily extended to analyses in other populations.

About the speaker: Associate Professor Stephen Leslie is a statistician working in the field of mathematical genetics. A/Prof. Leslie did his undergraduate degree at ANU, including honours in Mathematics. He obtained his doctorate from the Department of Statistics, University of Oxford in 2008, followed by post-doctoral work at Oxford, before becoming the Head of Statistical Genetics at Murdoch Childrens Research Institute in 2012. Since 2016 Stephen has been at the University of Melbourne as Associate Professor of Statistical Genomics, in the Schools of Mathematics and Statistics, and Biosciences, and the Centre for Systems Genomics. In late 2016 he was awarded the Woodward Medal in Science and Technology, the University of Melbourne’s highest award for staff, which is given for research that has made the most significant contribution to knowledge in the five years prior to the award. A/Prof. Leslie's work covers several aspects of statistical and population genetics. His group's main focus is on methodological developments for the analysis of high throughput genetic data and the application of these methods to studies of disease and natural population variation. These methods typically combine modern computationally-intensive statistical approaches with insights from population genetics models. Specifically the group works on statistical methods for imputing immune system (and other) genes from incomplete genetic data; the application of these methods to studies of autoimmune and other diseases; methods for detecting and controlling for population stratification; and understanding the causes and consequences of genetic variation in populations.

Monday May 29, 2017

Speaker: Tram Doan (Westmead Millennium Institute, Sydney Medical School)

Title: RNA-seq profiling of normal human breast epithelial cells reveals unexpected nuclear receptor segregation

Abstract: The ovarian hormone progesterone is a key regulator of female reproductive function. The established role of progesterone analogues used in hormone replacement therapy in increasing breast cancer risk has sharpened focus on the mechanisms of action of this hormone in the normal breast. Progesterone plays an essential role in the development of lobular alveolar structures in the breast, through stimulation of proliferation during the normal menstrual cycle and pregnancy. We previously reported that the progesterone receptor (PR) was present in the progenitor-enriched normal breast cell population and likely mediates proliferative effects in those cells. In the present study, we profiled the transcriptome of normal human breast epithelial cells at single cell resolution. The aims were to 1) identify the number and functional characteristics of different cell populations in the normal breast epithelium, and 2) characterise PR expression and lineage association in different normal breast epithelial cell types. We show that progesterone exerts distinct functional roles in different normal breast epithelial cell types, and that PR is expressed more frequently in progenitor cells and has the strongest transcriptional effect in this cell population.

About the speaker: TBA

Monday May 22, 2017

Speaker: Kevin Wang (Statistical Bioinformatics Group, School of Mathematics and Statistics)

Title: A bias correction method to identify over-represented gene-sets for boutique arrays

Abstract: Gene annotation databases such as Gene Ontology (GO) and the Kyoto Encyclopedia of Genes and Genomes (KEGG) are important tools in gene set tests (GSTs); they describe genes in terms of their associated biological functions and pathways. The purpose of this type of enrichment analysis is to assign biologically meaningful terms to each gene. Associations between a gene set and biological functions of interest can then be established by considering statistically over-represented annotation terms. Traditionally this is done through Fisher’s Exact Test (FET), assuming gene expression arrays capture the complete genome or at least a very large proportion of it. However, this assumption is satisfied neither by the increasingly popular boutique arrays nor by custom-designed gene expression profiling platforms. Specifically, conventional enrichment analysis is no longer appropriate due to the gene set selection bias induced during the construction of the arrays. By introducing bias correction terms in the contingency table, we propose an adjustment to the traditional hypergeometric test statistic in FET. The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome, in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. In this work we demonstrate a method to adjust the over-representation p-value over a grid in $[0,1]^2$. Using our own Shiny application, we will illustrate the advantages and practicality of the method through multiple differential gene expression analyses in melanoma and other cancers.
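For reference, the conventional over-representation test that the proposed adjustment starts from can be written as a one-sided hypergeometric tail probability (equivalent to a one-sided Fisher's exact test). The sketch below does not implement the bias correction described in the talk; with hypothetical counts, it only shows how strongly the p-value depends on the assumed gene universe, the whole genome versus only the genes printed on a boutique array.

```python
from math import comb


def overrep_pvalue(k, universe, annotated, selected):
    """One-sided hypergeometric p-value P(X >= k).

    universe: number of genes in the background (genome or array)
    annotated: background genes carrying the annotation term
    selected: genes in the list of interest (e.g. differentially expressed)
    k: selected genes that carry the term
    """
    upper = min(annotated, selected)
    hits = sum(
        comb(annotated, i) * comb(universe - annotated, selected - i)
        for i in range(k, upper + 1)
    )
    return hits / comb(universe, selected)


# Hypothetical counts: 100 selected genes, 5 of which carry the term.
p_genome = overrep_pvalue(k=5, universe=20000, annotated=200, selected=100)
p_array = overrep_pvalue(k=5, universe=800, annotated=40, selected=100)
print(f"genome background: {p_genome:.2e}, array background: {p_array:.3f}")
```

Under a genome-wide background only about one annotated gene is expected among the 100 selected, so observing five looks highly significant; against an array background where five are expected, the same observation is unremarkable. This is the kind of universe mismatch that the adjustment in the talk is designed to address.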

About the speaker: I am currently a PhD candidate at the School of Mathematics and Statistics under the supervision of Prof. Jean Yang, A/Prof. Samuel Mueller and Dr. Garth Tarr. I am working in the area of statistical bioinformatics and have strong interests in developing novel methods for high-dimensional biomedical data. My current research focuses on the increasingly popular boutique array platform and its application as both a discovery and a validation platform for biomarkers in melanoma patients.

Monday May 15, 2017

Speaker: Fabian Held (The Life Lab, Charles Perkins Centre)

Title: Challenges of modelling (collaborative) networks

Abstract: With a constantly expanding body of scientific knowledge and expertise, collaboration is essential for the vast majority of research projects. However, we know very little about the complex interactions that affect the success and failure of collaborations, which may include collaborators’ personal attributes, their dynamics as a team, and the environment they’re working in. This is especially challenging when success and failure are not clearly defined. In this presentation Fabian will give an update on his progress in evaluating the Charles Perkins Centre’s effectiveness at “challenging prevailing dogmas, generating new ideas and translating knowledge into action” through facilitation of diverse collaborations. In particular, he will focus on his attempts at a statistical analysis of the network of collaborations that have emerged in the CPC through the co-location of research groups and facilitation of project nodes. He will address conceptual, methodological and technical issues of approaching this problem with exponential-family random graph models on the HPC cluster.

About the speaker: TBA

Monday May 8, 2017

Speaker: Dario Strbenac (Statistical Bioinformatics Group, SoMS, USyd)

Title: Design, Experimentation and Analysis of a Spike-in iTRAQ Proteomics Dataset Reveals Unexpected Aspects of Measurement Bias and Variance

Abstract: A replicated Latin squares experimental design was created to explore a variety of factors that influence the accuracy and precision of measurements, which were assessed using a set of 15 performance metrics. The experiment consisted of 21 non-yeast proteins spiked into a background yeast proteome in seven instrument runs of 8 samples, labelled using the iTRAQ 8-plex kit. First, the effect of the particular iTRAQ label used was greater than the effect of instrument run. Second, dividing the quantities of different proteins within the same run yielded reasonably accurate fold changes, providing a counter-example to the commonly accepted rule that measurements of different proteins can't be directly compared. Third, the method of summarising peptides to a protein-level summary was found to have little effect. Finally, simple point-and-click normalisation using ProteinPilot resulted in better estimation of fold changes at the expense of increased variance, and did not perform substantially worse on any other performance metric than methods like RUV or linear models, suggesting that commercial software can enable good quality analyses to be done quickly and accurately. The raw dataset is available from ProteomeXchange, so anyone can apply their own normalisation method to it, upload the protein quantities to the web application and see how their method's performance compares to other methods.
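To make two of the steps mentioned above concrete, summarising peptides to a protein-level quantity and taking a ratio between reporter channels to get a fold change, here is a minimal sketch that uses the median of peptide-level log2 ratios. The input format, channel labels, intensities and the choice of the median are all hypothetical illustrations, not the study's pipeline; the abstract itself reports that the choice of summarisation method had little effect.

```python
import math
from statistics import median


def protein_log2_fold_change(peptide_intensities, numerator, denominator):
    """Median of per-peptide log2 ratios between two iTRAQ reporter channels.

    peptide_intensities: list of dicts mapping channel label -> reporter
    intensity, one dict per peptide assigned to the protein (hypothetical
    input format).
    """
    log_ratios = [
        math.log2(p[numerator] / p[denominator])
        for p in peptide_intensities
        if p[numerator] > 0 and p[denominator] > 0  # skip missing reporters
    ]
    return median(log_ratios)


# Hypothetical peptides from one spiked-in protein, channels "113" and "114"
peptides = [
    {"113": 1000.0, "114": 2000.0},
    {"113": 500.0, "114": 1100.0},
    {"113": 800.0, "114": 1500.0},
]
lfc = protein_log2_fold_change(peptides, numerator="114", denominator="113")
print(f"log2 fold change: {lfc:.3f}, fold change: {2 ** lfc:.3f}")
```

Working on the log scale makes up- and down-regulation symmetric, and the median limits the influence of a single noisy peptide.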

Monday May 1, 2017

Speaker: Tim Burykin (The Life Lab, Charles Perkins Centre)

Title: Call for data: Exploration and visualization of complex datasets with a novel method

Abstract: As a member of professional staff, I help academics at the Charles Perkins Centre to visualize their data for presentation, teaching or research purposes. In the first part of the talk I will briefly demonstrate how my images and videos were used to support the narrative of high-impact presentations. I will then focus on the generic method behind these visuals and discuss its usefulness for the exploration, and potentially the in-depth analysis, of complex datasets of almost any nature. The talk will suit people who want to look at their data from a different angle or who are searching for a friendly yet comprehensive way to convey their work to a broader audience.

About the speaker: I received my Master of IT degree in Russia and moved to Sydney to complete a PhD course in Agriculture under the supervision of Prof. John Crawford. My project was concerned with three-dimensional modelling, analysis and visualization of soil microenvironment and leaf cellular structures. Accumulated experience in computer graphics and efficient algorithm development enabled me to join Judith & David Coffey LifeLab at Charles Perkins Centre as a data visualization technician.

Monday April 24, 2017

Speaker: Alistair Senior (School of Mathematics and Statistics, Charles Perkins Centre)

Title: Meta-analytic tools to detect overlooked variance effects in biological systems

Abstract: Medically, the effects of a treatment on among-individual variation in health have direct implications for personalized medicine. Ecologically, among-individual variation governs a species' niche and is the grist of evolution by natural selection. However, experimental designs and analytical paradigms in biology are heavily focused on detecting the effects of treatments on population averages. As a result, we have a comparatively poor understanding of how environments and treatments affect among-individual variation. Over the last few years I have been developing tools for meta-analysis, which allow the user to combine the results of published studies to assess the effects of treatments on variation. These methods require only those summary statistics that are reported as a matter of standard practice, and integrate easily with commonly used meta-analytic software. I will present a summary of the methodology, as well as examples of its application that are pertinent to the research goals of the Charles Perkins Centre.
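The abstract does not spell out which effect sizes are used. One widely used measure for comparing variability between a treatment and a control group from routinely reported summary statistics (group SDs and sample sizes) is the log variability ratio, lnVR, which includes a small-sample bias correction. The sketch below computes lnVR per study and pools the studies with a simple fixed-effect, inverse-variance model. The study summaries are hypothetical, and a real analysis would more often fit a random-effects or multilevel model, or use the log coefficient of variation ratio (lnCVR) when variability scales with the mean.

```python
import math


def ln_vr(sd_t, n_t, sd_c, n_c):
    """Log variability ratio (treatment vs control) and its sampling variance."""
    estimate = (math.log(sd_t / sd_c)
                + 1 / (2 * (n_t - 1)) - 1 / (2 * (n_c - 1)))
    variance = 1 / (2 * (n_t - 1)) + 1 / (2 * (n_c - 1))
    return estimate, variance


def fixed_effect_pool(estimates, variances):
    """Inverse-variance weighted mean and its standard error."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


# Hypothetical studies: (sd_treatment, n_treatment, sd_control, n_control)
studies = [(12.0, 20, 10.0, 20), (9.5, 15, 8.0, 16), (15.0, 30, 11.0, 28)]
effects = [ln_vr(*s) for s in studies]
pooled, se = fixed_effect_pool([e for e, _ in effects], [v for _, v in effects])
print(f"pooled lnVR = {pooled:.3f} (SE {se:.3f}); "
      f"SD ratio ~ {math.exp(pooled):.2f}")
```

A positive pooled lnVR indicates that the treatment increases among-individual variability; exponentiating it gives the ratio of standard deviations.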

About the speaker: I did my undergraduate and masters degrees in the UK, where my research was primarily directed towards questions in ecology and evolution. In 2010 I moved to the University of Otago to do a PhD on gene-environment interactions in determining phenotypic sex, with Shinichi Nakagawa. During this period, I developed an interest in the development and application of hierarchical statistics to questions in biology. After graduating, in early 2014 I moved to Sydney where I began working with Profs Simpson and Raubenheimer to apply my quantitative skills to questions in nutritional ecology.

Monday April 17, 2017 (Easter Monday)

Monday April 10, 2017

Speaker: Pengyi Yang (School of Mathematics and Statistics, Charles Perkins Centre)

Title: A dynamic multi-omic atlas of the transition from naive to primed pluripotency.

Abstract: Embryonic stem cells (ESCs) have the potential to generate virtually any differentiated cell type, to establish new models of mammalian development and to create new sources of cells for treating an enormous range of diseases. To elucidate the molecular pathways underpinning the transition from naïve to primed pluripotency cell states, we quantified the dynamic changes in the proteome, phosphoproteome, transcriptome, and epigenome underpinning the transition between these cellular states with high temporal resolution. We observed widespread remodelling of the cell across all regulatory layers, and yet the rate, extent and magnitude of phosphorylation changes exceed those observed on other levels, emphasising a critical role for phosphorylation in this process. Our dynamic phosphoproteomics data reveal that ERK and mTOR signalling branches dominate early and late signalling network activity, respectively, during the ESC to EpiLC transition. Collectively these data provide insight into the molecular processes underlying naïve and primed states, highlighting numerous potential gatekeeper mechanisms governing ESC pluripotency.

About the speaker: I obtained my PhD in bioinformatics from the School of Information Technologies, The University of Sydney, in 2012. I then moved to the United States and completed an interdisciplinary Research Fellowship in the Systems Biology Group, ESCBL, at the National Institutes of Health, characterising transcriptomic and epigenomic regulation in embryonic stem cells (ESCs) using ultrafast sequencing data. I relocated back to Australia in late 2015 on a University of Sydney Postdoctoral Fellowship (DVCR) to pursue my own research in systems biology. I’m now affiliated with the School of Mathematics and Statistics (SoMS) and the Charles Perkins Centre, The University of Sydney. I have been offered a Lectureship at USyd (April 2016) and a Discovery Early Career Researcher Award (DECRA).

Monday April 3, 2017 (Hunter Meeting)

Monday March 20, 2017

Speaker: Ellis Patrick (School of Mathematics and Statistics)

Title: Deconstructing the innate immune component of a molecular network of the aging frontal cortex.

Abstract: Alzheimer’s disease is pathologically characterized by the accumulation of neuritic β-amyloid plaques and neurofibrillary tangles in the brain, and clinically associated with a loss of cognitive function. The dysfunction of microglial cells has been proposed as one of the many cellular mechanisms that can lead to an increase in Alzheimer’s disease pathology. Investigating the molecular underpinnings of microglial function could help isolate the causes of dysfunction while also providing context for broader gene expression changes already observed in mRNA profiles of the human cortex. In this talk I will lay out the various statistical approaches I have used to tackle this problem.

About the speaker: Dr Ellis Patrick is a computational biologist and applied statistician. He is currently an Early Career Development Fellow in the School of Mathematics and Statistics and a staff member at The Westmead Institute for Medical Research. He obtained his PhD in statistical bioinformatics in the School of Mathematics and Statistics at the University of Sydney. In his postdoctoral studies, he worked as a computational biologist with joint appointments at Brigham and Women's Hospital, Harvard Medical School and The Broad Institute of MIT and Harvard. He spent this time using his statistical background to investigate the molecular drivers of Alzheimer’s disease and multiple sclerosis (MS). As he spends most of his time analysing large biomedical datasets, his research relies on the subtle task of translating between biological and statistical concepts to form simple, suitable and targeted statistical questions.

Seminars in 2016, Semester 2

Show talks from Semester 2 / Hide talks from Semester 2

Monday November 14, 2016

Speaker: Darya Vanichkina (Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Marvellous complexity: Exploring the mammalian transcriptome using RNA sequencing

Abstract: The complexity of the trillions of cells that comprise the mammalian body is underpinned not by their genomes, which are by definition identical, but by the temporally and physically precise expression of particular coding genes, long non-coding RNAs and small regulatory RNAs. In my talk, I will present some of the outcomes of my PhD research, which focussed on using and expanding upon developments in RNA sequencing technology and stem cell differentiation to deeply investigate transcripts in cortical and hindbrain-like neurons, and in oligodendrocyte precursor cells. I will also give an overview of my current work, which involves exploring the roles of alternative splicing in controlling gene expression, and the development of new methods of analysing splicing complexity.

About the speaker: I am a genomics data scientist at the Gene and Stem Cell Therapy Program at the Centenary Institute in Sydney, where I investigate how the mammalian genome works using next-generation sequencing. I use a combination of preexisting bioinformatics software and custom R, Python, and shell scripts to process terabytes of data on a daily basis, taking advantage of the University of Sydney's HPC facilities. I recently completed my PhD in Bioinformatics and Genomics at the Institute for Molecular Bioscience at the University of Queensland, under the supervision of Dr. Ryan Taft and Professor John Mattick. My work focused on using high-throughput sequencing to understand changes in the transcriptome that occur during neuronal functioning in normal cells and in disease, and on induced pluripotent stem cells as models of the human nervous system. For many years, I have been passionate about teaching, especially teaching programming and bioinformatics to biologists, and was able to do a significant amount of this during my PhD studies. I am both a Software and Data Carpentry instructor. I also hold a Specialist Degree in Biochemistry with a Major in Molecular Biology from Lomonosov Moscow State University.

Monday November 7, 2016

Speaker: Weichang Yu (PhD candidate, SoMS, USyd)

Title: Semisupervised quadratic discriminant analysis using model selection and variational Bayes

Abstract: We develop a mean field collapsed variational Bayes approximation for quadratic discriminant analysis (QDA) with model selection, allowing for missing class information in the training dataset. This allows unlabelled data to be used to build the classifier and strong predictors to be identified. We demonstrate using simulated and real datasets that this leads to a reduction in prediction error even in cases where the within-class dispersion is large. We make two contributions: a computationally cheaper alternative to Markov chain Monte Carlo (MCMC), with comparable results, for Bayesian inference in QDA; and a Bayesian framework for performing model selection in QDA.

About the speaker: I am a first-year PhD student at the University of Sydney working with Dr John Ormerod. My research interests include variational approximation, model selection and the use of predictive algorithms in medical informatics and bioinformatics.

Monday October 24, 2016 (ABACBS, No seminar)

Monday October 17, 2016

Speaker: Jason Wong (Group Leader, Bioinformatics and Integrative Genomics, UNSW)

Title: Gaining fundamental insights into DNA repair-chromatin interactions through cancer genomics

Abstract: Mutations form in the genome through the interplay of DNA lesion formation and incomplete DNA repair. With the advent of cancer genomics, and particularly whole cancer genome sequencing, cancer somatic mutations provide a window through which we can look into how mutational and DNA repair processes function within human cells. In this talk, I will discuss how we have used whole cancer genome sequencing data to discover a novel biological process. Using publicly available data, we showed that transcription factor binding at active gene promoters can impair nucleotide excision repair (NER), thereby resulting in prevalent mutation hotspots at gene promoters in NER-dependent cancers such as skin and lung cancers. I will further discuss the implications of this biological process for cancer development and the impact of our study on the interpretation of functional mutations in cancer.

About the speaker: Dr Wong is an ARC Future Fellow at the Prince of Wales Clinical School, UNSW, and leads the Bioinformatics and Integrative Genomics Team at the Lowy Cancer Research Centre. He received his B.Sc (Hons I) from the University of Sydney and was awarded a D.Phil in Bioanalytical Chemistry at the University of Oxford, UK, in 2007. This was followed by a post-doctoral fellowship at University College Dublin, Ireland, before returning to Sydney to join UNSW. To date, he has published 65 peer-reviewed journal articles, with senior authorship in journals including Nature, Genome Biology, Molecular Cancer Research and Nucleic Acids Research. He has attracted over $2 million in research funding as lead investigator from the ARC, Cancer Australia and Cancer Institute NSW. His current research is focused on the study of mutational processes in cancer and their influence on gene regulation and function.

Monday September 19, 2016

Speaker: Fatemeh Vafaee (CPC, USyd)

Title: Determination of circulating microRNA markers of colorectal cancer prognosis by a novel network-based multi-objective optimisation routine

Abstract: Colorectal cancer is a significant cause of cancer-related death, and effective treatments that maximise quality of life as well as cancer-related outcomes are therefore of major importance. Determining the appropriate treatment pathway through a personalised medicine paradigm is a prime goal, and so biomarkers are sought to aid in the decision-making process. In the age of high-throughput technologies, molecular markers are particularly attractive as a means of achieving true personalisation of cancer treatment. We have recently evaluated the role of circulating microRNA as a means of predicting patients’ prognosis and developed an innovative multi-objective network-based optimisation method to identify robust microRNA signatures which are reliable in terms of predictive power and functional relevance. In this talk, I will go through the details of the proposed method. To identify potential collaboration opportunities with the audience, I will also give a concise, general overview of my research interests and projects.

About the speaker: Dr Fatemeh Vafaee received her PhD in Artificial Intelligence from the University of Illinois at Chicago in 2011. Her doctoral studies involved multiple projects in the domains of optimisation, machine learning, data mining, pattern recognition, and probabilistic graphical models, with her PhD thesis focusing on theoretical and applied genetic algorithms. While pursuing her PhD, Fatemeh also collaborated with the University’s Computational Biology Laboratory and extended her research to biological applications such as cellular network alignment and phylogeny reconstruction. After her PhD, Fatemeh started a postdoctoral research position at the University of Toronto, Ontario Cancer Institute, one of the largest cancer research centres in Canada and worldwide. During her postdoc, Fatemeh had the privilege to work in a highly trans-disciplinary environment and collaborate with world-renowned scholars in integrative cancer informatics. In 2013, Fatemeh took a Research Fellow position at the Charles Perkins Centre and School of Maths & Stats at the University of Sydney. Her research relies on a wide national and international collaboration network, and she has published several papers in competitive peer-reviewed proceedings and top-tier journals such as Nature Methods, Scientific Reports, BMC Systems Biology, PLOS ONE and Alzheimer's & Dementia.

Monday September 12, 2016

Speaker: Chendong Ma (SoMS, USyd)

Title: Honours practice talk

Monday September 5, 2016

Speaker: Shila Ghazanfar (SoMS, USyd)

Title: Integrated Single Cell Data Analysis Reveals Cell-Specific Networks and Novel Coactivation Markers

Abstract: Large-scale single cell transcriptome profiling has exploded in recent years and has enabled unprecedented insight into the behavior of individual cells. Identifying genes with high levels of expression using data from single cell RNA sequencing can be useful to characterize very active genes and the cells in which this occurs. In particular, single cell RNA-Seq allows for cell-specific characterization of high gene expression, as well as gene coexpression. We offer a versatile modeling framework to identify transcriptional states as well as structures of coactivation for different neuronal cell types across multiple datasets. We employed a gamma-normal mixture model to identify active gene expression across cells, and used these to characterize markers for olfactory sensory neuron cell maturity and to build cell-specific coactivation networks. We found that combined analysis of multiple datasets results in more known maturity markers being identified, as well as pointing towards some novel genes that may be involved in neuronal maturation. We also observed that the cell-specific coactivation networks of mature neurons tended to have a higher centralization network measure than those of immature neurons. Integration of multiple datasets promises to bring more statistical power to identify genes and patterns of interest. We found that transforming the data into active and inactive gene states allowed for more direct comparison of datasets, leading to identification of maturity marker genes and cell-specific network observations, taking into account the unique characteristics of single cell transcriptomics data.

About the speaker:

Monday August 29, 2016

Speaker: Kevin Wang (SoMS, USyd)

Title: An adjustment method for gene set over-representation in boutique arrays

Monday August 22, 2016

Speaker: Cali Willet (Sydney Informatics Core Research Facility, USyd)

Title: Bioinformatics services and training

Abstract: An overview of the bioinformatics services and training available through the Sydney Informatics Core Research Facility.

About the speaker: Cali Willet is a bioinformatics technician for the Sydney Informatics Core Research Facility at the University of Sydney. She completed her PhD in animal genomics and computational biology in the Faculty of Veterinary Science at the same institution. She is interested in the genetics of disease, particularly in companion and endangered animals, and in the development of bioinformatics methodologies tailored for causal locus identification in non-model organisms. As a bioinformatician for the Core Research Facilities, she is focused on providing support to bioinformatics research groups in the form of consultation, training and advocating for the needs of bioinformatics and computational biology groups at the University of Sydney.

Monday August 15, 2016 (Cancelled)

Monday August 8, 2016

Speaker: Ulf Schmitz (Research Officer, Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Intron retention redefines post-transcriptional gene regulation in mammalian and vertebrate species

Abstract: Intron retention (IR) occurs when the splicing machinery fails to excise introns from primary transcripts. This can have diverse downstream effects; most often, however, it induces nonsense-mediated decay (NMD) of the intron-retaining transcript. We performed a phylogenetic analysis of IR in human, mouse, dog, chicken and zebrafish granulocytes. We found evidence that IR affects functionally related genes in granulocytes throughout evolution, many of which are orthologs. We also found a strong anti-correlation between the number of intron-retaining genes and the number of protein-coding genes in a genome. Retained introns have similar characteristics in all investigated species (human, mouse, dog, chicken, zebrafish): they are shorter and have a higher GC content than their non-retained counterparts, they often reside near the 3' end of a transcript, and they are enriched in premature termination codons. Their host genes harbour a larger number of miRNA binding sites in their 3' untranslated region and are often co-regulated in human and mouse. Our results suggest that IR is a global control mechanism affecting similar biological processes independent of specific effector genes. More importantly, we gained new insights that support the notion of IR as an independent mechanism of post-transcriptional gene regulation that supplements, and may even cooperate with, other forms of post-transcriptional gene regulation.

About the speaker: Ulf Schmitz is a post-doctoral researcher at the Centenary Institute in Sydney. His research focuses on the design of integrative workflows combining various computational disciplines with experimentation to approach molecular biological and medical problems. Between 2003 and 2015, Ulf Schmitz worked as a systems engineer and later as a bioinformatician at the Department of Systems Biology & Bioinformatics, University of Rostock, Germany. He was awarded his PhD in Bioinformatics in June 2015. Thereafter, he joined Prof John Rasko’s Gene and Stem Cell Therapy Program as a bioinformatics research officer. In January 2016, he was appointed as Conjoint Senior Lecturer at the Centenary Institute and the Sydney Medical School.

Monday August 1, 2016

Speaker: Dario Strbenac (Senior Research Associate, Statistical Bioinformatics Group, USyd)

Title: Interactive Benchmarking of Quantitative Proteomics Preprocessing Alternatives

Abstract: Mass spectrometry has long been used to analyse biological samples and find associations of altered proteins with experimental conditions. However, previous method evaluation efforts have focused on the problem of determining peptide amino acid sequences. Here, using a replicated Latin squares experimental design, we make the first comprehensive comparison of preprocessing alternatives in terms of their effect on the bias and variance of protein quantitation. Surprisingly, the variability between iTRAQ labels is larger than between different runs of the instrument. This has consequences for researchers who do not adequately incorporate randomisation and blocking in their proteomic experimental designs. Secondly, the default preprocessing done by the vendor software ProteinPilot outperforms more advanced methods, such as linear models and RUV, in terms of recovering the expected fold changes (bias). Thirdly, comparing the measurements of different proteins within a sample is shown to be feasible; such comparisons were previously assumed to be inaccurate and were always avoided. Finally, a benchmarking Shiny application will be demonstrated, which allows users to upload their own preprocessing of the raw data and see how their method compares to other methods in an interactive scoreboard.

Monday July 25, 2016

Speaker: Joshua Ho (Head, Bioinformatics and Systems Medicine Laboratory, VCCRI)

Title: A systems approach to study organ development and congenital disease

Abstract: A systems biology approach is now widely employed to systematically study, through bioinformatics, how molecular and signalling pathways are regulated during organ development in humans and relevant animal models. The overarching premise is that integrating high-quality causal gene regulatory networks (GRNs) with genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the genetic causes of congenital diseases that stem from GRN dysregulation during organ development. In this talk I will discuss the latest advances in GRN inference and analysis using large amounts of experimentally determined perturbation data, and how we can use GRNs to study organ development and congenital diseases.

Seminars in 2016, Semester 1

Monday June 20, 2016

Speaker: Rima Chaudhuri (Metabolic Cybernetics Lab, CPC, USyd)

Title: Understanding the relationship between AKT recruitment and GLUT4 translocation to the plasma membrane in fat cells through single cell microscopy data analysis.

Abstract:

About the speaker: Dr. Chaudhuri was awarded her PhD in Bioinformatics from the University of Illinois, USA in 2010. Her doctoral thesis was on the discovery and design of drugs for the treatment of SARS coronavirus and Hepatitis C virus through computational modeling. While pursuing her doctorate, she worked as a researcher at software and pharmaceutical companies such as Blackbaud Inc. and Pfizer Inc. (USA), developing modules of scientific research software. Dr. Chaudhuri pursued her postdoctoral training in biophysical simulations at the Parc Científic de Barcelona (PCB), as a joint affiliate of the Institute for Research in Biomedicine and the Barcelona Supercomputing Center. She holds two international patents in the field of drug discovery and design. After her year-long post-doc in Spain, she moved to Sydney in 2011 and joined the Garvan Institute of Medical Research in the laboratory of Prof. David E. James to work on systems biology based approaches to unravel the complexities behind the incidence of metabolic diseases such as diabetes and obesity. Her strength lies in interdisciplinary research and bridging the gap between computational and basic sciences. She is currently a research fellow at the Charles Perkins Centre in the University of Sydney. Her current research interests include isolating candidate bio-markers of T2D and obesity from molecular expression profiles, understanding and targeting protein-protein interactions in disease to facilitate a cure, and integrating multi-dimensional data from different platforms (transcriptomics, proteomics, interactomics and metabolomics) to acquire a precise picture of the diseased cell.

Monday June 13, 2016 (Queen's Birthday)

Monday June 6, 2016

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Computing Image Similarity for Image-Derived Disease Models

Abstract: Imaging is a critical and indispensable component of modern healthcare. The automated analysis of medical images has a vast range of applications in evidence-based diagnosis, physician education, and biomedical research. These decision support applications are predicated on the ability to objectively compute the similarity of image content in a manner that matches the subjective similarity judgement of human domain experts. In this talk, I will present an overview of the conceptual challenges in this field before detailing my research on methods for characterising and comparing the visual content of images, including a graph-based method for comparing 3D PET-CT lung cancer images and my more recent work using convolutional neural networks.

About the speaker: Dr. Ashnil Kumar received his PhD in information technology from the University of Sydney in 2013; his PhD introduced a new graph-based method for modelling the relationships between tumours and organs in medical images.

Monday May 30, 2016

Speaker: Vinita Deshpande (Metabolic Cybernetics Lab, CPC, USyd)

Title: Removing unwanted variation in large scale ‘omics datasets containing missing values

Abstract: Transcriptomics and proteomics are powerful techniques to obtain a comprehensive snapshot of biological systems ranging from cells to whole organisms. However, a major problem for such big datasets is the presence of missing values, as many statistical tools used to analyse these often require complete data. One such bioinformatic tool is RUV (Removing Unwanted Variation), a widely used R package developed to remove technical variation, such as batch effects, in order to normalise the data and perform downstream analyses such as differential expression analysis.

One solution to this issue of missing data is to obtain a complete dataset, either by filtering the data to eliminate missing values or by performing imputation. These approaches, however, can greatly reduce the sample size or biological variation, leading to a loss of statistical power. The first part of this talk will describe an alternative approach in which the RUV algorithm was adapted to accept data with missing values as input. The performance of this new algorithm was evaluated in terms of its ability to normalise and correctly identify differentially expressed genes/proteins in large ’omics datasets containing varying amounts of simulated missing values. The second part of this talk will discuss the future directions and challenges of this PhD project in terms of designing and conducting further quantitative analyses on large scale ‘omics data.

About the speaker: Vinita is a PhD student supervised by Prof David James and Prof Jean Yang at The University of Sydney, where she is pursuing her research interests in the application of systems biology and bioinformatic approaches to metabolic diseases. Vinita has previously completed a Bachelor of Science (Bioinformatics) / Bachelor of Information Technology (Computer Science) with Honours from The University of Sydney. Prior to commencing her PhD, she worked as a bioinformatics research assistant with Dr Joshua Ho in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute in Sydney.

Monday May 23, 2016

Speaker: Ashley Waardenberg (Children's Medical Research Institute; Sydney Medical School, USyd)

Title: Discovering Protein-Protein Interactions from DNA sequence - insights into the cardiac gene regulatory network and disease

Abstract: NKX2-5 is a key transcription factor (TF) required for normal heart development and is implicated in a range of cardiac diseases. It binds directly to DNA by recognising a specific sequence called the NKX2-5 binding element (NKE). However, until recently its genomic targets were poorly defined, and the NKX2-5 protein-protein interaction network remains poorly characterised. We recently identified genomic target regions for NKX2-5 and human disease-relevant mutations in cultured HL-1 cardiomyocytes using the DamID method, and identified new NKX2-5 disease mechanisms (Bouveret R, Waardenberg AJ, et al. eLIFE, 2015). This talk describes our efforts to predict, and subsequently validate, novel protein-protein interactions (PPIs) based on recurrent binding sites (or motif grammar) through the application of machine learning algorithms.

About the speaker: Dr Ashley Waardenberg is currently a postdoctoral bioinformatician at the Children's Medical Research Institute, Westmead, where he is developing systems biology approaches for investigating proteomics and high-throughput protein modification data related to the brain and associated diseases, in collaboration with Dr Mark Graham and Prof Phil Robinson. He received a PhD in Systems Biology (2012) under the supervision of A/Prof Christine Wells (now at the University of Glasgow, Scotland) and Dr Brian Dalrymple (CSIRO, Australia), where he developed a novel visualisation approach for viewing gene expression data specifically in the context of striated muscle contractile protein location. A key outcome of his PhD was the discovery of a new protein-protein interaction between PI3K and a muscle mechano-sensor in the heart, implicating the muscle contractile apparatus in responding to cardiac stress, which has broader implications in the context of PI3K cancer therapies (Waardenberg et al. Journal of Biological Chemistry, 2011). During his PhD, he was also involved in the Bovine Genome Consortia, whose work was published in Science in 2009, and was a team recipient of the CSIRO Chairman's Medal in 2010 for contributions to this international effort. He then joined the Cardiac Developmental and Stem Cell Biology Laboratory of Professor Richard Harvey at the Victor Chang Cardiac Research Institute, Darlinghurst, as a Postdoctoral Scientist to gain a deeper insight into developmental biology, furthering an interest in understanding the origins of disease. There he implemented systems biology strategies for understanding the genome-wide binding effects of the cardiac transcription factor NKX2-5 and of NKX2-5 mutations relevant to congenital heart disease. This has resulted in a number of recent publications (Waardenberg AJ, Ramialison M et al. Cold Spring Harbor Perspectives in Medicine, 2014; Bouveret R, Waardenberg AJ et al. eLIFE, 2015; Waardenberg AJ et al. BMC Bioinformatics, 2015), and he continues to collaborate with the Victor Chang Cardiac Research Institute.

Ashley is also a founding member and Vice-President of the Australian Bioinformatics and Computational Biology Society (ABACBS). Ashley has been heavily involved in establishing this very young society and is passionate about establishing communities in this domain.

Monday May 16, 2016

Speaker: Timur Burykin (Judith and David Coffey Life Lab, CPC, USyd)

Title: Data visualization and exploration using particle dynamics simulation

Abstract: Exploration of complex multidimensional datasets is an ongoing challenge in many fields of research. In an attempt to simplify this task for people with no expertise in advanced statistics or programming, a novel method of data visualization was developed. The algorithm applies simple particle interaction rules to data points and allows them to self-organize into layouts that approximate the clustering of objects in the multidimensional space. A complementary density map, superimposed network connectivity and configurable node properties linked to extra dimensions make this visualization method suitable for a wide range of applications. Several datasets will be demonstrated in this presentation, including hospital admission records, a TF-TG interaction network and results of diet experiments. The extension of the algorithm to advanced image and network analysis will also be discussed.

About the speaker: Tim Burykin is an experienced C++ programmer who joined the Charles Perkins Centre last year as a data visualization technician and a member of the Judith & David Coffey LifeLab, supervised by Prof. Zdenka Kuncic. He received a Master of IT degree in Russia and moved to Sydney to complete a PhD in Agriculture under the supervision of Prof. John W. Crawford.

Monday May 9, 2016

Speaker: Denis Bauer (Team leader Transformational Bioinformatics, CSIRO)

Title: VariantSpark: applying Spark-based machine learning methods to genomic information

Abstract: Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a Hadoop/Spark framework that utilises the machine learning library MLlib, thereby providing the means of parallelisation for population-scale bioinformatics tasks. VariantSpark offers an interface to the standard variant format (VCF), seamless genome-wide sampling of variants, and a pipeline for visualising results. To demonstrate the capabilities of VariantSpark, we cluster more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach developed by the Global Alliance for Genomics and Health, ADAM, the comparable implementation using Hadoop/Mahout, as well as Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. These benefits of speed, resource consumption and scalability enable VariantSpark to open up the use of advanced, efficient machine learning algorithms for genomic data.

About the speaker: Dr. Denis Bauer is the team leader of the transformational bioinformatics team in CSIRO’s ehealth program. Her expertise is in high-throughput genomic data analysis, computational genome engineering, and Spark/Hadoop and high-performance compute systems. She has a PhD in Bioinformatics and completed postdoctoral training in machine learning and in human genetics. Her collaborators include Prof Simon Foote on mammalian susceptibility to infectious diseases, Prof Ian Blair on molecular mechanisms of motor neuron disease, and Prof Rodney Scott on obesity-driven cancer. She has 23 peer-reviewed publications (9 as first author, 4 as senior author), three of them in journals with IF > 8 (e.g. Nat Genet.), and an H-index of 9. To date she has attracted more than AU$25 million in funding.

Monday May 2, 2016 (Florian and Falk farewell. No seminar.)

Monday April 25, 2016 (ANZAC Day)

Monday April 18, 2016

Speaker: Michael De Ridder (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: CereVA: A Visual Analytics Framework for Neurological Disorder Analysis with Functional Magnetic Resonance Imaging

Abstract: Functional Magnetic Resonance Imaging (fMRI) is an important imaging modality for understanding and diagnosing neurological disorders, such as schizophrenia, bipolar disorder and Alzheimer's disease. The modality measures blood oxygenation over time as a proxy for neuronal activity. This activity is often processed into three components for analysis: (i) the anatomical context; (ii) individual voxel and region (group of voxels) time-series; and (iii) the correlation of activity between regions. While many statistical and graph-theoretical approaches have been applied to these data, issues such as noise and a limited understanding of the brain lead to a diverse range of challenges. Visualisation-based analytics is often used to overcome some of these challenges; however, current methods often present an oversimplification of the data. With CereVA, we integrate all three of the commonly derived activity components in a visual analytics framework comprising a full-scale pipeline that incorporates automatic image processing and interactive visualisation. Finally, we present a new application for fMRI visual analytics by applying CereVA to the active research area of classifying neurological disorders.

About the speaker: Michael de Ridder is a PhD student with The Institute of Biomedical Engineering and Technology (BMET) in the School of Information Technologies at the University of Sydney. He is supervised by A/Prof Jinman Kim. Michael's work straddles the boundary of scientific and information visualisation with a heavy influence from medical imaging techniques.

Monday April 11, 2016 (Hunter Meeting)

Monday April 4, 2016

Speaker: Taiyun Kim (Victor Chang Cardiac Research Institute, and UNSW)

Title: PAD: An interactive web portal for analysis of transcription factor co-binding at promoters and enhancers

Abstract: It has long been observed that transcription factors (TFs) bind to DNA collaboratively with other TFs as co-binding partners. Recently, through studying the genomic binding sites of the essential embryonic stem cell TF NF-Y, Dr Pengyi Yang has shown that the same TF may bind DNA with different co-binding partners when TF binding sites proximal and distal to transcription start sites are considered separately. Based on this observation, we have developed a database of binding sites of >200 TFs in mouse embryonic stem cells, and an interactive web portal that enables any user-submitted TF binding profiles to be clustered and visualised with our database TF profiles, at the proximal and distal regions separately. Our tool contributes to our understanding of how gene regulation occurs via combinatorial binding of TFs in different cell types.

About the speaker: Taiyun Kim is a 5th year student in the Bachelor of Engineering (Bioinformatics)/Masters of Biomedical Engineering program at the University of New South Wales. In 2015, he was awarded a Summer Scholarship to work in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute (VCCRI), under the supervision of Dr Joshua Ho (VCCRI) and Dr Pengyi Yang (University of Sydney).

Monday Feb 1, 2016

Speaker: Lei Sun (School of Information Engineering, Yangzhou University, China)

Title: Study on long noncoding RNAs using computational methods

Abstract: Tens of thousands of newly discovered long noncoding RNAs (lncRNAs) have attracted increasing attention in the life sciences as their important biological functions are revealed. Because of the intrinsic complexity of lncRNA functions and mechanisms, our group proposes to study lncRNAs using a series of computational methods, which can improve research efficiency. In my talk, I would like to share some ideas on the results of lncRNA prediction using support vector machines (SVMs), and to discuss potential lncRNA-specific transcriptional patterns detected using computational methods.

About the speaker: Dr. Lei Sun received a Doctor of Engineering degree from China University of Mining and Technology in 2013. He is now a lecturer in the School of Information Engineering at Yangzhou University, P.R. China. His research interests include bioinformatics and signal and information processing. As a visiting PhD student, Dr. Sun previously conducted bioinformatics research at several institutes and universities, including the School of IT at The University of Sydney, the Institute for Molecular Bioscience (IMB) at the University of Queensland, and the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences.

Seminars in 2015, Semester 2

The seminars will be held at 1:00 pm on Monday in the Access Grid Room, which is on level 8 of the Carslaw Building. The format of the talk is 30 to 45 minutes plus questions.

Monday Nov 30

Speaker: Shila Ghazanfar (SMS, Faculty of Sciences, The University of Sydney)

Title: Gene coexpression identification from single-cell expression experiments

Abstract: Classically, gene expression profiles have represented an aggregate of the expression levels of each of the multitude of cells within the sample of interest. More recently, technologies utilising quantitative PCR, such as nanoString, enable measurement of expression in individual cells as opposed to an amalgamation of cells. Using these data along with appropriate statistical models, we can ask in what proportion of cells certain genes are expressed, and we can determine the distribution of gene coexpression among these cells. In collaboration with Associate Professor David Lin at Cornell University, whose research investigates the olfactory system in mouse models, we assayed a set of neuronal cells with special interest in the protocadherin family of genes. We describe the statistical methods for processing the single-cell expression data and identifying coexpression of genes in subsets of the cell population, including mixture modelling and visualisation techniques for further insight.

About the speaker: Shila Ghazanfar is a 3rd year PhD student and Postgraduate Teaching Fellow in the School of Mathematics and Statistics at The University of Sydney. She is supervised by A/Prof Jean Yee Hwa Yang (The University of Sydney), Dr John Ormerod (The University of Sydney) and Dr Michael Buckley (CSIRO). Her research interests are in analysing high throughput sequence data such as RNA-Seq and Exome-Seq, and integrating different types of high-throughput data. She has previously completed a Bachelor of Science (Advanced Mathematics) and Honours in Statistics at the University of Sydney.

Monday Nov 23

Speaker: Cristian Leyton (Faculty of Health Sciences, The University of Sydney)

Title: Primary progressive aphasia and its challenges

Abstract: Primary progressive aphasia (PPA) comprises a group of neurodegenerative conditions that predominantly affect language function. As a result of the partial destruction of the language network, several clinical variants of PPA have been described, each with its own profile of linguistic deficits, distribution of brain atrophy and molecular pathology. This unique group of conditions offers a natural paradigm for understanding the neural basis of language processing and how neurodegeneration starts and spreads. I will explain the main challenges in the field and explore potential contributions of bioinformatics.

About the speaker: Dr Leyton worked as a clinical neurologist for four years before moving to Australia in 2009. He was awarded a PhD on progressive aphasias by UNSW in 2013. In 2014, he was awarded a DVC University Postdoctoral Fellowship at the Faculty of Health Sciences, University of Sydney. His main interest is the study of aphasic manifestations caused by neurodegenerative diseases.

Monday Nov 16

Speaker: David Humphreys (Victor Chang Cardiac Research Institute)

Title: Obstacles and challenges in specialised RNA sequencing (RNA-seq) analysis

Abstract: In recent times, RNA-seq has become an affordable method to profile the transcriptome of a biological sample. One of the strengths of RNA-Seq over other technologies is the ability to capture information at the nucleotide level using relatively unbiased methods, without any prior genomic information. These features have given rise to many RNA-Seq applications, which often bring new challenges in bioinformatics analysis. In this presentation I will highlight the challenges (and some solutions) in small RNA-Seq, RNA editing and circular RNA analysis from high-throughput sequencing data.

About the speaker: Dr David Humphreys is a multidisciplinary wet-lab scientist/bioinformatician who manages the Genomics Core facility at the Victor Chang Cardiac Research Institute. His undergraduate training comprised a joint major in Biology and Computer Science, followed by Honours and a PhD in molecular biology. After joining the Victor Chang Cardiac Research Institute (VCCRI), he developed a research interest in gene regulation and the involvement of small non-coding RNAs. Since 2009 he has been heavily involved in studies utilising high-throughput sequencing technologies, which has allowed him to refocus his computer science skills. David has a number of active research collaborations with VCCRI faculty and St Vincent's Hospital cardiologists utilising RNA-Seq and exome sequencing.

Monday Nov 9

Speaker: James Burchfield (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Snapshots of diabetes

Abstract: The imaging of biological systems has become a fundamental tool in the cell biologist's arsenal. Our lab has utilised a range of imaging techniques to probe the insulin signalling network in single cells and has generated data that call into question the traditional view of signalling networks. Central to the continued success of this approach is the ability to extract relevant information from this large volume of high-content image data. Whilst the analytical pipelines for large-scale genomic and proteomic data have undergone a revolution in recent years, there has been a lack of development of similar tools to analyse data generated from imaging experiments. I will discuss some of the developments in this arena in the context of diabetes and insulin resistance.

About the speaker: James obtained his PhD from The University of Sydney and pursued a postdoctoral fellowship in David James' lab at the Garvan Institute of Medical Research. In 2014, James relocated with David James' lab to the Charles Perkins Centre at the University of Sydney. James is an expert in using high-performance microscopy for single cell imaging.

Monday Nov 2

Speaker: Martin Wong (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Kinetic simulation of the Akt pathway in Insulin Signalling

Abstract: Traditional research on biological pathways is often focused on the discovery of novel protein-protein interactions. The temporal kinetics of signalling events is a feature that is not commonly investigated, but it may encode important information regarding the physical mechanisms underlying these interactions. This talk will discuss how kinetic simulations can be used to infer these physical mechanisms. It will begin by discussing how models are constructed in terms of the rate equations used, the network topology and the parameter fitting procedure. The application of this approach will then be discussed in more detail in the context of insulin signalling and the Akt signalling pathway, where new insight has been obtained regarding the phosphorylation mechanism of Akt prior to the activation of its downstream substrates.

About the speaker: Martin comes from a very diverse background, having completed a Bachelor of Science majoring in Physics and a Bachelor of Engineering in the Biomedical stream. He also completed Honours in engineering, where he worked on developing a bioactive material for use in implanted devices. He is now a few months from completing his PhD under David James from the Metabolic Cybernetics Lab in the CPC and Zdenka Kuncic from the Institute of Medical Physics at the School of Physics, where he is using mathematical modelling to interrogate the temporal aspects of insulin signalling.

Monday Oct 26

Speaker: Lake-Ee Quek (Coffey LifeLab, Charles Perkins Centre, The University of Sydney)

Title: Processing of metabolite data generated by mass spectrometry

Abstract: Metabolites are the chemical species transformed during metabolism. They are direct signatures of cellular state, and are therefore easier to correlate with phenotype. The appeal of mass spectrometry is its ability to rapidly measure thousands of metabolites from very small samples. This talk will briefly introduce metabolomics, although the focus will be on data acquisition and processing in the context of targeted metabolomics. The ultimate aim of metabolomics is unbiased global metabolic profiling, made possible by the right analytics and bioinformatics.

About the speaker: Lake-Ee obtained his PhD in Cell Metabolism at The University of Queensland (UQ) in 2010. He was a UQ Postdoctoral Fellow from 2011 to 2014 and a Research Postdoc with A/Prof. Nigel Turner in the Mitochondria Bioenergetics Lab, School of Medical Sciences, UNSW, from 2014 to 2015. He recently took up a Postdoctoral Fellowship with the Coffey LifeLab in the Charles Perkins Centre and relocated to The University of Sydney.

Seminars in 2015, Semester 1

The seminars will be held at 1:00 pm on Mondays in the Access Grid Room.

Monday June 22

Mark Greenaway (Stat, Sydney)

New tools in the R ecosystem

In the past few years, there has been an explosion of interest in data science. R has been at the forefront of this field, which has led to many positive contributions to the R ecosystem from the wider tech community. This talk will outline useful tools from the tech community that are available for R, focusing particularly on GitHub, visualisation, the contributions of Hadley Wickham/RStudio, Spark and cloud computing.

Monday May 25

Paul Lin (Stat, UNSW)

Gene Expression Changes in Human Rett Syndrome Brain

Rett Syndrome (RTT) is an X-linked neurodevelopmental disorder. It affects girls at a frequency of 1 in 10,000 live births. Our study is the first transcriptome-level analysis of post-mortem RTT brain tissue with age-matched controls. We used two technologies, RNA-seq and microarrays, to replicate our findings. We also took tissue composition into account, which has not been done in previous RTT studies, and found that it affects the outcomes of differential expression analysis. More than 95% of classic RTT cases are caused by sporadic mutations in the gene encoding methyl-CpG binding protein 2 (MeCP2). Initial studies pointed to a role for MeCP2 as a transcriptional repressor; our data are consistent with more recent findings and confirm that MeCP2 acts as a transcriptional activator. We have also shown that intergenic L1 expression increases in human RTT brain. Lastly, I will demonstrate how co-expression networks can be used to identify brain-region-specific enhancer RNAs in the human brain. In this study, we identified a set of Robust Brain-Expressed Enhancers (rBEEs), which are enriched for genetic variants associated with autism spectrum disorders (ASD).

Monday May 18

John Ormerod (Stat, Sydney)

Statistical Analysis of a Lupus Flow Cytometry Experiment

Results from an analysis of a flow cytometry-based observational study of patients with lupus will be presented. The data were collected and analysed over a five-year period at the Centenary Institute at the University of Sydney. The talk will describe the underlying biotechnology and how the statistical complications associated with the data, including measurement error, missing values and outliers, were resolved.

Monday May 11

Rima Chaudhuri (CPC, Sydney)

In-silico differentiation between direct and indirect protein binding partners from a large MS-based protein interactome experiment

Protein-protein interactions (PPIs) are crucial in all cellular processes and are central to understanding signalling cascades and protein functions. Affinity purification followed by mass spectrometry analysis (AP/MS) offers a powerful approach for studying complex protein-protein interactions. However, such MS-based high-throughput screens are notorious for high false discovery rates (FDR). Moreover, such screens do not allow differentiation between direct and indirect protein binders. In this study, we developed a scoring function that ranks putative binders by their likelihood of being direct binders, using an array of features including 3D protein structure information. We use the interactomes of the eIF4E and 4EBP1 proteins, which are implicated in insulin resistance, as case studies to elucidate the principles behind the development of this scoring function. Lastly, Phosphortholog, a web-based tool for mapping orthologous post-translational modification sites on proteins across species, will be demonstrated.

Monday May 4

Euijoon Ahn (IT, Sydney)

Automated Melanoma Segmentation and Classification

The segmentation of skin lesions in dermoscopic images is considered one of the most important steps in computer-aided diagnosis (CAD) for automated melanoma diagnosis. Existing methods, however, suffer from over-segmentation and do not perform well when the contrast between the lesion and the surrounding skin is low. A new automated saliency-based skin lesion segmentation (SSLS) method is proposed. It is designed to exploit the inherent properties of dermoscopic images, which have a focal central region and subtle contrast discrimination with the surrounding regions. The proposed method was evaluated on a public dataset of lesional dermoscopic images and compared with established lesion segmentation methods, including adaptive thresholding, Chan-based level set and seeded region growing. Results show that SSLS outperformed the other methods in accuracy and robustness, particularly for difficult cases. Superpixels will also be introduced.

Monday April 27

Shila Ghazanfar (Stat, Sydney)

Integrative Analysis of Somatic Mutations with Focus on Biological Pathways

The development and severity of cancers depend on the somatic mutations occurring in the tissue. Technologies such as whole exome and whole genome sequencing (WES/WGS) allow the somatic mutations acquired by a patient's tumour to be interrogated relative to their normal tissue. However, it is clear that some mutations are more harmful than others, which has led to work on identifying genes harbouring 'driver' mutations as opposed to 'passenger' mutations. Further work aims to elucidate the role these mutations play in the system as a whole by integrating mutation, gene expression and network information (e.g. protein-protein interaction networks), as well as other data sources. In this seminar I will discuss my work to date on methods that aim to answer these questions, with a focus on the melanoma dataset.

Monday April 20

Ellis Patrick (Stat, Sydney)

Using Resampling to Fit Better Models

The weighted bootstrap is one of many procedures for evaluating the goodness of fit of a model. I will attempt to highlight how and why this work changed the way I think about cross-validation and, most importantly, the practical impact of using a weighted bootstrap to estimate LASSO penalty parameters. Diane Loo's work is highly relevant for anyone who has ever used cross-validation or plans to. I will use the prognostic melanoma data to highlight a few limitations of cross-validation. Time permitting, I will also explain some of the issues we have faced when trying to explore and validate the weighted bootstrap. This work is currently being drafted for journal submission.

Monday April 13

Kevin Wang (Stat, Sydney) and Sarah Romanes (Stat, Sydney)

Data Exploration and Subtype Discovery and Prognosis Prediction

This talk considers how to find an appropriate measure of association between connected brain regions in resting-state fMRI datasets. Potential challenges of the project will be noted, and some exploratory analysis of a cleaned fMRI dataset will be shown.
