# Statistical Bioinformatics Seminar

The aim of the statistical bioinformatics seminar is to provide a forum for people working within the broad area of computation and statistics and their application to various aspects of biology to present their work and showcase their ongoing projects. It is intended to foster the exchange of ideas and build potential collaborations across multiple disciplines.

To be added to or removed from the mailing list, or for any other information, please contact Shila Ghazanfar, Ellis Patrick or Pengyi Yang.

## Seminars in 2017, Semester 2

The seminars will be held at 1:00 pm on Mondays in the Charles Perkins Centre Seminar Room (Level 3, large meeting room). Each talk runs for 30 to 45 minutes, plus questions.

Monday Oct 23, 2017

Speaker: Eva Chan (Garvan Institute)

Title: Detecting Complex Genomic Rearrangements using Optical Mapping

Abstract: Genomic rearrangements are common in cancer, with demonstrated links to disease progression and treatment response. These rearrangements can be complex, resulting in fusions of multiple chromosomal fragments and generation of derivative chromosomes. Comprehensively detecting complex genomic rearrangements (CGR) in cancer remains challenging. No single approach can identify all structural variations, as each has its strengths and weaknesses. In this seminar, I will demonstrate the utility of whole genome optical mapping in capturing CGR. I will further showcase an example using optical mapping to capture chained fusion events in a well-studied liposarcoma cell line. Using this approach, we identified fusion maps that clearly revealed chained fusion architectures (content, order, orientation, and size), as well as large rearrangement junctions that are undetectable by sequencing alone. I hope to convince you that optical mapping is an important complement to existing technologies for detecting and reconstructing complex genomic rearrangements.

About the speaker: Senior Bioinformatics Research Officer, Human Comparative and Prostate Cancer Genomics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, The Kinghorn Cancer Centre

Monday Oct 16, 2017 (PLEASE NOTE: Special time of 2:00PM)

Speaker: Natalie Thorne (Melbourne Genomics)

Title: Clinical bioinformatics – what does it really take to translate research into practice?

Abstract: Melbourne Genomics Health Alliance has taken a collaborative, patient-centred, clinically-driven, evidence-based and sustainable approach to delivering genomic testing. This year the Alliance has commenced implementing Victoria’s new clinical system for genomics. A platform for bioinformatics analysis and a tool for variant curation will be among the first components to be implemented and used for accredited clinical genomic testing by diagnostic laboratories. Operating within this shared digital system, however, presents a challenge for laboratories to simultaneously coordinate with other diagnostic laboratories and hospitals, whilst also supporting their own business requirements for accreditation and continual innovation.
At the heart of diagnostic innovation in genomics is the emerging field of clinical bioinformatics, which combines clinical, diagnostic, analytical, software and genetic aspects of implementing clinical genomic testing. The field has two key challenges: first, it is in its infancy and laboratories lack the support of a mature discipline; second, it demands skills and expertise predominantly lacking in traditional academia. These include developing enterprise-grade solutions, complex strategies for organisational change, multi-stakeholder collaboration, community engagement and rapidly evolving biotechnology.
Drawing on my experiences working with the Melbourne Genomics and Australian Genomics Health Alliances, I will discuss the challenges and opportunities in clinical bioinformatics, including the use of ‘implementation science’ for translating research bioinformatics into clinical practice.

Monday Oct 9, 2017

Speaker: Saskia Freytag (Walter and Eliza Hall Institute)

Title: Cluster Headache: Comparing Clustering Tools for 10X Single Cell Sequencing Data

Abstract: The commercially available 10x Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers. Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or identical cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regard to which method is the most accurate. Answering this question is complicated by the fact that 10x Genomics data lack cell labels. Thus, in this review, we focused on comparing the clustering solutions of a dozen methods for three datasets on human peripheral mononuclear cells generated with 10x Genomics technology. While clustering solutions appeared robust, we found that solutions produced by different methods have little in common with each other. They also failed to replicate cell type assignments generated with supervised labeling approaches. Furthermore, we demonstrated that all clustering methods tested clustered cells to a large degree according to the amount of ribosomal RNA in each cell.
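
As an illustration of how two clustering solutions for the same cells can be compared, the sketch below computes the adjusted Rand index (ARI), a widely used agreement measure. The abstract does not state which measure the speaker used, so the choice of ARI and the function name `adjusted_rand_index` are assumptions made for this example only.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Agreement between two partitions of the same cells.

    Returns 1.0 for identical partitions (up to relabelling) and values
    near 0 when agreement is no better than expected by chance.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both label vectors must cover the same cells")

    n_cells = len(labels_a)
    total_pairs = comb(n_cells, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two cells: partitions trivially agree

    # Pairs of cells that share a cluster in both, in A only, or in B only.
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = in_a * in_b / total_pairs
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every cell in one cluster
    return (both - expected) / (maximum - expected)


if __name__ == "__main__":
    # Toy labels (made up): method B relabels A's clusters, method C differs.
    method_a = [0, 0, 1, 1]
    method_b = [1, 1, 0, 0]
    method_c = [0, 1, 0, 1]
    print(adjusted_rand_index(method_a, method_b))
    print(adjusted_rand_index(method_a, method_c))
```

Because the index depends only on which cells are grouped together, it is unaffected by how each method happens to number its clusters.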

About the speaker: Saskia completed her Masters in Statistical Science at University College London. After finishing she moved back to Germany, where she completed a PhD in Biostatistics in 2014. She then got the opportunity to relocate to Melbourne to work as a Post-Doctoral Fellow at the Walter and Eliza Hall Institute in Melanie Bahlo’s group. Her research focus is methodological development for the analysis of high throughput sequencing data. She is co-founder of R-Ladies and an ambassador for CHOOSEMATHS.

Monday October 2, 2017 (No seminar - Labour Day public holiday)

Monday September 25, 2017 (No seminar)

Monday Sept 18, 2017

Speaker: Rebecca Poulos (UNSW)

Title: The use of big data in the search for cis-regulatory driver mutations in cancer genomes

Abstract: Mutations that directly alter protein function have been extensively studied in cancer. However, in recent years, it has become feasible to examine the cancer-causing role of mutations within the remaining 98% of the genome which is non-coding. Here I will present our use of big data in the study of cis-regulatory somatic mutations in cancer genomes. We analysed somatic mutations from over 1,000 cancer genomes across 14 cancer types, specifically focusing on promoter regions. These regulatory regions are often bound by proteins, and we discovered remarkable ‘mutation hotspots’ at sites of protein binding. To understand why these hotspots formed, we used genome-wide maps of nucleotide excision repair (NER) to show that sites of protein binding have reduced levels of NER. Our analyses uncovered the presence of a previously unknown mechanism, by which we associated reduced NER with the formation of mutation hotspots at promoters. To determine how these hotspots might impact cancer development, we investigated whether these mutations can impact the ability of a protein to bind to DNA by analyzing skin cancer mutations at the binding site of the protein CTCF. Performing CTCF ChIP-seq in a melanoma cell-line, we demonstrated the functionality of such mutations through allele-specific reduction of CTCF binding to mutant alleles. Finally, we sought to determine the role of DNA methylation (a common epigenetic modification) on the occurrence of somatic mutations in cancer. We correlated mutation load with methylation across 15 cancer types and subtypes, and we showed that reduced levels of methylation in regulatory regions may be responsible for reduced mutation loads at such loci in colorectal cancer. Taken together, these analyses develop our understanding of the formation and repair of mutagenic lesions in cis-regulatory regions of cancer genomes, providing insight into the search for driver mutations at such loci.

About the speaker: Rebecca Poulos is a researcher in the ‘Bioinformatics and Integrative Genomics’ group at the Lowy Cancer Research Centre at UNSW Sydney. Rebecca’s research field is in the area of cancer genomics, where she uses ‘big data’ to study DNA mutation and repair processes in regulatory regions of cancer genomes. Her research output includes first-author publications in ‘Nature’ and ‘Cell Reports’, together with a review article, editorial and book chapter in the area of non-coding driver mutations in cancer. Rebecca studied science and business at the University of Technology Sydney. She is currently at UNSW Sydney where she completed her Honours year (with University Medal) and is finalising her PhD research (with UNSW Research Excellence Award).

Monday Sept 11, 2017

Speaker: Mark Segal (UCSF)

Title: Statistical and Computational Challenges in Conformational Biology

Abstract: Chromatin architecture is critical to numerous cellular processes including gene regulation, while conformational disruption can be oncogenic. Accordingly, discerning chromatin configuration is of basic importance; however, this task is complicated by a number of factors including scale, compaction, dynamics, and inter-cellular variation. The recent emergence of a suite of proximity ligation based assays, notably Hi-C, has transformed conformational biology with, for example, the elicitation of topological and contact domains providing a high resolution view of genome organization. Such conformation capture assays provide proxies for pairwise distances between genomic loci which can be used to infer 3D coordinates, although much downstream analysis bypasses this reconstruction step. After demonstrating advantages deriving from obtaining 3D genome reconstructions, in particular from superposing genomic attributes on a reconstruction and identifying extrema (‘3D hotspots’) thereof, we showcase methodological challenges surrounding such analyses. Open issues highlighted include (i) performing and synthesizing reconstructions from single-cell assays, (ii) devising rotation invariant methods for 3D hotspot detection, (iii) assessing genome reconstruction accuracy, and (iv) averting reconstruction uncertainty by direct integration of Hi-C data and genomic features. By using p-values from (epi)genome wide association studies as the feature, the latter approach provides a conformational lens for viewing GWAS findings.

Monday Sept 4, 2017

Speaker: Dr Fabio Luciani (UNSW)

Title: A systems immunology approach to study antigen-specific T cells in viral infection

Abstract: Immunological memory is a cardinal feature of human adaptive immunity and is critical for prophylactic vaccination, and it has recently been shown to play an important role in determining the outcome of T cell based immunotherapies in cancer. Although cytotoxic T cells can have a significant impact on disease clearance, the essential phenotype of a clinically successful T cell and how this influences therapeutic efficacy remain largely undefined. In this presentation I will present our systems immunology approach to tackle these issues. I will review recent studies on longitudinal samples of primary HCV infection using flow cytometry for phenotyping virus specific T cells, along with single cell transcriptomic and TCR diversity analyses. Future directions involve applying this systems immunology approach to other viral infections, as well as understanding how long term T cell memory protection is achieved.

About the speaker: A/Prof. Fabio Luciani was trained as a theoretical physicist (Masters) and theoretical biologist (PhD, 2006, Humboldt University of Berlin, Germany). His research interests include adaptive immune responses against pathogen infections, computational models for studying host-pathogen interactions, and bioinformatics analyses of high throughput next generation sequencing data. He has applied mathematical modelling to understand infectious diseases, focussing on transmission dynamics of drug resistant Mycobacterium tuberculosis, and the transmission of hepatitis C virus among injecting drug users. He has made several contributions to understanding how HCV infects a new host and the role of T cell mediated responses, using next generation sequencing technologies, flow cytometry and statistical modelling. More recently, he has moved into single cell genomics and systems immunology approaches to understand T cell dynamics. He currently holds an NHMRC Career Development Fellowship and leads a systems immunology group where he conducts both wet- and dry-lab research in the field of immune responses against pathogens. During his career he has published more than 80 papers in specialized and more general journals.

Monday Aug 28, 2017

Speaker: John-Sebastian Eden (Charles Perkins Centre, USyd)

Title: Using RNA-Seq to reveal the Australian Virome

Abstract: TBA

Monday Aug 21, 2017

Speaker: Lori Chibnik (Harvard School of Public Health)

Title: Genomic journeys into neuropathology and cognitive reserve in an aging population

Abstract: TBA

About the speaker: Dr. Lori Chibnik, PhD, MPH is a biostatistician and Assistant Professor with a joint appointment in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health and the Department of Medicine at the Harvard Medical School. She received her MPH in International Health and her PhD in biostatistics from Boston University where she worked on predictive modeling methods for disease risk. Over her career she has developed and assessed predictive models for diseases such as HIV, pre-natal screening and autoimmune diseases and continues to apply her methods to various diseases. Dr. Chibnik’s current research focuses primarily on genetics and genomics of Alzheimer’s disease and dementia with an emphasis on longitudinal cohorts. In addition to her research, she is internationally renowned for her training programs and innovative teaching techniques, having developed multiple courses in biostatistics for varied audiences. While at Boston University she managed the Summer Institute for Training in Biostatistics, an NHLBI funded, 6-week summer program designed to bring undergraduate students into the fields of Biostatistics and Public Health. Most recently she developed and implemented a series of biostatistics and programming courses specific to the needs of scientists in sub-Saharan Africa. Currently she directs the Global Initiative for Neuropsychiatric Genetics Education in Research at the Harvard-Chan School and the Stanley Center for Psychiatric Research at the Broad Institute of Harvard and MIT.

## Seminars in 2017, Semester 1

Monday June 26, 2017

Speaker: Timothy Peters (Epigenetics Research Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research)

Title: Robust and flexible de novo calling of differentially methylated regions

Abstract: DNA methylation is a dynamic, environmentally sensitive modification implicated in a large array of biological processes, from transcription factor binding to being a reliable predictor of age. Hence accurate and interpretable statistical modelling of the methylome is of great importance when investigating epigenetic cell states. DMRcate is a Bioconductor package that calls differentially methylated regions (DMRs) from replicated Illumina array (including the new EPIC array) and whole genome bisulfite sequencing (WGBS) experiments, under general experimental design. It uses a tunable kernel smoother and whole-methylome significance testing to find and rank the most differentially methylated regions for a given hypothesis. It is fast and delivers DMRs in the order of seconds for arrays and minutes for WGBS. Package features include:

- Adjustable kernel size
- Guidance for users towards appropriate false discovery rate (FDR) thresholds
- Annotation-agnostic calling
- Options for filtering Illumina probes known to be polymorphic and/or cross-hybridise to off-target genomic sites
- Automatic post-calling annotation of DMRs with known Gencode promoter regions
- Output in GenomicRanges and bedGraph format
- Elegant plotting of DMRs using the Gviz package, including proximal Gencode gene loci
- Calling of variably methylated regions (VMRs) from Illumina arrays

DMRcate takes into account a number of biological and statistical considerations when defining DMRs, such as irregular spacing of CpG sites and the distribution of variances across CpGs as a result of variable sequencing depth.

Reference: Peters et al (2015) De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin. 2015 Jan 27;8:6. doi: 10.1186/1756-8935-8-6.

About the speaker: Tim’s background is in bioinformatics and applied statistics. He completed a PhD on the principles of statistical learning for transcriptomic data in the Department of Statistics at Macquarie University in 2012. He has worked as a Postdoctoral Fellow at CSIRO on the EpiSCOPE project: mapping the epigenetic terrain of human adipocytes, performing statistical analyses for human EWASs (epigenome wide association studies) and has published a novel method for statistical inference of whole-methylome data. In addition, he has spoken at a number of national and international conferences, including an oral presentation at the Joint Statistical Meetings (JSM) in Washington, DC.

Monday June 19, 2017

Speaker: Geoff Barton (Professor of Bioinformatics and Head of Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK.)

Title: Identification of novel functional sites in protein domains from the analysis of human variation

Abstract: In this talk I will present a new analysis that compares publicly available variation data for human with variation seen across all available protein sequences regardless of species. The analysis confirms patterns of variation in human are consistent with protein structural features, but highlights structurally and functionally important sites in around 15,000 human protein domains that are not found by conventional sequence analysis methods. The identified sites are enriched in disease-associated variants and ligand binding residues. I will explain the method and illustrate the new analysis with a number of examples including the Nuclear Receptor Ligand Binding Domains and G-protein coupled receptors (GPCRs) which are important therapeutic targets. The study makes heavy use of the popular Jalview (www.jalview.org) sequence analysis program developed in my group, so I will also give a brief update on Jalview’s new features for exploring nsSNPs on alignments.

Note: This is a joint event where Prof. Geoff Barton will be giving a talk to all in CPC. Time and location will be announced later.

Monday June 12, 2017, No meeting (Queen's Birthday)

Monday June 5, 2017

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Unsupervised recurrent neural network for cell event detection in videos

Abstract: In this talk, we will present an automatic unsupervised cell event detection and classification method for cell videos based on convolutional and recurrent neural networks. Cell images captured from various biomedical applications often possess different visual characteristics regarding cell appearance, motility, and cell activities. This presents difficulties in finding a generic solution for the automatic detection of cell events (division, death, differentiation, etc.) in videos. Current methods for event detection rely on human observers with specific expertise and long hours of labor; this also renders supervised training a suboptimal choice. We use a convolutional Long Short-Term Memory (LSTM) neural network structure that simultaneously exploits both spatial visual features and temporal patterns of objects to filter and classify possible cell events in a video sequence. Our model design allows for the detection and classification of cell events without the need for labeled training data; we will demonstrate our model for the detection of mitosis events.

Wednesday May 31, 2017

Speaker: Stephen Leslie (Centre for Systems Genomics, and Schools of Mathematics and Statistics, and BioSciences, The University of Melbourne)

Title: Genetics and Geography: Using genomic data to understand population history and demography

Abstract: In this talk Stephen will present some of the findings from the People of the British Isles project, which was published in Nature in March 2015 (and featured on the cover), and some more recent work following on from this study. In particular he will show that using newly developed statistical techniques one can uncover subtle genetic differences between people from different regions at a hitherto unprecedented level of detail. For example, in the UK one can separate the neighbouring counties of Devon and Cornwall, or two islands of Orkney, using only genetic information. Stephen will then show how these genetic differences reflect current historical and archaeological knowledge, as well as providing new insights into the historical make up of the British population, and the movement of people from Europe into the British Isles. This is the first detailed analysis of very fine-scale genetic differences and their origin in a population of very similar humans. The key to the findings of this study is the careful sampling strategy and an approach to statistical analysis that accounts for the correlation structure of the genome. The methods developed are readily extended to analyses in other populations.

About the speaker: Associate Professor Stephen Leslie is a statistician working in the field of mathematical genetics. A/Prof. Leslie did his undergraduate degree at ANU, including honours in Mathematics. He obtained his doctorate from the Department of Statistics, University of Oxford in 2008, followed by post-doctoral work at Oxford, before becoming the Head of Statistical Genetics at Murdoch Childrens Research Institute in 2012. Since 2016 Stephen has been at the University of Melbourne as Associate Professor of Statistical Genomics, in the Schools of Mathematics and Statistics, and Biosciences, and the Centre for Systems Genomics. In late 2016 he was awarded the Woodward Medal in Science and Technology, the University of Melbourne’s highest award for staff, which is given for research that has made the most significant contribution to knowledge in the five years prior to the award. A/Prof. Leslie's work covers several aspects of statistical and population genetics. His group's main focus is on methodological developments for the analysis of high throughput genetic data and the application of these methods to studies of disease and natural population variation. These methods typically combine modern computationally-intensive statistical approaches with insights from population genetics models. Specifically the group works on statistical methods for imputing immune system (and other) genes from incomplete genetic data; the application of these methods to studies of autoimmune and other diseases; methods for detecting and controlling for population stratification; and understanding the causes and consequences of genetic variation in populations.

Monday May 29, 2017

Speaker: Tram Doan (Westmead Millennium Institute, Sydney Medical School)

Title: RNA-seq profiling of normal human breast epithelial cells reveals unexpected nuclear receptor segregation

Abstract: The ovarian hormone progesterone is a key regulator of female reproductive function. The established role of progesterone analogues in hormone replacement therapy in increasing breast cancer risk has sharpened focus on the mechanisms of action of this hormone in the normal breast. Progesterone plays an essential role in the development of lobular alveolar structures in the breast, through stimulation of proliferation during the normal menstrual cycle and pregnancy. We previously reported that the progesterone receptor (PR) was present in the progenitor-enriched normal breast cell population and likely mediates proliferative effects in those cells. In the present study, we profiled the transcriptome of normal human breast epithelial cells at single cell resolution. The aims are to 1) identify the number and functional characteristics of different cell populations in the normal breast epithelium, and 2) characterise PR expression and lineage association in different normal breast epithelial cell types. We show that progesterone exerts distinct functional roles in different normal breast epithelial cell types and that PR is expressed more frequently in progenitor cells and has the strongest transcriptional effect in this cell population.

Monday May 22, 2017

Speaker: Kevin Wang (Statistical Bioinformatics Group, School of Mathematics and Statistics)

Title: A bias correction method to identify over-represented gene-sets for boutique arrays

Abstract: Gene annotation databases such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG), which describe genes in terms of their associated biological functions and pathways, are important tools in gene set enrichment testing (also known as GST). The purpose of this type of enrichment analysis is to assign biologically meaningful terms to each gene. Associations between a gene set and biological functions of interest can then be established by considering statistically over-represented annotation terms. Traditionally this is done through Fisher’s Exact Test (FET), assuming gene expression arrays capture the complete genome or at least a very large proportion of it. However, this assumption is satisfied neither for the increasingly popular boutique arrays nor for custom designed gene expression profiling platforms. Specifically, the conventional enrichment analysis is no longer appropriate due to the gene set selection bias induced during the construction of the arrays. We therefore propose an adjustment to the traditional hypergeometric test statistic in FET that introduces bias correction terms into the contingency table. The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. In this paper we demonstrate a method to adjust the over-representation p-value over a grid in $\left[0,1\right]^2$. Using our own Shiny application, we will illustrate the advantages and practicality of the method through multiple differential gene expression analyses in melanoma and other cancers.
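
For readers unfamiliar with the conventional test the abstract starts from, here is a minimal sketch of the one-sided hypergeometric (Fisher's exact) over-representation p-value. It is the unadjusted test only and does not implement the speaker's bias correction. The function name and all gene counts are hypothetical, chosen to show how strongly the result depends on the assumed background (whole genome versus the genes actually on a small array).

```python
from math import comb


def over_representation_p_value(universe, annotated, selected, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    universe  -- number of genes in the assumed background
    annotated -- background genes carrying the annotation term
    selected  -- genes in the list of interest (e.g. differentially expressed)
    overlap   -- selected genes that carry the annotation term
    """
    largest_overlap = min(annotated, selected)
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, largest_overlap + 1)
    )
    return tail / comb(universe, selected)


if __name__ == "__main__":
    # Hypothetical counts: 40 selected genes, 8 of which carry the term.
    # Background 1: the whole genome, where the term is rare (1% of genes).
    p_genome = over_representation_p_value(20000, 200, 40, 8)
    # Background 2: a 500-gene boutique array enriched for the term (10%).
    p_array = over_representation_p_value(500, 50, 40, 8)
    print(f"genome background: p = {p_genome:.3g}")
    print(f"array background:  p = {p_array:.3g}")
```

With the genome as background the overlap looks far more surprising than it does against the genes on the array, which is the kind of selection bias the abstract describes.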

About the speaker: I am currently a PhD candidate at the School of Mathematics and Statistics under the supervision of Prof. Jean Yang, A/Prof. Samuel Mueller and Dr. Garth Tarr. I am working in the area of statistical bioinformatics and I have strong interests in developing novel methods brought forward by high dimensional biomedical data. My current research focuses on the increasingly popular boutique array platform and its application both as a discovery and validation platform for biomarkers for patients in melanoma studies.

Monday May 15, 2017

Speaker: Fabian Held (The Life Lab, Charles Perkins Centre)

Title: Challenges of modelling (collaborative) networks

Abstract: With a constantly expanding body of scientific knowledge and expertise, collaboration is essential for the vast majority of research projects. However, we know very little about the complex interactions that affect the success and failure of collaborations, which may include collaborators’ personal attributes, their dynamics as a team, as well as the environment they’re working in. This is especially challenging when success and failure are not clearly defined. In this presentation Fabian will give an update about his progress in evaluating the Charles Perkins Centre’s effectiveness at “challenging prevailing dogmas, generating new ideas and translating knowledge into action” through facilitation of diverse collaborations. In particular, he will focus on his attempts at a statistical analysis of the network of collaborations that have emerged in the CPC, through the co-location of research groups and facilitation of project nodes. He will address conceptual, methodological, as well as technical issues of approaching this problem with exponential-family random graph models on the HPC cluster.

Monday May 8, 2017

Speaker: Dario Strbenac (Statistical Bioinformatics Group, SoMS, USyd)

Title: Design, Experimentation and Analysis of a Spike-in iTRAQ Proteomics Dataset Reveals Unexpected Aspects of Measurement Bias and Variance

Abstract: A replicated Latin squares experimental design was created to explore a variety of factors that influence the accuracy and precision of the measurements made by defining a set of 15 performance metrics. The experiment consisted of 21 non-yeast proteins which were spiked into a background yeast proteome in seven instrument runs of 8 samples labelled using the iTRAQ 8-plex kit. Importantly, the effect of the particular iTRAQ label used was greater than the effect of instrument run. Also, dividing the quantities of different proteins within the same run yielded reasonably accurate fold changes, providing a counter-example to the commonly accepted rule that measurements of different proteins can't be directly compared. Thirdly, the method of summarisation of peptides to a protein-level summary was found to have little effect. Finally, simple point-and-click normalisation using ProteinPilot resulted in better estimation of fold changes at the expense of increased variance and didn't perform substantially worse in any other performance metrics than methods like RUV or linear models, suggesting that commercial software can enable good quality analyses to be done quickly and accurately. The raw dataset is available from ProteomeXchange and allows anyone to apply their own normalisation method to it and upload the protein quantities to the web application and see how their method's performance compares to other methods.

Monday May 1, 2017

Speaker: Tim Burykin (The Life Lab, Charles Perkins Centre)

Title: Call for data: Exploration and visualization of complex datasets with a novel method

Abstract: As a member of professional staff, I'm helping academics at the Charles Perkins Centre to visualize their data for presentation, teaching or research purposes. In the first part of the talk I will briefly demonstrate how my images and videos were used to support the narrative of high-impact presentations. I will then focus on the generic method behind these visuals and discuss its usefulness for the exploration and potentially for the in-depth analysis of complex datasets of almost any nature. The talk would be suitable for people who want to look at their data from a different angle or who are searching for a friendly yet comprehensive way to convey their work to a broader audience.

About the speaker: I received my Master of IT degree in Russia and moved to Sydney to complete a PhD course in Agriculture under the supervision of Prof. John Crawford. My project was concerned with three-dimensional modelling, analysis and visualization of soil microenvironment and leaf cellular structures. Accumulated experience in computer graphics and efficient algorithm development enabled me to join Judith & David Coffey LifeLab at Charles Perkins Centre as a data visualization technician.

Monday April 24, 2017

Speaker: Alistair Senior (School of Mathematics and Statistics, Charles Perkins Centre)

Title: Meta-analytic tools to detect overlooked variance effects in biological systems

Abstract: Medically, the effects of a treatment on among-individual variation in health have direct implications for personalized medicine. Ecologically, among-individual variation governs a species' niche and is the grist of evolution by natural selection. However, experimental designs and analytical paradigms in biology are heavily focused on detecting the effects of treatments on population averages. As a result, we have a comparatively poor understanding of how environments and treatments affect among-individual variation. Over the last few years I have been developing tools for meta-analysis, which allow the user to combine the results of published studies to assess the effects of treatments on variation. These methods require only those summary statistics that are reported as a matter of standard practice, and integrate easily with commonly used meta-analytic software. I will present a summary of the methodology, as well as examples of its application that are pertinent to research goals of the Charles Perkins Centre.
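
To make the idea of a variance-focused effect size concrete, the sketch below computes the log variability ratio (lnVR) from reported standard deviations and sample sizes, then pools studies with inverse-variance weights. The abstract does not name the statistics used in the talk, so treating lnVR as the effect size, the fixed-effect pooling step, the function names and the study values are all assumptions made for illustration.

```python
from math import log, sqrt


def log_variability_ratio(sd_treatment, n_treatment, sd_control, n_control):
    """Bias-corrected ln(SD_treatment / SD_control) and its sampling variance.

    Needs only the standard deviations and sample sizes that studies
    routinely report.
    """
    correction_t = 1 / (2 * (n_treatment - 1))
    correction_c = 1 / (2 * (n_control - 1))
    effect = log(sd_treatment / sd_control) + correction_t - correction_c
    variance = correction_t + correction_c
    return effect, variance


def pool_fixed_effect(effects):
    """Inverse-variance weighted mean of (effect, variance) pairs and its SE."""
    weights = [1 / variance for _, variance in effects]
    pooled = sum(w * e for w, (e, _) in zip(weights, effects)) / sum(weights)
    return pooled, sqrt(1 / sum(weights))


if __name__ == "__main__":
    # Made-up summary statistics: (SD treated, n treated, SD control, n control).
    studies = [(2.4, 20, 1.8, 20), (3.1, 15, 2.9, 18), (1.2, 30, 0.9, 28)]
    effects = [log_variability_ratio(*study) for study in studies]
    pooled, standard_error = pool_fixed_effect(effects)
    print(f"pooled lnVR = {pooled:.3f} (SE {standard_error:.3f})")
```

A positive pooled value would indicate more among-individual variation under treatment than under control; in practice a random-effects model from an established meta-analysis package would usually replace the simple pooling shown here.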

About the speaker: I did my undergraduate and masters degrees in the UK, where my research was primarily directed towards questions in ecology and evolution. In 2010 I moved to the University of Otago to do a PhD on gene-environment interactions in determining phenotypic sex, with Shinichi Nakagawa. During this period, I developed an interest in the development and application of hierarchical statistics to questions in biology. After graduating, in early 2014 I moved to Sydney where I began working with Profs Simpson and Raubenheimer to apply my quantitative skills to questions in nutritional ecology.

Monday April 17, 2017 (Easter Monday)

Monday April 10, 2017

Speaker: Pengyi Yang (School of Mathematics and Statistics, Charles Perkins Centre)

Title: A dynamic multi-omic atlas of the transition from naive to primed pluripotency.

Abstract: Embryonic stem cells (ESCs) have the potential to generate virtually any differentiated cell type, to establish new models of mammalian development and to create new sources of cells for treating an enormous range of diseases. To elucidate the molecular pathways underpinning the transition from naïve to primed pluripotency cell states, we quantified the dynamic changes in the proteome, phosphoproteome, transcriptome, and epigenome underpinning the transition between these cellular states with high temporal resolution. We observed widespread remodelling of the cell across all regulatory layers, and yet the rate, extent and magnitude of phosphorylation changes exceed those observed on other levels, emphasising a critical role for phosphorylation in this process. Our dynamic phosphoproteomics data reveal that ERK and mTOR signalling branches dominate early and late signalling network activity respectively during the ESC to EpiLC transition. Collectively these data provide insight into the molecular processes underlying naïve and primed states, highlighting numerous potential gatekeeper mechanisms governing ESC pluripotency.

About the speaker: I obtained my PhD in bioinformatics from School of Information Technologies, The University of Sydney, in 2012. I then moved to the United States and completed an interdisciplinary Research Fellowship in Systems Biology Group, ESCBL, at National Institutes of Health on characterising transcriptomic and epigenomic regulations in embryonic stem cells (ESCs) using ultrafast sequencing data. I relocated back to Australia in late 2015 on a University of Sydney Postdoctoral Fellowship (DVCR) to pursue my own research in systems biology. I’m now affiliated with School of Mathematics and Statistics (SoMS); and Charles Perkins Centre, The University of Sydney. I have been offered a Lectureship in USyd (April 2016) and a Discovery Early Career Researcher Award (DECRA).

Monday April 3, 2017 (Hunter Meeting)

Monday March 20, 2017

Speaker: Ellis Patrick (School of Mathematics and Statistics)

Title: Deconstructing the innate immune component of a molecular network of the aging frontal cortex.

Abstract: Alzheimer’s disease is pathologically characterized by the accumulation of neuritic β-amyloid plaques and neurofibrillary tangles in the brain and clinically associated with a loss of cognitive function. The dysfunction of microglia cells has been proposed as one of the many cellular mechanisms that can lead to an increase in Alzheimer’s disease pathology. Investigating the molecular underpinnings of microglia function could help isolate the causes of dysfunction while also providing context for broader gene expression changes already observed in mRNA profiles of the human cortex. In this talk I will lay out the various statistical approaches I have used to tackle this problem.

About the speaker: Dr Ellis Patrick is a computational biologist and applied statistician. He is currently an Early Career Development Fellow in the School of Mathematics and Statistics and a staff member at The Westmead Institute for Medical Research. He obtained his PhD in statistical bioinformatics in the School of Mathematics and Statistics at the University of Sydney. In his postdoctoral studies, he worked as a computational biologist with joint appointments at Brigham and Women's Hospital, Harvard Medical School and The Broad Institute of MIT and Harvard. He spent this time using his statistical background to investigate the molecular drivers of Alzheimer’s disease and MS. As he spends most of his time analysing large biomedical datasets, his research relies on the subtlety of translating between biological and statistical concepts to form simple, suitable and targeted statistical questions.

## Seminars in 2016, Semester 2

Monday November 14, 2016

Speaker: Darya Vanichkina (Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Marvellous complexity: Exploring the mammalian transcriptome using RNA sequencing

Abstract: The complexity of the trillions of cells that comprise the mammalian body is underpinned not by their genomes, which are by definition identical, but by the temporally and physically precise expression of particular coding genes, long non-coding RNAs and small regulatory RNAs. In my talk, I will present some of the outcomes of my PhD research, which focussed on using and expanding upon developments in RNA sequencing technology and stem cell differentiation to deeply investigate transcripts in cortical and hindbrain-like neurons, and in oligodendrocyte precursor cells. I will also give an overview of my current work, which involves exploring the roles of alternative splicing in controlling gene expression, and the development of new methods of analysing splicing complexity.

About the speaker: I am a genomics data scientist at the Gene and Stem Cell Therapy Program at the Centenary Institute in Sydney, where I investigate how the mammalian genome works using next-generation sequencing. I use a combination of preexisting bioinformatics software and custom R, Python, and shell scripts to process terabytes of data on a daily basis, taking advantage of the University of Sydney's HPC facilities. I recently completed my PhD in Bioinformatics and Genomics at the Institute for Molecular Bioscience at the University of Queensland, under the supervision of Dr. Ryan Taft and Professor John Mattick. My work focused on using high-throughput sequencing to understand changes in the transcriptome that occur during neuronal functioning in normal cells and in disease, and on induced pluripotent stem cells as models of the human nervous system. For many years, I have been passionate about teaching, especially programming and bioinformatics to biologists, and was able to do a significant amount of this during my PhD studies. I am both a Software and Data Carpentry instructor. I also hold a Specialist Degree in Biochemistry with a Major in Molecular Biology from Lomonosov Moscow State University.

Monday November 7, 2016

Speaker: Weichang Yu (PhD candidate, SoMS, USyd)

Title: Semisupervised quadratic discriminant analysis using model selection and variational Bayes

Abstract: We develop a mean field collapsed variational Bayes approximation for quadratic discriminant analysis (QDA) with model selection, where we allow missing class information in the training dataset and subsequent model selection. This allows the use of unlabelled data to build the classifier and identification of strong predictors. We demonstrate using simulated and real datasets that this leads to a reduction in prediction error even in cases where the within-class dispersion is large. We make two contributions: we present a computationally cheaper alternative to Markov chain Monte Carlo, with comparable results, for Bayesian inference in QDA, and a Bayesian framework for performing model selection in QDA.

About the speaker: I am a first-year PhD student at University of Sydney working with Dr John Ormerod. My research interests include variational approximation, model selection and the use of predictive algorithms in medical and bioinformatics.

Monday October 24, 2016 (ABACBS, No seminar)

Monday October 17, 2016

Speaker: Jason Wong (Group Leader, Bioinformatics and Integrative Genomics, UNSW)

Title: Gaining fundamental insights into DNA repair-chromatin interactions through cancer genomics

Abstract: Mutations form in the genome through the interplay of DNA lesion formation and incomplete DNA repair. With the advent of cancer genomics and particularly whole cancer genome sequencing, cancer somatic mutations provide us with a window into how mutational and DNA repair processes function within human cells. In this talk, I will discuss how we have used whole cancer genome sequencing data to discover a novel biological process. Using publicly available data, we showed that transcription factor binding at active gene promoters can impair nucleotide excision repair (NER), thereby resulting in prevalent mutation hotspots at gene promoters in NER-dependent cancers such as skin and lung cancers. I will further discuss the implications of this biological process on cancer development and the impact of our study on the interpretation of functional mutations in cancer.

About the speaker: Dr Wong is an ARC Future Fellow at the Prince of Wales Clinical School, UNSW and leads the Bioinformatics and Integrative Genomics Team at the Lowy Cancer Research Centre. He received his B.Sc (Hons I) from the University of Sydney and was awarded a D.Phil in Bioanalytical Chemistry at the University of Oxford, UK in 2007. This was followed by a post-doctoral fellowship at the University College Dublin, Ireland, before returning to Sydney to join UNSW. To date, he has published 65 peer reviewed journal articles with senior authorship in journals including Nature, Genome Biology, Molecular Cancer Research and Nucleic Acids Research. He has attracted over 2 million in research funding as lead investigator from the ARC, Cancer Australia and Cancer Institute NSW. His current research is focused on the study of mutational processes in cancer and their influence on gene regulation and function.

Monday September 19, 2016

Speaker: Fatemeh Vafaee (CPC, USyd)

Title: Determination of circulating microRNA markers of colorectal cancer prognosis by a novel network-based multi-objective optimisation routine

Abstract: Colorectal cancer represents a significant cause of cancer-related death, and effective treatments that maximise quality of life as well as cancer-related outcomes are therefore of major importance. Determining the appropriate treatment pathway through a personalised medicine paradigm is a prime goal, and so biomarkers are sought to aid in the decision-making process. In the age of high-throughput technologies, molecular markers are particularly attractive as a means of achieving true personalisation of cancer treatment. We have recently evaluated the role of circulating microRNA as a means of predicting patients’ prognosis and developed an innovative multi-objective network-based optimisation method to identify robust microRNA signatures which are reliable in terms of predictive power and functional relevance. In this talk, I will go through the details of the proposed method. To identify potential collaboration opportunities with the audience, I will also give a concise and general overview of my research interests and projects.

About the speaker: Dr Fatemeh Vafaee received her PhD in Artificial Intelligence from the University of Illinois at Chicago in 2011. Her doctorate studies involved multiple projects in domains of optimisation, machine learning, data mining, pattern recognition, and probabilistic graphical models, with the focus on theoretic and applied genetic algorithms as her PhD thesis. While pursuing her PhD, Fatemeh also collaborated with the University’s Computational Biology Laboratory and extended her research to biological applications such as cellular network alignment and phylogeny reconstruction. After her PhD, Fatemeh started a postdoctoral research position at the University of Toronto, Ontario Cancer Institute, one of the largest cancer research centres in Canada and worldwide. During her postdoc, Fatemeh had the privilege to work in a highly trans-disciplinary environment and collaborate with world-renowned scholars in integrative cancer informatics. In 2013, Fatemeh took a Research Fellow position at Charles Perkins Centre and School of Maths & Stats at the University of Sydney. Her research relies on a wide national and international collaboration network and she has published several papers in competitive peer-reviewed proceedings and top-tier journals such as Nature Methods, Scientific Reports, BMC Systems Biology, PLoS ONE and Alzheimer's & Dementia.

Monday September 12, 2016

Speaker: Chendong Ma (SoMS, USyd)

Title: Honours practice talk

Monday September 5, 2016

Speaker: Shila Ghazanfar (SoMS, USyd)

Title: Integrated Single Cell Data Analysis Reveals Cell-Specific Networks and Novel Coactivation Markers

Abstract: Large scale single cell transcriptome profiling has exploded in recent years and has enabled unprecedented insight into the behavior of individual cells. Identifying genes with high levels of expression using data from single cell RNA sequencing can be useful to characterize very active genes and cells in which this occurs. In particular, single cell RNA-Seq allows for cell-specific characterization of high gene expression, as well as gene coexpression. We offer a versatile modeling framework to identify transcriptional states as well as structures of coactivation for different neuronal cell types across multiple datasets. We employed a gamma-normal mixture model to identify active gene expression across cells, and used these to characterize markers for olfactory sensory neuron cell maturity, and to build cell-specific coactivation networks. We found that combined analysis of multiple datasets results in more known maturity markers being identified, as well as pointing towards some novel genes that may be involved in neuronal maturation. We also observed that the cell-specific coactivation networks of mature neurons tended to have a higher centralization network measure than immature neurons. Integration of multiple datasets promises to bring about more statistical power to identify genes and patterns of interest. We found that transforming the data into active and inactive gene states allowed for more direct comparison of datasets, leading to identification of maturity marker genes and cell-specific network observations, taking into account the unique characteristics of single cell transcriptomics data.

About the speaker:

Monday August 29, 2016

Speaker: Kevin Wang (SoMS, USyd)

Title: An adjustment method for gene set over-representation in boutique arrays

Monday August 22, 2016

Speaker: Cali Willet (Sydney Informatics Core Research Facility, USyd)

Title: Bioinformatics services and training

Abstract: An overview of the bioinformatics services and training available through the Sydney Informatics Core Research Facility

About the speaker: Cali Willet is a bioinformatics technician for the Sydney Informatics Core Research Facility at the University of Sydney. She completed her PhD in animal genomics and computational biology in the Faculty of Veterinary Science at the same institution. She is interested in the genetics of disease, particularly in companion and endangered animals, and in the development of bioinformatics methodologies tailored for causal locus identification in non-model organisms. As a bioinformatician for the Core Research Facilities, she is focused on providing support to bioinformatics research groups in the form of consultation, training and advocating for the needs of bioinformatics and computational biology groups at the University of Sydney.

Monday August 15, 2016 (Cancelled)

Monday August 8, 2016

Speaker: Ulf Schmitz (Research Officer, Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Intron retention redefines post-transcriptional gene regulation in mammalian and vertebrate species

Abstract: Intron retention (IR) occurs when the splicing machinery fails to excise introns from primary transcripts. This may give rise to diverse downstream effects; most often, however, it induces nonsense-mediated decay (NMD) of the intron-retaining transcript. We performed a phylogenetic analysis of IR in human, mouse, dog, chicken, and zebrafish granulocytes. We found evidence that IR affects functionally related genes in granulocytes throughout evolution, many of which are orthologs. We also found a strong anti-correlation between the number of intron-retaining genes and the number of protein coding genes in a genome. Retained introns have similar characteristics in all investigated species (human, mouse, dog, chicken, zebrafish). They are shorter and have a higher GC content than their non-retaining counterparts; they often reside near the 3' end of a transcript and are enriched in premature termination codons. Their host genes harbour a larger number of miRNA binding sites in their 3' untranslated region and are often co-regulated in human and mouse. Our results suggest that IR is a global control mechanism affecting similar biological processes independent of specific effector genes. More importantly, we gained new insights that support the notion of IR as an independent mechanism of post-transcriptional gene regulation that supplements and maybe even cooperates with other forms of post-transcriptional gene regulation.

About the speaker: Ulf Schmitz is a post-doctoral researcher at the Centenary Institute in Sydney. His research focuses on the design of integrative workflows combining various computational disciplines with experimentation to approach molecular biological and medical problems. Between 2003 and 2015, Ulf Schmitz worked as a systems engineer and later as bioinformatician at the Department of Systems Biology & Bioinformatics, University of Rostock, Germany. He was awarded his PhD in Bioinformatics in June 2015. Thereafter, he joined Prof John Rasko’s Gene and Stem Cell Therapy Program as a bioinformatics research officer. In January 2016, he was appointed as Conjoint Senior Lecturer at the Centenary Institute and the Sydney Medical School.

Monday August 1, 2016

Speaker: Dario Strbenac (Senior Research Associate, Statistical Bioinformatics Group, USyd)

Title: Interactive Benchmarking of Quantitative Proteomics Preprocessing Alternatives

Abstract: Mass spectrometry has long been used to analyse biological samples and find associations of altered proteins with experimental conditions. However, the focus of previous method evaluation efforts has been on the peptide amino acid sequence determination problem. Here, using a replicated Latin squares experimental design, the first comprehensive comparison of the effect of alternative preprocessing choices on the bias and variance of protein quantitation is made. Surprisingly, the variability between iTRAQ labels is larger than between different runs of the instrument. This has consequences for researchers who don't adequately incorporate randomisation and blocking in their proteomic experimental designs. Secondly, the default preprocessing done by the vendor software ProteinPilot outperforms more advanced methods, such as linear models and RUV, in terms of recovering the expected fold changes (bias). Thirdly, comparing the measurements of different proteins within a sample is shown to be feasible, which was previously assumed to be inaccurate and always avoided. Finally, a benchmarking Shiny application will be demonstrated, which allows users to upload their own preprocessing of the raw data, and see how their method compares to other methods in an interactive scoreboard.

Monday July 25, 2016

Speaker: Joshua Ho (Head, Bioinformatics and Systems Medicine Laboratory, VCCRI)

Title: A systems approach to study organ development and congenital disease

Abstract: A systems biology approach is now being widely employed to systematically study, through bioinformatics, how molecular and signalling pathways are regulated in organ development in humans and relevant animal models. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the genetic causes of congenital diseases that stem from dysregulation of GRNs during organ development. In this talk I will discuss the latest advances in GRN inference and analysis using large amounts of experimentally determined perturbation data, and how we can use GRNs to study organ development and congenital diseases.

About the speaker: Dr Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School (HMS), and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance. Joshua has published over 48 papers, including first-author publications in Nature, Science Signaling, and PLoS Genetics. He is also currently the Secretary of the Australian Bioinformatics and Computational Biology Society (ABACBS).

## Seminars in 2016, Semester 1

Monday June 20, 2016

Speaker: Rima Chaudhuri (Metabolic Cybernetics Lab, CPC, USyd)

Title: Understanding the relationship between AKT recruitment and GLUT4 translocation to the plasma membrane in fat cells through single cell microscopy data analysis.

Abstract:

About the speaker: Dr. Chaudhuri was awarded her PhD in Bioinformatics from the University of Illinois, USA in 2010. Her doctoral thesis was on the discovery and design of drugs for the treatment of SARS coronavirus and Hepatitis C virus through computational modeling. While pursuing her doctorate degree, she worked as a researcher in software and pharmaceutical companies such as Blackbaud Inc. and Pfizer Inc. (USA), developing modules of scientific research software. Dr. Chaudhuri pursued her postdoctoral training in biophysical simulations at the Parc Cientific de Barcelona (PCB) as a joint affiliate between the Institute for Research in Biomedicine and the Barcelona Supercomputing Center in Barcelona. She holds two international patents in the field of drug discovery and design. After her year-long post-doc in Spain, she moved to Sydney in 2011 and joined the Garvan Institute of Medical Research in the laboratory of Prof. David E. James to work on systems biology based approaches to unravel the complexities behind the incidence of metabolic diseases such as diabetes and obesity. Her strength lies in interdisciplinary research and bridging the gap between computational and basic sciences. She is currently a research fellow at the Charles Perkins Centre in the University of Sydney. Her current research interests include isolating candidate bio-markers of T2D and obesity from molecular expression profiles, understanding and targeting protein-protein interactions in disease to facilitate a cure and integrating multi-dimensional data from different platforms (transcriptomics, proteomics, interactomics and metabolomics) to acquire a precise picture of the diseased cell.

Monday June 13, 2016 (Queen's Birthday)

Monday June 6, 2016

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Computing Image Similarity for Image-Derived Disease Models

Abstract: Imaging is a critical and indispensable component of modern healthcare. The automated analysis of medical images has a vast range of applications in evidence-based diagnosis, physician education, and biomedical research. These decision support applications are predicated on the ability to objectively compute the similarity of image content in a manner that matches the subjective similarity judgement of human domain experts. In this talk, I will present an overview of the conceptual challenges in this field before detailing my research on methods for characterising and comparing the visual content of images, including a graph-based method for comparing 3D PET-CT lung cancer images and my more recent work using convolutional neural networks.

About the speaker: Dr. Ashnil Kumar received the Ph.D. degree in information technology from the University of Sydney in 2013; his PhD introduced a new graph-based method for modelling the relationships between tumours and organs in medical images.

Monday May 30, 2016

Speaker: Vinita Deshpande (Metabolic Cybernetics Lab, CPC, USyd)

Title: Removing unwanted variation in large scale ‘omics datasets containing missing values

Abstract: Transcriptomics and proteomics are powerful techniques to obtain a comprehensive snapshot of biological systems ranging from cells to whole organisms. However, a major problem for such big datasets is the presence of missing values, as many statistical tools used to analyse these often require complete data. One such bioinformatic tool is RUV (Removing Unwanted Variation), a widely used R package developed to remove technical variation, such as batch effects, in order to normalise the data and perform downstream analyses such as differential expression analysis. One of the solutions to overcoming this issue of missing data is to obtain a complete dataset, by either filtering the data to eliminate missing values or performing imputation. These approaches, however, can greatly reduce the sample size or biological variation, leading to loss of statistical power. The first part of this talk will describe an alternative approach in which the RUV algorithm was adapted to handle data with missing values as its input. The performance of this new algorithm was evaluated in terms of its ability to normalise and correctly identify differentially expressed genes/proteins in large ’omics datasets containing varying amounts of simulated missing values. The second part of this talk will be a discussion on the future directions and challenges of this PhD project in terms of designing and conducting further quantitative analyses on large scale ‘omics data.

About the speaker: Vinita is a PhD student supervised by Prof David James and Prof Jean Yang at The University of Sydney, where she is pursuing her research interests in the application of systems biology and bioinformatic approaches to metabolic diseases. Vinita has previously completed a Bachelor of Science (Bioinformatics) / Bachelor of Information Technology (Computer Science) with Honours from The University of Sydney. Prior to commencing her PhD, she worked as a bioinformatics research assistant with Dr Joshua Ho in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute in Sydney.

Monday May 23, 2016

Speaker: Ashley Waardenberg (Children's Medical Research Institute; Sydney Medical School, USyd)

Title: Discovering Protein-Protein Interactions from DNA sequence - insights into the cardiac gene regulatory network and disease

Abstract: NKX2-5 is a key transcription factor (TF) required for normal heart development and is implicated in a range of cardiac diseases. It binds directly to DNA by recognising a specific sequence called the NKX2-5 binding element (NKE). However, until recently its genomic targets were poorly defined, and the NKX2-5 protein-protein interaction network remains poorly defined. Recently we identified genomic target regions for NKX2-5 and human disease relevant mutations in cultured HL-1 cardiomyocytes using the DamID method and identified new NKX2-5 disease mechanisms (Bouveret R, Waardenberg AJ, et al. eLIFE, 2015). This talk describes our efforts at predicting and subsequently validating novel protein-protein interactions (PPIs) based on recurrent binding sites (or motif grammar) through the application of machine learning algorithms.

About the speaker: Dr Ashley Waardenberg is currently a postdoctoral bioinformatician at the Children's Medical Research Institute, Westmead, where he is developing systems biology approaches for investigating proteomics and high throughput protein modification data related to the brain and associated diseases, in collaboration with Dr Mark Graham and Prof Phil Robinson. He received a PhD in Systems Biology (2012) under the supervision of A/Prof Christine Wells (now at the University of Glasgow, Scotland) and Dr Brian Dalrymple (CSIRO, Australia), where he developed a novel visualisation approach for viewing gene expression data specifically in the context of striated muscle contractile protein location. A key outcome of his PhD was the discovery of a new protein-protein interaction between PI3K and a muscle mechano-sensor in the heart, implicating the muscle contractile apparatus in responding to cardiac stress, which has broader implications in the context of PI3K cancer therapies (Waardenberg, et al. Journal of Biological Chemistry, 2011). During his PhD, he was also involved in the Bovine Genome Consortia which was published in Science in 2009 and was a team recipient of the CSIRO Chairman's Medal in 2010 for contributions to this international effort. He then joined the Cardiac Developmental and Stem Cell Biology Laboratory of Professor Richard Harvey at the Victor Chang Cardiac Research Institute, Darlinghurst, as a Postdoctoral Scientist to gain a deeper insight into developmental biology, furthering an interest in understanding the origins of disease, where he implemented systems biology strategies for understanding genome-wide binding effects of the cardiac transcription factor NKX2-5 and NKX2-5 mutations relevant to congenital heart disease. This has resulted in a number of recent publications (Waardenberg AJ, Ramialison M et al. Cold Spring Harbor Perspectives in Medicine, 2014; Bouveret R, Waardenberg AJ et al. eLIFE, 2015; Waardenberg AJ et al. BMC Bioinformatics, 2015) and he continues to collaborate with the Victor Chang Cardiac Research Institute. Ashley is also a founding member and Vice-President of the Australian Bioinformatics and Computational Biology Society (ABACBS). Ashley has been heavily involved in establishing this very young society and is passionate about establishing communities in this domain.

Monday May 16, 2016

Speaker: Timur Burykin (Judith and David Coffey LifeLab, CPC, USyd)

Title: Data visualization and exploration using particle dynamics simulation

Abstract: Exploration of complex multidimensional datasets is an ongoing challenge in many fields of research. In the attempt to simplify this task for people with no expertise in advanced statistics or programming, a novel method of data visualization was discovered. The algorithm applies simple particle interaction rules on data points and allows them to self-organize into layouts that approximate the clustering of objects in the multidimensional space. A complementary density map, superimposed network connectivity and configurable node properties linked to extra dimensions make this visualization method suitable for a wide range of applications. A few datasets will be demonstrated in this presentation, including hospital admission records, a TF-TG interaction network and results of diet experiments. The extension of the algorithm to advanced image and network analysis will also be discussed.

About the speaker: Tim Burykin is an experienced C++ programmer who joined Charles Perkins Centre last year as a data visualization technician and a member of Judith & David Coffey LifeLab supervised by Prof. Zdenka Kuncic. He received a master of IT degree in Russia and moved to Sydney to complete a PhD course in Agriculture under the supervision of Prof. John W. Crawford.

Monday May 9, 2016

Speaker: Denis Bauer (Team leader Transformational Bioinformatics, CSIRO)

Title: VariantSpark: applying Spark-based machine learning methods to genomic information

Abstract: Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a Hadoop/Spark framework that utilises the machine learning library, MLlib, thereby providing the means of parallelisation for population-scale bioinformatics tasks. VariantSpark offers an interface to the standard variant format (VCF), seamless genome-wide sampling of variants and provides a pipeline for visualising results. To demonstrate the capabilities of VariantSpark, we cluster more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach developed by the Global Alliance for Genomics and Health, ADAM, the comparable implementation using Hadoop/Mahout, as well as Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. These benefits of speed, resource consumption and scalability enable VariantSpark to open up the usage of advanced, efficient machine learning algorithms to genomic data.

About the speaker: Dr. Denis Bauer is the team leader of the transformational bioinformatics team in CSIRO’s ehealth program. Her expertise is in high throughput genomic data analysis, computational genome engineering, as well as Spark/Hadoop and high-performance compute systems. She has a PhD in Bioinformatics and has done her Postdoctoral training in machine learning and human genetics, respectively. Her collaborators include Prof Simon Foote on mammalian susceptibility to infectious diseases, Prof Ian Blair on molecular mechanisms of motor neuron disease, and Prof Rodney Scott on obesity-driven cancer. She has 23 peer-reviewed publications (9 first author, 4 senior author) with three in journals of IF>8 (e.g. Nat Genet.) and H-index 9. To date she has attracted more than AU$25 million in funding.

Monday May 2, 2016 (Florian and Falk farewell. No seminar.)

Monday April 25, 2016 (ANZAC Day)

Monday April 18, 2016

Speaker: Michael De Ridder (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: CereVA: A Visual Analytics Framework for Neurological Disorder Analysis with Functional Magnetic Resonance Imaging

Abstract: Functional Magnetic Resonance Imaging (fMRI) is an important imaging modality for understanding and diagnosing neurological disorders, such as schizophrenia, bipolar disorder and Alzheimer's disease. The modality temporally scans blood oxygenation as a proxy for neuronal activity. This activity is often processed into three components for analysis: (i) the anatomical context; (ii) individual voxel and region (group of voxels) time-series; and (iii) the correlation of activity between regions. While many statistical and graph theoretical approaches have been applied to the data, issues such as noise and a lack of understanding of the brain lead to a diverse range of challenges. Visualisation-based analytics is often used to overcome some of these challenges; however, current methods often present an oversimplification of the data. With CereVA, we integrate all three of the commonly derived activity components in a visual analytics framework comprising a full-scale pipeline that incorporates automatic image processing and interactive visualisation. Finally, we present a new application for fMRI visual analytics by applying CereVA to the active research area of classifying neurological disorders.

About the speaker: Michael de Ridder is a PhD student with The Institute of Biomedical Engineering and Technology (BMET) in the School of Information Technologies at the University of Sydney. He is supervised by A/Prof Jinman Kim. Michael's work straddles the boundary of scientific and information visualisation with a heavy influence from medical imaging techniques.

Monday April 11, 2016 (Hunter Meeting)

Monday April 4, 2016

Speaker: Taiyun Kim (Victor Chang Cardiac Research Institute, and UNSW)

Title: PAD: An interactive web portal for analysis of transcription factor co-binding at promoters and enhancers

Abstract: It has long been observed that transcription factors (TFs) bind to DNA collaboratively with other TFs as co-binding partners. Recently, through studying the genomic binding sites of the essential embryonic stem cell TF NF-Y, Dr Pengyi Yang has shown that the same TF may bind DNA with different co-binding partners if we consider TF binding sites that are proximal or distal to transcription start sites separately. Based on this observation, we have developed a database of binding sites of >200 TFs in mouse embryonic stem cells, and an interactive web portal that enables any user-submitted TF binding profiles to be clustered and visualised with our database TF profiles, at the proximal and distal regions separately. Our tool contributes to our understanding of how gene regulation occurs via combinatorial binding of TFs in different cell types.

About the speaker: Taiyun Kim is a 5th year student in the Bachelor of Engineering (Bioinformatics)/Masters of Biomedical Engineering program at the University of New South Wales. In 2015, he was awarded a Summer Scholarship to work in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute (VCCRI), under the supervision of Dr Joshua Ho (VCCRI) and Dr Pengyi Yang (University of Sydney).

Monday Feb 1, 2016

Speaker: Lei Sun (School of Information Engineering, Yangzhou University, China)

Title: Study on long noncoding RNAs using computational methods

Abstract: Tens of thousands of newly discovered long noncoding RNAs (lncRNAs) have been in the spotlight in the life sciences for some time, as their important biological functions are increasingly revealed. Due to the intrinsic complexity of lncRNA functions and mechanisms, our group proposes to study lncRNAs using a series of computational methods, which can improve research efficiency. In my talk, I would like to share some ideas on the results of lncRNA prediction using support vector machines (SVM), and to discuss potential lncRNA-specific transcriptional patterns detected using computational methods.

About the speaker: Dr. Lei Sun received a doctor of engineering degree from China University of Mining and Technology in 2013. He is now a lecturer in the School of Information Engineering at Yangzhou University, P.R. China. His research interests include bioinformatics and signal and information processing. As a visiting PhD student, Dr. Sun previously carried out bioinformatics research at several institutes and universities, including the School of IT at The University of Sydney, the Institute of Molecular Bioscience (IMB) at the University of Queensland, and the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences.

## Seminars in 2015, Semester 2

The seminars will be held at 1:00 pm on Monday in Access Grid Room, which is on level 8 of Carslaw Building. The format of the talk is 30~45 minutes plus questions.

Monday Nov 30

Speaker: Shila Ghazanfar (SMS, Faculty of Sciences, The University of Sydney)

Title: Gene coexpression identification from single-cell expression experiments

Abstract: Classically, gene expression profiles have represented an aggregate of expression levels of each of the multitude of cells within the sample of interest. More recently, technologies utilising quantitative PCR, such as nanoString, enable measurement of expression in individual cells as opposed to an amalgamation of cells. Using these data along with appropriate statistical models, we can ask questions such as in what proportion of cells certain genes are expressed, and we can determine the distribution of coexpression of genes among these cells. In collaboration with Associate Professor David Lin at Cornell University, whose interest lies in investigating the olfactory system in mouse models, a set of neuronal cells was assayed with special interest in the protocadherin family of genes. We describe the statistical methods for processing the single-cell expression data and identifying coexpression of genes in subsets of the cell population, including mixture modelling and visualisation techniques for further insight.

About the speaker: Shila Ghazanfar is a 3rd year PhD student and Postgraduate Teaching Fellow in the School of Mathematics and Statistics at The University of Sydney. She is supervised by A/Prof Jean Yee Hwa Yang (The University of Sydney), Dr John Ormerod (The University of Sydney) and Dr Michael Buckley (CSIRO). Her research interests are in analysing high throughput sequence data such as RNA-Seq and Exome-Seq, and integrating different types of high-throughput data. She has previously completed a Bachelor of Science (Advanced Mathematics) and Honours in Statistics at the University of Sydney.

Monday Nov 23

Speaker: Cristian Leyton (Faculty of Health Sciences, The University of Sydney)

Title: Primary progressive aphasia and its challenges

Abstract: Primary progressive aphasia (PPA) comprises a group of neurodegenerative conditions that affect predominantly the language function. As a result of the partial destruction of the language network, several clinical variants of PPA have been described, each of which has its own profile of linguistic deficits, distribution of brain atrophy, and molecular pathology. This unique group of conditions offers a natural paradigm to understand the neural basis of language processing and how neurodegeneration starts and spreads. I will explain the main challenges in the field and explore potential contributions of bioinformatics to the field.

About the speaker: Dr Leyton worked as a clinical neurologist for four years before moving to Australia in 2009. He was awarded a PhD on progressive aphasias in 2013 at UNSW. In 2014, he was awarded a DVC University Postdoctoral Fellowship at the Faculty of Health Sciences, University of Sydney. His main interest is the study of aphasic manifestations caused by neurodegenerative diseases.

Monday Nov 16

Speaker: David Humphreys (Victor Chang Cardiac Research Institute)

Title: Obstacles and challenges in specialised RNA sequencing (RNA-seq) analysis

Abstract: In recent times RNA-Seq has become an affordable method to profile the transcriptome of a biological sample. One of the strengths of RNA-Seq over other technologies is the ability to capture information at a nucleotide level using relatively unbiased methods without having any prior genomic information. These features have given rise to many RNA-Seq applications, and with these often arise new challenges in the bioinformatics analysis. In this presentation I will highlight the challenges (and some solutions) in small RNA-Seq, RNA editing and circular RNA analysis from high throughput sequencing data.

About the speaker: Dr David Humphreys is a multidisciplinary wet-lab-scientist/bioinformatician who manages the Genomics Core facility at the Victor Chang Cardiac Research Institute. His undergraduate training comprised a joint major in Biology and Computer Science before completing honours followed by a PhD in molecular biology. After joining the Victor Chang Cardiac Research Institute (VCCRI) he developed a research interest in gene regulation and the involvement of small non-coding RNAs. Since 2009 he has been heavily involved in studies utilising high throughput sequence technologies, which has allowed him to refocus his computer science skills. David has a number of active research collaborations with VCCRI faculty and St Vincent's Hospital cardiologists utilising RNA-Seq and exome sequencing.

Monday Nov 9

Speaker: James Burchfield (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Snapshots of diabetes

Abstract: The imaging of biological systems has become a fundamental tool in the cell biologist's arsenal. Our lab has utilised a range of imaging techniques to probe the insulin signalling network in single cells and has data that throw into question the traditional view of signalling networks. Central to the continued success of this approach is the ability to extract relevant information from this large volume of high-content image data. Whilst the analytical pipelines for large scale genomic and proteomic data have undergone a revolution in recent years, there has been a lack of development of similar tools to analyse data generated from imaging experiments. I will discuss some of the developments in this arena in the context of diabetes and insulin resistance.

About the speaker: James obtained his PhD from The University of Sydney and pursued a postdoctoral fellowship in David James' lab at the Garvan Institute of Medical Research. In 2014, James relocated with David James' lab to the Charles Perkins Centre at the University of Sydney. James is an expert in using high performance microscopy for single cell imaging.

Monday Nov 2

Speaker: Martin Wong (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Kinetic simulation of the Akt pathway in Insulin Signalling

Abstract: Traditional biological research pathways are often focused on discovery of novel protein-protein interactions. The temporal kinetics of signalling events is a feature that is not commonly investigated, but it may encode important information regarding the physical mechanisms underlying these interactions. The talk today will discuss how kinetic simulations can be used to infer these physical mechanisms. The talk will begin by discussing how models are constructed in terms of the rate equation used, the network topology and the parameter fitting procedure. The application of this will then be discussed in more detail in the context of Insulin Signalling and the Akt signalling pathway, where new insight has been obtained regarding the phosphorylation mechanism of Akt prior to the activation of its downstream substrates.

About the speaker: Martin comes from a very diverse background, having completed a Bachelor of Science majoring in Physics, and a Bachelor of Engineering in the Biomedical stream. He also completed an honours in engineering where he worked on developing a bioactive material for use in implanted devices. He is now a few months from completing his PhD under David James from the Metabolic Cybernetics Lab in the CPC, and Zdenka Kuncic from the Institute of Medical Physics at the School of Physics, where he is using mathematical modelling to interrogate the temporal aspects of insulin signalling.

Monday Oct 26

Speaker: Lake-Ee Quek (Coffey LifeLab, Charles Perkins Centre, The University of Sydney)

Title: Processing of metabolite data generated by mass spectrometry

Abstract: Metabolites are the chemical species transformed during metabolism. They are direct signatures of cellular state, and therefore easier to correlate with phenotype. With mass spectrometry, the appeal is the ability to rapidly measure thousands of metabolites from very small samples. The talk today will briefly introduce metabolomics, although the focus will be on data acquisition and processing, in the context of targeted metabolomics. Global metabolic profiling in an unbiased fashion is the ultimate aim in metabolomics, with the right analytics and bioinformatics.

About the speaker: Lake-Ee obtained his PhD in Cell Metabolism at The University of Queensland (UQ) in 2010. He was a UQ Postdoctoral Fellow from 2011 to 2014 and a Research Postdoc with A/Prof. Nigel Turner, Mitochondria Bioenergetics Lab, School of Medical Sciences, UNSW, from 2014 to 2015. He recently took on a Postdoctoral Fellowship with the Coffey LifeLab in the Charles Perkins Centre and relocated to The University of Sydney.

## Seminars in 2015, Semester 1

The seminars will be held at 1:00 pm on Monday in Access Grid Room.

Monday June 22

Mark Greenaway (Stat, Sydney)

New tools in the R ecosystem

In the past few years, there has been an explosion of interest in data science. R has been at the forefront of this field, which has led to a lot of positive contributions to the R ecosystem from the wider tech community. Useful tools from the tech community which are available for R will be outlined, particularly focusing on GitHub, visualisation, the contributions of Hadley Wickham/RStudio, Spark and cloud computing.

Monday May 25

Paul Lin (Stat, UNSW)

Gene expression Changes in Human Rett Syndrome Brain

Rett Syndrome (RTT) is an X-linked neurodevelopmental disorder. It affects girls at a frequency of 1 in 10,000 live births. Our study is the first transcriptome-level analysis of post-mortem RTT brain tissue with age-matched controls. We have used two technologies, RNA-seq and microarray, to replicate our findings. We have taken tissue composition into consideration, which hasn't been done in previous RTT studies; we have found that tissue composition affects the outcomes of differential expression analysis. More than 95% of classic RTT cases are caused by sporadic mutations in the gene encoding methyl-CpG binding protein 2 (MeCP2). Initial studies have pointed out a transcriptional repressor role of MeCP2; our data is consistent with recent data and confirms that MeCP2 is a transcriptional activator. We have also shown that intergenic L1 expression increases in human RTT brain. Lastly, co-expression networks will be demonstrated to identify brain region specific enhancer RNAs in the human brain. In this study, we have identified a set of Robust Brain-Expressed Enhancers (rBEEs). rBEEs are enriched for genetic variants associated with autism spectrum disorders (ASD).

Monday May 18

Statistical Analysis of a Lupus Flow Cytometry Experiment

Results from an analysis of a flow cytometry-based observational study of patients with lupus will be presented. The data was collected and analysed over a five year period at the Centenary Institute at the University of Sydney. The underlying biotechnology will be described, along with how the statistical complications associated with the data, including measurement error, missing values, and outliers, were resolved.

Monday May 11

In-silico differentiation between direct and indirect protein binding partners from a large MS-based protein interactome experiment

Protein-protein interactions (PPIs) are crucial in all cellular processes, primarily in understanding signalling cascades and protein functions. Affinity purification, followed by mass spectrometry analysis (AP/MS), offers a powerful approach for the study of complex protein-protein interactions. However, such MS-based high-throughput screens are notorious for high false discovery rates (FDR). Secondly, such screens do not allow differentiation between direct and indirect protein binders. In this study, we developed a scoring function that ranks putative binders based on their likelihood of being a direct binder using an array of features, including 3D protein structure information. We use the interactomes of eIF4E and 4EBP1 proteins implicated in insulin resistance as case studies to elucidate the principles behind the development of this scoring function. Lastly, Phosphortholog, a web-based tool to map orthologous post-translational modification sites on proteins across species, is demonstrated.

Monday May 4

Euijoon Ahn (IT, Sydney)

Automated Melanoma Segmentation and Classification

The segmentation of skin lesions in dermoscopic images is considered one of the most important steps in computer-aided diagnosis (CAD) for automated melanoma diagnosis. Existing methods, however, have problems with over-segmentation and do not perform well when the contrast between the lesion and its surrounding skin is low. A new automated saliency-based skin lesion segmentation (SSLS) method is proposed, which is designed to exploit the inherent properties of dermoscopic images, which have a focal central region and subtle contrast discrimination with the surrounding regions. The proposed method was evaluated on a public dataset of lesional dermoscopic images and was compared to established methods for lesion segmentation that included adaptive thresholding, Chan-based level set and seeded region growing. Results show that SSLS outperformed the other methods in accuracy and robustness, in particular for difficult cases. Superpixels are also introduced.

Monday April 27

Integrative Analysis of Somatic Mutations with Focus on Biological Pathways

The development and severity of cancers depend on the somatic mutations occurring in the tissue. Technologies like whole exome and whole genome sequencing (WES/WGS) have allowed for interrogation of the somatic mutations acquired in a tumour compared to normal tissue in a patient. However, it is clear that some mutations are worse than others, leading to work on identifying genes harbouring "driver" mutations as opposed to "passenger" mutations. Further to this, there is work on elucidating the role these mutations play in the system as a whole, via integration of mutation, gene expression, and network information (e.g. protein-protein interaction networks), as well as other data sources. In this seminar I will discuss my work to date on methods that aim to answer these questions, with a focus on the melanoma dataset.

Monday April 20

Using Resampling to Fit Better Models

The weighted bootstrap is one of many procedures for evaluating the goodness of fit of a model. I would like to attempt to highlight how and why this work changed the way I thought about cross-validation and, most importantly, the practical impacts of using a weighted bootstrap for estimating LASSO penalty parameters. Diane Loo's work is highly relevant for anyone that has ever used cross-validation or ever plans to. I will use the prognostic melanoma data to highlight a few of the limitations of cross-validation. Time permitting, some of the issues we have faced when trying to explore and validate the weighted bootstrap will be explained. This work is currently being drafted for journal submission.

Monday April 13

Data Exploration and Subtype Discovery and Prognosis Prediction

Finding an appropriate measure of association between connected regions of brain resting fMRI datasets. Potential challenges of the project are noted and some exploratory analysis on a cleaned fMRI dataset is shown.