# Statistical Bioinformatics Seminar

The aim of the statistical bioinformatics seminar is to provide a forum for people working within the broad area of computation and statistics and their application to various aspects of biology to present their work and showcase their ongoing projects. It is intended to foster the exchange of ideas and build potential collaborations across multiple disciplines.

To be added to the mailing list, fill out this form. For any other information, please contact Kitty Lo or Ellis Patrick.

## Seminars in 2018, Semester 2

The seminars are held at 1:00 pm on Mondays at the Charles Perkins Centre, Seminar Room (Level 3, large meeting room). The format of the talk is approximately 45 minutes plus questions.

Friday December 14, 2018 (NOTE: Special time and location: 10 - 11AM, Level 4 Large Meeting Room, Charles Perkins Centre)

Speaker: Dr Aaron Lun (Cancer Research UK - Cambridge Institute)

Title: Challenges and future directions in single-cell data analysis

About the speaker: I graduated from the University of Sydney with a Bachelor of Science, majoring in Molecular Biology and Genetics. I did my PhD in Bioinformatics at the Walter and Eliza Hall Institute for Medical Research with Gordon Smyth, working on statistical methods for analyzing ChIP-seq and Hi-C data to study chromatin structure and organization. I am currently working as a research associate with John Marioni at the CRUK Cambridge Institute, developing computational methods for analyzing single-cell RNA sequencing data. I maintain around 14 Bioconductor packages focusing on a range of genomics data analyses, but am probably best known as the cat on the support site.

Monday November 19, 2018

Speaker: A/Prof Jessica Mar (The University of Queensland)

Title: One of these cells is not like the other – how variability of gene expression highlights regulatory control.

Abstract: When studying the transcriptome, our inferences typically revolve around changes in average gene expression. For a population of single cells, modeling gene expression distributions and how their properties differ between phenotypes can be far more informative than following average trends alone. This talk outlines some of the approaches my lab has developed to investigate how variability of gene expression contributes to our understanding of transcriptional regulation.

About the speaker: Associate Professor Jessica Mar is a Group Leader at the Australian Institute for Bioengineering and Nanotechnology at the University of Queensland in Brisbane. The Mar group focuses on understanding variability in the transcriptome and how this informs regulation of cell phenotypes. Jess received her PhD in Biostatistics from Harvard University in 2008. She was a postdoctoral fellow at the Dana-Farber Cancer Institute in Boston (2008-11), and an Assistant Professor at Albert Einstein College of Medicine in New York (2011-2018). She relocated back to Australia in July this year as an ARC Future Fellow, and a major focus of her work is modelling the aging process using single cell bioinformatics. Jess has received several awards, including a Fulbright scholarship (2003), the Metcalf Prize for Stem Cell Research from the National Stem Cell Foundation of Australia (2017), and the LaDonne H. Shulman Award for Teaching Excellence (2017), of which she is proudest because the winner is selected by the graduate students at Albert Einstein College of Medicine.

Monday November 12, 2018

Speaker: A/Prof Ruby Lin (The University of Sydney)

Title: Treatment of severe Staphylococcus aureus infections with bacteriophage therapy - Westmead experience.

About the speaker: Ruby joined the Iredell lab at the end of October 2017 after a short stint in industry. She is the project manager for the Iredell lab and the scientific lead for an investigator-led clinical trial involving treatment of severe Staphylococcal infections using bacteriophage therapy. Her research focus has been microRNA-driven dysfunctions in eukaryotic disease model systems, including mouse/rat models and humans. She was a named NHMRC Peter Doherty fellow (2005-8) and UNSW Global postdoctoral fellow (2009-14). She has acquired >A$5.1m in competitive funding. She has 5 seminal papers (BMJ, 1999; Nature, 2010; ATVB, 2010; PNAS, 2012; FASEB J, 2014), all of which received media coverage, are high impact and are highly cited. She has presented >65 papers at >30 conferences and cross-disciplinary seminars as invited chair and speaker. She is a conjoint Associate Professor at UNSW. During her presidency at the Australasian Genomic Technologies Association (AGTA), a prominent society in genomics in Australia and NZ with members from industry and academia, she implemented gender equality at its annual meetings. She is heavily involved in promoting gender balance and women in STEM through various professional networks. She continues to train honours, PhD and postdoctoral researchers. She is a regular guest lecturer at UNSW, UTS and Macquarie University. In her spare time she volunteers as the primary ethics coordinator at her kids’ school and helps the P & C with fundraising events. She also does pro bono work as a career coach.


Monday November 5, 2018 (No Seminar)

Monday October 29, 2018

Speaker: Helen McGuire (The University of Sydney)

Title: Single cell analysis with Mass Cytometry; technology introduction and opportunities

Primer on Mass Cytometry: The anatomy of single cell mass cytometry data

About the speaker: Dr Helen McGuire is a Research Officer at the Ramaciotti Facility for Human Systems Biology, Charles Perkins Centre, an initiative established in 2013 to support the development of mass cytometry and broader systems biology analysis across the University of Sydney campus and wider collaborative links. Her research focus and interest lies in the clinical application of immunological studies to a range of human diseases.

Monday October 22, 2018

Speaker: Timothy Peters (Epigenetics Laboratory, Garvan Institute)

Title: A general framework for evaluating cross-platform concordance in genomic studies

Abstract: The reproducibility of scientific results from multiple sources is critical to the establishment of scientific doctrine. However, when characterising various genomic features (transcript/gene abundances, methylation levels, allele frequencies and the like), all measurements from any given technology are estimates and thus will retain some degree of error. Hence defining a “gold standard” process is dangerous, since all subsequent measurement comparisons will be biased towards that standard. In the absence of a “gold standard” we instead empirically assess the precision and sensitivity of a large suite of genomic technologies via a consensus modelling method called the row-linear model. This method is an application of the American Society for Testing and Materials Standard E691 for assessing interlaboratory precision and sources of variability across multiple testing sites. We analyse a publicly available TCGA dataset containing both sequencing and array technologies, allowing a direct per-technology, per-locus comparison of sensitivity and precision across all common loci.
We implement and showcase a number of applications of the row-linear model, including direct comparisons of the sensitivity and precision of these platforms. Our findings demonstrate the utility of the row-linear model in evincing varying levels of concordance between measurements on these platforms, serving as a process for identifying reproducibility caveats in studies where cross-platform validation is performed.

About the speaker: Tim's background is in bioinformatics and applied statistics. He completed a PhD on the principles of statistical learning for transcriptomic data in the Department of Statistics at Macquarie University in 2012. He has worked as a Postdoctoral Fellow at CSIRO on the EpiSCOPE project: mapping the epigenetic terrain of human adipocytes, performing statistical analyses for human EWASs (epigenome wide association studies) and has published a novel method for statistical inference of whole-methylome data. He is currently a bioinformatician/statistician in the Immunogenomics group at Garvan Institute of Medical Research. Current interests include single-cell methylome and transcriptome analysis, and reproducibility of genomic studies.

Monday October 15, 2018

Speaker: Gene Hart-Smith (University of New South Wales)

Title: The promise and pitfalls of big proteomics data: a case study centred on protein methylation

Abstract: The field of proteomics is reliant on computationally intensive analyses of large datasets. A particular focus is on the accurate identification of peptides from large datasets of tandem mass spectrometry (MS/MS) spectra, which are typically collected in high-throughput LC-MS/MS experiments. The data analysis workflow that has been developed to meet this challenge – the ‘sequence database search’ – is considered a cornerstone of contemporary proteomics research. Despite their near-ubiquity, sequence database searches can consistently go wrong.
For example, we recently showed that when sequence database searches are applied to the identification of post-translational protein methylation, false discovery rates are unavoidably high. This particular defect of the sequence database search has resulted in a plethora of false information entering the mainstream scientific literature. The reasons behind this defect will be discussed, together with specific and practical means by which this defect can be overcome.

About the speaker: Dr Gene Hart-Smith is a recent ARC Discovery Early Career Researcher Award holder working within the UNSW School of Biotechnology and Biomolecular Sciences. In 2010 he completed a PhD at the UNSW School of Chemistry, in which he utilised mass spectrometry as a primary tool to investigate synthetic polymer formation processes. He has since been applying his expertise in mass spectrometry to the study of biological systems. Gene’s current research is centred around the examination of protein-protein interaction networks. He is particularly interested in how post-translational modifications regulate the dynamics of these networks, and is developing and applying mass spectrometric methods towards the investigation of this phenomenon.

Monday October 8, 2018

Speaker: Chelsea Mayoh (Children's Cancer Institute)

Title: The Complexity of Identifying Targetable Genes in the Paediatric Transcriptome

Abstract: Molecular profiling of childhood cancers allows for personalised treatments based on targets found through Whole Genome and RNA sequencing. The accurate identification of germline/somatic mutations, copy number amplifications/deletions and structural variations is possible through the availability of matched tumour-normal pairs. However, identification of up-/down-regulated genes poses a challenge without having a matched normal.
In this talk I will speak about the limitations, advantages and complexity of the transcriptome and utilising it to identify potential drug targets for paediatric patients through the Zero Childhood Cancer Program.

About the speaker: Chelsea is the lead bioinformatician at the Children's Cancer Institute. In addition to managing the Bioinformatician team she is one of the key bioinformaticians involved with the Zero Childhood Cancer program. At the Children's Cancer Institute, she works on a wide variety of childhood cancers with a heavy focus on CNS tumours, Neuroblastoma and Leukaemias, performing various kinds of bioinformatic analysis. She is also the Bioinformatician/Biostatistician on several Study Committees through the Sydney Children's Hospital Clinical Trials Program. Prior to coming to Australia in 2015 she was at the Genome Sciences Centre in affiliation with the BC Cancer Agency in Canada.

Monday October 1, 2018 (No seminar - Labour Day public holiday)

Monday September 24, 2018 (No seminar)

Monday September 17, 2018

Speaker: James Cornwell (The University of Sydney)

Title: Quantifying intrinsic and extrinsic control of single cell fates by time-lapse imaging, single-cell tracking, and competing risks analysis

Abstract: The molecular control of cell fate and behaviour is a central theme in biology. Inherent heterogeneity within cell populations requires that control of cell fate be studied at the single-cell level. Time-lapse imaging and single-cell tracking are powerful technologies for acquiring cell lifetime data, allowing quantification of how cell-intrinsic and extrinsic factors control single-cell fates over time. However, cell lifetime data contain complex features. Competing cell fates, censoring, and the possible inter-dependence of competing fates currently present challenges to modelling cell lifetime data. Thus far such features are largely ignored, resulting in loss of data and introducing a source of bias.
In this seminar I will talk about how competing risks and concordance statistics, previously applied to clinical data and the study of genetic influences on life events in twins, respectively, can be used to quantify intrinsic and extrinsic control of single-cell fates.

About the speaker: James completed a Bachelor of Mechatronic Engineering and a Master of Biomedical Engineering in 2012 from the University of New South Wales (UNSW). During his studies James undertook research internships at the Australian Nuclear Science and Technology Organisation (Sydney), the Jozef Stefan Institute (Slovenia), and at ETH Zurich (Switzerland). In 2012 James started his PhD at the Victor Chang Cardiac Research Institute (VCCRI) under the supervision of Professor Richard Harvey and Dr Robert Nordon. His PhD focused on characterising the growth dynamics of cardiac stem cells by time-lapse imaging and single-cell tracking. James established this technology at VCCRI, constructing a methodological pipeline for analysis of single-cell growth dynamics. In 2016 James completed his PhD and joined the School of Dentistry, Faculty of Medicine and Health, University of Sydney as an Associate Lecturer. James’ research currently focuses on developing tools for recording and analysing single cell dynamics and applying these tools to study stem and cancer cell biology.

Monday September 10, 2018

Speaker: Ignatius Pang (University of New South Wales)

Title: Benchmarking Protein Correlation Profiling datasets against reference protein complexes: case studies in S. cerevisiae.

Abstract: Protein Correlation Profiling (PCP) is a method which enables many protein complexes to be identified in single experiments, unlike other methods such as affinity purification-mass spectrometry, which involves ‘one-at-a-time’ affinity purifications of tagged proteins.
A typical PCP experiment involves fractionation of endogenous and untagged protein complexes by size or other physicochemical parameters, followed by LC-MS/MS and label-free quantification of each fraction. Proteins in the same intact complex are co-eluted and often have high correlation in protein abundance across multiple fractions. Although this information can help identify intact complexes, doing so is computationally challenging. For example, machine learning strategies used to identify complexes from PCP datasets can have high false positive rates for novel complexes (Shatsky et al. 2016 MCP 15.6:2186-02). The aim of this study was to develop a framework for benchmarking PCP datasets against high-quality sets of reference protein complexes. This approach, which we predominantly applied using the large-scale reference sets of protein complexes available for Saccharomyces cerevisiae (e.g. Benschop et al. 2010 Mol. Cell. 38: 916-928), enabled us to evaluate the quality of PCP datasets, identify known protein complexes with high confidence, and develop guidelines on the choice of correlation metrics and fractionation approaches used to interpret and collect PCP datasets.

About the speaker: Igy is a postdoctoral research associate at the Systems Biology Initiative at UNSW, led by Prof. Marc Wilkins. Igy’s current role involves collaborating with bioscience researchers who are interested in the analysis of -omics datasets. His expertise involves the co-analysis of multiple -omics datasets in conjunction with multiple types of biological networks, for example, signaling, regulatory, and protein-protein interaction networks. His current projects include identifying the potential link between gene mutations and the side-effects of an antipsychotic medication and analyzing the role of the virome on the onset of type-1 diabetes among infants. Prior to his postdoc he had 2 years of experience in audit data analytics and fraud detection.


Monday September 3, 2018

Speaker: Serigne Lo (The University of Sydney and the Melanoma Institute of Australia)

Title: Competing risks analysis with missing event types - penalized likelihood estimation of cause-specific Cox models

Abstract: Competing risks models provide attractive tools to analyze time-to-event data where each subject faces multiple event types. The models are useful when assessing the burden and etiology attributable to a specific disease. However, a complexity may arise when the event types for some subjects are missing but their event times are observed. Assuming the unobservable event types are missing-at-random, we develop novel constrained maximum penalized likelihood estimates for semi-parametric cause-specific Cox regression models. Penalty functions are used to smooth the baseline hazards. The appealing feature of our approach is that all relevant estimates in the competing risks setting are provided, including regression coefficients and cause-specific baseline hazards. Asymptotic results of these estimates are also developed. Through intensive simulations, we demonstrated the superiority of our method compared to some existing methods. We illustrate the new method using data from melanoma patients who faced two competing risk events: melanoma versus non-melanoma causes of death.

About the speaker: Dr Serigne Lo is a Senior Research Fellow in Biostatistics at the University of Sydney. He leads the Research & Biostatistics Group at the Melanoma Institute Australia. He has accumulated more than 15 years of academic teaching/research experience. Dr Lo provides leadership in the conduct of cutting edge biostatistical methods and support across the institute. Dr Lo is interested in the development of new statistical methods and his personal research includes clinical trials, adaptive design, multistate modelling, and joint modelling.


Monday August 27, 2018

Speaker: Joshua Ho (Victor Chang Cardiac Research Institute)

Title: Scalable bioinformatics methods for single-cell RNA-seq analysis

Abstract: TBA

About the speaker: Dr Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School (HMS), and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance. Joshua has published over 48 papers, including first-author publications in Nature, Science Signaling, and PLoS Genetics. He is also currently the Secretary of the Australian Bioinformatics and Computational Biology Society (ABACBS).

Monday August 20, 2018 (Cancelled)

Speaker: Chelsea Mayoh (Children's Cancer Institute)

Title: The Complexity of Identifying Targetable Genes in the Paediatric Transcriptome

Abstract: Molecular profiling of childhood cancers allows for personalised treatments based on targets found through Whole Genome and RNA sequencing. The accurate identification of germline/somatic mutations, copy number amplifications/deletions and structural variations is possible through the availability of matched tumour-normal pairs.
However, identification of up-/down-regulated genes poses a challenge without having a matched normal. In this talk I will speak about the limitations, advantages and complexity of the transcriptome and utilising it to identify potential drug targets for paediatric patients through the Zero Childhood Cancer Program.

About the speaker: Chelsea is the lead bioinformatician at the Children's Cancer Institute. In addition to managing the Bioinformatician team she is one of the key bioinformaticians involved with the Zero Childhood Cancer program. At the Children's Cancer Institute, she works on a wide variety of childhood cancers with a heavy focus on CNS tumours, Neuroblastoma and Leukaemias, performing various kinds of bioinformatic analysis. She is also the Bioinformatician/Biostatistician on several Study Committees through the Sydney Children's Hospital Clinical Trials Program. Prior to coming to Australia in 2015 she was at the Genome Sciences Centre in affiliation with the BC Cancer Agency in Canada.

Monday August 13, 2018

Speaker: Dr Boris Guennewig (Brain and Mind Centre, The University of Sydney)

Title: Bioinformatics on multimodal data sets in a longitudinal ageing and neurodegeneration clinic

Bio: Dr Boris Guennewig leads the Forefront Bioinformatics team at the BMC. Boris is a Senior Lecturer at the University of Sydney and Conjoint Lecturer at UNSW. He received a diploma in chemistry from the University of Munster and secured a very competitive position at the Max Planck Institute for Molecular Biomedicine, which shifted his focus to immunology and inflammatory processes in human disease. After this, he transitioned to the Swiss Federal Institute of Technology as a PhD candidate in the laboratory of Prof J Hall, where he worked on microRNA biogenesis and their functions in human disease.
He was then recruited by Prof J Mattick to the Garvan Institute to work with Prof J Mattick & A Cooper as well as Glenda Halliday on deriving diagnostic biomarkers from various neurodegenerative diseases. Boris is additionally the lead bioinformatician/consultant for the International Cerebral Palsy genetics consortium, a member of the Australian Genomics Health Alliance and the founder of the analytics/bioinformatics company Pacific Analytics PTY LTD (Australia).

Research Interests: I am a research scientist/bioinformatician/statistician specializing in the development of infrastructure, software and pipelines to manage, analyze and mine large complex datasets in medical research. Using structured, semi-structured and unstructured data, my research focuses on the identification and characterization of genetic variation and transcriptional changes influencing complex human diseases (such as frontotemporal lobe dementia, bipolar disorder, Parkinson’s and Alzheimer’s disease, etc.). I achieve this through the functional integration of high-dimensional biological (omics) data, in combination with my statistical, genetics and data mining skills. I believe that assimilating and modelling multi-modal data (i.e. imaging, clinical and omic data) is key to uncovering the genotype-phenotype interaction and how this relationship affects complex traits.

## Seminars in 2018, Semester 1

Friday June 8, 2018 (Special Statistical Bioinformatics Seminar - 11:20AM - Carslaw 829 Access Grid Room)

Speaker: Associate Professor Bradley Broom (MD Anderson Cancer Center, USA)

Title: Bioinformatics analysis tools

Abstract: This Friday we will be hosting Associate Professor Bradley Broom from the Department of Bioinformatics and Computational Biology at MD Anderson Cancer Center for a special Statistical Bioinformatics Seminar. The Seminar will start at 11:20 in the AGR. See below for a description of Bradley’s proposed talk.
In this talk I will describe two bioinformatics tools being developed at MD Anderson Cancer Center. Clustered heat maps were developed for biomedical data in the early 1990s. They are now the most widely used visualization for molecular profiling data. But they are fundamentally static objects. We have created Next-Generation Clustered Heat Maps that include capabilities for interactively zooming and navigating the clustered heat map, for adjusting its color scale or other display parameters, and for interrogating the data in, or behind, the contents of the heat map. I will describe the key features of Next-Generation Clustered Heat Maps and the tools for generating them.

The bioinformatics analysts at MD Anderson perform many hundreds of detailed bioinformatics analyses per year. We also frequently receive requests to update an analysis or to repeat a substantially similar analysis on new data. These new requests can arrive months, years, or even decades after the original analysis was performed. Access to the original analysis is often crucial to reproducing and updating the earlier analysis, but locating the original analysis can be challenging if a long time has elapsed since the original analysis and/or the analyst who performed the initial analysis has left the institution. In this talk I will describe FjORD, a meta-information system we have developed to record bioinformatics analyses as well as to search for and retrieve previous analyses. I will also describe future goals for enforcing reproducible analyses.

Monday May 28, 2018

Speaker: Mathieu Fourment (ithree Institute, University of Technology, Sydney)

Title: New methods in phylogenetic inference

Abstract: Markov chain Monte Carlo (MCMC) algorithms have been the workhorse of Bayesian inference in phylogenetics for almost two decades. Although these algorithms have been successfully used in a wide range of applications they do not scale well to large numbers of sequences.
In this talk I will present some of my work on sequential Monte Carlo algorithms and approximate inference using the variational Bayesian framework.

About the speaker: Mathieu is a data scientist at the University of Technology Sydney. He obtained his PhD at Macquarie University in 2010 and had research positions at the University of California San Diego, Duke-NUS, and the University of Sydney. His research interests include phylogenetics, variational inference, and probabilistic modelling.

Monday May 21, 2018

Speaker: Vinita Deshpande and Tom Geddes (The University of Sydney)

Title: (i) Good and bad fat: discovering key distinguishing features with proteomics; (ii) Ensemble deep learning for integrating heterogeneous omics data

Abstract: (i) Subcutaneous (SC) and visceral (VIS) fat cells store energy as fat and affect whole body metabolism. Excessive VIS fat is associated with insulin resistance, a precursor to Type 2 diabetes. In contrast, SC fat may be protective. Despite these important physiological functions, relatively little is known about the molecular features that distinguish these discrete types of fat cells. We have used mass spectrometry to construct proteomes of SC and VIS fat cells from mouse models representing healthy and diabetic states. These proteomes consist of over 7,500 quantified proteins spanning six orders of magnitude. By coupling with statistical approaches, we stratified the proteome into groups defined by the factor(s) driving the differences in protein expression, which revealed interesting functional differences. This analytic approach also serves as a computational framework to answer biological questions in a comprehensive and systematic manner. Thus we demonstrate the utility of this proteomic resource and analytic approach in uncovering novel insights into fat cell biology.


(ii) Artificial neural network models are capable of learning high-accuracy classifiers over inputs with complex information structure given a large number of training examples, and have become increasingly popular across a wide variety of applications. Ensemble learning methods can be used to increase classification accuracy and robustness by combining outputs from a collection of individual models trained differently on the same data, often proving more reliable than a single model. In the field of biology, multiple datasets are often available pertaining to highly overlapping sets of molecular species (such as proteins or RNA transcripts) but differing in experimental origin, and containing potentially orthogonal/non-redundant information. We explore the use of ensemble deep learning to draw on mass-spectrometry datasets from a variety of sources to increase classification accuracy.

About the speaker: (i) Vinita Deshpande is a PhD student at the Charles Perkins Centre at the University of Sydney, where she is using quantitative proteomics to investigate fat cell biology. Her research interests lie in the application of systems biology methods to understand human diseases.

(ii) Tom Geddes is a PhD student at the Charles Perkins Centre at the University of Sydney, where he is using deep learning methods to predict protein-protein interactions. He is interested in better understanding the structure of complex biological systems by leveraging large, information-rich datasets.

Monday May 14, 2018

Speaker: Susan Corley (University of New South Wales)

Title: Working with ‘Salmon’, a new fast tool to quantify RNA-Seq reads

Abstract: Quantification of sequenced reads is the first vital step in undertaking a gene expression experiment using RNA-Seq. Recently, methods including kallisto (Bray et al., 2016) and Salmon (Patro et al., 2017) have been introduced which calculate transcript abundance without full mapping of reads to the genome.
In this talk I will give some examples of my experience using Salmon and I will compare these results to read mapping produced using Tophat2.

About the speaker: Susan is a Senior Research Associate in the Systems Biology Initiative (SBI) at UNSW. She commenced with the SBI in 2012 following completion of her PhD in biomedical science and biochemistry at the John Curtin School of Medical Research (ANU). Prior to this she completed her Bachelor of Science with majors in chemistry at the University of Sydney. Susan’s primary research interest is in understanding the genes and pathways involved in human health and disease through employing Next Generation sequencing techniques. Susan’s research projects cover a wide breadth and have included gene expression analysis relating to Crohn’s disease, schizophrenia, Williams-Beuren syndrome, keratoconus, immunity-related conditions and cancer cachexia. Susan worked as a lawyer before commencing her studies and work in science.

Monday May 7, 2018

Speaker: Alex Lancaster (Ronin Institute and the University of Sydney)

Title: Modeling cellular systems: stochastic approaches in evolutionary research and therapeutic applications

Abstract: Stochasticity - random noise at the cellular, developmental and organismal level - plays an under-appreciated role in both evolutionary and biomedical research. Alex will present summaries of his work in the computational modeling of stochasticity in cellular systems, including yeast prions, signalling and gene networks and sepsis. Throughout, the role of noise in shaping both evolutionary trajectories and therapeutic intervention is highlighted. The benefits of bottom-up, agent and rule-based strategies in both academic and commercial R&D contexts to help illuminate these questions are also discussed.


About the speaker: Alex Lancaster is a Visiting Scholar at the University of Sydney, a Research Scholar at the Ronin Institute, and a Partner at the Cambridge, Massachusetts-based consulting company Amber Biology. Following undergraduate degrees at the University of Sydney, he received his Ph.D. from the University of California, Berkeley and has held research positions at the Santa Fe Institute, the Whitehead Institute at MIT, and Harvard Medical School. His research interests are in the intersection of evolutionary theory, systems biology and agent-based modeling.

Monday April 30, 2018

Speaker: Joseph Cursons (The Walter and Eliza Hall Institute)

Title: Post-transcriptional control of epithelial-mesenchymal transition through combinatorial miRNA targeting

Abstract: MicroRNAs (miRNAs) are small, non-coding RNAs with an important role in post-transcriptional regulation, targeting mRNAs for degradation and/or inhibiting their translation. Working with a model of TGF-β induced epithelial-mesenchymal transition in human mammary epithelial cells, we identified a set of miRNAs that appear to be co-regulated with induction of an invasive mesenchymal phenotype. Computational analyses show that these miRNAs are co-regulated across clinical breast cancer samples from the TCGA as well as a wider set of primary cell lines. Furthermore, analysis of high-confidence predicted targets (based upon miRNA:mRNA sequence complement) suggests that these miRNAs share several targets, and many of their targets also interact at the protein level. Investigating this result, we selected several pro-epithelial miRNAs with evidence of co-targeting and demonstrated that combinatorial treatment could alter cellular phenotype with ectopic miRNA concentrations several orders of magnitude below what is typically used, and much closer to endogenous levels.
This work suggests that cooperative targeting by miRNAs may be an important factor for their physiological function, and future work attempting to classify miRNA function should consider such combinatorial effects.

About the speaker: Joe Cursons is a Senior Research Officer in the Davis Laboratory within the Bioinformatics Division of the Walter and Eliza Hall Institute. He obtained his PhD from the Auckland Bioengineering Institute in 2012 and his research interests are centred on the regulatory control systems dysregulated during the progression and metastasis of breast cancer and melanoma. Much of Joe’s work involves the analysis of sequencing data to identify mechanisms of drug sensitivity and drug resistance for cancer treatment.

Monday April 23, 2018

Speaker: Bobbie Cansdale (The University of Sydney)

Title: From CTCF to 3D Modelling: Investigating the mammalian genome in three dimensions

Abstract: The three-dimensional structure of the mammalian genome is non-random and important for several key biological processes including the regulation of gene expression. Determining this structure, as well as the sequence itself, is necessary to further genome biology research. Topologically associating domains (TADs) are a main feature of chromatin organisation. These are clusters of genes that are functionally co-regulated, with boundary regions enriched for features such as CTCF binding sites, transfer RNAs, and SINE retrotransposons. Chromosome conformation capture (3C) based approaches, including Hi-C, can provide valuable insight into the spatial organisation of chromatin fibre. Computational frameworks have recently become available to use this data to create 3D representations of the genome, providing novel insights compared to the standard interaction matrices alone. Knowledge of these structures will allow investigation as to how they relate to the nearby genes and other genomic features.
Here I will focus on bioinformatics strategies and possible future applications of this work using examples from the canine genome.

About the speaker: Bobbie Cansdale is a PhD student in computational biology and animal genomics at the University of Sydney. She completed a Bachelor of Animal and Veterinary Bioscience (Honours) at the University of Sydney in 2015. Her current focus is on canine genomic research. She is interested in the modelling of chromatin architecture, genomic data analysis, novel sequencing methods, and the integration of various data types to better answer questions.

Monday April 16, 2018

Speaker: Sarah Beecroft (The University of Western Australia)

Title: Gene Hunting- Why don't we know all disease genes yet?

Abstract: Rare genetic diseases include some of the most debilitating diseases to affect humans, with onset ranging from before birth to old age. Although the human genome has now been mapped, finding mutations that cause rare diseases is still hugely challenging. About 50% of patients are without a diagnosis. Sarah will discuss the bioinformatic strategies used in this field, and the future of disease gene discovery.

About the speaker: Sarah Beecroft is based in the Molecular Neurogenetics lab at the Harry Perkins Institute of Medical Research, Perth. She works to discover new disease genes in patients with nerve and muscle diseases. She is interested in finding mutations in the non-coding regions of the genome, a vast unknown in rare disease genetics.

Monday April 9, 2018

Speaker: Shila Ghazanfar (The University of Sydney)

Title: Investigating combinatorial expression of delta-protocadherins in single olfactory sensory neurons

Abstract: Single cell RNA-Sequencing (scRNA-Seq) has enabled unprecedented insight into the behaviour of individual cells on the scale of the entire transcriptome.
Such precision offers an opportunity to explore cell-specific heterogeneity; however, two distinct features arise from such data: (1) hyperinflation of identically zero counts for the majority of genes for any given cell, and (2) an apparent bimodal distribution of non-zero counts. Both features are unique to scRNA-Seq, and warrant further development of statistical tools in order to answer biological questions of interest. We propose a mixture modelling framework to classify cells into three transcriptional states for each gene: (1) no, (2) low, and (3) high gene expression. This approach has the potential to reveal the cell-specific dynamics of RNA transcription (bursting) and degradation, as well as acting as a cross-dataset standardisation. We utilised a number of publicly available scRNA-Seq datasets, stemming from mouse neuronal cell populations, to perform the mixture model comparison, assess highly and lowly variable genes, and to estimate cell networks via a uniqueness thresholding. This work is in the context of understanding how olfactory sensory neurons (OSNs) interact with each other during embryonic development of the mouse olfaction system. In particular, we study the role that combinatorial expression of genes in the delta protocadherin gene subfamily plays in mediating cell-cell adhesion. Further, we utilise distinct guiding principles to build a Monte Carlo simulation of this cell-cell adhesion behaviour, and assess its suitability. This addresses the larger question of how combinatorial gene expression specifies specific cell types and tissues.

About the speaker: Shila Ghazanfar has recently completed her PhD in Statistical Bioinformatics at The University of Sydney and is currently a Research Associate in the Judith and David Coffey Lifelab and School of Mathematics and Statistics. Her research interests are in statistical analysis of data arising from high throughput sequencing technologies such as RNA-Seq in various research contexts.

Monday March 26, 2018

Speaker: Kitty Lo (The University of Sydney)

Title: Novel alternative splicing in TDP-43 mutant mouse models of ALS

Abstract: TDP-43 (encoded by TARDBP) is an RNA binding protein central in the pathogenesis of the neurodegenerative disorder amyotrophic lateral sclerosis (ALS). However, how TARDBP mutations trigger pathogenesis remains unknown. Here, we use novel mouse mutants carrying point mutations in Tardbp to dissect TDP-43 function. Interestingly, we find that TDP-43 C-terminal mutations lead to a gain of splicing function. Using two different strains we are able to separate TDP-43 loss and gain of function effects. This new gain-of-function induces a novel category of splicing events, here termed skiptic exons, in which skipping of constitutive exons occurs, causing expression changes. Our findings provide a novel pathogenic mechanism and highlight how gain and loss of TDP-43 function affect RNA processing differently, suggesting they may play roles at different disease stages.

About the speaker: Kitty Lo is currently a bioinformatics postdoctoral researcher in the Faculty of Science. Prior to this, she was at University College London and the UCL Institute of Neurology. She has also worked in a Cambridge-based biotechnology startup where she developed cancer diagnostic tools using ctDNA. Kitty has a PhD in astronomy from the University of Sydney.

Monday March 19, 2018

Speaker: Dana Pascovici (Macquarie University)

Title: DIA/SWATH - challenges and opportunities for bioinformatics

Abstract: Protein quantitation using DIA/SWATH mass spectrometry has been growing in popularity over the last few years. From the point of view of the bioinformatics involved, the data resulting from such experiments are, on the one hand, quite easy to analyse, at least if the experiment is not too large: the percentage of missing data is much lower, and the look and distribution of the data make existing methodology from other areas quite easily applicable.
Put plainly, extracted SWATH data is quite nice to work with. However, that is because much of the difficulty has been pushed underneath, at the level of the SWATH library building and data extraction, where it is somewhat hidden from view. In this talk we will describe SWATH and its place in the landscape of quantitative proteomics (including broad comparisons with label free and labelled techniques such as iTRAQ and TMT), and the many positive aspects of the resulting SWATH datasets, from the point of view of the data analyst. We will also focus on how SWATH data extraction usually relies on high quality peptide MS/MS spectral libraries; however, building such libraries to ensure good proteome coverage can be time-consuming and expensive. To address this issue, various computational approaches for merging archived or external libraries were created and evaluated, including efforts from our group. We will describe the appeal of such methods, the possible issues that can ensue and some approaches to tackle them in order to ensure that the proteins are reliably detected and their quantitation is consistent and reproducible. We will discuss these aspects in the context of several existing datasets, including a carefully designed spiked-in experiment, and a recently published large plasma proteomics experiment containing samples from neonates, young children and adults.

About the speaker: I am currently a Biostatistician at the Australian Proteome Analysis Facility at Macquarie University, where I help people generate biological insights out of their proteomics data, especially in the context of complex experiments. Working in a proteomics facility, our focus has been on generating reliable methods of interpreting and analysing data from a variety of platforms, lately emphasizing SWATH and TMT, and wherever possible incorporating them into software workflows.
Areas of particular relevance to us have been plasma proteomics, and plant proteomics of agriculturally important species. Our work has benefitted from interactions with researchers, students and the APAF team of mass spectrometry specialists and analytical chemists. I come from a mathematical and computational background, having completed a bachelor's degree in Mathematics and Computer Science at Dartmouth College in the US, followed by a PhD in Mathematics at MIT, and a brief stint of teaching at Purdue. In Sydney I took a more practical turn and worked in the industry in the area of speech recognition, before settling into biostatistics for the past 13 years, both in the industry and research environment.

## Seminars in 2017, Semester 2

Monday November 27, 2017

Speaker: Beth Signal (Garvan Institute of Medical Research)

Title: Machine learning annotation of branchpoints and in silico modelling of functional splicing events.

Abstract: RNA splicing is a key component of mature RNA transcript formation, required for the removal of intronic regions and subsequent ligation of exonic regions. This process can also allow for alternative splicing to occur, where different exonic regions are ligated together to produce alternative RNA products. The branchpoint element is one of the splicing sequence elements, required for the first lariat-forming reaction in splicing. However, current catalogues of human branchpoints remain incomplete due to the difficulty in experimentally identifying these elements. To address this limitation, we have developed a machine-learning algorithm - branchpointer - to identify branchpoint elements solely from gene annotations and genomic sequence. Using branchpointer, we annotate branchpoint elements in 85% of human gene introns with 61.8% sensitivity and 97.8% specificity.
In addition to annotation, branchpointer can evaluate the impact of SNPs on branchpoint architecture to inform functional interpretation of genetic variants. Branchpointer identifies all published deleterious branchpoint mutations annotated in clinical variant databases, and finds thousands of additional clinical and common genetic variants with similar predicted effects. While alternative splicing can produce alternative RNA products, a large proportion of these have little functional impact on open reading frames or transcript stability. To address this limitation in the functional interpretation of differential splicing analyses, we have developed software to model events in silico and interpret their functional impact.

About the speaker: Beth is a PhD Student in the Clinical Genome Informatics group at the Garvan Institute. Her current research is focused on developing bioinformatics methods to understand how transcript splicing and expression is controlled. She has a particular interest in using machine learning techniques to study transcriptomic behaviour.

Monday November 20, 2017 (PLEASE NOTE: Special second talk, Location = Carslaw Access Grid Room (Level 8), Time: 4pm)

Speaker: Sonja Greven (LMU Munich)

Title: Multivariate Functional Principal Component Analysis for Data Observed on Different (Dimensional) Domains

Abstract: Existing approaches for multivariate functional principal component analysis are restricted to data on the same one-dimensional interval. The presented approach focuses on multivariate functional data on different domains that may differ in dimension, e.g. functions and images. The theoretical basis for multivariate functional principal component analysis is given in terms of a Karhunen-Loève Theorem. For the practically relevant case of a finite Karhunen-Loève representation, a relationship between univariate and multivariate functional principal component analysis is established.
This offers an estimation strategy to calculate multivariate functional principal components and scores based on their univariate counterparts. For the resulting estimators, asymptotic results are derived. The approach can be extended to finite univariate expansions in general, not necessarily orthonormal bases. It is also applicable to sparse functional data or data with measurement error. A flexible R implementation is available on CRAN. The new method is shown to be competitive with existing approaches for data observed on a common one-dimensional domain. The motivating application is a neuroimaging study, where the goal is to explore how longitudinal trajectories of a neuropsychological test score covary with FDG-PET brain scans at baseline. Supplementary material, including detailed proofs, additional simulation results and software is available online.

About the speaker: http://greven.userweb.mwn.de/research.html

Monday November 20, 2017 (PLEASE NOTE: Special location - Level 5 Large Meeting Room, Usual time: 1pm - 2pm)

Speaker: Elizabeth Mason (The University of Melbourne)

Title: Modelling transcriptional variability in single cell RNA-seq data during human embryogenesis captures changes in the regulation of critical developmental genes

Abstract: Human development is a temporally and spatially ordered series of events that occur with remarkable precision; the same DNA blueprint gives rise to more than 250 sharply defined cell phenotypes. At the functional phenotype level embryogenesis appears predictable because we observe the average behaviour of many individual cells, even as the number of cells, the range of phenotypes and transcriptional complexity increases during the course of development. It is when we evaluate single molecules and transcripts that the stochastic nature of gene expression is revealed, for example in single cell RNA-seq (scRNA-seq) experiments.
Current methods reduce scRNA-seq data to a well-defined trajectory based on the abundance of key regulators of phenotype, and differential abundance between cells in a given phenotype is used to identify sub-populations. Here we present an alternative approach: that measuring the transcriptional variability at the gene level informs the level of regulation imposed on it, reflecting an intrinsic property of development that is often overlooked. While linear models have been a successful framework to characterize differences in abundance between phenotypes on average, they do not account for stochastic differences captured by scRNA-seq experiments. Accurately determining abundance and variability is further complicated by the sparseness of non-zero expression values. To address these challenges and evaluate gene expression during human pre-implantation embryogenesis, we applied a statistical mixture model to scRNA-seq data. Fitting the model on a gene-by-gene basis allowed us to evaluate shifts in the proportion of cells expressing a given gene (λ), and also the mean (μ) and standard deviation (σ) of expression. From here, a correlation based analysis evaluated whether abundance (μ) and variability (σ) capture different aspects of transcriptional regulation. While each metric largely identified the same genes, the number and nature of relationships between them differed. Indeed, genes sharing correlated patterns of variability during development were enriched for motifs associated with developmental transcription factors (e.g. HIC2, PPARG, E2F4 and ZNF692). Variability was more effective than abundance at specifically detecting regulatory relationships during development, and with less redundancy. Our approach provides a gene-centric platform to evaluate population-based parameters of gene expression, while preserving the complexity of scRNA-seq data. 
About the speaker: Lizzi began her career in human genomics as a laboratory manager and laboratory technician with Professor Greg Gibson (Centre for Integrative Genomics, Georgia Tech University). She conducted 2 investigations in Australia which identified maternal influences on development of the neonate immune system, and uncovered population structure of the leukocyte transcriptome. Together with scientists at Emory University, Greg and Lizzi initiated the CIG’s involvement in the WHOLE (Wellness and Health Omics Linked to the Environment) study of Predictive Health Genomics in Atlanta (USA) which is currently in its 6th year. Lizzi has recently completed a PhD in systems biology of human stem cells at the Australian Institute for Bioengineering and Nanotechnology at the University of Queensland. Her PhD project formed an international collaboration with Professor Christine Wells (University of Melbourne AUS), stem cell biologists Professors Martin Pera (Jackson Laboratory USA) and Ernst Wolvetang (University of Queensland AUS), biostatistician Assistant Professor Jessica Mar (Albert Einstein College of Medicine, USA) and computational biologist Professor John Quackenbush (Harvard University, USA). Her primary focus is evaluating whether molecular variability in stem cell populations describes an important, but until now hidden predictor of cellular behaviour and phenotype. Phenotypic heterogeneity in clonally derived cell populations is ubiquitous, and biologically relevant information is often masked by using population-averaging techniques, versus individual cell based measurements. She has developed new network approaches which incorporate gene expression variance, with the goal of identifying genetic elements which stabilize a cell phenotype, and push a cell to transition between phenotypes. 
During her PhD Lizzi has been invited to present her work in departmental seminars at the Harvard Stem Cell Institute, the Lieber Brain Institute at Johns Hopkins University, and the Black Family Stem Cell Institute at Mt Sinai Hospital New York. She was also one of 12 international scientists who were invited to participate in the Radcliffe Exploratory Workshop for Variation at Harvard University in 2011. She is currently based with Professor Christine Wells in the Centre for Stem Cell Systems at the University of Melbourne, where she is working on applied statistical methods to evaluate molecular variability in single cell RNA-seq data.

Monday November 13, 2017 (No seminar)

Monday November 6, 2017

Speaker: Shila Ghazanfar (The University of Sydney)

Title: Integrated single cell data analysis for understanding mechanisms of neuronal diversity

Abstract: Technological advances such as large scale single cell transcriptome profiling have exploded in recent years and enabled unprecedented insight into the behaviour of individual cells. Identifying genes with high levels of expression using data from single cell RNA sequencing can be useful to characterise very active genes and cells in which this occurs. In particular, single cell RNA-Seq allows for cell-specific characterisation of high gene expression, as well as gene coexpression. In this talk, I will describe a versatile modelling framework to identify transcriptional states motivated by a collaborative project involving neuronal single cell data. Neuronal cell systems exhibit extraordinary levels of complexity. Thus it is of great interest to explore the ways in which this neuronal diversity is generated and manifested to achieve such complexity. One proposed mechanism is patterns of gene transcription across neurons. We will describe an approach using bioinformatics and statistics to evaluate evidence of gene transcriptional mosaics as a mechanism for achieving diversity of neuronal cells.

About the speaker: Shila has recently completed her PhD in Statistical Bioinformatics at The University of Sydney. Her research interests are in statistical analysis of data arising from high throughput sequencing technologies such as RNA-Seq in various research contexts.

Monday October 30, 2017 (No seminar)

Monday October 23, 2017

Speaker: Eva Chan (Garvan Institute)

Title: Detecting Complex Genomic Rearrangements using Optical Mapping

Abstract: Genomic rearrangements are common in cancer, with demonstrated links to disease progression and treatment response. These rearrangements can be complex, resulting in fusions of multiple chromosomal fragments and generation of derivative chromosomes. Comprehensively detecting complex genomic rearrangements (CGR) in cancer remains challenging. No single approach can identify all structural variations, as each has its own strengths and weaknesses. In this seminar, I will demonstrate the utility of whole genome optical mapping in capturing CGR. I will further showcase an example using optical mapping to capture chained fusion events in a well-studied liposarcoma cell line. Using this approach, we identified fusion maps that clearly revealed chained fusion architectures (content, order, orientation, and size), as well as large rearrangement junctions that are undetectable by sequencing alone. I hope to convince you that optical mapping is an important complement to existing technologies for detecting and reconstructing complex genomic rearrangements.

About the speaker: Senior Bioinformatics Research Officer, Human Comparative and Prostate Cancer Genomics Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research, The Kinghorn Cancer Centre

Monday Oct 16, 2017 (PLEASE NOTE: Special time of 2:00PM)

Speaker: Natalie Thorne (Melbourne Genomics)

Title: Clinical bioinformatics – what does it really take to translate research into practise?

Abstract: Melbourne Genomics Health Alliance has taken a collaborative, patient-centred, clinically-driven, evidence-based and sustainable approach to delivering genomic testing. This year the Alliance has commenced implementing Victoria’s new clinical system for genomics. A platform for bioinformatics analysis and a tool for variant curation will be among the first components to be implemented and used for accredited clinical genomic testing by diagnostic laboratories. Operating within this shared digital system, however, presents a challenge for laboratories to simultaneously coordinate with other diagnostic laboratories and hospitals, whilst also supporting their own business requirements for accreditation and continual innovation. At the heart of diagnostic innovation in genomics is the emerging field of clinical bioinformatics, which combines clinical, diagnostic, analytical, software and genetic aspects of implementing clinical genomic testing. The field has two key challenges: first, it is in its infancy and laboratories lack the support of a mature discipline; second, it demands skills and expertise predominantly lacking in traditional academia. These include developing enterprise-grade solutions, complex strategies for organisational change, multi-stakeholder collaboration, community engagement and rapidly evolving biotechnology. Drawing on my experiences working with the Melbourne Genomics and Australian Genomics Health Alliances, I will discuss the challenges and opportunities in clinical bioinformatics, including the use of ‘implementation science’ for translating research bioinformatics into clinical practice.

About the speaker: TBA

Monday Oct 9, 2017

Speaker: Saskia Freytag (WEHI)

Title: Cluster Headache: Comparing Clustering Tools for 10X Single Cell Sequencing Data

Abstract: The commercially available 10x Genomics protocol to generate droplet-based single cell RNA-seq (scRNA-seq) data is enjoying growing popularity among researchers.
Fundamental to the analysis of such scRNA-seq data is the ability to cluster similar or same cells into non-overlapping groups. Many competing methods have been proposed for this task, but there is currently little guidance with regard to which method is most accurate. Answering this question is complicated by the fact that 10x Genomics data lack cell labels. Thus in this review, we focused on comparing clustering solutions of a dozen methods for three datasets on human peripheral mononuclear cells generated with 10x Genomics technology. While clustering solutions appeared robust, we found that solutions produced by different methods have little in common with each other. They also failed to replicate cell type assignment generated with supervised labeling approaches. Furthermore, we demonstrated that all clustering methods tested clustered cells to a large degree according to the amount of ribosomal RNA in each cell.

About the speaker: Saskia completed her Masters in Statistical Science at University College London. After finishing she moved back to Germany, where she completed a PhD in Biostatistics in 2014. She then got the opportunity to relocate to Melbourne to work as a Post-Doctoral Fellow at the Walter and Eliza Hall Institute in Melanie Bahlo’s group. Her research focus is methodological development for the analysis of high throughput sequencing data. She is co-founder of R-Ladies and an ambassador for CHOOSEMATHS.

Monday October 2, 2017 (No seminar - Labour Day public holiday)

Monday September 25, 2017 (No seminar)

Monday Sept 18, 2017

Speaker: Rebecca Poulos (UNSW)

Title: The use of big data in the search for cis-regulatory driver mutations in cancer genomes

Abstract: Mutations that directly alter protein function have been extensively studied in cancer. However, in recent years, it has become feasible to examine the cancer-causing role of mutations within the remaining 98% of the genome which is non-coding.
Here I will present our use of big data in the study of cis-regulatory somatic mutations in cancer genomes. We analysed somatic mutations from over 1,000 cancer genomes across 14 cancer types, specifically focusing on promoter regions. These regulatory regions are often bound by proteins, and we discovered remarkable ‘mutation hotspots’ at sites of protein binding. To understand why these hotspots formed, we used genome-wide maps of nucleotide excision repair (NER) to show that sites of protein binding have reduced levels of NER. Our analyses uncovered the presence of a previously unknown mechanism, by which we associated reduced NER with the formation of mutation hotspots at promoters. To determine how these hotspots might impact cancer development, we investigated whether these mutations can impact the ability of a protein to bind to DNA by analyzing skin cancer mutations at the binding site of the protein CTCF. Performing CTCF ChIP-seq in a melanoma cell-line, we demonstrated the functionality of such mutations through allele-specific reduction of CTCF binding to mutant alleles. Finally, we sought to determine the role of DNA methylation (a common epigenetic modification) on the occurrence of somatic mutations in cancer. We correlated mutation load with methylation across 15 cancer types and subtypes, and we showed that reduced levels of methylation in regulatory regions may be responsible for reduced mutation loads at such loci in colorectal cancer. Taken together, these analyses develop our understanding of the formation and repair of mutagenic lesions in cis-regulatory regions of cancer genomes, providing insight into the search for driver mutations at such loci.

About the speaker: Rebecca Poulos is a researcher in the ‘Bioinformatics and Integrative Genomics’ group at the Lowy Cancer Research Centre at UNSW Sydney.
Rebecca’s research field is in the area of cancer genomics, where she uses ‘big data’ to study DNA mutation and repair processes in regulatory regions of cancer genomes. Her research output includes first-author publications in ‘Nature’ and ‘Cell Reports’, together with a review article, editorial and book chapter in the area of non-coding driver mutations in cancer. Rebecca studied science and business at the University of Technology Sydney. She is currently at UNSW Sydney where she completed her Honours year (with University Medal) and is finalising her PhD research (with UNSW Research Excellence Award).

Monday Sept 11, 2017

Speaker: Mark Segal (UCSF)

Title: Statistical and Computational Challenges in Conformational Biology

Abstract: Chromatin architecture is critical to numerous cellular processes including gene regulation, while conformational disruption can be oncogenic. Accordingly, discerning chromatin configuration is of basic importance; however, this task is complicated by a number of factors including scale, compaction, dynamics, and inter-cellular variation. The recent emergence of a suite of proximity ligation based assays, notably Hi-C, has transformed conformational biology with, for example, the elicitation of topological and contact domains providing a high resolution view of genome organization. Such conformation capture assays provide proxies for pairwise distances between genomic loci which can be used to infer 3D coordinates, although much downstream analysis bypasses this reconstruction step. After demonstrating advantages deriving from obtaining 3D genome reconstructions, in particular from superposing genomic attributes on a reconstruction and identifying extrema (‘3D hotspots’) thereof, we showcase methodological challenges surrounding such analyses.
Open issues highlighted include (i) performing and synthesizing reconstructions from single-cell assays, (ii) devising rotation invariant methods for 3D hotspot detection, (iii) assessing genome reconstruction accuracy, and (iv) averting reconstruction uncertainty by direct integration of Hi-C data and genomic features. By using p-values from (epi)genome wide association studies as the feature, the latter approach provides a conformational lens for viewing GWAS findings.

About the speaker: TBA

Monday Sept 4, 2017

Speaker: Dr Fabio Luciani (UNSW)

Title: A systems immunology approach to study antigen-specific T cells in viral infection

Abstract: Immunological memory is a cardinal feature of human adaptive immunity and is critical for prophylactic vaccination, and has recently been shown to play an important role in determining the outcome of T cell based immunotherapies in cancer. Although cytotoxic T cells can have a significant impact on disease clearance, the essential phenotype of a clinically successful T cell and how this influences therapeutic efficacy remain largely undefined. In this presentation I will present our systems immunology approach to tackle these issues. I will review recent studies on longitudinal samples of primary HCV infection using flow cytometry for phenotyping virus specific T cells, along with single cell transcriptomic and TCR diversity analyses. Future directions involve application of this systems immunology approach to other viral infections, as well as understanding how long term T cell memory protection is achieved.

About the speaker: A/Prof. Fabio Luciani trained as a theoretical physicist (Masters) and theoretical biologist (PhD 2006, Humboldt University of Berlin, Germany). His research interests include adaptive immune responses against pathogen infections, computational models for studying host-pathogen interactions, and bioinformatics analyses of high throughput next generation sequencing data.
He has applied mathematical modelling to understand infectious diseases, focussing on transmission dynamics of drug resistant Mycobacterium tuberculosis, and the transmission of hepatitis C virus among injecting drug users. He has made several contributions to understanding how HCV infects a new host and the role of T cell mediated responses, using next generation sequencing technologies, flow cytometry and statistical modelling. More recently, he has moved into single cell genomics and systems immunology approaches to understand T cell dynamics. He currently holds an NHMRC Career Development Fellowship and he leads a systems immunology group where he conducts both wet- and dry-lab research in the field of immune responses against pathogens. During his career he has published more than 80 papers in specialized and more general journals.

Monday Aug 28, 2017

Speaker: John-Sebastian Eden (Charles Perkins Centre, USyd)

Title: Using RNA-Seq to reveal the Australian Virome

Abstract: TBA

About the speaker: TBA

Monday Aug 21, 2017

Speaker: Lori Chibnik (Harvard School of Public Health)

Title: Genomic journeys into neuropathology and cognitive reserve in an aging population

Abstract: TBA

About the speaker: Dr. Lori Chibnik, PhD, MPH is a biostatistician and Assistant Professor with a joint appointment in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health and the Department of Medicine at the Harvard Medical School. She received her MPH in International Health and her PhD in biostatistics from Boston University where she worked on predictive modeling methods for disease risk. Over her career she has developed and assessed predictive models for diseases such as HIV, pre-natal screening and autoimmune diseases and continues to apply her methods to various diseases. Dr. Chibnik’s current research focuses primarily on genetics and genomics of Alzheimer’s disease and dementia with an emphasis on longitudinal cohorts.
In addition to her research, she is internationally renowned for her training programs and innovative teaching techniques, having developed multiple courses in biostatistics for varied audiences. While at Boston University she managed the Summer Institute for Training in Biostatistics, an NHLBI funded, 6-week summer program designed to bring undergraduate students into the fields of Biostatistics and Public Health. Most recently she developed and implemented a series of biostatistics and programming courses specific to the needs of scientists in sub-Saharan Africa. Currently she directs the Global Initiative for Neuropsychiatric Genetics Education in Research at the Harvard-Chan School and the Stanley Center for Psychiatric Research in the Broad Institute of Harvard and MIT.

## Seminars in 2017, Semester 1

Monday June 26, 2017

Speaker: Timothy Peters (Epigenetics Research Laboratory, Genomics and Epigenetics Division, Garvan Institute of Medical Research)

Title: Robust and flexible de novo calling of differentially methylated regions

Abstract: DNA methylation is a dynamic, environmentally sensitive modification implicated in a large array of biological processes, from transcription factor binding to being a reliable predictor of age. Hence accurate and interpretable statistical modelling of the methylome is of great importance when investigating epigenetic cell states. DMRcate is a Bioconductor package that calls differentially methylated regions (DMRs) from replicated Illumina array (including the new EPIC array) and whole genome bisulfite sequencing (WGBS) experiments, under general experimental design. It uses a tunable kernel smoother and whole-methylome significance testing to find and rank the most differentially methylated regions for a given hypothesis. It is fast and delivers DMRs in the order of seconds for arrays and minutes for WGBS.
Package features include:

- Adjustable kernel size
- Guidance for users towards appropriate false discovery rate (FDR) thresholds
- Annotation-agnostic calling
- Options for filtering Illumina probes known to be polymorphic and/or cross-hybridise to off-target genomic sites
- Automatic post-calling annotation of DMRs with known Gencode promoter regions
- Output in GenomicRanges and bedGraph format
- Elegant plotting of DMRs using the Gviz package, including proximal Gencode gene loci
- Calling of variably methylated regions (VMRs) from Illumina arrays

DMRcate takes into account a number of biological and statistical considerations when defining DMRs, such as irregular spacing of CpG sites and the distribution of variances across CpGs as a result of variable sequencing depth.

Reference: Peters et al. (2015) De novo identification of differentially methylated regions in the human genome. Epigenetics Chromatin. 2015 Jan 27;8:6. doi: 10.1186/1756-8935-8-6.

About the speaker: Tim’s background is in bioinformatics and applied statistics. He completed a PhD on the principles of statistical learning for transcriptomic data in the Department of Statistics at Macquarie University in 2012. He has worked as a Postdoctoral Fellow at CSIRO on the EpiSCOPE project: mapping the epigenetic terrain of human adipocytes, performing statistical analyses for human EWASs (epigenome wide association studies) and has published a novel method for statistical inference of whole-methylome data. In addition, he has spoken at a number of national and international conferences, including an oral presentation at the Joint Statistical Meetings (JSM) in Washington, DC.

Monday June 19, 2017

Speaker: Geoff Barton (Professor of Bioinformatics and Head of Division of Computational Biology, School of Life Sciences, University of Dundee, Dundee, UK)

Title: Identification of novel functional sites in protein domains from the analysis of human variation

Abstract: In this talk I will present a new analysis that compares publicly available variation data for human with variation seen across all available protein sequences regardless of species. The analysis confirms patterns of variation in human are consistent with protein structural features, but highlights structurally and functionally important sites in around 15,000 human protein domains that are not found by conventional sequence analysis methods. The identified sites are enriched in disease-associated variants and ligand binding residues. I will explain the method and illustrate the new analysis with a number of examples including the Nuclear Receptor Ligand Binding Domains and G-protein coupled receptors (GPCRs) which are important therapeutic targets. The study makes heavy use of the popular Jalview (www.jalview.org) sequence analysis program developed in my group, so I will also give a brief update on Jalview’s new features for exploring nsSNPs on alignments.

Note: This is a joint event where Prof. Geoff Barton will be giving a talk to all in CPC. Time and location will be announced later.

Monday June 12, 2017, No meeting (Queen's Birthday)

Monday June 5, 2017

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Unsupervised recurrent neural network for cell event detection in videos

Abstract: In this talk, we will present an automatic unsupervised cell event detection and classification method for cell videos based on convolutional and recurrent neural networks. Cell images captured from various biomedical applications often possess different visual characteristics regarding cell appearance, motility, and cell activities. This presents difficulties in finding a generic solution for the automatic detection of cell events (division, death, differentiation, etc.) in videos.
Current methods for event detection rely on human observers with specific expertise and long hours of labor; this also renders supervised training a suboptimal choice. We use a convolutional Long Short-Term Memory (LSTM) neural network structure that simultaneously exploits both spatial visual features and temporal patterns of objects to filter and classify possible cell events in a video sequence. Our model design allows for the detection and classification of cell events without the need for labeled training data; we will demonstrate our model for the detection of mitosis events.

About the speaker: TBA

Wednesday May 31, 2017

Speaker: Stephen Leslie (Centre for Systems Genomics, Schools of Mathematics and Statistics, and BioSciences, The University of Melbourne)

Title: Genetics and Geography: Using genomic data to understand population history and demography

Abstract: In this talk Stephen will present some of the findings from the People of the British Isles project, which was published in Nature in March 2015 (and featured on the cover), and some more recent work following on from this study. In particular he will show that using newly developed statistical techniques one can uncover subtle genetic differences between people from different regions at a hitherto unprecedented level of detail. For example, in the UK one can separate the neighbouring counties of Devon and Cornwall, or two islands of Orkney, using only genetic information. Stephen will then show how these genetic differences reflect current historical and archaeological knowledge, as well as providing new insights into the historical make up of the British population, and the movement of people from Europe into the British Isles. This is the first detailed analysis of very fine-scale genetic differences and their origin in a population of very similar humans.
The key to the findings of this study is the careful sampling strategy and an approach to statistical analysis that accounts for the correlation structure of the genome. The methods developed are readily extended to analyses in other populations.

About the speaker: Associate Professor Stephen Leslie is a statistician working in the field of mathematical genetics. A/Prof. Leslie did his undergraduate degree at ANU, including honours in Mathematics. He obtained his doctorate from the Department of Statistics, University of Oxford in 2008, followed by post-doctoral work at Oxford, before becoming the Head of Statistical Genetics at Murdoch Childrens Research Institute in 2012. Since 2016 Stephen has been at the University of Melbourne as Associate Professor of Statistical Genomics, in the Schools of Mathematics and Statistics, and Biosciences, and the Centre for Systems Genomics. In late 2016 he was awarded the Woodward Medal in Science and Technology, the University of Melbourne’s highest award for staff, which is given for research that has made the most significant contribution to knowledge in the five years prior to the award. A/Prof. Leslie's work covers several aspects of statistical and population genetics. His group's main focus is on methodological developments for the analysis of high throughput genetic data and the application of these methods to studies of disease and natural population variation. These methods typically combine modern computationally-intensive statistical approaches with insights from population genetics models. Specifically, the group works on statistical methods for imputing immune system (and other) genes from incomplete genetic data; the application of these methods to studies of autoimmune and other diseases; methods for detecting and controlling for population stratification; and understanding the causes and consequences of genetic variation in populations.

Monday May 29, 2017

Speaker: Tram Doan (Westmead Millennium Institute, Sydney Medical School)

Title: RNA-seq profiling of normal human breast epithelial cells reveals unexpected nuclear receptor segregation

Abstract: The ovarian hormone progesterone is a key regulator of female reproductive function. The established role of progesterone analogues in hormone replacement therapy in increasing breast cancer risk has sharpened focus on the mechanisms of action of this hormone in the normal breast. Progesterone plays an essential role in the development of lobular alveolar structures in the breast, through stimulation of proliferation during the normal menstrual cycle and pregnancy. We previously reported that the progesterone receptor (PR) was present in the progenitor-enriched normal breast cell population and likely mediates proliferative effects in those cells. In the present study, we profiled the transcriptome of the normal human breast epithelial cells at single cell resolution. The aims are to 1) identify the number and functional characteristics of different cell populations in the normal breast epithelium, and 2) characterise PR expression and lineage association in different normal breast epithelial cell types. We show that progesterone exerts distinct functional roles in different normal breast epithelial cell types and that PR is expressed more frequently in progenitor cells and has the strongest transcriptional effect in this cell population.

About the speaker: TBA

Monday May 22, 2017

Speaker: Kevin Wang (Statistical Bioinformatics Group, School of Mathematics and Statistics)

Title: A bias correction method to identify over-represented gene-sets for boutique arrays

Abstract: Gene annotation databases such as Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) are important tools in gene set enrichment tests (also known as GST), describing genes in terms of their associated biological functions and pathways.
The purpose of this type of enrichment analysis is to assign biologically meaningful terms to each gene. Associations between a gene set and biological functions of interest can then be established by considering statistically over-represented annotation terms. Traditionally this is done through Fisher’s Exact Test (FET), assuming gene expression arrays capture the complete or at least a very large proportion of the genome. However, this assumption is satisfied neither for the increasingly popular boutique arrays nor for custom-designed gene expression profiling platforms. Specifically, the conventional enrichment analysis is no longer appropriate due to the gene set selection bias induced during the construction of the arrays. By introducing bias correction terms in the contingency table, we propose an adjustment to the traditional hypergeometric test statistic in FET. The adjustment method works by estimating the proportion of genes captured on the array with respect to the genome in order to assist filtration of annotation terms that would otherwise be falsely included or excluded. In this paper we demonstrate a method to adjust the over-representation p-value in a grid in $\left[0,1\right]^2$. Using our own Shiny application, we will illustrate the advantages and practicality of the method through multiple differential gene expression analyses in melanoma and other cancers.

About the speaker: I am currently a PhD candidate at the School of Mathematics and Statistics under the supervision of Prof. Jean Yang, A/Prof. Samuel Mueller and Dr. Garth Tarr. I am working in the area of statistical bioinformatics and I have strong interests in developing novel methods motivated by high dimensional biomedical data. A central focus of my current research is the increasingly popular boutique array platform and its application both as a discovery and validation platform for biomarkers for patients in melanoma studies.
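
For readers unfamiliar with the conventional test that the abstract above takes as its starting point, the following is a minimal Python sketch of a one-sided hypergeometric (Fisher's exact) over-representation p-value. It does not implement the bias correction proposed in the talk; the function name, argument names and example counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def overrepresentation_p_value(n_measured, n_annotated, n_selected, n_overlap):
    """One-sided hypergeometric p-value P(X >= n_overlap).

    n_measured  : genes on the platform (assumed here to stand in for the genome)
    n_annotated : measured genes carrying the annotation term
    n_selected  : genes in the gene set of interest (e.g. differentially expressed)
    n_overlap   : selected genes that carry the annotation term
    """
    max_overlap = min(n_annotated, n_selected)
    total = comb(n_measured, n_selected)
    # Sum the hypergeometric probabilities of overlaps at least as large as observed.
    tail = sum(
        comb(n_annotated, k) * comb(n_measured - n_annotated, n_selected - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20 genes measured, 5 annotated, 4 selected, 3 in the overlap.
p_value = overrepresentation_p_value(20, 5, 4, 3)
print(f"over-representation p-value: {p_value:.4f}")
```

The sketch treats the measured genes as the whole background, which is exactly the assumption the abstract argues breaks down for boutique arrays.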

Monday May 15, 2017

Speaker: Fabian Held (The Life Lab, Charles Perkins Centre)

Title: Challenges of modelling (collaborative) networks

Abstract: With a constantly expanding body of scientific knowledge and expertise, collaboration is essential for the vast majority of research projects. However, we know very little about the complex interactions that affect the success and failure of collaborations, which may include collaborators’ personal attributes, their dynamics as a team, as well as the environment they’re working in. This is especially challenging when success and failure are not clearly defined. In this presentation Fabian will give an update about his progress in evaluating the Charles Perkins Centre’s effectiveness at “challenging prevailing dogmas, generating new ideas and translating knowledge into action” through facilitation of diverse collaborations. In particular, he will focus on his attempts at a statistical analysis of the network of collaborations that have emerged in the CPC, through the co-location of research groups and facilitation of project nodes. He will address conceptual, methodological, as well as technical issues of approaching this problem with exponential-family random graph models on the HPC cluster.

About the speaker: TBA

Monday May 8, 2017

Speaker: Dario Strbenac (Statistical Bioinformatics Group, SoMS, USyd)

Title: Design, Experimentation and Analysis of a Spike-in iTRAQ Proteomics Dataset Reveals Unexpected Aspects of Measurement Bias and Variance

Abstract: A replicated Latin squares experimental design was created to explore a variety of factors that influence the accuracy and precision of the measurements made by defining a set of 15 performance metrics. The experiment consisted of 21 non-yeast proteins which were spiked into a background yeast proteome in seven instrument runs of 8 samples labelled using the iTRAQ 8-plex kit.
Importantly, the effect of the particular iTRAQ label used was greater than the effect of instrument run. Also, dividing the quantities of different proteins within the same run yielded reasonably accurate fold changes, providing a counter-example to the commonly accepted rule that measurements of different proteins can't be directly compared. Thirdly, the method of summarisation of peptides to a protein-level summary was found to have little effect. Finally, simple point-and-click normalisation using ProteinPilot resulted in better estimation of fold changes at the expense of increased variance and didn't perform substantially worse in any other performance metrics than methods like RUV or linear models, suggesting that commercial software can enable good quality analyses to be done quickly and accurately. The raw dataset is available from ProteomeXchange and allows anyone to apply their own normalisation method to it and upload the protein quantities to the web application and see how their method's performance compares to other methods.

Monday May 1, 2017

Speaker: Tim Burykin (The Life Lab, Charles Perkins Centre)

Title: Call for data: Exploration and visualization of complex datasets with a novel method

Abstract: As a member of professional staff, I'm helping academics at Charles Perkins Centre to visualize their data for presentation, teaching or research purposes. In the first part of the talk I will briefly demonstrate how my images and videos were used to support the narrative of high-impact presentations. I will then focus on the generic method behind these visuals and discuss its usefulness for the exploration and potentially for the in-depth analysis of complex datasets of almost any nature. The talk would be suitable for people who want to look at their data from a different angle or who are searching for a friendly yet comprehensive way to convey their work to the broader audience.

About the speaker: I received my Master of IT degree in Russia and moved to Sydney to complete a PhD course in Agriculture under the supervision of Prof. John Crawford. My project was concerned with three-dimensional modelling, analysis and visualization of soil microenvironment and leaf cellular structures. Accumulated experience in computer graphics and efficient algorithm development enabled me to join Judith & David Coffey LifeLab at Charles Perkins Centre as a data visualization technician.

Monday April 24, 2017

Speaker: Alistair Senior (School of Mathematics and Statistics, Charles Perkins Centre)

Title: Meta-analytic tools to detect overlooked variance effects in biological systems

Abstract: Medically, the effects of a treatment on among-individual variation in health have direct implications for personalized medicine. Ecologically, among-individual variation governs a species' niche and is the grist of evolution by natural selection. However, experimental designs and analytical paradigms in biology are heavily focused on detecting the effects of treatments on population averages. As a result, we have a comparatively poor understanding of how environments and treatments affect among-individual variation. Over the last few years I have been developing tools for meta-analysis, which allow the user to combine the results of published studies to assess the effects of treatments on variation. These methods require only those summary statistics that are reported as a matter of standard practice, and integrate easily with commonly used meta-analytic software. I will present a summary of the methodology, as well as examples of its application that are pertinent to research goals of the Charles Perkins Centre.

About the speaker: I did my undergraduate and masters degrees in the UK, where my research was primarily directed towards questions in ecology and evolution.
In 2010 I moved to the University of Otago to do a PhD on gene-environment interactions in determining phenotypic sex, with Shinichi Nakagawa. During this period, I developed an interest in the development and application of hierarchical statistics to questions in biology. After graduating, in early 2014 I moved to Sydney where I began working with Profs Simpson and Raubenheimer to apply my quantitative skills to questions in nutritional ecology.

Monday April 17, 2017 (Easter Monday)

Monday April 10, 2017

Speaker: Pengyi Yang (School of Mathematics and Statistics, Charles Perkins Centre)

Title: A dynamic multi-omic atlas of the transition from naive to primed pluripotency.

Abstract: Embryonic stem cells (ESCs) have the potential to generate virtually any differentiated cell type to establish new models of mammalian development and to create new sources of cells for treating an enormous range of diseases. To elucidate the molecular pathways underpinning the transition from naïve to primed pluripotency cell states, we quantified the dynamic changes in the proteome, phosphoproteome, transcriptome, and epigenome underpinning the transition between these cellular states with high temporal resolution. We observed widespread remodelling of the cell across all regulatory layers, and yet the rate, extent and magnitude of phosphorylation changes exceed those observed on other levels, emphasising a critical role for phosphorylation in this process. Our dynamic phosphoproteomics data reveal that ERK and mTOR signalling branches dominate early and late signalling network activity respectively during the ESC to EpiLC transition. Collectively these data provide insight into the molecular processes underlying naïve and primed states, highlighting numerous potential gatekeeper mechanisms governing ESC pluripotency.

About the speaker: I obtained my PhD in bioinformatics from School of Information Technologies, The University of Sydney, in 2012.
I then moved to the United States and completed an interdisciplinary Research Fellowship in Systems Biology Group, ESCBL, at National Institutes of Health on characterising transcriptomic and epigenomic regulations in embryonic stem cells (ESCs) using ultrafast sequencing data. I relocated back to Australia in late 2015 on a University of Sydney Postdoctoral Fellowship (DVCR) to pursue my own research in systems biology. I’m now affiliated with the School of Mathematics and Statistics (SoMS) and the Charles Perkins Centre, The University of Sydney. I have been offered a Lectureship at USyd (April 2016) and a Discovery Early Career Researcher Award (DECRA).

Monday April 3, 2017 (Hunter Meeting)

Monday March 20, 2017

Speaker: Ellis Patrick (School of Mathematics and Statistics)

Title: Deconstructing the innate immune component of a molecular network of the aging frontal cortex.

Abstract: Alzheimer’s disease is pathologically characterized by the accumulation of neuritic β-amyloid plaques and neurofibrillary tangles in the brain and clinically associated with a loss of cognitive function. The dysfunction of microglia cells has been proposed as one of the many cellular mechanisms that can lead to an increase in Alzheimer’s disease pathology. Investigating the molecular underpinnings of microglia function could help isolate the causes of dysfunction while also providing context for broader gene expression changes already observed in mRNA profiles of the human cortex. In this talk I will lay out the various statistical approaches I have used to tackle this problem.

About the speaker: Dr Ellis Patrick is a computational biologist and applied statistician. He is currently an Early Career Development Fellow in the School of Mathematics and Statistics and a staff member at The Westmead Institute for Medical Research. He obtained his PhD in statistical bioinformatics in the School of Mathematics and Statistics at the University of Sydney.
In his postdoctoral studies, he worked as a computational biologist with joint appointments at Brigham and Women's Hospital, Harvard Medical School and The Broad Institute of MIT and Harvard. He spent this time using his statistical background to investigate the molecular drivers of Alzheimer’s disease and MS. As he spends most of his time analysing large biomedical datasets, his research relies on the subtlety of translating between biological and statistical concepts to form simple, suitable and targeted statistical questions.

## Seminars in 2016, Semester 2

Monday November 14, 2016

Speaker: Darya Vanichkina (Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Marvellous complexity: Exploring the mammalian transcriptome using RNA sequencing

Abstract: The complexity of the trillions of cells that comprise the mammalian body is underpinned not by their genomes, which are by definition identical, but by the temporally and physically precise expression of particular coding genes, long non-coding RNAs and small regulatory RNAs. In my talk, I will present some of the outcomes of my PhD research, which focussed on using and expanding upon developments in RNA sequencing technology and stem cell differentiation to deeply investigate transcripts in cortical and hindbrain-like neurons, and in oligodendrocyte precursor cells. I will also give an overview of my current work, which involves exploring the roles of alternative splicing in controlling gene expression, and the development of new methods of analysing splicing complexity.

About the speaker: I am a genomics data scientist at the Gene and Stem Cell Therapy Program at the Centenary Institute in Sydney, where I investigate how the mammalian genome works using next-generation sequencing.
I use a combination of preexisting bioinformatics software and custom R, Python, and shell scripts to process terabytes of data on a daily basis, taking advantage of the University of Sydney's HPC facilities. I recently completed my PhD in Bioinformatics and Genomics at the Institute for Molecular Bioscience at the University of Queensland, under the supervision of Dr. Ryan Taft and Professor John Mattick. My work focused on using high-throughput sequencing to understand changes in the transcriptome that occur during neuronal functioning in normal cells and in disease; and on induced pluripotent stem cells as models of the human nervous system. For many years, I have been passionate about teaching, especially programming and bioinformatics to biologists, and was able to do a significant amount of this during my PhD studies. I am both a Software and Data Carpentry instructor. I also hold a Specialist Degree in Biochemistry with a Major in Molecular Biology from Lomonosov Moscow State University.

Monday November 7, 2016

Speaker: Weichang Yu (PhD candidate, SoMS, USyd)

Title: Semisupervised quadratic discriminant analysis using model selection and variational Bayes

Abstract: We develop a mean field collapsed variational Bayes approximation for quadratic discriminant analysis (QDA) with model selection, where we allow missing class information in the training dataset and subsequent model selection. This allows the use of unlabelled data to build the classifier and identification of strong predictors. We demonstrate using simulated and real datasets that this leads to a reduction in prediction error even in cases where the within-class dispersion is large. We make two contributions: we present a computationally cheaper alternative to Markov chain Monte Carlo, with comparable results, for Bayesian inference for QDA, and a Bayesian framework for performing model selection in QDA.

About the speaker: I am a first-year PhD student at University of Sydney working with Dr John Ormerod. My research interests include variational approximation, model selection and the use of predictive algorithms in medical and bioinformatics.

Monday October 24, 2016 (ABACBS, No seminar)

Monday October 17, 2016

Speaker: Jason Wong (Group Leader, Bioinformatics and Integrative Genomics, UNSW)

Title: Gaining fundamental insights into DNA repair-chromatin interactions through cancer genomics

Abstract: Mutations form in the genome through the interplay of DNA lesion formation and incomplete DNA repair. With the advent of cancer genomics and particularly whole cancer genome sequencing, cancer somatic mutations provide us with a window through which we can look into how mutational and DNA repair processes function within human cells. In this talk, I will discuss how we have used whole cancer genome sequencing data to discover a novel biological process. Using publicly available data, we showed that transcription factor binding at active gene promoters can impair nucleotide excision repair (NER), thereby resulting in prevalent mutation hotspots at gene promoters in NER-dependent cancers such as skin and lung cancers. I will further discuss the implications of this biological process on cancer development and the impact of our study on the interpretation of functional mutations in cancer.

About the speaker: Dr Wong is an ARC Future Fellow at the Prince of Wales Clinical School, UNSW and leads the Bioinformatics and Integrative Genomics Team at the Lowy Cancer Research Centre. He received his B.Sc (Hons I) from the University of Sydney and was awarded a D.Phil in Bioanalytical Chemistry at the University of Oxford, UK in 2007. This was followed by a post-doctoral fellowship at the University College Dublin, Ireland, before returning to Sydney to join UNSW.
To date, he has published 65 peer reviewed journal articles with senior authorship in journals including Nature, Genome Biology, Molecular Cancer Research and Nucleic Acids Research. He has attracted over$2 million in research funding as lead investigator from the ARC, Cancer Australia and Cancer Institute NSW. His current research is focused on the study of mutational processes in cancer and its influence on gene regulation and function.

Monday September 19, 2016

Speaker: Fatemeh Vafaee (CPC, USyd)

Title: Determination of circulating microRNA markers of colorectal cancer prognosis by a novel network-based multi-objective optimisation routine

Abstract: Colorectal cancer presents a significant cause of cancer-related death and effective treatments that maximise quality of life as well as cancer-related outcomes are therefore of major importance. Determining the appropriate treatment pathway through a personalised medicine paradigm is a prime goal, and so biomarkers are sought to aid in the decision-making process. In the age of high-throughput technologies, molecular markers are particularly attractive as a means of achieving true personalisation of cancer treatment. We have recently evaluated the role of circulating microRNA as a means of predicting patients’ prognosis and developed an innovative multi-objective network-based optimisation method to identify robust microRNA signatures which are reliable in terms of predictive power and functional relevance. In this talk, I will go through the details of the proposed method. To identify potential collaboration opportunities with the audience, I will also give a concise, general overview of my research interests and projects.

About the speaker: Dr Fatemeh Vafaee received her PhD in Artificial Intelligence from the University of Illinois at Chicago in 2011. Her doctorate studies involved multiple projects in domains of optimisation, machine learning, data mining, pattern recognition, and probabilistic graphical models with the focus on theoretic and applied genetic algorithms as her PhD thesis. While pursuing her PhD, Fatemeh also collaborated with the University’s Computational Biology Laboratory and extended her research to biological applications such as cellular network alignment and phylogeny reconstruction. After her PhD, Fatemeh started a postdoctoral research position at the University of Toronto, Ontario Cancer Institute, one of the largest cancer research centres in Canada and worldwide. During her postdoc, Fatemeh had the privilege to work in a highly trans-disciplinary environment and collaborate with world-renowned scholars in integrative cancer informatics. In 2013, Fatemeh took a Research Fellow position at Charles Perkins Centre and School of Maths & Stats at the University of Sydney. Her research relies on a wide national and international collaboration network and she has published several papers in competitive peer-reviewed proceedings and top-tier journals such as Nature Methods, Scientific Reports, BMC Systems Biology, PLoS ONE and Alzheimer's & Dementia.

Monday September 12, 2016

Speaker: Chendong Ma (SoMS, USyd)

Title: Honours practice talk

Monday September 5, 2016

Speaker: Shila Ghazanfar (SoMS, USyd)

Title: Integrated Single Cell Data Analysis Reveals Cell-Specific Networks and Novel Coactivation Markers

Abstract: Large scale single cell transcriptome profiling has exploded in recent years and has enabled unprecedented insight into the behavior of individual cells. Identifying genes with high levels of expression using data from single cell RNA sequencing can be useful to characterize very active genes and cells in which this occurs. In particular single cell RNA-Seq allows for cell-specific characterization of high gene expression, as well as gene coexpression. We offer a versatile modeling framework to identify transcriptional states as well as structures of coactivation for different neuronal cell types across multiple datasets. We employed a gamma-normal mixture model to identify active gene expression across cells, and used these to characterize markers for olfactory sensory neuron cell maturity, and to build cell-specific coactivation networks. We found that combined analysis of multiple datasets results in more known maturity markers being identified, as well as pointing towards some novel genes that may be involved in neuronal maturation. We also observed that the cell-specific coactivation networks of mature neurons tended to have a higher centralization network measure than immature neurons. Integration of multiple datasets promises to bring about more statistical power to identify genes and patterns of interest. We found that transforming the data into active and inactive gene states allowed for more direct comparison of datasets, leading to identification of maturity marker genes and cell-specific network observations, taking into account the unique characteristics of single cell transcriptomics data.

Monday August 29, 2016

Speaker: Kevin Wang (SoMS, USyd)

Title: An adjustment method for gene set over-representation in boutique arrays

Monday August 22, 2016

Speaker: Cali Willet (Sydney Informatics Core Research Facility, USyd)

Title: Bioinformatics services and training

Abstract: An overview of the bioinformatics services and training available through the Sydney Informatics Core Research Facility

About the speaker: Cali Willet is a bioinformatics technician for the Sydney Informatics Core Research Facility at the University of Sydney. She completed her PhD in animal genomics and computational biology in the Faculty of Veterinary Science at the same institution. She is interested in the genetics of disease, particularly in companion and endangered animals, and in the development of bioinformatics methodologies tailored for causal locus identification in non-model organisms. As a bioinformatician for the Core Research Facilities, she is focused on providing support to bioinformatics research groups in the form of consultation, training and advocating for the needs of bioinformatics and computational biology groups at the University of Sydney.

Monday August 15, 2016 (Cancelled)

Monday August 8, 2016

Speaker: Ulf Schmitz (Research Officer, Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Intron retention redefines post-transcriptional gene regulation in mammalian and vertebrate species

Abstract: Intron retention (IR) occurs when the splicing machinery fails to excise introns from primary transcripts. This may give rise to diverse downstream effects; most often, however, it induces nonsense-mediated decay (NMD) of the intron-retaining transcript. We performed a phylogenetic analysis of IR in human, mouse, dog, chicken, and zebrafish granulocytes. We found evidence that IR affects functionally related genes in granulocytes throughout evolution, many of which are orthologs. We also found a strong anti-correlation between the number of intron-retaining genes and the number of protein coding genes in a genome. Retained introns have similar characteristics in all investigated species (human, mouse, dog, chicken, zebrafish). They are shorter and have a higher GC content than their non-retaining counterparts; they often reside near the 3' end of a transcript and are enriched in premature termination codons. Their host genes harbour a larger number of miRNA binding sites in their 3' untranslated region and are often co-regulated in human and mouse. Our results suggest that IR is a global control mechanism affecting similar biological processes independent of specific effector genes. More importantly, we gained new insights that support the notion of IR as an independent mechanism of post-transcriptional gene regulation that supplements and maybe even cooperates with other forms of post-transcriptional gene regulation.

About the speaker: Ulf Schmitz is a post-doctoral researcher at the Centenary Institute in Sydney. His research focuses on the design of integrative workflows combining various computational disciplines with experimentation to approach molecular biological and medical problems. Between 2003 and 2015, Ulf Schmitz worked as a systems engineer and later as a bioinformatician at the Department of Systems Biology & Bioinformatics, University of Rostock, Germany. He was awarded his PhD in Bioinformatics in June 2015. Thereafter, he joined Prof John Rasko’s Gene and Stem Cell Therapy Program as a bioinformatics research officer. In January 2016, he was appointed as Conjoint Senior Lecturer at the Centenary Institute and the Sydney Medical School.

Monday August 1, 2016

Speaker: Dario Strbenac (Senior Research Associate, Statistical Bioinformatics Group, USyd)

Title: Interactive Benchmarking of Quantitative Proteomics Preprocessing Alternatives

Abstract: Mass spectrometry has long been used to analyse biological samples and find associations of altered proteins with experimental conditions. However, the focus of previous method evaluation efforts has been on the peptide amino acid sequence determination problem. Here, using a replicated Latin squares experimental design, the first comprehensive comparison of the effect of preprocessing alternatives on the bias and variance of protein quantitation is made. Surprisingly, the variability between iTRAQ labels is larger than between different runs of the instrument. This has consequences for researchers who don't adequately incorporate randomisation and blocking in their proteomic experimental designs. Secondly, the default preprocessing done by the vendor software ProteinPilot outperforms more advanced methods, such as linear models and RUV, in terms of recovering the expected fold changes (bias). Thirdly, comparing the measurements of different proteins within a sample is shown to be feasible, which was previously assumed to be inaccurate and always avoided. Finally, a benchmarking Shiny application will be demonstrated, which allows users to upload their own preprocessing of the raw data, and see how their method compares to other methods in an interactive scoreboard.

Monday July 25, 2016

Speaker: Joshua Ho (Head, Bioinformatics and Systems Medicine Laboratory, VCCRI)

Title: A systems approach to study organ development and congenital disease

Abstract: A systems biology approach is now being widely employed to systematically study how molecular and signalling pathways are regulated in organ development in humans and relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the genetic causes of congenital diseases that stem from dysregulation of GRNs during organ development. In this talk I will discuss the latest advances in GRN inference and analysis using large amounts of experimentally determined perturbation data, and how we can use GRNs to study organ development and congenital diseases.

About the speaker: Dr Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School (HMS), and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance. Joshua has published over 48 papers, including first-author publications in Nature, Science Signaling, and PLoS Genetics. He is also currently the Secretary of the Australian Bioinformatics and Computational Biology Society (ABACBS).

## Seminars in 2016, Semester 1

Monday June 20, 2016

Speaker: Rima Chaudhuri (Metabolic Cybernetics Lab, CPC, USyd)

Title: Understanding the relationship between AKT recruitment and GLUT4 translocation to the plasma membrane in fat cells through single cell microscopy data analysis.

Abstract:

About the speaker: Dr. Chaudhuri was awarded her PhD in Bioinformatics from the University of Illinois, USA in 2010. Her doctoral thesis was on the discovery and design of drugs for the treatment of SARS coronavirus and Hepatitis C virus through computational modeling. While pursuing her doctorate degree, she worked as a researcher in software and pharmaceutical companies such as Blackbaud Inc., and Pfizer Inc., (USA) developing modules of scientific research software. Dr. Chaudhuri pursued her postdoctoral training at the Parc Cientific de Barcelona (PCB) as a joint affiliate between the Institute for Research in Biomedicine and the Barcelona Supercomputing Center in Barcelona in biophysical simulations. She holds two international patents in the field of drug discovery and design. After her year-long post-doc in Spain, she moved to Sydney in 2011 and joined the Garvan Institute of Medical Research in the laboratory of Prof. David E. James to work on systems biology based approaches to unravel the complexities behind the incidence of metabolic disease such as diabetes and obesity. Her strength lies in interdisciplinary research and bridging the gap between computational and basic sciences. She is currently a research fellow at the Charles Perkins Centre in the University of Sydney. Her current research interests include isolating candidate bio-markers of T2D and obesity from molecular expression profiles, understanding and targeting protein-protein interactions in disease to facilitate a cure and integrating multi-dimensional data from different platforms (transcriptomics, proteomics, interactomics and metabolomics) to acquire a precise picture of the diseased cell.

Monday June 13, 2016 (Queen's Birthday)

Monday June 6, 2016

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Computing Image Similarity for Image-Derived Disease Models

Abstract: Imaging is a critical and indispensable component of modern healthcare. The automated analysis of medical images has a vast range of applications in evidence-based diagnosis, physician education, and biomedical research. These decision support applications are predicated on the ability to objectively compute the similarity of image content in a manner that matches the subjective similarity judgement of human domain experts. In this talk, I will present an overview of the conceptual challenges in this field before detailing my research on methods for characterising and comparing the visual content of images, including a graph-based method for comparing 3D PET-CT lung cancer images and my more recent work using convolutional neural networks.

About the speaker: Dr. Ashnil Kumar received the Ph.D. degree in information technology from the University of Sydney in 2013; his PhD introduced a new graph-based method for modelling the relationships between tumours and organs in medical images.

Monday May 30, 2016

Speaker: Vinita Deshpande (Metabolic Cybernetics Lab, CPC, USyd)

Title: Removing unwanted variation in large scale ‘omics datasets containing missing values

Abstract: Transcriptomics and proteomics are powerful techniques to obtain a comprehensive snapshot of biological systems ranging from cells to whole organisms. However, a major problem for such big datasets is the presence of missing values, as many statistical tools used to analyse these often require complete data. One such bioinformatic tool is RUV (Removing Unwanted Variation), a widely used R package developed to remove technical variation, such as batch effects, in order to normalise the data and perform downstream analyses such as differential expression analysis.

One of the solutions to overcoming this issue of missing data is to obtain a complete dataset, by either filtering the data to eliminate missing values or performing imputation. These approaches, however, can greatly reduce the sample size or biological variation, leading to loss of statistical power. The first part of this talk will describe an alternative approach in which the RUV algorithm was adapted to handle data with missing values as its input. The performance of this new algorithm was evaluated in terms of its ability to normalise and correctly identify differentially expressed genes/proteins in large ’omics datasets containing varying amounts of simulated missing values. The second part of this talk will be a discussion on the future directions and challenges of this PhD project in terms of designing and conducting further quantitative analyses on large scale ‘omics data.

About the speaker: Vinita is a PhD student supervised by Prof David James and Prof Jean Yang at The University of Sydney, where she is pursuing her research interests in the application of systems biology and bioinformatic approaches to metabolic diseases. Vinita has previously completed a Bachelor of Science (Bioinformatics) / Bachelor of Information Technology (Computer Science) with Honours from The University of Sydney. Prior to commencing her PhD, she worked as a bioinformatics research assistant with Dr Joshua Ho in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute in Sydney.

Monday May 23, 2016

Speaker: Ashley Waardenberg (Children's Medical Research Institute; Sydney Medical School, USyd)

Title: Discovering Protein-Protein Interactions from DNA sequence - insights into the cardiac gene regulatory network and disease

Abstract: NKX2-5 is a key transcription factor (TF) that is required for normal heart development and is implicated in a range of cardiac diseases. It binds directly to DNA by recognising a specific sequence called the NKX2-5 binding element (NKE). However, until recently its genomic targets were poorly defined, and the NKX2-5 protein-protein interaction network remains poorly defined. Recently we identified genomic target regions for NKX2-5 and human disease-relevant mutations in cultured HL-1 cardiomyocytes using the DamID method and identified new NKX2-5 disease mechanisms (Bouveret R, Waardenberg AJ, et al. eLIFE, 2015). This talk describes our efforts at predicting, and subsequently validating, novel protein-protein interactions (PPIs) based on recurrent binding sites (or motif grammar) through the application of machine learning algorithms.

About the speaker: Dr Ashley Waardenberg is currently a postdoctoral bioinformatician at the Children's Medical Research Institute, Westmead, where he is developing systems biology approaches for investigating proteomics and high throughput protein modification data related to the brain and associated diseases, in collaboration with Dr Mark Graham and Prof Phil Robinson. He received a PhD in Systems Biology (2012) under the supervision of A/Prof Christine Wells (now at the University of Glasgow, Scotland) and Dr Brian Dalrymple (CSIRO, Australia), where he developed a novel visualisation approach for viewing gene expression data specifically in the context of striated muscle contractile protein location. A key outcome of his PhD was the discovery of a new protein-protein interaction between PI3K and a muscle mechano-sensor in the heart, implicating the muscle contractile apparatus in responding to cardiac stress, which has broader implications in the context of PI3K cancer therapies (Waardenberg, et al. Journal of Biological Chemistry, 2011). During his PhD, he was also involved in the Bovine Genome Consortium, whose work was published in Science in 2009, and was a team recipient of the CSIRO Chairman's Medal in 2010 for contributions to this international effort. He then joined the Cardiac Developmental and Stem Cell Biology Laboratory of Professor Richard Harvey at the Victor Chang Cardiac Research Institute, Darlinghurst, as a Postdoctoral Scientist to gain a deeper insight into developmental biology, furthering an interest in understanding the origins of disease, where he implemented systems biology strategies for understanding genome-wide binding effects of the cardiac transcription factor NKX2-5 and NKX2-5 mutations relevant to congenital heart disease. This has resulted in a number of recent publications (Waardenberg AJ, Ramialison M et al. Cold Spring Harbor Perspectives in Medicine, 2014; Bouveret R, Waardenberg AJ et al. eLIFE, 2015; Waardenberg AJ et al. BMC Bioinformatics, 2015) and he continues to collaborate with the Victor Chang Cardiac Research Institute.

Ashley is also a founding member and Vice-President of the Australian Bioinformatics and Computational Biology Society (ABACBS). Ashley has been heavily involved in establishing this very young society and is passionate about establishing communities in this domain.

Monday May 16, 2016

Speaker: Timur Burykin (Judith and David Coffey Life Lab, CPC, USyd)

Title: Data visualization and exploration using particle dynamics simulation

Abstract: Exploration of complex multidimensional datasets is an ongoing challenge in many fields of research. In an attempt to simplify this task for people with no expertise in advanced statistics or programming, a novel method of data visualization was developed. The algorithm applies simple particle interaction rules to data points and allows them to self-organize into layouts that approximate the clustering of objects in the multidimensional space. A complementary density map, superimposed network connectivity and configurable node properties linked to extra dimensions make this visualization method suitable for a wide range of applications. A few datasets will be demonstrated in this presentation, including hospital admission records, a TF-TG interaction network and results of diet experiments. The extension of the algorithm to advanced image and network analysis will also be discussed.
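
As an illustration only, the idea of letting data points self-organize under simple particle rules can be sketched as follows. The abstract does not state the actual interaction rules, so the spring-like rule used here (attract when two particles are further apart than their data-space distance, repel when closer) is an assumption, and the function name `layout`, its parameters and the toy `data` are hypothetical.

```python
import math

def layout(points, sweeps=300, rate=0.1):
    """Relax 2D particles so pairwise distances approach the data-space distances."""
    n = len(points)
    # Deterministic start: particles evenly spaced on a unit circle.
    pos = [[math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n)]
           for k in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            for j in range(i + 1, n):
                target = math.dist(points[i], points[j])
                dx = pos[j][0] - pos[i][0]
                dy = pos[j][1] - pos[i][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                # Positive when too far apart (attract), negative when too close (repel).
                pull = rate * (dist - target) / dist
                pos[i][0] += pull * dx
                pos[i][1] += pull * dy
                pos[j][0] -= pull * dx
                pos[j][1] -= pull * dy
    return pos

# Toy data (assumed): two tight groups in three dimensions.
data = [[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [5.0, 5.0, 5.0], [5.0, 5.1, 5.0]]
positions = layout(data)
for point, (x, y) in zip(data, positions):
    print(point, "->", round(x, 2), round(y, 2))
```

In this toy case the first two and last two particles should settle into two well-separated groups on the plane, mirroring the grouping in the original three-dimensional data.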

About the speaker: Tim Burykin is an experienced C++ programmer who joined Charles Perkins Centre last year as a data visualization technician and a member of Judith & David Coffey LifeLab supervised by Prof. Zdenka Kuncic. He received a master of IT degree in Russia and moved to Sydney to complete a PhD course in Agriculture under the supervision of Prof. John W. Crawford.

Monday May 9, 2016

Speaker: Denis Bauer (Team leader Transformational Bioinformatics, CSIRO)

Title: VariantSpark: applying Spark-based machine learning methods to genomic information

Abstract: Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a Hadoop/Spark framework that utilises the machine learning library, MLlib, thereby providing the means of parallelisation for population-scale bioinformatics tasks. VariantSpark offers an interface to the standard variant format (VCF), seamless genome-wide sampling of variants and provides a pipeline for visualising results. To demonstrate the capabilities of VariantSpark, we cluster more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach developed by the Global Alliance for Genomics and Health, ADAM, the comparable implementation using Hadoop/Mahout, as well as Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. These benefits of speed, resource consumption and scalability enable VariantSpark to open up the usage of advanced, efficient machine learning algorithms to genomic data.
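
To make the clustering task concrete, here is a minimal single-machine sketch of grouping individuals by their genotype vectors with k-means. It is not VariantSpark's API and does not use Spark or MLlib; the function `kmeans`, the 0/1/2 alternate-allele encoding and the toy genotypes are all assumptions chosen for illustration.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each individual joins its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy genotypes (assumed): rows are individuals, columns are variants,
# values count alternate alleles (0, 1 or 2).
genotypes = [
    [0, 0, 1, 2, 2],
    [2, 2, 1, 0, 0],
    [0, 1, 1, 2, 2],
    [2, 2, 0, 0, 0],
]
labels = kmeans(genotypes, k=2)
print(labels)
```

At population scale the same assignment and update steps would be distributed across a cluster, which is the role the abstract attributes to Spark.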

About the speaker: Dr. Denis Bauer is the team leader of the transformational bioinformatics team in CSIRO’s ehealth program. Her expertise is in high throughput genomic data analysis, computational genome engineering, as well as Spark/Hadoop and high-performance compute systems. She has a PhD in Bioinformatics and has done her Postdoctoral training in machine learning and human genetics, respectively. Her collaborators include Prof Simon Foote on mammalian susceptibility to infectious diseases, Prof Ian Blair on molecular mechanisms of motor neuron disease, and Prof Rodney Scott on obesity-driven cancer. She has 23 peer-reviewed publications (9 first author, 4 senior author) with three in journals of IF>8 (e.g. Nat Genet.) and H-index 9. To date she has attracted more than AU\$25 million in funding.

Monday May 2, 2016 (Florian and Falk farewell. No seminar.)

Monday April 25, 2016 (ANZAC Day)

Monday April 18, 2016

Speaker: Michael De Ridder (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: CereVA: A Visual Analytics Framework for Neurological Disorder Analysis with Functional Magnetic Resonance Imaging

Abstract: Functional Magnetic Resonance Imaging (fMRI) is an important imaging modality for understanding and diagnosing neurological disorders, such as schizophrenia, bipolar disorder and Alzheimer's disease. The modality temporally scans blood oxygenation as a proxy for neuronal activity. This activity is often processed into three components for analysis: (i) the anatomical context; (ii) individual voxel and region (group of voxels) time-series; and (iii) the correlation of activity between regions. While many statistical and graph theoretical approaches have been applied to the data, issues such as noise and a lack of understanding of the brain lead to a diverse range of challenges. Visualisation-based analytics is often used to overcome some of these challenges; however, current methods often present an oversimplification of the data. With CereVA, we integrate all three of the commonly derived activity components in a visual analytics framework comprising a full-scale pipeline that incorporates automatic image processing and interactive visualisation. Finally, we present a new application for fMRI visual analytics by applying CereVA to the active research area of classifying neurological disorders.
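
Component (iii) above, the correlation of activity between regions, can be illustrated with a small sketch. Pearson correlation is assumed here as the measure, which the abstract does not specify, and the region names and time-series values are made up for the example.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length time series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def connectivity(series):
    """Correlation between every pair of regions, keyed by (region, region)."""
    names = sorted(series)
    return {(a, b): pearson(series[a], series[b]) for a in names for b in names}

# Hypothetical region-averaged signals over four time points.
regions = {
    "region_a": [1.0, 2.0, 3.0, 4.0],
    "region_b": [2.0, 4.0, 6.0, 8.0],
    "region_c": [4.0, 3.0, 2.0, 1.0],
}
conn = connectivity(regions)
for pair, value in conn.items():
    print(pair, round(value, 3))
```

The resulting table is symmetric, with values near 1 for regions whose activity rises and falls together and values near -1 for regions that move in opposition.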

About the speaker: Michael de Ridder is a PhD student with The Institute of Biomedical Engineering and Technology (BMET) in the School of Information Technologies at the University of Sydney. He is supervised by A/Prof Jinman Kim. Michael's work straddles the boundary of scientific and information visualisation with a heavy influence from medical imaging techniques.

Monday April 11, 2016 (Hunter Meeting)

Monday April 4, 2016

Speaker: Taiyun Kim (Victor Chang Cardiac Research Institute, and UNSW)

Title: PAD: An interactive web portal for analysis of transcription factor co-binding at promoters and enhancers

Abstract: It has long been observed that transcription factors (TFs) bind to DNA collaboratively with other TFs as co-binding partners. Recently, through studying the genomic binding sites of essential embryonic stem cell TF NF-Y, Dr Pengyi Yang has shown that the same TF may bind DNA with different co-binding partners if we consider TF binding sites that are proximal or distal to transcription start sites separately. Based on this observation, we have developed a database of binding sites of >200 TFs in mouse embryonic stem cells, and an interactive web portal that enables any user-submitted TF binding profiles to be clustered and visualised with our database TF profiles, at the proximal and distal regions separately. Our tool contributes to our understanding of how gene regulation occurs via combinatorial binding of TFs in different cell types.

About the speaker: Taiyun Kim is a 5th year student in the Bachelor of Engineering (Bioinformatics)/Masters of Biomedical Engineering program at the University of New South Wales. In 2015, he was awarded a Summer Scholarship to work in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute (VCCRI), under the supervision of Dr Joshua Ho (VCCRI) and Dr Pengyi Yang (University of Sydney).

Monday Feb 1, 2016

Speaker: Lei Sun (School of Information Engineering, Yangzhou University, China)

Title: Study on long noncoding RNAs using computational methods

Abstract: Tens of thousands of newly discovered long noncoding RNAs (lncRNAs) have been attracting attention in the life sciences as their important biological functions are increasingly revealed. Because of the intrinsic complexity of lncRNA functions and mechanisms, our group proposes to study lncRNAs using a series of computational methods, which can improve research efficiency. In my talk, I would like to share some ideas on the results of lncRNA prediction using support vector machine (SVM), and to discuss potential lncRNA-specific transcriptional patterns detected using computational methods.

About the speaker: Dr. Lei Sun received a doctor of engineering degree from China University of Mining and Technology in 2013. He is now a lecturer in the School of Information Engineering at Yangzhou University, P.R. China. His research interests include bioinformatics, and signal and information processing. As a visiting PhD student, Dr. Sun previously did research on bioinformatics at several institutes and universities, including the School of IT at The University of Sydney, the Institute for Molecular Bioscience (IMB) at the University of Queensland, and the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences.

## Seminars in 2015, Semester 2

The seminars will be held at 1:00 pm on Mondays in the Access Grid Room, which is on level 8 of the Carslaw Building. The format of the talk is 30 to 45 minutes plus questions.

Monday Nov 30

Speaker: Shila Ghazanfar (SMS, Faculty of Sciences, The University of Sydney)

Title: Gene coexpression identification from single-cell expression experiments

Abstract: Classically, gene expression profiles have represented an aggregate of expression levels of each of the multitude of cells within the sample of interest. More recently, technologies utilising quantitative PCR, such as nanoString, enable measurement of expression in individual cells as opposed to an amalgamation of cells. As such, using this data along with appropriate statistical models, we can ask questions such as in what proportion of cells certain genes are expressed, and we can determine the distribution of coexpression of genes among these cells. In collaboration with Associate Professor David Lin at Cornell University, whose interest lies in investigating the olfactory system in mouse models, a set of neuronal cells were assayed with special interest in the protocadherin family of genes. We describe the statistical methods for processing the single-cell expression data and identifying coexpression of genes in subsets of the cell population, including mixture modelling and visualisation techniques for further insight.

About the speaker: Shila Ghazanfar is a 3rd year PhD student and Postgraduate Teaching Fellow in the School of Mathematics and Statistics at The University of Sydney. She is supervised by A/Prof Jean Yee Hwa Yang (The University of Sydney), Dr John Ormerod (The University of Sydney) and Dr Michael Buckley (CSIRO). Her research interests are in analysing high throughput sequence data such as RNA-Seq and Exome-Seq, and integrating different types of high-throughput data. She has previously completed a Bachelor of Science (Advanced Mathematics) and Honours in Statistics at the University of Sydney.

Monday Nov 23

Speaker: Cristian Leyton (Faculty of Health Sciences, The University of Sydney)

Title: Primary progressive aphasia and its challenges

Abstract: Primary progressive aphasia (PPA) comprises a group of neurodegenerative conditions that predominantly affect language function. As a result of the partial destruction of the language network, several clinical variants of PPA have been described, each of which has its own profile of linguistic deficits, distribution of brain atrophy, and molecular pathology. This unique group of conditions offers a natural paradigm to understand the neural basis of language processing and how neurodegeneration starts and spreads. I will explain the main challenges in the field and explore potential contributions of bioinformatics.

About the speaker: Dr Leyton worked as a clinical neurologist for four years before moving to Australia in 2009. He was awarded a PhD on progressive aphasias in 2013 at UNSW. In 2014, he was awarded a DVC University Postdoctoral Fellowship at the Faculty of Health Sciences, University of Sydney. His main interest is the study of aphasic manifestations caused by neurodegenerative diseases.

Monday Nov 16

Speaker: David Humphreys (Victor Chang Cardiac Research Institute)

Title: Obstacles and challenges in specialised RNA sequencing (RNA-seq) analysis

Abstract: In recent times RNA-seq has become an affordable method to profile the transcriptome of a biological sample. One of the strengths of RNA-Seq over other technologies is the ability to capture information at a nucleotide level using relatively unbiased methods without having any prior genomic information. These features have given rise to many RNA-Seq applications, and with these often arise new challenges in the bioinformatics analysis. In this presentation I will highlight the challenges (and some solutions) in small RNA-Seq, RNA editing and circular RNA analysis from high throughput sequencing data.

About the speaker: Dr David Humphreys is a multidisciplinary wet-lab-scientist/bioinformatician who manages the Genomics Core facility at the Victor Chang Cardiac Research Institute. His undergraduate training comprised a joint major in Biology and Computer Science before completing honours followed by a PhD in molecular biology. After joining the Victor Chang Cardiac Research Institute (VCCRI) he developed a research interest in gene regulation and the involvement of small non-coding RNAs. Since 2009 he has been heavily involved in studies utilising high throughput sequence technologies, which has allowed him to refocus his computer science skills. David has a number of active research collaborations with VCCRI faculty and St Vincent's Hospital cardiologists utilising RNA-Seq and exome sequencing.

Monday Nov 9

Speaker: James Burchfield (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Snapshots of diabetes

Abstract: The imaging of biological systems has become a fundamental tool in the cell biologist's arsenal. Our lab has utilised a range of imaging techniques to probe the insulin signalling network in single cells and has data that calls into question the traditional view of signalling networks. Central to the continued success of this approach is the ability to extract relevant information from this large volume of high-content image data. Whilst the analytical pipelines for large scale genomic and proteomic data have undergone a revolution in recent years, there has been a lack of development of similar tools to analyse data generated from imaging experiments. I will discuss some of the developments in this arena in the context of diabetes and insulin resistance.

About the speaker: James obtained his PhD from The University of Sydney and pursued a postdoctoral fellowship in David James' lab at the Garvan Institute of Medical Research. In 2014, James relocated with David James' lab to the Charles Perkins Centre at The University of Sydney. James is an expert in using high performance microscopy for single cell imaging.

Monday Nov 2

Speaker: Martin Wong (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Kinetic simulation of the Akt pathway in Insulin Signalling

Abstract: Traditional biological research pathways are often focused on discovery of novel protein-protein interactions. The temporal kinetics of signalling events is a feature that is not commonly investigated, but it may encode important information regarding the physical mechanisms underlying these interactions. The talk today will discuss how kinetic simulations can be used to infer these physical mechanisms. The talk will begin by discussing how models are constructed in terms of the rate equation used, the network topology and the parameter fitting procedure. The application of this will then be discussed in more detail in the context of Insulin Signalling and the Akt signalling pathway, where new insight has been obtained regarding the phosphorylation mechanism of Akt prior to the activation of its downstream substrates.

About the speaker: Martin comes from a very diverse background, having completed a Bachelor of Science majoring in Physics, and a Bachelor of Engineering in the Biomedical stream. He also completed an honours in engineering where he worked on developing a bioactive material for use in implanted devices. He is now a few months from completing his PhD under David James from the Metabolic Cybernetics Lab in the CPC, and Zdenka Kuncic from the Institute of Medical Physics at the School of Physics, where he is using mathematical modelling to interrogate the temporal aspects of insulin signalling.

Monday Oct 26

Speaker: Lake-Ee Quek (Coffey LifeLab, Charles Perkins Centre, The University of Sydney)

Title: Processing of metabolite data generated by mass spectrometry

Abstract: Metabolites are the chemical species transformed during metabolism. They are direct signatures of cellular state, and therefore easier to correlate with phenotype. With mass spectrometry, the appeal is the ability to rapidly measure thousands of metabolites from very small samples. The talk today will briefly introduce metabolomics, although the focus will be on data acquisition and processing, in the context of targeted metabolomics. Global metabolic profiling in an unbiased fashion is the ultimate aim in metabolomics, with the right analytics and bioinformatics.

About the speaker: Lake-Ee obtained his PhD in Cell Metabolism at The University of Queensland (UQ) in 2010. He was a UQ Postdoctoral Fellow from 2011 to 2014 and a Research Postdoc with A/Prof. Nigel Turner, Mitochondria Bioenergetics Lab, School of Medical Sciences, UNSW, from 2014 to 2015. He recently took up a Postdoctoral Fellowship with Coffey LifeLab at the Charles Perkins Centre and relocated to The University of Sydney.

## Seminars in 2015, Semester 1

The seminars will be held at 1:00 pm on Mondays in the Access Grid Room.

Monday June 22

Mark Greenaway (Stat, Sydney)

New tools in the R ecosystem

In the past few years, there has been an explosion of interest in data science. R has been at the forefront of this field, which has led to a lot of positive contributions to the R ecosystem from the wider tech community. Useful tools from the tech community which are available for R will be outlined, particularly focusing on GitHub, visualisation, the contributions of Hadley Wickham/RStudio, Spark and cloud computing.

Monday May 25

Paul Lin (Stat, UNSW)

Gene Expression Changes in Human Rett Syndrome Brain

Rett Syndrome (RTT) is an X-linked neurodevelopmental disorder. It affects girls at a frequency of 1 in 10,000 live births. Our study is the first transcriptome-level analysis of post-mortem RTT brain tissue with age-matched controls. We have used two technologies, RNA-seq and microarray, to replicate our findings. We have taken tissue composition into consideration, which hasn't been done in previous RTT studies; we have found that tissue composition affects the outcomes of differential expression analysis. More than 95% of classic RTT cases are caused by sporadic mutations in the gene encoding methyl-CpG binding protein 2 (MeCP2). Initial studies have pointed out a transcriptional repressor role of MeCP2; our data is consistent with recent data and confirms that MeCP2 is a transcriptional activator. We have also shown that intergenic L1 expression increases in human RTT brain. Lastly, co-expression networks will be demonstrated to identify brain region specific enhancer RNAs in the human brain. In this study, we have identified a set of Robust Brain-Expressed Enhancers (rBEEs). rBEEs are enriched for genetic variants associated with autism spectrum disorders (ASD).

Monday May 18

Statistical Analysis of a Lupus Flow Cytometry Experiment

Results from an analysis of a flow cytometry-based observational study of patients with lupus will be presented. The data was collected and analysed over a five year period at the Centenary Institute at the University of Sydney. The underlying biotechnology will be described, along with how the statistical complications associated with the data, including measurement error, missing values, and outliers, were resolved.

Monday May 11

In-silico differentiation between direct and indirect protein binding partners from a large MS-based protein interactome experiment

Protein-protein interactions (PPIs) are crucial in all cellular processes, primarily in understanding signalling cascades and protein functions. Affinity purification, followed by mass spectrometry analysis (AP/MS), offers a powerful approach for the study of complex protein-protein interactions. However, such MS-based high-throughput screens are notorious for high false discovery rates (FDR). Secondly, such screens do not allow differentiation between direct and indirect protein binders. In this study, we developed a scoring function that ranks putative binders based on their likelihood of being a direct binder using an array of features, including 3D protein structure information. We use the interactomes of eIF4E and 4EBP1 proteins implicated in insulin resistance as case studies to elucidate the principles behind the development of this scoring function. Lastly, PhosphOrtholog, a web-based tool to map orthologous post-translational modification sites on proteins across species, is demonstrated.

Monday May 4

Euijoon Ahn (IT, Sydney)

Automated Melanoma Segmentation and Classification

The segmentation of skin lesions in dermoscopic images is considered one of the most important steps in computer-aided diagnosis (CAD) for automated melanoma diagnosis. Existing methods, however, have problems with over-segmentation and do not perform well when the contrast between the lesion and its surrounding skin is low. A new automated saliency-based skin lesion segmentation (SSLS) method is proposed, which is designed to exploit the inherent properties of dermoscopic images, which have a focal central region and subtle contrast discrimination with the surrounding regions. The proposed method was evaluated on a public dataset of lesional dermoscopic images and was compared to established methods for lesion segmentation that included adaptive thresholding, Chan-based level set and seeded region growing. Results show that SSLS outperformed the other methods in regard to accuracy and robustness, in particular for difficult cases. Superpixels are also introduced.

Monday April 27

Integrative Analysis of Somatic Mutations with Focus on Biological Pathways

The development and severity of cancers depend on the somatic mutations occurring in the tissue. Technologies like whole exome and whole genome sequencing (WES/WGS) have allowed for interrogation of the somatic mutations taken on in a tumour compared to normal tissue in a patient. However, it is clear that some mutations are worse than others, leading to work in identification of genes harbouring "driver" mutations as opposed to "passenger" mutations. Further to this there is work in elucidating the role these mutations play in the system as a whole, via integration of mutation, gene expression, and network information (e.g. protein-protein interaction networks), as well as other data sources. In this seminar I will discuss my work to date on methods that aim to answer these questions, with focus on the melanoma dataset.

Monday April 20

Using Resampling to Fit Better Models

The weighted bootstrap is one of many procedures for evaluating the goodness of fit of a model. I would like to highlight how and why this work changed the way I thought about cross-validation and, most importantly, the practical impacts of using a weighted bootstrap for estimating LASSO penalty parameters. Diane Loo's work is highly relevant for anyone who has ever used cross-validation or ever plans to. I will use the prognostic melanoma data to highlight a few of the limitations of cross-validation. Time permitting, some of the issues we have faced when trying to explore and validate the weighted bootstrap will be explained. This work is currently being drafted for journal submission.

Monday April 13

Data Exploration and Subtype Discovery and Prognosis Prediction

The project aims to find an appropriate measure of association between connected brain regions in resting fMRI datasets. Potential challenges of the project are noted and some exploratory analysis on a cleaned fMRI dataset is shown.