Statistical Bioinformatics Seminar

The aim of the statistical bioinformatics seminar is to provide a forum for people working within the broad area of computation and statistics and their application to various aspects of biology to present their work and showcase their ongoing projects. It is intended to foster the exchange of ideas and build potential collaborations across multiple disciplines.

To be added to or removed from the mailing list, or for any other information, please contact Ellis Patrick or Pengyi Yang.

Seminars in 2017, Semester 1

The seminars will be held at 1:00 pm on Mondays in Charles Perkins Centre Seminar Room 1.2. Each talk runs 30 to 45 minutes, plus questions.

Monday May 1, 2017

Speaker: Tim Burykin (The Life Lab, Charles Perkins Centre)

Title: Call for data: Exploration and visualization of complex datasets with a novel method

Abstract: As a member of professional staff, I help academics at the Charles Perkins Centre visualize their data for presentation, teaching or research purposes. In the first part of the talk I will briefly demonstrate how my images and videos were used to support the narrative of high-impact presentations. I will then focus on the generic method behind these visuals and discuss its usefulness for the exploration, and potentially the in-depth analysis, of complex datasets of almost any nature. The talk is suitable for people who want to look at their data from a different angle or who are searching for a friendly yet comprehensive way to convey their work to a broader audience.

About the speaker: I received my Master of IT degree in Russia and moved to Sydney to complete a PhD in Agriculture under the supervision of Prof. John Crawford. My project was concerned with three-dimensional modelling, analysis and visualization of the soil microenvironment and leaf cellular structures. Accumulated experience in computer graphics and efficient algorithm development enabled me to join the Judith & David Coffey LifeLab at the Charles Perkins Centre as a data visualization technician.

Monday April 24, 2017

Speaker: Alistair Senior (School of Mathematics and Statistics, Charles Perkins Centre)

Title: Meta-analytic tools to detect overlooked variance effects in biological systems

Abstract: Medically, the effects of a treatment on among-individual variation in health have direct implications for personalized medicine. Ecologically, among-individual variation governs a species' niche and is the grist of evolution by natural selection. However, experimental designs and analytical paradigms in biology are heavily focused on detecting the effects of treatments on population averages. As a result, we have a comparatively poor understanding of how environments and treatments affect among-individual variation. Over the last few years I have been developing tools for meta-analysis, which allow the user to combine the results of published studies to assess the effects of treatments on variation. These methods require only those summary statistics that are reported as a matter of standard practice, and integrate easily with commonly used meta-analytic software. I will present a summary of the methodology, as well as examples of its application that are pertinent to research goals of the Charles Perkins Centre.

About the speaker: I did my undergraduate and master's degrees in the UK, where my research was primarily directed towards questions in ecology and evolution. In 2010 I moved to the University of Otago to do a PhD on gene-environment interactions in determining phenotypic sex, with Shinichi Nakagawa. During this period, I developed an interest in the development and application of hierarchical statistics to questions in biology. After graduating, in early 2014 I moved to Sydney, where I began working with Profs Simpson and Raubenheimer to apply my quantitative skills to questions in nutritional ecology.

Monday April 17, 2017 (Easter Monday)

Monday April 10, 2017

Speaker: Pengyi Yang (School of Mathematics and Statistics, Charles Perkins Centre)

Title: A dynamic multi-omic atlas of the transition from naive to primed pluripotency.

Abstract: Embryonic stem cells (ESCs) have the potential to generate virtually any differentiated cell type, to establish new models of mammalian development and to create new sources of cells for treating an enormous range of diseases. To elucidate the molecular pathways underpinning the transition from naïve to primed pluripotency, we quantified the dynamic changes in the proteome, phosphoproteome, transcriptome, and epigenome during the transition between these cellular states with high temporal resolution. We observed widespread remodelling of the cell across all regulatory layers, and yet the rate, extent and magnitude of phosphorylation changes exceed those observed on other levels, emphasising a critical role for phosphorylation in this process. Our dynamic phosphoproteomics data reveal that ERK and mTOR signalling branches dominate early and late signalling network activity respectively during the ESC to EpiLC transition. Collectively these data provide insight into the molecular processes underlying naïve and primed states, highlighting numerous potential gatekeeper mechanisms governing ESC pluripotency.

About the speaker: I obtained my PhD in bioinformatics from the School of Information Technologies, The University of Sydney, in 2012. I then moved to the United States and completed an interdisciplinary Research Fellowship in the Systems Biology Group, ESCBL, at the National Institutes of Health on characterising transcriptomic and epigenomic regulation in embryonic stem cells (ESCs) using ultrafast sequencing data. I relocated back to Australia in late 2015 on a University of Sydney Postdoctoral Fellowship (DVCR) to pursue my own research in systems biology. I’m now affiliated with the School of Mathematics and Statistics (SoMS) and the Charles Perkins Centre, The University of Sydney. I have been offered a Lectureship at USyd (April 2016) and a Discovery Early Career Researcher Award (DECRA).

Monday April 3, 2017 (Hunter Meeting)

Monday March 20, 2017

Speaker: Ellis Patrick (School of Mathematics and Statistics)

Title: Deconstructing the innate immune component of a molecular network of the aging frontal cortex.

Abstract: Alzheimer’s disease is pathologically characterized by the accumulation of neuritic β-amyloid plaques and neurofibrillary tangles in the brain and clinically associated with a loss of cognitive function. The dysfunction of microglia cells has been proposed as one of the many cellular mechanisms that can lead to an increase in Alzheimer’s disease pathology. Investigating the molecular underpinnings of microglia function could help isolate the causes of dysfunction while also providing context for broader gene expression changes already observed in mRNA profiles of the human cortex. In this talk I will lay out the various statistical approaches I have used to tackle this problem.

About the speaker: Dr Ellis Patrick is a computational biologist and applied statistician. He is currently an Early Career Development Fellow in the School of Mathematics and Statistics and a staff member at The Westmead Institute for Medical Research. He obtained his PhD in statistical bioinformatics in the School of Mathematics and Statistics at the University of Sydney. In his postdoctoral studies, he worked as a computational biologist with joint appointments at Brigham and Women's Hospital, Harvard Medical School and The Broad Institute of MIT and Harvard. He spent this time using his statistical background to investigate the molecular drivers of Alzheimer’s disease and MS. As he spends most of his time analysing large biomedical datasets, his research relies on the subtlety of translating between biological and statistical concepts to form simple, suitable and targeted statistical questions.

Seminars in 2016, Semester 2

Monday November 14, 2016

Speaker: Darya Vanichkina (Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Marvellous complexity: Exploring the mammalian transcriptome using RNA sequencing

Abstract: The complexity of the trillions of cells that comprise the mammalian body is underpinned not by their genomes, which are by definition identical, but by the temporally and physically precise expression of particular coding genes, long non-coding RNAs and small regulatory RNAs. In my talk, I will present some of the outcomes of my PhD research, which focussed on using and expanding upon developments in RNA sequencing technology and stem cell differentiation to deeply investigate transcripts in cortical and hindbrain-like neurons, and in oligodendrocyte precursor cells. I will also give an overview of my current work, which involves exploring the roles of alternative splicing in controlling gene expression, and the development of new methods of analysing splicing complexity.

About the speaker: I am a genomics data scientist at the Gene and Stem Cell Therapy Program at the Centenary Institute in Sydney, where I investigate how the mammalian genome works using next-generation sequencing. I use a combination of preexisting bioinformatics software and custom R, Python, and shell scripts to process terabytes of data on a daily basis, taking advantage of the University of Sydney's HPC facilities. I recently completed my PhD in Bioinformatics and Genomics at the Institute for Molecular Bioscience at the University of Queensland, under the supervision of Dr. Ryan Taft and Professor John Mattick. My work focused on using high-throughput sequencing to understand changes in the transcriptome that occur during neuronal functioning in normal cells and in disease, and on induced pluripotent stem cells as models of the human nervous system. For many years, I have been passionate about teaching, especially programming and bioinformatics to biologists, and was able to do a significant amount of this during my PhD studies. I am both a Software and Data Carpentry instructor. I also hold a Specialist Degree in Biochemistry with a Major in Molecular Biology from Lomonosov Moscow State University.

Monday November 7, 2016

Speaker: Weichang Yu (PhD candidate, SoMS, USyd)

Title: Semisupervised quadratic discriminant analysis using model selection and variational Bayes

Abstract: We develop a mean field collapsed variational Bayes approximation for quadratic discriminant analysis (QDA) with model selection, where we allow missing class information in the training dataset. This allows the use of unlabelled data to build the classifier and the identification of strong predictors. We demonstrate using simulated and real datasets that this leads to a reduction in prediction error even in cases where the within-class dispersion is large. We make two contributions: a computationally cheaper alternative to Markov chain Monte Carlo, with comparable results, for Bayesian inference in QDA; and a Bayesian framework for performing model selection in QDA.

About the speaker: I am a first-year PhD student at the University of Sydney working with Dr John Ormerod. My research interests include variational approximation, model selection and the use of predictive algorithms in medical and bio-informatics.

Monday October 24, 2016 (ABACBS, No seminar)

Monday October 17, 2016

Speaker: Jason Wong (Group Leader, Bioinformatics and Integrative Genomics, UNSW)

Title: Gaining fundamental insights into DNA repair-chromatin interactions through cancer genomics

Abstract: Mutations form in the genome through the interplay of DNA lesion formation and incomplete DNA repair. With the advent of cancer genomics, and particularly whole cancer genome sequencing, cancer somatic mutations provide us with a window through which we can look into how mutational and DNA repair processes function within human cells. In this talk, I will discuss how we have used whole cancer genome sequencing data to discover a novel biological process. Using publicly available data, we showed that transcription factor binding at active gene promoters can impair nucleotide excision repair (NER), thereby resulting in prevalent mutation hotspots at gene promoters in NER-dependent cancers such as skin and lung cancers. I will further discuss the implications of this biological process for cancer development and the impact of our study on the interpretation of functional mutations in cancer.

About the speaker: Dr Wong is an ARC Future Fellow at the Prince of Wales Clinical School, UNSW, and leads the Bioinformatics and Integrative Genomics Team at the Lowy Cancer Research Centre. He received his B.Sc (Hons I) from the University of Sydney and was awarded a D.Phil in Bioanalytical Chemistry at the University of Oxford, UK in 2007. This was followed by a post-doctoral fellowship at University College Dublin, Ireland, before returning to Sydney to join UNSW. To date, he has published 65 peer-reviewed journal articles, with senior authorship in journals including Nature, Genome Biology, Molecular Cancer Research and Nucleic Acids Research. He has attracted over $2 million in research funding as lead investigator from the ARC, Cancer Australia and Cancer Institute NSW. His current research is focused on the study of mutational processes in cancer and their influence on gene regulation and function.

Monday September 19, 2016

Speaker: Fatemeh Vafaee (CPC, USyd)

Title: Determination of circulating microRNA markers of colorectal cancer prognosis by a novel network-based multi-objective optimisation routine

Abstract: Colorectal cancer represents a significant cause of cancer-related death, and effective treatments that maximise quality of life as well as cancer-related outcomes are therefore of major importance. Determining the appropriate treatment pathway through a personalised medicine paradigm is a prime goal, and so biomarkers are sought to aid in the decision-making process. In the age of high-throughput technologies, molecular markers are particularly attractive as a means of achieving true personalisation of cancer treatment. We have recently evaluated the role of circulating microRNA as a means of predicting patients’ prognosis and developed an innovative multi-objective network-based optimisation method to identify robust microRNA signatures which are reliable in terms of predictive power and functional relevance. In this talk, I will go through the details of the proposed method. To identify potential collaboration opportunities with the audience, I will also give a concise and general overview of my research interests and projects.

About the speaker: Dr Fatemeh Vafaee received her PhD in Artificial Intelligence from the University of Illinois at Chicago in 2011. Her doctoral studies involved multiple projects in the domains of optimisation, machine learning, data mining, pattern recognition, and probabilistic graphical models, with a focus on theoretical and applied genetic algorithms as her PhD thesis. While pursuing her PhD, Fatemeh also collaborated with the University’s Computational Biology Laboratory and extended her research to biological applications such as cellular network alignment and phylogeny reconstruction. After her PhD, Fatemeh started a postdoctoral research position at the University of Toronto, Ontario Cancer Institute, one of the largest cancer research centres in Canada and worldwide. During her postdoc, Fatemeh had the privilege to work in a highly trans-disciplinary environment and collaborate with world-renowned scholars in integrative cancer informatics. In 2013, Fatemeh took a Research Fellow position at the Charles Perkins Centre and School of Maths & Stats at the University of Sydney. Her research relies on a wide national and international collaboration network and she has published several papers in competitive peer-reviewed proceedings and top-tier journals such as Nature Methods, Scientific Reports, BMC Systems Biology, PLOS ONE and Alzheimer's & Dementia.

Monday September 12, 2016

Speaker: Chendong Ma (SoMS, USyd)

Title: Honours practice talk

Monday September 5, 2016

Speaker: Shila Ghazanfar (SoMS, USyd)

Title: Integrated Single Cell Data Analysis Reveals Cell-Specific Networks and Novel Coactivation Markers

Abstract: Large scale single cell transcriptome profiling has exploded in recent years and has enabled unprecedented insight into the behavior of individual cells. Identifying genes with high levels of expression using data from single cell RNA sequencing can be useful to characterize very active genes and cells in which this occurs. In particular single cell RNA-Seq allows for cell-specific characterization of high gene expression, as well as gene coexpression. We offer a versatile modeling framework to identify transcriptional states as well as structures of coactivation for different neuronal cell types across multiple datasets. We employed a gamma-normal mixture model to identify active gene expression across cells, and used these to characterize markers for olfactory sensory neuron cell maturity, and to build cell-specific coactivation networks. We found that combined analysis of multiple datasets results in more known maturity markers being identified, as well as pointing towards some novel genes that may be involved in neuronal maturation. We also observed that the cell-specific coactivation networks of mature neurons tended to have a higher centralization network measure than immature neurons. Integration of multiple datasets promises to bring about more statistical power to identify genes and patterns of interest. We found that transforming the data into active and inactive gene states allowed for more direct comparison of datasets, leading to identification of maturity marker genes and cell-specific network observations, taking into account the unique characteristics of single cell transcriptomics data.

About the speaker:

Monday August 29, 2016

Speaker: Kevin Wang (SoMS, USyd)

Title: An adjustment method for gene set over-representation in boutique arrays

Monday August 22, 2016

Speaker: Cali Willet (Sydney Informatics Core Research Facility, USyd)

Title: Bioinformatics services and training

Abstract: An overview of the bioinformatics services and training available through the Sydney Informatics Core Research Facility

About the speaker: Cali Willet is a bioinformatics technician for the Sydney Informatics Core Research Facility at the University of Sydney. She completed her PhD in animal genomics and computational biology in the Faculty of Veterinary Science at the same institution. She is interested in the genetics of disease, particularly in companion and endangered animals, and in the development of bioinformatics methodologies tailored for causal locus identification in non-model organisms. As a bioinformatician for the Core Research Facilities, she is focused on providing support to bioinformatics research groups in the form of consultation, training and advocating for the needs of bioinformatics and computational biology groups at the University of Sydney.

Monday August 15, 2016 (Cancelled)

Monday August 8, 2016

Speaker: Ulf Schmitz (Research Officer, Gene & Stem Cell Therapy Program, Centenary Institute)

Title: Intron retention redefines post-transcriptional gene regulation in mammalian and vertebrate species

Abstract: Intron retention (IR) occurs when the splicing machinery fails to excise introns from primary transcripts. This may give rise to diverse downstream effects; most often, however, it induces nonsense-mediated decay (NMD) of the intron-retaining transcript. We performed a phylogenetic analysis of IR in human, mouse, dog, chicken, and zebrafish granulocytes. We found evidence that IR affects functionally related genes in granulocytes throughout evolution, many of which are orthologs. We also found a strong anti-correlation between the number of intron-retaining genes and the number of protein coding genes in a genome. Retained introns have similar characteristics in all investigated species. They are shorter and have a higher GC content than their non-retained counterparts; they often reside near the 3' end of a transcript and are enriched in premature termination codons. Their host genes harbour a larger number of miRNA binding sites in their 3' untranslated region and are often co-regulated in human and mouse. Our results suggest that IR is a global control mechanism affecting similar biological processes independent of specific effector genes. More importantly, we gained new insights that support the notion of IR as an independent mechanism of post-transcriptional gene regulation that supplements and maybe even cooperates with other forms of post-transcriptional gene regulation.

About the speaker: Ulf Schmitz is a post-doctoral researcher at the Centenary Institute in Sydney. His research focuses on the design of integrative workflows combining various computational disciplines with experimentation to approach molecular biological and medical problems. Between 2003 and 2015, Ulf Schmitz worked as a systems engineer and later as a bioinformatician at the Department of Systems Biology & Bioinformatics, University of Rostock, Germany. He was awarded his PhD in Bioinformatics in June 2015. Thereafter, he joined Prof John Rasko’s Gene and Stem Cell Therapy Program as a bioinformatics research officer. In January 2016, he was appointed as Conjoint Senior Lecturer at the Centenary Institute and the Sydney Medical School.

Monday August 1, 2016

Speaker: Dario Strbenac (Senior Research Associate, Statistical Bioinformatics Group, USyd)

Title: Interactive Benchmarking of Quantitative Proteomics Preprocessing Alternatives

Abstract: Mass spectrometry has long been used to analyse biological samples and find associations of altered proteins with experimental conditions. However, the focus of previous method evaluation efforts has been on the peptide amino acid sequence determination problem. Here, using a replicated Latin squares experimental design, the first comprehensive comparison of the effect of preprocessing alternatives on the bias and variance of protein quantitation is made. Surprisingly, the variability between iTRAQ labels is larger than between different runs of the instrument. This has consequences for researchers who don't adequately incorporate randomisation and blocking in their proteomic experimental designs. Secondly, the default preprocessing done by the vendor software ProteinPilot outperforms more advanced methods, such as linear models and RUV, in terms of recovering the expected fold changes (bias). Thirdly, comparing the measurements of different proteins within a sample is shown to be feasible, which was previously assumed to be inaccurate and always avoided. Finally, a benchmarking Shiny application will be demonstrated, which allows users to upload their own preprocessing of the raw data, and see how their method compares to other methods in an interactive scoreboard.

Monday July 25, 2016

Speaker: Joshua Ho (Head, Bioinformatics and Systems Medicine Laboratory, VCCRI)

Title: A systems approach to study organ development and congenital disease

Abstract: A systems biology approach is now being widely employed to systematically study how molecular and signalling pathways are regulated in organ development in humans and relevant animal models through bioinformatics. The overarching premise is that the integration of high-quality causal gene regulatory networks (GRNs), genomics, epigenomics, transcriptomics and other genome-wide data will greatly accelerate the discovery of the genetic causes of congenital diseases that stem from dysregulation of GRNs during organ development. In this talk I will discuss the latest advances in GRN inference and analysis using large amounts of experimentally determined perturbation data, and how we can use GRNs to study organ development and congenital diseases.

About the speaker: Dr Joshua Ho completed a BSc (Hon 1, Medal) in Biochemistry and Computer Science in 2006 and a PhD in Bioinformatics in 2010, both from the University of Sydney. He then completed an interdisciplinary postdoctoral fellowship at the Harvard Medical School (HMS), and was promoted to an Instructor in Medicine in 2012. In 2013, he returned to Australia to set up the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute. Joshua is also an NHMRC/National Heart Foundation Career Development Fellow, and a conjoint senior lecturer at UNSW. In 2015, he was awarded the NSW Ministerial Award for Rising Stars in Cardiovascular Research, and the Australian Epigenetics Alliance’s Illumina Early Career Research Award. His research focuses on developing fast and reliable bioinformatics methods to identify the genetic cause of inherited heart diseases, using a range of approaches such as whole genome sequencing, machine learning, systems biology, cloud computing, and software testing and quality assurance. Joshua has published over 48 papers, including first-author publications in Nature, Science Signaling, and PLoS Genetics. He is also currently the Secretary of the Australian Bioinformatics and Computational Biology Society (ABACBS).

Seminars in 2016, Semester 1

Monday June 20, 2016

Speaker: Rima Chaudhuri (Metabolic Cybernetics Lab, CPC, USyd)

Title: Understanding the relationship between AKT recruitment and GLUT4 translocation to the plasma membrane in fat cells through single cell microscopy data analysis.


About the speaker: Dr. Chaudhuri was awarded her PhD in Bioinformatics from the University of Illinois, USA in 2010. Her doctoral thesis was on the discovery and design of drugs for the treatment of SARS coronavirus and Hepatitis C virus through computational modeling. While pursuing her doctorate, she worked as a researcher in software and pharmaceutical companies such as Blackbaud Inc. and Pfizer Inc. (USA), developing modules of scientific research software. Dr. Chaudhuri pursued her postdoctoral training in biophysical simulations at the Parc Cientific de Barcelona (PCB) as a joint affiliate between the Institute for Research in Biomedicine and the Barcelona Supercomputing Center in Barcelona. She holds two international patents in the field of drug discovery and design. After her year-long post-doc in Spain, she moved to Sydney in 2011 and joined the Garvan Institute of Medical Research in the laboratory of Prof. David E. James to work on systems biology based approaches to unravel the complexities behind the incidence of metabolic diseases such as diabetes and obesity. Her strength lies in interdisciplinary research and bridging the gap between computational and basic sciences. She is currently a research fellow at the Charles Perkins Centre at the University of Sydney. Her current research interests include isolating candidate bio-markers of T2D and obesity from molecular expression profiles, understanding and targeting protein-protein interactions in disease to facilitate a cure, and integrating multi-dimensional data from different platforms (transcriptomics, proteomics, interactomics and metabolomics) to acquire a precise picture of the diseased cell.

Monday June 13, 2016 (Queen's Birthday)

Monday June 6, 2016

Speaker: Ashnil Kumar (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: Computing Image Similarity for Image-Derived Disease Models

Abstract: Imaging is a critical and indispensable component of modern healthcare. The automated analysis of medical images has a vast range of applications in evidence-based diagnosis, physician education, and biomedical research. These decision support applications are predicated on the ability to objectively compute the similarity of image content in a manner that matches the subjective similarity judgement of human domain experts. In this talk, I will present an overview of the conceptual challenges in this field before detailing my research on methods for characterising and comparing the visual content of images, including a graph-based method for comparing 3D PET-CT lung cancer images and my more recent work using convolutional neural networks.

About the speaker: Dr. Ashnil Kumar received his Ph.D. degree in information technology from the University of Sydney in 2013; his PhD introduced a new graph-based method for modelling the relationships between tumours and organs in medical images.

Monday May 30, 2016

Speaker: Vinita Deshpande (Metabolic Cybernetics Lab, CPC, USyd)

Title: Removing unwanted variation in large scale ‘omics datasets containing missing values

Abstract: Transcriptomics and proteomics are powerful techniques to obtain a comprehensive snapshot of biological systems ranging from cells to whole organisms. However, a major problem for such big datasets is the presence of missing values, as many statistical tools used to analyse these often require complete data. One such bioinformatic tool is RUV (Removing Unwanted Variation), a widely used R package developed to remove technical variation, such as batch effects, in order to normalise the data and perform downstream analyses such as differential expression analysis.

One solution to this issue of missing data is to obtain a complete dataset, by either filtering the data to eliminate missing values or performing imputation. These approaches, however, can greatly reduce the sample size or biological variation, leading to loss of statistical power. The first part of this talk will describe an alternative approach in which the RUV algorithm was adapted to handle data with missing values as its input. The performance of this new algorithm was evaluated in terms of its ability to normalise and correctly identify differentially expressed genes/proteins in large ’omics datasets containing varying amounts of simulated missing values. The second part of this talk will be a discussion of the future directions and challenges of this PhD project in terms of designing and conducting further quantitative analyses on large scale ’omics data.

About the speaker: Vinita is a PhD student supervised by Prof David James and Prof Jean Yang at The University of Sydney, where she is pursuing her research interests in the application of systems biology and bioinformatic approaches to metabolic diseases. Vinita has previously completed a Bachelor of Science (Bioinformatics) / Bachelor of Information Technology (Computer Science) with Honours from The University of Sydney. Prior to commencing her PhD, she worked as a bioinformatics research assistant with Dr Joshua Ho in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute in Sydney.

Monday May 23, 2016

Speaker: Ashley Waardenberg (Children's Medical Research Institute; Sydney Medical School, USyd)

Title: Discovering Protein-Protein Interactions from DNA sequence - insights into the cardiac gene regulatory network and disease

Abstract: NKX2-5 is a key transcription factor (TF) required for normal heart development and is implicated in a range of cardiac diseases. It binds directly to DNA by recognising a specific sequence called the NKX2-5 binding element (NKE). However, until recently its genomic targets were poorly defined, and the NKX2-5 protein-protein interaction network remains poorly defined. Recently we identified genomic target regions for NKX2-5 and human disease relevant mutations in cultured HL-1 cardiomyocytes using the DamID method and identified new NKX2-5 disease mechanisms (Bouveret R, Waardenberg AJ, et al. eLife, 2015). This talk describes our efforts to predict and subsequently validate novel protein-protein interactions (PPIs) based on recurrent binding sites (or motif grammar) through the application of machine learning algorithms.

About the speaker: Dr Ashley Waardenberg is currently a postdoctoral bioinformatician at the Children's Medical Research Institute, Westmead, where he is developing systems biology approaches for investigating proteomics and high throughput protein modification data related to the brain and associated diseases, in collaboration with Dr Mark Graham and Prof Phil Robinson. He received a PhD in Systems Biology (2012) under the supervision of A/Prof Christine Wells (now at the University of Glasgow, Scotland) and Dr Brian Dalrymple (CSIRO, Australia), where he developed a novel visualisation approach for viewing gene expression data specifically in the context of striated muscle contractile protein location. A key outcome of his PhD was the discovery of a new protein-protein interaction between PI3K and a muscle mechano-sensor in the heart, implicating the muscle contractile apparatus in responding to cardiac stress, which has broader implications in the context of PI3K cancer therapies (Waardenberg, et al. Journal of Biological Chemistry, 2011). During his PhD, he was also involved in the Bovine Genome Consortium, whose work was published in Science in 2009, and was a team recipient of the CSIRO Chairman's Medal in 2010 for contributions to this international effort. He then joined the Cardiac Developmental and Stem Cell Biology Laboratory of Professor Richard Harvey at the Victor Chang Cardiac Research Institute, Darlinghurst, as a Postdoctoral Scientist to gain a deeper insight into developmental biology, furthering an interest in understanding the origins of disease. There he implemented systems biology strategies for understanding genome-wide binding effects of the cardiac transcription factor NKX2-5 and NKX2-5 mutations relevant to congenital heart disease. This has resulted in a number of recent publications (Waardenberg AJ, Ramialison M et al. Cold Spring Harbor Perspectives in Medicine, 2014; Bouveret R, Waardenberg AJ et al. eLife, 2015; Waardenberg AJ et al. BMC Bioinformatics, 2015) and he continues to collaborate with the Victor Chang Cardiac Research Institute.

Ashley is also a founding member and Vice-President of the Australian Bioinformatics and Computational Biology Society (ABACBS). Ashley has been heavily involved in establishing this very young society and is passionate about establishing communities in this domain.

Monday May 16, 2016

Speaker: Timur Burykin (Judith and David Coffey Life Lab, CPC, USyd)

Title: Data visualization and exploration using particle dynamics simulation

Abstract: Exploration of complex multidimensional datasets is an ongoing challenge in many fields of research. In an attempt to simplify this task for people with no expertise in advanced statistics or programming, a novel method of data visualization was discovered. The algorithm applies simple particle interaction rules to data points and allows them to self-organize into layouts that approximate the clustering of objects in the multidimensional space. A complementary density map, superimposed network connectivity and configurable node properties linked to extra dimensions make this visualization method suitable for a wide range of applications. A few datasets will be demonstrated in this presentation, including hospital admission records, a TF-TG interaction network and results of diet experiments. The extension of the algorithm to advanced image and network analysis will also be discussed.

About the speaker: Tim Burykin is an experienced C++ programmer who joined Charles Perkins Centre last year as a data visualization technician and a member of Judith & David Coffey LifeLab supervised by Prof. Zdenka Kuncic. He received a Master of IT degree in Russia and moved to Sydney to complete a PhD course in Agriculture under the supervision of Prof. John W. Crawford.

Monday May 9, 2016

Speaker: Denis Bauer (Team leader Transformational Bioinformatics, CSIRO)

Title: VariantSpark: applying Spark-based machine learning methods to genomic information

Abstract: Genomic information is increasingly being used for medical research, giving rise to the need for efficient analysis methodology able to cope with thousands of individuals and millions of variants. Catering for this need, we developed VariantSpark, a Hadoop/Spark framework that utilises the machine learning library, MLlib, thereby providing the means of parallelisation for population-scale bioinformatics tasks. VariantSpark offers an interface to the standard variant format (VCF) and seamless genome-wide sampling of variants, and provides a pipeline for visualising results. To demonstrate the capabilities of VariantSpark, we cluster more than 3,000 individuals with 80 million variants each to determine the population structure in the dataset. VariantSpark is 80% faster than the Spark-based genome clustering approach developed by the Global Alliance for Genomics and Health, ADAM, the comparable implementation using Hadoop/Mahout, as well as Admixture, a commonly used tool for determining individual ancestries. It is over 90% faster than traditional implementations using R and Python. These benefits of speed, resource consumption and scalability enable VariantSpark to open up the usage of advanced, efficient machine learning algorithms to genomic data.

About the speaker: Dr. Denis Bauer is the team leader of the transformational bioinformatics team in CSIRO’s ehealth program. Her expertise is in high throughput genomic data analysis, computational genome engineering, as well as Spark/Hadoop and high-performance compute systems. She has a PhD in Bioinformatics and has done her Postdoctoral training in machine learning and human genetics, respectively. Her collaborators include Prof Simon Foote on mammalian susceptibility to infectious diseases, Prof Ian Blair on molecular mechanisms of motor neuron disease, and Prof Rodney Scott on obesity-driven cancer. She has 23 peer-reviewed publications (9 first author, 4 senior author) with three in journals of IF>8 (e.g. Nat Genet.) and H-index 9. To date she has attracted more than AU$25 million in funding.

Monday May 2, 2016 (Florian and Falk farewell. No seminar.)

Monday April 25, 2016 (ANZAC Day)

Monday April 18, 2016

Speaker: Michael De Ridder (The Institute of Biomedical Engineering and Technology (BMET), SIT, USyd)

Title: CereVA: A Visual Analytics Framework for Neurological Disorder Analysis with Functional Magnetic Resonance Imaging

Abstract: Functional Magnetic Resonance Imaging (fMRI) is an important imaging modality for understanding and diagnosing neurological disorders, such as schizophrenia, bipolar disorder and Alzheimer's disease. The modality temporally scans blood oxygenation as a proxy for neuronal activity. This activity is often processed into three components for analysis: (i) the anatomical context; (ii) individual voxel and region (group of voxel) time-series; and (iii) the correlation of activity between regions. While many statistical and graph theoretical approaches have been applied to the data, issues such as noise and a lack of understanding of the brain lead to a diverse range of challenges. Visualisation-based analytics is often used to overcome some of these challenges; however, current methods often present an oversimplification of the data. With CereVA, we integrate all three of the commonly derived activity components in a visual analytics framework comprising a full-scale pipeline that incorporates automatic image processing and interactive visualisation. Finally, we present a new application for fMRI visual analytics by applying CereVA to the active research area of classifying neurological disorders.

About the speaker: Michael de Ridder is a PhD student with The Institute of Biomedical Engineering and Technology (BMET) in the School of Information Technologies at the University of Sydney. He is supervised by A/Prof Jinman Kim. Michael's work straddles the boundary of scientific and information visualisation with a heavy influence from medical imaging techniques.

Monday April 11, 2016 (Hunter Meeting)

Monday April 4, 2016

Speaker: Taiyun Kim (Victor Chang Cardiac Research Institute, and UNSW)

Title: PAD: An interactive web portal for analysis of transcription factor co-binding at promoters and enhancers

Abstract: It has long been observed that transcription factors (TFs) bind to DNA collaboratively with other TFs as co-binding partners. Recently, through studying the genomic binding sites of the essential embryonic stem cell TF NF-Y, Dr Pengyi Yang has shown that the same TF may bind DNA with different co-binding partners if we consider TF binding sites that are proximal or distal to transcription start sites separately. Based on this observation, we have developed a database of binding sites of >200 TFs in mouse embryonic stem cells, and an interactive web portal that enables any user-submitted TF binding profiles to be clustered and visualised with our database TF profiles, at the proximal and distal regions separately. Our tool contributes to our understanding of how gene regulation occurs via combinatorial binding of TFs in different cell types.

About the speaker: Taiyun Kim is a 5th year student in the Bachelor of Engineering (Bioinformatics)/Masters of Biomedical Engineering program at the University of New South Wales. In 2015, he was awarded a Summer Scholarship to work in the Bioinformatics and Systems Medicine Laboratory at the Victor Chang Cardiac Research Institute (VCCRI), under the supervision of Dr Joshua Ho (VCCRI) and Dr Pengyi Yang (University of Sydney).

Monday Feb 1, 2016

Speaker: Lei Sun (School of Information Engineering, Yangzhou University, China)

Title: Study on long noncoding RNAs using computational methods

Abstract: Tens of thousands of newly discovered long noncoding RNAs (lncRNAs) have been attracting attention in the life sciences for some time as their important biological functions are increasingly revealed. Due to the intrinsic complexity of lncRNA functions and mechanisms, our group proposes to study lncRNAs using a series of computational methods, which can improve research efficiency. In my talk, I would like to share some ideas on the results of lncRNA prediction using a support vector machine (SVM), and to discuss potential lncRNA-specific transcriptional patterns detected using computational methods.

About the speaker: Dr. Lei Sun received a doctor of engineering degree from China University of Mining and Technology in 2013. He is now a lecturer in the School of Information Engineering at Yangzhou University, P.R. China. His research interests include bioinformatics, signal and information processing. As a visiting PhD student, Dr. Sun previously did research on bioinformatics at several institutes and universities, including the School of IT at The University of Sydney, the Institute of Molecular Bioscience (IMB) at the University of Queensland, and the Beijing Institute of Genomics (BIG) of the Chinese Academy of Sciences.

Seminars in 2015, Semester 2

The seminars will be held at 1:00 pm on Monday in Access Grid Room, which is on level 8 of Carslaw Building. The format of the talk is 30~45 minutes plus questions.

Monday Nov 30

Speaker: Shila Ghazanfar (SMS, Faculty of Sciences, The University of Sydney)

Title: Gene coexpression identification from single-cell expression experiments

Abstract: Classically, gene expression profiles have represented an aggregate of expression levels of each of the multitude of cells within the sample of interest. More recently, technologies utilising quantitative PCR, such as nanoString, enable measurement of expression in individual cells as opposed to an amalgamation of cells. As such, using this data along with appropriate statistical models, we can ask questions such as in what proportion of cells certain genes are expressed, and we can determine the distribution of coexpression of genes among these cells. In collaboration with Associate Professor David Lin at Cornell University, whose interest lies in investigating the olfactory system in mouse models, a set of neuronal cells was assayed with special interest in the protocadherin family of genes. We describe the statistical methods for processing the single-cell expression data and identifying coexpression of genes in subsets of the cell population, including mixture modelling and visualisation techniques for further insight.

About the speaker: Shila Ghazanfar is a 3rd year PhD student and Postgraduate Teaching Fellow in the School of Mathematics and Statistics at The University of Sydney. She is supervised by A/Prof Jean Yee Hwa Yang (The University of Sydney), Dr John Ormerod (The University of Sydney) and Dr Michael Buckley (CSIRO). Her research interests are in analysing high throughput sequence data such as RNA-Seq and Exome-Seq, and integrating different types of high-throughput data. She has previously completed a Bachelor of Science (Advanced Mathematics) and Honours in Statistics at the University of Sydney.

Monday Nov 23

Speaker: Cristian Leyton (Faculty of Health Sciences, The University of Sydney)

Title: Primary progressive aphasia and its challenges

Abstract: Primary progressive aphasia (PPA) comprises a group of neurodegenerative conditions that affect predominantly the language function. As a result of the partial destruction of the language network, several clinical variants of PPA have been described, each of which has its own profile of linguistic deficits, distribution of brain atrophy, and molecular pathology. This unique group of conditions offers a natural paradigm to understand the neural basis of language processing and how neurodegeneration starts and spreads. I will explain the main challenges in the field and explore potential contributions of bioinformatics to the field.

About the speaker: Dr Leyton worked as a clinical neurologist for four years before moving to Australia in 2009. He was awarded a PhD on progressive aphasias in 2013 at UNSW. In 2014, he was awarded a DVC University Postdoctoral Fellowship at the Faculty of Health Sciences, University of Sydney. His main interest is the study of aphasic manifestations caused by neurodegenerative diseases.

Monday Nov 16

Speaker: David Humphreys (Victor Chang Cardiac Research Institute)

Title: Obstacles and challenges in specialised RNA sequencing (RNA-seq) analysis

Abstract: In recent times RNA-seq has become an affordable method to profile the transcriptome of a biological sample. One of the strengths of RNA-Seq over other technologies is the ability to capture information at a nucleotide level using relatively unbiased methods without having any prior genomic information. These features have given rise to many RNA-Seq applications, and with these often arise new challenges in the bioinformatics analysis. In this presentation I will highlight the challenges (and some solutions) in small RNA-Seq, RNA editing and circular RNA analysis from high throughput sequencing data.

About the speaker: Dr David Humphreys is a multidisciplinary wet-lab-scientist/bioinformatician who manages the Genomics Core facility at the Victor Chang Cardiac Research Institute. His undergraduate training comprised a joint major in Biology and Computer Science before completing honours followed by a PhD in molecular biology. After joining the Victor Chang Cardiac Research Institute (VCCRI) he developed a research interest in gene regulation and the involvement of small non-coding RNAs. Since 2009 he has been heavily involved in studies utilising high throughput sequence technologies, which has allowed him to refocus his computer science skills. David has a number of active research collaborations with VCCRI faculty and St Vincent's Hospital cardiologists utilising RNA-Seq and exome sequencing.

Monday Nov 9

Speaker: James Burchfield (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Snapshots of diabetes

Abstract: The imaging of biological systems has become a fundamental tool in the cell biologist's arsenal. Our lab has utilised a range of imaging techniques to probe the insulin signalling network in single cells and has data that throws into question the traditional view of signalling networks. Central to the continued success of this approach is the ability to extract relevant information from this large volume of high-content image data. Whilst the analytical pipelines for large scale genomic and proteomic data have undergone a revolution in recent years, there has been a lack of development of similar tools to analyse data generated from imaging experiments. I will discuss some of the developments in this arena in the context of diabetes and insulin resistance.

About the speaker: James obtained his PhD from The University of Sydney and pursued a postdoctoral fellowship in David James' lab at the Garvan Institute of Medical Research. In 2014, James relocated with David James' lab to the Charles Perkins Centre at the University of Sydney. James is an expert in using high performance microscopy for single cell imaging.

Monday Nov 2

Speaker: Martin Wong (Metabolic Cybernetics Lab, Charles Perkins Centre, The University of Sydney)

Title: Kinetic simulation of the Akt pathway in Insulin Signalling

Abstract: Traditional biological research pathways are often focused on discovery of novel protein-protein interactions. The temporal kinetics of signalling events is a feature that is not commonly investigated, but it may encode important information regarding the physical mechanisms underlying these interactions. The talk today will discuss how kinetic simulations can be used to infer these physical mechanisms. The talk will begin by discussing how models are constructed in terms of the rate equation used, the network topology and the parameter fitting procedure. The application of this will then be discussed in more detail in the context of Insulin Signalling and the Akt signalling pathway, where new insight has been obtained regarding the phosphorylation mechanism of Akt prior to the activation of its downstream substrates.

About the speaker: Martin comes from a very diverse background, having completed a Bachelor of Science majoring in Physics, and a Bachelor of Engineering in the Biomedical stream. He also completed an honours in engineering where he worked on developing a bioactive material for use in implanted devices. He is now a few months from completing his PhD under David James from the Metabolic Cybernetics Lab in the CPC, and Zdenka Kuncic from the Institute of Medical Physics at the School of Physics, where he is using mathematical modelling to interrogate the temporal aspects of insulin signalling.

Monday Oct 26

Speaker: Lake-Ee Quek (Coffey LifeLab, Charles Perkins Centre, The University of Sydney)

Title: Processing of metabolite data generated by mass spectrometry

Abstract: Metabolites are the chemical species transformed during metabolism. They are direct signatures of cellular state, and therefore easier to correlate with phenotype. With mass spectrometry, the appeal is the ability to rapidly measure thousands of metabolites from very small samples. The talk today will briefly introduce metabolomics, although the focus will be on data acquisition and processing, in the context of targeted metabolomics. Global metabolic profiling in an unbiased fashion is the ultimate aim in metabolomics, with the right analytics and bioinformatics.

About the speaker: Lake-Ee obtained his PhD in Cell Metabolism at The University of Queensland (UQ) in 2010. He was a UQ Postdoctoral Fellow from 2011 to 2014 and a Research Postdoc with A/Prof. Nigel Turner, Mitochondria Bioenergetics Lab, School of Medical Sciences, UNSW, from 2014 to 2015. He recently took up a Postdoctoral Fellowship with Coffey LifeLab in Charles Perkins Centre and relocated to The University of Sydney.

Seminars in 2015, Semester 1

The seminars will be held at 1:00 pm on Monday in Access Grid Room.

Monday June 22

Mark Greenaway (Stat, Sydney)

New tools in the R ecosystem

In the past few years, there has been an explosion of interest in data science. R has been at the forefront of this field, which has led to a lot of positive contributions to the R ecosystem from the wider tech. community. Useful tools from the tech. community which are available for R will be outlined, particularly focusing on GitHub, visualisation, the contributions of Hadley Wickham/RStudio, Spark and cloud computing.

Monday May 25

Paul Lin (Stat, UNSW)

Gene Expression Changes in Human Rett Syndrome Brain

Rett Syndrome (RTT) is an X-linked neurodevelopmental disorder. It affects girls at a frequency of 1 out of 10000 live births. Our study is the first transcriptome-level analysis of post-mortem RTT brain tissue with age-matched controls. We have used two technologies, RNA-seq and micro-array, to replicate our findings. We have taken tissue composition into consideration, which hasn't been done in previous RTT studies; we have found that tissue composition affects the outcomes of differential expression analysis. More than 95% of classic RTT cases are caused by sporadic mutations in the gene encoding methyl-CpG binding protein 2 (MeCP2). Initial studies have pointed out a transcriptional repressor role of MeCP2; our data is consistent with recent data and confirms that MeCP2 is a transcriptional activator. We have also shown that intergenic L1 expression increases in human RTT brain. Lastly, co-expression networks will be demonstrated to identify brain region specific enhancer RNAs in the human brain. In this study, we have identified a set of Robust Brain-Expressed Enhancers (rBEEs). rBEEs are enriched for genetic variants associated with autism spectrum disorders (ASD).

Monday May 18

John Ormerod (Stat, Sydney)

Statistical Analysis of a Lupus Flow Cytometry Experiment

Results from an analysis of a flow cytometry-based observational study of patients with lupus will be presented. The data was collected and analysed over a five year period at the Centenary Institute at the University of Sydney. The underlying biotechnology will be described, along with how the statistical complications associated with the data, including measurement error, missing values, and outliers, were resolved.

Monday May 11

Rima Chaudhuri (CPC, Sydney)

In-silico differentiation between direct and indirect protein binding partners from a large MS-based protein interactome experiment

Protein-protein interactions (PPIs) are crucial in all cellular processes, primarily in understanding signalling cascades and protein functions. Affinity purification, followed by mass spectrometry analysis (AP/MS), offers a powerful approach for the study of complex protein-protein interactions. However, such MS-based high-throughput screens are notorious for high false discovery rates (FDR). Secondly, such screens do not allow differentiation between direct and indirect protein binders. In this study, we developed a scoring function that ranks putative binders based on their likelihood of being a direct binder using an array of features, including 3D protein structure information. We use the interactomes of eIF4E and 4EBP1 proteins implicated in insulin resistance as case studies to elucidate the principles behind the development of this scoring function. Lastly, PhosphOrtholog, a web-based tool to map orthologous post-translational modification sites on proteins across species, is demonstrated.

Monday May 4

Euijoon Ahn (IT, Sydney)

Automated Melanoma Segmentation and Classification

The segmentation of skin lesions in dermoscopic images is considered as one of the most important steps in computer-aided diagnosis (CAD) for automated melanoma diagnosis. Existing methods, however, have problems with over-segmentation and do not perform well when the contrast between the lesion and its surrounding skin is low. A new automated saliency-based skin lesion segmentation (SSLS) method is proposed, which is designed to exploit the inherent properties of dermoscopic images, which have a focal central region and subtle contrast discrimination with the surrounding regions. The proposed method was evaluated on a public dataset of lesional dermoscopic images and was compared to established methods for lesion segmentation that included adaptive thresholding, Chan-based level set and seeded region growing. Results show that SSLS outperformed the other methods in regard to accuracy and robustness, in particular, for difficult cases. Superpixels are also introduced.

Monday April 27

Shila Ghazanfar (Stat, Sydney)

Integrative Analysis of Somatic Mutations with Focus on Biological Pathways

The development and severity of cancers depend on the somatic mutations occurring in the tissue. Technologies like whole exome and whole genome sequencing (WES/WGS) have allowed for interrogation of the somatic mutations taken on in a tumour compared to normal tissue in a patient. However, it is clear that some mutations are worse than others, leading to work in identification of genes harbouring "driver" mutations as opposed to "passenger" mutations. Further to this there is work in elucidating the role these mutations play in the system as a whole, via integration of mutation, gene expression, and network information (e.g. protein-protein interaction networks), as well as other data sources. In this seminar I will discuss my work to date on methods that aim to answer these questions, with focus on the melanoma dataset.

Monday April 20

Ellis Patrick (Stat, Sydney)

Using Resampling to Fit Better Models

The weighted bootstrap is one of many procedures for evaluating the goodness of fit of a model. I would like to attempt to highlight how and why this work changed the way I thought about cross-validation and, most importantly, the practical impacts of using a weighted bootstrap for estimating LASSO penalty parameters. Diane Loo's work is highly relevant for anyone that has ever used cross-validation or ever plans to. I will use the prognostic melanoma data to highlight a few of the limitations of cross-validation. Time permitting, some of the issues we have faced when trying to explore and validate the weighted bootstrap will be explained. This work is currently being drafted for journal submission.

Monday April 13

Kevin Wang (Stat, Sydney) and Sarah Romanes (Stat, Sydney)

Data Exploration and Subtype Discovery and Prognosis Prediction

Finding an appropriate measure of association between connected regions of brain resting fMRI datasets. Potential challenges of the project are noted and some exploratory analysis on a cleaned fMRI dataset is shown.