# Statistical Bioinformatics Seminar: Pascovici -- DIA/SWATH - challenges and opportunities for bioinformatics

The aim of the statistical bioinformatics seminar is to provide a forum for people
working within the broad area of computation and statistics and their application
to various aspects of biology to present their work and showcase their ongoing
projects. It is intended to foster the exchange of ideas and build potential
collaborations across multiple disciplines.

The seminars will be held at 1:00 pm on Mondays at the Charles Perkins Centre,
Seminar Room (Level 3, large meeting room). Seminars in 2018 will begin in March.
The format of the talk is 30 to 45 minutes plus questions.

Monday March 19, 2018

Speaker: Dana Pascovici (Macquarie University)

Title: DIA/SWATH - challenges and opportunities for bioinformatics

Abstract: Protein quantitation using DIA/SWATH mass spectrometry has been growing
in popularity over the last few years. From a bioinformatics point of view, the
data resulting from such experiments is, on the one hand, quite easy to analyse,
at least if the experiment is not too large: it has a much lower percentage of
missing data, and its look and distribution make existing methodology from other
areas readily applicable. Put plainly, extracted SWATH data is quite nice to work
with. On the other hand, that is because much of the difficulty has been pushed
underneath, to the level of SWATH library building and data extraction, where it
is somewhat hidden from view.

In this talk we will describe SWATH and its place in the landscape of quantitative
proteomics (including broad comparisons with label-free and labelled techniques such
as iTRAQ and TMT), and the many positive aspects of the resulting SWATH datasets
from the point of view of the data analyst. We will also focus on SWATH data
extraction, which usually relies on high-quality peptide MS/MS spectral libraries;
building such libraries to ensure good proteome coverage can be time-consuming
and expensive. To address this issue, various computational approaches for merging
archived or external libraries have been created and evaluated, including efforts
from our group. We will describe the appeal of such
methods, the possible issues that can ensue and some approaches to tackle them in
order to ensure that the proteins are reliably detected and their quantitation is
consistent and reproducible. We will discuss these aspects in the context of
several existing datasets, including a carefully designed spiked-in experiment,
and a recently published large plasma proteomics experiment containing samples
from neonates, young children and adults.

Bio: I am currently a Biostatistician at the Australian Proteome Analysis
Facility (APAF) at Macquarie University, where I help people generate biological
insights out of their proteomics data, especially in the context of
complex experiments.

Working in a proteomics facility, we have focused on generating
reliable methods of interpreting and analysing data from a variety of
platforms, lately emphasizing SWATH and TMT, and wherever possible
incorporating them into software workflows. Areas of particular relevance
to us have been plasma proteomics and plant proteomics of agriculturally
important species. Our work has benefitted from interactions with
researchers, students and the APAF team of mass spectrometry specialists
and analytical chemists.

I come from a mathematical and computational background, having completed
a bachelor's degree in Mathematics and Computer Science at Dartmouth College
in the US, followed by a PhD in Mathematics at MIT and a brief stint of
teaching at Purdue. In Sydney I took a more practical turn and worked in
industry in the area of speech recognition, before settling into
biostatistics for the past 13 years, in both industry and research
environments.


