Statistical Bioinformatics Webinar: Pu -- Graph Algorithms and Machine Learning

Presented by Dr. Lianrong Pu (Tel Aviv University)

High-throughput sequencing techniques generate large volumes of DNA sequencing data at
ultra-fast speed and extremely low cost.  To handle the large datasets produced by these
techniques, efficient data structures and algorithms are necessary and have been
developed in recent decades.  This talk introduces how graph algorithms and machine
learning methods can be used to analyse deep sequencing data, focusing on a three-way
classifier for metagenome assemblies.  Viruses and plasmids are part
of microbial communities and play a major role in disease and in antibiotic resistance.
In metagenome sequence assembly, identifying virus and plasmid contigs is a hard task,
since they tend to form shorter contigs and are overwhelmed by a larger mass of
bacterial contigs.  3CAC is a new classifier that builds on machine-learning-based
classifiers and exploits the structure of the assembly graph to classify contigs as
bacterial, viral, plasmid, or unknown.  On simulated and real metagenomes assembled from
short and long reads, 3CAC outperformed state-of-the-art algorithms.

