Friday November 2, 2pm, Carslaw 173
University of Newcastle, School of Mathematical and Physical Sciences
Non-Parametric Density Estimates for Data Clustering, Visualisation, and Outlier Detection
Non-parametric density estimates are a useful tool for tackling a range of problems in statistical learning and data mining, most notably in unsupervised and semi-supervised learning scenarios. In this talk, I elaborate on HDBSCAN*, a density-based framework for hierarchical and partitioning clustering, outlier detection, and data visualisation. Since its introduction in 2015, HDBSCAN* has attracted increasing attention from both researchers and practitioners in data mining. Computationally efficient third-party implementations are already available in major open-source software distributions such as R/CRAN and Python/scikit-learn, and successful real-world applications have been reported in a variety of fields. I will discuss the core HDBSCAN* algorithm and its interpretation from a non-parametric modelling perspective as well as from a graph-theoretic perspective. I will also discuss post-processing routines for hierarchy simplification, cluster evaluation, optimal cluster selection, visualisation, and outlier detection. Finally, I will briefly survey a number of unsupervised and semi-supervised extensions of the HDBSCAN* framework currently under development with students and collaborators, as well as some topics for future research.
Prof. Ricardo Campello received his Bachelor's degree in Electronics Engineering from the State University of São Paulo, Brazil, in 1994, and his MSc and PhD degrees in Electrical and Computer Engineering from the State University of Campinas, Brazil, in 1997 and 2002, respectively. Among other appointments, he was a Post-doctoral Fellow at the University of Nice, France (fall/winter 2002-2003), an Assistant/Associate Professor in computer science at the University of São Paulo, Brazil (2007-2016), and a Visiting Professor in computer science at the University of Alberta, Canada (2011-2013), where he has been an Adjunct Professor since 2017. Between 2016 and 2018 he was a Professor in applied mathematics in the College of Science and Engineering at James Cook University (JCU), Australia, where he was co-responsible for developing a professional online Master of Data Science programme. He currently holds an Adjunct Professor position at JCU. Since July 2018, he has been a Professor of data science within the discipline of statistics at the University of Newcastle, Australia.