SMS scnews item created by Larissa Fedunik-Hofman at Thu 7 Sep 2023 1536
Type: Seminar
Modified: Tue 12 Sep 2023 1141; Mon 9 Oct 2023 2131; Mon 9 Oct 2023 2144; Fri 20 Oct 2023 1225; Thu 23 Nov 2023 1110
Distribution: World
Expiry: 16 Dec 2023
Calendar1: 13 Sep 2023 1000-1100
CalLoc1: Pharmacy and Bank Building Seminar Room N351 & Online
CalTitle1: Greg Yang (xAI) ’Mathematical challenges in AI’ seminar
Calendar2: 28 Sep 2023 0800-0900
CalLoc2: Online
CalTitle2: Sadhika Malladi (Princeton University) ’Mathematical challenges in AI’ seminar
Calendar3: 12 Oct 2023 2000-2100
CalLoc3: Quad Seminar Room S204 (Oriental) (A14.02.S204) & Online
CalTitle3: Neel Nanda (DeepMind) "Mechanistic Interpretability & Mathematics", Mathematical challenges in AI seminar
Calendar4: 26 Oct 2023 0900-1000
CalLoc4: Carslaw 273 & Online
CalTitle4: Paul Christiano (Alignment Research Center) "Formalizing Explanations of Neural Network Behaviors", Mathematical challenges in AI seminar
Calendar5: 23 Nov 2023 1900-2000
CalLoc5: Carslaw 273 & Online
CalTitle5: Francois Charton (Meta AI) ’Mathematical challenges in AI’ seminar
Auth: (lfed9203) in SMS-SAML

Mathematical challenges in AI

Machine Learning

Thursday 23 November: Charton

The final Mathematical challenges in AI seminar will take place this evening.

The main focus of the seminars this year will be to explore the mathematical problems that arise in modern machine learning. For example, we aim to cover:

1) Mathematical problems (e.g. in linear algebra and probability theory) whose resolution would assist the design, implementation and understanding of current AI models.

2) Mathematical problems or results arising from the interpretability of ML models.

3) Mathematical questions posing challenges for AI systems.

Our aim is to attract interested mathematicians to what we see as a fascinating and important source of new research directions.

The seminar is an initiative of the Sydney Mathematical Research Institute (SMRI).

You can watch the seminar recordings on the "Mathematical challenges in AI" YouTube playlist.

Speakers list and schedule

Greg Yang (xAI): September 13, 10–11 am
Pharmacy and Bank Building Seminar Room N351
Title: The unreasonable effectiveness of mathematics in large scale deep learning
Abstract: Recently, the theory of infinite-width neural networks led to the first technology, muTransfer, for tuning enormous neural networks that are too expensive to train more than once. For example, this allowed us to tune the 6.7 billion parameter version of GPT-3 using only 7% of its pretraining compute budget, and with some asterisks, we get a performance comparable to the original GPT-3 model with twice the parameter count. In this talk, I will explain the core insight behind this theory. In fact, this is an instance of what I call the *Optimal Scaling Thesis*, which connects infinite-size limits for general notions of “size” to the optimal design of large models in practice. I'll end with several concrete key mathematical research questions whose resolutions will have incredible impact on the future of AI.

Sadhika Malladi (Princeton University): September 28, 8–9 am
Title: Mathematical Views on Modern Deep Learning Optimization
Abstract: This talk focuses on how rigorous mathematical tools can be used to describe the optimization of large, highly non-convex neural networks. We start by covering how stochastic differential equations (SDEs) provide a rigorous yet flexible model of how deep networks change over the course of training. We then cover how the SDEs yield practical insights into scaling training to highly distributed settings while preserving generalization performance. In the second half of the talk, we will explore the new deep learning paradigm of pre-training and fine-tuning large language models. We show that fine-tuning can be described by a very simplistic mathematical model, and the resulting insights allow us to develop a highly efficient and performant optimizer to fine-tune LLMs at scale. The talk will focus on various mathematical tools and the extent to which they can describe modern-day deep learning.

Neel Nanda (DeepMind): October 12, 8–9 pm
Title: Mechanistic Interpretability & Mathematics
Abstract: Mechanistic Interpretability is a branch of machine learning that takes a trained neural network and tries to reverse-engineer the algorithms it has learned. First, I'll discuss what we've learned by reverse-engineering tiny models trained to do mathematical operations, e.g. the algorithm learned to do modular addition. I'll then discuss the phenomenon of superposition, where models spontaneously learn to use the geometry of high-dimensional spaces to use compression schemes and represent and compute more features than they have dimensions. Superposition is a major open problem in mechanistic interpretability, and I'll discuss some of the weird mathematical phenomena that come up with superposition, some recent work exploring it, and open problems in the field.

Paul Christiano (Alignment Research Center): October 26, 9–10 am
Title: Formalizing Explanations of Neural Network Behaviors
Abstract: Existing research on mechanistic interpretability usually tries to develop an informal human understanding of “how a model works,” making it hard to evaluate research results and raising concerns about scalability. Meanwhile formal proofs of model properties seem far out of reach both in theory and practice. In this talk I’ll discuss an alternative strategy for “explaining” a particular behavior of a given neural network. This notion is much weaker than proving that the network exhibits the behavior, but may still provide similar safety benefits. This talk will primarily motivate a research direction and a set of theoretical questions rather than presenting results.

Francois Charton (Meta AI): November 23, 7–8 pm
Title: Transformers for maths, and maths for transformers
Abstract: Transformers can be trained to solve problems of mathematics. I present two recent applications, in mathematics and physics: predicting integer sequences, and discovering the properties of scattering amplitudes in a close relative of Quantum Chromodynamics. Problems of mathematics can also help understand transformers. Using two examples from linear algebra and integer arithmetic, I show that model predictions can be explained, that trained models do not confabulate, and that carefully choosing the training distributions can help achieve better, and more robust, performance.

Note: the schedule is in Sydney time (UTC+10 for the September seminars; UTC+11 from 1 October 2023, when daylight saving begins). An iCal link is available.

Seminar info:

Format: Starting from September 13th, the seminars will be held roughly fortnightly, typically on Thursdays, with possible discussion sessions in between.

Location: Carslaw 273 + online

For more details, including Zoom link and access to the Slack channel, visit the Mathematical challenges in AI website.
