You Need More Than Attention
PhD Projects in Artificial Intelligence

Project summary
Attention has been the dominant paradigm in machine learning since 'Attention Is All You Need' in 2017. Originally designed for machine translation, it was quickly shown to be the key component in state-of-the-art systems for text, vision, biological systems and graphs.
However, despite this pervasiveness, attention mechanisms suffer from a major weakness: compute complexity that is quadratic in sequence length. Attention-based systems such as transformers therefore incur high costs in time, compute, energy and carbon emissions for long contexts or sequences.
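The quadratic cost comes from the pairwise score matrix. A minimal sketch in PyTorch, assuming a single head with no masking or batching (all names here are illustrative):

```python
# Minimal sketch of scaled dot-product attention (single head, no masking or batching).
# The score matrix has shape (n, n), so time and memory grow quadratically with
# sequence length n.
import torch

def attention(q, k, v):
    """q, k, v: (n, d) tensors for a sequence of length n."""
    d = q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / d**0.5  # shape (n, n): the quadratic term
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                           # shape (n, d)

n, d = 4096, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
out = attention(q, k, v)  # doubling n roughly quadruples the cost of `scores`
```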
This weakness has spurred a number of research directions into more scalable methods, including:
- State space models
- Recurrent systems
- Hybrid approaches
- Efficient attention mechanisms (e.g. linear attention; see the sketch after this list)
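As a contrast with the quadratic sketch above, here is a hedged sketch of causal linear attention evaluated as a recurrence, in the spirit of Katharopoulos et al. (listed under Efficient Attention below); the feature map, shapes and function names are illustrative assumptions, not a reference implementation:

```python
# Hedged sketch of causal linear attention run as a recurrence. The per-token state S
# is (d, d), so a full pass costs O(n d^2) time and O(d^2) memory, i.e. linear rather
# than quadratic in sequence length n. Feature map and shapes are illustrative choices.
import torch
import torch.nn.functional as F

def linear_attention_rnn(q, k, v, eps=1e-6):
    """q, k, v: (n, d). Returns (n, d) causal outputs using a fixed-size state."""
    phi = lambda x: F.elu(x) + 1.0        # positive feature map (a common choice)
    n, d = q.shape
    S = torch.zeros(d, d)                 # running sum of phi(k_t) v_t^T
    z = torch.zeros(d)                    # running normaliser, sum of phi(k_t)
    outs = []
    for t in range(n):
        qt, kt = phi(q[t]), phi(k[t])
        S = S + torch.outer(kt, v[t])
        z = z + kt
        outs.append((qt @ S) / (qt @ z + eps))
    return torch.stack(outs)

n, d = 4096, 64
out = linear_attention_rnn(torch.randn(n, d), torch.randn(n, d), torch.randn(n, d))
```

Because the recurrent state has a fixed size, such operators can also be evaluated in parallel over the sequence during training (the scan and chunked formulations used by the SSM and linear-attention papers in the reading list).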
However, no alternative approach has yet matched the performance of the transformer family on downstream tasks, and some high-profile systems such as AlphaFold rely on even less scalable triangular (cubic-complexity) attention.
The goal of this project is to develop alternative paradigms to the attention mechanism that would unlock the next step change in AI capabilities. Support will be provided to explore applications in text, images and scientific domains such as genetics.
Potential supervisors
- Dr Liam Atkinson (Research Engineer, EIT)
- Dr Ira Ktena (Research Scientist, EIT)
- Dr Ben Chamberlain (Research Scientist, EIT)
- Additional Supervisor(s) from the University of Oxford
Skills Recommended
- Strong background in linear algebra, probability, optimisation and statistics
- Experience with deep learning (PyTorch or JAX) and modern ML engineering
- Proficiency in Python; familiarity with CUDA or GPU performance tools is a plus
- Exposure to signal processing/control (relevant for SSMs) or sequence modelling is a plus
Skills to be Developed
- Designing and analysing SSM/RNN/hybrid sequence operators
- Developing large-scale multi-GPU training and inference codebases
- Building long-context evaluation harnesses (retrieval, in-context learning, streaming) and ablations across operator choices
- Profiling throughput, memory and energy, with principled carbon reporting for training and inference
- Communicating results via open-source releases and reproducible benchmarks
University DPhil Courses
Relevant background reading
Foundations & Motivation
- Vaswani et al., Attention Is All You Need (Transformer)
- Patterson et al., Carbon Emissions and Large Neural Network Training
State-Space Models
- Gu, Goel & Ré, S4: Efficiently Modeling Long Sequences with Structured State Spaces
- Smith et al., S5: Simplified State Space Layers for Sequence Modeling
- Gu & Dao, Mamba: Selective State Space Models
- Li, Singh & Grover, Mamba-ND: Selective SSMs for Multi-Dimensional Data
Modern RNNs / Retention
- Sun et al., Retentive Network (RetNet): A Successor to Transformer
- Peng et al., RWKV: Reinventing RNNs for the Transformer Era
- Yang et al., Parallelizing Linear Transformers with the Delta Rule
- Qin et al., HGRN2: Gated Linear RNNs with State Expansion
- Zucchet et al., Gated RNNs Discover Attention (theory link between gated RNNs and linear self-attention)
Hybrid & Convolutional Alternatives
- Poli et al., Hyena Hierarchy: Towards Larger Convolutional Language Models
- Lieber et al., Jamba: Hybrid Transformer-Mamba
- De et al., Griffin: Mixing Gated Linear Recurrences with Local Attention
- Together Research, StripedHyena (hybrid gated conv + grouped attention)
Efficient Attention (keep when attention is needed)
- Katharopoulos et al., Transformers are RNNs: Linear Attention
- Wang et al., Linformer
- Beltagy et al., Longformer
- Zaheer et al., BigBird
- Kitaev et al., Reformer
- Dao et al., FlashAttention
- Kwon et al., vLLM: PagedAttention for high-throughput serving
Surveys / Overviews
- Wan et al., Efficient Large Language Models: A Survey
- From S4 to Mamba: A Comprehensive Survey on SSMs (2025)
- Efficient Attention Mechanisms for LLMs: A Survey (2025)