You Need More Than Attention

PhD Projects in Artificial Intelligence

Project summary

Attention has been the dominant paradigm in machine learning since 'Attention Is All You Need' in 2017. Originally designed for machine translation, it was quickly shown to be the key component of state-of-the-art systems for text, vision, biological systems and graphs.

However, despite this pervasiveness, attention mechanisms suffer from a major weakness: compute and memory costs that grow quadratically with sequence length. Attention-based systems such as transformers therefore incur high costs in time, compute, energy and carbon emissions for long contexts or sequences.
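
To make the quadratic cost concrete, the sketch below shows single-head scaled dot-product attention in PyTorch (one of the frameworks named under the recommended skills). It is a minimal illustration rather than code from the project; the function and variable names are ours. The n × n score matrix is where the quadratic time and memory arise.

    import math
    import torch

    def attention(q, k, v):
        # q, k, v: (n, d) for a sequence of length n and head dimension d
        scores = q @ k.T / math.sqrt(q.shape[-1])  # (n, n): O(n^2) time and memory
        weights = torch.softmax(scores, dim=-1)
        return weights @ v                         # (n, d)

    n, d = 4096, 64
    q, k, v = (torch.randn(n, d) for _ in range(3))
    out = attention(q, k, v)  # doubling n quadruples the cost of `scores`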

This weakness has spurred a number of research directions into more scalable methods, including:

  1. State space models (see the sketch after this list)
  2. Recurrent systems
  3. Hybrid approaches
  4. Efficient attention mechanisms
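
As a point of contrast with the attention sketch above, here is a minimal diagonal linear state-space recurrence, again purely illustrative and in PyTorch; the parameter names A, B, C follow standard SSM notation rather than any specific published model. Each step touches only the hidden state, so a length-n sequence costs O(n) rather than O(n^2).

    import torch

    def ssm_scan(u, A, B, C):
        # Diagonal linear SSM: x_t = A * x_{t-1} + B u_t,  y_t = C x_t
        # u: (n, d_in); A: (d_state,); B: (d_state, d_in); C: (d_out, d_state)
        x = torch.zeros(A.shape[0])
        ys = []
        for t in range(u.shape[0]):  # sequential form for clarity; practical
            x = A * x + B @ u[t]     # systems use parallel scans or convolutions
            ys.append(C @ x)
        return torch.stack(ys)       # (n, d_out): O(n * d_state * d_in) overall

    n, d_in, d_state, d_out = 1024, 16, 64, 16
    u = torch.randn(n, d_in)
    A = torch.rand(d_state) * 0.99   # |A| < 1 keeps the recurrence stable
    B = torch.randn(d_state, d_in) * 0.1
    C = torch.randn(d_out, d_state) * 0.1
    y = ssm_scan(u, A, B, C)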

However, no alternative approach has yet matched the performance of the transformer family on downstream tasks, and some high-profile systems, such as AlphaFold, rely on even less scalable triangular attention, which has cubic complexity.

The goal of this project is to develop alternative paradigms to the attention mechanism that could unlock the next step change in AI capabilities. Support would be provided to explore applications in text, images and scientific domains such as genetics.

Potential supervisors

  • Dr Liam Atkinson (Research Engineer, EIT)
  • Dr Ira Ktena (Research Scientist, EIT)
  • Dr Ben Chamberlain (Research Scientist, EIT)
  • Additional Supervisor(s) from the University of Oxford

Skills Recommended

  • Strong background in linear algebra, probability, optimisation and statistics
  • Experience with deep learning (PyTorch or JAX) and modern ML engineering
  • Proficiency in Python; familiarity with CUDA or GPU performance tools is a plus
  • Exposure to signal processing/control (relevant to SSMs) or sequence modelling is a plus

Skills to be Developed

  • Designing and analysing SSM/RNN/hybrid sequence operators
  • Developing large-scale multi-GPU training and inference codebases
  • Building long-context evaluation harnesses (retrieval, in-context learning, streaming) and ablations across operator choices
  • Profiling throughput, memory and energy, with principled carbon reporting for training and inference
  • Communicating results via open-source releases and reproducible benchmarks

University DPhil Courses

Relevant background reading

  • Foundations & Motivation
  • State-Space Models
  • Modern RNNs / Retention
  • Hybrid & Convolutional Alternatives
  • Efficient Attention (for cases where attention is still needed)
  • Surveys / Overviews