You Need More Than Attention
PhD Projects in Artificial Intelligence

Project summary
Attention has been the dominant paradigm in machine learning since 'Attention Is All You Need' in 2017. Originally designed for machine translation, it was quickly shown to be the key component in state-of-the-art systems for text, vision, biological systems and graphs.
However, despite this pervasiveness, attention mechanisms suffer from a major weakness: compute complexity that is quadratic in sequence length. Attention-based systems such as transformers therefore incur high costs in time, compute, energy and carbon emissions for long contexts or sequences.
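The quadratic cost comes from the pairwise score matrix. A minimal sketch in PyTorch, assuming a single head with no masking or batching (all names here are illustrative):

```python
# Minimal sketch of scaled dot-product attention (single head, no masking or batching).
# The score matrix has shape (n, n), so time and memory grow quadratically with
# sequence length n.
import torch

def attention(q, k, v):
    """q, k, v: (n, d) tensors for a sequence of length n."""
    d = q.shape[-1]
    scores = (q @ k.transpose(-2, -1)) / d**0.5  # shape (n, n): the quadratic term
    weights = torch.softmax(scores, dim=-1)
    return weights @ v                           # shape (n, d)

n, d = 4096, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
out = attention(q, k, v)  # doubling n roughly quadruples the cost of `scores`
```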
This weakness has spurred a number of research directions into more scalable methods, including:
- State space models
- Recurrent systems
- Hybrid approaches
- Efficient attention mechanisms (e.g. linear attention; see the sketch after this list)
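As a contrast with the quadratic sketch above, here is a hedged sketch of causal linear attention evaluated as a recurrence, in the spirit of Katharopoulos et al. (listed under Efficient Attention below); the feature map, shapes and function names are illustrative assumptions, not a reference implementation:

```python
# Hedged sketch of causal linear attention run as a recurrence. The per-token state S
# is (d, d), so a full pass costs O(n d^2) time and O(d^2) memory, i.e. linear rather
# than quadratic in sequence length n. Feature map and shapes are illustrative choices.
import torch
import torch.nn.functional as F

def linear_attention_rnn(q, k, v, eps=1e-6):
    """q, k, v: (n, d). Returns (n, d) causal outputs using a fixed-size state."""
    phi = lambda x: F.elu(x) + 1.0        # positive feature map (a common choice)
    n, d = q.shape
    S = torch.zeros(d, d)                 # running sum of phi(k_t) v_t^T
    z = torch.zeros(d)                    # running normaliser, sum of phi(k_t)
    outs = []
    for t in range(n):
        qt, kt = phi(q[t]), phi(k[t])
        S = S + torch.outer(kt, v[t])
        z = z + kt
        outs.append((qt @ S) / (qt @ z + eps))
    return torch.stack(outs)

n, d = 4096, 64
out = linear_attention_rnn(torch.randn(n, d), torch.randn(n, d), torch.randn(n, d))
```

Because the recurrent state has a fixed size, such operators can also be evaluated in parallel over the sequence during training (the scan and chunked formulations used by the SSM and linear-attention papers in the reading list).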
However, no alternative approach has yet matched the performance of the transformer family on downstream tasks, and some high-profile systems such as AlphaFold rely on even less scalable triangular (cubic-complexity) attention.
The goal of this project is to develop alternative paradigms to the attention mechanism that would unlock the next step change in AI capabilities. Support will be provided to explore applications in text, images and scientific domains such as genetics.
Potential supervisors
- Dr Liam Atkinson (Research Engineer, EIT)
- Dr Ira Ktena (Research Scientist, EIT)
- Dr Ben Chamberlain (Research Scientist, EIT)
- Additional Supervisor(s) from the University of Oxford
Skills Recommended
- Strong background in linear algebra, probability, optimisation and statistics
- Experience with deep learning (PyTorch or JAX) and modern ML engineering
- Proficiency in Python; familiarity with CUDA or GPU performance tools is a plus
- Exposure to signal processing/control (relevant for SSMs) or sequence modelling is a plus
Skills to be Developed
- Designing and analysing SSM/RNN/hybrid sequence operators
- Developing large-scale multi-GPU training and inference codebases
- Building long-context evaluation harnesses (retrieval, in-context learning, streaming) and ablations across operator choices
- Profiling throughput, memory and energy, with principled carbon reporting for training and inference
- Communicating results via open-source releases and reproducible benchmarks
University DPhil Courses
Relevant background reading
Foundations & Motivation
- Vaswani et al., Attention Is All You Need (Transformer)
- Patterson et al., Carbon Emissions and Large Neural Network Training
State-Space Models
- Gu, Goel & Ré, S4: Efficiently Modeling Long Sequences with Structured State Spaces
- Smith et al., S5: Simplified State Space Layers for Sequence Modeling
- Gu & Dao, Mamba: Selective State Space Models
- Li, Singh & Grover, Mamba-ND: Selective SSMs for Multi-Dimensional Data
Modern RNNs / Retention
- Sun et al., Retentive Network (RetNet): A Successor to Transformer
- Peng et al., RWKV: Reinventing RNNs for the Transformer Era
- Yang et al., Parallelizing Linear Transformers with the Delta Rule
- Qin et al., HGRN2: Gated Linear RNNs with State Expansion
- Zucchet et al., Gated RNNs Discover Attention (theory link between gated RNNs and linear self-attention)
Hybrid & Convolutional Alternatives
- Poli et al., Hyena Hierarchy: Towards Larger Convolutional Language Models
- Lieber et al., Jamba: Hybrid Transformer-Mamba
- De et al., Griffin: Mixing Gated Linear Recurrences with Local Attention
- Together Research, StripedHyena (hybrid gated conv + grouped attention)
Efficient Attention (keep when attention is needed)
- Katharopoulos et al., Transformers are RNNs: Linear Attention
- Wang et al., Linformer
- Beltagy et al., Longformer
- Zaheer et al., BigBird
- Kitaev et al., Reformer
- Dao et al., FlashAttention
- Kwon et al., vLLM: PagedAttention for high-throughput serving
Surveys / Overviews
- Wan et al., Efficient Large Language Models: A Survey
- From S4 to Mamba: A Comprehensive Survey on SSMs (2025)
- Efficient Attention Mechanisms for LLMs: A Survey (2025)