Generative modelling of protein structure and sequence

Project Summary

Protein design with generative modelling is transforming biology. Today, proteins are routinely designed in silico and validated in the wet lab using frameworks originally developed for generating images, video and text. This progress has unlocked a host of novel applications, from creating bespoke binders to engineering new enzymes.

Yet generative protein modelling remains in its early days. Unlike their image-generation counterparts, diffusion methods for proteins have not yet seen the same rapid algorithmic advances. As a result, current models often generate sequences that are less diverse than natural proteins, offer limited ways to control the outcome, and rely on sampling “tricks” (for example, lowering the sampling temperature) to produce acceptable designs.
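As a rough illustration of the temperature trick mentioned above, the sketch below rescales hypothetical per-position amino-acid logits before sampling. It is not taken from any particular model: the function names, the 20-letter alphabet ordering and the example logits are all assumptions made for illustration. Dividing the logits by a temperature below 1 concentrates probability on the highest-scoring residues, which tends to give more confident designs at the cost of diversity.

```python
import math
import random

# Assumption: logits are ordered to match the 20 standard amino acids below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def softmax_with_temperature(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_residue(logits, temperature=1.0, rng=random):
    """Sample one amino-acid letter from temperature-scaled logits."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(AMINO_ACIDS, weights=probs, k=1)[0]


def entropy(probs):
    """Shannon entropy in nats; lower values mean less diverse samples."""
    return -sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    # Hypothetical logits for a single sequence position.
    example_logits = [0.1 * i for i in range(len(AMINO_ACIDS))]
    for t in (1.0, 0.5, 0.1):
        probs = softmax_with_temperature(example_logits, t)
        print(f"T={t}: entropy={entropy(probs):.3f}")
```

Running the script prints the entropy of the sampling distribution at each temperature; the value shrinks as the temperature is lowered, which is the loss of diversity described above.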

Our lab focuses on pushing these frontiers. We’re developing next-generation models that co-design sequence and structure for diverse purposes, such as binding a target protein, catalysing an economically or ecologically important reaction, or interacting with nucleic acids for gene editing. This work spans innovations in architecture and data sampling, the incorporation of protein-inspired priors, and the creation of synthetic training data. The best-performing models will be tested against wet-lab data from the GBI to drive further improvements.

Potential Supervisors  

  • Professor Jason Chin (Founding Director, GBI, EIT & Professor of Chemistry and Chemical Biology, Department of Chemistry, University of Oxford)  

University DPhil Courses 

Skills Recommended

  • A Master’s Degree (or equivalent) in a relevant scientific discipline (e.g. Biology, Chemistry, Engineering, Computer Science)
  • Experience of hands-on research in a laboratory setting
  • Proven ability to work independently, think creatively, and solve complex problems
  • Experience with data analysis, automation platforms, or computational tools relevant to the field
  • Experience preparing publications and delivering scientific presentations
  • Strong organisational skills and the ability to manage multiple parallel workstreams
  • Excellent written and verbal communication skills, including the ability to collaborate across multidisciplinary teams
  • A proactive mindset and enthusiasm for working in a fast-paced, high-growth research environment

Relevant Literature

  • Watson et al., 2023. De novo design of protein structure and function with RFdiffusion. Nature.
  • Ingraham et al., 2023. Illuminating protein space with a programmable generative model. Nature.
  • Dauparas et al., 2022. Robust deep learning–based protein sequence design using ProteinMPNN. Science.