Proteins Roadmap

Building predictive models for protein engineering.

At Align, we’re building a shared roadmap to unlock predictive models for protein engineering—starting with a grand challenge: predict the properties of a protein directly from its sequence. We aim to transform protein engineering from trial-and-error into a data-driven discipline. Our roadmap is co-developed with the scientific community and grounded in open data, reproducible methods, and scalable experimentation.

Overview

We partnered with the community to identify where new data could most accelerate protein engineering. Through workshops and collaborative planning, two initial high-impact goals emerged: predicting protein function from sequence and predicting protein expression from sequence. These goals now anchor our protein project roadmap and shape the datasets we're building with partners.

Design of a generalized platform for gathering protein sequence → function datasets at scale
A strategy for scalable data collection of soluble protein expression in diverse hosts
Can protein expression be "solved"?
Sequence-to-Expression
October 1, 2023 | San Francisco, CA

This workshop brought together experts in protein engineering, high-throughput experimentation, and machine learning (ML) who are interested in modeling protein expression across microbial hosts. Through open discussion and roadmap co-design, we identified key challenges in linking DNA sequence to expression levels, aligned on target organisms and measurement approaches, and scoped an initial dataset. These conversations shaped the proposals and collaborations now in motion.

Sequence-to-Function
September 1, 2023 | The Francis Crick Institute, London, UK

This workshop brought together scientists and ML researchers with expertise in protein function, from experimental design to prediction. The goal was collaborative roadmap-building: through open discussion and shared ideation, we surfaced key challenges in function prediction, explored strategies for selecting target proteins, and aligned on measurement techniques. These conversations seeded the working groups and dataset proposals we're now advancing.

Interested in getting involved?