The world’s first cloud lab grant program
The Bioautomation Challenge
Status: Discontinued
Launched in 2022 with support from Schmidt Futures, the Bioautomation Challenge explored whether researchers could script and remotely execute experiments in cloud labs—eliminating the need to own physical equipment. The Challenge surfaced steep technical barriers for academic users and highlighted systemic vulnerabilities in the cloud lab ecosystem, including the shutdown of Strateos, prolonged outages at Emerald Cloud Lab, and significant cost increases across providers.
Key Learnings:
- Infrastructure is a bottleneck. Automation is essential for reproducibility and throughput, but is not offered in formats friendly to new users. We learned that embedding knowledgeable automation engineers in method development dramatically accelerates progress and lowers technical barriers.
- Durability matters. Scientific methods must outlast individual labs, companies, and funding cycles. Resilient data collection depends on reproducible protocols that have been shown to produce equivalent results at multiple locations.
Living datasets, backed by automated data collection
The Open Datasets Initiative
Status: Expanding
The Open Datasets Initiative was designed to explore a more curated approach to automation access. In this program, Align identifies scientists with promising data collection methods and supports them in turning these methods into publicly accessible data collection platforms. Align handles the logistical, legal, and long-term operational tasks that individual researchers can’t easily take on, like providing support from professional automation engineers, contracting with data collection facilities, managing sample logistics, and hosting data for long-term public access. This initiative was launched in 2022 with support from Schmidt Futures and Griffin Catalyst.
To date, Align has received over 100 dataset concepts from scientists globally. We hosted three in-person workshops to refine these ideas, form working groups, and shepherd high-potential concepts through peer review. The most mature proposals are now being onboarded at automation facilities across multiple sites to ensure reproducibility and reduce key-facility risk. As these platforms come online, external scientists can request data collection through Align’s web portal. Through our global network of over 150 external collaborators across academic institutions, national labs, and industry, Align is now generating “living datasets” in areas such as protein expression, protein function, and microbial phenotypes. Scientists already use these platforms for specific research projects, and the resulting aggregated datasets serve as foundational resources for training predictive models that drive progress across the field.
The components of a living dataset:
- Agenda setting: Data priorities emerge from community-based roadmapping.
- Automation partners: Data collection methods are onboarded at professional facilities.
- N=2: Each method runs at two or more sites to ensure reproducibility and avoid key facility risk.
- Transparency: Proposals, technical updates, and datasets are published.
- Data access: A web portal supports standardized data requests.
- Embargo: Requesters receive a one-year data embargo.
- Licensing: Datasets are released under permissive licenses.
- Iteration: Learnings from data collection are routinely fed back into method development cycles.
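As a rough illustration of how two of these components (N=2 and the one-year embargo) could be checked in software, here is a minimal Python sketch. Everything in it is hypothetical: the `DataRequest` record, its field names, and the choice to start the embargo on the delivery date are assumptions made for the example, not a description of Align's portal.

```python
from dataclasses import dataclass
from datetime import date

EMBARGO_YEARS = 1  # "one-year data embargo" from the list above
MIN_SITES = 2      # "N=2": each method runs at two or more sites


@dataclass(frozen=True)
class DataRequest:
    """Hypothetical record for one data request (not Align's actual schema)."""
    method: str
    sites: tuple[str, ...]  # facilities where the method has been run
    delivered: date         # assumed start of the embargo period


def meets_n2(request: DataRequest) -> bool:
    """True if the method has run at two or more distinct sites."""
    return len(set(request.sites)) >= MIN_SITES


def public_release_date(request: DataRequest) -> date:
    """Date on which the embargo ends, one year after delivery."""
    d = request.delivered
    try:
        return d.replace(year=d.year + EMBARGO_YEARS)
    except ValueError:
        # Delivered on Feb 29: fall back to Feb 28 of the following year.
        return d.replace(year=d.year + EMBARGO_YEARS, day=28)
```

For example, a request delivered on 1 May 2023 from two distinct sites would pass `meets_n2` and, under the assumed embargo start, become public on 1 May 2024.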
A public benchmark of protein design AI techniques
The Protein Engineering Tournament
Status: Expanding
Machine learning methods in life science are advancing faster than they can be experimentally validated. The Protein Engineering Tournament addresses this gap by benchmarking AI-generated predictions not through simulation, but through real-world experiments.
Inspired by CASP—the competition that led to AlphaFold2—Align’s tournament provides a reproducible, transparent framework for evaluating predictive and generative protein design models using standardized, automated lab protocols.
In 2023, Align launched a pilot focused on enzyme design. Participants competed in two phases: first, predicting protein biophysical properties; second, generating novel sequences for synthesis and experimental testing. Methods, data, and evaluation criteria were shared openly, and final scores were based on real-world performance. This created a transparent venue for benchmarking generative protein design.
- 6 datasets donated by 2 industry and 4 academic groups
- 1,200 novel protein sequences synthesized and tested
- 85 participants across 28 teams
- 7 finalist teams: 3 academic, 2 industry, 2 mixed
- Comparative benchmarking of protein language model (pLM), graph neural network (GNN), and recurrent neural network (RNN) approaches