History

Align’s strategy is built on experiments—of the infrastructure kind.

During our first two years, Align ran three major programs to test what the science ecosystem needs most, and how we could help it grow.

    The world’s first cloud lab grant program

    The Bioautomation Challenge

    Status: Discontinued

    Launched in 2022 with support from Schmidt Futures, the Bioautomation Challenge explored whether researchers could script and remotely execute experiments in cloud labs—eliminating the need to own physical equipment. The Challenge surfaced steep technical barriers for academic users and highlighted systemic vulnerabilities in the cloud lab ecosystem, including the shutdown of Strateos, prolonged outages at Emerald Cloud Lab, and significant cost increases across providers.

    Key Learnings:

    • Infrastructure is a bottleneck. Automation is essential for reproducibility and throughput, but is not offered in formats friendly to new users. We learned that embedding knowledgeable automation engineers in method development dramatically accelerates progress and lowers technical barriers.
    • Durability matters. Scientific methods must outlast individual labs, companies, and funding cycles. Resilient data collection depends on reproducible protocols that have been shown to generate equivalent results at multiple locations.
    Open-source cloud lab on GitHub
    View Codebase
    Open-source methods for DNA manipulation
    Read the paper
    Screencasts showcasing cloud lab protocols
    Watch on our YouTube

    Living datasets, backed by automated data collection

    The Open Datasets Initiative

    Status: Expanding

    The Open Datasets Initiative was designed to explore a more curated approach to automation access. In this program, Align identifies scientists with promising data collection methods and supports them in turning these methods into publicly accessible data collection platforms. Align handles the logistical, legal, and long-term operational tasks that individual researchers can’t easily take on, like providing support from professional automation engineers, contracting with data collection facilities, managing sample logistics, and hosting data for long-term public access. This initiative was launched in 2022 with support from Schmidt Futures and Griffin Catalyst.

    To date, Align has received over 100 dataset concepts from scientists globally. We hosted three in-person workshops to refine these ideas, form working groups, and shepherd high-potential concepts through peer review. The most mature proposals are now being onboarded at automation facilities across multiple sites to ensure reproducibility and reduce key facility risk. As these platforms come online, external scientists can request data collection through Align’s web portal. Through our global network, which spans over 150 external collaborators across academic institutions, national labs, and industry, Align is now generating “living datasets” in areas such as protein expression, protein function, and microbial phenotypes. Scientists already use these platforms for their own research projects, and the resulting aggregated datasets serve as foundational resources for training predictive models that drive progress across the field.

    The components of a living dataset:

    • Agenda setting: Data priorities emerge from community-based roadmapping.
    • Automation partners: Data collection methods are onboarded at professional facilities.
    • N=2: Each method runs at two or more sites to ensure reproducibility and avoid key facility risk.
    • Transparency: Proposals, technical updates, and datasets are published.
    • Data access: A web portal supports standardized data requests.
    • Embargo: Requesters receive a one-year data embargo.
    • Licensing: Datasets are released under permissive licenses.
    • Iteration: Learnings from data collection are routinely fed back into method development cycles.
    Protein Expression Roadmap
    Read the paper
    Can protein expression be solved?
    Read the paper
    Microbial Phenotypes Roadmap
    Read the paper

    A public benchmark of protein design AI techniques

    The Protein Engineering Tournament

    Status: Expanding

    Machine learning methods in life science are advancing faster than they can be experimentally validated. The Protein Engineering Tournament addresses this gap by benchmarking AI-generated predictions not through simulation, but through real-world experiments.

    Inspired by CASP—the competition that led to AlphaFold2—Align’s tournament provides a reproducible, transparent framework for evaluating predictive and generative protein design models using standardized, automated lab protocols.

    In 2023, Align launched a pilot focused on enzyme design. Participants competed in two phases: first, predicting protein biophysical properties; second, generating novel sequences for synthesis and experimental testing. Methods, data, and evaluation criteria were shared openly, and final scores were based on real-world performance. This created a transparent venue for benchmarking generative protein design.

    6 datasets donated by 2 industry and 4 academic groups

    1,200 novel protein sequences synthesized and tested

    85 participants across 28 teams

    7 finalist teams: 3 academic, 2 industry, 2 mixed

    Comparative benchmarking of pLM, GNN, and RNN models

    Interested in getting involved?