
The Datasets
Living datasets backed by automated experiments and open source methods.
Genotype to Phenotype: Design of an Extensible Experimental Platform for Characterizing Microbes
Microorganisms are everywhere, driving health, food, ecosystems, and biotechnology, thriving in environments from hydrothermal vents to soil and the human gut. Despite their vast diversity, only a small fraction of the phylogenetic lineages have been successfully isolated, and our understanding of how environmental conditions influence microbial growth remains limited.
A generalizable model for predicting microbial phenotypes from genotypes could revolutionize biology, driving advancements across diverse fields such as human health, climate science, and biomanufacturing.
We propose to develop a phenotyping platform that leverages machine learning and high-throughput experimentation to produce a comprehensive dataset suitable for predictive modeling. This platform will focus on understanding and predicting how environmental conditions impact microbial phenotypes, such as growth, morphology, and function.
We will gather phenotypic data for 1,000 phylogenetically diverse microbial strains across 1,000 cultivation conditions, resulting in one million unique experiments. Our platform will leverage BacterAI, a reinforcement learning agent that designs automated condition-variation experiments so that each round adds as much as possible to the knowledge base.
In partnership with ATCC, we will carefully select strains to maximize biodiversity, ensuring the dataset represents a broad range of microbial life. To further enhance diversity, Align will accept strain donations from the broader scientific community. Community strain donations will open in late 2025, with testing set to begin in 2026. Register your interest in contributing samples HERE.
OUR STRATEGY
The 1000 x 1000 plan
Using an automated phenotyping platform, we will profile the growth of every microbe across 1,000 conditions that span combinations of different environments (e.g., atmosphere, temperature), base growth media (undefined, semi-defined, and defined), and media alterations (e.g., carbon sources, vitamins, cofactors, metals). Our experimental approach assumes no prior knowledge about a microbe’s growth capabilities. Instead, we start by culturing each microbe in diverse media and environments, then use the resulting insights to iteratively refine and optimize subsequent experiments.
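As a rough illustration of how conditions can be built from combinations of environments, base media, and media alterations, here is a minimal Python sketch. The specific levels (temperatures, atmospheres, the "none" alteration) and the function name `build_conditions` are assumptions made for the example; the plan names the categories but does not list the actual values, and the real platform selects conditions adaptively rather than from a fixed grid.

```python
from itertools import product

# Hypothetical example levels: the plan names these categories,
# but the specific values below are placeholders, not the real design.
ENVIRONMENTS = [
    {"atmosphere": "aerobic", "temperature_c": 30},
    {"atmosphere": "aerobic", "temperature_c": 37},
    {"atmosphere": "anaerobic", "temperature_c": 37},
]
BASE_MEDIA = ["undefined", "semi-defined", "defined"]
ALTERATIONS = ["none", "carbon source", "vitamins", "cofactors", "metals"]


def build_conditions(environments, base_media, alterations):
    """Return one dict per (environment, base medium, alteration) combination."""
    return [
        {**env, "base_medium": medium, "alteration": alteration}
        for env, medium, alteration in product(environments, base_media, alterations)
    ]


conditions = build_conditions(ENVIRONMENTS, BASE_MEDIA, ALTERATIONS)
print(len(conditions))  # 3 environments * 3 media * 5 alterations = 45
```

Even this toy grid shows how quickly combinations multiply, which is one reason to choose later experiments based on earlier results instead of enumerating everything up front.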
Growth data will be collected through OD600 measurements (optical density at 600 nm) at multiple timepoints to generate comprehensive growth curves. Additional measurements such as pH and microscopy will also be collected to provide further insights into microbial behavior and adaptation. For all strains provided through our partnership with ATCC, sequence information will be available through ATCC’s Genome Portal. Any community strain donations not previously sequenced will be sent for whole-genome sequencing (WGS).
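To show how a series of OD600 readings can be turned into a growth-curve summary, here is a minimal Python sketch that estimates the maximum specific growth rate as the steepest slope of ln(OD600) between consecutive timepoints. This is a common textbook simplification, not the project's analysis method; the function name and the readings are made up for illustration, and a real pipeline would typically also subtract blanks and fit over a window of points.

```python
import math


def max_specific_growth_rate(times_h, od600):
    """Largest slope of ln(OD600) between consecutive timepoints, in 1/h."""
    if len(times_h) != len(od600) or len(times_h) < 2:
        raise ValueError("need at least two matched timepoints")
    rates = []
    for i in range(len(times_h) - 1):
        t0, t1 = times_h[i], times_h[i + 1]
        od0, od1 = od600[i], od600[i + 1]
        if t1 <= t0:
            raise ValueError("timepoints must be strictly increasing")
        if od0 <= 0 or od1 <= 0:
            raise ValueError("OD600 values must be positive")
        rates.append((math.log(od1) - math.log(od0)) / (t1 - t0))
    return max(rates)


# Made-up readings: a lag hour, two hours of doubling, then slowing growth.
times_h = [0, 1, 2, 3, 4]
od600 = [0.05, 0.05, 0.10, 0.20, 0.22]

mu_max = max_specific_growth_rate(times_h, od600)
doubling_time_h = math.log(2) / mu_max
print(f"mu_max = {mu_max:.3f} 1/h, doubling time = {doubling_time_h:.2f} h")
```

With these readings the density doubles each hour during the fastest phase, so the estimate is ln(2) per hour, or a doubling time of about one hour.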
OUR TIMELINE
HOW ARE WE DOING IT?
2025 - Phase I: Methods development (20,000 datapoints)
2025 - Phase II: Pilot scale (230,000 datapoints)
2026 - Phase III: Full scale (750,000 datapoints)
Project Team
Shantal Al Habib
TECHNICAL PROJECT MANAGER
Team Member Spotlight
Shantal studied biological chemistry at the University of Toronto and has worked in R&D and process development at biotech companies, where she also gained experience managing scientific projects. At Align to Innovate, Shantal leads the Microbes dataset initiative, aiming to advance our understanding of microbial biodiversity and growth.
Proposal Contributors
Adam Winnifrith - University of Oxford
Adrienne Hoarfrost - University of Georgia
Benjamin David - University of Michigan
David Bernstein - University of Vermont
Devon Stork - Pioneer Labs
Elisha Wood-Charlson - Lawrence Berkeley National Laboratory
Jonathan Eisen - University of California, Davis
Julia Koblitz - DSMZ
Lauren Fitch - ArkeaBio
Lorenz Reimer - DSMZ
Paul Jensen - University of Michigan
Sasha Milshteyn - Transition Biomining Inc.
Satnam Surae - Twig Bio
Scott Baker - Pacific Northwest National Laboratory
Una Nattermann - Pioneer Labs
Victoria Knight-Connoni - ATCC
Talk to us! Here’s how to participate:
Email us at datasets@alignbio.org