Align Announces New Collaboration with ATCC to Bring AI-Ready Microbial Datasets to Life

A new open-access microbial phenotyping dataset will accelerate AI research and empower the global scientific community

COVINA, Calif., June 18, 2025 – The Align Foundation, a nonprofit accelerating predictive biology through artificial intelligence (AI)-driven research infrastructure, today announced a collaboration with ATCC, the leading global nonprofit provider of microbial strains, cell lines, in vitro models, and standards, to create the world’s largest public, AI-ready microbial phenotyping dataset.

Together, Align and ATCC will generate high-quality phenotypic data for 1,000 phylogenetically diverse microbial strains across 1,000 cultivation conditions. This will create an unparalleled foundation for AI models that link genotype to phenotype, a link that is crucial to engineering biology in a safe and predictable manner. By combining Align’s scalable, high-throughput experimental platform with ATCC’s extensive authenticated microbial collection and expertise in microbial and cellular genomics, Align aims to bridge a critical gap in the scientific landscape: the lack of large, standardized, public datasets collected under consistent experimental conditions.

“Our vision at Align is to build the research infrastructure needed to make biological data collection frictionless, scalable, and shareable,” said Erika DeBenedictis, PhD, co-founder of Align. “Collaborating with ATCC, an organization synonymous with biological quality and reproducibility, is an incredible opportunity to create a large-scale, public resource that can help enable the next generation of AI-driven biological discovery. We’re honored to work alongside them, and this opportunity would not be possible without their diverse and trusted biomaterials.”

The dataset will be expansive, covering diverse environments (such as atmospheric and temperature variations), multiple media types (undefined, semi-defined, and defined), and a wide range of metabolic supplements (carbon sources, vitamins, cofactors, and metals). Each strain's growth and morphology will be systematically measured and linked to its genomic sequence, enabling researchers to train and validate machine learning models that predict microbial physiology from genetic information.

“The reliability of AI-driven biological insights depends entirely on the quality of the data—and ultimately, the source materials—used to train the predictive models. At ATCC, we are committed to providing reference datasets alongside our trusted biological reference materials so that these future insights can be physically reproduced and validated in the lab,” said Ruth Cheng, PhD, president and CEO of ATCC. “Our collaboration with Align is an important step towards enabling researchers to reliably apply AI in biology by building a dataset that is traceable to the authenticated microbial resources at ATCC.”

This is a first-of-its-kind effort that reflects our shared commitment to making biological research more reproducible, scalable, and accessible to the entire scientific community to solve the most challenging research problems. Phenotypic results will be hosted on Align’s Phenome Portal, with links to ATCC’s Genome Portal, enabling seamless cross-navigation between genotype and phenotype.

Why This Matters

Today’s biological data landscape is fragmented: datasets are often small, inconsistent, or inaccessible. By launching a public, scalable, and standardized microbial phenotyping resource, we’re removing barriers for researchers and creating a platform for faster, better model development to enable predictable engineering—making a significant impact on human health, environmental sustainability, and economic growth.

This collaboration marks a significant step forward, and we invite the global research community to engage with this open resource. Learn more about this project and explore opportunities to collaborate at alignbio.org/datasets-microbes or reach out to us directly at contact@alignbio.org.

About The Align Foundation

The Align Foundation is a research nonprofit launched in 2021 to make life sciences more predictive. Align enables the building of large, open biological datasets through high-throughput experimentation, automation partnerships, and global scientific collaboration. Align also hosts competitions to transparently benchmark scientific progress and measure the impact of open data. With support from philanthropic funders, Align is creating the reproducible, scalable, and shareable infrastructure needed to unlock the next generation of data-powered breakthroughs in biology. Learn more at alignbio.org.

Media Contacts:

Naomi Hagelund
The Align Foundation
comms@alignbio.org

About ATCC

ATCC is a premier global biological materials and information resource and standards organization and the leading developer and supplier of authenticated cell lines, microorganisms, and associated data for academia, industry, and government. With a history of scientific contributions spanning more than a century, ATCC offers the world’s largest and most diverse collection of biological reference materials and data and serves as a mission-driven, trusted partner that supports and encourages scientific collaboration. ATCC products, services, partnerships, and people provide the global scientific community with credible, advanced model systems to support complex research and innovations in basic science, drug discovery, translational medicine, and public health. ATCC is a 501(c)(3) nonprofit organization headquartered in Manassas, Virginia, with research and technology centers of excellence in Gaithersburg and Germantown, Maryland. Learn more at atcc.org.

Media Contacts:

Samantha Paro
ATCC
pr@atcc.org