The Datasets
Living datasets backed by automated experiments and open source methods.
DATASETS IN INCUBATION
We’ve received over 100 dataset concepts from scientists around the world.
From ideation to incubation
We collect dataset ideas from scientists around the world.
We host workshops to facilitate community discussion and identify actionable ideas.
We organize working groups to develop ideas into methods for data collection.
Submit a dataset idea
Are you a scientist, machine learning specialist, or automation expert interested in creating high-fidelity datasets in biology? Have a dream dataset? Participate in our Open Datasets Initiative!
WORKSHOPS
Our workshops promote collaboration, identify scientific needs, and result in actionable proposals.
Upcoming Workshops
Sign up for our newsletter to stay up-to-date with all of Align to Innovate’s activities!
Stay tuned!
Past Workshops
Microbes! The Workshop
May, 2024
Vancouver, BC, Canada
This workshop brought together scientists to identify key microbiology dataset ideas that will enable the next generation of predictive models. Throughout the workshop, participants engaged in a series of guided sessions designed to explore current challenges in microbial datasets and identify promising projects to address some of these issues.
The overarching goal was to distill ideas into well-defined dataset proposals, each accompanied by potential approaches to data (and metadata) collection and use for model creation. The preliminary dataset ideas, predictive models, and data collection methods developed during the workshop are detailed in our Microbes Ideation Workshop Report and will be utilized by Align to Innovate's Open Datasets Initiative as foundational elements guiding our next public data collection effort.
If you are interested in contributing to this dataset, email us at datasets@alignbio.org.
Protein Sequence to Expression Workshop
October, 2023
San Francisco, CA, USA
Through our initial discussions with scientists in academia and industry, we found growing interest in pursuing a Sequence to Expression dataset to build a predictive model for understanding the impact of DNA sequence on protein expression in a variety of industrially relevant microbes.
This workshop brought together leading practitioners in protein engineering, high-throughput experimentation, and machine learning to determine the initial scope, methods, and feasibility of collecting this dataset and creating a predictive model of protein expression.
Protein Sequence to Function Workshop
September, 2023
The Francis Crick Institute, London, United Kingdom
This workshop brought together scientists and ML experts who submitted protein sequence to function ideas or had expertise in function experimentation or prediction.
The goal of this workshop was collaboration: the open discussions and ideation helped us learn about the challenges in predicting protein function from sequence and identify target selection strategies and measurement techniques that turned into working groups and dataset proposals.
AI-Powered Experimentation Workshop
March, 2023
San Francisco, CA, USA
This workshop was cohosted with Homeworld Collective to explore how generative AI, in silico prediction, and automation will come together to impact the biggest biological challenges of our era.
LEARN MORE
Read about our existing Datasets
Questions? Check out our FAQs page.
Supported by
A philanthropic initiative founded by Eric and Wendy Schmidt.
A civic engagement initiative by Ken Griffin.