The Datasets
Living datasets backed by automated experiments and open source methods.
DATASETS IN INCUBATION
We’ve received over 100 dataset concepts from scientists around the world.
From ideation to incubation
We collect dataset ideas from scientists around the world.
We host workshops to facilitate community discussion and identify actionable ideas.
We organize working groups to develop ideas into methods for data collection.
Submit a dataset idea
Are you a scientist, machine learning specialist, or automation expert interested in creating high-fidelity datasets in biology? Have a dream dataset? Participate in our Open Datasets Initiative!
WORKSHOPS
Our workshops promote collaboration, identify scientific needs, and result in actionable proposals.
Upcoming Workshops
May, 2024
Vancouver, BC, Canada
We are currently gathering ideas for datasets and predictive models that can advance the field of microbial engineering.
Microbes! The Workshop
Past Workshops
March, 2023
San Francisco, CA, USA
This workshop was cohosted with Homeworld Collective to explore how generative AI, in silico prediction, and automation will come together to impact the biggest biological challenges of our era.
AI-Powered Experimentation Workshop
Protein Sequence to Expression Workshop
October, 2023
San Francisco, CA, USA
Through our initial discussions with scientists in academia and industry, we found growing interest in pursuing a Sequence to Expression dataset to build a predictive model for understanding the impact of DNA sequence on protein expression in a variety of industrially relevant microbes.
This workshop brought together leading practitioners in protein engineering, high-throughput experimentation, and machine learning to determine the initial scope, methods, and feasibility of collecting this dataset and creating a predictive model of protein expression.
Protein Sequence to Function Workshop
September, 2023
The Francis Crick Institute, London, United Kingdom
This workshop brought together scientists and ML experts who submitted protein sequence to function ideas or had expertise in function experimentation or prediction.
The goal of this workshop was collaboration: the open discussions and ideation helped us learn about the challenges in predicting protein function from sequence and identify target selection strategies and measurement techniques that turned into working groups and dataset proposals.
LEARN MORE
Read about our existing Datasets
Sequence to Expression dataset (coming soon)
Questions? Check out our FAQs page.
Supported by
A philanthropic initiative founded by Eric and Wendy Schmidt.
A civic engagement initiative by Ken Griffin.