The Datasets

Living datasets backed by automated experiments and open source methods.

DATASETS IN INCUBATION

We’ve received over 100 dataset concepts from scientists around the world.

From ideation to incubation

We collect dataset ideas from scientists around the world.

We host workshops to facilitate community discussion and identify actionable ideas.

We organize working groups to develop ideas into methods for data collection.

Submit a dataset idea

Are you a scientist, machine learning specialist, or automation expert interested in creating high-fidelity datasets in biology? Have a dream dataset? Participate in our Open Datasets Initiative!

WORKSHOPS

Our workshops promote collaboration, identify scientific needs, and result in actionable proposals.

Upcoming Workshops


May, 2024
Vancouver, BC, Canada

We are currently gathering ideas for datasets and predictive models that can advance the field of microbial engineering.

Microbes! The Workshop

Past Workshops


March, 2023
San Francisco, CA, USA

This workshop was cohosted with Homeworld Collective to explore how generative AI, in silico prediction, and automation will come together to impact the biggest biological challenges of our era.

AI-Powered Experimentation Workshop


Protein Sequence to Expression Workshop

October, 2023
San Francisco, CA, USA

Through our initial discussions with scientists in academia and industry, we found growing interest in pursuing a Sequence to Expression dataset to build a predictive model for understanding the impact of DNA sequence on protein expression in a variety of industrially relevant microbes.

This workshop brought together leading practitioners in protein engineering, high-throughput experimentation, and machine learning to determine the initial scope, methods, and feasibility of collecting this dataset and creating a predictive model of protein expression.


Protein Sequence to Function Workshop

September, 2023
The Francis Crick Institute, London, United Kingdom

This workshop brought together scientists and ML experts who had submitted protein sequence to function ideas or had expertise in function experimentation or prediction.

The goal of this workshop was collaboration. The open discussions and ideation helped us learn about the challenges in predicting protein function from sequence and identify target selection strategies and measurement techniques that turned into working groups and dataset proposals.

LEARN MORE

Read about our existing Datasets

Sequence to Expression dataset (coming soon)

Supported by

A philanthropic initiative founded by Eric and Wendy Schmidt.

A civic engagement initiative by Ken Griffin.