The Datasets

Living datasets backed by automated experiments and open source methods.

DATASETS IN INCUBATION

We’ve received over 100 dataset concepts from scientists around the world.

From ideation to incubation

We collect dataset ideas from scientists around the world.

We host workshops to facilitate community discussion and identify actionable ideas.

We organize working groups to develop ideas into methods for data collection.

Submit a dataset idea

Are you a scientist, machine learning specialist, or automation expert interested in creating high-fidelity datasets in biology? Have a dream dataset? Participate in our Open Datasets Initiative!

WORKSHOPS

Our workshops promote collaboration, identify scientific needs, and result in actionable proposals.

Upcoming Workshops


Sign up for our newsletter to stay up-to-date with all of Align to Innovate’s activities!

Stay tuned!

Past Workshops


Microbes! The Workshop

May, 2024
Vancouver, BC, Canada

This workshop brought together scientists to identify key microbiology dataset ideas that will enable the next generation of predictive models. Throughout the workshop, participants engaged in a series of guided sessions designed to explore current challenges in microbial datasets and identify promising projects to address some of these issues.

The overarching goal was to distill ideas into well-defined dataset proposals, each accompanied by potential approaches to data (and metadata) collection and use for model creation. The preliminary dataset ideas, predictive models, and data collection methods developed during the workshop are detailed in our Microbes Ideation Workshop Report and will be utilized by Align to Innovate's Open Datasets Initiative as foundational elements guiding our next public data collection effort.

If you are interested in contributing to this dataset, email us at datasets@alignbio.org.


Protein Sequence to Expression Workshop

October, 2023
San Francisco, CA, USA

Through our initial discussions with scientists in academia and industry, we found growing interest in pursuing a Sequence to Expression dataset to build a predictive model for understanding the impact of DNA sequence on protein expression in a variety of industrially relevant microbes.

This workshop brought together leading practitioners in protein engineering, high-throughput experimentation, and machine learning to determine the initial scope, methods, and feasibility of collecting this dataset and creating a predictive model of protein expression.


Protein Sequence to Function Workshop

September, 2023
The Francis Crick Institute, London, United Kingdom

This workshop brought together scientists and ML experts who had submitted protein sequence-to-function ideas or had expertise in function experimentation or prediction.

The goal of this workshop was collaboration: the open discussions and ideation helped us learn about the challenges in predicting protein function from sequence, and identify target selection strategies and measurement techniques that turned into working groups and dataset proposals.


AI-Powered Experimentation Workshop

March, 2023
San Francisco, CA, USA

This workshop was cohosted with Homeworld Collective to explore how generative AI, in silico prediction, and automation will come together to impact the biggest biological challenges of our era.

LEARN MORE

Read about our existing Datasets

Supported by

A philanthropic initiative founded by Eric and Wendy Schmidt.

A civic engagement initiative by Ken Griffin.