DATASET


Protein functions, such as enzymatic activity, binding interactions, and membrane transport, exist as isolated islands in the “archipelago” of the protein function landscape. Machine learning (ML) algorithms have tried to bridge the gaps between these islands, but none has yet found a general solution for predicting an arbitrary protein’s function from its DNA sequence.

A general solution for predicting any protein’s function from its sequence would transform the field of biology.

We propose to develop an experimental platform and a unified data ontology for collecting datasets from different functional ‘islands’ and using them to build predictive models for individual protein functions. The experimental strategy uses a pooled, growth-based assay read out by DNA sequencing, creating a simple yet adaptable system that can easily be expanded to encompass new functions.
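A pooled, growth-based assay of this kind is commonly summarized by how much each variant’s share of the sequencing reads changes over the course of growth. The sketch below shows one way such a per-variant score could be computed. It assumes two sequencing time points, a designated reference variant, and a pseudocount of 0.5; the function name and all of these choices are illustrative assumptions, not the proposal’s analysis pipeline.

```python
import math
from typing import Dict


def enrichment_scores(
    counts_t0: Dict[str, int],
    counts_t1: Dict[str, int],
    reference: str,
    pseudocount: float = 0.5,
) -> Dict[str, float]:
    """Hypothetical helper: log2 enrichment of each variant between two
    sequencing time points, normalized so the reference variant scores 0.

    Library-size normalization is omitted because it cancels out when the
    reference variant's log2 ratio is subtracted.
    """
    variants = set(counts_t0) | set(counts_t1)

    def log2_ratio(variant: str) -> float:
        # Pseudocount keeps variants that drop to zero reads finite.
        before = counts_t0.get(variant, 0) + pseudocount
        after = counts_t1.get(variant, 0) + pseudocount
        return math.log2(after / before)

    ref_score = log2_ratio(reference)
    # Positive score: variant grew faster than the reference; negative: slower.
    return {v: log2_ratio(v) - ref_score for v in sorted(variants)}


if __name__ == "__main__":
    t0 = {"WT": 1000, "A": 1000, "B": 1000}
    t1 = {"WT": 2000, "A": 8000, "B": 500}
    print(enrichment_scores(t0, t1, reference="WT", pseudocount=0.0))
```

Variants that outgrow the reference receive positive scores and depleted variants receive negative scores. Dividing by the number of generations between the two time points would give an approximate per-generation relative fitness.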

Click here to read the full Transcription Factor proposal.

Proposal Team

  • Pete Kelly

    PROGRAM DIRECTOR

    Open Datasets Initiative

  • Dana Cortade

    TECHNICAL PROJECT MANAGER

    Open Datasets Initiative

  • David Ross

    PROPOSAL CO-LEADER

    Living Measurement Systems Foundry, National Institute of Standards and Technology (NIST)

  • Erika DeBenedictis

    PROPOSAL CO-LEADER

    Biodesign Lab, The Francis Crick Institute and Align to Innovate

  • Simon d'Oelsnitz

    PROPOSAL CO-LEADER

    Harvard Medical School, Harvard University

  • Anjali Chadha

    PROTEASE SPECIFICITY

    Biodesign Lab, The Francis Crick Institute

  • Adam Winnifrith

    PROTEASE SPECIFICITY

    Biodesign Lab, The Francis Crick Institute

  • Geoffrey Taghon

    TRANSCRIPTION FACTOR BINDING

    Living Measurement Systems Foundry, National Institute of Standards and Technology (NIST)



PROTOCOLS

Link to Google Drive

