The Tournament

A protein engineering competition employing high-throughput automation and cloud lab experimentation.

Enzyme Design

THE 2023 PROTEIN ENGINEERING TOURNAMENT

The pilot Protein Engineering Tournament was launched on May 1st 2023 with the theme of Enzyme Design.

Enzymes, as nature’s catalysts, stand at the forefront of research in academic and industrial biotechnology, offering unmatched versatility across a spectrum of applications—from pharmaceuticals and food processing to biofuels and environmental remediation. By harnessing the power of enzyme design, participants are invited to contribute to the development of sustainable solutions that tackle pressing issues such as climate change, renewable energy production, and the efficient use of resources.

The pilot Tournament was based on six datasets received from both industry and academic groups. Initial interest in the pilot tournament led to the registration of just over 30 teams, representing a mix of academic (55%), industry (30%), and independent (15%) teams, with research experience running from Nobel Laureates to high school students. For the pilot tournament, the in vitro round experimentation was performed in-house by a corporate partner, International Flavors and Fragrances (IFF).

Click here to access our GitHub and all of the content from the 2023 pilot Protein Engineering Tournament

This includes all data, submissions, analysis scripts, figures, and team abstracts.

PAGE NAVIGATION

Tournament Structure

2023 Pilot Tournament Datasets

2023 Results: Overall Champions

2023 Results: in silico Round

Overview
Zero-shot Events
Supervised Events
Perceived Difficulty

2023 Results: in vitro Round

Overview
Data Quality
Results
Perceived Difficulty

Teams

Publications from the Tournament

Highlights from the 2023 pilot Tournament

7

final teams, including: 3 academic, 2 industry, and 2 mixed teams

85

users registered to access the data across 28 teams

>150,000

datapoints across
all of our donated datasets

6

donated datasets from 2 industry and 4 academic groups

1200

unique protein sequences submitted for in vitro characterization

TOURNAMENT STRUCTURE

2023 Pilot Tournament Datasets

The pilot Protein Engineering Tournament was made possible by the donation of 6 datasets from our academic and industry partners. The Tournament provides a opportunity for academic and industry groups to disseminate privately held or previously unpublished datasets to the protein engineering community while directing the development and benchmarking of new machine learning models.

Aminotransferase

Donated by: Prof. Dr. U.T. Bornscheuer and M.Sc. M. J. Menke

α-Amylase

Donated by: International Flavors and Fragrances (IFF)

Imine Reductase

Donated by: Codexis

Alkaline Phosphatase PafA

Donated by: Polly Fordyce Group & Dr. Craig Markin

β-glucosidase B

Donated by: Design to Data (D2D) Program & the Justin Siegel Lab

Xylanase

Donated by: Sarel Fleishman Lab

Do you have a dataset to contribute to the next Tournament?

Contact us by emailing tournament@alignbio.org to sponsor an event by donating a dataset and creating a challenge.

Click here to access our GitHub and all of the content from the 2023 pilot Protein Engineering Tournament

2023 Tournament Results

Overall champion

in silico Leaderboard

Total points shown are summed across both the supervised and zero-shot rounds. Please note, not all teams competed in both rounds. Teams were awarded points using a reverse-rank reward system. (i.e., teams coming in 1st, 2nd, and 3rd place were awarded 3, 2, and 1 points respectively. This system rewards teams for doing well in events, and for competing in multiple events, while normalizing to the number of competitors per event.

in vitro Leaderboard

The in vitro round winner, and therefore overall Tournament Champion, was selected as the participant with the highest performing single variant (i.e., the highest specific activity that met all expression and thermostability criteria). The ranking of teams by median performance of variants that passed design criteria is also displayed. The exact numerical values for the in vitro competition can be found in the in vitro section below.

Jump to the individual event results below:

Round 1:
in silico Zero Shot

Round 1:
in silico Supervised

Round 2:
in vitro

in silico Round

ROUND OVERVIEW

The goal of the in silico round was to test each team’s ability to successfully predict a variety of enzyme properties. The in silico round was composed of two tracks: zero-shot and supervised. Teams were invited to participate one or both rounds, and could select to participate in as many or as few of the events as they wanted.

In the zero-shot events, teams were given an enzyme’s the amino acid sequence and asked to predict a variety of properties (e.g. expression, thermostability, specific activity, etc.) without training data. The zero-shot track had three events: α-amylase, aminotransferase, and xylanase.

In the supervised events, teams were first given training data that included an enzyme’s amino acid sequence and corresponding measured properties (e.g. expression, thermostability, specific activity, etc.) to train their models. They were then supplied with a series of enzyme amino acid sequences and asked to predict the same measured properties. The supervised track had four events: alkaline phosphatase PafA, α-amylase, β-glucosidase B, and imine reductase.

All submission data, analysis notebooks, and final figures are located and can be downloaded from our GitHub.

ANALYSIS METRIC

The choice of Spearman correlation to assess the in silico zero-shot and supervised tracks was driven by several reasons. First, Spearman correlation is robust to non-linear relationships and does not assume linearity. As a rank-based metric, it is less sensitive to outliers compared to Pearson correlation, which is especially beneficial in the zero shot setting where the ranges of submission was arbitrary. Additionally, its ease of interpretation and widespread use in the protein engineering literature comforted us in this choice.

PARTICIPATING TEAMS

Arnold Lab
Anonymous Team
Marks Lab
Nimbus
ProtEng4All
SergiR1996
TUM Rostlab

in silico Round: Zero-Shot Events

α-Amylase

Challenge Problem: Score the following three properties 1) specific activity, 2) expression, and 3) thermostability for each variant (e.g. log probabilities). The range of scoring is arbitrary.

Aminotransferase

Challenge Problem: Score how active you predict each variant is for each of the three substrates (e.g. log probabilities). The range of scoring is arbitrary. The substrates are:

S-Phenylethylamine
(4-Chlorophenyl)phenylmethanamine
1,3-Diphenyl-propane-1-amine

Xylanase

Challenge Problem: Given the sequence, please predict how well the enzyme expresses. Your prediction should be a classification (0=No expression, 0.5=Low expression, 1=Good expression). The range of scoring is arbitrary.

in silico Round: Supervised Events

Alkaline phosphatase PafA

Challenge Problem: Score activity for each of the three substrates. The range of scoring is arbitrary. The substrates are:

methyl phosphate (MeP) - chemistry limited substrate
Carboxy 4-methylumbelliferyl phosphate ester (cMUP) - binding limited substrate
methyl phosphodiester (MecMUP) - promiscuous substrate

α-amylase

Challenge Problem: Score the following three properties 1) specific activity, 2) expression, and 3) thermostability for each variant (e.g. log probabilities). The range of scoring is arbitrary.

β -glucosidase B

Challenge Problem: Score each of the following three properties: 1) expression, 2) activity, and 3) melting point. The range of scoring is arbitrary.

Imine reductase

Challenge Problem: Score the fold improvement over positive control (FIOP) of activity. The range of scoring is arbitrary.

PERCEIVED DIFFICULTY

After each round participants were polled to determine the perceived difficulty rating of each event. They were asked to score the perceived difficulty from 0 (easy) to 10 (extremely difficult). This was for information purposes only and did not factor into the final scoring calculations.