EvoEF2: accurate and fast energy function for computational protein design

X Huang, R Pearce, Y Zhang - Bioinformatics, 2020 - academic.oup.com
Motivation
The accuracy and success rate of de novo protein design remain limited, mainly due to the parameter over-fitting of current energy functions and their inability to discriminate incorrect designs from correct designs.
Results
We developed an extended energy function, EvoEF2, for efficient de novo protein sequence design, based on a previously proposed physical energy function, EvoEF. Remarkably, EvoEF2 recovered 32.5%, 47.9% and 22.3% of all, core and surface residues, respectively, for 148 test monomers. It was also generally applicable to protein–protein interaction design, recapitulating 30.9%, 42.4%, 31.3% and 21.4% of all, core, interface and surface residues, respectively, for 88 test dimers, and it significantly outperformed EvoEF in native sequence recapitulation. We further used I-TASSER to evaluate the foldability of the 148 designed monomer sequences. All of them were predicted to fold into structures with high fold- and atomic-level similarity to their corresponding native structures: 87.8% of the predicted structures had a root-mean-square deviation (RMSD) of less than 2 Å from their native counterparts. The study also demonstrated that the usefulness of a physical energy function depends strongly on how its parameters are optimized: EvoEF2, with parameters optimized using sequence recapitulation, is more suitable for computational protein sequence design than EvoEF, which was optimized on thermodynamic mutation data.
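The recovery percentages above are fractions of designed positions that match the native residue, reported overall and per structural class. The following is a minimal sketch of that kind of calculation for one protein, not the authors' implementation. The function name `sequence_recovery`, the per-position `labels` list and the example sequences are hypothetical; how the paper assigns residues to core, interface or surface, and how it pools results across proteins, is not stated in this abstract.

```python
from collections import defaultdict


def sequence_recovery(native, designed, labels=None):
    """Return the fraction of positions where the designed residue equals
    the native one, overall ("all") and for each label in `labels`.

    `labels` is a hypothetical per-position class list (e.g. "core",
    "surface"); the classification criteria are an assumption of the caller.
    """
    if not native:
        raise ValueError("sequences must be non-empty")
    if len(native) != len(designed):
        raise ValueError("sequences must be aligned and of equal length")
    if labels is not None and len(labels) != len(native):
        raise ValueError("labels must have one entry per position")

    matches = defaultdict(int)
    totals = defaultdict(int)
    for i, (nat, des) in enumerate(zip(native, designed)):
        keys = ["all"]
        if labels is not None:
            keys.append(labels[i])
        for key in keys:
            totals[key] += 1
            matches[key] += int(nat == des)

    return {key: matches[key] / totals[key] for key in totals}


# Toy example with made-up sequences and labels (illustration only).
native = "MKVLAAGE"
designed = "MRVLSAGD"
labels = ["surface", "surface", "core", "core",
          "core", "core", "surface", "surface"]
rates = sequence_recovery(native, designed, labels)
print(rates)
```

In this toy case five of eight positions match, so the overall rate is 0.625, with 0.75 for the four core positions and 0.5 for the four surface positions.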
Availability and implementation
The source code of EvoEF2 and the benchmark datasets are freely available at https://zhanglab.ccmb.med.umich.edu/EvoEF.
Supplementary information
Supplementary data are available at Bioinformatics online.
Oxford University Press