Diverse effects of distance cutoff and residue interval on the performance of distance-dependent atom-pair potential in protein structure prediction

Yao et al. BMC Bioinformatics (2017) 18:542
DOI 10.1186/s12859-017-1983-3

RESEARCH ARTICLE Open Access

Yuangen Yao1, Rong Gui1, Quan Liu1, Ming Yi1 and Haiyou Deng1,2*

Abstract

Background: As one of the most successful knowledge-based energy functions, the distance-dependent atom-pair potential is widely used in all aspects of protein structure prediction, including conformational search, model refinement, and model assessment. During the last two decades, great efforts have been made to improve the reference state of the potential, while other factors that also strongly affect its performance have received comparatively little attention.

Results: Based on different distance cutoffs (up to 22 Å) and residue intervals (up to 15) as well as six different reference states, we constructed a series of distance-dependent atom-pair potentials and tested them on several groups of structural decoy sets collected from diverse sources. A comprehensive investigation was performed to clarify the effects of distance cutoff and residue interval on the potential's performance. Our results provide a new perspective as well as practical guidance for optimizing distance-dependent statistical potentials.

Conclusions: The optimal distance cutoff and residue interval depend strongly on the reference state that the potential is based on, the measures used to assess the potential's performance, and the decoy sets that the potential is applied to. The performance of a distance-dependent statistical potential can be significantly improved when the best statistical parameters for the specific application environment are adopted.

Keywords: Distance-dependent atom-pair potential, Protein structure prediction, Distance cutoff, Residue interval, Reference state

Background

One of the major challenges in protein structure prediction is to
design an accurate energy function that can discriminate native or near-native structures from non-native structures [1]. Especially in conformational search [2–5], model refinement [6, 7] and model assessment [8–12], the energy function is always the primary issue to be addressed. Although the detailed interactions of protein atoms can be described by quantum mechanical equations [13, 14], the amount of computation required for such macromolecules easily exceeds the capability of current computing resources. The common practice is to approximate the interactions based on classical physics [15]. These energy functions generally contain terms associated with bond lengths, bond angles, torsion angles, van der Waals interactions, and electrostatic interactions, and are often called physics-based energy functions [16, 17]. By virtue of the abundant structural resources in the Protein Data Bank [18], another category of energy function (called knowledge-based energy function [19, 20]) has sprung up and plays an increasingly important role in protein structure prediction. So far the most successful prediction methods are more or less based on knowledge-based energy functions [21–24]. Any structural feature that characterizes particular interactions in folded proteins can be used to derive knowledge-based energy functions, especially those in pairwise form. The distance-dependent atom-pair potential [9, 25–29] is one of the most commonly used pairwise energy functions; it characterizes the distributions of pairwise distances between residue-specific atom types in protein structures, and converts them into energies based on the inverse of Boltzmann's law. Many distance-dependent atom-pair potentials have been developed and widely used during the last two decades, such as RAPDF [25], KBP [26], Dfire [27], Dope [9], RW [29] and so on. Some potentials (e.g. dDFIRE [30], RWplus [29], GOAP [31], ROTAS [32]) also combine other energy terms for characterizing side-chain orientation, angle distribution, solvent accessibility or secondary structure preference, but the distance-dependent terms still play the central role.

* Correspondence: hydeng@mail.hzau.edu.cn
1 Department of Physics, College of Science, Huazhong Agricultural University, Wuhan 430070, China
2 Institute of Applied Physics, Huazhong Agricultural University, Wuhan 430070, China

© The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

In order to develop more efficient distance-dependent atom-pair potentials, great efforts have been made to improve the reference state, which makes the reference state the major difference between potentials [33]. In fact, many other factors also strongly affect the performance of a distance-dependent atom-pair potential [34]. Distance cutoff (interactions of atom pairs with distances larger than the cutoff are ignored) and residue interval (only atom pairs from two residues whose sequence separation is equal to or larger than the specified residue interval are considered) are two important statistical parameters for designing distance-dependent atom-pair potentials. RAPDF chooses a relatively large distance cutoff of 20 Å after testing four different values (5, 10, 15, and 20 Å) on the same decoy sets. KBP and Dfire set the distance cutoff to 14.5 Å, whereas Dope and RW take distance cutoffs of 15 and 15.5 Å, respectively. Despite its importance, the distance
cutoff was often determined without careful optimization in many potentials. Similar to the situation of distance cutoff, the residue intervals in different potentials are usually set to different values, such as 1 (meaning that only atom pairs within the same residue are excluded from the statistics), 5, 10 and so on. So far it is unclear what the optimal distance cutoff (or residue interval) is, and how it is related to the reference state and the decoy sets that the potential is applied to.

To specifically explore the effects of distance cutoff and residue interval on the performance of the distance-dependent atom-pair potential, we constructed a series of potentials with different distance cutoffs and residue intervals as well as different reference states. All potentials were tested on several groups of structural decoy sets collected from diverse sources. We investigated the performance variations of these potentials in native recognition and decoy discrimination. We also explored the preferences of optimal distance cutoff and residue interval for different decoy sets and for potentials with different reference states. The evaluation results were compared with several widely used statistical potentials. Moreover, we applied the potentials with residue intervals other than those used in potential construction, which yielded better performance in many cases. The results and observations of this work provide new insights and valuable references for determining the distance cutoff and residue interval to optimize the performance of distance-dependent atom-pair potentials.

Methods

Distance-dependent atom-pair potentials with different reference states

The distance-dependent atom-pair potential is derived by counting the pairwise distances between every two non-hydrogen atoms in protein structures. With the assumption that the distributions of structural features obtained from protein structures obey the Boltzmann distribution of statistical mechanics [19], the potential
can be written as:

\[ u_{i,j}(r) = -k_B T \ln \left[ \frac{f^{OBS}_{i,j}(r)}{f^{REF}_{i,j}(r)} \right] \]

where $k_B$ and $T$ are the Boltzmann constant and the Kelvin temperature, respectively. $f^{OBS}_{i,j}(r)$ is the observed probability of atom types i and j in a particular distance bin r to r+Δr in native structures, which can be calculated from a non-redundant set of experimental structures. $f^{REF}_{i,j}(r)$ is the reference probability of atom types i and j in the corresponding distance bin in non-native structures. Since no such structural database exists for non-native structures, how to deal with the reference state for calculating $f^{REF}_{i,j}(r)$ is a critical issue in designing potentials. We conducted our research on six well-known reference states. The basic information on these reference states is shown in Table 1, and more details can be found in our previous research article [33].

Table 1 Brief description of six reference states for distance-dependent atom-pair potential
Reference state (a)	Description
Averaging (ave-)	Take the average distance distribution over different atom types from experimental conformations as the reference state, which means the distance distributions for all types of atom pair are identical in the reference state [25]
Quasi-chemical approximation (kbp-)	Use the overall distance distribution of atom pairs from experimental structures and calculate the specific distance distribution of atom types i and j based on the mole fractions (over the whole dataset) of atom types i and j [26]
Finite ideal-gas (dfire-)	Treat the reference state as a finite ideal gas in which the probability of an atom pair in a particular distance bin increases as $r^{a}$ with a to-be-determined constant a (a < 2) [27]
Spherical noninteracting (dope-)	Treat the reference state as a sphere in which all atoms of a protein are evenly distributed without interaction. The size of the sphere is determined by the corresponding experimental structure [9]
Random-walk chain (rw-)	Treat the reference state as an ideal random-walk chain of a rigid step length, which mimics well the generic entropic elasticity and inherent connectivity of polymer protein molecules and yet ignores the atomic interactions of amino acids [29]
Atom-shuffled (srs-)	Generate a shuffled structure dataset by preserving all atomic positions while shuffling atom identities within each of the experimental structures [28]
(a) The abbreviation is given in parentheses

Fig. 1 The flowchart of our studies. Step 1: PDB dataset preparation; Step 2: Potential construction; Step 3: Potential application; Step 4: Result analysis

Potential construction with different distance cutoffs and residue intervals

We constructed a series of distance-dependent atom-pair potentials based on the aforementioned reference states with different distance cutoffs and residue intervals. A non-redundant structural dataset of 1762 proteins with pairwise sequence identity of

Table 2 Basic information of the six groups of structural decoy sets
Sets name	Number of sets	Average length (a)	Number of structures
I-TASSER	56	80 (47–118)	24,707
Moulder	20	174 (81–340)	6406
Rosetta	58	83 (50–146)	5858
3DRobot	200	133 (80–240)	60,200
CASP10	72	224 (24–587)	5805
CASP11	62	206 (37–462)	4522
Total/Ave	468	146	107,498
(a) The length range is given in parentheses
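The construction described above (count pair distances subject to a distance cutoff and a residue interval, then convert the observed and reference probabilities into an energy) can be illustrated with a minimal Python sketch. It assumes the averaging reference state (ave-), in which the reference distribution is the distance distribution pooled over all atom-type pairs. The function names `count_pairs` and `potential`, the 0.5 Å bin width, the input layout, and the choice of a neutral energy for empty bins are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import defaultdict
from itertools import combinations

KT = 1.0  # assumption: energies are reported in units of k_B * T


def count_pairs(atoms, cutoff=15.0, min_interval=1, bin_width=0.5):
    """Histogram pair distances for one structure.

    atoms: list of (residue_index, atom_type, (x, y, z)) tuples (hypothetical layout).
    A pair is counted only if its distance is below `cutoff` and its two
    residues are at least `min_interval` positions apart in the sequence.
    """
    n_bins = int(math.ceil(cutoff / bin_width))
    counts = defaultdict(lambda: [0] * n_bins)
    for (res_a, type_a, xyz_a), (res_b, type_b, xyz_b) in combinations(atoms, 2):
        if abs(res_a - res_b) < min_interval:
            continue  # residue-interval filter
        d = math.dist(xyz_a, xyz_b)
        if d >= cutoff:
            continue  # distance-cutoff filter
        key = tuple(sorted((type_a, type_b)))  # (i, j) and (j, i) share one table
        counts[key][min(int(d / bin_width), n_bins - 1)] += 1
    return counts


def potential(counts):
    """u_ij(r) = -kT * ln(f_obs_ij(r) / f_ref(r)) with an averaging reference."""
    if not counts:
        return {}
    n_bins = len(next(iter(counts.values())))
    per_bin = [sum(c[b] for c in counts.values()) for b in range(n_bins)]
    grand_total = sum(per_bin)
    table = {}
    for key, c in counts.items():
        pair_total = sum(c)
        energies = []
        for b in range(n_bins):
            if c[b] == 0:
                energies.append(0.0)  # assumption: unobserved bins are neutral
                continue
            f_obs = c[b] / pair_total          # observed probability for this pair type
            f_ref = per_bin[b] / grand_total   # pooled over all pair types
            energies.append(-KT * math.log(f_obs / f_ref))
        table[key] = energies
    return table


# Toy usage: five atoms on a line; coordinates are made up for illustration.
toy = [(0, "N", (0.0, 0.0, 0.0)), (0, "CA", (1.5, 0.0, 0.0)),
       (1, "N", (3.0, 0.0, 0.0)), (5, "CA", (4.0, 0.0, 0.0)),
       (5, "N", (30.0, 0.0, 0.0))]
toy_table = potential(count_pairs(toy, cutoff=15.0, min_interval=1))
```

Raising `min_interval` or lowering `cutoff` shrinks the set of counted pairs, which changes both the observed and the pooled reference histograms; this is why the two parameters can shift the resulting energies even when the reference-state definition is held fixed.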

Date posted: 25/11/2020, 15:59