Scientific report: Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle

Abundance of intrinsic disorder in SV-IV, a multifunctional androgen-dependent protein secreted from rat seminal vesicle

Silvia Vilasi and Raffaele Ragone
Dipartimento di Biochimica e Biofisica, Naples, Italy

The view that a protein must fold into the correct shape, as encoded in the amino acid sequence, before it can function has been deeply rooted in protein science, even before the three-dimensional structure of a protein was first solved. However, for some proteins, especially those involved in signalling and regulation [1], the unstructured state has been suggested to be essential for basic cellular functions and recognized as a separate functional and structural category [2,3]. These are proteins or domains that, in their native state, are either completely disordered or contain large disordered regions, and therefore do not fit the standard sequence–structure–function paradigm, because intrinsic disorder, whether local or extended to the entire protein length, is crucially important for their function. Dunker and Obradovic [4] categorized functional intrinsically disordered regions into molten globule-like and random coil-like structural forms, and Uversky [5] suggested the existence of an additional pre-molten globule form, whose peculiarity is the presence of unstable secondary structure. Betraying still imperfect categorization, these systems are currently classified as ‘intrinsically disordered proteins’ (IDPs), but the use of other synonymous expressions, such as ‘intrinsically unstructured proteins’, is widespread in the literature [6]. More than 100 such proteins are known, including Tau, prions, Bcl-2, p53, 4E-BP1 and eIF1A [5,7].

Keywords: bioinformatics; disorder prediction; intrinsically disordered proteins; seminal vesicle protein no. 4; structure–function relationship

Correspondence: R. Ragone, Dipartimento di Biochimica e Biofisica, Seconda Università di Napoli, via S.
Maria di Costantinopoli 16, 80138 Naples, Italy
Fax: +39 081 294136
Tel: +39 081 294042
E-mail: raffrag@tiscali.it; raffaele.ragone@unina2.it

(Received 30 October 2007, revised 5 December 2007, accepted 13 December 2007)

doi:10.1111/j.1742-4658.2007.06242.x

The potent immunomodulatory, anti-inflammatory and procoagulant properties of protein no. 4 secreted from the rat seminal vesicle epithelium (SV-IV) have previously been found to be modulated by a supramolecular monomer–trimer equilibrium. More structural details that integrate experimental data into a predictive framework have recently been reported. Unfortunately, homology modelling and fold-recognition strategies were not successful in creating a theoretical model of the structural organization of SV-IV. It was inferred that the global structure of SV-IV is not similar to that of any protein of known three-dimensional structure. Reversing the classical approach to the sequence–structure–function paradigm, in this paper we report novel information obtained by comparing the physicochemical parameters of SV-IV with two datasets composed of intrinsically unfolded and ideally globular proteins. In addition, we analyse the SV-IV sequence by several publicly available disorder-oriented predictors. Overall, disorder predictions and a re-examination of existing experimental data strongly suggest that SV-IV needs large plasticity to efficiently interact with the different targets that characterize its multifaceted biological function, and should therefore be better classified as an intrinsically disordered protein.

Abbreviations: HCA, hydrophobic cluster analysis; IDPs, intrinsically disordered proteins; PDB, protein data bank; SV-IV, rat seminal vesicle protein no. 4; SVM, support vector machine.

FEBS Journal 275 (2008) 763–774 © 2008 The Authors Journal compilation © 2008 FEBS

Of the proteins studied in our laboratory, SV-IV (seminal vesicle protein no.
4, so identified according to its electrophoretic mobility in SDS-PAGE; precursor SWISS-PROT ID, SVP2_RAT) is a basic (pI = 8.9), thermostable protein of 90 residues (Mr = 9758) secreted from the rat seminal vesicle epithelium under strict androgen transcriptional control, which has been found to possess potent non-species-specific immunomodulatory, anti-inflammatory and procoagulant properties [8]. It has been purified to homogeneity and characterized extensively [8–10]. It is encoded by a gene that has been isolated, sequenced and expressed in Escherichia coli [11–14]. On the basis of its biological and biochemical characteristics, SV-IV appears to be a molecule of obvious pharmacological interest. SV-IV-immunorelated proteins have been discovered in several rat tissues, as well as in human seminal fluid and seminal vesicle secretion [13,14]. The segment 3–41 of SV-IV has been found to have a high amino acid sequence similarity with the C-terminal segment 34–66 of uteroglobin, a secreted protein from rabbit displaying phospholipase A2 inhibitory activity in vitro and anti-inflammatory effects in vivo [15,16]. Others have also been able to prepare potent anti-inflammatory peptides from the region of highest similarity between uteroglobin and lipocortin I, a protein that has been suggested to mediate the anti-inflammatory effects of glucocorticoids [17]. It is therefore highly desirable to obtain as complete structural information as possible.

From a structural standpoint, early circular dichroism and fluorescence polarization data indicated scarce structural organization [18]. This agreed with a predictor of local flexibility [19], although other predictive algorithms have, by contrast, suggested either the presence [18] or lack [20] of an appreciable amount of secondary structure.
Recently, it has been found that, in the range of physiological concentrations (2–48 µM [20,21]), the peculiar biological properties of SV-IV are probably modulated by a supramolecular equilibrium in which a trimeric form competes with monomeric protein for binding to a large variety of SV-IV targets [20]. Eventually, Caporale et al. [22] found agreement between the amounts of predicted and experimental helical structure present in the monomeric form (20 and 24%, respectively), and attempted to create a theoretical model of the structural organization of SV-IV. However, on noting that homology modelling and fold-recognition strategies were not able to provide detailed structural information, they concluded that ‘SV-IV assumes a global structure that is not similar to any protein of known three-dimensional structure’ [22]. Indeed, such an occurrence suggests that SV-IV could violate the standard sequence–structure–function paradigm, but the authors did not investigate this possibility.

We have verified that, in terms of disorder- and order-promoting amino acid subsets [23,24], the composition of SV-IV does not strictly conform to trends previously found to occur in IDPs, except for a very high content of serine (24%). Furthermore, a search of the DisProt database [25] did not return any hits for SV-IV, indicating that no DisProt sequence resembles this protein. However, novel information obtained by publicly available disorder-oriented predictors emphasizes that the functional state of SV-IV lacks significant structural organization. This evidence is sufficient to confidently state that SV-IV can be classified amongst IDPs. Incidentally, the present work also confirms that homology modelling and fold-recognition strategies are best suited to obtain information on the architecture of ordered proteins, but the study of IDPs as if they were ordered can prove to be highly frustrating.
Thus, when dealing with proteins of uncertain three-dimensional structure, it would be more correct and less time-consuming to look for disorder before attempting modelling procedures.

Results

Survey of existing structural information

In addition to fluorescence polarization and both far- and near-UV circular dichroism data from our laboratory [18,20,22], experimental evidence that regular structure is scarce in SV-IV comes from SDS-PAGE, which is routinely used to assess the Mr values of proteins. Because of their unusual amino acid composition, IDPs bind less SDS than usual and their apparent Mr value is often 1.2–1.8 times higher than the real value calculated from sequence data or measured by mass spectrometry [7]. Indeed, the mobility of SV-IV in SDS-PAGE is compatible with an Mr value of about 15 000–18 000 [9], which can be compared with an Mr value of 9758 calculated from the sequence. Size-exclusion chromatography also indicates that the hydrodynamic radius of SV-IV resembles that of an IDP [7], because purified SV-IV elutes well behind chymotrypsinogen (Mr = 25 600) and slightly ahead of RNase A (Mr = 13 600) [9]. Finally, digestion of SV-IV with trypsin suggests that all but Lys80 of the potential proteolytic sites represented by nine lysine and seven arginine residues are able to efficiently interact with the catalytic site of the enzyme [22], as expected for an IDP-like polypeptide [7]. This piece of information has prompted us to perform predictive analyses aimed at clarifying whether or not the SV-IV sequence is compatible with the classical sequence–structure–function paradigm.

Analysis of physicochemical parameters

It has recently emerged that protein disorder tends to be related to general chemical properties, rather than to the abundance or scarcity of specific amino acids [26].
Indeed, like early analyses of protein disorder that were based on the reasoning that protein folding is governed by a balance between hydrophobic forces (attractive) and electrostatic forces between similarly charged residues (repulsive) [23], disorder-oriented predictors largely use physicochemical parameters, such as hydrophobicity [24,27–33], the absolute value of the net charge [24,27–29,33], Cα B-factors [24,27–29,32,34] and number of contacts [35–38]. Accordingly, we obtained preliminary information on the structural preference of SV-IV by comparing values per residue of these parameters with those of two protein databases composed of ideally globular [35] and natively unfolded [39] proteins, respectively. Visual inspection of two-dimensional plots obtained by considering all possible combinations of two parameters suggests that SV-IV has a strong preference to conform to the general structural features expected for IDPs, because in no case do SV-IV data points fall in regions populated by ordered proteins (Fig. 1).

General prediction analysis

Owing to increased interest in the structure–function relationships of IDPs, disorder-related literature is increasing, as witnessed by several recent reviews [40–43]. To obtain prediction reliability, two general options are presently available: (a) the combined use of ab initio algorithms, such as a recent scheme based on well-known predictors [23]; or (b) recent programs with improved performance on some benchmarks, such as those based on expected packing density [36–38] or support vector machine (SVM) methods [44–46] (see Materials and methods for further details). However, as the SV-IV sequence comprises amino acid subsets different from those previously found to occur in IDPs [23,24] and does not resemble any known sequence included in the DisProt database [25], it may be valuable to proceed with caution and investigate both options.
The first procedure comprises a preliminary search for low-complexity regions through the seg algorithm [47], followed by a thorough analysis benefiting from the combined use of several ab initio methods, such as pondr (VSL1 and VL-XT) [24,27–29], hydrophobic cluster analysis (hca) [30], prelink [31], globplot [32], disembl [34], ronn [48], iupred [49], disopred 2 [50] and norsp [51]. When applied to SV-IV, seg identified a long non-globular region spanning the entire sequence, except for a few amino acids at the N- and C-termini (amino acids 1–4 and 84–90, respectively). Other structural peculiarities, such as disulfide-forming cysteine residues, zinc fingers and leucine zippers [52], are absent from the SV-IV sequence. On the functional side, SV-IV is predicted to be a metal binding protein [53], but the expected probability of correct classification is about 60%, which is lower than the actual classification accuracy based on the analysis of 9932 positive and 45 999 negative samples of proteins [54]. The vast majority of the other methods also converged to indicate an abundance of intrinsic disorder in SV-IV, except for a few amino acids in the C-terminal region. In particular, hydrophobic clusters, which are typical of secondary structure elements, were almost totally absent from the hca plot, and prelink predicted the whole sequence as disordered. By contrast, some regular structure was predicted by X-ray-based algorithms, such as various disembl routines and disopred 2 (segments 31–39, 49–59 and 77–90), and discrepancies also affected globplot analyses, depending on the particular order–disorder propensity set chosen to obtain predictions, but in no case were potential globular domains predicted. When subjected to norsp, the SV-IV protein did not appear to conform to criteria fixed for identifying non-regular secondary structure (NORS) regions, although about 70% of residues were predicted to be in loopy regions.
We suspect that no NORS region can be predicted in SV-IV because the recommended length of the sequence window used to calculate the structural content (70 amino acids) is close to the protein length (90 amino acids). Finally, a vanishingly small probability of coiled-coil regions was also predicted by multicoil [55] and coils [56] algorithms (not shown). The above results are summarized in Fig. 2.

Another set of predictions was performed using algorithms that have been reported to predict protein disorder more accurately than other methods, namely the foldunfold predictor [36–38] and the SVM-based poodle suite [44–46]. According to foldunfold, SV-IV is probably fully disordered, because the average value of the disorder parameter over its sequence is less than the disorder threshold. Moreover, the average value of the disorder parameter over regions 1–34, 36–57 and 59–80 is less than the disorder threshold and the regions are greater than the reliable frame (11 residues), which means that these regions are predicted as fully disordered (Fig. 3A). Similarly, poodle predictions suggest that: (a) the entire SV-IV sequence corresponds to a long disorder region (poodle-l); (b) a few residues (amino acids 39–40 and 85–90) do not belong to short disorder regions (poodle-s); and (c) disorder characterizes the whole protein because of the high disorder propensity of all residues (poodle-w) (Fig. 3B).

Other predictions

To complete our analysis, we verified whether or not SV-IV possesses biased amino acid composition and can be maximally separated from globular proteins. Both features have been found to occur in IDPs. On the first point, Weathers et al.
[26,57] have recently examined the contribution of various vectors to recognizing proteins that contain disordered regions through an SVM trained on naturally occurring disordered and ordered proteins. They found that high recognition accuracy can be obtained by an SVM that incorporates only amino acid composition, and very good recognition accuracy was retained using reduced sets of amino acids based on chemical similarity. Overall, this suggests that composition alone and general physicochemical properties, rather than specific amino acids, are sufficient to accurately recognize disorder. We applied the SVM method to compare the SV-IV sequence with the primary structures of 80 ideally folded and 90 natively unfolded proteins. Fig. 4A shows the mean values of the disorder score for all of these proteins. Although the regions covered by the two protein datasets overlap to some extent, the SV-IV datum clearly belongs to the region populated by natively unfolded proteins. With regard to the second point, other authors [35] have devised an optimal set of artificial parameters for 20 amino acid residues by a Monte Carlo algorithm, by which they have obtained maximal separation between sets of natively unfolded and ideally globular proteins. Following the same rationale as above, we compared the mean value of the artificial parameter for SV-IV and the two sets of proteins. Even in this case, the SV-IV datum unequivocally falls amongst natively unfolded proteins, whose data points are well separated from those of globular proteins (Fig. 4B). Finally, Fig. 4C summarizes the results obtained by other algorithms, such as dispro [58], some additional methods not included in the pondr package developed by Dunker et al. [59,60], and drippred [61]. All of these algorithms agreed in predicting that 100% of the amino acids in the SV-IV sequence are disordered, except drippred, which scored 32% of residues as regular structure.

Fig. 1. Two-dimensional plots. The SV-IV datum (red symbol) is compared with the two sets of 90 natively unfolded and 80 ideally globular proteins (black and grey symbols, respectively) using the mean values of physicochemical parameters computed from the sequence. (A) Number of contacts versus hydrophobicity. (B) Number of contacts versus net charge. (C) Number of contacts versus Cα B-factors. (D) Net charge versus hydrophobicity. (E) Net charge versus Cα B-factors. (F) Hydrophobicity versus Cα B-factors.

Fig. 2. Analysis of the SV-IV sequence using well-known predictors. The original graphic output of each method and the corresponding interpretation are shown. In HCA, the protein sequence is shown on a duplicated α-helical net with hydrophobic clusters identified by solid contours and amino acid numbers indicated on the top. Special symbols refer to proline, glycine, threonine and serine, respectively.

Fig. 3. Analysis of the SV-IV sequence using improved performance programs. Graphic output of FOLDUNFOLD [36–38] (A) and POODLE [44–46] (B) predictors. (A) Expected number of contacts versus residue position. (B) Disorder probability versus residue position. Panel annotations: POODLE-S, aa 39–40 and 85–90 have borderline disorder (probability very close to 0.5) and the remaining regions are predicted as disordered; POODLE-L, POODLE-W and FOLDUNFOLD, the whole protein is predicted as disordered.

Discussion

The structural information re-examined here indicates that intrinsic disorder is abundant in SV-IV. Thus, it was to be expected that homology modelling and fold-recognition strategies would be unable to create a theoretical model of the structural organization of SV-IV [22]. Indeed, we have used several disorder predictors to obtain novel evidence that the odd behaviour of SV-IV is not compatible with the classical sequence–structure–function paradigm. Our predictions suggest that: (a) the entire SV-IV sequence does not encode any region with globular organization; (b) a few isolated segments (mostly the C-terminal region) may possess some regular structure; (c) the prediction of regular structure almost exclusively comes from methods based on Protein Data Bank (PDB) missing coordinates (disembl routines, disopred 2 and drippred) and secondary structure-derived propensities (globplot with Deleage–Roux and Russell–Linding parameters); and (d) the mean physicochemical properties of SV-IV are typical of IDPs, as suggested by methods based on visual inspection. This could provide a clue for the clarification of the still obscure aspects of the SV-IV structure–function relationships. Lack of consensus affecting disorder prediction in some regions of SV-IV may result from the different sensitivity displayed by disorder predictors towards the various functional properties that are encoded in separate segments of the protein sequence.
Indeed, integrity of the primary structure was found to be necessary for immunomodulation, whereas all of the procoagulant and anti-inflammatory properties were located in the fragment 1–70, which is devoid of any immunomodulatory activity, but possesses the same procoagulant and anti-inflammatory activity as the native protein. Moreover, the fragment 8–16 was the shortest N-terminal-derived peptide that possessed equivalent or slightly higher anti-inflammatory activity than the native protein, but did not possess any immunomodulatory or procoagulant activity. Finally, CNBr cleavage of SV-IV at the single Met70 residue generated the biologically inactive 71–90 peptide [16], suggesting that the immunomodulatory properties of SV-IV are strictly governed by the cooperation between this and the 1–70 region.

Fig. 4. Additional predictions of disorder. Comparison of the SV-IV sequence with the primary structures of 90 natively unfolded and 80 ideally globular proteins (same symbols as in Fig. 1) using the SVM method [26,57] (A) and an optimal set of artificial parameters [35] (B). (C) Results obtained by other algorithms. Axes: disorder score (A) and artificial parameters (B) versus number of residues in protein.

Predictor	Disordered region
DISpro	1–90
VL3, VL3H, VL3E	1–90
DRIPPRED	1–11, 18–47, 58–80
VL2	1–90

Concerning the organization of SV-IV, the results reported here are in substantial agreement with previous secondary structure predictions, at least with regard to the 1–70 region.
In fact, the self-association process that underlies the overall functional behaviour of the protein induces conformational changes mainly in this region, which has been suggested to be without secondary structure in the monomer, but to contain some α-helix in the trimer [22]. However, minor discrepancies amongst disorder predictions, as well as between disorder and secondary structure predictions, suggest that several peptide segments within the protein sequence might display chameleon structural behaviour. In this regard, previous experiments in buffer solution [18] have shown that a structural rearrangement of SV-IV takes place after treatment with 0.2–6.0 mM SDS. As this interval includes the critical micellar concentration of the surfactant (2.6 mM) [62,63], it may be inferred that SV-IV interacts with the membrane-like environment of SDS micelles, either through direct formation of a protein–surfactant complex or by an indirect process in which the micelle is formed first and the protein is then inserted into it. This process is totally different from the non-specific massive cooperative binding of SDS to proteins at submicellar concentrations, and mimics the situation that SV-IV experiences in most cell-based biological assays, where its multifaceted biological function involves efficient binding to the plasma membrane of its target cells (macrophages, T lymphocytes and polymorphonuclear cells) at specific sites (Kd ≈ 10⁻⁷–10⁻⁸) [16], and can be obtained only through large plasticity of the structure.

Materials and methods

Protein databases

The database of disordered proteins was created using a list of natively unfolded proteins [39] and the SWISS-PROT protein sequence data bank [64]. The ideal database of globular proteins is available at the address http://phys.protres.ru/resources/folded_80.html [35,37], as selected by inspecting the four general classes in the SCOP database (1.63 release) [65].
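As an illustration of the two simplest sequence-derived quantities defined under 'Physicochemical parameters' (mean hydrophobicity and mean net charge), the following is a minimal Python sketch, not the authors' code. It assumes that the Kyte–Doolittle scale is rescaled linearly from [−4.5, 4.5] to [0, 1], and that at pH 7.0 only Lys and Arg count as positive and only Asp and Glu as negative (His treated as neutral). The function names and the toy peptide are hypothetical; the peptide is not the SV-IV sequence.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy per residue, rescaled to the 0-1 range.

    Assumption: the rescaling is linear, (h + 4.5) / 9.0.
    """
    rescaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence]
    return sum(rescaled) / len(rescaled)


def mean_net_charge(sequence: str) -> float:
    """|positives - negatives| divided by the number of residues.

    Assumption: at pH 7.0, K and R are positive, D and E are negative,
    and His is neutral.
    """
    positives = sum(sequence.count(aa) for aa in "KR")
    negatives = sum(sequence.count(aa) for aa in "DE")
    return abs(positives - negatives) / len(sequence)


if __name__ == "__main__":
    toy_peptide = "MKSERKDLLGS"  # hypothetical example, NOT the SV-IV sequence
    print("mean hydrophobicity:", round(mean_hydrophobicity(toy_peptide), 3))
    print("mean net charge:", round(mean_net_charge(toy_peptide), 3))
```

Each protein then contributes one (hydrophobicity, net charge) point, which is how a single sequence can be placed on two-dimensional plots of the kind shown in Fig. 1D. The number of contacts and B-factor scales would need the published per-residue tables from [35] and [32], which are not reproduced here.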
Physicochemical parameters

The mean protein hydrophobicity was calculated using the Kyte–Doolittle scale [66], rescaled to a range of 0–1 [33]. The expected average number of contacts per residue in the globular state was calculated according to [35]. The mean net charge was defined as the absolute value of the difference between the numbers of positively and negatively charged residues at pH 7.0, divided by the total residue number, according to [39]. The average structural B-factor (isotropic temperature factor) scale (2.0 SD) was obtained from [32], where only the B-factors for the Cα atoms were considered to minimize influence by crystal packing and other structural artefacts.

Predictors of disorder

Below, we list all predictors used in this study, pointing out their salient features. A detailed description of each predictor is outside the scope of this paper, and the reader interested in more details is invited to refer to the relevant article(s).

The seg algorithm (http://mendel.imp.ac.at/METHODS/seg.server.html), based on the rationale that compact globular structures exhibit quasi-random statistical properties, is designed to detect regions of biased amino acid composition using mathematically defined properties [47]. The stringency of the search for low-complexity segments is determined by three user-defined parameters [trigger window, W; trigger complexity, K(1); extension complexity, K(2)]; we used the seg settings 45, 3.4, 3.75 and 25, 3.0, 3.3 for long and short non-globular domains, respectively.

Predictors of natural disordered regions (PONDRs) included in the pondr collection (http://www.pondr.com) are typically feed-forward neural networks trained on non-redundant sets of ordered and disordered sequences that help to ensure modest predictor biases and to enable the predictors to generalize to new sequences [27–29]. PONDRs come in several versions depending on the sequence attributes taken over windows of 9–21 amino acids.
These attributes, such as the fractional composition of particular amino acids, hydropathy or sequence complexity, are averaged over these windows, and the values are used to train the neural network during predictor construction. The same values are used as inputs to make predictions.

The regional order neural network (ronn) software, originally developed to identify protease cleavage sites, is a method based on sequence alignment available at http://www.strubi.ox.ac.uk/RONN [48].

The iupred server at http://iupred.enzim.hu estimates favourable pairwise contacts in protein sequences and assigns order/disorder status based on the assumption that intrinsically unstructured/disordered proteins and domains (IUPs) have special sequences that do not fold because of their inability to form sufficient stabilizing inter-residue interactions [49].

The disembl software available at http://dis.embl.de is based on artificial neural networks trained to assign disorder by using three different definitions of disorder: residues within loops/coils, residues within loops with a high degree of mobility as determined from X-ray temperature factors (B-factors), and residues with PDB missing coordinates as defined by Remark465 entries in PDB [34].

The disopred 2 disorder prediction server at http://bioinf.cs.ucl.ac.uk/disopred restrains the definition of disorder to those residues that appear in the sequence records but with coordinates missing from the electron density map, and an SVM was trained to specifically recognize these [50].
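The window averaging described above for the PONDR predictors can be sketched as follows. This is a minimal illustration of a sliding-window mean over a per-residue attribute, not the PONDR implementation: the function name `windowed_mean` is hypothetical, and shrinking the window at the sequence termini is an assumption, because the text does not say how the ends are handled.

```python
from typing import List, Sequence


def windowed_mean(values: Sequence[float], window: int) -> List[float]:
    """Average a per-residue attribute over a centred sliding window.

    Assumption: near the termini the window is truncated to the residues
    that actually exist, so the output has one value per input residue.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    means = []
    for i in range(len(values)):
        start = max(0, i - half)
        stop = min(len(values), i + half + 1)
        chunk = values[start:stop]
        means.append(sum(chunk) / len(chunk))
    return means


if __name__ == "__main__":
    toy_peptide = "SSASKSSGSA"  # hypothetical example, NOT the SV-IV sequence
    # Fractional serine composition: indicator per residue, then window mean.
    is_serine = [1.0 if aa == "S" else 0.0 for aa in toy_peptide]
    print([round(x, 2) for x in windowed_mean(is_serine, 9)])
```

Any per-residue attribute (hydropathy, a composition indicator, a complexity score) can be passed as `values`; window sizes of 9–21 residues correspond to the range quoted in the text.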
globplot (http://globplot.embl.de) is a web service based on the tendency of residues to be in an ordered or disordered state, and uses different propensity sets based on amino acid hydrophobicities (Kyte–Doolittle and Hopp–Woods), B-factors, PDB missing coordinates and secondary structure-derived propensities (Deleage–Roux and Russell–Linding) [32].

norsp is an on-line predictor of NORS regions that is not trained on any dataset and predicts segments in which the content in regular secondary structure is below 12% over at least 70 consecutive residues, and at least 10 consecutive residues are predicted to be exposed. It can be accessed at http://cubic.bioc.columbia.edu/services/NORSp [51].

The identification of hydrophobic clusters was performed by hca available at http://bioserv.rpbs.jussieu.fr, which allows globular regions to be easily distinguished from non-globular ones and, in globular regions, the identification of secondary structures [30]. prelink (http://genomics.eu.org/spip/PreLink) is an hca-derived method that calculates the amino acid distributions in structured and unstructured regions, the probability that a given sequence fragment is part of either a structured or an unstructured region, and the distance of each amino acid to the nearest hydrophobic cluster. Using these three values along a protein sequence, unstructured regions can be predicted with very simple rules [31].

The multicoil program (http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi) predicts the location of coiled-coil regions in amino acid sequences and classifies the predictions as dimeric or trimeric [55]. coils (http://ch.embnet.org/software/COILS_form.html) is a program that compares a sequence with a database of known parallel two-stranded coiled-coils and derives a similarity score.
By comparing this score with the distribution of scores in globular and coiled-coil proteins, the program then calculates the probability that the sequence will adopt a coiled-coil conformation [56].

Predictions with improved performance were carried out by the foldunfold web server available at http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi, based on the observation that disorder is connected to a weak expected packing density, as evaluated by the observed number of contacts within 8 Å for each amino acid residue in the globular state [35–38], and the SVM-based poodle (prediction of order and disorder by machine learning, http://mbs.cbrc.jp/poodle) system. The poodle suite predicts protein disorder from amino acid sequences and provides three types of predictions: poodle-l and poodle-s predict long disorder regions (mainly longer than 40 consecutive amino acids) and short disorder regions, respectively; poodle-w is for binary prediction of whole protein disorder [44–46].

Another SVM method for recognizing IDPs was applied according to the procedure described in [26,57], using the mySVM implementation of SVM theory by Rüping [67]. The set of artificial parameters for 20 amino acid residues calculated by the Monte Carlo algorithm to maximally separate natively unfolded and ideally globular proteins was obtained from [35].

Additional predictions were performed by: dispro software (http://www.igb.uci.edu/servers/psss.html), which relies on machine learning methods and leverages evolutionary information as well as predicted secondary structure and relative solvent accessibility [58]; the VL2 and VL3 predictors available at http://www.ist.temple.edu/disprot/predictor.php, which rely on partitioning protein disorder into flavours based on competition amongst increasing numbers of predictors [59] and on an ensemble of feed-forward neural networks based on the same attributes as VL2 [60], respectively; and the drippred server (http://www.sbc.su.se/~maccallr/disorder), developed for sequence profile visualization and contact map prediction, which predicts structural disorder by looking for sequence patterns that are not typically found in the PDB [61].

Acknowledgements

This paper is dedicated to the memory of the unforgettable Harold C. Helgeson (a.k.a. Hal), founder of the Laboratory of Theoretical Geochemistry and Biogeochemistry at U. C. Berkeley (a.k.a. Prediction Central), who is probably sailing off the coast near Margaritaville. The authors are grateful to V. N. Uversky for his help in creating the list of natively unfolded proteins.

References

1 Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM & Obradovic Z (2002) Intrinsic disorder and protein function. Biochemistry 41, 6573–6582.
2 Wright PE & Dyson HJ (1999) Intrinsically unstructured proteins: re-assessing the protein structure–function paradigm. J Mol Biol 293, 321–331.
3 Dyson HJ & Wright PE (2005) Intrinsically unstructured proteins and their functions. Nat Rev Mol Cell Biol 6, 197–208.
4 Dunker AK & Obradovic Z (2001) The protein trinity – linking function and disorder. Nat Biotechnol 19, 805–806.
5 Uversky VN (2002) Natively unfolded proteins: a point where biology waits for physics. Protein Sci 11, 739–756.
6 Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic Z, Uversky VN & Dunker AK (2007) Intrinsic disorder and functional proteomics. Biophys J 92, 1439–1456.
7 Tompa P (2002) Intrinsically unstructured proteins. Trends Biochem Sci 27, 527–533.
8 Metafora S, Esposito C, Caputo I, Lepretti M, Cassese D, Dicitore A, Ferranti P & Stiuso P (2007) Seminal vesicle protein IV and its derived active peptides: a possible physiological role in seminal clotting. Semin Thromb Hemost 33, 53–59.
9 Ostrowski MC, Kistler MK & Kistler WS (1979) Purification and cell-free synthesis of a major protein from rat seminal vesicle secretion. A potential marker for androgen action. J Biol Chem 254, 383–390.
10 Pan Y-CE & Li SSL (1982) Structure of secretory protein IV from rat seminal vesicles. Int J Pept Protein Res 20, 177–187.
11 Harris SE, Mansson P-E, Tully DB & Burkhart B (1983) Seminal vesicle secretion IV gene: allelic difference due to a series of 20-base-pair direct tandem repeats within an intron. Proc Natl Acad Sci USA 80, 6460–6464.
12 Kandala C, Kistler MK, Lawther RP & Kistler WS (1983) Characterization of a genomic clone for rat seminal vesicle secretory protein IV. Nucleic Acids Res 11, 3169–3186.
13 McDonald C, Williams L, McTurck P, Fuller F, McIntosh E & Higgins S (1983) Isolation and characterisation of genes for androgen-responsive secretory proteins of rat seminal vesicles. Nucleic Acids Res 11, 917–930.
14 D’Ambrosio E, Del Grosso N, Ravagnan G, Peluso G & Metafora S (1993) Cloning and expression of the rat genomic DNA sequence coding for the secreted form of the protein SV-IV. Bull Mol Biol Med 18, 215–223.
15 Metafora S, Facchiano F, Facchiano A, Esposito C, Peluso G & Porta R (1987) Homology between rabbit uteroglobin and the rat seminal vesicle sperm binding protein: prediction of structural features of glutamine substrates for transglutaminase. J Protein Chem 6, 353–359.
16 Ialenti A, Santagada V, Caliendo G, Severino B, Fiorino F, Maffia P, Ianaro A, Morelli F, Di Micco B, Cartenì M et al. (2001) Synthesis of novel anti-inflammatory peptides derived from the amino-acid sequence of the bioactive protein SV-IV. Eur J Biochem 268, 3399–3406.
17 Miele L, Cordella-Miele E, Facchiano A & Mukherjee AB (1988) Novel anti-inflammatory peptides from the region of highest similarity between uteroglobin and lipocortin I. Nature 335, 726–730.
18 Stiuso P, Ragone R, De Santis A, Metafora S, Peluso G, Ravagnan G & Colonna G (1989) Structural properties of rat seminal vesicle protein IV: effect of sodium dodecylsulfate. In Biochemical Aspects on the Immunopathology of Reproduction (Spera G, Mukherjee AB, Ravagnan G & Metafora S, eds), pp. 105–111. Acta Medica, Rome.
19 Ragone R, Facchiano F, Facchiano A, Facchiano AM & Colonna G (1989) Flexibility plot of proteins. Protein Eng 2, 497–504.
20 Stiuso P, Metafora S, Facchiano AM, Colonna G & Ragone R (1999) The self association of protein SV-IV and its possible functional implications. Eur J Biochem 266, 1029–1035.
21 Tufano MA, Porta R, Farzati B, Di Pierro P, Rossano F, Catalanotti P, Baroni A & Metafora S (1996) Rat seminal vesicle protein SV-IV and its transglutaminase-synthesized polyaminated derivative Spd2-SV-IV induce cytokine release from human resting lymphocytes and monocytes in vitro. Cell Immunol 168, 148–157.
22 Caporale C, Caruso C, Colonna G, Facchiano A, Ferranti P, Mamone G, Picariello G, Colonna F, Metafora S & Stiuso P (2004) Structural properties of the protein SV-IV. Eur J Biochem 271, 263–271.
23 Ferron F, Longhi S, Canard B & Karlin D (2006) A practical overview of protein disorder prediction methods. Proteins 65, 1–14.
24 Romero P, Obradovic Z, Li X, Garner EC, Brown CJ & Dunker AK (2001) Sequence complexity of disordered protein. Proteins 42, 38–48.
25 Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky VN et al. (2007) DisProt: the database of disordered proteins. Nucleic Acids Res 35, D786–793.
26 Weathers EA, Paulaitis ME, Woolf TB & Hoh JH (2004) Reduced amino acid alphabet is sufficient to accurately recognize intrinsically disordered protein. FEBS Lett 576, 348–352.
27 Romero P, Obradovic Z & Dunker AK (1997) Sequence data analysis for long disordered regions prediction in the calcineurin family. Genome Inform 8, 110–124.
28 Li X, Romero P, Rani M, Dunker AK & Obradovic Z (1999) Predicting protein disorder for N-, C-, and internal regions. Genome Inform 10, 30–40.
29 Obradovic Z, Peng K, Vucetic S, Radivojac P & Dunker AK (2005) Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins 61 (Suppl. 7), 176–182.
30 Gaboriaud C, Bissery V, Benchetrit T & Mornon JP (1987) Hydrophobic cluster analysis: an efficient new way to compare and analyse amino acid sequences. FEBS Lett 224, 149–155.
31 Coeytaux K & Poupon A (2005) Prediction of unfolded segments in a protein sequence based on amino acid composition. Bioinformatics 21, 1891–1900.
32 Linding R, Russell RB, Neduva V & Gibson TJ (2003) GlobPlot: exploring protein sequences for globularity and disorder. Nucleic Acids Res 31, 3701–3708.
[...]
... 1453–1459.
35 Garbuzynskiy SO, Lobanov MY & Galzitskaya OV (2004) To be folded or to be unfolded? Protein Sci 13, 2871–2877.
36 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY (2006) FoldUnfold: web server for the prediction of disordered regions in protein chain. Bioinformatics 22, 2948–2949.
37 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY (2006) Prediction of natively unfolded regions in protein chain. Mol Biol ... 341–348.
38 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY (2006) Prediction of amyloidogenic and disordered regions in protein chains. PLoS Comput Biol 2, 1639–1648.
39 Uversky VN, Gillespie JR & Fink AL (2000) Why are ‘natively unfolded’ proteins unstructured under physiologic conditions? Proteins 41, 415–427.
40 Bourhis JM, Canard B & Longhi S (2007) Predicting protein disorder and induced folding: from theoretical principles to practical applications. Curr Protein Pept Sci 8, 135–149.
41 Quevillon-Cheruel S, Leulliot N, Gentils L, van Tilbeurgh H & Poupon A (2007) Production and crystallization of protein domains: how useful are disorder predictions? Curr Protein Pept Sci 8, 151–160.
42 Dosztányi Z, Sándor M, Tompa P & Simon I (2007) Prediction of protein disorder at the domain level. Curr Protein Pept ...
... predicting protein disorder by using physicochemical features and reduced amino acid set of a position specific scoring matrix. Bioinformatics 23, 2337–2338.
46 Shimizu K, Muraoka Y, Hirose S, Tomii K & Noguchi T (2007) Predicting mostly disordered proteins by using structure-unknown protein data. BMC Bioinformatics 8, 78.
47 Wootton JC (1994) Non-globular domains in protein sequences: automated segmentation ...
... Bornberg-Bauer E, Rivals E & Vingron M (1998) Computational approaches to identify leucine zippers. Nucleic Acids Res 26, 2740–2746.
53 Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B, Cao ZW & Chen YZ (2006) Prediction of the functional class of metal-binding proteins from sequence derived physicochemical properties by support vector machine approach. BMC Bioinformatics 7 (Suppl. 5), S13.
54 Cai CZ, Han LY, Ji ZL, ... web-based support vector machine software for functional classification of a protein from its primary sequence. Nucleic Acids Res 31, 3692–3697.
55 Wolf E, Kim PS & Berger B (1997) multicoil: a program for predicting two- and three-stranded coiled coils. Protein Sci 6, 1179–1189.
56 Lupas A, Van Dyke M & Stock J (1991) Predicting coiled coils from protein sequences. Science 252, 1162–1164.
57 Weathers EA, Paulaitis ME, Woolf TB & Hoh JH (2007) Insights into protein structure and function from disorder-complexity space. Proteins 66, 16–28.
58 Cheng J, Sweredoski M & Baldi P (2005) Accurate prediction of protein disordered regions by mining protein structure data. Data Min Knowl Disc 11, 213–222.
59 Vucetic S, Brown CJ, Dunker AK & Obradovic Z (2003) Flavors of protein disorder. Proteins 52, 573–584.
60 Obradovic ... CJ, Radivojac P, Brown CJ & Dunker AK (2003) Predicting intrinsic disorder from amino acid sequence. Proteins 53 (Suppl. 6), 566–572.
61 MacCallum RM (2004) Striped sheets and protein contact prediction. Bioinformatics 20 (Suppl. 1), I224–I231.
62 Esposito C, Colicchio P, Facchiano A & Ragone R (1998) Effect of a weak electrolyte on the critical micellar concentration of sodium dodecyl sulfate. J Colloid Interface Sci 200, 310–312.
63 Ambrosone L & Ragone R (1998) The interaction of micelles with added species and its similarity to the denaturant binding model of proteins. J Colloid Interface Sci 205, 454–458.
64 Bairoch A & Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28, 45–48.
65 Murzin AG, Brenner SE, Hubbard T & Chothia C (1995) SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol 247, 536–540.
66 Kyte J & Doolittle RF (1982) A simple method for displaying the hydropathic character of a protein. J Mol Biol 157, 105–132.
67 Rüping S (2000) ...

Date posted: 23/03/2014, 07:20
