Abundance of intrinsic disorder in SV-IV, a multifunctional
androgen-dependent protein secreted from rat seminal
vesicle
Silvia Vilasi and Raffaele Ragone
Dipartimento di Biochimica e Biofisica, Naples, Italy
The view that a protein must fold into the correct
shape, as encoded in the amino acid sequence, before
it can function has been deeply rooted in protein science,
even before the three-dimensional structure of a
protein was first solved. However, for some proteins,
especially those involved in signalling and regulation
[1], the unstructured state has been suggested to be
essential for basic cellular functions and recognized as
a separate functional and structural category [2,3].
These are proteins or domains that, in their native
state, are either completely disordered or contain large
disordered regions, and therefore do not fit the stan-
dard sequence–structure–function paradigm, because
intrinsic disorder, whether local or extended to the
entire protein length, is crucially important for their
function. Dunker and Obradovic [4] categorized functional
intrinsically disordered regions into molten globule-like
and random coil-like structural forms, and
Uversky [5] suggested the existence of an additional
pre-molten globule form, whose peculiarity is the pres-
ence of unstable secondary structure. Betraying still
imperfect categorization, these systems are currently
classified as ‘intrinsically disordered proteins’ (IDPs),
but the use of other synonymous expressions, such as
‘intrinsically unstructured proteins’, is widespread in
the literature [6]. More than 100 such proteins are
known, including Tau, Prions, Bcl-2, p53, 4E-BP1 and
eIF1A [5,7].
Keywords
bioinformatics; disorder prediction;
intrinsically disordered proteins; seminal
vesicle protein no. 4; structure–function
relationship
Correspondence
R. Ragone, Dipartimento di Biochimica
e Biofisica, Seconda Università
di Napoli,
via S. Maria di Costantinopoli 16,
80138 Naples, Italy
Fax: +39 081 294136
Tel: +39 081 294042
E-mail: raffrag@tiscali.it;
raffaele.ragone@unina2.it
(Received 30 October 2007, revised 5
December 2007, accepted 13 December
2007)
doi:10.1111/j.1742-4658.2007.06242.x
The potent immunomodulatory, anti-inflammatory and procoagulant
properties of protein no. 4 secreted from the rat seminal vesicle epithelium
(SV-IV) have previously been found to be modulated by a supramolecular
monomer–trimer equilibrium. More structural details that integrate experi-
mental data into a predictive framework have recently been reported.
Unfortunately, homology modelling and fold-recognition strategies were
not successful in creating a theoretical model of the structural organization
of SV-IV. It was inferred that the global structure of SV-IV is not similar
to that of any protein of known three-dimensional structure. Reversing the
classical approach to the sequence–structure–function paradigm, in this
paper we report novel information obtained by comparing the physico-
chemical parameters of SV-IV with two datasets composed of intrinsically
unfolded and ideally globular proteins. In addition, we analyse the SV-IV
sequence by several publicly available disorder-oriented predictors. Overall,
disorder predictions and a re-examination of existing experimental data
strongly suggest that SV-IV needs large plasticity to efficiently interact with
the different targets that characterize its multifaceted biological function,
and should therefore be better classified as an intrinsically disordered
protein.
Abbreviations
HCA, hydrophobic cluster analysis; IDPs, intrinsically disordered proteins; PDB, protein data bank; SV-IV, rat seminal vesicle protein no. 4;
SVM, support vector machine.
FEBS Journal 275 (2008) 763–774 © 2008 The Authors Journal compilation © 2008 FEBS
Of the proteins studied in our laboratory, SV-IV
(seminal vesicle protein no. 4, so identified according to
its electrophoretic mobility in SDS-PAGE; precursor
SWISS-PROT ID, SVP2_RAT) is a basic (pI = 8.9),
thermostable protein of 90 residues (Mr = 9758)
secreted from the rat seminal vesicle epithelium under
strict androgen transcriptional control, which has been
found to possess potent non-species-specific immuno-
modulatory, anti-inflammatory and procoagulant prop-
erties [8]. It has been purified to homogeneity and
characterized extensively [8–10]. It is encoded by a gene
that has been isolated, sequenced and expressed in
Escherichia coli [11–14]. On the basis of its biological
and biochemical characteristics, SV-IV appears to be a
molecule of obvious pharmacological interest. SV-IV-
immunorelated proteins have been discovered in several
rat tissues, as well as in human seminal fluid and semi-
nal vesicle secretion [13,14]. The segment 3–41 of SV-IV
has been found to have a high amino acid sequence
similarity with the C-terminal segment 34–66 of uteroglobin,
a secreted protein from rabbit displaying
phospholipase A2 inhibitory activity in vitro and anti-
inflammatory effects in vivo [15,16]. Others have also
been able to prepare potent anti-inflammatory peptides
from the region of highest similarity between uteroglobin
and lipocortin I, a protein that has been suggested
to mediate the anti-inflammatory effects of glucocortic-
oids [17]. It is therefore highly desirable to obtain as
complete structural information as possible.
From a structural standpoint, early circular dichro-
ism and fluorescence polarization data indicated scarce
structural organization [18]. This agreed with a predic-
tor of local flexibility [19], although other predictive
algorithms contrastingly have suggested either the pres-
ence [18] or lack [20] of an appreciable amount of sec-
ondary structure. Recently, it has been found that, in
the range of physiological concentrations (2–48 µM
[20,21]), the peculiar biological properties of SV-IV are
probably modulated by a supramolecular equilibrium
in which a trimeric form competes with monomeric
protein for binding to a large variety of SV-IV targets
[20]. Eventually, Caporale et al. [22] found agreement
between the amounts of predicted and experimental
helical structure present in the monomeric form
(20 and 24%, respectively), and attempted to create a
theoretical model of the structural organization of SV-
IV. However, on noting that homology modelling and
fold-recognition strategies were not able to provide
detailed structural information, they concluded that
‘SV-IV assumes a global structure that is not similar
to any protein of known three-dimensional structure’
[22]. Indeed, such an occurrence suggests that SV-IV
could violate the standard sequence–structure–function
paradigm, but the authors did not investigate this pos-
sibility.
We have verified that, in terms of disorder- and
order-promoting amino acid subsets [23,24], the com-
position of SV-IV does not strictly conform to trends
previously found to occur in IDPs, except for a very
high content of serine (24%). Furthermore, a search of
the DisProt database [25] did not return any hits for
SV-IV, indicating that no DisProt sequence resembles
this protein. However, novel information obtained by
publicly available disorder-oriented predictors empha-
sizes that the functional state of SV-IV lacks significant
structural organization. This evidence is sufficient to
confidently state that SV-IV can be classified amongst
IDPs. Incidentally, the present work also confirms that
homology modelling and fold-recognition strategies are
best suited to obtain information on the architecture
of ordered proteins, but the study of IDPs as if they
were ordered can prove to be highly frustrating. Thus,
when dealing with proteins of uncertain three-dimensional
structure, it would be more correct and less
time-consuming to look for disorder before attempting
modelling procedures.
Results
Survey of existing structural information
In addition to fluorescence polarization and both far-
and near-UV circular dichroism data from our labora-
tory [18,20,22], experimental evidence that regular
structure is scarce in SV-IV comes from SDS-PAGE,
which is routinely used to assess the Mr values of proteins.
Because of their unusual amino acid composition,
IDPs bind less SDS than usual and their
apparent Mr value is often 1.2–1.8 times higher than
the real value calculated from sequence data or measured
by mass spectrometry [7]. Indeed, the mobility of
SV-IV in SDS-PAGE is compatible with an Mr value
of about 15 000–18 000 [9], which can be compared
with an Mr value of 9758 calculated from the
sequence. Size-exclusion chromatography also indicates
that the hydrodynamic radius of SV-IV resembles that
of an IDP [7], because purified SV-IV elutes well
behind chymotrypsinogen (Mr = 25 600) and slightly
ahead of RNase A (Mr = 13 600) [9]. Finally, digestion
of SV-IV with trypsin suggests that all but Lys80
of the potential proteolytic sites represented by nine
lysine and seven arginine residues are able to efficiently
interact with the catalytic site of the enzyme [22], as
expected for an IDP-like polypeptide [7]. This piece of
information has prompted us to perform predictive
analyses aimed at clarifying whether or not the SV-IV
sequence is compatible with the classical sequence–
structure–function paradigm.
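The SDS-PAGE argument above can be checked with simple arithmetic. The short sketch below uses only the Mr values quoted in the text; the function name is ours and is purely illustrative.

```python
# Ratio of apparent (SDS-PAGE) to sequence-derived Mr for SV-IV,
# using only the figures quoted in the text.
CALCULATED_MR = 9758                  # from the sequence
APPARENT_MR_RANGE = (15_000, 18_000)  # from SDS-PAGE mobility [9]
IDP_RATIO_RANGE = (1.2, 1.8)          # typical IDP interval quoted from [7]


def apparent_to_real_ratio(apparent_mr: float, real_mr: float) -> float:
    """Return how many times larger the apparent Mr is than the real Mr."""
    return apparent_mr / real_mr


low, high = (apparent_to_real_ratio(m, CALCULATED_MR) for m in APPARENT_MR_RANGE)
print(f"apparent/real Mr: {low:.2f}-{high:.2f}")
```

With these inputs the ratio is roughly 1.5 to 1.8, that is, at the upper end of the interval reported for IDPs.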
Analysis of physicochemical parameters
It has recently emerged that protein disorder tends to
be related to general chemical properties, rather than
to the abundance or scarcity of specific amino acids
[26]. Indeed, like early analyses of protein disorder that
were based on the reasoning that protein folding is
governed by a balance between hydrophobic forces
(attractive) and electrostatic forces between similarly
charged residues (repulsive) [23], disorder-oriented
predictors largely use physicochemical parameters,
such as hydrophobicity [24,27–33], the absolute value
of the net charge [24,27–29,33], C-α B-factors [24,27–
29,32,34] and number of contacts [35–38]. Accordingly,
we obtained preliminary information on the structural
preference of SV-IV by comparing values per residue
of these parameters with those of two protein data-
bases composed of ideally globular [35] and natively
unfolded [39] proteins, respectively. Visual inspection
of two-dimensional plots obtained by considering all
possible combinations of two parameters suggests that
SV-IV has a strong preference to conform to the gen-
eral structural features expected for IDPs, because in
no case do SV-IV data points fall in regions populated
by ordered proteins (Fig. 1).
General prediction analysis
Owing to increased interest in the structure–function
relationships of IDPs, disorder-related literature is
increasing, as witnessed by several recent reviews
[40–43]. To obtain prediction reliability, two general
options are presently available: (a) the combined use
of ab initio algorithms, such as a recent scheme based
on well-known predictors [23]; or (b) recent programs
with improved performance on some benchmarks, such
as those based on expected packing density [36–38] or
support vector machine (SVM) methods [44–46] (see
Materials and methods for further details). However,
as the SV-IV sequence comprises amino acid subsets
different from those previously found to occur in IDPs
[23,24] and does not resemble any known sequence
included in the DisProt database [25], it may be valu-
able to proceed with caution and investigate both
options.
The first procedure comprises a preliminary search
for low-complexity regions through the seg algorithm
[47], followed by a thorough analysis benefiting from
the combined use of several ab initio methods, such as
pondr (VSL1 and VL-XT) [24,27–29], hydrophobic
cluster analysis (hca) [30], prelink [31], globplot
[32], disembl [34], ronn [48], iupred [49], disopred2
[50] and norsp [51]. When applied to SV-IV, seg
resulted in a long non-globular region spanning the
entire sequence, except for a few amino acids at the N- and
C-termini (amino acids 1–4 and 84–90, respectively).
Other structural peculiarities, such as disulfide-forming
cysteine residues, zinc fingers and leucine zippers [52],
are absent from the SV-IV sequence. On the functional
side, SV-IV is predicted to be a metal binding protein
[53], but the expected probability of correct classifica-
tion is about 60%, which is lower than the actual clas-
sification accuracy based on the analysis of 9932
positive and 45 999 negative samples of proteins [54].
The vast majority of the other methods also converged
to indicate an abundance of intrinsic disorder in
SV-IV, except for a few amino acids in the C-terminal region.
In particular, hydrophobic clusters, which are typical
of secondary structure elements, were almost totally
absent from the hca plot, and prelink predicted the
whole sequence as disordered. By contrast, some regu-
lar structure was predicted by X-ray-based algorithms,
such as various disembl routines and disopred2 (segments
31–39, 49–59 and 77–90), and discrepancies also
affected globplot analyses, depending on the particu-
lar order–disorder propensity set chosen to obtain pre-
dictions, but in no cases were potential globular
domains predicted. When subjected to norsp, the SV-
IV protein did not appear to conform to criteria fixed
for identifying non-regular secondary structure
(NORS) regions, although about 70% of residues were
predicted to be in loopy regions. We suspect that no
NORS region can be predicted in SV-IV because the
recommended length of the sequence window used to
calculate the structural content (70 amino acids) is
close to the protein length (90 amino acids). Finally, a
vanishingly small probability of coiled-coil regions was
also predicted by multicoil [55] and coils [56] algo-
rithms (not shown). The above results are summarized
in Fig. 2.
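The window-length argument made above for norsp can be illustrated with a minimal sketch of the NORS content criterion (less than 12% regular secondary structure over at least 70 consecutive residues). The sketch assumes a per-residue secondary structure string from any predictor; the function name and the toy strings are hypothetical and are not the SV-IV prediction, and the additional solvent-exposure condition used by norsp is omitted.

```python
def nors_windows(ss: str, window: int = 70, max_regular: float = 0.12) -> list:
    """Return 0-based start positions of windows whose helix + strand
    content is below max_regular.

    ss is a per-residue string: 'H' = helix, 'E' = strand, anything else = loop.
    Only the secondary structure content criterion is checked here.
    """
    hits = []
    for start in range(len(ss) - window + 1):
        segment = ss[start:start + window]
        regular = sum(1 for state in segment if state in "HE")
        if regular / window < max_regular:
            hits.append(start)
    return hits


# Toy 90-residue prediction (hypothetical): 70% loop, 30% helix,
# arranged as three 9-residue helices separated by 21-residue loops.
toy_prediction = ("L" * 21 + "H" * 9) * 3

print(nors_windows(toy_prediction))  # windows that satisfy the criterion
print(len(nors_windows("L" * 90)))   # number of windows in an all-loop chain
```

In the toy case no window qualifies, although 70% of the residues are loops, because every 70-residue window of a 90-residue chain still contains at least two of the three helices. A 90-residue protein offers only 21 such windows, so the outcome depends strongly on where the few regular segments are located.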
Another set of predictions was performed using
algorithms that have been reported to predict protein
disorder more accurately than other methods, namely
the foldunfold predictor [36–38] and the SVM-based
poodle suite [44–46]. According to foldunfold,
SV-IV is probably fully disordered, because the aver-
age value of the disorder parameter over its sequence
is less than the disorder threshold. Moreover, the aver-
age value of the disorder parameter over regions 1–34,
36–57 and 59–80 is less than the disorder threshold
and the regions are greater than the reliable frame
(11 residues), which means that these regions are
predicted as fully disordered (Fig. 3A). Similarly,
poodle predictions suggest that: (a) the entire SV-IV
sequence corresponds to a long disorder region
(poodle-l); (b) a few residues (amino acids 39–40
and 85–90) do not belong to short disorder regions
(poodle-s); and (c) disorder characterizes the whole
protein because of the high disorder propensity of all
residues (poodle-w) (Fig. 3B).
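The windowed reasoning described above for foldunfold (a per-residue parameter compared with a threshold, with only regions longer than the reliable frame of 11 residues being retained) can be sketched as follows. This is not the foldunfold implementation: the smoothing scheme, the profile values and the threshold of 20.0 are hypothetical placeholders, and the actual expected contact numbers per amino acid are those of [35].

```python
from typing import List, Tuple


def smooth(values: List[float], frame: int = 11) -> List[float]:
    """Centred moving average; windows are truncated at the chain ends."""
    half = frame // 2
    averaged = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        averaged.append(sum(values[lo:hi]) / (hi - lo))
    return averaged


def disordered_regions(profile: List[float], threshold: float,
                       frame: int = 11) -> List[Tuple[int, int]]:
    """Return maximal runs (1-based, inclusive) in which the profile stays
    below the threshold and whose length exceeds the frame."""
    regions = []
    start = None
    for i, value in enumerate(profile):
        if value < threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start > frame:
                regions.append((start + 1, i))
            start = None
    if start is not None and len(profile) - start > frame:
        regions.append((start + 1, len(profile)))
    return regions


# Hypothetical profile: 20 low-contact residues, 5 high, 8 low.
toy_profile = [19.0] * 20 + [22.0] * 5 + [19.0] * 8
print(disordered_regions(toy_profile, threshold=20.0))
```

In the toy profile only the first run is reported, because the second one (8 residues) is shorter than the frame. Lower expected contact numbers are taken here to indicate disorder, in line with the packing density rationale given in Materials and methods.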
Other predictions
To complete our analysis, we verified whether or not
SV-IV possesses biased amino acid composition and
can be maximally separated from globular proteins.
Both features have been found to occur in IDPs. On
the first point, Weathers et al. [26,57] have recently
examined the contribution of various vectors to recog-
nizing proteins that contain disordered regions through
an SVM trained on naturally occurring disordered and
ordered proteins. They found that high recognition
accuracy can be obtained by an SVM that incorporates
only amino acid composition, and very good recogni-
tion accuracy was retained using reduced sets of amino
acids based on chemical similarity. Overall, this sug-
gests that composition alone and general physicochem-
ical properties, rather than specific amino acids, are
sufficient to accurately recognize disorder. We applied
Fig. 1. Two-dimensional plots. The SV-IV datum (red symbol) is compared with the two sets of 90 natively unfolded and 80 ideally globular
proteins (black and grey symbols, respectively) using the mean values of physicochemical parameters computed from the sequence.
(A) Number of contacts versus hydrophobicity. (B) Number of contacts versus net charge. (C) Number of contacts versus C-α B-factors.
(D) Net charge versus hydrophobicity. (E) Net charge versus C-α B-factors. (F) Hydrophobicity versus C-α B-factors.
Fig. 2. Analysis of the SV-IV sequence using well-known predictors. The original graphic output of each method and the corresponding
interpretation are shown. In HCA, the protein sequence is shown on a duplicated α-helical net with hydrophobic clusters identified by solid
contours and amino acid numbers indicated on the top. Special symbols denote proline, glycine, threonine and serine.
the SVM method to compare the SV-IV sequence with
the primary structures of 80 ideally folded and
90 natively unfolded proteins. Fig. 4A shows the mean
values of the disorder score for all of these proteins.
Although the regions covered by the two protein data-
sets overlap to some extent, the SV-IV datum clearly
belongs to the region populated by natively unfolded
proteins. With regard to the second point, other
authors [35] have devised an optimal set of artificial
parameters for 20 amino acid residues by Monte Carlo
algorithm, by which they have obtained maximal sepa-
ration between sets of natively unfolded and ideally
globular proteins. Following the same rationale as
above, we compared the mean value of the artificial
parameter for SV-IV and the two sets of proteins.
Even in this case, the SV-IV datum unequivocally falls
amongst natively unfolded proteins, whose data points
are well separated from those of globular proteins
(Fig. 4B). Finally, Fig. 4C summarizes the results
obtained by other algorithms, such as dispro [58],
some additional methods not included in the pondr
package developed by Dunker et al. [59,60], and
Fig. 3. Analysis of the SV-IV sequence using improved performance programs. Graphic output of FOLDUNFOLD [36–38] (A) and POODLE [44–46]
(B) predictors. Panel labels: FOLDUNFOLD, POODLE-S, POODLE-L, POODLE-W. Axes: expected number of contacts versus residue position; disorder probability versus residue position.
Annotations in the figure: amino acids 39–40 and 85–90 have borderline disorder (probability very close to 0.5) and the remaining regions are predicted as disordered; the whole protein is predicted as disordered.
drippred [61]. All of these algorithms agreed in
predicting that 100% of the amino acids in the SV-IV
sequence are disordered, except drippred, which
resulted in 32% of residues scoring as regular
structure.
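The composition-based recognition described above starts from a vector of amino acid frequencies, optionally collapsed onto a reduced alphabet. The sketch below shows how such feature vectors can be built; it does not include the SVM itself. The grouping is a hypothetical illustration of chemical similarity, not the reduced sets of [26,57], and the example sequence is a toy, not SV-IV.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Hypothetical grouping by broad chemical similarity (illustration only).
REDUCED_GROUPS = {
    "hydrophobic": "AVLIMC",
    "aromatic": "FWY",
    "polar": "STNQ",
    "positive": "KRH",
    "negative": "DE",
    "special": "GP",
}


def composition(sequence: str) -> dict:
    """Fraction of each of the 20 standard amino acids in the sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def reduced_composition(sequence: str) -> dict:
    """Composition summed over the (hypothetical) reduced alphabet."""
    comp = composition(sequence)
    return {group: sum(comp[aa] for aa in members)
            for group, members in REDUCED_GROUPS.items()}


toy_sequence = "SSSK"  # toy example, not the SV-IV sequence
print(composition(toy_sequence)["S"])
print(reduced_composition(toy_sequence))
```

Either dictionary, taken in a fixed key order, gives the numeric vector that a classifier would receive; a serine-rich sequence shows up as a large "polar" component in the reduced representation.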
Discussion
The structural information re-examined here indicates
that intrinsic disorder is abundant in SV-IV. Thus, it
was to be expected that homology modelling and
fold-recognition strategies would be unable to create
a theoretical model of the structural organization of
SV-IV [22]. Indeed, we have used several disorder
predictors to obtain novel evidence that the odd
behaviour of SV-IV is not compatible with the classi-
cal sequence–structure–function paradigm. Our predic-
tions suggest that: (a) the entire SV-IV sequence does
not encode any region with globular organization;
(b) a few isolated segments (mostly the C-terminal
region) may possess some regular structure; (c) the
prediction of regular structure almost exclusively
comes from methods based on Protein Data Bank
(PDB) missing coordinates (disembl routines, disopred2
and drippred) and secondary structure-derived
propensities (globplot with Deleage–Roux
and Russell–Linding parameters); and (d) the mean
physicochemical properties of SV-IV are typical of
IDPs, as suggested by methods based on visual
inspection. This could provide a clue for the clarifica-
tion of the still obscure aspects of the SV-IV struc-
ture–function relationships.
Lack of consensus affecting disorder prediction in
some regions of SV-IV may result from the different
sensitivity displayed by disorder predictors towards the
various functional properties that are encoded in sepa-
rate segments of the protein sequence. Indeed, integrity
of the primary structure was found to be necessary for
immunomodulation, whereas all of the procoagulant
and anti-inflammatory properties were located in the
fragment 1–70, which is devoid of any immunomodu-
latory activity, but possesses the same procoagulant
and anti-inflammatory activity as the native protein.
Moreover, the fragment 8–16 was the shortest N-ter-
minal-derived peptide that possessed equivalent
or slightly higher anti-inflammatory activity than
Fig. 4. Additional predictions of disorder. Comparison of the SV-IV sequence with the primary structures of 90 natively unfolded and 80 ideally
globular proteins (same symbols as in Fig. 1) using the SVM method [26,57] (A) and an optimal set of artificial parameters [35] (B).
(A) Disorder score versus number of residues in protein. (B) Artificial parameters versus number of residues in protein.
(C) Results obtained by other algorithms:
Predictor            Disordered region
DISpro               1–90
VL3, VL3H, VL3E      1–90
DRIPPRED             1–11, 18–47, 58–80
VL2                  1–90
the native protein, but did not possess any immuno-
modulatory or procoagulant activity. Finally, CNBr
cleavage of SV-IV at the single Met70 residue gener-
ated the biologically inactive 71–90 peptide [16],
suggesting that the immunomodulatory properties of
SV-IV are strictly governed by the cooperation
between this and the 1–70 region.
Concerning the organization of SV-IV, the results
reported here are in substantial agreement with pre-
vious secondary structure predictions, at least with
regard to the 1–70 region. In fact, the self-association
process that underlies the overall functional behav-
iour of the protein induces conformational changes
mainly in this region, which has been suggested to
be without secondary structure in the monomer, but
to contain some α-helix in the trimer [22]. However,
minor discrepancies amongst disorder predictions, as
well as between disorder and secondary structure
predictions, suggest that several peptide segments
within the protein sequence might display chameleon
structural behaviour. In this regard, previous experi-
ments in buffer solution [18] have shown that a
structural rearrangement of SV-IV takes place after
treatment with 0.2–6.0 mM SDS. As this interval
includes the critical micellar concentration of the surfactant
(2.6 mM) [62,63], it may be inferred that
SV-IV interacts with the membrane-like environment
of SDS micelles, either through direct formation of a
protein–surfactant complex or by an indirect process
in which the micelle is formed first and the protein
is then inserted into it. This process is totally differ-
ent from the non-specific massive cooperative binding
of SDS to proteins at submicellar concentrations,
and mimics the situation that SV-IV experiences in
most cell-based biological assays, where its multi-
faceted biological function involves efficient binding
to the plasma membrane of its target cells (macro-
phages, T lymphocytes and polymorphonuclear cells)
at specific sites ($K_d \approx 10^{-7}$–$10^{-8}$) [16], and can be
obtained only through large plasticity of the
structure.
Materials and methods
Protein databases
The database of disordered proteins was created using a list
of natively unfolded proteins [39] and the SWISS-PROT
protein sequence data bank [64]. The ideal database of
globular proteins is available at the address
http://phys.protres.ru/resources/folded_80.html [35,37], as selected by
inspecting the four general classes in the SCOP database
(1.63 release) [65].
Physicochemical parameters
The mean protein hydrophobicity was calculated using the
Kyte–Doolittle Scale [66], rescaled to a range of 0–1 [33].
The expected average number of contacts per residue in the
globular state was calculated according to [35]. The mean
net charge was defined as the absolute value of the differ-
ence between the numbers of positively and negatively
charged residues at pH 7.0, divided by the total residue
number, according to [39]. The average structural B-factor
(isotropic temperature factor) scale (2.0 SD) was obtained
from [32], where only the B-factors for the C-α atoms were
considered to minimize influence by crystal packing and
other structural artefacts.
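Two of the parameters defined above can be computed directly from a sequence, as in the minimal sketch below. The Kyte–Doolittle values are the published scale [66], linearly rescaled from the interval −4.5 to 4.5 onto 0–1; treating Lys and Arg as the only positively charged residues and Asp and Glu as the only negatively charged residues at pH 7.0 (His taken as neutral) is our assumption, and the function names and test sequences are hypothetical.

```python
# Kyte-Doolittle hydropathy scale [66].
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(sequence: str) -> float:
    """Mean Kyte-Doolittle hydropathy per residue, rescaled to 0-1."""
    rescaled = [(KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in sequence.upper()]
    return sum(rescaled) / len(rescaled)


def mean_net_charge(sequence: str) -> float:
    """|positive - negative| residues divided by sequence length.

    Assumption: K and R count as +1, D and E as -1, H as neutral at pH 7.0.
    """
    seq = sequence.upper()
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return abs(positive - negative) / len(seq)


toy_sequence = "KKRS"  # toy example, not the SV-IV sequence
print(mean_hydrophobicity(toy_sequence), mean_net_charge(toy_sequence))
```

Each protein is thus reduced to a pair of mean values, which is the form in which the SV-IV datum is compared with the two reference sets.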
Predictors of disorder
Below, we list all predictors used in this study, pointing out
their salient features. A detailed description of each predic-
tor is outside the scope of this paper, and the reader inter-
ested in more details is invited to refer to the relevant
article(s). The seg algorithm
(http://mendel.imp.ac.at/METHODS/seg.server.html), based on the rationale that
compact globular structures exhibit quasi-random statistical
properties, is designed to detect regions of biased amino
acid composition using mathematically defined properties
[47]. The stringency of the search for low-complexity
segments is determined by three user-defined parameters
[trigger window, W; trigger complexity, K(1); extension
complexity, K(2)], using the seg parameter sets 45, 3.4, 3.75 and
25, 3.0, 3.3 for long and short non-globular domains,
respectively. Predictors of natural disordered regions
(PONDRs) included in the pondr collection
(http://www.pondr.com) are typically feed-forward neural networks
trained on non-redundant sets of ordered and disordered
sequences that help to ensure modest predictor biases
and to enable the predictors to generalize to new sequences
[27–29]. PONDRs come in several versions depending on
the sequence attributes taken over windows of 9–21 amino
acids. These attributes, such as the fractional composition
of particular amino acids, hydropathy or sequence com-
plexity, are averaged over these windows, and the values
are used to train the neural network during predictor con-
struction. The same values are used as inputs to make pre-
dictions. The regional order neural network (ronn)
software, originally developed to identify protease cleavage
sites, is a method based on sequence alignment available at
http://www.strubi.ox.ac.uk/RONN [48]. The iupred server
at http://iupred.enzim.hu estimates favourable pairwise contacts
in protein sequences and assigns order/disorder status
based on the assumption that intrinsically unstructured/disordered
proteins and domains (IUPs) have special
sequences that do not fold because of their inability to
form sufficient stabilizing inter-residue interactions [49].
The disembl software available at http://dis.embl.de is
based on artificial neural networks trained to assign disor-
der by using three different definitions of disorder: residues
within loops/coils, residues within loops with a high degree
of mobility as determined from X-ray temperature factors
(B-factors), and residues with PDB missing coordinates as
defined by REMARK 465 entries in PDB [34]. The disopred2
disorder prediction server at http://bioinf.cs.ucl.ac.uk/disopred
restricts the definition of disorder to those residues
that appear in the sequence records but with coordinates
missing from the electron density map, and an SVM
was trained to specifically recognize these [50]. globplot
(http://globplot.embl.de) is a web service based on the ten-
dency of residues to be in an ordered or disordered state,
and uses different propensity sets based on amino acid
hydrophobicities (Kyte–Doolittle and Hopp–Woods), B-fac-
tors, PDB missing coordinates and secondary structure-
derived propensities (Deleage–Roux and Russell–Linding)
[32]. norsp is an on-line predictor of NORS regions that is
not trained on any dataset and predicts segments in which
the content in regular secondary structure is below
12% over at least 70 consecutive residues, and at least
10 consecutive residues are predicted to be exposed. It can
be accessed at http://cubic.bioc.columbia.edu/services/NORSp
[51]. The identification of hydrophobic clusters was
performed by hca available at http://bioserv.rpbs.jussieu.fr,
which allows the easy discrimination of globular regions
from non-globular ones and, in globular regions, the identification
of secondary structures [30]. prelink
(http://genomics.eu.org/spip/PreLink) is an hca-derived method
that calculates the amino acid distributions in structured
and unstructured regions, the probability that a given
sequence fragment is part of either a structured or an
unstructured region, and the distance of each amino acid to
the nearest hydrophobic cluster. Using these three values
along a protein sequence, unstructured regions can be predicted
with very simple rules [31]. The multicoil program
(http://groups.csail.mit.edu/cb/multicoil/cgi-bin/multicoil.cgi)
predicts the location of coiled-coil regions in amino acid
sequences and classifies the predictions as dimeric or trimeric
[55]. coils (http://ch.embnet.org/software/COILS_form.html)
is a program that compares a sequence with a
database of known parallel two-stranded coiled-coils and
derives a similarity score. By comparing this score with the
distribution of scores in globular and coiled-coil proteins,
the program then calculates the probability that the
sequence will adopt a coiled-coil conformation [56].
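The notion of biased amino acid composition that underlies the seg search described earlier in this section can be illustrated with a sliding-window Shannon entropy. This is a deliberately simplified stand-in and not the seg algorithm, which applies a two-stage trigger and extension procedure [47]. The window of 45 residues and the cut-off of 3.4 are borrowed from the long non-globular settings quoted above; reading the trigger complexity as an entropy in bits is our assumption, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(segment: str) -> float:
    """Shannon entropy (bits per residue) of the residue composition."""
    counts = Counter(segment)
    n = len(segment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(sequence: str, window: int = 45,
                           threshold: float = 3.4) -> list:
    """Return 0-based starts of windows whose entropy is at or below the
    threshold. Simplified illustration; no extension or merging step."""
    return [start
            for start in range(len(sequence) - window + 1)
            if shannon_entropy(sequence[start:start + window]) <= threshold]


# Toy sequences (hypothetical): a homopolymer and an evenly mixed chain.
homopolymer = "S" * 50
mixed = "ACDEFGHIKLMNPQRSTVWY" * 3

print(len(low_complexity_windows(homopolymer)))
print(low_complexity_windows(mixed))
```

A homopolymer has zero entropy and is flagged in every window, whereas a chain using all 20 amino acids evenly stays above the cut-off and is not flagged.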
Predictions with improved performance were carried out
by the foldunfold web server available at
http://skuld.protres.ru/~mlobanov/ogu/ogu.cgi, based on the observation
that disorder is connected to a weak expected packing
density, as evaluated by the observed number of contacts
within 8 Å for each amino acid residue in the globular state
[35–38], and the SVM-based poodle (prediction of order
and disorder by machine learning, http://mbs.cbrc.jp/poodle)
system. The poodle suite predicts protein disorder
from amino acid sequences and provides three types of pre-
dictions: poodle-l and poodle-s predict long disorder
regions (mainly longer than 40 consecutive amino acids) and
short disorder regions, respectively; poodle-w is for binary
prediction of whole protein disorder [44–46].
Another SVM method for recognizing IDPs was applied
according to the procedure described in [26,57], using the
mySVM implementation of SVM theory by Rüping [67].
The set of artificial parameters for 20 amino acid residues
calculated by the Monte Carlo algorithm to maximally sep-
arate natively unfolded and ideally globular proteins was
obtained from [35]. Additional predictions were performed
by: dispro software (http://www.igb.uci.edu/servers/psss.html),
which relies on machine learning methods and leverages
evolutionary information as well as predicted secondary
structure and relative solvent accessibility [58]; the VL2
and VL3 predictors available at
http://www.ist.temple.edu/disprot/predictor.php, which rely on partitioning protein
disorder into flavours based on competition amongst
increasing numbers of predictors [59] and on an ensemble
of feed-forward neural networks based on the same attri-
butes as VL2 [60], respectively; and the drippred server
(http://www.sbc.su.se/~maccallr/disorder), developed for
sequence profile visualization and contact map prediction,
which predicts structural disorder by looking for sequence
patterns that are not typically found in the PDB [61].
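The SVM-based recognition of IDPs mentioned in this section can be illustrated with a short sketch of the feature construction. It assumes that each protein is represented by its amino acid composition, optionally computed over a reduced alphabet, as the title of [26] suggests. The grouping of residues, the weights and the bias are hypothetical inputs supplied by the caller; the sketch does not use or reproduce mySVM, and only shows how a trained linear model would score a composition vector.

```python
from collections import Counter
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence: str, alphabet: str = AMINO_ACIDS) -> List[float]:
    """Fraction of each alphabet letter in the sequence (other letters are ignored)."""
    counts = Counter(ch for ch in sequence.upper() if ch in alphabet)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no residues from the alphabet")
    return [counts[letter] / total for letter in alphabet]


def reduce_sequence(sequence: str, groups: Dict[str, str]) -> str:
    """Map each residue to its group symbol, dropping residues with no group."""
    return "".join(groups[ch] for ch in sequence.upper() if ch in groups)


def linear_decision(
    features: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Value of w.x + b; its sign is the class assigned by a linear SVM."""
    return sum(w * x for w, x in zip(weights, features)) + bias
```

For example, a hypothetical grouping that maps K and R to "+" and D and E to "-" turns a sequence into a two-letter string, whose composition over the alphabet "+-" is a two-dimensional feature vector that can be passed to `linear_decision`.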
Acknowledgements
This paper is dedicated to the memory of the unforget-
table Harold C. Helgeson (a.k.a. Hal), founder of the
Laboratory of Theoretical Geochemistry and Biogeo-
chemistry at U. C. Berkeley (a.k.a. Prediction Central),
who is probably sailing off the coast near Margarita-
ville. The authors are grateful to V. N. Uversky for his
help in creating the list of natively unfolded proteins.
References
1 Dunker AK, Brown CJ, Lawson JD, Iakoucheva LM &
Obradovic Z (2002) Intrinsic disorder and protein function.
Biochemistry 41, 6573–6582.
2 Wright PE & Dyson HJ (1999) Intrinsically unstruc-
tured proteins: re-assessing the protein structure–func-
tion paradigm. J Mol Biol 293, 321–331.
3 Dyson HJ & Wright PE (2005) Intrinsically unstruc-
tured proteins and their functions. Nat Rev Mol Cell
Biol 6, 197–208.
4 Dunker AK & Obradovic Z (2001) The protein trinity –
linking function and disorder. Nat Biotechnol 19, 805–
806.
5 Uversky VN (2002) Natively unfolded proteins: a point
where biology waits for physics. Protein Sci 11, 739–
756.
S. Vilasi and R. Ragone Intrinsic disorder in SV-IV
FEBS Journal 275 (2008) 763–774 © 2008 The Authors. Journal compilation © 2008 FEBS
6 Radivojac P, Iakoucheva LM, Oldfield CJ, Obradovic
Z, Uversky VN & Dunker AK (2007) Intrinsic disorder
and functional proteomics. Biophys J 92, 1439–1456.
7 Tompa P (2002) Intrinsically unstructured proteins.
Trends Biochem Sci 27, 527–533.
8 Metafora S, Esposito C, Caputo I, Lepretti M, Cassese
D, Dicitore A, Ferranti P & Stiuso P (2007) Seminal
vesicle protein IV and its derived active peptides: a
possible physiological role in seminal clotting. Semin
Thromb Hemost 33, 53–59.
9 Ostrowski MC, Kistler MK & Kistler WS (1979)
Purification and cell-free synthesis of a major protein from
rat seminal vesicle secretion. A potential marker for
androgen action. J Biol Chem 254, 383–390.
10 Pan Y-CE & Li SSL (1982) Structure of secretory
protein IV from rat seminal vesicles. Int J Pept Protein Res
20, 177–187.
11 Harris SE, Mansson P-E, Tully DB & Burkhart B
(1983) Seminal vesicle secretion IV gene: allelic
difference due to a series of 20-base-pair direct tandem
repeats within an intron. Proc Natl Acad Sci USA 80,
6460–6464.
12 Kandala C, Kistler MK, Lawther RP & Kistler WS
(1983) Characterization of a genomic clone for rat
seminal vesicle secretory protein IV. Nucleic Acids Res 11,
3169–3186.
13 McDonald C, Williams L, McTurck P, Fuller F,
McIntosh E & Higgins S (1983) Isolation and
characterisation of genes for androgen-responsive secretory
proteins of rat seminal vesicles. Nucleic Acids Res 11,
917–930.
14 D’Ambrosio E, Del Grosso N, Ravagnan G, Peluso G
& Metafora S (1993) Cloning and expression of the rat
genomic DNA sequence coding for the secreted form of
the protein SV-IV. Bull Mol Biol Med 18, 215–223.
15 Metafora S, Facchiano F, Facchiano A, Esposito C,
Peluso G & Porta R (1987) Homology between rabbit
uteroglobin and the rat seminal vesicle sperm binding
protein: prediction of structural features of glutamine
substrates for transglutaminase. J Protein Chem 6,
353–359.
16 Ialenti A, Santagada V, Caliendo G, Severino B,
Fiorino F, Maffia P, Ianaro A, Morelli F, Di Micco B,
Cartenì M et al. (2001) Synthesis of novel
anti-inflammatory peptides derived from the amino-acid sequence
of the bioactive protein SV-IV. Eur J Biochem 268,
3399–3406.
17 Miele L, Cordella-Miele E, Facchiano A & Mukherjee
AB (1988) Novel anti-inflammatory peptides from the
region of highest similarity between uteroglobin and
lipocortin I. Nature 335, 726–730.
18 Stiuso P, Ragone R, De Santis A, Metafora S, Peluso
G, Ravagnan G & Colonna G (1989) Structural
properties of rat seminal vesicle protein IV: effect of
sodium dodecylsulfate. In Biochemical Aspects on the
Immunopathology of Reproduction (Spera G, Mukherjee
AB, Ravagnan G & Metafora S, eds), pp. 105–111.
Acta Medica, Rome.
19 Ragone R, Facchiano F, Facchiano A, Facchiano AM
& Colonna G (1989) Flexibility plot of proteins. Protein
Eng 2, 497–504.
20 Stiuso P, Metafora S, Facchiano AM, Colonna G &
Ragone R (1999) The self association of protein SV-IV
and its possible functional implications. Eur J Biochem
266, 1029–1035.
21 Tufano MA, Porta R, Farzati B, Di Pierro P,
Rossano F, Catalanotti P, Baroni A & Metafora S
(1996) Rat seminal vesicle protein SV-IV and its
transglutaminase-synthesized polyaminated derivative
Spd2-SV-IV induce cytokine release from human
resting lymphocytes and monocytes in vitro. Cell Immunol
168, 148–157.
22 Caporale C, Caruso C, Colonna G, Facchiano A, Ferr-
anti P, Mamone G, Picariello G, Colonna F, Metafora
S & Stiuso P (2004) Structural properties of the protein
SV-IV. Eur J Biochem 271, 263–271.
23 Ferron F, Longhi S, Canard B & Karlin D (2006) A
practical overview of protein disorder prediction
methods. Proteins 65, 1–14.
24 Romero P, Obradovic Z, Li X, Garner EC, Brown CJ
& Dunker AK (2001) Sequence complexity of dis-
ordered protein. Proteins 42, 38–48.
25 Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese
MS, Tantos A, Szabo B, Tompa P, Chen J, Uversky
VN et al. (2007) DisProt: the database of disordered
proteins. Nucleic Acids Res 35, D786–793.
26 Weathers EA, Paulaitis ME, Woolf TB & Hoh JH
(2004) Reduced amino acid alphabet is sufficient to
accurately recognize intrinsically disordered protein.
FEBS Lett 576, 348–352.
27 Romero P, Obradovic Z & Dunker AK (1997) Sequence
data analysis for long disordered regions prediction in
the calcineurin family. Genome Inform 8, 110–124.
28 Li X, Romero P, Rani M, Dunker AK & Obradovic Z
(1999) Predicting protein disorder for N-, C-, and
internal regions. Genome Inform 10, 30–40.
29 Obradovic Z, Peng K, Vucetic S, Radivojac P &
Dunker AK (2005) Exploiting heterogeneous sequence
properties improves prediction of protein disorder. Proteins
61 (Suppl. 7), 176–182.
30 Gaboriaud C, Bissery V, Benchetrit T & Mornon JP
(1987) Hydrophobic cluster analysis: an efficient new
way to compare and analyse amino acid sequences.
FEBS Lett 224, 149–155.
31 Coeytaux K & Poupon A (2005) Prediction of unfolded
segments in a protein sequence based on amino acid
composition. Bioinformatics 21, 1891–1900.
32 Linding R, Russell RB, Neduva V & Gibson TJ (2003)
GlobPlot: exploring protein sequences for globularity
and disorder. Nucleic Acids Res 31, 3701–3708.
[...] 1453–1459.
35 Garbuzynskiy SO, Lobanov MY & Galzitskaya OV (2004)
To be folded or to be unfolded? Protein Sci 13, 2871–2877.
36 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY (2006)
FoldUnfold: web server for the prediction of disordered
regions in protein chain. Bioinformatics 22, 2948–2949.
37 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY (2006)
Prediction of natively unfolded regions in protein chain.
Mol Biol ... 341–348.
38 Galzitskaya OV, Garbuzynskiy SO & Lobanov MY (2006)
Prediction of amyloidogenic and disordered regions in
protein chains. PLoS Comput Biol 2, 1639–1648.
39 Uversky VN, Gillespie JR & Fink AL (2000) Why are
‘natively unfolded’ proteins unstructured under physiologic
conditions? Proteins 41, 415–427.
40 Bourhis JM, Canard B & Longhi S (2007) Predicting
protein disorder and induced folding: from theoretical
principles to practical applications. Curr Protein Pept Sci
8, 135–149.
41 Quevillon-Cheruel S, Leulliot N, Gentils L, van
Tilbeurgh H & Poupon A (2007) Production and
crystallization of protein domains: how useful are disorder
predictions? Curr Protein Pept Sci 8, 151–160.
42 Dosztányi Z, Sándor M, Tompa P & Simon I (2007)
Prediction of protein disorder at the domain level. Curr
Protein Pept ...
[...] ... predicting protein disorder by using physicochemical
features and reduced amino acid set of a position specific
scoring matrix. Bioinformatics 23, 2337–2338.
46 Shimizu K, Muraoka Y, Hirose S, Tomii K & Noguchi T
(2007) Predicting mostly disordered proteins by using
structure-unknown protein data. BMC Bioinformatics 8, 78.
47 Wootton JC (1994) Non-globular domains in protein
sequences: automated segmentation ...
[...] Bornberg-Bauer E, Rivals E & Vingron M (1998)
Computational approaches to identify leucine zippers.
Nucleic Acids Res 26, 2740–2746.
53 Lin HH, Han LY, Zhang HL, Zheng CJ, Xie B, Cao ZW &
Chen YZ (2006) Prediction of the functional class of
metal-binding proteins from sequence derived physicochemical
properties by support vector machine approach. BMC
Bioinformatics 7 (Suppl. 5), S13.
54 Cai CZ, Han LY, Ji ZL, ... web-based support vector
machine software for functional classification of a protein
from its primary sequence. Nucleic Acids Res 31, 3692–3697.
55 Wolf E, Kim PS & Berger B (1997) multicoil: a program
for predicting two- and three-stranded coiled coils.
Protein Sci 6, 1179–1189.
56 Lupas A, Van Dyke M & Stock J (1991) Predicting coiled
coils from protein sequences. Science 252, 1162–1164.
57 Weathers EA, Paulaitis ME, Woolf TB & Hoh JH (2007)
Insights into protein structure and function from
disorder-complexity space. Proteins 66, 16–28.
58 Cheng J, Sweredoski M & Baldi P (2005) Accurate
prediction of protein disordered regions by mining protein
structure data. Data Min Knowl Disc 11, 213–222.
59 Vucetic S, Brown CJ, Dunker AK & Obradovic Z (2003)
Flavors of protein disorder. Proteins 52, 573–584.
60 Obradovic ... CJ, Radivojac P, Brown CJ & Dunker AK
(2003) Predicting intrinsic disorder from amino acid
sequence. Proteins 53 (Suppl. 6), 566–572.
61 MacCallum RM (2004) Striped sheets and protein contact
prediction. Bioinformatics 20 (Suppl. 1), I224–I231.
62 Esposito C, Colicchio P, Facchiano A & Ragone R (1998)
Effect of a weak electrolyte on the critical micellar
concentration of sodium dodecyl sulfate. J Colloid
Interface Sci 200, 310–312.
63 Ambrosone L & Ragone R (1998) The interaction of
micelles with added species and its similarity to the
denaturant binding model of proteins. J Colloid Interface
Sci 205, 454–458.
64 Bairoch A & Apweiler R (2000) The SWISS-PROT protein
sequence database and its supplement TrEMBL in 2000.
Nucleic Acids Res 28, 45–48.
65 Murzin AG, Brenner SE, Hubbard T & Chothia C (1995)
SCOP: a structural classification of proteins database for
the investigation of sequences and structures. J Mol Biol
247, 536–540.
66 Kyte J & Doolittle RF (1982) A simple method for
displaying the hydropathic character of a protein. J Mol
Biol 157, 105–132.
67 Rüping S (2000) ...