Revisiting the prediction of protein function at CASP6
Marialuisa Pellegrini-Calace1,*, Simonetta Soro1,* and Anna Tramontano1,2
1 Department of Biochemical Sciences ‘A. Rossi Fanelli’, University ‘La Sapienza’, Rome, Italy
2 Istituto Pasteur, Fondazione ‘Cenci-Bolognetti’, University ‘La Sapienza’, Rome, Italy
Modern biology strongly exploits the rapid progress in
generation of experimental data and development of
computational methods. The identification and charac-
terization of proteins on a genome-wide scale is
accomplished by proteomics projects, and computa-
tional biology plays a pivotal role in the determination
of their structure–function relationships and in the pre-
diction of their biological functions [1]. A number of
computational and experimental methods have been
developed in the last few years to try to address the
function prediction issue using available information,
from sequence homology to orthology, gene context
and structural features [2,3]. Despite the complexity of
the relationship between protein fold and protein func-
tion and the existence of proteins with multiple differ-
ent functions [4,5], global structure similarity can often
Keywords
Critical Assessment of Techniques for
Protein Structure Prediction (CASP);
function prediction; protein function;
structural genomics
Correspondence
A. Tramontano, Department of Biochemical
Sciences ‘A. Rossi Fanelli’, University ‘La
Sapienza’, P.le Aldo Moro, 5-00185 Rome,
Italy
Fax: +39 06 4440062
Tel: +39 06 49910556
E-mail: anna.tramontano@uniroma1.it
*These authors contributed equally to this
work.
(Received 21 February 2006, revised
11 April 2006, accepted 5 May 2006)
doi:10.1111/j.1742-4658.2006.05309.x
The ability to predict the function of a protein, given its sequence and/or 3D structure, is an essential requirement for exploiting the wealth of data made available by genomics and structural genomics projects and is therefore raising increasing interest in the computational biology community. To foster developments in the area as well as to establish the state of the art of present methods, a function prediction category was tentatively introduced in the 6th edition of the Critical Assessment of Techniques for Protein Structure Prediction (CASP) worldwide experiment. The assessment of the performance of the methods was made difficult by at least two factors: (a) the experimentally determined function of the targets was not available at the time of assessment; (b) the experiment is run blindly, preventing verification of whether the convergence of different predictions towards the same functional annotation was due to the similarity of the methods or to a genuine signal detectable by different methodologies. In this work, we collected information about the methods used by the various predictors and revisited the results of the experiment by verifying how often and in which cases a convergent prediction was obtained by methods based on different rationale. We propose a method for classifying the type and redundancy of the methods. We also analyzed the cases in which a function for the target protein has become available. Our results show that predictions derived from a consensus of different methods can reach an accuracy as high as 80%. It follows that some of the predictions submitted to CASP6, once reanalyzed taking into account the type of converging methods, can provide very useful information to researchers interested in the function of the target proteins.
Abbreviations
BIND, binding feature number indicating the nature of putative interacting partners; BS, binding site residue identifier; CASP, Critical
Assessment of Techniques for Protein Structure Prediction; GOC, GO cellular component number; GOF, GO molecular function number;
GOP, GO biological process number; PT, post-translational modification number; RESIDUE-ROLE, free comment on residues with a putative
peculiar role.
FEBS Journal 273 (2006) 2977–2983 © 2006 The Authors Journal compilation © 2006 FEBS 2977
help in assigning a function to proteins [6,7]. This abil-
ity is important for the many ongoing structural
genomics projects that will provide us with more and
more structures of proteins with very low sequence
identity with proteins of known structure.
The task of predicting the function of a protein is exceptionally interesting but very challenging. The existence of paralogous relationships implies that a common evolutionary origin does not guarantee common function [8,9]. Moreover, the discovery of moonlighting proteins, able to perform different functions in different conditions or environments, makes the problem even more complex [10,11].
The Critical Assessment of Techniques for Protein Structure Prediction (CASP) [12] community recognized the relevance of this issue and tried to foster novel developments by setting up a function prediction category in addition to the well-known structure prediction ones. The question addressed was whether and in which cases computational methods are able to provide useful information about the molecular or biological function of an unknown protein, with the aim of providing researchers with potentially useful information [12].
This category is intrinsically different from the other CASP categories because, at the end of the experiment, the function of the target protein is likely still to be unknown. However, the analysis of the submitted predictions made the assessors conclude that [13]:
(a) groups predicting the 3D structure of a protein
only rarely used this information to predict its
function as well, and vice versa;
(b) in a substantial fraction of cases, the same predic-
tion was submitted by different groups and there-
fore a ‘prediction consensus’ could be derived for
some targets.
CASP is run blindly. This implies that the assessor should not know the identity of the predicting groups and therefore cannot take into account the method used for deriving a prediction. Indeed it was suggested and accepted that, in subsequent editions of the experiment, a general description of the method used should be made available to the assessor.
After the experiment was concluded, we revisited the data collected and reassessed the results, taking into account the methods used. We took advantage of the knowledge of the identity of the predicting groups as well as of functional annotations that have become available in the meantime. Our results show that a basic knowledge of the methods is important for understanding the level at which predictions can be trusted and for assessing the reliability of the predicted functions. They also show that predictions derived by a consensus among groups using different, non-redundant, methods can reach an accuracy of 80%.
Results
The protein set at the beginning of CASP6 contained 87 targets, 23 of which were discarded during the experiment because of practical issues, such as early or late release of the 3D protein structure. Therefore, the set that was considered contained 64 protein targets, 29 of which had no functional annotation in any database at the time of the experiment. A function prediction for at least one of the 64 targets was submitted by 23 of the total 172 predictors.
Within the CASP6 experiment, seven classes of function predictions were considered: GO molecular function numbers (GOFs), GO biological process numbers (GOPs), GO cellular component numbers (GOCs), binding feature numbers indicating the nature of putative interacting partners (BINDs), binding site residue identifiers (BSs), free comments on residues with a putative peculiar role (RESIDUE-ROLEs) and post-translational modification numbers (PTs). In the present study, the BS and RESIDUE-ROLE subsets were not analyzed because of the high variability in the type and format of the submitted predictions.
We considered a total of 1590 function predictions submitted by 18 groups for 64 protein targets, as GOFs (568), GOPs (445), GOCs (363), BINDs (150), and PTs (64).
Method classification
The first step was to recover the information about the
method used by each predictor, by inspecting the
abstracts submitted to CASP, performing literature
searches and, in some cases, directly contacting the
predictors.
Each method was assigned to one or more of five
categories (F1 to F5), here called features, on the basis
of the type of information used by predictors. The use
of sequence information corresponded to the F1 cate-
gory. F2 indicated the use of structural features. Meth-
ods using the GO database for any reason other than
deriving GO numbers for submission were assigned to
F3. Literature-based methods and manual methods
were indicated as F4 and F5, respectively (Table 1).
Therefore, each method could be classified by a five-bit binary code indicating the presence (1) or absence (0) of each of the five features. For instance, the 10000 code corresponds to completely automated (F5 = 0) methods based on sequence information (F1 = 1)
which do not take advantage of structural (F2 = 0), GO (F3 = 0) and literature (F4 = 0) information.
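The five-bit classification described above can be sketched in a few lines of Python. This is only an illustration of the encoding, not the scripts used for the analysis (which were written in Perl); the function names `method_code` and `decode` are hypothetical.

```python
# Feature order follows the text: F1 sequence, F2 structure, F3 GO database,
# F4 literature, F5 manual intervention.
FEATURES = ("F1", "F2", "F3", "F4", "F5")


def method_code(used):
    """Return the five-bit code (as a string) for a set of feature names."""
    unknown = set(used) - set(FEATURES)
    if unknown:
        raise ValueError(f"unknown features: {sorted(unknown)}")
    return "".join("1" if name in used else "0" for name in FEATURES)


def decode(code):
    """Return the set of feature names switched on in a five-bit code."""
    if len(code) != len(FEATURES) or set(code) - {"0", "1"}:
        raise ValueError(f"not a five-bit binary code: {code!r}")
    return {name for name, bit in zip(FEATURES, code) if bit == "1"}


# A fully automated, sequence-only method.
sequence_only = method_code({"F1"})
# A manually curated method using sequence and literature information.
curated = method_code({"F1", "F4", "F5"})
```

Keeping the code as a string rather than an integer preserves leading zeros and matches the identifiers used in the figures and tables.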
The distributions of single features and of binary combinations are shown in Fig. 1A and Fig. 1B, respectively.
Although five binary features allow 32 (2^5) theoretical combinations, the F1 feature was used by all 18 predictor groups, reducing the possible binary combinations to 16. Only 8 binary codes were observed, showing a lower than expected variability in the combination of information used. Half of the possible combinations were not found, and the
majority of predictors used canonical feature combina-
tions. In fact, all groups used sequence information,
nine took advantage of literature information, and six
used structure information, but only four predictors
exploited the GO database and only one of them com-
bined it with structural features. Moreover, predictors
using GO never took literature information into
account and vice versa. It is worth highlighting here
that 11 groups used an approach developed in-house
for features F1, F2 or F3.
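The feature counts quoted above can be cross-checked against the binary codes of Fig. 1B. The sketch below assumes that the counts and codes recovered from panel B pair up in the order in which they are listed (4, 5, 2, 2, 2, 1, 1, 1 for 11011, 10011, 10100, 10001, 10000, 11100, 10101, 11001); that pairing is an assumption, although it reproduces the totals given in the text.

```python
from collections import Counter

# Number of methods per binary code; the pairing of counts and codes is an
# assumption based on the order in which they appear in Fig. 1B.
OBSERVED = {
    "11011": 4, "10011": 5, "10100": 2, "10001": 2,
    "10000": 2, "11100": 1, "10101": 1, "11001": 1,
}


def feature_usage(observed):
    """Count how many methods use each feature F1..F5."""
    usage = Counter()
    for code, n_methods in observed.items():
        for position, bit in enumerate(code, start=1):
            if bit == "1":
                usage[f"F{position}"] += n_methods
    return usage


def methods_with(observed, *positions):
    """Count methods whose code has a 1 at every given 1-based position."""
    return sum(
        n for code, n in observed.items()
        if all(code[p - 1] == "1" for p in positions)
    )


usage = feature_usage(OBSERVED)
go_and_structure = methods_with(OBSERVED, 2, 3)
go_and_literature = methods_with(OBSERVED, 3, 4)
```

Under this pairing, the tallies agree with the statements that all 18 groups used sequence information, nine used literature information, six used structure information, four used the GO database, only one combined GO with structural features, and none combined GO with literature information.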
For each of the function prediction classes, we computed a consensus value and a redundancy value within the consensus [F(red)] (Table 2 and Fig. 2). The consensus value is defined as the number of identical predictions submitted by at least two predictors. The redundancy value F(red) indicates the method variability in terms of feature combinations within a consensus and is calculated as follows:
$$F(\mathrm{red}) = \frac{N(\mathrm{red})}{N(\mathrm{tot})}$$
where N(red) and N(tot) are the number of methods
with the same binary identifier, i.e. the number of
redundant methods, and the total number of methods
generating the consensus, respectively. Lower F(red)
values reflect a higher variability in the type of meth-
ods generating the consensus.
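A minimal sketch of how a consensus and its redundancy fraction might be computed is given below. It rests on one reading of the definition, stated here as an assumption: N(red) is taken as the number of contributing methods whose binary identifier is shared with at least one other contributing method. The group names and GO identifiers in the example are hypothetical.

```python
from collections import Counter, defaultdict


def find_consensus(predictions):
    """Group (predictor, term) pairs by term and keep the terms that were
    submitted by at least two distinct predictors."""
    predictors_by_term = defaultdict(set)
    for predictor, term in predictions:
        predictors_by_term[term].add(predictor)
    return {
        term: sorted(predictors)
        for term, predictors in predictors_by_term.items()
        if len(predictors) >= 2
    }


def redundancy_fraction(codes):
    """Return F(red) = N(red) / N(tot) for the binary codes of the methods
    contributing to one consensus.

    Assumption: a method counts as redundant when another contributing
    method has the same binary code.
    """
    if len(codes) < 2:
        raise ValueError("a consensus needs at least two methods")
    counts = Counter(codes)
    n_red = sum(n for n in counts.values() if n > 1)
    return n_red / len(codes)


# Hypothetical submissions for one target.
submissions = [("group_a", "GO:1"), ("group_b", "GO:1"), ("group_c", "GO:2")]
consensus = find_consensus(submissions)
```

With this reading, two methods with different codes give F(red) = 0, two methods with the same code give F(red) = 1, and intermediate values arise when only some of the contributing methods share a code.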
For the GOF, GOP and GOC classes, we also computed a consensus value among the GO parents of the submitted predictions for up to three levels of the GO ontology (P1, P2 and P3), to verify whether less specific consensus could be achieved by different methods. The results are shown in Table 2.
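The parent-level comparison can be illustrated with a small sketch. It assumes that two predictions count as agreeing at level Pk when they share a node within k steps up the ontology graph; this is one plausible reading of the procedure, and the toy ontology and term names below are hypothetical rather than real GO terms.

```python
def ancestors_within(term, parents, max_levels):
    """Return the term together with every ancestor reachable in at most
    max_levels parent steps."""
    seen = {term}
    frontier = {term}
    for _ in range(max_levels):
        frontier = {p for t in frontier for p in parents.get(t, ())} - seen
        if not frontier:
            break
        seen |= frontier
    return seen


def agree_within(term_a, term_b, parents, max_levels):
    """True when the two terms share a node within max_levels parent steps."""
    reach_a = ancestors_within(term_a, parents, max_levels)
    reach_b = ancestors_within(term_b, parents, max_levels)
    return bool(reach_a & reach_b)


# Toy ontology: child -> tuple of parents (a term may have several parents).
TOY_PARENTS = {
    "leaf_a": ("mid_x",),
    "leaf_b": ("mid_x",),
    "leaf_c": ("mid_y",),
    "mid_x": ("top",),
    "mid_y": ("top",),
}
```

In this toy graph, `leaf_a` and `leaf_b` agree at P1 through `mid_x`, whereas `leaf_a` and `leaf_c` only agree at P2 through `top`, which shows how a shared node close to the root adds little specific information.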
A consensus was never found for prediction categor-
ies other than GOF, GOP and BIND, although the
number of GOF and GOP consensus was significantly
Table 1. List of features used to classify methods exploited for the predictions.
Number  Description             Included tools
F1      Sequence information    BLAST, PFAM, INTERPRO, CHOP, CHIEFC, PROSITE, SMART, PRINTS, PHYRE, others
F2      Structure information   3D-JURY, HMAP, PROFUNC, COLUMBA, PHUNCTIONER
F3      GO database
F4      Literature information
F5      Manual intervention
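[Fig. 1 graphic. Panel A: bar chart of N(met), the number of methods (axis from 0 to 25), for F1 (seq), F2 (str), F3 (GO), F4 (lit), F5 (man), Own and Unknown. Panel B: number of methods per binary code, in the order listed: 11011, 4; 10011, 5; 10100, 2; 10001, 2; 10000, 2; 11100, 1; 10101, 1; 11001, 1.]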
Fig. 1. Distribution of method features. (A) Number of methods
including F1 (sequence information), F2 (structure information), F3
(use of GO database for any other reason than deriving GO
numbers for submission), F4 (literature information), F5 (manual
intervention), number of methods developed in-house (own) and
number of methods for which no description is available
(unknown). (B) Distribution of binary method class identifiers
among the 18 methods submitting predictions (in-house developed
methods and methods for which no information is available are not
included).
Table 2. Number of GO identifiers predicted by at least two groups. GOF, Number of GO function predictions; GOP, number of GO process predictions; GOC, number of GO cellular component predictions; BIND, number of binding predictions; PT, number of post-translational modification predictions; NA, not applicable. The number of predictions corresponding to the ‘unknown’ annotation is shown in parentheses.
                  GOF      GOP      GOC   BIND   PT
Consensus         70 (4)   19 (17)  0     31     0
Consensus P1      72 (4)   28 (21)  1     NA     NA
Consensus P1-P2   74 (4)   33 (25)  10    NA     NA
Consensus P1-P3   78 (4)   35 (26)  27    NA     NA
lower than the number of submitted predictions per target (70 out of 450 and 19 out of 457 for GOF and GOP, respectively). The fraction of predictions for which a consensus could be found is higher for the BIND class, accounting for about one fifth of the total submitted predictions (31 out of 150).
About half of the consensus predictions were generated by two predictions only, except for the GOP class, where about 40% of the consensus predictions were obtained by three independent methods. It should be noted that some of the consensus predictions also included annotations such as ‘unknown molecular function’ or ‘unknown biological process’, shown in parentheses in Table 2. The exclusion of ‘unknown biological process’ predictions left only two of the 19 GOP consensus predictions, each generated by three submissions corresponding to three different methods.
Figure 2B shows a histogram of the fraction of redundancy for the three functional classes. Interestingly, redundancy values between 0 and 0.2 were often observed, corresponding to a variability of at least 80% in the combinations of features generating the consensus.
When annotations were grouped according to parent levels of the respective GO terms (one level, P1; two levels, P2; three levels, P3), neither the number of consensus predictions nor the fraction of redundancy changed significantly (supplementary material, Fig. S1). This is most likely due to the somewhat limited depth of the GO graph, so that the existence of a common node between two predictions does not necessarily provide additional information.
Target function annotation versus time
At the beginning of the CASP6 experiment, 42, 32 and
9 targets had a molecular function, a biological process
and a cellular component annotation, respectively. In
23 cases, information about interaction partners was
available, whereas no annotation about post-transla-
tional modifications was present in the databases. One
year later (October 2005), the available annotations
decreased by 5% to 10%, showing that the knowledge
of the 3D structure of a protein allows its function
annotation to be improved, even if this can just imply
removing a previous annotation (supplementary mater-
ial, Fig. S2). In fact, for 11 targets at least one molecu-
lar function annotation was either modified or deleted
between the end of 2004 and the end of 2005 (supple-
mentary material, Table S1). In the same period, the
number of non-annotated targets decreased by 8, 6, 2
and 1 for GOF, GOP, GOC and BIND, respectively.
These data confirm that the process of assigning a
function to proteins is still a very difficult task for
both experimental and computational biologists and
suggest that there is still a long way to go to fully
exploit genome-scale data.
Interestingly, only in about half of the cases did predictions agree with function assignments that were sub-
sequently removed, suggesting that taking into account
different functional predictions for a given protein
might be helpful in avoiding errors in database annota-
tion.
Predictions versus target function annotation
We can reliably assess the correctness of a prediction only for those cases where a subsequently released annotation is available for targets that had no annotation at the time of CASP6 (supplementary material, Fig. S2). This subset consisted of only 11 targets and included 24, 7, 2 and 1 GOF, GOP, GOC and BIND annotations, respectively. Because of the sparseness of the data, the analysis was limited to GOF functional assignments only and included the eight targets listed in Table 3, five of which belong to the comparative
[Fig. 2 graphic. Panel A: fraction of consensus predictions (axis from 0 to 1) against N(c-preds), from 2 to 10. Panel B: fraction of consensus predictions (axis from 0 to 1) against F(red), in bins 0.0-0.2, 0.2-0.4, 0.4-0.6, 0.6-0.8 and 0.8-1.0. Series in both panels: GOFs, GOPs, BINDs.]
Fig. 2. (A) Fraction of consensus of prediction [F(preds)] as a function of the number of contributing predictions [N(c-preds)]. (B) Fraction of consensus of prediction [F(c-preds)] as a function of the redundancy of the contributing methods [F(red)], i.e. of the number of methods exploiting the same source of information for the prediction (see text for a detailed definition).
model (CM) and three to the fold recognition (FR)
classes. Sixty-eight groups submitted 85 predictions, 32
of which converged into 12 consensus predictions.
These predictions were compared with the current tar-
get function annotation (October 2005) present in the
UniProt [13], Entrez Gene [14] and InterPro [15] data-
bases (Tables 3 and 4).
Among the predictions submitted (85 by 68 groups), about 20% (18) were correct, i.e. overlapped with the current annotation of the corresponding targets. Interestingly, 17 of them were consensus predictions. Moreover, although a significantly higher number of consensus predictions was observed for CM than for FR targets (11 against 1, respectively), the only FR target consensus (T0243) matched the corresponding function annotation.
Even if the size of the dataset is too small to derive general conclusions, we believe that the success rate for these predictions supports both the usefulness of the experiment and the validity of our method for deriving the consensus prediction. Moreover, it suggests that an ‘easy’ structure prediction does not necessarily correspond to an ‘easy’ function prediction. The definition of the ‘difficulty’ of a target has been the subject of many debates in the structure prediction field. Defining the difficulty of a function prediction is equally complex, although clearly needed as well. Here we took the view that a target not annotated in any database at the time of prediction is difficult to predict. Given the time constraints imposed by the CASP experiment, most of the methods used in the experiment are automatic ones and it is unlikely that any of the predictions include a large amount of human intervention, unlike the case for curated databases.
We feel it is inappropriate to try to derive a ranking
of the various methods on the basis of such a limited
dataset, but if, as we expect, participation in future
experiments increases, it will be possible to derive con-
clusions about the quality of different methods. More
importantly, the number of consensus predictions will
also increase, and this will allow a substantial number
of correct functional predictions to be produced.
The CASP6 assessor highlighted five cases where a consensus could be derived by comparing the different submitted predictions, although the design of the experiment did not allow the redundancy of the methods to be taken into account at the time. Three of these targets (T0226, T0243 and T0263) were annotated between the end of CASP6 and the time that this analysis was performed. For T0243 and T0263, the newly deposited annotations match the consensus prediction. T0243 was predicted and proved to bind DNA, and T0263 was predicted to have oxidoreductase activity
Table 3. List of targets annotated after October 2004 and the corresponding submitted predictions. Target, CASP6 target identifier; Class, target classification (CM, comparative modeling; FR, fold recognition; NF, new fold); Ann DB, annotation database (EG, Entrez Gene; IP, InterPro; UP, UniProt); GOF, GO molecular function identifier, according to GO database definition; Pred(Sub), number of predictions submitted; NP, number of predictors; N(Cons), number of consensus predictions found; Pred(Cons), number of predictions generating the consensus. Bold, Annotations correctly predicted by at least one group.
Target  Class  Ann DB  GOF                                           Pred(Sub)  NP  N(Cons)  Pred(Cons)
T0196   CM     EG      3746, 8135, 3676                              16         12  2        8
T0205   CM     IP      3676, 5488                                    15         10  3        7
T0211   CM     UP      16787, 3824                                   8          8   1        2
T0215   FR     IP      4766, 16765, 8757, 8168, 16741, 16740, 3824   3          3   0        0
T0226   CM     IP      5198                                          12         10  2        5
T0243   FR     UP      3677, 3676, 5488                              10         8   1        2
T0263   FR     UP      3862, 16491                                   9          7   0        0
T0268   CM     UP-IP   8168, 16741, 16740, 3824                      12         10  3        8
Table 4. GO function predictions by method class (October 2005 annotated targets only). Bin, Class binary identifier; Sub Pred, number of predictions submitted by methods belonging to the class; Sub GOFN, number of GOF submitted by methods belonging to the class; Exact Pred, number of predictions (out of a total of 14) corresponding to annotations found in UniProt, Entrez Gene and InterPro databases; Exact GOFN, number of predicted GOF numbers (out of a total of 12) corresponding to annotations found in UniProt, Entrez Gene and InterPro databases. Numbers in parentheses indicate the predictions that can be clustered in terms of common GO parents (up to three levels).
Bin     Sub Pred      Sub GOFN      Exact Pred    Exact GOFN
        (ConsP1-P3)   (ConsP1-P3)   (ConsP1-P3)   (ConsP1-P3)
10000   1 (1)         1 (1)         1 (1)         1 (1)
10001   9 (3)         9 (3)         2 (2)         2 (2)
10011   17 (10)       15 (8)        7 (6)         5 (4)
11001   1 (1)         1 (1)         1 (1)         1 (1)
11011   8 (5)         8 (5)         1 (1)         1 (1)
10100   24 (3)        22 (2)        0 (0)         0 (0)
10101   1 (1)         1 (1)         1 (1)         1 (1)
11100   1 (1)         1 (1)         1 (1)         1 (1)
and is indeed annotated as 3-isopropyl malate dehy-
drogenase. Both consensus predictions were achieved
by predictions submitted by two different methods
(type 10100 and 10011 for T0243 and type 11011 and
11100 for T0263). For T0226, there were three consen-
sus predictions: isomerase, transferase and sugar bind-
ing; the current annotation suggests that the protein
has a structural role, which may or may not be compat-
ible with sugar binding.
In the light of these findings, we can confidently conclude that consensus predictions, normalized on the basis of redundancy of the methods, could be useful to researchers in narrowing the number of biological assays needed for protein function assignments and in speeding up the difficult and challenging functional annotation process. As we anticipate that this will prove to be useful to our research colleagues, the consensus predictions for targets with no current function annotation are reported in Table 5.
Conclusions
The prediction of protein function is one of the major challenges of protein bioinformatics. The growing number of completely sequenced genomes has allowed the development of a number of new approaches complementary to the use of sequence analysis, which can be combined to elucidate complete functional networks and biochemical pathways.
The CASP community set up a new function prediction category aimed at understanding whether and in which cases computational methods are able to provide useful information about the molecular or biological function of an unknown protein and to provide useful information to researchers working on the target proteins.
Here we revisited the results of this CASP6 category with two aims: (a) to verify how relevant a knowledge of the methods used by predictors is for assessing the results; (b) to see to what extent information made available after the end of the experiment could contribute to our understanding of which strategies are best for providing useful information to researchers.
Our results show that consensus predictions generated by diverse methods, i.e. methods exploiting different sources of information, are more reliable than predictions obtained by a single method and can be used as indicators of the reliability of the prediction. A general knowledge of the methods used is therefore important for understanding the level at which predictions can be trusted and should be made available to the CASP assessor in the next round of the experiment.
The conclusions are based on a small number of cases, as a proper assessment of the correctness of the predictions can only use the few cases for which an annotation is available now but was not at the time of the CASP experiment. However, it is interesting to note that, if we trust that the submitted predictions did not make use of the existing annotations, and therefore include predictions of already annotated targets in our analysis, a molecular function and biological process was correctly predicted for about half of the protein targets, and more than 80% of the exact predictions were within a consensus (data not shown). On the other hand, more than 30% of the 64 CASP protein targets still have no molecular function annotation in any database and more than half of them have no biological process annotation. Clearly, therefore, the assignment of functional data to proteins is still a very difficult task not only from a computational point of view, but also experimentally. We hope that the present analysis, and especially the observation of the high reliability of consensus predictions, will encourage predictors to participate in the next CASP functional prediction experiments as well as convince researchers to take into account the results in designing their experiments. It is our opinion that the CASP function prediction experiment can provide a significant contribution and promote important and useful development in the area of protein function prediction.
Table 5. Consensus of predictions for non-annotated targets (as in October 2005). Target, Target identifier; GOF, predicted GO molecular function identifier; Function, molecular function description; NP, number of predictors; N(comb), number of different method binary identifiers, i.e. of nonredundant methods, that contribute to the consensus.
Target  GOF    Function                                                    NP  N(comb)
T0212   5515   Selective protein binding                                   3   3
T0214   3677   DNA binding                                                 2   2
T0216   8237   Metallopeptidase activity                                   2   1
T0222   50825  Ice binding (antifreeze activity)                           4   1
T0227   3677   DNA binding                                                 2   1
T0232   4364   Glutathione transferase activity                            9   4
T0237   3793   Defense (immunity protein activity)                         2   1
T0249   5515   Selective protein binding                                   2   2
        30528  Transcription regulator activity                            4   3
        3700   Transcription factor activity                               3   2
        3677   DNA binding (functional hypothesis: transcription factor)   3   2
T0251   5489   Electron transporter                                        2   2
T0275   5524   ATP binding                                                 4   3
Experimental procedures
Submitted predictions are available at the CASP web site
(http://www.predictioncenter.org).
All analyses were performed using in-house built scripts
in the PERL programming language.
References
1 Wolfson HJ, Shatsky M, Schneidman-Duhovny D,
Dror O, Shulman-Peleg A, Ma B & Nussinov R (2005)
From structure to function: methods and applications.
Curr Protein Pept Sci 6, 171–183.
2 Jones S & Thornton JM (2004) Searching for functional
sites in protein structures. Curr Opin Chem Biol 8, 3–7.
3 Gabaldon T & Huynen MA (2004) Prediction of protein
function and pathways in the genome era. Cell Mol Life
Sci 61, 930–944.
4 Todd AE, Orengo CA & Thornton JM (1999) Evolution
of protein function from a structural perspective.
Curr Opin Chem Biol 3, 548–556.
5 Thornton JM, Todd AE, Milburn D, Borkakoti N &
Orengo CA (2000) From structure to function:
approaches and limitations. Nat Struct Biol 7, 991–994.
6 Dietmann S & Holm L (2001) Identification of homol-
ogy in protein structure classification. Nat Struct Biol 8,
953–957.
7 Orengo CA, Jones DT & Thornton JM (1994) Protein
superfamilies and domain superfolds. Nature 372, 631–
634.
8 Devos D & Valencia A (2000) Practical limits of function
prediction. Proteins: Structure, Function, Bioinformatics
41, 98–107.
9 Rost B (2002) Enzyme function less conserved than
anticipated. J Mol Biol 318, 595–608.
10 Jeffery CJ (2003) Moonlighting proteins: old proteins
learning new tricks. Trends Genet 19, 415–417.
11 Jeffery CJ (2003) Multifunctional proteins: examples of
gene sharing. Ann Med 35, 28–35.
12 Moult J, Fidelis K, Rost B, Hubbard T & Tramontano
A (2005) Proteins: Structure, Function, Bioinformatics
Supplement 7, 3–7.
13 Bairoch A, Apweiler R, Wu CH, Barker WC, Boeck-
mann B, Ferro S, Gasteiger E, Huang H, Lopez R,
Magrane M, et al. (2005) The Universal Protein
Resource (UniProt). Nucleic Acids Res 33, D154–D159.
14 Maglott D, Ostell J, Pruitt KD & Tatusova T (2005)
Entrez Gene: gene-centered information at NCBI.
Nucleic Acids Res 33, D54–D58.
15 Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bat-
eman A, Binns D, Bradley P, Bork P, Bucher P, Cerutti
L, et al. (2005) InterPro, progress and status in 2005.
Nucleic Acids Res 33, D201–D205.
Supplementary material
The following supplementary material is available
online:
Fig. S1. Percentage of consensus predictions as a function of the redundancy [F(red)] of the contributing methods. (A) GOF functional class; (B) GOP functional class; (C) GOP functional class, ‘unknown biological process’ prediction excluded; (D) GOC functional class.
Fig. S2. (A) Number of annotated targets versus time:
the dotted blue line indicates the number of annotated
targets for which the annotation did not change
between December 2004 and October 2005. (B) Num-
ber of non-annotated targets versus time.
Table S1. Submitted predictions for targets for which
there was at least one GOF annotation in October
2004, subsequently removed.
This material is available as part of the online article
from http://www.blackwell-synergy.com