REVIEW ARTICLE
Ten years of predictions … and counting
Domenico Cozzetto1, Adele Di Matteo1 and Anna Tramontano2
1 Department of Biochemical Sciences, University of Rome ‘La Sapienza’, Italy
2 Istituto Pasteur – Fondazione Cenci Bolognetti, University of Rome ‘La Sapienza’, Italy
In 2004, the Critical Assessment of Techniques for Protein
Structure Prediction (CASP) celebrates its tenth
anniversary. The initiative, notwithstanding its relatively
long tradition, remains lively and challenging. It is
organized by John Moult (Center for Advanced
Research in Biotechnology, Rockville, MD, USA),
Krzysztof Fidelis (Lawrence Livermore National Labor-
atory, Livermore, CA, USA), Tim Hubbard (Sanger
Institute, Hinxton, UK), Burkhard Rost (Columbia Uni-
versity, New York, NY, USA) and Anna Tramontano
(University of Rome, Italy) with the invaluable help of
Andriy Kryshtafovych (Lawrence Livermore National
Laboratory) and Volker Eyrich (Columbia University).
The goals of the experiments are: to evaluate the accu-
racy of current methods for protein structure prediction;
to identify bottlenecks and to indicate the directions
where efforts can best be focused. The scheme is simple:
the organizers collect sequences of ‘targets’, i.e. of
proteins whose structures are likely to be solved within
a few weeks. These sequences are made available to the
community of computational biologists who attempt to
predict their three-dimensional structures as well as other
relevant biological properties, e.g. domain boundaries,
long range inter-residue contacts, disordered regions
and, when not previously known, function. Once the
experimental structures of the targets are available, they
are compared with the collected predictions using a large
variety of numerical measures, and the data generated
are stored in a database in the Livermore Laboratory
Prediction Center. Experts in the field of protein struc-
ture prediction are asked to critically evaluate the results
and highlight progress and bottlenecks in the field.
In 2004, the community selected as assessors Alfonso Valencia
(Centro Nacional de Biotecnologia, Madrid, Spain),
Roland Dunbrack (Fox Chase Cancer Center, Philadel-
phia, PA, USA) and B K Lee (National Institutes of
Health, Bethesda, MD, USA). The process, lasting from
spring to winter of each even-numbered year, is conclu-
ded by a meeting where the community convenes to dis-
cuss the results. This year, for the first time, the meeting
was held in Europe, in Gaeta on 4–8 December.
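The comparison step of the scheme described above, in which each prediction is scored against the experimental structure, can be illustrated with a minimal sketch. It is not the procedure used by the Prediction Center: the function names are hypothetical, the two coordinate sets are assumed to be already superimposed and matched residue by residue, and the distance cutoffs of 1, 2, 4 and 8 Å are an assumption chosen for illustration.

```python
import math


def ca_distances(model, experimental):
    """Distances (in angstroms) between matched C-alpha positions.

    Assumes both lists hold (x, y, z) tuples in the same residue order
    and that the model has already been superimposed on the experiment.
    """
    if len(model) != len(experimental):
        raise ValueError("coordinate lists must have the same length")
    if not model:
        raise ValueError("coordinate lists must not be empty")
    return [math.dist(m, e) for m, e in zip(model, experimental)]


def rmsd(model, experimental):
    """Root-mean-square deviation over all matched C-alpha atoms."""
    distances = ca_distances(model, experimental)
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_like_score(model, experimental, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Mean percentage of residues lying within each distance cutoff.

    A simplified, single-superposition stand-in for distance-cutoff
    scores; the cutoffs are illustrative assumptions.
    """
    distances = ca_distances(model, experimental)
    fractions = [
        sum(1 for d in distances if d <= cutoff) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy four-residue example (made-up coordinates, not a CASP target).
experimental = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 1.5, 0.0), (7.6, 3.0, 0.0), (11.4, 9.0, 0.0)]

print(f"RMSD: {rmsd(model, experimental):.2f} A")
print(f"Cutoff-based score: {gdt_like_score(model, experimental):.2f}")
```

In this toy case a single badly placed residue dominates the RMSD, whereas the cutoff-based score still credits the three residues that are placed reasonably well, which is one reason why several complementary measures are useful when models of very different quality are compared.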
During its ten-year history, CASP has been instru-
mental in convincing both the computational and experi-
mental communities that the prediction of the structure
of proteins non-evolutionarily related to proteins of
known structure is not completely out of reach. Indeed,
fold recognition methods (i.e. methods that try to identify
which of the known topologies is the most likely for
an unknown protein), effective techniques for predicting
secondary structure and, more recently, methods able to
assemble fragments of proteins of known structure to
construct the structure of proteins the architecture of
which is completely novel, have all been fostered and
popularized by CASP – a fairly major contribution to
molecular biology in the postgenomic era.
Keywords
automatic prediction servers; CASP; model
evaluation; protein structure prediction
Correspondence
A. Tramontano, University of Rome ‘La
Sapienza’, Department of Biochemical
Sciences, 5 Piazzale Aldo Moro,
Rome 00185, Italy
E-mail: anna.tramontano@uniroma1.it
(Received 15 December 2004, accepted 24
December 2004)
doi:10.1111/j.1742-4658.2005.04549.x
The CASP experiment has been run every other year since 1994. Its
objective is to subject the available structure prediction methods to a blind test.
This is a short report of the highlights of its last edition.
‘Men who wish to know about the world must learn about it in its
particular details’ (Heraclitus of Ephesus, 535–475 BC).
FEBS Journal 272 (2005) 881–882 © 2005 FEBS
Indeed, the contribution of CASP has not only been
to evaluate the quality of the approaches and promote
cross-fertilization between them, but also to validate
which of the tools are sufficiently mature and reliable to
become part of the standard suite of methods that
experimental biologists can use regularly. This year,
among the participants selected by the assessors to describe
their strategy and results there were ‘the usual suspects’,
namely David Baker (University of Washington,
USA), Jeff Skolnick (University of Buffalo, NY, USA),
David Jones (UCL, UK), Kevin Karplus (UC Santa
Cruz, USA), Krzysztof Ginalski, Janusz Bujnicki and
Andrzej Kolinski (Warsaw University, Poland), but also
new participants such as Mayuko Takeda-Shitaka
(Kitasato University, Japan), Kentaro Tomii (National
Institute of Advanced Industrial Science and Technology,
Tokyo, Japan), Yaoqi Zhou (University of Buffalo)
and Ming Li (University of Waterloo, Canada). The
results and a description of the methods can be found at
the CASP website (http://predictioncenter.llnl.gov/casp6).
It is fair to say here that, thanks to the insights
and efforts of these groups, as well as to the hard work
of many others, the problem of predicting the overall
topology of many proteins is clearly within reach, and
this is certainly good news for many experimentalists.
Figure 1 shows one example, in which the experimentally
determined structure of the first domain of target 272, a
hypothetical protein from Thermus thermophilus with no
detectable sequence similarity to any protein of known
structure, is compared with the model produced by the
group of David Baker. There is a clear correlation between
the quality of a model and its range of application. Even
in the most difficult cases, these models are usually
sufficient to understand the general properties of the
molecule, to identify solvent-exposed regions and flexible
parts and, in some cases, to reveal unexpected
evolutionary relationships useful for function
assignment. However, for other applications such as
drug design or the prediction of substrate specificity, the
level of detail required is much higher. Furthermore,
methods usually produce alternative models and, in the
most difficult cases, distinguishing which one is closer to
the real structure represents a serious bottleneck. There
is a consensus in the field that this latter task is easier
when a model very close to the native structure is pre-
sent in the ensemble of models. It is not surprising,
therefore, that discussions at the meeting focused on
how to push the field towards devoting more effort to
the refinement of the models, to the extent that the com-
munity is discussing ways to set a required minimum
quality of a model below which it would not be consid-
ered at all in the assessment. This can be, for example,
the quality of the best model obtained by automatic pre-
diction servers, some of which have obtained results
comparable to those of the best research groups.
The appearance of the above-mentioned fragment-based
methods for predicting the structure of proteins
with a new fold had the undesirable effect of discouraging
work on ab initio methods for protein structure
prediction, which could not compete with the quality
achieved by the heuristic methods. However, at least
one example of successful ab initio prediction by the
group of Harold Scheraga (Cornell University, Ithaca,
NY, USA) was reported at this meeting for target
T0215, a 48-residue protein. This, together with
the possibility of using ab initio energy-based methods
for more accurate refinement of the modelled structures,
should, we hope, revive the interest of the ‘folders’
in the initiative. We are convinced that models
obtained by combining energy-based and knowledge-
based methods will finally set the foundations for a
solution to the protein folding problem.
Acknowledgements
The Sixth Edition of the CASP meeting was sponsored
by National Institutes of Health, NLM and NIGMS,
Istituto Pasteur – Fondazione Cenci Bolognetti, Euro-
pean Molecular Biology Organization (EMBO), Bio-
Sapiens Network of Excellence funded by the
European Commission FP6 Programme, contract
number LHSG-CT-203-503265, Lawrence Livermore
National Laboratory, Italtech Solutions and IBM.
Fig. 1. Comparison between the experimental structure (left) and a
model (right) of the CASP target T272 domain 1. The structure was
solved by A. Ebihara, M. Yao, S. Yokoyama, and S. Kuramitsu (RIKEN
Genomic Sciences Center, Yokohama, Japan) (PDB code: 1WJ9), and
the model was submitted by D. Baker (University of Washington, USA).